[PDF] Identification of deregulated transcription factors involved in subtypes of cancers

Abstract

We propose a methodology for the identification of transcription factors involved in the deregulation of genes in tumoral cells. This strategy is based on the inference of a reference gene regulatory network that connects transcription factors to their downstream targets using gene expression data. The behavior of genes in tumor samples is then carefully compared to this network of reference to detect deregulated target genes. A linear model is finally used to measure the ability of each transcription factor to explain those deregulations. We assess the performance of our method by numerical experiments on a breast cancer data set. We show that the information about deregulation is complementary to the expression data as the combination of the two improves the supervised classification performance of samples into cancer subtypes.

Full PDF

IIdentiﬁcation of Deregulated Transcription FactorsInvolved in Speciﬁc Bladder Cancer Subtypes

Magali Champion , Julien Chiquet , Pierre Neuvial , Mohamed Elati ,Franois Radvanyi , and Etienne Birmel Laboratoire MAP5, UMR8145 CNRS, Universit Paris Descartes, Paris, France [email protected] UMR MIA-Paris, AgroParisTech, INRA, Universit Paris-Saclay, Paris, France Institut de Mathmatiques de Toulouse, UMR 5219, Universit de Toulouse, CNRS, France Universit de Lille, INSERM U908, Cell Plasticity and Cancer, Lille, France Institut Curie, PSL Research University, CNRS, UMR144, Paris, France

Abstract

Comparison between tumoral and healthy cells may reveal abnormal regulation behav-iors between a transcription factor and the genes it regulates, without exhibiting diﬀerentialexpression of the former genes. We propose a methodology for the identiﬁcation of tran-scription factors involved in the deregulation of genes in tumoral cells. This strategy isbased on the inference of a reference gene regulatory network that connects transcriptionfactors to their downstream targets using gene expression data. Gene expression levels intumor samples are then carefully compared to this reference network to detect deregulatedtarget genes. A linear model is ﬁnally used to measure the ability of each transcriptionfactor to explain these deregulations. We assess the performance of our method by nu-merical experiments on a public bladder cancer data set derived from the Cancer GenomeAtlas project. We identify genes known for their implication in the development of speciﬁcbladder cancer subtypes as well as new potential biomarkers.

Today, after decades of intensive research, cancer is still one of the most deadly diseases world-wide, killing millions of people every year. Cancer is mainly caused by somatic mutations thataﬀect critical genes and pathways. These mutations are mostly triggered by environmentalfactors (e.g. obesity, smoking, alcohol, lifestyle,...) often promoted by certain genetic conﬁgu-rations. In the last two decades, large-scale projects, such as the Cancer Genome Atlas project(TCGA), which has produced massive amounts of multi-omics data, have launched to improveour understanding of cancers [32]. In this context, developing statistical algorithms able tointerpret these large data sets and to identify genes that are the origin of diseases and theircausal pathways still remains an important challenge.Genes are commonly aﬀected by genomic changes in the pathogenesis of human cancer.Cancer is moreover a heterogeneous disease, with aﬀected gene sets that may be highly diﬀerentdepending on subtypes, and thus requires diﬀerent treatments of patients. Speciﬁc analyses ofsubtypes have for example revealed signiﬁcant diﬀerences between breast cancer subgroups [13]but also pancancer similarities between breast and bladder cancer subgroups [16].Using transcriptional data allows to look beyond DNA, that is to study abnormalities interms of gene expression. As a common approach, diﬀerential expression analysis, for whichstatistical procedures have been intensively explored, can be performed and altered genes arethen diﬀerentially expressed genes [12]. This points to relevant genes but does not take intoaccount the regulations (activation and inhibition) between genes. a r X i v : . [ q - b i o . M N ] A p r dentiﬁcation of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel The approach we consider consists in taking into account the regulation structure betweengenes. We particularly focus on transcription factors (TFs), for their major role played in theregulation of gene expression, which make them an attractive target for cancer therapy [29, 36].Regulation processes between TFs and their targets are usually represented by Gene RegulatoryNetworks (GRNs). In the last few years, many diﬀerent methods have been proposed to inferGRNs from collections of gene expression data. In a discrete framework, gene expression canbe discretized depending on their status (under/over-expressed or normal) and truth tablesprovide the regulation structure [9]. In the continuous case, regression methods, including thepopular Lasso [33] and its derivatives, have provided powerful results [18, 25].A deregulated gene then corresponds to a gene whose expression does not correspond tothe expression level expected from its regulators expression. It is diﬀerent from the notion ofdiﬀerential expression since a loss of regulation between a target gene and one of its regulat-ing TFs implies a loss of correlation between them but not necessarily diﬀerential expression.Conversely, a TF can be diﬀerentially expressed and one of its targets not, precisely because itis deregulated.To discover deregulated genes, a ﬁrst possibility is to infer one network per condition andto compare them. Statistical diﬃculties due to the noisy nature of transcriptomic data and thelarge number of features compared to the sample size can be taken into account by inferring thenetworks jointly and penalizing the diﬀerences between them [4]. A second possible approach isto assess the adequacy of gene expression in tumoral cells to a reference GRN, in order to exhibitthe most striking discrepancies, i.e. the regulations which are not fulﬁlled by the data [21, 23].Such methods however focus on checking the validity of the network rather than highlightinggenes with an abnormal behavior. Finally, analyses may be conducted at the pathway levelrather than the gene level [11, 14]. They are then not network-wide in the sense that each genehas a deregulation score by pathway it belongs to and pathways are treated independently.Moreover, as the pathways are extracted from curated databases, the regulations taken intoaccount are not tissue-speciﬁc.Here, we propose a statistical deregulation model that uses gene expression data to identifyderegulated TFs involved in speciﬁc subtypes of cancer. This paper is organized as follows: inSection 1, we present the 3-steps method we developed and our validation procedure. In Section2, we illustrate its interest on the TCGA bladder cancer data set. We show that it can be usedcomplementary to diﬀerential expression analysis to point to potential biomarkers of cancers.

Our approach for the identiﬁcation of deregulated transcription factors (TFs) involved in cancersis based on a 3-steps strategy that ( i ) creates a reference gene regulatory network (GRN),which represents regulations between groups of co-expressed TFs and target genes using areference data set (Step 1), ( ii ) computes a deregulation score for each target gene in eachtumor sample by comparing their behavior with the reference GRN (Step 2), ( iii ) identiﬁesthe most signiﬁcant TFs involved in the deregulation of the target genes in each sample fromspeciﬁc cancer subtypes (Step 3). These steps are presented in Figure 1 and described in detailin the next sections.2 dentiﬁcation of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel GRN inference

Step 1

Deregulation score

Step 2 T a r g e t g e n e s Tumor samples Subtypes T F s Step 3

Deregulated TFs

Figure 1: Workﬂow of the proposed 3-steps algorithm for identifying TFs involved in speciﬁccancer subtypes.

Step 1 of the algorithm consists in inferring a GRN that connects TFs to their downstreamtargets. Among the large number of existing methods, we choose hLICORN, available in the

CoRegNet

R-package [30]. This algorithm is based on a hybrid version of the LICORN model[17], in which groups of co-regulated TFs act together to regulate the expression of their targets(Figure 2). More precisely, LICORN uses heuristic techniques to identify co-activator and co-inhibitor sets from discretized gene expression matrices and locally associates each target geneto pairs of co-activators and co-inhibitors that signiﬁcantly explain its discretized expression.The hybrid variation of LICORN then ranks the local candidate networks according to howwell they predict the target gene expression, through a linear regression, and selects the GRNthat minimizes the prediction error. This selection step limits the eﬀects of overﬁtting, inducedby the model complexity, especially the large number of features (genes) as compared to thesample size [3]. In this work, we slightly enrich the LICORN model by creating a copy of eachTF in the target layer to allow regulations between TFs.TF TF TF TF TF TF TF g g TFsCo-regulatorsTarget genesFigure 2: Example of LICORN graph involving 7 TFs and 2 target genes. TFs are gathered intogroups of co-expressed genes that co-regulate (square for co-activators, circle for co-inhibitors)each target gene.To construct a speciﬁc GRN, note that one may prefer using another inference method [5]3 dentiﬁcation of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel or a pre-existing regulatory network, which can be loaded from the RegNetwork database [26].Here, we focus on hLICORN since the induced model is particularly suitable for the rest ofour analysis. In addition, it was shown to provide powerful results for cooperative regulationdetection, especially on cancer data set [17, 30].

Step 2 of the algorithm aims at identifying deregulated target genes by carefully comparingtheir expression across all tumor samples with the reference GRN inferred in Step 1. For thispurpose, we use the method described in [19], which assumes that all genes from a hLICORNmodel are allowed to be deregulated, i.e. not to respond to their regulators as expected.More precisely, according to the hLICORN model, each gene g is connected with a set of co-regulated TFs split into a group of co-activators A and co-inhibitors I . A binary deregulationvariable D g , assumed to be non-zero with probability Y , is then introduced to compare the truestatus S g (under/over-expressed or normal) of each target gene in each tumor sample with itsexpected value S ∗ g , resulting from a truth Table (see Figure 3 (b)) and the inferred GRN. Toavoid discretization of the data, the status of all genes are considered as hidden variables. Themodel is described in Figure 3 (a).Co-activator set A Co-inhibitor set I S A S I Collectivestatus S ∗ g Expected status S g True status X g expression of gene g D g Deregulation variable (a) Deregulation model.

ActivatorInhibitor collective statuscollective status - 0 +- e0e + +0 - 0 ++ - - - (b) Truth table.

Figure 3: (a) The deregulation model [19] used to compute a deregulation score for each targetgene in each sample: each gene g is associated to a hidden status S g (under, over-expressedor normal). Target genes are allowed to be deregulated, i.e. not follow their co-regulator rules(Truth table (b)). The binary variable D g indicates whether the corresponding target gene g is deregulated ( D g = 1) or not ( D g = 0). The deregulation score Y of gene g in sample j is then the probability, given the observation, that D g = 1 in sample j . (b) LICORN truthtable, which gives the expected status of a target gene according to the collective status of itsco-activators and co-inhibitors. Collective status are set by default to 0 except if and only if allof its elements share the same status. This table is derived from biological experiments [17].As the number of hidden variables grows exponentially with the number of genes, the like-4 dentiﬁcation of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel lihood of the model rapidly becomes intractable. The unknown parameters, including thederegulation score Y , are thus estimated using a dedicated EM-algorithm (see [19] for moredetails). Note that the deregulation score Y does not capture information about diﬀerentiallyexpressed genes but genes whose expression does not correspond to the level expected from itsregulator expression. Step 3 consists in identifying TFs that cause deregulations of target genes. Our approach isbased on linear regression models, in which we try to explain the deregulation score of all targetgenes in one sample (Step 2) using their co-regulator TFs as explanatory variables (Step 1).Assume that we have q TFs and p target genes. Denote by Y ij the deregulation score of targetgene i (1 ≤ i ≤ p ) in sample j (1 ≤ j ≤ n ) and G := ( G i(cid:96) ) ≤ i ≤ p, ≤ (cid:96) ≤ q the GRN adjacencymatrix, whose non-zero elements encode the structure (edges) of the graph. We then cast ourmodel as follows: ∀ j ∈ (cid:74) , n (cid:75) , ∀ i ∈ (cid:74) , p (cid:75) , Y ij = G i(cid:96) · B (cid:96)j + ε ij , (1)or, in a matrix form, Y = G · B + ε , where each element B (cid:96)j of matrix B , to estimate, measuresthe deregulation importance of TF (cid:96) in sample j and ε stands for the presence of noise.Solving the B -estimation problem (1) can be viewed as a classical multi-task linear learningproblem, in which the number of observations is the number of target genes p , the numberof linear tasks is n and the number of variables q . To estimate B , we use a constrained leastsquares estimation procedure. As we only expect to ﬁnd TFs positively causing the deregulationof their targets in each sample, we consider the induced constrained optimization problem: ∀ j ∈ (cid:74) , n (cid:75) , ˆ B · j := argmin β ∈ R q (cid:107) Y · j − Gβ (cid:107) , (2) s.t ∀ (cid:96) ∈ (cid:74) , q (cid:75) , ≤ β (cid:96) ≤ (cid:107) . (cid:107) stands for the euclidian norm. The closer ˆ B (cid:96)j is to 1, the more important the roleof TF (cid:96) in the deregulation of its targets in sample j . To solve Eq. (2), we use the limSolve R-package.

Gene expression is commonly aﬀected by copy number alterations (CNA) [1]. Step 2 of ourprocedure is particularly sensitive to CNA, associating high deregulation scores to ampliﬁedor deleted target genes [19]. Indeed, the number of copies of a gene can strongly inﬂuenceits expression, independently from its regulators expression, making some regulations wronglyderegulated.To remove CNA eﬀects on gene expression and improve the rest of our analysis, we preprocesstarget genes expression data beforehand as proposed in [15]. Gene expression is considered aslinearly modiﬁed by CNA through the linear regression model: X ij = α + α CNA ij + ε ij , (3)where X ij is the expression of gene j in sample i and CNA ij its associated copy number. Letˆ α and ˆ α be the estimated solutions of Eq. (3), the corrected expression is then given by:˜ X ij = X ij − ˆ α − ˆ α CNA ij . dentiﬁcation of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel We apply our method on bladder cancer data, produced in the framework of the Cancer GenomeAtlas (TCGA) project and available at the Genomic Data Commons Data Portal ( https://portal.gdc.cancer.gov/ ). These data include a set of 401 bladder cancer samples withgene expression and copy number for a total number of 15,430 genes, split into 2,020 TFs and13,410 targets. Gene expression data were produced using RNA-sequencing on bladder cancertissues. Preprocessing is done by log-transformation and quantile-normalization of the arrays.Missing values are estimated using nearest neighbor averaging [35]. TCGA samples are analyzedin batches and signiﬁcant batch eﬀects are observed based on a one-way analysis of variance inmost data modes. We apply Combat [22] to adjust for these eﬀects. Genes are ﬁnally ﬁlteredbased on their variability: among them, we only keep the 75% most varying genes.Based on RNA-seq data analysis from the TCGA data portal, samples are split into ﬁvesubtypes: basal-squamous (BaSq), luminal (Lum), luminal-inﬁltrated (LumI), luminal-papillary(LumP) and neuronal (NE) with diﬀerent characteristics [10] (Table 1).Subtypes BaSq Lum LumI LumP NESamples 131 44 74 134 18Table 1: Molecular subtypes distribution of the 401 bladder cancer samples [10] .

GRN network.

To validate our method, we have to provide a tissue-speciﬁc reference GRN(Step 1), which is computed given a ﬁrst set of reference samples. In many cancers, the purenormal tissue of origin is not available. Here, we work with the ﬁve diﬀerent subtypes of theTCGA data set presented in Section 3.1. Using samples from one subtype as test cases andthe rest as reference, we infer ﬁve diﬀerent GRNs. Each of them reﬂect averaged relationshipsbetween genes for patients who are not part of one speciﬁc subtype. Due to the very-highheterogeneity of cancers, especially of bladder cancers [24, 34], we think that our method willstill point to relevant deregulations of speciﬁc subtypes.After calibrating the internal parameters of the hLICORN agorithm, the GRNs we inferare made of an averaged total number of 28 ,

246 edges connecting 586 TFs to 3 ,

432 of theirtargets. These networks are relatively sparse, each of the target genes being associated with anaveraged number of 8 TFs.

Deregulation scores.

We then run ﬁve times the EM procedure (Step 2) on the ﬁve subsetsof the gene expression data matrix to compute a deregulation score of each target gene in eachsample of each subtype. From now on, all samples are treated individually, the results reﬂectinghow genes behaved in each sample of one subtype in comparison to reference samples from allother subtypes.To check the eﬀect of the copy number correction we apply at the beginning of our procedure,we compare the distribution of the deregulation scores across copy number states. To this aim,we use the TCGA CNA thresholded data set, which associates to each gene-sample pair a6 dentiﬁcation of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel copy number state of “0” for the diploid state (two copies), “1” for a copy number gain, “-1”for a copy number loss, “2” for an ampliﬁcation and “-2” for a deletion. We then test forsigniﬁcant diﬀerences between the diploid state and the altered states (-2,-1,1,2) using Studenttests. Results in terms of p-values, which are corrected for multiple hypothesis testing using theFDR [2], are presented in Table 2. With corrected p-values ranging from 0.10 to 1, deregulationscores are no longer associated with CNA.

SubtypesBaSq Lum LumI LumP NE-2 -1 1 2 -2 -1 1 2 -2 -1 1 2 -2 -1 1 2 -2 -1 1 21 0.81 0.28 1 1 1 1 1 1 0.25 0.10 0.60 1 1 1 1 1 1 1 1

Table 2: Corrected p-values for Student tests when comparing the distribution of the deregu-lation scores between the diploid state (0) and each altered state (-2,-1,1,2) for each subtype.

Deregulated TFs.

We ﬁnally apply Step 3 of our method to identify TFs involved in thederegulation scores of the target genes, that is having a non-zero coeﬃcient in ˆ B , as given inEq. (1). We then rank the TFs according to their number of non-zero coeﬃcients across allsamples belonging to each speciﬁc subtype. Results are presented in Table 3. SubtypesBaSq Lum LumI LumP NETF % ˆ B TF % ˆ B TF % ˆ B TF % ˆ B TF % ˆ B SPOCD1 92% ZNF268 91% TSHZ1 88% RARB 84% FAIM3 89%ZNF382 86% HES2 80% ZNF354B 88% RFX5 84% SMARCA2 83%RCOR2 86% TBX2 80% AR 85% CBFA2T3 83% RARB 78%ATM 83% PRDM8 75% HES2 82% TBX18 81% ZNF235 78%HABP4 83% TSHZ1 75% HTATIP2 81% TBX3 79% TBX2 72%IRX3 82% ZNF354C 73% MAFG 80% PTRF 79% STAT3 72%IFI16 79% RARB 70% ENO1 80% TBX2 70% HIF1A 72%TEAD2 79% KLF13 70% TBX2 74% PPARG 76% THRA 72%NOTCH4 79% SCML2 68% ZNF563 74% NCOR2 75% PIR 67%SNAI2 79% SNAI3 68% IRX3 72% ZFP2 75% FOSL1 67%

Table 3: List of the 10 most important TFs for explaining the deregulation scores of theirtargets and number of non-zero coeﬃcients in ˆ B (in %) across all samples from each subtype. Top TFs include biomarkers of bladder cancer.

Among TFs of Table 3, we retrievecharacteristic genes of bladder cancer subtypes. For instance, SNAI2, which is deregulatedacross 79% of the BaSq samples, is particularly well-known for its implication in EMT pathwaysfor cancer patients [7] and its capacity to discriminate between basal and luminal subgroups[28]. The presence of NOTCH4 in BaSq samples is particularly interesting as it is part of7 dentiﬁcation of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel the NOTCH pathway, whose inactivation tends to promote bladder cancer progression [27].Research works also focus on its implication on the basal subgroup [20]. Similarly, TBX2,involved in all three luminal subtypes is an indicator of luminal cancers [8]. We can ﬁnallyemphasize the presence of PPARG in LumP, whose high level of expression is used to describeluminal subtypes [6].

Deregulation is complementary to diﬀerential gene expression analysis

Diﬀeren-tial gene expression analysis consists in performing statistical analysis to discover quantitativechanges in terms of expression levels between groups. It is frequently used in cancer research toidentify genes with important changes between tumor and normal samples, called diﬀerentiallyexpressed genes (DEGs) [31].We perform diﬀerential gene expression analysis using the R -package limma [31] on all sam-ples from each subtype when comparing to samples from all other subtypes. We then verifywhether the identiﬁed DEGs are diﬀerent from the deregulated TFs derived from our method(Figure 4). To this aim, we use the following thresholds: a gene is called DEG for p-valuessmaller than 0.01 whereas it is deregulated for a subtype as soon as it is deregulated ( ˆ B (cid:54) = 0)for more than the 50% of the subtype samples. This threshold is purely arbitrary but is notcrucial, as the results remain almost the same with slight changes. As shown in Figure 4, exceptfor BaSq and LumP subtypes, more than the 70% of the identiﬁed deregulated TFs are notdiﬀerentially expressed, which means that our procedure does not only point to DEGs.879 48107BaSq 193 6517Lum 366 5723LumI 784 4591LumP 337 4612NEFigure 4: Venn Diagrams representing the number of DEGs (in pink), the number of deregulatedTFs identiﬁed by our method (in blue) and their intersection. Conclusion

With the aim of understanding the deregulation processes in tumoral cells, we develop a 3-stepsstrategy that measures the inﬂuence of TFs in the deregulation of genes in tumor samples. A listof TFs characterizing given subtypes can then be established. Even if a biological experimentalvalidation should be done in future work, it seems that it can be used complementary todiﬀerential gene expression analysis to point to potential biomarkers of cancers.An open question, which has also to be tackled, is to determine in which extend the in-formation carried by mutations can explain the deregulations. Mutation data are particularlyhard to explore in this context due to various reasons : ﬁrst of all, mutations do not necessarilyaﬀect gene expression. Secondly, in cancers, besides the most signiﬁcant mutated genes, manysequencing projects have shown that genes are mutated in less than 5% of the samples.In this work, among the identiﬁed TFs of Table 3, we ﬁnd ATM, which is highly deregulated(83%) and mutated (15%) for BaSq samples. Mutations of ATM have been recently shown to beassociated with shorter survival in urothelial cancers [37]. As a preliminary result, we observethat 95% of the mutated BaSq samples corresponds to non-zero ˆ B coeﬃcients (Table 4). This8 dentiﬁcation of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel table is unfortunately still too unbalanced to positively conclude for a signiﬁcant associationbut supplementary works need to be done to go further.ˆ B (cid:54) = 0 ˆ B = 0Non mutated 91 21Mutated 18 1Table 4: Confusion matrix indicating the association between mutation and deregulation statusˆ B for TF ATM across all 131 basal samples. References [1] P. Aldred, E. Hollox, and J. Armour. Copy number polymorphism and expression level variationof the human alpha-defensin genes DEFA1 and DEFA3.

Hum Mol Genet , 14(14):2045–2052, 2005.[2] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerfulapproach to multiple testing.

J R Stat Soc Ser B Methodol , 57(1):289–300, 1995.[3] I. Chebil, R. Nicolle, G. Santini, C. Rouveirol, and M. Elati. Hybrid method inference for theconstruction of cooperative regulatory network in human.

IEEE Trans Nanobioscience , 13(2):97–103, 2014.[4] J. Chiquet, Y. Grandvalet, and C. Ambroise. Inferring multiple graphical structures.

Stat Comput ,21(4):537–553, 2011.[5] J. Chiquet, Y. Grandvalet, and C. Charbonnier. Sparsity in sign-coherent groups of variables viathe cooperative-lasso.

Ann Appl Stat , 6:795–830, 2012.[6] W. Choi et al. Identiﬁcation of basal and luminal subtypes of muscle-invasive bladder cancer withdiﬀerent sensitivities to frontline chemotherapy.

Cancer cell , 25(2):152–165, 2014.[7] C. Cobaleda, M. Prez-Caro, C. Vicente-Dueas, and I. Snchez-Garcia. Function of the zinc-ﬁngertranscription factor SNAI2 in cancer and development.

Annu Rev Genet , 41:41–61, 2007.[8] D. Dhawanet al. Comparative gene expression analyses identify luminal and basal subtypes ofcanine invasive urothelial carcinoma that mimic patterns in human invasive bladder cancer.

PLoSOne , 10(9):e0136688, 2015.[9] M. Elati and C. Rouveirol.

Unsepervised learning for Gene Regulation Network Inference fromExpression Data: a review . John Wiley and Sons, Inc., Hoboken, NJ, 2011.[10] A. G. Robertson et al. Comprehensive molecular characterization of muscle-invasive bladder can-cer.

Cell , 171(3):540–556, 2017.[11] A.L. Tarca et al. A novel signaling pathway impact analysis.

Bioinformatics , 25(1):75–82, 2009.[12] B. Kaczkowski et al. Transcriptome analysis of recurrently deregulated genes across multiplecancers identiﬁes new pan-cancer biomarkers.

Cancer Research , 76(2):216–226, 2016.[13] B.D. Lehman et al. Identiﬁcation of human triple-negative breast cancer subtypes and preclinicalmodels for selection of targeted therapies.

J Clin Invest , 121(7):2750–2767, 2011.[14] C.J. Vaske et al. Inference of patient-speciﬁc pathway activities from multi-dimensional cancergenomics data using paradigm.

Bioinformatics , 26:i237–i245, 2010.[15] E.I. Delatola et al. Segcorr: a statistical procedure for the detection of genomic regions of correlatedexpression.

BMC Bioinformatics , 18(1):333, 2017.[16] J.S. Damrauer et al. Intrinsic subtypes of high-grade bladder cancer reﬂect the hallmarks of breastcancer biology.

Proc Natl Acad Sci USA , 111(8):3110–3115, 2014. dentiﬁcation of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel [17] M. Elati et al. Licorn: learning cooperative regulation networks from gene expression data. Bioin-formatics , 23(18):2407–14, 2007.[18] M. Vignes et al. Gene regulatory network reconstruction using bayesian networks, the dantzigselector, the lasso and their meta-analysis.

PloS one , 6(12):e29165, 2011.[19] T. Picchetti et al. A model for gene deregulation detection using expression data.

BMC SystemsBiology , 9:S6, 2015.[20] A. Greife et al. Canonical notch signalling is inactive in urothelial carcinoma.

BMC Cancer ,14:628, 2014.[21] C. Guziolowski, A. Bourde, F. Moreews, and A. Siegel. Bioquali cytoscape plugin: analysing theglobal consistency of regulatory networks.

BMC Genomics , 10(1):244, 2009.[22] W.E. Johnson, C. Li, and A. Rabinovic. Adjusting batch eﬀects in microarray expression datausing empirical bayes methods.

Biostatistics , 8:118–127, 2007.[23] G. Karlebach and R. Shamir. Constructing logical models of gene regulatory networks by inte-grating transcription factor-DNA interactions with expression data: An entropy-based approach.

J Comput. Biol , 19(1):30–41, 2012.[24] M.A. Knowles and C.D. Hurst. Molecular biology of bladder cancer: new insights into pathogenesisand clinical diversity.

Nat Rev Cancer , 15:25–41, 2014.[25] B. Liu, A. de la Fuente, and I. Hoeschele. Gene network inference via structural equation modelingin genetical genomics experiments.

Genetics , 178(3):1763–1776, 2008.[26] Z. Liu, W. Canglin, H. Miao, and H. Wu. Regnetwork: an integrating database of transcriptionaland post-transcriptional regulatory networks in human and mouse.

Database (Oxford) , 2015, 2015.[27] A. Maraver et al. Notch pathway inactivation promotes bladder cancer progression.

J Clin Invest ,125(2):824–830, 2015.[28] D.S. Mistry, Y. Chen, Y. Wang, and G.L. Sen. SNAI2 controls the undiﬀerentiated state of humanepidermal progenitor cells.

Stem Cells , 32(12):3209–3218, 2014.[29] D.W. Nebert. Transcription factors and cancer: an overview.

Toxicology , 181-182:131–141, 2002.[30] R. Nicolle, F. Radvanyi, and M. Elati. Coregnet: reconstruction and integrated analysis of co-regulatory networks.

Bioinformatics , 31(18):3066–3068, 2015.[31] M. E. Ritchie et al. limma powers diﬀerential expression analyses for RNA-sequencing and mi-croarray studies.

Nucleid Acids Res , 43(7):e47, 2015.[32] The Cancer Genome Atlas. Comprehensive molecular characterization of urothelial bladder car-cinoma.

Nature , 507(7492):315–322, 2014.[33] R. Tibshirani. Regression shrinkage and selection via the lasso.

J R Statist Soc Ser B , 58(1):267–288, 1996.[34] F.S. Togneri et al. Genomic complexity of urothelial bladder cancer revealed in urinary cfDNA.

Eur J Human Genet , 24:1167–1174, 2016.[35] O. Troyanskaya et al. Missing value estimation methods for DNA microarrays.

Bioinformatics ,17(6):520–525, 2001.[36] J.E. Yeh, P.A. Toniolo, and D.A. Frank. Targeting transcription factors: promising new strategiesfor cancer therapy.

Current Opinion in Oncology , 25(6):652–658, 2013.[37] M. Yin et al. ATM/RB1 mutations predict shorter overall survival in urothelial cancer.

Oncotarget ,9(24):16891–16898, 2018.,9(24):16891–16898, 2018.