Identification of deregulated transcription factors involved in subtypes of cancers
Magali Champion, Julien Chiquet, Pierre Neuvial, Mohamed Elati, François Radvanyi, Etienne Birmelé
IIdentification of Deregulated Transcription FactorsInvolved in Specific Bladder Cancer Subtypes
Magali Champion , Julien Chiquet , Pierre Neuvial , Mohamed Elati ,Franois Radvanyi , and Etienne Birmel Laboratoire MAP5, UMR8145 CNRS, Universit Paris Descartes, Paris, France [email protected] UMR MIA-Paris, AgroParisTech, INRA, Universit Paris-Saclay, Paris, France Institut de Mathmatiques de Toulouse, UMR 5219, Universit de Toulouse, CNRS, France Universit de Lille, INSERM U908, Cell Plasticity and Cancer, Lille, France Institut Curie, PSL Research University, CNRS, UMR144, Paris, France
Abstract
Comparison between tumoral and healthy cells may reveal abnormal regulation behav-iors between a transcription factor and the genes it regulates, without exhibiting differentialexpression of the former genes. We propose a methodology for the identification of tran-scription factors involved in the deregulation of genes in tumoral cells. This strategy isbased on the inference of a reference gene regulatory network that connects transcriptionfactors to their downstream targets using gene expression data. Gene expression levels intumor samples are then carefully compared to this reference network to detect deregulatedtarget genes. A linear model is finally used to measure the ability of each transcriptionfactor to explain these deregulations. We assess the performance of our method by nu-merical experiments on a public bladder cancer data set derived from the Cancer GenomeAtlas project. We identify genes known for their implication in the development of specificbladder cancer subtypes as well as new potential biomarkers.
Today, after decades of intensive research, cancer is still one of the most deadly diseases world-wide, killing millions of people every year. Cancer is mainly caused by somatic mutations thataffect critical genes and pathways. These mutations are mostly triggered by environmentalfactors (e.g. obesity, smoking, alcohol, lifestyle,...) often promoted by certain genetic configu-rations. In the last two decades, large-scale projects, such as the Cancer Genome Atlas project(TCGA), which has produced massive amounts of multi-omics data, have launched to improveour understanding of cancers [32]. In this context, developing statistical algorithms able tointerpret these large data sets and to identify genes that are the origin of diseases and theircausal pathways still remains an important challenge.Genes are commonly affected by genomic changes in the pathogenesis of human cancer.Cancer is moreover a heterogeneous disease, with affected gene sets that may be highly differentdepending on subtypes, and thus requires different treatments of patients. Specific analyses ofsubtypes have for example revealed significant differences between breast cancer subgroups [13]but also pancancer similarities between breast and bladder cancer subgroups [16].Using transcriptional data allows to look beyond DNA, that is to study abnormalities interms of gene expression. As a common approach, differential expression analysis, for whichstatistical procedures have been intensively explored, can be performed and altered genes arethen differentially expressed genes [12]. This points to relevant genes but does not take intoaccount the regulations (activation and inhibition) between genes. a r X i v : . [ q - b i o . M N ] A p r dentification of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel The approach we consider consists in taking into account the regulation structure betweengenes. We particularly focus on transcription factors (TFs), for their major role played in theregulation of gene expression, which make them an attractive target for cancer therapy [29, 36].Regulation processes between TFs and their targets are usually represented by Gene RegulatoryNetworks (GRNs). In the last few years, many different methods have been proposed to inferGRNs from collections of gene expression data. In a discrete framework, gene expression canbe discretized depending on their status (under/over-expressed or normal) and truth tablesprovide the regulation structure [9]. In the continuous case, regression methods, including thepopular Lasso [33] and its derivatives, have provided powerful results [18, 25].A deregulated gene then corresponds to a gene whose expression does not correspond tothe expression level expected from its regulators expression. It is different from the notion ofdifferential expression since a loss of regulation between a target gene and one of its regulat-ing TFs implies a loss of correlation between them but not necessarily differential expression.Conversely, a TF can be differentially expressed and one of its targets not, precisely because itis deregulated.To discover deregulated genes, a first possibility is to infer one network per condition andto compare them. Statistical difficulties due to the noisy nature of transcriptomic data and thelarge number of features compared to the sample size can be taken into account by inferring thenetworks jointly and penalizing the differences between them [4]. A second possible approach isto assess the adequacy of gene expression in tumoral cells to a reference GRN, in order to exhibitthe most striking discrepancies, i.e. the regulations which are not fulfilled by the data [21, 23].Such methods however focus on checking the validity of the network rather than highlightinggenes with an abnormal behavior. Finally, analyses may be conducted at the pathway levelrather than the gene level [11, 14]. They are then not network-wide in the sense that each genehas a deregulation score by pathway it belongs to and pathways are treated independently.Moreover, as the pathways are extracted from curated databases, the regulations taken intoaccount are not tissue-specific.Here, we propose a statistical deregulation model that uses gene expression data to identifyderegulated TFs involved in specific subtypes of cancer. This paper is organized as follows: inSection 1, we present the 3-steps method we developed and our validation procedure. In Section2, we illustrate its interest on the TCGA bladder cancer data set. We show that it can be usedcomplementary to differential expression analysis to point to potential biomarkers of cancers.
Our approach for the identification of deregulated transcription factors (TFs) involved in cancersis based on a 3-steps strategy that ( i ) creates a reference gene regulatory network (GRN),which represents regulations between groups of co-expressed TFs and target genes using areference data set (Step 1), ( ii ) computes a deregulation score for each target gene in eachtumor sample by comparing their behavior with the reference GRN (Step 2), ( iii ) identifiesthe most significant TFs involved in the deregulation of the target genes in each sample fromspecific cancer subtypes (Step 3). These steps are presented in Figure 1 and described in detailin the next sections.2 dentification of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel GRN inference
Step 1
Deregulation score
Step 2 T a r g e t g e n e s Tumor samples Subtypes T F s Step 3
Deregulated TFs
Figure 1: Workflow of the proposed 3-steps algorithm for identifying TFs involved in specificcancer subtypes.
Step 1 of the algorithm consists in inferring a GRN that connects TFs to their downstreamtargets. Among the large number of existing methods, we choose hLICORN, available in the
CoRegNet
R-package [30]. This algorithm is based on a hybrid version of the LICORN model[17], in which groups of co-regulated TFs act together to regulate the expression of their targets(Figure 2). More precisely, LICORN uses heuristic techniques to identify co-activator and co-inhibitor sets from discretized gene expression matrices and locally associates each target geneto pairs of co-activators and co-inhibitors that significantly explain its discretized expression.The hybrid variation of LICORN then ranks the local candidate networks according to howwell they predict the target gene expression, through a linear regression, and selects the GRNthat minimizes the prediction error. This selection step limits the effects of overfitting, inducedby the model complexity, especially the large number of features (genes) as compared to thesample size [3]. In this work, we slightly enrich the LICORN model by creating a copy of eachTF in the target layer to allow regulations between TFs.TF TF TF TF TF TF TF g g TFsCo-regulatorsTarget genesFigure 2: Example of LICORN graph involving 7 TFs and 2 target genes. TFs are gathered intogroups of co-expressed genes that co-regulate (square for co-activators, circle for co-inhibitors)each target gene.To construct a specific GRN, note that one may prefer using another inference method [5]3 dentification of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel or a pre-existing regulatory network, which can be loaded from the RegNetwork database [26].Here, we focus on hLICORN since the induced model is particularly suitable for the rest ofour analysis. In addition, it was shown to provide powerful results for cooperative regulationdetection, especially on cancer data set [17, 30].
Step 2 of the algorithm aims at identifying deregulated target genes by carefully comparingtheir expression across all tumor samples with the reference GRN inferred in Step 1. For thispurpose, we use the method described in [19], which assumes that all genes from a hLICORNmodel are allowed to be deregulated, i.e. not to respond to their regulators as expected.More precisely, according to the hLICORN model, each gene g is connected with a set of co-regulated TFs split into a group of co-activators A and co-inhibitors I . A binary deregulationvariable D g , assumed to be non-zero with probability Y , is then introduced to compare the truestatus S g (under/over-expressed or normal) of each target gene in each tumor sample with itsexpected value S ∗ g , resulting from a truth Table (see Figure 3 (b)) and the inferred GRN. Toavoid discretization of the data, the status of all genes are considered as hidden variables. Themodel is described in Figure 3 (a).Co-activator set A Co-inhibitor set I S A S I Collectivestatus S ∗ g Expected status S g True status X g expression of gene g D g Deregulation variable (a) Deregulation model.
ActivatorInhibitor collective statuscollective status - 0 +- e0e + +0 - 0 ++ - - - (b) Truth table.
Figure 3: (a) The deregulation model [19] used to compute a deregulation score for each targetgene in each sample: each gene g is associated to a hidden status S g (under, over-expressedor normal). Target genes are allowed to be deregulated, i.e. not follow their co-regulator rules(Truth table (b)). The binary variable D g indicates whether the corresponding target gene g is deregulated ( D g = 1) or not ( D g = 0). The deregulation score Y of gene g in sample j is then the probability, given the observation, that D g = 1 in sample j . (b) LICORN truthtable, which gives the expected status of a target gene according to the collective status of itsco-activators and co-inhibitors. Collective status are set by default to 0 except if and only if allof its elements share the same status. This table is derived from biological experiments [17].As the number of hidden variables grows exponentially with the number of genes, the like-4 dentification of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel lihood of the model rapidly becomes intractable. The unknown parameters, including thederegulation score Y , are thus estimated using a dedicated EM-algorithm (see [19] for moredetails). Note that the deregulation score Y does not capture information about differentiallyexpressed genes but genes whose expression does not correspond to the level expected from itsregulator expression. Step 3 consists in identifying TFs that cause deregulations of target genes. Our approach isbased on linear regression models, in which we try to explain the deregulation score of all targetgenes in one sample (Step 2) using their co-regulator TFs as explanatory variables (Step 1).Assume that we have q TFs and p target genes. Denote by Y ij the deregulation score of targetgene i (1 ≤ i ≤ p ) in sample j (1 ≤ j ≤ n ) and G := ( G i(cid:96) ) ≤ i ≤ p, ≤ (cid:96) ≤ q the GRN adjacencymatrix, whose non-zero elements encode the structure (edges) of the graph. We then cast ourmodel as follows: ∀ j ∈ (cid:74) , n (cid:75) , ∀ i ∈ (cid:74) , p (cid:75) , Y ij = G i(cid:96) · B (cid:96)j + ε ij , (1)or, in a matrix form, Y = G · B + ε , where each element B (cid:96)j of matrix B , to estimate, measuresthe deregulation importance of TF (cid:96) in sample j and ε stands for the presence of noise.Solving the B -estimation problem (1) can be viewed as a classical multi-task linear learningproblem, in which the number of observations is the number of target genes p , the numberof linear tasks is n and the number of variables q . To estimate B , we use a constrained leastsquares estimation procedure. As we only expect to find TFs positively causing the deregulationof their targets in each sample, we consider the induced constrained optimization problem: ∀ j ∈ (cid:74) , n (cid:75) , ˆ B · j := argmin β ∈ R q (cid:107) Y · j − Gβ (cid:107) , (2) s.t ∀ (cid:96) ∈ (cid:74) , q (cid:75) , ≤ β (cid:96) ≤ (cid:107) . (cid:107) stands for the euclidian norm. The closer ˆ B (cid:96)j is to 1, the more important the roleof TF (cid:96) in the deregulation of its targets in sample j . To solve Eq. (2), we use the limSolve R-package.
Gene expression is commonly affected by copy number alterations (CNA) [1]. Step 2 of ourprocedure is particularly sensitive to CNA, associating high deregulation scores to amplifiedor deleted target genes [19]. Indeed, the number of copies of a gene can strongly influenceits expression, independently from its regulators expression, making some regulations wronglyderegulated.To remove CNA effects on gene expression and improve the rest of our analysis, we preprocesstarget genes expression data beforehand as proposed in [15]. Gene expression is considered aslinearly modified by CNA through the linear regression model: X ij = α + α CNA ij + ε ij , (3)where X ij is the expression of gene j in sample i and CNA ij its associated copy number. Letˆ α and ˆ α be the estimated solutions of Eq. (3), the corrected expression is then given by:˜ X ij = X ij − ˆ α − ˆ α CNA ij . dentification of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel We apply our method on bladder cancer data, produced in the framework of the Cancer GenomeAtlas (TCGA) project and available at the Genomic Data Commons Data Portal ( https://portal.gdc.cancer.gov/ ). These data include a set of 401 bladder cancer samples withgene expression and copy number for a total number of 15,430 genes, split into 2,020 TFs and13,410 targets. Gene expression data were produced using RNA-sequencing on bladder cancertissues. Preprocessing is done by log-transformation and quantile-normalization of the arrays.Missing values are estimated using nearest neighbor averaging [35]. TCGA samples are analyzedin batches and significant batch effects are observed based on a one-way analysis of variance inmost data modes. We apply Combat [22] to adjust for these effects. Genes are finally filteredbased on their variability: among them, we only keep the 75% most varying genes.Based on RNA-seq data analysis from the TCGA data portal, samples are split into fivesubtypes: basal-squamous (BaSq), luminal (Lum), luminal-infiltrated (LumI), luminal-papillary(LumP) and neuronal (NE) with different characteristics [10] (Table 1).Subtypes BaSq Lum LumI LumP NESamples 131 44 74 134 18Table 1: Molecular subtypes distribution of the 401 bladder cancer samples [10] .
GRN network.
To validate our method, we have to provide a tissue-specific reference GRN(Step 1), which is computed given a first set of reference samples. In many cancers, the purenormal tissue of origin is not available. Here, we work with the five different subtypes of theTCGA data set presented in Section 3.1. Using samples from one subtype as test cases andthe rest as reference, we infer five different GRNs. Each of them reflect averaged relationshipsbetween genes for patients who are not part of one specific subtype. Due to the very-highheterogeneity of cancers, especially of bladder cancers [24, 34], we think that our method willstill point to relevant deregulations of specific subtypes.After calibrating the internal parameters of the hLICORN agorithm, the GRNs we inferare made of an averaged total number of 28 ,
246 edges connecting 586 TFs to 3 ,
432 of theirtargets. These networks are relatively sparse, each of the target genes being associated with anaveraged number of 8 TFs.
Deregulation scores.
We then run five times the EM procedure (Step 2) on the five subsetsof the gene expression data matrix to compute a deregulation score of each target gene in eachsample of each subtype. From now on, all samples are treated individually, the results reflectinghow genes behaved in each sample of one subtype in comparison to reference samples from allother subtypes.To check the effect of the copy number correction we apply at the beginning of our procedure,we compare the distribution of the deregulation scores across copy number states. To this aim,we use the TCGA CNA thresholded data set, which associates to each gene-sample pair a6 dentification of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel copy number state of “0” for the diploid state (two copies), “1” for a copy number gain, “-1”for a copy number loss, “2” for an amplification and “-2” for a deletion. We then test forsignificant differences between the diploid state and the altered states (-2,-1,1,2) using Studenttests. Results in terms of p-values, which are corrected for multiple hypothesis testing using theFDR [2], are presented in Table 2. With corrected p-values ranging from 0.10 to 1, deregulationscores are no longer associated with CNA.
SubtypesBaSq Lum LumI LumP NE-2 -1 1 2 -2 -1 1 2 -2 -1 1 2 -2 -1 1 2 -2 -1 1 21 0.81 0.28 1 1 1 1 1 1 0.25 0.10 0.60 1 1 1 1 1 1 1 1
Table 2: Corrected p-values for Student tests when comparing the distribution of the deregu-lation scores between the diploid state (0) and each altered state (-2,-1,1,2) for each subtype.
Deregulated TFs.
We finally apply Step 3 of our method to identify TFs involved in thederegulation scores of the target genes, that is having a non-zero coefficient in ˆ B , as given inEq. (1). We then rank the TFs according to their number of non-zero coefficients across allsamples belonging to each specific subtype. Results are presented in Table 3. SubtypesBaSq Lum LumI LumP NETF % ˆ B TF % ˆ B TF % ˆ B TF % ˆ B TF % ˆ B SPOCD1 92% ZNF268 91% TSHZ1 88% RARB 84% FAIM3 89%ZNF382 86% HES2 80% ZNF354B 88% RFX5 84% SMARCA2 83%RCOR2 86% TBX2 80% AR 85% CBFA2T3 83% RARB 78%ATM 83% PRDM8 75% HES2 82% TBX18 81% ZNF235 78%HABP4 83% TSHZ1 75% HTATIP2 81% TBX3 79% TBX2 72%IRX3 82% ZNF354C 73% MAFG 80% PTRF 79% STAT3 72%IFI16 79% RARB 70% ENO1 80% TBX2 70% HIF1A 72%TEAD2 79% KLF13 70% TBX2 74% PPARG 76% THRA 72%NOTCH4 79% SCML2 68% ZNF563 74% NCOR2 75% PIR 67%SNAI2 79% SNAI3 68% IRX3 72% ZFP2 75% FOSL1 67%
Table 3: List of the 10 most important TFs for explaining the deregulation scores of theirtargets and number of non-zero coefficients in ˆ B (in %) across all samples from each subtype. Top TFs include biomarkers of bladder cancer.
Among TFs of Table 3, we retrievecharacteristic genes of bladder cancer subtypes. For instance, SNAI2, which is deregulatedacross 79% of the BaSq samples, is particularly well-known for its implication in EMT pathwaysfor cancer patients [7] and its capacity to discriminate between basal and luminal subgroups[28]. The presence of NOTCH4 in BaSq samples is particularly interesting as it is part of7 dentification of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel the NOTCH pathway, whose inactivation tends to promote bladder cancer progression [27].Research works also focus on its implication on the basal subgroup [20]. Similarly, TBX2,involved in all three luminal subtypes is an indicator of luminal cancers [8]. We can finallyemphasize the presence of PPARG in LumP, whose high level of expression is used to describeluminal subtypes [6].
Deregulation is complementary to differential gene expression analysis
Differen-tial gene expression analysis consists in performing statistical analysis to discover quantitativechanges in terms of expression levels between groups. It is frequently used in cancer research toidentify genes with important changes between tumor and normal samples, called differentiallyexpressed genes (DEGs) [31].We perform differential gene expression analysis using the R -package limma [31] on all sam-ples from each subtype when comparing to samples from all other subtypes. We then verifywhether the identified DEGs are different from the deregulated TFs derived from our method(Figure 4). To this aim, we use the following thresholds: a gene is called DEG for p-valuessmaller than 0.01 whereas it is deregulated for a subtype as soon as it is deregulated ( ˆ B (cid:54) = 0)for more than the 50% of the subtype samples. This threshold is purely arbitrary but is notcrucial, as the results remain almost the same with slight changes. As shown in Figure 4, exceptfor BaSq and LumP subtypes, more than the 70% of the identified deregulated TFs are notdifferentially expressed, which means that our procedure does not only point to DEGs.879 48107BaSq 193 6517Lum 366 5723LumI 784 4591LumP 337 4612NEFigure 4: Venn Diagrams representing the number of DEGs (in pink), the number of deregulatedTFs identified by our method (in blue) and their intersection. Conclusion
With the aim of understanding the deregulation processes in tumoral cells, we develop a 3-stepsstrategy that measures the influence of TFs in the deregulation of genes in tumor samples. A listof TFs characterizing given subtypes can then be established. Even if a biological experimentalvalidation should be done in future work, it seems that it can be used complementary todifferential gene expression analysis to point to potential biomarkers of cancers.An open question, which has also to be tackled, is to determine in which extend the in-formation carried by mutations can explain the deregulations. Mutation data are particularlyhard to explore in this context due to various reasons : first of all, mutations do not necessarilyaffect gene expression. Secondly, in cancers, besides the most significant mutated genes, manysequencing projects have shown that genes are mutated in less than 5% of the samples.In this work, among the identified TFs of Table 3, we find ATM, which is highly deregulated(83%) and mutated (15%) for BaSq samples. Mutations of ATM have been recently shown to beassociated with shorter survival in urothelial cancers [37]. As a preliminary result, we observethat 95% of the mutated BaSq samples corresponds to non-zero ˆ B coefficients (Table 4). This8 dentification of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel table is unfortunately still too unbalanced to positively conclude for a significant associationbut supplementary works need to be done to go further.ˆ B (cid:54) = 0 ˆ B = 0Non mutated 91 21Mutated 18 1Table 4: Confusion matrix indicating the association between mutation and deregulation statusˆ B for TF ATM across all 131 basal samples. References [1] P. Aldred, E. Hollox, and J. Armour. Copy number polymorphism and expression level variationof the human alpha-defensin genes DEFA1 and DEFA3.
Hum Mol Genet , 14(14):2045–2052, 2005.[2] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerfulapproach to multiple testing.
J R Stat Soc Ser B Methodol , 57(1):289–300, 1995.[3] I. Chebil, R. Nicolle, G. Santini, C. Rouveirol, and M. Elati. Hybrid method inference for theconstruction of cooperative regulatory network in human.
IEEE Trans Nanobioscience , 13(2):97–103, 2014.[4] J. Chiquet, Y. Grandvalet, and C. Ambroise. Inferring multiple graphical structures.
Stat Comput ,21(4):537–553, 2011.[5] J. Chiquet, Y. Grandvalet, and C. Charbonnier. Sparsity in sign-coherent groups of variables viathe cooperative-lasso.
Ann Appl Stat , 6:795–830, 2012.[6] W. Choi et al. Identification of basal and luminal subtypes of muscle-invasive bladder cancer withdifferent sensitivities to frontline chemotherapy.
Cancer cell , 25(2):152–165, 2014.[7] C. Cobaleda, M. Prez-Caro, C. Vicente-Dueas, and I. Snchez-Garcia. Function of the zinc-fingertranscription factor SNAI2 in cancer and development.
Annu Rev Genet , 41:41–61, 2007.[8] D. Dhawanet al. Comparative gene expression analyses identify luminal and basal subtypes ofcanine invasive urothelial carcinoma that mimic patterns in human invasive bladder cancer.
PLoSOne , 10(9):e0136688, 2015.[9] M. Elati and C. Rouveirol.
Unsepervised learning for Gene Regulation Network Inference fromExpression Data: a review . John Wiley and Sons, Inc., Hoboken, NJ, 2011.[10] A. G. Robertson et al. Comprehensive molecular characterization of muscle-invasive bladder can-cer.
Cell , 171(3):540–556, 2017.[11] A.L. Tarca et al. A novel signaling pathway impact analysis.
Bioinformatics , 25(1):75–82, 2009.[12] B. Kaczkowski et al. Transcriptome analysis of recurrently deregulated genes across multiplecancers identifies new pan-cancer biomarkers.
Cancer Research , 76(2):216–226, 2016.[13] B.D. Lehman et al. Identification of human triple-negative breast cancer subtypes and preclinicalmodels for selection of targeted therapies.
J Clin Invest , 121(7):2750–2767, 2011.[14] C.J. Vaske et al. Inference of patient-specific pathway activities from multi-dimensional cancergenomics data using paradigm.
Bioinformatics , 26:i237–i245, 2010.[15] E.I. Delatola et al. Segcorr: a statistical procedure for the detection of genomic regions of correlatedexpression.
BMC Bioinformatics , 18(1):333, 2017.[16] J.S. Damrauer et al. Intrinsic subtypes of high-grade bladder cancer reflect the hallmarks of breastcancer biology.
Proc Natl Acad Sci USA , 111(8):3110–3115, 2014. dentification of Deregulated TFs in Bladder Cancer Champion, Chiquet, Neuvial, Elati, Radvanyi, Birmel [17] M. Elati et al. Licorn: learning cooperative regulation networks from gene expression data. Bioin-formatics , 23(18):2407–14, 2007.[18] M. Vignes et al. Gene regulatory network reconstruction using bayesian networks, the dantzigselector, the lasso and their meta-analysis.
PloS one , 6(12):e29165, 2011.[19] T. Picchetti et al. A model for gene deregulation detection using expression data.
BMC SystemsBiology , 9:S6, 2015.[20] A. Greife et al. Canonical notch signalling is inactive in urothelial carcinoma.
BMC Cancer ,14:628, 2014.[21] C. Guziolowski, A. Bourde, F. Moreews, and A. Siegel. Bioquali cytoscape plugin: analysing theglobal consistency of regulatory networks.
BMC Genomics , 10(1):244, 2009.[22] W.E. Johnson, C. Li, and A. Rabinovic. Adjusting batch effects in microarray expression datausing empirical bayes methods.
Biostatistics , 8:118–127, 2007.[23] G. Karlebach and R. Shamir. Constructing logical models of gene regulatory networks by inte-grating transcription factor-DNA interactions with expression data: An entropy-based approach.
J Comput. Biol , 19(1):30–41, 2012.[24] M.A. Knowles and C.D. Hurst. Molecular biology of bladder cancer: new insights into pathogenesisand clinical diversity.
Nat Rev Cancer , 15:25–41, 2014.[25] B. Liu, A. de la Fuente, and I. Hoeschele. Gene network inference via structural equation modelingin genetical genomics experiments.
Genetics , 178(3):1763–1776, 2008.[26] Z. Liu, W. Canglin, H. Miao, and H. Wu. Regnetwork: an integrating database of transcriptionaland post-transcriptional regulatory networks in human and mouse.
Database (Oxford) , 2015, 2015.[27] A. Maraver et al. Notch pathway inactivation promotes bladder cancer progression.
J Clin Invest ,125(2):824–830, 2015.[28] D.S. Mistry, Y. Chen, Y. Wang, and G.L. Sen. SNAI2 controls the undifferentiated state of humanepidermal progenitor cells.
Stem Cells , 32(12):3209–3218, 2014.[29] D.W. Nebert. Transcription factors and cancer: an overview.
Toxicology , 181-182:131–141, 2002.[30] R. Nicolle, F. Radvanyi, and M. Elati. Coregnet: reconstruction and integrated analysis of co-regulatory networks.
Bioinformatics , 31(18):3066–3068, 2015.[31] M. E. Ritchie et al. limma powers differential expression analyses for RNA-sequencing and mi-croarray studies.
Nucleid Acids Res , 43(7):e47, 2015.[32] The Cancer Genome Atlas. Comprehensive molecular characterization of urothelial bladder car-cinoma.
Nature , 507(7492):315–322, 2014.[33] R. Tibshirani. Regression shrinkage and selection via the lasso.
J R Statist Soc Ser B , 58(1):267–288, 1996.[34] F.S. Togneri et al. Genomic complexity of urothelial bladder cancer revealed in urinary cfDNA.
Eur J Human Genet , 24:1167–1174, 2016.[35] O. Troyanskaya et al. Missing value estimation methods for DNA microarrays.
Bioinformatics ,17(6):520–525, 2001.[36] J.E. Yeh, P.A. Toniolo, and D.A. Frank. Targeting transcription factors: promising new strategiesfor cancer therapy.
Current Opinion in Oncology , 25(6):652–658, 2013.[37] M. Yin et al. ATM/RB1 mutations predict shorter overall survival in urothelial cancer.
Oncotarget ,9(24):16891–16898, 2018.,9(24):16891–16898, 2018.