Controlling the Precision-Recall Tradeoff in Differential Dependency Network Analysis

Abstract

Graphical models have gained a lot of attention recently as a tool for learning and representing dependencies among variables in multivariate data. Often, domain scientists are looking specifically for differences among the dependency networks of different conditions or populations (e.g. differences between regulatory networks of different species, or differences between dependency networks of diseased versus healthy populations). The standard method for finding these differences is to learn the dependency networks for each condition independently and compare them. We show that this approach is prone to high false discovery rates (low precision) that can render the analysis useless. We then show that by imposing a bias towards learning similar dependency networks for each condition the false discovery rates can be reduced to acceptable levels, at the cost of finding a reduced number of differences. Algorithms developed in the transfer learning literature can be used to vary the strength of the imposed similarity bias and provide a natural mechanism to smoothly adjust this differential precision-recall tradeoff to cater to the requirements of the analysis conducted. We present real case studies (oncological and neurological) where domain experts use the proposed technique to extract useful differential networks that shed light on the biological processes involved in cancer and brain function.


Diane Oyen

University of New Mexico, Albuquerque, NM, USA

Alexandru Niculescu-Mizil

NEC Laboratories, Princeton, NJ, USA

Rachel Ostroff

SomaLogic Inc., Boulder, CO, USA

Alex Stewart

SomaLogic Inc., Boulder, CO, USA

Vincent P. Clark

University of New Mexico, Mind Research Network, Albuquerque, NM, USA


Network structure learning algorithms, such as Gaussian graphical models, enable scientists to visualize dependency structure in multivariate data. Recently, attention has been brought to the problem of identifying differences between the dependency networks of various conditions or populations. For example, in a neuroimaging study, we want to understand how regions of the brain share information before and after a person acquires a particular skill. The goal is to identify the regions of the brain that are most influential after a skill has been learned so that direct current stimulation can be applied to those regions to accelerate a person’s learning process. In another example we analyze how the dependency structure of plasma proteins changes between healthy patients and patients that have cancer, with the goal of understanding the cancer biology and identifying better diagnostics.

Tackling these problems, we found that traditional methods for differential dependency network analysis, based on learning the dependency network for each condition independently and then comparing them, tend to produce a large number of spurious differences. This hampers the analysis and prevents drawing any reliable conclusions, significantly limiting the usefulness of the differential analysis. We also found that there is a need for an intuitive mechanism to control the quality of the learned differences, and trade off having a small number of spurious differences (high differential precision) with identifying a large number of differences (high differential recall).

In this paper, we propose a novel use of transfer learning to control the precision-recall tradeoff in differential network analysis, and show that this approach dramatically improves the quality of the learned differences. The key idea is to learn the dependency networks for the different conditions jointly, imposing a bias that the learned networks be similar. The more heavily this bias is enforced, the fewer differences will be learned between networks. Our thesis is that true differences that are well supported in the data tend to require a higher bias to be eliminated, while spurious differences are eliminated with a lower bias. Thus, by adjusting the strength of the similarity bias, spurious differences can be filtered out, decreasing the number of false discoveries and increasing the reliability of the analysis. Using this technique in two oncology studies we identify differential dependencies that give insight into cancer biology. In a neuroimaging study we find known visual processing pathways and discover interesting insights into regions that relate to visual object recognition.

Related Work.

The most common method for performing differential network analysis is to learn the networks independently for each condition and compare them. As a post-hoc analysis, a bootstrap procedure or permutation test can be applied to eliminate some of the false differences (e.g. [1]). We show that the transfer learning based approach performs significantly better and is far less computationally expensive than the bootstrap procedure. A related, but different problem is to learn (Bayesian) networks that discriminate between conditions (e.g. [2]). Discriminative methods introduce an arc between two variables when their interaction gives useful discriminatory information. This does not mean, and usually is not the case, that there is a statistical dependency between the two variables in either condition.

Transfer learning algorithms for graphical models have been extensively studied, and have been shown to produce networks that are more accurate than networks learned independently [3, 4, 5]. However, we are not aware of any existing research that investigates using transfer learning to obtain high quality differences between networks or to provide a mechanism to control the precision-recall tradeoff in differential network analysis. Danaher et al. [3] mention low recall of differences learned on synthetic data, but do not explore further. A recent paper [6] explores techniques for biasing learning such that the dependency networks differ in a limited number of variables. They show that if the differences match their assumption, the individual networks can be recovered more accurately.

We emphasize that our interest lies squarely in improving the quality of the differential dependency network analysis and providing an intuitive mechanism for trading off the precision and recall of the learned differences. Improving the quality of individual dependency networks, or devising new algorithms for dependency network or transfer learning, while interesting, are orthogonal to the scope of this paper.

In real applications data is often limited and noisy, and modeling assumptions usually do not hold. So we must assume that there will be errors when learning dependency networks from data. The different types of errors can be visualized in a confusion matrix, as in Figure 1a. Ideally, all edges would be identified as true positives (TP) or true negatives (TN), but this is usually not possible so there will also be some false positives (FP) and false negatives (FN). Using sparse network learning algorithms, such as graphical lasso [7], one can trade off between the two types of errors by adjusting the degree of sparsity of the learned network. This moves the boundary between the learned edges and non-edges shown as the horizontal line highlighted in green in Figure 1a. Assuming the algorithm is able to identify edges with better than random probability, the precision (TP / (TP + FP)) will increase with sparsity; meanwhile, the recall (TP / (TP + FN)) will decrease. Figure 1c shows that this is indeed the case. The figure plots the edge recall vs. the edge precision for networks learned from training sets of various sizes. The true network has 1000 nodes and 1000 edges. Each line is generated by changing the sparsity to obtain different edge precision-recall tradeoffs. For denser networks, recall is high, but precision is lower. As networks get sparser, the precision increases but the recall decreases. Thus the degree of sparsity controls the edge precision-recall tradeoff.

Figure 1: (a) Confusion matrix for learning edges of a single network, (b) Confusion matrix for learning differences between networks A and B, (c) Edge precision-recall graph, (d) Differential precision-recall graph. Panels (c) and (d) show one curve per training set size (100, 500 and 2000 samples), running from sparser to denser networks.

Cells of the differential confusion matrix in Figure 1b:
learned difference, actual difference: TP_A & TN_B, TN_A & TP_B, FN_A & FP_B, FP_A & FN_B
learned difference, no actual difference: FP_A & TN_B, TN_A & FP_B, TP_A & FN_B, FN_A & TP_B
no learned difference, actual difference: TP_A & FP_B, FP_A & TP_B, FN_A & TN_B, TN_A & FN_B
no learned difference, no actual difference: TP_A & TP_B, TN_A & TN_B, FN_A & FN_B, FP_A & FP_B

When conducting a differential network analysis (i.e. identifying differences between networks), one would like to control in a similar manner the tradeoff between the differential precision (the percentage of the inferred differences that are actually true) and the differential recall (the percentage of true differences that are recovered). Unfortunately, the traditional approach of learning the networks independently and comparing them provides no mechanism for controlling the differential precision-recall tradeoff. Also, adjusting the sparsity of the learned networks will not help with obtaining the high differential precision required in many applications (e.g. in biological applications, a differential precision above 80%, i.e. an FDR below 20%, is desirable). Figure 1d plots the differential precision versus the differential recall for pairs of networks learned from training sets of various sizes. The true networks have 1000 nodes and 1000 edges and about 80% of edges in common. Each line is obtained by varying the sparsity of the learned networks.
While adjusting the sparsity has some influence on the differential precision and recall, the differential precision never gets above 60%. The reason for this is that the sparsity controls a tradeoff between two types of mistakes: inferring an edge where no edge exists, and missing a true edge (FP and FN in Figure 1a). Both types of mistakes will lower the differential precision if the other network does not make the same mistake (see Figure 1b), so trading off between them will not improve the differential precision.

It is important to note that getting more data will not solve this problem. Increasing the training set size four fold from 500 to 2000 instances per task barely improves the differential precision. Also note that the learning algorithm does a very good job at recovering the individual networks (Figure 1c). Thus, short of learning almost perfect networks, which is usually impossible in practice, simply improving the performance of the individual network learning algorithms will not be enough to obtain the high differential precision required in many practical applications.
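The gap between edge-level and difference-level quality can be made concrete with a small worked example. The sketch below is illustrative only: the function names and the toy edge lists are assumptions introduced here, not part of the paper. It treats a network as a set of undirected edges and a difference as an edge present in exactly one of the two networks, then computes edge and differential precision and recall.

```python
def edge_set(edges):
    """Normalize undirected edges so (u, v) and (v, u) compare equal."""
    return {tuple(sorted(e)) for e in edges}


def precision_recall(predicted, actual):
    """Return (precision, recall) for two sets of items.

    Convention chosen for this sketch: a ratio with a zero denominator is 0.0.
    """
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall


def differential_precision_recall(learned_a, learned_b, true_a, true_b):
    """Score the learned differences (edges in exactly one learned network)
    against the true differences (edges in exactly one true network)."""
    learned_diff = edge_set(learned_a) ^ edge_set(learned_b)
    true_diff = edge_set(true_a) ^ edge_set(true_b)
    return precision_recall(learned_diff, true_diff)


# Hypothetical toy networks: the true networks share 3 of 4 edges.
true_a = [(1, 2), (2, 3), (3, 4), (4, 5)]
true_b = [(1, 2), (2, 3), (3, 4), (5, 6)]
# Each learned network makes only one or two mistakes.
learned_a = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6)]   # one false positive
learned_b = [(1, 2), (2, 3), (5, 6), (2, 6)]           # one FN, one FP

edge_pr_a = precision_recall(edge_set(learned_a), edge_set(true_a))
edge_pr_b = precision_recall(edge_set(learned_b), edge_set(true_b))
diff_pr = differential_precision_recall(learned_a, learned_b, true_a, true_b)

print("edge precision/recall, network A:", edge_pr_a)
print("edge precision/recall, network B:", edge_pr_b)
print("differential precision/recall:   ", diff_pr)
```

In this toy case each learned network has an edge precision of at least 0.75, yet only 2 of the 5 learned differences are real. Every unshared mistake in either network becomes a spurious difference, which is the effect described for Figure 1b.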

The differential confusion matrix in Figure 1b provides a clue about how to obtain the desired differential precision-recall tradeoff: the horizontal line can be controlled by imposing an inductive bias towards learning similar networks. A stronger bias for similar networks leads to fewer learned differences, while a weaker bias leads to more learned differences. Assuming that true differences can better overcome this bias, the differential precision will increase with a stronger bias, while the differential recall will probably decrease. Thus the strength of the bias for similar networks is controlling the differential precision-recall tradeoff much in the same way the sparsity bias controls the edge precision-recall tradeoff.

To impose an inductive bias towards learning similar networks, we borrow techniques developed in the transfer learning literature. In transfer learning or multi-task learning, inductive bias towards similar networks is used to obtain more accurate dependency structures when the true networks are similar [3, 4, 5]. The same algorithms can be employed to control the differential precision-recall tradeoff. We emphasize that, even though the learning algorithm is the same, the goal is different. In transfer learning the goal is to improve the accuracy of the individual networks while in this paper the goal is to improve the differential precision and control the differential precision-recall tradeoff. As we shall see, this different goal makes the technique more widely applicable and easier to use. For instance we do not need to assume that the true networks are similar.

In this paper, we use the joint graphical lasso algorithm from [3], which we very briefly describe below. However, other transfer learning algorithms can be used as well.
Assuming that the differential analysis is performed over $K$ conditions or populations, the algorithm infers a precision matrix $\hat{\Theta}_k$ for each condition by solving the following joint optimization problem:

$$
\arg\max_{\Theta_k \succ 0,\ \forall k} \; \sum_{k=1}^{K} \left[ \log\det \Theta_k - \operatorname{tr}\!\left( \hat{\Sigma}_k \Theta_k \right) \right] - \lambda_1 \sum_{i \neq j} \left[ (1 - \lambda_2) \sum_{k=1}^{K} \left| \theta_{k,ij} \right| + \lambda_2 \left( \sum_{k=1}^{K} \theta_{k,ij}^{2} \right)^{1/2} \right]
$$

where $\hat{\Sigma}_k$ is a generalized correlation matrix estimated from the data. If Gaussian covariance is used as a measure of correlation, then a multi-variate Gaussian distribution is fitted to the data of each condition. In this paper we measure correlation using Kendall’s Tau which expands the model class to transelliptical graphical models and leads to increased robustness to outliers and non-Gaussianity without a significant loss in performance [8]. After the $\hat{\Theta}_k$ are learned, a dependency network for each domain is obtained by connecting all the variables that have a non-zero entry in $\hat{\Theta}_k$, and differences between domains are obtained by comparing these networks.

The parameter $\lambda_1$ controls the sparsity bias for the learned networks, while $\lambda_2$ ($0 \le \lambda_2 \le 1$) is the similarity bias parameter and controls the strength of the bias towards learning similar networks. When $\lambda_2 = 0$, there is no bias towards similar networks and the method is equivalent to the traditional method of learning a network for each condition independently. As $\lambda_2$ approaches 1, the bias towards learning similar networks gets stronger, and only differences that are highly supported in the data survive. At $\lambda_2 = 1$ the learned structures will be identical and no differences will be recovered.

We first test the approach using synthetic networks and data. To create a synthetic data set, we generate a network with 1000 Gaussian variables and 1000 undirected edges. Then, the endpoint of each edge is re-wired with some probability to another node, creating a different network with edges in common with the first one. The goal is to correctly identify the differences between the two networks. For each network $k$ we generate a precision matrix $\Theta_k$ by independently sampling each entry that corresponds to an edge from a normal distribution, then re-scaling to ensure that $\Theta_k$ is positive definite. Training data is then drawn from $\mathcal{N}(0, \Theta_k^{-1})$ for each condition $k$. Results are averaged across 5 trials.

Figure 2: Differential precision-recall curves on synthetic data. (a) Various sample sizes, (b) Various sparsity levels, (c) Various rewiring probabilities, (d) Transfer vs. standard bootstrap.

The results of the experiments are depicted in Figure 2 in terms of differential precision-recall curves. In these plots, the differential precision-recall curves are obtained by varying the similarity bias parameter $\lambda_2$ between 0 and 1 to obtain different tradeoffs between differential precision and recall. For $\lambda_2 = 0$ we recover the performance of the traditional approach of learning the networks independently and comparing them.
This is always the rightmost point of each differential precision-recall curve, with the highest recall but the lowest precision.

Figure 2a shows the differential precision-recall curves for different training set sizes. For each training set size, the sparsity parameter $\lambda_1$ is set to the value that yields the highest differential precision in Figure 1d (i.e. the best differential precision obtained by learning the networks independently and comparing them). For all data sizes, increasing the similarity bias (increasing $\lambda_2$) improves the differential precision, showing that the proposed technique does indeed enable a more reliable differential analysis. Even with as little data as 100 instances per condition, we are able to obtain a differential precision above 0.8 which would be considered acceptable in many applications. The price to pay is a reduction in the number of differences recovered (reduction in differential recall).

Note that in our case there is no a priori “correct” value for the similarity bias parameter. Different values lead to different tradeoffs between the differential precision and recall, and the right operating point depends on the application and even on the analysis stage (similar to ROC analysis in standard classification). In contrast, in the usual use of transfer learning where the goal is to recover the individual networks, there is a “correct” value for this parameter that depends on how similar the true networks are (if the similarity bias is too strong performance will drop due to negative transfer, while if the similarity bias is too weak not enough useful information is transferred between tasks).

While the tradeoff between precision and recall for learned differences is mainly controlled by the $\lambda_2$ parameter, the sparsity parameter, $\lambda_1$, also has an effect on the differences learned because it controls which edges are present in each network. Figure 2b shows the differential precision-recall tradeoff for various values of $\lambda_1$.
Lower values of $\lambda_1$ (denser networks) increase the differential recall by identifying differences due to weaker dependencies that do not appear in the sparser graphs. However, the differential precision is lowered because more fake edges are learned in each graph which induce fake differences. At the extreme when the networks are too dense ($\lambda_1 = 0.\ldots$), there are ten times more spurious edges than real ones so the differential precision is low even when the similarity bias is high.

We also vary the rewiring probability when generating the true networks, varying the number of true differences. Figure 2c shows the differential precision-recall graphs for various values of the rewiring probability for $\lambda_1 = 0.\ldots$. As the fraction of true differences decreases (lower rewiring probability), it gets harder to identify them and performance drops. Increasing the similarity bias (increasing $\lambda_2$), however, still leads to higher differential precision. These results highlight another fundamental difference with the usual use of transfer learning. In transfer learning the true networks must be similar in order to get any benefit, while in differential analysis there is no such constraint. In fact, in differential analysis performance improves if the true networks are more dissimilar, as opposed to transfer learning where performance gets better with more similar true networks.

Finally, we compare to using bootstrap procedures to identify higher confidence differences. For the bootstrap procedure, we generate a bootstrap sample of the data, train independent graphical models on that data, then compare the learned networks. We repeat this 500 times. For each edge, $e$, we calculate the bootstrap frequency, $P_B(e)$, of it appearing in one network but not the other (i.e. $e$ is a difference). For a given cut-off, $c$, we consider all edges with $P_B(e) \ge c$ as inferred differences.
We then generate a differential precision-recall graph by varying $c$ from 0 to 1 so that at $c = 0$ any difference that appeared in any bootstrap is considered a difference, and at $c = 1$ only differences that appear in all 500 bootstraps are considered differences. Figure 2d shows the comparison between the transfer method and the bootstrap method for $\lambda_1 = 0.\ldots$. The transfer method dominates the precision-recall curve compared to bootstrapping, and, importantly, can reach a high precision regime that is unattainable via bootstrapping. Also, the transfer method requires about a factor of 50 less computation than the bootstrapping (the transfer method learns the networks about 10 times, once for each setting of $\lambda_2$, while bootstrapping learns the networks 500 times, once for each bootstrap). Moreover, the bootstrapping procedure becomes increasingly computationally expensive as higher levels of differential precision (or recall) are desired. Bootstrapping achieves the highest differential precision when $c = 1$ and there are ∼100 differences (∼30 of them false) that occur in all 500 bootstraps. Therefore, to push the precision ratio higher than 0.7, we would need to run more bootstraps until these false differences do not appear in at least one of them so they can be filtered out (while hoping that some true differences remain).
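The bootstrap baseline described above reduces to counting how often each edge shows up as a difference and thresholding that frequency. A minimal sketch follows, assuming the per-bootstrap difference sets have already been produced by learning the two networks independently on each resample; the function names and the toy input are hypothetical.

```python
from collections import Counter


def bootstrap_difference_frequency(diffs_per_bootstrap):
    """Compute P_B(e): the fraction of bootstrap runs in which edge e
    appears in one learned network but not the other.

    diffs_per_bootstrap: one collection of difference edges per bootstrap run.
    """
    n_runs = len(diffs_per_bootstrap)
    counts = Counter()
    for diffs in diffs_per_bootstrap:
        counts.update(set(diffs))  # count each edge at most once per run
    return {edge: count / n_runs for edge, count in counts.items()}


def differences_at_cutoff(frequency, c):
    """Keep the edges whose bootstrap frequency is at least the cut-off c."""
    return {edge for edge, p in frequency.items() if p >= c}


# Hypothetical difference sets from four bootstrap runs.
runs = [
    {(1, 2), (2, 3)},
    {(1, 2)},
    {(1, 2), (3, 4)},
    {(1, 2), (2, 3)},
]
freq = bootstrap_difference_frequency(runs)

for cutoff in (0.0, 0.5, 1.0):
    print(cutoff, sorted(differences_at_cutoff(freq, cutoff)))
```

Sweeping the cut-off from 0 to 1 traces the bootstrap curve. At a cut-off of 1 an edge survives only if it is a difference in every run, so removing a persistent false difference requires more runs, as noted in the text.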

In this section we present three real case studies where domain experts performed differential dependency network analysis in an ovarian cancer study, a pancreatic cancer study, and a neuroimaging study. A quantitative evaluation of the results is difficult in real usage scenarios because there is no ground truth to compare against so true differential precision and recall cannot be calculated. In these cases, the usual approach is to estimate the false discovery rate (FDR) through permutation tests. We take the following approach to estimating FDR: first pool the data from all populations (conditions) together, then randomly split the data into synthetic populations with the same number of instances as the original ones. There should not be any difference between the dependency structures of the newly generated synthetic populations, so any difference identified by the algorithm is a false discovery. The splitting procedure is performed multiple times and the average FDR is used as an estimate of the FDR of the algorithm on the original problem. This approach obviously is not perfect and can underestimate the FDR. When tested on synthetic data we found that the true FDR is indeed underestimated.

Figure 3: Oncology studies. Tradeoff between FDR and number of differences. (a) Ovarian Cancer, (b) Pancreatic Cancer.

When learning dependency differences from data, we have observed from real users that the best use of multi-network learning is as a part of an exploratory tool that allows interactive exploration of the various tradeoffs controlled by the sparsity and transfer parameters. To this end, we provide the domain experts with a Cytoscape [9] plugin that allows them to interactively explore different settings of $\lambda_1$ and $\lambda_2$ and quickly identify and visualize the differences in the dependency networks (see Appendix C). For the most part there is no “correct” setting of the parameters as different tradeoffs convey different information about the domain and the right operating point changes depending on the application and even on the analysis stage.
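The pool-and-split FDR estimate described above can be sketched as follows. The text does not spell out how the false discoveries on the synthetic populations are turned into a rate, so this sketch makes an explicit assumption: the estimate is the average number of differences found on the null splits divided by the number found on the original data, capped at 1. The names `estimate_fdr`, `find_differences` and `mean_shift_detector` are hypothetical, and `find_differences` stands in for any differential network learner.

```python
import random


def estimate_fdr(samples_a, samples_b, find_differences, n_splits=20, seed=0):
    """Estimate the FDR of `find_differences` by pooling and re-splitting.

    find_differences(a, b) must return a collection of inferred differences.
    Assumption of this sketch: FDR is estimated as
    mean(#differences on null splits) / #differences on the original data.
    """
    n_found = len(find_differences(samples_a, samples_b))
    if n_found == 0:
        return 0.0  # nothing was discovered, so nothing can be false

    rng = random.Random(seed)
    pooled = list(samples_a) + list(samples_b)
    null_counts = []
    for _ in range(n_splits):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        # Synthetic populations keep the original group sizes.
        null_a = shuffled[:len(samples_a)]
        null_b = shuffled[len(samples_a):]
        # Any difference found between two random splits is a false discovery.
        null_counts.append(len(find_differences(null_a, null_b)))

    mean_null = sum(null_counts) / n_splits
    return min(1.0, mean_null / n_found)


def mean_shift_detector(a, b, threshold=1.0):
    """Toy stand-in for a network learner: reports one 'difference'
    when the group means differ by more than the threshold."""
    gap = abs(sum(a) / len(a) - sum(b) / len(b))
    return {"mean"} if gap > threshold else set()


group_a = [0.0, 0.2, -0.1, 0.1, 0.0, -0.2, 0.3, 0.1]
group_b = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 4.9]
print(estimate_fdr(group_a, group_b, mean_shift_detector))
```

Because the learner is passed in as a function, the same routine can be rerun for each parameter setting to trace a curve of estimated 1 − FDR against the number of learned differences.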
Domain experts applied the technique described in this paper to perform differential dependency network analysis with the goal of identifying and analyzing cancer-induced changes in the dependency structure of plasma proteins. They analyze two cancers: ovarian and pancreatic. The ovarian cancer study uses data from a cohort of 247 patients (114 cases and 133 controls). The pancreatic cancer study uses a cohort of 469 patients (239 cases and 230 controls). Each patient had a blood sample taken prior to the diagnosis, and plasma concentrations of 858 proteins were measured using Somamer technology [10].

Analogous to the precision recall curves in Figure 2, Figure 3 shows the tradeoff between estimated differential precision (1-FDR) and the number of differences found (in log-scale) for the Ovarian and Pancreatic cancer studies. In both cases, the standard approach of learning the networks independently and comparing them ($\lambda_2 = 0$ at the right end of the curves) discovers between 1000 and 5000 differences, but the majority of them (almost 90% for Ovarian and more than 50% for Pancreatic) are estimated to be false. This level of false discovery renders the results of the differential analysis pretty much useless. However, as the bias for learning similar networks ($\lambda_2$) increases, the estimated FDR steadily decreases for all settings of the sparsity parameter, reaching levels below 10% which is very acceptable in biological applications. As in the synthetic data, fewer differences are found, but we have much higher confidence that the remaining differences are real.

Figure 4: Differential dependency network between Case and Control populations in the Ovarian oncology study. Each edge represents a dependency that is present in the cancer population but not in the control population.

Figure 4 shows the differential dependency network between the cancer and control populations in the Ovarian study for $\lambda_1 = 0.\ldots$ and $\lambda_2 = 0.\ldots$.
Every arc in this network represents a dependency that is present in the cancer population but not in the control population. For comparison, we also show the differential dependency network obtained using the standard technique ($\lambda_1 = 0.\ldots$, $\lambda_2 = 0$) in Figure 7 in Appendix A. To ensure that the differential network in Figure 4 reveals relevant biological information, we ran a standard enrichment analysis using DAVID [11] on the 24 proteins that appear in the figure, and asked collaborators with extensive expertise in cancer biology to analyze the results. The enrichment analysis shows that the following functional clusters are significantly enriched: endopeptidase inhibitor, inflammatory response, complement and coagulation, and extracellular matrix. This is consistent with what is known about ovarian cancer biology. The body’s reaction to ovarian cancer includes stimulation of both the adaptive (antibodies, cellular immunity) and innate (complement, inflammation) immune systems. In fact, the new “foreign” entity (ovarian cancer) that stimulates these responses also creates a new milieu in which tumor mutations are selected for when they help the cancer evade these immune responses [12]. Ovarian cancers (as well as many other cancers) also tend to induce a hyper-coagulable state, which involves coagulation and complement proteins. Endopeptidases play essential roles in homeostasis and signal transduction. Changes in the extracellular matrix are also key to the process as cancer cells escape the primary tumor and metastasize. A list of proteins associated with each of these processes is given in Appendix A. Many of the proteins in Figure 4 have been associated with cancer in general and with ovarian cancer in particular.
For instance CA125 is a well known and clinically used ovarian cancer marker; SLPI has been shown to be over-expressed in gastric, lung and ovarian cancers, accelerating [...]; IGFBP4 has been associated with a number of cancers, including ovarian [14].

Footnote: A functional cluster is enriched if there are significantly more members of that cluster present in the query list than would be expected from the random background distribution.

Figure 5 shows the differential dependency network between the cancer and control populations in the pancreatic cancer study for $\lambda_1 = 0.\ldots$ and $\lambda_2 = 0.\ldots$, with the node labels showing functional descriptions in lieu of the protein names. The differential dependency network shows proteins linked with the endocrine pancreas (e.g. endosomal insulin protease, insulin sensitivity regulator, protein regulating secretion of hormones by pancreas) and with the exocrine pancreas (e.g. HDL, LDL, IDL proteins, bile dependent digestive enzyme), as well as proteins associated with cancer and cancer related processes (e.g. tumor cell lysis receptor, mesothelial tumor differentiation antigen, down regulator of p53, endoplasmic reticulum chaperone). An enrichment analysis finds the following processes to be significantly enriched: extracellular matrix, lipid transport and cell adhesion. These processes are relevant to the pancreatic cancer biology. As mentioned above, changes in the extracellular matrix are involved in cancer cells escaping the primary tumor and metastasizing. Related to the extracellular matrix, cell adhesion is also a key process that regulates the migration (spreading) of cancer cells through the body and the destruction of the histological structure in cancerous tissues [15]. The lipid transport is related to the exocrine pancreatic function [16].

Footnote: Tumors require heavy vascularization to grow.

Footnote: Since the results could be of significant commercial interest in pancreatic cancer diagnosis, our collaborators requested that we do not reveal the actual proteins in this network until a patent is filed to protect the IP. By the time of publication this should not be an issue any more and the actual protein names will be revealed.

Functional magnetic resonance imaging (fMRI) measures the activity level in regions of the brain while a subject is in the scanner. The dependency network between regions of interest (ROIs) in the brain is called a functional brain network because it indicates which regions have activity patterns that appear to be exchanging information with each other. A common question is whether these dependencies are different in subjects under different conditions.

[Figure 6, panel (a) axes: Number of Learned Differences (log scale) and 1 − FDR; legend: Lambda_1 (sparsity) = 0.2, 0.3, 0.4, 0.5.]

Figure 6: Accelerated Learning study. (a) Tradeoff between FDR and number of differences. (b) Edges in Novice but not in Intermediate. (c) Edges in Intermediate but not in Novice.

Using data from the Accelerated Learning fMRI Study, we want to see how brain regions interact before and after a person learns a new skill [17]. In this study, subjects are asked to identify concealed objects in still images taken from a virtual reality environment. Initially, all subjects are considered Novice (i.e. not significantly better than random at identifying images with concealed objects). fMRI data are collected from these subjects while performing this identification task. Then, subjects are trained until they reach a level of Intermediate competency (midway between chance and perfect). At this point, fMRI data are again collected while performing the identification task. In total, we have data from 12 subjects at the Novice stage and 4 at the Intermediate stage. For each subject, there are 1056 samples of brain activity from 116 regions of interest (ROIs) in the brain. The ROIs are defined by the AAL atlas [18]. The goal is to identify dependencies among the ROIs that are different between the Intermediate and Novice stages. Looking at the networks (rather than the activity levels of individual ROIs) shows us which ROIs are most critical for performing a cognitive task [17].

Figure 6a shows the tradeoff between the estimated differential precision (1 − FDR) and the number of differences found (in log scale) for different values of λ_1. As before, λ_2 is varied to obtain each curve. Using the standard approach (λ_2 = 0, lower right end of the curves), one identifies from 300 to 1000 differences between novice and intermediate functional brain networks, but almost 80% of them are estimated to be false.
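The quantities plotted in Figure 6 can be made concrete with a short sketch. It assumes each learned network is available as a list of undirected edges between ROI labels; the function names and the ROI labels below are hypothetical, and the estimated FDR is taken as given rather than computed, since its estimation procedure is not restated in this section.

```python
def _undirected(edges):
    """Normalize an edge list so that (a, b) and (b, a) compare equal."""
    return {frozenset(edge) for edge in edges}


def differential_edges(edges_a, edges_b):
    """Return (edges only in network A, edges only in network B)."""
    a, b = _undirected(edges_a), _undirected(edges_b)
    return a - b, b - a


def expected_true_differences(num_differences, estimated_fdr):
    """Expected number of true differences at differential precision 1 - FDR."""
    return num_differences * (1.0 - estimated_fdr)


# Hypothetical ROI labels, for illustration only.
novice = [("ROI_1", "ROI_2"), ("ROI_2", "ROI_3")]
intermediate = [("ROI_2", "ROI_1"), ("ROI_3", "ROI_4")]
only_novice, only_intermediate = differential_edges(novice, intermediate)
print(len(only_novice), len(only_intermediate))

# With 300 learned differences and an estimated FDR of 0.8,
# roughly 60 of them are expected to be true differences.
print(expected_true_differences(300, 0.8))
```

Because the FDR estimate is described as optimistic, the resulting count is best read as an upper estimate of the number of true differences.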
As the similarity bias increases, the estimated FDR decreases rapidly to levels close to 0 (again, remember that this is an optimistic estimate), so we are much more confident that we are identifying true differences in the functional brain networks.

Figures 6b and 6c show the connections that appear in novices but not in intermediates, and vice versa, for λ = 0 . and λ = 0 . (these parameter settings were selected by the domain expert to give just a handful of dependencies of high confidence). Figure 8 in Appendix B also shows the connections present in both novices and intermediates. These results show that for both stages, groups of brain regions are found that share information, which correlate well with sensorimotor pathways found in humans. This includes the occipito-parietal dorsal visual pathway that computes the location of objects, the occipito-temporal ventral pathway that determines the identity of objects, collections of frontal and cingulate regions that help to make decisions about responses, as well as separate cerebellar and middle temporal networks, along with other smaller networks of brain regions [19]. With learning to identify hidden objects in this task, it was found that portions of the ventral pathway increased in strength, suggesting that learning resulted in greater information flow among regions that specialize in visual object identification.

Differential analysis of dependency networks of multivariate data allows domain experts to uncover and understand the differences between related populations and the processes that are generating these differences. Such questions arise in many domains including biology, medicine, and neuroscience. We have shown that the traditional approach of learning the dependency networks for each task independently and comparing them is prone to having high false discovery rates.
We have discussed the importance of controlling the quality of the inferred differences between dependency networks, and explored a novel use of transfer learning techniques to provide a natural and explicit “knob” that controls the precision-recall tradeoff in differential network analysis. We have shown empirically that this approach achieves higher precision than existing methods, and yields better performance than significantly more expensive bootstrapping procedures. Finally, we have presented three real case studies where domain experts used the proposed techniques to uncover compelling evidence of biological processes involved in cancer and human learning.

While in this paper we have focused on differential network analysis, the idea of using transfer learning techniques to improve differential analysis is quite general. For instance, similar techniques could be used in conjunction with feature selection to answer questions like “are there different cancer biomarkers for men and women?”, or in conjunction with clustering/unsupervised learning to detect significant changes in cluster structures between conditions.

Acknowledgements

We would like to acknowledge the contributions of several collaborators. The pancreatic cancer samples were collected by Randall Brand, M.D. of the University of Pittsburgh Medical Center and Michelle A. Anderson, M.D. of the University of Michigan Hospital and Health Systems. Britta Singer and Ed Brody of SomaLogic Inc. helped with the analysis of the ovarian and pancreatic cancer results.

References

[1] B. Zhang, H. Li, R.B. Riggins, M. Zhan, J. Xuan, Z. Zhang, E.P. Hoffman, R. Clarke, and Y. Wang. Differential dependency network analysis to identify condition-specific topological changes in biological networks. Bioinformatics, 25(4):526–532, 2009.
[2] D. Grossman and P. Domingos. Learning Bayesian network classifiers by maximizing conditional likelihood. In Proceedings of the Twenty-First International Conference on Machine Learning, page 46, 2004.
[3] Patrick Danaher, Pei Wang, and Daniela Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. arXiv stat.ME, 1111(00324v1), November 2011.
[4] Julien Chiquet, Yves Grandvalet, and Christophe Ambroise. Inferring multiple graphical structures. Statistics and Computing, 21(4):537–553, October 2011.
[5] J. Guo, E. Levina, G. Michailidis, and J. Zhu. Joint estimation of multiple graphical models. Biometrika, 98(1):1, 2011.
[6] Karthik Mohan, Mike Chung, Seungyeop Han, Daniela Witten, Su-In Lee, and Maryam Fazel. Structured learning of Gaussian graphical models. In Advances in Neural Information Processing Systems 25, pages 629–637. 2012.
[7] Nicolai Meinshausen and Peter Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462, June 2006.
[8] Han Liu, Fang Han, and Cun-Hui Zhang. Transelliptical graphical models. In Advances in Neural Information Processing Systems 25, pages 809–817. 2012.
[9] Paul Shannon, Andrew Markiel, Owen Ozier, Nitin S Baliga, Jonathan T Wang, Daniel Ramage, Nada Amin, Benno Schwikowski, and Trey Ideker. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11):2498–2504, 2003.
[10] Larry Gold, Deborah Ayers, Jennifer Bertino, Christopher Bock, Ashley Bock, Edward N Brody, Jeff Carter, Andrew B Dalby, Bruce E Eaton, Tim Fitzwater, et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. PloS one, 5(12):e15004, 2010.
[11] Da Wei Huang, Brad T Sherman, and Richard A Lempicki. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protocols, 4(1):44–57, 12 2008.
[12] Xipeng Wang, Ena Wang, John J Kavanagh, and Ralph S Freedman. Ovarian cancer, the coagulation pathway, and inflammation. Journal of Translational Medicine, 3(1):25, 2005.
[13] Baik-Dong Choi, Soon-Jeong Jeong, Guanlin Wang, Jin-Ju Park, Do-Seon Lim, Byung-Hoon Kim, Yong-Ick Cho, Chang-Seok Kim, Moon-Jin Jeong, et al. Secretory leukocyte protease inhibitor is associated with MMP-2 and MMP-9 to promote migration and invasion in SNU638 gastric cancer cells. International Journal of Molecular Medicine, 28(4):527, 2011.
[14] Graeme Walker, Kenneth MacLeod, Alistair RW Williams, David A Cameron, John F Smyth, and Simon P Langdon. Insulin-like growth factor binding proteins IGFBP3, IGFBP4, and IGFBP5 predict endocrine responsiveness in patients with ovarian cancer. Clinical Cancer Research, 13(5):1438–1444, 2007.
[15] Setsuo Hirohashi and Yae Kanai. Cell adhesion system and human cancer morphogenesis. Cancer Science, 94(7):575–581, 2005.
[16] Angel Lopez-Candales, Matthew S Bosner, Curtis A Spilburg, and Louis G Lange. Cholesterol transport function of pancreatic cholesterol esterase: directed sterol uptake and esterification in enterocytes. Biochemistry, 32(45):12085–12089, 1993.
[17] Vincent P Clark, Brian A Coffman, Andy R Mayer, Michael P Weisend, Terran D R Lane, Vince D Calhoun, Elaine M Raybourn, Christopher M Garcia, and Eric M Wassermann. TDCS guided using fMRI significantly accelerates learning to identify concealed objects. Neuroimage, 59(1):117–128, Jan 2012.
[18] N Tzourio-Mazoyer, B Landeau, D Papathanassiou, F Crivello, O Etard, N Delcroix, B Mazoyer, and M Joliot. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 15(1):273–289, Jan 2002.
[19] Mortimer Mishkin, Leslie G Ungerleider, and Kathleen A Macko. Object vision and spatial vision: two cortical pathways. Trends in Neurosciences, 6:414–417, 1983.

Ovarian Cancer Results

Immune response proteins | Inflammatory response proteins | Coagulation and complement proteins | Proteins that are involved in the extracellular matrix | Endopeptidase inhibitor proteins
a2-Macroglobulin | a2-Macroglobulin | a2-Macroglobulin | TIMP1 | SLPI
C2 | Ck-b-8-1 | C2 | TIMP1 | TIMP1
C6 | GHR | C6 | URB | a2-Macroglobulin
Ck-b-8-1 | LBP | Factor B | a1-Antitrypsin | a2-HS-Glycoprotein
Factor B | a1-Antitrypsin | a1-Antitrypsin | BGH3 | a1-Antitrypsin
Properdin | TIMP-1 | PCI | VEGF | Kallistatin
GHR CD30 TIMP-1 PCI
LBP VEGF CD30
sL-Selectin a2-HS-Glycoprotein
VEGF
a1-Antitrypsin
TIMP-1
VEGF
CA-125 may also be involved in these responses.

Table 1: Proteins from the Ovarian cancer differential network that are involved in each of the enriched functional processes.

Figure 7: Differential network for Ovarian cancer obtained using the traditional method of learning the dependency networks independently and comparing them ( λ = 0 . , λ = 0 ).

Accelerated Learning fMRI Results

Figure 8: Network of dependencies shared among Novice and Intermediate stages of the Accelerated Learning study. The network gives valuable reassurance that the network learning algorithm is identifying true pathways.