SimBins: An information-theoretic approach to link prediction in real multiplex networks

Abstract

The entities of real-world networks are connected via different types of connections (i.e. layers). The task of link prediction in multiplex networks is to find missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based, automatic, general-purpose multiplex link prediction method, SimBins, is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances the prediction quality in the target layer by incorporating the effect of link overlap across layers. Applied to various datasets from different domains, SimBins proves to be robust and superior to the compared methods in the majority of the cases examined in terms of link prediction accuracy. Furthermore, it is argued that SimBins imposes minor computational overhead on the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.



Seyed Hossein Jafari , Amir Mahdi Abdolhosseini-Qomi , Maseud Rahgozar , Masoud Asadpour , Naser Yazdani School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran * Corresponding author E-mail: [email protected] (SHJ)

Abstract

Network science has proven extremely successful in the understanding of complex systems. In recent years, the study of systems comprising several types of relations, i.e. multiplex networks, has brought higher-resolution detail on the dynamics of these systems. Link prediction puts networks under the microscope from the angle of associations among node pairs. Although link prediction in single-layer networks has a long history, efforts on the same task in multiplex networks are not plentiful. This study asks how trans-layer correlations in a multiplex network can be used to enhance the prediction of missing links. It is shown that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Subsequently, a similarity-based, automatic, general-purpose multiplex link prediction method, SimBins, is devised that, for an arbitrary layer of a multiplex network, employs the structural features of both the layer itself and an additional auxiliary layer via information-theoretic techniques. Applied to various datasets from different contexts, SimBins proves to be robust and superior to the compared methods in the majority of the cases examined in terms of link prediction accuracy. Furthermore, it is argued that SimBins imposes minor computational overhead on the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.

Introduction

Link prediction has been an area of interest in the research of complex networks for over two decades [1], studying the relationships between entities (nodes) in data represented as graphs. The main goal is to reveal the underlying truth behind emerging or missing connections between node pairs of a network. Link prediction methods have a wide range of applications, from the discovery of latent and spurious interactions in biological networks (which is quite costly if performed with traditional methods) [2, 3] to recommender systems [4, 5] and better routing in wireless mobile networks [6]. Numerous perspectives have been adopted to attack the problem of link prediction. Similarity-based methods measure how similar two nodes are as an indication of the likelihood of a link between them. This approach rests on the assumption that two nodes are similar if they share many common features [7]. Many node features stay hidden (or are kept hidden intentionally) in real networks. It is therefore an interesting question what fraction of the truth behind a process (e.g. link formation) can still be extracted from structural features alone, even though a considerable amount of network information is hidden. That is one of the main motivations for using structural similarity indices for link prediction. Several classifications of similarity measures have been proposed; among them, classification by the locality of the indices is of great importance. To name a few, Common Neighbors (CN) [1], Preferential Attachment (PA) [8], Adamic-Adar (AA) [9] and Resource Allocation (RA) [10] are popular local indices that focus mostly on nodes' structural features, each with unique characteristics. Despite their simplicity, these indices remain popular owing to their low computational cost. On the other hand, global indices take features of the whole network structure into account, tolerating a higher computational cost, usually in favor of more accurate information.
Take the length of paths between pairs of nodes, for instance, on which the well-known Katz index [11] operates. Average Commute Time (ACT) [1] and PageRank [12] are other notable global indices. Somewhere in between lie the quasi-local methods, such as the Local Path (LP) index [13] and Local Random Walk (LRW) [14], which inherit properties from both local and global indices: although they use some global network information, their computational complexity is kept comparable to that of local methods. For more detailed information on these similarity indices (also described as unsupervised methods in the literature [15]), readers are referred to [16]. Some researchers have tackled the link prediction problem using ideas from information theory; in [17], the mutual information (MI) of common neighbors is used to estimate the connection likelihood of a pair of nodes. Moreover, the Path Entropy (PE) similarity index [18] has been proposed, which takes into account not only the quantity and length of paths between a pair of nodes but also the entropy of those paths, which affects the connection likelihood of the pair. From a coarse-grained point of view, supervised models of link prediction belong to a different class from the aforementioned unsupervised ones. They learn a group of parameters by processing the input graph and use certain models, such as feature-based prediction (HPLP [19]) and latent feature extraction (Matrix Factorization [15]). Representation learning has helped automate the whole process of link prediction, especially feature selection; node2vec [20] is one such method. Learning-based methods usually lead to better results than their similarity-based counterparts, but this does not mean that unsupervised models should be considered obsolete.
On the one hand, unsupervised models provide clearer insight into the underlying characteristics of networks; take Common Neighbors (CN), for example, which reflects the high clustering property of networks [18], or the Adamic-Adar index, which is based on the size of the common nodes' neighborhoods [9]. On the other hand, unsupervised methods can take much less computational effort, which makes them suitable for online prediction without any costly training phase or feature selection process [21].

As discussed so far, complex network research was focused on single-layer (simplex or monoplex) networks for many years. The study of multi-layer (multiplex or heterogeneous) networks dates back several years, although with disparate terminology. Most of the work in this field has been done since 2012; Refs. [22, 23] provide noteworthy reviews of the history of multi-layer networks. Attempts at multi-layer link prediction are not abundant; some of them are introduced here. Hidden geometric correlations in real multiplex networks [24] is an interesting work that shows that multiplex networks are not just random combinations of single-layer networks. The authors employ these geometric correlations for trans-layer link prediction, i.e. incorporating observations of other layers to predict connections in a specific layer. This work is followed by a study arguing that a link persistence factor is required to explain the high edge overlap in real multiplex systems [25]. In heterogeneous networks (i.e. networks with different types of nodes and relations), several similarity search approaches have been proposed. PathSim [26] is a meta path-based similarity measure that can find similar peers in heterogeneous networks (e.g. authors in similar fields in a bibliographic network). The intuition behind PathSim is that two peer objects are similar if they are not only strongly connected but also share comparable visibility (number of path instances from a node to itself). HeteSim [27] is another method of the same kind, which can measure the similarity of objects of different types, inspired by the intuition that two objects are related if they are referenced by related objects. The drawbacks of these methods, however, are their dependence on the connectivity degrees of node pairs (neglecting further information provided by the meta paths themselves) and the need to use a single, usually symmetric, meta path. In [28], a mutual information model has been employed to tackle these problems.
Most meta path-based models lack an automated meta-path selection mechanism; in other words, pre-defined meta paths (mostly specific to the dataset under study) are used to help with prediction tasks. Another major issue of the previously discussed methods is that including longer meta paths requires much more computation to analyze these paths and their role in prediction. Extending traditional similarity measures to multiplex networks has always been a challenge. In this paper, an information-theoretic model is devised that employs other layers' structural information for better link prediction in an arbitrary layer of the network. Using several similarity indices (AA, RA, PA, CN and ACT) as base proximity measures, we show that the proposed method, SimBins, can be used to extend all similarity indices to multiplex link prediction without significantly degrading time complexity. Finally, it is shown that SimBins improves prediction performance on several real-world social, biological and technological multiplex networks.

Materials and Methods

Link Prediction in Multiplex Networks

Consider a multiplex network $G\left(V, E^{[1]}, \ldots, E^{[M]}\right)$ with $E^{[\alpha]} \subseteq V \times V$ for all $\alpha \in \{1, 2, \ldots, M\}$, where $M$, $V$ and $E^{[\alpha]}$ are the number of layers, the set of all nodes and the set of existing edges in layer $\alpha$ of the multiplex network, respectively. Let $U = V \times V$ be the set of all possible node pairs. The present research studies undirected multiplex networks; therefore, it is assumed that $G^{[\alpha]}(V, E^{[\alpha]})$ for any arbitrary layer $\alpha$ is an undirected simple graph.

Link prediction in multiplex networks is concerned with predicting missing links in an arbitrary target layer $T \in \{1, 2, \ldots, M\}$ with the help of other auxiliary layers. To evaluate the proposed method, $E^{[T]}$, i.e. the set of edges in the target layer, is divided into a training set $E^{[T]}_{\text{train}}$ and a test set $E^{[T]}_{\text{test}}$, each a fixed fraction of $E^{[T]}$, so that $E^{[T]}_{\text{train}} \cup E^{[T]}_{\text{test}} = E^{[T]}$ and $E^{[T]}_{\text{train}} \cap E^{[T]}_{\text{test}} = \emptyset$. Only the information provided by the training set is used in the prediction task; eventually, $E^{[T]}_{\text{test}}$ is compared with the output of the proposed algorithm (link-existence likelihood scores for a subset of $U - E^{[T]}_{\text{train}}$ that includes $E^{[T]}_{\text{test}}$) to determine the performance of the method. More specifically, link likelihood scores are calculated for the node pairs of $E^{[T]}_{\text{test}}$ and of a random subset $Z^{\text{test}}_{T}$ of $U - E^{[T]}$, where $|Z^{\text{test}}_{T}| = 2\,|E^{[T]}_{\text{test}}|$; all pairs in $Z^{\text{test}}_{T}$ are disconnected in $E^{[T]}_{\text{train}}$. In short, only a subset of the non-observed links in the training set is scored, in order to limit complexity, as will be discussed in detail later. Note the coefficient 2, a ratio introduced to reflect the link imbalance in real-world networks (which are mostly sparse by nature [29]). The present study examines how employing one layer of the multiplex network, $A$, facilitates link prediction in another layer $T$, where $T, A \in \{1, \ldots, M\}$ and $T \neq A$, i.e. a duplex subset of the multiplex network. The 'Discussion' section argues how the proposed method can be extended to use the structural information of multiple layers for link prediction.
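The evaluation protocol described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `split_edges` and `sample_non_edges`, the toy edge list and the 20% test fraction are all hypothetical, since the split percentages are not given here. Only the 2:1 ratio of sampled non-links to test links comes from the text.

```python
import random
from itertools import combinations


def split_edges(edges, test_fraction, rng):
    """Split the target-layer edges into disjoint train and test sets."""
    # Store each undirected edge once, as a sorted tuple.
    edges = sorted({tuple(sorted(e)) for e in edges})
    rng.shuffle(edges)
    n_test = int(round(test_fraction * len(edges)))
    return set(edges[n_test:]), set(edges[:n_test])


def sample_non_edges(nodes, all_edges, k, rng):
    """Draw k distinct node pairs that are unlinked in the full target layer.

    `all_edges` must hold sorted tuples (u, v) with u < v.
    """
    # Enumerating every pair is quadratic in |V|; acceptable for a toy example.
    candidates = [p for p in combinations(sorted(nodes), 2) if p not in all_edges]
    return set(rng.sample(candidates, k))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
nodes = range(8)
target_edges = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5),
                (5, 6), (6, 7), (0, 7), (1, 3), (2, 5)}

# Assumed 80/20 split; the paper's actual percentages are not reproduced here.
e_train, e_test = split_edges(target_edges, test_fraction=0.2, rng=rng)
# Negative sample twice the size of the test set, as in |Z_test| = 2|E_test|.
z_test = sample_non_edges(nodes, target_edges, 2 * len(e_test), rng)
```

Sampling from $U - E^{[T]}$ rather than $U - E^{[T]}_{\text{train}}$ guarantees that no held-out test link is mistaken for a non-existent link.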

Evaluation Method

In their ideal form, link prediction algorithms rank the non-observed links of a network so that all latent links sit at the top of the ranking and all non-existent links sit underneath. This ranking is based on a link-likelihood score assigned to the node pairs corresponding to non-observed links in the network. The Area Under the Receiver Operating Characteristic Curve (AUC or AUROC) [30] is prominent in the literature for evaluating link prediction methods [16]. AUC indicates the probability that a randomly chosen missing link is scored higher than a randomly chosen non-existent link, computed as:

$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n} \qquad (1)$$

where, out of $n$ independent comparisons (a fixed number in our experiments), a randomly chosen latent link has a higher score than a randomly chosen non-existent link $n'$ times and the two are equally scored $n''$ times. AUC is 1 if the node pairs are flawlessly ranked and 0.5 if the scores follow an identical and independent distribution; the higher the AUC, the better the scoring scheme.

Data

Various real-world multiplex network datasets from different contexts are selected for investigation, from social (Physicians, NTN and CS-Aarhus) to technological (Air/Train and London Transport) and biological systems (C. Elegans, Drosophila and Human Brain). They also have diverse characteristics, which are briefly introduced in Table 1.

Air/Train (AT). This dataset consists of the Indian airport network and train station network and their geographical distances [31]. To relate the train stations to geographically nearby airports, the authors of [24] aggregated all train stations within 50 km of an airport into a supernode. Supernodes are then considered connected if they share a common train station, or if a train station of one supernode is directly connected to a station of the other supernode. Air is the network of airports and Train is the network of aggregated train station supernodes.

C. Elegans. The network of neurons of the nematode Caenorhabditis elegans, which are connected through different synaptic connection types: Electric, Chemical Monadic and Chemical Polyadic [32].

Drosophila Melanogaster (DM). The layers of this network represent different types of protein-protein interactions belonging to the fly Drosophila melanogaster, namely suppressive genetic interaction and additive genetic interaction. More details can be found in [33, 34].

Human Brain (HB). The human brain multiplex network is taken from [24, 35]. It consists of a structural (anatomical) layer and a functional layer that connect 90 different regions of the human brain (nodes) to each other. The structural network is obtained by dMRI and the functional network by BOLD fMRI [35]. In this multiplex network, the structural connections are obtained by thresholding the connection probability of brain regions (which is proportional to the density of axonal fibers between them) [24]. The functional interactions are derived in a similar manner, by thresholding the connection probability of regions, which is proportional to a correlation coefficient measured for the activity of brain region pairs [24].

Physicians. Taken from [36], the Physicians multiplex dataset contains 3 layers that relate physicians in four US towns by different types of relationships, namely advice, discuss and friendship connections.

Noordin Top Terrorist Network (NTN). Taken from [37], this multiplex dataset comprises information on 78 individuals, i.e. Indonesian terrorists, and depicts their relationships with respect to exchanged communications, financial businesses, common operations and mutual trust.

London Transport. For the purpose of studying navigability performance under network failures, De Domenico et al. [38] gathered a dataset of London public transport consisting of 3 layers: the tube, the overground and the Docklands Light Railway (DLR). Nodes are stations, which are linked to each other if a real connection exists between them in the corresponding layer.

CS-Aarhus. This dataset is taken from [39], a study conducted among the employees of the Department of Computer Science at Aarhus University in Denmark. The network consists of 5 different interactions: current work relationships, repeated leisure activities, regularly eating lunch together, co-authorship of publications and friendship on Facebook.

Node multiplexity in Table 1 is the fraction of nodes in a multiplex network that are active (have at least one link attached) in more than one layer.

Table 1 Basic Characteristics of Multiplex Networks Used in Experiments.

Multiplex name   Layers   Nodes   Node multiplexity   Layer name      Active nodes   Edges
Air/Train        2        69      1                   Air             69             180
                                                      Train           69             322
C. Elegans       3        280     0.98                Electric        253            515
                                                      Chem-mono       260            888
                                                      Chem-poly       278            1703
Drosophila       2        839     0.89                Suppress        838            1858
                                                      Additive        755            1424
Brain            2        90      0.85                Structure       85             230
                                                      Function        80             219
Physicians       3        246     0.93                Advice          215            449
                                                      Discuss         231            498
                                                      Friend          228            423
NTN              4        78      0.94                Communication   74             200
                                                      Financial       13             15
                                                      Operational     68             437
                                                      Trust           70             259
London           3        368     0.13                Tube            271            312
                                                      Overground      83             83
                                                      DLR             45             46
CS-Aarhus        5        61      0.96                Lunch           60             193
                                                      Facebook        32             124
                                                      Co-author       25             21
                                                      Leisure         47             88
                                                      Work            60             194

Information Theory Background

This sub-section introduces the necessary concepts of information theory, which provide the main mathematical background of the proposed method. What follows is the definition of self-information and mutual information. Given a random variable $X$, the self-information or surprisal of the occurrence of event $x \in X$ with probability $p(x)$ is defined as [40]:

$$I(X = x) = -\log p(x) \qquad (2)$$

The self-information expresses how much uncertainty or surprise there is in the occurrence of an event; the less probable the outcome, the more surprise it conveys. The base of the logarithm is assumed to be 2 throughout the paper, so that uncertainty is measured in bits of information. Let us proceed with the definition of mutual information between two random variables $X$ and $Y$ with joint probability mass function $p(x, y)$ and marginal probability mass functions $p(x)$ and $p(y)$, respectively. The mutual information $I(X; Y)$ is [41]:

$$I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} = \sum_{x, y} p(x, y) \log \frac{p(x \mid y)}{p(x)} \qquad (3)$$

Consequently, the mutual information of two events $x \in X$ and $y \in Y$ can be written as [17]:

$$I(X = x; Y = y) = \log \frac{p(x \mid y)}{p(x)} = \log p(x \mid y) - \log p(x) = I(x) - I(x \mid y) \qquad (4)$$
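Equations (2) to (4) can be checked numerically with a short Python sketch. The function names and the probabilities in the usage lines are illustrative assumptions, not values from the paper; logarithms are taken to base 2, so results are in bits.

```python
import math


def self_information(p):
    """Surprisal I(x) = -log2 p(x), in bits (Eq. 2)."""
    return -math.log2(p)


def pointwise_mutual_information(p_x, p_x_given_y):
    """I(x; y) = I(x) - I(x|y) = log2(p(x|y) / p(x)) (Eq. 4)."""
    return self_information(p_x) - self_information(p_x_given_y)


def mutual_information(joint):
    """I(X; Y) for a joint pmf given as a dict {(x, y): probability} (Eq. 3)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # marginal p(x)
        py[y] = py.get(y, 0.0) + p  # marginal p(y)
    # Terms with p(x, y) = 0 contribute nothing and are skipped.
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


# Hypothetical numbers: a link has prior probability 0.05, but probability 0.4
# once some evidence y is observed; the evidence removes about 3 bits of surprise.
gain_bits = pointwise_mutual_information(0.05, 0.4)

independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
dependent = {(0, 0): 0.5, (1, 1): 0.5}
mi_independent = mutual_information(independent)  # zero: X tells nothing about Y
mi_dependent = mutual_information(dependent)      # one bit: X determines Y
```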

In fact, the mutual information indicates how much two variables depend on each other, i.e. how much the uncertainty about a variable $X$ is reduced by knowing another variable $Y$. The mutual information is zero if and only if the two variables are independent. In the following section, we describe how these two measures play their roles in the design of our method.

Base Similarity Measures

There is extensive literature on similarity measures that determine how similar two nodes are in a single-layer network, as partially presented in the introduction of this paper. In our proposed method, a subset of these similarity indices (both local and global) is used as the base measures on top of which the multiplex link prediction model is built.

CN [1]: Perhaps the most well-known and typical way to measure the similarity of two nodes $x$ and $y$ is to count their common neighbors:

$$S^{CN}_{xy} = |\Gamma(x) \cap \Gamma(y)| \qquad (5)$$

where $\Gamma(x)$ and $\Gamma(y)$ are the sets of neighbors of $x$ and $y$, respectively.

PA [8]: Preferential Attachment is a well-known phenomenon in social networks: nodes with more links are more likely to make new connections, hence the saying "the rich get richer", specifically in financial use-cases:

$$S^{PA}_{xy} = |\Gamma(x)| \times |\Gamma(y)| \qquad (6)$$

This measure needs no information about which nodes the neighbors are, only the number of neighbors (the degree) of each node, which makes it very cheap to compute.

RA [10]: In Resource Allocation, a node is considered to hold a resource that is allocated to its neighbors in inverse proportion to its degree:

$$S^{RA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{|\Gamma(z)|} \qquad (7)$$

AA [9]: This metric is another way of weighted counting of common features instead of simply adding them up. Rarer features contribute more, and each is weighted more heavily than in RA, since the degree enters through a logarithm:

$$S^{AA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log |\Gamma(z)|} \qquad (8)$$

ACT [1]: Random-walk-based methods account for the number of steps required to reach one node starting from some arbitrary node. Average Commute Time measures the average number of steps required for a random walker to travel from node $x$ to node $y$ and back. To keep the computation tractable, the pseudo-inverse of the Laplacian matrix is used to calculate the commute time:

$$S^{ACT}_{xy} = \frac{1}{l^{+}_{xx} + l^{+}_{yy} - 2\,l^{+}_{xy}} \qquad (9)$$

where $l^{+}_{xy}$ is the $[x, y]$ entry of the pseudo-inverse of the Laplacian matrix, i.e. $l^{+}_{xy} = [L^{+}]_{xy}$. The pseudo-inverse of the Laplacian is calculated as [42]:

$$L^{+} = \left(L - \frac{e e'}{n}\right)^{-1} + \frac{e e'}{n} \qquad (10)$$

where $e$ is a column vector of 1's ($e'$ is its transpose) and $n$ is the total number of nodes.

Results

Does the structure of one layer of a multiplex network provide any information on the formation of links in another layer of the same network? Take, for example, a social multiplex network in which one layer captures people's work relationships and the other represents their friendships. Intuitively, it can be conjectured that in a real multiplex such as this sample social network, structural changes in one layer can affect the others; if two people become colleagues, the conditions for them becoming friends will probably not be the same as before. More specifically, is there any correlation among the structures of the layers of a multiplex network? This question has been answered positively in previous studies with different approaches. In [24], a null model is created for a multiplex network by randomly reshuffling the trans-layer node-to-node mappings. It is then shown that the geometric trans-layer correlations are destroyed in the null model compared to the original network. Learning-based methods have also employed structural features to predict links in multiplex networks [43, 44].

Various structural features can be analyzed to uncover correlations between layers; direct links, common neighbors, paths [1] and eigenvectors [45] are examples. In the following sections, we develop a set of tools that help collect evidence of trans-layer correlations in multiplex networks, as the basic intuition supporting the proposed link prediction framework.
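To make such structural features concrete, the local indices of Eqs. (5) to (8) can be computed for the same node pair in two layers of a duplex. The sketch below is illustrative only: the two toy edge lists and the helper names are assumptions, and the logarithm in AA is taken to base 2 (the ranking that AA induces does not depend on the base).

```python
import math


def adjacency(edges):
    """Build a neighbor-set dictionary for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common(adj, x, y):
    """Common neighbors of x and y (empty if a node is isolated in the layer)."""
    return adj.get(x, set()) & adj.get(y, set())


def cn(adj, x, y):
    return len(common(adj, x, y))


def pa(adj, x, y):
    return len(adj.get(x, set())) * len(adj.get(y, set()))


def ra(adj, x, y):
    return sum(1.0 / len(adj[z]) for z in common(adj, x, y))


def aa(adj, x, y):
    # A common neighbor of two distinct nodes has degree >= 2, so log2 >= 1.
    return sum(1.0 / math.log2(len(adj[z])) for z in common(adj, x, y))


# Hypothetical duplex on the same node set: a target and an auxiliary layer.
target = adjacency([(0, 2), (1, 2), (0, 3), (1, 3), (3, 4)])
auxiliary = adjacency([(0, 2), (1, 2), (2, 4)])

# Similarity of the unlinked pair (0, 1), measured separately in each layer.
scores = {name: {f.__name__: f(layer, 0, 1) for f in (cn, pa, ra, aa)}
          for name, layer in (("target", target), ("auxiliary", auxiliary))}
```

Here the pair (0, 1) shares two neighbors in the target layer and one in the auxiliary layer, so both layers carry evidence about the same candidate link.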

Partitioning Node Pairs (Binning)

Consider two layers $T, A \in \{1, 2, \ldots, M\}$, $T \neq A$, of a multiplex network with $M$ layers and $|V|$ nodes. $T$ is the target layer, i.e. the layer in which the likelihood of the presence of links is to be predicted, and $A$ is the auxiliary layer assisting the prediction task. A subset $U'$ of $U = V \times V$ is constructed so that $U' = E^{[T]}_{\text{train}} \cup Z^{\text{train}}_{T}$, where $Z^{\text{train}}_{T}$ is a random sample of non-observed links from $U - E^{[T]}$ and $|Z^{\text{train}}_{T}| = 2\,|E^{[T]}_{\text{train}}|$. The size of $Z^{\text{train}}_{T}$ is twice that of $E^{[T]}_{\text{train}}$ so that $U'$ is a suitable representative of the target layer, given the link imbalance phenomenon in real complex systems. Two different partitions of $U'$ are formed (using equal-depth binning, described in the following paragraph):

(i) With respect to the target layer $T$: $\{S^{T}_{1}, S^{T}_{2}, \ldots, S^{T}_{b_T}\}$, where $\bigcup_{i=1}^{b_T} S^{T}_{i} = U'$ and $S^{T}_{i} \cap S^{T}_{j} = \emptyset$ for all $i, j \in \{1, 2, \ldots, b_T\}$, $i \neq j$.

(ii) With respect to the auxiliary layer $A$: $\{S^{A}_{1}, S^{A}_{2}, \ldots, S^{A}_{b_A}\}$, where $\bigcup_{j=1}^{b_A} S^{A}_{j} = U'$ and $S^{A}_{i} \cap S^{A}_{j} = \emptyset$ for all $i, j \in \{1, 2, \ldots, b_A\}$, $i \neq j$.

These partitions are referred to as bins of node pairs in the current study. The numbers of bins w.r.t. the target and auxiliary layers are $b_T$ and $b_A$, respectively. The 'Discussion' section argues how the number of bins should be chosen and how it affects the prediction results. An equal-depth (equal-frequency) binning strategy is applied to the target-layer similarity scores of the node pairs in $U'$, so that each partition $S^{T}_{i}$, $i \in \{1, 2, \ldots, b_T\}$, contains approximately the same number of members (node pairs). The same strategy is applied to the similarity scores in the auxiliary layer $A$, establishing the partitions $S^{A}_{j}$, $j \in \{1, 2, \ldots, b_A\}$. These partitions (bins) are the building blocks of how multiplex networks are scrutinized in this paper, as they offer a coarse-grained view of the data that tolerates statistically insignificant phenomena observed in particular regions of the networks. The setting described above is used from now on, to avoid further repetition.

Intra-layer and Trans-layer Correlations

The foregoing discussion leads to two key measures for the target and auxiliary layer bins $S^{T}_{i}$ and $S^{A}_{j}$: (1) the intra-layer connection probability $p_{\text{intra}}(S^{T}_{i})$, and (2) the trans-layer connection probability $p^{T}_{\text{trans}}(S^{A}_{j})$. The intra-layer connection probability in $S^{T}_{i}$ is the connection likelihood of the pairs in that bin. It can also be expressed as the conditional probability that an arbitrary node pair $(x, y)$ is connected in layer $T$, given its similarity (bin) in that same layer:

$$p_{\text{intra}}(S^{T}_{i}) = p(L^{T} = 1 \mid S^{T}_{i}); \quad i \in \{1, 2, \ldots, b_T\} \qquad (11)$$

Here $L^{T} = 1$ is the event that a random pair $(x, y)$ is linked in layer $T$. Empirically, $p_{\text{intra}}(S^{T}_{i})$ is computed as the proportion of node pairs in $S^{T}_{i}$ that are linked:

$$p_{\text{intra}}(S^{T}_{i}) = \frac{|S^{T}_{i} \cap E^{[T]}_{\text{train}}|}{|S^{T}_{i}|}; \quad i \in \{1, 2, \ldots, b_T\} \qquad (12)$$
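Equal-depth binning followed by the empirical estimate of Eq. (12) can be sketched as below. This is one possible reading rather than the authors' code: the function names, the quantile-based cut points and the toy scores are assumptions. Bins are indexed from 0, tied scores always share a bin (so some bins may stay empty), and empty bins are reported as `None` instead of being imputed.

```python
import bisect


def equal_depth_bins(scores, b):
    """Assign each score a bin index in 0..b-1 using quantile cut points."""
    ordered = sorted(scores)
    n = len(ordered)
    if n < b:
        raise ValueError("need at least as many scores as bins")
    # The upper cut point of bin k is the score at the k-th b-quantile position.
    cuts = [ordered[(k * n) // b - 1] for k in range(1, b)]
    # A score equal to a cut point falls into the lowest bin bounded by it.
    return [bisect.bisect_left(cuts, s) for s in scores]


def connection_probability(bin_ids, linked, b):
    """Fraction of linked pairs in each bin; None marks an empty bin."""
    total, positive = [0] * b, [0] * b
    for k, is_link in zip(bin_ids, linked):
        total[k] += 1
        positive[k] += int(is_link)
    return [positive[k] / total[k] if total[k] else None for k in range(b)]


# Toy target-layer similarity scores of 8 node pairs and their link labels.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
linked = [0, 0, 0, 1, 0, 1, 1, 1]
bins = equal_depth_bins(scores, 4)
p_intra = connection_probability(bins, linked, 4)

# Heavily tied scores (e.g. many pairs with no common neighbors) leave bins empty.
tied_bins = equal_depth_bins([0, 0, 0, 0, 0, 0, 1, 2], 4)
p_tied = connection_probability(tied_bins, [0, 0, 0, 0, 0, 1, 1, 1], 4)
```

Applying the same two functions to auxiliary-layer scores, while keeping the target-layer link labels, gives the trans-layer counterpart.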

The intra-layer connection probability of four different multiplex (duplex) networks is shown for each bin in Fig 1. In the data-driven observations of this paper, wherever a similarity measure is involved, the Adamic-Adar index is used unless otherwise specified. Additionally, the numbers of bins in the target and auxiliary layers, $b_T$ and $b_A$, are set to the same fixed value. An extensive argument is given in the 'Discussion' section on how to choose the number of bins and how it affects the prediction results.

Fig 1 Intra-layer connection probability in target layer bins. Intra-layer connection probability in layer (a) 'Air' of Air/Train, (b) 'Structure' of Human Brain, (c) 'Advice' of Physicians, (d) 'Suppressive' of Drosophila.

The bars with dotted lines in Fig 1 represent imputed values. Because some similarity values occur very frequently (such as AA scores of node pairs with no common neighbors), a perfect equal-depth binning may not be feasible; as a result, a number of bins contain no sample node pairs. The intra-layer connection probability of these bins is imputed using a penalized least squares method that allows fast smoothing of gridded (missing) data [46]. Besides making the observations clearer, this imputation lets us fix the number of bins and handle missing data in a systematic way. Fig 1 indicates that as similarity increases (higher bin numbers), the intra-layer connection probability increases accordingly, i.e. there is a positive correlation between similarity (bin number) and intra-layer connection probability, as stated in one of the most substantial works in the history of link prediction. The trans-layer connection probability is defined analogously, except that, although the connection in the target layer $T$ is concerned, the similarity scores of node pairs are taken from the auxiliary layer $A$. Comparable to formula (11), $p^{T}_{\text{trans}}(S^{A}_{j})$ can be stated as:

$$p^{T}_{\text{trans}}(S^{A}_{j}) = p(L^{T} = 1 \mid S^{A}_{j}); \quad j \in \{1, 2, \ldots, b_A\} \qquad (13)$$

The empirical value of the trans-layer connection probability is calculated likewise:

$$p^{T}_{\text{trans}}(S^{A}_{j}) = \frac{|S^{A}_{j} \cap E^{[T]}_{\text{train}}|}{|S^{A}_{j}|}; \quad j \in \{1, 2, \ldots, b_A\} \qquad (14)$$

In other words, $p^{T}_{\text{trans}}$ w.r.t. $A$ relates the similarity of node pairs in layer $A$ to their probability of connection in layer $T$. The trans-layer connection probability of four duplexes is depicted in Fig 2.

Fig 2 Trans-layer connection probability in auxiliary layer bins. Trans-layer connection probability in layer (a) 'Train' of Air/Train w.r.t 'Air', (b) 'Function' of Human Brain w.r.t 'Structure', (c) 'Discuss' of Physicians w.r.t 'Advice', (d) 'Additive' of Drosophila w.r.t 'Suppressive' layer.

The bars with dotted lines represent imputed trans-layer connection probabilities, as for the intra-layer connection probabilities in Fig 1. Inspecting the trans-layer connection probabilities of the datasets under study, a rising pattern is evident when moving to bins corresponding to higher similarity ranges. Drosophila (Fig 2-d) is an exceptional case, where similarity in the auxiliary (Additive) layer shows no correlation with connection in the target (Suppressive) layer. Apart from such irregularities in the data, the available evidence suggests that in most real multiplex networks, the probability of connection in one (target) layer is positively correlated with similarity in another (auxiliary) layer, i.e. higher similarity in the auxiliary layer can signal a higher connection probability in the target layer. This observation supports the claim that, for link prediction in the target layer, not only the similarity of nodes in that same layer but also their similarity in an auxiliary layer can be used. Note that this rising pattern in $p_{\text{trans}}$ is observed in almost all datasets under scrutiny, independently of the choice of similarity measure. This property of the trans-layer connection probability lies at the heart of the current study, shaping the main idea of the proposed multiplex link prediction method. Furthermore, by simultaneously partitioning $U'$ based on similarity in both the target and auxiliary layers, we obtain $b_T \times b_A$ partitions, or 2d-bins. Within each 2d-bin, the fraction of target-layer links among all node pairs it contains, i.e. the empirical connection probability in the target layer, is computed. Fig 3 presents the empirical probability of connection in 2d-bins for the same duplexes as in Fig 2.

Fig 3 Empirical probability of connection in 2d-bins. Panels: (a) $P_{\text{emp}}^{\text{Air,Train}}$, (b) $P_{\text{emp}}^{\text{Structure,Function}}$, (c) $P_{\text{emp}}^{\text{Advice,Discuss}}$, (d) $P_{\text{emp}}^{\text{Suppressive,Additive}}$. NaN values represent 2d-bins that contain no sample pairs. (a) 'Train' of Air/Train w.r.t 'Air', (b) 'Function' of Human Brain w.r.t 'Structure', (c) 'Discuss' of Physicians w.r.t 'Advice', (d) 'Additive' of Drosophila w.r.t 'Suppressive' layer.
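The 2d-bin grid behind a figure of this kind can be sketched as follows. This is a hedged illustration, not the authors' code: `quantile_bins`, `empirical_2d` and the toy scores are assumptions. The grid is indexed as `grid[auxiliary_bin][target_bin]`, matching a vertical auxiliary axis and a horizontal target axis, and empty 2d-bins are reported as `None` where the figure shows NaN.

```python
import bisect


def quantile_bins(scores, b):
    """Equal-depth bin index (0..b-1) per score; ties share a bin."""
    ordered = sorted(scores)
    n = len(ordered)
    if n < b:
        raise ValueError("need at least as many scores as bins")
    cuts = [ordered[(k * n) // b - 1] for k in range(1, b)]
    return [bisect.bisect_left(cuts, s) for s in scores]


def empirical_2d(target_scores, aux_scores, linked, b_t, b_a):
    """Fraction of target-layer links in every (auxiliary, target) 2d-bin."""
    t_bins = quantile_bins(target_scores, b_t)
    a_bins = quantile_bins(aux_scores, b_a)
    total = [[0] * b_t for _ in range(b_a)]
    positive = [[0] * b_t for _ in range(b_a)]
    for i, j, is_link in zip(t_bins, a_bins, linked):
        total[j][i] += 1
        positive[j][i] += int(is_link)
    return [[positive[j][i] / total[j][i] if total[j][i] else None
             for i in range(b_t)] for j in range(b_a)]


# Toy data: 8 node pairs scored in both layers, with target-layer link labels.
target_scores = [1, 2, 3, 4, 5, 6, 7, 8]
aux_scores = [1, 5, 2, 6, 3, 7, 4, 8]
linked = [0, 0, 0, 1, 0, 1, 1, 1]
grid = empirical_2d(target_scores, aux_scores, linked, b_t=2, b_a=2)
```

In this toy grid the link fraction grows along both axes, which is the pattern the text reports for most real duplexes.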

Several results can be inferred from Fig 3. The increase of the empirical connection probability along the horizontal axis expresses the effectiveness of the similarity measure in the target layer; the higher the bin number, the larger the fraction of node pairs that have formed links. Another aspect of the figure is the increase of the empirical connection probability with the bin number of the auxiliary layer, i.e. along the vertical axis (except for Drosophila in Fig 3-d), which is a sign of positive correlation between the probability of connection in the target layer and similarity in the auxiliary layer; this is fully consistent with Figs 1 and 2. This correlation between cross-layer connection and similarity is observed in the majority of the datasets under study, a subset of which is presented above.

A subtler observation is that the empirical connection probability rises at a different pace along target similarity (horizontal axis) than along auxiliary similarity (vertical axis). On the basis of the evidence currently available, it seems fair to suggest that, although growth of the trans-layer connection probability increases the empirical probability of connection in the target layer, intra-layer similarity raises it faster. Intra-layer similarities therefore appear to play a more important role than trans-layer similarities, and the proposed model will later treat the intra-layer connection probability as a stronger signal than its trans-layer counterpart. The following sub-sections address how to estimate the probability of connection in the target layer of a multiplex network by incorporating other layers' structural information in a systematic way that generalizes beyond specific data.

Fusion of Decisions

Consider two independent decision makers that estimate the probability of occurrence of a certain event, corresponding to a binary random variable. They declare probabilities $p$ and $q$ (where $p, q \in [0, 1]$), respectively, for the same event. One would like to reach a consensus based on these two different opinions. This can be achieved with various functions that operate on the input probabilities. The AND operator is one such function:

$\mathrm{AND}(p, q) = pq$ (15)

Another option is the OR operator, defined as:

$\mathrm{OR}(p, q) = p + q - pq$ (16)

If the opinion of one of the decision makers is superior to the other one, the OR operator can be easily modified by employing a weight parameter  : weighted OR ( , ) ( ) p q p q pq    (17)
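The fusion operators can be tried out numerically with a small sketch. The AND and OR functions follow equations (15) and (16). The weighted variant is only one possible parameterization, assumed here for illustration and not necessarily the exact form of equation (17): it returns the first opinion alone when the weight is 0 and the plain OR when the weight is 1. All function names are hypothetical.

```python
def fuse_and(p: float, q: float) -> float:
    """AND of two independent opinions, equation (15)."""
    return p * q


def fuse_or(p: float, q: float) -> float:
    """OR of two independent opinions, equation (16)."""
    return p + q - p * q


def fuse_weighted_or(p: float, q: float, w: float) -> float:
    """ASSUMED weighted OR: p + w * (q - p * q).

    w = 0 ignores the second opinion q entirely; w = 1 gives the plain OR.
    This form is an assumption of the sketch, chosen because it keeps p as
    the stronger opinion for every w in [0, 1]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return p + w * (q - p * q)
```

For example, with `p = 0.5` and `q = 0.4` the AND operator gives 0.2 and the OR operator gives 0.7, while the assumed weighted form moves from 0.5 to 0.7 as the weight grows from 0 to 1.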

The OR operator is the more interesting function in the context of the current research, for two reasons: (1) it fits the link prediction problem much better, as it is less prone to variations in only one of the input probabilities; (2) its weighted form provides a parameter to control the superiority of one of the input opinions. We will return to the fusion of decisions in the following sub-section when characterizing the link prediction model.

The Multiplex Link Prediction Model

On these grounds, a model is suggested to predict the probability of connection between node pairs in a layer $T$ of the multiplex network, incorporating information both from the layer itself and from some other auxiliary layer $A$. The similarity between two distinct nodes $x$ and $y$ is defined as:

$SB_{xy}^{T,A} = -I(L_{xy}^{T} = 1 \mid S_i^{T}, S_j^{A}); \quad (x, y) \in S_i^{T} \cap S_j^{A}$ (18)

where $I(L_{xy}^{T} = 1 \mid S_i^{T}, S_j^{A})$ is the uncertainty of the existence of a link between $(x, y)$ in the target layer when their target and auxiliary bin numbers are known. According to equation (4), we can write:

$I(L_{xy}^{T} = 1 \mid S_i^{T}, S_j^{A}) = I(L_{xy}^{T} = 1) - I(L_{xy}^{T} = 1; S_i^{T}, S_j^{A})$ (19)

The first term in equation (19) can be derived by incorporating equation (2):

$I(L_{xy}^{T} = 1) = -\log p(L_{xy}^{T} = 1) = -\log(S_{xy}^{T})$ (20)

where $S_{xy}^{T}$ is the min-max normalized similarity score of the pair $(x, y)$ in target layer $T$; intuitively, the probability of connection in the target layer (without any knowledge of the bin partitioning) is estimated by the similarity in that same layer. The second term in equation (19) is the mutual information between $(x, y)$ being connected in the target layer and belonging to the bins $S_i^{T}$ and $S_j^{A}$, which is estimated as follows:

$I(L_{xy}^{T} = 1; S_i^{T}, S_j^{A}) \approx I(L^{T} = 1; S_i^{T}, S_j^{A})$ (21)

Equation (21) propounds the view that node pairs dwelling in the same known target and auxiliary bins can be treated alike. To be more specific, if the goal is to obtain the mutual information between the event that $(x, y)$ are connected and the event that the pair resides in both $S_i^{T}$ and $S_j^{A}$, a possible workaround is to estimate it by the reduction in the uncertainty of connection of any node pair that is due to the (target and auxiliary) bins it belongs to. Thus, according to equation (4), we proceed by expanding the right-hand side of equation (21):

$I(L^{T} = 1; S_i^{T}, S_j^{A}) = I(L^{T} = 1) - I(L^{T} = 1 \mid S_i^{T}, S_j^{A})$ (22)

The term $I(L^{T} = 1)$ in equation (22) is the self-information of the event that a randomly chosen node pair is linked in target layer $T$. Clearly, $I(L^{T} = 1)$ is the same for every node pair in the multiplex network; therefore, it does not affect the scoring (the ranking of node pairs) and can be safely neglected. Thus, to complete the model specification, $I(L^{T} = 1 \mid S_i^{T}, S_j^{A})$ needs to be calculated, which is the conditional self-information of the event that a randomly chosen node pair is linked in layer $T$ when the pair's bins in the target and auxiliary layers are known. Using equation (2) we have $I(L^{T} = 1 \mid S_i^{T}, S_j^{A}) = -\log p(L^{T} = 1 \mid S_i^{T}, S_j^{A})$. On the basis of our discussion on fusion of decisions, the probability $p(L^{T} = 1 \mid S_i^{T}, S_j^{A})$ for any random node pair $(x, y)$ which is a member of $S_i^{T} \cap S_j^{A}$ is estimated by incorporating $p_{intra}(S_i^{T})$, i.e. the intra-layer connection probability in target layer $T$, and $p_{trans}^{T}(S_j^{A})$, i.e. the trans-layer connection probability in $T$ w.r.t. auxiliary layer $A$. Therefore, similar to equation (17), the weighted OR of the intra-layer and trans-layer connection probabilities yields:

$p(L^{T} = 1 \mid S_i^{T}, S_j^{A}) \approx P_{est,ij}^{T,A}$ (23)

where $P_{est,ij}^{T,A}$ is the weighted OR of $p_{intra}(S_i^{T})$ and $p_{trans}^{T}(S_j^{A})$, with a weight that depends on the auxiliary bin number $j$ and the number of auxiliary bins $b_A$. This weight is a non-linear term that meets the desired properties discussed in the closing paragraphs of the 'Intra-layer and Trans-layer Correlations' sub-section: (1) In lower auxiliary bins, $p_{intra}$ plays a more important role than $p_{trans}$. At the extreme, in the lowest auxiliary bin, i.e. where similarity in the auxiliary layer is minuscule, the effect of $p_{trans}$ is entirely neglected. (2) When similarity grows stronger in the auxiliary layer, the weight converges to the value that balances the influence of $p_{intra}$ and $p_{trans}$ on the connection probability estimation.

Fig 4-a, c shows the values of $P_{est}^{Air,Train}$ in Air/Train and $P_{est}^{Suppressive,Additive}$ in Drosophila based on equation (23), whose empirical counterparts (in the same train/test phase), $P_{emp}^{Air,Train}$ and $P_{emp}^{Suppressive,Additive}$, were presented in Fig 3-a, d, respectively. Fig 4-b, d gives the distance matrices of estimated and empirical connection probability in 2d-bins for Air/Train and Drosophila, $D_{est,emp}^{Air,Train} = |P_{est}^{Air,Train} - P_{emp}^{Air,Train}|$ and $D_{est,emp}^{Suppressive,Additive}$ (defined similarly), respectively. The distances corresponding to NaN (Not a Number) entries of $P_{emp}^{Air,Train}$ and $P_{emp}^{Suppressive,Additive}$ are assumed to be zero, as no empirical connection probability is computed for them for lack of sample node pairs.

Fig 4 Estimated probability of connection in 2d-bins and their distance with empirical counterparts. Panels: (a) $P_{est}^{Air,Train}$, (b) $D_{est,emp}^{Air,Train}$, (c) $P_{est}^{Suppressive,Additive}$, (d) $D_{est,emp}^{Suppressive,Additive}$. Estimation of connection probability in (a) 'Train' w.r.t. 'Air' of Air/Train, (c) 'Suppressive' w.r.t. 'Additive' of Drosophila. Distance matrix between estimated and empirical probability of connection in (b) 'Train' w.r.t. 'Air' of Air/Train, (d) 'Suppressive' w.r.t. 'Additive' of Drosophila.

Fig 4-b, d demonstrates that the estimated connection probability matrices are similar to their empirical counterparts, as only a few intense colors (large differences) can be observed in the distance matrices. Figs 3 and 4 belong to a single training phase (iteration); therefore, a quantitative measure is needed to analyze the general estimation quality. We define a notion of dissimilarity as:

$d_{emp,est}^{T,A} = \frac{\| P_{emp}^{T,A} - P_{est}^{T,A} \|_F}{\| P_{emp}^{T,A} \|_F}$ (24)

where $\| Q \|_F$ is the Frobenius norm of matrix $Q$, defined as follows [47]:

$\| Q \|_F = \sqrt{\sum_i \sum_j |q_{ij}|^2}$ (25)

The result of equation (24), i.e. $d$, is 0 if $P_{emp}^{T,A}$ and $P_{est}^{T,A}$ match completely and gets close to 1 as they become sufficiently different (the extreme case happens when $P_{est}^{T,A}$ is a zero matrix of the same size). In Table 2 the dissimilarity measure defined by equation (24) is computed for the multiplex layer pairs of the networks under study. In most cases the dissimilarity is moderately low, which indicates that our estimation can represent the underlying empirical connection probability in 2d-bins without over-fitting.

Table 2 Estimation quality for multiplex layer pairs (duplexes).

Column $\bar{d}$ represents the dissimilarity between empirical and estimated connection probabilities in 2d-bins, averaged over 1000 iterations. Rows are target/auxiliary layer pairs of each network. Target layers: Air, Train (AT); Electric, Chem-Mono, Chem-Poly (C. ELEGANS); Suppressive, Additive (DM); Structure, Function (HB); Advice, Discuss, Friendship (PHYSICIANS); Communi., Financial, Operation, Trust (NTN); Tube, Overground, DLR (LONDON TRANS); Lunch, Facebook, Co-author, Leisure, Work (CS-AARHUS).
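The dissimilarity measure, a Frobenius norm of the difference divided by the Frobenius norm of the empirical matrix, can be sketched as follows. The sketch assumes matrices given as nested lists and treats empty 2d-bins (None in the empirical matrix) as contributing zero to both norms, in the spirit of the zero distance assumed for NaN entries in Fig 4; that treatment and the function names are assumptions rather than details confirmed by the text.

```python
import math
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def frobenius_norm(matrix: List[List[float]]) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def dissimilarity(p_emp: Matrix, p_est: List[List[float]]) -> float:
    """Relative Frobenius distance between empirical and estimated
    connection probabilities in 2d-bins."""
    diff, emp = [], []
    for row_emp, row_est in zip(p_emp, p_est):
        diff_row, emp_row = [], []
        for e, s in zip(row_emp, row_est):
            if e is None:          # empty 2d-bin: skipped (assumption)
                diff_row.append(0.0)
                emp_row.append(0.0)
            else:
                diff_row.append(e - s)
                emp_row.append(e)
        diff.append(diff_row)
        emp.append(emp_row)
    denom = frobenius_norm(emp)
    if denom == 0.0:
        raise ValueError("empirical matrix has zero norm")
    return frobenius_norm(diff) / denom
```

With this definition, identical matrices give 0 and an all-zero estimate gives 1, matching the two limiting cases described for the measure.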

Putting it all together, we plug equations (19)-(23) into equation (18), which results in the final scoring scheme. Thus, the SimBins similarity score of a node pair $(x, y)$ in target layer $T$ with the aid of auxiliary layer $A$, where $(x, y) \in S_i^{T} \cap S_j^{A}$; $i \in \{1, \dots, b_T\}$, $j \in \{1, \dots, b_A\}$ and $T, A \in \{1, \dots, M\}$; $T \neq A$, is (empirical values of the intra-layer and trans-layer connection probabilities are used):

$SB_{xy}^{T,A} = S_{xy}^{T} \cdot P_{est,ij}^{T,A}$ (26)

where $P_{est,ij}^{T,A}$ is the weighted OR of $p_{intra}(S_i^{T})$ and $p_{trans}^{T}(S_j^{A})$ from equation (23).
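As a consistency check, the substitutions can be written out in one chain. The sketch below assumes only equations (18)-(22), equation (2) in the form $I(\cdot) = -\log p(\cdot)$, and the sign convention in (18) that lower uncertainty means a higher score. The closing remark on the product form is an observation about ranking, not a quotation of the original text.

```latex
\begin{align*}
SB_{xy}^{T,A}
  &= -I(L_{xy}^{T}=1 \mid S_i^{T}, S_j^{A})
     && \text{by (18)} \\
  &= -I(L_{xy}^{T}=1) + I(L_{xy}^{T}=1;\, S_i^{T}, S_j^{A})
     && \text{by (19)} \\
  &\approx \log S_{xy}^{T} + I(L^{T}=1) - I(L^{T}=1 \mid S_i^{T}, S_j^{A})
     && \text{by (20)--(22)} \\
  &= \log S_{xy}^{T} + \log p(L^{T}=1 \mid S_i^{T}, S_j^{A})
     + \underbrace{I(L^{T}=1)}_{\text{same for all pairs}}
     && \text{by (2)}
\end{align*}
```

Because the last term is constant and the logarithm is monotonic, ranking node pairs by this expression gives the same order as ranking them by the product of $S_{xy}^{T}$ and $p(L^{T}=1 \mid S_i^{T}, S_j^{A})$.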

Now that our multiplex scoring model is complete, we will proceed by evaluating the method on the datasets introduced earlier.
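The scoring pipeline of this sub-section can be sketched end to end under clearly stated assumptions. The weighted OR is taken as `p + w * (q - p * q)` and the weight as `1 - 2 ** (-j)` for a zero-based auxiliary bin index `j`; both are hypothetical stand-ins chosen only because they satisfy the two properties stated for the weight (no trans-layer effect in the lowest bin, convergence to a balanced OR in higher bins). They are not the original formulas, and the function names are illustrative.

```python
from typing import Callable, List, Sequence


def bin_probabilities(bins: Sequence[int], linked: Sequence[int],
                      n_bins: int) -> List[float]:
    """Fraction of target-layer links among the node pairs of each bin.
    Passing target-layer bins gives p_intra; passing auxiliary-layer bins
    gives p_trans. Empty bins are reported as 0.0."""
    total = [0] * n_bins
    links = [0] * n_bins
    for b, is_link in zip(bins, linked):
        total[b] += 1
        links[b] += int(is_link)
    return [links[k] / total[k] if total[k] else 0.0 for k in range(n_bins)]


def hypothetical_weight(j: int) -> float:
    """Placeholder non-linear weight: 0 in the lowest bin, tends to 1."""
    return 1.0 - 2.0 ** (-j)


def simbins_like_score(s_xy: float, i: int, j: int,
                       p_intra: Sequence[float], p_trans: Sequence[float],
                       weight: Callable[[int], float] = hypothetical_weight
                       ) -> float:
    """Target-layer similarity times the estimated connection probability
    of the 2d-bin (i, j), using the ASSUMED weighted OR."""
    w = weight(j)
    p_est = p_intra[i] + w * (p_trans[j] - p_intra[i] * p_trans[j])
    return s_xy * p_est
```

In use, `p_intra` and `p_trans` would be computed once per training phase with `bin_probabilities`, after which every candidate node pair is scored from its normalized target similarity and its two bin indices.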

Experimental Results

The link prediction performance on the different datasets and their network layers is reported in Table 3. The evaluation metric is the average AUC over the training phases (iterations), with the train ratio set as described in the 'Evaluation Method' section. Five base measures have been incorporated, i.e. AA, RA, PA, CN and ACT, which were explained in the 'Base Similarity Measures' section. SimBins ($SB^{T,A}$) is compared with scoring based on similarity in the target layer ($S^{T}$) and with the simple addition of the similarity scores of the target and auxiliary layers ($S^{T} + S^{A}$).

Table 3 Average AUC over 1000 iterations for the networks under study.

Columns: Target Layer, Auxiliary Layer, then $S^{T}$, $S^{T} + S^{A}$ and $SB_{TA}$ for each base measure (AA, RA, CN, PA, ACT). Layers are listed per network in their original row order, each followed by the AUC values that remain on its row.

AT: Air; Train 83.1 88.3; Train; Air 83.5 84.0
C. ELEGANS: Electric; Chem-Mono 70.8 79.3; Chem-Poly 71.0 84.7; Electric 75.9 76.6; Chem-Poly 76.1; Electric 85.5 85.5; Chem-Mono 85.5 86.7
DM: Suppressive; Additive; Suppressive
HB: Structure; Function 91.2 91.4; Function; Structure 86.0 88.7
PHYSICIANS: Advice; Discuss 71.7; Discuss; Advice 75.0; Friendship; Advice 70.1
NTN: Communi.; Financial; Communi. 90.2; Operation 90.2 80.0; Trust; Communi. 97.8 98.0; Financial; Trust; Communi. 88.8; Financial; Operation 89.0 87.5
LONDON TRANS: Tube; Overground; Tube 50.0; DLR; Tube 53.0; Overground
CS-AARHUS: Lunch; Facebook 94.6 92.1; Co-author; Leisure; Work 94.5 94.5; Lunch 93.5 91.2; Co-author; Lunch 71.3; Facebook; Leisure; Lunch 82.6; Lunch 87.7

For each base measure, the highest average AUC is shown in bold. For each duplex (row), the highest AUC among all of the methods (independent of the choice of base measure) is underlined. SimBins dominates the other two methods and proves to be an effective multiplex link prediction method for several reasons:

(i) Most of the time SimBins is superior to the other methods (in some cases a performance advantage of up to 30% is observed). Consequently, in most duplexes, nearly independent of the base similarity measure, the bold entries belong to SimBins.

(ii) In a large fraction of duplexes, the overall best mean AUC belongs to SimBins.

(iii) SimBins performs better than the single-layer method ($S^{T}$) in most cases, and more frequently than the addition of similarities ($S^{T} + S^{A}$) does, meaning that it is capable of using the other layer's information effectively. Moreover, $SB^{T,A}$ is more robust against deceptive signals than $S^{T} + S^{A}$. Consider Drosophila in Table 3, for example. The slightly negative correlation between similarity in the auxiliary layer (Suppressive) and connection probability in the target layer (Additive), as argued in the discussion of Fig 2-d, reduces the performance of $S^{T} + S^{A}$, whereas SimBins still performs as well as, if not better than, $S^{T}$. A similar outcome can be observed for NTN and London Transport, more clearly when ACT and PA are used as base similarity measures. In CS-Aarhus, where Facebook is the target layer, both $S^{T}$ and $S^{T} + S^{A}$ perform even worse than random scoring (the expected AUC), while SimBins keeps the performance above that level.

There are occasions in which SimBins cannot improve the link prediction performance over the base similarity measure. In Drosophila, the absence of trans-layer correlation discussed earlier is the underlying reason. In London Transport, node multiplexity is far too low, as shown in Table 1; consequently, very few nodes are shared among different layers, which makes the utilization of structural similarities between layers a hard task. Evidently, the AA scores of the Overground and DLR layers in London Transport are almost all zeros, hence the AUC stays at the level of random scoring. In Physicians, the simple addition of the layers' similarities and SimBins perform much the same. Interestingly, the degree correlation between the duplexes of Physicians is very high, e.g. the Pearson degree correlation between the Advice and Discuss layers. Remarkably, the results suggest that choosing AA and RA as base similarity measures leads to the best overall performance in most of the multiplex networks.

Complexity Analysis

Consider a duplex network $G = (V, E^{[1]}, E^{[2]};\ E^{[i]} \subseteq V \times V)$, $m_i = |E^{[i]}|$, $i \in \{1, 2\}$, where one layer is the target and the other is the auxiliary layer. Let $O(\phi)$ be a representative of the computational complexity of the base similarity measures. The similarity of node pairs in both layers is needed for a subset of the node pairs $V \times V$, as formulated in the 'Partitioning Node Pairs (Binning)' section. Therefore, the computing complexity of measuring similarities is $O(\phi \sum_i m_i)$. Partitioning this subset into equal-depth bins requires sorting the similarities and consequently has a complexity of $O(\sum_i m_i \log m_i)$. The total estimation complexity of the intra-layer and trans-layer connection probabilities is $O(\sum_i m_i b_i)$, where $b_i$ is the number of bins in the corresponding layer. The estimation of the probability of connection in all 2d-bins according to equation (23) is of order $O(b_1 b_2)$, which is negligible given the bounded number of bins. Accordingly, the total computational complexity of scoring a node pair in SimBins is $O(m \log m)$, where $m$ is of the same order as $m_1$ and $m_2$ if the sparsity of the multiplex layers is comparable. This tolerable computing complexity indicates that SimBins can be scaled for use in large networks. Notice that, to obtain a full ranking of the propensity of links, SimBins, like the majority of link prediction algorithms, needs at least $O(n^2)$; $n = |V|$ computations, which is not easily scalable to very large networks without pruning the $n^2$ space. To be specific, for a full ranking, SimBins has a computing complexity of $O(n^2 \phi + m \log m)$, in which $O(n^2 \phi)$ is the dominating term in real networks, meaning that SimBins imposes minor overhead on the base similarity measures.

Discussion and Conclusions

In this manuscript, we explored the intra-layer and trans-layer correlations in multiplex networks and verified that in many real multiplex networks, connection probability in one layer is correlated with similarity in another layer of the same multiplex. Subsequently, we developed a link prediction model by incorporating information theory concepts to characterize the intuitions gathered from the observed data. The proposed method works on a pair of the multiplex's layers, i.e. a duplex. Different ideas can be pursued to extend it to use the topology of multiple layers for link prediction. Considering a target layer $T$ and auxiliary layers $A_1, \dots, A_M$, the simplest idea is to add up the SimBins scores of all possible layer pairs, symbolically $SB^{T,\{A_1,\dots,A_M\}} = \sum_{i=1}^{M} SB^{T,A_i}$, where $SB^{T,A_i}$ is computed according to equation (26). The other idea, not as straightforward as the previous one, is to compose and study bins of more than two dimensions. This extension, although more systematic, might suffer from heavy sparsity of samples (imagine node pairs residing in 3d-bins).

In SimBins, a default parameter value is chosen for the number of bins, i.e. node pair partitions. Obviously, the higher the number of bins, the higher the resolution of the estimations; if it is set too high, the efficiency and generalization capability of the method weaken, and if it is set too low, the loss of resolution results in insufficient discrimination. A moderate number of bins is recommended; SimBins shows no significant sensitivity in terms of accuracy within the recommended range.

SimBins was compared with a single-layer method and a multiplex method on real multiplexes: (1) the base similarity measure in the target layer and (2) the simple addition of similarities in the target and auxiliary layers, respectively. It is shown that SimBins outperforms the other two methods in mean AUC in most cases. Besides, it rarely performs worse than target similarity and is more robust to deceptive signals than the simple addition of similarities. In some networks, such as London Transport and Drosophila, SimBins seems to be unprofitable as a result of massively condensed node pair similarity distributions and negative trans-layer correlations. SimBins imposes negligible computational overhead on the base similarity measures. Using an equal-width strategy for partitioning node pairs leads to even more efficiency due to its $O(m)$ complexity (instead of $O(m \log m)$ in equal-depth binning), although the accuracy of prediction might be affected. These issues can be tackled in future related work.
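The trade-off between the two partitioning strategies mentioned above can be made concrete with a small sketch. It assumes zero-based bin indices and plain lists of scores; the function names are hypothetical. Equal-width binning needs one linear pass, whereas equal-depth binning sorts the scores first.

```python
from typing import List, Sequence


def equal_width_bins(scores: Sequence[float], n_bins: int) -> List[int]:
    """Linear-time binning: split [min, max] into n_bins equal intervals."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                      # all scores identical: single bin
        return [0] * len(scores)
    width = (hi - lo) / n_bins
    # The maximum score falls on the upper edge, so clamp it into the last bin.
    return [min(int((s - lo) / width), n_bins - 1) for s in scores]


def equal_depth_bins(scores: Sequence[float], n_bins: int) -> List[int]:
    """Sort-based binning: every bin receives (almost) the same number
    of items, at the cost of an O(m log m) sort."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    bins = [0] * len(scores)
    for rank, k in enumerate(order):
        bins[k] = rank * n_bins // len(scores)
    return bins
```

On a skewed score list such as `[0.0, 0.1, 0.2, 1.0]` with two bins, equal-width binning puts three pairs into the first bin, while equal-depth binning splits them two and two. This illustrates why a heavily condensed similarity distribution may cost resolution under the cheaper strategy.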

Acknowledgements

We wish to thank Dr. Behnam Bahrak for reviewing the current manuscript and providing us with his helpful comments and insights.

References