Tensor-Based Link Prediction in Intermittently Connected Wireless Networks
Mohamed-Haykel Zayani, Vincent Gauthier, Ines Slama, Djamal Zeghlache
Lab. CNRS SAMOVAR UMR 5157, Telecom SudParis, Evry, France
Abstract
Through several studies, it has been highlighted that mobility patterns in mobile networks are driven by human behaviors. This effect has been particularly observed in intermittently connected networks such as Delay Tolerant Networks (DTN). Given that common social intentions generate similar human behavior, it is relevant to exploit this knowledge in the design of network protocols, e.g. to identify the degree of closeness between two nodes. In this paper, we propose a temporal link prediction technique for DTN which quantifies the behavior similarity between each pair of nodes and makes use of it to predict future links. Our prediction method keeps track of the spatio-temporal aspects of node behaviors, organized as a third-order tensor that records the evolution of the network topology. After collapsing the tensor information, we compute the degree of similarity for each pair of nodes using the Katz measure. This metric gives an indication of link occurrence between two nodes based on their closeness. We show the efficiency of this method by applying it to three mobility traces: two real traces and one synthetic trace. Through several simulations, we demonstrate the effectiveness of the technique compared with another approach based on a similarity metric used in DTN. The method remains valid when the scores are computed in a distributed way (i.e. with local information). We attest that the tensor-based technique is effective for temporal link prediction applied to intermittently connected networks. Furthermore, we think that this technique can go beyond the realm of DTN and can be applied in any situation where there is a need to derive the underlying social structure of a network of mobile users.
Keywords:
Link prediction, wireless networks, intermittent connections, tensor, Katz measure, behavior similarity, DTN
1. Introduction
In recent years, extensive research has addressed challenges and problems raised in mobile, sparse and intermittently connected networks (i.e. DTN). In this case, forwarding packets tightly depends on the occurrence of contacts. Since the existence of links is crucial to deliver data from a source to a destination, the contacts and their properties emerge as a key issue in designing efficient communication protocols [1]. Obviously, the occurrence of links is driven by the behavior of the nodes in the network [2]. It has been widely shown in [3, 4] that human mobility is directed by social intentions and reflects spatio-temporal regularity. A node can follow other nodes to a specific location (spatial level) and may exhibit a behavior which is regulated by a schedule (temporal level). The social intentions that govern the behavior of mobile users have also been observed through statistical analyses in [2, 5], which show that the distribution of inter-contact times follows a truncated power law.
Preprint submitted to Elsevier, November 2, 2018

With the intention of improving the performance of intermittently connected wireless network protocols, it is paramount to track and understand the behaviors of the nodes. We aim at proposing an approach that analyzes the network statistics, quantifies the social relationship between each pair of nodes and exploits this measure as a score which indicates whether a link will occur in the immediate future. We strongly believe that the social ties between nodes largely govern the status of a link and provide an indication for link prediction: a link would never occur if two nodes have no common social interactions, and would rather be effective and lasting with more correlated moving patterns.

In this paper, we adapt a tensor-based link prediction algorithm successfully designed for the data-mining context [6, 7]. Our proposal records the network structure for T time periods and predicts link occurrences for the (T+1)th period. This link prediction technique is designed in two steps. First, time-dependent network snapshots are tracked in adjacency matrices which form a tensor. Second, the Katz measure [8], inspired by sociometry, is applied. The link prediction technique computes the degree of behavior similarity of each pair of nodes relying on the tensor obtained in the first step. A high degree of behavior similarity means that the two nodes have the same "social" intentions. These common intentions are expressed by the willingness to meet each other and/or by similar moving patterns to visit the same location. They also promote link occurrence between two socially close nodes in the immediate future (prediction of the period T+1 after tracking the behaviors of nodes during T time periods).

We further discuss how we design the tensor-based prediction method and detail the two main steps in order to achieve link prediction.
On the one hand, we describe how to track the network topology over time with a tensor. On the other hand, we explain how to compute and interpret the Katz measure. We then evaluate the effectiveness of the prediction through several simulation scenarios depending on the nature of the trace (real or synthetic), the number of recording periods and the similarity metric computation, which can be performed in a centralized or distributed way. Besides, to the best of our knowledge, this work is the first to perform the prediction technique in a distributed way. The assessment of its efficiency can be beneficial for the improvement or the design of communication protocols in mobile, sparse and intermittently connected networks.

The paper is organized as follows: Section 2 presents the related work that highlights the growing interest in social analysis and justifies the recourse to tensors and to the Katz measure to perform predictions. In Section 3, we describe the two main steps that characterize our proposal. Section 4 details the simulation scenarios used to evaluate the tensor-based prediction approach, analyzes the obtained results, assesses its efficiency and discusses the described link prediction technique. Finally, we conclude the paper in Section 5.
2. Related Work
Social Network Analysis (SNA) [9, 10] and ad-hoc networking have provided new perspectives for the design of network protocols [11, 12, 13]. These protocols aim to exploit the social aspects and relationship features between the nodes. Studies conducted in the field of SNA have mainly focused on two kinds of concepts: the most well-known centrality metrics suggested in [9, 14, 15, 16] and the community detection mechanisms proposed in [17, 18, 19, 9]. From this perspective, several works have tried to develop synthetic models that aim to reproduce realistic moving patterns [3, 20]. Nonetheless, the study done in [1] has outlined that synthetic models cannot faithfully reproduce human behavior because these models are only location-driven and do not track social intentions explicitly. We consider in this work the Time-Variant Community mobility model (TVC model) [3]. The TVC model depends on two main characteristics that influence the behavior of nodes: geographical location preferences and time-dependent behavior. This design tries to be closer to human-based behavior and implicitly reproduces the social aspects that characterize ad-hoc networks.

Nevertheless, [10] has underlined the limits of these protocols when the network topology is time-varying. The main drawback comes down to their inability to model topology changes, as they are based on graph theory tools. In contrast, tensor-based approaches have been used in some works to build statistics on the behaviors of nodes in wireless networks over time, as in [21].
Thakur et al. [4] have also developed a model using a collapsed tensor that tracks users' location preferences (characterized by probabilities) with a considered time granularity (week days for example) in order to consider the emergence of "behavior-aware" delay tolerant networks.

In this paper, we propose a link prediction technique that tracks the temporal evolution of the network topology in a tensor and computes a metric in order to characterize the social-based behavior similarity of each pair of nodes. Some approaches have addressed the same problem in data mining in order to perform link prediction. Acar et al. [6] and Dunlavy et al. [7] have provided detailed methods based on matrix and tensor factorizations for link prediction in social networks such as the DBLP data set [22]. These methods have been successfully applied to predict a collaboration between two authors relying on the data set of the structures of relationships over time. Moreover, they have highlighted the use of the Katz measure [8], which can be seen as a similarity metric, by assigning a link prediction score to each pair of nodes. The efficiency of the Katz measure in link prediction has also been demonstrated in [23, 24].
3. Description of the Tensor-Based Prediction Method
It has been highlighted that a human mobility pattern shows a high degree of temporal and spatial regularity, and each individual is characterized by a time-dependent mobility pattern and a tendency to return to preferred locations [2, 3, 4]. In order to improve the design of wireless network protocols, especially for intermittently connected networks, it is important to exploit this knowledge since these interactions usually have an impact on the network structure and consequently on the network performance. Thus, in this paper, we propose an approach that aims to exploit similar behavior of nodes in order to predict link occurrence based on social closeness.

Predicting future links based on social closeness is a challenge that is worth investigating. Indeed, a good link prediction technique helps improve the opportunistic forwarding of packets and also enhances the delivery rate and/or decreases latency. Moreover, it helps to avoid situations where packets encumber the queues of nodes that are not able to forward them towards their final destinations.

To quantify the social closeness between each pair of nodes in the network, we use the Katz measure [8], inspired by sociometry. This measure aims at measuring the social distance between persons inside a social network. We also need a structure that records link occurrence between each pair of nodes over a certain period of time in order to perform the similarity measure computation. The records represent the network behavior statistics in time and space. To this end, tensors are used. A tensor Z consists of a set of slices, and each slice corresponds to an adjacency matrix of the network tracked over a given period of time D. After the tracking phase, we reduce the tensor into a matrix (or collapsed tensor) which expresses the weight of each link according to its lifetime and its recentness.
A high weight value in this matrix denotes a link whose corresponding nodes share an important degree of closeness. We apply the Katz measure to the collapsed tensor to compute a matrix of scores $\mathbf{S}$ that considers not only direct links but also indirect links (multi-hop connections). The matrix of scores expresses the degree of similarity of each pair of nodes with respect to the spatial and temporal levels. The higher the score, the stronger the similarity. Therefore, two nodes that have a high similarity score are most likely to share a link in the future.

Scalars are denoted by lowercase letters, e.g., $a$. Vectors are denoted by boldface lowercase letters, e.g., $\mathbf{a}$. Matrices are denoted by boldface capital letters, e.g., $\mathbf{A}$. The $r$th column of a matrix $\mathbf{A}$ is denoted by $\mathbf{a}_r$. Higher-order tensors are denoted by bold Euler script letters, e.g., $\mathcal{T}$. The $n$th frontal slice of a tensor $\mathcal{T}$ is denoted $\mathbf{T}_n$. The $i$th entry of a vector $\mathbf{a}$ is denoted by $\mathbf{a}(i)$, element $(i, j)$ of a matrix $\mathbf{A}$ is denoted by $\mathbf{A}(i, j)$, and element $(i, j, k)$ of a third-order tensor $\mathcal{T}$ is denoted by $\mathbf{T}_i(j, k)$.

The computation of the similarity scores is modeled through two distinct steps. First, we store the inter-contacts between nodes in a tensor $\mathcal{Z}$ and reduce it to a matrix $\mathbf{X}$ called the collapsed tensor. In a second step, we compute the matrix of similarity scores $\mathbf{S}$ relying on the matrix $\mathbf{X}$ (cf. Fig. 1).

We consider that the data is collected into the tensor $\mathcal{Z}$. The entry $\mathbf{Z}_t(i, j)$ describes the status of a link between a node $i$ and a node $j$ during the time period $[tD, (t+1)D)$, where $\mathbf{Z}_t(i, j)$ is 1 if the link exists during the time period $D$ and 0 otherwise. The tensor is formed by a succession of adjacency matrices $\mathbf{Z}_0$ to $\mathbf{Z}_T$, where the subscript designates the observed period. To collapse the data into one matrix as done in [6, 7], we choose to compute the collapsed weighted tensor (which is more efficient than the collapsed tensor, as shown in [6] and [7]).
The link structure is considered over time, and the more recent the adjacency matrix is, the more weight it receives:

$$ \mathbf{X}(i, j) = \sum_{t=0}^{T} (1 - \theta)^{T-t} \, \mathbf{Z}_t(i, j) \tag{1} $$

where the matrix $\mathbf{X}$ is the collapsed weighted tensor of $\mathcal{Z}$, and $\theta$ is a parameter between 0 and 1 used to adjust the weight of recentness.

The Katz measure, which stems from sociometry, was first proposed by Leo Katz in [8]. He considers a social network as an undirected graph $G = (V, E)$ where $V = \{v_1, v_2, \ldots, v_k\}$ is a finite set of vertices, each representing a person, and $E = \{e_1, e_2, \ldots, e_k\}$ is a finite set of edges, each representing a connection (or an endorsement) between two persons. We denote a subset $P_\ell(v_i, v_j) = \{e_1, e_2, \ldots, e_\ell\} \subset E$ as a path of length $\ell$ between nodes $v_i$ and $v_j$. The score that characterizes the couple $(v_i, v_j)$ is defined by the weight of the paths $P_\ell(v_i, v_j)$ connecting person $v_i$ to person $v_j$, $\forall v \in V$.

Katz defined his metric between two nodes, $\mathbf{S}(i, j)$, as given in Eq. (2). It is a function that decreases with the path length $\ell$ of $P_\ell(v_i, v_j)$. Katz did so in order to emphasize the fact that the strength of endorsements fades over successive chains of recommendations. This measure can be seen as a generalization of followers (as in Twitter) or of an indegree measure. It indicates the number of votes of a person as well as the identity of the voter (his vote is valuable compared to the number of votes he receives). This metric is widely used in studies which aim to predict link occurrence [6, 7, 24], especially in social networks such as the co-authorship communities of the DBLP [22] and arXiv [25] databases. Given that there are "social relationships" between nodes in networks with intermittent connections, it is challenging to exploit this measure and to apply it to collected data.
Therefore, the Katz score of a link between a node $i$ and a node $j$ is given by [8] as:

$$ \mathbf{S}(i, j) = \sum_{\ell=1}^{+\infty} \beta^{\ell} \, P_\ell(v_i, v_j) \tag{2} $$

where $\beta$ is a user-defined parameter strictly greater than zero and $\beta^{\ell}$ is the weight of a path of $\ell$ hops. It is clear that the longer the path is, the lower the weight gets. There is also another formulation to compute Katz scores by means of the collapsed weighted tensor detailed previously. The score matrix $\mathbf{S}$ can then be rewritten as:

$$ \mathbf{S} = \sum_{\ell=1}^{+\infty} \beta^{\ell} \cdot \mathbf{X}^{\ell} = (\mathbf{I} - \beta \cdot \mathbf{X})^{-1} - \mathbf{I} \tag{3} $$

where $\mathbf{I}$ is the identity matrix and $\mathbf{X}$ is the obtained collapsed weighted tensor.

In Fig. 1, we provide an example that describes the two main steps of the link prediction technique. We consider a network of 4 nodes whose topology is dynamic over time. At each period t (from 0 to 3), each link that occurred is caught in the corresponding adjacency matrix. All adjacency matrices form the tensor. The latter structure is used to determine the collapsed weighted tensor by computing Eq. (1) (with θ set to 0.2) for each pair of nodes. Then, the matrix of scores is computed by applying Eq. (3) (with β set to 0.001) to the collapsed weighted tensor.

Figure 1: Example of the matrix S computation. Rows and columns are indexed by nodes 1 to 4; matrix rows are separated by semicolons.
(1) Collect the adjacency matrix over successive periods of time:
t=0: [0 1 1 1; 1 0 0 0; 1 0 0 1; 1 0 1 0]
t=1: [0 0 1 1; 0 0 0 1; 1 0 0 1; 1 1 1 0]
t=2: [0 0 1 0; 0 0 1 1; 1 1 0 0; 0 1 0 0]
t=3: [0 1 1 1; 1 0 0 1; 1 0 0 0; 1 1 0 0]
(2) Collapse the different slices into one matrix (cf. Eq. 1):
X = [0 1.512 2.952 2.152; 1.512 0 0.8 2.44; 2.952 0.8 0 1.152; 2.152 2.44 1.152 0]
(3) Compute the Katz scores (cf. Eq. 3):
S = [0 0.0015 0.003 0.0022; 0.0015 0 0.0008 0.0024; 0.003 0.0008 0 0.0012; 0.0022 0.0024 0.0012 0]

The measure goes beyond estimating a link weight between two nodes. Indeed, it takes into consideration all possible paths between two nodes and then quantifies the social relationship between them. As described previously, when two nodes are connected through short paths, the score characterizing this pair is high. Hence, the score can be treated as the similarity of the nodes' moving patterns, in view of the fact that the nodes remain in each other's vicinity (short paths). When two nodes share a high score, this means that their behaviors are similar and that they are geographically quite close. Therefore, a link occurrence between them is very likely.

The relationship between each pair of nodes is expressed by a score S(i, j), which reflects the degree of similarity between node i and node j. As mentioned in the Katz measure analysis, shorter paths lead to higher scores. Thus, two nodes that share a high score are nodes that are connected through short paths during some period of time and therefore have similar behaviors (similar social intentions). The similarity here is related to common preferences in space and time. Two nodes maintain their connectivity when they move in the same direction and at the same time. Therefore, these scores can be considered as indicators of a possible link existence in the future.
Thus, the link prediction is done by measuring the behavior similarity of each pair of nodes in the matrix S.

The computation of the matrix S, as described before, is done in a centralized way. It means that the matrix S is computed based on full knowledge of the network topology over time. This may not be suitable for ad hoc wireless networks, where no central entity is considered, and could in addition be very costly. A distributed mechanism should then be examined. In a distributed mechanism, each node would apply the prediction method relying only on information related to its nearest neighbors. It is important to remember that the Katz formulation gives more weight to short paths and assigns low scores to long paths. Therefore, the scores with neighbors located a few hops away should be sufficient and strong enough compared to scores with more distant ones. This hypothesis will be discussed in Section 4.
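As an illustration of the two steps, the following Python sketch reproduces the example of Fig. 1 with NumPy. The slices are transcribed from the figure, and θ = 0.2 and β = 0.001 are the values used in the text. The function names (collapse_weighted, katz_scores, local_katz_row) are illustrative and do not come from the original description. The last function is only one possible reading of a distributed computation: it assumes that a node restricts Eq. (3) to the sub-matrix of X covering its two-hop neighborhood. Note also that the closed form of Eq. (3) holds only when β is smaller than the inverse of the largest eigenvalue of X, so that the series converges.

```python
import numpy as np

# Adjacency matrices of the 4-node example of Fig. 1, one slice per period t = 0..3.
Z = np.array([
    [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0]],  # t = 0
    [[0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 0]],  # t = 1
    [[0, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]],  # t = 2
    [[0, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 0]],  # t = 3
], dtype=float)


def collapse_weighted(Z, theta):
    """Eq. (1): X = sum_t (1 - theta)^(T - t) * Z_t, with T the last period index."""
    T = Z.shape[0] - 1
    weights = (1.0 - theta) ** (T - np.arange(T + 1))
    # Weighted sum of the slices along the time axis.
    return np.tensordot(weights, Z, axes=1)


def katz_scores(X, beta):
    """Eq. (3): S = (I - beta * X)^(-1) - I (assumes beta < 1 / largest eigenvalue of X)."""
    I = np.eye(X.shape[0])
    return np.linalg.inv(I - beta * X) - I


def local_katz_row(X, i, beta, hops=2):
    """Hypothetical local variant: Katz scores of node i computed only on the
    nodes reachable from i within `hops` hops in the collapsed matrix X."""
    reach = np.zeros(X.shape[0], dtype=bool)
    reach[i] = True
    for _ in range(hops):
        # Add every node that has a non-zero weight with an already reached node.
        reach |= (X[reach] > 0).any(axis=0)
    idx = np.flatnonzero(reach)
    S_local = katz_scores(X[np.ix_(idx, idx)], beta)
    row = np.zeros(X.shape[0])
    row[idx] = S_local[int(np.searchsorted(idx, i))]
    return row


X = collapse_weighted(Z, theta=0.2)
S = katz_scores(X, beta=0.001)
print(np.round(X, 3))
print(np.round(S, 4))
```

With these inputs, rounding X to three decimals and S to four should give the matrices shown in Fig. 1. Since every node of this small example lies within two hops of all the others, the local row returned by local_katz_row coincides with the corresponding row of the centralized matrix S.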
4. Performance Evaluation and Simulation Results
To evaluate how efficient the tensor-based link prediction is in intermittently connected wireless networks, we consider three different traces (two real traces and one synthetic trace). For each scenario, we compute the corresponding scores matrix S as described earlier and assess the efficiency of the link prediction method through evaluation techniques. In the following, we first present the traces used for the link prediction evaluation. Then, we present the corresponding results and analyze the effectiveness of the prediction method.

Table 1: Major Parameters Used in TVC Model
Parameter                                   Value
Simulation Area Edge Length                 1000 meters
Network Nodes Number                        100 nodes
Network Nodes Range                         75 meters
Network Geographical Communities Number     2
Maximum Nodes Speed                         15 m/s
Minimum Nodes Speed                         5 m/s
Average Nodes Speed                         10 m/s
We consider three traces to evaluate the link prediction approach. Two of them are real traces and the third is synthetic. We exploit them to construct the tensor by generating adjacency matrices (with different time periods t: 5, 10, 30 and 60 minutes). In each case, we track the required statistics about node behavior within T periods. We also consider the adjacency matrix corresponding to the period T+1 as a benchmark to evaluate the Katz scores matrix. We detail the traces used in the following.

• First Trace: Dartmouth Campus trace: We choose the trace of 01/05/06 [26] and construct the tensor slices relying on SYSLOG traces between 8 a.m. and midday (4 hours). The number of nodes is 1018.
• Second Trace: MIT Campus trace: We focus on the trace of 07/23/02 [27] and also consider the events between 8 a.m. and midday to build up the tensor. The number of nodes is 646.
• Third Trace: TVC Model trace: In this scenario, we use the trace generator proposed by Hsu et al. [28], which reproduces the concept of the TVC model. We consider a square simulation area with an edge length equal to 1000 meters, in which 100 nodes are in motion. We randomly generate two locations as the nodes' geographical preferences and keep the community switching and roaming probabilities as in the example provided with the generator. As in the other scenarios, we track node behavior during 4 hours. Table 1 summarizes the main parameters considered in generating TVC Model traces.

For each scenario, we generate adjacency matrices corresponding to a different period t: 5, 10, 30 and 60 minutes. Then, to record the network statistics over 4 hours, the tensor has a number of slices T equal to 48, 24, 8 and 4, respectively (for the case where t = 5 minutes, 48 periods are needed to cover 4 hours). As mentioned earlier, we take into account both centralized and distributed cases for the computation of scores.

• The Centralized Computation: The centralized way assumes that there is a central entity which has full knowledge of the network structure at each period and applies the Katz measure to the global adjacency matrices.
• The Distributed Computation: Each node has a limited knowledge of the network structure. We assume that a node is aware of its two-hop neighborhood. Hence, the computation of Katz measures is performed on the basis of local information.

In both cases, we fix θ and β to 0.2 and 0.001 respectively. Later, we explain why we choose these values.

As described in the previous section, we apply the link prediction method to the three types of traces while considering the different tensor slice periods in both centralized and distributed cases. In order to assess the efficiency of this method, we consider several link prediction scenarios (according to the trace, the tensor slice period and the way the scores are computed) and we use different evaluation techniques (ROC curves, CDF curves, AUC metric and top scores ratio at T+1). We detail in the following the results obtained with each evaluation technique and analyze the link prediction efficiency.

Fig. 2, 4 and 6 depict the ROC (Receiver Operating Characteristic) curves [29] for both distributed and centralized computing approaches, obtained respectively from the Dartmouth Campus trace, the MIT Campus trace and the TVC model trace. For each trace figure, curves (a), (b), (c) and (d) correspond to a tensor slice time of 5, 10, 30 and 60 minutes respectively.

We first notice that, for all scenarios, the prediction of all links is quite efficient compared to a random guess (the curves bend towards the upper left corner). Moreover, two other observations have to be mentioned in the case of real traces (Dartmouth Campus and MIT Campus traces). First, the smaller the tensor slice (adjacency matrix) period is, the more reliable the prediction gets. This observation can be explained by two reasons. On the one hand, with a short tensor slice time, the probability of tracking a short and occasional contact between two nodes is low.
On the other hand, recording four hours of statistics requires 48 adjacency matrices for 5-minute periods instead of 4 matrices for 60-minute periods. Thus, tracking a short contact between two nodes has less influence when the tensor slices are more numerous. As an example, when the tensor slice time is 5 minutes, a fleeting contact is caught by one adjacency matrix among 48. However, when the slice time is 60 minutes, the fleeting contact is tracked by one tensor slice among 4, which gives it significantly more weight than in the former case. Hence, short tensor slice periods enable us to minimize the probability of tracking a short contact and to restrict its impact.

Short tensor slice periods also allow us to better track the social interactions (meetings in a cafeteria, courses in an amphitheater, etc.) between nodes which determine the occurrence of links. Successive adjacency matrices of 5 minutes give a more accurate description of the network structure over time, as both analyzing and identifying these social events are easier through smaller periods.

The second observation concerns the similar results obtained with the centralized and distributed computations of the matrix of scores. In fact, the similarity is higher when the paths considered between a pair of nodes are short. Thereby, paths that have more than two hops have weaker scores and so carry less weight than shorter ones. The distributed case assumes that each node knows its neighbors at most two hops away. That is why the distributed scores computation presents performance so similar to the centralized one.

Regarding the results obtained from the synthetic trace (TVC model trace), there are no significant differences between the ROC curves as the tensor slice period varies (especially for the scenarios where the period t is higher than 5 minutes).
On top of having the same performance with the two ways of computing the scores matrix, changing the adjacency matrix time period does not impact the link prediction efficiency. This observation could be explained by the conclusion drawn by Hossmann et al. in [1], which outlines that location-driven mobility models do not account for social intentions. In addition, through the proposed behavior similarity metric between a pair of nodes, Thakur et al. [4] show that the TVC model limits moving patterns to visiting preferred locations and does not account for any social coordination. With the TVC model, movement patterns are the same for all nodes (moving into two geographical communities) and repetitive (the chosen moving speed is between 5 and 15 m/s with an average speed of 10 m/s). They are only regulated by geographical preferences (each node visits the preferred community with an "individual willingness" and there is no correlation between its moving pattern and those of other nodes). Therefore, having several tensor slices is not different from considering fewer ones. Moreover, the adjacency matrix at T+1 in each scenario is nearly the same.

In order to highlight the impact of the choice of the period t on the link prediction, we represent in Fig. 3, 5 and 7 the skewed Cumulative Distribution Function (CDF) of the scores obtained respectively for the Dartmouth Campus trace, the MIT Campus trace and the TVC model trace (only strictly positive scores are considered). In each trace's CDF figure, (a) and (b) correspond to the distributed and centralized scores matrix computation respectively. The results obtained for real traces (Dartmouth Campus and MIT Campus) show that the spread of the distribution is narrower when the period t is larger. In fact, in a CDF with a wider spread (especially in the case of t = 5 min), high scores that express link occurrence prediction are easier to identify. Conversely, when the interval of scores is narrow, the analysis of the scores is less precise. These results confirm the ones obtained through ROC curves. While the CDF results of real traces look similar, the ones of the synthetic trace show that the tensor slice period has a less significant impact. Indeed, the cumulative distribution functions are redundant at over 80% of obtained scores (when the scores are situated between 0 and 3 . − ). This observation also applies to the ROC curves results.

As a final note, we underline that the synthetic trace CDF shows a higher percentage of weak scores than those of real traces. This observation explains the more limited prediction efficiency outlined with the TVC model trace.

Table 2: Evaluation metrics for the prediction of all links applied on Dartmouth Campus trace
Prediction Case                       AUC       Top Scores Ratio at T+1
Distributed Case and t=5 mins         0.9850    2944/3267 (90.11%)
Centralized Case and t=5 mins         0.9844    2945/3267 (90.14%)
Distributed Case and t=10 mins        0.9817    2866/3340 (85.80%)
Centralized Case and t=10 mins        0.9826    2866/3340 (85.80%)
Distributed Case and t=30 mins        0.9360    2758/3832 (71.97%)
Centralized Case and t=30 mins        0.9324    2758/3832 (71.97%)
Distributed Case and t=60 mins        0.9153    2928/4270 (68.57%)
Centralized Case and t=60 mins        0.9069    2926/4270 (68.52%)

As another evaluation step, adapted metrics are used in order to further weigh the performance of the proposed link prediction technique. At this step, on top of evaluating the prediction of all links, we focus on assessing the efficiency of our technique in predicting new links that occur for the first time at T+1 (while ignoring all previously seen links). To this end, we compute the Area Under the ROC Curve metric (AUC metric) [29], which can be considered as a good performance indicator in our case. The top scores ratio metric at T+1 is also considered.
To determine this metric, we compute the exact number of links identified through the link prediction technique. We list, for each considered time period, the number of existing links at period T+1, which we call L. Then, we extract the links having the L highest scores and determine the number of links present in both sets. The evaluation metrics are computed for all traces with different tensor slice periods in both distributed and centralized scenarios. The results corresponding to the prediction of all links are listed in Table 2 (Dartmouth Campus trace), Table 3 (MIT Campus trace) and Table 4 (TVC model trace). The results corresponding to the prediction of new links are listed in Table 5, Table 6 and Table 7 (respectively for Dartmouth Campus, MIT Campus and TVC model traces).

Figure 2: ROC Curves for different prediction cases applied on Dartmouth Campus trace. Panels: (a) 5 minutes tensor slice period, (b) 10 minutes tensor slice period, (c) 30 minutes tensor slice period, (d) 60 minutes tensor slice period.

Figure 3: Cumulative distribution function of Katz scores obtained from Dartmouth Campus trace. Panels: (a) Distributed computation of scores, (b) Centralized computation of scores.
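The two evaluation metrics described above can be sketched as follows in Python with NumPy. This is a minimal illustration, not the evaluation code behind the reported tables: the function names are illustrative, the AUC is computed through its equivalent rank formulation (the probability that an existing link receives a higher score than a non-existing one, ties counting for one half), and the adjacency matrix at T+1 used in the usage lines is invented for the example. The score matrix is the rounded matrix S of Fig. 1.

```python
import numpy as np


def upper_pairs(M):
    """Return the strict upper triangle of M so that each undirected pair appears once."""
    rows, cols = np.triu_indices(M.shape[0], k=1)
    return M[rows, cols]


def top_scores_ratio(S, A_next):
    """Share of the L links existing at T+1 that are among the L highest-scored pairs."""
    scores = upper_pairs(S)
    truth = upper_pairs(A_next).astype(bool)
    L = int(truth.sum())
    if L == 0:
        return float("nan")
    top = np.argsort(-scores, kind="stable")[:L]  # indices of the L highest scores
    return truth[top].sum() / L


def auc(S, A_next):
    """Area under the ROC curve via pairwise comparison of link and non-link scores."""
    scores = upper_pairs(S)
    truth = upper_pairs(A_next).astype(bool)
    pos, neg = scores[truth], np.sort(scores[~truth])
    if pos.size == 0 or neg.size == 0:
        return float("nan")
    below = np.searchsorted(neg, pos, side="left")   # non-links scored strictly lower
    upto = np.searchsorted(neg, pos, side="right")   # non-links scored lower or equal
    return (below.sum() + 0.5 * (upto - below).sum()) / (pos.size * neg.size)


# Rounded Katz scores of the Fig. 1 example.
S = np.array([
    [0.0,    0.0015, 0.003,  0.0022],
    [0.0015, 0.0,    0.0008, 0.0024],
    [0.003,  0.0008, 0.0,    0.0012],
    [0.0022, 0.0024, 0.0012, 0.0],
])
# HYPOTHETICAL adjacency matrix at T+1 (links 1-3, 2-4 and 3-4), for illustration only.
A_next = np.array([
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
])
print(top_scores_ratio(S, A_next), auc(S, A_next))
```

In this toy case L = 3, and two of the three highest-scored pairs are actual links, so the top scores ratio is 2/3; seven of the nine (link, non-link) comparisons favor the link, so the AUC is 7/9. Restricting both functions to pairs never seen during the T tracked periods would give the corresponding metrics for new links.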
Regarding the prediction of all links, we note, based on the high AUC values (above 0.9 for the real traces) and the top scores ratio obtained at T+1, that the method is efficient in predicting future links. Moreover, prediction is better when the tensor slice periods are shorter. We also observe that the centralized and distributed computations of the score matrix achieve similar performance. In addition, the top scores ratio attests that the prediction of all links is efficient (roughly 68% of links or more are identified, and this percentage can exceed 90% in some cases) in both the centralized and distributed scenarios. We also note that the previous observation regarding the redundancy of the results with the synthetic trace as the tensor slice period varies is confirmed. Indeed, the number of existing links at T+1 is the same when the period t is over 5 minutes. Moreover, the AUC metric and the top scores ratio almost always have the same value. Nonetheless, when the prediction only concerns new links, the AUC values decrease considerably. This observation suggests that the prediction is much less accurate when only new links are considered. Given that new links are not tracked by the tensor, their scores are low (or even null). This interpretation is supported by the top scores ratio at T+1: the percentages of identified new links are very low (no more than 8% in the best cases). Hence, the tensor-based link prediction technique is not efficient when the prediction targets the occurrence of new links. This result is also highlighted in [6] and [7].

Figure 4: ROC Curves for different prediction cases applied on MIT Campus trace. (a) 5 minutes tensor slice period (b) 10 minutes tensor slice period (c) 30 minutes tensor slice period (d) 60 minutes tensor slice period

Figure 5: Cumulative distribution function of Katz scores obtained from MIT Campus trace. (a) Distributed computation of scores (b) Centralized computation of scores

It is also important to underline that the efficiency of our mechanism depends on the chosen values of θ and β. We depict in Fig. 8(a) and Fig. 8(b) the AUC and the top scores ratio at T+1, respectively, obtained for different values of θ and β. We can note that the values chosen for θ (i.e. 0.2) and β (i.e. 0.001) enable us to reach a quite efficient level of prediction. These results relate to a prediction performed on the MIT Campus trace with the distributed version of our method (as described in Section 4.1).

We aim through this subsection to compare our proposal to another similar approach (we use the distributed approach to compute the scores). As we are designing a metric that expresses the degree of similarity of two nodes, we choose to compare the performance of the tensor-based technique to that of the similarity metric suggested by Thakur et al. [4]. The latter metric measures the degree of similarity of the behaviors of two mobile nodes, and the behavior of each node is expressed by an association matrix. The columns of the matrix represent the possible locations that a node can visit and the rows express time granularity (hours, days, weeks, etc.). The dominant behavioral patterns are tracked using the Singular Value Decomposition (SVD) [30]. For more details about the similarity metric computation, we refer the reader to [4].

Figure 6: ROC Curves for different prediction cases applied on TVC model trace. (a) 5 minutes tensor slice period (b) 10 minutes tensor slice period (c) 30 minutes tensor slice period (d) 60 minutes tensor slice period

Figure 7: Cumulative distribution function of Katz scores obtained from TVC model trace. (a) Distributed computation of scores (b) Centralized computation of scores

We compare the top scores ratio at T+1 and the area under the ROC curve, and we measure them for different tensor slice times or time granularities (5, 10, 30 and 60 minutes). For this comparison, we use the MIT trace of 07/23/02 and track adjacency matrices and/or association matrices from 8 a.m. to 3 p.m. The associated results for the top scores ratio at T+1 and for the area under the ROC curve are respectively depicted in Fig. 9(a) and Fig. 9(b), and the performance gaps are respectively displayed in Fig. 9(c) and Fig. 9(d).

We first focus on the comparison according to the top scores ratio at T+1. We underline that our proposal shows a more efficient prediction ability than the framework of Thakur et al., especially when the tensor slice time/time granularity tends to be short. The different "natures" of the metrics used in each approach explain the results obtained for the two sets of comparison. Indeed, our measure quantifies the similarity of nodes based on their encounters and geographical closeness. In other words, the prediction measure cares about contacts (or closeness) at (around) the same location and at the same time.
Meanwhile, the similarity metric proposed by Thakur et al. is defined as an association metric. Hence, it measures the degree of similarity of the behaviors of two mobile nodes without necessarily checking whether they are in the same location at the same time. As we have previously stated, the prediction performance of the tensor-based link prediction method is better with shorter tensor slice times. With a longer tensor slice time, the interpretation of network statistics becomes less precise. This observation accounts for the more comparable prediction performance of the two approaches with a larger tensor slice time/time granularity.

On the other hand, for the comparison based on the AUC metric, we remark that the two approaches show quasi-similar prediction efficiency with a slightly better performance for our proposal, mostly because the overwhelming number of noneffective links introduces a bias in the calculation of the AUC metric. The reduced number of occurring links and the findings obtained for the first set of comparison explain the small AUC gap in favor of the tensor-based link prediction approach.

Figure 8: Prediction performance of the tensor-based technique distributed version for different values of β and θ. (a) Area Under the ROC Curve (b) Top Scores Ratio at T+1

Figure 9: Prediction performance comparison between the tensor-based technique and the approach of Thakur et al. (a) Comparison according to the Top Scores Ratio at T+1 metric (b) Comparison according to the Area Under the ROC Curve metric (c) The Top Scores Ratio at T+1 metric gap between the two approaches (d) The Area Under the ROC Curve metric gap between the two approaches

Table 3: Evaluation metrics for the prediction of all links applied on MIT Campus trace

Prediction Cases                      AUC      Top Scores Ratio at T+1
Distributed Case and t = 5 mins       0.9838   1922/2147 (89.52%)
Centralized Case and t = 5 mins       0.9842   1925/2147 (89.66%)
Distributed Case and t = 10 mins      0.9813   1867/2187 (85.36%)
Centralized Case and t = 10 mins      0.9807   1866/2187 (85.32%)
Distributed Case and t = 30 mins      0.9631   1757/2311 (76.02%)
Centralized Case and t = 30 mins      0.9618   1757/2311 (76.02%)
Distributed Case and t = 60 mins      0.9256   1803/2657 (67.85%)
Centralized Case and t = 60 mins      0.9361   1817/2657 (68.38%)

Table 4: Evaluation metrics for the prediction of all links applied on TVC model trace

Prediction Cases                      AUC      Top Scores Ratio at T+1
Distributed Case and t = 5 mins       0.8851   717/931 (77.01%)
Centralized Case and t = 5 mins       0.8860   717/931 (77.01%)
Distributed Case and t = 10 mins      0.8409   750/1080 (69.44%)
Centralized Case and t = 10 mins      0.8401   749/1080 (69.35%)
Distributed Case and t = 30 mins      0.8412   757/1080 (70.09%)
Centralized Case and t = 30 mins      0.8424   757/1080 (70.09%)
Distributed Case and t = 60 mins      0.8388   755/1080 (69.90%)
Centralized Case and t = 60 mins      0.8399   755/1080 (69.90%)

Table 5: Evaluation metrics for the prediction of new links applied on Dartmouth Campus trace

Prediction Cases                      AUC      Top Scores Ratio at T+1
Distributed Case and t = 5 mins       0.6671   1/144 (0.69%)
Centralized Case and t = 5 mins       0.6518   1/144 (0.69%)
Distributed Case and t = 10 mins      0.6759   1/184 (0.54%)
Centralized Case and t = 10 mins      0.6913   1/184 (0.54%)
Distributed Case and t = 30 mins      0.6469   20/684 (2.89%)
Centralized Case and t = 30 mins      0.6269   24/684 (3.50%)
Distributed Case and t = 60 mins      0.6472   51/1008 (5.05%)
Centralized Case and t = 60 mins      0.6115   58/1008 (5.75%)

Table 6: Evaluation metrics for the prediction of new links applied on MIT Campus trace

Prediction Cases                      AUC      Top Scores Ratio at T+1
Distributed Case and t = 5 mins       0.6823   8/107 (7.47%)
Centralized Case and t = 5 mins       0.6921   8/107 (7.47%)
Distributed Case and t = 10 mins      0.7221   0/141 (0.00%)
Centralized Case and t = 10 mins      0.7121   4/141 (2.83%)
Distributed Case and t = 30 mins      0.6955   0/267 (0.00%)
Centralized Case and t = 30 mins      0.6843   0/267 (0.00%)
Distributed Case and t = 60 mins      0.6929   23/620 (3.70%)
Centralized Case and t = 60 mins      0.7383   25/620 (4.03%)

Table 7: Evaluation metrics for the prediction of new links applied on TVC model trace

Prediction Cases                      AUC      Top Scores Ratio at T+1
Distributed Case and t = 5 mins       0.4954   0/76 (0.00%)
Centralized Case and t = 5 mins       0.4920   0/76 (0.00%)
Distributed Case and t = 10 mins      0.4758   2/131 (1.52%)
Centralized Case and t = 10 mins      0.4664   2/131 (1.52%)
Distributed Case and t = 30 mins      0.4730   2/131 (1.52%)
Centralized Case and t = 30 mins      0.4816   2/131 (1.52%)
Distributed Case and t = 60 mins      0.4583   2/131 (1.52%)
Centralized Case and t = 60 mins      0.4769   4/131 (3.05%)

In wireless networks, and specifically in intermittently connected ones, it is important to exploit the social relationships that influence node mobility. Taking advantage of the social aspect within these networks could ensure a better routing strategy and therefore improve the packet delivery rate and reduce latency. Through our proposal, we aim to track possible similarities between the mobility patterns of nodes and wisely exploit them for better link prediction.

As seen earlier, link occurrence between two nodes is more likely when they have similar social behaviors. Identifying nodes that have similar mobility patterns could therefore help to predict effective links between nodes in the future. The more accurate the link prediction is, the more optimized the routing scheme could get. In fact, an efficient link prediction would help to make better decisions in the forwarding process. For example, a node would rather postpone sending a packet to a currently available next hop if the link prediction scheme estimates that a better forwarder (closer to the destination, for example) is going to appear in the immediate future. Also, link prediction could prevent buffer overloading. Indeed, an overloaded node would rather drop a packet if the link prediction scheme indicates that there will not be any possible route toward the destination before the packet's TTL expires.
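To make the two forwarding examples above concrete, the following Python sketch shows one way a node could turn prediction scores into a forward/wait/drop decision. It is purely illustrative and is not a protocol proposed or evaluated in this paper: the function name, the two score dictionaries, the threshold `meet_threshold` and the decision order are all assumptions, and the prediction horizon is assumed to cover the remaining TTL of the packet.

```python
def forwarding_decision(dest, neighbors, meet_score, dest_score,
                        buffer_full, meet_threshold):
    """Return ("forward", node), ("wait", node or None) or ("drop", None).

    neighbors      -- set of nodes currently in contact (hypothetical input)
    meet_score[n]  -- predicted score of a future link between this node and n
    dest_score[n]  -- predicted closeness of n to the destination
    meet_threshold -- hypothetical cut-off above which a link is deemed likely
    """
    if dest in neighbors:
        return ("forward", dest)

    def closeness(n):
        # The destination itself is the best possible forwarder.
        return float("inf") if n == dest else dest_score.get(n, 0.0)

    # Best relay available right now.
    current = max(neighbors, key=closeness, default=None)
    current_val = closeness(current) if current is not None else 0.0

    # Nodes not in contact now but predicted to become neighbors soon.
    upcoming = [n for n, s in meet_score.items()
                if n not in neighbors and s >= meet_threshold]
    future = max(upcoming, key=closeness, default=None)
    future_val = closeness(future) if future is not None else 0.0

    if future is not None and future_val > current_val:
        return ("wait", future)        # a better forwarder is expected
    if current is not None and current_val > 0.0:
        return ("forward", current)
    # No useful relay now or predicted: drop only under buffer pressure.
    return ("drop", None) if buffer_full else ("wait", None)
```

For instance, with neighbors {1, 2}, `dest_score = {1: 0.1, 2: 0.2, 3: 0.6}` and a likely future contact with node 3, the sketch returns `("wait", 3)`; if no contact with node 3 is predicted, it returns `("forward", 2)`.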
Through this approach, we can get quite efficient prediction results. As mentioned previously, the technique can identify more than 90% of links (with a slice period equal to 5 minutes). The link prediction relies on measuring the similarity of the mobility of nodes. Song et al. [31] have investigated the limits of predictability in human mobility. Relying on data collected from mobile phone carriers, they have found that 93% of user mobility is potentially predictable. The best predictability percentage reached by our approach agrees with the conclusion of Song et al.

We have also shown through the simulations that prediction efficiency is similar, for a specific scenario (type of trace and slice time period), for both centralized and distributed computation of Katz scores. As we have explained, the distributed scheme is only able to maintain high scores (link occurrence is likely) as nodes record neighbors at one and two hops. The seeming lack of information does not impair prediction effectiveness. This observation also tallies with the conclusion of Acar et al. Indeed, in the data mining context, they have tried to make the method scalable and proposed the Truncated Katz technique (expressed by eq. (11) in [6]). It consists in determining Katz scores by replacing the collapsed weighted tensor with a low-rank approximation. The results show that this latter technique retains high prediction efficiency. Hence, restricting the scores computation to the most heavily weighted links (in terms of recentness and duration) does not incur dramatic consequences on prediction efficiency.

We have assumed, for the distributed computation of similarity scores, that nodes know their two-hop neighbors. Obviously, exchanging information about neighbors between nodes causes additional overhead and consequently consumes more resources. From this perspective, a question arises: would the tensor-based link prediction method remain effective if the knowledge of nodes is limited to their direct neighbors? To answer this question, we consider the scenario where the distributed computation of scores is based on one-hop neighborhood knowledge and we compare it to the scenario which uses two-hop knowledge. We use the MIT Campus trace and track the network topology during 4 hours (i.e. the trace of 07/23/02 from 8 a.m. to midday) and we consider different tensor slice times. The comparison is made with the top scores ratio at T+1 and the AUC metrics. The results are reported in Table 8.

Table 8: Evaluation metrics for distributed prediction scenarios applied on MIT Campus trace

Prediction Cases                        AUC      Top Scores Ratio at T+1
One-hop knowledge and t = 5 mins        0.9747   1921/2147 (89.47%)
Two-hops knowledge and t = 5 mins       0.9838   1922/2147 (89.52%)
One-hop knowledge and t = 10 mins       0.9671   1865/2187 (85.27%)
Two-hops knowledge and t = 10 mins      0.9813   1867/2187 (85.36%)
One-hop knowledge and t = 30 mins       0.9406   1756/2311 (75.98%)
Two-hops knowledge and t = 30 mins      0.9631   1757/2311 (76.02%)
One-hop knowledge and t = 60 mins       0.8810   1789/2657 (67.33%)
Two-hops knowledge and t = 60 mins      0.9256   1803/2657 (67.85%)

When the knowledge is limited to one-hop neighbors, closeness only means that a direct link exists between two nodes. This scenario does not consider the relationships between nodes that are separated by multi-hop paths. The results confirm that the prediction is less effective. Even if the top scores ratios at T+1 are close, the performance of the one-hop knowledge scenarios is slightly worse. The AUC metric also attests that the prediction is less efficient in such cases. In fact, considering two-hop knowledge yields a more significant true positive rate for the ROC curve (expressed by a better top scores ratio at T+1) while the false positive rate remains practically the same (due to the overwhelming number of noneffective links). Nevertheless, the best scenario to retain is not obvious if we compare the cost of exchanging local information between nodes to the cost of less efficient link prediction. Future simulations and real deployments will enable us to determine which setting is preferable.
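The difference between one-hop and two-hop knowledge can be illustrated with a short Python sketch. It rests on two assumptions that this section does not restate: the tensor is collapsed with the exponential weighting (1-θ)^(T-t) used by Acar et al. [6], and a node with k-hop knowledge is modeled as knowing only the links incident to nodes within k-1 hops of itself. The function names are hypothetical, and the Katz score uses the standard closed form S = (I - βX)^(-1) - I.

```python
import numpy as np

def collapse_weighted(tensor, theta):
    """Collapse a (T, N, N) stack of adjacency slices (oldest first).

    Assumed weighting: slice t (1..T) is scaled by (1 - theta)**(T - t),
    so recent contacts count more than old ones.
    """
    T = tensor.shape[0]
    weights = (1.0 - theta) ** np.arange(T - 1, -1, -1)
    return np.tensordot(weights, tensor, axes=(0, 0))

def katz_scores(X, beta):
    """Katz matrix: sum over l >= 1 of beta**l * X**l = (I - beta*X)^-1 - I."""
    rho = np.abs(np.linalg.eigvals(X)).max()
    if rho > 0 and beta >= 1.0 / rho:
        raise ValueError("beta must be below 1/spectral radius for convergence")
    identity = np.eye(X.shape[0])
    return np.linalg.inv(identity - beta * X) - identity

def local_view(X, node, hops):
    """Links visible to `node` with `hops`-hop knowledge (assumed model)."""
    known, frontier = {node}, {node}
    for _ in range(hops - 1):
        frontier = {int(v) for u in frontier for v in np.nonzero(X[u])[0]} - known
        known |= frontier
    mask = np.zeros(X.shape, dtype=bool)
    idx = sorted(known)
    mask[idx, :] = True
    mask[:, idx] = True
    return np.where(mask, X, 0.0)

# Toy path topology 0-1-2-3 over two slices (link 2-3 appears only recently).
slices = np.zeros((2, 4, 4))
for t, edges in enumerate([[(0, 1), (1, 2)], [(0, 1), (1, 2), (2, 3)]]):
    for i, j in edges:
        slices[t, i, j] = slices[t, j, i] = 1.0

X = collapse_weighted(slices, theta=0.2)
one_hop = katz_scores(local_view(X, 0, hops=1), beta=0.001)
two_hop = katz_scores(local_view(X, 0, hops=2), beta=0.001)
print(one_hop[0, 2], two_hop[0, 2])   # node 2 only gets a score with two-hop knowledge
```

Under this model, node 0 with one-hop knowledge assigns a non-zero score only to its direct neighbor, which matches the remark that closeness then only reflects the existence of a direct link. With two-hop knowledge, node 2 also receives a score through the path 0-1-2.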
5. Conclusion
Human mobility patterns are mostly driven by social intentions, and correlations appear in the behaviors of the people forming the network. These similarities reflect correlation at the spatial level, in terms of visited locations, and at the temporal level, in terms of mobility correlation over periods of time. Knowledge about the behavior of nodes greatly helps in improving the design of communication protocols. Intuitively, two nodes that follow the same social intentions over time promote the occurrence of a link in the immediate future.

In this paper, we presented a link prediction technique inspired by data mining and exploited it in the context of wireless networks. Our contribution, a new link prediction technique for intermittently connected wireless networks, is designed through two major steps. First, the network topology is tracked over several time periods in a tensor. Second, after collapsing the structural information, the Katz measure is computed for each pair of nodes as a score. A high score means similar moving patterns, implying the closeness of the nodes, and indicates that a link occurrence is likely in the future.

Through the link prediction evaluation, we have obtained relevant results that attest to the efficiency of our contribution and agree with some findings reported in the literature. We summarize them in the following points:
• The tensor-based link prediction technique is quite efficient, especially when applied on real traces (Dartmouth Campus and MIT Campus traces). The results are supported by the ROC curves and the evaluation metrics (AUC and Top Scores Ratio at T+1 metrics).
• Applied on real traces, the proposed prediction technique provides more accurate results with shorter tensor slice (or tensor adjacency matrices) times.
• The prediction results with the synthetic trace (TVC model trace) confirm the lack of social interactions. The intentions of a node are only governed by its preferred locations and do not correlate with the intentions of the other nodes.
• The link prediction method guarantees good performance when prediction is applied to all links. Nevertheless, the prediction of new links (links that have not occurred in the tracked history, i.e. ignoring all links seen previously) is not accurate (very low AUC and top scores ratio at T+1 metrics).
• Applying the prediction technique in a distributed way (nodes know only their neighbors at most two hops away) achieves prediction performance similar to that of the centralized way (an entity has full knowledge of the network structure over time).
• The temporal tensor-based link prediction described in this paper is based on an encounter metric which takes into account the contacts occurring at the same location and at the same time. We provide a performance comparison with a similar approach built around an association similarity metric (that quantifies similarity based on preferred locations regardless of time correlations) and show that our proposal achieves better prediction results.

Good link prediction offers the possibility to further improve opportunistic packet forwarding strategies by making better decisions in order to enhance the delivery rate or limit latency. Therefore, it will be relevant to supply some routing protocols with prediction information and to assess the contribution of our approach in enhancing the performance of the network, especially as we propose an efficient distributed version of the prediction method. The proposed technique also motivates us to investigate future enhancements, such as a more precise tracking of the behavior of nodes and a more efficient similarity computation.
Acknowledgements
We wholeheartedly thank Evrim Acar, Dimitrios Katsaros, Walid Benameur and Rachit Agarwal for their valuable comments and helpful advice.
References
[1] T. Hossmann, T. Spyropoulos, F. Legendre, Social network analysis of human mobility and implications for dtn performance analysis and mobility modeling, Tech. Rep. 323, Computer Engineering and Networks Laboratory ETH Zurich (July 2010).
[2] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, J. Scott, Impact of human mobility on opportunistic forwarding algorithms, IEEE Trans. on Mobile Computing 6 (6) (2007) 606–620.
[3] W.-J. Hsu, T. Spyropoulos, K. Psounis, A. Helmy, Modeling Spatial and Temporal Dependencies of User Mobility in Wireless Mobile Networks, IEEE/ACM Trans. on Networking 17 (5) (2009) 1564–1577.
[4] G. S. Thakur, A. Helmy, W.-J. Hsu, Similarity analysis and modeling in mobile societies: the missing link, in: Proc. of the 5th ACM workshop on Challenged networks (CHANTS ’10), 2010, pp. 13–20.
[5] T. Karagiannis, J.-Y. Le Boudec, M. Vojnović, Power law and exponential decay of inter contact times between mobile devices, in: Proc. of the 13th annual ACM international conference on Mobile computing and networking, (MobiCom ’07), 2007, pp. 183–194.
[6] E. Acar, D. M. Dunlavy, T. G. Kolda, Link Prediction on Evolving Data Using Matrix and Tensor Factorizations, in: Proc. of the IEEE International Conference on Data Mining Workshops, 2009, pp. 262–269.
[7] D. M. Dunlavy, T. G. Kolda, E. Acar, Temporal link prediction using matrix and tensor factorizations, ACM Trans. Knowl. Discov. Data 5 (2) (2011) 10:1–10:27.
[8] L. Katz, A new status index derived from sociometric analysis, Psychometrika 18 (1) (1953) 39–43.
[9] S. Wasserman, K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.
[10] D. Katsaros, N. Dimokas, L. Tassiulas, Social network analysis concepts in the design of wireless Ad Hoc network protocols, IEEE Network 24 (6) (2010) 23–29.
[11] P. Hui, J. Crowcroft, E. Yoneki, Bubble rap: social-based forwarding in delay tolerant networks, in: Proc. of the 9th ACM international symposium on Mobile ad hoc networking and computing (MobiHoc ’08), 2008, pp. 241–250.
[12] E. M. Daly, M. Haahr, Social network analysis for routing in disconnected delay-tolerant MANETs, in: Proc. of the 8th ACM international symposium on Mobile ad hoc networking and computing, (MobiHoc ’07), 2007, pp. 32–40.
[13] T. Hossmann, T. Spyropoulos, F. Legendre, Know thy neighbor: Towards optimal mapping of contacts to social graphs for dtn routing, in: Proc. of IEEE INFOCOM, 2010, pp. 1–9.
[14] L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Tech. rep., Stanford InfoLab (1999).
[15] W. Hwang, T. Kim, M. Ramanathan, A. Zhang, Bridging centrality: Graph mining from element level to group level, in: Proc. of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2008, pp. 336–344.
[16] F. R. K. Chung, Spectral Graph Theory (CBMS Regional Conference Series in Mathematics, No. 92), American Mathematical Society, 1997.
[17] B. Bollobas, Modern Graph Theory, Springer, 1998.
[18] M. E. J. Newman, Modularity and community structure in networks, Proceedings of the National Academy of Sciences of the United States of America 103 (23) (2006) 8577–82.
[19] G. Palla, I. Derényi, I. Farkas, T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature 435 (7043) (2005) 814–8.
[20] K. Lee, S. Hong, S. J. Kim, I. Rhee, S. Chong, Slaw: A new mobility model for human walks, in: Proc. of IEEE INFOCOM, 2009, pp. 855–863.
[21] U. G. Acer, P. Drineas, A. A. Abouzeid, Random walks in time-graphs, in: Proc. of the Second International Workshop on Mobile Opportunistic Networking, (MobiOpp ’10), 2010, pp. 93–100.
[22] The DBLP computer science bibliography.
[23] C. Wang, V. Satuluri, S. Parthasarathy, Local Probabilistic Models for Link Prediction, in: Proc. of the Seventh IEEE International Conference on Data Mining, (ICDM ’07), 2007, pp. 322–331.
[24] D. Liben-Nowell, J. Kleinberg, The link-prediction problem for social networks, Journal of the American Society for Information Science and Technology 58 (7) (2007) 1019–1031.
[25] Cornell university library arxiv.org, http://arxiv.org/ .
[26] CRAWDAD: A community resource for archiving wireless data at dartmouth, http://crawdad.cs.dartmouth.edu/ .
[27] M. Balazinska, P. Castro, Characterizing mobility and network usage in a corporate wireless local-area network, in: Proc. of the 1st international conference on Mobile systems, applications and services, (MobiSys ’03), 2003, pp. 303–316.
[28] Time-variant community mobility model, http://nile.cise.ufl.edu/~weijenhs/TVC_model/ .
[29] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters 27 (8) (2006) 861–874.
[30] R. A. Horn, C. R. Johnson, Matrix Analysis, Cambridge University Press, 1990.
[31] C. Song, Z. Qu, N. Blumm, A.-L. Barabási, Limits of Predictability in Human Mobility, Science 327 (5968) (2010) 1018–1021.