[PDF] Generating dynamic contact graphs with indirect links

Abstract

Graph models are widely used to study diffusion processes in contact networks. Recent data-driven research has highlighted the significance of indirect links, where interactions are possible when two nodes visit the same place at different times (SPDT), in determining network structure and diffusion dynamics. However, how to generate dynamic graphs with indirect links for modeling diffusion remains an unsolved challenge. Here, we present a dynamic contact graph model for generating contact networks with direct and indirect links. Our model introduces the concept of multiple concurrently active copies of a node for capturing indirect transmission links. The SPDT graph model builds on activity driven time-varying network modelling for generating dynamic contact networks using simple statistical distributions. This model is fitted with a large city-scale empirical dataset using maximum likelihood estimation methods. Finally, the performance of the model is evaluated by analysing the capability of capturing the network properties observed in empirical graphs constructed using the location updates of a social networking app and simulating SPDT diffusion processes. Our results show that, in comparison to current graph models that only include direct links, our graph model with indirect links match empirical network properties and diffusion dynamics much more closely.

Full PDF

IIMA Journal of Complex Networks (2019) Page 1 of 32doi:10.1093/comnet/xxx000

Generating dynamic contact graphs with indirect links ∗ M D S HAHZAMAL , , R AJA J URDAK , , B ERNARD M ANS , F RANK DE H OOG AND D EAN P AINI Macquarie University, Sydney, Australia CSIRO, Australia Queensland University of Technology, Brisbane, Australia ∗ Corresponding author: [email protected][Received on 12 November 2019]

Graph models are widely used to study diffusion processes in contact networks. Recent data-drivenresearch has highlighted the signiﬁcance of indirect links, where interactions are possible when twonodes visit the same place at different times (SPDT), in determining network structure and diffusiondynamics. However, how to generate dynamic graphs with indirect links for modeling diffusion remainsan unsolved challenge. Here, we present a dynamic contact graph model for generating contact networkswith direct and indirect links. Our model introduces the concept of multiple concurrently active copiesof a node for capturing indirect transmission links. The SPDT graph model builds on activity driventime-varying network modelling for generating dynamic contact networks using simple statisticaldistributions. This model is ﬁtted with a large city-scale empirical dataset using maximum likelihoodestimation methods. Finally, the performance of the model is evaluated by analysing the capability ofcapturing the network properties observed in empirical graphs constructed using the location updates ofa social networking app and simulating SPDT diffusion processes. Our results show that, in comparisonto current graph models that only include direct links, our graph model’s indirect links match empiricalnetwork properties and diffusion dynamics much more closely.

Keywords : Social networks, network modelling, diffusion processes, epidemiology

1. Introduction

Graph modelling is an efﬁcient method to understand and explore behaviours of diffusion processes innetworked systems of individuals. In addition, modelling contact graphs is an attractive research topicdue to its wide applications, from disease modelling to viral marketing [14, 16, 19, 20, 28]. Graphmodels represent individuals as nodes with interactions among individuals as links or edges [23, 25,33, 40, 43]. A key function of graph models is to generate contact networks capable of characterisingdiffusion behaviours. In diffusion processes, infectious items appear at a node or a group of nodesand then spread further through inter-node interactions (edges). The inter-node interactions create local(node level) transmission links of infectious items and a series of such local transmissions spread theinfectious items throughout the network. This paper focuses on dynamic time varying graphs [12,29, 30], which are more realistic though more challenging for capturing the temporal features such ascontact frequencies or inter-contact times.Most current graph models assume that both infected and susceptible individuals are present togetherin a physical or virtual space to make an individual level transmission of infectious items [12, 32]. These c (cid:13) The author 2019. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved. a r X i v : . [ c s . S I] N ov of 32 M SHAHZAMAL graph models implicitly assume diffusion processes where the only underlying transmission contact net-works are created due to the direct interactions between individuals, i.e. presence at the same physicalor virtual locations. However, there are many diffusion scenarios where susceptible individuals canreceive infectious items in the absence of an infected individual due to delayed interactions with infec-tious items [8, 24, 29, 36]. We refer to diffusion processes with indirect transmissions as SPDT (sameplace different time) diffusion and to diffusion processes with the direct transmission links as SPST(same place same time) diffusion. The enhancement in diffusion dynamics by the SPDT model overSPST models has been reported in our earlier work [29–31]. One of the key properties of SPDT dif-fusion is that nodes in this model can create transmission links at multiple locations simultaneously: adirect transmission link at the current location and indirect transmission links at previous locations. Cur-rent graph models only consider that nodes can create links with other nodes at a single location. Theother key properties of SPDT diffusion is that the time lag between the arrival of susceptible individualsand departure of infected individuals along with the duration of a link is crucial, as these temporal fea-tures determine the likelihood of transmission. Thus, a graph model that can explicitly generate thesetemporal features associated with indirect links is essential for modelling SPDT diffusion.In this paper, we propose a generative SPDT graph model that supports both direct and indirect links,and that explicitly captures the temporal features of these links for more realistic diffusion modelling.The SPDT graph model adopts the principal of activity driven time varying networks (ADN) modellingto generate contact networks with both direct and indirect links [21, 22, 35]. In the basic ADN model,a node is activated with a probability deﬁned by the activation potential at each time step and create m links with other nodes regardless of their activation status. The activation potential of the node is oftenassigned according to a power-law distribution. At each time step, links among nodes are generatedindependently. In our graph model, the activation of a node means arrival of an individual at a locationwhere at least one other individual is present. To model realistic scenarios where invididuals can visit alocation of a period of time, we introduce the concept of active period that is deﬁned as the consecutiverange of time steps during which the individual is capable of spreading the infectious items to others. Inorder to represent both direct and indirect transmission opportunities, an active copy of a node is createdfor each active period of the node. In the proposed model, links are created between the active copyof the host node and neighbour nodes. The active copy survives for the active period, when the host ispresent at the interaction location, in addition to an indirect transmission period, when the host leaves thelocation yet the contagious items persist at the location. In the SPDT graph, an active copy creates linksat a time step within the active period and indirect transmission period. Thus, the SPDT graph evolvesaccording to temporal changes of link and node status. Interestingly, the SPDT graph model supportsmultiple simultaneous copies of a node in the network that are active, where one copy is aligned thecurrent location of the node, and any other copies represent the ongoing indirect transmission periodsfrom recently visited locations of the same node.Our key contribution in this paper is the design of a graph generation method based on the deﬁnitionof the SPDT graph model in which we deﬁne six co-location interaction parameters (CIP), capturingtemporal dynamics and contact properties, to generate the transmission contact networks by the SPDTgraph model. We use simple statistical distributions such as the geometric distribution to generate co-location interaction parameters. Various algorithms are developed using maximum likelihood estimationmethods to ﬁt model parameters with that of the networks constructed using the location updates oflocation based social networking app Momo. The model is validated by analysing network propertiesof the generated contact networks and by the capability of simulating SPDT diffusion processes. Thediffusion dynamics are compared with that of the ADN model and SPST models. Our paper is organisedas follows: ODELLING CO-LOCATION NETWORKS • We introduce a generative SPDT graph model to include indirect transmission links with the directtransmission links and deﬁned contact network generation methods in Section 2. • We develop data-driven algorithms to ﬁt the proposed graph model with a real dynamic contactgraph and the estimated model parameters using a Momo data set in Section 3. • We analyse how the proposed model with active copies of nodes capture network properties ofreal world SPDT graph and simulated SPDT diffusion dynamics realistically, and we conduct asensitivity analysis of the model’s parameters in Section 4.

2. SPDT graph model

The SPDT graph model is developed using the deﬁnition of SPDT diffusion. Before introducing thegraph model, we recall brieﬂy this diffusion model as introduced in [30]. In the SPDT diffusion model,an SPDT link is created through location and time, and may have two components: a direct transmissionlink and/or an indirect transmission link. SPDT links require the visits of an infected individual to thelocations where at least one susceptible individual is present. Visits of an infected individual to locationswhere no susceptible individuals are present do not lead to transmission of disease. We thus ﬁrst explainthe link creation procedures in the SPDT diffusion processes. We then deﬁne the SPDT graph model byfollowing contact network generation methods under this graph deﬁnition.2.1

Modelling Scenarios

The link creation procedure for the SPDT model can be explained by airborne disease spreading phe-nomena as shown in Figure 1. In this particular scenario, an infected individual A (host individual)arrives at location L at time t followed by the arrival of susceptible individuals u and v at time t .The appearance of v at L creates a directed link for transmitting infectious particles from A to v andlasts until time t making direct contact during [ t , t ] and indirect contact during [ t , t ] (see Figure 1).The indirect contact is created as the impact of A still persists (as the virtual presence of A is shownby the dashed circle surrounding A) after it left L at time t , due to the survival of the airborne infec-tious particles in the air at L. The appearance of u has only created direct links from A to u during [ t , t ] . Another susceptible individual w arrives at location L at time t and a link is created from A tow through the indirect contact due to A’s infectious particles still being active at L. However, the timedifference between t (arrival time of w) and t (departure time of A) should at most δ seconds, deﬁnedthe maximum time that infectious particles can infect another person after A left at t .2.2 Graph deﬁnition

The main goal of the SPDT graph model is to support indirect transmission links along with direct linksfor co-located interactions among individuals. The SPDT graph model focuses on link creations in thetime domain to ensure scalability. Temporal modelling is sufﬁcient to identify the nodes participatingin possible disease transmission links. Accordingly, the link creation events in the proposed scenariocan be represented as a process where an infected node activates for a period of time (staying at alocation where susceptible nodes are present) and creates SPDT links. Then, the infected nodes becomeinactive for a period during which it does not create SPDT links. Inactive periods represent the waitingtime between two active periods. Thus, the co-located interaction status of an infected node during an of 32

M SHAHZAMAL 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 A active A virtually active v exposed direct indirect w exposedindirect 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 u exposed Infected individualVirtual presence of infected individual ASusceptible individual

Areas infectious particles available from infected

Temporal snapshots ofindividual interactions F IG . 1. [30] Disease transmission links creation for co-located interactions among individuals in SPDT model. The upper partshows the six snapshots of interactions over time and the lower part shows the periods of exposure through direct and indirectinteractions. Susceptible individuals are linked with the infected individual if they enter the blue dashed circle areas within whichinfectious particles are available to cause infection observation period can be summarised by a set { a , w , a , w , .. } where a i is active period and w i isinactive period (see Fig. 3).The proposed SPDT graph is deﬁned as G T = ( Z , A , L , T ) to represent all possible disease transmis-sion links among nodes, where Z is the set of nodes. The number of nodes in the graph is constant;however, nodes may have one or more active copies in the graph which captures their ability to spreaddiseases both at locations at which they are present and at locations from which they recently departed.The set of active copies of nodes is represented by A and is dynamically updated as new copies are cre-ated and old ones expired. L is the set of links in the graph. The graph is represented over a discrete timeset T= { t , t , . . . t z } . Each node in the graph has a set of active and inactive periods { a , w , a , w . . . } .An active copy v i = v ( t is , t il ) is deﬁned for an active period a i of node v , where a i starts at time step t s and ﬁnishes at t l (see Fig.4). Thus, each node will have several such temporal copies for the observationperiod. For an active copy v i + of a node v , t i + s should be greater than t il of v i to capture the requirementthat a node should have left the ﬁrst location before arriving in another location. In this graph, a link e vu ∈ L is deﬁned between an active copy v i of host node v and neighbour node u (node u visits thecurrent or recent location of node v ) as e vu = ( v i , u , t (cid:48) s , t (cid:48) l ) where t (cid:48) s is the joining time and t (cid:48) l is departuretime of u from the interacting location. The value of t (cid:48) s should be within t is and t il + δ , as an active copy v i of a node v expires after t il + δ seconds. During the indirect transmission period δ , node v can startanother active period at another location (see Fig.4). However, if the infected node v leaves and returnsto the location of u within a time period δ , then there will be two active copies of v , each with a link tothe susceptible node u . The ﬁrst copy is due to the persistent of particles from v ’s last visit, while the ODELLING CO-LOCATION NETWORKS inactive activep1-p 1-q q F IG . 2. Node’s states switch with the transitional probabilities p and q. When a node is at inactive state, it switches to activestate with a probability p while the probability of remaining in the inactive state is 1-p. Similarly, a node switches from active toinactive state with a probability q and remain in the active state with a probability 1-q Periods 𝒂 𝒘 𝒂 𝒘 𝒂 𝒘 𝒂 State 1 1 0 0 1 1 1 0 1 0 0 0 1 1 Time step 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 Direct 𝒂 𝒂 𝒂 Indirect 𝒂

𝟏 ′ 𝒂 Action time 𝒕 𝒕 𝒕 𝒕 𝒕 𝒕 𝒂 F IG . 3. State transitions of a node over time and how active and inactive periods are formed with the underlying states F IG . 4. Direct and indirect transmission periods of a node over time. Indirect transmission periods are formed following a directperiod and can be concurrent with other direct and indirect periods active and inactive periods 𝒂 (𝒕 , 𝒕 ) 𝒘 (𝒕 , 𝒕 ) 𝒂 (𝒕 , 𝒕 ) 𝒘 (𝒕 , 𝒕 ) 𝒗 = 𝒗 (𝒕 𝒔𝟏 = 𝒕 , 𝒕 𝒍𝟏 = 𝒕 ) active copies 𝒗 = 𝒗 (𝒕 𝒔 = 𝒕 , 𝒕 𝒍 = 𝒕 ) where 𝒕 𝒔𝟐 > 𝒕 𝒍𝟏 F IG . 5. Creation of active node copies corresponding to each active period. The active copies of a node are identiﬁed by the activeperiod time and survive for active period duration plus δ time that is the duration when indirect links are created with active copy second copy is due to v ’s current visit.The evolution of the proposed graph is governed by two dynamic processes: 1) switching of nodesbetween active and inactive states, as in Figures 2 and 3; and 2) link creation and deletion for active of 32 M SHAHZAMAL copies of nodes. As stay times at locations are not ﬁxed, a transition probability ρ is deﬁned to determineswitching from active state to inactive state (modelling stay and departure events of a node at a location).This induces variable-length active periods. Similarly, another transition probability q is deﬁned todetermine when a node switches from an inactive to an active state (modelling arrival of a node atlocation). A similar approach is taken to deﬁne link update dynamics in the graph. An active copy ofa node creates a link to a newly arriving neighbour node with probability p c at each time step until theactive copy expires. During an activation period, multiple neighbours can arrive at interaction locations.Thus, an activation degree probability P ( d ) is deﬁned to model the arrival of multiple new neighboursfor an active copy. The created links break (neighbour node leaves the interaction area) with probability p b at each time step.2.3 Network generation

Based on the above SPDT graph model deﬁnition, dynamic contact networks among a set of nodes canbe generated in various ways ranging from homogeneous to complex heterogeneous scenarios. Thesimplest homogeneous version of a model often provides useful insights about the studied processes,but cannot capture higher resolution interactions of the actual system. On the other hand, the heteroge-neous system is more representative at the cost of added complexity. Our proposed network generationprocess combines both approaches. To generate the SPDT contact network, we ﬁrst use geometric dis-tributions (because of their computational simplicity), and second a power-law distribution is added toallow heterogeneity in the generated networks. The SPDT graph contains a ﬁxed number N of nodesinteracting with each other over the observation period [ , T ] . The graph is implemented over discretetime with time step ∆ t . The link generation procedure in the SPDT graph model is implemented throughﬁve co-location interaction parameters (CIP) namely: active periods( t a ), waiting periods( t w ), activationdegree(d), link creation delay ( t c ) and link duration( t d ). The generation procedures are as follows:2.3.1 Node activation

Active copies of nodes are the key building blocks for constructing an SPDTgraph. The active node copies are created over time according to the active periods. Thus, it is ﬁrstrequired to generate active periods and intervening inactive periods. In the proposed model, determiningwhether a node will stay in the current state or transit into the other state at the next time step resemblesa Bernoulli process with two outcomes. Thus, the number of time steps a node stays in a state can beobtained from a geometric distribution. With the transitional probability ρ of switching from active toinactive state, the active period duration t a can be drawn from the following distribution as: Pr ( t a = t ) = ρ ( − ρ ) t − (2.1)where t = { , , . . . } are the number of time steps. Similarly, the inactive period duration, t w , with thetransition probability q can be drawn from the following distribution as: Pr ( t w = t ) = q ( − q ) t − (2.2)where t = { , , . . . } are the number of time steps.Knowing the active and inactive period duration, the graph generation process requires the initialstates of nodes to generate the active copies of nodes. The model’s state dynamics can be described witha two state Markov-process with transition matrix P = (cid:20) q − q ρ − ρ (cid:21) ODELLING CO-LOCATION NETWORKS π and π respectively, where π = ρ q + ρ and π = qq + ρ (2.3)If the initial state of node v is active, the ﬁrst active copy v is created for the time interval ( t s = , t l = t a ) . Otherwise, v will be created for the interval ( t s = t w , t l = t w + t a ) . Active copy creation continuesover the observation period and the corresponding interval ( t s , t l ) is deﬁned according to the drawn t a and t w . Active copies are generated for each node independently. The values of ρ and q are ﬁtted withreal data.2.3.2 Activation degree

The next step in the graph generation process is to deﬁne interactions ofneighbour nodes with an active copy. Multiple neighbour nodes can contact with an active copy. Thenumber of neighbour nodes interacting with an active copy is noted activation degree d . The value of d depends on the spatio-temporal dynamics of the graph and are drawn from a geometric distribution(Eq. 2.4) instead of ﬁnding the arrival times of neighbour nodes separately. Pr ( d = k ) = ( − λ ) λ k − (2.4)where k = { , , . . . } are the number of neighbour nodes of an active copy and λ is a scaling parameter.If the same parameter λ is assigned to all nodes in the network, it is a homogeneous contact network.However, individuals in the real world contact scenarios have heterogeneous propensity to interact withother individuals. This heterogeneity is incorporated into the model by selecting λ from a power lawdistribution of Equation 2.5: f ( λ i = x ) = α x − ( α + ) ξ − α − ψ − α (2.5)where α is the scaling parameter, ξ is the lower limit of λ i and ψ is the upper limit which is approx-imately 1. The value of λ i deﬁnes the range of variations of d for active copies of a node i and Equa-tion 2.4 ensures wide ranges for large values of λ . Combining geometric and power law distributions isknown to generate more realistic degree distribution [4] as is demonstrated in the model ﬁtting section.Given the degree distribution, a link creation delay t c , time gap between arrivals of host node and neigh-bour node ( t s − t (cid:48) s ) , and a link duration t d , the stay time of neighbour at the interacted location ( t (cid:48) s − t (cid:48) l ) ,are assigned to each link.2.3.3 Link creation

With the activation neighbour set, the graph generation process deﬁnes the arrivaland departure dynamics of neighbour nodes for each link created with an active copy. A similar approachto the deﬁnition of active and inactive periods creation with transition probabilities is adapted. Herethe assumption is that each link is created with probability p c at each time step during the life period ( t s , t l + δ ) of an active copy and is broken with probability p b after creation. For the link creation delay t c , the time gap between arrivals of host node and neighbour node ( t s − t (cid:48) s ) , the truncated geometricdistribution is used: P ( t c = t ) = p c ( − p c ) t − ( − p c ) t a + δ (2.6)where t = { , , , . . . , t a + δ } are the number of time steps and t a is the active period duration of cor-responding active copy. Truncation ensures that links are created within t l + δ , i.e. before the activecopy expires. In contrast, the link duration t d , the stay time of neighbour at the interacted location, of 32 M SHAHZAMAL does not have a speciﬁc upper bound and is generated for each link upon creation through a geometricdistribution: P ( t d = t ) = p b ( − p b ) t − (2.7)where t = { , , . . . } are the number of time steps. For simplicity, p b is set to ρ as both probabilitiesrelate to how long nodes stay at a location. Each link with an active copy, thus, has timing characteristicsas t (cid:48) s = t s + t c and t (cid:48) l = t (cid:48) s + t d . A link with t (cid:48) s (cid:62) t l is an indirect transmission only component. Link canalso have indirect component if t (cid:48) s < t l and t l > t (cid:48) l . The above graph generation steps capture the temporalbehaviour of SPDT links. The social mixing patterns are integrated by selecting the neighbouring node,as it is described below.2.3.4 Social structure

The underlying social structure of the generated graph depends on the selec-tion of a neighbour node for each link. The simplest way of selecting the neighbour node is to pick anode randomly. However, social network analysis has shown that the neighbour selection for creatinga link follows a memory-based process. Thus, the reinforcement process [13] is applied to realisticallycapture the repeated interactions between individuals. In this process, a neighbour node from the set ofalready contacted nodes is selected with probability P ( n t + ) = n t / ( n t + η ) where n t is the number ofnodes the host node already contacted up to this time t . On the other hand, a new neighbour node isselected with the probability 1 − P ( n t + η ) . Here, the size of the contact set, the number of nodes a nodecontact over the observation period, depends on the λ . This is because the system forces a host nodeto select a new neighbour when neighbours from the current contact set are already selected during anactive period even if P ( n t + ) is true. On the other hand, a new node is selected as a neighbour withthe probability 1 − P ( n t + ) . To maintain local clustering among nodes, µ proportion new neighbouris selected from the neighbours of neighbour nodes. In addition, when a node j is chosen as a newneighbour by node i in the heterogeneous networks, it is selected with the probability proportional toits λ j as nodes with higher λ will be neighbours to the more nodes [2]. This ensures nodes with higherpotential to create links also have a higher potential to receive links.

3. Materials and Methods

The performance of the proposed graph model is studied by comparing the network properties of thegenerated networks with the networks properties of the empirical SPDT graph and analysing its abil-ity to simulate SPDT diffusion. The SPDT model requires appropriate model parameters to generatecontact networks. We have used empirical GPS location updates from Shanghai city to estimate themodel parameters and location updates from Beijing city for evaluating the model, based on the datasetdescribed below.3.1

Data set

This study applies location update information from users of a social discovery network

Momo . TheMomo App enables users to interact with nearby users by sharing their current locations. Whenevera user launches the Momo app, the current location is forwarded to the Momo server. The serversends back the latest location updates of all users nearby. These location updates have been previouslycollected by the authors of [38] using a set of network API communicating with Momo server. The APIretrieved location updates every 15 minutes over a period of 71 days (from May to October 2012). The v and a neighbour user u (susceptible user), it is required to ﬁnd the arrival times ( t s , t (cid:48) s )and departure times ( t l , t (cid:48) l ) of two users. As the ﬁrst step, it is identiﬁed that an infected host user v isstaying at a location. Consecutive updates, X = { ( x , t ) , ( x , t ) , . . . ( x k , t k ) } where x i are the co-ordinatevalues and t i are the update times, from a user v within a radius of 20m (travel distance of airborneinfection particles [8]) of the initial update’s location x are indicative of the user staying within thesame proximity of x . A threshold is set for time difference of any two updates to 30 minutes to makesure an infected host remain within the same proximity, as longer gaps may indicate a data gap in theuser pattern. Then, the central co-ordinate in the update X is searched where the distances from eachupdate to all other updates are added together and the update x c with the minimum sum is selected ascentral co-ordinate. For the host user v , its visit to the proximity of x c will represent a valid visit ifa susceptible user u has location updates starting at t (cid:48) while v is present, or within δ seconds after v leaves the area. The user u should have at least two updates within 20m of x c to be valid to ensure thatit is in fact staying at the same proximity, and therefore can be exposed to the infected particles, ratherthan simply passing by. The stay period of host user v at the proximity of x c is ( t l = t k , t s = t ), where t k represents the end of the current stay period. If u ’s last update within 20m around x c is ( x (cid:48) j , t (cid:48) j ) , thecreated SPDT link has a link duration ( t (cid:48) l = t (cid:48) j , t (cid:48) s = t (cid:48) ) due to active visit ( t l = t k , t s = t ). All links toother users for this active visit ( t l = t k , t s = t ) are computed. Similarly, all visits made by v are searchedover the updates of 32 days and SPDT links are extracted. This process is executed for all users presentin the data set to ﬁnd all possible SPDT disease transmission links and provide a real contact networkwith SPDT links among users. An SPDT link is noted as e vu = (cid:0) v ( t s , t l ) , t (cid:48) s , t (cid:48) l (cid:1) which means that a user v visit a location during ( t s , t l ) where another user u is present for the duration ( t (cid:48) s , t (cid:48) l ) where t (cid:48) s (cid:62) t s . Eachlink between the two same users are distinguished by the time intervals ( t s , t l ) and ( t (cid:48) s , t (cid:48) l ) as there can bemultiple links to a neighbour for the same stay interval ( t s , t l ) of the host user.The above process is executed for all host users present in the data set to ﬁnd all possible SPDTdisease transmission links and provide a SPDT contact network of 338K users. This network includespossible direct and indirect transmission links due to direct and indirect co-location interactions amongusers. However, users appear in the system for on average 3-4 days and then disappear for the remainderof the simulation period. Thus, the link density in the network is sparse and the network is called SparseSPDT network (SDT). In this sparse SPDT network, infected individuals cannot apply their full potentialto infect other individuals due to absence from the networks and thus underestimate diffusion dynamics.Thus, we reconstruct a dense SPDT network (DDT) from this network repeating the links from theavailable days of a user to the missing days for that user [34, 39]. If a user has links for multiple days,a day will be randomly chosen and will be copied to a randomly chosen day where that user has nolinks. The resulting network is used as the empirical contact network that produces realistic diffusiondynamics. The DDT network is used for simulating SPDT diffusion.0 of 32 M SHAHZAMAL

Model parameters estimation

In the SPDT graph model, contact networks are generated based on the ﬁve co-located interactionparameters (CIP). We collect the CIP parameters from real contact networks which are created usinglocations updates of Momo users. Then, maximum likelihood estimation methods are applied to ﬁndthe model parameters.3.2.1

CIP data selection

We select the extracted links of seven days from Shanghai city to estimatemodel parameters. The seven-day data contains 2.53 million active periods from 126K users. Thedistribution of active period durations t a exhibits heterogeneous behaviour and is captured by long taildistributions. Then, the waiting periods t w are collected. However, the waiting period collection is notstraight forward like t a . This is because Momo users do not use the app regularly and many potentialvisits are not captured. If we assume that individuals in the system are activated at least once a dayand one actives at the end of the current day after his last activation at the beginning of previous days,the maximum waiting period will be up to 2 days. A new co-location interaction parameter calledactivation frequency h is deﬁned which measures the rate of activation of nodes to avoid the irregularityin the waiting period. This is measured with the number of active periods created by a Momo user ina day. The activation frequencies h are collected for all users over each day during the selected sevendays. This provides 0.5M activation frequencies for the 126K users. The activation frequencies withactive period durations model the inactive periods. As the activation frequencies are daily behavioursof Momo users, modelling active period and waiting period based on h avoids the impact of missingupdates. Next, the activation degrees are collected based on the number of other Momo users visiting alocation where the host user has been for the corresponding active period. Here, the users are requiredto visit a location within δ = t c and link duration t d arecollected for all links created over the seven days. These also capture all possible lengths of t c and t d and can be described by the long tail distributions.3.2.2 Goodness of ﬁt

Using the developed graph model, a synthetic SPDT contact network (GDT-1,generated SPDT network) is created with 126K nodes for 7 days applying the estimated model param-eters. Then, the CIP parameters of GDT-1 are compared with the parameters of real SPDT network(SDT) constructed by 7 days updates of 126K users from Beijing city. To understand the model’sresponse across network sizes, another synthetic SPDT contact network (GDT-2) of 0.5M nodes for 7days is generated. Results of 100 runs for each network are presented in Figures 6, 7 and 8 whereperiods are based on time steps of ∆ t = RSE = (cid:115) m ∑ i = ( x i − y i ) (3.1)where observed values are grouped in m bins as they are discrete, x i is the proportion of observationsfor the i th bin, y i is the proportion of empirical dataset values in the i th bin. As the RSE is computedfrom the proportion values, bins are naturally weighted so that bins representing larger proportions ofevents have higher contributions to the error calculation. Note that the bins up to the last bin of x i areconsidered assuming that the discarded values of y i provide very small error as the selected distributions ODELLING CO-LOCATION NETWORKS

11 of 32are long-tailed. The distributions of network CIP parameters are presented in log scale with RSE errorbars for GDT-1, GDT-2 and SDT in Figures 6, 7 and 8.3.2.3

Fitting activation parameters

In this model, the active periods t a are drawn from a geometricdistribution with the scaling parameter ρ . The value of ρ can be estimated using a sample data set of t a to the following MLE condition of geometric distribution:ˆ ρ = m ∑ ml = s l (3.2)where m is the size of the selected sample set. About m = . M real activation periods made by the126 K momo users over the 7 days are used here. This large set of active periods contains the activedurations from workdays and weekends. The above MLE Equation 3.2 estimates ˆ ρ = . t a for both networks GDT-1 and GDT-2 are shown in Figure 6 A. The RSEerror for GDT-1 is 0.0876 with standard deviation (STD) of 0.00041 while RSE for GDT-2 is verysimilar giving 0.08758 with STD of 0.00018. The model with ﬁtted parameters consistently generatethe active periods t a for nodes of both network sizes. The distribution of active periods is also supportedby the ﬁndings of other studies where they found the contact duration is broadly distributed [11, 26]. Inthese studies, contact duration means staying at a location.The activation periods t a have similar patterns for all individuals. However, the inactivation periods, t w waiting time between two active periods, depends on how frequently individuals visit public places.Thus, the waiting periods t w are ﬁtted based on the ﬁtting of activation frequencies h . The number oftransition events from inactive to active states or active to inactive states will be the activation frequencyduring an observation periods. According to Equation 2.3, the probability of transition event 0 → p = ρ qq + ρ Thus, the number of transition events 0 → z time steps (here number of time steps in a day)represents the number of activation events h . The probability of h activation for a node is given by thebinomial distribution as: Pr ( h | q ) = (cid:18) zh (cid:19) (cid:18) ρ qq + ρ (cid:19) h The term ρ qq + ρ becomes small as ρ = . z is usually large and q <

1. Thus, the above equationcan be approximated by a Poisson distribution as: Pr ( h | q ) = (cid:16) z ρ qq + ρ (cid:17) h e − z ρ qq + ρ h ! (3.3)The MLE condition for the Equation 3.4 is derived in the Appendix as qz ρ q + ρ = l l ∑ i = h i (3.4)In the selected Momo data set, the users are not present every day. Thus, it is assumed that the num-ber of activations made during a day is the representation of the activation frequency. As the focus is to2 of 32 M SHAHZAMAL Active peroid t a P D F ( t a ) (A) SDTGDT-1GDT-2 Waiting peroid t w P D F ( t w ) (B) SDTGDT-1GDT-2 Activation frequency h P D F ( h ) (C) SDTGDT-1GDT-2 F IG . 6. Fitting activation CIP parameters. The CIP parameters of generated networks GDT-1, having same number of nodes ofDDT network, and GDT-2, having 0.5M nodes, are compared with the CIP of DDT network: (A) active period t a , (B) waitingperiod t w and (C) activation frequency h . The period is the number of time steps and h is the number of activation in a day model the waiting time with a geometric distribution, it makes sense to activate the node with variablefrequency over the observation days. The day activation frequencies of 126 K users are collected overthe seven days. Applying the activation sample set h = { h , h , . . . h r } of size r=300K to MLE Equa-tion 3.4 gives ˆ q = . t w withRSE around 0.102 in both networks. The obtained STD of error is 0.01. The t w is characterized by theirregularity of using Momo Apps by users. Activation degree d P D F ( d ) (A) HomogeneousSDTGDT-1GDT-2 10 Activation degree d P D F ( d ) (B) Heterogeneous SDTGDT-1GDT-2 F IG . 7. Fitting activation degree CIP of generated networks GDT-1 and GDT-2 with DDT network: (A) nodes are homogeneous- having same contact potential and (B) nodes are heterogeneous - having different contact potential Fitting activation degree

The activation degree is drawn from the geometric distribution withthe scaling parameter λ . If we assume that nodes in the network are homogeneous, λ can be estimatedusing the following equation ˆ ρ = m ∑ ml = d l (3.5) ODELLING CO-LOCATION NETWORKS

13 of 32The activation degree sample set d = { d , . . . , d m } of size m=518K are used to the MLE equation. Theestimated value for ˆ λ is 0.32. The generated activation degree distributions for GDT-1 and GDT-2 andcomparison with real one are presented in Figure 7 A. The RSE error is 0.017 with STD of 0.004. Theﬂuctuating error at the tail is due to data sparsity and has little effect on the RSE.When nodes in the network has heterogeneous propensity to contact the other individuals, the dis-tribution of d in the network will be given for any λ as: Pr ( d ) = βξ β − (cid:90) ξ ( λ d − β − − λ d − β − ) d λ = βξ − β − (cid:32) − ξ d − β − d − β − − − ξ d − β d − β (cid:33) (3.6)The MLE condition for the Equation 3.6 is derived in the Appendix which provide the following twoequations. The MLE condition for β assuming ψ ≈ = m β − m ξ β ln ξξ β − + n ∑ k = ξ dk − β − ln ξ d k − β − − − ξ dk − β − ( d k − β − ) − ξ dk − β ln ξ d k − β − − ξ dk − β ( d k − β ) − ξ dk − β − d k − β − − − ξ dk − β d k − β (3.7)where m is the length of the data set d = { d , . . . , d m } . Then the lower limit of power law distribution isestimated with the following MLE equation.0 = m β ξ − β − ξ − β − + m ∑ k = ( d k − β ) ξ dk − β − d k − β − ( d k − β − ) ξ dk − β − d k − β − − ξ dk − β − d k − β − − − ξ dk − β d k − β (3.8)As the MLE equations do not have closed form solutions, the python module called fsolve is used toestimate the best value of ξ and β . We assume that β = . ξ = .

26 using the MLE Equation 3.8.Using ξ = .

26 in Equation 3.7, the value of β is estimated as 2.963. The generated activation degreedistributions with the heterogeneous potential are presented in the Figure 7. The RSE error is 0.0152with STD of 0.004. There are few values with high degree and that can happen in the real networks.3.2.5 Fitting link creation parameters

The link creation delay t c is drawn with the truncated geomet-ric equation. The MLE condition is not a straight forward as the MLE of geometric equation. The MLEfor the truncated equation is derived as follows:0 = p c n ∑ k = ∑ ml = ( t kc − )( − ( − p c ) tla + δ )+( t la + δ )( − p c ) tla + δ ( − p c )( − ( − p c ) tlta + δ ) ∑ ml = (( − ( − p c ) t la + δ ) − where t c , t c , . . . , t nc are sample set of size n = . M and t a = { t a , . . . , t ma } with m = p c is 0.02. The generated link creation delays are presented in the Figure 8A, where the gen-erated t c have RSE of 0.009 in comparison with the real distribution. The errors was consistent in bothGDT-1 and GDT-2. The value of ρ = . M SHAHZAMAL Link creation delay t c P D F ( t c ) (A) SDTGDT-1GDT-2 10 Link duration t d P D F ( t d ) (B) SDTGDT-1GDT-2 F IG . 8. Fitting link creation CIP of DDT network with the generated networks GDT-1 and GDT-2: (A) link creation delay t c and(B) link duration t d . The values of t c and t d are given with the number of time steps the link duration t d . Figure 8B presents the comparison of generated link duration with real durationwhich has RSE error of 0.022 with STD 0.004. The six SPDT network parameters of the generatednetwork are a good ﬁt to the real network parameters built by Momo users. The variations for gener-ated networks are low and consistent in both GDT-1 and GDT-2 networks. With the estimated modelparameters, synthetic contact networks are generated and their properties are explored to understand theperformance of the model.

4. Results and discussion

We now validate the SPDT graph model by analyzing the generated network properties and the capa-bility of simulating SPDT diffusion process on it. We ﬁrst analyse the network properties to understandthe capability of the graph model to generate properties of the real graph. Then, the diffusion dynamicson the generated networks are analyzed to assess its ability to replicate the diffusion dynamics of thereal contact networks. The previous section tuned the model’s parameters using the location updatesfrom Shanghai. This section uses the location updates from Beijing to compare the network propertiesand diffusion dynamics. The SPDT graph properties and diffusion dynamics are studied generatingnetworks with both homogeneous and heterogeneous degree distributions.4.1

Network Properties

The previous section has focused on estimating co-location interaction parameters (CIP) to ﬁt the SPDTgraph which is based on the activity-driven network modelling to empirical data. This section exploresthe ﬁtted model’s ability to reproduce the static and temporal properties of empirical networks.4.1.1

Static Properties

The SPDT contact networks generated by the developed graph model areconverted to static contact networks where a directed edge between two nodes is created if they have atleast an SPDT link from the host node to neighbouring node at any time over the observation period.Then, two static network metrics, degree centrality and clustering coefﬁcients, are studied. The degreecentrality quantiﬁes the extent of a node’s connectedness to other nodes [7], i.e. how many nodesa host node contacts during the observation period. In a disease spread context, nodes with higher

ODELLING CO-LOCATION NETWORKS

15 of 32 (a) (b) (c) (d) F IG . 9. Triangles for various SPDT link conﬁguration among nodes degree centrality get infected quickly as well as infect a higher number of other nodes [5]. Whiledegree centrality highlights the node connectivity, the clustering coefﬁcient represents how a node islocally connected with the neighbour, i.e. the local social contact structure of nodes in the network [17].The local clustering coefﬁcient is deﬁned as the ratio of the number of triangles present among theneighbours and the possible maximum triangles among neighbours. If a node is already infected in acluster, it is not infected further by other nodes of the same cluster. Thus, the spreading varies basedon the clustering coefﬁcient and the realistic clustering coefﬁcient is required in the generated network.To compute the clustering coefﬁcient, the directions of links are neglected. However, the triangles stillpreserve the social structure formed by friends of friends as shown in Figure 9. All the triangles formedwith directional links are the same as the triangle made by Figure 9. Thus, the generated directednetwork is converted into an undirected network for computing clustering coefﬁcients.In the developed model, the growth of the contact set of a node is determined by λ and the neighbourselection process deﬁned by P ( n t + ) = n t / ( n t + η ) . The value of η controls the degree to whichnodes expand their contact set size. In this study, η is empirically set to 0.1, and the selection ofan optimal value of η is beyond the scope of this paper. The growing of contact set size dependson λ which is the same for all nodes in the homogeneous network (GDH) and varies across nodesin the heterogeneous network (GDT). However, setting η = . η increases, the contact set sizes can be larger. We have also generatedsynthetic network (GDH) with homogeneous degree distribution. The degree distribution in the GDHnetwork follows the Poisson distribution as the activation degree in the GDH network is generated witha geometric distribution and with the small parameter value. Therefore, the homogeneous intensity toinvolve in interaction with others can not capture the real contact set size distribution.In the developed graph, local clustering is formed with selecting µ proportion neighbours from theneighbours of neighbouring nodes. The clustering coefﬁcient for two values µ = µ = . M SHAHZAMAL Out-degree10 P r o p o r t i o n (A) Out-degree centralityDDTGDHGDT 10 In-degree10 P r o p o r t i o n (B) In-degree centrality DDTGDHGDT10 Clustering Co-efficient10 P r o p o r t i o n (C) Clustering Co-efficient, = 0.00 DDTGDHGDT 10 Clustering Co-efficient10 P r o p o r t i o n (D) Clustering Co-efficient, = 0.40 DDTGDHGDT10 20 30Day0246 A v e r a g e o u t - d e g r ee (E) Daily average out-degreeDDTGDHGDT 0 10 20 30Day0.000.050.100.15 A v e r a g e c l u s t e r i n g c o - e ff (F) Daily average clustering co-effDDTGDHGDT F IG . 10. Static network properties in the real contact network DDT, homogeneous contact network GDH and heterogeneouscontact network GDT: A) out-degree centrality, B) In-degreee centrality, C) Clustering co-efﬁcient with µ =

0, D) clusteringco-efﬁcient with µ = .

40, E) Daily average out-degree, and F) Daily average clustering co-efﬁcient

Figure 10D where nodes are binned based on their clustering coefﬁcient with a step of 0.01. If µ is set tozero, the clustering coefﬁcient is low. The clustering coefﬁcient in the GDH network is 0.005 and 353Knodes are in the ﬁrst bin, i.e 353k nodes having clustering coefﬁcient below 0.01 while 93K nodes haveclustering coefﬁcient below 0.01. The clustering coefﬁcient improves a bit in the GDT network withlong-tailed activation degree distribution where 307k nodes having a clustering coefﬁcient below 0.01 ODELLING CO-LOCATION NETWORKS

17 of 32and average clustering coefﬁcient in the GDT with µ = µ is set to 0.4 which is foundanalysing Momo data set. With this value, the GDT network improves signiﬁcantly to form realisticclustering coefﬁcient and the number of nodes in the ﬁrst bin is 45K. However, the clustering coefﬁcientis still not realistic in the GDH networks and have similar trends of µ =

0. The heterogeneous degreedistribution contributes to making the local clustering as many lower degree nodes can create localcluster through the higher degree nodes connect.This experiment also studies the average daily degree and clustering coefﬁcient of nodes. This ismeasured converting the everyday dynamic graph as a static graph where a pair of nodes is linked if oncethey have contacted. This can indicate how many other nodes a node contact during each day and whatis their clustering coefﬁcient. The results are presented in Figure 10E and Figure 10F. It is observedthat nodes in DDT network contact on average 4 nodes which are around 5 nodes in the GDT network.As the contact set size grows in the GDT network, it has a high average contact degree over the day.The GDH network has the highest average daily degree. This is because nodes in the GDH networkhave moderate activation degree and do not have many nodes with the low degree as it is in the DDTand GDT networks. The daily average clustering coefﬁcient is low in the GDH network. The averagedaily clustering coefﬁcient in the GDT network is similar to DDT network at the early days. However,it decreases as simulation time goes. This is because the contact set sizes grow in the GDT networkand nodes have more options to select neighbour nodes. Therefore, common neighbours among nodesdecrease and hence clustering coefﬁcients are low.4.1.2

Temporal Properties

The proposed graph model generates the temporal activities of nodesthrough SPDT link creation. The temporal properties should match those of real networks. Two com-monly used temporal network central properties to characterise the network: closeness and betweennessare studied. The node closeness measures the geodesic distance between all pairs of nodes in the net-work. The node betweenness is measured as the fraction of geodesics distances, which is connectedbetween all node pairs in the network, passes through a node [9]. The generated network is convertedto the temporal network G T = ( Z , L T ) where Z is the set of nodes and set L T ⊆ V × Z × [ , T ] of time-stamped links ( u , v ; t ) ∈ L T . In airborne disease propagation, susceptible individuals who get infecteddo not start infecting immediately and go through an incubation period before they can infect others. Inthis study, the incubation period is set to one day, i.e. individuals that become infected on a simulationday can start spreading the disease the next day after infection. Thus, all links of a day between a pairof nodes are aggregated over the day and the timestamp t j represents the interaction day j . If a diseaseis transmitted from node u to node v through a link ( u , v ; t j ) , the disease will be transmitted from node v to another node w with the link ( v , w ; t j + k ) where 1 (cid:54) k (cid:54) u to a susceptible node v may create through asequence of time-stamped links as follows: ( w , w ; t ) , ( w , w ; t ) , . . . , ( w l − , w l ; t l ) where w = u , w l = v and the sequence of time-stamped links should be t (cid:54) t (cid:54) . . . (cid:54) t l . The timedifference between two consecutive links of a time-respecting path can be at most 5.In the temporal network, the betweenness of a node is deﬁned as the number of shortest time-respecting paths pass through it. However, the temporal betweenness depends on the starting time t of time-respecting paths as the path length between the same pair of nodes may change if the starting8 of 32 M SHAHZAMAL BC t P r o p o r t i o n (a) BC - SNBC - RN CC t P r o p o r t i o n (b) CC - SNCC - RN F IG . 11. Temporal network properties in the generated contact network and comparison with that of real contact network : A)Betweenness centralities and B) Closeness centralities time is shifted. Thus, shortest paths between a pair of nodes are computed at all starting times and theshortest paths regardless of starting time are counted as the set to measure betweenness. The temporalbetweenness centrality BC t of an node v is measured as: BC t ( v ) = ∑ u (cid:54) = v (cid:54) = u | Q t ( u , w ; v ) | where Q t is the set of shortest time-respecting paths between u and w passing through v starting anytime during observation period [27]. In a similar way, the temporal closeness centrality CC t based onthe shortest time respecting paths is deﬁned as: CC t ( v ) = ∑ u (cid:54) = v dist t ( u , v ) where dist t is the distance of the shortest time-respecting path from u to v starting at any time during theobservation period. The temporal network analysis methods proposed by the authors of [27] are used inthis experiment. The generated networks of 32 days are used and the results obtained are presented inFigure 11 for both the real network (RN) and synthetic network (SN). The betweenness distributions of BC t and CC t have similar trends. However, the synthetic network has a higher number of nodes with alower BC t . This is because the synthetic network is more random than the real network. In the syntheticnetwork, some nodes may have higher activation degree as it depends on the power law distributed λ i and hence have higher CC t . The synthetic network overall produces similar network properties of thereal network.4.2 Diffusion dynamics

We now focus on the capability of simulating the SPDT process by the generated networks. Accordingly,airborne disease spreading is simulated on the generated synthetic contact networks and the real contactnetworks. We ﬁrst study the similarities in diffusion dynamics on various networks and then evaluatethe model’s parameter sensitivity. The experimental setup and diffusion analysis are presented below.

ODELLING CO-LOCATION NETWORKS

19 of 324.2.1

Experimental setup

For the real contact networks, the DDT contact networks constructed inSection 3 are selected since this network has shown realistic diffusion dynamics due to dense linkdensity [30]. In addition, the developed model is expected to generate realistic diffusion dynamics.Homogeneous (GDH) and heterogeneous (GDT) synthetic contacts networks among 364K nodes aregenerated for 32 days as the DDT network contains 364K nodes and 32 days period. The simulationsare also conducted on another synthetic graph constructed according to the basic activity driven networks(ADN) model [22] to understand how well the proposed model captures diffusion dynamics comparedto the current graph models that are only based on direct transmission links. In this graph, nodes activateat each time step ∆ t with a probability φ and generate m links to others nodes. The probabilities ϑ ofactivation of nodes are assigned with a power law degree distribution. The number of links m createdduring activation can be constant for all nodes or heterogeneous. In this study, both types of m areused and homogeneous (BDH) and heterogeneous (BDT) basic ADN networks with homogeneous andheterogeneous degree distribution are generated. For ﬁtting activation potential ϑ , the daily activationfrequencies h of Momo users are used. The analysis of Momo users shows that their average stayperiods ∆ t are 50 minutes. Therefore, the number of time steps in one day is 28.8 and the potential is ϑ = . / h . Thus, the activation frequencies are converted to activation potential ϑ and ﬁtted with apower law degree distribution with a lower limit of ϑ is 0.02 and an upper limit of 0.18 while power lawexponent is 2.95. For the activation degree, it is found that the average activation degree is m =

3. Forheterogeneous degree distribution, the activation degree distribution from the developed SPDT graphmodel are being used. With these ﬁtted parameters BDH and BDT networks are also generated for364K nodes over 32 days.For propagating disease on the selected contact networks, we consider a generic Susceptible-Infected-Recovered (SIR) epidemic model. In this model, nodes remain in one of the three compartments,namely, Susceptible (S), Infectious (I) and Recovered (R). If a node in the susceptible compartmentreceives a SPDT link from a node in the infectious compartment, the former is subject to exposure E l ofinfectious pathogens for both direct and indirect transmission links according to the following equation E l = gpV r (cid:104) r (cid:0) t i − t (cid:48) s (cid:1) + e rt l (cid:16) e − rt i − e − rt (cid:48) l (cid:17)(cid:105) + gpV r (cid:16) e − rt (cid:48) l − e − rt (cid:48) s (cid:17) e rt s (4.1)where g is the particle generation rate of infected individual, p is the pulmonary rate of susceptibleindividual, V is the volume of the interaction area, r is the particles removal rates from the interactionarea, t s is the arrival time of infected individual, t l is the leaving time of infected individual, t (cid:48) s is thearrival time of susceptible individuals and t (cid:48) l is the leaving time of susceptible from the interactionlocation and t i is given as follows: t i = t (cid:48) l when SPDT link has only direct component, t i = t l if SPDTlink has both direct and indirect components, and otherwise t i = t (cid:48) s . If t s < t (cid:48) s , t s is set to t (cid:48) s for calculatingappropriate exposure [30]. If a susceptible individual receives m SPDT links from infected individualsduring an observation period, the total exposure E is E = m ∑ k = E kl (4.2)where E kl is the received exposure for k th link. The probability of infection for causing disease can bedetermined by the dose-response relationship deﬁned as P I = − e − σ E (4.3)where σ is the infectiousness of the virus to cause infection [6]. This value depends on the diseasetypes and even virus types. If the susceptible individual move to the infected compartment with the0 of 32 M SHAHZAMAL probability P I , it continues to produce infectious particles over its infectious period τ days until theyenter the recovered state, where 1 / τ is the rate of recovering from the disease. In this model, no eventof births, deaths or entry of new individual are considered.The simulations are step forwarded in our experiments with one day interval [34, 39]. We chosean initial set of 500 seed nodes randomly in each experiment to start simulations assuming that it willbe capable of showing the full epidemic curve in the studied simulation duration. During each day ofdisease simulation, the received SPDT links for each susceptible individual from infected individualsare separated and infection causing probabilities are calculated by Eqn. 4.3. The volume V of proximityin Eqn. 4.2 is ﬁxed to 2512 m assuming that the distance, within which a susceptible individual caninhale the infectious particles from an infected individual, is 20m and the particles will be available up tothe height of 2m [6, 8]. The other parameters are assigned as follows: particle generation rate g = . q = . r = b to Eqn 4.2 where b is particle removal time randomly chosen from [7.5-300] min given a median particleremoval time r t . The parameter σ is set to 0.33 as the median value of required exposures for inﬂuenzato induce disease in 50% susceptible individuals is 2.1 PFU [3]. Susceptible individuals stochasticallyswitch to the infected states in the next day of simulation according to the Bernoulli process with theinfection probability P I (Eqn 4.3). Individual stays infected up to τ days randomly picked up from 3-5days maintaining ¯ τ = I p , number of infected nodes in the network, and cumulative infection I a ,number of the infected individuals up to a day are calculated. To characterize the diffusion reproductionability, the absolute percentage variation (APV) for infection events are calculated for each networkwith the reference of real contact networks as: APV = × I r − I o I r where I r is the number of infection events in the real network and I o is the infection event in the corre-sponding observed graph. For the both I p and I a are measured for each network.4.2.2 Diffusion analysis

The selected four networks (BDH, BDT, GDH and GDT) are generated with364K nodes for 32 days and simulations are run with the selected disease parameters. The infectionprobability P I in Equation 4.2 is calculated with particle decay rates r t =

60 min. The average resultsfor 1,000 realisations are shown in Figure 12. The proposed graph model with heterogeneous degreedistributions (GDT) has diffusion dynamics that are close to real contact networks (DDT). The GDTnetwork has higher disease prevalence I p than the DDT network (Fig. 12A). This is because nodes inthe GDT network grow their contact sizes substantially compared to DDT network where links betweentwo nodes are repeated. Thus, infected nodes in the DDT network repeatedly forward SPDT linksto the same susceptible nodes and disease propagates faster in the DDT network at the early days ofsimulations. On the other hand, infected nodes in the GDT networks forward links to more susceptiblenodes and disease transmission happen slowly in the GDT network. Thus, the initial spreading rates inthe DDT network is faster than the GDT network. For the same reason, the GDT network carries thegrowth of I p for a longer time and to a higher value. The total infection in GDT network is also closeto the total of the real network (Figure 12B). On the other hand, the BDT networks show increasingof I p . But, the diffusion dynamics are underestimated signiﬁcantly and total infection is comparativelylow. This is because the BDT network does not have indirect transmission links. Thus, the growth ODELLING CO-LOCATION NETWORKS

21 of 32

Table 1. Summary of diffusion dynamics on contact networks

Networks r t I p (max) I a (max) average DV (%) average AV (%)DDT 60 1731 11221 0 040 1332 8671 0 020 966 6460 0 0GDT 60 2094 11929 15 1040 1531 9584 18 1220 987 6853 22 13BDT 60 1007 7809 29 3740 673 5666 30 3820 335 3170 45 44GDH 60 698 3600 61 5040 626 2110 69 5320 586 1592 70 51BDH 60 600 1892 73 6240 581 1645 73 5920 562 1386 72 55of I p is not carried out in this network as it is in the DDT and GDT networks. On the other hand,homogeneous networks cannot replicate the diffusion dynamics of real contact networks. Initially, thedisease prevalence I p increases for a few days in the GDH network and declines. A similar trend is foundfor the BDH network. This is because the homogeneous networks do not have higher degree nodes thathave strong disease spreading potential and forward disease to different points of the network. In thehomogeneous networks, the neighbour nodes quickly get infected and infection resistance force growsdue to reducing susceptible neighbour nodes. The prediction variations in APV for all networks areshown in the Figure 12 C and D. The GDT network shows the lowest variation in daily predictionsamong all networks compared to the DDT network. This has on average 15% variation while the BDTnetwork shows on average 29% variation. However, the variations are high for the BDT network withthe maximum variation being 60%. The accumulative prediction variation in the GDT network has anaverage of 10%, while this is very high for the BDT networks at 37%. For the other networks, the GDHnetwork has on average 61% daily variation and BDH network has 70% variation. Though it has ahigher variation in daily prediction, the proposed model still maintains a cumulative prediction variationaround 10%. The proposed graph model is up to 27% better than current models for predicting diseasespread dynamics.Now, the model’s sensitivity to the various diffusion parameters is studied. The SPDT model ishighly sensitive to the particle decay rates r t . Thus, disease spreading is simulated for three differentvalues of r t = { , , } min for all networks and results are presented in Figure 13. The homogeneousnetwork GDH can model the variation in diffusion dynamics with r t . However, it underestimates thediffusion at all r t . The prediction variations are increased with r t where the average prediction variationreaches to 70% with the average accumulative variation of 51%. The other homogeneous network BDHcannot vary the diffusion with r t , as this network has only direct transmission links. The BDH networkalways has more than 72% variation in daily prediction for any value of r t . The proposed heterogeneous2 of 32 M SHAHZAMAL

DDT GDT GDH BDT BDH

Day D i s e a s e p r e v a l e n c e ( I p ) × (A) 0 10 20 30 Day A cc u m u l a t i v e i n f e c t i o n ( I a ) × (B)0 10 20 30 Day A P V ( % ) (C) Variation in daily predictions Day A P V ( % ) (D) Variation in accumulative predictons F IG . 12. Diffusion dynamics on various contact networks: A) disease prevalence dynamics, B) accumulative infection oversimulation days, C) variations in the daily prediction comparing to real contact network and D) variation in the accumulativeprediction. network GDT shows increasing daily prediction variation. However, it is still 23% better than the BDTnetworks at particle decay rates r t =

20 min. The model also maintains the same 32% improvement inthe cumulative prediction variations at r t =

20 min compared to the GDT model. Although the predictionvariation increases with the raising of particle decay rates, the model still predicts well compared to othermodels.The diffusion dynamics are also inﬂuenced by the properties of contagious items. Therefore, themodel should respond well to varying the properties in the contagious items as well. Now, we varythe infectiousness σ and infectious period τ are varied keeping r t =

60 min constant to analyse howthe generated network responds. The results are presented in Figure 14. The disease prevalence getsstronger in all networks for increasing infectiousness from σ = .

33 to from σ = .

40. However, theincrease in I p are different in different networks. The variation in the GDT network follow the sametrends of the DDT network while the remaining networks deviate signiﬁcantly. In the GDT network,the average daily variation in I p compared to that of DDT network is 13%. For the other networks,the average daily variations in I p increases for increasing σ = .

40 and the maximum is 70% in the

ODELLING CO-LOCATION NETWORKS

23 of 32DDT GDH BDH GDT BDT r t = r t = r t = D i s e a s e p r e v a l e n c e ( I p ) × (A) 0 5 10 15 20 25 30Day0.00.51.01.52.0 D i s e a s e p r e v a l e n c e ( I p ) × (B)0 5 10 15 20 25 30Day0.00.51.01.52.0 D i s e a s e p r e v a l e n c e ( I p ) × (C) 0 5 10 15 20 25 30Day0.00.51.01.52.0 D i s e a s e p r e v a l e n c e ( I p ) × (D) F IG . 13. Sensitivity analysis of the contact networks with various particle decay rates r t . The diffusion dynamics of DDT networkat various r t are compared with that of the generated networks: A) GDH network, B) BDH network, C) GDT network and D)BDT network GDH networks. Therefore, the GDT network responds similarly to DDT network in changing σ . Thesimilar trends are found for the increase in infectious period τ = I p for GDT network is 12% while all others networks show higher average daily variation in I p with amaximum in BDH network. The overall responses of GDT network for changing in the properties ofcontagious items follow the trends of the DDT network.To understand the response of the graph model for larger scale simulation with large network sizes,the simulations are also run on various sizes of the networks. Two dimensions are considered here:increasing the length of the simulation period and number of nodes. In this experiment, only heteroge-neous networks are considered as they assume more realistic diffusion dynamics. For this study, a GDTnetwork and a BDT network with 364K nodes are generated for 42 days period. Then, the real contactnetwork DDT is extended to a network of 42 days repeating all links of a day randomly picked up from32 days and placing to a random day within days 33 to 42. The results are presented in Figure 15. The4 of 32 M SHAHZAMAL

DDT GDH BDH GDT BDT σ = . σ = . τ = D i s e a s e p r e v a l e n c e ( I p ) × (A) 0 5 10 15 20 25 30Day0.00.51.01.52.02.53.0 D i s e a s e p r e v a l e n c e ( I p ) × (B)0 5 10 15 20 25 30Day0.00.51.01.52.02.53.03.5 D i s e a s e p r e v a l e n c e ( I p ) × (C) 0 5 10 15 20 25 30Day0.00.51.01.52.02.53.03.5 D i s e a s e p r e v a l e n c e ( I p ) × (D) F IG . 14. Sensitivity analysis of the contact networks with disease parameters infectiousness σ and infectious period τ . Thediffusion dynamics of DDT network are compared with that of the generated networks: A) GDH network, B) BDH network, C)GDT network and D) BDT network diffusion dynamic is consistent in the GDT network. We have found the similar amount of variation in I p to the previous experiments. The total infection in the GDT network becomes almost the same at theend of the simulation period. However, the BDT network keep continuing underestimation of diffusiondynamic and total infection as that was in the previous experiments. Thus, our GDT network performbetter even with longer simulation period. The model response is also analyzed generating the networkwith a large number of nodes. For these experiments, the GDT and BDT networks are selected andgenerated with 0.364M, 0.5M and 1M nodes for 32 days. The simulations are run with the same pro-portion of seed nodes and results are presented in the Figure 16. The results show that the GDT networkbehaves consistently for diffusion dynamic and accumulative diffusion while BDT network shows somevariations for increasing node numbers. The generated contact network GDT by our model can capturethe characteristics of DDT network efﬁciently than the other graph models. ODELLING CO-LOCATION NETWORKS

25 of 32

Day D i s e a s e p r e v a l e n c e ( I p ) × (A) DDTGDTBDT

Day C u m u l a t i v e d i s e a s e ( I a ) × (B) DDTGDTBDT F IG . 15. Sensitivity of the model with observation period. The diffusion dynamics of DDT network are compared with that of thegenerated networks: A) disease prevalence dynamics and B) accumulative infection over simulation days GDT BDT 0.364M 0.50M 1.0M P r o p o r t i o n o f n o d e s i n f e c t e d × (A) Disease prevalence 0 5 10 15 20 25 30Day0123 P r o p o r t i o n o f n o d e s i n f e c t e d × (B) Accumulative disease F IG . 16. Sensitivity of the graph model with network sizes. The diffusion dynamics of DDT network are compared with that ofthe generated networks: A) disease prevalence dynamics and B) accumulative infection over simulation days

5. Conclusion

In this paper, a novel generative graph model is developed and analysed to support the study of SPDTdiffusion dynamics. The developed SPDT graph can capture both direct and indirect interactions formodelling the spread of contagious items. The synthetic dynamic contact networks generated by thisgraph model are capable of capturing SPDT dynamics, applying reinforcement for repetitive interactionsand heterogeneous propensity to engage in interactions. This paper has demonstrated how the modelcan be generated based on empirical data from a large social networking application and reproduce bothnetwork properties and diffusion dynamics of empirical contact networks. The developed model shows30% higher efﬁciency to generate the diffusion dynamics of the real SPDT contact networks. Analysisof the sensitivity to the SPDT diffusion model parameters has demonstrated that the SPDT graph modelresponds similarly to that of the real SPDT graph. The model is also consistent for a longer simulation6 of 32

M SHAHZAMAL period and large network size.The previous studies show that contact degree distribution follows a power law. In the SPDT graphmodel, there are two types of contact degree distribution: activation degree and overall contact set sizeover the observation period. While both degree distributions follow the power law, we ﬁnd that usingthe homogeneous degree distribution leads to an underestimation of the spread as diffusion is not drivenby the higher degree individuals. This is because the disease spreading force is accelerated graduallyby the higher degree nodes, which is not reﬂected in homogeneous networks. ADN networks withpower law degree distribution have also not shown similar diffusion dynamics as they can not coverthe distribution of indirect links. The integration of active copies of nodes in our model facilitates theinclusion of indirect links and can better mimic the real network.The clustering co-efﬁcient is also changed in the network by adding indirect links. The neighbourselection mechanism controls the growth of clustering co-efﬁcient. Survival of the active copy forindirect transmission period can create links and maintain the local clustering co-efﬁcient. The stay timeof an individual in a location is found to be long tailed distribution and the developed model for staytime captured the distribution of stay time very closely with simple modelling parameters of transitionprobability.The developed SPDT graph model can be extended for other applications as well. For modellinguser activities across multiple social networks, the concept of active copies can be exploited to generateunderlying interaction networks. We can create an active copy of a node when they live in an onlinesocial network (OSN) and the active copies expire when a user leaves the OSN. Moreover, the concurrentexistence of active copies can represent concurrent activities of a user in different OSN [15, 42]. Thegraph model can also be applied to study queen message dissemination in social ant-colonies whereants spray a chemical encoded with the queen message and the message is transferred to the ants thatencountered the chemical [24]. Vector borne disease (transmitting via mosquitoes, ﬂies and mites etc.)spreading can be studied using the SPDT model as well. For example, if a virus is transferred froman infected individual to mosquitoes, the virus can be transferred to the susceptible individuals whenthe infected individual moves to the other locations [1]. Similarly, propagation of mobile malware canbe explored by the SPDT graph model where a malicious mobile phone worm can be transferred tothe Wi-Fi router and this can be transferred to the susceptible phones when the infected phone is notpresent [37].There are several potential and interesting future research directions on the SPDT graph modelling.In the current SPDT graph, we have only considered that the indirect transmission links can be createdfor the three hour period as the signiﬁcant infectious particles can persist up to three hours. However, theindirect links can be created for a longer period for other applications. It is interesting to explore how tocreate indirect links for any period of times and how it interacts with the decay rates of infectious items.We have studied the network properties such as degree distributions, clustering co-efﬁcient and temporalproperties through simulations. It would be interesting to ﬁnd the network properties analytically aswell as the analytic diffusion dynamics. The community structure and sub-graphs also play vital rolesin the modelling of diffusion processes on the contact networks. This could be another interestingresearch direction where we can investigate how the community structure and sub-graphs are changedfor including the indirect links and what is their impact on diffusion dynamics.

Acknowledgment

This study was supported by the Australian Research Council Grant (ARC - DP170102794) and Com-monwealth Scientiﬁc and Industrial Research Organisation (CSIRO). Md.S. was partially supported by

ODELLING CO-LOCATION NETWORKS

27 of 32the CSIRO and ARC- DP170102794. B.M. was partially supported by the ARC-DP170102794. R.J.and F.d.H. were supported by the CSIRO. The authors gratefully acknowledge the Distributed SensingSystem Group, Data61, CSIRO for providing research facilities for this research.R

EFERENCES Adams, B. & Kapan, D. D. (2009) Man bites mosquito: understanding the contribution of human movementto vector-borne disease dynamics.

PloS one , (8), e6763. Alessandretti, L., S., K., B., A. & P., N. (2017) Random walks on activity-driven networks with attractiveness.

Physical Review E , (5), 052318. Alford RH, Kasel JA, G. P. & V, K. (1966) Human inﬂuenza resulting from aerosol inhalation.

Proceedingsof the Society for Experimental Biology and Medicine , (3), 800–804. C., S. & Murthy, C.and P., S. K. (2014) Fitting truncated geometric distributions in large scale real worldnetworks.

Theoretical Computer Science , . De Arruda, G. F., B., A. L., Rodr´ıguez, P. M., R., F. A., M., Y. & da F. C., L. (2014) Role of centrality for theidentiﬁcation of inﬂuential spreaders in complex networks.

Physical Review E , (3), 032812. Fernstrom, A. & Goldblatt, M. (2013) Aerobiology and its role in the transmission of infectious diseases.

Journal of pathogens , . Freeman, L. C. (1978) Centrality in social networks conceptual clariﬁcation.

Social networks , (3), 215–239. Han, Z., Weng, W., Huang, Q. & Zhong, S. (2014) A Risk Estimation Method for Airborne Infectious DiseasesBased on Aerosol Transmission in Indoor Environment. In

Proceedings of the World Congress on Engineer-ing , volume 2. Holme, P. (2015) Modern temporal network theory: a colloquium.

The European Physical Journal B , (9),234. Huang, C., Liu, X., Sun, S., Li, S. C., Deng, M., He, G., Zhang, H., Wang, C., Zhou, Y., Zhao, Y. et al. (2016)Insights into the transmission of respiratory infectious diseases through empirical human contact networks.

Scientiﬁc Reports , . Hui, P., Chaintreau, A., Scott, J., Gass, R., Crowcroft, J. & Diot, C. (2005) Pocket switched networks andhuman mobility in conference environments. In

Proceedings of the 2005 ACM SIGCOMM workshop onDelay-tolerant networking , pages 244–251. ACM.

Jacquet, P., Mans, B. & Rodolakis, G. (2010) Information propagation speed in mobile and delay tolerantnetworks.

IEEE Transactions on Information Theory , (10), 5001–5015. Karsai, M., Perra, N. & Vespignani, A. (2014) Time varying networks and the weakness of strong ties.

Scien-tiﬁc reports , , srep04001. Keeling, M. J. & Rohani, P. (2008)

Modeling infectious diseases in humans and animals . Princeton UniversityPress.

Kong, X., Zhang, J. & Yu, P. S. (2013) Inferring anchor links across multiple heterogeneous social networks.In

Proceedings of the 22nd ACM international conference on Information & Knowledge Management , pages179–188. ACM.

Kov´acs, I. A., Luck, K., Spirohn, K., Wang, Y., Pollis, C., Schlabach, S., Bian, W., Kim, D.-K., Kishore, N.,Hao, T. et al. (2019) Network-based prediction of protein interactions.

Nature communications , (1), 1240. Laurent, G., S.¨aki, J. & Karsai, M. (2015) From calls to communities: a model for time-varying social net-works.

The European Physical Journal B , . Lindsley, W. G., Noti, J. D., Blachere, F. M., Thewlis, R. E., Martin, S. B., Othumpangat, S., Noorbakhsh, B.,Goldsmith, W. T., Vishnu, A., Palmer, J. E. et al. (2015) Viable inﬂuenza A virus in airborne particles fromhuman coughs.

Journal of occupational and environmental hygiene , (2), 107–113. Moon, S. A., Cohnstaedt, L. W., McVey, D. S. & Scoglio, C. M. (2019) A spatio-temporal individual-basednetwork framework for West Nile virus in the USA: Spreading pattern of West Nile virus.

PLoS computationalbiology , (3), e1006875. Nagatani, T., Ichinose, G. & Tainaka, K.-i. (2019) Epidemic spreading of random walkers in metapopulation

M SHAHZAMAL model on an alternating graph.

Physica A: Statistical Mechanics and its Applications , , 350–360. Ogura, M., Preciado, V. M. & Masuda, N. (2019) Optimal Containment of Epidemics over Temporal Activity-Driven Networks.

SIAM Journal on Applied Mathematics , (3), 986–1006. Perra, N., Gonc¸alves, B., Pastor-Satorras, R. & Vespignani, A. (2012) Activity driven modeling of time vary-ing networks.

Scientiﬁc reports , . Porter, M. A. & Bianconi, G. (2016) Network analysis and modelling: Special issue of European Journal ofApplied Mathematics.

European Journal of Applied Mathematics , (6), 807–811. Richardson, T. O. & Gorochowski, T. E. (2015) Beyond contact-based transmission networks: the role ofspatial coincidence.

Journal of The Royal Society Interface , (111), 20150705. Rushton, S. P., Sanderson, R. A., Reid, W. D., Shirley, M. D., Harris, J. P., Hunter, P. R. & O’Brien, S. J. (2019)Transmission routes of rare seasonal diseases: the case of norovirus infections.

Philosophical Transactions ofthe Royal Society B , (1776), 20180267. Scherrer, A., Borgnat, P., Fleury, E., Guillaume, J.-L. & Robardet, C. (2008) Description and simulation ofdynamic mobility networks.

Computer Networks , (15), 2842–2858. Scholtes, I., W., N. & G., A. (2016) Higher-order aggregate networks in the analysis of temporal networks:path structures and centralities.

The European Physical Journal B , (3), 61. Sekamatte, M., Riad, M. H., Tekleghiorghis, T., Linthicum, K. J., Britch, S. C., Richt, J. A., Gonzalez, J. &Scoglio, C. M. (2019) Individual-based network model for Rift Valley fever in Kabale District, Uganda.

PloSone , (3), e0202721. Shahzamal, M., Jurdak, R., Arablouei, R., Kim, M., Thilakarathna, K. & Mans, B. (2017) Airborne DiseasePropagation on Large Scale Social Contact Networks. In

Proceedings of the 2nd Int. Workshop on SocialSensing , pages 35–40. ACM.

Shahzamal, M., Jurdak, R., Mans, B. & de Hoog, F. (2009) Indirect Interactions Inﬂuence Contact NetworkStructure and Diffusion Dynamics.

Royal Society Open Science , (8), 190845. Shahzamal, M., Jurdak, R., Mans, B., El Shoghri, A. & De Hoog, F. (2018) Impact of Indirect Contacts inEmerging Infectious Disease on Social Networks. In

Paciﬁc-Asia Conference on Knowledge Discovery andData Mining , pages 53–65. Springer.

Shahzamal, M. & Pervez, M. F. (2016) Smartphones based warning messaging system for marine ﬁsheriesand its characteristics. In ,pages 111–116. IEEE.

Starnini, M., Baronchelli, A. & Pastor-Satorras, R. (2013) Modeling human dynamics of face-to-face interac-tion networks.

Physical review letters , (16), 168701. Stehl´e, J., Voirin, N., Barrat, A., Cattuto, C., Colizza, V., Isella, L., R´egis, C., Pinton, J.-F., Khanafer, N.,Van den Broeck, W. et al. (2011) Simulation of an SEIR infectious disease model on the dynamic contactnetwork of conference attendees.

BMC medicine , (1), 87. Sun, K., Baronchelli, A. & Perra, N. (2015) Contrasting effects of strong ties on SIR and SIS processes intemporal networks.

The European Physical Journal B , (12), 326. Sze To, G. & Chao, C. (2010) Review and comparison between the Wells–Riley and dose-response approachesto risk assessment of infectious respiratory diseases.

Indoor Air , (1), 2–16. Tang, J., Mascolo, C., Musolesi, M. & Latora, V. (2011) Exploiting temporal complex network metrics inmobile malware containment. In , pages 1–9. IEEE.

Thilakarathna, K., Seneviratne, S., Gupta, K., Kaafar, M. A. & Seneviratne, A. (2016) A deep dive intolocation-based communities in social discovery networks.

Computer Communications . Toth, D. J., Leecaster, M., Pettey, W. B., Gundlapalli, A. V., Gao, H., Rainey, J. J., Uzicanin, A. & Samore,M. H. (2015) The role of heterogeneity in contact timing and duration in network models of inﬂuenza spreadin schools.

Journal of The Royal Society Interface , (108), 20150279. Xu, Z., Glass, K., Lau, C. L., Geard, N., Graves, P. & Clements, A. (2017) A synthetic population for mod-elling the dynamics of infectious disease transmission in American Samoa.

Scientiﬁc reports , (1), 16725. Yan, J., Grantham, M., Pantelic, J., de Mesquita, P. J. B., Albert, B., Liu, F., Ehrman, S., Milton, D. K., Adam-

ODELLING CO-LOCATION NETWORKS

29 of 32 son, W., Beato-Arribas, B. et al. (2018) Infectious virus in exhaled breath of symptomatic seasonal inﬂuenzacases from a college community.

Proceedings of the National Academy of Sciences , page 201716561.

Zhang, P., Lu, T., Gu, H. & Gu, N. (2017) Identifying user identity across social network sites based onoverlapping relationship and social interaction. In

Proceedings of the 12th Chinese Conference on ComputerSupported Cooperative Work and Social Computing , pages 25–32. ACM.

Zhang, Y., C., J., Z., S.-M., Z., Q. & L., X. (2016) Modelling temporal networks of human face-to-facecontacts with public activity and individual reachability.

The European Physical Journal B , (2), 26. A. Appendix

A.1

Formulating Maximum Likelihood Estimation for Node Activation

The distribution function for the frequency of node activation given z number of time step in the system, ρ scaling parameter for active periods t a and the waiting time parameter q , we can write Pr ( h | q ) = (cid:16) z ρ qq + ρ (cid:17) h e − z ρ qq + ρ h !Thus, the Likelihood function for a given set h = { h , h . . . , h m } is L ( q | h ) = m ∏ i = (cid:16) z ρ qq + ρ (cid:17) h e − z ρ qq + ρ h !Taking log on the both sides as it increases monotonically, we obtainln ( L ( q | h )) = ln (cid:18) zq ρρ + q (cid:19) m ∑ i = h i − mzq ρ q + ρ − m ∑ i = ln h i Differentiating partial derivatives with respect to q , we get ∂ ln ( L ) ∂ q = ρ q ( ρ + q ) m ∑ i = h i − mz ρ ( q + ρ ) At the maximum, ∂ ln ( L ) ∂ q = = q m ∑ i = h i − mz ρ q + ρ Thus, the maximum likelihood estimation condition will be qz ρ q + ρ = m m ∑ i = h i (A.1)A.2 Link creation delay distribution

The probability function for link creation delay t is given by the following equation f ( t c | p c , t a ) = p c ( − p c ) t c − − ( − p c ) t a + δ M SHAHZAMAL

The probability of t c given p c can be written according to law of total probability as f ( t c | p c ) = (cid:90) f ( t c | p c , t a ) f ( t a ) dt a Approximating the above equation for m values of t a applying quadrature, we get f ( t c | p c ) ≈ m m ∑ l = f (cid:16) t c | p c , t la (cid:17) ≈ m m ∑ l = p c ( − p c ) t c − − ( − p c ) t la + δ Now we deﬁne a likelihood function L ( p c | t c ) for the probability distribution function f ( t c | p c ) .Function L ( p c | t c ) will be maximizes for the observations t c , t c , . . . , t nc . The approximated likelihoodfunction can be written as L ( p c | t c ) ≈ n ∏ k = m m ∑ l = p c ( − p c ) t kc − − ( − p c ) t la + δ Since the log function is monotonically increasing, we can maximize ln L ( p c | t c ) instead L ( p c | t c ) .Therefore, we obtainln L ( p c | t c ) = n ln ( p c ) − n ln ( m ) + n ∑ k = ln (cid:32) m ∑ l = ( − p c ) t kc − − ( − p c ) t la + δ (cid:33) Differentiating the above equation, we get ∂∂ p c ( ln L ( p c | t c )) = np c − n ∑ k = ∑ ml = ( t kc − )( − p c ) tkc − ( − ( − p c ) tla + δ )+( t la + δ )( − p c ) tkc + tla + δ − ( − ( − p c ) tlta + δ ) ∑ ml = ( − p c ) tkc − − ( − p c ) tla + δ For maximizing, ˆ p c we set ∂∂ p c ( ln L ( p c | t c )) = = np c − n ∑ k = ∑ ml = ( t kc − )( − ( − p c ) tla + δ )+( t la + δ )( − p c ) tla + δ ( − p c )( − ( − p c ) tlta + δ ) ∑ ml = (( − ( − p c ) t la + δ ) − (A.2)We can solve this equation numerically to ﬁnd ˆ p c .A.3 Activation degree distribution

The probability density function for activation degree d given λ is, f ( d | λ ) = ( − λ ) λ d − and λ is drawn from the following power law distribution f ( λ ) = β λ − ( β + ) ξ − β − ψ − β ODELLING CO-LOCATION NETWORKS

31 of 32Therefore, the probability distribution of d for any λ can be found according to Law of Total Probabilityas follows f ( d ) = βξ − β − ψ − β (cid:90) ψξ ( − λ ) λ d − λ − ( β + ) d λ = βξ β − ψ β (cid:90) ψξ ( λ d − β − − λ d − β − ) d λ = βξ − β − ψ − β (cid:32) ψ d − β − − ξ d − β − d − β − − ψ d − β − ξ d − β d − β (cid:33) Now we would like to derive the maximum likelihood estimator (MLE) conditions to ﬁnd the param-eters β , ξ and ψ given d samples from real world network. The likelihood function L for the randomvalues of d = { d , d , . . . d n } can be written as L ( β , ξ , ψ | d ) = n ∏ k = βξ − β − ψ − β (cid:32) ψ d k − β − − ξ d k − β − d k − β − − ψ d k − β − ξ d k − β d k − β (cid:33) Taking log on both sides, we getln L ( β , ξ , ψ | d ) = n ln β − n ln ( ξ − β − ψ − β ) + n ∑ k = ln (cid:32) ψ d k − β − − ξ d k − β − d k − β − − ψ d k − β − ξ d k − β d k − β (cid:33) (A.3)To ﬁnd the maximum of likelihood function L ( β , ξ , ψ ) for the value of β , we take the partial derivativeof A.3 with respect to β and set zero to result for the maximum of ˆ β = n β − n ψ − β ln ψ − ξ − β ln ξψ − β − ψ − β + n ∑ k = ξ dk − β − ln ξ − ψ dk − β − ln ψ d k − β − − ψ dk − β − − ξ dk − β − ( d k − β − ) − ξ dk − β ln ξ − ψ dk − β ln ψ d k − β − ψ dk − β − ξ dk − β ( d k − β ) ψ dk − β − − ξ dk − β − d k − β − − ψ dk − β − ξ dk − β d k − β Since ψ ≈

1, we assume ln ψ = ψ x = x . We can simplify as0 = n β − n ξ β ln ξξ β − ψ − β + n ∑ k = ξ dk − β − ln ξ d k − β − − − ξ dk − β − ( d k − β − ) − ξ dk − β ln ξ d k − β − − ξ dk − β ( d k − β ) − ξ dk − β − d k − β − − − ξ dk − β d k − β (A.4)If we set y = In the similar way, we can ﬁnd ˆ ξ derivating L ( β , ξ , ψ ) ( Eq.A.3) with respect to ξ andsetting zero for maximal of ˆ ξ = n β ξ − β − ξ − β − ψ − β + n ∑ k = ( d k − β ) ξ dk − β − d k − β − ( d k − β − ) ξ dk − β − d k − β − ψ dk − β − − ξ dk − β − d k − β − − ψ dk − β − ξ dk − β d k − β With simpliﬁcation we get0 = n ( β + ) ξ − β − ψ − β + n ∑ k = ξ d k − ξ d k − ( ψ d k − β − − ξ d k − β − )( d k − β ) − − ( d k − β − ) − ( ψ d k − β − ξ d k − β ) (A.5)2 of 32 M SHAHZAMAL

To estimate ˆ ψ , we derivate L ( β , ξ , ψ ) ( Eq.A.3) with respect to ψ and setting zero for maximal ofˆ ψ = − n β ψ − β − ξ − β − ψ − β + n ∑ k = ( d k − β − ) ψ dk − β − d k − β − − ( d k − β ) ψ dk − β − d k − βψ dk − β − − ξ dk − β − d k − β − − ψ dk − β − ξ dk − β d k − β = − n ( β + ) ξ − β − ψ − β + n ∑ k = ψ d k − − ψ d k ( ψ d k − β − − ξ d k − β − )( d k − β ))