Detecting Hidden Layers from Spreading Dynamics on Complex Networks

Abstract

When dealing with spreading processes on networks, it can be of the utmost importance to test the reliability of data and identify potential unobserved spreading paths. In this paper we address these problems and propose methods for hidden layer identification and reconstruction. We also explore the interplay between the difficulty of the task and the structure of the multilayer network describing the whole system in which the spreading process occurs. Our methods stem from an exact expression for the likelihood of a cascade in the Susceptible-Infected model on an arbitrary graph. We then show that by employing statistical properties of unimodal distributions and simple heuristics describing the joint likelihood of a series of cascades, one can obtain an estimate of both the existence of a hidden layer and its content, with success rates far exceeding those of a null model. We conduct our analyses on both synthetic and real-world networks, providing evidence for the viability of the approach presented.



Łukasz G. Gajewski and Jan Chołoniewski

Center of Excellence for Complex Systems Research, Faculty of Physics, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland

Mateusz Wilinski

Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA


I. INTRODUCTION

Real-world complex systems can often be described by interconnected structures known as multilayer networks [1–4]. Transportation, social or economic networks, to name just a few general examples, can have various types of connections; see Fig. 1 for an example depiction of such a system. Each type of connection in a network can be represented as a specific sub-system or sub-network. Railway, flight and bus connections can each be described by a network, but to obtain a full description of the transportation system they need to be joined into a multilayer network. In reality, obtaining the full information needed to build a complete multilayer network is rarely possible. Moreover, in some cases even the knowledge about which layers exist is limited. As a result, researchers often have to deal with uncertainty arising from partial information about connectivity in the analysed system. This specifically concerns one of the fundamental problems in network science, spreading processes on networks [5–11], but is also of significance for opinion dynamics [12–15].

In this article we focus on the problem of detecting hidden layers based on observations of a dynamical process on a graph. We also propose and explore methods for finding missing connections of different types. Finally, we analyse the potential limitations and difficulties of these methods, as well as beneficial settings, i.e. those in which the problem is easier to solve.

The problem of detecting hidden layers has recently appeared in the literature in a non-Markovian setting [16] and on quantum graphs [17]. It is also closely related to the problem of network reconstruction, which has been extensively analysed in the past [18–22], including in a partial observation setting [23–25].
Our setting is somewhat simpler in some regards, but at the same time still fairly realistic, and should thus be viable for real-world problems. We feel that our simplifications are justified, since solving the general problem has been shown to have fundamental limitations [19], and previous papers often approached only limited cases anyway, such as very short cascades [26]. This is not to say that successful approximations are not possible [18]; however, the goal of this paper is to investigate the challenges associated with detecting hidden layers in interconnected networks in the context of spreading processes.

The reader should not confuse the problem of finding hidden layers based on observed spreading with the extensively analysed branch of network science called link prediction, where hidden connections are estimated using only the network structure. A seminal paper in this direction is [27]. An extension including multilayer networks can be found in [28].

The paper is structured as follows: in the next section we describe the methods used in the analysis, starting with tools for detecting hidden layers and then proposing methods for identifying unobserved connections. In the third section we analyse both synthetic and real-world networks and show how our methods work under different structural circumstances. Finally, we discuss the results and present our conclusions in the last section.

II. METHODS

FIG. 1: Visualisation of the multilayer network representing the Aarhus data [29]. We use this network as a real-world example of a possible application of our methods. It is natural to imagine that we know only one of the layers presented here and would like to infer the existence, and possibly the structure, of the others. Note that the co-author and leisure layers are disconnected, and thus we do not use them as visible layers in our analyses, as that would make the task of detecting hidden connections potentially much easier.

In our analysis we focus on one of the simplest spreading models, the Susceptible-Infected (SI) model [30]. We use its network version, in which the dynamics can be described as follows: for each node i in the infected state I at time t, each of its neighbours j in the susceptible state S becomes infected at time t + 1 with probability β. This model was chosen because of its simplicity on the one hand and its mechanism of multiple infection opportunities (in comparison with, e.g., an independent cascade model) on the other. The latter makes the combinatorial analysis much more difficult, as we will see in further sections. The former is reflected by an easily derived likelihood of any observed spreading, including a multilayer scenario, assuming that full knowledge about connectivity is available. Let us denote a multilayer graph by G and let the probability of infection (spreading) on each layer j be equal to β_j. We will refer to a single spreading dynamics as a cascade and denote it by Σ_c. A single cascade can be described by a set of infection times τ_i^c for each node i. We also assume that a cascade ends at time t_max, and if a given node was not infected at all, its infection time is set to t_max. In other words, if the activation time of node i is equal to t_max, it was activated either at t_max or later; this will become clearer once the likelihood is derived.
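To make the dynamics concrete, here is a minimal Python sketch of a single SI cascade on a multilayer graph. It assumes that each layer is given as an adjacency dictionary and that each layer j has its own spreading probability β_j; the function name `simulate_si_cascade` and this data layout are illustrative choices, not the authors' code. Following the convention above, nodes not infected by step t_max − 1 are assigned t_max.

```python
import random


def simulate_si_cascade(layers, betas, source, t_max, rng=None):
    """Simulate one SI cascade on a multilayer graph (illustrative sketch).

    layers: list of adjacency dicts {node: set_of_neighbours}, one per layer;
            every node is assumed to appear as a key in at least one layer.
    betas:  per-layer spreading probabilities beta_j.
    Returns {node: tau}; nodes not infected by step t_max - 1 get tau = t_max.
    """
    rng = rng if rng is not None else random.Random()
    nodes = set().union(*(layer.keys() for layer in layers))
    tau = {source: 0}
    for t in range(t_max - 1):
        # Every node infected at or before step t may infect neighbours at t + 1.
        infectious = [i for i, ti in tau.items() if ti <= t]
        newly_infected = set()
        for layer, beta in zip(layers, betas):
            for i in infectious:
                for j in layer.get(i, ()):
                    # Independent attempt per infectious neighbour and per layer.
                    if j not in tau and j not in newly_infected and rng.random() < beta:
                        newly_infected.add(j)
        for j in newly_infected:
            tau[j] = t + 1
    # Convention from the text: "not infected" is encoded as tau = t_max.
    return {i: tau.get(i, t_max) for i in nodes}
```

For example, on the path 0–1–2 with β = 1 and t_max = 5, the cascade started at node 0 gives τ = {0: 0, 1: 1, 2: 2}. A pair of nodes connected in several layers gets one independent infection attempt per layer, matching the product over layers j in the likelihood.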
The set of all available cascades will be denoted by Σ.

A. Cascade likelihood for the Susceptible-Infected model

As mentioned before, the likelihood of a given set of cascades for a specific and fully known multilayer network can be derived, similarly to what was done in [23]. In short, the probability of a given data set can be written as a product over cascades, which are independent, and over nodes, because the problem can be considered locally:

$$P(\Sigma \mid G, \{\beta_j\}) = \prod_{i \in V} \prod_{c \in C} P_i(\tau_i^c \mid \Sigma_c, G, \{\beta_j\}), \qquad (1)$$

with each element of the product being the probability of node activation under a specific cascade:

$$P_i(\tau_i^c \mid \Sigma_c, G, \{\beta_j\}) = \left( \prod_{t=0}^{\tau_i^c - 2} \prod_{j} \prod_{k \in \partial_j i} \left(1 - \beta_j \mathbb{1}_{\tau_k^c \le t}\right) \right) \times \left( 1 - \prod_{j} \prod_{k \in \partial_j i} \left(1 - \beta_j \mathbb{1}_{\tau_k^c \le \tau_i^c - 1}\right) \right)^{\mathbb{1}_{\tau_i^c < t_{\max}}}, \qquad (2)$$

where ∂_j i denotes the neighbourhood of node i in layer j.
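Eqs. (1)–(2) translate directly into code. The following is a minimal Python sketch that works in log space to avoid underflow, assuming one adjacency dictionary per layer and cascades given as {node: τ_i^c} dictionaries. Source nodes (τ = 0) are skipped because Eq. (2) describes infections caused by neighbours; this treatment of sources, like the function names, is our assumption. A zero factor, i.e. an activation the given layers cannot explain, yields −∞.

```python
import math


def node_log_likelihood(i, tau, layers, betas, t_max):
    """log P_i(tau_i^c | Sigma_c, G, {beta_j}) from Eq. (2), for a non-source node."""
    ti = tau[i]
    # Infection times of i's neighbours, grouped by layer j.
    neighbour_times = [[tau[k] for k in layer.get(i, ())] for layer in layers]

    def escape(t):
        # Probability that no neighbour infected at or before t infects i at t + 1.
        p = 1.0
        for times, beta in zip(neighbour_times, betas):
            for tk in times:
                if tk <= t:
                    p *= 1.0 - beta
        return p

    log_p = 0.0
    for t in range(ti - 1):  # first factor: t = 0, ..., tau_i - 2
        s = escape(t)
        if s == 0.0:
            return -math.inf
        log_p += math.log(s)
    if ti < t_max:  # second factor only for nodes observed as infected
        q = 1.0 - escape(ti - 1)
        if q <= 0.0:
            return -math.inf  # activation unexplained by the given layers
        log_p += math.log(q)
    return log_p


def cascades_log_likelihood(cascades, layers, betas, t_max):
    """log of Eq. (1): sum over independent cascades and (non-source) nodes."""
    total = 0.0
    for tau in cascades:
        for i, ti in tau.items():
            if ti == 0:  # assumption: sources are conditioned on, not scored
                continue
            total += node_log_likelihood(i, tau, layers, betas, t_max)
    return total
```

For the path 0–1–2 with β = 0.5 and τ = {0: 0, 1: 1, 2: 2}, the result is log 0.25: two successful infections, each with probability 0.5. Changing node 2's time to 1 gives −∞, since node 2 has no neighbour infected at time 0.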

Assuming X is a random variable with a unimodal distribution (mean µ, finite and positive variance σ²) and λ > √(8/3), we have (the Vysochanskij–Petunin inequality):

$$\Pr(|X - \mu| \ge \lambda\sigma) \le \frac{4}{9\lambda^2},$$

which after the normalisation X̃ = |X − µ|/σ gives:

$$\Pr(\tilde{X} \ge \lambda) \le \frac{4}{9\lambda^2}.$$

In the boundary case λ = x̃:

$$\Pr(\tilde{X} \ge \tilde{x}) \le \frac{4}{9\tilde{x}^2},$$

which means that an upper bound on the probability of obtaining a result x̃ or greater from a normalised unimodal distribution X̃ is:

$$p(\tilde{x}) = \min\left(\frac{4}{9\tilde{x}^2}, 1\right). \qquad (3)$$

In our case the likelihood plays the role of X. We use p(x̃) to measure how surprising the given cascades are, assuming they were generated by an SI process with known β simulated on the visible layer of the network. We validate our approach in the next section using simulations and synthetic networks.

C. Detecting hidden edges

After ruling out the possibility that the given cascades were generated by the given topology and process, the same data can be used to estimate the topology of the hidden layer(s). This requires assuming some topology (the visible layer plus an estimated hidden layer) and a value of β_hidden, simulating the process again and calculating new likelihoods. We try to find the topology by identifying node activations that cannot be explained by the assumed single-layer model. In each such case, all nodes that were infected in the steps preceding the unexplained activation are used to construct a set of possible edges. The details of the procedure are as follows:

1. Let J_c(t) = {i ∈ N : τ_i^c ≤ t} be the set of nodes infected up to simulation step t of cascade c, where N is the set of nodes in graph G and τ_i^c is the activation time of node i in cascade c. Let ΔJ_c(t) = {i ∈ N : τ_i^c = t} be the set of nodes that became infected exactly at step t of cascade c.

2. Using the above notation, we introduce the set of nodes that became infected at step t of cascade c but were not neighbours of any node infected up to step t − 1:
$$U_c(t) = \Delta J_c(t) \setminus \bigcup_{m \in J_c(t-1)} \partial m,$$
where ∂m is the known neighbourhood of node m.

3. If the likelihood of a cascade is zero, then there is at least one hidden edge in the set
$$E(i, c) = \left\{ (i, k) : k \in J_c(\tau_i^c - 1),\ i \in \bigcup_{t=1}^{t_{\max}} U_c(t) \right\}.$$
The likelihood of such an edge being the one that was activated to infect i is unknown and difficult to find. However, heuristically we can say that this likelihood is
$$P_c(i, k) \sim (1 - \beta_{\text{hidden}})^{|\tau_i^c - \tau_k^c|}.$$

4. Next, we must unify the candidates across cascades. Namely, if an edge (a, b) was detected in cascade c = 1 but not in c = 2, we need to associate it with the likelihood of not being detected, which is similarly non-trivial. Again, we can say that whatever that likelihood is, it must be
$$P_c(a, b) \sim (1 - \beta_{\text{hidden}})^{|\tau_a^c - \tau_b^c|}.$$

5. Finally, for each edge e we multiply its likelihoods, thereby obtaining a joint likelihood J:
$$J(e) = \prod_c P_c(e).$$

6. The edges that maximise J are the most likely candidates for the hidden edges we seek.

In order to evaluate the quality of our approach, we use two metrics:

Sensitivity: the ratio between true positives and all actual positives. In our case, it is the fraction of hidden edges that were detected.

α-Credible Set Size (α-CSS): a measure introduced in [33]. It represents the number of candidates one must investigate in order to have an α level of certainty of finding the sought entity. In practice, one computes the rank of the sought entity on the list of candidates ordered by a given measure that the entity should maximise. This step is repeated many times in order to obtain a distribution of that rank. Next, one takes the quantile q = α of that distribution. In our case there are multiple entities (edges), so we have adapted the measure such that we record the largest rank among the hidden edges and otherwise proceed as usual.

A natural null model is random guessing. There are C(N, 2) = N(N − 1)/2 edges to check in a system with N nodes. For instance, with N = 100 there are C(100, 2) = 4950 edges, in which case 95% certainty of finding 1 hidden link requires checking 0.95 × 4950 = 4702.5, i.e. 4703 (rounded up) edges. In general, the number r of links one needs to check in order to have α certainty according to the null model can be obtained from:

$$\frac{\binom{r}{k}}{\binom{\binom{N}{2}}{k}} = \frac{r!}{(r-k)!}\,\frac{\left(\binom{N}{2} - k\right)!}{\binom{N}{2}!} = \alpha, \qquad (4)$$

where k is the number of hidden edges and N is the number of nodes. The curve representing r as a function of k for N = 100 nodes is shown in Fig. 2.

FIG. 2: The number of edges one needs to check in order to have 95% certainty, according to the null model, of finding all hidden edges (0.95-CSS), as a function of the number of hidden edges. The plot is done for a network with N = 100 nodes, but the shape of the curve scales with the size of the network.
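The procedure in steps 1–6, together with the null-model count of Eq. (4), can be sketched in Python as follows. This is only one possible reading of the heuristics and the function names are hypothetical: we assume that every cascade, whether or not it flagged an edge e = (a, b), contributes the same factor (1 − β_hidden)^{|τ_a^c − τ_b^c|} (steps 3–4), and we treat τ = t_max as "not infected", so unexplained activations are only sought for t = 1, …, t_max − 1. For Eq. (4) we return the smallest integer r whose ratio reaches α, which reproduces the rounding up used in the text.

```python
import math
from fractions import Fraction


def unexplained_candidates(tau, visible, t_max):
    """Steps 1-3: candidate hidden edges (i, k) for a single cascade.

    tau: {node: infection time}; visible: adjacency dict of the observed layer.
    """
    candidates = set()
    for t in range(1, t_max):  # assumption: tau == t_max means "not infected"
        infected_before = {m for m, tm in tau.items() if tm <= t - 1}  # J_c(t-1)
        reachable = set().union(*(visible.get(m, ()) for m in infected_before))
        newly = {i for i, ti in tau.items() if ti == t}  # Delta J_c(t)
        for i in newly - reachable:  # U_c(t)
            for k in infected_before:  # J_c(tau_i - 1), since tau_i == t
                candidates.add(tuple(sorted((i, k))))
    return candidates


def rank_hidden_edges(cascades, visible, beta_hidden, t_max):
    """Steps 4-6: rank candidates by log J(e) = sum_c log P_c(e), best first."""
    candidates = set()
    for tau in cascades:
        candidates |= unexplained_candidates(tau, visible, t_max)
    log_q = math.log(1.0 - beta_hidden)  # requires 0 <= beta_hidden < 1
    scores = {
        (a, b): sum(abs(tau[a] - tau[b]) * log_q for tau in cascades)
        for a, b in candidates
    }
    # Ties are returned in arbitrary order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def null_model_checks(n_nodes, k_hidden, alpha):
    """Smallest r with C(r, k) / C(C(N, 2), k) >= alpha, cf. Eq. (4)."""
    m = math.comb(n_nodes, 2)
    target = Fraction(str(alpha)) * math.comb(m, k_hidden)
    lo, hi = k_hidden, m
    while lo < hi:  # binary search: the ratio grows monotonically with r
        mid = (lo + hi) // 2
        if math.comb(mid, k_hidden) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For example, `null_model_checks(100, 1, 0.95)` returns 4703 and `null_model_checks(100, 1, 0.5)` returns 2475, the null-model values for a single hidden edge in a 100-node network. Exact integer and `Fraction` arithmetic avoids rounding errors when the ratio in Eq. (4) lands exactly on α.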
As we show later on, our method requires substantially fewer edges to be checked. Since Eq. (4) has to be solved numerically, we also derive an asymptotic approximation, which can be found in appendix A.

III. EXPERIMENTS

We use both real and synthetic data in our experiments. In the latter case we build networks that are realistic and non-trivial, in the sense that we do not want activations unexplained by the visible network to be likely. To achieve that, we need a way to control the correlation between the different layers of the network. Therefore, we propose our own models for generating multilayer networks.

A. Synthetic Networks

In the first setting, we generate a two-layer network using the Barabasi-Albert algorithm [34]. Two parameters need to be selected, m_hidden and m_observed, which represent the number of edges added at each step of the algorithm for the hidden and observed layers, respectively. The two layers are independent, but both have a power-law degree distribution, which is believed to resemble real social networks [35] (although lately it is seen more as an idealised approximation [36]). Since a lack of correlation makes the problem of detecting hidden layers much easier, we also propose another setting. We take a square lattice as the first layer and then apply a rewiring procedure similar to the one introduced by Watts and Strogatz [37] to produce the other layer. We start with a square lattice as an equivalent of real relationships (affected by distance), but we also explore a scale-free network as the starting layer. The correlation between the two layers is parameterised by p, the probability of each node being rewired to a random (other) node. In all described settings we keep the spreading probability of the observed layer equal to β_observed = 0.…; for the hidden layer, this probability takes the values β_hidden = 0.3 and β_hidden = 0.7.
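The lattice-with-rewiring setting can be sketched as below. This is a hypothetical implementation under our own assumptions: the visible layer is a periodic square lattice (as in the simulations of Fig. 3), the second layer is a copy in which each edge, with probability p, keeps one endpoint and has the other moved to a uniformly random node (classic Watts–Strogatz edge rewiring, whereas the text phrases p per node), and the hidden edges are those of the second layer that do not overlap with the visible ones.

```python
import random


def square_lattice(side):
    """Edges of a periodic side x side square lattice (a torus); nodes 0..side^2-1."""
    edges = set()
    for x in range(side):
        for y in range(side):
            i = x * side + y
            right = x * side + (y + 1) % side
            down = ((x + 1) % side) * side + y
            edges.add(tuple(sorted((i, right))))
            edges.add(tuple(sorted((i, down))))
    return edges


def rewired_copy(edges, n_nodes, p, rng):
    """Copy of `edges` in which each edge is rewired with probability p.

    A rewired edge (i, j) keeps i and connects it to a random node k != i;
    self-loops are excluded and edges already in the copy are not re-created.
    """
    new_edges = set()
    for i, j in sorted(edges):  # sorted for reproducibility with a fixed seed
        if rng.random() < p:
            options = [k for k in range(n_nodes)
                       if k != i and tuple(sorted((i, k))) not in new_edges]
            j = rng.choice(options)
        new_edges.add(tuple(sorted((i, j))))
    return new_edges


def lattice_two_layer(side, p, seed=None):
    """Return (visible_edges, hidden_edges) for the rewired-lattice setting."""
    rng = random.Random(seed)
    visible = square_lattice(side)
    second = rewired_copy(visible, side * side, p, rng)
    hidden = second - visible  # keep only edges not overlapping the visible layer
    return visible, hidden
```

With `side = 10` this gives an N = 100 lattice with 200 visible edges; p = 0 leaves no hidden edges, and increasing p makes the two layers less correlated.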

FIG. 3: The percentage of log-likelihoods equal to −∞ as a function of the rewiring probability p. We investigate two different cases of networks, a square lattice and a Barabasi-Albert network, for different cascade lengths t_max. The simulations were made for networks of size N = 100, with periodic boundary conditions in the lattice case.

When it comes to detecting hidden layers, using any two-layer network with independent layers results in the majority of cascades having a likelihood equal to 0, and therefore the problem becomes trivial. The only interesting case is the model with rewiring, as there the correlations between layers can be large. Fig. 3 shows the percentage of cascades with likelihood equal to zero as a function of the rewiring probability. We test it for local networks (represented by a square lattice) and for networks with many long-range connections (as in the case of a scale-free Barabasi-Albert network). As expected, the problem is easier for short cascades and quickly becomes more difficult as the length of cascades grows. Note that even a very small rewiring probability results in a drastic change of the discussed percentage. In practice, this means that detecting an unknown transmission channel is fairly simple with our approach. A much more challenging task, however, is to find the actual unknown connections. We shall focus on that in further experiments, but first let us discuss the case in which there are no prohibited dynamics, and whether we can investigate the likelihood of such observed data.

If the probability of the observed cascades is positive, we can compare it with the empirical distribution of likelihoods of cascades simulated on the observed network. This allows us to use the Vysochanskij–Petunin inequality and decide whether the observed data was generated by a process run on a graph with an additional (hidden from us) layer. As seen in Fig. 4, using a typical significance level of 0.… allows us to successfully reject the single-layer hypothesis in a significant number of cases (or even in all of them). Low t_max and β_hidden, especially in the case of local networks such as the square lattice, decrease the effectiveness of the test, but apart from the extreme case (lattice with t_max = 5 and β_hidden = 0.3) our proposed approach is an efficient tool for detecting hidden layers.

Once we know that there is a hidden layer affecting the dynamics, we aim at finding its edges. Tables I and II below show the results of applying our method to both lattice and Barabasi-Albert networks with hidden layers produced by rewiring (with probability p = 0.…). When comparing the two settings, one can observe a certain interplay between sensitivity and α-CSS. For the lattice-based network the sensitivity is significantly higher than in the Barabasi-Albert case, but at the same time the scale-free case is characterised by a much lower α-CSS for both α = 0.5 and α = 0.95. In other words, it is easier to correctly identify hidden edges when we have a locally connected network (lattice), but a scale-free network requires a smaller set to find all hidden edges (despite reaching a lower sensitivity level). Note that in both cases the observed 0.5-CSS and 0.95-CSS are significantly lower than for the null model, where, depending on the number of rewired links, they would be larger than 2475 and 4703, respectively (see Eq. (4)). The full distribution of ranks from which the α-CSS was computed is shown for both networks in Fig. 5.

β_hidden | β_observed | sensitivity | 0.5-CSS | 0.95-CSS
TABLE I: Sensitivity and α-CSS for a square lattice with rewiring (N = 100, t_max = 10, p = 0.…). Results obtained for … realisations per scenario, where each scenario had … independent cascades from (possibly) different sources.

As already discussed, when the layers are not correlated it is easy to identify that there is a hidden spreading channel. Nevertheless, finding the actual unobserved links may still be challenging. Both Fig. 6 and Fig. 7 show that the density of the observed network is an important factor. From the sensitivity perspective, it is better to have a denser observed network. Unfortunately, the 0.95-CSS also grows with the density of known connections, making it more demanding to find all the connections. Additionally, although the effect is weaker, it is beneficial for both measures if the hidden layer is sparser. This aligns with intuition, since more hidden connections can make the observed dynamics much more complex. Unsurprisingly, having more data about the cascades also makes the task easier. The actual dependence between the number of cascades and the sensitivity is shown in Fig. 8. For a relatively large BA network we need around 30–40 cascades to reach a fairly satisfactory sensitivity level of around 0.…. Depending on the specifics of the problem, such amounts of data may be considered a lot (e.g., in epidemic spreading) or easily available (e.g., information spreading on social media). In the next subsection we examine this scaling for real-world networks.

β_hidden | β_observed | sensitivity | 0.5-CSS | 0.95-CSS
TABLE II: Sensitivity and α-CSS for a Barabasi-Albert network with rewiring (N = 100, t_max = 10, m = 3, p = 0.…). Results obtained for … realisations per scenario, where each scenario had … independent cascades from (possibly) different sources.

B. Real World Networks

On top of synthetic networks, we also use real-world data to build a multilayer network and empirically test our methods. For that purpose we choose the data collected among employees of the Department of Computer Science at Aarhus University [29]. It is a multilayer network consisting of Facebook friendships, co-authorships, work, leisure (repeated leisure activities) and a lunch layer (regularly eating lunch together). Its full structure is presented in Fig. 1, with each layer shown as a separate network. The whole network has … nodes and … (unique) edges in total.

The main results for the Aarhus data are shown in Tables III, IV and V, which use the Facebook, work and lunch layers, respectively, as the visible parts of the graph. We omitted the other two possible cases because they consist of more than one connected component. We treat the remaining hidden layers as one aggregated layer, as the exact number of layers does not matter for our detection method. Sensitivity and CSS values are consistent with the synthetic results in the sense that they both grow with the density of the visible layer. It is also apparent that our approach far exceeds the performance of the null model. Random guessing would require us to check … links (out of … possible in total; see Eq. (4) and appendix A), whereas our method needs just a fraction of that. A more detailed dependence between sensitivity and the number of cascades is shown in Fig. 8, where different colours represent different observed layers. In Fig. 9 we show the distributions of ranks for the work layer as the visible network; compared with the synthetic experiments, the two distributions for β_hidden = 0.3 and β_hidden = 0.7 are much more symmetric and separated. The distributions for the other two analysed visible layers are qualitatively similar, further supporting the merit of our approach (see appendix B).

FIG. 4: Histograms of p(x̃) (see Eq. (3)) for various combinations of t_max and β_hidden (panels: t_max = 5 and t_max = 10, each with β_hidden = 0.3 and β_hidden = 0.7; x-axis: p(x̃); y-axis: count). Plots on the left are generated for the square lattice with rewiring, while those on the right are generated for the BA network with rewiring. Parameters for all the networks are as follows: β_observed = 0.…, N = 100, p = 0.… and m = 3 (for BA networks). Each histogram was made with … realisations.

FIG. 5: Distribution of ranks of hidden edges, with medians as vertical lines (y-axis: density). Left: lattice with rewiring (N = 100, t_max = 10, p = 0.…). Right: BA network with rewiring (N = 100, t_max = 10, m = 3, p = 0.…). Results obtained with … realisations per scenario, where each scenario had … independent cascades from different sources. These results are for those realisations in which all hidden edges were detected.

FIG. 6: The sensitivity as a function of m_hidden and m_observed for a two-layer Barabasi-Albert network with β_hidden = 0.…, β_observed = 0.… and t_max = 10. Panels: (a) 1000 nodes and 10 cascades; (b) 1000 nodes and 100 cascades. The results are averaged over 20 independent runs.

β_hidden | β_observed | sensitivity | 0.5-CSS | 0.95-CSS
TABLE III: Sensitivity and α-CSS for the Aarhus data, with the Facebook layer as the observed network. Results obtained for 10 cascades with t_max = 10.

β_hidden | β_observed | sensitivity | 0.5-CSS | 0.95-CSS
TABLE IV: Sensitivity and α-CSS for the Aarhus data, with the work layer as the observed network. Results obtained for 10 cascades with t_max = 10.

IV. DISCUSSION

Spreading processes on networks are a valuable tool for describing real-life global diffusion processes, such as epidemics, information spreading and cascading failures. These processes may have several spreading channels, and we rarely know, or are even aware of, all of them. It is therefore crucial to identify whether the observed spreading was in fact generated only by the observed network. Furthermore, should one confirm the existence of an unobserved spreading path, finding these hidden connections can be of the utmost importance.

In this paper we focused on identifying both the existence and the structure of a hidden spreading layer by observing a diffusion process unravelling on a graph. We provide methods for i) determining whether a hidden layer exists and ii) estimating which links are present in that layer. Our approach is based on an exact formula for the likelihood of an observed cascade given knowledge of the system's topology. Using this likelihood and the fact that its distribution can be assumed to be unimodal, we established a practical and effective way of discerning the existence of a hidden layer. Furthermore, using a series of heuristics, we obtain an algorithm for estimating the joint likelihood of a given (hidden) edge taking part in the observed cascades, thereby providing a tool, vastly superior to random guessing, for assessing which nodes are most likely to exchange information via a channel we do not know of.

FIG. 7: The 0.95-CSS as a function of m_hidden and m_observed for a two-layer Barabasi-Albert network with β_hidden = 0.…, β_observed = 0.… and t_max = 10. Panels: (a) 1000 nodes and 10 cascades; (b) 1000 nodes and 100 cascades. The results are averaged over 20 independent runs.

β_hidden | β_observed | sensitivity | 0.5-CSS | 0.95-CSS
TABLE V: Sensitivity and α-CSS for the Aarhus data, with the lunch layer as the observed network. Results obtained for 10 cascades with t_max = 10.

FIG. 8: The sensitivity as a function of the number of cascades for a) a two-layer Barabasi-Albert network with m = 4 for both layers, β_hidden = 0.…, β_observed = 0.…, N = 1000 and t_max = 10 (red line); b) the Aarhus data with different layers as the observed network (lunch – blue line, Facebook – yellow line, work – green line). The results are averaged over 20 independent runs and the error bars represent one standard deviation.

FIG. 9: Distribution of ranks of hidden edges, with medians as vertical lines, for the Aarhus data with the work layer as the observed network (x-axis: rank; y-axis: density). Results obtained for 10 cascades with t_max = 10, β_observed = 0.… and two values of β_hidden, 0.3 and 0.7.

Data from synthetic and empirical networks alike confirm that uncovering the hidden spreading channel is a relatively simple task with our approach, especially when the layers are uncorrelated. It is, however, more difficult to identify specific hidden connections. Despite the general similarities, there are some quantitative differences between the results obtained with synthetic and real data. One of the most significant differences is how β_hidden relates to the distribution of ranks of hidden edges, influencing the difficulty of reconstructing hidden connections. This effect is much stronger in real-world networks than in synthetic ones. It can, however, be explained by the difference in density between the hidden and observed networks. In the corresponding plots for synthetic data (see Fig. 5) both layers have the same density.
Here the hidden layer is denser (it is the sum of four hidden layers), so changing the hidden spreading probability affects the majority of connections.

An important factor in being able to successfully recover the hidden connections turns out to be the density of both the hidden and the observed transmission layers. Specifically, we observe that the denser the hidden layer, the harder it is to find the exact connections. An interesting interplay takes place when it comes to the density of the observed layer. On the one hand, the sensitivity decreases with the density of the observed layer; on the other hand, the α-CSS also decreases with the density. This observation, confirmed by both synthetic and real data, means that as the number of connections in the visible layer increases, we are able to identify fewer hidden edges on average, but we need to take into account a smaller set of potential edges in order to find all of the hidden connections.

It should be pointed out that we only focus on hidden connections which do not overlap with the observed ones. This means that for correlated layers there might be only a few unknown connections, whereas the overlapping edges also influence the dynamics. Focusing on the more general picture and including the overlapping connections is an interesting subject for future research. Another research direction would be to further improve the algorithms for identifying hidden connections. These improvements should address both the effectiveness and the scalability of the proposed methods. The latter is especially important, since real-world networks are often quite large. From the perspective of empirical data, it would also be useful to have a way of handling scenarios in which different layers have different values of β, which may or may not be known. Finally, more radical generalisations, such as including temporal networks, could also prove to be interesting research problems.
While we do hope to address some of the above topics in the near future, we feel that the methods presented here already provide effective and practical tools for real-world applications.

ACKNOWLEDGMENTS

Ł.G.G. and J.C. were supported by the National Science Centre, Poland, Grant No. 2015/19/B/ST6/02612.

[1] Manlio De Domenico, Albert Solé-Ribalta, Emanuele Cozzo, Mikko Kivelä, Yamir Moreno, Mason A Porter, Sergio Gómez, and Alex Arenas. Mathematical formulation of multilayer networks. Physical Review X, 3(4):041022, 2013.
[2] Mikko Kivelä, Alex Arenas, Marc Barthelemy, James P Gleeson, Yamir Moreno, and Mason A Porter. Multilayer networks. Journal of Complex Networks, 2(3):203–271, 2014.
[3] Stefano Boccaletti, Ginestra Bianconi, Regino Criado, Charo I Del Genio, Jesús Gómez-Gardenes, Miguel Romance, Irene Sendina-Nadal, Zhen Wang, and Massimiliano Zanin. The structure and dynamics of multilayer networks. Physics Reports, 544(1):1–122, 2014.
[4] Manlio De Domenico, Albert Solé-Ribalta, Elisa Omodei, Sergio Gómez, and Alex Arenas. Ranking in interconnected multilayer networks reveals versatile nodes. Nature Communications, 6(1):1–6, 2015.
[5] Alain Barrat, Marc Barthelemy, and Alessandro Vespignani. Dynamical processes on complex networks. Cambridge University Press, 2008.
[6] Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem, and Alessandro Vespignani. Epidemic processes in complex networks. Reviews of Modern Physics, 87(3):925, 2015.
[7] Manlio De Domenico, Clara Granell, Mason A Porter, and Alex Arenas. The physics of spreading processes in multilayer networks. Nature Physics, 12(10):901–906, 2016.
[8] Guilherme Ferraz de Arruda, Francisco A Rodrigues, and Yamir Moreno. Fundamentals of spreading processes in single and multilayer complex networks. Physics Reports, 756:1–59, 2018.
[9] Robert Paluch, Łukasz G Gajewski, K Suchecki, and Janusz A Hołyst. Source location on multilayer networks. arXiv preprint arXiv:2012.02023, 2020.
[10] Sergio Gomez, Albert Diaz-Guilera, Jesus Gomez-Gardenes, Conrad J Perez-Vicente, Yamir Moreno, and Alex Arenas. Diffusion dynamics on multiplex networks. Physical Review Letters, 110(2):028701, 2013.
[11] Albert Sole-Ribalta, Manlio De Domenico, Nikos E Kouvaris, Albert Diaz-Guilera, Sergio Gomez, and Alex Arenas. Spectral properties of the Laplacian of multiplex networks. Physical Review E, 88(3):032807, 2013.
[12] Anna Chmiel and Katarzyna Sznajd-Weron. Phase transitions in the q-voter model with noise on a duplex clique. Physical Review E, 92(5):052812, 2015.
[13] Anna Chmiel, Julian Sienkiewicz, and Katarzyna Sznajd-Weron. Tricriticality in the q-neighbor Ising model on a partially duplex clique. Physical Review E, 96(6):062137, 2017.
[14] Anna Chmiel, Julian Sienkiewicz, Agata Fronczak, and Piotr Fronczak. A veritable zoology of successive phase transitions in the asymmetric q-voter model on multiplex networks. Entropy, 22(9):1018, 2020.
[15] Łukasz G Gajewski, Julian Sienkiewicz, and Janusz A Hołyst. Bifurcations and catastrophes in temporal bi-layer model of echo chambers and polarisation. arXiv preprint arXiv:2101.03430, 2021.
[16] Lucas Lacasa, Inés P Mariño, Joaquin Miguez, Vincenzo Nicosia, Édgar Roldán, Ana Lisica, Stephan W Grill, and Jesús Gómez-Gardeñes. Multiplex decomposition of non-Markovian dynamics and the hidden layer reconstruction problem. Physical Review X, 8(3):031038, 2018.
[17] Łukasz G Gajewski, Julian Sienkiewicz, and Janusz A Hołyst. Discovering hidden layers in quantum graphs. arXiv preprint arXiv:2012.01454, 2020.
[18] Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data (TKDD), 5(4):1–37, 2012.
[19] Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg, and Alessandro Panconesi. Trace complexity of network inference. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 491–499, 2013.
[20] Jean Pouget-Abadie and Thibaut Horel. Inferring graphs from cascades: A sparse recovery framework. In Proceedings of the 24th International Conference on World Wide Web, pages 625–626, 2015.
[21] Alfredo Braunstein, Alessandro Ingrosso, and Anna Paola Muntoni. Network reconstruction from infection cascades. Journal of the Royal Society Interface, 16(151):20180844, 2019.
[22] Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades.

ACM SIGMETRICS Perfor-mance Evaluation Review , 40(1):211–222, 2012.[23] Andrey Lokhov. Reconstructing parameters of spreadingmodels from partial observations. In

Advances in NeuralInformation Processing Systems , pages 3467–3475, 2016.[24] Jiin Woo, Jungseul Ok, and Yung Yi. Iterative learn-ing of graph connectivity from partially-observed cascadesamples. In

Proceedings of the Twenty-First InternationalSymposium on Theory, Algorithmic Foundations, and Pro-tocol Design for Mobile Networks and Mobile Computing ,pages 141–150, 2020.[25] Mateusz Wilinski and Andrey Y Lokhov. Scalable learningof independent cascade dynamics from partial observa-tions. arXiv preprint arXiv:2007.06557 , 2020.[26] Vincent Gripon and Michael Rabbat. Reconstructing agraph from path traces. In , pages 2488–2492. IEEE, Proceedings of the National Academy of Sci-ences , 106(52):22073–22078, 2009.[28] Caterina De Bacco, Eleanor A Power, Daniel B Larremore,and Cristopher Moore. Community detection, link pre-diction, and layer interdependence in multilayer networks.

Physical Review E , 95(4):042317, 2017.[29] Matteo Magnani, Barbora Micenkova, and Luca Rossi.Combinatorial analysis of multiple networks. arXivpreprint arXiv:1303.4986 , 2013.[30] Maureen Hurley, Glen Jacobs, and Melinda Gilbert. Thebasic si model.

New Directions for Teaching and Learning ,2006(106):11–22, 2006.[31] Harrison C White, Scott A Boorman, and Ronald LBreiger. Social structure from multiple networks. i. block-models of roles and positions.

American journal of soci-ology , 81(4):730–780, 1976.[32] Friedrich Pukelsheim. The three sigma rule.

The AmericanStatistician , 48(2):88–91, 1994.[33] Robert Paluch, Łukasz G Gajewski, Janusz A Hołyst, andBoleslaw K Szymanski. Optimizing sensors placement incomplex networks for localization of hidden signal source:A review.

Future Generation Computer Systems , 112:1070–1092, 2020.[34] Albert-László Barabási and Réka Albert. Emergence ofscaling in random networks. science , 286(5439):509–512,1999.[35] Aaron Clauset, Cosma Rohilla Shalizi, and Mark EJ New-man. Power-law distributions in empirical data.

SIAMreview , 51(4):661–703, 2009.[36] Anna D Broido and Aaron Clauset. Scale-free networksare rare.

Nature communications , 10(1):1–10, 2019.[37] Duncan J Watts and Steven H Strogatz. Collective dynam-ics of ‘small-world’networks. nature , 393(6684):440–442,1998.

Appendix A: Null model
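To make the comparison in this appendix concrete, here is a minimal Python sketch, assuming k is the number of hidden edges and N the number of nodes. The helper names `alpha_exact`, `r_exact` and `r_approx` are hypothetical, not the authors' code: `r_exact` solves Eq. (A1) for the smallest integer r reaching a given α by bisection (the left-hand side grows with r), and `r_approx` evaluates the closed form (A3).

```python
from math import comb


def alpha_exact(r: int, k: int, n_nodes: int) -> float:
    """Eq. (A1): probability that r random node pairs, out of C(N, 2), contain all k hidden edges."""
    if r < k:
        return 0.0
    return comb(r, k) / comb(comb(n_nodes, 2), k)


def r_exact(alpha: float, k: int, n_nodes: int) -> int:
    """Smallest integer r with alpha_exact(r) >= alpha, found by bisection (alpha_exact grows with r)."""
    lo, hi = k, comb(n_nodes, 2)  # alpha_exact(hi) == 1, so the answer lies in [k, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if alpha_exact(mid, k, n_nodes) >= alpha:
            hi = mid
        else:
            lo = mid + 1
    return lo


def r_approx(alpha: float, k: int, n_nodes: int) -> float:
    """Eq. (A3): r ~ N(N-1)/2 * alpha**(1/k)."""
    return n_nodes * (n_nodes - 1) / 2 * alpha ** (1.0 / k)


if __name__ == "__main__":
    # N = 100 and 95% certainty, as in Fig. 10; the values of k are arbitrary.
    for k in (1, 5, 10, 50, 100):
        print(k, r_exact(0.95, k, 100), round(r_approx(0.95, k, 100), 2))
```

Because every factor (r − i)/(ξ − i) is at most r/ξ when r ≤ ξ, the closed form never exceeds the exact integer solution; for N = 100 and α = 0.95 the two stay within a couple of edges of each other over the k values tried above.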

In the main text we use Eq. (4) as our null model; however, that equation must, for the most part, be solved numerically for r at a fixed α. Here we present an approximation that gives a closed form for r, matches the numerical solutions very closely, and is much easier and faster to compute. Let us recall the aforementioned formula:

$$\alpha = \frac{\binom{r}{k}}{\binom{\binom{N}{2}}{k}} = \frac{r!}{(r-k)!}\,\frac{\left(\binom{N}{2}-k\right)!}{\binom{N}{2}!}. \tag{A1}$$

Here α is the probability that r node pairs drawn at random from all $\binom{N}{2}$ possible pairs include all k hidden edges. Denote $\binom{N}{2}$ as ξ; then

$$\alpha = \frac{r!}{(r-k)!}\,\frac{(\xi-k)!}{\xi!} = \frac{r(r-1)\cdots(r-k+1)\,(r-k)!}{(r-k)!}\,\frac{(\xi-k)!}{\xi(\xi-1)\cdots(\xi-k+1)\,(\xi-k)!} = \frac{r(r-1)\cdots(r-k+1)}{\xi(\xi-1)\cdots(\xi-k+1)} \sim \left(\frac{r}{\xi}\right)^{k}, \tag{A2}$$

where the last step holds when k is much smaller than r and ξ. We can therefore conclude that

$$r \sim \frac{N(N-1)}{2}\,\sqrt[k]{\alpha}. \tag{A3}$$

The comparison of this result with the numerical solution is shown in Fig. 10.

[Figure: links to check; curves for the numerical solution of Eq. (4) and the approximation (A3)]
FIG. 10: Comparison of the numerical solution of Eq. (4) and the approximation (A3) for N = 100. The curves represent the number of edges the null model must check in order to have 95% certainty of testing all hidden edges.

Appendix B: Distributions of ranks for Aarhus data

We analyse three scenarios for the Aarhus data (described in the main text). In each of them we use a different layer as the visible one and project all the other layers onto a single hidden network. The distributions of ranks obtained by applying our approach to the scenarios with the lunch and Facebook layers as the observed layer are shown in Figs. 11 and 12, respectively. The third scenario is shown in Fig. 9 in the main text.
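A minimal sketch of the bookkeeping behind these scenarios, with hypothetical function names: `split_layers` keeps one layer as the visible network and merges all other layers into one hidden edge set, and `hidden_edge_ranks` returns the ranks of the true hidden edges in a list of candidate pairs sorted by score. The scores would come from the likelihood-based method of the main text; here they are made-up numbers, and dropping hidden edges that coincide with visible ones is our assumption, not necessarily the authors' choice.

```python
from statistics import median

# An undirected edge is represented as a frozenset of its two endpoints.
Edge = frozenset


def split_layers(layers: dict[str, set[Edge]], observed: str) -> tuple[set[Edge], set[Edge]]:
    """Keep `observed` as the visible layer; project all other layers onto one hidden network."""
    visible = set(layers[observed])
    hidden: set[Edge] = set()
    for name, edges in layers.items():
        if name != observed:
            hidden |= edges
    # Assumption: edges already present in the visible layer are not counted as hidden.
    hidden -= visible
    return visible, hidden


def hidden_edge_ranks(scores: dict[Edge, float], hidden: set[Edge]) -> list[int]:
    """Rank candidate pairs by decreasing score (rank 1 = highest) and return the sorted ranks of hidden edges."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    rank_of = {pair: position for position, pair in enumerate(ordered, start=1)}
    return sorted(rank_of[e] for e in hidden if e in rank_of)


if __name__ == "__main__":
    # Toy three-layer data (hypothetical, not the Aarhus network).
    layers = {
        "lunch": {Edge((1, 2)), Edge((2, 3))},
        "facebook": {Edge((1, 3)), Edge((2, 3))},
        "work": {Edge((3, 4))},
    }
    visible, hidden = split_layers(layers, observed="lunch")
    # Made-up candidate scores; in practice these would come from the cascade likelihoods.
    scores = {Edge((1, 3)): 0.9, Edge((3, 4)): 0.7, Edge((1, 4)): 0.5, Edge((2, 4)): 0.1}
    ranks = hidden_edge_ranks(scores, hidden)
    print(ranks, median(ranks))
```

The median of the returned ranks corresponds to the vertical lines in the rank distributions; repeating the computation over many cascade sets yields the full distribution.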

[Figure: rank (horizontal axis) vs. density (vertical axis)]
FIG. 11: Distribution of ranks of hidden edges, with medians as vertical lines, for the Aarhus data, with the lunch layer as the observed network. Results obtained for 10 cascades with $t_{\max} = 10$, $\beta_{\text{observed}} = 0.\ldots$ and two values of $\beta_{\text{hidden}}$: 0.3 and 0.7.

[Figure: rank (horizontal axis) vs. density (vertical axis)]
FIG. 12: Distribution of ranks of hidden edges, with medians as vertical lines, for the Aarhus data, with the Facebook layer as the observed network. Results obtained for 10 cascades with $t_{\max} = 10$, $\beta_{\text{observed}} = 0.\ldots$ and two values of $\beta_{\text{hidden}}$.
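The cascades behind Figs. 11 and 12 come from the SI model with separate infection rates on the observed and hidden layers. Below is a minimal sketch of such a simulation, assuming discrete time steps up to t_max and independent per-edge infection attempts at every step; the function `si_cascade` and its parameters are hypothetical and need not match the authors' implementation. The value 0.5 used for β_observed in the usage block is an arbitrary placeholder.

```python
import random


def si_cascade(observed, hidden, beta_obs, beta_hid, source, t_max, rng):
    """Hypothetical discrete-time SI cascade on a two-layer network.

    `observed` and `hidden` map each node to the set of its neighbours in that layer.
    At every step each infected node independently tries to infect each susceptible
    neighbour with probability beta_obs (observed layer) or beta_hid (hidden layer).
    Returns a dict node -> infection time, with the source infected at t = 0.
    """
    times = {source: 0}
    for t in range(1, t_max + 1):
        infected = list(times)  # nodes infected before step t
        newly = set()
        for i in infected:
            for layer, beta in ((observed, beta_obs), (hidden, beta_hid)):
                for j in layer.get(i, ()):
                    # Skip nodes that are already infected or were infected earlier in this step.
                    if j not in times and j not in newly and rng.random() < beta:
                        newly.add(j)
        for j in newly:
            times[j] = t
    return times


if __name__ == "__main__":
    rng = random.Random(42)
    observed = {0: {1}, 1: {0, 2}, 2: {1}}
    hidden = {0: {3}, 2: {3}, 3: {0, 2}}
    for beta_hid in (0.3, 0.7):
        # 10 cascades with t_max = 10, as in the captions above.
        cascades = [si_cascade(observed, hidden, 0.5, beta_hid, 0, 10, rng) for _ in range(10)]
        print(beta_hid, cascades)
```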