Detecting Hidden Layers from Spreading Dynamics on Complex Networks
Łukasz G. Gajewski∗ and Jan Chołoniewski
Center of Excellence for Complex Systems Research, Faculty of Physics, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
Mateusz Wilinski
Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
When dealing with spreading processes on networks, it can be of the utmost importance to test the reliability of data and to identify potential unobserved spreading paths. In this paper we address these problems and propose methods for hidden layer identification and reconstruction. We also explore the interplay between the difficulty of the task and the structure of the multilayer network describing the whole system in which the spreading process occurs. Our methods stem from an exact expression for the likelihood of a cascade in the Susceptible-Infected model on an arbitrary graph. We then show that by employing statistical properties of unimodal distributions and simple heuristics describing the joint likelihood of a series of cascades, one can estimate both the existence of a hidden layer and its content with success rates far exceeding those of a null model. We conduct our analyses on both synthetic and real-world networks, providing evidence for the viability of the approach presented.
I. INTRODUCTION
Real-world complex systems can often be described by interconnected structures known as multilayer networks [1–4]. Transportation, social or economic networks, to name just a few general examples, can have various types of connections; see Fig. 1 for an example depiction of such a system. Each type of connection can be represented as a specific sub-system or sub-network. Railway, flight and bus connections can each be described with a network, but a full description of the transportation system requires joining them into a multilayer network. In reality, obtaining the full information needed to build a complete multilayer network is rarely possible. Moreover, in some cases even the knowledge about which layers exist is limited. As a result, researchers often have to deal with uncertainty arising from partial information about connectivity in the analysed system. This concerns in particular one of the fundamental problems in network science, spreading processes on networks [5–11], but it is also significant for opinion dynamics [12–15].

In this article we focus on the problem of detecting hidden layers based on observations of dynamical processes on graphs. We also propose and explore methods for finding missing connections of different types. Finally, we analyse the potential limitations and difficulties of these methods, as well as the settings in which the problem is easier to solve.

The problem of detecting hidden layers has recently appeared in the literature in a non-Markovian setting [16] and on quantum graphs [17]. It is also closely related to the problem of network reconstruction, which has been extensively analysed in the past [18–22], including in a partial observation setting [23–25].
Our setting is somewhat simpler in some regards, but at the same time still fairly realistic, and thus should remain viable for real-world problems. We feel that our simplifications are justified, since solving the general problem has been proved to be limited [19] and previous papers often addressed only restricted cases anyway, such as very short cascades [26]. This is not to say that successful approximations are impossible [18]; rather, the goal of this paper is to investigate the challenges associated with detecting hidden layers in interconnected networks in the context of spreading processes.

The reader should not confuse the problem of finding hidden layers from observed spreading with the extensively analysed branch of network science called link prediction, where hidden connections are estimated using only the network structure. A seminal paper in this direction is [27]; an extension to multilayer networks can be found in [28].

The paper is structured as follows. In the next section we describe the methods used in the analysis, starting with tools for detecting hidden layers and then proposing methods for identifying unobserved connections. In the third section we analyse both synthetic and real-world networks and show how our methods perform under different structural circumstances. Finally, in the last section we discuss the results and present our conclusions.

II. METHODS
In our analysis we focus on one of the simplest spreading models, the Susceptible-Infected (SI) model [30]. We use it in a network version where the dynamics can be described as follows: for each node i in the infected state I at time t, each of its neighbours j in the susceptible state S becomes infected at time t + 1 with probability β. This model was chosen because of its simplicity on the one hand and, on the other, the mechanism of multiple infection opportunities (in comparison with, e.g., an independent cascade model). The latter makes the combinatorial analysis much more difficult, as we will see in further sections. The former is reflected by an easily derived likelihood of any observed spreading, including a multilayer scenario, assuming that full knowledge about connectivity is available.

FIG. 1: Visualisation of the multilayer network representing the Aarhus data [29]. We use this network as a real-world example of a possible application of our methods. It is natural to imagine that we know only one of the layers presented here and would like to infer the existence, and possibly the structure, of the others. Note that the co-author and leisure layers are disconnected; we therefore do not use them as visible layers in our analyses, as that would make the task of detecting hidden connections potentially much easier.

Let us denote a multilayer graph by G and let the probability of infection (spreading) on each layer j be equal to β_j. We refer to a single realisation of the spreading dynamics as a cascade and denote it by Σ^c. A single cascade can be described by the set of infection times τ_i^c of all nodes i. We also assume that a cascade ends at time t_max and that a node which was not infected at all has infection time t_max. In other words, if node i's activation time equals t_max, it was activated either at t_max or later; this will become clearer once the likelihood is derived.
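The SI dynamics and the cascade representation just described (infection times τ_i^c, with t_max marking nodes not infected within the observation window) can be simulated in a few lines. The Python sketch below is an illustration under stated assumptions, not the authors' code: the function name `simulate_si` is hypothetical, each layer is an adjacency dictionary `{node: set_of_neighbours}`, every infected node makes one independent attempt per susceptible neighbour and per layer at each step, and nodes still susceptible after step t_max − 1 receive τ = t_max.

```python
import random

def simulate_si(layers, betas, source, t_max, rng=random):
    """Simulate one discrete-time SI cascade on a multilayer graph.

    layers -- list of adjacency dicts {node: set_of_neighbours}, one per layer
              (hypothetical representation; undirected edges stored both ways)
    betas  -- per-layer infection probabilities beta_j
    source -- node infected at t = 0
    t_max  -- end of the cascade; nodes not infected before t_max get
              tau = t_max ("activated at t_max or later")
    Returns a dict tau mapping every node to its infection time.
    """
    nodes = set().union(*(layer.keys() for layer in layers))
    tau = {v: t_max for v in nodes}
    tau[source] = 0
    infected = {source}
    for t in range(1, t_max):
        newly_infected = set()
        # every node infected at t - 1 or earlier tries each susceptible
        # neighbour once per layer, independently, with that layer's beta
        for i in infected:
            for layer, beta in zip(layers, betas):
                for k in layer.get(i, ()):
                    if k not in infected and rng.random() < beta:
                        newly_infected.add(k)
        for k in newly_infected:
            tau[k] = t
        infected |= newly_infected
    return tau
```

For example, with a visible edge 0–1, a hidden edge 0–2 and β = 1 on both layers, `simulate_si([{0: {1}, 1: {0}, 2: set()}, {0: {2}, 1: set(), 2: {0}}], [1.0, 1.0], 0, 5)` returns `{0: 0, 1: 1, 2: 1}`; node 2's activation cannot be explained by the visible layer alone. Passing `rng=random.Random(seed)` makes runs reproducible.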
The set of all available cascades will be denoted by Σ.

A. Cascade likelihood for Susceptible-Infected model
As mentioned before, the likelihood of a given set of cascades for a specific and fully known multilayer network can be derived, similarly to [23]. In short, the probability of a given data set can be written as a product over cascades, which are independent, and over nodes, because the problem can be considered locally:

$$P(\Sigma \mid G, \{\beta_j\}) = \prod_{i \in V} \prod_{c \in C} P_i(\tau_i^c \mid \Sigma^c, G, \{\beta_j\}), \tag{1}$$

with each factor being the probability of node activation in a specific cascade:

$$P_i(\tau_i^c \mid \Sigma^c, G, \{\beta_j\}) = \left( \prod_{t=0}^{\tau_i^c - 2} \prod_{j} \prod_{k \in \partial_j i} \left(1 - \beta_j \mathbb{1}_{\tau_k^c \le t}\right) \right) \times \left( 1 - \prod_{j} \prod_{k \in \partial_j i} \left(1 - \beta_j \mathbb{1}_{\tau_k^c \le \tau_i^c - 1}\right) \right)^{\mathbb{1}_{\tau_i^c < t_{\max}}}, \tag{2}$$

where $\partial_j i$ denotes the neighbourhood of node $i$ on layer $j$ and $\mathbb{1}$ is the indicator function. The first factor is the probability that $i$ was not infected before $\tau_i^c$, and the second is the probability that it was infected exactly at $\tau_i^c$; the second factor is absent for nodes censored at $t_{\max}$.

By the Vysochanskij–Petunin inequality, assuming $X$ is a random variable with a unimodal distribution (mean $\mu$, finite and positive variance $\sigma^2$) and $\lambda > \sqrt{8/3}$, we have

$$\Pr(|X - \mu| \ge \lambda\sigma) \le \frac{4}{9\lambda^2},$$

which after the normalisation $\tilde{X} = |X - \mu|/\sigma$ gives

$$\Pr(\tilde{X} \ge \lambda) \le \frac{4}{9\lambda^2}.$$

In the boundary case $\lambda = \tilde{x}$,

$$\Pr(\tilde{X} \ge \tilde{x}) \le \frac{4}{9\tilde{x}^2},$$

which means that an upper bound on the probability of obtaining a result $\tilde{x}$ or greater from a normalised unimodal distribution $\tilde{X}$ is

$$p(\tilde{x}) = \min\left(\frac{4}{9\tilde{x}^2}, 1\right). \tag{3}$$

In our case the likelihood plays the role of X. We use p(x̃) to measure how surprising the given cascades are, assuming they were generated by an SI process with known β simulated on the visible layer of the network. We validate this approach in the next section using simulations and synthetic networks.

C. Detecting hidden edges

After discarding the possibility that the given cascades were generated by a given topology and process, the same data can be used to estimate the topology of the hidden layer(s). This requires assuming some topology (the visible layer plus an estimated hidden layer) and a value of β_hidden, simulating the process again and calculating new likelihoods. We try to find the topology by finding cases of node activation that cannot be explained by the assumed single-layer model.
Then, in each such case, all nodes that were infected in a step preceding the unexplained activation are used to construct a set of possible edges. The details of the procedure are as follows:

1. Let $\mathcal{J}^c(t) = \{ i \in N : \tau_i^c \le t \}$ be the set of nodes that are infected by simulation step $t$ of cascade $c$. Here $N$ is the set of nodes in graph $G$ and $\tau_i^c$ is the activation time of node $i$ in cascade $c$. Further, $\Delta\mathcal{J}^c(t) = \{ i \in N : \tau_i^c = t \}$ is the set of nodes that became infected in simulation step $t$ of cascade $c$.
2. Using the above notation we introduce the set of nodes that became infected in simulation step $t$ of cascade $c$ but were not neighbours of any node infected at $t-1$:
$$\mathcal{U}^c(t) = \Delta\mathcal{J}^c(t) \setminus \bigcup_{m \in \mathcal{J}^c(t-1)} \partial m,$$
where $\partial m$ is the known neighbourhood of node $m$.
3. If the likelihood of a cascade is zero, then there is at least one hidden edge in the set
$$E(i, c) = \left\{ (i, k) : k \in \mathcal{J}^c(\tau_i^c - 1),\ i \in \bigcup_{t=1}^{t_{\max}} \mathcal{U}^c(t) \right\}.$$
The likelihood that such an edge was the one activated to infect $i$ is unknown and difficult to find. Heuristically, however, we can say that it scales as
$$P_c(i, k) \sim (1 - \beta_{\text{hidden}})^{|\tau_i^c - \tau_k^c|}.$$
4. Next, we must unify the candidates across cascades. Namely, if an edge (a, b) was detected in c = 1 but not in c = 2, we need to associate it with the likelihood of not being detected, which is similarly non-trivial. Again, whatever that likelihood is, it must scale as $(1 - \beta_{\text{hidden}})^{|\tau_i^c - \tau_k^c|}$.
5. Finally, for each edge $e$ we multiply its likelihoods, obtaining a joint likelihood $J$:
$$J(e) = \prod_c P_c(e).$$
6. The edges that maximise $J$ are the most likely hidden edges.

In order to evaluate the quality of our approach, we use two metrics:

Sensitivity: the ratio between true positives and all actual positives. In our case it is the fraction of hidden edges that were detected.

α-Credible Set Size (α-CSS): a measure introduced in [33].
It represents the number of candidates one must investigate in order to have an α level of certainty of finding the sought entity. In practice, one computes the rank of the entity to be found on the list of candidates, ordered according to a measure that the entity should maximise. This step is repeated many times in order to obtain a distribution of that rank. Next, one takes the quantile q = α of that distribution. In our case there are multiple entities (edges), so we have adapted the measure: we record the highest rank among the hidden edges and proceed as usual.

A null model would naturally be random guessing. There are $\binom{N}{2}$ edges to check in a system with N nodes. For instance, with N = 100 there are $\binom{100}{2} = 4950$ candidate edges, in which case 95% certainty of finding 1 hidden link requires checking ⌈0.95 × 4950⌉ = 4703 edges. In general, the number r of links one needs to check in order to have α certainty, according to the null model, can be obtained from

$$\frac{\binom{r}{k}}{\binom{\binom{N}{2}}{k}} = \frac{r!}{(r-k)!}\,\frac{\left(\binom{N}{2} - k\right)!}{\binom{N}{2}!} = \alpha, \tag{4}$$

where k is the number of hidden edges and N is the number of nodes. The curve representing r as a function of k for N = 100 nodes is shown in Fig. 2. As we show later on, our method requires substantially fewer edges to be checked. Since Eq. (4) has to be solved numerically, we also derive an asymptotic approximation, which can be found in Appendix A.

[Figure 2: number of links to check for the 0.95-CSS (y-axis) vs. number of hidden edges (x-axis).]
FIG. 2: The number of edges required to check in order to have 95% certainty, according to the null model, of testing all hidden edges, as a function of the number of hidden edges. The plot is for a network with N = 100 nodes, but the shape of the curve scales with the size of the network.

III. EXPERIMENTS

We use both real and synthetic data in our experiments. In the latter case we build networks that are realistic and non-trivial, in the sense that we do not want activations unexplained by the visible network to be likely. To achieve that, we need a way to control the correlation between different layers of the network. We therefore propose our own models for generating multilayer networks.

A. Synthetic Networks

In the first setting, we generate a two-layer network using the Barabási-Albert algorithm [34]. Two parameters need to be selected, m_hidden and m_observed, which represent the number of edges added at each step of the algorithm for the hidden and observed layers respectively. The two layers are independent, but both have power-law degree distributions, which are believed to resemble real social networks [35] (although lately this is seen more as an idealised approximation [36]). Since a lack of correlation makes the problem of detecting hidden layers much easier, we also propose another setting. We take a square lattice as the first layer and then apply a rewiring procedure similar to the one introduced by Watts and Strogatz [37] to produce another layer. We start with a square lattice as an equivalent of real relationships (affected by distance), but we also explore a scale-free network as a starting layer. The correlation between the two layers is parameterised by p, the probability of each node being rewired to any random (other) node. In all described settings we keep the spreading probability of the observed layer fixed at β_observed = 0.…, while for the hidden layer it takes the values β_hidden = 0.3 and β_hidden = 0.7.

[Figure 3: infinite log-likelihoods [%] (y-axis) vs. rewire probability (x-axis); graphs: lattice and BA; curves for several values of t_max.]
FIG. 3: The percentage of log-likelihoods equal to −∞ as a function of the rewiring probability p. We investigate two different cases of networks, a square lattice and a Barabási-Albert network, for different cascade lengths. The simulations were made for networks of size N = 100, with periodic boundary conditions in the lattice case.

When it comes to detecting hidden layers, any two-layer network with independent layers results in the majority of cascades having likelihood equal to zero, and therefore the problem becomes trivial. The only interesting case is the model with rewiring, as there the correlations between layers can be large. Fig. 3 shows the percentage of cascades with likelihood equal to zero as a function of the rewiring probability. We test it for local networks (represented by a square lattice) and for networks with many long-range connections (as in the case of a scale-free Barabási-Albert network). As expected, the problem is easier for short cascades and quickly becomes more difficult as the length of cascades grows. Note that even a very small rewiring probability results in a drastic change of this percentage. In practice this means that detecting an unknown transmission channel is fairly simple with our approach. A much more challenging task, however, is to find the actual unknown connections. We focus on that in further experiments, but first let us discuss the case in which there are no prohibited dynamics and whether we can investigate the likelihood of such observed data.

If the probability of the observed cascades is positive, we can compare it with the empirical distribution of likelihoods of cascades simulated on the observed network. This allows us to use the Vysochanskij–Petunin inequality and decide whether the observed data were generated by a process run on a graph with an additional (hidden from us) layer. As seen in Fig. 4, using a typical significance level allows us to successfully reject the hypothesis of a single layer in a significant number of cases (or even in all of them).
Low t_max and β_hidden, especially in the case of local networks like the square lattice, decrease the effectiveness of the test, but apart from the extreme case (lattice with t_max = 5 and β_hidden = 0.3) our proposed approach is an efficient tool for detecting hidden layers.

Once we know that there is a hidden layer affecting the dynamics, we aim at finding its edges. Tables I and II below show the results of applying our method to both lattice and Barabási-Albert networks with hidden layers produced by rewiring (with probability p = 0.…). When comparing the two settings one can observe a certain interplay between sensitivity and α-CSS. For the lattice-based network the sensitivity is significantly higher than in the Barabási-Albert case, but at the same time the scale-free case is characterised by a much lower α-CSS for both α = 0.5 and α = 0.95. In other words, it is easier to correctly identify hidden edges when we have a locally connected network (lattice), but at the same time a scale-free network requires a smaller set to find all hidden edges (despite reaching a lower sensitivity level). Note that in both cases the observed 0.5-CSS and 0.95-CSS are significantly lower than for the null model where, depending on the number of rewired links, they would be larger than 2475 and 4703 respectively (see Eq. (4)). The full distribution of ranks from which the α-CSS was computed is shown for both networks in Fig. 5.

TABLE I: Sensitivity and α-CSS for a square lattice with rewiring (N = 100, t_max = 10, p = 0.…). Results obtained for … realisations per scenario, where each scenario had … independent cascades from (possibly) different sources.
β_hidden | β_observed | sensitivity | 0.5-CSS | 0.95-CSS

TABLE II: Sensitivity and α-CSS for a Barabási-Albert network with rewiring (N = 100, t_max = 10, m = 3, p = 0.…). Results obtained for … realisations per scenario, where each scenario had … independent cascades from (possibly) different sources.
β_hidden | β_observed | sensitivity | 0.5-CSS | 0.95-CSS

As already discussed, when the layers are not correlated it is easy to identify that there is a hidden spreading channel. Nevertheless, finding the actual unobserved links may still be challenging. Figs. 6 and 7 both show that the density of the observed network is an important factor. From the sensitivity perspective it is better to have a denser observed network. Unfortunately, the 0.95-CSS also grows with the density of known connections, making it more demanding to find all the connections. Additionally, although the effect is weaker, it is beneficial for both measures if the hidden layer is sparser. This aligns with intuition, since more hidden connections can make the observed dynamics much more complex. Unsurprisingly, having more data about the cascades also makes the task easier. The actual dependence between the number of cascades and the sensitivity is shown in Fig. 8. For a relatively large BA network we need around 30–40 cascades to reach a fairly satisfactory sensitivity level (around 0.…). Depending on the specifics of the problem, such amounts of data may be considered a lot (e.g., in epidemic spreading) or easily available (e.g., information spreading on social media). In the next subsection we examine this scaling for real-life networks.

B. Real World Networks

On top of synthetic networks, we also use real-world data to build a multilayer network and empirically test our methods. For that purpose we choose the data collected among employees of the Department of Computer Science at Aarhus University [29]. It is a multilayer network consisting of Facebook friendship, co-authorship, work, leisure (repeated leisure activities) and lunch (regularly eating lunch together) layers. Its full structure is presented in Fig. 1, with each layer shown as a separate network.
The whole network has … nodes and … (unique) edges in total.

Main results for the Aarhus data are shown in Tables III, IV and V, which use the Facebook, work and lunch layers, respectively, as the visible part of the graph. We omitted the other two possible cases because those layers consist of more than one connected component. We treat the remaining hidden layers as one aggregated layer, as the exact number of hidden layers does not matter for our detection method. Sensitivity and CSS values are consistent with the synthetic results in the sense that they both grow with the density of the visible layer. It is also apparent that our approach far exceeds the performance of the null model. Random guessing would require us to check … links (out of … possible in total; see Eq. (4) and Appendix A), whereas our method needs just a fraction of that. A more detailed dependence between sensitivity and the number of cascades is shown in Fig. 8, where different colours represent different observed layers. In Fig. 9 we show the distributions of ranks for the work layer as the visible network; compared with the synthetic experiments, the two distributions for β_hidden = 0.3 and β_hidden = 0.7 are much more symmetric and better separated. The distributions for the other two analysed visible layers are qualitatively similar, further supporting the merit of our approach (see Appendix B).

[Figure 4: histograms of counts vs. p(x̃); panels t_max = 5 | β_hidden = 0.3, t_max = 5 | β_hidden = 0.7, t_max = 10 | β_hidden = 0.3 and t_max = 10 | β_hidden = 0.7, for the lattice (left) and the BA network (right).]
FIG. 4: Histograms of p(x̃) (see Eq. (3)) for various combinations of t_max and β_hidden. Plots on the left are generated for a square lattice with rewiring, while those on the right are generated for a BA network with rewiring. Parameters for all the networks are: β_observed = 0.…, N = 100, p = 0.… and m = 3 (for BA networks). Each histogram was made with … realisations.

[Figure 5: density vs. rank of hidden edges, two panels.]
FIG. 5: Distribution of ranks of hidden edges, with medians as vertical lines. Left: lattice with rewiring (N = 100, t_max = 10, p = 0.…). Right: BA network with rewiring (N = 100, t_max = 10, m = 3, p = 0.…). Results obtained with … realisations per scenario, where each scenario had … independent cascades from different sources. These results are for those realisations in which all hidden edges were detected.

[Figure 6: sensitivity as a function of m_hidden (x-axis) and m_observed (y-axis); panels (a) 1000 nodes and 10 cascades, (b) 1000 nodes and 100 cascades.]
FIG. 6: The sensitivity as a function of m_hidden and m_observed for a two-layer Barabási-Albert network with β_hidden = 0.…, β_observed = 0.… and t_max = 10. The results are averaged over 20 independent runs.

[Figure 7: 0.95-CSS as a function of m_hidden (x-axis) and m_observed (y-axis); panels (a) 1000 nodes and 10 cascades, (b) 1000 nodes and 100 cascades.]
FIG. 7: The 0.95-CSS as a function of m_hidden and m_observed for a two-layer Barabási-Albert network with β_hidden = 0.…, β_observed = 0.… and t_max = 10. The results are averaged over 20 independent runs.

TABLE III: Sensitivity and α-CSS for the Aarhus data, with the Facebook layer as the observed network. Results obtained for 10 cascades with t_max = 10.
β_hidden | β_observed | sensitivity | 0.5-CSS | 0.95-CSS

TABLE IV: Sensitivity and α-CSS for the Aarhus data, with the work layer as the observed network. Results obtained for 10 cascades with t_max = 10.
β_hidden | β_observed | sensitivity | 0.5-CSS | 0.95-CSS

TABLE V: Sensitivity and α-CSS for the Aarhus data, with the lunch layer as the observed network. Results obtained for 10 cascades with t_max = 10.
β_hidden | β_observed | sensitivity | 0.5-CSS | 0.95-CSS

[Figure 8: sensitivity (y-axis) vs. number of cascades (x-axis, 20 to 100); curves: lunch, fb, work, BA.]
FIG. 8: The sensitivity as a function of the number of cascades for a) a two-layer Barabási-Albert network with m = 4 for both layers, β_hidden = 0.…, β_observed = 0.…, N = 1000 and t_max = 10 (red line); b) the Aarhus data with different layers as the observed network (lunch: blue line, Facebook: yellow line, work: green line). The results are averaged over 20 independent runs and the error bars represent one standard deviation.

[Figure 9: density (y-axis) vs. rank (x-axis).]
FIG. 9: Distribution of ranks of hidden edges, with medians as vertical lines, for the Aarhus data, with the work layer as the observed network. Results obtained for 10 cascades with t_max = 10, β_observed = 0.… and two values of β_hidden: 0.3 and 0.7.

IV. DISCUSSION

Spreading processes on networks are a valuable tool for describing real-life global diffusion processes, such as epidemics, information spreading and cascading failures. These processes may have several spreading channels, and rarely do we know, or are even aware of, all of them. It is therefore crucial to identify whether the observed spreading was in fact generated only by the observed network. Furthermore, should one confirm the existence of an unobserved spreading path, finding these hidden connections can be of the utmost importance.

In this paper we focused on identifying both the existence and the structure of a hidden spreading layer by observing a diffusion process unravelling on a graph. We provide methods for i) determining whether a hidden layer exists and ii) estimating which links are present in that layer. Our approach is based on an exact formula for the likelihood of an observed cascade given knowledge of the system's topology. Using this likelihood, and the fact that its distribution can be assumed to be unimodal, we established a practical and effective way of discerning the existence of a hidden layer.
Furthermore, using a series of heuristics, we obtain an algorithm for estimating the joint likelihood of a given (hidden) edge taking part in the observed cascades, thereby providing a tool, vastly superior to random guessing, for assessing which nodes are most likely to exchange information via a channel we do not know of.

Data from synthetic and empirical networks alike confirm that uncovering the hidden spreading channel is a relatively simple task with our approach, especially when the layers are uncorrelated. It is, however, more difficult to identify specific hidden connections. Despite the general similarities, there are some quantitative differences between the results obtained with synthetic and real data. One of the most significant differences is how β_hidden relates to the distribution of ranks of hidden edges, influencing the difficulty of reconstructing hidden connections. This effect is much stronger in real-world networks than in synthetic ones. It can, however, be explained by the difference in density between the hidden and observed networks. In the corresponding plots for synthetic data (see Fig. 5) both layers have the same density. Here the hidden layer is denser (it is a sum of four hidden layers), so changing the hidden spreading probability affects the majority of connections.

An important factor in successfully recovering the hidden connections turns out to be the density of both the hidden and the observed transmission layers. Specifically, we observe that the denser the hidden layer, the harder it is to find the exact connections. An interesting interplay takes place when it comes to the density of the observed layer. On the one hand the sensitivity decreases with the density of the observed layer; on the other hand, the α-CSS is also decreasing with the density.
This observation, confirmed by both synthetic and real data, means that as the number of connections in the visible layer increases, we are able to identify fewer hidden edges on average, but we need to take into account a smaller set of potential edges in order to find all of the hidden connections.

It should be pointed out that we only focus on hidden connections that do not overlap with the observed ones. This means that for correlated layers there might be only a few unknown connections, whereas the overlapping edges also influence the dynamics. Focusing on the more general picture and including the overlapping connections is an interesting subject for future research. Another research direction would be to further improve the algorithms for identifying hidden connections, in terms of both the effectiveness and the scalability of the proposed methods. The latter is especially important since real-world networks are often quite large. From the perspective of empirical data, it would also be useful to have a way of handling a scenario where different layers have different values of β, which may or may not be known. Finally, more radical generalisations, such as including temporal networks, could also prove to be interesting research problems. While we hope to address some of the above topics in the near future, we feel that the methods presented here already provide effective and practical tools for real-world applications.

ACKNOWLEDGMENTS

Ł.G.G. and J.C. were supported by the National Science Centre, Poland, Grant No. 2015/19/B/ST6/02612.

[1] Manlio De Domenico, Albert Solé-Ribalta, Emanuele Cozzo, Mikko Kivelä, Yamir Moreno, Mason A Porter, Sergio Gómez, and Alex Arenas. Mathematical formulation of multilayer networks. Physical Review X, 3(4):041022, 2013.
[2] Mikko Kivelä, Alex Arenas, Marc Barthelemy, James P Gleeson, Yamir Moreno, and Mason A Porter. Multilayer networks. Journal of Complex Networks, 2(3):203–271, 2014.
[3] Stefano Boccaletti, Ginestra Bianconi, Regino Criado, Charo I Del Genio, Jesús Gómez-Gardenes, Miguel Romance, Irene Sendina-Nadal, Zhen Wang, and Massimiliano Zanin. The structure and dynamics of multilayer networks. Physics Reports, 544(1):1–122, 2014.
[4] Manlio De Domenico, Albert Solé-Ribalta, Elisa Omodei, Sergio Gómez, and Alex Arenas. Ranking in interconnected multilayer networks reveals versatile nodes. Nature Communications, 6(1):1–6, 2015.
[5] Alain Barrat, Marc Barthelemy, and Alessandro Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, 2008.
[6] Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem, and Alessandro Vespignani. Epidemic processes in complex networks. Reviews of Modern Physics, 87(3):925, 2015.
[7] Manlio De Domenico, Clara Granell, Mason A Porter, and Alex Arenas. The physics of spreading processes in multilayer networks. Nature Physics, 12(10):901–906, 2016.
[8] Guilherme Ferraz de Arruda, Francisco A Rodrigues, and Yamir Moreno. Fundamentals of spreading processes in single and multilayer complex networks. Physics Reports, 756:1–59, 2018.
[9] Robert Paluch, Łukasz G Gajewski, K Suchecki, and Janusz A Hołyst. Source location on multilayer networks. arXiv preprint arXiv:2012.02023, 2020.
[10] Sergio Gomez, Albert Diaz-Guilera, Jesus Gomez-Gardenes, Conrad J Perez-Vicente, Yamir Moreno, and Alex Arenas. Diffusion dynamics on multiplex networks. Physical Review Letters, 110(2):028701, 2013.
[11] Albert Sole-Ribalta, Manlio De Domenico, Nikos E Kouvaris, Albert Diaz-Guilera, Sergio Gomez, and Alex Arenas. Spectral properties of the Laplacian of multiplex networks. Physical Review E, 88(3):032807, 2013.
[12] Anna Chmiel and Katarzyna Sznajd-Weron. Phase transitions in the q-voter model with noise on a duplex clique. Physical Review E, 92(5):052812, 2015.
[13] Anna Chmiel, Julian Sienkiewicz, and Katarzyna Sznajd-Weron. Tricriticality in the q-neighbor Ising model on a partially duplex clique. Physical Review E, 96(6):062137, 2017.
[14] Anna Chmiel, Julian Sienkiewicz, Agata Fronczak, and Piotr Fronczak. A veritable zoology of successive phase transitions in the asymmetric q-voter model on multiplex networks. Entropy, 22(9):1018, 2020.
[15] Łukasz G Gajewski, Julian Sienkiewicz, and Janusz A Hołyst. Bifurcations and catastrophes in temporal bi-layer model of echo chambers and polarisation. arXiv preprint arXiv:2101.03430, 2021.
[16] Lucas Lacasa, Inés P Mariño, Joaquin Miguez, Vincenzo Nicosia, Édgar Roldán, Ana Lisica, Stephan W Grill, and Jesús Gómez-Gardeñes. Multiplex decomposition of non-Markovian dynamics and the hidden layer reconstruction problem. Physical Review X, 8(3):031038, 2018.
[17] Łukasz G Gajewski, Julian Sienkiewicz, and Janusz A Hołyst. Discovering hidden layers in quantum graphs. arXiv preprint arXiv:2012.01454, 2020.
[18] Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data (TKDD), 5(4):1–37, 2012.
[19] Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg, and Alessandro Panconesi. Trace complexity of network inference. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 491–499, 2013.
[20] Jean Pouget-Abadie and Thibaut Horel. Inferring graphs from cascades: A sparse recovery framework. In Proceedings of the 24th International Conference on World Wide Web, pages 625–626, 2015.
[21] Alfredo Braunstein, Alessandro Ingrosso, and Anna Paola Muntoni. Network reconstruction from infection cascades. Journal of the Royal Society Interface, 16(151):20180844, 2019.
[22] Praneeth Netrapalli and Sujay Sanghavi. Learning the graph of epidemic cascades. ACM SIGMETRICS Performance Evaluation Review, 40(1):211–222, 2012.
[23] Andrey Lokhov. Reconstructing parameters of spreading models from partial observations. In Advances in Neural Information Processing Systems, pages 3467–3475, 2016.
[24] Jiin Woo, Jungseul Ok, and Yung Yi. Iterative learning of graph connectivity from partially-observed cascade samples. In Proceedings of the Twenty-First International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pages 141–150, 2020.
[25] Mateusz Wilinski and Andrey Y Lokhov. Scalable learning of independent cascade dynamics from partial observations. arXiv preprint arXiv:2007.06557, 2020.
[26] Vincent Gripon and Michael Rabbat. Reconstructing a graph from path traces. In …, pages 2488–2492. IEEE.
[27] … Proceedings of the National Academy of Sciences, 106(52):22073–22078, 2009.
[28] Caterina De Bacco, Eleanor A Power, Daniel B Larremore, and Cristopher Moore. Community detection, link prediction, and layer interdependence in multilayer networks. Physical Review E, 95(4):042317, 2017.
[29] Matteo Magnani, Barbora Micenkova, and Luca Rossi. Combinatorial analysis of multiple networks. arXiv preprint arXiv:1303.4986, 2013.
[30] Maureen Hurley, Glen Jacobs, and Melinda Gilbert. The basic SI model. New Directions for Teaching and Learning, 2006(106):11–22, 2006.
[31] Harrison C White, Scott A Boorman, and Ronald L Breiger. Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4):730–780, 1976.
[32] Friedrich Pukelsheim. The three sigma rule. The American Statistician, 48(2):88–91, 1994.
[33] Robert Paluch, Łukasz G Gajewski, Janusz A Hołyst, and Boleslaw K Szymanski. Optimizing sensors placement in complex networks for localization of hidden signal source: A review. Future Generation Computer Systems, 112:1070–1092, 2020.
[34] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[35] Aaron Clauset, Cosma Rohilla Shalizi, and Mark EJ Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661–703, 2009.
[36] Anna D Broido and Aaron Clauset. Scale-free networks are rare. Nature Communications, 10(1):1–10, 2019.
[37] Duncan J Watts and Steven H Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, 1998.

Appendix A: Null model

In the main text we use Eq. (4) as our null model; however, that equation must (for the most part) be solved numerically for r at a given α. Here we present an approximation that gives a closed form for r, matches the numerical computations very well and is, of course, much easier and faster to compute.

Let us recall the aforementioned formula:

$$\alpha = \frac{\binom{r}{k}}{\binom{\binom{N}{2}}{k}} = \frac{r!}{(r-k)!}\,\frac{\left(\binom{N}{2} - k\right)!}{\binom{N}{2}!}. \tag{A1}$$

Denote $\binom{N}{2}$ by $\xi$; then

$$\alpha = \frac{r!}{(r-k)!}\,\frac{(\xi-k)!}{\xi!} = \frac{r(r-1)\cdots(r-k+1)\,(r-k)!}{(r-k)!}\,\frac{(\xi-k)!}{\xi(\xi-1)\cdots(\xi-k+1)\,(\xi-k)!} = \frac{r(r-1)\cdots(r-k+1)}{\xi(\xi-1)\cdots(\xi-k+1)} \sim \left(\frac{r}{\xi}\right)^{k}. \tag{A2}$$

We can therefore conclude that

$$r \sim \frac{N(N-1)}{2}\,\sqrt[k]{\alpha}. \tag{A3}$$

The comparison of this result with the numerical solution is shown in Fig. 10.

[Figure 10: number of links to check for the 0.95-CSS vs. number of hidden edges; curves: Eq. (4) and approximation (A3).]
FIG. 10: Comparison of the numerical solution of Eq. (4) and the approximation (A3) for N = 100. The curve represents the number of edges the null model requires to check in order to have 95% certainty of testing all hidden edges.

Appendix B: Distributions of ranks for Aarhus data

We analyse three scenarios for the Aarhus data (described in the main text). For each of them we use a different layer as the visible one and project all the other layers onto one hidden network. The distributions of ranks obtained by applying our approach to the scenarios with lunch and Facebook as the observed layer are shown in Figs. 11 and 12. The third scenario is shown in Fig. 9 in the main text.

[Figure 11: density (y-axis) vs. rank (x-axis) of hidden edges for the Aarhus data.]
11: Distribution of ranks of hidden edges, withmedians as vertical lines, for the Aarhus data, with lunchlayer as the observed network. Results obtained for 10cascades with t max = 10 , β observed = 0 . and two valuesof β hidden – 0.3 and 0.7. D e n s i t y FIG. 12: Distribution of ranks of hidden edges, withmedians as vertical lines, for the Aarhus data, withFacebook layer as the observed network. Resultsobtained for 10 cascades with t max = 10 , β observed = 0 . and two values of β hiddenhidden
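The closed-form approximation (A3) from Appendix A can be checked numerically with a short script. The Python sketch below (Python 3.8+ for `math.comb`) solves Eq. (A1) exactly for the smallest integer r by bisection and compares it with $\xi\sqrt[k]{\alpha}$. It assumes, as the caption of Fig. 10 suggests, that α is the probability that r node pairs chosen uniformly at random out of ξ = N(N−1)/2 candidates contain all k hidden edges, and that α ≤ 1. The function names `alpha_exact`, `r_exact` and `r_approx` are hypothetical and are not taken from the authors' code.

```python
import math


def alpha_exact(r: int, k: int, xi: int) -> float:
    """Eq. (A1): C(r, k) / C(xi, k), read here (an assumption) as the
    probability that r node pairs drawn uniformly without replacement from
    xi candidates include all k hidden edges."""
    # Python ints are exact; int / int gives a correctly rounded float.
    return math.comb(r, k) / math.comb(xi, k)


def r_exact(alpha: float, k: int, xi: int) -> int:
    """Smallest integer r with alpha_exact(r, k, xi) >= alpha.

    Bisection is valid because alpha_exact is non-decreasing in r and
    alpha_exact(xi, k, xi) == 1 >= alpha."""
    lo, hi = k, xi
    while lo < hi:
        mid = (lo + hi) // 2
        if alpha_exact(mid, k, xi) >= alpha:
            hi = mid  # mid pairs suffice; the answer is mid or smaller
        else:
            lo = mid + 1  # mid pairs are not enough
    return lo


def r_approx(alpha: float, k: int, n: int) -> float:
    """Closed form (A3): r ~ N(N-1)/2 * alpha**(1/k)."""
    return n * (n - 1) / 2 * alpha ** (1 / k)


if __name__ == "__main__":
    N, ALPHA = 100, 0.95      # the setting used for Fig. 10
    XI = math.comb(N, 2)      # xi = N(N-1)/2 candidate node pairs
    for k in (1, 5, 10, 50, 100):
        print(f"k={k:3d}  exact r={r_exact(ALPHA, k, XI)}  "
              f"approx r={r_approx(ALPHA, k, N):.1f}")
```

For N = 100, α = 0.95 and the values of k printed, the exact and approximate r stay within one edge of each other. Since each factor $(r-i)/(\xi-i)$ in (A2) is at most $r/\xi$, the closed form can only undershoot the exact r, never overshoot it.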