Systemic risk analysis in reconstructed economic and financial networks
Giulio Cimini, Tiziano Squartini, Diego Garlaschelli, Andrea Gabrielli
arXiv [physics.soc-ph], May
Istituto dei Sistemi Complessi (ISC-CNR) UoS “Sapienza” Università di Roma, 00185 Rome, Italy
Lorentz Institute for Theoretical Physics, University of Leiden, 9506 Leiden, Netherlands
IMT Institute for Advanced Studies, 55100 Lucca, Italy
* [email protected]
ABSTRACT
We address a fundamental problem that is systematically encountered when modeling complex systems: the limitedness of the information available. In the case of economic and financial networks, privacy issues severely limit the information that can be accessed and, as a consequence, the possibility of correctly estimating the resilience of these systems to events such as financial shocks, crises and cascade failures. Here we present an innovative method to reconstruct the structure of such partially-accessible systems, based on the knowledge of intrinsic node-specific properties and of the number of connections of only a limited subset of nodes. This information is used to calibrate an inference procedure based on fundamental concepts derived from statistical physics, which allows us to generate ensembles of directed weighted networks intended to represent the real system, so that the real network properties can be estimated with their average values within the ensemble. We test the method both on synthetic and empirical networks, focusing on the properties that are commonly used to measure systemic risk. The method shows a remarkable robustness with respect to the limitedness of the information available, thus representing a valuable tool for gaining insights on privacy-protected economic and financial systems.
Introduction
The estimation of the structural properties of a complex network when the available information on the system is incomplete represents an unsolved challenge, yet it has many important applications. The most typical case is that of financial networks, whose nodes represent financial institutions and whose edges stand for financial ties (e.g., loans or derivative contracts). These ties indicate dependencies among the institutions themselves and allow financial distress to propagate across the network. The resilience of the system to the default or the distress of one or more institutions depends considerably on the topology of the whole network; however, because of confidentiality issues, the information on mutual exposures that regulators are able to collect is very limited. Systemic risk analysis has typically been pursued by reconstructing the unknown links of the network using maximum entropy approaches.
These methods are also known as “dense reconstruction” techniques because they assume that the network is fully connected, a hypothesis that represents their strongest limitation. In fact, not only do real networks show a largely heterogeneous distribution of the connectivity, but such a dense reconstruction was also shown to lead to systemic risk underestimation.
More refined techniques like “sparse reconstruction” algorithms make it possible to obtain a network with arbitrary heterogeneity; however, they still underestimate systemic risk because of the homogeneity principle used to assign link weights. A more recent approach instead uses the limited topological information on the network to be reconstructed in order to generate an ensemble of graphs using the configuration model (CM), where, however, the Lagrange multipliers that define it are replaced by fitnesses, i.e., known intrinsic node-specific features. The average values of the observables computed on the CM-induced ensemble are then used as estimates for the real network properties. The latter approach overcomes the heterogeneity issue described above, yet it can only reconstruct systems in which each tie is undirected and unweighted, thus limiting the analysis to unrealistic and oversimplified configurations. Indeed, link directionality has been shown to play an important role in contagion processes and percolation analysis over these systems by, e.g., speeding up or confining the infection with respect to the undirected case. Since real economic and financial networks are, by their nature, directed, link directionality has to be taken into account when assessing their robustness to shocks and crashes. Moreover, the connection weights between the entities of these systems often assume heterogeneous values, which in turn strongly affect the way such entities react to the default or distress of their interacting partners.
In order to achieve a realistic and faithful reconstruction of economic and financial networks, here we develop an improved procedure that reconstructs link directionality, and at the same time we implement a simple yet effective prescription to assign link weights. Our method can thus be employed specifically for systemic risk estimation, by assessing those topological properties that have been shown to play a crucial role in contagion processes and in the propagation of distress over a network: the k-core structure, the percolation threshold, the mean shortest path length and the DebtRank. In particular, we perform an extensive analysis in order to quantify the accuracy of our method with respect to the size of the subset of nodes for which the topological information is available. Validation of the method is carried out on benchmark synthetic networks generated through a fitness-induced CM, as well as on two representative empirical systems, namely the International Trade Network (WTW) and the Electronic Market for Interbank Deposits (E-mid). In both cases, we have full information on these systems and we can thus unambiguously assess the accuracy of the method in describing them.
Method
Before explaining our method in detail, let us introduce some notation. We will deal with weighted directed networks, i.e., graphs composed of a set $V$ of nodes (with $|V| = N$) and described by a weighted directed adjacency matrix, whose generic element $w_{i\to j}$ represents the weight of the connection that runs from node $i$ to node $j$. The incoming total weight or in-strength of a generic node $i$ is then defined as $s_i^{in} = \sum_{j\in V} w_{j\to i}$, whereas its outgoing total weight or out-strength reads $s_i^{out} = \sum_{j\in V} w_{i\to j}$. It is also convenient to introduce the binary directed adjacency matrix that describes the binary topology: $a_{i\to j} = \Theta[w_{i\to j}]$ ($\Theta$ is the Heaviside step function: $\Theta[a] = 1$ if $a > 0$ and $\Theta[a] = 0$ otherwise). This matrix defines node $i$'s number of incoming connections or in-degree $k_i^{in} = \sum_{j\in V} a_{j\to i}$ and number of outgoing connections or out-degree $k_i^{out} = \sum_{j\in V} a_{i\to j}$. Finally, the binary undirected adjacency matrix, whose elements are obtained as $a_{ij} \equiv a_{ji} = \Theta[w_{i\to j} + w_{j\to i}]$, is used to define the number of incident connections or undirected degree of node $i$: $k_i = \sum_{j\in V} a_{ij} \equiv \sum_{j\in V} a_{ji}$.
Given these ingredients, our network reconstruction method works as follows. Suppose we have incomplete information about the topology of a given network $G$. In particular, suppose we know the in-degree and out-degree sequences $\{k_i^{in}\}_{i\in I}$ and $\{k_i^{out}\}_{i\in I}$ only for a subset $I \subset V$ of the nodes (where $|I| = n < N$). Moreover, suppose we know a pair of intrinsic properties $\{\chi_i\}_{i\in V}$ and $\{\psi_i\}_{i\in V}$ for all the nodes, which will be our fitnesses (see below). The method then invokes a statistical procedure to find the most probable estimate for the value $X(G)$ of a given property $X$ computed on the network $G$, compatible with the aforementioned constraints.
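The notation above can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the network is stored as a NumPy array `w` with `w[i, j]` equal to $w_{i\to j}$; the function name `node_properties` and the dictionary keys are hypothetical and not taken from the paper.

```python
import numpy as np

def node_properties(w):
    """Strengths and degrees of a weighted directed network.

    Assumed convention (not from the paper): w[i, j] is the weight of the
    link running from node i to node j.
    """
    w = np.asarray(w, dtype=float)
    a = (w > 0).astype(int)              # a_{i->j} = Theta[w_{i->j}]
    a_und = ((w + w.T) > 0).astype(int)  # a_{ij} = Theta[w_{i->j} + w_{j->i}]
    return {
        "s_in": w.sum(axis=0),     # column sums: total incoming weight
        "s_out": w.sum(axis=1),    # row sums: total outgoing weight
        "k_in": a.sum(axis=0),     # number of incoming links
        "k_out": a.sum(axis=1),    # number of outgoing links
        "k": a_und.sum(axis=1),    # undirected degree
    }

if __name__ == "__main__":
    # Toy 3-node network: 0 -> 1 (weight 2), 1 -> 0 (weight 1), 1 -> 2 (weight 3).
    toy = [[0, 2, 0],
           [1, 0, 3],
           [0, 0, 0]]
    for name, values in node_properties(toy).items():
        print(name, values)
```

Storing the matrix with rows as sources and columns as targets means that in-quantities are column sums and out-quantities are row sums, which mirrors the index order in $w_{i\to j}$.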
We build on two important hypotheses.
I) The network $G$ is drawn from an ensemble $\Omega$ induced by a directed CM, meaning that $\Omega$ is a set of networks that are maximally random, except for the ensemble averages of the in/out-degrees $\{\langle k_i^{in}\rangle_\Omega\}_{i\in V}$ and $\{\langle k_i^{out}\rangle_\Omega\}_{i\in V}$, which are constrained to the observed values $\{k_i^{in}\}_{i\in V}$ and $\{k_i^{out}\}_{i\in V}$, respectively. The directed CM prescribes that the probability distribution over $\Omega$ is defined via a set of Lagrange multipliers $\{x_i, y_i\}_{i\in V}$ (two for each node), whose values can be adjusted in order to satisfy the equivalences $\langle k_i^{in}\rangle_\Omega \equiv k_i^{in}$ and $\langle k_i^{out}\rangle_\Omega \equiv k_i^{out}$, $\forall i \in V$. The values of $x_i$ and $y_i$ are thus induced by the in- and out-degree of node $i$, respectively. The role of $\{x_i, y_i\}_{i\in V}$ in controlling the topology is better clarified by writing explicitly the ensemble probability for a directed connection between any two nodes $i$ and $j$:
$$p_{i\to j} \equiv \langle a_{i\to j}\rangle_\Omega = \frac{x_j y_i}{1 + x_j y_i}, \qquad (1)$$
so that $x_i$ ($y_i$) quantifies the ability of node $i$ to receive incoming (form outgoing) connections.
II) The fitnesses $\{\chi_i\}_{i\in V}$ and $\{\psi_i\}_{i\in V}$ are assumed to be linearly correlated, respectively, to the in-degree-induced and out-degree-induced Lagrange multipliers $\{x_i\}_{i\in V}$ and $\{y_i\}_{i\in V}$ through universal (unknown) parameters $\alpha$ and $\beta$: $x_i \equiv \sqrt{\alpha}\,\chi_i$ and $y_i \equiv \sqrt{\beta}\,\psi_i$, $\forall i \in V$. Therefore eq. (1) becomes:
$$p_{i\to j} = \frac{\sqrt{\alpha}\,\chi_j \sqrt{\beta}\,\psi_i}{1 + \sqrt{\alpha}\,\chi_j \sqrt{\beta}\,\psi_i} = \frac{z\,\chi_j \psi_i}{1 + z\,\chi_j \psi_i}, \qquad (2)$$
where we have defined $z \equiv \sqrt{\alpha\beta}$. Such a hypothesis is inspired by the so-called fitness or hidden-variables model, which assumes the network topology to be determined by intrinsic properties associated with each node of the network. Note that this approach has already been used in the past to model several empirical economic and financial networks, possibly within the CM framework assuming a connection between fitnesses and Lagrange multipliers.
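As a minimal illustration of eq. (2), the sketch below builds the full matrix of connection probabilities from two fitness vectors and a given value of $z$. The function name `link_probabilities`, the array layout (`p[i, j]` is the probability of a link from `i` to `j`) and the use of NumPy are assumptions made here for illustration; self-loops are excluded by zeroing the diagonal, consistent with the sums over $j \neq i$ used later in the text.

```python
import numpy as np

def link_probabilities(chi, psi, z):
    """Connection probabilities of eq. (2).

    chi[j] is the fitness controlling incoming links of node j,
    psi[i] is the fitness controlling outgoing links of node i,
    z is the single free parameter of the fitness-induced CM.
    Returns p with p[i, j] = z*chi[j]*psi[i] / (1 + z*chi[j]*psi[i]).
    """
    chi = np.asarray(chi, dtype=float)
    psi = np.asarray(psi, dtype=float)
    t = z * np.outer(psi, chi)   # t[i, j] = z * psi_i * chi_j
    p = t / (1.0 + t)
    np.fill_diagonal(p, 0.0)     # no self-loops
    return p

if __name__ == "__main__":
    p = link_probabilities(chi=[1.0, 2.0], psi=[3.0, 1.0], z=0.5)
    print(p)              # off-diagonal entries lie strictly between 0 and 1
    print(p.sum(axis=0))  # expected in-degrees
    print(p.sum(axis=1))  # expected out-degrees
```

Because $p_{i\to j}$ increases monotonically with $z$ for positive fitnesses, a single scalar controls the overall link density while the fitnesses control how links are distributed among nodes.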
These two hypotheses allow us to map the problem of evaluating $X(G)$ into that of choosing the optimal CM ensemble $\Omega$ induced by the fitnesses $\{\chi_i\}_{i\in V}$ and $\{\psi_i\}_{i\in V}$ that is compatible with the constraints on $G$, given by the knowledge of $\{k_i^{in}\}_{i\in I}$ and $\{k_i^{out}\}_{i\in I}$. Indeed, because of the limited available information, finding the CM of the real system is impossible, and we thus have to impose it by assigning ad hoc values to the Lagrange multipliers, whence the name “fitness-induced” CM (FiCM). Once the FiCM ensemble $\Omega$ is determined (it is univocally defined by the set $\{x_i, y_i\}_{i\in V}$, and thus by the set $\{\chi_i, \psi_i\}_{i\in V}$), statistical mechanics of networks prescribes that the quantity $X(G)$ typically varies in the range $\langle X\rangle_\Omega \pm \sigma_X^\Omega$, where $\langle X\rangle_\Omega$ and $\sigma_X^\Omega$ are respectively the average and standard deviation of property $X$ estimated over $\Omega$. We can thus use $\langle X\rangle_\Omega$ as a good estimate of $X(G)$.
In practice, since we know the fitness values $\{\chi_i, \psi_i\}_{i\in V}$, in order to determine the ensemble $\Omega$ unambiguously we need to find the most likely value of the proportionality constant $z$ that defines $\Omega$ according to eq. (2). This can be done using the partial knowledge of the degree sequences to estimate the appropriate value of $z$ through a maximum-likelihood argument, i.e., by comparing, for the nodes in the set $I$, the average number of incoming and outgoing connections in the ensemble $\Omega$ with their in-degrees and out-degrees observed in $G$:
$$\sum_{i\in I}\left[\langle k_i^{in}\rangle_\Omega + \langle k_i^{out}\rangle_\Omega\right] = \sum_{i\in I}\left[k_i^{in} + k_i^{out}\right]. \qquad (3)$$
In the above expression, $\langle k_i^{in}\rangle_\Omega = \sum_{j(\neq i)} p_{j\to i}$ and $\langle k_i^{out}\rangle_\Omega = \sum_{j(\neq i)} p_{i\to j}$ contain the unknown parameter $z$ through eq. (2), and since $\{\chi_i, \psi_i\}_{i\in V}$ and $\{k_i^{in}, k_i^{out}\}_{i\in I}$ are known, eq. (3) defines an algebraic equation in $z$, whose solution allows us to build the FiCM ensemble and, in the end, to obtain an estimate of $X(G)$, even with the knowledge of the in- and out-degree of just a single node.
Summing up, the algorithm works as follows. Given a network $G$, two fitness values $\chi$ and $\psi$ for each of the $N$ nodes, and the in-degrees and out-degrees only for a subset $I$ of $|I| = n < N$ nodes:
• we compute the sum of the in-degrees and out-degrees of the nodes in $I$ and use it together with the fitnesses $\{\chi_i, \psi_i\}_{i\in V}$ to obtain the value of $z$ by solving eq. (3);
• using such estimated $z$ and the fitnesses $\{\chi_i, \psi_i\}_{i\in V}$, we generate the ensemble $\Omega$ by placing a directed link from a given node $i$ to a given node $j$ with probability $p_{i\to j}$ of eq. (2);
• we compute the estimate of $X(G)$ as $\langle X\rangle_\Omega \pm \sigma_X^\Omega$ in the FiCM ensemble, either analytically or numerically (i.e., by measuring it on networks drawn from $\Omega$).
Note that the numerical generation of a sample network from $\Omega$ consists in building a binary directed adjacency matrix, so that its generic element is $a_{i\to j} = 1$ with probability $p_{i\to j}$ (i.e., the existence probability for the link from $i$ to $j$ given by eq. (2)) and $a_{i\to j} = 0$ otherwise. We then place, $\forall i, j$, a weight $\tilde{w}_{i\to j}$ on the directed link from $i$ to $j$ (provided its existence, i.e., $a_{i\to j} = 1$) according to the following prescription:
$$\tilde{w}_{i\to j} = \frac{\chi_j \psi_i}{W\, p_{i\to j}}\, a_{i\to j} \equiv \frac{z^{-1} + \chi_j \psi_i}{W}\, a_{i\to j}, \qquad (4)$$
where the last equality comes from eq. (2). In this expression, the normalization $W$ represents the induced total weight of the network, defined as the geometric mean of the two fitness sums: $W = \sqrt{(\sum_i \chi_i)(\sum_i \psi_i)}$. Indeed, $W$ corresponds to the ensemble average of the total network weight: $W \equiv \sum_{ij} \langle \tilde{w}_{i\to j}\rangle_\Omega$, where $\langle \tilde{w}_{i\to j}\rangle_\Omega = (\chi_j \psi_i)\langle a_{i\to j}\rangle_\Omega / (W p_{i\to j}) = (\chi_j \psi_i)/W$ is the ensemble average for the weight of the link from $i$ to $j$.
Remarkably, thanks to this procedure to assign link weights, the ensemble averages of a node $i$'s total in-strength $\langle s_i^{in}\rangle_\Omega = \sum_{j\in V} \langle \tilde{w}_{j\to i}\rangle_\Omega$ and out-strength $\langle s_i^{out}\rangle_\Omega = \sum_{j\in V} \langle \tilde{w}_{i\to j}\rangle_\Omega$ turn out to be directly proportional to $\chi_i$ and $\psi_i$, respectively, $\forall i \in V$. This suggests a natural way of selecting appropriate quantities to play the role of fitnesses in our method: the straightforward interpretation for $\chi$ and $\psi$ is that of node in-strengths and out-strengths observed in the real network $G$. Indeed, the assumption $\chi_i = s_i^{in}$ and $\psi_i = s_i^{out}$, $\forall i \in V$, leads to $W = \sum_{i\in V} \chi_i \equiv \sum_{i\in V} \psi_i$, and finally to the important equivalences $\langle s_i^{in}\rangle_\Omega \equiv s_i^{in}$ and $\langle s_i^{out}\rangle_\Omega \equiv s_i^{out}$. This means that we successfully preserve, on average, the strength sequences of the real network $G$ (and thus its total weight). In other words, our network reconstruction method calibrated on in/out strengths is based on a null model constraining the in-degree and out-degree sequences of a subset of nodes, together with the in-strength and out-strength sequences of all nodes (see Figure 1).
Empirical Dataset
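Before the data are described, the sampling and weighting steps of the Method section, eqs. (2) and (4), can be made concrete with a short sketch. It is only an illustration under stated assumptions: the value of $z$ is taken as given, the function names `link_probabilities`, `link_weights` and `sample_network` are hypothetical, and arrays are indexed so that entry `[i, j]` refers to the link from `i` to `j`.

```python
import numpy as np

def link_probabilities(chi, psi, z):
    """Eq. (2): p[i, j] = z*chi[j]*psi[i] / (1 + z*chi[j]*psi[i]), no self-loops."""
    t = z * np.outer(np.asarray(psi, float), np.asarray(chi, float))
    p = t / (1.0 + t)
    np.fill_diagonal(p, 0.0)
    return p

def link_weights(chi, psi, z):
    """Eq. (4): weight carried by the link i -> j if that link exists."""
    chi = np.asarray(chi, float)
    psi = np.asarray(psi, float)
    w_tot = np.sqrt(chi.sum() * psi.sum())   # induced total weight W
    return (1.0 / z + np.outer(psi, chi)) / w_tot

def sample_network(chi, psi, z, rng):
    """Draw one binary topology from the ensemble and weight its links."""
    p = link_probabilities(chi, psi, z)
    a = (rng.random(p.shape) < p).astype(int)   # a[i, j] = 1 with probability p[i, j]
    w = link_weights(chi, psi, z) * a           # absent links get zero weight
    return a, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chi = np.array([3.0, 1.0, 2.0])   # illustrative in-strengths
    psi = np.array([2.0, 3.0, 1.0])   # illustrative out-strengths (same total)
    a, w = sample_network(chi, psi, z=0.7, rng=rng)
    print(a)
    print(w)
```

Multiplying the probability of a link by the weight it carries when present gives $\chi_j \psi_i / W$ for every ordered pair, which is the ensemble-average weight stated after eq. (4); this is why the prescription preserves the strengths on average regardless of which links happen to be sampled.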
In order to test our network reconstruction method, we use two representative empirical systems of economic and financial nature. The first one is the international trade network of the World Trade Web (WTW), i.e., the network whose nodes are the countries and whose links represent trade volumes between them: thus, $w_{i\to j}$ is the monetary flux from country $i$ to country $j$ (the “amount” of the export from $j$ to $i$). The second one is the Electronic Market for Interbank Deposits (E-mid): in this case, the nodes are banks and a link $w_{i\to j}$ from bank $i$ to bank $j$ represents the amount of the loan that $i$ granted to $j$.
In the following analysis we will use and show results for WTW trade volume data of year 2000, and E-mid aggregated transaction data of year 1999 (both temporal snapshots correspond to the largest size of the network). Analyses for other annual snapshots are reported in the Supplementary Information, and lead to comparable results. In the light of the discussion at the end of the previous section, we will use as fitnesses $\chi_i$ ($\psi_i$) the real node in-strength $s_i^{in} = \sum_{j\in V} w_{j\to i}$ (out-strength $s_i^{out} = \sum_{j\in V} w_{i\to j}$), i.e., the total import (export) volumes of countries for WTW, and the total liquidity borrowed (lent) by banks for E-mid. Note that the goodness of any choice for the fitness values must first be validated according to hypothesis II of our method (see the first part of section Results).
Figure 1. Conservation of the strength sequences. Scatter plots of node in-strengths $s^{in}$ and out-strengths $s^{out}$ observed for the real network $G$ against their ensemble averages $\langle s^{in}\rangle$ and $\langle s^{out}\rangle$ obtained from eq. (4). Upper panels (a,b) refer to WTW, lower panels (c,d) to E-mid.
Topological Properties
As stated in the introduction, we will test our network reconstruction method focusing on the network properties (each playing the role of $X$ in the discussion of section Method) which are commonly regarded as the most significant for describing the network resilience to systemic shocks and crashes. We first consider two properties defined for undirected networks (in order to reconstruct these properties, we use the undirected version of the method):
• Degree of the main core $k_{main}$ and size of the main core $S_{main}$, where a $k$-core is defined as the “largest connected subgraph whose nodes all have at least $k$ connections” (within this subgraph), and the main core is the $k$-core with the highest possible degree ($k_{main}$). The main core is relevant to our analysis as it consists of the most influential spreaders (of, e.g., an infection or a shock) in a network.
• Size of the giant component $S_{GC}$ at the bond percolation threshold $p^* = \bar{k}^{-1}$ ($\bar{k}$ is the mean degree of the network), where bond percolation is the process of occupying each link of the network with probability $p$, and $p^*$ is the critical value of $p$ at which a percolation cluster containing a finite fraction of all nodes first occurs. Note that the percolation threshold at $p^* = \bar{k}^{-1}$ (that we take as reference value) is a feature of homogeneous graphs in the infinite volume limit, whereas for scale-free networks in the same limit it is $p^* \to 0$. Note also that a bond percolation process can be mapped into a SIR model with infection rate $\beta$ and uniform infection time $\tau$. In fact, by defining the transmissibility $T = 1 - e^{-\tau\beta}$ as the probability that the infection will be transmitted from an infected node to at least one susceptible neighbor before recovery takes place, the set of nodes reached by a SIR epidemic outbreak originated from a single node is statistically equivalent to the cluster of the bond percolation problem (with $p \equiv T$) the initial node belongs to.
We then move to properties defined for directed graphs:
• Link reciprocity $r$, measuring the tendency of node pairs to form mutual connections. It is defined as the ratio between the number of bidirected links and the total number of network connections: $r = (\sum_{ij} a_{i\to j} a_{j\to i}) / (\sum_{ij} a_{i\to j})$. Reciprocity is considered a relevant parameter for systemic risk, giving a measure of direct mutual exposure among nodes.
• Average shortest path length $l$, where the shortest path length $l_{i\to j}$ from node $i$ to node $j$ is the minimum number of links required to connect $i$ to $j$ (following link directions), and $l = N(N-1) / (\sum_{i\neq j} l_{i\to j}^{-1})$ (the harmonic mean is commonly used to avoid problems caused by pairs of nodes that are not reachable from one another, and for which $l$ diverges). This quantity measures the number of steps that are required, on average, for a signal or a shock to propagate between any two nodes of the network.
• The Group DebtRank $DR$, a measure of the total economic value in the network that is potentially affected by a distress on all nodes amounting to $0 < \Phi < 1$, with $\Phi = 1$ meaning default. $DR$ is based on computing the recursive impact (i.e., the reverberation on the network) of the initial distress, and is defined as:
$$DR = \sum_i (h_i^* - h_i)\, \nu_i \qquad (5)$$
where $h_i^*$ is the final amount of distress on $i$ ($h_i = \Phi$ is the initial one) and $\nu_i$ is the relative economic value of $i$. We refer to the original paper for the details on how to compute $DR$.
Results
Test of FiCM modeling
When testing our network reconstruction procedure it is important to keep in mind that the method is subject to three different kinds of errors. The first one comes from hypothesis I that the real network $G$ can be properly described by a CM, whose Lagrange multipliers are obtained by constraining the whole in-degree and out-degree sequences. The second one derives instead from hypothesis II that the node fitnesses $\{\chi_i, \psi_i\}_{i\in V}$ are proportional to the CM's Lagrange multipliers $\{x_i, y_i\}_{i\in V}$, i.e., from imposing a FiCM. Finally, the third one is due to the limited information available for calibrating the FiCM and obtaining the true value of $z$, namely the partial knowledge of the in-degree and out-degree sequences. Note however that the first source of error cannot be controlled for in our context, as finding the CM that describes the data requires the knowledge of the whole in-degree and out-degree sequences (which is not accessible for our case studies). This is exactly why we have to make hypothesis II and impose a FiCM by assigning ad hoc values to the Lagrange multipliers. In this section we thus concentrate on the second source of errors.
Indeed, real networks are not perfect realizations of the FiCM and can only be approximated by it. In order to assess qualitatively how well this FiCM describes the real network $G$, one can compare the observed in-degrees and out-degrees of $G$ with their averages $\langle k_i^{in}\rangle_\Omega$ and $\langle k_i^{out}\rangle_\Omega$ computed on the FiCM ensemble $\Omega$. Figure 2 shows such a comparison when the average degrees are obtained through eq. (2) for a fully informed FiCM, i.e., with the value of $z$ computed via eq. (3) using the knowledge of in- and out-degrees for all nodes. We indeed observe a remarkable agreement between these quantities for our empirical networks: the real degrees are scattered around the functional form of their expected values.
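The comparison just described can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' implementation: it solves eq. (3) for $z$ by simple bisection (the paper does not specify a solver), then reports the Pearson correlation between observed degrees and their FiCM expectations as one possible summary of the agreement shown in Figure 2. The names `calibrate_z` and `degree_agreement` are hypothetical.

```python
import numpy as np

def link_probabilities(chi, psi, z):
    """Eq. (2): p[i, j] = z*chi[j]*psi[i] / (1 + z*chi[j]*psi[i]), no self-loops."""
    t = z * np.outer(np.asarray(psi, float), np.asarray(chi, float))
    p = t / (1.0 + t)
    np.fill_diagonal(p, 0.0)
    return p

def calibrate_z(chi, psi, known, k_in, k_out, n_iter=200):
    """Solve eq. (3) for z by bisection (solver choice is an assumption).

    known : indices of the nodes in the subset I
    k_in, k_out : observed degrees of those nodes, in the same order
    """
    known = np.asarray(known)
    target = float(np.sum(k_in) + np.sum(k_out))

    def expected_total(z):
        p = link_probabilities(chi, psi, z)
        # expected in-degrees are column sums, expected out-degrees row sums
        return p[:, known].sum() + p[known, :].sum()

    lo, hi = 0.0, 1.0
    for _ in range(200):                 # grow the bracket until it contains the root
        if expected_total(hi) >= target:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ValueError("observed degrees cannot be matched by any finite z")
    for _ in range(n_iter):              # expected_total is increasing in z
        mid = 0.5 * (lo + hi)
        if expected_total(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def degree_agreement(chi, psi, z, k_in, k_out):
    """Pearson correlation between observed and expected in/out-degrees."""
    p = link_probabilities(chi, psi, z)
    r_in = np.corrcoef(p.sum(axis=0), k_in)[0, 1]
    r_out = np.corrcoef(p.sum(axis=1), k_out)[0, 1]
    return r_in, r_out
```

For the fully informed case one would pass all node indices as `known`; passing only a subset reproduces the partial-information calibration, and restricting `degree_agreement` to that subset gives the check on hypothesis II that remains available when most degrees are unknown.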
The amount of deviation from perfect correlation (which would correspond to an actual realization of the FiCM) gives an indication of how well our model describes the real network. Note that the validity of hypothesis II can also be evaluated in the case of partial information, by performing such a comparison on the subset $I$ of nodes whose topological properties are available.
In the following, in order to have a quantitative global assessment of the errors caused by hypothesis II, we will test our network reconstruction method both on real networks and on benchmark synthetic networks numerically generated with the fully informed FiCM through eq. (2). In the latter case, the errors made by the method will be due only to the limited information available about the degree sequences. It is then interesting to check whether such generated synthetic networks are equivalent to the real networks in terms of systemic risk. Figure 3 shows that bond percolation properties, shortest path length distribution and DebtRank values of synthetic networks are in excellent agreement with those of their real counterparts (the correlation coefficients between real and synthetic curves are all above 0.99). The FiCM thus proves itself to be a proper framework for modeling our empirical networks.
Test against limited information
In this section we finally proceed to the key testing of the method against the third (and most relevant) source of errors: the limitedness of the information available on the degree sequences for calibrating the FiCM. In order to obtain a quantitative estimate of the method's effectiveness in reconstructing a topological property $X$ of a given network $G$ (which can be either the real one or its synthetic version), we implement a procedure consisting of the following operative steps:
• Choose a value of $n < N$ (the number of nodes for which the in- and out-degrees are known).
• Build a set of $M = 100$ subsets $\{I_\alpha\}_{\alpha=1}^{M}$ of $n$ nodes picked at random from $G$.
• For each subset $I_\alpha$, use the degree sequences from $G$ to evaluate $z$ from eq. (3), and name such value $z_\alpha$.
• Build the ensemble $\Omega(z_\alpha)$ using the linking probabilities from eq. (2): generate $m = 100$ networks from $\Omega(z_\alpha)$, and compute the average value $X_\alpha$ of property $X$ on this ensemble.
• Compute the relative root mean square error (rRMSE) of property $X$ over the subsets $\{I_\alpha\}_{\alpha=1}^{M}$:
$$r[X] = \mathrm{rRMSE}_X \equiv \sqrt{\frac{1}{M}\sum_{\alpha=1}^{M}\left[\frac{X_\alpha}{X(G)} - 1\right]^2} \qquad (6)$$
where $X(G)$ is the value of $X$ measured on $G$.
Figure 2. Qualitative assessment for FiCM description of the real network $G$. Scatter plots of node fitnesses $\{\chi, \psi\}$ versus real node in- and out-degrees $\{k^{in}, k^{out}\}$ of $G$ (red circles) and their ensemble averages computed via the FiCM (blue asterisks). Upper panels (a,b) refer to WTW, lower panels (c,d) to E-mid.
Figure 3. Properties of real and synthetic networks. Left plots (a,d): dependence of the size of the giant component $S_{GC}$ on the occupation probability $p$ (the vertical dotted line indicates $p^*$). Central plots (b,e): probability distribution $P(l)$ of the directed shortest path length $l$. Right plots (c,f): dependence of $DR$ on the initial distress $\Phi$. Top panels (a,b,c) refer to WTW, bottom panels (d,e,f) to E-mid. The correlation coefficients between real and synthetic curves are above 0.99 in all six panels.
We then study how the rRMSE for the various network properties we consider varies as a function of the size $n$ of the subset of nodes used to calibrate the FiCM (i.e., for which in- and out-degree information is available). Results are shown in Figures 4 and 5. We observe that in most cases there is a rapid decrease of the relative error as the number of nodes $n$ used to reconstruct the topology increases. For instance, the error generally drops to half of the starting rRMSE (that for $n = 1$) at values of $n/N$ well below full information ($n \equiv N$). This is an indication of the goodness of the estimation provided by our method. As expected, the rRMSE is higher for real networks than for synthetic networks, and the difference between the two curves gives a quantitative estimate of the error made in modeling real networks with the FiCM. The fact that such a difference is higher for E-mid than for WTW is directly related to a slightly better correlation between real and expected degrees observed in the latter case (Figure 2). Note that the various rRMSE for synthetic networks do not necessarily tend to zero, because the generated synthetic configuration might be highly improbable; in some cases, the synthetic network can be even more atypical than the real one. We thus indicate with error bars the range of performance of our method for different choices of synthetic $G$.
Generally, $S_{GC}$, $l$, $k_{main}$ and $S_{main}$ are the properties that are reconstructed best: for instance, with the knowledge of only 10% of the nodes, all the relative errors become smaller than 10%, and they decrease for increasing $n$. The rRMSE for $r$ and $DR$ show instead a behavior that is almost flat in $n$. The fact that the rRMSE for $r$ computed for real networks remains steadily high is probably because reciprocity is hardly reproduced by a directed CM, and is better suited as an additional imposed constraint. The rRMSE for $DR$ is instead remarkably small for real networks (with values around 0.5%), and we can thus conclude that our method is efficient in estimating $DR$ also when the available information is minimal. This is particularly relevant to our analysis, since we are estimating $DR$ at its peak (i.e., at its maximum, and thus most strongly fluctuating, value), where the details of the weighted topology play a fundamental role in the process of risk propagation.
Besides, and more importantly, the value of $DR$ for the real network is computed using the original weighted topology, whereas the computation of $DR$ in the reconstructed network builds on link weights obtained by the simple prescription of eq. (4).
In conclusion, the outcome of this analysis is that our network reconstruction method is able to estimate the network properties related to systemic risk with good approximation, by using the information on the number of connections of a relatively small fraction of nodes, as long as the fitnesses of all nodes are known.
Discussion
In this paper we studied a novel method that reconstructs a directed weighted network and estimates its topological properties by using only partial information about its connection patterns, as well as two additional intrinsic properties (interpreted as fitnesses) associated with each node. Tests on empirical networks as well as on synthetic networks generated through a fitness-induced configuration model reveal that the method is highly valuable in overcoming the lack of topological information that often hinders the estimation of systemic risk in economic and financial systems. Indeed, the information exploited by the method is minimal but is (or should be) publicly available for these kinds of systems.
Our work originates from the study of Musmeci et al., which represented a first attempt at tackling the problem of network reconstruction from partial information within the framework of fitness-induced configuration models. Here, however, we make fundamental improvements to the method, the key advance being that of extending it to directed weighted networks (the most general class of networks). In the present form, the method is then suited to reconstruct high-order network properties related to systemic risk, a task of primary practical importance that the method was conceived to address but that was far beyond the reach of its original version. Besides, the validation of the fitness-induced configuration model approach to model real networks, as well as the reconstruction of benchmark synthetic networks generated as fitness-based counterparts of the empirical networks, are both novel ingredients that make it possible to assess quantitatively the accuracy of the method.
Last but not least, the extensive analysis of different temporal snapshots of the real networks that we provide in the Supplementary Information considerably strengthens the evidence for the effectiveness and robustness of our method.
We remark that the method we are proposing here, by reproducing both the binary and weighted topology of the network, represents a substantial step forward in the field of network reconstruction. In fact, most of the previous works focused on reproducing the strengths of the real network to the detriment of connection patterns, whereas only recently has it been realized that a successful reconstruction procedure must also rely on topological constraints. Here we are proposing a method that always reproduces the strengths, but also makes it possible to tune the network topology through appropriate connection probabilities.
Figure 4. rRMSE of various topological properties versus $n/N$ for the reconstructed $G$ of the WTW (both the real network and its synthetic version). rRMSE for: (a) degree of the main core $k_{main}$, (b) size of the main core $S_{main}$, (c) size of the giant component $S_{GC}$ at the bond percolation threshold $p^* = \bar{k}^{-1}$, (d) link reciprocity $r$, (e) mean shortest path length $l$, (f) maximum value of the group DebtRank $DR$.
Figure 5. rRMSE of various topological properties versus $n/N$ for the reconstructed $G$ of the E-mid (both the real network and its synthetic version). rRMSE for: (a) degree of the main core $k_{main}$, (b) size of the main core $S_{main}$, (c) size of the giant component $S_{GC}$ at the bond percolation threshold $p^* = \bar{k}^{-1}$, (d) link reciprocity $r$, (e) mean shortest path length $l$, (f) maximum value of the group DebtRank $DR$.
In this respect, the use of probabilities derived from degree constraints represents the most general case, which includes as specific instances both the dense reconstruction techniques (obtained for $p_{i \to j} = 1$ $\forall\, i \neq j \in V$) and the sparse reconstruction method (roughly obtained for $p_{i \to j} = 1 - l$ $\forall\, i \neq j \in V$).

Note that it should not be too surprising that the knowledge of a small number of nodes allows a wide range of network properties to be estimated precisely, because the method assumes the additional knowledge of the fitness parameters for all the nodes. Besides, the effectiveness of the method strongly depends on the accuracy of the fitness model used to calibrate the CM in order to fit the empirical dataset. In the case of WTW and E-mid, the fitness model describes well how links are established across nodes, and our method is thus effective in reconstructing the network properties. Finally, we remark that the issue of having limited information on the system under investigation, while typical for social, economic and financial systems (which are privacy-protected), is also very relevant for biological systems such as ecological networks, metabolic networks and functional brain networks, where, due to observational limitations and the high experimental costs of collecting data, detailed topological information about connections is often missing. Notably, our method can be used to reconstruct any network representing a set of (directed and weighted) dependencies among the constituents of a complex system, and we thus believe it will find wide applicability in the field of complex networks and statistical physics of networks.
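The remark in the discussion that strengths can always be reproduced while the topology is tuned through the connection probabilities can be illustrated with a minimal sketch. It rests on an assumption that is not stated in this section: a link $i \to j$ placed with probability $p_{i \to j}$ is given weight $s_i^{out} s_j^{in} / (W p_{i \to j})$, with $W$ the total weight, so that its expected weight does not depend on $p_{i \to j}$. The function names and the toy strengths are hypothetical. Because self-loops are excluded ($i \neq j$), the expected out-strength in this sketch is $s_i^{out}(1 - s_i^{in}/W)$ rather than exactly $s_i^{out}$.

```python
import random


def sample_network(s_out, s_in, p, rng):
    """Draw one directed weighted network as a dense weight matrix.

    s_out, s_in: out- and in-strengths of the nodes (same total W).
    p: function (i, j) -> connection probability in (0, 1] for i != j.
    Assumed weight rule (not taken from this section): a placed link
    i -> j carries weight s_out[i] * s_in[j] / (W * p(i, j)).
    """
    n = len(s_out)
    total = sum(s_out)  # total weight W
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < p(i, j):
                w[i][j] = s_out[i] * s_in[j] / (total * p(i, j))
    return w


def mean_out_strengths(s_out, s_in, p, samples, seed=0):
    """Average the out-strength of every node over an ensemble of draws."""
    rng = random.Random(seed)
    n = len(s_out)
    acc = [0.0] * n
    for _ in range(samples):
        w = sample_network(s_out, s_in, p, rng)
        for i in range(n):
            acc[i] += sum(w[i])
    return [a / samples for a in acc]


# Hypothetical toy strengths; both lists sum to W = 20.
s_out = [10.0, 5.0, 3.0, 2.0]
s_in = [4.0, 6.0, 7.0, 3.0]

# Dense limit: p = 1 for all i != j, so a single draw is deterministic.
dense = mean_out_strengths(s_out, s_in, lambda i, j: 1.0, samples=1)

# A sparser ensemble with uniform p = 0.3, averaged over many draws.
sparse = mean_out_strengths(s_out, s_in, lambda i, j: 0.3, samples=20000)

print(dense)
print(sparse)
```

In this sketch the two ensembles have very different link densities but the same average strengths, up to sampling noise in the sparse case. The uniform value 0.3 is only a placeholder for probabilities that would in practice be derived from degree constraints.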
Acknowledgements