Fast and scalable likelihood maximization for Exponential Random Graph Models with local constraints
Nicolò Vallarano, Matteo Bruno, Emiliano Marchese, Giuseppe Trapani, Fabio Saracco, Giulio Cimini, Mario Zanon, Tiziano Squartini
IMT School for Advanced Studies Lucca, P.zza San Francesco 19, 55100 Lucca (Italy)
Institute for Advanced Study (IAS), University of Amsterdam, Oude Turfmarkt 145, 1012 GC Amsterdam (The Netherlands)
Physics Department and INFN, 'Tor Vergata' University of Rome, 00133 Rome (Italy)
(Dated: February 1, 2021)

Exponential Random Graph Models (ERGMs) have gained increasing popularity over the years. Rooted in statistical physics, the ERGM framework has been successfully employed for reconstructing networks, detecting statistically significant patterns in graphs, and counting networked configurations with given properties. From a technical point of view, the ERGM workflow is defined by two subsequent optimization steps: the first one concerns the maximization of Shannon entropy and leads to identify the functional form of the ensemble probability distribution that is maximally non-committal with respect to the missing information; the second one concerns the maximization of the likelihood function induced by this probability distribution and leads to its numerical determination. This second step translates into the resolution of a system of O(N) non-linear, coupled equations (with N being the total number of nodes of the network under analysis), a problem that is affected by three main issues, i.e. accuracy, speed and scalability. The present paper aims at addressing these problems by comparing the performance of three algorithms (i.e. Newton's method, a quasi-Newton method and a recently-proposed fixed-point recipe) in solving several ERGMs, defined by binary and weighted constraints in both a directed and an undirected fashion. While Newton's method performs best for relatively small networks, the fixed-point recipe is to be preferred when large configurations are considered, as it ensures convergence to the solution within seconds for networks with hundreds of thousands of nodes (e.g. the Internet, Bitcoin).
We attach to the paper a Python code implementing the three aforementioned algorithms on all the ERGMs considered in the present work.

PACS numbers: 02.10.Ox, 89.75.Hc, 02.70.Rr, 05.10.-a
INTRODUCTION
Over the last 20 years, network theory has emerged as a successful framework to address problems of scientific and societal relevance [1]: examples of processes that are affected by the structural details of the underlying network are provided by the spreading of infectious diseases [2-4], opinion dynamics [5], the propagation of losses during financial crises [6], etc.

Within such a context, two needs have emerged quite naturally [7]: 1) detecting the topological properties of a real network that can be deemed as statistically significant - typically those higher-order properties that cannot be explained by local features of the nodes such as the degrees (e.g. the assortativity, the clustering coefficient, etc.); 2) inferring the relevant details of a given network structure in case only partial information is available. Both goals can be achieved by constructing a framework for defining benchmarks, i.e. synthetic configurations retaining only some of the properties of the original system - the so-called constraints - and being maximally random otherwise.

Two different kinds of approaches have been proposed so far, i.e. the microcanonical and the canonical ones. Microcanonical approaches [8-14] artificially generate many randomized variants of the observed network by enforcing the constraints in a 'hard' fashion, i.e. by creating configurations on each of which the constrained properties are identical to the empirical ones. On the other hand, canonical approaches [15-19] enforce constraints in a 'soft' fashion, by creating a set of configurations over which the constrained properties are on average identical to the empirical ones.
Softening the requirement of matching the constraints has a clear advantage: it allows the mathematical expression for the probability of a generic configuration, G, to be obtained analytically, as a function of the enforced constraints.

In this second case, a pivotal role is played by the formalism of the Exponential Random Graph Models (ERGMs) [20], whose popularity has steadily increased over the years. The ERGM mathematical framework dates back to Gibbs' (re)formulation of statistical mechanics and is based upon the variational principle known as maximum entropy, stating that the probability distribution that is maximally non-committal with respect to the missing information is the one maximizing the Shannon entropy [21]. This allows self-consistent inference to be made, by assuming maximal ignorance about the unknown degrees of freedom of the system.

In the context of network theory, the ERGM framework has found a natural application to the resolution of the two aforementioned problems, i.e. 1) that of defining null models of the original network, in order to assess the significance of empirical patterns against the hypothesis that the network structure is solely determined by the constraints themselves, and 2) that of deriving the most probable configurations that are compatible with the available details about a specific network. In both cases, after the functional form of the probability distribution has been identified via the maximum entropy principle, one also needs to determine it numerically: to this aim, the likelihood maximization principle can be invoked, prescribing to maximize the probability of observing the actual configuration.
This prescription typically translates into the resolution of a system of O(N) non-linear, coupled equations - with N representing the number of nodes of the network under analysis.

Problems like these are usually affected by the issues of accuracy, speed and scalability: the present paper aims at addressing them at once, by comparing the performance of three algorithms, i.e. Newton's method, a quasi-Newton method and a recently-proposed fixed-point recipe [22, 23], to solve a variety of ERGMs, defined by binary and weighted constraints in both a directed and an undirected fashion.

Additionally, we provide a Python code implementing the three aforementioned recipes on all the ERGMs considered in the present work.

GENERAL THEORY
Canonical approaches aim at obtaining the mathematical expression for the probability of a generic configuration, G, as a function of the observed constraints: ERGMs realize this by maximizing the Shannon entropy [15, 16].

The Maximum Entropy Principle
Generally speaking, the problem to be solved in order to find the functional form of the probability distribution to be employed as a benchmark reads

$$\operatorname*{arg\,max}_{P} \; S[P] \quad (1a)$$

$$\text{s.t.} \quad \sum_{\mathbf{G}} P(\mathbf{G})\,C_i(\mathbf{G}) = \langle C_i \rangle, \quad i = 0 \dots M \quad (1b)$$

where the Shannon entropy reads

$$S[P] = -\sum_{\mathbf{G}} P(\mathbf{G}) \ln P(\mathbf{G}) \quad (2)$$

and $\vec{C}(\mathbf{G})$ is the vector of constraints representing the information defining the benchmark itself (notice that $C_0 = \langle C_0 \rangle = 1$ sums up the normalization condition). The solution to the problem above can be found by maximizing the Lagrangian function

$$\mathcal{L}(P, \vec{\theta}) \equiv S[P] + \sum_{i=0}^{M} \theta_i \left[ \langle C_i \rangle - \sum_{\mathbf{G}} P(\mathbf{G})\,C_i(\mathbf{G}) \right] \quad (3)$$

with respect to $P(\mathbf{G})$. As a result one obtains

$$P(\mathbf{G} \mid \vec{\theta}) = \frac{e^{-H(\mathbf{G}, \vec{\theta})}}{Z(\vec{\theta})} \quad (4)$$

with $H(\mathbf{G}, \vec{\theta}) = \vec{\theta} \cdot \vec{C}(\mathbf{G}) = \sum_{i=1}^{M} \theta_i C_i(\mathbf{G})$ representing the Hamiltonian, i.e. the function summing up the proper, imposed constraints, and $Z(\vec{\theta}) = \sum_{\mathbf{G}} e^{-H(\mathbf{G}, \vec{\theta})}$ representing the partition function, ensuring that $P(\mathbf{G})$ is properly normalized. Constraints play a pivotal role, either representing the information to filter, in order to assess the significance of certain quantities, or the only available information, in order to reconstruct the inaccessible details of a given configuration.

The Maximum Likelihood Principle
The formalism above is perfectly general; however, it can be instantiated to study an empirical network configuration, say $\mathbf{G}^*$. In this case, the Lagrange multipliers 'acting' as unknown parameters in eq. (4) can be numerically estimated by maximizing the associated likelihood function [15, 24]. The latter is defined as

$$\mathcal{L}(\vec{\theta}) \equiv \ln P(\mathbf{G}^* \mid \vec{\theta}) = -H(\mathbf{G}^*, \vec{\theta}) - \ln Z(\vec{\theta}) \quad (5)$$

and must be maximized with respect to the vector $\vec{\theta}$. Remarkably, whenever the probability distribution is exponential (as the one deriving from the Shannon entropy maximization), the likelihood maximization problem

$$\operatorname*{arg\,max}_{\vec{\theta}} \; \mathcal{L}(\vec{\theta}) \quad (6)$$

is characterized by first-order necessary conditions for optimality reading

$$\nabla_{\theta_i} \mathcal{L}(\vec{\theta}) = \frac{\partial \mathcal{L}(\vec{\theta})}{\partial \theta_i} = -C_i(\mathbf{G}^*) - \frac{\partial \ln Z(\vec{\theta})}{\partial \theta_i} = -C_i(\mathbf{G}^*) + \sum_{\mathbf{G}} C_i(\mathbf{G})\,P(\mathbf{G}) = -C_i(\mathbf{G}^*) + \langle C_i \rangle = 0, \quad i = 1 \dots M \quad (7)$$

and leading to the system of equations

$$\nabla \mathcal{L}(\vec{\theta}) = \vec{0} \;\Rightarrow\; \vec{C}(\mathbf{G}^*) = \langle \vec{C} \rangle \quad (8)$$

to be solved. These conditions, however, are sufficient to characterize a maximum only if $\mathcal{L}(\vec{\theta})$ is concave. This is indeed the case, as we prove by noticing that

$$H_{ij} = \nabla^2_{\theta_i \theta_j} \mathcal{L}(\vec{\theta}) = \frac{\partial^2 \mathcal{L}(\vec{\theta})}{\partial \theta_i \partial \theta_j} = -\frac{\partial^2 \ln Z(\vec{\theta})}{\partial \theta_i \partial \theta_j} = \frac{\partial \langle C_j \rangle}{\partial \theta_i} = -\text{Cov}[C_i, C_j], \quad i, j = 1 \dots M \quad (9)$$

i.e. that the Hessian matrix, $\mathbf{H}$, of our likelihood function is 'minus' the covariance matrix of the constraints, hence negative semidefinite by definition. The fourth passage is an example of the well-known fluctuation-response relation [20].

Combining the MEP and the MLP principles
The MEP and the MLP encode two different prescriptions aiming, respectively, at determining the functional form of a probability distribution and its numerical value. In optimization theory, problem (1) is known as the primal problem: upon noticing that the Shannon entropy is concave, while the imposed constraints are linear in $P(\mathbf{G})$, one concludes that the primal problem is convex (this is easily seen by rewriting it as a minimization problem for $-S[P]$).

As convexity implies strong duality, we can, equivalently, consider an alternative version of the problem to optimize, known as the dual problem. In order to define it, let us consider the Lagrangian function

$$\mathcal{L}(P, \vec{\theta}) \equiv S[P] + \sum_{i=1}^{M} \theta_i \left[ C_i(\mathbf{G}^*) - \sum_{\mathbf{G}} P(\mathbf{G})\,C_i(\mathbf{G}) \right] \quad (10)$$

where, now, the generic expectation of the $i$-th constraint, $\langle C_i \rangle$, has been replaced by the corresponding empirical value, $C_i(\mathbf{G}^*)$. As the dual function is given by

$$P(\mathbf{G}^* \mid \vec{\theta}) \equiv \operatorname*{arg\,max}_{P} \; \mathcal{L}(P, \vec{\theta}), \quad (11)$$

the dual problem reads

$$\operatorname*{arg\,max}_{\vec{\theta}} \; \min_{P} \; \left[ -\mathcal{L}(P, \vec{\theta}) \right] \quad (12)$$

which is a convex problem by construction; this is readily seen by substituting eq. (4) into eq. (10), an operation that leads to the expression

$$-\mathcal{L}(P(\vec{\theta}), \vec{\theta}) = -\vec{\theta} \cdot \vec{C}(\mathbf{G}^*) - \ln Z(\vec{\theta}) = \mathcal{L}(\vec{\theta}), \quad (13)$$

i.e. the likelihood function introduced in eq. (5). In other words, eq. (12) combines the MEP and the MLP into a unique optimization step whose score function becomes the Lagrangian function defined in eq. (10).

Optimization algorithms for non-linear problems
In general, the optimization problem defined in eq. (12) cannot be solved analytically, whence the need to resort to numerical methods. For an exhaustive review on numerical methods for optimization we refer the interested reader to [25, 26]: in the following, we present only the concepts that are of some relevance for us. The problem

$$\operatorname*{arg\,max}_{\vec{\theta}} \; \mathcal{L}(\vec{\theta}) \quad (14)$$

is a Nonlinear Programming Problem (NLP). In order to solve it numerically, we will adopt a Sequential Quadratic Programming (SQP) approach. Starting from an initial guess $\vec{\theta}^{(0)}$, SQP solves eq. (14) by iteratively updating the vector of Lagrange multipliers,

$$\theta_i^{(n+1)} = \theta_i^{(n)} + \alpha\,\Delta\theta_i^{(n)}, \quad i = 1 \dots M \quad (15)$$

according to the rule

$$\Delta\vec{\theta}^{(n)} = \operatorname*{arg\,max}_{\Delta\vec{\theta}} \; \sum_{i} \nabla_{\theta_i} \mathcal{L}(\vec{\theta})\,\Delta\theta_i + \frac{1}{2} \sum_{j,k} \Delta\theta_j\,H^{(n)}_{jk}\,\Delta\theta_k \quad (16)$$

leading to the set of equations

$$\nabla_i \mathcal{L}(\vec{\theta}) + \sum_{j} H^{(n)}_{ij}\,\Delta\theta_j = 0, \quad i = 1 \dots M \quad (17)$$

which can be compactly rewritten as

$$\Delta\vec{\theta}^{(n)} = -\left[\mathbf{H}^{(n)}\right]^{-1} \nabla\mathcal{L}(\vec{\theta}). \quad (18)$$

The stepsize $\alpha \in (0, 1]$ is selected to ensure that $\mathcal{L}(\vec{\theta}^{(n+1)}) > \mathcal{L}(\vec{\theta}^{(n)})$ via a back-tracking, line-search procedure: starting from $\alpha = 1$, if the Armijo (sufficient-increase) condition

$$\mathcal{L}(\vec{\theta}^{(n)} + \alpha\,\Delta\vec{\theta}^{(n)}) \geq \mathcal{L}(\vec{\theta}^{(n)}) + \gamma\alpha\,\nabla\mathcal{L}(\vec{\theta})^\top \Delta\vec{\theta} \quad (19)$$

is violated, we set $\alpha \leftarrow \beta\alpha$ ($\gamma \in (0, 0.5]$ and $\beta \in (0, 1)$ are the parameters of the algorithm). On the other hand, the term $\mathbf{H}^{(n)}$ can be selected according to a variety of methods. In the present contribution we focus on the following three ones.

Newton's method.
One speaks of Newton's method in case $\mathbf{H}^{(n)}$ is chosen to be

$$\mathbf{H}^{(n)} = \nabla^2 \mathcal{L}(\vec{\theta}^{(n)}) + \Delta\mathbf{H}^{(n)} \quad (20)$$

where $\nabla^2 \mathcal{L}(\vec{\theta})$ is the Hessian matrix of the likelihood function and the term $\Delta\mathbf{H}^{(n)}$ is typically selected as small as possible, in order to avoid slowing convergence and to ensure that $\mathbf{H}^{(n)}$ is negative definite (i.e. $\nabla^2 \mathcal{L}(\vec{\theta}^{(n)}) + \Delta\mathbf{H}^{(n)} \prec 0$). $\mathbf{H}^{(n)}$ is also referred to as the 'exact Hessian'.

Quasi-Newton methods.
Any Hessian approximation which is negative definite (i.e. satisfying $\mathbf{H}^{(n)} \prec 0$) can be employed within a quasi-Newton scheme. One simple choice is $\mathbf{H}^{(n)} = -\mathbf{I}$, which yields the 'steepest ascent' algorithm; here we have opted for the following recipe, i.e. the purely diagonal version of Newton's method: $H^{(n)}_{ii} = \nabla^2_{ii} \mathcal{L}(\vec{\theta}^{(n)}) + \Delta H^{(n)}_{ii} < 0$, $\forall i$, and $H^{(n)}_{ij} = 0$, $\forall\, i \neq j$.

Fixed-point iteration on modified KKT conditions.
In addition to the (classes of) algorithms above, we will also consider an iterative recipe which is constructed as a fixed-point iteration on a modified version of the Karush-Kuhn-Tucker (KKT) conditions, i.e. $F(\vec{\theta}) = \vec{0} \;\Leftrightarrow\; \vec{\theta} = G(\vec{\theta})$; the iterate can, then, be made explicit by rewriting the latter as

$$\vec{\theta}^{(n)} = G(\vec{\theta}^{(n-1)}). \quad (21)$$

The condition above will be made explicit, for each network model, in the corresponding subsection. We also observe that this choice yields a non-standard SQP method, as $\mathbf{H}^{(n)}$ is typically not symmetric for our models.

APPLICATIONS
Let us now apply the algorithms described in the previous section to a number of specific cases of interest.
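Before turning to the individual models, the generic SQP sweep - the update of eq. (15), the linear solve of eq. (18) and the backtracking rule of eq. (19) - can be sketched in Python. This is a minimal sketch, not the code released with the paper: the toy concave 'likelihood' and all function names are ours.

```python
import numpy as np

def sqp_maximize(L, grad, hess, theta0, gamma=0.3, beta=0.5,
                 tol=1e-10, max_steps=100):
    """Maximize a concave function L by iterating eq. (15):
    a (quasi-)Newton step Delta = -H^{-1} grad L (eq. (18)),
    damped via Armijo backtracking (eq. (19))."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_steps):
        g = grad(theta)
        if np.linalg.norm(g) <= tol:          # stopping rule on ||grad L||
            break
        delta = -np.linalg.solve(hess(theta), g)
        alpha = 1.0
        # shrink alpha until the sufficient-increase condition holds
        while alpha > 1e-12 and \
                L(theta + alpha * delta) < L(theta) + gamma * alpha * (g @ delta):
            alpha *= beta                     # alpha <- beta * alpha
        theta = theta + alpha * delta         # eq. (15)
    return theta

# toy concave 'likelihood': L(theta) = -1/2 (theta - t)^T A (theta - t)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
t = np.array([0.5, -1.0])
L = lambda th: -0.5 * (th - t) @ A @ (th - t)
grad = lambda th: -A @ (th - t)
hess = lambda th: -A                          # 'exact Hessian' choice
theta_star = sqp_maximize(L, grad, hess, np.zeros(2))
```

Replacing `hess` by its diagonal part (or by minus the identity) would yield the quasi-Newton and steepest-ascent variants discussed above.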
UBCM: binary undirected graphs with given degree sequence
Let us start by considering binary, undirected networks (BUNs). The simplest, non-trivial set of constraints is represented by the degrees of nodes: the degree of node $i$, i.e. $k_i(\mathbf{A}) = \sum_{j (\neq i)=1}^{N} a_{ij}$, counts the number of its neighbours and coincides with the total number of 1s along the $i$-th row (or, equivalently, along the $i$-th column) of the adjacency matrix $\mathbf{A} \equiv \{a_{ij}\}_{i,j=1}^{N}$. The benchmark defined by this set of constraints is known as the Undirected Binary Configuration Model (UBCM) and its Hamiltonian reads

$$H_{\text{UBCM}}(\mathbf{A}, \vec{\theta}) = \sum_{i=1}^{N} \theta_i k_i(\mathbf{A}); \quad (22)$$

entropy maximization [15, 16] leads to the factorized graph probability

$$P_{\text{UBCM}}(\mathbf{A} \mid \vec{\theta}) = \prod_{i=1}^{N} \prod_{\substack{j=1 \\ (j<i)}}^{N} p_{ij}^{a_{ij}} (1 - p_{ij})^{1 - a_{ij}} \quad (23)$$

where $p_{ij} = p_{ij}^{\text{UBCM}} \equiv \frac{e^{-\theta_i - \theta_j}}{1 + e^{-\theta_i - \theta_j}}$. The likelihood of the observed graph $\mathbf{A}^*$ reads

$$\mathcal{L}_{\text{UBCM}}(\vec{\theta}) = -\sum_{i=1}^{N} \theta_i k_i(\mathbf{A}^*) - \sum_{i=1}^{N} \sum_{\substack{j=1 \\ (j<i)}}^{N} \ln\left[1 + e^{-\theta_i - \theta_j}\right] \quad (24)$$

whose first-order optimality conditions read

$$\nabla_{\theta_i} \mathcal{L}_{\text{UBCM}} = -k_i(\mathbf{A}^*) + \sum_{\substack{j=1 \\ (j\neq i)}}^{N} p_{ij}^{\text{UBCM}} = -k_i(\mathbf{A}^*) + \langle k_i \rangle = 0, \quad i = 1 \dots N. \quad (25)$$
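In code, the factorization of eq. (23) means that, once the multipliers are known, the model is entirely summarized by the matrix of coefficients $p_{ij}$, from which expected degrees and sampled graphs follow directly. A minimal sketch (variable and function names are ours, not those of the code accompanying the paper):

```python
import numpy as np

def ubcm_probabilities(theta):
    """p_ij = exp(-theta_i - theta_j) / (1 + exp(-theta_i - theta_j)),
    with a zero diagonal (no self-loops), as in eq. (23)."""
    x = np.exp(-np.asarray(theta, dtype=float))
    xx = np.outer(x, x)
    p = xx / (1.0 + xx)
    np.fill_diagonal(p, 0.0)
    return p

def ubcm_sample(theta, seed=0):
    """Draw one graph from the ensemble: one independent Bernoulli
    trial per (undirected) pair of nodes."""
    rng = np.random.default_rng(seed)
    p = ubcm_probabilities(theta)
    n = len(p)
    a = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i):
            a[i, j] = a[j, i] = int(rng.random() <= p[i, j])
    return a

theta = np.array([0.1, 0.5, 1.0, 2.0])     # example multipliers (ours)
p = ubcm_probabilities(theta)
expected_degrees = p.sum(axis=1)           # <k_i> = sum_{j != i} p_ij
```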
Resolution of the UBCM.
Newton's and the quasi-Newton method can be easily implemented via the recipe defined in eq. (18) (see Appendix A for the definition of the UBCM Hessian).

The explicit definition of the fixed-point recipe, instead, requires a preliminary observation, i.e. that the system of equations embodying the UBCM first-order optimality conditions can be re-written as follows

$$e^{-\theta_i} = \frac{k_i(\mathbf{A}^*)}{\sum_{\substack{j=1 \\ (j\neq i)}}^{N} \left( \frac{e^{-\theta_j}}{1 + e^{-\theta_i - \theta_j}} \right)}, \quad i = 1 \dots N \quad (26)$$

i.e. as a set of consistency equations. The observation that the term $e^{-\theta_i}$ appears on both sides of the equation corresponding to the $i$-th constraint suggests an iterative recipe to solve such a system, i.e.

$$\theta_i^{(n)} = -\ln\left[ \frac{k_i(\mathbf{A}^*)}{\sum_{\substack{j=1 \\ (j\neq i)}}^{N} \left( \frac{e^{-\theta_j^{(n-1)}}}{1 + e^{-\theta_i^{(n-1)} - \theta_j^{(n-1)}}} \right)} \right], \quad i = 1 \dots N \quad (27)$$

originally proposed in [22] and further refined in [23]. The identification $p_{ij}^{\text{UBCM}} \equiv \frac{e^{-\theta_i^{(\infty)} - \theta_j^{(\infty)}}}{1 + e^{-\theta_i^{(\infty)} - \theta_j^{(\infty)}}}$, $\forall\, i < j$, allows the probability coefficients defining the UBCM to be numerically determined.

As any other iterative recipe, the one proposed above needs to be initialized as well. To this aim, we have tested three different sets of initial values: the first one is defined by the position $\theta_i^{(0)} = -\ln\left[k_i(\mathbf{A}^*)/\sqrt{2L}\right]$, $\forall i$ - usually, a good approximation of the solution of the system of equations in (25) in the 'sparse case' (i.e. whenever $p_{ij}^{\text{UBCM}} \simeq e^{-\theta_i - \theta_j}$ [27]); the second one is a variant of the position above, reading $\theta_i^{(0)} = -\ln\left[k_i(\mathbf{A}^*)/\sqrt{N}\right]$, $\forall i$; the third one, instead, prescribes to randomly draw the value of each parameter from a uniform distribution with support on the unit interval, i.e. $\theta_i^{(0)} \sim U(0, 1)$, $\forall i$.

Reducing the dimensionality of the problem.
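The reduction described below rests on simple bookkeeping: counting how many nodes share each degree value, so that all of them can be assigned a single multiplier. A minimal sketch (the degree sequence and the names are ours):

```python
import numpy as np

# degree sequence of a small example graph (ours)
k = np.array([2, 1, 2, 3, 1, 2])

# distinct degrees and their multiplicities f(k): nodes with the
# same degree can share the same multiplier theta_k
k_distinct, f = np.unique(k, return_counts=True)

# map each node back to its equivalence class
node_to_class = np.searchsorted(k_distinct, k)

# coefficient of reduction: equations surviving the reduction
# over equations of the original problem (the smaller, the better)
c_r = len(k_distinct) / len(k)
```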
The problem defining the UBCM can be further simplified by noticing that nodes with the same degree, say $k$, can be assigned the same value of the multiplier $\theta$ [24] - a result resting upon the observation that any value $k_i(\mathbf{A}^*)$ must match the sum of monotonic, increasing functions. This translates into the possibility of rewriting $\mathcal{L}_{\text{UBCM}}(\vec{\theta})$ in a 'reduced' fashion, as

$$\mathcal{L}_{\text{UBCM}}^{\text{reduced}}(\vec{\theta}) = -\sum_{k} f(k)\,\theta_k\,k - \sum_{k} \sum_{\substack{k' \\ (k' \leq k)}} f(k)\left[f(k') - \delta_{kk'}\right] \ln\left[1 + e^{-\theta_k - \theta_{k'}}\right] \quad (28)$$

where the sums run over the distinct values of the degrees and $f(k)$ counts the number of nodes whose degree is $k$. Rewriting the problem with respect to the set $\{\theta_k\}_k$ leads one to recover simplified versions of the three algorithms considered here: Newton's and the quasi-Newton methods can, now, be solved via a 'reduced' version of eq. (18) (since both the dimension of the gradient and the order of the Hessian matrix of the likelihood function are, now, less than $N$), while the iterative recipe defined in (26) can be rewritten in terms of the 'non-degenerate' degrees, as

$$\theta_k^{(n)} = -\ln\left[ \frac{k}{\sum_{k'} \left[f(k') - \delta_{kk'}\right] \left( \frac{e^{-\theta_{k'}^{(n-1)}}}{1 + e^{-\theta_k^{(n-1)} - \theta_{k'}^{(n-1)}}} \right)} \right] \quad (29)$$

$\forall k$, where, at the denominator, the self-contribution (i.e. the probability that a node links to itself) has been explicitly excluded.

Performance testing.
The performance of the three algorithms considered in the present paper in solving the reduced version of eqs. (25) has been tested on a bunch of real-world networks. The latter ones span a wide variety of systems, including natural, financial and technological ones. In particular, we have considered the synaptic network of the worm C. Elegans [28], the network of the largest US airports [29], the protein-protein interaction network of the bacterium H. Pylori [30], Internet at the level of Autonomous Systems [31] and eight daily snapshots of the so-called Bitcoin Lightning Network [32], chosen throughout its entire history. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the UBCM has been quantified via the maximum absolute error metrics, defined, in a perfectly general fashion, as $\max_i \{|C_i^* - \langle C_i \rangle|\}_{i=1}^{N}$ (where $C_i^*$ is the empirical value of the $i$-th constraint, $C_i$). Naturally, in the UBCM case, $C_i = k_i$, $\forall i$, and the aforementioned error score becomes

$$\text{MADE} = \max_i \left\{ |k_i^* - \langle k_i \rangle| \right\}_{i=1}^{N} \quad (30)$$

(the acronym standing for Maximum Absolute Degree Error). Equivalently, it is the infinite norm of the difference between the vector of the empirical values of the constraints and that of their expected values.

For each algorithm, we have considered three different stopping criteria: the first one puts a condition on the Euclidean norm of the gradient of the likelihood function, i.e.

$$||\nabla\mathcal{L}(\vec{\theta})||_2 = \sqrt{\sum_{i=1}^{N} \left( \nabla_i \mathcal{L}(\vec{\theta}) \right)^2} \leq \epsilon; \quad (31)$$

the second one puts a condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, i.e.

$$||\Delta\vec{\theta}||_2 = \sqrt{\sum_{i=1}^{N} (\Delta\theta_i)^2} \leq \epsilon \quad (32)$$

($\epsilon$ being a prescribed tolerance); the third one concerns the maximum number of iterations: after 1000 steps, any of the three algorithms stops.

[Table I - UBCM results for the synaptic network of C. Elegans (nn) (N = 265, L = 1879), the US airports network, the protein-protein interaction network of H. Pylori (pp) (N = 732, L = 1465), Internet (AS) and eight snapshots of the Bitcoin Lightning Network. Columns: N, L, the connectance c = 2L/[N(N-1)], the coefficient of reduction c_r, and the MADE and time (s) achieved by Newton's, the quasi-Newton and the fixed-point method. Each algorithm stops either because the condition on the gradient norm or the condition on the norm of the parameter update is satisfied. For what concerns accuracy, the two most accurate methods are Newton's and the fixed-point ones; for what concerns speed, the fastest method is the fixed-point one (although Newton's one approximately requires the same amount of time on each specific configuration). Only the results corresponding to the best choice of initial conditions are reported.]

The results about the performance of our three algorithms are reported in table I.
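As a concrete illustration, the fixed-point recipe of eq. (27) and the MADE score of eq. (30) can be sketched together in a few lines. This is a minimal sketch on a synthetic degree sequence of our own choosing (function names are ours), not the benchmark code used for the tests above.

```python
import numpy as np

def ubcm_fixed_point(k, tol=1e-12, max_steps=5000):
    """Iterate eq. (27): theta_i <- -log[ k_i /
    sum_{j != i} exp(-theta_j) / (1 + exp(-theta_i - theta_j)) ]."""
    k = np.asarray(k, dtype=float)
    theta = -np.log(k / np.sqrt(k.sum()))         # 'sparse-case' initial guess
    for _ in range(max_steps):
        x = np.exp(-theta)
        s = x[None, :] / (1.0 + np.outer(x, x))   # s_ij = x_j / (1 + x_i x_j)
        np.fill_diagonal(s, 0.0)                  # exclude the self-contribution
        theta_new = -np.log(k / s.sum(axis=1))
        if np.linalg.norm(theta_new - theta) <= tol:  # ||Delta theta|| criterion
            return theta_new
        theta = theta_new
    return theta

def made(k, theta):
    """Maximum Absolute Degree Error of eq. (30): max_i |k_i - <k_i>|."""
    x = np.exp(-theta)
    p = np.outer(x, x) / (1.0 + np.outer(x, x))
    np.fill_diagonal(p, 0.0)
    return float(np.max(np.abs(k - p.sum(axis=1))))

k = np.array([1.0, 2.0, 2.0, 2.0, 3.0])   # synthetic degree sequence (ours)
theta = ubcm_fixed_point(k)
```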
Overall, all recipes perform very satisfactorily, being accurate, fast and scalable; moreover, all algorithms stop either because the condition on the norm of the gradient of the likelihood is satisfied or because the condition on the norm of the vector of parameters is satisfied.

For what concerns accuracy, the largest maximum error per method spans a narrow interval across all configurations; by looking at each specific network, it is evident that the two most accurate methods are systematically Newton's and the fixed-point ones.

For what concerns speed, the amount of time required to achieve convergence remains, across all configurations, below ≃ 0.01 seconds for Newton's method, ≃ 0.15 seconds for the quasi-Newton method and ≃ 0.015 seconds for the fixed-point recipe. The fastest method is the fixed-point one, although Newton's method approximately requires the same amount of time, when compared to it on each specific configuration. Differences in the speed of convergence of any method, caused by the choice of a particular set of initial conditions, are indeed observable: the prescription reading $\theta_i^{(0)} = -\ln\left[k_i(\mathbf{A}^*)/\sqrt{N}\right]$, $\forall i$, outperforms the other ones.

Let us now comment on the scalability of our algorithms. What we learn from our exercise is that scalability is not related to the network size in a simple way: the factors seemingly playing a major role are the ones affecting the reducibility of the original system of equations, i.e. the ones 'deciding' the number of different equations that actually need to be solved. While reducibility can be easily quantified a posteriori, e.g. by calculating the coefficient of reduction, $c_r$, defined as the ratio between the number of equations that survive the reduction and the number of equations defining the original problem (hence, the smaller the better), providing an exhaustive list of the aforementioned factors a priori is much more difficult.

One may argue that reducibility is affected by the heterogeneity of the degree distribution; upon considering that the latter can be quantified by computing the coefficient of variation, one may derive a simple rule of thumb: a larger coefficient of variation (pointing out a larger heterogeneity of the degree distribution) leads to a larger coefficient of reduction, and a larger amount of time for convergence will be required. Although intuitive, deviations from this picture should nevertheless be expected.
In fact, other factors may play a role: as an example, hubs may be present, thus forcing the corresponding parameters to assume either very large or very small values - hence, slowing down the entire convergence process.

In this sense, scalability is the result of a (non-trivial) interplay between size and reducibility. Let us take a look at table I: Internet is the most reducible network of our basket, although being the largest in size, while the neural network of C. Elegans is one of the least reducible networks of our basket, although being the second smallest one; as a consequence, the actual number of equations defining the UBCM on C. Elegans is ≃ 30 while the actual number of equations defining the UBCM on Internet is ≃ 100 - whence the larger amount of time to solve the latter. Remarkably, the time required by our recipes to ensure that the largest system of equations converges to the solution ranges from thousandths to tenths of seconds.

As a last comment, we would like to stress that, unlike several popular approximations such as the Chung-Lu one [27], the generic coefficient $p_{ij}^{\text{UBCM}}$ always represents a proper probability, in turn implying that eq. (23) also provides us with a recipe to sample the canonical ensemble of BUNs, under the UBCM. Notice that the factorization of the graph probability $P_{\text{UBCM}}(\mathbf{A} \mid \vec{\theta})$ greatly simplifies the entire procedure, allowing a single graph to be sampled by implementing the Bernoulli trial

$$a_{ij} = \begin{cases} 0 & \text{with probability } 1 - p_{ij}^{\text{UBCM}} \\ 1 & \text{with probability } p_{ij}^{\text{UBCM}} \end{cases} \quad (33)$$

for each (undirected) pair of nodes, in either a sequential or a parallel fashion. The sampling process, whose computational complexity amounts at $O(N^2)$, can be repeated to generate as many configurations as desired. The pseudo-code for explicitly sampling the UBCM ensemble is summed up by Algorithm 1.

[Footnote: The coefficient of variation is defined as $c_v = s/m$, where $s$ and $m$ are, respectively, the standard deviation and the mean of the degree distribution of the network at hand.]

Algorithm 1
Sampling the UBCM ensemble

    for m = 1 ... |E| do
        A ← 0
        for i = 1 ... N do
            for j = 1 ... N and j < i do
                if RandomUniform[0,1] ≤ p_ij^UBCM then
                    a_ij = a_ji = 1
                else
                    a_ij = a_ji = 0
                end if
            end for
        end for
        Ensemble[m] = A
    end for

DBCM: binary directed graphs with given in-degree and out-degree sequences

Let us now move to consider binary, directed networks (BDNs). In this case, the simplest, non-trivial set of constraints is represented by the in-degrees and the out-degrees of nodes, where $k_i^{in}(\mathbf{A}) = \sum_{j (\neq i)=1}^{N} a_{ji}$ counts the number of nodes 'pointing' to node $i$ and $k_i^{out}(\mathbf{A}) = \sum_{j (\neq i)=1}^{N} a_{ij}$ counts the number of nodes $i$ 'points' to. The benchmark defined by this set of constraints is known as the Directed Binary Configuration Model (DBCM), whose Hamiltonian reads

$$H_{\text{DBCM}}(\mathbf{A}, \vec{\alpha}, \vec{\beta}) = \sum_{i=1}^{N} \left[ \alpha_i k_i^{out}(\mathbf{A}) + \beta_i k_i^{in}(\mathbf{A}) \right]; \quad (34)$$

as in the undirected case, entropy maximization [15, 16] leads to a factorized probability distribution, i.e.

$$P_{\text{DBCM}}(\mathbf{A} \mid \vec{\alpha}, \vec{\beta}) = \prod_{i=1}^{N} \prod_{\substack{j=1 \\ (j\neq i)}}^{N} p_{ij}^{a_{ij}} (1 - p_{ij})^{1 - a_{ij}} \quad (35)$$

where $p_{ij} = p_{ij}^{\text{DBCM}} \equiv \frac{e^{-\alpha_i - \beta_j}}{1 + e^{-\alpha_i - \beta_j}}$. The canonical ensemble of BDNs is, now, the set of networks with the same number of nodes, $N$, as the observed graph and a number of (directed) links varying from zero to the maximum value $N(N-1)$. The likelihood of the observed graph $\mathbf{A}^*$ becomes

$$\mathcal{L}_{\text{DBCM}}(\vec{\alpha}, \vec{\beta}) = -\sum_{i=1}^{N} \left[ \alpha_i k_i^{out}(\mathbf{A}^*) + \beta_i k_i^{in}(\mathbf{A}^*) \right] - \sum_{i=1}^{N} \sum_{\substack{j=1 \\ (j\neq i)}}^{N} \ln\left[ 1 + e^{-\alpha_i - \beta_j} \right] \quad (36)$$

whose first-order optimality conditions read

$$\nabla_{\alpha_i} \mathcal{L}_{\text{DBCM}} = -k_i^{out}(\mathbf{A}^*) + \sum_{\substack{j=1 \\ (j\neq i)}}^{N} \frac{e^{-\alpha_i - \beta_j}}{1 + e^{-\alpha_i - \beta_j}} = -k_i^{out}(\mathbf{A}^*) + \sum_{\substack{j=1 \\ (j\neq i)}}^{N} p_{ij}^{\text{DBCM}} = -k_i^{out}(\mathbf{A}^*) + \langle k_i^{out} \rangle = 0, \quad i = 1 \dots N \quad (37)$$

and

$$\nabla_{\beta_i} \mathcal{L}_{\text{DBCM}} = -k_i^{in}(\mathbf{A}^*) + \sum_{\substack{j=1 \\ (j\neq i)}}^{N} \frac{e^{-\alpha_j - \beta_i}}{1 + e^{-\alpha_j - \beta_i}} = -k_i^{in}(\mathbf{A}^*) + \sum_{\substack{j=1 \\ (j\neq i)}}^{N} p_{ji}^{\text{DBCM}} = -k_i^{in}(\mathbf{A}^*) + \langle k_i^{in} \rangle = 0, \quad i = 1 \dots N. \quad (38)$$

Resolution of the DBCM.
Newton's and the quasi-Newton method can be easily implemented via the recipe defined in eq. (18) (see Appendix A for the definition of the DBCM Hessian).

The fixed-point recipe for solving the system of equations embodying the DBCM first-order optimality conditions can, instead, be re-written in the usual iterative fashion as follows

$$\alpha_i^{(n)} = -\ln\left[ \frac{k_i^{out}(\mathbf{A}^*)}{\sum_{\substack{j=1 \\ (j\neq i)}}^{N} \left( \frac{e^{-\beta_j^{(n-1)}}}{1 + e^{-\alpha_i^{(n-1)} - \beta_j^{(n-1)}}} \right)} \right], \quad i = 1 \dots N$$

$$\beta_i^{(n)} = -\ln\left[ \frac{k_i^{in}(\mathbf{A}^*)}{\sum_{\substack{j=1 \\ (j\neq i)}}^{N} \left( \frac{e^{-\alpha_j^{(n-1)}}}{1 + e^{-\alpha_j^{(n-1)} - \beta_i^{(n-1)}}} \right)} \right], \quad i = 1 \dots N \quad (39)$$

Analogously to the undirected case, the initialization of this recipe has been implemented in three different ways. The first one reads $\alpha_i^{(0)} = -\ln\left[k_i^{out}(\mathbf{A}^*)/\sqrt{L}\right]$ and $\beta_i^{(0)} = -\ln\left[k_i^{in}(\mathbf{A}^*)/\sqrt{L}\right]$, $i = 1 \dots N$, and represents a good approximation to the solution of the system of equations defining the DBCM in the 'sparse case' (i.e. whenever $p_{ij}^{\text{DBCM}} \simeq e^{-\alpha_i - \beta_j}$); the second one is a variant of the position above, reading $\alpha_i^{(0)} = -\ln\left[k_i^{out}(\mathbf{A}^*)/\sqrt{N}\right]$ and $\beta_i^{(0)} = -\ln\left[k_i^{in}(\mathbf{A}^*)/\sqrt{N}\right]$, $i = 1 \dots N$; the third one, instead, prescribes to randomly draw the value of each parameter from a uniform distribution defined on the unit interval, i.e. $\alpha_i^{(0)} \sim U(0,1)$ and $\beta_i^{(0)} \sim U(0,1)$, $\forall i$. As for the UBCM, the identification $p_{ij}^{\text{DBCM}} \equiv \frac{e^{-\alpha_i^{(\infty)} - \beta_j^{(\infty)}}}{1 + e^{-\alpha_i^{(\infty)} - \beta_j^{(\infty)}}}$, $\forall\, i \neq j$, allows the probability coefficients defining the DBCM to be numerically determined.

Reducing the dimensionality of the problem.
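The bookkeeping behind the reduction described below - grouping nodes into classes by their (out-degree, in-degree) pair - can be sketched as follows (the degree sequences and names are ours):

```python
import numpy as np

# out- and in-degree sequences of a small example digraph (ours)
k_out = np.array([1, 2, 1, 2, 0])
k_in  = np.array([2, 0, 1, 0, 2])

# distinct (k, h) = (out-degree, in-degree) pairs and their
# multiplicities n(k, h): nodes in the same class share (alpha, beta)
pairs = np.stack([k_out, k_in], axis=1)
classes, n_kh = np.unique(pairs, axis=0, return_counts=True)
```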
As for the UBCM, we can define a 'reduced' version of the DBCM likelihood, accounting only for the distinct (pairs of) values of the degrees. By defining $k^{out} \equiv k$ and $k^{in} \equiv h$, in order to simplify the formalism, the reduced DBCM likelihood reads

$$\mathcal{L}_{\text{DBCM}}^{\text{reduced}}(\vec{\theta}) = -\sum_{k,h} n(k, h)\left[\alpha_{k,h}\,k + \beta_{k,h}\,h\right] - \sum_{k,h} \sum_{k',h'} n(k, h)\left[n(k', h') - \delta_{kk'}\delta_{hh'}\right] \ln\left[1 + e^{-\alpha_{k,h} - \beta_{k',h'}}\right]; \quad (40)$$

the implementation of the algorithms considered here must be modified in a way that is analogous to the one already described for the UBCM. In particular, the fixed-point recipe for the DBCM can be re-written by assigning to the nodes with the same out- and in-degrees $(k, h)$ the same pair of values $(\alpha, \beta)$, i.e. as

$$\alpha_{k,h}^{(n)} = -\ln\left[ \frac{k}{\sum_{k',h'} \left[n(k', h') - \delta_{kk'}\delta_{hh'}\right] \left( \frac{e^{-\beta_{k',h'}^{(n-1)}}}{1 + e^{-\alpha_{k,h}^{(n-1)} - \beta_{k',h'}^{(n-1)}}} \right)} \right], \quad \forall\, k, h \quad (41)$$

$$\beta_{k,h}^{(n)} = -\ln\left[ \frac{h}{\sum_{k',h'} \left[n(k', h') - \delta_{kk'}\delta_{hh'}\right] \left( \frac{e^{-\alpha_{k',h'}^{(n-1)}}}{1 + e^{-\alpha_{k',h'}^{(n-1)} - \beta_{k,h}^{(n-1)}}} \right)} \right], \quad \forall\, k, h \quad (42)$$

where the sums, now, run over the distinct values of the out- and in-degrees, $n(k, h)$ is the number of nodes whose out-degree is $k$ and whose in-degree is $h$ and, as usual, the last term at the denominator excludes the self-contribution (i.e. the probability that a node links to itself).

Performance testing.
As for the UBCM, the performance of the three algorithms in solving the reduced version of eqs. (37) and (38) has been tested on a bunch of real-world networks, spanning economic, financial and social systems (we recall that the reduced system is obtained by dividing nodes into equivalence classes: any two nodes are equivalent if and only if they have the same couple of out- and in-degrees). In particular, we have considered the World Trade Web (WTW) during the decade 1992-2002 [33], a pair of snapshots of the Bitcoin User Network at the weekly time scale (the first day of those weeks being 13-02-12 and 27-04-15, respectively) [34] together with the corresponding largest weakly connected components (whose sizes amount at ≃65% and ≃90% of the full network size, respectively) and a snapshot of the semantic network concerning the Twitter discussion about the Covid-19 pandemics, i.e. the network of re-tweets of the (online) Italian debate about Covid-19, collected in the period 21st February - 20th April 2020 [35]. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the DBCM has been quantified via the maximum absolute error metrics that, in this case, reads

$$\text{MADE}=\max_{i=1\ldots N}\left\{|k_i^*-\langle k_i\rangle|,\,|h_i^*-\langle h_i\rangle|\right\}\qquad(43)$$

and accounts for the presence of two different degrees per node, the out-degree $k_i$ and the in-degree $h_i$. As for the UBCM, it is the infinite norm of the difference between the vector of the empirical values of the constraints and that of their expected values.

The three different 'stop criteria' we have considered match the ones adopted for analysing the undirected case and consist in a condition on the Euclidean norm of the gradient of the likelihood function, $||\nabla\mathcal{L}(\vec\theta)||_2$, falling below a prescribed (small) threshold, in an analogous condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, $||\Delta\vec\theta||_2$, and in a condition on the maximum number of iterations: after 10000 steps, any of the three algorithms stops.

The results about the performance of our three algorithms are reported in table II. Overall, all recipes perform very satisfactorily, being accurate, fast and scalable; however, while Newton's and the quasi-Newton methods stop because the condition on the norm of the gradient of the likelihood is satisfied, the fixed-point recipe is always found to satisfy the limit condition on the number of steps (i.e. it runs for 10000 steps and, then, stops).

Let us start by commenting the results concerning the WTW. For what concerns accuracy, the largest maximum error per method spans a narrow interval across all configurations (the precise values of the MADE for the three recipes are reported in table II); by looking at each specific network, it is evident that the most accurate method is systematically the quasi-Newton one. For what concerns speed, the amount of time required to achieve convergence ranges from fractions of a second, for Newton's and the quasi-Newton methods, to a few seconds, for the fixed-point recipe (see table II).
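For concreteness, the MADE of eq. (43) can be computed in a few lines. The sketch below (Python with numpy; function and variable names are our own illustrative choices, not the code used for the paper) takes the observed out- and in-degree sequences together with a matrix of model probabilities, whose row and column sums give the expected out- and in-degrees.

```python
import numpy as np

def made_dbcm(k_out, k_in, P):
    """Maximum Absolute Degree Error of eq. (43): the largest deviation
    between observed and expected out-/in-degrees under the DBCM."""
    np.fill_diagonal(P, 0.0)             # no self-loops
    exp_out = P.sum(axis=1)              # expected out-degrees <k_i>
    exp_in = P.sum(axis=0)               # expected in-degrees  <h_i>
    return max(np.abs(k_out - exp_out).max(), np.abs(k_in - exp_in).max())

# toy check: a 3-cycle reproduced exactly gives MADE = 0
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
print(made_dbcm(np.ones(3), np.ones(3), P))  # 0.0
```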
TABLE II. Performance of Newton's, the quasi-Newton and the fixed-point algorithm in solving the reduced system of equations defining the DBCM on the considered networks (for each of them, the total number of nodes, N, the total number of links, L, and the connectance, c = L/N(N−1), are provided, together with the MADE and the running time, in seconds, of each method). While Newton's and the quasi-Newton methods stop because the condition on the norm of the gradient of the likelihood is satisfied, the fixed-point recipe always reaches the limit of 10000 steps. The fastest and most accurate method is systematically the quasi-Newton one. The picture changes when very large networks, as Bitcoin and Twitter, are considered: in these cases, the fastest and most accurate method is the fixed-point one. Only the results corresponding to the best choice of initial conditions are reported.

A comment on the reducibility of the considered configurations is in order. The Bitcoin snapshots are very reducible, i.e. they are very redundant, hosting many nodes with the same out- and in-degrees. To provide a specific example, out of the original 676688 equations defining the DBCM for one of the two Bitcoin snapshots considered here, only ≃339 equations survive the reduction; by converse, the WTW can be reduced to a much smaller extent (to be more specific, only a comparatively small number of the original 324 equations defining the DBCM for the WTW in 1997 turns out to be redundant).

A note on the Hessian matrix is in order as well: since its dimension scales as O(N²) for Newton's method and as O(N) for the quasi-Newton one (which only employs its diagonal), its calculation can be (very) time demanding - beside requiring a lot of memory for the step-wise update of the matrix itself. However, while this is compensated by a larger accuracy in the case of Newton's method, this is no longer true when the quasi-Newton recipe is considered - the reason may lie in the poorer approximation provided by the diagonal of the Hessian matrix in case of systems like these.

As a last comment, we would like to stress that, as in the undirected case, the generic coefficient $p_{ij}^{\text{DBCM}}$ represents a proper probability, in turn implying that eq. (35) also provides us with a recipe to sample the canonical ensemble of BDNs, under the DBCM. Notice that the factorization of the graph probability $P_{\text{DBCM}}(\mathbf{A}|\vec\theta)$ greatly simplifies the entire procedure, allowing a single graph to be sampled by implementing the Bernoulli trial

$$a_{ij}=\begin{cases}0 & \text{with probability }1-p_{ij}^{\text{DBCM}}\\ 1 & \text{with probability }p_{ij}^{\text{DBCM}}\end{cases}\qquad(44)$$

for each (directed) pair of nodes, in either a sequential or a parallel fashion. The sampling process, whose computational complexity amounts at O(N²), can be repeated to generate as many configurations as desired. The pseudo-code for explicitly sampling the DBCM ensemble is summed up by Algorithm 2.
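The Bernoulli trial of eq. (44) translates almost literally into code: the sketch below (Python with numpy; an illustration, not the authors' implementation) draws one configuration from any matrix P of connection probabilities, vectorizing over all ordered pairs - the 'parallel fashion' mentioned above.

```python
import numpy as np

def sample_dbcm(P, seed=None):
    """Draw one binary directed network from the DBCM ensemble:
    a_ij = 1 with probability P[i, j], independently per pair (eq. 44)."""
    rng = np.random.default_rng(seed)
    A = (rng.random(P.shape) <= P).astype(int)
    np.fill_diagonal(A, 0)  # no self-loops
    return A

# degenerate check: p_ij = 1 for every pair yields the complete directed graph
A = sample_dbcm(np.ones((4, 4)))
print(A.sum())  # 12 = N(N-1) links
```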
Algorithm 2 Sampling the DBCM ensemble

for m = 1...|E| do
    A = 0;
    for i = 1...N do
        for j = 1...N and j ≠ i do
            if RandomUniform[0,1] ≤ p_ij^DBCM then
                a_ij = 1;
            else
                a_ij = 0;
            end if
        end for
    end for
    Ensemble[m] = A;
end for

BiCM: bipartite binary undirected graphs with given degree sequences

So far, we have considered monopartite networks. However, the algorithm we have described for solving the DBCM can be adapted, with little effort, to solve a null model designed for bipartite, binary, undirected networks (BiBUNs), i.e. the so-called
Bipartite Configuration Model (BiCM) [36]. These networks are defined by two distinct layers (say, ⊤ and ⊥) and obey the rule that links can exist only between (and not within) layers: for this reason, they can be compactly described via a biadjacency matrix $\mathbf{B}\equiv\{b_{i\alpha}\}_{i,\alpha}$ whose generic entry $b_{i\alpha}$ is 1 if node $i$, belonging to layer ⊥, is linked to node $\alpha$, belonging to layer ⊤, and 0 otherwise. The constraints defining the BiCM are represented by the degree sequences $\{k_i\}_{i\in\bot}$ and $\{d_\alpha\}_{\alpha\in\top}$, where $k_i=\sum_{\alpha\in\top}b_{i\alpha}$ counts the neighbors of node $i$ (belonging to layer ⊥) and $d_\alpha=\sum_{i\in\bot}b_{i\alpha}$ counts the neighbors of node $\alpha$ (belonging to layer ⊤).

Analogously to the DBCM case,

$$P(\mathbf{B}|\vec\gamma,\vec\beta)=\prod_{i\in\bot}\prod_{\alpha\in\top}p_{i\alpha}^{b_{i\alpha}}(1-p_{i\alpha})^{1-b_{i\alpha}}\qquad(45)$$

where $p_{i\alpha}=p_{i\alpha}^{\text{BiCM}}\equiv\frac{e^{-\gamma_i-\beta_\alpha}}{1+e^{-\gamma_i-\beta_\alpha}}$. The canonical ensemble of BiBUNs includes all networks with, say, N nodes on one layer, M nodes on the other layer and a number of links (connecting nodes of different layers) ranging from zero to the maximum value N·M.

The BiCM likelihood function reads

$$\mathcal{L}_{\text{BiCM}}(\vec\gamma,\vec\beta)=-\sum_{i\in\bot}\gamma_i k_i(\mathbf{B}^*)-\sum_{\alpha\in\top}\beta_\alpha d_\alpha(\mathbf{B}^*)-\sum_{i\in\bot}\sum_{\alpha\in\top}\ln\left[1+e^{-\gamma_i-\beta_\alpha}\right]\qquad(46)$$

whose first-order optimality conditions read

$$\nabla_{\gamma_i}\mathcal{L}_{\text{BiCM}}=-k_i(\mathbf{B}^*)+\sum_{\alpha\in\top}\frac{e^{-\gamma_i-\beta_\alpha}}{1+e^{-\gamma_i-\beta_\alpha}}=0,\quad i\in\bot,$$
$$\nabla_{\beta_\alpha}\mathcal{L}_{\text{BiCM}}=-d_\alpha(\mathbf{B}^*)+\sum_{i\in\bot}\frac{e^{-\gamma_i-\beta_\alpha}}{1+e^{-\gamma_i-\beta_\alpha}}=0,\quad\alpha\in\top.\qquad(47)$$

Resolution of the BiCM.
As for the DBCM case, Newton's and the quasi-Newton methods can be implemented by adapting the recipe defined in eq. (18) to the bipartite case (see Appendix A for the definition of the BiCM Hessian).

As for the DBCM, the fixed-point recipe for the BiCM can be re-written in the usual iterative fashion as follows

$$\gamma_i^{(n)}=-\ln\left[\frac{k_i(\mathbf{B}^*)}{\sum_{\alpha\in\top}\left(\frac{e^{-\beta_\alpha^{(n-1)}}}{1+e^{-\gamma_i^{(n-1)}-\beta_\alpha^{(n-1)}}}\right)}\right],\quad\forall\,i\in\bot$$
$$\beta_\alpha^{(n)}=-\ln\left[\frac{d_\alpha(\mathbf{B}^*)}{\sum_{i\in\bot}\left(\frac{e^{-\gamma_i^{(n-1)}}}{1+e^{-\gamma_i^{(n-1)}-\beta_\alpha^{(n-1)}}}\right)}\right],\quad\forall\,\alpha\in\top\qquad(48)$$

and the initialization is similar as well: in fact, we can employ the value of the solution of the BiCM in the sparse case, i.e. $\gamma_i^{(0)}=-\ln\left[k_i(\mathbf{B}^*)/\sqrt{L}\right]$, $\forall\,i\in\bot$ and $\beta_\alpha^{(0)}=-\ln\left[d_\alpha(\mathbf{B}^*)/\sqrt{L}\right]$, $\forall\,\alpha\in\top$ (only this set of initial conditions has been employed to analyse the bipartite case).

Reducing the dimensionality of the problem.
Exactly as for the DBCM case, the presence of nodes with the same degree, on the same layer, leads to the appearance of identical equations in the system above; hence, the computation of the solutions can be sped up by writing

$$\gamma_k^{(n)}=-\ln\left[\frac{k(\mathbf{B}^*)}{\sum_{d}g(d)\left(\frac{e^{-\beta_d^{(n-1)}}}{1+e^{-\gamma_k^{(n-1)}-\beta_d^{(n-1)}}}\right)}\right],\quad\forall\,k$$
$$\beta_d^{(n)}=-\ln\left[\frac{d(\mathbf{B}^*)}{\sum_{k}f(k)\left(\frac{e^{-\gamma_k^{(n-1)}}}{1+e^{-\gamma_k^{(n-1)}-\beta_d^{(n-1)}}}\right)}\right],\quad\forall\,d\qquad(49)$$

where f(k) is the number of nodes, belonging to layer ⊥, whose degree is k and g(d) is the number of nodes, belonging to layer ⊤, whose degree is d.

TABLE III. Performance of Newton's, the quasi-Newton and the fixed-point algorithm in solving the reduced system of eqs. (49) defining the BiCM on the 16 snapshots of the bipartite WTW (for each of them, the total number of nodes, N+M, the total number of links, L, and the connectance, c = L/(N·M), are provided, together with the MADE and the running time, in seconds, of each method). All algorithms stop because the condition on the norm of the gradient of the likelihood is satisfied. For what concerns both accuracy and speed, the best performing method is Newton's one, followed by the quasi-Newton and the fixed-point recipes. Only the results corresponding to the best choice of initial conditions are reported.

Performance testing.
The performance of the three algorithms in solving eqs. (49) has been tested on 16 snapshots of the bipartite, binary, undirected version of the WTW, gathering the country-product export relationships across the years 1995-2010 [36]. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the BiCM has been quantified via the maximum absolute error metrics that, now, reads

$$\text{MADE}=\max\left\{|k_1^*-\langle k_1\rangle|,\ldots,|k_N^*-\langle k_N\rangle|,\,|d_1^*-\langle d_1\rangle|,\ldots,|d_M^*-\langle d_M\rangle|\right\}\qquad(50)$$

to account for the degrees of nodes on both layers.

The three different 'stop criteria' match the ones adopted for analysing the UBCM and the DBCM and consist in a condition on the Euclidean norm of the gradient of the likelihood function, $||\nabla\mathcal{L}(\vec\theta)||_2$, falling below a prescribed threshold, in an analogous condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, $||\Delta\vec\theta||_2$, and in a condition on the maximum number of iterations: after 1000 steps, any of the three algorithms stops.

The results about the performance of our three algorithms are reported in table III. Overall, all recipes are accurate, fast and scalable; all methods stop because the condition on the norm of the gradient of the likelihood is satisfied. For what concerns accuracy, the largest maximum error per method spans a narrow interval across all configurations (see table III); by looking at each specific network, it is evident that the most accurate method is systematically Newton's one. For what concerns speed, the amount of time required to achieve convergence amounts, on average, at $T^{\text{reduced}}_{\text{Quasi-Newton}}\simeq0.016$ and $T^{\text{reduced}}_{\text{fixed-point}}\simeq0.018$ (time is measured in seconds); the fastest method is Newton's one and is followed by the quasi-Newton and the fixed-point recipes. Overall, the gain in terms of speed due to the reducibility of the system of equations defining the BiCM is evident: while solving the original problem would have required handling a system with as many equations as nodes, the reduced one is defined by far fewer, distinct equations. Overall, a solution is always found within thousandths or hundredths of seconds.

As for the DBCM case, the ensemble of BiBUNs can be sampled by implementing a Bernoulli trial $b_{i\alpha}\sim\text{Ber}[p_{i\alpha}]$ for any two nodes (belonging to different layers) in either a sequential or a parallel fashion. The computational complexity of the sampling process amounts at O(N·M) and it can be repeated to generate as many configurations as desired. The pseudo-code for explicitly sampling the BiCM ensemble is summed up by Algorithm 3.
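Algorithm 3 admits a fully vectorized formulation: since the entries $b_{i\alpha}$ are independent Bernoulli variables with probabilities $p_{i\alpha}=e^{-\gamma_i-\beta_\alpha}/(1+e^{-\gamma_i-\beta_\alpha})$, the whole biadjacency matrix can be drawn at once. The snippet below is an illustrative Python/numpy sketch, not the authors' code.

```python
import numpy as np

def sample_bicm(gamma, beta, seed=None):
    """Draw one N x M biadjacency matrix from the BiCM:
    b_{i,alpha} ~ Bernoulli(p_{i,alpha})."""
    rng = np.random.default_rng(seed)
    x = np.exp(-np.add.outer(gamma, beta))  # e^{-gamma_i - beta_alpha}
    P = x / (1.0 + x)
    return (rng.random(P.shape) <= P).astype(int)

# degenerate check: very negative parameters drive p -> 1,
# so the sampled bipartite graph is complete
B = sample_bicm(np.full(3, -50.0), np.full(2, -50.0))
print(B.shape, B.sum())  # (3, 2) 6
```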
Algorithm 3 Sampling the BiCM ensemble

for m = 1...|E| do
    B = 0;
    for i = 1...N do
        for α = 1...M do
            if RandomUniform[0,1] ≤ p_iα^BiCM then
                b_iα = 1;
            else
                b_iα = 0;
            end if
        end for
    end for
    Ensemble[m] = B;
end for

UECM: weighted undirected graphs with given strengths and degrees

So far, we have considered purely binary models. Let us now focus on a class of 'mixed' null models for weighted networks, defined by constraining both binary and weighted quantities (purely weighted models such as the Undirected and the Directed Weighted Configuration Model have not been considered since, as it has been proven elsewhere [37], they perform quite poorly when employed to reconstruct networks). Let us start by the simplest model, i.e. the one constraining the degrees and the strengths in an undirected fashion. While $k_i(\mathbf{A})=\sum_{j(\neq i)=1}^N a_{ij}$ counts the number of neighbors of node $i$, $s_i(\mathbf{W})=\sum_{j(\neq i)=1}^N w_{ij}$ defines the weighted equivalent of the degree of node $i$, i.e. its strength. For consistency, the binary adjacency matrix can be defined via the Heaviside step function, Θ[·], as $\mathbf{A}\equiv\Theta[\mathbf{W}]$, a position indicating that $a_{ij}=1$ if $w_{ij}>0$, $\forall\,i<j$, and zero otherwise. This particular model is known as Undirected Enhanced Configuration Model (UECM) [37-39] and its Hamiltonian reads

$$H_{\text{UECM}}(\mathbf{W},\vec\alpha,\vec\beta)=\sum_{i=1}^N\left[\alpha_i k_i(\mathbf{A})+\beta_i s_i(\mathbf{W})\right];\qquad(51)$$

it induces a probability distribution which is halfway between a Bernoulli and a geometric one [38], i.e.

$$Q_{\text{UECM}}(\mathbf{W}|\vec\alpha,\vec\beta)=\prod_{i=1}^N\prod_{j(<i)}q_{ij}(w_{ij})\qquad(52)$$

where, for any two nodes $i$ and $j$ such that $i<j$, the probability that a (positive) weight connects them reads

$$p_{ij}^{\text{UECM}}=\frac{e^{-\alpha_i-\alpha_j-\beta_i-\beta_j}}{1-e^{-\beta_i-\beta_j}+e^{-\alpha_i-\alpha_j-\beta_i-\beta_j}}.$$

Notice that the functional form above is obtained upon requiring that the weights only assume (non-negative) integer values (i.e. $w_{ij}\in[0,+\infty)$, $\forall\,i<j$): hence, the canonical ensemble is now constituted by the weighted configurations with N nodes and a number of (undirected) links ranging between zero and the maximum value $\binom{N}{2}$.

The argument of the problem (14) for the specific network $\mathbf{W}^*$ now becomes

$$\mathcal{L}_{\text{UECM}}(\vec\alpha,\vec\beta)=-\sum_{i=1}^N\left[\alpha_i k_i(\mathbf{A}^*)+\beta_i s_i(\mathbf{W}^*)\right]-\sum_{i=1}^N\sum_{j(<i)}\ln z_{ij}\qquad(55)$$

where $z_{ij}=\frac{1-e^{-\beta_i-\beta_j}+e^{-\alpha_i-\alpha_j-\beta_i-\beta_j}}{1-e^{-\beta_i-\beta_j}}$, $\forall\,i<j$.

Resolution of the UECM.
Newton's and the quasi-Newton methods can be easily implemented via the recipe defined in eq. (18) (see Appendix A for the definition of the UECM Hessian).

As for the purely binary models, the fixed-point recipe for solving the UECM first-order optimality conditions transforms the following set of consistency equations

$$\alpha_i=-\ln\left[\frac{k_i(\mathbf{A}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\alpha_j-\beta_i-\beta_j}}{1-e^{-\beta_i-\beta_j}+e^{-\alpha_i-\alpha_j-\beta_i-\beta_j}}\right)}\right],$$
$$\beta_i=-\ln\left[\frac{s_i(\mathbf{W}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\alpha_i-\alpha_j-\beta_j}}{(1-e^{-\beta_i-\beta_j})(1-e^{-\beta_i-\beta_j}+e^{-\alpha_i-\alpha_j-\beta_i-\beta_j})}\right)}\right]\qquad(56)$$

(with i = 1...N) into the usual iterative fashion, by considering the parameters at the left hand side and at the right hand side, respectively at the n-th and at the (n−1)-th iteration.
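The fixed-point map of eq. (56) lends itself to a compact vectorized implementation. The following sketch (Python with numpy; names are our own illustrative choices, not the code accompanying the paper) performs one sweep and checks the defining property of the map: parameters that already reproduce the observed degrees and strengths are left unchanged.

```python
import numpy as np

def uecm_step(alpha, beta, k, s):
    """One iteration of the fixed-point map of eq. (56) for the UECM."""
    a = np.add.outer(alpha, alpha)        # alpha_i + alpha_j
    b = np.add.outer(beta, beta)          # beta_i + beta_j
    eb, eab = np.exp(-b), np.exp(-a - b)
    D = 1.0 - eb + eab                    # shared denominator
    num_a = np.exp(-alpha[None, :] - b)   # e^{-alpha_j - beta_i - beta_j}
    num_b = np.exp(-a - beta[None, :])    # e^{-alpha_i - alpha_j - beta_j}
    np.fill_diagonal(num_a, 0.0)          # exclude the j = i terms
    np.fill_diagonal(num_b, 0.0)
    alpha_new = -np.log(k / (num_a / D).sum(axis=1))
    beta_new = -np.log(s / (num_b / ((1.0 - eb) * D)).sum(axis=1))
    return alpha_new, beta_new

# consistency check: feed the map the expected degrees/strengths generated
# by (alpha, beta) itself -- the parameters must be a fixed point
alpha = np.array([0.5, 1.0, 1.5])
beta = np.array([0.3, 0.6, 0.9])
a, b = np.add.outer(alpha, alpha), np.add.outer(beta, beta)
eb, eab = np.exp(-b), np.exp(-a - b)
P = eab / (1.0 - eb + eab)
np.fill_diagonal(P, 0.0)
k, s = P.sum(axis=1), (P / (1.0 - eb)).sum(axis=1)
a_new, b_new = uecm_step(alpha, beta, k, s)
print(np.allclose(a_new, alpha), np.allclose(b_new, beta))  # True True
```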
TABLE IV. Performance of Newton's and the quasi-Newton method in solving the system of equations defining the UECM on the considered snapshots of the WTW (for each of them, the total number of nodes, N, the total number of links, L, and the connectance, c = 2L/N(N−1), are provided, together with the MRDE, the MASE, the MRSE and the running time, in seconds, of each method). While Newton's method stops because the condition on the norm of the gradient of the likelihood is satisfied, the quasi-Newton one always reaches the limit of 10000 steps. The results on accuracy and speed clearly indicate that Newton's method outperforms the quasi-Newton one. Only the results corresponding to the best choice of initial conditions are reported. The results of the fixed-point recipe are not shown.

A reduced version of the iterative recipe above can indeed be written, by assigning the same pair of values (α, β) to the nodes with the same pair of values (k, s): however, the larger heterogeneity of the strengths causes this event to happen more rarely than for purely binary models such as the UBCM and the DBCM.

As for the purely binary cases, three different sets of initial conditions have been considered, whose definition follows from the simplest conceivable generalization of the purely binary cases. In particular, the first set of values reads $\alpha_i^{(0)}=-\ln\left[k_i(\mathbf{A}^*)/\sqrt{L}\right]$ and $\beta_i^{(0)}=-\ln\left[s_i(\mathbf{W}^*)/\sqrt{W}\right]$, i = 1...N; the second set is a variant of the first, reading $\alpha_i^{(0)}=-\ln\left[k_i(\mathbf{A}^*)/\sqrt{N}\right]$ and $\beta_i^{(0)}=-\ln\left[s_i(\mathbf{W}^*)/\sqrt{N}\right]$, i = 1...N; the third recipe, instead, prescribes to randomly draw the value of each parameter from the uniform distribution defined on the unit interval, i.e. $\alpha_i^{(0)}\sim U(0,1)$, $\forall\,i$, and $\beta_i^{(0)}\sim U(0,1)$, $\forall\,i$.

Performance testing.
The performance of the three algorithms in solving the system of equations defining the UECM has been tested on a bunch of real-world networks. In particular, we have considered the WTW during the decade 1990-2000 [40]. Since the weights defining the configurations of the WTW are real numbers, we have rounded them to the nearest integer value, before running the UECM. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the UECM has been now quantified via the maximum relative error metrics, defined, in a perfectly general fashion, as $\max_{i=1\ldots N}\left\{\frac{|C_i^*-\langle C_i\rangle|}{C_i}\right\}$ (where $C_i^*$ is the empirical value of the i-th constraint, $C_i$). In the UECM case, we can define two variants of the aforementioned error, i.e.

$$\text{MRDE}=\max_{i=1\ldots N}\left\{\frac{|k_i^*-\langle k_i\rangle|}{k_i}\right\}\qquad(57)$$
$$\text{MRSE}=\max_{i=1\ldots N}\left\{\frac{|s_i^*-\langle s_i\rangle|}{s_i}\right\}\qquad(58)$$

(the acronyms standing for Maximum Relative Degree Error and Maximum Relative Strength Error). The reason driving this choice lies in the evidence that, in absolute terms, strengths are affected by a larger numerical error than degrees: this, however, doesn't necessarily mean that a given algorithm performs poorly, as the magnitude of an error must always be compared with the numerical value of the quantity it refers to - whence the choice of considering relative scores.

The three different 'stop criteria' we have considered for each algorithm match the ones adopted for analysing the binary cases, consisting in a condition on the Euclidean norm of the gradient of the likelihood function, $||\nabla\mathcal{L}(\vec\theta)||_2$, falling below a prescribed threshold, and in an analogous condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, $||\Delta\vec\theta||_2$.
The third condition concerns the maximum number of iterations: after 10000 steps, any of the three algorithms stops.

The results about the performance of our three algorithms are reported in table IV. Overall, two out of three algorithms (i.e. Newton's and the quasi-Newton methods) perform very satisfactorily, being accurate, fast and scalable; the third one (i.e. the fixed-point recipe), instead, performs very poorly. Moreover, while Newton's method stops because the condition on the norm of the gradient of the likelihood is satisfied, both the quasi-Newton and the fixed-point algorithms are always found to satisfy the limit condition on the number of steps (i.e. they run for 10000 steps and, then, stop).

For what concerns accuracy, the largest maximum errors made by Newton's method (the MRDE and the MRSE, across all configurations) are systematically smaller than those made by the quasi-Newton method; the precise intervals are reported in table IV. For what concerns speed, Newton's method employs tenths of seconds to achieve convergence on each configuration while the quasi-Newton one always requires tens of seconds (specifically, almost thirty seconds for each considered configuration). The results above indicate that the fastest and most accurate method is systematically Newton's one, suggesting that the 'complexity' of the model is such that the information encoded into the Hessian matrix cannot be ignored without consequences on the quality of the solution. The fixed-point algorithm, instead, stops after seconds but is affected by errors whose order of magnitude is systematically large, the MRSE exceeding one.

We also explicitly notice that the MADE basically coincides with the MRDE for all considered configurations, meaning that the largest error made by the algorithms considered here to solve the UECM affects the nodes with the lowest degree (i.e. equal to one). On the other hand, strengths are affected by a larger absolute error (i.e. the MASE, defined as $\text{MASE}=\max_{i=1\ldots N}\left\{|s_i^*-\langle s_i\rangle|\right\}$) than the degrees: if we calculate the MRSE, however, we realize that the largest errors affect very large strengths - hence being perfectly acceptable. For example, let us consider the WTW in 1993: the MASE amounts at 0.1 but, as the MRSE reveals, it affects a very large strength.

Lastly, differences in the speed of convergence of the two methods discussed in this section, caused by the choice of a particular set of initial conditions, are observable: the 'uniform' prescription outperforms the other ones.

Finally, let us comment on the algorithm to sample the UECM ensemble, which can be compactly achieved by implementing a two-step procedure. Let us look back at the formal expression for the pair-specific probability distribution characterizing the UECM: it induces coefficients reading

$$q_{ij}(w)=\begin{cases}1-p_{ij}^{\text{UECM}} & w=0\\ p_{ij}^{\text{UECM}}(1-e^{-\beta_i-\beta_j}) & w=1\\ p_{ij}^{\text{UECM}}\,e^{-\beta_i-\beta_j}(1-e^{-\beta_i-\beta_j}) & w=2\\ p_{ij}^{\text{UECM}}\,(e^{-\beta_i-\beta_j})^2(1-e^{-\beta_i-\beta_j}) & w=3\\ \quad\vdots\end{cases}\qquad(59)$$

in turn suggesting that, for a specific pair of vertices i, j (with i < j), the appearance of the first link is ruled by a Bernoulli distribution with probability $p_{ij}^{\text{UECM}}$ while the remaining (w−1) units of weight can be drawn from a geometric distribution whose parameter reads $e^{-\beta_i-\beta_j}$; in other words, the weight (w−1) is drawn conditionally on the presence of a connection between the two considered nodes. The computational complexity of the sampling process is, again, O(N²). The pseudo-code for explicitly sampling the UECM ensemble is summed up by Algorithm 4.

DECM: weighted directed graphs with given strengths and degrees
Let us now extend the 'mixed' model introduced in the previous section to the case of directed networks. Constraints are, now, represented by four sequences of values, i.e. $\{k_i^{out}\}_{i=1}^N$, $\{k_i^{in}\}_{i=1}^N$, $\{s_i^{out}\}_{i=1}^N$, $\{s_i^{in}\}_{i=1}^N$, where the generic out-degree and in-degree are, respectively, defined as $k_i^{out}(\mathbf{A})=\sum_{j(\neq i)=1}^N a_{ij}$ and $k_i^{in}(\mathbf{A})=\sum_{j(\neq i)=1}^N a_{ji}$, and analogously for the generic out-strength and in-strength, reading $s_i^{out}(\mathbf{W})=\sum_{j(\neq i)=1}^N w_{ij}$ and $s_i^{in}(\mathbf{W})=\sum_{j(\neq i)=1}^N w_{ji}$. Consistency requires that $\mathbf{A}\equiv\Theta[\mathbf{W}]$, as for the UECM case. This model is known as Directed Enhanced Configuration Model (DECM) and its Hamiltonian reads

$$H_{\text{DECM}}(\mathbf{W},\vec\alpha,\vec\beta,\vec\gamma,\vec\delta)=\sum_{i=1}^N\left[\alpha_i k_i^{out}(\mathbf{A})+\beta_i k_i^{in}(\mathbf{A})+\gamma_i s_i^{out}(\mathbf{W})+\delta_i s_i^{in}(\mathbf{W})\right]\qquad(60)$$

in turn, inducing the directed counterpart of the UECM distribution, i.e.

$$Q_{\text{DECM}}(\mathbf{W}|\vec\alpha,\vec\beta,\vec\gamma,\vec\delta)=\prod_{i=1}^N\prod_{j(\neq i)=1}^N q_{ij}(w_{ij}).\qquad(61)$$
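The two-step UECM sampling scheme recalled at the end of the previous section (a Bernoulli trial places the link, a geometric variable adds the excess weight; cf. Algorithm 4) can be sketched as follows. This is an illustrative Python/numpy snippet, not the authors' implementation; note that numpy's geometric generator counts the trials up to the first success, so calling it with success probability $1-e^{-\beta_i-\beta_j}$ directly returns the total weight $w\geq1$ distributed as $(e^{-\beta_i-\beta_j})^{w-1}(1-e^{-\beta_i-\beta_j})$.

```python
import numpy as np

def sample_uecm(P, beta, seed=None):
    """Two-step UECM sampling: for each pair i < j, a Bernoulli trial with
    probability P[i, j] places the link; if present, the weight follows
    P(w) = q^(w-1) * (1 - q) with q = e^{-beta_i - beta_j} (eq. 59)."""
    rng = np.random.default_rng(seed)
    N = len(beta)
    W = np.zeros((N, N), dtype=int)
    q = np.exp(-np.add.outer(beta, beta))
    for i in range(N):
        for j in range(i):
            if rng.random() <= P[i, j]:
                # rng.geometric(p) has support {1, 2, ...}: it returns the
                # total weight, i.e. 1 plus the geometric excess weight
                W[i, j] = W[j, i] = rng.geometric(1.0 - q[i, j])
    return W

# degenerate check: p_ij = 1 places every link, weights are at least 1
W = sample_uecm(np.ones((4, 4)), np.full(4, 2.0), seed=0)
print((W == W.T).all(), (W[np.triu_indices(4, 1)] >= 1).all())  # True True
```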
Algorithm 4 Sampling the UECM ensemble

for m = 1...|E| do
    W = 0;
    for i = 1...N do
        for j = 1...N and j < i do
            if RandomUniform[0,1] ≤ p_ij^UECM then
                w_ij = w_ji = RandomGeometric[e^{−β_i−β_j}];
            else
                w_ij = w_ji = 0;
            end if
        end for
    end for
    Ensemble[m] = W;
end for

The pair-wise probability distribution induced by the DECM reads

$$q_{ij}(w)=\begin{cases}1-p_{ij}^{\text{DECM}} & w=0\\ p_{ij}^{\text{DECM}}\,(e^{-\gamma_i-\delta_j})^{w-1}(1-e^{-\gamma_i-\delta_j}) & w>0\end{cases}\qquad(62)$$

for any two nodes $i$ and $j$ such that $i\neq j$, with $p_{ij}^{\text{DECM}}=\frac{e^{-\alpha_i-\beta_j-\gamma_i-\delta_j}}{1-e^{-\gamma_i-\delta_j}+e^{-\alpha_i-\beta_j-\gamma_i-\delta_j}}$. As for the undirected case, weights are required to assume only (non-negative) integer values (i.e. $w_{ij}\in[0,+\infty)$, $\forall\,i\neq j$): hence, the canonical ensemble is constituted by the weighted configurations with N nodes and a number of (directed) links ranging between zero and the maximum value N(N−1).

The argument of the problem (14) for the specific network $\mathbf{W}^*$ becomes

$$\mathcal{L}_{\text{DECM}}(\vec\alpha,\vec\beta,\vec\gamma,\vec\delta)=-\sum_{i=1}^N\left[\alpha_i k_i^{out}(\mathbf{A}^*)+\beta_i k_i^{in}(\mathbf{A}^*)+\gamma_i s_i^{out}(\mathbf{W}^*)+\delta_i s_i^{in}(\mathbf{W}^*)\right]-\sum_{i=1}^N\sum_{j(\neq i)=1}^N\ln z_{ij}\qquad(63)$$

where $z_{ij}=1+e^{-\alpha_i-\beta_j}\left(\frac{e^{-\gamma_i-\delta_j}}{1-e^{-\gamma_i-\delta_j}}\right)$, $\forall\,i\neq j$, and whose first-order optimality conditions read

$$\nabla_{\alpha_i}\mathcal{L}_{\text{DECM}}=-k_i^{out}(\mathbf{A}^*)+\sum_{j(\neq i)=1}^N p_{ij}^{\text{DECM}}=-k_i^{out}(\mathbf{A}^*)+\langle k_i^{out}\rangle=0,$$
$$\nabla_{\beta_i}\mathcal{L}_{\text{DECM}}=-k_i^{in}(\mathbf{A}^*)+\sum_{j(\neq i)=1}^N p_{ji}^{\text{DECM}}=-k_i^{in}(\mathbf{A}^*)+\langle k_i^{in}\rangle=0,$$
$$\nabla_{\gamma_i}\mathcal{L}_{\text{DECM}}=-s_i^{out}(\mathbf{W}^*)+\sum_{j(\neq i)=1}^N\frac{p_{ij}^{\text{DECM}}}{1-e^{-\gamma_i-\delta_j}}=-s_i^{out}(\mathbf{W}^*)+\langle s_i^{out}\rangle=0,$$
$$\nabla_{\delta_i}\mathcal{L}_{\text{DECM}}=-s_i^{in}(\mathbf{W}^*)+\sum_{j(\neq i)=1}^N\frac{p_{ji}^{\text{DECM}}}{1-e^{-\gamma_j-\delta_i}}=-s_i^{in}(\mathbf{W}^*)+\langle s_i^{in}\rangle=0,\quad i=1\ldots N.\qquad(64)$$

Resolution of the DECM.
Newton's and the quasi-Newton methods can be easily implemented via the recipe defined in eq. (18) (see Appendix A for the definition of the DECM Hessian).

As for the UECM, the fixed-point recipe for solving the DECM first-order optimality conditions transforms the following set of consistency equations

$$\alpha_i=-\ln\left[\frac{k_i^{out}(\mathbf{A}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\beta_j-\gamma_i-\delta_j}}{1-e^{-\gamma_i-\delta_j}+e^{-\alpha_i-\beta_j-\gamma_i-\delta_j}}\right)}\right],$$
$$\beta_i=-\ln\left[\frac{k_i^{in}(\mathbf{A}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\alpha_j-\gamma_j-\delta_i}}{1-e^{-\gamma_j-\delta_i}+e^{-\alpha_j-\beta_i-\gamma_j-\delta_i}}\right)}\right],$$
$$\gamma_i=-\ln\left[\frac{s_i^{out}(\mathbf{W}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\alpha_i-\beta_j-\delta_j}}{(1-e^{-\gamma_i-\delta_j})(1-e^{-\gamma_i-\delta_j}+e^{-\alpha_i-\beta_j-\gamma_i-\delta_j})}\right)}\right],$$
$$\delta_i=-\ln\left[\frac{s_i^{in}(\mathbf{W}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\alpha_j-\beta_i-\gamma_j}}{(1-e^{-\gamma_j-\delta_i})(1-e^{-\gamma_j-\delta_i}+e^{-\alpha_j-\beta_i-\gamma_j-\delta_i})}\right)}\right]\qquad(65)$$

(with i = 1...N) into the usual iterative fashion, by considering the parameters at the left hand side and at the right hand side, respectively at the n-th and at the (n−1)-th iteration. A reduced version of the recipe above can be written by assigning the same quadruplet of values (α, β, γ, δ) to the nodes for which the quantities (k^out, k^in, s^out, s^in) have the same value: however, the larger heterogeneity of the strengths causes the DECM to be much less reducible than the purely binary models we have considered in the present contribution.

The three different sets of initial conditions that have been considered generalize the UECM ones: in particular, the first set of values reads $\alpha_i^{(0)}=-\ln\left[k_i^{out}(\mathbf{A}^*)/\sqrt{L}\right]$, $\beta_i^{(0)}=-\ln\left[k_i^{in}(\mathbf{A}^*)/\sqrt{L}\right]$, $\gamma_i^{(0)}=-\ln\left[s_i^{out}(\mathbf{W}^*)/\sqrt{W}\right]$ and $\delta_i^{(0)}=-\ln\left[s_i^{in}(\mathbf{W}^*)/\sqrt{W}\right]$, i = 1...N; the second set of initial conditions can be obtained by simply replacing L with N; the third recipe, as usual, prescribes to randomly draw the value of each parameter from the uniform distribution defined on the unit interval.

Performance testing.
The performance of the three algorithms in solving the system of equations defining the DECM has been tested on a bunch of real-world networks. In particular, we have considered the Electronic Italian Interbank Market (e-MID) during the decade 2000-2010 [41]. Since e-MID weights are real numbers, we have rounded them to the nearest integer value, before running the DECM. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the DECM has been quantified via the maximum relative error metrics, now reading

$$\text{MRDE}=\max_{i=1\ldots N}\left\{\frac{|k_i^*-\langle k_i\rangle|}{k_i},\frac{|h_i^*-\langle h_i\rangle|}{h_i}\right\}\qquad(66)$$
$$\text{MRSE}=\max_{i=1\ldots N}\left\{\frac{|s_i^*-\langle s_i\rangle|}{s_i},\frac{|t_i^*-\langle t_i\rangle|}{t_i}\right\}\qquad(67)$$

(the acronyms standing for Maximum Relative Degree Error and Maximum Relative Strength Error), where we have defined $k^{out}\equiv k$, $k^{in}\equiv h$, $s^{out}\equiv s$ and $s^{in}\equiv t$ in order to simplify the formalism.

The three different 'stop criteria' we have adopted are the same ones we have considered for both the binary and the undirected, 'mixed' model, i.e. the condition on the Euclidean norm of the gradient of the likelihood function, $||\nabla\mathcal{L}(\vec\theta)||_2$, falling below a prescribed threshold, the analogous condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, $||\Delta\vec\theta||_2$, and the condition on the maximum number of iterations (i.e. after 10000 steps, any of the three algorithms stops).

The results about the performance of our three algorithms are reported in table V. Overall, Newton's method performs very satisfactorily, being accurate, fast and scalable; the quasi-Newton method is accurate as well, although (in some cases, much) slower. The fixed-point recipe, instead, performs very poorly, as for the undirected case.
Moreover, while Newton's method stops because the condition on the norm of the gradient of the likelihood is satisfied, both the quasi-Newton and the fixed-point algorithms are always found to satisfy the limit condition on the number of steps (i.e. they run for 10000 steps and, then, stop).

For what concerns accuracy, the largest maximum errors made by Newton's method (the MRDE and the MRSE, across all configurations) are systematically smaller than those made by the quasi-Newton method; the precise intervals are reported in table V. For what concerns speed, Newton's method employs tens of seconds to achieve convergence on each configuration; the time required by the quasi-Newton method is of the same order of magnitude, although it is systematically larger than the time required by Newton's one. Overall, these results indicate that the fastest and most accurate method is Newton's one. As in the undirected case, the fixed-point algorithm stops after seconds but is affected by errors whose order of magnitude is systematically large, the MRSE exceeding one.

As for the UECM, the MADE basically coincides with the MRDE, for all considered configurations, while strengths are affected by a larger absolute error than the degrees: still, upon calculating the MRSE, we realize that the largest errors affect very large strengths - hence being perfectly acceptable.

Lastly, differences in the speed of convergence of the two methods discussed in this section, caused by the choice of a particular set of initial conditions, are observable: the 'uniform' prescription outperforms the other ones.

Finally, let us comment on the algorithm to sample the DECM ensemble: as for the UECM, it can be compactly achieved by implementing the directed counterpart of the two-step procedure described above. Given a specific pair of vertices i, j (with i ≠ j), the first link can be drawn by sampling a Bernoulli distribution with probability $p_{ij}^{\text{DECM}}$ while the remaining (w−1) units of weight can be drawn from a geometric distribution whose parameter reads $e^{-\gamma_i-\delta_j}$. The computational complexity of the sampling process is, again, O(N²) and the pseudo-code for explicitly sampling the DECM ensemble is summed up by Algorithm 5.

Two-step models for undirected and directed networks
The need of considering network models defined in atwo-step fashion arises from a number of considerations.First, the amount of information concerning binary and8
Newton Quasi-Newton
N L c
MRDE MASE MRSE Time (s) MRDE MASE MRSE Time (s)e-MID ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ · − ≃ . ≃ . · − ≃ ≃ · − ≃ . ≃ . ≃ . · − ≃ · − ≃ · − ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ · − ≃ ≃ . · − ≃ . ≃ . ≃ · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ · − ≃ ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ . · − ≃ . N , the total number of links, L ,and the connectance, c = L/N ( N − ||∇ L ( ~θ ) || ≤ − issatisfied, the quasi-Newton one always reaches the limit of 10000 steps. The results on accuracy and speed clearly indicate thatNewton’s method outperforms the quasi-Newton one. Only the results corresponding to the best choice of initial conditionsare reported. The results of the fixed-point recipe are not shown. weighted quantities is often asymmetric: as it has beenpointed out in [42], information concerning a given net-work structure ranges from the knowledge of just a single,aggregated piece of information (e.g. the link density) tothat of entire subgraphs. Indeed, models exist that takeas input any binary, either probabilistic or determinis-tic, network model (i.e. 
any P(A)), while placing link weights optimally, conditionally on the input configurations [19, 42].

Second, recipes like the UECM and the DECM are, generally speaking, difficult to solve; as we have already observed, only Newton's method performs in a satisfactory way, both for what concerns accuracy and speed: hence, easier-to-solve recipes are welcome.

In what follows, we will consider the conditional reconstruction method (hereby, CReM) induced by the Hamiltonian

H_CReM(W, θ) = Σ_{i=1}^N θ_i s_i(W); (68)

in the case of undirected networks, it induces a conditional probability distribution reading

Q(W|A) = Π_{i=1}^N Π_{j=1 (j<i)}^N q_ij(w_ij|a_ij),

with q_ij(w|a_ij = 1) = (θ_i + θ_j) e^{−(θ_i+θ_j)w} for w > 0 (and zero otherwise). The corresponding likelihood, evaluated on the observed weights W*, becomes

G_CReM = −Σ_{i=1}^N θ_i s_i(W*) + Σ_{i=1}^N Σ_{j=1 (j<i)}^N f_ij ln(θ_i + θ_j),

where f_ij is the probability that nodes i and j are linked under the chosen binary model.
Newton's and the quasi-Newton method can still be implemented via the recipe defined in eq. (18) (see Appendix A for the definition of the CReM Hessian). As for the UECM and the DECM, the fixed-point recipe for solving the system of equations embodying the CReM transforms the set of consistency equations

θ_i = [Σ_{j=1 (j≠i)}^N f_ij (1 + θ_j/θ_i)^{-1}] / s_i(W*), i = 1 ... N (73)

into an iterative recipe of the usual form, i.e. by considering the parameters at the left hand side and at the right hand side, respectively, at the n-th and at the (n − 1)-th iteration.

Three different sets of initial conditions have been considered: the first one reads θ_i^(0) = −ln[s_i(W*)/√W], i = 1 ... N; the second one is a variant of the position above, reading θ_i^(0) = −ln[s_i(W*)/√N]; the third one, instead, prescribes to randomly draw the value of each parameter from the uniform distribution defined on the unit interval, i.e. θ_i^(0) ∼ U(0, 1), ∀ i.

When considering directed networks, the conditional probability distribution defining the CReM reads

q_ij(w_ij | a_ij = 1) = (α_i + β_j) e^{−(α_i+β_j)w}, w > 0 (74)

(and zero for w ≤ 0), for all i and j such that i ≠ j; the set of equations (73) can be generalized as follows

α_i = [Σ_{j=1 (j≠i)}^N f_ij (1 + β_j/α_i)^{-1}] / s_i^out(W*), i = 1 ... N
β_i = [Σ_{j=1 (j≠i)}^N f_ji (1 + α_j/β_i)^{-1}] / s_i^in(W*), i = 1 ... N (75)

and analogously for the sets of values initializing them.

Algorithm 5
Sampling the DECM ensemble

for m = 1 ... |E| do
  W = 0;
  for i = 1 ... N do
    for j = 1 ... N and j ≠ i do
      if RandomUniform[0, 1] ≤ p^DECM_ij then
        w_ij = RandomGeometric[e^{−γ_i−δ_j}];
      else
        w_ij = 0;
      end if
    end for
  end for
  Ensemble[m] = W;
end for

Rescaling the CReM algorithm.
Although the equations defining the CReM algorithm cannot be effectively reduced, they can be opportunely rescaled. To this aim, let us consider directed configurations and the system

Σ_{j=1 (j≠i)}^N f_ij / [α_i(κ) + β_j(κ)] = s_i^out(W*)/κ, ∀ i
Σ_{j=1 (j≠i)}^N f_ji / [α_j(κ) + β_i(κ)] = s_i^in(W*)/κ, ∀ i (76)

where the sufficient statistics has been divided by an opportunely defined factor (in this case, κ) and the symbols α_i(κ), α_j(κ), β_i(κ) and β_j(κ) stress that the solution we are searching for is a function of the parameter κ itself. In fact, a solution of the system above reads

α_i* = α_i*(κ)/κ, ∀ i (77)
β_i* = β_i*(κ)/κ, ∀ i (78)

as can be proven upon substituting it back into eqs. (76) and noticing that {α_i*}_{i=1}^N and {β_i*}_{i=1}^N are solutions of the system of equations (76) with κ = 1. As our likelihood maximization problem admits a unique, global maximum, the prescription above allows us to easily identify it. Rescaling will be tested in order to find out whether it enhances our algorithms in some respect (e.g. accuracy or speed).

Performance testing.
Before commenting on the performance of the three algorithms in solving the system of equations defining the CReM, let us stress once more that the formulas presented so far are perfectly general, working for any binary recipe one may want to employ. In what follows, we will test the CReM by posing f_ij ≡ p^UBCM_ij and f_ij ≡ p^DBCM_ij.

To test the effectiveness of our algorithms in solving the CReM on undirected networks we have considered the synaptic network of the worm C. elegans [28] and the eight daily snapshots of the Bitcoin Lightning
[Table VI: performance of Newton's, the quasi-Newton and the fixed-point method in solving the undirected CReM on daily snapshots of the Bitcoin Lightning Network (BLN), reporting the MRDE, MASE, MRSE and the time (in seconds) required to achieve convergence; the numeric entries are not recoverable here. While the other methods stop because the condition on the norm of the likelihood gradient is satisfied, the quasi-Newton one often reaches the limit of 10000 steps. The results on accuracy and speed indicate that Newton's and the fixed-point method compete, outperforming the quasi-Newton one. Only the results corresponding to the best choice of initial conditions are reported.]

Network [32]; the directed version of the CReM has, instead, been solved on the Electronic Italian Interbank Market (e-MID) during the decade 2000-2010 [41]. Before commenting on the results of our numerical exercises, let us first describe how the latter have been carried out.

As for the discrete 'mixed' models, the accuracy of each algorithm in reproducing the constraints defining the CReM has been quantified via the Maximum Relative Degree Error and the Maximum Relative Strength Error metrics, whose definitions are provided by eqs. (57), (58) and (66), (67) for the undirected and the directed case, respectively. Analogously, the three 'stop criteria' for each algorithm are the same ones that we have adopted for the other models (and consist in a condition on the Euclidean norm of the gradient of the likelihood function, ||∇L(θ)||, falling below a fixed threshold; a condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, ||Δθ||, falling below a fixed threshold; and a condition on the maximum number of iterations, i.e.
10000 steps).

The results about the performance of our three algorithms are reported in table VI and in table VII. Let us start by commenting on the results reported in table VI, concerning undirected networks. Generally speaking, Newton's method is the most accurate one, although it scales very badly with the size of the network on which it is tested: the amount of time required by it to achieve convergence steadily grows across the considered configurations. The accuracy and the running time of the quasi-Newton method are intermediate between those of the other two algorithms. The fixed-point method is, on average, the least accurate one for what concerns the weighted constraints, although it outperforms Newton's method, sometimes. For what concerns scalability, the fixed-point method is the least sensitive one to the growing size of the considered configurations: hence, it is also the fastest one. The bad scaling of Newton's and the quasi-Newton method with the network size is due to the evaluation of the Hessian matrix: the number of its entries to be numerically evaluated at each step - O(N²) for Newton's method and O(N) for the quasi-Newton one - can make its calculation (very) time demanding.
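For concreteness, the relative- and absolute-error scores used throughout this section reduce to one-line computations; the following is a minimal NumPy sketch (the function names are ours, not those of eqs. (57)-(67) or of any released code):

```python
import numpy as np

def max_relative_error(observed, expected):
    """MRDE/MRSE-style score: largest relative deviation between the observed
    constraints (degrees or strengths) and their ensemble averages."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.max(np.abs(observed - expected) / observed))

def max_absolute_error(observed, expected):
    """MADE/MASE-style score: largest absolute deviation."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.max(np.abs(observed - expected)))

# e.g. observed degrees vs. their expected values under a fitted model
mrde = max_relative_error([3.0, 5.0, 2.0], [2.97, 5.06, 1.99])
```

The relative scores are dimensionless, which is why the text can compare the errors on degrees and strengths on the same footing; a large absolute error on a very large strength can thus coexist with a small relative one.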
[Table VII: performance of Newton's, the quasi-Newton and the fixed-point method in solving the directed CReM on yearly snapshots of e-MID, reporting the MRDE, MRSE and the time (in seconds) required to achieve convergence, together with N and L; the numeric entries are not recoverable here. All methods stop because the condition on the norm of the likelihood gradient is satisfied. For what concerns accuracy, the two most accurate methods are Newton's and the quasi-Newton one; for what concerns speed, the fastest method is the fixed-point one. Only the results corresponding to the best choice of initial conditions are reported.]

Let us now move to comment on the performance of our algorithms when applied to solve the directed version of the CReM (see table VII). Overall, all methods perform much better than in the undirected case, stopping because the condition on the norm of the likelihood gradient is satisfied. In fact, all of them are very accurate in reproducing the purely binary constraints; for what concerns the weighted constraints, instead, the two most accurate methods are Newton's and the quasi-Newton one, the fixed-point method performing worse than both. For what concerns speed, all methods achieve convergence within a matter of seconds on each configuration, the fixed-point one being systematically the fastest.

As already stressed above, the fact that the e-MID number of nodes remains approximately constant throughout the considered time interval masks the strong dependence of the performance of Newton's and the quasi-Newton method on the network size.

Lastly, while rescaling the system of equations defining the CReM improves neither the accuracy nor the speed of any of the three algorithms considered here, differences in their speed of convergence, caused by the choice of a particular set of initial conditions, are observable: the 'uniform' prescription outperforms the other ones (for both the undirected and the directed version of the CReM).

As usual, let us comment on the algorithm to sample the CReM ensemble - for the sake of simplicity, in the undirected case. As for the UECM, it can be compactly achieved by implementing a two-step procedure, the only difference lying in the functional form of the distribution from which weights are sampled. Given a specific pair of vertices i, j (with i < j), the first link can be drawn from a Bernoulli distribution with probability p^UBCM_ij, while the corresponding weight can be drawn from an exponential distribution whose parameter reads θ_i + θ_j. The computational complexity of the sampling process is, again, O(N²), and the pseudo-code for explicitly sampling the CReM ensemble is summed up by Algorithm 6.

DISCUSSION
The exercises carried out so far have highlighted anumber of (stylized) facts concerning the performanceof the three algorithms tested: in what follows, we willbriefly sum them up.
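Before going through them, the three update schemes can be made concrete on the undirected CReM system Σ_{j≠i} f_ij/(θ_i+θ_j) = s_i. The sketch below (plain NumPy, with our own naming; an illustration, not the NEMTROPY implementation) contrasts a full Newton step, its diagonal quasi-Newton approximation and the fixed-point map of eq. (73):

```python
import numpy as np

def grad(theta, f, s):
    # gradient of the CReM log-likelihood: -s_i + sum_{j != i} f_ij/(theta_i + theta_j)
    pair = f / (theta[:, None] + theta[None, :])
    np.fill_diagonal(pair, 0.0)
    return pair.sum(axis=1) - s

def hessian(theta, f):
    # H_ij = -f_ij/(theta_i + theta_j)^2 for i != j, H_ii = sum_{j != i} H_ij
    h = -f / (theta[:, None] + theta[None, :]) ** 2
    np.fill_diagonal(h, 0.0)
    np.fill_diagonal(h, h.sum(axis=1))
    return h

def newton_step(theta, f, s):
    # full Newton update: requires assembling and solving with the whole Hessian
    return theta - np.linalg.solve(hessian(theta, f), grad(theta, f, s))

def quasi_newton_step(theta, f, s):
    # diagonal (quasi-Newton) approximation: only N Hessian entries are used
    return theta - grad(theta, f, s) / np.diag(hessian(theta, f))

def fixed_point_step(theta, f, s):
    # theta_i <- [sum_{j != i} f_ij (1 + theta_j/theta_i)^(-1)] / s_i
    t = f / (1.0 + theta[None, :] / theta[:, None])
    np.fill_diagonal(t, 0.0)
    return t.sum(axis=1) / s

# small synthetic instance with a known solution theta_true
rng = np.random.default_rng(0)
n = 6
theta_true = rng.uniform(0.8, 1.2, n)
f = rng.uniform(0.1, 0.9, (n, n))
f = (f + f.T) / 2.0
np.fill_diagonal(f, 0.0)
pair = f / (theta_true[:, None] + theta_true[None, :])
np.fill_diagonal(pair, 0.0)
s = pair.sum(axis=1)            # strengths generated by theta_true

theta = np.ones(n)
for _ in range(50):             # Newton converges in a handful of iterations here
    theta = newton_step(theta, f, s)
```

The quasi-Newton and fixed-point variants avoid the linear solve of the full Newton step entirely, which is precisely the per-iteration saving that the discussion below weighs against their lower accuracy.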
Newton’s method.
Overall, Newton's method is very accurate - often, the most accurate one - in reproducing both the binary and the weighted constraints; moreover, it represents the only viable alternative when the most complicated models are considered (i.e. the UECM and the DECM, respectively defined by a system of 2N and 4N coupled, non-linear equations). However, the time required to run Newton's method on a given model seems to be quite dependent on the network size, especially whenever the corresponding system of equations cannot be reduced - see the case of the undirected CReM, run on the Bitcoin Lightning Network. Since one of the reasons for the bad scaling of Newton's method with the network size is the evaluation of the Hessian matrix defining a given model, this algorithm is to be preferred for largely reducible networks.

Quasi-Newton method.
For all the networks considered here, the quasi-Newton method we have implemented is nothing else than the diagonal version of the traditional Newton's method. Even if this choice greatly reduces the number of entries of the Hessian matrix which are needed (i.e. just N elements for the undirected version of the CReM, 2N elements for the UECM and the directed version of the CReM, and 4N elements for the DECM), dimensionality may still represent an issue to achieve fast convergence. Moreover, since the diagonal approximation of the Hessian matrix is not necessarily always a good one, the quasi-Newton method may require more time than Newton's one to achieve the same level of accuracy in reproducing the constraints. However, when such an approximation is a good one, the 'regime' in which the quasi-Newton method outperforms the competitors seems to be the one of small, non-reducible networks (e.g. see the results concerning the DBCM run on the WTW) - although, in cases like these, Newton's method may still be a strong competitor.

Fixed-point method.
From a purely theoretical point of view, the fixed-point recipe is the fastest one, since the time required to evaluate the generic n-th step is (only) due to the evaluation of the model-specific map at the (n − 1)-th iterate. However, its accuracy in reproducing the weighted constraints is the lowest among the three algorithms considered here: naturally, the method is not as accurate as Newton's one, but it is much faster, requiring only a fraction of the time needed by Newton's algorithm on the largest configurations.

Algorithm 6
Sampling the CReM ensemble

for m = 1 ... |E| do
  W = 0;
  for i = 1 ... N do
    for j = 1 ... N and j < i do
      if RandomUniform[0, 1] ≤ p^UBCM_ij then
        w_ij = w_ji = RandomExponential[θ_i + θ_j];
      else
        w_ij = w_ji = 0;
      end if
    end for
  end for
  Ensemble[m] = W;
end for

The 'NEMTROPY' Python package.
As an additional result, we release a comprehensive package, coded in Python, that implements the three aforementioned algorithms on all the ERGMs considered in the present work. Its name is 'NEMTROPY', an acronym standing for 'Network Entropy Maximization: a Toolbox Running On Python', and it is freely downloadable at the following URL: https://github.com/nicoloval/NEMtropy.

Alternative techniques to improve accuracy and speed have been tested as well, such as the one of coupling two of the algorithms considered above. In particular, we have tried to solve the (undirected version of the) CReM by running the fixed-point algorithm and using the solution of the latter as input for the quasi-Newton method. The results are reported in table VIII: as they clearly show, the coupled algorithm is indeed more accurate than the single methods composing it and much faster than the quasi-Newton one (for some snapshots, more accurate and even faster than Newton's method). Techniques like these are, in general, useful to individuate better initial conditions than the completely random ones: a first run of the fastest method may be, in fact, useful to direct the most accurate algorithm towards the (best) solution. This is indeed the case, upon considering that the quasi-Newton method, now, stops because the condition on the norm of the likelihood gradient is satisfied - and not for having reached the limit of 10000 steps.

We would like to end the discussion about the results presented in this contribution by explicitly mentioning a circumstance that is frequently met when studying economic and financial networks. When considering systems like these, the information about the number of neighbours of each node is typically not accessible: as a consequence, the models constraining both binary and weighted information cannot be employed as they have been presented in this contribution. Alternatives exist and rest upon the existence of some kind of relationship between binary and weighted constraints.
[Table VIII: performance of the coupled 'fixed-point + quasi-Newton' algorithm in solving the undirected CReM on daily snapshots of the Bitcoin Lightning Network (BLN), reporting the MRDE, MADE, MRSE and the time (in seconds) required to achieve convergence; the numeric entries are not recoverable here. The quasi-Newton method, now, stops because the condition on the norm of the likelihood gradient is satisfied. As the results reveal, the coupled algorithm is more accurate than the single methods composing it and much faster than the quasi-Newton one - for some snapshots, more accurate and even faster than Newton's method. Only the results corresponding to the best choice of initial conditions are reported.]

In the case of undirected networks, such a relationship is usually written as

e^{−θ_i} = √z s_i (79)

and establishes that the Lagrange multipliers controlling for the degrees are linearly proportional to the strengths. If this is the case (or a valid reason exists for this to be the case), the expression for the probability that any two nodes are connected becomes

p^dcGM_ij = (z s_i s_j) / (1 + z s_i s_j), ∀ i < j (80)

the acronym standing for degree-corrected Gravity Model [43]. The (only) unknown parameter z must be numerically estimated by employing some kind of topological information; this is usually represented by (a proxy of) the network link density, used to instantiate the (only) likelihood condition

L(A*) = ⟨L⟩ = Σ_{i=1}^N Σ_{j=1 (j>i)}^N p^dcGM_ij. (81)
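Since ⟨L⟩ is strictly increasing in z, imposing condition (81) is a one-dimensional root-finding problem. A minimal sketch follows (our own code and naming, with plain bisection standing in for whichever numerical routine one prefers):

```python
import numpy as np

def expected_links(z, s):
    """<L> = sum_{i<j} z*s_i*s_j / (1 + z*s_i*s_j) under the dcGM."""
    prod = np.outer(s, s)
    p = z * prod / (1.0 + z * prod)
    return float(np.triu(p, k=1).sum())

def calibrate_z(s, target_links, lo=1e-12, hi=1e6, iters=200):
    """Bisection in log-space, since z may span many orders of magnitude;
    <L> is monotonically increasing in z, so the bracketing is preserved."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if expected_links(mid, s) < target_links:
            lo = mid
        else:
            hi = mid
    return float(np.sqrt(lo * hi))

# sanity check: recover a known z from the link count it generates
s = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z_true = 0.05
z_hat = calibrate_z(s, expected_links(z_true, s))
```

In practice the target would be the observed (or estimated) number of links L(A*) rather than a value generated from a known z.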
The definition and correct implementation of null models is a crucial issue in network analysis: the present contribution focuses on (a subset of) the ones constituting the so-called ERG framework - a choice driven by the evidence that they are the most commonly employed ones for tasks as different as network reconstruction, pattern detection and graph enumeration. The optimization of the likelihood function associated to them is, however, still problematic, since it involves the resolution of large systems of coupled, non-linear equations.

Here, we have implemented and compared three algorithms for numerical optimization, with the aim of finding the one performing best (i.e. being both accurate and fast) for each model. What emerges from our results is that there is no unique method which is both accurate and fast for all models on all configurations: in this respect, performance is just a trade-off between accuracy and speed. However, some general conclusions can still be drawn.

Newton's method is the one requiring the largest amount of information per step (in fact, the entire Hessian matrix has to be evaluated at each iteration): hence, it is the most accurate one but, for the same reason, often the one characterized by the slowest performance. A major drawback of Newton's method is that of scaling very badly with the network size.

At the opposite extreme lies the fixed-point algorithm, theoretically the fastest one but, often, among the least accurate ones (at least, for what concerns the weighted constraints); the performance of the quasi-Newton method often lies in-between the performances of the two methods above, achieving an accuracy that is larger than the one achieved by the fixed-point algorithm while requiring less time than the one needed by Newton's method.

Overall, while Newton's method seems to perform best on either relatively small or greatly reducible networks, the fixed-point method must be preferred for large, non-reducible configurations.
Deviations from this (oversimplified) picture are, however, clearly visible.

Future work concerns the application of the aforementioned three numerical recipes to the models that have not found place here. For what concerns the set of purely binary constraints, the ones defining the
Reciprocal Binary Configuration Model (RBCM) [15] deserve to be mentioned. For what concerns the 'mixed' constraints, instead, the CReM framework is versatile enough to allow for a test of a number of variants of the methods above, ranging from the discrete versions of the CReM models considered here - defined by the positions

⟨w_ij⟩^und_d-CReM = f_ij / (1 − e^{−β_i−β_j}) (82)

with f_ij ≡ p^UBCM_ij, and

⟨w_ij⟩^dir_d-CReM = f_ij / (1 − e^{−γ_i−δ_j}) (83)

with f_ij ≡ p^DBCM_ij - to the continuous generalizations of the UECM and the DECM, respectively defined by the positions [19]

p^UECM_ij = e^{−α_i−α_j} / (β_i + β_j + e^{−α_i−α_j}) (84)

and

p^DECM_ij = e^{−α_i−β_j} / (γ_i + δ_j + e^{−α_i−β_j}). (85)

AUTHORS CONTRIBUTIONS
FS, TS, GC, MZ developed the methods and de-signed the research. NV, EM, MB, GT performed theanalysis (NV: DBCM, DECM, RBCM; EM: UBCM,UECM, CReM; MB: BiCM; GT: preliminary version ofthe BiCM). FS, TS, GC, MZ wrote the manuscript. Allauthors reviewed and approved the manuscript.
ACKNOWLEDGEMENTS
FS and TS acknowledge support from the European project SoBigData++ (GA. 871042). GC and MZ acknowledge support from the PAI project ORCS ('Optimized Reconstruction of Complex networkS'), funded by the IMT School for Advanced Studies Lucca. FS is also thankful to the Italian 'Programma di Attività Integrata' (PAI) project 'TOol for Fighting FakEs' (TOFFE), funded by IMT School for Advanced Studies Lucca, for its support.
APPENDIX A: COMPUTINGTHE HESSIAN MATRIX
As we showed in the main text, the Hessian matrix of our likelihood function is 'minus' the covariance matrix of the constraints, i.e.

H_ij = ∂²L(θ)/∂θ_i∂θ_j = −Cov[C_i, C_j], i, j = 1 ... M; (86)

interestingly, a variety of alternative methods exists to explicitly calculate the generic entry H_ij, i.e. 1) taking the second derivatives of the likelihood function characterizing the method under analysis, 2) taking the first derivatives of the expectation values of the constraints characterizing the method under analysis, 3) calculating the moments of the pair-specific probability distributions characterizing each method.

UBCM: binary undirected graphs with given degree sequence
The Hessian matrix for the UBCM is an N × N symmetric table with entries reading

H^UBCM: Var[k_i] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij), ∀ i
        Cov[k_i, k_j] = p_ij(1 − p_ij), ∀ i ≠ j (87)

where p_ij ≡ p^UBCM_ij. Notice that Var[k_i] = Σ_{j=1 (j≠i)}^N Cov[k_i, k_j], ∀ i.

DBCM: binary directed graphs with given in-degree and out-degree sequences
The Hessian matrix for the DBCM is a 2N × 2N symmetric table that can be further subdivided into four N × N blocks whose entries read

H^DBCM: Var[k_i^out] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij), ∀ i
        Var[k_i^in] = Σ_{j=1 (j≠i)}^N p_ji(1 − p_ji), ∀ i
        Cov[k_i^out, k_j^in] = p_ij(1 − p_ij), ∀ i ≠ j
        Cov[k_j^out, k_i^in] = p_ji(1 − p_ji), ∀ i ≠ j (88)

while Cov[k_i^out, k_i^in] = Cov[k_i^out, k_j^out] = Cov[k_i^in, k_j^in] = 0 and p_ij ≡ p^DBCM_ij.

Notice that the Hessian matrix of the BiCM mimicks the DBCM one, the only difference being that the probability coefficients are now indexed by i and α: for example, in the BiCM case, one has that Cov[k_i, d_α] = p_iα(1 − p_iα), ∀ i, α.

UECM: weighted undirected graphs with given strengths and degrees

The Hessian matrix for the UECM is a 2N × 2N symmetric table that can be further subdivided into four blocks (each of which with dimensions N × N). In order to save space, the expressions indexed by the single subscript i will be assumed as being valid ∀ i, while the ones indexed by a double subscript i, j will be assumed as being valid ∀ i ≠ j. The entries of the diagonal blocks read

∂²L_UECM/∂α_i² = Var[k_i] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij)
∂²L_UECM/∂α_i∂α_j = Cov[k_i, k_j] = p_ij(1 − p_ij) (89)

and

∂²L_UECM/∂β_i² = Var[s_i] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij + e^{−β_i−β_j}) / (1 − e^{−β_i−β_j})²
∂²L_UECM/∂β_i∂β_j = Cov[s_i, s_j] = p_ij(1 − p_ij + e^{−β_i−β_j}) / (1 − e^{−β_i−β_j})² (90)

where p_ij ≡ p^UECM_ij. On the other hand, the entries of the off-diagonal blocks read

∂²L_UECM/∂α_i∂β_i = Cov[k_i, s_i] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij) / (1 − e^{−β_i−β_j})
∂²L_UECM/∂α_i∂β_j = Cov[k_i, s_j] = p_ij(1 − p_ij) / (1 − e^{−β_i−β_j}) (91)

with p_ij ≡ p^UECM_ij.

DECM: weighted directed graphs with given strengths and degrees
The Hessian matrix for the DECM is a 4N × 4N symmetric table that can be further subdivided into sixteen blocks (each of which with dimensions N × N). As for the UECM, in order to save space, the expressions indexed by the single subscript i will be assumed as being valid ∀ i, while the ones indexed by a double subscript i, j will be assumed as being valid ∀ i ≠ j. The entries of the diagonal blocks read

∂²L_DECM/∂α_i² = Var[k_i^out] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij)
∂²L_DECM/∂α_i∂α_j = Cov[k_i^out, k_j^out] = 0 (92)

and

∂²L_DECM/∂β_i² = Var[k_i^in] = Σ_{j=1 (j≠i)}^N p_ji(1 − p_ji)
∂²L_DECM/∂β_i∂β_j = Cov[k_i^in, k_j^in] = 0 (93)

and

∂²L_DECM/∂γ_i² = Var[s_i^out] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij + e^{−γ_i−δ_j}) / (1 − e^{−γ_i−δ_j})²
∂²L_DECM/∂γ_i∂γ_j = Cov[s_i^out, s_j^out] = 0 (94)

and

∂²L_DECM/∂δ_i² = Var[s_i^in] = Σ_{j=1 (j≠i)}^N p_ji(1 − p_ji + e^{−γ_j−δ_i}) / (1 − e^{−γ_j−δ_i})²
∂²L_DECM/∂δ_i∂δ_j = Cov[s_i^in, s_j^in] = 0 (95)

where p_ij ≡ p^DECM_ij.
On the other hand, the entries of the off-diagonal blocks read

∂²L_DECM/∂α_i∂β_i = Cov[k_i^out, k_i^in] = 0
∂²L_DECM/∂α_i∂β_j = Cov[k_i^out, k_j^in] = p_ij(1 − p_ij) (96)

and

∂²L_DECM/∂α_i∂γ_i = Cov[k_i^out, s_i^out] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij) / (1 − e^{−γ_i−δ_j})
∂²L_DECM/∂α_i∂γ_j = Cov[k_i^out, s_j^out] = 0 (97)

and

∂²L_DECM/∂α_i∂δ_i = Cov[k_i^out, s_i^in] = 0
∂²L_DECM/∂α_i∂δ_j = Cov[k_i^out, s_j^in] = p_ij(1 − p_ij) / (1 − e^{−γ_i−δ_j}) (98)

and

∂²L_DECM/∂β_i∂γ_i = Cov[k_i^in, s_i^out] = 0
∂²L_DECM/∂β_i∂γ_j = Cov[k_i^in, s_j^out] = p_ji(1 − p_ji) / (1 − e^{−γ_j−δ_i}) (99)

and

∂²L_DECM/∂β_i∂δ_i = Cov[k_i^in, s_i^in] = Σ_{j=1 (j≠i)}^N p_ji(1 − p_ji) / (1 − e^{−γ_j−δ_i})
∂²L_DECM/∂β_i∂δ_j = Cov[k_i^in, s_j^in] = 0 (100)

and

∂²L_DECM/∂γ_i∂δ_i = Cov[s_i^out, s_i^in] = 0
∂²L_DECM/∂γ_i∂δ_j = Cov[s_i^out, s_j^in] = p_ij(1 − p_ij + e^{−γ_i−δ_j}) / (1 − e^{−γ_i−δ_j})² (101)

with p_ij ≡ p^DECM_ij.

Two-step models for undirected and directed networks
The Hessian matrix for the undirected two-step model considered here is an N × N symmetric table reading

H^undCReM: Var[s_i] = Σ_{j=1 (j≠i)}^N f_ij / (θ_i + θ_j)², ∀ i
           Cov[s_i, s_j] = f_ij / (θ_i + θ_j)², ∀ i ≠ j (102)

where f_ij is given. In the directed case, instead, the Hessian matrix for the two-step model considered here is a 2N × 2N symmetric table that can be further subdivided into four N × N blocks whose entries read

H^dirCReM: Var[s_i^out] = Σ_{j=1 (j≠i)}^N f_ij / (α_i + β_j)², ∀ i
           Var[s_i^in] = Σ_{j=1 (j≠i)}^N f_ji / (α_j + β_i)², ∀ i
           Cov[s_i^out, s_j^in] = f_ij / (α_i + β_j)², ∀ i ≠ j (103)

while Cov[s_i^out, s_i^in] = Cov[s_i^out, s_j^out] = Cov[s_i^in, s_j^in] = 0 and f_ij is given.

APPENDIX B: A NOTE ON THE CHANGE OF VARIABLES
In all the methods considered in the present work, the variable θ_i appears in the optimality conditions only through negative exponential functions: it is therefore tempting to perform the change of variable x_i ≡ e^{−θ_i}. Although this is often done in the literature, one cannot guarantee that the new optimization problem remains convex: in fact, simple examples can be provided for which convexity is lost. This has several consequences, e.g. 1) convergence to the global maximum is no longer guaranteed (since the existence of a global maximum is no longer guaranteed as well), 2) extra care is needed to guarantee that the Hessian matrix H employed in our algorithms is negative definite. While problem 2) introduces additional complexity only for Newton's method, problem 1) is more serious from a theoretical point of view.

Let us now address problem 1) in more detail. First, it is possible to prove that any stationary point for L(x) satisfies the optimality conditions for L(θ) as well. In fact, the application of the 'chain rule' leads to recover the set of relationships

∂L(θ)/∂θ_i = (∂x_i/∂θ_i) ∂L(x)/∂x_i = −x_i ∂L(x)/∂x_i, i = 1 ... M; (104)

notice that requiring ∇_{θ_i} L(θ) = 0 leads to require that either ∇_{x_i} L(x) = 0 or x_i = 0. As the second eventuality precisely identifies isolated nodes (i.e. the nodes for which the constraint C_i(G*), controlled by the multiplier θ_i, is 0), one can get rid of it by explicitly removing the corresponding addenda from the likelihood function.

For what concerns convexity, let us explicitly calculate the Hessian matrix for the set of variables {x_i}_{i=1}^M. In formulas,

∂²L(x)/∂x_i² = e^{2θ_i} [∂²L(θ)/∂θ_i² + ∂L(θ)/∂θ_i], i = 1 ... M,
∂²L(x)/∂x_i∂x_j = e^{θ_i+θ_j} [∂²L(θ)/∂θ_i∂θ_j], ∀ i ≠ j (105)

according to the 'chain rule' for second-order derivatives. More compactly,

H_L(x) = e^Θ ∘ (−Cov[C_i, C_j] + I · ∇_θ L(θ)) (106)

where I is the identity matrix, the generic entry of the matrix e^Θ reads [e^Θ]_ij ≡ e^{θ_i+θ_j}, ∀ i, j, and the symbol '∘' indicates the Hadamard (i.e. element-wise) product of matrices. In general, the expression above defines an indefinite matrix, i.e. one that is neither positive nor negative (semi)definite.

APPENDIX C: FIXED POINT METHOD IN THE MULTIVARIATE CASE
Equation (21) can be written as

θ_i^(n) = G_i(θ^(n−1)), i = 1 ... N; (107)

for the sake of illustration, let us discuss it for the UBCM case. In this particular case, the set of equations above can be rewritten as

θ_i^(n) = −ln [ k_i(A*) / Σ_{j=1 (j≠i)}^N ( e^{−θ_j^(n−1)} / (1 + e^{−θ_i^(n−1)−θ_j^(n−1)}) ) ], i = 1 ... N. (108)

Since all components of the map G are continuous on R^N, the map itself is continuous on R^N. Hence, a fixed point exists. Let us now consider its Jacobian matrix and check the magnitude of its elements. In the UBCM case, one finds that

∂G_i/∂θ_i = [Σ_{j=1 (j≠i)}^N e^{−2(θ_i+θ_j)} / (1 + e^{−θ_i−θ_j})²] / [Σ_{j=1 (j≠i)}^N e^{−θ_i−θ_j} / (1 + e^{−θ_i−θ_j})] = 1 − Var[k_i]/⟨k_i⟩ = 1 − Σ_{j≠i} Cov[k_i, k_j]/⟨k_i⟩, ∀ i (109)

and

∂G_i/∂θ_j = −[e^{−θ_i−θ_j} / (1 + e^{−θ_i−θ_j})²] / [Σ_{j=1 (j≠i)}^N e^{−θ_i−θ_j} / (1 + e^{−θ_i−θ_j})] = −Cov[k_i, k_j]/⟨k_i⟩, ∀ i ≠ j. (110)

Let us notice that 1) each element of the Jacobian matrix is a continuous function R^N → R and that 2) the following relationships hold

|∂G_i/∂θ_j| ≤ 1, ∀ i, j; (111)

unfortunately, however, when multivariate functions are considered, the set of conditions above is not enough to ensure convergence to the fixed point for any choice of the initial value of the parameters. What needs to be checked is the condition ||J_G(θ)|| <
1, with J indicating the Jacobian of the map (i.e. the matrix of the first partial derivatives above) and ||·|| any natural matrix norm: the validity of such a condition has been numerically verified case by case.

∗ [email protected]
[1] M.E.J. Newman, Networks: an introduction, Oxford University Press (2010).
[2] V. Colizza, A. Barrat, M. Barthelemy, A. Vespignani,
Proceedings of the National Academy of Sciences (7),2015-2020 (2006).[3] A. Barrat, M. Barthlemy, A. Vespignani,
Dynamical pro-cesses on complex networks , Cambridge University Press(2008).[4] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, A.Vespignani,
Rev. Mod. Phys. , 925 (2015).[5] C. Castellano, S. Fortunato, V. Loreto Rev. Mod. Phys. , 591 (2009).[6] T. Squartini, I. van Lelyveld, D. Garlaschelli, Sci. Rep. , 3357 (2013).[7] G. Cimini,T. Squartini, F. Saracco, D. Garlaschelli, A.Gabrielli, G. Caldarelli. Nat. Rev. Phys. (1), 58-71(2019). [8] S. Maslov and K. Sneppen, Science (5569), 910-913(2002).[9] A.C.C. Coolen, A. De Martino, A. Annibale,
J. Stat.Phys. , 1035-1067 (2009).[10] E.S. Roberts, A.C.C. Coolen,
Phys. Rev. E (4),046103 (2012).[11] Y. Artzy-Randrup, L. Stone, Phys. Rev. E (5), 056708(2005).[12] C.I. Del Genio, H. Kim, Z. Toroczkai, K.E. Bassler, PLoSOne (4), e10012 (2010).[13] H. Kim, C.I. Del Genio, K.E. Bassler, Z. Toroczkai, NewJ. Phys. , 023012 (2012).[14] J. Blitzstein, P. Diaconis, Internet Mathematics (4),489-522 (2011).[15] T. Squartini, D. Garlaschelli, New. J. Phys. , 083001(2011).[16] J. Park, M.E.J. Newman, Phys. Rev. E (6), 066117(2004).[17] G. Bianconi, Europhys. Lett. (2), 28005 (2007).[18] A. Fronczak, P. Fronczak, J.A. Holyst, Phys. Rev. E (1), 016108 (2006).[19] A. Gabrielli, R. Mastrandrea, G. Caldarelli, G. Cimini, Phys. Rev. E (3), 030301(R) (2019).[20] A. Fronczak, Exponential random graph models in Ency-clopedia of Social Network Analysis and Mining, Springer(edited by R. Alhajj and J. Rokne) (2014).[21] E.T. Jaynes,
Phys. Rev. (4), 620-630 (1957).[22] N. Dianati, arXiv:1607.01735 (2016).[23] N. Vallarano, C. Tessone, T. Squartini, arXiv:2005.00114 (2020).[24] D. Garlaschelli, M.I. Loffredo,
Phys. Rev. E (1),015101(R) (2008).[25] J. Nocedal, S.J. Wright, Numerical Optimization ,Springer (2006).[26] S. Boyd, L. Vandenberghe,
Convex Optimization , Cam-bridge University Press (2004).[27] F. Chung, L. Lu,
Ann. Combinatorics , , 125-145 (2002).[28] K. Oshio, Y. Iwasaki, S. Morita, Y. Osana, S. Gomi, E.Akiyama, K. Omata, K. Oka and K. Kawamura, Tech.Rep. of CCeP, Keio Future , (Keio University, 2003).[29] V. Colizza, R. Pastor–Satorras and A. Vespignani, Nat.Phys. , 276-282 (2007).[30] http://dip.doe-mbi.ucla.edu/dip/Main.cgi[31] V. Colizza, A. Flammini, M.A. Serrano and A. Vespig-nani, Nat. Phys. , 110-115 (2006).[32] J.-H. Lin, K. Primicerio, T. Squartini, C. Decker,C.J.Tessone, New J. Phys. , 083022 (2020).[33] T. Squartini, G. Fagiolo, D. Garlaschelli, Phys. Rev. E , 046117 (2011).[34] A. Bovet, C. Campajola, F. Mottes, V. Restocchi, N.Vallarano, T. Squartini, C,J. Tessone, arXiv:1907.03577 (2019).[35] G. Caldarelli, R. de Nicola, M. Petrocchi, M. Pratelli, F.Saracco, arXiv:2010.01913 (2020).[36] F. Saracco, R. Di Clemente, A. Gabrielli, T. Squartini, Sci. Rep. , 10595 (2015).[37] R. Mastrandrea, T. Squartini, G. Fagiolo, D. Gar-laschelli, New J. Phys. , 043022 (2014).[38] D. Garlaschelli, M.I. Loffredo, Phys. Rev. Lett. (3),038701 (2009).[39] R. Mastrandrea, T. Squartini, G. Fagiolo, D. Gar-laschelli,
Phys. Rev. E (6), 062804 (2014).[40] K. Gleditsch, J. Conflict Resol. , 712-24 (2002).[41] G. Iori, G. De Masi, O. Vasile Precup, G. Gabbi, G. Caldarelli,
J. Econ. Dyn. Control (1), 259-278 (2006).[42] F. Parisi, T. Squartini, D. Garlaschelli, New J. Phys. ,053053 (2020).[43] G. Cimini, T. Squartini, A. Gabrielli, D. Garlaschelli, Phys. Rev. E , 040802 (2015).[44] G. Cimini, T. Squartini, A. Gabrielli, D. Garlaschelli, Sci. Rep.5