Fast and scalable likelihood maximization for Exponential Random Graph Models with local constraints
Nicolò Vallarano, Matteo Bruno, Emiliano Marchese, Giuseppe Trapani, Fabio Saracco, Giulio Cimini, Mario Zanon, Tiziano Squartini
IMT School for Advanced Studies Lucca, P.zza San Francesco 19, 55100 Lucca (Italy)
Institute for Advanced Study (IAS), University of Amsterdam, Oude Turfmarkt 145, 1012 GC Amsterdam (The Netherlands)
Physics Department and INFN, 'Tor Vergata' University of Rome, 00133 Rome (Italy)
(Dated: February 1, 2021)

Exponential Random Graph Models (ERGMs) have gained increasing popularity over the years. Rooted in statistical physics, the ERGM framework has been successfully employed for reconstructing networks, detecting statistically significant patterns in graphs, and counting networked configurations with given properties. From a technical point of view, the ERGM workflow is defined by two subsequent optimization steps: the first one concerns the maximization of Shannon entropy and leads to identify the functional form of the ensemble probability distribution that is maximally non-committal with respect to the missing information; the second one concerns the maximization of the likelihood function induced by this probability distribution and leads to its numerical determination. This second step translates into the resolution of a system of O(N) non-linear, coupled equations (with N being the total number of nodes of the network under analysis), a problem that is affected by three main issues, i.e. accuracy, speed and scalability. The present paper aims at addressing these problems by comparing the performance of three algorithms (i.e. Newton's method, a quasi-Newton method and a recently-proposed fixed-point recipe) in solving several ERGMs, defined by binary and weighted constraints in both a directed and an undirected fashion. While Newton's method performs best for relatively small networks, the fixed-point recipe is to be preferred when large configurations are considered, as it ensures convergence to the solution within seconds for networks with hundreds of thousands of nodes (e.g. the Internet, Bitcoin).
We attach to the paper a Python code implementing the three aforementioned algorithms on all the ERGMs considered in the present work.

PACS numbers: 02.10.Ox, 89.75.Hc, 02.70.Rr, 05.10.-a
INTRODUCTION
Over the last 20 years, network theory has emerged as a successful framework to address problems of scientific and societal relevance [1]: examples of processes that are affected by the structural details of the underlying network are provided by the spreading of infectious diseases [2-4], opinion dynamics [5], the propagation of losses during financial crises [6], etc.

Within such a context, two needs have emerged quite naturally [7]: 1) detecting the topological properties of a real network that can be deemed as statistically significant - typically those higher-order properties that cannot be explained by local features of the nodes such as the degrees (e.g. the assortativity, the clustering coefficient, etc.); 2) inferring the relevant details of a given network structure in case only partial information is available. Both goals can be achieved by constructing a framework for defining benchmarks, i.e. synthetic configurations retaining only some of the properties of the original system - the so-called constraints - and being maximally random otherwise.

Two different kinds of approaches have been proposed so far, i.e. the microcanonical and the canonical ones. Microcanonical approaches [8-14] artificially generate many randomized variants of the observed network by enforcing the constraints in a 'hard' fashion, i.e. by creating configurations on each of which the constrained properties are identical to the empirical ones. On the other hand, canonical approaches [15-19] enforce constraints in a 'soft' fashion, by creating a set of configurations over which the constrained properties are on average identical to the empirical ones.
Softening the requirement of matching the constraints has a clear advantage: it allows the mathematical expression for the probability of a generic configuration, G, to be obtained analytically, as a function of the enforced constraints.

In this second case, a pivotal role is played by the formalism of the Exponential Random Graph Models (ERGMs) [20], whose popularity has steadily increased over the years. The ERGM mathematical framework dates back to Gibbs' (re)formulation of statistical mechanics and is based upon the variational principle known as maximum entropy, stating that the probability distribution that is maximally non-committal with respect to the missing information is the one maximizing the Shannon entropy [21]. This allows self-consistent inference to be made, by assuming maximal ignorance about the unknown degrees of freedom of the system.

In the context of network theory, the ERGM framework has found a natural application to the resolution of the two aforementioned problems, i.e. 1) that of defining null models of the original network, in order to assess the significance of empirical patterns against the hypothesis that the network structure is solely determined by the constraints themselves, and 2) that of deriving the most probable configurations that are compatible with the available details about a specific network. In both cases, after the functional form of the probability distribution has been identified via the maximum entropy principle, one also needs to determine it numerically: to this aim, the likelihood maximization principle can be invoked, prescribing to maximize the probability of observing the actual configuration.
This prescription typically translates into the resolution of a system of O(N) non-linear, coupled equations - with N representing the number of nodes of the network under analysis.

Problems like these are usually affected by the issues of accuracy, speed and scalability: the present paper aims at addressing them at once, by comparing the performance of three algorithms, i.e. Newton's method, a quasi-Newton method and a recently-proposed fixed-point recipe [22, 23], to solve a variety of ERGMs, defined by binary and weighted constraints in both a directed and an undirected fashion.

Additionally, we provide a Python code implementing the three aforementioned recipes on all the ERGMs considered in the present work.

GENERAL THEORY
Canonical approaches aim at obtaining the mathematical expression for the probability of a generic configuration, G, as a function of the observed constraints: ERGMs realize this by maximizing the Shannon entropy [15, 16].

The Maximum Entropy Principle
Generally speaking, the problem to be solved in order to find the functional form of the probability distribution to be employed as a benchmark reads

$$\operatorname*{arg\,max}_{P} \; S[P] \quad (1a)$$

$$\text{s.t.} \quad \sum_{\mathbf{G}} P(\mathbf{G})\,C_i(\mathbf{G}) = \langle C_i \rangle, \quad i = 0 \dots M \quad (1b)$$

where the Shannon entropy reads

$$S[P] = -\sum_{\mathbf{G}} P(\mathbf{G}) \ln P(\mathbf{G}) \quad (2)$$

and $\vec{C}(\mathbf{G})$ is the vector of constraints representing the information defining the benchmark itself (notice that $C_0 = \langle C_0 \rangle = 1$ sums up the normalization condition). The solution to the problem above can be found by maximizing the Lagrangian function

$$\mathcal{L}(P, \vec{\theta}) \equiv S[P] + \sum_{i=0}^{M} \theta_i \left[ \langle C_i \rangle - \sum_{\mathbf{G}} P(\mathbf{G})\,C_i(\mathbf{G}) \right] \quad (3)$$

with respect to $P(\mathbf{G})$. As a result one obtains

$$P(\mathbf{G} \mid \vec{\theta}) = \frac{e^{-H(\mathbf{G}, \vec{\theta})}}{Z(\vec{\theta})} \quad (4)$$

with $H(\mathbf{G}, \vec{\theta}) = \vec{\theta} \cdot \vec{C}(\mathbf{G}) = \sum_{i=1}^{M} \theta_i C_i(\mathbf{G})$ representing the Hamiltonian, i.e. the function summing up the proper, imposed constraints, and $Z(\vec{\theta}) = \sum_{\mathbf{G}} e^{-H(\mathbf{G}, \vec{\theta})}$ representing the partition function, ensuring that $P(\mathbf{G})$ is properly normalized. Constraints play a pivotal role, either representing the information to filter, in order to assess the significance of certain quantities, or the only available information, in order to reconstruct the inaccessible details of a given configuration.

The Maximum Likelihood Principle
The formalism above is perfectly general; however, it can be instantiated to study an empirical network configuration, say $\mathbf{G}^*$. In this case, the Lagrange multipliers 'acting' as unknown parameters in eq. (4) can be numerically estimated by maximizing the associated likelihood function [15, 24]. The latter is defined as

$$\mathcal{L}(\vec{\theta}) \equiv \ln P(\mathbf{G}^* \mid \vec{\theta}) = -H(\mathbf{G}^*, \vec{\theta}) - \ln Z(\vec{\theta}) \quad (5)$$

and must be maximized with respect to the vector $\vec{\theta}$. Remarkably, whenever the probability distribution is exponential (as the one deriving from the Shannon entropy maximization), the likelihood maximization problem

$$\operatorname*{arg\,max}_{\vec{\theta}} \; \mathcal{L}(\vec{\theta}) \quad (6)$$

is characterized by first-order necessary conditions for optimality reading

$$\nabla_{\theta_i} \mathcal{L}(\vec{\theta}) = \frac{\partial \mathcal{L}(\vec{\theta})}{\partial \theta_i} = -C_i(\mathbf{G}^*) - \frac{\partial \ln Z(\vec{\theta})}{\partial \theta_i} = -C_i(\mathbf{G}^*) + \sum_{\mathbf{G}} C_i(\mathbf{G})\,P(\mathbf{G}) = -C_i(\mathbf{G}^*) + \langle C_i \rangle = 0, \quad i = 1 \dots M \quad (7)$$

and leading to the system of equations

$$\nabla \mathcal{L}(\vec{\theta}) = \vec{0} \;\Rightarrow\; \vec{C}(\mathbf{G}^*) = \langle \vec{C} \rangle \quad (8)$$

to be solved. These conditions, however, are sufficient to characterize a maximum only if $\mathcal{L}(\vec{\theta})$ is concave. This is indeed the case, as we prove by noticing that

$$H_{ij} = \nabla^2_{\theta_i \theta_j} \mathcal{L}(\vec{\theta}) = \frac{\partial^2 \mathcal{L}(\vec{\theta})}{\partial \theta_i \partial \theta_j} = -\frac{\partial^2 \ln Z(\vec{\theta})}{\partial \theta_i \partial \theta_j} = \frac{\partial \langle C_j \rangle}{\partial \theta_i} = -\text{Cov}[C_i, C_j], \quad i, j = 1 \dots M \quad (9)$$

i.e. that the Hessian matrix, $\mathbf{H}$, of our likelihood function is 'minus' the covariance matrix of the constraints, hence negative semidefinite by definition. The fourth passage is an example of the well-known fluctuation-response relation [20].

Combining the MEP and the MLP principles
The MEP and the MLP encode two different prescriptions aiming, respectively, at determining the functional form of a probability distribution and its numerical value. In optimization theory, problem (1) is known as the primal problem: upon noticing that the Shannon entropy is concave, while the imposed constraints are linear in $P(\mathbf{G})$, one concludes that the primal problem is convex (this is easily seen by rewriting it as a minimization problem for $-S[P]$).

As convexity implies strong duality, we can, equivalently, consider an alternative version of the problem to optimize, known as the dual problem. In order to define it, let us consider the Lagrangian function

$$\mathcal{L}(P, \vec{\theta}) \equiv S[P] + \sum_{i=1}^{M} \theta_i \left[ C_i(\mathbf{G}^*) - \sum_{\mathbf{G}} P(\mathbf{G})\,C_i(\mathbf{G}) \right] \quad (10)$$

where, now, the generic expectation of the $i$-th constraint, $\langle C_i \rangle$, has been replaced by the corresponding empirical value, $C_i(\mathbf{G}^*)$. As the dual function is given by

$$P(\mathbf{G}^* \mid \vec{\theta}) \equiv \operatorname*{arg\,max}_{P} \; \mathcal{L}(P, \vec{\theta}), \quad (11)$$

the dual problem reads

$$\operatorname*{arg\,max}_{\vec{\theta}} \; \min_{P} \; \left[ -\mathcal{L}(P, \vec{\theta}) \right] \quad (12)$$

which is a convex problem by construction; this is readily seen by substituting eq. (4) into eq. (10), an operation that leads to the expression

$$-\mathcal{L}(P(\vec{\theta}), \vec{\theta}) = -\vec{\theta} \cdot \vec{C}(\mathbf{G}^*) - \ln Z(\vec{\theta}) = \mathcal{L}(\vec{\theta}), \quad (13)$$

i.e. the likelihood function introduced in eq. (5). In other words, eq. (12) combines the MEP and the MLP into a unique optimization step whose score function becomes the Lagrangian function defined in eq. (10).

Optimization algorithms for non-linear problems
In general, the optimization problem defined in eq. (12) cannot be solved analytically, whence the need to resort to numerical methods. For an exhaustive review on numerical methods for optimization we refer the interested reader to [25, 26]: in the following, we present only the concepts that are of some relevance for us. The problem

$$\operatorname*{arg\,max}_{\vec{\theta}} \; \mathcal{L}(\vec{\theta}) \quad (14)$$

is a Nonlinear Programming Problem (NLP). In order to solve it numerically, we will adopt a Sequential Quadratic Programming (SQP) approach. Starting from an initial guess $\vec{\theta}^{(0)}$, SQP solves eq. (14) by iteratively updating the vector of Lagrange multipliers,

$$\theta_i^{(n+1)} = \theta_i^{(n)} + \alpha\,\Delta\theta_i^{(n)}, \quad i = 1 \dots M \quad (15)$$

according to the rule

$$\Delta\vec{\theta}^{(n)} = \operatorname*{arg\,max}_{\Delta\vec{\theta}} \; \sum_{i} \nabla_{\theta_i} \mathcal{L}(\vec{\theta})\,\Delta\theta_i + \frac{1}{2} \sum_{j,k} \Delta\theta_j\,H^{(n)}_{jk}\,\Delta\theta_k \quad (16)$$

leading to the set of equations

$$\nabla_i \mathcal{L}(\vec{\theta}) + \sum_{j} H^{(n)}_{ij}\,\Delta\theta_j = 0, \quad i = 1 \dots M \quad (17)$$

which can be compactly rewritten as

$$\Delta\vec{\theta}^{(n)} = -\left[\mathbf{H}^{(n)}\right]^{-1} \nabla\mathcal{L}(\vec{\theta}). \quad (18)$$

The stepsize $\alpha \in (0, 1]$ is selected to ensure that $\mathcal{L}(\vec{\theta}^{(n+1)}) > \mathcal{L}(\vec{\theta}^{(n)})$ via a back-tracking, line-search procedure: starting from $\alpha = 1$, if the Armijo (sufficient-increase) condition

$$\mathcal{L}(\vec{\theta}^{(n)} + \alpha\,\Delta\vec{\theta}^{(n)}) \geq \mathcal{L}(\vec{\theta}^{(n)}) + \gamma\alpha\,\nabla\mathcal{L}(\vec{\theta})^\top \Delta\vec{\theta} \quad (19)$$

is violated, we set $\alpha \leftarrow \beta\alpha$ ($\gamma \in (0, 0.5]$ and $\beta \in (0, 1)$ are the parameters of the algorithm). On the other hand, the term $\mathbf{H}^{(n)}$ can be selected according to a variety of methods. In the present contribution we focus on the following three ones.

Newton's method.
One speaks of Newton's method in case $\mathbf{H}^{(n)}$ is chosen to be

$$\mathbf{H}^{(n)} = \nabla^2 \mathcal{L}(\vec{\theta}^{(n)}) + \Delta\mathbf{H}^{(n)} \quad (20)$$

where $\nabla^2 \mathcal{L}(\vec{\theta})$ is the Hessian matrix of the likelihood function and the term $\Delta\mathbf{H}^{(n)}$ is typically selected as small as possible, in order to avoid slowing convergence and to ensure that $\mathbf{H}^{(n)}$ is negative definite (i.e. $\nabla^2 \mathcal{L}(\vec{\theta}^{(n)}) + \Delta\mathbf{H}^{(n)} \prec 0$). $\mathbf{H}^{(n)}$ is also referred to as the 'exact Hessian'.

Quasi-Newton methods.
Any Hessian approximation which is negative definite (i.e. satisfying $\mathbf{H}^{(n)} \prec 0$) can be employed within a quasi-Newton scheme. One simple choice is $\mathbf{H}^{(n)} = -\mathbf{I}$, which yields the 'steepest ascent' algorithm; here we have opted for the following recipe, i.e. the purely diagonal version of Newton's method: $H^{(n)}_{ii} = \nabla^2_{ii} \mathcal{L}(\vec{\theta}^{(n)}) + \Delta H^{(n)}_{ii} < 0$, $\forall i$, and $H^{(n)}_{ij} = 0$, $\forall\, i \neq j$.

Fixed-point iteration on modified KKT conditions.
In addition to the (classes of) algorithms above, we will also consider an iterative recipe which is constructed as a fixed-point iteration on a modified version of the Karush-Kuhn-Tucker (KKT) conditions, i.e. $F(\vec{\theta}) = \vec{0} \;\Leftrightarrow\; \vec{\theta} = G(\vec{\theta})$; the iterate can, then, be made explicit by rewriting the latter as

$$\vec{\theta}^{(n)} = G(\vec{\theta}^{(n-1)}). \quad (21)$$

The condition above will be made explicit, for each network model, in the corresponding subsection. We also observe that this choice yields a non-standard SQP method, as $\mathbf{H}^{(n)}$ is typically not symmetric for our models.

APPLICATIONS
Let us now apply the algorithms described in the previous section to a number of specific cases of interest.
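Before turning to the individual models, the generic SQP sweep - the update of eq. (15), the linear solve of eq. (18) and the backtracking rule of eq. (19) - can be sketched in Python. This is a minimal sketch, not the code released with the paper: the toy concave 'likelihood' and all function names are ours.

```python
import numpy as np

def sqp_maximize(L, grad, hess, theta0, gamma=0.3, beta=0.5,
                 tol=1e-10, max_steps=100):
    """Maximize a concave function L by iterating eq. (15):
    a (quasi-)Newton step Delta = -H^{-1} grad L (eq. (18)),
    damped via Armijo backtracking (eq. (19))."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_steps):
        g = grad(theta)
        if np.linalg.norm(g) <= tol:          # stopping rule on ||grad L||
            break
        delta = -np.linalg.solve(hess(theta), g)
        alpha = 1.0
        # shrink alpha until the sufficient-increase condition holds
        while alpha > 1e-12 and \
                L(theta + alpha * delta) < L(theta) + gamma * alpha * (g @ delta):
            alpha *= beta                     # alpha <- beta * alpha
        theta = theta + alpha * delta         # eq. (15)
    return theta

# toy concave 'likelihood': L(theta) = -1/2 (theta - t)^T A (theta - t)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
t = np.array([0.5, -1.0])
L = lambda th: -0.5 * (th - t) @ A @ (th - t)
grad = lambda th: -A @ (th - t)
hess = lambda th: -A                          # 'exact Hessian' choice
theta_star = sqp_maximize(L, grad, hess, np.zeros(2))
```

Replacing `hess` by its diagonal part (or by minus the identity) would yield the quasi-Newton and steepest-ascent variants discussed above.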
UBCM: binary undirected graphs with given degree sequence
Let us start by considering binary, undirected networks (BUNs). The simplest, non-trivial set of constraints is represented by the degrees of nodes: the degree of node $i$, i.e. $k_i(\mathbf{A}) = \sum_{j (\neq i)=1}^{N} a_{ij}$, counts the number of its neighbours and coincides with the total number of 1s along the $i$-th row (or, equivalently, along the $i$-th column) of the adjacency matrix $\mathbf{A} \equiv \{a_{ij}\}_{i,j=1}^{N}$. The benchmark defined by this set of constraints is known as the Undirected Binary Configuration Model (UBCM) and its Hamiltonian reads

$$H_{\text{UBCM}}(\mathbf{A}, \vec{\theta}) = \sum_{i=1}^{N} \theta_i k_i(\mathbf{A}); \quad (22)$$

entropy maximization [15, 16] leads to the factorized graph probability

$$P_{\text{UBCM}}(\mathbf{A} \mid \vec{\theta}) = \prod_{i=1}^{N} \prod_{\substack{j=1 \\ (j<i)}}^{N} p_{ij}^{a_{ij}} (1 - p_{ij})^{1 - a_{ij}} \quad (23)$$

where $p_{ij} = p_{ij}^{\text{UBCM}} \equiv \frac{e^{-\theta_i - \theta_j}}{1 + e^{-\theta_i - \theta_j}}$. The likelihood of the observed graph $\mathbf{A}^*$ reads

$$\mathcal{L}_{\text{UBCM}}(\vec{\theta}) = -\sum_{i=1}^{N} \theta_i k_i(\mathbf{A}^*) - \sum_{i=1}^{N} \sum_{\substack{j=1 \\ (j<i)}}^{N} \ln\left[1 + e^{-\theta_i - \theta_j}\right] \quad (24)$$

whose first-order optimality conditions read

$$\nabla_{\theta_i} \mathcal{L}_{\text{UBCM}} = -k_i(\mathbf{A}^*) + \sum_{\substack{j=1 \\ (j\neq i)}}^{N} p_{ij}^{\text{UBCM}} = -k_i(\mathbf{A}^*) + \langle k_i \rangle = 0, \quad i = 1 \dots N. \quad (25)$$
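In code, the factorization of eq. (23) means that, once the multipliers are known, the model is entirely summarized by the matrix of coefficients $p_{ij}$, from which expected degrees and sampled graphs follow directly. A minimal sketch (variable and function names are ours, not those of the code accompanying the paper):

```python
import numpy as np

def ubcm_probabilities(theta):
    """p_ij = exp(-theta_i - theta_j) / (1 + exp(-theta_i - theta_j)),
    with a zero diagonal (no self-loops), as in eq. (23)."""
    x = np.exp(-np.asarray(theta, dtype=float))
    xx = np.outer(x, x)
    p = xx / (1.0 + xx)
    np.fill_diagonal(p, 0.0)
    return p

def ubcm_sample(theta, seed=0):
    """Draw one graph from the ensemble: one independent Bernoulli
    trial per (undirected) pair of nodes."""
    rng = np.random.default_rng(seed)
    p = ubcm_probabilities(theta)
    n = len(p)
    a = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i):
            a[i, j] = a[j, i] = int(rng.random() <= p[i, j])
    return a

theta = np.array([0.1, 0.5, 1.0, 2.0])     # example multipliers (ours)
p = ubcm_probabilities(theta)
expected_degrees = p.sum(axis=1)           # <k_i> = sum_{j != i} p_ij
```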
Resolution of the UBCM.
Newton's and the quasi-Newton method can be easily implemented via the recipe defined in eq. (18) (see Appendix A for the definition of the UBCM Hessian).

The explicit definition of the fixed-point recipe, instead, requires a preliminary observation, i.e. that the system of equations embodying the UBCM first-order optimality conditions can be re-written as follows

$$e^{-\theta_i} = \frac{k_i(\mathbf{A}^*)}{\sum_{\substack{j=1 \\ (j\neq i)}}^{N} \left( \frac{e^{-\theta_j}}{1 + e^{-\theta_i - \theta_j}} \right)}, \quad i = 1 \dots N \quad (26)$$

i.e. as a set of consistency equations. The observation that the term $e^{-\theta_i}$ appears on both sides of the equation corresponding to the $i$-th constraint suggests an iterative recipe to solve such a system, i.e.

$$\theta_i^{(n)} = -\ln\left[ \frac{k_i(\mathbf{A}^*)}{\sum_{\substack{j=1 \\ (j\neq i)}}^{N} \left( \frac{e^{-\theta_j^{(n-1)}}}{1 + e^{-\theta_i^{(n-1)} - \theta_j^{(n-1)}}} \right)} \right], \quad i = 1 \dots N \quad (27)$$

originally proposed in [22] and further refined in [23]. The identification $p_{ij}^{\text{UBCM}} \equiv \frac{e^{-\theta_i^{(\infty)} - \theta_j^{(\infty)}}}{1 + e^{-\theta_i^{(\infty)} - \theta_j^{(\infty)}}}$, $\forall\, i < j$, allows the probability coefficients defining the UBCM to be numerically determined.

As any other iterative recipe, the one proposed above needs to be initialized as well. To this aim, we have tested three different sets of initial values: the first one is defined by the position $\theta_i^{(0)} = -\ln\left[k_i(\mathbf{A}^*)/\sqrt{2L}\right]$, $\forall i$ - usually, a good approximation of the solution of the system of equations in (25) in the 'sparse case' (i.e. whenever $p_{ij}^{\text{UBCM}} \simeq e^{-\theta_i - \theta_j}$ [27]); the second one is a variant of the position above, reading $\theta_i^{(0)} = -\ln\left[k_i(\mathbf{A}^*)/\sqrt{N}\right]$, $\forall i$; the third one, instead, prescribes to randomly draw the value of each parameter from a uniform distribution with support on the unit interval, i.e. $\theta_i^{(0)} \sim U(0, 1)$, $\forall i$.

Reducing the dimensionality of the problem.
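The reduction described below rests on simple bookkeeping: counting how many nodes share each degree value, so that all of them can be assigned a single multiplier. A minimal sketch (the degree sequence and the names are ours):

```python
import numpy as np

# degree sequence of a small example graph (ours)
k = np.array([2, 1, 2, 3, 1, 2])

# distinct degrees and their multiplicities f(k): nodes with the
# same degree can share the same multiplier theta_k
k_distinct, f = np.unique(k, return_counts=True)

# map each node back to its equivalence class
node_to_class = np.searchsorted(k_distinct, k)

# coefficient of reduction: equations surviving the reduction
# over equations of the original problem (the smaller, the better)
c_r = len(k_distinct) / len(k)
```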
The problem defining the UBCM can be further simplified by noticing that nodes with the same degree, say $k$, can be assigned the same value of the multiplier $\theta$ [24] - a result resting upon the observation that any value $k_i(\mathbf{A}^*)$ must match the sum of monotonic, increasing functions. This translates into the possibility of rewriting $\mathcal{L}_{\text{UBCM}}(\vec{\theta})$ in a 'reduced' fashion, as

$$\mathcal{L}_{\text{UBCM}}^{\text{reduced}}(\vec{\theta}) = -\sum_{k} f(k)\,\theta_k\,k - \sum_{k} \sum_{\substack{k' \\ (k' \leq k)}} f(k)\left[f(k') - \delta_{kk'}\right] \ln\left[1 + e^{-\theta_k - \theta_{k'}}\right] \quad (28)$$

where the sums run over the distinct values of the degrees and $f(k)$ counts the number of nodes whose degree is $k$. Rewriting the problem with respect to the set $\{\theta_k\}_k$ leads one to recover simplified versions of the three algorithms considered here: Newton's and the quasi-Newton methods can, now, be solved via a 'reduced' version of eq. (18) (since both the dimension of the gradient and the order of the Hessian matrix of the likelihood function are, now, less than $N$), while the iterative recipe defined in (26) can be rewritten in terms of the 'non-degenerate' degrees, as

$$\theta_k^{(n)} = -\ln\left[ \frac{k}{\sum_{k'} \left[f(k') - \delta_{kk'}\right] \left( \frac{e^{-\theta_{k'}^{(n-1)}}}{1 + e^{-\theta_k^{(n-1)} - \theta_{k'}^{(n-1)}}} \right)} \right] \quad (29)$$

$\forall k$, where, at the denominator, the self-contribution (i.e. the probability that a node links to itself) has been explicitly excluded.

Performance testing.
The performance of the three algorithms considered in the present paper in solving the reduced version of eqs. (25) has been tested on a bunch of real-world networks. The latter ones span a wide variety of systems, including natural, financial and technological ones. In particular, we have considered the synaptic network of the worm C. Elegans [28], the network of the largest US airports [29], the protein-protein interaction network of the bacterium H. Pylori [30], Internet at the level of Autonomous Systems [31] and eight daily snapshots of the so-called Bitcoin Lightning Network [32], chosen throughout its entire history. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the UBCM has been quantified via the maximum absolute error metrics, defined, in a perfectly general fashion, as $\max_i \{|C_i^* - \langle C_i \rangle|\}_{i=1}^{N}$ (where $C_i^*$ is the empirical value of the $i$-th constraint, $C_i$). Naturally, in the UBCM case, $C_i = k_i$, $\forall i$, and the aforementioned error score becomes

$$\text{MADE} = \max_i \left\{ |k_i^* - \langle k_i \rangle| \right\}_{i=1}^{N} \quad (30)$$

(the acronym standing for Maximum Absolute Degree Error). Equivalently, it is the infinite norm of the difference between the vector of the empirical values of the constraints and that of their expected values.

For each algorithm, we have considered three different stopping criteria: the first one puts a condition on the Euclidean norm of the gradient of the likelihood function, i.e.

$$||\nabla\mathcal{L}(\vec{\theta})||_2 = \sqrt{\sum_{i=1}^{N} \left( \nabla_i \mathcal{L}(\vec{\theta}) \right)^2} \leq \epsilon; \quad (31)$$

the second one puts a condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, i.e.

$$||\Delta\vec{\theta}||_2 = \sqrt{\sum_{i=1}^{N} (\Delta\theta_i)^2} \leq \epsilon \quad (32)$$

($\epsilon$ being a prescribed tolerance); the third one concerns the maximum number of iterations: after 1000 steps, any of the three algorithms stops.

[Table I - UBCM results for the synaptic network of C. Elegans (nn) (N = 265, L = 1879), the US airports network, the protein-protein interaction network of H. Pylori (pp) (N = 732, L = 1465), Internet (AS) and eight snapshots of the Bitcoin Lightning Network. Columns: N, L, the connectance c = 2L/[N(N-1)], the coefficient of reduction c_r, and the MADE and time (s) achieved by Newton's, the quasi-Newton and the fixed-point method. Each algorithm stops either because the condition on the gradient norm or the condition on the norm of the parameter update is satisfied. For what concerns accuracy, the two most accurate methods are Newton's and the fixed-point ones; for what concerns speed, the fastest method is the fixed-point one (although Newton's one approximately requires the same amount of time on each specific configuration). Only the results corresponding to the best choice of initial conditions are reported.]

The results about the performance of our three algorithms are reported in table I.
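As a concrete illustration, the fixed-point recipe of eq. (27) and the MADE score of eq. (30) can be sketched together in a few lines. This is a minimal sketch on a synthetic degree sequence of our own choosing (function names are ours), not the benchmark code used for the tests above.

```python
import numpy as np

def ubcm_fixed_point(k, tol=1e-12, max_steps=5000):
    """Iterate eq. (27): theta_i <- -log[ k_i /
    sum_{j != i} exp(-theta_j) / (1 + exp(-theta_i - theta_j)) ]."""
    k = np.asarray(k, dtype=float)
    theta = -np.log(k / np.sqrt(k.sum()))         # 'sparse-case' initial guess
    for _ in range(max_steps):
        x = np.exp(-theta)
        s = x[None, :] / (1.0 + np.outer(x, x))   # s_ij = x_j / (1 + x_i x_j)
        np.fill_diagonal(s, 0.0)                  # exclude the self-contribution
        theta_new = -np.log(k / s.sum(axis=1))
        if np.linalg.norm(theta_new - theta) <= tol:  # ||Delta theta|| criterion
            return theta_new
        theta = theta_new
    return theta

def made(k, theta):
    """Maximum Absolute Degree Error of eq. (30): max_i |k_i - <k_i>|."""
    x = np.exp(-theta)
    p = np.outer(x, x) / (1.0 + np.outer(x, x))
    np.fill_diagonal(p, 0.0)
    return float(np.max(np.abs(k - p.sum(axis=1))))

k = np.array([1.0, 2.0, 2.0, 2.0, 3.0])   # synthetic degree sequence (ours)
theta = ubcm_fixed_point(k)
```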
Overall, all recipes perform very satisfactorily, being accurate, fast and scalable; moreover, all algorithms stop either because the condition on the norm of the gradient of the likelihood is satisfied or because the condition on the norm of the vector of parameters is satisfied.

For what concerns accuracy, the largest maximum error per method spans a narrow interval across all configurations; by looking at each specific network, it is evident that the two most accurate methods are systematically Newton's and the fixed-point ones.

For what concerns speed, the amount of time required to achieve convergence remains, across all configurations, below ≃ 0.01 seconds for Newton's method, ≃ 0.15 seconds for the quasi-Newton method and ≃ 0.015 seconds for the fixed-point recipe. The fastest method is the fixed-point one, although Newton's method approximately requires the same amount of time, when compared to it on each specific configuration. Differences in the speed of convergence of any method, caused by the choice of a particular set of initial conditions, are indeed observable: the prescription reading $\theta_i^{(0)} = -\ln\left[k_i(\mathbf{A}^*)/\sqrt{N}\right]$, $\forall i$, outperforms the other ones.

Let us now comment on the scalability of our algorithms. What we learn from our exercise is that scalability is not related to the network size in a simple way: the factors seemingly playing a major role are the ones affecting the reducibility of the original system of equations, i.e. the ones 'deciding' the number of different equations that actually need to be solved. While reducibility can be easily quantified a posteriori, e.g. by calculating the coefficient of reduction, $c_r$, defined as the ratio between the number of equations that survive the reduction and the number of equations defining the original problem (hence, the smaller the better), providing an exhaustive list of the aforementioned factors a priori is much more difficult.

One may argue that reducibility is affected by the heterogeneity of the degree distribution; upon considering that the latter can be quantified by computing the coefficient of variation, one may derive a simple rule of thumb: a larger coefficient of variation (pointing out a larger heterogeneity of the degree distribution) leads to a larger coefficient of reduction, and a larger amount of time for convergence will be required. Although intuitive, deviations from this picture should nevertheless be expected.
In fact, other factors may play a role: as an example, hubs may be present, thus forcing the corresponding parameters to assume either very large or very small values - hence, slowing down the entire convergence process.

In this sense, scalability is the result of a (non-trivial) interplay between size and reducibility. Let us take a look at table I: Internet is the most reducible network of our basket, although being the largest in size, while the neural network of C. Elegans is one of the least reducible networks of our basket, although being the second smallest one; as a consequence, the actual number of equations defining the UBCM on C. Elegans is ≃ 30 while the actual number of equations defining the UBCM on Internet is ≃ 100 - whence the larger amount of time to solve the latter. Remarkably, the time required by our recipes to ensure that the largest system of equations converges to the solution ranges from thousandths to tenths of seconds.

As a last comment, we would like to stress that, unlike several popular approximations such as the Chung-Lu one [27], the generic coefficient $p_{ij}^{\text{UBCM}}$ always represents a proper probability, in turn implying that eq. (23) also provides us with a recipe to sample the canonical ensemble of BUNs, under the UBCM. Notice that the factorization of the graph probability $P_{\text{UBCM}}(\mathbf{A} \mid \vec{\theta})$ greatly simplifies the entire procedure, allowing a single graph to be sampled by implementing the Bernoulli trial

$$a_{ij} = \begin{cases} 0 & \text{with probability } 1 - p_{ij}^{\text{UBCM}} \\ 1 & \text{with probability } p_{ij}^{\text{UBCM}} \end{cases} \quad (33)$$

for each (undirected) pair of nodes, in either a sequential or a parallel fashion. The sampling process, whose computational complexity amounts at $O(N^2)$, can be repeated to generate as many configurations as desired. The pseudo-code for explicitly sampling the UBCM ensemble is summed up by Algorithm 1.

[Footnote: The coefficient of variation is defined as $c_v = s/m$, where $s$ and $m$ are, respectively, the standard deviation and the mean of the degree distribution of the network at hand.]

Algorithm 1
Sampling the UBCM ensemble

    for m = 1 ... |E| do
        A ← 0
        for i = 1 ... N do
            for j = 1 ... N and j < i do
                if RandomUniform[0,1] ≤ p_ij^UBCM then
                    a_ij = a_ji = 1
                else
                    a_ij = a_ji = 0
                end if
            end for
        end for
        Ensemble[m] = A
    end for

DBCM: binary directed graphs with given in-degree and out-degree sequences

Let us now move to consider binary, directed networks (BDNs). In this case, the simplest, non-trivial set of constraints is represented by the in-degrees and the out-degrees of nodes, where $k_i^{in}(\mathbf{A}) = \sum_{j (\neq i)=1}^{N} a_{ji}$ counts the number of nodes 'pointing' to node $i$ and $k_i^{out}(\mathbf{A}) = \sum_{j (\neq i)=1}^{N} a_{ij}$ counts the number of nodes $i$ 'points' to. The benchmark defined by this set of constraints is known as the Directed Binary Configuration Model (DBCM), whose Hamiltonian reads

$$H_{\text{DBCM}}(\mathbf{A}, \vec{\alpha}, \vec{\beta}) = \sum_{i=1}^{N} \left[ \alpha_i k_i^{out}(\mathbf{A}) + \beta_i k_i^{in}(\mathbf{A}) \right]; \quad (34)$$

as in the undirected case, entropy maximization [15, 16] leads to a factorized probability distribution, i.e.

$$P_{\text{DBCM}}(\mathbf{A} \mid \vec{\alpha}, \vec{\beta}) = \prod_{i=1}^{N} \prod_{\substack{j=1 \\ (j\neq i)}}^{N} p_{ij}^{a_{ij}} (1 - p_{ij})^{1 - a_{ij}} \quad (35)$$

where $p_{ij} = p_{ij}^{\text{DBCM}} \equiv \frac{e^{-\alpha_i - \beta_j}}{1 + e^{-\alpha_i - \beta_j}}$. The canonical ensemble of BDNs is, now, the set of networks with the same number of nodes, $N$, as the observed graph and a number of (directed) links varying from zero to the maximum value $N(N-1)$. The likelihood of the observed graph $\mathbf{A}^*$ becomes

$$\mathcal{L}_{\text{DBCM}}(\vec{\alpha}, \vec{\beta}) = -\sum_{i=1}^{N} \left[ \alpha_i k_i^{out}(\mathbf{A}^*) + \beta_i k_i^{in}(\mathbf{A}^*) \right] - \sum_{i=1}^{N} \sum_{\substack{j=1 \\ (j\neq i)}}^{N} \ln\left[ 1 + e^{-\alpha_i - \beta_j} \right] \quad (36)$$

whose first-order optimality conditions read

$$\nabla_{\alpha_i} \mathcal{L}_{\text{DBCM}} = -k_i^{out}(\mathbf{A}^*) + \sum_{\substack{j=1 \\ (j\neq i)}}^{N} \frac{e^{-\alpha_i - \beta_j}}{1 + e^{-\alpha_i - \beta_j}} = -k_i^{out}(\mathbf{A}^*) + \sum_{\substack{j=1 \\ (j\neq i)}}^{N} p_{ij}^{\text{DBCM}} = -k_i^{out}(\mathbf{A}^*) + \langle k_i^{out} \rangle = 0, \quad i = 1 \dots N \quad (37)$$

and

$$\nabla_{\beta_i} \mathcal{L}_{\text{DBCM}} = -k_i^{in}(\mathbf{A}^*) + \sum_{\substack{j=1 \\ (j\neq i)}}^{N} \frac{e^{-\alpha_j - \beta_i}}{1 + e^{-\alpha_j - \beta_i}} = -k_i^{in}(\mathbf{A}^*) + \sum_{\substack{j=1 \\ (j\neq i)}}^{N} p_{ji}^{\text{DBCM}} = -k_i^{in}(\mathbf{A}^*) + \langle k_i^{in} \rangle = 0, \quad i = 1 \dots N. \quad (38)$$

Resolution of the DBCM.
Newton's and the quasi-Newton method can be easily implemented via the recipe defined in eq. (18) (see Appendix A for the definition of the DBCM Hessian).

The fixed-point recipe for solving the system of equations embodying the DBCM first-order optimality conditions can, instead, be re-written in the usual iterative fashion as follows

$$\alpha_i^{(n)} = -\ln\left[ \frac{k_i^{out}(\mathbf{A}^*)}{\sum_{\substack{j=1 \\ (j\neq i)}}^{N} \left( \frac{e^{-\beta_j^{(n-1)}}}{1 + e^{-\alpha_i^{(n-1)} - \beta_j^{(n-1)}}} \right)} \right], \quad i = 1 \dots N$$

$$\beta_i^{(n)} = -\ln\left[ \frac{k_i^{in}(\mathbf{A}^*)}{\sum_{\substack{j=1 \\ (j\neq i)}}^{N} \left( \frac{e^{-\alpha_j^{(n-1)}}}{1 + e^{-\alpha_j^{(n-1)} - \beta_i^{(n-1)}}} \right)} \right], \quad i = 1 \dots N \quad (39)$$

Analogously to the undirected case, the initialization of this recipe has been implemented in three different ways. The first one reads $\alpha_i^{(0)} = -\ln\left[k_i^{out}(\mathbf{A}^*)/\sqrt{L}\right]$ and $\beta_i^{(0)} = -\ln\left[k_i^{in}(\mathbf{A}^*)/\sqrt{L}\right]$, $i = 1 \dots N$, and represents a good approximation to the solution of the system of equations defining the DBCM in the 'sparse case' (i.e. whenever $p_{ij}^{\text{DBCM}} \simeq e^{-\alpha_i - \beta_j}$); the second one is a variant of the position above, reading $\alpha_i^{(0)} = -\ln\left[k_i^{out}(\mathbf{A}^*)/\sqrt{N}\right]$ and $\beta_i^{(0)} = -\ln\left[k_i^{in}(\mathbf{A}^*)/\sqrt{N}\right]$, $i = 1 \dots N$; the third one, instead, prescribes to randomly draw the value of each parameter from a uniform distribution defined on the unit interval, i.e. $\alpha_i^{(0)} \sim U(0,1)$ and $\beta_i^{(0)} \sim U(0,1)$, $\forall i$. As for the UBCM, the identification $p_{ij}^{\text{DBCM}} \equiv \frac{e^{-\alpha_i^{(\infty)} - \beta_j^{(\infty)}}}{1 + e^{-\alpha_i^{(\infty)} - \beta_j^{(\infty)}}}$, $\forall\, i \neq j$, allows the probability coefficients defining the DBCM to be numerically determined.

Reducing the dimensionality of the problem.
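The bookkeeping behind the reduction described below - grouping nodes into classes by their (out-degree, in-degree) pair - can be sketched as follows (the degree sequences and names are ours):

```python
import numpy as np

# out- and in-degree sequences of a small example digraph (ours)
k_out = np.array([1, 2, 1, 2, 0])
k_in  = np.array([2, 0, 1, 0, 2])

# distinct (k, h) = (out-degree, in-degree) pairs and their
# multiplicities n(k, h): nodes in the same class share (alpha, beta)
pairs = np.stack([k_out, k_in], axis=1)
classes, n_kh = np.unique(pairs, axis=0, return_counts=True)
```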
As for the UBCM, we can define a 'reduced' version of the DBCM likelihood, accounting only for the distinct (pairs of) values of the degrees. By defining $k^{out} \equiv k$ and $k^{in} \equiv h$, in order to simplify the formalism, the reduced DBCM likelihood reads

$$\mathcal{L}_{\text{DBCM}}^{\text{reduced}}(\vec{\theta}) = -\sum_{k,h} n(k, h)\left[\alpha_{k,h}\,k + \beta_{k,h}\,h\right] - \sum_{k,h} \sum_{k',h'} n(k, h)\left[n(k', h') - \delta_{kk'}\delta_{hh'}\right] \ln\left[1 + e^{-\alpha_{k,h} - \beta_{k',h'}}\right]; \quad (40)$$

the implementation of the algorithms considered here must be modified in a way that is analogous to the one already described for the UBCM. In particular, the fixed-point recipe for the DBCM can be re-written by assigning to the nodes with the same out- and in-degrees $(k, h)$ the same pair of values $(\alpha, \beta)$, i.e. as

$$\alpha_{k,h}^{(n)} = -\ln\left[ \frac{k}{\sum_{k',h'} \left[n(k', h') - \delta_{kk'}\delta_{hh'}\right] \left( \frac{e^{-\beta_{k',h'}^{(n-1)}}}{1 + e^{-\alpha_{k,h}^{(n-1)} - \beta_{k',h'}^{(n-1)}}} \right)} \right], \quad \forall\, k, h \quad (41)$$

$$\beta_{k,h}^{(n)} = -\ln\left[ \frac{h}{\sum_{k',h'} \left[n(k', h') - \delta_{kk'}\delta_{hh'}\right] \left( \frac{e^{-\alpha_{k',h'}^{(n-1)}}}{1 + e^{-\alpha_{k',h'}^{(n-1)} - \beta_{k,h}^{(n-1)}}} \right)} \right], \quad \forall\, k, h \quad (42)$$

where the sums, now, run over the distinct values of the out- and in-degrees, $n(k, h)$ is the number of nodes whose out-degree is $k$ and whose in-degree is $h$ and, as usual, the last term at the denominator excludes the self-contribution (i.e. the probability that a node links to itself).

Performance testing.
As for the UBCM, the performance of the three algorithms in solving the reduced version of eqs. (37) and (38) has been tested on a bunch of real-world networks, spanning economic, financial and social systems (we recall that the reduced system is obtained by dividing nodes into equivalence classes: any two nodes are equivalent if and only if they have the same couple of out- and in-degrees). In particular, we have considered the World Trade Web (WTW) during the decade 1992-2002 [33], a pair of snapshots of the Bitcoin User Network at the weekly time scale (the first day of those weeks being 13-02-12 and 27-04-15, respectively) [34] together with the corresponding largest weakly connected components (whose sizes amount at ≃65% and ≃90% of the full network size, respectively) and a snapshot of the semantic network concerning the Twitter discussion about the Covid-19 pandemics, i.e. the network of re-tweets of the (online) Italian debate about Covid-19, collected in the period 21st February - 20th April 2020 [35]. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the DBCM has been quantified via the maximum absolute error metrics that, in this case, reads

$$\text{MADE}=\max_{i=1\ldots N}\left\{|k_i^*-\langle k_i\rangle|,\,|h_i^*-\langle h_i\rangle|\right\}\qquad(43)$$

and accounts for the presence of two different degrees per node, the out-degree $k_i$ and the in-degree $h_i$. As for the UBCM, it is the infinite norm of the difference between the vector of the empirical values of the constraints and that of their expected values.

The three different 'stop criteria' we have considered match the ones adopted for analysing the undirected case and consist in a condition on the Euclidean norm of the gradient of the likelihood function, $||\nabla\mathcal{L}(\vec\theta)||_2$, falling below a prescribed (small) threshold, in an analogous condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, $||\Delta\vec\theta||_2$, and in a condition on the maximum number of iterations: after 10000 steps, any of the three algorithms stops.

The results about the performance of our three algorithms are reported in table II. Overall, all recipes perform very satisfactorily, being accurate, fast and scalable; however, while Newton's and the quasi-Newton methods stop because the condition on the norm of the gradient of the likelihood is satisfied, the fixed-point recipe is always found to satisfy the limit condition on the number of steps (i.e. it runs for 10000 steps and, then, stops).

Let us start by commenting the results concerning the WTW. For what concerns accuracy, the largest maximum error per method spans a narrow interval across all configurations (the precise values of the MADE for the three recipes are reported in table II); by looking at each specific network, it is evident that the most accurate method is systematically the quasi-Newton one. For what concerns speed, the amount of time required to achieve convergence ranges from fractions of a second, for Newton's and the quasi-Newton methods, to a few seconds, for the fixed-point recipe (see table II).
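For concreteness, the MADE of eq. (43) can be computed in a few lines. The sketch below (Python with numpy; function and variable names are our own illustrative choices, not the code used for the paper) takes the observed out- and in-degree sequences together with a matrix of model probabilities, whose row and column sums give the expected out- and in-degrees.

```python
import numpy as np

def made_dbcm(k_out, k_in, P):
    """Maximum Absolute Degree Error of eq. (43): the largest deviation
    between observed and expected out-/in-degrees under the DBCM."""
    np.fill_diagonal(P, 0.0)             # no self-loops
    exp_out = P.sum(axis=1)              # expected out-degrees <k_i>
    exp_in = P.sum(axis=0)               # expected in-degrees  <h_i>
    return max(np.abs(k_out - exp_out).max(), np.abs(k_in - exp_in).max())

# toy check: a 3-cycle reproduced exactly gives MADE = 0
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
print(made_dbcm(np.ones(3), np.ones(3), P))  # 0.0
```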
TABLE II. Performance of Newton's, the quasi-Newton and the fixed-point algorithm in solving the reduced system of equations defining the DBCM on the considered networks (for each of them, the total number of nodes, N, the total number of links, L, and the connectance, c = L/N(N−1), are provided, together with the MADE and the running time, in seconds, of each method). While Newton's and the quasi-Newton methods stop because the condition on the norm of the gradient of the likelihood is satisfied, the fixed-point recipe always reaches the limit of 10000 steps. The fastest and most accurate method is systematically the quasi-Newton one. The picture changes when very large networks, as Bitcoin and Twitter, are considered: in these cases, the fastest and most accurate method is the fixed-point one. Only the results corresponding to the best choice of initial conditions are reported.

A comment on the reducibility of the considered configurations is in order. The Bitcoin snapshots are very reducible, i.e. they are very redundant, hosting many nodes with the same out- and in-degrees. To provide a specific example, out of the original 676688 equations defining the DBCM for one of the two Bitcoin snapshots considered here, only ≃339 equations survive the reduction; by converse, the WTW can be reduced to a much smaller extent (to be more specific, only a comparatively small number of the original 324 equations defining the DBCM for the WTW in 1997 turns out to be redundant).

A note on the Hessian matrix is in order as well: since its dimension scales as O(N²) for Newton's method and as O(N) for the quasi-Newton one (which only employs its diagonal), its calculation can be (very) time demanding - beside requiring a lot of memory for the step-wise update of the matrix itself. However, while this is compensated by a larger accuracy in the case of Newton's method, this is no longer true when the quasi-Newton recipe is considered - the reason may lie in the poorer approximation provided by the diagonal of the Hessian matrix in case of systems like these.

As a last comment, we would like to stress that, as in the undirected case, the generic coefficient $p_{ij}^{\text{DBCM}}$ represents a proper probability, in turn implying that eq. (35) also provides us with a recipe to sample the canonical ensemble of BDNs, under the DBCM. Notice that the factorization of the graph probability $P_{\text{DBCM}}(\mathbf{A}|\vec\theta)$ greatly simplifies the entire procedure, allowing a single graph to be sampled by implementing the Bernoulli trial

$$a_{ij}=\begin{cases}0 & \text{with probability }1-p_{ij}^{\text{DBCM}}\\ 1 & \text{with probability }p_{ij}^{\text{DBCM}}\end{cases}\qquad(44)$$

for each (directed) pair of nodes, in either a sequential or a parallel fashion. The sampling process, whose computational complexity amounts at O(N²), can be repeated to generate as many configurations as desired. The pseudo-code for explicitly sampling the DBCM ensemble is summed up by Algorithm 2.
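The Bernoulli trial of eq. (44) translates almost literally into code: the sketch below (Python with numpy; an illustration, not the authors' implementation) draws one configuration from any matrix P of connection probabilities, vectorizing over all ordered pairs - the 'parallel fashion' mentioned above.

```python
import numpy as np

def sample_dbcm(P, seed=None):
    """Draw one binary directed network from the DBCM ensemble:
    a_ij = 1 with probability P[i, j], independently per pair (eq. 44)."""
    rng = np.random.default_rng(seed)
    A = (rng.random(P.shape) <= P).astype(int)
    np.fill_diagonal(A, 0)  # no self-loops
    return A

# degenerate check: p_ij = 1 for every pair yields the complete directed graph
A = sample_dbcm(np.ones((4, 4)))
print(A.sum())  # 12 = N(N-1) links
```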
Algorithm 2 Sampling the DBCM ensemble

for m = 1...|E| do
    A = 0;
    for i = 1...N do
        for j = 1...N and j ≠ i do
            if RandomUniform[0,1] ≤ p_ij^DBCM then
                a_ij = 1;
            else
                a_ij = 0;
            end if
        end for
    end for
    Ensemble[m] = A;
end for

BiCM: bipartite binary undirected graphs with given degree sequences

So far, we have considered monopartite networks. However, the algorithm we have described for solving the DBCM can be adapted, with little effort, to solve a null model designed for bipartite, binary, undirected networks (BiBUNs), i.e. the so-called
Bipartite Configuration Model (BiCM) [36]. These networks are defined by two distinct layers (say, ⊤ and ⊥) and obey the rule that links can exist only between (and not within) layers: for this reason, they can be compactly described via a biadjacency matrix $\mathbf{B}\equiv\{b_{i\alpha}\}_{i,\alpha}$ whose generic entry $b_{i\alpha}$ is 1 if node $i$, belonging to layer ⊥, is linked to node $\alpha$, belonging to layer ⊤, and 0 otherwise. The constraints defining the BiCM are represented by the degree sequences $\{k_i\}_{i\in\bot}$ and $\{d_\alpha\}_{\alpha\in\top}$, where $k_i=\sum_{\alpha\in\top}b_{i\alpha}$ counts the neighbors of node $i$ (belonging to layer ⊥) and $d_\alpha=\sum_{i\in\bot}b_{i\alpha}$ counts the neighbors of node $\alpha$ (belonging to layer ⊤).

Analogously to the DBCM case,

$$P(\mathbf{B}|\vec\gamma,\vec\beta)=\prod_{i\in\bot}\prod_{\alpha\in\top}p_{i\alpha}^{b_{i\alpha}}(1-p_{i\alpha})^{1-b_{i\alpha}}\qquad(45)$$

where $p_{i\alpha}=p_{i\alpha}^{\text{BiCM}}\equiv\frac{e^{-\gamma_i-\beta_\alpha}}{1+e^{-\gamma_i-\beta_\alpha}}$. The canonical ensemble of BiBUNs includes all networks with, say, N nodes on one layer, M nodes on the other layer and a number of links (connecting nodes of different layers) ranging from zero to the maximum value N·M.

The BiCM likelihood function reads

$$\mathcal{L}_{\text{BiCM}}(\vec\gamma,\vec\beta)=-\sum_{i\in\bot}\gamma_i k_i(\mathbf{B}^*)-\sum_{\alpha\in\top}\beta_\alpha d_\alpha(\mathbf{B}^*)-\sum_{i\in\bot}\sum_{\alpha\in\top}\ln\left[1+e^{-\gamma_i-\beta_\alpha}\right]\qquad(46)$$

whose first-order optimality conditions read

$$\nabla_{\gamma_i}\mathcal{L}_{\text{BiCM}}=-k_i(\mathbf{B}^*)+\sum_{\alpha\in\top}\frac{e^{-\gamma_i-\beta_\alpha}}{1+e^{-\gamma_i-\beta_\alpha}}=0,\quad i\in\bot,$$
$$\nabla_{\beta_\alpha}\mathcal{L}_{\text{BiCM}}=-d_\alpha(\mathbf{B}^*)+\sum_{i\in\bot}\frac{e^{-\gamma_i-\beta_\alpha}}{1+e^{-\gamma_i-\beta_\alpha}}=0,\quad\alpha\in\top.\qquad(47)$$

Resolution of the BiCM.
As for the DBCM case, Newton's and the quasi-Newton methods can be implemented by adapting the recipe defined in eq. (18) to the bipartite case (see Appendix A for the definition of the BiCM Hessian).

As for the DBCM, the fixed-point recipe for the BiCM can be re-written in the usual iterative fashion as follows

$$\gamma_i^{(n)}=-\ln\left[\frac{k_i(\mathbf{B}^*)}{\sum_{\alpha\in\top}\left(\frac{e^{-\beta_\alpha^{(n-1)}}}{1+e^{-\gamma_i^{(n-1)}-\beta_\alpha^{(n-1)}}}\right)}\right],\quad\forall\,i\in\bot$$
$$\beta_\alpha^{(n)}=-\ln\left[\frac{d_\alpha(\mathbf{B}^*)}{\sum_{i\in\bot}\left(\frac{e^{-\gamma_i^{(n-1)}}}{1+e^{-\gamma_i^{(n-1)}-\beta_\alpha^{(n-1)}}}\right)}\right],\quad\forall\,\alpha\in\top\qquad(48)$$

and the initialization is similar as well: in fact, we can employ the value of the solution of the BiCM in the sparse case, i.e. $\gamma_i^{(0)}=-\ln\left[k_i(\mathbf{B}^*)/\sqrt{L}\right]$, $\forall\,i\in\bot$ and $\beta_\alpha^{(0)}=-\ln\left[d_\alpha(\mathbf{B}^*)/\sqrt{L}\right]$, $\forall\,\alpha\in\top$ (only this set of initial conditions has been employed to analyse the bipartite case).

Reducing the dimensionality of the problem.
Exactly as for the DBCM case, the presence of nodes with the same degree, on the same layer, leads to the appearance of identical equations in the system above; hence, the computation of the solutions can be sped up by writing

$$\gamma_k^{(n)}=-\ln\left[\frac{k(\mathbf{B}^*)}{\sum_{d}g(d)\left(\frac{e^{-\beta_d^{(n-1)}}}{1+e^{-\gamma_k^{(n-1)}-\beta_d^{(n-1)}}}\right)}\right],\quad\forall\,k$$
$$\beta_d^{(n)}=-\ln\left[\frac{d(\mathbf{B}^*)}{\sum_{k}f(k)\left(\frac{e^{-\gamma_k^{(n-1)}}}{1+e^{-\gamma_k^{(n-1)}-\beta_d^{(n-1)}}}\right)}\right],\quad\forall\,d\qquad(49)$$

where f(k) is the number of nodes, belonging to layer ⊥, whose degree is k and g(d) is the number of nodes, belonging to layer ⊤, whose degree is d.

TABLE III. Performance of Newton's, the quasi-Newton and the fixed-point algorithm in solving the reduced system of eqs. (49) defining the BiCM on the 16 snapshots of the bipartite WTW (for each of them, the total number of nodes, N+M, the total number of links, L, and the connectance, c = L/(N·M), are provided, together with the MADE and the running time, in seconds, of each method). All algorithms stop because the condition on the norm of the gradient of the likelihood is satisfied. For what concerns both accuracy and speed, the best performing method is Newton's one, followed by the quasi-Newton and the fixed-point recipes. Only the results corresponding to the best choice of initial conditions are reported.

Performance testing.
The performance of the three algorithms in solving eqs. (49) has been tested on 16 snapshots of the bipartite, binary, undirected version of the WTW, gathering the country-product export relationships across the years 1995-2010 [36]. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the BiCM has been quantified via the maximum absolute error metrics that, now, reads

$$\text{MADE}=\max\left\{|k_1^*-\langle k_1\rangle|,\ldots,|k_N^*-\langle k_N\rangle|,\,|d_1^*-\langle d_1\rangle|,\ldots,|d_M^*-\langle d_M\rangle|\right\}\qquad(50)$$

to account for the degrees of nodes on both layers.

The three different 'stop criteria' match the ones adopted for analysing the UBCM and the DBCM and consist in a condition on the Euclidean norm of the gradient of the likelihood function, $||\nabla\mathcal{L}(\vec\theta)||_2$, falling below a prescribed threshold, in an analogous condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, $||\Delta\vec\theta||_2$, and in a condition on the maximum number of iterations: after 1000 steps, any of the three algorithms stops.

The results about the performance of our three algorithms are reported in table III. Overall, all recipes are accurate, fast and scalable; all methods stop because the condition on the norm of the gradient of the likelihood is satisfied. For what concerns accuracy, the largest maximum error per method spans a narrow interval across all configurations (see table III); by looking at each specific network, it is evident that the most accurate method is systematically Newton's one. For what concerns speed, the amount of time required to achieve convergence amounts, on average, at $T^{\text{reduced}}_{\text{Quasi-Newton}}\simeq0.016$ and $T^{\text{reduced}}_{\text{fixed-point}}\simeq0.018$ (time is measured in seconds); the fastest method is Newton's one and is followed by the quasi-Newton and the fixed-point recipes. Overall, the gain in terms of speed due to the reducibility of the system of equations defining the BiCM is evident: while solving the original problem would have required handling a system with as many equations as nodes, the reduced one is defined by far fewer, distinct equations. Overall, a solution is always found within thousandths or hundredths of seconds.

As for the DBCM case, the ensemble of BiBUNs can be sampled by implementing a Bernoulli trial $b_{i\alpha}\sim\text{Ber}[p_{i\alpha}]$ for any two nodes (belonging to different layers) in either a sequential or a parallel fashion. The computational complexity of the sampling process amounts at O(N·M) and it can be repeated to generate as many configurations as desired. The pseudo-code for explicitly sampling the BiCM ensemble is summed up by Algorithm 3.
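Algorithm 3 admits a fully vectorized formulation: since the entries $b_{i\alpha}$ are independent Bernoulli variables with probabilities $p_{i\alpha}=e^{-\gamma_i-\beta_\alpha}/(1+e^{-\gamma_i-\beta_\alpha})$, the whole biadjacency matrix can be drawn at once. The snippet below is an illustrative Python/numpy sketch, not the authors' code.

```python
import numpy as np

def sample_bicm(gamma, beta, seed=None):
    """Draw one N x M biadjacency matrix from the BiCM:
    b_{i,alpha} ~ Bernoulli(p_{i,alpha})."""
    rng = np.random.default_rng(seed)
    x = np.exp(-np.add.outer(gamma, beta))  # e^{-gamma_i - beta_alpha}
    P = x / (1.0 + x)
    return (rng.random(P.shape) <= P).astype(int)

# degenerate check: very negative parameters drive p -> 1,
# so the sampled bipartite graph is complete
B = sample_bicm(np.full(3, -50.0), np.full(2, -50.0))
print(B.shape, B.sum())  # (3, 2) 6
```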
Algorithm 3 Sampling the BiCM ensemble

for m = 1...|E| do
    B = 0;
    for i = 1...N do
        for α = 1...M do
            if RandomUniform[0,1] ≤ p_iα^BiCM then
                b_iα = 1;
            else
                b_iα = 0;
            end if
        end for
    end for
    Ensemble[m] = B;
end for

UECM: weighted undirected graphs with given strengths and degrees

So far, we have considered purely binary models. Let us now focus on a class of 'mixed' null models for weighted networks, defined by constraining both binary and weighted quantities (purely weighted models such as the Undirected and the Directed Weighted Configuration Model have not been considered since, as it has been proven elsewhere [37], they perform quite poorly when employed to reconstruct networks). Let us start by the simplest model, i.e. the one constraining the degrees and the strengths in an undirected fashion. While $k_i(\mathbf{A})=\sum_{j(\neq i)=1}^N a_{ij}$ counts the number of neighbors of node $i$, $s_i(\mathbf{W})=\sum_{j(\neq i)=1}^N w_{ij}$ defines the weighted equivalent of the degree of node $i$, i.e. its strength. For consistency, the binary adjacency matrix can be defined via the Heaviside step function, Θ[·], as $\mathbf{A}\equiv\Theta[\mathbf{W}]$, a position indicating that $a_{ij}=1$ if $w_{ij}>0$, $\forall\,i<j$, and zero otherwise. This particular model is known as Undirected Enhanced Configuration Model (UECM) [37-39] and its Hamiltonian reads

$$H_{\text{UECM}}(\mathbf{W},\vec\alpha,\vec\beta)=\sum_{i=1}^N\left[\alpha_i k_i(\mathbf{A})+\beta_i s_i(\mathbf{W})\right];\qquad(51)$$

it induces a probability distribution which is halfway between a Bernoulli and a geometric one [38], i.e.

$$Q_{\text{UECM}}(\mathbf{W}|\vec\alpha,\vec\beta)=\prod_{i=1}^N\prod_{j(<i)}q_{ij}(w_{ij})\qquad(52)$$

where, for any two nodes $i$ and $j$ such that $i<j$, the probability that a (positive) weight connects them reads

$$p_{ij}^{\text{UECM}}=\frac{e^{-\alpha_i-\alpha_j-\beta_i-\beta_j}}{1-e^{-\beta_i-\beta_j}+e^{-\alpha_i-\alpha_j-\beta_i-\beta_j}}.$$

Notice that the functional form above is obtained upon requiring that the weights only assume (non-negative) integer values (i.e. $w_{ij}\in[0,+\infty)$, $\forall\,i<j$): hence, the canonical ensemble is now constituted by the weighted configurations with N nodes and a number of (undirected) links ranging between zero and the maximum value $\binom{N}{2}$.

The argument of the problem (14) for the specific network $\mathbf{W}^*$ now becomes

$$\mathcal{L}_{\text{UECM}}(\vec\alpha,\vec\beta)=-\sum_{i=1}^N\left[\alpha_i k_i(\mathbf{A}^*)+\beta_i s_i(\mathbf{W}^*)\right]-\sum_{i=1}^N\sum_{j(<i)}\ln z_{ij}\qquad(55)$$

where $z_{ij}=\frac{1-e^{-\beta_i-\beta_j}+e^{-\alpha_i-\alpha_j-\beta_i-\beta_j}}{1-e^{-\beta_i-\beta_j}}$, $\forall\,i<j$.

Resolution of the UECM.
Newton's and the quasi-Newton methods can be easily implemented via the recipe defined in eq. (18) (see Appendix A for the definition of the UECM Hessian).

As for the purely binary models, the fixed-point recipe for solving the UECM first-order optimality conditions transforms the following set of consistency equations

$$\alpha_i=-\ln\left[\frac{k_i(\mathbf{A}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\alpha_j-\beta_i-\beta_j}}{1-e^{-\beta_i-\beta_j}+e^{-\alpha_i-\alpha_j-\beta_i-\beta_j}}\right)}\right],$$
$$\beta_i=-\ln\left[\frac{s_i(\mathbf{W}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\alpha_i-\alpha_j-\beta_j}}{(1-e^{-\beta_i-\beta_j})(1-e^{-\beta_i-\beta_j}+e^{-\alpha_i-\alpha_j-\beta_i-\beta_j})}\right)}\right]\qquad(56)$$

(with i = 1...N) into the usual iterative fashion, by considering the parameters at the left hand side and at the right hand side, respectively at the n-th and at the (n−1)-th iteration.
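The fixed-point map of eq. (56) lends itself to a compact vectorized implementation. The following sketch (Python with numpy; names are our own illustrative choices, not the code accompanying the paper) performs one sweep and checks the defining property of the map: parameters that already reproduce the observed degrees and strengths are left unchanged.

```python
import numpy as np

def uecm_step(alpha, beta, k, s):
    """One iteration of the fixed-point map of eq. (56) for the UECM."""
    a = np.add.outer(alpha, alpha)        # alpha_i + alpha_j
    b = np.add.outer(beta, beta)          # beta_i + beta_j
    eb, eab = np.exp(-b), np.exp(-a - b)
    D = 1.0 - eb + eab                    # shared denominator
    num_a = np.exp(-alpha[None, :] - b)   # e^{-alpha_j - beta_i - beta_j}
    num_b = np.exp(-a - beta[None, :])    # e^{-alpha_i - alpha_j - beta_j}
    np.fill_diagonal(num_a, 0.0)          # exclude the j = i terms
    np.fill_diagonal(num_b, 0.0)
    alpha_new = -np.log(k / (num_a / D).sum(axis=1))
    beta_new = -np.log(s / (num_b / ((1.0 - eb) * D)).sum(axis=1))
    return alpha_new, beta_new

# consistency check: feed the map the expected degrees/strengths generated
# by (alpha, beta) itself -- the parameters must be a fixed point
alpha = np.array([0.5, 1.0, 1.5])
beta = np.array([0.3, 0.6, 0.9])
a, b = np.add.outer(alpha, alpha), np.add.outer(beta, beta)
eb, eab = np.exp(-b), np.exp(-a - b)
P = eab / (1.0 - eb + eab)
np.fill_diagonal(P, 0.0)
k, s = P.sum(axis=1), (P / (1.0 - eb)).sum(axis=1)
a_new, b_new = uecm_step(alpha, beta, k, s)
print(np.allclose(a_new, alpha), np.allclose(b_new, beta))  # True True
```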
TABLE IV. Performance of Newton's and the quasi-Newton method in solving the system of equations defining the UECM on the considered snapshots of the WTW (for each of them, the total number of nodes, N, the total number of links, L, and the connectance, c = 2L/N(N−1), are provided, together with the MRDE, the MASE, the MRSE and the running time, in seconds, of each method). While Newton's method stops because the condition on the norm of the gradient of the likelihood is satisfied, the quasi-Newton one always reaches the limit of 10000 steps. The results on accuracy and speed clearly indicate that Newton's method outperforms the quasi-Newton one. Only the results corresponding to the best choice of initial conditions are reported. The results of the fixed-point recipe are not shown.

A reduced version of the iterative recipe above can indeed be written, by assigning the same pair of values (α, β) to the nodes with the same pair of values (k, s): however, the larger heterogeneity of the strengths causes this event to happen more rarely than for purely binary models such as the UBCM and the DBCM.

As for the purely binary cases, three different sets of initial conditions have been considered, whose definition follows from the simplest conceivable generalization of the purely binary cases. In particular, the first set of values reads $\alpha_i^{(0)}=-\ln\left[k_i(\mathbf{A}^*)/\sqrt{L}\right]$ and $\beta_i^{(0)}=-\ln\left[s_i(\mathbf{W}^*)/\sqrt{W}\right]$, i = 1...N; the second set is a variant of the first, reading $\alpha_i^{(0)}=-\ln\left[k_i(\mathbf{A}^*)/\sqrt{N}\right]$ and $\beta_i^{(0)}=-\ln\left[s_i(\mathbf{W}^*)/\sqrt{N}\right]$, i = 1...N; the third recipe, instead, prescribes to randomly draw the value of each parameter from the uniform distribution defined on the unit interval, i.e. $\alpha_i^{(0)}\sim U(0,1)$, $\forall\,i$, and $\beta_i^{(0)}\sim U(0,1)$, $\forall\,i$.

Performance testing.
The performance of the three algorithms in solving the system of equations defining the UECM has been tested on a bunch of real-world networks. In particular, we have considered the WTW during the decade 1990-2000 [40]. Since the weights defining the configurations of the WTW are real numbers, we have rounded them to the nearest integer value, before running the UECM. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the UECM has been now quantified via the maximum relative error metrics, defined, in a perfectly general fashion, as $\max_{i=1\ldots N}\left\{\frac{|C_i^*-\langle C_i\rangle|}{C_i}\right\}$ (where $C_i^*$ is the empirical value of the i-th constraint, $C_i$). In the UECM case, we can define two variants of the aforementioned error, i.e.

$$\text{MRDE}=\max_{i=1\ldots N}\left\{\frac{|k_i^*-\langle k_i\rangle|}{k_i}\right\}\qquad(57)$$
$$\text{MRSE}=\max_{i=1\ldots N}\left\{\frac{|s_i^*-\langle s_i\rangle|}{s_i}\right\}\qquad(58)$$

(the acronyms standing for Maximum Relative Degree Error and Maximum Relative Strength Error). The reason driving this choice lies in the evidence that, in absolute terms, strengths are affected by a larger numerical error than degrees: this, however, doesn't necessarily mean that a given algorithm performs poorly, as the magnitude of an error must always be compared with the numerical value of the quantity it refers to - whence the choice of considering relative scores.

The three different 'stop criteria' we have considered for each algorithm match the ones adopted for analysing the binary cases, consisting in a condition on the Euclidean norm of the gradient of the likelihood function, $||\nabla\mathcal{L}(\vec\theta)||_2$, falling below a prescribed threshold, and in an analogous condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, $||\Delta\vec\theta||_2$.
The third condition concerns the maximum number of iterations: after 10000 steps, any of the three algorithms stops.

The results about the performance of our three algorithms are reported in table IV. Overall, two out of three algorithms (i.e. Newton's and the quasi-Newton methods) perform very satisfactorily, being accurate, fast and scalable; the third one (i.e. the fixed-point recipe), instead, performs very poorly. Moreover, while Newton's method stops because the condition on the norm of the gradient of the likelihood is satisfied, both the quasi-Newton and the fixed-point algorithms are always found to satisfy the limit condition on the number of steps (i.e. they run for 10000 steps and, then, stop).

For what concerns accuracy, the largest maximum errors made by Newton's method (the MRDE and the MRSE, across all configurations) are systematically smaller than those made by the quasi-Newton method; the precise intervals are reported in table IV. For what concerns speed, Newton's method employs tenths of seconds to achieve convergence on each configuration while the quasi-Newton one always requires tens of seconds (specifically, almost thirty seconds for each considered configuration). The results above indicate that the fastest and most accurate method is systematically Newton's one, suggesting that the 'complexity' of the model is such that the information encoded into the Hessian matrix cannot be ignored without consequences on the quality of the solution. The fixed-point algorithm, instead, stops after seconds but is affected by errors whose order of magnitude is systematically large, the MRSE exceeding one.

We also explicitly notice that the MADE basically coincides with the MRDE for all considered configurations, meaning that the largest error made by the algorithms considered here to solve the UECM affects the nodes with the lowest degree (i.e. equal to one). On the other hand, strengths are affected by a larger absolute error (i.e. the MASE, defined as $\text{MASE}=\max_{i=1\ldots N}\left\{|s_i^*-\langle s_i\rangle|\right\}$) than the degrees: if we calculate the MRSE, however, we realize that the largest errors affect very large strengths - hence being perfectly acceptable. For example, let us consider the WTW in 1993: the MASE amounts at 0.1 but, as the MRSE reveals, it affects a very large strength.

Lastly, differences in the speed of convergence of the two methods discussed in this section, caused by the choice of a particular set of initial conditions, are observable: the 'uniform' prescription outperforms the other ones.

Finally, let us comment on the algorithm to sample the UECM ensemble, which can be compactly achieved by implementing a two-step procedure. Let us look back at the formal expression for the pair-specific probability distribution characterizing the UECM: it induces coefficients reading

$$q_{ij}(w)=\begin{cases}1-p_{ij}^{\text{UECM}} & w=0\\ p_{ij}^{\text{UECM}}(1-e^{-\beta_i-\beta_j}) & w=1\\ p_{ij}^{\text{UECM}}\,e^{-\beta_i-\beta_j}(1-e^{-\beta_i-\beta_j}) & w=2\\ p_{ij}^{\text{UECM}}\,(e^{-\beta_i-\beta_j})^2(1-e^{-\beta_i-\beta_j}) & w=3\\ \quad\vdots\end{cases}\qquad(59)$$

in turn suggesting that, for a specific pair of vertices i, j (with i < j), the appearance of the first link is ruled by a Bernoulli distribution with probability $p_{ij}^{\text{UECM}}$ while the remaining (w−1) units of weight can be drawn from a geometric distribution whose parameter reads $e^{-\beta_i-\beta_j}$; in other words, the weight (w−1) is drawn conditionally on the presence of a connection between the two considered nodes. The computational complexity of the sampling process is, again, O(N²). The pseudo-code for explicitly sampling the UECM ensemble is summed up by Algorithm 4.

DECM: weighted directed graphs with given strengths and degrees
Let us now extend the 'mixed' model introduced in the previous section to the case of directed networks. Constraints are, now, represented by four sequences of values, i.e. $\{k_i^{out}\}_{i=1}^N$, $\{k_i^{in}\}_{i=1}^N$, $\{s_i^{out}\}_{i=1}^N$, $\{s_i^{in}\}_{i=1}^N$, where the generic out-degree and in-degree are, respectively, defined as $k_i^{out}(\mathbf{A})=\sum_{j(\neq i)=1}^N a_{ij}$ and $k_i^{in}(\mathbf{A})=\sum_{j(\neq i)=1}^N a_{ji}$, and analogously for the generic out-strength and in-strength, reading $s_i^{out}(\mathbf{W})=\sum_{j(\neq i)=1}^N w_{ij}$ and $s_i^{in}(\mathbf{W})=\sum_{j(\neq i)=1}^N w_{ji}$. Consistency requires that $\mathbf{A}\equiv\Theta[\mathbf{W}]$, as for the UECM case. This model is known as Directed Enhanced Configuration Model (DECM) and its Hamiltonian reads

$$H_{\text{DECM}}(\mathbf{W},\vec\alpha,\vec\beta,\vec\gamma,\vec\delta)=\sum_{i=1}^N\left[\alpha_i k_i^{out}(\mathbf{A})+\beta_i k_i^{in}(\mathbf{A})+\gamma_i s_i^{out}(\mathbf{W})+\delta_i s_i^{in}(\mathbf{W})\right]\qquad(60)$$

in turn, inducing the directed counterpart of the UECM distribution, i.e.

$$Q_{\text{DECM}}(\mathbf{W}|\vec\alpha,\vec\beta,\vec\gamma,\vec\delta)=\prod_{i=1}^N\prod_{j(\neq i)=1}^N q_{ij}(w_{ij}).\qquad(61)$$
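The two-step UECM sampling scheme recalled at the end of the previous section (a Bernoulli trial places the link, a geometric variable adds the excess weight; cf. Algorithm 4) can be sketched as follows. This is an illustrative Python/numpy snippet, not the authors' implementation; note that numpy's geometric generator counts the trials up to the first success, so calling it with success probability $1-e^{-\beta_i-\beta_j}$ directly returns the total weight $w\geq1$ distributed as $(e^{-\beta_i-\beta_j})^{w-1}(1-e^{-\beta_i-\beta_j})$.

```python
import numpy as np

def sample_uecm(P, beta, seed=None):
    """Two-step UECM sampling: for each pair i < j, a Bernoulli trial with
    probability P[i, j] places the link; if present, the weight follows
    P(w) = q^(w-1) * (1 - q) with q = e^{-beta_i - beta_j} (eq. 59)."""
    rng = np.random.default_rng(seed)
    N = len(beta)
    W = np.zeros((N, N), dtype=int)
    q = np.exp(-np.add.outer(beta, beta))
    for i in range(N):
        for j in range(i):
            if rng.random() <= P[i, j]:
                # rng.geometric(p) has support {1, 2, ...}: it returns the
                # total weight, i.e. 1 plus the geometric excess weight
                W[i, j] = W[j, i] = rng.geometric(1.0 - q[i, j])
    return W

# degenerate check: p_ij = 1 places every link, weights are at least 1
W = sample_uecm(np.ones((4, 4)), np.full(4, 2.0), seed=0)
print((W == W.T).all(), (W[np.triu_indices(4, 1)] >= 1).all())  # True True
```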
Algorithm 4 Sampling the UECM ensemble

for m = 1...|E| do
    W = 0;
    for i = 1...N do
        for j = 1...N and j < i do
            if RandomUniform[0,1] ≤ p_ij^UECM then
                w_ij = w_ji = RandomGeometric[e^{−β_i−β_j}];
            else
                w_ij = w_ji = 0;
            end if
        end for
    end for
    Ensemble[m] = W;
end for

The pair-wise probability distribution induced by the DECM reads

$$q_{ij}(w)=\begin{cases}1-p_{ij}^{\text{DECM}} & w=0\\ p_{ij}^{\text{DECM}}\,(e^{-\gamma_i-\delta_j})^{w-1}(1-e^{-\gamma_i-\delta_j}) & w>0\end{cases}\qquad(62)$$

for any two nodes $i$ and $j$ such that $i\neq j$, with $p_{ij}^{\text{DECM}}=\frac{e^{-\alpha_i-\beta_j-\gamma_i-\delta_j}}{1-e^{-\gamma_i-\delta_j}+e^{-\alpha_i-\beta_j-\gamma_i-\delta_j}}$. As for the undirected case, weights are required to assume only (non-negative) integer values (i.e. $w_{ij}\in[0,+\infty)$, $\forall\,i\neq j$): hence, the canonical ensemble is constituted by the weighted configurations with N nodes and a number of (directed) links ranging between zero and the maximum value N(N−1).

The argument of the problem (14) for the specific network $\mathbf{W}^*$ becomes

$$\mathcal{L}_{\text{DECM}}(\vec\alpha,\vec\beta,\vec\gamma,\vec\delta)=-\sum_{i=1}^N\left[\alpha_i k_i^{out}(\mathbf{A}^*)+\beta_i k_i^{in}(\mathbf{A}^*)+\gamma_i s_i^{out}(\mathbf{W}^*)+\delta_i s_i^{in}(\mathbf{W}^*)\right]-\sum_{i=1}^N\sum_{j(\neq i)=1}^N\ln z_{ij}\qquad(63)$$

where $z_{ij}=1+e^{-\alpha_i-\beta_j}\left(\frac{e^{-\gamma_i-\delta_j}}{1-e^{-\gamma_i-\delta_j}}\right)$, $\forall\,i\neq j$, and whose first-order optimality conditions read

$$\nabla_{\alpha_i}\mathcal{L}_{\text{DECM}}=-k_i^{out}(\mathbf{A}^*)+\sum_{j(\neq i)=1}^N p_{ij}^{\text{DECM}}=-k_i^{out}(\mathbf{A}^*)+\langle k_i^{out}\rangle=0,$$
$$\nabla_{\beta_i}\mathcal{L}_{\text{DECM}}=-k_i^{in}(\mathbf{A}^*)+\sum_{j(\neq i)=1}^N p_{ji}^{\text{DECM}}=-k_i^{in}(\mathbf{A}^*)+\langle k_i^{in}\rangle=0,$$
$$\nabla_{\gamma_i}\mathcal{L}_{\text{DECM}}=-s_i^{out}(\mathbf{W}^*)+\sum_{j(\neq i)=1}^N\frac{p_{ij}^{\text{DECM}}}{1-e^{-\gamma_i-\delta_j}}=-s_i^{out}(\mathbf{W}^*)+\langle s_i^{out}\rangle=0,$$
$$\nabla_{\delta_i}\mathcal{L}_{\text{DECM}}=-s_i^{in}(\mathbf{W}^*)+\sum_{j(\neq i)=1}^N\frac{p_{ji}^{\text{DECM}}}{1-e^{-\gamma_j-\delta_i}}=-s_i^{in}(\mathbf{W}^*)+\langle s_i^{in}\rangle=0,\quad i=1\ldots N.\qquad(64)$$

Resolution of the DECM.
Newton's and the quasi-Newton methods can be easily implemented via the recipe defined in eq. (18) (see Appendix A for the definition of the DECM Hessian).

As for the UECM, the fixed-point recipe for solving the DECM first-order optimality conditions transforms the following set of consistency equations

$$\alpha_i=-\ln\left[\frac{k_i^{out}(\mathbf{A}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\beta_j-\gamma_i-\delta_j}}{1-e^{-\gamma_i-\delta_j}+e^{-\alpha_i-\beta_j-\gamma_i-\delta_j}}\right)}\right],$$
$$\beta_i=-\ln\left[\frac{k_i^{in}(\mathbf{A}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\alpha_j-\gamma_j-\delta_i}}{1-e^{-\gamma_j-\delta_i}+e^{-\alpha_j-\beta_i-\gamma_j-\delta_i}}\right)}\right],$$
$$\gamma_i=-\ln\left[\frac{s_i^{out}(\mathbf{W}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\alpha_i-\beta_j-\delta_j}}{(1-e^{-\gamma_i-\delta_j})(1-e^{-\gamma_i-\delta_j}+e^{-\alpha_i-\beta_j-\gamma_i-\delta_j})}\right)}\right],$$
$$\delta_i=-\ln\left[\frac{s_i^{in}(\mathbf{W}^*)}{\sum_{j(\neq i)=1}^N\left(\frac{e^{-\alpha_j-\beta_i-\gamma_j}}{(1-e^{-\gamma_j-\delta_i})(1-e^{-\gamma_j-\delta_i}+e^{-\alpha_j-\beta_i-\gamma_j-\delta_i})}\right)}\right]\qquad(65)$$

(with i = 1...N) into the usual iterative fashion, by considering the parameters at the left hand side and at the right hand side, respectively at the n-th and at the (n−1)-th iteration. A reduced version of the recipe above can be written by assigning the same quadruplet of values (α, β, γ, δ) to the nodes for which the quantities (k^out, k^in, s^out, s^in) have the same value: however, the larger heterogeneity of the strengths causes the DECM to be much less reducible than the purely binary models we have considered in the present contribution.

The three different sets of initial conditions that have been considered generalize the UECM ones: in particular, the first set of values reads $\alpha_i^{(0)}=-\ln\left[k_i^{out}(\mathbf{A}^*)/\sqrt{L}\right]$, $\beta_i^{(0)}=-\ln\left[k_i^{in}(\mathbf{A}^*)/\sqrt{L}\right]$, $\gamma_i^{(0)}=-\ln\left[s_i^{out}(\mathbf{W}^*)/\sqrt{W}\right]$ and $\delta_i^{(0)}=-\ln\left[s_i^{in}(\mathbf{W}^*)/\sqrt{W}\right]$, i = 1...N; the second set of initial conditions can be obtained by simply replacing L with N; the third recipe, as usual, prescribes to randomly draw the value of each parameter from the uniform distribution defined on the unit interval.

Performance testing.
The performance of the three algorithms in solving the system of equations defining the DECM has been tested on a bunch of real-world networks. In particular, we have considered the Electronic Italian Interbank Market (e-MID) during the decade 2000-2010 [41]. Since e-MID weights are real numbers, we have rounded them to the nearest integer value, before running the DECM. Before commenting on the results of our numerical exercises, let us, first, describe how the latter ones have been carried out.

The accuracy of each algorithm in reproducing the constraints defining the DECM has been quantified via the maximum relative error metrics, now reading

$$\text{MRDE}=\max_{i=1\ldots N}\left\{\frac{|k_i^*-\langle k_i\rangle|}{k_i},\frac{|h_i^*-\langle h_i\rangle|}{h_i}\right\}\qquad(66)$$
$$\text{MRSE}=\max_{i=1\ldots N}\left\{\frac{|s_i^*-\langle s_i\rangle|}{s_i},\frac{|t_i^*-\langle t_i\rangle|}{t_i}\right\}\qquad(67)$$

(the acronyms standing for Maximum Relative Degree Error and Maximum Relative Strength Error), where we have defined $k^{out}\equiv k$, $k^{in}\equiv h$, $s^{out}\equiv s$ and $s^{in}\equiv t$ in order to simplify the formalism.

The three different 'stop criteria' we have adopted are the same ones we have considered for both the binary and the undirected, 'mixed' model, i.e. the condition on the Euclidean norm of the gradient of the likelihood function, $||\nabla\mathcal{L}(\vec\theta)||_2$, falling below a prescribed threshold, the analogous condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, $||\Delta\vec\theta||_2$, and the condition on the maximum number of iterations (i.e. after 10000 steps, any of the three algorithms stops).

The results about the performance of our three algorithms are reported in table V. Overall, Newton's method performs very satisfactorily, being accurate, fast and scalable; the quasi-Newton method is accurate as well, although (in some cases, much) slower. The fixed-point recipe, instead, performs very poorly, as for the undirected case.
Moreover, while Newton's method stops because the condition on the norm of the gradient of the likelihood is satisfied, both the quasi-Newton and the fixed-point algorithms are always found to satisfy the limit condition on the number of steps (i.e. they run for 10000 steps and, then, stop).

For what concerns accuracy, the largest maximum errors made by Newton's method (the MRDE and the MRSE, across all configurations) are systematically smaller than those made by the quasi-Newton method; the precise intervals are reported in table V. For what concerns speed, Newton's method employs tens of seconds to achieve convergence on each configuration; the time required by the quasi-Newton method is of the same order of magnitude, although it is systematically larger than the time required by Newton's one. Overall, these results indicate that the fastest and most accurate method is Newton's one. As in the undirected case, the fixed-point algorithm stops after seconds but is affected by errors whose order of magnitude is systematically large, the MRSE exceeding one.

As for the UECM, the MADE basically coincides with the MRDE, for all considered configurations, while strengths are affected by a larger absolute error than the degrees: still, upon calculating the MRSE, we realize that the largest errors affect very large strengths - hence being perfectly acceptable.

Lastly, differences in the speed of convergence of the two methods discussed in this section, caused by the choice of a particular set of initial conditions, are observable: the 'uniform' prescription outperforms the other ones.

Finally, let us comment on the algorithm to sample the DECM ensemble: as for the UECM, it can be compactly achieved by implementing the directed counterpart of the two-step procedure described above. Given a specific pair of vertices i, j (with i ≠ j), the first link can be drawn by sampling a Bernoulli distribution with probability $p_{ij}^{\text{DECM}}$ while the remaining (w−1) units of weight can be drawn from a geometric distribution whose parameter reads $e^{-\gamma_i-\delta_j}$. The computational complexity of the sampling process is, again, O(N²) and the pseudo-code for explicitly sampling the DECM ensemble is summed up by Algorithm 5.

Two-step models for undirected and directed networks
The need of considering network models defined in atwo-step fashion arises from a number of considerations.First, the amount of information concerning binary and8
Newton Quasi-Newton
N L c
MRDE MASE MRSE Time (s) MRDE MASE MRSE Time (s)e-MID ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ · − ≃ . ≃ . · − ≃ ≃ · − ≃ . ≃ . ≃ . · − ≃ · − ≃ · − ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ · − ≃ ≃ . · − ≃ . ≃ . ≃ · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ · − ≃ ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ . · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ · − ≃ . ≃ . ≃ . · − ≃ · − ≃ . · − ≃ . ≃ . · − ≃ ≃ . · − ≃ . N , the total number of links, L ,and the connectance, c = L/N ( N − ||∇ L ( ~θ ) || ≤ − issatisfied, the quasi-Newton one always reaches the limit of 10000 steps. The results on accuracy and speed clearly indicate thatNewton’s method outperforms the quasi-Newton one. Only the results corresponding to the best choice of initial conditionsare reported. The results of the fixed-point recipe are not shown. weighted quantities is often asymmetric: as it has beenpointed out in [42], information concerning a given net-work structure ranges from the knowledge of just a single,aggregated piece of information (e.g. the link density) tothat of entire subgraphs. Indeed, models exist that takeas input any binary, either probabilistic or determinis-tic, network model (i.e. 
any P(A)), while placing link weights optimally, conditionally on the input configurations [19, 42].

Second, recipes like the UECM and the DECM are, generally speaking, difficult to solve; as we have already observed, only Newton's method performs in a satisfactory way, both for what concerns accuracy and speed: hence, easier-to-solve recipes are welcome.

In what follows, we will consider the conditional reconstruction method (hereby, CReM) induced by the Hamiltonian

H_CReM(W, θ) = Σ_{i=1}^N θ_i s_i(W); (68)

in the case of undirected networks, it induces a conditional probability distribution reading

Q(W|A) = Π_{i=1}^N Π_{j=1 (j<i)}^N q_ij(w_ij|a_ij),

with q_ij(w|a_ij = 1) = (θ_i + θ_j) e^{−(θ_i+θ_j)w} for w > 0 (and zero otherwise). The corresponding likelihood, evaluated on the observed weights W*, becomes

G_CReM = −Σ_{i=1}^N θ_i s_i(W*) + Σ_{i=1}^N Σ_{j=1 (j<i)}^N f_ij ln(θ_i + θ_j),

where f_ij is the probability that nodes i and j are linked under the chosen binary model.
Newton's and the quasi-Newton method can still be implemented via the recipe defined in eq. (18) (see Appendix A for the definition of the CReM Hessian). As for the UECM and the DECM, the fixed-point recipe for solving the system of equations embodying the CReM transforms the set of consistency equations

θ_i = [Σ_{j=1 (j≠i)}^N f_ij (1 + θ_j/θ_i)^{-1}] / s_i(W*), i = 1 ... N (73)

into an iterative recipe of the usual form, i.e. by considering the parameters at the left hand side and at the right hand side, respectively, at the n-th and at the (n − 1)-th iteration.

Three different sets of initial conditions have been considered: the first one reads θ_i^(0) = −ln[s_i(W*)/√W], i = 1 ... N; the second one is a variant of the position above, reading θ_i^(0) = −ln[s_i(W*)/√N]; the third one, instead, prescribes to randomly draw the value of each parameter from the uniform distribution defined on the unit interval, i.e. θ_i^(0) ∼ U(0, 1), ∀ i.

When considering directed networks, the conditional probability distribution defining the CReM reads

q_ij(w_ij | a_ij = 1) = (α_i + β_j) e^{−(α_i+β_j)w}, w > 0 (74)

(and zero for w ≤ 0), for all i and j such that i ≠ j; the set of equations (73) can be generalized as follows

α_i = [Σ_{j=1 (j≠i)}^N f_ij (1 + β_j/α_i)^{-1}] / s_i^out(W*), i = 1 ... N
β_i = [Σ_{j=1 (j≠i)}^N f_ji (1 + α_j/β_i)^{-1}] / s_i^in(W*), i = 1 ... N (75)

and analogously for the sets of values initializing them.

Algorithm 5
Sampling the DECM ensemble

for m = 1 ... |E| do
  W = 0;
  for i = 1 ... N do
    for j = 1 ... N and j ≠ i do
      if RandomUniform[0, 1] ≤ p^DECM_ij then
        w_ij = RandomGeometric[e^{−γ_i−δ_j}];
      else
        w_ij = 0;
      end if
    end for
  end for
  Ensemble[m] = W;
end for

Rescaling the CReM algorithm.
Although the equations defining the CReM algorithm cannot be effectively reduced, they can be opportunely rescaled. To this aim, let us consider directed configurations and the system

Σ_{j=1 (j≠i)}^N f_ij / [α_i(κ) + β_j(κ)] = s_i^out(W*)/κ, ∀ i
Σ_{j=1 (j≠i)}^N f_ji / [α_j(κ) + β_i(κ)] = s_i^in(W*)/κ, ∀ i (76)

where the sufficient statistics has been divided by an opportunely defined factor (in this case, κ) and the symbols α_i(κ), α_j(κ), β_i(κ) and β_j(κ) stress that the solution we are searching for is a function of the parameter κ itself. In fact, a solution of the system above reads

α_i* = α_i*(κ)/κ, ∀ i (77)
β_i* = β_i*(κ)/κ, ∀ i (78)

as can be proven upon substituting it back into eqs. (76) and noticing that {α_i*}_{i=1}^N and {β_i*}_{i=1}^N are solutions of the system of equations (76) with κ = 1. As our likelihood maximization problem admits a unique, global maximum, the prescription above allows us to easily identify it. Rescaling will be tested in order to find out whether it enhances our algorithms in some respect (e.g. accuracy or speed).

Performance testing.
Before commenting on the performance of the three algorithms in solving the system of equations defining the CReM, let us stress once more that the formulas presented so far are perfectly general, working for any binary recipe one may want to employ. In what follows, we will test the CReM by posing f_ij ≡ p^UBCM_ij and f_ij ≡ p^DBCM_ij.

To test the effectiveness of our algorithms in solving the CReM on undirected networks we have considered the synaptic network of the worm C. elegans [28] and the eight daily snapshots of the Bitcoin Lightning
[Table VI: performance of Newton's, the quasi-Newton and the fixed-point method in solving the undirected CReM on daily snapshots of the Bitcoin Lightning Network (BLN), reporting the MRDE, MASE, MRSE and the time (in seconds) required to achieve convergence; the numeric entries are not recoverable here. While the other methods stop because the condition on the norm of the likelihood gradient is satisfied, the quasi-Newton one often reaches the limit of 10000 steps. The results on accuracy and speed indicate that Newton's and the fixed-point method compete, outperforming the quasi-Newton one. Only the results corresponding to the best choice of initial conditions are reported.]

Network [32]; the directed version of the CReM has, instead, been solved on the Electronic Italian Interbank Market (e-MID) during the decade 2000-2010 [41]. Before commenting on the results of our numerical exercises, let us first describe how the latter have been carried out.

As for the discrete 'mixed' models, the accuracy of each algorithm in reproducing the constraints defining the CReM has been quantified via the Maximum Relative Degree Error and the Maximum Relative Strength Error metrics, whose definitions are provided by eqs. (57), (58) and (66), (67) for the undirected and the directed case, respectively. Analogously, the three 'stop criteria' for each algorithm are the same ones that we have adopted for the other models (and consist in a condition on the Euclidean norm of the gradient of the likelihood function, ||∇L(θ)||, falling below a fixed threshold; a condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, ||Δθ||, falling below a fixed threshold; and a condition on the maximum number of iterations, i.e.
10000 steps).

The results about the performance of our three algorithms are reported in table VI and in table VII. Let us start by commenting on the results reported in table VI, concerning undirected networks. Generally speaking, Newton's method is the most accurate one, although it scales very badly with the size of the network on which it is tested: the amount of time required by it to achieve convergence steadily grows across the considered configurations. The accuracy and the running time of the quasi-Newton method are intermediate between those of the other two algorithms. The fixed-point method is, on average, the least accurate one for what concerns the weighted constraints, although it outperforms Newton's method, sometimes. For what concerns scalability, the fixed-point method is the least sensitive one to the growing size of the considered configurations: hence, it is also the fastest one. The bad scaling of Newton's and the quasi-Newton method with the network size is due to the evaluation of the Hessian matrix: the number of its entries to be numerically evaluated at each step - O(N²) for Newton's method and O(N) for the quasi-Newton one - can make its calculation (very) time demanding.
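For concreteness, the relative- and absolute-error scores used throughout this section reduce to one-line computations; the following is a minimal NumPy sketch (the function names are ours, not those of eqs. (57)-(67) or of any released code):

```python
import numpy as np

def max_relative_error(observed, expected):
    """MRDE/MRSE-style score: largest relative deviation between the observed
    constraints (degrees or strengths) and their ensemble averages."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.max(np.abs(observed - expected) / observed))

def max_absolute_error(observed, expected):
    """MADE/MASE-style score: largest absolute deviation."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.max(np.abs(observed - expected)))

# e.g. observed degrees vs. their expected values under a fitted model
mrde = max_relative_error([3.0, 5.0, 2.0], [2.97, 5.06, 1.99])
```

The relative scores are dimensionless, which is why the text can compare the errors on degrees and strengths on the same footing; a large absolute error on a very large strength can thus coexist with a small relative one.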
[Table VII: performance of Newton's, the quasi-Newton and the fixed-point method in solving the directed CReM on yearly snapshots of e-MID, reporting the MRDE, MRSE and the time (in seconds) required to achieve convergence, together with N and L; the numeric entries are not recoverable here. All methods stop because the condition on the norm of the likelihood gradient is satisfied. For what concerns accuracy, the two most accurate methods are Newton's and the quasi-Newton one; for what concerns speed, the fastest method is the fixed-point one. Only the results corresponding to the best choice of initial conditions are reported.]

Let us now move to comment on the performance of our algorithms when applied to solve the directed version of the CReM (see table VII). Overall, all methods perform much better than in the undirected case, stopping because the condition on the norm of the likelihood gradient is satisfied. In fact, all of them are very accurate in reproducing the purely binary constraints; for what concerns the weighted constraints, instead, the two most accurate methods are Newton's and the quasi-Newton one, the fixed-point method performing worse than both. For what concerns speed, all methods achieve convergence within a matter of seconds on each configuration, the fixed-point one being systematically the fastest.

As already stressed above, the fact that the e-MID number of nodes remains approximately constant throughout the considered time interval masks the strong dependence of the performance of Newton's and the quasi-Newton method on the network size.

Lastly, while rescaling the system of equations defining the CReM improves neither the accuracy nor the speed of any of the three algorithms considered here, differences in their speed of convergence, caused by the choice of a particular set of initial conditions, are observable: the 'uniform' prescription outperforms the other ones (for both the undirected and the directed version of the CReM).

As usual, let us comment on the algorithm to sample the CReM ensemble - for the sake of simplicity, in the undirected case. As for the UECM, it can be compactly achieved by implementing a two-step procedure, the only difference lying in the functional form of the distribution from which weights are sampled. Given a specific pair of vertices i, j (with i < j), the first link can be drawn from a Bernoulli distribution with probability p^UBCM_ij, while the corresponding weight can be drawn from an exponential distribution whose parameter reads θ_i + θ_j. The computational complexity of the sampling process is, again, O(N²), and the pseudo-code for explicitly sampling the CReM ensemble is summed up by Algorithm 6.

DISCUSSION
The exercises carried out so far have highlighted anumber of (stylized) facts concerning the performanceof the three algorithms tested: in what follows, we willbriefly sum them up.
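Before going through them, the three update schemes can be made concrete on the undirected CReM system Σ_{j≠i} f_ij/(θ_i+θ_j) = s_i. The sketch below (plain NumPy, with our own naming; an illustration, not the NEMTROPY implementation) contrasts a full Newton step, its diagonal quasi-Newton approximation and the fixed-point map of eq. (73):

```python
import numpy as np

def grad(theta, f, s):
    # gradient of the CReM log-likelihood: -s_i + sum_{j != i} f_ij/(theta_i + theta_j)
    pair = f / (theta[:, None] + theta[None, :])
    np.fill_diagonal(pair, 0.0)
    return pair.sum(axis=1) - s

def hessian(theta, f):
    # H_ij = -f_ij/(theta_i + theta_j)^2 for i != j, H_ii = sum_{j != i} H_ij
    h = -f / (theta[:, None] + theta[None, :]) ** 2
    np.fill_diagonal(h, 0.0)
    np.fill_diagonal(h, h.sum(axis=1))
    return h

def newton_step(theta, f, s):
    # full Newton update: requires assembling and solving with the whole Hessian
    return theta - np.linalg.solve(hessian(theta, f), grad(theta, f, s))

def quasi_newton_step(theta, f, s):
    # diagonal (quasi-Newton) approximation: only N Hessian entries are used
    return theta - grad(theta, f, s) / np.diag(hessian(theta, f))

def fixed_point_step(theta, f, s):
    # theta_i <- [sum_{j != i} f_ij (1 + theta_j/theta_i)^(-1)] / s_i
    t = f / (1.0 + theta[None, :] / theta[:, None])
    np.fill_diagonal(t, 0.0)
    return t.sum(axis=1) / s

# small synthetic instance with a known solution theta_true
rng = np.random.default_rng(0)
n = 6
theta_true = rng.uniform(0.8, 1.2, n)
f = rng.uniform(0.1, 0.9, (n, n))
f = (f + f.T) / 2.0
np.fill_diagonal(f, 0.0)
pair = f / (theta_true[:, None] + theta_true[None, :])
np.fill_diagonal(pair, 0.0)
s = pair.sum(axis=1)            # strengths generated by theta_true

theta = np.ones(n)
for _ in range(50):             # Newton converges in a handful of iterations here
    theta = newton_step(theta, f, s)
```

The quasi-Newton and fixed-point variants avoid the linear solve of the full Newton step entirely, which is precisely the per-iteration saving that the discussion below weighs against their lower accuracy.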
Newton’s method.
Overall, Newton's method is very accurate - often, the most accurate one - in reproducing both the binary and the weighted constraints; moreover, it represents the only viable alternative when the most complicated models are considered (i.e. the UECM and the DECM, respectively defined by a system of 2N and 4N coupled, non-linear equations). However, the time required to run Newton's method on a given model seems to be quite dependent on the network size, especially whenever the corresponding system of equations cannot be reduced - see the case of the undirected CReM, run on the Bitcoin Lightning Network. Since one of the reasons for the bad scaling of Newton's method with the network size is the evaluation of the Hessian matrix defining a given model, this algorithm is to be preferred for largely reducible networks.

Quasi-Newton method.
For all the networks considered here, the quasi-Newton method we have implemented is nothing else than the diagonal version of the traditional Newton's method. Even if this choice greatly reduces the number of entries of the Hessian matrix which are needed (i.e. just N elements for the undirected version of the CReM, 2N elements for the UECM and the directed version of the CReM, and 4N elements for the DECM), dimensionality may still represent an issue to achieve fast convergence. Moreover, since the diagonal approximation of the Hessian matrix is not necessarily always a good one, the quasi-Newton method may require more time than Newton's one to achieve the same level of accuracy in reproducing the constraints. However, when such an approximation is a good one, the 'regime' in which the quasi-Newton method outperforms the competitors seems to be the one of small, non-reducible networks (e.g. see the results concerning the DBCM run on the WTW) - although, in cases like these, Newton's method may still be a strong competitor.

Fixed-point method.
From a purely theoretical point of view, the fixed-point recipe is the fastest one, since the time required to evaluate the generic n-th step is (only) due to the evaluation of the model-specific map at the (n − 1)-th iterate. However, its accuracy in reproducing the weighted constraints is the lowest among the three algorithms considered here: naturally, the method is not as accurate as Newton's one, but it is much faster, requiring only a fraction of the time needed by Newton's algorithm on the largest configurations.

Algorithm 6
Sampling the CReM ensemble

for m = 1 ... |E| do
  W = 0;
  for i = 1 ... N do
    for j = 1 ... N and j < i do
      if RandomUniform[0, 1] ≤ p^UBCM_ij then
        w_ij = w_ji = RandomExponential[θ_i + θ_j];
      else
        w_ij = w_ji = 0;
      end if
    end for
  end for
  Ensemble[m] = W;
end for

The 'NEMTROPY' Python package.
As an additional result, we release a comprehensive package, coded in Python, that implements the three aforementioned algorithms on all the ERGMs considered in the present work. Its name is 'NEMTROPY', an acronym standing for 'Network Entropy Maximization: a Toolbox Running On Python', and it is freely downloadable at the following URL: https://github.com/nicoloval/NEMtropy.

Alternative techniques to improve accuracy and speed have been tested as well, such as the one of coupling two of the algorithms considered above. In particular, we have tried to solve the (undirected version of the) CReM by running the fixed-point algorithm and using the solution of the latter as input for the quasi-Newton method. The results are reported in table VIII: as they clearly show, the coupled algorithm is indeed more accurate than the single methods composing it and much faster than the quasi-Newton one (for some snapshots, more accurate and even faster than Newton's method). Techniques like these are, in general, useful to individuate better initial conditions than the completely random ones: a first run of the fastest method may be, in fact, useful to direct the most accurate algorithm towards the (best) solution. This is indeed the case, upon considering that the quasi-Newton method, now, stops because the condition on the norm of the likelihood gradient is satisfied - and not for having reached the limit of 10000 steps.

We would like to end the discussion about the results presented in this contribution by explicitly mentioning a circumstance that is frequently met when studying economic and financial networks. When considering systems like these, the information about the number of neighbours of each node is typically not accessible: as a consequence, the models constraining both binary and weighted information cannot be employed as they have been presented in this contribution. Alternatives exist and rest upon the existence of some kind of relationship between binary and weighted constraints.
[Table VIII: performance of the coupled 'fixed-point + quasi-Newton' algorithm in solving the undirected CReM on daily snapshots of the Bitcoin Lightning Network (BLN), reporting the MRDE, MADE, MRSE and the time (in seconds) required to achieve convergence; the numeric entries are not recoverable here. The quasi-Newton method, now, stops because the condition on the norm of the likelihood gradient is satisfied. As the results reveal, the coupled algorithm is more accurate than the single methods composing it and much faster than the quasi-Newton one - for some snapshots, more accurate and even faster than Newton's method. Only the results corresponding to the best choice of initial conditions are reported.]

In the case of undirected networks, such a relationship is usually written as

e^{−θ_i} = √z s_i (79)

and establishes that the Lagrange multipliers controlling for the degrees are linearly proportional to the strengths. If this is the case (or a valid reason exists for this to be the case), the expression for the probability that any two nodes are connected becomes

p^dcGM_ij = (z s_i s_j) / (1 + z s_i s_j), ∀ i < j (80)

the acronym standing for degree-corrected Gravity Model [43]. The (only) unknown parameter z must be numerically estimated by employing some kind of topological information; this is usually represented by (a proxy of) the network link density, used to instantiate the (only) likelihood condition

L(A*) = ⟨L⟩ = Σ_{i=1}^N Σ_{j=1 (j>i)}^N p^dcGM_ij. (81)
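Since ⟨L⟩ is strictly increasing in z, imposing condition (81) is a one-dimensional root-finding problem. A minimal sketch follows (our own code and naming, with plain bisection standing in for whichever numerical routine one prefers):

```python
import numpy as np

def expected_links(z, s):
    """<L> = sum_{i<j} z*s_i*s_j / (1 + z*s_i*s_j) under the dcGM."""
    prod = np.outer(s, s)
    p = z * prod / (1.0 + z * prod)
    return float(np.triu(p, k=1).sum())

def calibrate_z(s, target_links, lo=1e-12, hi=1e6, iters=200):
    """Bisection in log-space, since z may span many orders of magnitude;
    <L> is monotonically increasing in z, so the bracketing is preserved."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if expected_links(mid, s) < target_links:
            lo = mid
        else:
            hi = mid
    return float(np.sqrt(lo * hi))

# sanity check: recover a known z from the link count it generates
s = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z_true = 0.05
z_hat = calibrate_z(s, expected_links(z_true, s))
```

In practice the target would be the observed (or estimated) number of links L(A*) rather than a value generated from a known z.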
The definition and correct implementation of null models is a crucial issue in network analysis: the present contribution focuses on (a subset of) the ones constituting the so-called ERG framework - a choice driven by the evidence that they are the most commonly employed ones for tasks as different as network reconstruction, pattern detection and graph enumeration. The optimization of the likelihood function associated to them is, however, still problematic, since it involves the resolution of large systems of coupled, non-linear equations.

Here, we have implemented and compared three algorithms for numerical optimization, with the aim of finding the one performing best (i.e. being both accurate and fast) for each model. What emerges from our results is that there is no unique method which is both accurate and fast for all models on all configurations: in this respect, performance is just a trade-off between accuracy and speed. However, some general conclusions can still be drawn.

Newton's method is the one requiring the largest amount of information per step (in fact, the entire Hessian matrix has to be evaluated at each iteration): hence, it is the most accurate one but, for the same reason, often the one characterized by the slowest performance. A major drawback of Newton's method is that of scaling very badly with the network size.

At the opposite extreme lies the fixed-point algorithm, theoretically the fastest one but, often, among the least accurate ones (at least, for what concerns the weighted constraints); the performance of the quasi-Newton method often lies in-between the performances of the two methods above, achieving an accuracy that is larger than the one achieved by the fixed-point algorithm while requiring less time than the one needed by Newton's method.

Overall, while Newton's method seems to perform best on either relatively small or greatly reducible networks, the fixed-point method must be preferred for large, non-reducible configurations.
Deviations from this (oversimplified) picture are, however, clearly visible.

Future work concerns the application of the aforementioned three numerical recipes to the models that have not found place here. For what concerns the set of purely binary constraints, the ones defining the
Reciprocal Binary Configuration Model (RBCM) [15] deserve to be mentioned. For what concerns the 'mixed' constraints, instead, the CReM framework is versatile enough to allow for a test of a number of variants of the methods above, ranging from the discrete versions of the CReM models considered here - defined by the positions

⟨w_ij⟩^und_d-CReM = f_ij / (1 − e^{−β_i−β_j}) (82)

with f_ij ≡ p^UBCM_ij, and

⟨w_ij⟩^dir_d-CReM = f_ij / (1 − e^{−γ_i−δ_j}) (83)

with f_ij ≡ p^DBCM_ij - to the continuous generalizations of the UECM and the DECM, respectively defined by the positions [19]

p^UECM_ij = e^{−α_i−α_j} / (β_i + β_j + e^{−α_i−α_j}) (84)

and

p^DECM_ij = e^{−α_i−β_j} / (γ_i + δ_j + e^{−α_i−β_j}). (85)

AUTHORS CONTRIBUTIONS
FS, TS, GC, MZ developed the methods and de-signed the research. NV, EM, MB, GT performed theanalysis (NV: DBCM, DECM, RBCM; EM: UBCM,UECM, CReM; MB: BiCM; GT: preliminary version ofthe BiCM). FS, TS, GC, MZ wrote the manuscript. Allauthors reviewed and approved the manuscript.
ACKNOWLEDGEMENTS
FS and TS acknowledge support from the European project SoBigData++ (GA. 871042). GC and MZ acknowledge support from the PAI project ORCS ('Optimized Reconstruction of Complex networkS'), funded by the IMT School for Advanced Studies Lucca. FS is also thankful to the Italian 'Programma di Attività Integrata' (PAI) project 'TOol for Fighting FakEs' (TOFFE), funded by IMT School for Advanced Studies Lucca, for its support.
APPENDIX A: COMPUTINGTHE HESSIAN MATRIX
As we showed in the main text, the Hessian matrix of our likelihood function is 'minus' the covariance matrix of the constraints, i.e.

H_ij = ∂²L(θ)/∂θ_i∂θ_j = −Cov[C_i, C_j], i, j = 1 ... M; (86)

interestingly, a variety of alternative methods exists to explicitly calculate the generic entry H_ij, i.e. 1) taking the second derivatives of the likelihood function characterizing the method under analysis, 2) taking the first derivatives of the expectation values of the constraints characterizing the method under analysis, 3) calculating the moments of the pair-specific probability distributions characterizing each method.

UBCM: binary undirected graphs with given degree sequence
The Hessian matrix for the UBCM is an N × N symmetric table with entries reading

H^UBCM: Var[k_i] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij), ∀ i
        Cov[k_i, k_j] = p_ij(1 − p_ij), ∀ i ≠ j (87)

where p_ij ≡ p^UBCM_ij. Notice that Var[k_i] = Σ_{j=1 (j≠i)}^N Cov[k_i, k_j], ∀ i.

DBCM: binary directed graphs with given in-degree and out-degree sequences
The Hessian matrix for the DBCM is a 2N × 2N symmetric table that can be further subdivided into four N × N blocks whose entries read

H^DBCM: Var[k_i^out] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij), ∀ i
        Var[k_i^in] = Σ_{j=1 (j≠i)}^N p_ji(1 − p_ji), ∀ i
        Cov[k_i^out, k_j^in] = p_ij(1 − p_ij), ∀ i ≠ j
        Cov[k_j^out, k_i^in] = p_ji(1 − p_ji), ∀ i ≠ j (88)

while Cov[k_i^out, k_i^in] = Cov[k_i^out, k_j^out] = Cov[k_i^in, k_j^in] = 0 and p_ij ≡ p^DBCM_ij.

Notice that the Hessian matrix of the BiCM mimicks the DBCM one, the only difference being that the probability coefficients are now indexed by i and α: for example, in the BiCM case, one has that Cov[k_i, d_α] = p_iα(1 − p_iα), ∀ i, α.

UECM: weighted undirected graphs with given strengths and degrees

The Hessian matrix for the UECM is a 2N × 2N symmetric table that can be further subdivided into four blocks (each of which with dimensions N × N). In order to save space, the expressions indexed by the single subscript i will be assumed as being valid ∀ i, while the ones indexed by a double subscript i, j will be assumed as being valid ∀ i ≠ j. The entries of the diagonal blocks read

∂²L_UECM/∂α_i² = Var[k_i] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij)
∂²L_UECM/∂α_i∂α_j = Cov[k_i, k_j] = p_ij(1 − p_ij) (89)

and

∂²L_UECM/∂β_i² = Var[s_i] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij + e^{−β_i−β_j}) / (1 − e^{−β_i−β_j})²
∂²L_UECM/∂β_i∂β_j = Cov[s_i, s_j] = p_ij(1 − p_ij + e^{−β_i−β_j}) / (1 − e^{−β_i−β_j})² (90)

where p_ij ≡ p^UECM_ij. On the other hand, the entries of the off-diagonal blocks read

∂²L_UECM/∂α_i∂β_i = Cov[k_i, s_i] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij) / (1 − e^{−β_i−β_j})
∂²L_UECM/∂α_i∂β_j = Cov[k_i, s_j] = p_ij(1 − p_ij) / (1 − e^{−β_i−β_j}) (91)

with p_ij ≡ p^UECM_ij.

DECM: weighted directed graphs with given strengths and degrees
The Hessian matrix for the DECM is a 4N × 4N symmetric table that can be further subdivided into sixteen blocks (each of which with dimensions N × N). As for the UECM, in order to save space, the expressions indexed by the single subscript i will be assumed as being valid ∀ i, while the ones indexed by a double subscript i, j will be assumed as being valid ∀ i ≠ j. The entries of the diagonal blocks read

∂²L_DECM/∂α_i² = Var[k_i^out] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij)
∂²L_DECM/∂α_i∂α_j = Cov[k_i^out, k_j^out] = 0 (92)

and

∂²L_DECM/∂β_i² = Var[k_i^in] = Σ_{j=1 (j≠i)}^N p_ji(1 − p_ji)
∂²L_DECM/∂β_i∂β_j = Cov[k_i^in, k_j^in] = 0 (93)

and

∂²L_DECM/∂γ_i² = Var[s_i^out] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij + e^{−γ_i−δ_j}) / (1 − e^{−γ_i−δ_j})²
∂²L_DECM/∂γ_i∂γ_j = Cov[s_i^out, s_j^out] = 0 (94)

and

∂²L_DECM/∂δ_i² = Var[s_i^in] = Σ_{j=1 (j≠i)}^N p_ji(1 − p_ji + e^{−γ_j−δ_i}) / (1 − e^{−γ_j−δ_i})²
∂²L_DECM/∂δ_i∂δ_j = Cov[s_i^in, s_j^in] = 0 (95)

where p_ij ≡ p^DECM_ij.
On the other hand, the entries of the off-diagonal blocks read

∂²L_DECM/∂α_i∂β_i = Cov[k_i^out, k_i^in] = 0
∂²L_DECM/∂α_i∂β_j = Cov[k_i^out, k_j^in] = p_ij(1 − p_ij) (96)

and

∂²L_DECM/∂α_i∂γ_i = Cov[k_i^out, s_i^out] = Σ_{j=1 (j≠i)}^N p_ij(1 − p_ij) / (1 − e^{−γ_i−δ_j})
∂²L_DECM/∂α_i∂γ_j = Cov[k_i^out, s_j^out] = 0 (97)

and

∂²L_DECM/∂α_i∂δ_i = Cov[k_i^out, s_i^in] = 0
∂²L_DECM/∂α_i∂δ_j = Cov[k_i^out, s_j^in] = p_ij(1 − p_ij) / (1 − e^{−γ_i−δ_j}) (98)

and

∂²L_DECM/∂β_i∂γ_i = Cov[k_i^in, s_i^out] = 0
∂²L_DECM/∂β_i∂γ_j = Cov[k_i^in, s_j^out] = p_ji(1 − p_ji) / (1 − e^{−γ_j−δ_i}) (99)

and

∂²L_DECM/∂β_i∂δ_i = Cov[k_i^in, s_i^in] = Σ_{j=1 (j≠i)}^N p_ji(1 − p_ji) / (1 − e^{−γ_j−δ_i})
∂²L_DECM/∂β_i∂δ_j = Cov[k_i^in, s_j^in] = 0 (100)

and

∂²L_DECM/∂γ_i∂δ_i = Cov[s_i^out, s_i^in] = 0
∂²L_DECM/∂γ_i∂δ_j = Cov[s_i^out, s_j^in] = p_ij(1 − p_ij + e^{−γ_i−δ_j}) / (1 − e^{−γ_i−δ_j})² (101)

with p_ij ≡ p^DECM_ij.

Two-step models for undirected and directed networks
The Hessian matrix for the undirected two-step model considered here is an N × N symmetric table reading

H^undCReM: Var[s_i] = Σ_{j=1 (j≠i)}^N f_ij / (θ_i + θ_j)², ∀ i
           Cov[s_i, s_j] = f_ij / (θ_i + θ_j)², ∀ i ≠ j (102)

where f_ij is given. In the directed case, instead, the Hessian matrix for the two-step model considered here is a 2N × 2N symmetric table that can be further subdivided into four N × N blocks whose entries read

H^dirCReM: Var[s_i^out] = Σ_{j=1 (j≠i)}^N f_ij / (α_i + β_j)², ∀ i
           Var[s_i^in] = Σ_{j=1 (j≠i)}^N f_ji / (α_j + β_i)², ∀ i
           Cov[s_i^out, s_j^in] = f_ij / (α_i + β_j)², ∀ i ≠ j (103)

while Cov[s_i^out, s_i^in] = Cov[s_i^out, s_j^out] = Cov[s_i^in, s_j^in] = 0 and f_ij is given.

APPENDIX B: A NOTE ON THE CHANGE OF VARIABLES
In all the methods considered in the present work, the variable θ_i appears in the optimality conditions only through negative exponential functions: it is therefore tempting to perform the change of variable x_i ≡ e^{−θ_i}. Although this is often done in the literature, one cannot guarantee that the new optimization problem remains convex: in fact, simple examples can be provided for which convexity is lost. This has several consequences, e.g. 1) convergence to the global maximum is no longer guaranteed (since the existence of a global maximum is no longer guaranteed as well), 2) extra care is needed to guarantee that the Hessian matrix H employed in our algorithms is negative definite. While problem 2) introduces additional complexity only for Newton's method, problem 1) is more serious from a theoretical point of view.

Let us now address problem 1) in more detail. First, it is possible to prove that any stationary point for L(x) satisfies the optimality conditions for L(θ) as well. In fact, the application of the 'chain rule' leads to recover the set of relationships

∂L(θ)/∂θ_i = (∂x_i/∂θ_i) ∂L(x)/∂x_i = −x_i ∂L(x)/∂x_i, i = 1 ... M; (104)

notice that requiring ∇_{θ_i} L(θ) = 0 leads to require that either ∇_{x_i} L(x) = 0 or x_i = 0. As the second eventuality precisely identifies isolated nodes (i.e. the nodes for which the constraint C_i(G*), controlled by the multiplier θ_i, is 0), one can get rid of it by explicitly removing the corresponding addenda from the likelihood function.

For what concerns convexity, let us explicitly calculate the Hessian matrix for the set of variables {x_i}_{i=1}^M. In formulas,

∂²L(x)/∂x_i² = e^{2θ_i} [∂²L(θ)/∂θ_i² + ∂L(θ)/∂θ_i], i = 1 ... M,
∂²L(x)/∂x_i∂x_j = e^{θ_i+θ_j} [∂²L(θ)/∂θ_i∂θ_j], ∀ i ≠ j (105)

according to the 'chain rule' for second-order derivatives. More compactly,

H_L(x) = e^Θ ∘ (−Cov[C_i, C_j] + I · ∇_θ L(θ)) (106)

where I is the identity matrix, the generic entry of the matrix e^Θ reads [e^Θ]_ij ≡ e^{θ_i+θ_j}, ∀ i, j, and the symbol '∘' indicates the Hadamard (i.e. element-wise) product of matrices. In general, the expression above defines an indefinite matrix, i.e. one that is neither positive nor negative (semi)definite.

APPENDIX C: FIXED POINT METHOD IN THE MULTIVARIATE CASE
Equation (21) can be written as

θ_i^(n) = G_i(θ^(n−1)), i = 1 ... N; (107)

for the sake of illustration, let us discuss it for the UBCM case. In this particular case, the set of equations above can be rewritten as

θ_i^(n) = −ln [ k_i(A*) / Σ_{j=1 (j≠i)}^N ( e^{−θ_j^(n−1)} / (1 + e^{−θ_i^(n−1)−θ_j^(n−1)}) ) ], i = 1 ... N. (108)

Since all components of the map G are continuous on R^N, the map itself is continuous on R^N. Hence, a fixed point exists. Let us now consider its Jacobian matrix and check the magnitude of its elements. In the UBCM case, one finds that

∂G_i/∂θ_i = [Σ_{j=1 (j≠i)}^N e^{−2(θ_i+θ_j)} / (1 + e^{−θ_i−θ_j})²] / [Σ_{j=1 (j≠i)}^N e^{−θ_i−θ_j} / (1 + e^{−θ_i−θ_j})] = 1 − Var[k_i]/⟨k_i⟩ = 1 − Σ_{j≠i} Cov[k_i, k_j]/⟨k_i⟩, ∀ i (109)

and

∂G_i/∂θ_j = −[e^{−θ_i−θ_j} / (1 + e^{−θ_i−θ_j})²] / [Σ_{j=1 (j≠i)}^N e^{−θ_i−θ_j} / (1 + e^{−θ_i−θ_j})] = −Cov[k_i, k_j]/⟨k_i⟩, ∀ i ≠ j. (110)

Let us notice that 1) each element of the Jacobian matrix is a continuous function R^N → R and that 2) the following relationships hold

|∂G_i/∂θ_j| ≤ 1, ∀ i, j; (111)

unfortunately, however, when multivariate functions are considered, the set of conditions above is not enough to ensure convergence to the fixed point for any choice of the initial value of the parameters. What needs to be checked is the condition ||J_G(θ)|| <
1, with J indicating the Jacobian of the map (i.e. the matrix of the first partial derivatives above) and ||·|| any natural matrix norm: the validity of such a condition has been numerically verified case by case.

∗ [email protected]
[1] M.E.J. Newman, Networks: an introduction, Oxford University Press (2010).
[2] V. Colizza, A. Barrat, M. Barthelemy, A. Vespignani,
Proceedings of the National Academy of Sciences (7),2015-2020 (2006).[3] A. Barrat, M. Barthlemy, A. Vespignani,
Dynamical pro-cesses on complex networks , Cambridge University Press(2008).[4] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, A.Vespignani,
Rev. Mod. Phys. , 925 (2015).[5] C. Castellano, S. Fortunato, V. Loreto Rev. Mod. Phys. , 591 (2009).[6] T. Squartini, I. van Lelyveld, D. Garlaschelli, Sci. Rep. , 3357 (2013).[7] G. Cimini,T. Squartini, F. Saracco, D. Garlaschelli, A.Gabrielli, G. Caldarelli. Nat. Rev. Phys. (1), 58-71(2019). [8] S. Maslov and K. Sneppen, Science (5569), 910-913(2002).[9] A.C.C. Coolen, A. De Martino, A. Annibale,
J. Stat.Phys. , 1035-1067 (2009).[10] E.S. Roberts, A.C.C. Coolen,
Phys. Rev. E (4),046103 (2012).[11] Y. Artzy-Randrup, L. Stone, Phys. Rev. E (5), 056708(2005).[12] C.I. Del Genio, H. Kim, Z. Toroczkai, K.E. Bassler, PLoSOne (4), e10012 (2010).[13] H. Kim, C.I. Del Genio, K.E. Bassler, Z. Toroczkai, NewJ. Phys. , 023012 (2012).[14] J. Blitzstein, P. Diaconis, Internet Mathematics (4),489-522 (2011).[15] T. Squartini, D. Garlaschelli, New. J. Phys. , 083001(2011).[16] J. Park, M.E.J. Newman, Phys. Rev. E (6), 066117(2004).[17] G. Bianconi, Europhys. Lett. (2), 28005 (2007).[18] A. Fronczak, P. Fronczak, J.A. Holyst, Phys. Rev. E (1), 016108 (2006).[19] A. Gabrielli, R. Mastrandrea, G. Caldarelli, G. Cimini, Phys. Rev. E (3), 030301(R) (2019).[20] A. Fronczak, Exponential random graph models in Ency-clopedia of Social Network Analysis and Mining, Springer(edited by R. Alhajj and J. Rokne) (2014).[21] E.T. Jaynes,
Phys. Rev. (4), 620-630 (1957).[22] N. Dianati, arXiv:1607.01735 (2016).[23] N. Vallarano, C. Tessone, T. Squartini, arXiv:2005.00114 (2020).[24] D. Garlaschelli, M.I. Loffredo,
Phys. Rev. E (1),015101(R) (2008).[25] J. Nocedal, S.J. Wright, Numerical Optimization ,Springer (2006).[26] S. Boyd, L. Vandenberghe,
Convex Optimization , Cam-bridge University Press (2004).[27] F. Chung, L. Lu,
Ann. Combinatorics , , 125-145 (2002).[28] K. Oshio, Y. Iwasaki, S. Morita, Y. Osana, S. Gomi, E.Akiyama, K. Omata, K. Oka and K. Kawamura, Tech.Rep. of CCeP, Keio Future , (Keio University, 2003).[29] V. Colizza, R. Pastor–Satorras and A. Vespignani, Nat.Phys. , 276-282 (2007).[30] http://dip.doe-mbi.ucla.edu/dip/Main.cgi[31] V. Colizza, A. Flammini, M.A. Serrano and A. Vespig-nani, Nat. Phys. , 110-115 (2006).[32] J.-H. Lin, K. Primicerio, T. Squartini, C. Decker,C.J.Tessone, New J. Phys. , 083022 (2020).[33] T. Squartini, G. Fagiolo, D. Garlaschelli, Phys. Rev. E , 046117 (2011).[34] A. Bovet, C. Campajola, F. Mottes, V. Restocchi, N.Vallarano, T. Squartini, C,J. Tessone, arXiv:1907.03577 (2019).[35] G. Caldarelli, R. de Nicola, M. Petrocchi, M. Pratelli, F.Saracco, arXiv:2010.01913 (2020).[36] F. Saracco, R. Di Clemente, A. Gabrielli, T. Squartini, Sci. Rep. , 10595 (2015).[37] R. Mastrandrea, T. Squartini, G. Fagiolo, D. Gar-laschelli, New J. Phys. , 043022 (2014).[38] D. Garlaschelli, M.I. Loffredo, Phys. Rev. Lett. (3),038701 (2009).[39] R. Mastrandrea, T. Squartini, G. Fagiolo, D. Gar-laschelli,
Phys. Rev. E (6), 062804 (2014).[40] K. Gleditsch, J. Conflict Resol. , 712-24 (2002).[41] G. Iori, G. De Masi, O. Vasile Precup, G. Gabbi, G. Caldarelli,
J. Econ. Dyn. Control (1), 259-278 (2006).[42] F. Parisi, T. Squartini, D. Garlaschelli, New J. Phys. ,053053 (2020).[43] G. Cimini, T. Squartini, A. Gabrielli, D. Garlaschelli, Phys. Rev. E , 040802 (2015).[44] G. Cimini, T. Squartini, A. Gabrielli, D. Garlaschelli, Sci. Rep.5