Entropic dynamics of networks
Felipe Xavier Costa, Pedro Pessoa
Department of Physics, University at Albany - SUNY, Albany, NY, USA
Abstract
Here we present the entropic dynamics formalism for networks, that is, a framework for the dynamics of graphs meant to represent a network, derived from the principle of maximum entropy, in which the rate of transition is obtained taking into account the natural information geometry of probability distributions. We apply this framework to the Gibbs distribution of random graphs obtained with constraints on the node connectivity. The information geometry for this graph ensemble is calculated and the dynamical process is obtained as a diffusion equation. We compare the steady state of this dynamics to degree distributions found in real-world networks.
Keywords: Random graphs, Networks, Scale-free networks, Maximum Entropy, Information geometry, Entropic Dynamics, Information theory

1 Introduction
Since the work of Jaynes [1, 2], the method of maximum entropy (MaxEnt) has explained thermodynamics in terms of information theory by deriving Gibbs distributions in the context of statistical mechanics. Those distributions arise as the result of a well-posed problem, namely selecting the distribution that is least informative under a set of expected value constraints. The Gibbs distributions also coincide with what is known in statistical theory as the exponential family – the only distributions for which a finite set of sufficient statistics, the functions that generate the expected values, exists (see e.g. [3]). Under this general understanding it is not surprising that, after Jaynes, MaxEnt has been presented as a general method for inference [4–7] and has been applied in a large range of subjects such as, but not limited to, economics [8, 9], ecology [10, 11], cell biology [12, 13], opinion dynamics [14, 15] and geography [16, 17]. This modern perspective sees MaxEnt as a method for updating probability distributions when new information about the system becomes available. Under this understanding it is also not surprising that the methods of Bayesian inference [18] and several machine learning techniques [19], including those used in image processing [20, 21] and deep learning [22, 23], are found to be particular applications of MaxEnt – either as a consequence of Bayes' theorem or directly.

Gibbs distributions have been studied in the context of random graphs by Park and Newman [24], leading to an extensive investigation of MaxEnt applications in network science [25–31]. In this plethora of investigations, many models are proposed as different choices of the sample space (e.g. simple graphs, weighted graphs) and of the sufficient statistics, functions defined over the possible graphs (e.g. total connectivity, node degree sequences, and average nearest neighbour connectivity). In these MaxEnt procedures, however, choosing a function as sufficient statistics does not mean that a precise number is known for its expected value. For example, MaxEnt obtains the correct distribution of link placement for scale-free networks when one chooses the degree of each node as sufficient statistics; the power law behavior of the node degrees is fitted a posteriori with data. However, as explained by Radicchi et al. [31], this MaxEnt model alone does not justify why in some networks the node degrees are highly heterogeneous. An external model for sampling degree sequences is therefore needed.

This issue illustrates the fact that, despite its generality, MaxEnt cannot tell by itself which constraints are relevant to a specific problem. Constraints are justified by the fact that they work in practice, that is, they lead to a model that accurately describes the system of interest. For example, in physics [32] we can assume that the microscopic world follows a conservative (Hamiltonian) dynamics, leading to expected value constraints on conserved quantities.
Ultimately, unavoidable scientific labor is necessary to understand which constraints correctly implement the information one has about the system of interest.

On the other hand, the principles of information theory can be used to obtain the laws of dynamics of stochastic dynamical systems, with the transition probabilities derived by maximizing an entropy. This idea is referred to as entropic dynamics (EntDyn) [33] and has already found successful applications in quantum mechanics [34], quantum fields [35], renormalization groups [36], finance [37], and neural networks [38]. (For the applications of EntDyn in quantum mechanics and quantum field theory the constraints must be chosen so that the Hamiltonian structure is recovered.) In a recent paper [39] we presented an entropic formalism for dynamics in a space of Gibbs distributions. The dynamics developed there relies on the concepts of information geometry [40–42], an area of investigation that assigns a differential geometric structure to the space of probability distributions. (Incidentally, information geometry has also been applied to generate measures of complexity, see e.g. [43–46].) The dynamics obtained is a diffusion process in the Gibbs statistical manifold – the space of Gibbs distributions parametrized by the expected values and endowed with a Riemannian metric from information geometry.

A widely known example of dynamics of networks considers a preferential attachment mechanism for the evolution of node degrees (see e.g. [47–49]), leading to scale-free networks, where the degree distribution follows a power law. On the other hand, there has been literature challenging reported scale-free networks [50] and power laws in general [51]. These works argue that the power law behavior in real-world networks does not survive strong statistical validation, which indicates that scale-free networks are to be expected only from highly idealized processes and that further dynamical models, accounting for the peculiarities of each particular system, are in order [52, 53].

Our goal with the present article is to show how EntDyn provides a systematic way to derive the dynamics of network ensembles. As an example, we apply the EntDyn developed in [39] to the space of Gibbs distributions of graphs obtained after choosing the node degrees as constraints. We compare the steady-state distributions obtained from EntDyn to the distributions found in real-world networks [50]. These results are not based on an underlying dynamics with particular assumptions; rather, they are a consequence of the information geometry of network ensembles. Although the dynamics developed here is simple, we comment on how the framework provided by EntDyn is flexible enough that further constraints can be implemented to account for the information available about the dynamical process.

In the following section we present the random graph model used and the maximum entropy distributions and information geometry derived from it. In section 3 we review the entropic dynamics presented in [39] in the context of random graphs, obtaining a differential equation for the dynamics of networks. In section 4 we find the steady states of the differential equation and argue how the power law behavior emerges from the dynamics.

2 Random graphs

In this section we will establish the random graph model for the present article and obtain the Gibbs distribution and the metric tensor for its information geometric structure.
A graph is defined by a set of nodes (or vertices) V and a set of links (or edges) E; each link connects two nodes, ε = (i, j) ∈ E, where i, j ∈ {1, 2, …, |V|} are elements of an enumeration of the set of nodes V. In network science, meaning is attached to the elements of a graph model: its nodes represent entities and its links represent interactions between entities (for example, networks where links represent publication coauthorships between scientists – the nodes – [54], or links representing associations between genes and diseases [55] or between scientific concepts [56]). For the scope of the present article we treat graphs in a general manner, without attributing any information related to what the random graph may represent. Because of this, the constraints defined here and the dynamical assumptions in the next section will be as general as possible.
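To fix conventions, here is a minimal sketch (our own toy example; the edge list and variable names are illustrative, not taken from the paper) of how a graph and its degree sequence can be represented:

```python
import numpy as np

# A toy graph: N = 4 nodes and L = 5 links, each link eps = (i, j).
N = 4
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]  # hypothetical edge list
L = len(edges)

# Degree k_i: the number of link ends attached to node i.
degrees = np.zeros(N, dtype=int)
for i, j in edges:
    degrees[i] += 1
    degrees[j] += 1

# By construction the degrees sum to twice the number of links.
print(degrees, degrees.sum() == 2 * L)
```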
2.1 Maximum entropy

At this stage we will attribute, through MaxEnt, a probability distribution ρ(G) to each graph (microstate) G = (V, E). Inspired by Radicchi et al. [31] – although similar descriptions have been proposed before, e.g. [24, 26] – we will suppose a graph with N = |V| nodes and L = |E| connections, and constraints on the number of connections (also referred to as degree or connectivity) of each node i.

To obtain the appropriate distribution ρ(G) we ought to maximize the functional

S[ρ|q] = −∑_G ρ(G) log[ρ(G)/q(G)] = −∑_E ρ(E) log ρ(E),   (1)

where q is a prior distribution. The functional S[ρ|q] in (1) is known as the Kullback-Leibler (KL) entropy, reducing to the Shannon entropy when q is uniform. (Even though the network problem might lead one to expect power laws, this is no reason to move away from the KL entropy: it has been widely reported that functionals proposed to replace it, such as Rényi's or Tsallis', induce correlations not present in the prior or constraints [57–60] and therefore lead to inconsistent statistics.) The last equality in (1) holds as we assume that we are inferring over an already known number of nodes N and a uniform prior. Each link is treated independently under the same constraints of node connectivity. Shannon entropy is additive – for independent subsystems the joint entropy is the sum of the entropies of each subsystem – and preserves subsystem independence [7] – if two subsystems are independent in the prior and the constraints do not require correlations, the posterior (the distribution that maximizes the entropy) will also be independent for each subsystem. Therefore, with separate constraints for each link, (1) reduces to

S[ρ|q] = −∑_E (∏_ε ρ(ε)) log(∏_ε ρ(ε)) = L s[ρ],   (2)

where s is the entropy per link,

s[ρ] = −∑_ε ρ(ε) log ρ(ε).   (3)

Thus maximizing S is equivalent to maximizing the entropy per link s.

To implement in our MaxEnt procedure that the relevant information is the degree of each node, we propose the sufficient statistics a_i(ε = (j,m)) = (δ_ij + δ_im)/2, so that the expected value constraints are

∑_{j,m} ρ(ε = (j,m)) (δ_ij + δ_im)/2 = k_i/(2L) = A_i,   (4)

where k_i is the expected degree of each node i. The 2L factor is included so that the expected values A_i sum to unity since, by construction, the sum of degrees is twice the number of connections. The function that maximizes (3) under (4) and normalization is the Gibbs distribution

ρ(i,j|λ) = (1/Z) e^{−λ_i−λ_j},  where  Z = ∑_{i,j} e^{−λ_i−λ_j} = (∑_i e^{−λ_i})²,   (5)

and λ = {λ_1, λ_2, …, λ_N} is the set of Lagrange multipliers dual to the expected values A = {A_1, A_2, …, A_N}. Equation (5) gives the probability that a link ε connects the nodes i and j given the set of Lagrange multipliers. However, (4) indicates that the distribution can also be parametrized by the expected values A. The two sets of parameters are related by

A_i = −(1/2Z) ∂Z/∂λ_i = e^{−λ_i} / ∑_j e^{−λ_j}.   (6)

Equating the previous result with (4) we obtain k_i = e^{−λ_i} – using the freedom left by the redundant constraints to set ∑_j e^{−λ_j} = 2L – allowing us to write

ρ(ε = (i,j)|A) = k_i k_j/(2L)² = A_i A_j.   (7)

That is, we can interpret A_i as the probability that a specific link ε has the node i at one of its ends.
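As a numerical illustration of (7) (a sketch with our own variable names, not code from the paper), one can draw the two ends of each link independently with probabilities A_i and check that the empirical degrees approach k_i = 2L A_i:

```python
import numpy as np

rng = np.random.default_rng(0)

N, L = 5, 10_000
k = np.array([1.0, 2.0, 3.0, 6.0, 8.0]) * (2 * L / 20.0)  # expected degrees, summing to 2L
A = k / (2 * L)                                           # A_i = k_i/(2L), summing to one

# Per (7), rho(eps = (i, j) | A) = A_i A_j: both ends of a link are drawn
# independently from the same distribution A.
ends = rng.choice(N, size=(L, 2), p=A)

# Empirical degrees: count how many link ends land on each node.
k_emp = np.bincount(ends.ravel(), minlength=N)
print(np.round(k_emp / k, 2))  # ratios close to 1 for large L
```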
For reasons that will be presented later in our investigation, it is also useful to calculate the entropy at its maximum as a function, rather than a functional, of the expected values, meaning

s(A) ≐ s[ρ(i,j|A)] = −2 ∑_i A_i log A_i,   (8)

where the last equality is found by substituting (7) into (3). Since we can parametrize the space of probability distributions by the expected values A_i, we will use those as coordinates when assigning a geometry to this space in the following subsection.

2.2 Information geometry

Our present goal is to assign a Riemannian geometric structure to the distributions defined in (7). That means the space of Gibbs distributions is uniquely defined by the values of A, and the distances obtained from dℓ² = ∑_{i,j} g_ij dA_i dA_j are a measure of distinguishability between the neighbouring distributions ρ(i,j|A) and ρ(i,j|A + dA). The metric components g_ij are given by the Fisher-Rao information metric (FRIM) [61, 62],

g_ij = ∑_{m,n} ρ(m,n|A) [∂ log ρ(m,n|A)/∂A_i] [∂ log ρ(m,n|A)/∂A_j].   (9)

This metric is not arbitrarily chosen: FRIM provides the only Riemannian geometric structure consistent with Markov embeddings [63, 64], hence this metric structure is a consequence of the grouping property of probability distributions.

Before calculating the FRIM for the distributions defined in (7), it is important to remember that ∑_i A_i = 1. We will therefore express the value related to the last node in the enumeration as a dependent variable, A_N = 1 − ∑_{i=1}^{N−1} A_i. Since we are describing graphs with a fixed number of links L, the constraints defined in (4) have some level of redundancy, namely A_N is automatically defined by the set of all the others. Even though this does not interfere with the maximization process – MaxEnt is robust enough to properly deal with redundant information – it has to be taken into account when calculating the summations in (9). The FRIM components for the probabilities obtained in (7) are then

g_ij = 2δ_ij/A_i + 2/A_N,  where i, j ∈ [1, N−1].   (10)

It would be useful to have an expression valid for all indexes i, j ∈ [1, N]. For that we use, as in [40], that dA_N = −∑_{i=1}^{N−1} dA_i; the expression for infinitesimal distances then becomes

dℓ² = ∑_{i=1}^{N−1} ∑_{j=1}^{N−1} (2δ_ij/A_i + 2/A_N) dA_i dA_j = ∑_{i,j} (2δ_ij/A_i) dA_i dA_j,   (11)

yielding the much simpler, diagonal metric tensor

g_ij = (2/A_i) δ_ij.   (12)

As is a property of Gibbs distributions [39, 40], this metric tensor could also have been found as (minus) the Hessian of s(A) in (8). A diagonal result is consistent with the fact that, per (7), both nodes at the ends of a link are sampled independently with the same distribution.
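Equations (9) and (10) can be checked numerically by differentiating log ρ(m,n|A) with respect to the independent coordinates A_1, …, A_{N−1} (a sketch under our own naming; central finite differences are a convenience choice):

```python
import numpy as np

def rho(A):
    """rho(m, n | A) = A_m * A_n, per (7)."""
    return np.outer(A, A)

def frim(A_free, h=1e-6):
    """Fisher-Rao metric (9) in the independent coordinates A_1..A_{N-1},
    with A_N = 1 - sum(A_free) treated as a dependent variable."""
    n = len(A_free)

    def log_rho(x):
        A = np.append(x, 1.0 - x.sum())
        return np.log(rho(A))

    # Central finite-difference gradients of log rho with respect to each A_i.
    grads = []
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = h
        grads.append((log_rho(A_free + dx) - log_rho(A_free - dx)) / (2 * h))

    p = rho(np.append(A_free, 1.0 - A_free.sum()))
    return np.array([[np.sum(p * grads[i] * grads[j]) for j in range(n)]
                     for i in range(n)])

A_free = np.array([0.1, 0.2, 0.3])               # N = 4, so A_4 = 0.4
g = frim(A_free)
A4 = 1.0 - A_free.sum()
expected = 2 * np.diag(1.0 / A_free) + 2.0 / A4  # eq. (10)
print(np.allclose(g, expected, atol=1e-4))       # True
```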
Having calculated the metric for the Gibbs distributions of our graph model, we have all the elements needed to define a dynamics on it, which we do in the following section.

3 Entropic dynamics of networks

Entropic dynamics is a formalism in which the laws of dynamics are derived from entropic methods of inference. For the scope of the present article we are going to evolve the parameters A, representing a change in the probabilities for ε in (7); this is equivalent to having distributions from which the sequences of node degrees, k_i = 2L A_i, are sampled. In this description, the probabilities of links can be recovered from

P(ε, A) = P(A) ρ(ε|A),   (13)

where ρ is defined in (7). The dynamical process will describe how a set of parameters A – representing an instant of the system – evolves to a set of parameters A′, for which the distribution at a later instant, P(A′), is assigned as

P(A′) = ∫ dA P(A′|A) P(A).   (14)

EntDyn consists of finding the transition probability P(A′|A) through the methods of information theory. As done in our previous work [39], the dynamical process relies on two assumptions: (i) the changes happen continuously, which will determine the choice of prior, and (ii) the motion is restricted to the Gibbs distributions obtained from ρ(ε|A) in (7), which will determine our constraint. Beyond the scope of the present article, different models can be generated by imposing constraints that implement other information known about the dynamical process.

The entropy we need to maximize has to account for the joint change in the degrees of uncertainty in the graph connections ε as well as in the parameters A, which is represented by the distribution in (13). The transition from A to A′ must also contain information about the transition from ε to a later link assignment ε′. Therefore, we must maximize the entropy for the joint transition P(ε′, A′|ε, A), meaning

S[P|Q] = −∑_{ε′} ∫ dA′ P(ε′, A′|ε, A) log[P(ε′, A′|ε, A) / Q(ε′, A′|ε, A)],   (15)

where Q(ε′, A′|ε, A) is the prior, to be determined. We shall call S the dynamical entropy, to avoid confusion with the graph entropy (1) and the entropy per link (3).

The prior that implements continuity for the motion on the statistical manifold, but is otherwise uninformative, is of the form

Q(ε′, A′|ε, A) = Q(ε′|ε, A, A′) Q(A′|ε, A) ∝ g^{1/2}(A′) exp(−(1/2τ) ∑_{ij} g_ij ΔA_i ΔA_j),   (16)

as explained in [39], where ΔA_i = A′_i − A_i, g = det g_ij, and τ is a parameter that will eventually take the role of time; short steps are enforced since dℓ → 0 when τ → 0. (Continuous motion might not sound like a natural assumption in a discrete system such as graphs; however, even if a space is discrete, the set of probability distributions on it is continuous, as are the expected values that parametrize it.)

The constraint that implements that the motion does not leave the space of Gibbs distributions defined in Section 2.1 is

P(ε′, A′|ε, A) = P(ε′|ε, A, A′) P(A′|ε, A) = ρ(ε′|A′) P(A′|ε, A),   (17)

which means the distribution for ε′ conditioned on A′ must be of the form (7). Note that per (17) the only factor still undetermined in the full transition probability is P(A′|ε, A).

The result obtained when maximizing (15) with the prior (16) and under (17) is

P(A′|ε, A) ∝ g^{1/2}(A′) exp(s(A′) − (1/2τ) ∑_{ij} g_ij ΔA_i ΔA_j).   (18)

Note that it is independent of ε, which is not surprising since neither the prior nor the constraints assume any correlation between A′ and ε; hence, by marginalization, P(A′|A) = P(A′|ε, A).
For short steps, dℓ → 0 (equivalently ΔA → 0), we can expand s(A′) in the linear regime, leading to a transition probability of the form

P(A′|A) = (1/Z(A)) g^{1/2}(A′) exp(∑_i (∂s/∂A_i) ΔA_i − (1/2τ) ∑_{ij} g_ij ΔA_i ΔA_j),   (19)

where the normalization factor Z(A) absorbs the proportionality constant in (18) and the factor e^{s(A)}. In [39] we calculate the moments of this transition up to order τ, obtaining
⟨ΔA_i⟩ = τ (∑_j g^{ij} ∂s/∂A_j − (1/2) ∑_{j,k} Γ^i_{jk} g^{jk}),  ⟨ΔA_i ΔA_j⟩ = τ g^{ij},  and  ⟨ΔA_i ΔA_j ΔA_k⟩ = 0,   (20)

where the g^{ij} are the elements of the matrix inverse to g_ij, ∑_j g^{ij} g_jk = δ^i_k, and the Γ^i_{jk} are the Christoffel symbols.

Equation (20) is the definition of a smooth diffusion [65] if we choose τ as a time duration Δt, which is equivalent to calibrating our time parameter in terms of the fluctuations, Δt ≐ τ ∝ ∑_{ij} g_ij ΔA_i ΔA_j. That means the role of time here emerges from properties of the motion: up to a multiplicative constant, time measures the fluctuations in A. The system is its own clock.
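Read as a stochastic process, (20) suggests a Langevin-type integration sketch (our own construction, not the paper's procedure: the drift uses ∂s/∂A_i = −2(log A_i + 1) from (8) and g^{ij} = (A_i/2)δ_ij from (12); for this diagonal metric the only nonvanishing Christoffel symbols are Γ^i_{ii} = −1/(2A_i), our own evaluation; the projection back onto ∑_i A_i = 1 is a pragmatic choice for the discretized steps):

```python
import numpy as np

rng = np.random.default_rng(1)

N, steps, tau = 4, 20_000, 1e-6
A = np.full(N, 1.0 / N)  # start at the uniform point of the simplex

for _ in range(steps):
    # Drift from (20): sum_j g^{ij} ds/dA_j - (1/2) sum_{jk} Gamma^i_{jk} g^{jk}
    # = -A_i (log A_i + 1) + 1/8 for this model.
    drift = -A * (np.log(A) + 1.0) + 0.125
    # Gaussian noise with covariance tau * g^{ij} = tau * (A_i / 2) delta_ij.
    noise = rng.normal(0.0, np.sqrt(tau * A / 2.0))
    A = np.clip(A + tau * drift + noise, 1e-12, None)
    A = A / A.sum()  # pragmatic projection back onto the simplex

print(np.round(A, 3))  # one realization of the diffusing expected values
```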
As explained in [39], this smooth diffusion leads to the evolution of the distribution as a Fokker-Planck equation,

∂p/∂t = −g^{−1/2} ∑_i ∂/∂A_i (g^{1/2} p v^i),  where  v^i = ∑_j g^{ij} ∂/∂A_j (s − (1/2) log p),   (21)

and p is the invariant probability density, p(A) ≐ P(A)/g^{1/2}(A). Substituting the entropy (8) and the metric (12), we obtain
∂p/∂t = ∑_i ( [(1/2) log A_i + 3/2] p + [A_i (log A_i + 1) + 1/8] ∂p/∂A_i + (A_i/4) ∂²p/∂A_i² ).   (22)

This establishes the dynamical equation for the graph model. In the following section we will focus on finding steady states p̄(A) of (22).

4 Steady states

Figure 1: Degree distribution derived from the solution of the IVP (26) for three increasing values of the number of links L (left to right). The degree distribution does not depend on d_p and has a very similar behaviour across L.

In order to find a steady state of (22), it is useful to note that for ∂p̄/∂t = 0 the equation is separable: the solution can be written as a product of the same function p(a) evaluated at each argument, p̄(A = {A_i}) = ∏_i p(a = A_i). Each term in the summation in (22) must then vanish separately, so p(a) has to obey

(a/4) d²p/da² + [a (log a + 1) + 1/8] dp/da + [(1/2) log a + 3/2] p = 0.   (23)
To solve the above equation we make the substitution y = √(8a), so that it transforms into

d²p/dy² + y [f(y) − 2] dp/dy + f(y) p = 0,   (24)
where f(y) = 2 log y − log 8 + 3. This substitution is equivalent to writing (21) under the change of coordinates Y_i = √(8 A_i), in which the metric (12) becomes Euclidean: ∑_{ij} g_ij dA_i dA_j = ∑_{ij} δ_ij dY_i dY_j.
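The change of variables can be verified symbolically (a sketch; we substitute a = y²/8 into (23) and compare with (24), which turns out to be exactly twice (23)):

```python
import sympy as sp

y = sp.symbols('y', positive=True)
p = sp.Function('p')

a = y**2 / 8          # the substitution y = sqrt(8a), inverted
dyda = 4 / y          # dy/da, from y^2 = 8a

# dp/da and d^2p/da^2 expressed through y-derivatives of p
p1 = sp.diff(p(y), y) * dyda
p2 = sp.diff(p1, y) * dyda

# Left-hand side of (23) after the change of variables
lhs = (a / 4) * p2 \
    + (a * (sp.log(a) + 1) + sp.Rational(1, 8)) * p1 \
    + (sp.log(a) / 2 + sp.Rational(3, 2)) * p(y)

# Equation (24)
f = 2 * sp.log(y) - sp.log(8) + 3
rhs = sp.diff(p(y), y, 2) + y * (f - 2) * sp.diff(p(y), y) + f * p(y)

# (23) equals one half of (24); the difference simplifies to zero.
print(sp.simplify(sp.expand_log(sp.expand(2 * lhs - rhs), force=True)))
```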
The range over which (24) is valid takes into account the fact that the maximum possible connectivity is bounded by the number of links; when self-connections are not excluded this gives a maximum value a = 1 (i.e. k = 2L) and hence y = √8. However, Anand et al. [28] argue that in order for the node connectivities to remain uncorrelated, a lower maximum connectivity should be considered. Inspired by their arguments we can set the structural cutoff k_max = √(2L), corresponding to a_max = 1/√(2L) and y_max = √(8 a_max). Also, we can see that (24) diverges at y = 0 unless p(y = 0) = 0. Therefore, we will consider y ∈ (0, y_max].

Solving (24) is enough to obtain the steady-state values of p(a) in (22) and therefore the degree distribution

P(a = k/(2L)) = √(2/a) p(y = √(8a)),   (25)

where the square root factor comes from the information metric, √(g(a)) = √(2/a). We choose to solve (24) as an initial value problem (IVP), making sure that the node connectivities remain uncorrelated by setting

p(y = y_max) = 0  and  dp/dy|_{y=y_max} = d_p,   (26)

where y_max = √(8 a_max) was considered for three increasing values of L (powers of 2). The final result is normalized afterwards, and the influence of the precise value of d_p is investigated below.
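A minimal numerical sketch of this procedure (the value of L, the grid, and the tolerances are our own choices; solve_ivp integrates (24) inward from y_max, and (25) then converts p(y) into the degree distribution):

```python
import numpy as np
from scipy.integrate import solve_ivp

L = 2**20                        # number of links; an assumed example value
a_max = 1.0 / np.sqrt(2 * L)     # structural cutoff, a_max = 1/sqrt(2L)
y_max = np.sqrt(8 * a_max)       # y = sqrt(8a)

def f(y):
    return 2 * np.log(y) - np.log(8) + 3

def ode(y, u):
    # u = [p, p']; eq. (24) rewritten as p'' = -y (f(y) - 2) p' - f(y) p
    return [u[1], -y * (f(y) - 2) * u[1] - f(y) * u[0]]

d_p = -1.0  # initial slope at y_max; rescaled away by the final normalization
y_eval = np.linspace(y_max, 1e-4 * y_max, 2000)
sol = solve_ivp(ode, (y_max, y_eval[-1]), [0.0, d_p],
                t_eval=y_eval, rtol=1e-9, atol=1e-12)

a = sol.t**2 / 8                           # invert the substitution y = sqrt(8a)
P = np.sqrt(2.0 / a) * np.abs(sol.y[0])    # eq. (25), up to normalization
idx = np.argsort(a)
a, P = a[idx], P[idx]
P /= np.sum(0.5 * (P[1:] + P[:-1]) * np.diff(a))  # trapezoid-rule normalization
```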
The degree distribution P(a = k/(2L)) obtained from this method is presented in Fig. 1, where we see that, under normalization, the value of d_p does not influence the probability values.

P(a) ∝ | Range | RMSE
Weibull(ā; λ, k): (ā/k)^{λ−1} e^{−(ā/k)^λ} | ā interval shown in Fig. 2 | 0.15
Gamma(ā; λ, k): ā^{−λ} e^{−k ā} | ā interval shown in Fig. 2 | 0.26
Weibull(a; λ, k): (a/k)^{λ−1} e^{−(a/k)^λ} | a interval shown in Fig. 3 | 2.88

Table 1: Summary of the real-world degree distribution forms fitted to the results of the IVPs (26) and (27). The chosen range takes into account the value of the root mean square error (RMSE), which is minimized relative to the entire range of a in the plot.

Furthermore, upon the rescaling ā = a/a_max and P̄ = P √a_max, or equivalently ȳ = y/y_max, the number of links L does not alter the behaviour of the degree distribution, as seen in Fig. 2.

Another initial value condition we investigated considers every node in the graph to have at least one connection,

p(y = 0) = 0  and  dp/dy|_{y=0} = d_p.   (27)

The integration then runs until y = √8, where a = 1, and the result is normalized afterwards. Similarly to the previous case, the value of d_p does not interfere with the degree distribution after normalization, as shown in Fig. 3.

Figure 2: Re-scaled degree distribution for the IVP (26), irrespective of the number of links L. The result fits well with Weibull(ā; 0.502, 0.578) and Gamma(ā; 0.505, 2.797) – the latter also known as a power law with cutoff – within most of the allowed range.

Figure 3: Degree distribution derived from the solution of the IVP (27) for different values of d_p. The value of d_p does not interfere with the degree distribution. The result fits well with the Weibull distribution Weibull(a; 0.143655, 1.02562) for a network with many connections (large L).

Inspired by the distributions reported to be found in real-world networks [50], we fit the numerical results for P(a) to the fat-tailed distributions in Table 1. The range – the region over which the fitted form is valid – is chosen as the interval of a that minimizes the root mean squared error (RMSE) between the functional form and the numerical results. The fact that the solutions in Fig. 2 start from above zero means that the degree distribution is considered from the minimal possible non-zero connectivity, k = 1. Also, the case where every node has at least one connection only fits a real-world degree distribution for networks with many links. Note from (25) that the metric leads to a natural power law behavior a^{−1/2}, which is found throughout most of the range of Fig. 2.
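Fits like those of Table 1 can be reproduced along these lines (a sketch that reuses the arrays a and P from the IVP sketch above; the free amplitude c is our own device to absorb normalization, and the fit window would in practice be scanned to minimize the RMSE):

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull(a, lam, k, c):
    # Weibull form from Table 1, with a free amplitude c.
    return c * (a / k)**(lam - 1.0) * np.exp(-(a / k)**lam)

# `a` and `P` are the arrays produced by the solve_ivp sketch above.
mask = P > 0
popt, _ = curve_fit(weibull, a[mask], P[mask], p0=(0.5, 0.5, 1.0), maxfev=20_000)
rmse = np.sqrt(np.mean((weibull(a[mask], *popt) - P[mask]) ** 2))
print("lambda, k, c =", popt, " RMSE =", rmse)
```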
5 Conclusion

We presented an entropic dynamics of graphs with a fixed number of nodes N and connections L. This model leads to the Gibbs distribution (7), whose information metric is given by (12). Under the dynamical assumptions that the parameters evolve continuously and remain constrained to the statistical manifold, we are led to the Fokker-Planck equation (21). Steady-state solutions for two different IVPs are presented: the results for (26) are graphically represented in Fig. 1 and fitted to Weibull and Gamma distributions in Fig. 2, while the results for (27), fitted to a Weibull distribution, are presented in Fig. 3, where the correspondence to real-world distributions is only valid for graphs with many links.

Our result is an information theory approach to the dynamics of networks in which, under very general assumptions, power law behavior emerges. Naturally, this is not the only possible process for the dynamics of networks. Rather, under this method, other random graph models can be studied, and other constraints – instead of or in addition to (17) – can be implemented when maximizing the dynamical entropy S in (15). This brings the perspective that the present article can be seen as an example of the ability to obtain dynamical processes in complex systems using information theory.

Acknowledgments
We would like to thank A. Caticha for insightful discussions in the development of this article. P. Pessoa was financed in part by CNPq – Conselho Nacional de Desenvolvimento Científico e Tecnológico – (scholarship GDE 249934/2013-2).
References

[1] Jaynes, E. T. Information theory and statistical mechanics. I. Physical Review 106, 620–630, DOI: 10.1103/PhysRev.106.620 (1957).
[2] Jaynes, E. T. Information theory and statistical mechanics. II. Physical Review 108, 171–190, DOI: 10.1103/PhysRev.108.171 (1957).
[3] Andersen, E. B. Sufficiency and exponential families for discrete sample spaces. Journal of the American Statistical Association 65, 1248–1255, DOI: 10.1080/01621459.1970.10481160 (1970).
[4] Shore, J. & Johnson, R. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory 26, 26–37, DOI: 10.1109/TIT.1980.1056144 (1980).
[5] Skilling, J. The Axioms of Maximum Entropy. In Erickson, G. J. & Smith, C. R. (eds.) Maximum-Entropy and Bayesian Methods in Science and Engineering, vol. 31-32, 173–187, DOI: 10.1007/978-94-009-3049-0_8 (Springer, Dordrecht, 1988).
[6] Caticha, A. Relative Entropy and Inductive Inference. In AIP Conference Proceedings, vol. 707, 75–96, DOI: 10.1063/1.1751358 (American Institute of Physics, 2004).
[7] Vanslette, K. Entropic updating of probabilities and density matrices. Entropy 19, 664, DOI: 10.3390/e19120664 (2017).
[8] Golan, A. Information and Entropy Econometrics – A Review and Synthesis. Foundations and Trends in Econometrics 2, 1–145, DOI: 10.1561/0800000004 (2008).
[9] Caticha, A. & Golan, A. An entropic framework for modeling economies. Physica A: Statistical Mechanics and its Applications 408, 149–163, DOI: 10.1016/j.physa.2014.04.016 (2014).
[10] Harte, J. Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics (OUP Oxford, 2011).
[11] Bertram, J., Newman, E. A. & Dewar, R. C. Comparison of two maximum entropy models highlights the metabolic structure of metacommunities as a key determinant of local community assembly. Ecological Modelling 407, 108720, DOI: 10.1016/j.ecolmodel.2019.108720 (2019).
[12] De Martino, A. & De Martino, D. An introduction to the maximum entropy approach and its application to inference problems in biology. Heliyon 4, e00596, DOI: 10.1016/j.heliyon.2018.e00596 (2018).
[13] Dixit, P. D., Lyashenko, E., Niepel, M. & Vitkup, D. Maximum entropy framework for predictive inference of cell population heterogeneity and responses in signaling networks. Cell Systems 10, 204–212, DOI: 10.1101/137513 (2020).
[14] Vicente, R., Susemihl, A., Jericó, J. P. & Caticha, N. Moral foundations in an interacting neural networks society: A statistical mechanics analysis. Physica A: Statistical Mechanics and its Applications 400, 124–138, DOI: 10.1016/j.physa.2014.01.013 (2014).
[15] Alves, F. & Caticha, N. Sympatric multiculturalism in opinion models. In AIP Conference Proceedings, vol. 1757, 060005, DOI: 10.1063/1.4959064 (AIP Publishing LLC, 2016).
[16] Wilson, A. A statistical theory of spatial distribution models. Transportation Research 1, 253–269, DOI: 10.1016/0041-1647(67)90035-4 (1967).
[17] Yong, N., Ni, S., Shen, S. & Ji, X. An understanding of human dynamics in urban subway traffic from the maximum entropy principle. Physica A: Statistical Mechanics and its Applications, 222–227, DOI: 10.1016/j.physa.2016.03.071 (2016).
[18] Caticha, A. & Giffin, A. Updating Probabilities. In AIP Conference Proceedings, vol. 872, 31–42, DOI: 10.1063/1.2423258 (American Institute of Physics, 2006).
[19] Barber, D. Bayesian Reasoning and Machine Learning (Cambridge University Press, 2012).
[20] Skilling, J. & Bryan, R. K. Maximum entropy image reconstruction: general algorithm. Monthly Notices of the Royal Astronomical Society 211, 111–124, DOI: 10.1093/mnras/211.1.111 (1984).
[21] Higson, E., Handley, W., Hobson, M. & Lasenby, A. Bayesian sparse reconstruction: a brute-force approach to astronomical imaging and machine learning. Monthly Notices of the Royal Astronomical Society, DOI: 10.1093/mnras/sty3307 (2018).
[22] Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
[23] Bahri, Y. et al. Statistical mechanics of deep learning. Annual Review of Condensed Matter Physics 11, 501–528, DOI: 10.1146/annurev-conmatphys-031119-050745 (2020).
[24] Park, J. & Newman, M. E. J. Statistical mechanics of networks. Physical Review E 70, 066117, DOI: 10.1103/physreve.70.066117 (2004).
[25] Bianconi, G. The entropy of randomized network ensembles. EPL (Europhysics Letters) 81, 28005, DOI: 10.1209/0295-5075/81/28005 (2007).
[26] Bianconi, G. Entropy of network ensembles. Physical Review E 79, 036114, DOI: 10.1103/physreve.79.036114 (2009).
[27] Anand, K. & Bianconi, G. Entropy measures for networks: Toward an information theory of complex topologies. Physical Review E 80, 045102, DOI: 10.1103/physreve.80.045102 (2009).
[28] Anand, K., Bianconi, G. & Severini, S. Shannon and von Neumann entropy of random networks with heterogeneous expected degree. Physical Review E 83, 036109, DOI: 10.1103/physreve.83.036109 (2011).
[29] Peixoto, T. P. Entropy of stochastic blockmodel ensembles. Physical Review E 85, 056122, DOI: 10.1103/physreve.85.056122 (2012).
[30] Cimini, G. et al. The statistical physics of real-world networks. Nature Reviews Physics 1, 58–71, DOI: 10.1038/s42254-018-0002-6 (2019).
[31] Radicchi, F., Krioukov, D., Hartle, H. & Bianconi, G. Classical information theory of networks. Journal of Physics: Complexity 1, 025001, DOI: 10.1088/2632-072x/ab9447 (2020).
[32] Jaynes, E. T. Gibbs vs Boltzmann entropies. American Journal of Physics 33, 391–398, DOI: 10.1119/1.1971557 (1965).
[33] Caticha, A. Entropic dynamics, time and quantum theory. Journal of Physics A: Mathematical and Theoretical 44, 225303, DOI: 10.1088/1751-8113/44/22/225303 (2011).
[34] Caticha, A. The entropic dynamics approach to quantum mechanics. Entropy 21, 943, DOI: 10.3390/e21100943 (2019).
[35] Ipek, S., Abedi, M. & Caticha, A. Entropic dynamics: reconstructing quantum field theory in curved space-time. Classical and Quantum Gravity 36, 205013, DOI: 10.1088/1361-6382/ab436c (2019).
[36] Pessoa, P. & Caticha, A. Exact renormalization groups as a form of entropic dynamics. Entropy 20, 25, DOI: 10.3390/e20010025 (2018).
[37] Abedi, M. & Bartolomeo, D. Entropic dynamics of exchange rates and options. Entropy 21, 586, DOI: 10.3390/e21060586 (2019).
[38] Caticha, N. Entropic dynamics in neural networks, the renormalization group and the Hamilton-Jacobi-Bellman equation. Entropy 22, 587, DOI: 10.3390/e22050587 (2020).
[39] Pessoa, P., Costa, F. X. & Caticha, A. Entropic dynamics on Gibbs statistical manifolds. Under review. Preprint available at https://arxiv.org/abs/2008.04683.
[40] Caticha, A. The basics of information geometry. In AIP Conference Proceedings, vol. 1641, 15–26, DOI: 10.1063/1.4905960 (American Institute of Physics, 2015).
[41] Amari, S. Information Geometry and Its Applications (Springer, 2016).
[42] Ay, N., Jost, J., Lê, H. V. & Schwachhöfer, L. Information Geometry (Springer International Publishing, 2017).
[43] Ay, N., Olbrich, E., Bertschinger, N. & Jost, J. A geometric approach to complexity. Chaos: An Interdisciplinary Journal of Nonlinear Science 21, 037103, DOI: 10.1063/1.3638446 (2011).
[44] Felice, D., Mancini, S. & Pettini, M. Quantifying networks complexity from information geometry viewpoint. Journal of Mathematical Physics 55, 043505, DOI: 10.1063/1.4870616 (2014).
[45] Franzosi, R., Felice, D., Mancini, S. & Pettini, M. Riemannian-geometric entropy for measuring network complexity. Physical Review E 93, 062317, DOI: 10.1103/physreve.93.062317 (2016).
[46] Felice, D., Cafaro, C. & Mancini, S. Information geometric methods for complexity. Chaos: An Interdisciplinary Journal of Nonlinear Science 28, 032101, DOI: 10.1063/1.5018926 (2018).
[47] Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512, DOI: 10.1126/science.286.5439.509 (1999).
[48] Bianconi, G. & Barabási, A.-L. Bose-Einstein condensation in complex networks. Physical Review Letters 86, 5632–5635, DOI: 10.1103/physrevlett.86.5632 (2001).
[49] Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Reviews of Modern Physics 74, 47–97, DOI: 10.1103/revmodphys.74.47 (2002).
[50] Broido, A. D. & Clauset, A. Scale-free networks are rare. Nature Communications 10, 1017, DOI: 10.1038/s41467-019-08746-5 (2019).
[51] Clauset, A., Shalizi, C. R. & Newman, M. E. J. Power-law distributions in empirical data. SIAM Review 51, 661–703, DOI: 10.1137/070710111 (2009).
[52] Barabási, A.-L. Love is All You Need: Clauset's fruitless search for scale-free networks. Blog post (2018).
[53] Holme, P. Rare and everywhere: Perspectives on scale-free networks. Nature Communications 10, 1016, DOI: 10.1038/s41467-019-09038-8 (2019).
[54] Molontay, R. & Nagy, M. Two decades of network science: As seen through the co-authorship network of network scientists. In Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM '19, 578–583, DOI: 10.1145/3341161.3343685 (2019).
[55] Goh, K.-I. et al. The human disease network. Proceedings of the National Academy of Sciences 104, 8685–8690, DOI: 10.1073/pnas.0701361104 (2007).
[56] Stella, M., de Nigris, S., Aloric, A. & Siew, C. S. Q. Forma mentis networks quantify crucial differences in STEM perception between students and experts. PLOS ONE 14, e0222870, DOI: 10.1371/journal.pone.0222870 (2019).
[57] Pressé, S., Ghosh, K., Lee, J. & Dill, K. A. Nonadditive entropies yield probability distributions with biases not warranted by the data. Physical Review Letters 111, 180604, DOI: 10.1103/physrevlett.111.180604 (2013).
[58] Pressé, S. Nonadditive entropy maximization is inconsistent with Bayesian updating. Physical Review E 90, 052149, DOI: 10.1103/physreve.90.052149 (2014).
[59] Oikonomou, T. & Bagci, G. B. Rényi entropy yields artificial biases not in the data and incorrect updating due to the finite-size data. Physical Review E 99, 032134, DOI: 10.1103/physreve.99.032134 (2019).
[60] Pessoa, P. & Costa, B. A. Comment on "Black hole entropy: A closer look". Entropy 22, 1110, DOI: 10.3390/e22101110 (2020).
[61] Fisher, R. A. Theory of statistical estimation. Mathematical Proceedings of the Cambridge Philosophical Society 22, 700–725, DOI: 10.1017/S0305004100009580 (1925).
[62] Rao, C. R. Information and the accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society 37, 81–91, DOI: 10.1007/978-1-4612-0919-5_16 (1945).
[63] Čencov, N. N. Statistical Decision Rules and Optimal Inference. Translations of Mathematical Monographs, vol. 53 (American Mathematical Society, Providence, RI, 1981).
[64] Campbell, L. L. An extended Čencov characterization of the information metric. Proceedings of the American Mathematical Society 98, 135–141, DOI: 10.2307/2045782 (1986).
[65] Nelson, E. Quantum Fluctuations (Princeton University Press, 1985).