Random walks on hypergraphs
Timoteo Carletti
naXys, Namur Institute for Complex Systems, University of Namur, Belgium
Federico Battiston
Department of Network and Data Science, Central European University, Budapest 1051, Hungary
Giulia Cencetti
Mobs Lab, Fondazione Bruno Kessler, Via Sommarive 18, 38123, Povo, TN, Italy
Duccio Fanelli
Dipartimento di Fisica e Astronomia, Università di Firenze, INFN and CSDC, Via Sansone 1, 50019 Sesto Fiorentino, Firenze, Italy
In the last twenty years network science has proven its strength in modelling many real-world interacting systems as generic agents, the nodes, connected by pairwise edges. Yet, in many relevant cases, interactions are not pairwise but involve larger sets of nodes at a time. These systems are thus better described in the framework of hypergraphs, whose hyperedges effectively account for multi-body interactions. We hereby propose a new class of random walks defined on such higher-order structures, and grounded on a microscopic physical model where multi-body proximity is associated to highly probable exchanges among agents belonging to the same hyperedge. We provide an analytical characterisation of the process, deriving a general solution for the stationary distribution of the walkers. The dynamics is ultimately driven by a generalised random walk Laplace operator that reduces to the standard random walk Laplacian when all the hyperedges have size 2 and are thus meant to describe pairwise couplings. We illustrate our results on synthetic models for which we have full control of the high-order structures, and on real-world networks where higher-order interactions are at play. As a first application of the method, we compare the behaviour of random walkers on hypergraphs to that of traditional random walkers on the corresponding projected networks, drawing interesting conclusions on node rankings in collaboration networks. As a second application, we show how information derived from the random walk on hypergraphs can be successfully used for classification tasks involving objects with several features, each one represented by a hyperedge. Taken together, our work contributes to unveiling the effect of higher-order interactions on diffusive processes in higher-order networks, shedding light on mechanisms at the heart of biased information spreading in complex networked systems.
INTRODUCTION
From social systems and the World Wide Web to economics and biology, networks define a powerful tool to describe many real-world systems [1–3]. Over the last twenty years of network science [4, 5], many interacting systems with different functions were shown to exhibit surprisingly similar structural properties, at different scales. Interestingly, the complex architecture of real-world networks was found to significantly interfere with the dynamical processes hosted on them, from social dynamics [6] to synchronisation [7]. As a consequence, properly tailored dynamical processes are now routinely employed to extract information on the a priori unknown structure of the underlying graph architectures.
Networks materialise as pairwise interactions, represented by edges, among generic agents, the nodes: by their very definition they are thus bound to encode binary relationships among units. However, an increasing amount of data indicates that, from biological to social systems, real-world interactions often occur among more than two nodes at a time. This phenomenon is not properly described by the traditional paradigm constrained to pairwise interactions, and highlights the need for extended notions in the realm of network theory. In recent years, an emerging stream of research has been focusing on developing higher-order network models that account for the diverse kinds of higher-order dependencies found in complex systems.
Let us here observe that the current “high-order framework” bears some ambiguity, as it has been occasionally assumed to embrace features which stem more specifically from the interactions [8], such as temporal and/or memory effects [9, 10], or reflect the multiplex nature of the examined system [11–13].
Here, the term higher-order is exclusively meant to refer to agents interacting in groups of arbitrary size [14–17], a process often modelled via simplicial complexes [18–20] or hypergraphs [21–23], non-trivial mathematical generalisations of ordinary networks.
Our focus is on hypergraphs, where relationships among agents are described as collections of nodes assembled in sets, called hyperedges, made of any number of nodes. Hypergraphs provide a natural representation for many higher-order real-world networks [24, 25]. In social systems they can for instance be suited to describe collaboration networks, where nodes denote authors and hyperedges stand for groups of authors who have written papers together. Alternatively, hypergraphs can be invoked to describe face-to-face social networks where individuals can interact in groups of arbitrary sizes [26]. In biology, hypergraphs allow one to properly model biochemical reactions simultaneously involving more than two species, or conveniently describe higher-order interactions among different families of proteins [15]. Crucially, in all these examples, interactions among agents occur in groups of arbitrary size, and cannot be split into disjoint pairwise interactions. Differently from simplicial complexes, a higher-order interaction described by a hypergraph (e.g. a single three-body interaction) does not require the existence of all lower-order interactions (e.g. the three pairwise interactions associated to the same triangle) [27]. Heterogeneous hypergraphs have sometimes been studied by mapping the nodes belonging to a hyperedge into a clique of suitable size. However, the drawback of this procedure is that it eventually yields a projected network, such as the one shown in Fig. 1, where only pairwise interactions are ultimately accounted for (see Appendix A).
Linear dynamics [28–30], and specifically random walks [31], constitute a simple, although powerful, tool to extract information on the relational structure of interacting systems. In particular, random walks on complex networks [32] have proven useful to compute centrality scores [33], find communities [34] and provide a taxonomy of real-world networks [35]. In the simplest case, at each time step, a walker jumps from the node where it sits to one of its adjacent neighbours, travelling across one of the available edges, chosen at random with uniform probability. Many variations of this fundamental process have since been considered. These include more sophisticated dynamical implementations, which allow one to target the walks towards nodes with given structural features [36], let walkers interact at the nodes of the network [37], investigate non-linear transition probabilities [38] and crowded conditions [39], or consider the temporal [40–42] or multilayer [43, 44] dimensions of the edges under different network topologies.
Random walks have been defined on simplicial complexes [45], but because of the cumbersome combinatorics involved, applications have been limited to higher-order interactions of the lowest dimension, i.e. triangles. Moreover, walkers are in general allowed to hop between edges or even high-order structures. This is at variance with the setting that we here aim at exploring, where hops can solely occur among nodes which join in a given high-order structure. In parallel, random walks on hypergraphs have also been considered by assuming that all the hyperedges are made of an identical, and constant, number of nodes [46, 47]. The first random walk Laplacian defined on hypergraphs can probably be traced back to the seminal paper by Zhou and collaborators [48]. Each hyperedge is endowed with an arbitrary weight, acting as a veritable bias to the walkers' dynamics.
As observed by the authors of [48], assigning the weights is an outstanding open problem, which deserves to be properly addressed. In this work, we will prove that a physically motivated choice for the aforementioned weights naturally emerges, when framing the problem on solid microscopic grounds.
Interestingly, more complicated nonlinear dynamics have also been recently studied on simplicial complexes [16, 20, 49, 50] or in a pure multi-body frame [51]. Once again, however, the focus is placed on low-dimensional simplicial complexes (triangles). Recently, several dynamics, including epidemic spreading [16, 49, 52] and synchronisation [38], have been shown to produce new collective behaviours when higher-order interactions are assumed to shape the networked arrangement.
FIG. 1. Hypergraph and projected network. Hypergraph (top) and corresponding projected network (bottom). In the projected network each hyperedge $E_\alpha$ becomes a complete clique of size $|E_\alpha|$, with thus $|E_\alpha|(|E_\alpha|-1)/2$ links.
Starting from this setting, and by further elaborating on the above, we propose in this work a new class of random walks, evolving on generic heterogeneous hypergraphs as dictated by a plausible physical model, and without any limitation on the sizes of the hyperedges. In this framework, multi-body proximity is associated to highly probable exchanges among agents belonging to the same hyperedge, and walkers mitigate their inclination to explore the system with a tendency to naturally spend more time in highly clustered cliques and communities. This feature is reminiscent of bias in information spreading, which is known to be affected by the phenomenon of echo chambers [53]. Similarly to the standard random walk, at each time step a walker sitting on a node selects a hyperedge among the ones containing the origin node, with a probability proportional to the size of the hyperedge; then the walker jumps with uniform probability onto any node contained in the selected hyperedge. In this way, higher-order interactions between a group of nodes drive the process and the weights postulated in [48] take non-trivial values, as stemming from the microscopic dynamics.
We shall in particular provide an analytical description of the process, by deriving a general formula for the stationary distribution of the walk, and show that the dynamics is driven by a generalised Laplace operator that reduces to the standard random walk Laplacian when all hyperedges have size 2, and the hypergraph reduces to a traditional network.
As already stated, random walks can be used to rank nodes, based on the stationary occupancy probability of walkers across the network.
Because of these implications, it is therefore interesting to compare the stationary distribution, as obtained within the newly introduced framework, with that displayed by standard random walkers on the corresponding projected network. Because of the tight interactions among agents belonging to the same hyperedge, the probability to find a walker on a given node is in principle different when comparing the outcomes of the two aforementioned processes. As a consequence, we expect the same node to be ranked differently, depending on the dynamical process employed in the analysis. This observation opens up the way to a new definition of centrality for systems where the high-order structure is known to be relevant. In particular, we will provide direct evidence for our claims working with co-authorship networks, as extracted from the arXiv on-line preprint server.
Our second application, to which we alluded above, concerns a classification task which is borrowed from [48]. Indeed it is well known that one can model a dataset by resorting to networks and then make use of the associated Laplacian Eigenmaps [54] to embed the data in a lower dimensional space, while hopefully preserving relevant information, in the spirit of a generalised principal component analysis. Working in the lower dimensional space allows one to cluster objects together. However, when the objects to be classified share annotated features, the use of binary relationships, i.e. of an ordinary network, results in a dramatic loss of information. One can thus obtain a better embedding via hypergraphs and invoke the spectral characteristics of the associated Laplacian to achieve more effective clustering scores [48, 54]. By replicating the analysis in [48], we will here consider the problem of separating the animals listed in the UCI Machine Learning Repository into distinct classes, e.g. mammals, birds, ..., by using to this end a set of annotated features, e.g. tail, hair, legs and so on.
Here nodes are animals and hyperedges are features. We will show that the presence of high-order interactions among features, as encoded via the proposed Laplacian operator, yields a very effective embedding with just a few of the most significant directions, a result which is in line with that reported in [48].
Summing up, we here introduce and discuss a novel generalisation of the random walk picture to higher-order networked systems, where hyperedge weights are naturally assigned, thus removing any ambiguity in their values. Finally, we hint at important applications of this novel dynamical framework, working along two paradigmatic directions: ranking and classification of data.
MODEL
Incidence and hyper adjacency matrices.
Let us consider a hypergraph $H(V,E)$, where $V=\{1,\dots,n\}$ is the set of $n$ nodes, $E=\{E_1,\dots,E_m\}$ the set of $m$ hyperedges, with $E_\alpha$ an unordered collection of nodes, i.e. $E_\alpha \subset V$, $\forall \alpha = 1,\dots,m$. We observe that whenever $E_\alpha=\{i,j\}$, i.e. $|E_\alpha|=2$, the hyperedge is actually a “standard” edge, denoting a binary interaction between nodes $i$ and $j$. A hypergraph where $|E_\alpha|=2$ $\forall \alpha$ reduces to a network.
We can define the associated hyper incidence matrix $e_{i\alpha}$, carrying the information about how nodes are shared among hyperedges, as
$$ e_{i\alpha} = \begin{cases} 1 & \text{if } i \in E_\alpha \\ 0 & \text{otherwise.} \end{cases} \tag{1} $$
We note that the same matrix exists for networks. However, while in regular networks each column can have only two non-zero entries, as each edge can contain two nodes only [55], in hypergraphs each column can display several non-zero entries (i.e. a hyperedge can contain several nodes).
Starting from the above matrix, one can construct the $n \times n$ hyper adjacency matrix, $A = ee^T$, whose entry $A_{ij}$ represents the number of hyperedges containing both nodes $i$ and $j$. We note that the adjacency matrix is often defined by setting the main diagonal to 0. Let us also define the $m \times m$ hyperedges matrix, $C = e^T e$, whose entry $C_{\alpha\beta}$ counts the number of nodes in $E_\alpha \cap E_\beta$. Observe that in the literature the number of nodes in a given hyperedge, $C_{\alpha\alpha}$, is often called the degree of the hyperedge, while the node degree stands for the number of hyperedges containing the node, $\sum_\alpha e_{i\alpha}$.
Transition probability.
To describe a random walk process, we need to define the transition probability to pass, in one time step, from a state, hereby represented by the node on which the walker sits, to any other state compatible with the former. In the case of simple unbiased random walks on networks, one assumes the walker to take with equal probability any link emerging from the node that is initially occupied. Hence, the transition probability can be readily computed as $A_{ij}/k_i$, where $k_i = \sum_j A_{ij}$ is the degree of the origin node.
When dealing with hypergraphs, choosing with uniform probability any of the neighbouring nodes, namely all the nodes belonging to hyperedges connected with the origin node, is not a sensible choice. In this way, in fact, the real structure of the system is not incorporated into the dynamical picture. On the contrary, nodes belonging to the same hyperedge exhibit a higher-order interaction and we consequently assume that spreading among them is more probable than with nodes associated to other hyperedges; because of this, the information can spend long periods inside the same hyperedge. For instance, gossip can spread faster through group interaction among individuals than through successive binary encounters; similarly, ideas can circulate more effectively among collaborators, the coauthors of a joint publication, as compared to the setting where only exchanges in pairs are allowed. To compute the transition probability to jump from $i$ to $j$, we count the number of nodes, excluding $i$ itself, belonging to the hyperedges shared by $i$ and $j$. Recalling the definition of the matrix $C$, this can be written as
$$ k^H_{ij} = \sum_\alpha (C_{\alpha\alpha}-1)\, e_{i\alpha} e_{j\alpha} = (e\hat{C}e^T)_{ij} - A_{ij} \quad \forall i \neq j, \tag{2} $$
where $\hat{C}$ is a matrix whose diagonal coincides with that of $C$ and which is zero otherwise (see Appendix B).
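The matrices introduced so far can be assembled in a few lines. The following is a minimal sketch, assuming NumPy and a toy hypergraph with hyperedges {0, 1, 2} and {2, 3} that is not taken from the paper; the function names `incidence_matrix` and `hyper_weights` are illustrative. Since Eq. (2) is only stated for $i \neq j$, the sketch sets the diagonal of $k^H$ to zero, which is an assumption.

```python
import numpy as np

def incidence_matrix(n_nodes, hyperedges):
    """Incidence matrix of Eq. (1): e[i, a] = 1 if node i belongs to hyperedge a."""
    e = np.zeros((n_nodes, len(hyperedges)), dtype=int)
    for a, members in enumerate(hyperedges):
        e[list(members), a] = 1
    return e

def hyper_weights(e):
    """Matrix k^H of Eq. (2); the diagonal is set to zero (assumption, i != j only)."""
    C_hat = np.diag(e.sum(axis=0))   # diagonal part of C = e^T e (hyperedge sizes)
    A = e @ e.T                      # hyper adjacency matrix
    K = e @ C_hat @ e.T - A          # (e C_hat e^T)_ij - A_ij
    np.fill_diagonal(K, 0)
    return K

# Toy hypergraph (hypothetical): one 3-body hyperedge and one ordinary edge.
hyperedges = [{0, 1, 2}, {2, 3}]
e = incidence_matrix(4, hyperedges)
A = e @ e.T      # A[i, j]: number of hyperedges shared by i and j
C = e.T @ e      # C[a, b]: number of nodes shared by hyperedges a and b
K = hyper_weights(e)
```

In this toy case nodes 0 and 1 share only the 3-body hyperedge, so their weight is $3 - 1 = 2$, while nodes 2 and 3 share an ordinary edge and get weight 1.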
By normalising so as to impose a uniform choice among the connected hyperedges, we get the following expression for the transition probabilities:
$$ T_{ij} = \frac{(e\hat{C}e^T)_{ij} - A_{ij}}{\sum_\ell k^H_{i\ell}} = \frac{(e\hat{C}e^T)_{ij} - A_{ij}}{\sum_\ell (e\hat{C}e^T)_{i\ell} - k^H_i}, \tag{3} $$
where $k^H_i = \sum_\ell A_{i\ell}$ is the hyperdegree of the node $i$, a synthetic measure reminiscent of the node degree, which takes into account both the number and the size of the hyperedges in which $i$ is involved.
When the hypergraph is a network, all hyperedges have 2 nodes. Hence
$$ (e\hat{C}e^T)_{ij} = \sum_\alpha C_{\alpha\alpha}\, e_{i\alpha} e_{j\alpha} = 2\sum_\alpha e_{i\alpha} e_{j\alpha} = 2A_{ij}, \tag{4} $$
and Eq. (3) reduces to the standard transition probability for random walks on networks
$$ T_{ij} = \frac{2A_{ij} - A_{ij}}{2k^H_i - k^H_i} = \frac{A_{ij}}{k_i}, \tag{5} $$
where the factor 2 in the denominator follows from summing Eq. (4) over $j$, and where we used the fact that, under this assumption, $k^H_i = k_i$.
Stationary solution.
Having computed the transition probabilities, we can proceed further by formulating the dynamical equation which rules the temporal evolution of the probability $p(t) = (p_1(t),\dots,p_n(t))$ of finding the walker on a given node after $t > 0$ steps:
$$ p_i(t+1) = \sum_j p_j(t)\, T_{ji}, \tag{6} $$
where the right-hand side combines the probability to be in any node $j$ at time $t$ with the probability to perform a jump towards the target node $i$ during the next iteration. As $\sum_j T_{ij} = 1$ for all $i$, the stationary probability distribution, $p^{(\infty)}$, is thus the left eigenvector associated with the eigenvalue $\lambda = 1$ of $T$.
Given $T$, it is possible to obtain an exact analytical solution for the stationary state $p^{(\infty)}$ which encapsulates the higher-order structure of the system. Indeed, a straightforward computation (see Appendix C) yields:
$$ p^{(\infty)}_j = \frac{\sum_\ell (e\hat{C}e^T)_{j\ell} - k^H_j}{\sum_m \left[ \sum_\ell (e\hat{C}e^T)_{m\ell} - k^H_m \right]}, \tag{7} $$
for all $j = 1,\dots,n$. In the case where the hypergraph is indeed a network, we recover the well-known expression $q^{(\infty)}_j = k_j / \sum_l k_l$ for the stationary solution of the walk. Let us observe that
$$ L_{ij} = \delta_{ij} - T_{ij} = \delta_{ij} - \frac{k^H_{ij}}{\sum_\ell k^H_{i\ell}}, \tag{8} $$
is a new random walk Laplacian that generalises the random walk Laplacian for networks. Moreover, the former reduces to the latter in the case $|E_\alpha| = 2$ for all $\alpha$.
We observe that the formalism readily extends to the case of continuous-time random walks, where the evolution of the probability is given by
$$ \dot{p}_i(t) = \sum_j p_j(t)\, T_{ji} - \sum_j p_i(t)\, T_{ij}. $$
Similarly to the case of networks, as $\sum_j T_{ij} = 1$, it is possible to rewrite the latter as
$$ \dot{p}_i = \sum_j p_j (T_{ji} - \delta_{ij}) = -\sum_j p_j L_{ji}, $$
where $L$ is the above defined Laplace matrix.
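As a numerical check of Eqs. (3), (7) and (8), the sketch below builds the transition matrix, the stationary distribution and the Laplacian for a small hypergraph and for a small network. It is a minimal illustration under stated assumptions: NumPy is used, the function name `hypergraph_walk` and both toy inputs are hypothetical, self-jumps are excluded (zero diagonal in $k^H$), and every node is assumed to belong to at least one hyperedge of size 2 or more.

```python
import numpy as np

def hypergraph_walk(n_nodes, hyperedges):
    """Return (T, p_inf, L) following Eqs. (3), (7) and (8)."""
    e = np.zeros((n_nodes, len(hyperedges)))
    for a, members in enumerate(hyperedges):
        e[list(members), a] = 1.0
    sizes = e.sum(axis=0)                    # C_aa: hyperedge sizes
    K = e @ np.diag(sizes - 1.0) @ e.T       # k^H_ij = sum_a (C_aa - 1) e_ia e_ja
    np.fill_diagonal(K, 0.0)                 # assumption: no self-jumps (i != j)
    strength = K.sum(axis=1)                 # sum_l k^H_il, assumed positive
    T = K / strength[:, None]                # Eq. (3): row-normalised weights
    p_inf = strength / strength.sum()        # Eq. (7): stationary distribution
    L = np.eye(n_nodes) - T                  # Eq. (8): random walk Laplacian
    return T, p_inf, L

# Toy hypergraph (hypothetical): one 3-body hyperedge and one ordinary edge.
T, p_inf, L = hypergraph_walk(4, [{0, 1, 2}, {2, 3}])

# Pairwise limit: a path 0 - 1 - 2, where p_inf should equal degree / sum of degrees.
T_net, p_net, L_net = hypergraph_walk(3, [{0, 1}, {1, 2}])
```

In the toy hypergraph the stationary weights are proportional to (4, 4, 5, 1), so node 2, which sits in both hyperedges, is visited most often; in the path the result is proportional to the degrees (1, 2, 1), as expected from the network limit.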
In the following, for the sake of definiteness, we limit our analysis to exploring the properties of discrete-time random walks on synthetic and real-world hypergraphs, leaving the continuous-time case to future work.
Denote by $D$ the diagonal matrix with entries $d^H_i = \sum_j k^H_{ij}$ and by $K^H$ the matrix characterised by elements $k^H_{ij}$. We can introduce the symmetric Laplacian $L^{sym}$ as:
$$ L^{sym} = I - D^{-1/2} K^H D^{-1/2}, $$
which is well defined since $k^H_{ij} \geq 0$. $L^{sym}$ is similar to the operator introduced via relation (8); indeed $L = D^{-1/2} L^{sym} D^{1/2}$. The newly introduced operator $L$ is hence a properly defined Laplacian: it is in fact non-negative definite, it displays real eigenvalues and the smallest eigenvalue is identically equal to zero, as readily follows from the proven similarity with $L^{sym}$.
Before turning to the applications, we will briefly draw a comparison with the setting proposed by Zhou [48] and show how this materialises in a natural solution for the problem of weight determination. The Laplacian operator $L^z$ introduced in [48] can be cast in the form:
$$ L^z_{ij} = \delta_{ij} - \sum_\alpha \frac{w_\alpha}{W_i\, C_{\alpha\alpha}}\, e_{i\alpha} e_{j\alpha}, \tag{9} $$
where $w_\alpha$ identifies the undetermined weight of the hyperedge $E_\alpha$, $W_i = \sum_\alpha w_\alpha e_{i\alpha}$ is the total weight of the hyperedges containing the node $i$, i.e. the weighted node degree, and $C_{\alpha\alpha}$ stands for the number of nodes in the hyperedge $E_\alpha$. A simple calculation, as detailed in the following, shows that the operator $L$ can be eventually recovered from $L^z$ by imposing the non-trivial weights $w_\alpha = C_{\alpha\alpha}(C_{\alpha\alpha}-1)$:
$$ \begin{aligned} L^z_{ij} &= \delta_{ij} - \sum_\alpha \frac{C_{\alpha\alpha}(C_{\alpha\alpha}-1)}{C_{\alpha\alpha} \sum_\beta C_{\beta\beta}(C_{\beta\beta}-1)\, e_{i\beta}}\, e_{i\alpha} e_{j\alpha} \\ &= \delta_{ij} - \sum_\alpha \frac{(C_{\alpha\alpha}-1)}{\sum_\beta C_{\beta\beta}(C_{\beta\beta}-1)\, e_{i\beta}}\, e_{i\alpha} e_{j\alpha} \\ &= \delta_{ij} - \frac{k^H_{ij}}{\sum_\beta \sum_\ell e_{\ell\beta}(C_{\beta\beta}-1)\, e_{i\beta}} \\ &= \delta_{ij} - \frac{k^H_{ij}}{\sum_\ell k^H_{i\ell}} = L_{ij}, \end{aligned} $$
where use has been made of definition (2) for $k^H_{ij}$ and of the fact that $C_{\beta\beta} = \sum_\ell e_{\ell\beta}$.
As anticipated, a natural choice for the weights postulated in [48] can thus be envisaged, which follows from a sensible microscopic modelling of the random walk dynamics.
By invoking Theorem 4 in [56], we can finally conclude that our process is equivalent to a random walk on a weighted projected network, where the weight of the link $ij$ is given by $k^H_{ij}$; that is, the weights scale extensively with the region of influence of the nodes, namely the size of the hyperedges they belong to. It is indeed quite remarkable that a properly weighted binary network encapsulates the higher-order information stemming from the corresponding hypergraph representation. Observe that the authors of [56] also consider an extension of the Zhou et al. model, where nodes bear a given weight, tuned so as to reflect the hyperedge characteristics. Again, the introduced weights are abstract quantities, and do not reflect a physically motivated choice.
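The spectral properties claimed for $L$ can be verified numerically on the weighted projected network. The sketch below is only illustrative: it assumes NumPy, and the weight matrix `K` is a hypothetical example (the weights $k^H_{ij}$ of a toy hypergraph with hyperedges {0, 1, 2} and {2, 3}, with zero diagonal), not data from the paper.

```python
import numpy as np

# Hypothetical weights k^H_ij of the weighted projected network (zero diagonal).
K = np.array([[0.0, 2.0, 2.0, 0.0],
              [2.0, 0.0, 2.0, 0.0],
              [2.0, 2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])

d = K.sum(axis=1)                                   # d^H_i = sum_j k^H_ij
D_inv_sqrt = np.diag(d ** -0.5)
D_sqrt = np.diag(d ** 0.5)

L = np.eye(4) - K / d[:, None]                      # random walk Laplacian, Eq. (8)
L_sym = np.eye(4) - D_inv_sqrt @ K @ D_inv_sqrt     # symmetric Laplacian

# eigvalsh handles symmetric matrices and returns real eigenvalues in ascending order.
eig_sym = np.linalg.eigvalsh(L_sym)

# Similarity relation L = D^{-1/2} L_sym D^{1/2} stated in the text.
similar = np.allclose(L, D_inv_sqrt @ L_sym @ D_sqrt)
```

Because the two operators are similar they share the same spectrum, so inspecting `eig_sym` is enough to confirm that the eigenvalues of $L$ are real, non-negative and start at zero.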
RESULTS
Since the introduction of PageRank [57, 58], random walks on networks have been routinely applied to compute centrality scores [33]. Indeed they can be used to rank nodes according to the probability to be visited by the walker: the larger the latter, the more “important” or “central” the node. In this section, we show that high-order interactions can strongly modify the ranking resulting from a random walk process on hypergraphs, with respect to the homologous estimate computed for the corresponding projected network. This fact can thus bear relevant implications for ranking real data stemming from a dynamical process which is better explained in terms of hypergraphs. In this case, in fact, the application of ranking tools tailored to pairwise interactions might produce misleading results (see Appendices C and E).
To illustrate the effect of a non-trivial higher-order structure, we consider a simple hypergraph made of $m$ hyperedges of size 2, all intersecting in a common node, $h$; a different node, say $c$, belongs to one of such 2-hyperedges and to a hyperedge of size $k$ (see Fig. 2, panel a, for the case $m = 7$ and $k = 6$). The random walk on the projected network will rank nodes according to their degree, i.e. $q^{(\infty)}_i \sim k_i$. Hence for $m > k$, the node $h$, with $k_h = m$, is ranked first, followed by the node $c$, with $k_c = k$, and all other ones (see green curves in panel c of Fig. 2). In contrast, the random walk on the hypergraph ranks nodes taking into account higher-order relations. From Eq. (7) we get $p^{(\infty)}_h \sim m$ and $p^{(\infty)}_c \sim 1 + (k-1)^2$, thus $h$ is the top node as long as $m > 1 + (k-1)^2$ (see orange curves in panel c of Fig. 2). In conclusion, for a fixed size $k$ of the hyperedge, if the “hub” node is too small (see panel d), $m < \hat{k} = k$, or the hub is very large (see panel f), $m > \tilde{k} = 1 + (k-1)^2$, then both processes will rank nodes in the same way.
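The thresholds of the star-clique example can be checked directly. The following is a minimal sketch, assuming NumPy; the node labelling (0 for $h$, 1 for $c$) and the helper names `star_clique`, `stationary` and `top_nodes` are illustrative choices, and self-jumps are excluded as an assumption.

```python
import numpy as np

def star_clique(m, k):
    """Hyperedges of the (m, k)-star-clique; node 0 plays the role of h, node 1 of c."""
    spokes = [{0, leaf} for leaf in range(1, m + 1)]   # m hyperedges of size 2 through h
    clique = {1} | set(range(m + 1, m + k))            # one hyperedge of size k containing c
    return m + k, spokes + [clique]

def stationary(n, hyperedges, projected=False):
    """Stationary distribution on the hypergraph, or on its projected network."""
    e = np.zeros((n, len(hyperedges)))
    for a, members in enumerate(hyperedges):
        e[list(members), a] = 1.0
    sizes = e.sum(axis=0)
    K = e @ np.diag(sizes - 1.0) @ e.T                 # hypergraph weights k^H_ij
    if projected:
        K = (e @ e.T > 0).astype(float)                # binary links of the projected network
    np.fill_diagonal(K, 0.0)                           # assumption: no self-jumps
    s = K.sum(axis=1)
    return s / s.sum()

def top_nodes(m, k):
    """Top-ranked node on the hypergraph and on the projected network."""
    n, edges = star_clique(m, k)
    p = stationary(n, edges)
    q = stationary(n, edges, projected=True)
    return int(np.argmax(p)), int(np.argmax(q))

# Same values of m as in panels d, e and f of Fig. 2, with k = 6.
results = {m: top_nodes(m, 6) for m in (3, 15, 35)}
```

With $k = 6$, the node $c$ tops both rankings at $m = 3$, the node $h$ tops both at $m = 35$, and at $m = 15$ the hypergraph walk still favours $c$ while the projected network already favours $h$.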
However, there exists a range of intermediate values, $\hat{k} < m < \tilde{k}$, for which the top-ranked node on the hypergraph is the node $c$ while the random walk on the projected network returns the node $h$ as top ranked (see panel e). This phenomenon of ranking inversion will be further discussed in Appendix C. With the aim of maximising the probability of occupancy of a given node, it is thus preferable for the latter to be connected to nodes organised into a few large hyperedges rather than into many small, fragmented units.
To further characterise the impact of high-order interactions on diffusion in larger systems, we consider a second synthetic model where all nodes have the same number of neighbours, which are arranged in a tuneable number of triangles, i.e. hyperedges of size $|E_\alpha| = 3$. The model interpolates between the case where the number of triangles is zero, $f = 0$, meaning that all interactions involve simple pairs, and the case where there are no pairwise interactions but only 3-body ones, $f = 1$. More precisely, we start with a 1D regular lattice where nodes are connected to 4 neighbours (2 on the left and 2 on the right). Each node has hence degree 4 and takes part in 2 distinct triangles, i.e. hyperedges of size 3, and $f = 1$. Then with probability $p$ we iteratively swap the end points of the links with a “criss-cross” rewiring, i.e. preserving the node degrees, progressively eliminating 3-hyperedges, hence triangles. In the limit of high rewiring, triangles have a negligible probability to be formed, and one eventually obtains a regular random graph with degree $k = 4$. In the process, we check that no hyperedge of size greater than 3 is created, so that competition is only between 2-body and 3-body interactions.
FIG. 2. The $(m,k)$-star-clique network. Panel a: hypergraph made of $m + k = 13$ nodes, divided into $m = 7$ hyperedges of size 2 and one large hyperedge of size $k = 6$. The node $h$ belongs to all the 2-hyperedges, while the node $c$ belongs to one 2-hyperedge and to the 6-hyperedge. Panel b: the projected network where hyperedges are mapped into complete cliques; the 6-hyperedge thus becomes a 6-clique. Panel c: dependence on $m$ of the asymptotic probability of finding the walker on the node $h$ (circle) or on the node $c$ (square), in the projected network (green symbols) and in the hypergraph (orange symbols). Panels d, e and f: asymptotic probabilities $q^{(\infty)}_i$ and $p^{(\infty)}_i$ for three values of $m$: $m = 3 < \hat{k}$, $\hat{k} < m = 15 < \tilde{k}$ and $\tilde{k} < m = 35$, where $\hat{k} = 6$ and $\tilde{k} = 26$.
As the degree sequence is unchanged throughout this process and every node shares the same number of links, the asymptotic distribution of walkers on the projected network is uniform and given by $q_i = 1/N$ for all $i$, where $N$ is the number of nodes, set to 500 in the example below, no matter the value of $f$. This is also the case for the random walk on the hypergraph in the two limiting cases $f = 0$ and $f = 1$; indeed in the former case the hypergraph and the projected network coincide because all the hyperedges have size 2. In the latter setting, all nodes are involved in the same number of higher-order interactions and thus they are all equivalent. However, for the walk on hypergraphs the stationary state changes at intermediate values of $f$. In order to quantify the heterogeneity of the stationary state we rely on the Gini coefficient, which is defined as the average absolute difference between all pairs of elements in the vector $p$, divided by the average:
$$ G(p) = \frac{\sum_{i=1}^N \sum_{j=1}^N |p_i - p_j|}{N \sum_{i=1}^N p_i}. \tag{10} $$
The Gini coefficient for the stationary state of the random walk on the above described hypergraph is reported in the top panel of Fig. 3. For the limiting values $f = 0$ and $f = 1$ the stationary state on the hypergraph coincides with the one on the projected network and the Gini index is 0, the asymptotic solution being homogeneous. However, high-order structures arising for intermediate values of the fraction of triangles induce a heterogeneity in the occupation of the different nodes at equilibrium, which is thus different from the one obtained for the associated projected network.
A standard metric to compare lists is the Jaccard index, a measure of the fraction of elements that are common between two lists with respect to the total number of involved elements, $J(A,B) = |A \cap B| / |A \cup B|$. As the Jaccard index does not take into account the order of the elements as they appear in the two lists, and is therefore unable to detect a permutation of the elements in a list, we compare the rankings of the two stationary distributions by means of a modified Jaccard index, $\hat{J}$, recently introduced in [59]. Here differences at the top of the ranking induce a stronger change than differences associated to the lower-ranked elements. In the bottom panel of Fig. 3, we show the average modified Jaccard index, $\hat{J}$, for the $M$-top ranking, $M = 100, 300, 500$, as a function of the fraction $f$ of 3-hyperedges existing in the system. The results are in agreement with the ones obtained via the Gini coefficient; for $f = 0$ and $f = 1$ the rankings coincide and thus $\hat{J}$ achieves its maximum value, i.e. 1, while for intermediate values of $f$ the index $\hat{J}$ drops, reflecting differences among the rankings. Moreover, we can appreciate the presence of a large turnover in the top lists: indeed the values of $\hat{J}$ associated to small $M$, i.e. comparing relatively few nodes in the top list, are much smaller than those for large $M$, i.e. longer lists.
FIG. 3. Impact of the 3-body interaction on the asymptotic solution of the random walk on the hypergraph. The top panel reports the Gini coefficient for the stationary state of the random walk on hypergraphs as a function of the fraction of hyperedges of size three, $f$. Recall that the model does not allow for hyperedges of size larger than 3. The bottom panel shows, for the same networks, the modified Jaccard index used to compare the rankings of nodes for the hypergraph and the projected network. Different colours (blue, red and green) correspond to different numbers of nodes chosen for the comparison, i.e. the top 100 ($M = 100$), the top 300 ($M = 300$) and all the 500 nodes ($M = 500$), respectively.
To take one step forward, we consider a synthetic model where high-order structures are not limited to 3-body interactions but larger hyperedges are allowed. We thus build a third model which interpolates from a 1D ring to a fully connected network. More precisely, we start from a 1D ring where all the nodes have degree 2, and then progressively increase its density, as measured by the total number of links, $l$, until the process terminates with a complete network, corresponding to a hypergraph with a single hyperedge containing all the nodes. Links are added at random, avoiding self-loops and multiple links. We note that, differently from the previous case, at intermediate values of $l$ this model presents a much wider variety in the size of the hyperedges (or cliques in the projected network), which are no longer limited to 2-body and 3-body interactions.
For this reason, the structure of the ranking difference is definitely more complex and rich than what one could guess by just looking at the number of 3-hyperedges, 4-hyperedges or 5-hyperedges (see Fig. 4).
In the initial configuration of a 1D ring, the stationary solutions of the hypergraph and the projected network coincide, because of the absence of higher-order interactions. Similarly, they are also equivalent in the opposite limit, i.e. when the fully connected network is generated. For an intermediate number of added links, the two processes result instead in different rankings. In Fig. 4, we report $\hat{J}$ as a function of the total number of links $l$, to compare the $M$-top rankings obtained by using the random walk on the hypergraph and on the projected network, respectively. We report in particular results for three values, $M = 5, 10, 20$. The behaviour of the three curves is qualitatively similar. Indeed they all reach the value 1, i.e. perfect matching of the respective rankings, for $l = 20$ (initial 1D ring). Then, even the addition of just a few links makes the rankings change abruptly, and $\hat{J}$ consequently drops to low values. This is associated with the creation of small hyperedges of size 3 (see bottom panel of Fig. 4). Adding more links reduces the differences, namely $\hat{J}$ increases, up to $l = 190$ (complete network), where again the rankings coincide and the index equals 1. This is associated with the birth of larger hyperedges. Let us remark that $\hat{J}$ for $M = 5$ is much smaller than the same quantity computed with $M = 10$ (rank half of the nodes) and $M = 20$ (rank all the nodes), meaning that there is a strong turnover in the top positions. The heterogeneity in the stationary solutions of this model, as well as of the star-clique example, is further investigated in Appendix D, where the corresponding Gini coefficients are shown.
APPLICATIONS
Node ranking
In the previous section we have shown that the hypergraph and the projected network can exhibit different stationary solutions because of ranking inversion (see Appendix C). We thus decided to analyse the impact of this observation in real networks of scientific collaborations, in our opinion one of the most representative examples of high-order structures in human interactions. The analysed data have been gathered from the arXiv database (see Appendix E for more details). Human collaborations are often schematised as pairwise interactions, a working ansatz which amounts to ignoring the organisation in teams. By contrast, we have built a hypergraph where researchers (i.e. nodes) co-authoring an article are part of the same hyperedge.

We have then determined the largest connected component of the hypergraph and that of the projected network, considered maximal and unique hyperedges (to have a fair comparison with the cliques) and computed: (i) the stationary distribution $p^{(\infty)}$ for the random walk on the associated hypergraph, (ii) the stationary distribution $q^{(\infty)}$ for the random walk on the corresponding projected network. We then normalise the computed stationary probabilities by their respective maximum so as to favour a comparative visualisation. In Fig. 5 we plot $p^{(\infty)}_i / \max_j p^{(\infty)}_j$ vs $q^{(\infty)}_i / \max_j q^{(\infty)}_j$ for the cases arXiv-astro and arXiv-physics. In Fig. 15 the same comparison is drawn for the complete arXiv dataset.

FIG. 4. Impact of high-order structures on the asymptotic distribution of walkers for the random walk on the hypergraph and on the projected network. Using the algorithm presented in the text, by iteratively adding links we create hypergraphs that interpolate from a regular 1D ring (where $N = 20$ nodes are each connected with their two neighbours) to a complete graph. We then perform the random walk process on the hypergraphs and on the associated projected networks, respectively, and compare the resulting rankings (the top 5 in blue, the top 10 in red and the top 20 in green, i.e. the whole set of nodes) using $\hat{J}$ (bottom panel). For a small number of available links, $l$, the hypergraphs do not present many hyperedges and thus the rankings are very close, $\hat{J} \sim 1$. As $l$ starts to increase, a few hyperedges of size 3 are created (see circles in the top panel) and the rankings estimated with the two alternative methods deviate, the values of $\hat{J}$ dropping in turn. However, as $l$ increases even more, larger high-order structures, e.g. 4- and 5-hyperedges, emerge (see square and diamond symbols in the top panel) and $\hat{J}$ steadily increases. For a large ensemble of added links, $l \gtrsim$ [...], $\hat{J} \sim$ [...].

Authors are ranked differently according to the two criteria, the one based on hypergraphs being more sensitive to the organisation in groups. If the computed rankings were (almost) the same, the data would (almost) lie on the main diagonal; deviations from it signal novel information conveyed by the random walk on the hypergraph. The unit square in the plane $(q^{(\infty)}_i, p^{(\infty)}_i)$ can be divided into four smaller squares (see Fig. 5). The majority of the authors lie in the bottom left square, $[0, 1/2] \times [0, 1/2]$; the other three squares are $[1/2, 1] \times [0, 1/2]$ (bottom right), $[0, 1/2] \times [1/2, 1]$ (top left) and $[1/2, 1] \times [1/2, 1]$ (top right). Authors in the top right square are top ranked in both processes: they have hence written a large number of papers with different collaborators (large degree), but they have also contributed to a relevant number of papers with many co-authors, i.e. large hyperedge size. Scholars in the bottom right square are better ranked by the random walk on the network; this means that they have written several papers but with a small number of co-authors (see e.g. the right panel, corresponding to physics, in Fig. 5). Finally, researchers in the top left square manifest a complementary attitude: they have participated in a small number of papers, but written by many authors (see e.g. the left panel, corresponding to astro, in Fig. 5).

As a further consideration, we can bring to the fore the different “habits” of publication and writing papers that authors exhibit in each domain, even though the distributions of node degrees, i.e. number of different collaborators per author, and of hyperedge sizes, i.e. number of co-authors in papers, show quite similar shapes across domains, e.g. broad tails (see annexed supplementary information). This is particularly relevant for the High Energy Particle (hep) archive, one of the oldest ones and divided into four subcategories, experimental (ex), lattice (lat), phenomenology (ph) and theory (th) (see Fig. 6). Indeed hep-ex and hep-ph populate mainly the top right square, while hep-lat and hep-th are more present in the top right and bottom right squares. Researchers belonging to the former communities tend therefore to write several papers with many co-authors, while those associated with the latter have papers with many different collaborators, each one co-authored by a small number of scholars. This is also confirmed by the largest degree found in the four subcategories (see Table I), which is 1228 for hep-ex and 1244 for hep-ph, against 346 for hep-lat and 206 for hep-th.

FIG. 5.
Comparing the rankings in the arXiv community: the case of Astro and Physics. We report the scatter plot of the normalised rankings obtained with the random walk on the projected network, $q^{(\infty)}_i$, and the one computed using the random walk on hypergraphs, $p^{(\infty)}_i$, for arXiv-astro (left panel) and arXiv-physics (right panel).

[Figure 6 panel titles: hep-ex, hep-ph, hep-lat, hep-th]

FIG. 6. Publication habits in arXiv-hep. We report the scatter plot of the normalised rankings obtained with the random walk on the projected network, $q^{(\infty)}_i$, and the one computed using the random walk on hypergraphs, $p^{(\infty)}_i$, for the four subdomains of the arXiv-hep domain.

Classification task
To further test the interest of a generalised random walk process biased to account for hyperedged communities within a plausible microscopic framework, we consider the classification task studied by Zhou et al. [48]. We anticipate that the obtained classification outperforms that obtained under the usual random walk framework, which ignores the annotated hyper structures.

A standard pipeline to analyse a dataset starts with the determination of pairwise similarities between the objects to be eventually classified. This implies defining a network that can be studied by means of standard spectral methods. However, similarities often involve groups of objects. In this respect, hypergraphs define the ideal mathematical platform to account for the inherent complexity of the classification problem. More precisely, one can make use of spectral methods based on the hypergraph Laplace matrix to eventually obtain a classification which effectively accounts for high-order interactions as displayed in the data [60].

Following [48] we consider an ensemble of animals from a zoologically heterogeneous set. Specifically, we used the zoo database taken from the UCI Machine Learning Repository [61], containing 101 animals, each one endowed with 16 features, such as tail, hair, legs and so on. To each animal we associate its corresponding class, e.g. mammals, birds, etc. (see Appendix F). Here nodes are animals and hyperedges are features; we will show that the presence of high-order interactions among features allows one to obtain a very satisfying embedding using only 2 or 3 dimensions, a result which is in line with that reported in Zhou et al. [48] for an ad hoc choice of the free weight parameters. To this end we build a hypergraph using the above recipe, and we compute its random walk Laplacian and its spectrum. We list the eigenvalues in ascending order and rename the eigenvectors accordingly. We use the first left eigenvectors [62], associated with the smallest eigenvalues, as coordinates of a Euclidean space in which to embed the data (see Fig. 7). Let us observe that since we use a random walk Laplacian, the first eigenvector, i.e. the one associated with the 0 eigenvalue, is not homogeneous and it already contains non-trivial information on the structure of the examined sample. Classes are identified by different colours: mammal (yellow circle), bird (magenta up triangle), reptile (cyan left triangle), fish (red right triangle), amphibian (green diamond), bug (blue square) and invertebrate (black down triangle). One can visually appreciate that homologous symbols do cluster in space, hence suggesting that the embedding yields an accurate classification. Indeed, the ground-truth partition of animals into these seven classes and the one obtained by performing a K-means clustering in this 3-dimensional space have an Adjusted Rand Index (ARI) [63] equal to 0.[...] − [...].

FIG. 7. Classification of the animals according to their features. We report a 3D embedding of the zoo dataset, namely using the first three eigenvectors. Each colour/symbol combination refers to a known class, and the resulting clusters can be appreciated by visual inspection.

CONCLUSIONS
Summing up, we have here introduced a new class of random walks on hypergraphs which takes into account the presence of higher-order interactions. We provided an analytical expression for the ensuing stationary distribution, based on the structural features of the networked system, and compared it to the distribution associated with a traditional random walk performed on the corresponding projected network. More precisely, we proposed a self-consistent recipe, grounded on a microscopic physical random process biased by the hyperedge sizes, to assign weights to hyperedges. We further characterised the dynamics by comparing the two processes on several synthetic and real-world networks, both by means of numerical simulations and analytical arguments. We showed that our process produces stationary distributions that differ from those obtained for the corresponding projected network, and that prove sensitive to higher-order structure in a networked architecture. Our framework was applied to collaboration networks, yielding new insights on node ranking and centrality measures, which allow for a richer characterisation of individual performances, as compared to traditional methods. Moreover, we showed that information embedded in the higher-order walk can be used to achieve accurate classification. In particular, we applied our method to successfully cluster into different families animals with different features, each one representing a hyperedge. The same procedure fails if a simple random walk on the corresponding projected network is considered. Importantly, the proposed Laplacian is equivalent to that stemming from a properly tuned weighted network [56]. Higher-order rankings and refined classifications could hence be immediately obtained by supplying to conventional tools and analysis schemes the weighted adjacency matrix that characterises the graph with pairwise edges associated with the hypergraph construction. 
Taken all together, our work sheds new light on dynamical processes on networks which are not limited to pairwise interactions, and on the complex interplay between the structure and dynamics of higher-order interaction networks. Future applications to machine learning based approaches to classification are also envisaged.

[1] Mark EJ Newman, Networks: An Introduction (Oxford University Press, Oxford, 2010).
[2] Albert-László Barabási et al., Network science (Cambridge University Press, 2016).
[3] Vito Latora, Vincenzo Nicosia, and Giovanni Russo, Complex networks: principles, methods and applications (Cambridge University Press, 2017).
[4] Réka Albert and Albert-László Barabási, “Statistical mechanics of complex networks,” Reviews of Modern Physics, 47 (2002).
[5] Stefano Boccaletti, Vito Latora, Yamir Moreno, Martin Chavez, and D-U Hwang, “Complex networks: Structure and dynamics,” Physics Reports, 175–308 (2006).
[6] Claudio Castellano, Santo Fortunato, and Vittorio Loreto, “Statistical physics of social dynamics,” Reviews of Modern Physics, 591 (2009).
[7] Alex Arenas, Albert Díaz-Guilera, Jurgen Kurths, Yamir Moreno, and Changsong Zhou, “Synchronization in complex networks,” Physics Reports, 93–153 (2008).
[8] Renaud Lambiotte, Martin Rosvall, and Ingo Scholtes, “From networks to optimal higher-order models of complex systems,” Nature Physics, 1 (2019).
[9] Ingo Scholtes, Nicolas Wider, René Pfitzner, Antonios Garas, Claudio J Tessone, and Frank Schweitzer, “Causality-driven slow-down and speed-up of diffusion in non-markovian temporal networks,” Nature Communications, 5024 (2014).
[10] Martin Rosvall, Alcides V Esquivel, Andrea Lancichinetti, Jevin D West, and Renaud Lambiotte, “Memory in network flows and its effects on spreading dynamics and community detection,” Nature Communications, 4630 (2014).
[11] Manlio De Domenico, Clara Granell, Mason A Porter, and Alex Arenas, “The physics of spreading processes in multilayer networks,” Nature Physics, 901 (2016).
[12] Mikko Kivelä, Alex Arenas, Marc Barthelemy, James P Gleeson, Yamir Moreno, and Mason A Porter, “Multilayer networks,” Journal of Complex Networks, 203–271 (2014).
[13] Federico Battiston, Vincenzo Nicosia, and Vito Latora, “The new challenges of multiplex networks: Measures and models,” The European Physical Journal Special Topics, 401–416 (2017).
[14] Austin R Benson, David F Gleich, and Jure Leskovec, “Higher-order organization of complex networks,” Science, 163–166 (2016).
[15] A. R. Benson, R. Abebe, M. T. Schaub, A. Jadbabaie, and J. Kleinberg, “Simplicial closure and higher-order link prediction,” Proceedings of the National Academy of Sciences, E11221 (2018).
[16] Iacopo Iacopini, Giovanni Petri, Alain Barrat, and Vito Latora, “Simplicial models of social contagion,” Nature Communications, 2485 (2019).
[17] Jacopo Grilli, György Barabás, Matthew J Michalska-Smith, and Stefano Allesina, “Higher-order interactions stabilize dynamics in competitive network models,” Nature, 210 (2017).
[18] Karel Devriendt and Piet Van Mieghem, “The simplex geometry of graphs,” Journal of Complex Networks, 469–490 (2019).
[19] Owen T Courtney and Ginestra Bianconi, “Generalized network structures: The configuration model and the canonical ensemble of simplicial complexes,” Physical Review E, 062311 (2016).
[20] Giovanni Petri and Alain Barrat, “Simplicial activity driven model,” Physical Review Letters, 228301 (2018).
[21] Claude Berge, Graphs and hypergraphs, North-Holland Pub. Co. (American Elsevier Pub. Co, 1973).
[22] Ernesto Estrada and Juan A Rodríguez-Velázquez, “Complex networks as hypergraphs,” arXiv preprint physics/0505137 (2005).
[23] Gourab Ghoshal, Vinko Zlatić, Guido Caldarelli, and Mark EJ Newman, “Random hypergraphs and their applications,” Physical Review E, 066118 (2009).
[24] Giovanni Petri, Paul Expert, Federico Turkheimer, Robin Carhart-Harris, David Nutt, Peter J Hellyer, and Francesco Vaccarino, “Homological scaffolds of brain functional networks,” Journal of The Royal Society Interface, 20140873 (2014).
[25] Giovanni Petri and Alain Barrat, “Simplicial activity driven model,” Physical Review Letters, 228301 (2018).
[26] Alice Patania, Giovanni Petri, and Francesco Vaccarino, “The shape of collaborations,” EPJ Data Science, 18 (2017).
[27] Owen T. Courtney and Ginestra Bianconi, “Generalized network structures: The configuration model and the canonical ensemble of simplicial complexes,” Physical Review E, 062311 (2016).
[28] Robert M May, “Will a large complex system be stable?” Nature, 413 (1972).
[29] Stefano Allesina and Si Tang, “Stability criteria for complex ecosystems,” Nature, 205 (2012).
[30] Louis M Pecora and Thomas L Carroll, “Master stability functions for synchronized coupled systems,” Physical Review Letters, 2109 (1998).
[31] Sidney Redner, A guide to first-passage processes (Cambridge University Press, 2001).
[32] Jae Dong Noh and Heiko Rieger, “Random walks on complex networks,” Physical Review Letters, 118701 (2004).
[33] Mark EJ Newman, “A measure of betweenness centrality based on random walks,” Social Networks, 39–54 (2005).
[34] Martin Rosvall and Carl T Bergstrom, “Maps of random walks on complex networks reveal community structure,” Proceedings of the National Academy of Sciences, 1118–1123 (2008).
[35] Vincenzo Nicosia, Manlio De Domenico, and Vito Latora, “Characteristic exponents of complex networks,” EPL (Europhysics Letters), 58005 (2014).
[36] Jesús Gómez-Gardenes and Vito Latora, “Entropy rate of diffusion processes on complex networks,” Physical Review E, 065102 (2008).
[37] Giulia Cencetti, Federico Battiston, Duccio Fanelli, and Vito Latora, “Reactive random walkers on complex networks,” Physical Review E, 052302 (2018).
[38] Per Sebastian Skardal and Sabina Adhikari, “Dynamics of nonlinear random walks on complex networks,” Journal of Nonlinear Science, 1–26 (2018).
[39] Malbor Asllani, Timoteo Carletti, Francesca Di Patti, Duccio Fanelli, and Francesco Piazza, “Hopping in the crowd to unveil network topology,” Physical Review Letters, 158301 (2018).
[40] Michele Starnini, Andrea Baronchelli, Alain Barrat, and Romualdo Pastor-Satorras, “Random walks on temporal networks,” Physical Review E, 056115 (2012).
[41] Julien Petit, Martin Gueuning, Timoteo Carletti, Ben Lauwens, and Renaud Lambiotte, “Random walk on temporal networks with lasting edges,” Physical Review E, 052307 (2018).
[42] Julien Petit, Renaud Lambiotte, and Timoteo Carletti, “Classes of random walks on temporal networks,” arXiv preprint arXiv:1903.07453 (2019).
[43] Manlio De Domenico, Albert Solé-Ribalta, Sergio Gómez, and Alex Arenas, “Navigability of interconnected networks under random failures,” Proceedings of the National Academy of Sciences, 8351–8356 (2014).
[44] Federico Battiston, Vincenzo Nicosia, and Vito Latora, “Efficient exploration of multiplex networks,” New Journal of Physics, 043035 (2016).
[45] Michael T Schaub, Austin R Benson, Paul Horn, Gabor Lippner, and Ali Jadbabaie, “Random walks on simplicial complexes and the normalized hodge laplacian,” arXiv preprint arXiv:1807.05044 (2018).
[46] Linyuan Lu and Xing Peng, “High-ordered random walks and generalized laplacians on hypergraphs,” in International Workshop on Algorithms and Models for the Web-Graph (Springer, 2011) pp. 14–25.
[47] Amine Helali and Matthias Löwe, “Hitting times, commute times, and cover times for random walks on random hypergraphs,” Statistics and Probability Letters, 1–6 (2019).
[48] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf, “Learning with hypergraphs: Clustering, classification, and embedding,” in Advances in Neural Information Processing Systems (2007) pp. 1601–1608.
[49] Joan T. Matamalas, Sergio Gómez, and Alex Arenas, “Abrupt phase transition of epidemic spreading in simplicial complexes,” arXiv preprint arXiv:1910.03069v1 [physics.soc-ph] (2019).
[50] Bukyoung Jhun, Minjae Jo, and B. Kahng, “Simplicial sis model in scale-free uniform hypergraph,” arXiv preprint arXiv:1910.00375v1 [physics.soc-ph] (2019).
[51] Leonie Neuhäuser, Andrew Mellor, and Renaud Lambiotte, “Multi-body interactions and non-linear consensus dynamics on networked systems,” arXiv preprint arXiv:1910.09226 (2019).
[52] Guilherme Ferraz de Arruda, Giovanni Petri, and Yamir Moreno, “Social contagion models on hypergraphs,” arXiv preprint arXiv:1909.11154 (2019).
[53] Michela Del Vicario, Gianna Vivaldo, Alessandro Bessi, Fabiana Zollo, Antonio Scala, Guido Caldarelli, and Walter Quattrociocchi, “Echo chambers: Emotional contagion and group polarization on facebook,” Scientific Reports (2016).
[54] M Belkin and P Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Computation, 1373–1396 (2002).
[55] We do not consider here hyperedges with size 1, because they correspond to isolated nodes, i.e. nodes that cannot take part in the examined process.
[56] Uthsav Chitra and Benjamin J. Raphael, “Random walks on hypergraphs with edge-dependent vertex weights,” arXiv preprint arXiv:1905.08287 (2019).
[57] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” in Seventh International World-Wide Web Conference (WWW 1998) (1998).
[58] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Technical Report 1999-66 (Stanford InfoLab, 1999), previous number SIDL-WP-1999-0120.
[59] Floriana Gargiulo, Auguste Caen, Renaud Lambiotte, and Timoteo Carletti, “The classical origin of modern mathematics,” EPJ Data Science, 26 (2016).
[60] Loc Hoang Tran, Linh Hoang Tran, Hoang Trang, and Le Trung Hieu, “Combinatorial and random walk hypergraph laplacian eigenmaps,” International Journal of Machine Learning and Computing, 462 (462).
[61] Dheeru Dua and Casey Graff, “UCI machine learning repository,” (2017).
[62] In principle also the right eigenvectors can be used for classification purposes.
[63] L. Hubert and P. Arabie, “Comparing partitions,” Journal of Classification, 193–218 (1985).

ACKNOWLEDGEMENTS
Appendix A: About the projected network
A hypergraph is simple if no hyperedge is contained in any other hyperedge. We report in Fig. 8 two examples. The hypergraph $H_1$ with nodes $V = \{1, 2, 3\}$ and two hyperedges $E_1$ and $E_2$, each containing two nodes, is simple because neither $E_1 \subset E_2$ nor $E_2 \subset E_1$. On the other hand the hypergraph $H_2$ with nodes $W = \{a, b, c\}$ and hyperedges $E_1 = \{a, b, c\}$ and $E_2 = \{a, b\}$ is not simple because $E_2 \subset E_1$.

Once we build the projected network, $\pi(H_2)$, starting from the latter hypergraph we get a complete 3-clique, thus losing information on the existence of hyperedge $E_2$ (see left panel of Fig. 9). Hence, we cannot get back to $H_2$ by inverting the construction, $\pi^{-1}(\pi(H_2)) \neq H_2$. A possible way to overcome this difficulty is to consider a weighted projection (see right panel of Fig. 9) where each edge inherits a weight counting the number of different hyperedges it belongs to. Observe however that for large hyperedge sizes the inversion can be computationally costly because of the combinatorial structure of the problem.
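The weighted projection described above can be sketched in a few lines. This is only an illustration under our own conventions: the function name `weighted_projection` and the representation of hyperedges as Python sets are assumptions made here, not part of the original construction.

```python
from collections import Counter
from itertools import combinations


def weighted_projection(hyperedges):
    """Map each node pair to the number of hyperedges containing both nodes."""
    weights = Counter()
    for edge in hyperedges:
        # every pair of nodes inside a hyperedge becomes (or reinforces) a link
        for u, v in combinations(sorted(set(edge)), 2):
            weights[frozenset((u, v))] += 1
    return dict(weights)


# Non-simple hypergraph of Fig. 8: {a, b} is contained in {a, b, c}
w = weighted_projection([{"a", "b", "c"}, {"a", "b"}])
for pair, weight in sorted(w.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), weight)
```

On this example the link (a, b) receives weight 2 and the other two links weight 1, in agreement with the right panel of Fig. 9; dropping the weights gives back the plain 3-clique.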
[Figure 8: left, hypergraph with nodes 1, 2, 3; right, hypergraph with nodes a, b, c]

FIG. 8. Simple and not simple hypergraphs. The hypergraph shown on the left, with nodes $V = \{1, 2, 3\}$ and two hyperedges of size 2, is simple, while the one on the right, with nodes $W = \{a, b, c\}$ and hyperedges $E_1 = \{a, b, c\}$ and $E_2 = \{a, b\}$, is not simple.

[Figure 9: projection $\pi(H_2)$ with all edge weights equal to 1 (left); weighted projection with weight 2 on one edge and weight 1 on the others (right)]

FIG. 9. Weighted projection of hypergraphs. We propose a standard projection (left panel) and a weighted projection (right panel) of the non-simple hypergraph $H_2$ shown in Fig. 8. In the latter case, the edge $(a, b)$ has weight 2 because it belongs to two different hyperedges in $H_2$.

Appendix B: Transition probability
The aim of this section is to provide some details about the calculation of the generalised transition probabilities which take into account the high-order structure of the hyperedges. To compute the transition probability to jump from $i$ to $j$, we first count the number of nodes, excluding node $i$ itself, belonging to the same hyperedges as $i$ and $j$:

$$k^H_{ij} = \sum_\alpha (C_{\alpha\alpha} - 1)\, e_{i\alpha} e_{j\alpha} \quad \forall i \neq j \quad \text{and} \quad k^H_{ii} = 0 \quad \forall i ; \tag{B1}$$

namely for each hyperedge $E_\alpha$ we consider the number of its nodes minus one, i.e. $C_{\alpha\alpha} - 1$. Then, this quantity is added to $k^H_{ij}$ if and only if $e_{i\alpha} = e_{j\alpha} = 1$, that is if and only if both $i$ and $j$ belong to $E_\alpha$.

Secondly, we normalise this quantity by considering a uniform choice among the connected hyperedges. Hence, we obtain a first formula for the transition probability $T_{ij}$ to jump from node $i$ to node $j$:

$$T_{ij} = \frac{k^H_{ij}}{\sum_l k^H_{il}} = \frac{\sum_\alpha (C_{\alpha\alpha} - 1)\, e_{i\alpha} e_{j\alpha}}{\sum_{l \neq i} \sum_\alpha (C_{\alpha\alpha} - 1)\, e_{i\alpha} e_{l\alpha}} \tag{B2}$$

so that $\sum_j T_{ij} = 1$ $\forall i$. The restriction $l \neq i$ in the expanded denominator simply restates $k^H_{ii} = 0$.

The latter can be rewritten in an equivalent form, which allows one to draw a comparison with the transition probability for unbiased random walks on networks. Indeed, by recalling the definition $C_{\alpha\beta} = (e^T e)_{\alpha\beta} = \sum_l e^T_{\alpha l} e_{l\beta} = \sum_l e_{l\alpha} e_{l\beta}$, we get $C_{\alpha\alpha} = \sum_l e_{l\alpha} e_{l\alpha}$ and then

$$\sum_\alpha C_{\alpha\alpha}\, e_{i\alpha} e_{j\alpha} = \sum_\alpha e_{i\alpha} C_{\alpha\alpha} e^T_{\alpha j} = (e \hat{C} e^T)_{ij} , \tag{B3}$$

where $\hat{C}$ is a diagonal matrix: the diagonal of $\hat{C}$ coincides with that of $C$ and its off-diagonal entries are identically equal to zero. This allows us to rewrite Eq. (B1) in a more compact way, Eq. (2) in the main text:

$$k^H_{ij} = \sum_\alpha (C_{\alpha\alpha} - 1)\, e_{i\alpha} e_{j\alpha} = (e \hat{C} e^T)_{ij} - (e e^T)_{ij} = (e \hat{C} e^T)_{ij} - A_{ij} \quad \forall i \neq j ,$$

$$T_{ij} = \frac{(e \hat{C} e^T)_{ij} - A_{ij}}{\sum_l k^H_{il}} = \frac{(e \hat{C} e^T)_{ij} - A_{ij}}{\sum_{l \neq i} (e \hat{C} e^T)_{il} - k^H_i} ,$$

where $k^H_i = \sum_{l \neq i} A_{il}$ is the hyperdegree of the node $i$.

Let us observe that this equation remains valid even for not simple hypergraphs. For instance, using again the non-simple hypergraph shown in Fig. 8, where the hyperedge $\{a, b\}$ is properly included in $\{a, b, c\}$, we get $k^H_{ab} = (|\{a, b, c\}| - 1) + (|\{a, b\}| - 1) = 2 + 1$ and $k^H_{ac} = |\{a, b, c\}| - 1 = 2$, and thus the following transition probabilities: $T_{ab} = 3/5$ and $T_{ac} = 2/5$. The transition from $a$ to $b$ is therefore 1.5 times more probable than that to $c$, because $a$ and $b$ share two hyperedges. Among not simple hypergraphs, one has to account for the fact that hyperedges are repeated several times. The theory here proposed holds true also for weighted hyperedges.
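The worked example above can be checked numerically. The following is a minimal sketch of Eqs. (B1) and (B2), assuming hyperedges are given as Python sets; the function name `transition_matrix` and the dictionary layout are illustrative choices made here, not the authors' implementation. Exact fractions are used so that the result can be compared directly with the values quoted in the text.

```python
from fractions import Fraction


def transition_matrix(nodes, hyperedges):
    """Return T[i][j] = k^H_ij / sum_l k^H_il, with k^H_ij as in Eq. (B1)."""
    # k[i][j] accumulates (|E_alpha| - 1) over every hyperedge containing both i and j
    k = {i: {j: 0 for j in nodes} for i in nodes}
    for edge in hyperedges:
        for i in edge:
            for j in edge:
                if i != j:  # k^H_ii = 0 by definition
                    k[i][j] += len(edge) - 1
    T = {}
    for i in nodes:
        total = sum(k[i].values())
        if total == 0:
            raise ValueError(f"node {i!r} is isolated: no transition is defined")
        T[i] = {j: Fraction(k[i][j], total) for j in nodes}
    return T


# Non-simple hypergraph of Fig. 8: {a, b} is contained in {a, b, c}
T = transition_matrix(["a", "b", "c"], [{"a", "b", "c"}, {"a", "b"}])
print(T["a"]["b"], T["a"]["c"])  # 3/5 2/5
```

Passing the same hyperedge several times in the list is allowed, which is one simple way to represent repeated hyperedges.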
1. Nonlinear transition rates
In deriving the transition rates, Eq. (3), we assumed that the size of the hyperedge linearly correlates with the probability for the walker to perform a jump. One can of course relax this assumption and introduce nonlinear transition rates. In other words, one can add a bias in (B1) in the selection rule for a target node $j$, as operated by a walker sitting on node $i$. For example, one can posit:

$$k^{(H,\gamma)}_{ij} = \sum_\alpha (C_{\alpha\alpha} - 1)^\gamma\, e_{i\alpha} e_{j\alpha} \quad \forall i \neq j \quad \text{and} \quad k^{(H,\gamma)}_{ii} = 0 \quad \forall i . \tag{B4}$$

In this way large hyperedges are even more favoured if $\gamma > 0$, while the opposite happens if $\gamma < 0$, and we eventually get for the transition probabilities

$$T^{(\gamma)}_{ij} = \frac{k^{(H,\gamma)}_{ij}}{\sum_l k^{(H,\gamma)}_{il}} .$$

Clearly, other choices are possible, but exploring further generalisations is left for future investigations.
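To see the effect of the exponent on a concrete case, the sketch below evaluates Eq. (B4) on the non-simple hypergraph of Fig. 8. The function name `biased_transition_matrix` and the chosen values of $\gamma$ are assumptions made for illustration; $\gamma = 1$ gives back the linear rule of Eq. (B1).

```python
def biased_transition_matrix(nodes, hyperedges, gamma=1.0):
    """Transition probabilities built from k^(H,gamma)_ij of Eq. (B4)."""
    k = {i: {j: 0.0 for j in nodes} for i in nodes}
    for edge in hyperedges:
        weight = (len(edge) - 1) ** gamma  # (C_alpha_alpha - 1)^gamma
        for i in edge:
            for j in edge:
                if i != j:
                    k[i][j] += weight
    # normalise each row; nodes are assumed to belong to at least one hyperedge of size >= 2
    return {i: {j: k[i][j] / sum(k[i].values()) for j in nodes} for i in nodes}


nodes = ["a", "b", "c"]
H2 = [{"a", "b", "c"}, {"a", "b"}]
for gamma in (-1, 0, 1, 2):
    T = biased_transition_matrix(nodes, H2, gamma)
    print(gamma, round(T["a"]["b"], 4), round(T["a"]["c"], 4))
```

On this example node $c$ can be reached from $a$ only through the size-3 hyperedge, so the probability of the jump from $a$ to $c$ grows with $\gamma$ (from 1/3 at $\gamma = 0$ to 2/5 at $\gamma = 1$ and 4/9 at $\gamma = 2$), while negative exponents shift weight towards the 2-hyperedge $\{a, b\}$.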
Appendix C: Stationary solution and ranking
Given the transition probability stored in the matrix $T$, we can obtain the analytical solution for the stationary state $p^{(\infty)}$ defined by $p^{(\infty)} = p^{(\infty)} T$. By recalling Eq. (7) as reported in the main text,

$$p^{(\infty)}_j = \frac{\sum_{l \neq j} (e \hat{C} e^T)_{jl} - k^H_j}{\sum_m \left[ \sum_{l \neq m} (e \hat{C} e^T)_{ml} - k^H_m \right]} ,$$

we can straightforwardly verify that it solves the fixed point equation for the governing dynamics. To this end one needs to plug the above equation into Eq. (6) and recall the definition (3) for the $T_{ij}$ (with $T_{ii} = 0$, so that the first term below contributes only for $j \neq i$):

$$\sum_j \left( \sum_{l \neq j} (e \hat{C} e^T)_{jl} - k^H_j \right) \left( \frac{(e \hat{C} e^T)_{ji} - A_{ji}}{\sum_{l \neq j} (e \hat{C} e^T)_{jl} - k^H_j} - \delta_{ji} \right) = \sum_{j \neq i} \left[ (e \hat{C} e^T)_{ji} - A_{ji} \right] - \left( \sum_{l \neq i} (e \hat{C} e^T)_{il} - k^H_i \right) = 0 , \tag{C1}$$

where the last step has been obtained by observing that $A_{ij} = A_{ji}$ and $(e \hat{C} e^T)_{ij} = (e \hat{C} e^T)_{ji}$, and by recalling that $k^H_i = \sum_{j \neq i} A_{ij}$.

As stated in the main text, random walks can be used to rank nodes. This is achieved by evaluating the asymptotic probability of finding the walker on the selected node: the larger the probability, the more central the node. The analytical expression for $p^{(\infty)}$ indicates that the ranking provided by the random walkers on the hypergraph is proportional to $d^H_i = \sum_j k^H_{ij}$, while it is well known that the ranking that follows from the usual random walk on the projected network scales proportionally to the node degree, $k_i$. Two nodes, say $i$ and $j$, are thus ranked differently by the two processes if $k_i > k_j$ but $d^H_i < d^H_j$. As we will now show, the presence of high-order structures can induce a ranking inversion.

A simple example where this occurs is shown in the left panel of Fig. 10. The node $i$ belongs to the intersection of three 2-hyperedges. Thus its degree (in the projected network) is given by $k_i = 3$. Moreover, $d^H_i = 3$, because, locally, the hypergraph reduces to a standard network; on the other hand the node $j$ belongs to a 3-hyperedge, hence $k_j = 2$, because it is part of a 3-clique, but $d^H_j = 4$. Hence, $k_i > k_j$ but $d^H_i < d^H_j$. Node $j$ will consequently be ranked above $i$ using the generalised random walk on the hypergraph, while the contrary happens if one relies on random walks on the projected network.

The above construction can be readily generalised, as shown by the example presented in the right panel of Fig. 10. Here, $i$ belongs to a 3-hyperedge and to two 2-hyperedges, hence $k_i = 4$ and $d^H_i = 6$; node $j$ instead belongs to a 4-hyperedge, thus $k_j = 3$ and $d^H_j = 9$. So again $k_i > k_j$ while $d^H_i < d^H_j$.

[Figure 10, left panel annotation: $d^H_i = 3 < d^H_j = 4$]
FIG. 10. Examples with ranking inversion. We propose two typical examples of high-order structures that locally produce two different rankings. The left panel shows an example involving three 2-hyperedges and one 3-hyperedge, while the right panel shows the case of one 3-hyperedge and two 2-hyperedges compared with a 4-hyperedge. In both cases the second configuration is ranked above the first one when using the random walk on hypergraphs, while the opposite holds when the random walkers run on the projected networks.
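The ranking inversion of the left-panel example, together with the fixed-point property of the stationary solution, can be verified on a small connected hypergraph. In the sketch below the bridging hyperedge {a, u} and the node labels other than $i$ and $j$ are hypothetical: they are introduced here only to join the two local structures into one connected hypergraph without touching the neighbourhoods of $i$ and $j$.

```python
from fractions import Fraction


def hyper_weights(nodes, hyperedges):
    """k^H_ij of Eq. (B1): sum of (|E| - 1) over hyperedges containing both i and j."""
    k = {i: {j: 0 for j in nodes} for i in nodes}
    for edge in hyperedges:
        for i in edge:
            for j in edge:
                if i != j:
                    k[i][j] += len(edge) - 1
    return k


nodes = ["i", "a", "b", "c", "j", "u", "v"]
# i sits on three 2-hyperedges, j on one 3-hyperedge; {a, u} is a hypothetical bridge
hyperedges = [{"i", "a"}, {"i", "b"}, {"i", "c"}, {"j", "u", "v"}, {"a", "u"}]

k = hyper_weights(nodes, hyperedges)
d_h = {n: sum(k[n].values()) for n in nodes}                      # d^H_n
deg = {n: sum(1 for m in nodes if k[n][m] > 0) for n in nodes}    # degree in the projected network

p = {n: Fraction(d_h[n], sum(d_h.values())) for n in nodes}       # hypergraph walk, p proportional to d^H
q = {n: Fraction(deg[n], sum(deg.values())) for n in nodes}       # projected-network walk, q proportional to k

# one step of the dynamics, with T_ij = k^H_ij / d^H_i
p_next = {j: sum(p[i] * Fraction(k[i][j], d_h[i]) for i in nodes) for j in nodes}

print("p is stationary:", p_next == p)
print("k_i, d_i =", deg["i"], d_h["i"], " k_j, d_j =", deg["j"], d_h["j"])
print("q_i > q_j:", q["i"] > q["j"], " p_i < p_j:", p["i"] < p["j"])
```

The degrees reproduce the values quoted in the text ($k_i = 3$, $d^H_i = 3$, $k_j = 2$, $d^H_j = 4$), and the two stationary distributions order $i$ and $j$ in opposite ways.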
Appendix D: Heterogeneity of stationary solution
The stationary solution that we obtain from the random walk on a hypergraph is very different from the one we can get from the corresponding projected network, the first one being more sensitive to the organisation in groups. The heterogeneity of the state, i.e. the difference among the occupation probabilities of the different nodes at equilibrium, can be quantified by making use of the Gini coefficient.

Fig. 11 reports the ratio between the coefficient $G$ computed for the hypergraph and for the projected network of Fig. 2 of the main text, at varying $m$, the size of the star, and $k$, the size of the clique.

Fig. 12 instead shows the heterogeneity for the model which goes from a 1D lattice to a fully connected network by successively adding links (see Fig. 4 of the main text). The red points show the Gini coefficient for the hypergraph, while the green ones are plotted for the projected network, at varying $l$, the number of links in the graph.

From the results presented in these figures one can appreciate that the Gini coefficient associated with the stationary solution for the random walk on the hypergraph is always larger than the same quantity computed for the random walk on the projected network. This implies that the distribution of walkers on the hypergraph is more heterogeneous than that on the projected network.

FIG. 11. Star-clique model. Ratio between the Gini coefficient of the stationary state on the hypergraph and on the projected network, at varying size of the clique ($k$) and of the star ($m$).

FIG. 12. Lattice to fully-connected model. Gini coefficient for the stationary state of the random walk on the hypergraph (red) and on the corresponding projected network (green).
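For concreteness, a Gini coefficient of a stationary distribution can be computed as sketched below. The normalisation used in this appendix is not spelled out here, so the code assumes the standard mean-absolute-difference definition, $G = \sum_{i,j} |x_i - x_j| / (2 n \sum_i x_i)$, evaluated through its equivalent sorted form; the two input lists are the $d^H$ and $k$ values of a small hypothetical seven-node hypergraph, not data from the paper.

```python
def gini(values):
    """Gini coefficient, assuming G = sum_ij |x_i - x_j| / (2 n sum_i x_i)."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        raise ValueError("need a non-empty sequence with positive sum")
    # sorted form of the same quantity: sum_i (2 i - n - 1) x_(i) / (n * total), i = 1..n
    return sum((2 * rank - n - 1) * xi for rank, xi in enumerate(x, start=1)) / (n * total)


# The coefficient is scale invariant, so unnormalised weights can be used directly.
d_h = [3, 2, 1, 1, 4, 5, 4]   # hypothetical d^H values (hypergraph walk)
k = [3, 2, 1, 1, 2, 3, 2]     # hypothetical degrees (projected-network walk)
print(gini(d_h), gini(k))
```

A uniform distribution gives $G = 0$, while a distribution concentrated on a single node out of $n$ gives $G = (n - 1)/n$ under this convention.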
Appendix E: Co-authorship networks from arXiv
The collaboration network is one of the most representative examples of hypergraph; nodes are authors and hyper-edges are groups of authors that collaborated to accomplish a task, e.g. write a scientific paper. For this reason wedecided to applied the method that we developed to the co-authorship networks extracted from the online preprintsplatform arXiv and hence analyse the nodes ranking obtained using the two processes.In this section we report some results for the co-authorship hypergraph for the subdomains of arXiv , since theirexistence up to 2018 included (second column Table I). In each subdomain, we gathered all the papers and thenextracted the authors names, so creating a hyperedge whose nodes are the authors. We thus obtain a set of nodes V (1) and hyperedges E (1) , and also the edges of the associated projected network, E (1) q . Such quantities are reportedin parentheses in the third, fourth and fifth column of the Table I. Once the hypergraph has been built we identifythe largest connected component that will contain the nodes V ( cc ) ; then we identify all the maximal, i.e. not properlycontained in any other larger hyperedge, and unique hyperedges E ( cc ) and the edges of the associated projectednetwork E ( cc ) q . Columns 3-4 and 5 of the Table show such values. Finally, we compute the largest hyperedge and thelargest node degree in the maximal connected component (columns 6 and 7). For instance in the arXiv-cs there isa node that belongs to a hyperedge of size 65 and that is linked to other 406 nodes: this means that this researcherhas signed a paper with 64 other researchers and in total he/she had 406 different collaborators with whom he/shehas written a paper. Let us also observe that because of the maximality and uniqueness assumptions, we do notknow if he/she has co-authored other papers with a subset of the 64 scholars. 
Moreover, because we used unweighted networks, we also cannot estimate how many papers he/she wrote with his/her 406 collaborators. Let us recall that maximality and uniqueness are needed only to compare the results with the projected network; our method also works without these assumptions.
Authors and articles in each subdomain follow different "rules" and "habits" of publication and paper writing. However, the distributions of node degrees, i.e. the number of different collaborators per author, and of hyperedge sizes, i.e. the number of co-authors of a paper, exhibit quite similar shapes across the domains, e.g. broad tails (see Fig. 13 for the degree distribution and Fig. 14 for the hyperedge size distribution).

| arXiv | period | nodes | hyperedges | links | max $\lvert E_\alpha \rvert$ | max $k_i$ |
|---|---|---|---|---|---|---|
| astro-ph | 1992-2018 | 185579 (195729) | 136918 (201270) | 4602315 (4617912) | 81 | 2732 |
| cond-mat | 1992-2018 | 221415 (243749) | 141611 (207939) | 1520895 (1551863) | 63 | 1426 |
| cs | 1993-2018 | 136146 (187689) | 84184 (139334) | 534462 (607560) | 65 | 406 |
| econ | 2017-2018 | 113 (1147) | 63 (612) | 214 (1295) | 5 | 36 |
| gr-qc | 1992-2018 | 32088 (40316) | 25321 (45378) | 216355 (228811) | 80 | 511 |
| hep-ex | 1992-2018 | 48460 (55634) | 12310 (23249) | 1418268 (1435372) | 83 | 1228 |
| hep-lat | 1992-2018 | 10275 (12483) | 7439 (14143) | 85194 (87254) | 72 | 346 |
| hep-ph | 1992-2018 | 62885 (70324) | 50403 (86150) | 814746 (823705) | 74 | 1244 |
| hep-th | 1992-2018 | 41814 (51045) | 42410 (74136) | 144710 (154737) | 57 | 206 |
| math | 1992-2018 | 112203 (159595) | 106583 (194312) | 279891 (313402) | 60 | 336 |
| nlin | 1993-2018 | 19491 (30445) | 12428 (23503) | 52089 (64890) | 46 | 312 |
| physics | 1996-2018 | 188142 (240866) | 68805 (116611) | 1859156 (1950143) | 80 | 891 |
| q-bio | 2003-2018 | 23630 (45103) | 9926 (21191) | 93127 (142136) | 54 | 176 |
| q-fin | 2008-2018 | 3136 (8721) | 2155 (6042) | 6851 (13078) | 11 | 66 |
| stat | 2008-2018 | 39422 (57955) | 23377 (39366) | 130665 (158435) | 65 | 228 |

TABLE I. Some figures for the arXiv subdomains. The first column shows the subdomain of the arXiv server, while the second one gives the period of time for which we have extracted the information. Columns 3, 4 and 5 display, respectively, the number of nodes, the number of maximal unique hyperedges and the number of links in the largest connected component, while in parentheses we show the same values for the whole hypergraph/network. Column 6 reports the size of the largest hyperedge and column 7 the maximum degree.

As already stated, the random walk on the hypergraph gives more relevance to the size of the hyperedges, i.e. to the number of co-authors, while the same process on a network emphasises the number of different collaborators. Let us recall that we hereby considered unweighted hypergraphs and networks. We can thus use these approaches to distinguish the different "publication habits" in the considered subdomains. To this aim, we first normalise the stationary probabilities, $p_i^{(\infty)}$ for the hypergraph and $q_i^{(\infty)}$ for the projected network, with respect to their maximum value, so as to be able to compare sets containing different amounts of data. We then report, in the plane with coordinates $\left( q_i^{(\infty)} / \max_j q_j^{(\infty)},\; p_i^{(\infty)} / \max_j p_j^{(\infty)} \right)$, the scatter plot of the data (each point is an author in the largest connected component of the hypergraph), separated into the different subdomains (see Fig. 15).
If the computed rankings were (almost) the same, the data would (almost) lie on the main diagonal; any deviation from it signals novel information conveyed by the random walk on the hypergraph. Besides the region delimited by $q_i^{(\infty)} / \max_j q_j^{(\infty)} \leq 1/2$ and $p_i^{(\infty)} / \max_j p_j^{(\infty)} \leq 1/2$, associated with authors who have written few articles (low degree) and in small groups, we identify three interesting zones associated (roughly speaking) with the squares $[1/2, 1] \times [0, 1/2]$ (bottom right), $[0, 1/2] \times [1/2, 1]$ (top left) and $[1/2, 1] \times [1/2, 1]$ (top right). Authors in the top right square are top ranked in both processes: they have hence written a large number of papers with different collaborators, i.e. large degree, but they have also participated in a relevant number of papers with many co-authors, i.e. large hyperedge size. Scholars in the bottom right square are better ranked by the random walk on the network. This means that they have written several papers but with a small number of co-authors (see e.g. panel physics in Fig. 15). Finally, scholars in the top left square behave in the opposite way: they have participated in a small number of papers, but ones written by many authors (see e.g. panels gr-qc, q-bio or stat in Fig. 15).
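The normalisation and the partition of the plane into squares can be illustrated with a minimal Python sketch. The stationary probabilities below are invented toy values, not results from the arXiv data, and the names `normalise_by_max` and `zone` are hypothetical helpers introduced here; assigning the boundary value 1/2 to the lower square is also an arbitrary choice, since the text only defines the squares "roughly speaking".

```python
def normalise_by_max(values):
    """Divide every stationary probability by the largest one in the set."""
    top = max(values.values())
    return {node: v / top for node, v in values.items()}

def zone(q_rel, p_rel, threshold=0.5):
    """Name the square of the (q, p) plane in which an author falls.

    q_rel is the horizontal coordinate (random walk on the projected network),
    p_rel the vertical one (random walk on the hypergraph).
    """
    horizontal = "right" if q_rel > threshold else "left"
    vertical = "top" if p_rel > threshold else "bottom"
    return f"{vertical} {horizontal}"

# Toy stationary distributions for four hypothetical authors (each sums to 1).
q_inf = {"a": 0.40, "b": 0.35, "c": 0.10, "d": 0.15}  # projected network
p_inf = {"a": 0.45, "b": 0.10, "c": 0.40, "d": 0.05}  # hypergraph

q_rel = normalise_by_max(q_inf)
p_rel = normalise_by_max(p_inf)
zones = {author: zone(q_rel[author], p_rel[author]) for author in q_inf}

for author, name in sorted(zones.items()):
    print(f"{author}: q={q_rel[author]:.2f} p={p_rel[author]:.2f} -> {name}")
```

In this toy example author "b" falls in the bottom right square (better ranked on the network) and author "c" in the top left one (better ranked on the hypergraph), the two off-diagonal cases discussed above.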
Appendix F: The zoo UCI database
The zoo dataset from the UCI Machine Learning Repository [61] contains 101 animals, each one endowed with 15 boolean features, whose value is thus yes/no, namely hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, tail, domestic and catsize. There is also a further feature that reports the number of legs, i.e. 0, 2, 4, 6, 8. To homogenise the dataset we decided to introduce five new boolean features to replace the latter one, the new ones being: "has 0 legs", "has 2 legs", "has 4 legs", "has 6 legs", "has 8 legs". The dataset is manually annotated, hence for each animal we know the right class it belongs to, i.e. Mammal, Bird, Reptile, Fish, Amphibian, Bug or Invertebrate.
This dataset has been created to provide a benchmark for machine learning tools, to test their capacity to correctly assign each animal to the right class based on the associated features.
Animals are the nodes of the hypergraph and the features are the hyperedges, hence all animals sharing the same feature are put in the same hyperedge. The projected network is obtained by making a complete clique from each hyperedge, that is, by creating a link between all pairs of nodes sharing the same property. Let us observe that this can also be seen as the projection of the bipartite network with two kinds of nodes, animals and features, each one linked only to nodes of the other kind. We then computed the spectrum of the hypergraph Laplacian and that of the projected network Laplacian, and we ranked the eigenvalues in ascending order, 0 being the smallest one. We then relabelled the associated left eigenvectors accordingly and used the first few to embed the dataset in a low-dimensional Euclidean space. The results reported in Fig. 16 visually show that the classification performance on the projected network is significantly worse than that obtained when preserving the high-order information (see Fig. 7 in the main text). In Fig. 16 we report 2D projections, but similar conclusions (as testified by the quantitative ARI scores) are obtained for a 3D embedding of the data. In particular, the embedding based on the projected network is less sensitive to the differences among nodes and, as a consequence, multiple nodes overlap. Moreover, it is evident that, while some nodes are correctly clustered (like the yellow ones), the others appear confusingly mixed together (see the magenta triangle, which is at the top of a pile of differently coloured symbols).
[Figure 13 panels: astro-ph, cond-mat, cs, econ, gr-qc, hep (ex, lat, ph, th), math, nlin, physics, q-bio, q-fin, stat]
FIG. 13. Degree distribution. We report for the arXiv subdomains the probability distribution of node degrees, $p(k_i)$, associated with the largest connected component. In all cases we observe a broad distribution. Notice that arXiv-econ has a relatively small number of papers and authors because of its young age (2017-2018), and thus also the maximal degree, i.e. the number of different collaborators of an author, is quite small.
[Figure 14 panels: astro-ph, cond-mat, cs, econ, gr-qc, hep (ex, lat, ph, th), math, nlin, physics, q-bio, q-fin, stat]
FIG. 14. Hyperedge size distribution. We report for the arXiv subdomains the probability distribution of hyperedge sizes, $p(\lvert E_\alpha \rvert)$, associated with the largest connected component. In all cases we observe a broad distribution, except for arXiv-econ, for which the number of papers and authors is relatively small because of its young age (2017-2018), and thus also the maximal hyperedge size, i.e. the number of co-authors of a paper, is quite small. For this reason we report its data in the form of a histogram.
[Figure 15 panels: astro-ph, cond-mat, cs, econ, gr-qc, hep, math, nlin, physics, q-bio, q-fin, stat]
FIG. 15. Comparing the rankings in the arXiv community. We report the scatter plot of the normalised rankings obtained with the random walk on the network, $q_i^{(\infty)}$, and the one computed using the random walk on hypergraphs, $p_i^{(\infty)}$.
FIG. 16.