Random Walks on Hypergraphs with Edge-Dependent Vertex Weights
Uthsav Chitra∗, Benjamin J. Raphael†
Department of Computer Science, Princeton University
May 22, 2019
Abstract
Hypergraphs are used in machine learning to model higher-order relationships in data. While spectral methods for graphs are well-established, spectral theory for hypergraphs remains an active area of research. In this paper, we use random walks to develop a spectral theory for hypergraphs with edge-dependent vertex weights: hypergraphs where every vertex v has a weight γ_e(v) for each incident hyperedge e that describes the contribution of v to the hyperedge e. We derive a random walk-based hypergraph Laplacian, and bound the mixing time of random walks on such hypergraphs. Moreover, we give conditions under which random walks on such hypergraphs are equivalent to random walks on graphs. As a corollary, we show that current machine learning methods that rely on Laplacians derived from random walks on hypergraphs with edge-independent vertex weights do not utilize higher-order relationships in the data. Finally, we demonstrate the advantages of hypergraphs with edge-dependent vertex weights on ranking applications using real-world datasets.

1 Introduction

Graphs are ubiquitous in machine learning, where they are used to represent pairwise relationships between objects. For example, social networks, protein-protein interaction (PPI) networks, and the internet are modeled with graphs. One limitation of graph models, however, is that they do not encode higher-order relationships between objects. A social network can represent a community of users (e.g. a friend group) as a collection of edges between each pair of users, but this pairwise representation loses information about the overall group structure [38]. In biology, protein interactions are not only between pairs of proteins, but also between groups of proteins in protein complexes [32, 33]. Such higher-order interactions can be modeled using a hypergraph: a generalization of a graph containing hyperedges that can be incident to more than two nodes.
A hypergraph representation of a social network can model a community of friends with a single hyperedge. In contrast, the corresponding representation of a community in a graph requires many edges that connect pairs of individuals within the community; conversely, it may not be clear which collection of edges in a graph represents a community (e.g. a clique, an edge-dense subnetwork, etc.). Hypergraphs have been used in a variety of machine learning tasks, including clustering [1, 27, 28, 43], ranking keywords in a collection of documents [5], predicting customer behavior in e-commerce [26], object classification [41, 42], and image segmentation [24].

A common approach to incorporating graph information in a machine learning algorithm is to utilize properties of random walks or diffusion processes on the graph. For example, random walks on graphs underlie algorithms for recommendation systems [21], clustering [18, 31], information retrieval [6], and other applications. In many machine learning applications, the graph is represented through the graph Laplacian. Spectral theory includes many key results regarding the eigenvalues and eigenvectors of the graph Laplacian, and these results form the foundation of spectral learning algorithms.

∗ Email: [email protected]   † Email: [email protected]

Spectral theory on hypergraphs is much less developed than on graphs. In seminal work, Zhou et al. [43] developed learning algorithms on hypergraphs based on random walks on graphs. However, at nearly the same time, Agarwal et al. [2] showed that the hypergraph Laplacian matrix used by Zhou et al. is equal to the Laplacian matrix of a closely related graph, the star graph. A consequence of this equivalence is that the methods introduced by Zhou et al. utilize only pairwise relationships between objects, rather than the higher-order relationships encoded in the hypergraph. More recently, Chan et al.
[7] and Li and Milenkovic [27, 28] developed nonlinear Laplacian operators for hypergraphs that partially address this issue. However, all existing constructions of linear Laplacian operators utilize only pairwise relationships between vertices, as shown by Agarwal et al. [2].

In this paper, we develop a spectral theory for hypergraphs with edge-dependent vertex weights. In such a hypergraph, each hyperedge e has an edge weight ω(e), and each vertex v has a collection of vertex weights, with one weight γ_e(v) for each hyperedge e incident to v. The edge-dependent vertex weight γ_e(v) models the contribution of vertex v to hyperedge e. Edge-dependent vertex weights have previously been used in several applications, including: image segmentation, where the weights represent the probability of an image pixel (vertex) belonging to a segment (hyperedge) [11]; e-commerce, where the weights model the quantity of a product (hyperedge) in a user's shopping basket (vertex) [26]; and text ranking, where the weights represent the importance of a keyword (vertex) to a document (hyperedge) [5]. Hypergraphs with edge-dependent vertex weights have also been used in image search [20, 40] and 3D object classification [42], where the weights represent contributions of vertices in a k-nearest-neighbors hypergraph.

Unfortunately, because of the lack of a spectral theory for hypergraphs with edge-dependent vertex weights, many of the papers that use these hypergraphs rely on incorrect or theoretically unsound assumptions. For example, Zhang et al. [42] and Ding and Yilmaz [11] use a hypergraph Laplacian with no spectral guarantees, while Li et al. [26] derive an incorrect stationary distribution for a random walk on such a hypergraph (see Supplement for additional details). Such issues arise because existing spectral methods are developed for hypergraphs with edge-independent vertex weights, i.e.
hypergraphs where the γ_e(v) are identical for all hyperedges e.

In this paper, we derive several results for hypergraphs with edge-dependent vertex weights. First, we show that random walks on hypergraphs with edge-independent vertex weights are always equivalent to random walks on the clique graph (Figure 1). This generalizes the results of Agarwal et al. [2] and gives the underlying reason why existing constructions of hypergraph Laplacian matrices [34, 43] do not utilize the higher-order relations of the hypergraph.

Motivated by this result, we derive a random walk-based Laplacian matrix for hypergraphs with edge-dependent vertex weights that utilizes the higher-order relations expressed in the hypergraph structure. This Laplacian matrix satisfies the typical properties one would expect of a Laplacian matrix, including being positive semi-definite and satisfying a Cheeger inequality. We also derive a formula for the stationary distribution of a random walk on a hypergraph with edge-dependent vertex weights, and give a bound on the mixing time of the random walk.

Our paper is organized as follows. In Section 2, we define our notation and introduce hypergraphs with edge-dependent vertex weights. In Section 3, we formally define random walks on hypergraphs with edge-dependent vertex weights, and show that when the vertex weights are edge-independent, a random walk on a hypergraph has the same transition matrix as a random walk on its clique graph. In Section 4, we derive a formula for the stationary distribution of a random walk, and use it to bound the mixing time. In Section 5, we derive a random walk-based Laplacian matrix for hypergraphs with edge-dependent vertex weights and show some basic properties of the matrix. Finally, in Section 6, we demonstrate two applications of hypergraphs with edge-dependent vertex weights: ranking authors in a citation network and ranking players in a video game. All proofs are in the Supplementary Material.
2 Preliminaries

Let G = (V, E, w) be a graph with vertex set V, edge set E, and edge weights w. For a vertex v, let N(v) = {u ∈ V : (u, v) ∈ E} denote the neighbors of v. The adjacency matrix A of a graph is a |V| × |V| matrix where A(u, v) = w(e) if e = (u, v) ∈ E, and 0 otherwise.

Let H = (V, E, ω) be a hypergraph with vertex set V, edge set E ⊆ 2^V, and hyperedge weights ω. A graph is a special case of a hypergraph, where each hyperedge e has size |e| = 2. For hypergraphs, the terms "hyperedge" and "edge" are used interchangeably. A random walk on a hypergraph is typically defined as follows [4, 9, 12, 43]. At time t, a "random walker" at vertex v_t will:

1. Select an edge e containing v_t, with probability proportional to ω(e).
2. Select a vertex v from e, uniformly at random.
3. Move to vertex v_{t+1} = v at time t + 1.

In this paper, instead of selecting v uniformly at random from e, we pick v according to a fixed probability distribution on the vertices in e. This motivates the following definition of a hypergraph with edge-dependent vertex weights.

Definition 1.
A hypergraph H = (V, E, ω, γ) with edge-dependent vertex weights is a set of vertices V, a set E ⊆ 2^V of hyperedges, a weight ω(e) for every hyperedge e ∈ E, and a weight γ_e(v) for every hyperedge e ∈ E and every vertex v incident to e.

We emphasize that a vertex v in a hypergraph with edge-dependent vertex weights has multiple weights: one weight γ_e(v) for each hyperedge e that contains v. Intuitively, γ_e(v) measures the contribution of vertex v to hyperedge e. In a random walk on a hypergraph with edge-dependent vertex weights, the random walker will pick a vertex v from hyperedge e with probability proportional to γ_e(v). Note that we set γ_e(u) = 0 if u ∉ e. We show an example of a hypergraph with edge-dependent vertex weights in Figure 1.

If each vertex has the same contribution to all incident hyperedges, i.e. γ_e(v) = γ_e′(v) for all hyperedges e and e′ incident to v, then we say that the hypergraph has edge-independent vertex weights, and we use γ(v) = γ_e(v) to refer to the vertex weights of H. If γ_e(v) = 1 for all vertices v and incident hyperedges e, we say the vertex weights are trivial.

We define E(v) = {e ∈ E : v ∈ e} to be the hyperedges incident to a vertex v, and E(u, v) = {e ∈ E : u ∈ e, v ∈ e} to be the hyperedges incident to both vertices u and v. Let d(v) = Σ_{e ∈ E(v)} ω(e) denote the degree of vertex v, and let δ(e) = Σ_{v ∈ e} γ_e(v) denote the degree of hyperedge e.
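As a concrete illustration of Definition 1 and of the degree functions d(v) and δ(e), the following sketch represents a small hypergraph with edge-dependent vertex weights using plain dictionaries. The hyperedges, weights, and vertex names are invented for illustration:

```python
# Toy hypergraph with edge-dependent vertex weights (Definition 1).
# Hyperedge weights omega(e); vertex weights gamma[e][v] = gamma_e(v),
# with gamma_e(u) = 0 implicit for u not in e.
omega = {"e1": 2.0, "e2": 1.0}
gamma = {
    "e1": {"a": 1.0, "b": 1.0, "c": 2.0},  # hyperedge e1 = {a, b, c}
    "e2": {"b": 2.0, "c": 1.0},            # hyperedge e2 = {b, c}
}

def vertex_degree(v):
    """d(v) = sum of omega(e) over hyperedges e in E(v)."""
    return sum(omega[e] for e, members in gamma.items() if v in members)

def edge_degree(e):
    """delta(e) = sum of gamma_e(v) over vertices v in e."""
    return sum(gamma[e].values())

print(vertex_degree("b"))  # omega(e1) + omega(e2) = 3.0
print(edge_degree("e1"))   # 1.0 + 1.0 + 2.0 = 4.0
```

Note that vertex b has edge-dependent weights here (γ_{e1}(b) = 1 while γ_{e2}(b) = 2), mirroring the kind of example discussed in the caption of Figure 1.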
The vertex-weight matrix R of a hypergraph with edge-dependent vertex weights H = (V, E, ω, γ) is an |E| × |V| matrix with entries R(e, v) = γ_e(v), and the hyperedge weight matrix W is a |V| × |E| matrix with W(v, e) = ω(e) if v ∈ e, and W(v, e) = 0 otherwise. The vertex-degree matrix D_V is a |V| × |V| diagonal matrix with entries D_V(v, v) = d(v), and the hyperedge-degree matrix D_E is an |E| × |E| diagonal matrix with entries D_E(e, e) = δ(e).

Given H = (V, E, ω, γ), the clique graph of H, G_H, is an unweighted graph with vertices V and edges E′ = {(v, w) ⊂ V × V : v, w ∈ e for some e ∈ E}. In other words, G_H turns all hyperedges into cliques. We say a hypergraph H is connected if its clique graph G_H is connected. In this paper, we assume all hypergraphs are connected.

For a Markov chain with states S and transition probabilities p, we use p_{u,v} to denote the probability of going from state u to state v.

Figure 1: Example illustrating Theorem 4. A hypergraph with edge-independent vertex weights H (left) and a corresponding edge-weighted clique graph G_H (right) such that random walks on H and G_H are equivalent. Note that, if one changes the vertex weights of b to be edge-dependent vertex weights, by setting γ_{e_1}(b) = 1 and γ_{e_2}(b) = 2, then it is not possible to choose edge weights w_{u,v} on G_H such that random walks on G_H and H are equivalent.

3 Random Walks on Hypergraphs

Let H = (V, E, ω, γ) be a hypergraph with edge-dependent vertex weights. We first define a random walk on H. At time t, a random walker at vertex v_t will do the following:

1. Pick an edge e containing v_t, with probability ω(e)/d(v_t).
2. Pick a vertex w from e, with probability γ_e(w)/δ(e).
3. Move to vertex v_{t+1} = w at time t + 1.

We formally define a random walk on H by writing out the transition probabilities according to the above steps.

Definition 2.
A random walk on a hypergraph with edge-dependent vertex weights H = (V, E, ω, γ) is a Markov chain on V with transition probabilities

p_{v,w} = Σ_{e ∈ E(v)} ( ω(e) / d(v) ) ( γ_e(w) / δ(e) ).   (1)

The probability transition matrix P of a random walk on H is the |V| × |V| matrix with entries P(v, w) = p_{v,w}, and can be written in matrix form as P = D_V^{-1} W D_E^{-1} R. (We use the convention that probability transition matrices have row sum 1.) Using the probability transition matrix P, we can also define a random walk with restart on H [36]. The random walk with restart is useful when it is unknown whether the random walk is irreducible.

Note that our definition allows self-loops, i.e. p_{v,v} > 0, and thus the random walk is lazy. While one can define a non-lazy random walk (i.e. p_{v,v} = 0 for all v), the analysis of such walks is significantly more difficult, as the probability transition matrix cannot be factored as easily. In the Supplement, we show that a weaker version of Theorem 4 below holds for a non-lazy random walk. Cooper et al. [9] also study the cover time of a non-lazy random walk on a hypergraph with edge-independent vertex weights.

Next, we define what it means for two random walks to be equivalent. Because random walks are Markov chains, we define equivalence in terms of Markov chains.

Definition 3. Let M and N be Markov chains with the same (countable) state space, and let P^M and P^N be their respective probability transition matrices. We say that M and N are equivalent if P^M_{x,y} = P^N_{x,y} for all states x and y.

Using this definition, we state our first main theorem: a random walk on a hypergraph with edge-independent vertex weights is equivalent to a random walk on its clique graph, for some choice of weights on the clique graph.

Theorem 4.
Let H = (V, E, ω, γ) be a hypergraph with edge-independent vertex weights. There exist weights w_{u,v} on the clique graph G_H such that a random walk on H is equivalent to a random walk on G_H.

Theorem 4 generalizes the result of Agarwal et al. [2], who showed that the two hypergraph Laplacian matrices constructed in Zhou et al. [43] and Rodriguez-Velazquez [34] are equal to the Laplacian matrix of either the clique graph or the star graph, another graph constructed from a hypergraph. Agarwal et al. [2] also showed that the Laplacians of the clique graph and the star graph are equal when H is k-uniform (i.e. when all hyperedges have size k), and are very close otherwise. Since the Laplacian matrices in Zhou et al. [43] and Rodriguez-Velazquez [34] are derived from random walks on hypergraphs with edge-independent vertex weights, Theorem 4 implies that both Laplacians are equal to the Laplacian of the clique graph – even when the hypergraph is not k-uniform – thus strengthening the result in Agarwal et al. [2].

The proof of Theorem 4 relies on the fact that a random walk on H satisfies a property known as time-reversibility: π_u p_{u,v} = π_v p_{v,u} for all vertices u, v ∈ V, where π is the stationary distribution of the random walk [3]. It is well-known that a Markov chain can be represented as a random walk on a graph if and only if it is time-reversible. Moreover, time-reversibility allows us to derive a formula for the weights w_{u,v} on G_H. Let γ(v) = γ_e(v) be the edge-independent weight for vertex v. Then,

w_{u,v} = π_u p_{u,v} = Σ_{e ∈ E(u,v)} ω(e) γ(u) γ(v) / δ(e).   (2)

Conversely, the caption of Figure 1 describes a simple example of a hypergraph with edge-dependent vertex weights whose random walk is not time-reversible. This proves the following result.

Theorem 5.
There exists a hypergraph with edge-dependent vertex weights H = (V, E, ω, γ) such that a random walk on H is not equivalent to a random walk on its clique graph G_H, for any choice of edge weights on G_H.

Anecdotally, we find from simulations that most random walks on hypergraphs with edge-dependent vertex weights are not time-reversible, and therefore satisfy Theorem 5. However, it is not clear how to formalize this observation.

Theorem 5 says that random walks on graphs with vertex set V are a strict subset of Markov chains on V. A natural follow-up question is whether all Markov chains on V can be described as a random walk on some hypergraph H with vertex set V and edge-dependent vertex weights. In the Supplement, we show that the answer to this question is no and provide a counterexample.

In addition, we show in the Supplement that hypergraphs with edge-dependent vertex weights create a rich hierarchy of Markov chains, beyond the division between time-reversible and time-irreversible Markov chains. In particular, we show that random walks on hypergraphs with edge-dependent vertex weights and at least one hyperedge of cardinality k cannot in general be reduced to a random walk on a hypergraph with hyperedges of cardinality at most k − 1. Another natural question is: given a hypergraph with edge-dependent vertex weights H = (V, E, ω, γ), do there exist weights on the clique graph G_H such that random walks on H and G_H are "close"? We provide a partial answer to this question in Section 5, where we show that, for a specific choice of weights on G_H, the second-smallest eigenvalues of the Laplacian matrices of H and G_H are close.

4 Stationary Distribution and Mixing Time

Recall the formula for the stationary distribution of a random walk on a graph. If G = (V, E, w) is a graph, then the stationary distribution π of a random walk on G is

π_v = ρ Σ_{e ∈ E(v)} w(e),   (3)

where ρ = ( 2 Σ_{e ∈ E} w(e) )^{-1}.
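Equation (3) can be checked numerically. The following sketch uses numpy and an invented 3-vertex weighted graph: the power-iterated distribution of the walk matches d(v) divided by twice the total edge weight.

```python
import numpy as np

# Toy weighted graph on 3 vertices; W[u, v] = w(u, v) (symmetric).
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
d = W.sum(axis=1)              # d(v) = sum of incident edge weights
P = W / d[:, None]             # transition probabilities p_{u,v} = w(u,v)/d(u)

pi = np.full(3, 1 / 3)         # power iteration for the stationary distribution
for _ in range(200):
    pi = pi @ P

# Equation (3): pi_v proportional to d(v). The normalizer is twice the total
# edge weight, since each edge contributes to the degree of both endpoints.
pi_formula = d / W.sum()       # W.sum() counts every edge twice
assert np.allclose(pi, pi_formula)
print(pi_formula)              # approximately [0.25, 0.4167, 0.3333]
```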
We derive a formula for the stationary distribution of a random walk on a hypergraph with edge-dependent vertex weights; the formula is analogous to equation (3) above, with two important changes: first, the proportionality constant ρ depends on the hyperedge, and second, each term in the sum is multiplied by the vertex weight γ_e(v).

Theorem 6.
Let H = (V, E, ω, γ) be a hypergraph with edge-dependent vertex weights. There exist positive constants ρ_e such that the stationary distribution π of a random walk on H is

π_v = Σ_{e ∈ E(v)} ρ_e ω(e) γ_e(v).   (4)

Moreover, the ρ_e can be computed in time O(|E| + |E| · |V|).

Note that while the vertex weights γ_e(v) can be scaled arbitrarily without affecting the properties of the random walk, Theorem 6 suggests that ρ_e is the "correct" scaling factor.

When the hypergraph has edge-independent vertex weights (i.e. γ_e(v) = γ(v) for all incident hyperedges e), ρ_e = ( Σ_{v ∈ V} γ(v) d(v) )^{-1} for all e, leading to the following formula for the stationary distribution:

π_v = d(v) γ(v) / Σ_{u ∈ V} d(u) γ(u).   (5)

Furthermore, if the vertex weights are trivial (i.e. γ(v) = 1), then π_v = d(v) / Σ_{u ∈ V} d(u), recovering the formula derived in Zhou et al. [43] for the stationary distribution of a random walk on a hypergraph with trivial vertex weights.

In this section, we derive a bound on the mixing time of a random walk on H = (V, E, ω, γ). First, we recall the definition of the mixing time of a Markov chain.

Definition 7.
Let M be a Markov chain with states S and probability transition matrix P. The mixing time of M is t_mix(ϵ) = min{ t ≥ 0 : ||P^t(s, ·) − π||_TV ≤ ϵ for all s ∈ S }, where || · ||_TV is the total variation distance.

We derive the following bound on the mixing time of a random walk on a hypergraph with edge-dependent vertex weights.

Theorem 8. Let H = (V, E, ω, γ) be a hypergraph with edge-dependent vertex weights. Without loss of generality, assume ρ_e = 1 for all hyperedges e (i.e. by multiplying the vertex weights in hyperedge e by ρ_e). Then,

t^H_mix(ϵ) ≤ ⌈ (2 / (β₁ Φ²)) log( 1 / (ϵ √(d_min β₂)) ) ⌉,   (6)

where

• Φ is the Cheeger constant of a random walk on H [22, 30],
• d_min is the minimum degree of a vertex in H, i.e. d_min = min_v d(v),
• β₁ = min_{e ∈ E, v ∈ e} ( γ_e(v) / δ(e) ),
• β₂ = min_{e ∈ E, v ∈ e} γ_e(v).

This bound on the mixing time of the hypergraph random walk has a similar form to the bound on the mixing time of a random walk on a graph [22]. For a graph G with edge weights w(e) satisfying Σ_v d(v) = 1, we have

t^G_mix(ϵ) ≤ ⌈ (2 / Φ²) log( 1 / (ϵ √d_min) ) ⌉.   (7)

Note that both t^H_mix(ϵ) and t^G_mix(ϵ) have the same dependence on 1/Φ², log(1/ϵ), and log(1/√d_min). Intuitively, the additional dependence of t^H_mix(ϵ) on β₁ and β₂ arises because small values of β₁ and β₂ correspond to the hypergraph having vertices that are hard to reach, and the presence of such vertices increases the mixing time.

5 A Random Walk-Based Hypergraph Laplacian

Let H = (V, E, ω, γ) be a hypergraph with edge-dependent vertex weights. Since a random walk on H is a Markov chain, we can model the transition probabilities p^H_{u,v} of the random walk using a weighted directed graph G with the same vertex set V. Specifically, let G = (V, E′, w′) be a directed graph with directed edges E′ = {(u, v) : ∃e ∈ E with u, v ∈ e}, and edge weights w′_{u,v} = p^H_{u,v}. Extending the definition of the Laplacian matrix for directed graphs [8], we define a Laplacian matrix L for the hypergraph H as follows.

Definition 9 (Random walk-based hypergraph Laplacian). Let H = (V, E, ω, γ) be a hypergraph with edge-dependent vertex weights. Let P be the probability transition matrix of a random walk on H with stationary distribution π. Let Π be a |V| × |V| diagonal matrix with Π_{v,v} = π_v. Then, the random walk-based hypergraph Laplacian matrix L is

L = Π − (ΠP + P^T Π)/2.   (8)

At first glance, one might hypothesize that the hypergraph Laplacian L defined above does not model higher-order relations between vertices, since L is defined using a directed graph containing edges only between pairs of vertices. Indeed, if H has edge-independent vertex weights, then it is true that L does not model higher-order relations between vertices. This is because the transition probabilities p^H_{u,v} are completely determined by the edge weights of the undirected clique graph G_H (Theorem 4).
Thus, for each pair (u, v) of vertices in H, only a single quantity w_{u,v}, which encodes a pairwise relation between u and v, is required to define the random walk. As such, the Laplacian matrix L defined in Equation (8) is equal to the Laplacian matrix of an undirected graph, showing that L only encodes pairwise relationships between vertices.

In contrast, when H has edge-dependent vertex weights, the transition probabilities p^H_{u,v} generally cannot be computed from a single quantity w_{u,v} defined for each pair (u, v) of vertices (Theorem 5). The absence of such a reduction implies that the transition probabilities p^H_{u,v}, which are the edge weights of the directed graph G, encode higher-order relations between vertices. Thus, the Laplacian matrix L also encodes these higher-order relations.

From Chung [8], the hypergraph Laplacian matrix L given in equation (8) is positive semi-definite and has a Rayleigh quotient for computing its eigenvalues. L can be used in developing spectral learning algorithms for hypergraphs with edge-dependent vertex weights, or to study the properties of random walks on such hypergraphs. For example, the following Cheeger inequality for hypergraphs follows directly from the Cheeger inequality for directed graphs [8].

Theorem 10 (Cheeger inequality for hypergraphs). Let H = (V, E, ω, γ) be a hypergraph with edge-dependent vertex weights. Let L be the Laplacian matrix given in equation (8), and let Φ be the Cheeger constant of a random walk on H. Let λ_i be the non-zero eigenvalues of L, and let λ = min_i λ_i. We have

Φ²/2 ≤ λ ≤ 2Φ.   (9)

In Section 3, we posed the following question: given a hypergraph H with edge-dependent vertex weights, can we find weights on the clique graph G_H such that the random walks of H and G_H are close? We prove the following result.

Theorem 11.
Let H = (V, E, ω, γ) be a hypergraph, with the edge-dependent vertex weights normalized so that ρ_e = 1 for all hyperedges e. Let G_H be the clique graph of H, with edge weights

w_{u,v} = Σ_{e ∈ E(u,v)} ω(e) γ_e(u) γ_e(v) / δ(e).   (10)

Let L_H, L_G be the Laplacians of H and G_H, respectively, and let λ_H, λ_G be the second-smallest eigenvalues of L_H, L_G, respectively. Then

(1/c(H)) λ_H ≤ λ_G ≤ c(H) λ_H,   (11)

where c(H) = max_{v ∈ V} ( max_{e ∈ E(v)} γ_e(v) / min_{e ∈ E(v)} γ_e(v) ).

This theorem says that there exist edge weights w_{u,v} on G_H such that the second-smallest eigenvalues of the Laplacians of H and G_H are within a constant factor c(H) of each other, where c(H) is determined by the vertex weights. We do not know if the edge weights in Equation (10) give the tightest bound, or if another choice of edge weights on G_H will yield a Laplacian L_G that is "closer" to the hypergraph Laplacian L_H.

Interestingly, Zhang et al. [42] use a variant of L_G as the Laplacian matrix of a hypergraph with edge-dependent vertex weights, and obtain state-of-the-art results on an object classification task. Theorem 11 provides some theoretical evidence for why Zhang et al. [42] are able to obtain good results, even with the "wrong" Laplacian.

6 Experiments

We demonstrate the utility of hypergraphs with edge-dependent vertex weights in two different ranking applications: ranking authors in an academic citation network, and ranking players in a video game.

6.1 Citation Network
We construct a citation network of all machine learning papers from NIPS, ICML, KDD, IJCAI, UAI, ICLR, and COLT published on or before 10/27/2017, extracted from the ArnetMiner database [35]. We represent the network as a hypergraph whose vertices V are authors and whose hyperedges E are papers, such that each hyperedge e connects the authors of a paper. We form two weighted hypergraphs from this network: H_T = (V, E, ω, 1), with trivial vertex weights γ_e(v) = 1 for all vertices v and incident hyperedges e, and H_D = (V, E, ω, γ), with edge-dependent vertex weights

γ_e(v) = 2 if v is the first or last author of paper e, and 1 if v is a middle author of paper e.

The edge-dependent vertex weights γ_e(v) model unequal contributions by different authors. For papers whose authors are in alphabetical order (as is common in theory papers), we set vertex weights γ_e(v) = 1 for all v ∈ e. We set the hyperedge weights ω(e) = (number of citations for paper e) + 1. We compute the stationary distribution of a random walk with restart on each of H_T and H_D, and rank the authors v in each hypergraph by their value in the stationary distribution. This yields two different rankings of authors: one with edge-independent vertex weights, and one with edge-dependent vertex weights.

The two rankings have a Kendall τ correlation coefficient [23] of 0.77, indicating modest similarity. Examining individual authors, we typically see that authors who are first/last authors on their most cited papers have higher rankings in H_D compared to H_T, e.g. Ian Goodfellow [17]. In contrast, authors who are middle authors on their most cited papers have lower rankings in H_D relative to their rankings in H_T. Table 1 shows the authors ranked in the top 700 in at least one of the two hypergraphs, and with the largest gain in rank in H_D relative to H_T.

Name                 Rank in H_T   Rank in H_D
Richard Socher       687           382
Zhongzhi Shi         543           304
Daniel Rueckert      619           391
Lars Schmidt-Thieme  673           454
Tat-Seng Chua        650           435
Ian J. Goodfellow    612           413

Table 1: Highly ranked authors with the largest increase in rank when edge-dependent vertex weights are used in the hypergraph citation network.

We emphasize that this example is intended to illustrate how a straightforward application of vertex weights leads to alternative author rankings. We do not anticipate that our simple scheme for choosing edge-dependent vertex weights will always yield the best results in practice. For example, Christopher Manning drops in rank when edge-dependent vertex weights are added, but this is because he is the second-to-last, and co-corresponding, author on his most cited papers in the database. A more robust vertex weighting scheme would include knowledge of such equal-contribution authors, and would also incorporate different relative contributions of first, middle, and corresponding authors.
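The ranking procedure used here (the stationary distribution of a random walk with restart, with P = D_V^{-1} W D_E^{-1} R as in Section 3) can be sketched as follows. The tiny hypergraph, the uniform restart distribution, and the restart parameter value are illustrative assumptions, not the citation data:

```python
import numpy as np

# Toy hypergraph: 2 hyperedges over 3 vertices (authors), with
# edge-dependent vertex weights R[e, v] = gamma_e(v).
R = np.array([[2.0, 1.0, 0.0],    # e1 contains vertices 0, 1
              [0.0, 2.0, 1.0]])   # e2 contains vertices 1, 2
omega = np.array([3.0, 1.0])      # hyperedge weights omega(e)
W = (R.T > 0) * omega             # W[v, e] = omega(e) if v in e, else 0

d = W.sum(axis=1)                             # vertex degrees d(v)
delta = R.sum(axis=1)                         # hyperedge degrees delta(e)
P = (W / d[:, None]) @ (R / delta[:, None])   # P = D_V^-1 W D_E^-1 R

# Random walk with restart; a uniform restart distribution is assumed here.
beta, n = 0.4, P.shape[0]
P_restart = (1 - beta) * P + beta / n

pi = np.full(n, 1 / n)            # power iteration for the stationary dist.
for _ in range(500):
    pi = pi @ P_restart

ranking = np.argsort(-pi)         # rank vertices by stationary probability
print(ranking)
```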
6.2 Rank Aggregation

We illustrate the use of hypergraphs with edge-dependent vertex weights on the rank aggregation problem. The rank aggregation problem aims to combine many partial rankings into one complete ranking. Formally, given a universe {1, 2, ..., n} of items and a collection of partial rankings τ_1, ..., τ_k (e.g. τ_i = (3, 1, 5) is a partial ranking expressing item 3 < item 1 < item 5), a rank aggregation algorithm should find a permutation σ on {1, 2, ..., n} that is "close" to the partial rankings τ_i.

We consider a particular application of rank aggregation: ranking players in a multiplayer game. Here, the outcome of a game/match gives a partial ranking τ of the players participating in the match. In addition to the ranking, one may also have additional information, such as the scores of each player in the match. The latter setting has been extensively studied; classic ranking methods are the Elo [14] and Glicko [16] systems that are used to rank chess players. More recently, online multiplayer games such as Halo have led to the development of alternative ranking systems such as Microsoft's TrueSkill [19] and TrueSkill 2 [29].

We develop a rank aggregation algorithm that uses random walks on hypergraphs with edge-dependent vertex weights, and evaluate the performance of this algorithm on a real-world dataset of Halo 2 games. In the Supplement, we also include results on experiments with synthetic data.

Data.
We analyze the Halo 2 dataset from the TrueSkill paper [19]. This dataset contains two kinds of matches: free-for-all matches with up to 8 players, and 1-v-1 matches. There are 31028 free-for-all matches and 5093 1-v-1 matches among 5507 players. Using the free-for-all matches as partial rankings, we construct rankings of all players in the dataset, and evaluate those rankings on the 1-v-1 matches.
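As a concrete sketch of the match-to-hypergraph conversion described under Methods below, each free-for-all match becomes a hyperedge weighted by the standard deviation of its scores plus one, with vertex weights exp(score). The matches and scores here are invented, not the Halo 2 data, and the population standard deviation is an assumption (the paper does not specify population vs. sample):

```python
import math
import statistics

# Toy free-for-all matches: each match maps player -> score.
matches = [
    {"p1": 10.0, "p2": -3.0, "p3": 5.0},
    {"p2": 7.0, "p3": 7.0},
]

def hyperedge_weight(match):
    """omega(e) = (standard deviation of scores in match e) + 1."""
    return statistics.pstdev(match.values()) + 1.0

def vertex_weights(match):
    """gamma_e(v) = exp(score of v in e): positive even when scores are
    negative, and strongly favoring the match winner."""
    return {v: math.exp(s) for v, s in match.items()}

for m in matches:
    print(hyperedge_weight(m), vertex_weights(m))
```

Note the two properties motivated in the text: negative scores still yield positive vertex weights, and exponentiation amplifies the winner's weight.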
Methods.
A well-known class of rank aggregation algorithms are Markov chain-based algorithms, first developed by Dwork et al. [13]. Markov chain-based algorithms create a Markov chain M whose states are the players and whose transition probabilities depend in some way on the partial rankings. The final ranking of players is determined by sorting the values in the stationary distribution π of M. In our experiments, we use a random walk with restart (β = 0.4) instead of just a random walk, so that the stationary distribution always exists [36].

Using the free-for-all matches, we construct rankings of the players using four algorithms. The first three algorithms use Markov chains: a random walk on a hypergraph H with edge-dependent vertex weights; a random walk on a clique graph; and MC3, a Markov chain-based rank aggregation algorithm designed by Dwork et al. [13]. The fourth algorithm is TrueSkill [19].

First, we derive a rank aggregation algorithm using a random walk on a hypergraph H = (V, E, ω, γ) with edge-dependent vertex weights. The vertices V are the players, and the hyperedges E correspond to the free-for-all matches. We set the hyperedge and vertex weights to be

ω(e) = (standard deviation of scores in match e) + 1,
γ_e(v) = exp[(score of player v in match e)].

This choice of hyperedge weights is inspired by Ding and Yilmaz [11], who also use variance to define the hyperedge weights of their hypergraph. For vertex weights, we use exp(score). We choose these vertex weights instead of raw scores for two reasons: first, scores in Halo 2 can be negative, but vertex weights should be positive, and second, exponentiating the score gives more importance to the winner of a match. We chose to use relatively simple formulas for the hyperedge and vertex weights to evaluate the potential benefits of utilizing edge-dependent vertex weights; further optimization of vertex and edge weights may yield better performance.

Second, we derive a rank aggregation algorithm using a random walk on the clique graph G_H of the hypergraph H described above, with the edge weights of G_H given by Equation (10). Specifically, if H = (V, E, ω, γ) is the hypergraph defined above, then G_H is a graph with vertex set V and edge weights

w_{u,v} = Σ_{e ∈ E(u,v)} ω(e) γ_e(u) γ_e(v) / δ(e).   (12)

In contrast to Equation (10), here we do not normalize the vertex weights on H so that ρ_e = 1 for all hyperedges e, since computing the ρ_e is computationally infeasible on our large dataset. Instead, we normalize the vertex weights so that δ(e) = 1 for each hyperedge e.

Third, we use MC3, a Markov chain-based rank aggregation algorithm designed by Dwork et al. [13]. MC3 uses the partial rankings in each match; it does not use the score information. MC3 is very similar to a random walk on a hypergraph with edge-independent vertex weights. We convert the scores from each player in match i into a partial ranking τ_i of the players, and use the τ_i as input to MC3.

Fourth, we use TrueSkill [19]. TrueSkill models each player's skill with a normal distribution. We rank players according to the mean of this distribution. We also implemented the probabilistic decision procedure for ranking players from the TrueSkill paper, and found no difference in performance between ranking by the mean of the distribution and the probabilistic decision procedure.

Evaluation and Results.
We evaluate the rankings of each algorithm by using them to predict the outcomes of the 1-v-1 matches. Specifically, given a ranking π of players, we predict that the winner of a match between two players is the player with the higher ranking in π. Table 2 shows the fraction of 1-v-1 matches correctly predicted by each of the four algorithms. Random walks on the hypergraph with edge-dependent vertex weights have significantly better performance than both MC3 and random walks on the clique graph G_H, and comparable performance to TrueSkill. Moreover, on 8.9% of 1-v-1 matches, the hypergraph method correctly predicts the outcome of the match while TrueSkill incorrectly predicts the outcome, suggesting that the hypergraph model is capturing some information about the players that TrueSkill is missing. Unfortunately, we are unable to identify any specific pattern in the matches where the hypergraph predicted the outcome correctly and TrueSkill predicted incorrectly.

Table 2: Result of ranking players for the Halo 2 dataset.

    Algorithm       Correctly Predicted
    TrueSkill       73.4%
    Hypergraph      71.1%
    Clique Graph    61.1%
    MC3             52.3%
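As a concrete illustration of the pipeline described above, the following sketch builds the hypergraph transition matrix from toy free-for-all matches and ranks players by the stationary distribution of a random walk with restart (β = 0.4). The function names and the toy scores are ours, for illustration only; this is a minimal sketch of the construction, not the code used in the experiments.

```python
import numpy as np

def hypergraph_transition_matrix(matches, n_players):
    """Random-walk transition matrix on the match hypergraph.

    Each match is a dict {player: score} and becomes one hyperedge with
    weight omega(e) = std(scores) + 1 and edge-dependent vertex weights
    gamma_e(v) = exp(score of v in e), as in the construction above.
    """
    d = np.zeros(n_players)                 # d(v): total weight of incident edges
    for match in matches:
        omega = np.std(list(match.values())) + 1.0
        for p in match:
            d[p] += omega

    P = np.zeros((n_players, n_players))
    for match in matches:
        players = list(match)
        omega = np.std(list(match.values())) + 1.0
        gamma = np.exp([match[p] for p in players])
        gamma /= gamma.sum()                # gamma_e(w) / delta(e)
        for u in players:
            # from u: pick e with prob omega/d(u), then w with prob gamma_e(w)/delta(e)
            P[u, players] += (omega / d[u]) * gamma
    return P

def rank_players(P, beta=0.4, iters=2000):
    """Rank by the stationary distribution of a random walk with restart."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = (1 - beta) * (pi @ P) + beta / n
    return np.argsort(-pi)                  # strongest player first

# Toy data (ours, not the paper's dataset): player 0 dominates both matches.
matches = [{0: 10, 1: 5, 2: 0}, {0: 8, 1: 6, 2: 1}]
P = hypergraph_transition_matrix(matches, n_players=3)
print(rank_players(P))                      # → [0 1 2]
```

Because every row of P sums to one and the restart term keeps the chain irreducible, the iteration converges regardless of the match structure.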
In this paper, we use random walks to develop a spectral theory for hypergraphs with edge-dependent vertex weights. We demonstrate both theoretically and experimentally how edge-dependent vertex weights model higher-order information in hypergraphs and improve the performance of hypergraph-based algorithms. At the same time, we show that random walks on hypergraphs with edge-independent vertex weights are equivalent to random walks on graphs, generalizing earlier results that showed this equivalence in special cases [2].

There are numerous directions for future work. It would be desirable to evaluate additional applications where hypergraphs with edge-dependent vertex weights have previously been used (e.g. [26, 42]), replacing the Laplacian used in some of these works with the hypergraph Laplacian introduced in Section 5. Sharper bounds on the approximation of the hypergraph Laplacian by a graph Laplacian are also desirable. Another direction is to examine the relationship between the linear hypergraph Laplacian matrix introduced here and the nonlinear Laplacian operators that were recently introduced in the case of trivial vertex weights [7] or submodular vertex weights [27, 28].

Another interesting direction is extending graph convolutional neural networks (GCNs) to hypergraphs. Recent approaches to GCNs implement the graph convolution operator as a non-linear function of the graph Laplacian [10, 25]. GCNs have also been generalized to hypergraph convolutional neural networks (HGCNs), where the convolution layer operates on a hypergraph with edge-independent vertex weights instead of a graph [15, 37]. The hypergraph Laplacian matrix introduced in this paper would allow one to extend HGCNs to hypergraphs with edge-dependent vertex weights.
References

[1] S. Agarwal, Jongwoo Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie. Beyond pairwise clustering. In 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 838–845, June 2005.

[2] Sameer Agarwal, Kristin Branson, and Serge Belongie. Higher order learning with graphs. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pages 17–24, New York, NY, USA, 2006. ACM.

[3] David Aldous and James Allen Fill. Reversible Markov Chains and Random Walks on Graphs. 2002.

[4] Chen Avin, Yuval Lando, and Zvi Lotker. Radio cover time in hyper-graphs. Ad Hoc Networks, 12:278–290, 2014.

[5] Abdelghani Bellaachia and Mohammed Al-Dhelaan. Random walks in hypergraph. In Proceedings of the 2013 International Conference on Applied Mathematics and Computational Methods, Venice, Italy, pages 187–194, 2013.

[6] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Seventh International World-Wide Web Conference (WWW 1998), 1998.

[7] T.-H. Hubert Chan, Anand Louis, Zhihao Gavin Tang, and Chenzi Zhang. Spectral properties of hypergraph Laplacian and approximation algorithms. J. ACM, 65(3):15:1–15:48, March 2018.

[8] Fan Chung. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9(1):1–19, April 2005.

[9] Colin Cooper, Alan Frieze, and Tomasz Radzik. The cover times of random walks on random uniform hypergraphs. Theoretical Computer Science, 509:51–69, 2013.

[10] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. CoRR, abs/1606.09375, 2016.

[11] Lei Ding and Alper Yilmaz. Interactive image segmentation using probabilistic hypergraphs. Pattern Recognition, 43(5):1863–1873, 2010.

[12] Aurélien Ducournau and Alain Bretto. Random walks in directed hypergraphs and application to semi-supervised image segmentation. Comput. Vis. Image Underst., 120:91–102, March 2014.

[13] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th International Conference on World Wide Web, WWW '01, pages 613–622, New York, NY, USA, 2001. ACM.

[14] Arpad E. Elo. The Rating of Chessplayers, Past and Present. Arco Pub., New York, 1978.

[15] Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. CoRR, abs/1809.09401, 2018.

[16] Mark E. Glickman. The Glicko system. Boston University, 1995.

[17] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[18] David Harel and Yehuda Koren. On clustering using random walks. In Proceedings of the 21st Conference on Foundations of Software Technology and Theoretical Computer Science, FST TCS '01, pages 18–41, Berlin, Heidelberg, 2001. Springer-Verlag.

[19] Ralf Herbrich, Tom Minka, and Thore Graepel. TrueSkill™: A Bayesian skill rating system. In Proceedings of the 19th International Conference on Neural Information Processing Systems, NIPS '06, pages 569–576, Cambridge, MA, USA, 2006. MIT Press.

[20] Y. Huang, Q. Liu, S. Zhang, and D. N. Metaxas. Image retrieval via probabilistic hypergraph ranking. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3376–3383, June 2010.

[21] Mohsen Jamali and Martin Ester. TrustWalker: A random walk model for combining trust-based and item-based recommendation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, pages 397–406, New York, NY, USA, 2009. ACM.

[22] Daniel Jerison. General mixing time bounds for finite Markov chains via the absolute spectral gap, October 2013.

[23] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.

[24] Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, and Chang D. Yoo. Higher-order correlation clustering for image segmentation. In Advances in Neural Information Processing Systems 24, pages 1530–1538. Curran Associates, Inc., 2011.

[25] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.

[26] Jianbo Li, Jingrui He, and Yada Zhu. E-tail product return prediction via hypergraph-based local graph cut. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '18, pages 519–527, New York, NY, USA, 2018. ACM.

[27] Pan Li and Olgica Milenkovic. Inhomogeneous hypergraph clustering with applications. In Advances in Neural Information Processing Systems 30, pages 2308–2318. Curran Associates, Inc., 2017.

[28] Pan Li and Olgica Milenkovic. Submodular hypergraphs: p-Laplacians, Cheeger inequalities and spectral clustering. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3014–3023, Stockholm, Sweden, July 2018. PMLR.

[29] Tom Minka, Ryan Cleven, and Yordan Zaykov. TrueSkill 2: An improved Bayesian skill rating system. March 2018.

[30] R. Montenegro and P. Tetali. Mathematical aspects of mixing times in Markov chains. Found. Trends Theor. Comput. Sci., 1(3):237–354, May 2006.

[31] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, NIPS '01, pages 849–856, Cambridge, MA, USA, 2001. MIT Press.

[32] E. Ramadan, A. Tarafdar, and A. Pothen. A hypergraph model for the yeast protein complex network. In 18th International Parallel and Distributed Processing Symposium (IPDPS), April 2004.

[33] Anna Ritz, Allison N. Tegge, Hyunju Kim, Christopher L. Poirel, and T. M. Murali. Signaling hypergraphs. Trends in Biotechnology, 32(7):356–362, 2014.

[34] Juan Alberto Rodriguez-Velazquez. On the Laplacian eigenvalues and metric parameters of hypergraphs. Linear and Multilinear Algebra, 50:1–14, March 2002.

[35] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 990–998, New York, NY, USA, 2008. ACM.

[36] H. Tong, C. Faloutsos, and J. Pan. Fast random walk with restart and its applications. In Sixth International Conference on Data Mining (ICDM '06), pages 613–622, December 2006.

[37] Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Anand Louis, and Partha Talukdar. HyperGCN: Hypergraph convolutional networks for semi-supervised classification. CoRR, abs/1809.02589, 2018.

[38] Wenyin Yang, Guojun Wang, Md Zakirul Alam Bhuiyan, and Kim-Kwang Raymond Choo. Hypergraph partitioning for social networks based on information entropy modularity. Journal of Network and Computer Applications, 86:59–71, 2017. Special Issue on Pervasive Social Networking.

[39] Emine Yilmaz, Javed A. Aslam, and Stephen Robertson. A new rank correlation coefficient for information retrieval. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '08, pages 587–594, New York, NY, USA, 2008. ACM.

[40] Kaiman Zeng, Nansong Wu, Arman Sargolzaei, and Kang Yen. Learn to rank images: A unified probabilistic hypergraph model for visual search. Mathematical Problems in Engineering, 2016:1–7, January 2016.

[41] Z. Zhang, H. Lin, X. Zhao, R. Ji, and Y. Gao. Inductive multi-hypergraph learning and its application on view-based 3D object classification. IEEE Transactions on Image Processing, 27(12):5957–5968, December 2018.

[42] Zizhao Zhang, Haojie Lin, and Yue Gao. Dynamic hypergraph structure learning. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 3162–3169. International Joint Conferences on Artificial Intelligence Organization, July 2018.

[43] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In Proceedings of the 19th International Conference on Neural Information Processing Systems, NIPS '06, pages 1601–1608, Cambridge, MA, USA, 2006. MIT Press.
A Incorrect Stationary Distribution in Earlier Work
Li et al. [26] claim in Equation 4 that the stationary distribution π of a random walk on a hypergraph H = (V, E, γ, ω) with edge-dependent vertex weights is

π_v = d(v) / Σ_{u∈V} d(u),   (13)

where d(v) = Σ_{e∈E(v)} ω(e) is the sum of edge weights of incident hyperedges. Curiously, the stationary distribution given by this formula does not depend on the vertex weights. A counterexample to this formula is the hypergraph H in Figure 1 of the main text, with edge-dependent vertex weights as described in the caption: directly computing the stationary distribution π of a random walk on H yields a value of π_b that differs from the value incorrectly given by Equation (13).

B Proof of Theorem 4
First we need the following definition and lemma.
Definition 12.
Let M be a Markov chain with state space S and transition probabilities p_{x,y} for x, y ∈ S. We say M is reversible if there exists a probability distribution π over S such that

π_x p_{x,y} = π_y p_{y,x}.   (14)

Lemma 13.
Let M be an irreducible Markov chain with finite state space S and transition probabilities p_{x,y} for x, y ∈ S. M is reversible if and only if there exists a weighted, undirected graph G with vertex set S such that a random walk on G and M are equivalent.

Proof of Lemma. First, suppose M is reversible. Since M is irreducible, let π be the stationary distribution of M. Note that, because M is irreducible, π_x > 0 for all x ∈ S.

Let G be a graph with vertices S, and edge weights w_{x,y} = π_x p_{x,y}. By reversibility, w_{x,y} = w_{y,x}, so G is well-defined. In a random walk on G, the probability of going from x to y in one time-step is

w_{x,y} / Σ_{z∈S} w_{x,z} = π_x p_{x,y} / Σ_{z∈S} π_x p_{x,z} = p_{x,y} / Σ_{z∈S} p_{x,z} = p_{x,y},

since Σ_{z∈S} p_{x,z} = 1. Thus, if M is reversible, the stated claim holds. The other direction follows from the fact that a random walk on an undirected graph is always reversible [3]. □

Theorem 4.
Let H = (V, E, ω, γ) be a hypergraph with edge-independent vertex weights. Then, there exist weights w_{u,v} on the clique graph G_H such that a random walk on H is equivalent to a random walk on G_H.

Proof of Theorem 4. Let γ(v) = γ_e(v) for vertices v and incident hyperedges e; this is well-defined because the vertex weights are edge-independent. We first show that a random walk on H is reversible. By Kolmogorov's criterion, reversibility is equivalent to

p_{v_1,v_2} p_{v_2,v_3} ··· p_{v_n,v_1} = p_{v_1,v_n} p_{v_n,v_{n−1}} ··· p_{v_2,v_1}   (15)

for any set of vertices v_1, ..., v_n.

Since the transition probability between any two vertices u, v is

p_{u,v} = Σ_{e∈E(u,v)} (ω(e)/d(u)) (γ(v)/δ(e)) = (γ(v)/d(u)) Σ_{e∈E(u,v)} ω(e)/δ(e),   (16)

we have

p_{v_1,v_2} p_{v_2,v_3} ··· p_{v_n,v_1}
  = ∏_{i=1}^{n} (γ(v_{i+1})/d(v_i)) Σ_{e∈E(v_i,v_{i+1})} ω(e)/δ(e), where we define v_{n+1} = v_1,
  = ∏_{i=1}^{n} (γ(v_i)/d(v_i)) Σ_{e∈E(v_i,v_{i+1})} ω(e)/δ(e), reindexing the γ terms around the cycle,
  = p_{v_1,v_n} p_{v_n,v_{n−1}} ··· p_{v_2,v_1},   (17)

where the last equality uses E(u, v) = E(v, u). So by Kolmogorov's criterion, a random walk on H is reversible.

Furthermore, because H is connected, random walks on H are irreducible. Thus, by Lemma 13, there exists a graph G with vertex set V and edge weights w_{u,v} such that random walks on G and H are equivalent. The equivalence of the random walks implies that p_{u,v} > 0 if and only if w_{u,v} > 0, so it follows that G is the clique graph of H. □

C Non-Lazy Random Walks on Hypergraphs
First we generalize the random walk framework of Cooper et al. [9] to random walks on hypergraphs with edge-dependent vertex weights. Informally, in a non-lazy random walk, a random walker at vertex v will do the following:

1. pick an edge e containing v, with probability ω(e)/d(v);
2. pick a vertex w ≠ v from e, with probability γ_e(w)/(δ(e) − γ_e(v)); and
3. move to vertex w.

Formally, we have the following.

Definition 14. A non-lazy random walk on a hypergraph with edge-dependent vertex weights H = (V, E, ω, γ) is a Markov chain on V with transition probabilities

p_{v,w} = Σ_{e∈E(v)} (ω(e)/d(v)) (γ_e(w)/(δ(e) − γ_e(v)))   (18)

for all states v ≠ w.

It is also useful to define a modified version of the clique graph without self-loops.

Definition 15.
Let H = (V, E, ω, γ) be a hypergraph with edge-dependent vertex weights. The clique graph of H without self-loops, G_H^nl, is a weighted, undirected graph with vertex set V and edges E′ defined by

E′ = {(v, w) ∈ V × V : v, w ∈ e for some e ∈ E, and v ≠ w}.   (19)

In contrast to the lazy random walk, a non-lazy random walk on a hypergraph with edge-independent vertex weights is not guaranteed to satisfy reversibility. However, if H has trivial vertex weights, then reversibility holds, and we get the following result.

Theorem 16. Let H = (V, E, ω, γ) be a hypergraph with trivial vertex weights, i.e. γ_e(v) = 1 for all vertices v and incident hyperedges e. Then, there exist weights w_{u,v} on the clique graph without self-loops G_H^nl such that a non-lazy random walk on H is equivalent to a random walk on G_H^nl.

Proof. Again, we first show that a non-lazy random walk on H is reversible. Define the probability mass function π_v = c · d(v) for normalizing constant c > 0. Let p_{u,v} be the probability of going from u to v in a non-lazy random walk on H, where u ≠ v. Then,

π_u p_{u,v} = c · d(u) · Σ_{e∈E(u,v)} ω(e)/(d(u) · (|e| − 1)) = Σ_{e∈E(u,v)} ω(e) · c/(|e| − 1).

This expression is symmetric in u and v, so π_u p_{u,v} = π_v p_{v,u}, and a non-lazy random walk is reversible. Thus, by Lemma 13, there exists a graph G with vertex set V and edge weights w_{u,v} such that a random walk on G and a non-lazy random walk on H are equivalent. The equivalence of the random walks implies that p_{u,v} > 0 if and only if w_{u,v} > 0, so G is the clique graph of H without self-loops. □

D Relationships between Random Walks on Hypergraphs and Markov Chains on Vertex Set
In the main text, we show that there are hypergraphs with edge-dependent vertex weights whose random walks are not equivalent to a random walk on a graph. A natural follow-up question is whether every Markov chain on a vertex set V can be represented as a random walk on some hypergraph with the same vertex set and edge-dependent vertex weights. Below, we show that the answer is no. Since random walks on hypergraphs with edge-dependent vertex weights are lazy, in the sense that p_{v,v} > 0 for every vertex v, we restrict our attention to lazy Markov chains with p_{v,v} > 0.

Claim 17. There exists a lazy Markov chain M with state space V such that M is not equivalent to a random walk on any hypergraph with vertex set V and edge-dependent vertex weights.

Proof. Suppose for the sake of contradiction that every lazy Markov chain on V is equivalent to a random walk on some hypergraph with vertex set V. Let M be a lazy Markov chain with states V and transition probabilities p^M, with the following property: for some states x, y ∈ V,

p^M_{x,x} = 0.9,  p^M_{x,y} = 0.1,  p^M_{y,x} = 0.1,  p^M_{y,y} = 0.01.   (20)

By assumption, let H = (V, E, ω, γ) be a hypergraph with vertex set V and edge-dependent vertex weights, such that a random walk on H is equivalent to M. Let p^H be the transition probabilities of a random walk on H. We have

d(x) · p^M_{x,x} = d(x) · p^H_{x,x} = Σ_{e∈E(x)} ω(e) · (γ_e(x)/δ(e)) ≥ Σ_{e∈E(x,y)} ω(e) · (γ_e(x)/δ(e)) = d(y) · p^H_{y,x} = d(y) · p^M_{y,x}.   (21)

Plugging Equations (20) into the above yields d(x) · 0.9 ≥ d(y) · 0.1, or 9 d(x) ≥ d(y).

By similar reasoning, we also have d(y) · p^M_{y,y} ≥ d(x) · p^M_{x,y}, and plugging in Equations (20) gives us d(y) · 0.01 ≥ d(x) · 0.1, or d(y) ≥ 10 d(x).

Combining both of these inequalities, we obtain

9 d(x) ≥ d(y) ≥ 10 d(x).   (22)

Since the vertex degree d(x) > 0, we obtain a contradiction. □
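The key step in the proof is inequality (21), which every random walk on a hypergraph with edge-dependent vertex weights must satisfy: d(x) · p_{x,x} ≥ d(y) · p_{y,x} for all vertices x, y, because the self-loop sum over E(x) dominates the sum restricted to E(x, y). The following minimal sketch (a randomly generated toy hypergraph of our own construction, not from the paper) checks the constraint numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random toy hypergraph with edge-dependent vertex weights:
# each hyperedge is (vertex list, omega(e), {vertex: gamma_e(v)}).
n = 6
edges = []
for _ in range(8):
    verts = sorted(map(int, rng.choice(n, size=rng.integers(2, 5), replace=False)))
    gamma = {v: float(rng.uniform(0.5, 3.0)) for v in verts}
    edges.append((verts, float(rng.uniform(0.5, 2.0)), gamma))

d = np.zeros(n)                              # d(v) = sum of omega(e) over e containing v
for verts, omega, _ in edges:
    for v in verts:
        d[v] += omega

P = np.zeros((n, n))                         # lazy hypergraph random walk
for verts, omega, gamma in edges:
    delta = sum(gamma.values())              # delta(e) = sum of gamma_e(v) over v in e
    for u in verts:
        for w in verts:                      # w == u allowed: the walk is lazy
            P[u, w] += (omega / d[u]) * (gamma[w] / delta)

# Inequality (21): the self-loop mass at x dominates the flow from any y into x.
for x in range(n):
    for y in range(n):
        assert d[x] * P[x, x] >= d[y] * P[y, x] - 1e-12
```

The assertion never fires, for any random seed, which is exactly what makes the Markov chain in Equation (20) unrealizable as a hypergraph random walk.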
Next, for any k > 1, define a k-hypergraph to be a hypergraph with edge-dependent vertex weights whose hyperedges have cardinality at most k. We show that, for any k, there exists a k-hypergraph with vertex set V whose random walk is not equivalent to the random walk of any (k − 1)-hypergraph with vertex set V. We first prove the result for k = 3.

Lemma 18. There exists a 3-hypergraph with vertex set V whose random walk is not equivalent to a random walk on any 2-hypergraph with vertex set V.

Proof. Let H_3 = (V, E, ω, γ) be a 3-hypergraph with four vertices, V = {v_1, v_2, v_3, v_4}, and two hyperedges e_1 = {v_1, v_2, v_3} and e_2 = {v_2, v_3, v_4}. Let the hyperedge weights be ω(e_1) = ω(e_2) = 1, and let the vertex weights be γ_{e_1}(v_2) = 2, and γ_{e_i}(v_j) = 1 for all other v_j, e_i such that v_j ∈ e_i.

For the sake of contradiction, suppose a random walk on H_3 is equivalent to a random walk on H_2 = (V, E_2, ω_2, γ_2), where H_2 is a 2-hypergraph with vertex set V. Let p^{H_i} be the transition probabilities of H_i for i = 2, 3; by assumption, p^{H_2} = p^{H_3}.

H_2 must have the following edges: e′_{12} = {v_1, v_2}, e′_{13} = {v_1, v_3}, e′_{23} = {v_2, v_3}, e′_{24} = {v_2, v_4}, and e′_{34} = {v_3, v_4}. WLOG let γ_{e′_{ij}}(v_i) + γ_{e′_{ij}}(v_j) = 1 for all i, j. Moreover, while we do not depict these edges in the figure below, H_2 also has singleton edges e′_i = {v_i} for i = 1, 2, 3, 4, though it may be the case that ω(e′_i) = 0. We write ω_{ij} for ω(e′_{ij}), ω_i for ω(e′_i), and γ_{ijk} for γ_{e′_{ij}}(v_k) where k ∈ {i, j}.

By definition, we have

1/2 = p^{H_3}_{v_1,v_2} = p^{H_2}_{v_1,v_2} = (ω_{12}/(ω_{12} + ω_{13} + ω_1)) γ_{122}.   (23)

Thus, ω_{12}/(ω_{12} + ω_{13} + ω_1) = (2 γ_{122})^{−1}. By similar analysis of p^{H_2}_{v_1,v_3} = 1/4, and using that γ_{131} + γ_{133} = 1, we also have ω_{13}/(ω_{12} + ω_{13} + ω_1) = (4(1 − γ_{131}))^{−1}. Thus, adding together the bounds on p^{H_2}_{v_1,v_2} and p^{H_2}_{v_1,v_3},

1/(2 γ_{122}) + 1/(4(1 − γ_{131})) = (ω_{12} + ω_{13})/(ω_{12} + ω_{13} + ω_1) ≤ 1.   (24)

Figure 2: The 3-hypergraph H_3.

Figure 3: The 2-hypergraph H_2. For illustrative purposes, we do not draw the singleton edges.

Note that, to get the bound in Equation (24), we summed p^{H_3}_{v_1,v_i} over i ≠ 1. If we follow the same steps but replace v_1 with v_2 and v_3, respectively, using p^{H_3}_{v_2,v_1} = 1/8, p^{H_3}_{v_2,v_3} = 7/24, p^{H_3}_{v_2,v_4} = 1/6 and p^{H_3}_{v_3,v_1} = 1/8, p^{H_3}_{v_3,v_2} = 5/12, p^{H_3}_{v_3,v_4} = 1/6, we get the following bounds:

1/(8(1 − γ_{122})) + 7/(24 γ_{233}) + 1/(6 γ_{244}) ≤ 1,   (25)

1/(8 γ_{131}) + 5/(12 γ_{232}) + 1/(6 γ_{344}) ≤ 1.   (26)

Now, solving for γ_{122} in Equation (24) yields

γ_{122} ≥ (2 − 2 γ_{131})/(3 − 4 γ_{131}).   (27)

Next, using that γ_{ijk} ∈ [0, 1], we bound Equation (25):

1 ≥ 1/(8(1 − γ_{122})) + 7/(24 γ_{233}) + 1/(6 γ_{244}) ≥ 1/(8(1 − γ_{122})) + 7/24 + 1/6 = 1/(8(1 − γ_{122})) + 11/24.   (28)

Solving for γ_{122} yields γ_{122} ≤ 10/13. Combining with Equation (27):

10/13 ≥ γ_{122} ≥ (2 − 2 γ_{131})/(3 − 4 γ_{131})  ⟹  γ_{131} ≤ 2/7.   (29)

Bounding Equation (26) in a similar way to Equation (28) gives us

1 ≥ 1/(8 γ_{131}) + 5/(12 γ_{232}) + 1/(6 γ_{344}) ≥ 1/(8 γ_{131}) + 5/12 + 1/6 = 1/(8 γ_{131}) + 7/12.   (30)

Solving for γ_{131} gives us

γ_{131} ≥ 3/10.   (31)

Finally, putting together Equations (29) and (31):

3/10 ≤ γ_{131} ≤ 2/7,   (32)

which yields a contradiction, as 3/10 > 2/7. □

We prove the result for general k by extending the above proof.

Theorem 19.
Let k > 3. Then, there exists a k-hypergraph with vertex set V whose random walk is not equivalent to a random walk on any (k − 1)-hypergraph with vertex set V.

Proof. For simplicity, assume k is even (our argument can be adapted to odd k). Write k = 2(n + 1). For the sake of contradiction, suppose all k-hypergraphs have random walks equivalent to the random walk of some (k − 1)-hypergraph.

Let H_k = (V, E_k, ω, γ) be a k-hypergraph with vertices V = {v_1, ..., v_{2n}, w_1, ..., w_{2n}, b, c} and hyperedges e_1 = {v_1, ..., v_{2n}, b, c} and e_2 = {w_1, ..., w_{2n}, b, c}. The edge weights are ω(e_1) = ω(e_2) = 1, and the edge-dependent vertex weights are γ_{e_1}(b) = 2, and γ_{e_i}(v) = 1 for all other v, e_i with v ∈ e_i.

Figure 4: The hypergraph H_k.

By assumption, let H_{k−1} = (V, E_{k−1}, ω′, γ′) be a (k − 1)-hypergraph whose random walk is equivalent to a random walk on H_k. Let p^{H_k}, p^{H_{k−1}} be the transition probabilities of H_k, H_{k−1}, respectively. Then, in H_{k−1}, we have

d(v_i) · p^{H_{k−1}}_{v_i,v_j} = Σ_{e∈E(v_i,v_j)} ω′(e) · (γ′_e(v_j)/δ(e)) ≤ Σ_{e∈E(v_j)} ω′(e) · (γ′_e(v_j)/δ(e)) = d(v_j) · p^{H_{k−1}}_{v_j,v_j}   (33)

for all i, j ∈ {1, ..., 2n}. Since p^{H_{k−1}}_{v_i,v_j} = p^{H_{k−1}}_{v_j,v_j} (in H_k, both equal γ_{e_1}(v_j)/δ(e_1)), the above equation implies d(v_i) ≤ d(v_j). So by symmetry, d(v_i) = d(v_j) for all i, j. This means that Equation (33) is actually an equality, so

Σ_{e∈E(v_i,v_j)} ω′(e) · (γ′_e(v_j)/δ(e)) = Σ_{e∈E(v_j)} ω′(e) · (γ′_e(v_j)/δ(e)).   (34)

Since every term in the above sums is positive, it must be the case that every hyperedge in H_{k−1} containing v_j also contains v_i, for all i, j. Because v_1, ..., v_{2n} all lie in the same hyperedges in both H_{k−1} and H_k, we can view {v_1, ..., v_{2n}} as a single "supernode" v. By symmetry, we can also view {w_1, ..., w_{2n}} as a single supernode w.

Thus, we have reduced our problem to the counterexample in Lemma 18, and the result follows. □

Putting all of our results together gives us the following (informal) hierarchy of Markov chains:

{random walks on hypergraphs with edge-independent vertex weights}
  = {random walks on graphs}
  ⊊ {random walks on 2-hypergraphs}
  ⊊ {random walks on 3-hypergraphs}
  ⊊ ...
  ⊊ {all lazy Markov chains}.

E Proof of Theorem 6
We first prove the following lemma.
Lemma 20.
Let H = ( V , E ) be a hypergraph with edge-dependent vertex weights γ e ( v ) and hyperedge weights ω ( e ) . Without loss of generality, assume (cid:205) v ∈ e γ e ( v ) = . There exist ρ e > satisfying ρ e = (cid:213) v ∈ e (cid:213) f ∈ E ( v ) d ( v ) − · ρ f · ω ( f ) · γ f ( v ) (35) and (cid:213) e ∈ E ρ e · ω ( e ) = . (36)22 roof of Lemma. Our proof outline is as follows. First, we prove the lemma in the case where the hyperedgeweights are all equal to each other. Then, we extend that result to the case where the hyperedge weightsare rational. Finally, we use the density of (cid:81) in (cid:82) to extend our result from rational hyperedge weights toreal ones.First, suppose all of the hyperedge weights are equal to each other. WLOG let ω ( e ) = e ∈ E .Switching the order of summation in Equation 35, we have (cid:213) v ∈ e (cid:213) f ∈ E ( v ) d ( v ) − · ρ f · ω ( f ) · γ f ( v ) = (cid:213) v ∈ e (cid:213) f ∈ E ( v ) d ( v ) − · ρ f · γ f ( v ) = (cid:213) f ∈ E (cid:213) v ∈ e ∩ f d ( v ) − · ρ f · γ f ( v ) = (cid:213) f ∈ E ρ f · (cid:169)(cid:173)(cid:171) (cid:213) v ∈ e ∩ f d ( v ) − γ f ( v ) (cid:170)(cid:174)(cid:172) . (37)Now let A be a square matrix of size | E | × | E | , with entries A e , f = (cid:205) v ∈ e ∩ f d ( v ) − γ f ( v ) . Note that thecolumn sums of A are equal to 1: (cid:213) e ∈ E A e , f = (cid:213) e ∈ E (cid:213) v ∈ e ∩ f d ( v ) − γ f ( v ) = (cid:213) v ∈ f (cid:213) e ∈ E ( v ) d ( v ) − γ f ( v ) = (cid:213) v ∈ f d ( v ) − γ f ( v ) · d ( v ) = (cid:213) v ∈ f γ f ( v ) = . (38)Thus, by the Perron-Frobenius theorem, A has a positive eigenvector ρ with eigenvalue 1.So by construction, ρ satisfies Equation 35. Moreover, t · ρ also satisfies Equation 35 for any t >
0. Thus, t · ρ with t = ( (cid:205) e ∈ E ρ e · ω ( e )) − satisfies both Equation 35 and Equation 36, and so the lemma is proved inthe case where the hyperedge weights are all equal.Next, assume H is a hypergraph with rational hyperedge weights, i.e. ω ( e ) ∈ (cid:81) for all e ∈ E . Multiplyingthrough by denominators, we can assume ω ( e ) ∈ (cid:78) . Create hypergraph H (cid:48) with vertices V in the followingway. For each hyperedge e , replace e with hyperedges e , ..., e ω ( e ) , where each hyperedge e i :• contains the same vertices as e ,• has weight ω (cid:48) ( e i ) = e , so that γ (cid:48) e i ( v ) = γ e ( v ) for all v ∈ e .Let E (cid:48) be the hyperedges of H (cid:48) , and let M ( v ) be the hyperedges incident to vertex v in H (cid:48) . Since H (cid:48) has equal hyperedge weights, we can find constants ρ (cid:48) e i that satisfy Equations 35 and 36 for H (cid:48) . Note that ρ (cid:48) e i = ρ (cid:48) e j by symmetry.Now, for each hyperedge e of H , let ρ e = ρ (cid:48) e . I claim that ρ e satisfies Equations 35 and 36 for H .Equation 36 is satisfied since ω ( e ) · ρ e = ω ( e ) · ρ (cid:48) e = ρ (cid:48) e + · · · + ρ (cid:48) e ω ( e ) = ω ( e ) (cid:213) i = ρ (cid:48) e i ω (cid:48) ( e i ) , (39)23hich implies (cid:213) e ∈ E ρ e · ω ( e ) = (cid:213) e ∈ E ω ( e ) (cid:213) i = ρ (cid:48) e i ω (cid:48) ( e i ) = (cid:213) e ∈ E (cid:48) ρ (cid:48) e i ω ( e i ) = . (40)To show Equation 35 holds for H , note that d ( v ) − · ρ f · ω ( f ) · γ f ( v ) = ω ( f ) (cid:213) i = (cid:0) d ( v ) − · ρ (cid:48) f i · ω (cid:48) ( f i ) · γ (cid:48) f i ( v ) (cid:1) . 
(41)Summing over both sides yields (cid:213) v ∈ e (cid:213) f ∈ E ( v ) d ( v ) − · ρ f · ω ( f ) · γ f ( v ) = (cid:213) v ∈ e (cid:213) f ∈ E ( v ) ω ( f ) (cid:213) i = (cid:0) d ( v ) − · ρ (cid:48) f i · ω (cid:48) ( f i ) · γ (cid:48) f i ( v ) (cid:1) = (cid:213) v ∈ e (cid:213) f ∈ M ( v ) d ( v ) − · ρ (cid:48) f · ω (cid:48) ( f ) · γ (cid:48) f ( v ) = (cid:213) v ∈ e (cid:213) f ∈ M ( v ) d ( v ) − · ρ (cid:48) f · ω (cid:48) ( f ) · γ (cid:48) f ( v ) = ρ (cid:48) e , since Equation 35 holds for H (cid:48) = ρ e . (42)Thus, Equations 35 and 36 hold for H when H has rational hyperedge weights.Finally, we consider the general case, where we assume nothing about the hyperedge weights besidesthat they are positive real numbers. By similar reasoning to our proof of the equal hyperedge weight case,we are done if we can find positive ρ e satisfying Equation 35.We have (cid:213) v ∈ e (cid:213) f ∈ E ( v ) d ( v ) − · ρ f · ω ( f ) · γ f ( v ) = (cid:213) v ∈ e (cid:213) f ∈ E ( v ) d ( v ) − · ω ( f ) · ρ f · γ f ( v ) = (cid:213) f ∈ E (cid:213) v ∈ e ∩ f d ( v ) − · ρ f · ω ( f ) · γ f ( v ) = (cid:213) f ∈ E ρ f · (cid:169)(cid:173)(cid:171) (cid:213) v ∈ e ∩ f d ( v ) − · ω ( f ) · γ f ( v ) (cid:170)(cid:174)(cid:172) . (43)Let A be a matrix of size | E | × | E | with entries A e , f = (cid:213) v ∈ e ∩ f d ( v ) − · ω ( f ) · γ f ( y ) . (44)Showing that there exist positive ρ e that satisfy Equation 35 is equivalent to showing that A has apositive eigenvector with eigenvalue 1. By the Perron-Frobenius theorem, this equivalent to A havingspectral radius 1.For each hyperedge e ∈ E , let q e , q e , . . . be a sequence of rational numbers that converges to ω ( e ) , i.e.lim n →∞ q en = ω ( e ) . Let H n be H except we replace all hyperedge weights ω ( e ) with q en . 
By the previous part of the proof, there exist positive constants $\rho_n(e)$ that satisfy Equation 35 for $H_n$; equivalently, if we let $A_n$ be the matrix from Equation 44 for hypergraph $H_n$, then $A_n$ has spectral radius 1. Since $A_n$ depends continuously on the hyperedge weights, and the spectral radius is a continuous function, it follows that the spectral radius of $A$ is the limit of the spectral radii of the $A_n$. Thus, the spectral radius of $A$ is 1, and we are done. □

Theorem 6.
Let $H = (V, E, \omega, \gamma)$ be a hypergraph with edge-dependent vertex weights. There exist positive constants $\rho_e$ such that the stationary distribution $\pi$ of a random walk on $H$ is

$\pi_v = \sum_{e \in E(v)} \omega(e) \cdot \left( \rho_e\, \gamma_e(v) \right)$.   (45)

Moreover, the $\rho_e$ can be computed in time $O\left( |E|^3 + |E|^2 \cdot |V| \right)$.

Proof of Theorem 6. Without loss of generality, assume $\delta(e) = \sum_{v \in e} \gamma_e(v) = 1$ for all hyperedges $e$ (i.e., by scaling the $\gamma_e$, and correspondingly the $\rho_e$, appropriately). Let $\rho_e > 0$ be the constants from the lemma above, and define

$\pi_v = \sum_{e \in E(v)} \omega(e) \left( \rho_e\, \gamma_e(v) \right)$.   (46)

We claim that $\pi$ is the stationary distribution for a random walk on $H$. First, note that

$\sum_{v \in V} \pi_v = \sum_{v \in V} \sum_{e \in E(v)} \omega(e) \left( \rho_e\, \gamma_e(v) \right) = \sum_{e \in E} \sum_{v \in e} \omega(e) \left( \rho_e\, \gamma_e(v) \right) = \sum_{e \in E} \rho_e\, \omega(e) \sum_{v \in e} \gamma_e(v) = \sum_{e \in E} \rho_e\, \omega(e) = 1$, by Equation 36,   (47)

so $\pi$ is indeed a probability distribution on $V$. Now, for any vertex $w \in V$, we have

$\sum_{v \in V} \pi_v\, p_{v,w} = \sum_{v \in V} \pi_v \left( \sum_{e \in E(v)} \frac{\omega(e)}{d(v)}\, \gamma_e(w) \right) = \sum_{v \in V} \sum_{e \in E(v,w)} \pi_v \cdot \gamma_e(w) \cdot \omega(e) \cdot d(v)^{-1} = \sum_{e \in E(w)} \sum_{v \in e} \pi_v \cdot \gamma_e(w) \cdot \omega(e) \cdot d(v)^{-1} = \sum_{e \in E(w)} \omega(e) \cdot \gamma_e(w) \left( \sum_{v \in e} \frac{\pi_v}{d(v)} \right)$.   (48)

If we simplify the inner sum, we get

$\sum_{v \in e} \frac{\pi_v}{d(v)} = \sum_{v \in e} d(v)^{-1} \sum_{f \in E(v)} \rho_f \cdot \omega(f) \cdot \gamma_f(v) = \sum_{v \in e} \sum_{f \in E(v)} d(v)^{-1} \cdot \rho_f \cdot \omega(f) \cdot \gamma_f(v) = \rho_e$.   (49)

Plugging this back in, we get

$\sum_{e \in E(w)} \omega(e) \cdot \gamma_e(w) \left( \sum_{v \in e} \frac{\pi_v}{d(v)} \right) = \sum_{e \in E(w)} \omega(e) \cdot \gamma_e(w) \cdot \rho_e = \pi_w$.
(50)

Thus, $\sum_{v \in V} \pi_v\, p_{v,w} = \pi_w$, so $\pi$ is a stationary distribution for $H$. Finally, note that computing $A$ (Equation 44) takes time $O(|E|^2 \cdot |V|)$ when $d(v)$ is precomputed, and solving $A\rho = \rho$ takes time $O(|E|^3)$, so the total runtime to compute the $\rho_e$ is $O(|E|^3 + |E|^2 \cdot |V|)$. □

F Proof of Theorem 8
For completeness, we include the definition of the Cheeger constant of a Markov chain [30].
Definition 21.
Let $M$ be an ergodic Markov chain with finite state space $V$, transition probabilities $p_{u,v}$, and stationary distribution $\pi$. The Cheeger constant of $M$ is

$\Phi = \min_{S \subset V,\ 0 < \pi(S) \le 1/2} \frac{\sum_{x \in S,\, y \notin S} \pi_x\, p_{x,y}}{\pi(S)}$,   (51)

where $\pi(S) = \sum_{v \in S} \pi_v$.

First, we prove the following lemma on the mixing time of any lazy Markov chain.

Lemma 22.
Let $M$ be a finite, irreducible Markov chain with states $S$ and transition probabilities $p_{x,y}$, satisfying $p_{x,x} \ge \delta$ for all $x \in S$. Let $\pi$ be the stationary distribution of $M$, and let $\pi_{\min}$ be the smallest element of $\pi$. Then,

$t_{\mathrm{mix}}(\epsilon) \le \left\lceil \frac{2}{\delta \Phi^2} \log \left( \frac{1}{\epsilon \sqrt{\pi_{\min}}} \right) \right\rceil$.   (52)

Proof of Lemma.
We use the notation of Jerison [22]. Let $P^*$ be the time-reversal transition matrix of $P$. Note that $P^*P$ and $\frac{P + P^*}{2}$ are both reversible Markov chains. Let $\alpha$ be the square root of the second-largest eigenvalue of $P^*P$, and let $b$ be the second-largest eigenvalue of $\frac{P + P^*}{2}$. By the Cheeger inequality, we have $1 - b \ge \Phi^2 / 2$. Combining this with Lemma 1.21 of Montenegro and Tetali [30] yields

$\frac{\mathcal{E}_{(P+P^*)/2}(f,f)}{\mathrm{Var}_\pi(f)} \ge \frac{\Phi^2}{2}$,   (53)

where $f : S \to \mathbb{R}$ is any function, $\mathcal{E}_{(P+P^*)/2}(f,f)$ is the Dirichlet form of the Markov chain $\frac{P + P^*}{2}$, and $\mathrm{Var}_\pi(f)$ is the variance of $f$ (see Montenegro and Tetali [30] for more details). From Jerison [22],

$\mathcal{E}_{P^*P}(f,f) \ge 2\delta\, \mathcal{E}_{(P+P^*)/2}(f,f)$.   (54)

Combining Equations 53 and 54 yields

$\frac{\mathcal{E}_{P^*P}(f,f)}{\mathrm{Var}_\pi(f)} \ge \Phi^2 \delta$.   (55)

Now, from Lemma 1.2 of Montenegro and Tetali [30], $1 - \alpha^2 \ge \frac{\mathcal{E}_{P^*P}(f,f)}{\mathrm{Var}_\pi(f)}$; plugging this into the above equation and rearranging yields $\alpha \le \left( 1 - \Phi^2 \delta \right)^{1/2} \le 1 - \frac{\Phi^2 \delta}{2}$. Plugging this into Equation 1.6 of Jerison [22] yields

$t_{\mathrm{mix}}(\epsilon) \le \left\lceil \frac{1}{1 - \alpha} \log \left( \frac{1}{\epsilon \sqrt{\pi_{\min}}} \right) \right\rceil \le \left\lceil \frac{2}{\delta \Phi^2} \log \left( \frac{1}{\epsilon \sqrt{\pi_{\min}}} \right) \right\rceil$. □

Theorem 8.
Let $H = (V, E, \omega, \gamma)$ be a hypergraph with edge-dependent vertex weights. Without loss of generality, assume $\rho_e = 1$ for all hyperedges $e$ (i.e., by multiplying the vertex weights in hyperedge $e$ by $\rho_e$). Then,

$t^H_{\mathrm{mix}}(\epsilon) \le \left\lceil \frac{2}{\beta_1 \Phi^2} \log \left( \frac{1}{\epsilon \sqrt{d_{\min} \beta_2}} \right) \right\rceil$,   (56)

where
• $\Phi$ is the Cheeger constant of a random walk on $H$ [22, 30],
• $d_{\min}$ is the minimum degree of a vertex in $H$, i.e. $d_{\min} = \min_v d(v)$,
• $\beta_1 = \min_{e \in E,\, v \in e} \left( \frac{\gamma_e(v)}{\delta(e)} \right)$, and
• $\beta_2 = \min_{e \in E,\, v \in e} \left( \gamma_e(v) \right)$.

Proof of Theorem 8. We have

$p_{v,v} = \sum_{e \in E(v)} \frac{\omega(e)}{d(v)} \frac{\gamma_e(v)}{\delta(e)} \ge \beta_1 \sum_{e \in E(v)} \frac{\omega(e)}{d(v)} = \beta_1$   (57)

for all vertices $v$. Similarly,

$\pi_v = \sum_{e \in E(v)} \omega(e)\, \gamma_e(v) \ge \beta_2\, d(v)$.   (58)

Applying Lemma 22 to a random walk on $H$, with $\delta = \beta_1$ and $\pi_{\min} \ge \beta_2\, d_{\min}$, yields the desired bound:

$t_{\mathrm{mix}}(\epsilon) \le \left\lceil \frac{2}{\delta \Phi^2} \log \left( \frac{1}{\epsilon \sqrt{\pi_{\min}}} \right) \right\rceil \le \left\lceil \frac{2}{\beta_1 \Phi^2} \log \left( \frac{1}{\epsilon \sqrt{d_{\min} \beta_2}} \right) \right\rceil$. □

G Proof of Theorem 11
Theorem 11.
Let $H = (V, E, \omega, \gamma)$ be a hypergraph with edge-dependent vertex weights, with vertex weights normalized so that $\rho_e = 1$ for all hyperedges $e$. Let $G_H$ be the clique graph of $H$, with edge weights

$w_{u,v} = \sum_{e \in E(u,v)} \omega(e)\, \frac{\gamma_e(u)\, \gamma_e(v)}{\delta(e)}$.   (59)

Let $L_H, L_G$ be the Laplacians of $H$ and $G_H$, respectively, and let $\lambda_H, \lambda_G$ be the second-smallest eigenvalues of $L_H, L_G$, respectively. Then

$c(H)^{-1}\, \lambda_H \le \lambda_G \le c(H)\, \lambda_H$,   (60)

where

$c(H) = \max_{v \in V} \left( \frac{\max_{e \in E} \gamma_e(v)}{\min_{e \in E} \gamma_e(v)} \right)$.

Proof of Theorem 11. As shorthand, we write $G = G_H$. Let $p^H_{u,v}$ and $\pi^H_v$ be the transition probabilities and stationary distribution, respectively, of a random walk on $H$. Define $p^G_{u,v}$ and $\pi^G_v$ similarly for $G$. Furthermore, let $d_H(v)$ and $d_G(v)$ be the degrees of $v$ in $H$ and $G$, respectively. We will use Theorem 8 of Chung [8] to prove our theorem, which requires us to have lower and upper bounds on $\frac{\pi^G_v}{\pi^H_v}$ and $\frac{\pi^G_u\, p^G_{u,v}}{\pi^H_u\, p^H_{u,v}}$.

First, for an arbitrary vertex $v$, we have

$\pi^G_v \propto \sum_{u \in V} w_{u,v} = \sum_{u \in V} \sum_{e \in E(u,v)} \omega(e)\, \frac{\gamma_e(u)\, \gamma_e(v)}{\delta(e)} = \sum_{e \in E(v)} \sum_{u \in e} \omega(e)\, \frac{\gamma_e(u)\, \gamma_e(v)}{\delta(e)} = \sum_{e \in E(v)} \omega(e)\, \gamma_e(v) \left( \frac{\sum_{u \in e} \gamma_e(u)}{\delta(e)} \right) = \sum_{e \in E(v)} \omega(e)\, \gamma_e(v) = \pi^H_v$,   (61)

so random walks on $G_H$ and $H$ have the same stationary distribution. Next, for any two vertices $u, v$, we have

$\frac{\pi^G_u\, p^G_{u,v}}{\pi^H_u\, p^H_{u,v}} = \frac{p^G_{u,v}}{p^H_{u,v}} = \frac{\dfrac{w_{u,v}}{d_G(u)}}{\displaystyle\sum_{e \in E(u,v)} \frac{\omega(e)}{d_H(u)} \frac{\gamma_e(v)}{\delta(e)}} = \frac{\displaystyle\sum_{e \in E(u,v)} \omega(e)\, \frac{\gamma_e(u)\, \gamma_e(v)}{\delta(e)}}{\displaystyle\sum_{e \in E(u,v)} \omega(e)\, \frac{\gamma_e(v)}{\delta(e)}} \cdot \frac{d_H(u)}{d_G(u)}$.
(62)

The RHS is upper-bounded by the maximum ratio of the corresponding terms in the two sums, which is

$\max_{u,v}\ \frac{d_H(u)\, \gamma_e(u)}{d_G(u)} = \max_{u,v}\ \frac{\left( \sum_{f \in E(u)} \omega(f) \right) \gamma_e(u)}{\sum_{f \in E(u)} \omega(f)\, \gamma_f(u)} \le \max_{u,v} \left( \frac{\max_e \gamma_e(u)}{\min_f \gamma_f(u)} \right) = \max_u \left( \frac{\max_e \gamma_e(u)}{\min_e \gamma_e(u)} \right) = c(H)$.   (63)

Similarly, it is lower-bounded by $\min_u \frac{\min_e \gamma_e(u)}{\max_e \gamma_e(u)} = c(H)^{-1}$. Applying Theorem 8 of Chung [8] gives the desired bound. □

H Rank Aggregation Experiments with Synthetic Data
Data:
We use a variant of the TrueSkill model to generate our data. We assume each player has an intrinsic "skill" level (for simplicity, assume skill does not change over time), and a player's performance in a match is proportional to their skill plus some added Gaussian noise. Such a model can represent many different kinds of games, including shooting games (e.g. Halo, where scores represent kill/death ratios in a timed free-for-all match) and racing games (e.g. Mario Kart, where scores are inversely proportional to the time a player takes to finish a course).

The players are $\{1, \ldots, n\}$. Player $i$ has intrinsic skill $i$, so the true ranking of players, $\tau^*$, is

player 1 < player 2 < · · · < player $n$.
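As an illustration, the score model in this paragraph can be sketched as follows. This is a minimal sketch, not the authors' generator: the helper name, seed, and parameter values ($n = 5$, $\sigma = 0.5$) are our own.

```python
import random

def draw_scores(n, sigma, rng):
    # TrueSkill-style model from the text: player i has intrinsic skill i,
    # and an observed score equal to skill plus Gaussian noise (std. dev. sigma).
    return {i: i + rng.gauss(0.0, sigma) for i in range(1, n + 1)}

rng = random.Random(0)  # illustrative seed
scores = draw_scores(n=5, sigma=0.5, rng=rng)

# Ranking players by observed score recovers a noisy version of the
# true ranking: player 1 < player 2 < ... < player n.
observed_ranking = sorted(scores, key=scores.get)
```

With small $\sigma$ the observed ranking usually matches the true one; larger $\sigma$ produces noisier partial rankings, as in the experiments below.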
We create $k$ partial rankings, $\tau_1, \ldots, \tau_k$, where each partial ranking $\tau_j$ corresponds to a noisy subsampling of $\tau^*$. More specifically, to create each partial ranking, we do the following.
1. Choose a subset of players $A \subset \{1, \ldots, n\}$, where player $i$ is included in $A$ with probability $p$.
2. Choose a scale factor $c$ uniformly at random from $[1/2, 2]$.
3. For each player $i \in A$, independently draw a score for player $i$ from a $N(0.1 \cdot i, \sigma^2)$ distribution, and scale that score by $c$.
4. Set $\tau_j$ to be a ranking of the players in $A$ according to their scores.

The tuneable parameters are: $n$, the number of players to be ranked; $\sigma$, the amount of noise in our partial rankings; $k$, the number of partial rankings; and $p$, which controls the size of each partial ranking. We set the mean score for player $i$ to be $0.1 \cdot i$, so that the scale of the simulated scores is similar to the scores from the Halo dataset.

Methods:
As with the real data, we create a Markov chain-based rank aggregation algorithm where the Markov chain is a random walk on a hypergraph $H = (V, E, \omega, \gamma)$. The vertices are $V = \{1, \ldots, n\}$, and the hyperedges $E$ correspond to the partial rankings $\tau_1, \ldots, \tau_k$. We set vertex weights

$\gamma_{e_j}(v) = \exp\left[ (\text{score of } v \text{ in partial ranking } \tau_j) \right]$,

and edge weights

$\omega(e_j) = (\text{standard deviation of scores in } \tau_j) + 1$.

We compare our algorithm against MC3 and a random walk on the clique graph $G_H$, both of which are described in the main text.

Results:
We fix the universe size $n$, and set $k$ to be the smallest number of hyperedges such that all $n$ vertices are included in at least one partial ranking. For several values of $\sigma$ and $p$, we measure the Kendall $\tau$ correlation coefficient [39] between the estimated ranking and the true ranking $\tau^*$. Our weighted hypergraph algorithm outperforms both MC3 and the clique graph algorithm in all cases (figure below), with the most significant gains when $p$ is small, i.e. when there is less information in each partial ranking. Moreover, the performance of the clique graph algorithm is much worse than both MC3 and the weighted hypergraph, which suggests that the clique graph is not a good approximation of $H$.

[Figure: weighted Kendall tau distance of each method; hypergraph vs. Dwork (MC3) performance in rank aggregation.]
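The Methods pipeline above can be sketched end-to-end as follows. This is a sketch under stated assumptions, not the authors' implementation: the function name and the power-iteration solver are our own, and we assume the edge weight $\omega(e_j) = (\text{standard deviation of scores in } \tau_j) + 1$; only the random-walk transition rule and the $\exp$-score vertex weights come from the text.

```python
import math

def hypergraph_rank(partial_scores, n, iters=1000):
    # One hyperedge e_j per partial ranking tau_j. Vertex weights are
    # gamma_{e_j}(v) = exp(score of v in tau_j); edge weights are assumed
    # to be omega(e_j) = (standard deviation of scores in tau_j) + 1.
    edges = []
    for scores in partial_scores:               # scores: dict player -> score
        mean = sum(scores.values()) / len(scores)
        std = math.sqrt(sum((s - mean) ** 2 for s in scores.values()) / len(scores))
        gamma = {v: math.exp(s) for v, s in scores.items()}
        edges.append((std + 1.0, gamma, sum(gamma.values())))  # (omega, gamma, delta)

    # d(v) = sum of omega(e) over hyperedges containing v (every player is
    # assumed to appear in at least one partial ranking).
    d = {v: sum(om for om, g, _ in edges if v in g) for v in range(1, n + 1)}

    # Random walk: p(v, w) = sum_{e in E(v)} omega(e)/d(v) * gamma_e(w)/delta(e).
    # Power iteration approximates its stationary distribution pi.
    pi = {v: 1.0 / n for v in range(1, n + 1)}
    for _ in range(iters):
        new = {v: 0.0 for v in pi}
        for v, mass in pi.items():
            for om, gamma, delta in edges:
                if v in gamma:
                    for w, gw in gamma.items():
                        new[w] += mass * (om / d[v]) * (gw / delta)
        pi = new
    return sorted(pi, key=pi.get)               # players, ranked worst to best

# Toy example: three partial rankings in which player 3 always scores highest.
partials = [{1: 0.1, 2: 0.2, 3: 0.35}, {2: 0.18, 3: 0.3}, {1: 0.12, 3: 0.33}]
ranking = hypergraph_rank(partials, n=3)        # expect player 3 ranked best
```

Ranking by the stationary distribution of this walk is the weighted-hypergraph aggregation described above; replacing the walk with one on the clique graph $G_H$ gives the baseline it is compared against.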