A simple bipartite graph projection model for clustering in networks
Austin R. Benson∗, Paul Liu†, and Hao Yin‡

Abstract.
Graph datasets are frequently constructed by a projection of a bipartite graph, where two nodes are connected in the projection if they share a common neighbor in the bipartite graph; for example, a coauthorship graph is a projection of an author–publication bipartite graph. Analyzing the structure of the projected graph is common, but we do not have a good understanding of the consequences of the projection on such analyses. Here, we propose and analyze a random graph model to study what properties we can expect from the projection step. Our model is based on a Chung–Lu random graph for constructing the bipartite representation, which enables us to rigorously analyze the projected graph. We show that common network properties such as sparsity, heavy-tailed degree distributions, local clustering at nodes, the inverse relationship between node degree and clustering, and global transitivity can be explained and analyzed through this simple model. We also develop a fast sampling algorithm for our model, which we show is provably optimal for certain input distributions. Numerical simulations where model parameters come from real-world datasets show that much of the clustering behavior in some datasets can just be explained by the projection step.
1. Networks as bipartite projections.
Networks or graphs that consist of a set of nodes and their pairwise interactions are pervasive models throughout the sciences. Oftentimes, network datasets are constructed by a "projection" of a bipartite graph [39, 43, 53, 64]; specifically, given a bipartite graph with left and right nodes, the one-mode projection is a (unipartite) graph on the left nodes, where two left nodes are connected if they share a common right node neighbor in the bipartite graph. In many cases, these projections are explicit in the data construction process, such as connecting diseases associated with the same gene [28], people belonging to the same group or team [45, 51], and ingredients appearing in common recipes [1, 54]. In other cases, the projection is more implicit. For example, the connections in a social network often arise due to shared interests [14]. Regardless, even though a bipartite graph is more expressive than its projection, analyzing the projection still leads to valuable data insights [56, 62], enables the use of standard network analysis tools [9, 37, 63], and can even be used to make predictions about the bipartite graph itself [8].

For network analysis, it is paramount to know if structural properties in the data arise from some phenomena of the system under study or are simply consequences of a mathematical property of the graph construction process. Random graph models can serve as null models for making such distinctions [25]. Often, the random graph model maintains some property of the network data (at least approximately or in expectation), and then direct mathematical analysis of the random graph can be used to determine whether certain structural properties will arise as a consequence.
For example, Chung and Lu showed that short average path lengths can be a consequence of a uniform sample of a random graph with an expected power-law degree distribution [18].

Here, we analyze a simple random graph model that explains some properties of projected graphs. More specifically, the random graph model is a projection of a bipartite "Chung–Lu style" model. Each left and right node in the bipartite graph has a weight, and the probability of an edge is proportional to the product of these weights.

The simplicity of this model enables theoretical analysis of properties of the projected graph. One fundamental property is clustering: even in a sparse network, there is a tendency of edges to appear in small clusters or cliques [24, 46, 57]. There are various explanations for clustering, including local evolutionary processes [31, 29, 52], hierarchical organization [47], and community structure [50]. Here, we show how clustering can arise just from bipartite projection. We derive an explicit equation for the expected value of a probabilistic variant of the local clustering coefficient of a node (the fraction of pairs of neighbors of the node that are connected) as a function of its weight in the model. We show that local clustering decreases with the inverse of the weight, while expected degree grows linearly with the weight, which is consistent with prior empirical measurements [41, 50], mean-field analysis of models that explicitly incorporate clustering [52], and certain random intersection graph models [13].

∗Department of Computer Science, Cornell University, Ithaca, NY, USA ([email protected]). †Department of Computer Science, Stanford University, Stanford, CA, USA ([email protected]). ‡Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA ([email protected]).
Thus, the weights in the bipartite model are a potential confounding factor for this relationship between degree and clustering.

In addition, using weight distributions fit from real-world bipartite graph data, we show that high levels of clustering and clustering levels at a given degree are often just a consequence of bipartite projection. However, in several datasets, there is still a gap between the clustering levels in the data and in the model. Bipartite projection has been mentioned informally as a reason for clustering in several datasets [26, 42, 44], and a recent study has shown that sampling from configuration models of hypergraphs and projecting can also reproduce clustering [17]. Our analysis provides theoretical justifications and further explanations for these claims, and also shows that the global clustering (also called transitivity) tends towards a positive constant as the bipartite network grows large. We also analyze a recently introduced measure of clustering called the closure coefficient [60, 61] under our projection model and find that the expected local closure coefficient of every node is the same, which aligns with some prior empirical results [60].

In addition to clustering, we analyze several properties of the bipartite random graph and its projection. For instance, we show that if the weight distributions on the left and right nodes follow a power law, then the degree distribution for those nodes is also a power law in the bipartite graph; moreover, the degrees in the projected graph will also follow a power law. Thus, heavy-tailed degree distributions in the projected graph can simply be a consequence of a process that creates heavy-tailed degree distributions in the bipartite graph. Furthermore, we show that the projected graph is sparse in the sense that, under a mild restriction on the maximum weight, the probability of an edge between any two nodes goes to zero as the number of nodes in the projected graph grows to infinity.
Combined with our results on clustering, our model thus provides a large class of networks that are "locally dense but globally sparse" [58].
1.1. Preliminaries.

We consider networks as undirected graphs $G = (V, E)$ without self-loops and multi-edges. We use $d(u)$ to denote the degree of node u (the number of edges incident to node u) and $T(u)$ to denote the number of triangles (3-cliques) containing node u. A wedge is a pair of edges that share a common node, and the common node is the center of the wedge. A statistic of primary interest is the clustering coefficient:
Definition 1.1.
The local clustering coefficient of a node $u \in V$ is
$$\tilde{C}(u) = \frac{T(u)}{d(u)(d(u)-1)/2},$$
i.e., the chance that a randomly chosen wedge centered at u induces a triangle. At the network level, the global clustering coefficient $\tilde{C}_G$ is the probability that a randomly chosen wedge in the entire graph induces a triangle, i.e.,
$$\tilde{C}_G = \frac{\sum_{u \in V} T(u)}{\sum_{u \in V} d(u)(d(u)-1)/2}.$$

A closely related measure of clustering is the conditional probability of edge existence given the wedge structure [10, 13, 22]. Specifically, we have the following analogs of the local and global clustering coefficients:
(1.1) $C_G = \mathbb{P}[(v,w) \in E \mid (u,v), (u,w) \in E]$,
where all the nodes $u, v, w \in V$ are unspecified, while the local clustering coefficient is
(1.2) $C(u) = \mathbb{P}[(v,w) \in E \mid (u,v), (u,w) \in E]$,
where u is the specified node. In both cases, $(u,v)$ and $(u,w)$ comprise a random wedge from the graph. In this paper, we use these slightly different definitions of clustering based on conditional edge existence, as they are more amenable to analysis.

An alternative clustering metric is the recently proposed closure coefficient [60, 61].

Definition 1.2.
The local closure coefficient of a node $u \in V$ is
$$\tilde{H}(u) = \frac{2T(u)}{W_h(u)},$$
where $W_h(u)$ is the number of length-2 paths leaving vertex u. In other words, the closure coefficient is the chance that a randomly chosen 2-path emanating from u induces a triangle.

Analogously, the conditional probability variant of the closure coefficient is
(1.3) $H(u) = \mathbb{P}[(u,w) \in E \mid (u,v), (v,w) \in E]$,
where u is the specified node. The global closure coefficient is equal to the global clustering coefficient, as the number of 2-paths is exactly equal to the number of wedges. This is true for both the non-conditional and the conditional probability variants. In Appendix A, we show that the conditional probability definitions above correspond to a weighted average over the standard definitions of clustering and closure. Henceforth, when referring to the clustering or closure coefficients, we always refer to the conditional probability variant.

Next, a graph is bipartite if the nodes can be partitioned into two disjoint subsets $L \sqcup R$, which we call the left and right nodes, and any edge is between one node from L and one node from R. We denote a bipartite graph by $G_b = (V_b, E_b)$ with $V_b = L \sqcup R$, and call L and R the left and right side of the bipartite graph. The number of nodes on each side is denoted by $n_L = |L|$ and $n_R = |R|$, and $n_b = |V_b| = n_L + n_R$ is the total number of nodes. Analogously, for any node $u \in V_b$, we use $d_b(u)$ to denote its degree. The projection of a bipartite graph is the primary concept we analyze.

Definition 1.3.
A projection of a bipartite graph $G_b = (L \sqcup R, E_b)$ is the graph $G = (L, E)$, where the nodes are the left nodes of the bipartite graph and the edges connect any two nodes in L that connect to some common node $r \in R$ in the bipartite graph. More formally,
(1.4) $E = \{(u, v) \mid u, v \in L,\ u \ne v,\ \text{and}\ \exists\, z \in R \text{ for which } (u,z), (v,z) \in E_b\}.$
If there is more than one right node z that connects to left nodes u and v in the bipartite graph, the projection only creates a single edge between u and v. Given a dataset, one can project onto the left or right nodes. One can always permute the left and right nodes, and we assume projection onto the left nodes L for notational consistency.

Several statistical properties of the models we consider will use samples drawn from power law distributions, which are prevalent in network data models [21].

Definition 1.4.
The probability density function of the power law distribution, parametrized by $(\alpha, w_{\min}, w_{\max})$ with $\alpha > 1$ and $0 < w_{\min} < w_{\max} \le \infty$, is
$$f(w) = \begin{cases} C w^{-\alpha} & \text{if } w \in [w_{\min}, w_{\max}] \\ 0 & \text{otherwise,} \end{cases}$$
where $w > 0$ is any real number and $C = (\alpha - 1)/\big(w_{\min}^{1-\alpha} - w_{\max}^{1-\alpha}\big)$ is a normalizing constant. For a discrete power-law (or Zipfian) distribution, we restrict w to integer values inside $[w_{\min}, w_{\max}]$ and adjust the normalization constant accordingly.

The parameter α is the decay exponent of the distribution, while $w_{\min}$ and $w_{\max}$ specify its range. For simplicity, we assume that $w_{\min} = 1$ and $w_{\max} = \Omega(1)$ throughout this paper. When the maximum range is not specified, i.e., $w_{\max} = \infty$, a standard result on the maximum statistics of power-law samples is the following:

Lemma 1.5 (Folklore).
For a discrete or continuous power-law distribution $\mathcal{D}$ with parameters $(\alpha, w_{\min} = 1, w_{\max} = \infty)$ and i.i.d. samples $w_1, w_2, \ldots, w_n \sim \mathcal{D}$, $\mathbb{E}[\max_i w_i] = \Theta\big(n^{1/(\alpha-1)}\big)$.
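Lemma 1.5 is easy to probe numerically. The following short sketch (our illustration, not code from the paper; it assumes NumPy, and the function name `sample_power_law` is ours) draws samples from the continuous power law of Definition 1.4 via inverse-CDF sampling and compares the empirical maximum to the $n^{1/(\alpha-1)}$ scaling:

```python
import numpy as np

def sample_power_law(n, alpha, w_min=1.0, w_max=np.inf, rng=None):
    """Inverse-CDF sampling from the continuous power law of Definition 1.4."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(n)
    a = w_min ** (1.0 - alpha)
    b = 0.0 if np.isinf(w_max) else w_max ** (1.0 - alpha)
    # CDF: F(w) = (a - w^{1-alpha}) / (a - b); solve F(w) = u for w.
    return (a - u * (a - b)) ** (1.0 / (1.0 - alpha))

# Empirical check of Lemma 1.5: the maximum of n samples grows like n^{1/(alpha-1)}.
rng = np.random.default_rng(0)
alpha = 2.5
for n in [10**3, 10**4, 10**5]:
    maxima = [sample_power_law(n, alpha, rng=rng).max() for _ in range(100)]
    # The ratio below should hover around a constant across the three values of n.
    print(n, np.mean(maxima) / n ** (1.0 / (alpha - 1.0)))
```

Because the maximum of heavy-tailed samples has high variance, the printed ratios fluctuate, but they stay of comparable magnitude across the three sample sizes rather than growing or shrinking systematically.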
2. Models for Bipartite Projection.
In this section we formalize our model and give some background on relevant models for projection and graph generation. Our model is an extension of the seminal random graph model from Chung and Lu [18]. The classical Chung–Lu model takes as input a weight sequence S, which specifies a nonnegative weight $w_u$ for each node, and then produces an undirected edge $(u, v)$ with probability $w_u w_v / \sum_z w_z$. To make sure that the probabilities are well defined, the model assumes that $\max_u w_u^2 \le \sum_v w_v$. Along similar lines, Aksoy et al. introduced a Chung–Lu-style bipartite random graph model based on realizable degree sequences [3]. In general, the model we use is quite similar. However, our focus in this paper is to analyze the effects of projection on such models.

2.1. The model.

Our model takes as input the number of left nodes $n_L$, the number of right nodes $n_R$, and two sequences of weights $S_L$ and $S_R$ for the left and right nodes. We denote the weight of any node u by $w_u$. The model then samples a random bipartite graph $G_b = (L \sqcup R, E_b)$, where
(2.1) $\mathbb{P}[(u,v) \in E_b \mid w_u, w_v] = \min\left(\frac{w_u w_v}{\sum_{z \in R} w_z},\ 1\right), \quad u \in L,\ v \in R.$
After generating the bipartite graph, we project it following Definition 1.3; the projection is itself a random graph. This model is similar to the inhomogeneous random intersection graph [12] (see subsection 2.3 for more details).
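To make the model concrete, here is a minimal and deliberately naive reference implementation (our sketch, assuming NumPy; it is not the paper's code). It performs the full $n_L \times n_R$ coin flips of Equation (2.1) and then applies the projection of Definition 1.3, so it is only practical for small graphs:

```python
import numpy as np
from itertools import combinations

def sample_bipartite_chung_lu(wL, wR, rng=None):
    """Naive O(n_L * n_R) sampler: edge (u, v) appears independently with
    probability min(w_u * w_v / sum(wR), 1), as in Equation (2.1)."""
    rng = np.random.default_rng() if rng is None else rng
    P = np.minimum(np.outer(wL, wR) / wR.sum(), 1.0)
    return rng.random(P.shape) < P       # boolean bipartite adjacency matrix

def project_left(A):
    """Projection onto the left nodes (Definition 1.3): u1 ~ u2 iff they share
    at least one right neighbor; repeated co-neighbors collapse to one edge."""
    edges = set()
    for u1, u2 in combinations(range(A.shape[0]), 2):
        if np.any(A[u1] & A[u2]):
            edges.add((u1, u2))
    return edges

rng = np.random.default_rng(0)
wL = rng.pareto(2.0, size=200) + 1.0     # illustrative heavy-tailed weights
wR = rng.pareto(2.0, size=300) + 1.0
A = sample_bipartite_chung_lu(wL, wR, rng)
E = project_left(A)
print(len(E), "projected edges")
```

The quadratic cost of this sampler is exactly the practical concern that motivates the fast sampling algorithm of Section 4.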
Our analysis will depend on properties of $S_L$ and $S_R$ and the moments of these sequences. We denote the k-th-order moments of $S_L$ and $S_R$ by $M_{L,k}$ and $M_{R,k}$ for integers $k \ge 1$:
(2.2) $M_{L,k} = \frac{1}{n_L} \sum_{u \in L} w_u^k, \qquad M_{R,k} = \frac{1}{n_R} \sum_{v \in R} w_v^k.$
With this notation, we can re-write the edge probabilities as
(2.3) $\mathbb{P}[(u,v) \in E_b \mid w_u, w_v] = \min\left(\frac{w_u w_v}{n_R M_{R,1}},\ 1\right).$

Remark (rescaling of $S_R$). The edge probabilities are invariant to a rescaling of the weights in $S_R$. Thus we can assume without loss of generality that $n_R \mathbb{E}[M_{R,1}] = n_L \mathbb{E}[M_{L,1}]$. This corresponds to the natural condition that the expected degree sums of the left and right side are equal.

A practical concern is how efficiently we can sample from this model, as naive sampling of the bipartite graph requires $n_L n_R$ coin flips. There are fast sampling heuristics for the bipartite graph, based on sampling each node in an edge individually for some pre-specified number of edges [3]. We develop a fast sampling algorithm in Section 4 that has some theoretical optimality guarantees for sequences $S_L$ and $S_R$ with certain properties.

2.2. Comparison with configuration models.

Much of our motivation for random graph models is that they provide a baseline for what graph properties we might expect in network data just from a simple underlying random process (in our case, we are particularly interested in what graph properties we can expect from projection). In turn, this helps researchers determine which properties of the data are interesting or inherent to the system modeled by the graph. While Chung–Lu models aim to preserve input degree sequences in expectation, configuration models preserve degrees exactly, sampling from the space of graphs with a specified degree sequence [25]. Configuration models for bipartite graphs have only been studied in earnest recently [17], where the goal is to sample bipartite graphs with a specified degree sequence for the left and right nodes.
A bipartite configuration model inherits many benefits of a standard configuration model; for instance, the degree sequence is preserved exactly, creating an excellent null model for a given dataset. At the same time, configuration models carry some restrictions. First, the random events on the existence of two edges are dependent (though weakly). To see this, in a stub-labeled bipartite graph, if we condition on an edge existing between $u \in L$ and $v \in R$, then there is one fewer stub for each node, making them less likely to connect to other nodes. This makes theoretical analysis difficult. Second, to generate a random graph, a configuration model needs a degree sequence that is realizable. While the Gale–Ryser theorem provides a simple way to check if a candidate bipartite degree sequence is realizable [48], configuration models typically analyze a given input graph rather than a class of input graphs with some property. Third, efficient uniform sampling algorithms rely on Markov chain Monte Carlo, for which it is extremely difficult to obtain reasonable mixing time bounds.

The Chung–Lu approach (for either bipartite or unipartite graphs) sacrifices control over the exact degree sequence for easier theoretical analysis while maintaining the expected degree sequence. Unlike the configuration model, the existences of two distinct edges are independent events, there is no need to specify a realizable degree sequence, and samples can be immediately generated. In unipartite graphs, this has led to remarkable results on random graphs with expected power-law degree sequences, such as small average node distance and diameter [18], the existence of a giant connected component [19], and spectral properties [20].
2.3. Other related models.

There are random graph models for bipartite graphs that are motivated by how the projection step can lose information about community structure in the data [30, 33]. While these identify possible issues with the projection, we are motivated by the fact that several datasets are constructed via projection, either implicitly or explicitly. There are also many models based on communities, where edge probabilities depend on community membership [2, 32, 50, 59]. These models can be interpreted as probabilistic projections of node–community bipartite graphs. Such models are typically fit from data to reveal cluster structure; that type of analysis is not the focus of this paper.

There are a few random graph models where a random bipartite graph is deterministically projected [7, 17, 35, 58]. Some of these have specifically considered clustering, which is of primary interest for us. A recent example is the configuration model for hypergraphs [17], which can be interpreted as a bipartite random graph model: the nodes in the hypergraph are the left nodes in the bipartite graph, and the right nodes in the bipartite graph correspond to edges in the hypergraph. Chodrow [17] found that the clustering of projections of bipartite representations of several real-world hypergraph datasets was similar to or even less than the clustering of projections of samples from the configuration model. Similar empirical results have been found on related datasets, under a model that samples the degrees of the left nodes in the bipartite graph according to a distribution learned from the data and connects the edges to the right nodes uniformly at random [26]. Our theoretical analysis provides additional grounding for these empirical results, and our model provides a Chung–Lu-style alternative to the configuration model approach.

In terms of theoretical results, the models most related to ours are random intersection graphs [10, 27] and random clique covers [58].
In these models, a graph is constructed by sampling n sets from a universe of size m according to a distribution $\mathcal{D}$. A node is associated to each of the n sets, and two nodes in the graph are adjacent if their sets overlap. This is equivalent to representing the sets as an n-by-m bipartite graph and then projecting the graph onto the left nodes. Such models can also produce several key properties of projected graphs in practice, such as power-law degree distributions and negative correlation of clustering and projected degree. In contrast to these approaches, our model can specify degree distributions on both sides of the bipartite graph, as opposed to just one side. Inhomogeneous random intersection graphs also support arbitrary degree distributions on both sides [12, 13], and justify the negative correlation of local clustering and projected degree. In comparison, our analysis is conducted conditional on the degree sequence, which is potentially generated from a distribution with infinite moments, and thus requires a weaker and more realistic assumption on the degree distribution than results from Bloznelis and Petuchovas [11, 13]; however, their results work directly with projected degrees, which is advantageous.
3. Theoretical Properties of the Projection Model.
In this section we provide results for graph statistics on the projected graph, such as the degree distribution, clustering coefficients, and closure coefficients. For intuition, one may think of the input weight distributions to our model as the degree distribution of a class of input graphs. As we show in Section 5, these input weights often follow a power law distribution in real-world datasets. Due to the simplicity of our model, it is possible to derive analytical expressions when the input weight distribution follows a power law (Definition 1.4).

At a high level, for a broad range of weight distributions (including power-law distributions), the projected graph has the following properties.
1. The projected graph is sparse (edge probabilities go to zero).
2. Expected local clustering at a node decays with the node's weight, and the node's weight is directly proportional to its degree in expectation.
3. Expected local closure at a node is the same for all nodes.
4. Global clustering and closure (transitivity) is a positive constant. In other words, clustering does not go to zero as the graph grows large.

Besides theoretical analysis, we also verify some key results with simulations, which rely on a fast sampling algorithm that we develop in Section 4.
3.1. Assumptions.

Our analysis is conditional on the general input weight sequences on both sides of the bipartite graph. We first assume that the normalized product of weights is at most one, so that the min in the edge existence probability (Equation (2.1)) is never active.
Assumption 1 (Well-defined probabilities).
The weights in the sequences $S_L$ and $S_R$ satisfy $\frac{w_u w_v}{n_R M_{R,1}} \le 1$ for any nodes $u \in L$, $v \in R$.

Moreover, our analysis is asymptotic, meaning that the results hold with high accuracy on large networks, i.e., $n_L, n_R \to \infty$. For any two quantities f and g, we use the following big-O notations in the limit of $n_L, n_R \to \infty$: $f = o(g)$ if $f/g \to 0$; $f = O(g)$ if $f/g$ is bounded; and $f = \Omega(g)$ if $f/g$ is bounded away from 0. We make the following assumption on the range and moments of the weight sequences.

Assumption 2 (Bounded weight sequences).
There exists a constant δ > 0 such that
• (bounded range) $\max[S_L, S_R] = O\big(n_R^{1/2-\delta}\big)$ and $\min[S_L] = \Omega(1)$, and
• (bounded $S_R$ moments) $M_{R,2} = O(M_{R,1})$ and $M_{R,4} = O\big(n_R^{1-2\delta}\big)$,
as $n_L, n_R \to \infty$.

Assumption 2 actually specifies a family of assumptions parameterized by δ, with larger δ imposing stronger assumptions. Unless otherwise stated, we only require that δ > 0. In the theoretical analysis of the clustering coefficient, we sometimes require δ > 1/10. Note that we do not assume anything about the relative rates at which $n_L, n_R \to \infty$, or any direct relationships between $n_L$ and $n_R$. This makes our assumptions weaker than a wide range of assumptions typical in the literature, such as having $n_L = \beta n_R^{\sigma}$ for certain constants β, σ > 0.

Proposition 3.1.
If the sequences $S_L$ and $S_R$ are independently generated from power-law distributions with $w_{\max} = n_R^{1/2-\delta}$, and the right-side distribution has decay exponent $\alpha_R > 3$, then Assumption 2 is satisfied.
Proof.
The bounded range requirement is automatically satisfied because the weights are capped at $w_{\max}$, so we focus on the bounded moment requirement.

Let W be the random variable denoting a sample weight in $S_R$. Since $M_{R,1} \ge 1$, due to the law of large numbers, it suffices to show that $\mathbb{E}[W^2] < \infty$ and $\mathbb{E}[W^4] = O\big(n_R^{1-2\delta}\big)$ as $n_R \to \infty$. The first result can be easily verified, and when $\alpha_R \ge 5$, a straightforward computation shows that $\mathbb{E}[W^4] = O(\log n_R)$. When $\alpha_R \in (3, 5)$,
$$\mathbb{E}[W^4] = \int_1^{n_R^{1/2-\delta}} C_{\alpha,\delta} \cdot w^{-\alpha+4} \, \mathrm{d}w = \frac{C_{\alpha,\delta}}{5-\alpha} \Big( n_R^{(5-\alpha)(1/2-\delta)} - 1 \Big) = O\big(n_R^{1-2\delta}\big),$$
where $C_{\alpha,\delta} = (\alpha - 1)\big/\big(1 - n_R^{(1-\alpha)(1/2-\delta)}\big) = O(1)$ is the normalizing constant.

Therefore, Assumption 2 is satisfied when the weight sequences are generated from power-law distributions with only a mild requirement on the decay exponent. In contrast, some results require constant weights on the right side [10, 22] or a larger decay exponent $\alpha_R$ [13]. Since $M_{R,1} \ge 1$, Assumption 1 is a direct consequence of Assumption 2 for large graphs, since $\max[S_L, S_R] = o(\sqrt{n_R})$. Henceforth, for our theoretical analysis, we assume that both Assumption 1 and Assumption 2 are satisfied.

As a final note, a direct consequence of Assumption 2 is that $\mathbb{P}[(u,v) \in E_b \mid w_u, w_v] \to 0$ whenever $w_u w_v = o(n_R)$, meaning that the bipartite network is sparse.

3.2. Degree distributions.

In this section, we study the degree distribution in the bipartite graph with respect to a given input weight distribution.
Theorem 3.2.
For any node $u \in L$, conditional on u's weight $w_u$, the bipartite degree $d_b(u)$ of u converges in distribution to a Poisson random variable with mean $w_u$ as $n_R \to \infty$. Analogously, for any $v \in R$, conditional on $w_v$, $d_b(v)$ converges in distribution to a Poisson random variable with mean $w_v$ as $n_L \to \infty$.

Proof. By symmetry, we just need to prove the result for a node $u \in L$. For any $v \in R$, the indicator $\mathbb{1}[(u,v) \in E_b]$ is a Bernoulli random variable with success probability $\frac{w_u w_v}{n_R M_{R,1}}$. By a Taylor expansion, its characteristic function can be written as
$$\varphi_{uv}(t) = 1 + (e^{it} - 1)\frac{w_u w_v}{n_R M_{R,1}} = e^{\frac{w_u w_v}{n_R M_{R,1}}(e^{it}-1)\cdot(1+o(1))},$$
where the $o(1)$ term comes from the bounded range condition in Assumption 2. The bipartite degree of node u is the sum of these indicators over all nodes $v \in R$, which are independent random variables. Thus, the characteristic function of $d_b(u)$ can be written as
$$\varphi_{d_b(u)}(t) = \prod_{v \in R} \varphi_{uv}(t) = e^{w_u \frac{\sum_{v \in R} w_v}{n_R M_{R,1}}(e^{it}-1)\cdot(1+o(1))} \to e^{w_u(e^{it}-1)}.$$
The limiting characteristic function is the characteristic function of a Poisson random variable with mean $w_u$. Thus, $d_b(u)$ converges in distribution to a Poisson random variable with mean $w_u$ by Lévy's continuity theorem.
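The Poisson limit of Theorem 3.2 is easy to observe in simulation. The following sketch (our illustration, assuming NumPy; the right-side weight distribution is an arbitrary heavy-tailed choice, not one from the paper) repeatedly resamples the bipartite neighborhood of a single left node with weight $w_u = 5$ and checks that the mean and variance of its degree are both close to $w_u$, as a Poisson($w_u$) limit predicts:

```python
import numpy as np

rng = np.random.default_rng(0)
n_R = 10000
w_u = 5.0                               # weight of the tracked left node
wR = rng.pareto(3.0, size=n_R) + 1.0    # illustrative right-side weights

# Edge probabilities from Equation (2.1): w_u * w_v / (sum of right weights).
p = np.minimum(w_u * wR / wR.sum(), 1.0)

# Resample u's bipartite degree many times; Theorem 3.2 predicts Poisson(w_u),
# whose mean and variance are both w_u.
degrees = np.array([(rng.random(n_R) < p).sum() for _ in range(1000)])
print("mean:", degrees.mean(), " variance:", degrees.var())
```

Note that the degree is a sum of independent Bernoulli variables whose success probabilities sum to exactly $w_u$, so the mean matches exactly and only the Poisson shape is asymptotic.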
One corollary of Theorem 3.2 is that, in the limit, the expected degree of any node u is its weight $w_u$, which provides an interpretation of the node weights. Next, we show that if the weights are independently generated from a power-law distribution, then the degrees in the bipartite graph are power-law distributed as well.

Theorem 3.3.
Suppose that the node weights on the left are independently sampled from a continuous power-law distribution with exponent $\alpha_L$. Then, for any node $u \in L$, as $n_R \to \infty$, we have that $\mathbb{P}[d_b(u) = k] \propto k^{-\alpha_L}$ for large k. Similarly, suppose that the node weights on the right are independently sampled from a continuous power-law distribution with exponent $\alpha_R$. Then, for any node $v \in R$, as $n_L \to \infty$, we have that $\mathbb{P}[d_b(v) = k] \propto k^{-\alpha_R}$ for large k.

Proof. Again, by symmetry, we only need to show the result for a node on the left. For any node $u \in L$, according to Theorem 3.2, its bipartite degree distribution converges to a Poisson distribution with mean $w_u$. For any integer $k > \alpha_L$,
$$\begin{aligned}
\mathbb{P}[d_b(u) = k] &= \int_1^{w_{\max}} \mathbb{P}[d_b(u) = k \mid w_u = w] \cdot f_L(w)\, \mathrm{d}w = C \int_1^{w_{\max}} \frac{e^{-w} w^k}{k!} \cdot w^{-\alpha_L} \, \mathrm{d}w \\
&= \frac{C}{k!}\left( \int_0^{\infty} e^{-w} w^{k-\alpha_L}\, \mathrm{d}w - \int_0^{1} e^{-w} w^{k-\alpha_L}\, \mathrm{d}w - \int_{w_{\max}}^{\infty} e^{-w} w^{k-\alpha_L}\, \mathrm{d}w \right) \\
&= \frac{C}{\Gamma(k+1)}\big(\Gamma(k - \alpha_L + 1) - O(1)\big) \to C k^{-\alpha_L}(1 + o(1)).
\end{aligned}$$
Here, C is a normalizing constant. The second-to-last step uses the fact that $w_{\max} = \Omega(1)$, and the last step follows because $\Gamma(k - \alpha_L + 1)/\Gamma(k+1) \to k^{-\alpha_L}$ as $k \to \infty$.

To study the edge density and degree distribution in the projected graph, we use the following quantity:
(3.1) $p_{u_1 u_2} := \frac{M_{R,2}}{M_{R,1}^2} \cdot \frac{w_{u_1} w_{u_2}}{n_R}.$
The following theorem shows that $p_{u_1 u_2}$ is the asymptotic edge existence probability between the two nodes $u_1$ and $u_2$ in the projected graph. Note that under Assumption 2, we have $w_{u_1}, w_{u_2} = O\big(n_R^{1/2-\delta}\big)$ and thus $p_{u_1 u_2} = O\big(n_R^{-2\delta}\big) = o(1)$, so the projected graph is sparse as the number of nodes goes to infinity.

Theorem 3.4.
For any $u_1, u_2 \in L$, as $n_R \to \infty$, we have
$$\mathbb{P}[(u_1,u_2) \in E \mid S_L, S_R] = p_{u_1u_2} - \frac{p_{u_1u_2}^2}{2}\left(1 - \frac{M_{R,4}}{n_R M_{R,2}^2}\right) \cdot \big(1 + O(n_R^{-\delta})\big).$$

Proof.
We consider the complementary event that $u_1$ and $u_2$ are not connected in the projected graph. This happens exactly when every node $v \in R$ is connected to at most one of $u_1$ and $u_2$ in the bipartite graph. For each single node $v \in R$, this happens with probability $1 - \frac{w_{u_1} w_{u_2} w_v^2}{n_R^2 M_{R,1}^2}$. Therefore,
$$\begin{aligned}
\log \mathbb{P}[(u_1,u_2) \notin E \mid S_L, S_R] &= \sum_{v \in R} \log\left(1 - \frac{w_{u_1} w_{u_2} w_v^2}{n_R^2 M_{R,1}^2}\right) \\
&= \sum_{v \in R} \left[ -\frac{w_{u_1} w_{u_2} w_v^2}{n_R^2 M_{R,1}^2} - \frac{w_{u_1}^2 w_{u_2}^2 w_v^4}{2 n_R^4 M_{R,1}^4} \cdot \big(1 + O(n_R^{-\delta})\big) \right] \\
&= -p_{u_1u_2} - \frac{M_{R,4}}{2 n_R M_{R,2}^2}\, p_{u_1u_2}^2 \cdot \big(1 + O(n_R^{-\delta})\big).
\end{aligned}$$
Consequently,
$$\mathbb{P}[(u_1,u_2) \in E \mid S_L, S_R] = p_{u_1u_2} - \frac{p_{u_1u_2}^2}{2}\left(1 - \frac{M_{R,4}}{n_R M_{R,2}^2}\right) \cdot \big(1 + O(n_R^{-\delta})\big).$$

We now examine the expected degree distribution of the projected graph. One concern is the possibility of multi-edges in our definition of a projection, which occurs when two nodes $u_1, u_2 \in L$ have more than one common neighbor in the bipartite graph. The following lemma shows that the probability of having multi-edges conditional on edge existence is negligible, meaning that we can ignore the case of multi-edges with high probability.

Lemma 3.5.
Let $u_1, u_2 \in L$, and let $N_{u_1u_2}$ be the number of common neighbors of $u_1$ and $u_2$ in the bipartite graph. Then $\mathbb{P}[N_{u_1u_2} \ge 2 \mid S_L, S_R, (u_1,u_2) \in E] = O(p_{u_1u_2})$ as $n_R \to \infty$.

Proof. Note that it suffices to show that $\mathbb{P}[N_{u_1u_2} \ge 2 \mid S_L, S_R] = O(p_{u_1u_2}^2)$. By the tail formula for expected values,
$$\mathbb{E}[N_{u_1u_2} \mid S_L, S_R] = \sum_{k=1}^{\infty} k \cdot \mathbb{P}[N_{u_1u_2} = k \mid S_L, S_R] \ge 2\, \mathbb{P}[N_{u_1u_2} \ge 2 \mid S_L, S_R] + \mathbb{P}[N_{u_1u_2} = 1 \mid S_L, S_R] = \mathbb{P}[N_{u_1u_2} \ge 2 \mid S_L, S_R] + \mathbb{P}[N_{u_1u_2} \ge 1 \mid S_L, S_R].$$
Note that we also have $\mathbb{E}[N_{u_1u_2} \mid S_L, S_R] = \sum_{v \in R} \mathbb{P}[(u_1,v), (u_2,v) \in E_b \mid S_L, S_R] = p_{u_1u_2}$, and consequently
$$\mathbb{P}[N_{u_1u_2} \ge 2 \mid S_L, S_R] \le p_{u_1u_2} - \mathbb{P}[N_{u_1u_2} \ge 1 \mid S_L, S_R] \le \tfrac{1}{2} p_{u_1u_2}^2 + o(p_{u_1u_2}^2).$$
The last inequality uses the fact that the event $N_{u_1u_2} \ge 1$ is equivalent to the edge $(u_1,u_2)$ appearing in the projected graph, which happens with probability $p_{u_1u_2} - \tfrac{1}{2} p_{u_1u_2}^2 + o(p_{u_1u_2}^2)$ by Theorem 3.4.

Now we are ready to analyze the degree of a node in the projected graph. The following theorem says that the expected degree of a node in the projected graph is directly proportional to the weight of the node. Thus, at least in expectation, we can think of the weight as a proxy for degree.

Theorem 3.6.
For any $u \in L$, as $n_L, n_R \to \infty$, we have
$$\mathbb{E}[d(u) \mid S_L, S_R] = \frac{M_{R,2} M_{L,1}}{M_{R,1}^2} \cdot \frac{n_L}{n_R} \cdot w_u \cdot (1 + o(1)).$$
Proof.
By Theorem 3.4,
$$\mathbb{E}[d(u) \mid S_L, S_R] = \sum_{u' \in L,\, u' \ne u} \mathbb{P}[(u,u') \in E \mid S_L, S_R] = \sum_{u' \in L,\, u' \ne u} \frac{w_u w_{u'}}{n_R} \cdot \frac{M_{R,2}}{M_{R,1}^2} \cdot (1 + o(1)) = \frac{M_{R,2} M_{L,1}}{M_{R,1}^2} \cdot \frac{n_L}{n_R} \cdot w_u \cdot (1 + o(1)).$$

By Theorem 3.3, the bipartite degree distributions of the left and right nodes are power-law distributions with exponents $\alpha_L$ and $\alpha_R$. For such bipartite graphs, Nacher and Akutsu [38] showed that the degree sequence of the projected graph follows a power law distribution.

Corollary 3.7 (Section 2, [38]).
Suppose the node weights on the left and right follow power-law distributions with exponents $\alpha_L$ and $\alpha_R$. Then the degree distribution of the projected graph is a power-law distribution with decay exponent $\min(\alpha_L, \alpha_R - 1)$.

When $\alpha_R \in (3, 4)$, the decay exponent of the projected graph can thus lie in $(2, 3)$; we find $\alpha_R \in (3, 4)$ for several real-world bipartite networks that we analyze (Appendix B).
3.3. Clustering and closure coefficients.

In this section we compute the expected values of the clustering and closure coefficients. Theorem 3.8 rigorously analyzes the expected value of local clustering coefficients on networks generated from projections of general bipartite random graphs. Our results show how (for a broad class of random graphs) the expected local clustering coefficient varies with the node weight: it decays at a slower rate for small weights and then decays as the inverse of the weight for large weights. Combined with the result that the expected projected degree is proportional to the node weight (Theorem 3.6), this says that there is an inverse correlation of node degree with the local clustering coefficient, which we also verify with simulation. This has long been a noted empirical property of complex networks [41], and our analysis provides theoretical grounding, along with other recent results [11, 13].
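As an illustrative check of this inverse relationship (our sketch, not the paper's actual experiments; it assumes NumPy and uses arbitrary heavy-tailed weights), one can sample the model, project it, and compare each node's degree against its standard local clustering coefficient; with heavy-tailed weights the correlation is typically negative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_L, n_R = 400, 600
wL = rng.pareto(2.5, size=n_L) + 1.0   # illustrative heavy-tailed weights
wR = rng.pareto(2.5, size=n_R) + 1.0

# Sample the bipartite graph (Equation (2.1)) and project onto the left nodes.
B = rng.random((n_L, n_R)) < np.minimum(np.outer(wL, wR) / wR.sum(), 1.0)
Bi = B.astype(np.int64)
adj = (Bi @ Bi.T) > 0                  # u1 ~ u2 iff they share a right neighbor
np.fill_diagonal(adj, False)

deg = adj.sum(axis=1)

def local_clustering(u):
    """Standard local clustering coefficient (Definition 1.1)."""
    nbrs = np.flatnonzero(adj[u])
    if len(nbrs) < 2:
        return np.nan                  # undefined for degree < 2
    links = adj[np.ix_(nbrs, nbrs)].sum() / 2
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

cc = np.array([local_clustering(u) for u in range(n_L)])
ok = ~np.isnan(cc)
print("degree-clustering correlation:", np.corrcoef(deg[ok], cc[ok])[0, 1])
```

This is only a single-sample illustration; the formal statement and rate appear in Theorem 3.8 and the simulations of Section 5.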
Theorem 3.8.
If the weight assumption is satisfied with $\delta > 1/10$, then conditioned on $S_L$ and $S_R$, for any node $u \in L$ we have in the projected graph that
\[ C(u) = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} w_u} + o(1). \]
Besides the trend of how the local clustering coefficient decays with node weight, we highlight how the moments of $S_R$ influence the clustering coefficient. If the distribution of $S_R$ has a heavier tail, then $\frac{(M_R^{(2)})^2}{M_R M_R^{(3)}}$ is small (via Cauchy-Schwarz), and one would expect higher local clustering compared to cases where $S_R$ is light-tailed [13] or uniform [10, 22]. We also observe this higher level of clustering in simulations (Figure 5.1).
We break the proof of Theorem 3.8 into several lemmas. From this point on, we assume $\delta > 1/10$. We first present the following results on the limiting probability of wedge and triangle existence, with proofs given in Appendix C.
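The moment ratio of $S_R$ that governs local clustering is easy to compute directly from a weight sequence. A small sketch (the helper names are ours) showing that a heavier-tailed $S_R$ yields a smaller ratio, and hence higher expected local clustering:

```python
def moment(ws, k):
    """k-th empirical moment of a weight sequence."""
    return sum(w ** k for w in ws) / len(ws)

def clustering_ratio(ws):
    """(M^(2))^2 / (M * M^(3)); at most 1 by Cauchy-Schwarz."""
    return moment(ws, 2) ** 2 / (moment(ws, 1) * moment(ws, 3))

uniform = [2] * 1000            # light-tailed: every node has weight 2
heavy = [1] * 990 + [50] * 10   # a few very heavy nodes

print(clustering_ratio(uniform))  # 1.0: uniform weights maximize the ratio
print(clustering_ratio(heavy))    # well below 1
```

A smaller ratio shrinks the denominator term that multiplies $w_u$, so clustering stays high out to larger weights.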
Lemma 3.9. As $n_R \to \infty$, for any node triple $(u_1, u, u_2)$, the probability that they form a wedge centered at $u$ is
\[ P[(u, u_1), (u, u_2) \in E \mid S_L, S_R] = \left(1 + \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_u}\right) p_{u u_1} p_{u u_2} \cdot (1 + o(1)). \]
Lemma 3.10.
In the limit $n_R \to \infty$, the probability that a node triple $(u_1, u, u_2)$ forms a triangle is
\[ P[(u, u_1), (u, u_2), (u_1, u_2) \in E \mid S_L, S_R] = p_{u u_1} p_{u u_2} \cdot \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_u} \cdot (1 + o(1)) + o(p_{u u_1} p_{u u_2}). \]
Now we have the following key result on the conditional probability of triadic closure.
Lemma 3.11.
In the limit $n_L, n_R \to \infty$, if a node triple $(u_1, u, u_2)$ forms a wedge, then the probability of this wedge being closed is
\[ P[(u_1, u_2) \in E \mid S_L, S_R, (u, u_1), (u, u_2) \in E] = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} w_u} + o(1). \]
Proof.
By combining the results of Lemmas 3.9 and 3.10, we have
\[ P[(u_1, u_2) \in E \mid S_L, S_R, (u, u_1), (u, u_2) \in E] = \frac{P[(u, u_1), (u, u_2), (u_1, u_2) \in E \mid S_L, S_R]}{P[(u, u_1), (u, u_2) \in E \mid S_L, S_R]} = \frac{p_{u u_1} p_{u u_2} \cdot \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_u} \cdot (1 + o(1)) + o(p_{u u_1} p_{u u_2})}{\left(1 + \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_u}\right) p_{u u_1} p_{u u_2} \cdot (1 + o(1))} = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} w_u} + o(1). \]
Finally, we are ready to prove our main result.
Proof of Theorem 3.8. According to Equation (1.2), the local clustering coefficient is the conditional probability that a randomly chosen wedge centered at node $u$ forms a triangle. Lemma 3.11 shows that this probability is asymptotically the same regardless of the weights on the wedge endpoints $u_1, u_2$. Therefore, conditioned on $S_L$ and $S_R$, we have
\[ C(u) = P[(u_1, u_2) \in E \mid S_L, S_R, (u, u_1), (u, u_2) \in E] = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} w_u} + o(1). \]
Figure 3.1 shows the mean conditional local clustering coefficient of a projected graph as a function of node weight $w_u$ for networks with $n_L = n_R = 10{,}000{,}000$ and weights drawn from discrete power-law distributions with different decay parameters. We cap the maximum value of the weights at a fixed power of $n_L$, corresponding to a constant $\delta$.
We next analyze the global clustering coefficient (transitivity) of the projected graph. The following theorem says that the global clustering coefficient tends to a constant bounded away from 0. Theorem 3.12.
If the weight assumption is satisfied with $\delta > 1/10$, then conditioned on $S_L$ and $S_R$, we have in the projected graph that
\[ C_G = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} \cdot \frac{M_L^{(2)}}{M_L}} + o(1). \]
Figure 3.1: Conditional local clustering coefficient distribution on simulated graphs as a function of node weight $w_u$, where left and right node weights are sampled from a discrete power-law distribution with decay rates $\alpha_L$ and $\alpha_R$ (panels: $\alpha_L = 3.0$, $\alpha_R = 3.0$ and $\alpha_L = 2.5$, $\alpha_R = 3.5$). The dots are the mean conditional local clustering coefficients for all nodes with that weight, and the curve is the prediction from Theorem 3.8.
Proof.
Let $W$ be the set of wedges in $G$ and $T$ be the set of triangles. We first show that the global clustering coefficient is always well-defined, i.e., $P[|W| \geq 1] \geq 1 - \exp(-O(n_R))$. We show that with high probability, some node on the right partition has degree at least 3. This implies that a triangle exists in the graph and therefore a wedge exists. For any given node $v$ on the right, its expected degree is $w_v$ by Theorem 3.2, and the degrees follow a Poisson distribution. By standard concentration bounds [16], $P[d_b(v) \leq 2] \leq \exp(-\Omega(w_v))$ for $w_v$ larger than 2 (in particular, this probability is less than $1/2$ for sufficiently large $w_v$), so
\[ P[|W| \geq 1] \geq 1 - \left(\tfrac{1}{2}\right)^{O(n_R)}. \]
Next, we note that the probabilities computed in Lemma 3.11 remain unchanged when conditioned on the fact that at least one wedge exists. Let $E$ be the event that some wedge $(u_1, u, u_2)$ closes into a triangle (with $u$ as the center of the wedge). Then
\[ P[E \cap \{|W| \geq 1\}] \geq P[E] - (1 - P[|W| \geq 1]) \geq P[E] - \left(\tfrac{1}{2}\right)^{O(n_R)}, \]
and consequently
\[ P[E] - \left(\tfrac{1}{2}\right)^{O(n_R)} \leq P[E \mid |W| \geq 1] \leq P[E] + \left(\tfrac{1}{2}\right)^{O(n_R)}. \]
Finally, $P[E] = \Omega(n_R^{-O(1)})$ for any of the events we previously considered, so the exponentially small deviation does not produce any additional error in our results.
For any node $u$, the probability that a random wedge has center $u$ is proportional to the number of wedges centered at $u$. By our reasoning above, we can assume at least one wedge exists, so these probabilities sum to 1.
Figure 3.2: Expected (via Theorem 3.12) and sampled global clustering coefficients on simulated graphs with discrete power-law weight distributions on the left and right nodes with decay rates $\alpha_L$ and $\alpha_R$ (panels: $\alpha_L = 2.5$ and $\alpha_L = 4.0$). The samples are close to the expected value.
By Lemma 3.11, we have
\[ P[u \text{ is the center node}] = \frac{\sum_{b, c \in L} \left(1 + \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_u}\right) p_{ub} p_{uc}}{\sum_{a, b, c \in L} \left(1 + \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_a}\right) p_{ab} p_{ac}} + o(1). \]
Putting everything together,
\[ C_G = \sum_{u \in L} P[(u_1, u, u_2) \in T \mid (u_1, u, u_2) \in W] \cdot P[u \text{ is the center}] = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} \cdot \frac{M_L^{(2)}}{M_L}} + o(1), \]
where the probability is taken over all $u_1, u_2 \in L$ and the second equality uses Lemma 3.11 for the probability that $(u_1, u, u_2) \in T$.
Figure 3.2 shows the expected (computed from Theorem 3.12) and actual global clustering coefficients of the projected graph with $n_L = n_R = 1{,}000{,}000$. The weights are drawn from a discrete power-law distribution with a fixed decay rate $\alpha_L$ on the left nodes and a varying $\alpha_R$ on the right nodes, with the maximum weight capped at a fixed power of $n_L$. The sampled global clustering coefficients are close to the expectation at all parameter values.
Finally, we investigate the local closure coefficient $H(u)$. Analysis under the configuration model predicts that $H(u)$ should be proportional to the node degree, while empirical analysis demonstrates a much slower increasing trend versus degree, or even a constant relationship in a coauthorship network that is directly generated from a bipartite graph projection [60]. The following result theoretically justifies this phenomenon, showing that the expected value of the local closure coefficient is independent of node weight. Theorem 3.13.
If the weight assumption is satisfied with $\delta > 1/10$, then conditioned on $S_L$ and $S_R$ we have, in the projected graph,
\[ H(u) = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} \cdot \frac{M_L^{(2)}}{M_L}} + o(1) \]
as $n_R \to \infty$; i.e., the expected closure coefficient is asymptotically independent of node weight.
Figure 3.3: Conditional local closure coefficient distribution on simulated graphs as a function of node weight $w_u$, where left and right node weights are sampled from a discrete power-law distribution with decay rates $\alpha_L$ and $\alpha_R$ (panels: $\alpha_L = 3.0$, $\alpha_R = 3.0$ and $\alpha_L = 2.5$, $\alpha_R = 3.5$). The dots are the mean conditional local closure coefficients for all nodes with that weight, and the flat curve is the prediction from Theorem 3.13. Weights with fewer than 5 nodes were omitted.
Proof. By Theorem 3.8, the probability that a length-2 path $(u, v, w)$ closes into a triangle depends only on its center node $v$. Since the closure coefficient is measured from the head node $u$, the probability that any wedge is closed is independent of $u$ and thus the same across every node in the graph. This implies that the local closure coefficient is equal to the global closure coefficient, which in turn is equal to the global clustering coefficient.
Figure 3.3 shows the local closure coefficient of the projected graph as a function of node weight $w_u$, using the same random graphs as for the clustering coefficient in Figure 3.1. We observe that the mean local conditional closure coefficient is independent of the node weight in the samples, which verifies Theorem 3.13.
Remark. Stronger error bounds hold when $\delta > 1/6$. In particular, instead of an additive $o(1)$ error term, the error terms are a multiplicative $1 + o(1)$ factor. For example, the global clustering coefficient in Theorem 3.12 would be
\[ C_G = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} \cdot \frac{M_L^{(2)}}{M_L}} (1 + o(1)). \]
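The asymptotic formulas above can be sanity-checked numerically on small instances. A minimal sketch (ours, not the authors' C++ implementation; this samples edge by edge rather than using the fast grouped sampler of Algorithm 4.1) that generates a bipartite Chung-Lu graph, projects it, and computes the global clustering coefficient:

```python
import random
from collections import defaultdict
from itertools import combinations

def sample_bipartite(wL, wR, rng):
    """Chung-Lu bipartite graph: edge (u, v) appears independently with
    probability min(1, w_u * w_v / S), where S is the total right weight."""
    S = float(sum(wR))
    return {(u, v) for u, wu in enumerate(wL)
                   for v, wv in enumerate(wR)
                   if rng.random() < min(1.0, wu * wv / S)}

def project(bip_edges):
    """One-mode projection: connect two left nodes sharing a right neighbor."""
    nbrs = defaultdict(set)
    for u, v in bip_edges:
        nbrs[v].add(u)
    return {pair for group in nbrs.values()
                 for pair in combinations(sorted(group), 2)}

def global_clustering(edges):
    """3 * (#triangles) / (#wedges), or None if the graph has no wedges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    wedges = sum(d * (d - 1) // 2 for d in (len(s) for s in adj.values()))
    closed = sum(len(adj[a] & adj[b]) for a, b in edges)  # each triangle counted 3x
    return closed / wedges if wedges else None

rng = random.Random(0)
cg = global_clustering(project(sample_bipartite([3] * 200, [4] * 150, rng)))
print(cg)
```

For uniform weights like these, the sampled value can be compared against the constant predicted by Theorem 3.12; heavier-tailed weight lists slot in directly.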
4. Fast sampling and counting.
We develop a fast sampling algorithm for graphs with degrees following discrete power-law (Zipfian) distributions, which we use in all of our experiments. One naive way to implement our model is to simply iterate over all $O(n_L n_R)$ potential edges and generate a random sample for each edge. For large graphs, however, this quadratic scaling is too costly. In contrast, our algorithm has running time linear in the number of sampled edges rather than the product of the left and right partition sizes. This speedup is enabled by the discrete power-law distributions, which allow us to group nodes with the same weight. The overall procedure is in Algorithm 4.1. Algorithm 4.1
Fast sampling of a Chung-Lu bipartite graph with discrete power-law weights.
Input: positive integers $n_L$, $n_R$, and degree distributions $D_L$ and $D_R$
Output: a bipartite graph $G$ following degree distributions $D_L$ and $D_R$
  $L \leftarrow \{1, 2, \ldots, n_L\}$, $R \leftarrow \{n_L + 1, n_L + 2, \ldots, n_L + n_R\}$
  $W_L \leftarrow \{w_u \mid w_u \sim D_L\}$, $W_R \leftarrow \{w_u \mid w_u \sim D_R\}$
  $G \leftarrow$ an empty graph with node set $L \sqcup R$
  for each unique value $(w_l, w_r) \in W_L \times W_R$ do
    $V_L \leftarrow \{u \in L \mid w_u = w_l\}$, $V_R \leftarrow \{u \in R \mid w_u = w_r\}$
    $m \leftarrow |V_L||V_R|$, $p \leftarrow w_l w_r / (n_R \mu_R)$
    $e_g \sim \mathrm{Binomial}(m, p)$
    draw $e_g$ pairs uniformly from $V_L \times V_R$ without replacement and add them to $G$
  end for
  return $G$

Suppose that we have two discrete power-law distributions $D_L$ and $D_R$ with $n_L E[D_L] = n_R E[D_R]$ and decay parameters $\alpha_L$ and $\alpha_R$. We begin by first sampling the node weights $w_u \in \mathbb{N}$ according to the specified distributions. We then group together nodes on each side of the bipartite graph by their weight. With high probability, the number of groups will be small (Lemma 1.5). Thus, instead of iterating over all $O(n_L n_R)$ pairs of potential edges, we can iterate over all pairs of groups between the left and right partitions. Within each pair of groups, edges between nodes of the groups occur with a fixed probability. Hence, the number of edges between the groups follows a binomial distribution. The final step simply generates the number of edges $e_g$ we need from each pair of groups and then draws that many edges from the node pairs within the pair, which can be done in linear time. Theorem 4.1.
Let $\mu_L = E[D_L]$ and $\mu_R = E[D_R]$. The expected running time of Algorithm 4.1 is $O\big(n_L^{1/(\alpha_L - 1)} n_R^{1/(\alpha_R - 1)} + \mu_L n_L\big)$. For $\alpha_L, \alpha_R > 3$, the latter term dominates, and the algorithm is asymptotically optimal since the second term is the expected number of edges.
Proof. By Lemma 1.5, $E[|W_L|]$ and $E[|W_R|]$ are $O\big(n_L^{1/(\alpha_L - 1)}\big)$ and $O\big(n_R^{1/(\alpha_R - 1)}\big)$. Thus, the number of unique pairs $(w_u, w_v)$ iterated over in the for loop of Algorithm 4.1 is $O\big(n_L^{1/(\alpha_L - 1)} n_R^{1/(\alpha_R - 1)}\big)$ in expectation. Aside from the time taken to draw the $e_g$ edges, each group takes constant time to process. The expected number of edges added over all the groups is $\sum_{u \in L, v \in R} \frac{w_u w_v}{n_R M_R} = n_L M_L$. This is $O(\mu_L n_L)$ in expectation. Hence the total running time is upper bounded by $O\big(n_L^{1/(\alpha_L - 1)} n_R^{1/(\alpha_R - 1)} + \mu_L n_L\big)$.
Following Remark 2.1, we may assume without loss of generality that $\mu_L n_L = \mu_R n_R$. By the AM-GM inequality, $\mu_L n_L + \mu_R n_R \geq 2\sqrt{n_L n_R \mu_L \mu_R}$. For $\alpha_L, \alpha_R \geq 3$, the latter term dominates and the runtime is bounded by the expected number of generated edges. Since the output size is at least $O(\mu_L n_L)$, Algorithm 4.1 is asymptotically optimal when $\alpha_L, \alpha_R \geq 3$. Lemma 4.2.
Let $D_L$ and $D_R$, with decay parameters $\alpha_L$ and $\alpha_R$, be the weight distributions. In expectation, the running time for computing all local clustering and closure coefficients and the global clustering coefficient is $O\big(n_L^{1/\min(\alpha_L, \alpha_R - 1)} \cdot \frac{n_L^2 M_L^2 M_R^{(2)}}{n_R M_R^2}\big)$. Under the normalization in Remark 2.1, this is equal to $O\big(n_L^{1/(\alpha_L - 1)} n_R^{1/(\alpha_R - 1)} + \mu_L n_L + \mu_R n_R\big)$. For $\alpha_L, \alpha_R > 3$, the algorithm is asymptotically optimal, since the second term is the expected number of edges.
Proof. To compute the projected graph, we can simply iterate over all nodes $v$ in the right partition. For each pair of nodes in $N(v)$, we connect the nodes with an edge in the projected graph. Summed over all nodes in the right partition, we add $\sum_{v \in R} \binom{d_b(v)}{2}$ edges in the projected graph in expectation, where $E[d_b(v)] = \frac{n_L M_L}{n_R M_R} w_v$. Hence both the expected time to compute the projection and the expected number of edges in the projection are upper bounded by $O\big(\frac{n_L^2 M_L^2 M_R^{(2)}}{n_R M_R^2}\big)$.
To compute the local clustering and closure coefficients, as well as the global clustering coefficient, it is sufficient to have the degree and triangle participation counts of each node. The degrees are immediately available from the projected graph, and we can list all triangles in $O(m n^{1/\alpha})$ time, where $m$ is the number of edges in the projection and $\alpha$ is the power-law parameter of the projection [34]. By Corollary 3.7 and our reasoning above, $m = O\big(\frac{n_L^2 M_L^2 M_R^{(2)}}{n_R M_R^2}\big)$ and $\alpha = \min(\alpha_L, \alpha_R - 1)$.
In our experiments, we take $D_L$ and $D_R$ to be the degree distributions of the left and right partitions, respectively. In these cases, we have the equality $n_L E[D_L] = n_R E[D_R]$. With this equality, our results above simplify. The running time of Algorithm 4.1 can be restated as $O\big(n_L^{1/(\alpha_L - 1)} n_R^{1/(\alpha_R - 1)} + \mu_L n_L + \mu_R n_R\big)$. Thus for $\alpha_L, \alpha_R > 3$, the latter terms dominate (by the AM-GM inequality) and the running time is asymptotically optimal, since it is bounded by the expected number of generated edges.
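Algorithm 4.1 translates directly into code. A sketch of the grouped sampler (ours, not the authors' C++ implementation; the O(m) `binomial` helper stands in for the constant-time binomial sampler a real implementation would use, and left/right nodes here use separate index spaces rather than offsetting right ids by $n_L$):

```python
import random
from collections import defaultdict

def binomial(n, p, rng):
    # O(n) stand-in for a constant-time binomial sampler.
    return sum(rng.random() < p for _ in range(n))

def sample_grouped(wL, wR, rng):
    """Chung-Lu bipartite sampling grouped by (left weight, right weight),
    following Algorithm 4.1. Returns a set of (left, right) edges."""
    S = float(sum(wR))  # plays the role of n_R * mu_R
    groupsL, groupsR = defaultdict(list), defaultdict(list)
    for u, w in enumerate(wL):
        groupsL[w].append(u)
    for v, w in enumerate(wR):
        groupsR[w].append(v)
    edges = set()
    for wl, VL in groupsL.items():
        for wr, VR in groupsR.items():
            m = len(VL) * len(VR)
            p = min(1.0, wl * wr / S)
            k = binomial(m, p, rng)
            # Draw k distinct node pairs uniformly from VL x VR.
            for i in rng.sample(range(m), k):
                edges.add((VL[i // len(VR)], VR[i % len(VR)]))
    return edges

rng = random.Random(1)
edges = sample_grouped([2] * 40 + [5] * 10, [1] * 60 + [4] * 15, rng)
print(len(edges))
```

The outer loops run once per pair of distinct weight values, which is the source of the $n_L^{1/(\alpha_L-1)} n_R^{1/(\alpha_R-1)}$ term in Theorem 4.1.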
5. Numerical experiments.
In this section, we use our model in conjunction with several datasets. We find that much of the empirical clustering behavior in real-world projections can be accounted for by our bipartite projection model. All algorithms and simulations were implemented in C++, and all experiments were executed on a dual-core Intel i7-7500U 2.7 GHz CPU with 16 GB of RAM. Code and data are available at https://gitlab.com/paul.liu.ubc/bipartite-generation-model.
We analyze 11 bipartite network datasets (Table 5.1). For the weight sequences $S_L$ and $S_R$, we use the degrees from the data. We also compare with a version of the random intersection model [10, 27], where the weight sequence of the left nodes comes from the data. For each dataset, we estimated power-law decay parameters for the degree distributions of the left and right partitions (Appendix B).
Table 5.2 shows clustering and closure coefficients — mean local clustering (i.e., average clustering coefficient), global clustering (equal to global closure), and mean local closure (i.e., average closure coefficient) — from (1) the data, (2) the projected graph produced by our model, and (3) the graph produced by the random intersection model. When computing the coefficients, we ignore any node that has an undefined coefficient, and we report the empirical (i.e., non-conditional) variants defined in Subsection 1.1.
In all but one dataset, our model has mean local clustering that is closer to the data than the random intersection model.
This remains true regardless of whether our model has more clustering (e.g., mathsx-tags-questions) or less clustering (e.g., actors-movies) compared to the data. The one exception is the directors-boards dataset, where the random intersection model accounts for more clustering than our model. In an absolute sense, a large amount of the mean clustering is created by the projection.

Table 5.1: Description and summary statistics of real-world datasets.

dataset                      |L|    |R|    |E_b|  projection description
actors-movies [6]            384K   128K   1.47M  actors in the same movie
amazon-products-pages [36]   721K   549K   2.34M  products displayed on the same page on amazon.com
classes-drugs [8]            1.16K  49.7K  156K   FDA NDC classification codes describing the same drug
condmat-authors-papers [40]  16.7K  22.0K  58.6K  academics co-authoring a paper on the Condensed Matter arXiv
directors-boards [49]        204    1.01K  1.13K  directors on the boards of the same Norwegian company
diseases-genes [28]          516    1.42K  3.93K  diseases associated with the same gene
genes-diseases [28]          1.42K  516    3.93K  genes associated with the same disease
mathsx-tags-questions [8]    1.63K  822K   1.80M  tags applied to the same question on math.stackexchange.com
mo-questions-users [55]      73.9K  5.45K  132K   questions answered by the same user
so-users-threads [8]         2.68M  11.3M  25.6M  users posting on the same question thread on stackoverflow.com
walmart-items-trips [5]      88.9K  69.9K  460K   items co-purchased in a shopping trip

Table 5.2: Clustering and closure coefficients in real-world data and in random projections following our model and the random intersection (RI) model. Variances are on the order of 0.001. A large amount of clustering is explained simply by the degree distribution and the projection.

                        mean clust. coeff.  global clust. coeff.  mean closure coeff.
dataset                 data  ours  RI      data  ours  RI        data  ours  RI
actors-movies           0.78  0.63  0.58    0.17  0.07  0.04      0.20  0.04  0.03
amazon-products-pages   0.74  0.52  0.53    0.20  0.08  0.08      0.29  0.09  0.09
classes-drugs           0.83  0.79  0.78    0.50  0.50  0.49      0.40  0.24  0.23
condmat-authors-papers  0.74  0.50  0.50    0.36  0.12  0.11      0.35  0.10  0.10
directors-boards        0.45  0.28  0.34    0.39  0.21  0.23      0.27  0.17  0.19
diseases-genes          0.82  0.46  0.36    0.63  0.31  0.19      0.52  0.21  0.14
genes-diseases          0.86  0.65  0.57    0.66  0.37  0.23      0.54  0.24  0.19
mathsx-tags-questions   0.63  0.79  0.80    0.33  0.46  0.47      0.17  0.25  0.27
mo-questions-users      0.86  0.78  0.64    0.63  0.45  0.19      0.37  0.24  0.19
so-users-threads        0.40  0.45  0.46    0.02  0.01  0.01      0.00  0.01  0.01
walmart-items-trips     0.63  0.55  0.52    0.05  0.04  0.04      0.07  0.02  0.02

To further highlight how much is explained by our model, Figure 5.1 shows the local clustering coefficient as a function of degree in the data and in a sample from the models.
Figure 5.1: Local clustering coefficient as a function of degree on the walmart-items-trips (left) and mo-questions-users (right) datasets. The green, orange, and blue lines represent the clustering coefficients from the real projected graph, the projected graph produced by our model, and the projected graph produced by the random intersection model, respectively. Much of the empirical local clustering behavior can be explained by the projection.

Figure 5.2: Local closure coefficient as a function of degree on the amazon-products-pages (left) and genes-diseases (right) datasets. The green, orange, and blue lines represent the closure coefficients from the real projected graph, the projected graph produced by our model, and the projected graph produced by the random intersection model, respectively.

We find that the empirical characteristics of the clustering coefficient as a function of degree are largely explained by the projection, suggesting that there is little innate local clustering behavior beyond what the projection from the degree distribution already provides.
In some datasets, the global clustering coefficient is essentially the same as in our model (classes-drugs, walmart-items-trips). However, there are several cases where our model and the random intersection model have a factor of two less global clustering (actors-movies, amazon-products-pages, diseases-genes). This suggests that there is global transitivity in these networks that goes beyond what we would expect from a random projection.
Overall, the relative difference between the data and the model is larger for the global clustering coefficient than for the local clustering coefficient. We emphasize that our model is not designed to match these empirical properties. Instead, we are interested in how much clustering one can expect from a model that only accounts for the bipartite degree distributions and the projection step.
Finally, the random graphs have non-trivial mean closure coefficients, but they tend to be smaller compared to the data, with the exception of mathsx-tags-questions. Similar to the local clustering coefficient, we plot the local closure coefficient as a function of degree for two datasets (amazon-products-pages and genes-diseases; Figure 5.2). For amazon-products-pages, we see the flat closure coefficient one might expect from Theorem 3.13, although the data has more closure at baseline. This is likely explained by the fact that two products tend to appear on the same pages, reducing the number of length-2 paths in the data, whereas bipartite connections are made at random in the model. With the genes-diseases dataset, the random models capture an increase in closure as a function of degree that is also seen in the data. In this case, the model parameters do not satisfy the assumptions of Theorem 3.13, but the general empirical behavior is still seen in our random projection model.
6. Conclusion.
We have analyzed a simple bipartite "Chung-Lu style" model that captures some common properties of real-world networks. The simplicity of our model enables theoretical analysis of properties of the projected graph, giving analytical formulae for graph statistics such as clustering coefficients, closure coefficients, and the expected degree distribution. We also pair our model with a fast graph generation algorithm that is provably optimal for certain input distributions. Empirically, we find that a substantial amount of clustering and closure behavior in real-world networks is explained by sampling from our model with the same bipartite degree distribution. However, global clustering is often larger than predicted by the projection model.
Acknowledgments.
This research was supported by NSF Award DMS-1830274, ARO Award W911NF19-1-0057, ARO MURI, and JPMorgan Chase & Co. We thank Johan Ugander for pointing us to the literature on random intersection graphs.
REFERENCES [1]
Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, and A.-L. Barabási, Flavor network and the principles of food pairing, Scientific Reports, 1 (2011). [2]
E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing , Mixed membership stochastic blockmodels ,Journal of machine learning research, 9 (2008), pp. 1981–2014.[3]
S. G. Aksoy, T. G. Kolda, and A. Pinar , Measuring and modeling bipartite graphs with communitystructure , Journal of Complex Networks, 5 (2017), pp. 581–603.[4]
J. Alstott, E. Bullmore, and D. Plenz , powerlaw: a python package for analysis of heavy-taileddistributions , PloS one, 9 (2014), p. e85777.[5] I. Amburg, N. Veldt, and A. R. Benson , Clustering in graphs and hypergraphs with categorical edgelabels , in Proceedings of the Web Conference, 2020.[6]
A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science, 286 (1999), pp. 509–512. [7]
D. Barber , Clique matrices for statistical graph decomposition and parameterising restricted positivedefinite matrices , in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intel-ligence, AUAI Press, 2008, pp. 26–33.[8]
A. R. Benson, R. Abebe, M. T. Schaub, A. Jadbabaie, and J. Kleinberg , Simplicial closureand higher-order link prediction , Proceedings of the National Academy of Sciences, 115 (2018),
pp. E11221–E11230. [9]
A. R. Benson, D. F. Gleich, and J. Leskovec , Higher-order organization of complex networks , Science,353 (2016), pp. 163–166.[10]
M. Bloznelis , Degree and clustering coefficient in sparse random intersection graphs , The Annals ofApplied Probability, 23 (2013), pp. 1254–1289.[11]
M. Bloznelis , Local probabilities of randomly stopped sums of power-law lattice random variables , Lithua-nian Mathematical Journal, 59 (2019), pp. 437–468.[12]
M. Bloznelis and V. Kurauskas , Clustering coefficient of random intersection graphs with infinitedegree variance , Internet Mathematics, (2016), p. 1215.[13]
M. Bloznelis and J. Petuchovas , Correlation between clustering and degree in affiliation networks , inInternational Workshop on Algorithms and Models for the Web-Graph, Springer, 2017, pp. 90–104.[14]
R. L. Breiger , The duality of persons and groups , Social forces, 53 (1974), pp. 181–190.[15]
A. D. Broido and A. Clauset , Scale-free networks are rare , Nature communications, 10 (2019), pp. 1–10.[16]
C. Canonne, A short note on Poisson tail bounds, ∼ccanonne/files/misc/2017-poissonconcentration.pdf. [17] P. S. Chodrow, Configuration models of random hypergraphs, arXiv:1902.09302, (2019). [18]
F. Chung and L. Lu , The average distances in random graphs with given expected degrees , Proceedingsof the National Academy of Sciences, 99 (2002), pp. 15879–15882.[19]
F. Chung and L. Lu , Connected components in random graphs with given expected degree sequences ,Annals of combinatorics, 6 (2002), pp. 125–145.[20]
F. Chung, L. Lu, and V. Vu , The spectra of random graphs with given expected degrees , InternetMathematics, 1 (2004), pp. 257–275.[21]
A. Clauset, C. R. Shalizi, and M. E. Newman , Power-law distributions in empirical data , SIAMReview, 51 (2009), pp. 661–703.[22]
M. Deijfen and W. Kets , Random intersection graphs with tunable degree distribution and clustering ,Probability in the Engineering and Informational Sciences, 23 (2009), pp. 661–674.[23]
S. N. Dorogovtsev and J. F. Mendes , Evolution of networks , Advances in physics, 51 (2002), pp. 1079–1187.[24]
D. Easley and J. Kleinberg , Networks, crowds, and markets , vol. 8, Cambridge university pressCambridge, 2010.[25]
B. K. Fosdick, D. B. Larremore, J. Nishimura, and J. Ugander , Configuring random graph modelswith fixed degree sequences , SIAM Review, 60 (2018), pp. 315–355.[26]
X. Fu, S. Yu, and A. R. Benson , Modeling and analysis of tagging networks in stack exchange com-munities , Journal of Complex Networks, (2019), pp. 1–19.[27]
E. Godehardt and J. Jaworski , Two models of random intersection graphs for classification , in Ex-ploratory data analysis in empirical research, Springer, 2003, pp. 67–81.[28]
K.-I. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal, and A.-L. Barabási, The human disease network, Proceedings of the National Academy of Sciences, 104 (2007), pp. 8685–8690. [29]
M. S. Granovetter , The strength of weak ties , in Social networks, Elsevier, 1977, pp. 347–367.[30]
R. Guimerà, M. Sales-Pardo, and L. A. N. Amaral, Module identification in bipartite and directed networks, Physical Review E, 76 (2007). [31]
M. O. Jackson and B. W. Rogers , Meeting strangers and friends of friends: How random are socialnetworks? , American Economic Review, 97 (2007), pp. 890–915.[32]
B. Karrer and M. E. J. Newman , Stochastic blockmodels and community structure in networks , Phys-ical Review E, 83 (2011).[33]
D. B. Larremore, A. Clauset, and A. Z. Jacobs , Efficiently inferring community structure in bipar-tite networks , Physical Review E, 90 (2014).[34]
M. Latapy , Main-memory triangle computations for very large (sparse (power-law)) graphs , Theor. Com-put. Sci., 407 (2008), pp. 458–473.[35]
S. Lattanzi and D. Sivakumar , Affiliation networks , in Proceedings of the 41st annual ACM Sympo-sium on Theory of Computing (STOC), 2009.[36]
J. Leskovec, L. A. Adamic, and B. A. Huberman , The dynamics of viral marketing , ACM Transac-tions on the Web (TWEB), 1 (2007), pp. 5–es. [37]
P. Li and O. Milenkovic , Inhomogeneous hypergraph clustering with applications , in Advances in NeuralInformation Processing Systems, 2017, pp. 2308–2318.[38]
J. Nacher and T. Akutsu , On the degree distribution of projected networks mapped from bipartitenetworks , Physica A: Statistical Mechanics and its Applications, 390 (2011), pp. 4636–4651.[39]
Z. Neal , The backbone of bipartite projections: Inferring relationships from co-authorship, co-sponsorship,co-attendance and other co-behaviors , Social Networks, 39 (2014), pp. 84–97.[40]
M. E. Newman , The structure of scientific collaboration networks , Proceedings of the National Academyof Sciences, 98 (2001), pp. 404–409.[41]
M. E. Newman , The structure and function of complex networks , SIAM Review, 45 (2003), pp. 167–256.[42]
M. E. J. Newman , Coauthorship networks and patterns of scientific collaboration , Proceedings of theNational Academy of Sciences, 101 (2004), pp. 5200–5205.[43]
M. E. J. Newman, S. H. Strogatz, and D. J. Watts , Random graphs with arbitrary degree distribu-tions and their applications , Phys. Rev. E, 64 (2001), p. 026118.[44]
T. Opsahl , Triadic closure in two-mode networks: Redefining the global and local clustering coefficients ,Social Networks, 35 (2013), pp. 159–167.[45]
M. A. Porter, P. J. Mucha, M. E. J. Newman, and C. M. Warmbrand , A network analysis ofcommittees in the U.S. House of Representatives , Proceedings of the National Academy of Sciences,102 (2005), pp. 7057–7062.[46]
A. Rapoport , Spread of information through a population with socio-structural bias: I. assumption oftransitivity , The bulletin of mathematical biophysics, 15 (1953), pp. 523–533.[47]
E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabási, Hierarchical organization of modularity in metabolic networks, Science, 297 (2002), pp. 1551–1555. [48]
H. J. Ryser , Combinatorial mathematics , vol. 14, American Mathematical Soc., 1963.[49]
C. Seierstad and T. Opsahl , For the few not the many? the effects of affirmative action on presence,prominence, and social capital of women directors in norway , Scandinavian Journal of Management,27 (2011), pp. 44–54.[50]
C. Seshadhri, T. G. Kolda, and A. Pinar, Community structure and scale-free collections of Erdős-Rényi graphs, Physical Review E, 85 (2012), p. 056109. [51]
J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos , Neighborhood formation and anomaly detectionin bipartite graphs , in Fifth IEEE International Conference on Data Mining, IEEE, 2005.[52]
G. Szabó, M. Alava, and J. Kertész, Structural transitions in scale-free networks, Physical Review E, 67 (2003), p. 056102. [53]
A. Taudiere, F. Munoz, A. Lesne, A.-C. Monnet, J.-M. Bellanger, M.-A. Selosse, P.-A.Moreau, and F. Richard , Beyond ectomycorrhizal bipartite networks: projected networks demon-strate contrasted patterns between early- and late-successional plants in corsica , Frontiers in PlantScience, 6 (2015).[54]
C.-Y. Teng, Y.-R. Lin, and L. A. Adamic , Recipe recommendation using ingredient networks , inProceedings of the 3rd Annual ACM Web Science Conference, ACM Press, 2012.[55]
N. Veldt, A. R. Benson, and J. Kleinberg , Localized flow-based clustering in hypergraphs , in Pro-ceedings of the SIGKDD Conference on Knowledge Discovery and Data Mining, 2020.[56]
I. Vogt and J. Mestres , Drug-target networks , Molecular Informatics, 29 (2010), pp. 10–14.[57]
D. J. Watts and S. H. Strogatz , Collective dynamics of ‘small-world’ networks , Nature, 393 (1998),p. 440.[58]
S. A. Williamson and M. Tec , Random clique covers for graphs with local density and global sparsity ,in Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2019.[59]
J. Yang and J. Leskovec , Community-affiliation graph model for overlapping network community de-tection , in 2012 IEEE 12th International Conference on Data Mining, IEEE, Dec. 2012.[60]
H. Yin, A. R. Benson, and J. Leskovec , The local closure coefficient: a new perspective on networkclustering , in Proceedings of the Twelfth ACM International Conference on Web Search and DataMining, ACM, 2019, pp. 303–311.[61]
H. Yin, A. R. Benson, and J. Ugander , Measuring directed triadic closure with closure coefficients ,Network Science, (2020), pp. 1–23.[62]
Y. Zhang, A. Friend, A. L. Traud, M. A. Porter, J. H. Fowler, and P. J. Mucha , Communitystructure in congressional cosponsorship networks , Physica A: Statistical Mechanics and its Applica-
SIMPLE BIPARTITE GRAPH PROJECTION MODEL FOR CLUSTERING IN NETWORKS 23 tions, 387 (2008), pp. 1705–1712.[63]
D. Zhou, J. Huang, and B. Sch¨olkopf , Learning with hypergraphs: Clustering, classification, andembedding , in Advances in neural information processing systems, 2007, pp. 1601–1608.[64]
T. Zhou, J. Ren, M. c. v. Medo, and Y.-C. Zhang , Bipartite network projection and personal recom-mendation , Phys. Rev. E, 76 (2007), p. 046115.
Appendix A. Connection between conditional probability and empirical clustering.
Here, we show that the conditional probability formulation for clustering is exactly a weighted average of the standard empirical clustering coefficient for the power-law-type distributions our model explores. We demonstrate this below for the local clustering coefficient; the case for local closure is similar.

Fix a node $u$ and suppose we generate a graph $G_i$ under our random graph model. Let $W_i$ and $T_i$ be the number of wedges and triangles at node $u$ in the projected graph $G_i$. The empirical clustering coefficient $\tilde{C}_i(u)$ is equal to $T_i / W_i$. Weighting each sample $\tilde{C}_i(u)$ by $W_i$, the weighted clustering coefficient is
\[
\frac{\sum_{i=1}^{s} W_i \tilde{C}_i(u)}{\sum_{i=1}^{s} W_i} = \frac{\frac{1}{s} \sum_{i=1}^{s} T_i}{\frac{1}{s} \sum_{i=1}^{s} W_i}.
\]
As the number of samples $s$ (i.e., the size of the graph) approaches infinity, both the numerator and the denominator approach their expectations, since each sample is independent. Computing this expectation, we see that it is exactly the value of $C(u)$ computed in Theorem 3.8.

In the case of the global closure coefficient, a similar argument shows that we actually have equality between the conditional and non-conditional definitions (in the limit that the size of the graph goes to infinity).

Appendix B. Power-law statistics in real-world bipartite networks.
In many of our datasets, we find that power-law degree distributions are a reasonable approximation for the left and right sides of the bipartite network (Table B.1).

Table B.1: Estimated power-law (PL) exponents of the left and right degree distributions in the bipartite graph datasets in Table 5.2 (an exponent of $\alpha$ corresponds to a distribution decay $\propto k^{-\alpha}$). Parameters were fit using the powerlaw Python package [4]. We also report the Kolmogorov–Smirnov statistic $D$ between the fit model and the data.

dataset | left PL exponent | D | right PL exponent | D
actors-movies | 1.862 ± … | … | … | …
[remaining numeric entries of Table B.1 not recoverable in this copy]

Appendix C. Additional proofs.

Proof of Lemma 3.9.
Let $A_i$ denote the event that $(u, u_i) \in E$ for $i = 1, 2$. We want to compute the probability of $A_1 \cap A_2$. We first decompose the probability as follows:
\[
\text{(C.1)} \qquad \mathbb{P}[A_1 \cap A_2] = \mathbb{P}[A_1] + \mathbb{P}[A_2] - \mathbb{P}[A_1 \cup A_2] = \mathbb{P}[A_1] + \mathbb{P}[A_2] + \mathbb{P}\big[\bar{A}_1 \cap \bar{A}_2\big] - 1.
\]
The probability that each event $A_i$ occurs is given by Theorem 3.4, so we compute the probability of $\bar{A}_1 \cap \bar{A}_2$, which is the event that $u$ is connected to neither $u_1$ nor $u_2$ in the projected graph. This happens if and only if, in the bipartite graph, for every $v \in R$, we have that (i) $u$ is not connected to $v$, or (ii) neither $u_1$ nor $u_2$ is connected to $v$. For now, let $v$ be a fixed node on the right. Conditioning on $w_v$ and using the fact that edge formations in the bipartite graph are independent, the probability is
\[
1 - \frac{w_u w_v}{\sqrt{n_R M_R}} + \frac{w_u w_v}{\sqrt{n_R M_R}} \left(1 - \frac{w_{u_1} w_v}{\sqrt{n_R M_R}}\right) \left(1 - \frac{w_{u_2} w_v}{\sqrt{n_R M_R}}\right) = 1 - \frac{w_u (w_{u_1} + w_{u_2}) w_v^2}{n_R M_R} + \frac{w_u w_{u_1} w_{u_2} w_v^3}{(n_R M_R)^{3/2}}.
\]
Therefore, we have
\begin{align*}
\log \mathbb{P}\big[\bar{A}_1 \cap \bar{A}_2\big]
&= \sum_{v \in R} \log\left(1 - \frac{w_u (w_{u_1} + w_{u_2}) w_v^2}{n_R M_R} + \frac{w_u w_{u_1} w_{u_2} w_v^3}{(n_R M_R)^{3/2}}\right) \\
&= \sum_{v \in R} \left[ -\frac{w_u (w_{u_1} + w_{u_2}) w_v^2}{n_R M_R} + \frac{w_u w_{u_1} w_{u_2} w_v^3}{(n_R M_R)^{3/2}} - \frac{w_u^2 (w_{u_1} + w_{u_2})^2 w_v^4}{2 (n_R M_R)^2} \cdot (1 + O(n_R^{-\delta})) \right] \\
&= -p_{uu_1} - p_{uu_2} + \frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{p_{uu_1} p_{uu_2}}{w_u} - \frac{\sum_{v \in R} w_v^4}{2 \big(\sum_{v \in R} w_v^2\big)^2} (p_{uu_1} + p_{uu_2})^2 \cdot (1 + O(n_R^{-\delta})),
\end{align*}
where we substitute $p_{uu_i} = \frac{w_u w_{u_i}}{n_R M_R} \sum_{v \in R} w_v^2$ from Theorem 3.4. Consequently,
\begin{align*}
\mathbb{P}\big[\bar{A}_1 \cap \bar{A}_2\big]
= 1 - p_{uu_1} - p_{uu_2} &+ \frac{(p_{uu_1} + p_{uu_2})^2}{2} \cdot (1 + O(n_R^{-\delta})) + o(p_{uu_1} p_{uu_2}) \\
&+ \frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{p_{uu_1} p_{uu_2}}{w_u} \cdot (1 + O(n_R^{-\delta})) - \frac{\sum_{v \in R} w_v^4}{2 \big(\sum_{v \in R} w_v^2\big)^2} (p_{uu_1} + p_{uu_2})^2 \cdot (1 + O(n_R^{-\delta})).
\end{align*}
Combining everything, the probability of wedge formation is
\begin{align*}
\mathbb{P}[A_1 \cap A_2]
&= \left(\frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{1}{w_u} + 1\right) p_{uu_1} p_{uu_2} \cdot (1 + O(n_R^{-\delta}) + o(1)) + (p_{uu_1} + p_{uu_2})^2 \cdot O(n_R^{-\delta}) \\
&= \left(\frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{1}{w_u} + 1\right) p_{uu_1} p_{uu_2} \cdot \left(1 + O(n_R^{-\delta}) + o(1) + \left(\frac{w_{u_1}}{w_{u_2}} + \frac{w_{u_2}}{w_{u_1}}\right) O(n_R^{-\delta})\right) \\
&= \left(\frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{1}{w_u} + 1\right) p_{uu_1} p_{uu_2} \cdot \left(1 + O(n_R^{-\delta}) + o(1) + O(n_R^{1/2 - 2\delta})\right),
\end{align*}
where the last equality is due to $w_{u_1}, w_{u_2} \in [1, n_R^{1/2 - \delta}]$, so that their ratio is bounded by $n_R^{1/2 - \delta}$. Since $\delta > 3/10$, the exponent $1/2 - 2\delta$ is negative, and the proof is complete.
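The decomposition (C.1) and the per-right-node factorization of $\mathbb{P}[\bar{A}_1 \cap \bar{A}_2]$ are exact identities; the asymptotics enter only when expanding the product. As a sanity check, they can be verified by brute-force enumeration on a toy instance. The sketch below uses arbitrary illustrative bipartite edge probabilities `q`, not weights drawn from the model:

```python
from itertools import product

# Toy instance: left nodes u, u1, u2 and three right nodes.
# q[i][j] is the bipartite edge probability between left node i and right node j;
# the values are arbitrary illustrative choices, not from the model.
q = [[0.3, 0.5, 0.2],   # u
     [0.4, 0.1, 0.6],   # u1
     [0.2, 0.7, 0.3]]   # u2
n_right = 3

def brute_force(q, n_right):
    """Enumerate all bipartite graphs to get P[A1], P[A2], and P[A1 and A2],
    where A_i is the event that u and u_i share a common right neighbor."""
    p_a1 = p_a2 = p_both = 0.0
    for e in product([0, 1], repeat=3 * n_right):
        # e[n_right * i + j] = 1 iff left node i is connected to right node j
        prob = 1.0
        for i in range(3):
            for j in range(n_right):
                prob *= q[i][j] if e[n_right * i + j] else 1.0 - q[i][j]
        a1 = any(e[j] and e[n_right + j] for j in range(n_right))
        a2 = any(e[j] and e[2 * n_right + j] for j in range(n_right))
        p_a1 += prob * a1
        p_a2 += prob * a2
        p_both += prob * (a1 and a2)
    return p_a1, p_a2, p_both

def neither_prob(q, n_right):
    """Closed form for P[not A1 and not A2]: for every right node v, either u
    misses v, or u hits v but neither u1 nor u2 does."""
    p = 1.0
    for j in range(n_right):
        p *= (1.0 - q[0][j]) + q[0][j] * (1.0 - q[1][j]) * (1.0 - q[2][j])
    return p

p_a1, p_a2, p_both = brute_force(q, n_right)
# Identity (C.1): P[A1 ∩ A2] = P[A1] + P[A2] + P[neither] - 1
assert abs(p_both - (p_a1 + p_a2 + neither_prob(q, n_right) - 1.0)) < 1e-12
```

Inclusion–exclusion is used here only to reduce $\mathbb{P}[A_1 \cap A_2]$ to a product over right nodes; all of the analytic work in the proof is in expanding that product.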
Proof of Lemma 3.10.
For nodes $u, u_1, u_2$ to form a triangle, one of two cases must happen. The first is the case that all three nodes connect to the same node in the right partition. If the first case does not happen, then each pair $(u, u_1)$, $(u, u_2)$, $(u_1, u_2)$ has a different common neighbor in the bipartite graph, forming a length-6 cycle. Now we analyze these two cases separately.

In the first case, there exists a node $v \in R$ such that the three nodes $u, u_1, u_2$ are all connected to $v$. For any specific node $v \in R$, the probability is $\frac{w_u w_{u_1} w_{u_2}}{(n_R M_R)^{3/2}} \cdot w_v^3$, and thus
\begin{align*}
\mathbb{P}[\exists v \in R \text{ s.t. } (u,v), (u_1,v), (u_2,v) \in E_b]
&= 1 - \prod_{v \in R} \left(1 - \frac{w_u w_{u_1} w_{u_2}}{(n_R M_R)^{3/2}} \cdot w_v^3\right) \\
&= 1 - \exp\left(-\frac{w_u w_{u_1} w_{u_2}}{(n_R M_R)^{3/2}} \cdot \sum_{v \in R} w_v^3 \cdot (1 + O(n_R^{-\delta}))\right) \\
&= p_{uu_1} p_{uu_2} \cdot \frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{1}{w_u} \cdot (1 + O(n_R^{-\delta})).
\end{align*}
In the second case, $u, u_1, u_2$ are pairwise connected, each pair through a different node on the right, forming a 6-cycle. For any node triple $v_1, v_2, v_3$, the probability is
\[
\mathbb{P}[(u, v_1, u_1, v_2, u_2, v_3) \text{ forms a 6-cycle}] = \frac{w_u^2 w_{u_1}^2 w_{u_2}^2}{(n_R M_R)^3} \cdot w_{v_1}^2 w_{v_2}^2 w_{v_3}^2.
\]
Therefore, the total probability of the second case is
\[
\mathbb{P}[\exists \text{ a 6-cycle containing } u, u_1, u_2] \le \sum_{\substack{v_1, v_2, v_3 \in R \\ v_1 \ne v_2 \ne v_3}} \frac{w_u^2 w_{u_1}^2 w_{u_2}^2}{(n_R M_R)^3} \cdot w_{v_1}^2 w_{v_2}^2 w_{v_3}^2 \le \frac{w_u^2 w_{u_1}^2 w_{u_2}^2}{(n_R M_R)^3} \left(\sum_{v \in R} w_v^2\right)^3 = p_{uu_1} p_{uu_2} p_{u_1 u_2} = o(p_{uu_1} p_{uu_2}).
\]
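The case analysis above can be made concrete on a small hand-built bipartite graph (a hypothetical example, not one of the paper's datasets): one triple of left nodes shares a single right node, while another is pairwise connected through three distinct right nodes, i.e., via a 6-cycle in the bipartite graph.

```python
from itertools import combinations

# Hypothetical bipartite graph: N[left node] = set of right-node neighbors.
N = {
    "a": {1, 2},  # a, b, c all share right node 1 (case 1)
    "b": {1, 3},
    "c": {1, 4},
    "x": {5, 6},  # x, y, z pairwise share distinct right nodes 5, 6, 7 (case 2)
    "y": {5, 7},
    "z": {6, 7},
}

def classify_triangles(N):
    """Split projected triangles into (common right neighbor, 6-cycle only)."""
    case1, case2 = [], []
    for a, b, c in combinations(sorted(N), 3):
        # a projected triangle requires every pair to share some right node
        if N[a] & N[b] and N[b] & N[c] and N[a] & N[c]:
            if N[a] & N[b] & N[c]:
                case1.append((a, b, c))  # one right node covers all three pairs
            else:
                case2.append((a, b, c))  # three distinct common neighbors: a 6-cycle
    return case1, case2

case1, case2 = classify_triangles(N)
# → case1 = [('a', 'b', 'c')], case2 = [('x', 'y', 'z')]
```

Note that in the second branch the three pairwise common neighbors are necessarily distinct (a shared one would lie in the triple intersection), which is why that case always yields a 6-cycle; in the proof, the first case carries the dominant probability mass and the second is the $o(p_{uu_1} p_{uu_2})$ term.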