A simple bipartite graph projection model for clustering in networks
Austin R. Benson∗, Paul Liu†, and Hao Yin‡

Abstract.
Graph datasets are frequently constructed by a projection of a bipartite graph, where two nodes are connected in the projection if they share a common neighbor in the bipartite graph; for example, a coauthorship graph is a projection of an author–publication bipartite graph. Analyzing the structure of the projected graph is common, but we do not have a good understanding of the consequences of the projection on such analyses. Here, we propose and analyze a random graph model to study what properties we can expect from the projection step. Our model is based on a Chung–Lu random graph for constructing the bipartite representation, which enables us to rigorously analyze the projected graph. We show that common network properties such as sparsity, heavy-tailed degree distributions, local clustering at nodes, the inverse relationship between node degree and clustering, and global transitivity can be explained and analyzed through this simple model. We also develop a fast sampling algorithm for our model, which we show is provably optimal for certain input distributions. Numerical simulations where model parameters come from real-world datasets show that much of the clustering behavior in some datasets can just be explained by the projection step.
1. Networks as bipartite projections.
Networks or graphs that consist of a set of nodes and their pairwise interactions are pervasive models throughout the sciences. Oftentimes, network datasets are constructed by a "projection" of a bipartite graph [39, 43, 53, 64]; specifically, given a bipartite graph with left and right nodes, the one-mode projection is a (unipartite) graph on the left nodes, where two left nodes are connected if they share a common right node neighbor in the bipartite graph. In many cases, these projections are explicit in the data construction process, such as connecting diseases associated with the same gene [28], people belonging to the same group or team [45, 51], and ingredients appearing in common recipes [1, 54]. In other cases, the projection is more implicit. For example, the connections in a social network often arise due to shared interests [14]. Regardless, even though a bipartite graph is more expressive than its projection, analyzing the projection still leads to valuable data insights [56, 62], enables the use of standard network analysis tools [9, 37, 63], and can even be used to make predictions about the bipartite graph itself [8].

For network analysis, it is paramount to know if structural properties in the data arise from some phenomena of the system under study or are simply consequences of a mathematical property of the graph construction process. Random graph models can serve as null models for making such distinctions [25]. Often, the random graph model maintains some property of the network data (at least approximately or in expectation), and then direct mathematical analysis of the random graph can be used to determine whether certain structural properties will arise as a consequence.
For example, Chung and Lu showed that short average path lengths can be a consequence of a uniform sample of a random graph with an expected power-law degree distribution [18].

Here, we analyze a simple random graph model that explains some properties of projected graphs. More specifically, the random graph model is a projection of a bipartite "Chung–Lu style" model. Each left and right node in the bipartite graph has a weight, and the probability of an edge is proportional to the product of these weights.

The simplicity of this model enables theoretical analysis of properties of the projected graph. One fundamental property is clustering: even in a sparse network, there is a tendency of edges to appear in small clusters or cliques [24, 46, 57]. There are various explanations for clustering, including local evolutionary processes [31, 29, 52], hierarchical organization [47], and community structure [50]. Here, we show how clustering can arise just from bipartite projection. We derive an explicit equation for the expected value of a probabilistic variant of the local clustering coefficient of a node (the fraction of pairs of neighbors of the node that are connected) as a function of its weight in the model. We show that local clustering decreases with the inverse of the weight, while expected degree grows linearly with the weight, which is consistent with prior empirical measurements [41, 50], mean-field analysis of models that explicitly incorporate clustering [52], and certain random intersection graph models [13].

∗Department of Computer Science, Cornell University, Ithaca, NY, USA ([email protected]). †Department of Computer Science, Stanford University, Stanford, CA, USA ([email protected]). ‡Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA ([email protected]).
Thus, the weights in the bipartite model are a potential confounding factor for this relationship between degree and clustering.

In addition, using weight distributions fit from real-world bipartite graph data, we show that high levels of clustering and clustering levels at a given degree are often just a consequence of bipartite projection. However, in several datasets, there is still a gap between the clustering levels in the data and in the model. Bipartite projection has been mentioned informally as a reason for clustering in several datasets [26, 42, 44], and a recent study has shown that sampling from configuration models of hypergraphs and projecting can also reproduce clustering [17]. Our analysis provides theoretical justifications and further explanations for these claims, and also shows that the global clustering (also called transitivity) tends towards a positive constant as the bipartite network grows large. We also analyze a recently introduced measure of clustering called the closure coefficient [60, 61] under our projection model and find that the expected local closure coefficient of every node is the same, which aligns with some prior empirical results [60].

In addition to clustering, we analyze several properties of the bipartite random graph and its projection. For instance, we show that if the weight distributions on the left and right nodes follow a power law, then the degree distribution for those nodes is also a power law in the bipartite graph; moreover, the degrees in the projected graph will also follow a power law. Thus, heavy-tailed degree distributions in the projected graph can simply be a consequence of a process that creates heavy-tailed degree distributions in the bipartite graph. Furthermore, we show that the projected graph is sparse in the sense that, under a mild restriction on the maximum weight, the probability of an edge between any two nodes goes to zero as the number of nodes in the projected graph grows to infinity.
Combined with our results on clustering, our model thus provides a large class of networks that are "locally dense but globally sparse" [58].
1.1. Preliminaries.

We consider networks as undirected graphs $G = (V, E)$ without self-loops and multi-edges. We use $d(u)$ to denote the degree of node u (the number of edges incident to node u) and $T(u)$ to denote the number of triangles (3-cliques) containing node u. A wedge is a pair of edges that share a common node, and the common node is the center of the wedge. A statistic of primary interest is the clustering coefficient:
Definition 1.1.
The local clustering coefficient of a node $u \in V$ is
$$\tilde{C}(u) = \frac{T(u)}{d(u)(d(u)-1)/2},$$
i.e., the chance that a randomly chosen wedge centered at u induces a triangle. At the network level, the global clustering coefficient $\tilde{C}_G$ is the probability that a randomly chosen wedge in the entire graph induces a triangle, i.e.,
$$\tilde{C}_G = \frac{\sum_{u \in V} T(u)}{\sum_{u \in V} d(u)(d(u)-1)/2}.$$

A closely related measure of clustering is the conditional probability of edge existence given the wedge structure [10, 13, 22]. Specifically, we have the following analogs of the local and global clustering coefficients:
(1.1) $C_G = \mathbb{P}[(v,w) \in E \mid (u,v), (u,w) \in E]$,
where all the nodes $u, v, w \in V$ are unspecified, while the local clustering coefficient is
(1.2) $C(u) = \mathbb{P}[(v,w) \in E \mid (u,v), (u,w) \in E]$,
where u is the specified node. In both cases, $(u,v)$ and $(u,w)$ comprise a random wedge from the graph. In this paper, we use these slightly different definitions of clustering based on conditional edge existence, as they are more amenable to analysis.

An alternative clustering metric is the recently proposed closure coefficient [60, 61].

Definition 1.2.
The local closure coefficient of a node $u \in V$ is
$$\tilde{H}(u) = \frac{2T(u)}{W_h(u)},$$
where $W_h(u)$ is the number of length-2 paths leaving vertex u. In other words, the closure coefficient is the chance that a randomly chosen 2-path emanating from u induces a triangle.

Analogously, the conditional probability variant of the closure coefficient is
(1.3) $H(u) = \mathbb{P}[(u,w) \in E \mid (u,v), (v,w) \in E]$,
where u is the specified node. The global closure coefficient is equal to the global clustering coefficient, as the number of 2-paths is exactly equal to the number of wedges. This is true for both the non-conditional and the conditional probability variants. In Appendix A, we show that the conditional probability definitions above correspond to a weighted average over the standard definitions of clustering and closure. Henceforth, when referring to the clustering or closure coefficients, we always refer to the conditional probability variant.

Next, a graph is bipartite if the nodes can be partitioned into two disjoint subsets $L \sqcup R$, which we call the left and right nodes, and any edge is between one node from L and one node from R. We denote a bipartite graph by $G_b = (V_b, E_b)$ with $V_b = L \sqcup R$, and call L and R the left and right side of the bipartite graph. The number of nodes on each side is denoted by $n_L = |L|$ and $n_R = |R|$, and $n_b = |V_b| = n_L + n_R$ is the total number of nodes. Analogously, for any node $u \in V_b$, we use $d_b(u)$ to denote its degree. The projection of a bipartite graph is the primary concept we analyze.

Definition 1.3.
A projection of a bipartite graph $G_b = (L \sqcup R, E_b)$ is the graph $G = (L, E)$, where the nodes are the left nodes of the bipartite graph and the edges connect any two nodes in L that connect to some common node $r \in R$ in the bipartite graph. More formally,
(1.4) $E = \{(u, v) \mid u, v \in L,\ u \ne v,\ \text{and}\ \exists\, z \in R \text{ for which } (u,z), (v,z) \in E_b\}.$
If there is more than one right node z that connects to left nodes u and v in the bipartite graph, the projection only creates a single edge between u and v. Given a dataset, one can project onto the left or right nodes. One can always permute the left and right nodes, and we assume projection onto the left nodes L for notational consistency.

Several statistical properties of the models we consider will use samples drawn from power law distributions, which are prevalent in network data models [21].

Definition 1.4.
The probability density function of the power law distribution, parametrized by $(\alpha, w_{\min}, w_{\max})$ with $\alpha > 1$ and $0 < w_{\min} < w_{\max} \le \infty$, is
$$f(w) = \begin{cases} C w^{-\alpha} & \text{if } w \in [w_{\min}, w_{\max}] \\ 0 & \text{otherwise,} \end{cases}$$
where $w > 0$ is any real number and $C = (\alpha - 1)/\big(w_{\min}^{1-\alpha} - w_{\max}^{1-\alpha}\big)$ is a normalizing constant. For a discrete power-law (or Zipfian) distribution, we restrict w to integer values inside $[w_{\min}, w_{\max}]$ and adjust the normalization constant accordingly.

The parameter α is the decay exponent of the distribution, while $w_{\min}$ and $w_{\max}$ specify its range. For simplicity, we assume that $w_{\min} = 1$ and $w_{\max} = \Omega(1)$ throughout this paper. When the maximum range is not specified, i.e., $w_{\max} = \infty$, a standard result on the maximum statistics of power-law samples is the following:

Lemma 1.5 (Folklore).
For a discrete or continuous power-law distribution $\mathcal{D}$ with parameters $(\alpha, w_{\min} = 1, w_{\max} = \infty)$ and i.i.d. samples $w_1, w_2, \ldots, w_n \sim \mathcal{D}$, $\mathbb{E}[\max_i w_i] = \Theta\big(n^{1/(\alpha-1)}\big)$.
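Lemma 1.5 is easy to probe numerically. The following short sketch (our illustration, not code from the paper; it assumes NumPy, and the function name `sample_power_law` is ours) draws samples from the continuous power law of Definition 1.4 via inverse-CDF sampling and compares the empirical maximum to the $n^{1/(\alpha-1)}$ scaling:

```python
import numpy as np

def sample_power_law(n, alpha, w_min=1.0, w_max=np.inf, rng=None):
    """Inverse-CDF sampling from the continuous power law of Definition 1.4."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(n)
    a = w_min ** (1.0 - alpha)
    b = 0.0 if np.isinf(w_max) else w_max ** (1.0 - alpha)
    # CDF: F(w) = (a - w^{1-alpha}) / (a - b); solve F(w) = u for w.
    return (a - u * (a - b)) ** (1.0 / (1.0 - alpha))

# Empirical check of Lemma 1.5: the maximum of n samples grows like n^{1/(alpha-1)}.
rng = np.random.default_rng(0)
alpha = 2.5
for n in [10**3, 10**4, 10**5]:
    maxima = [sample_power_law(n, alpha, rng=rng).max() for _ in range(100)]
    # The ratio below should hover around a constant across the three values of n.
    print(n, np.mean(maxima) / n ** (1.0 / (alpha - 1.0)))
```

Because the maximum of heavy-tailed samples has high variance, the printed ratios fluctuate, but they stay of comparable magnitude across the three sample sizes rather than growing or shrinking systematically.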
2. Models for Bipartite Projection.
In this section we formalize our model and give some background on relevant models for projection and graph generation. Our model is an extension of the seminal random graph model from Chung and Lu [18]. The classical Chung–Lu model takes as input a weight sequence S, which specifies a nonnegative weight $w_u$ for each node, and then produces an undirected edge $(u, v)$ with probability $w_u w_v / \sum_z w_z$. To make sure that the probabilities are well defined, the model assumes that $\max_u w_u^2 \le \sum_v w_v$. Along similar lines, Aksoy et al. introduced a Chung–Lu-style bipartite random graph model based on realizable degree sequences [3]. In general, the model we use is quite similar. However, our focus in this paper is to analyze the effects of projection on such models.

2.1. The model.

Our model takes as input the number of left nodes $n_L$, the number of right nodes $n_R$, and two sequences of weights $S_L$ and $S_R$ for the left and right nodes. We denote the weight of any node u by $w_u$. The model then samples a random bipartite graph $G_b = (L \sqcup R, E_b)$, where
(2.1) $\mathbb{P}[(u,v) \in E_b \mid w_u, w_v] = \min\left(\frac{w_u w_v}{\sum_{z \in R} w_z},\ 1\right), \quad u \in L,\ v \in R.$
After generating the bipartite graph, we project it following Definition 1.3; the projection is itself a random graph. This model is similar to the inhomogeneous random intersection graph [12] (see subsection 2.3 for more details).
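To make the model concrete, here is a minimal and deliberately naive reference implementation (our sketch, assuming NumPy; it is not the paper's code). It performs the full $n_L \times n_R$ coin flips of Equation (2.1) and then applies the projection of Definition 1.3, so it is only practical for small graphs:

```python
import numpy as np
from itertools import combinations

def sample_bipartite_chung_lu(wL, wR, rng=None):
    """Naive O(n_L * n_R) sampler: edge (u, v) appears independently with
    probability min(w_u * w_v / sum(wR), 1), as in Equation (2.1)."""
    rng = np.random.default_rng() if rng is None else rng
    P = np.minimum(np.outer(wL, wR) / wR.sum(), 1.0)
    return rng.random(P.shape) < P       # boolean bipartite adjacency matrix

def project_left(A):
    """Projection onto the left nodes (Definition 1.3): u1 ~ u2 iff they share
    at least one right neighbor; repeated co-neighbors collapse to one edge."""
    edges = set()
    for u1, u2 in combinations(range(A.shape[0]), 2):
        if np.any(A[u1] & A[u2]):
            edges.add((u1, u2))
    return edges

rng = np.random.default_rng(0)
wL = rng.pareto(2.0, size=200) + 1.0     # illustrative heavy-tailed weights
wR = rng.pareto(2.0, size=300) + 1.0
A = sample_bipartite_chung_lu(wL, wR, rng)
E = project_left(A)
print(len(E), "projected edges")
```

The quadratic cost of this sampler is exactly the practical concern that motivates the fast sampling algorithm of Section 4.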
Our analysis will depend on properties of $S_L$ and $S_R$ and the moments of these sequences. We denote the k-th-order moments of $S_L$ and $S_R$ by $M_{L,k}$ and $M_{R,k}$ for integers $k \ge 1$:
(2.2) $M_{L,k} = \frac{1}{n_L} \sum_{u \in L} w_u^k, \qquad M_{R,k} = \frac{1}{n_R} \sum_{v \in R} w_v^k.$
With this notation, we can re-write the edge probabilities as
(2.3) $\mathbb{P}[(u,v) \in E_b \mid w_u, w_v] = \min\left(\frac{w_u w_v}{n_R M_{R,1}},\ 1\right).$

Remark (rescaling of $S_R$). The edge probabilities are invariant to a rescaling of the weights in $S_R$. Thus we can assume without loss of generality that $n_R \mathbb{E}[M_{R,1}] = n_L \mathbb{E}[M_{L,1}]$. This corresponds to the natural condition that the expected degree sums of the left and right side are equal.

A practical concern is how efficiently we can sample from this model, as naive sampling of the bipartite graph requires $n_L n_R$ coin flips. There are fast sampling heuristics for the bipartite graph, based on sampling each node in an edge individually for some pre-specified number of edges [3]. We develop a fast sampling algorithm in Section 4 that has some theoretical optimality guarantees for sequences $S_L$ and $S_R$ with certain properties.

2.2. Comparison with configuration models.

Much of our motivation for random graph models is that they provide a baseline for what graph properties we might expect in network data just from a simple underlying random process (in our case, we are particularly interested in what graph properties we can expect from projection). In turn, this helps researchers determine which properties of the data are interesting or inherent to the system modeled by the graph. While Chung–Lu models aim to preserve input degree sequences in expectation, configuration models preserve degrees exactly, sampling from the space of graphs with a specified degree sequence [25]. Configuration models for bipartite graphs have only been studied in earnest recently [17], where the goal is to sample bipartite graphs with a specified degree sequence for the left and right nodes.
A bipartite configuration model inherits many benefits of a standard configuration model; for instance, the degree sequence is preserved exactly, creating an excellent null model for a given dataset. At the same time, configuration models carry some restrictions. First, the random events on the existence of two edges are dependent (though weakly). To see this, in a stub-labeled bipartite graph, if we condition on an edge existing between $u \in L$ and $v \in R$, then there is one fewer stub for each node, making them less likely to connect to other nodes. This makes theoretical analysis difficult. Second, to generate a random graph, a configuration model needs a degree sequence that is realizable. While the Gale–Ryser theorem provides a simple way to check if a candidate bipartite degree sequence is realizable [48], configuration models typically analyze a given input graph rather than a class of input graphs with some property. Third, efficient uniform sampling algorithms rely on Markov chain Monte Carlo, for which it is extremely difficult to obtain reasonable mixing time bounds.

The Chung–Lu approach (for either bipartite or unipartite graphs) sacrifices control over the exact degree sequence for easier theoretical analysis while maintaining the expected degree sequence. Unlike the configuration model, the existences of two distinct edges are independent events, there is no need to specify a realizable degree sequence, and samples can be immediately generated. In unipartite graphs, this has led to remarkable results on random graphs with expected power-law degree sequences, such as small average node distance and diameter [18], the existence of a giant connected component [19], and spectral properties [20].
2.3. Other related models.

There are random graph models for bipartite graphs that are motivated by how the projection step can lose information about community structure in the data [30, 33]. While these identify possible issues with the projection, we are motivated by the fact that several datasets are constructed via projection, either implicitly or explicitly. There are also many models based on communities, where edge probabilities depend on community membership [2, 32, 50, 59]. These models can be interpreted as probabilistic projections of node–community bipartite graphs. Such models are typically fit from data to reveal cluster structure; that type of analysis is not the focus of this paper.

There are a few random graph models where a random bipartite graph is deterministically projected [7, 17, 35, 58]. Some of these have specifically considered clustering, which is of primary interest for us. A recent example is the configuration model for hypergraphs [17], which can be interpreted as a bipartite random graph model: the nodes in the hypergraph are the left nodes in the bipartite graph, and the right nodes in the bipartite graph correspond to edges in the hypergraph. Chodrow [17] found that the clustering of projections of bipartite representations of several real-world hypergraph datasets was similar to or even less than the clustering of projections of samples from the configuration model. Similar empirical results have been found on related datasets, under a model that samples the degrees of the left nodes in the bipartite graph according to a distribution learned from the data and connects the edges to the right nodes uniformly at random [26]. Our theoretical analysis provides additional grounding for these empirical results, and our model provides a Chung–Lu-style alternative to the configuration model approach.

In terms of theoretical results, the models most related to ours are random intersection graphs [10, 27] and random clique covers [58].
In these models, a graph is constructed by sampling n sets from a universe of size m according to a distribution $\mathcal{D}$. A node is associated to each of the n sets, and two nodes in the graph are adjacent if their sets overlap. This is equivalent to representing the sets as an n-by-m bipartite graph and then projecting the graph onto the left nodes. Such models can also produce several key properties of projected graphs in practice, such as power-law degree distributions and negative correlation of clustering and projected degree. In contrast to these approaches, our model can specify degree distributions on both sides of the bipartite graph, as opposed to just one side. Inhomogeneous random intersection graphs also support arbitrary degree distributions on both sides [12, 13], and justify the negative correlation of local clustering and projected degree. In comparison, our analysis is conducted conditional on the degree sequence, which is potentially generated from a distribution with infinite moments, and thus requires a weaker and more realistic assumption on the degree distribution than results from Bloznelis and Petuchovas [11, 13]; however, their results work directly with projected degrees, which is advantageous.
3. Theoretical Properties of the Projection Model.
In this section we provide results for graph statistics on the projected graph, such as the degree distribution, clustering coefficients, and closure coefficients. For intuition, one may think of the input weight distributions to our model as the degree distribution of a class of input graphs. As we show in Section 5, these input weights often follow a power law distribution in real-world datasets. Due to the simplicity of our model, it is possible to derive analytical expressions when the input weight distribution follows a power law (Definition 1.4).

At a high level, for a broad range of weight distributions (including power-law distributions), the projected graph has the following properties.
1. The projected graph is sparse (edge probabilities go to zero).
2. Expected local clustering at a node decays with the node's weight, and the node's weight is directly proportional to its degree in expectation.
3. Expected local closure at a node is the same for all nodes.
4. Global clustering and closure (transitivity) is a positive constant. In other words, clustering does not go to zero as the graph grows large.

Besides theoretical analysis, we also verify some key results with simulations, which rely on a fast sampling algorithm that we develop in Section 4.
3.1. Assumptions.

Our analysis is conditional on the general input weight sequences on both sides of the bipartite graph. We first assume that the normalized product of weights is at most one, so that the min in the edge existence probability (Equation (2.1)) is never active.
Assumption 1 (Well-defined probabilities).
The weights in the sequences $S_L$ and $S_R$ satisfy $\frac{w_u w_v}{n_R M_{R,1}} \le 1$ for any nodes $u \in L$, $v \in R$.

Moreover, our analysis is asymptotic, meaning that the results hold with high accuracy on large networks, i.e., $n_L, n_R \to \infty$. For any two quantities f and g, we use the following big-O notations in the limit of $n_L, n_R \to \infty$: $f = o(g)$ if $f/g \to 0$; $f = O(g)$ if $f/g$ is bounded; and $f = \Omega(g)$ if $f/g$ is bounded away from 0. We make the following assumption on the range and moments of the weight sequences.

Assumption 2 (Bounded weight sequences).
There exists a constant δ > 0 such that
• (bounded range) $\max[S_L, S_R] = O\big(n_R^{1/2-\delta}\big)$ and $\min[S_L] = \Omega(1)$, and
• (bounded $S_R$ moments) $M_{R,2} = O(M_{R,1})$ and $M_{R,4} = O\big(n_R^{1-2\delta}\big)$,
as $n_L, n_R \to \infty$.

Assumption 2 actually specifies a family of assumptions parameterized by δ, with larger δ imposing stronger assumptions. Unless otherwise stated, we only require that δ > 0. In the theoretical analysis of the clustering coefficient, we sometimes require δ > 1/10. Note that we do not assume anything about the relative rates at which $n_L, n_R \to \infty$, or any direct relationships between $n_L$ and $n_R$. This makes our assumptions weaker than a wide range of assumptions typical in the literature, such as having $n_L = \beta n_R^{\sigma}$ for certain constants β, σ > 0.

Proposition 3.1.
If the sequences $S_L$ and $S_R$ are independently generated from power-law distributions with $w_{\max} = n_R^{1/2-\delta}$, and the right-side distribution has decay exponent $\alpha_R > 3$, then Assumption 2 is satisfied.
Proof.
The bounded range requirement is automatically satisfied because the weights are capped at $w_{\max}$, so we focus on the bounded moment requirement.

Let W be the random variable denoting a sample weight in $S_R$. Since $M_{R,1} \ge 1$, due to the law of large numbers, it suffices to show that $\mathbb{E}[W^2] < \infty$ and $\mathbb{E}[W^4] = O\big(n_R^{1-2\delta}\big)$ as $n_R \to \infty$. The first result can be easily verified, and when $\alpha_R \ge 5$, a straightforward computation shows that $\mathbb{E}[W^4] = O(\log n_R)$. When $\alpha_R \in (3, 5)$,
$$\mathbb{E}[W^4] = \int_1^{n_R^{1/2-\delta}} C_{\alpha,\delta} \cdot w^{-\alpha+4} \, \mathrm{d}w = \frac{C_{\alpha,\delta}}{5-\alpha} \Big( n_R^{(5-\alpha)(1/2-\delta)} - 1 \Big) = O\big(n_R^{1-2\delta}\big),$$
where $C_{\alpha,\delta} = (\alpha - 1)\big/\big(1 - n_R^{(1-\alpha)(1/2-\delta)}\big) = O(1)$ is the normalizing constant.

Therefore, Assumption 2 is satisfied when the weight sequences are generated from power-law distributions with only a mild requirement on the decay exponent. In contrast, some results require constant weights on the right side [10, 22] or a larger decay exponent $\alpha_R$ [13]. Since $M_{R,1} \ge 1$, Assumption 1 is a direct consequence of Assumption 2 for large graphs, since $\max[S_L, S_R] = o(\sqrt{n_R})$. Henceforth, for our theoretical analysis, we assume that both Assumption 1 and Assumption 2 are satisfied.

As a final note, a direct consequence of Assumption 2 is that $\mathbb{P}[(u,v) \in E_b \mid w_u, w_v] \to 0$ whenever $w_u w_v = o(n_R)$, meaning that the bipartite network is sparse.

3.2. Degree distributions.

In this section, we study the degree distribution in the bipartite graph with respect to a given input weight distribution.
Theorem 3.2.
For any node $u \in L$, conditional on u's weight $w_u$, the bipartite degree $d_b(u)$ of u converges in distribution to a Poisson random variable with mean $w_u$ as $n_R \to \infty$. Analogously, for any $v \in R$, conditional on $w_v$, $d_b(v)$ converges in distribution to a Poisson random variable with mean $w_v$ as $n_L \to \infty$.

Proof. By symmetry, we just need to prove the result for a node $u \in L$. For any $v \in R$, the indicator $\mathbb{1}[(u,v) \in E_b]$ is a Bernoulli random variable with success probability $\frac{w_u w_v}{n_R M_{R,1}}$. By a Taylor expansion, its characteristic function can be written as
$$\varphi_{uv}(t) = 1 + (e^{it} - 1)\frac{w_u w_v}{n_R M_{R,1}} = e^{\frac{w_u w_v}{n_R M_{R,1}}(e^{it}-1)\cdot(1+o(1))},$$
where the $o(1)$ term comes from the bounded range condition in Assumption 2. The bipartite degree of node u is the sum of these indicators over all nodes $v \in R$, which are independent random variables. Thus, the characteristic function of $d_b(u)$ can be written as
$$\varphi_{d_b(u)}(t) = \prod_{v \in R} \varphi_{uv}(t) = e^{w_u \frac{\sum_{v \in R} w_v}{n_R M_{R,1}}(e^{it}-1)\cdot(1+o(1))} \to e^{w_u(e^{it}-1)}.$$
The limiting characteristic function is the characteristic function of a Poisson random variable with mean $w_u$. Thus, $d_b(u)$ converges in distribution to a Poisson random variable with mean $w_u$ by Lévy's continuity theorem.
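The Poisson limit of Theorem 3.2 is easy to observe in simulation. The following sketch (our illustration, assuming NumPy; the right-side weight distribution is an arbitrary heavy-tailed choice, not one from the paper) repeatedly resamples the bipartite neighborhood of a single left node with weight $w_u = 5$ and checks that the mean and variance of its degree are both close to $w_u$, as a Poisson($w_u$) limit predicts:

```python
import numpy as np

rng = np.random.default_rng(0)
n_R = 10000
w_u = 5.0                               # weight of the tracked left node
wR = rng.pareto(3.0, size=n_R) + 1.0    # illustrative right-side weights

# Edge probabilities from Equation (2.1): w_u * w_v / (sum of right weights).
p = np.minimum(w_u * wR / wR.sum(), 1.0)

# Resample u's bipartite degree many times; Theorem 3.2 predicts Poisson(w_u),
# whose mean and variance are both w_u.
degrees = np.array([(rng.random(n_R) < p).sum() for _ in range(1000)])
print("mean:", degrees.mean(), " variance:", degrees.var())
```

Note that the degree is a sum of independent Bernoulli variables whose success probabilities sum to exactly $w_u$, so the mean matches exactly and only the Poisson shape is asymptotic.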
One corollary of Theorem 3.2 is that, in the limit, the expected degree of any node u is its weight $w_u$, which provides an interpretation of the node weights. Next, we show that if the weights are independently generated from a power-law distribution, then the degrees in the bipartite graph are power-law distributed as well.

Theorem 3.3.
Suppose that the node weights on the left are independently sampled from a continuous power-law distribution with exponent $\alpha_L$. Then, for any node $u \in L$, as $n_R \to \infty$, we have that $\mathbb{P}[d_b(u) = k] \propto k^{-\alpha_L}$ for large k. Similarly, suppose that the node weights on the right are independently sampled from a continuous power-law distribution with exponent $\alpha_R$. Then, for any node $v \in R$, as $n_L \to \infty$, we have that $\mathbb{P}[d_b(v) = k] \propto k^{-\alpha_R}$ for large k.

Proof. Again, by symmetry, we only need to show the result for a node on the left. For any node $u \in L$, according to Theorem 3.2, its bipartite degree distribution converges to a Poisson distribution with mean $w_u$. For any integer $k > \alpha_L$,
$$\begin{aligned}
\mathbb{P}[d_b(u) = k] &= \int_1^{w_{\max}} \mathbb{P}[d_b(u) = k \mid w_u = w] \cdot f_L(w)\, \mathrm{d}w = C \int_1^{w_{\max}} \frac{e^{-w} w^k}{k!} \cdot w^{-\alpha_L} \, \mathrm{d}w \\
&= \frac{C}{k!}\left( \int_0^{\infty} e^{-w} w^{k-\alpha_L}\, \mathrm{d}w - \int_0^{1} e^{-w} w^{k-\alpha_L}\, \mathrm{d}w - \int_{w_{\max}}^{\infty} e^{-w} w^{k-\alpha_L}\, \mathrm{d}w \right) \\
&= \frac{C}{\Gamma(k+1)}\big(\Gamma(k - \alpha_L + 1) - O(1)\big) \to C k^{-\alpha_L}(1 + o(1)).
\end{aligned}$$
Here, C is a normalizing constant. The second-to-last step uses the fact that $w_{\max} = \Omega(1)$, and the last step follows because $\Gamma(k - \alpha_L + 1)/\Gamma(k+1) \to k^{-\alpha_L}$ as $k \to \infty$.

To study the edge density and degree distribution in the projected graph, we use the following quantity:
(3.1) $p_{u_1 u_2} := \frac{M_{R,2}}{M_{R,1}^2} \cdot \frac{w_{u_1} w_{u_2}}{n_R}.$
The following theorem shows that $p_{u_1 u_2}$ is the asymptotic edge existence probability between the two nodes $u_1$ and $u_2$ in the projected graph. Note that under Assumption 2, we have $w_{u_1}, w_{u_2} = O\big(n_R^{1/2-\delta}\big)$ and thus $p_{u_1 u_2} = O\big(n_R^{-2\delta}\big) = o(1)$, so the projected graph is sparse as the number of nodes goes to infinity.

Theorem 3.4.
For any $u_1, u_2 \in L$, as $n_R \to \infty$, we have
$$\mathbb{P}[(u_1,u_2) \in E \mid S_L, S_R] = p_{u_1u_2} - \frac{p_{u_1u_2}^2}{2}\left(1 - \frac{M_{R,4}}{n_R M_{R,2}^2}\right) \cdot \big(1 + O(n_R^{-\delta})\big).$$

Proof.
We consider the complementary event that $u_1$ and $u_2$ are not connected in the projected graph. This happens exactly when every node $v \in R$ is connected to at most one of $u_1$ and $u_2$ in the bipartite graph. For each single node $v \in R$, this happens with probability $1 - \frac{w_{u_1} w_{u_2} w_v^2}{n_R^2 M_{R,1}^2}$. Therefore,
$$\begin{aligned}
\log \mathbb{P}[(u_1,u_2) \notin E \mid S_L, S_R] &= \sum_{v \in R} \log\left(1 - \frac{w_{u_1} w_{u_2} w_v^2}{n_R^2 M_{R,1}^2}\right) \\
&= \sum_{v \in R} \left[ -\frac{w_{u_1} w_{u_2} w_v^2}{n_R^2 M_{R,1}^2} - \frac{w_{u_1}^2 w_{u_2}^2 w_v^4}{2 n_R^4 M_{R,1}^4} \cdot \big(1 + O(n_R^{-\delta})\big) \right] \\
&= -p_{u_1u_2} - \frac{M_{R,4}}{2 n_R M_{R,2}^2}\, p_{u_1u_2}^2 \cdot \big(1 + O(n_R^{-\delta})\big).
\end{aligned}$$
Consequently,
$$\mathbb{P}[(u_1,u_2) \in E \mid S_L, S_R] = p_{u_1u_2} - \frac{p_{u_1u_2}^2}{2}\left(1 - \frac{M_{R,4}}{n_R M_{R,2}^2}\right) \cdot \big(1 + O(n_R^{-\delta})\big).$$

We now examine the expected degree distribution of the projected graph. One concern is the possibility of multi-edges in our definition of a projection, which occurs when two nodes $u_1, u_2 \in L$ have more than one common neighbor in the bipartite graph. The following lemma shows that the probability of having multi-edges conditional on edge existence is negligible, meaning that we can ignore the case of multi-edges with high probability.

Lemma 3.5.
Let $u_1, u_2 \in L$, and let $N_{u_1u_2}$ be the number of common neighbors of $u_1$ and $u_2$ in the bipartite graph. Then $\mathbb{P}[N_{u_1u_2} \ge 2 \mid S_L, S_R, (u_1,u_2) \in E] = O(p_{u_1u_2})$ as $n_R \to \infty$.

Proof. Note that it suffices to show that $\mathbb{P}[N_{u_1u_2} \ge 2 \mid S_L, S_R] = O(p_{u_1u_2}^2)$. By the tail formula for expected values,
$$\mathbb{E}[N_{u_1u_2} \mid S_L, S_R] = \sum_{k=1}^{\infty} k \cdot \mathbb{P}[N_{u_1u_2} = k \mid S_L, S_R] \ge 2\, \mathbb{P}[N_{u_1u_2} \ge 2 \mid S_L, S_R] + \mathbb{P}[N_{u_1u_2} = 1 \mid S_L, S_R] = \mathbb{P}[N_{u_1u_2} \ge 2 \mid S_L, S_R] + \mathbb{P}[N_{u_1u_2} \ge 1 \mid S_L, S_R].$$
Note that we also have $\mathbb{E}[N_{u_1u_2} \mid S_L, S_R] = \sum_{v \in R} \mathbb{P}[(u_1,v), (u_2,v) \in E_b \mid S_L, S_R] = p_{u_1u_2}$, and consequently
$$\mathbb{P}[N_{u_1u_2} \ge 2 \mid S_L, S_R] \le p_{u_1u_2} - \mathbb{P}[N_{u_1u_2} \ge 1 \mid S_L, S_R] \le \tfrac{1}{2} p_{u_1u_2}^2 + o(p_{u_1u_2}^2).$$
The last inequality uses the fact that the event $N_{u_1u_2} \ge 1$ is equivalent to the edge $(u_1,u_2)$ appearing in the projected graph, which happens with probability $p_{u_1u_2} - \tfrac{1}{2} p_{u_1u_2}^2 + o(p_{u_1u_2}^2)$ by Theorem 3.4.

Now we are ready to analyze the degree of a node in the projected graph. The following theorem says that the expected degree of a node in the projected graph is directly proportional to the weight of the node. Thus, at least in expectation, we can think of the weight as a proxy for degree.

Theorem 3.6.
For any $u \in L$, as $n_L, n_R \to \infty$, we have
$$\mathbb{E}[d(u) \mid S_L, S_R] = \frac{M_{R,2} M_{L,1}}{M_{R,1}^2} \cdot \frac{n_L}{n_R} \cdot w_u \cdot (1 + o(1)).$$
Proof.
By Theorem 3.4,
$$\mathbb{E}[d(u) \mid S_L, S_R] = \sum_{u' \in L,\, u' \ne u} \mathbb{P}[(u,u') \in E \mid S_L, S_R] = \sum_{u' \in L,\, u' \ne u} \frac{w_u w_{u'}}{n_R} \cdot \frac{M_{R,2}}{M_{R,1}^2} \cdot (1 + o(1)) = \frac{M_{R,2} M_{L,1}}{M_{R,1}^2} \cdot \frac{n_L}{n_R} \cdot w_u \cdot (1 + o(1)).$$

By Theorem 3.3, the bipartite degree distributions of the left and right nodes are power-law distributions with exponents $\alpha_L$ and $\alpha_R$. For such bipartite graphs, Nacher and Akutsu [38] showed that the degree sequence of the projected graph follows a power law distribution.

Corollary 3.7 (Section 2, [38]).
Suppose the node weights on the left and right follow power-law distributions with exponents $\alpha_L$ and $\alpha_R$. Then the degree distribution of the projected graph is a power-law distribution with decay exponent $\min(\alpha_L, \alpha_R - 1)$.

When $\alpha_R \in (3, 4)$, the decay exponent of the projected graph can thus lie in $(2, 3)$; we find $\alpha_R \in (3, 4)$ for several real-world bipartite networks that we analyze (Appendix B).
3.3. Clustering and closure coefficients.

In this section we compute the expected values of the clustering and closure coefficients. Theorem 3.8 rigorously analyzes the expected value of local clustering coefficients on networks generated from projections of general bipartite random graphs. Our results show how (for a broad class of random graphs) the expected local clustering coefficient varies with the node weight: it decays at a slower rate for small weights and then decays as the inverse of the weight for large weights. Combined with the result that the expected projected degree is proportional to the node weight (Theorem 3.6), this says that there is an inverse correlation of node degree with the local clustering coefficient, which we also verify with simulation. This has long been a noted empirical property of complex networks [41], and our analysis provides theoretical grounding, along with other recent results [11, 13].
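As an illustrative check of this inverse relationship (our sketch, not the paper's actual experiments; it assumes NumPy and uses arbitrary heavy-tailed weights), one can sample the model, project it, and compare each node's degree against its standard local clustering coefficient; with heavy-tailed weights the correlation is typically negative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_L, n_R = 400, 600
wL = rng.pareto(2.5, size=n_L) + 1.0   # illustrative heavy-tailed weights
wR = rng.pareto(2.5, size=n_R) + 1.0

# Sample the bipartite graph (Equation (2.1)) and project onto the left nodes.
B = rng.random((n_L, n_R)) < np.minimum(np.outer(wL, wR) / wR.sum(), 1.0)
Bi = B.astype(np.int64)
adj = (Bi @ Bi.T) > 0                  # u1 ~ u2 iff they share a right neighbor
np.fill_diagonal(adj, False)

deg = adj.sum(axis=1)

def local_clustering(u):
    """Standard local clustering coefficient (Definition 1.1)."""
    nbrs = np.flatnonzero(adj[u])
    if len(nbrs) < 2:
        return np.nan                  # undefined for degree < 2
    links = adj[np.ix_(nbrs, nbrs)].sum() / 2
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

cc = np.array([local_clustering(u) for u in range(n_L)])
ok = ~np.isnan(cc)
print("degree-clustering correlation:", np.corrcoef(deg[ok], cc[ok])[0, 1])
```

This is only a single-sample illustration; the formal statement and rate appear in Theorem 3.8 and the simulations of Section 5.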
Theorem 3.8.
If the weight assumption is satisfied with $\delta > 1/10$, then conditioned on $S_L$ and $S_R$, for any node $u \in L$ we have in the projected graph that
\[ C(u) = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} w_u} + o(1). \]
Besides the trend of how the local clustering coefficient decays with node weight, we highlight how the moments of $S_R$ influence the clustering coefficient. If the distribution of $S_R$ has a heavier tail, then $\frac{(M_R^{(2)})^2}{M_R M_R^{(3)}}$ is small (via Cauchy-Schwarz), and one would expect higher local clustering compared to cases where $S_R$ is light-tailed [13] or uniform [10, 22]. We also observe this higher level of clustering in simulations (Figure 5.1).
We break the proof of Theorem 3.8 into several lemmas. From this point on, we assume $\delta > 1/10$. We first present the following results on the limiting probability of wedge and triangle existence, with proofs given in Appendix C.
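The moment ratio of $S_R$ that governs local clustering is easy to compute directly from a weight sequence. A small sketch (the helper names are ours) showing that a heavier-tailed $S_R$ yields a smaller ratio, and hence higher expected local clustering:

```python
def moment(ws, k):
    """k-th empirical moment of a weight sequence."""
    return sum(w ** k for w in ws) / len(ws)

def clustering_ratio(ws):
    """(M^(2))^2 / (M * M^(3)); at most 1 by Cauchy-Schwarz."""
    return moment(ws, 2) ** 2 / (moment(ws, 1) * moment(ws, 3))

uniform = [2] * 1000            # light-tailed: every node has weight 2
heavy = [1] * 990 + [50] * 10   # a few very heavy nodes

print(clustering_ratio(uniform))  # 1.0: uniform weights maximize the ratio
print(clustering_ratio(heavy))    # well below 1
```

A smaller ratio shrinks the denominator term that multiplies $w_u$, so clustering stays high out to larger weights.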
Lemma 3.9. As $n_R \to \infty$, for any node triple $(u_1, u, u_2)$, the probability that they form a wedge centered at $u$ is
\[ P[(u, u_1), (u, u_2) \in E \mid S_L, S_R] = \left(1 + \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_u}\right) p_{u u_1} p_{u u_2} \cdot (1 + o(1)). \]
Lemma 3.10.
In the limit $n_R \to \infty$, the probability that a node triple $(u_1, u, u_2)$ forms a triangle is
\[ P[(u, u_1), (u, u_2), (u_1, u_2) \in E \mid S_L, S_R] = p_{u u_1} p_{u u_2} \cdot \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_u} \cdot (1 + o(1)) + o(p_{u u_1} p_{u u_2}). \]
Now we have the following key result on the conditional probability of triadic closure.
Lemma 3.11.
In the limit $n_L, n_R \to \infty$, if a node triple $(u_1, u, u_2)$ forms a wedge, then the probability of this wedge being closed is
\[ P[(u_1, u_2) \in E \mid S_L, S_R, (u, u_1), (u, u_2) \in E] = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} w_u} + o(1). \]
Proof.
By combining the results of Lemmas 3.9 and 3.10, we have
\[ P[(u_1, u_2) \in E \mid S_L, S_R, (u, u_1), (u, u_2) \in E] = \frac{P[(u, u_1), (u, u_2), (u_1, u_2) \in E \mid S_L, S_R]}{P[(u, u_1), (u, u_2) \in E \mid S_L, S_R]} = \frac{p_{u u_1} p_{u u_2} \cdot \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_u} \cdot (1 + o(1)) + o(p_{u u_1} p_{u u_2})}{\left(1 + \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_u}\right) p_{u u_1} p_{u u_2} \cdot (1 + o(1))} = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} w_u} + o(1). \]
Finally, we are ready to prove our main result.
Proof of Theorem 3.8. According to Equation (1.2), the local clustering coefficient is the conditional probability that a randomly chosen wedge centered at node $u$ forms a triangle. Lemma 3.11 shows that this probability is asymptotically the same regardless of the weights on the wedge endpoints $u_1, u_2$. Therefore, conditioned on $S_L$ and $S_R$, we have
\[ C(u) = P[(u_1, u_2) \in E \mid S_L, S_R, (u, u_1), (u, u_2) \in E] = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} w_u} + o(1). \]
Figure 3.1 shows the mean conditional local clustering coefficient of a projected graph as a function of node weight $w_u$ for networks with $n_L = n_R = 10{,}000{,}000$ and weights drawn from discrete power-law distributions with different decay parameters. We cap the maximum value of the weights at a fixed power of $n_L$, corresponding to a constant $\delta$.
We next analyze the global clustering coefficient (transitivity) of the projected graph. The following theorem says that the global clustering coefficient tends to a constant bounded away from 0. Theorem 3.12.
If the weight assumption is satisfied with $\delta > 1/10$, then conditioned on $S_L$ and $S_R$, we have in the projected graph that
\[ C_G = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} \cdot \frac{M_L^{(2)}}{M_L}} + o(1). \]
Figure 3.1: Conditional local clustering coefficient distribution on simulated graphs as a function of node weight $w_u$, where left and right node weights are sampled from a discrete power-law distribution with decay rates $\alpha_L$ and $\alpha_R$ (panels: $\alpha_L = 3.0$, $\alpha_R = 3.0$ and $\alpha_L = 2.5$, $\alpha_R = 3.5$). The dots are the mean conditional local clustering coefficients for all nodes with that weight, and the curve is the prediction from Theorem 3.8.
Proof.
Let $W$ be the set of wedges in $G$ and $T$ be the set of triangles. We first show that the global clustering coefficient is always well-defined, i.e., $P[|W| \geq 1] \geq 1 - \exp(-O(n_R))$. We show that with high probability, some node on the right partition has degree at least 3. This implies that a triangle exists in the graph and therefore a wedge exists. For any given node $v$ on the right, its expected degree is $w_v$ by Theorem 3.2, and the degrees follow a Poisson distribution. By standard concentration bounds [16], $P[d_b(v) \leq 2] \leq \exp(-\Omega(w_v))$ for $w_v$ larger than 2 (in particular, this probability is less than $1/2$ for sufficiently large $w_v$), so
\[ P[|W| \geq 1] \geq 1 - \left(\tfrac{1}{2}\right)^{O(n_R)}. \]
Next, we note that the probabilities computed in Lemma 3.11 remain unchanged when conditioned on the fact that at least one wedge exists. Let $E$ be the event that some wedge $(u_1, u, u_2)$ closes into a triangle (with $u$ as the center of the wedge). Then
\[ P[E \cap \{|W| \geq 1\}] \geq P[E] - (1 - P[|W| \geq 1]) \geq P[E] - \left(\tfrac{1}{2}\right)^{O(n_R)}, \]
and consequently
\[ P[E] - \left(\tfrac{1}{2}\right)^{O(n_R)} \leq P[E \mid |W| \geq 1] \leq P[E] + \left(\tfrac{1}{2}\right)^{O(n_R)}. \]
Finally, $P[E] = \Omega(n_R^{-O(1)})$ for any of the events we previously considered, so the exponentially small deviation does not produce any additional error in our results.
For any node $u$, the probability that a random wedge has center $u$ is proportional to the number of wedges centered at $u$. By our reasoning above, we can assume at least one wedge exists, so these probabilities sum to 1.
Figure 3.2: Expected (via Theorem 3.12) and sampled global clustering coefficients on simulated graphs with discrete power-law weight distributions on the left and right nodes with decay rates $\alpha_L$ and $\alpha_R$ (panels: $\alpha_L = 2.5$ and $\alpha_L = 4.0$). The samples are close to the expected value.
By Lemma 3.11, we have
\[ P[u \text{ is the center node}] = \frac{\sum_{b, c \in L} \left(1 + \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_u}\right) p_{ub} p_{uc}}{\sum_{a, b, c \in L} \left(1 + \frac{M_R^{(3)} M_R}{(M_R^{(2)})^2 w_a}\right) p_{ab} p_{ac}} + o(1). \]
Putting everything together,
\[ C_G = \sum_{u \in L} P[(u_1, u, u_2) \in T \mid (u_1, u, u_2) \in W] \cdot P[u \text{ is the center}] = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} \cdot \frac{M_L^{(2)}}{M_L}} + o(1), \]
where the probability is taken over all $u_1, u_2 \in L$ and the second equality uses Lemma 3.11 for the probability that $(u_1, u, u_2) \in T$.
Figure 3.2 shows the expected (computed from Theorem 3.12) and actual global clustering coefficients of the projected graph with $n_L = n_R = 1{,}000{,}000$. The weights are drawn from a discrete power-law distribution with a fixed decay rate $\alpha_L$ on the left nodes and a varying $\alpha_R$ on the right nodes, with the maximum weight capped at a fixed power of $n_L$. The sampled global clustering coefficients are close to the expectation at all parameter values.
Finally, we investigate the local closure coefficient $H(u)$. Analysis under the configuration model predicts that $H(u)$ should be proportional to the node degree, while empirical analysis demonstrates a much slower increasing trend versus degree, or even a constant relationship in a coauthorship network that is directly generated from a bipartite graph projection [60]. The following result theoretically justifies this phenomenon, showing that the expected value of the local closure coefficient is independent of node weight. Theorem 3.13.
If the weight assumption is satisfied with $\delta > 1/10$, then conditioned on $S_L$ and $S_R$ we have, in the projected graph,
\[ H(u) = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} \cdot \frac{M_L^{(2)}}{M_L}} + o(1) \]
as $n_R \to \infty$; i.e., the expected closure coefficient is asymptotically independent of node weight.
Figure 3.3: Conditional local closure coefficient distribution on simulated graphs as a function of node weight $w_u$, where left and right node weights are sampled from a discrete power-law distribution with decay rates $\alpha_L$ and $\alpha_R$ (panels: $\alpha_L = 3.0$, $\alpha_R = 3.0$ and $\alpha_L = 2.5$, $\alpha_R = 3.5$). The dots are the mean conditional local closure coefficients for all nodes with that weight, and the flat curve is the prediction from Theorem 3.13. Weights with fewer than 5 nodes were omitted.
Proof. By Theorem 3.8, the probability that a length-2 path $(u, v, w)$ closes into a triangle depends only on its center node $v$. Since the closure coefficient is measured from the head node $u$, the probability that any wedge is closed is independent of $u$ and thus the same across every node in the graph. This implies that the local closure coefficient is equal to the global closure coefficient, which in turn is equal to the global clustering coefficient.
Figure 3.3 shows the local closure coefficient of the projected graph as a function of node weight $w_u$, using the same random graphs as for the clustering coefficient in Figure 3.1. We observe that the mean local conditional closure coefficient is independent of the node weight in the samples, which verifies Theorem 3.13.
Remark. Stronger error bounds hold when $\delta > 1/6$. In particular, instead of an additive $o(1)$ error term, the error terms are a multiplicative $1 + o(1)$ factor. For example, the global clustering coefficient in Theorem 3.12 would be
\[ C_G = \frac{1}{1 + \frac{(M_R^{(2)})^2}{M_R M_R^{(3)}} \cdot \frac{M_L^{(2)}}{M_L}} (1 + o(1)). \]
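The asymptotic formulas above can be sanity-checked numerically on small instances. A minimal sketch (ours, not the authors' C++ implementation; this samples edge by edge rather than using the fast grouped sampler of Algorithm 4.1) that generates a bipartite Chung-Lu graph, projects it, and computes the global clustering coefficient:

```python
import random
from collections import defaultdict
from itertools import combinations

def sample_bipartite(wL, wR, rng):
    """Chung-Lu bipartite graph: edge (u, v) appears independently with
    probability min(1, w_u * w_v / S), where S is the total right weight."""
    S = float(sum(wR))
    return {(u, v) for u, wu in enumerate(wL)
                   for v, wv in enumerate(wR)
                   if rng.random() < min(1.0, wu * wv / S)}

def project(bip_edges):
    """One-mode projection: connect two left nodes sharing a right neighbor."""
    nbrs = defaultdict(set)
    for u, v in bip_edges:
        nbrs[v].add(u)
    return {pair for group in nbrs.values()
                 for pair in combinations(sorted(group), 2)}

def global_clustering(edges):
    """3 * (#triangles) / (#wedges), or None if the graph has no wedges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    wedges = sum(d * (d - 1) // 2 for d in (len(s) for s in adj.values()))
    closed = sum(len(adj[a] & adj[b]) for a, b in edges)  # each triangle counted 3x
    return closed / wedges if wedges else None

rng = random.Random(0)
cg = global_clustering(project(sample_bipartite([3] * 200, [4] * 150, rng)))
print(cg)
```

For uniform weights like these, the sampled value can be compared against the constant predicted by Theorem 3.12; heavier-tailed weight lists slot in directly.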
4. Fast sampling and counting.
We develop a fast sampling algorithm for graphs with degrees following discrete power-law (Zipfian) distributions, which we use in all of our experiments. One naive way to implement our model is to simply iterate over all $O(n_L n_R)$ potential edges and generate a random sample for each edge. For large graphs, however, this quadratic scaling is too costly. In contrast, our algorithm has running time linear in the number of sampled edges rather than the product of the left and right partition sizes. This speedup is enabled by the discrete power-law distributions, which allow us to group nodes with the same weight. The overall procedure is in Algorithm 4.1. Algorithm 4.1
Fast sampling of a Chung-Lu bipartite graph with discrete power-law weights.
Input: positive integers $n_L$, $n_R$, and degree distributions $D_L$ and $D_R$
Output: a bipartite graph $G$ following degree distributions $D_L$ and $D_R$
  $L \leftarrow \{1, 2, \ldots, n_L\}$, $R \leftarrow \{n_L + 1, n_L + 2, \ldots, n_L + n_R\}$
  $W_L \leftarrow \{w_u \mid w_u \sim D_L\}$, $W_R \leftarrow \{w_u \mid w_u \sim D_R\}$
  $G \leftarrow$ an empty graph with node set $L \sqcup R$
  for each unique value $(w_l, w_r) \in W_L \times W_R$ do
    $V_L \leftarrow \{u \in L \mid w_u = w_l\}$, $V_R \leftarrow \{u \in R \mid w_u = w_r\}$
    $m \leftarrow |V_L||V_R|$, $p \leftarrow w_l w_r / (n_R \mu_R)$
    $e_g \sim \mathrm{Binomial}(m, p)$
    draw $e_g$ pairs uniformly from $V_L \times V_R$ without replacement and add them to $G$
  end for
  return $G$

Suppose that we have two discrete power-law distributions $D_L$ and $D_R$ with $n_L E[D_L] = n_R E[D_R]$ and decay parameters $\alpha_L$ and $\alpha_R$. We begin by first sampling the node weights $w_u \in \mathbb{N}$ according to the specified distributions. We then group together nodes on each side of the bipartite graph by their weight. With high probability, the number of groups will be small (Lemma 1.5). Thus, instead of iterating over all $O(n_L n_R)$ pairs of potential edges, we can iterate over all pairs of groups between the left and right partitions. Within each pair of groups, edges between nodes of the groups occur with a fixed probability. Hence, the number of edges between the groups follows a binomial distribution. The final step simply generates the number of edges $e_g$ we need from each pair of groups and then draws that many edges from the node pairs within the pair, which can be done in linear time. Theorem 4.1.
Let $\mu_L = E[D_L]$ and $\mu_R = E[D_R]$. The expected running time of Algorithm 4.1 is $O\big(n_L^{1/(\alpha_L - 1)} n_R^{1/(\alpha_R - 1)} + \mu_L n_L\big)$. For $\alpha_L, \alpha_R > 3$, the latter term dominates, and the algorithm is asymptotically optimal since the second term is the expected number of edges.
Proof. By Lemma 1.5, $E[|W_L|]$ and $E[|W_R|]$ are $O\big(n_L^{1/(\alpha_L - 1)}\big)$ and $O\big(n_R^{1/(\alpha_R - 1)}\big)$. Thus, the number of unique pairs $(w_u, w_v)$ iterated over in the for loop of Algorithm 4.1 is $O\big(n_L^{1/(\alpha_L - 1)} n_R^{1/(\alpha_R - 1)}\big)$ in expectation. Aside from the time taken to draw the $e_g$ edges, each group takes constant time to process. The expected number of edges added over all the groups is $\sum_{u \in L, v \in R} \frac{w_u w_v}{n_R M_R} = n_L M_L$. This is $O(\mu_L n_L)$ in expectation. Hence the total running time is upper bounded by $O\big(n_L^{1/(\alpha_L - 1)} n_R^{1/(\alpha_R - 1)} + \mu_L n_L\big)$.
Following Remark 2.1, we may assume without loss of generality that $\mu_L n_L = \mu_R n_R$. By the AM-GM inequality, $\mu_L n_L + \mu_R n_R \geq 2\sqrt{n_L n_R \mu_L \mu_R}$. For $\alpha_L, \alpha_R \geq 3$, the latter term dominates and the runtime is bounded by the expected number of generated edges. Since the output size is at least $O(\mu_L n_L)$, Algorithm 4.1 is asymptotically optimal when $\alpha_L, \alpha_R \geq 3$. Lemma 4.2.
Let $D_L$ and $D_R$, with decay parameters $\alpha_L$ and $\alpha_R$, be the weight distributions. In expectation, the running time for computing all local clustering and closure coefficients and the global clustering coefficient is $O\big(n_L^{1/\min(\alpha_L, \alpha_R - 1)} \cdot \frac{n_L^2 M_L^2 M_R^{(2)}}{n_R M_R^2}\big)$. Under the normalization in Remark 2.1, this is equal to $O\big(n_L^{1/(\alpha_L - 1)} n_R^{1/(\alpha_R - 1)} + \mu_L n_L + \mu_R n_R\big)$. For $\alpha_L, \alpha_R > 3$, the algorithm is asymptotically optimal, since the second term is the expected number of edges.
Proof. To compute the projected graph, we can simply iterate over all nodes $v$ in the right partition. For each pair of nodes in $N(v)$, we connect the nodes with an edge in the projected graph. Summed over all nodes in the right partition, we add $\sum_{v \in R} \binom{d_b(v)}{2}$ edges in the projected graph in expectation, where $E[d_b(v)] = \frac{n_L M_L}{n_R M_R} w_v$. Hence both the expected time to compute the projection and the expected number of edges in the projection are upper bounded by $O\big(\frac{n_L^2 M_L^2 M_R^{(2)}}{n_R M_R^2}\big)$.
To compute the local clustering and closure coefficients, as well as the global clustering coefficient, it is sufficient to have the degree and triangle participation counts of each node. The degrees are immediately available from the projected graph, and we can list all triangles in $O(m n^{1/\alpha})$ time, where $m$ is the number of edges in the projection and $\alpha$ is the power-law parameter of the projection [34]. By Corollary 3.7 and our reasoning above, $m = O\big(\frac{n_L^2 M_L^2 M_R^{(2)}}{n_R M_R^2}\big)$ and $\alpha = \min(\alpha_L, \alpha_R - 1)$.
In our experiments, we take $D_L$ and $D_R$ to be the degree distributions of the left and right partitions, respectively. In these cases, we have the equality $n_L E[D_L] = n_R E[D_R]$. With this equality, our results above simplify. The running time of Algorithm 4.1 can be restated as $O\big(n_L^{1/(\alpha_L - 1)} n_R^{1/(\alpha_R - 1)} + \mu_L n_L + \mu_R n_R\big)$. Thus for $\alpha_L, \alpha_R > 3$, the latter terms dominate (by the AM-GM inequality) and the running time is asymptotically optimal, since it is bounded by the expected number of generated edges.
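Algorithm 4.1 translates directly into code. A sketch of the grouped sampler (ours, not the authors' C++ implementation; the O(m) `binomial` helper stands in for the constant-time binomial sampler a real implementation would use, and left/right nodes here use separate index spaces rather than offsetting right ids by $n_L$):

```python
import random
from collections import defaultdict

def binomial(n, p, rng):
    # O(n) stand-in for a constant-time binomial sampler.
    return sum(rng.random() < p for _ in range(n))

def sample_grouped(wL, wR, rng):
    """Chung-Lu bipartite sampling grouped by (left weight, right weight),
    following Algorithm 4.1. Returns a set of (left, right) edges."""
    S = float(sum(wR))  # plays the role of n_R * mu_R
    groupsL, groupsR = defaultdict(list), defaultdict(list)
    for u, w in enumerate(wL):
        groupsL[w].append(u)
    for v, w in enumerate(wR):
        groupsR[w].append(v)
    edges = set()
    for wl, VL in groupsL.items():
        for wr, VR in groupsR.items():
            m = len(VL) * len(VR)
            p = min(1.0, wl * wr / S)
            k = binomial(m, p, rng)
            # Draw k distinct node pairs uniformly from VL x VR.
            for i in rng.sample(range(m), k):
                edges.add((VL[i // len(VR)], VR[i % len(VR)]))
    return edges

rng = random.Random(1)
edges = sample_grouped([2] * 40 + [5] * 10, [1] * 60 + [4] * 15, rng)
print(len(edges))
```

The outer loops run once per pair of distinct weight values, which is the source of the $n_L^{1/(\alpha_L-1)} n_R^{1/(\alpha_R-1)}$ term in Theorem 4.1.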
5. Numerical experiments.
In this section, we use our model in conjunction with several datasets. We find that much of the empirical clustering behavior in real-world projections can be accounted for by our bipartite projection model. All algorithms and simulations were implemented in C++, and all experiments were executed on a dual-core Intel i7-7500U 2.7 GHz CPU with 16 GB of RAM. Code and data are available at https://gitlab.com/paul.liu.ubc/bipartite-generation-model.
We analyze 11 bipartite network datasets (Table 5.1). For the weight sequences $S_L$ and $S_R$, we use the degrees from the data. We also compare with a version of the random intersection model [10, 27], where the weight sequence of the left nodes comes from the data. For each dataset, we estimated power-law decay parameters for the degree distributions of the left and right partitions (Appendix B).
Table 5.2 shows clustering and closure coefficients — mean local clustering (i.e., average clustering coefficient), global clustering (equal to global closure), and mean local closure (i.e., average closure coefficient) — from (1) the data, (2) the projected graph produced by our model, and (3) the graph produced by the random intersection model. When computing the coefficients, we ignore any node that has an undefined coefficient, and we report the empirical (i.e., non-conditional) variants defined in Subsection 1.1.
In all but one dataset, our model has mean local clustering that is closer to the data than the random intersection model.
This remains true regardless of whether our model has more clustering (e.g., mathsx-tags-questions) or less clustering (e.g., actors-movies) compared to the data. The one exception is the directors-boards dataset, where the random intersection model accounts for more clustering than our model. In an absolute sense, a large amount of the mean clustering is created by the projection.

Table 5.1: Description and summary statistics of real-world datasets.

dataset                      |L|    |R|    |E_b|  projection description
actors-movies [6]            384K   128K   1.47M  actors in the same movie
amazon-products-pages [36]   721K   549K   2.34M  products displayed on the same page on amazon.com
classes-drugs [8]            1.16K  49.7K  156K   FDA NDC classification codes describing the same drug
condmat-authors-papers [40]  16.7K  22.0K  58.6K  academics co-authoring a paper on the Condensed Matter arXiv
directors-boards [49]        204    1.01K  1.13K  directors on the boards of the same Norwegian company
diseases-genes [28]          516    1.42K  3.93K  diseases associated with the same gene
genes-diseases [28]          1.42K  516    3.93K  genes associated with the same disease
mathsx-tags-questions [8]    1.63K  822K   1.80M  tags applied to the same question on math.stackexchange.com
mo-questions-users [55]      73.9K  5.45K  132K   questions answered by the same user
so-users-threads [8]         2.68M  11.3M  25.6M  users posting on the same question thread on stackoverflow.com
walmart-items-trips [5]      88.9K  69.9K  460K   items co-purchased in a shopping trip

Table 5.2: Clustering and closure coefficients in real-world data and in random projections following our model and the random intersection (RI) model. Variances are on the order of 0.001. A large amount of clustering is explained simply by the degree distribution and the projection.

                        mean clust. coeff.  global clust. coeff.  mean closure coeff.
dataset                 data  ours  RI      data  ours  RI        data  ours  RI
actors-movies           0.78  0.63  0.58    0.17  0.07  0.04      0.20  0.04  0.03
amazon-products-pages   0.74  0.52  0.53    0.20  0.08  0.08      0.29  0.09  0.09
classes-drugs           0.83  0.79  0.78    0.50  0.50  0.49      0.40  0.24  0.23
condmat-authors-papers  0.74  0.50  0.50    0.36  0.12  0.11      0.35  0.10  0.10
directors-boards        0.45  0.28  0.34    0.39  0.21  0.23      0.27  0.17  0.19
diseases-genes          0.82  0.46  0.36    0.63  0.31  0.19      0.52  0.21  0.14
genes-diseases          0.86  0.65  0.57    0.66  0.37  0.23      0.54  0.24  0.19
mathsx-tags-questions   0.63  0.79  0.80    0.33  0.46  0.47      0.17  0.25  0.27
mo-questions-users      0.86  0.78  0.64    0.63  0.45  0.19      0.37  0.24  0.19
so-users-threads        0.40  0.45  0.46    0.02  0.01  0.01      0.00  0.01  0.01
walmart-items-trips     0.63  0.55  0.52    0.05  0.04  0.04      0.07  0.02  0.02

To further highlight how much is explained by our model, Figure 5.1 shows the local clustering coefficient as a function of degree in the data and in a sample from the models.
Figure 5.1: Local clustering coefficient as a function of degree on the walmart-items-trips (left) and mo-questions-users (right) datasets. The green, orange, and blue lines represent the clustering coefficients from the real projected graph, the projected graph produced by our model, and the projected graph produced by the random intersection model, respectively. Much of the empirical local clustering behavior can be explained by the projection.

Figure 5.2: Local closure coefficient as a function of degree on the amazon-products-pages (left) and genes-diseases (right) datasets. The green, orange, and blue lines represent the closure coefficients from the real projected graph, the projected graph produced by our model, and the projected graph produced by the random intersection model, respectively.

We find that the empirical characteristics of the clustering coefficient as a function of degree are largely explained by the projection, suggesting that there is little innate local clustering behavior beyond what the projection from the degree distribution already provides.
In some datasets, the global clustering coefficient is essentially the same as in our model (classes-drugs, walmart-items-trips). However, there are several cases where our model and the random intersection model have a factor of two less global clustering (actors-movies, amazon-products-pages, diseases-genes). This suggests that there is global transitivity in these networks that goes beyond what we would expect from a random projection.
Overall, the relative difference between the data and the model is larger for the global clustering coefficient than for the local clustering coefficient. We emphasize that our model is not designed to match these empirical properties. Instead, we are interested in how much clustering one can expect from a model that only accounts for the bipartite degree distributions and the projection step.
Finally, the random graphs have non-trivial mean closure coefficients, but they tend to be smaller compared to the data, with the exception of mathsx-tags-questions. Similar to the local clustering coefficient, we plot the local closure coefficient as a function of degree for two datasets (amazon-products-pages and genes-diseases; Figure 5.2). For amazon-products-pages, we see the flat closure coefficient one might expect from Theorem 3.13, although the data has more closure at baseline. This is likely explained by the fact that two products tend to appear on the same pages, reducing the number of length-2 paths in the data, whereas bipartite connections are made at random in the model. With the genes-diseases dataset, the random models capture an increase in closure as a function of degree that is also seen in the data. In this case, the model parameters do not satisfy the assumptions of Theorem 3.13, but the general empirical behavior is still seen in our random projection model.
6. Conclusion.
We have analyzed a simple bipartite "Chung-Lu style" model that captures some common properties of real-world networks. The simplicity of our model enables theoretical analysis of properties of the projected graph, giving analytical formulae for graph statistics such as clustering coefficients, closure coefficients, and the expected degree distribution. We also pair our model with a fast graph generation algorithm that is provably optimal for certain input distributions. Empirically, we find that a substantial amount of clustering and closure behavior in real-world networks is explained by sampling from our model with the same bipartite degree distribution. However, global clustering is often larger than predicted by the projection model.
Acknowledgments.
This research was supported by NSF Award DMS-1830274, ARO Award W911NF19-1-0057, ARO MURI, and JPMorgan Chase & Co. We thank Johan Ugander for pointing us to the literature on random intersection graphs.
REFERENCES [1]
Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, and A.-L. Barabási, Flavor network and the principles of food pairing, Scientific Reports, 1 (2011). [2]
E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing , Mixed membership stochastic blockmodels ,Journal of machine learning research, 9 (2008), pp. 1981–2014.[3]
S. G. Aksoy, T. G. Kolda, and A. Pinar , Measuring and modeling bipartite graphs with communitystructure , Journal of Complex Networks, 5 (2017), pp. 581–603.[4]
J. Alstott, E. Bullmore, and D. Plenz , powerlaw: a python package for analysis of heavy-taileddistributions , PloS one, 9 (2014), p. e85777.[5] I. Amburg, N. Veldt, and A. R. Benson , Clustering in graphs and hypergraphs with categorical edgelabels , in Proceedings of the Web Conference, 2020.[6]
A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science, 286 (1999), pp. 509–512. [7]
D. Barber , Clique matrices for statistical graph decomposition and parameterising restricted positivedefinite matrices , in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intel-ligence, AUAI Press, 2008, pp. 26–33.[8]
A. R. Benson, R. Abebe, M. T. Schaub, A. Jadbabaie, and J. Kleinberg , Simplicial closureand higher-order link prediction , Proceedings of the National Academy of Sciences, 115 (2018),
pp. E11221–E11230. [9]
A. R. Benson, D. F. Gleich, and J. Leskovec , Higher-order organization of complex networks , Science,353 (2016), pp. 163–166.[10]
M. Bloznelis , Degree and clustering coefficient in sparse random intersection graphs , The Annals ofApplied Probability, 23 (2013), pp. 1254–1289.[11]
M. Bloznelis , Local probabilities of randomly stopped sums of power-law lattice random variables , Lithua-nian Mathematical Journal, 59 (2019), pp. 437–468.[12]
M. Bloznelis and V. Kurauskas , Clustering coefficient of random intersection graphs with infinitedegree variance , Internet Mathematics, (2016), p. 1215.[13]
M. Bloznelis and J. Petuchovas , Correlation between clustering and degree in affiliation networks , inInternational Workshop on Algorithms and Models for the Web-Graph, Springer, 2017, pp. 90–104.[14]
R. L. Breiger , The duality of persons and groups , Social forces, 53 (1974), pp. 181–190.[15]
A. D. Broido and A. Clauset , Scale-free networks are rare , Nature communications, 10 (2019), pp. 1–10.[16]
C. Canonne, A short note on Poisson tail bounds, ∼ccanonne/files/misc/2017-poissonconcentration.pdf. [17] P. S. Chodrow, Configuration models of random hypergraphs, arXiv:1902.09302, (2019). [18]
F. Chung and L. Lu , The average distances in random graphs with given expected degrees , Proceedingsof the National Academy of Sciences, 99 (2002), pp. 15879–15882.[19]
F. Chung and L. Lu , Connected components in random graphs with given expected degree sequences ,Annals of combinatorics, 6 (2002), pp. 125–145.[20]
F. Chung, L. Lu, and V. Vu , The spectra of random graphs with given expected degrees , InternetMathematics, 1 (2004), pp. 257–275.[21]
A. Clauset, C. R. Shalizi, and M. E. Newman , Power-law distributions in empirical data , SIAMReview, 51 (2009), pp. 661–703.[22]
M. Deijfen and W. Kets , Random intersection graphs with tunable degree distribution and clustering ,Probability in the Engineering and Informational Sciences, 23 (2009), pp. 661–674.[23]
S. N. Dorogovtsev and J. F. Mendes , Evolution of networks , Advances in physics, 51 (2002), pp. 1079–1187.[24]
D. Easley and J. Kleinberg , Networks, crowds, and markets , vol. 8, Cambridge university pressCambridge, 2010.[25]
B. K. Fosdick, D. B. Larremore, J. Nishimura, and J. Ugander , Configuring random graph modelswith fixed degree sequences , SIAM Review, 60 (2018), pp. 315–355.[26]
X. Fu, S. Yu, and A. R. Benson , Modeling and analysis of tagging networks in stack exchange com-munities , Journal of Complex Networks, (2019), pp. 1–19.[27]
E. Godehardt and J. Jaworski , Two models of random intersection graphs for classification , in Ex-ploratory data analysis in empirical research, Springer, 2003, pp. 67–81.[28]
K.-I. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal, and A.-L. Barabási, The human disease network, Proceedings of the National Academy of Sciences, 104 (2007), pp. 8685–8690. [29]
M. S. Granovetter , The strength of weak ties , in Social networks, Elsevier, 1977, pp. 347–367.[30]
R. Guimerà, M. Sales-Pardo, and L. A. N. Amaral, Module identification in bipartite and directed networks, Physical Review E, 76 (2007). [31]
M. O. Jackson and B. W. Rogers , Meeting strangers and friends of friends: How random are socialnetworks? , American Economic Review, 97 (2007), pp. 890–915.[32]
B. Karrer and M. E. J. Newman , Stochastic blockmodels and community structure in networks , Phys-ical Review E, 83 (2011).[33]
D. B. Larremore, A. Clauset, and A. Z. Jacobs , Efficiently inferring community structure in bipar-tite networks , Physical Review E, 90 (2014).[34]
M. Latapy , Main-memory triangle computations for very large (sparse (power-law)) graphs , Theor. Com-put. Sci., 407 (2008), pp. 458–473.[35]
S. Lattanzi and D. Sivakumar , Affiliation networks , in Proceedings of the 41st annual ACM Sympo-sium on Theory of Computing (STOC), 2009.[36]
J. Leskovec, L. A. Adamic, and B. A. Huberman , The dynamics of viral marketing , ACM Transac-tions on the Web (TWEB), 1 (2007), pp. 5–es. [37]
P. Li and O. Milenkovic , Inhomogeneous hypergraph clustering with applications , in Advances in NeuralInformation Processing Systems, 2017, pp. 2308–2318.[38]
J. Nacher and T. Akutsu , On the degree distribution of projected networks mapped from bipartitenetworks , Physica A: Statistical Mechanics and its Applications, 390 (2011), pp. 4636–4651.[39]
Z. Neal , The backbone of bipartite projections: Inferring relationships from co-authorship, co-sponsorship,co-attendance and other co-behaviors , Social Networks, 39 (2014), pp. 84–97.[40]
M. E. Newman , The structure of scientific collaboration networks , Proceedings of the National Academyof Sciences, 98 (2001), pp. 404–409.[41]
M. E. Newman , The structure and function of complex networks , SIAM Review, 45 (2003), pp. 167–256.[42]
M. E. J. Newman , Coauthorship networks and patterns of scientific collaboration , Proceedings of theNational Academy of Sciences, 101 (2004), pp. 5200–5205.[43]
M. E. J. Newman, S. H. Strogatz, and D. J. Watts , Random graphs with arbitrary degree distribu-tions and their applications , Phys. Rev. E, 64 (2001), p. 026118.[44]
T. Opsahl , Triadic closure in two-mode networks: Redefining the global and local clustering coefficients ,Social Networks, 35 (2013), pp. 159–167.[45]
M. A. Porter, P. J. Mucha, M. E. J. Newman, and C. M. Warmbrand , A network analysis ofcommittees in the U.S. House of Representatives , Proceedings of the National Academy of Sciences,102 (2005), pp. 7057–7062.[46]
A. Rapoport , Spread of information through a population with socio-structural bias: I. assumption oftransitivity , The bulletin of mathematical biophysics, 15 (1953), pp. 523–533.[47]
E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabási, Hierarchical organization of modularity in metabolic networks, Science, 297 (2002), pp. 1551–1555. [48]
H. J. Ryser , Combinatorial mathematics , vol. 14, American Mathematical Soc., 1963.[49]
C. Seierstad and T. Opsahl , For the few not the many? the effects of affirmative action on presence,prominence, and social capital of women directors in norway , Scandinavian Journal of Management,27 (2011), pp. 44–54.[50]
C. Seshadhri, T. G. Kolda, and A. Pinar, Community structure and scale-free collections of Erdős-Rényi graphs, Physical Review E, 85 (2012), p. 056109. [51]
J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos , Neighborhood formation and anomaly detectionin bipartite graphs , in Fifth IEEE International Conference on Data Mining, IEEE, 2005.[52]
G. Szabó, M. Alava, and J. Kertész, Structural transitions in scale-free networks, Physical Review E, 67 (2003), p. 056102. [53]
A. Taudiere, F. Munoz, A. Lesne, A.-C. Monnet, J.-M. Bellanger, M.-A. Selosse, P.-A.Moreau, and F. Richard , Beyond ectomycorrhizal bipartite networks: projected networks demon-strate contrasted patterns between early- and late-successional plants in corsica , Frontiers in PlantScience, 6 (2015).[54]
C.-Y. Teng, Y.-R. Lin, and L. A. Adamic , Recipe recommendation using ingredient networks , inProceedings of the 3rd Annual ACM Web Science Conference, ACM Press, 2012.[55]
N. Veldt, A. R. Benson, and J. Kleinberg , Localized flow-based clustering in hypergraphs , in Pro-ceedings of the SIGKDD Conference on Knowledge Discovery and Data Mining, 2020.[56]
I. Vogt and J. Mestres , Drug-target networks , Molecular Informatics, 29 (2010), pp. 10–14.[57]
D. J. Watts and S. H. Strogatz , Collective dynamics of ‘small-world’ networks , Nature, 393 (1998),p. 440.[58]
S. A. Williamson and M. Tec , Random clique covers for graphs with local density and global sparsity ,in Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2019.[59]
J. Yang and J. Leskovec , Community-affiliation graph model for overlapping network community de-tection , in 2012 IEEE 12th International Conference on Data Mining, IEEE, Dec. 2012.[60]
H. Yin, A. R. Benson, and J. Leskovec , The local closure coefficient: a new perspective on networkclustering , in Proceedings of the Twelfth ACM International Conference on Web Search and DataMining, ACM, 2019, pp. 303–311.[61]
H. Yin, A. R. Benson, and J. Ugander , Measuring directed triadic closure with closure coefficients ,Network Science, (2020), pp. 1–23.[62]
Y. Zhang, A. Friend, A. L. Traud, M. A. Porter, J. H. Fowler, and P. J. Mucha , Communitystructure in congressional cosponsorship networks , Physica A: Statistical Mechanics and its Applica-
SIMPLE BIPARTITE GRAPH PROJECTION MODEL FOR CLUSTERING IN NETWORKS 23 tions, 387 (2008), pp. 1705–1712.[63]
D. Zhou, J. Huang, and B. Sch¨olkopf , Learning with hypergraphs: Clustering, classification, andembedding , in Advances in neural information processing systems, 2007, pp. 1601–1608.[64]
T. Zhou, J. Ren, M. c. v. Medo, and Y.-C. Zhang , Bipartite network projection and personal recom-mendation , Phys. Rev. E, 76 (2007), p. 046115.
Appendix A. Connection between conditional probability and empirical clustering.
Here, we show that the conditional probability formulation for clustering is exactly a weighted average of the standard empirical clustering coefficient for the power-law-type distributions our model explores. We demonstrate this below for the local clustering coefficient; the case for local closure is similar.

Fix a node $u$ and suppose we generate a graph $G_i$ under our random graph model. Let $W_i$ and $T_i$ be the number of wedges and triangles at node $u$ in the projected graph $G_i$. The empirical clustering coefficient $\tilde{C}_i(u)$ is equal to $T_i / W_i$. Weighting each sample $\tilde{C}_i(u)$ by $W_i$, the weighted clustering coefficient is
\[
\frac{\sum_{i=1}^{s} W_i \tilde{C}_i(u)}{\sum_{i=1}^{s} W_i} = \frac{\frac{1}{s} \sum_{i=1}^{s} T_i}{\frac{1}{s} \sum_{i=1}^{s} W_i}.
\]
As the number of samples $s$ (i.e., the size of the graph) approaches infinity, both the numerator and the denominator approach their expectations, since each sample is independent. Computing this expectation, we see that it is exactly the value of $C(u)$ computed in Theorem 3.8.

In the case of the global closure coefficient, a similar argument shows that we actually have equality between the conditional and non-conditional definitions (in the limit that the size of the graph goes to infinity).

Appendix B. Power-law statistics in real-world bipartite networks.
In many of our datasets, we find that power-law degree distributions are a reasonable approximation for the left and right sides of the bipartite network (Table B.1).

Table B.1: Estimated power-law (PL) exponents of the left and right degree distributions in the bipartite graph datasets in Table 5.2 (an exponent of $\alpha$ corresponds to a distribution decay $\propto k^{-\alpha}$). Parameters were fit using the powerlaw Python package [4]. We also report the Kolmogorov–Smirnov statistic $D$ between the fit model and the data.

dataset | left PL exponent | D | right PL exponent | D
actors-movies | 1.862 ± … | … | … | …
[remaining numeric entries of Table B.1 not recoverable in this copy]

Appendix C. Additional proofs.

Proof of Lemma 3.9.
Let $A_i$ denote the event that $(u, u_i) \in E$ for $i = 1, 2$. We want to compute the probability of $A_1 \cap A_2$. We first decompose the probability as follows:
\[
\text{(C.1)} \qquad \mathbb{P}[A_1 \cap A_2] = \mathbb{P}[A_1] + \mathbb{P}[A_2] - \mathbb{P}[A_1 \cup A_2] = \mathbb{P}[A_1] + \mathbb{P}[A_2] + \mathbb{P}\big[\bar{A}_1 \cap \bar{A}_2\big] - 1.
\]
The probability that each event $A_i$ occurs is given by Theorem 3.4, so we compute the probability of $\bar{A}_1 \cap \bar{A}_2$, which is the event that $u$ is connected to neither $u_1$ nor $u_2$ in the projected graph. This happens if and only if, in the bipartite graph, for every $v \in R$, we have that (i) $u$ is not connected to $v$, or (ii) neither $u_1$ nor $u_2$ is connected to $v$. For now, let $v$ be a fixed node on the right. Conditioning on $w_v$ and using the fact that edge formations in the bipartite graph are independent, the probability is
\[
1 - \frac{w_u w_v}{\sqrt{n_R M_R}} + \frac{w_u w_v}{\sqrt{n_R M_R}} \left(1 - \frac{w_{u_1} w_v}{\sqrt{n_R M_R}}\right) \left(1 - \frac{w_{u_2} w_v}{\sqrt{n_R M_R}}\right) = 1 - \frac{w_u (w_{u_1} + w_{u_2}) w_v^2}{n_R M_R} + \frac{w_u w_{u_1} w_{u_2} w_v^3}{(n_R M_R)^{3/2}}.
\]
Therefore, we have
\begin{align*}
\log \mathbb{P}\big[\bar{A}_1 \cap \bar{A}_2\big]
&= \sum_{v \in R} \log\left(1 - \frac{w_u (w_{u_1} + w_{u_2}) w_v^2}{n_R M_R} + \frac{w_u w_{u_1} w_{u_2} w_v^3}{(n_R M_R)^{3/2}}\right) \\
&= \sum_{v \in R} \left[ -\frac{w_u (w_{u_1} + w_{u_2}) w_v^2}{n_R M_R} + \frac{w_u w_{u_1} w_{u_2} w_v^3}{(n_R M_R)^{3/2}} - \frac{w_u^2 (w_{u_1} + w_{u_2})^2 w_v^4}{2 (n_R M_R)^2} \cdot (1 + O(n_R^{-\delta})) \right] \\
&= -p_{uu_1} - p_{uu_2} + \frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{p_{uu_1} p_{uu_2}}{w_u} - \frac{\sum_{v \in R} w_v^4}{2 \big(\sum_{v \in R} w_v^2\big)^2} (p_{uu_1} + p_{uu_2})^2 \cdot (1 + O(n_R^{-\delta})),
\end{align*}
where we substitute $p_{uu_i} = \frac{w_u w_{u_i}}{n_R M_R} \sum_{v \in R} w_v^2$ from Theorem 3.4. Consequently,
\begin{align*}
\mathbb{P}\big[\bar{A}_1 \cap \bar{A}_2\big]
= 1 - p_{uu_1} - p_{uu_2} &+ \frac{(p_{uu_1} + p_{uu_2})^2}{2} \cdot (1 + O(n_R^{-\delta})) + o(p_{uu_1} p_{uu_2}) \\
&+ \frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{p_{uu_1} p_{uu_2}}{w_u} \cdot (1 + O(n_R^{-\delta})) - \frac{\sum_{v \in R} w_v^4}{2 \big(\sum_{v \in R} w_v^2\big)^2} (p_{uu_1} + p_{uu_2})^2 \cdot (1 + O(n_R^{-\delta})).
\end{align*}
Combining everything, the probability of wedge formation is
\begin{align*}
\mathbb{P}[A_1 \cap A_2]
&= \left(\frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{1}{w_u} + 1\right) p_{uu_1} p_{uu_2} \cdot (1 + O(n_R^{-\delta}) + o(1)) + (p_{uu_1} + p_{uu_2})^2 \cdot O(n_R^{-\delta}) \\
&= \left(\frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{1}{w_u} + 1\right) p_{uu_1} p_{uu_2} \cdot \left(1 + O(n_R^{-\delta}) + o(1) + \left(\frac{w_{u_1}}{w_{u_2}} + \frac{w_{u_2}}{w_{u_1}}\right) O(n_R^{-\delta})\right) \\
&= \left(\frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{1}{w_u} + 1\right) p_{uu_1} p_{uu_2} \cdot \left(1 + O(n_R^{-\delta}) + o(1) + O(n_R^{1/2 - 2\delta})\right),
\end{align*}
where the last equality is due to $w_{u_1}, w_{u_2} \in [1, n_R^{1/2 - \delta}]$, so that their ratio is bounded by $n_R^{1/2 - \delta}$. Since $\delta > 3/10$, the exponent $1/2 - 2\delta$ is negative, and the proof is complete.
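The decomposition (C.1) and the per-right-node factorization of $\mathbb{P}[\bar{A}_1 \cap \bar{A}_2]$ are exact identities; the asymptotics enter only when expanding the product. As a sanity check, they can be verified by brute-force enumeration on a toy instance. The sketch below uses arbitrary illustrative bipartite edge probabilities `q`, not weights drawn from the model:

```python
from itertools import product

# Toy instance: left nodes u, u1, u2 and three right nodes.
# q[i][j] is the bipartite edge probability between left node i and right node j;
# the values are arbitrary illustrative choices, not from the model.
q = [[0.3, 0.5, 0.2],   # u
     [0.4, 0.1, 0.6],   # u1
     [0.2, 0.7, 0.3]]   # u2
n_right = 3

def brute_force(q, n_right):
    """Enumerate all bipartite graphs to get P[A1], P[A2], and P[A1 and A2],
    where A_i is the event that u and u_i share a common right neighbor."""
    p_a1 = p_a2 = p_both = 0.0
    for e in product([0, 1], repeat=3 * n_right):
        # e[n_right * i + j] = 1 iff left node i is connected to right node j
        prob = 1.0
        for i in range(3):
            for j in range(n_right):
                prob *= q[i][j] if e[n_right * i + j] else 1.0 - q[i][j]
        a1 = any(e[j] and e[n_right + j] for j in range(n_right))
        a2 = any(e[j] and e[2 * n_right + j] for j in range(n_right))
        p_a1 += prob * a1
        p_a2 += prob * a2
        p_both += prob * (a1 and a2)
    return p_a1, p_a2, p_both

def neither_prob(q, n_right):
    """Closed form for P[not A1 and not A2]: for every right node v, either u
    misses v, or u hits v but neither u1 nor u2 does."""
    p = 1.0
    for j in range(n_right):
        p *= (1.0 - q[0][j]) + q[0][j] * (1.0 - q[1][j]) * (1.0 - q[2][j])
    return p

p_a1, p_a2, p_both = brute_force(q, n_right)
# Identity (C.1): P[A1 ∩ A2] = P[A1] + P[A2] + P[neither] - 1
assert abs(p_both - (p_a1 + p_a2 + neither_prob(q, n_right) - 1.0)) < 1e-12
```

Inclusion–exclusion is used here only to reduce $\mathbb{P}[A_1 \cap A_2]$ to a product over right nodes; all of the analytic work in the proof is in expanding that product.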
Proof of Lemma 3.10.
For nodes $u, u_1, u_2$ to form a triangle, one of two cases must happen. The first is the case that all three nodes connect to the same node in the right partition. If the first case does not happen, then each pair $(u, u_1)$, $(u, u_2)$, $(u_1, u_2)$ has a different common neighbor in the bipartite graph, forming a length-6 cycle. Now we analyze these two cases separately.

In the first case, there exists a node $v \in R$ such that the three nodes $u, u_1, u_2$ are all connected to $v$. For any specific node $v \in R$, the probability is $\frac{w_u w_{u_1} w_{u_2}}{(n_R M_R)^{3/2}} \cdot w_v^3$, and thus
\begin{align*}
\mathbb{P}[\exists v \in R \text{ s.t. } (u,v), (u_1,v), (u_2,v) \in E_b]
&= 1 - \prod_{v \in R} \left(1 - \frac{w_u w_{u_1} w_{u_2}}{(n_R M_R)^{3/2}} \cdot w_v^3\right) \\
&= 1 - \exp\left(-\frac{w_u w_{u_1} w_{u_2}}{(n_R M_R)^{3/2}} \cdot \sum_{v \in R} w_v^3 \cdot (1 + O(n_R^{-\delta}))\right) \\
&= p_{uu_1} p_{uu_2} \cdot \frac{\sqrt{n_R M_R} \sum_{v \in R} w_v^3}{\big(\sum_{v \in R} w_v^2\big)^2} \cdot \frac{1}{w_u} \cdot (1 + O(n_R^{-\delta})).
\end{align*}
In the second case, $u, u_1, u_2$ are pairwise connected, each pair through a different node on the right, forming a 6-cycle. For any node triple $v_1, v_2, v_3$, the probability is
\[
\mathbb{P}[(u, v_1, u_1, v_2, u_2, v_3) \text{ forms a 6-cycle}] = \frac{w_u^2 w_{u_1}^2 w_{u_2}^2}{(n_R M_R)^3} \cdot w_{v_1}^2 w_{v_2}^2 w_{v_3}^2.
\]
Therefore, the total probability of the second case is
\[
\mathbb{P}[\exists \text{ a 6-cycle containing } u, u_1, u_2] \le \sum_{\substack{v_1, v_2, v_3 \in R \\ v_1 \ne v_2 \ne v_3}} \frac{w_u^2 w_{u_1}^2 w_{u_2}^2}{(n_R M_R)^3} \cdot w_{v_1}^2 w_{v_2}^2 w_{v_3}^2 \le \frac{w_u^2 w_{u_1}^2 w_{u_2}^2}{(n_R M_R)^3} \left(\sum_{v \in R} w_v^2\right)^3 = p_{uu_1} p_{uu_2} p_{u_1 u_2} = o(p_{uu_1} p_{uu_2}).
\]
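The case analysis above can be made concrete on a small hand-built bipartite graph (a hypothetical example, not one of the paper's datasets): one triple of left nodes shares a single right node, while another is pairwise connected through three distinct right nodes, i.e., via a 6-cycle in the bipartite graph.

```python
from itertools import combinations

# Hypothetical bipartite graph: N[left node] = set of right-node neighbors.
N = {
    "a": {1, 2},  # a, b, c all share right node 1 (case 1)
    "b": {1, 3},
    "c": {1, 4},
    "x": {5, 6},  # x, y, z pairwise share distinct right nodes 5, 6, 7 (case 2)
    "y": {5, 7},
    "z": {6, 7},
}

def classify_triangles(N):
    """Split projected triangles into (common right neighbor, 6-cycle only)."""
    case1, case2 = [], []
    for a, b, c in combinations(sorted(N), 3):
        # a projected triangle requires every pair to share some right node
        if N[a] & N[b] and N[b] & N[c] and N[a] & N[c]:
            if N[a] & N[b] & N[c]:
                case1.append((a, b, c))  # one right node covers all three pairs
            else:
                case2.append((a, b, c))  # three distinct common neighbors: a 6-cycle
    return case1, case2

case1, case2 = classify_triangles(N)
# → case1 = [('a', 'b', 'c')], case2 = [('x', 'y', 'z')]
```

Note that in the second branch the three pairwise common neighbors are necessarily distinct (a shared one would lie in the triple intersection), which is why that case always yields a 6-cycle; in the proof, the first case carries the dominant probability mass and the second is the $o(p_{uu_1} p_{uu_2})$ term.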