Protein interaction networks and biology: towards the connection
A Annibale†, ACC Coolen†‡§, N Planell-Morell†
† Department of Mathematics, King’s College London, The Strand, London WC2R 2LS, UK
‡ Institute for Mathematical and Molecular Biomedicine, King’s College London, Hodgkin Building, London SE1 1UL, UK
§ London Institute for Mathematical Sciences, 22 South Audley St, London W1K 2NY, UK
Abstract.
Protein interaction networks (PIN) are a popular means to visualize the proteome. However, PIN datasets are known to be noisy, incomplete and biased by the experimental protocols used to detect protein interactions. This paper aims at understanding the connection between true protein interactions and the protein interaction datasets that have been obtained using the most popular experimental techniques, i.e. mass spectrometry (MS) and yeast two-hybrid (Y2H). We show that the most natural adjacency matrix of protein interaction networks has a separable form, and that this induces precise relations between moments of the degree distribution and the number of short loops. These relations provide powerful tools to test the reliability of datasets and hint at the underlying biological mechanism with which proteins and complexes recruit each other.
1. Introduction
A protein interaction network (PIN) is a graph in which the nodes $i = 1 \ldots N$ represent proteins and the links represent their interactions. This graph is encoded in an adjacency matrix $a = \{a_{ij}\}$, whose entries denote whether there is a link between proteins $i$ and $j$ ($a_{ij} = 1$) or not ($a_{ij} = 0$). However, there is ambiguity in this definition, arising from the non-binary nature of the underlying biochemistry. For example, three proteins may form a complex without interacting in pairs. Assigning binary values to intrinsically non-binary interactions requires further prescriptions, which vary across experimental protocols and lead in practice to different graphs. Moreover, different experiments measure protein interactions in different ways, which causes further biases [1, 2, 3]. For quantitative studies of the effects of sampling biases on networks see e.g. [4, 5, 6, 7, 8, 9, 10].

In this paper we seek to establish the connection between true biological protein interactions and protein interaction datasets produced by the most popular experimental techniques, mass spectrometry (MS) and yeast two-hybrid (Y2H). We argue that the most natural network matrix representation of the proteome has a separable form, which induces precise relations between the degree distribution and the density of short loops. These relations provide simple tests to assess the reliability and quality of different data sets, and provide hints on the underlying (evolutionary) mechanisms with which proteins and complexes recruit each other. Our study also provides a theoretical framework to discriminate between ‘party’ and ‘date’ hubs in protein interaction networks, see e.g. [17] and references therein, and addresses several intriguing questions concerning the universality of protein and complex statistics across species. For example, given $N$ protein species in a cell, what is the number of complexes they typically form, i.e. to what extent is the ratio complexes/proteins conserved across different species? Is the distribution of complex sizes peaked around ‘typical’ values, or does it have long tails? How is this mirrored in the protein promiscuities, i.e. the propensities of proteins to participate in multiple complexes? Does the power law behaviour of the degree distribution of protein interaction networks perhaps result from tails in the distribution of complex sizes and protein promiscuities?

[Figure 1: bipartite graph with protein species $i = 1 \ldots N$ (example degrees $d_1 = 4$, $d_2 = 2$, \ldots, $d_N = 3$) linked to complexes $\mu = 1 \ldots \alpha N$ (example degrees $q_1 = 2$, $q_2 = 2$, \ldots, $q_{\alpha N} = 3$).]

Figure 1. Bipartite graph (or ‘factor graph’) representation of protein interactions. The protein species $i = 1 \ldots N$ are drawn as circles, and their complexes $\mu = 1 \ldots \alpha N$ as squares. We write the degree of protein $i$ as $d_i$ (the number of complexes it participates in), and the degree of complex $\mu$ as $q_\mu$ (the number of protein species it contains). The bipartite graph gives more detailed information than the conventional PIN with protein nodes and pairwise links only. For instance, one distinguishes easily between different types of ‘hub’ proteins: ‘date hub’ proteins connect to many degree-2 complexes, whereas ‘party hub’ proteins connect to a high degree complex.

We tackle the above questions using an approach that is entirely based on statistical properties of graph ensembles. In section 2 we first define our models. Sections 3, 4 and 5 are devoted to the derivation of properties of distinct separable graph ensembles which mimic protein interaction networks, each reflecting different possible mechanisms for complex genesis. In section 6 we test these properties in synthetically generated graphs, and in section 7 we do the same for protein interaction networks measured by MS and Y2H experiments. We end our paper with a summary of our conclusions, and suggest pathways for further research.

Preprint: arXiv [cond-mat.dis-nn].
2. Definitions and basic properties
Proteins are large and complicated heteropolymers, which can bind in specific combinations to form stable molecular complexes. We consider a set of $N$ protein species, labelled by $i = 1 \ldots N$. We assume that the number of stable complexes $p$ scales as $p = \alpha N$, where $\alpha > 0$, and we label the complexes by $\mu = 1 \ldots \alpha N$. We can represent this system as a bipartite graph [11], see Figure 1, with two sets of nodes. The set $\nu_p$ represents proteins (drawn as circles), the set $\nu_c$ represents complexes (drawn as squares), and a link between protein $i \in \nu_p$ and complex $\mu \in \nu_c$ is drawn if protein $i$ participates in complex $\mu$. This graph is defined by the $N \times \alpha N$ connectivity matrix $\xi = \{\xi_i^\mu\}$, where $\xi_i^\mu = 1$ if there is a link between $i$ and $\mu$, and $\xi_i^\mu = 0$ otherwise. For simplicity we do not allow for complexes with more than one occurrence of any given protein species.

In the bipartite graph one has two types of node degrees: the degree $d_i(\xi) = \sum_\mu \xi_i^\mu$ (or ‘promiscuity’) of each protein $i$ gives the number of different complexes in which it is involved, and the degree $q_\mu(\xi) = \sum_i \xi_i^\mu$ (or ‘size’) of each complex $\mu$ gives the number of protein species of which it is formed. We define the distribution of promiscuities in graph $\xi$ as $p(d|\xi) = N^{-1} \sum_i \delta_{d, d_i(\xi)}$, with the average promiscuity $\langle d(\xi) \rangle = \sum_d d\, p(d|\xi)$, and the distribution of complex sizes as $p(q|\xi) = (\alpha N)^{-1} \sum_{\mu=1}^{\alpha N} \delta_{q, q_\mu(\xi)}$, with the average complex size $\langle q(\xi) \rangle = \sum_q q\, p(q|\xi)$. Since the number of links is conserved, we always have $\langle d(\xi) \rangle = \alpha \langle q(\xi) \rangle$ for any bipartite graph $\xi$.

Since we generally do not know the microscopic bipartite graph $\xi$, we will regard it as a quenched random object. Several natural choices can be proposed for its distribution $p(\xi)$.
If we assume that complexes recruit proteins, independently and with the same likelihood, we are led to

$$p_A(\xi) = \prod_{i\mu} \left[ \frac{q_\mu}{N}\, \delta_{\xi_i^\mu, 1} + \left( 1 - \frac{q_\mu}{N} \right) \delta_{\xi_i^\mu, 0} \right] \tag{1}$$

with $\delta_{xy} = 1$ for $x = y$ and 0 otherwise, and where the $\{q_\mu\}$ are distributed according to $P(q) = (\alpha N)^{-1} \sum_\mu \delta_{q, q_\mu}$. For graphs $\xi$ drawn from the ensemble (1) and $N \to \infty$, each complex size $q_\mu(\xi)$ is a Poissonian random variable with average $q_\mu$, and all protein promiscuities $d_i(\xi)$ are Poissonian variables with average $\langle d \rangle = \alpha \langle q \rangle$, since

$$p(d) = \lim_{N\to\infty} \Big\langle \delta_{d, \sum_\mu \xi_i^\mu} \Big\rangle = \lim_{N\to\infty} \int_{-\pi}^{\pi} \frac{d\omega}{2\pi}\, e^{i\omega d}\, \Big\langle e^{-i\omega \sum_\mu \xi_i^\mu} \Big\rangle = \int_{-\pi}^{\pi} \frac{d\omega}{2\pi}\, e^{i\omega d + \alpha\langle q\rangle (e^{-i\omega} - 1)} = e^{-\alpha\langle q\rangle}\, \frac{(\alpha\langle q\rangle)^d}{d!} \tag{2}$$

In the scenario (1) complexes have sizes that are determined e.g. by their functions, and this controls the promiscuities of the recruited proteins. Alternatively one could assume that the likelihood of a protein participating in a complex is driven by its promiscuity, leading to the ‘dual’ ensemble

$$p_B(\xi) = \prod_{i\mu} \left[ \frac{d_i}{\alpha N}\, \delta_{\xi_i^\mu, 1} + \left( 1 - \frac{d_i}{\alpha N} \right) \delta_{\xi_i^\mu, 0} \right] \tag{3}$$

where the $\{d_i\}$ are distributed according to $P(d) = N^{-1} \sum_i \delta_{d, d_i}$. Here as $N \to \infty$ the protein promiscuities $d_i(\xi)$ are Poissonian variables with averages $d_i$, whereas all complex sizes $q_\mu(\xi)$ are Poissonian variables with identical average $\langle q \rangle = \langle d \rangle / \alpha$, since

$$p(q) = \lim_{N\to\infty} \Big\langle \delta_{q, \sum_i \xi_i^\mu} \Big\rangle = \lim_{N\to\infty} \int_{-\pi}^{\pi} \frac{d\omega}{2\pi}\, e^{i\omega q}\, \Big\langle e^{-i\omega \sum_i \xi_i^\mu} \Big\rangle = \int_{-\pi}^{\pi} \frac{d\omega}{2\pi}\, e^{i\omega q + \frac{\langle d\rangle}{\alpha} (e^{-i\omega} - 1)} = e^{-\langle d\rangle/\alpha}\, \frac{(\langle d\rangle/\alpha)^q}{q!} \tag{4}$$

In this second ensemble proteins have intrinsic promiscuities, determined e.g. by the number of their binding sites, their polarization and so on, and these drive their recruitment to complexes.
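As an illustration of definitions (1) and (3), the following minimal Python sketch draws a bipartite matrix $\xi$ with independent entries and measures the resulting promiscuities and complex sizes. The function names (`sample_xi`, `sample_ensemble_a`, `sample_ensemble_b`, `promiscuities`, `complex_sizes`) and the storage convention `xi[mu][i]` are assumptions made for this example only and do not come from the paper; the target sizes and promiscuities used in the demo are arbitrary.

```python
import random


def sample_xi(link_prob, n_proteins, n_complexes, seed=0):
    """Draw xi[mu][i] in {0, 1} independently, with P(xi = 1) = link_prob(i, mu)."""
    rng = random.Random(seed)
    return [[1 if rng.random() < link_prob(i, mu) else 0
             for i in range(n_proteins)]
            for mu in range(n_complexes)]


def sample_ensemble_a(q, n_proteins, seed=0):
    """Complex-driven ensemble (1): P(xi_i^mu = 1) = q_mu / N."""
    return sample_xi(lambda i, mu: q[mu] / n_proteins, n_proteins, len(q), seed)


def sample_ensemble_b(d, n_complexes, seed=0):
    """Protein-driven ensemble (3): P(xi_i^mu = 1) = d_i / (alpha N)."""
    return sample_xi(lambda i, mu: d[i] / n_complexes, len(d), n_complexes, seed)


def promiscuities(xi):
    """d_i(xi): number of complexes in which protein i participates."""
    return [sum(row[i] for row in xi) for i in range(len(xi[0]))]


def complex_sizes(xi):
    """q_mu(xi): number of protein species in complex mu."""
    return [sum(row) for row in xi]


if __name__ == "__main__":
    n = 400                                   # N proteins, alpha = 1 (assumed values)
    xi = sample_ensemble_a([3] * n, n, seed=1)
    d, q = promiscuities(xi), complex_sizes(xi)
    # Link conservation: sum_i d_i = sum_mu q_mu, i.e. <d> = alpha <q>.
    print(sum(d) == sum(q), sum(d) / n)
```

For finite $N$ the measured averages fluctuate around the target values; only the link-conservation identity $\langle d(\xi)\rangle = \alpha\langle q(\xi)\rangle$ holds exactly for every sample.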
A third obvious choice is the ‘mixed’ ensemble

$$p_C(\xi) = \prod_{i\mu} \left[ \frac{d_i q_\mu}{\alpha N \langle q \rangle}\, \delta_{\xi_i^\mu, 1} + \left( 1 - \frac{d_i q_\mu}{\alpha N \langle q \rangle} \right) \delta_{\xi_i^\mu, 0} \right] \tag{5}$$

where all protein promiscuities and complex sizes are constrained on average, i.e. $\langle d_i(\xi) \rangle = d_i$ and $\langle q_\mu(\xi) \rangle = q_\mu$, with $\{d_i\}$ and $\{q_\mu\}$ distributed according to $P(d)$ and $P(q)$. Here protein binding statistics are driven both by complex functionality and protein promiscuity factors. The mixed ensemble (5) reduces to (1) for the choice $P(d) = \delta_{d, \alpha\langle q \rangle}$, and to (3) when $P(q) = \delta_{q, \langle q \rangle}$. By determining which of the above ensembles best reflects biological reality, we will thus learn about the mechanisms with which complexes and proteins recruit each other.

The above three ensembles become equivalent when $q_\mu = \langle q \rangle\ \forall \mu$ and $d_i = \alpha \langle q \rangle\ \forall i$. In that case complex sizes and protein promiscuities are homogeneous, and the recruitment process between proteins and complexes is fully random. Bipartite graphs drawn from (1) were found to have modular topologies, and to accomplish parallel information processing for suitable values of the parameter $\alpha$ [14, 12]. Their ensemble entropy has been calculated in [15]. One can show easily that if one replaces the soft constraints on the local degrees in our soft-constrained graph ensembles (1,3) by hard constraints, then one finds asymptotically the same distributions (2,4). Finally, we note that all three ensembles (1,3,5) are of the form $p(\xi) = \prod_{i\mu} p_{i\mu}(\xi_i^\mu)$, so there are no correlations between the entries of $\xi$. This strong assumption of our models will need to be checked a posteriori.

In all PINs each protein is reduced to a simple network node, in spite of the fact that proteins are in reality complex chains of amino acids with several binding domains.
Here we show that the ensembles introduced in the previous section can accommodate the presence of multiple binding sites when these are equally reactive. Let us first assume that each protein has $d$ functional reactive amino-acid endgroups. When two such proteins bind, the resulting dimer has $2d - 2$ endgroups, and in general a $k$-mer has $kd - 2(k - 1) = (d - 2)k + 2$ endgroups. If all endgroups are equally reactive, the a priori probability that a protein $i$ is part of a complex $\mu$ is given by

$$p(\xi_i^\mu = 1) = \frac{d\,[(d - 2) q_\mu + 2]}{Z} \simeq \frac{q_\mu d}{\alpha N \langle q \rangle} \tag{6}$$

where the last approximate equality holds for $d \gg 1$, with $Z = \sum_\mu q_\mu d = \alpha N \langle q \rangle d$. This corresponds to ensemble (1), with the choice $d = \alpha \langle q \rangle$. If proteins have different numbers of endgroups $d_i$, then

$$p(\xi_i^\mu = 1) \simeq \frac{d_i\,[(\bar{d} - 2) q_\mu + 2]}{\alpha N \langle q \rangle \bar{d}} \simeq \frac{d_i q_\mu}{\alpha N \langle q \rangle} \tag{7}$$

where $\bar{d} = N^{-1} \sum_i d_i$, leading to ensemble (5). If the variability of $q_\mu$ is small, $q_\mu \simeq \langle q \rangle$, then

$$p(\xi_i^\mu = 1) = \frac{d_i}{\alpha N} \tag{8}$$

and we retrieve (3). The assumption of unbiased interactions between proteins with varying individual binding affinities has been supported in [23].

Protein detection experiments seek to measure for each pair $(i, j)$ of protein species whether they interact in any complex, and assign an undirected link between nodes $i$ and $j$ if they do. Hence the PIN adjacency matrix $a = \{a_{ij}\}$ resulting from such experiments can be expressed in terms of the entries of the bipartite graph $\xi$ in Figure 1 via

$$a_{ij} = \theta\Big( \sum_{\mu=1}^{\alpha N} \xi_i^\mu \xi_j^\mu \Big) \quad \forall i \neq j \tag{9}$$

and $a_{ii} = 0\ \forall i$, with the convention $\theta(0) = 0$ for the step function, defined by $\theta(x > 0) = 1$ and $\theta(x < 0) = 0$. The aim of this paper hence translates into studying the properties of the following ensemble of nondirected random graphs, in which the $\{\xi_i^\mu\}$ are drawn from either of the ensembles (1,3,5):

$$p(a) = \Big\langle \prod_{i<j} \delta_{a_{ij},\, \theta(\sum_\mu \xi_i^\mu \xi_j^\mu)} \Big\rangle_\xi \tag{10}$$

We will also consider the associated ensemble of weighted graphs $c = \{c_{ij}\}$, in which $c_{ij} = \sum_\mu \xi_i^\mu \xi_j^\mu$ counts the number of complexes in which proteins $i$ and $j$ participate together:

$$p(c) = \Big\langle \prod_{i<j} \delta_{c_{ij},\, \sum_\mu \xi_i^\mu \xi_j^\mu} \Big\rangle_\xi \tag{11}$$
3. Network properties generated by the q-ensemble

In this section we study the statistical properties of the ensembles (11) and (10) upon generating the bipartite protein interaction graph $\xi$ from ensemble (1), where complexes recruit proteins.

For the graphs $c$ of (11) we find the following expectation values of individual bonds

$$\langle c_{ij} \rangle = \sum_{\mu=1}^{\alpha N} \langle \xi_i^\mu \xi_j^\mu \rangle_\xi = \sum_{\mu=1}^{\alpha N} \Big( \frac{q_\mu}{N} \Big)^2 = \frac{\alpha}{N} \langle q^2 \rangle \tag{12}$$

where the brackets on the right-hand side denote averaging over the complex size distribution $P(q)$. The likelihood of an individual bond is (see Appendix A)

$$\begin{aligned} p(c_{ij}) &= \Big\langle \delta_{c_{ij}, \sum_{\mu \leq \alpha N} \xi_i^\mu \xi_j^\mu} \Big\rangle_\xi \\ &= \delta_{c_{ij},0} + \frac{\alpha \langle q^2 \rangle}{N} \big( \delta_{c_{ij},1} - \delta_{c_{ij},0} \big) + \Big( \frac{\alpha^2 \langle q^2 \rangle^2}{2N^2} - \frac{\alpha \langle q^4 \rangle}{2N^3} \Big) \big( \delta_{c_{ij},2} - 2\delta_{c_{ij},1} + \delta_{c_{ij},0} \big) \\ &\quad + \frac{\alpha^3 \langle q^2 \rangle^3}{6N^3} \big( \delta_{c_{ij},3} - 3\delta_{c_{ij},2} + 3\delta_{c_{ij},1} - \delta_{c_{ij},0} \big) + O(N^{-4}) \end{aligned} \tag{13}$$

so we find for the first few probabilities:

$$p(0) = 1 - \frac{\alpha \langle q^2 \rangle}{N} + \frac{\alpha^2 \langle q^2 \rangle^2}{2N^2} - \frac{\alpha \langle q^4 \rangle}{2N^3} - \frac{\alpha^3 \langle q^2 \rangle^3}{6N^3} + O(N^{-4}) \tag{14}$$

$$p(1) = \frac{\alpha \langle q^2 \rangle}{N} - \frac{\alpha^2 \langle q^2 \rangle^2}{N^2} + \frac{\alpha \langle q^4 \rangle}{N^3} + \frac{\alpha^3 \langle q^2 \rangle^3}{2N^3} + O(N^{-4}) \tag{15}$$

and hence

$$\sum_{\ell > 1} p(\ell) = 1 - p(0) - p(1) = O(N^{-2}), \qquad \sum_{\ell > 1} \ell\, p(\ell) = \langle c_{ij} \rangle - p(1) = O(N^{-2}) \tag{16}$$

The probability to have $c_{ij} \neq 0$ is of order $O(N^{-1})$, so the graphs generated by (11) are finitely connected. Moreover, although the graphs $c$ are in principle weighted, for large $N$ the number of links per node that are not in $\{0, 1\}$ will be vanishingly small.
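The construction of the weighted graph $c_{ij} = \sum_\mu \xi_i^\mu \xi_j^\mu$ and of the binary graph $a_{ij} = \theta(c_{ij})$ from a given bipartite matrix can be sketched in a few lines of Python. This is a hedged illustration only: the helper names (`weighted_links`, `adjacency`, `mean_degree`) and the storage convention `xi[mu][i]` are assumptions of this example, and the tiny hand-made matrix is not data from the paper.

```python
from collections import Counter
from itertools import combinations


def weighted_links(xi):
    """c[(i, j)], i < j: number of complexes shared by proteins i and j."""
    c = Counter()
    for row in xi:                       # one row per complex mu
        members = [i for i, x in enumerate(row) if x]
        for i, j in combinations(members, 2):
            c[(i, j)] += 1
    return c


def adjacency(c):
    """Binary PIN links a_ij = theta(c_ij): pairs sharing at least one complex."""
    return {pair for pair, weight in c.items() if weight > 0}


def mean_degree(c, n_proteins):
    """<k> = (1/N) sum_{ij} c_ij, where each unordered pair contributes twice."""
    return 2 * sum(c.values()) / n_proteins


if __name__ == "__main__":
    # Two complexes on four proteins: {0, 1, 2} and {1, 2, 3}.
    xi = [[1, 1, 1, 0],
          [0, 1, 1, 1]]
    c = weighted_links(xi)
    print(dict(c), sorted(adjacency(c)), mean_degree(c, 4))
```

In this toy example proteins 1 and 2 share both complexes, so $c_{12} = 2$ while $a_{12} = 1$; according to (16) such multiply-supported links become rare for large $N$ in the ensemble (1).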
We now turn to the calculation of expectation values for different observables in ensemble (11). First, we calculate the average number of ordered and oriented loops of length 3 per node, which is (see Appendix A):

$$m_3 = \Big\langle \frac{1}{N} \sum_{ijk} c_{ij} c_{jk} c_{ki} \Big\rangle_\xi = \frac{1}{N} \sum_{\mu\nu\rho=1}^{\alpha N} \sum_{i \neq j \neq k} \Big\langle \xi_i^\mu \xi_j^\mu \xi_j^\nu \xi_k^\nu \xi_k^\rho \xi_i^\rho \Big\rangle_\xi \tag{17}$$

$$= \alpha \langle q^3 \rangle + O(N^{-1}) \tag{18}$$

Calculating the density of loops $m_L$ for lengths $L > 3$ along the same lines quickly becomes cumbersome; it is easier to reason in terms of the topology of the underlying bipartite graph $\xi$. We define a star $S_n$ to be a simple $(n+1)$-node tree in $\xi$, of which the central node belongs to $\nu_c$ (the complexes), and the $n$ leaves belong to $\nu_p$ (the proteins). Thus $S_2$ stars represent protein dimers, $S_3$ stars represent protein trimers, and so on. Each link in $c$ corresponds to at least one $S_2$ star in the bipartite graph (which, in turn, can be a subset of any $S_n$ star with $n > 2$). Hence the number of $S_2$ stars in the bipartite graph,

$$\sum_\mu \sum_{i \neq j} \langle \xi_i^\mu \xi_j^\mu \rangle = \sum_\mu \sum_{i \neq j} \langle \xi_i^\mu \rangle \langle \xi_j^\mu \rangle = \sum_{i \neq j} \sum_\mu \frac{q_\mu^2}{N^2} = \alpha (N - 1) \langle q^2 \rangle \tag{19}$$

has to equate in leading order the total number of links $N \langle k \rangle$ in graph $c$, yielding

$$\langle q^2 \rangle = \frac{\langle k \rangle}{\alpha} + O(N^{-1}) \tag{20}$$

which is indeed in agreement with the result of the direct calculation $\langle k \rangle = N^{-1} \sum_{ij} \langle c_{ij} \rangle$, using (12). Similarly we can obtain the number of loops of length 3, calculated earlier, by realising that these loops arise when we have in the bipartite graph either a star $S_3$ (which can be a subset of any $S_n$ with $n > 3$) or a combination of three $S_2$ stars, where every leaf is shared by two stars. The contribution of the number of $S_3$ stars per node to the number of loops of length 3 is

$$\frac{1}{N} \sum_\mu \sum_{i \neq j \neq k (\neq i)} \langle \xi_i^\mu \xi_j^\mu \xi_k^\mu \rangle = \frac{1}{N} \sum_\mu \sum_{i \neq j \neq k (\neq i)} \langle \xi_i^\mu \rangle \langle \xi_j^\mu \rangle \langle \xi_k^\mu \rangle = \frac{1}{N} \sum_\mu \sum_{i \neq j \neq k (\neq i)} \frac{q_\mu^3}{N^3} = \alpha \langle q^3 \rangle + O(N^{-1}) \tag{21}$$

whereas the contribution from combinations of three $S_2$ stars, where each leaf is shared by two stars, is

$$\frac{1}{N} \sum_{[\mu,\nu,\rho]} \sum_{[i,j,k]} \langle \xi_i^\mu \xi_j^\mu \xi_j^\nu \xi_k^\nu \xi_k^\rho \xi_i^\rho \rangle = \frac{1}{N} \sum_{[\mu,\nu,\rho]} \sum_{[i,j,k]} \frac{q_\mu^2 q_\nu^2 q_\rho^2}{N^6} = \frac{1}{N} \alpha^3 \langle q^2 \rangle^3 + O(N^{-2}) \tag{22}$$

with the square brackets $[i, j, k]$ denoting that the three indices are distinct. The expected density of length-3 loops is the sum of an $O(1)$ contribution from $S_3$ stars, plus an $O(N^{-1})$ contribution from combinations of three $S_2$ stars that share leaves. For large $N$ the second contribution vanishes, and we recover $m_3 = \alpha \langle q^3 \rangle$. Likewise, the $O(1)$ contribution to the density of length-4 loops comes from $S_4$ stars in the bipartite graph, which consist of five sites (four leaves and one central node) and four links, each with probability $O(N^{-1})$. Combinations of two $S_3$ stars with two shared leaves, or of $S_2$ stars, always involve a number of links at least equal to the number of nodes and therefore yield sub-leading contributions.
Hence, the density of loops of length 4 is

$$m_4 = \frac{1}{N} \sum_\mu \sum_{[i,j,k,\ell]} \langle \xi_i^\mu \xi_j^\mu \xi_k^\mu \xi_\ell^\mu \rangle = \alpha \langle q^4 \rangle + O(N^{-1}) \tag{23}$$

More generally, the average density of loops of arbitrary length $L$ is given by

$$m_L = \alpha \langle q^L \rangle + O(N^{-1}) \tag{24}$$

For large $N$ the ratio $\alpha$ and the distribution $P(q)$ of complex sizes apparently determine in full the statistics of loops of arbitrary length in $c$, if the protein interactions are described by (1). Finally, we note that if $m_L$ gives the number of ordered and oriented loops of length $L$ per node, the number of unordered and unoriented closed paths of length $L$ equals $\bar{m}_L = m_L / (2L)$, i.e. $\bar{m}_3 = m_3/6$ for triangles, since there are $L$ possible nodes to start a closed path from, and two possible orientations.

It follows from (20, 24) that by measuring the average degree $\langle k \rangle$ and the densities $m_L$ of loops of length $L$ we can compute all the moments of the distribution of complex sizes $P(q)$:

$$\langle q^2 \rangle = \langle k \rangle / \alpha, \qquad \forall L > 2: \quad \langle q^L \rangle = m_L / \alpha \tag{25}$$

This would allow us to calculate $P(q)$ in full via its generating function, provided $\alpha$ and $\langle q \rangle$ are known. However, counting the number of loops of arbitrary length in a graph is computationally challenging, and $\alpha$ and $\langle q \rangle$ are generally unknown. It is nevertheless possible to express $P(q)$ for large $N$ in terms of the degree distribution $p(k)$ of $c$. Specifically, in Appendix B we show that

$$\lim_{N\to\infty} p(k) = \int_0^\infty dy\, P(y)\, e^{-y} \frac{y^k}{k!} \tag{26}$$

where

$$P(y) = e^{-\alpha \langle q \rangle} \sum_{\ell \geq 0} \frac{(\alpha \langle q \rangle)^\ell}{\ell!} \sum_{q_1 \ldots q_\ell \geq 0} W(q_1) \ldots W(q_\ell)\, \delta\Big[ y - \sum_{r \leq \ell} q_r \Big] \tag{27}$$

and $W(q) = q P(q) / \langle q \rangle$ is the likelihood to draw a link attached to a complex-node of degree $q$ in the bipartite graph $\xi$. Formula (26) is easily interpreted. The degree of node $i$ in $c$ is given by the second neighbours of $i$ in $\xi$; the number $\ell$ of first neighbours of node $i$ will thus be a Poissonian variable with average $\alpha \langle q \rangle$, and each of its $\ell$ first neighbours will have a degree $q_r$ drawn from $W(q_r)$. Clearly, any tail in the distribution $W(q)$ will induce a tail in the distribution $p(k)$, with (as we will show below) the same exponent, but an amplitude that is rescaled by a factor $\alpha \langle q \rangle$. One can complement (26) with a reciprocal relation that gives $P(q)$ in terms of $p(k)$.
To achieve this we define the generating functions $Q_1(z) = \sum_k p(k) e^{-kz}$, $Q_2(z) = \int_0^\infty dy\, P(y) e^{-yz}$ and $Q_3(z) = \sum_q W(q) e^{-zq}$. We then see from expression (26) for $p(k)$ that

$$Q_1(z) = \int_0^\infty dy\, P(y)\, e^{-y} \sum_{k \geq 0} \frac{(y e^{-z})^k}{k!} = \int_0^\infty dy\, P(y)\, e^{y [e^{-z} - 1]} = Q_2(1 - e^{-z}) \tag{28}$$

$$Q_2(z) = e^{-\alpha \langle q \rangle} \sum_{\ell \geq 0} \frac{(\alpha \langle q \rangle)^\ell}{\ell!} \sum_{q_1 \ldots q_\ell \geq 0} W(q_1) \ldots W(q_\ell)\, e^{-z \sum_{r \leq \ell} q_r} = e^{-\alpha \langle q \rangle} \sum_{\ell \geq 0} \frac{(\alpha \langle q \rangle Q_3(z))^\ell}{\ell!} = e^{\alpha \langle q \rangle [Q_3(z) - 1]} \tag{29}$$

The first identity can be rewritten as $Q_1(-\log(1 - y)) = Q_2(y)$. Inserting this into (29) allows us to express the desired $Q_3(z)$ as

$$Q_3(z) = 1 + \frac{\log Q_2(z)}{\alpha \langle q \rangle} = 1 + \frac{\log Q_1(-\log(1 - z))}{\alpha \langle q \rangle} \tag{30}$$

which translates into

$$\sum_{q > 0} P(q)\, q\, e^{-zq} = \langle q \rangle + \frac{1}{\alpha} \log \sum_k p(k) (1 - z)^k \tag{31}$$

We can now extract the asymptotic form of $P(q)$ from that of $p(k)$. The generating functions $Q_1(z)$ of degree distributions that exhibit prominent tails, i.e. $p(k) \simeq C k^{-\mu}$ for large $k$ with $2 < \mu < 3$, are for small $z$ of the form

$$Q_1(z) = 1 - \langle k \rangle z + C\, \Gamma(1 - \mu)\, z^{\mu - 1} + \ldots \tag{32}$$

where $\Gamma$ is Euler’s gamma function [21]. For small $z$ we may use $1 - z \simeq e^{-z}$ to rewrite (30) as

$$\log Q_1(z) \simeq \alpha \langle q \rangle [Q_3(z) - 1] \tag{33}$$

Combining this with (32) then gives, for small $z$,

$$-\langle k \rangle z + C\, \Gamma(1 - \mu)\, z^{\mu - 1} \simeq \alpha \langle q \rangle [Q_3(z) - 1] \tag{34}$$

Hence, for small $z$, $Q_3(z)$ has the same form as $Q_1(z)$,

$$Q_3(z) = 1 - \frac{\langle k \rangle}{\alpha \langle q \rangle} z + \frac{C}{\alpha \langle q \rangle} \Gamma(1 - \mu)\, z^{\mu - 1} \tag{35}$$

Therefore $W(q)$ behaves asymptotically in the same way as $p(k)$, i.e. $W(q) \simeq (C / \alpha \langle q \rangle)\, q^{-\mu}$. This, in turn, gives

$$P(q) \simeq (C / \alpha)\, q^{-\mu - 1} \tag{36}$$

The complex size distribution $P(q)$ in (1) decays faster than the degree distribution of the associated $c$, so fat tails in the degree distribution of protein interaction networks can emerge from less heterogeneous complex size distributions. In particular, complex size distributions with a finite second moment (but diverging higher moments) give scale-free degree distributions in $c$. This is consistent with the intuition that, while large hubs are often observed in protein interaction networks, super-complexes of the same number of proteins are unlikely to be stable. Indeed, many interactions in hubs are ‘date’ type, as opposed to ‘party’ type [17]. Our framework allows us to discriminate between different types of hub proteins, and suggests that heterogeneities in PINs may emerge from homogeneous protein ‘dating’ and moderately heterogeneous protein ‘partying’.

Figure 2. Symbols: theoretical $\langle \ldots \rangle_{\rm th}$ versus measured $\langle \ldots \rangle_{\rm m}$ values of the observables $\langle k \rangle$, $\langle k^2 \rangle$, $m_3$ and $m_4$ in synthetic random graphs $c$ with $N = 3000$, defined via (1,11) for a power-law distributed complex size distribution $P(q)$. Theoretical values are given by formulae (37) for $\langle k \rangle$, (38) for $\langle k^2 \rangle$, (24) and (40) for $m_3$, and (24) and (41) for $m_4$. Dotted lines: the diagonals (as guides to the eye).

$P(q)$ and $\alpha$

The first two moments of $p(k)$ are given, to leading order in $N$, by (see Appendix B)

$$\langle k \rangle = \alpha \langle q^2 \rangle + O(N^{-1}) \tag{37}$$

which is in agreement with (20), and

$$\langle k^2 \rangle = \alpha \langle q^2 \rangle + \alpha^2 \langle q^2 \rangle^2 + \alpha \langle q^3 \rangle \tag{38}$$

The latter is easily interpreted in terms of the underlying bipartite graph: $\langle k^2 \rangle$ is the average density of paths of length two, so it has a contribution $\langle k \rangle = \alpha \langle q^2 \rangle$ due to backtracking, plus a contribution from pairs of $S_2$ stars that share a node, whose density is

$$\frac{1}{N} \sum_{[ijk]} \sum_{\mu \neq \nu} \langle \xi_i^\mu \xi_j^\mu \xi_j^\nu \xi_k^\nu \rangle = \frac{1}{N} \sum_{[ijk]} \sum_{\mu \neq \nu} \frac{q_\mu^2}{N^2} \frac{q_\nu^2}{N^2} = \alpha^2 \langle q^2 \rangle^2, \tag{39}$$

plus a contribution from $S_3$ stars, whose density is $\alpha \langle q^3 \rangle$ (as shown earlier). Combining (38) with (25) gives us a relation between the average and width of the degree distribution of $c$ and its density of length-3 loops.
Remarkably, this relation is completely independent of $\alpha$ and $P(q)$:

$$m_3 = \langle k^2 \rangle - \langle k \rangle^2 - \langle k \rangle \tag{40}$$

This identity and others, which all depend only on the separable underlying nature of the PIN and the assumption of complex-driven recruitment of proteins to complexes, can be derived more systematically from (31) by expanding both sides as power series in $z$ and comparing the expansion coefficients. This gives a hierarchy of relations between moments of $p(k)$ and $P(q)$, and hence (via (24)) between moments of $p(k)$ and densities of loops of increasing length, that are all completely independent of $\alpha$ and $P(q)$. At order $z^2$ one recovers (40). The next order $z^3$ leads to

$$\begin{aligned} m_4 &= \langle k^3 \rangle - 3 \langle k^2 \rangle + 2 \langle k \rangle + \langle k \rangle \big( 2 \langle k \rangle^2 - 3 \langle k^2 \rangle + 3 \langle k \rangle \big) \\ &= \langle k^3 \rangle - 3 \langle k^2 \rangle + 2 \langle k \rangle - \langle k \rangle^3 - 3 \langle k \rangle m_3 \end{aligned} \tag{41}$$

To test these asymptotic identities in finite systems, we generated random graphs $c$ of size $N = 3000$ according to (1,11), and compared the measured values of $m_3$ and $m_4$ in these random graphs with the predictions of formulae (40) and (41), respectively. We show the results in Figure 2.

$a$ and $c$ graph definitions

In conventional experimental PIN databases one records only whether or not protein pairs interact, not the number of complexes in which they interact. Hence, protein interactions are normally represented in terms of the adjacency matrix $a = \{a_{ij}\}$, which is related to the weighted matrix $c = \{c_{ij}\}$ via $a_{ij} = \theta(c_{ij})\ \forall (i \neq j)$, with the convention $\theta(0) = 0$ for the step function. We therefore have $p(a_{ij}) = \langle \delta_{c_{ij},0} \rangle\, \delta_{a_{ij},0} + (1 - \langle \delta_{c_{ij},0} \rangle)\, \delta_{a_{ij},1}$.
However, the links $\{a_{ij}\}$ are correlated. In Appendix C we derive the relation between the expected values of different graph observables for the two graph ensembles $p(a)$ and $p(c)$. Denoting averages in the $a$ ensemble as $\langle \ldots \rangle_a$, and using the usual notation $\langle \ldots \rangle$ for averages in the $c$ ensemble, one finds that for large $N$ the first two moments of the degree distributions and the first two loop densities in the two ensembles are identical:

$$\langle k \rangle_a = \frac{1}{N} \sum_{ij} \langle a_{ij} \rangle_a = \frac{1}{N} \sum_{ij} \big[ 1 - \langle \delta_{c_{ij},0} \rangle \big] = \alpha \langle q^2 \rangle + O(N^{-1}) = \langle k \rangle + O(N^{-1}) \tag{42}$$

$$\langle k^2 \rangle_a = \frac{1}{N} \sum_{i \neq j \neq k} \langle a_{ij} a_{jk} \rangle = \alpha \langle q^2 \rangle + \alpha^2 \langle q^2 \rangle^2 + \alpha \langle q^3 \rangle + O(N^{-1}) = \langle k^2 \rangle + O(N^{-1}) \tag{43}$$

$$m_3^a = \frac{1}{N} \sum_{i \neq j \neq k (\neq i)} \langle a_{ij} a_{jk} a_{ki} \rangle = \alpha \langle q^3 \rangle + O(N^{-1}) = m_3 + O(N^{-1}) \tag{44}$$

$$m_4^a = \frac{1}{N} \sum_{[i,j,k,\ell]} \langle a_{ij} a_{jk} a_{k\ell} a_{\ell i} \rangle = \alpha \langle q^4 \rangle + O(N^{-1}) = m_4 + O(N^{-1}) \tag{45}$$

Hence the two ensembles $p(a)$ and $p(c)$ are asymptotically equivalent with regard to the statistics of these four quantities. We will see in the next section that this equivalence holds also for the ‘dual’ ensemble (3). To test the above claims we compute and show in Figure 3 the above observables in synthetic graphs $c$ and $a$ generated randomly from (10,11), where the random bipartite interaction graph $\xi$ is drawn from (1).
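The consistency test based on relation (40) can be sketched as follows: measure $\langle k \rangle$, $\langle k^2 \rangle$ and the density $m_3$ of ordered, oriented triangles in a binary graph, and compare the measured $m_3$ with $\langle k^2 \rangle - \langle k \rangle^2 - \langle k \rangle$. The Python below is a minimal illustration under assumed conventions (links given as index pairs; the function names are hypothetical). Relation (40) is an asymptotic ensemble property, so agreement is expected only for large graphs drawn from (1), not for arbitrary small graphs such as the complete graph used in the demo.

```python
def neighbours(links, n_nodes):
    """Neighbour sets of an undirected graph given as a collection of (i, j) pairs."""
    nb = [set() for _ in range(n_nodes)]
    for i, j in links:
        nb[i].add(j)
        nb[j].add(i)
    return nb


def degree_moments(nb):
    """Return (<k>, <k^2>) of the degree sequence."""
    n = len(nb)
    k1 = sum(len(s) for s in nb) / n
    k2 = sum(len(s) ** 2 for s in nb) / n
    return k1, k2


def m3_measured(nb):
    """Ordered, oriented triangles per node: (1/N) sum_{ijk} a_ij a_jk a_ki."""
    total = sum(len(nb[i] & nb[j]) for i in range(len(nb)) for j in nb[i])
    return total / len(nb)


def m3_predicted(k1, k2):
    """Right-hand side of relation (40): <k^2> - <k>^2 - <k>."""
    return k2 - k1 ** 2 - k1


if __name__ == "__main__":
    # Complete graph on 4 nodes: 4 triangles, i.e. 24 ordered oriented loops.
    links = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    nb = neighbours(links, 4)
    k1, k2 = degree_moments(nb)
    print(k1, k2, m3_measured(nb), m3_predicted(k1, k2))
```

A large discrepancy between the two values of $m_3$ in a real dataset would then indicate either that the data are unreliable or that the complex-driven ensemble (1) is not an adequate description.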
4. Network properties generated by the d-ensemble

In this section we will derive properties for the network ensembles (10,11) upon assuming that the statistics of the underlying bipartite protein interaction network are given by (3), i.e. are protein-driven as opposed to complex-driven. In spite of the superficial similarity between definitions (2) and (4), the expectations of graph observables in the two ensembles are found to be remarkably different.
We start by calculating the link expectation values in the weighted graphs $c_{ij}=\sum_\mu \xi_i^\mu\xi_j^\mu$:

$$\langle c_{ij}\rangle = \sum_\mu\langle\xi_i^\mu\xi_j^\mu\rangle = \frac{d_id_j}{\alpha N} \qquad (46)$$

Hence the random graphs $\mathbf{c}$ are again finitely connected, now with

$$\langle k\rangle = \frac{1}{N}\sum_{ij}\langle c_{ij}\rangle = \frac{\langle d\rangle^2}{\alpha} \qquad (47)$$

Averages over $d$ refer to the distribution $P(d)$ of protein promiscuities in the bipartite graph $\boldsymbol{\xi}$. The result (47) can also be written as $\langle k\rangle=\alpha\langle q\rangle^2$, and is thus notably different from the earlier expression $\langle k\rangle=\alpha\langle q^2\rangle$ found in the q-ensemble. The link likelihood is calculated in Appendix A, and shows again that $p(c_{ij}>1)=O(N^{-2})$.

We can calculate the density of length-3 loops in the same way as for the q-ensemble in the previous section. Again these are given, to order $O(1)$, by the $S_3$ stars in the bipartite graph, since the contribution from combinations of $S_2$ stars is as before $O(N^{-1})$. Here we obtain

$$m_3 = \frac{1}{N}\sum_{[ijk]}\sum_\mu\langle\xi_i^\mu\xi_j^\mu\xi_k^\mu\rangle = \frac{1}{N}\sum_{[ijk]}\sum_\mu\frac{d_id_jd_k}{\alpha^3N^3} = \frac{\langle d\rangle^3}{\alpha^2} \qquad (48)$$

For loops of arbitrary length $L$ this generalises to

$$m_L = \langle d\rangle^L/\alpha^{L-1} \qquad (49)$$

Interestingly, the densities $m_L$ of short loops and the average connectivity $\langle k\rangle$ depend on $P(d)$ only through its first moment. Promiscuity heterogeneity apparently cannot affect the densities of short loops. In the present ensemble these densities must therefore be identical to what would be found in a randomly wired bipartite graph. This prediction will be confirmed in simulations.

In Appendix B we calculate the asymptotic degree distribution of $\mathbf{c}$ for the protein-driven complex recruitment model (3), giving

$$p(k) = \lim_{N\to\infty}\frac{1}{N}\sum_i\Big\langle\delta_{k,\sum_jc_{ij}}\Big\rangle = \sum_{d\ge0}P(d)\sum_{\ell\ge0}\Big(e^{-d}d^\ell/\ell!\Big)\,e^{-\ell\langle d\rangle/\alpha}\Big(\frac{\ell\langle d\rangle}{\alpha}\Big)^k/k! \qquad (50)$$

This result is again understood easily: the number of neighbours of a node $i$ is a Poissonian variable $\ell$, with average $d$, where $d$ is now drawn from $P(d)$. Each of the $\ell$ first neighbours will have a degree which is a Poissonian variable with average $\langle d\rangle/\alpha$, so the number $k$ of second neighbours of $i$ in the bipartite graph is a Poisson variable with average $\ell\langle d\rangle/\alpha$. Equation (50) shows that a tail in the promiscuity distribution $P(d)$ will induce a tail in the degree distribution $p(k)$ of $\mathbf{c}$.
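The compound-Poisson structure of (50) can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the truncation limits `k_max` and `l_max`, and the test distribution $P(d)=\delta_{d,2}$ with $\alpha=1$ are all illustrative assumptions. It evaluates (50) by direct summation and compares the first two moments with the closed forms (56) and (57).

```python
import math

def poisson_pmf(n, mean):
    """Poisson probability of n events for a given mean (log-space for stability)."""
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))

def degree_distribution(P_d, alpha, k_max=200, l_max=40):
    """Evaluate p(k) of eq. (50) for a promiscuity distribution P_d = {d: P(d)}.

    k_max and l_max are hypothetical truncation limits for the two sums.
    """
    mean_d = sum(d * w for d, w in P_d.items())
    p = [0.0] * (k_max + 1)
    for d, w in P_d.items():
        for l in range(l_max + 1):
            w_l = w * poisson_pmf(l, float(d))          # l ~ Poisson(d)
            for k in range(k_max + 1):                  # k | l ~ Poisson(l <d>/alpha)
                p[k] += w_l * poisson_pmf(k, l * mean_d / alpha)
    return p

# Illustrative choice: every protein has promiscuity d = 2, and alpha = 1.
p = degree_distribution({2: 1.0}, alpha=1.0)
norm = sum(p)
mean_k = sum(k * pk for k, pk in enumerate(p))
mean_k2 = sum(k * k * pk for k, pk in enumerate(p))

# Closed forms: <k> = <d>^2/alpha and <k^2> = <d>^2/alpha + <d>^3/alpha^2 + <d^2><d>^2/alpha^2
expected_k = 2.0**2 / 1.0
expected_k2 = 2.0**2 / 1.0 + 2.0**3 / 1.0**2 + 4.0 * 2.0**2 / 1.0**2
```

With these choices the truncation error is negligible, so `mean_k` and `mean_k2` should agree with `expected_k` and `expected_k2` to high precision.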
The link between the two distributions is again most easily expressed via generating functions. Upon defining $Q_1(z)=\sum_kp(k)e^{-zk}$ and $Q_4(z)=\sum_dP(d)e^{-zd}$, we obtain from (50):

$$Q_1(z) = \sum_{d\ge0}P(d)e^{-d}\sum_\ell\Big(d\,e^{\langle d\rangle(e^{-z}-1)/\alpha}\Big)^\ell/\ell! = Q_4\Big(1-e^{\langle d\rangle(e^{-z}-1)/\alpha}\Big) \qquad (51)$$

For small $z$ this gives

$$Q_1(z)\simeq Q_4(z\langle d\rangle/\alpha) \qquad (52)$$

Hence, if $p(k)$ decays for large $k$ as $p(k)\simeq Ck^{-\mu}$ with $2<\mu<3$, then via (32) we infer that

$$Q_4(z\langle d\rangle/\alpha)\simeq 1-\langle k\rangle z + C\Gamma(1-\mu)z^{\mu-1} \qquad (53)$$

Equivalently,

$$Q_4(x)\simeq 1-\alpha\langle k\rangle x/\langle d\rangle + C\Gamma(1-\mu)(\alpha/\langle d\rangle)^{\mu-1}x^{\mu-1} \qquad (54)$$

This implies that for large $d$ the promiscuity distribution will be of the form $P(d)\simeq C'd^{-\mu}$, where

$$C' = C(\alpha/\langle d\rangle)^{\mu-1} = C\langle q\rangle^{1-\mu} \qquad (55)$$

Any tail in the promiscuity distribution will produce the same tail in the degree distribution of $\mathbf{c}$, but with a rescaled amplitude. Fat tails in the degree distribution of protein interaction networks can thus arise from equally heterogeneous 'dating' interactions between proteins, combined with a homogeneous distribution of 'party' interactions. Short loops are boosted by broad distributions of complex sizes, since large complexes in the bipartite graph induce large cliques in the network $\mathbf{c}$. The d-ensemble (3), which attributes any heterogeneity in $p(k)$ to heterogeneity of protein binding promiscuities, generates separable PIN graphs $\mathbf{c}$ with the least number of loops. Conversely, the q-ensemble (1), which attributes all heterogeneity in $p(k)$ to heterogeneity in complex sizes, generates separable PIN graphs $\mathbf{c}$ with the largest number of loops.

… $P(d)$ and $\alpha$

The first two moments of the degree distribution $p(k)$ of the separable PIN networks $\mathbf{c}$ are

$$\langle k\rangle = \sum_kkp(k) = \sum_dP(d)\sum_\ell e^{-d}\frac{d^\ell}{\ell!}\,\frac{\ell\langle d\rangle}{\alpha} = \langle d\rangle^2/\alpha \qquad (56)$$

$$\langle k^2\rangle = \sum_kk^2p(k) = \sum_dP(d)\sum_\ell e^{-d}\frac{d^\ell}{\ell!}\Big[\Big(\frac{\ell\langle d\rangle}{\alpha}\Big)^2+\frac{\ell\langle d\rangle}{\alpha}\Big] = \langle d\rangle^2/\alpha + \langle d\rangle^3/\alpha^2 + \langle d^2\rangle\langle d\rangle^2/\alpha^2 \qquad (57)$$

Combination of (63), (57) and (48) now yields the relation

$$\langle d^2\rangle/\alpha = (\langle k^2\rangle-\langle k\rangle-m_3)/\langle k\rangle \qquad (58)$$

which still involves $\langle d^2\rangle$ and $\alpha$. We can also find an alternative expression for the density of loops of length 3 by combining (63) and (48):

$$m_3 = \langle k\rangle^{3/2}/\sqrt{\alpha} \qquad (59)$$

Unfortunately, neither of our two expressions involving $m_3$, (58) nor (59), is useful in practice, because the protein promiscuity distribution $P(d)$ and the ratio $\alpha$ are generally unknown. Access to information on these quantities via future detection experiments would therefore be extremely welcome in support of theoretical modelling of protein interaction datasets. To make progress, we need to derive relations for graph observables that are independent of $\alpha$ and $P(d)$. We note that (49) yields

$$\forall L\ge3:\quad m_{L+1}/m_L = \langle d\rangle/\alpha \qquad (60)$$

This can be rewritten, using (63), as

$$\forall L\ge3:\quad m_{L+1}/m_L = \sqrt{\langle k\rangle/\alpha} \qquad (61)$$

On the other hand, we know from (59) that $m_3/\langle k\rangle=\sqrt{\langle k\rangle/\alpha}$. Combining the above formulae allows us to establish the following relation, which is now completely independent of $P(d)$ and $\alpha$:

$$m_4 = m_3^2/\langle k\rangle \qquad (62)$$

Again we have tested the various formulae in synthetically generated graphs, see Figure 4.

… $\mathbf{a}$ and $\mathbf{c}$ graph definitions

As a final step, we check whether the observables $m_3$ and $m_4$ are indeed the same for the two PIN definitions (10, 11), with the bipartite graph of our protein-driven ensemble (3), since protein detection experiments provide the binary matrix $\mathbf{a}$ as opposed to the weighted graph $\mathbf{c}$ for which (66) was derived. Again we denote averages relating to $\mathbf{a}$ as $\langle\ldots\rangle_a$, and those relating to $\mathbf{c}$ as $\langle\ldots\rangle$. For the moments of the degree distributions we find the differences to be negligible:

$$\langle k\rangle_a = \frac{1}{N}\sum_{ij}\langle a_{ij}\rangle_a = \frac{\langle d\rangle^2}{\alpha} + O(N^{-1}) = \langle k\rangle + O(N^{-1}) \qquad (63)$$

$$\langle k^2\rangle_a = \frac{1}{N}\sum_{i\neq j\neq k}\langle a_{ij}a_{jk}\rangle = \frac{\langle d\rangle^2}{\alpha}+\frac{\langle d\rangle^3}{\alpha^2}+\frac{\langle d^2\rangle\langle d\rangle^2}{\alpha^2} + O(N^{-1}) = \langle k^2\rangle + O(N^{-1}) \qquad (64)$$

The same is true for the densities of loops of length 3 and 4:

$$m_3^a = \frac{1}{N}\sum_{i\neq j\neq k(\neq i)}\langle a_{ij}a_{jk}a_{ki}\rangle = \frac{\langle d\rangle^3}{\alpha^2} + O(N^{-1}) = m_3 + O(N^{-1}) \qquad (65)$$

$$m_4^a = \frac{1}{N}\sum_{[i,j,k,\ell]}\langle a_{ij}a_{jk}a_{k\ell}a_{\ell i}\rangle = \frac{\langle d\rangle^4}{\alpha^3} + O(N^{-1}) = m_4 + O(N^{-1}) \qquad (66)$$

The equivalence of the ensembles $p(\mathbf{a})$ and $p(\mathbf{c})$ when calculating the main average values of graph observables for large $N$ implies that large protein interaction adjacency matrices can in practice be regarded as having a separable structure. Again, we check our relations (63, 57, 65, 66) against synthetically generated graphs and show results in figure 4.5.
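As a worked check of the d-ensemble relations, the sketch below evaluates (56), (57), (48) and (49) for given $\langle d\rangle$, $\langle d^2\rangle$ and $\alpha$, and confirms that the parameter-free relation (62) and the relation (58) hold. The function names and the numerical inputs are illustrative assumptions, not values taken from the paper.

```python
def d_ensemble_observables(mean_d, mean_d2, alpha):
    """Leading-order d-ensemble predictions for <k>, <k^2>, m_3 and m_4."""
    k1 = mean_d**2 / alpha                                           # eq. (56)
    k2 = k1 + mean_d**3 / alpha**2 + mean_d2 * mean_d**2 / alpha**2  # eq. (57)
    m3 = mean_d**3 / alpha**2                                        # eq. (48)
    m4 = mean_d**4 / alpha**3                                        # eq. (49) with L = 4
    return k1, k2, m3, m4

def predicted_m4(k1, m3):
    """Eq. (62): length-4 loop density from measurable quantities only."""
    return m3**2 / k1

# Hypothetical inputs: <d> = 3, <d^2> = 12, alpha = 1.5
k1, k2, m3, m4 = d_ensemble_observables(mean_d=3.0, mean_d2=12.0, alpha=1.5)
m4_from_62 = predicted_m4(k1, m3)          # should coincide with m4
d2_over_alpha = (k2 - k1 - m3) / k1        # eq. (58), should equal <d^2>/alpha
```

In an application to data only `predicted_m4` would be usable, since it needs just the measured $\langle k\rangle$ and $m_3$; the other quantities require the unknown $P(d)$ and $\alpha$.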
5. Macroscopic observables in the mixed ensemble
The two bipartite graph ensembles (1, 3) considered so far led to Poissonian distributions either for the protein promiscuities $d_i$ (in the q-ensemble), or for the complex sizes $q_\mu$ (in the d-ensemble). It is possible to model heterogeneity in both $d_i$ and $q_\mu$ using the mixed ensemble (5). Due to the similarities with previous calculations we will be more brief in this section. For ensemble (5) the expectation values of individual links in the weighted graph $\mathbf{c}$ are

$$\langle c_{ij}\rangle = \sum_\mu\langle\xi_i^\mu\xi_j^\mu\rangle = \sum_\mu\frac{d_id_jq_\mu^2}{\alpha^2\langle q\rangle^2N^2} = \frac{d_id_j\langle q^2\rangle}{\alpha\langle q\rangle^2N} + O(N^{-3/2}) \qquad (67)$$

and the average connectivity follows as

$$\langle k\rangle = \frac{1}{N}\sum_{ij}\langle c_{ij}\rangle = \frac{\langle d\rangle^2\langle q^2\rangle}{\alpha\langle q\rangle^2} + O(N^{-1/2}) = \alpha\langle q^2\rangle + O(N^{-1/2}) \qquad (68)$$

Full details are found in Appendix A. As in previous ensembles, the leading contribution to the density of length-3 loops comes from the $S_3$ stars in the bipartite graphs, now giving

$$m_3 = \frac{1}{N}\sum_{[ijk]}\sum_\mu\langle\xi_i^\mu\xi_j^\mu\xi_k^\mu\rangle = \frac{1}{N}\sum_{[ijk]}\sum_\mu\frac{d_id_jd_kq_\mu^3}{\alpha^3\langle q\rangle^3N^3} \simeq \frac{\langle d\rangle^3\langle q^3\rangle}{\alpha^2\langle q\rangle^3} = \alpha\langle q^3\rangle \qquad (69)$$

As before, the heterogeneity in the $d$ affects neither the average connectivity $\langle k\rangle$ nor the density of triangles $m_3$; both are as they were in the q-ensemble. This is confirmed numerically, see Figure 6. The degree distribution for large $N$ in the ensemble $p(\mathbf{c})$ is calculated in Appendix B, giving

$$p(k) = \int_0^\infty\!dy\,P(y)\,e^{-y}y^k/k! \qquad (70)$$

where

$$P(y) = \sum_dP(d)e^{-d}\sum_{\ell\ge0}\frac{d^\ell}{\ell!}\sum_{q_1\ldots q_\ell\ge0}W(q_1)\ldots W(q_\ell)\,\delta\Big[y-\sum_{r\le\ell}q_r\Big] \qquad (71)$$

Again it is possible to relate the asymptotic behaviour of $p(k)$ to that of $P(d)$ and $W(q)$, by inspecting the relation between the relevant generating functions. Using our previous definitions for $Q_1(z)$, $Q_2(z)$, $Q_3(z)$ and $Q_4(z)$, we obtain via (70) and (71):

$$Q_1(z) = \int\!dy\,P(y)\sum_ke^{-y}(ye^{-z})^k/k! = \int\!dy\,P(y)e^{-y(1-e^{-z})} = Q_2(1-e^{-z}) \qquad (72)$$

$$Q_2(z) = \sum_dP(d)e^{-d}\sum_\ell\frac{d^\ell}{\ell!}\prod_{r=1}^\ell\Big(\sum_{q_r}W(q_r)e^{-zq_r}\Big) = \sum_dP(d)e^{-d}\sum_\ell\frac{d^\ell}{\ell!}Q_3^\ell(z) = \sum_dP(d)e^{-d[1-Q_3(z)]} = Q_4(1-Q_3(z)) \qquad (73)$$

Expansion of (72) for small $z$ tells us that $Q_1(z)\simeq Q_2(z)$. Substitution into (73) subsequently gives

$$Q_1(z)\simeq Q_4(1-Q_3(z)) \qquad (74)$$

Assuming $W(q)$ to have a power-law tail, but with a finite first moment (as in all cases previously considered), i.e. $W(q)\simeq Kq^{-\gamma}$ with $\gamma>2$, its generating function $Q_3(z)$ can be written as

$$Q_3(z) = 1-\langle q^2\rangle z/\langle q\rangle + O(z^\delta) \qquad (75)$$

where $\delta=\min\{2,\gamma-1\}$. Insertion into (74) then leads to

$$Q_1(z)\simeq Q_4\big(z\langle q^2\rangle/\langle q\rangle - O(z^\delta)\big) \qquad (76)$$

If $p(k)=Ck^{-\mu}$, with $2<\mu<3$, we may use our earlier result (32) and get

$$Q_4\Big(x-O\big((x\langle q\rangle/\langle q^2\rangle)^\delta\big)\Big)\simeq 1-\langle k\rangle\langle q\rangle x/\langle q^2\rangle + C\Gamma(1-\mu)(\langle q\rangle/\langle q^2\rangle)^{\mu-1}x^{\mu-1} \qquad (77)$$

If $\gamma>\mu$ we have $\delta>\mu-1$, so we can neglect the second term in the argument of $Q_4$ and conclude that the promiscuity distribution has the asymptotic form $P(d)=C'd^{-\mu}$ where $C'=C(\langle q^2\rangle/\langle q\rangle)^{1-\mu}$. This means that if $W(q)$ decays faster than $p(k)$ (as in Section 4), then the tail in $p(k)$ must arise from the tail in $P(d)$. Note, however, that heterogeneities in $P(q)$ will affect the amplitude of the power-law tail in $P(d)$, which will be smaller by a factor $(\langle q^2\rangle/\langle q\rangle^2)^{1-\mu}$ compared to the case where $P(q)=\delta_{q,\langle q\rangle}$, where we had $C'=C\langle q\rangle^{1-\mu}$. Conversely, if $\gamma=\mu$ we have $\delta=\mu-1$, and writing the $O(z^\delta)$ term explicitly in (76) gives

$$Q_4\big(z\langle q^2\rangle/\langle q\rangle - K\Gamma(1-\mu)z^{\mu-1}\big) = 1-\langle k\rangle z + C\Gamma(1-\mu)z^{\mu-1} \qquad (78)$$

Expanding both sides in powers of $z$ and equating prefactors tells us that either $C'=0$ and $C=K\langle d\rangle$ (i.e. $K=C/\alpha\langle q\rangle$, which retrieves the case in Section 3), or $P(d)$ has a tail with the same exponent $\mu$, with amplitudes related via $K\langle d\rangle + C'(\langle q^2\rangle/\langle q\rangle)^{\mu-1}=C$. Hence, if $P(d)$ is as broad as $W(q)$, then both contribute to the tail in $p(k)$, whose amplitude will be the sum of the amplitudes of the tails in $P(q)$ and $P(d)$. We see in (77) that $\gamma<\mu$ is not possible, i.e. $W(q)$ needs to decay at least as fast as $p(k)$.

In Appendix B we calculate the first two moments of the degree distribution $p(k)$ of the ensemble $p(\mathbf{c})$. This recovers (68) for the first moment, and for the second moment gives

$$\langle k^2\rangle = \alpha\langle q^2\rangle + \alpha\langle q^3\rangle + \langle d^2\rangle\langle k\rangle^2/\langle d\rangle^2 \qquad (79)$$

Substituting (68) and (69) into (79) then leads to

$$m_3 = \langle k^2\rangle - \langle k\rangle - \langle k\rangle^2\langle d^2\rangle/\langle d\rangle^2 \qquad (80)$$

The density of length-3 loops depends again on the first two moments of the degree distribution $p(k)$, but is also seen to depend on the first two moments of the promiscuity distribution $P(d)$, which is unknown. Hence, this relation cannot serve as a test of PIN data quality. It is nevertheless useful for comparing the mixed ensemble to the d- and the q-ensembles in synthetically generated data.
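The mixed-ensemble moments (68), (69), (79) and (80) can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name and all numerical inputs are hypothetical, and $\langle d\rangle$ is fixed through the link-counting identity $\langle d\rangle=\alpha\langle q\rangle$. It also shows that for homogeneous promiscuities, $\langle d^2\rangle=\langle d\rangle^2$, expression (79) reduces to the q-ensemble second moment $\alpha\langle q^2\rangle+\alpha^2\langle q^2\rangle^2+\alpha\langle q^3\rangle$.

```python
def mixed_ensemble_observables(alpha, q_moments, mean_d2):
    """Leading-order mixed-ensemble predictions.

    q_moments = (<q>, <q^2>, <q^3>); mean_d2 = <d^2>.
    <d> is not free: counting bipartite links gives <d> = alpha <q>.
    """
    q1, q2, q3 = q_moments
    mean_d = alpha * q1
    k1 = alpha * q2                                             # eq. (68)
    m3 = alpha * q3                                             # eq. (69)
    k2 = alpha * q2 + alpha * q3 + mean_d2 * k1**2 / mean_d**2  # eq. (79)
    return mean_d, k1, k2, m3

# Hypothetical inputs: alpha = 0.5, <q> = 4, <q^2> = 20, <q^3> = 120, <d^2> = 6
mean_d, k1, k2, m3 = mixed_ensemble_observables(0.5, (4.0, 20.0, 120.0), 6.0)

# Eq. (80): recover m_3 from the degree moments and the promiscuity moments
m3_from_80 = k2 - k1 - k1**2 * 6.0 / mean_d**2

# Homogeneous promiscuities (<d^2> = <d>^2) give back the q-ensemble result
_, k1_h, k2_h, _ = mixed_ensemble_observables(0.5, (4.0, 20.0, 120.0), mean_d**2)
k2_q_ensemble = k1_h + k1_h**2 + 0.5 * 120.0
```

The second call makes the role of the last term in (79) explicit: any excess of $\langle d^2\rangle$ over $\langle d\rangle^2$ raises $\langle k^2\rangle$ above its q-ensemble value while leaving $\langle k\rangle$ and $m_3$ unchanged.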
6. Numerical comparison of the three bipartite generative ensembles
Here we compare the ability of our bipartite ensembles (1, 3, 5) to predict properties of the associated binary PIN graphs, for synthetic networks that are generated from any of these ensembles. We focus on comparing homologous formulae for the observables $\langle k\rangle$, $\langle k^2\rangle$, $m_3$ and $m_4$. The synthetic matrices $\mathbf{a}=\{a_{ij}\}$ with $a_{ij}\in\{0,1\}$ are defined as before via $a_{ij}=\theta(\sum_\mu\xi_i^\mu\xi_j^\mu)$, with $\theta(0)=0$, and the links of the bipartite graph $\boldsymbol{\xi}$ are generated from the following three protocols. In the first protocol, links between nodes $(i,\mu)$ are drawn randomly and independently, until their total number reaches a prescribed limit. In the second protocol, we assign the links preferentially to complexes with large sizes. In the third protocol we assign links preferentially to proteins with large promiscuities.

In Figure 6 we show along the vertical axes the values of $\langle k\rangle$ (left) predicted by the three ensembles, via formulae (37), (47) and (68), the predicted values of $\langle k^2\rangle$ (middle), via (38), (57) and (79), and the predicted triangle density $m_3$ (right), via (40), (58) and (80). All are shown together with the corresponding values that were measured in $\mathbf{a}$, along the horizontal axis. As expected, the d-ensemble outperforms the other ensembles when links are drawn according to d-preferential attachment, whereas the q-ensemble performs better for graphs generated via q-preferential attachment. The mixed ensemble performs very similarly to the q-ensemble in terms of counting triangles, as expected from the reasoning in Section 5.
Deviations between the q-ensemble and the mixed ensemble are most evident in the second moment of the degree distribution, where the mixed ensemble always leads to values well above those of the q- and the d-ensembles. We found in Section 4 that the d-ensemble is indistinguishable from a fully random ensemble when calculating $\langle k\rangle$ and $m_3$, which explains why the d-ensemble predicts the values of these two observables perfectly. The other two ensembles are more sensitive to finite size effects, as any heterogeneity in the $q$ will boost the number of loops.

In Figure 7 we show the values of $m_3$ and $m_4$ predicted by those formulae that involve only measurable graph observables, for the synthetically generated graphs used in Figure 6. The prediction of $m_3$ is now obtained from (40) and (66), for the q- and d-ensembles respectively, and $m_4$ is evaluated using (41) and (66). In Figure 8 we plot the degree distribution $p(k)$ of graphs with identical values for the number of nodes ($N=3000$) and the number of links $L=N\alpha\langle q\rangle$, generated synthetically via the three chosen protocols, together with the distributions $P(q)$ of complex sizes and $P(d)$ of protein promiscuities. As explained in Section 5, tails in the degree distribution $p(k)\sim k^{-\mu}$ can arise either from a complex size distribution $P(q)\sim q^{-\mu-1}$ and a homogeneous promiscuity distribution, or from having an equally fat tail in the promiscuity distribution $P(d)\sim d^{-\mu}$ together with less heterogeneous complex sizes $P(q)\sim q^{-\gamma-1}$ with $\gamma>\mu$.
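The construction of the weighted and binary PIN graphs from a bipartite protein-complex graph, $c_{ij}=\sum_\mu\xi_i^\mu\xi_j^\mu$ and $a_{ij}=\theta(c_{ij})$, can be sketched as follows. The function name and the small example matrix are illustrative assumptions, and the sketch excludes self-links ($i=j$), in line with the sums over $i\neq j$ used throughout; it does not implement the three link-generation protocols themselves.

```python
def project_bipartite(xi):
    """Project a bipartite graph onto proteins.

    xi[mu][i] in {0, 1} says whether protein i belongs to complex mu.
    Returns (c, a): c[i][j] counts shared complexes, a[i][j] = theta(c[i][j]).
    """
    n = len(xi[0])
    c = [[0] * n for _ in range(n)]
    for row in xi:
        members = [i for i, x in enumerate(row) if x]
        for i in members:
            for j in members:
                if i != j:               # no self-links
                    c[i][j] += 1
    a = [[1 if c[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    return c, a

# Hypothetical example: 4 proteins, 3 complexes
xi = [
    [1, 1, 1, 0],   # complex 0 contains proteins 0, 1, 2 (induces a triangle)
    [0, 0, 1, 1],   # complex 1 contains proteins 2, 3
    [1, 1, 0, 0],   # complex 2 contains proteins 0, 1
]
c, a = project_bipartite(xi)
degrees = [sum(row) for row in a]
```

Proteins 0 and 1 share two complexes, so the weighted entry exceeds one while the binary entry is truncated to one; this is the only place where $\mathbf{c}$ and $\mathbf{a}$ differ in the example.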
7. Test against experimental protein interaction data
In this section we apply the results of our analyses to real publicly available protein interaction datasets, obtained via MS (mass spectrometry) and Y2H (yeast two-hybrid) experiments. The detailed quantitative features of the various data sets and their references are listed in Table 1.

Species             N      ⟨k⟩     k_max   Method   Reference
C.elegans
C.jejuni
E.coli
H.pylori            724    3.87    55      Y2H      [27]
H.sapiens I         1499   3.37    125     Y2H      [28]
H.sapiens II        1655   3.71    95      Y2H      [29]
H.sapiens III       2268   5.67    314     MS       [30]
M.loti
P.falciparum
S.cerevisiae I      991    1.82    24      Y2H      [33]
S.cerevisiae II     787    1.91    55      Y2H      [34]
S.cerevisiae III    3241   2.69    279     Y2H      [34]
S.cerevisiae IV     1576   4.58    62      MS       [35]
S.cerevisiae VI     1358   4.73    53      MS       [36]
S.cerevisiae VIII   2551   16.77   955     MS       [37]
S.cerevisiae IX     2708   5.25    141     MS       [38]
Synechocystis
T.pallidum          724    10.01   285     Y2H      [40]

Table 1.
List of the publicly available experimental protein interaction data sets used in the present study, together with their main quantitative characteristics (number of proteins $N$, average degree $\langle k\rangle$, and largest degree $k_{\max}$) and references.

Seven of the experimental PIN datasets in Table 1 were obtained by MS experiments, and they involve three distinct biological species, namely S.cerevisiae, H.sapiens and E.coli. Each set takes the form of an $N\times N$ matrix of binary entries $a_{ij}$, but with different values of $N$.

In Figure 9 we show the results of our analytical predictions for the densities of length-3 and length-4 loops, as given by the formulae for the bipartite q- and d-ensembles, versus their measured values in the MS datasets. The q-ensemble leads to values of the number of short loops consistently higher than those predicted by the d-ensemble. This could have been expected, since the q-ensemble induces large cliques in the protein interaction networks $\mathbf{c}$ and $\mathbf{a}$, which boosts short loops. In contrast, the d-ensemble induces a homogeneous distribution for the complex sizes, and thereby suppresses the presence of large cliques in the protein interaction networks. Remarkably, the values for length-4 loop densities of all the MS data sets are in between those of the d-ensemble (which thereby acts as a lower bound) and those of the q-ensemble (which acts as an upper bound). This suggests a compatibility of data from MS experiments with the expected separable form of the proteome network. However, the measured length-3 densities are consistently lower than the values compatible with a separable structure of the proteome.

We tested similarly the compatibility of Y2H data with a separable structure of the proteome, by checking whether the measured values for the network observables $m_3$ and $m_4$ fall within what appeared to be (in MS data) theoretical bounds set by the q- and d-ensembles. We now used the 12 PIN datasets in Table 1 that were obtained from Y2H experiments. Results are shown in Figure 10. We observe that Y2H datasets generally exhibit fewer short loops than MS datasets. This may be due to the fact that Y2H experiments mostly detect direct binding domain contacts in protein interactions, leading to an undersampling of links (and thereby to an underestimation of connectivity and loops).
However, Y2H data sets still show the same level of compatibility with a separable structure of the proteome as the MS datasets did, with measured values of $m_4$ that are fully compatible, and values for $m_3$ that fall below those predicted by the d-ensemble. This is quite remarkable, since MS and Y2H experiments are known to measure interactions in very different ways.
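The kind of test applied to the datasets can be sketched for a small binary adjacency matrix: measure $\langle k\rangle$, $m_3$ and $m_4$ directly from their definitions as sums over distinct ordered index tuples, and compare the measured $m_4$ with the d-ensemble prediction $m_3^2/\langle k\rangle$ of (62). This brute-force version is an assumption-laden illustration (the function name and the toy graphs are hypothetical, and the cost grows as $N^4$, so it is only suitable for tiny graphs, not for the datasets in Table 1).

```python
from itertools import permutations

def loop_observables(a):
    """Average degree and densities of length-3 and length-4 loops.

    a is a symmetric 0/1 adjacency matrix with zero diagonal. Loop densities
    are sums over ordered tuples of distinct nodes, divided by N.
    """
    n = len(a)
    nodes = range(n)
    k1 = sum(sum(row) for row in a) / n
    m3 = sum(a[i][j] * a[j][k] * a[k][i]
             for i, j, k in permutations(nodes, 3)) / n
    m4 = sum(a[i][j] * a[j][k] * a[k][l] * a[l][i]
             for i, j, k, l in permutations(nodes, 4)) / n
    return k1, m3, m4

# Toy graph 1: complete graph on 4 nodes
k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
k1, m3, m4 = loop_observables(k4)
m4_d_ensemble = m3**2 / k1        # eq. (62) evaluated on measured <k> and m_3

# Toy graph 2: a single 4-cycle 0-1-2-3-0 (no triangles)
cycle = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
k1_c, m3_c, m4_c = loop_observables(cycle)
```

Neither toy graph is expected to satisfy (62), which is a statement about large random graphs in the d-ensemble; the example only shows how the measured and predicted quantities are obtained.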
8. Conclusions
In this paper we propose a bipartite network representation of protein interactions, where the two node types represent proteins and complexes, respectively. A protein-protein interaction network can then be regarded as the result of a 'marginalization' of the bipartite network, whereby the complexes are integrated out (i.e. summed over). This leads to a weighted protein interaction network $\mathbf{c}$ with a separable structure. Adjacency matrices of protein interaction networks $\mathbf{a}$ are then simply the binary versions of the separable $\mathbf{c}$, obtained by the entry truncations $a_{ij}=\theta(c_{ij})$, with the convention $\theta(0)=0$. One of the central results of this work is that for sufficiently large networks there is an equivalence between the two graph ensembles $p(\mathbf{c})$ and $p(\mathbf{a})$, inasmuch as macroscopic statistical properties are concerned, such as densities of short loops and degree distributions. This allows us to regard the conventional protein interaction adjacency matrices as if they had a separable structure, and induces precise relations between expectation values of macroscopic graph observables which, remarkably, only depend on measurable quantities and on the underlying mechanism with which proteins and complexes recruit each other. They are independent of inaccessible microscopic details of proteins and their complexes.

We considered the two extreme complex recruitment scenarios, one where recruitment is driven solely by protein promiscuities, and one where it is driven by complex sizes. Preferential attachment to large complexes (the q-ensemble) favours the presence of large cliques in PINs, which boosts the number of short loops.
Hence we can reasonably expect that the predictions on short loop densities from the q-ensemble will overestimate the real number of loops. Conversely, preferential attachment based only on protein promiscuities (the d-ensemble) leads to homogeneous complex sizes, which suppresses large cliques in PINs, leading to an underestimation of short loop densities. Remarkably, real protein interaction data from mass spectrometry and yeast two-hybrid experiments show a density of length-4 loops in between the predictions of the d-ensemble and those of the q-ensemble, suggesting a degree of compatibility of these experimental data with a separable structure of the proteome. In contrast, both MS and Y2H datasets show […] $\alpha$, and the distributions of protein promiscuities and complex sizes. Such quantities are not available in the current PIN data sets, and are difficult to access experimentally. The present work has revealed that the asymptotic forms of these distributions can be extracted from the tails of the PIN degree distributions. Finally, our method may shed some light on the way proteins and complexes recruit one another, in particular whether this recruitment is driven by proteins or by complexes, and may enable us to discriminate between 'party hub' and 'date hub' interactions.
9. Acknowledgements

AA acknowledges Alessandro Pandini and Sun Chung for providing protein interaction datasets. Kate Roberts is acknowledged for interesting discussions during the early stages of this work. ACCC is grateful for support from the UK's Biotechnology and Biological Sciences Research Council (BBSRC).
10. References

[1] Hakes L, Pinney JW, Robertson DL, Lovel SC 2008 Nature Biotechnology …
[2] … Nature Biotechnology …, 839–844
[3] De Silva E, Thorne T, Ingram P, Agrafioti I, Swire J, Wiuf C and Stumpf MPH 2006 BMC Biology …, 39
[4] Fernandes LP, Annibale A, Kleinjung J, Coolen ACC and Fraternali F 2010 PLoS ONE …, e12083
[5] Lee SH, Kim PJ and Jeong H 2006 Phys. Rev. E …, 016102
[6] Stumpf MPH and Wiuf C 2005 Phys. Rev. E …, 036118
[7] Stumpf MPH, Wiuf C and May RM 2005 Proc. Natl. Acad. Sci. USA …, 4221–4224
[8] Viger F, Barrat A, Dall'Asta L, Zhang CH and Kolaczyk ED 2007 Phys. Rev. E …, 056111
[9] Solokov IM and Eliazar II 2010 Phys. Rev. E …, 026107
[10] Annibale A, Coolen ACC 2011 Interface Focus 2011 (6)
[11] Newman, Strogatz, Watts Phys. Rev. E …
[12]–[19] … arXiv: … arXiv: … Science … Scientific Reports … Reviews of Modern Physics … Science …
[20] … Evolution of networks (Oxford: Oxford University Press)
[21] Abramowitz M and Stegun IA 1972 Handbook of mathematical functions (New York: Dover)
[22] Junker BH and Schreiber F 2008 Analysis of biological networks (New York: Wiley Series on Bioinformatics)
[23] Ivanic J, Wallqvist A and Reifman J 2008 PLoS Computational Biology …
[24] Arifuzzaman M et al. Genome Res …
[25] … et al. Nature Methods … (1):47-54
[26] Parrish JR et al. Genome Biology … (7)
[27] Rain JC et al. Nature … (6817):211-215
[28] Rual J-F F et al. Nature … (7062):1173-1178
[29] Stelzl U et al. Cell … (6):957-968
[30] Ewing RM et al. Molecular Systems Biology …:89
[31] Shimoda Y, Shinpo S, Kohara M, Nakamura Y, Tabata S and Sato S 2008 DNA Res … (1):13-23
[32] Lacount DJ et al. Nature … (7064):103-107
[33] Uetz P et al. Nature … (6770):623-627
[34] Ito T et al. Proc Natl Acad Sci USA … (8):4569-4574
[35] Ho Y et al. Nature … (6868):180-183
[36] Gavin AC et al. Nature … (6868):141-147
[37] Gavin AC C et al. Nature … (7084):631-636
[38] Krogan NJ et al. Nature … (7084):637-643
[39] Sato S, Shimoda Y, Muraki A, Kohara M, Nakamura Y and Tabata S 2007 DNA Res …
[40] … et al. PLoS ONE … (5)

Appendix A. Link probabilities in the weighted protein interaction network
In this appendix we derive the likelihood to have a link in the weighted protein interaction network $c_{ij}=\sum_\mu\xi_i^\mu\xi_j^\mu$, when the $\xi_i^\mu$ are drawn from the ensembles (1,3,5).

Appendix A.1. The q-ensemble

In the q-ensemble we have

$$p(c_{ij}) = \Big\langle\delta_{c_{ij},\sum_{\mu\le\alpha N}\xi_i^\mu\xi_j^\mu}\Big\rangle_\xi = \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega c_{ij}}\prod_{\mu=1}^{\alpha N}\Big\langle e^{-i\omega\xi_i^\mu\xi_j^\mu}\Big\rangle_\xi = \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega c_{ij}}\prod_{\mu=1}^{\alpha N}\Big\{\frac{q_\mu^2}{N^2}e^{-i\omega}+\Big(1-\frac{q_\mu^2}{N^2}\Big)\Big\}$$

$$= \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}\exp\Big(i\omega c_{ij}+\sum_{\mu=1}^{\alpha N}\frac{q_\mu^2}{N^2}[e^{-i\omega}-1]-\frac12\sum_{\mu=1}^{\alpha N}\frac{q_\mu^4}{N^4}[e^{-i\omega}-1]^2+\ldots\Big)$$

$$= \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega c_{ij}}\Big[1+\frac{\alpha\langle q^2\rangle}{N}(e^{-i\omega}-1)-\frac{\alpha\langle q^4\rangle}{2N^3}(e^{-i\omega}-1)^2+\frac{\alpha^2\langle q^2\rangle^2}{2N^2}(e^{-i\omega}-1)^2+\frac{\alpha^3\langle q^2\rangle^3}{6N^3}(e^{-i\omega}-1)^3+O(N^{-4})\Big]$$

$$= \delta_{c_{ij},0}+\frac{\alpha\langle q^2\rangle}{N}(\delta_{c_{ij},1}-\delta_{c_{ij},0})+\Big(\frac{\alpha^2\langle q^2\rangle^2}{2N^2}-\frac{\alpha\langle q^4\rangle}{2N^3}\Big)(\delta_{c_{ij},2}-2\delta_{c_{ij},1}+\delta_{c_{ij},0})+\frac{\alpha^3\langle q^2\rangle^3}{6N^3}(\delta_{c_{ij},3}-3\delta_{c_{ij},2}+3\delta_{c_{ij},1}-\delta_{c_{ij},0})+O(N^{-4}) \qquad (A.1)$$

From this one reads off directly the values of $p(c_{ij}=0)$, $p(c_{ij}=1)$ and $p(c_{ij}\ge2)$. For the density of length-3 loops we write

$$m_3 = (N-1)(N-2)\sum_{\mu\nu\rho=1}^{\alpha N}\langle\xi^\mu\xi^\nu\rangle\langle\xi^\nu\xi^\rho\rangle\langle\xi^\rho\xi^\mu\rangle \qquad (A.2)$$

and using

$$\langle\xi^\mu\xi^\nu\rangle = \langle\xi^\mu\rangle\langle\xi^\nu\rangle+\delta_{\mu\nu}\langle\xi^\mu\rangle(1-\langle\xi^\mu\rangle) = \frac{q_\mu q_\nu}{N^2}+\delta_{\mu\nu}\frac{q_\mu}{N}\Big(1-\frac{q_\mu}{N}\Big) \qquad (A.3)$$

we obtain

$$m_3 = \frac1N\Big[1+O\Big(\frac1N\Big)\Big]\sum_{\mu\nu\rho=1}^{\alpha N}q_\mu q_\nu q_\rho\Big[\frac{q_\nu}{N}+\delta_{\mu\nu}\Big(1-\frac{q_\nu}{N}\Big)\Big]\Big[\frac{q_\rho}{N}+\delta_{\nu\rho}\Big(1-\frac{q_\rho}{N}\Big)\Big]\Big[\frac{q_\mu}{N}+\delta_{\rho\mu}\Big(1-\frac{q_\mu}{N}\Big)\Big]$$

$$= \frac1N\Big[1+O\Big(\frac1N\Big)\Big]\sum_{\mu\nu\rho=1}^{\alpha N}q_\mu q_\nu q_\rho\Big\{\frac{q_\mu q_\nu q_\rho}{N^3}+3\delta_{\mu\nu}\frac{q_\rho q_\mu}{N^2}\Big(1-\frac{q_\mu}{N}\Big)+3\delta_{\mu\nu}\delta_{\nu\rho}\frac{q_\mu}{N}\Big(1-\frac{q_\mu}{N}\Big)^2+\delta_{\mu\nu}\delta_{\nu\rho}\delta_{\rho\mu}\Big(1-\frac{q_\mu}{N}\Big)^3\Big\}$$

$$= \frac1N\sum_{\mu=1}^{\alpha N}\Big(1-\frac{q_\mu}{N}\Big)^3q_\mu^3+O\Big(\frac1N\Big) = \alpha\langle q^3\rangle+O(N^{-1}) \qquad (A.4)$$

Appendix A.2. The d-ensemble

In the d-ensemble we obtain

$$p(c_{ij}) = \Big\langle\delta_{c_{ij},\sum_\mu\xi_i^\mu\xi_j^\mu}\Big\rangle = \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}\exp\Big(i\omega c_{ij}+\frac{d_id_j}{\alpha N}(e^{-i\omega}-1)-\frac12\frac{d_i^2d_j^2}{(\alpha N)^3}(e^{-i\omega}-1)^2\Big)$$

$$= \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega c_{ij}}\Big[1+\frac{d_id_j}{\alpha N}(e^{-i\omega}-1)+\frac12\Big(\frac{d_id_j}{\alpha N}\Big)^2(e^{-i\omega}-1)^2-\frac12\frac{(d_id_j)^2}{(\alpha N)^3}(e^{-i\omega}-1)^2+\frac16\Big(\frac{d_id_j}{\alpha N}\Big)^3(e^{-i\omega}-1)^3+\ldots\Big] \qquad (A.5)$$

which gives

$$p(c_{ij}=0) = 1-\frac{d_id_j}{\alpha N}+\frac12\Big(\frac{d_id_j}{\alpha N}\Big)^2-\frac16\Big(\frac{d_id_j}{\alpha N}\Big)^3-\frac12\frac{d_i^2d_j^2}{(\alpha N)^3}$$

$$p(c_{ij}=1) = \frac{d_id_j}{\alpha N}-\Big(\frac{d_id_j}{\alpha N}\Big)^2+\frac12\Big(\frac{d_id_j}{\alpha N}\Big)^3+\frac{d_i^2d_j^2}{(\alpha N)^3}$$

$$p(c_{ij}\ge2) = O(N^{-2}) \qquad (A.6)$$

Appendix A.3. The mixed ensemble
For the mixed ensemble, the link likelihood is found to be

$$p(c_{ij}) = \Big\langle\delta_{c_{ij},\sum_\mu\xi_i^\mu\xi_j^\mu}\Big\rangle = \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}\exp\Big(i\omega c_{ij}+\sum_\mu\frac{d_id_jq_\mu^2}{\alpha^2\langle q\rangle^2N^2}(e^{-i\omega}-1)\Big) = \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}\exp\Big(i\omega c_{ij}+\frac{d_id_j\langle q^2\rangle}{\alpha\langle q\rangle^2N}(e^{-i\omega}-1)\Big)$$

$$= e^{-d_id_j\langle q^2\rangle/(\alpha\langle q\rangle^2N)}\int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega c_{ij}}\Big[1+\frac{d_id_j\langle q^2\rangle}{\alpha N\langle q\rangle^2}e^{-i\omega}+\frac12\Big(\frac{d_id_j\langle q^2\rangle}{\alpha N\langle q\rangle^2}\Big)^2e^{-2i\omega}+\ldots\Big] \qquad (A.7)$$

giving

$$p(c_{ij}=0) = 1-\frac{d_id_j\langle q^2\rangle}{\alpha\langle q\rangle^2N}+\frac12\Big(\frac{d_id_j\langle q^2\rangle}{\alpha\langle q\rangle^2N}\Big)^2+O(N^{-3})$$

$$p(c_{ij}=1) = \frac{d_id_j\langle q^2\rangle}{\alpha\langle q\rangle^2N}\Big(1-\frac{d_id_j\langle q^2\rangle}{\alpha\langle q\rangle^2N}\Big)+O(N^{-3})$$

$$p(c_{ij}\ge2) = O(N^{-2}) \qquad (A.8)$$

Appendix B. Calculation of the degree distribution p(k)

In this appendix we calculate the degree distribution of the weighted protein interaction network $c_{ij}=\sum_\mu\xi_i^\mu\xi_j^\mu$, in which the entries $\xi_i^\mu$ are drawn from the bipartite ensembles (1,3,5), respectively.

Appendix B.1. The q-ensemble

In the q-ensemble, we can calculate $p(k)$ as follows:

$$p(k) = \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega k}\Big\langle\frac1N\sum_ie^{-i\omega\sum_jc_{ij}}\Big\rangle_\xi = \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega k}\Big\langle e^{-i\omega\sum_{j>1}c_{1j}}\Big\rangle_\xi = \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega k}\Big\langle e^{-i\omega\sum_\mu\xi_1^\mu\sum_{j>1}\xi_j^\mu}\Big\rangle_\xi$$

$$= \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega k}\prod_\mu\Big\langle e^{-i\omega\xi_1^\mu\sum_{j>1}\xi_j^\mu}\Big\rangle_\xi = \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega k}\prod_\mu\Big\{1+\frac{q_\mu}{N}\Big[\Big\langle e^{-i\omega\xi^\mu}\Big\rangle_{\xi^\mu}^{N-1}-1\Big]\Big\}$$

$$= \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega k}\prod_\mu\Big\{1+\frac{q_\mu}{N}\Big[\Big(1+\frac{q_\mu}{N}(e^{-i\omega}-1)\Big)^{N-1}-1\Big]\Big\}$$

$$= \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}\exp\Big(i\omega k+\sum_\mu\frac{q_\mu}{N}\Big[e^{q_\mu(e^{-i\omega}-1)}-1\Big]\Big)+O(N^{-1}) = \int_{-\pi}^{\pi}\frac{d\omega}{2\pi}\exp\Big(i\omega k+\alpha\Big\langle q\Big[e^{q(e^{-i\omega}-1)}-1\Big]\Big\rangle\Big)+O(N^{-1})$$

$$= e^{-\alpha\langle q\rangle}\int_{-\pi}^{\pi}\frac{d\omega}{2\pi}\exp\Big(i\omega k+\alpha\Big\langle q\,e^{-q}\exp[q\,e^{-i\omega}]\Big\rangle\Big)+O(N^{-1})$$

$$= e^{-\alpha\langle q\rangle}\sum_{\ell\ge0}\frac{\alpha^\ell}{\ell!}\int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega k}\Big\langle q\,e^{-q}\exp[q\,e^{-i\omega}]\Big\rangle^\ell+O(N^{-1})$$

$$= e^{-\alpha\langle q\rangle}\sum_{\ell\ge0}\frac{\alpha^\ell}{\ell!}\Big\langle\prod_{r\le\ell}(q_re^{-q_r})\int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega k}\,e^{e^{-i\omega}\sum_{r\le\ell}q_r}\Big\rangle_{q_1\ldots q_\ell}+O(N^{-1})$$

$$= e^{-\alpha\langle q\rangle}\sum_{\ell\ge0}\frac{\alpha^\ell}{\ell!}\Big\langle\prod_{r\le\ell}(q_re^{-q_r})\sum_{s\ge0}\frac{(\sum_{r\le\ell}q_r)^s}{s!}\int_{-\pi}^{\pi}\frac{d\omega}{2\pi}e^{i\omega k-i\omega s}\Big\rangle_{q_1\ldots q_\ell}+O(N^{-1})$$

$$= e^{-\alpha\langle q\rangle}\sum_{\ell\ge0}\frac{\alpha^\ell}{\ell!}\Big\langle\prod_{r\le\ell}(q_re^{-q_r})\frac{(\sum_{r\le\ell}q_r)^k}{k!}\Big\rangle_{q_1\ldots q_\ell}+O(N^{-1}) \qquad (B.1)$$

Hence, for large network sizes $N\to\infty$ we obtain

$$\lim_{N\to\infty}p(k) = e^{-\alpha\langle q\rangle}\sum_{\ell\ge0}\frac{\alpha^\ell}{\ell!}\Big\langle\Big(\prod_{r\le\ell}q_r\Big)e^{-\sum_{r\le\ell}q_r}\frac{(\sum_{r\le\ell}q_r)^k}{k!}\Big\rangle_{q_1\ldots q_\ell} = e^{-\alpha\langle q\rangle}\sum_{\ell\ge0}\frac{\alpha^\ell}{\ell!}\sum_{q_1\ldots q_\ell\ge0}P(q_1)\ldots P(q_\ell)\,q_1\ldots q_\ell\,e^{-\sum_{r\le\ell}q_r}\frac{(\sum_{r\le\ell}q_r)^k}{k!} \qquad (B.2)$$

We can rewrite this in terms of the distribution $W(q)=qP(q)/\langle q\rangle$, which denotes the likelihood to draw a link attached to a node of degree $q$ in the bipartite graph,

$$\lim_{N\to\infty}p(k) = e^{-\alpha\langle q\rangle}\sum_{\ell\ge0}\frac{(\alpha\langle q\rangle)^\ell}{\ell!}\sum_{q_1\ldots q_\ell\ge0}W(q_1)\ldots W(q_\ell)\,e^{-\sum_{r\le\ell}q_r}\frac{(\sum_{r\le\ell}q_r)^k}{k!} \qquad (B.3)$$

Upon defining

$$P(y) = e^{-\alpha\langle q\rangle}\sum_{\ell\ge0}\frac{(\alpha\langle q\rangle)^\ell}{\ell!}\sum_{q_1\ldots q_\ell\ge0}W(q_1)\ldots W(q_\ell)\,\delta\Big[y-\sum_{r\le\ell}q_r\Big] \qquad (B.4)$$

we finally arrive at

$$\lim_{N\to\infty}p(k) = \int_0^\infty\!dy\,P(y)\,e^{-y}y^k/k! \qquad (B.5)$$

The interpretation is that if we draw $\ell$ from a Poisson distribution with $\langle\ell\rangle=\alpha\langle q\rangle$, and then draw $\ell$ variables $q_r$ from $W(q_r)$, we find $k$ as a Poissonian variable with $\langle k\rangle=\sum_{r\le\ell}q_r$.
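The three-step sampling interpretation of (B.4) and (B.5) can be turned into a small Monte Carlo sketch. Everything specific here is an illustrative assumption: the function names, the complex-size distribution (uniform on $\{1,2,3\}$), the value $\alpha=0.5$, the seed and the sample size. The Poisson sampler uses Knuth's multiplication method, which is adequate only for the small means that occur in this example.

```python
import math
import random

def sample_poisson(mean, rng):
    """Knuth's multiplication method for a Poisson variate (small means only)."""
    limit = math.exp(-mean)
    n, prod = 0, rng.random()
    while prod > limit:
        n += 1
        prod *= rng.random()
    return n

def sample_degree(P_q, alpha, rng):
    """Draw one degree k following the interpretation of (B.4)-(B.5)."""
    sizes = list(P_q)
    mean_q = sum(q * P_q[q] for q in sizes)
    link_weights = [q * P_q[q] for q in sizes]        # proportional to W(q) = q P(q)/<q>
    ell = sample_poisson(alpha * mean_q, rng)         # number of complexes of the protein
    y = sum(rng.choices(sizes, weights=link_weights, k=ell))  # sum of their sizes
    return sample_poisson(y, rng)                     # k ~ Poisson(y)

# Hypothetical setting: P(q) uniform on {1, 2, 3}, alpha = 0.5
P_q = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
alpha = 0.5
rng = random.Random(0)
samples = [sample_degree(P_q, alpha, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)

# Prediction for the first moment: <k> = alpha <q^2>
mean_q2 = sum(q * q * w for q, w in P_q.items())
predicted_mean = alpha * mean_q2
```

The sample mean should lie close to the prediction $\alpha\langle q^2\rangle$, up to statistical fluctuations of order $1/\sqrt{20000}$ times the standard deviation of $k$.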
Clearly $p(k)$ is normalised, and for its first moment we find:

\begin{equation}
\begin{aligned}
\langle k\rangle &= \int_0^\infty{\rm d}y\;P(y)\,y
= {\rm e}^{-\alpha\langle q\rangle}\sum_{\ell\geq 0}\frac{(\alpha\langle q\rangle)^\ell}{\ell!}\sum_{q_1\ldots q_\ell\geq 0}W(q_1)\ldots W(q_\ell)\sum_{r\leq\ell}q_r\\
&= {\rm e}^{-\alpha\langle q\rangle}\sum_{\ell>0}\frac{(\alpha\langle q\rangle)^\ell}{(\ell-1)!}\sum_q W(q)\,q = \alpha\langle q^2\rangle
\end{aligned}
\tag{B.6}
\end{equation}

For the second moment we obtain

\begin{equation}
\begin{aligned}
\langle k^2\rangle &= \langle k\rangle+\int_0^\infty{\rm d}y\;P(y)\,y^2\\
&= \alpha\langle q^2\rangle+{\rm e}^{-\alpha\langle q\rangle}\sum_{\ell\geq 0}\frac{(\alpha\langle q\rangle)^\ell}{\ell!}\sum_{q_1\ldots q_\ell\geq 0}W(q_1)\ldots W(q_\ell)\sum_{r,s\leq\ell}q_rq_s\\
&= \alpha\langle q^2\rangle+{\rm e}^{-\alpha\langle q\rangle}\Big(\sum_q W(q)\,q\Big)^2\sum_{\ell>0}\frac{(\alpha\langle q\rangle)^\ell}{\ell!}\,\ell^2
+{\rm e}^{-\alpha\langle q\rangle}\Big[\sum_q W(q)\,q^2-\Big(\sum_q W(q)\,q\Big)^2\Big]\sum_{\ell>0}\frac{(\alpha\langle q\rangle)^\ell}{(\ell-1)!}\\
&= \alpha\langle q^2\rangle+\alpha\Big[\langle q^3\rangle-\frac{\langle q^2\rangle^2}{\langle q\rangle}\Big]+\frac{\langle q^2\rangle^2}{\langle q\rangle^2}\,{\rm e}^{-\alpha\langle q\rangle}\sum_{\ell>0}\frac{(\alpha\langle q\rangle)^\ell}{\ell!}\,\ell^2\\
&= \alpha\langle q^2\rangle+\alpha\Big[\langle q^3\rangle-\frac{\langle q^2\rangle^2}{\langle q\rangle}\Big]+\frac{\langle q^2\rangle^2}{\langle q\rangle^2}\Big[\alpha\langle q\rangle+\alpha^2\langle q\rangle^2\Big]\\
&= \alpha\langle q^2\rangle+\alpha^2\langle q^2\rangle^2+\alpha\langle q^3\rangle
\end{aligned}
\tag{B.7}
\end{equation}

This is in agreement with the result of a direct calculation:

\begin{equation}
\begin{aligned}
\langle k^2\rangle &= \frac{1}{N}\sum_{i\neq j\neq k}\langle c_{ij}c_{jk}\rangle
= \frac{1}{N}\sum_{i\neq j}\langle c_{ij}c_{ji}\rangle+\frac{1}{N}\sum_{[ijk]}\langle c_{ij}c_{jk}\rangle\\
&= \frac{1}{N}\sum_{i\neq j}\sum_{\mu\nu}\langle\xi_i^\mu\xi_j^\mu\xi_i^\nu\xi_j^\nu\rangle+\frac{1}{N}\sum_{[ijk]}\sum_{\mu\nu}\langle\xi_i^\mu\xi_j^\mu\xi_j^\nu\xi_k^\nu\rangle\\
&= \frac{1}{N}\sum_{i\neq j}\sum_{\mu}\langle\xi_i^\mu\xi_j^\mu\rangle+\frac{1}{N}\sum_{[ijk]}\sum_{\mu\neq\nu}\langle\xi_i^\mu\xi_j^\mu\rangle\langle\xi_j^\nu\xi_k^\nu\rangle+\frac{1}{N}\sum_{[ijk]}\sum_{\mu}\langle\xi_i^\mu\xi_j^\mu\xi_k^\mu\rangle+O(N^{-1})\\
&= \frac{1}{N}\sum_{i\neq j}\sum_{\mu}\frac{q_\mu^2}{N^2}+\frac{1}{N}\sum_{[ijk]}\sum_{\mu\neq\nu}\frac{q_\mu^2}{N^2}\frac{q_\nu^2}{N^2}+\frac{1}{N}\sum_{[ijk]}\sum_{\mu}\frac{q_\mu^3}{N^3}+O(N^{-1})\\
&= \alpha\langle q^2\rangle+(\alpha\langle q^2\rangle)^2+\alpha\langle q^3\rangle+O(N^{-1})
= \langle k\rangle+\langle k\rangle^2+\alpha\langle q^3\rangle+O(N^{-1})
\end{aligned}
\tag{B.8}
\end{equation}

Appendix B.2.
The d-ensemble

We can calculate the asymptotic degree distribution in the $d$-ensemble as follows:

\begin{equation}
\begin{aligned}
p(k) &= \lim_{N\to\infty}\frac{1}{N}\sum_i\big\langle\delta_{k,\sum_j c_{ij}}\big\rangle_{\xi}
= \lim_{N\to\infty}\frac{1}{N}\sum_i\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,{\rm e}^{{\rm i}\omega k}\Big\langle{\rm e}^{-{\rm i}\omega\sum_\mu\xi_i^\mu\sum_j\xi_j^\mu}\Big\rangle_{\xi}\\
&= \lim_{N\to\infty}\frac{1}{N}\sum_i\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,{\rm e}^{{\rm i}\omega k}\prod_\mu\Big[1+\frac{d_i}{\alpha N}\Big(\prod_j\big\langle{\rm e}^{-{\rm i}\omega\xi_j^\mu}\big\rangle-1\Big)\Big]\\
&= \lim_{N\to\infty}\frac{1}{N}\sum_i\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,{\rm e}^{{\rm i}\omega k}\prod_\mu\Big[1+\frac{d_i}{\alpha N}\Big({\rm e}^{\frac{\langle d\rangle}{\alpha}({\rm e}^{-{\rm i}\omega}-1)}-1\Big)\Big]\\
&= \lim_{N\to\infty}\frac{1}{N}\sum_i\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,\exp\Big\{{\rm i}\omega k+d_i\Big({\rm e}^{\frac{\langle d\rangle}{\alpha}({\rm e}^{-{\rm i}\omega}-1)}-1\Big)\Big\}\\
&= \sum_d P(d)\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,\exp\Big\{{\rm i}\omega k+d\Big({\rm e}^{\frac{\langle d\rangle}{\alpha}({\rm e}^{-{\rm i}\omega}-1)}-1\Big)\Big\}\\
&= \sum_d P(d)\,{\rm e}^{-d}\sum_{\ell}\frac{d^\ell}{\ell!}\,{\rm e}^{-\ell\langle d\rangle/\alpha}\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,\exp\Big\{{\rm i}\omega k+\ell\,\frac{\langle d\rangle}{\alpha}\,{\rm e}^{-{\rm i}\omega}\Big\}\\
&= \sum_d P(d)\sum_{\ell}{\rm e}^{-d}\frac{d^\ell}{\ell!}\;{\rm e}^{-\ell\langle d\rangle/\alpha}\frac{(\ell\langle d\rangle/\alpha)^k}{k!}
\end{aligned}
\tag{B.9}
\end{equation}

Appendix B.3. The mixed ensemble
In the mixed ensemble we have the asymptotic degree distribution

\begin{equation}
\begin{aligned}
p(k) &= \lim_{N\to\infty}\frac{1}{N}\sum_i\big\langle\delta_{k,\sum_j c_{ij}}\big\rangle_{\xi}
= \lim_{N\to\infty}\frac{1}{N}\sum_i\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,{\rm e}^{{\rm i}\omega k}\Big\langle{\rm e}^{-{\rm i}\omega\sum_\mu\xi_i^\mu\sum_j\xi_j^\mu}\Big\rangle_{\xi}\\
&= \lim_{N\to\infty}\frac{1}{N}\sum_i\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,{\rm e}^{{\rm i}\omega k}\prod_\mu\Big[1+\frac{d_iq_\mu}{\alpha\langle q\rangle N}\Big(\prod_j\big\langle{\rm e}^{-{\rm i}\omega\xi_j^\mu}\big\rangle-1\Big)\Big]\\
&= \lim_{N\to\infty}\frac{1}{N}\sum_i\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,{\rm e}^{{\rm i}\omega k}\prod_\mu\Big[1+\frac{d_iq_\mu}{\alpha\langle q\rangle N}\Big({\rm e}^{q_\mu({\rm e}^{-{\rm i}\omega}-1)}-1\Big)\Big]\\
&= \lim_{N\to\infty}\frac{1}{N}\sum_i\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,\exp\Big\{{\rm i}\omega k+\sum_\mu\frac{d_iq_\mu}{\alpha\langle q\rangle N}\Big({\rm e}^{q_\mu({\rm e}^{-{\rm i}\omega}-1)}-1\Big)\Big\}\\
&= \sum_d P(d)\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,\exp\Big\{{\rm i}\omega k+\frac{d}{\langle q\rangle}\Big\langle q\Big({\rm e}^{q({\rm e}^{-{\rm i}\omega}-1)}-1\Big)\Big\rangle_q\Big\}\\
&= \sum_d P(d)\,{\rm e}^{-d}\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,\exp\Big\{{\rm i}\omega k+\frac{d}{\langle q\rangle}\big\langle q\,{\rm e}^{-q}\exp[q\,{\rm e}^{-{\rm i}\omega}]\big\rangle_q\Big\}\\
&= \sum_d P(d)\,{\rm e}^{-d}\sum_{\ell\geq 0}\frac{(d/\langle q\rangle)^\ell}{\ell!}\int_{-\pi}^{\pi}\frac{{\rm d}\omega}{2\pi}\,{\rm e}^{{\rm i}\omega k}\big\langle q\,{\rm e}^{-q}\exp[q\,{\rm e}^{-{\rm i}\omega}]\big\rangle_q^{\ell}\\
&= \sum_d P(d)\,{\rm e}^{-d}\sum_{\ell\geq 0}\frac{d^\ell}{\ell!}\Big\langle\prod_{r\leq\ell}\Big(\frac{q_r{\rm e}^{-q_r}}{\langle q\rangle}\Big)\frac{(\sum_{r\leq\ell}q_r)^k}{k!}\Big\rangle_{q_1\ldots q_\ell}
\end{aligned}
\tag{B.10}
\end{equation}

We can rewrite this expression in terms of the associated distribution $W(q) = qP(q)/\langle q\rangle$ as:

\begin{equation}
\begin{aligned}
p(k) &= \sum_d P(d)\,{\rm e}^{-d}\sum_{\ell\geq 0}\frac{d^\ell}{\ell!}\Big\langle\prod_{r\leq\ell}\Big(\frac{q_r{\rm e}^{-q_r}}{\langle q\rangle}\Big)\frac{(\sum_{r\leq\ell}q_r)^k}{k!}\Big\rangle_{q_1\ldots q_\ell}\\
&= \sum_d P(d)\,{\rm e}^{-d}\sum_{\ell\geq 0}\frac{d^\ell}{\ell!}\sum_{q_1\ldots q_\ell\geq 0}W(q_1)\ldots W(q_\ell)\,{\rm e}^{-\sum_{r\leq\ell}q_r}\frac{(\sum_{r\leq\ell}q_r)^k}{k!}
\end{aligned}
\tag{B.11}
\end{equation}

or, equivalently, as

\begin{equation}
p(k) = \int_0^\infty{\rm d}y\;P(y)\,{\rm e}^{-y}\,y^k/k!
\tag{B.12}
\end{equation}

where

\begin{equation}
P(y) = \sum_d P(d)\,{\rm e}^{-d}\sum_{\ell\geq 0}\frac{d^\ell}{\ell!}\sum_{q_1\ldots q_\ell\geq 0}W(q_1)\ldots W(q_\ell)\,\delta\Big[y-\sum_{r\leq\ell}q_r\Big]
\tag{B.13}
\end{equation}

The first two moments of $p(k)$ are

\begin{equation}
\begin{aligned}
\langle k\rangle &= \int_0^\infty{\rm d}y\;P(y)\,y
= \sum_d P(d)\,{\rm e}^{-d}\sum_{\ell\geq 0}\frac{d^\ell}{\ell!}\sum_{q_1\ldots q_\ell\geq 0}W(q_1)\ldots W(q_\ell)\sum_{r\leq\ell}q_r\\
&= \sum_d P(d)\,{\rm e}^{-d}\sum_{\ell>0}\frac{d^\ell}{(\ell-1)!}\sum_q W(q)\,q
= \langle d\rangle\frac{\langle q^2\rangle}{\langle q\rangle} = \alpha\langle q^2\rangle
\end{aligned}
\tag{B.14}
\end{equation}

\begin{equation}
\begin{aligned}
\langle k^2\rangle &= \langle k\rangle+\int_0^\infty{\rm d}y\;P(y)\,y^2
= \alpha\langle q^2\rangle+\sum_d P(d)\,{\rm e}^{-d}\sum_{\ell\geq 0}\frac{d^\ell}{\ell!}\sum_{q_1\ldots q_\ell\geq 0}W(q_1)\ldots W(q_\ell)\sum_{r,s\leq\ell}q_rq_s\\
&= \alpha\langle q^2\rangle+\sum_d P(d)\,{\rm e}^{-d}\sum_{\ell>0}\frac{d^\ell}{\ell!}\Big[\ell\sum_q W(q)\,q^2+\ell(\ell-1)\Big(\sum_q W(q)\,q\Big)^2\Big]\\
&= \alpha\langle q^2\rangle+\frac{\langle q^3\rangle}{\langle q\rangle}\langle d\rangle+\langle d^2\rangle\frac{\langle q^2\rangle^2}{\langle q\rangle^2}
= \alpha\langle q^2\rangle+\alpha\langle q^3\rangle+\frac{\langle d^2\rangle}{\langle d\rangle^2}\langle k\rangle^2
\end{aligned}
\tag{B.15}
\end{equation}

Appendix C. The link between observables in the a and c networks

In this appendix we inspect the relation between expectation values of various observables in the ensembles $p(a)$ and $p(c)$. Here and below, equalities between observables hold up to corrections that vanish as $N\to\infty$.

Appendix C.1. The q-ensemble

Denoting averages in the $a$ ensemble as $\langle\ldots\rangle_a$, we have, for the $q$-ensemble of bipartite graphs:

\begin{equation}
\langle k\rangle_a = \frac{1}{N}\sum_{ij}\langle a_{ij}\rangle_a = \frac{1}{N}\sum_{ij}\big\langle\theta[c_{ij}-\tfrac{1}{2}]\big\rangle = \frac{1}{N}\sum_{ij}\big[1-\langle\delta_{c_{ij},0}\rangle\big] = \alpha\langle q^2\rangle+O(N^{-1}) = \langle k\rangle+O(N^{-1})
\tag{C.1}
\end{equation}

\begin{equation}
\begin{aligned}
\langle k^2\rangle_a &= \frac{1}{N}\sum_{i\neq j\neq k}\langle a_{ij}a_{jk}\rangle
= \frac{1}{N}\sum_{ij}\langle a_{ij}\rangle+\frac{1}{N}\sum_{i\neq j\neq k(\neq i)}\langle a_{ij}a_{jk}\rangle\\
&= \frac{1}{N}\sum_{ij}\big\langle(1-\delta_{c_{ij},0})\big\rangle+\frac{1}{N}\sum_{i\neq j\neq k(\neq i)}\big\langle(1-\delta_{c_{ij},0})(1-\delta_{c_{jk},0})\big\rangle\\
&= \frac{1}{N}\sum_{ij}\frac{\alpha\langle q^2\rangle}{N}+\frac{1}{N}\sum_{i\neq j\neq k(\neq i)}\big(1-2\langle\delta_{c_{ij},0}\rangle+\langle\delta_{c_{ij},0}\delta_{c_{jk},0}\rangle\big)\\
&= (N-1)(N-2)-2(N-1)(N-2)\Big(1-\frac{\alpha\langle q^2\rangle}{N}+\frac{\alpha^2\langle q^2\rangle^2}{2N^2}\Big)\\
&\quad+(N-1)(N-2)\Big(1-\frac{2\alpha\langle q^2\rangle}{N}+\frac{\alpha\langle q^3\rangle}{N^2}+\frac{2\alpha^2\langle q^2\rangle^2}{N^2}\Big)+\alpha\langle q^2\rangle\\
&= \alpha\langle q^2\rangle+\alpha^2\langle q^2\rangle^2+\alpha\langle q^3\rangle\equiv\langle k^2\rangle
\end{aligned}
\tag{C.2}
\end{equation}

where we used

\begin{equation}
\begin{aligned}
&\frac{1}{N(N-1)(N-2)}\sum_{i\neq j\neq k(\neq i)}\langle\delta_{c_{ij},0}\delta_{c_{jk},0}\rangle
= \frac{1}{N(N-1)(N-2)}\sum_{i\neq j\neq k(\neq i)}\int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'}{4\pi^2}\prod_\mu\Big\langle{\rm e}^{{\rm i}\xi_j^\mu(\xi_i^\mu\omega+\xi_k^\mu\omega')}\Big\rangle\\
&\quad= \int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'}{4\pi^2}\prod_\mu\Big\{1+\frac{q_\mu^2}{N^2}\Big[({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}-2)+\frac{q_\mu}{N}\big({\rm e}^{{\rm i}(\omega+\omega')}-{\rm e}^{{\rm i}\omega}-{\rm e}^{{\rm i}\omega'}+1\big)\Big]\Big\}\\
&\quad= \int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'}{4\pi^2}\exp\Big\{\frac{\alpha\langle q^2\rangle}{N}({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}-2)+\frac{\alpha\langle q^3\rangle}{N^2}\big({\rm e}^{{\rm i}(\omega+\omega')}-{\rm e}^{{\rm i}\omega}-{\rm e}^{{\rm i}\omega'}+1\big)-\frac{\alpha\langle q^4\rangle}{2N^3}({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}-2)^2\Big\}\\
&\quad= 1-\frac{2\alpha\langle q^2\rangle}{N}+\frac{\alpha\langle q^3\rangle}{N^2}+\frac{2\alpha^2\langle q^2\rangle^2}{N^2}-\frac{2\alpha\langle q^4\rangle}{N^3}-\frac{2\alpha^2\langle q^2\rangle\langle q^3\rangle}{N^3}-\frac{4\alpha^3\langle q^2\rangle^3}{3N^3}
\end{aligned}
\tag{C.3}
\end{equation}

For loops of length 3 we proceed in the same way, obtaining

\begin{equation}
\begin{aligned}
m_3^a &= \frac{1}{N}\sum_{i\neq j\neq k(\neq i)}\langle a_{ij}a_{jk}a_{ki}\rangle
= \frac{1}{N}\sum_{i\neq j\neq k(\neq i)}\big\langle(1-\delta_{c_{ij},0})(1-\delta_{c_{jk},0})(1-\delta_{c_{ki},0})\big\rangle\\
&= \frac{1}{N}\sum_{i\neq j\neq k(\neq i)}\big(1-3\langle\delta_{c_{ij},0}\rangle+3\langle\delta_{c_{ij},0}\delta_{c_{jk},0}\rangle-\langle\delta_{c_{ij},0}\delta_{c_{jk},0}\delta_{c_{ki},0}\rangle\big)\\
&= (N-1)(N-2)-3(N-1)(N-2)\Big(1-\frac{\alpha\langle q^2\rangle}{N}+\frac{\alpha^2\langle q^2\rangle^2}{2N^2}\Big)\\
&\quad+3(N-1)(N-2)\Big(1-\frac{2\alpha\langle q^2\rangle}{N}+\frac{\alpha\langle q^3\rangle}{N^2}+\frac{2\alpha^2\langle q^2\rangle^2}{N^2}\Big)\\
&\quad-(N-1)(N-2)\Big(1-\frac{3\alpha\langle q^2\rangle}{N}+\frac{2\alpha\langle q^3\rangle}{N^2}+\frac{9}{2}\frac{\alpha^2\langle q^2\rangle^2}{N^2}\Big)\\
&= \alpha\langle q^3\rangle\equiv m_3^c
\end{aligned}
\tag{C.4}
\end{equation}

where we used

\begin{equation}
\begin{aligned}
&\frac{1}{N(N-1)(N-2)}\sum_{i\neq j\neq k(\neq i)}\langle\delta_{c_{ij},0}\delta_{c_{jk},0}\delta_{c_{ki},0}\rangle\\
&\quad= \frac{1}{N(N-1)(N-2)}\sum_{i\neq j\neq k(\neq i)}\int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'\,{\rm d}\omega''}{8\pi^3}\prod_\mu\Big\langle{\rm e}^{{\rm i}\xi_i^\mu(\xi_j^\mu\omega+\xi_k^\mu\omega'')+{\rm i}\xi_j^\mu\xi_k^\mu\omega'}\Big\rangle\\
&\quad= \int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'\,{\rm d}\omega''}{8\pi^3}\prod_\mu\Big\{1+\frac{q_\mu^2}{N^2}\Big[({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}+{\rm e}^{{\rm i}\omega''}-3)+\frac{q_\mu}{N}\big({\rm e}^{{\rm i}(\omega+\omega'+\omega'')}-{\rm e}^{{\rm i}\omega}-{\rm e}^{{\rm i}\omega'}-{\rm e}^{{\rm i}\omega''}+2\big)\Big]\Big\}\\
&\quad= \int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'\,{\rm d}\omega''}{8\pi^3}\exp\Big\{\sum_\mu\frac{q_\mu^2}{N^2}({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}+{\rm e}^{{\rm i}\omega''}-3)+\sum_\mu\frac{q_\mu^3}{N^3}\big({\rm e}^{{\rm i}(\omega+\omega'+\omega'')}-{\rm e}^{{\rm i}\omega}-{\rm e}^{{\rm i}\omega'}-{\rm e}^{{\rm i}\omega''}+2\big)\\
&\qquad\qquad-\frac{1}{2}\sum_\mu\frac{q_\mu^4}{N^4}({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}+{\rm e}^{{\rm i}\omega''}-3)^2\Big\}\\
&\quad= \int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'\,{\rm d}\omega''}{8\pi^3}\exp\Big\{\frac{\alpha\langle q^2\rangle}{N}({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}+{\rm e}^{{\rm i}\omega''}-3)+\frac{\alpha\langle q^3\rangle}{N^2}\big({\rm e}^{{\rm i}(\omega+\omega'+\omega'')}-{\rm e}^{{\rm i}\omega}-{\rm e}^{{\rm i}\omega'}-{\rm e}^{{\rm i}\omega''}+2\big)\\
&\qquad\qquad-\frac{\alpha\langle q^4\rangle}{2N^3}({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}+{\rm e}^{{\rm i}\omega''}-3)^2\Big\}\\
&\quad= 1-\frac{3\alpha\langle q^2\rangle}{N}+\frac{2\alpha\langle q^3\rangle}{N^2}+\frac{9}{2}\frac{\alpha^2\langle q^2\rangle^2}{N^2}-\frac{9\alpha\langle q^4\rangle}{2N^3}-\frac{6\alpha^2\langle q^2\rangle\langle q^3\rangle}{N^3}-\frac{9\alpha^3\langle q^2\rangle^3}{2N^3}
\end{aligned}
\tag{C.5}
\end{equation}

Finally, for loops of length 4 we have

\begin{equation}
\begin{aligned}
m_4^a &= \frac{1}{N}\sum_{[i,j,k,\ell]}\langle a_{ij}a_{jk}a_{k\ell}a_{\ell i}\rangle
= \frac{1}{N}\sum_{[i,j,k,\ell]}\big\langle(1-\delta_{c_{ij},0})(1-\delta_{c_{jk},0})(1-\delta_{c_{k\ell},0})(1-\delta_{c_{\ell i},0})\big\rangle\\
&= \frac{1}{N}\sum_{[i,j,k,\ell]}\big(1-4\langle\delta_{c_{ij},0}\rangle+4\langle\delta_{c_{ij},0}\delta_{c_{jk},0}\rangle+2\langle\delta_{c_{ij},0}\rangle\langle\delta_{c_{k\ell},0}\rangle-4\langle\delta_{c_{ij},0}\delta_{c_{jk},0}\delta_{c_{k\ell},0}\rangle+\langle\delta_{c_{ij},0}\delta_{c_{jk},0}\delta_{c_{k\ell},0}\delta_{c_{\ell i},0}\rangle\big)\\
&= (N-1)(N-2)(N-3)\Big\{1-4\Big(1-\frac{\alpha\langle q^2\rangle}{N}+\frac{\alpha^2\langle q^2\rangle^2}{2N^2}-\frac{\alpha\langle q^4\rangle}{2N^3}-\frac{\alpha^3\langle q^2\rangle^3}{6N^3}\Big)\\
&\quad+4\Big(1-\frac{2\alpha\langle q^2\rangle}{N}+\frac{\alpha\langle q^3\rangle}{N^2}+\frac{2\alpha^2\langle q^2\rangle^2}{N^2}-\frac{2\alpha\langle q^4\rangle}{N^3}-\frac{4\alpha^3\langle q^2\rangle^3}{3N^3}-\frac{2\alpha^2\langle q^2\rangle\langle q^3\rangle}{N^3}\Big)\\
&\quad+2\Big(1-\frac{2\alpha\langle q^2\rangle}{N}+\frac{2\alpha^2\langle q^2\rangle^2}{N^2}-\frac{\alpha\langle q^4\rangle}{N^3}-\frac{4\alpha^3\langle q^2\rangle^3}{3N^3}\Big)\\
&\quad-4\Big(1-\frac{3\alpha\langle q^2\rangle}{N}+\frac{2\alpha\langle q^3\rangle}{N^2}+\frac{9}{2}\frac{\alpha^2\langle q^2\rangle^2}{N^2}-\frac{9\alpha\langle q^4\rangle}{2N^3}-\frac{9\alpha^3\langle q^2\rangle^3}{2N^3}-\frac{6\alpha^2\langle q^2\rangle\langle q^3\rangle}{N^3}\Big)\\
&\quad+\Big(1-\frac{4\alpha\langle q^2\rangle}{N}+\frac{4\alpha\langle q^3\rangle}{N^2}-\frac{9\alpha\langle q^4\rangle}{N^3}+\frac{8\alpha^2\langle q^2\rangle^2}{N^2}-\frac{16\alpha^2\langle q^2\rangle\langle q^3\rangle}{N^3}-\frac{32\alpha^3\langle q^2\rangle^3}{3N^3}\Big)\Big\}\\
&= \alpha\langle q^4\rangle\equiv m_4^c
\end{aligned}
\tag{C.6}
\end{equation}

where we used

\begin{equation}
\begin{aligned}
&\frac{1}{N(N-1)(N-2)(N-3)}\sum_{[ijk\ell]}\langle\delta_{c_{ij},0}\delta_{c_{jk},0}\delta_{c_{k\ell},0}\rangle\\
&\quad= \frac{1}{N(N-1)(N-2)(N-3)}\sum_{[ijk\ell]}\int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'\,{\rm d}\omega''}{8\pi^3}\prod_\mu\Big\langle{\rm e}^{{\rm i}\xi_j^\mu(\xi_i^\mu\omega+\xi_k^\mu\omega')+{\rm i}\xi_\ell^\mu\xi_k^\mu\omega''}\Big\rangle\\
&\quad= \int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'\,{\rm d}\omega''}{8\pi^3}\prod_\mu\Big\{1+\frac{q_\mu^2}{N^2}\Big[({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}+{\rm e}^{{\rm i}\omega''}-3)+\frac{q_\mu}{N}({\rm e}^{{\rm i}\omega'}-1)({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega''}-2)\\
&\qquad\qquad+\frac{q_\mu^2}{N^2}\,{\rm e}^{{\rm i}\omega'}({\rm e}^{{\rm i}\omega}-1)({\rm e}^{{\rm i}\omega''}-1)\Big]\Big\}\\
&\quad= \int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'\,{\rm d}\omega''}{8\pi^3}\exp\Big\{\frac{\alpha\langle q^2\rangle}{N}({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}+{\rm e}^{{\rm i}\omega''}-3)+\frac{\alpha\langle q^3\rangle}{N^2}({\rm e}^{{\rm i}\omega'}-1)({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega''}-2)\\
&\qquad\qquad-\frac{\alpha\langle q^4\rangle}{2N^3}({\rm e}^{{\rm i}\omega}+{\rm e}^{{\rm i}\omega'}+{\rm e}^{{\rm i}\omega''}-3)^2\Big\}\times\exp\Big\{\frac{\alpha\langle q^4\rangle}{N^3}\,{\rm e}^{{\rm i}\omega'}({\rm e}^{{\rm i}\omega}-1)({\rm e}^{{\rm i}\omega''}-1)\Big\}\\
&\quad= 1-\frac{3\alpha\langle q^2\rangle}{N}+\frac{2\alpha\langle q^3\rangle}{N^2}+\frac{9}{2}\frac{\alpha^2\langle q^2\rangle^2}{N^2}-\frac{9\alpha\langle q^4\rangle}{2N^3}-\frac{6\alpha^2\langle q^2\rangle\langle q^3\rangle}{N^3}-\frac{9\alpha^3\langle q^2\rangle^3}{2N^3}
\end{aligned}
\tag{C.7}
\end{equation}

and

\begin{equation}
\begin{aligned}
&\frac{1}{N(N-1)(N-2)(N-3)}\sum_{[ijk\ell]}\langle\delta_{c_{ij},0}\delta_{c_{jk},0}\delta_{c_{k\ell},0}\delta_{c_{\ell i},0}\rangle\\
&\quad= \frac{1}{N(N-1)(N-2)(N-3)}\sum_{[ijk\ell]}\int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'\,{\rm d}\omega''\,{\rm d}\omega'''}{16\pi^4}\prod_\mu\Big\langle{\rm e}^{{\rm i}\xi_i^\mu(\xi_j^\mu\omega+\xi_\ell^\mu\omega''')+{\rm i}\xi_k^\mu(\xi_j^\mu\omega'+\xi_\ell^\mu\omega'')}\Big\rangle\\
&\quad= \int_{-\pi}^{\pi}\frac{{\rm d}\omega\,{\rm d}\omega'\,{\rm d}\omega''\,{\rm d}\omega'''}{16\pi^4}\prod_\mu\Big\{\Big(1-\frac{q_\mu}{N}\Big)\Big\{\frac{q_\mu}{N}\Big[\frac{q_\mu^2}{N^2}{\rm e}^{{\rm i}(\omega'+\omega'')}+\frac{q_\mu}{N}\Big(1-\frac{q_\mu}{N}\Big)({\rm e}^{{\rm i}\omega'}+{\rm e}^{{\rm i}\omega''})+\Big(1-\frac{q_\mu}{N}\Big)^2\Big]+\Big(1-\frac{q_\mu}{N}\Big)\Big\}\\
&\qquad+\frac{q_\mu}{N}\Big\{\frac{q_\mu^2}{N^2}{\rm e}^{{\rm i}(\omega+\omega''')}\Big(1-\frac{q_\mu}{N}+\frac{q_\mu}{N}{\rm e}^{{\rm i}(\omega'+\omega'')}\Big)\\
&\qquad\quad+\frac{q_\mu}{N}\Big(1-\frac{q_\mu}{N}\Big)\Big[{\rm e}^{{\rm i}\omega}\Big(1-\frac{q_\mu}{N}+\frac{q_\mu}{N}{\rm e}^{{\rm i}\omega'}\Big)+{\rm e}^{{\rm i}\omega'''}\Big(1-\frac{q_\mu}{N}+\frac{q_\mu}{N}{\rm e}^{{\rm i}\omega''}\Big)\Big]+\Big(1-\frac{q_\mu}{N}\Big)^2\Big\}\Big\}\\
&\quad= \prod_\mu\Big\{\frac{q_\mu}{N}\Big(1-\frac{q_\mu}{N}\Big)^2+\Big(1-\frac{q_\mu}{N}\Big)\Big[\frac{q_\mu}{N}\Big(1-\frac{q_\mu}{N}\Big)^2+\Big(1-\frac{q_\mu}{N}\Big)\Big]\Big\}\\
&\quad= \prod_\mu\Big\{1-\frac{4q_\mu^2}{N^2}+\frac{4q_\mu^3}{N^3}-\frac{q_\mu^4}{N^4}\Big\}
= \exp\Big\{-\frac{4\alpha\langle q^2\rangle}{N}+\frac{4\alpha\langle q^3\rangle}{N^2}-\frac{9\alpha\langle q^4\rangle}{N^3}\Big\}+O(N^{-4})\\
&\quad= 1-\frac{4\alpha\langle q^2\rangle}{N}+\frac{4\alpha\langle q^3\rangle}{N^2}+\frac{8\alpha^2\langle q^2\rangle^2}{N^2}-\frac{9\alpha\langle q^4\rangle}{N^3}-\frac{16\alpha^2\langle q^2\rangle\langle q^3\rangle}{N^3}-\frac{32\alpha^3\langle q^2\rangle^3}{3N^3}+O(N^{-4})
\end{aligned}
\tag{C.8}
\end{equation}

Again, the square brackets underneath the summations indicate that all indices are different, to exclude backtracking in the counting of loops of length 4.

Appendix C.2. The d-ensemble

For the d -ensemble, denoting averages relating to a as (cid:104) . . .
(cid:105) a , we have: (cid:104) k (cid:105) a = 1 N (cid:88) ij (cid:104) a ij (cid:105) a = 1 N (cid:88) ij [1 − (cid:104) δ c ij , (cid:105) ] == 1 N (cid:88) ij d i d j αN − (cid:32) d i d j αN (cid:33) + 16 (cid:32) d i d j αN (cid:33) + 12 d i d j ( αN ) = (cid:104) d (cid:105) α + O ( N − ) = (cid:104) k (cid:105) + O ( N − ) (C.9) (cid:104) k (cid:105) a = 1 N (cid:88) i (cid:54) = j (cid:54) = k (cid:104) a ij a jk (cid:105) = 1 N (cid:88) ij (cid:104) a ij (cid:105) + 1 N (cid:88) i (cid:54) = j (cid:54) = k ( (cid:54) = i ) (cid:104) a ij a jk (cid:105) = (cid:104) d (cid:105) α + 1 N (cid:88) i (cid:54) = j (cid:54) = k ( (cid:54) = i ) (cid:104) (1 − δ c ij , )(1 − δ c jk , ) (cid:105) = (cid:104) d (cid:105) α + 1 N (cid:88) i (cid:54) = j (cid:54) = k ( (cid:54) = i ) (1 − (cid:104) δ c ij , (cid:105) + (cid:104) δ c ij , δ c jk , (cid:105) )= (cid:104) d (cid:105) α + ( N − N − − N (cid:88) [ ijk ] − d i d j αN + 12 (cid:32) d i d j αN (cid:33) + 1 N (cid:88) [ ijk ] (cid:104) δ c ij , δ c jk , (cid:105) = (cid:104) d (cid:105) α + 2 (cid:104) d (cid:105) α N − (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − N (cid:104) d (cid:105) α + 2 (cid:104) d (cid:105) α + (cid:104) d (cid:105) α + (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α = (cid:104) d (cid:105) α + (cid:104) d (cid:105) α + (cid:104) d (cid:105)(cid:104) d (cid:105) α ≡ (cid:104) k (cid:105) (C.10) rotein interaction networks and biology: towards the connection N (cid:88) i (cid:54) = j (cid:54) = k ( (cid:54) = i ) (cid:104) δ c ij , δ c jk , (cid:105) = 1 N (cid:88) i (cid:54) = j (cid:54) = k ( (cid:54) = i ) (cid:90) π − π d ω d ω (cid:48) π (cid:89) µ (cid:104) e i ξ µj ( ξ µi ω + ξ µk ω (cid:48) ) (cid:105) = 1 N (cid:88) [ ijk ] (cid:90) π − π d ω d ω (cid:48) π (cid:89) µ (cid:40) d j αN (cid:34) d i αN (e i ω −
1) + d k αN (e i ω (cid:48) −
1) + d i d k ( αN ) (e i( ω + ω (cid:48) ) − e i ω − e i ω (cid:48) + 1) (cid:35)(cid:41) = 1 N (cid:88) [ ijk ] (cid:90) π − π d ω d ω (cid:48) π (cid:40) d j αN (cid:34) d i (e i ω −
1) + d k (e i ω (cid:48) −
1) + d i d k αN (e i( ω + ω (cid:48) ) − e i ω − e i ω (cid:48) + 1) (cid:35) + 12 (cid:32) d j αN [ d i (e i ω −
1) + d k (e i ω (cid:48) − (cid:33) − d i d j d k ( αN ) ( d i + d k ) = 1 N (cid:88) [ ijk ] d j αN (cid:34) − d i − d k + d i d k αN (cid:35) + 12 (cid:32) d j αN (cid:33) ( d i + d k + 2 d i d k ) − d i d j d k ( αN ) − d i d j d k ( αN ) = ( N − N − − N (cid:104) d (cid:105) α + 2 (cid:104) d (cid:105) α + (cid:104) d (cid:105) α + (cid:104) d (cid:105) α + (cid:104) d (cid:105)(cid:104) d (cid:105) α (C.11)For loops of length 3 we have: m a = 1 N (cid:88) i (cid:54) = j (cid:54) = k ( (cid:54) = i ) (cid:104) a ij a jk a ki (cid:105) = 1 N (cid:88) i (cid:54) = j (cid:54) = k ( (cid:54) = i ) (cid:104) (1 − δ c ij , )(1 − δ c jk , )(1 − δ c ki , ) (cid:105) = 1 N (cid:88) [ ijk ] (1 − (cid:104) δ c ij , (cid:105) + 3 (cid:104) δ c ij , δ c jk , (cid:105) − (cid:104) δ c ij , δ c jk , δ c ki , ) (cid:105) = ( N − N − − N (cid:88) [ ijk ] − d i d j αN + 12 (cid:32) d i d j αN (cid:33) + 3 (cid:34) ( N − N − − N (cid:104) d (cid:105) α + 2 (cid:104) d (cid:105) α + (cid:104) d (cid:105) α + (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α (cid:35) − ( N − N −
2) + 3 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α = 3 (cid:104) d (cid:105) α N − (cid:104) d (cid:105) α − (cid:104) d (cid:105) α + 3 (cid:34) − N (cid:104) d (cid:105) α + 2 (cid:104) d (cid:105) α + (cid:104) d (cid:105) α + (cid:104) d (cid:105) α + (cid:104) d (cid:105)(cid:104) d (cid:105) α (cid:35) + 3 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α = (cid:104) d (cid:105) α ≡ m c (C.12)where we used1 N (cid:88) [ ijk ] (cid:104) δ c ij , δ c jk , δ c ki , (cid:105) = 1 N (cid:88) [ ijk ] (cid:90) π − π d ω d ω (cid:48) d ω (cid:48)(cid:48) π (cid:89) µ (cid:104) e i ξ µi ( ξ µj ω + ξ µk ω (cid:48)(cid:48) )+ iξ µj ξ µk ω (cid:48) (cid:105) = 1 N (cid:88) [ ijk ] (cid:90) π − π d ω d ω (cid:48) d ω (cid:48)(cid:48) π (cid:89) µ (cid:40) d i αN (cid:104) e i( ξ µj ω + ξ µk ω (cid:48)(cid:48) + ξ µj ξ µk ω (cid:48) (cid:105) + (cid:32) − d i αN (cid:33) (cid:104) e i ξ µj ξ µk ω (cid:48) (cid:105) (cid:41) = 1 N (cid:88) [ ijk ] (cid:90) π − π d ω d ω (cid:48) d ω (cid:48)(cid:48) π (cid:32) d j d k ( αN ) (e i ω (cid:48) −
1) + d i αN (cid:40) − − d j d k ( αN ) (e i ω (cid:48) − rotein interaction networks and biology: towards the connection d j αN e i ω (cid:34) d k αN (e i( ω (cid:48)(cid:48) + ω (cid:48) ) − (cid:35) + (cid:32) − d j αN (cid:33) (cid:34) d k αN (e i ω (cid:48)(cid:48) − (cid:35)(cid:41)(cid:33) αN = 1 N (cid:88) [ ijk ] (cid:32) − d j d k ( αN ) + d i αN (cid:40) − d j αN − d k αN + 2 d j d k ( αN ) (cid:41)(cid:33) αN = 1 N (cid:88) [ ijk ] − d j d k αN + d i αN (cid:40) − d j − d k + 2 d j d k αN (cid:41) + 12 (cid:32) d i αN (cid:33) ( d j + d k + 2 d j d k )+ 12 d j d k ( αN ) + d i d j d k ( αN ) ( d j + d k ) (cid:35) = ( N − N − − N (cid:104) d (cid:105) α + 3 (cid:104) d (cid:105) α + 2 (cid:104) d (cid:105) α + 32 (cid:104) d (cid:105) α + 3 (cid:104) d (cid:105)(cid:104) d (cid:105) α (C.13)Finally, for loops of length 4 we have m a = 1 N (cid:88) [ i,j,k,(cid:96) ] (cid:104) a ij a jk a k(cid:96) a (cid:96)i (cid:105) = 1 N (cid:88) [ i,j,k,(cid:96) ] (cid:104) (1 − δ c ij , )(1 − δ c jk , )(1 − δ c k(cid:96) , )(1 − δ c (cid:96)i , ) (cid:105) = 1 N (cid:88) [ i,j,k,(cid:96) ] (1 − (cid:104) δ c ij , (cid:105) + 4 (cid:104) δ c ij , δ c jk , (cid:105) + 2 (cid:104) δ c ij , (cid:105)(cid:104) δ c k(cid:96) , (cid:105) − (cid:104) δ c ij , δ c jk , δ c k(cid:96) , ) (cid:105) + (cid:104) δ c ij , δ c jk , δ c k(cid:96) , δ c (cid:96)i , ) (cid:105) = ( N − N − N − − (cid:34) ( N − N − N − − N (cid:104) d (cid:105) α + 12 N (cid:104) d (cid:105) α − − (cid:104) d (cid:105) α − (cid:104) d (cid:105) α (cid:35) + 4 (cid:34) ( N − N − N − − N (cid:104) d (cid:105) α + N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α + N (cid:104) d (cid:105) α + N (cid:104) d (cid:105)(cid:104) d (cid:105) α − (cid:104) d (cid:105) (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105)(cid:104) d (cid:105) α (cid:35) + 2 (cid:34) ( N − N − N − − N (cid:104) d (cid:105) α + N (cid:104) 
d (cid:105) α + N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105) (cid:104) d (cid:105) α (cid:35) − (cid:34) ( N − N − N − − N (cid:104) d (cid:105) α + 2 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α + 32 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α + 2 N (cid:104) d (cid:105)(cid:104) d (cid:105) α + N (cid:104) d (cid:105) α − (cid:104) d (cid:105) (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105)(cid:104) d (cid:105) α − (cid:104) d (cid:105) (cid:104) d (cid:105) α (cid:35) + ( N − N − N − − N (cid:104) d (cid:105) α + 4 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α + 2 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α + 4 N (cid:104) d (cid:105)(cid:104) d (cid:105) α + 2 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105)(cid:104) d (cid:105) α − (cid:104) d (cid:105) (cid:104) d (cid:105) α rotein interaction networks and biology: towards the connection (cid:104) d (cid:105) α ≡ m c (C.14)where we used1 N (cid:88) [ ijk(cid:96) ] (cid:104) δ c ij , δ c jk , δ c k(cid:96) , (cid:105) = 1 N (cid:88) [ ijk(cid:96) ] (cid:90) π − π d ω d ω (cid:48) d ω (cid:48)(cid:48) π (cid:89) µ (cid:104) e i ξ µj ( ξ µi ω + ξ µk ω (cid:48) )+ ξ µk ξ µ(cid:96) ω (cid:48)(cid:48) (cid:105) = 1 N (cid:88) [ ijk(cid:96) ] (cid:90) π − π d ω d ω (cid:48) d ω (cid:48)(cid:48) π (cid:89) µ (cid:40) d j αN (cid:104) e i( ξ µi ω + ξ µk ω (cid:48) )+ iξ µk ξ µ(cid:96) ω (cid:48)(cid:48) (cid:105) + (cid:32) − d j αN (cid:33) (cid:104) e iξ µk ξ µ(cid:96) ω (cid:48)(cid:48) (cid:105) (cid:41) = 1 N (cid:88) [ ijk(cid:96) ] (cid:90) π − π d ω d ω (cid:48) d ω (cid:48)(cid:48) π (cid:32) d i d 
j ( αN ) (e i ω −
1) + d j d k ( αN ) ( e iω (cid:48) −
1) + d k d (cid:96) ( αN ) (e i ω (cid:48)(cid:48) − d i d j d k ( αN ) (e i( ω + ω (cid:48) ) − e i ω − e i ω (cid:48) + 1) + d j d k d (cid:96) ( αN ) (e i( ω (cid:48) + ω (cid:48)(cid:48) ) − e i ω (cid:48) − e i ω (cid:48)(cid:48) + 1)+ d i d j d k d (cid:96) ( αN ) (e i( ω + ω (cid:48) + ω (cid:48)(cid:48) ) − e i( ω + ω (cid:48) ) − e i( ω + ω (cid:48)(cid:48) ) + e i ω (cid:48) ) (cid:33) αN = 1 N (cid:88) [ ijk(cid:96) ] (cid:40) − d i d j αN − d j d k αN − d k d (cid:96) αN + d i d j d k ( αN ) + d j d k d (cid:96) ( αN ) − (cid:34) d i d j ( αN ) + d j d k ( αN ) + d k d (cid:96) ( αN ) + 2 d i d j d k ( αN ) + 2 d i d j d k d (cid:96) ( αN ) + 2 d j d k d (cid:96) ( αN ) (cid:35) + 12 (cid:34) d i d j ( αN ) + d j d k ( αN ) + d k d (cid:96) ( αN ) + 2 d i d j d k ( αN ) + 2 d i d j d k d (cid:96) ( αN ) + 2 d j d k d (cid:96) ( αN ) − d i d j d k ( αN ) − d i d j d k d (cid:96) ( αN ) − d i d j d k ( αN ) − d i d j d k d (cid:96) ( αN ) − d j d k d (cid:96) ( αN ) − d j d k d (cid:96) ( αN ) (cid:35) − (cid:34) d i d j ( αN ) + d j d k ( αN ) + d k d (cid:96) ( αN ) + 3 d i d j d k ( αN ) + 3 d i d j d k ( αN ) + 3 d i d j d k d (cid:96) ( αN ) + 3 d i d j d k d (cid:96) ( αN ) + 3 d j d k d (cid:96) ( αN ) + 3 d j d k d (cid:96) ( αN ) (cid:35)(cid:41) = ( N − N − N − − N (cid:104) d (cid:105) α + 2 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α − (cid:104) d (cid:105) α N (cid:104) d (cid:105) α + 2 N (cid:104) d (cid:105)(cid:104) d (cid:105) α + N (cid:104) d (cid:105) α − (cid:104) d (cid:105) (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105)(cid:104) d (cid:105) α − (cid:104) d (cid:105) (cid:104) d (cid:105) α (C.15)and 1 N (cid:88) [ ijk(cid:96) ] (cid:104) δ c ij , δ c jk , δ c k(cid:96) , δ c (cid:96)i , (cid:105) = 1 N (cid:88) [ ijk(cid:96) ] (cid:90) π − π d ω d ω (cid:48) d ω (cid:48)(cid:48) d ω 
(cid:48)(cid:48)(cid:48) π (cid:89) µ (cid:104) e i ξ µi ( ξ µj ω + ξ µ(cid:96) ω (cid:48)(cid:48)(cid:48) )+ ξ µk ( ξ µj ω (cid:48) + ξ µ(cid:96) ω (cid:48)(cid:48) ) (cid:105) = 1 N (cid:88) [ ijk(cid:96) ] (cid:90) π − π d ω d ω (cid:48) d ω (cid:48)(cid:48) d ω (cid:48)(cid:48)(cid:48) π (cid:89) µ (cid:40) d i αN (cid:104) e i( ξ µj ω + ξ µ(cid:96) ω (cid:48)(cid:48)(cid:48) )+ iξ µk ( ξ µj ω (cid:48) + ξ µ(cid:96) ω (cid:48)(cid:48) ) (cid:105) + (cid:32) − d i αN (cid:33) (cid:104) e i ξ µk ( ξ µj ω (cid:48) + ξ µ(cid:96) ω (cid:48)(cid:48) ) (cid:105) (cid:41) = 1 N (cid:88) [ ijk(cid:96) ] (cid:90) π − π d ω d ω (cid:48) d ω (cid:48)(cid:48) d ω (cid:48)(cid:48)(cid:48) π (cid:32) d i d j ( αN ) (e i ω −
1) + d j d k ( αN ) (e i ω (cid:48) −
1) + d k d (cid:96) ( αN ) (e i ω (cid:48)(cid:48) − d i d (cid:96) ( αN ) (e i ω (cid:48)(cid:48)(cid:48) −
1) + d i d j d k ( αN ) (e i( ω + ω (cid:48) ) − e i ω − e i ω (cid:48) + 1) + d i d k d (cid:96) ( αN ) (e i( ω (cid:48)(cid:48) + ω (cid:48)(cid:48)(cid:48) ) − e i ω (cid:48)(cid:48) − e i ω (cid:48)(cid:48)(cid:48) + 1) rotein interaction networks and biology: towards the connection d i d j d (cid:96) ( αN ) (e i( ω + ω (cid:48)(cid:48)(cid:48) ) − e i ω − e i ω (cid:48)(cid:48)(cid:48) + 1) + d j d k d (cid:96) ( αN ) (e i( ω (cid:48) + ω (cid:48)(cid:48) ) − e i ω (cid:48) − e i ω (cid:48)(cid:48) + 1)+ d i d j d k d (cid:96) ( αN ) (e i( ω + ω (cid:48) + ω (cid:48)(cid:48) + ω (cid:48)(cid:48)(cid:48) ) − e i( ω + ω (cid:48) ) − e i( ω (cid:48)(cid:48) + ω (cid:48)(cid:48)(cid:48) ) − e i( ω + ω (cid:48)(cid:48)(cid:48) ) − e i( ω (cid:48) + ω (cid:48)(cid:48) ) +e i ω + e i ω (cid:48) + e i ω (cid:48)(cid:48) + e i ω (cid:48)(cid:48)(cid:48) − (cid:17) αN = 1 N (cid:88) [ ijk(cid:96) ] (cid:40) − d i d j αN − d j d k αN − d k d (cid:96) αN − d i d (cid:96) αN + d i d j d k ( αN ) + d i d k d (cid:96) ( αN ) + d i d j d (cid:96) ( αN ) + d j d k d (cid:96) ( αN ) − d i d j d k d (cid:96) ( αN ) − (cid:34) d i d j ( αN ) + d j d k ( αN ) + d k d (cid:96) ( αN ) + d i d (cid:96) ( αN ) + 2 d i d j d k ( αN ) + 2 d i d j d k d (cid:96) ( αN ) + 2 d i d j d (cid:96) ( αN ) +2 d j d k d (cid:96) ( αN ) + 2 d i d j d k d (cid:96) ( αN ) + 2 d i d k d (cid:96) ( αN ) (cid:35) + 12 (cid:34) d i d j ( αN ) + d j d k ( αN ) + d k d (cid:96) ( αN ) + d i d (cid:96) ( αN ) + 2 d i d j d k ( αN ) +2 d i d j d k d (cid:96) ( αN ) + 2 d i d j d (cid:96) ( αN ) + 2 d j d k d (cid:96) ( αN ) + 2 d i d j d k d (cid:96) ( αN ) + 2 d i d k d (cid:96) ( αN ) − d i d j d k ( αN ) − d i d j d (cid:96) ( αN ) − d i d j d k d (cid:96) ( αN ) − d i d j d k d (cid:96) ( αN ) − d i d j d k ( αN ) − d i d j d k d (cid:96) ( αN ) − d i d j d k d (cid:96) ( αN ) − d j d k d (cid:96) ( αN ) − d i d j d k d (cid:96) ( αN ) − d i d k d (cid:96) ( αN ) − d i d j d k d (cid:96) ( αN ) − d j d 
k d (cid:96) ( αN ) − d i d j d k d (cid:96) ( αN ) − d i d k d (cid:96) ( αN ) − d i d j d (cid:96) ( αN ) − d i d j d k d (cid:96) ( αN ) (cid:35) − (cid:34) d i d j ( αN ) + d j d k ( αN ) + d k d (cid:96) ( αN ) + d i d (cid:96) ( αN ) + 3 d i d j d k ( αN ) + 3 d i d j d k ( αN ) + 3 d i d j d k d (cid:96) ( αN ) + 3 d i d j d k d (cid:96) ( αN ) + 3 d i d j d (cid:96) ( αN ) + 3 d i d j d (cid:96) ( αN ) + 3 d j d k d (cid:96) ( αN ) + 3 d j d k d (cid:96) ( αN ) + 3 d i d j d k d (cid:96) ( αN ) + 3 d i d j d k d (cid:96) ( αN ) +3 d i d k d (cid:96) ( αN ) + 3 d i d k d (cid:96) ( αN ) (cid:35)(cid:41) = ( N − N − N − − N (cid:104) d (cid:105) α + 4 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α + 2 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) α + 4 N (cid:104) d (cid:105)(cid:104) d (cid:105) α + 2 N (cid:104) d (cid:105) α − (cid:104) d (cid:105) (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105) α − (cid:104) d (cid:105) α − (cid:104) d (cid:105)(cid:104) d (cid:105)(cid:104) d (cid:105) α − (cid:104) d (cid:105) (cid:104) d (cid:105) α (C.16) rotein interaction networks and biology: towards the connection Figure 3.
Symbols: $\langle k\rangle$, $\langle k^2\rangle$, $m_3$ and $m_4$ as measured in synthetic graphs $c$ drawn from (11) with $N = 3000$, shown versus the corresponding values found in the binary graphs $a$ drawn from (10). Bipartite interaction graphs $\xi$ are drawn from (1), with complex size distributions $P(q)$ that are Poissonian (left panels) or power-law (right panels). Dotted lines: the diagonals (shown as guides to the eye). As expected, the values measured in the weighted graphs $c$ are consistently higher than in the binary ones, but these deviations get smaller for increasing network sizes $N$.
Figure 4.
Symbols: theoretical $\langle\ldots\rangle_{\rm th}$ versus measured $\langle\ldots\rangle_{\rm m}$ values of the observables $\langle k\rangle$, $\langle k^2\rangle$, $m_3$ and $m_4$ in synthetic random graphs $c$ with $N = 3000$, defined via (1,11) for a power-law promiscuity distribution $P(d)$. Theoretical values are given by formulae (63) for $\langle k\rangle$, (57) for $\langle k^2\rangle$, (48), (59) and (66) for $m_3$, and (49) and (66) for $m_4$. Dotted lines: the diagonals (shown as guides to the eye).
Figure 5.
Symbols: $\langle k\rangle$, $\langle k^2\rangle$, $m_3$ and $m_4$ as measured in synthetic graphs $c$ drawn from (11) with $N = 3000$, shown versus the corresponding values found in the binary graphs $a$ drawn from (10). Bipartite interaction graphs $\xi$ are drawn from (3), with protein promiscuity distributions $P(d)$ that have a power-law form. Dotted lines: the diagonals (shown as guides to the eye). As expected, the values measured in the weighted graphs $c$ are consistently higher than in the binary ones, but these deviations get smaller for increasing network sizes $N$.
Figure 6.
Symbols: theoretical $\langle\ldots\rangle_{\rm th}$ versus measured $\langle\ldots\rangle_{\rm m}$ values of the observables $\langle k\rangle$, $\langle k^2\rangle$ and m in synthetic random graphs $a$ with $N = 3000$ and $\alpha = 0.5$, generated either via random wiring (top panels), $q$-preferential attachment (middle panels) or $d$-preferential attachment (bottom panels). Dotted lines: the diagonals (shown as guides to the eye).
Figure 7.
Predicted versus real $m_3$ (left) and $m_4$ (right) for random bipartite graphs with $N = 3000$ and α = 0 .
Figure 8.
Distributions $P(q)$ of complex sizes, $P(d)$ of protein promiscuities, and $p(k)$ of the degrees in $a$ (distinguished by the markers shown in the panel legends), for random bipartite graphs with $N = 3000$, α = 0 . and $\langle q\rangle = 4.8$, which have been generated either via random wiring (left), via $q$-preferential attachment (middle), or via $d$-preferential attachment (right).
[Figure 9 legend: cerevisia10, cerevisiae4, cerevisiae6, cerevisiae8, cerevisiae9, e.coli, sapiens3]
Figure 9.
Left: theoretical predictions $m_3$ for the densities of length-3 loops in the PINs, as obtained from the $q$-ensemble (stars) and the $d$-ensemble (circles), plotted versus the values $m_3$ measured in the different MS datasets. Right: theoretical predictions $m_4$ for the densities of length-4 loops in the same PINs, obtained from the $q$-ensemble (stars) and the $d$-ensemble (circles), plotted versus the measured values $m_4$. The diagonals are shown as guides to the eye.
Figure 10.
Left: theoretical predictions $m_3$ for the densities of length-3 loops in the PINs, as obtained from the $q$-ensemble (stars) and the $d$-ensemble (circles), plotted versus the values $m_3$ measured in the different Y2H datasets. Right: theoretical predictions $m_4$ for the densities of length-4 loops in the same PINs, obtained from the $q$-ensemble (stars) and the $d$-ensemble (circles), plotted versus the measured values $m_4$.
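The comparison between weighted and binary graphs reported in the captions of Figures 3 and 5 can be illustrated with a small numerical sketch. It is not the procedure used to produce the figures, and it rests on several assumptions that are not taken from the text above: the weighted graph is taken to be $c_{ij} = \sum_\mu \xi_i^\mu \xi_j^\mu$ for $i \neq j$, the binary graph is $a_{ij} = 1$ whenever $c_{ij} > 0$, the loop observables are taken to be the normalised traces $m_3 = N^{-1}{\rm Tr}(a^3)$ and $m_4 = N^{-1}{\rm Tr}(a^4)$ (the latter also counts backtracking closed walks), and the bipartite graph $\xi$ is sampled with independent Bernoulli memberships rather than from ensembles (1) or (3). The function names `sample_bipartite`, `project` and `observables` are hypothetical.

```python
import numpy as np


def sample_bipartite(n_proteins, n_complexes, mean_q, rng):
    """Assumed stand-in ensemble: each protein joins each complex
    independently, so complexes have mean_q members on average."""
    p = mean_q / n_proteins
    return (rng.random((n_proteins, n_complexes)) < p).astype(np.int64)


def project(xi):
    """Return (c, a): assumed weighted and binary protein graphs."""
    c = xi @ xi.T                   # c_ij = number of shared complexes
    np.fill_diagonal(c, 0)          # no self-interactions
    a = (c > 0).astype(np.int64)    # link iff at least one shared complex
    return c, a


def observables(w):
    """Degree moments and trace-based loop densities of a symmetric matrix."""
    n = w.shape[0]
    k = w.sum(axis=1)               # (weighted) degrees
    wf = w.astype(float)
    w2 = wf @ wf
    return {
        "k1": float(k.mean()),
        "k2": float((k ** 2).mean()),
        "m3": float(np.trace(w2 @ wf) / n),   # closed walks of length 3
        "m4": float(np.trace(w2 @ w2) / n),   # closed walks of length 4
    }


rng = np.random.default_rng(0)
xi = sample_bipartite(n_proteins=300, n_complexes=150, mean_q=5.0, rng=rng)
c, a = project(xi)
obs_c, obs_a = observables(c), observables(a)
for name in obs_a:
    print(name, "weighted:", obs_c[name], "binary:", obs_a[name])
```

Because $c_{ij} \geq a_{ij} \geq 0$ for every pair, each observable computed from $c$ is at least as large as the one computed from $a$, which is the ordering described in the captions. The sketch does not address how the gap changes with $N$.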