Motif Conservation Laws for the Configuration Model
Anatol E. Wegner
Max Planck Institute for Mathematics in the Sciences, Inselstr. 22, Leipzig, Germany ∗
The observation that some subgraphs, called motifs, appear more often in real networks than in their randomized counterparts has attracted much attention in the scientific community. In the prevalent approach the detection of motifs is based on comparing subgraph counts in a network with their counterparts in the configuration model with the same degree distribution as the network. In this short note we derive conservation laws that relate motif counts in the configuration model and discuss their consequences.
INTRODUCTION
Motif identification [1, 2] has become a widely used method in network analysis. The prevalent approach to motif analysis is due to Milo et al. and is based on comparing subgraph counts of motifs in the network with their counterparts in a null model that preserves certain features of the network. The most widely used null model is the configuration model [1, 3, 4] with the same degree distribution as the network. In this short note we present some simple conservation laws relating motif counts that follow directly from the conservation of the degree sequence. Some conserved quantities were given earlier by Milo et al. [2], and correlations between motif counts have also been investigated in [5]. The conservation laws we present here directly relate motif counts and account for the correlations observed between motifs [5] and the general structure of motif significance profiles that have been used to categorize networks [2].
THE CONFIGURATION MODEL
The configuration model for directed graphs on $n$ nodes [3] is based on assigning to each node a specific in, out and mutual degree ($I_i$, $O_i$ and $M_i$, $i = 1, \ldots, n$) and assigning equal probability to each possible graph configuration with the given degree sequence. Whether to include the mutual degree in the construction or not is a matter of choice but is in general done when detecting motifs [1]. Graphs with self edges, parallel edges and additional mutual edges that arise during the randomization process are in general discarded from the ensemble. In the case of undirected graphs one can simply consider all edges to be mutual edges. If additional mutual edges and/or parallel edges are allowed to form during the randomization process the conservation laws we derive hold only approximately. However, the expected number of such edges in general is small (i.e. O(1)). Algorithms for sampling the configuration model are reviewed in [4].
RESULTS
Definitions and Conventions
The conservation laws we present are based on the distinction between subgraphs and induced subgraphs and the observation that subgraph counts of V-shaped motifs are preserved in the configuration model.
A graph $H = (V(H), E(H))$ is called a subgraph of $G = (V(G), E(G))$ whenever $V(H) \subseteq V(G)$ and $E(H) \subseteq E(G)$. A subgraph is said to be induced iff it contains all edges $xy \in E(G)$ such that $x, y \in V(H)$. In the literature on network motifs the word 'subgraph' in general refers to an induced subgraph and most motif detection algorithms are based on counting induced subgraphs [1, 2].
FIG. 1: The blue edges form an induced subgraph, as they include all edges between nodes 1, 2 and 4, while the red edges do not form an induced subgraph, as they do not include all edges between nodes 1, 2 and 3.
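The two definitions can be checked mechanically. The following is a minimal sketch, assuming an undirected graph stored as a set of node pairs; the example graph and the helper names `is_subgraph` and `is_induced` are hypothetical and are not taken from Fig. 1.

```python
def norm(edges):
    """Store undirected edges as frozensets so that (x, y) equals (y, x)."""
    return {frozenset(e) for e in edges}


def is_subgraph(h_nodes, h_edges, g_nodes, g_edges):
    """H is a subgraph of G if V(H) is a subset of V(G) and E(H) of E(G)."""
    return set(h_nodes) <= set(g_nodes) and norm(h_edges) <= norm(g_edges)


def is_induced(h_nodes, h_edges, g_nodes, g_edges):
    """H is induced if it also contains every edge of G joining two nodes of H."""
    if not is_subgraph(h_nodes, h_edges, g_nodes, g_edges):
        return False
    hv = set(h_nodes)
    required = {e for e in norm(g_edges) if e <= hv}
    return required <= norm(h_edges)


# Hypothetical graph: a triangle 1-2-3 with a pendant node 4 attached to node 1.
G_NODES = {1, 2, 3, 4}
G_EDGES = {(1, 2), (2, 3), (1, 3), (1, 4)}

# Induced: G has no edge between nodes 2 and 4, so nothing is missing.
print(is_induced({1, 2, 4}, {(1, 2), (1, 4)}, G_NODES, G_EDGES))
# Not induced: the edge between nodes 1 and 3 is left out.
print(is_induced({1, 2, 3}, {(1, 2), (2, 3)}, G_NODES, G_EDGES))
```

In this sketch the second example is still a subgraph of the graph; it only fails the stronger requirement of being induced.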
In this paper we consider the configuration model in which the mutual degree is conserved; we therefore treat mutual edges not as a combination of two edges but as edges of a different type. This coincides with the convention used to count motifs in [1, 2]. Consequently, in the case of directed 3-node motifs [Fig. 2], motif 3 is not considered to be a subgraph of motif 4, neither is 8 a subgraph of 12, etc. If one considers mutual edges to be combinations of two directed edges, the counting convention has to be modified accordingly. On the other hand, the conservation laws arising from such a counting convention can be shown to be linear combinations of the ones we derive here. Obviously, the number of conservation laws would decrease if the mutual degree sequence were not conserved.
FIG. 2: The 13 directed 3-node motifs
Conservation laws for directed 3-node motifs
For a graph $G$ on $n$ nodes with given in, out and mutual degree sequences ($I_i$, $O_i$ and $M_i$, $i = 1, 2, \ldots, n$) the subgraph counts of the V-shaped motifs are entirely determined by moments of the degree sequences and are given by:

$$N = \sum_{i=1}^{n} \binom{O_i}{2} \tag{1}$$
$$N = \sum_{i=1}^{n} \binom{I_i}{2} \tag{2}$$
$$N = \sum_{i=1}^{n} O_i I_i \tag{3}$$
$$N = \sum_{i=1}^{n} I_i M_i \tag{4}$$
$$N = \sum_{i=1}^{n} O_i M_i \tag{5}$$
$$N = \sum_{i=1}^{n} \binom{M_i}{2} \tag{6}$$

It follows that the subgraph counts of the V-shaped triads are conserved in the configuration model as they are functions of the degree sequences only. Since a subgraph is either an induced subgraph or not, the subgraph count of a given motif is simply the sum of the subgraphs which are induced subgraphs and the ones that are not. Moreover, according to our convention V-shaped subgraphs that are not induced subgraphs have to be contained in some triangle shaped induced subgraph. Again because of the convention all triangle shaped subgraphs are induced subgraphs. Since every triangle shaped subgraph contains a certain number of copies of V-shaped motifs as subgraphs, we get the following conservation laws where $N_i$ denotes the subgraph count of motif $i$ and $n_i$ its induced subgraph count:

$$N = n + n + n \tag{7}$$
$$N = n + n + n \tag{8}$$
$$N = n + n + 3n + n \tag{9}$$
$$N = n + 2n + n + n \tag{10}$$
$$N = n + 2n + n + n \tag{11}$$
$$N = n + n + 3n \tag{12}$$

These conservation laws show that the statistics of the V-shaped motifs are fully determined by the statistics of the triangle shaped motifs. In the supporting material of [2] it was shown that there are 9 conserved quantities for the sixteen 3-node motif counts (including the single edged and empty motifs). The conservation laws are also closely related to the reactions proposed in [2, 5] as they represent analogues of mass conservation laws for these reactions.
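The argument above can be checked numerically on a small example. The following is a minimal sketch, assuming a directed graph given as a list of arcs and treating mutual edges as a separate edge type. The two-letter labels for the V-shaped motifs (for instance `OO` for two out-edges at the central node, `IM` for one in-edge and one mutual edge) and all function names are assumptions of this sketch; they are not the motif numbers of Fig. 2. The code computes the six V-shaped subgraph counts from the degree sequences and compares them with a brute-force count that separates induced V-shapes from those contained in a triangle.

```python
from itertools import combinations
from math import comb
import random

KEYS = ("OO", "II", "IO", "IM", "MO", "MM")


def split_edges(arcs):
    """Split arcs into one-way edges and mutual edges (a separate edge type)."""
    arcs = {(u, v) for u, v in arcs if u != v}  # drop self edges
    mutual = {frozenset((u, v)) for u, v in arcs if (v, u) in arcs}
    oneway = {(u, v) for u, v in arcs if (v, u) not in arcs}
    return oneway, mutual


def counts_from_degrees(nodes, oneway, mutual):
    """V-shaped subgraph counts computed from the degree sequences only."""
    O = dict.fromkeys(nodes, 0)
    I = dict.fromkeys(nodes, 0)
    M = dict.fromkeys(nodes, 0)
    for u, v in oneway:
        O[u] += 1
        I[v] += 1
    for e in mutual:
        for v in e:
            M[v] += 1
    return {
        "OO": sum(comb(O[v], 2) for v in nodes),
        "II": sum(comb(I[v], 2) for v in nodes),
        "IO": sum(O[v] * I[v] for v in nodes),
        "IM": sum(I[v] * M[v] for v in nodes),
        "MO": sum(O[v] * M[v] for v in nodes),
        "MM": sum(comb(M[v], 2) for v in nodes),
    }


def link(c, x, oneway, mutual):
    """Edge type between c and x as seen from c: 'O', 'I', 'M', or None."""
    if frozenset((c, x)) in mutual:
        return "M"
    if (c, x) in oneway:
        return "O"
    if (x, c) in oneway:
        return "I"
    return None


def counts_by_enumeration(nodes, oneway, mutual):
    """Count V-shapes per node triple: induced ones vs. those inside a triangle."""
    induced = dict.fromkeys(KEYS, 0)
    in_triangle = dict.fromkeys(KEYS, 0)
    for triple in combinations(nodes, 3):
        for c in triple:  # try each node of the triple as the centre of the V
            x, y = (v for v in triple if v != c)
            a, b = link(c, x, oneway, mutual), link(c, y, oneway, mutual)
            if a is None or b is None:
                continue
            key = "".join(sorted(a + b))
            closed = link(x, y, oneway, mutual) is not None
            (in_triangle if closed else induced)[key] += 1
    return induced, in_triangle


random.seed(0)
nodes = list(range(12))
arcs = [(u, v) for u in nodes for v in nodes if u != v and random.random() < 0.2]
oneway, mutual = split_edges(arcs)
expected = counts_from_degrees(nodes, oneway, mutual)
induced, in_triangle = counts_by_enumeration(nodes, oneway, mutual)
for key in KEYS:
    # subgraph count = induced V-shapes + V-shapes contained in triangles
    assert induced[key] + in_triangle[key] == expected[key]
```

For the single triangle with arcs (0, 1), (0, 2) and (1, 2), the enumeration finds one `OO`, one `IO` and one `II` copy, all inside the triangle, which is the kind of contribution that the triangle terms on the right-hand sides of the conservation laws account for.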
In the case of undirected 4-node motifs there is one analogous conservation law for the 3-star motif (motif 1) that follows from the conservation of the degree sequence, $d_i$:

$$\sum_{i=1}^{n} \binom{d_i}{3} = N_1 = n + n + 2n + 4n \tag{13}$$

FIG. 3: The six 4 node motifs
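The 3-star law can be verified by brute force on a small graph. The following is a minimal sketch, assuming a simple undirected graph on nodes `0..n_nodes-1`; the shape names (`star`, `triangle_with_tail`, `diamond`, `clique`) and the function name `three_star_law` are assumptions of this sketch and are not the motif numbers of Fig. 3. The weights 1, 1, 2 and 4 are the numbers of degree-3 nodes, and hence of 3-star subgraphs, in each of the four connected 4-node shapes that contain a 3-star.

```python
from itertools import combinations
from math import comb
import random

# Sorted degree sequence inside a 4-node induced subgraph
# -> (shape name, number of 3-star subgraphs the shape contains)
SHAPES = {
    (1, 1, 1, 3): ("star", 1),
    (1, 2, 2, 3): ("triangle_with_tail", 1),
    (2, 2, 3, 3): ("diamond", 2),
    (3, 3, 3, 3): ("clique", 4),
}


def three_star_law(n_nodes, edges):
    """Return (degree-based count, weighted induced count, induced counts)."""
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Left-hand side: 3-star subgraph count from the degree sequence.
    lhs = sum(comb(len(adj[v]), 3) for v in adj)
    induced = {name: 0 for name, _ in SHAPES.values()}
    rhs = 0
    for quad in combinations(range(n_nodes), 4):
        members = set(quad)
        degs = tuple(sorted(len(adj[v] & members) for v in quad))
        if degs in SHAPES:  # shapes without a degree-3 node contain no 3-star
            name, stars = SHAPES[degs]
            induced[name] += 1
            rhs += stars
    return lhs, rhs, induced


random.seed(1)
n_nodes = 10
edges = [(u, v) for u, v in combinations(range(n_nodes), 2) if random.random() < 0.4]
lhs, rhs, induced = three_star_law(n_nodes, edges)
assert lhs == rhs
```

Within a 4-node induced subgraph the sorted degree sequence identifies these four shapes uniquely, which is why the sketch classifies by degree sequence instead of testing graph isomorphism.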
DISCUSSION
The motif conservation laws show that in the case of directed 3 node motifs the induced subgraph statistics of the V-shaped motifs are completely determined by the statistics of the triangle shaped motifs. Consequently, the normalized triad significance of the 13 directed 3 node motifs has only 6 degrees of freedom due to the six conservation laws and the normalization which further reduces the degrees of freedom by one. Similarly, the subgraph ratio profile used in [2] has only four degrees of freedom for motifs of size 4. The conservation laws further explain why the z-scores of triangle shaped motifs and V-shaped motifs are negatively correlated.
The conservation laws could potentially be used to reduce the computational complexity of algorithms previously used to evaluate motifs since they show that counting of star shaped motifs is essentially redundant.
The generalization of the conservation laws to higher order star shaped motifs and different edge and node types is straightforward. Moreover, when the counts of lower order motifs are conserved during the evaluation of higher order motifs [1] similar (approximate) conservation laws may arise. Recently, some generalizations of the configuration model that are based on specifying higher order subgraph degrees (such as triangle degree) for each node in addition to the edge degrees have been proposed [6, 7]. In these generalized configuration models analogous conservation laws for higher order subgraphs that are not star shaped do also hold (again approximately) as a consequence of the conservation of the subgraph degrees. For instance, in the model proposed in [6] the subgraph count of motif 4 in Fig. 3 would be conserved approximately since such models contain only O(1) triangles in addition to those specified by the triangle degree.
∗ Electronic address: [email protected]
[1] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824, 2002.
[2] R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon. Superfamilies of evolved and designed networks. Science, 303(5663):1538, 2004.
[3] M.E.J. Newman, S.H. Strogatz, and D.J. Watts. Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64(2):026118, 2001.
[4] R. Milo, N. Kashtan, S. Itzkovitz, M.E.J. Newman, and U. Alon. On the uniform generation of random graphs with prescribed degree sequences. arXiv preprint cond-mat/0312028, 2003.
[5] R. Ginoza and A. Mugler. Network motifs come in sets: Correlations in the randomization process. Physical Review E, 82(1):011921, 2010.
[6] M.E.J. Newman. Random graphs with clustering. Physical Review Letters, 103(5):58701, 2009.
[7] B. Karrer and M.E.J. Newman. Random graphs containing arbitrary distributions of subgraphs.