An algebraic analysis of the graph modularity
arXiv [math.NA], Jul
DARIO FASINO∗ AND FRANCESCO TUDISCO†
Abstract.
One of the most relevant tasks in network analysis is the detection of community structures, or clustering. The most popular techniques for community detection are based on the maximization of a quality function called modularity, which in turn is based upon particular quadratic forms associated to a real symmetric modularity matrix $M$, defined in terms of the adjacency matrix and a rank-one null model matrix. That matrix can be placed among the relevant matrices of graph theory, alongside adjacency and Laplacian matrices. In this paper we analyze certain spectral properties of modularity matrices that are related to the community detection problem. In particular, we propose a nodal domain theorem for the eigenvectors of $M$; we point out several relations between the communities of a graph and the nonnegative eigenvalues of $M$; and we derive a Cheeger-type inequality for the graph modularity.
Key words.
Graph partitioning, community detection, nodal domains, graph modularity, spectral partitioning.
AMS subject classifications.
1. Introduction.
For the sake of conciseness, we say that a complex network is a graph occurring in real life. Relevant examples include the Internet and the world wide web, biological and social systems like food webs, economic networks, social networks, communication and distribution networks, and many others [17]. Various mathematical disciplines collaborate in the analysis and treatment of such complex systems, and matrix analysis often plays an important role beside, e.g., discrete mathematics and computer science. Here we consider a clear example of this collaboration, namely, the subdivision of a network into “clusters” (typically connected subnetworks) having certain qualitative properties, a task which is required in a number of applications. Two main research directions can be easily recognized within that topic, both having a considerable scientific literature: graph partitioning and community detection (or clustering).

Graph partitioning is the problem of dividing the vertices of a graph into a given number of disjoint subsets of given sizes such that the overall number or weight of edges between such sets is minimized. The important point here is that the number and sizes of the subsets are, at least roughly, prescribed. For instance, probably the best known example of a graph partitioning problem is the problem of dividing an unweighted graph into two subsets of comparable size, such that the number of edges between them is minimized.

Community detection problems differ from graph partitioning in that the number and size of the subsets into which the network is divided are generally not specified a priori. Instead, it is assumed that the graph is intrinsically structured into communities or groups of vertices which are more or less evidently delimited, the aim being to reveal the presence and the consistency of such groups. In particular, the possibility that no significant subdivision exists for a given graph should be taken into account.
A comprehensive review of methods for the solution of partitioning and clustering problems can be found in [6, Ch. 8] and [17, Ch. 11]; see also [13] for a good survey.

∗ Department of Chemistry, Physics, and Environment, University of Udine, Udine, Italy
† Department of Mathematics, University of Rome Tor Vergata, Rome, Italy

The question that mainly motivated the present work is related to evaluating the quality of a particular division of a network into communities and providing efficient, mathematically sound methods and estimates to locate them. As underlined in [16] and [18], “a good division of a network into communities is not merely one in which there are few edges between communities; it is one in which there are fewer than expected edges between communities”. Newman and Girvan therefore introduced a measure of the quality of a particular division of a network, which they call modularity. Although several other quality functions have been proposed in the last ten years for analogous purposes, the modularity is by far the most popular quality function for evaluating the quality of a graph partitioning, and is currently adopted by various successful partitioning algorithms, e.g., the so-called Louvain method [5]. The interesting fact here, and the issue that has drawn our attention to this topic, is that the modularity, as well as other related graph-oriented topological invariants, is defined in terms of certain quadratic forms associated to a matrix $M$, called modularity matrix. That matrix can be considered as one of the relevant matrices naturally associated to a graph, together with adjacency and Laplacian matrices.

The main aim of this paper is to analyze certain spectral properties of modularity matrices that are relevant to the community detection problem. In the remaining part of this Introduction we provide the notational and conceptual background for the subsequent discussion.
Sections 2 and 3 introduce the modularity matrix of a graph, its relationships with the modularity of a (sub-)graph and with the Laplacian matrix, and outline the special role of one of its eigenvalues. In Section 4 we present a nodal domain theorem for the eigenvectors of modularity matrices. The subsequent sections are devoted to the analysis of various connections between optimal partitionings of a graph and nonnegative eigenvalues of its modularity matrix. The main results are summarized in the concluding Section 8, which also contains our final comments and possible directions for further research.

[Footnote: Our use of the term “community” hereafter makes no reference to its meaning in social sciences and other disciplines. We limit ourselves to its common meaning in the network analysis context. Accordingly, we may also use the term “cluster” as an alternative to “community”: “Clustering is a synonym for the decomposition of a set of entities into natural groups” [6].]

To avoid any ambiguity we fix here our notations and some preliminary definitions. We give a brief review of standard concepts from algebraic graph theory that we will use extensively throughout the paper, referring the reader to, e.g., [6, Ch. 2] or [17, Ch. 6] for a careful and succinct introduction to the topic.

From a purely algebraic point of view a graph $G$ consists of a triple $G = (V, E, \omega)$ where $V$ is the set of vertices (or nodes), $E$ is the set of edges and formally is a subset of $V \times V$, and $\omega : E \to \mathbb{R}_+$ is a nonnegative weight function defined over $E$, representing the strength of the relation modeled by the edges. We shall always assume that a graph $G$ is finite, simple, connected, and not oriented. We always identify $V$ with $\{1, \ldots, n\}$. We use the simpler notation $G = (V, E)$ when $\omega(ij) = 1$, that is, when edges are not weighted.

If not otherwise specified, the symbol $A$ will always denote the adjacency matrix of $G$, that is, $A \equiv (a_{ij})$ such that $a_{ij} = \omega(ij)$ if $ij \in E$, and $a_{ij} = 0$ otherwise. In particular, $A$ is a symmetric, irreducible, componentwise nonnegative matrix. For the sake of clarity, further definitions are listed hereafter:
• If $ij \in E$ we write $i \sim j$ and say that $i$ and $j$ are adjacent.
• For any $i \in V$, $d_i$ denotes its (generalized) degree, $d_i = \sum_{j : ij \in E} \omega(ij)$. Moreover, we let $d = (d_1, \ldots, d_n)^T$ and $D = \mathrm{Diag}(d_1, \ldots, d_n)$.
• For any $S \subseteq V$ we denote by $\overline{S}$ the complement $V \setminus S$, and let $\mathrm{vol}\, S = \sum_{i \in S} d_i$ be the volume of $S$. Correspondingly, $\mathrm{vol}\, G = \sum_{i \in V} d_i$ denotes the volume of the whole graph.
• A partition of $V$ is a collection of subsets $\mathcal{P} = \{S_1, \ldots, S_k\}$ such that $\cup_i S_i = V$ and $S_i \cap S_j = \emptyset$ for $i \neq j$.
• For $S \subseteq V$, we denote by $A(S)$ the principal submatrix of $A$ made by the rows and columns whose indices belong to $S$. Moreover, we denote by $G(S)$ the subgraph induced by the vertices in $S$, that is, the subgraph of $G$ whose adjacency matrix is $A(S)$.
• $\mathbb{1}$ denotes the vector of all ones, whose dimension depends on the context.
• The cardinality of a set $S$ is denoted by $|S|$. In particular, $|V| = n$.
• For any $S \subseteq \{1, \ldots, n\}$ we let $\mathbb{1}_S$ be its characteristic vector, defined as $(\mathbb{1}_S)_i = 1$ if $i \in S$ and $(\mathbb{1}_S)_i = 0$ otherwise.
• For any subsets $S, T \subseteq V$ let $E(S, T)$ be the set of edges joining vertices in $S$ with vertices in $T$, and let
$$e(S, T) = \mathbb{1}_S^T A\, \mathbb{1}_T = \sum_{i \in S} \sum_{j \in T} \omega(ij).$$
Note that, if $G = (V, E)$ is unweighted and loopless then $e(S, S) = 2|E(S, S)|$, since every edge inside $S$ is counted twice in the double sum. For simplicity, we use the shorthands $e_{\mathrm{in}}(S) = e(S, S)$ and $e_{\mathrm{out}}(S) = e(S, \overline{S})$, so that we also have $\mathrm{vol}\, S = e_{\mathrm{in}}(S) + e_{\mathrm{out}}(S)$.
• For a matrix $A$ and a vector $x$, we write $A \geq O$ or $x \geq 0$ (resp., $A > O$ or $x > 0$) to denote componentwise nonnegativity (resp., positivity); and $\rho(A)$ denotes the spectral radius of $A$.
• If $X$ is a symmetric matrix then its eigenvalues are ordered as $\lambda_1(X) \geq \cdots \geq \lambda_n(X)$, unless otherwise specified.

We will freely use familiar properties of eigenvalues of symmetric matrices, and fundamental results in Perron-Frobenius theory, see e.g., [3, 4]. For completeness, we recall hereafter some important facts concerning the symmetric eigenvalue problem:
• Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix and let $Z \in \mathbb{R}^{n \times (n-k)}$ be a matrix with orthonormal columns. Then, for all $i = 1, \ldots, n-k$,
$$\lambda_i(A) \geq \lambda_i(Z^T A Z) \geq \lambda_{i+k}(A). \qquad (1.1)$$
• Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix and let $B \in \mathbb{R}^{(n-k) \times (n-k)}$ be a principal submatrix of $A$. Then, for all $i = 1, \ldots, n-k$,
$$\lambda_i(A) \geq \lambda_i(B) \geq \lambda_{i+k}(A). \qquad (1.2)$$
• Let $A$ be a real symmetric matrix of order $n$ and $v \in \mathbb{R}^n$. Then, for $i = 1, \ldots, n-1$,
$$\lambda_i(A) \geq \lambda_{i+1}(A + vv^T) \geq \lambda_{i+1}(A). \qquad (1.3)$$

1.1.1. The modularity matrix.
The modularity matrix of the graph is defined as follows:
$$M = A - \frac{1}{\mathrm{vol}\, G}\, d d^T. \qquad (1.4)$$
Modularity matrices were originally introduced for unweighted graphs $G = (V, E)$; in that case, the number $a_{ij}$ indicates the presence of an edge between nodes $i$ and $j$, whereas $d_i d_j / \mathrm{vol}\, G$ estimates the expected number of edges between vertices $i$ and $j$, if the edges in the graph were placed uniformly at random according to the given degree sequence $d_1, \ldots, d_n$. Therefore the $(i, j)$-entry $m_{ij} = a_{ij} - (d_i d_j)/(\mathrm{vol}\, G)$ of $M$ measures the disagreement between the actual number and the expected number of edges joining $i$ and $j$. It is common practice to extend this definition rather informally to any weighted graph $G = (V, E, \omega)$. In the next paragraph, we outline a formal justification of this rather natural extension.
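As a small numerical illustration of definition (1.4), the following sketch builds the modularity matrix from an adjacency matrix with NumPy. The function name `modularity_matrix` and the four-vertex path graph are assumptions made here for illustration only; they do not come from the paper.

```python
import numpy as np

def modularity_matrix(A):
    """Return M = A - d d^T / vol(G) for a symmetric (weighted) adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)            # generalized degrees d_i
    vol = d.sum()                # vol G
    return A - np.outer(d, d) / vol

# Hypothetical example: the unweighted path 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M = modularity_matrix(A)
row_sums = M @ np.ones(4)        # equals A 1 - d (d^T 1) / vol G = d - d = 0
```

Since $A\mathbb{1} = d$ and $d^T\mathbb{1} = \mathrm{vol}\, G$, every row of $M$ sums to zero, which is what `row_sums` checks numerically.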
Remark 1.1.
In any unweighted graph, the number $d_i d_j / \mathrm{vol}\, G$ is always an upper bound on the probability that $ij \in E$, assuming that edges are placed in $G$ independently at random, conditionally on the given degrees. In fact, that number is the first term in a sign-alternating series expressing the actual probability, where successive terms represent the probability that $i$ and $j$ are connected by multiple edges. If $d_i d_j / \mathrm{vol}\, G \ll 1$ then the alternating series is rapidly convergent, and the bound is a good approximation to the true value. On the other hand, graphs of practical interest may contain node pairs with $d_i d_j / \mathrm{vol}\, G > 1$ and weighted edges. In any case, a more principled motivation of the rank-one correction in the modularity matrix, which carries over to the weighted graph case, relies on the so-called Chung-Lu random graph model, to be recalled in the next paragraph.

The Chung-Lu random graph model is one of the most widespread and successful models for the analysis of large graphs with general degree distributions. Let $w = (w_1, \ldots, w_n)^T > 0$ be such that $\max_i w_i^2 < \sum_{k=1}^n w_k$. We say that a graph $G = (V, E)$ follows the Chung-Lu random graph model with parameter $w$, denoted by $G(w)$, if the existence of the edge $ij \in E$ is determined by an independent Bernoulli trial with probability $p_{ij} = w_i w_j / (\sum_{k=1}^n w_k)$. That model has been popularized in [7, Ch. 5], and various statistical properties have been described e.g., in [1, 20]. A basic and very useful property of this model is that, if $G = (V, E)$ is a random graph drawn from $G(w)$, then the expected degree of $i \in V$ is exactly $w_i$. Consequently, if only the degree vector $d$ is known, it is reasonable to assume $w = d$. Actually, this equality leads to an asymptotically unbiased estimator of $w$ [1]. Hereafter, we propose a generalization of the Chung-Lu model which is convenient for working with weighted graphs.

Definition 1.2.
Let $w = (w_1, \ldots, w_n)^T > 0$, and let $X(p)$ be a nonnegative random variable parametrized by the scalar parameter $p \in [0, 1]$, whose expectation is $\mathbb{E}(X(p)) = p$. We say that a weighted graph $G = (V, E, \omega)$ follows the $X$-weighted Chung-Lu random graph model $G(w, X)$ if, for all $i, j \in V$, the weights $\omega(ij)$ are independent random variables distributed as $X(p_{ij})$ where $p_{ij} = w_i w_j / \sum_{k=1}^n w_k$, with the convention that $ij \in E \Leftrightarrow \omega(ij) > 0$, that is, edges with zero weight are removed from $G$.

We point out that $G(w)$ is the special case of $G(w, X)$ where $X(p)$ is the Bernoulli trial with success probability $p$. On the other hand, if $X(p)$ has a continuous part, then $G(w, X)$ may contain graphs with generic weighted edges. In any case, as in the original Chung-Lu model, if $G$ is a random graph drawn from $G(w, X)$ then the expected degree of node $i$ is
$$\mathbb{E}(d_i) = \sum_{j=1}^n \mathbb{E}(\omega(ij)) = \sum_{j=1}^n p_{ij} = w_i.$$

The modularity matrix (1.4) is a rank-one perturbation of the adjacency matrix, which is still symmetric but loses the nonnegativity of its entries. The kernel of $M$ is nontrivial and, indeed, $\mathbb{1}$ is always a nonzero element of $\ker M$. This is reminiscent of another key matrix associated to a graph $G$: the Laplacian matrix. That matrix is defined as $L = D - A$, where $D$ denotes the diagonal matrix with diagonal entries $d_1, \ldots, d_n$. A huge literature has been developed around $L$, its spectral properties, and their connections with combinatorial and topological properties of $G$, see e.g., [8, 14] and references therein; in fact, in many respects this matrix can be thought of as a discrete version of the Laplacian differential operator.

The quadratic form associated to $L$ admits the expression
$$v^T L v = \sum_{ij \in E} \omega(ij)(v_i - v_j)^2, \qquad (1.5)$$
where the sum ranges over all edges in the graph, each edge being counted only once.
Thus, $L$ is symmetric and positive semidefinite; zero is always an eigenvalue of $L$, with associated eigenvector $\mathbb{1}$, and that eigenvalue is simple if and only if $G$ is connected. Conventionally the eigenvalues of $L$ are ordered from smallest to largest; for a connected graph, $0 = \lambda_1(L) < \lambda_2(L) \leq \cdots \leq \lambda_n(L)$.

The study of the spectral properties of the Laplacian matrix has originated one of the best known methods for graph partitioning, the spectral partitioning [17, § …]. For a connected graph $G$, that method is based on the second smallest eigenvalue of $L$ (the smallest one being zero) and on the changes of sign of the entries of any eigenvector relative to such eigenvalue. Following Fiedler's works, the number $\lambda_2(L)$ is usually called the algebraic connectivity of $G$ and denoted by $a(G)$; furthermore, it is a well established practice to call Fiedler vector any eigenvector associated to it.

Let us recall a couple of definitions and relevant results. Inspired by Courant's nodal domain theorem (which bounds the number of nodal domains of eigenfunctions of the Laplacian operator on smooth Riemannian manifolds), nodal domains induced by a real vector $u$ are commonly defined as follows:

Definition 1.3.
Let $0 \neq u \in \mathbb{R}^n$. A subset $S \subseteq V$ is a strong nodal domain of $G$ induced by $u$ if the subgraph $G(S)$ induced on $G$ by $S$ is a (maximal) connected component of either $\{i : u_i > 0\}$ or $\{i : u_i < 0\}$.

Definition 1.4.
Let $0 \neq u \in \mathbb{R}^n$. A subset $S \subseteq V$ is a weak nodal domain of $G$ induced by $u$ if the subgraph $G(S)$ induced on $G$ by $S$ is a (maximal) connected component of either $\{i : u_i \geq 0\}$ or $\{i : u_i \leq 0\}$ and contains at least one node $i$ where $u_i \neq 0$.

Actually, the previous definitions are a slight modification of the terminology used in e.g., [9, 10], but their meaning is unchanged.

[Footnote: Unlike their continuous analogues, in the present context nodal domains are located by sign variations rather than zero values. Therefore some authors call them sign domains [9]. We prefer to maintain the “classical” terminology.]

For any connected graph $G$, $\lambda_1(L) = 0$ is simple and has $\mathbb{1}$ as associated eigenvector. It clearly follows that the only possible nodal domain for $\lambda_1(L)$ is $G$ itself. On the other hand, since $L$ is real and symmetric, every other eigenvector of $L$ can be chosen to be real and orthogonal to $\mathbb{1}$, that is, any eigenvector $u$ of $L$ that is not constant has at least two components of different signs. Therefore any such $u$ has at least two nodal domains. Fiedler noted in [12, Cor. 3.6] that the weak nodal domains induced by any eigenvector associated to $a(G)$ are at most two, and thus are exactly two. Many authors afterward derived analogous results for the other eigenvalues of $L$ [9, 10, 19]. The following nodal domain theorem summarizes their work:

Theorem 1.5.
Let $L$ be the Laplacian matrix of a connected graph. Let $\lambda$ be an eigenvalue of $L$ and let $u$ be an associated eigenvector. Let $\ell$ and $\ell'$ be the number of eigenvalues of $L$ that are not larger than $\lambda$ and strictly smaller than $\lambda$, respectively, counted with their multiplicity. Then $u$ induces at most $\ell$ strong nodal domains and at most $\ell' + 1$ weak nodal domains.
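To make Definition 1.3 and Theorem 1.5 concrete, here is a minimal sketch that counts the strong nodal domains induced by a Fiedler vector. The helper `strong_nodal_domains`, the tolerance `tol`, and the six-vertex path graph are hypothetical choices made for this illustration and are not taken from the paper.

```python
import numpy as np

def strong_nodal_domains(A, u, tol=1e-12):
    """Return the strong nodal domains of u as a list of (sign, sorted vertex list)."""
    n = len(u)
    sign = np.where(u > tol, 1, np.where(u < -tol, -1, 0))
    seen = np.zeros(n, dtype=bool)
    domains = []
    for start in range(n):
        if seen[start] or sign[start] == 0:
            continue
        # Depth-first search restricted to vertices with the same strict sign.
        stack, comp = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in np.nonzero(A[i])[0]:
                j = int(j)
                if not seen[j] and sign[j] == sign[start]:
                    seen[j] = True
                    stack.append(j)
        domains.append((int(sign[start]), sorted(comp)))
    return domains

# Hypothetical example: the path graph on six vertices.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                  # eigenvector for lambda_2(L) = a(G)
domains = strong_nodal_domains(A, fiedler)
```

For this path the eigenvalue $\lambda_2(L)$ is simple, so $\ell = 2$ in Theorem 1.5, and the Fiedler vector splits the path into its two halves, one positive and one negative.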
2. Modularity of a subgraph.
A central problem in graph clustering is to look for a quantitative definition of community. Although all authors agree that a community should be a connected group of nodes that are more densely connected to each other than to the rest of the network, as a matter of fact no definition is universally accepted. A variety of merit functions to quantify the strength of a subset $S \subset V$ as a community in $G$ is listed in [6, Ch. 8]; all of them are essentially based on a trade-off between the total weight of the edges insisting on vertices in $S$ (which should be “large”) and that of the edges connecting vertices in $S$ with vertices outside $S$ (which should be “small”, for a “good” community).

Fortunato in his comprehensive report [13] classifies various definitions of community according to whether they are based on graph-level properties, subgraph-level properties, or vertex similarity, and underlines that the global definition based on the modularity quality function introduced by Newman and Girvan in [18] is by far the most popular one. Their definition can be informally stated as follows: a subset of vertices $S \subseteq V$ forms a community if the subgraph $G(S)$ contains a larger number of edges than expected. Obviously, such a statement is not rigorous until one defines the probability distribution underlying the concept of “expected number”. Doubtless, the simplest and most natural guess is to assign an equal probability to the connection between any two nodes in the network. The corresponding random graph model is known as the Erdős-Rényi model. That model is at the basis of various successful approaches to community detection [2, 21, 22]. In this work, we follow [15, 16, 18] and assume, instead, the Chung-Lu random graph model with parameter $d$ as reference.

Given a graph $G = (V, E, \omega)$, consider a subset of vertices $S \subseteq V$.
For graphs following the (weighted) Chung-Lu model with parameter $d$, the overall weight of edges joining vertices in $S$ can be estimated by
$$\sum_{i \in S} \sum_{j \in S} \frac{d_i d_j}{\mathrm{vol}\, G} = \frac{(\mathrm{vol}\, S)^2}{\mathrm{vol}\, G}.$$
Consequently, we define the modularity of $S$ as
$$Q(S) = e_{\mathrm{in}}(S) - \frac{(\mathrm{vol}\, S)^2}{\mathrm{vol}\, G}. \qquad (2.1)$$
If that difference is positive then there is a clear indication that the subgraph $G(S)$ contains “more edges” than expected from the reference model. This fact can be considered as a clue (apart from connectedness) that $S$ is a closely knit set of vertices and, as such, a possible community inside $G$.

An easy computation exploiting the identities $\mathrm{vol}\, S = e_{\mathrm{in}}(S) + e_{\mathrm{out}}(S)$ and $\mathrm{vol}\, G - \mathrm{vol}\, S = \mathrm{vol}\, \overline{S}$ reveals that
$$Q(S) = \mathrm{vol}\, S - e_{\mathrm{out}}(S) - \frac{(\mathrm{vol}\, S)^2}{\mathrm{vol}\, G} = \mathrm{vol}\, S \left(1 - \frac{\mathrm{vol}\, S}{\mathrm{vol}\, G}\right) - e_{\mathrm{out}}(S) = \frac{\mathrm{vol}\, S \cdot \mathrm{vol}\, \overline{S}}{\mathrm{vol}\, G} - e_{\mathrm{out}}(S). \qquad (2.2)$$
This relation shows that $Q(S) = Q(\overline{S})$. Therefore, modularity is a quality of the cut $\{S, \overline{S}\}$ rather than of $S$ itself. Moreover, it reveals that $Q(S)$ is large when $S$ and its complement $\overline{S}$ have comparable volumes (in fact $\mathrm{vol}\, S\, \mathrm{vol}\, \overline{S} / \mathrm{vol}\, G$ is large when $\mathrm{vol}\, S \approx \mathrm{vol}\, \overline{S} \approx \tfrac{1}{2} \mathrm{vol}\, G$) and the overall weight of the edges running between $S$ and $\overline{S}$ is small. Consequently, (2.2) shows that the modularity $Q(S)$ shares the structure of virtually all reasonable clustering indices [6, Ch. 8], consisting of the difference between $\mathrm{vol}\, S\, \mathrm{vol}\, \overline{S} / \mathrm{vol}\, G$, which is a term measuring the density of the “clusters” $S$ and $\overline{S}$, and $e_{\mathrm{out}}(S)$, which quantifies the sparsity of their connection. Furthermore, the resulting equalities $Q(\emptyset) = Q(V) = 0$ formalize the common understanding that neither the empty set nor the whole graph constitutes a community.

It is almost immediate to recognize that $e_{\mathrm{in}}(S) = \mathbb{1}_S^T A\, \mathbb{1}_S$ and $\mathrm{vol}\, S = \mathbb{1}_S^T d$. Hence, we can express the modularity (2.1) in terms of the modularity matrix (1.4) as follows:
$$Q(S) = \mathbb{1}_S^T M\, \mathbb{1}_S. \qquad (2.3)$$

Remark 2.1.
In principle, other vectors can be chosen in place of $d$ inside (1.4), depending on the null model one is assuming for the distribution of the edges in $G$. For example, if $G$ is unweighted and the null model assumed is the Erdős-Rényi random graph model, in which every edge appears with probability $p$, then the appropriate definition for the modularity matrix of $G$ would be $M = A - p\, \mathbb{1}\mathbb{1}^T$ with $p = \mathrm{vol}\, G / n^2$, so that $Q(V) = \mathbb{1}^T M \mathbb{1} = 0$. In this case, the resulting modularity matrix allows us to express by means of a formula analogous to (2.3) certain modularity-type merit functions based on Potts Hamiltonian functions adopted in, e.g., [21, 22].

In a somewhat heuristic way at this stage, we see from (2.3) that the existence of a subset $S \subseteq V$ having positive modularity is related to the positive eigenvalues of $M$ and their corresponding eigenspaces. In fact, if $F_n = \{0, 1\}^n$ is the set of binary $n$-tuples, the search for a maximal modularity subgraph is formalized by the optimization problem
$$\max_{x \in F_n} x^T M x. \qquad (2.4)$$
The problem as stated is clearly NP-complete, so a standard and widely used procedure is to move to a continuous relaxation, for example,
$$\max_{x \in \mathbb{R}^n,\ x^T x = 1} x^T M x, \qquad (2.5)$$
which is solved by an eigenvector associated to the largest eigenvalue of $M$, properly normalized. Once a solution $\tilde{x}$ of the latter problem (2.5) is computed, the sign vector $s = \mathrm{sign}(\tilde{x})$ is chosen as an approximate solution of (2.4). Note that such $s$ realizes the best approximation to $\tilde{x}$ in the $L_p$ sense, that is, $\|s - \tilde{x}\|_p = \min_{x \in F_n} \|x - \tilde{x}\|_p$, for $p \in [1, \infty]$. The spectral analysis of $M$ and of the maximal subgraphs induced by the changes of sign in its eigenvectors (nodal domains) is the central topic of Section 4.

In what follows, we adopt from [16, 18] the following definitions:

Definition 2.2. A module in a given graph $G$ is a subgraph having positive modularity. A graph is indivisible if it has no modules, and divisible otherwise.
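The following sketch illustrates formulas (2.1) and (2.3) and the spectral relaxation (2.5) on a toy graph. The function `modularity` and the example graph (two triangles joined by a single edge) are assumptions introduced here for illustration; splitting the vertices by the sign of the leading eigenvector follows the heuristic described above, not a guaranteed optimum.

```python
import numpy as np

def modularity(A, S):
    """Q(S) = e_in(S) - vol(S)^2 / vol(G), as in (2.1)."""
    d = A.sum(axis=1)
    S = list(S)
    e_in = A[np.ix_(S, S)].sum()         # every inner edge is counted twice
    return e_in - d[S].sum() ** 2 / d.sum()

# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
M = A - np.outer(d, d) / d.sum()         # modularity matrix (1.4)

S = [0, 1, 2]
ind = np.zeros(6)
ind[S] = 1.0                             # characteristic vector of S
q_direct = modularity(A, S)              # formula (2.1)
q_matrix = ind @ M @ ind                 # formula (2.3)

# Continuous relaxation (2.5): eigenvector for the largest eigenvalue of M.
eigvals, eigvecs = np.linalg.eigh(M)
x = eigvecs[:, -1]
split = sorted([np.flatnonzero(x > 0).tolist(),
                np.flatnonzero(x <= 0).tolist()])
```

Here $e_{\mathrm{in}}(S) = 6$, $\mathrm{vol}\, S = 7$ and $\mathrm{vol}\, G = 14$, so both expressions give $Q(S) = 6 - 49/14 = 2.5$, and the sign pattern of the leading eigenvector separates the two triangles.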
Probably, the main reason for the success of modularity as a quantitative measure of community strength is the fact that modules having significant size and modularity are typically decent indicators of community structure.
Remark 2.3.
Cliques and star graphs are indivisible graphs. On the other hand, indivisible graphs are rather scarce. Indeed, a simple computation based on the formula (2.1) shows that, if $i, j \in V$ are two vertices joined by an edge, and
$$d_i + d_j < \sqrt{2\, \omega(ij)\, \mathrm{vol}\, G},$$
then $Q(\{i, j\}) > 0$. Consequently, a graph is divisible if it has at least one edge fulfilling the previous inequality, a condition which is easily met in practice.
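The computation behind Remark 2.3 can be spelled out as follows, assuming that $G$ is loopless, so that $e_{\mathrm{in}}(\{i,j\}) = a_{ij} + a_{ji} = 2\,\omega(ij)$ and $\mathrm{vol}\,\{i,j\} = d_i + d_j$:

```latex
\begin{align*}
Q(\{i,j\}) &= e_{\mathrm{in}}(\{i,j\}) - \frac{(\operatorname{vol}\{i,j\})^2}{\operatorname{vol} G}
            = 2\,\omega(ij) - \frac{(d_i + d_j)^2}{\operatorname{vol} G}, \\
Q(\{i,j\}) > 0 &\iff (d_i + d_j)^2 < 2\,\omega(ij)\,\operatorname{vol} G
            \iff d_i + d_j < \sqrt{2\,\omega(ij)\,\operatorname{vol} G}.
\end{align*}
```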
3. The algebraic modularity of a graph.
Since the pioneering works by Fiedler [11, 12], the algebraic connectivity of a connected graph $G$ is classically defined as the smallest positive eigenvalue of its Laplacian matrix:
$$a(G) = \min_{x^T \mathbb{1} = 0} \frac{x^T L x}{x^T x}.$$
Analogously, we can define the algebraic modularity of $G$ as
$$m(G) = \max_{x^T \mathbb{1} = 0} \frac{x^T M x}{x^T x}. \qquad (3.1)$$
Unlike in (2.5), any vector $x$ attaining the maximum in (3.1) must have entries with opposite signs. We will see afterward that $m(G)$ plays a relevant role in the community detection problem, exactly in the same way as $a(G)$ does with respect to the partitioning problem. Furthermore, in tandem with Definition 2.2, it is rather natural to say that $G$ is algebraically indivisible if its modularity matrix has no positive eigenvalues. For example, cliques and star graphs are algebraically indivisible graphs.

Remark 3.1.
The number $m(G)$ is the largest eigenvalue of $M$ after deflation of the subspace $\langle \mathbb{1} \rangle$, which is an invariant subspace associated to the eigenvalue $0$. More precisely, we have $\lambda_1(M) = \max\{m(G), 0\}$. Hence, we can say that $m(G) = \lambda_1(M)$ if and only if $m(G) \geq 0$.

We point out that any algebraically indivisible graph is indivisible as well. Indeed, the existence of a subgraph $S$ having positive modularity implies that $M$ has at least one positive eigenvalue: $\lambda_1(M) \geq \mathbb{1}_S^T M \mathbb{1}_S / \mathbb{1}_S^T \mathbb{1}_S = Q(S)/|S| > 0$. We shall explore in greater detail in Section 6 the relationship between divisibility of $G$ and positive eigenvalues of $M$. For the moment, the following argument shows that a better bound than $Q(S) \leq |S|\, \lambda_1(M)$ can be derived:

Lemma 3.2.
For any $S \subseteq V$ we have $Q(S) \leq m(G)\, |S||\overline{S}|/n$.

Proof. Let $\alpha = |S|/n$. Then, the vector $\mathbb{1}_S - \alpha \mathbb{1}$ is orthogonal to $\mathbb{1}$ and moreover,
$$(\mathbb{1}_S - \alpha \mathbb{1})^T (\mathbb{1}_S - \alpha \mathbb{1}) = (\mathbb{1}_S - \alpha \mathbb{1})^T \mathbb{1}_S = |S| - \alpha |S| = \frac{|S||\overline{S}|}{n}.$$
Recalling that $M \mathbb{1} = 0$ and the definition (3.1) we have
$$Q(S) = \mathbb{1}_S^T M \mathbb{1}_S = (\mathbb{1}_S - \alpha \mathbb{1})^T M (\mathbb{1}_S - \alpha \mathbb{1}) \leq m(G)\, (\mathbb{1}_S - \alpha \mathbb{1})^T (\mathbb{1}_S - \alpha \mathbb{1}),$$
and we complete the proof.

It is worth noting that the modularity matrix $M$ can be expressed as the difference of two Laplacian matrices. Indeed,
$$M = A - D + D - dd^T/(\mathbb{1}^T d) = L_0 - L, \qquad (3.2)$$
where $L = D - A$ is the Laplacian matrix of $G$ and $L_0 = D - dd^T/(\mathbb{1}^T d)$ can be regarded as the Laplacian matrix of the complete graph $G_0 = (V, V \times V, \omega_0)$ where the weight $\omega_0(ij) = d_i d_j / \mathbb{1}^T d$ is placed on the edge $ij$. Thus, in some sense, $G_0$ represents the “average graph” in the Chung-Lu model with parameter $d$.

The formula (3.2) yields a decomposition of $M$ in terms of two positive semidefinite matrices. A noticeable consequence of the Courant-Fischer theorem is the following set of inequalities, relating algebraic connectivity and modularity of $G$, whose simple proof is omitted for brevity:
$$d_{\min} - a(G) \leq a(G_0) - a(G) \leq m(G) \leq d_{\max} - a(G),$$
where $d_{\min}$ and $d_{\max}$ denote the smallest and largest degree of vertices in $G$, respectively. Consequently, a necessary condition for $G$ being algebraically indivisible is $a(G_0) \leq a(G)$. By a result of Fiedler [11], whose proof extends immediately to weighted graphs, $a(G) \leq [n/(n-1)]\, d_{\min}$. Hence, $m(G) \geq -d_{\min}/(n-1)$.
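The quantities in this section can be checked numerically with a short sketch. The helper `eigs_on_ones_complement`, which restricts a symmetric matrix to the orthogonal complement of $\mathbb{1}$ through a QR-based orthonormal basis, and the two-triangle example graph are assumptions made for this illustration only.

```python
import numpy as np

def eigs_on_ones_complement(X):
    """Ascending eigenvalues of the symmetric matrix X restricted to the complement of the ones vector."""
    n = X.shape[0]
    # QR of [ones, e_1, ..., e_{n-1}]: the first column of Q is parallel to ones,
    # the remaining columns form an orthonormal basis Z of its orthogonal complement.
    Q, _ = np.linalg.qr(np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]]))
    Z = Q[:, 1:]
    return np.linalg.eigvalsh(Z.T @ X @ Z)

# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
n = 6
A = np.zeros((n, n))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
D = np.diag(d)
L = D - A                                # Laplacian of G
L0 = D - np.outer(d, d) / d.sum()        # Laplacian of the "average graph" G_0
M = L0 - L                               # decomposition (3.2)

m = eigs_on_ones_complement(M)[-1]       # algebraic modularity m(G), see (3.1)
a = eigs_on_ones_complement(L)[0]        # algebraic connectivity a(G)
a0 = eigs_on_ones_complement(L0)[0]      # algebraic connectivity a(G_0)

S = [0, 1, 2]
ind = np.zeros(n)
ind[S] = 1.0
Q_S = ind @ M @ ind                      # modularity of S via (2.3)
lemma_bound = m * len(S) * (n - len(S)) / n
```

On this graph $m(G) = \sqrt{3}$, and the bound of Lemma 3.2 for $S = \{0, 1, 2\}$ is $\sqrt{3} \cdot 9/6 \approx 2.598$, slightly above $Q(S) = 2.5$.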
4. Modularity nodal domains.
As recalled in Definition 1.3 and Definition 1.4, any vector $u \in \mathbb{R}^n$ induces some nodal domains over $G$, that is, some maximal connected subsets of the vertices $V$ related to sign changes inside $u$. Hereafter, we consider nodal domains induced by eigenvectors of the modularity matrix of the graph, which we call modularity nodal domains. The aim of this section is to derive a nodal domain theorem analogous to Theorem 1.5 for the modularity nodal domains, contributing to the analysis and the improvement of the spectral-based methods for community detection proposed by Newman and Girvan [18] and well summarized in [13] and [17, Ch. 11].

We will say that a nodal domain $S \subset V$ induced by a vector $u$ is positive or negative, according to the sign of $u$ over $S$. If $S_1$ and $S_2$ are two nodal domains, we say that $S_1$ is adjacent to $S_2$, in symbols $S_1 \approx S_2$, if there exist $i \in S_1$ and $j \in S_2$ such that $i \sim j$. The maximality of the nodal domains therefore implies that a necessary condition for $S_1 \approx S_2$ is that $S_1$ and $S_2$ have different signs.

Given a real vector $u \neq 0$, the following properties of the nodal domains it induces are not difficult to observe; some of them are borrowed from [9]:

P1.
In any nodal domain there exists at least one node where $u$ is nonzero. Moreover, if $S_1$ and $S_2$ are weak nodal domains such that $S_1 \cap S_2 \neq \emptyset$ then $S_1$ and $S_2$ have opposite sign and $u_i = 0$ for any $i \in S_1 \cap S_2$.

P2.
Let $A$ be the adjacency matrix of $G$. If $S \subseteq V$ is a (strong or weak) nodal domain, then $G(S)$ is connected and the principal submatrix $A(S)$ is irreducible. Therefore, since two nodal domains of the same sign cannot be adjacent, for any vector $u$ there exists a labeling of the vertices of $V$ such that the adjacency matrix $A$ of $G$ has the form
$$A = \begin{pmatrix} A_+ & B & C \\ B^T & A_- & D \\ C^T & D^T & A_0 \end{pmatrix} \qquad (4.1)$$
where rows and columns of $A_+$, $A_-$, and $A_0$ correspond to entries in $u$ that are positive, negative, and zero, respectively, and $A_+$ and $A_-$ are the direct sum of overall $s$ irreducible matrices, $s$ being the number of strong nodal domains.

P3. If $S_1$ and $S_2$ are adjacent weak nodal domains, then there exist $i \in S_1$ and $j \in S_2 \setminus S_1$ such that $i \sim j$ and $u_j \neq 0$. In fact, if $S_1 \cap S_2 = \emptyset$ then the assertion follows by definition. (If $i \sim j$ and $u_j = 0$ then $j \in S_1 \cap S_2$.) Whereas if $S_1 \cap S_2 \neq \emptyset$ then, by property P1, there must be at least a pair of vertices $i, j \in S_1 \cup S_2$ for which $i \in S_1 \cap S_2$ (whence $u_i = 0$), $i \sim j$, and $j \in S_2 \setminus S_1$ (so that $u_j \neq 0$); otherwise, there would be no edge joining $S_1 \cap S_2$ and $S_2 \setminus S_1$, contradicting the hypothesis that $G(S_2)$ is connected.

The following theorem, which is a slight generalization of [12, Thm. 2.1] and [19, Thm. 1], is the key for deriving a nodal domain theorem for the modularity eigenvectors. We stress that such theorem and its corollaries hold for any undirected simple graph, that is, loops and weighted edges are possibly allowed.

Theorem 4.1.
Let $A$ be the adjacency matrix of a simple, connected graph $G$. Let $\lambda \in \mathbb{R}$ and $u \in \mathbb{R}^n$ be such that at least two entries of $u$ have opposite signs and $Au \geq \lambda u$, in the componentwise sense. Let $\ell$ and $\ell'$ be respectively the number of eigenvalues of $A$ that are greater than or equal to $\lambda$ and the number of eigenvalues that are strictly greater than $\lambda$, counted with their multiplicity. Then $u$ induces at most $\ell$ positive strong nodal domains and at most $\ell'$ positive weak nodal domains.

Proof. Let $s \geq 1$ be the number of positive strong nodal domains induced by $u$. Due to property P2 above, we can assume without loss of generality that the vector $u$ can be partitioned into $s + 1$ subvectors, $u = (u_1, \ldots, u_s, u_{s+1})^T$ such that $u_i > 0$ for $i = 1, \ldots, s$, $u_{s+1} \leq 0$, and $A$ is conformally partitioned as
$$A = \begin{pmatrix} A_1 & & & B_1 \\ & \ddots & & \vdots \\ & & A_s & B_s \\ B_1^T & \cdots & B_s^T & B_{s+1} \end{pmatrix},$$
where the blocks $A_i$ are nonnegative and irreducible, since they are the adjacency matrices of connected graphs. By hypothesis, $A_i u_i + B_i u_{s+1} \geq \lambda u_i$ for $i = 1, \ldots, s$. Therefore $A_i u_i \geq \lambda u_i - B_i u_{s+1} \geq \lambda u_i$ and, by the Perron-Frobenius theorem, we have
$$\rho(A_i) = \max_{x \neq 0} \frac{x^T A_i x}{x^T x} \geq \frac{u_i^T A_i u_i}{u_i^T u_i} \geq \lambda.$$
This implies that $A_i$ has at least one eigenvalue not smaller than $\lambda$, for $i = 1, \ldots, s$. By the eigenvalue interlacing inequalities (1.2) we conclude that $A$ has at least $s$ eigenvalues greater than or equal to $\lambda$, whence $s \leq \ell$. This proves the first inequality in the claim.

The second one can be proved analogously. As for the strong domains, two positive weak nodal domains cannot overlap, therefore there exists a labeling of $V$ such that $A$ admits the block form
$$A = \begin{pmatrix} A_1 & & & B_1 \\ & \ddots & & \vdots \\ & & A_w & B_w \\ B_1^T & \cdots & B_w^T & B_{w+1} \end{pmatrix},$$
where $w$ is the number of weak positive nodal domains, and the vector $u$ is partitioned conformally as $u = (u_1, \ldots, u_w, u_{w+1})^T$ where $u_i \geq 0$ for $i = 1, \ldots, w$ and $u_{w+1} \leq 0$. The entries of $u_{w+1}$ correspond to nodes belonging to the complement of the union of all positive weak nodal domains, and $u$ may vanish also on some of those nodes.
Nevertheless, property P3 above implies that each $B_i$ contains at least one nonzero entry, and that $B_i u_{w+1} \leq 0$ with $B_i u_{w+1} \neq 0$. For $i = 1, \ldots, w$ let $x_i$ be a Perron eigenvector of $A_i$, $A_i x_i = \rho(A_i) x_i$, with positive entries. Hence $x_i^T u_i > 0$ and $x_i^T B_i u_{w+1} < 0$. From the inequality $A_i u_i \geq \lambda u_i - B_i u_{w+1}$ we obtain
$$\rho(A_i)\, x_i^T u_i = x_i^T A_i u_i \geq \lambda x_i^T u_i - x_i^T B_i u_{w+1} > \lambda x_i^T u_i$$
for $i = 1, \ldots, w$. Again by the eigenvalue interlacing (1.2) we see that $A$ has at least $w$ eigenvalues strictly greater than $\lambda$, concluding that $w \leq \ell'$.

Note that in the preceding theorem $\lambda$ may not be an eigenvalue of $A$, in which case $\ell = \ell'$. If $\lambda$ is an eigenvalue of $A$ then the difference $\ell - \ell'$ equals its algebraic/geometric multiplicity.

A direct consequence of Theorem 4.1 is the following result concerning the nodal domains of eigenvectors of modularity matrices, as announced:
Theorem 4.2.
Let $\lambda$ be an eigenvalue of $M$ and let $u$ be an associated eigenvector, oriented so that $d^T u \ge 0$. Let $\ell$ and $\ell'$ be respectively the number of eigenvalues of $M$ which are greater than or equal to $\lambda$ and the number of eigenvalues which are strictly greater than $\lambda$, counted with their multiplicity. If at least two entries of $u$ have opposite signs then $u$ induces at most $\ell + 1$ positive strong nodal domains and at most $\ell' + 1$ positive weak nodal domains.

Proof. By hypotheses, $\ell \ge \ell' + 1$ and the eigenvalues of $M$ fulfill
\[ \lambda_{\ell'}(M) > \underbrace{\lambda_{\ell'+1}(M) = \ldots = \lambda_{\ell}(M)}_{\text{eigenvalues equal to } \lambda} > \lambda_{\ell+1}(M), \]
the first (last) inequality being missing if $\ell' = 0$ ($\ell = n$, respectively). Since $A - M$ is a positive semidefinite rank-one matrix, inequalities (1.3) imply the following interlacing between the eigenvalues of $A$ and of $M$:
\[ \lambda_1(A) \ge \lambda_1(M) \ge \lambda_2(A) \ge \lambda_2(M) \ge \ldots \ge \lambda_n(M). \]
By inspecting the preceding inequalities we get that
• $\lambda_{\ell}(A) \ge \lambda > \lambda_{\ell+2}(A)$; thus $\ell \le |\{ i : \lambda_i(A) \ge \lambda \}| \le \ell + 1$.
• $\lambda_{\ell'}(A) > \lambda \ge \lambda_{\ell'+2}(A)$; thus $\ell' \le |\{ i : \lambda_i(A) > \lambda \}| \le \ell' + 1$.
By hypothesis we have $Au = Mu + [d^T u / (\mathrm{vol}\, G)]\, d \ge \lambda u$. The claim follows immediately by Theorem 4.1.

A close inspection of the preceding proof reveals that, if $\lambda$ is not an eigenvalue of $A$ then we must have $\ell = \ell' + 1$ and the previous inequalities become
\[ \lambda_{\ell'}(M) > \lambda_{\ell}(A) > \lambda_{\ell}(M) = \lambda > \lambda_{\ell+1}(A) > \lambda_{\ell+1}(M). \]
Consequently the bound for the induced positive strong nodal domains in the theorem above becomes simply $\ell$. The following corollary specializes the content of the preceding theorem to eigenvectors associated to the algebraic modularity:

Corollary 4.3. Let $u$ be an eigenvector associated to $m(G)$ and oriented so that $d^T u \ge 0$. If $m(G)$ is simple and is not an eigenvalue of $A$ then $u$ induces exactly one positive (strong) nodal domain.

Proof. It suffices to observe that, if $m(G) = 0$ then $u$ must be a multiple of $\mathbb{1}$, and the claim is trivial.
On the other hand, if $m(G) \ne 0$ then $u^T \mathbb{1} = 0$, so $u$ has at least two entries with different signs, and the claim follows from the aforementioned interlacing inequalities and Theorem 4.2.

Unfortunately there exists no analogue of Theorem 4.2 for the negative nodal domains. This is illustrated by the following example.

Example 4.4.
We produce a family of graphs, of arbitrarily large size, to show that the number of (unsigned) nodal domains induced by the leading eigenvector of $M$ can be arbitrary, whilst there is exactly one positive nodal domain, if signs are chosen as prescribed by Theorem 4.2. Consider a weighted star graph with loops on $n = m + 1$ nodes, whose adjacency matrix is as follows:
\[ A = \begin{pmatrix} \alpha & & & 1 \\ & \ddots & & \vdots \\ & & \alpha & 1 \\ 1 & \cdots & 1 & \beta \end{pmatrix}. \]
[Figure: star graph with leaf nodes $1, \ldots, m$ carrying loops of weight $\alpha$ and root node $n$ carrying a loop of weight $\beta$.]

Hence, $\alpha$ and $\beta$ are the weights of the loops placed on the leaf nodes $1, \ldots, m$ and on the root node $n = m + 1$, respectively. In particular, the degree vector is $d = A\mathbb{1} = (1 + \alpha, \ldots, 1 + \alpha, \beta + m)^T$ and the volume is $\mathrm{vol}\, G = d^T \mathbb{1} = m(2 + \alpha) + \beta$. Straightforward computations lead to the conclusion that the eigenvalues of the modularity matrix $M$ are the following:
• $0$, with associated eigenvector $\mathbb{1}$;
• $\alpha$, with multiplicity $m - 1$ and associated eigenvectors $\mathbb{1}_{\{1\}} - \mathbb{1}_{\{j\}}$ for $j = 2, \ldots, m$;
• $\lambda = (\alpha\beta - m)(m + 1) / \mathrm{vol}\, G$, with associated eigenvector $v = (-1, \ldots, -1, m)^T$.
Observe that, when $\alpha\beta - m > (\alpha + 1)^2$, then $\lambda$ is positive and dominant; indeed, this condition is equivalent to $\lambda > \alpha$. If in addition $\alpha + 1 \le \beta + m$ then the vector $v$ fulfills the inequality $d^T v \ge 0$, whence the spectral clustering of the graph consists of one positive nodal domain, given by node $n$, and $m$ distinct, negative nodal domains, given by the leaf nodes. Both inequalities above are fulfilled when, for example, $\alpha = 1$ and $\beta = m + 5$.

On the other hand, the Laplacian matrix of the same graph is
\[ L = \begin{pmatrix} 1 & & & -1 \\ & \ddots & & \vdots \\ & & 1 & -1 \\ -1 & \cdots & -1 & m \end{pmatrix}, \]
independently of $\alpha$ and $\beta$. Its smallest nonzero eigenvalue is $1$ and the associated eigenspace is the set of all zero-sum vectors that are orthogonal to the $n$-vector $(1, \ldots, 1, 0)$. Hence, any spectral partitioning induced by a Fiedler vector has exactly two weak nodal domains (which intersect at the root node), whereas the number of (positive and negative) strong nodal domains can vary in the range $2, \ldots, m$.
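The eigenpair and the nodal domain count claimed in Example 4.4 can be checked exactly for a small instance. The following is only an illustrative sketch using rational arithmetic from the Python standard library; the helper names (`star_with_loops`, `modularity_matrix`, `strong_nodal_domains`) and the choice $m = 4$ are assumptions made here, not part of the original text.

```python
from fractions import Fraction

def star_with_loops(m, alpha, beta):
    """Adjacency matrix of the star of Example 4.4 (leaves 0..m-1, root m)."""
    n = m + 1
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(m):
        A[i][i] = Fraction(alpha)          # loop on a leaf
        A[i][m] = A[m][i] = Fraction(1)    # edge leaf-root
    A[m][m] = Fraction(beta)               # loop on the root
    return A

def modularity_matrix(A):
    """Return M = A - d d^T / vol G, the degree vector d and vol G."""
    n = len(A)
    d = [sum(row) for row in A]
    vol = sum(d)
    M = [[A[i][j] - d[i] * d[j] / vol for j in range(n)] for i in range(n)]
    return M, d, vol

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def strong_nodal_domains(A, u, sign):
    """Count connected components of the subgraph induced by {i : sign*u_i > 0}."""
    nodes = {i for i, ui in enumerate(u) if sign * ui > 0}
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            i = stack.pop()
            for j in nodes:
                if j not in seen and A[i][j] != 0:
                    seen.add(j)
                    stack.append(j)
    return count

# Illustrative parameters (assumed): alpha = 1, beta = m + 5 as suggested in the text.
m, alpha = 4, 1
beta = m + 5
A = star_with_loops(m, alpha, beta)
M, d, vol = modularity_matrix(A)
v = [Fraction(-1)] * m + [Fraction(m)]
lam = Fraction((alpha * beta - m) * (m + 1)) / vol

is_eigenpair = matvec(M, v) == [lam * x for x in v]
orientation_ok = sum(di * vi for di, vi in zip(d, v)) >= 0
n_pos = strong_nodal_domains(A, v, +1)
n_neg = strong_nodal_domains(A, v, -1)
```

With these parameters the check is exact: `v` is an eigenvector for `lam`, the orientation $d^T v \ge 0$ holds, and the root forms the single positive domain while each leaf is a separate negative domain.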
Analogous examples can be built up using loopless, unweighted graphs. Indeed, consider a graph with $p + mq$ nodes consisting of one clique with $p$ nodes and $m$ copies of the clique with $q$ nodes. Moreover, add $m$ edges connecting a fixed node of the former subgraph with one node of each of the latter subgraphs. The case with $p = 4$, $m = q = 3$ is shown hereafter:

[Figure: a clique on 4 nodes, one of whose nodes is joined by a single edge to one node of each of three cliques on 3 nodes.]

Under appropriate conditions on the parameters $p$, $q$, and $m$ the leading eigenvector of $M$ splits the graph into the $m + 1$ cliques, each belonging to a different nodal domain; the (unique) positive nodal domain being the clique having order $p$. Computer experiments show that those conditions are met e.g., for $p = 4$, $q = 3$, and $m = 2, \ldots,$
5. Upper bounds on the graph modularity.
In the preceding sections we have understood modularity as a functional defined over arbitrary subsets of $V$. For the purposes of community detection problems, it is convenient to extend the previous definition to arbitrary partitions. In fact, Newman and Girvan's original definition of the modularity of a partition $P = \{S_1, \ldots, S_k\}$ of $V$, see Equation (5) in [18], can be expressed in our notation as
\[ q(P) = \frac{1}{\mathrm{vol}\, G} \sum_{i=1}^{k} Q(S_i) = \frac{1}{\mathrm{vol}\, G} \sum_{i=1}^{k} \mathbb{1}_{S_i}^T M \mathbb{1}_{S_i}. \tag{5.1} \]
The normalization factor $1/\mathrm{vol}\, G$ is purely conventional and has been included by the authors for compatibility with previous works, to settle the value of $q(P)$ in a range independent of $G$. That definition has been introduced as a merit function to quantify the strength of the community structure defined by $P$. In the earliest community detection algorithm, the function $q(P)$ is optimized by a hierarchical clustering method. Subsequent improvements of that algorithm maintain essentially the original approach, see [5]. The use (and the definition itself) of the modularity matrix to compute the modularity of a partitioning was introduced subsequently in [15, 16].

As recalled in the Introduction, in the community detection problem one has no preliminary indications on the number and size of possible communities inside $G$. Hence, it is natural to introduce the number
\[ q_G = \max_{P} q(P), \]
where the maximum is taken over all nontrivial partitions of $V$, and try to bound it in terms of spectral properties of $M$ only.

Remark 5.1.
An optimal partition $P^* = \{S_1, \ldots, S_k\}$, that is, a partition such that $q_G = q(P^*)$, has the property that if any two subsets are merged then the overall modularity does not increase. This does not imply that $Q(S_i) > 0$ for all $i = 1, \ldots, k$, even if $G$ is divisible. Nevertheless, if $Q(S) > 0$ for some $S \subset V$ then $q_G \ge q(\{S, \bar{S}\}) > 0$, so that the condition $q_G > 0$ is equivalent to saying that $G$ is divisible.

In the $k = 2$ case we have $P = \{S, \bar{S}\}$ for some $S \subset V$. Since $Q(S) = Q(\bar{S})$, for notational simplicity, we can write $q(S)$ in place of $q(P)$. Correspondingly, we also consider the quantity
\[ q'_G = \max_{S \subset V} q(S), \]
whose computation corresponds to the identification of a set $S$ or, equivalently, a cut $\{S, \bar{S}\}$ with maximal modularity. We prove hereafter a very general upper bound for $q'_G$ in terms of $m(G)$; a lower bound is considered in the forthcoming Corollary 7.2, under additional hypotheses.

Theorem 5.2.
Let $\langle d \rangle = \mathrm{vol}\, G / n$ be the average degree in $G$. Then,
\[ q'_G \le \frac{m(G)}{2 \langle d \rangle}. \]
Proof. Since $Q(S) = Q(\bar{S})$, for any $S \subseteq V$ we have by definition
\[ q(S) = \frac{1}{\mathrm{vol}\, G} \bigl( Q(S) + Q(\bar{S}) \bigr) = \frac{2 Q(S)}{\mathrm{vol}\, G}. \tag{5.2} \]
Let $S$ be a set maximizing $q(S)$. From Lemma 3.2 we obtain
\[ q'_G = \frac{2 Q(S)}{\mathrm{vol}\, G} \le \frac{2\, m(G)}{\mathrm{vol}\, G} \frac{|S| |\bar{S}|}{n} \le \frac{m(G)}{\mathrm{vol}\, G} \frac{n}{2}, \]
since $|S| |\bar{S}|$ is upper bounded by $n^2/4$ for any $S$.

In what follows, we prove a result analogous to the preceding theorem but for the number $q_G$. For clarity of exposition, we first derive a preliminary result:

Lemma 5.3.
Let $A$ and $B$ be two symmetric matrices of order $n$, with eigenvalues $\lambda_1(A) \ge \ldots \ge \lambda_n(A)$ and $\lambda_1(B) \ge \ldots \ge \lambda_n(B)$, respectively. Then,
\[ \mathrm{trace}(AB) \le \sum_{i=1}^{n} \lambda_i(A) \lambda_i(B). \]
Proof. The claim can be derived easily from the Hoffman-Wielandt inequality [4, Thm. VI.4.1]
\[ \sum_{i=1}^{n} \bigl( \lambda_i(A) - \lambda_i(B) \bigr)^2 \le \| A - B \|_F^2, \]
using the expansion $\| A - B \|_F^2 = \| A \|_F^2 + \| B \|_F^2 - 2\, \mathrm{trace}(AB)$ and the equality $\| A \|_F^2 = \sum_{i=1}^{n} \lambda_i(A)^2$.

Consider an arbitrary partition $P = \{S_1, \ldots, S_k\}$ of the node set $V$. Assume for simplicity that $|S_i| \ge |S_{i+1}|$ for $i = 1, \ldots, k - 1$. Introduce the $n \times k$ "index matrix" $Z = [\mathbb{1}_{S_1} \cdots \mathbb{1}_{S_k}]$ and define $B = Z Z^T = \sum_{i=1}^{k} \mathbb{1}_{S_i} \mathbb{1}_{S_i}^T$. Then $B$ has rank $k$ and the cardinalities $|S_i|$ are its nonzero eigenvalues in nonincreasing order. Recalling that, for arbitrary matrices $A$ and $B$, it holds $\mathrm{trace}(AB) = \mathrm{trace}(BA)$, (5.1) can be rewritten as follows:
\[ q(P) = \frac{1}{\mathrm{vol}\, G} \mathrm{trace}(Z^T M Z) = \frac{1}{\mathrm{vol}\, G} \mathrm{trace}(BM). \]
With the help of Lemma 5.3 we immediately get
\[ q(P) \le \frac{1}{\mathrm{vol}\, G} \sum_{i=1}^{k} |S_i| \lambda_i(M) \le \frac{n}{\mathrm{vol}\, G} \lambda_1(M). \]
The latter bound, which does not depend on $P$, can be improved as follows:

Theorem 5.4.
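For any graph $G$,
\[ q_G \le \frac{n - 1}{\mathrm{vol}\, G}\, m(G). \]
Proof. Let $\mathcal{V} = \langle \mathbb{1} \rangle^{\perp}$ and let $V$ be any matrix whose columns form an orthonormal basis of $\mathcal{V}$. Observe that $W = V V^T$ is the orthogonal projector onto $\mathcal{V}$, that is, $W = I - \frac{1}{n} \mathbb{1} \mathbb{1}^T$. Moreover, $\lambda_1(V^T M V) = m(G)$ by (3.1).

Let $P^* = \{S_1, \ldots, S_k\}$ be an optimal partition of $G$, that is $q_G = q(P^*)$, and let again $Z = [\mathbb{1}_{S_1} \cdots \mathbb{1}_{S_k}]$. Since $M = W M W$, Lemma 5.3 leads us to
\[ \mathrm{trace}(Z^T M Z) = \mathrm{trace}(Z^T W M W Z) = \mathrm{trace}\bigl( (V^T Z Z^T V)(V^T M V) \bigr) \le \sum_{i=1}^{n-1} \lambda_i(V^T Z Z^T V)\, \lambda_i(V^T M V) \le \mathrm{trace}(V^T Z Z^T V)\, m(G) = \mathrm{trace}(Z^T W Z)\, m(G), \]
where the last inequality uses the fact that $V^T Z Z^T V$ is positive semidefinite. From $W = I - \frac{1}{n} \mathbb{1} \mathbb{1}^T$, letting $z = Z^T \mathbb{1} = (|S_1|, \ldots, |S_k|)^T$ we obtain
\[ Z^T W Z = Z^T Z - \frac{1}{n} Z^T \mathbb{1} \mathbb{1}^T Z = \mathrm{Diag}(|S_1|, \ldots, |S_k|) - \frac{1}{n} z z^T. \]
Owing to the fact that $\| z \|_2 \ge \| z \|_1 / \sqrt{n} = \sqrt{n}$, we have $\mathrm{trace}(Z^T W Z) = n - \| z \|_2^2 / n \le n - 1$. It suffices to collect terms, and the proof is complete.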
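The bounds of Theorems 5.2 and 5.4 can be illustrated by brute force on a very small graph. The sketch below is not part of the original paper: the test graph (two triangles joined by one edge), the helper names, and the use of a shifted power iteration on $\langle \mathbb{1} \rangle^{\perp}$ to approximate $m(G)$ are all assumptions made for illustration, and exhaustive enumeration is only feasible for a handful of nodes.

```python
import math

def modularity_matrix(A):
    """Return M = A - d d^T / vol G and vol G."""
    n = len(A)
    d = [sum(row) for row in A]
    vol = sum(d)
    M = [[A[i][j] - d[i] * d[j] / vol for j in range(n)] for i in range(n)]
    return M, vol

def q_of_partition(M, vol, parts):
    """Modularity (5.1): sum over blocks S of 1_S^T M 1_S, divided by vol G."""
    return sum(M[i][j] for S in parts for i in S for j in S) / vol

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]

def algebraic_modularity(M, iters=500):
    """Approximate m(G): largest eigenvalue of M on the zero-sum vectors."""
    n = len(M)
    shift = max(sum(abs(a) for a in row) for row in M)  # makes M + shift*I PSD
    x = [i - (n - 1) / 2 for i in range(n)]             # zero-sum start vector
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(n)) + shift * x[i] for i in range(n)]
        mean = sum(y) / n
        y = [t - mean for t in y]                       # stay orthogonal to 1
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

# Assumed test graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0.0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1.0

M, vol = modularity_matrix(A)
m_G = algebraic_modularity(M)

# q'_G: best cut {S, complement of S}; q_G: best nontrivial partition.
q2 = max(
    q_of_partition(M, vol,
                   [[i for i in range(n) if (mask >> i) & 1],
                    [i for i in range(n) if not (mask >> i) & 1]])
    for mask in range(1, 2 ** n - 1)
)
qG = max(q_of_partition(M, vol, P)
         for P in set_partitions(list(range(n))) if len(P) >= 2)

bound_52 = n * m_G / (2 * vol)   # Theorem 5.2: m(G) / (2 <d>)
bound_54 = (n - 1) * m_G / vol   # Theorem 5.4
```

For this graph the best cut separates the two triangles, and both `q2 <= bound_52` and `qG <= bound_54` hold with room to spare in the second case.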
6. How many modules?
Based on rather informal arguments, Newman claims in [15, Sect. B] that the number of positive eigenvalues of $M$ is related to the number of communities recognizable in the graph $G$, tightening the connection between spectral properties of $M$ and the community structure of the network it describes. More precisely, the author argues that the number of positive eigenvalues, plus 1, is an upper bound on the number of communities that can be recognized in $G$. In this section we prove various results supporting that conclusion, which culminate in the subsequent Corollary 6.3.

Already in Remark 3.1 we noticed that the existence of a subgraph $S$ having positive modularity implies that $M$ has at least one positive eigenvalue. By the way, if $Q(S) > 0$ then $Q(\bar{S}) > 0$, therefore two modules (according to Definition 2.2) give rise to one positive eigenvalue. The forthcoming theorem proves that, if $G$ has $k$ subgraphs that are well separated and sufficiently rich in internal edges, then $M$ has at least $k - 1$ positive eigenvalues.

Theorem 6.1.
Let $S_1, \ldots, S_k$ be pairwise disjoint subsets of $V$, with $k \ge 2$, such that $\mathrm{vol}(S_i) \le \frac{1}{2} \mathrm{vol}\, G$ and $e_{\mathrm{in}}(S_i) > e_{\mathrm{out}}(S_i)$. Then, $Q(S_i) > 0$ and $M$ has at least $k - 1$ positive eigenvalues.

Proof. Firstly we observe that, owing to the stated hypotheses, the sets $S_1, \ldots, S_k$ have positive modularity. Indeed, for $i = 1, \ldots, k$ we have $\mathrm{vol}\, G \le 2\, \mathrm{vol}\, \bar{S}_i$, whence
\[ Q(S_i) \frac{\mathrm{vol}\, G}{\mathrm{vol}\, \bar{S}_i} = \mathrm{vol}\, S_i - e_{\mathrm{out}}(S_i) \frac{\mathrm{vol}\, G}{\mathrm{vol}\, \bar{S}_i} = e_{\mathrm{in}}(S_i) - e_{\mathrm{out}}(S_i) \left( \frac{\mathrm{vol}\, G}{\mathrm{vol}\, \bar{S}_i} - 1 \right) \ge e_{\mathrm{in}}(S_i) - e_{\mathrm{out}}(S_i) > 0. \]
Let $Z = [\mathbb{1}_{S_1} \cdots \mathbb{1}_{S_k}]$ and consider the $k \times k$ matrix $C = Z^T A Z$. For $i, j = 1, \ldots, k$ we have
\[ C_{ij} = \mathbb{1}_{S_i}^T A \mathbb{1}_{S_j} = \begin{cases} e_{\mathrm{in}}(S_i) & i = j \\ e(S_i, S_j) & i \ne j. \end{cases} \]
The matrix $C$ is symmetric, nonnegative, and (strongly) diagonally dominant. Indeed, owing to the fact that the $S_j$'s are pairwise disjoint, and $E(S_i, \bar{S}_i) \supseteq \cup_{j \ne i} E(S_i, S_j)$, for $i = 1, \ldots, k$ we have
\[ C_{ii} = e_{\mathrm{in}}(S_i) > e_{\mathrm{out}}(S_i) \ge \sum_{j \ne i} e(S_i, S_j) = \sum_{j \ne i} C_{ij}. \]
As a result, by the Gershgorin theorem, $C$ is positive definite.

Introduce the diagonal matrix $\Delta = \mathrm{Diag}(\sqrt{|S_1|}, \ldots, \sqrt{|S_k|})^{-1}$. Owing to the orthogonality of the columns of $Z$, the matrix $\hat{Z} = Z \Delta$ has orthonormal columns. By Sylvester's law of inertia, also the matrix $\Delta C \Delta = \hat{Z}^T A \hat{Z}$ is positive definite. From the eigenvalue interlacing inequalities (1.1),
\[ \lambda_k(A) \ge \lambda_k(\hat{Z}^T A \hat{Z}) = \lambda_k(\Delta C \Delta) > 0. \]
Finally, using (1.3) we conclude $\lambda_{k-1}(M) \ge \lambda_k(A) > 0$.

[...] $M$ instead of $A$, as intermediate step. Before that, it is convenient to introduce an auxiliary notation.

Let $S_1$ and $S_2$ be two disjoint subsets of $V$. We define their joint modularity as
\[ Q(S_1, S_2) = e(S_1, S_2) - \frac{\mathrm{vol}\, S_1 \, \mathrm{vol}\, S_2}{\mathrm{vol}\, G}. \]
Its absolute value $|Q(S_1, S_2)|$ is sometimes referred to as discrepancy between $S_1$ and $S_2$, see e.g., [8, § ...]. Observe that $Q(S_1, S_2) = Q(S_2, S_1)$ and $Q(S) = Q(S, S)$. Furthermore, we can express the joint modularity of $S_1$ and $S_2$ equivalently as $Q(S_1, S_2) = \mathbb{1}_{S_1}^T M \mathbb{1}_{S_2}$.

1. Note that $Q(S_1, S_2)$ is the difference between the overall weight of edges bridging $S_1$ and $S_2$ and its value as expected by the (weighted) Chung-Lu model.

2. From the equation $(\mathbb{1}_{S_1} + \mathbb{1}_{S_2})^T M (\mathbb{1}_{S_1} + \mathbb{1}_{S_2}) = \mathbb{1}_{S_1}^T M \mathbb{1}_{S_1} + \mathbb{1}_{S_2}^T M \mathbb{1}_{S_2} + 2\, \mathbb{1}_{S_1}^T M \mathbb{1}_{S_2}$ we have
\[ Q(S_1 \cup S_2) = Q(S_1) + Q(S_2) + 2 Q(S_1, S_2). \]
In particular, $Q(S_1, S_2) > 0$ if and only if $Q(S_1 \cup S_2) > Q(S_1) + Q(S_2)$. Hence, when looking for an optimal partitioning of $G$ into modules, it is necessary that the joint modularity of any two subsets is $\le 0$; otherwise, we can increase the overall modularity by merging two subgraphs into one.

The forthcoming theorem proves that, under ample hypotheses, the number of positive eigenvalues of $M$, plus 1, is actually an upper bound for the cardinality of any partition of $G$ into modules such that if any two subsets are merged then the overall modularity does not increase.

Theorem 6.2.
Let $P = \{S_1, \ldots, S_k\}$ be a partition of $V$, with $k \ge 2$, such that $Q(S_i) > 0$ and $Q(S_i, S_j) \le 0$ for $i \ne j$. Consider the matrix $C$ such that $C_{ii} = Q(S_i)$ and $C_{ij} = Q(S_i, S_j)$ for $i \ne j$. If $C$ is irreducible then $M$ has at least $k - 1$ positive eigenvalues.

Proof. Consider the matrices $Z = [\mathbb{1}_{S_1} \cdots \mathbb{1}_{S_k}]$, $\Delta = \mathrm{Diag}(|S_1|, \ldots, |S_k|)^{-1/2}$ and $\hat{Z} = Z \Delta$. Then, $C = Z^T M Z$. Furthermore, $C$ is weakly diagonally dominant. Indeed,
\[ \sum_{j=1}^{k} C_{ij} = \mathbb{1}_{S_i}^T M \sum_{j=1}^{k} \mathbb{1}_{S_j} = \mathbb{1}_{S_i}^T M \mathbb{1} = 0. \]
Using the Gershgorin theorem we deduce that $C$ is a symmetric positive semidefinite matrix, with a zero eigenvalue which is associated to the eigenvector $\mathbb{1}$. For a sufficiently large $\alpha > 0$ the matrix $B = \alpha I - C$ is entrywise nonnegative and irreducible. Hence, by Perron-Frobenius theory, its largest eigenvalue is simple. Since the eigenspaces of $B$ and $C$ coincide, the zero eigenvalue of $C$ must be simple.

We deduce that $C$ has $k - 1$ positive eigenvalues. The same is true for the matrix $\Delta C \Delta = \hat{Z}^T M \hat{Z}$, by Sylvester's law of inertia. Finally, the eigenvalue interlacing inequalities (1.1) imply $\lambda_{k-1}(M) \ge \lambda_{k-1}(\hat{Z}^T M \hat{Z}) = \lambda_{k-1}(\Delta C \Delta) > 0$, and the proof is complete.

Note that, in the preceding theorem, irreducibility of $C$ is verified in particular when $Q(S_i, S_j) < 0$ for $i \ne j$. That condition is fulfilled by any partition maximizing $q_G$ which contains the least number of sets among all such partitions (otherwise we can reduce their number by merging pairs whose joint modularity is zero without decreasing the overall modularity). We get an immediate corollary:

Corollary 6.3.
Let $P^*$ be a minimal cardinality partition with $q_G = q(P^*)$, entirely made of modules. Then $P^*$ contains no more than $k + 1$ sets, where $k$ is the number of positive eigenvalues of $M$.

The following example, which is inspired by a popular benchmark in the community detection literature, shows that this result is optimal:
Example 6.4 (Circulant ring of clusters). Given integers $p$ and $q$, consider the graph consisting of $n = pq$ vertices, partitioned as $P = \{S_1, \ldots, S_p\}$; every $G(S_i)$ is a clique of order $q$; the cliques are arranged circularly, and every node of $S_i$ is connected to the corresponding node of the two neighboring cliques by an edge whose weight is $\gamma > 0$ (so that the generalized degree of each node is $q - 1 + 2\gamma$). In this graph, the $p$ cliques have positive modularity, and in fact are clearly recognizable as "communities". We show hereafter that if $0 < \gamma \le 1/2$ the modularity matrix of this graph has exactly $p - 1$ positive eigenvalues.

With a natural numbering of the nodes, the adjacency matrix can be expressed as a block circulant matrix with circulant blocks,
\[ A = \begin{pmatrix} B_q & \gamma I & & \gamma I \\ \gamma I & B_q & \ddots & \\ & \ddots & \ddots & \gamma I \\ \gamma I & & \gamma I & B_q \end{pmatrix}, \]
where $B_q = \mathbb{1} \mathbb{1}^T - I$ is the adjacency matrix of a $q$-order clique; and the corresponding modularity matrix is $M = A - c\, \mathbb{1} \mathbb{1}^T$, with $c = (q - 1 + 2\gamma)/n$, which is still block circulant with circulant blocks and, furthermore, simultaneously diagonalizable with $A$. In fact, let $C_p \equiv (c_{ij})$ be the $p \times p$ symmetric circulant matrix such that $c_{ij} = 1$ if $i - j \equiv \pm 1 \pmod{p}$ and $c_{ij} = 0$ otherwise. Denoting by $F_k$ the unitary Fourier matrix of order $k$, we have the spectral factorizations
\[ C_p = F_p\, \mathrm{Diag}(\lambda^{(p)}_1, \ldots, \lambda^{(p)}_p)\, F_p^*, \quad \text{with } \lambda^{(p)}_j = 2 \cos(2\pi (j - 1)/p), \]
\[ B_q = F_q\, \mathrm{Diag}(q - 1, -1, \ldots, -1)\, F_q^*. \]
Making use of the Kronecker (tensor) product, the adjacency matrix $A$ admits the decomposition $A = I \otimes B_q + \gamma\, C_p \otimes I$, whence it is diagonalized by $F_p \otimes F_q$. Consequently, the eigenvalues of $A$ are readily computed as follows:
a) $q - 1 + \gamma \lambda^{(p)}_j$ for $j = 1, \ldots, p$, each of them having multiplicity $1$; and
b) $\gamma \lambda^{(p)}_j - 1$ for $j = 1, \ldots, p$, each of them having multiplicity $q - 1$.
A careful observation reveals that, if $\gamma \le 1/2$ then the $p$ largest eigenvalues of $A$ are precisely the numbers in the preceding item a), which are positive; and the remaining eigenvalues are $\le 0$.

The eigenvalues of $M$ coincide with those of $A$ with the exception of the largest one, which is annihilated by the rank-one correction $A - M = c\, \mathbb{1} \mathbb{1}^T$. Consequently, the matrix $M$ has at least $p - 1$ positive eigenvalues; for all $0 < \gamma \le 1/2$, they are exactly $p - 1$, that is, the number of "communities" minus one. It is interesting to note that eigenvectors associated to these eigenvalues lie in the span of $\Re(f_k) \otimes \mathbb{1}$ and $\Im(f_k) \otimes \mathbb{1}$, where $f_k$ is the $k$-th column of $F_p$; in particular, they are constant within each clique. Furthermore, for any two distinct integers $i, j = 1, \ldots, p$, one such eigenvector assumes opposite signs on $S_i$ and $S_j$, so that communities in this graph are demarcated precisely by modularity nodal domains associated to positive eigenvalues of $M$.
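The eigenpairs described in Example 6.4 can be checked numerically on one instance. The sketch below is an illustration only: the parameter values $p = 5$, $q = 4$, $\gamma = 1/2$ and the helper names `ring_of_cliques` and `residual` are assumptions made here. It verifies that the real and imaginary parts of the Fourier vectors, taken constant on each clique, are eigenvectors of $M$ with positive eigenvalues $q - 1 + 2\gamma \cos(2\pi k/p)$; it does not by itself prove that the remaining eigenvalues are nonpositive.

```python
import math

def ring_of_cliques(p, q, gamma):
    """Adjacency matrix of p cliques of order q arranged in a ring (p >= 3).

    Node a of clique c has index c*q + a; it is linked with weight gamma
    to node a of the two neighbouring cliques.
    """
    n = p * q
    A = [[0.0] * n for _ in range(n)]
    for c in range(p):
        for a in range(q):
            i = c * q + a
            for b in range(q):
                if a != b:
                    A[i][c * q + b] = 1.0
            for nb in ((c + 1) % p, (c - 1) % p):
                A[i][nb * q + a] = gamma
    return A

def residual(M, x, mu):
    """Largest entry of |M x - mu x|."""
    return max(abs(sum(a * b for a, b in zip(row, x)) - mu * xi)
               for row, xi in zip(M, x))

# Assumed illustrative parameters (p odd, so each eigenvalue below is double).
p, q, gamma = 5, 4, 0.5
n = p * q
A = ring_of_cliques(p, q, gamma)
c = (q - 1 + 2 * gamma) / n
M = [[A[i][j] - c for j in range(n)] for i in range(n)]  # M = A - c 1 1^T

positive_pairs = 0
for k in range(1, (p - 1) // 2 + 1):
    mu = q - 1 + 2 * gamma * math.cos(2 * math.pi * k / p)
    for trig in (math.cos, math.sin):
        # Vector constant on each clique: Re(f_k) or Im(f_k) tensor 1.
        x = [trig(2 * math.pi * k * (i // q) / p) for i in range(n)]
        if residual(M, x, mu) < 1e-9 and mu > 0:
            positive_pairs += 1

# Largest value in item b): 2*gamma - 1, nonpositive when gamma <= 1/2.
largest_b = 2 * gamma - 1
```

Here `positive_pairs` counts $p - 1 = 4$ independent eigenvectors with positive eigenvalue, while `largest_b` confirms that the eigenvalues in item b) do not exceed zero for this choice of $\gamma$.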
7. A Cheeger-type inequality.
Let $G = (V, E)$ be an unweighted graph. The number
\[ h_G = \min_{\substack{S \subset V \\ 0 < |S| \le n/2}} \frac{|E(S, \bar{S})|}{|S|} \]
is one of the best known topological invariants of $G$, as it establishes a wealth of deep and important relationships with various areas of mathematics [8, 14]. Its connection with graph partitioning, and discrete versions of the isoperimetric problem, is apparent. Hence, it is of no surprise that various relationships have been uncovered between $h_G$ and $a(G)$, also under slightly different definitions.

The bound $h_G \ge a(G)/2$ is known as the Cheeger inequality, analogously to a classical result in Riemannian geometry that relates the solution of the isoperimetric problem to the smallest positive eigenvalue of the Laplacian differential operator on manifolds. For example, it is known that if $G$ is a $k$-regular graph (that is, $d_i = k$ for $i = 1, \ldots, n$) then $h_G \le \sqrt{2 k\, a(G)}$ [14, Thm. 4.11]. In the forthcoming Corollary 7.2 we provide a Cheeger-type inequality between modularity and algebraic modularity of a regular graph. Although practical graphs are seldom regular, that hypothesis is important to obtain a converse result to Theorem 5.2.

Theorem 7.1.
Let $G = (V, E)$ be a connected, $k$-regular graph, and let $f$ be an eigenvector associated to $m(G)$: $Mf = m(G) f$. Let $w_1 \ge w_2 \ge \ldots \ge w_n$ be the values of $f_1, \ldots, f_n$ sorted in nonincreasing order. Introduce the sets
\[ S_i = \{ j : f_j \ge w_i \}, \quad i = 1, \ldots, n, \]
and let $Q^\star = \max_i Q(S_i)$. Then,
\[ Q^\star \ge \frac{1}{w_1 - w_n} \left( \frac{k}{2} \| f \|_1 - \| f \|_2 \sqrt{\frac{(k - m(G))\, k n}{2}} \right). \]
Proof. We start by noticing that $f$ is a Fiedler vector of $G$. Indeed, if $G$ is $k$-regular then the matrix appearing in (3.2) becomes $kI - (k/n) \mathbb{1} \mathbb{1}^T$, which maps $f$ to $kf$; moreover, from the equation $m(G) = k - a(G)$ and the decomposition (3.2) we obtain $Lf = a(G) f$, that is, $f$ is a Fiedler vector.

Consider the quantity
\[ \sigma = \sum_{i \sim j} | f_i - f_j |, \]
where the sum runs on the edges of $G$, every edge being counted only once. By the Cauchy-Schwarz inequality and (1.5),
\[ \sigma \le \sqrt{\sum_{i \sim j} (f_i - f_j)^2}\, \sqrt{\sum_{i \sim j} 1} = \| f \|_2 \sqrt{\frac{a(G)\, \mathrm{vol}\, G}{2}} = \| f \|_2 \sqrt{\frac{(k - m(G))\, k n}{2}}. \]
For ease of notation, we re-number the vertices of $G$ so that $f_1 \ge f_2 \ge \ldots \ge f_n$. In this way, the sets $S_1, \ldots, S_n$ introduced in the claim are given by $S_i = \{1, \ldots, i\}$. Furthermore, the edge boundary $\partial S_i$ is the set of all edges having one vertex in $\{1, \ldots, i\}$ and the other in $\{i + 1, \ldots, n\}$. Let $Q^\star = \max_i Q(S_i)$. Using
\[ | \partial S_i | \ge \frac{\mathrm{vol}\, S_i \, \mathrm{vol}\, \bar{S}_i}{\mathrm{vol}\, G} - Q^\star = k \frac{i (n - i)}{n} - Q^\star \]
we obtain $\sigma = \sum_{i \sim j,\ i < j} \ldots$

8. Concluding remarks.

In this paper we have studied the community detection problem through modularity optimisation from an uncommon algebraic point of view. In particular we have tried to recast popular concepts from the complex networks and physics literature in a mathematical formalism involving mainly linear algebra and matrix theory.

We introduce the concept of algebraic modularity of a graph, which allows us to clarify the difference between an indivisible graph and an algebraically indivisible graph, two notions that are often used interchangeably without much attention.
We focus our attention on the nodal domains induced by the eigenvectors of the modularity matrix and we derive a nodal domain theorem for such eigenvectors, in complete analogy with the well known Fiedler vector theorem for the Laplacian matrix [12], and some further developments proposed more recently in [9, 10, 19]. However, unlike in the Laplacian case, nodal domains arising with modularity matrices are naturally endowed with a sign, with different properties for positive and negative nodal domains.

Then we consider the possible relationship between the number of modules in $G$ and the number of positive eigenvalues of its modularity matrix. Newman claimed in [15] that the number of positive eigenvalues of $M$ is related to the number of communities recognizable in the graph $G$, but his claim was based on rather informal arguments. Our analysis of $M$ instead tries to support this claim showing, in particular, that the presence of communities in $G$ implies that the spectrum of $M$ at least partially lies on the positive axis. We point out here that a reverse implication is realistic and desirable, but is still an open problem.

Finally we focus on Cheeger-type inequalities, showing that a nice estimate holds between modularity and algebraic modularity of $G$. At present, our result is limited to regular graphs; its possible extension to more general graphs seems to be a major task and is left as an open problem.

As the importance of the community detection problem is apparent, and modularity-based techniques are by far the most popular in this area, we believe that the modularity matrix $M$ could be considered as a relevant matrix in algebraic graph theory, together with adjacency and Laplacian matrices. The results we obtain give rise to a first spectral graph analysis aimed at the problems of existence, estimation and localization of optimal subdivisions of the graph into communities.
Our results adhere to modularity-related definitions borrowed from current literature. Probably, modified (maybe, "normalized") versions of modularity matrices and functions may lead to conclusions different from those presented here.

Acknowledgements. The authors thank two anonymous referees for their useful comments and suggestions, in particular, those of including Remark 1.1 and Example 6.4 in the final version of this paper.

REFERENCES

[1] N. Arcolano, K. Ni, B. A. Miller, N. T. Bliss, and P. J. Wolfe. Moments of parameter estimates for Chung-Lu random graph models. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2012), pages 3961–3964, 2012.
[2] A. Arenas, A. Fernandez, and S. Gomez. Analysis of the structure of complex networks at different resolution levels. New J. Phys., 10:053039, 2008.
[3] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences, volume 9 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1994.
[4] R. Bhatia. Matrix Analysis. Graduate Texts in Mathematics. Springer, 1996.
[5] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
[6] U. Brandes and T. Erlebach, editors. Network Analysis. Methodological Foundations, volume 3418 of Lecture Notes in Computer Sciences. Springer, 2005.
[7] F. Chung and L. Lu. Complex Graphs and Networks, volume 107 of CBMS Regional Conference Series in Mathematics. AMS, 2006.
[8] F. R. K. Chung. Spectral Graph Theory, volume 92 of CBMS Regional Conference Series in Mathematics. AMS, 1997.
[9] E. B. Davies, G. M. L. Gladwell, J. Leydold, and P. F. Stadler. Discrete nodal domain theorems. Linear Algebra Appl., 336:51–60, 2001.
[10] A. M. Duval and V. Reiner. Perron-Frobenius type results and discrete versions of nodal domain theorems. Linear Algebra Appl., 294:259–268, 1999.
[11] M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23:298–305, 1973.
[12] M. Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czechoslovak Mathematical Journal, 25(100):619–633, 1974.
[13] S. Fortunato. Community detection in graphs. Physics Reports, 486:75–174, 2010.
[14] S. Hoory, N. Linial, and A. Wigderson. Expander graphs and their applications. Bull. Amer. Math. Soc. (N.S.), 43(4):439–561 (electronic), 2006.
[15] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 69:321–330, 2006.
[16] M. E. J. Newman. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA, 103:8577–8582, 2006.
[17] M. E. J. Newman. Networks: An Introduction. OUP Oxford, 2010.
[18] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69(026113), 2004.
[19] D. L. Powers. Graph partitioning by eigenvectors. Linear Algebra Appl., 101:121–133, 1988.
[20] N. Przulj and D. J. Higham. Modelling protein-protein interaction networks via a stickiness index. J. Roy. Soc. Interface, 3:711–716, 2006.
[21] J. Reichardt and S. Bornholdt. Statistical mechanics of community detection. Phys. Rev. E, 74:016110, 2006.
[22] V. A. Traag, P. Van Dooren, and Y. Nesterov. Narrow scope for resolution-limit-free community detection.