Evolutionary method for finding communities in bipartite networks
arXiv [physics.data-an]
Weihua Zhan, Zhongzhi Zhang, Jihong Guan, and Shuigeng Zhou
Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai 201804, China
School of Computer Science, Fudan University, Shanghai 200433, China
Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China
(Dated: September 11, 2018)

An important step in unveiling the relation between network structure and dynamics defined on networks is to detect communities, and numerous methods have been developed separately to identify community structure in different classes of networks, such as unipartite networks, bipartite networks, and directed networks. Here, we show that the finding of communities in such networks can be unified in a general framework: detection of community structure in bipartite networks. Moreover, we propose an evolutionary method for efficiently identifying communities in bipartite networks. To this end, we show that both unipartite and directed networks can be represented as bipartite networks, that their modularity is completely consistent with that for bipartite networks, and that the detection of modular structure in bipartite networks can be reformulated as modularity maximization. To optimize the bipartite modularity, we develop a modified adaptive genetic algorithm (MAGA), which is shown to be especially efficient for community structure detection. The high efficiency of the MAGA is based on the following three improvements. First, we introduce a different measure for the informativeness of a locus instead of the standard deviation, which determines precisely which loci should mutate. This measure is the bias between the distribution of a locus over the current population and the uniform distribution of the locus, i.e., the Kullback-Leibler divergence between them. Second, we develop a reassignment technique for differentiating the informative state a locus has attained from the random state in the initial phase. Third, we present a modified mutation rule which, by incorporating related operations, can guarantee the convergence of the MAGA to the global optimum and can speed up the convergence process. Experimental results show that the MAGA outperforms existing methods in terms of modularity for both bipartite and unipartite networks.
PACS numbers: 89.75.Hc, 02.10.Ox, 02.50.-r
I. INTRODUCTION
Complex networks have gained overwhelming popularity as a powerful tool for understanding various complex systems from diverse fields, including the technical, natural, and social sciences. They provide a unified perspective for studying these systems by modeling them as networks whose nodes and edges represent, respectively, the units of a system and the interactions between units [1–6]. Generally, according to the types of node, networks can be classified into unipartite, bipartite, and multipartite networks. As a typical class of real-world networks, bipartite networks, in contrast to unipartite ones, consist of two types of nodes, and edges exist only between nodes of distinct types. Examples of bipartite networks come from various fields, including scientific collaboration networks, actor-movie networks, and protein-protein interaction networks [1, 2, 7–9]. Multipartite networks, with three or more types of nodes, are occasionally seen [10, 11].

It has been discovered [7] that most real networks share a local clustering feature, i.e., they contain groups of tightly knit nodes that are densely connected to each other, with sparser edges between groups. These groups of nodes are generally referred to as communities or modules. From a topological point of view, a community may correspond to a functional unit because of its relative structural independence. In turn, community structure can critically affect diverse dynamics on networks. Therefore, identification of communities plays a key role in numerous related areas of complex networks, e.g., predicting protein function [12] and determining the dynamics of systems [13–15]. The last few years have witnessed tremendous efforts in this direction [8, 15–26] (useful reviews include Refs. [27, 28]).
Most previous studies are dedicated to unipartite networks, while little attention has been paid to directed networks [23, 24] and bipartite networks [24–26].

It is of interest that unipartite and directed networks can be represented by bipartite networks, as will be shown. Thus, detection of communities in unipartite networks or in directed networks can be transformed into the same task in bipartite networks. Given a bipartite modularity, those methods based on modularity maximization [16–19] can, in principle, be applied to bipartite networks. However, they are expected to be affected by the resolution limit [29, 30] as in the unipartite case, which may result in the degeneracy problem [31]. This poses a challenge for methods that return a single solution. Instead, we present a modified adaptive genetic algorithm to optimize the bipartite modularity [26]. The evolutionary method can return a better solution in a shorter time. Moreover, the method can return multiple good solutions in multiple runs, which enables us to evaluate the reliability (or significance) of solutions without resorting to other techniques as in [32, 33], as well as to obtain a superior solution by combining several good solutions when the degeneracy problem is severe.

In practice, there exist two distinct conceptual understandings of the community structure of a bipartite network. The first viewpoint considers each community to be composed of two types of nodes with dense edges across them, which is similar to the view in the unipartite case [26]. An alternative view is that any community should contain only one type of nodes, which are closely connected through co-participation in several communities that consist of the other type of nodes [24]. Guided by this view, the usual approach to identifying communities is to project the bipartite network onto one specific unipartite network as needed, and then identify communities in the projection. Guimerà et al. [24] recently presented a method for identifying communities of one type of nodes against the other type of nodes with a known community structure.

In this paper, we focus on the problem of identifying communities from the first viewpoint. We present a modified adaptive genetic algorithm (MAGA), based on the mutation-only genetic algorithm (MOGA), which is parameter-free, unlike traditional genetic algorithms. The method does not need to know in advance the number of communities or their sizes. In Sec. II, we first give a short review of Barber's modularity [26] and then show that unipartite networks and directed networks can be uniformly represented by bipartite networks. After the description of the MOGA in Sec. III A, we introduce a different measure for selecting loci to mutate in Sec. III B, and then develop the reassignment technique in Sec. III C. Further, we discuss how to select the population size in Sec. III D and address the issues of convergence and time complexity of the MAGA in Sec. III E. In Sec. IV, we apply the algorithm to model bipartite networks, several real bipartite networks, and unipartite networks. Finally, the conclusion is given.
II. BIPARTITE MODULARITY
The modularity introduced by Newman and Girvan [8] aims at quantifying the goodness of a particular division of a given network, and has been widely accepted as a benchmark index for measuring and comparing the accuracy of various methods of community detection. The definition of this quantity is based on the idea that community structure implies a statistically surprising arrangement of edges, that is, the number of actual edges within communities should significantly exceed the number expected in a null model. In turn, a null model should have the same number of nodes and the same degree distribution as the original network, while its edges are placed by chance.

Let $k_i$ be the degree of node $i$, and $M$ the total number of edges. Since in the null model [18] the probability for an edge being present between nodes $i$ and $j$ is $k_i k_j / (2M)$, the modularity, which quantifies the extent to which the number of actual edges exceeds the expectation based on the null model, can be formulated as follows:

\[
Q = \frac{1}{2M} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( A_{i,j} - \frac{k_i k_j}{2M} \right) \delta(g_i, g_j), \tag{1}
\]

where $Q$ is the sum of the difference over all groups of the particular division, $N$ is the network size, $A_{i,j}$ indicates the adjacency relation between nodes $i$ and $j$, $g_i$ represents the group to which node $i$ is assigned, and the $\delta$ function takes the value 1 if $g_i$ equals $g_j$, and 0 otherwise.

The value of $Q$ ranges from -1 to 1. Given a network, a larger value generally indicates a more accurate division of the network into communities. Community structure detection can thus be formulated as a problem of modularity maximization, which often works well although it may suffer from a resolution problem [29, 30].
On the other hand, due to the resolution limit and the random fluctuation effect [34], it appears preferable to accompany the divisions delivered by modularity maximization approaches with an evaluation of their reliability [31–33, 35].

The above modularity is actually designed for unipartite networks. To suit various networks, several variations of modularity based on different null models have been proposed, including weighted [36], directed [23], and bipartite modularity [24, 26]. A bipartite network with $N$ nodes can be conveniently denoted by a pair $(p, q)$ with $p + q = N$, where $p$ and $q$ respectively represent the numbers of the two types of nodes. We can renumber nodes such that in the sequence $1, 2, \cdots, p, p+1, \cdots, N$, the leftmost $p$ indices represent the first type of nodes and the remainder represent the second type of nodes. Then, Barber's bipartite modularity [26], which considers a community to be composed of distinct types of nodes in the network, can be written as

\[
Q_b = \frac{1}{M} \sum_{i=1}^{p} \sum_{j=p+1}^{N} \left( A_{i,j} - \frac{k_i k_j}{M} \right) \delta(g_i, g_j). \tag{2}
\]

Immediately, a subtle difference between the two modularities in Eqs. (1) and (2) can be observed. It is of interest that a unipartite network can be equivalently represented as a bipartite one, and the bipartite modularity can recover the modularity for the original network. If each node $i$ is represented by two nodes $A_i$ and $B_i$ and each edge $i$-$j$ is represented by two edges $A_i$-$B_j$ and $A_j$-$B_i$, then a unipartite network with $N$ nodes and $M$ edges is transformed into a corresponding bipartite network with $2N$ nodes and $2M$ edges. For example, the transformation of a simple unipartite network is shown in Fig. 1.

FIG. 1: Transformation of a simple unipartite network into a bipartite one. (a) A unipartite network with five nodes and six edges. (b) The bipartite network corresponding to (a).

Further, if we label the $N$ nodes $A_i$ with $1, 2, \ldots, N$ and label the nodes $B_i$ with $N+1, N+2, \ldots, 2N$, then an edge $i$-$j$ in the original network corresponds to two edges, $i$-$(N+j)$ and $j$-$(N+i)$. Using the bipartite modularity introduced in Eq. (2) on the induced bipartite network, which has $2M$ edges and adjacency matrix $\tilde{A}$, we have

\[
\begin{aligned}
Q_b &= \frac{1}{2M} \sum_{i=1}^{N} \sum_{j=N+1}^{2N} \left( \tilde{A}_{i,j} - \frac{k_i k_j}{2M} \right) \delta(g_i, g_j) \\
    &= \frac{1}{2M} \sum_{i=1}^{N} \sum_{j'=1}^{N} \left( \tilde{A}_{i,N+j'} - \frac{k_i k_{N+j'}}{2M} \right) \delta(g_i, g_{N+j'}) \\
    &= \frac{1}{2M} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( A_{i,j} - \frac{k_i k_j}{2M} \right) \delta(g_i, g_j) = Q,
\end{aligned} \tag{3}
\]

where we have made use of the fact that the nodes $A_i$ and $B_i$ should be in an identical community and have the same degree. Thus, bipartite modularity can also be used for community detection in unipartite networks after they have been transformed.

We then turn to the modularity for directed unipartite networks, which are another important class of networks. A directed network can analogously be transformed into a bipartite network. A node $i$ is represented by two nodes $A_i$ and $B_i$ as in unipartite networks, while a directed edge from $i$ to $j$ is represented as an edge between $A_i$ and $B_j$; that is, the set $\{A_i\}$ and the set $\{B_i\}$ are the sources and the sinks, respectively. Again, using Eq. (2) and the fact that $A_i$ and $B_i$ belong to the same community, we obtain

\[
\begin{aligned}
Q_b &= \frac{1}{M} \sum_{i=1}^{N} \sum_{j=N+1}^{2N} \left( \tilde{A}_{i,j} - \frac{k_i k_j}{M} \right) \delta(g_i, g_j) \\
    &= \frac{1}{M} \sum_{i=1}^{N} \sum_{j'=1}^{N} \left( \tilde{A}_{i,N+j'} - \frac{k_i k_{N+j'}}{M} \right) \delta(g_i, g_{N+j'}) \\
    &= \frac{1}{M} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( A_{i,j} - \frac{k_i^{\mathrm{out}} k_j^{\mathrm{in}}}{M} \right) \delta(g_i, g_j),
\end{aligned} \tag{4}
\]

where the right-hand side of the last line is just the modularity for directed networks presented in [23]. The method for transforming directed networks into bipartite ones has been proposed by Guimerà et al. [24], but their bipartite modularity is distinct from Barber's, as mentioned before.

Consequently, the bipartite network can be considered as a wider class of networks that provides a generic case for the problem of community structure detection.
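The equivalence in Eq. (3) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names and the small test network (two triangles joined by one edge) are illustrative assumptions, and nodes are labeled from 0 rather than from 1. It relies on the observation that the biadjacency matrix of the induced bipartite network is the original adjacency matrix itself.

```python
import numpy as np

def modularity(A, g):
    """Eq. (1): modularity of an undirected network with symmetric adjacency A."""
    A = np.asarray(A, dtype=float)
    g = np.asarray(g)
    k = A.sum(axis=1)                      # node degrees
    two_m = A.sum()                        # equals 2M for an undirected network
    same = g[:, None] == g[None, :]        # delta(g_i, g_j)
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

def bipartite_modularity(B, g_first, g_second):
    """Eq. (2): Barber's modularity for a p x q biadjacency matrix B."""
    B = np.asarray(B, dtype=float)
    m = B.sum()                            # number of edges of the bipartite network
    k_first = B.sum(axis=1)                # degrees of the first node type
    k_second = B.sum(axis=0)               # degrees of the second node type
    same = np.asarray(g_first)[:, None] == np.asarray(g_second)[None, :]
    return ((B - np.outer(k_first, k_second) / m) * same).sum() / m

# Illustrative network (an assumption, not Fig. 1): two triangles joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
groups = [0, 0, 0, 1, 1, 1]

# Edge i-j becomes A_i-B_j and A_j-B_i, so the biadjacency matrix is A itself,
# and A_i and B_i share the group of node i.
q = modularity(A, groups)
q_b = bipartite_modularity(A, groups, groups)
```

For a directed network the same `bipartite_modularity` call applies with `A[i, j] = 1` for an edge from `i` to `j`; the row sums are then the out-degrees and the column sums the in-degrees, as in Eq. (4).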
Barber's bipartite modularity can thus serve as a uniform objective for optimization-based methods of identifying communities.

III. EVOLUTIONARY METHOD FOR COMMUNITY DETECTION
As a class of general-purpose tools for solving various hard problems, genetic algorithms have found wide application in bioinformatics, computer science, physics, engineering, and other fields. Based on the Darwinian principle of survival of the fittest, they are a kind of global optimization method that simulates the evolutionary processes of species in nature [37].

Evolutionary methods are easy to implement, and the process can be described as follows. The methods start with a stochastically created initial population of predefined size, in which individuals, known as chromosomes, represent a set of feasible solutions to the problem at hand, each associated with a fitness value. Chromosomes are then selected in proportion to their fitness, so that fitter individuals will have multiple copies in the new population and less fit ones will be discarded. Next, genetic operators such as crossover and mutation are performed on the population according to their respective specified rates. After these operations, the population of the next generation has been produced. The above process is iterated to evolve the current population toward better offspring until the termination criterion is met.

The number of divisions of any given network grows at least exponentially in the network size, and the optimization of modularity has been rigorously proved to be NP-hard [38]. This has motivated an array of heuristic methods including greedy agglomeration [9], simulated annealing (SA) [16], spectral relaxation (SR) [17, 18], extremal optimization (EO) [19], and mathematical programming [39]. All these methods perform a point-to-point search, that is, a transformation from one solution to a better one, and are susceptible to trapping in a local optimum. In contrast, genetic algorithms work with a population of solutions instead of a single solution.
This implies that genetic algorithms are more robust, because they perform concurrent searches in multiple directions, which helps them find better solutions effectively.

However, for practitioners, a fundamentally important problem is to choose appropriate parameters such as the crossover rate and the mutation rate, because they seriously affect the performance of genetic algorithms. Furthermore, these parameters are closely related to the problem studied, and even for the same problem they should adjust themselves in the course of the search. In the following, we introduce an adaptive genetic algorithm recently presented by Szeto and Zhang [40] and then propose a modified version suited for community structure detection.
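The generic loop described at the start of this section (random initial population, fitness-proportional selection, crossover, and mutation) can be sketched as follows. This is a minimal illustration of a standard genetic algorithm on bit strings, not the MOGA or the MAGA; the function name `evolve`, the one-point crossover, the default rates, and the fixed number of generations used as termination criterion are all assumptions made for the example.

```python
import random

def evolve(fitness, length, pop_size=50, crossover_rate=0.7,
           mutation_rate=0.01, generations=100, seed=0):
    """Standard GA on bit strings; fitness must be non-negative, pop_size even."""
    rng = random.Random(seed)
    # Stochastically created initial population of predefined size.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection in proportion to fitness (small offset avoids all-zero weights).
        weights = [fitness(c) + 1e-9 for c in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]                      # copy before modifying
            if rng.random() < crossover_rate:      # one-point crossover
                cut = rng.randrange(1, length)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            offspring += [a, b]
        for c in offspring:                        # independent bit-flip mutation
            for j in range(length):
                if rng.random() < mutation_rate:
                    c[j] ^= 1
        pop = offspring
    return max(pop, key=fitness)

# Toy fitness: number of ones in the chromosome.
best = evolve(fitness=sum, length=20, generations=30)
```

The crossover and mutation rates appear here as explicit arguments, which is exactly the tuning burden discussed above.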
A. Mutation-only genetic algorithm
Traditional genetic algorithms assume that genetic operators act indiscriminately on each locus constituting the chromosome, but this is not always the case. Indeed, recent research on human DNA [41] shows that mutation rates at different loci are very different from one another. Inspired by this, Ma and Szeto [42] reported on a locus-oriented adaptive genetic algorithm (LOAGA) that makes use of the statistical information inside the population to tune the mutation rate at an individual locus. Szeto and Zhang [40] further presented a new adaptive genetic algorithm, called the mutation-only genetic algorithm (MOGA), which generalized the LOAGA by incorporating the information about the loci statistics in the mutation operator. In the MOGA, mutation is the only genetic operator, and the only required parameter is the population size. The MOGA was readdressed by Law and Szeto in [43], wherein it was extended to include a crossover operator. Here, the description of the MOGA is given on the basis of the later version.

The population matrix $P$ has $N_P$ stacked chromosomes of length $L$, with its entries $P_{ij}(t)$ representing the allele at locus $j$ of chromosome $i$ at time (or generation) $t$. The rows of this matrix are ranked according to the fitness of the chromosomes in descending order, i.e., $f(i) \ge f(k)$ for $i < k$. The columns are ranked according to the standard deviation $\sigma_j(t)$ (its definition will be given below) of the alleles at locus $j$, such that $\sigma_j(t) \ge \sigma_k(t)$ for $j < k$. In the MOGA, the fitness cumulative probability, as an informative measure for chromosome $i$ relative to the fitness landscape of the whole population, was introduced and defined as

\[
C(i) = \frac{1}{N_P} \sum_{g \le f(i)} N(g), \tag{5}
\]

where $N(g)$ is the number of chromosomes whose fitness values equal $g$.
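As a small illustration of Eq. (5), the fitness cumulative probability of a chromosome is simply the fraction of the population whose fitness does not exceed its own. The sketch below assumes a plain list of fitness values; the function name and the example values are hypothetical.

```python
def fitness_cumulative_probability(fitness):
    """Eq. (5): C(i) = (1/N_P) * number of chromosomes with fitness <= f(i)."""
    n_p = len(fitness)
    return [sum(1 for g in fitness if g <= f) / n_p for f in fitness]

# Four chromosomes ranked by fitness in descending order (illustrative values).
c = fitness_cumulative_probability([0.9, 0.7, 0.7, 0.2])
```

The fittest chromosome always receives C = 1, and the least fit one receives at least 1/N_P.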
Subsequently, the standard deviation $\sigma_j(t)$ over the allele distribution, as a useful informative measure for each locus $j$, is defined as

\[
\sigma_j(t) = \sqrt{ \frac{ \sum_{i=1}^{N_P} \left( P_{ij}(t) - h_j(t) \right)^2 \times C(i) }{ \sum_{i=1}^{N_P} C(i) } }, \tag{6}
\]

where the weighting factor $C(i)$ reflects the informative usefulness of chromosome $i$, and $h_j(t)$ is the mean of the alleles at locus $j$, given by

\[
h_j(t) = \frac{1}{N_P} \sum_{i=1}^{N_P} P_{ij}(t). \tag{7}
\]

A locus with a smaller allele standard deviation was considered to be more informative than other loci, and vice versa. This makes sense only in limited situations. For the initial population, the alleles at each locus $j$ should follow a uniform distribution, so the standard deviation $\sigma_j(t)$ will be very high while the locus is not yet informative. A typical optimization problem generally allows for only a few global optima, so the loci with higher structural information are liable to take fewer alleles than allowed, thereby having smaller allele standard deviations. Therefore, the loci with higher deviations are preferred for mutation, while the other loci (the informative loci) remain to guide the evolution process.

Now we can describe the process of the MOGA. In each generation, we sweep the population matrix from top to bottom. Each row (a chromosome) is selected for mutation with probability $\alpha(i) = 1 - C(i)$. According to Eq. (5), we have $1/N_P \le C(i) \le 1$. Thus, a chromosome with a higher fitness value has fewer chances to be selected, and vice versa. In particular, the first chromosome, which has the highest fitness value, will never be selected for mutation, while the last one will almost always undergo mutation for a large enough $N_P$. For example, if $N_P$ takes a value from 50 to 100, as De Jong suggested [44], then $\alpha(N_P) = 1 - 1/N_P = 0.98$ for $N_P = 50$. If the chromosome currently selected is $i$, then the number $N(i)$ of loci to mutate is prescribed as $N(i) = \alpha(i) \times L$. Thus, a selected chromosome with a higher fitness value has fewer loci to mutate, so that most of its informative loci remain, while a selected chromosome with a lower fitness value has more less-informative loci to mutate. In practice, we can mutate the $N(i)$ leftmost loci, because they are less informative than the others according to the above arrangement of loci.

Overall, the MOGA was expected to have a twofold advantage over traditional genetic algorithms: first, because no parameters other than the population size need to be supplied, it is more readily applicable to various problems; second, the mechanism of adaptively adjusting parameters can make it perform more effectively and obtain better solutions if it works as expected.

FIG. 2: Encoding schema of a chromosome for a bipartite network $(p, q)$. $B_{j_k}$ (for $k \le p$) is the allele at the locus representing node $A_k$, which stands for a neighbor node of $A_k$ in the network. Similarly, $A_{i_k}$ (for $k \le q$) is the allele at the locus representing node $B_k$.

B. Measure for the informativeness of loci
Despite these possible advantages, the MOGA cannot be directly applied to community structure detection due to a drawback that will be shown. Instead, we present a modified version of the MOGA, i.e., the MAGA, which is especially suited for the problem of community structure detection. We note that genetic algorithms have been applied to this problem in [45, 46], but these applications are based on standard genetic algorithms (SGAs).

We begin with the encoding schema of the genetic algorithm for finding communities in a bipartite network. A useful representation is the locus-based adjacency representation presented by Park and Song in [47], where it was used for clustering data; it has also been used for community detection [46]. In this encoding schema, a chromosome consists of $N$ loci, with one locus for each node in the network, and the allele at a locus $j$ is the label of one neighbor of node $j$ in the network. In this way, a chromosome induces a graph that is often disconnected because of the reduction in connectivity relative to the original network. Given that a community is connected, decoding the division from a chromosome then amounts to finding all the connected components of the induced graph. For simplicity, we also call them the connected components of the chromosome.

Now, we apply the encoding schema to the case of bipartite networks. Given a bipartite network $(p, q)$, we label its nodes as noted above, i.e., we label the nodes of type $A$ with $1, 2, \cdots, p$ and the nodes of the other type with $p+1, \cdots, N$. Then a chromosome $R$ for the network can be represented as shown in Fig. 2. Since our objective is to find a division with as high a modularity as possible, the fitness function can be defined directly in terms of the modularity.
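The decoding step of the locus-based adjacency representation can be sketched as a connected-components search. The example below is a minimal illustration under stated assumptions: nodes are labeled from 0 (the text labels them from 1), the function name `decode` is hypothetical, and a union-find structure is one of several possible ways to find the components.

```python
def decode(chromosome):
    """Return a community label per node; chromosome[j] is the neighbor node j links to."""
    parent = list(range(len(chromosome)))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each locus contributes one edge j - allele to the induced graph.
    for j, allele in enumerate(chromosome):
        parent[find(j)] = find(allele)

    # Relabel the roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(j), len(labels)) for j in range(len(chromosome))]

# Six nodes; the alleles induce two components, {0, 1, 2} and {3, 4, 5}.
division = decode([1, 0, 1, 4, 3, 4])
```

Because only the components matter, different chromosomes can encode the same division, which is the redundancy that the reassignment technique of Sec. III C addresses.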
Based on the above representation for the chromosome, this function becomes

\[
f(R) = Q_b(\pi_R) = \frac{1}{M} \sum_{i=1}^{p} \sum_{j=p+1}^{N} \left( A_{i,j} - \frac{k_i k_j}{M} \right) \delta(g_i, g_j), \tag{8}
\]

where the parameter $\pi_R$ emphasizes that the division on which the modularity is calculated is encoded by chromosome $R$.

Recall that in the MOGA the allele standard deviation is used to pick the loci to mutate. When applied to community structure detection, however, this measure will generally misguide the algorithm. Consider the simple case in which the population consists only of three chromosomes, $R_1$, $R_2$, and $R_3$, which in turn consist of four loci that have three alleles. Table I shows the allele distribution at these loci.

TABLE I: Example of a population with three chromosomes. Fitness is calculated on the division induced from decoding the chromosome. Values in each column are the alleles at the locus. Columns: Chro., Fitness, Loc.1, Loc.2, Loc.3, Loc.4; rows: $R_1$, $R_2$, $R_3$, and the allele standard deviation $\sigma$ of each locus.

From Eqs. (5) and (6), the allele standard deviations for the four loci, $\sigma_1$, $\sigma_2$, $\sigma_3$, and $\sigma_4$ (henceforth, we omit the parameter $t$ for simplicity), can be calculated, which yields a strict ordering of the four values [Eq. (9)]. According to the selection criterion of the loci to mutate in the MOGA, the locus with the highest standard deviation will be picked out.

In fact, the informativeness of a locus implies a certain bias, and vice versa. The initial population is generated randomly, and each locus follows an approximately random (uniform) distribution. From the uniform distribution, we learn nothing about the structure of the optimal solution to the given problem. With gradual evolution, more and more fit members of the population will assume the same alleles at some loci, which may suggest some structural information about the optimal solutions; that is, the bias (or deviation) from the random distribution indicates the informativeness of the locus. In the simplest case, such as the knapsack problem, where each locus takes the value 1 or 0, the allele standard deviation amounts to the bias and the MOGA can work well [40].

For the current case, loci 3 and 4 should be selected with equally high priority, because their allele distributions are equally close to their respective random distributions. Both loci 1 and 2 show a certain bias in their alleles, indicating that they are more informative than the others. If the informativeness of each chromosome is taken into account, however, they differ from one another. Locus 1 has a larger bias, since the two chromosomes sharing the allele 100 have higher fitness. In contrast, locus 2 has a smaller bias, since the two chromosomes sharing the allele 50 have lower fitness. Therefore, the correct order of mutation is

\[
\text{locus 3} = \text{locus 4} > \text{locus 2} > \text{locus 1}, \tag{10}
\]

where the equality means that the pair of loci have the same priority for mutation. Obviously, the allele standard deviation would severely misguide the MOGA in the current case.

The failure of the allele standard deviation stems from the fact that this measure depends on the particular values of the alleles at loci. However, the information contained in a locus is not related to these particular values but is solely determined by the bias relative to the random distribution. The method of measuring the bias is thus crucial. Fortunately, we can use the Kullback-Leibler divergence [48] to describe the bias.

In the formalism of the MAGA, we explicitly represent a locus $j$ as a discrete random variable $X_j$, and an allele at the locus is a value that $X_j$ can take. Note that in the following the set of all alleles at the locus is denoted by $X_j$ as well. Then the random distribution at the locus can be formally given by

\[
Q(X_j = x) =
\begin{cases}
\dfrac{1}{|X_j|}, & \text{for each } x \in X_j, \\
0, & \text{otherwise.}
\end{cases} \tag{11}
\]

Let the allele distribution over the population be $P$, defined by

\[
P(X_j = x) = \frac{ \sum_{i:\, P_{ij} = x} f(i) }{ \sum_{i} f(i) }. \tag{12}
\]

We can mathematically define the bias $\mu$ as the Kullback-Leibler divergence between the two distributions, $P$ and $Q$:

\[
\mu(j) = \sum_{x \in X_j} P(X_j = x) \log \frac{P(X_j = x)}{Q(X_j = x)}. \tag{13}
\]

The base of the logarithm only rescales the value of the bias; in the following all logarithms are taken to base 2. It is noteworthy that the quantity $0 \log 0$ should be interpreted as zero. As a Kullback-Leibler divergence, the bias is always non-negative and is zero if and only if $P = Q$. The intuitive explanation is that the amount of information a locus contains is always non-negative, and that we have to roll an unbiased die if we have no knowledge about something. Conversely, we can predict that an event will inevitably occur only when we have complete information about it.

Reconsidering the above example, we obtain $\mu_1 > \mu_2 > \mu_3 = \mu_4$, in agreement with the order of mutation in Eq. (10).

C. The reassignment technique for the locus statistic
So far it has been assumed that the loci with random distributions should have the highest priority for mutation. However, in the community detection case this presupposition does not always hold. After a certain number of generations, some communities or their main bodies will appear at the population scale. At that stage, a locus with a random distribution does not necessarily contain no information, and it should not necessarily undergo mutation immediately. Generally, there exist in the network many nodes whose neighbors are all (or almost all) in the same community and have a similar connection pattern, or are even structurally equivalent nodes [49] that are connected to the same nodes. For such a node, if chromosomes in which all (or most) of its neighbors lie in the same connected component predominate in the current population, then the locus has a random or an approximately random distribution, although its community membership is already settled. Therefore, we are required to differentiate these cases to avoid being misguided.

The reassignment technique is designed to deal with this problem. For a chromosome $R$, let $x$ be the allele at locus $j$, which is a neighbor of node $j$. We check whether the component in which $j$ lies includes other neighbors of $j$ in the original network with smaller labels. If so, and the neighbor with the smallest label is $y$, then the contribution from $R$, $f(R) / \sum_i f(i)$, that would be assigned to $x$ is reassigned to $y$ if $x \neq y$. In this way, a forward sweep of the population matrix yields an updated allele distribution at the locus over the population, given by

\[
P^{*}(X_j = x) = \frac{ \sum_{i:\, S(i,j) = x} f(i) }{ \sum_{i} f(i) }, \tag{14}
\]

where $S(i, j)$ is the neighbor of node $j$ with the smallest label that lies with $j$ in the same component of chromosome $i$.

TABLE II: Example of the reassignment technique. Column 1 lists four chromosomes, column 2 is the fitness of the chromosomes, column 3 shows the alleles of locus 1, and the right four columns show whether the corresponding nodes are in the same connected component as node 1, with 1 indicating yes and 0 no.

Chro.  Fitness  Loc.1  Loc.2  Loc.3  Loc.4  Loc.5
R1     0.28     2      1      1      0      0
R2     0.25     3      0      1      1      0
R3     0.25     4      1      0      1      1
R4     0.22     5      0      1      0      1
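The locus statistics of Eqs. (12)–(14) can be sketched on the data of Table II. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and it assumes that the neighbors of node 1 are exactly nodes 2 to 5, i.e., the alleles appearing at locus 1.

```python
from math import log2

def allele_distribution(keys, fitness):
    """Fitness-weighted distribution over keys (Eq. (12), or Eq. (14) if keys = S(i, j))."""
    total = sum(fitness)
    dist = {}
    for key, f in zip(keys, fitness):
        dist[key] = dist.get(key, 0.0) + f / total
    return dist

def bias(dist, n_alleles):
    """Eq. (13): base-2 KL divergence from the uniform distribution over n_alleles."""
    return sum(p * log2(p * n_alleles) for p in dist.values() if p > 0)

# Table II: fitness, allele at locus 1, and component membership of nodes 2..5.
fitness = [0.28, 0.25, 0.25, 0.22]
alleles = [2, 3, 4, 5]
neighbors = [2, 3, 4, 5]                  # assumed neighbors of node 1
same_component = [[1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [0, 1, 0, 1]]

# S(i, 1): smallest-labeled neighbor lying in the same component as node 1.
smallest = [min(n for n, flag in zip(neighbors, row) if flag)
            for row in same_component]

p_plain = allele_distribution(alleles, fitness)    # Eq. (12)
p_star = allele_distribution(smallest, fitness)    # Eq. (14)
mu_plain = bias(p_plain, len(neighbors))           # close to 0
mu_star = bias(p_star, len(neighbors))             # about 1.0026
```

Without reassignment the locus looks almost uniform, whereas after reassignment the weight concentrates on nodes 2 and 3 and the bias becomes large.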
An example of the technique is shown in Table II. Using Eq. (12), it is clear that locus 1 has an approximately random distribution and thus its bias is close to 0. Recalculating the distribution with the reassignment technique, however, we have $P^*(X=2) = 0.53$, $P^*(X=3) = 0.47$, and $P^*(X=4) = P^*(X=5) = 0$, which is very different from the random distribution and gives a bias of 1.0026.

The idea behind the technique is simple. Given a locus $j$, we can replace the present allele with any other allele that lies in the same component, in a way that does not alter the connectivity of the component and hence causes no change in the division encoded by the chromosome. To show its feasibility, we focus on the component in which $j$ lies. Recall that a locus represents a node and the allele at the locus represents the unique neighbor the node adheres to. Consequently, the component is in the form of a directed graph with unit out-degree for each node. There exist two possible schemes, as shown in Fig. 3. Note that the undirected edges are irrelevant to the reassignment process; thus their directions can be disregarded.

FIG. 3: Two possible schemes for changing the allele at locus $j$, where nodes represent loci and the directed edge $j \to i$ represents that the present allele at locus $j$ is $i$, while undirected edges are irrelevant to the reassignment process. The black node is the node (locus) $j$, the nodes with dashed borders are the allele nodes in this component, the gray ones are the influenced nodes, and the others are indifferent ones. (a) The new target node 1 (new allele) is in the subgraph elicited from node 3 (the present allele at locus $j$). (b) The new target node 1 is not in the subgraph elicited from node 3.

In the scheme depicted in Fig. 3(a), we can directly change the allele from 3 to 1 and still maintain the connectivity of the component. For the scheme in Fig. 3(b), however, such a direct change of the allele will split the original component. To deal with this case, we study the walk in the component along directed edges, starting from the node $j$. Since the subgraph elicited from node $j$ is connected to the rest of the component through $j$, this walk must end at a node that has already been visited. Let the path be $j \to x_1 \to x_2 \to \cdots \to x_{k-1} \to x_k$. When $x_k \neq j$, we can reestablish the connectivity by removing the last edge, reversing the direction of each edge in the path, and adding a new edge $x_1$ (node 3) $\to j$. Note that the resultant graph meets the constraint that every node has only one outgoing edge. Therefore, we can reset the alleles at those loci involved in the path. For example, in Fig. 3(b), the entire path is $j \to 3 \to \cdots \to 7 \to 2$, so we can set the alleles according to the path $7 \to \cdots \to 3 \to j$. Now, the allele at the locus $j$ can be set to 1. As for the case $x_k = j$, we can directly alter the allele as in the scheme in Fig. 3(a).

In the reassignment technique, we can also reassign the contribution from the chromosome to the allele with the maximum label that lies in the same component when performing locus statistics. More generally, the method also works as long as we arbitrarily specify a fixed reassignment order for each locus, although different prescriptions may produce different biases.

Clearly, the reassignment technique is very useful for community structure detection, although it does not work when applied to loci that have a single allele, i.e., when the corresponding nodes in the network are leaves. This special case can be readily eliminated by forbidding the mutation at such loci, which brings the additional merit of naturally reducing the complexity of the problem. Since most real-world networks are scale-free, with a substantial number of leaf nodes, this merit will be very significant for finding communities in such networks.

D. Population size
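As a brief illustration of the reassignment schemes of Fig. 3 discussed just before this subsection, the following Python sketch changes the allele at a locus while keeping the encoded division intact. It is only a sketch under stated assumptions: a chromosome is taken to be a list `alleles` in which `alleles[i]` is the neighbor that node `i` points to, the caller is assumed to pass a `target` that is a network neighbor of `j` lying in the same component, and the helper names `division`, `reaches`, and `reassign` are hypothetical, not taken from the paper.

```python
def division(alleles):
    """Return the division encoded by a chromosome as a set of frozensets,
    treating every edge i -> alleles[i] as undirected."""
    parent = list(range(len(alleles)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(alleles):
        parent[find(i)] = find(a)
    groups = {}
    for i in range(len(alleles)):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def reaches(alleles, start, goal):
    """True if following directed edges from `start` arrives at `goal`."""
    seen = set()
    node = start
    while node not in seen:
        if node == goal:
            return True
        seen.add(node)
        node = alleles[node]
    return False


def reassign(alleles, j, target):
    """Return a copy of `alleles` with locus j pointing to `target`
    (assumed: a neighbor of j in the same component), same division."""
    new = list(alleles)
    # Walk j -> x_1 -> ... until a node repeats; `node` is then x_k.
    path, seen = [j], {j}
    node = alleles[j]
    while node not in seen:
        path.append(node)
        seen.add(node)
        node = alleles[node]
    # Scheme (b): j is off the cycle and the target leads back to j.
    if node != j and reaches(alleles, target, j):
        # Drop the last edge and reverse the path so it points towards j.
        for prev, cur in zip(path, path[1:]):
            new[cur] = prev
    # Otherwise (scheme (a), or x_k == j) the allele is changed directly.
    new[j] = target
    return new
```

For instance, with `alleles = [3, 0, 1, 4, 5, 4, 3, 8, 7]`, node 1 points to node 0, so setting the allele of locus 0 to 1 directly would detach nodes 0, 1, and 2 from the rest of their component; `reassign(alleles, 0, 1)` reverses the path 0, 3, 4, 5 first and leaves `division` unchanged.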
As in the MOGA, the only parameter that must be provided in the MAGA is the population size. This parameter may have a significant influence on the application of genetic algorithms. De Jong's experiment on a small suite of test functions showed [44] that the best population size was 50-100 for these functions. There are also other empirical studies and theoretical analyses of this parameter [50, 51]. In practice, De Jong's setting has been widely adopted, which may be because this choice gives a good tradeoff between the quality of the solution and the cost of computation in many cases.

The popularity of this setting, however, does not exclude the development of genetic algorithms working with a variable population size. A few examples of this class of algorithms can be found in [52–54]. Although it may be beneficial to incorporate one of these mechanisms into the MAGA, in this work we do not take it into account.

Since we expect that all alleles at a locus can simultaneously appear in the population, a population size that is greater than the degrees of most nodes in the network would be preferable. As mentioned before, most real-world networks are scale-free, so the degrees of most nodes in these networks are smaller than 50. Considering this fact and the cost of a large population size, we take a fixed value from the interval between 50 and 200.
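To make the argument about population size and node degree concrete, the following Python sketch computes the probability that every allele of a locus appears at least once in a population. It rests on an assumption that holds only for the initial population described in this paper: each of the `k` alleles (the degree of the node) is drawn independently and uniformly at random. Under that assumption the probability follows from inclusion-exclusion; the function name `coverage_probability` is hypothetical.

```python
from math import comb


def coverage_probability(k, pop_size):
    """Probability that all k equally likely alleles of a locus occur at
    least once among pop_size independent uniform draws.

    Inclusion-exclusion: sum_i (-1)^i C(k, i) (1 - i/k)^pop_size.
    Integer arithmetic is used for the numerator to avoid the severe
    cancellation that floating-point terms would suffer for large k.
    """
    if k < 1 or pop_size < 0:
        raise ValueError("k must be >= 1 and pop_size must be >= 0")
    numerator = sum(
        (-1) ** i * comb(k, i) * (k - i) ** pop_size for i in range(k + 1)
    )
    return numerator / k ** pop_size
```

Evaluating, say, `coverage_probability(10, 50)` against `coverage_probability(10, 100)` shows how the chance of seeing all alleles of a degree-10 node grows with the population size. Once selection and mutation act, later generations are no longer uniform, so this is only a guide for the initial phase.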
E. Convergence and its speeding up
The MOGA was reported to perform well when applied to the knapsack problem [40], where all the loci have two alleles, 0 and 1. In many cases, however, its performance will be hindered by two factors. One factor is the misguidance by the allele standard deviation mentioned above. The other is that, in the evolution of each generation, the fittest individual(s) will not actually participate in the mutation unless others supersede it (them).

In fact, despite fulfilling the elite preservation strategy [55, 56] that assures convergence of an SGA toward the global optimum, the MOGA does not guarantee such convergence and may even end with a solution that is not a local optimum. Consider a case where the N P − M c . For the power method, it needs O(N) multiplications to converge to the leading eigenvector of a matrix of size N, which leads to a run time O(N) for a bipartition in the spectral method [17]. In order not to increase the time complexity of each generation's evolution, the multiplication is executed at most $N \log N / M_c$ times.

Combining the above considerations, the overall procedure of the MAGA for community detection can be described as follows.

(1) The connectivity of the network of interest is fed into the MAGA. The algorithm then creates $N_P$ initial feasible solutions, each locus of which is initialized with a random allele.

(2) At each generation, the MAGA first duplicates the fittest 10% of the chromosomes of the previous generation into the current generation.

(3) The MAGA then reproduces $0.9N_P$ individuals by selecting from the previous generation in proportion to their fitness, to prepare for mutation.

(4) The fitness and the cumulative fitness probability of the chromosomes are evaluated using Eqs. (8) and (5), respectively; immediately afterwards, the bias for each locus is evaluated using Eqs. (13) and (14), and the loci are ranked according to their biases.

(5) The individuals reproduced in step 3 are swept, and chromosome $i$ is selected with the same probability $1 - C(f(i))$ as in the MOGA; if the chromosome is chosen, the aforementioned mutation is performed; otherwise, a local search for fitter individuals is performed.

(6) Steps 2–5 are repeated until a certain termination criterion has been met. The MAGA then outputs the best partition, i.e., the one with the highest fitness.

Since in step 3 the fitter individuals tend to be reproduced because of their higher fitness, step 4 enables the reproduced fittest individuals always to perform a local search. Step 2 maintains the elite preservation strategy in case the strategy is destroyed in step 5. In this way, the MAGA not only can converge to the global optimum but also can speed up the process.

The most time-consuming operations in each generation are evaluating the bias and fitness, with O(M) time, and ranking the loci, with O(N log N) time. This ranking operation seemingly has slightly higher complexity than an O(M) operation if the network is sparse. In fact, it can be performed faster than those operations with O(M) time, since the latter need to be repeated $N_P$ times. Therefore, the overall time cost for each generation of the MAGA is O(M), like that of SGAs.

IV. RESULTS
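As a small illustration of the bias evaluation and ranking in step (4) of the procedure in Sec. III E, the Python sketch below computes the bias of a locus as the Kullback-Leibler divergence between its allele distribution over the population and the uniform distribution. Several choices here are assumptions rather than the paper's stated method: the base-2 logarithm (it reproduces the bias of 1.0026 quoted for $P^* = (0.53, 0.47, 0, 0)$), the omission of the reassignment technique, the ascending ranking order, and the names `kl_to_uniform` and `rank_loci`.

```python
from collections import Counter
from math import log2


def kl_to_uniform(probs):
    """KL divergence (in bits) of `probs` from the uniform distribution
    over the same number of alleles; zero-probability terms contribute 0."""
    k = len(probs)
    return sum(p * log2(p * k) for p in probs if p > 0)


def rank_loci(population, neighbors):
    """Compute the bias of every locus and rank loci by ascending bias.

    population: list of chromosomes; chromosome[j] is the allele at locus j.
    neighbors:  neighbors[j] lists the admissible alleles of locus j.
    Assumption: no reassignment is applied before counting alleles.
    """
    biases = []
    for j, alleles in enumerate(neighbors):
        counts = Counter(chrom[j] for chrom in population)
        probs = [counts[a] / len(population) for a in alleles]
        biases.append(kl_to_uniform(probs))
    order = sorted(range(len(biases)), key=lambda j: biases[j])
    return order, biases
```

A locus whose alleles are evenly spread over the population gets a bias of 0, whereas a locus fixed on one of two alleles gets a bias of 1 bit. Each call costs time proportional to the population size times the number of loci plus the total number of admissible alleles.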
In this section, we empirically study the effectiveness of the MAGA by applying it to model bipartite networks and several real bipartite networks. In both cases, we show that the MAGA is superior to SGAs and the MOGA [59], and that it can also compete with the BRIM (bipartite, recursively induced modules) algorithm [26], which is dedicated to bipartite networks. We also tested its performance on several real unipartite networks, comparing it with several well-known methods for unipartite networks in the literature.
A. Model bipartite networks
To test how well our algorithm performs, we have applied it to model bipartite networks with a known community structure. A model network can be constructed in two steps. The first step is to determine the layout of nodes in the network, i.e., to specify the number of communities $N_C$ and the numbers of nodes of the two types included in each community, $N_A$ and $N_B$, as well as to assign group membership to these nodes. Next, the dispersion of edges is determined by specifying the intracommunity and intercommunity link probabilities $p_{in}$ and $p_{out}$, such that $p_{in} \geq p_{out}$.

For simplicity, all communities assume the same values of $N_A$ and $N_B$. We set $N_C = 5$, $N_A = 12$, and $N_B = 8$, as used in [26]. One might expect that when $p_{in}$ is markedly greater than $p_{out}$, the networks exemplifying the model have significant community structure that tends to be detected. Conversely, as $p_{out}$ approaches $p_{in}$, the network examples become more uniform and their modular structure becomes more obscure. In this experiment, $p_{in}$ is fixed at 0.9 while $p_{out}$ is varied by tuning $p_{out}/p_{in}$ from 0.1 to 0.9 in steps of 0.1. On such models we have tested the performance of the MAGA as well as of the SGA and the MOGA, each model being exemplified with ten networks. On each example we ran these algorithms ten times.

For evaluating the quality of solutions, both the modularity and the normalized mutual information (NMI) [27] are useful. But the NMI is more suitable for the current case since the optimal (correct) division of the model network is known in advance.
This measure takes its maximum value of 1 when the found division perfectly matches the known division, while it takes 0, the minimum value, when they are totally independent of each other. Accordingly, we employed the stop criterion that the algorithms reach the predefined generation size (maximum number of generations) or the NMI reaches its maximum value.

Figures 4 and 5 display the performance comparison between these genetic algorithms for $p_{out}/p_{in} = 0.1$ and $p_{out}/p_{in} = 0.2$, respectively. The generation size is set to 2000. In both cases, the MAGA and the SGA markedly outperform the MOGA. From Fig. 4(a), we can see that the MAGA is appreciably faster than the SGA, although both perform well since the mutual information rapidly exceeds 0.9. In our test, every run of the MAGA on all ten example networks gave the optimal division, i.e., all 100 runs reached the optimum in fewer than 2000 generations. For the SGA, 97 runs gave the optimum division. The distributions of the number of generations needed to reach the optimum, reported in Figs. 4(b) and 4(c), further reveal their difference in speed (in terms of the number of generations).

FIG. 4: (Color online) Performance on bipartite model networks with $p_{in} = 0.9$ and $p_{out}/p_{in} = 0.1$. The generation size is set to 2000. (a) Variation of normalized mutual information over the first 500 generations. (b) Distribution of the number of generations needed to reach the optimum using the SGA. More than half the numbers of generations are over 200. (c) Distribution of the number of generations needed to reach the optimum using the MAGA. There are 83 runs in which the number of generations is less than 200.

When $p_{out}/p_{in} = 0.2$, it is more difficult to identify the community structure of the example networks than in the previous case. The SGA succeeded in obtaining the optimum division in 32 runs. In sharp contrast, each run of the MAGA gave the optimum division. More information on the distributions of the number of generations is provided in Fig. 5(a). Also, in Fig. 5(b), the variations of the mutual information for the SGA and the MAGA show that there exists a greater performance difference between them than in the case of $p_{out}/p_{in} = 0.1$.

For $p_{out}/p_{in} = 0.1$, the MOGA was not observed to reach the optimum solution in its first 2000 generations. Actually, the MOGA performed so poorly that it was even much slower than the SGA, as shown in Figs. 4(a) and 5(b). We argue that the main reason for this is that the use of an incorrect informativeness measure for the loci has misguided the algorithm.

FIG. 5: (Color online) Performance on model networks with $p_{in} = 0.9$ and $p_{out}/p_{in} = 0.2$. The generation size is set to 2000. (a) Distributions of the number of generations needed to reach the optimum using the SGA and MAGA. There are 32 black circle points and 100 red box points, representing the number of generations needed to reach the optimum using the SGA and MAGA, respectively. Most numbers of generations for the SGA are distributed above 1000, while for the MAGA most are below 800. (b) Variation of normalized mutual information with the number of generations. Each point is the average over the 100 runs.

We have made a more extensive performance comparison. Figure 6 shows the variations of accuracy of the MAGA and SGA as well as BRIM against changes of $p_{out}/p_{in}$. For the model networks, assigning each of the nodes from the smaller groups to its own module is a better strategy for BRIM that will lead to a precise division. To be fair [60], we picked the best division from the ten runs on each sample network and then averaged over ten examples for a particular $p_{out}/p_{in}$.
FIG. 6: (Color online) Variation of performance of the algorithms with different $p_{out}/p_{in}$. Each point is the average over ten sample networks. For $p_{out}/p_{in} = 0.1$ and $0.2$, the generation size is set to 2000; for other values, the size is set to 3000.
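The two-step construction of the model networks described in this subsection can be sketched in Python as follows. This is a minimal sketch, not the authors' generator: the function name `make_model_network`, the integer node labeling, and the use of independent Bernoulli trials for every pair of nodes of different types are assumptions made for illustration.

```python
import random


def make_model_network(n_c, n_a, n_b, p_in, p_out, seed=None):
    """Generate a model bipartite network with planted communities.

    n_c: number of communities (N_C).
    n_a, n_b: nodes of type A and type B per community (N_A, N_B).
    p_in, p_out: intra- and intercommunity link probabilities.
    Returns (comm_a, comm_b, edges): the planted community of every
    A node and B node, and a list of (a_node, b_node) edges.
    """
    if not 0.0 <= p_out <= p_in <= 1.0:
        raise ValueError("require 0 <= p_out <= p_in <= 1")
    rng = random.Random(seed)
    # Step 1: lay out the nodes and assign group membership.
    comm_a = [u // n_a for u in range(n_c * n_a)]
    comm_b = [v // n_b for v in range(n_c * n_b)]
    # Step 2: link every A-B pair with the appropriate probability.
    edges = [
        (u, v)
        for u, cu in enumerate(comm_a)
        for v, cv in enumerate(comm_b)
        if rng.random() < (p_in if cu == cv else p_out)
    ]
    return comm_a, comm_b, edges
```

With the settings used here, one network for $p_{out}/p_{in} = 0.2$ would be obtained as `make_model_network(5, 12, 8, 0.9, 0.18, seed=1)`; the planted lists `comm_a` and `comm_b` then serve as the known division against which a found division is scored.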
B. Southern women network
As the first example of a real bipartite network, we study the southern women network [61]. This social network consists of 18 women and 14 events, for which the data were collected by Davis et al. in the 1930s, describing the participation of the women in these events. It has been extensively used as a typical instance for investigating the problem of finding cohesive groups hidden in social networks; see Ref. [25] for a useful review.

We ran the MAGA ten times on this network, with population size 100 and generation size 3000. Unlike the BRIM algorithm, for which the initial state is important, initial states are generally irrelevant (or only weakly relevant) for genetic algorithms, so they can succeed in finding a quite good solution. For each run, the MAGA found the best solution so far, with Q = 0 . I ∗ norm ), and average modularity ( Q ∗ ).

Whether we are concerned with speed or with quality, the MAGA again has an evident advantage over the SGA and MOGA. Table IV shows the accuracy of the MAGA in comparison with other methods.

FIG. 7: (Color online) Southern women network (dashed lines indicate the division found by the MAGA). Each community consists of those nodes with the same color (level of scale), including women and events, represented by boxes and triangles, respectively.
TABLE III: Performance comparison between the SGA, MOGA, and MAGA on the southern women network. Each algorithm is run ten times. Here, success means that the algorithm has found the best solution before it reaches the generation size 3000.
Method Succ. MinGen. MaxGen. $I^*_{norm}$ $Q^*$
SGA 4 2011 2924 0 . . − − . . .

Most previous studies assigned these women to groups depending on their interests. Davis et al. [61] assigned the women to two groups, labeled 1-9 and 9-18. Woman 9 can be considered as an overlapping node of the two groups in a sense, but must be included in exactly one group under the community definition used here. We label the division with 9 and 1-8 in the same group as "Davis 1", and the alternative division (9 grouped with 10-18) as "Davis 2".

Doreian et al. [62] took the definition of a bipartite community composed of two types of nodes and proposed several divisions; the accuracy of the division with the highest modularity is shown in Table IV. We call the BRIM algorithm using the strategy of (1) assigning all events to a single module and (2) assigning each event to its own module "BRIM 1" and "BRIM 2," respectively. Barber [26] reported its accuracy when using these strategies on the network; these results can also be found in Table IV.
TABLE IV: Performance comparison on the southern womennetwork, where some data are drawn from [26].Method Communities Q ∗ I ∗ norm MAGA 4 0 . . . . . . . . . . MAGA SGA MOGA M odu l a r i t y Generation
FIG. 8: (Color online) Distributions of the divisions returnedby the SGA, MOGA and MAGA. The black horizontal lineindicates the best bipartite modularity reported in [26] usingthe BRIM algorithm.
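The accuracy measure $I_{norm}$ used in this and the preceding subsection is the normalized mutual information of Ref. [27]. A minimal Python sketch of it is given below, assuming that each division is supplied as a list of community labels, one per node, in the same node order. The function name is hypothetical, and returning 1 when both divisions consist of a single community is a convention chosen here to avoid dividing by zero.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """Normalized mutual information between two divisions of the same nodes.

    Uses the confusion-matrix form
        -2 * sum_ij N_ij ln(N_ij N / (N_i. N_.j))
        / (sum_i N_i. ln(N_i. / N) + sum_j N_.j ln(N_.j / N)),
    where N_ij counts nodes in community i of A and community j of B.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both divisions must label the same non-empty node set")
    joint = Counter(zip(labels_a, labels_b))
    row = Counter(labels_a)
    col = Counter(labels_b)
    numerator = -2.0 * sum(
        nij * log(nij * n / (row[a] * col[b])) for (a, b), nij in joint.items()
    )
    denominator = sum(c * log(c / n) for c in row.values()) + sum(
        c * log(c / n) for c in col.values()
    )
    if denominator == 0:
        # Both divisions place all nodes in one community (assumed convention).
        return 1.0
    return numerator / denominator
```

Community labels only need to be consistent within each division: relabeling the communities of either division leaves the value unchanged.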
C. Scotland corporate interlock network
The second real-world bipartite network we have usedas a test on is the Scotland corporate interlock net-work [63]. This network describes the corporate interlockpattern between 136 directors and the 108 largest jointstock companies during 1904-1905. As it is disconnected,we focus merely on its largest component, which com-prises 131 directors and 86 firms. In the following, theword “network” consistently indicates this component.The BRIM algorithm found poorer divisions of thisnetwork with Q = 0 . Q = 0 . Q = 0 . ± Q = 0 . π and π ) found by the MAGA with Q = 0 . Q = 0 . π and π and is equal to 0.9191, indicatingthat they are very similar. Simultaneously, for each solu-tion, we calculated the average of the NMI between thatdivision and other divisions. We found that π has thelargest value, 0.8459, and π has the third largest value,0.8248. These facts lend confidence in the reliability ofthe optimum divisions obtained, π and π . Figure 9shows the community structure of this network accord-ing to π . Clearly, the MAGA indeed has given a veryaccurate division of this network. D. Unipartite networks
The MAGA can also be applied to unipartite networksby optimizing the bipartite modularity after the transfor-mation as mentioned in Sec. II. Being a kind of geneticalgorithm, however, the MAGA can directly optimize theunipartite modularity as the SGA does [46], which distin-guishes it from certain methods such as the SR methodwhich is required to develop different versions for differ-ent classes of networks [17, 18, 23, 26]. Furthermore, themodularity consistency revealed in Sec. II means thatthe MAGA can also more effectively optimize unipar-tite modularity so that we only focus on the comparisonwith those well-known methods, including the Girvan-Newman (GN) algorithm [7], EO [27], SR [17, 18], andSA [16].To test the performance of the MAGA on unipartitenetworks, we have considered several real networks withdifferent scales: the Zachary karate club network [64],the jazz musicians network [65] (Jazz), the Caenorhabdi-tis elegans metabolic network [66] (C. elegans), the emailnetwork of University Rovira i Virgili [67] (Email), a trustnetwork of users of the Pretty-Good-Privacy (PGP) al-gorithm for information security [68], and a coauthor-ship network of scientists working in condensed matterphysics [69] (Cond-mat).The EO and SR methods clearly outperform the origi-nal method for detecting communities (the GN method);they may both be viewed as the representatives of modu-larity maximization approaches in that they can achieve a good tradeoff between speed and accuracy. As shown inTable V, the MAGA almost consistently outperforms theEO and SR methods for these networks. Interestingly, forthe Zachary network the MAGA found the accurate so-lution with Q = 0 . . The performance ofthe SA listed in Table V was reported in running on anIntel PC with two 3.2 GHz processors in [39], wherein theauthors proposed an accurate method that can be com-petitive with the SA method but has very high memorydemand. 
We ran the MAGA ten times for all the networks, with a predefined generation size, on an Intel PC with two 2.93 GHz processors. The last two columns of Table V show the number of generations and the running time needed to find the maximum modularity in the runs.

V. CONCLUSION
We have shown that both unipartite and directed networks can be equivalently represented as bipartite networks, and that their modularity is just the corresponding bipartite modularity. This implies that bipartite networks can be considered as an extensive class of networks including unipartite and directed networks, and that detecting communities in bipartite networks provides a uniform framework for solving the problem in various networks. Therefore, methods for detecting community structure of bipartite networks can generally be applied to unipartite and directed networks.

We have presented an adaptive genetic algorithm, the MAGA, for the task of community structure detection. This algorithm is based on the MOGA, which was presented with the aim of improving the performance of traditional genetic algorithms. However, we have shown that the MOGA performs poorly when applied to this task. In fact, we have revealed that the MOGA would be misguided by the allele standard deviation and does not guarantee convergence to global optima. In the MAGA, we introduced a different measure for the informativeness of loci, a modified rule for mutation, and a reassignment technique. These ingredients jointly make the MAGA optimize the objective function for community structure detection more effectively. The experiments on bipartite (model and real) networks have consistently shown that the MAGA outperforms the MOGA, SGA, and BRIM. Compared to BRIM, another advantage is that the MAGA can automatically determine the number of communities. The results on unipartite networks indicate that the global optimization method is indeed more accurate than the EO and SR methods, as expected, and that it can also attain or even exceed the accuracy of the SA method in a significantly shorter time, which is crucial for analyzing large networks.

The time complexity of each generation of the MAGA is O(M), and the overall time demand of this algorithm depends on the population size and the generation size [70]. Although the MAGA can theoretically find the global optima of an objective function, the quality of the solutions delivered by the MAGA rests in practice on the generation size, given the population size. Owing to the low complexity of each generation, we can run enough generations to get a high-quality solution. Empirical results showed that the MAGA can effectively resolve the community structure of networks at many scales up to 10 , which cover many kinds of real networks such as social, metabolic, and technology networks. Beyond these scales there are several good local methods available [71–74], while the performance of our algorithm on networks of such scales needs to be further explored. On the other hand, since a parallel implementation of the MAGA allows each of the most time-consuming operations on the $N_P$ chromosomes to be calculated simultaneously by assigning them to the multiple processors of a highly efficient computer, it seems that even for networks of millions of nodes the MAGA is still a promising method for accurate detection of their community structure.

Methodologically, the MAGA for community detection is based on the idea of optimization.

FIG. 9: (Color online) Scotland corporate interlock network (dashed lines indicate the division found by the MAGA). Each community consists of those nodes with the same color (level of scale), including firms and directors, represented by boxes and triangles, respectively.

TABLE V: Performance comparison of the MAGA, Girvan-Newman (GN), extremal optimization (EO), spectral relaxation (SR), and simulated annealing (SA) methods in terms of modularity and running time (only for the SA and MAGA) for unipartite networks. The modularity in bold font represents the maximum modularity obtained for the network, with the corresponding number of generations and time shown in the last two columns. The running time for the SA or MAGA is measured in minutes (min) or seconds (s).
Network Size GN EO SR SA(0.999): Q Time MAGA: GenSize Q Generations Time
Zachary 34 0.401 0.419 0.419 12s 100
So the accuracy is determined by the selection of an objective function. Here, we use the (bipartite) modularity as the objective to optimize, which certainly may suffer from the resolution problem, although this may not be severe for many real networks. On the one hand, the resolution problem essentially is favorable for gaining deeper insight into the structure of networks [36]. On the other hand, the effect of this problem may be circumvented or alleviated as needed. For example, the MAGA can perform network preprocessing with a random walk [75] before optimizing, or take an alternative objective function [76] instead of the modularity. We can also combine several high-quality solutions to obtain a more accurate division of the network of interest [32, 72].

Overall, the MAGA enables us to accurately and effectively detect community structure for various networks, including bipartite, unipartite, directed, and weighted networks, so long as it takes the corresponding modularity as the fitness function. The evolutionary method can return multiple high-quality solutions with no bias, which may provide some useful information on the reliability of the solutions of interest and which may be combined to obtain a better solution. Finally, we believe that as an effective discrete optimization method (the special reassignment technique can be switched off as needed) it will find more applications in many fields.

Acknowledgments
This research was supported by the National Ba-sic Research Program of China under Grant No.2007CB310806, the National Natural Science Founda-tion of China under Grants No. 61074119, No. 60873040and No. 60873070, Shanghai Leading Academic Disci-pline Project No. B114. J.-H. G. was also supported bythe Shanghai Education Development Foundation underGrant No. 09SG23. [1] D. J. Watts and S. H. Strogatz, Nature (London), ,440 (1998).[2] A.-L. Barab´asi and R. Albert, Science , 509 (1999).[3] R. Albert and A.-L. Barab´asi, Rev. Mod. Phys. , 47(2002).[4] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of Net-works: From Biological Nets to the Internet and WWW (Oxford University Press, New York, 2003).[5] M. E. J. Newman, SIAM Rev. , 167 (2003).[6] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwanga, Phy. Rep. , 175 (2006).[7] M. Girvan and M. E. J. Newman, Proc. Nat. Acad. Sci.U.S.A. , 7821 (2002).[8] M. E. J. Newman and M. Girvan, Phys. Rev. E ,026113 (2004).[9] M. E. J. Newman, Phys. Rev. E , 066133 (2004).[10] N. Neubauer and K. Obermayer (Unpublished).[11] T. Murata, in Proceedings of the 19th International Con-ference on World Wide Web , edited by M. Rappa, P.Jones, J. Freire and S. Chakrabarti (ACM Press, NewYork, 2010), pp. 1159-1160.[12] A. Vazquez, A. Flammini, A. Maritan and A. Vespignani,Nature Biotechnol. , 6 (2003).[13] S. Gupta, R. M. Anderson, and R. M. May, AIDS ,807(1989).[14] G. Yan, Z. Q. Fu, J. Ren, W.-X. Wang, Phys. Rev. E ,016108 (2007).[15] A. Arenas, A. D´ıaz-Guilera, and C. J. P´erez-Vicente,Phys. Rev. Lett. , 114102 (2006).[16] R. Guimer`a and L. A. N. Amaral, Nature (London) ,895 (2005).[17] M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. ,8577 (2006). [18] M. E. J. Newman, Phys. Rev. E , 036104 (2006).[19] J. Duch and A. Arenas, Phys. Rev. E , 027104 (2005).[20] H. Zhou, Phys. Rev. E , 041908 (2003).[21] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, andD. Parisi, Proc. Nat. Acad. Sci. U.S.A. , 2658 (2004).[22] G. Palla, I. 
Der´enyi, I. Farkas, and T. Vicsek, Nature(London) , 814 (2005).[23] E. A. Leicht and M. E. J. Newman, Phys. Rev. Lett. ,118703 (2008).[24] R. Guimer`a, M. Sales-Pardo, and L. A. N. Amaral, Phys.Rev. E , 036102 (2007).[25] L. C. Freeman, in Dynamic Social Network Modelingand Analysis: Workshop Summary and Papers , editedby R. Breiger, C. Carley, and P. Pattison (The NationalAcademies Press, Washington, DC, 2003), pp. 39-97.[26] M. J. Barber, Phys. Rev. E , 066102 (2007).[27] L. Danon, A. D´ıaz-Guilera, J. Duch and A. Arenas, J.Stat. Mech. (2005) P09008.[28] S. Fortunato, Phys. Rep. , 75 (2010).[29] S. Fortunato and M. Barthelemy, Proc. Nat. Acad. Sci.U.S.A. , 36 (2007).[30] J. M. Kumpula, J. Saramaki, K. Kaski, and J. Kertesz,Eur. Phys. J. B , 41 (2007).[31] B. H. Good, Y.-A. deMontjoye and A. Clauset, Phys.Rev. E , 046106(2010).[32] M. Sales-Pardo, R. Guimer`a, A. A. Moreira, and L. A. N.Amaral, Proc. Nat. Acad. Sci. U.S.A. , 15224 (2007).[33] A. Lancichinetti, F. Radicchi and J.J. Ramasco, Phys.Rev. E , 046110 (2010).[34] R. Guimer`a, M. Sales-Pardo and L.A.N. Amaral, Phys.Rev. E , 025101(R) (2004).[35] B. Karrer , E. Levina and M.E.J. Newman, Phys. Rev.E , 046119 (2008). [36] A. Arenas, J. Duch, A. Fern´andez, and S. G´omez, NewJ. Phys. , 176 (2007).[37] J. H. Holland, Adaptation in natural and artificial sys-tems (University of Michigan Press, Ann Arbor, 1975)[38] U. Brandes, D. Delling, M. Gaertler, R. G¨ o rke, M. Hoe-fer, Z. Nikoloski, and D. Wagner, IEEE Trans. Knowl.Data En. , 38 (2008).[39] G. Agarwal, D. Kempe, Eur. Phys. J. B , 409 (2008).[40] K. Y. Szeto and J. Zhang, in Large-Scale Scientific Com-puting , edited by I. Lirkov, S. Margenov, and J. Was-niewski, Lecture Notes in Computer Science Vol. 3743(Springer-Verlag, Berlin, 2006), pp. 189-196.[41] B. Brinkmann et al, Am. J. Hum. Genet , 1408 (1998).[42] C. W. Ma and K. Y. Szeto, in Proceedings of the FifthInternational Conference on Recent Advances in SoftComputing , edited by L. 
Lofti (Springer-Verlag, 2004),pp.410-415.[43] N. L. Law and K. Y. Szeto, in
Proceeding of the 20thInternational Joint Conference on Artificial Intelligence ,edited by M. Veloso (AAAI Press, Menlo Park, CA,2007), pp. 2330-2334.[44] K. A. De Jong, Ph. D. thesis, University of Michigan,1975.[45] M. Tasgin, A. Herdagdelen, and H. Bingol, e-printarXiv:0711.0491.[46] C. Pizzuti, in
Parallel Problem Solving from Nature ,edited by G. Rudolph et al. , Lecture Notes in Com-puter Sciences Vol. 5199(Springer-Verlag, Berlin, 2008),pp. 1081-1090.[47] Y. Park, M. Song, in
Proceedings of the Third AnnualConference on Genetic Programming , edited by J. R.Koza et al . (Morgan Kaufmann Publisher, Los Altos, CA,1998), pp. 568-575.[48] S. Kullback, R. A. Leibler, Annals of Math. Stat. , 79(1951).[49] F. Lorrain and H. White, J. Math. Sociol. , 49 (1971).[50] J. J. Grefenstette, IEEE Trans. Systs., Man, Cybern. ,122 (1986).[51] D. E. Goldberg, in Proceedings of the third InternationalConference on Genetic Algorithms , edited by J. Schaffer(Morgan Kaufmann Publishers, Los Altos, CA, 1989),pp. 70-79.[52] J. Arabas, Z. Michalewicz, J. Mulawka, in
Proceedingsof the First IEEE Conference on Evolutionary Computa-tion , edited by D.B. Fogel (IEEE Press, Piscataway, NewJersey, 1994), pp. 73-78.[53] M. Affenzeller, S. Wagner, and S. Winkler, in
The 11thInternational Conference on Computer Aided SystemsTheory , edited by R. Moreno-D´ıaz, F. Pichler and A.Quesada-Arencibia, Lecture Notes in Computer ScienceVol. 4739 (Springer-Verlag, Berlin, 2007), pp. 820-828.[54] T. Hu, S. Harding and W. Banzhaf, Genetic Program-ming and Evolvable Machines,
205 (2010).[55] J. C. Bean, ORSA J. Computing , 154(1994).[56] From the discussion in Sec. III A, the MOGA will dupli-cate the fittest individual(s) to a next generation.[57] After the bipartitioning, we need to update the chromo-some by encoding the two resulting groups of nodes. Thiscan be implemented in time of O ( m ) by the breadth-firstsearch algorithm [58], where m is the total number of edges of these nodes involved.[58] A. V. Aho, J. D. Ullman, and J. E. Hopcroft, Data Struc-tures and Algorithms (Addison-Wesley, Reading, MA,1983).[59] Since the splitting operation can be incorporated intothe SGA in a certain way, to make a fair comparisonbetween the MAGA and SGA, all the results on bipartitenetworks we reported do not involve the operation. For allthe tested unipartite networks the operation is employedand may improve the efficacy of the MAGA, even if itoften yields a rough bipartition.[60] Picking the best one of the run on each example is rea-sonable because the population size is moderate so thatit can be increased and hence improve the accuracy ofeach run; that is, we adopted a better strategy as is donein BRIM.[61] A. Davis, B. B. Gardner, and M. R. Gardner,
DeepSouth (University of Chicago Press, Chicago, 1941).[62] P. Doreian, V. Batagelj and A. Ferligoj, Soc. Networks , 29 (2004).[63] J. Scott and M. Hughes, The Anatomy of ScottishCapital: Scottish Companies and Scottish Capital,1900-1979 (Croom Helm, London, 1980).[64] W. W. Zachary, J. Anthropological Res. , (1977).[65] P. Gleiser and L. Danon, Adv. Complex Syst. , 565(2003).[66] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L.Barab´asi, Nature (London) , 651 (2000).[67] R. Guimer`a, L. Danon, A. D´ıaz-Guilera, A.Giralt, andA. Arenas, Phys. Rev. E , 065103 (2003).[68] X. Guardiola, R. Guimera, A. Arenas, A. Diza-Guilera,and L. A. N. Amaral, e-print cond-mat/0206240.[69] M. E. J. Newman, Proc. Natl. Acad. Sci. USA , 404(2001).[70] A simple strategy for selection of the generation size con-sist of two steps: (1) Run the algorithm with an empiricalvalue of generation size 5-10 times. For smaller networksthe generation size can take a value as large as ten timesof the network size, while for larger networks a valueequal to the network size can be used. (2) Appropriatelyincrease the generation size according to the results andthe average time cost if the MAGA does not converge.If these results are very different and the time cost islow, this may indicate that the generation size should bemediated with a large increment; otherwise, it may needa small increment. Generally, this adjustment should bemade at most twice or three times.[71] A. Lancichinetti, S. Fortunato and J. Kert´esz, New J.Phys. , 033015 (2009).[72] U. N. Raghavan, R. Albert and S. Kumara, Phys. Rev.E , 036106 (2007).[73] I. X. Y. Leung, P. Hui, P. Li`o and J. Crowcroft, Phys.Rev. E , 066107 (2009).[74] P. Ronhovde and Z. Nussinov, Phys. Rev. E ,046114 (2010).[75] D. L. Lai, H. T. Lu and C. Nardini, Phys. Rev. E ,066118 (2010).[76] M. Rosvall and C. Bergstrom, Proc. Natl. Acad. Sci.U.S.A.105