Generative Model Selection Using a Scalable and Size-Independent Complex Network Classifier
Sadegh Motallebi,a) Sadegh Aliakbary,b) and Jafar Habibi c)
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
(Dated: 4 February 2014)
a) [email protected]
b) [email protected]
c) [email protected]
Real networks exhibit nontrivial topological features such as heavy-tailed degree distribution, high clustering, and small-worldness. Researchers have developed several generative models for synthesizing artificial networks that are structurally similar to real networks. An important research problem is to identify the generative model that best fits a target network. In this paper, we investigate this problem, and our goal is to select the model that is able to generate graphs similar to a given network instance. By generating synthetic networks with seven outstanding generative models, we have utilized machine learning methods to develop a decision tree for model selection. Our proposed method, named "Generative Model Selection for Complex Networks" (GMSCN), outperforms existing methods with respect to accuracy, scalability and size-independence.
Keywords: Complex Networks, Generative Models, Synthetic Networks, Model Selection, Network Structural Features, Social Networks, Decision Tree Learning
A realistic network generative model can generate artificial graphs similar to real networks. But there is no universally best generative model for all situations. In any application, among the different existing generative models, we should choose the best model for that specific application. Generative model selection is therefore a prerequisite for creating realistic artificial networks. In this paper, we consider the problem of model selection and propose a method for finding the model that best fits a given network. The selected generative model helps us infer the growth mechanisms of the given network. It can also generate artificial networks similar to the given network for tasks like simulation, prediction, extrapolation, and hypothesis testing. We propose utilizing a combination of different local and global network features for learning a model selection decision tree. We also devise a new network feature based on the quantification of the degree distribution. We show that our proposed method is robust, scalable, independent of the size of the given network, and more accurate than the baseline method.
I. INTRODUCTION
Complex networks appear in different categories such as social networks, citation networks, collaboration networks, and communication networks. In recent years, complex networks have been studied extensively, and much evidence indicates that they show some non-trivial structural properties. For example, power-law degree distribution, high clustering and small path lengths are some properties that distinguish complex networks from completely random graphs.

An active field of research is dedicated to the development of algorithms for generating complex networks. These algorithms, called "generative models", try to generate synthetic graphs that adhere to the structural properties of complex networks. Realistic generative models have many applications and benefits. Once a generative model is fitted to a given real network, we can replace the real network with artificial networks in tasks such as simulation, extrapolation (by generating similar graphs with larger sizes), sampling (the reverse of extrapolation), capturing the network structure, and network comparison.

Despite the advances in the field, there is no universal generative model suitable for all network types and features. The prerequisite of network generation is the stage of generative model selection. In fact, when we generate synthetic networks, we hope to reach graphs that are structurally similar to a target network. In the model selection stage, the properties of a given network (called the target network) are analyzed and the best model suitable for generating similar networks is selected. A model selection method tries to answer this question: "Among candidate generative models, which one is the most suitable for generating complex network instances similar to the given network?" In this paper, we investigate this problem and, by means of machine learning algorithms, we propose a new model selection method based on network structural properties. The proposed method is named "Generative Model Selection for Complex Networks" (GMSCN). The need for model selection is frequently indicated in the literature. More specifically, some works are based on counting subgraphs of small sizes (called graphlets or motifs), some others concentrate on structural features of complex networks, and some are based on manually selecting a model by inspecting a small set of network features. We will show that by using an appropriate combination of local and global network features, we can develop a more accurate model selection method. In our proposed method (GMSCN), we consider seven prominent generative models with which we have generated datasets of network instances. The datasets are used as training data for learning a decision tree for model selection. Our method also includes a special technique for the quantification of the degree distribution. In comparison to existing methods, we have considered a wider, newer and more significant set of generative models. Due to a better selection of network features, GMSCN is also more efficient and more scalable than similar methods.

The rest of this paper is organized as follows. Section II reviews the related work. Section III presents GMSCN. Section IV is dedicated to the evaluation of GMSCN. Section V describes a case study on some real network samples. The results and evaluations of this paper are discussed in Section VI. Finally, Section VII concludes the paper.

II. RELATED WORK

A. Network Generation Models
In this subsection, we briefly introduce the leading methods of network generation; a small generation sketch follows the list.

• Kronecker Graphs Model (KG). This model generates realistic synthetic networks by applying a matrix operation (the Kronecker product) on a small initiator matrix. This model is mathematically tractable and supports many network features such as small path lengths, heavy-tailed degree distribution, heavy tails for eigenvalues and eigenvectors, densification, and shrinking diameters over time.

• Forest Fire Model (FF). In this model, edges are added in a process similar to a fire-spreading process. This model is inspired by the Copying model and Community Guided Attachment, but supports the shrinking diameter property.

• Random Typing Generator Model (RTG). RTG uses a process of "random typing" for generating node identifiers. This model mimics real-world graphs and conforms to eleven important patterns (such as power-law degree distribution, densification power law, and small and shrinking diameter) observed in real networks.

• Preferential Attachment Model (PA). The classical preferential attachment model generates scale-free networks with power-law degree distribution. In this model, the nodes are added to the network incrementally, and the probability of the attachments depends on the degree of existing nodes.

• Small World Model (SW). This is another classical network generation model that synthesizes networks with small path lengths and high clustering. It starts with a regular lattice and then rewires some edges of the network randomly.

• Erdős–Rényi Model (ER). This model generates a completely random graph. The numbers of nodes and edges are configurable.

• Random Power Law Model (RP). The RP model generates synthetic networks by following a variation of the ER model that supports the power-law degree distribution property.

Other generative models are also available (we have not utilized them, but they are used in related model selection methods), such as the Copying Model (CM), Random Geometric Model (GEO), Spatial Preferential Attachment (SPA), Random Growing (RDG), Duplication-Mutation-Complementation (DMC), Duplication-Mutation using Random mutations (DMR), Aging Vertex (AGV), Ring Lattice (RL), Core-periphery (CP), and Cellular model (CL).
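As a rough illustration (not the authors' code), the three classical models above map directly onto standard networkx generators; the Kronecker Graphs, Forest Fire and RTG models need dedicated implementations such as the SNAP library mentioned in Appendix B. The sizes and parameter values below are arbitrary placeholders.

```python
# Minimal sketch: instantiating three of the classical candidate models.
import networkx as nx

n = 4096                                          # number of nodes (illustrative)
pa = nx.barabasi_albert_graph(n, m=5)             # Preferential Attachment (PA)
sw = nx.watts_strogatz_graph(n, k=10, p=0.1)      # Small World (SW)
er = nx.gnm_random_graph(n, pa.number_of_edges()) # Erdős–Rényi (ER), same edge count as PA

for name, g in [("PA", pa), ("SW", sw), ("ER", er)]:
    print(name, g.number_of_nodes(), g.number_of_edges())
```

Matching the ER edge count to the PA graph keeps the densities of the generated instances comparable, which anticipates the density-matching step of the methodology in Section III.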
B. Model Selection Methods

The aim of this paper, and of model selection methods in general, is to find the best generative model that fits a given network instance. Some model selection methods are based on graphlet counting. Graphlets are subgraphs of bounded sizes (e.g., all possible subgraphs with three or four nodes), and the frequency of graphlets in a network is considered as a way of capturing the network structure. Some works consider directed graphs and graphlets, and some others consider the network as a simple (undirected) graph. Janssen et al. have tested both graphlet features and structural features (degree distribution, assortativity and average path length) in the model selection problem. They conclude that counting graphlets of three and four nodes is sufficient for capturing the structure of the network, i.e., appending structural features to the feature vector of graphlet counts does not improve the accuracy of the model selector. In this paper, we critique this claim and show that using a better set of local (such as transitivity) and global (such as effective diameter) network structural features, along with an appropriate degree distribution quantification algorithm, actually improves the accuracy of the model selection. In fact, graphlet counts are limited local features and are not able to fully reflect the structural properties of a network instance. Janssen et al. implemented six generative models and generated a dataset of synthetic networks as the training data for decision tree learning. In this method, the candidate generative models are: PA, CM, GEO (GEO2D and GEO3D) and SPA (SPA2D and SPA3D).

A similar method is proposed by Middendorf et al. In this method, the feature vectors are the counts of graphlets with small sizes. Seven different generative models are considered, by which network instances are generated as the training data. The candidate generative models are: ER, PA, SW, RDG, DMC, DMR and AGV. The authors have used a generalized decision tree called the alternating decision tree (ADT) as the learning algorithm.

Airoldi et al. propose to form feature vectors according to structural network properties. They have considered some classical generative models and generated a dataset by which a naïve Bayes classifier is learned. The candidate generative models are: PA, ER, RL, CP and CL. This method is dependent on the size and average connectivity of the target network, and this dependency is one of its limitations.

Patro et al. propose a framework for implementing network generation models. The user of this framework can specify the important network features and the weight of each feature. In other words, each generative model is considered as a class of networks. More than being a specific method, this is a relatively open framework, and the user should determine the different parameters of the framework according to the target application.

III. THE PROPOSED METHOD
GMSCN is based on learning a classifier for model selection. The goal of a classifier is to accurately predict the target class for a given network instance, and in our method, generative models play the role of network classes. In GMSCN, the classifier suggests the best model that generates networks similar to a given network. The inputs of the classifier are the structural properties of the target network, and the output is the selected model among the candidate network generation models.
A. Methodology
Fig. 1 shows the high-level methodology of GMSCN. The methodology is configurable by several parameters and decision points, such as the set of considered network features, the chosen supervised learning algorithm, and the candidate generative models. The steps of constructing the network classifier, as illustrated in Fig. 1, are described in the following (a code sketch of the density matching of step 1 appears at the end of this subsection):

1. Many artificial network instances are synthesized using the candidate network generative models. These network instances will form the dataset (training and test data) for learning a network classifier. In this step, the parameters of the generative models are tuned in order to synthesize networks with densities similar to the density of the given target network.

2. After generating the network instances, the structural features (e.g., the degree distribution and the clustering coefficient) of each network instance are extracted. The result is a dataset of labeled structural features in which each record consists of the topological features of a synthesized network along with the label of its generative model.

3. The labeled dataset forms the training and test data for the supervised learning algorithm. The learning algorithm will return a network classifier which is able to predict the class (the best generative model) of a given network instance.

4. The structural features of the target network are also extracted. The same "Feature Extraction" block which is used in the second step is applied here. The structural features of the target network are used as input for the learned classifier.

5. The learned network classifier is a customized "model selector" for finding the model that fits the target network. It gets the structural features of the target network as input and returns the most compatible generative model.

FIG. 1. The methodology of learning a network classifier

In this methodology, the density of the target network is considered as an important property of the target network. Network density is defined as the ratio of the existing edges to potential edges and is regarded as an indicator of the sparseness of the graph. In the proposed methodology, generative models are configured to synthesize networks with densities similar to the density of the target network. This decision is due to the fact that it is hard to compare networks of completely different densities for predicting their growth mechanism and generation process. On the other hand, even with similar network densities, various generative models create different network structures. So, we try to keep the density of the generated networks similar to the density of the target network. In this manner, the network classifier can learn the differences among the structures of various generative models with similar network densities. It is also worth noting that it is not possible to generate networks with exactly equal densities with some of the existing generative models. This is because some generative models (such as Kronecker Graphs and RTG) are not configurable for finely tuning the exact density of synthesized networks. So, we generate the networks of the training data with densities similar, and not exactly equal, to the density of the given network.

Our proposed methodology, unlike existing methods, is not dependent on the size (number of nodes) of the target network. Size-independence is an important feature of our method. It enables the classifier to learn from a dataset of generated networks with sizes different from (perhaps smaller than) the size of the target network, but with a similar density. This facility decreases the time of network generation and feature extraction considerably. We will demonstrate the size-independence property of GMSCN in the evaluation section.

GMSCN is actually a realization of the described methodology. In the following subsections, we further illustrate the details of GMSCN by specifying the open parameters and decision points of the methodology.
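The density matching of step 1 can be made concrete with a small sketch. This is our own illustration, not the paper's code: the helper name `configure_generators` and the parameter heuristics are hypothetical, and only models with directly tunable edge counts are shown.

```python
import networkx as nx

def density(g: nx.Graph) -> float:
    """Ratio of existing edges to potential edges."""
    n = g.number_of_nodes()
    return 2.0 * g.number_of_edges() / (n * (n - 1))

def configure_generators(target: nx.Graph, n_train: int):
    """Return generator callables tuned so that training networks
    roughly match the target's density (their sizes may differ)."""
    d = density(target)
    m_total = round(d * n_train * (n_train - 1) / 2)    # ER edge budget
    m_attach = max(1, round(d * (n_train - 1) / 2))     # PA attachments per node
    k_ring = max(2, 2 * m_attach)                       # SW lattice degree
    return {
        "ER": lambda: nx.gnm_random_graph(n_train, m_total),
        "PA": lambda: nx.barabasi_albert_graph(n_train, m_attach),
        "SW": lambda: nx.watts_strogatz_graph(n_train, k_ring, 0.1),
    }
```

For a target density d, the ER generator simply receives the matching edge budget, while PA receives m ≈ d(n−1)/2 attachments per node, since a PA graph with attachment parameter m has roughly mn edges and hence density about 2m/(n−1).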
B. Network Features
The process of model selection, as described in Fig. 1, utilizes structural network features in the second and fourth steps. There are plenty of different network features, so we clarify the features considered in GMSCN here.

To capture the properties of a network, we should analyse a wide and diverse feature set of network connectivity patterns. We propose the utilization of a combination of local and global network structural features. The utilization of a limited set of local features (graphlet counts) in similar methods has resulted in a lower precision for the model selector. As explained later, we have utilized ten network features from four feature categories. While trying to find the best and minimal set of network features, we considered features that are not only effective for the classification accuracy, but also efficiently computable and size-independent. One may consider a longer list of network features, even from different feature categories (e.g., eigenvalues). In such an approach, automatic methods for feature selection, such as the methodology explained in Ref. , may be helpful. But supporting the specified diverse criteria (effectiveness, efficiency and size-independence) for the selected features is quite difficult in such automatic methodologies. The utilized features and measurements in GMSCN are listed below; a sketch of the degree distribution quantification follows the equations at the end of this subsection.

• Transitivity of relationships. In this category of network features, we consider the two measurements of "average clustering coefficient" and "transitivity".

• Degree correlation. The measure of assortativity is selected from this category of network features.

• Path lengths. There are different global features about the path lengths in a network, such as diameter, effective diameter and average path length. We selected the "effective diameter" measurement since it is more robust, and also because of its lower computation cost and lower sensitivity to small network changes. The effective diameter indicates the minimum number of edges within which 90 percent of all connected pairs can reach each other. The effective diameter is well defined for both connected and disconnected networks.

• Degree distribution. It is a common approach to fit a power law on the degree distribution and extract the power-law exponent as a representative quantity for the degree distribution. But a single number (the power-law exponent) is too limited for representing the whole degree distribution. On the other hand, some real networks do not conform to the power-law degree distribution. We propose an alternative method for the quantification of the degree distribution by computing its probability percentiles. The percentiles are calculated from some defined regions of the degree distribution according to its mean and standard deviation. We devise K intervals in the degree distribution and then calculate the probability of the degrees of each interval. K is always an even number greater than or equal to four. The size of all intervals, except the first and the last one, is equal to pσ, where σ is the standard deviation of the distribution and p is a tunable parameter. In any application, we can configure the values of K and p in a manner that makes the percentile values more distinctive. In our experiments we let K = 6 and p = 0.3, so we extract six quantities (the DegDistP1..DegDistP6 percentiles) from any degree distribution. If we increase the value of K, we should normally decrease the value of p so that most of the interval points stay in the range of the existing node degrees. Smaller values of p also necessitate larger values of K. Very large values (e.g., K = 100) and very small values of K will also decrease the distinction power of the extracted feature vector. The specified values of p and K were found through trial and error. Equation (1) shows the interval points of the degree distribution, and Equation (2) specifies the probability for a node degree to sit in the i-th interval. The set of six percentiles (DegDistP1..DegDistP6) is used as the network features representing the degree distribution. Let IP_i be the i-th interval point and D be the degree random variable:

IP_i = \begin{cases} \min(D), & i = 1 \\ \mu - \left(\frac{K}{2} - i + 1\right) p\sigma, & i = 2, \ldots, K \\ \max(D), & i = K + 1 \end{cases} \quad (1)

DegDistP_i = P(IP_i < D < IP_{i+1}), \quad i = 1, \ldots, K \quad (2)
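Below is a minimal sketch of this quantification, assuming the reading of Equation (1) in which the interior interval points are spaced pσ apart, symmetrically around the mean μ (so that for K = 6 they sit at μ ± 2pσ, μ ± pσ and μ).

```python
import numpy as np

def deg_dist_percentiles(degrees, K=6, p=0.3):
    """DegDistP1..DegDistPK features of Eqs. (1)-(2): the fraction of
    nodes whose degree falls into each of K intervals placed around
    the mean of the degree distribution."""
    d = np.asarray(degrees, dtype=float)
    mu, sigma = d.mean(), d.std()
    # Interval points IP_1..IP_{K+1}; interior points spaced p*sigma
    # apart, symmetric around mu (our reading of Eq. (1)).
    ips = [d.min()]
    for i in range(2, K + 1):
        ips.append(mu - (K / 2 - i + 1) * p * sigma)
    ips.append(d.max())
    # Strict inequalities as in Eq. (2); under a literal reading the
    # minimum- and maximum-degree nodes fall outside every interval.
    return [float(np.mean((ips[i] < d) & (d < ips[i + 1]))) for i in range(K)]
```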
C. Learning the Classifier

The third step of the proposed methodology is the utilization of a supervised machine learning algorithm. The learning algorithm constructs the network classifier based on the features of the generated network instances as the training data. Each record of the training data consists of the structural features (as described in the previous subsection) of a generated network, along with the label of its generative model. By means of supervised algorithms, we can learn from this training data a classifier which predicts the best generative model for a given network with the specified structural features. We examined several supervised learning algorithms, such as decision tree learning, Bayesian networks, support vector machines (SVM) and neural networks, among which the LADTree method showed the best results. A short description of the examined learning algorithms is presented in Appendix A. In our experiments, although some methods (such as Bayesian networks) resulted in a small improvement in the accuracy of the learned classifier, the decision tree learned by the LADTree algorithm was clearly more robust and less sensitive to noise than the other learning methods. The robustness-to-noise analysis is described in the evaluation section. To avoid over-fitting, we always used stratified 10-fold cross-validation. A sketch of this training step follows.
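LADTree itself lives in Weka; as a rough stand-in, the sketch below uses scikit-learn's gradient-boosted trees (also a boosting-based tree learner) together with the stratified 10-fold cross-validation described above. `X` and `y` are assumed to hold the feature vectors and generative-model labels produced in step 2.

```python
# Sketch of step 3, with sklearn's boosted trees standing in for LADTree.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def train_and_validate(X, y):
    clf = GradientBoostingClassifier()
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv)   # stratified 10-fold CV
    clf.fit(X, y)                                # final model selector
    return clf, scores.mean()
```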
D. Network Models

Among the several existing network generative models, we have selected seven important models: the Kronecker Graphs Model, Forest Fire Model, Random Typing Generator Model, Preferential Attachment Model, Small World Model, Erdős–Rényi Model and Random Power Law Model. The selected models are the state-of-the-art methods of network generation. The existing model selection methods, such as Ref. and Ref. , have ignored some new and important generative models such as the Kronecker Graphs, Forest Fire and RTG models.

IV. EVALUATION
In this section, we evaluate our proposed method of model selection (GMSCN). We also compare GMSCN with the baseline method and show that it outperforms state-of-the-art methods with respect to different criteria. Unlike most of the existing methods, GMSCN has no dependency on the size of the given network. In other words, we ignore the number of nodes of the target network and only consider its density in generating the training data. Because the baseline method is dependent on the size of the target network, we evaluate the methods in two stages. In the first stage, we fix the size of the generated networks to prepare a fair condition for comparing GMSCN with the baseline method. Although size-dependence is a drawback of the baseline method, the evaluation shows that GMSCN outperforms the baseline method even in the fixed network size condition. In the second stage, we allow the generative models to synthesize networks of different sizes. In this stage, we show that the size diversity of the generated networks does not affect the accuracy of the learned decision tree. As described in Section III, GMSCN is based on learning a decision tree from a training set of generated networks. In each evaluation stage, we generated 100 networks from each network generative model; with seven candidate models, we gathered 700 generated networks. We used these network instances as the training and test data for learning the decision tree.

A. The Baseline Method
We have selected the graphlet-based method proposed by Janssen et al. as the baseline method. The baseline method has some similarities to GMSCN: it is based on considering some network generative models and then learning a decision tree for network classification with the aid of a set of generated networks. In the baseline method, eight graphlet counts are considered as the network features. All subgraphs with three nodes (two graphlets) and four nodes (six graphlets) are considered in the baseline method (Fig. 2). A similar approach is also proposed by Middendorf et al., with distinctions in the learning algorithm and the set of candidate generative models. The graphlet-based method is selected as the baseline because it is a new method, its evaluations show a high accuracy, and it has been proposed similarly in different research domains such as social networks and protein networks.

FIG. 2. The graphlets with three and four nodes

Despite the similarities, there exist some important differences between GMSCN and the baseline method. First, the baseline method is based on counting graphlets in networks, while GMSCN proposes a wider set of local and global features. Janssen et al. conclude that considering structural features does not improve the accuracy of the graphlet-based classifier, but we will show that by choosing a better set of local and global network features, and with the aid of our proposed degree distribution quantification method, structural features play an undeniable role in model selection. Second, the baseline method is size-dependent, i.e., it considers both the size and the density of the target network, and it generates network instances according to these two properties. On the other hand, GMSCN is size-independent, and we only consider the density of the target network in the network generation phase. Third, GMSCN employs newer and more important generative models such as the Kronecker Graphs model, the Forest Fire model and the RTG model. Fourth, we examined different learning algorithms and then selected LADTree as the best learning algorithm for this application. Our evaluation of GMSCN is more thorough, considering different evaluation criteria. We have also presented a new algorithm for quantifying the network degree distribution.

Graphlet counting is a very time-consuming task, and there is no efficient algorithm for computing the full counts of graphlets in large networks. To handle the algorithmic complexity, most of the graphlet-counting methods (e.g., Refs. 11, 17, 50, and 51) propose a sampling phase before counting the graphlets. But the sampling algorithm may affect the graphlet counts, and the resulting counts may be biased towards the features of the sampling algorithm. It is also possible to estimate the graphlet counts with approximate algorithms, but this approach may also introduce remarkable errors in the graphlet counts. To prepare a fair comparison, we have counted the exact number of graphlets in the original networks and have not employed any sampling or approximation algorithms. It is worth noting that the reported accuracy of the baseline method in this paper is different from the report of the original paper, mainly because the sets of generative models are not the same in the two papers.
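For the two 3-node graphlets of Fig. 2, exact counts reduce to simple degree and triangle statistics; a minimal networkx sketch is below. The six 4-node counts require a heavier enumeration (the source of the cost discussed above) and are omitted.

```python
import networkx as nx

def three_node_graphlets(g: nx.Graph):
    """Exact counts of the two connected 3-node graphlets
    (open wedge/path and triangle) in an undirected graph."""
    tri = sum(nx.triangles(g).values()) // 3         # each triangle is counted at 3 nodes
    wedges = sum(d * (d - 1) // 2 for _, d in g.degree())
    open_wedges = wedges - 3 * tri                   # paths of length 2 (non-triangles)
    return {"path3": open_wedges, "triangle": tri}
```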
B. Accuracy of the Model Classifier

We first set a fixed size for the generated networks of the dataset and generate networks with about 4096 nodes. Almost all the generated networks in our dataset contain 4096 nodes, but the networks generated by the RTG model have small variations in their size. The number of nodes in these networks is in the range of 4000 to 4200, because the exact number of nodes is not configurable in the RTG model. Since the Kronecker Graphs model generates networks with 2^x nodes in its original form, we chose 4096 (2^12) as the size of the networks. The average density of networks in this dataset is equal to 0.0024.

In addition to the overall accuracy, we evaluate the precision and recall of the learned decision tree for the different network models. "Precision" shows the percentage of correctly classified instances calculated for each category, e.g.,

Precision_{FF} = \frac{\text{number of correctly predicted FF instances}}{\text{number of instances predicted as FF}}.

"Recall" illustrates the ability of the method in finding the instances of a category, e.g.,

Recall_{FF} = \frac{\text{number of correctly predicted FF instances}}{\text{number of FF instances}}.

"Accuracy" is an indicator of the overall effectiveness of the classifier across the entire dataset, i.e.,

Accuracy = \frac{\text{number of correctly predicted instances}}{\text{total number of instances}}.

The overall accuracy of GMSCN is 97.14%, while the accuracy of the baseline method is 78.57%, which indicates an 18.57% improvement. Fig. 3 and Fig. 4 show the precision and recall of GMSCN and the baseline method, respectively, for the different network models. In addition to an apparent improvement in the precision and recall for most of the generative models, the figures show the stability (less undesired deviation) of GMSCN over the baseline method. The accuracy and precision of GMSCN show small deviations across the different generative models, while these measures for the baseline method vary in a wide range. Table I shows the details of the GMSCN results for the different network models. For example, the first row of this table indicates that among the 700 network instances, 104 networks are predicted to be generated by the ER model, but in fact 97 (out of 104) instances are ER, six instances are from the KG model and one is generated by the SW model. Because we have utilized cross-validation, all of the 700 network instances are included in the evaluation. Table II shows the corresponding results for the baseline method.

FIG. 3. Precision of GMSCN compared to the baseline method for different generative models

FIG. 4. Recall of GMSCN compared to the baseline method for different generative models

It is worth noting that considering both the graphlet counts and the structural features does not improve the accuracy of the classifier considerably. Since we want to prepare a size-independent and efficient method, we do not consider the graphlet counts in the feature vectors.
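Given a confusion matrix laid out as in Tables I and II (rows are predicted classes, columns are true classes), the three measures above follow in a few lines. This is a small illustrative sketch, not the authors' evaluation code.

```python
import numpy as np

def per_class_metrics(cm, labels):
    """Precision/recall per class and overall accuracy from a
    confusion matrix cm[i][j] = # instances of true class j
    predicted as class i (the row layout of Tables I and II)."""
    cm = np.asarray(cm, dtype=float)
    precision = {l: cm[i, i] / cm[i, :].sum() for i, l in enumerate(labels)}
    recall = {l: cm[i, i] / cm[:, i].sum() for i, l in enumerate(labels)}
    accuracy = np.trace(cm) / cm.sum()
    return precision, recall, accuracy
```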
C. Size Independence
GMSCN model selection is independent of the size of the target network. When we want to find the best model fitting a real network, we can discard the number of nodes in the network and generate the training data only according to its density. Size-independence is an important feature of GMSCN which is missing in the baseline method. This feature is especially important when we want to find the generative model for a relatively large network. In this condition, we can generate the training network instances with smaller sizes than the target network. This feature also increases the applicability, scalability and performance of GMSCN.

For evaluating the dependency of GMSCN on the size of the network, we generate a new dataset with networks of different sizes. Instead of fixing the number of nodes in each network instance (such as about 4096 nodes in the previous evaluation), we allow networks with different numbers of nodes in the dataset. In this test, with each of the generative models, we generated 100 networks of different sizes: 24 networks with 4,096 nodes, 24 networks with 32,768 nodes, 24 networks with 131,072 nodes, 24 networks with 524,288 nodes and four networks with 1,048,576 nodes. Again, the only exception is the RTG model, which generates networks with small variations from the specified sizes. The node counts are powers of two because the original version of the Kronecker Graphs model is able to generate networks with 2^n nodes. The average density of networks in this dataset is equal to 0.000885.

FIG. 5. Accuracy of GMSCN for different network sizes.

Table III shows the precision and recall of GMSCN for this dataset. In this evaluation, the overall accuracy of the classifier is 97.29%, which is very close to the accuracy of the system in the evaluation with fixed network sizes. This fact shows that GMSCN is not dependent on the size of the target network. The average density of networks in this dataset (0.000885) is different from the average density of networks in the fixed-size dataset (0.0024). So, the model selection is also performing well for different densities of the given network. We also extended this experiment to ensure that there is no meaningful lower bound for GMSCN in terms of network size. The new experiment is configured similarly to the previous trial, but it examines a wider range of network sizes. Fig. 5 plots the result of this experiment for each number of nodes. It indicates that GMSCN shows good performance for the varying network sizes. Obviously, the baseline method is size-dependent, because the graphlet counts completely depend on the size of the network. So, it is not necessary to show the precision and recall of the baseline method for the dataset of networks of different sizes. We omitted such an evaluation because the calculation of graphlet counts for large networks is very time-consuming.
D. Robustness to Noise
We also evaluate the robustness of GMSCN with respect to random changes in networks. For each test-case network, we randomly select a fraction of edges, rewire them to random nodes, and test the accuracy of the classifier on the resulting network (a sketch of this perturbation follows). We start from the pure network samples and, in each step, we change five percent of the edges, until all the edges (100 percent change) are randomly rewired. In other words, in addition to the pure networks, we generated 20 test sets with edge-change fractions from five to 100 percent, each of which contains 700 network samples from the seven generative models.

As discussed before, we have chosen LADTree as the supervised learning algorithm in GMSCN. Fig. 6 shows the average accuracy of GMSCN for different random change fractions. This figure shows the effect of choosing different learning algorithms for GMSCN. As the figure shows, LADTree results in a more robust classifier for this application, since it is less sensitive to noise. The accuracy of GMSCN decreases smoothly and nearly linearly with the random changes. There is no sudden drop in the chart of GMSCN (based on LADTree). With 100 percent random changes (the right end of the diagram), the accuracy of the classifier reaches the value of 14.43 percent, which is near 1/7 (i.e., one over the number of candidate models). This is due to the existence of seven network models and indicates that almost all the characteristics of the generative model are eliminated from a generated network with 100 percent edge rewiring.
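One plausible reading of this perturbation, as a sketch: each selected edge keeps one endpoint and is reattached to a uniformly random new node. The exact rewiring rule is not fully specified in the text, so treat the details below as assumptions.

```python
import random
import networkx as nx

def rewire_fraction(g: nx.Graph, fraction: float, seed: int = 0) -> nx.Graph:
    """Randomly rewire a fraction of edges: each selected edge keeps
    one endpoint and is reattached to a uniformly chosen new node."""
    rng = random.Random(seed)
    h = g.copy()
    edges = list(h.edges())
    nodes = list(h.nodes())
    for u, v in rng.sample(edges, int(fraction * len(edges))):
        h.remove_edge(u, v)
        w = rng.choice(nodes)
        while w == u or h.has_edge(u, w):   # avoid self-loops and multi-edges
            w = rng.choice(nodes)
        h.add_edge(u, w)
    return h
```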
TABLE I. Precision, Recall and Accuracy of GMSCN for different generative models

              True ER  True FF  True KG  True PA  True RP  True SW  True RTG  Class Precision
Pred. ER         97        0        6        0        0        1        0        93.27%
Pred. FF          0      100        0        0        2        0        0        98.04%
Pred. KG          2        0       93        2        0        0        0        95.88%
Pred. PA          1        0        0       98        0        0        0        98.99%
Pred. RP          0        0        1        0       94        0        1        97.92%
Pred. SW          0        0        0        0        0       99        0       100.00%
Pred. RTG         0        0        0        0        4        0       99        96.12%
Class Recall     97%     100%      93%      98%      94%      99%      99%

Accuracy: 97.14%
TABLE II. Precision, Recall and Accuracy of the baseline method for different generative models

              True ER  True FF  True KG  True PA  True RP  True SW  True RTG  Class Precision
Pred. ER         94        1       30        0       11        0        0        69.12%
Pred. FF          0       73        2        0        6        6        0        83.91%
Pred. KG          6        0       37        0       17        0        0        61.67%
Pred. PA          0        0       26      100        6        0        0        75.76%
Pred. RP          0       23        5        0       52        0        0        65.00%
Pred. SW          0        3        0        0        0       94        0        96.91%
Pred. RTG         0        0        0        0        8        0      100        92.59%
Class Recall     94%      73%      37%     100%      52%      94%     100%

Accuracy: 78.57%
TABLE III. Precision and Recall of GMSCN for networks of different sizes

              ER    FF    KG    PA    RP    SW    RTG
Precision
Recall

E. Scalability and Performance
The aim of GMSCN is to find the generative model that best fits a given real network. We define the scalability of such a method as its ability to handle networks of large sizes as input. Referring to the methodology of the proposed method (Fig. 1), the most time-consuming part of the model classification is the feature extraction task. For the feature extraction task, GMSCN is obviously more scalable than the baseline method. There is no efficient algorithm for counting the graphlets in large networks. The selected network features in GMSCN (effective diameter, clustering coefficient, transitivity, assortativity and degree distribution percentiles) are efficiently computable by existing algorithms. We have also discarded costly-to-extract features such as the "average path length" because their extraction requires more computationally complex algorithms.

FIG. 6. Robustness of the different classification methods with respect to random edge rewiring.

Most of the graphlet-based methods, such as Ref. and Ref. , try to increase their scalability by incorporating a pre-stage of network sampling with very small rates, such as 0.01% (one out of 10,000) in Ref. . But such sampling rates decrease the accuracy of the graphlet counts, and the chosen sampling algorithm will also bias the counts. On the other hand, if sampling or approximation algorithms are accepted for the baseline method, these techniques would improve the performance of GMSCN too. In other words, the utilization of sampling and approximation algorithms increases the scalability of both the baseline method and GMSCN similarly. Some notes about the implementation and evaluation of GMSCN are presented in Appendix B. A sketch of an efficient effective-diameter estimate follows.
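For example, the effective diameter can be estimated cheaply by sampling BFS sources instead of computing all pairwise distances. The sketch below works under that sampling assumption; the paper itself uses the SNAP implementation (see Appendix B).

```python
import random
import networkx as nx
import numpy as np

def effective_diameter(g: nx.Graph, q: float = 0.9,
                       samples: int = 100, seed: int = 0) -> float:
    """Approximate effective diameter: the distance within which a
    fraction q of all connected node pairs can reach each other,
    estimated from BFS trees of a random sample of source nodes."""
    rng = random.Random(seed)
    sources = rng.sample(list(g.nodes()), min(samples, g.number_of_nodes()))
    dists = []
    for s in sources:
        lengths = nx.single_source_shortest_path_length(g, s)
        dists.extend(d for d in lengths.values() if d > 0)  # connected pairs only
    return float(np.quantile(dists, q))
```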
F. Effectiveness of the Degree Distribution Quantification Method

As described in Section III, we have proposed a new method for the quantification of the degree distribution based on its mean and standard deviation. In this subsection, we test the effectiveness of this quantification method. We show that without the proposed degree distribution features, the accuracy of the network classifier diminishes. Table IV shows the results of GMSCN after eliminating the six features related to the degree distribution (the DegDistP1..DegDistP6 percentiles). With this change, the overall accuracy of the method decreases by about eight percent (from 97.14% to 89.29%). This can be seen by comparing the values in Table IV with those of Table I, which reflects the results of GMSCN when employing all the features. Precision and recall are improved for almost all the models by incorporating the features related to the degree distribution. This fact shows the effectiveness of our proposed quantification method for the degree distribution.
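The ablation amounts to retraining the classifier on a feature matrix with the six DegDistP columns removed and comparing the cross-validated accuracies. A sketch reusing the hypothetical `train_and_validate` helper from Section III C; the assumption that the DegDistP features occupy the last six columns is ours.

```python
import numpy as np

def ablation_accuracy(X, y, train_and_validate):
    # Full feature set vs. the same matrix without the DegDistP columns.
    _, acc_full = train_and_validate(X, y)
    _, acc_no_degdist = train_and_validate(np.asarray(X)[:, :-6], y)
    return acc_full, acc_no_degdist   # e.g., 0.9714 vs 0.8929 in the paper
```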
TABLE IV. The results of GMSCN after excluding the features of degree distribution

              ER    FF    KG    PA    RP    SW    RTG
Precision
Recall       90%   95%   81%   97%   78%   96%   88%
V. CASE STUDY
We applied GMSCN to some real networks. The real network instances and the results of applying GMSCN to these networks are illustrated here (an end-to-end usage sketch follows this list):

1. "dblp cite" (with 475,886 nodes and 2,284,694 edges) is a network which is extracted from the DBLP service. This network shows the citation network among scientific papers. GMSCN proposes Forest Fire as the best fitting generative model for this network. Leskovec et al. also propose the Forest Fire model for two similar graphs of arXiv and patent citation networks.

2. "dblp collab" (with 975,044 nodes and 3,489,572 edges) is a co-authorship network of papers indexed in the DBLP service. A node in this network represents an author, and an edge indicates at least one collaboration in writing papers between the two authors. GMSCN suggests Forest Fire for this network instance too.

3. "p2p-Gnutella08" (with 6,301 nodes and 20,777 edges) is a relatively small P2P network with about 6000 nodes. The best fitting model suggested by GMSCN for this network instance is Kronecker Graphs.

4. Slashdot, a technology-related news website, presented the Slashdot Zoo, which allowed users to tag each other as friends. "Slashdot0902" (with 82,168 nodes and 543,381 edges) is a network of friendship links between the users of Slashdot, obtained in February 2009. The output of GMSCN for this social network is the Random Power Law model.

5. In the "web-Google" (with 875,713 nodes and 4,322,051 edges) network, the nodes represent web pages and directed edges represent hyperlinks among them. We ignored the direction of the links and considered the network as a simple undirected graph. The Random Power Law model is also proposed for this network by GMSCN.

6. "Email-EuAll" (with 265,214 nodes and 365,025 edges) is a communication network of email contacts which is predicted to follow the RTG model.

7. Finally, for the small network of "Email-URV" (with 1,133 nodes and 5,451 edges), which is another communication network of emails, GMSCN suggests the Small World model.

As explained above, various real networks, which are selected from a wide range of sizes, densities, and domains, are categorized into different network models by the GMSCN classifier. This fact indicates that no single generative model dominates in GMSCN for real networks, and it suggests different models for different network structures. The case study also verifies that no single generative model is sufficient for synthesizing networks similar to all real networks, and we should find the best model fitting the target network in each application. As a result, it is worth noting that the task of generative model selection is an important stage before generating network instances.
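Pulling the earlier sketches together, classifying one of these case-study networks could look roughly like this; the file name, the feature order, and the trained classifier `clf` are all assumptions carried over from the previous hypothetical snippets.

```python
import networkx as nx

# Assumes an edge-list file and the effective_diameter / deg_dist_percentiles
# sketches defined earlier; clf is the classifier from train_and_validate.
g = nx.read_edgelist("p2p-Gnutella08.txt")   # treated as a simple undirected graph
features = [
    nx.transitivity(g),
    nx.average_clustering(g),
    nx.degree_assortativity_coefficient(g),
    effective_diameter(g),
    *deg_dist_percentiles([d for _, d in g.degree()]),
]
print(clf.predict([features]))   # e.g., "KG" for this network
```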
VI. DISCUSSION

We evaluated GMSCN from different perspectives. GMSCN proposes a size-independent methodology for building the network classifier based on a wide range of local and global network features as the inputs of a decision tree. It shows a high accuracy in predicting the generative model for a given network. It is tolerant of and insensitive to small network changes. In addition to its size-independence, GMSCN outperforms the baseline method, which only considers the local features of graphlet counts, with respect to accuracy and efficiency. A new structural feature is also proposed in GMSCN which quantifies the network degree distribution.

One may argue that the size of the training set (700 network instances) is relatively small for a machine learning task. But we have actually utilized many more network instances in the process of evaluating GMSCN. Our dataset for evaluating GMSCN includes 15,400 different network instances: 700 instances in the fixed-size evaluation, 700 instances in the size-independence test and 14,000 (20 × 700) instances in the robustness-to-noise evaluation.

VII. CONCLUSION
In this paper, we proposed a new method (GMSCN) for network model selection. This method, which is based on learning a decision tree, finds the best model for generating complex networks similar to a specified network instance. The structural features of the given network instance are utilized as the input of the decision tree, and the result is the best fitting model. GMSCN outperforms the existing methods with respect to different criteria. The accuracy of GMSCN shows a considerable improvement over the baseline method. In addition, the set of supported generative models in GMSCN contains wider, newer and more important generative models such as Kronecker Graphs, Forest Fire and RTG. Unlike most of the existing methods, GMSCN is independent of the size of the input network. GMSCN is a robust model, insensitive to small network changes and noise. It is also a scalable method, and its performance is clearly better than that of the baseline method. GMSCN also includes a new and effective algorithm for the quantification of the network degree distribution. We examined different learning algorithms and, as a result, the decision tree learned by the LADTree method was the most accurate and robust model. We showed that local structural features, such as graphlet counts, are insufficient for inferring the network mechanisms, and it is necessary to consider a wider range of local and global structural features to be able to predict the network growth mechanisms.

In future work, we will investigate the effect of network structural features and growth mechanisms on the dynamics and behavior of the network when it is faced with different processes. For example, we will evaluate the similarity of the information diffusion process in a network and in its counterparts synthesized by the selected network generation model.

ACKNOWLEDGMENTS
We wish to thank Masoud Asadpour, Mehdi Jalili and Abbas Heydarnoori for their great comments.
Appendix A: Brief Introduction to Classification Methods
Machine learning is a subfield of artificial intelligence in which the main goal is to learn knowledge through experience. Classification is a learning task of inferring a classification function from labeled training data. Here, we explain the classifiers that are used in this paper.
Support Vector Machines (SVM). SVM performs a classification by mapping the inputs into a high-dimensional feature space and constructing hyperplanes to categorize the data instances. The best hyperplanes are those that cause the largest margin among the classes. The parameters of such a maximum-margin hyperplane are derived by solving an optimization problem. Sequential Minimal Optimization (SMO) is a common method for solving the optimization problem.

Bayesian Network Learning. A Bayesian network model is a probabilistic graphical model that represents a set of random variables and their conditional dependencies by a directed acyclic graph. The nodes in this graph represent the random variables, and an edge shows a conditional dependency between two variables. Bayesian network learning aims to create a network that best describes the probability distribution over the training data. To find the best network among the set of possible Bayesian networks, heuristic search techniques have been frequently used in the literature.

Artificial Neural Networks. An ANN is inspired by the human brain's neural network. An ANN consists of neuron units, arranged in layers and connected with weighted links, which convert an input vector into some outputs. Usually, the networks are defined to be feed-forward, with no feedback to the previous layer. In the training phase, the weights of the links are tuned to adapt the ANN to the training data. The back-propagation algorithm is a common method for the training phase.

C4.5 Decision Tree Learning. A decision tree is a tree structure of decision rules which can be used as a classification function (leaf nodes show the returned classes). C4.5 constructs a decision tree based on labeled training data. C4.5 uses information entropy to evaluate the goodness of the branches in the tree.

LADTree. This classifier generates a multi-class alternating decision tree, and it uses the boosting strategy. Boosting is a well-established classification technique that combines some weak classifiers to form a single powerful classifier. A prediction node in a LADTree includes a score for each of the candidate classes. LADTree calculates confidences for the different classes according to the scores visited in the prediction nodes, and it returns the best class according to the confidences.
To implement the Kronecker Graphs, Forest Fire, Preferential Attachment, Small World, and Random Power Law models, we utilized the SNAP library (http://snap.stanford.edu/snap/). An implementation of the RTG model is available as a MATLAB library. We also developed our own implementation of the ER model. The features are extracted with the aid of different network analysis tools. The igraph package (http://igraph.sourceforge.net/) of the R project helped us calculate the assortativity and transitivity measures. We used the SNAP library for measuring the effective diameter, average clustering coefficient, density, and also the graphlet counts. Since we proposed a new method for quantifying the network degree distribution, we implemented this method ourselves. We utilized RapidMiner as an open-source tool for machine learning. The implementations of LADTree, Bayesian network learning and SVM are actually part of the Weka tool, which is embedded in RapidMiner. The amount of computation needed for this research, especially counting the exact number of graphlets, was enormous. We utilized three virtual machines on a supercomputer for this computation task, each of which simulated a computer with 16 processing cores of 2.8 GHz and 24 GB of memory. Most of the computation time was spent on counting the graphlets of the generated network instances.

M. E. Newman, "The structure and function of complex networks," SIAM Review, 167–256 (2003).
R. Albert and A.-L. Barabási, "Statistical mechanics of complex networks," Reviews of Modern Physics, 47 (2002).
L. d. F. Costa, F. A. Rodrigues, G. Travieso, and P. Villas Boas, "Characterization of complex networks: A survey of measurements," Advances in Physics, 167–242 (2007).
C. Christensen and R. Albert, "Using graph concepts to understand the organization of complex systems," International Journal of Bifurcation and Chaos, 2201–2214 (2007).
S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, "Complex networks: Structure and dynamics," Physics Reports, 175–308 (2006).
J. Hlinka, D. Hartman, and M. Paluš, "Small-world topology of functional connectivity in randomly connected dynamical systems," Chaos: An Interdisciplinary Journal of Nonlinear Science, 033107 (2012).
A. Yazdani and P. Jeffrey, "Complex network analysis of water distribution systems," Chaos: An Interdisciplinary Journal of Nonlinear Science, 016111 (2011).
P. Cano, O. Celma, M. Koppenberger, and J. M. Buldú, "Topology of music recommendation networks," Chaos: An Interdisciplinary Journal of Nonlinear Science, 013107 (2006).
J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani, "Kronecker graphs: An approach to modeling networks," The Journal of Machine Learning Research, 985–1042 (2010).
L. Akoglu and C. Faloutsos, "RTG: a recursive realistic graph generator using random typing," Data Mining and Knowledge Discovery, 194–209 (2009).
J. Janssen, M. Hurshman, and N. Kalyaniwalla, "Model selection for social networks using graphlets," Internet Mathematics, 338–363 (2012).
E. M. Airoldi, X. Bai, and K. M. Carley, "Network sampling and classification: An investigation of network model representations," Decision Support Systems, 506–518 (2011).
M. Middendorf, E. Ziv, and C. H. Wiggins, "Inferring network mechanisms: the Drosophila melanogaster protein interaction network," Proceedings of the National Academy of Sciences of the United States of America, 3192–3197 (2005).
R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon, "Superfamilies of evolved and designed networks," Science, 1538–1542 (2004).
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: simple building blocks of complex networks," Science Signaling, 824 (2002).
I. Bordino, D. Donato, A. Gionis, and S. Leonardi, "Mining large networks with subgraph counting," in Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on (IEEE, 2008) pp. 737–742.
J. A. Grochow and M. Kellis, "Network motif discovery using subgraph enumeration and symmetry-breaking," in Research in Computational Molecular Biology (Springer, 2007) pp. 92–106.
L. Cui, S. Kumara, and R. Albert, "Complex networks: An engineering view," Circuits and Systems Magazine, IEEE, 10–25 (2010).
A. Pomerance, E. Ott, M. Girvan, and W. Losert, "The effect of network topology on the stability of discrete state models of genetic control," Proceedings of the National Academy of Sciences, 8209–8214 (2009).
E. Lameu, C. Batista, A. Batista, K. Iarosz, R. Viana, S. Lopes, and J. Kurths, "Suppression of bursting synchronization in clustered scale-free (rich-club) neuronal networks," Chaos: An Interdisciplinary Journal of Nonlinear Science, 043149 (2012).
J. Leskovec, J. Kleinberg, and C. Faloutsos, "Graphs over time: densification laws, shrinking diameters and possible explanations," in Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (ACM, 2005) pp. 177–187.
J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins, "The web as a graph: Measurements, models, and methods," in Computing and Combinatorics (Springer, 1999) pp. 1–17.
A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, 509–512 (1999).
D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, 440–442 (1998).
P. Erdős and A. Rényi, "On the central limit theorem for samples from a finite population," Publ. Math. Inst. Hungar. Acad. Sci., 49–61 (1959).
D. Volchenkov and P. Blanchard, "An algorithm generating random graphs with power law degree distributions," Physica A: Statistical Mechanics and its Applications, 677–690 (2002).
M. Penrose, Random Geometric Graphs, Vol. 5 (Oxford University Press, Oxford, 2003).
W. Aiello, A. Bonato, C. Cooper, J. Janssen, and P. Prałat, "A spatial web graph model with local influence regions," Internet Mathematics, 175–196 (2008).
D. S. Callaway, J. E. Hopcroft, J. M. Kleinberg, M. E. Newman, and S. H. Strogatz, "Are randomly grown graphs really random?" Physical Review E, 041902 (2001).
R. V. Solé, R. Pastor-Satorras, E. Smith, and T. B. Kepler, "A model of large-scale proteome evolution," Advances in Complex Systems, 43–54 (2002).
K. Klemm and V. M. Eguiluz, "Highly clustered scale-free networks," Physical Review E, 036123 (2002).
B. Bollobás, Random Graphs, Vol. 73 (Cambridge University Press, 2001).
S. P. Borgatti and M. G. Everett, "Models of core/periphery structures," Social Networks, 375–395 (2000).
T. L. Frantz and K. M. Carley, "A formal characterization of cellular networks," Tech. Rep. (DTIC Document, 2005).
J. Leskovec, J. Kleinberg, and C. Faloutsos, "Graph evolution: Densification and shrinking diameters," ACM Transactions on Knowledge Discovery from Data (TKDD), 2 (2007).
G. Holmes, B. Pfahringer, R. Kirkby, E. Frank, and M. Hall, "Multiclass alternating decision trees," in Machine Learning: ECML 2002 (Springer, 2002) pp. 161–172.
R. Patro, G. Duggal, E. Sefer, H. Wang, D. Filippova, and C. Kingsford, "The missing models: a data-driven approach for learning how networks grow," in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (ACM, 2012) pp. 42–50.
M. Zanin, P. Sousa, D. Papo, R. Bajo, J. García-Prieto, F. del Pozo, E. Menasalvas, and S. Boccaletti, "Optimizing functional network representation of multivariate time series," Scientific Reports (2012).
A. Barrat and M. Weigt, "On the properties of small-world network models," The European Physical Journal B - Condensed Matter and Complex Systems, 547–560 (2000).
B. Bollobás, "The diameter of random graphs," Transactions of the American Mathematical Society, 41–52 (1981).
P. V. Boas, F. Rodrigues, G. Travieso, and L. da F. Costa, "Sensitivity of complex networks measurements," Journal of Statistical Mechanics: Theory and Experiment, P03009 (2010).
S. L. Tauro, C. Palmer, G. Siganos, and M. Faloutsos, "A simple conceptual model for the internet topology," in Global Telecommunications Conference, 2001. GLOBECOM'01. IEEE, Vol. 3 (IEEE, 2001) pp. 1667–1671.
V. Gómez, A. Kaltenbrunner, and V. López, "Statistical analysis of the social network and discussion threads in slashdot," in Proceedings of the 17th international conference on World Wide Web (ACM, 2008) pp. 645–654.
N. Z. Gong, W. Xu, L. Huang, P. Mittal, E. Stefanov, V. Sekar, and D. Song, "Evolution of social-attribute networks: measurements, modeling, and implications using google+," in Proceedings of the 2012 ACM conference on Internet measurement conference (ACM, 2012) pp. 131–144.
H. Kwak, C. Lee, H. Park, and S. Moon, "What is twitter, a social network or a news media?" in Proceedings of the 19th international conference on World wide web (ACM, 2010) pp. 591–600.
J. R. Quinlan, C4.5: Programs for Machine Learning, Vol. 1 (Morgan Kaufmann, 1993).
N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian network classifiers," Machine Learning, 131–163 (1997).
J. C. Platt, "Fast training of support vector machines using sequential minimal optimization," (1999).
J. A. Freeman and D. M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques (Computation and Neural Systems Series) (1991).
E. de Silva and M. P. Stumpf, "Complex networks and simple models in biology," Journal of the Royal Society Interface, 419–430 (2005).
T. Aittokallio and B. Schwikowski, "Graph-based methods for analysing networks in cell biology," Briefings in Bioinformatics, 243–255 (2006).
M. Rahman, M. Bhuiyan, and M. A. Hasan, "GRAFT: an approximate graphlet counting algorithm for large graph analysis," in Proceedings of the 21st ACM international conference on Information and knowledge management (ACM, 2012) pp. 1467–1471.
http://dblp.uni-trier.de/xml.
http://snap.stanford.edu.
http://konect.uni-koblenz.de.
http://deim.urv.cat/~aarenas.