A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning
Di Jin, Zhizhi Yu, Pengfei Jiao, Shirui Pan, Philip S. Yu, Weixiong Zhang
Abstract—Community detection, a fundamental task for network analysis, aims to partition a network into multiple sub-structures to help reveal their latent functions. Community detection has been extensively studied in and broadly applied to many real-world network problems. Classical approaches to community detection typically utilize probabilistic graphical models and adopt a variety of prior knowledge to infer community structures. As the problems that network methods try to solve and the network data to be analyzed become increasingly sophisticated, new approaches have also been proposed and developed, particularly those that utilize deep learning and convert networked data into low-dimensional representations. Despite all the recent advancement, there is still a lack of insightful understanding of the theoretical and methodological underpinning of community detection, which will be critically important for the future development of the area of network analysis. In this paper, we develop and present a unified architecture of network community-finding methods to characterize the state of the art of the field of community detection. Specifically, we provide a comprehensive review of the existing community detection methods and introduce a new taxonomy that divides the existing methods into two categories, namely probabilistic graphical model and deep learning. We then discuss in detail the main idea behind each method in the two categories. Furthermore, to promote future development of community detection, we release several benchmark datasets from several problem domains and highlight their applications to various network analysis tasks. We conclude with discussions of the challenges of the field and suggestions of possible directions for future research.
Index Terms—Complex Network, Community Detection, Graph Clustering, Statistical Modeling, Deep Learning.

1 INTRODUCTION

Network science is the study of complex systems in the form of networks using theories and techniques of computer science, mathematics and physics. In particular, network structures [1] (see an example in Fig. 1) have been studied extensively under the notions of subgraphs, network modules, and communities. Identification of network structures, or community detection, is to divide the nodes in a network into groups where the nodes in a group are densely connected whereas nodes in different groups are sparsely linked. Mining network structures is also the key to revealing and comprehending the organizational principles and operational functions of complex network systems. For example, community detection has been applied to applications such as recommendation [2], [3], anomaly detection [4], [5], and terrorist organization identification [6], just to name a few. Much effort has also been devoted to the analysis of network structural properties, e.g., the small-world effect (i.e., the average distance between nodes is short [8]) and the scale-free property (i.e., the distribution of node degrees follows a power law [9]).

Fig. 1: An illustrative example (Zachary's karate club network [7]) showing community structure. The nodes of this network are divided into two groups, with most connections falling within groups and only a few between groups.

Many algorithms for community detection have been proposed, a majority of which employ exclusively the information of network topology. They include hierarchical clustering [10] [11], modularity optimization [12] [13] [14], spectral clustering [15] [16] and statistical inference [17] [18]. New methods were developed to utilize node semantics or node attributes in addition to network topology to improve the quality of the resulting communities and meanwhile provide explanations of the results. These include heuristic (multi-objective) optimization [19] [20], matrix factorization [21] [22] and Bayesian models [23]. As more complex network problems were tackled, complex network data from multiple sources, e.g., network topology and node semantics, must be effectively integrated. As a result, it became difficult for these traditional approaches to perform data fusion effectively on data of very high dimensions and diverse properties. The technique of deep learning was recently adopted to handle the high-dimensional network data and learn low-dimensional representations of network structures. Examples include methods based on the auto-encoder [24] [25] and the generative adversarial approach [26] [27].

An important and effective idea for community detection is to learn an adequate representation of the network structure of a given network. We call such approaches learning-based community detection. Among these methods are the model-based generative models. The most popular and representative example is the stochastic block model (SBM) [28], which detects communities by formalizing the generative process of a network as a sequence of rigorous probability distributions. Several extensions and improvements have been introduced to boost the performance of SBM [29] [30]. Another model-based learning approach adopted the Markov random field (MRF), an undirected graphical model, to take advantage of neighborhood structures in networks [31]. A primary recent development in learning-based methods exploited the low-dimensional representation capability of deep learning. For instance, convolutional neural network (CNN)-based methods [32] utilized convolution and pooling operations to reduce the dimensions of network data, resulting in low-dimensional representations, to effectively discover communities in networks.
The graph convolutional network (GCN) [33], which inherits the advantages of CNN and directly operates on network-structured data, has also been explored to derive community representations [34].

Despite the enduring effort to develop effective methods for community detection, there is still a lack of understanding of the theoretical and methodological underpinning of community detection, particularly that based on learning. To bridge this gap, in this paper we provide a synthesized survey of the existing representative methods. We focus particularly on two general lines of approaches, one based on probabilistic modeling and the other on deep learning. We start with a detailed description of each line of work and provide a thorough review and comparison of the methods. We then consider several applications of community detection in diverse fields. We finish with a discussion of some critical challenges of the field of network analysis and directions for future rewarding research.

One of the major objectives of our survey is to provide a new perspective on the existing methods to help better understand the fundamental issues and enabling techniques for community detection. Our survey differs from the published ones in three aspects. First, we summarize the existing methods by focusing on learning, a central issue of community modeling and community detection, whereas the existing reviews [35] [36] generally discuss the chronological development of the existing methods. Second, we present a recent trend in the development of methods for community detection, i.e., from statistical modeling to deep learning, while the others focus mainly on individual techniques, e.g., statistical inference [37] or deep learning [38].
Third, we present a unified system architecture to characterize the existing methods, which provides a novel and synthesized perspective on the existing methods based on statistical modeling and deep learning, and goes significantly beyond some of the existing surveys [39] [40].

Aiming at offering general guidance to researchers and practitioners who are interested in network science and network data analysis, we make our unique contributions in this work as summarized below.

• We present the most comprehensive and extensive overview of learning-based community detection and divide the methods into two categories of strategies, probabilistic graphical model and deep learning. To the best of our knowledge, this is the first attempt devoted to community detection from the perspective of learning. It offers a solid foundation for understanding the intuition behind community detection, and can be used as a guideline for designing and utilizing different methods for community detection.

• We provide a thorough theoretical analysis of learning-based community detection methods, discuss their similarities and differences, identify critical challenges that remain poorly addressed, and point out five directions for future development.

• We gather abundant resources on learning-based community detection, including state-of-the-art benchmark datasets and applications.

The rest of this survey is organized as follows. Section 2 gives the preliminaries and a categorization of existing community detection approaches. Section 3 presents a technical overview of research progress in statistical modeling for community detection. Section 4 overviews the research on deep learning-based community detection. Section 5 discusses applications of community detection. We suggest promising future research directions in Section 6 and conclude in Section 7.
2 PRELIMINARIES AND CATEGORIZATION
We introduce the terms and notations and present a classification of the methods for community detection that we will discuss in this paper.
Definition 1. Network. A network G = (V, E, X) consists of n nodes V = {v_1, v_2, ..., v_n}, m edges E = {e_ij} ⊆ V × V, and a maximal number q of attributes x_i on a node v_i, where all x_i's collectively give rise to an n × q attribute matrix X = (x_i)_{n×q}. The topological structure of G can be defined by an n × n adjacency matrix A = (a_ij)_{n×n}, where a_ij = 1 if e_ij ∈ E, or 0 otherwise. G is undirected if a_ij = a_ji, or directed otherwise [39].

Definition 2. Community. The network G contains k communities C = {C_1, C_2, ..., C_k}, where C_i is a subgraph of G and the nodes within C_i are densely connected whereas the nodes across C_i and C_j are sparsely connected. The communities are non-overlapping when C_i ∩ C_j = ∅ for all i, j.

Definition 3. Community Detection. Given a network G, community detection is to design a mapping F to assign every node v_i of G to at least one of the k communities, i.e., to label v_i with at least one community identity c_i ∈ {C_1, C_2, ..., C_k}. Equivalently, the problem is to derive a community assignment of nodes C = (c_1, c_2, ..., c_n).

To help better understand the existing learning-based methods and facilitate our discussion in the rest of the paper, we introduce a taxonomy of the methods for community detection (Fig. 2), where the methods are grouped into two categories, probabilistic graphical model and deep learning.
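The objects in Definitions 1–3 map directly onto simple array representations. The following minimal sketch (the helper name and toy data are ours, purely for illustration) builds the adjacency matrix A, the attribute matrix X, and a community assignment C for a small undirected network:

```python
import numpy as np

def build_network(n, edges, attributes):
    """Build the n x n adjacency matrix A of Definition 1 from an edge list.

    For an undirected network each edge (i, j) sets both a_ij and a_ji,
    so A is symmetric (a_ij == a_ji)."""
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        A[i, j] = 1
        A[j, i] = 1
    X = np.asarray(attributes)  # the n x q attribute matrix of Definition 1
    return A, X

# A toy 5-node network with two visually obvious groups: {0, 1, 2} and {3, 4}.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
A, X = build_network(5, edges, attributes=np.eye(5))

# A non-overlapping community assignment as in Definition 3:
# c_i is the community identity of node v_i.
C = np.array([0, 0, 0, 1, 1])
```

Here each node carries a one-hot attribute vector (q = 5) only to keep the example self-contained; in practice X holds semantic features such as user profiles or document words.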
Probabilistic graphical model-based methods employ heuristics or meta-heuristics on community generation to discover network communities. These methods typically adopt some models of network structures to describe the dependencies among the entities (i.e., nodes) via the edges of the networks. Depending on the type of probabilistic graphical models used, community detection can be further divided into three main categories: directed graphical models, undirected graphical models, and hybrid graphical models integrating directed and undirected graphs. Specifically, directed graphical models are mainly based on hidden variables (i.e., variables not observed in the sample), leveraging the similarity of nodes or block structures, to generate the observed edges in a network; undirected graphical models are usually based on field structures, using the constraints of unary and pairwise potentials (e.g., the community label agreement between nearby nodes) to discover communities; hybrid graphical models normally transform these two types of models into a unified factor graph to take advantage of both models for community detection.

Fig. 2: Classification breakdown of methods for community detection.
Deep learning-based methods aim to identify community structures utilizing a new type of community-oriented network representation. They derive the new network representation through learning strategies that map network data from the original input space to a low-dimensional feature space, with the advantages of low computational complexity and high capability for parallelization. Depending on the learning strategies used, deep learning-based methods fall into four main categories: auto-encoder-based, generative adversarial network-based, graph convolutional network (GCN)-based, and methods integrating graph convolutional networks and undirected graphical models. Concretely, auto-encoder-based methods exploit an unsupervised auto-encoder, which encodes a network into a low-dimensional representation in the latent space and reconstructs the network, along with its community structures, from the low-dimensional representation. Generative adversarial network-based methods adopt the idea of adversarial learning. They detect communities via an adversarial game between a generator and a discriminator. Graph convolutional network-based methods extract communities by the propagation and aggregation of features on the network topology. Hybrid methods integrate graph convolutional networks and undirected graphical models by, for example, converting a Markov random field (MRF) layer into a GCN, to take advantage of both models.
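The auto-encoder idea above (encode into a low-dimensional latent space, reconstruct, then read communities off the embeddings) can be illustrated with a deliberately minimal numpy sketch: a tied-weight linear auto-encoder trained on the rows of the adjacency matrix, followed by a naive 2-means step. Real methods in Section 4 use deep nonlinear encoders; everything below (the toy graph, dimensions, learning rate) is our own illustrative choice, not any surveyed algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: two 4-node cliques joined by a single bridge edge (3, 4).
A = np.zeros((8, 8))
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

# Tied-weight linear auto-encoder: encode each adjacency row (Z = A W, d = 2),
# decode it back (R = Z W^T), and minimize ||A - A W W^T||_F^2 by gradient descent.
d, lr = 2, 0.01
W = rng.normal(scale=0.1, size=(8, d))

def loss(W):
    R = A @ W @ W.T
    return float(((R - A) ** 2).sum())

loss_before = loss(W)
for _ in range(1000):
    E = A @ W @ W.T - A                       # reconstruction error
    grad = 2.0 * (A.T @ E @ W + E.T @ A @ W)  # gradient of ||E||_F^2 w.r.t. W
    W -= lr * grad
loss_after = loss(W)

# Cluster the learned embeddings with a naive 2-means; for determinism in this
# sketch, one center starts in each clique.
Z = A @ W
centers = Z[[0, 7]].copy()
for _ in range(20):
    labels = ((Z[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
    for c in range(2):
        if (labels == c).any():
            centers[c] = Z[labels == c].mean(axis=0)
```

On this toy graph the two cliques end up in separate clusters, which is exactly the "encode, then cluster the latent representation" pipeline the category description refers to.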
3 COMMUNITY DETECTION WITH PROBABILISTIC GRAPHICAL MODELS
Probabilistic graphical model-based methods ordinarily detect communities through network modeling, i.e., employing graphical models to explain the generation process of networks. In this section, we will focus on three general methods: directed graphical models, undirected graphical models, and hybrid graphical models integrating directed and undirected graphs.
We will review the recent development of directed graphical models for community detection, including the stochastic block model, topic models and matrix factorization. These methods have a solid theoretical basis and reasonably good performance, and have been broadly applied.
Stochastic block model (SBM), an effective generative model of network block structures, adopted statistical modeling for community detection for the first time [28]. The method probabilistically assigns nodes in a network to different communities (block structures) using a node membership likelihood function, and then progressively infers the probabilities of node memberships by inference on the likelihood function to derive the hidden communities in the network. Note that there are several SBM variants for community detection, but their core generation process is the same. The basic generation process can be divided into two steps: the first is to iteratively assign a community to each node in the network, and the second is to compute or update the probability of two nodes being connected by an edge.

Taking a social network as an example, SBM can be used to capture a probabilistic generation process with the community distribution as a hidden variable. The communities can be reconstructed by maximizing a likelihood function of the node community membership. In this social network, the nodes are partitioned into k disjoint communities with probability ω = {ω_1, ..., ω_k}. Assume there are two nodes v_i and v_j belonging to two communities C_r and C_s, represented by c_ir and c_js. The probability that nodes v_i and v_j are connected by an edge, i.e., a_ij (0 or 1), obeys a Bernoulli distribution with parameter π_rs. The use of a link probability between nodes in two communities makes the model flexible with various types of network structures [41]. The network generation distribution can be defined as:

P(C | ω) = ∏_{i=1}^{n} Multinomial(c_i; 1, ω) = ∏_{i=1}^{n} ∏_{r=1}^{k} ω_r^{c_ir},   (1)

P(A | C, π) = ∏_{i<j} Bernoulli(a_ij; π_rs) = ∏_{i<j} π_rs^{a_ij} (1 − π_rs)^{1 − a_ij},   (2)

where r and s are the communities of v_i and v_j, respectively, so that the joint distribution factorizes as

P(A, C | ω, π) = P(C | ω) P(A | C, π).   (3)

Since the basic SBM is only suitable under the assumption that a node belongs to only one community, Airoldi et al. [29] propose a mixed membership stochastic block model (MMSB) that introduces mixed membership to the stochastic model so that one node may belong to multiple communities. MMSB allows communities to overlap on a directed network, where a_{i→j} indicates whether there is a link (arrow) from node v_i to v_j. For each node v_i, c_i obeys a Multinomial distribution. If v_i ∈ C_r and v_j ∈ C_s, the community connection probability π_rs follows a Beta distribution, and a_{i→j} ∼ Multinomial(ω_i), a_{i←j} ∼ Multinomial(ω_j), where ω is the mixed membership parameter of the nodes. The links between communities are represented by a Bernoulli distribution. The joint distribution of MMSB can be formulated as:

P(A, π, ω, a_{i→j}, a_{i←j} | α, β) = ∏_i P(ω_i | α) × P(π | β) ∏_{i,j} P(a_{i→j} | ω_i) P(a_{i←j} | ω_j) P(a_ij | a_{i→j}, a_{i←j}, π).   (4)

The process of community detection based on MMSB is described in Appendix B [29], which assumes that the parameters are estimated by inference methods such as EM.

The original MMSB is not good at handling diverse types of information about the nodes in a community; e.g., the nodes may represent people who are connected to one another based on different social relationships. To address this problem, Fan et al. [43] propose a novel MMSB-based method, named the Copula mixed membership stochastic block model (cMMSB), which introduces a Copula function into MMSB to model dependencies among nodes. Moreover, to boost the embedding performance of MMSB, Pal et al. [44] propose a mixed membership degree-corrected SBM and develop an inference method for the posterior distribution with Markov chain Monte Carlo (MCMC). The degree-corrected SBM is widely used, which we will discuss next.

Degree-corrected SBMs. Newman et al. [30] reason that the basic SBM divides nodes according to their degrees, which are usually nonuniformly distributed.
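The basic SBM's two-step generation process (first sample each node's community from ω, then sample each edge from a Bernoulli with parameter π_rs) can be sketched as follows; the function name and the toy parameter values are ours, purely for illustration:

```python
import numpy as np

def sample_sbm(n, omega, pi, seed=0):
    """Generate an undirected network from the basic SBM.

    Step 1: draw each node's community c_i from Multinomial(1, omega).
    Step 2: connect each pair (v_i, v_j) with probability pi[c_i, c_j],
            i.e., a_ij ~ Bernoulli(pi_rs)."""
    rng = np.random.default_rng(seed)
    k = len(omega)
    c = rng.choice(k, size=n, p=omega)  # community assignment C
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            a_ij = rng.random() < pi[c[i], c[j]]
            A[i, j] = A[j, i] = int(a_ij)
    return A, c

# Two equally likely communities, dense within blocks (0.9), sparse between (0.05).
omega = [0.5, 0.5]
pi = np.array([[0.90, 0.05],
               [0.05, 0.90]])
A, c = sample_sbm(60, omega, pi)
```

Inference in SBM runs this process in reverse: given only A, it recovers the assignments c and the block matrix π that make the observed edges most likely.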
To accommodate possibly broad degree distributions, they propose the degree-corrected SBM (DCSBM), which introduces a degree parameter for every node to scale the edge probabilities and make the expected degrees match the observed degrees. The probability function of network G is defined as follows:

P(G | π, c) = ∏_{i<j} [(θ_i θ_j π_{c_i c_j})^{a_ij} / a_ij!] exp(−θ_i θ_j π_{c_i c_j}),   (5)

where θ_i is the degree parameter of node v_i.

Different from the above methods, analyzing dynamic networks based on SBM is also a relatively active field. Yang et al. [50] suggest a dynamic stochastic block model named DSBM, which progressively updates the probabilistic model to find communities in large dynamic sparse networks. Specifically, DSBM uses the distributions of the model parameters instead of the most likely values of the model parameters in prediction, and provides an offline inference and an online inference to estimate the parameters. DSBM assumes that the set of nodes in a dynamic network remains unchanged. Letting C^(T) = {C^(1), C^(2), ..., C^(T)} be the collection of community assignments of all nodes over T discrete time steps, the generation process of DSBM is illustrated in Appendix B [50], and the likelihood function of the model is as follows:

P(W^(T), C^(T) | ω, π, A) = ∏_{t=1}^{T} P(W^(t) | C^(t), π) ∏_{t=2}^{T} P(C^(t) | C^(t−1), A) P(C^(1) | ω),   (6)

where W^(t) and C^(t) denote the snapshot of the network and the community assignments of the nodes at a given time step t.

Following DSBM, Tang et al. [51] introduce the Dirichlet process to SBM, which can find the optimal number of communities as they evolve, and in turn alleviate the problem of a fixed community number in dynamic social networks. Xu et al. [61] propose a new approach named the stochastic block transition model (SBTM) that includes two hidden Markov assumptions for dynamic networks. Wu et al. [62] propose a fully Bayesian generative model, which incorporates the heterogeneity of node degrees to model dynamic complex networks. Bhattacharjee et al. [55] optimize SBM with change-point estimation in dynamic social networks. Inspired by the success of MMSB, some dynamic SBMs based on MMSB have been proposed. Xing et al. [49] propose a variant MMSB model for dynamic networks. Fu et al. [48] design a state space mixed membership stochastic block model evolving across time. Moreover, to improve the embedding performance of MMSB and fully describe community evolution over time, Yu et al. [56] introduce the community level to MMSB. They combine the discrete fragmentation coagulation process (DFCP) into their framework to relax the constraint of a fixed-size compatibility matrix over time in MMSB. Besides, there also exist DCSBM-based approaches. Wilson et al. [53] suggest a dynamic version of DCSBM to model and monitor dynamic networks that undergo a significant structural change.

Others. In addition to the above methods, there are several other extensions to the basic SBM where communities can overlap, as summarized in Table 1.

TABLE 1: Summary of SBM-based community detection, where "AD_k" describes whether the approach can automatically determine the number of communities, i.e., Yes or No.

| Categories | Approaches | Sketches | Overlapping | AD_k |
|---|---|---|---|---|
| Basic | SBM (1983) [28] | Propose a stochastic model for social networks. | No | No |
| MMSB | MMSB (2008) [29] | Extend blockmodels for relational data to ones that capture mixed membership latent relational structure. | Yes | No |
| MMSB | cMMSB (2016) [43] | Combine an individual Copula function with MMSB, with improvement in capturing group interactions. | Yes | No |
| MMSB | MMDCB (2019) [44] | Propose a mixed membership degree-corrected SBM and develop an inference method for the posterior distribution with Markov chain Monte Carlo (MCMC). | Yes | No |
| DCSBM | DCSBM (2012) [45] | Introduce expected values to the basic SBM to adapt to multi-edges and self-edges contained in social networks. | Yes | No |
| DCSBM | sparseDCSBM (2017) [46] | Propose a spectral clustering algorithm with a normalized adjacency matrix based on DCSBM. | No | Yes |
| DCSBM | CMM (2018) [47] | Establish a convexified modularity maximization approach for estimating the hidden community based on DCSBM. | No | No |
| DynSBM | dMMSB (2009) [48] | Propose a state space MMSB which can track dynamic evolution across time. | Yes | No |
| DynSBM | DynamicSBM (2010) [49] | Propose a novel Bayesian approach for network tomographic inference building on MMSB and apply it to dynamic networks. | No | No |
| DynSBM | DSBM (2011) [50] | Capture the evolution of communities by modeling the transition of community memberships for individual nodes. | No | No |
| DynSBM | DBTDP (2014) [51] | Propose a dynamic stochastic block model with a temporal Dirichlet process for hidden communities. | Yes | No |
| DynSBM | SBTM (2015) [52] | Provide a local search algorithm for the inference procedure of time evolution. | No | No |
| DynSBM | dDCSBM (2016) [53] | Propose a dynamic DCSBM to model and monitor dynamic networks that undergo a significant structural change. | No | Yes |
| DynSBM | DPSBM (2019) [54] | Establish a fully Bayesian generative model with the heterogeneity of node degrees. | No | Yes |
| DynSBM | SNR-DSBM/ER (2020) [55] | Focus on estimating the location of a single change point in a dynamic stochastic block model and take a least squares criterion function for evaluating each point in time. | No | Yes |
| DynSBM | fcMMSB (2020) [56] | Propose a non-parametric fragmentation-coagulation based MMSB to capture the community information for entities and linkage-based clustering to derive the group information for links simultaneously. | Yes | No |
| OSBM | OSBM (2011) [57] | Provide a global and local variational technique for discovering communities. | Yes | No |
| OSBM | K-LAFTER (2018) [58] | Present a small-variance asymptotics based SBM for overlapping community detection. | Yes | Yes |
| OSBM | MNPAOCD (2020) [59] | Optimize the inference process and expected parameters in proceeding. | Yes | Yes |
| LSBM | LMBP (2015) [18] | Combine heterogeneous distributions with SBM for link community detection. | Yes | Yes |
| GNNSBM | DGLRFM (2019) [60] | Design a GNN-based overlapping SBM framework that can be adapted readily for other types of SBMs. | Yes | No |
For instance, OSBM denotes the SBMs that are designed to find overlapping communities, and LSBM denotes the SBMs that are extended to find link communities. Specifically, Latouche et al. introduce OSBM with global and local variational technology. Jin et al. [59] provide a stochastic model to accommodate the relative importance and the expected degree of every node in each community, and improve the inference technique that it uses.

Link communities are often more informative and intuitive than node communities, because links usually have unique identities, whereas nodes may have multiple roles. For instance, in a social network, most individuals belong to multiple communities such as families or friends, while the link between two individuals often exists for a dominant reason, which may represent family ties or friendship. Furthermore, multiple links connecting to a node may belong to distinct link communities, so that the node can be assigned to multiple communities of links. He et al. [18] combine heterogeneous distributions (e.g., the power law distribution) of community sizes with SBM for link community detection. They suggest a stochastic model for link communities and extend the model by introducing a scheme of interactive bipartition. Besides the above models, Mehta et al. [60] introduce graph neural networks into SBM, integrating deep learning and SBM for the first time.

Topic model. A topic model, such as Latent Dirichlet Allocation (LDA) [63], is a statistical model capable of modeling the hidden topics behind texts in natural language processing. LDA models topics by employing latent variables, which has attracted significant interest and has been widely used in detecting communities. Topic models can be grouped into two categories: one models network structures as documents and the other models attributes of the network, such as user interests, to detect communities.

Modeling network structures as documents.
We take LDA as an example to describe the principle of the methods in the first category. To be specific, a method in this group first assumes that each node in a network may belong to multiple communities, and thus the communities are regarded as "topics" while the nodes are taken as "documents". It then selects several initial communities, and iteratively updates the communities according to the topology of the network to obtain the resulting communities. Among the existing methods, a representative model is SSN-LDA [64], which is an LDA-based hierarchical Bayesian algorithm on link networks where communities are modeled as latent variables. Nodes in such a social network are regarded as social actors and edges as social interactions. The social interaction profile (SIP) of each social actor, consisting of a set of neighbors and weights, is used to characterize the actor. Specifically, in SSN-LDA, a social network is viewed as a corpus, where social interaction profiles are regarded as documents and the occurrences of social interactions are deemed as words. The nodes are modeled as a corpus by SSN-LDA, which mines communities on the transformed corpus; this problem is equivalent to topic detection on a corpus utilizing LDA. The generation process of SSN-LDA for one social interaction profile (SIP_i) is clarified in Appendix C [64], and the joint distribution is written as:

P(a_i, c_i, θ_i, φ | α, β) = ∏_{j=1}^{N_i} P(a_ij | φ_{c_i}) P(c_i | θ_i) P(θ_i | α) P(φ | β),   (7)

where φ is the mixture component of community c_i, N_i is the number of social interactions in a social interaction profile (SIP_i), θ_i is the community mixture proportion for SIP_i, and α and β are the Dirichlet prior distribution hyper-parameters that are known.

Using social network attributes. Numerous topic models utilize attributes of a social network, e.g., user interests, to discover communities. Yin et al. [65] propose to integrate community detection and topic modeling, which gives rise to latent community topic analysis (LCTA). Their method divides the sampling process into user node and link samplings. The process is to sample all network connections after sampling a user node, and exploit the sampling results of these two stages as the sampling result of the user node. LCTA assigns community membership attributes to each user node and link. After the sampling process, user nodes can be assigned to communities based on community membership. The advantage is that the two-stage sampling process forms a sampling area with user nodes as the core, which can simulate the semantic influence of user nodes on the surrounding links. The disadvantage is that LCTA does not consider the link relationships of the social network when assigning the degree of community membership, which may disconnect individual communities. Further, Cha et al. [66] design a tree relationship model according to the topic information of followers in a social network, use hierarchical LDA to model the text information in the tree relationship model, and propose HLDA for semantic social network analysis.

A method combining the topic model with a Bayesian model was proposed recently by Xu et al. [67]. They define a joint probability distribution over all possible attributed networks. For a given attributed social network to be clustered, the model assigns a probability to each possible clustering of nodes. Therefore, the clustering problem can be transferred to the problem of finding the clusters that have the highest probability. The algorithm for clustering attribute communities is shown in Appendix C [67].
The Bayesian probabilistic model for clustering attributed networks is as follows:

P(α, θ, φ, A, X, C | ε, λ, µ, ν) = P(α | ε) P(θ | λ) P(φ | µ, ν) P(C | α) P(A | C, φ) P(X | C, θ),   (8)

where α denotes the probability of the nodes belonging to different communities, θ represents the attribute probability distribution of the nodes, φ denotes the edge occurrence probability between communities, ε and λ are the Dirichlet prior distribution hyper-parameters, and µ and ν are the Beta prior distribution hyper-parameters.

Later, He et al. [68] introduce a generative model for simultaneously identifying communities and deriving their semantic descriptions. They combine a nested EM algorithm with belief propagation, and explore the hidden correlation between the two parts to improve the resulting communities and descriptions. The method proposed by Jin et al. [69] differs significantly from the other existing methods. They observe that the attributes usually embody a hierarchical semantic structure. To handle this, they propose a novel Bayesian model named BTLSC, which distinguishes words from background and general from specialized topics. This model comprises three components: a topological component for describing network communities, a context component for describing semantics, and a probabilistic transition machine linking the first two components.

Unlike traditional topic models that assume the topics of a social network are independent, topic embedding methods focus on describing correlations between topics by embedding words and topics into topic models. He et al. [70] present a topic embedding model that combines distributed representation learning with topic correlation modeling. Jin et al. [71] develop a novel topic embedding model named community-enhanced topic embedding (CeTe), which combines topic documents and network structures to detect communities.
CeTe consists of three components: a document component for describing topics, a topological component for representing network communities, and a probabilistic transition mechanism connecting the first two parts. Specifically, CeTe uses a DCSBM to describe the sub-component of network communities, where communities obey a Dirichlet distribution and topics obey a Uniform distribution. For each document, the community assignment is drawn from a Multinomial distribution, whereas the link between two documents obeys a Bernoulli distribution. For each word, CeTe draws the topic distribution following a Multinomial distribution.

Non-negative matrix factorization (NMF) [72] is another directed graphical model for community detection. Specifically, the NMF-based methods assume there are k communities in a network, and deem the adjacency matrix A = (a_ij)_{n×n} ∈ R_+ as a non-negative matrix to be decomposed, where a_ij denotes the likelihood that there is a connection between nodes v_i and v_j. We define W = (w_ir)_{n×k} ∈ R_+ and H = (h_jr)_{n×k} ∈ R_+, whose elements w_ir and h_jr represent the likelihoods that v_i generates an out-edge, i.e., an edge starting from v_i, and v_j generates an in-edge, i.e., an edge ending at v_j, belonging to the r-th community. Then, the likelihood that nodes v_i and v_j are connected can be described as:

f(a_ij) = Σ_{r=1}^{k} w_ir h_jr.   (9)

As a result, the community detection problem can be represented as Ã = WH^T. In general, there are two classic loss functions to evaluate the performance of NMF. The first is the square of the Frobenius norm of the difference between A and Ã [73], which is defined as:

J = min_{W≥0, H≥0} ‖A − WH^T‖_F^2.   (10)

The second is the KL-divergence that measures their difference, which is described as:

J = KL(A ‖ WH^T).   (11)

Furthermore, for an undirected network with A being symmetric, the non-negative factorization matrices W and H should be equal.
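Loss (10) can be minimized with the classic multiplicative update rules of Lee and Seung; nodes are then read off by taking, for each row of W, the community with the largest factor. The sketch below (function name, toy network and iteration counts are our own illustrative choices) factorizes a small two-triangle network:

```python
import numpy as np

def nmf_communities(A, k, iters=300, seed=0):
    """Factorize A into non-negative W, H (both n x k), approximately
    minimizing ||A - W H^T||_F^2 with multiplicative updates.
    Node i is assigned to community argmax_r W[i, r]."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    W = rng.random((n, k)) + 0.1
    H = rng.random((n, k)) + 0.1
    eps = 1e-9  # guard against division by zero
    for _ in range(iters):
        W *= (A @ H) / (W @ (H.T @ H) + eps)
        H *= (A.T @ W) / (H @ (W.T @ W) + eps)
    return W, H

# Toy undirected network: two triangles {0,1,2} and {3,4,5} joined by edge (2,3).
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

W, H = nmf_communities(A, k=2)
labels = W.argmax(axis=1)  # community assignment of each node
```

The multiplicative form keeps W and H non-negative throughout, which is what lets the factors be interpreted as (unnormalized) community membership strengths.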
In this paper, we use B to represent these matrices, and equation (10) can be rewritten as:

J = min_{B≥0} ||A − BB^T||_F². (12)

NMF was initially used to identify non-overlapping communities. Since it is easily extendable, NMF has been adopted to solve other types of community detection problems, such as overlapping, attributed, dynamic and semi-supervised ones, as summarized in Table 2. In the rest of this section, we discuss them in detail.

Basic NMF. Kuang et al. [74] propose a general approach for graph clustering, which inherits the advantages of NMF by enforcing non-negativity on the clustering assignment matrix. Shi et al. [75] present a novel pairwise constrained non-negative symmetric matrix factorization (PCSNMF) method, which imposes pairwise constraints generated from ground-truth community information to improve the performance of community detection. Sun et al. [76] design a non-negative symmetric encoder-decoder approach to derive a better latent representation for community detection. Unlike other NMF-based methods that merely pay attention to the loss of the decoder, they combine the losses of the decoder and the encoder into a unified loss function, so that the community membership of each node is clearer and more explanatory.

Overlapping NMF. Overlapping community detection is another active research topic due to the overlapping and nesting properties of real-world networks. Wang et al. [21] develop an NMF framework to identify non-overlapping and overlapping community structures, and give a symmetric NMF formula for undirected networks. Moreover, they clarify the methods of asymmetric NMF and joint NMF, where the former is capable of identifying community structures in directed networks, while the latter is more suitable for compound networks (e.g., an automatic movie recommendation system which contains three networks: a user network, a movie network and a user-movie network). Yang et al.
[77] present a cluster affiliation model, BIGCLAM, to detect densely overlapping, hierarchically nested, and non-overlapping communities in massive networks. Specifically, BIGCLAM first builds communities based on the community affiliations of nodes, i.e., each node has an affiliation strength to each community, obtained by assigning each node-community pair a non-negative latent factor, and then combines NMF with block stochastic gradient descent to estimate the non-negative latent factors so as to detect communities in large networks. The loss function of the model is defined as:

J = Σ_{e_ij ∈ E} log(1 − exp(−b_i b_j^T)) − Σ_{e_ij ∉ E} b_i b_j^T. (13)

Besides the methods discussed earlier, Cao et al. [78] propose a novel method called community detection with non-negative matrix factorization (CDNMF), which can not only identify overlapping communities, but also efficiently detect outlier nodes and hubs in a community. Zhang et al. [79] propose a preference-based non-negative matrix factorization (PNMF) that incorporates implicit link preference information. Based on the fact that most nodes prefer to link to their neighbors in the same community, PNMF maximizes the likelihood of the order of preferred connections for each node. The loss function is formulated as follows:

J = max_{B ∈ R_+^{n×k}} Π_{v_i ∈ V} p(>_{v_i} | B), (14)

where >_{v_i} denotes the observed preferences of node v_i. The likelihood of the preference order for a single node v_i, i.e., p(>_{v_i} | B), is described as:

p(>_{v_i} | B) = Π_{(v_j, v_k) ∈ V×V} p(v_j >_{v_i} v_k | B)^{τ(v_j ∈ N+(i)) τ(v_k ∈ N−(i))} · (1 − p(v_j >_{v_i} v_k | B))^{1 − τ(v_j ∈ N+(i)) τ(v_k ∈ N−(i))}, (15)

where N+(i) denotes the set of node v_i's neighbors, and N−(i) denotes the set of node v_i's non-neighbors. In addition, τ(·) is a binary indicator function: if the condition in τ is true, such as v_j ∈ N+(i), the value is 1, and 0 otherwise. Further, they develop a new homophily-based method, which clarifies how the community membership of a node is represented by its linked neighbors via modeling the two-way relationship between links and communities [80].

TABLE 2: Summary of NMF-based community detection, where "AD_k" describes whether the approach can automatically determine the number of communities (Yes or No). Each entry lists: approach (sketch); objective function; overlapping; AD_k.

Basic:
- SymNMF (2012) [74] (develops a symmetric NMF framework based on Newton-like methods for graph clustering); ||A − BB^T||_F²; overlapping: No; AD_k: No.
- PCSNMF (2015) [75] (presents a symmetric NMF method with pairwise constraints generated from the ground-truth community information); ||A − BB^T||_F² + α[Tr(B^T M B Q) + Tr(B^T P B)]; overlapping: No; AD_k: No.
- NSED (2017) [76] (proposes a non-negative symmetric encoder-decoder approach to obtain a better network representation); ||A − WH||_F² + ||H − W^T A||_F²; overlapping: No; AD_k: No.

Overlapping:
- SNMF, ANMF, JNMF (2011) [21] (first application of NMF to community detection); ||A − BB^T||_F²; overlapping: Yes; AD_k: No.
- BIGCLAM (2013) [77] (cluster affiliation model for overlapping, hierarchically nested community detection in large-scale networks); Σ_{e_ij∈E} log(1 − exp(−b_i b_j^T)) − Σ_{e_ij∉E} b_i b_j^T; overlapping: Yes; AD_k: Yes.
- CDNMF (2013) [78] (NMF model detecting outlier nodes and hubs in a community besides identifying community structure); ||A − BRB^T||_F²; overlapping: Yes; AD_k: No.
- PNMF (2015) [79] (preference-based NMF model containing implicit link preference information); max_{B∈R_+^{n×k}} Π_{v_i∈V} p(>_{v_i}|B); overlapping: Yes; AD_k: No.
- HNMF (2016) [80] (homophily-based NMF method modeling two-sided relationships between links and communities); ||A − BB^T||_F²; overlapping: Yes; AD_k: No.

Attribute:
- NMTF (2015) [81] (NMF clustering framework combining nodes' relations and users' contents); ||M_{u−u} − UHU^T||_F² + ||M_{t−f} − VHN^T||_F² + ||M_{u−f} − UHN^T||_F²; overlapping: No; AD_k: No.
- SCI (2016) [82] (semantic community identification method, which annotates semantics as well as detecting communities); ||B − XS||_F² + α Σ_{r=1}^k ||S(:,r)||₁ + β||A − BB^T||_F²; overlapping: No; AD_k: No.

Dynamic:
- DBNMF (2016) [83] (Bayesian probabilistic model based on NMF to identify overlapping communities in temporal networks); −log P(V_t|B_t) − log P(B_t|B″_{t−1}, α) − log P(B_t|β_t) − log P(β_i); overlapping: Yes; AD_k: Yes.
- sE-NMF (2017) [84] (semi-supervised evolutionary NMF framework for dynamic community detection via prior information); ||A_t − B̃_t B̃_t^T||_F²; overlapping: No; AD_k: No.

Semi-supervised:
- USSF (2015) [85] (unified semi-supervised community detection algorithm combining prior and topology information in NMF); L_α(A, B) + λ R_β(O, B); overlapping: No; AD_k: No.
- PSSNMF (2017) [86] (semi-supervised NMF method); ||A − BB^T||_F² + λ Q(B); overlapping: Yes; AD_k: No.

Attribute NMF. Recently, exploiting the semantic information of community structure with NMF, i.e., delineating the corresponding community semantics while identifying community structure, has attracted a substantial amount of interest [81], [82], [87]. In particular, Pei et al. [81] combine the social relations and content of users to detect communities via non-negative matrix tri-factorization (NMTF)-based clustering with three types of graph regularization. Here, NMTF can combine the relations and content seamlessly, and graph regularization can explicitly capture user similarity, message similarity and user interaction. However, this method merely exploits network topology and content information to discover communities, without considering how to utilize the mined contents, i.e., semantic information, to explain the meaning of communities.
To address this issue, Wang et al. [82] propose a semantic community identification method called SCI, which integrates the community membership matrix denoting network topology and the community attribute matrix representing semantic information. Their approach not only conducts community detection effectively, but also annotates communities with semantic information to make the results easily interpretable. The loss function of the model is defined as:

min_{B≥0, S≥0} J = ||B − XS||_F² + α Σ_{r=1}^k ||S(:,r)||₁ + β ||A − BB^T||_F², (16)

where S represents the attribute community matrix, α is a trade-off hyper-parameter between the first error term and the second sparsity term, and β is a positive parameter for setting the proportion of the contribution of network topology.

Dynamic and semi-supervised NMF. It deserves further attention that, in recent years, several investigators have extended NMF to dynamic and semi-supervised community detection and achieved encouraging results. For dynamic community detection, Wang et al. [83] utilize a Bayesian model based on NMF to identify overlapping communities in temporal networks, and automatically derive the number of communities in each snapshot network based on automatic relevance determination. The loss function is as follows:

J_t = −log P(V_t | B_t) − log P(B_t | B″_{t−1}, α) − log P(B_t | β_t) − log P(β_i), (17)

where V_t is a snapshot of a temporal network, B_t is the non-negative matrix obtained from V_t, and B″_{t−1} is the new B_{t−1} adjusted according to the node distribution of B_t. β_(·) is a parameter from a half-normal distribution, and α is a parameter to balance the clustering results of the current and previous snapshot networks.
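Must-link priors of the kind used by the semi-supervised variants in Table 2 are commonly encoded as a graph-regularization penalty Tr(B^T L B) built from a prior matrix O. A toy sketch (hypothetical membership matrices) showing that the penalty vanishes exactly when the partition respects the prior:

```python
import numpy as np

def must_link_penalty(B, O):
    """Tr(B^T L B) with L = D - O: small when must-linked nodes have
    similar community-membership rows in B."""
    L = np.diag(O.sum(axis=1)) - O  # Laplacian of the prior graph
    return float(np.trace(B.T @ L @ B))

# Hypothetical prior: v_0 and v_1 are known to share a community.
O = np.zeros((4, 4))
O[0, 1] = O[1, 0] = 1.0

B_good = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])  # respects prior
B_bad  = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])  # violates it
```

Since Tr(B^T L B) equals half the O-weighted sum of squared row differences of B, the penalty is 0 for B_good and strictly positive for B_bad, which is precisely why adding it to an NMF loss pulls must-linked nodes into the same community.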
Later, Ma et al. [84] show that NMF can be applied to dynamic community detection by clarifying the equivalence among evolutionary spectral clustering, evolutionary NMF and the optimization of evolutionary modularity density. They employ this equivalence to develop a semi-supervised evolutionary NMF method, named sE-NMF, which integrates prior information to detect communities in dynamic temporal networks.

For semi-supervised community detection, Yang et al. [85] put forward a unified semi-supervised algorithm that combines prior information and topology information in the two non-negative matrices generated by NMF. Moreover, with the must-link prior information (i.e., the prior knowledge that a pair of nodes must belong to the same community [93]), they add a graph regularization term as a penalty to the loss function to minimize the difference between nodes in the same community, thereby improving the performance of community detection. The loss function is defined as:

J(B | A, O) = L_α(A, B) + λ R_β(O, B), (18)

where O denotes the matrix of prior information, L_α(A, B) is the loss function of NMF, and α ∈ {LSE, KL, SYM, MOD, ADJ, LAP, NLAP} is the parameter specifying the similarity measure. λ R_β(O, B) is a graph regularization term, where λ is the trade-off parameter between the loss function and the graph regularization term, and β ∈ {LSE, KL} selects the specific graph regularization term:

R_β(O, B) = Tr(B^T L B) if β = LSE; Σ_{i,j} o_ij (KL(b_i || b_j) + KL(b_j || b_i)) if β = KL, (19)

where L is the Laplacian matrix, KL(·) is the KL-divergence, and o_ij ∈ O. Liu et al.
[86] propose a semi-supervised NMF method named SSNMF, which integrates a graph regularization representation and pairwise constraints into NMF. Since the inherent geometric structures of nodes belonging to the same community may easily be lost when high-dimensional data are mapped into a low-dimensional space, they introduce node popularity parameters so that the prior information better facilitates community detection. The loss function is defined as:

J = ||A − BB^T||_F² + λ Q(B), (20)

where Q(B) denotes the semi-supervised graph regularization term for the non-negative matrix B, and λ is a balance parameter between the loss function and the graph regularization term.

To the best of our knowledge, the existing studies of undirected graphical models for community detection mainly exploit the Markov random field (MRF) [94]. MRF, a kind of random field, has enjoyed much success in a variety of applications, such as computer vision and image processing. Here, we are particularly interested in its applications to community detection. The MRF-based methods can be grouped into two categories (as summarized in Table 3): one models network topology with MRF to detect community relations, and the other additionally exploits the information of semantic attributes.

TABLE 3: Summary of MRF-based community detection. Each entry lists: approach (sketch); objective function.

Topology:
- NetMRF (2018) [31] (first application of MRF to community detection); Σ_{v_i ≠ v_j} [−(−1)^{δ(c_i, c_j)} (d_i d_j / 2m − a_ij)].
- GMRF (2019) [88] (optimizes network embedding and develops a general MRF framework incorporating network embedding into MRF to better detect community structure); Σ_{v_i} Θ_i(c_i) + Σ_{e_ij∈E} Θ_ij(c_i, c_j).
- ModMRF (2020) [89] (MRF method formalizing modularity as the energy function for community detection); Σ_{v_i,v_j∈V} −(a_ij − d_i d_j / 2m) δ(c_i, c_j).

Topology & attribute:
- attrMRF (2019) [90] (integrates LDA into MRF to form an end-to-end learning system for community detection); Σ_{v_i≠v_j} Θ_ij(c_i, c_j; a_ij) − Σ_{r=1}^n (1/β) ln f_{θ_r} − Σ_{p=1}^q (1/β) ln f_{φ_p}.

Combining GNN:
- MRFasGCN (2019) [91] (combines GCN and MRF for semi-supervised community detection); −Σ_{i=1}^{n′} Σ_{j=1}^k Y_ij ln Z_ij.
- GMNN (2019) [92] (combines the advantages of statistical relational learning and graph neural networks for semi-supervised node classification); E_{q_θ(y_U|x_V)}[log p_φ(y_L, y_U|x_V) − log q_θ(y_U|x_V)].

Topology MRF. He et al. [31] first apply MRF to network analysis where data are organized on networks with irregular structures, and propose a network-specific MRF approach, namely NetMRF, for community detection. This method effectively encodes the structural properties of an irregular network in an energy function, so that minimizing the energy function gives rise to the best community structure. The energy function can be represented as the sum of pairwise potential functions, written as follows:

E(C; A) = Σ_{v_i ≠ v_j} Θ_ij(c_i, c_j; a_ij) = Σ_{v_i ≠ v_j} [−(−1)^{δ(c_i, c_j)} (d_i d_j / 2m − a_ij)], (21)

where δ(c_i, c_j) indicates whether nodes v_i and v_j fall into the same community, d_i is the degree of node v_i, and m is the number of edges. According to [94], the smaller the energy, the better the community partition. Further, Jin et al. [89] formalize the modularity function as a statistical model and propose a novel MRF method for community detection. This method redefines the energy function via a modularity representation, and leverages max-sum belief propagation (BP) to infer the model parameters and improve the performance. The energy function is represented as follows:

E(C; A) = Σ_{v_i, v_j ∈ V} −(a_ij − d_i d_j / 2m) δ(c_i, c_j). (22)

Moreover, to overcome the issue of losing vital structural information between nodes after network embedding, Jin et al.
[88] propose a general MRF method that incorporates the coupling relationships between pairs of nodes into network embedding to better detect communities. In this method, the energy function is composed of two parts: a set of unary potentials that let the network embedding play the dominant role, and a set of pairwise potentials that utilize constraints on node pairs to fine-tune the unary potentials. Formally, the complete energy function can be defined as:

E(C; A) = Σ_{v_i} Θ_i(c_i) + Σ_{e_ij ∈ E} Θ_ij(c_i, c_j), (23)

where Θ_i and Θ_ij are the unary and pairwise potential functions, respectively.

Topology & attribute MRF. The combination of MRF and node semantic models (e.g., topic models) has been a recent research focus. However, methods that directly integrate MRF with node semantic models cannot in general achieve satisfactory results, mainly because the parameters of the two models cannot be adjusted to support each other, making it difficult to combine the advantages of the two approaches. He et al. [90] propose a new model, named attrMRF, to integrate LDA [63] and MRF into an end-to-end learning system that trains the parameters jointly. Concretely, attrMRF first transforms LDA and MRF into a unified factor graph, realizing an effective integration of the directed graphical model (i.e., LDA) and the undirected graphical model (i.e., MRF). It then adopts a backpropagation (BP) algorithm to train the parameters simultaneously, resulting in end-to-end learning of the two models. The global energy function of this model is represented as:

E(Z, C; A, X, α, β) = Σ_{v_i ≠ v_j} Θ_ij(c_i, c_j; a_ij) − Σ_{r=1}^n (1/β) ln f_{θ_r} − Σ_{p=1}^q (1/β) ln f_{φ_p}, (24)

where Σ_{v_i ≠ v_j} Θ_ij(c_i, c_j; a_ij) denotes the global energy potential of MRF as defined in (21), β is a temperature coefficient, and f_{θ_r} and f_{φ_p} are the intermediate results generated by the LDA joint probability distribution.
Besides attrMRF, there are also several approaches that incorporate probabilistic graphical models into deep learning, such as MRFasGCN [91] and GMNN [92], which will be covered in detail later.

Directed and undirected graphical models have also been integrated to detect communities in complex networks. This type of integration is typically implemented via a factor graph model. A factor graph [95] is a tuple (V, F, ε) consisting of a set V of variable nodes, a set F of factor nodes, and a set ε ⊆ V × F of edges, each of which connects a variable node and a factor node. Taking MRF as an example, the joint probability distribution of a factor graph is described as:

p(y) = (1/Z) Π_{F ∈ F} ψ_F(y_{N(F)}), (25)

where Z = Σ_{y ∈ Y} Π_{F ∈ F} ψ_F(y_{N(F)}) denotes a normalization factor, and N(F) = {v_i ∈ V : (v_i, F) ∈ ε} is the set of variable nodes adjacent to factor node F.

Yang et al. [96] first propose an instantiation model based on factor graphs, which incorporates three layers: a bottom layer (observed nodes), a middle layer (hidden vectors) and a top layer (latent variables for communities). It utilizes node-feature and edge-feature functions to mine the dependencies between bottom- and top-layer nodes so as to better detect the corresponding communities. Further, Jia et al.
[97] apply the factor graph model to ego-centered networks (a kind of representation of human social networks, used to represent the network between an individual and the others with whom the ego has social relationships [98]), and propose an ego-centered method to analyze social academic influence on co-author networks. This method models ego-centered community detection in a unified factor graph, employing a parameter learning algorithm to estimate topic-level social influence, the strength of the social relationships between nodes, and the community structures, so as to detect ego-centered communities.

These methods merely identify the structure of communities and ignore their semantic information, which is critical for understanding the meaning of community structure. He et al. [90] employ a factor graph model to overcome the deficiency that the directed graphical model (i.e., LDA) and the undirected graphical model (i.e., MRF) are difficult to integrate due to parameter sharing and joint training, and to make the discovered community structure semantically interpretable. The joint probability distribution of MRF and LDA formulated in a factor graph is written as:

P(Z, C; A, X, α, β) = (1/Z) Π_{r=1}^n f_{θ_r} Π_{p=1}^q f_{φ_p} Π_{v_i ≠ v_j} f_{γ_ij}, (26)

where Z denotes the normalization term, f_{θ_r} and f_{φ_p} are defined in (24), and f_{γ_ij} is the pairwise potential of nodes v_i and v_j. Their major contribution lies in adopting this fusion of MRF and LDA for community detection, which overcomes, via factor graphs and belief propagation, the difficulty that the two models' parameters are hard to share and train together.

The emergence of factor graph models that integrate directed and undirected graphical models has greatly improved the performance of community detection.
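The normalization factor Z in (25) sums over every joint assignment, which is the root of the computational burden discussed next. A brute-force sketch for a tiny binary factor graph (hypothetical potentials, chosen only for illustration):

```python
import itertools
import numpy as np

# Factor-graph distribution of (25) for three binary variables and two
# pairwise factors. psi rewards agreeing neighbors (hypothetical values).
def psi(yi, yj):
    return 2.0 if yi == yj else 1.0

factors = [(0, 1), (1, 2)]  # each factor touches a pair of variables

def unnormalized(y):
    return np.prod([psi(y[i], y[j]) for i, j in factors])

# Z sums the product of potentials over all 2^3 joint assignments --
# exactly why exact inference scales exponentially with the number of
# variables, motivating belief propagation and sampling.
states = list(itertools.product([0, 1], repeat=3))
Z = sum(unnormalized(y) for y in states)
p = {y: unnormalized(y) / Z for y in states}
```

Even at this toy scale the all-agree assignments (0,0,0) and (1,1,1) get the highest probability, mirroring how pairwise potentials push adjacent nodes into the same community.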
However, these probabilistic graphical models generally adopt variational inference or Markov chain Monte Carlo (MCMC) sampling for model optimization, which inevitably leads to high computational complexity. Deep learning, with the ability to effectively optimize over high-dimensional network data, has great potential for community detection.

COMMUNITY DETECTION WITH DEEP LEARNING

In recent years, deep learning has drawn a great deal of attention and has been demonstrated to have great power on a wide variety of problems, including community detection. Classic deep learning explores and exploits convolutional neural networks (CNNs) and probabilistic modeling for community detection. For example, Sperlì et al. [32] design a novel approach for automatic community detection based on CNNs and the topological characteristics of adjacency matrices. Sun et al. [115] propose a probabilistic generative model, vGraph, to jointly detect overlapping (and non-overlapping) communities and learn node (and community) representations. vGraph represents each node by a mixture of communities and defines a community as a Multinomial distribution over nodes.

Although these methods have shown reasonable performance in discovering communities, they are straightforward applications of deep learning to community detection [116] that do not consider the characteristics of networks, e.g., the irregularity of network topology and complex network structures. In this section, we discuss four types of methods designed for complex networks: auto-encoder-based methods, generative adversarial network-based methods, graph convolutional network-based methods, and methods integrating graph convolutional networks and undirected graphical models.

Auto-encoders [117] are simple but important neural models that convert high-dimensional (network) data into low-dimensional representations.
Concretely, auto-encoders learn a new representation of the data in an unsupervised manner using an encoder and a decoder. They usually have multiple hidden layers and a symmetric architecture, where the output of one layer is the input to the next. The objective of auto-encoders is to minimize the error between the original input and the reconstructed data so as to learn an optimal hidden representation:

Loss(θ₁, θ₂) = Σ_{i=1}^n l(x_i, g(f(x_i; θ₁); θ₂)), (27)

where f(·; θ₁) and g(·; θ₂) are the encoder and decoder with parameters θ₁ and θ₂, and l(·) is the loss function.

Herein, we choose several representative auto-encoder-based models for network community detection and summarize their main characteristics in Table 4. Since most auto-encoder-based methods produce network embeddings as their outputs (e.g., [99], [102]), clustering, such as K-means or spectral clustering, is subsequently applied to extract communities. An alternative is to integrate clustering into the model (e.g., [104], [107]) to directly discover communities. Depending on the type of auto-encoder used, we divide the models into four types, namely stacked, sparse, denoising and variational auto-encoders. The stacked auto-encoder, a basic type that consists of a series of auto-encoders, is often used as a building block for the other types. In particular, when a stacked model has additional objectives, such as sparsity or denoising, we classify it as a sparse or denoising auto-encoder.

Stacked auto-encoders. Semi-DNR [99] stacks a sequence of auto-encoders to form a deep nonlinear reconstruction (DNR) of the input network, and requires each layer of the encoder to contain fewer neurons than the previous layer, in order to reduce the data dimension and extract the most salient features of the input. Semi-DNR makes full use of prior knowledge of whether v_i and v_j belong to the same community by incorporating pairwise constraints between the two nodes in the network.
TABLE 4: Summary of auto-encoder-based community detection, where "A" and "X" denote whether the approach utilizes network topology and node attributes, respectively, and "-" represents no constraint. Each entry lists: approach; A; X; encoder; decoder; focus; constraints.

Stacked:
- Semi-DNR (2016) [99]; A: Yes; X: No; encoder: MLP; decoder: MLP; focus: network embedding; constraint: pairwise constraint.
- DIR (2017) [100]; Yes; No... (A: Yes; X: Yes); MLP; MLP; network embedding; -.
- INSNCCD (2018) [101]; A: Yes; X: Yes; MLP; MLP; network embedding; modularity maximization.
- AAGR (2018) [102]; A: Yes; X: Yes; MLP; MLP; network embedding; adaptive parameter.
- CDDTA (2019) [103]; A: Yes; X: No; MLP; MLP; network embedding; regularization term.
- DeCom (2019) [104]; A: Yes; X: No; MLP; MLP; clustering result; modularity maximization.
- NEC (2020) [105]; A: Yes; X: Yes; GCN; inner product; embedding and clustering; modularity maximization.

Sparse:
- GraphEncoder (2014) [106]; A: Yes; X: No; MLP; MLP; network embedding; sparsity constraint.
- DFuzzy (2018) [107]; A: Yes; X: No; MLP; MLP; clustering result; sparsity constraint and modularity maximization.
- CDMEC (2020) [108]; A: Yes; X: No; MLP; MLP; clustering result; sparsity constraint.

Denoising:
- MGAE (2017) [24]; A: Yes; X: Yes; GCN; GCN; network embedding; interplay exploitation.
- GRACE (2017) [109]; A: Yes; X: Yes; MLP; MLP; embedding and clustering; propagation constraint.

Variational:
- ARVGA (2018) [110]; A: Yes; X: Yes; GCN; inner product; network embedding; prior constraint.
- VGAECD (2018) [111]; A: Yes; X: Yes; GCN; inner product; embedding and clustering; -.
- DAEGC (2019) [112]; A: Yes; X: Yes; GAT; inner product; embedding and clustering; KL divergence constraint.
- New VGAECD (2019) [113]; A: Yes; X: Yes; GCN; inner product; embedding and clustering; -.
- NetVAE (2019) [114]; A: Yes; X: Yes; MLP; MLP; network embedding; prior constraint.

Specifically, it defines a prior information matrix O = (o_ij)_{n×n}, where o_ij = 1 if v_i and v_j are known to be in the same community, and 0 otherwise. The loss function for semi-DNR is represented as:

Loss = l(M, Z) + λ Tr(H^T L H), (28)

where L = D − O, Tr(·) is the trace of a matrix, M the modularity matrix, Z the reconstructed data, H the representation matrix and λ a parameter
for making a trade-off between the reconstruction error and the consistency of the new representation with the prior information. Further, the layer-wise stacked auto-encoder in DeCom [104] is adopted to find seed nodes and add nodes to communities according to the structure of the network. Remarkably, DeCom is suitable for handling large networks, and there is no need to pre-define the number of communities thanks to its adaptive learning process. Besides, CDDTA [103] effectively combines transfer learning and auto-encoders. AAGR [102] and DIR [100] utilize stacked auto-encoders to incorporate topology and attribute information adaptively, thus realizing a balance between network topology and node attributes. NEC [105] employs graph convolutional networks to encode and decode network data; it takes topology and attribute information as input, but only reconstructs the adjacency matrix, to ensure that the model still works without node attributes.

Sparse auto-encoders. Large-scale networks are in general difficult to store and process, so a sparse representation is necessary. A new line of research adaptively finds the optimal representation by adding a sparsity constraint to the auto-encoder. GraphEncoder [106] introduces an explicit regularization term for the hidden layer to restrict the size of the hidden representation. If z_i is the i-th vector of the reconstructed data, the reconstruction error with the sparsity constraint is as follows:

Loss = Σ_{i=1}^n ||z_i − x_i||² + β KL(ρ || ρ̂), (29)

where β controls the sparsity penalty, and ρ and ρ̂ are the sparsity parameters, the former denoting the target average activation of a neuron across a collection of training samples and the latter the measured average activation across all training samples. DFuzzy [107] is a parallel and scalable fuzzy clustering model with sparse auto-encoders as building blocks.
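The KL sparsity penalty in (29) can be computed directly; a small sketch with illustrative activation values (ρ̂ is clipped purely for numerical safety):

```python
import numpy as np

def kl_sparsity(rho, rho_hat):
    """KL(rho || rho_hat) summed over hidden units: the sparsity penalty
    of (29), zero when measured activations match the target rho."""
    rho_hat = np.clip(rho_hat, 1e-12, 1 - 1e-12)  # avoid log(0)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rho = 0.05                                             # mostly-inactive target
penalty_ok = kl_sparsity(rho, np.array([0.05, 0.05]))  # matches target
penalty_bad = kl_sparsity(rho, np.array([0.5, 0.5]))   # far too active
```

The penalty grows the further the measured activations drift from the target, which is what drives most hidden units toward near-zero activation during training.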
It trains the auto-encoder using personalized PageRank, which is effective for capturing relationships among network nodes. Besides, CDMEC [108] combines transfer learning with auto-encoders, where the input matrix A is used to build four similarity matrices of the complex network. CDMEC takes one matrix as the source domain and the other three as target domains to obtain multiple distinct low-dimensional feature representations. All representations are then fed into a clustering algorithm, and the clustering results are integrated into a new consensus matrix Q, introduced to measure the co-occurrence of samples in the clustering results, where Q_ij represents the average number of times that v_i and v_j are grouped into the same class.

Denoising auto-encoders. Denoising auto-encoders can be applied to noisy inputs to obtain node representations that are robust to noise. MGAE [24] first employs a convolutional network to integrate content and structure information, and then iteratively adds random noise to the content information in the auto-encoding process. In this way, structure and content information are integrated into a unified framework, and the interplay between the two can be analyzed. Further, Yang et al. [109] propose GRACE to deal with dynamic networks. They model clusters under the consideration of network dynamics, arguing that the formation of clusters requires the dynamic embedding to reach a stable state.

Variational auto-encoders. There are also approaches based on the variational auto-encoder [118], which views the hidden representation as a latent variable with its own prior distribution. Variational inference exploits an approximation q(H|X) of the true posterior p(H|X) of the latent variable, and pushes the variational posterior q(H|X) towards the prior p(H) using the KL-divergence as a measure.
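For the common choice of a diagonal Gaussian posterior q = N(µ, diag(σ²)) and a standard normal prior, this KL term has a well-known closed form; a small numerical sketch with illustrative values:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(q || p) between q = N(mu, diag(exp(log_var)))
    and the standard normal prior p = N(0, I), summed over dimensions:
    -1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2)."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

# When q already equals the prior (mu = 0, sigma = 1), the KL vanishes.
kl_zero = kl_to_standard_normal(np.zeros(4), np.zeros(4))
# Shifting the posterior mean away from the prior is penalized.
kl_shifted = kl_to_standard_normal(np.full(4, 2.0), np.zeros(4))
```

This is the "enforce the latent codes to match a prior" term: minimizing it pulls the encoder's output distribution toward N(0, I) while the reconstruction term pulls in the opposite direction.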
For instance, the theme of ARVGA [110] is not only to minimize the reconstruction error of the network structure, but also to enforce the latent codes to match a prior distribution:

Loss = E_{q(H | (X,A))}[log p(Â | H)] − KL[q(H | X, A) || p(H)]. (30)

During the training of VGAECD [111], the reconstruction loss deviates from its primary objective of clustering. The new VGAECD [113] rectifies this issue by introducing a dual variational objective. To differentiate the two types of information, network topology and node attributes, Jin et al. [114] propose NetVAE, which uses one encoder and a dual decoder with two different generative mechanisms to reconstruct network topology and node attributes separately. The entire loss of this model consists of the reconstruction errors of the network structure and node attributes, a prior constraint, and the average energy required to reconstruct the network structure, obtained by introducing a Gaussian mixture model into the decoder.

Generative adversarial networks (GANs) [119], inspired by the minimax two-player game, have achieved unprecedented success in various fields. GANs typically consist of two modules, a generator G and a discriminator D. The generator aims to capture the data distribution, i.e., to generate samples that are as similar to the real data as possible, while the discriminator estimates the probability that a sample is a piece of real data rather than synthetic data produced by the generator.
Formally, the training process of GANs can be defined as:

min_G max_D V(G, D) = min_G max_D (E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]), (31)

where the first expectation is the loss of the discriminator on real data and the second is its loss on synthetic data generated by the generator.

The inspiration for applying GANs to community detection comes from the fact that GANs are usually unsupervised, and (in theory) the generated data have the same distribution as the real data, which provides a powerful capability for network data analysis. Jia et al. [26] propose a novel method called CommunityGAN, which adopts the idea of the affiliation graph model (AGM) and boosts performance by introducing a minimax competition between a network motif-level generator and a discriminator. It first composes representation vectors of nodes by assigning each node-community pair a non-negative factor that represents the degree of membership of the node to the community, and then optimizes these representations through a specifically designed GAN to detect communities. The joint value function is formulated as:

min_{θ_G} max_{θ_D} V(G, D) = Σ_{i=1}^n (E_{m∼p_true(·|v_i)}[log D(m; θ_D)] + E_{s∼G(s|v_i; θ_G)}[log(1 − D(G(s; θ_D)))]), (32)

where θ_D (resp. θ_G) is the union of the representation vectors of all nodes in the discriminator D (resp. generator G), m denotes the motifs of networks and s a subset of nodes. By employing GANs, CommunityGAN can find overlapping communities and learn a graph representation simultaneously. Further, Zhang et al. [27] present a novel approach of seed expansion with generative adversarial learning (SEAL).
SEAL employs a discriminator to predict whether a community is real and a generator that constructs communities to trick the discriminator by implicitly fitting the features of real communities, thereby learning heuristics for community detection. There are also GAN-based methods that derive node representations which can then be applied to community detection, e.g., by running clustering algorithms such as K-means on the learned embeddings to obtain the resulting communities [120], [121], [122]. He et al. [123] further argue that the existing GAN-based methods do not make full use of the essential advantage of GANs, which is to learn the underlying representation mechanism rather than the representation itself. To this end, they propose to apply the adversarial idea to the representation mechanism to acquire node representations for downstream tasks. Specifically, the training loss is defined as follows:

min_{E,G} max_D V(G, D, E) = E_{x∼p_data(x)}[log D(MI(x, E(x)))] + E_{z∼p_z(z)}[log(1 − D(MI(G(z), z)))], (33)

where E represents the encoder that derives the node representation, MI(x, E(x)) is the mutual information between the node attributes and the node representation, D is the discriminator that identifies whether the mutual information comes from positive or negative samples, and G is the generator that produces negative samples by calculating the mutual information on fake node attributes generated from Gaussian noise. Yang et al. [124] argue that most GANs compare the embedding results with samples drawn from a Gaussian distribution without rectification from real data, making them not truly beneficial for adversarial learning. Therefore, they design a joint adversarial network embedding (JANE) model, which jointly distinguishes real and fake combinations of embeddings, topology information and node attributes, to improve node embeddings and the performance of network analysis.
Graph convolutional networks (GCNs) [33], the most representative branch of graph neural network methods [125] for learning representations from graph data, have attracted a great deal of attention thanks to their success in supervised and semi-supervised classification of nodes in a network. Several novel GCN-based algorithms have also been developed lately to exploit the power of GCNs for effectively modeling and inferring high-dimensional, complex network data for community detection.

Jin et al. [34] raise the concern that embeddings derived from GCNs are not community-oriented and that community detection is inherently unsupervised. To address this problem, they introduce an unsupervised model, named JGE-CD, for community detection through joint GCN embedding. It consists of three modules: a dual encoder that derives two embeddings from the original attributed network and its variant; a community detection module stacked on top of the dual encoder to detect communities; and a topology reconstruction module employed to reconstruct the network topology. Formally, the probability that the i-th node belongs to the r-th community is defined as:

u_ir = exp(θ_r^T h_i) / Σ_{r'=1}^{k} exp(θ_{r'}^T h_i), (34)

where h_i represents the embedding of node v_i obtained from the GCN and θ the model parameters. Furthermore, He et al. [126] extend JGE-CD by designing a new GCN approach that casts MRFasGCN (to be discussed in Section 4.4 shortly) as an encoder and exploits a community-centric dual decoder to reconstruct network topology and node attributes separately, so as to perform unsupervised community detection. In particular, the decoder for reconstructing the network topology is defined as:

Â = sigmoid(D U W U^T D^T), (35)

where U is the probability distribution matrix of nodes belonging to different communities derived from the encoder, D the node degree matrix and W the weight matrix of the neural network.
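The membership softmax of Eq. (34) is a standard row-wise softmax over community logits; a generic numpy sketch (our own illustration, with random stand-ins for the GCN embeddings H and the community parameters Theta) is:

```python
import numpy as np

def community_membership(H, Theta):
    """Eq. (34): u_ir = exp(theta_r^T h_i) / sum_{r'} exp(theta_{r'}^T h_i).
    H is an (n, d) matrix of node embeddings; Theta is (d, k), one
    column of parameters per community."""
    logits = H @ Theta                             # (n, k) scores
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    expl = np.exp(logits)
    return expl / expl.sum(axis=1, keepdims=True)  # rows sum to 1

rng = np.random.default_rng(0)
U = community_membership(rng.normal(size=(5, 8)), rng.normal(size=(8, 3)))
# Each row of U is a probability distribution over the k = 3 communities.
```

Reading off a hard community assignment then amounts to an argmax over each row of U.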
The decoder for reconstructing attributes is inspired by topic modeling, i.e., nodes in the same community are more likely to have similar distributions of attribute words. The attribute matrix can be generated by:

X̂ = U · R, (36)

where the definition of U is the same as that in (35) and R is the probability matrix of communities selecting attribute words from the entire word set.

More recently, some studies of community detection make use of GCNs on heterogeneous networks that contain diverse types of nodes and relationships. Zheng et al. [127] design a heterogeneous-temporal GCN, namely HTGCN, to detect communities from heterogeneous and temporal networks. Concretely, it first obtains a feature representation of each heterogeneous network at each time step by adopting a heterogeneous GCN, and then utilizes a residual compressed aggregation mechanism to express both the static and dynamic characteristics of communities. Beyond that, there are also approaches incorporating graph convolutional networks with undirected graphical models, e.g., MRFasGCN [91] and GMNN [92], which will be discussed next.

In the last few years, a number of studies have begun to integrate graph convolutional networks (GCNs) and undirected graphical models (e.g., MRF or CRF) for community detection. The main idea of this line of research is that a GCN essentially constructs node embeddings through local feature smoothing, which does not consider community properties and makes the node embeddings not community-oriented. Undirected graphical models, in contrast, generally offer a good global objective to describe communities, but do not consider information on nodes and require a substantial amount of computation for learning the model. Therefore, GCNs and undirected graphical models are complementary and can be combined to take advantage of their respective strengths.
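The two decoders of Eqs. (35)-(36) can be sketched together in a few lines of numpy (our own illustration with random toy inputs, not the authors' code; the shapes and names are assumptions):

```python
import numpy as np

def dual_decode(U, W, R, degrees):
    """Sketch of the dual decoder: Eq. (35) reconstructs the adjacency
    matrix, Eq. (36) the attribute matrix.
    U: (n, k) community memberships, W: (k, k) weights,
    R: (k, w) per-community word distributions, degrees: (n,) node degrees."""
    D = np.diag(degrees)                       # node degree matrix
    logits = D @ U @ W @ U.T @ D.T
    A_hat = 1.0 / (1.0 + np.exp(-logits))      # sigmoid, Eq. (35)
    X_hat = U @ R                              # Eq. (36)
    return A_hat, X_hat

rng = np.random.default_rng(1)
n, k, w = 6, 2, 10                             # nodes, communities, words
U = rng.dirichlet(np.ones(k), size=n)          # rows: community memberships
W = rng.normal(size=(k, k))
R = rng.dirichlet(np.ones(w), size=k)          # rows: word distributions
A_hat, X_hat = dual_decode(U, W, R, degrees=rng.integers(1, 5, size=n))
```

Because each row of U and of R is a probability distribution, each row of X̂ is again a distribution over attribute words, matching the topic-modeling interpretation in the text.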
A major work in this line is MRFasGCN [91], which integrates GCN with MRF to solve the problem of semi-supervised community detection in attributed networks. The method first extends NetMRF (as discussed in Section 3.2) to an extended MRF (eMRF) by adding both unary potentials and attribute information, and then reparameterizes the MRF model to make it fit the GCN architecture. The energy function of eMRF is defined as:

E(C; A, X) = Σ_{v_i} −p(h_i^{c_i}) + α Σ_{v_i ≠ v_j} µ(c_i, c_j) η(v_i, v_j), (37)

where −p(h_i^{c_i}), whose value comes from the result of the GCN, denotes the unary potential representing the probability that node v_i belongs to community c_i; µ(c_i, c_j) η(v_i, v_j) is the pairwise potential, in which µ(c_i, c_j) represents the similarity relationship between the communities of nodes v_i and v_j and η(v_i, v_j) is the similarity of the attributes of nodes v_i and v_j; and α is a parameter trading off the unary and pairwise potentials.

After MRFasGCN, several other lines of work incorporate MRF or CRF into GCN to learn node embeddings for community detection. Qu et al. [92] propose a new approach, called graph Markov neural network (GMNN), that combines the advantages of both statistical relational learning and graph neural networks. A GMNN is able to learn an effective node representation and model the label dependency between different nodes, thereby completing the task of semi-supervised node classification. The model parameters can be learned by employing pseudolikelihood variational expectation-maximization [128] to optimize the evidence lower bound (ELBO) of the log-likelihood function, which is formulated as:

log p_φ(y_L | x_V) ≥ E_{q_θ(y_U | x_V)}[log p_φ(y_L, y_U | x_V) − log q_θ(y_U | x_V)], (38)

where log p_φ(y_L | x_V) is the log-likelihood function of the observed node labels and q_θ(y_U | x_V) is any distribution over y_U. Note that the equality holds when q_θ(y_U | x_V) = p_φ(y_U | y_L, x_V).
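To illustrate how an energy of the form of Eq. (37) is evaluated, the sketch below (our own toy example, not the MRFasGCN implementation) takes a GCN probability table P, a community-similarity matrix mu and an attribute-similarity matrix eta, and sums the pairwise term over all ordered node pairs for simplicity; the exact pair set and sign conventions in eMRF may differ:

```python
import numpy as np

def emrf_energy(c, P, mu, eta, alpha):
    """Toy evaluation of Eq. (37).
    c[i]      -- community label of node i
    P[i, r]   -- GCN probability that node i belongs to community r
    mu[r, s]  -- similarity relationship between communities r and s
    eta[i, j] -- attribute similarity between nodes i and j"""
    n = len(c)
    unary = -np.sum(P[np.arange(n), c])            # sum_i -p(h_i^{c_i})
    pairwise = sum(mu[c[i], c[j]] * eta[i, j]      # alpha-weighted pairs
                   for i in range(n) for j in range(n) if i != j)
    return unary + alpha * pairwise

P = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.3, 0.7]])
c = np.array([0, 0, 1])
mu = np.array([[0.0, 1.0],      # example: cost only across communities
               [1.0, 0.0]])
eta = np.ones((3, 3))
energy = emrf_energy(c, P, mu, eta, alpha=0.5)
# unary = -(0.9 + 0.8 + 0.7) = -2.4; pairwise sum = 4; energy = -0.4
```

Lower energy corresponds to more probable assignments, so inference seeks the labeling c minimizing E.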
Gao et al. [129] find that the existing GCNs fail to preserve the similarity relationships between different nodes hidden in the network data. To handle this issue, they add a CRF layer to GCNs to force similar nodes to have similar hidden features. This enhances the quality of node embeddings and, in turn, improves the performance of network analysis.

APPLICATIONS OF COMMUNITY DETECTION

We start our discussion with a summary of the benchmark datasets that have been used in the area of community detection. We then describe real applications of community detection in many application fields.

We have put the detailed information of the datasets used for community detection on a publicly accessible website to facilitate open research on this rapidly developing topic. These datasets can be separated into two groups: synthetic networks and real-world networks.

1. http://bdilab.tju.edu.cn/

There are two classes of randomly generated synthetic networks with known community structures, i.e., the Girvan-Newman (GN) [1] and LFR networks [130]. The GN network consists of four non-overlapping communities of the same size. Each community has 32 nodes, each of which connects with 16 other nodes on average. Among these 16 edges, Z_in edges connect to nodes of the same community and Z_out edges to nodes of different communities, with Z_in + Z_out = 16. The LFR network, another widely adopted benchmark for testing the performance of community detection algorithms, has distributions of node degree and community size that follow power laws with tunable exponents. The LFR network captures several important features of real-world systems, e.g., the scale-free property.

The real-world networks that we examine include four types, i.e., social networks, citation networks, collaboration networks, and others, listed in Table 6. To be specific, social networks are formed by individuals and their interactions, including eight representative datasets such as Football and DBLP (Table 6).
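The GN benchmark described above is straightforward to generate. The stdlib-only sketch below (our own illustration; the edge probabilities Z_in/31 and Z_out/96 follow from each node having 31 possible intra-community and 96 possible inter-community neighbors) samples one such network:

```python
import random

def gn_benchmark(z_in=12, seed=42):
    """Girvan-Newman benchmark: 128 nodes in 4 equal communities of 32,
    expected degree Z_in + Z_out = 16. Node i belongs to community i // 32."""
    rng = random.Random(seed)
    n, size = 128, 32
    p_in = z_in / (size - 1)            # within-community edge probability
    p_out = (16 - z_in) / (n - size)    # between-community edge probability
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if i // size == j // size else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges

edges = gn_benchmark(z_in=12)
# Expected number of edges is 128 * 16 / 2 = 1024.
```

Decreasing Z_in (and thus increasing Z_out) blurs the community boundaries, which is how the benchmark controls detection difficulty.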
Citation networks consist of papers (or patents) and their relationships (e.g., citation or inclusion), including eight classic datasets such as Cora and arXiv. Collaboration networks are comprised of scientists and their collaborations (i.e., co-authoring papers), including four typical datasets such as Computer Science and Medicine. The number of nodes in these networks ranges from tens to millions, and the number of edges from hundreds to hundreds of millions.

TABLE 6: The statistics of real-world networks.

Categories          Datasets            #Nodes      #Edges
Social Networks     Friendship7 [35]    68          220
                    Football [1]        115         613
                    Facebook [131]      1,045       26,749
                    LiveJournal [132]   44,093      871,409
                    Twitter [27]        87,760      1,293,985
                    Orkut [132]         297,691     7,747,026
                    DBLP [133]          317,080     1,049,866
                    Youtube [133]       1,134,890   2,987,624
Citation Networks   Small-hep           397         812
                    Polblogs [134]      1,490       16,718
                    Cora [135]          2,708       5,429
                    Citeseer [135]      3,312       4,732
                    Large-hep

We first discuss the applications of community detection in different domains, and then extend the discussion to other network analysis tasks. We finish by discussing the potential of community detection for network science.

Community detection has diverse applications across different domains such as online social networks and neuroscience. Online social networks, including Facebook, Twitter and WeChat, comprise the interactions among people through the web. Discovering communities in such networks is an effective way to infer the relationships of individuals, which has been adopted for tasks such as spammer detection and crisis response. Jin et al. [141] indicate that links in online social networks generally carry semantic information, and communities of links can better characterize community behaviors than communities of nodes. In view of this, they design a novel probabilistic model that explores network topology and link contents together to perform link community detection, effectively mining the social relationships among individuals. Wu et al.
[142] design a novel end-to-end deep learning model, i.e., MRFwithGCN, based on a GCN that directly operates on directed social networks. They introduce into their model an MRF layer that captures user-following information to refine the predictions made by the GCN for social spammer detection.

Neuroscience is a discipline studying the nervous system and the brain. With the recent development of brain mapping and neuroimaging techniques, the brain has begun to be modeled as networks. A large amount of effort has been devoted to exploiting such networks to help extract the functional subdivisions of the brain. Liu et al. [143] propose a framework of siamese community-preserving graph convolutional network (SCP-GCN). The method retains the community structure by considering intra-community and inter-community properties in the learning process, and uses a siamese architecture that models pairwise similarity to guide this learning process, so as to learn a joint structural and functional embedding of brain networks. Jin et al. [144] argue that the existing studies typically construct community structures of brain networks from resting-state functional magnetic resonance imaging (fMRI) data, while ignoring the inherent timing and validity of the fMRI time series. They introduce the dynamic time warping (DTW) algorithm, which analyzes the synchronization and asynchronism of fMRI time series, to extract the correlations between brain regions.

With the great success of community detection, numerous application problems, e.g., recommendation and link prediction, have been formulated as finding community structures in network systems. We now discuss how the existing community detection methods are utilized to solve some of these problems. Recommendation is a common task that addresses the issue of information overload for users by establishing a profile of user interests based on items in their purchasing or browsing history and later recommending similar items to them.
The existing methods for recommendation include collaborative filtering [145] and neural networks [146]. In particular, the concept of community has been employed to improve the quality of recommendation. Eissa et al. [147] make recommendations based on interest-based communities generated from topic-based attributed social networks. They first augment users with interests implicitly extracted from contents, then establish interest-based communities in which the users of a community share a common interest in the same topic, and finally use these communities to generate recommendations. Satuluri et al. [2] present a general-purpose representation layer, i.e., similarity-based clusters (SimClusters), which settles a multitude of recommendation tasks at Twitter by detecting bipartite communities from the user-user network and leveraging them as a representation space.

Link prediction is another important task in network mining. It deals with missing connections and predicts possible future connections through the analysis of the observed network structure and external information. A large number of approaches have been proposed to facilitate link prediction by considering communities. Xu et al. [148] indicate that the existing metric learning for link prediction ignores communities, which contain abundant structural information. Therefore, they design community-specific similarity metrics by means of joint community detection to deal with cold-start link prediction, where edges between nodes are unavailable. De et al. [149] propose a stacked two-level learning framework, which first learns a local similarity model exploiting locality structures and node attributes, and then combines the model with community-level features derived using co-clustering for link prediction.

Network science is an interdisciplinary research area whose methods are used not only in computer science, but also in other fields such as sociology and biology.
Community detection, one of the most important problems in network analysis, can tremendously promote the development of network science. For example, in a citation network, nodes represent papers and edges represent the citations among papers. By grouping the papers (i.e., discovering communities in which papers have similar attributes, such as belonging to the same author or topic), we can analyze the influence of authors and accurately grasp the latest research trends or technologies, which is of guiding significance for comprehending the network and further analyzing network patterns [150]. Similarly, in sociology and biology, community detection also provides a deeper understanding of network structure and promotes development both in academia and industry.

FUTURE DIRECTIONS

While learning-based community detection, including probabilistic graphical models and deep learning, has demonstrated superior performance across a variety of problems and domains, there are challenges that need to be addressed. In this section, we briefly discuss these challenges and future research directions potentially worth pursuing.

With the rapidly increasing scale of network data, large networks have become the standard across many different scientific domains. These networks typically have tens of thousands or even billions of nodes and edges as well as complex structural patterns. Most existing community detection methods face excessive constraints on such large networks due to the potentially prohibitive demands on memory and computation. They may require a large number of training instances [151] or model parameters [152] to be effective. Moreover, the existing approaches typically handle these problems by network reduction [153] or approximation [154], which may lose important network information and affect modeling accuracy. This raises the question of how to devise a framework that far exceeds the current benchmark approaches in both accuracy and efficiency.
Although community detection has been studied for more than a decade, the interpretability of communities remains an important and critical issue to be adequately addressed. Most current community detection methods utilize top-ranked words or short phrases in the results to summarize communities, even though the attribute information of nodes typically consists of complete sentences that carry more information than individual words [155], [156]. However, these methods may not be intuitive enough for understanding the semantics of communities, due to the small number of words and the unclear relationships between them. How to make the best use of network information to provide a better semantic interpretation of communities is one of the future research directions.

Adaptive model selection for community detection aims to choose the most appropriate algorithm for discovering communities, according to the characteristics of different networks (e.g., heterogeneous or dynamic) or the specific requirements of different tasks (e.g., the highest accuracy or the lowest time complexity). Although the existing methods can be extended from one network or task to another to some extent (which inevitably affects the accuracy and stability of the resulting model) [157], [158], few of them consider how to perform model adaptation. Thus, the focus has shifted from proposing diverse frameworks for different networks or tasks to designing a unified architecture that can automatically adapt to specific tasks or networks while maintaining model accuracy and stability. This is an emerging research area that would be challenging but rewarding.

Many real-world networks are heterogeneous, dynamic, hierarchical, or incomplete. Heterogeneous networks [159] are those that contain different types of nodes and edges, or different types of descriptions of nodes and edges, such as text and images. Dynamic networks [160] are networks whose topology and/or attributes change over time.
Dynamic networks appear when nodes and edges are added or deleted, thus altering the properties of nodes or edges. Hierarchical networks [161] are composed of several layers, each of which has specific semantics and functions. Incomplete networks [162] are those with missing information on their topology, nodes, or edges. While these networks can be partly explored by learning-based community detection, several serious issues remain. First, most existing methods assume homogeneous networks and are therefore difficult to apply to heterogeneous networks. Second, due to the variability of dynamic networks, most existing methods, especially those based on deep learning, need to be re-trained over a series of steps as the networks evolve, which is very time consuming and may not meet real-time processing demands. Third, hierarchical networks typically have different types of relationships across the network hierarchies, which are important but often not well handled by the existing methods. Moreover, almost all existing methods regard the networks to be analyzed as complete and accurately documented without noise. Unfortunately, this is rarely the case in practice, as it is challenging to obtain complete information about a network. Therefore, new methods should be developed to handle these issues and improve the performance of community detection on these types of complex networks.

Although several methods have been proposed to combine statistical modeling with deep learning, such as MRFasGCN, this is still a nascent but promising research area. For instance, the existing methods typically utilize the prior knowledge (e.g., communities) that a statistical model offers to refine the embeddings of a GCN and thereby improve the resulting communities.
However, these methods may not fully consider the time complexity or interpretability of the models, raising enormous challenges for community detection in practice. Furthermore, it remains an open problem to integrate statistical modeling into deep learning methods. For example, it is difficult to apply a strategy for recommendation or medical diagnosis to make deep learning yield better representation learning, which in turn would facilitate more accurate recommendation or diagnosis. New innovative algorithms are highly desirable to integrate statistical inference and deep learning, helping deep learning produce more interpretable network representation models that are suitable for various network problems in broad application fields.

CONCLUSION

In this paper, we provide a comprehensive and up-to-date literature review of community detection approaches. One of our main objectives is to organize and present most of the work conducted so far from a unified perspective. As a first step, we discuss in detail the problem of community detection, and provide a new taxonomy that groups most existing methods into two categories from the perspective of learning: probabilistic graphical models and deep learning. We then thoroughly review, compare, and summarize the existing methods in these two categories, and discuss how some of them are related. Moreover, since the problem is highly application-oriented, we introduce a wide range of applications of community detection in various fields. We also highlight that more effort is needed to address several challenging open problems in community detection research.
We expect that our view, which attempts to synthesize the state-of-the-art of the field of community detection, will contribute to a better understanding of this highly active and increasingly important area of study in network science, serve as a source of information for new researchers entering this field and for researchers already working in this area, and promote future developments of next-generation community detection approaches.

REFERENCES

[1] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proc. Natl. Acad. Sci., vol. 99, no. 12, pp. 7821-7826, 2002.
[2] V. Satuluri, Y. Wu, X. Zheng, Y. Qian, B. Wichers, Q. Dai, G. M. Tang, J. Jiang, and J. Lin, "SimClusters: Community-based representations for heterogeneous recommendations at Twitter," in Proceedings of SIGKDD, pp. 3183-3193, 2020.
[3] S. Mukherjee, H. Lamba, and G. Weikum, "Experience-aware item recommendation in evolving review communities," in Proceedings of ICDM, pp. 925-930, 2015.
[4] M. R. Keyvanpour, M. B. Shirzad, and M. Ghaderi, "AD-C: a new node anomaly detection based on community detection in social networks," Int. J. Electron. Bus., vol. 15, no. 3, pp. 199-222, 2020.
[5] J. Wang and I. C. Paschalidis, "Botnet detection based on anomaly and community detection," IEEE Trans. Control. Netw. Syst., vol. 4, no. 2, pp. 392-404, 2017.
[6] F. Saidi, Z. Trabelsi, and H. B. Ghezala, "A novel approach for terrorist sub-communities detection based on constrained evidential clustering," in Proceedings of RCIS, pp. 1-8, 2018.
[7] W. W. Zachary, "An information flow model for conflict and fission in small groups," J. Anthropol. Res., vol. 33, no. 4, pp. 452-473, 1977.
[8] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440-442, 1998.
[9] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, pp. 509-512, 1999.
[10] C. C. Aggarwal and H. Wang, "Managing and mining graph data," Advances in Database Systems, vol. 40, pp. 275-301, 2010.
[11] S. Jia, L. Gao, Y. Gao, J. Nastos, Y. Wang, X. Zhang, and H. Wang, "Defining and identifying cograph communities in complex networks," New J. Phys., vol. 17, no. 1, p. 013044, 2015.
[12] L. Yang, X. Cao, D. He, C. Wang, X. Wang, and W. Zhang, "Modularity based community detection with deep learning," in Proceedings of IJCAI, pp. 2252-2258, 2016.
[13] M. Newman and M. Girvan, "Finding and evaluating community structure in networks," Phys. Rev. E, vol. 69, no. 2, p. 26113, 2004.
[14] P. Zhang and C. Moore, "Scalable detection of statistically significant communities and hierarchies, using message passing for modularity," Proc. Natl. Acad. Sci., vol. 111, no. 51, pp. 18144-18149, 2014.
[15] M. Fanuel, C. M. Alaíz, and J. A. K. Suykens, "Magnetic eigenmaps for community detection in directed networks," Phys. Rev. E, vol. 95, no. 2, p. 022302, 2017.
[16] Y. Li, K. He, D. Bindel, and J. E. Hopcroft, "Uncovering the small community structure in large networks: A local spectral approach," in Proceedings of WWW, pp. 658-668, 2015.
[17] A. Anandkumar, R. Ge, D. J. Hsu, and S. M. Kakade, "A tensor approach to learning mixed membership community models," J. Mach. Learn. Res., vol. 15, no. 1, pp. 2239-2312, 2014.
[18] D. He, D. Liu, D. Jin, and W. Zhang, "A stochastic model for detecting heterogeneous link communities in complex networks," in Proceedings of AAAI, pp. 130-136, 2015.
[19] C. Pizzuti and A. Socievole, "Multiobjective optimization and local merge for clustering attributed graphs," IEEE Trans. Cybern., pp. 1-13, 2019.
[20] Z. Li, J. Liu, and K. Wu, "A multiobjective evolutionary algorithm based on structural and attribute similarities for community detection in attributed networks," IEEE Trans. Cybern., vol. 48, no. 7, pp. 1963-1976, 2018.
[21] F. Wang, T. Li, X. Wang, S. Zhu, and C. Ding, "Community discovery using nonnegative matrix factorization," Data Min. Knowl. Discov., vol. 22, no. 3, pp. 493-521, 2011.
[22] Y. Zhang and D. Yeung, "Overlapping community detection via bounded nonnegative matrix tri-factorization," in Proceedings of SIGKDD, pp. 606-614, 2012.
[23] B. Yang, X. Zhao, and X. Liu, "Bayesian approach to modeling and detecting communities in signed network," in Proceedings of AAAI, pp. 1952-1958, 2015.
[24] C. Wang, S. Pan, G. Long, X. Zhu, and J. Jiang, "MGAE: marginalized graph autoencoder for graph clustering," in Proceedings of CIKM, pp. 889-898, 2017.
[25] B. Sun, H. Shen, J. Gao, W. Ouyang, and X. Cheng, "A non-negative symmetric encoder-decoder approach for community detection," in Proceedings of CIKM, pp. 597-606, 2017.
[26] Y. Jia, Q. Zhang, W. Zhang, and X. Wang, "CommunityGAN: Community detection with generative adversarial nets," in Proceedings of WWW, pp. 784-794, 2019.
[27] Y. Zhang, Y. Xiong, Y. Ye, T. Liu, W. Wang, Y. Zhu, and P. S. Yu, "SEAL: learning heuristics for community detection with generative adversarial networks," in Proceedings of SIGKDD, pp. 1103-1113, 2020.
[28] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Soc. Networks, vol. 5, no. 2, pp. 109-137, 1983.
[29] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, "Mixed membership stochastic blockmodels," J. Mach. Learn. Res., vol. 9, pp. 1981-2014, 2008.
[30] B. Karrer and M. E. J. Newman, "Stochastic blockmodels and community structure in networks," Phys. Rev. E, vol. 83, no. 1, 2011.
[31] D. He, X. You, Z. Feng, D. Jin, X. Yang, and W. Zhang, "A network-specific Markov random field approach to community detection," in Proceedings of AAAI, pp. 306-313, 2018.
[32] G. Sperlì, "A deep learning based community detection approach," in Proceedings of SAC, pp. 1107-1110, 2019.
[33] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proceedings of ICLR, 2017.
[34] D. Jin, B. Li, P. Jiao, D. He, and H. Shan, "Community detection via joint graph convolutional network embedding in attribute network," in Proceedings of ICANN, vol. 11731, pp. 594-606, 2019.
[35] J. Xie, S. Kelley, and B. K. Szymanski, "Overlapping community detection in networks: the state of the art and comparative study," ACM Comput. Surv., vol. 45, no. 4, pp. 1-35, 2013.
[36] C. C. Aggarwal and K. Subbian, "Evolutionary network analysis: A survey," ACM Comput. Surv., vol. 47, no. 1, pp. 1-36, 2014.
[37] C. Bianfang, J. Caiyan, and Y. Jian, "Overview of community detection models on statistical inference," Comput. Sci., vol. 39, no. 8, pp. 1-7, 2012.
[38] F. Liu, S. Xue, J. Wu, C. Zhou, W. Hu, C. Paris, S. Nepal, J. Yang, and P. S. Yu, "Deep learning for community detection: Progress, challenges and opportunities," in Proceedings of IJCAI, pp. 4981-4987, 2020.
[39] F. D. Malliaros and M. Vazirgiannis, "Clustering and community detection in directed networks: A survey," Phys. Rep.-Rev. Sec. Phys. Lett., vol. 533, no. 4, pp. 95-142, 2013.
[40] T. Hartmann, A. Kappes, and D. Wagner, "Clustering evolving networks," in Algorithm Engineering, vol. 9220, pp. 280-329, 2016.
[41] C. Lee and D. J. Wilkinson, "A review of stochastic block models and extensions for graph clustering," Appl. Netw. Sci., vol. 4, no. 1, p. 122, 2019.
[42] T. A. Snijders and K. Nowicki, "Estimation and prediction for stochastic blockmodels for graphs with latent block structure," J. Classif., vol. 14, no. 1, pp. 75-100, 1997.
[43] X. Fan, R. Y. D. Xu, and L. Cao, "Copula mixed-membership stochastic block model," in Proceedings of IJCAI, 2016.
[44] S. Pal and M. Coates, "Scalable MCMC in degree corrected stochastic block model," in Proceedings of ICASSP, pp. 5461-5465, 2019.
[45] Y. Zhao, E. Levina, and J. Zhu, "Consistency of community detection in networks under degree-corrected stochastic block models," Ann. Stat., vol. 40, no. 4, pp. 2266-2292, 2012.
[46] L. Gulikers, M. Lelarge, and L. Massoulié, "A spectral method for community detection in moderately sparse degree-corrected stochastic block models," Adv. Appl. Probab., vol. 49, no. 3, pp. 686-721, 2017.
[47] Y. Chen, X. Li, and J. Xu, "Convexified modularity maximization for degree-corrected stochastic block models," Ann. Stat., vol. 46, no. 4, pp. 1573-1602, 2018.
[48] W. Fu, L. Song, and E. P. Xing, "Dynamic mixed membership blockmodel for evolving networks," in Proceedings of ICML, pp. 329-336, 2009.
[49] E. P. Xing, W. Fu, and L. Song, "A state-space mixed membership blockmodel for dynamic network tomography," Ann. Appl. Stat., vol. 4, no. 2, pp. 535-566, 2010.
[50] T. Yang, Y. Chi, S. Zhu, Y. Gong, and R. Jin, "Detecting communities and their evolutions in dynamic social networks - a Bayesian approach," Mach. Learn., vol. 82, no. 2, pp. 157-189, 2011.
[51] X. Tang and C. C. Yang, "Detecting social media hidden communities using dynamic stochastic blockmodel with temporal Dirichlet process," ACM Trans. Intell. Syst. Technol., vol. 5, no. 2, pp. 1-21, 2014.
[52] K. S. Xu, "Stochastic block transition models for dynamic networks," in Proceedings of AISTATS, vol. 38, pp. 1079-1087, 2015.
[53] J. D. Wilson, N. T. Stevens, and W. H. Woodall, "Modeling and detecting change in temporal networks via a dynamic degree corrected stochastic block model," Qual. Reliab. Eng. Int., vol. 35, no. 5, pp. 1363-1378, 2016.
[54] X. Wu, P. Jiao, Y. Wang, T. Li, W. Wang, and B. Wang, "Dynamic stochastic block model with scale-free characteristic for temporal complex networks," in Proceedings of DASFAA, pp. 502-518, 2019.
[55] M. Bhattacharjee, M. Banerjee, and G. Michailidis, "Change point estimation in a dynamic stochastic block model," J. Mach. Learn. Res., vol. 21, no. 107, pp. 1-59, 2020.
[56] Z. Yu, X. Fan, M. Pietrasik, and M. Z. Reformat, "Fragmentation coagulation based mixed membership stochastic blockmodel," in Proceedings of AAAI, pp. 6704-6711, 2020.
[57] P. Latouche, E. Birmelé, C. Ambroise, et al., "Overlapping stochastic block models with application to the French political blogosphere," Ann. Appl. Stat., vol. 5, no. 1, pp. 309-336, 2011.
[58] G. Arora, A. Porwal, K. Agarwal, A. Samdariya, and P. Rai, "Small-variance asymptotics for nonparametric Bayesian overlapping stochastic blockmodels," in Proceedings of IJCAI, pp. 2000-2006, 2018.
[59] D. Jin, B. Li, P. Jiao, D. He, H. Shan, and W. Zhang, "Modeling with node popularities for autonomous overlapping community detection," ACM Trans. Intell. Syst. Technol., vol. 11, no. 3, pp. 1-23, 2020.
[60] N. Mehta, L. Carin, and P. Rai, "Stochastic blockmodels meet graph neural networks," in Proceedings of ICML, vol. 97, pp. 4466-4474, 2019.
[61] K. S. Xu and A. O. Hero, "Dynamic stochastic blockmodels for time-evolving social networks," IEEE J. Sel. Top. Signal Process., vol. 8, no. 4, pp. 552-562, 2014.
[62] X. Wu, P. Jiao, Y. Wang, T. Li, W. Wang, and B. Wang, "Dynamic stochastic block model with scale-free characteristic for temporal complex networks," in Proceedings of DASFAA, vol. 11447, pp. 502-518, 2019.
[63] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," J. Mach. Learn. Res., vol. 3, pp. 993-1022, 2003.
[64] H. Zhang, B. Qiu, C. L. Giles, H. C. Foley, and J. Yen, "An LDA-based community structure discovery approach for large-scale social networks," in Proceedings of ISI, pp. 200-207, 2007.
[65] Z. Yin, L. Cao, Q. Gu, and J. Han, "Latent community topic analysis: Integration of community discovery with topic modeling," ACM Trans. Intell. Syst. Technol., vol. 3, no. 4, pp. 63:1-63:21, 2012.
[66] Y. Cha and J. Cho, "Social-network analysis using topic models," in Proceedings of SIGIR, pp. 565-574, 2012.
[67] Z. Xu, Y. Ke, Y. Wang, H. Cheng, and J. Cheng, "A model-based approach to attributed graph clustering," in Proceedings of COMAD, pp. 505-516, 2012.
[68] D. He, Z. Feng, D. Jin, X. Wang, and W. Zhang, "Joint identification of network communities and semantics via integrative modeling of network topologies and node contents," in Proceedings of AAAI, pp. 116-124, 2017.
[69] D. Jin, K. Wang, G. Zhang, P. Jiao, D. He, F. Fogelman-Soulié, and X. Huang, "Detecting communities with multiplex semantics by distinguishing background, general and specialized topics," IEEE Trans. Knowl. Data Eng., vol. 32, no. 11, pp. 2144-2158, 2020.
[70] J. He, Z. Hu, T. Berg-Kirkpatrick, Y. Huang, and E. P. Xing, "Efficient correlated topic modeling with topic embedding," in Proceedings of SIGKDD, pp. 225-233, 2017.
[71] D. Jin, J. Huang, P. Jiao, L. Yang, D. He, F. Fogelman-Soulié, and Y. Huang, "A novel generative topic embedding model by introducing network communities," in Proceedings of WWW, pp. 2886-2892, 2019.
[72] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proceedings of NeurIPS, pp. 556-562, 2000.
[73] R.-S. Wang, S. Zhang, Y. Wang, X.-S. Zhang, and L. Chen, "Clustering complex networks and biological networks by nonnegative matrix factorization with various similarity measures," Neurocomputing, vol. 72, no. 1-3, pp. 134-141, 2008.
[74] D. Kuang, H. Park, and C. H. Q. Ding, "Symmetric nonnegative matrix factorization for graph clustering," in Proceedings of SIAM, pp. 106-117, 2012.
[75] X. Shi, H. Lu, Y. He, and S. He, "Community detection in social network with pairwisely constrained symmetric non-negative matrix factorization," in Proceedings of ASONAM, pp. 541-546, 2015.
[76] B. Sun, H. Shen, J. Gao, W. Ouyang, and X. Cheng, "A non-negative symmetric encoder-decoder approach for community detection," in Proceedings of CIKM, pp. 597-606, 2017.
[77] J. Yang and J. Leskovec, "Overlapping community detection at scale: a nonnegative matrix factorization approach," in Proceedings of WSDM, pp. 587-596, 2013.
[78] X. Cao, X. Wang, D. Jin, Y. Cao, and D.
He, “Identifying overlap-ping communities as well as hubs and outliers via nonnegativematrix factorization,” Sci Rep , vol. 3, no. 10, p. 2993, 2013.[79] H. Zhang, I. King, and M. R. Lyu, “Incorporating implicit linkpreference into overlapping community detection,” in Proceedingsof AAAI , pp. 396–402, 2015.[80] H. Zhang, T. Zhao, I. King, and M. R. Lyu, “Modeling thehomophily effect between links and communities for overlappingcommunity detection,” in Proceedings of IJCAI , pp. 3938–3944,2016.[81] Y. Pei, N. Chakraborty, and K. P. Sycara, “Nonnegative matrix tri-factorization with graph regularization for community detectionin social networks,” in Proceedings of IJCAI , pp. 2083–2089, 2015.[82] X. Wang, D. Jin, X. Cao, L. Yang, and W. Zhang, “Semantic com-munity identification in large attribute networks,” in Proceedingsof AAAI , pp. 265–271, 2016.[83] W. Wang, P. Jiao, D. He, D. Jin, L. Pan, and B. Gabrys, “Au-tonomous overlapping community detection in temporal net-works: A dynamic bayesian nonnegative matrix factorizationapproach,” Knowl. Based Syst. , vol. 110, pp. 121–134, 2016.[84] X. Ma and D. Dong, “Evolutionary Nonnegative Matrix Fac-torization Algorithms for Community Detection in DynamicNetworks,” IEEE Trans. Knowl. Data Eng. , vol. 29, no. 5, pp. 1045–1058, 2017.[85] L. Yang, X. Cao, D. Jin, X. Wang, and D. Meng, “A unifiedsemi-supervised community detection framework using latentspace graph regularization,” IEEE Trans. Cybern. , vol. 45, no. 11,pp. 2585–2598, 2015.[86] X. Liu, W. Wang, D. He, P. Jiao, D. Jin, and C. V. Cannistraci,“Semi-supervised community detection based on non-negativematrix factorization with node popularity,” Inf. Sci. , vol. 381,pp. 304–321, 2017.[87] T. Guo, S. Pan, X. Zhu, and C. Zhang, “CFOND: consensus fac-torization for co-clustering networked data,” IEEE Trans. Knowl.Data Eng. , vol. 31, no. 4, pp. 706–719, 2018.[88] D. Jin, X. You, W. Li, D. He, P. Cui, F. Fogelman-Souli´e, andT. 
Chakraborty, “Incorporating network embedding into markovrandom field for better community detection,” in Proceedings ofAAAI , pp. 160–167, 2019.[89] D. Jin, B. Zhang, Y. Song, D. He, Z. Feng, S. Chen, W. Li,and K. Musial, “Modmrf: A modularity-based markov randomfield method for community detection,” Neurocomputing , vol. 405,pp. 218–228, 2020.[90] D. He, W. Song, D. Jin, Z. Feng, and Y. Huang, “An end-to-end community detection model: Integrating LDA into markovrandom field via factor graph,” in Proceedings of IJCAI , pp. 5730–5736, 2019.[91] D. Jin, Z. Liu, W. Li, D. He, and W. Zhang, “Graph convolutionalnetworks meet markov random fields: Semi-supervised commu-nity detection in attribute networks,” in Proceedings of AAAI ,pp. 152–159, 2019.[92] M. Qu, Y. Bengio, and J. Tang, “GMNN: graph markov neuralnetworks,” in Proceedings of ICML , vol. 97, pp. 5241–5250, 2019.[93] X. Ma, L. Gao, X. Yong, and L. Fu, “Semi-supervised cluster-ing algorithm for community structure detection in complexnetworks,” Physical A: Statistical Mechanics and its Applications ,vol. 389, no. 1, pp. 187–197, 2010.[94] S. Nowozin and C. H. Lampert, “Structured learning and predic-tion in computer vision,” Found. Trends Comput. Graph. Vis. , vol. 6,no. 3-4, pp. 185–365, 2011.[95] J. Zeng, W. K. Cheung, and J. Liu, “Learning topic models bybelief propagation,” IEEE Trans. Pattern Anal. Mach. Intell. , vol. 35,no. 5, pp. 1121–1134, 2013. [96] Z. Yang, J. Tang, J. Li, and W. Yang, “Social Community Analysisvia a Factor Graph Model,” IEEE Intell. Syst. , vol. 26, no. 3, pp. 58–65, 2011.[97] Y. Jia, Y. Gao, W. Yang, J. Huo, and Y. Shi, “A novel ego-centered academic community detection approach via factorgraph model,” in Proceedings of IDEAL , vol. 8669, pp. 223–230,2014.[98] A. Passarella, R. I. M. Dunbar, M. Conti, and F. Pezzoni, “Egonetwork models for future internet social networking environ-ments,” Comput. Commun. , vol. 35, no. 18, pp. 2201–2217, 2012.[99] L. Yang, X. Cao, D. He, C. 
Wang, X. Wang, and W. Zhang,“Modularity based community detection with deep learning,”in Proceedings of IJCAI , pp. 2252–2258, 2016.[100] J. Di, G. Meng, L. Zhixuan, L. Wenhuan, H. Dongxiao, andF. Fogelman-Soulie, “Using deep learning for community discov-ery in social networks,” in Proceedings of ICTAI , pp. 160–167, 2017.[101] J. Cao, D. Jin, L. Yang, and J. Dang, “Incorporating networkstructure with node contents for community detection on largenetworks using deep learning,” Neurocomputing , vol. 297, pp. 71–81, 2018.[102] J. Cao, D. Jin, and J. Dang, “Autoencoder based communitydetection with adaptive integration of network topology andnode contents,” in Proceedings of KSEM , vol. 11062, pp. 184–196,2018.[103] Y. Xie, X. Wang, D. Jiang, and R. Xu, “High-performance com-munity detection in social networks using a deep transitiveautoencoder,” Inf. Sci. , vol. 493, pp. 75–90, 2019.[104] V. Bhatia and R. Rani, “A distributed overlapping communitydetection model for large graphs using autoencoder,” FutureGener. Comput. Syst. , vol. 94, pp. 16–26, 2019.[105] H. Sun, F. He, J. Huang, Y. Sun, Y. Li, C. Wang, L. He, Z. Sun,and X. Jia, “Network embedding for community detection inattributed networks,” ACM Trans. Knowl. Discov. Data , vol. 14,no. 3, pp. 1–25, 2020.[106] F. Tian, B. Gao, Q. Cui, E. Chen, and T. Liu, “Learning deeprepresentations for graph clustering,” in Proceedings of AAAI ,pp. 1293–1299, 2014.[107] V. Bhatia and R. Rani, “Dfuzzy: a deep learning-based fuzzyclustering model for large graphs,” Knowl. Inf. Syst. , vol. 57, no. 1,pp. 159–181, 2018.[108] R. Xu, Y. Che, X. Wang, J. Hu, and Y. Xie, “Stacked autoencoder-based community detection method via an ensemble clusteringframework,” Inf. Sci. , vol. 526, pp. 151–165, 2020.[109] C. Yang, M. Liu, Z. Wang, L. Liu, and J. Han, “Graph clusteringwith dynamic embedding.,” arXiv , 2017.[110] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. 
Zhang, “Adver-sarially regularized graph autoencoder for graph embedding,” in Proceedings of IJCAI , pp. 2609–2615, 2018.[111] J. J. Choong, X. Liu, and T. Murata, “Learning communitystructure with variational autoencoder,” in Proceedings of ICDM ,pp. 69–78, 2018.[112] C. Wang, S. Pan, R. Hu, G. Long, J. Jiang, and C. Zhang,“Attributed graph clustering: a deep attentional embedding ap-proach,” in Proceedings of IJCAI , pp. 3670–3676, 2019.[113] J. J. Choong, X. Liu, and T. Murata, “Optimizing variational graphautoencoder for community detection,” in Proceedings of BigData ,pp. 5353–5358, 2019.[114] D. Jin, B. Li, P. Jiao, D. He, and W. Zhang, “Network-specificvariational auto-encoder for embedding in attribute networks,”in Proceedings of IJCAI , pp. 2663–2669, 2019.[115] F. Sun, M. Qu, J. Hoffmann, C. Huang, and J. Tang, “vgraph:A generative model for joint community detection and noderepresentation learning,” in Proceedings of NeurIPS , pp. 512–522,2019.[116] M. K. Rahman and A. Azad, “Evaluating the community struc-tures from network images using neural networks,” in Proceed-ings of Complex Networks and Their Applications , vol. 881, pp. 866–878, 2019.[117] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimension-ality of data with neural networks.,” Science , vol. 313, no. 5786,pp. 504–507, 2006.[118] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,”in Proceedings of ICLR , 2014.[119] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, “Generativeadversarial nets,” in Proceedings of NeurIPS , pp. 2672–2680, 2014. [120] Y. Sun, S. Wang, T. Hsieh, X. Tang, and V. G. Honavar, “MEGAN:A generative adversarial network for multi-view network em-bedding,” in Proceedings of IJCAI , pp. 3527–3533, 2019.[121] H. Gao, J. Pei, and H. Huang, “Progan: Network embeddingvia proximity generative adversarial network,” in Proceedings ofSIGKDD , pp. 1308–1316, 2019.[122] H. Hong, X. Li, and M. 
Wang, “GANE: A generative adversarialnetwork embedding,” IEEE Trans. Neural Networks Learn. Syst. ,vol. 31, no. 7, pp. 2325–2335, 2020.[123] D. He, L. Zhai, Z. Li, D. Jin, L. Yang, Y. Huang, and P. S. Yu, “Ad-versarial mutual information learning for network embedding,”in Proceedings of IJCAI , pp. 3321–3327, 2020.[124] L. Yang, Y. Wang, J. Gu, C. Wang, X. Cao, and Y. Guo, “JANE:jointly adversarial network embedding,” in Proceedings of IJCAI ,pp. 1381–1387, 2020.[125] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “Acomprehensive survey on graph neural networks,” IEEE Trans.Neural Networks Learn. Syst. , 2020.[126] D. He, Y. Song, D. Jin, Z. Feng, B. Zhang, Z. Yu, and W. Zhang,“Community-centric graph convolutional network for unsuper-vised community detection,” in Proceedings of IJCAI , pp. 3515–3521, 2020.[127] Y. Zheng, S. Chen, X. Zhang, and D. Wang, “Heterogeneousgraph convolutional networks for temporal community detec-tion,” arXiv , 2019.[128] R. M. Neal and G. E. Hinton, “A view of the em algorithm thatjustifies incremental, sparse, and other variants,” in Learning inGraphical Models , vol. 89, pp. 355–368, 1998.[129] H. Gao, J. Pei, and H. Huang, “Conditional random field en-hanced graph convolutional neural networks,” in Proceedings ofSIGKDD , pp. 276–284, 2019.[130] A. Lancichinetti and S. Fortunato, “Benchmarks for testing com-munity detection algorithms on directed and weighted graphswith overlapping communities,” Phys. Rev. E , vol. 80, no. 1,p. 016118, 2009.[131] J. J. McAuley and J. Leskovec, “Learning to discover social circlesin ego networks,” in Proceedings of NeurIPS , pp. 548–556, 2012.[132] S. Harenberg, G. Bello, L. Gjeltema, S. Ranshous, and N. Sam-atova, “Community detection in large-scale networks: A surveyand empirical evaluation,” WIREs Computational Statistics , vol. 6,no. 6, pp. 426–439, 2014.[133] H. V. Lierde, T. W. S. Chow, and G. 
Chen, “Scalable spectralclustering for overlapping community detection in large-scalenetworks,” IEEE Trans. Knowl. Data Eng. , vol. 32, no. 4, pp. 754–767, 2020.[134] L. A. Adamic and N. S. Glance, “The political blogosphereand the 2004 U.S. election: divided they blog,” in Proceedings ofSIGKDD , pp. 36–43, 2005.[135] W. Ren, G. Yan, X. Liao, and L. Xiao, “Simple probabilistic algo-rithm for detecting community structure,” Phys. Rev. E , vol. 79,no. 2, p. 036111, 2009.[136] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, andT. Eliassi-Rad, “Collective classification in network data,” AIMagazine , vol. 29, no. 3, pp. 93–106, 2008.[137] Ginsparg and Paul, “Arxiv at 20,” Nature , vol. 476, no. 7359,pp. 145–7, 2011.[138] J. Leskovec, J. M. Kleinberg, and C. Faloutsos, “Graphs over time:densification laws, shrinking diameters and possible explana-tions,” in Proceedings of SIGKDD , pp. 177–187, 2005.[139] O. Shchur and S. G ¨unnemann, “Overlapping community detec-tion with graph neural networks,” arXiv , 2019.[140] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,“Community structure in large networks: Natural cluster sizesand the absence of large well-defined clusters,” Internet Math. ,vol. 6, no. 1, pp. 29–123, 2009.[141] D. Jin, X. Wang, R. He, D. He, J. Dang, and W. Zhang, “Robust de-tection of link communities in large social networks by exploitinglink semantics,” in Proceedings of AAAI , pp. 314–321, 2018.[142] Y. Wu, D. Lian, Y. Xu, L. Wu, and E. Chen, “Graph convolutionalnetworks with markov random field reasoning for social spam-mer detection,” in Proceedings of AAAI , pp. 1054–1061, 2020.[143] J. Liu, G. Ma, F. Jiang, C. Lu, P. S. Yu, and A. B. Ragin,“Community-preserving graph convolutions for structural andfunctional joint embedding of brain networks,” in Proceedings ofBigData , pp. 1163–1168, 2019.[144] D. Jin, R. Li, and J. Xu, “Multiscale community detection in func-tional brain networks constructed using dynamic time warping,” IEEE Trans. 
Neural Syst. Rehabil. Eng. , vol. 28, no. 1, pp. 52–61,2020.[145] S. Zhang, L. Yao, L. V. Tran, A. Zhang, and Y. Tay, “Quaternioncollaborative filtering for recommendation,” in Proceedings ofIJCAI , pp. 4313–4319, 2019.[146] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang, “Light-gcn: Simplifying and powering graph convolution network forrecommendation,” in Proceedings of SIGIR , pp. 639–648, 2020.[147] A. H. B. Eissa, M. E. El-Sharkawi, and H. M. O. Mokhtar,“Towards recommendation using interest-based communities inattributed social networks,” in Proceedings of WWW , pp. 1235–1242, 2018.[148] L. Xu, X. Wei, J. Cao, and P. S. Yu, “On learning mixedcommunity-specific similarity metrics for cold-start link predic-tion,” in Proceedings of WWW , pp. 861–862, 2017.[149] A. De, S. Bhattacharya, S. Sarkar, N. Ganguly, and S. Chakrabarti,“Discriminative link prediction using local, community, andglobal signals,” IEEE Trans. Knowl. Data Eng. , vol. 28, no. 8,pp. 2057–2070, 2016.[150] H. Wan, Y. Zhang, J. Zhang, and J. Tang, “Aminer: Search andmining of academic social networks,” Data Intell. , vol. 1, no. 1,pp. 58–76, 2019.[151] D. Jin, H. Wang, J. Dang, D. He, and W. Zhang in Proceedings ofAAAI , pp. 172–178, 2016.[152] Y. Wang, D. Jin, K. Musial, and J. Dang, “Community detectionin social networks considering topic correlations,” in proceedingsof AAAI , pp. 321–328, 2019.[153] X. Zhang, K. Zhou, H. Pan, L. Zhang, X. Zeng, and Y. Jin, “Anetwork reduction-based multiobjective evolutionary algorithmfor community detection in large-scale complex networks,” IEEETrans. Cybern. , vol. 50, no. 2, pp. 703–716, 2020.[154] S. Qiao, N. Han, Y. Gao, R. Li, J. Huang, J. Guo, L. A. Gutierrez,and X. Wu, “A fast parallel community discovery model on com-plex networks through approximate optimization,” IEEE Trans.Knowl. Data Eng. , vol. 30, no. 9, pp. 1638–1651, 2018.[155] H. Cai, V. W. Zheng, F. Zhu, K. C. Chang, and Z. Huang, “Fromcommunity detection to community profiling,” Proc. 
VLDB En-dow. , vol. 10, no. 7, pp. 817–828, 2017.[156] D. He, Z. Feng, D. Jin, X. Wang, and W. Zhang, “Joint identifica-tion of network communities and semantics via integrative mod-eling of network topologies and node contents,” in Proceedings ofAAAI , pp. 116–124, 2017.[157] J. Shao, Z. Zhang, Z. Yu, J. Wang, Y. Zhao, and Q. Yang, “Com-munity detection and link prediction via cluster-driven low-rankmatrix completion,” in Proceedings of IJCAI , pp. 3382–3388, 2019.[158] Y. Li, C. Sha, X. Huang, and Y. Zhang, “Community detectionin attributed graphs: An embedding approach,” in Proceedings ofAAAI , pp. 338–345, 2018.[159] X. Li, Y. Wu, M. Ester, B. Kao, X. Wang, and Y. Zheng, “Semi-supervised clustering in attributed heterogeneous informationnetworks,” in Proceedings of WWW , pp. 1621–1629, 2017.[160] D. J. DiTursi, G. Ghosh, and P. Bogdanov, “Local communitydetection in dynamic networks,” in Proceedings of ICDM , pp. 847–852, 2017.[161] C. Chen, H. Tong, L. Xie, L. Ying, and Q. He, “FASCINATE: fastcross-layer dependency inference on multi-layered networks,” in Proceedings of SIGKDD , pp. 765–774, 2016.[162] W. Lin, X. Kong, P. S. Yu, Q. Wu, Y. Jia, and C. Li, “Communitydetection in incomplete information networks,” in Proceedings ofWWW , pp. 341–350, 2012. A PPENDIX A Here we list the key terms and notations in the main text inTable 1. TABLE 6: Summary of notations. Notations Descriptions G A network. V, E The sets of nodes and edges of a network. A, X The adjacency matrix and node attribute matrix. D The node degree matrix. n, m The numbers of nodes and edges. e ij The edge between nodes v i and v j . a ij The connection between nodes v i and v j . x i The attribute vector and degree of node v i . q The maximal number of node attributes. C The set of communities. C The community assignments of nodes. k The numbers of communities. c i The community which node v i belongs to. ω r The probability of nodes assigned to community C r . 
π_rs           The probability of link generation between two communities C_r and C_s.
δ(c_i, c_j)    The probability of nodes v_i and v_j falling into the same community partition.
E(C; A)        The energy function in MRF.
Θ_i            The unary potential function in E(C; A).
Θ_ij           The pairwise potential function in E(C; A).
KL(·||·)       KL-divergence.
Ã              The estimation matrix of A in NMF.
B, S           Community membership matrix and attribute community matrix in NMF.
H, W           Node representation matrix and weight matrix of neural networks.
M, L           Modularity matrix and Laplacian matrix.
F              The set of factor nodes.
Â, X̂           Reconstructed adjacency matrix and node attribute matrix.
G, D           The generator and discriminator of GAN.
E              The encoder that derives node representations.

APPENDIX B

In Section 3.1.1, we introduced several SBM variants for community detection. Here, we give the overall process of community detection based on the basic SBM with a Bernoulli distribution [28], MMSB [29], and DSBM [50] in Algorithms 1, 2, and 3, respectively.

APPENDIX C

In Section 3.1.2, we presented several topic models for community detection. Here, we give the generative process of SSN-LDA [64] for one social interaction profile in Algorithm 4, and provide the process for clustering attributed communities [67] in Algorithm 5.

Algorithm 1: The basic SBM-based method [28]
  Input: n, k.
  Output: the community assignments of nodes C.
  Assume that nodes are independently divided into k communities;
  Infer the parameters ω, π of the likelihood function using the EM algorithm;
  for each node v_i do
      for each node v_j do
          c_i ∼ Multinomial(1; ω) i.i.d.;
          a_ij | c_ir, c_js ∼ Bernoulli(π_rs) i.i.d., 1 ≤ r, s ≤ k;
  return C;

Algorithm 2: The MMSB-based method [29]
  Input: n, k, α, β.
  Output: the community assignments of nodes C.
  for each node v_i do
      ω_i ∼ Dirichlet(α);
      Assign the community assignment c_i with ω_i;
      for each node v_j do
          π_rs | c_ir, c_js ∼ Beta(β), 1 ≤ r, s ≤ k;
          a_{i→j} ∼ Multinomial(ω_i);
          a_{i←j} ∼ Multinomial(ω_j);
          a_ij ∼ Bernoulli(a_{i→j}^T π a_{i←j});
  return C;

Algorithm 3: The DSBM-based method [50]
  Input: n, π, A, T.
  Output: the community assignments of nodes C^(T).
  if time t == 1 then
      generate the social network following SBM;
  for each time t > 1 do
      generate c_i^(t) ∼ π(c_i^(t) | c_i^(t-1), A), 1 ≤ i ≤ n;
      for each pair of nodes (v_i, v_j) at time t do
          generate w_ij^(t) ∼ Bernoulli(· | π_{c_i^(t), c_j^(t)}), 1 ≤ i, j ≤ n;
  return C^(T);

Algorithm 4: The SSN-LDA method [64]
  Input: k, α, β, ε.
  Output: the community assignment of one node c_i.
  Sample mixture components φ ∼ Dirichlet(β);
  Choose θ_i ∼ Dirichlet(α);
  Choose N_i ∼ Poisson(ε);
  for each neighbor v_j of v_i do
      choose a community c_i ∼ Multinomial(θ_i);
      choose a social interaction a_ij ∼ Multinomial(φ_{c_i});
  return c_i;

Algorithm 5: The generation of Bayesian attributed graph clustering (BAGC) [67]
  Input: n, k, T, C, an attribute set Λ = {λ^(1), ..., λ^(T)}, parameters ε, µ, ν.
  Output: the community assignments of nodes C, attribute matrix X.
  Choose α ∼ Dirichlet(ε);
  for each community C_i ∈ {C_1, C_2, ..., C_k} do
      for each attribute λ^(t) do
          choose θ_i^(t) ∼ Dirichlet(λ^(t));
      for each community C_j ∈ {C_i, C_{i+1}, ..., C_k} do
          choose φ_ij ∼ Beta(µ, ν);
  for each node v_i do
      choose c_i ∼ Multinomial(α);
      for each attribute λ^(t) do
          choose x_i^(t) ∼ Multinomial(θ_{c_i}^(t));
      for each node v_j with j < i do
          choose a_ij ∼ Bernoulli(φ_{c_i c_j});
  return C, X;
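The sampling steps of Algorithm 1 (community draws from ω, edge draws from π) can be sketched in a few lines of Python. This is a minimal illustration of the basic Bernoulli SBM under the notation above, not code from any of the surveyed papers; the function name `sample_sbm` and the use of NumPy are our own assumptions, and parameter inference (the EM step) is omitted.

```python
import numpy as np

def sample_sbm(n, omega, pi, seed=0):
    """Sample an undirected network from a basic Bernoulli SBM.

    n     -- number of nodes
    omega -- length-k vector of community membership probabilities
    pi    -- k x k matrix, pi[r, s] = edge probability between
             communities C_r and C_s
    """
    rng = np.random.default_rng(seed)
    k = len(omega)
    # c_i ~ Multinomial(1; omega): draw each node's community i.i.d.
    c = rng.choice(k, size=n, p=omega)
    # a_ij | c_i = r, c_j = s ~ Bernoulli(pi[r, s])
    prob = pi[np.ix_(c, c)]                # n x n matrix of pi[c_i, c_j]
    a = rng.random((n, n)) < prob
    A = np.triu(a, 1)                      # keep upper triangle, no self-loops
    A = (A | A.T).astype(int)              # symmetrize for an undirected graph
    return c, A

# Two communities: dense within (0.8), sparse between (0.05).
c, A = sample_sbm(100, omega=[0.5, 0.5],
                  pi=np.array([[0.8, 0.05], [0.05, 0.8]]))
```

With assortative π as above, edges concentrate inside the two sampled blocks, which is the structure the EM inference in Algorithm 1 is meant to recover.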