Big Networks: A Survey

Abstract

A network is a typical expressive form for representing complex systems in terms of vertices and links, in which the pattern of interactions among the components of the network is intricate. A network can be static, meaning it does not change over time, or dynamic, meaning it evolves through time. Network analysis becomes more complicated as network sizes grow explosively. In this paper, we introduce a new network science concept called the big network. Big networks are generally large-scale networks with a complicated and higher-order inner structure. This paper proposes a guideline framework that gives insight into the major topics in network science from the viewpoint of big networks. We first introduce the structural characteristics of big networks at three levels: micro-level, meso-level, and macro-level. We then discuss some state-of-the-art advanced topics in big network analysis. Big network models and related approaches, including ranking methods, partition approaches, and network embedding algorithms, are systematically introduced. Some typical applications in big networks are then reviewed, such as community detection, link prediction, and recommendation. Moreover, we pinpoint some critical open issues that need further investigation.



Hayat Dino Bedru (a), Shuo Yu (a), Xinru Xiao (a), Da Zhang (b), Liangtian Wan (a), He Guo (a), Feng Xia (a, c, ∗)

(a) Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian 116620, China
(b) Department of Electrical and Computer Engineering, University of Miami, USA
(c) School of Science, Engineering and Information Technology, Federation University Australia, Australia


Keywords: Network Science, Network Analysis, Big Networks, Complex Networks, Large-scale Networks

∗ Corresponding author. Email address: [email protected] (Feng Xia)

Preprint submitted to Computer Science Review, August 11, 2020

1. Introduction

Complex systems are extraordinarily important now and will remain so in the near future [1]. Researchers in various fields consider the formulation of complex systems a crucial issue. Complex systems are often described by networks represented by nodes (vertices) and edges (links). Generally, nodes represent the entities and edges represent the connections among entities in the network. Examples of complex networks include brain structures, transportation, mobile communication, social relationships, and protein-protein interaction. It has been shown that there exist different types of structural models, including scale-free, random, small-world, and regular networks [2].

Numerous studies have investigated fundamental concepts in complex networks. Yu et al. [3] presented an in-depth survey of big data and the technologies considered fundamental to it. Specifically, they elaborated the definition of big data, how to establish and illustrate big data, and its available applications, including system modeling and big data scheduling. That survey mainly focused on the hardware networking structure of big data. Xia et al. [4] comprehensively surveyed big scholarly data, including its background and state-of-the-art technologies. They discussed big scholarly data management as well as data analysis mechanisms, including social network analysis, content analysis, and statistical analysis. Besides, they explained several big data technologies, such as academic recommendation systems and academic impact evaluation techniques. Similarly, Khan et al. [5] investigated the trends and challenges of big data from the perspectives of data management, analysis, and data visualization. Additionally, Kong et al. [6] provided an in-depth explanation of academic social networks (ASNs). They discussed the background and relevant technologies of ASNs. Furthermore, they presented a detailed explanation of tools and models suitable for ASNs. These survey papers [4, 5, 6] mainly focused on academic data (e.g., DBLP and MAG).

In this paper, we propose the concept of big networks (BNs), which are both complex and large-scale networks with higher-order and complicated inner structures. Analyzing the structure and characteristics of big networks is the most promising research issue in the area of network science [7]. Furthermore, it is fundamental to understand the network topology in order to discover the classes and nature (i.e., static or dynamic) of a big network. However, how to characterize the structural form of BNs is an issue that needs serious attention from scholars. We analyze the structural characteristics of BNs at three levels: micro-level, meso-level, and macro-level. Higher-order algorithms are also considered for identifying problems in BNs. Hence, we are motivated to propose a guideline framework that characterizes the main research areas of BNs.

Existing methods and algorithms have not addressed BN issues in detail. Hence, this study aims to give guidance to researchers in the big network domain and to provide insights into the basic objects of network science, from nodes to motifs. We therefore introduce the basic ideas of big networks; review up-to-date work on network motif detection algorithms, multi-layer networks, community detection, link prediction, and recommendation methods; and discuss the challenges in these topics and the open issues from the viewpoint of BNs.

This paper is structured as follows. Section 2 discusses the levels at which BN structure is characterized. Sections 3 and 4 present the big network models and the technologies in BNs, respectively. Section 5 introduces some important applications in BNs. Following the open issues and challenges of BNs in Section 6, we conclude the paper in Section 7.
DBLP: https://dblp.org/
MAG: http://research.microsoft.com/en-us/projects/mag/

2. Structural Characteristics

Researchers try to understand how communities or groups of individuals are densely connected with each other. Network models tend to focus on network structures, with the nodes inside the network treated as individuals. At some point, the focus shifts to discovering the pattern of connections among groups. On the other hand, as social networks (SNs) become complex, a comprehensible pattern emerges from the local relationships of the network.

Social network analysis tends to focus on the scale relevant to the scholars' theoretical research area. For instance, in a co-authorship network, one could analyze how weak or strong the collaboration ties of individual authors are, how big a certain team or community is, and how concentrated the tie strength is [8]. There are three levels at which to investigate and understand network structure and characteristics: micro, meso, and macro. These levels of analysis are predominantly used in social science fields such as sociology, political science, and economics.

At the micro-level, researchers analyze connections at the node and edge level. In essence, this level focuses on individuals and their associations with others. For instance, in a co-authorship network, micro-level analyses might include a one-to-one link between authors. At the meso-level, researchers investigate group-level interactions, which might include the characteristics of a group and how it is organized. In contrast, at the macro-level, the analyses cover global characteristics of a given network. For instance, investigating the scientific collaboration of two institutes in geographically dispersed locations is considered a macro-level analysis. Moreover, scholars working at different levels investigate several features of scientific teams, propose distinct findings, and contribute numerous techniques and theories. Consequently, each level analyzes data at a different scale and adopts different methods, algorithms, and visualization tools.

2.1. Micro-level

At the micro-level, we take into account the interactions of individuals or of a small group of individuals. For instance, the dyadic level considers communication between two people. Node-centric interaction is among the smallest units of social network analysis. Moreover, micro-level analysis examines the characteristics of individuals in a network. It also assesses the smallest level of interaction, between pairs of vertices, and may analyze how a certain vertex is influenced by its connections.

In mathematics, a network is a graph or a family of graphs that includes vertices and the set of interconnections between vertices. Usually, the set of vertices in a network G is denoted V or V(G). The vertices could be people in a social network, proteins in a biological network, or web pages on the internet. In single-layer networks, various measurements (such as PageRank, degree, closeness, betweenness, and eigenvector centrality) can be applied to identify influential nodes and analyze the structural significance of each node [9]. When these characteristics are extended to multi-layer networks, they change. For example, the degree of a node becomes a vector, with one entry per layer.

An edge is an interconnection between two nodes; it can be weighted or unweighted and directed or undirected. The set of edges in a network G is commonly denoted E or E(G). Edges can build a complex structure in networks. The edges in a network model can be divided into three categories [10]. (1) Explicit edges: these edges are known in networks, such as the "following" relationships in Facebook and the "referring" relationships in citation networks. (2) Discrete edges: these edges represent transactions between two nodes, such as text messages and phone calls. (3) Inferred edges: these edges denote some statistical measure of similarity. Since real-world data are often rich but noisy, and sometimes even incomplete, researchers have gradually paid more attention to non-explicit edges. For instance, Newman [11] proposed a technique that provides optimal estimates of the true network structure from rich but noisy data.

2.2. Meso-level

Meso-level network analysis helps to better understand the nature of subnetworks, such as how subnetworks are formed, how they interact, and how they differ, for instance in the number of vertices each subnetwork has and in their features. Generally, it is a study of communities in the same network. It may also explore networks that are specifically constructed to reveal links between the micro- and macro-levels. Furthermore, meso-level networks might exhibit connection processes that differ from those of micro-level networks.

Network motifs are frequently recurring sub-graphs in a network whose distribution can reflect structural properties of complex networks [12]. Because a motif can be regarded as a basic building block of the global system, it has important applications in many fields. For example, in [13], the researcher applied motifs in an algorithm for constructing directed and unweighted networks. The algorithm starts from the empty graph and continues to select the in-degree or out-degree distribution of the network by encouraging or suppressing the formation of specific motifs. Besides, motif discovery has also been applied in many fields, such as the functional analysis of brain neural networks in brain science, pattern detection in biological networks, and community discovery in social networks [14, 15]. As a result, motif discovery algorithms have gradually become an active research topic in data mining.

There are two main types of existing motif discovery algorithms [16]. (1) Based on subgraph enumeration: algorithms in this category are not effective at finding motifs with more than eight nodes [17]. (2) Based on frequency estimation: compared with the first type, algorithms in this category achieve better results in finding large motifs. However, they generally consume too many computing resources [18]. To deal with this problem, Lin et al. [16] proposed a solution based on GPUs (graphics processing units) to reduce the overall computational time; it parallelizes a great number of subgraph-matching tasks when calculating the frequency of subgraphs in random graphs. They also experimented on various biological networks and identified several key factors affecting GPU performance.

Compared with an edge in a general graph, which can only indicate the connection between a pair of vertices, a hyper-edge in a hyper-graph can contain multiple vertices. Mathematically, a hyper-graph is a graph that can represent connections among multiple vertices: an edge can link any number of nodes and is then called a hyper-edge. For instance, in a general scientific collaboration network, an edge can only represent whether two authors have a collaboration relationship. In a scientific collaboration hyper-network (a network with hyper-graph topology), however, a hyper-edge can represent an article written by several authors.

Since real-world relationships are often not simple binary relationships, research on hyper-graphs has gradually become a hot spot. Introducing hyper-edges can not only reduce the complexity of the network structure but also portray more complex relationships. At present, much research on hyper-edges and hyper-graphs focuses on the characteristics of hyper-networks. For example, in [19], Purkait et al. proved theoretically and experimentally that using large hyper-edges yields better clustering accuracy in hyper-graph clustering, and they also proposed an algorithm for sampling large hyper-edges. In [20], Kabiljo et al. proposed a distributed algorithm that can partition hyper-graphs with billions of vertices and hyper-edges in a few hours.

2.3. Macro-level

Rather than the interactions of individuals and communities, at the macro-level we analyze the structure of large-scale and complex networks in terms of network components, density, and so on. This level studies the big network as a whole.

Network density assesses how densely the nodes in a network are connected by edges. It is the ratio of the number of edges in the network to the maximum number of edges that the network can accommodate; that is, it gives the proportion of possible links that are actual links. Actual links are connections that exist in the network, whereas a possible link is a link between researchers that might exist in the network. For instance, in a particular scientific team, the actual links between researchers might be many (possibly even 100% of all possible links in the team). On the other hand, the number of actual links between researchers at a conference is likely to be low in comparison to the number of possible links. Hence, we could say that the network density is high in a scientific team but relatively low at a conference.

Network density D for an undirected network is mathematically represented as $D = \frac{2E}{N(N-1)}$, where N and E refer to the number of nodes and edges in the network, respectively. As an essential parameter in network science, it is mainly applied as an evaluation criterion in experiments [10].

Since there are often overlapping links in networks, it is important to study the overlap and multi-degree. The overlap in a multi-layer network can be divided into two types: global overlap and local overlap [21]. Global overlap between layer α and layer β can be defined as: $O^{\alpha\beta} = \sum_{i \ldots}$


Figure 1: The structure of a multi-layer network.

Besides the overlap of links, there might also be an overlap of motifs as well as an overlap of communities in a network. Li et al. [14] combined motif discovery and clustering to discover overlapping communities in social networks and achieved good experimental results.
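As a brief illustration of the density measure discussed in this section, the following Python sketch computes D for two small, made-up networks that mirror the scientific team and conference example. The function name `density` and the network sizes are assumptions introduced here for illustration only, and the formula assumes a simple undirected network (no self-loops, no repeated links).

```python
from itertools import combinations


def density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected network: D = 2E / (N (N - 1))."""
    if num_nodes < 2:
        return 0.0  # fewer than two nodes: no possible links
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Hypothetical scientific team: 4 researchers, every pair has collaborated.
team = ["a", "b", "c", "d"]
team_edges = set(combinations(team, 2))  # all 6 possible links are actual links

# Hypothetical conference: 10 attendees, only 5 pairs have collaborated.
conference_nodes = 10
conference_edges = 5

team_density = density(len(team), len(team_edges))
conference_density = density(conference_nodes, conference_edges)

print(f"team density:       {team_density:.3f}")
print(f"conference density: {conference_density:.3f}")
```

In this toy setting the team reaches the maximum density because every possible link exists, while the conference network is sparse.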

3. Big Network Models

In this section, we give comprehensive reviews of various big network models, including the time-aware BN model, the motif-based BN model, and the multi-layer BN model. In each subsection, we give an overview of the model, its categories, and the corresponding algorithms from the perspective of BNs.

A network is a prevalent form of representing information. For instance, a social network is a graph connecting people; biological networks contain regulatory structures, influences, and correlations in the form of a graph; and in academic social networks, researchers are linked through citations or co-authorship [6]. Networks can be static, where the vertices and links do not change over time, or dynamic, where both can appear or disappear throughout the lifetime of the network.

In a static network, the vertices do not change and the links remain the same permanently. In a dynamic network, by contrast, vertices may disappear and new vertices may form. Links may disappear as well, although they can be recovered or reappear. The topological structure of a dynamic network therefore varies over time. Some examples of real-world dynamic networks are social networks, transportation networks, and communication networks.

In this section, we summarize static and dynamic networks, focusing on the high-level topics that are crucial in big networks. For more comprehensive reviews, readers can refer to [23, 24, 25, 26].

The contents of a static network rarely or never change. For instance, the contents of a static website remain there for days, weeks, months, or even years (see Figure 2). A static network can be undirected or directed and unweighted or weighted.

Figure 2: A static web network that is directed. The connections between vertices (i.e., web pages, here a home page and Pages 1 to 4) depict the hyperlinks.

As stated in [24], there are two fundamental ways to represent static networks: the adjacency matrix and the link list. These representations highlight different features of static networks and are suited to specific kinds of computations. In the adjacency matrix representation, a network is described by an N × N matrix, in which two vertices are adjacent if a link connects them directly. Representing a static network using an adjacency matrix is beneficial when developing and quantifying the structure and dynamical processes of the network. However, it consumes much memory at computation time: processing a network with N vertices requires a complexity of $O(N^2)$. Given this limitation, the link list is an alternative representation of a static network. Unlike the adjacency matrix, the link list is efficient for randomizing links as well as for numerical experiments on networks with sparse interactions.

Numerous mechanisms are used to analyze the structure and characteristics of a static network, starting with measuring some properties of the network: for instance, (i) the degree distribution, which describes the connectivity of vertices in the network; (ii) the average path length, which indicates how fast information can propagate; and (iii) the clustering coefficient, which captures how tightly individuals in the network group together. Quantifying such statistics is a non-trivial task; hence, there are more sophisticated methods to analyze networks.
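The two representations and the three basic statistics named above can be sketched in a few lines of Python. This is a minimal illustration on a made-up five-vertex undirected network, assuming the network is connected so that every shortest path exists. All variable and function names (`link_list`, `neighbours`, `clustering`, and so on) are hypothetical and not taken from [24].

```python
from collections import Counter, deque

# Hypothetical undirected network with N = 5 vertices labelled 0..4.
N = 5
link_list = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]

# Adjacency matrix: N x N entries, so memory grows as O(N^2).
adj_matrix = [[0] * N for _ in range(N)]
for u, v in link_list:
    adj_matrix[u][v] = 1
    adj_matrix[v][u] = 1  # undirected network: the matrix is symmetric

# Neighbour sets derived from the link list: memory grows with the number of links.
neighbours = {v: set() for v in range(N)}
for u, v in link_list:
    neighbours[u].add(v)
    neighbours[v].add(u)

# (i) Degree distribution: number of vertices having each degree.
degree_distribution = Counter(len(nbrs) for nbrs in neighbours.values())


# (ii) Average path length via breadth-first search from every vertex.
def distances_from(source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in neighbours[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


total = sum(d for v in range(N) for d in distances_from(v).values())
average_path_length = total / (N * (N - 1))  # average over ordered vertex pairs


# (iii) Local clustering coefficient: fraction of neighbour pairs that are linked.
def clustering(v):
    nbrs = sorted(neighbours[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in neighbours[nbrs[i]]
    )
    return 2 * linked / (k * (k - 1))


print(dict(degree_distribution), average_path_length, clustering(2))
```

The matrix makes adjacency checks trivial but stores every absent link as a zero, whereas the link list stores only the links that exist, which is why it suits sparse networks.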
In some cases, data analysts are interested in a local network property, namely the frequency of occurrence of subgraphs in a network, i.e., network motifs (see Section 3.2). Similarly, to evaluate the importance of vertices in a network, analysts employ several measurements such as PageRank, Katz centrality, degree centrality, betweenness centrality, and closeness centrality [9].

Furthermore, one of the most crucial issues in big network analysis is analyzing the community structure of a network [27]. Scholars have therefore proposed numerous approaches to discover communities in a static network; one of the well-known methods is Infomap. Infomap is designed explicitly for directed and weighted static networks and aims to identify the non-overlapping community structure of a network. There are also methods that detect overlapping communities in static networks, such as the K-clique algorithm and the Lancichinetti method [24].

Rand et al. [28] studied the usefulness of static networks in the context of human cooperation. The authors claimed that a static network structure helps to make human cooperation stable: in a fixed network, interactions among cooperators become more intense, so that they benefit each other more. Rand et al. [28] presented evidence supporting the argument that static networks can promote human cooperation.

Networks that evolve over time are called temporal or dynamic networks; examples include transportation networks, social networks, communication networks, citation networks, and many more real-world networks [26, 29]. As stated in [30], the connections of a dynamic network are represented by a sequence of static networks, one per time slot. In essence, in contrast to static networks, dynamic networks take timestamps and temporal information into account.
Figure 3 shows a simple example of a dynamic network.

Figure 3: A dynamic network with five vertices (a to e), showing the evolution of interactions among vertices over different time spans (t1 to t5).

From the perspective of human behavior, Rand et al. [31] discussed that, in dynamic networks, changes occur in the behavior of an individual's connections in a social network. Moreover, the authors found that human cooperation decreases over time when a random-walking process takes place in social networks. Additionally, human cooperation decreases when changes in the network are infrequent and increases when they are frequent. The experimental results in [31] indicate that the dynamic nature of social networks can promote human cooperation in large groups of interactions. Similarly, Melamed et al. [32] showed that dynamic networks promote cooperation at higher levels when new connections are formed or existing connections are discarded.

Analyzing the structural characteristics of a dynamic network and measuring its properties serve the same purposes as for a static network. However, researchers have extended the models and methods proposed for static networks so that they fit dynamic networks. For instance, Luis et al. [33] proposed TempoRank, a random-walk-based measurement that quantifies the centrality of individuals in a temporal network. TempoRank is an extension of PageRank, which mainly works for static networks. In [34], the authors categorized the centrality measures of vertices for dynamic networks into two groups: time-dependent and time-independent centrality measures. The former identify changes in the importance of a vertex and capture the possibility that a vertex that is influential at a particular time may not be influential at other times, whereas the latter evaluate how important a vertex is in general. Recently, Koo et al. [35] proposed a ranking algorithm specifically for a dynamic web environment.

As in static networks, one of the challenging tasks in a dynamic network is community detection. Moreover, it is vital to analyze the structure of the interactions among vertices and how they evolve over time. Hence, Liu et al. [36] proposed a community detection method for dynamic networks called "persistent communities by eigenvector smoothing (PisCES)", which is derived from degree correction (accounting for heterogeneity of degree within clusters) and evolutionary spectral clustering techniques. The method merges information across a sequence of networks over time. In another work [37], scholars proposed DynComm, an R package for dynamic community detection in evolving networks. DynComm has an understandable application programming interface (API) that eases the detection of communities in a big dynamic network [38]. Table 1 briefly compares static and dynamic networks.
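To make the snapshot view of a dynamic network and the distinction between time-dependent and time-independent centrality more concrete, the following Python sketch stores a made-up dynamic network as one static edge set per time slot. Plain degree is used here as a deliberately simple stand-in for a centrality measure; it is not TempoRank or any of the cited methods, and the names `snapshots`, `degree_over_time`, and `aggregated_degree` are assumptions introduced for illustration.

```python
# Hypothetical dynamic network: one static, undirected edge set per time slot.
# Each edge is stored with its endpoints in alphabetical order.
snapshots = {
    "t1": {("a", "b"), ("b", "c")},
    "t2": {("a", "b"), ("c", "d")},
    "t3": {("a", "c"), ("a", "d"), ("a", "e")},
}


def degree(edges):
    """Degree of every vertex that appears in the given edge set."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Time-dependent view: the degree of each vertex within every snapshot.
degree_over_time = {t: degree(edges) for t, edges in snapshots.items()}

# Time-independent view: degree in the aggregated network (union of all snapshots).
aggregated = set().union(*snapshots.values())
aggregated_degree = degree(aggregated)

for t, deg in degree_over_time.items():
    print(t, {v: deg.get(v, 0) for v in "abcde"})
print("aggregated", aggregated_degree)
```

In this toy example vertex a has a low degree at t1 and t2 but the highest degree at t3, a change that the aggregated view alone would hide.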

Table 1: Comparison of Static and Dynamic Networks

Aspect | Static Network | Dynamic Network
Overview | Information either rarely or never changes | Information/data evolve and change over time; important for disclosing patterns that might be hidden in a more aggregated network
Centrality Measurements | Degree centrality, PageRank, Katz, and other classic measurements [9] | TempoRank [33], C-Rank [35]
Community Detection Methods | Infomap, Fast Unfolding Method [24] | DynComm [37], [39], [40]
Overlapping Community Detection Methods | K-Clique [24] | [41], [42]

Recently, network motifs have been receiving more attention from researchers, as they are useful for discovering the structure of big networks [43]. Researchers are adapting the concept of network motifs to analyze the structure of big networks, including social networks, co-authorship networks, biological networks, neural networks, protein-protein interaction networks, and so on. Different kinds of networks tend to have different collections of frequently occurring local structures [12]. In this section, we discuss network motifs, specifically the concept of a network motif and the algorithms for discovering network motifs in different scenarios within big networks.

The theoretical definition of a network motif was first proposed by Milo et al. [44], who described network motifs as "patterns of interactions occurring in complex networks at numbers that are significantly higher than those in randomized networks". Generally speaking, if a subgraph g′ occurs more frequently in a network G than it does in a random network, then g′ is labeled a network motif.

Network motifs help to understand big networks by identifying small functional subgraphs. These subgraphs are simpler to understand than the whole complexity of the big network at once. Subgraphs described by certain patterns of interactions among nodes may efficiently reveal structural characteristics of a particular network.

Milo et al. [44] discussed network motifs in a food web network, assuming a directed uni-partite network in which vertices and links represent groups of species and the flow of energy through the network, respectively. The analysis essentially looks for common patterns that occur among three species. Furthermore, considering the limited number of studies on network motifs in dynamic networks, Paranjape et al. [45] introduced a notion that gives insight into the importance of motifs in networks that evolve over time. They defined temporal motifs as "induced subgraphs on a sequence of temporal edges". They also proposed an algorithm that counts the motifs in a given temporal network.

Researchers have proposed several algorithms to identify patterns of recurring interactions and to determine which ones occur more frequently than expected at random. In this paper, we discuss two types of motifs: triangle motifs and higher-order motifs.
Besides, we present existing algorithms that tackle the challenges of network motif discovery by taking into account the complexity and size of the networks. The algorithms discussed here are selected approaches that are comparatively applicable to BNs.

A triangle motif in a network designates the interactions among three vertices. It helps in comprehending the inter-connectivity of vertices in a network. A triangle motif also describes the social pattern in a network [46] and can model social closure. Let us consider a static directed network S that is induced by the links of a motif T. For triangle motifs, S comprises 3 vertices and at least one directed link between every pair of vertices; S therefore consists of at least three and at most six static edges [45].

The higher-order network structure is associated with a graph and its subgraphs. In complex networks, the number of motifs is calculated for graph clustering and community detection. Higher-order motifs are computed to find the relations between pairs of nodes and the authority of the nodes [47]. Higher-order connectivity patterns are building blocks of a single homogeneous network and are essential for modeling the components of the network. A graphlet is a small connected subgraph, and the simplest non-trivial graphlet is a pair of nodes connected by an edge. Higher-order graphlets have a greater number of nodes and edges. Further, typed networks are used to uncover the higher-order organization of heterogeneous networks. A typed graphlet captures both the connectivity pattern and the types [48]. Important higher-order network structures such as cliques and big stars can be discovered interactively by the user in real time. Network motifs clearly identify the vital higher-order structures. Figure 4 shows the higher-order network structure of a small co-authorship network.

Figure 4: The high-order network visualization of a small co-authorship network. The different colors of the edges represent different high-order motifs that appear in the network.
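The idea that a motif is a subgraph occurring more often than in randomized networks can be illustrated with a small Python sketch. It counts undirected triangles in a made-up network and compares the count with an ensemble of randomized networks that keep the same degree sequence. The randomization by repeated double-edge swaps is one common choice and is an assumption here, not the procedure of any cited work; the function names, the example network, and the ensemble size are likewise hypothetical.

```python
import random
from statistics import mean, pstdev


def count_triangles(edges):
    """Number of triangles in a simple undirected network given as an edge list."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    # Every triangle is seen once from each of its three edges.
    return sum(len(nbrs[u] & nbrs[v]) for u, v in edges) // 3


def randomize(edges, swaps, rng):
    """Degree-preserving randomization by repeated double-edge swaps."""
    edge_list = [tuple(sorted(e)) for e in edges]
    present = set(edge_list)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # the two links share a vertex: skip this swap
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in present or new2 in present:
            continue  # swap would duplicate an existing link
        present.difference_update((edge_list[i], edge_list[j]))
        present.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


rng = random.Random(0)  # fixed seed so that the sketch is repeatable

# Hypothetical network: two triangles sharing vertex 2, plus a short tail.
observed = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]

real_count = count_triangles(observed)
random_counts = [count_triangles(randomize(observed, 100, rng)) for _ in range(200)]

mu, sigma = mean(random_counts), pstdev(random_counts)
# z-score of the observed count; left undefined if the ensemble has no spread.
z_score = (real_count - mu) / sigma if sigma > 0 else None

print("triangles observed:", real_count)
print("mean in randomized networks:", mu, "z-score:", z_score)
```

A clearly positive z-score would suggest that triangles are overrepresented in the observed network. For networks of realistic size, the exhaustive counting used here is exactly the step that becomes expensive.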

The baseline motif discovery approaches presented at the early stage primarily consist of two fundamental stages: 1) calculating the frequency of all subgraphs of a certain size in the network, known as the "subgraph census"; and 2) generating a set of random graphs with a degree sequence similar to that of the given network. In the second stage, the subgraph census is computed on each of the generated random graphs, from which the statistical significance of each class of isomorphic subgraphs is computed. The statistical significance is based on the probability of a pattern being overrepresented. The main limitation of such methods is the cost of computing the subgraph census, even in a network with a small number of nodes. Thus, in this section, we discuss recently proposed algorithms that take into account this limitation as well as the computational complexity when applied to a big network.

gLabTries

G-tries is a prefix tree data structure that stores a set of graphs efficiently by avoiding redundant storage of the subgraph information shared among common prefixes. Mongiovì et al. [49] proposed gLabTrie, a motif discovery algorithm for both undirected and directed networks. gLabTrie is an extension of the original G-tries motif discovery algorithm [50] and is a data structure for discovering motifs with constraints. As stated in [49], the performance of this method depends strongly on the network size. The fundamental change made in gLabTrie is the "label-based query". Mongiovì et al. [49] defined a label-based query as a quadruple Q = (C, k, f, p) containing a multiset of labels C, the requested motif size k, a frequency threshold f, and a p-value threshold p. When using gLabTrie, users supply sets of constraints as requirements, and the system generates the topologies for each specified constraint.
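The quadruple Q = (C, k, f, p) can be pictured as a small record together with a filter over candidate motifs. The Python sketch below is only an illustration of that idea and is not the gLabTrie implementation or its API. The class names, field names, and the matching rule are assumptions made here: a candidate is accepted when it has exactly k vertices, contains every label of C with multiplicity, occurs at least f times, and has a p-value of at most p.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelQuery:
    """Hypothetical container for a label-based query Q = (C, k, f, p)."""
    labels: tuple        # C: multiset of required vertex labels
    size: int            # k: requested motif size (number of vertices)
    min_frequency: int   # f: frequency threshold
    max_p_value: float   # p: p-value threshold


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate motif with precomputed statistics."""
    vertex_labels: tuple
    frequency: int
    p_value: float


def satisfies(query: LabelQuery, cand: Candidate) -> bool:
    """Check a candidate against the query under the assumptions stated above."""
    if len(cand.vertex_labels) != query.size:
        return False
    # Multiset containment: labels in C that the candidate does not cover.
    missing = Counter(query.labels) - Counter(cand.vertex_labels)
    return (
        not missing
        and cand.frequency >= query.min_frequency
        and cand.p_value <= query.max_p_value
    )


query = LabelQuery(labels=("A", "A", "B"), size=4, min_frequency=10, max_p_value=0.05)
candidates = [
    Candidate(("A", "B", "A", "C"), frequency=12, p_value=0.01),  # accepted
    Candidate(("A", "B", "C", "C"), frequency=12, p_value=0.01),  # only one "A"
    Candidate(("A", "A", "B", "C"), frequency=3, p_value=0.01),   # too infrequent
]
accepted = [c for c in candidates if satisfies(query, c)]
print(len(accepted))
```

Treating C as a multiset matters: a query asking for two vertices labelled "A" is not satisfied by a candidate that contains that label only once.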

VALMOD: Variable Length MOtif Discovery

To discover network motifs of variable length, Linardi et al. [51] proposed an algorithm called VALMOD. This algorithm can discover the top-k motif pairs of variable length. VALMOD is a scalable algorithm that users can employ to reveal accurate motifs efficiently. Besides the motif discovery algorithm, they also proposed a motif ranking approach named VALMAP. VALMAP is a meta-data series that mainly uses a new length normalization for ranking motif pairs of variable length.

LCNM: Large Co-regulatory Network Motif

Luo et al. [52] proposed an algorithm named large coregulatory network motif (LCNM) that aims to detect large coregulatory motifs with relatively low computational complexity. They mainly considered colored network motifs in a large human coregulatory network. Moreover, Luo et al. proposed methods for generating candidate subgraph patterns, such as quick sampling and random walking, as well as exhaustive counting to generate all subgraph patterns. The authors adopted G-tries so that the algorithm can store a set of motifs. Moreover, G-tries is improved in such a way that it can identify the maximum number of motifs larger than 4 nodes in a large network. Besides, a method that improves the computational complexity of motif discovery in a large network is also proposed [52]. However, it is still time-consuming when applied to a big network with thousands or millions of nodes. Unlike other methods, LCNM is able to discover motifs of up to 8 interacting nodes.
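For reference, the two-stage baseline that the algorithms in this section improve upon (a subgraph census followed by a comparison against random graphs with the same degree sequence) can be sketched as follows. This is a minimal illustration under stated assumptions, not any of the surveyed implementations: it counts only triangles as the motif class, randomizes by double edge swaps, and reports a z-score as the significance measure. The function names `count_triangles`, `rewire`, and `motif_z_score` are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_triangles(edges):
    """Subgraph census for one motif class: triangles in a simple undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def rewire(edges, n_swaps, rng):
    """Return a randomized copy with the same degree sequence (double edge swaps)."""
    edges = [tuple(sorted(e)) for e in edges]
    present = set(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 == new2 or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges


def motif_z_score(edges, n_random=200, seed=0):
    """Compare the observed census with the census on degree-preserving random graphs."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    null = [count_triangles(rewire(edges, 10 * len(edges), rng)) for _ in range(n_random)]
    mu, sigma = mean(null), pstdev(null)
    z = (observed - mu) / sigma if sigma > 0 else None  # undefined if the null never varies
    return observed, mu, z
```

Even in this toy form, the census is recomputed on every random graph, which is the cost that makes the baseline impractical on big networks.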

Recently, multi-layer networks (MLNs) have been attracting attention from scholars in many disciplines, including economics, infrastructure, climate, neuroscience, and so on. MLNs were introduced in the context of the social sciences to explain distinct types of social interactions existing among the vertices of social networks [53]. Two or more interrelated networks form a multi-layer network, and one typical example of an MLN is a social network [54]. Describing MLNs is critical to comprehending complex and big networks such as brain networks [55], transportation networks [56], big scholarly networks, and so forth. Also, the MLN framework makes it easier to characterize the structure of big networks. Furthermore, it provides a more comprehensive perspective of big networks than the framework of a single-layer network [53].

Definition 1.

A multi-layer network has a set of vertices, edges, and layers, G(V, E, L). A layer captures one particular characteristic of a given network. Moreover, a multi-layer network is a combination of networks at different layers with distinct types of edges (i.e., multiple types of interactions) among vertices. Also, Bianconi [53] defined a multi-layer network as follows.

Definition 2.

A multi-layer network with M distinct layers is formed by a set of M networks describing the interactions within each layer and M(M − 1)/2 networks describing the interactions between nodes in every pair of different layers. Additionally, in [53], an MLN is mathematically defined as:

Definition 3.

An MLN is given by the triple $G_M = (\acute{Y}, \acute{G}, \mathcal{G})$, where $\acute{Y}$ denotes the set of layers of the MLN, such that $\acute{Y} = \{\alpha \mid \alpha \in \{1, 2, \ldots, M\}\}$, and $M$ denotes the total number of layers, i.e., the cardinality $M = |\acute{Y}|$.

The network $G_M$ has $n$ vertices in each layer, $V = \{1, 2, 3, \ldots, n\}$, and $M$ layers with different characteristics. Each layer contains a set of vertices. The vertices can create links within a layer (i.e., intra-layer links) as well as across layers (i.e., inter-layer links). For example, assume there is a scholarly multi-layer network with two layers in which the first layer is a citation network and the second one is a co-authorship network. In the citation network, vertices represent papers and edges represent citations between them. In the co-authorship network, authors are vertices, and they are connected if they have co-authored one or more papers together. The interactions between these two different networks form an authorship network, i.e., authors linked to the papers they wrote.

The MLN framework reduces the challenges that arise while measuring the centrality of vertices, detecting communities, discovering influential communities, predicting links, and making recommendations in a big network.

Mucha et al. [57] proposed the first community detection algorithm for a multi-slice network. A multi-slice network is a kind of multi-layer network in which different networks are tied together by connections that link each vertex from a specific slice to another. The proposed algorithm allows the analysis of the community structure of a network that changes over time, i.e., a temporal network. The type of network considered in their study has several scales and links with distinct characteristics. The authors implemented their algorithm on different real-world networks and obtained satisfactory results.

Additionally, in [58], another approach has been introduced mainly to identify consensus clusterings in a multi-layer network. The method produces accurate and stable results derived from the partitions provided by stochastic approaches. Moreover, when combined with other existing community detection algorithms, it enhances the accuracy and stability of the generated partitions. Also, the authors claimed that the method is suitable for characterizing and keeping track of the community structure of temporal networks. Lancichinetti et al. [58] applied the method to large-scale citation networks and observed its capability to handle the structure of multi-layer networks.

De et al. [59] proposed an algorithm that generates overlapping communities in a multi-layer network, i.e., the method identifies communities across layers that originate from similar interactions.

Furthermore, Raul et al. [60] introduced a method that discovers the rich community structure of multi-layer networks by associating each multi-link with a community. The multi-links portray the associations present amongst the vertices of the multi-layer network, and they are a combination of a distinct number of appropriate layers.
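To make the multi-layer structure used by the methods above more concrete, the two-layer scholarly example (a citation layer, a co-authorship layer, and authorship links between them) can be represented as in the following sketch. The class `MultiLayerNetwork` and its method names are hypothetical and chosen only for illustration; the sketch simply stores intra-layer and inter-layer links separately and reports the M(M − 1)/2 bound on inter-layer networks from Definition 2.

```python
from collections import defaultdict


class MultiLayerNetwork:
    """Toy container: one edge set per layer plus one edge set per ordered layer pair."""

    def __init__(self, layers):
        self.layers = list(layers)
        self.intra = {layer: set() for layer in self.layers}
        self.inter = defaultdict(set)

    def add_intra(self, layer, u, v):
        # Intra-layer link: both endpoints live in the same layer.
        self.intra[layer].add((u, v))

    def add_inter(self, layer_u, u, layer_v, v):
        # Inter-layer link: endpoints live in two different layers.
        if layer_u == layer_v:
            raise ValueError("inter-layer links must connect two different layers")
        self.inter[(layer_u, layer_v)].add((u, v))

    def max_inter_layer_networks(self):
        # Number of unordered layer pairs, M(M - 1)/2.
        m = len(self.layers)
        return m * (m - 1) // 2


net = MultiLayerNetwork(["citation", "coauthorship"])
net.add_intra("citation", "paper1", "paper2")          # paper1 cites paper2
net.add_intra("coauthorship", "alice", "bob")          # alice and bob co-authored a paper
net.add_inter("coauthorship", "alice", "citation", "paper1")  # alice wrote paper1
net.add_inter("coauthorship", "bob", "citation", "paper1")    # bob wrote paper1
```

With two layers there is a single inter-layer network, which here plays the role of the authorship network.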

Quantifying the centrality as well as the ranking of vertices in a multi-layer network is as critical as it is in a single-layer network. Thus, numerous approaches have been proposed by interested scholars. Many of the measures proposed to identify the importance of vertices in single-layer networks have been extended to multi-layer networks. For instance, the PageRank method is extended to Multiplex PageRank [39], which assesses the centrality of vertices in multi-layer networks. Mainly, Multiplex PageRank evaluates how the centrality of a vertex in one layer influences its centrality in another layer. For example, suppose we have a co-authorship network containing a collection of scholars who work specifically on big data. Scholar A is a prestigious scholar with a high centrality score in this network. Thus, if A takes part in another scientific team that works specifically on cloud computing, the centrality score of A might be affected. Additionally, it influences the centrality of A in the other co-authorship network with a different research area. Hence, according to the experiment done by Abrahao et al. [39], a vertex’s centrality in a particular layer might affect the centrality of the same vertex in another layer.

Additionally, considering the limitations of Multiplex PageRank, Rahmede et al. [61] introduced an algorithm that effectively ranks the vertices as well as the layers of an MLN. The centrality of the vertices and the influence of the layers depend on each other. Moreover, the authors argued that a layer with more central vertices attains a more significant influence than layers with less central vertices. Luis et al. [62] extended the standard eigenvector centrality measure to MLNs. The method measures the importance of vertices in MLNs.
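The cross-layer effect described for scholar A can be illustrated with a small sketch. It is a simplified stand-in and not the exact formulation of [39]: here the PageRank scores from one layer are simply used as the teleport (restart) distribution when ranking the second layer. The function `pagerank` and the toy layers `big_data` and `cloud` are assumptions made for the example.

```python
def pagerank(nodes, out_links, teleport=None, damping=0.85, iterations=100):
    """Power-iteration PageRank with an optional non-uniform teleport distribution."""
    n = len(nodes)
    if teleport is None:
        teleport = {v: 1.0 / n for v in nodes}
    else:
        total = sum(teleport[v] for v in nodes)
        teleport = {v: teleport[v] / total for v in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        received = {v: 0.0 for v in nodes}
        dangling = 0.0
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = rank[u] / len(targets)
                for v in targets:
                    received[v] += share
            else:
                dangling += rank[u]  # mass of vertices without out-links
        rank = {
            v: (1 - damping) * teleport[v]
            + damping * (received[v] + dangling * teleport[v])
            for v in nodes
        }
    return rank


nodes = ["A", "B", "C", "D"]
# Layer 1 (big data): A is the hub of a star.
big_data = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
# Layer 2 (cloud computing): C is the hub, A is only a peripheral member.
cloud = {"C": ["A", "B", "D"], "A": ["C"], "B": ["C"], "D": ["C"]}

plain = pagerank(nodes, cloud)
biased = pagerank(nodes, cloud, teleport=pagerank(nodes, big_data))
```

In this toy setting, A, B, and D are structurally identical in the cloud layer, so `plain` gives them equal scores, whereas `biased` lifts A above B and D because of A's prominence in the big data layer.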

4. Technologies in Big Networks

In this section, we introduce state-of-the-art technologies for BNs, such as ranking approaches and partitioning algorithms, as well as an overview of network embedding and the available techniques.

The main idea of ranking is mining information available in the cloud or in any storage area. The aim of ranking is to extract the data that are appropriate for their intended purpose. Some instances that clearly show the impact of ranking are: how recognizable human merit and success are [63, 64], how to detect and prevent an infectious disease that breaks out without warning [65], how to assign funding for scientific research, and how to identify key authors in multi-authored papers.

Understanding the network representation of any input data is a critical part of ranking algorithms. Nowadays, the complex network has emerged as one of the most promising approaches to analyzing different categories of complex data, such as financial, information-system, and social data [66]. Network representation helps to minimize the complexity of any system. It also enables users to comprehend the structure and dynamics of any complex system.

There are abundant surveys and literature reviews that cover ranking methods [67]. In this review, we discuss algorithms designed particularly for ranking vertices, motifs, and communities. Table 2 shows a summary of the ranking methods. Note that network type refers to weighted or unweighted and directed or undirected.

Discovering the most important nodes in large-scale and complex networks has attracted great attention from scholars [72, 73, 74]. Recently, plenty of approaches have been designed to identify influential vertices in large-scale as well as complex networks. Some of the traditional and well-known methods are the centrality measures [9]: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Additionally, PageRank, HITS, and Katz centrality are other typical ranking methods applied in many areas. Considering that classical methods do not perform well on big networks, scholars have proposed numerous methods.

Chen et al. [68] introduced a local vertex ranking approach called ClusterRank that takes the clustering coefficient of a vertex into account. Hu et al. [69] proposed a novel method called E-Burt that ranks nodes to discover important ones by applying structural holes. As stated in [69], a structural hole is a gap between individuals who have neither direct nor indirect redundant relations but who have complementary sources of information. This method can be implemented in weighted networks. It considers three factors: the local connection strength of the vertex, the number of links that connect the vertices, and the distribution of the connectivity strengths over its links. To quantify the constraint on a vertex in forming structural holes, the authors in [69] employed the constraint coefficient. A vertex with a smaller coefficient can more easily form structural holes and is more influential. Hu et al. [69] claimed that the more influential the vertices are, the stronger the disseminating capability they will have in the network.

Table 2: Summary of Vertex Ranking Methods.

Similarly, Wei et al. [71] introduced a practical approach to identifying influential vertices built upon network representation learning (NRL). NRL aims at learning distributed vector representations for all vertices in a given network. This approach considers the structure of a given network, including the overlapping communities found in it. In this method, information is distributed to several communities via the vertices in community overlaps. Wei et al. [71] claimed that if a vertex is a member of more communities than other vertices, then there is a high probability that this vertex will influence more communities than the others. According to the experiments in [71], the method is applicable to networks that are complex and large-scale.

Salavati et al. [70] proposed an influential node detection method that takes into account the closeness centrality of vertices in a network. The authors proposed a ranking algorithm called BridgeRank that improves the closeness centrality measure using the local structure of vertices. The proposed method works as follows. First, it finds the local centrality score of each vertex. Next, it extracts one prominent vertex from each community using the centrality value. Finally, the method ranks the vertices according to the sum of the vertices’ shortest path lengths and outputs the influential vertices. According to [70], the method identifies vertices with a high information-spreading capability at a low computational time. Moreover, the method is suitable for complex and large-scale networks compared to other benchmark methods.

There are numerous methods with disparate approaches but similar objectives designed to quantify similarities amongst DNA motifs.
There are also approaches that mainly focus on discovering, grouping, comparing, and ranking network motifs [75, 76]. In this section, we present some of the methods that are applicable to BNs, in chronological order.

Considering the lack of methods that discover, match, compare, and cluster known motifs, Kankainen et al. [77] developed a web-based tool called Matlign. Matlign fills these gaps and, in particular, reduces the repetition of similar motifs. Matlign mainly facilitates post-processing such as clustering, matching, and comparing DNA sequence motifs. Matlign is applied to transcription factor databases, which store profiles of transcription factor binding sites. In such cases, motifs can be represented in two formats: position frequency matrices or consensus sequences. Thus, Matlign facilitates the post-processing of discovered motifs in both formats. It starts from a massive number of pre-identified motifs, and discovers, aligns, and evaluates the similarities of motifs generated by prediction tools. Consequently, the tool clusters the discovered motifs together and generates a set of non-redundant motifs. Kankainen et al. [77] concluded, based on their extensive comparative analysis, that their tool outperforms previously proposed methods.

Similarly, Habib et al. [78] designed a method that identifies discovered motifs, compares them with already-known motifs, and gives a set of non-redundant motifs. The method initially adopts relevant motif discovery algorithms for detecting new motifs and filters them in accordance with their abundance amongst the given set of sequences. Afterward, the newly detected motifs are clustered and merged to obtain a non-redundant group of motifs. Finally, the method ranks and identifies a non-redundant set of motifs. Compared with other approaches, this method is more suitable for BNs.
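The two motif formats mentioned above for Matlign, position frequency matrices and consensus sequences, are related in a simple way that the following sketch illustrates. It is not Matlign's procedure: it assumes a plain majority rule per position (ties broken in the order A, C, G, T) rather than IUPAC ambiguity codes, and the function name `consensus_from_pfm` and the example matrix are hypothetical.

```python
def consensus_from_pfm(pfm):
    """Derive a consensus sequence by taking the most frequent base at each position."""
    length = len(pfm["A"])
    # max() returns the first maximal base, so ties resolve in the order A, C, G, T.
    return "".join(max("ACGT", key=lambda base: pfm[base][i]) for i in range(length))


# Toy position frequency matrix: one list of counts per base, one entry per position.
example_pfm = {
    "A": [8, 0, 1],
    "C": [1, 0, 7],
    "G": [0, 9, 1],
    "T": [1, 1, 1],
}
consensus = consensus_from_pfm(example_pfm)
```

The consensus sequence is the more compact of the two formats but discards the per-position counts, which is why tools that compare motifs usually need to handle both.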

Numerous real-world BNs such as co-authorship networks, social networks, neural networks, and so on comprise community structures [79]. Over the past few decades, identifying clusters/communities in a complex and large-scale network has been one of the most crucial problems attracting scholars’ attention [27]. The community identification problem focuses on discovering the communities/clusters present in a particular network. However, community detection approaches fail to consider the most influential communities amongst the discovered ones. Most of the approaches identify key vertices and form a community surrounding them. Identifying the top influential community plays a critical role, for instance, in finding the community that is capable of spreading information fastest to other communities in a network [80]. Moreover, Li et al. [81] discussed that one vital feature of a community is the ability to propagate information to outsiders. As another instance, assume that Ana is a new big data researcher and she wants to investigate some specific research problem. Hence, she wants to discover the most influential research teams in a co-authorship network in which Big Data related research issues are investigated. The discovered teams are supposed to be beneficial for producing quality research work.

Thus, a few research works have recently been done on this problem. Li et al. [79] were the first to formulate the problem of unraveling the most prominent communities in a large network. Subsequently, Doo et al. [82] proposed an influential community detection approach for undirected networks. Doo et al. described a community’s influence as “the minimum weight of vertices in that specific community and a community with the largest influence value considered as the top influential community.” In another work, Du et al. [83] proposed a community ranking method that classifies communities based on their strength, which changes over time. Moreover, Faisal et al. [84] discussed notable scenarios that emphasize the need for and significance of discovering the most influential communities in a particular network. From the perspective of BNs, identifying the most influential community could require less complexity and computational time than identifying key vertices in a whole big network.

Li et al. [79] proposed a model, the “k-influential community”, that can capture an influential community in a network by adopting the idea of the k-core. To begin with, Li et al. [79] gave a formal definition of ‘influential’ at the individual and community levels. Li et al. suggested numerous approaches and optimizations for investigating the “finding of influential communities” research problem. Based on their model, they introduced an online search method aiming to unravel the “top-r k-influential communities” of a given undirected network. Furthermore, to speed up the search process, they proposed a “linear space index structure,” which supports efficient search of the “top-r k-influential communities” in optimal time. They evaluated the algorithms on different large-scale networks. Considering the limitation (i.e., high time complexity) that occurs when applying the influential community model to big networks, Li et al. [85] proposed an improved approach called Influential Community-Preserved Structure (ICPS).
ICPS preserves k-influential communities and takes space linear in the size of the network.

Zhan et al. [80] introduced a method that discovers top-k influential communities in a big network by adopting a well-known centrality measure, Katz centrality. They used Katz centralities to define the strength of communities. They assumed that an influential community is one that connects to a larger number of communities. In such a case, information can be disseminated immediately to the largest possible number of communities in the network. Zhan et al. employed two main factors to rank the communities in a network. First, they compute the average Katz centrality value of the vertices in a particular community. Second, they determine the total number of communities into which a particular community could propagate information. To do that, they calculate the interactions of the vertices in a community with vertices in different neighboring communities. A community with a higher Katz centrality value is considered the most influential community if it is able to share information with the maximum number of different communities in the network apart from the disseminator community [80].

Bi et al. [86] proposed a method called LocalSearch, which is an instance-optimal algorithm with linear computational complexity. On top of that, they introduced an approach that runs LocalSearch progressively to compute and report top-k influential communities in descending order of influence value. The subnetwork’s influence value is defined as “the minimum weight of the vertices in a subnetwork”. Unlike the methods discussed previously, this does not require the value of k to be specified. As described in [86], a user has the option to end the algorithm as soon as the desired influential communities have been generated.

4.2. Partition Algorithms

Partitioning is a decomposition technique that optimizes the handling of complex systems.
Partitioning techniques decompose a big network into manageable smaller subnetworks called clusters or communities. Hence, any BN application can be applied to the subnetworks independently, which reduces the complexity and computational costs. Partitioning methods have to minimize the linkage amongst the subnetworks.
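One common way to quantify the linkage amongst subnetworks is the edge cut, i.e., the number of edges whose endpoints fall into different parts. The sketch below is only an illustration of that measure under this assumption; the function name `edge_cut` and the toy network are hypothetical, and the surveyed methods may optimize other criteria.

```python
def edge_cut(edges, community_of):
    """Count the edges whose two endpoints are assigned to different communities."""
    return sum(1 for u, v in edges if community_of[u] != community_of[v])


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

good_partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
poor_partition = {0: "a", 1: "b", 2: "a", 3: "b", 4: "a", 5: "b"}

good_cut = edge_cut(toy_edges, good_partition)  # only the bridge is cut
poor_cut = edge_cut(toy_edges, poor_partition)  # most edges cross between parts
```

A partition with a small edge cut leaves few dependencies between the subnetworks, which is what allows them to be processed independently.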

Definition 4.

Given a network G(V, E), wherein each vertex v ∈ V and V is considered as the total size of the network in terms of vertices, the problem of partitioning is to divide V into κ disjoint subnetworks $\{v_1, \ldots, v_\kappa\}$ such that the functionality of the network is optimized, based on certain constraints.

While applying partition algorithms (PAs), the number of communities is initially given as an assumption, together with a network G of V vertices. Subsequently, PAs arrange the vertices into κ partitions (κ ≤ V), where each partition indicates a cluster/community and each vertex belongs to only one community. This implies that the clusters/communities do not overlap; in essence, there is a high intra-community and a low inter-community similarity. The communities are formed on the basis of a distinct partitioning measure. The vertices within a community formed by a PA are similar to one another, while they are dissimilar to the vertices in other communities.

Implementing partitioning algorithms on BNs is vital to address some challenging issues like detecting influential vertices in a community, recommendation, link prediction, etc. For example, identifying the most influential author in a whole big co-authorship network could be time-consuming. Thus, partitioning the network reduces the computational time and complexity of discovering the influential authors.

There are some traditional partitioning methods such as CLARANS, κ-medoids, and κ-means. In the case of κ-means, each community is represented by its center, whereas in κ-medoids, a single vertex represents the community it belongs to. We briefly discuss these methods in the following subsection.

κ-means Algorithm

In this method, κ is an input parameter, which is the total number of communities a network G is assumed to have.
The κ-means algorithm proceeds as follows. First, it partitions the vertices into κ non-empty subnetworks, each of which represents a community/cluster. Next, κ-means computes the key point of each community in the current partition as its centroid, i.e., the central point of the community. Subsequently, it assigns each vertex to the community with the nearest key point. Afterward, it recalculates the mean value of each community. The κ-means process iterates until the partitioning criterion converges [87]. In most cases, having to assume the number of communities (i.e., κ) in advance is considered one limitation of the κ-means algorithm. Moreover, as far as BNs are concerned, defining mean values for each cluster may become costly, which makes the κ-means algorithm less applicable to BNs.

k-medoids Algorithm

As the name implies, the k-medoids algorithm takes a medoid, the most centrally placed vertex, as the reference point of a community rather than the community’s mean value. As stated in [88], a medoid is “a statistic metric which represents that data member of a data set whose average dissimilarity to all the other members of the set is minimal.” In the k-medoids algorithm, non-central vertices are clustered with the most closely related representative vertex. PAM (Partitioning Around Medoids) is a k-medoids algorithm that can be effectively implemented on small datasets yet fails to work well on big networks [87, 89]. The k-medoids algorithm is implemented as follows. The number of partitions and the dataset are given. Initially, it chooses k vertices as medoids. Next, it assigns the non-selected vertices to their nearest medoids. Then, it computes the total cost of swapping a medoid with a non-medoid vertex in order to find a new collection of medoids. The algorithm iterates until no further change is required.
In this algorithm, each iteration has a computational complexity of $O(\kappa(V - \kappa)^2)$, which makes it unfit for BNs.

However, there are extended algorithms built on the κ-means and k-medoids algorithms. The extended approaches are applicable to BNs. We briefly discuss the state of the art of recently proposed partitioning algorithms.

Clustering Large Application

The clustering large application (CLARA) algorithm is considered to be an extension of the k-medoids method. It was designed to address the lack of partitioning algorithms for large datasets and to overcome the limitations of partitioning around medoids [89].

Clustering Large Algorithm Based on Randomized Search

Considering the inability of the k-medoids method to handle complex and large networks, researchers proposed a method built on k-medoids called clustering large algorithm based on randomized search (CLARANS) [90]. CLARANS adopts a random search technique to speed up the clustering and partitioning of large datasets [90]. As mentioned earlier, CLARANS was proposed on the basis of PAM and CLARA. From the viewpoint of BNs, CLARANS is preferable as far as efficiency and effectiveness are concerned.

MapReduce-based Parallel k-Medoids Clustering Algorithm

Shafiq et al. [91] proposed a MapReduce-based clustering algorithm that can be applied to big datasets. As stated in [91], the authors considered the growing nature of real-world networks with respect to velocity, volume, and variety. In contrast to other classical partitioning methods, this method achieves parallelization regardless of the number k of clusters to be identified. Judging from the experimental results in [91], we believe that this method is suitable for BNs. Table 3 compares the partitioning algorithms surveyed in this paper.

Table 3: Comparison of Partitioning Algorithms. The notations n, k, and m in the time complexity denote the numbers of points, clusters/medoids, and vertices in which the data is distributed in case of [91], respectively.

| Criterion | K-means | K-medoids | CLARA | CLARANS | Reference [91] |
| Time complexity | O(nk) | O(k(n − k)^2) | O(k(c + k) + k(n − k)) | O(k + nk) | O(nk/m) |
| Efficiency | less | better than k-means | better than the previous | better performance | comparatively more efficient |
| Pre-determine k | yes | no | no | no | no |
| Optimization | small networks | small networks | comparatively larger networks | large-scale networks | BNs |
| Advantages | works well for small-scale datasets | easily understandable; works in a fixed number of steps; less susceptible to outliers than k-means | can handle larger datasets than the k-means and k-medoids algorithms | gives a better result than other methods; easily handles outliers; comparatively better on large-scale datasets | works comparatively well on BNs; scalable and effective |
| Disadvantages | predicting the k-value and comparing the quality of the clusters are challenging tasks; does not work well for BNs | high time complexity compared to k-means; not suitable for BNs | its efficiency depends on how big the network is; there is a possibility of obtaining inaccurate clusters | although it is designed for large-scale datasets, it is not as efficient | the computational time might be higher as the size of the datasets increases |

4.3. Network Embedding Algorithms

The emerging availability of big networks containing billions of vertices and edges has significantly advanced network analysis. Network embedding learns an efficient low-dimensional vector representation for vertices. Due to this, big data analysts consider using network embedding for numerous BN applications such as community detection, link prediction, vertex clustering, recommendation, and network visualization. In network embedding methods, the distance between vertices in the vector space captures the interactions between vertices. A vertex’s structural and topological features are encoded into its vector representation [92].

The classical network representation commonly uses the adjacency matrix, which might contain redundant or noisy information.
In contrast, network embedding, or network representation learning (NRL), tends to learn condensed and continuous vertex representations in a low-dimensional space. NRL not only minimizes the redundant and noisy information but also preserves the fundamental structural information [92]. Challenges that arise during network analysis, such as high computational cost, can be avoided by calculating distance metrics on the embedding vectors and by computing mapping functions on them. Network embedding approaches overcome most of the challenges of representing and analyzing big networks. Cui et al. [92] clearly illustrated the benefits of adopting network embedding over the classical approaches. In this section, we briefly explain recently proposed state-of-the-art network embedding approaches for both homogeneous and heterogeneous networks.

DeepWalk [93] is a network representation learning model that learns low-dimensional representations of vertices in social networks in an unsupervised way. DeepWalk takes a graph as input and outputs latent representations. Furthermore, DeepWalk learns representations from local network information obtained through random walks, and the learned representations are further used to classify vertices. The principle of the DeepWalk method was later extended to a semi-supervised algorithm called Node2vec [94]. Node2vec replaces the random-walk scheme of DeepWalk with biased random walks, which explore diverse neighborhoods and the network structure more effectively. Node2vec is a scalable algorithm for learning continuous feature representations of nodes in a network [94]. Moreover, it maps vertices to a low-dimensional feature space that maximizes the likelihood of preserving the neighborhoods of vertices in a given network.

Tu et al. [95] designed a method, referred to as “Max-Margin DeepWalk (MMDW)”, that aims to overcome a limitation of DeepWalk. MMDW addresses the lack of discriminative power of the learned representations when they are applied to machine learning tasks. MMDW is a semi-supervised NRL model that jointly optimizes the max-margin classifier and the targeted social NRL model. Additionally, the representations learned by MMDW are discriminative in addition to encoding the network structure. With a similar objective, another method referred to as Discriminative Deep Random Walk (DDRW) was proposed [96].

Tu et al. [97] introduced a model named Context-Aware Network Embedding (CANE), assuming that a vertex can exhibit diverse features when interacting with different neighboring vertices. Thus, CANE precisely models the semantic relationships amongst vertices.
On top of that, CANE learns the context-awareembedding for each vertex, unlike other network embedding approaches pro-posed prior to CANE.Ribeiro et al. [98] presented a ﬂexible and robust framework called struc2vec to learn the latent representation by taking into account the structural identityof vertices in a network. Structural identity is a symmetry notion in whichvertices in a network are discovered based on the structure of the network andtheir connection to other vertices. The struc2vec method employs a hierarchicalapproach to quantify vertex similarity at a distinct range. Moreover, it builds34 multi-layer network for performing and generating the structural similaritiesas well as context for vertices, respectively. Enormous real-world networks that are a combination of vertices and edgeshave a dynamic nature that changes over time. Having considered that, schol-ars proposed a network embedding model called Dynamic Attributed NetworkEmbedding (DANE). DANE concerns learning a representation of the changingattributes of vertices in a dynamic network [99]. DANE is an online frame-work that can eﬀectively learn representation. DANE aimed to overcome somechallenges happened while embedding representation in a changing network.One of the challenges is the possibility of incomplete features of vertices andnoisy correlated network that demands a vigorous learning representation. Thismethod gives online end embedding results by using matrix perturbation theoryfollowing the consensus embedding representation. Likewise, Yang et al. [100]proposed a “MultiView Correlation-learning based Deep Network Embedding”method, shortly referred to as MVC-DNE. MVC-DNE especially contemplatesthe attributes of vertices as well as the overall network structure as two inter-connected views in which the learned embedded representation vector returnsits attributes in both views. Goyal et al. 
[101] proposed a method that employsedges in the network and labels associated with the edges for learning vertexembeddings. This method considers optimizing higher-order vertex neighbor-hood, roles, as well as characteristics of edges re-construction error by adoptingdeep-architecture.
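To make the random-walk sampling behind DeepWalk and Node2vec more concrete, the following is a minimal Python sketch, not the reference implementation of either method. It assumes an undirected, unweighted graph given as an edge list. The function names are hypothetical, and the return parameter `p` and in-out parameter `q` follow the Node2vec paper [94] as we understand it; they are not described in this survey. With `p = q = 1` the walk is uniform, which corresponds to a DeepWalk-style walk.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one second-order random walk of at most `length` vertices."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])
        if not neighbours:
            break  # dead end: isolated vertex
        if len(walk) == 1:
            # The first step has no previous vertex, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous vertex
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay close to the previous vertex
            else:
                weights.append(1.0 / q)  # move away from the previous vertex
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start `num_walks` walks from every vertex, in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, length, p, q, rng))
    return walks
```

The sampled walks would typically be treated as sentences and passed to a skip-gram model to obtain the vertex embeddings; that training step is omitted here.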

A semi-supervised approach for heterogeneous social networks helps with the classification and tagging of vertices of different types, each with its own labels [102]. In this method, the different vertex types are brought together into a common latent space in which they share similar features. It thus overcomes the limitation of relying on direct connections to understand the correlation between vertices. Traditionally, heterogeneous networks are analyzed by mapping them to homogeneous networks, which cannot extract the complete information. The general assumption of this approach is that vertices which are not directly connected are nevertheless interdependent, and such dependencies cannot be captured by a homogeneous approach. Furthermore, by learning the dependencies between heterogeneous vertices, both local and global characteristics are captured.

Chen et al. [103] addressed the problem of calculating distance measures between heterogeneous entities. In data-driven applications, security depends on the detection of anomalies. The underlying events are heterogeneous, and most existing works use heuristic techniques to score them. In [103], the authors embedded these entities into a common latent space based on their co-occurrences. Specifically, the pairwise compatibility of events is modeled through weighted interactions between different entity types. This model makes use of “Noise-Contrastive Estimation,” and it works well regardless of the latent space.

Fu et al. [104] presented a neural network model named HIN2Vec, which is designed to represent the rich semantic information embedded in heterogeneous vertices. The model takes as input a set of meta-paths that specify the relationships of interest. It then performs prediction tasks on the targeted set of relationships to learn latent vectors of vertices. This model captures a broad class of semantic relationships between nodes based on their context.

Qu et al. [105] investigated the problem of finding the optimal order in which to select edges in a heterogeneous star network. A heterogeneous star network comprises a central vertex and a set of attribute vertices connected to the central vertex via various types of edges. Learning vertex representations in a heterogeneous star network has a variety of applications. Earlier approaches did not treat the order of sampling as a critical factor. However, this order plays a critical role in learning the low-dimensional vectors. Qu et al. modeled the problem of learning node representations as a Markov decision process and used a deep reinforcement learning algorithm to find the optimal order.

Wang et al. [106] proposed a signed heterogeneous information network embedding method named SHINE. They addressed the problem of predicting user sentiment in a heterogeneous information network. Existing approaches focus mainly on text for predicting user sentiment. Moreover, the lack of explicit labels and the complexity of generating them make the prediction task challenging. Wang et al. [106] built a labelled data set of users consisting of user sentiment, social relations, and profile knowledge. They then used a signed heterogeneous information network framework to extract latent representations for accurate prediction. SHINE uses a deep-learning-based embedding mechanism to understand and extract users’ inclinations towards a topic.

5. Big Network Applications

This section comprises three subsections that elaborate on a wide range of state-of-the-art applications of BNs, including community detection approaches in different categories, link prediction approaches, and recommendation systems. This review provides readers with an up-to-date picture of the complex network field from the viewpoint of BNs.

The main target of community detection is to disclose all available communities in a network according to a specific definition of community for a given problem. A community is a collection of vertices that are densely linked to one another and sparsely linked to the rest of the network. As “community” has been given various definitions, community detection approaches can be classified as follows [107]: i) hierarchical clustering, which unravels the multilevel community structure of a graph by measuring the similarity of each pair of nodes; ii) graph partitioning, which splits the nodes of a network into k clusters according to a pre-defined threshold; iii) spectral clustering, which separates the graph using the eigenvectors of a matrix associated with the graph; and iv) partition clustering, which splits nodes into k clusters in such a way that the similarity amongst nodes is maximized.

The problem of discovering community structures of BNs is ubiquitous in diverse types of networks, for instance, biological networks [108]. Hence, it has recently been receiving attention from scholars, although it has been studied for a long time [109]. Discovering the community structure of a network provides vital insight into network components, the impact of local communities on global ones, influential communities, and the like. With this in mind, selecting a suitable algorithm to unravel the community structure of a BN can be challenging. Sah et al. [108] also discussed that discovering the accurate community structure of a network is complicated by the inconsistent meanings of “community” and by the different outputs of different methods. Moreover, most existing methods were evaluated on small-scale networks with a known number of communities. Thus, after an extensive literature review of existing community detection algorithms, we aim to recommend methods that are relatively applicable to BNs.

Table 4: Categories of Community Detection Methods

| Category | Description | Algorithm |
| --- | --- | --- |
| Disjoint Community Detection | Communities may be connected to one another, but every node belongs to exactly one community. | Infomap [27], [110] |
| Overlapping Community Detection | Communities may overlap, so a node can belong to several communities. Overlapping community detection finds some complex structures. | [24] |

In this section, we describe relatively suitable and recently proposed state-of-the-art community detection algorithms, including the traditional label propagation method [111], the fast unfolding method [110], and random-walk-based approaches [112, 113].

Traditional Community Detection Algorithms

Herein, we review some traditional methods, such as heuristic and label propagation community detection techniques.

Label Propagation Method

The Label Propagation Method (LPM) is built on label propagation and mainly focuses on detecting communities locally [111]. The algorithm begins by giving a distinct label to each vertex and arranging the vertices in random order. It then proceeds iteratively: in each iteration, every vertex adopts the label that most of its neighbors currently hold. The algorithm terminates once every vertex holds a label that is the most frequent one among its neighbors. LPM thus forms each community as the collection of vertices that share the same label [111].
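The steps above can be sketched in a few lines of Python. This is an illustrative sketch under our own assumptions rather than the implementation of [111]: the graph is an undirected adjacency dictionary, ties between equally frequent labels are broken at random, vertices are updated one at a time, and the function name and the `max_iter` safeguard are hypothetical.

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=100, seed=0):
    """Group vertices of `adj` (dict: vertex -> set of neighbours) by label."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # every vertex starts with its own label
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)                # random update order in each sweep
        changed = False
        for v in nodes:
            if not adj[v]:
                continue                  # isolated vertices keep their label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[v] in candidates:
                continue                  # already holds a most frequent label
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break                         # every label is a neighbourhood majority
    communities = {}
    for v, label in labels.items():
        communities.setdefault(label, set()).add(v)
    return list(communities.values())
```

Because of the random update order and tie-breaking, different seeds can yield different partitions on the same graph, which is one reason the output of this family of methods is usually compared across several runs.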

Louvain Method

The Louvain method is a heuristic approach that initially assigns a distinct community to each vertex of a given network [110], so that there are as many communities as vertices. The community detection process takes place in two stages. In the first stage, the method evaluates, for each vertex, the modularity gain obtained by removing the vertex from its community and placing it in the community of another vertex. The vertex is moved only if the gain is positive; otherwise, it stays in its current community. This process is repeated until no further improvement can be achieved. In the second stage, the algorithm constructs a new network whose vertices are the communities found in the first stage. As stated in [110], the weight of the link between two new vertices equals the sum of the weights of the links between vertices in the two corresponding communities. After the second stage, the Louvain method re-runs the first stage on the new network, and the two stages are repeated until modularity no longer changes. This method could be comparatively applicable to BNs, as it has previously been applied to large-scale networks such as those of phone companies.

Random-walk-based Community Detection Methods

Among all community detection approaches, random-walk-based methods tend to discover network communities that match the ground-truth ones more closely [39]. In this section, we briefly discuss existing random-walk-based community detection methods that can be comparatively applicable to BNs.

Walktrap

Walktrap is designed around the observation that “random walks on a graph tend to get ‘trapped’ into densely connected parts corresponding to communities.” Walktrap computes a distance that captures the structural similarity between vertices as well as between communities. The computed distance is used to group vertices into communities. As discussed in [112], the distance is large if two vertices lie in different communities and small otherwise. To detect the community structure, the authors adopted an agglomerative hierarchical clustering approach, which reduces the computational cost of calculating the distances. During clustering, Walktrap merges adjacent communities, i.e., communities that have at least one edge between them.
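As a hedged illustration of such a random-walk distance, the sketch below compares the t-step walk distributions of two vertices, weighting each coordinate by the inverse degree of the target vertex. This form is our recollection of the distance used in [112] and is not spelled out in this survey; the function names and the default walk length are assumptions, and every vertex is assumed to have at least one neighbor.

```python
import math


def walk_distribution(adj, start, t):
    """Probability of being at each vertex after t uniform random-walk steps."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(t):
        nxt = {v: 0.0 for v in adj}
        for v, mass in dist.items():
            if mass == 0.0:
                continue
            share = mass / len(adj[v])    # split the mass evenly among neighbours
            for u in adj[v]:
                nxt[u] += share
        dist = nxt
    return dist


def walk_distance(adj, i, j, t=4):
    """Degree-normalised Euclidean distance between two t-step distributions."""
    pi = walk_distribution(adj, i, t)
    pj = walk_distribution(adj, j, t)
    return math.sqrt(sum((pi[k] - pj[k]) ** 2 / len(adj[k]) for k in adj))
```

On a graph made of two triangles joined by a single edge, two vertices of the same triangle come out closer than two vertices of different triangles, in line with the behavior described above.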

CONCLUDE

De et al. [113] proposed a random-walk-based method called CONCLUDE (COmplex Network CLUster DEtection), which aims to combine the effectiveness of global methods with the computational performance of local approaches. In this method, the topological structure of the network is incorporated into a heuristic algorithm in order to detect communities. CONCLUDE introduced the concept of “κ-path edge centrality” and operates in two phases. In the first phase, it computes the κ-path edge centrality of each edge in the graph. To this end, the authors proposed “Edge Random Walk κ-path Centrality (ERW-Kpath)”, which estimates the centrality of edges by performing random walks of finite length κ. In the second phase, it computes the distances between all pairs of linked nodes in the network using the estimated κ-path edge centrality values and assigns them as edge weights. Finally, it partitions the weighted network using the Louvain method [110].

Leader-based Community Detection Algorithms

The literature on community detection shows a variety of approaches, among which node-centrality and graph-based methods are widely used to capture the underlying structures of communities. Understanding the basis of a community has a wide variety of applications.

Shah et al. [114] argued that traditional clustering methods fail to identify precise community structures because they depend on external connectivity properties such as graph cuts. To overcome this limitation, the authors proposed a community detection approach based on a leader-follower algorithm, which depends on the internal relationships of the expected community. The proposed method uses the idea of centrality in a novel fashion to differentiate leaders from followers. Further, the algorithm learns communities naturally, without requiring prior knowledge of the number of communities.

Information networks such as protein-protein interactions in biology, call graphs in telecommunication, and co-authorship in bibliometrics have dense connections within groups that share common properties and sparse connections outside those groups. Likewise, Khorasgani et al. [115] proposed an algorithm that identifies all potential leaders along with their corresponding followers, i.e., communities. Such communities help reveal the underlying structures in social networks. Similarly, the authors of [116] proposed “community centrality” based on the assumption that a node with a high degree is surrounded by low-degree nodes. The node with the highest degree (the community center) is first identified using community centrality; then, through a diffusion process, the method generates multiple community centers with various degrees.

Yakoubi et al. [117] introduced an efficient framework, LICOD, for analyzing the performance of algorithms developed for community detection. Cohen et al. [118] proposed a node-centric overlapping community detection algorithm (NECTAR) on the basis of the well-known local search method, i.e., the Louvain method [110]. This method is applied to overlapping community structures to deal with multi-community membership issues.

Rossetti et al. [119] presented different views on node-centric approaches in online social networks, in both static and dynamic scenarios, using algorithmic and analytical procedures. Further, when information on the network topology is incomplete, a node-centric (local) community detection approach has difficulty identifying the community of a given node. To overcome this, Roberto et al. [120] proposed a multi-layer network-based framework that maximizes the ratio of internal density to external density. They also proposed a biasing scheme for identifying different degrees of layer coverage diversification.

Gmati et al. [121] developed Fast-Bi Community Detection (FBCD), based on bipartite graphs with maximum set matching, to reduce the complexity of existing algorithms. In addition, overlapping community detection in social networks based on both link and node attributes is proposed in [41]. Deng et al. [122] adopted label propagation and fuzzy C-means for community detection, where initial labels are derived from neighbor nodes and revised using the fuzzy C-means membership vector.

Link Prediction (LP) estimates the presence of a link between vertices in a given network. Understanding the mechanism that drives network evolution helps to predict the network correctly. Discovering new links experimentally is costly in biological networks such as metabolic networks or protein-protein interaction networks. Experiments on real and complex networks demonstrate that link prediction can give accurate predictions. The problem of link prediction is among the most vital topics investigated by big data mining researchers [123]. LP was first introduced by Liben-Nowell and Kleinberg [124], aiming to predict new connections between vertices that are most likely to appear in a network in the future.

Moreover, link prediction is especially suited to evolving networks, in which new connections are likely to be created and existing connections deleted. For instance, in a social network like Instagram, a user forms a link whenever she/he follows or is followed by another user, and can discard a link by unfollowing a user. Furthermore, link prediction plays a vital role in recommendation systems and the Internet of Things. A well-known example is a security network, in which link prediction is utilized to uncover subversive communities of criminals or terrorists [125]. For human behavioral networks, link prediction is adopted to unveil and classify the movements and activities of people in the network [126]. Link prediction is also used in various systems that replicate social connections, e.g., email networks, sensor networks, and communication networks.

Deﬁnition 5.

For a given network $G(V, E)$ formed at a time $t_i$, predict the connections that will appear in the network between the time $t_i$ at which the network was initially formed and the time $t_n$ at which the new connections are created.

When implementing link prediction methods in a single-layered network, there are three classical approaches: similarity measurement methodologies, matrix factorization methods, and probabilistic graphical model approaches [127]. Similarity measurement methods predict unobserved connections by computing the similarity between vertices: two vertices with higher similarity have a higher probability of forming a future connection. Numerous approaches have been proposed on the basis of similarity measurement, and they share common parameters. Some of these parameters are global similarity indices, local similarity indices, and quasi-local structures of a network (see Table 5).

Table 5: LP Parameters Comparison

| LP Parameters | Functionality | Characteristics |
| --- | --- | --- |
| Global similarity index | Computes the similarity of vertices by making use of the global structure data | High complexity; low speed in operation |
| Local similarity index | Computed from the data of a vertex’s neighbors, e.g., the Jaccard Coefficient, in which the probability of shared neighbors is used to compute the similarity of pairs of vertices [124] | Low complexity; low accuracy; faster in operation |
| Quasi-local structures | Considers only the two vertices when measuring similarity, and longer paths are removed, e.g., Local Path [128] and Superposed Random-Walk (SRW) [129] | Trade-off between performance and complexity |

Taking into account the structural similarities of vertices and their type effect (i.e., the linking behavior of vertices), a promising LP algorithm has been proposed in [130]. The algorithm is specially designed for a heterogeneous military network in which there are different categories of vertices and edges. The authors claimed that their algorithm outperforms other existing similarity-based methods because it both predicts future connections and identifies pseudo connections in a given network. Gao et al. [131] proposed a projection-based LP method specifically for bipartite networks. Aiming to reduce the computational time complexity of LP, they introduced the concept of a “Candidate Node Pair (CNP)”. CNPs are defined on the projected graph, which is a mapping of the bipartite network onto a unipartite network [131]. Gao et al. [131] defined a CNP as follows.

“Let $G = (U, V, E)$ be a bipartite graph, $B \in U$ and $x \in V$ be two vertices in $G$, and $(B, x) \notin E$. Denote the U-projected graph of $G$ as $G_u = (U, E_u)$. By adding a new link $(B, x) \in U \times V$ to $G$, then construct a bipartite graph $G' = (U, V, E')$, where $E' = E \cup \{(B, x)\}$. Let $G'_u = (U, E'_u)$ be the U-projected graph of $G'$. If $G_u = G'_u$, then $(B, x)$ is a CNP in graph $G$ by U-projection.”

While performing link prediction, the score of a CNP is computed on the basis of the weights of the patterns it contains. Furthermore, the algorithm has a linear time complexity of $O(m)$ for a bipartite network with $n$ and $m$ vertices in its two distinct parts [131].

As mentioned earlier, there are also LP methods based on probabilistic models. For evolving networks, Steve et al. [132] proposed a statistical-model-based link prediction method called temporal exponential random graph models (TERGM). They claimed that their model performs well, with promising results, on dynamic networks such as communication networks and gene regulation circuitry. Ji et al. [133] proposed a link prediction model for user-object networks built upon two factors, diversion delay and time attenuation.
Moreover, link weight is considered in [133], so that diversion delay and time attenuation are of great significance in forecasting unobserved connections in a user-object network. Consequently, they developed a “time-weighted network (TWN)” model by combining these factors with the lifecycle of users [133]. In [134], the authors presented a Bayesian link prediction model for directed networks with node attributes. The model not only estimates future connections but also explains each estimated connection. Moreover, they showed that their stochastic model generates accurate information in predicting connections [134].

The other category of LP methods is matrix factorization. Gao et al. [135] proposed a model based on a matrix factorization formulation. The model employs multiple information sources in time-evolving networks to forecast the probabilities of connections that could appear in the near future. It exploits three types of information: the global structure of the network, the local information of vertices, and any available content of vertices.

Similarity-measure-based methods are mostly applied to complex and large-scale networks, because learning-based LP methods such as probabilistic and matrix-factorization-based methods require considerable computational time to learn from training data when applied to BNs [136]. Ma et al. [137] analyzed and confirmed that different real-world networks have distinct structural characteristics. On that basis, they proposed a link prediction method, referred to as an adaptive fusion model, that considers various structural features of a network during the LP process. The model is implemented as follows. First, it defines a logistic function that combines different structural features.
Next, it uses the known features to adaptively determine the weight of each feature in the logistic function. Finally, it applies the resulting logistic function to obtain the unobserved or missing connections in the given network. The model follows a local index, using the information of the nearest and next-nearest neighbors. The authors believed that this could reduce the computation time of their algorithm.

Yazdi et al. [138] proposed a community-structure-based link prediction method with the goal of mitigating security-related issues in social networks. The main concern of their method is to prevent inaccurate or fraudulent connections from being recommended to users in social networks [138]. They exploited global structure information to map a network into a hyperbolic space by using the community structure of the network. Moreover, the Louvain community detection algorithm was employed to group vertices into distinct clusters, and future connections were forecast by accurately analyzing the relations between vertices [138]. This method can be suitable for link prediction in BNs, as it takes the network’s community structure into account. More importantly, it suggests genuine connections and suppresses scam recommendations.

In [139], the authors proposed a novel similarity-measure-based LP method in which network motifs are used as the source for estimating similarity. The method is relatively appropriate for solving the LP problem in networks with billions of vertices and edges, such as BNs. Yao et al. [136] presented a similarity-based LP method that mainly focuses on the interaction between paths. In [140], the authors proposed a method that exploits the activeness of vertices in a dynamic network. New active links are used to analyze the activeness of vertices.
On this basis, the authors of [140] hypothesized that the activeness of vertices and the structure of the existing network influence the future network. The activeness, or popularity, of edges is estimated with a structural perturbation method, which makes it possible to distinguish active from inactive vertices in the network. Moreover, the perturbation method is used to unveil new connections linked to popular vertices. In addition, their method reduces the computational time compared to other well-known link prediction approaches [140].
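As an illustration of the local similarity indices listed in Table 5, the following Python sketch ranks unconnected vertex pairs by their Jaccard coefficient, i.e., the number of shared neighbors divided by the size of the union of the two neighborhoods. It is a minimal sketch under our own assumptions: the graph is an undirected adjacency dictionary, the function names and `top_k` parameter are hypothetical, and all vertex pairs are enumerated, which would not scale to a BN without candidate pruning.

```python
from itertools import combinations


def jaccard(adj, u, v):
    """Jaccard coefficient of the neighbourhoods of u and v."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)


def rank_missing_links(adj, top_k=3):
    """Return the top_k unconnected pairs with the highest Jaccard score."""
    scored = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue                      # the link already exists
        scored.append(((u, v), jaccard(adj, u, v)))
    # Highest score first; ties are broken by the pair itself for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

Restricting the candidate pairs to vertices at distance two would keep the low complexity that Table 5 attributes to local indices, since pairs without a shared neighbor always score zero.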

A recommendation system filters information by predicting the products users are likely to prefer on the basis of their previous preferences. In essence, a recommender system tries to meet the interests and needs of its users. It is important for managing bulky information and overcoming the problem of information overload [141]. Further, it makes life easier for internet users by providing them with personalized content and appropriate services extracted from an enormous amount of information that evolves over time [142]. With the advance of technology and the growth of data, education resources are increasing, so recommendation systems have been introduced to education resource platforms [4]. Recommendation is also an emerging research area that attracts much attention from scholars, especially computer scientists. Moreover, recommendation methods are adopted in different areas for different reasons. They are widely used in many application settings to suggest services, products, and information items to consumers. For instance, they are mostly used in e-commerce to recommend products to individual users according to their preferences and/or the history of other users. Using a recommendation method in research collaboration networks helps to find experienced and productive collaborators in a required research area [143]. Recommendation methods benefit users by bringing to their attention items they need but might not have come across, which makes recommendation methods an alternative to search algorithms. Furthermore, recommendation methods do not require a user to enter any keywords; instead, they store users’ histories and make use of them for recommendation. In addition, recommendation methods utilize link prediction techniques to facilitate the process of recommendation.

There are different approaches to designing a recommendation method, such as content-based filtering, collaborative filtering, and hybrid filtering [141].
• Content-based filtering: Recommendation methods designed on the basis of content-based filtering use content information to suggest to individual users services (e.g., products, papers, movies, songs, books, etc.) related to their history of preferences. This approach generates suggestions from the content of the entities considered for recommendation, so contents such as texts, sounds, and images are analyzed. Based on this analysis, the recommendation method builds a similarity-based index amongst entities as a basis for suggesting products that match those a target user has rated, searched, watched, visited, or bought.
• Collaborative filtering: Recommendation methods designed on the basis of collaborative filtering make suggestions by combining information from the histories of multiple users. They allow users to provide information about their experience with particular services and store that information. Later on, the stored information can be used to provide reliable recommendations to other users. For instance, a hotel recommendation system like trip.com makes suggestions according to the ratings given to hotels by previous customers and the preferences of the target user.
• Hybrid filtering: Recommendation methods designed on the basis of hybrid filtering combine the features of collaborative and content-based filtering techniques [141].
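To illustrate the collaborative filtering idea in the list above, the sketch below predicts a user's rating for unseen hotels as a similarity-weighted average of other users' ratings. It is a minimal, hypothetical example rather than the method of any cited work: cosine similarity over co-rated items, the dictionary-based rating layout, and the function names are all our own assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two rating dictionaries (item -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_k=2):
    """Recommend up to top_k items the user has not rated yet."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0:
            continue                      # ignore users with no positive overlap
        for item, rating in their.items():
            if item in ratings[user]:
                continue                  # only score unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:top_k]
```

A content-based variant would replace the user-to-user similarity with a similarity between item descriptions, and a hybrid method could blend the two predicted scores.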

It is known that big networks, including biological networks, social networks, co-authorship networks, and the like, are composed of vertices and edges. In most cases, it is crucial to provide recommendations of vertices as well as edges for future connections. For instance, a collaboration network may need co-author recommendations to form a research team in a specific research area, which can be treated as a vertex recommendation problem. Recommendation methods on big networks play a vital role in reducing time complexity, for instance when forming a research team or when ranking and identifying key vertices in a whole big network. Networks have become a pervasive way of modeling in several applications such as information and social networks [144]. As a result, it is vital to understand which network structure can be recommended depending on the circumstances at hand. In [144], scholars discussed a variety of scenarios that can arise in recommendation. Some of the scenarios discussed are the following: i) recommendation of vertices by authority and context, in which a vertex with a high degree is considered to be a quality one; ii) recommendation of vertices by instances, in which the similarity between vertices is considered; iii) recommendation of nodes by influence and content, in which the vertex that disseminates information faster is more likely to be recommended; and iv) recommendation of links, which is similar to the link prediction problem. With this in mind, several scholars have proposed recommendation methods that can be applicable to BNs. Here, we discuss some selected recent and state-of-the-art recommendation methods.

Liu et al. [145] proposed a context-aware collaborator recommendation method that recommends collaborators by taking users’ contextual preferences into consideration.
They developed the algorithm in two modules: i) a Collaborative Entity Embedding (CEE) network, in which researchers and research topics are represented by vectors according to their correlation, and ii) a Hierarchical Factorization Model (HFM), which captures researchers’ characteristics regarding their activeness and conservativeness. The authors of [145] claimed that these characteristics reflect researchers’ strength as well as their interest in working with researchers with whom they have never collaborated before. This method recommends new potential collaborators suitable for the required research topic. According to the experimental results reported in [145], the method can be applicable to BNs.

Additionally, the authors of [146] proposed a method that provides topic recommendations for authors in a bipartite academic information network by adopting a similarity-based link prediction approach. The method estimates the likelihood of links that could appear between authors and topics in a given academic network. Yang et al. [147] proposed a nearest-neighbor-based random-walk algorithm that adopts features of random walk with restart (RWR) and PageRank. This method is designed to recommend collaborators by combining network features such as the network structure with walking probabilities derived from the collaboration history of individuals. With the objective of enhancing the performance of the singular-value decomposition recommendation method, Cui et al. [148] presented several context-aware recommendation methods that extend the singular-value decomposition approach. The proposed algorithms are referred to as the context-aware SVD (CSVD) algorithm, the two-level SVD (TLSVD) algorithm, and the context-aware two-level SVD (CTLSVD) algorithm. The algorithms work as follows. Initially, CSVD takes “time” as contextual information and filters out inappropriate recommendations.
Then, the TLSVD algorithm is applied to split the rating matrix into user and item matrices. It also splits both the user matrix and the item matrix into two further matrices by employing singular-value decomposition [148]. Finally, CTLSVD provides the final recommendations by combining the candidate recommendations filtered by CSVD with the matrices created by TLSVD. The authors claimed that taking “time” as context improves the performance, accuracy, and effectiveness of the recommendation results that CTLSVD generates at the end of the process. Considering that the tasks uploaded to crowd-sourcing systems are supposed to be completed by online workers, the researchers in [149] proposed real-time recommendation algorithms that take into account the classification of posted tasks. This speeds up the recommendation process and saves the time workers spend on selecting appropriate tasks to complete. The proposed method contains the TOP-K-T and TOP-K-W algorithms. The TOP-K-T algorithm [149] helps online workers find the top-k most appropriate tasks. The TOP-K-W algorithm [149] makes it easier for end-users to find the top-k most suitable workers in the crowd-sourcing system. Given the enormous amount of data and tasks in crowd-sourcing systems, proposing a recommendation method to overcome this challenge is valuable work. The authors believe that this work will have a valuable impact on the management of crowd-sourcing systems [149].

6. Open Issues and Challenges

The dynamic features of a big network are fundamental and need to be analyzed to comprehend the overall functionality of a given network. Moreover, the structure of a network changes with the dynamic nature of its vertices and edges. Analyzing dynamic networks may not be as easy as managing the properties of static networks. Several works, such as [150], have been carried out to facilitate the investigation of the dynamic nature of networks. Those studies show that there is a significant relationship between the dynamic nature and the functionalities of a particular network. Hence, it is critical to discover how the network’s structure changes over time. In most cases, connections amongst vertices are created, removed, and re-created over time. For instance, in a collaboration network, a connection between collaborators exists until they complete a certain task. Once the task at hand is completed, the connection is deleted. If they happen to collaborate again in the future, the connection re-appears. Analyzing evolutionary networks is very challenging, especially when there are billions of vertices and edges that appear and disappear over time. Tools that make the analysis of dynamic networks easier therefore need to be developed.
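
As an illustration of the collaboration example above, a dynamic network can be stored as a set of edges annotated with the time intervals during which they are active, from which a static snapshot at any time can be extracted. The following is only a minimal sketch under that assumption; the class name `TemporalGraph`, its methods, and the toy data are hypothetical and are not taken from any of the surveyed works.

```python
from collections import defaultdict


class TemporalGraph:
    """Undirected graph whose edges are active over half-open time intervals."""

    def __init__(self):
        # Maps a canonical (u, v) pair to a list of (start, end) activity intervals.
        self._intervals = defaultdict(list)

    @staticmethod
    def _key(u, v):
        # Store each undirected edge under one canonical ordering.
        return (u, v) if u <= v else (v, u)

    def add_interval(self, u, v, start, end):
        """Record that u and v are connected for start <= t < end."""
        if end <= start:
            raise ValueError("end must be greater than start")
        self._intervals[self._key(u, v)].append((start, end))

    def snapshot(self, t):
        """Return the set of edges that are active at time t."""
        return {
            edge
            for edge, spans in self._intervals.items()
            if any(start <= t < end for start, end in spans)
        }


g = TemporalGraph()
g.add_interval("alice", "bob", 0, 3)   # first joint task
g.add_interval("alice", "bob", 7, 9)   # a later collaboration re-creates the link
g.add_interval("bob", "carol", 2, 8)

print(sorted(g.snapshot(1)))  # [('alice', 'bob')]
print(sorted(g.snapshot(5)))  # [('bob', 'carol')]
print(sorted(g.snapshot(7)))  # [('alice', 'bob'), ('bob', 'carol')]
```

Scanning every edge for each snapshot is linear in the number of stored intervals, which is acceptable for a toy example but would need an index over time (for instance, an interval tree) at the scale of billions of edges.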

The growing volume of data in networks has become very challenging to manage in terms of space and time. Analyzing big networks is not only time-consuming but also costly and computationally intensive. Although various cloud platforms have been developed to store real-world big network information, storage remains an issue that should be considered. It is preferable to manage data locally, especially when the network is a dynamic one whose structure changes over time. Hence, it is crucial to give special consideration to the computational complexity of algorithms designed for BNs. Some scholars have proposed approaches with the objective of reducing the computational complexity of BN analysis. For instance, [113] and [121] proposed community detection methods that take time complexity into account. With a similar objective, Gao et al. [131] and Ma et al. [137] proposed link prediction algorithms that could be applicable to BNs with relatively low computational time.
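
To make the complexity concern concrete, the sketch below scores candidate links with the classical common-neighbors index while visiting only vertex pairs that share at least one neighbor, instead of enumerating all pairs of vertices. It is an illustrative example only and is not the method of [131] or [137]; the function name `common_neighbor_scores` and the toy adjacency structure are assumptions.

```python
from collections import Counter
from itertools import combinations


def common_neighbor_scores(adj):
    """Score non-adjacent vertex pairs by their number of common neighbors.

    adj maps each vertex to the set of its neighbors (undirected graph).
    Only pairs that share at least one neighbor are ever visited.
    """
    scores = Counter()
    for w, nbrs in adj.items():
        # Every pair of neighbors of w has w as a common neighbor.
        for u, v in combinations(sorted(nbrs), 2):
            if v not in adj[u]:  # skip pairs that are already linked
                scores[(u, v)] += 1
    return scores


adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
    "e": set(),
}
scores = common_neighbor_scores(adj)
print(scores.most_common(1))  # [(('a', 'd'), 2)]
```

The work done is proportional to the sum of squared vertex degrees rather than to the square of the number of vertices, which helps on sparse networks but can still be expensive around very high-degree hubs.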

The inner structures of BNs are generally dense and complicated. With the growing scale of BNs, the basic processing unit has shifted from traditional nodes to higher-order network blocks, i.e., motifs, graphlets, subgraphs, components, etc. These higher-order structures have been shown to act as building blocks of networks, especially in BNs. Therefore, finding more efficient ways to detect, profile, and process these higher-order network blocks is an emerging task. Although the higher-order organization of networks has drawn scholars’ attention, many problems remain to be solved.
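
As a small example of processing a higher-order block, the sketch below counts triangles, one of the simplest motifs, in an undirected graph. Vertices are ranked by degree so that each triangle is counted exactly once; this is a common textbook technique rather than an algorithm from the surveyed papers, and the function name `count_triangles` and the sample graph are assumptions.

```python
def count_triangles(adj):
    """Count triangles in an undirected simple graph (vertex -> neighbor set)."""
    # Rank vertices by (degree, label) so every triangle has a unique lowest vertex.
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    rank = {v: i for i, v in enumerate(order)}
    total = 0
    for u in adj:
        # Keep only neighbors that rank above u.
        higher = {v for v in adj[u] if rank[v] > rank[u]}
        for v in higher:
            # The third vertex must rank above v and be adjacent to both u and v.
            total += sum(1 for w in higher if rank[w] > rank[v] and w in adj[v])
    return total


graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
print(count_triangles(graph))  # 2
```

The same ranking idea extends to larger motifs, although the cost grows quickly with motif size, which is one reason efficient motif detection in BNs remains open.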

7. Conclusion

The study of complex systems is attracting attention in almost all disciplines, from computer science to biotechnology, sociology, and so forth. Moreover, interrelated entities are ubiquitous in the world and give rise to large-scale, complex sets of data. These data sets contain entities along with the connections among them. In this paper, we introduced a new network science concept called big network. A big network comprises information that is vast in size and has a complicated inner structure. We surveyed the area of big networks broadly and gave an overview of the up-to-date models, technologies, and applications of network analysis tasks concerning big networks, as well as future directions. This review will help fellow researchers comprehend the essentials as well as the critical issues in the field of network science. Moreover, it provides a guideline framework that covers a comprehensive set of research topics.

References

[1] D. Tsiotas, Network stiffness: A new topological property in complex networks, PLoS one 14 (6) (2019) e0218477.
[2] A. Garrido, A survey on complex networks, BRAIN. Broad Research in Artificial Intelligence and Neuroscience 2 (1) (2011) 63–70.
[3] S. Yu, M. Liu, W. Dou, X. Liu, S. Zhou, Networking for big data: A survey, IEEE Communications Surveys & Tutorials 19 (1) (2016) 531–549.
[4] F. Xia, W. Wang, T. M. Bekele, H. Liu, Big scholarly data: A survey, IEEE Transactions on Big Data 3 (1) (2017) 18–35.
[5] S. Khan, X. Liu, K. A. Shakil, M. Alam, A survey on scholarly data: From big data perspective, Information Processing & Management 53 (4) (2017) 923–944.
[6] X. Kong, Y. Shi, S. Yu, J. Liu, F. Xia, Academic social networks: Modeling, analysis, mining and applications, Journal of Network and Computer Applications 132 (2019) 86–103.
[7] C. Steinbock, O. Biham, E.
Katzav, Analytical results for the distribution of shortest path lengths in directed random networks that grow by node duplication, The European Physical Journal B 92 (6) (2019) 130.
[8] A. M. Petersen, Quantifying the impact of weak, strong, and super ties in scientific careers, Proceedings of the National Academy of Sciences 112 (34) (2015) E4671–E4680.
[9] S. Gómez, Centrality in networks: Finding the most important nodes, in: Business and Consumer Analytics: New Ideas, Springer, 2019, pp. 401–433.
[10] I. Brugere, B. Gallagher, T. Y. Berger-Wolf, Network structure inference, a survey: Motivations, methods, and applications, ACM Computing Surveys (CSUR) 51 (2) (2018) 24.
[11] M. Newman, Network structure from rich but noisy data, Nature Physics 14 (6) (2018) 542.
[12] L. Stone, D. Simberloff, Y. Artzy-Randrup, Network motifs and their origins, PLOS Computational Biology 15 (4) (2019) 1–7.
[13] T. Muki-Marttunen, An algorithm for motif-based network design, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 14 (5) (2017) 1181–1186.
[14] P. Li, H. Dau, G. Puleo, O. Milenkovic, Motif clustering and overlapping clustering for social network analysis, in: INFOCOM 2017-IEEE Conference on Computer Communications, IEEE, 2017, pp. 1–9.
[15] J. Hu, R. Cheng, K. C.-C. Chang, A. Sankar, Y. Fang, B. Y. Lam, Discovering maximal motif cliques in large heterogeneous information networks, in: 2019 IEEE 35th International Conference on Data Engineering (ICDE), IEEE, 2019, pp. 746–757.
[16] W. Lin, X. Xiao, X. Xie, X.-L. Li, Network motif discovery: A gpu approach, IEEE Transactions on Knowledge and Data Engineering 29 (3) (2017) 513–528.
[17] S. Sun, Y. Che, L. Wang, Q. Luo, Efficient parallel subgraph enumeration on a single machine, in: 2019 IEEE 35th International Conference on Data Engineering (ICDE), IEEE, 2019, pp. 232–243.
[18] A. Al-Thaedan, M. Carvalho, Online estimation of motif distribution in dynamic networks, in: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), IEEE, 2019, pp. 0758–0764.
[19] P. Purkait, T.-J. Chin, A. Sadri, D. Suter, Clustering with hypergraphs: the case for large hyperedges, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (9) (2017) 1697–1711.
[20] I. Kabiljo, B. Karrer, M. Pundir, S. Pupyrev, A. Shalita, Social hash partitioner: a scalable distributed hypergraph partitioner, Proceedings of the VLDB Endowment 10 (11) (2017) 1418–1429.
[21] G. Bianconi, Statistical mechanics of multiplex networks: Entropy and overlap, Physical Review E 87 (6) (2013) 062806.
[22] D. Cellai, E. López, J. Zhou, J. P. Gleeson, G. Bianconi, Percolation in multiplex networks with overlap, Physical Review E 88 (5) (2013) 052811.
[23] A. Li, S. P. Cornelius, Y.-Y. Liu, L. Wang, A.-L. Barabási, The fundamental advantages of temporal networks, Science 358 (6366) (2017) 1042–1046.
[24] N. Masuda, R. Lambiotte, A guidance to temporal networks, World Scientific, 2016.
[25] D. R. Farine, When to choose dynamic vs. static social network analysis, Journal of Animal Ecology 87 (1) (2018) 128–138.
[26] O. Michail, P. G. Spirakis, Elements of the theory of dynamic networks, Communications of the ACM 61 (2) (2018) 72–81.
[27] S. Fortunato, D. Hric, Community detection in networks: A user guide, Physics Reports 659 (2016) 1–44.
[28] D. G. Rand, M. A. Nowak, J. H. Fowler, N. A. Christakis, Static network structure can stabilize human cooperation, Proceedings of the National Academy of Sciences 111 (48) (2014) 17093–17098.
[29] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Hwang, Complex networks: Structure and dynamics, Physics Reports 424 (4-5) (2006) 175–308.
[30] D. I. Rubenstein, S. R. Sundaresan, I. R. Fischhoff, C. Tantipathananandh, T. Y. Berger-Wolf, Similar but different: dynamic social network analysis highlights fundamental differences between the fission-fusion societies of two equid species, the onager and grevys zebra, PLoS one 10 (10) (2015) e0138645.
[31] D. G. Rand, S. Arbesman, N. A. Christakis, Dynamic social networks promote cooperation in experiments with humans, Proceedings of the National Academy of Sciences 108 (48) (2011) 19193–19198.
[32] D. Melamed, A. Harrell, B. Simpson, Cooperation, clustering, and assortative mixing in dynamic networks, Proceedings of the National Academy of Sciences 115 (5) (2018) 951–956.
[33] L. E. Rocha, N. Masuda, Random walk centrality for temporal networks, New Journal of Physics 16 (6) (2014) 063023.
[34] P. Holme, J. Saramäki, Temporal networks, Springer, 2013.
[35] J. Koo, D.-K. Chae, D.-J. Kim, S.-W. Kim, Incremental c-rank: An effective and efficient ranking algorithm for dynamic web environments, Knowledge-Based Systems 176 (2019) 147–158.
[36] F. Liu, D. Choi, L. Xie, K. Roeder, Global spectral clustering in dynamic networks, Proceedings of the National Academy of Sciences 115 (5) (2018) 927–932.
[37] R. P. Sarmento, L. Lemos, M. Cordeiro, G. Rossetti, D. Cardoso, Dyncomm r package–dynamic community detection for evolving networks, arXiv preprint arXiv:1905.01498.
[38] C. Aggarwal, K. Subbian, Evolutionary network analysis: A survey, ACM Computing Surveys (CSUR) 47 (1) (2014) 10.
[39] B. Abrahao, S. Soundarajan, J. Hopcroft, R. Kleinberg, A separability framework for analyzing community structure, ACM Transactions on Knowledge Discovery from Data (TKDD) 8 (1) (2014) 5.
[40] M. Cordeiro, R. P. Sarmento, J. Gama, Dynamic community detection in evolving networks using locality modularity optimization, Social Network Analysis and Mining 6 (1) (2016) 15.
[41] R. Márquez, R. Weber, Overlapping community detection in static and dynamic social networks, in: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, ACM, 2019, pp. 822–823.
[42] A. E. Sarıyüce, B. Gedik, G. Jacques-Silva, K.-L. Wu, Ü. V. Çatalyürek, Sonic: streaming overlapping community detection, Data Mining and Knowledge Discovery 30 (4) (2016) 819–847.
[43] A. Masoudi-Nejad, F. Schreiber, Z. R. M. Kashani, Building blocks of biological networks: a review on major network motif discovery algorithms, IET Systems Biology 6 (5) (2012) 164–174.
[44] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon, Network motifs: simple building blocks of complex networks, Science 298 (5594) (2002) 824–827.
[45] A. Paranjape, A. R. Benson, J. Leskovec, Motifs in temporal networks, in: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, ACM, 2017, pp. 601–610.
[46] L. Stone, D. Simberloff, Y. Artzy-Randrup, Network motifs and their origins, PLoS Computational Biology 15 (4) (2019) e1006749.
[47] H. Zhao, X. Xu, Y. Song, D. L. Lee, Z. Chen, H. Gao, Ranking users in social networks with higher-order structures, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[48] M. Ritchie, L. Berthouze, T. House, I. Z. Kiss, Higher-order structure and epidemic dynamics in clustered networks, Journal of Theoretical Biology 348 (2014) 21–32.
[49] M. Mongioví, G. Micale, A. Ferro, R. Giugno, A. Pulvirenti, D. Shasha, glabtrie: A data structure for motif discovery with constraints, in: Graph Data Management, Springer, 2018, pp. 71–95.
[50] P. Ribeiro, F. Silva, G-tries: a data structure for storing and finding subgraphs, Data Mining and Knowledge Discovery 28 (2) (2014) 337–377.
[51] M. Linardi, Y. Zhu, T. Palpanas, E. Keogh, Matrix profile x: Valmod-scalable discovery of variable-length motifs in data series, in: Proceedings of the 2018 International Conference on Management of Data, ACM, 2018, pp. 1053–1066.
[52] J. Luo, L. Ding, C. Liang, N. H. Tu, An efficient network motif discovery approach for co-regulatory networks, IEEE Access 6 (2018) 14151–14158.
[53] G. Bianconi, Multilayer Networks: Structure and Function, Oxford University Press, 2018.
[54] M. E. Dickison, M. Magnani, L. Rossi, Multilayer social networks, Cambridge University Press, 2016.
[55] M. De Domenico, Multilayer modeling and analysis of human brain networks, Giga Science 6 (5) (2017) gix004.
[56] A. Cardillo, M. Zanin, J. Gómez-Gardenes, M. Romance, A. J. G. del Amo, S. Boccaletti, Modeling the multi-layer nature of the european air transport network: Resilience and passengers re-scheduling under random failures, The European Physical Journal Special Topics 215 (1) (2013) 23–33.
[57] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, J.-P. Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science 328 (5980) (2010) 876–878.
[58] A. Lancichinetti, S. Fortunato, Consensus clustering in complex networks, Scientific Reports 2 (2012) 336.
[59] M. De Domenico, A. Lancichinetti, A. Arenas, M. Rosvall, Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems, Physical Review X 5 (1) (2015) 011027.
[60] R. J. Mondragon, J. Iacovacci, G. Bianconi, Multilink communities of multiplex networks, PLoS one 13 (3) (2018) e0193821.
[61] C. Rahmede, J. Iacovacci, A. Arenas, G. Bianconi, Centralities of nodes and influences of layers in large multiplex networks, Journal of Complex Networks 6 (5) (2017) 733–752.
[62] L. Solá, M. Romance, R. Criado, J. Flores, A. García del Amo, S. Boccaletti, Eigenvector centrality of nodes in multiplex networks, Chaos: An Interdisciplinary Journal of Nonlinear Science 23 (3) (2013) 033131.
[63] A. Spitz, E.-Á. Horvát, Measuring long-term impact based on network centrality: Unraveling cinematic citations, PLoS one 9 (10) (2014) e108857.
[64] L. Waltman, A review of the literature on citation impact indicators, Journal of Informetrics 10 (2) (2016) 365–391.
[65] F. Iannelli, A. Koher, D. Brockmann, P. Hövel, I. M. Sokolov, Effective distances for epidemics spreading on complex networks, Physical Review E 95 (1) (2017) 012313.
[66] A.-L. Barabási, et al., Network science, Cambridge University Press, 2016.
[67] H. Liao, M. S. Mariani, M. Medo, Y.-C. Zhang, M.-Y. Zhou, Ranking in evolving complex networks, Physics Reports 689 (2017) 1–54.
[68] D.-B. Chen, H. Gao, L. Lü, T. Zhou, Identifying influential nodes in large-scale directed networks: the role of clustering, PLoS one 8 (10) (2013) e77455.
[69] P. Hu, T. Mei, Ranking influential nodes in complex networks with structural holes, Physica A: Statistical Mechanics and its Applications 490 (2018) 624–631.
[70] C. Salavati, A. Abdollahpouri, Z. Manbari, Bridgerank: A novel fast centrality measure based on local structure of the network, Physica A: Statistical Mechanics and its Applications 496 (2018) 635–653.
[71] H. Wei, Z. Pan, G. Hu, L. Zhang, H. Yang, X. Li, X. Zhou, Identifying influential nodes based on network representation learning in complex networks, PLoS one 13 (7) (2018) e0200091.
[72] D. Chen, L. Lü, M.-S. Shang, Y.-C. Zhang, T. Zhou, Identifying influential nodes in complex networks, Physica A: Statistical Mechanics and its Applications 391 (4) (2012) 1777–1787.
[73] X. Zhang, J. Zhu, Q. Wang, H. Zhao, Identifying influential nodes in complex networks with community structure, Knowledge-Based Systems 42 (2013) 74–84.
[74] H.-L. Liu, C. Ma, B.-B. Xiang, M. Tang, H.-F. Zhang, Identifying multiple influential spreaders based on generalized closeness centrality, Physica A: Statistical Mechanics and its Applications 492 (2018) 2237–2248.
[75] M. Kellis, N. Patterson, M. Endrizzi, B. Birren, E. S. Lander, Sequencing and comparison of yeast species to identify genes and regulatory elements, Nature 423 (6937) (2003) 241.
[76] D. B. Gordon, L. Nekludova, S. McCallum, E. Fraenkel, Tamo: a flexible, object-oriented framework for analyzing transcriptional regulation using dna-sequence motifs, Bioinformatics 21 (14) (2005) 3164–3165.
[77] M. Kankainen, A. Löytynoja, Matlign: a motif clustering, comparison and matching tool, BMC Bioinformatics 8 (1) (2007) 189.
[78] N. Habib, T. Kaplan, H. Margalit, N. Friedman, A novel bayesian dna motif comparison method for clustering and retrieval, PLoS Computational Biology 4 (2) (2008) e1000010.
[79] R.-H. Li, L. Qin, J. X. Yu, R. Mao, Influential community search in large networks, Proceedings of the VLDB Endowment 8 (5) (2015) 509–520.
[80] J. Zhan, V. Guidibande, S. P. K. Parsa, Identification of top-k influential communities in big networks, Journal of Big Data 3 (1) (2016) 16.
[81] J. Li, X. Wang, K. Deng, X. Yang, T. Sellis, J. X. Yu, Most influential community search over large social networks, in: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), IEEE, 2017, pp. 871–882.
[82] M. Doo, L. Liu, Extracting top-k most influential nodes by activity analysis, in: Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE, 2014, pp. 227–236.
[83] N. Du, X. Jia, J. Gao, V. Gopalakrishnan, A. Zhang, Tracking temporal community strength in dynamic networks, IEEE Transactions on Knowledge and Data Engineering 27 (11) (2015) 3125–3137.
[84] S. Faisal, G. Tziantzioulis, A. Gok, N. Hardavellas, S. Ogrenci-Memik, S. Parthasarathy, Edge importance identification for energy efficient graph processing, in: 2015 IEEE International Conference on Big Data, IEEE, 2015, pp. 347–354.
[85] R.-H. Li, L. Qin, J. X. Yu, R. Mao, Finding influential communities in massive networks, The VLDB Journal - The International Journal on Very Large Data Bases 26 (6) (2017) 751–776.
[86] F. Bi, L. Chang, X. Lin, W. Zhang, An optimal and progressive approach to online search of top-k influential communities, Proceedings of the VLDB Endowment 11 (9) (2018) 1056–1068.
[87] J. Han, J. Pei, M. Kamber, Data Mining: Concepts and Techniques, Elsevier Science, 2011.
[88] A. Bhat, K-medoids clustering using partitioning around medoids for performing face recognition, International Journal of Soft Computing, Mathematics and Control 3 (3) (2014) 1–12.
[89] L. Kaufman, P. J. Rousseeuw, Finding groups in data: an introduction to cluster analysis, Vol. 344, John Wiley & Sons, 2009.
[90] R. T. Ng, J. Han, Clarans: A method for clustering objects for spatial data mining, IEEE Transactions on Knowledge & Data Engineering (5) (2002) 1003–1016.
[91] M. O. Shafiq, E. Torunski, A parallel k-medoids algorithm for clustering based on mapreduce, in: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2016, pp. 502–507.
[92] P. Cui, X. Wang, J. Pei, W. Zhu, A survey on network embedding, IEEE Transactions on Knowledge and Data Engineering 31 (5) (2019) 833–852.
[93] B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: Online learning of social representations, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2014, pp. 701–710.
[94] A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016, pp. 855–864.
[95] C. Tu, W. Zhang, Z. Liu, M. Sun, et al., Max-margin deepwalk: Discriminative learning of network representation, in: IJCAI, 2016, pp. 3889–3895.
[96] J. Li, J. Zhu, B. Zhang, Discriminative deep random walk for network classification, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1, 2016, pp. 1004–1013.
[97] C. Tu, H. Liu, Z. Liu, M. Sun, Cane: Context-aware network embedding for relation modeling, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1, 2017, pp. 1722–1731.
[98] L. F. Ribeiro, P. H. Saverese, D. R. Figueiredo, struc2vec: Learning node representations from structural identity, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp. 385–394.
[99] J. Li, H. Dani, X. Hu, J. Tang, Y. Chang, H. Liu, Attributed network embedding for learning in a dynamic environment, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, ACM, 2017, pp. 387–396.
[100] D. Yang, S. Wang, C. Li, X. Zhang, Z. Li, From properties to links: Deep network embedding on incomplete graphs, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, ACM, 2017, pp. 367–376.
[101] P. Goyal, H. Hosseinmardi, E. Ferrara, A. Galstyan, Embedding networks with edge attributes, in: Proceedings of the 29th on Hypertext and Social Media, HT ’18, ACM, New York, NY, USA, 2018, pp. 38–42. doi:10.1145/3209542.3209571.
[102] Y. Jacob, L. Denoyer, P. Gallinari, Learning latent representations of nodes for classifying in heterogeneous social networks, in: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, ACM, 2014, pp. 373–382.
[103] L. Tang, Z. Chen, K. Zhang, H. Chen, L. Zhichun, Entity embedding-based anomaly detection for heterogeneous categorical events, US Patent App. 15/427,654 (2017).
[104] T.-y. Fu, W.-C. Lee, Z. Lei, Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, ACM, 2017, pp. 1797–1806.
[105] M. Qu, J. Tang, J. Han, Curriculum learning for heterogeneous star network embedding via deep reinforcement learning, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, ACM, 2018, pp. 468–476.
[106] H. Wang, F. Zhang, M. Hou, X. Xie, M. Guo, Q. Liu, Shine: Signed heterogeneous information network embedding for sentiment link prediction, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, ACM, 2018, pp. 592–600.
[107] S. Fortunato, Community detection in graphs, Physics Reports 486 (3-5) (2010) 75–174.
[108] P. Sah, L. O. Singh, A. Clauset, S. Bansal, Exploring community structure in biological networks with random graphs, BMC Bioinformatics 15 (1) (2014) 220.
[109] M. E. Newman, Communities, modules and large-scale structure in networks, Nature Physics 8 (1) (2012) 25.
[110] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (10) (2008) P10008.
[111] U. N. Raghavan, R. Albert, S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Physical Review E 76 (3) (2007) 036106.
[112] P. Pons, M. Latapy, Computing communities in large networks using random walks, J. Graph Algorithms Appl. 10 (2) (2006) 191–218.
[113] P. De Meo, E. Ferrara, G. Fiumara, A. Provetti, Mixing local and global information for community detection in large networks, Journal of Computer and System Sciences 80 (1) (2014) 72–87.
[114] D. Shah, T. Zaman, Community detection in networks: The leader-follower algorithm, Stat 1050 (2010) 2.
[115] R. R. Khorasgani, J. Chen, O. R. Zaïane, Top leaders community detection approach in information networks, in: 4th SNA-KDD workshop on Social Network Mining and Analysis 2010, Citeseer, 2010.
[116] Y. Chen, P. Zhao, P. Li, K. Zhang, J. Zhang, Finding communities by their centers, Scientific Reports 6.
[117] Z. Yakoubi, R. Kanawati, Licod: A leader-driven algorithm for community detection in complex networks, Vietnam Journal of Computer Science 1 (4) (2014) 241–256.
[118] Y. Cohen, D. Hendler, A. Rubin, Node-centric detection of overlapping communities in social networks, in: International Conference and School on Network Science, Springer, 2017, pp. 1–10.
[119] G. Rossetti, D. Pedreschi, F. Giannotti, Node-centric community discovery: From static to dynamic social network analysis, Online Social Networks and Media 3 (2017) 32–48.
[120] R. Interdonato, A. Tagarelli, D. Ienco, A. Sallaberry, P. Poncelet, Node-centric community detection in multilayer networks with layer-coverage diversification bias, in: International Workshop on Complex Networks, Springer, 2017, pp. 57–66.
[121] H. Gmati, A. Mouakher, A. Gonzalez-Pardo, D. Camacho, A new algorithm for communities detection in social networks with node attributes, Journal of Ambient Intelligence and Humanized Computing (2019) 1–13.
[122] Z.-H. Deng, H.-H. Qiao, Q. Song, L. Gao, A complex network community detection algorithm based on label propagation and fuzzy c-means, Physica A: Statistical Mechanics and its Applications 519 (2019) 217–226.
[123] P. Wang, B. Xu, Y. Wu, X. Zhou, Link prediction in social networks: the state-of-the-art, Science China Information Sciences 58 (1) (2015) 1–38.
[124] D. Liben-Nowell, J. Kleinberg, The link-prediction problem for social networks, Journal of the American Society for Information Science and Technology 58 (7) (2007) 1019–1031.
[125] L. Lü, T. Zhou, Link prediction in complex networks: A survey, Physica A: Statistical Mechanics and its Applications 390 (6) (2011) 1150–1170.
[126] H. Sid Ahmed, B. Mohamed Faouzi, J. Caelen, Detection and classification of the behavior of people in an intelligent building by camera, International Journal on Smart Sensing & Intelligent Systems 6 (4).
[127] Y. Cui, Y. Liu, J. Hu, H. Li, A survey of link prediction in information networks, in: 2018 IEEE International Conference on Smart Internet of Things (SmartIoT), IEEE, 2018, pp. 29–33.
[128] T. Zhou, L. Lü, Y.-C. Zhang, Predicting missing links via local information, The European Physical Journal B 71 (4) (2009) 623–630.
[129] W. Liu, L. Lü, Link prediction based on local random walk, EPL (Europhysics Letters) 89 (5) (2010) 58007.
[130] C. Fan, Z. Liu, X. Lu, B. Xiu, Q. Chen, An efficient link prediction index for complex military organization, Physica A: Statistical Mechanics and its Applications 469 (2017) 572–587.
[131] M. Gao, L. Chen, B. Li, Y. Li, W. Liu, Y.-c. Xu, Projection-based link prediction in a bipartite network, Information Sciences 376 (2017) 158–171.
[132] S. Hanneke, W. Fu, E. P. Xing, et al., Discrete temporal models of social networks, Electronic Journal of Statistics 4 (2010) 585–605.
[133] J. Liu, G. Deng, Link prediction in a user–object network based on time-weighted resource allocation, Physica A: Statistical Mechanics and its Applications 388 (17) (2009) 3643–3650.
[134] N. Barbieri, F. Bonchi, G. Manco, Who to follow and why: link prediction with explanations, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2014, pp. 1266–1275.
[135] S. Gao, L. Denoyer, P. Gallinari, Temporal link prediction by integrating content and structure information, in: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM, 2011, pp. 1169–1174.
[136] Y. Yao, R. Zhang, F. Yang, J. Tang, Y. Yuan, R. Hu, Link prediction in complex networks based on the interactions among paths, Physica A: Statistical Mechanics and its Applications 510 (2018) 52–67.
[137] C. Ma, Z.-K. Bao, H.-F. Zhang, Improving link prediction in complex networks by adaptively exploiting multiple structural features of networks, Physics Letters A 381 (39) (2017) 3369–3376.
[138] K. M. Yazdi, A. M. Yazdi, S. Khodayi, J. Jiang, S. Saedy, A new link prediction method for improving security in social networks, International Journal of Computer Science and Network Security 18 (5) (2018) 84–91.
[139] F. Aghabozorgi, M. R.
Khayyambashi, A new similarity measure for link prediction based on local structures in social networks, Physica A: Statistical Mechanics and its Applications 501 (2018) 12–23.
[140] T. Wang, X.-S. He, M.-Y. Zhou, Z.-Q. Fu, Link prediction in evolving networks based on popularity of nodes, Scientific Reports 7 (1) (2017) 7147.
[141] F. Yu, A. Zeng, S. Gillard, M. Medo, Network-based recommendation algorithms: A review, Physica A: Statistical Mechanics and its Applications 452 (2016) 192–208.
[142] F. Isinkaye, Y. Folajimi, B. Ojokoh, Recommendation systems: Principles, methods and evaluation, Egyptian Informatics Journal 16 (3) (2015) 261–273.
[143] X. Bai, M. Wang, I. Lee, Z. Yang, X. Kong, F. Xia, Scientific paper recommendation: A survey, IEEE Access 7 (2019) 9324–9339.
[144] C. C. Aggarwal, et al., Recommender systems, Springer, 2016.
[145] Z. Liu, X. Xie, L. Chen, Context-aware academic collaborator recommendation, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2018, pp. 1870–1879.
[146] S. Aslan, M. Kaya, Topic recommendation for authors as a link prediction problem, Future Generation Computer Systems 89 (2018) 249–264.
[147] C. Yang, T. Liu, L. Liu, X. Chen, A nearest neighbor based personal rank algorithm for collaborator recommendation, in: 15th International Conference on Service Systems and Service Management (ICSSSM), IEEE, 2018, pp. 1–5.
[148] L. Cui, W. Huang, Q. Yan, F. R. Yu, Z. Wen, N. Lu, A novel context-aware recommendation algorithm with two-level svd in social networks, Future Generation Computer Systems 86 (2018) 1459–1470.
[149] M. Safran, D. Che, Real-time recommendation algorithms for crowdsourcing systems, Applied Computing and Informatics 13 (1) (2017) 47–56.
[150] I. Sendiña-Nadal, Y. Ofran, J. A. Almendral, J. M. Buldú, I. Leyva, D. Li, S. Havlin, S. Boccaletti, Unveiling protein functions through the dynamics of the interaction network, PLoS one 6 (3) (2011) e17679.
Kaya, Topic recommendation for authors as a link predictionproblem, Future Generation Computer Systems 89 (2018) 249–264.[147] C. Yang, T. Liu, L. Liu, X. Chen, A nearest neighbor based personal rankalgorithm for collaborator recommendation, in: 15th International Con-ference on Service Systems and Service Management (ICSSSM), IEEE,2018, pp. 1–5.[148] L. Cui, W. Huang, Q. Yan, F. R. Yu, Z. Wen, N. Lu, A novel context-aware recommendation algorithm with two-level svd in social networks,Future Generation Computer Systems 86 (2018) 1459–1470.[149] M. Safran, D. Che, Real-time recommendation algorithms for crowdsourc-ing systems, Applied Computing and Informatics 13 (1) (2017) 47–56.[150] I. Sendi˜na-Nadal, Y. Ofran, J. A. Almendral, J. M. Buld´u, I. Leyva, D. Li,S. Havlin, S. Boccaletti, Unveiling protein functions through the dynamicsof the interaction network, PLoS one 6 (3) (2011) e17679.69