Flow approaches to community detection in complex network systems
Olexandr Polishchuk
Laboratory of Modeling and Optimization of Complex Systems, Pidstryhach Institute for Applied Problems of Mechanics and Mathematics, National Academy of Sciences of Ukraine, Lviv, Ukraine
Abstract – The paper investigates the problem of finding communities in complex network systems, whose detection allows a better understanding of the laws governing their functioning. To solve this problem, two approaches based on the flow characteristics of complex networks are proposed. The first approach consists in calculating the influence parameters of separate subsystems of the network system, distinguished by the principles of ordering or subordination; the second uses the concept of the flow core of the network system. Based on the proposed approaches, reliable criteria for finding communities have been formulated and efficient algorithms for their detection in complex network systems have been developed. It is shown that the proposed approaches make it possible to single out communities in cases where the existing numerical and visual methods fail.
Keywords – complex network, network system, flow core, influence, community

INTRODUCTION
One of the important problems studied in the theory of complex networks is the search for groups of interconnected nodes (cliques, clusters, communities). Identifying such groups contributes to a better understanding of the principles of organization of complex networks (CN) and of the operation of the corresponding systems. In real network systems (NS), the most widespread groups are the so-called communities: subnets whose nodes are connected to each other more numerously and more strongly than to the other nodes of the CN [1]. Examples of communities in human society are public organizations, political parties, religious denominations, national diasporas, etc., which often play a significant role in the life of their states. Many communities also exist in social networks and in biological and physical systems [2-4]. Among the first methods for community detection in complex networks are the minimum cut, hierarchical clustering and clique-based methods [5]. Algorithms based on modularity estimation (Newman-Girvan, Blondel, Radicchi [6-8]), on the spectral properties of the graph (Donetti-Muñoz [9]), on the estimation of network entropy (the structural and dynamic methods of Rosvall-Bergstrom) and others are now widely used [10]. The main disadvantage of the above algorithms, along with their computational complexity and resource consumption [11], is the lack of a reliable, theoretically sound criterion for deciding whether a group of nodes identified by any of them actually forms a community [5, 12]. This "unreliability" has made methods of visual search for communities popular [13, 14], especially for large networks. These methods are based on visually identifying those components of a CN in which the density of connections is clearly higher than in the surrounding parts of the network. Obviously, the results of such a search are quite subjective.
The large number of existing methods for community detection confirms the great interest in this issue and its importance.

INTEGRAL FLOW ADJACENCY MATRIX AND FLOW CORE OF NETWORK SYSTEM
Complex networks are usually described as graphs of the form $G = (V, E)$, where $V$ is the set of network nodes and $E$ is the set of connections between them. The mathematical model of the CN structure is the binary adjacency matrix $\mathbf{A} = \{a_{ij}\}_{i,j=1}^{N}$, where $N$ is the number of network nodes. The element $a_{ij}$ of matrix $\mathbf{A}$ equals 1 if there is a connection between nodes $n_i$ and $n_j$, and 0 otherwise. Define the integral flow adjacency matrix $\mathbf{V}(t)$ of the volumes of flows that have passed through the network edges during the period $[t-T, t]$ up to the current time $t$:

$$\mathbf{V}(t) = \{V_{ij}(t)\}_{i,j=1}^{N}, \qquad V_{ij}(t) = \frac{\tilde V_{ij}(t)}{\max_{l,m=1,\dots,N} \tilde V_{lm}(t)}, \qquad \tilde V_{ij}(t) = \int_{t-T}^{t} v_{ij}(\tau)\, d\tau,$$

where $v_{ij}(\tau)$ is the volume of flow on the network edge $(n_i, n_j)$ at time $\tau \in [t-T, t]$, $t \ge T$, $i, j = 1, \dots, N$. The matrix $\mathbf{V}(t)$, whose structure is identical to that of matrix $\mathbf{A}$, is built from empirical data on the movement of flows through the network. It gives a sufficiently clear quantitative picture of NS operation, allows us to analyze the features of this process and predict its behavior, to evaluate its effectiveness, and to prevent existing or potential threats [15]. Following [16], introduce the concept of the flow $\alpha$-core of the network system as the largest subnet of the source network for which all elements of the integral flow adjacency matrix $\mathbf{V}(t)$ corresponding to its edges satisfy $V_{ij}(t) \ge \alpha$, $i, j = 1, \dots, N$, $\alpha \in [0, 1]$, $t \ge T$. Among other things, the flow $\alpha$-core of the NS allows us to identify the components of its structure that are most important from a functional point of view [16].

COMMUNITY DETECTION BASED ON SYSTEM HIERARCHIES
In real systems, the first "candidates" for communities are subsystems of different hierarchical levels, built on the principles of ordering or subordination [17, 18]. Let the source network system $S$ be divided into $M$ subsystems, $S = \bigcup_{m=1}^{M} S_m$, whose node sets $H_{S_m} = \{n_{i_m}\}_{i_m=1}^{N_m}$ do not intersect, $m = 1, \dots, M$. Denote by $G^{out}_{S_m}$ the set of all flow generator nodes included in the set $H_{S_m}$. The parameter

$$\xi^{out}_{S_m}(t) = \frac{\sum_{i \in G^{out}_{S_m}} \nu^{out}_i(t)}{s(\mathbf{V}(t))}$$

determines the strength of influence of subsystem $S_m$ on the NS as a whole. Here $\nu^{out}_i(t)$ is the volume of output flows generated in node $n_i$ from the set $G^{out}_{S_m}$, and $s(\mathbf{V}(t)) = \sum_{i=1}^{N}\sum_{j=1}^{N} V_{ij}(t)$ is the total volume of flows that have passed through the network during the period $[t-T, t]$.

Let $R^{out}_{S_m} = \bigcup_{i \in G^{out}_{S_m}} R^{out}_i$ be the set of indices of the nodes that are the final receivers of flows generated in the nodes of $G^{out}_{S_m}$. Divide this set into two subsets, $R^{out}_{S_m} = R^{out}_{S_m,int} \cup R^{out}_{S_m,ext}$, where $R^{out}_{S_m,int}$ consists of the nodes of $R^{out}_{S_m}$ belonging to $H_{S_m}$, and $R^{out}_{S_m,ext}$ consists of those belonging to the complement of $H_{S_m}$ in the source network. The set $R^{out}_{S_m,ext}$ will be called the domain of output influence of subsystem $S_m$ on the NS as a whole. The external and internal output strengths of influence of the flow generators of $G^{out}_{S_m}$ on the subnets $R^{out}_{S_m,ext}$ and $R^{out}_{S_m,int}$ are determined by the parameters

$$\xi^{out}_{S_m,ext}(t) = \frac{\sum_{i \in R^{out}_{S_m,ext}} \nu^{out}_i(t)}{s(\mathbf{V}(t))}, \qquad \xi^{out}_{S_m,int}(t) = \frac{\sum_{i \in R^{out}_{S_m,int}} \nu^{out}_i(t)}{s(\mathbf{V}(t))}.$$

Then the value

$$\zeta^{out}_{S_m}(t) = \frac{\xi^{out}_{S_m,ext}(t)}{\xi^{out}_{S_m,int}(t)}$$

determines the relative strength of influence of subsystem $S_m$ on the network system as a whole: the smaller $\zeta^{out}_{S_m}$, the smaller the influence of subsystem $S_m$ on the NS, $m = 1, \dots, M$.

Denote by $R^{in}_{S_m}$ the set of all final flow receiver nodes included in the set $H_{S_m}$. The parameter

$$\xi^{in}_{S_m}(t) = \frac{\sum_{i \in R^{in}_{S_m}} \nu^{in}_i(t)}{s(\mathbf{V}(t))}$$

determines the strength of influence of the network system on subsystem $S_m$, $m = 1, \dots, M$. Here $\nu^{in}_i(t)$ is the volume of input flows received in node $n_i$ from the set $R^{in}_{S_m}$ during the period $[t-T, t]$.
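A minimal sketch in Python with NumPy of the quantities used so far: the integral flow adjacency matrix $\mathbf{V}(t)$, the total volume $s(\mathbf{V}(t))$, and the external and internal output strengths of influence of a subsystem together with their relative value. It rests on assumptions that the text does not state. Edge flows are assumed to be sampled at discrete times, so the integral over $[t-T, t]$ is approximated by the trapezoidal rule. The generator-to-final-receiver volumes are assumed to be available as a hypothetical matrix `F` (`F[i, j]` is the volume generated at $n_i$ and finally received at $n_j$), so the external and internal parts are read as the flows from the generators of $S_m$ that end outside or inside $H_{S_m}$. The relative strength `zeta` is assumed to be the ratio of the external to the internal part. Because the input-side quantities defined in the next paragraph mirror the output side with generators and receivers exchanged, `input_influence` obtains them by transposing `F`. All function names and the example data are hypothetical.

```python
import numpy as np

def integral_flow_matrix(v, tau):
    """Normalised integral flow adjacency matrix V(t) from sampled edge flows.

    v   : (K, N, N) array; v[k, i, j] = flow on edge (n_i, n_j) at time tau[k]
          (hypothetical sampled-data layout).
    tau : K increasing sample times covering [t - T, t].
    """
    v = np.asarray(v, dtype=float)
    dt = np.diff(np.asarray(tau, dtype=float))      # interval lengths, shape (K-1,)
    heights = 0.5 * (v[1:] + v[:-1])                 # trapezoid heights per interval
    v_tilde = np.einsum("k,kij->ij", dt, heights)    # ~V_ij(t) by the trapezoidal rule
    vmax = v_tilde.max()
    return v_tilde / vmax if vmax > 0 else v_tilde   # divide by max_{l,m} ~V_lm(t)

def output_influence(F, V, members):
    """Output-side influence parameters of subsystem S_m (hypothetical helper).

    F       : (N, N) array; F[i, j] = volume generated at n_i and finally received
              at n_j during [t - T, t] (assumed data, not specified in the text).
    V       : (N, N) integral flow adjacency matrix, used only for s(V(t)).
    members : indices of the nodes of H_{S_m}.
    Returns (xi_ext, xi_int, zeta), with zeta = xi_ext / xi_int (assumed ratio).
    """
    F = np.asarray(F, dtype=float)
    s = float(np.asarray(V, dtype=float).sum())      # s(V(t))
    inside = np.zeros(F.shape[0], dtype=bool)
    inside[list(members)] = True
    rows = F[inside]                                 # flows generated in H_{S_m}
    xi_ext = float(rows[:, ~inside].sum()) / s       # finally received outside H_{S_m}
    xi_int = float(rows[:, inside].sum()) / s        # finally received inside H_{S_m}
    zeta = xi_ext / xi_int if xi_int > 0 else float("inf")
    return xi_ext, xi_int, zeta

def input_influence(F, V, members):
    """Input-side analogue: flows received in H_{S_m}, split by where they were generated."""
    return output_influence(np.asarray(F, dtype=float).T, V, members)

# Illustrative data (made up): two triangles {0,1,2} and {3,4,5}; each node sends
# 10 units to its neighbour in its own triangle, and node 2 also sends 1 unit to node 3.
F = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]:
    F[i, j] = 10.0
F[2, 3] = 1.0
# Every flow here travels a single edge, so the edge flows equal F; constant
# flows are sampled at tau = 0 and tau = 1.
V = integral_flow_matrix(np.stack([F, F]), [0.0, 1.0])
for members in ([0, 1, 2], [3, 4, 5]):
    z_out = output_influence(F, V, members)[2]
    z_in = input_influence(F, V, members)[2]
    print(members, round(z_out, 4), round(z_in, 4))
# [0, 1, 2] 0.0333 0.0
# [3, 4, 5] 0.0 0.0333
```

In the example both triangles send and receive almost all of their flows internally, so both relative parameters are small (1/30 or 0) for each of them, which is the pattern the pair criterion associates with a community.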
Let $G^{in}_{S_m} = \bigcup_{i \in R^{in}_{S_m}} G^{in}_i$ be the set of indices of the generator nodes from which flows are directed to the nodes of the set $R^{in}_{S_m}$. Divide this set into two subsets, $G^{in}_{S_m} = G^{in}_{S_m,int} \cup G^{in}_{S_m,ext}$, where $G^{in}_{S_m,int}$ consists of the nodes of $G^{in}_{S_m}$ belonging to $H_{S_m}$, and $G^{in}_{S_m,ext}$ consists of those belonging to the complement of $H_{S_m}$ in the source network. The set $G^{in}_{S_m,ext}$ will be called the domain of input influence of the network system on subsystem $S_m$. The external and internal input strengths of influence linking the final receivers of $R^{in}_{S_m}$ with the subnets $G^{in}_{S_m,ext}$ and $G^{in}_{S_m,int}$ are determined by the parameters

$$\xi^{in}_{S_m,ext}(t) = \frac{\sum_{i \in G^{in}_{S_m,ext}} \nu^{in}_i(t)}{s(\mathbf{V}(t))}, \qquad \xi^{in}_{S_m,int}(t) = \frac{\sum_{i \in G^{in}_{S_m,int}} \nu^{in}_i(t)}{s(\mathbf{V}(t))}.$$

Then the value

$$\zeta^{in}_{S_m}(t) = \frac{\xi^{in}_{S_m,ext}(t)}{\xi^{in}_{S_m,int}(t)}$$

determines the relative strength of influence of the network system on subsystem $S_m$: the smaller $\zeta^{in}_{S_m}$, the smaller the influence of the NS on subsystem $S_m$, $m = 1, \dots, M$.

The pair of parameters $(\zeta^{out}_{S_m}, \zeta^{in}_{S_m})$ forms an objective criterion of whether subsystem $S_m$ is a community in the network system. Indeed, the smaller these parameters, the weaker the external interaction of subsystem $S_m$ with the system as a whole and the stronger the interaction within the subsystem, which is, in essence, the definition of a community. The betweenness parameters of subsystem $S_m$, $m = 1, \dots, M$, can also be used to build an objective criterion and a corresponding algorithm for detecting communities-subsystems in the source network system [15].

COMMUNITIES AND FLOW CORES OF NETWORK SYSTEMS
Obviously, one of the most objective indicators of the strength of the connection between two network nodes is the volume of flows that pass through the edge connecting them over the period $[t-T, t]$, in other words, the values of the elements of the integral flow adjacency matrix $\mathbf{V}(t)$, $t \ge T$. This means that if, during the construction of the $\alpha$-core of the source NS (Fig. 1a: source CN; Fig. 1b: source NS with its $\alpha$-core shown), the flow $\alpha$-core splits into unconnected components (Fig. 1c) at a certain value of $\alpha$ as $\alpha$ is increased step by step, then the largest communities in the network system have been detected. Importantly, the structure and the node and edge composition of these communities are determined unambiguously by the matrix $\mathbf{V}(t)$, $t \ge T$. If, as $\alpha$ grows further, the communities detected at the previous step again split into unconnected components at some larger value of $\alpha$, we obtain sub-communities of these communities (Fig. 1d), and so on. In contrast to the first approach, which tests whether a particular subsystem is a community, the use of flow cores allows a global search for all communities in the network system. Note that none of the numerical algorithms mentioned in the introduction, nor the visual methods, can detect the presence of communities in the simplest regular network shown in Fig. 1a. Similar examples can be given for much more complex real network structures [16].

Fig. 1. Use of flow α-cores for community detection in a complex network system: a) source CN; b) source NS with its α-core; c) largest communities; d) sub-communities.

CONCLUSIONS
The paper establishes the importance of the problem of community detection in complex network systems and briefly analyzes the shortcomings of known numerical and visual methods for solving it. Examples are given that demonstrate the inefficiency of these methods, caused by the lack of mathematically sound search criteria. The integral flow adjacency matrix of a complex network system, the influence parameters of its separate subsystems and the concept of its flow core are defined, which made it possible to formulate objective criteria for community detection in complex networks and to develop effective algorithms for such detection.
References
[1] Newman M. E. J. (2012) Communities, modules and large-scale structure in networks. Nature Physics, Vol. 8, P. 25–31.
[2] Newman M. E. J. (2004) Detecting community structure in networks. European Physical Journal B, Vol. 38, No. 2, P. 321–330.
[3] Polishchuk A. D. (2003) Simple and double layer potentials in the Hilbert spaces. Proceedings of 8th International Seminar/Workshop on Direct and Inverse Problems of Electromagnetic and Acoustic Wave Theory DIPED 2003, P. 94–97.
[4] Khan B. S., Niazi M. A. (2017) Network community detection: A Review and Visual Survey. arXiv: 1708.00977 [cs.SI].
[5] Kolomeichenko M. I., Poliakov I. V., Chepovsky A. A., Chepovsky A. M. (2016) Communities detection in graph of interacting objects. Fundamental and applied mathematics, Vol. 21, No. 3, P. 131–139.
[6] Girvan M., Newman M. E. J. (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the USA, Vol. 99, P. 7821–7826.
[7] Blondel V. D., Guillaume J.-L., Lambiotte R., Lefebvre E. (2008) The Louvain method for community detection in large networks. Journal of Statistical Mechanics. Theory and Experiments, P. 108–121.
[8] Radicchi F., Castellano C., Cecconi F., Loreto V., Parisi D. (2004) Defining and identifying communities in networks. Proceedings of the National Academy of Sciences of the USA, Vol. 101, No. 9, P. 2658–2663.
[9] Donetti L., Muñoz M. A. (2005) Improved spectral algorithm for the detection of network communities. arXiv: physics/0504059 [physics.soc-ph].
[10] Rosvall M., Bergstrom C. T. (2007) An information-theoretic framework for resolving community structure in complex networks. Proceedings of the National Academy of Sciences of the USA, Vol. 104, No. 18, P. 7327–7331.
[11] Polishchuk O. D., Tyutyunnyk M. I., Yadzhak M. S. (2007) Quality evaluation of complex systems function on the base of parallel calculations. Information Extraction and Processing, Vol. 26, No. 102, P. 121–126.
[12] Lambiotte R., Rosvall M. (2012) Ranking and clustering of nodes in networks with smart teleportation. Physical Review E, Vol. 85, No. 5, 056107.
[13] Babak F., Naghmeh M. (2015) Growing multiplex networks with arbitrary number of layers. arXiv: 1506.06278v2 [physics.soc-ph].
[14] Kolomeichenko M. I., Chepovsky A. M. (2014) Large graph visualization and analysis. Business Informatics, Vol. 30, No. 4, P. 7–16.
[15] Polishchuk O. D., Yadzhak M. S. (2018) Network structures and systems: I. Flow characteristics of complex networks. System research and informational technologies, No. 2, P. 42–54.
[16] Polishchuk O. D., Yadzhak M. S. (2018) Network structures and systems: II. Cores of networks and multiplexes. System research and informational technologies, No. 3, P. 38–51.
[17] Polishchuk O. D., Polishchuk D. O., Tyutyunnyk M. I., Yadzhak M. S. (2015) Issues of regional development and evaluation problems. AASCIT Communications, Vol. 2, No. 4, P. 115–120.
[18] Polishchuk O. D., Yadzhak M. S. (2018) Network structures and systems: III. Hierarchies and networks. System research and informational technologies, No. 4, P. 82–95.