Structure controllability of complex network based on preferential matching

Abstract

Minimum driver node sets (MDSs) play an important role in studying the structural controllability of complex networks. Recent research has shown that MDSs tend to avoid high-degree nodes. However, this observation is based on the analysis of a small number of MDSs, because enumerating all of the MDSs of a network is a #P problem. Therefore, past research has not been sufficient to arrive at a convincing conclusion. In this paper, we first propose a preferential matching algorithm to find MDSs that have a specific degree property. Then, we show that the MDSs obtained by preferential matching can be composed of high- and medium-degree nodes. Moreover, the experimental results show that the average degree of the MDSs of some networks tends to be greater than that of the overall network, even when the MDSs are obtained using the random-matching method of previous research. Further analysis shows that whether the driver nodes tend to be high-degree nodes is closely related to the edge directions of the network.


Structural controllability of complex networks based on preferential matching

Xizhe Zhang, Tianyang Lv, XueYing Yang, Bin Zhang (College of Information Science and Engineering, Northeastern University, Shenyang 110819, China) (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China) (College of Computer Science and Technology, Tsinghua University, Beijing 100084, China) (Audit Research Institute, National Audit Office, Beijing 100830, China) * Email: [email protected]


Introduction

Controlling complex systems is a critical topic in many applications. A system is called controllable if it can be driven from any initial state to any desired state in finite time. Previous research has usually adopted a complex network as the fundamental model to analyze the topological structure, the evolving model, and the dynamic behavior of complex systems. However, we still lack a thorough understanding of how to control complex networks. According to control theory, the states of a linear time-invariant system are determined by the following equation:

$$\frac{dx(t)}{dt} = Ax(t) + Bu(t) \qquad (1)$$

where the vector x(t) = (x_1(t), …, x_N(t))^T denotes the state of the N nodes in the network at time t, A is the transpose of the adjacency matrix of the network, B is the input matrix that defines how control signals are input to the network, and u(t) = (u_1(t), …, u_H(t))^T represents the H input signals at time t. A node that directly receives a control signal is called a driver node. The minimum sets of driver nodes needed to control a network are called the minimum driver node sets (MDSs). Lin presented a network representation of linear time-invariant systems and stated that the system is structurally controllable if and only if the network can be spanned by cacti structures. Commault proved that the minimum number of signals needed to control a network can be obtained from a maximum matching of the network. Based on these works, Liu developed an analysis tool to study the controllability of an arbitrary complex directed network, and found that MDSs tend to be composed of low-degree nodes in both real and model networks. However, the maximum matching of a network is usually not unique, and thus neither are the MDSs. Previous studies have only randomly sampled MDSs and analyzed a small number of the MDSs of a network, because enumerating all possible maximum matchings is a #P problem.
Therefore, past research has not been sufficient to arrive at a convincing conclusion about whether MDSs tend to avoid high-degree nodes. In this paper, we propose a preferential matching algorithm to find MDSs with desired degree properties. To find these MDSs, the algorithm arranges the matching order of the nodes according to their degree rank. Because low-ranking nodes have higher probabilities of being driver nodes, the obtained MDSs tend to be composed of the high- or medium-degree nodes of the network. The algorithm can also be applied to obtain MDSs with other topological properties. Using the preferential matching algorithm, we found that some networks have MDSs composed mainly of high- and medium-degree nodes. Moreover, in some networks, the average degree of the MDSs tended to be greater than that of the overall network, even if the MDSs were obtained using the previous random-matching method. We conclude that there are networks that favor low-degree MDSs and other networks that favor high-degree MDSs. To find the underlying reason for this phenomenon, we designed a directed BA model for model networks and an edge-direction reversal strategy for real networks. The experimental results showed that the MDSs of a network tended to be composed of high-degree nodes if the majority of its edges pointed from high-degree nodes to low-degree nodes; otherwise, the MDSs tended to be composed of low-degree nodes. Therefore, whether the driver nodes tended to be high-degree nodes was closely related to the edge directions of the network.

Preferential Matching Algorithm

First, we will briefly introduce the basic concepts of maximum matching. For a directed network G, V(G) is the node set and E(G) is the edge set, with N = |V| and L = |E|. A set of edges in G is called a matching M if no two edges in M have a node in common. A node v_i is matched by M if there is an edge of M pointing to v_i; otherwise v_i is unmatched. A path P is said to be M-alternating if the edges of P are alternately in and not in M. An M-alternating path P that starts and ends at unmatched nodes is called an M-augmenting path. A matching with the maximum number of matched nodes is called a maximum matching M*. A matching M is called a perfect matching if all of the nodes of G are matched by M. The minimum input theorem proves that if there is a perfect matching in a network, the number of driver nodes is one; otherwise the number of driver nodes is equal to the number of unmatched nodes with respect to any maximum matching, and the driver nodes are the unmatched nodes. The size of the maximum matching M* is denoted |M*|. The minimum number of driver nodes is thus

$$n_D = \max\{N - |M^*|,\ 1\} \qquad (2)$$

Based on this theorem, the MDSs can be obtained by finding the maximum matchings of a network. Therefore, it is critical to find all of the maximum matchings. Previous maximum matching algorithms, such as Hopcroft-Karp and the Hungarian algorithm, are based on the theorem proposed by Berge. That theorem proves that M* is a maximum matching if and only if there is no augmenting path in G relative to M*. Therefore, the basic idea of a maximum matching algorithm is as follows: first, find an augmenting path from an unmatched node with respect to the current matching M (initially M = ∅); second, use this path to obtain an expanded matching M'. Repeat these two steps until no augmenting path exists. The final matching is a maximum matching.
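As a rough illustration of Eq. (2), the following sketch computes a maximum matching with a simple augmenting-path search (Kuhn's algorithm, used here in place of the faster Hopcroft-Karp algorithm) and reads off the unmatched nodes. The function names `maximum_matching` and `driver_nodes`, the 0-based integer node labels, and the edge-list input are assumptions made for this example only.

```python
def maximum_matching(n, edges):
    """Return match_in, where match_in[v] is the source of the matched
    edge pointing to v, or None if v is unmatched."""
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
    match_in = [None] * n

    def augment(u, seen):
        # Search for an augmenting path that starts at the source side of u.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_in[v] is None or augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    for u in range(n):
        augment(u, set())
    return match_in


def driver_nodes(n, edges):
    """Return (unmatched nodes, n_D) with n_D = max(N - |M*|, 1)."""
    match_in = maximum_matching(n, edges)
    unmatched = [v for v in range(n) if match_in[v] is None]
    # With a perfect matching the unmatched list is empty and n_D is 1:
    # any single node can then serve as the driver node.
    return unmatched, max(len(unmatched), 1)


# A directed path 0 -> 1 -> 2 -> 3 needs one driver node (node 0).
print(driver_nodes(4, [(0, 1), (1, 2), (2, 3)]))
# A star 0 -> 1, 0 -> 2, 0 -> 3 needs three driver nodes.
print(driver_nodes(4, [(0, 1), (0, 2), (0, 3)]))
```

Because a maximum matching is generally not unique, the set returned here is only one of possibly many MDSs; which one is found depends on the order in which nodes and edges are visited.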
Using this process, once a node v_i becomes a matched node, it will be matched by the final maximum matching and will not be a driver node. Therefore, if we deliberately arrange the matching order of the nodes according to their degree order, we can find MDSs with a desired degree property, such as high-degree MDSs, particularly when a network has many maximum matchings. However, the matching order of the nodes is determined by the time at which a node first appears in an augmenting path, and this time is hard to determine in advance. A node with a high degree may appear very early in an augmenting path, even if it is arranged to be the last start node of the augmenting paths. For example, we can sort the seven nodes of a small network in ascending order by degree and treat this order as the input sequence for selecting the unmatched start node when searching for an augmenting path. But we may find an augmenting path through four nodes at the very first step. Although this path starts from the node with the lowest degree, it contains the three highest-degree nodes, and these nodes cannot be driver nodes of the final MDSs. Thus, the matching order of the nodes would be quite different from their degree order, and MDSs with a desired degree property could not easily be found. To overcome this problem, we designed an iterative preferential matching method. We sort the nodes as {v_1, v_2, …, v_n} in ascending order by degree and denote by m the number of preferential matching nodes. The method starts from the sub graph that contains only the first-ranked, lowest-degree node; at each iterative step i, the sub graph H_i is extended by adding the node with the i-th rank, and the maximum matching of H_i is calculated based on the previously obtained maximum matching of H_{i-1}. We repeat this procedure until the sub graph H_i is equal to the whole network or until m preferential nodes have been added.
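The iterative procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: degree is taken as in-degree plus out-degree, ties in the degree ranking are broken by node index, and the matching of each sub graph is extended with a plain augmenting-path search. The names `preferential_matching`, `m`, and `ascending` are hypothetical.

```python
def preferential_matching(n, edges, m=None, ascending=True):
    """Return one MDS (list of unmatched nodes) found by adding nodes to the
    sub graph one at a time in degree order and re-maximizing the matching."""
    succ = [[] for _ in range(n)]
    degree = [0] * n          # assumption: degree = in-degree + out-degree
    for u, v in edges:
        succ[u].append(v)
        degree[u] += 1
        degree[v] += 1
    order = sorted(range(n), key=lambda v: degree[v], reverse=not ascending)
    m = n if m is None else min(m, n)

    match_in = [None] * n     # match_in[v]: source of the matched edge into v
    match_out = [None] * n    # match_out[u]: target of the matched edge from u
    active = [False] * n      # nodes currently in the sub graph H_i

    def augment(u, seen):
        for v in succ[u]:
            if not active[v] or v in seen:
                continue
            seen.add(v)
            if match_in[v] is None or augment(match_in[v], seen):
                match_in[v] = u
                match_out[u] = v
                return True
        return False

    def maximize():
        # Extend the current matching to a maximum matching of the sub graph.
        for u in order:
            if active[u] and match_out[u] is None:
                augment(u, set())

    for v in order[:m]:       # steps 1-3: grow the sub graph node by node
        active[v] = True
        maximize()
    for v in order[m:]:       # step 4: add all remaining nodes at once
        active[v] = True
    maximize()
    return [v for v in range(n) if match_in[v] is None]


# Node 0 is a hub: 1 -> 0, 1 -> 2, 0 -> 3, 0 -> 4.
edges = [(1, 0), (1, 2), (0, 3), (0, 4)]
print(preferential_matching(5, edges, ascending=True))   # [0, 1, 4]
print(preferential_matching(5, edges, ascending=False))  # [1, 2, 4]
```

In this five-node example both orders give an MDS of size three, but only the ascending order leaves the hub (node 0) unmatched, so it appears in the MDS. Augmenting a matching never unmatches a node, which is why nodes added early tend to stay out of the final driver set.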
Details of the preferential matching method are as follows:

1. Sort the nodes as {v_1, v_2, …, v_n}; set H_1 = {v_1}, M*_1 = ∅, and i = 1.
2. Set i = i + 1 and H_i = H_{i-1} + {v_i}, and find a maximum matching M*_i of H_i based on M*_{i-1}.
3. Repeat step 2 until i = m.
4. If m < N, find the resulting maximum matching M* of G based on M*_m; otherwise, M*_m is the resulting maximum matching of G. The MDS is composed of the unmatched nodes with respect to M*.

An example of the proposed method is shown in Figure 1. We obtain a maximum matching of G in step 4. As with current algorithms [12, 21], once v_i is matched in the process, it must be matched by the resulting maximum matching. The proposed method ensures that we find the maximum number of matched nodes of H_i among the first i ranked nodes, and that a high-degree node will not be matched in the early steps because it is not included in the early sub graphs. Therefore, we can make the matching order of the nodes as similar as possible to the predefined degree order. Thus, high-degree nodes will have a higher probability of being driver nodes. However, the order of arrangement has no influence on some particular nodes; for instance, nodes with zero in-degree must be driver nodes no matter what the input order is.

Experimental Results and Analysis

To analyze the degree property of MDSs, we selected 21 real networks from categories including trust networks, food networks, electric networks, neuronal networks, citation networks, the World Wide Web, the internet, social communication networks and social organization networks. Table 1 shows the average degree <k> of each network, the size n_D of the network's MDSs, and the fraction of driver nodes λ_D = n_D / N. First, we find MDSs with the desired high-degree property using the preferential matching algorithm. Let <k_D> be the average degree of the MDSs obtained under different numbers m of preferential nodes, and let <k_Dmax> and <k_Dmin> be the maximum and the minimum <k_D> of all of the obtained MDSs, respectively. Figure 2 shows the variation in <k_D> versus m in the real and model networks. The preferential matching method can find MDSs with the preferred high-degree property, and this property becomes more pronounced as m increases. If the nodes are sorted in ascending order by degree, <k_D> increases with m up to the upper bound <k_Dmax>; if the nodes are sorted in descending order by degree, <k_D> decreases with m down to the lower bound <k_Dmin>.
From Table 1 and Figure 2, a basic observation was that the MDSs were structurally diverse: the <k_D> of many networks varied widely. Thus, the different MDSs of the same network could have quite different degree properties. Moreover, <k_Dmax> was greater than <k> in many networks, such as the Grassland, Seagrass, Ythan, and Florida networks. Therefore, we were able to find MDSs whose <k_D> was greater than the average degree of the network.

To further verify the above observation, we analyzed the degree distribution of the driver nodes of the MDSs with high <k_D>. We computed the MDS with the highest average degree <k_Dmax> by using the preferential matching method. Figure 3 shows the results for some real and model networks. In Figure 3, each point corresponds to the set of nodes with a specific degree k. A black point means that no node with degree k appears in the resulting MDS, and a red point means that some nodes with degree k appear in the resulting MDS. The inset graph shows the degree distribution of all driver nodes of the MDS with <k_Dmax>. Therefore, if all red points have high degree, the MDS tends to be composed of high-degree nodes. It can be seen from Figure 3 that some networks do have an MDS composed mainly of high- or medium-degree nodes. Taking the world-trade network as an example, 66.2% of its nodes have k ≤ 20, but none of these low-degree nodes appeared in the resulting MDS; meanwhile, 88.9% of the remaining high-degree nodes with k > 20 appeared in the MDS. Similar results can be observed in the BA and ER networks. However, not all networks had an MDS composed mainly of high-degree nodes. The MDS with <k_Dmax> of some networks, such as the seagrass, florida and c.elegans networks, was composed of nodes with degrees ranging from the lowest to the highest, while the MDS with <k_Dmax> of other networks, such as the P2P- network, was composed mainly of low-degree nodes.

Second, we tested whether the average degree of the MDSs of some networks tended to be greater than that of the overall network, even if the MDSs were obtained using the previous random matching method. In the experiment, we randomly sampled 10,000 different MDSs of each network.
Table 1 shows the average value $\overline{\langle k_D \rangle}$ of the average degree over all of the sampled MDSs, because the average degree of the different MDSs varied. We found that the $\overline{\langle k_D \rangle}$ of some networks, such as the Zewail, world trade and literature networks, was greater than or equal to <k> even when using the previous sampling method.

Finally, these experimental results prompted us to explain why the driver nodes of some networks tended to be low degree while those of others did not. According to the minimum input theorem, a driver node is not pointed to by any matched edge. Therefore, if the majority of the edges of a network point from high-degree nodes to low-degree nodes, the MDSs tend to be composed of high-degree nodes. Otherwise, the MDSs tend to be composed of low-degree nodes. Figure 4 gives an example in which two networks have the same topology except that the directions of their edges are opposite. The edges of the network in Figure 4(a) point to the low-degree nodes, while the edges in Figure 4(b) point to the high-degree nodes. Therefore, they have very different MDSs. The three driver nodes of the network in Figure 4(a) have the highest degrees, while the three driver nodes of the network in Figure 4(b) have the lowest degrees. Therefore, we believe that the node composition of the MDSs is closely related to the direction of the edges in a network.

To verify this hypothesis, we designed a revised BA model to generate directed networks. The model is the same as the classical BA model [39] except that the direction of a newly added edge is determined by the following rule: the new edge points from an existing old node v_old to the new node v_new with probability p, and points in the opposite direction with probability 1 − p. Therefore, if p is large enough, the edges of a high-degree node v_old will have a high probability of pointing to other nodes.
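The direction rule of the revised BA model can be illustrated with a short sketch. Several details here are assumptions rather than the authors' setup: the growth step follows the common "repeated nodes" implementation of preferential attachment, the network starts from `m` isolated seed nodes, and an edge counts as high-to-low only when the source degree (in-degree plus out-degree) is strictly larger than the target degree. The names `directed_ba` and `fraction_hi_to_lo` are hypothetical.

```python
import random


def directed_ba(n, m, p, seed=None):
    """Grow a BA-style network with n nodes; each new node adds m edges, and
    each edge points old -> new with probability p, else new -> old."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # the first new node links to the m seed nodes
    repeated = []             # each node appears once per incident edge
    for new in range(m, n):
        for old in targets:
            if rng.random() < p:
                edges.append((old, new))
            else:
                edges.append((new, old))
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Choose m distinct targets with probability proportional to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return edges


def fraction_hi_to_lo(n, edges):
    """Fraction of edges whose source has a strictly higher degree than
    its target (an estimate of f_hi-lo)."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(1 for u, v in edges if degree[u] > degree[v]) / len(edges)


for p in (0.0, 0.5, 1.0):
    edges = directed_ba(200, 2, p, seed=1)
    print(p, round(fraction_hi_to_lo(200, edges), 2))
```

Because older nodes accumulate more edges on average, a larger p should push the reported fraction upward, which is the trend the text attributes to Figure 5(a).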
The result of this arrangement is that the edges of a generated network tend to point from high-degree nodes to low-degree nodes, so the high-degree nodes are more likely to be source nodes, which must receive the control signal from outside. We calculated the fraction f_hi-lo of edges that pointed from high-degree nodes to low-degree nodes in a directed BA network. Figure 5(a) shows the linear relation between f_hi-lo and p. Then, we randomly calculated 10,000 MDSs of several directed BA networks using the Hopcroft-Karp algorithm. Figure 5(b) shows that the average degree of the MDSs, $\overline{\langle k_D \rangle}$, increases with p. When p = 0.5, which means that the direction of each edge is decided at random, $\overline{\langle k_D \rangle}$ is much less than <k>; as p increases toward 1, $\overline{\langle k_D \rangle}$ gradually becomes greater than <k>; and in Figure 5(c), when p = 1, the $\overline{\langle k_D \rangle}$ of all of the directed BA networks is always greater than <k>.

We also verified this hypothesis in the real networks. Due to the complexity of degree correlation in real directed networks [40], there may be no obvious relationship between $\overline{\langle k_D \rangle}$ and f_hi-lo across different real networks. Therefore, we designed the following edge-reversal strategy to verify this hypothesis: for an edge v_i → v_j, if k_i < k_j, then reverse the edge direction to v_j → v_i with probability R. As in the directed BA model, if R is large enough, the edges of a high-degree node will have a high probability of pointing to a low-degree node. Figure 5(d) shows $\overline{\langle k_D \rangle}$ versus R. We can see that if the original $\overline{\langle k_D \rangle}$ of a network is less than <k>, $\overline{\langle k_D \rangle}$ increases gradually with R and becomes greater than or equal to the <k> of the network. However, for a few networks such as TRN-Yeast-1, the average degree of the MDSs decreases with R. This finding suggests that other topological factors also influence the degree properties of MDSs, although the direction of the edges may be a major factor.

Discussion

The minimum driver node set can be obtained by finding a maximum matching of the network. However, the MDSs of a network are not unique, and MDSs with very different topological features exist. Thus, one important research direction in the controllability of complex networks is analyzing the topological features of all of the possible MDSs. However, enumerating all of the MDSs is a #P problem.

References

1. Fortunato S (2010) Community detection in graphs. Physics Reports 486: 75-174.
2. Ghoshal G, Barabasi AL (2011) Ranking stability and super-stable nodes in complex networks. Nature Communications 2.
3. Karsai M, Kivela M, Pan RK, Kaski K, Kertesz J, et al. (2011) Small but slow world: How network topology and burstiness slow down spreading. Physical Review E 83.
4. Papadopoulos F, Kitsak M, Serrano MÁ, Boguná M, Krioukov D (2012) Popularity versus similarity in growing networks. Nature 489: 537-540.
5. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286: 509-512.
6. Watts DJ (1999) Networks, dynamics, and the small-world phenomenon. American Journal of Sociology 105: 493-527.
7. Vespignani A (2012) Modelling dynamical processes in complex socio-technical systems. Nature Physics 8: 32-39.
8. Stanoev A, Smilkov D, Kocarev L (2011) Identifying communities by influence dynamics in social networks. Physical Review E 84: 046102.
9. Palla G, Barabasi AL, Vicsek T (2007) Quantifying social group evolution. Nature 446: 664-667.
10. Lin CT (1974) Structural controllability. IEEE Transactions on Automatic Control 19: 201-208.
11. Commault C, Dion JM, van der Woude JW (2002) Characterization of generic properties of linear structured systems for efficient computations. Kybernetika 38(5): 503-520.
12. Hopcroft JE, Karp RM (1973) An n^{5/2} algorithm for maximum matchings in bipartite graphs. SIAM J. Comput. 2: 225-231.
13. Liu YY, Slotine JJ, Barabasi AL (2011) Controllability of complex networks. Nature 473: 167-173.
14. Zdeborova L, Mezard M (2006) The number of matchings in random graphs. J. Stat. Mech. 05, 05003.
15. Wang WX, Ni X, Lai YC (2012) Optimizing controllability of complex networks by minimum structural perturbations. Physical Review E 85: 026115(5).
16. Müller FJ, Schuppert A (2011) Few inputs can reprogram biological networks. Nature 478: E4-E4.
17. Nepusz T, Vicsek T (2012) Controlling edge dynamics in complex networks. Nature Physics 8: 568-573.
18. Yan G, Ren J, Lai YC, Lai CH, Li B (2012) Controlling complex networks: How much energy is needed? Physical Review Letters 108: 218703.
19. Cowan NJ, Chastain EJ, Vilhena DA, Freudenberg JS, Bergstrom CT (2012) Nodal Dynamics, Not Degree Distributions, Determine the Structural Controllability of Complex Networks. PLoS ONE 7(6): e38398. doi:10.1371/journal.pone.0038398
20. Valiant LG (1979) The complexity of computing the permanent. Theoretical Computer Science 8(2): 189-201.
21. Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2: 83-97.
22. Berge C (1957) Two theorems in graph theory. Proceedings of the National Academy of Sciences of the United States of America 43: 842-844.
23. Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6: 29-123.
24. Dunne JA, Williams RJ, Martinez ND (2002) Food-web structure and network theory: the role of connectance and size. Proceedings of the National Academy of Sciences 99: 12917-12922.
25. Martinez ND (1991) Artifacts or attributes? Effects of resolution on the Little Rock Lake food web. Ecological Monographs: 367-392.
26. Christian RR, Luczkovich JJ (1999) Organizing and understanding a winter's seagrass foodweb network through effective trophic levels. Ecological Modelling 117: 99-124.
27. Ulanowicz RE, DeAngelis DL (2005) Network Analysis of Trophic Dynamics in South Florida Ecosystems. US Geological Survey Program on the South Florida Ecosystem: 114.
28. Patrício J, Ulanowicz R, Pardal M, Marques J (2004) Ascendency as an ecological indicator: a case study of estuarine pulse eutrophication. Estuarine, Coastal and Shelf Science 60: 23-35.
29. Watts D, Strogatz S (1998) Collective Dynamics of Small-World Networks. Nature 393: 440-442.
30. Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD): 177-187.
31. Citation networks. http://vlado.fmf.uni-lj.si/pub/networks/data/cite/default.htm
32. Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election: divided they blog. Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem: 36-43.
33. Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data 1: 2.
34. Opsahl T, Panzarasa P (2009) Clustering in weighted networks. Social Networks 31: 155-163.
35. Balaji S, Babu MM, Iyer LM, Luscombe NM, Aravind L (2006) Comprehensive analysis of combinatorial regulation using the transcriptional regulatory network of yeast. Journal of Molecular Biology 360: 213-227.
36. Norlen K, Lucas G, Gebbie M, Chuang J (2002) EVA: Extraction, visualization and analysis of the telecommunications and media ownership network. Proceedings of International Telecommunications Society 14th Biennial Conference (ITS2002), Seoul Korea, August 2002.
37. De Nooy W (1999) A literary playground: Literary criticism and balance theory. Poetics 26: 385-404.
38. Smith DA, White DR (1992) Structure and dynamics of the global economy: Network analysis of international trade 1965–1980. Social Forces 70: 857-893.
39. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286: 509-512.
40. Foster JG, et al. (2010) Edge direction and the structure of networks. Proceedings of the National Academy of Sciences 107(24): 10815-10820.
41. Cross R, Parker A (2004) The Hidden Power of Social Networks. Harvard Business School Press, Boston, MA.
42. Ruths J, Ruths D (2014) Control Profiles of Complex Networks. Science 343: 1373-1376.
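DOI: 10.1126/science.1242063

Table 1 | Overview of real networks and the statistical results of their MDSs. <k> is the average degree of a network, n_D is the size of an MDS, λ_D = n_D / N, <k_D> is the average degree of an MDS, $\overline{\langle k_D \rangle}$ is the average value of <k_D> over all of the obtained MDSs, and <k_Dmax> and <k_Dmin> are the maximum and the minimum values of <k_D> over all of the MDSs obtained by the preferential matching method under different preferential matching numbers m.

| Type | Name | N | L | <k> | $\overline{\langle k_D \rangle}$ | [<k_Dmin>, <k_Dmax>] | n_D | λ_D |
|---|---|---|---|---|---|---|---|---|
| Trust | Wiki-Vote | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | 88 | 137 | 3.11 | 2.67 | [2.20, 3.02] | 46 | 0.52 |
| | Little Rock | 183 | 2494 | 27.26 | 15.39 | [14.79, 15.83] | 99 | 0.54 |
| | Seagrass | 49 | 226 | 9.22 | 8.06 | [6.46, 11.08] | 13 | 0.27 |
| | Ythan | 135 | 601 | 8.90 | 7.43 | [4.86, 9.67] | 69 | 0.51 |
| | Florida | 128 | 2106 | 32.91 | 24.86 | [16.1, 36.6] | 30 | 0.23 |
| | Mondego | 46 | 400 | 17.39 | 12.47 | [9.26, 12.58] | 19 | 0.41 |
| Power Grid | USpowerGrid | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | 306 | 2345 | 15.33 | 5.6 | [3.36, 16.47] | 58 | 0.19 |
| Citation | Hep-th | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | 35 | 81 | 4.628 | 4.72 | [4.46, 5.46] | 13 | 0.37 |
| Trade | World_trade | ... | ... | ... | ... | ... | ... | ... |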