Collective computation in a network with distributed information
arXiv [cs.SI]
A. Córdoba
Departamento de Física de la Materia Condensada, Universidad de Sevilla, P. O. Box 1065, 41080 Sevilla, Spain
[email protected]

D. Aguilar-Hidalgo
Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden, Germany

M. C. Lemos
Departamento de Física de la Materia Condensada, Universidad de Sevilla, P. O. Box 1065, 41080 Sevilla, Spain
Abstract
We analyze a distributed information network in which each node has access, at a given time, to the information contained in a limited set of nodes (its neighborhood). A collective computation is carried out in which each node calculates a value that involves all the information contained in the network (in our case, the average value of a variable that can take different values in each network node). The neighborhoods can change dynamically by exchanging neighbors with other nodes. The results of this collective calculation show rapid convergence and good scalability with the network size. These results are compared with those of a fixed network arranged as a square lattice, in which the number of rounds needed to achieve a given accuracy becomes very high as the size of the network increases. The results for the evolving networks are interpreted in light of the properties of complex networks and are directly related to the diameter and characteristic path length of the networks, which seem to express "small world" properties.
We propose a model for a distributed information network in which each node, at a given time, can obtain information from a limited number of other nodes to which it is connected. From the information residing in these nodes, each node can perform specific tasks (calculation, classification, etc.) involving all the information contained in the network nodes. In a system where information is distributed over different sites, access to this information can be expensive if the number of sites is high. Moreover, if a set of N nodes wants to calculate, at a given time, a magnitude involving the information contained in all the other nodes, direct all-to-all access requires a number of requests of order N², which is very high if N is large.

Here we pose the problem of a set of N elements (nodes), each characterized by the value of a magnitude s, in general different for each node, in which each and every element wishes to calculate the average value of s (or of a function of s) over the set by accessing, at any one time, the information contained in q other elements of the set (q ≪ N) to which it is connected. This is in line with systems using epidemic protocols [1, 2, 3] or gossip protocols [4, 5, 6], such as the newscast protocol [7]. In these systems the goal is not to enable point-to-point communication between nodes, but rapid and efficient dissemination of information. The system intends to perform a specific collective task (for example, calculating the average value of a variable over the set of nodes, or determining the position of each node in a ranking according to its value of the variable) so that, eventually, all nodes have access to the result obtained from the whole set. To do this we consider a network in which each element is connected to q other elements (its neighbors), from which it extracts information.
The neighborhood of each element can change during the calculation, since each node can exchange a neighbor at random with another node. We discuss issues such as scalability and convergence, considering different sizes and data sets. We also compare the results obtained for the dynamically changing network with those obtained for a fixed static network. To do this we compare network configurations taken at different times of the evolution with the initial fixed network (which may take the form of a conventional cellular automaton).

In a gossip framework there is an exchange of information among the system elements. In this exchange, which is a dynamic process, one element receives information from another (the exchange can also be reciprocal). In turn, the receiver can pass information on to other peers. Overall, the transmission process comprises three aspects: peer selection, the data exchanged or transmitted between peers, and data processing. In our model we consider a network initially arranged as a two-dimensional square lattice of m × n sites (we do this to compare with a conventional cellular automaton with Moore's neighborhood [8]). Each node is randomly assigned a value of a variable s, in general different for each node. The objective is that, at the end of the process, each node "knows" the average value of this variable over the whole network. Each node has access to limited information at each step of the calculation. To obtain specific results, we assume that each node can store, at each instant, only the values of eight other nodes, and that initially each node has access to the data of its eight nearest neighbors in the lattice. This corresponds to a two-dimensional cellular automaton with Moore's neighborhood. Throughout the process the neighborhood of each site is not fixed: each node exchanges a neighbor with another randomly chosen node.
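The initial neighborhoods and one round of neighbor exchange can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the periodic boundaries, the row-by-row site numbering and the rule that both partners swap the same slot of their input lists are all assumptions made for the example.

```python
import random

def moore_inputs(m, n):
    """Eight nearest lattice neighbors of every site, used as its initial inputs.

    Sites are numbered row by row; periodic boundaries are an assumption.
    """
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    return [[((r + dr) % m) * n + (c + dc) % n for dr, dc in offsets]
            for r in range(m) for c in range(n)]

def exchange_round(inputs, rng):
    """Let every node swap one input with a randomly chosen partner node."""
    size = len(inputs)
    for i in range(size):
        k = rng.randrange(size)  # first selection: the partner site
        j = rng.randrange(8)     # second selection: the neighbor slot to swap (assumed rule)
        inputs[i][j], inputs[k][j] = inputs[k][j], inputs[i][j]

rng = random.Random(0)
inputs = moore_inputs(32, 32)  # 1024 nodes, 8192 directed links
exchange_round(inputs, rng)
```

Because a swap only moves entries between input lists, every node keeps eight inputs and every node remains an input of eight lists. Repeated entries and self-references can appear in this simplified version.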
This exchange implies a double selection (with appropriate calls to random number routines): one for the site with which the exchange takes place and the other for the neighbor to be exchanged. At each system update, the variable of each site is assigned the average value of the variable over the site itself and its eight neighbors (inputs). All sites are updated simultaneously, although the update could also be carried out sequentially. Since neighborhoods change randomly in the course of the evolution, the initial regular structure of the network is irrelevant (the only property that holds is that each node has eight inputs and eight outputs, generally different). However, to establish a formal analogy with the above-mentioned cellular automaton, we initially identify each node with a point on an m × n square lattice (Figure 1A). After the dynamic exchange of neighbors, the system turns into a complex directed network whose connections no longer form a square lattice (Figure 1B).

Figure 1: Representation of 32 × 32 networks (1024 nodes and 8192 links). (A) Network generated by the cellular automaton with Moore's neighborhood and used as the initial condition for the evolutionary method. (B) Evolutionary network. This network is far less ordered than that of the cellular automaton. Both panels use a circular layout to keep the graph compact, since an orthogonal (square-lattice-like) layout is hard to read with so many nodes.

To test the scalability we have considered various system sizes (32 × 32, 100 × 100, 320 × 320 and 1000 × …). We use the parameter b to evaluate how the computation progresses at each system update, i.e., the convergence to the desired value with a given accuracy:

b = standard deviation / mean value    (1)

The calculation is stopped when the variation of b between two successive updates is less than a preset value. We have also set a maximum number of system updates. Moreover, using the Cytoscape software [9, 10], we have analyzed some properties of the network [11, 12] to establish the relationship between its topology and the efficiency of the computation.

We have taken b = 10^(−…) as the limit of b and have recorded the number of updates required to reach a value of b below each successive power of 10 from 10^(−…) to 10^(−…). Table 1 shows the results for the evolving network for a set of data randomly distributed according to a uniform distribution. As can be seen, the number of updates is low and the scalability is very good: for large variations in the size of the system the number of updates remains nearly the same. Each node has to perform only a small number of updates. Since each update of the entire system requires N = m × n updates of node values, the total number of individual operations is virtually proportional to the system size, i.e., it scales as N (a direct calculation scales as N²). Figure 2A shows this behavior graphically: the number of updates increases slowly as b is decreased, and for a given b it remains the same regardless of the network size.

Table 1: Number of updates required to reach the value of b in an evolutionary network for a set of randomly distributed data.

Let us now change the order of the computation. Instead of updating the system as the network evolves, the evolution takes place first and successive updates are then performed on the resulting fixed network. The results for this case are shown in Table 2. The configurations considered are those reached by the evolutionary network when (i) b < 10^(−…) and (ii) b < 10^(−…). In these cases the differences with respect to the evolutionary network are not very significant: for larger values of the limit of b convergence is slightly faster, and for lower values of this limit convergence is slightly slower in case (i) and slightly faster in case (ii). Again, the number of updates increases slowly and linearly as b is decreased in both cases (Figure 2B).

Table 2: Number of updates required to reach the value of b in two fixed networks for a set of randomly distributed data. The fixed networks are the configurations of the evolutionary network when it reaches (i) b < 10^(−…) and (ii) b < 10^(−…). The five columns of updates correspond to successively smaller limits b < 10^(−…).

m × n        Configuration   Updates for each successive limit of b
32 × 32      (i)             2    5    8    12    16
320 × 320    (i)             2    5    9    13    17
32 × 32      (ii)            2    4    7     9    12
320 × 320    (ii)            2    5    7    10    12

Table 3 shows the results obtained for the same random distribution of data using the fixed network that forms a square lattice (cellular automaton with Moore's neighborhood). As can be seen, the number of updates required to achieve a given accuracy grows rapidly with size. Specifically, the number of updates increases exponentially as b is decreased (Figure 2C) and as the size of the network increases (Figure 2D).

Table 3: Number of updates required to reach the value of b by applying a cellular automaton on a square lattice with Moore's neighborhood for a set of randomized data. (*) The stop criterion is reached before obtaining the value of b indicated.

Figure 2: (A) Number of updates required to reach the b value in an evolutionary network for a randomly distributed data set. The bar graph shows a linear increase of the number of updates with the required precision; the number remains constant as the network size increases. (B) Number of updates required to reach the b value in two fixed networks for a randomly distributed data set: (i) evolutionary network configuration when b < 10^(−…); (ii) evolutionary network configuration when b < 10^(−…). As with the evolutionary network, the number of updates needed to reach a given precision increases linearly with this precision, and it remains nearly constant as the network size increases. (C, D) Number of updates required to reach the b value with a cellular automaton with Moore's neighborhood for a randomly distributed data set: (C) varying b; (D) varying the network size. In contrast to the two previous cases, with the cellular automaton the number of updates grows exponentially as the precision and the network size increase.

To analyze the relationship of these results with the network structure, we have analyzed some properties of these networks: the clustering coefficient, the diameter and the characteristic path length (CPL). The clustering coefficient of a node is the ratio p/r, where p is the number of links between its neighboring nodes and r is the maximum number of links that would be possible among them; the clustering coefficient of the network is the average of the clustering coefficients of all its nodes. The network diameter is the maximum distance between two nodes. The characteristic path length gives the expected distance between any two nodes; it is defined as the average number of steps along the shortest paths over all possible pairs of network nodes. The shorter the CPL, the better the communication across the network. By construction, every node in these networks has the same number of links (eight inputs and eight outputs).
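The three definitions can be checked on the 32 × 32 square lattice with Moore's neighborhood. The sketch below is illustrative only: it assumes periodic boundaries (the text does not state how boundaries are handled), counts links between neighbors without regard to direction, and uses hypothetical function names. Under these assumptions it reproduces the cellular automaton values listed in Table 4 for that size.

```python
from collections import deque

def moore_neighbors(m, n):
    """Adjacency lists of an m x n lattice with Moore's neighborhood (periodic, assumed)."""
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    return [[((r + dr) % m) * n + (c + dc) % n for dr, dc in offsets]
            for r in range(m) for c in range(n)]

def bfs_distances(adj, source):
    """Shortest-path length from source to every node (-1 if unreachable)."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_metrics(adj):
    """Return (diameter, characteristic path length) over all reachable ordered pairs."""
    longest, total, pairs = 0, 0, 0
    for source in range(len(adj)):
        d = [x for x in bfs_distances(adj, source) if x > 0]
        longest = max(longest, max(d))
        total += sum(d)
        pairs += len(d)
    return longest, total / pairs

def clustering(adj):
    """Network average of p/r, with links between neighbors counted without direction."""
    sets = [set(a) for a in adj]
    acc = 0.0
    for u, nb in enumerate(sets):
        nb = nb - {u}
        k = len(nb)
        if k < 2:
            continue
        p = sum(1 for a in nb for b in nb if a < b and (b in sets[a] or a in sets[b]))
        acc += p / (k * (k - 1) / 2)  # r = k(k-1)/2 possible links among k neighbors
    return acc / len(adj)

adj = moore_neighbors(32, 32)
diameter, cpl = path_metrics(adj)
print(diameter, round(cpl, 1), round(clustering(adj), 3))  # 16 10.7 0.429
```

On this lattice each node's eight neighbors share 12 of the 28 possible links, which gives 12/28 ≈ 0.429, and the distance between two sites is the larger of their row and column separations on the torus, which gives a diameter of 16.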
The clustering coefficient of the evolutionary network is very low, whereas that of the cellular automaton is high. The two significant parameters are the diameter of the network and the characteristic path length. The values of these parameters for the fixed networks considered are shown in Table 4. In the four evolutionary network configurations considered, the diameter and the characteristic path length are small and do not change significantly with the limit of b, whereas in the cellular automaton these values are much higher and grow rapidly with size (Figure 3A). For the cellular automaton convergence is very slow, the number of updates required to reach the limits of b is very high and the scalability is very poor. This suggests that the speed of convergence and the scalability of the calculation in the network are strongly associated with the values of these two parameters. This can be seen as a manifestation of the "small world" property, which appears in different types of complex networks [11, 12].

Topologically speaking, the evolutionary networks are more sparsely distributed than the cellular automaton, according to the clustering coefficient values. In terms of CPL and diameter, the evolutionary networks are much better connected than the cellular automaton for the purpose of information dissemination. A low CPL means that the information contained in one node is more accessible to the rest of the nodes than in a network with high CPL. Altogether, this provides a method that rewires a network so that information is easy to share and collective computations therefore have a low computational cost.

Next we have examined the influence of the way in which the data are distributed. Instead of a random distribution, we have considered one in which the data are strongly grouped: four different values, each assigned to the nodes of one of the four quadrants shown in Figure 4.
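The contrast between the fixed lattice and the evolving network can be explored with a small simulation on quadrant-grouped data. This is a sketch under stated assumptions rather than the authors' implementation: the quadrant values 1 to 4, the limit 10^-3, the 16 × 16 size, the periodic boundaries, the swap rule and the choice to exchange neighbors before each update are all hypothetical choices made for illustration.

```python
import random
import statistics

def moore_inputs(m, n):
    """Initial inputs: the eight nearest lattice neighbors (periodic boundaries assumed)."""
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    return [[((r + dr) % m) * n + (c + dc) % n for dr, dc in offsets]
            for r in range(m) for c in range(n)]

def quadrant_data(m, n, values=(1.0, 2.0, 3.0, 4.0)):
    """One value per quadrant of the lattice (the four values are hypothetical)."""
    return [values[2 * (r >= m // 2) + (c >= n // 2)] for r in range(m) for c in range(n)]

def exchange_round(inputs, rng):
    """Every node swaps one input with a random partner (assumed swap rule)."""
    size = len(inputs)
    for i in range(size):
        k = rng.randrange(size)  # partner site
        j = rng.randrange(8)     # neighbor slot to swap
        inputs[i][j], inputs[k][j] = inputs[k][j], inputs[i][j]

def update(s, inputs):
    """Synchronous update: each node takes the mean of itself and its eight inputs."""
    return [(s[i] + sum(s[j] for j in inputs[i])) / (len(inputs[i]) + 1)
            for i in range(len(s))]

def b_value(s):
    """Equation (1): standard deviation divided by mean value of the node variables."""
    return statistics.pstdev(s) / statistics.fmean(s)

def updates_to_reach(m, n, limit, evolve, max_updates=5000, seed=0):
    """Number of system updates until b < limit, or None if max_updates is exceeded."""
    rng = random.Random(seed)
    inputs = moore_inputs(m, n)
    s = quadrant_data(m, n)
    for t in range(1, max_updates + 1):
        if evolve:
            exchange_round(inputs, rng)  # assumed order: exchange, then update
        s = update(s, inputs)
        if b_value(s) < limit:
            return t
    return None

fixed = updates_to_reach(16, 16, 1e-3, evolve=False)
evolving = updates_to_reach(16, 16, 1e-3, evolve=True)
print("fixed lattice:", fixed, "evolving network:", evolving)
```

Since every node keeps eight inputs and eight outputs, each update leaves the mean of s unchanged, so a decreasing b means that all nodes approach the global average.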
The results for the evolutionary network are shown in Table 5. As can be seen, scalability is very good, as in the case of the data with random distribution, although the number of updates …

Table 4: Values of some characteristic parameters of the listed networks.

Network                                                    Diameter   Characteristic path length   Clustering coefficient
Evolutionary network 32 × 32 when reaching b < 10^(−…)     …          …                            …
Evolutionary network … × … when reaching b < 10^(−…)       …          …                            …
Evolutionary network … × … when reaching b < 10^(−…)       …          …                            …
Evolutionary network … × … when reaching b < 10^(−…)       …          …                            …
Cellular automaton 32 × 32                                 16         10.7                         0.429
Cellular automaton 100 × 100                               50         33.3                         0.429
Cellular automaton 320 × 320                               160        106.7                        0.429

Figure 3: (A) Topological parameters measured in the analyzed networks. The bar graph shows that the diameter and the characteristic path length remain constant in the evolutionary networks (left of the vertical dashed line). In the cellular automata (right of the vertical dashed line) these parameters increase strongly with the network size. The clustering coefficient is low in the evolutionary networks and higher in the cellular automata. Although the connectivity always remains the same, the evolutionary networks have a much sparser topology than the cellular automata, which indicates a more efficient connectivity in terms of information dispersion. (B) Number of updates required to reach the b value using evolutionary networks when the data are not randomly distributed but strongly grouped. In this case the scalability is still very good, as with randomly distributed data. The number of updates also increases slowly and linearly, as in the other cases studied for the evolutionary network.

Figure 4: Schema of data grouped by quadrants.

Table 5: Evolutionary network, data grouped by quadrants (number of updates for four successive limits of b).

• If a node is deleted, its input links can be redirected to its output target nodes, so that the nodes that had access to it are now directed to the nodes that the deleted node accessed.
• If a node is inserted, it is given access to eight of the existing nodes (as input links) and, in exchange, each of these nodes delivers to the new node a link to one of its neighbors (as output nodes).

This will make it possible to analyze the robustness of the system. If the variation in the number of nodes is not very sudden, the overall system is not expected to be severely affected, given the speed of information transmission.
Even if many nodes change, a good accuracy in the global parameter can be expected to be reached again within a small number of updates, i.e., resilience is very good.
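One possible reading of the deletion rule above is sketched below: every node that read from the deleted node is redirected to one of the nodes that the deleted node itself read from. The dictionary representation, the function name and the one-to-one pairing order are assumptions made for illustration; the text does not specify them.

```python
def delete_node(inputs, d):
    """Delete node d from a network stored as {node: list of input nodes}.

    Each reader of d receives, in turn, one of the nodes that d used to read from.
    The pairing order is an assumption, and the sketch assumes d reads from at
    least one node other than itself.
    """
    donors = [j for j in inputs[d] if j != d]  # nodes the deleted node accessed
    del inputs[d]
    k = 0
    for row in inputs.values():
        for pos, j in enumerate(row):
            if j == d:
                row[pos] = donors[k % len(donors)]  # redirect this link
                k += 1
    return inputs

# Toy example: a ring of five nodes, each reading from its two ring neighbors.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
delete_node(ring, 0)
print(ring)  # {1: [4, 2], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
```

In this toy case every remaining node still has two inputs and is still read by two nodes, so the averaging update continues to preserve the mean over the surviving nodes. The redirection can create self-references in less regular cases.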
Conclusions
In a networked system whose nodes have limited access to information, only from a few neighbors, the collective computation involving the whole data set is very efficient when the network changes dynamically, or when it has a fixed structure generated by the same dynamic process. In this case, fast convergence towards the average value is achieved. The number of computing updates also scales well with the size of the network. This is associated with the diameter and the characteristic path length of the network. Given the high rate at which the system converges to the desired value, the system is also expected to be robust to changes in the number of nodes or in the distribution of values assigned to the network. These results contrast clearly with what happens in a fixed regular network, in which convergence is slow and the computation required grows enormously as the size of the system increases.
Acknowledgement
This work is partially financed by Project FIS2008-04120 of the Spanish Ministry for Science and Innovation (MICINN).
References
[1] P. T. Eugster, R. Guerraoui, A.-M. Kermarrec, L. Massoulié, Epidemic information dissemination in distributed systems, IEEE Computer 37 (2004) 60–67.
[2] P. De, S. K. Das, Epidemic models, algorithms and protocols in wireless sensor and ad-hoc networks, in