Collective computation in a network with distributed information

Abstract

We analyze a distributed information network in which each node has access to the information contained in a limited set of nodes (its neighborhood) at a given time. A collective computation is carried out in which each node calculates a value that depends on all the information contained in the network (in our case, the average value of a variable that can take different values in each network node). The neighborhoods can change dynamically by exchanging neighbors with other nodes. The results of this collective calculation show rapid convergence and good scalability with the network size. These results are compared with those of a fixed network arranged as a square lattice, in which the number of rounds needed to achieve a given accuracy becomes very high as the size of the network increases. The results for the evolving networks are interpreted in light of the properties of complex networks and are directly related to the diameter and characteristic path length of the networks, which appear to exhibit "small world" properties.


arXiv [cs.SI]

A. Córdoba
Departamento de Física de la Materia Condensada, Universidad de Sevilla, P. O. Box 1065, 41080 Sevilla, Spain

Aguilar-Hidalgo
Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden, Germany

M. C. Lemos
Departamento de Física de la Materia Condensada, Universidad de Sevilla, P. O. Box 1065, 41080 Sevilla, Spain

We propose a model for a distributed information network in which each node, at a given time, can obtain information from a limited number of other nodes to which it is connected. From the information residing in these nodes, each node can perform specific tasks (calculation, classification, etc.) involving all the information contained in the network. In a system where information is distributed over different sites, access to this information can be expensive if the number of sites is high. Moreover, if a set of N nodes wants to calculate, at a given time, a magnitude involving the information contained in all the other nodes, direct access of all nodes to all nodes requires a number of requests of order N^2, which is very high if N is large.

Here we pose the problem of a set of N elements (nodes), each characterized by the value of a magnitude s, in general different for each node, in which every element wishes to calculate the average value of s (or of a function of s) over the set by accessing, at any one time, the information contained in q other elements of the set (q << N) to which it is connected. This is in line with systems using epidemic protocols [1, 2, 3] or gossip protocols [4, 5, 6], such as the newscast protocol [7]. In these systems the goal is not to enable point-to-point communication between nodes, but rapid and efficient dissemination of information. The system is intended to perform a specific collective task (for example, calculating the average value of a variable over the set of nodes, or establishing the position of each node in a ranking according to its value of the variable) so that, eventually, all nodes have access to the result obtained from the set. To do this we consider a network in which each element is connected to q other elements (neighbors), from which it extracts information.
The neighborhood of each element can change during the calculation, so that each node can exchange a neighbor at random with another node. We discuss issues such as scalability and convergence, considering different sizes and data sets. We also compare the results obtained for the dynamically changing network with those obtained for a fixed static network. To do this we compare network configurations from different times of the evolution with the initial fixed network (which may take the form of a conventional cellular automaton).

In a gossip framework there is an exchange of information among the system elements. In this exchange, which is a dynamic process, one element receives information from another (the exchange can also be reciprocal). In turn, the receiver can pass information on to other peers. Overall, the transmission process comprises three aspects: peer selection, the data exchanged or transmitted between peers, and data processing. In our model we consider a network arranged in a two-dimensional square lattice with m × n sites (we do this initially to compare with a conventional cellular automaton with Moore's neighborhood [8]). Each node is randomly assigned the value of a variable s, in general different for each node. The objective is that, at the end of the process, each node "knows" the average value of this variable over the whole network. Each node has access to limited information at each step of the calculation. To obtain specific results, we assume that each node can store, at each instant, only eight values from other nodes, and that initially each node has access to the data of its eight nearest neighbors in the lattice. This corresponds to a two-dimensional cellular automaton with Moore's neighborhood. Throughout the process the neighborhood of each site is not fixed: each node exchanges a neighbor with another randomly chosen node.
This implies a double selection (with appropriate calls to random number routines), one for the site with which the exchange takes place and the other for the neighbor exchanged. At each system update, the average of the variable over each site and its eight neighbors (inputs) is assigned to the variable of that site. All sites are updated simultaneously, although the update could also be carried out sequentially. Since neighborhoods change randomly in the course of the evolution, the initial regular structure of the network is irrelevant (all that is preserved is that each node has eight inputs and eight outputs, generally different). However, in order to establish a formal analogy with the above-mentioned cellular automaton, we initially identify each node with a point on an m × n square lattice (Figure 1A); after the dynamic exchange of neighbors, the system becomes a complex directed network whose connections no longer form a square lattice (Figure 1B).

Figure 1: Representation of 32x32 networks (1024 nodes and 8192 links). (A) Network generated by the cellular automaton with Moore's neighborhood and used as the initial condition for the evolutionary method. (B) Evolutionary network. This network is far less ordered than that of the cellular automaton. Both panels are drawn in a circular layout to keep the graph compact, since an orthogonal (square-lattice-like) layout is hard to read with so many nodes.

To test the scalability we have considered various system sizes (32 × 32, 100 × 100, 320 × 320 and 1000 × 1000). We use a parameter b to evaluate how the computation progresses at each system update, i.e., the convergence to the desired value with a given accuracy:

    b = standard deviation / mean value    (1)

We stop the calculation when the variation of b between two successive updates is less than a preset value. We have also set a maximum number of system updates. Moreover, using the Cytoscape software [9, 10], we have analyzed some properties of the network [11, 12] to establish the relationship between its topology and the efficiency of the computation.

Table 1: Number of updates required to reach the value of b in an evolutionary network for a set of randomly distributed data (columns: m × n, then five thresholds of the form b < 10^-…; four rows, one per network size).

Fixed networks. Configuration of the evolutionary network when it reaches: (i) b < 10^-…, (ii) b < 10^-….

    m × n             b < 10^-…   b < 10^-…   b < 10^-…   b < 10^-…   b < 10^-…
    32 × 32 (i)       2           5           8           12          16
    320 × 320 (i)     2           5           9           13          17
    32 × 32 (ii)      2           4           7           9           12
    320 × 320 (ii)    2           5           7           10          12

Table 2: Number of updates required to reach the value of b in two fixed networks for a set of randomly distributed data.

We have taken b = 10^-… as the limit of b and have recorded the number of updates required to reach a value of b below successive powers of 10, from 10^-… to 10^-…. Table 1 shows the results for the evolving network for a data set drawn from a uniform distribution. As can be seen, the number of updates is low and the scalability is very good: for large variations in the size of the system the number of updates remains nearly the same. Each node has to perform only a small number of updates. Since each update of the entire system requires N = m × n updates of node values, the total number of individual operations is virtually proportional to the system size, i.e., it scales as N (a direct calculation scales as N^2). Figure 2A shows this behavior graphically: the number of updates increases slowly as b decreases. Note that, for a given b, the number of updates remains the same regardless of the network size.

Let us now change the order of the computation. Instead of updating the system as the network evolves, the evolution takes place first and then successive updates are carried out on the resulting fixed network. The results for this case are given in Table 2. The configurations considered are those reached by the evolutionary network when (i) b < 10^-… and (ii) b < 10^-…. In these cases the differences with respect to the evolutionary network are not very significant: for larger values of the limit of b convergence is slightly faster, and for lower values of this limit convergence is slightly slower in case (i) and slightly faster in case (ii). Again, the number of updates increases slowly and linearly as b decreases in both cases (i) and (ii) (Figure 2B).

Table 3 shows the results obtained for the same random distribution of data using the fixed network forming a square lattice (cellular automaton with Moore's neighborhood). As can be seen, the number of updates required to achieve a given accuracy grows rapidly with size. Specifically, the number of updates increases exponentially as b decreases (Figure 2C) and as the size of the network increases (Figure 2D).

Figure 2: (A) Number of updates required to reach the b value in an evolutionary network for a randomly distributed data set. The bar graph shows a linear increase of the number of updates with the required precision; the number remains constant as the network size increases. (B) Number of updates required to reach the b value in two fixed networks for a randomly distributed data set: (i) evolutionary network configuration when b < 10^-…; (ii) evolutionary network configuration when b < 10^-…. As with the evolutionary network, the number of updates needed to reach a given precision increases linearly with that precision, and it remains nearly constant as the network size increases. (C, D) Number of updates required to reach the b value applying a cellular automaton with Moore's neighborhood to a randomly distributed data set: (C) varying b; (D) varying the network size. In contrast to the two previous cases, with the cellular automaton the number of updates tends to increase exponentially as the precision and the network size grow.

Table 3: Cellular automaton on a square lattice with Moore's neighborhood. Number of updates required to reach the value of b by applying a cellular automaton with Moore's neighborhood for a set of randomized data (columns: m × n, then five thresholds of the form b < 10^-…; four rows, one per network size). (*) The stop criterion is reached before obtaining the value of b indicated.

To analyze the relationship of these results with the network structure, we have examined some properties of these networks: the clustering coefficient, the diameter and the characteristic path length (CPL). The clustering coefficient of a node is the ratio p/r, where p is the number of links between its neighboring nodes and r is the maximum number of links that would be possible among them; the clustering coefficient of the network is the average of the clustering coefficients of all its nodes. The network diameter is the maximum distance between two nodes. The characteristic path length gives the expected distance between any two nodes; it is defined as the average number of steps along the shortest paths for all possible pairs of network nodes. The shorter the CPL, the better the communication across the network. By construction, every node in these networks has the same number of links (eight inputs and eight outputs).
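The three quantities defined above can be computed directly from the list of links. The following Python sketch is an illustration only, not the Cytoscape analysis used in this work: the function names are hypothetical, links are stored as the list of nodes that each node reads from, distances are measured along those directed links, periodic boundary conditions are assumed for the lattice, and pairs of nodes with no connecting path are simply left out of the average.

```python
from collections import deque


def moore_lattice(m, n):
    """links[i] = eight nearest neighbours of site i (periodic boundaries assumed)."""
    return [[((r + dr) % m) * n + (c + dc) % n
             for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
            for r in range(m) for c in range(n)]


def path_stats(links):
    """Return (diameter, characteristic path length) using a BFS from every node."""
    total, pairs, diameter = 0, 0, 0
    for src in range(len(links)):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in links[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())       # sum of shortest-path lengths from src
        pairs += len(dist) - 1            # reachable nodes other than src
        diameter = max(diameter, max(dist.values()))
    return diameter, total / pairs


def clustering(links):
    """Average over nodes of p/r: links among a node's neighbours over the maximum possible."""
    coeffs = []
    for v, row in enumerate(links):
        nbrs = set(row) - {v}
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # p counts directed links between distinct neighbours; r = k * (k - 1).
        p = sum(len((set(links[u]) & nbrs) - {u}) for u in nbrs)
        coeffs.append(p / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


if __name__ == "__main__":
    lattice = moore_lattice(16, 16)
    print(path_stats(lattice), clustering(lattice))
```

Under these assumptions a 16 × 16 lattice with Moore's neighborhood has diameter 8 and clustering coefficient 3/7 ≈ 0.429; applied to a 32 × 32 lattice, the same functions give a diameter of 16 and a characteristic path length of about 10.7.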
The clustering coefficient of the evolutionary network is very low, while that of the cellular automaton is high. The two significant parameters are the diameter of the network and the characteristic path length. The values of these parameters for the fixed networks considered are shown in Table 4. Whereas in the four evolutionary network configurations considered the diameter and the characteristic path length are small and do not change significantly with the limit of b, in the cellular automaton these values are much higher and grow rapidly with size (Figure 3A). For the cellular automaton convergence is very slow, the number of updates required to reach the limits of b is very high and the scalability is very poor. This suggests that the speed of convergence and the scalability of the calculation in the network are strongly associated with the values of these two parameters. This can be seen as a manifestation of the "small world" property, which appears in different types of complex networks [11, 12].

Topologically speaking, the evolutionary networks are distributed more sparsely than the cellular automaton, according to the clustering coefficient values. In terms of CPL and diameter, the evolutionary networks are much 'better' connected than the cellular automaton for information dissemination. A low CPL means that the information contained in one node is more accessible to the rest of the nodes than in a network with high CPL. All this yields a method that redistributes a network so that information is easy to share and, computationally speaking, collective computations have a low cost.

Next we have examined the influence of the way in which the data are distributed. Instead of a random distribution, we have considered one in which the data are strongly grouped: four different values, each assigned to the nodes of one of the four quadrants shown in Figure 4.
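The effect of grouped data can be explored with a small simulation. The Python sketch below is a minimal illustration under stated assumptions, not the code used to obtain the results reported here: the quadrant values 1 to 4, the periodic lattice, the population form of the standard deviation in b, the ordering (average first, then exchange) and the rule that node i swaps its k-th input with the k-th input of a randomly chosen node j are all choices made for the example, and all names are hypothetical.

```python
import random
from statistics import fmean, pstdev


def moore_lattice(m, n):
    """Eight nearest neighbours of every site (periodic boundaries assumed)."""
    return [[((r + dr) % m) * n + (c + dc) % n
             for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
            for r in range(m) for c in range(n)]


def quadrant_data(m, n, values=(1.0, 2.0, 3.0, 4.0)):
    """One value per quadrant of the lattice (illustrative values, assumed here)."""
    return [values[2 * (r >= m // 2) + (c >= n // 2)]
            for r in range(m) for c in range(n)]


def b_parameter(s):
    """b = standard deviation / mean value, Eq. (1); population form assumed."""
    return pstdev(s) / fmean(s)


def run(m, n, updates, evolve, seed=0):
    """Return the node values after the given number of synchronous updates."""
    rng = random.Random(seed)
    inputs = moore_lattice(m, n)
    s = quadrant_data(m, n)
    size = m * n
    for _ in range(updates):
        # Synchronous update: each site takes the mean of itself and its eight inputs.
        s = [(s[i] + sum(s[j] for j in inputs[i])) / 9 for i in range(size)]
        if evolve:
            # Double random selection: a partner node j and the input slot k to swap.
            for i in range(size):
                j = rng.randrange(size)
                k = rng.randrange(8)
                inputs[i][k], inputs[j][k] = inputs[j][k], inputs[i][k]
    return s


if __name__ == "__main__":
    for evolve in (True, False):
        print(evolve, b_parameter(run(32, 32, 30, evolve)))
```

Because each swap only permutes entries of the input lists, every node keeps eight inputs and is read by eight nodes, so the synchronous averaging conserves the global mean. Calling `run(32, 32, 30, evolve=True)` and `run(32, 32, 30, evolve=False)` contrasts the evolving network with the static lattice on the same grouped data.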
The results for the evolutionary network are shown in Table 5. As can be seen, scalability is very good, as in the case of the data with random distribution, although the number of updates …

Topological parameters

    Network                                                   Diameter   Characteristic path length   Clustering coefficient
    Evolutionary network 32 × 32 when reaching b < 10^-…      …          …                            …
    Evolutionary network … × … when reaching b < 10^-…        …          …                            …
    Evolutionary network … × … when reaching b < 10^-…        …          …                            …
    Evolutionary network … × … when reaching b < 10^-…        …          …                            …
    Cellular automaton 32 × 32                                16         10.7                         0.429
    Cellular automaton 100 × 100                              50         33.3                         0.429
    Cellular automaton 320 × 320                              160        106.7                        0.429

Table 4: Values of some characteristic parameters of the listed networks.

Figure 3: (A) Topological parameters measured in the analyzed networks. The bar graph shows that the diameter and the characteristic path length are constant in the evolutionary networks (left of the dashed vertical line). In the cellular automata (right of the dashed vertical line) these parameters increase strongly with the network size. The clustering coefficient is low in the evolutionary networks and higher in the cellular automata. Although the connectivity always remains the same, the evolutionary networks have a much sparser topology than the cellular automata, which indicates a more efficient connectivity in terms of information dispersion. (B) Number of updates required to reach the b value using evolutionary networks when the data are not randomly distributed but strongly grouped. In this case the scalability is still very good, as with randomly distributed data. The number of updates also increases slowly and linearly, as in the other cases studied for the evolutionary network.

Figure 4: Schema of data grouped by quadrants.

Table 5: Evolutionary network (columns: m × n, then four thresholds of the form b < 10^-…; four rows).

• If a node is deleted, its input links could be redirected to its output target nodes, so that the nodes that had access to it are now directed to the nodes that the deleted node accessed.

• If a node is inserted, it provides access to eight of the existing nodes (as input links) and, in exchange, each of these nodes delivers to the new node a link to one of its neighbors (as output nodes).

This will make it possible to analyze the robustness of the system. If the variation in the number of nodes is not very sudden, the overall system is not expected to be severely affected, given the speed of information transmission.
Even if many nodes change, good accuracy in the global parameter can be expected to be reached again within a small number of updates, i.e., resilience is very good.
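The deletion and insertion rules above can be made concrete as follows. This Python sketch is only one possible reading of those rules, written under assumptions that are not part of the text: links are stored as a dictionary mapping each node to the list of nodes it reads from, the readers of a deleted node are paired in order with the nodes it used to read from, and a new node takes over one randomly chosen input slot from each of eight randomly chosen existing nodes. All function names are hypothetical, and repeated links are not prevented.

```python
import random
from collections import Counter
from itertools import cycle


def ring(n, q=8):
    """Toy starting network (assumed for the example): node i reads from i+1 .. i+q."""
    return {i: [(i + d) % n for d in range(1, q + 1)] for i in range(n)}


def delete_node(inputs, d):
    """Remove node d; each node that read from d now reads from a node d read from."""
    # Assumes d read from at least one node other than itself.
    targets = cycle([x for x in inputs.pop(d) if x != d])
    for row in inputs.values():
        for k, x in enumerate(row):
            if x == d:
                row[k] = next(targets)


def insert_node(inputs, new, rng, q=8):
    """Add node `new`: q existing nodes start reading from it and each hands over one link."""
    hosts = rng.sample(sorted(inputs), q)
    handed_over = []
    for h in hosts:
        k = rng.randrange(len(inputs[h]))
        handed_over.append(inputs[h][k])   # link that host h delivers to the new node
        inputs[h][k] = new                 # host h now reads from the new node
    inputs[new] = handed_over


def times_read(inputs):
    """Number of input slots in which each node appears (its number of outputs)."""
    return Counter(x for row in inputs.values() for x in row)


if __name__ == "__main__":
    net = ring(20)
    delete_node(net, 5)
    insert_node(net, 20, random.Random(1))
    print(sorted(times_read(net).items()))
```

With this pairing, both operations leave every remaining node with eight inputs and eight outputs when the starting network has that property, which is the condition under which the synchronous averaging conserves the global mean.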

Conclusions

In a networked system whose nodes have limited access to information, only from a few neighbors, the collective computation involving the whole data set is very efficient when the network changes dynamically, or when it has a fixed structure generated by the same dynamic process. In this case, fast convergence towards the average value is achieved. The number of computing updates also scales well with the size of the network. This is associated with the diameter and the characteristic path length of the network. Given the high rate at which the system converges to the desired value, the system is also expected to be robust to changes in the number of nodes or in the distribution of values assigned to the network. These results contrast clearly with what happens in a fixed regular network, in which convergence is slow and the computation required grows hugely as the size of the system increases.

Acknowledgement

This work is partially financed by the Project FIS2008-04120 of the Spanish Ministry for Science and Innovation (MICINN).

References

[1] P. T. Eugster, R. Guerraoui, A.-M. Kermarrec, L. Massoulié, Epidemic information dissemination in distributed systems, IEEE Computer 37 (2004) 60–67.
[2] P. De, S. K. Das, Epidemic models, algorithms and protocols in wireless sensor and ad-hoc networks, in