A Methodology to Select Topology Generators for WANET Simulations (Extended Version)
Michael O’Sullivan, Leonardo Aniello, and Vladimiro Sassone
Electronics and Computer Science,
University of Southampton, Southampton, UK
{M.O’Sullivan | L.Aniello | vsassone}@soton.ac.uk

Abstract.
Many academic and industrial research works on WANETs rely on simulations, at least in the first stages, to obtain preliminary results to be subsequently validated in real settings. Topology generators (TGs) are commonly used to generate the initial placement of nodes in artificial WANET topologies, where those simulations take place. The significance of these experiments heavily depends on the representativeness of artificial topologies. Indeed, if they were not drawn fairly, obtained results would apply only to a subset of possible configurations, hence they would lack the generality required to port them to the real world. Although using many TGs could mitigate this issue by generating topologies in several different ways, that would entail a significant additional effort. Hence, the problem arises of which TGs to choose, among a number of available generators, to maximise the representativeness of generated topologies while reducing the number of TGs to use.

In this paper, we address that problem by investigating the presence of bias in the initial placement of nodes in artificial WANET topologies produced by different TGs. We propose a methodology to assess such bias and introduce two metrics to quantify the diversity of the topologies generated by a TG with respect to all the available TGs, which can be used to select which TGs to use. We carry out experiments on three well-known TGs, namely BRITE, NPART and GT-ITM. Obtained results show that using the artificial networks produced by a single TG can introduce bias.
Keywords:
Topology generator, WANET, BRITE, NPART, GT-ITM

A wireless ad hoc network (WANET) is based on a decentralised topology of devices/nodes that cooperate to implement some routing protocol, i.e. each device forwards its own and other devices’ traffic according to a specific algorithm with the aim of reaching the target destination. WANETs do not rely on any fixed infrastructure and each node can only communicate with the other nodes lying within its transmission range. WANET applications are wide and significant, ranging from wireless sensor networks to vehicular ad hoc networks (VANETs) to mobile ad hoc networks (MANETs), and they are used in everyday scenarios as well as more critical settings, such as military operations.

Several WANET aspects are still being investigated by the research community, e.g. routing protocols [3,4] and security [20,21]. For convenience, many academic works heavily rely on simulation to test a proposed solution and obtain preliminary results that are used to validate its effectiveness. Network simulations commonly entail evaluating a given approach on many different WANET topologies to ensure results are meaningful, i.e. to have evidence that they can apply to a wide variety of networks and are not tied to particular network configurations. Hence, as also suggested by Günes et al. [6], a key aspect in any network protocol simulation is the design and selection of which test network topologies to consider.

Network topology generators (TGs) are usually employed to create a possibly large number of topologies, on the basis of predefined network models, real-world measurements and additional parameters available to tune the generation process.
Although any TG is designed and implemented to generate a representative set of topologies, different TGs do not rely on the same models and assumptions, do not follow the same generation approach and thus are likely to produce diverse topologies, which in turn can lead to dissimilar simulation results [12,7]. Hence, we claim that the choice of the TG can affect this type of experiment, i.e. a TG is likely to introduce bias in simulations. This holds true for WANET simulations as well, where TGs are used to generate the initial placement of nodes, which in turn plays an important role in the way a WANET evolves over time.

Despite the fact that each TG has its own peculiarities, and that sometimes researchers can select a TG on the basis of the specific mathematical or physical model they need, there are in general several TGs that can be used to create artificial topologies representing the initial placement of nodes in WANETs. In this context, the best option would be to use all the available TGs to run simulations on the largest possible range of topologies, so as to ensure that obtained results are not biased by the choice of a specific TG, or subset of TGs. On the other hand, using many TGs is demanding for researchers in terms of the time and effort required to delve into the technical issues of each TG. Therefore, a trade-off arises between reducing the effort spent in setting up the simulations, i.e. minimising how many TGs to use, and maximising the representativeness of the simulations themselves, i.e. minimising the bias introduced by TG selection.

In this paper, we analyse the differences between topologies generated by distinct TGs to help researchers reduce how many TGs to use while still preserving the representativeness of generated topologies. In particular, given a fixed number of available TGs, we address the following research questions.
– RQ1: how to measure the difference between topologies generated by distinct TGs, i.e. how to characterise the bias introduced by the choice of a specific TG rather than using all the TGs?
– RQ2: how to choose which TG, or TGs, to use to reduce such a bias?

The approach we propose relies on a compact, numeric representation of topologies, based on a number of aspects about how network nodes are placed over the plane (e.g. inter-node distance, clustering) and about how WANETs work (e.g. nodes can only communicate with other nodes within their transmission range). Each topology is modelled as a vector of numeric features, which enables the computation of distance metrics. We consider a fixed number of TGs and propose to interpret the bias as a measure of the differences that arise in generated WANET topologies when selecting any single TG, or subset of TGs, instead of picking all the available TGs.

We tackle RQ1 by focussing on two complementary facets of the distances between topologies. On the one hand, we want to quantify the bias by measuring the average distance between topologies generated by distinct TGs. Specifically, we use
Hedges’ g measure of effect size to compute the bias index, which measures the difference between topologies produced by a specific TG, or subset of TGs, and those created by all the available TGs. On the other hand, we are also interested in evaluating to what extent existing differences are distinctive of some TG, i.e. whether such differences allow one to determine which TG generated a topology, regardless of the extent of those differences. In this regard, we employ machine learning techniques to compute the classification accuracy, i.e. to estimate how precisely we can discover which TG generated a topology.

We answer RQ2 by proposing a simple methodology, based on the bias index, to select which TGs to use to reduce the bias, depending on how many TGs can be picked at most.

We carry out an experimental evaluation using three well-known TGs, i.e. BRITE, NPART and GT-ITM. Obtained results show that using a single TG is likely to introduce bias, and that in this case picking NPART is the best choice to mitigate this issue. If two TGs can be used, BRITE and NPART provide the lowest bias. The experiments on the classification accuracy show that topologies can be correctly classified according to their TGs with high accuracy, i.e. up to almost 78%, and that, in this specific case, four topology features contribute most to distinguishing between different TGs.

To the best of our knowledge, this is the first work in the literature that systematically investigates the differences between topologies generated by diverse TGs in the context of WANET simulation. The contributions of this work are
1. the definition of a vector-based representation of WANET topologies, based on a number of features derived from different aspects of node placement;
2. the definition of two novel metrics to assess the differences between TGs, i.e. the bias index and the classification accuracy;
3. a methodology to choose which TG, or TGs, to use among available TGs to minimise the bias;
4. an experimental evaluation on the BRITE, NPART and GT-ITM TGs, showing the presence of bias in picking either a single TG or a pair of TGs.

The rest of the paper is organised as follows. Section 2 describes background and discusses related work. The system model for our investigation is introduced in section 3. The methodology we propose is detailed in section 4. The experiments and obtained results are presented in section 5. Finally, section 6 draws conclusions and outlines possible future work.
In this paper we focus on TGs that provide the initial placement of nodes over a plane. As we are dealing with WANETs, we are not interested in how nodes are connected to each other and assume that any node can communicate directly with all the nodes lying within its transmission range.

TGs can differ mainly in how node placement is decided [17] and what each node represents [7]. The node placement strategy can be based either on some predefined model or on real-world measurements. In the former case, a certain probability distribution can be used, such as the Waxman model [19], or specific strategies can be enforced to preserve the inter-node distance among nodes placed on a line (chain node placement) or to position nodes at the intersections of square cells when the plane is organised as a grid (grid node placement). In the latter case, node positions are instead determined in compliance with real-world measurements of existing network topologies. Nodes in an artificial topology can represent either autonomous systems (AS), i.e. AS-level topologies, or routers, i.e. router-level topologies.

Some existing works in the literature investigate diverse aspects of TGs, e.g. how realistic generated topologies are. Several works [10,11,12] focus on TGs for Internet topologies by comparing the topologies they generate with available real Internet map topologies, with the aim of assessing to what extent those topologies can be considered realistic. Rossi et al. [16] propose a framework to analyse Internet topologies by using a multi-level approach based on a number of graph measures and existing reference datasets. Their goal is to assess whether Internet TGs comply with their claimed objectives and how realistic generated topologies are. Our work differs from those papers mainly because we do not evaluate whether artificial topologies are realistic; rather, we investigate the bias in topologies generated by different TGs. Furthermore, we tackle WANETs rather than the Internet.

Heckmann et al. [7] compare three TGs according to the similarity of generated topologies with an available collection of real-world topologies.

Although all those works, like ours, focus on evaluating and comparing existing TGs, the main difference lies in the goal of such a comparison. In fact, while existing literature is interested in measuring how well generated topologies represent real-world networks, we concentrate on an orthogonal aspect by investigating whether picking a certain TG rather than another one, or rather than choosing more TGs, can introduce bias. From this point of view, our contribution is novel and complements existing research on comparing available TGs.
We consider a set $\mathcal{TG}$ with $N_{TG}$ topology generators (TGs), i.e. $|\mathcal{TG}| = N_{TG}$. Each TG generates coordinates for the initial placement of nodes, i.e. devices, within a defined square topology area, with sides $D$ units long. Each TG $tg_i$ generates a set $\mathcal{T}_i$ with $N_T$ topologies, where $i = 0, \dots, N_{TG}-1$. The set containing all the topologies generated by all the TGs is referred to as

$$\mathcal{T} = \bigcup_{i=0}^{N_{TG}-1} \mathcal{T}_i$$

hence $|\mathcal{T}| = N_{TG} \cdot N_T$. Each topology $t_j \in \mathcal{T}_i$ has $N$ nodes $\mathcal{N}_j = \{n_k\}$, where $i = 0, \dots, N_{TG}-1$, $j = 0, \dots, N_T-1$ and $k = 0, \dots, N-1$. Each node $n_k$ is identified by its bi-dimensional coordinates $(x_k, y_k)$ in the topology area, where $0 \le x_k, y_k \le D$. Given two nodes $n_a$ and $n_b$ ($a, b = 0, \dots, N-1$), their distance is the Euclidean distance

$$d(n_a, n_b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$$

In WANETs, any device can establish connections with other devices placed within a specific distance, which we refer to as radius $r$. We consider a number $N_R$ of different radii $R = \{r_i\}$, where $i = 0, \dots, N_R-1$ and $0 < r_j < r_{j+1} < D$ for $j = 0, \dots, N_R-2$.

In general, a topology generator (TG) introduces bias if the topologies it generates are not representative enough of some target application, such as analysing routing protocols. It is not trivial to decide whether a given set of topologies can be considered representative enough of a certain application, let alone to provide general criteria to evaluate the representativeness of a group of topologies regardless of what they are intended to be used for. However, if we consider the universe set
$\mathcal{T}_U$, containing all the possible topologies, and a subset of it $\mathcal{S} \subset \mathcal{T}_U$, we can investigate to what extent $\mathcal{S}$ is representative of $\mathcal{T}_U$ by inspecting the differences between topologies in $\mathcal{S}$ and topologies in $\mathcal{T}_U$. We propose to use those differences to analyse the bias of using topologies in $\mathcal{S}$ only, i.e. the larger and sharper such differences, the higher the bias.

Although we cannot have in practice a set like $\mathcal{T}_U$, we do have a number of available TGs, $\mathcal{TG}$ (see section 3), which can be used to generate a set of topologies $\mathcal{T}$. While we do not know how representative $\mathcal{T}$ is of $\mathcal{T}_U$, we claim that $\mathcal{T}$ is the best approximation of $\mathcal{T}_U$ we can aim for from a pragmatic point of view. Hence, to measure the bias introduced by a TG $tg_i \in \mathcal{TG}$, we can examine the differences between the topologies it generates, $\mathcal{T}_i$, and the topologies in $\mathcal{T}$.

We propose a two-step methodology to analyse the bias of TGs. The first step is modelling topologies by extracting a number of characteristic features, which provide a compact, numeric representation of topologies and make it possible to measure the differences between them. The second step is computing metrics to quantify the dissimilarities between topologies generated by different TGs. We claim that there are two complementary aspects to investigate when analysing such dissimilarities. On the one hand, the extent of those differences, i.e. how large they are, provides an objective scale of the bias. On the other hand, the peculiarity of those differences, i.e. how distinctive of TGs they are, allows us to figure out whether topologies generated by different TGs are distinguishable from each other by looking at specific aspects, regardless of the extent of the existing differences. We propose the following two approaches.

– Computing the average distance between the topologies generated by a TG and all the topologies in $\mathcal{T}$. By mapping topologies into the space generated by the chosen features, we use Hedges’ g [8], a measure of effect size, to quantify the difference between two populations: the topologies generated by a specific TG and the topologies in $\mathcal{T}$. We refer to such a difference as the bias index.
– Assessing the accuracy in distinguishing which TG generated a given topology. We train a classifier with the topologies in $\mathcal{T}$ and the information on which TG generated each of them, then we test the obtained classification model by measuring its classification accuracy.

We define the features we use to characterise topologies in section 4.1, then we detail how to compute the bias index and the classification accuracy in sections 4.2 and 4.3, respectively.

To choose which features to consider, we focus on the aspects we deem most representative to show variance within topologies. We thus consider the features that characterise the placement of the nodes within the test plane and the relationships between nodes. We extract features by looking at the following aspects of a generic topology $t_j$: inter-node distance, node spatial distribution, node density, shared node neighbours, node clustering coefficient.

Inter-node Distance.
We consider the set of node distances $\mathcal{D}$ defined as follows

$$\mathcal{D} = \{ d(n_a, n_b) \mid n_a, n_b \in \mathcal{N}_j,\ 0 \le a < b < N \}$$

The features we extract from $\mathcal{D}$ are (i) the minimum value $d_{min}$, (ii) the maximum value $d_{max}$, (iii) the value range $d_{max} - d_{min}$, (iv) the mode, i.e. the most frequent value, (v) how many times the mode occurs, (vi) the mean value and (vii) the standard deviation. (If there is more than one most-frequent value, by convention we pick the smallest one.)
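The seven distance features listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `distance_features` and the dictionary keys are hypothetical, the standard deviation is taken as the population one (the text does not say which variant is used), and the mode is computed on exact floating-point values, so real coordinates may need rounding before counting.

```python
import math
import statistics
from collections import Counter
from itertools import combinations

def distance_features(nodes):
    """Seven inter-node distance features for a list of (x, y) nodes."""
    # All pairwise Euclidean distances d(n_a, n_b) with a < b.
    dists = [math.dist(p, q) for p, q in combinations(nodes, 2)]
    d_min, d_max = min(dists), max(dists)
    counts = Counter(dists)
    top = max(counts.values())
    # Ties between most-frequent values: pick the smallest one.
    mode = min(d for d, c in counts.items() if c == top)
    return {
        "min": d_min,
        "max": d_max,
        "range": d_max - d_min,
        "mode": mode,
        "mode_count": top,
        "mean": statistics.fmean(dists),
        "std": statistics.pstdev(dists),  # assumption: population std
    }

# Corners of a unit square: four sides of length 1 and two diagonals.
features = distance_features([(0, 0), (1, 0), (0, 1), (1, 1)])
```

For the unit square, the mode is the side length, which occurs four times, while the maximum is the diagonal.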
Spatial Distribution.
By taking inspiration from the Quadrat Method [5], used to test the Complete Spatial Randomness hypothesis, we partition the topology area into $d^2$ smaller squares, each with sides $D/d$ units long. Figure 1 shows an example of topology area partitioning.
Fig. 1. Partition of a topology area with 1000-unit sides into 100 smaller squares, each with 100-unit sides. This partitioning is used to compute spatial distribution features.
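The partitioning of Fig. 1 amounts to counting how many nodes fall in each square cell. A minimal sketch follows; the function name `quadrat_counts`, the row-major cell numbering and the clamping of points that lie exactly on the upper border are assumptions made for illustration.

```python
def quadrat_counts(nodes, side, d):
    """Count the nodes in each of the d*d square cells of a side x side area."""
    cell = side / d
    counts = [0] * (d * d)
    for x, y in nodes:
        # Clamp so that points on the upper border fall in the last cell.
        col = min(int(x // cell), d - 1)
        row = min(int(y // cell), d - 1)
        counts[row * d + col] += 1  # assumption: row-major cell index
    return counts

# 1000-unit area split into 100 cells of 100 units each, as in Fig. 1.
counts = quadrat_counts([(5, 5), (50, 99), (150, 20), (1000, 1000)], 1000, 10)
```

The minimum, maximum, range and mode of the resulting list can then be taken with the same conventions used for the distance features.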
We consider the set $\mathcal{NC}$ of node counts, where each element $nc_s \in \mathcal{NC}$ is the number of nodes in the $s$-th partition of the topology area, with $s = 0, \dots, d^2-1$. The features we extract from $\mathcal{NC}$ are (i) the minimum value $nc_{min}$, (ii) the maximum value $nc_{max}$, (iii) the value range $nc_{max} - nc_{min}$, (iv) the mode and (v) how many times the mode occurs.

Node Density.
We define the density $nd_r(n_a)$ of a node $n_a \in \mathcal{N}_j$ of a topology $t_j$, for a given radius $r \in R$, as the number of other nodes within distance $r$ from $n_a$, i.e.

$$nd_r(n_a) = |\{ n_b \in \mathcal{N}_j \setminus \{n_a\} \mid d(n_a, n_b) < r \}|$$

We extract as many features $f_{density}$ as the number of radii in $R$, each corresponding to the average node density for a given radius $r$, defined as follows

$$f_{density}(r) = \frac{\sum_{n_a \in \mathcal{N}_j} nd_r(n_a)}{N}, \quad r \in R$$
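As an illustration of the two definitions above, the following sketch computes the density of a node and the average density of a topology by brute force. The function names are hypothetical, and nodes are excluded by index rather than by value, which is an assumption about how coincident nodes should be treated.

```python
import math

def node_density(nodes, a, r):
    """nd_r(n_a): number of other nodes strictly closer than r to nodes[a]."""
    return sum(
        1
        for b, q in enumerate(nodes)
        if b != a and math.dist(nodes[a], q) < r
    )

def f_density(nodes, r):
    """Average node density of a topology for radius r."""
    return sum(node_density(nodes, a, r) for a in range(len(nodes))) / len(nodes)

# Pairwise distances among the first three nodes are 3, 4 and 5;
# the fourth node is far from all the others.
nodes = [(0, 0), (3, 0), (0, 4), (10, 10)]
avg_density = f_density(nodes, 4.5)
```

With radius 4.5 the first node has two neighbours, the second and third have one each and the fourth has none, so the average density is 1. The quadratic cost is acceptable for a sketch; a spatial index would be preferable for large topologies.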
Shared Neighbours Distribution.
For any given pair of nodes $n_a, n_b \in \mathcal{N}_j$ and radius $r \in R$, the shared neighbours [15] are those nodes within distance $r$ from both $n_a$ and $n_b$. Figure 2 shows an example where nodes 1 and 2 are shared neighbours of nodes 3 and 4. We first introduce the neighbours function $\mathit{neigh}_r(n_a)$ for a node $n_a \in \mathcal{N}_j$ and a radius $r$ as

$$\mathit{neigh}_r(n_a) = \{ n_b \in \mathcal{N}_j \setminus \{n_a\} \mid d(n_a, n_b) < r \}$$

Fig. 2. Nodes 1 and 2 are shared neighbours of nodes 3 and 4. In this case, the value of the shared neighbours count for nodes 3 and 4 is 2.
Then we define the shared neighbours count $\mathit{snc}_r(n_a, n_b)$ for nodes $n_a, n_b \in \mathcal{N}_j$ and radius $r \in R$ as

$$\mathit{snc}_r(n_a, n_b) = |\mathit{neigh}_r(n_a) \cap \mathit{neigh}_r(n_b)|$$

We extract as many features $f_{shared\_neigh}$ as the number of radii in $R$, each corresponding to the average shared neighbours count for a given radius $r$, defined as follows

$$f_{shared\_neigh}(r) = \frac{\sum_{n_a, n_b \in \mathcal{N}_j,\ n_a \ne n_b} \mathit{snc}_r(n_a, n_b)}{N(N-1)/2}, \quad r \in R$$

Clustering coefficient.
The clustering coefficient [9] of a node $n_a \in \mathcal{N}_j$ is a measure based on the number $c$ of node pairs that lie within distance $r \in R$ from $n_a$ and are neighbours of each other. An example is reported in figure 3. This coefficient is calculated as the ratio between $c$ and the number of neighbours of $n_a$, i.e. its density. More formally, the clustering coefficient $cc_r(n_a)$ of a node $n_a \in \mathcal{N}_j$ for a radius $r \in R$ is defined as

$$cc_r(n_a) = \frac{|\{ n_b, n_c \in \mathit{neigh}_r(n_a) \mid 0 < d(n_b, n_c) < r \}|}{nd_r(n_a)}$$

We extract as many features $f_{clustering}$ as the number of radii in $R$, each corresponding to the average clustering coefficient for a given radius $r$, defined as follows

$$f_{clustering}(r) = \frac{\sum_{n_a \in \mathcal{N}_j} cc_r(n_a)}{N}, \quad r \in R$$
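The shared neighbours and clustering features can be sketched together, since both build on the neighbours function. The code below is an interpretation rather than a reference implementation: function names are hypothetical, the sum in the shared neighbours feature is taken over unordered pairs to match the $N(N-1)/2$ denominator, neighbour pairs are likewise counted once, and a node without neighbours is given coefficient 0 (the text leaves that case undefined).

```python
import math
from itertools import combinations

def neigh(nodes, a, r):
    """Indices of the other nodes strictly closer than r to nodes[a]."""
    return {
        b for b in range(len(nodes))
        if b != a and math.dist(nodes[a], nodes[b]) < r
    }

def f_shared_neigh(nodes, r):
    """Average shared neighbours count over all unordered node pairs."""
    n = len(nodes)
    nb = [neigh(nodes, a, r) for a in range(n)]
    total = sum(len(nb[a] & nb[b]) for a, b in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)

def clustering(nodes, a, r):
    """Neighbour pairs of nodes[a] that are mutual neighbours, over its density."""
    nb = neigh(nodes, a, r)
    if not nb:
        return 0.0  # assumption: isolated nodes get coefficient 0
    linked = sum(
        1 for b, c in combinations(sorted(nb), 2)
        if 0 < math.dist(nodes[b], nodes[c]) < r
    )
    return linked / len(nb)

def f_clustering(nodes, r):
    """Average clustering coefficient of a topology for radius r."""
    return sum(clustering(nodes, a, r) for a in range(len(nodes))) / len(nodes)

# Layout in the spirit of Fig. 3: index 2 is "node 3", whose neighbours are
# the other four nodes; only "node 4" and "node 5" are mutual neighbours.
nodes = [(-1, 1), (1, 1), (0, 0), (-0.5, -1), (0.5, -1)]
cc_node3 = clustering(nodes, 2, 1.5)
```

In this layout the central node has four neighbours and one linked pair, which gives a coefficient of 0.25 under the ratio defined above.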
Fig. 3. The neighbours of node 3 are nodes 1, 2, 4 and 5. Among those neighbours, there is a pair of nodes, i.e. nodes 4 and 5, which are neighbours of each other, while nodes 1 and 2 are not neighbours of any other node.

The bias index of a TG $tg_i \in \mathcal{TG}$ with respect to all the TGs in $\mathcal{TG}$ is measured as the distance between the topologies $\mathcal{T}_i$ generated by $tg_i$ and all the generated topologies $\mathcal{T}$. This distance is computed on the basis of the following feature-based representation of a topology $t_j$

$$t_j = \langle f^j_0, \dots, f^j_{F-1} \rangle$$

where $f^j_k$ is the value of the $k$-th feature of $t_j$ ($k = 0, \dots, F-1$) and $F$ is the number of used features, detailed in section 4.1, equal to $12 + 3N_R$. (There are 7 features for inter-node distances, 5 features for spatial distribution and as many features as the number of radii $N_R$ for (i) node density, (ii) shared neighbours distribution and (iii) clustering coefficient; see section 4.1.)

Hedges’ g [8] is used to estimate the standardised mean difference between two populations, i.e. the average distance between the elements of two different populations, measured in standard deviations. Although in its original form it can be applied to single-dimension elements only, we propose to extend Hedges’ g to $F$ dimensions to quantify the difference between topologies in $\mathcal{T}_i$ and in $\mathcal{T}$.

We first detail how to apply Hedges’ g to a single feature $f_k$, where $k = 0, \dots, F-1$. We define $\mathcal{T}^k$ and $\mathcal{T}^k_i$ as the projections of $\mathcal{T}$ and $\mathcal{T}_i$ to feature $f_k$, respectively, as follows

$$\mathcal{T}^k = \{ f^j_k \mid t_j = \langle f^j_0, \dots, f^j_{F-1} \rangle \in \mathcal{T} \}$$

$$\mathcal{T}^k_i = \{ f^j_k \mid t_j = \langle f^j_0, \dots, f^j_{F-1} \rangle \in \mathcal{T}_i \}$$

Let $m^k$ and $s^k$ be the mean and standard deviation of $\mathcal{T}^k$, respectively. Let $m^k_i$ and $s^k_i$ be the mean and standard deviation of $\mathcal{T}^k_i$, respectively. In compliance with the original formulation, we define Hedges’ g for a single feature $f_k$ as

$$g^k_i = \frac{m^k - m^k_i}{s^{*k}_i}$$

where $s^{*k}_i$ is the pooled standard deviation for $\mathcal{T}^k$ and $\mathcal{T}^k_i$, computed as follows

$$s^{*k}_i = \sqrt{\frac{(|\mathcal{T}^k| - 1) \cdot (s^k)^2 + (|\mathcal{T}^k_i| - 1) \cdot (s^k_i)^2}{|\mathcal{T}^k| + |\mathcal{T}^k_i| - 2}}$$

To compute the bias index $g_i$ for $tg_i$, we combine all the $F$ values $g^k_i$ by considering each of them as a distance along one dimension, as follows

$$g_i = \sqrt{\sum_{k=0}^{F-1} (g^k_i)^2}$$

TGs Selection.
The bias index can be used to choose which TG to pick to reduce the possible bias. Selecting the TG with the lowest bias index would correspond to using the set of topologies with the lowest distance, on average, from the whole set $\mathcal{T}$ of available topologies. According to the methodology introduced at the beginning of this section, this in turn means choosing the most representative subset of topologies available, if a single TG has to be selected.

If more than one TG can be picked, say $p$ out of $N_{TG}$, then the same strategy can be used by considering the possible $\binom{N_{TG}}{p}$ subsets of $\mathcal{T}$, each in the form

$$\mathcal{T}_{i_0, \dots, i_{p-1}} = \bigcup_{j=0}^{p-1} \mathcal{T}_{i_j}$$

with $\mathcal{T}_{i_j} \subset \mathcal{T}$, $0 \le i_0 < \dots < i_{p-1} < N_{TG}$, $0 < p < N_{TG}$, and computing the corresponding bias index. Again, the subset with the lowest bias index is the most representative of $\mathcal{T}$. We refer to $g_{i_0, \dots, i_{p-1}}$ as the bias index of $\mathcal{T}_{i_0, \dots, i_{p-1}}$.

We consider the accuracy of a classifier trained with the topology generator ground truth
$\mathcal{TGGT}$ defined as follows

$$\mathcal{TGGT} = \bigcup_{i=0}^{N_{TG}-1} \{ \langle t_j, tg_i \rangle \mid t_j \in \mathcal{T}_i \}$$

where each pair $\langle t_j, tg_i \rangle$ represents the fact that topology $t_j$ has been generated by TG $tg_i$, i.e. $tg_i$ is the class of $t_j$. We use part of the ground truth for training and the rest for testing, where we compute the actual classification accuracy. To avoid any possible bias deriving from the choice of how the ground truth is split between training and testing, we employ the well-known $k$-fold cross validation [18] method, which works as follows. The ground truth is first partitioned into $k$ equally sized folds, then a classifier is trained from scratch in $k$ different ways by using each time all the folds but one. After each training, the resulting classifier is tested by using the fold excluded during the training and the classification accuracy is recorded. The final accuracy is the average of the $k$ accuracy values obtained during the $k$ trainings.

More formally, let $\mathcal{TGGT}_l$ be the $l$-th partition of $\mathcal{TGGT}$, for $l = 0, \dots, k-1$, such that

$$\mathcal{TGGT} = \bigcup_{l=0}^{k-1} \mathcal{TGGT}_l, \qquad \mathcal{TGGT}_a \cap \mathcal{TGGT}_b = \emptyset, \quad 0 \le a < b < k$$

Since all the folds have the same size, we have that $|\mathcal{TGGT}_l| = N_T N_{TG} / k$. Let $C_l : \mathcal{T} \to \mathcal{TG}$ be the function computed by a classifier trained using all the folds except $\mathcal{TGGT}_l$. We define the classification accuracy $a_l$ for the $l$-th fold as the ratio of correctly classified topologies to the total number of classified topologies

$$a_l = \frac{|\{ \langle t_j, tg_i \rangle \in \mathcal{TGGT}_l \mid C_l(t_j) = tg_i \}|}{|\mathcal{TGGT}_l|}$$

The final classification accuracy is defined as $a = \frac{\sum_{l=0}^{k-1} a_l}{k}$. In the absence of bias, the classification accuracy should be close to $1/N_{TG}$. Higher values indicate that there are some features that are peculiar to specific TGs. In that case, a feature analysis can help to identify those features, to understand whether and to what extent they are relevant for the particular experiments the generated topologies have to be used for. The feature analysis can be based on a sequential feature selector [1] algorithm, used to reduce the initial dimension of the feature space. The goal is to create a subset of features that explain the most variance in the dataset. This is done by either adding or removing one feature at a time and measuring the corresponding classification accuracy until convergence is achieved, i.e. the process stops when the accuracy ceases to grow.
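The cross-validation procedure described above can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier is used as a stand-in; it is not the classifier used in the experiments, and all function names, the shuffling seed and the toy data are assumptions made for illustration.

```python
import random

def k_fold_accuracy(samples, k, train, seed=0):
    """Average accuracy over k folds; samples are (feature_vector, label) pairs."""
    data = samples[:]
    random.Random(seed).shuffle(data)
    # k disjoint folds of (nearly) equal size.
    folds = [data[l::k] for l in range(k)]
    accuracies = []
    for l in range(k):
        # Train on every fold except the l-th, then test on the l-th.
        training = [s for m, fold in enumerate(folds) if m != l for s in fold]
        classify = train(training)
        hits = sum(1 for x, label in folds[l] if classify(x) == label)
        accuracies.append(hits / len(folds[l]))
    return sum(accuracies) / k

def train_nearest_centroid(training):
    """Stand-in classifier: assign a vector to the class with the closest mean."""
    sums, counts = {}, {}
    for x, label in training:
        acc = sums.setdefault(label, [0.0] * len(x))
        for dim, value in enumerate(x):
            acc[dim] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def classify(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
    return classify

# Two well separated hypothetical classes with one feature each.
samples = [([i * 0.01], "A") for i in range(10)]
samples += [([10 + i * 0.01], "B") for i in range(10)]
accuracy = k_fold_accuracy(samples, 5, train_nearest_centroid)
```

Because the two toy classes do not overlap, every fold is classified perfectly; with three indistinguishable generators the same procedure would be expected to return a value near 1/3.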
We apply the proposed methodology to a number of well-known TGs, detailed in section 5.1. The parameters we choose to instantiate the model (see section 3) are reported in section 5.2. The experiments on bias index and classification accuracy, as well as obtained results, are described in sections 5.3 and 5.4, respectively. We also carry out a more detailed analysis of the impact of each feature on the experiments' outcomes, presented in section 5.5.
In our experiments, we use three well known TGs: BRITE, NPART and GT-ITM, described in this section.
BRITE. The Boston University Representative Internet Topology Generator [13] (BRITE) is a universal model-based TG designed to be extendable to enable the addition of new models. BRITE uses various models for the placement of nodes, as detailed below.
1. Flat Router model. It represents a router-level topology and is designed for router networks. The placement of nodes is either random or uses the heavy-tailed approach, where the plane is divided into squares, each square is assigned a number of nodes drawn from a heavy-tailed distribution, and then nodes are placed randomly within the square.
2. Flat AS-level model. It is very similar to the Flat Router model except that it generates AS-level topologies.
3. Hierarchical Topologies model. It generates Internet-like topologies. It can be configured to use either a top-down or a bottom-up approach. In the first case, an AS-level topology is first built by using the Flat AS-level model, then for each node a router-level topology is generated. In the second case, a router-level topology is first generated, then AS nodes are introduced and each is linked to a number of router nodes.
The Flat AS-level model is used for our experiments because it is more representative of a mesh network.

NPART.
The Node Placement Algorithm for Realistic Topologies [14] (NPART) TG generates topologies based on properties of real networks, i.e. any artificial network is generated randomly but in compliance with a number of properties of real-world topologies. The generation algorithm used by NPART relies on a number of sociological and technological observations introduced by Aha et al. [1], listed below.
– It is more likely that a new participant joins the network in areas where connectivity is high.
– A participant in the network expects to have at least one communicating link to the rest of the network, possibly creating a large number of pendant nodes.
– A pendant node may become a seed for a new, larger and well-connected sub-network.
– It is the network that specifies the area it occupies, not the other way around. So, instead of defining the node placement area like most of the existing placement algorithms, the network should be allowed to grow.

Configuration options for the generation of topologies are NPART Berlin (based on Berlin's real mesh network, consisting of 275 nodes), NPART Leipzig (based on Leipzig's real mesh network, consisting of 346 nodes), Uniform placement model (it uses uniform probability placement within the test area), Grid placement (also known as mesh placement, where nodes are located at the intersections of a rectangular grid), Quasi-Grid placement (nodes are placed according to a Gaussian distribution with the mean given by regular grid points) and Random Waypoint Model (a random model for the movement of mobile users and how their location, velocity and acceleration change over time).

The uniform placement model has been used in our experiments.
GT-ITM. The Georgia Tech Internetwork Topology Model [2] (GT-ITM) is a model-based TG that produces topologies resembling wide area networks. Two models are available to decide the node placement: Flat Random Graphs and Hierarchical.

The Flat Random Graphs model distributes nodes randomly over the test plane. This model does not aim to reflect the real world; rather, it is meant for simplicity. A variation of this model uses the Waxman probability [19] to produce more realistic topologies.

The Hierarchical model creates a topology by connecting smaller components together according to a larger scale structure. This suggests that this model has a propensity towards clustering, hence we choose the Flat Random Graphs model, which better represents WANETs.
With reference to the system model defined in section 3, we consider the $N_{TG} = 3$ TGs described in the previous section, i.e. $\mathcal{TG}$ = {BRITE, NPART, GT-ITM}, and generate $N_T = 1000$ topologies for each TG. Each topology has $N = 1000$ nodes. The reference topology area has sides $D = 1000$ units long. We evaluate the following $N_R = 8$ radii: R = { , , , , , , , }.

We compute the bias index $g_i$ for each TG and $g_{i,j}$ for each pair of TGs, as described in section 4.2. The results are reported in table 1. As can be noted, BRITE topologies seem to be significantly different from those generated by NPART and GT-ITM, and vice versa, which suggests that using either TG alone would provide a set of topologies significantly different from the set including all the topologies. However, if only one TG has to be selected, NPART generates topologies that are, on average, the least different from those generated by all available TGs. If two TGs can be chosen, BRITE and NPART are the best pair to consider.

Table 1. Bias index of the considered TGs.

Topology Generator(s)    Bias index
NPART                    1.890
GT-ITM                   2.145
BRITE                    4.282
BRITE + NPART            0.908
BRITE + GT-ITM           0.976
NPART + GT-ITM           2.430
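The computation of section 4.2 that underlies these values can be sketched in plain Python. This is a hedged illustration, not the code used for the experiments: function names are hypothetical, the sample standard deviation is assumed, the toy one-feature data are invented, and a feature with zero variance in both populations would cause a division by zero that the sketch does not handle.

```python
import math
import statistics
from itertools import combinations

def hedges_g(all_values, tg_values):
    """Single-feature g: mean difference over the pooled standard deviation."""
    n, n_i = len(all_values), len(tg_values)
    s = statistics.stdev(all_values)    # assumption: sample std
    s_i = statistics.stdev(tg_values)
    pooled = math.sqrt(((n - 1) * s**2 + (n_i - 1) * s_i**2) / (n + n_i - 2))
    return (statistics.fmean(all_values) - statistics.fmean(tg_values)) / pooled

def bias_index(all_topologies, tg_topologies):
    """Euclidean combination of the per-feature g values."""
    n_features = len(all_topologies[0])
    return math.sqrt(sum(
        hedges_g([t[k] for t in all_topologies],
                 [t[k] for t in tg_topologies]) ** 2
        for k in range(n_features)
    ))

def best_subset(topologies_by_tg, p):
    """Pick the p generators whose pooled topologies have the lowest bias index."""
    all_t = [t for ts in topologies_by_tg.values() for t in ts]

    def index(names):
        subset = [t for name in names for t in topologies_by_tg[name]]
        return bias_index(all_t, subset)

    return min(combinations(sorted(topologies_by_tg), p), key=index)

# Hypothetical feature vectors (one feature each) for three toy generators.
toy = {
    "A": [(0.0,), (1.0,), (2.0,)],
    "B": [(4.0,), (5.0,), (6.0,)],
    "C": [(10.0,), (11.0,), (12.0,)],
}
all_toy = [t for ts in toy.values() for t in ts]
choice = best_subset(toy, 1)
```

In the toy data, generator "B" lies closest to the overall mean and is therefore selected when a single generator is allowed; calling `best_subset(toy, 2)` ranks the three possible pairs in the same way.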
In this context, classification accuracy is investigated to understand to what extent topologies can be distinguished with respect to their TG. The higher the classification accuracy, the sharper the differences between topologies generated by different TGs. Although classifiers are commonly selected and tuned to maximise classification accuracy, in this case we are only interested in verifying whether the accuracy can be substantially higher than $1/3$, i.e. $1/N_{TG}$, the accuracy expected from random guessing among equally represented TGs (see section 4.3). We choose Naive Bayes as the classifier because of its simplicity, and we test three different probability distributions, i.e. Gaussian, Bernoulli and Multinomial, to assess whether results are consistent regardless of the particular distribution. Table 2 shows the classification accuracy for the three algorithms; all the values are well above $1/3$.

Table 2. Classification accuracy for the three classification algorithms.

| Algorithm | Classification Accuracy (%) |
|---|---|
| GaussianNB | 77.95 |
| BernoulliNB | 58.56 |
| MultinomialNB | 70.07 |
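As an illustration of the check described above, the sketch below hand-writes a small Gaussian Naive Bayes classifier and compares its accuracy with the chance level $1/N_{TG}$. It is a stand-in under stated assumptions, not the authors' implementation. The three generator labels, the two synthetic features and their distributions are all hypothetical, and a single train/test split replaces the evaluation procedure of section 4.3.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate log prior, per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(cols, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2.0 * math.pi * var) - (x - m) ** 2 / (2.0 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)


rng = random.Random(0)
# Hypothetical generators, each with its own mean for two synthetic features.
centres = {"TG-A": (0.0, 0.0), "TG-B": (3.0, 0.0), "TG-C": (0.0, 3.0)}


def sample(n_per_tg):
    X, y = [], []
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_tg):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y


X_train, y_train = sample(200)
X_test, y_test = sample(100)
model = fit_gaussian_nb(X_train, y_train)
hits = sum(predict(model, row) == label for row, label in zip(X_test, y_test))
accuracy = hits / len(y_test)
chance = 1.0 / len(centres)
print(f"accuracy = {accuracy:.3f}, chance level = {chance:.3f}")
```

An accuracy clearly above the chance level indicates that the synthetic "generators" leave a recognisable signature in the features, which is the same reading applied to Table 2.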
We carry out a more detailed analysis of which features weigh most in the classification by applying the Forward Sequential Selection (FSS) method as sequential feature selector (see section 4.3). FSS works sequentially: it starts with an empty set of features and, at each iteration, adds the feature that yields the highest accuracy. Figure 4 shows how the classification accuracy varies with the number of considered features, for a single fold, and that the highest accuracy is achieved with 4 features.
Table 3 details the outcomes of the first 9 iterations of FSS, in terms of classification accuracy and the corresponding set of features that yields such accuracy. Table 4 lists the four features yielding the highest classification accuracy. This feature analysis identifies the most distinguishing characteristics of topologies generated by different TGs, with respect to the three TGs used in our experiments. Whether these differences can introduce bias depends on the specific application the topologies are used for, i.e. on the extent to which those features are relevant for the target scenario.
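The greedy loop behind FSS can be sketched as follows. This is a minimal illustration of the procedure as described above, not the authors' code. The scoring function stands in for "train the classifier on this feature subset and measure its accuracy", and the feature names and their gains are invented for the example.

```python
def forward_sequential_selection(features, score):
    """Greedy FSS: start from an empty set and, at each iteration,
    add the remaining feature that gives the highest score."""
    selected = []
    remaining = list(features)
    history = []  # (selected features so far, score) after each iteration
    while remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score(selected)))
    return history


# Hypothetical per-feature contributions to accuracy (negative = harmful).
gains = {"a": 0.20, "b": 0.10, "c": 0.05, "d": -0.02, "e": -0.04}


def toy_score(subset):
    """Stand-in for classification accuracy: chance level plus feature gains."""
    return 1.0 / 3.0 + sum(gains[f] for f in subset)


history = forward_sequential_selection(gains, toy_score)
best_subset, best_accuracy = max(history, key=lambda item: item[1])
for subset, accuracy in history:
    print(len(subset), subset, round(accuracy, 3))
print("best subset:", best_subset)
```

In this toy setting the score peaks after three features and then declines, mirroring the kind of curve where accuracy is highest for a small subset of features rather than for the full set.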
Fig. 4. Classification accuracy as the number of features used varies, for a specific fold.

Table 3. Classification accuracy and corresponding feature set for the first 9 iterations of FSS.

| Features | Accuracy | Feature IDs |
|---|---|---|

Table 4. Set of features yielding the highest classification accuracy.

| Feature ID | Feature description |
|---|---|
| 31 | Shared neighbours distribution with 30 units radius |
| 6 | Minimum inter-node distance |
| 30 | Shared neighbours distribution with 20 units radius |
| 14 | Node density with 20 units radius |

In this paper, we investigate the presence of bias in WANET simulations due to the choice of which TG, or TGs, to use among a fixed number of available TGs. We propose two metrics, namely bias index and classification accuracy, to measure the extent and significance of the distance between topologies generated by a single TG and topologies produced by all available TGs. We also propose a methodology to select which TG, or TGs, to pick to minimise the bias.

We present an experimental evaluation where we compute bias index and classification accuracy for three well-known TGs: BRITE, NPART and GT-ITM. The results show that topologies generated by a single TG are different from those created by all three TGs, and that the TG which generated a certain topology can be determined with an accuracy well above the $1/N_{TG}$ chance level.

As future work, we plan to carry out additional evaluations to investigate how bias index and classification accuracy are linked to variance in the results of the same experiments performed on different TGs. A number of reference algorithms can be chosen, e.g. routing protocols, and executed on the available generated topologies to verify whether lower values for bias index and classification accuracy can actually lead to reduced variance of the obtained results, with respect to the experimental outcomes that would be achieved by using all the available TGs.

An additional, significant piece of future work concerns the sensitivity analysis on both system model parameters and TG configurations, to assess to what extent such tuning affects the computed values of bias index and classification accuracy. Furthermore, the choice of which classifier to use for the classification accuracy needs to be investigated by considering a larger number of classification algorithms.

References

[1] David W. Aha and Richard L. Bankert. A comparative evaluation of sequential feature selection algorithms. In
Learning from Data, pages 199–206. Springer, 1996.
[2] Kenneth L. Calvert, Matthew B. Doar, and Ellen W. Zegura. Modeling internet topology. IEEE Communications Magazine, 35(6):160–163, 1997.
[3] Bo Cheng and G. Hancke. Energy efficient scalable video manycast in wireless ad-hoc networks. In IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society, pages 6216–6221, Oct 2016.
[4] C. Cheng and S. Lin. A hole-bypassing routing algorithm for WANETs. In , pages 547–550, Oct 2017.
[5] Peter Greig-Smith. The use of random and contiguous quadrats in the study of the structure of plant communities. Annals of Botany, pages 293–316, 1952.
[6] M. H. Günes and M. B. Akgün. Link-level network topology generation. In Proceedings of 31st International Conference on Distributed Computing Systems Workshops (ICDCSW), 2011.
[7] Oliver Heckmann, Michael Piringer, Jens Schmitt, and Ralf Steinmetz. On realistic network topologies for simulation. In Proceedings of the ACM SIGCOMM workshop on Models, methods and tools for reproducible network research, pages 28–32. ACM, 2003.
[8] L. V. Hedges. Distribution Theory for Glass’s Estimator of Effect size and Related Estimators. Journal of Educational and Behavioral Statistics, 1981.
[9] Paul W. Holland and Samuel Leinhardt. Transitivity in Structural Models of Small Groups. Comparative Group Studies, 1971.
[10] D. Magoni and J.-J. Pansiot. Evaluation of internet topology generators by power law and distance indicators. In Proceedings 10th IEEE International Conference on Networks (ICON 2002). Towards Network Superiority (Cat. No.02EX588), pages 401–406, 2002.
[11] Damien Magoni and J. J. Pansiot. Analysis and Comparison of Internet Topology Generators. NETWORKING 2002: Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications, 2345:364–375, 2006.
[12] Damien Magoni and Jean-Jacques Pansiot. Influence of network topology on protocol simulation. In International Conference on Networking, pages 762–770. Springer, 2001.
[13] Alberto Medina, Anukool Lakhina, Ibrahim Matta, and John Byers. BRITE: Universal Topology Generation from a User’s Perspective. Technical report, Boston, MA, USA, 2001.
[14] Bratislav Milic and Miroslaw Malek. NPART - node placement algorithm for realistic topologies in wireless multihop network simulation. In Proceedings of the 2nd international conference on simulation tools and techniques, 2009.
[15] S. Nowak, M. Nowak, and K. Grochla. Properties of Advanced Metering Infrastructure Networks’ Topologies. Network Operations and Management Symposium (NOMS), 2014 IEEE, pages 1–6, 2014.
[16] R. Rossi, S. Fahmy, and N. Talukder. A Multi-Level Approach for Evaluating Internet Topology Generators. , pages 9pp.–9 pp., 2013.
[17] M. L. Sanni, A. A. Hashim, F. Anwar, G. S. M. Ahmed, and S. Ali. How to model wireless mesh networks topology. IOP Conference Series: Materials Science and Engineering, 53(1):012037, 2013.
[18] Mervyn Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society. Series B (Methodological), pages 111–147, 1974.
[19] B. M. Waxman. Routing of multipoint connections. IEEE Journal on Selected Areas in Communications, 6(9):1617–1622, Dec 1988.
[20] Y. Xu, J. Liu, Y. Shen, X. Jiang, and T. Taleb. Security/QoS-aware route selection in multi-hop wireless ad hoc networks. In , pages 1–6, May 2016.
[21] Y. Xu, J. Liu, O. Takahashi, N. Shiratori, and X. Jiang. SOQR: Secure optimal QoS routing in wireless ad hoc networks. In