A Methodology to Select Topology Generators for WANET Simulations (Extended Version)

Abstract

Many academic and industrial research works on WANETs rely on simulations, at least in the first stages, to obtain preliminary results to be subsequently validated in real settings. Topology generators (TGs) are commonly used to generate the initial placement of nodes in artificial WANET topologies, where those simulations take place. The significance of these experiments heavily depends on the representativeness of artificial topologies. Indeed, if they were not drawn fairly, obtained results would apply only to a subset of possible configurations, hence they would lack the generality required to port them to the real world. Although using many TGs could mitigate this issue by generating topologies in several different ways, that would entail a significant additional effort. Hence, the problem arises of which TGs to choose, among a number of available generators, to maximise the representativeness of generated topologies and reduce the number of TGs to use. In this paper, we address that problem by investigating the presence of bias in the initial placement of nodes in artificial WANET topologies produced by different TGs. We propose a methodology to assess such bias and introduce two metrics to quantify the diversity of the topologies generated by a TG with respect to all the available TGs, which can be used to select which TGs to use. We carry out experiments on three well-known TGs, namely BRITE, NPART and GT-ITM. Obtained results show that using the artificial networks produced by a single TG can introduce bias.



Michael O’Sullivan, Leonardo Aniello, and Vladimiro Sassone

Electronics and Computer Science,

University of Southampton, Southampton, UK
{M.O’Sullivan | L.Aniello | vsassone}@soton.ac.uk

Keywords: Topology generator, WANET, BRITE, NPART, GT-ITM

A wireless ad hoc network (WANET) is based on a decentralised topology of devices/nodes that cooperate to implement some routing protocol, i.e. each device forwards its own and other devices’ traffic according to a specific algorithm with the aim of reaching the target destination. WANETs do not rely on any fixed infrastructure, and each node can communicate only with the nodes lying within its transmission range. WANET applications are wide and significant, ranging from wireless sensor networks to vehicular ad hoc networks (VANETs) to mobile ad hoc networks (MANETs), and they are used in everyday scenarios as well as more critical settings, such as military operations.

Several WANET aspects are still being investigated by the research community, e.g. routing protocols [3,4] and security [20,21]. For convenience, many academic works rely heavily on simulation to test a proposed solution and obtain preliminary results that are used to validate its effectiveness. Network simulations commonly entail evaluating a given approach on many different WANET topologies to ensure results are meaningful, i.e. to have evidence that they can apply to a wide variety of networks and are not tied to particular network configurations. Hence, as also suggested by Günes et al. [6], a key aspect in any network protocol simulation is the design and selection of the test network topologies to consider.

Network topology generators (TGs) are usually employed to create a possibly large number of topologies, on the basis of predefined network models, real-world measurements and additional parameters available to tune the generation process.
Although any TG is designed and implemented to generate a representative set of topologies, different TGs do not rely on the same models and assumptions, do not follow the same generation approach and thus are likely to produce diverse topologies, which in turn can lead to dissimilar simulation results [12,7]. Hence, we claim that the choice of the TG can affect this type of experiment, i.e. a TG is likely to introduce bias in simulations. This holds true for WANET simulations as well, where TGs are used to generate the initial placement of nodes, which in turn plays an important role in the way a WANET evolves over time.

Despite the fact that each TG has its own peculiarities, and that sometimes researchers can select a TG on the basis of the specific mathematical or physical model they need, there are in general several TGs that can be used to create artificial topologies representing the initial placement of nodes in WANETs. In this context, the best option would be to use all the available TGs to run simulations on the largest possible range of topologies, so as to ensure that obtained results are not biased by the choice of a specific TG, or subset of TGs. On the other hand, using many TGs is demanding for researchers in terms of the time and effort required to delve into the technical issues of each TG. Therefore, a trade-off arises between reducing the effort spent in setting up the simulations, i.e. minimising how many TGs to use, and maximising the representativeness of the simulations themselves, i.e. minimising the bias introduced by TG selection.

In this paper, we analyse the differences between topologies generated by distinct TGs to help researchers reduce the number of TGs they use while still preserving the representativeness of generated topologies. In particular, given a fixed number of available TGs, we address the following research questions.
– RQ1: How to measure the difference between topologies generated by distinct TGs, i.e. how to characterise the bias introduced by the choice of a specific TG rather than using all the TGs?
– RQ2: How to choose which TG, or TGs, to use to reduce such a bias?

The approach we propose relies on a compact, numeric representation of topologies, based on a number of aspects about how network nodes are placed over the plane (e.g. inter-node distance, clustering) and about how WANETs work (e.g. nodes can only communicate with other nodes within their transmission range). Each topology is modelled as a vector of numeric features, which makes it possible to compute distance metrics. We consider a fixed number of TGs and propose to interpret the bias as a measure of the differences that arise in generated WANET topologies when selecting any single TG, or subset of TGs, instead of picking all the available TGs.

We tackle RQ1 by focussing on two complementary facets of the distances between topologies. On the one hand, we want to quantify the bias by measuring the average distance between topologies generated by distinct TGs. Specifically, we use the Hedges’ g measure of effect size to compute the bias index, which measures the difference between topologies produced by a specific TG, or subset of TGs, and those created by all the available TGs. On the other hand, we are also interested in evaluating to what extent existing differences are distinctive of some TG, i.e. whether such differences allow us to determine which TG generated a topology, regardless of the extent of those differences. In this regard, we employ machine learning techniques to compute the classification accuracy, i.e. to estimate how precisely we can discover which TG generated a topology.

We answer RQ2 by proposing a simple methodology, based on the bias index, to select which TGs to use to reduce the bias, depending on how many TGs can be picked at most.

We carry out an experimental evaluation using three well-known TGs, i.e. BRITE, NPART and GT-ITM. Obtained results show that using a single TG is likely to introduce bias, and that in this case picking NPART is the best choice to mitigate this issue. If two TGs can be used, BRITE and NPART provide the lowest bias. The experiments on the classification accuracy show that topologies can be correctly classified according to their TGs with high accuracy, i.e. up to almost 78%, and that, in this specific case, four topology features contribute most to distinguishing between different TGs.

To the best of our knowledge, this is the first work in literature that systematically investigates the differences between topologies generated by diverse TGs in the context of WANET simulation. The contributions of this work are

1. the definition of a vector-based representation of WANET topologies, based on a number of features derived from different aspects of node placement;
2. the definition of two novel metrics to assess the differences between TGs, i.e. the bias index and the classification accuracy;
3. a methodology to choose which TG, or TGs, to use among available TGs to minimise the bias;


4. an experimental evaluation on the BRITE, NPART and GT-ITM TGs, showing the presence of bias in picking either a single TG or a pair of TGs.

The rest of the paper is organised as follows. Section 2 describes background and discusses related work. The system model for our investigation is introduced in section 3. The methodology we propose is detailed in section 4. The experiments and obtained results are presented in section 5. Finally, section 6 draws conclusions and outlines possible future work.

In this paper we focus on TGs that provide the initial placement of nodes over a plane. As we are dealing with WANETs, we are not interested in how nodes are connected to each other and assume that any node can communicate directly with all the nodes lying within its transmission range.

TGs can differ mainly in how node placement is decided [17] and what each node represents [7]. The node placement strategy can be based either on some predefined model or on real-world measurements. In the former case, a certain probability distribution can be used, such as the Waxman model [19], or specific strategies can be enforced to preserve the inter-node distance among nodes placed on a line (chain node placement) or to position nodes at the intersections of square cells when the plane is organised as a grid (grid node placement). In the latter case, node positions are instead determined in compliance with real-world measurements of existing network topologies. Nodes in an artificial topology can represent either autonomous systems (AS), i.e. AS-level topologies, or routers, i.e. router-level topologies.

Some existing works in literature investigate diverse aspects of TGs, e.g. how realistic generated topologies are. Several works [10,11,12] focus on TGs for Internet topologies by comparing the topologies they generate with available real Internet map topologies, with the aim of assessing to what extent those topologies can be considered realistic. Rossi et al. [16] propose a framework to analyse Internet topologies by using a multi-level approach based on a number of graph measures and existing reference datasets. Their goal is to assess whether Internet TGs comply with their claimed objectives and how realistic generated topologies are. Our work differs from those papers mainly because we do not evaluate whether artificial topologies are realistic; rather, we investigate the bias in topologies generated by different TGs. Furthermore, we tackle WANETs rather than the Internet.

Heckmann et al. [7] compare three TGs according to the similarity of generated topologies with an available collection of real-world topologies.

Although all those works, like ours, focus on evaluating and comparing existing TGs, the main difference lies in the goal of such a comparison. In fact, while existing literature is interested in measuring how well generated topologies represent real-world networks, we concentrate on an orthogonal aspect by investigating whether picking a certain TG rather than another one, or rather than choosing more TGs, can introduce bias. From this point of view, our contribution is novel and complements existing research on comparing available TGs.

We consider a set $\mathcal{TG}$ with $N_{TG}$ topology generators (TGs), i.e. $|\mathcal{TG}| = N_{TG}$. Each TG generates coordinates for the initial placement of nodes, i.e. devices, within a defined square topology area, with sides $D$ units long. Each TG $tg_i$ generates a set $\mathcal{T}_i$ with $N_T$ topologies, where $i = 0, \ldots, N_{TG} - 1$. The set containing all the topologies generated by all the TGs is referred to as

$$\mathcal{T} = \bigcup_{i=0}^{N_{TG}-1} \mathcal{T}_i$$

hence $|\mathcal{T}| = N_{TG} \cdot N_T$. Each topology $t_j \in \mathcal{T}_i$ has $N$ nodes $\mathcal{N}_j = \{n_k\}$, where $i = 0, \ldots, N_{TG} - 1$, $j = 0, \ldots, N_T - 1$ and $k = 0, \ldots, N - 1$. Each node $n_k$ is identified by its bi-dimensional coordinates $(x_k, y_k)$ in the topology area, where $0 \le x_k, y_k \le D$. Given two nodes $n_a$ and $n_b$ ($a, b = 0, \ldots, N - 1$), their Euclidean distance is

$$d(n_a, n_b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$$

In WANETs, any device can establish connections with other devices placed within a specific distance, which we refer to as radius $r$. We consider a number $N_R$ of different radii $R = \{r_i\}$, where $i = 0, \ldots, N_R - 1$ and $0 < r_j < r_{j+1} < D$ for $j = 0, \ldots, N_R - 2$.

In general, a topology generator (TG) introduces bias if the topologies it generates are not representative enough of some target application, such as analysing routing protocols. It is not trivial to decide whether a given set of topologies can be considered representative enough of a certain application, let alone to provide general criteria to evaluate the representativeness of a group of topologies regardless of what they are intended to be used for. However, if we consider the universe set $\mathcal{T}_U$, containing all the possible topologies, and a subset of it $S \subset \mathcal{T}_U$, we can investigate to what extent $S$ is representative of $\mathcal{T}_U$ by inspecting the differences between topologies in $S$ and topologies in $\mathcal{T}_U$. We propose to use those differences to analyse the bias of using topologies in $S$ only, i.e. the larger and sharper such differences, the higher the bias.

Although we cannot have in practice a set like $\mathcal{T}_U$, we do have a number of available TGs, $\mathcal{TG}$ (see section 3), which can be used to generate a set of topologies $\mathcal{T}$. While we do not know how representative $\mathcal{T}$ is of $\mathcal{T}_U$, we claim that $\mathcal{T}$ is the best approximation of $\mathcal{T}_U$ we can aim for from a pragmatic point of view. Hence, to measure the bias introduced by a TG $tg_i \in \mathcal{TG}$, we can examine the differences between the topologies it generates, $\mathcal{T}_i$, and the topologies in $\mathcal{T}$.

We propose a two-step methodology to analyse the bias of TGs. The first step is modelling topologies by extracting a number of characteristic features, which provide a compact, numeric representation of topologies and make it possible to measure the differences between them. The second step is computing metrics to quantify the dissimilarities between topologies generated by different TGs. We claim that there are two complementary aspects to investigate when analysing such dissimilarities. On the one hand, the extent of those differences, i.e. how large they are, provides an objective scale of the bias. On the other hand, the peculiarity of those differences, i.e. how distinctive of TGs they are, allows us to figure out whether topologies generated by different TGs are distinguishable from each other by looking at specific aspects, regardless of the extent of the existing differences. We propose the following two approaches.

– Computing the average distance between the topologies generated by a TG and all the topologies in $\mathcal{T}$. By mapping topologies into the space generated by the chosen features, we use Hedges’ g [8], a measure of effect size, to quantify the difference between two populations: the topologies generated by a specific TG and the topologies in $\mathcal{T}$. We refer to such a difference as bias index.
– Assessing the accuracy in distinguishing which TG generated a given topology.
We train a classifier with the topologies in $\mathcal{T}$ and the information on which TG generated each of them, then we test the obtained classification model by measuring its classification accuracy.

We define the features we use to characterise topologies in section 4.1, then we detail how to compute the bias index and the classification accuracy in sections 4.2 and 4.3, respectively.

To choose what features to consider, we focus on the aspects we deem most representative to show variance within topologies. We thus consider the features that characterise the placement of the nodes within the test plane and the relationships between nodes. We extract features by looking at the following aspects of a generic topology $t_j$: inter-node distance, node spatial distribution, node density, shared node neighbours, node clustering coefficient.

Inter-node Distance. We consider the set of node distances $\mathcal{D}$ defined as follows

$$\mathcal{D} = \{ d(n_a, n_b) \mid n_a, n_b \in \mathcal{N}_j,\ 0 \le a < b < N \}$$

The features we extract from $\mathcal{D}$ are (i) the minimum value $d_{min}$, (ii) the maximum value $d_{max}$, (iii) the value range $d_{max} - d_{min}$, (iv) the mode, i.e. the most frequent value (if there are several most-frequent values, by convention we pick the smallest one), (v) how many times the mode occurs, (vi) the mean value and (vii) the standard deviation.
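The seven inter-node distance features can be illustrated with a minimal Python sketch. The function name, the `ndigits` rounding step (exact floating-point ties are rare, so distances are rounded before the mode is taken) and the use of the population standard deviation are assumptions made for this example; the text does not specify them.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def inter_node_distance_features(nodes, ndigits=6):
    """Return the seven inter-node distance features of one topology.

    nodes: list of (x, y) tuples.
    ndigits: rounding applied before counting equal distances (assumption).
    """
    dists = [round(math.dist(a, b), ndigits) for a, b in combinations(nodes, 2)]
    counts = Counter(dists)
    top = max(counts.values())
    # Ties between most-frequent values are resolved by picking the smallest.
    mode = min(d for d, c in counts.items() if c == top)
    return {
        "min": min(dists),
        "max": max(dists),
        "range": max(dists) - min(dists),
        "mode": mode,
        "mode_count": top,
        "mean": mean(dists),
        "std": pstdev(dists),  # population standard deviation (assumption)
    }
```

For instance, the four corners of a 3 by 4 rectangle yield the distances 3, 4 and 5, each occurring twice, so the smallest of them is reported as the mode.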

Spatial Distribution. By taking inspiration from the Quadrat Method [5] used to test the Complete Spatial Randomness hypothesis, we partition the topology area in $d^2$ smaller squares, each with sides $D/d$ units long. Figure 1 shows an example of topology area partitioning.

Fig. 1. Partition of a topology area with 1000 units sides in 100 smaller squares, each with 100 units sides. This partitioning is used to compute spatial distribution features.

We consider the set $\mathcal{NC}$ of node counts, where each element $nc_s \in \mathcal{NC}$ is the number of nodes in the $s$-th partition of the topology area, with $s = 0, \ldots, d^2 - 1$. The features we extract from $\mathcal{NC}$ are (i) the minimum value $nc_{min}$, (ii) the maximum value $nc_{max}$, (iii) the value range $nc_{max} - nc_{min}$, (iv) the mode and (v) how many times the mode occurs.

Node Density. We define the density $nd_r(n_a)$ of a node $n_a \in \mathcal{N}_j$ of a topology $t_j$, for a given radius $r \in R$, as the number of other nodes within distance $r$ from $n_a$, i.e.

$$nd_r(n_a) = |\{ n_b \in \mathcal{N}_j \setminus \{n_a\} \mid d(n_a, n_b) < r \}|$$

We extract as many features $f_{density}$ as the number of radii in $R$, each corresponding to the average node density for a given radius $r$, defined as follows

$$f_{density}(r) = \frac{\sum_{n_a \in \mathcal{N}_j} nd_r(n_a)}{N}, \quad r \in R$$
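The spatial distribution and node density features can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, nodes lying exactly on the upper border of the area are assigned to the last square, and ties for the mode of the node counts are resolved by picking the smallest value, by analogy with the convention stated for distances.

```python
import math
from collections import Counter


def quadrat_features(nodes, side, d):
    """Spatial distribution features from a d x d partition of the area."""
    cell = side / d
    counts = [0] * (d * d)
    for x, y in nodes:
        # Clamp so that x == side or y == side falls in the last square.
        col = min(int(x // cell), d - 1)
        row = min(int(y // cell), d - 1)
        counts[row * d + col] += 1
    freq = Counter(counts)
    top = max(freq.values())
    mode = min(v for v, c in freq.items() if c == top)  # smallest on ties (assumption)
    return {
        "min": min(counts),
        "max": max(counts),
        "range": max(counts) - min(counts),
        "mode": mode,
        "mode_count": top,
    }


def average_node_density(nodes, r):
    """f_density(r): mean number of other nodes strictly closer than r."""
    total = 0
    for i, a in enumerate(nodes):
        total += sum(1 for j, b in enumerate(nodes)
                     if i != j and math.dist(a, b) < r)
    return total / len(nodes)
```

The density loop is quadratic in the number of nodes, which is acceptable for a sketch; a spatial index would be a natural replacement for large topologies.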

Shared Neighbours Distribution. For any given pair of nodes $n_a, n_b \in \mathcal{N}_j$ and radius $r \in R$, the shared neighbours [15] are those nodes within distance $r$ from both $n_a$ and $n_b$. Figure 2 shows an example where nodes 1 and 2 are shared neighbours of nodes 3 and 4. We first introduce the neighbours function $neigh_r(n_a)$ for a node $n_a \in \mathcal{N}_j$ and a radius $r$ as

$$neigh_r(n_a) = \{ n_b \in \mathcal{N}_j \setminus \{n_a\} \mid d(n_a, n_b) < r \}$$

Fig. 2. Nodes 1 and 2 are shared neighbours of nodes 3 and 4. In this case, the value of shared neighbours count for nodes 3 and 4 is 2.

Then we define the shared neighbours count $snc_r(n_a, n_b)$ for nodes $n_a, n_b \in \mathcal{N}_j$ and radius $r \in R$ as

$$snc_r(n_a, n_b) = | neigh_r(n_a) \cap neigh_r(n_b) |$$

We extract as many features $f_{shared\,neigh}$ as the number of radii in $R$, each corresponding to the average shared neighbours count for a given radius $r$, defined as follows

$$f_{shared\,neigh}(r) = \frac{\sum_{n_a, n_b \in \mathcal{N}_j,\ n_a \neq n_b} snc_r(n_a, n_b)}{N(N-1)/2}, \quad r \in R$$

Clustering coefficient. The clustering coefficient [9] of a node $n_a \in \mathcal{N}_j$ is a measure based on the number $c$ of node pairs that lie within distance $r \in R$ from $n_a$ and are neighbours of each other. An example is reported in figure 3. This coefficient is calculated as the ratio between $c$ and the number of neighbours of $n_a$, i.e. its density. More formally, the clustering coefficient $cc_r(n_a)$ of a node $n_a \in \mathcal{N}_j$ for a radius $r \in R$ is defined as

$$cc_r(n_a) = \frac{|\{ n_b, n_c \in neigh_r(n_a) \mid 0 < d(n_b, n_c) < r \}|}{nd_r(n_a)}$$

We extract as many features $f_{clustering}$ as the number of radii in $R$, each corresponding to the average clustering coefficient for a given radius $r$, defined as follows

$$f_{clustering}(r) = \frac{\sum_{n_a \in \mathcal{N}_j} cc_r(n_a)}{N}, \quad r \in R$$
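A possible reading of the shared neighbours and clustering coefficient features is sketched below. Two interpretation choices are assumptions of this example rather than statements of the text: the sum over node pairs is taken over the $N(N-1)/2$ unordered pairs, matching the denominator, and a node without neighbours contributes 0 to the average clustering coefficient, since the ratio is otherwise undefined. Function names are hypothetical.

```python
import math
from itertools import combinations


def neighbours(nodes, a, r):
    """Indices of the nodes strictly closer than r to node a (a excluded)."""
    return {b for b in range(len(nodes))
            if b != a and math.dist(nodes[a], nodes[b]) < r}


def average_shared_neighbours(nodes, r):
    """f_shared_neigh(r), averaged over the N(N-1)/2 unordered node pairs."""
    n = len(nodes)
    neigh = [neighbours(nodes, a, r) for a in range(n)]
    total = sum(len(neigh[a] & neigh[b]) for a, b in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)


def average_clustering(nodes, r):
    """f_clustering(r): mean of (linked neighbour pairs) / (node density)."""
    n = len(nodes)
    neigh = [neighbours(nodes, a, r) for a in range(n)]
    total = 0.0
    for a in range(n):
        if not neigh[a]:
            continue  # assumption: an isolated node contributes 0
        linked = sum(1 for b, c in combinations(sorted(neigh[a]), 2)
                     if c in neigh[b])
        total += linked / len(neigh[a])
    return total / n
```

With four nodes arranged as in figure 2 (two nodes in range of both of the other two, which are out of range of each other), the shared neighbours count of the far pair is 2 and no neighbour pair is linked.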

Fig. 3. The neighbours of node 3 are nodes 1, 2, 4 and 5. Among those neighbours, there is a pair of nodes, i.e. nodes 4 and 5, which are neighbours of each other, while nodes 1 and 2 are not neighbours of any other node.

The bias index of a TG $tg_i \in \mathcal{TG}$ with respect to all the TGs in $\mathcal{TG}$ is measured as the distance between the topologies $\mathcal{T}_i$ generated by $tg_i$ and all the generated topologies $\mathcal{T}$. This distance is computed on the basis of the following feature-based representation of a topology $t_j$

$$t_j = \langle f^j_0, \ldots, f^j_{F-1} \rangle$$

where $f^j_k$ is the value of the $k$-th feature of $t_j$ ($k = 0, \ldots, F - 1$) and $F$ is the number of used features, detailed in section 4.1, equal to $12 + 3 N_R$ (there are 7 features for inter-node distances, 5 features for spatial distribution and as many features as the number of radii $N_R$ for each of node density, shared neighbours distribution and clustering coefficient).

Hedges’ g [8] is used to estimate the standardised mean difference between two populations, i.e. the average distance between the elements of two different populations, measured in standard deviations. Although in its original form it can be applied to single-dimension elements only, we propose to extend Hedges’ g to $F$ dimensions to quantify the difference between topologies in $\mathcal{T}_i$ and in $\mathcal{T}$.

We first detail how to apply Hedges’ g to a single feature $f_k$, where $k = 0, \ldots, F - 1$. We define $\mathcal{T}^k$ and $\mathcal{T}^k_i$ as the projections of $\mathcal{T}$ and $\mathcal{T}_i$ to feature $f_k$, respectively, as follows

$$\mathcal{T}^k = \{ f^j_k \mid t_j = \langle f^j_0, \ldots, f^j_{F-1} \rangle \in \mathcal{T} \}$$

$$\mathcal{T}^k_i = \{ f^j_k \mid t_j = \langle f^j_0, \ldots, f^j_{F-1} \rangle \in \mathcal{T}_i \}$$

Let $m^k$ and $s^k$ be the mean and standard deviation of $\mathcal{T}^k$, respectively. Let $m^k_i$ and $s^k_i$ be the mean and standard deviation of $\mathcal{T}^k_i$, respectively. In compliance with the original formulation, we define Hedges’ g for a single feature $f_k$ as

$$g^k_i = \frac{m^k - m^k_i}{s^{*k}_i}$$

where $s^{*k}_i$ is the pooled standard deviation for $\mathcal{T}^k$ and $\mathcal{T}^k_i$, computed as follows

$$s^{*k}_i = \sqrt{\frac{(|\mathcal{T}^k| - 1) \cdot (s^k)^2 + (|\mathcal{T}^k_i| - 1) \cdot (s^k_i)^2}{|\mathcal{T}^k| + |\mathcal{T}^k_i| - 2}}$$

To compute the bias index $g_i$ for $tg_i$, we combine all the $F$ values $g^k_i$ by considering each of them as a distance along one dimension, as follows

$$g_i = \sqrt{\sum_{k=0}^{F-1} (g^k_i)^2}$$

TGs Selection. The bias index can be used to choose which TG to pick to reduce the possible bias. Selecting the TG with the lowest bias index would correspond to using the set of topologies with the lowest distance, on average, from the whole set $\mathcal{T}$ of available topologies. According to the approach introduced at the beginning of this section, this in turn means choosing the most representative subset of topologies available, if a single TG has to be selected. If more than one TG can be picked, say $p$ out of $N_{TG}$, then the same strategy can be used by considering the possible $\binom{N_{TG}}{p}$ subsets of $\mathcal{T}$, each in the form

$$\mathcal{T}_{i_0, \ldots, i_{p-1}} = \bigcup_{j=0}^{p-1} \mathcal{T}_{i_j}$$

with $\mathcal{T}_{i_j} \subset \mathcal{T}$, $0 \le i_0 < \cdots < i_{p-1} < N_{TG}$, $0 < p < N_{TG}$, and computing the corresponding bias index. Again, the subset with the lowest bias index is the most representative of $\mathcal{T}$. We refer to $g_{i_0, \ldots, i_{p-1}}$ as the bias index of $\mathcal{T}_{i_0, \ldots, i_{p-1}}$.

We consider the accuracy of a classifier trained with the topology generator ground truth $\mathcal{TG}_{GT}$ defined as follows

$$\mathcal{TG}_{GT} = \bigcup_{i=0}^{N_{TG}-1} \{ \langle t_j, tg_i \rangle \mid t_j \in \mathcal{T}_i \}$$

where each pair $\langle t_j, tg_i \rangle$ represents the fact that topology $t_j$ has been generated by TG $tg_i$, i.e. $tg_i$ is the class of $t_j$. We use part of the ground truth for training and the rest for testing, where we compute the actual classification accuracy. To avoid any possible bias deriving from the choice of how the ground truth is split between training and testing, we employ the well-known $k$-fold cross validation [18] method, which works as follows. The ground truth is first partitioned in $k$ equally sized folds, then a classifier is trained from scratch in $k$ different ways by using each time all the folds but one. After each training, the resulting classifier is tested by using the fold excluded during the training and the classification accuracy is recorded. The final accuracy is the average of the $k$ accuracy values obtained during the $k$ trainings.

More formally, let $\mathcal{TG}^l_{GT}$ be the $l$-th partition of $\mathcal{TG}_{GT}$, for $l = 0, \ldots, k - 1$, so that

$$\mathcal{TG}_{GT} = \bigcup_{l=0}^{k-1} \mathcal{TG}^l_{GT}, \qquad \mathcal{TG}^a_{GT} \cap \mathcal{TG}^b_{GT} = \emptyset, \quad 0 \le a < b < k$$

Since all the folds have the same size, we have that $|\mathcal{TG}^l_{GT}| = N_T N_{TG} / k$. Let $C_l : \mathcal{T} \to \mathcal{TG}$ be the function computed by a classifier trained using all the folds except $\mathcal{TG}^l_{GT}$. We define the classification accuracy $a_l$ for the $l$-th fold as the ratio of correctly classified topologies to the total number of classified topologies

$$a_l = \frac{|\{ \langle t_j, tg_i \rangle \in \mathcal{TG}^l_{GT} \mid C_l(t_j) = tg_i \}|}{|\mathcal{TG}^l_{GT}|}$$

The final classification accuracy is defined as

$$a = \frac{\sum_{l=0}^{k-1} a_l}{k}$$

In the absence of bias, the classification accuracy should be close to $1/N_{TG}$. Higher values indicate that there are some features that are peculiar to specific TGs. In that case, a feature analysis can help to identify those features, to understand whether and to what extent they are relevant for the particular experiments the generated topologies have to be used for. The feature analysis can be based on a sequential feature selector [1] algorithm, used to reduce the initial dimension of the feature space. The goal is to create a subset of features that explain the most variance in the dataset. This is done by either adding or removing one feature at a time and measuring the corresponding classification accuracy until convergence is achieved, i.e. the process stops when the accuracy ceases to grow.
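The $k$-fold procedure for the classification accuracy can be sketched in plain Python. This is a minimal sketch under stated assumptions: `train_nearest_centroid` is a hypothetical stand-in classifier chosen only to keep the example self-contained, the ground truth is shuffled with a fixed seed before being split, and folds are only near-equal when the number of samples is not a multiple of $k$.

```python
import random


def train_nearest_centroid(samples):
    """Placeholder classifier (assumption): one centroid per TG label."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for idx, value in enumerate(vec):
            acc[idx] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def classify(vec):
        # Pick the label whose centroid is closest (squared Euclidean distance).
        return min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(vec, centroids[lab])))

    return classify


def k_fold_accuracy(ground_truth, k, train=train_nearest_centroid, seed=0):
    """Average accuracy a over k folds of (feature_vector, tg_label) pairs."""
    data = list(ground_truth)
    random.Random(seed).shuffle(data)
    folds = [data[l::k] for l in range(k)]  # disjoint folds covering the data
    accuracies = []
    for l in range(k):
        training = [s for m, fold in enumerate(folds) if m != l for s in fold]
        classify = train(training)  # C_l: trained on all folds except the l-th
        hits = sum(1 for vec, label in folds[l] if classify(vec) == label)
        accuracies.append(hits / len(folds[l]))  # a_l
    return sum(accuracies) / k
```

Any function that maps a training set to a classifier can be passed as `train`, so the stand-in can be swapped for a different classifier without touching the fold logic.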

We apply the proposed methodology to a number of well-known TGs, detailed in section 5.1. The parameters we choose to instantiate the model (see section 3) are reported in section 5.2. The experiments on bias index and classification accuracy, as well as obtained results, are described in sections 5.3 and 5.4, respectively. We also carry out a more detailed analysis of the impact of each feature on experiment outcomes, reported in section 5.5.

In our experiments, we use three well-known TGs: BRITE, NPART and GT-ITM, described in this section.

BRITE. The Boston University Representative Internet Topology [13] (BRITE) is a universal model-based TG designed to be extensible, so that new models can be added. BRITE uses various models for the placement of nodes, as detailed below.

1. Flat Router model. It represents a router-level topology and is designed for router networks. The placement of nodes is either random or uses the heavy-tailed approach, where the plane is divided into squares, each square is assigned a number of nodes drawn from a heavy-tailed distribution, and then nodes are placed randomly within the square.
2. Flat AS-level model. It is very similar to the Flat Router model except that it generates AS-level topologies.
3. Hierarchical Topologies model. It generates Internet-like topologies. It can be configured to use either a top-down or a bottom-up approach. In the first case, an AS-level topology is first built by using the Flat AS-level model, then for each node a router-level topology is generated. In the second case, a router-level topology is first generated, then AS nodes are introduced and each is linked to a number of router nodes.

The Flat AS-level model is used for our experiments because it is more representative of a mesh network.

NPART. The Node Placement Algorithm for Realistic Topologies [14] (NPART) TG generates topologies based on properties of real networks, i.e. any artificial network is generated randomly but in compliance with a number of properties of the real-world topologies. The generation algorithm used by NPART relies on a number of sociological and technological observations introduced by Aha et al. [1], listed below.

– It is more likely that a new participant joins the network in areas where connectivity is high.
– A participant in the network expects to have at least one communicating link to the rest of the network, possibly creating a large number of pendant nodes.
– A pendant node may become a seed for a new, larger and well-connected sub-network.
– It is the network that specifies the area it occupies, not the other way around. So, instead of defining the node placement area like most of the existing placement algorithms, the network should be allowed to grow.

Configuration options for the generation of topologies are NPART Berlin (based on the real Berlin mesh network consisting of 275 nodes), NPART Leipzig (based on the real Leipzig mesh network consisting of 346 nodes), Uniform placement model (it uses uniform probability placement within the test area), Grid placement (also known as mesh placement, where nodes are located at the intersections of a rectangular grid), Quasi-Grid placement (nodes are placed according to a Gaussian distribution with the mean given by regular grid points) and Random Waypoint Model (a random model for the movement of mobile users and how their location, velocity and acceleration change over time).

The uniform placement model has been used in our experiments.

GT-ITM. The Georgia Tech Internetwork Topology Model [2] (GT-ITM) is a model-based TG that produces topologies resembling wide area networks. Two models are available to decide the node placement: Flat Random Graphs and Hierarchical.

The Flat Random Graphs model distributes nodes randomly over the test plane. This model does not aim to reflect the real world; it is rather meant for simplicity. A variation of this model uses the Waxman probability [19] to produce more realistic topologies.

The Hierarchical model creates a topology by connecting smaller components together according to a larger-scale structure. This suggests that this model has a propensity towards clustering, hence we choose the Flat Random Graphs model, which better represents WANETs.
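Two of the placement strategies mentioned in this section, uniform random placement and a heavy-tailed placement over a grid of squares, can be illustrated with a rough sketch. It does not reproduce the code of BRITE, NPART or GT-ITM. The function names, the Pareto distribution with shape `alpha`, and the choice of drawing one heavy-tailed weight per square and then sampling squares in proportion to those weights (so that exactly `n` nodes are placed) are all assumptions made for this illustration.

```python
import random


def uniform_placement(n, side, rng):
    """Place n nodes uniformly at random over a side x side area."""
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


def heavy_tailed_placement(n, side, d, rng, alpha=1.5):
    """Place n nodes over d x d squares weighted by heavy-tailed draws.

    Each square gets a Pareto-distributed weight (assumption); nodes pick a
    square in proportion to its weight, then a uniform position inside it.
    """
    cell = side / d
    weights = [rng.paretovariate(alpha) for _ in range(d * d)]
    squares = rng.choices(range(d * d), weights=weights, k=n)
    nodes = []
    for s in squares:
        row, col = divmod(s, d)
        nodes.append((col * cell + rng.uniform(0, cell),
                      row * cell + rng.uniform(0, cell)))
    return nodes
```

Passing an explicit `random.Random` instance keeps generated topologies reproducible, which matters when the same set has to be reused across experiments.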

With reference to the system model defined in section 3, we consider the $N_{TG} = 3$ TGs described in the previous section, i.e. $TG = \{\text{BRITE}, \text{NPART}, \text{GT-ITM}\}$, and generate $N_T = 1000$ topologies for each TG. Each topology has $N = 1000$ nodes. The reference topology area has sides $D = 1000$ units long. We evaluate the following $N_R = 8$ radii: R = { , , , , , , , }.

We compute the bias index $g_i$ for each TG and $g_{i,j}$ for each pair of TGs, as described in section 4.2. The results are reported in table 1. BRITE topologies seem to be significantly different from those generated by NPART and GT-ITM, and vice versa, which suggests that using either TG alone would provide a set of topologies significantly different from the set including all the topologies. However, if only one TG has to be selected, NPART generates topologies that are, on average, the least different from those generated by all available TGs (bias index 1.890). If two TGs can be chosen, BRITE and NPART are the best pair to consider (bias index 0.908).

Table 1. Bias index of the considered TGs.

Topology Generator(s)    Bias index
NPART                    1.890
GT-ITM                   2.145
BRITE                    4.282
BRITE + NPART            0.908
BRITE + GT-ITM           0.976
NPART + GT-ITM           2.430
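The selection rule applied to Table 1 amounts to picking, for a given number of TGs, the subset with the lowest bias index. A minimal sketch follows, using the values reported in Table 1; the dictionary layout and the function name `best_subset` are illustrative assumptions, and the bias index itself is taken as given rather than recomputed.

```python
# Bias index values as reported in Table 1, keyed by the TG subset.
bias_index = {
    ("NPART",): 1.890,
    ("GT-ITM",): 2.145,
    ("BRITE",): 4.282,
    ("BRITE", "NPART"): 0.908,
    ("BRITE", "GT-ITM"): 0.976,
    ("NPART", "GT-ITM"): 2.430,
}


def best_subset(bias, size):
    """Return the TG subset of the given size with the lowest bias index."""
    candidates = {tgs: g for tgs, g in bias.items() if len(tgs) == size}
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    for size in (1, 2):
        tgs = best_subset(bias_index, size)
        print(size, "TG(s):", " + ".join(tgs), bias_index[tgs])
```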

In this context, classification accuracy is investigated to understand to what extent topologies can be distinguished with respect to their TG. The higher the classification accuracy, the sharper the differences between topologies generated by different TGs. Although classifiers are commonly selected and tuned to maximise classification accuracy, in this case we are only interested in verifying whether the accuracy can be substantially higher than 1/3, i.e. $1/N_{TG}$ (see section 4.3). We choose Naive Bayes as classifier because of its simplicity and we test three different probability distributions, i.e. Gaussian, Bernoulli and Multinomial, to assess whether results are consistent regardless of the particular distribution. Table 2 shows the classification accuracy for the three algorithms, where it can be observed that all the values are significantly larger than 1/3 (about 33.3%).

Table 2. Classification accuracy for the three classification algorithms.

Algorithm        Classification Accuracy (%)
GaussianNB       77.95
BernoulliNB      58.56
MultinomialNB    70.07
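To make the comparison against the $1/N_{TG}$ chance level concrete, the sketch below trains a small Gaussian Naive Bayes classifier and checks its accuracy against chance. It uses only the Python standard library and synthetic two-feature data generated on the spot. The features, class means and train/test split are assumptions made for illustration and are unrelated to the features and results of the paper.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the log-prior and per-feature mean/variance of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)


def accuracy(model, X, y):
    """Fraction of samples whose predicted class matches the label."""
    return sum(predict(model, r) == l for r, l in zip(X, y)) / len(y)


# Synthetic data: one class per TG, differing only in the first feature.
rng = random.Random(42)
tgs = ["BRITE", "NPART", "GT-ITM"]
X, y = [], []
for k, tg in enumerate(tgs):
    for _ in range(200):
        X.append([rng.gauss(3.0 * k, 1.0), rng.gauss(0.0, 1.0)])
        y.append(tg)

# Shuffle, then hold out 20% of the samples for testing.
order = list(range(len(X)))
rng.shuffle(order)
split = int(0.8 * len(order))
train, test = order[:split], order[split:]

model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
test_accuracy = accuracy(model, [X[i] for i in test], [y[i] for i in test])
chance = 1 / len(tgs)
print(f"accuracy = {test_accuracy:.2f}, chance = {chance:.2f}")
```

An accuracy close to `chance` would indicate that the classifier cannot tell the TGs apart on the chosen features, whereas a markedly higher value signals systematic differences between them.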

We carry out a more detailed analysis of which features weigh most in the classification by applying the Forward Sequential Selection (FSS) method as sequential feature selector (see section 4.3). FSS works sequentially: it starts with an empty set of features and, at each iteration, adds the feature that yields the highest accuracy. Figure 4 shows how the classification accuracy varies with the number of considered features, for a single fold, and that the highest accuracy is achieved with 4 features.

Table 3 details the outcomes of the first 9 iterations of FSS, in terms of classification accuracy and the corresponding set of features that yield such accuracy. Table 4 lists the four features yielding the highest classification accuracy. This feature analysis identifies the most distinguishing characteristics of topologies generated by different TGs, with respect to the three TGs used in our experiments. Whether these differences can introduce bias depends on the specific application those topologies are used for, i.e. on the extent to which those features are relevant for the target scenario.
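The greedy procedure followed by FSS can be sketched as below. The scoring function stands in for the cross-validated classification accuracy; the toy `gains`, the per-feature penalty and the feature names are hypothetical and chosen only so that accuracy peaks before all features are used.

```python
def forward_sequential_selection(features, score, max_features=None):
    """Greedy forward selection.

    Starts from an empty set and, at each iteration, adds the feature
    whose inclusion yields the highest score. Returns one
    (selected_features, score) pair per iteration.
    """
    selected, history = [], []
    remaining = list(features)
    limit = max_features or len(remaining)
    while remaining and len(selected) < limit:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score(selected)))
    return history


# Toy scoring function (assumption): each feature adds a fixed gain and
# every selected feature costs a small penalty.
gains = {"a": 0.30, "b": 0.20, "c": 0.10, "d": 0.0, "e": 0.0}


def toy_score(subset):
    return 0.33 + sum(gains[f] for f in subset) - 0.02 * len(subset)


history = forward_sequential_selection(gains, toy_score)
best_features, best_score = max(history, key=lambda step: step[1])

if __name__ == "__main__":
    for subset, value in history:
        print(len(subset), subset, round(value, 2))
    print("best subset:", best_features)
```

Keeping the whole `history` rather than only the final subset makes it possible to report the accuracy after each iteration and to pick the iteration at which it peaks.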

Fig. 4. Classification accuracy by varying the number of used features for a specific fold.

Table 3. Classification accuracy and corresponding feature sets for the first 9 iterations of FSS.

Features    Accuracy    Feature IDs

Table 4. Set of features yielding the highest classification accuracy.

Feature ID    Feature description
31            Shared neighbours distribution with 30 units radius
6             Minimum inter-node distance
30            Shared neighbours distribution with 20 units radius
14            Node density with 20 units radius

In this paper, we investigate the presence of bias in WANET simulations due to the choice of which TG, or TGs, to use among a fixed number of available TGs. We propose two metrics, namely bias index and classification accuracy, to measure the extent and significance of the distance between topologies generated by a single TG and topologies produced by all available TGs. We also propose a methodology to select which TG, or TGs, to pick to minimise the bias.

We present an experimental evaluation where we compute bias index and classification accuracy for three well-known TGs: BRITE, NPART and GT-ITM. Obtained results show that topologies generated by a single TG are different from those created by all the three TGs, and that the TG which generated a certain topology can be determined with an accuracy well above the 1/3 chance level.

As future work, we plan to carry out additional evaluations to investigate how bias index and classification accuracy are linked to variance in the results of the same experiments performed on different TGs. A number of reference algorithms can be chosen, e.g. routing protocols, and executed on the available generated topologies to verify whether lower values for bias index and classification accuracy can actually lead to reduced variance of obtained results, with respect to the experimental outcomes that would be achieved by using all the available TGs.

An additional, significant piece of future work concerns the sensitivity analysis on both system model parameters and TG configurations, to assess to what extent such tuning affects computed values for bias index and classification accuracy. Furthermore, the choice of which classifier to use for the classification accuracy needs to be investigated by considering a larger number of classification algorithms.

References

[1] David W Aha and Richard L Bankert. A comparative evaluation of sequential feature selection algorithms. In Learning from data, pages 199–206. Springer, 1996.

[2] Kenneth L Calvert, Matthew B Doar, and Ellen W Zegura. Modeling internet topology. IEEE Communications Magazine, 35(6):160–163, 1997.

[3] Bo Cheng and G. Hancke. Energy efficient scalable video manycast in wireless ad-hoc networks. In IECON 2016 - 42nd Annual Conference of the IEEE Industrial Electronics Society, pages 6216–6221, Oct 2016.

[4] C. Cheng and S. Lin. A hole-bypassing routing algorithm for WANETs. In , pages 547–550, Oct 2017.

[5] Peter Greig-Smith. The use of random and contiguous quadrats in the study of the structure of plant communities. Annals of Botany, pages 293–316, 1952.

[6] M. H. Günes and M. B. Akgün. Link-level network topology generation. In Proceedings of 31st International Conference on Distributed Computing Systems Workshops (ICDCSW), 2011.

[7] Oliver Heckmann, Michael Piringer, Jens Schmitt, and Ralf Steinmetz. On realistic network topologies for simulation. In Proceedings of the ACM SIGCOMM workshop on Models, methods and tools for reproducible network research, pages 28–32. ACM, 2003.

[8] L. V. Hedges. Distribution Theory for Glass’s Estimator of Effect size and Related Estimators. Journal of Educational and Behavioral Statistics, 1981.

[9] Paul W. Holland and Samuel Leinhardt. Transitivity in Structural Models of Small Groups. Comparative Group Studies, 1971.

[10] D. Magoni and J.-J. Pansiot. Evaluation of internet topology generators by power law and distance indicators. In Proceedings 10th IEEE International Conference on Networks (ICON 2002). Towards Network Superiority (Cat. No.02EX588), pages 401–406, 2002.

[11] Damien Magoni and J J Pansiot. Analysis and Comparison of Internet Topology Generators. NETWORKING 2002: Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications, 2345:364–375, 2006.

[12] Damien Magoni and Jean-Jacques Pansiot. Influence of network topology on protocol simulation. In International Conference on Networking, pages 762–770. Springer, 2001.

[13] Alberto Medina, Anukool Lakhina, Ibrahim Matta, and John Byers. BRITE: Universal Topology Generation from a User’s Perspective. Technical report, Boston, MA, USA, 2001.

[14] Bratislav Milic and Miroslaw Malek. NPART - node placement algorithm for realistic topologies in wireless multihop network simulation. In Proceedings of the 2nd international conference on simulation tools and techniques, 2009.

[15] S Nowak, M Nowak, and K Grochla. Properties of Advanced Metering Infrastructure Networks’ Topologies. Network Operations and Management Symposium (NOMS), 2014 IEEE, pages 1–6, 2014.

[16] R Rossi, S Fahmy, and N Talukder. A Multi-Level Approach for Evaluating Internet Topology Generators. , pages 9pp.–9 pp., 2013.

[17] M L Sanni, A A Hashim, F Anwar, G S M Ahmed, and S Ali. How to model wireless mesh networks topology. IOP Conference Series: Materials Science and Engineering, 53(1):012037, 2013.

[18] Mervyn Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society. Series B (Methodological), pages 111–147, 1974.

[19] B. M. Waxman. Routing of multipoint connections. IEEE Journal on Selected Areas in Communications, 6(9):1617–1622, Dec 1988.

[20] Y. Xu, J. Liu, Y. Shen, X. Jiang, and T. Taleb. Security/QoS-aware route selection in multi-hop wireless ad hoc networks. In , pages 1–6, May 2016.

[21] Y. Xu, J. Liu, O. Takahashi, N. Shiratori, and X. Jiang. SOQR: Secure optimal QoS routing in wireless ad hoc networks. In