Versatile Robust Clustering of Ad Hoc Cognitive Radio Network
Di Li, Member, IEEE, Erwin Fang, and James Gross, Member, IEEE
Abstract: A cluster structure in cognitive radio networks facilitates cooperative spectrum sensing, routing, and other functionalities. The unlicensed channels that are available to every member of a group of cognitive radio users consolidate the group into a cluster, and the availability of such channels determines the robustness of that cluster against the licensed users' influence. This paper analyses how to form robust clusters in a cognitive radio network, so that more cognitive radio users can benefit from the cluster structure even when the primary users' activity is intense. We provide a formal description of the robust clustering problem, prove it to be NP-hard, and propose a centralized solution; in addition, a distributed solution is proposed to suit the dynamics of the ad hoc cognitive radio network. A congestion game model is adopted to analyse the process of cluster formation, which not only directly informs the design of the distributed clustering scheme, but also guarantees convergence to a Nash equilibrium and bounds the convergence speed. Our clustering solution is versatile enough to fulfill further requirements such as faster convergence and cluster size control. The proposed distributed clustering scheme outperforms related work in terms of cluster robustness, convergence speed, and overhead. Extensive simulations support our claims.
Index Terms: cognitive radio, robust cluster, game theory, congestion game, distributed, centralized, cluster size control.

Author note: D. Li ([email protected]) was with RWTH Aachen University, where this work was conducted. Erwin Fang was with ETH, Switzerland, when this work was conducted. J. Gross ([email protected]) is with KTH Royal Institute of Technology, Sweden.

1 Introduction

Cognitive radio (CR) is a promising technology to address the spectrum scarcity problem [1]. Licensed users access the spectrum allocated to them whenever there is information to be transmitted. In contrast, unlicensed users can access the spectrum via opportunistic spectrum access, i.e., they access the licensed spectrum only after validating that a channel is unoccupied by licensed users; spectrum sensing [2] plays an important role in this process. In this hierarchical spectrum access model [3], the licensed users are also called primary users (PUs), while the unlicensed users are referred to as secondary users and constitute a so-called cognitive radio network (CRN). Regarding the operation of a CRN, efficient spectrum sensing is identified as critical for smooth operation [4]. This can be achieved by cooperative spectrum sensing among multiple secondary users, which has been shown to cope effectively with noise uncertainty and channel fading, thus remarkably improving the sensing accuracy [5]. Collaborative sensing relies on the consensus of CR users within a certain area; in this regard, clustering is regarded as an effective method to realize cooperative spectrum sensing [6], [7]. Clustering is a process of grouping certain users in proximity into a collective. Clustering is also efficient for coordinating the channel switch operation when primary users are detected by at least one CR node residing in the cluster.
The cluster head can enable all the CR devices within the same cluster to stop payload transmission swiftly on the operating channel and to vacate the channel [8]. In addition to the collaborative sensing advantage, the use of clusters is beneficial as it reduces the interference between cognitive clusters [9]. Clustering algorithms have also been proposed to support routing in cognitive radio networks [10].

Clusters are formed at the very beginning of network operation, and re-formed periodically according to the dynamics of the CRN. Each formed cluster has one or multiple unlicensed channels which are available to every CR node in the cluster. These available channels are referred to in the remainder of this paper as common licensed channels (common channels for short, abbreviated as CCs). Both payload and control overhead can be transmitted on the CCs. When one or several cluster members cannot access a certain CC because primary user activity is detected on it, that channel is excluded from the set of CCs. In particular, if that channel is being used for payload communication, the communicating pair will stop and resume the transmission on another available channel from the CCs. The availability of CCs within a cluster defines the existence of that cluster, i.e., if no CCs are available, the corresponding cluster no longer exists. In the context of a CRN, the activity of primary users is usually unknown to the secondary users; thus, when the primary users' activity is regarded as random, a cluster which secures more CCs can anticipate a longer lifespan. Obviously, fewer members in a cluster yield more CCs, but obtaining more CCs by decreasing the cluster size contradicts the motivation of clustering, i.e., benefiting the cluster members through cooperative decision making. For example, spectrum sensing accuracy correlates with the cluster size [11], and power consumption does not favor small clusters [12], [13].
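The CC bookkeeping described above reduces to a simple set-intersection rule. The following sketch illustrates it (the node names and channel indices are hypothetical; this is only an illustration of the rule, not the paper's algorithm):

```python
def common_channels(avail):
    """CC set of a cluster = intersection of every member's available channels."""
    sets = list(avail.values())
    cc = set(sets[0])
    for s in sets[1:]:
        cc &= set(s)
    return cc

def on_primary_user_detected(avail, member, channel):
    """One member loses `channel` to a primary user; recompute the CC set."""
    avail[member].discard(channel)
    return common_channels(avail)

# Hypothetical three-node cluster.
avail = {"A": {1, 2, 3}, "B": {1, 2, 4}, "C": {1, 2, 5}}
assert common_channels(avail) == {1, 2}
# A PU appears on channel 1 near node B: the cluster survives on channel 2.
assert on_primary_user_detected(avail, "B", 1) == {2}
```

If the last remaining CC were lost as well, the intersection would become empty and the cluster would cease to exist, which is exactly the robustness concern motivating this paper.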
In this paper, the robustness of clusters means their ability to sustain increasing activity of primary users.

Footnote 1: The terms user and node are used interchangeably in this paper. In particular, user is adopted when networking or cognitive ability is discussed or stressed, while node is typically used in the context of topology.

There has been much research on clustering in wireless networks. In ad hoc and mesh networks, the major goal of clustering is to maximize connectivity or to improve the performance of routes [14], [15]. The emphasis of clustering in sensor networks is on network lifetime and coverage [10]. With respect to CRNs, [7], [16], [17] propose clustering schemes in which securing CCs is the only goal. The clustering scheme in [11] improves spectrum sensing accuracy with the cluster structure. [12], [18] target QoS provisioning and energy efficiency with the cluster structure. [19] forms clusters to coordinate the control channel usage within a cluster. An event-driven clustering scheme is proposed for cognitive radio sensor networks in [20]. None of the above schemes provides the formed clusters with robustness against primary users. A clustering scheme (denoted SOC) designed to generate robust clusters against primary users is proposed in [21]. SOC involves three phases of distributed execution. In the first phase, every secondary user forms clusters with some one-hop neighbors; in the second and third phases, each secondary user seeks either to merge other clusters or to join one of them. The metric adopted by every secondary user in all three phases is the product of the number of CCs and the cluster size. The drawbacks are as follows: although the adopted metric considers both cluster size and the number of CCs, cluster formation can easily be dominated by only one factor, e.g., a node which can access many channels may exclude its neighbors and form a cluster by itself. In addition, this scheme leads to high variance of the cluster sizes, which is not desired in certain applications, as discussed in [12], [19].
[22] presents a heuristic method to form clusters; although the authors claim robustness as one of their goals, the minimum number of clusters is what is finally pursued.

A distributed clustering scheme, ROSS, is proposed in [23] under a game-theoretic framework. Compared with the clustering schemes introduced above, the clusters are formed faster and possess more CCs within and among clusters than SOC. However, that work does not intervene against the outcome of singleton clusters, although the cluster sizes are not as divergent as with SOC. Furthermore, it does not consider the robustness of clusters against increasing activity of primary users, which leaves its claim of robustness unverified. This paper builds on the work in [23], but extends it in two directions. First, this paper adopts a new metric of robustness, and considers the clusters when the primary users' activity becomes more dynamic. Second, this paper proposes a size control mechanism, which solves the problem of divergent cluster sizes in [23] and [21]. Besides, this paper provides a comprehensive analysis of the robust clustering problem and proposes a centralized solution. The new extensions are made on the basis of ROSS and its lightweight version; the latter involves less overhead and is thus more suitable for scenarios where fast deployment is desired. Throughout this paper, we refer to the clustering schemes based on ROSS as variants of ROSS, which include the fast versions and the version with the size control function.

The rest of the paper is organized as follows. We present the system model and the robust clustering problem in Section 2. The centralized and distributed solutions are introduced in Sections 3 and 4, respectively. Extensive performance evaluation is presented in Section 5. Finally, we conclude our work and point out directions for future research in Section 6.

2 System Model and Problem Formulation

We consider a set of CR users N and a set of primary users distributed over a given area.
A set of licensed channels K is available to the primary users. The CR users are allowed to transmit on channel k ∈ K only if no primary user is detected on channel k. CR users conduct spectrum sensing independently and sequentially on all licensed channels (Footnote 2). We adopt the unit disk model [24] for both primary and CR users' transmissions. If a CR node i is located within the transmission range of an active primary user p, then i is not allowed to use the channel which is being used by p. We assume the primary users change their operating channels slowly, so we omit the time index for the spectrum availability, i.e., as the result of spectrum sensing, K_i ⊆ K denotes the set of available licensed channels for CR user i at a given time point. As the transmission range of primary users is limited and secondary users are at different locations, different secondary users may have different views of the spectrum availability, i.e., for any i, j ∈ N, K_i = K_j does not necessarily hold. A cognitive radio network can be represented as a graph G = (N, E), where E ⊆ N × N such that {i, j} ∈ E if and only if K_i ∩ K_j ≠ ∅ and d_{i,j} < r, where d_{i,j} is the distance between i and j, and r is the radius of a secondary user's transmission range. Among the secondary users, we denote by Nb(i) user i's neighborhood, which consists of the CR nodes located within the transmission range of i.

We assume there is one dedicated control channel which is used to exchange signaling messages during the clustering process (Footnote 3). This control channel could be one of the ISM bands or other reserved spectrum which is exclusively used for transmitting control messages. Over the control channel, a secondary user i can exchange its spectrum sensing result K_i with all its one-hop neighbors Nb(i). In the following, we refer to a licensed channel as a channel in general, and will explicitly mention the dedicated control channel when necessary.

We give the definition of a cluster in a CRN as follows.
A cluster C is a set of secondary nodes which possess the same set of CCs. In particular, a cluster consists of a cluster head h(C) and a number of cluster members, and the cluster head is able to communicate with any cluster member directly. A cluster can be composed of the cluster head alone. Nb(i) denotes node i's neighborhood, which consists of all its one-hop neighbors. The cluster size of C is written as |C|. Cluster C(i) means the cluster head of this cluster is i. K(C) denotes the set of CCs in cluster C, K(C) = ∩_{i∈C} K_i. The notations used in the system model and the following problem description are listed in Table 1.

As introduced in Section 1, in order to be robust against primary users' activity, the formed clusters should have more CCs so as to expect a longer lifespan. On the other hand, the sizes of the formed clusters should not diverge greatly from the desired size. The formation of small clusters or singleton clusters, i.e., clusters which have only one CR node, contradicts the motivation of forming clusters, as the benefits brought by the collective of
the cluster members are compromised. On the other hand, large clusters are not preferred in some scenarios either; e.g., for a CRN composed of resource-limited users, managing the cluster members in a large cluster is a substantial burden. Hence, the cluster size should fall in a desired range according to the application scenario [27], [28]. Considering the above requirements, we present the definition of the robust clustering problem as follows.

Footnote 2: We assume that every node can detect the presence of an active primary user on each channel with certain accuracy. The spectrum availability can be validated with a certain probability of detection. Spectrum sensing/validation is out of the scope of this paper.

Footnote 3: Actually, the control messages involved in the clustering process can also be transmitted on the available licensed channels through a rendezvous process by channel hopping [25], [26], i.e., two neighboring nodes establish communication on the same channel.

TABLE 1: Notations
  N (network)  - set of CR users in a CRN
  N            - number of CR users in a CRN, N = |N|
  K            - set of licensed channels
  k(i)         - the working channel of user i
  Nb(i)        - the neighborhood of CR node i
  C(i)         - a cluster whose cluster head is i
  K_i          - the set of available channels at CR node i
  K(C(i))      - the set of available CCs of cluster C(i)
  h(C)         - the cluster head of a cluster C
  δ            - the preferred cluster size
  S_i          - the set of claiming clusters, each of which includes debatable node i after Phase I
  d_i          - individual connectivity degree of CR node i
  g_i          - neighborhood connectivity degree of CR node i
  f(C)         - the number of CCs of a cluster C, used in the problem description
  S            - the collection of all possible clusters in N
  C_i          - the i-th cluster in S

Definition 1 (Robust clustering problem in CRN). Given a cognitive radio network N where |N| = N, the collection of all possible clusters (Footnote 4) in N is denoted as S, where S = {C_1, C_2, . . . , C_|S|} (Footnote 5) and ∪_{1≤i≤|S|} C_i = N.
With the requirements on the cluster size enforced, i.e., the desired size is δ and the cluster sizes should fall in the range [δ_min, δ_max], where δ, δ_min, δ_max ∈ Z+ and δ_min ≤ δ ≤ δ_max, a feasible clustering solution is a subcollection S′ ⊆ S which satisfies δ_min ≤ |C_i| ≤ δ_max, ∪_{C_i∈S′} C_i = N, and C_{i′} ∩ C_i = ∅ for all C_{i′}, C_i ∈ S′ with i′ ≠ i. The optimal clustering solution is the feasible clustering solution whose sum of the numbers of CCs over its clusters, Σ_{C∈S′} f(C), is maximal.

3 Centralized Solution for Robust Clustering

Based on Definition 1, the decision version of this problem is to determine whether there is a non-empty S′ ⊆ S such that Σ_{C∈S′} f(C) ≥ λ, where λ is a real number. We have the following theorem on the complexity of this problem.

Theorem 3.1.
The robust clustering problem in a CRN is NP-hard when the maximum cluster size is larger than 3, with δ_min = 1 and δ_max = N.

The proof is in the Appendix. We propose a centralized solution which solves an optimization problem with standard solvers. To formulate the optimization, we need
to do some preparation beforehand. First, all the possible clusters (Footnote 4) complying with the description in the system model are found and constitute a set S (Footnote 5). Second, we assign a size weight to each cluster, which correlates with the difference between the cluster size and the desired cluster size. In particular, with |S| = M, C_i ∈ S the i-th cluster in S, and δ the desired size, the size weight of each cluster is given as follows:

  p_i(C_i) = 0     if |C_i| = δ
  p_i(C_i) = ρ_1   if |C_i| = δ − 1 or |C_i| = δ + 1
  p_i(C_i) = ρ_2   if |C_i| = δ − 2 or |C_i| = δ + 2
  ...

where ρ_1, ρ_2, · · · are positive values. In particular, ρ increases with the divergence between |C_i| and δ, i.e., ρ_1 < ρ_2 < · · ·. The optimization chooses from S certain clusters which together constitute the whole CRN without overlap between any two of them, while the sum of CCs of the chosen clusters is maximized. A central node (or controller) with knowledge of all CR nodes (and hence of all possible clusters) solves the problem based on the following formulation:

  min_{w_i, x_ij}  Σ_{j=1}^{N} Σ_{i=1}^{M} (w_i · t_ij)
  subject to  Σ_{i=1}^{M} x_ij = 1,           for all j = 1, . . . , N
              Σ_{j=1}^{N} x_ij = |C_i| · w_i,  for all i = 1, . . . , M
              x_ij and w_i are binary variables, i ∈ {1, . . . , M}, j ∈ {1, . . . , N}     (1)

The objective is to maximize the sum of CCs of all the chosen clusters. w_i and x_ij are the two binary variables in this problem. N is the total number of CR users in network N, and M = |S|. Being either 1 or 0, w_i denotes whether the i-th cluster C_i in S is chosen to be in the solution. x_ij indicates whether CR node j resides in the i-th potential cluster, i.e., x_ij = 1 if and only if j resides in cluster C_i. The node index j is identical to the node ID. t_ij is a constant, t_ij = p_i(C_i) − q_ij/|C_i|.

Footnote 4: A possible cluster means a collection of CR nodes which complies with the cluster definition in Section 2.

Footnote 5: The subscripts of the clusters can be assigned in any convenient way, e.g., the sequence in which they are identified.
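On a toy instance, the optimum of Definition 1 can be checked by exhaustive enumeration instead of an ILP solver. The sketch below maximizes Σ_{C∈S′} f(C) directly under the size bounds; it omits the size weights p_i, and the channel sets are hypothetical:

```python
from itertools import combinations

def solve_robust_clustering(avail, size_lo, size_hi):
    """Exhaustively pick disjoint clusters covering all nodes, maximizing the
    total number of CCs.  Exponential in the network size; illustration only."""
    nodes = sorted(avail)
    # Candidate clusters: node subsets within the size bounds sharing >= 1 channel.
    cands = []
    for r in range(size_lo, size_hi + 1):
        for sub in combinations(nodes, r):
            cc = set.intersection(*(avail[n] for n in sub))
            if cc:
                cands.append((frozenset(sub), len(cc)))
    best_clusters, best_total = None, -1
    def rec(remaining, chosen, total):
        nonlocal best_clusters, best_total
        if not remaining:
            if total > best_total:
                best_clusters, best_total = list(chosen), total
            return
        pivot = min(remaining)   # smallest uncovered node must be covered next
        for sub, ncc in cands:
            if pivot in sub and sub <= remaining:
                chosen.append(sub)
                rec(remaining - sub, chosen, total + ncc)
                chosen.pop()
    rec(frozenset(nodes), [], 0)
    return best_clusters, best_total

avail = {"A": {1, 2, 3}, "B": {1, 2}, "C": {2, 3}, "D": {3, 4}}
clusters, total_cc = solve_robust_clustering(avail, 1, 2)   # singletons win: 3+2+2+2 = 9 CCs
pairs, pair_cc = solve_robust_clustering(avail, 2, 2)       # forcing pairs costs CCs: best is 3
```

The two calls illustrate the tension discussed in Section 1: without a lower size bound the optimum degenerates to singletons, which is exactly what the size weights p_i in formulation (1) are meant to penalize.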
Here q_ij = |K(C_i)| when j ∈ C_i, and q_ij = 0 when j ∉ C_i. |K(C_i)| is the number of CCs of cluster C_i, and |C_i| is the size of cluster C_i.

Now we examine whether the objective function is in line with the goal of maximizing the total number of CCs while respecting the restriction on the cluster size. The objective function can be written as

  min_{w_i, x_ij}  Σ_{j=1}^{N} Σ_{i=1}^{M} ( −w_i · q_ij/|C_i| + w_i · p_i(C_i) )

The sum of the first terms is the sum of CCs of all the chosen clusters: for a chosen cluster C_i, the term q_ij/|C_i| = |K(C_i)|/|C_i| appears once for each of the |C_i| nodes j ∈ C_i, summing to |K(C_i)|. The minus sign in front of the first term explains why we minimize instead of maximize the objective function. As to the second term, when w_i is nonzero (C_i is chosen), if |C_i| ≠ δ the second component is positive, which works against the direction of the optimization. Thus the second term discourages the appearance of clusters whose sizes differ from δ, especially those whose sizes diverge far from δ.

The constraints guarantee that the obtained clusters together include all the CR users and do not overlap. The first constraint requires that each CR node resides in exactly one cluster. The second constraint requires that when the i-th possible cluster C_i is chosen, exactly |C_i| CR nodes reside in C_i.

This problem is a binary linear programming problem, which can be solved by many available solvers. The difficulty of using this method lies in the preparation of the set S. In the worst case, i.e., when the CRN forms a fully connected graph, the size of S is Σ_{r=1}^{N} (N choose r) = 2^N − 1. To alleviate this problem, a smaller set, i.e.,
G ⊂ S, can be used. G can be prepared based on the cluster size, and it is recommended to include all the singleton clusters to ensure the availability of feasible solutions.

Another obstacle to applying this centralized scheme is that the centralized entity first needs to collect information from all the CR nodes, then compute the clustering solution and distribute it across the whole network. This process involves a large amount of communication overhead. In a CRN, it is necessary to perform clustering again on some occasions, for example, when a certain number of CR users move away from their clusters, i.e., they lose the direct connection with every member of their previous clusters, or when a certain number of clusters can no longer be maintained because the CCs in these clusters no longer exist due to primary users' activity. Hence, when the spectrum availability and the CR users' locations change frequently, the centralized robust clustering scheme is not suitable for a CRN.

4 Distributed Clustering Algorithm: ROSS

In this section we introduce the distributed clustering scheme ROSS. With ROSS, CR nodes form clusters based on the proximity of the available spectrum in their neighborhood, after a series of interactions with their neighbors. ROSS consists of two cascaded phases: cluster formation and membership clarification. In the first phase, clusters are formed quickly and every CR user becomes either a cluster head or a cluster member. In the second phase, non-overlapping clusters are formed in such a way that the CCs of the relevant clusters are increased the most.
We assume that before clustering is conducted, spectrum sensing, neighbor discovery, and the exchange of spectrum availability have been completed, so that every CR node is aware of the available channels of itself and of its neighbors. In this phase, cluster heads are determined after a series of comparisons with their neighbors. Two metrics are proposed to characterize the proximity, in terms of available spectrum, between CR node i and its neighborhood; they are used in the comparisons that decide the cluster heads.

• Individual connectivity degree d_i: d_i = Σ_{j∈Nb(i)} |K_i ∩ K_j|. d_i is the total number of CCs between node i and each of its neighbors. It is an indicator of node i's adhesion to the CRN.
• Neighborhood connectivity degree g_i: g_i = |∩_{j∈Nb(i)∪{i}} K_j|. It is the number of CCs which are available to i and all its neighbors. g_i represents the ability of i to form a robust cluster with its neighbors.

Individual connectivity degree d_i and neighborhood connectivity degree g_i together form the connectivity vector. Figure 1 illustrates an example CRN where every node's connectivity vector is shown.

Fig. 1: Connectivity graph of the example CRN and the connectivity vector (d_i, g_i) for each node. The desired cluster size is δ = 3. The sets of the indices of the available channels sensed by each node are: K_A = {…}, K_B = {…}, K_C = {…}, K_D = {…}, K_E = {…}, K_F = {…}, K_G = {…}, K_H = {…}. A dashed edge indicates that the end nodes are within each other's transmission range.

The procedure for determining the cluster heads is as follows. Each CR node decides whether it is a cluster head by comparing its connectivity vector with those of its neighbors. When CR node i has a lower individual connectivity degree than all its neighbors, except those which have already been identified as cluster heads (Footnote 6), node i becomes a cluster head. If there is another CR node j in its neighborhood which has the same individual connectivity degree as i, i.e., d_j = d_i and d_j < d_k for all k ∈ Nb(j) \ {Λ ∪ i}, where Λ denotes the set of cluster heads, then whichever of i and j has the higher neighborhood connectivity degree becomes the cluster head, and the other node becomes a member of the newly identified cluster head's cluster. If g_i = g_j as well, the node ID is used to break the tie, i.e., the one with the smaller node ID becomes the cluster head. A node which is identified as a cluster head broadcasts a message to notify its neighbors of this change, and its neighbors which are not cluster heads become cluster members. The pseudo code for the cluster head decision and the initial cluster formation is shown in Algorithm 1 in the appendix.

After receiving the notification from a cluster head, a CR node i knows that it has become a member of a cluster. Consequently, i sets its individual connectivity degree to a positive number M > |K| · N and broadcasts the new value to all its neighbors. When a CR node i is associated with multiple clusters, i.e., i has received notifications of cluster head eligibility from multiple CR nodes, d_i is still set to M. This manipulation of the cluster members' individual connectivity degrees speeds up the completion of cluster head selection.
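The connectivity vector and the head-election comparison can be sketched as follows. This is a simplified, synchronous sketch with hypothetical neighbor lists and channel sets; decided members are simply excluded from further comparison, which has the same effect as setting their degree to the large constant M:

```python
def connectivity_vector(i, nbrs, K):
    """d_i = sum of pairwise common channels with each neighbor;
    g_i = channels common to i and its whole neighborhood."""
    d = sum(len(K[i] & K[j]) for j in nbrs[i])
    g = len(set.intersection(K[i], *(K[j] for j in nbrs[i])))
    return d, g

def elect_heads(nbrs, K):
    """Round-based sketch of the election: an undecided node whose
    (d, -g, id) is strictly smallest among its undecided neighbors becomes
    a head, and its undecided neighbors become members."""
    d, g = {}, {}
    for i in nbrs:
        d[i], g[i] = connectivity_vector(i, nbrs, K)
    role = {i: None for i in nbrs}       # None = still undecided
    key = lambda n: (d[n], -g[n], n)     # lower d wins, then higher g, then smaller id
    changed = True
    while changed:
        changed = False
        for i in sorted(nbrs):
            if role[i] is not None:
                continue
            if all(key(i) < key(j) for j in nbrs[i] if role[j] is None):
                role[i] = "head"
                for j in nbrs[i]:
                    if role[j] is None:
                        role[j] = "member"
                changed = True
    return role

# Hypothetical 4-node chain: 1 - 2 - 3 - 4.
nbrs = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
K = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3}, 4: {3, 4}}
roles = elect_heads(nbrs, K)   # nodes 1 and 4 win their local comparisons
```

In this toy chain, nodes 1 and 4 have the locally smallest individual connectivity degrees and become heads, while 2 and 3 become members.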
We have the following theorem to show that, as long as a secondary user's individual connectivity degree is greater than zero, every secondary user will eventually either be integrated into a certain cluster or become a cluster head.

Theorem 4.1.
Given a CRN, it takes at most N steps until every secondary user either becomes a cluster head or gets included into at least one cluster.
Here, by a step we mean one secondary user executing Algorithm 1 once. The proof is in the Appendix. The procedure of the proof also illustrates the time needed to execute Algorithm 1. Consider an extreme scenario where all the secondary nodes execute Algorithm 1 sequentially, i.e., they constitute a list as discussed in the example in the proof. If one step can be finished within a certain time T, then the total time needed for the network to execute Algorithm 1 is N · T.
Footnote 6: The reasons for the occurrence of cluster heads in the neighborhood of a new cluster head are explained in Sections 4.1.2 and 4.1.3.
In other scenarios, as Algorithm 1 can be executed concurrently by secondary users located in different places, the needed time can be considerably reduced. Let us apply Algorithm 1 to the example shown in Figure 1. Nodes B and H have the same individual connectivity degree, i.e., d_B = d_H. As g_H > g_B, H becomes the cluster head, and cluster C(H) is {H, B, A, G}.

After executing Algorithm 1, certain formed clusters may not possess any CCs. As decreasing the cluster size increases the CCs within a cluster, for those clusters having no CCs, certain nodes need to be eliminated to obtain at least one CC. The elimination is performed according to an ascending list of nodes sorted by the number of common channels between each node and the cluster head. In other words, the cluster member which has the fewest common channels with the cluster head is excluded first. If there are multiple nodes having the same number of common channels with the cluster head, the node whose elimination brings in more common channels is excluded. If this criterion also meets a tie, the tie is broken by deleting the node with the smaller node ID. It is possible that the cluster head excludes all its neighbors, resulting in a singleton cluster composed of itself alone. The pseudo code for this procedure is shown in Algorithm 2. As for the nodes which are eliminated from their previous clusters, they restore their original individual connectivity degrees, execute Algorithm 1, and afterwards either become cluster heads or get included into other clusters, according to Theorem 4.1.

During Phase I, whenever a CR node is determined to be a cluster head and accordingly forms a cluster, or its cluster's composition changes, the cluster head broadcasts the updated information about its cluster, which includes the sets of available channels of all its cluster members.
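The elimination rule described above (the pseudo code is Algorithm 2 in the appendix) can be sketched as follows; the node names and channel sets are hypothetical:

```python
def prune_until_cc(head, members, K):
    """While the cluster has no CC, drop the member sharing the fewest
    channels with the head; ties are broken by the removal that recovers
    the most CCs, then by the smaller node id."""
    members = list(members)
    def ccs(nodes):
        # CC set of head plus the given members.
        return set.intersection(K[head], *(K[n] for n in nodes)) if nodes else set(K[head])
    removed = []
    while members and not ccs(members):
        def key(m):
            rest = [n for n in members if n != m]
            return (len(K[head] & K[m]), -len(ccs(rest)), m)
        victim = min(members, key=key)
        members.remove(victim)
        removed.append(victim)
    return members, removed

# Hypothetical cluster: head H plus members A, B, C with no common channel.
K = {"H": {1, 2, 3}, "A": {4, 5}, "B": {1, 2}, "C": {2, 3, 6}}
kept, dropped = prune_until_cc("H", ["A", "B", "C"], K)
# A shares no channel with H and is dropped first; {H, B, C} then share channel 2.
```

Node A, which shares nothing with the head, is eliminated first, after which the remaining cluster {H, B, C} secures channel 2 as its CC and the pruning stops.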
In this subsection, we illustrate the pressing necessity of controlling the cluster size when the CRN becomes denser.

We consider a cluster C(i), where i is the cluster head, in a dense CRN. To simplify the analysis, we assume no cluster heads are generated within i's neighborhood during the procedure of guaranteeing CCs. Assuming the CR users and PUs are evenly distributed and the PUs occupy the licensed channels randomly, both the CR node density and the channel availability in the CRN can be regarded as spatially homogeneous. In this case the formed clusters are determined by the transmission range and the network density. According to Algorithm 1, the nearest cluster heads can be located just outside node i's transmission range. An instance of this situation is shown in Figure 2. In the figure, black dots represent cluster heads and the circles denote the transmission ranges of cluster heads; cluster members are not shown. Let l be the side length of the square simulation plane, and r the CR transmission radius. Based on the aforementioned analysis and the geometric illustration in Figure 2, we give an estimate of the maximum number of generated clusters, which is the product of the number of cluster heads in one row and that in one column, (l/r) · (l/r) = l²/r². Given r = 10, Figure 3 plots the number of generated clusters against the network density, together with this theoretical upper bound and the average number of neighbours.

Fig. 3: The correlation between the number of formed clusters and network density.

Both analysis and simulation show that when applying ROSS, after the clusters become saturated with increasing network density, the cluster size grows linearly with the network density; thus certain measures are needed to curb this problem. This task falls to the cluster heads. To control the cluster size, cluster heads prune their cluster members to reach the desired cluster size. The desired size δ is decided based on the capability of the CR users and the tasks to be carried out. As there are overlaps between neighboring clusters, the sizes of the clusters formed in this phase are larger than those of the finally formed clusters. Hence, a cluster head excludes some cluster members when the cluster size exceeds t · δ, where the constant parameter t depends on the network density and the CR nodes' transmission range, and t >
1. In particular, the cluster head removes cluster members sequentially according to the following principle: the absence of one cluster member should lead to the maximum increase of common channels within the cluster. This process ends when each cluster's size is smaller than or equal to t · δ. This procedure is similar to guaranteeing the existence of CCs in a cluster, and thus can reuse Algorithm 2. t is set to 1.3.

As for the example CRN shown in Figure 1, the clusters resulting from Phase I of ROSS are shown in Figure 4. We notice that nodes A, B, D are included in more than one cluster. We refer to these nodes as debatable nodes, as their cluster affiliations are not yet decided. The clusters which include debatable node i are called the claiming clusters of node i, and the set of these clusters is denoted as S_i. The debatable nodes generated in the first phase of ROSS should be exclusively associated with only
one cluster and be removed from the other claiming clusters; this procedure is called cluster membership clarification.

Fig. 4: Cluster formation after Phase I of ROSS. CR nodes A, B, D are debatable nodes, as they belong to multiple clusters.

Assume a debatable node i needs to decide on one cluster C ∈ S_i in which to stay, thereafter leaving the other clusters in S_i. In this process, the principle for i is that its move should result in the greatest increase of CCs across all its claiming clusters. Note that node i is aware of the spectrum availability of all the cluster members of each claiming cluster; thus node i is able to calculate how many more CCs would be produced in a claiming cluster if i left that cluster. If there exists a cluster C ∈ S_i such that i's leaving C brings a smaller increase of CCs than leaving any other claiming cluster would, then i chooses to stay in cluster C. When there is a tie, i chooses to stay in the claiming cluster whose cluster head shares the most CCs with i. In case multiple claiming clusters are still tied on the aforementioned metric, node i chooses to stay in the claiming cluster which has the smallest size. The node IDs of the cluster heads are used to break the tie if all the previous metrics cannot decide on a unique claiming cluster for i to stay in. The pseudo code of this algorithm is given as Algorithm 3. After deciding its membership, debatable node i notifies all its claiming clusters of its choice, and the claiming clusters which node i leaves broadcast their new cluster composition and the spectrum availability of all their cluster members.

The autonomous decisions made by the debatable CR nodes raise the concern of an endless chain effect in the membership clarification phase. A debatable node's choice depends on the compositions of its claiming clusters, which can be changed by other debatable nodes' decisions.
As a result, the debatable node which makes its decision first may later change its original choice, and this process may go on forever. To remove this concern, we formulate the process of membership clarification as a game, in which an equilibrium is reached after a finite number of best response updates made by the debatable nodes.

Game theory is a powerful mathematical tool for studying, modelling and analysing the interactions among individuals. A game consists of three elements: a set of players, a selfish utility for each player, and a feasible strategy space for each player. In a game, the players are rational and intelligent decision makers, each associated with an explicit, formalized incentive (the utility or cost). Game theory provides standard procedures to study the equilibria of a game [29]. In the past few years, game theory has been extensively applied to problems in communications and networking [30], [31]. The congestion game is an attractive game model which describes problems where participants compete for limited resources in a non-cooperative manner; it has the desirable property that a Nash equilibrium can be reached after a finite number of steps of best response dynamics, i.e., each player chooses the strategy that maximizes/minimizes its utility/cost with respect to the other players' strategies. Congestion games have been used to model certain problems in internet-centric applications or cloud computing, where self-interested clients compete for centralized resources and meanwhile interact with each other, for example, server selection in distributed computing platforms [32], or users downloading files from the cloud.

To formulate the debatable nodes' membership clarification as the desired congestion game, we observe this process from a different (or opposite) perspective. From the new perspective, the debatable nodes are regarded as isolated and not belonging to any cluster; in other words, their claiming clusters become clusters beside them.
Now, for the debatable nodes, the previous problem of deciding which clusters to leave becomes a new problem of which cluster to join. In the new problem, debatable node i chooses to join the cluster C ∈ S_i whose decrease of CCs, ∆|K(C)| = |K(C)| − |K(C ∪ i)|, is the smallest among the clusters in S_i. The interaction between the debatable nodes and the claiming clusters is shown in Figure 5.

Fig. 5: Debatable nodes and claiming clusters

In the following, we show that the decisions of the debatable nodes to clarify their membership can be mapped to the behaviour of the players in a player-specific singleton congestion game when a proper cost function is given. The game to be constructed is represented by a 4-tuple Γ = (P, R, (Σ_i)_{i∈P}, f), and the elements of Γ are explained below.
• P, the set of players in the game, which are the debatable nodes in our problem.
• R = ∪_{i∈P} S_i denotes the set of resources for the players to choose from; in our problem, S_i is the set of claiming clusters of node i, and R is the set of all claiming clusters.
• The strategy space Σ_i, i ∈ P, is the set of claiming clusters S_i. As debatable node i is supposed to choose only one claiming cluster, only one resource will be allocated to i.
• The utility (cost) function f(C) for a resource C is f(C) = ∆|K_i(C)|, C ∈ S_i, which represents the decreased number of CCs in cluster C when debatable node i joins C. For a cluster C, the decrease of CCs caused by including the debatable nodes is Σ_{i: C∈S_i, i→C} ∆|K_i(C)|, where i → C means i joins cluster C. Obviously this function is non-decreasing with respect to the number of nodes joining cluster C.
The utility function f is not purely decided by the number of players accessing the resource (debatable nodes joining claiming clusters), as is the case in a canonical congestion game.
The reason is that in this game the channel availability on the debatable nodes differs. Given two groups of debatable nodes of the same size, if the nodes are not completely the same (and neither are the channel availabilities on these nodes), the cost incurred on a claiming cluster can be different when the two groups of debatable nodes join that cluster respectively. Hence, this congestion game is player-specific [33]. In this game, every player greedily updates its strategy (choosing one claiming cluster to join) if joining a different claiming cluster incurs a smaller decrease of CCs ∆|K_i(C)|, and a player's strategy in the game corresponds exactly to the behaviour of a debatable node in the membership clarification phase.

For a player-specific singleton congestion game, there exists a pure equilibrium which can be reached with best response updates, and the upper bound on the number of steps before convergence is n · m [33], where n is the number of players and m is the number of resources. In our problem, the players are the debatable nodes and the resources are the claiming clusters. Thus the upper bound on the number of steps can be expressed as O(N²).

In fact, the number of steps actually involved in this process is much smaller than N², as both n and m are considerably smaller than N. The percentage of debatable nodes in N is illustrated in Figure 13 and lies between 10% and 60% of the total number of CR nodes in the network. The number of cluster heads, as discussed in Section 4.1, depends on the network density and the CR node's transmission range. As shown in Figure 3, the cluster heads make up only 3.4% to 20% of the total number of CR nodes.

On the basis of ROSS-DGA, we propose a faster version, ROSS-DFA, which differs from ROSS-DGA in the second phase. With ROSS-DFA, debatable nodes decide their respective cluster heads in a single pass.
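The best response dynamics underlying the convergence argument above can be sketched generically. This is a hedged illustration, not the ROSS cost model: the toy cost function below (player weight times resource load) is an assumed player-specific example standing in for the CC-based cost, and all names are hypothetical.

```python
def best_response_dynamics(players, strategies, cost):
    """Iterate best responses until no player can lower its cost.
    strategies[p]: resources player p may choose (singleton choice);
    cost(p, r, profile): player-specific cost of p picking resource r.
    Returns the equilibrium profile and the number of strategy updates."""
    profile = {p: strategies[p][0] for p in players}
    updates, changed = 0, True
    while changed:
        changed = False
        for p in players:
            best = min(strategies[p], key=lambda r: cost(p, r, profile))
            if cost(p, best, profile) < cost(p, profile[p], profile):
                profile[p] = best
                updates += 1
                changed = True
    return profile, updates

# Assumed toy instance: 2 players, 2 resources, cost = weight * load
weights = {"x": 1, "y": 2}
strategies = {"x": ["r1", "r2"], "y": ["r1", "r2"]}

def cost(p, r, profile):
    load = 1 + sum(1 for q, s in profile.items() if q != p and s == r)
    return weights[p] * load

profile, updates = best_response_dynamics(["x", "y"], strategies, cost)
# Converges within the n * m bound (here 2 * 2 = 4 updates)
```

At the returned profile no player can improve unilaterally, matching the pure Nash equilibrium guarantee cited from [33] for singleton congestion games with monotone player-specific costs.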
The debatable nodes consider their claiming clusters to include all their debatable nodes; thus the membership of the claiming clusters is static, and all the debatable nodes can make their decisions simultaneously without considering changes in the membership of their claiming clusters. As ROSS-DFA is quicker than ROSS-DGA, the former is especially suitable for CRNs where the channel availability changes dynamically and re-clustering is necessary. To run ROSS-DFA, a debatable node executes only one loop iteration of Algorithm 3.

Now we apply both ROSS-DGA and ROSS-DFA to the toy network in Figure 4, to which phase I of ROSS has already been applied. In the network, node A's claiming clusters are C(C), C(H) ∈ S_A, whose members are {A, B, C, D} and {A, B, H, G} respectively. The two possible strategies of node A are illustrated in Figure 6. In Figure 6(a), node A staying in C(C) and leaving C(H) brings 2 more CCs to S_A, which is more than what is brought by the alternative strategy shown in Figure 6(b). After similar decisions made by the other debatable nodes B and D, the final clusters are formed as shown in Figure 7.

Performance Evaluation

The schemes involved in the simulation are listed as follows.
• ROSS without size control, i.e., ROSS-DGA and ROSS-DFA.
• ROSS with size control, i.e., ROSS-δ-DGA and ROSS-δ-DFA, where δ is the desired cluster size. In the following, we refer to the above four schemes as the variants of ROSS.
• SOC [21], a distributed clustering scheme pursuing cluster robustness.
• The centralized robust clustering scheme. The formulated optimization is an integer linear program, which is solved by MATLAB with the function bintprog.
Fig. 6: Membership clarification: possible cluster formations caused by node A's different choices. (a) Node A stays in cluster C(C) and quits C(H). (b) Node A stays in cluster C(H) and quits C(C).
Fig. 7: Final formation of clusters. Common channels are shown beside the corresponding clusters.

ROSS without the size control mechanism is similar to the schemes proposed in [23]. The authors of [21] compared SOC with other schemes in terms of the average number of CCs of the formed clusters, on which SOC outperforms the other schemes by 50%-100%. SOC's comparison schemes are designed either for ad hoc networks without consideration of channel availability [34], or for CRNs while only considering the connectivity among CR nodes [7]. Thus SOC is chosen as the only distributed comparison scheme; besides, we also compare ROSS with the centralized scheme.

Before investigating the performance of the clustering schemes by simulation, we apply the two comparison clustering schemes to the example CRN in Figure 1 and make an initial comparison in terms of the number of CCs. For the centralized robust clustering scheme, we set the desired cluster size δ to 3. As a result, according to the network topology, the collection of all possible clusters is S = {{A}, {B}, . . . , {B, C}, {B, A}, {B, H}, · · · , {B, A, C}, {B, H, C}, {A, D, C}, · · · }, and |S| =
38. We set ρ1 and ρ2 to 0.2 and 0.8 respectively. The clusters formed by the centralized clustering scheme are shown in Fig. 8(b), and the clustering solution produced by SOC is shown in Fig. 8(a). Comparing the average number of CCs achieved by the different schemes, the results of ROSS, the centralized scheme and SOC are 2.66, 2.66 and 3 respectively. Note that SOC generates one singleton cluster, C(H), which is not preferred. When only the non-singleton clusters are considered, the average number of CCs of SOC drops to 2.5.

We investigate the schemes with four metrics.
• The average number of CCs per non-singleton cluster.
A non-singleton cluster is a cluster whose size is larger than 1. Compared with the metric adopted by SOC [21], which is the average number of CCs over all clusters, this metric provides a more accurate description of the robustness of the non-singleton clusters. Having more CCs per non-singleton cluster means these clusters have a longer life expectancy when the
(In this example network, both ROSS-DGA and ROSS-DFA and their size control variants form the same clusters.)
(a) Generated by SOC: common channels {1,2,5,8}, {1,3}, {2,5,7}
(b) Generated by the centralized clustering scheme: common channels {2,5}, {1,3,4}, {1,2,5}
Fig. 8: Final clusters formed by the centralized clustering scheme and SOC.

primary users' operation becomes more intense. Although this metric doesn't disclose any information about the unclustered
CR nodes, which are synonymous with singleton clusters, we still examine this metric, as the number of CCs is involved in the utility adopted by all the variants of ROSS and by SOC.
• Cluster sizes.
We investigate the distribution of CRs residing in the formed clusters of different sizes.
• Robustness of the clusters against newly added PUs.
We increase the number of PUs to challenge the non-singleton clusters, and count the number of unclustered CR nodes. This metric directly indicates the robustness of the clusters from a more practical point of view, i.e., for the clusters formed under a given CRN and spectrum availability, how many CR nodes can still make use of the clusters when the spectrum availability decreases.
• The amount of control messages involved.
We investigate the number of control messages involved in the clustering process.

The simulation consists of two parts. First, we investigate the performance of the centralized scheme and the distributed schemes in a small network, as no polynomial-time solution is available for the centralized problem. In the second part, we investigate the performance of the proposed distributed schemes in CRNs of different scales and densities. The following simulation settings are the same for both parts. CRs and PUs are deployed on a two-dimensional Euclidean plane. The number of licensed channels is 10, and each PU operates on each channel with a probability of 50%. CR users are assumed to be able to sense the existence of primary users and identify the available channels. All primary and CR users are assumed to be static during the clustering process. The simulation is written in C++, the performance results are averaged over 50 randomly generated topologies, and the confidence intervals correspond to a 95% confidence level.

There are 10 primary users and 20 CR users dropped randomly (with uniform distribution) within a square area of size A, where we set the transmission ranges of primary and CR users to A/
3. When the clustering scheme is executed, around 7 channels are available on each CR node. The desired cluster size δ is 3. For the centralized scheme, the parameters used in the punishment for choosing clusters with undesired sizes are set as ρ1 = 0.2 and ρ2 = 0.8.

From Figure 9, we can see that the centralized schemes outperform the distributed schemes. Among the distributed schemes, SOC achieves the most CCs. The reason is that SOC tends to group together the neighboring CRs which share the most abundant spectrum, no matter how many of them there are, so the number of CCs of the formed clusters is higher. On the other hand, SOC generates the most unclustered CRs, as will be seen when we discuss the number of unclustered CR nodes. As to the variants of ROSS, we notice that the greedy mechanism increases the CCs in non-singleton clusters significantly.
Figure 10 depicts the empirical cumulative distribution of the CRs in clusters of different sizes, from which we draw two conclusions. First, SOC generates more unclustered CR nodes than the other schemes: the centralized schemes produce no unclustered CR nodes in the simulation, the unclustered nodes generated by ROSS-DGA/DFA account for 3% of the total CR nodes, while 10% of the nodes are unclustered when SOC is applied. ROSS-DGA and ROSS-DFA with the size control feature generate 5%-8% unclustered CR nodes, which is due to the cluster pruning procedure (discussed in Sections 4.1.2 and 4.1.3). Second, the centralized schemes and the cluster size control mechanism of ROSS generate clusters with the desired cluster size. With ROSS-DGA and ROSS-DFA with the size control feature, CR nodes mostly reside in clusters of sizes 2, 3 and 4. The sizes of the clusters resulting from ROSS-DGA and ROSS-DFA are more dispersed, but still better than with SOC: the 50% percentiles for ROSS-DGA, ROSS-DFA and SOC are 4.5, 5 and 5.5, and the 90% percentiles for the three schemes are 8, 8 and 9, so the corresponding sizes of ROSS are closer to the desired size.
In this part of the simulation, we add PUs sequentially to the CRN to decrease the available spectrum. There are 10 PUs in the network at the beginning, and then 19 extra batches of PUs are added sequentially, where each batch includes 5 PUs.

Figure 11 shows that certain clusters cannot be maintained and the number of unclustered CR nodes grows when the number of PUs increases. The centralized scheme with a desired size of 2 generates the most robust clusters, while SOC results in the most vulnerable clusters. The centralized scheme with a desired size of 3 doesn't outperform the variants of ROSS, because pursuing the cluster size prevents forming the clusters with more CCs. On the contrary, the variants of ROSS generate some smaller clusters which are more likely to survive when there are more PUs.
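The survival check behind this experiment can be sketched as follows: a non-singleton cluster survives as long as its members still share at least one free common channel after the new PUs occupy some channels. This is an illustrative sketch with assumed data structures and names, not the simulator's code.

```python
def unclustered_after_pus(clusters, channels, occupied):
    """Return the CR nodes left in no surviving non-singleton cluster
    once the channels in `occupied` are taken by newly added PUs."""
    free = {n: chans - occupied for n, chans in channels.items()}
    covered = set()
    for cluster in clusters:
        if len(cluster) < 2:
            continue  # singleton clusters are counted as unclustered
        if set.intersection(*(free[n] for n in cluster)):
            covered |= cluster  # at least one common channel survives
    return set(channels) - covered

# Hypothetical channel availability and clusters
channels = {"A": {1, 2}, "B": {1, 3}, "C": {4}, "D": {4, 5}}
clusters = [{"A", "B"}, {"C", "D"}]
# A PU occupying channel 1 kills cluster {A, B}; {C, D} keeps channel 4
lost = unclustered_after_pus(clusters, channels, occupied={1})
```

Counting `lost` over increasingly large `occupied` sets reproduces the shape of the experiment: clusters with more CCs tolerate more newly occupied channels before their members become unclustered.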
In this section we compare the signaling overhead involved in the different clustering schemes. We don't consider the control messages involved in neighborhood discovery, which is a prerequisite for, and deemed to be the same for, all clustering schemes. According to [35], the message complexity is defined as the number of messages used by all nodes. To compare with the same metric, we count the number of transmissions of control messages, without distinguishing broadcast from unicast control messages. This metric is synonymous with the number of updates discussed in Section 4.

As to ROSS, control messages are generated in both phases. In the first phase, when a CR node decides to be a cluster head, it broadcasts a message containing its ID, its cluster members and the set of CCs in its cluster. In the second phase, a debatable node broadcasts its affiliation to inform its claiming clusters.

Fig. 9: Number of common channels of non-singleton clusters
Fig. 10: Cumulative distribution of CRs residing in clusters with different sizes
Fig. 11: Number of unclustered CRs with decreasing spectrum availability

Fig. 12: Comparison between the distributed and centralized clustering schemes

As to the process of dissemination, in an extreme situation where all the gateway and backbone nodes broadcast, the number of transmissions is h + m, where h is the number of cluster heads and m is the number of debatable nodes. The number of control messages involved in the ROSS variants and the centralized scheme is related to the number of debatable nodes. Figure 13 shows the percentage of debatable nodes under different network densities, from which we can obtain the value of m. Table 2 shows the message complexity, the quantitative amount of control messages, and the size of the control messages. Figure 14 shows the analytical result of the number of transmissions involved in the different schemes.
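The analytical counts in Figure 14 can be reproduced in the following spirit. The formulas follow the upper bounds listed in Table 2, while the shares of cluster heads and debatable nodes and the per-node update count d are assumed placeholders, not values from the paper.

```python
def ross_dga_msgs(h, m, d):
    """Upper bound for ROSS-DGA: h cluster-head broadcasts plus m
    debatable nodes updating their affiliation up to d times each."""
    return h + m * d

def ross_dfa_msgs(h, m):
    return h + m          # each debatable node decides exactly once

def soc_msgs(n):
    return 3 * n          # per Table 2

def centralized_msgs(h, m, n):
    return h + m + n      # upper bound per Table 2

n = 200
h = int(0.10 * n)   # assumed: ~10% cluster heads (Fig. 3 reports 3.4%-20%)
m = int(0.30 * n)   # assumed: ~30% debatable nodes (Fig. 13: 10%-60%)
d = 3               # assumed: affiliation updates per debatable node
# With these placeholder shares, both ROSS variants transmit far fewer
# control messages than SOC's 3N
```

Since h and m are small fractions of N, both ROSS bounds stay well below SOC's 3N for any of the densities reported in Figure 13.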
Fig. 13: The percentage of debatable nodes after phase I of ROSS.
Fig. 14: Quantitative amount of control messages.
In this section we investigate the performance of the distributed clustering schemes in CRNs of different network scales and densities. The transmission range of a CR is A/5, and a PU's transmission range is 2A/5. The initial number of PUs is 30. The desired sizes adopted are listed in Table 3 and are about 60% of the average number of neighbours. When running ROSS, the parameter t, which is used to control the cluster size in phase I, is 1.3.

Figure 15 shows the average number of CCs of the non-singleton clusters. We notice that SOC achieves the most CCs per non-singleton cluster, although its lead over the variants of ROSS shrinks significantly when N increases.

TABLE 2. Signalling overhead
Scheme | Message complexity | Quantitative number of messages | Content of message (size of message)
ROSS-DGA, ROSS-δ-DGA | O(N) (worst case) | h + m·d (upper bound) | Cluster head i broadcasts the channel availability on all cluster members (|C(i)||K| bytes); cluster member i broadcasts the new individual connectivity d_i after being included in one or more clusters (1 byte)
ROSS-DFA, ROSS-δ-DFA | O(N) (worst case) | h + m (upper bound) | (same as above)
SOC | O(N) | 3N | Every CR node i broadcasts the channel availability on all cluster members (|C(i)||K| bytes)
Centralized | O(N) | h + m + N (upper bound) [36] | Clustering result (2N bytes)^a
a. Assuming the data structure of the clustering result is in the form of {node ID i, cluster head ID h(C) where i ∈ C, for every i ∈ N}.

TABLE 3
Number of CRs | 100 | 200 | 300
Average num. of neighbours | 9.5 | 20 | 31
Desired size δ | | |

Fig. 15: Number of common channels of non-singleton clusters.
We add 20 extra batches of PUs sequentially to the CRN, where each batch includes 10 PUs. Figures 16 and 17 show that when N =
100 and 200, more unclustered CR nodes appear in the CRN where SOC is applied. When the network becomes denser, as shown in Figure 18, ROSS-DGA/DFA generate slightly more unclustered CR nodes than SOC as long as the newly added PUs are few, but SOC's performance deteriorates quickly when the number of PUs becomes larger. We only show the average values of the variants of ROSS, as their confidence intervals overlap. When ROSS with the size control mechanism is applied, significantly fewer unclustered CR nodes are generated. Besides, the greedy mechanism moderately strengthens the robustness of the clusters.
Figure 24 shows that when the network density scales up, the number of clusters formed by ROSS increases by a smaller margin, while the number generated by SOC increases linearly. This result coincides with the analysis in Section 4.1.3. To better understand the distribution of the sizes of the formed clusters, we depict the empirical cumulative distribution of CR nodes in clusters of different sizes in Figures 20, 21 and 22.

The sizes of the clusters generated by ROSS-DGA and ROSS-DFA span a wider range than those of ROSS with the size control feature. Most of the generated clusters are smaller than the average number of neighbours, which is roughly equal to the 95% percentile of the ROSS-DGA curve. The 50% percentile of the ROSS-DGA curve is roughly the desired size δ. When the variants of ROSS with the size control feature are applied, the sizes of most generated clusters are smaller than δ. As to the curves of SOC, the 95% percentiles are 36, 30 and 40 in the respective networks. From Figure 23, we conclude that the sizes of the clusters generated by ROSS are limited by the network density, and the sizes of the clusters formed by ROSS with the size control feature are restricted by the desired size. On the contrary, the clusters generated by SOC demonstrate a strong divergence in cluster sizes.

The centralized clustering scheme is able to form clusters which strictly satisfy the requirement on cluster size, and the resulting clusters are robust against the PUs' activity; besides, it generates the smallest control overhead in the process of clustering. As distributed schemes, the variants of ROSS outperform SOC considerably on three metrics. The variants of ROSS generate far fewer singleton clusters than SOC, and the resulting clusters are more robust than SOC's when facing newly added PUs. The signaling overhead involved in ROSS is about half of that needed for SOC, and the signaling messages are much shorter than those of the latter. The sizes of the clusters generated by ROSS demonstrate a smaller discrepancy than those of SOC.
Besides, the ROSS variants with the size control feature achieve performance similar to the centralized scheme in terms of cluster size, and the cluster robustness is similar when applying the variants of ROSS and the centralized scheme respectively.

As to the variants of ROSS, the greedy mechanism in ROSS-DGA helps to improve the performance on cluster size and cluster robustness at the cost of a mildly increased signaling overhead. We also notice that, as a metric, the number of CCs per non-singleton cluster doesn't indicate the robustness of the clusters, as shown in Figures 11 and 19, although it is adopted as the metric in the formation of clusters.

Conclusion
In this paper we investigate the robust clustering problem in CRNs extensively. We provide a mathematical description of the problem and prove its NP-hardness. We propose both a centralized scheme and distributed schemes (ROSS); the cluster structure they generate has a longer life expectancy against the primary users' activity. Besides, the proposed schemes can generate clusters with desired sizes. The congestion game model from game theory is used to design the distributed schemes. Through simulation and theoretical analysis, we find that the distributed schemes achieve similar
Fig. 16: 100 CRs
Fig. 17: 200 CRs
Fig. 18: 300 CRs

Fig. 19: Percentage of CR nodes which are not included in any non-singleton cluster
Fig. 20: 100 CRs, 30 PUs in the network
Fig. 21: 200 CRs, 30 PUs in the network
Fig. 22: 300 CRs, 30 PUs in the network

Fig. 23: Cumulative distribution of CRs residing in clusters with different sizes

Fig. 24: The number of formed clusters.

performance with centralized optimization in terms of cluster robustness, signaling overhead and cluster sizes, and outperform the comparison distributed scheme on the above-mentioned metrics.

The shortcoming of the distributed scheme ROSS is that it doesn't generate clusters whose sizes exceed the cluster head's neighborhood. The reason is that with ROSS, cluster heads form clusters on the basis of their neighborhoods and don't involve the nodes which are outside the neighborhood. On the other hand, forming big clusters which extend beyond a cluster head's neighborhood has limited application scenarios, as multi-hop communication and coordination are required within such clusters.

Proof of Theorem 4.1

Proof.
We consider a CRN which can be represented as a connected graph. To simplify the discussion, we assume the secondary users have unique individual connectivity degrees. Each user has a unique ID and a neighborhood connectivity degree. This assumption is fair, as the neighborhood connectivity degrees and the node IDs are only used to break ties in Algorithm 1; when the individual connectivity degrees are unique, it is not necessary to use the former two metrics.

For the sake of contradiction, let us assume there exists some secondary user α which is not included in any cluster. Then there is at least one node β ∈ Nb_α such that d_α > d_β. According to Algorithm 1, β is not included in any cluster, because otherwise d_β = M, a large positive integer. Now, we distinguish between two cases. If β becomes a cluster head, node α is included and the assumption is false. If β is not a cluster head, then β is not in any cluster; we can repeat the previous analysis made on node α and deduce that node β has at least one neighbouring node γ with d_γ < d_β. Thus, as long as no cluster head is identified, the unclustered nodes α, β, . . . form a linked list in which the individual connectivity degrees monotonically decrease. But this list will not continue to grow, because the minimum individual connectivity

Algorithm 1: ROSS phase I: cluster head determination and initial cluster formation for CR node i
Input: d_j, g_j, j ∈ Nb_i \ Λ, where Λ is the set of cluster heads; empty sets τ1, τ2
Result: returning 1 means i is a cluster head, and then d_j is set to M, j ∈ Nb_i \ Λ; returning 0 means i is not a cluster head.
if ∄ j ∈ Nb_i \ Λ such that d_i ≥ d_j then return 1;
if ∃ j ∈ Nb_i \ Λ such that d_i > d_j then return 0;
else forall j ∈ Nb_i \ Λ such that d_j == d_i do τ1 ← j;
if ∄ j ∈ τ1 such that g_i ≤ g_j then return 1;
if ∃ j ∈ τ1 such that g_i < g_j then return 0;
else forall j ∈ τ1 such that g_j == g_i do τ2 ← j;
if ID_i is smaller than every ID_j, j ∈ τ2 \ {i} then return 1;
return 0;

Algorithm 2:
ROSS phase I: the cluster head guarantees the availability of CCs (starting from line 1) / cluster size control (starting from line 2)
Input: cluster C; empty sets τ1, τ2
Output: cluster C has at least one CC, or satisfies the requirement on the cluster size
1: while K_C = ∅ do
2: while |C| > t · δ do
    if ∃ only one i ∈ C \ H_C such that i = arg min(|K_{H_C} ∩ K_i|) then
        C = C \ i;
    else // multiple i satisfy i = arg min(|K_{H_C} ∩ K_i|)
        τ1 ← these i;
    end
    if ∃ only one i ∈ τ1 such that i = arg max(|∩_{j∈C\i} K_j| − |∩_{j∈C} K_j|) then
        C = C \ i;
    else
        C = C \ i, where i = arg min_{i∈τ1} ID_i;
    end
end
end

Algorithm 3: Debatable node i decides its affiliation in phase II of ROSS
Input: all claiming clusters C ∈ S_i
Output: one cluster C ∈ S_i; node i notifies all the claiming clusters in S_i of its affiliation decision.
while i has not chosen a cluster, or i has joined cluster C̃ but ∃ C′ ∈ S_i, C′ ≠ C̃, which has |K(C′ \ i)| − |K(C′)| < |K(C̃ \ i)| − |K(C̃)| do
    if ∃ only one C ∈ S_i such that C = arg min(|K(C \ i)| − |K(C)|) then return C;
    else // multiple C ∈ S_i satisfy the minimum
        τ1 ← these C;
    end
    if ∃ only one C ∈ τ1 such that C = arg max(|K_{h_C} ∩ K_i|) then return C;
    else // multiple C satisfy the maximum
        τ2 ← these C;
    end
    if ∃ only one C ∈ τ2 such that C = arg min |C| then return C;
    else return arg min_{C∈τ2} ID_{h_C};
    end
end

degree is zero, and the length of this list is upper bounded by the total number of nodes in the CRN. An example of the formed node series is shown in Figure 25.

Fig. 25: The node series discussed in the proof of Theorem 4.1 (d_α > d_β > d_γ > d_ψ > d_ω); the deduction begins from node α.

In this example, node ω is at the tail of the list. As ω does not have any neighboring node with a lower individual connectivity degree, ω becomes a cluster head. Then ω incorporates all its one-hop neighbours (here we assume that every newly formed cluster has common channels), including the nodes which precede ω in the list. The nodes which join a cluster set their individual connectivity degrees to M, which enables the node immediately preceding ω in the list to become a cluster head. In this way, cluster heads are generated from the tail of the list to its head, and all the nodes in the list end up in at least one cluster, which contradicts the assumption that α is not included in any cluster.

If we regard a secondary user becoming a cluster head or becoming a cluster member as one step, then, as the length of the list of secondary users is not larger than N, at most N steps are needed to form the initial clusters. □

Proof of Theorem

Proof.
To prove that the robust clustering problem is NP-hard, we reduce from the maximum weighted k-set packing problem, which is NP-hard when k ⩾ 3. With weights for each set, the maximum weighted packing problem is that of finding a collection of disjoint sets of maximum total weight. The decision version of the weighted k-set packing problem is:

DEFINITION. Given a finite set G of non-negative integers where G ⊊ ℕ, and a collection of sets Q = {S1, S2, · · · , Sm} where S_i ⊆ G and |S_i| ⩽ k for 1 ⩽ i ⩽ m. Every set S in Q has a weight ω(S) ∈ ℝ. The problem is to find a collection S ⊆ Q such that S contains only pairwise disjoint sets and the total weight of the sets in S is greater than a given positive number λ, i.e., Σ_{S∈S} ω(S) > λ.

We assume the weights of the sets are positive integers. We then show that any instance I of the weighted k-set packing problem, i.e., a collection of sets, can be transformed into a cluster formation for a CRN. W.l.o.g. let the set G = {1, . . . , N}. The polynomial transformation algorithm σ consists of the following steps.
• First, the sets in the instance I are mapped sequentially to clusters of CR nodes on a two-dimensional Euclidean plane, where the CR user ID is identical to the corresponding element's index.
• Second, for each mapped cluster C, we assign the channels for the nodes in C so that |K(C)| equals ω(S). We can simply assign the first |K(C)| channels to each CR node in C, without considering the possible mismatch when the same CR node appears in different clusters and is assigned different channels.
The number of steps depends on I and is between 1 and N.

Assume we have a robust clustering black box which can check whether the clustering instance meets the requirement, i.e., whether the clusters are non-overlapping and the total sum of CCs exceeds λ. If it says yes, then the total weight of the corresponding instance of the maximum weighted k-set packing problem is greater than λ.
If theblack box said no , either due to overlapping clusters, or the sum ofCCs over all clusters is smaller than λ , the corresponding instanceof the packing problem is not an solution.Hence, the weighted k -set packing can be reduced to the robustclustering problem in CRN, then the latter problem is of NP-hard. (cid:3) R eferences [1] J. Mitola and G. Q. Maguire, “Cognitive radio: making software radiosmore personal,” IEEE Personal Communications , vol. 6, no. 4, pp. 13–18,Aug 1999.[2] T. Yucek and H. Arslan, “A survey of spectrum sensing algorithms forcognitive radio applications,”
IEEE Communications Surveys & Tutorials, vol. 11, no. 1, pp. 116–130, First Quarter 2009.
[3] Q. Zhao and B. Sadler, "A survey of dynamic spectrum access," IEEE Signal Processing Magazine, vol. 24, no. 3, pp. 79–89, May 2007.
[4] A. Sahai, R. Tandra, S. M. Mishra, and N. Hoven, "Fundamental design tradeoffs in cognitive radio systems," in Proc. of ACM TAPAS 2006.
[5] I. F. Akyildiz, B. F. Lo, and R. Balakrishnan, "Cooperative spectrum sensing in cognitive radio networks: A survey," Phys. Commun., vol. 4, no. 1, pp. 40–62, Mar. 2011.
[6] C. Sun, W. Zhang, and K. B. Letaief, "Cluster-based cooperative spectrum sensing in cognitive radio systems," in Proc. of IEEE ICC 2007.
[7] J. Zhao, H. Zheng, and G.-H. Yang, "Spectrum sharing through distributed coordination in dynamic spectrum access networks," Wireless Communications and Mobile Computing, vol. 7, no. 9, 2007.
[8] D. Willkomm, M. Bohge, D. Hollós, J. Gross, and A. Wolisz, "Double hopping: A new approach for dynamic frequency hopping in cognitive radio networks," in Proc. of IEEE PIMRC 2008.
[9] C. Passiatore and P. Camarda, "A centralized inter-network resource sharing (CIRS) scheme in IEEE 802.22 cognitive networks," in Proc. of IFIP Annual Mediterranean Ad Hoc Networking Workshop 2011.
[10] A. A. Abbasi and M. Younis, "A survey on clustering algorithms for wireless sensor networks," Comput. Commun., vol. 30, no. 14-15, pp. 2826–2841, 2007.
[11] Q. Wu, G. Ding, J. Wang, X. Li, and Y. Huang, "Consensus-based decentralized clustering for cooperative spectrum sensing in cognitive radio networks," Chinese Science Bulletin, vol. 57, 2012.
[12] H. Zhang, Z. Zhang, H. Dai, R. Yin, and X. Chen, "Distributed spectrum-aware clustering in cognitive radio sensor networks," in Proc. of IEEE GLOBECOM 2011.
[13] A. Jorio, S. El Fkihi, B. Elbhiri, and D. Aboutajdine, "An energy-efficient clustering routing algorithm based on geographic position and residual energy for wireless sensor network," Journal of Computer Networks and Communications, vol. 2015, Apr. 2015.
[14] V. Kawadia and P. R. Kumar, "Power control and clustering in ad hoc networks," in Proc. of IEEE INFOCOM 2003, pp. 459–469.
[15] M. Krebs, A. Stein, and M. A. Lora, "Topology stability-based clustering for wireless mesh networks," in Proc. of IEEE GLOBECOM 2010.
[16] T. Chen, H. Zhang, G. Maggio, and I. Chlamtac, "CogMesh: A cluster-based cognitive radio network," in Proc. of IEEE DySPAN 2007.
[17] K. Baddour, O. Ureten, and T. Willink, "Efficient clustering of cognitive radio networks using affinity propagation," in Proc. of ICCCN 2009.
[18] D. Wu, Y. Cai, L. Zhou, and J. Wang, "A cooperative communication scheme based on coalition formation game in clustered wireless sensor networks," IEEE Transactions on Wireless Communications, vol. 11, no. 3, pp. 1190–1200, Mar. 2012.
[19] A. Asterjadhi, N. Baldo, and M. Zorzi, "A cluster formation protocol for cognitive radio ad hoc networks," in Proc. of European Wireless Conference 2010, pp. 955–961.
[20] M. Ozger and O. B. Akan, "Event-driven spectrum-aware clustering in cognitive radio sensor networks," in Proc. of IEEE INFOCOM 2013.
[21] S. Liu, L. Lazos, and M. Krunz, "Cluster-based control channel allocation in opportunistic cognitive radio networks," IEEE Trans. Mob. Comput., vol. 11, no. 10, pp. 1436–1449, 2012.
[22] N. Mansoor, A. Islam, M. Zareei, S. Baharun, and S. Komaki, "Construction of a robust clustering algorithm for cognitive radio ad-hoc network," in Proc. of CROWNCOM 2015.
[23] D. Li and J. Gross, "Robust clustering of ad-hoc cognitive radio networks under opportunistic spectrum access," in Proc. of IEEE ICC 2011.
[24] B. Clark, C. Colbourn, and D. Johnson, "Unit disk graphs," Annals of Discrete Mathematics, vol. 48, pp. 165–177, 1991.
[25] Y. Zhang, G. Yu, Q. Li, H. Wang, X. Zhu, and B. Wang, "Channel-hopping-based communication rendezvous in cognitive radio networks," IEEE/ACM Transactions on Networking, vol. 22, no. 3, pp. 889–902, June 2014.
[26] Z. Gu, Q.-S. Hua, and W. Dai, "Fully distributed algorithms for blind rendezvous in cognitive radio networks," in Proc. of ACM MobiHoc 2014.
[27] Y. P. Chen, A. L. Liestman, and J. Liu, "Clustering algorithms for ad hoc wireless networks," in Ad Hoc and Sensor Networks. Nova Science Publishers, 2004.
[28] E. Perevalov, R. S. Blum, and D. Safi, "Capacity of clustered ad hoc networks: how large is 'large'?" IEEE Transactions on Communications, vol. 54, no. 9, pp. 1672–1681, Sept. 2006.
[29] A. MacKenzie and S. Wicker, "Game theory in communications: motivation, explanation, and application to power control," in Proc. of IEEE GLOBECOM 2001.
[30] J. O. Neel, "Analysis and design of cognitive radio networks and distributed radio resource management algorithms," Ph.D. dissertation, Virginia Tech, Blacksburg, VA, USA, 2006.
[31] B. Wang, Y. Wu, and K. R. Liu, "Game theory for cognitive radio networks: An overview," Comput. Netw., vol. 54, no. 14, pp. 2537–2561, Oct. 2010.
[32] B. J. S. Chee and C. Franklin, Jr., Cloud Computing: Technologies and Strategies of the Ubiquitous Data Center, 1st ed. CRC Press, 2010.
[33] H. Ackermann, H. Röglin, and B. Vöcking, "Pure Nash equilibria in player-specific and weighted congestion games," Theoretical Computer Science, vol. 410, no. 17, pp. 1552–1563, 2009.
[34] S. Basagni, "Distributed clustering for ad hoc networks," in Proc. of I-SPAN '99, pp. 310–315, 1999.
[35] X.-Y. Li, Y. Wang, and Y. Wang, "Complexity of data collection, aggregation, and selection for wireless sensor networks," IEEE Transactions on Computers, vol. 60, no. 3, pp. 386–399, 2011.
[36] M. Onus, A. Richa, K. Kothapalli, and C. Scheideler, "Efficient broadcasting and gathering in wireless ad-hoc networks," in Proc. of ISPAN 2005.
[37] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.