Analysis of Spectrum Occupancy Using Machine Learning Algorithms
Freeha Azmat, Yunfei Chen, Senior Member, IEEE, and Nigel Stocks
Abstract
In this paper, we analyze the spectrum occupancy using different machine learning techniques. Both supervised techniques (naive Bayesian classifier (NBC), decision trees (DT), support vector machine (SVM), linear regression (LR)) and an unsupervised algorithm (hidden Markov model (HMM)) are studied to find the technique with the highest classification accuracy (CA). A detailed comparison of the supervised and unsupervised algorithms in terms of computational time and classification accuracy is performed. The classified occupancy status is further utilized to evaluate the probability of secondary user outage for future time slots, which can be used by system designers to define spectrum allocation and spectrum sharing policies. Numerical results show that SVM is the best algorithm among all the supervised and unsupervised classifiers. Based on this, we propose a new SVM algorithm by combining it with the firefly algorithm (FFA), which is shown to outperform all other algorithms.
Index Terms
Firefly algorithm, hidden Markov model, spectrum occupancy, support vector machine.
September 10, 2018 DRAFT
I. INTRODUCTION
A cognitive radio network (CRN) is composed of two types of users, namely, the licensed primary users (PUs) and the unlicensed secondary users (SUs). The core idea behind CR is to allow unlicensed users access to the licensed bands in an opportunistic manner while avoiding interference with the licensed users. To achieve this, a realistic understanding of the dynamic usage of the spectrum is required. Spectrum measurement is an important step towards this understanding. Various spectrum measurement campaigns covering a wide range of frequencies have been performed [1]. These spectrum measurement studies have found a significant amount of unused frequency bands under normal usage due to the static spectrum regulations. This has led researchers to study the spectrum occupancy characteristics in depth in order to exploit the free spectrum.
A. Problem definition
Many studies have been performed to understand the occupancy statistics. For instance, the statistical and spectral occupation analysis of the measurements was presented in [2] in order to study the traffic density in all frequency bands. In [3], an autoregressive model was used to predict the radio resource availability using occupancy measurements in order to achieve uninterrupted data transmission of secondary users. In [4], the occupancy statistics were utilized to select the best channels for control and data transmission purposes, so that less time is required for switching transmission from one channel to another when the PU appears. Further, in [5], [6], the bandwidth efficiency was maximized by controlling the transmission power of the cognitive radio using spectrum occupancy measurements. In [7], different time series models were used to categorize specific occupancy patterns in the spectrum measurements. All of the aforementioned works have evaluated spectrum occupancy models using conventional probabilistic or statistical tools. These tools are often limited by the assumptions required to derive their theories. For example, one has to determine whether the value is a random variable or a random process in order to choose between probabilistic and statistical tools. On the other hand, machine learning (ML) is a very powerful tool that has received increasing attention recently [8]. Machine learning algorithms are often heuristic, as they do not impose prerequisites or assumptions on the data. As a result, in many cases, they provide higher accuracy than conventional probabilistic and statistical tools. There are very few works on the use of ML in spectrum occupancy. For example, the ML works related to CR in [9]-[13] discussed cooperative spectrum sensing and spectrum occupancy variation. In this paper, we aim to provide a comprehensive investigation of the use of ML for analyzing spectrum occupancy. The motivation is that different ML algorithms are often suitable for different types of data. Thus, one needs to try several ML algorithms, not just one, in order to find the one that suits the spectrum data best.
B. Contributions
The contributions are listed as follows:
1. We propose the use of ML algorithms in spectrum occupancy study. Both supervised and unsupervised algorithms are used. The machine learning techniques are advantageous because they are capable of implicitly learning the surrounding environment and are much more adaptive compared with the traditional spectrum occupancy models. They can describe more optimized decision regions in the feature space than other approaches. In [9] and [10], ML was used for cooperative spectrum sensing. We instead use ML for spectrum occupancy modelling, which may be used in all CR operations, including spectrum management, spectrum decision and spectrum sensing. In [11], the authors discussed call-based modelling for analyzing the spectrum usage of a dataset collected from a cellular network operator, and showed that a random walk process can be used for modeling aggregate cell capacity. In contrast, we use ML to model spectrum occupancy in time slots for all important bands.
2. We utilize four supervised algorithms, naive Bayesian classifier (NBC), decision trees (DT), support vector machine (SVM) and linear regression (LR), and one unsupervised algorithm, hidden Markov model (HMM), to classify the occupancy status of time slots. The classified occupancy status is further utilized for evaluating the probability of SU outage. In [12], HMM was used to predict the channel status. Our supervised algorithms and modified HMM all perform better than HMM. In [13], LR was used to investigate the spectrum occupancy variation in time and frequency. Our approach outperforms LR as well.
3. We propose a new technique that combines SVM with the firefly algorithm (FFA) and outperforms all the supervised and unsupervised algorithms.
The rest of the paper is organized as follows: Section II explains the system model, followed by the detailed explanation of the classifiers in Section III. The numerical results and discussion are presented in Section IV.

II. SYSTEM MODEL
A. Measurement setup and data
We have measured the data from 880 MHz to 2500 MHz, containing eight main radio frequency bands, for approximately four months (6th Feb-18th June 2013) at the University of Warwick using a radiometer. The eight bands are: 880-915 MHz, 925-960 MHz, 1900-1920 MHz, 1920-1980 MHz, 1710-1785 MHz, 1805-1880 MHz, 2110-2170 MHz and 2400-2500 MHz. The number of frequency bins in each band varies. For example, the band 925-960 MHz contains 192 frequency bins, each occupying a bandwidth of 0.18 MHz, while the band 1710-1785 MHz contains 448 frequency bins, each occupying a bandwidth of 0.167 MHz. The data is arranged in a two-dimensional matrix (t_i, f_j) for each band, where each row t_i represents the measured data at different frequencies in one minute, while each column f_j represents the data at different time instants of each frequency bin. As we have measured the data for four months, which constitute 131 days (188917 minutes), the number of rows is 188917, while the number of columns varies according to the number of frequency bins in a particular band.
B. SU Model
In a network of licensed users, the SU is allowed to access the licensed band without causing any harmful interference to the PU. Let i denote the time slot and j denote the frequency bin, where i = 1, 2, ..., n, j = 1, 2, ..., k, n represents the total number of time slots and k represents the total number of frequency bins. Using energy detection [14], if y_i(j) is the sample sensed at the i-th time slot in the j-th frequency bin, one has

y_i(j) = x_i(j) + w_i(j)    (1a)

or

y_i(j) = w_i(j)    (1b)

where x_i(j) represents the received PU signal and w_i(j) represents the additive white Gaussian noise (AWGN) with zero mean and variance σ_w². Each sample is compared with a threshold γ. The selection of γ is very important because small values of γ will cause false alarms while large values will miss spectrum opportunities. The computation of γ was explained in [15]. In our approach, the threshold is dynamic and its selection is explained in Section IV-B. The spectrum status is given as

S_i(j) = 1, if y_i(j) > γ;    S_i(j) = 0, if y_i(j) < γ.

The occupancy for the i-th time slot over all k frequency bins is defined as

OC_i = (1/k) Σ_{j=1}^{k} S_i(j).    (2)

For example, a three-minute interval for the band 880-890 MHz having 9 frequency bins is shown in Fig. 1, where each bin occupies 1 MHz. For each frequency bin, S_i(j) is decided. Once S_i(j) is evaluated, the occupancy OC_i is calculated using (2). It is observed that more frequency bins are occupied in the first minute than in the second and third minutes, so the SU has less chance to transmit in the first minute. Following the discussion above, we need to set the criteria for quantifying this chance based on the occupancies.
Fig. 1. Occupancy for different time slots in the band.
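As an illustration, the status and occupancy computations of (1)-(2) can be sketched in a few lines of Python; the power values and threshold below are hypothetical toy numbers, not measured data:

```python
def spectrum_status(powers, gamma):
    """Energy detection: bin j is occupied, S_i(j) = 1, if the sensed
    power exceeds the threshold gamma; otherwise S_i(j) = 0."""
    return [1 if y > gamma else 0 for y in powers]

def occupancy(status):
    """Eq. (2): OC_i is the fraction of occupied bins in the time slot."""
    return sum(status) / len(status)

# Toy slot with k = 9 bins, loosely mirroring the Fig. 1 example.
# Power values (dBm) and the threshold are hypothetical.
powers = [-95, -60, -58, -102, -99, -55, -97, -101, -59]
S = spectrum_status(powers, gamma=-90)   # -> [0, 1, 1, 0, 0, 1, 0, 0, 1]
OC = occupancy(S)                        # -> 4/9
```

With 4 of 9 bins above the threshold, OC_i = 4/9 for this slot.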
C. PU Model
As per our approach, the status of the PU (P_i) for each i-th time slot can be decided using the following rules:

P_i = 1, if OC_i > U_oc    (Condition 1)
P_i = 1, if L_oc ≤ OC_i ≤ U_oc and con_i < B    (Condition 2)
P_i = 0, if L_oc ≤ OC_i ≤ U_oc and con_i ≥ B    (Condition 3)
P_i = 0, if OC_i < L_oc    (Condition 4)

where U_oc and L_oc represent the maximum and minimum values of occupancy for all n time slots, con_i represents the number of consecutive free frequency bins in the i-th time slot, and B represents the minimum value of con_i for which the PU is considered absent. Each condition is explained as follows:
1. Condition 1 and Condition 4: The values of U_oc and L_oc vary with the frequency band, the day and the threshold. Our tests show that U_oc should not be less than 75% and L_oc should not be greater than 40%. For a fixed frequency band and day, we have evaluated U_oc and L_oc for different thresholds in Section IV-B. In order to guarantee PU protection and ensure SU transmission when the values of OC_i lie in the range between L_oc and U_oc, a further criterion is applied.
2. Condition 2 and Condition 3: When L_oc ≤ OC_i ≤ U_oc, it is difficult to apply Condition 1 and Condition 4, so we evaluate con_i for each time slot. If con_i ≥ B for L_oc ≤ OC_i ≤ U_oc, there exist at least B consecutive free frequency bins in the i-th time slot, so the SU can transmit; the converse holds when con_i < B. The value of B is selected to provide PU protection. This will be explained in Section IV-B.
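The four labeling conditions above can be sketched as follows; the slot vector and the values of L_oc, U_oc and B in the usage example are hypothetical, for illustration only:

```python
def longest_free_run(status):
    """con_i: the longest run of consecutive free (0) bins in the slot."""
    best = run = 0
    for s in status:
        run = run + 1 if s == 0 else 0
        best = max(best, run)
    return best

def pu_status(status, L_oc, U_oc, B):
    """Apply Conditions 1-4 of Sec. II-C to one slot's status vector."""
    oc = sum(status) / len(status)
    if oc > U_oc:          # Condition 1: PU present
        return 1
    if oc < L_oc:          # Condition 4: PU absent
        return 0
    # Conditions 2 and 3: mid-range occupancy, decide via con_i vs B
    return 0 if longest_free_run(status) >= B else 1

# Hypothetical slot: OC_i = 5/9 falls between L_oc and U_oc, and the
# slot contains a run of 3 consecutive free bins, so P_i = 0.
print(pu_status([1, 0, 0, 0, 1, 0, 1, 1, 1], L_oc=0.4, U_oc=0.75, B=3))  # 0
```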
D. Machine Learning Framework for SU and PU Model

ML constructs a classifier to map S_i to P_i, where S_i = [S_i(1), S_i(2), ..., S_i(k)] represents the feature vector and P_i is the corresponding response to the feature vector. There are two steps for constructing a classifier:
1) Training:
Let S_i^train = [S_i(1)^train, S_i(2)^train, ..., S_i(k)^train]^T denote the training spectrum status and P_i^train represent the training PU status for the i-th time slot, where i = 1, 2, ..., n1 and n1 represents the number of training time slots fed into the classifier.
2) Testing:
Once the classifier is successfully trained, it is ready to receive the test vector for classification. Let S_i^test = [S_i(1)^test, S_i(2)^test, ..., S_i(k)^test]^T denote the testing spectrum status and P_i^test represent the testing PU status for the i-th time slot, where i = n1, n1 + 1, ..., n and n2 represents the length of the testing sequence. It is assumed that n = n1 + n2. For our proposed approach, the matrix of size n × k is divided into a training data matrix of size n1 × k and a testing data matrix of size n2 × k. The value P_i^test is not used during the testing but as a reference for computing the classification error.
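The row-wise division of the status matrix into training and testing parts can be sketched as follows; the matrix contents and the 30%/70% split in the example are hypothetical:

```python
def split_rows(status_matrix, train_fraction):
    """Row-wise split of the n x k status matrix into an n1 x k training
    part and an n2 x k testing part, with n = n1 + n2 (Sec. II-D)."""
    n1 = int(len(status_matrix) * train_fraction)
    return status_matrix[:n1], status_matrix[n1:]

# Hypothetical 10 x 3 status matrix, 30% training / 70% testing.
matrix = [[i % 2, (i + 1) % 2, 1] for i in range(10)]
train, test = split_rows(matrix, 0.3)    # len(train) == 3, len(test) == 7
```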
3) Classification Accuracy (CA):
Let P_i^eval denote the PU status determined by the classifier for the i-th time slot. The classifier categorizes the testing vector S_i^test as the 'occupied class' (i.e., P_i^eval = 1) or the 'unoccupied class' (i.e., P_i^eval = 0). The PU status is correctly determined when P_i^eval = P_i^test, giving CA_i = 1. A misdetection occurs when P_i^eval = 0 and P_i^test = 1, while a false alarm occurs when P_i^eval = 1 and P_i^test = 0, giving CA_i = 0.

E. Probability of SU outage
Let P^eval be a vector of length (n − n1 + 1) evaluated by each classifier, where P_i^eval represents the presence/absence of the PU in the i-th time slot. When P_i^eval = 0, the SU is allowed to utilize the i-th time slot. Define out_su as the minimum number of consecutive free time slots required by the SU for transmission. An SU outage occurs when the SU cannot find out_su consecutive free time slots in the vector P^eval. The probability of SU outage is given by

P(SU_outage) = 1 − P(SU_transmit)    (3a)

where

P(SU_transmit) = Σ_{c=1}^{C} P(FB_c)    (3b)

where FB_c represents a block of free consecutive time slots of length out_su, c = {1, 2, ..., C}, and C represents the total number of free blocks present in P^eval. The probability of a free block starting at index r in P^eval is evaluated as

P(FB_c) = Π_{i=r}^{r+out_su} OC_i.    (3c)
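A minimal sketch of the outage computation of (3a)-(3c), under the assumptions that free blocks are taken as non-overlapping runs of out_su consecutive slots with P_i^eval = 0 and that each block is scored by the product of its OC_i values as written in (3c); all numbers in the example are hypothetical:

```python
def su_outage_probability(p_eval, oc, out_su):
    """Eqs. (3a)-(3c): scan P^eval for non-overlapping free blocks of
    out_su consecutive slots (P_i = 0), score each block by the product
    of the corresponding OC_i values, and return 1 - sum of the scores."""
    p_transmit = 0.0
    i = 0
    while i <= len(p_eval) - out_su:
        if all(p == 0 for p in p_eval[i:i + out_su]):
            block = 1.0
            for v in oc[i:i + out_su]:
                block *= v
            p_transmit += block
            i += out_su          # jump past this free block
        else:
            i += 1
    return 1.0 - p_transmit

# Hypothetical example: one free block of length 2 at slots 2-3.
print(su_outage_probability([1, 0, 0, 1], [0.9, 0.5, 0.5, 0.9], 2))  # 0.75
```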
III. PROPOSED ALGORITHMS

In the proposed approach, five machine learning algorithms are utilized to predict the future PU status using the occupancy data, which is a function of time, frequency and threshold. Among them, four are supervised learning algorithms, NBC, DT, SVM and LR, while one is an unsupervised algorithm, HMM. The motivation for using five different algorithms is to find the best machine learning algorithm, as they have different characteristics.
A. Naive Bayesian Classifier
A naive Bayesian classifier is a generative model based on the Bayes theorem. It is also called an 'independent feature model' because it does not take the dependency of features into account. The feature vector for the i-th time slot in our model contains samples which are independent of each other, since every feature represents a specific frequency bin. For example, the status vector of the i-th time slot is given as S_i = [S_i(1), S_i(2), ..., S_i(k)], where S_i(1) is independent of S_i(2). However, the response variable in our approach, i.e., the PU status P_i, is a dependent variable which is affected by each frequency bin. As our features are independent, we use NBC for classification. The probability of S_i belonging to the class P_i, evaluated using the Bayes theorem, is formally defined as [16]

p(P_i, S_i) = p(P_i) p(S_i | P_i).    (4)

When P_i = 0, S_i will be classified into the 'idle' class, while when P_i = 1, S_i will be classified into the 'occupied' class. The goal is to find the class with the largest posterior probability in the classification phase. The classification rule is given as

classify(Ŝ_i) = argmax_{P_i} { p(P_i, Ŝ_i) }    (5)

where Ŝ_i = {Ŝ_i(1), Ŝ_i(2), ..., Ŝ_i(k)}. NBC is sensitive to the choice of kernel and the prior probability distribution of the classes. This will be explained in Section IV-B.
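For binary status features, the posterior maximization of (4)-(5) can be sketched as a Bernoulli naive Bayes classifier; the Laplace smoothing constant alpha and the tiny training set are assumptions for illustration, not part of the original formulation:

```python
import math

def train_nbc(X, y, alpha=1.0):
    """Fit class priors p(P_i) and per-bin Bernoulli likelihoods
    p(S_i(j) | P_i) from binary training vectors; alpha smooths counts."""
    model = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        prior = len(rows) / len(y)
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(X[0]))]
        model[c] = (prior, probs)
    return model

def classify_nbc(model, x):
    """Eq. (5): return the class with the largest (log-)posterior."""
    best, best_lp = None, float("-inf")
    for c, (prior, probs) in model.items():
        lp = math.log(prior)
        for xi, p in zip(x, probs):
            lp += math.log(p if xi == 1 else 1.0 - p)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Hypothetical training data: class 1 slots tend to have occupied bins.
model = train_nbc([[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 1]], [1, 1, 0, 0])
print(classify_nbc(model, [1, 1, 0]))  # 1
```

Working in log space avoids numerical underflow when k is large.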
B. Decision Trees

A decision tree builds classification or regression models in the form of a tree structure. The decision trees used in this approach are classification trees, whose leaves represent the class labels. Unlike NBC, a DT can handle feature interactions and dependencies. In a DT, a decision is made at each internal node, which is used as a basis for dividing the data into two subsets, while the leaf nodes represent the class labels (in the case of classification trees) or real numbers (in the case of regression trees). Data come in the form
(S_i, P_i) = (S_i(1), S_i(2), S_i(3), ..., S_i(k), P_i),    (6)

where P_i is the dependent variable representing the class label of the i-th time slot. The class labels P_i are assigned by calculating the entropy of the feature, as [17]

Entropy(t) = − Σ_{id=0}^{Z} p(id|t) log p(id|t),    (7)

where p(id|t) denotes the fraction of records belonging to class id at a given node t and Z represents the total number of classes. In our approach, Z = 1. A smaller entropy implies that all records belong to the same class. How the fraction of records per node affects the classification accuracy of DT will be discussed in Section IV-C.
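The node impurity measure of (7) can be sketched directly from class fractions; the label vectors below are toy examples (base-2 logarithm assumed, as is common for entropy):

```python
import math

def entropy(labels):
    """Eq. (7): Entropy(t) = -sum_id p(id|t) * log2 p(id|t), where
    p(id|t) is the fraction of records of class id at node t."""
    n = len(labels)
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log2(p)
    return h

print(entropy([0, 0, 0, 0]))  # 0.0  (pure node: one class only)
print(entropy([0, 0, 1, 1]))  # 1.0  (maximally mixed two-class node)
```

A split is chosen to reduce this entropy as much as possible, so that child nodes become purer.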
C. Support Vector Machines

SVM is a discriminative classifier with high accuracy. Unlike DT, it prevents over-fitting and can be used for online learning [18]. There are two types of classifiers in SVM: the linear SVM for separable data and the non-linear SVM for non-separable data. The linear classifier is used here. The training feature and response vectors can be represented as D = (P_i, S_i), where P_i ∈ {0, 1}. The two classes are separated by defining a division line H represented as d·S_i + b = ρ, where d and b represent the weighting vector and the bias, respectively, while ρ represents the constant for dividing the two hyperplanes. The maximum-margin hyperplanes that divide the points having P_i = 1 from those having P_i = 0 are given as:

P_i = 1 when d·S_i + b > ρ    (Occupied Class)    (8a)
P_i = 0 when d·S_i + b < −ρ    (Idle Class)    (8b)

The separation between the two hyperplanes is the margin, controlled by the parameter called the box constraint, Box_ct. We have evaluated the optimal value of Box_ct using a bio-inspired technique, i.e., FFA, in our approach.
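The decision rule of (8) can be sketched as below; the weight vector d, bias b and margin constant ρ here are hypothetical placeholders, whereas in the actual approach they result from SVM training with box constraint Box_ct:

```python
def svm_decide(d, b, s, rho=0.0):
    """Eq. (8): classify slot status vector S_i as occupied (1) when
    d . S_i + b > rho, and idle (0) otherwise."""
    score = sum(di * si for di, si in zip(d, s)) + b
    return 1 if score > rho else 0

# Hypothetical weights/bias, e.g. as a linear SVM might produce for a
# 2-bin toy problem; these are not fitted values.
d, b = [2.0, 2.0], -3.0
print(svm_decide(d, b, [1, 1]))  # 1 (occupied side of the hyperplane)
print(svm_decide(d, b, [0, 1]))  # 0 (idle side)
```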
D. SVM with Firefly Algorithm
In FFA, let X be a group of fireflies, X = [l_1, l_2, ..., l_X], initially located at specific positions a_X = [a_{l_1}, a_{l_2}, ..., a_{l_X}]. Each firefly moves and tries to find a brighter firefly, one that has more light intensity than its own. The objective function f(x) used for evaluating the brightness of a firefly in our approach is the classification accuracy, i.e., f(x) = CA(a_X). When a firefly, say l_1, finds another firefly l_2 at another location having more intensity than its own, it tends to move towards firefly l_2. The change in position is determined as [20]

a_{l_1}^{v+1} = a_{l_1}^{v} + β e^{−ψ_{l_1 l_2} d_{l_1 l_2}²} (a_{l_2}^{v} − a_{l_1}^{v}) + α (rand − 1/2),    (9)

where v represents the iteration number, a_{l_1} and a_{l_2} represent the positions of fireflies l_1 and l_2 respectively, d_{l_1 l_2} is the distance between them, α, β and ψ_{l_1 l_2} are constants, and rand is a uniformly distributed random number. For our approach, the starting positions of the X fireflies are initialized, and the position of each firefly represents a value of the box constraint Box_ct.
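One sweep of the position update (9) can be sketched as follows. This is a one-dimensional sketch, since each firefly position here is a scalar Box_ct candidate, and the distance d_{l1,l2} is taken as the difference of positions; the positions and brightness (CA) values in the example are hypothetical:

```python
import math
import random

def firefly_step(pos, brightness, beta=2.0, psi=1.0, alpha=1.0):
    """One sweep of Eq. (9): every firefly l1 moves toward each brighter
    firefly l2; positions are scalar Box_ct candidates."""
    new_pos = list(pos)
    for i in range(len(pos)):
        for j in range(len(pos)):
            if brightness[j] > brightness[i]:          # l2 brighter than l1
                dist2 = (pos[i] - pos[j]) ** 2         # d_{l1,l2}^2
                attract = beta * math.exp(-psi * dist2)
                new_pos[i] += (attract * (pos[j] - pos[i])
                               + alpha * (random.random() - 0.5))
    return new_pos

# With alpha = 0 the move is deterministic: the dimmer firefly at 0.0
# is pulled toward the brighter one at 1.0, which itself stays put.
print(firefly_step([0.0, 1.0], [0.80, 0.92], alpha=0.0))
```

In the full algorithm, each new position is re-scored by retraining the SVM with that Box_ct value and measuring CA, and the sweep is repeated for v iterations.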
E. Linear Regression

The flexibility of linear regression to include a mixture of various features in different dimensions, e.g., space, frequency, time and threshold, as a linear combination is the main motivation for using it for modeling in this approach. The linear regression model for our approach is given by:

P_i = e_0 + e_1 S_i(1) + e_2 S_i(2) + ... + e_k S_i(k) = e_0 + Σ_{j=1}^{k} e_j S_i(j),    (10)

where the class label P_i is represented as a linear combination of parameters e_0, e_1, ..., e_k and features (S_i(1), S_i(2), ..., S_i(k)) in the i-th time slot. Stepwise linear regression is used in this approach. In each step, the optimal term based on the value of a defined 'criterion' is selected. The 'criterion' can be set as the sum of squared errors (SSE), deviance, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), R-squared, etc. SSE is used in this approach; small values of SSE indicate a good model. It is observed from (10) that the computational time for evaluating the response of the model increases linearly with the number of frequency bins/predictors involved, so we need to select an appropriate number of predictors for linear regression.
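The SSE criterion used for stepwise term selection can be sketched in miniature as a single forward-selection step over one-predictor least-squares fits; a full stepwise regression would add and remove terms iteratively, and the data below are hypothetical:

```python
def fit_simple(x, y):
    """One-predictor ordinary least squares: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope

def sse(y, y_hat):
    """The SSE criterion used by the stepwise selection."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def best_predictor(columns, y):
    """Pick the bin S_i(j) whose one-term fit to P_i minimizes SSE."""
    best_j, best_err = None, float("inf")
    for j, col in enumerate(columns):
        b0, b1 = fit_simple(col, y)
        err = sse(y, [b0 + b1 * xi for xi in col])
        if err < best_err:
            best_j, best_err = j, err
    return best_j

# Hypothetical data: the second bin tracks P_i exactly, so j = 1 wins.
print(best_predictor([[0, 1, 0, 1], [0, 0, 1, 1]], [0, 0, 1, 1]))  # 1
```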
F. Hidden Markov Models
HMM is an unsupervised algorithm for modeling time series data. The motivation for using an unsupervised algorithm is that it does not need a training phase. In an HMM, the sequence of states can be recovered by an analysis of the sequence of observations. The sets of states and observations are represented by U and G, given as U = (u_1, u_2, ..., u_N) and G = (g_1, g_2, ..., g_M), where u_1 and u_2 represent the states when P_i = 0 and P_i = 1, respectively. The observations g_1 and g_2 represent the values of OC_i corresponding to each P_i. The HMM is defined as

λ = (C_h, D_h, π)    (11)

where the transition array C_h gives the probability of switching between states [21], e.g., C_h = [c_{12}] = P(q_t = u_2 | q_{t−1} = u_1); D_h gives the probability of an observation being produced from a state, D_h = [d_{j,m}] = P(o_t = g_m | q_t = u_j); and π is the initial probability array, π = P(q_1 = u_1). HMM has two main steps. In the first step, the sequence of observations O = (o_1, o_2, ..., o_T), the transition probability matrix C_h and the emission probability matrix D_h are utilized to find the probability of the observations O given the HMM model λ, given in ([21], Eq. 13) as P(O|λ) = Σ_Q P(O|Q, λ) P(Q|λ), where Q = (q_1, q_2, ..., q_T) and P(O|Q, λ) = Π_{t=1}^{T} P(o_t | q_t, λ) = d_{q_1}(o_1) d_{q_2}(o_2) ... d_{q_T}(o_T). The probability of the state sequence is given as P(Q|λ) = π_{q_1} c_{q_1 q_2} c_{q_2 q_3} ... c_{q_{T−1} q_T}. In the second step, the hidden state sequence most likely to have produced the observations is decoded using the Viterbi algorithm. The most likely sequence of states Q_L generated using the Viterbi algorithm is matched with the expected fixed state sequence Q to compute the classification accuracy. HMM can also be supervised by adding two extra steps:
Step (a): Use the initial guesses of C_h and D_h to compute Q and O, which are used for computing P(O|λ) in the forward algorithm.
Step (b): Use O, D_h and C_h from Step (a) to estimate the transition probability matrix C_h′ and the emission probability matrix D_h′ using maximum likelihood estimation [22]. The C_h′ and D_h′ collectively form the estimated HMM model (λ_e) that can be further used for evaluating P(O|λ) and Q_L using the forward algorithm and the Viterbi algorithm, respectively.
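The Viterbi decoding step can be sketched as follows: a generic log-domain Viterbi over dictionaries for C_h, D_h and π. The two-state busy/free model in the usage example is a toy, with hypothetical transition and emission probabilities rather than values estimated from the measurements:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Log-domain Viterbi decoding of the most likely state sequence Q_L
    for observations O under lambda = (C_h, D_h, pi), Eq. (11)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
          for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, lp = max(((r, V[t - 1][r] + math.log(trans_p[r][s]))
                            for r in states), key=lambda x: x[1])
            V[t][s] = lp + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy two-state HMM: "free" corresponds to P_i = 0, "busy" to P_i = 1;
# observations are quantized occupancy levels. All numbers hypothetical.
states = ["busy", "free"]
start = {"busy": 0.5, "free": 0.5}
trans = {"busy": {"busy": 0.9, "free": 0.1},
         "free": {"busy": 0.1, "free": 0.9}}
emit = {"busy": {"high": 0.9, "low": 0.1},
        "free": {"high": 0.1, "low": 0.9}}
print(viterbi(["high", "high", "low", "low"], states, start, trans, emit))
# -> ['busy', 'busy', 'free', 'free']
```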
IV. NUMERICAL RESULTS AND DISCUSSION

In order to analyze the occupancy of the eight bands, the statistics of the data in all bands from 880 to 2500 MHz are presented in Section IV-A. The classification criteria are explained in Section IV-B. The selection of the best parameters for each model using the classification criteria is discussed in Section IV-C. The classification models with the optimal parameters are compared to find the best classifier in terms of the CA, defined as

CA = (Number of correct classifications) / (Total number of test samples).
A. Statistics of Data
The CDF plot shown in Fig. 2 gives a summarized view of the power ranges of the eight bands. It can be observed from Fig. 2 that the eight bands can be categorized into two main groups. Group A contains those bands that have wide power ranges, between -110 dBm and -30 dBm, including 1805-1880 MHz, 1710-1785 MHz and 2110-2170 MHz. Group B has five bands, 925-960 MHz, 880-915 MHz, 2400-2500 MHz, 1920-1980 MHz and 1900-1920 MHz, that have power ranges between -110 dBm and -100 dBm. Thus, Group A bands have a larger standard deviation than Group B bands. Next we discuss the effects of two main parameters (frequency and threshold) on occupancy.
1) Occupancy Vs Threshold:
The threshold selection is an important task for analyzing the occupancy of each time slot. We took the minimum and maximum values of power for each frequency band and tested seven values of the threshold in this range. Each band is analyzed separately for the seven values of the threshold using the four months of data. Due to limited space, only 925-960 MHz is shown in Fig. 3. It is observed that occupancy monotonically decreases as the value of the threshold increases. These results confirm that a larger threshold classifies fewer samples as occupied.
2) Occupancy Vs Frequency:
The relationship between occupancy and frequency is analyzed by computing the occupancy of the j-th bin individually. Eq. (2) can be modified for computing the occupancy of the j-th frequency bin as OC_j = (1/n) Σ_{i=1}^{n} S_i(j). We have found in Fig. 4 a unique periodicity in some bands. Four bands can be categorized as the periodic group: the 880-915 MHz, 1710-1785 MHz, 1900-1920 MHz and 2400-2500 MHz bands. The bands 925-960 MHz, 1805-1880 MHz, 1920-1980 MHz and 2110-2170 MHz do not have this property. The periodicity may be caused by the usage pattern; in particular, the periodicity of each band lies in its uplink/downlink usage. For instance, the periodic bands 1710-1785 MHz and 1900-1920 MHz are uplinks, while the aperiodic bands 1805-1880 MHz and 1920-1980 MHz are downlinks. The uplink transmits data from the mobile user to the base station, so its activity is completely determined by the mobile users' periodic usage pattern. On the other hand, the downlink transmits data from the base station to the mobile user, so its activity is also affected by control and broadcast channels, making it less periodic or non-periodic.

B. Classification Criteria
This subsection studies the choice of U_oc, L_oc, con_i and B from Section II-C, as shown in Fig. 5. We have utilized Day 1 (1-1440 min), Day 2 (1441-2448 min) and Day 5 (7200-8640 min) in the band 880-915 MHz, and four different values of the threshold γ (in dBm). The parameters U_oc and L_oc are selected via M_s, which represents the occupancy split that divides the data into the occupied and idle classes; it varies from 0.1 to 0.9 with a step size of 0.1. It is observed in Fig. 5 that the value of CA depends on the day and on the value of the threshold. The actual value of OC_i^train in (2) always lies in a certain range [L_s, U_s], where L_s represents the lowest value of OC_i^train and U_s represents the maximum value of OC_i^train. When L_s ≤ M_s ≤ U_s, the two classes P_i = 0 (available class) and P_i = 1 (occupied class) can be classified correctly. When M_s > U_s or M_s < L_s, all the samples will be classified as one class, because OC_i^train is a closed set whose values do not lie outside the range [L_s, U_s]. This explains why CA = 1 for some splitting ranges [L_oc, U_oc] while CA < 1 for others for Day 1; the classification cannot be performed when M_s > U_s or M_s < L_s. For CA < 1, there are four different choices of the threshold available. In our proposed approach, we choose the specific value of the threshold that yields the largest number of values between L_oc and U_oc. Following this, we have selected the same optimal threshold for Day 1, Day 2 and Day 5, which ensures the largest number of samples between L_oc and U_oc, together with a splitting range [L_oc, U_oc] for each of the three days. The optimal values of γ, U_oc and L_oc are further used for finding B for each day.
Following the discussion above, we compare the performance of the algorithms in this section using one month of data of the band 880-915 MHz. Our tests show that the minimum number of observations per node for DT can be selected as 17, the number of predictors for LR as 15, a normal kernel for NBC and a linear kernel for SVM. The optimal splitting range, optimal threshold and B are selected according to the data of each day.
1) Supervised vs Unsupervised Algorithms using k = 55: In Fig. 6(a), it is observed that the mean CA attained by LR, SVM, DT, NBC and HMM is 0.9257, 0.9162, 0.8483, 0.9493 and 0.4790, respectively. The mean computation time per iteration for LR, SVM, DT, NBC and HMM is 350.19, 0.092, 0.0136, 0.0045 and 0.0171 seconds, respectively. Thus, NBC is the best considering both accuracy and complexity.
2) Supervised vs Unsupervised Algorithms using k = 192: We have compared HMM, trained HMM, SVM, DT and NBC in Fig. 6(b) for 30 days. Each iteration represents one day. LR is not shown as it takes an excessively long time in this case. It is observed that the trained HMM performed better than HMM, but worse than DT, NBC and SVM. The mean CA attained by trained HMM, HMM, SVM, DT and NBC is 0.6816, 0.4887, 0.8528, 0.8392 and 0.7970, respectively, while the mean computational time per iteration for trained HMM is 0.0205 seconds.
3) SVM with Firefly Algorithm:
So far, the best overall performance is attained by the linear SVM technique. The performance of the linear SVM is affected by the value of Box_ct, as illustrated in Section IV-C. The firefly algorithm can be used to select the best value of Box_ct. We set α = 1, β = 2 and ψ_{l_1 l_2} = 1 for FFA. Fig. 7(a) shows that 'SVM+FFA' performs better than the conventional SVM in most of the cases. The mean CA attained by SVM+FFA, SVM, DT, NBC and HMM is 0.8728, 0.8499, 0.7970, 0.8392 and 0.4822, respectively.
4) Probability of SU Outage:
This probability is computed using SVM+FFA, SVM, DT, NBC and HMM and compared with the expected P(SU_outage) to compute the difference between the evaluated and expected values. It is evident in Fig. 7(b) that SVM+FFA has predicted P(SU_outage) with the minimum difference and is very close to the expected value. The expected SU outage is 0.9191 in Fig. 7(b), while the predicted P(SU_outage) using SVM+FFA, SVM, NBC, DT and HMM is 0.9264, 0.9322, 0.9638, 0.9577 and 1, respectively. The P(SU_outage) for HMM is always 1, which implies that HMM has failed to find any block of consecutive free time slots of length out_su.
5) Supervised vs Unsupervised Algorithms using different Training/Testing Data vectors:
We present a detailed comparison of the supervised and unsupervised algorithms using different sizes of training and testing data in Table I. The classification accuracy and computation time of all supervised algorithms increase with the size of the training data. SVM+FFA attains the highest CA but with the longest computation time in most cases.

REFERENCES
[1] Y. Chen, H.-S. Oh, "A survey of measurement-based spectrum occupancy modelling for cognitive radios," IEEE Communications Surveys and Tutorials, vol. PP, no. 99, pp. 1, Oct. 2014.
[2] V. Blaschke, H. Jaekel, T. Renk, C. Kloeck, F. K. Jondral, "Occupation measurements supporting dynamic spectrum allocation for cognitive radio design," Proc. CrownCom'07, pp. 50-57, Orlando, Florida, Aug. 2007.
Performance ComparisonTraining data, Testing data Technique Mean CA Mean Computational Time15 % , 85 % Decision Trees 0.7612 0.0132Support Vector Machine (SVM) 0.8945 0.0128SVM + Fire Fly Algorithm 0.9034 3.0412Hidden Markov Model 0.4925 0.0241Naive Bayesian 0.8714 0.008430 % , 70 % Decision Trees 0.8028 0.0198Support Vector Machine (SVM) 0.9143 0.0153SVM + Fire Fly Algorithm 0.9189 3.8947Hidden Markov Model 0.4841 0.0191Naive Bayesian 0.9064 0.0098TABLE IP
ERFORMANCE C OMPARISON OF F IVE ML ALGORITHMS USING DIFFERENT SIZES OF T RAINING /T ESTIG DATA .[3] S. Kaneko, S. Nomoto, T. Ueda, S. Nomura and K. Takeuchi, ” Predicting radio resource availability in cognitive radio -an experimental examination”,
CrownCom08 , Singapore, May 2008.[4] M. Hoyhtya, S. Pollin, A. Mammela, ”Classification-based predictive channel selection for cognitive radios”,
Proc. ICC'10, pp. 1-6, Cape Town, South Africa, May 2010.
[5] X. Zhou, J. Ma, Y. Li, Y. H. Kwon, A. C. K. Soong, and G. Zhao, "Probability-based transmit power control for dynamic spectrum access," Proc. DySPAN'08, pp. 1-5, Chicago, USA, Oct. 2008.
[6] X. Zhou, J. Ma, Y. Li, Y. H. Kwon, and A. C. K. Soong, "Probability-based optimization of inter-sensing duration and power control in cognitive radio," IEEE Transactions on Wireless Communications, vol. 8, pp. 4922-4927, Oct. 2009.
[7] Z. Wang and S. Salous, "Spectrum occupancy statistics and time series models for cognitive radio," Journal of Signal Processing Systems, vol. 62, Feb. 2011.
[8] C. Rudin and K. L. Wagstaff, "Machine learning for science and society," Machine Learning, vol. 95, no. 1, pp. 1-9, Nov. 2013.
[9] K. W. Choi, E. Hossain, and D. I. Kim, "Cooperative spectrum sensing under a random geometric primary user network model," IEEE Transactions on Wireless Communications, vol. 10, no. 6, June 2011.
[10] K. M. Thilina, K. W. Choi, N. Saquib, and E. Hossain, "Machine learning techniques for cooperative spectrum sensing in cognitive radio networks," IEEE Journal on Selected Areas in Communications, vol. 31, no. 11, Nov. 2013.
[11] D. Willkomm, S. Machiraju, J. Bolot, and A. Wolisz, "Primary users in cellular networks: a large-scale measurement study," pp. 1-11.
[12] V. K. Tumuluru, P. Wang, and D. Niyato, "Channel status prediction for cognitive radio networks," Wiley Wireless Communications and Mobile Computing, vol. 12, no. 10, pp. 862-874, July 2012.
[13] S. Pagadarai and A. M. Wyglinski, "A linear mixed-effects model of wireless spectrum occupancy," EURASIP Journal on Wireless Communications and Networking, Aug. 2010.
[14] Z. Xuping and P. Jianguo, "Energy-detection based spectrum sensing for cognitive radio," Proc. CCWMSN'07, pp. 944-947, Dec. 2007.
[15] A. J. Petain, "Maximizing the Utility of Radio Spectrum: Broadband Spectrum Measurements and Occupancy Model for Use by Cognitive Radio," Ph.D. dissertation, Georgia Institute of Technology, Atlanta, GA, USA, 2005.
[16] H. Zhang, "The Optimality of Naive Bayes," available online at http://courses.ischool.berkeley.edu/i290-dm/s11/SECURE/Optimality-of-Naive Bayes.pdf.
[17] L. Rokach and O. Maimon, "Decision Trees,"
Data Mining and Knowledge Discovery Handbook.
Data Mining Techniques for the Life Sciences, Methods in Molecular Biology, vol. 609, pp. 223-239, 2010.
[20] X. Yang, "Firefly algorithms for multimodal optimization," LNCS 5792, pp. 169-178, 2009.
[21] P. Blunsom, "Hidden Markov Models," Aug. 2004, available online at http://digital.cs.usu.edu/~cyan/CS7960/hmm-tutorial.pdf.
[22] D. Garrette and J. Baldridge, "Type-supervised hidden Markov models for part-of-speech tagging with incomplete tag dictionaries," Proc. EMNLP-CoNLL'12, pp. 821-831, Stroudsburg, PA, USA, 2012.
[23] Kernel (statistics), available online at http://en.wikipedia.org/wiki/Kernel_(statistics).
Fig. 2. The CDFs for the eight bands between 880-2500 MHz (probability vs. power in dBm).

Fig. 3. Occupancy vs. threshold for band 925-960 MHz (mean occupancy from 6th Feb to 18th June).

Fig. 4. Occupancy vs. spectrum frequency for (a) band 880-915 MHz, (b) band 925-960 MHz.

Fig. 5. Selection of the optimal threshold (γ) and the optimal splitting range ([U_oc, L_oc]) for determining the classification criteria of three days of data (Day 1, Day 2 and Day 5; thresholds -102, -104, -106 and -108 dBm).

Fig. 6. Performance comparison of (a) SVM, DT, NBC, LR and HMM with k = 55; (b) SVM, DT, NBC, HMM and trained HMM with k = 192.

Fig. 7. (a) Performance comparison of the ML algorithms SVM, DT, NBC, HMM and SVM+FFA using k = 192 for a set of 30 days. (b) Comparison of the expected probability of SU outage with the SU outage evaluated using SVM, DT, NBC, HMM and SVM+FFA using k = 192 for the same 30 days.
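The train/test-split methodology summarized in Table I can be illustrated with a minimal sketch. This is not the authors' code: the duty cycle, the signal and noise power levels, and the choice of a pure-Python Gaussian naive Bayesian classifier are illustrative assumptions, used only to show how a mean classification accuracy (CA) is obtained for a given training/testing split of occupancy data.

```python
# Hypothetical sketch: measure classification accuracy (CA) of a naive
# Bayesian occupancy classifier for different training/testing splits.
import math
import random

random.seed(7)

def make_samples(n):
    """Synthetic received-power readings (dBm) with ground-truth occupancy."""
    data = []
    for _ in range(n):
        occupied = random.random() < 0.4       # assumed PU duty cycle
        mean = -70.0 if occupied else -100.0   # assumed signal / noise floor
        data.append((random.gauss(mean, 6.0), occupied))
    return data

def train_nb(samples):
    """Per-class Gaussian parameters and priors for a 1-D naive Bayes model."""
    params = {}
    for label in (True, False):
        xs = [x for x, y in samples if y == label]
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        params[label] = (mu, var, len(xs) / len(samples))
    return params

def predict(params, x):
    """Pick the class with the larger log-posterior under the Gaussian model."""
    def log_post(label):
        mu, var, prior = params[label]
        return (math.log(prior) - 0.5 * math.log(2 * math.pi * var)
                - (x - mu) ** 2 / (2 * var))
    return log_post(True) > log_post(False)

def accuracy(train_frac, data):
    """Train on the first train_frac of the data, report CA on the rest."""
    cut = int(train_frac * len(data))
    model = train_nb(data[:cut])
    test = data[cut:]
    return sum(predict(model, x) == y for x, y in test) / len(test)

data = make_samples(2000)
ca15 = accuracy(0.15, data)   # 15% training, 85% testing
ca30 = accuracy(0.30, data)   # 30% training, 70% testing
print(f"CA with 15% training data: {ca15:.4f}")
print(f"CA with 30% training data: {ca30:.4f}")
```

With well-separated signal and noise levels the sketch yields a high CA for both splits; on real occupancy traces, as Table I shows, the gap between splits and between classifiers is far more pronounced.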