Distributed Wideband Spectrum Sensing
Thomas Kealy, Oliver Johnson and Robert Piechocki

Abstract—We consider the problem of reconstructing wideband frequency spectra from distributed, compressive measurements. The measurements are made by a network of nodes, each independently mixing the ambient spectra with low-frequency, random signals. The reconstruction takes place via local transmissions between nodes, each performing simple statistical operations such as ridge regression and shrinkage.
I. INTRODUCTION
There is an almost ubiquitous growing demand for mobile and wireless data, with consumers demanding faster speeds and better quality connections in more places. Consequently 4G is now being rolled out in the UK and US, with 5G being planned for 2020 and beyond [5]. However, there is a constrained amount of frequencies over which to transmit this information, and demand for frequencies that provide sufficient bandwidth, good range and in-building penetration is high.

Not all spectrum is used in all places and at all times, and judicious spectrum management, by developing approaches to use white spaces where they occur, would likely be beneficial.

Broadly, access to spectrum is managed in two complementary ways, namely through licensed and licence exempt access. Licensing authorises a particular user (or users) to access a specific frequency band. Licence exemption allows any user to access a band provided they meet certain technical requirements intended to limit the impact of interference on other spectrum users.

A licence exempt approach might be particularly suitable for managing access to white spaces. Devices seeking to access white spaces need a robust mechanism for learning of the frequencies that can be used at a particular time and location. One approach is to refer to a database, which maps the location of white spaces based on knowledge of existing spectrum users. An alternative approach is for devices to detect white spaces by monitoring spectrum use.

The advantages of spectrum monitoring [1] over persisting a database of space-frequency data are the ability of networks to make use of low-cost, low-power devices, only capable of making local (as opposed to national) communications, keeping the cost of the network low, and opportunistic channel
usage for bursty traffic, reducing channel collisions in dense networks.

(This work was supported by the Engineering and Physical Sciences Research Council [grant number EP/I028153/1]; Ofcom; and the University of Bristol. The authors would particularly like to thank Gary Clemo of Ofcom for useful discussions. Thomas Kealy is with the CDT in Communications, MVB, School of Engineering, University of Bristol, UK, [email protected]. Oliver Johnson is with the Department of Mathematics, University Walk, Bristol, University of Bristol, UK, [email protected]. Robert Piechocki is with the CSN Group, MVB, School of Engineering, University of Bristol, UK, [email protected].)

The realisation of any Cognitive Radio standard (such as IEEE 802.22 [10]) requires the co-existence of primary (TV users) and secondary (everybody else who wants to use TVWS spectrum) users of the frequency spectrum, to ensure proper interference mitigation and appropriate network behaviour. We note that whereas TVWS bands are an initial step towards dynamic spectrum access, the principles and approaches we describe are applicable to other frequency bands; in particular, they make ultra-wideband spectrum sensing possible.

The challenges of this technology are that Cognitive Radios (CRs) must sense whether spectrum is available, and must be able to detect very weak primary user signals. Furthermore, they must sense over a wide bandwidth (due to the amount of TVWS spectrum proposed), which challenges traditional Nyquist sampling techniques, because the sampling rates required are not technically feasible with current RF or Analogue-to-Digital conversion technology.

Due to the inherent sparsity of spectral utilisation, Compressive Sensing (CS) [4] is an appropriate formalism within which to tackle this problem. CS has recently emerged as a new sampling paradigm, allowing images to be taken from a single-pixel camera, for example. Applying this to wireless communication, we are able to reconstruct sparse signals at sampling rates below what would be required by Nyquist theory; for example, the works [7] and [12] detail how this sampling can be achieved.

However, even with CS, spectrum sensing from a single machine will be costly, as the proposed TVWS band covers a large frequency range (for instance, in the UK the proposed TVWS band is from 470 MHz to 790 MHz, requiring traditional sampling rates of ~1600 MHz). CS at a single sensor would still require high sampling rates.
In this paper we propose a distributed model, which allows a sensing budget at each node far below what is required by centralised CS. The main contribution of this paper is that the model can be solved in a fully distributed manner: we do not require a central fusion centre as in [13]. Moreover, we are able to show that the set of updates at each node takes closed form.

The structure of the paper is as follows: in section II we introduce the sensing model, in section IV we describe the distributed reconstruction algorithm [8], and finally in section V we show some results of the reconstruction quality of this model.

II. MODEL
We consider a radio environment with a single primary user (PU) and a network of J nodes collaboratively trying to sense and reconstruct the PU signal, either in a fully distributed manner (by local communication), or by transmitting measurements to a fusion centre which then solves the linear system. We try to sense and reconstruct a wideband signal, divided into L channels. We have a (connected) network of J (= 50) nodes placed uniformly at random within the square [0, 1] × [0, 1]. This is the same model as in [13], and the calculations which follow are taken from [13] as well.

The nodes individually take measurements (as in [7]) by mixing the incoming analogue signal x(t) with a mixing function p_i(t), aliasing the spectrum. x(t) is assumed to be bandlimited and composed of up to k uncorrelated transmissions over the L possible narrowband channels, i.e. the signal is k-sparse.

The mixing functions, which are independent for each node, are required to be periodic, with period T_p (we write f_p = 1/T_p). Since p_i is periodic it has a Fourier expansion:

p_i(t) = \sum_{l=-\infty}^{\infty} c_{il} \exp\left( \frac{2\pi j l t}{T_p} \right)   (II-.1)

The c_{il} are the Fourier coefficients of the expansion and are defined in the standard manner. The result of the mixing procedure in channel i is therefore the product x p_i, with Fourier transform (we denote the Fourier transform of x by X(\cdot)):

X_i(f) = \int_{-\infty}^{\infty} x(t)\, p_i(t)\, e^{-2\pi j f t}\, dt = \sum_{l=-\infty}^{\infty} c_{il}\, X(f - l f_p)   (II-.2)

(we insert the Fourier series for p_i, then exchange the sum and integral).
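The aliasing effect in (II-.2) can be checked numerically on a discrete grid: multiplying a tone by a sequence of period P moves its energy onto the comb of frequencies f_0 + l·(N/P). The sketch below is an illustration with hypothetical sizes, not a model of the sampling hardware of [7]:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 1024, 16          # signal length and mixing-sequence period (hypothetical)
f0 = 300                 # tone frequency, in DFT bins
fp = N // P              # spacing of the aliased copies (64 bins here)

t = np.arange(N)
x = np.exp(2j * np.pi * f0 * t / N)              # a single "transmission"
p = np.tile(rng.choice([-1.0, 1.0], P), N // P)  # periodic +/-1 mixing function

X = np.fft.fft(x * p)                            # spectrum of the mixed signal
peaks = np.flatnonzero(np.abs(X) > 1e-6 * N)     # bins carrying energy

# every peak is a shifted copy of the tone: f0 + l * fp (mod N)
assert np.all(peaks % fp == f0 % fp)
```

Because p has period P, its DFT is supported only on multiples of N/P, so the product's spectrum is exactly the sum of shifted copies predicted by (II-.2).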
The output of this mixing process is then a linear combination of shifted copies of X(f), with at most \lceil f_{NYQ}/f_p \rceil terms, since X(f) is zero outside its support (we have assumed this Nyquist frequency exists, even though we never sample at that rate). This process is repeated in parallel at each node, so that each band in x appears in baseband.

Once the mixing process has been completed, the signal in each channel is low-pass filtered and sampled at a rate f_s ≥ f_p. In the frequency domain this is an ideal rectangle function, so the output of a single channel is:

Y_i\left( e^{2\pi j f T_s} \right) = \sum_{l=-L_0}^{+L_0} c_{il}\, X(f - l f_p)   (II-.3)

since frequencies outside of [-f_s/2, f_s/2] will be filtered out. L_0 is the smallest integer such that all non-zero contributions to X(f) over [-f_s/2, f_s/2] are captured: at most \lceil f_{NYQ}/f_p \rceil, if we choose f_s = f_p. These relations can be written in matrix form as:

y = Ax + w   (II-.4)

where y contains the output of the measurement process, and A is a product matrix of the mixing functions, their Fourier coefficients, a partial Fourier matrix, and a matrix of channel coefficients; x is the vector of unknown samples of x(t). That is, A can be written:

A_{m \times L} = S_{m \times L}\, F_{L \times L}\, D_{L \times L}\, H_{L \times L}   (II-.5)

The measurements y are transmitted to a fusion centre via a control channel. The system (II-.4) can then be solved (in the sense of finding the sparse vector x) by convex optimisation, via minimising the objective function:

\|Ax - y\|_2^2 + \lambda \|x\|_1   (II-.6)

where λ is a parameter chosen to promote sparsity: larger λ means sparser x.
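A toy version of the measurement model (II-.4)–(II-.5) can be written down directly; here S holds random ±1 mixing signs, F is a Fourier matrix, D a diagonal of filter gains, and the channel matrix H is taken as the identity for simplicity (all sizes are hypothetical choices, not the paper's hardware parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
L, m, k = 201, 50, 8     # channels, measurements, active transmissions (hypothetical)

S = rng.choice([-1.0, 1.0], size=(m, L))   # mixing-function sign patterns
F = np.fft.fft(np.eye(L)) / np.sqrt(L)     # Fourier matrix
D = np.diag(rng.uniform(0.5, 1.0, L))      # filter gains
A = S @ F @ D                              # H taken as the identity here

x = np.zeros(L)
x[rng.choice(L, k, replace=False)] = rng.normal(size=k)   # k-sparse spectrum
w = 0.01 * (rng.normal(size=m) + 1j * rng.normal(size=m))
y = A @ x + w                              # the underdetermined system (II-.4)

assert A.shape == (m, L) and y.shape == (m,)
```

With m = 50 rows against L = 201 unknowns the system is heavily underdetermined, which is exactly why the sparsity-promoting objective (II-.6) is needed.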
III. ADMM

The alternating direction method of multipliers (ADMM) [3] solves problems of the form

\arg\min_{x} f(x) + g(z) \quad \text{s.t.} \quad Ux + Vz = c   (III-.7)

where f and g are assumed to be convex functions with range in \mathbb{R}, U \in \mathbb{R}^{p \times n} and V \in \mathbb{R}^{p \times m} are matrices (not assumed to have full rank), and c \in \mathbb{R}^p.

ADMM consists of iteratively minimising the augmented Lagrangian

L_\rho(x, z, \eta) = f(x) + g(z) + \eta^T (Ux + Vz - c) + \frac{\rho}{2}\|Ux + Vz - c\|_2^2

(η is a Lagrange multiplier, and ρ is a parameter we can choose to make g(z) smooth [9]), with the following iterations:

x^{k+1} := \arg\min_x L_\rho(x, z^k, \eta^k)   (III-.8)
z^{k+1} := \arg\min_z L_\rho(x^{k+1}, z, \eta^k)   (III-.9)
\eta^{k+1} := \eta^k + \rho\left( Ux^{k+1} + Vz^{k+1} - c \right)   (III-.10)

The alternating minimisation works because of the decomposability of the objective function: the x minimisation step is independent of the z minimisation step, and vice versa. We illustrate an example relevant to the type of problems encountered in signal processing.

A. Example: ADMM for Centralised LASSO
ADMM can be formulated as an iterative MAP estimation procedure for the problem (which is referred to as LASSO, see [11]):

\frac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1   (III-A.11)

This can be cast in constrained form as:

\frac{1}{2}\|Ax - b\|_2^2 + \lambda \|z\|_1   (III-A.12)
\text{s.t. } z = x   (III-A.13)

i.e. this is of the form (III-.7) with f(x) = \frac{1}{2}\|Ax - b\|_2^2, g(z) = \lambda\|z\|_1, U = I, V = -I, and c = 0. The associated Lagrangian is:

L_\rho = \frac{1}{2}\|Ax - b\|_2^2 + \lambda\|z\|_1 + \eta^T(x - z) + \frac{\rho}{2}\|x - z\|_2^2   (III-A.14)

Now, given a set of noisy measurements (say of radio spectra) b, and a sensing matrix A, we can use ADMM to find the (sparse) radio spectra. The ADMM iterations for LASSO, which can be found by alternately differentiating (III-A.14) with respect to x, z and η, are (in closed form):

x^{k+1} := \left( A^T A + \rho I \right)^{-1}\left( A^T b + \rho z^k - \eta^k \right)   (III-A.15)
z^{k+1} := S_{\lambda/\rho}\left( x^{k+1} + \eta^k/\rho \right)   (III-A.16)
\eta^{k+1} := \eta^k + \rho\left( x^{k+1} - z^{k+1} \right)   (III-A.17)

where S_{\lambda/\rho}(\cdot) is the soft thresholding operator: S_\gamma(x)_i = \mathrm{sign}(x_i)\left( |x_i| - \gamma \right)_+.

This algorithm has a nice statistical interpretation: it iteratively performs ridge regression, followed by shrinkage towards zero. This is the MAP estimate for x under a Laplace prior.

The soft-thresholding operator can be derived by considering the MAP estimate of the following model:

y = x + w   (III-A.18)

where x is some (sparse) signal, and w is additive white Gaussian noise. We seek

\hat{x} = \arg\max_x P_{x|y}(x|y)   (III-A.19)

This can be recast in the following form by using Bayes' rule, noting that the denominator is independent of x, and taking logarithms:

\hat{x} = \arg\max_x \left[ \log P_w(y - x) + \log P(x) \right]   (III-A.20)

The term P_w(y - x) arises because we are considering x + w with w zero-mean Gaussian with variance \sigma_n^2.
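The closed-form iterations (III-A.15)–(III-A.17) amount to a few lines of linear algebra. The sketch below assumes a random Gaussian A and a synthetic sparse signal; the sizes and the ρ, λ values are hypothetical choices, not tuned parameters:

```python
import numpy as np

def soft_threshold(v, gamma):
    """S_gamma(v)_i = sign(v_i) * max(|v_i| - gamma, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """Minimise 0.5*||Ax - b||^2 + lam*||x||_1 via (III-A.15)-(III-A.17)."""
    n = A.shape[1]
    z = np.zeros(n)
    eta = np.zeros(n)
    # the ridge-regression system matrix is fixed, so invert it once
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))
    for _ in range(iters):
        x = M @ (A.T @ b + rho * z - eta)             # ridge regression step
        z = soft_threshold(x + eta / rho, lam / rho)  # shrinkage step
        eta = eta + rho * (x - z)                     # dual ascent step
    return z

# small synthetic test: recover a sparse vector from noisy projections
rng = np.random.default_rng(2)
n, m, k = 100, 40, 5
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k)
b = A @ x_true + 0.01 * rng.normal(size=m)

x_hat = admm_lasso(A, b, lam=0.05)
```

The inverse (A^T A + ρI)^{-1} is computed once outside the loop, which is what makes the per-iteration cost of the x-step cheap.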
The conditional distribution of y given x is therefore a Gaussian centred at x. We will take P(x) to be a Laplace distribution:

P(x) = \frac{1}{\sqrt{2}\,\sigma} \exp\left( -\frac{\sqrt{2}}{\sigma}|x| \right)   (III-A.21)

Note that f(x) = \log P(x) = -\frac{\sqrt{2}}{\sigma}|x| + \text{const}, and so by differentiating, f'(x) = -\frac{\sqrt{2}}{\sigma}\,\mathrm{sign}(x). Taking the maximum of (III-A.20), we obtain:

\frac{y - \hat{x}}{\sigma_n^2} - \frac{\sqrt{2}}{\sigma}\,\mathrm{sign}(\hat{x}) = 0   (III-A.22)

which leads to the soft thresholding operation defined earlier, with \gamma = \frac{\sqrt{2}\,\sigma_n^2}{\sigma}, since (via rearrangement):

y = \hat{x} + \frac{\sqrt{2}\,\sigma_n^2}{\sigma}\,\mathrm{sign}(\hat{x}) \quad \text{or} \quad \hat{x}(y) = \mathrm{sign}(y)\left( |y| - \frac{\sqrt{2}\,\sigma_n^2}{\sigma} \right)_+

i.e. S_\gamma(y).

IV. CONSTRAINED OPTIMISATION ON GRAPHS
We model the network as an undirected graph G = (V, E), where V = {1 ... J} is the set of vertices, and E ⊆ V × V is the set of edges. An edge between nodes i and j implies that the two nodes can communicate. The set of nodes that node i can communicate with is written N_i, and the degree of node i is D_i = |N_i|.

Individually, nodes make the following measurements:

y_p = A_p x + n_p   (IV-.23)

where A_p is the p-th row of the sensing matrix from (II-.4), and the system (II-.4) is formed by concatenating the individual nodes' measurements together.

We assume that a proper colouring of the graph is available: that is, each node is assigned a number from a set C = {1 ... c}, and no node shares a colour with any neighbour.

To find the x we are seeking, we give each node a copy of x, written x_p, and we constrain the copies to be identical across all edges in the network. We can write the combined optimisation variable as \bar{x}, which collects together C copies of an n × 1 vector x: Definition 1.
We define vectors x_c, where c = 1, ..., C, and write the vector of length nJ:

\bar{x} = \sum_{c=1}^{C} w_c \otimes x_c = \left[ x_{c(1)}^T, \ldots, x_{c(J)}^T \right]^T   (IV-.24)

where w_c(i) = I(c(i) = c), I is the indicator function, and we have written c(i) for the colour of the i-th node.

The problem then is to solve:

\arg\min_{\bar{x}} \sum_{c=1}^{C} \sum_{j \in c} \left( \|A_j x_j - y_j\|_2^2 + \frac{\lambda}{J}\|x_j\|_1 \right)
\text{s.t. } x_i = x_j \text{ if } \{i, j\} \in E, \text{ and } x_i = z_i \ \forall i \in \{1, \ldots, C\}   (IV-.25)

These constraints can be written more compactly by introducing the node-arc incidence matrix B: a |V| by |E| matrix where each column is associated with an edge (i, j) ∈ E, and has +1 and −1 in the i-th and j-th entries respectively. Figures (IV.1) and (IV.2) show examples of a network and its associated incidence matrix.

Definition 2.

u := \left( B^T \otimes I_n \right)\bar{x} = \left( B^T \otimes I_n \right) \sum_{c=1}^{C} w_c \otimes x_c = \sum_{c=1}^{C} B_c^T \otimes x_c

where we have used the definition (IV-.24) in the second equality, the property of Kronecker products (A \otimes C)(B \otimes D) = (AB \otimes CD) between the second and third, and we write B_c = w_c^T B.

The constraint x_i = x_j if \{i, j\} \in E can now be written:

\sum_{c=1}^{C} \left( B_c^T \otimes I_n \right)\bar{x}_c = 0   (IV-.26)

Note that \left( B^T \otimes I_n \right) \in \mathbb{R}^{nE \times nJ}. Together, (IV-.24) and (IV-.26) suggest that the problem (IV-.25) can be re-written as:

\arg\min_{\bar{x}} \sum_{c=1}^{C} \sum_{j \in C_c} \|A_j x_j - y_j\|_2^2 + \beta\|z_j\|_1
\text{s.t. } \sum_{c=1}^{C} \left( B_c^T \otimes I_n \right)\bar{x}_c = 0 \text{ and } \bar{x}_c - \bar{z}_c = 0   (IV-.27)

where \beta = \lambda/J.

Figure IV.1. An example of a network
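The incidence-matrix construction, and the fact (used in the proof of Lemma IV.1) that BB^T is the graph Laplacian, can be checked on a small example; the 4-node path graph here is a hypothetical stand-in for a sensing network:

```python
import numpy as np

# edges of a small example graph (a 4-node path: 0-1, 1-2, 2-3)
J = 4
edges = [(0, 1), (1, 2), (2, 3)]

# node-arc incidence matrix: one column per edge, +1 at node i, -1 at node j
B = np.zeros((J, len(edges)))
for col, (i, j) in enumerate(edges):
    B[i, col] = 1.0
    B[j, col] = -1.0

L_graph = B @ B.T   # should equal the graph Laplacian: degrees minus adjacency

degrees = np.diag([1.0, 2.0, 2.0, 1.0])
adjacency = np.zeros((J, J))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

assert np.array_equal(L_graph, degrees - adjacency)

# the consensus constraint (B^T kron I_n) xbar = 0 holds iff all copies agree
n = 3
x_copy = np.tile(np.array([1.0, -2.0, 0.5]), J)   # identical copies, stacked
assert np.allclose(np.kron(B.T, np.eye(n)) @ x_copy, 0.0)
```

Each block row of (B^T ⊗ I_n) computes x_i − x_j for one edge, so the stacked constraint vanishes exactly when every pair of neighbouring copies is equal.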
The Augmented Lagrangian for the problem (IV-.27) canbe written down as:
Figure IV.2. The incidence matrix associated with Figure (IV.1)

L_\rho = \sum_{c=1}^{C} \left( \sum_{j \in c} \left( \|A_j x_j - y_j\|_2^2 + \beta\|z_j\|_1 \right) + \eta^T \left( B_c^T \otimes I_n \right)\bar{x}_c + \frac{\rho}{2}\|\bar{x}_c - \bar{z}_c\|_2^2 + \theta^T\left( \bar{x}_c - \bar{z}_c \right) \right) + \frac{\rho}{2}\left\| \sum_{c=1}^{C} \left( B_c^T \otimes I_n \right)\bar{x}_c \right\|_2^2   (IV-.28)

The term \frac{\rho}{2}\|\sum_{c=1}^{C} (B_c^T \otimes I_n)\bar{x}_c\|_2^2 of (IV-.28) can be decomposed using the following lemma:

Lemma IV.1.

\left\| \sum_{c=1}^{C} \left( B_c^T \otimes I_n \right)\bar{x}_c \right\|_2^2 = \sum_{j} \left( D_j\|x_j\|_2^2 - \sum_{k \in N_j} x_j^T x_k \right)   (IV-.29)

and

\eta^T \sum_{c=1}^{C} \left( B_c^T \otimes I_n \right)\bar{x} = \sum_{l \in C_c} \sum_{m \in N_l} \mathrm{sign}(m - l)\, \eta_{ml}^T x_l   (IV-.30)

where η is decomposed edge-wise: η = (..., η_{ij}, ...), such that η_{i,j} = η_{j,i}, and η_{ij} is associated with the constraint x_i = x_j.

Proof.

u^T u = \sum_{c=1}^{C} \sum_{c'=1}^{C} \left( B_c \otimes x_c^T \right)\left( B_{c'}^T \otimes x_{c'} \right) = \sum_{c, c'} B_c B_{c'}^T \otimes x_c^T x_{c'}

BB^T is a J × J matrix with the degrees of the nodes on the main diagonal, and −1 in position (i, j) if nodes i and j are neighbours (i.e. BB^T is the graph Laplacian). Hence, since we can write B_c B_{c'}^T = w_c^T BB^T w_{c'}, the trace of B_c B_c^T is simply the sum of the degrees of the nodes with colour c. For c ≠ c', B_c B_{c'}^T corresponds to an off-diagonal block of the graph Laplacian, and so counts how many neighbours each node of colour c has in colour c'.

Finally, note that η ∈ \mathbb{R}^{nE} and can be written:

\eta = \sum_{c=1}^{C} w_c \otimes \eta_c   (IV-.31)

where η_c is the vector of Lagrange multipliers associated with edges from colour c.
Now

\eta^T u = \sum_{c=1}^{C} \sum_{c'=1}^{C} w_c^T B w_{c'} \otimes \eta_c^T x_{c'}

by the properties of Kronecker products and the definition of B_c. For c = c', η^T u is zero, as there are no edges between nodes of the same colour, by definition. For c ≠ c', η^T u counts the edges from c to c', with the consideration that the edges from c' to c are counted with opposite parity.

Adding this together with the lemma lets us write (IV-.28) as:

L_\rho = \sum_{c=1}^{C} \sum_{j \in C_c} \left( \|A_j x_j - y_j\|_2^2 + \beta\|z_j\|_1 + \nu^T x_j + \frac{\rho}{2} D_j \|x_j\|_2^2 + \frac{\rho}{2}\|x_j - z_j\|_2^2 \right)   (IV-.32)

where we have defined:

\nu_i = \sum_{k \in N_i} \left( \mathrm{sign}(k - i)\, \eta_{\{i,k\}} - \rho x_k \right)   (IV-.33)

This is a rescaled version of the Lagrange multiplier η which respects the graph structure. Then, by differentiating (IV-.32) with respect to x_j and z_j, we can find closed forms for the updates:

Theorem 1.

x_j^{k+1} := \left( A_j^T A_j + (\rho D_j + 1) I \right)^{-1}\left( A_j^T y_j + z_j^k - \nu_j^k \right)   (IV-.34)
z_j^{k+1} := S_{\beta/\rho}\left( x_j^{k+1} \right)   (IV-.35)
\theta_j^{k+1} := \theta_j^k + \rho\left( x_j^{k+1} - z_j^{k+1} \right)   (IV-.36)
\eta_j^{k+1} := \eta_j^k + \rho \sum_{m \in N_j} \left( z_m^k - z_j^k \right)   (IV-.37)

V. RESULTS
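The experiments below iterate per-node updates of this closed-form type. As a much-simplified illustration, the sketch below runs a global-consensus ADMM LASSO, in which each node performs ridge regression on its own rows of the system and network communication is idealised as an exact mean; this is a hypothetical simplification for exposition, not the edge-coloured scheme of Theorem 1:

```python
import numpy as np

rng = np.random.default_rng(3)
n, J, rows, k = 60, 10, 4, 5          # signal size, nodes, rows per node, sparsity
lam, rho, iters = 0.05, 1.0, 500      # hypothetical, untuned parameters

# ground truth and per-node measurements y_j = A_j x + n_j, as in (IV-.23)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k)
A = [rng.normal(size=(rows, n)) / np.sqrt(rows) for _ in range(J)]
y = [A_j @ x_true + 0.01 * rng.normal(size=rows) for A_j in A]

def soft(v, g):
    return np.sign(v) * np.maximum(np.abs(v) - g, 0.0)

# per-node states: local estimate x_j and scaled dual u_j; shared variable z
x = np.zeros((J, n))
u = np.zeros((J, n))
z = np.zeros(n)
M = [np.linalg.inv(A_j.T @ A_j + rho * np.eye(n)) for A_j in A]

for _ in range(iters):
    for j in range(J):                 # local ridge-regression steps
        x[j] = M[j] @ (A[j].T @ y[j] + rho * (z - u[j]))
    z = soft((x + u).mean(axis=0), lam / (J * rho))   # averaging + shrinkage
    u += x - z                         # local dual updates

rel_err = np.linalg.norm(z - x_true) / np.linalg.norm(x_true)
```

Each node only ever touches its own (A_j, y_j); in the paper's scheme the global mean is replaced by exchanges between graph neighbours, scheduled by colour.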
The model described in section II, equation (II-.4), was simulated with a wideband signal of 201 channels and a network of 50 nodes (i.e. the signal is sampled at 1/4 of the rate predicted by Nyquist theory). The mixing patterns were generated from i.i.d. Gaussian sources (i.e. the matrix S had each entry drawn from an i.i.d. Gaussian source). Monte Carlo simulations were performed at SNR values ranging from 5 to 20, and the expected Mean Squared Error (MSE) of solutions of a centralised solver (spgl1) and a distributed solver (ADMM) was calculated over 10 simulations per SNR value. The results can be seen in Figure (V.4). The MSE was calculated as follows:

\frac{\|Z^k - Z^*\|_2^2}{\|Z^*\|_2^2}   (V-.38)

where Z^k is the result of the algorithm at iteration k, and Z^* is the optimal solution.

These results indicate that for both centralised and distributed solvers, adding noise to the system degrades performance. Interestingly, the distributed solver seems to (slightly) outperform the centralised solver at all SNRs. This is counter-intuitive, as it would be expected that a centralised solver, knowing all the available information, would outperform distributed solutions. We conjecture that the updates described in section IV take into account differences in noise across the network: the distributed averaging steps, which form the new prior for each node, then penalise updates from relatively more noisy observations. This corroborates observations from [2].

This observation is (partially) confirmed in Figure (V.6), which plots the progress of the centralised and distributed solvers (as a function of iterations) towards the optimum solution. The SNR is 0.5 (i.e. the noise is twice as strong as the signal). Note that after around 300 iterations, the MSE of the distributed solver is consistently below that of the centralised solver.

Figure V.3.
MSE vs SNR for the sensing model, with AWGN only, showing the performance of distributed and centralised solvers
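The metric (V-.38) is a relative squared error; a minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def relative_mse(Z_k, Z_star):
    """Relative squared error ||Z_k - Z*||^2 / ||Z*||^2, as in (V-.38)."""
    return np.linalg.norm(Z_k - Z_star) ** 2 / np.linalg.norm(Z_star) ** 2

# sanity checks: zero error at the optimum, and 1.0 for the all-zero estimate
assert relative_mse(np.array([1.0, 0.0]), np.array([1.0, 0.0])) == 0.0
assert relative_mse(np.array([0.0, 0.0]), np.array([2.0, 0.0])) == 1.0
```

Normalising by ‖Z*‖² makes curves at different SNRs directly comparable.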
VI. CONCLUSIONS
We have demonstrated an alternating direction algorithm for distributed optimisation with closed forms for the computation at each step, and discussed the statistical properties of the estimation. We have simulated the performance of this distributed algorithm for the distributed estimation of frequency spectra, in the presence of additive (white, Gaussian) and multiplicative (frequency-flat) noise. We have shown that the algorithm is robust to a variety of SNRs and converges to the same solution as an equivalent centralised algorithm (in relative mean squared error).

Figure V.4. MSE vs SNR for the sensing model, showing the performance of distributed and centralised solvers

Figure V.5. The progress of the distributed solver as a function of the number of iterations, with different values of the regression parameter λ

We plan to work on larger, more detailed models for the frequency spectra, and to accelerate the convergence via Nesterov-type methods to smooth the convergence of the distributed algorithm [6]. Specifically, we seek to dampen the ringing seen in Figure V.6.
REFERENCES

Figure V.6. The progress of a distributed (blue) and a centralised (green) solver as a function of the number of iterations. The value of λ = 0.

[1] O. B. Akan, O. Karli, and O. Ergul, "Cognitive radio sensor networks," IEEE Network, vol. 23, no. 4, pp. 34–40, 2009.
[2] J. A. Bazerque and G. B. Giannakis, "Distributed spectrum sensing for cognitive radios by exploiting sparsity," Proc. of 42nd Asilomar Conf. on Signals, Systems, and Computers, 2008.
[3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[4] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[5] E. Dahlman, "5G wireless access: Requirements and realization," IEEE Communications Magazine, vol. 52, no. 12, pp. 42–47, 2014.
[6] T. Goldstein, B. O'Donoghue, S. Setzer, and R. Baraniuk, "Fast alternating direction optimization methods," SIAM Journal on Imaging Sciences, vol. 7, no. 3, pp. 1588–1623, 2014.
[7] M. Mishali and Y. C. Eldar, "From theory to practice: Sub-Nyquist sampling of sparse wideband analog signals," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 375–391, 2010.
[8] J. F. Mota, J. M. Xavier, P. M. Aguiar, and M. Puschel, "D-ADMM: A communication-efficient distributed algorithm for separable optimization," IEEE Transactions on Signal Processing, vol. 61, no. 10, pp. 2718–2723, 2013.
[9] Y. Nesterov, "Smooth minimization of non-smooth functions," Mathematical Programming, vol. 103, no. 1, pp. 127–152, 2005.
[10] C. R. Stevenson, G. Chouinard, Z. Lei, W. Hu, S. J. Shellhammer, and W. Caldwell, "IEEE 802.22: The first cognitive radio wireless regional area network standard," IEEE Communications Magazine, vol. 47, no. 1, pp. 130–138, 2009.
[11] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), pp. 267–288, 1996.
[12] J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk, "Beyond Nyquist: Efficient sampling of sparse bandlimited signals," IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 520–544, 2010.
[13] H. Zhang, Z. Zhang, and Y. Chau, "Distributed compressed wideband sensing in cognitive radio sensor networks," in 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2011.