A New Algorithm for Distributed Nonparametric Sequential Detection
Shouvik Ganguly, K. R. Sahasranand, Vinod Sharma
Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India
Email: [email protected], {sanandkr, vinod}@ece.iisc.ernet.in

ABSTRACT
We consider the nonparametric sequential hypothesis testing problem in which the distribution under the null hypothesis is fully known, while the alternate hypothesis corresponds to some other unknown distribution satisfying a few loose constraints. We propose a simple algorithm to address this problem. It is also generalized to the case when the distribution under the null hypothesis is not fully known. These problems are primarily motivated by wireless sensor networks and spectrum sensing in Cognitive Radios. A decentralized version utilizing spatial diversity is also proposed. Its performance is analysed and asymptotic properties are proved. The simulated and analysed performance of the algorithm is shown to be better than that of an earlier algorithm addressing the same problem under similar assumptions. We also modify the algorithm to optimize performance when information about the prior probabilities of occurrence of the two hypotheses is known.
Keywords: Sequential detection, distributed detection, asymptotic performance.

I. INTRODUCTION
Presently there is a scarcity of spectrum due to the proliferation of wireless services. However, it has been observed that much of the licensed spectrum remains underutilised most of the time. Cognitive Radios (CRs) have been proposed as a solution to this problem ([1]). These are designed to exploit the unutilised spectrum for their communication without causing interference to the primary users. This is achieved through spectrum sensing by the CRs, to gain knowledge about spectrum usage by the primary users. For CRs, spectrum sensing needs to be achieved at very low SNR in a wireless channel ([1]). Distributed detection, which can mitigate time-varying fading, shadowing and electromagnetic interference, is well suited for this application. Thus distributed detection has been a highly studied topic recently ([2], [1]). It has also found applications in sensor networks ([3], [4]).
Distributed detection problems can be considered in a centralised or a decentralised framework ([5]). In a centralised algorithm, the information collected by the local nodes is transmitted directly to the fusion centre, which then treats it as a usual detection problem. In a decentralised algorithm, the local nodes transmit certain quantized values (or local decisions) to the fusion node. This has the advantage of requiring less power and bandwidth in transmission, but is suboptimal since the fusion centre has to take a decision based on less information. (This work was partially supported by a grant from ANRC.)
Distributed detection problems can also be classified as fixed sample size or sequential ([6], [7]). In a fixed sample size framework, the decision has to be made based on a fixed number of samples, and a likelihood ratio test turns out to be optimal for a simple binary hypothesis problem. In a sequential framework, samples are taken until some conditions are fulfilled, and once the process of taking samples has stopped, a decision is arrived at.
It is known that, in the case of a single node, the Sequential Probability Ratio Test (SPRT) outperforms other sequential or fixed sample size detectors for the simple binary hypothesis problem ([8]). But optimal solutions in the decentralised setup are not available ([9]). In the parametric case, with full knowledge of the distributions, [10] proposes an asymptotically optimal decentralised sequential test when the communication channel between the local nodes and the FC is perfect. For the nonparametric sequential setup, [11] provides separate algorithms for different problems such as changes in mean, variance, etc.
[12] and [13] have studied the distributed decentralised detection problem in a sequential framework, with a noisy reporting MAC. The algorithm in [12] requires complete knowledge of the probability distributions involved and is thus parametric in nature. The approach in [13] is nonparametric in the sense that it assumes very little knowledge of one of the distributions. In this paper, we present a simpler algorithm to address the problem studied in [13]. Our algorithm has the added advantage of better performance in most cases, as borne out by simulations and analysis.
The paper is organized as follows. Section II presents the system model and the algorithm. Section III provides theoretical analysis of the algorithm for the single node case. Section IV provides an approximate analysis of the distributed algorithm. The asymptotics of the distributed algorithm are studied in Section V. Section VI compares our algorithm to KTSLRT in [13]. Section VII provides a generalization of our algorithm along with an explanation in the CR setup.

II. SYSTEM MODEL
There are L nodes and one fusion centre. Node l makes observation X_{k,l} at time k. We assume {X_{k,l}, k >= 1} are i.i.d. (independent, identically distributed). We also assume that the observations received by different nodes are independent of each other. The distribution of the observations at each node is either P0 or P1. Each local node makes a decision based on the observations it receives and conveys the decision to the fusion node. The fusion node makes the final decision based on the local decisions it receives. The decision to be made is H0 if the probability distribution is P0, and H1 if the probability distribution is P1.
We assume that P0 is known, but that P1 belongs to the family {P1 : D(P1||P0) >= λ and H(P1) >= H(P0)}, where D(P1||P0) is the divergence E_{P1}[log (P1(X)/P0(X))] and H(P) is the entropy, or differential entropy, of the distribution P. The distribution P1 can be different for different nodes, allowing for different fading gains.
Our motivation for this setup is the Cognitive Radio (CR) system. A CR node has to detect whether a channel is free (the primary node is not transmitting) or not. When the channel is free, the observation X_{k,l} is the receiver noise with distribution P0. This will often be known (even Gaussian), and hence it is reasonable to assume that P0 is known (see, however, the generalization in Section VII). But when the primary is transmitting, it could be using adaptive modulation and coding, unknown to the secondary node, and even the fading of the wireless channel from the primary transmitter to the local CR node may be time-varying and not known to the receiver. This leads to an unknown distribution P1 under H1.
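For intuition, the class constraint for H1 can be checked in closed form when both distributions are Gaussian. The functions and the numerical values below are our own illustrative sketch, not part of the paper.

```python
import math

# Hypothetical check that a candidate alternative P1 belongs to the class
# {P1 : D(P1||P0) >= lambda, H(P1) >= H(P0)}, using the standard closed
# forms for Gaussians; lambda and the parameters are illustrative.

def kl_gauss(m1, v1, m0, v0):
    """D(N(m1,v1) || N(m0,v0)) in nats."""
    return 0.5 * (v1 / v0 + (m1 - m0) ** 2 / v0 - 1.0 + math.log(v0 / v1))

def diff_entropy_gauss(v):
    """Differential entropy of a Gaussian with variance v, in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * v)

lam = 0.4                 # illustrative lambda
P0 = (0.0, 1.0)           # N(0, 1) under H0
P1 = (1.0, 1.0)           # candidate N(1, 1) under H1

d = kl_gauss(P1[0], P1[1], P0[0], P0[1])   # = 0.5
in_class = d >= lam and diff_entropy_gauss(P1[1]) >= diff_entropy_gauss(P0[1])
print(d, in_class)        # 0.5 True
```

With equal variances the entropies coincide, so membership reduces to the divergence condition D(P1||P0) = 0.5 >= λ.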
We will elaborate in Section VII how this scenario can lead to the above class of distributions qualified for H1. We have chosen the sequential framework for detection at the local nodes as well as at the FC because it provides faster decisions on the average, for any given probabilities of error.
Our detection algorithm works as follows. Each local node l, on receiving X_{k,l} at time k, computes W_{k,l} as

W_{k,l} = W_{k-1,l} - log P0(X_{k,l}) - H(P0) - λ/2,  W_{0,l} = 0.  (1)

If W_{k,l} >= -log α_l, node l decides H1; if W_{k,l} <= log β_l, it decides H0; otherwise it waits for the next observation. If at time k node l has decided H1, it transmits this decision to the FC by transmitting b1; if it has decided H0, it transmits b0; otherwise it sends nothing (i.e., 0). In the algorithm, α_l, β_l, b0 and b1 are constants appropriately chosen so as to provide good system performance.
Let Y_{k,l} be the transmission from node l to the FC at time k. The FC receives from the local nodes at time k,

Y_k = sum_{l=1}^{L} Y_{k,l} + Z_k,

where Z_k is the FC receiver noise. We will assume {Z_k} to be i.i.d. Because of Z_k, the FC does not directly know the local decisions of the nodes. Thus it cannot use the majority rule, AND-rule, etc., usually used in the literature. An advantage of allowing all nodes to transmit at the same time is that it reduces transmission delays from the nodes to the FC.
The nodes keep transmitting to the FC till the FC makes its decision. Once the FC makes the decision, it will broadcast a message to all the local nodes to stop transmission.
The local nodes make their decisions at random times. Thus the {Y_k} received by the FC are not i.i.d. However, inspired by sequential detection algorithms (e.g., SPRT as in [8]), the FC uses the following algorithm to make decisions. At time k, it computes

F_k = F_{k-1} + log (g_{μ1}(Y_k) / g_{μ0}(Y_k)),  F_0 = 0.

After that, it decides H0 if F_k <= log β, decides H1 if F_k >= -log α, and waits for the next observation otherwise. Here the distributions g_{μ0} and g_{μ1} are appropriately chosen.
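As a concrete end-to-end illustration of the scheme just described, the sketch below simulates the local statistic (1) and the FC statistic F_k for Gaussian observations (P0 = N(0,1), P1 = N(1,1), so D(P1||P0) = 1/2 and H(P1) = H(P0)). All numerical choices (λ, thresholds, b, μ, the noise variance) are ours, made only so the sketch runs; the paper does not fix them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (ours, not the paper's).
L, lam, b, mu, sigma_fc = 5, 0.4, 1.0, 2.0, 1.0
thr_loc, thr_fc = 3.0, 30.0            # |log alpha_l| and |log alpha| (betas equal)

h_p0 = 0.5 * np.log(2 * np.pi * np.e)  # differential entropy of P0 = N(0,1)

def log_p0(x):                         # log density of P0 = N(0,1)
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

def run(h1, max_k=100000):
    W = np.zeros(L)                    # local statistics W_{k,l}
    dec = np.zeros(L)                  # latched local decisions (+1 for H1, -1 for H0)
    F = 0.0                            # FC statistic F_k
    for k in range(1, max_k):
        X = rng.normal(1.0 if h1 else 0.0, 1.0, size=L)   # observations
        W += -log_p0(X) - h_p0 - lam / 2.0                # update (1)
        dec[(dec == 0) & (W >= thr_loc)] = 1.0            # node decides H1
        dec[(dec == 0) & (W <= -thr_loc)] = -1.0          # node decides H0
        Y = b * dec.sum() + rng.normal(0.0, sigma_fc)     # noisy MAC output
        F += 2.0 * mu * Y / sigma_fc ** 2  # LLR of N(mu,s^2) vs N(-mu,s^2) at Y
        if F >= thr_fc:
            return "H1", k
        if F <= -thr_fc:
            return "H0", k
    return "?", max_k

print(run(h1=True))    # (decision, stopping time)
print(run(h1=False))
```

Under H1 the local increment has positive mean D(P1||P0) + H(P1) - H(P0) - λ/2 = 0.3, so the W_{k,l} cross the upper threshold; the FC statistic then drifts up at roughly 2μLb/σ² per sample until it crosses its own threshold.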
We assume that the distribution of the noise {Z_k} is known. Then g_{μ0} is the distribution of μ0 + Z_k, and g_{μ1} is the distribution of μ1 + Z_k, where μ0 and μ1 are constants. By choosing μ0 and μ1 appropriately, we can ensure that the FC makes a decision which is (say) close to a majority decision of the local nodes.
The overall algorithm is summarized as:
i) Node l receives X_{k,l} at time k >= 1 and computes W_{k,l}.
ii) Node l transmits Y_{k,l} = b1 1{W_{k,l} >= -log α_l} + b0 1{W_{k,l} <= log β_l}.
iii) The fusion node receives, at time k, Y_k = sum_{l=1}^{L} Y_{k,l} + Z_k.
iv) The fusion node computes F_k = F_{k-1} + log (g_{μ1}(Y_k)/g_{μ0}(Y_k)), F_0 = 0.
v) The fusion node decides H0 if F_k <= log β, and H1 if F_k >= -log α; otherwise it waits for the next observation.
In the rest of the paper we analyze the performance of this algorithm. First we analyze the performance for a single (local) node in Section III and then for the decentralised algorithm in Section IV. Proofs of the lemmas and theorems provided are similar to those in [12] and [13] and are skipped in this paper for lack of space. We will show that the present algorithm provides better performance than [13]. Also, unlike [13], it is simpler to implement because it requires neither a universal source coder nor quantization of the observations.
In the following, E_i[.] and P_i(.) denote the expectation and probability, respectively, under H_i, i = 0, 1.

III. PERFORMANCE ANALYSIS FOR A SINGLE NODE
In this section we provide the performance of the test for a single node. We will omit the node index l in this section. Let

N1 ≜ inf{n : W_n > -log α},  N0 ≜ inf{n : W_n < log β},  N ≜ min(N0, N1).

Lemma 3.1. P(N < ∞) = 1 under H0 and H1. □
We will use the notation P_FA ≜ P0(decide H1) and P_MD ≜ P1(decide H0). Also, f(x) = O[g(x)] denotes lim sup_{x→∞} f(x)/g(x) < ∞.
Theorem 3.2.
a) P_FA = O(α^s), where s > 0 is a solution of E0[e^{-s(H(P0) + log P0(X_k) + λ/2 - ε)}] = 1, with 0 < ε < λ/2.
b) P_MD = O(β^{s*}), where s* > 0 is a solution of E1[e^{-s*(-H(P0) - log P0(X_k) - λ/2 + ε)}] = 1, with 0 < ε < D(P1||P0) + H(P1) - H(P0) - λ/2. □
Let

N*(ε) ≜ sup{n >= 1 : |-log P0(x^n) - n H(P0)| > nε},
Ñ*(ε) ≜ sup{n >= 1 : |-log P0(x^n) - n H(P1) - n D(P1||P0)| > nε}.

Theorem 3.3.
a) Under H0, lim_{α,β→0} N/|log β| = 2/λ a.s. If, in addition, E0[(N*(ε))^p] < ∞ and E0[(log P0(X))^{p+1}] < ∞ for all ε > 0 and for some p >= 1, then lim_{α,β→0} E0[N^q]/|log β|^q = (2/λ)^q for all 0 < q <= p.
b) Under H1, lim_{α,β→0} N/|log α| = 1/(D(P1||P0) + H(P1) - H(P0) - λ/2) a.s. If, in addition, E1[(Ñ*(ε))^p] < ∞ and E1[(log P0(X))^{p+1}] < ∞ for all ε > 0 and for some p >= 1, then lim_{α,β→0} E1[N^q]/|log α|^q = (D(P1||P0) + H(P1) - H(P0) - λ/2)^{-q} for all 0 < q <= p. □
From Lemma 3.1 and Theorems 3.2 and 3.3 we see that the asymptotic behaviour of our algorithm is comparable to that in [13] and also to dualSPRT ([12]). However, we will see via simulations that it substantially outperforms KTSLRT in [13].

IV. APPROXIMATE PERFORMANCE OF THE DECENTRALISED ALGORITHM
In the following we take, for convenience, α_l = β_l for all l, α = β, b1 = -b0 = b, and μ1 = -μ0 = μ = I b, for some 1 <= I <= L. Roughly speaking, this ensures that the FC decides H1 when at least I more nodes decide H1 than decide H0, and similarly for H0. In the following, N_{i,l} corresponds to N_i at node l and N_l corresponds to N. Similarly, N0, N1 and N represent the corresponding quantities for the FC.
Lemma 4.1. For i = 0, 1,
P_i(N_l = N_{i,l}) → 1 as α_l, β_l → 0,
P_i(N = N_i) → 1 as α_l, β_l → 0 and α, β → 0. □
Note: In general, when α_l ≠ β_l, the results of Lemma 4.1 under H0 demand that β and/or β_l → 0, and the results under H1 demand that α and/or α_l → 0. Analogous comments hold for the subsequent results as well.
Lemma 4.2. Under H_i,
a) |N_l - N_{i,l}| → 0 a.s. as α_l, β_l → 0, and lim_{α_l→0} N_l/|log α_l| = lim_{α_l→0} N_{i,l}/|log α_l| = 1/|δ_{i,l}| a.s. and in L1.
b) |N - N_i| → 0 a.s., and lim N/|log α| = lim N_i/|log α| a.s. and in L1, as α_l, β_l → 0 and α, β → 0. □
Lemmas 4.1 and 4.2 show that the local nodes make the right decisions as the thresholds |log α_l| and |log β_l| tend to infinity. Then the FC also makes the right decisions when its own thresholds increase. We need to set the thresholds such that the probabilities of error are small.
We will use the following notation:
δ^j_{i,FC} ≜ mean drift of the fusion centre process {F_k} under H_i when j local nodes are transmitting;
t_j ≜ time at which the mean drift of {F_k} changes from δ^{j-1}_{i,FC} to δ^j_{i,FC};
F̃_j ≜ E[F_{t_j - 1}].
Now, it is seen that under H_i,
F̃_j = F̃_{j-1} + δ^{j-1}_{i,FC} (E(t_j) - E(t_{j-1})),  F̃_1 = 0.
Lemma 4.3. P_i(the decision of local node l at time t_k is H_i and t_k is the k-th order statistic of {N_{i,1}, ..., N_{i,L}}) → 1 as α_l → 0 for all l. □
Lemma 4.4. When α_l and β_l are small,
N_{i,l} ~ N( ±|log α_l|/δ_{i,l}, ±|log α_l| ρ_{i,l}/δ^3_{i,l} ),
where the '+' sign occurs under H1 and the '-' sign under H0.
Proof: See Theorem 5.1, Chapter 3 in [14]. □
Let E_DD ≜ E_i[N], i = 0, 1. In the following we provide an approximation of this quantity for both i = 0, 1. When the α_l and α are small, the probabilities of error are small, as proved in the above lemmas. Hence, in such a scenario, for the approximation we assume that the local nodes make correct decisions. Let

l*_i ≜ min{ j : δ^j_{i,FC} has the correct sign (positive under H1, negative under H0) and (±|log α| - F̃_j)/δ^j_{i,FC} < E(t_{j+1}) - E(t_j) },

where the '+' sign is taken under H1. The detection delay E_DD can be approximated as

E_DD ≈ E(t_{l*_i}) + (±|log α| - F̃_{l*_i}) / δ^{l*_i}_{i,FC},  (2)

where the '+' sign occurs under H1. The first term in approximation (2) corresponds to the mean time till the mean drift of {F_k} becomes positive (for H1) or negative (for H0), and the second term corresponds to the mean time from then on till {F_k} crosses the threshold. Using the Gaussian approximation of Lemma 4.4, the t_k's are the order statistics of independent Gaussian random variables, and hence the F̃_k's can be computed. See, for example, [15].
In the following, we compute approximate expressions for P_FA ≜ P[decision is H1 | H0] and P_MD ≜ P[decision is H0 | H1]. Under the same setup of small α_l and α, for the P_FA analysis we assume that all local nodes make correct decisions. Then, for a false alarm, the dominant event is {N1 < t_1}. Also, for reasonable performance, P0(N1 < t_1) should be small. Then the probability of false alarm, P_FA, can be approximated as

P_FA = P0(N1 < N0) >= P0(N1 < t_1, N0 > t_1) ≈ P0(N1 < t_1).  (3)

Also,

P0(N1 < N0) <= P0(N1 < ∞) = P0(N1 < t_1) + P0(t_1 <= N1 < t_2) + ...  (4)

The first term on the RHS of (4) should be the dominant term, since after t_1 the drift of F_k will have the desired sign (will at least be in the favourable direction) with high probability if the local nodes make correct decisions. Equations (3) and (4) suggest that P0(N1 < t_1) should serve as a good approximation of P_FA.
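The delay approximation (2) can be evaluated numerically: under the Gaussian approximation of Lemma 4.4 the E(t_j) are expected order statistics, which the sketch below estimates by Monte Carlo before running the recursion for the F̃_j. The drift model δ^j_{1,FC} = 2μjb/σ² assumes Gaussian FC noise and μ1 = -μ0 = μ; all parameter values are our own illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters for the H1 case (ours, not the paper's).
L, b, mu, sigma = 5, 1.0, 2.0, 1.0
log_alpha = 30.0                          # FC threshold |log alpha|
delta_loc, rho_loc, thr_loc = 0.3, 0.5, 3.0   # local drift, increment variance, threshold

# Lemma 4.4: N_{1,l} approximately Gaussian; t_j is the j-th order statistic.
m = thr_loc / delta_loc
v = thr_loc * rho_loc / delta_loc ** 3
samples = np.sort(rng.normal(m, np.sqrt(v), size=(100000, L)), axis=1)
Et = np.concatenate([[0.0], samples.mean(axis=0)])   # Et[j] estimates E(t_j), Et[0] = 0

# FC drift with j nodes transmitting, and the recursion for F~_j (F~_1 = 0).
delta_fc = [2.0 * mu * j * b / sigma ** 2 for j in range(L + 1)]
F = np.zeros(L + 1)
for j in range(1, L + 1):
    F[j] = F[j - 1] + delta_fc[j - 1] * (Et[j] - Et[j - 1])

# l*: first j whose drift is positive and which crosses before t_{j+1}; then (2).
for j in range(1, L + 1):
    rem = (log_alpha - F[j]) / delta_fc[j]
    if delta_fc[j] > 0 and (j == L or rem < Et[j + 1] - Et[j]):
        E_DD = Et[j] + rem
        break
print(E_DD)
```

The same loop with negative drifts and threshold -|log β| gives the H0 case.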
Similar arguments show that P1(N0 < t_1) should serve as a good approximation of P_MD. In the following, we provide approximations for these quantities.
Let ξ_k ≜ log (g_{μ1}(Y_k)/g_{μ0}(Y_k)); before t_1, ξ_k has mean 0 and a probability distribution symmetric about 0. Then, from the Markov property of the random walk {F_k} before t_1,

P0(N1 < t_1) ≈ sum_{k=1}^{∞} P[ {F_k >= -log α} ∩ ∩_{n=1}^{k-1} {F_n < -log α} | t_1 > k ] P(t_1 > k)
= sum_{k=1}^{∞} P[ {F_k >= -log α} | ∩_{n=1}^{k-1} {F_n < -log α} ] P[ ∩_{n=1}^{k-1} {F_n < -log α} ] P(t_1 > k)
= sum_{k=1}^{∞} P(F_k >= -log α | F_{k-1} < -log α) P(sup_{1<=n<=k-1} F_n < -log α) [1 - Φ_{t_1}(k)]
= sum_{k=1}^{∞} [ ∫_0^∞ P(ξ_k > u) f_{F_{k-1}}(-log α - u) du ] P(sup_{1<=n<=k-1} F_n < -log α) [1 - Φ_{t_1}(k)],

where Φ_{t_1} is the CDF of t_1. We can find a lower bound for the above expression by using P(sup_{1<=n<=k-1} F_n < -log α) >= 1 - 2 P(F_{k-1} >= -log α) ([16], pg. 525), and an upper bound by replacing sup_{1<=n<=k-1} F_n by F_{k-1}. Similarly, P_MD can be bounded from below as

P_MD ≳ sum_{k=1}^{∞} [ ∫_0^∞ P(ξ_k < -u) f_{F_{k-1}}(log β + u) du ] [1 - 2 P(F_{k-1} <= log β)] [1 - Φ_{t_1}(k)],

and from above as

P_MD ≲ sum_{k=1}^{∞} [ ∫_0^∞ P(ξ_k < -u) f_{F_{k-1}}(log β + u) du ] P(F_{k-1} > log β) [1 - Φ_{t_1}(k)].

We will show in Section VI that these approximate results compare well with simulations.

V. ASYMPTOTIC RESULTS
In this section, we assume:
i) E_i[N*_{i,l}(ε)] < ∞, where these quantities are as defined before Theorem 3.3, but for local node l;
ii) E_i[|V_{k,l}|^{p+1}] < ∞ for some p > 0, where V_{k,l} is the increment (drift term) of the test statistic at the local nodes;
iii) E_i[|ξ*_k|^{p+1}] < ∞;
iv) ρ_{i,l} < ∞, where ρ_{i,l} is the variance of V_{k,l}.
In this section, we also use the notation:
i) θ_i = E_i(ξ*_k);
ii) A_i = {ω ∈ Ω : all local nodes transmit correct decisions (b_i) under H_i};
iii) ∆(A_i) = drift of the fusion centre LLR under A_i, i.e., E_i[ξ_k | A_i];
iv) D^0_tot = Lλ/2;
v) D^1_tot = sum_{l=1}^{L} [D(f_{1,l}||f_{0,l}) + H(f_{1,l}) - H(f_{0,l}) - λ/2];
vi) r_l = 1/L;
vii) ρ_l = (D(f_{1,l}||f_{0,l}) + H(f_{1,l}) - H(f_{0,l}) - λ/2) / D^1_tot;
viii) Λ_i(a) = sup_λ [aλ - log g_i(λ)];
ix) g_i = m.g.f. of |ξ*_k|;
x) α^+_i = ess sup |ξ*_k|.
Furthermore, the local node thresholds are -r_l |log c| and ρ_l |log c|, where c is a constant, and the fusion centre thresholds are -|log c| and |log c|.
Theorem 5.1. Under H_i,
lim sup_{c→0} N/|log c| <= 1/D^i_tot + C_i/∆(A_i)
a.s. and in L1, where C_0 = -(1 + θ_0/D^0_tot) and C_1 = 1 + θ_1/D^1_tot. □
We introduce the following function:
s_i(η) ≜ η α^+_i, if η >= Λ_i(α^+_i);  η Λ^{-1}_i(η), if η ∈ (0, Λ_i(α^+_i)),
where g_i(λ) is the m.g.f. of |ξ*_k| and Λ_i(a) = sup_λ [aλ - log g_i(λ)]. Let
R_i ≜ min_{1<=l<=L} { -log inf_{t>=0} E_i[ exp{ -t(-log f_{0,l}(X_{k,l}) - H(P0) - λ/2) } ] }.
Theorem 5.2.
a) lim_{c→0} P_FA/c = 0 if, for some 0 < η < R_0, s_0(η) > 1.
b) lim_{c→0} P_MD/c = 0 if, for some 0 < η < R_1, s_1(η) > 1. □
From Theorems 5.1 and 5.2 we see that the asymptotic performance of our algorithm is comparable to that of SPRT and of KTSLRT in [12] and [13].

VI. SIMULATIONS
In this section, we compare the simulated and theoretical performance of the new algorithm with KTSLRT ([13]). For the simulations, we have taken b1 = -b0 = 1, L = 5, μ1 = -μ0 = 2. Also, the FC noise has been taken as zero-mean Gaussian with variance σ². Hence, in this case, ∆(A_1) = -∆(A_0) and θ_0 = θ_1 = 0. In the following simulations,
E_DD ≜ 0.5 [E_0(N) + E_1(N)] and P_e ≜ 0.5 (P_FA + P_MD).
The observations X_{k,l} are considered with the following distributions:
Pareto distribution P: P0 ~ P(10, ·) and P1 ~ P(3, ·).
Lognormal distribution ln N: P0 ~ ln N(0, ·) and P1 ~ ln N(3, ·).
Gaussian distribution N: P0 ~ N(0, ·) and, under H1, P1 ~ N(1, ·). The channel gains from the primary to the secondary nodes are 1, except in the Gaussian case, where these are taken as 0 dB, -1.5 dB, -2.5 dB, -4 dB and -6 dB.
We plot the results in Figs. 1-7. We see that the new algorithm markedly outperforms KTSLRT. This may be due to the presence of compression in KTSLRT, due to which redundancy is introduced, leading to inaccuracies in the estimate. Also, the approximations provided in Section IV are much closer to the simulated values than the asymptotics.
Fig. 1: Performance for Pareto distribution. Top: Detection delay; Bottom: Error rate.
Fig. 2: Performance comparison with KTSLRT for Pareto distribution.

VII. FURTHER GENERALIZATIONS
Let us now consider a generalization of the problem in which P0 is not exactly known. Specifically, the hypothesis testing problem we now consider is:

H0 : P0 ∈ {P' : D(P'||P̄0) <= γλ}, for some 0 <= γ < 1,  (5)
H1 : P1 ∈ {P' : D(P'||P̄0) >= λ and H(P') > H(P'') for all P'' in the H0 class in (5)}.

Fig. 3: Performance for lognormal distribution. (a) Detection delay; (b) Error rate.
Fig. 4: Performance comparison with KTSLRT for lognormal distribution.
Fig. 5: Performance of KTSLRT for Gaussian distribution with different received SNRs. (a) Detection delay; (b) Error rate.
The detection algorithm remains the same, except that now we write the test statistic at local node l as

W̃_{k,l} = W̃_{k-1,l} - log P̂0(X_{k,l}) - H(P̂0) - υλ.

For good performance we should pick P̂0 from the class in (5) and choose υ carefully. We elaborate on this in the following.
Let us try to justify this problem statement from a practical CR standpoint. In a CR setup, H0 actually indicates the presence of only noise, while under H1 the observations are signal plus noise. Due to electromagnetic interference, the receiver noise can change with time. Thus we assume that the noise power P_N is bounded as σ²_{N,L} <= P_N <= σ²_{N,H}. Similarly, let the signal power be bounded as σ²_{S,L} <= P_S <= σ²_{S,H}. Now we formulate these constraints in the form (5), where we should select appropriate P̄0, λ and γ. We will compute these assuming we are limiting
ourselves to Gaussian distributions, but we will see that these choices work well in general.
Fig. 6: Performance for Gaussian distribution with different received SNRs. Top: Detection delay; Bottom: Error rate.
We take P̄0 ~ N(0, σ²_0), with σ²_0 determined from the given bounds as follows. Given two zero-mean Gaussian distributions Q1 and Q2 with variances σ²_1 and σ²_2 respectively,
D(Q1||Q2) = ln(σ_2/σ_1) + (1/2)(σ²_1/σ²_2 - 1).
Fig. 7: Performance comparison with KTSLRT for Gaussian distribution with different received SNRs.
Let f(σ²) ≜ ln(σ_0/σ) + (1/2)(σ²/σ²_0 - 1). We choose σ²_0 such that f(σ²_{N,L}) = f(σ²_{N,H}). This can be achieved for some σ²_0 ∈ (σ²_{N,L}, σ²_{N,H}), since f is convex with a minimum at σ²_0. This choice ensures that P̄0 is at some sort of a "centre" of the class of distributions under consideration in H0. We now choose γλ ≜ f(σ²_{N,L}) = f(σ²_{N,H}).
For the class of distributions considered under H1, σ²_{N,L} + σ²_{S,L} <= E[X²] <= σ²_{N,H} + σ²_{S,H}. We take
λ ≜ inf_{σ² ∈ (σ²_{N,L}+σ²_{S,L}, σ²_{N,H}+σ²_{S,H})} f(σ²) = f(σ²_{N,L} + σ²_{S,L}).
Next we compute P̂0. If X_{k,l} has distribution P'_i for i = 0, 1, then the drift at the local nodes is D(P'_0||P̂0) + H(P'_0) - H(P̂0) - υλ under H0, and D(P'_1||P̂0) + H(P'_1) - H(P̂0) - υλ under H1. This drift is an important parameter in determining the algorithm's performance and will decide P̂0.
Let W_i be the cost of wrongly rejecting H_i, and let c be the cost of taking each observation. Then the Bayes risk of the test is given ([17]) by
R_c(δ) = sum_{i=0}^{1} π_i [W_i P_i(reject H_i) + c E_i(N)],
where π_i is the prior probability of H_i. Taking the same thresholds as in Section V and using Theorems 5.1 and 5.2,

lim_{c→0} R_c(δ)/(c |log c|) <= ( π_0 / (L[-D(P'_0||P̂0) - H(P'_0) + H(P̂0) + υλ]) ) (1 - θ_0/∆(A_0)) + ( π_1 / (L[D(P'_1||P̂0) + H(P'_1) - H(P̂0) - υλ]) ) (1 + θ_1/∆(A_1)) - π_0/∆(A_0) + π_1/∆(A_1).  (6)

Following a minimax approach, we first maximize the above expression with respect to P'_0 and P'_1, and then minimize the resulting maximal risk w.r.t. P̂0 and υ. As noted before, we carry out this optimization limiting ourselves to the Gaussian family.
The second term in (6) is maximized when D(P'_1||P̂0) + H(P'_1) is minimized. Let us denote the variance of P̂0 by Γ. Now, the variances of all eligible P'_1's are greater than Γ. Hence, D(P'_1||P̂0) + H(P'_1) is minimized when P'_1 has the least possible variance, i.e., σ²_{N,L} + σ²_{S,L}.
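The centering step above, choosing σ²_0 so that f(σ²_{N,L}) = f(σ²_{N,H}), can be carried out by bisection, since the difference f(σ²_{N,L}) - f(σ²_{N,H}) is monotone in σ²_0 on (σ²_{N,L}, σ²_{N,H}). The bound values below are illustrative assumptions, and we additionally assume the signal power is large enough that σ²_{N,L} + σ²_{S,L} exceeds σ²_0, so that the infimum defining λ is attained at the left endpoint, as in the text.

```python
import math

# f(v) with centre variance c is D(N(0,v) || N(0,c)).
def f(v, c):
    return 0.5 * math.log(c / v) + 0.5 * (v / c - 1.0)

v_nl, v_nh = 1.0, 4.0   # assumed noise-power bounds sigma^2_{N,L}, sigma^2_{N,H}
v_sl = 4.0              # assumed lower bound on signal power sigma^2_{S,L}

# Bisection on c: f(v_nl, c) - f(v_nh, c) is increasing in c and changes
# sign on (v_nl, v_nh).
lo, hi = v_nl, v_nh
for _ in range(100):
    c = 0.5 * (lo + hi)
    if f(v_nl, c) - f(v_nh, c) < 0.0:
        lo = c
    else:
        hi = c

gamma_lambda = f(v_nl, c)    # gamma * lambda
lam = f(v_nl + v_sl, c)      # lambda, attained at the left endpoint
print(c, gamma_lambda, lam)
```

In this Gaussian case the fixed point also has the closed form σ²_0 = (σ²_{N,H} - σ²_{N,L}) / ln(σ²_{N,H}/σ²_{N,L}), which the bisection reproduces.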
Using N(0, σ²_{N,L} + σ²_{S,L}) in place of P'_1, the second term in (6) becomes (after simplification)

(π_1/L)(1 + θ_1/∆(A_1)) / [ (1/2)((σ²_{N,L} + σ²_{S,L})/Γ - 1) - υλ ].

Similarly, to maximize the first term in (6), we have to maximize D(P'_0||P̂0) + H(P'_0) w.r.t. P'_0, which is achieved at the largest eligible variance σ²_{N,H}. After this, the first term becomes

(π_0/L)(1 - θ_0/∆(A_0)) / [ υλ - (1/2)(σ²_{N,H}/Γ - 1) ].

Taking
x ≜ 1/Γ, y ≜ υλ, a = σ²_{N,H}, b = σ²_{N,L} + σ²_{S,L}, A = (π_0/L)(1 - θ_0/∆(A_0)) and B = (π_1/L)(1 + θ_1/∆(A_1)),  (7)
the non-constant part of the above expression can be written as a function of x and y in the form
g(x, y) = A/(y - (ax - 1)/2) + B/((bx - 1)/2 - y).
Minimizing this w.r.t. y yields
y_opt = ( √A (bx - 1) + √B (ax - 1) ) / ( 2(√A + √B) ).  (8)
Together with this, we can choose x ∈ (1/σ²_{N,H}, 1/σ²_{N,L}).
In the following, we demonstrate the advantage of optimizing the above parameters on the examples considered in Section VI. The bounds on the noise and signal powers were chosen in each case such that the distributions specified in Section VI satisfy those constraints. Also, the thresholds were chosen the same as before. For the following simulations, we have taken Γ = (σ²_{N,L} + σ²_{N,H})/2 and determined y_opt in accordance with (8).
For the Gaussian distribution, P'_0 ≡ N(0, ·) and P'_1 ≡ N(0, ·).
For the lognormal distribution, P'_0 ≡ log N(0, ·) and P'_1 ≡ log N(3, ·).
For the Pareto distribution, P'_0 ≡ P(10, ·) and P'_1 ≡ P(3, ·).
We compare the performances in Figs. 8-10. We see that the optimized version performs noticeably better, even for distributions other than Gaussian.
Fig. 8: Optimization for Pareto distribution.

CONCLUSIONS
We have developed a new distributed sequential algorithm for detection where, under one of the hypotheses, the distribution can belong to a nonparametric family. This can be useful for spectrum sensing in Cognitive Radios. The algorithm is shown to perform better than a previous efficient algorithm and is also easier to implement. We have also obtained its performance approximately and studied its asymptotic performance. The approximations match the simulations better than the asymptotics do. The asymptotics are comparable to those of SPRT and other known algorithms, even though ours is a nonparametric setup.
Fig. 9: Optimization for lognormal distribution.
Fig. 10: Optimization for Gaussian distribution.

REFERENCES
[1] I. F. Akyildiz, B. Lo, and R. Balakrishnan, "Cooperative spectrum sensing in cognitive radio networks: A survey," Physical Communication, vol. 4, no. 1, pp. 40-62, 2011.
[2] J. Unnikrishnan, D. Huang, S. Meyn, A. Surana, and V. Veeravalli, "Universal and composite hypothesis testing via mismatched divergence," IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1587-1603, 2011.
[3] J. Chamberland and V. Veeravalli, "Wireless sensors in distributed detection applications," vol. 24, pp. 16-25, 2007.
[4] P. Varshney and R. Viswanathan, "Distributed detection with multiple sensors," in Proceedings of the IEEE, vol. 85, pp. 54-63, IEEE, 1997.
[5] K. Liu and A. M. Sayeed, "Optimal distributed detection strategies for wireless sensor networks," vol. 17, 2010.
[6] E. Geraniotis, "Robust distributed discrete-time block and sequential detection in uncertain environments," Naval Research Journal.
[7] S. Marano and V. Matta, "Elements of sequential detection with applications to sensor networks."
[8] D. Siegmund, Sequential Analysis: Tests and Confidence Intervals. Springer, 1985.
[9] V. Veeravalli, T. Basar, and H. V. Poor, "Decentralised sequential detection with a fusion centre performing the sequential test," IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 433-442, 1993.
[10] G. Fellouris and G. V. Moustakides, "Decentralized sequential hypothesis testing using asynchronous communication," IEEE Transactions on Information Theory, vol. 57, no. 1, pp. 534-548, 2011.
[11] N. Mukhopadhyay and B. De Silva, Sequential Methods and Their Applications. Chapman and Hall/CRC.
[12] J. K. Sreedharan and V. Sharma, "Spectrum sensing using distributed sequential detection via noisy reporting MAC," arXiv:1211.5562v2 [cs.IT], 2013.
[13] J. K. Sreedharan and V. Sharma, "Nonparametric decentralized sequential detection via universal source coding," arXiv:1308.6481v1 [cs.IT], 2013.
[14] A. Gut, Stopped Random Walks: Limit Theorems and Applications, vol. 5 of Applied Probability. New York: Springer-Verlag, 1988.
[15] H. Barakat and Y. Abdelkader, "Computing the moments of order statistics from non-identical random variables," Statistical Methods and Applications, vol. 13, pp. 15-26, 2004.
[16] P. Billingsley, Probability and Measure. John Wiley and Sons, 1986.
[17] E. Lehmann and G. Casella, Theory of Point Estimation. Springer Texts in Statistics, New York: Springer, 2003.
[18] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley-Interscience, 2nd ed.
[19] V. Poor, An Introduction to Signal Detection and Estimation. New York: Springer-Verlag, 2nd ed., 1994.
[20] T. Banerjee, V. Sharma, V. Kavitha, and A. Jayaprakasam, "Generalized analysis of a distributed energy efficient algorithm for change detection."