Asymptotic Properties of Likelihood Based Linear Modulation Classification Systems
IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS (DRAFT)
Onur Ozdemir*, Member, IEEE, Pramod K. Varshney, Fellow, IEEE, Wei Su, Fellow, IEEE, Andrew L. Drozd, Fellow, IEEE
Abstract—The problem of linear modulation classification using likelihood based methods is considered. Asymptotic properties of the most commonly used classifiers in the literature are derived. These classifiers are based on the hybrid likelihood ratio test (HLRT) and the average likelihood ratio test (ALRT), respectively. Both a single-sensor setting and a multi-sensor setting that uses a distributed decision fusion approach are analyzed. For a modulation classification system using a single sensor, it is shown that HLRT achieves asymptotically vanishing probability of error (P_e) whereas the same result cannot be proven for ALRT. In a multi-sensor setting using soft decision fusion, conditions are derived under which P_e vanishes asymptotically. Furthermore, the asymptotic analysis of the fusion rule that assumes independent sensor decisions is carried out.

Index Terms—Automatic modulation classification, maximum likelihood classifier, decision fusion.
I. INTRODUCTION
Automatic modulation classification (AMC) is a signal processing technique that is used to estimate the modulation scheme corresponding to a received noisy communication signal. It plays a crucial role in various civilian and military applications, e.g., it has been widely used in communication applications such as spectrum monitoring and adaptive demodulation. AMC methods can be divided into two general classes (see the survey paper [1]): 1) likelihood-based (LB) and 2) feature-based (FB) methods. In this paper, we focus on the former, which is based on the likelihood function of the received signal under each modulation scheme, where the decision is made using a Bayesian hypothesis testing framework. The solution obtained by the LB method is optimal in the Bayesian sense, i.e., it minimizes the probability of incorrect classification. In the last two decades, extensive research has been conducted on AMC methods, which are mainly limited to methods based on receptions at a single sensor (communication receiver). A detailed survey of the AMC techniques using a single sensor can be found in [1]. For a single sensor tasked with AMC, the classification performance depends highly on the channel quality, which directly affects the received signal strength. In non-cooperative communication environments, additional challenges exist that further complicate the problem. These challenges stem from unknown parameters such as signal-to-noise ratio (SNR) and phase offset. In order to alleviate classification performance degradation in non-cooperative environments, network centric collaborative AMC approaches have been proposed in [2], [3], [4], [5], [6]. It has been shown that the use of multiple sensors has the potential of boosting the effective SNR, thereby improving the probability of correct classification.

In this paper, we focus on the likelihood based classification of linearly modulated signals, i.e., PSK and QAM signals. We notice that this problem is a composite hypothesis testing problem due to unknown signal parameters, i.e., uncertainty in the parameters of the probability density functions (pdfs) associated with different hypotheses. Various likelihood ratio based automatic modulation classification techniques have been proposed in the literature. An underlying assumption in all of these techniques is that each hypothesis has equally likely priors, in which case the classifiers reduce to maximum likelihood (ML) classifiers. These techniques take the form of a generalized likelihood ratio test (GLRT), an average likelihood ratio test (ALRT), or a hybrid likelihood ratio test (HLRT). A thorough review of these techniques can be found in [7]. In the GLRT approach, all the unknown parameters are estimated using maximum likelihood (ML) methods and then a likelihood ratio test (LRT) is carried out by plugging these estimates into the pdfs under both hypotheses. In addition to its complexity, GLRT has been shown to provide poor performance in classifying nested constellation schemes such as QAM [8].

* O. Ozdemir and A. L. Drozd are with Andro Computational Solutions, 7902 Turin Road, Rome, NY 13440. P. K. Varshney is with the Department of EECS, Syracuse University, Syracuse, NY 13244. W. Su is with U.S. Army CERDEC, Aberdeen Proving Ground, MD 21005. This work was supported by U.S. Army contract W15P7T-11-C-H262. Email: {oozdemir, adrozd}@androcs.com, [email protected], [email protected].
In the ALRT approach [7], the unknown signal parameters are marginalized out assuming certain priors, converting the problem into a simple hypothesis testing problem. In the HLRT approach [7], the likelihood function (LF) is marginalized over the unknown constellation symbols and then the resulting average likelihood function is used to find the ML estimates of the remaining unknown parameters. These estimates are then plugged into the average LFs to carry out the LRT. Also, there are several variations of HLRT, called quasi-HLRT (QHLRT), in which the ML estimates are replaced with other alternatives such as moment based estimators. We do not discuss the details here and refer the interested reader to [7].

Our goal in this paper is to derive asymptotic (in the number of observations N) properties of modulation classification methods. We consider both single sensor and multiple sensor approaches. Although there has been extensive work on developing various methods for modulation classification, to the best of our knowledge, except for the work in [9], there is no work in the literature that investigates asymptotic properties of modulation classification systems under single sensor or multi-sensor settings. In [9], the authors consider a coherent scenario where the only unknown variables are the constellation symbols. In this scenario, they analyze the asymptotic behavior of ML classifiers for linear modulation schemes. Using the Kolmogorov-Smirnov (K-S) distance, they show that the ML classification error probability vanishes as N → ∞. Our contributions in this paper are as follows. We start with a single sensor system and analyze the asymptotic properties of two AMC scenarios: 1) a coherent scenario with known signal-to-noise ratio (SNR), and 2) a non-coherent scenario with unknown SNR.
Although the first scenario is the same as the one considered in [9], we provide a much simpler proof, which is then utilized to obtain the results for our second scenario. We analyze both HLRT and ALRT approaches. We do not consider GLRT due to its poor performance in classifying nested constellations. After analyzing single sensor approaches, we consider a multi-sensor setting as shown in Fig. 1. Under this framework, we analyze a specific multi-sensor approach, namely distributed decision fusion for multi-hypothesis modulation classification, where each sensor uses the LB approach to make its local decision. In this setting, there are L sensors observing the same unknown signal. Each sensor employs its own LB classifier and sends its soft decision to a fusion center where a global decision is made. We analyze the asymptotic properties of ALRT and HLRT in this multi-sensor setting in the asymptotic regime as N → ∞ and L → ∞. We also provide implications of a large number of observations for the fusion rule at the fusion center.

The rest of the paper is organized as follows. In Section II, we introduce the system model and lay out our assumptions. In Section III, we formulate the likelihood-based modulation classification problem and summarize the HLRT and ALRT approaches. We consider the single sensor case in Section IV and analyze the asymptotic probability of classification error under various settings. Similarly, the asymptotic probability of classification error in the multi-sensor case is analyzed in Section V. We provide numerical results that corroborate our analyses in Section VI. Finally, concluding remarks along with avenues for future work are provided in Section VII.
Fig. 1. Generic system model for a multi-sensor modulation classification system. s_l is the decision/data of the l-th sensor, where l = 1, …, L.

II. SYSTEM MODEL ASSUMPTIONS
We consider a general linear modulation reception scenario with multiple receiving sensors, assuming that the wireless communication channel between the unknown transmitter and each sensor undergoes flat block fading, i.e., the channel impulse response is h(t) = a e^{jθ} δ(t) over the observation interval. After preprocessing, the received complex baseband signal at each sensor can be expressed as [1]:

r(t) = s(t | ũ) + v(t),  0 ≤ t ≤ NT,   (1)

s(t | ũ) = a e^{jθ} e^{j2π∆f t} Σ_{n=0}^{N−1} I_n g_tx(t − nT − εT),   (2)

where s(t) denotes the time-varying message signal; ũ represents the unknown signal parameter vector; a and θ are the channel gain (or the signal amplitude) and the channel (or the signal) phase, respectively; v(t) is the additive zero-mean white Gaussian noise; g_tx(t) is the transmitted pulse; T is the symbol period; {I_n} is the complex information sequence, i.e., the constellation symbol sequence; and ε and ∆f represent residual time and frequency offsets, respectively. The constant εT represents the propagation time delay within a symbol period, where ε ∈ [0, 1). Throughout the paper, we assume that ε and ∆f are perfectly known. Therefore, without loss of generality, we set ε = ∆f = 0. The representation in (2) has the implicit assumption that phase jitter is negligible. Without loss of generality, we further assume that the constellation symbols have unit power, i.e., E[|I_n|²] = 1, where E[·] denotes statistical expectation. Note that the unknown phase term denoted by θ in (2) subsumes both the unknown channel phase and the unknown carrier phase.
Similarly, the unknown amplitude a subsumes both the unknown signal amplitude and the unknown channel gain.

After filtering the received signal with a pulse-matched filter g_rx(t), and sampling at a rate of Q/T, where Q is an integer, the following discrete-time observation sequence is obtained [10]:

r_k = s_k(ũ) + w_k,   (3)

s_k(ũ) = a e^{jθ} Σ_{n=0}^{N−1} I_n g(kT/Q − nT),   (4)

where g(t) = g_tx(t) ∗ g_rx(t) with ∗ denoting the convolution operator, r_k = r(t) ∗ g_rx(t)|_{t=kT/Q}, w_k = v(t) ∗ g_rx(t)|_{t=kT/Q}, N is the total number of observed information symbols, and k = 0, …, K−1. Note that N = K/Q, i.e., there are Q samples per symbol. For simplicity, we assume that g_tx(t) is a rectangular pulse, so that g(t) = 1 for 0 ≤ t ≤ T. We further assume Q = 1 and that w_n is independent identically distributed (i.i.d.) circularly symmetric complex Gaussian noise with real and imaginary parts of variance N_0/2, i.e., w_n ∼ CN(0, N_0). Our analysis in this paper can be easily generalized to other pulse shapes and cases where Q > 1. Under these assumptions, the received observation sequence can be written as:

r_n = a e^{jθ} I_n + w_n,  n = 0, …, N−1.   (5)

The above signal model is commonly used in the modulation classification literature [1], [11], [12], [13]. Note that a, θ, and {I_n}_{n=0}^{N−1} are the unknown signal parameters. In a general modulation classification scenario, in addition to the unknown signal parameters, the noise power N_0 may also be unknown. In this case, the unknown parameter vector can be written as ũ = [a, θ, N_0, {I_n}_{n=0}^{N−1}].
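As an illustration, the discrete-time model in (5) is straightforward to simulate. The sketch below (Python; the QPSK constellation and the values of a, θ, and N_0 are illustrative choices, not taken from the paper) draws unit-power symbols and circularly symmetric complex Gaussian noise with per-component variance N_0/2:

```python
import numpy as np

def simulate_received(N, a=1.0, theta=0.3, N0=0.5, M=4, seed=0):
    """Draw N samples from the model (5): r_n = a e^{j theta} I_n + w_n."""
    rng = np.random.default_rng(seed)
    # Unit-power M-PSK constellation (E[|I_n|^2] = 1 by construction).
    constellation = np.exp(2j * np.pi * np.arange(M) / M)
    I = constellation[rng.integers(0, M, size=N)]
    # Circularly symmetric complex Gaussian noise, total variance N0.
    w = np.sqrt(N0 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return a * np.exp(1j * theta) * I + w

r = simulate_received(100000)
# Average received power should be close to a^2 E[|I_n|^2] + N0 = 1.5 here.
print(np.mean(np.abs(r) ** 2))
```

With the default parameters, the empirical received power converges to a² + N_0 as N grows, consistent with the unit-power symbol assumption.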
III. LIKELIHOOD-BASED LINEAR MODULATION CLASSIFICATION
Our goal throughout this paper is to gain insights into the modulation classification problem using the assumptions commonly made in the modulation classification literature. Suppose there are S candidate modulation formats under consideration. Let r denote the observation vector defined as r := [r_0, …, r_{N−1}] and I_n^{(i)} denote the constellation symbol at time n corresponding to modulation i ∈ {1, …, S}. The conditional pdf of r conditioned on the unknown modulation format i and the unknown parameter vector u, i.e., the likelihood function (LF), is given by

p_i(r | u) = (1/(πN_0))^N exp( −(1/N_0) Σ_{n=0}^{N−1} |r_n − a e^{jθ} I_n^{(i)}|² ).   (6)

If the transmitted signal is an M-PSK signal, the constellation symbol set is given as S_MP = {e^{j2πm/M} | m = 0, …, M−1} and I_n^{(i)} ∈ S_MP. Otherwise, if the transmitted signal is an M-QAM signal, the constellation symbol set is S_MQ = {b_m e^{jθ_m} | m = 0, …, M−1} and I_n^{(i)} ∈ S_MQ.

Note that the LF in (6) is parameterized by the modulation scheme under consideration, and the only difference between the conditional pdfs of different modulation schemes comes from the constellation symbols I_n. In a Bayesian setting, the optimal classifier in terms of minimum probability of classification error is the maximum a posteriori (MAP) classifier. If no a priori information on the probability of the modulation scheme employed by the transmitter is available, which is usually the case in a noncooperative environment, one can use a non-informative prior, i.e., each modulation scheme is assigned an identical prior probability. This is the assumed scenario in this paper. In this case, the optimal classifier takes the form of the maximum likelihood (ML) classifier.

Let us first consider the HLRT approach, where the LF is averaged over the unknown constellation symbols I_n and then maximized over the remaining unknown parameters.
The modulation scheme that maximizes the resulting LF is selected as the final decision, i.e.,

ˆi = arg max_{i=1,…,S} ( max_{a,θ,N_0} E_{I_n^{(i)}}{ p_i(r | u) } ),   (7)

where E_x[·] denotes the expectation operator with respect to the random variable x, and I_n^{(i)} is the unknown constellation symbol for modulation format i.

In the ALRT approach, the unknown parameters are all marginalized out, resulting in the marginal likelihood function, which is used to make the final decision as

ˆi = arg max_{i=1,…,S} E_u{ p_i(r | u) }.   (8)

In the next section, we analyze the probability of classification error starting with a single sensor setting followed by a multi-sensor setting.

In certain cases, these sets can be rotated by some fixed phase, e.g., QPSK is represented as a rotated version of S_MP (with M = 4) by e^{jπ/4}. This does not affect our results.

IV. ASYMPTOTIC PROBABILITY OF ERROR ANALYSIS: SINGLE SENSOR CASE
A. Scenario 1: Coherent Reception with Known SNR
In this scenario, the only unknown variables are the data symbols I_n, n = 1, …, N. In this case, without loss of generality, the received complex signal can be expressed as

r_n = I_n + w_n,  n = 1, …, N.   (9)

Assuming independent information symbols and white sensor noise, the LF averaged over the unknown constellation symbols under modulation format i is given as

p_i(r) := p(r | H_i) = Π_{n=1}^{N} p(r_n | H_i),   (10)

where

p(r_n | H_i) = E_{I_n^{(i)}}{ p(r_n | H_i, I_n^{(i)}) } = Σ_{m=1}^{M_i} p(r_n | I_n^{m,(i)}, H_i) p(I_n^{m,(i)} | H_i).   (11)

In (11), M_i and I_n^{m,(i)} are the number of constellation symbols and the m-th constellation symbol for modulation class i, respectively. In general, the constellation symbols are assumed to have equal a priori probabilities, i.e., p(I_n^{m,(i)} | H_i) = 1/M_i, which results in

p(r_n | H_i) = (1/M_i) Σ_{m=1}^{M_i} p(r_n | I_n^{m,(i)}, H_i),   (12)

where

p(r_n | I_n^{m,(i)}, H_i) = (1/(πN_0)) exp( −|r_n − I_n^{m,(i)}|² / N_0 ).   (13)

In this case, p(r_n | H_i) in (12) represents a complex Gaussian mixture model (GMM), or a complex Gaussian mixture distribution, with M_i homoscedastic components, where each component has identical occurrence probability (weight) 1/M_i as well as identical variance N_0, and the mean of each component is one of the unique constellation symbols in modulation format i. Let us revisit the generic expression for a complex GMM denoted by f(r):

f(r) = Σ_{i=1}^{M} w_i φ(r; μ_i, σ_i²),   (14)

where

φ(r; μ_i, σ_i²) = (1/(πσ_i²)) exp( −|r − μ_i|² / σ_i² ).   (15)

We know that a GMM given by (14) and (15) is completely parameterized by the set {w_i, μ_i, σ_i²}_{i=1}^{M} [14].

Remark 1: For a given modulation format i, the Gaussian mixture model (GMM) in (12) is completely parameterized by the means of the components in the mixture, i.e., by the constellation symbol set S^{(i)} = {I^{1,(i)}, …, I^{M_i,(i)}}. In other words, if S^{(i)} ≠ S^{(j)}, then p(r_n | H_i) and p(r_n | H_j) represent two different GMMs.
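To make (12)-(13) concrete, the following sketch (Python; the PSK constellations and noise level are illustrative choices, not from the paper) evaluates the mixture likelihood p(r_n | H_i) and checks that data drawn under one hypothesis scores a higher average log-likelihood under its own mixture than under a competing one, consistent with Remark 1 and the KL-distance argument that follows:

```python
import numpy as np

def psk(M):
    # Unit-power M-PSK constellation symbols.
    return np.exp(2j * np.pi * np.arange(M) / M)

def mixture_likelihood(r, constellation, N0):
    # p(r_n | H_i) in (12): equal-weight complex Gaussian mixture whose
    # component means are the constellation symbols, each with variance N0.
    d2 = np.abs(r[:, None] - constellation[None, :]) ** 2
    return (np.exp(-d2 / N0) / (np.pi * N0)).mean(axis=1)

rng = np.random.default_rng(1)
N, N0 = 2000, 0.2
bpsk, qpsk = psk(2), psk(4)
# Transmit BPSK symbols through the coherent known-SNR channel (9).
r = bpsk[rng.integers(0, 2, size=N)] + np.sqrt(N0 / 2) * (
    rng.standard_normal(N) + 1j * rng.standard_normal(N))
ll_bpsk = np.mean(np.log(mixture_likelihood(r, bpsk, N0)))
ll_qpsk = np.mean(np.log(mixture_likelihood(r, qpsk, N0)))
print(ll_bpsk > ll_qpsk)  # True: the true mixture wins on average
```

The gap between the two average log-likelihoods is a sample estimate of D(p_bpsk || p_qpsk), which is strictly positive because the two GMMs have different mean sets.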
Let us now define the test statistics

Λ_i := −(1/N) log p_i(r) = −(1/N) Σ_{n=1}^{N} log p(r_n | H_i).   (16)

Then, the ML classifier is given as

ˆi = arg min_{i=1,…,S} Λ_i.   (17)

The classifier performance can be quantified in terms of the average probability of error (P_e), given as

P_e = (1/S) Σ_{i=1}^{S} P_e^i,   (18)

where P_e^i is the probability of error under hypothesis H_i, i.e., given that modulation i is the true modulation,

P_e^i = 1 − P(Λ_i < Λ_j, ∀ j ≠ i | H_i).   (19)

Now, we can state the following theorem, which shows that the probability of error of the ML classifier vanishes asymptotically as N → ∞. Note that the same result was also obtained in [9] using the Kolmogorov-Smirnov (K-S) distance. Here, we provide a simpler proof than the one in [9].

Theorem 1: The ML classifier in (17) asymptotically attains zero probability of error for classifying digital amplitude-phase modulations regardless of the received SNR, i.e.,

lim_{N→∞} P_e = 0.   (20)

Proof:
Suppose H_i is the true hypothesis. In order to study the asymptotic (N → ∞) behavior of Λ_j(r) under H_i, we follow the same technique as in [15] and write the following using the law of large numbers:

lim_{N→∞} Λ_j(r) = −E_i[log p_j(r)]   (21)
  = E_i[log(p_i(r)/p_j(r))] − E_i[log p_i(r)]   (22)
  = D(p_i || p_j) + h_i(r),   (23)

where E_i[·] is the expectation under H_i, D(p_i || p_j) is the Kullback-Leibler (KL) distance between p_i and p_j defined as D(p_i || p_j) := E_i[log(p_i(r)/p_j(r))], and h_i(r) is the differential entropy defined as h_i(r) := −E_i[log p_i(r)] [16]. Note that h_i(r) is not a function of any modulation j ≠ i. Therefore, under H_i, the only difference between the test statistics Λ_i and Λ_j is the KL distance D(p_i || p_j) ≥ 0, which is equal to zero if and only if p_j = p_i. Now, let us revisit the ML classification rule given in (17),

ˆj = arg min_{j=1,…,S} lim_{N→∞} Λ_j(r).   (24)

Since the second term in (23) is independent of the test statistic under consideration, i.e., Λ_j, the only difference between different test statistics results from the first term in (23), which is the KL distance D(p_i || p_j). If D(p_i || p_j) > 0 for j ≠ i and D(p_i || p_j) = 0 for j = i, the ML classifier in (24) will always decide

i = ˆj = arg min_{j=1,…,S} lim_{N→∞} Λ_j(r).   (25)

Therefore, (25) implies that perfect classification is obtained for any given SNR in the limit as N → ∞ if and only if D(p_i || p_j) > 0, ∀ i, j, j ≠ i. For digital phase-amplitude modulations, we know from (12) that p_i(r) represents a GMM and each modulation format corresponds to a unique GMM (see Remark 1). Therefore, D(p_i || p_j) > 0, ∀ i, j, j ≠ i, which is the only condition needed for asymptotically vanishing error probability of the ML classifier.

B. Noncoherent Reception with Unknown SNR
In this scenario, the received complex signal is expressed as

r_n = a e^{jθ} I_n + w_n,  n = 1, …, N.   (26)

In this case, in addition to the unknown constellation symbols, there are three more unknown parameters: the channel amplitude (a), the channel phase (θ), and the noise power (N_0). We will denote these additional unknown parameters in vector form as u = [a, N_0, θ], where a ∈ [0, ∞), N_0 ∈ [0, ∞), and θ ∈ [0, 2π).

Let us first consider the HLRT approach, where the unknown data symbols are marginalized out and the remaining unknown parameters are estimated using an ML estimator. In HLRT, these ML estimates are plugged into the likelihood function to perform the ML classification task. In practice, the complex channel gain a e^{jθ} can be either random or deterministic depending on the application. In deep-space communications, the channel gain can be assumed to be a deterministic time-independent constant [17], whereas in urban wireless communications, the channel gain is often assumed to be random due to multipath effects resulting in fading. In fading channels, the duration over which the channel gain remains constant depends on the coherence time of the channel. Nevertheless, in HLRT, the channel gain is always treated as a deterministic unknown regardless of the application, and ML estimation is employed to estimate a and θ. The resulting likelihood function for modulation i can be written as

p_i(r, û_i) := p_i(r | H_i, û_i) = Π_{n=1}^{N} p(r_n | H_i, û_i),   (27)

where

p(r_n | H_i, û_i) = (1/M_i) Σ_{m=1}^{M_i} p(r_n | H_i, û_i, I_n^{m,(i)}),   (28)

û_i = arg max_u Π_{n=1}^{N} p(r_n | H_i, u).   (29)

In order to be explicit, we re-write (28) as

p(r_n | H_i, û_i) = (1/M_i) Σ_{m=1}^{M_i} (1/(π N̂_{0,i})) exp( −|r_n − â_i e^{jθ̂_i} I_n^{m,(i)}|² / N̂_{0,i} ).   (30)

From (30), we can see that p(r_n | H_i, û_i) represents a complex GMM with M_i homoscedastic components, where each component has identical occurrence probability 1/M_i as well as identical variance N̂_{0,i}, and the mean of each component is one of the unique constellation symbols in modulation format i multiplied by â_i e^{jθ̂_i}.
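As a concrete sketch of the HLRT recipe in (27)-(30) (Python; the coarse grid search below merely stands in for the ML estimator in (29), and all parameter ranges, grid resolutions, and true values are illustrative assumptions):

```python
import numpy as np

def psk(M):
    return np.exp(2j * np.pi * np.arange(M) / M)

def avg_loglik(r, constellation, a, theta, N0):
    # Log of (27), with each sample scored by the symbol-averaged mixture (30).
    means = a * np.exp(1j * theta) * constellation
    d2 = np.abs(r[:, None] - means[None, :]) ** 2
    return np.sum(np.log((np.exp(-d2 / N0) / (np.pi * N0)).mean(axis=1)))

def hlrt_loglik(r, constellation, theta_max):
    # Coarse grid search over (a, theta, N0), standing in for the MLE (29).
    best = -np.inf
    for a in np.linspace(0.5, 2.0, 16):            # a confined to a known range
        for theta in np.linspace(0.0, theta_max, 16, endpoint=False):
            for N0 in np.linspace(0.1, 2.0, 16):   # N0 confined to a known range
                best = max(best, avg_loglik(r, constellation, a, theta, N0))
    return best

rng = np.random.default_rng(2)
a, theta, N0, N = 1.2, 0.4, 0.3, 400
I = psk(4)[rng.integers(0, 4, size=N)]             # true modulation: QPSK
r = a * np.exp(1j * theta) * I + np.sqrt(N0 / 2) * (
    rng.standard_normal(N) + 1j * rng.standard_normal(N))
# For M-PSK, the phase search can be limited to [0, 2*pi/M) by periodicity.
scores = [hlrt_loglik(r, psk(2), np.pi), hlrt_loglik(r, psk(4), np.pi / 2)]
print(["BPSK", "QPSK"][int(np.argmax(scores))])
```

In practice the grid search would be replaced by a proper numerical maximizer; the sketch only illustrates the marginalize-then-maximize structure of HLRT.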
We can define the new test statistic, which now includes the estimates of the unknown parameters, as

Λ_i(r, û_i) := −(1/N) log p_i(r | û_i) = −(1/N) Σ_{n=1}^{N} log p(r_n | H_i, û_i).   (31)

Then (29) can be equivalently written as

û_i = arg min_u Λ_i(r, u),   (32)

and the ML classifier is given as

ˆi = arg min_{i=1,…,S} Λ_i(r, û_i).   (33)

We start the analysis by making the following observations. In practice, there is always some a priori knowledge on the bounds of the unknown parameters a and N_0. In other words, the search space for the maximization of the likelihood function with respect to a and N_0 can be confined to [0, A_U] and [0, N_U], respectively, for some known A_U and N_U. Regarding the unknown phase θ, the search space depends on the modulation class that is under consideration. For M-PSK modulations, it suffices to limit the search space of θ to [0, 2π/M), because the likelihood function is a periodic function of θ with a period of 2π/M. This is due to averaging over the unknown constellation symbols and the rotational symmetry of the constellation map with respect to θ, i.e., rotation of the constellation map by 2π/M results in the same constellation map as far as the likelihood function averaged over the constellation symbols is considered. Similarly, for M-QAM modulations, it suffices to limit the search space of θ to [0, π/2) for the same reasons as discussed for M-PSK modulations. We now make the following assumption, which will simplify the mathematical analysis. We assume that the unknown parameters [a, N_0, θ] lie in the interior region of the cube [0, A_U] × [0, N_U] × [0, 2π/M] for M-PSK or [0, A_U] × [0, N_U] × [0, π/2] for M-QAM, respectively. Note that these assumptions are almost always satisfied in practice. Let us denote this closed Euclidean space as U : [0, A_U] × [0, N_U] × [0, θ_U], where θ_U = 2π/M for M-PSK and θ_U = π/2 for M-QAM.

Lemma 1: Let S denote the set of PSK and QAM modulation classes. Define p_i(r | u_i) := p(r | H_i, u_i). Let i, j ∈ S, u_i ∈ U_i, u_j ∈ U_j. If i ≠ j, then

D(p_i(r | u_i) || p_j(r | u_j)) > 0.   (34)

Proof: See Appendix A.

The following theorem states that the probability of error of the HLRT classifier vanishes asymptotically as N → ∞.

Theorem 2:
The ML classifier in (33) asymptotically attains zero probability of error for classifying digital amplitude-phase modulations regardless of the received SNR.
Proof:
Suppose H_i is the true hypothesis and u_i^∗ denotes the true value of the unknown parameter. We start by noting that the maximum likelihood estimator (MLE) is consistent under some mild regularity conditions [18], which are satisfied by the likelihood functions of digital amplitude-phase modulations. In other words, if H_i is the true hypothesis and u_i^∗ is the true value of the unknown parameter u, then

u_i^∗ = arg min_u lim_{N→∞} Λ_i(r, u).   (35)

Under H_i, we write the following using the law of large numbers:

lim_{N→∞} Λ_j(r, û_j) = −E_i[log p_j(r | û_j)],   (36)

where E_i[·] denotes expectation with respect to p(r | H_i, u_i^∗). Then, (36) can be written as

lim_{N→∞} Λ_j(r, û_j) = E_i[log(p_i(r | u_i^∗)/p_j(r | û_j))] − E_i[log p_i(r | u_i^∗)]   (37)
  = D(p_i(r | u_i^∗) || p_j(r | û_j)) + h_i(r | u_i^∗),   (38)

where the second term is the differential entropy of the true distribution defined as h_i(r | u_i^∗) := −E_i[log p(r | H_i, u_i^∗)]. The proof follows from Lemma 1 and the same reasoning as in Theorem 1.

From (38), we can make the following observation. Under H_i and the true parameter u_i^∗,

û_j = arg min_u lim_{N→∞} Λ_j(r, u)   (39)
  = arg min_u D(p_i(r | u_i^∗) || p_j(r | u)).   (40)

As N → ∞, the MLE û_j minimizes the KL distance between the true and the assumed distributions. This was actually observed by Akaike [19] in the area of maximum likelihood estimation under misspecified models (see also [20]). We should also emphasize that the consistency of the ML estimator is necessary for P_e to vanish as N → ∞, as otherwise one cannot deduce (38) from (37). As one would expect, the result in Theorem 2 is useful in practice only when the channel gain remains constant over a large observation interval. Channels that exhibit such behavior include deep space communication channels as well as slowly varying fading channels.

Next, we consider a variation of the HLRT approach where, in addition to the unknown data symbols, a subset of the remaining unknown parameters is marginalized out. The maximization is then carried out over the remaining subset. Let u_1 denote the subset of the unknown parameters that are marginalized out and f_{U_1}(u_1) denote the joint a priori distribution of u_1. Let u_2 denote the vector of the remaining unknown parameters over which the maximization is carried out. Then, the ML classifier is given as

ˆi = arg max_{i=1,…,S} p_i(r | û_{2,i}),   (41)

û_{2,i} = arg max_{u_2} p_i(r | u_2),   (42)

where

p_i(r | u_2) = ∫_{U_1} p_i(r | u_1, u_2) f_{U_1}(u_1) du_1.   (43)

Since the unknowns [a, N_0, θ] stay constant over the observation interval, it is clear from (43) that the observations r_n become dependent after averaging (conditional independence is no longer valid), i.e.,

p_i(r | u_2) ≠ Π_{n=1}^{N} p_i(r_n | u_2).   (44)

Due to this dependence, the law of large numbers cannot be invoked. Therefore, these classifiers do not have provably vanishing P_e in the asymptotic regime as N → ∞. This is also the case for the ALRT approach, where all the unknowns are marginalized out before classification. In practice, ALRT may be preferred over HLRT since the latter requires multi-dimensional maximization of the LF, which is generally a non-convex optimization problem. In order to alleviate this problem, a suboptimal HLRT called quasi-HLRT (or QHLRT) was proposed in [8], [12], where the MLEs of the unknown parameters were replaced with moment based estimators. In general, QHLRT does not guarantee provably asymptotically vanishing P_e, since these estimators are generally not consistent.

V. ASYMPTOTIC PROBABILITY OF ERROR ANALYSIS: MULTI-SENSOR CASE
In this section, we consider a multi-sensor setting where each sensor transmits its soft decision to a fusion center where a global decision is made. We start our analysis assuming soft decision fusion, where each sensor sends its unquantized local likelihood value to the fusion center.

In a multiple sensor scenario, the set of unknown parameters {a, θ, N_0} corresponding to each sensor is independent from that of other sensors. However, care must be taken in analyzing this scenario, as the independence of these unknowns does not guarantee the independence of different sensor observations. In the following, we investigate the multiple sensor scenario and derive conditions under which the asymptotic error probability goes to zero.

A. Scenario 1: Coherent Reception with Known SNR
We first consider the general case for the coherent and synchronous environment, where there are L sensors and each sensor l (l = 1, …, L) makes N observations. Let us define the vector of observations for each sensor as r_l := [r_{n_1^l}, …, r_{n_N^l}], l = 1, …, L. We also define the set of indices for the complex information sequence that each sensor observes as

I_l := {n_1^l, …, n_N^l},  l = 1, …, L.   (45)

Similar to (10)-(12), the likelihood function at sensor l is

p_i(r_l) := p(r_l | H_i) = Π_{n∈I_l} p(r_n | H_i),   (46)

p(r_n | H_i) = (1/M_i) Σ_{m=1}^{M_i} p(r_n | I_n^{m,(i)}, H_i).   (47)

Let p_i(r_s) and p_i(r_t) denote two arbitrary likelihood functions for sensors s and t, where s ≠ t. Assuming independent sensor noises, it is important to see that r_s ∼ p_i(r_s) and r_t ∼ p_i(r_t) are independent if and only if

I_s ∩ I_t = ∅.   (48)

The condition in (48) is required for independence since the data symbols are marginalized out in the likelihood function. We should note that the implicit assumption in (48) is that the data symbols are i.i.d. in time, which is a common assumption in the communications literature. From (48), we can deduce the general condition for independence. All sensor observations are independent (across sensors) if and only if

⋂_{l=1,…,L} I_l = ∅.   (49)

Physically, the condition in (49) implies that sensor observations, or the underlying baseband symbol sequences, should not overlap in time to satisfy independence. This condition may or may not be realized in practice. One possible way of obtaining independent sensor observations is to send a pilot signal to each sensor initiating data collection and leave enough time between two consecutive pilot signals so that each sensor observes a different non-overlapping time window of the same signal.

Suppose the condition in (49) is satisfied. Let p_i denote the likelihood function at the fusion center for modulation i, defined as

p_i := p(r_1, …, r_L | H_i) = Π_{l=1}^{L} Π_{n∈I_l} p(r_n | H_i).   (50)

We can now define

Λ_i := −(1/(LN)) log p_i = −(1/(LN)) Σ_{l=1}^{L} log p(r_l | H_i) = −(1/(LN)) Σ_{l=1}^{L} Σ_{n∈I_l} log p(r_n | H_i).   (51)

Note that the independence condition is necessary in order for the second equality in (51) to hold. Then, the ML classifier is given as

ˆi = arg min_{i=1,…,S} Λ_i.   (52)

Theorem 3: As Σ_{l=1}^{L} |I_l| → ∞, the ML classifier in (52) achieves zero probability of error for classifying digital amplitude-phase modulations regardless of the received SNRs at the sensors.

Proof:
The proof follows the same steps as in Theorem 1 and is omitted here for brevity.
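A minimal sketch of the soft fusion rule in (50)-(52) (Python; L, N, the SNR, and the two-hypothesis candidate set are illustrative choices, not from the paper): each sensor computes local log-likelihoods from its own disjoint block of symbols, and the fusion center simply sums them per hypothesis, which the independence condition (49) justifies.

```python
import numpy as np

def psk(M):
    return np.exp(2j * np.pi * np.arange(M) / M)

def local_loglik(r_l, constellation, N0):
    # Sensor l's log-likelihood log p(r_l | H_i) from (46)-(47).
    d2 = np.abs(r_l[:, None] - constellation[None, :]) ** 2
    return np.sum(np.log((np.exp(-d2 / N0) / (np.pi * N0)).mean(axis=1)))

rng = np.random.default_rng(3)
L, N, N0 = 5, 50, 1.0
hyps = {"BPSK": psk(2), "QPSK": psk(4)}
fused = {name: 0.0 for name in hyps}
for l in range(L):
    # Disjoint observation windows per (49): each sensor draws its own
    # fresh block of the true (QPSK) symbol stream.
    I = psk(4)[rng.integers(0, 4, size=N)]
    w = np.sqrt(N0 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    r_l = I + w
    # Each sensor forwards its soft decisions (local log-likelihoods);
    # independence lets the fusion center add them across sensors, as in (51).
    for name, c in hyps.items():
        fused[name] += local_loglik(r_l, c, N0)
print(max(fused, key=fused.get))
```

Normalizing the fused sums by LN and taking the arg min of their negatives recovers the test statistic (51) and the classifier (52) exactly.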
B. Noncoherent Reception with Unknown SNR
In this scenario, the received complex signal at sensor l can be expressed as

r_{n_l} = a_l e^{jθ_l} I_{n_l} + w_{n_l},  n_l ∈ I_l.   (53)

The vector of unknowns for sensor l is u_l = [a_l, θ_l, N_{0,l}]. Let us first consider the HLRT approach, where sensor l computes its likelihood by first marginalizing over the unknown symbols I_{n_l}, n_l ∈ I_l, and then plugging in the MLE of u_l. Let us define the vector of observations at the fusion center as r := [r_1, …, r_L]. Suppose that the independence condition in (49) is satisfied. Let p_i^H(r) denote the likelihood function at the fusion center for the HLRT, given as

p_i^H(r) := p(r | H_i, û_1, …, û_L) = Π_{l=1}^{L} Π_{n_l∈I_l} p(r_{n_l} | H_i, û_l).   (54)

| · | is the cardinality operator.
Following the same reasoning as in the single-sensor scenario, we can claim that $P_e \to 0$ as $N \to \infty$ using Theorem 1. However, the same result cannot be claimed for finite $N$ even when $L \to \infty$, due to different unknown parameters at different sensors.

If a subset of the unknowns is marginalized out in the HLRT approach (see Section IV-B, eqs. (41)-(44)), the distribution at the fusion center takes the following form:
$$p(\mathbf{r} \mid H_i, \hat{\mathbf{u}}_{1,(i)}, \ldots, \hat{\mathbf{u}}_{L,(i)}) = \prod_{l=1}^{L} p(\mathbf{r}_l \mid H_i, \hat{\mathbf{u}}_{l,(i)}), \qquad (55)$$
where $\hat{\mathbf{u}}_{l,(i)}$ denotes the ML estimate of the remaining unknown parameters of sensor $l$ under $H_i$, i.e.,
$$\hat{\mathbf{u}}_{l,(i)} = \arg\max_{\mathbf{u}_1} p_i(\mathbf{r}_l \mid \mathbf{u}_1), \qquad (56)$$
where
$$p_i(\mathbf{r}_l \mid \mathbf{u}_1) = \int_{U_2} p_i(\mathbf{r}_l \mid \mathbf{u}_1, \mathbf{u}_2) f_{U_2}(\mathbf{u}_2)\, d\mathbf{u}_2. \qquad (57)$$
Then, the ML classifier is given as
$$\hat{i} = \arg\max_{i=1,\ldots,S} p(\mathbf{r} \mid H_i, \hat{\mathbf{u}}_{1,(i)}, \ldots, \hat{\mathbf{u}}_{L,(i)}). \qquad (58)$$
Similar to (44), since the unknowns $[a_l, N_{0,l}, \theta_l]$, $l = 1, \ldots, L$, stay constant over the observation interval, it is clear from (57) that the observations $r_{n_l}$ become dependent after averaging, i.e.,
$$p_i(\mathbf{r}_l \mid \mathbf{u}_1) \neq \prod_{n_l \in \mathcal{I}_l} p_i(r_{n_l} \mid \mathbf{u}_1). \qquad (59)$$
Therefore, these classifiers do not have provably vanishing $P_e$ in the asymptotic regime, either as $N \to \infty$ due to dependence or as $L \to \infty$ due to different unknown parameters at different sensors.

Let us now consider the ALRT approach, where all the unknowns are marginalized out. Denote the joint a priori distribution of $\mathbf{u}_l$ as $f_U(\mathbf{u})$. Let $p_i^A(\mathbf{r})$ denote the likelihood function at the fusion center for ALRT, defined as
$$p_i^A(\mathbf{r}) := \prod_{l=1}^{L} p_A(\mathbf{r}_l \mid H_i), \qquad (60)$$
where
$$p_A(\mathbf{r}_l \mid H_i) = \int_{U} p(\mathbf{r}_l \mid H_i, \mathbf{u}) f_U(\mathbf{u})\, d\mathbf{u}. \qquad (61)$$
Now, define
$$\Lambda_i^A := -\frac{1}{L} \log p_i^A(\mathbf{r}) = -\frac{1}{L} \sum_{l=1}^{L} \log p_A(\mathbf{r}_l \mid H_i). \qquad (62)$$
The ML classifier is given as
$$\hat{i} = \arg\min_{i=1,\ldots,S} \Lambda_i^A. \qquad (63)$$
For ALRT, we consider a special case where $N_0$ is known, $a$ is Rayleigh distributed with $E[a^2] = \Gamma$, and $\theta$ is uniformly distributed over $[-\pi, \pi]$, i.e., $\theta \sim U[-\pi, \pi]$.
From [1], we can write the conditional pdf at sensor $l$ as in (64), where $C$ is a normalizing constant which is identical for all modulation classes. (When there is no non-stationary interference in the environment, $N_0$ corresponds to the stationary sensor background noise power, which can be accurately estimated using offline techniques.) Note that the expectation $E_{I^{(i)}}$ in (64) requires summation over $M_i^N$ combinations of constellation sequences, which may be computationally prohibitive for large $N$. Alternatively, (64) can be computed by changing the order of the averaging operations, i.e., by first averaging over the unknown constellation symbols and then averaging over the unknown channel phase and channel amplitude. This alternative approach does not result in a closed-form expression and therefore needs to be computed using numerical techniques.

Lemma 2:
Let $\mathcal{S}$ denote the set of PSK and QAM modulation classes. Define $p_i^A(\mathbf{r}_l) := p_A(\mathbf{r}_l \mid H_i)$ as given in (64). For $i, j \in \mathcal{S}$, if $i \neq j$ and $N > 1$, then $D(p_i^A(\mathbf{r}_l) \,\|\, p_j^A(\mathbf{r}_l)) > 0$.

Proof:
See Appendix B.
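As noted above, evaluating (64) by averaging in the reverse order has no closed form and must be done numerically. The sketch below approximates the ALRT likelihood by Monte Carlo integration over the assumed priors (Rayleigh amplitude with $E[a^2] = \Gamma$, uniform phase); the constellations, parameter values, and draw count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def alrt_loglik_mc(r, symbols, gamma, n0, n_draws=4000):
    """Monte Carlo sketch of the ALRT likelihood: draw (a, theta) from the
    assumed priors, marginalize the i.i.d. symbols exactly for each draw,
    then average the conditional likelihoods over the draws."""
    a = rng.rayleigh(np.sqrt(gamma / 2.0), size=n_draws)   # E[a^2] = gamma
    theta = rng.uniform(-np.pi, np.pi, size=n_draws)
    h = (a * np.exp(1j * theta))[:, None, None]            # shape (draws, 1, 1)
    d2 = np.abs(r[None, :, None] - h * symbols[None, None, :]) ** 2
    comp = np.exp(-d2 / n0) / (np.pi * n0)                 # component pdfs
    per_sample = comp.mean(axis=2)                         # average over symbols
    cond = per_sample.prod(axis=1)                         # product over samples
    return float(np.log(cond.mean() + 1e-300))             # average over (a, theta)

# One sensor, N = 4 samples of a Rayleigh-faded QPSK burst at Gamma/N0 = 4.
bpsk = np.array([1.0 + 0j, -1.0 + 0j])
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
gamma, n0, N = 4.0, 1.0, 4
a0, th0 = rng.rayleigh(np.sqrt(gamma / 2.0)), rng.uniform(-np.pi, np.pi)
r = a0 * np.exp(1j * th0) * rng.choice(qpsk, N) + np.sqrt(n0 / 2) * (
    rng.standard_normal(N) + 1j * rng.standard_normal(N))
ll_qpsk = alrt_loglik_mc(r, qpsk, gamma, n0)
ll_bpsk = alrt_loglik_mc(r, bpsk, gamma, n0)
```

A single sensor with $N = 4$ gives only weak discrimination here; Theorem 4 below is precisely about summing such log-likelihoods over many independent sensors.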
Theorem 4:
Suppose $N_0$ is known, $a$ is Rayleigh distributed, and $\theta$ is uniformly distributed over $[-\pi, \pi]$. Then the ML classifier in (63) achieves zero probability of error as $L \to \infty$.

Proof:
The proof follows from Lemma 2 and the same method as in Theorem 1.

Theorem 4 ensures that asymptotically vanishing $P_e$ is guaranteed in the number of sensors if ALRT is used at each sensor, provided that each sensor has independent observations, i.e., each sensor observes a non-overlapping time window of the transmitted signal. In other words, using a multi-sensor approach ensures asymptotically vanishing $P_e$ for ALRT, which is not provably the case for a single sensor, as explained in Section IV-B.

C. Fusion Rule
In this section, we analyze the implications of the independence condition in (49) for decision fusion based modulation classification. For a finite number of observations ($N < \infty$), it is clear that if (49) is not satisfied, there are sensors observing the same baseband sequence, resulting in dependent observations due to averaging over the unknown constellation symbols. In that case, even though each sensor's noise is independent, the joint conditional distribution at the fusion center cannot be written as a product of the individual conditional distributions, i.e.,
$$p_i(\mathbf{r}_1, \ldots, \mathbf{r}_L) \neq \prod_{l=1}^{L} p_i(\mathbf{r}_l). \qquad (65)$$
However, in the asymptotic regime as $N \to \infty$, we have the following theorem.

Theorem 5:
Suppose there are two groups of $L$ sensors, denoted $G$ and $G'$, observing the same signal with unknown modulation. Suppose the sensors in $G$ have arbitrary overlaps in their observations and the sensors in $G'$ have no overlaps. Let $\mathbf{r}_l$ and $\mathbf{r}'_l$, $l = 1, \ldots, L$, denote the observations from the sensors in $G$ and $G'$, respectively. Let $p_i(\mathbf{r}_l)$ ($p_i(\mathbf{r}'_l)$) denote the likelihood function of sensor $l$ under $H_i$, which represents either a coherent scenario with known SNR as in (46), or a noncoherent scenario with unknown SNR in the form of HLR or ALR as in (27), (57), or (61). Here, the ALRT likelihood in the special case above is given by
$$p_A(\mathbf{r}_l \mid H_i) = C\, E_{I^{(i)}}\!\left\{ \frac{1}{1 + \frac{\Gamma}{N_0}\|\mathbf{I}^{(i)}\|^2} \exp\!\left( \frac{\frac{\Gamma}{N_0^2}\, |\mathbf{I}^{(i)H} \mathbf{r}_l|^2}{1 + \frac{\Gamma}{N_0}\|\mathbf{I}^{(i)}\|^2} - \frac{\|\mathbf{r}_l\|^2}{N_0} \right) \right\}. \qquad (64)$$
Suppose both groups use the same fusion rule to classify the unknown modulation, given as
$$G: \; \hat{i} = \arg\max_i \prod_{l=1}^{L} p_i(\mathbf{r}_l), \qquad (66)$$
$$G': \; \hat{i} = \arg\max_i \prod_{l=1}^{L} p_i(\mathbf{r}'_l). \qquad (67)$$
Let $P_e$ and $P'_e$ denote the probabilities of classification error for the fusion rules in (66) and (67), respectively. As $N \to \infty$, we have the following result:
$$\lim_{N \to \infty} (P_e - P'_e) = 0. \qquad (68)$$

Proof:
Sensor observations in $G$ are dependent. This dependence results solely from overlapping sensor observations, regardless of the scenario under consideration and regardless of which classification algorithm is employed (HLR or ALR). Suppose $H_i$ is the hypothesis under consideration. Let $\mathcal{M}_i$ denote the set of constellation symbols for modulation $i$ with $|\mathcal{M}_i| = M_i$, and let $I_n$, $n = 1, \ldots, N$, denote the constellation symbol sequence received by an arbitrary sensor. Suppose $s_m^{(i)} \in \mathcal{M}_i$ and let $s_m^{(i)}(I_n)$ denote the indicator function defined as $s_m^{(i)}(I_n) = 1$ if $I_n = s_m^{(i)}$ and $s_m^{(i)}(I_n) = 0$ otherwise. Now, define
$$\Omega(s_m^{(i)}) := \sum_{n=1}^{N} s_m^{(i)}(I_n), \qquad (69)$$
which represents the number of occurrences of $s_m^{(i)}$ in the received symbol sequence $\{I_1, \ldots, I_N\}$. Now, take the limit
$$\lim_{N \to \infty} \frac{1}{N}\, \Omega(s_m^{(i)}) \overset{(a)}{=} E[s_m^{(i)}(I_n)] \overset{(b)}{=} \frac{1}{M_i}, \qquad (70)$$
where $(a)$ results from applying the law of large numbers and $(b)$ results from the fact that each symbol in the constellation set $\mathcal{M}_i$ is equally likely. We can rewrite (70) as
$$\lim_{N \to \infty} \Omega(s_m^{(i)}) = \frac{N}{M_i}, \qquad (71)$$
which implies that as $N \to \infty$, each constellation symbol $s_m^{(i)} \in \mathcal{M}_i$ has an identical number of occurrences, $N/M_i$. Therefore, in the asymptotic regime ($N \to \infty$), each sensor observes an equal number of each constellation symbol, whether those symbols overlap across sensors or not.

Now, consider sensor $l$ and let $I_{ln}$ denote the $n$-th symbol received by sensor $l$. Note that $p_i(\mathbf{r}_l) = \prod_{k=1}^{N} p_i(r_{lk})$ is permutation invariant with respect to $\mathbf{r}_l = [r_{l1}, \ldots, r_{lN}]$ (or $\{I_{l1}, \ldots, I_{lN}\}$), because each $I_{ln}$ is i.i.d. and the background noise is white. In other words, $p_i(\mathbf{r}_l)$ is invariant to the order of the received symbol sequence $\{I_{l1}, \ldots, I_{lN}\}$. Let us define a virtual sensor indexed by $l'$ and suppose that it observes a symbol sequence $\{I_{l'1}, \ldots, I_{l'N}\}$ that does not overlap with those observed by other sensors, i.e., $\{I_{l1}, \ldots, I_{lN}\}$ and $\{I_{l'1}, \ldots, I_{l'N}\}$ represent i.i.d. symbol sequences. As we let $N \to \infty$, the numbers of occurrences of each symbol in $\{I_{l1}, \ldots, I_{lN}\}$ and $\{I_{l'1}, \ldots, I_{l'N}\}$ become identical by (71). This implies that $\{I_{l'1}, \ldots, I_{l'N}\}$ becomes a re-ordered version of $\{I_{l1}, \ldots, I_{lN}\}$. In this case (as $N \to \infty$), the elements of the observation vector $\mathbf{r}_l$ can be re-ordered to form a new observation vector $\mathbf{r}_{l'}$ such that it represents noisy observations of the virtual symbol sequence $\{I_{l'1}, \ldots, I_{l'N}\}$. It follows that, since $p_i(\mathbf{r}_l)$ is permutation invariant with respect to $\mathbf{r}_l$, we have the following equality as $N \to \infty$:
$$p_i(\mathbf{r}_l) = p_i(\mathbf{r}_{l'}). \qquad (72)$$
Following the same argument, we can show that $p_i(\mathbf{r}_l) = p_i(\mathbf{r}_{l'})$ for every $l = 1, \ldots, L$. This implies that, as $N \to \infty$,
$$\prod_{l=1}^{L} p_i(\mathbf{r}_l) = \prod_{l=1}^{L} p_i(\mathbf{r}_{l'}). \qquad (73)$$
Finally, the above equality implies that, as $N \to \infty$,
$$\arg\max_i \prod_{l=1}^{L} p_i(\mathbf{r}_l) = \arg\max_i \prod_{l=1}^{L} p_i(\mathbf{r}'_l), \qquad (74)$$
which concludes the proof.

The above result shows that as $N \to \infty$, we can always re-arrange the order of the original observations and create an equivalent system with independent observations, resulting in a new system having the same classification performance as the original one, provided that both systems use the same fusion rule.

Remark 2:
We know that the optimal fusion rule for $G'$, which minimizes $P'_e$, is given as $\hat{i} = \arg\max_i \prod_{l=1}^{L} p_i(\mathbf{r}'_l)$. The practical implication of Theorem 5 is that, for large $N$, regardless of any overlap in the sensor observations, the fusion rule $\hat{i} = \arg\max_i \prod_{l=1}^{L} p_i(\mathbf{r}_l)$ will achieve the best performance that can be achieved by a multi-sensor system with independent sensor observations. Practical values of $N$ for which this performance is reached are provided by the numerical results in Section VI for different modulation classification scenarios.

In practice, it may be impossible to characterize the dependence in sensor observations, as sensors may have arbitrary and unknown overlaps in their observations. In this case, the optimal fusion rule simply cannot be derived, and the fusion rule that assumes independence becomes a natural choice. Theorem 5 provides an asymptotic performance guarantee for such a scenario.

VI. NUMERICAL RESULTS
In this section, we provide numerical results that corroborate our analyses in Sections IV and V. First, we consider the single-sensor case and investigate two classification scenarios:
1) binary classification of BPSK versus (vs.) QPSK, and 2) 3-ary classification of 16-PSK vs. 16-QAM vs. 32-QAM. Figures 2 and 3 show $P_e$ versus the number of observations ($N$) under two different SNR regimes. The results are obtained using Monte Carlo simulations. The difference between the two figures is that the former assumes a coherent scenario with known SNR, whereas the latter assumes a noncoherent scenario with unknown SNR, for which HLRT is used as the classifier. It is clear from both figures that $P_e$ decreases monotonically as $N$ increases under both SNR regimes, which supports the analyses of Theorems 1 and 2. As expected, the rate of decrease in $P_e$ is slower under 0 dB SNR than under 6 dB SNR. Since Theorem 3 is an extension of Theorem 1 to the multi-sensor case, we do not provide additional results for that particular scenario.

Fig. 4 demonstrates the performance of ALRT for classification of BPSK vs. QPSK with respect to the number of sensors ($L$) under two different SNR regimes. Each sensor receives a Rayleigh-faded signal with an average channel SNR defined as $E[a^2]/N_0 = \Gamma/N_0$. The number of observations per sensor is set to $N = 4$. Similar to the previous cases, Monte Carlo simulations are used to obtain the results. As stated by Theorem 4 and shown in Fig. 4, $P_e$ decreases monotonically as $L$ gets larger regardless of the SNR regime. Furthermore, the rate of decrease in $P_e$ is slower for smaller SNR values, as expected.

Finally, Figures 5 and 6 illustrate how the fusion rule that assumes independent sensor decisions behaves asymptotically, at a fixed SNR, for two different classification scenarios: 1) binary classification of 16-PSK vs. 16-QAM, and 2) 3-ary classification of 64-QAM vs. 128-QAM vs. 256-QAM, respectively. Both figures assume coherent scenarios with known SNRs.
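The Monte Carlo estimates of $P_e$ reported in this section can be sketched as follows for the coherent, known-SNR BPSK-vs-QPSK case; the trial count, SNR, and sample sizes here are illustrative assumptions, so the resulting estimates are noisier than the curves in the figures.

```python
import numpy as np

rng = np.random.default_rng(3)

BPSK = np.array([1.0 + 0j, -1.0 + 0j])
QPSK = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))

def loglik(r, symbols, n0):
    """Mixture log-likelihood of one observation vector, as in (46)-(47)."""
    d2 = np.abs(r[:, None] - symbols[None, :]) ** 2
    comp = np.exp(-d2 / n0) / (np.pi * n0)
    return float(np.sum(np.log(comp.mean(axis=1) + 1e-300)))

def estimate_pe(N, n0, trials=400):
    """Monte Carlo estimate of P_e for the single-sensor ML classifier:
    the true class is drawn equiprobably and compared with the ML decision."""
    errors = 0
    for _ in range(trials):
        true = int(rng.integers(2))                  # 0 -> BPSK, 1 -> QPSK
        syms = (BPSK, QPSK)[true]
        r = rng.choice(syms, N) + np.sqrt(n0 / 2) * (
            rng.standard_normal(N) + 1j * rng.standard_normal(N))
        decide = int(loglik(r, QPSK, n0) > loglik(r, BPSK, n0))
        errors += int(decide != true)
    return errors / trials

pe_short, pe_long = estimate_pe(4, 1.0), estimate_pe(100, 1.0)   # 0 dB SNR
```

The monotone decay of $P_e$ with $N$ seen in the figures corresponds to `pe_long` being far below `pe_short` in this sketch.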
In the figures, "Independent Observations" refers to the case where the condition in (49) is satisfied, i.e., each sensor observes a non-overlapping window of the signal, whereas "Dependent Observations" is the case where each sensor observes the same window, i.e., there is complete overlap between the sensor observations. Results are obtained using Monte Carlo simulations. In Fig. 5, each marked point represents $L \times N = 1000$ observations, with the points corresponding to nine $(N, L)$ pairs. When sensor observations are independent, $P_e$ is identical for all points where $L \times N$ is constant. This is shown in both figures under the "Independent Observations" case. It is clear from Fig. 5 that as $N$ grows, the performance of the two systems converges, supporting the analysis in Theorem 5. For this particular scenario, when $N = 250$ and $L = 4$, the classification performance of the system with dependent observations is almost identical to that with independent observations, where both fusion rules assume independent observations. In Fig. 6, each marked point represents $L \times N = 3000$ observations, again corresponding to nine $(N, L)$ pairs. For this scenario, when $N = 1000$ and $L = 3$, the classification performance of the system with dependent observations is almost identical to that with independent observations. We note that the convergence in the former scenario (Fig. 5) is faster than in the latter (Fig. 6). This is due to the difference between the cardinalities of the constellation sets under consideration: modulations with larger constellation sets require more observations for the mixing in (70) to take place. Therefore, the practical value of $N$ for which the two systems that use the same fusion rule behave identically depends on the classification scenario under consideration.

VII. CONCLUSION
In this paper, we have investigated the asymptotic behavior of LB modulation classification systems under two different scenarios: 1) coherent reception with known SNR, and 2) noncoherent reception with unknown SNR. Both a single-sensor setting and a multi-sensor setting that uses a distributed decision fusion approach are analyzed. In the single-sensor setting, it has been shown that $P_e$ vanishes asymptotically in the number of observations ($N$) under coherent reception with known SNR. Under noncoherent reception with unknown SNR, HLRT achieves perfect classification, i.e., $P_e \to 0$, in the asymptotic regime as $N \to \infty$, whereas this is not provably the case for ALRT. This property of HLRT is due to the consistency of the ML estimator as well as the statistical independence of the data symbols in time. In the multi-sensor setting, under the assumption of independent sensor observations, it has been shown that perfect classification is achieved, i.e., $P_e \to 0$, in the asymptotic regime as the number of sensors $L \to \infty$, provided that each sensor employs ALRT, regardless of the number of observations ($N$). However, this is not provably the case when each sensor employs HLRT with a finite number of samples ($N < \infty$). Finally, the asymptotic analysis of the fusion rule that assumes independent sensor observations is carried out. It has been shown that this fusion rule asymptotically achieves the same performance as the best that can be achieved by a system employing independent sensor observations. The asymptotic results derived in this paper have practical implications in that they provide design guidelines as to which LB classification method should be selected for the specific scenario under consideration.
Furthermore, they provide theoretical asymptotic performance guarantees for practical systems, which would otherwise be unknown.

As future work, it would be interesting to investigate the case where each sensor makes hard decisions, i.e., quantized likelihoods are sent to the fusion center, instead of the soft decisions (analog likelihoods) assumed in this paper, and the fusion center employs hard decision fusion for modulation classification. We conjecture that, under the assumption of independent identical quantizers, one would obtain asymptotic results similar to those for the soft decision fusion analyzed in this paper. Nevertheless, a rigorous treatment would be useful. Furthermore, we would like to incorporate additional unknown signal parameters, such as frequency and time offsets, into the signal model for similar asymptotic analyses in the future.

APPENDIX A
PROOF OF LEMMA 1

If $i \neq j$, then $p(r \mid H_i, \mathbf{u}_i)$ and $p(r \mid H_j, \mathbf{u}_j)$ are not identical distributions for any $\mathbf{u}_i, \mathbf{u}_j$.
Fig. 2. Coherent scenario with known SNR. $P_e$ versus number of observations ($N$) under two different SNR regimes: 0 dB and 6 dB.
Fig. 3. Noncoherent scenario with unknown SNR. $P_e$ versus number of observations ($N$) under two different SNR regimes: 0 dB and 6 dB.

We note from (30) that each $p(r \mid H_i, \mathbf{u}_i)$ is a complex GMM with $M_i$ components, where each component has the same occurrence probability $1/M_i$, i.e.,
$$p_i(r \mid \mathbf{u}_i) = \frac{1}{M_i} \sum_{m=1}^{M_i} p(r \mid H_i, \mathbf{u}_i, I^{m,(i)}) = \frac{1}{M_i \pi N_{0,i}} \sum_{m=1}^{M_i} \exp\!\left( -\frac{|r - a_i e^{j\theta_i} I^{m,(i)}|^2}{N_{0,i}} \right). \qquad (75)$$
If the transmitted signal is an M-PSK signal, then $I^{m,(i)} \in S_{MP}$; otherwise, if the transmitted signal is an M-QAM signal, then $I^{m,(i)} \in S_{MQ}$. From (75), the mean value of each component in the GMM corresponds to a unique constellation symbol (in the constellation map of modulation format $i$) scaled by $a_i$ and rotated by $\theta_i$. The variance of each component is $N_{0,i}$. For different modulation classes $i$ and $j$, there are two cases to be considered:

i) Case 1: Modulations $i$ and $j$ represent two modulation classes with different numbers of constellation symbols. In this case, $p_i(r \mid \mathbf{u}_i)$ and $p_j(r \mid \mathbf{u}_j)$ represent two GMMs with different numbers of components, i.e., $M_i \neq M_j$. Therefore, $p_i(r \mid \mathbf{u}_i)$ and $p_j(r \mid \mathbf{u}_j)$ are not identical distributions and, hence, $D(p_i(r \mid \mathbf{u}_i) \,\|\, p_j(r \mid \mathbf{u}_j)) > 0$.

ii) Case 2: Modulations $i$ and $j$ represent two modulation classes with the same number of constellation symbols. In this case, one of the modulation classes is M-PSK and the other is M-QAM. Suppose modulations $i$ and $j$ represent M-PSK and M-QAM, respectively. In this case, the mean values of the components of the two GMMs are given by $\mu_{i,(m)} \in S'_{MP} = \{a_i e^{j(2\pi m/M + \theta_i)} \mid m = 0, \ldots, M-1\}$ and $\mu_{j,(m)} \in S'_{MQ} = \{a_j b_m e^{j(\theta_m + \theta_j)} \mid m = 0, \ldots, M-1\}$. We know from the M-QAM constellation symbol set that there exist $m_1$ and $m_2$ such that $b_{m_1} \neq b_{m_2}$. In order for $p_i(r \mid \mathbf{u}_i)$ and $p_j(r \mid \mathbf{u}_j)$ to be identical, the following condition should be satisfied:
$$a_i e^{j(2\pi m/M + \theta_i)} = a_j b_m e^{j(\theta_m + \theta_j)}, \quad m = 0, \ldots, M-1. \qquad (76)$$
Now suppose $p_i(r \mid \mathbf{u}_i)$ and $p_j(r \mid \mathbf{u}_j)$ are identical and consider $m_1$ and $m_2$ such that $b_{m_1} \neq b_{m_2}$. Then, from (76), we can write $a_i e^{j(2\pi m_1/M + \theta_i)} = a_j b_{m_1} e^{j(\theta_{m_1} + \theta_j)}$, which implies that $a_i/a_j = b_{m_1}$. Since $p_i(r \mid \mathbf{u}_i)$ and $p_j(r \mid \mathbf{u}_j)$ are identical, we can also write from (76) that $a_i e^{j(2\pi m_2/M + \theta_i)} = a_j b_{m_2} e^{j(\theta_{m_2} + \theta_j)}$, implying that $a_i/a_j = b_{m_2}$, which is a contradiction because $b_{m_1} \neq b_{m_2}$. Then, $p_i(r \mid \mathbf{u}_i)$ and $p_j(r \mid \mathbf{u}_j)$ must be different GMMs; therefore, $D(p_i(r \mid \mathbf{u}_i) \,\|\, p_j(r \mid \mathbf{u}_j)) > 0$.

APPENDIX B
PROOF OF LEMMA 2

We drop the sensor index $l$ for simplicity of presentation. There are three cases to be considered:

i) Case 1: Modulations $i$ and $j$ represent two different PSK modulations, i.e., $I_n^{(i)} \in S_{M_i P} = \{e^{j 2\pi m/M_i} \mid m = 1, \ldots, M_i\}$, where $M_i = 2^{k_i}$, $k_i \in \mathbb{N}$. First, suppose $N = 1$. Then, under $H_i$, (64) becomes
$$p_A(r \mid H_i)/C = E_{I^{(i)}}\!\left\{ \frac{1}{1 + \frac{\Gamma}{N_0}|I^{(i)}|^2} \exp\!\left( \frac{\frac{\Gamma}{N_0^2}|I^{(i)*} r|^2}{1 + \frac{\Gamma}{N_0}|I^{(i)}|^2} - \frac{|r|^2}{N_0} \right) \right\} = \frac{1}{M_i} \sum_{m=1}^{M_i} \frac{1}{1 + \frac{\Gamma}{N_0}|I^{m,(i)}|^2} \exp\!\left( \frac{\frac{\Gamma}{N_0^2}|I^{m,(i)}|^2 |r|^2}{1 + \frac{\Gamma}{N_0}|I^{m,(i)}|^2} - \frac{|r|^2}{N_0} \right) \overset{(a)}{=} \frac{1}{1 + \frac{\Gamma}{N_0}} \exp\!\left( -\frac{|r|^2}{\Gamma + N_0} \right), \qquad (77)$$
where $(a)$ follows from each symbol being equally likely and having unit modulus, i.e., $|I^{m,(i)}|^2 = 1$, $\forall m$. We note that (77) is independent of $H_i$. Therefore, $p_A(r \mid H_i) = p_A(r \mid H_j)$, which implies that $D(p_i^A(r) \,\|\, p_j^A(r)) = 0$ for $N = 1$. Now suppose $N > 1$. In order to show that $D(p_i^A(\mathbf{r}) \,\|\, p_j^A(\mathbf{r})) > 0$, it suffices to show that there exists an $\mathbf{r}$ such that $p_A(\mathbf{r} \mid H_i) \neq p_A(\mathbf{r} \mid H_j)$. Let us set $\mathbf{r} = \mathbf{1}$ (the vector of ones) and write (64) as
$$p_A(\mathbf{1} \mid H_i)/C = E_{I^{(i)}}\!\left\{ \frac{1}{1 + \frac{\Gamma}{N_0}\|\mathbf{I}^{(i)}\|^2} \exp\!\left( \frac{\frac{\Gamma}{N_0^2}\,|\mathbf{I}^{(i)H}\mathbf{1}|^2}{1 + \frac{\Gamma}{N_0}\|\mathbf{I}^{(i)}\|^2} - \frac{\|\mathbf{1}\|^2}{N_0} \right) \right\}$$
$$= \frac{e^{-N/N_0}}{M_i^N \left(1 + \frac{N\Gamma}{N_0}\right)} \sum_{m_1=1}^{M_i} \cdots \sum_{m_N=1}^{M_i} \exp\!\left( \frac{\Gamma}{N_0(N_0 + N\Gamma)} \left| \sum_{k=1}^{N} I^{m_k,(i)} \right|^2 \right)$$
using $\|\mathbf{I}^{(i)}\|^2 = \sum_{k=1}^{N} |I^{m_k,(i)}|^2 = N$ for PSK. Expanding $\left|\sum_k I^{m_k,(i)}\right|^2 = N + 2\sum_{k=1}^{N}\sum_{h>k} \Re\{I^{m_k,(i)*} I^{m_h,(i)}\}$ and combining the exponents gives
$$= \frac{e^{-\frac{N(N_0 + N\Gamma - \Gamma)}{N_0(N_0 + N\Gamma)}}}{M_i^N \left(1 + \frac{N\Gamma}{N_0}\right)} \sum_{m_1=1}^{M_i} \cdots \sum_{m_N=1}^{M_i} \exp\!\left( \frac{2\Gamma}{N_0(N_0 + N\Gamma)} \sum_{k=1}^{N} \sum_{h>k} \cos\!\left( 2\pi(m_h - m_k)/M_i \right) \right), \qquad (78)$$
where $\Re\{\cdot\}$ denotes the real part of a complex number. We note that, for fixed $N > 1$, (78) cannot be reduced to a constant that is independent of $M_i$, i.e., of $H_i$. In other words, for each $M_i = 2^{k_i}$, $k_i \in \mathbb{N}$, (78) results in a different value. Therefore, $p_A(\mathbf{1} \mid H_i) \neq p_A(\mathbf{1} \mid H_j)$, which implies that $D(p_i^A(\mathbf{r}) \,\|\, p_j^A(\mathbf{r})) > 0$ for $N > 1$.

ii) Case 2: Modulations $i$ and $j$ represent two QAM modulations, i.e., $I_n^{(i)} \in S_{M_i Q} = \{b_m e^{j\theta_m} \mid m = 1, \ldots, M_i\}$. Using the methodology of Case 1, we can show that $p_i^A(\mathbf{0}) \neq p_j^A(\mathbf{0})$ for $N \geq 1$, where $\mathbf{0}$ denotes the vector of zeros. The details are omitted for the sake of brevity. Therefore, $D(p_i^A(\mathbf{r}) \,\|\, p_j^A(\mathbf{r})) > 0$ for $N \geq 1$.

iii) Case 3: Modulations $i$ and $j$ represent PSK and QAM modulations, respectively. In this case, similar to the above, we can show that $p_i^A(\mathbf{0}) \neq p_j^A(\mathbf{0})$ for $N \geq 1$. The details are omitted for the sake of brevity. Therefore, $D(p_i^A(\mathbf{r}) \,\|\, p_j^A(\mathbf{r})) > 0$ for $N \geq 1$.
Fig. 4. ALRT with $N = 4$ observations. $P_e$ versus number of sensors ($L$) under two different SNR regimes: average channel SNR of 0 dB and 6 dB.
Fig. 5. $P_e$ with the fusion rule in (66) using dependent vs. independent observations (16-PSK vs. 16-QAM).
Fig. 6. $P_e$ for the fusion rule in (66) using dependent vs. independent observations (64-QAM vs. 128-QAM vs. 256-QAM).

REFERENCES

[1] O. A. Dobre, A. Abdi, Y. Bar-Ness, and W. Su, "Survey of automatic modulation classification techniques: classical approaches and new trends," IET Communications, vol. 1, no. 2, pp. 137-159, Apr. 2007.
[2] P. Forero, A. Cano, and G. B. Giannakis, "Distributed feature-based modulation classification using wireless sensor networks," in Proc. IEEE MILCOM, Nov. 2008.
[3] W. Su and J. Kosinski, "Framework of network centric signal sensing for automatic modulation classification," in Proc. IEEE ICNSC, Chicago, IL, Apr. 2010, pp. 534-539.
[4] J. L. Xu, W. Su, and M. Zhou, "Distributed automatic modulation classification with multiple sensors," IEEE Sensors Journal, vol. 10, no. 11, pp. 1779-1785, Nov. 2010.
[5] ——, "Asynchronous and high-accuracy digital modulated signal detection by sensor networks," in Proc. IEEE Military Communications Conf. (MILCOM), Nov. 2011.
[6] Y. Zhang, N. Ansari, and W. Su, "Optimal decision fusion based automatic modulation classification by using wireless sensor networks in multipath fading channel," in Proc. IEEE Global Communications Conf. (GLOBECOM), Dec. 2011.
[7] J. L. Xu, W. Su, and M. Zhou, "Likelihood-ratio approaches to automatic modulation classification," IEEE Trans. Systems, Man, and Cybernetics - Part C: Applications and Reviews, vol. 41, no. 4, pp. 455-469, Jul. 2011.
[8] O. A. Dobre, A. Abdi, Y. Bar-Ness, and W. Su, "Blind modulation classification: a concept whose time has come," in Proc. IEEE Sarnoff Symposium on Advances in Wired and Wireless Comm., Apr. 2005.
[9] W. Wei and J. M. Mendel, "Maximum-likelihood classification for digital amplitude-phase modulations," IEEE Trans. Communications, vol. 48, no. 2, pp. 189-193, Feb. 2000.
[10] F. Gini and G. B. Giannakis, "Frequency offset and symbol timing recovery in flat-fading channels: a cyclostationary approach," IEEE Trans. Communications, vol. 46, no. 3, pp. 400-411, Mar. 1998.
[11] O. A. Dobre, A. Abdi, Y. Bar-Ness, and W. Su, "On the classification of linearly modulated signals in fading channels," in Proc. Conf. on Information Sciences and Systems (CISS), Mar. 2004.
[12] F. Hameed, O. A. Dobre, and D. C. Popescu, "On the likelihood-based approach to modulation classification," IEEE Trans. Wireless Comm., vol. 8, no. 12, pp. 5884-5892, Dec. 2009.
[13] V. G. Chavali and C. R. C. M. da Silva, "Maximum-likelihood classification of digital amplitude-phase modulated signals in flat fading non-Gaussian channels," IEEE Trans. Communications, vol. 59, no. 8, pp. 2051-2056, Aug. 2011.
[14] D. Reynolds, "Gaussian mixture models," Encyclopedia of Biometric Recognition, Springer, Feb. 2008.
[15] A. D'Costa and A. M. Sayeed, "Collaborative signal processing for distributed classification in sensor networks," in Lecture Notes in Computer Science, Proc. IPSN, F. Zhao and L. Guibas, Eds. Berlin, Germany, Apr. 2003, pp. 193-208.
[16] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 1991.
[17] J. Hamkins, M. K. Simon, and J. H. Yuhen, Autonomous Software-Defined Radio Receivers for Deep Space Applications (JPL Deep-Space Communications and Navigation Series). Wiley-Interscience, 2006.
[18] G. Casella and R. L. Berger, Statistical Inference. Duxbury Press, 2001.
[19] H. Akaike, "Information theory and an extension of the likelihood principle," in Proc. Int. Symposium on Information Theory (ISIT), Budapest, 1973.
[20] H. White, "Maximum likelihood estimation of misspecified models,"