Infinite Factorial Finite State Machine for Blind Multiuser Channel Estimation
Francisco J. R. Ruiz, Isabel Valera, Lennart Svensson, Fernando Perez-Cruz
IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING
Abstract—New communication standards need to deal with machine-to-machine communications, in which users may start or stop transmitting at any time in an asynchronous manner. Thus, the number of users is an unknown and time-varying parameter that needs to be accurately estimated in order to properly recover the symbols transmitted by all users in the system. In this paper, we address the problem of joint channel parameter and data estimation in a multiuser communication channel in which the number of transmitters is not known. For that purpose, we develop the infinite factorial finite state machine model, a Bayesian nonparametric model based on the Markov Indian buffet process that allows for an unbounded number of transmitters with arbitrary channel length. We propose an inference algorithm that makes use of slice sampling and particle Gibbs with ancestor sampling. Our approach is fully blind as it does not require a prior channel estimation step, prior knowledge of the number of transmitters, or any signaling information. Our experimental results, loosely based on the LTE random access channel, show that the proposed approach can effectively recover the data-generating process for a wide range of scenarios, with varying number of transmitters, number of receivers, constellation order, channel length, and signal-to-noise ratio.
Index Terms—Bayesian nonparametrics, stochastic finite state machine, multiuser communications, machine-to-machine.
I. INTRODUCTION

ONE of the trends in wireless communication networks (WCNs) is the increase of heterogeneity [1]. Nowadays, users of WCNs are no longer only humans talking, and the number of services and users is booming. Machine-to-machine (M2M) communications and the Internet of Things (IoT) will shape the traffic in WCNs in the years to come [2]–[5]. While there are millions of M2M cellular devices already using second, third and fourth generation cellular networks, the industry expectation is that the number of devices will increase ten-fold in the coming years [6].
M2M traffic, which also includes communication between a sensor/actuator and a corresponding application server in the network, is distinct from consumer traffic, which has been the main driver for the design of long term evolution (LTE)
F. J. R. Ruiz is with the Department of Computer Science, Columbia University in the City of New York, United States of America, and with the Engineering Department, University of Cambridge, United Kingdom. E-mail: [email protected].
I. Valera is with the Department of Empirical Inference at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, and with the Department of Signal Processing and Communications, University Carlos III in Madrid, Spain.
L. Svensson is with Chalmers University of Technology at Gothenburg, Sweden.
F. Perez-Cruz is Chief Data Scientist at the Swiss Data Science Center, Zurich, Switzerland, and is also with the Department of Signal Processing and Communications, University Carlos III in Madrid, Spain.
Manuscript received MONTH DD, 2017; revised MONTH DD, 2017.

systems. First, while current consumer traffic is characterized by a small number of long-lived sessions, M2M traffic involves a large number of short-lived sessions, with typical transactions of a few hundred bytes. The short payloads involved in M2M communications make it highly inefficient to establish dedicated bearers for data transmission. Therefore, in some cases it is better to transmit small payloads in the random access request itself [7]. Second, a significant number of battery-powered devices are expected to be deployed at adverse locations such as basements and tunnels (e.g., underground water monitors and traffic sensors) that demand superior link budgets. Motivated by this need for increasing the link budget for M2M devices, transmission techniques that minimize the transmit power for short burst communications are needed [8]. Third, the increasing number of M2M devices requires new techniques for massive access management [9], [10].
Due to these differences, there are strong arguments for the need to optimize WCNs specifically for M2M communications [6].

The nature of M2M traffic leads to multiuser communication systems in which a large number of users may aim at entering or leaving the system (i.e., start or stop transmitting) at any given time. In this context, we need a method that allows the users to access the system in a way that the signaling overhead is reduced. The method should determine the number of users transmitting in a communication system, and jointly perform channel estimation and detect the transmitted data, with minimum pilot signaling. This problem appears in several specific applications. For instance, it arises in the context of wireless sensor networks, where the communication nodes can often switch on and off asynchronously during operation. It also appears in massive multiple-input multiple-output (MIMO) multiuser communication systems [11], [12], in which the base station has a very large number of antennas and the mobile devices use a single antenna to communicate within the network. In a code division multiple access (CDMA) context, a set of terminals randomly access the channel to communicate with a common access point, which receives the superposition of signals from the active terminals only [13]. In these applications, the number of users is an unknown and time-varying parameter that we need to infer.

In this paper, we aim at solving the channel estimation and symbol detection problems in a fully unsupervised way, without the need of signaling data. Our approach is thus suitable for applicability on the random access channel, in which more than one terminal may decide to transmit data at the same time. To this end, we advocate for the use of Bayesian nonparametric (BNP) tools, which can adapt to heterogeneous structures by considering an unbounded number of users in their prior. We develop a BNP model that has the required
flexibility to account for any number of transmitters without the need of additional previous knowledge or bounds. We are interested in showing that BNP models can solve the problem blindly and that this versatility can improve detection in heterogeneous WCNs. Thus, our goal is to show that we can recover the messages and infer the number of transmitters using as little information as possible. The transmitted messages are therefore assumed to lack additional pilots or other structure. This makes our approach applicable on any interference-limited multiuser communication system in which the transmission of pilots or the need for synchronization may severely limit the spectral efficiency of the WCN.

The main difference between our model and the existing approaches in the literature is that our model is fully blind (with respect to the channel coefficients, the number of transmitters, and the transmitted symbols) and can recover the number of users without the need of synchronization. Indeed, the problem of multiuser detection has been widely studied in the literature, using standard linear methods (least squares or minimum mean squared error) or non-linear approaches (e.g., successive interference cancellation or parallel interference cancellation) [14], [15]. Most of these approaches require synchronization techniques (see, e.g., [13], [16], [17]). In order to perform the user identification step, the common approach is to pre-specify signature sequences that are specific for each user [18]–[20], which imposes a constraint on the total number of users in the system.

Our model also differs from existing approaches in other ways. For example, in [21] users may only enter or leave the system at certain pre-defined instants. In [22] the user identification step is performed first, without symbol detection. The approach in [23] requires knowledge of the channel, and the method in [13] is restricted to flat-fading channels.
The methods in [24], [25] come at the cost of both important approximations and prohibitive computational cost (experiments are presented with short frames of only ten symbols). In contrast, our model is fully blind, it does not need a previous channel estimation step, and it is not restricted to flat-fading channels or very short frames.

All these methods assume that the number of users in the system is known, which makes sense in a DS-CDMA system but may represent a limitation in other scenarios. They also require synchronization mechanisms. We build on previous BNP approaches that do not have these constraints [26]–[28]. In particular, we extend the method in [28] to channels with memory, and we address the exponential complexity of [26], [27], which is only applicable for BPSK systems and for short channel lengths. Instead, the complexity of our algorithm scales with the square of the channel length.

In summary, our model does not need any of the following requirements: a constraint on the number of users, that the users are synchronized with the network, that the users transmit a preamble, that the channel is known, or that the channel length equals one. Our model allows for an unbounded number of transmitters due to its nonparametric nature, and it is not restricted to memoryless channels because transmitters are modeled as finite state machines. Our goal is to show that we can recover the number of transmitters and their payloads with very little information.

Our model follows the three steps of cognitive communications. It gathers the incoming signal, which is a mixture of all transmitters that are active at a given time instant (perception); it learns the number of users and the channel they face, and it detects their payload in a blind way (learning); and it assigns the symbols from each user and passes that information to the network to act upon (decision-making).
These features can mitigate the effect of collisions and reduce the number of retransmissions over the random access channel, which is of major concern in LTE-A systems when M2M communications start being a driving force in WCNs [29], [30].

A. Technical contributions
We model the multiuser communication system as an infinite factorial finite state machine (IFFSM) in which a potentially infinite number of finite state machines (FSMs), each representing a single transmitter, contribute to generate the observations. The symbols transmitted by each user correspond to the inputs of each FSM, and its memory accounts for the multipath propagation between each transmitter-receiver pair. The output of the IFFSM corresponds to the received signal, which depends on the inputs and the current states of the active FSMs (i.e., the active transmitters), under noise. Our IFFSM considers that transmitters may start or stop transmitting at any time, and it ensures that only a finite subset of the users become active during any finite observation period, while the remaining (infinite) transmitters remain in an idle state (i.e., they do not transmit).

As for most BNP models, one of the main challenges of our IFFSM is posterior inference. We develop a suitable inference algorithm by building a Markov chain Monte Carlo (MCMC) kernel using particle Gibbs with ancestor sampling (PGAS) [31], a recently proposed algorithm that belongs to the broader family of particle Markov chain Monte Carlo (PMCMC) methods [32]. This algorithm presents quadratic complexity with respect to the memory length, avoiding the exponential complexity of previous approaches, such as forward-filtering backward-sampling schemes [26], [27], [33], [34]. Our experimental results, based on the LTE random access channel, show that the proposed approach efficiently solves user activity detection, channel estimation and data detection in a joint and fully blind way.
B. Organization of the paper
The rest of the paper is organized as follows. In Section II, we review the basic building blocks that we use to develop our model, namely, the stochastic finite-memory FSM and the Markov Indian buffet process (mIBP). Section III details our proposed IFFSM, whereas we describe the inference algorithm in Section IV. Sections V and VI are devoted to the experiments and conclusions, respectively.

II. BACKGROUND
Here we describe the two main building blocks that we use to develop our IFFSM: the FSM [35] and the mIBP [34].
A. Stochastic Finite State Machines
FSMs have been applied to a huge variety of problems in many areas, including biology (e.g., neurological systems), signal processing (e.g., speech modeling), control, communications and electronics [36], [37]. In its general form [35], an FSM is defined as an abstract machine consisting of:
• A set of states, which might include one or several initial and final states.
• A set of discrete or continuous inputs.
• A set of discrete or continuous outputs.
• A state transition function, which takes the current input and state and returns the next state.
• An emission function, which takes the current state and input and returns an output.
The transition and emission functions are, in general, stochastic (e.g., hidden Markov models (HMMs) [38], [39]), which implies that the input events and the states are not directly observable through the output events (instead, each input and state produces one of the possible output events with a certain probability). We focus on stochastic finite-memory FSMs, in which each state can be represented as a finite-length sequence containing the last L inputs. This FSM relies on a finite memory L and a finite alphabet X. Each input x_t ∈ X produces a deterministic change in the state of the FSM, and a stochastic observable output y_t at time t. The next state and the output depend on the current state and the input. The state can be expressed as the vector containing the last L inputs, i.e., [x_t, x_{t−1}, . . . , x_{t−L+1}], therefore yielding |X|^L different states, where |X| denotes the cardinality of the set X. Each state can only transition to |X| different states, depending on the next input event. A state diagram of an HMM and an FSM is depicted in Figure 1.
For any finite-memory FSM, the probability distribution over the inputs x_t and the observations (outputs) y_t can be written as

p({x_t, y_t}_{t=1}^T) = ∏_{t=1}^T p(x_t) ∏_{t=1}^T p(y_t | x_t, . . . , x_{t−L+1}),   (1)

i.e., the likelihood of each observation y_t depends not only on x_t, but also on the previous L−1 inputs. The model also requires the specification of the initial state, which is defined by the inputs x_0, . . . , x_{−L+2}. We show the graphical model for an FSM with memory length L = 2 in Figure 2. Note that this model can be equivalently represented as a standard HMM with a sparse transition probability matrix of size |X|^L × |X|^L. However, the HMM representation of the FSM leads to a computationally intractable inference algorithm due to the exponential dependency on the memory length L of the state-space cardinality.

B. Markov Indian Buffet Process
The central idea behind BNPs is the replacement of classical finite-dimensional prior distributions with general stochastic processes, allowing for an open-ended number of degrees of freedom in a model [40]. They constitute an approach to model selection and adaptation in which the model complexity is allowed to grow with the data size [41].
Fig. 1. (a) State diagram of an HMM with four states, in which all transitions among states are allowed. (b) State diagram of a stochastic finite-memory FSM with memory length L = 2 and X = {0, 1}. Each state can be represented by the vector containing the last L input events. Each state can only transition to other |X| = 2 states, depending on the input event x_t. Hence, not all transitions among states are allowed, but it is possible to reach any other state in the graph after L = 2 transitions.

Fig. 2. Graphical representation of an FSM model with memory length L = 2. Here, x_t denotes the transmitted symbol at time t, and y_t is the received (observed) signal. The arrows indicate that the t-th received symbol depends on both the t-th and the (t−1)-th transmitted symbols.

Within the family of BNP models, the Markov Indian buffet process (mIBP) is a variant of the Indian buffet process (IBP) [42] and constitutes the main building block of the infinite factorial hidden Markov model (IFHMM), which considers an infinite number of first-order Markov chains with binary-valued states that evolve independently [34].
The mIBP places a prior distribution over a binary matrix S with a finite number of rows and an infinite number of columns. The t-th row represents time step t, whereas the m-th column contains the (binary) states of the m-th Markov chain. Each element s_tm = (S)_tm ∈ {0, 1} indicates whether the m-th Markov chain is active at time instant t, with t = 1, . . . , T, m = 1, . . . , M, and M → ∞. The states s_tm evolve according to the transition matrix

A_m = ( 1−a_m   a_m
        1−b_m   b_m ),   (2)

i.e., a_m = p(s_tm = 1 | s_{(t−1)m} = 0) is the transition probability from inactive to active, and b_m = p(s_tm = 1 | s_{(t−1)m} = 1) is the self-transition probability of the active state.
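The per-chain dynamics above are easy to simulate for a finite truncation of the chains. The following sketch is an illustration only (the function name and the fixed values of a_m and b_m are our own choices, not part of the model definition); it draws a binary activity matrix S for M chains that all start from the inactive state:

```python
import numpy as np

def simulate_activity(a, b, T, rng):
    """Simulate s_{tm} for M independent two-state Markov chains.

    a[m] = p(s_tm = 1 | s_(t-1)m = 0): activation probability.
    b[m] = p(s_tm = 1 | s_(t-1)m = 1): self-transition of the active state.
    All chains start from the inactive state at t = 0.
    """
    a, b = np.asarray(a), np.asarray(b)
    S = np.zeros((T, len(a)), dtype=int)
    prev = np.zeros(len(a), dtype=int)        # inactive initial state
    for t in range(T):
        p_active = np.where(prev == 1, b, a)  # pick the relevant row of A_m
        S[t] = (rng.random(len(a)) < p_active).astype(int)
        prev = S[t]
    return S

rng = np.random.default_rng(0)
S = simulate_activity(a=[0.1, 0.05], b=[0.9, 0.8], T=100, rng=rng)
```

With a_m small and b_m close to one, the sampled columns exhibit the bursty on/off activity patterns that the mIBP is designed to capture.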
We assume an inactive initial state for all chains at time t = 0.
In order to specify the prior distribution over the transition probabilities a_m and b_m, we focus here on the stick-breaking construction of the IBP in [43], because it allows for simple and efficient inference algorithms. Under this construction, we first introduce the notation a_(m) to denote the sorted values of a_m, such that a_(1) > a_(2) > a_(3) > . . . , with

a_(1) ∼ Beta(α, 1),   (3)

and

p(a_(m) | a_(m−1)) ∝ (a_(m))^{α−1} I(0 ≤ a_(m) ≤ a_(m−1)),   (4)

being I(·) the indicator function, which takes value one if its argument is true and zero otherwise. Here, α is the concentration hyperparameter of the model, which controls the expected number of latent Markov chains that become active a priori. The prior over the variables b_m is given by

b_m ∼ Beta(β_1, β_2),   (5)

which does not depend on the index m. In (3) and (5), α, β_1 and β_2 are hyperparameters of the model. This prior distribution ensures that, for any finite value of T, only a finite number of columns M_+ become active, while the rest of them remain in the all-zero state.

III. MODEL DESCRIPTION
We use the stochastic finite-memory FSM and the mIBP detailed in Section II as building blocks for our IFFSM model, which we use for modeling a multiuser communication system in which the number of transmitters is potentially infinite. The key idea of our approach is to model the multiuser communication system as a factorial FSM in which each FSM represents a user, and the inputs to the FSM are the symbols sent by each transmitter. The memory of the FSM accounts for the multipath propagation between each transmitter-receiver pair, and the output of the factorial FSM model corresponds to the received signal, which depends on the inputs and the current states of all the active FSMs (transmitters).
In order to account for an unbounded number of transmitters (parallel FSMs), we need to consider an inactive state, such that the observations cannot depend on those transmitters that are inactive. Furthermore, we need to ensure that only a finite number of them become active for any finite-length observed sequence. As detailed in Section II-B, the mIBP in [34] meets these requirements by placing a prior distribution over binary matrices with an infinite number of columns.
A. Infinite Factorial Finite State Machine
We place an mIBP prior over an auxiliary binary matrix S, where each element s_tm indicates whether the m-th FSM is active at time instant t. While active, the input symbol to the m-th FSM at time instant t, denoted by x_tm, is assumed to belong to the set A, with finite cardinality |A|. Here, A represents the constellation, and x_tm stands for the transmitted symbol of the m-th transmitter at time instant t. We assume that the transmitted symbols are independent and uniformly distributed on the set A. While inactive, we can assume that x_tm = 0 and, therefore, each input x_tm ∈ X, with X = A ∪ {0}. This can be expressed as

x_tm | s_tm ∼ { δ(x_tm)  if s_tm = 0,
                U(A)     if s_tm = 1, }   (6)

where δ(·) denotes a point mass located at 0, and U(A) denotes the uniform distribution over the set A.
In the IFFSM model, each input symbol x_tm does not only influence the observation y_t at time instant t, but also the future L−1 observations, y_{t+1}, . . . , y_{t+L−1}. Therefore, the likelihood function for y_t depends on the last L input symbols of all the FSMs, yielding

p(y_t | X) = p(y_t | {x_tm, x_{(t−1)m}, . . . , x_{(t−L+1)m}}_{m=1}^∞),   (7)

with X being the T × M matrix that contains all symbols x_tm. We assume inactive states s_tm = 0 for t ≤ 0 (note that s_tm = 0 implies x_tm = 0 and vice versa).
The resulting IFFSM model, particularized for L = 2, is shown in Figure 3.

Fig. 3. Graphical model of the IFFSM with memory length L = 2. Here, x_tm denotes the transmitted symbol by device m at time instant t, and y_t are the observations (a combination of the signals transmitted by all the active devices). The variable s_tm indicates whether device m is active or inactive at time instant t. The variables a_m and b_m govern the transition probabilities from inactive to active and vice versa. The hyperparameters of the model are α, β_1, and β_2.
Note that this model can be equivalently represented as a non-binary version of the IFHMM in [34], using the extended states given by the vector [x_tm, x_{(t−1)m}, . . . , x_{(t−L+1)m}]. However, we maintain the representation in Figure 3 because it allows deriving an efficient inference algorithm.
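Given an activity matrix S, drawing the input matrix X according to Eq. 6 is straightforward. The sketch below is a minimal illustration; the normalized QPSK constellation for A and the function names are our own choices, not prescribed by the model:

```python
import numpy as np

# Illustrative choice of the constellation A: normalized QPSK.
A = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def sample_inputs(S, A, rng):
    """Draw x_{tm} given the activity matrix S (Eq. 6):
    a point mass at 0 when s_{tm} = 0, uniform over A when s_{tm} = 1."""
    T, M = S.shape
    X = np.zeros((T, M), dtype=complex)
    active = S == 1
    X[active] = rng.choice(A, size=int(active.sum()))
    return X

rng = np.random.default_rng(1)
S = rng.integers(0, 2, size=(50, 3))   # a toy activity matrix
X = sample_inputs(S, A, rng)
```

Inactive entries of X are exactly zero, so they drop out of the observation model, as required by condition ii) in Section III-B.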
B. Observation Model
The model described in the previous section is general and can be applied to any sequence that can be explained by a potentially unbounded number of parallel FSMs. In this section, we particularize this model for its applicability on wireless communications, in which the parallel chains correspond to different transmitters. The IFFSM requires two general conditions for the likelihood model to be valid as the number of FSMs M tends to infinity: i) the likelihood must be invariant to permutations of the chains, and ii) the distribution on y_t cannot depend on any parameter of the m-th FSM if s_τm = 0 for τ = t−L+1, . . . , t. These conditions are met by simultaneous transmissions in WCNs. The first one says that the likelihood does not depend on how we label the different transmitters, and the second one indicates that inactive transmitters should not affect the observations. Among others, the standard discrete-time interference channel model with additive white Gaussian noise fulfils these restrictions:

y_t = ∑_{m=1}^{M_+} ∑_{ℓ=1}^{L} h_ℓm x_{(t−ℓ+1)m} + n_t.   (8)

In (8), x_tm represents the complex input (i.e., a symbol from a given constellation) at time instant t for the m-th FSM (user), h_ℓm are the channel coefficients (emission parameters, in the BNP literature), and n_t is the additive white noise, which is circularly symmetric complex Gaussian distributed, i.e.,

n_t ∼ CN(0, σ_y² I_D, 0).   (9)

The hyperparameter σ_y² is the noise variance, I_D is the identity matrix of size D, and D is the number of receiving antennas, and hence the dimensionality of y_t, h_ℓm and n_t. Consequently, the probability distribution over the received complex vector at time t is described by

y_t | {h_ℓm}, X ∼ CN( ∑_{m=1}^{M_+} ∑_{ℓ=1}^{L} h_ℓm x_{(t−ℓ+1)m}, σ_y² I_D, 0 ).   (10)

We finally place a circularly symmetric complex Gaussian prior over the channel coefficients, i.e.,

h_ℓm | σ_ℓ² ∼ CN(0, σ_ℓ² I_D, 0),   (11)

and an inverse-gamma hyper-prior over the variances σ_ℓ²,

p(σ_ℓ²) = (ν_ℓ)^τ / Γ(τ) · (1/σ_ℓ²)^{τ+1} exp{ −ν_ℓ / σ_ℓ² },   (12)

where ν_ℓ = (τ−1) σ_H² e^{−λ(ℓ−1)} and τ = 2 + κ^{−2}, being σ_H², λ and κ hyperparameters of the model. The mean and standard deviation of the variances σ_ℓ² are respectively given by E[σ_ℓ²] = σ_H² e^{−λ(ℓ−1)} and Std[σ_ℓ²] = κ E[σ_ℓ²]. The choice of this particular prior is based on the assumption that the channel coefficients h_ℓm are a priori expected to decay with the index ℓ, since they model the multipath effect. However, if the data contains enough evidence against this assumption, the posterior distribution will assign higher probability mass to larger values of σ_ℓ².
We depict the corresponding graphical model in Figure 4, with a value of L = 2.

Fig. 4. Graphical Gaussian observation model for an IFFSM with memory length L = 2. It combines the building block from Figure 3 with the channel model coefficients h_ℓm, the prior variance of those channel coefficients σ_ℓ², and the noise variance σ_y².

The complex Gaussian distribution CN(μ, Γ, C) over a vector x of length D is given by

p(x) = 1 / (π^D √(det(Γ) det(P))) × exp( −(1/2) [(x−μ)^H, (x−μ)^⊤] [ Γ  C ; C^H  Γ* ]^{−1} [ x−μ ; (x−μ)* ] ),

where P = Γ* − C^H Γ^{−1} C, (·)* denotes the complex conjugate, and (·)^H denotes the conjugate transpose. A circularly symmetric complex Gaussian distribution has μ = 0 and C = 0.

IV. INFERENCE
One of the main challenges in Bayesian probabilistic models is posterior inference, which involves the computation of the posterior distribution over the hidden variables in the model given the data. In most models of interest, including BNP models, the posterior distribution cannot be obtained in closed form, and an approximate inference algorithm is used instead. In many BNP time series models, inference is carried out via MCMC methods [44]. In particular, for the IFHMM [34], typical approaches rely on a blocked Gibbs sampling algorithm that alternates between sampling the number of parallel chains and the global variables (emission parameters and transition probabilities) conditioned on the current value of the matrices S and X, and sampling the matrices S and X conditioned on the current value of the global variables. We follow a similar algorithm, which proceeds iteratively as follows:
• Step 1:
Add M_new new inactive chains (FSMs) using an auxiliary slice variable and a slice sampling method. In this step, the number of considered parallel chains is increased from its initial value M_+ to M_‡ = M_+ + M_new (we do not update M_+ because the new chains are in the all-zero state).
• Step 2:
Sample the states s_tm and the input symbols x_tm of all the considered chains. Compact the representation by removing those FSMs that remain inactive in the entire observation period, consequently updating M_+.
• Step 3:
Sample the global variables, i.e., the transition probabilities a_m and b_m and the observation parameters h_ℓm for each active chain (m = 1, . . . , M_+), as well as the variances σ_ℓ².

In Step 1, we follow the slice sampling scheme for inference in BNP models based on the IBP [34], [43], which effectively transforms the model into a finite factorial model with M_‡ = M_+ + M_new parallel chains. We first sample an auxiliary slice variable ϑ, which is distributed as

ϑ | S, {a_(m)}_{m=1}^{M_+} ∼ Uniform(0, a_min),   (13)

where a_min = min_{m: ∃t, s_tm ≠ 0} a_(m), and we can replace the uniform distribution with a more flexible scaled beta distribution. Then, starting from m = M_+ + 1, new variables a_(m) are iteratively sampled from

p(a_(m) | a_(m−1)) ∝ exp( α ∑_{t=1}^{T} (1/t) (1 − a_(m))^t ) × (a_(m))^{α−1} (1 − a_(m))^T I(0 ≤ a_(m) ≤ a_(m−1)),   (14)

with a_(M_+) = a_min, until the resulting value is less than the slice variable, i.e., until a_(m) < ϑ. Since Eq. 14 is log-concave in log a_(m) [43], we can apply adaptive rejection sampling (ARS) [45]. Let M_new be the number of new variables a_(m) that are greater than the slice variable. If M_new > 0, then we expand the representation of the matrices S and X by adding M_new zero columns, and we sample the corresponding per-chain global variables (i.e., h_ℓm and b_m) from the prior, given in Eqs. 11 and 5, respectively.

An inactive chain is a chain in which all elements s_tm = 0.
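Step 1 can be sketched in code. The snippet below is an illustration only: it draws the slice variable of Eq. 13 and then extends the stick-breaking representation by sampling from Eq. 14 on a discrete grid. This grid-based sampler is a simple stand-in for the adaptive rejection sampler used in the paper, and the grid resolution, the lower cutoff, and the function name are our own choices:

```python
import numpy as np

def sample_new_sticks(a_min, alpha, T, rng, grid_size=1000):
    """Sketch of Step 1: draw the slice variable (Eq. 13) and sample new
    values a_(m) from Eq. 14 until one of them falls below the slice."""
    slice_var = rng.uniform(0.0, a_min)               # Eq. 13
    new_sticks = []
    a_prev = a_min
    while True:
        # Unnormalized log-density of Eq. 14 on a grid in (0, a_prev]
        # (grid-based inverse-CDF sampling stands in for ARS).
        grid = np.linspace(slice_var / 10 + 1e-12, a_prev, grid_size)
        t = np.arange(1, T + 1)
        log_p = (alpha * ((1.0 - grid[:, None]) ** t / t).sum(axis=1)
                 + (alpha - 1.0) * np.log(grid)
                 + T * np.log1p(-grid))
        p = np.exp(log_p - log_p.max())
        a_new = rng.choice(grid, p=p / p.sum())
        if a_new < slice_var:
            break                  # below the slice: stop adding chains
        new_sticks.append(a_new)   # this a_(m) instantiates a new chain
        a_prev = a_new
    return slice_var, new_sticks

rng = np.random.default_rng(2)
slice_var, new_sticks = sample_new_sticks(a_min=0.5, alpha=1.0, T=20, rng=rng)
```

The number of accepted sticks plays the role of M_new; for each of them, the remaining per-chain variables (h_ℓm and b_m) would then be drawn from their priors.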
Step 2 consists in sampling the elements of the matrices S and X given the current value of M_‡ and the global variables. In this step, several approaches can be taken. A naïve Gibbs sampling algorithm that sequentially samples each element x_tm (jointly with s_tm) is simple and computationally efficient, but it presents poor mixing properties due to the strong couplings between successive time steps [34], [46]. An alternative to Gibbs sampling typically applied in factorial HMMs is blocked sampling, which sequentially samples each parallel chain, conditioned on the current value of the remaining ones. This approach requires a forward-filtering backward-sampling (FFBS) sweep in each of the chains, which yields a runtime complexity of O(T M_‡ |X|^{L+1}) for our IFFSM model. The exponential dependency on L makes this method computationally intractable. In order to address this limitation of the FFBS approach, we propose to jointly sample the matrices S and X using PGAS, an algorithm recently developed for inference in state-space models and non-Markovian latent variable models [31]. The runtime complexity of this algorithm scales as O(P T M_‡ L²), where P is the number of particles used for the PGAS kernel. Details on the PGAS approach are given in Section IV-A.
After running PGAS, we remove those chains that remain inactive in the whole observation period. This implies removing some columns of S and X as well as the corresponding variables h_ℓm, a_m and b_m, and updating M_+.
In Step 3, we sample the global variables in the model from their complete conditional distributions. The complete conditional distribution over the transition probabilities a_m under the semi-ordered stick-breaking construction [43] is

a_m | S ∼ Beta(n_{01}^m, n_{00}^m),   (15)

being n_{ij}^m the number of transitions from state i to state j in the m-th column of S. For the self-transition probabilities of the active state b_m, we have

b_m | S ∼ Beta(β_1 + n_{11}^m, β_2 + n_{10}^m).   (16)

The complete conditional distributions over the emission parameters h_ℓm for all chains m = 1, . . . , M_+ and for all taps ℓ = 1, . . . , L are given by complex Gaussians of the form

h^(d) | Y, X, {σ_ℓ²} ∼ CN( μ_POST^(d), Γ_POST, 0 ),   (17)

for d = 1, . . . , D, where Y is the T × D matrix containing all vectors y_t. Here, we have defined for notational simplicity h^(d) as the vector that contains the d-th component of the vectors h_ℓm for all m and ℓ, as given by

h^(d) = [ (h_11)_d, . . . , (h_L1)_d, . . . , (h_1M_+)_d, . . . , (h_LM_+)_d ]^⊤.   (18)

The complete conditional is the conditional distribution of a hidden variable, given the observations and the rest of hidden variables.
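Sampling the transition probabilities in Step 3 reduces to counting transitions in each column of S and drawing from the Beta conditionals above. A minimal sketch follows (the function names are ours; the counting convention includes the transition from the inactive state at t = 0, as assumed in the model):

```python
import numpy as np

def transition_counts(s):
    """n[i, j]: number of i -> j transitions in one chain's activity
    sequence, including the transition out of the inactive state at t = 0."""
    s_ext = np.concatenate(([0], s))       # all chains start inactive
    n = np.zeros((2, 2), dtype=int)
    for prev, curr in zip(s_ext[:-1], s_ext[1:]):
        n[prev, curr] += 1
    return n

def sample_b(s, beta1, beta2, rng):
    """Draw b_m from its complete conditional (Eq. 16)."""
    n = transition_counts(s)
    return rng.beta(beta1 + n[1, 1], beta2 + n[1, 0])

rng = np.random.default_rng(3)
n = transition_counts(np.array([1, 1, 0, 1]))   # extended chain: 0,1,1,0,1
b = sample_b(np.array([1, 1, 0, 1]), beta1=1.0, beta2=1.0, rng=rng)
```

Sampling a_m from Eq. 15 is analogous, using the counts n[0, 1] and n[0, 0].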
We can write the parameters μ_POST^(d) and Γ_POST as follows. We first define the extended matrix X_ext = [X^(1), ..., X^(M+)] of size T × LM+, with

X^(m) = [ x_{1m}       0            ···   0
          x_{2m}       x_{1m}       ···   0
            ⋮            ⋮           ⋱     ⋮
          x_{Tm}    x_{(T−1)m}      ···   x_{(T−L+1)m} ],   (19)

Σ as the L × L diagonal matrix containing all variances σ_ℓ^2, and y^(d) as the T-dimensional vector containing the d-th element of each observation y_t. Then, the posterior parameters in Eq. 17 are given by

Γ_POST = ( (I_{M+} ⊗ Σ)^{−1} + (1/σ_y^2) (X_ext)^H X_ext )^{−1}   (20)

and

μ_POST^(d) = (1/σ_y^2) Γ_POST (X_ext)^H y^(d),   (21)

being (·)^H the conjugate transpose, '⊗' the Kronecker product, and I_{M+} the identity matrix of size M+.

Regarding the complete conditionals of the variances σ_ℓ^2, they are given by inverse-gamma distributions of the form

p(σ_ℓ^2 | {h_ℓm}_{m=1}^{M+}) ∝ (1/σ_ℓ^2)^{τ + DM+ + 1} exp{ −(ν_ℓ + Σ_{m=1}^{M+} ||h_ℓm||^2) / σ_ℓ^2 },   (22)

being ||h_ℓm||^2 the squared L2-norm of the vector h_ℓm.

A. Particle Gibbs with Ancestor Sampling
We rely on PGAS [31] for Step 2 of our inference algorithm, in order to obtain a sample of the matrices S and X (which at this stage of the inference algorithm are truncated to M‡ columns). PGAS is a method within the framework of PMCMC [32], which is a systematic way of combining sequential Monte Carlo (SMC) and MCMC to take advantage of the strengths of both techniques.

In PGAS, a Markov kernel is constructed by running an SMC sampler in which one particle is set deterministically to a reference input particle. This reference particle corresponds to the output of the previous PGAS iteration (extended to account for the M_new new FSMs). After a complete run of the SMC algorithm, a new reference trajectory is obtained by selecting one of the particle trajectories with probabilities given by their importance weights. In this way, the resulting Markov kernel leaves its target distribution invariant, regardless of the number of particles used. In contrast to other particle Gibbs with backward simulation methods [47], [48], PGAS can also be applied to non-Markovian latent variable models, i.e., models that are not expressed in state-space form [31], [49]. In this section, we briefly describe the PGAS algorithm and provide the necessary equations for its implementation.

Fig. 5. Example of the connection of particles in PGAS. We represent P = 3 particles x_τ^i for τ ∈ {t−1, t, t+1}. The index a_τ^i denotes the ancestor particle of x_τ^i. It can be seen that, e.g., two of the trajectories x_{1:t+1}^i only differ at time instant t + 1.

In PGAS, we assume a set of P particles for each time instant t, each representing the hidden states {x_tm}_{m=1}^{M‡} (hence, they also represent {s_tm}_{m=1}^{M‡}). We denote the state of the i-th particle at time t by the vector x_t^i of length M‡.
Similarly, the input reference particle for each time instant is denoted by x'_t. We also introduce the ancestor indexes a_t^i ∈ {1, ..., P} in order to denote the particle that precedes the i-th particle at time t. That is, a_t^i corresponds to the index of the ancestor particle of x_t^i. Let also x_{1:t}^i be the ancestral path of particle x_t^i, i.e., the particle trajectory that is recursively defined as

x_{1:t}^i = (x_{1:t−1}^{a_t^i}, x_t^i).   (23)

Figure 5 shows an example to clarify the notation.

The machinery inside the PGAS algorithm resembles an ordinary particle filter, with two main differences: one of the particles is deterministically set to the reference input sample, and the ancestor of each particle is randomly chosen and stored during the algorithm execution. Algorithm 1 summarizes the procedure. For each time instant t, we first generate the ancestor indexes for the first P − 1 particles according to the importance weights w_{t−1}^i. Given these ancestors, the particles are then propagated across time according to a distribution r_t(x_t | x_{1:t−1}^{a_t}). For simplicity, and dropping the global variables from the notation for conciseness, we assume that

r_t(x_t | x_{1:t−1}^{a_t}) = p(x_t | x_{t−1}^{a_t}) = Π_{m=1}^{M‡} p(x_tm | s_tm) p(s_tm | s_{(t−1)m}^{a_t}),   (24)

i.e., particles are propagated as in Figure 3 using a simple bootstrap proposal kernel, p(x_tm, s_tm | s_{(t−1)m}, x_{(t−1)m}). The P-th particle is instead deterministically set to the reference particle, x_t^P = x'_t, whereas the ancestor indexes a_t^P are sampled according to some weights w̃_{t−1|T}^i. Indeed, this is a crucial step that vastly improves the mixing properties of the MCMC kernel.

We now focus on the computation of the importance weights w_t^i and the ancestor weights w̃_{t−1|T}^i.
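For concreteness, one sweep of this procedure can be sketched for a generic memoryless (L = 1) state-space model with a bootstrap proposal, in which case the ancestor weights reduce to w̃ ∝ w · p(x'_t | x_{t−1}) (a minimal sketch with our own function names; the model callbacks are illustrative, not part of the paper):

```python
import numpy as np

def pgas_sweep(x_ref, log_f, log_g, sample_x1, sample_f, P, rng):
    """One PGAS Markov-kernel sweep (cf. Algorithm 1) for a memoryless
    state-space model with a bootstrap proposal.

    x_ref     : (T,) reference trajectory from the previous iteration
    log_f     : log_f(x_t, x_prev) = log p(x_t | x_prev), vectorized over x_prev
    log_g     : log_g(t, x_t) = log p(y_t | x_t), with the data held fixed
    sample_x1 : draws n samples from the prior p(x_1)
    sample_f  : propagates an array of particles through p(x_t | x_{t-1})
    """
    T = len(x_ref)
    X = np.zeros((T, P))             # particle states x_t^i
    A = np.zeros((T, P), dtype=int)  # ancestor indexes a_t^i
    X[0, :P - 1] = sample_x1(P - 1, rng)
    X[0, P - 1] = x_ref[0]           # deterministic reference particle
    logw = log_g(0, X[0])
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        A[t, :P - 1] = rng.choice(P, size=P - 1, p=w)          # resampling
        logwa = np.log(np.maximum(w, 1e-300)) + log_f(x_ref[t], X[t - 1])
        wa = np.exp(logwa - logwa.max()); wa /= wa.sum()
        A[t, P - 1] = rng.choice(P, p=wa)                      # ancestor sampling
        X[t, :P - 1] = sample_f(X[t - 1, A[t, :P - 1]], rng)   # propagation
        X[t, P - 1] = x_ref[t]
        logw = log_g(t, X[t])                                  # importance weights
    w = np.exp(logw - logw.max()); w /= w.sum()
    k = rng.choice(P, p=w)
    out = np.empty(T)                # trace back the ancestral path of particle k
    for t in range(T - 1, -1, -1):
        out[t] = X[t, k]
        if t > 0:
            k = A[t, k]
    return out
```

The returned trajectory becomes the reference particle of the next iteration, which is how the kernel is composed across MCMC iterations.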
For the former, the particles are weighted according to

w_t^i = p(x_{1:t} | y_{1:t}) / [ p(x_{1:t−1} | y_{1:t−1}) r_t(x_t | x_{1:t−1}) ]
      ∝ [ p(y_{1:t} | x_{1:t}) p(x_{1:t}) ] / [ p(y_{1:t−1} | x_{1:t−1}) p(x_{1:t−1}) p(x_t | x_{1:t−1}) ]
      = p(y_t | x_{t−L+1:t}),   (25)

Algorithm 1
Particle Gibbs with ancestor sampling
Input: Reference particle x'_t for t = 1, ..., T, and global variables
Output: Sample x^out_{1:T} from the PGAS Markov kernel
  Draw x_1^i ~ r_1(x_1) for i = 1, ..., P − 1 (Eq. 24)
  Set x_1^P = x'_1
  Compute the weights w_1^i for i = 1, ..., P (Eq. 25)
  for t = 2, ..., T do
    Draw a_t^i ~ Categorical(w_{t−1}^1, ..., w_{t−1}^P) for i = 1, ..., P − 1
    Compute w̃_{t−1|T}^i for i = 1, ..., P (Eq. 26)
    Draw a_t^P ~ Categorical(w̃_{t−1|T}^1, ..., w̃_{t−1|T}^P)
    Draw x_t^i ~ r_t(x_t | x_{1:t−1}^{a_t^i}) for i = 1, ..., P − 1 (Eq. 24)
    Set x_t^P = x'_t
    Set x_{1:t}^i = (x_{1:t−1}^{a_t^i}, x_t^i) for i = 1, ..., P (Eq. 23)
    Compute the weights w_t^i for i = 1, ..., P (Eq. 25)
  end for
  Draw k ~ Categorical(w_T^1, ..., w_T^P)
  return x^out_{1:T} = x_{1:T}^k

being y_{τ1:τ2} the set of observations {y_t}_{t=τ1}^{τ2}. We have applied (24) to derive this expression. Eq. 25 implies that, in order to obtain the importance weights, it suffices to evaluate the likelihood at time t.

The weights w̃_{t−1|T}^i used to draw a random ancestor for the reference particle are given by

w̃_{t−1|T}^i = w_{t−1}^i p(x_{1:t−1}^i, x'_{t:T} | y_{1:T}) / p(x_{1:t−1}^i | y_{1:t−1})
            ∝ w_{t−1}^i [ p(y_{1:T} | x_{1:t−1}^i, x'_{t:T}) p(x_{1:t−1}^i, x'_{t:T}) ] / [ p(y_{1:t−1} | x_{1:t−1}^i) p(x_{1:t−1}^i) ]
            ∝ w_{t−1}^i p(x'_t | x_{t−1}^i) Π_{τ=t}^{t+L−2} p(y_τ | x_{1:t−1}^i, x'_{t:T}).   (26)

In order to obtain this expression, we have made use of the Markov property of the model, and we have also ignored factors that do not depend on the particle index i. Note that the transition probability p(x_t | x_{t−1}) factorizes across the parallel chains of the factorial model, as given in (24). We also note that, for memoryless models (i.e., L = 1), Eq. 26 can be simplified, since the product in the last term is not present and, therefore, w̃_{t−1|T}^i ∝ w_{t−1}^i p(x'_t | x_{t−1}^i).

B. Computational Complexity
For any channel length L, the resulting complexity of each iteration of the algorithm scales as O(P T M‡ L). This is because the most expensive step is the computation of the weights w̃_{t−1|T}^i in (26) for i = 1, ..., P, which has complexity scaling as O(P M‡ L). This computation needs to be performed for each time instant t = 1, ..., T, hence the resulting overall complexity.

V. EXPERIMENTS
In this section, we apply the IFFSM model described in Section III to the problem of user activity detection and blind channel
estimation. We use two different channel models, namely, a standard multipath Rayleigh fading channel model in Section V-A and a ray-tracing model in Section V-B.

Fig. 6. Results for different SNRs (L = 1): (a) inferred M+; (b) recovered transmitters; (c) ADER; (d) SER; (e) MSE.

A. Multipath Rayleigh Fading
In LTE, when the user equipment (UE) wants to transmit information, it needs to ask for resources in the random access channel (RACH) [50]. This is the first of four messages between the enhanced Node B (eNB), i.e., the network access point, and the UE before the actual transmission of information starts. The UE selects one of random sequences with symbols and uses subcarriers ( . MHz) to transmit the sequence in . ms in a -ms sub-frame ( . ms are used for cyclic prefix and time guard band). In the typical configuration, there is a RACH subframe every -ms frame. If two UEs transmit the same preamble sequence at the same physical RACH, one of them (or both) would not get a response from the network and it (they) would need to resend the RACH preamble, although in most cases this channel goes unused.

In our simulations, we assume that instead of a RACH preamble asking for resources, our IoT devices send a -symbol sequence with the payload that they want to transmit for . ms, and the remaining . ms are reserved for time guard band. The timing of the IoT devices might not be as accurate as in typical cellphones, and hence we allow for a much larger guard band. We assume that the devices can start their transmission at any time in the first half of the physical RACH, and we also assume active users in each physical RACH channel interval.

Fig. 7. Results for different number of transmitters (L = 1): (a) inferred M+; (b) recovered transmitters; (c) ADER; (d) SER; (e) MSE.

We consider a Rayleigh AWGN channel, i.e., the channel coefficients and the noise are circularly-symmetric complex Gaussian distributed with zero mean, where the covariance matrices are σ_ℓ^2 I and σ_y^2 I, respectively. We simulate a base scenario with D = 20 receiving antennas, quadrature phase-shift keying (QPSK) modulation (the constellation is normalized to yield unit energy), and σ_y^2 = 2.
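As a reference, the observation model of this base scenario can be simulated as follows (a minimal sketch; here every user is active over the whole window, whereas the paper draws random activation times, and the per-tap channel variances are set to one for illustration):

```python
import numpy as np

def simulate_rayleigh_channel(T=200, M=3, L=1, D=20, sigma2_y=2.0, rng=None):
    """Simulate M users sending unit-energy QPSK symbols through an L-tap
    circularly-symmetric complex Gaussian channel to D receive antennas."""
    rng = np.random.default_rng() if rng is None else rng
    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    X = qpsk[rng.integers(4, size=(T, M))]                     # T x M symbols
    H = (rng.normal(size=(L, M, D)) +                          # unit-variance taps
         1j * rng.normal(size=(L, M, D))) / np.sqrt(2)
    Y = np.zeros((T, D), dtype=complex)
    for ell in range(L):                                       # tap-ell contribution
        Y[ell:] += X[:T - ell] @ H[ell]
    noise = (rng.normal(size=(T, D)) +
             1j * rng.normal(size=(T, D))) * np.sqrt(sigma2_y / 2)
    return X, H, Y + noise
```

The inner loop over taps makes explicit how the channel memory couples L consecutive symbols into each observation, which is the coupling the PGAS kernel has to handle.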
Using this base configuration, we vary one of the parameters while holding the rest fixed. We set the hyperparameters of the IFFSM as σ_H^2 = 1, λ = 0., κ = 1, α = 1, β^1 = 2 and β^0 = 0.. The choice of β^1 and β^0 is based on the fact that we expect the active Markov chains to remain active and, therefore, the transition probabilities from active to active, b_m, which are Beta(β^1, β^0) distributed, are a priori expected to be of the order of one over a few hundred bits. In order to avoid getting trapped in local modes of the posterior, we use a tempering procedure. We first add artificial noise so that the resulting σ_y^2 = 10., and after each iteration we reduce this noise by a factor of ., until there is no artificial noise left. After that, we run additional iterations to favor exploitation.
Evaluation.
For the recovered transmitters, we evaluate the performance in terms of the activity detection error rate (ADER), the symbol error rate (SER) and the mean squared error (MSE) of the channel coefficient estimates. The ADER is the probability of detecting activity (inactivity) in a transmitter while that transmitter is actually inactive (active). When computing the SER, an error is counted at time t whenever the estimated symbol for a transmitter differs from the actual transmitted symbol, considering that the transmitted symbol while inactive is x_tm = 0. The MSE for each transmitter is

MSE_m = (1 / LD) Σ_{d,ℓ} | (h_ℓm)_d − (ĥ_ℓm)_d |^2,   (27)

being ĥ_ℓm the inferred channel coefficients.

We compare our approach (denoted by IFFSM in the plots) with three genie-aided methods which have perfect knowledge of the true number of transmitters and channel coefficients. In particular, we run: (i) the PGAS algorithm that we use in Step 2 of our inference algorithm (G-PGAS); (ii) the FFBS algorithm over the equivalent factorial HMM with state space cardinality |A ∪ {0}|^L (G-FFBS); and (iii) the optimum BCJR algorithm [51] over an equivalent single HMM with a number of states equal to |A ∪ {0}|^{L N_t}, being N_t the true number of transmitters (G-BCJR). Due to their complexity, we only run the BCJR algorithm in scenarios with |A ∪ {0}|^{L N_t} ≤ , and the FFBS in scenarios with |A ∪ {0}|^L ≤ .

For each considered scenario, we run independent simulations, each with different simulated data. We run iterations of our inference algorithm, and obtain the inferred symbols x̂_tm as the component-wise maximum a posteriori (MAP) solution over the last samples. The estimates of the channel coefficients ĥ_ℓm are then obtained as the MAP solution, conditioned on the data and the inferred symbols x̂_tm.
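The three evaluation metrics can be computed directly from their definitions above (a minimal sketch with our own array conventions; channel arrays have shape (L, M, D)):

```python
import numpy as np

def ader(active_true, active_est):
    """Activity detection error rate: fraction of (t, m) entries whose
    inferred activity disagrees with the true activity."""
    return float(np.mean(active_true != active_est))

def ser(x_true, x_est):
    """Symbol error rate, with x_tm = 0 counting as the 'inactive' symbol."""
    return float(np.mean(x_true != x_est))

def mse_per_transmitter(H_true, H_est):
    """Eq. (27): per-transmitter MSE of the channel estimates."""
    L, M, D = H_true.shape
    return np.sum(np.abs(H_true - H_est) ** 2, axis=(0, 2)) / (L * D)
```

Counting the zero symbol as a regular constellation point makes the SER penalize both wrong symbols and wrong activity decisions, consistently with the definition above.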
For the BCJR algorithm, we obtain the symbol estimatesaccording to the component-wise MAP solution for eachtransmitter m and each instant t . For the genie-aided PGASand FFBS methods, we follow a similar approach by runningthe algorithms for iterations and considering the last samples to obtain the symbol estimates. Unless otherwisespecified, we use P = 3000 particles for PGAS. Results with perfect knowledge of the channel memory.
We first evaluate the performance of our model and inference procedure assuming that the memory of the channel is accurately known. We initially consider memoryless channels, i.e., with memory length L = 1. Figure 6 shows the results when the noise variance varies from σ_y^2 = 10. to σ_y^2 = 1 (SNR from 1 dB to 13 dB). Specifically, we first show a box-plot representation of the inferred number of transmitters M+. (Our procedure is fully blind, so it might return spurious transmitters or it might mix two or more transmitters in a single chain. We discard those chains in our evaluations. For the genie-aided methods, we use a_m = 0. and b_m = 0..) We obtain the per-user SNR (in dB) as
10 log10(DL / σ_y^2).

In the box plots, we depict the 25th, 50th and 75th percentiles in the standard format, as well as the most extreme values. Moreover, the mean value is represented with a pink circle, and the true number of transmitters is represented with a green star.

Fig. 8. Results for different number of receiving antennas (L = 1): (a) inferred M+; (b) recovered transmitters; (c) ADER; (d) SER; (e) MSE.

We also show a box-plot representation of the number of recovered transmitters (i.e., how many of the true transmitters we are able to recover). For the recovered transmitters, we additionally show the ADER, the SER, and the MSE. As expected, the performance improves with the SNR. For low values of the SNR, transmitters are more likely to be masked by the noise and, therefore, on average we recover a slightly lower number of transmitters. We also observe that the performance (in terms of ADER and SER) of the proposed IFFSM is very similar to that of the genie-aided methods, which have perfect knowledge of the number of transmitters and channel coefficients.

Figure 7 shows the results when the true number of transmitters ranges from 2 to 6. Although the number of parameters to be estimated grows with the number of transmitters, we observe that the performance is approximately constant. The IFFSM recovers all the transmitters in nearly all the simulations, with performance similar to the genie-aided methods.

Figure 8 shows the results when the number of receiving antennas varies from 2 to 30. In this figure, we observe that we need at least 6 receivers in order to properly recover the messages of all the transmitters. We see no further improvement in detectability with more than 8 receivers. As expected, the performance in terms of ADER and SER improves when the number of receiving antennas increases, as the diversity in
the observations helps to recover the transmitted symbols and channel coefficients. This behavior is similar to that obtained by the genie-aided PGAS and FFBS, as shown in this figure. Note that the MSE curve flattens after approximately receivers, as it reaches the threshold imposed by the noise level.

Fig. 9. Results for different SNRs (L = 5): (a) inferred M+; (b) recovered transmitters; (c) ADER; (d) SER; (e) MSE.

We now evaluate the performance of our model and inference procedure for frequency-selective channels, i.e., considering L > 1. Figure 9 shows the results when the noise variance varies from σ_y^2 = 10. to σ_y^2 = 10. (SNR from 2 dB to 11 dB), considering L = 5 to generate the data. We use the true value of the channel length L for inference. In the figure, we show the ADER, the SER, the MSE, a box-plot representation of the inferred number of transmitters M+, and also a box-plot representation of the number of recovered transmitters. As in the memoryless case, the performance improves with the SNR. However, the values of the resulting SER and ADER are much lower than in the memoryless case for a fixed value of the noise variance. For instance, the SER in Figure 6 for σ_y^2 = 10. (SNR = dB) is one order of magnitude above the reported SER in Figure 9 for the same value of the noise variance (SNR = dB). This is a sensible result, because the channel memory adds more redundancy to the observed sequence, and therefore the resulting SNR is higher for a fixed value of the noise variance. Our inference algorithm is able to exploit such

Fig. 10. Results for different values of L: (a) inferred M+; (b) recovered transmitters; (c) ADER; (d) SER; (e) MSE.
redundancy to better estimate the transmitted symbols, despite the fact that more channel coefficients need to be estimated. Note that the performance in terms of ADER and SER is similar to that of the genie-aided PGAS-based method.

In Figure 10, we show the obtained results for different values of the parameter L, ranging from to . We use the true value of the channel length L for inference, and we consider σ_y^2 = 10. in these experiments. The figure shows the ADER, the SER, the MSE, and box-plot representations of the inferred number of transmitters M+ and the number of recovered transmitters. Here, it becomes clear that our model can exploit the redundancy introduced by the channel memory, as the performance in terms of SER and ADER improves as L increases. The MSE also improves with L, although it reaches a constant value for L > , similarly to the experiments in which we increase the number of receivers (although different, in both cases we add redundancy to the observations). We can also observe that the performance is similar to that of the genie-aided methods (we do not run the FFBS algorithm for L = 5 due to its computational complexity).

Finally, we run an experiment with a different constellation order. Specifically, we use a 1024-QAM constellation, normalized to yield unit energy, and we vary the noise variance from σ_y^2 = 10. to σ_y^2 = 10. (SNR from 28 dB to 37 dB).
Fig. 11. Results for different SNRs using a 1024-QAM constellation: (a) inferred M+; (b) recovered transmitters; (c) ADER; (d) SER; (e) MSE.
In order to improve the mixing of the sampling procedure, and due to the high cardinality of A, after running iterations of the inference algorithm, we run additional iterations in which we sequentially sample each hidden chain conditioned on the current value of the remaining ones, similarly to the standard FFBS algorithm for factorial models. We proceed similarly for the genie-aided PGAS method. We do not run the genie-aided FFBS or BCJR methods due to their high computational complexity in this scenario. We plot in Figure 11 the ADER, SER, and MSE, as well as a box-plot representation of the inferred number of transmitters M+ and the number of recovered transmitters. We observe that, for the considered values of the SNR, we can recover the five transmitters in nearly all the cases. Furthermore, the SER curve of our approach closely follows the curve of the genie-aided method with perfect knowledge of the channel coefficients and the number of transmitters.

Results for mismatched channel length.
In the experiments above, we have assumed that the true value of the channel length is known at the receiver side. As this scenario might not be realistic, we now run an experiment to show that we can properly estimate the transmitted symbols and the channel coefficients as long as our inference algorithm considers a sufficiently large value of L. For this purpose, we use our base experimental setup and generate data using L = 1 (i.e., a memoryless channel). However, we use different values of the channel length L for inference.

Fig. 12. Results for different channel lengths (L_true = 1): (a) inferred M+; (b) recovered transmitters; (c) ADER; (d) SER; (e) MSE.

In Figure 12, we show the obtained results for L ranging from to . The obtained ADER and SER do not significantly degrade with increasing values of L, and we are able to recover the five transmitters in nearly all the cases. Interestingly, the MSE improves as L increases. This is a consequence of the way we measure it when L is larger than the ground truth, as we compare our channel estimates with zero. The fact that the MSE becomes lower indicates that we obtain better estimates for the zero coefficients than for the non-zero ones, which in turn implies that our inference algorithm can properly decrease the channel variances σ_ℓ^2 when needed.

Sensitivity to the number of particles.
Finally, we evaluate the impact of the number of particles in the PGAS kernel on the performance of the proposed algorithm. Note that, as the effective dimensionality of the hidden space increases, we should expect a larger number of particles to be required in order to properly estimate the transmitted symbols. To see this, we design an experiment with transmitters and σ_y^2 = 10.. Figure 13 shows the log-likelihood trace plot for iterations of the inference algorithm, with a number of particles ranging from 300 to 30000. This experiment is based
(Curves shown for 300, 1000, 3000, 10000, and 30000 particles.)
Fig. 13. Log-likelihood for varying number of particles (L = 1). The initial slope is due to the tempering procedure, in which we linearly increase the SNR at each iteration.

on a single run of the algorithm, as it is enough to understand its behavior. It can be seen that the best performance is achieved with the largest number of considered particles. Additionally, this plot suggests that P = 10000 particles are enough for this scenario.

We also show in Figure 14 the number of inferred transmitters M+, as well as the number of recovered transmitters, for each value of P. In this figure, we represent with a green star the true number of transmitters (again, these results are obtained after a single run of the algorithm). Although we infer M+ = 10 transmitters with only P = 3000 particles, Figure 14b shows that we only recover of them. The other two transmitters have been mixed in the last two inferred chains. In agreement with Figure 13, increasing the number of particles from P = 10000 to 30000 does not seem to improve performance: in both cases our algorithm is able to recover all the transmitters. Furthermore, even the genie-aided PGAS algorithm, which has perfect knowledge of the channel coefficients, needs a large value of P (above ) in order to recover all the transmitters.

We can conclude from these plots that we should adjust the number of particles based on the number of transmitters. However, the number of transmitters is an unknown quantity that we need to infer. There are two ways to overcome this apparent limitation. A sensible solution is to adaptively select the number of particles P as a function of the current number of active transmitters, M+. In other words, as we gather evidence for the presence of more transmitters, we consequently increase P.
A second approach, which is computationally less demanding but may present poorer mixing properties, consists in running the PGAS inference algorithm sequentially over each chain, conditioned on the current value of the remaining transmitters, similarly to the standard FFBS procedure for the IFHMM [34]. Alternatively, we can apply the PGAS algorithm over fixed-size blocks of randomly chosen transmitters.

B. Ray-tracing Model
With the aim of considering a more realistic communication scenario, we use the WISE software [52] to design an indoor wireless system. This software tool, developed at Bell Laboratories, includes a 3D ray-tracing propagation model, as well
Fig. 14. Number of inferred and recovered transmitters for varying number of particles (L = 1): (a) inferred M+; (b) recovered transmitters. The green star indicates the ground truth.

as algorithms for computational geometry and optimization, to calculate measures of radio-signal performance in user-specified regions. Its predictions have been validated with physical measurements.

Using the WISE software and the map of an office located at Bell Labs Crawford Hill, we place receivers and transmitters across the office, intentionally placing the transmitters close together in order to ensure that interference occurs at the nearby receivers. Figure 15 shows the considered map. We consider a Wi-Fi transmission system with a bandwidth of MHz or, equivalently, ns per channel tap. We simulate the transmission of -symbol bursts over this communication system, using a QPSK constellation normalized to yield unit energy. We scale the channel coefficients by a factor of , and we consequently scale the noise variance by , yielding σ_y^2 ≈ . × 10^−. We set the transmission power to dBm. Each transmitter becomes active at a random point, uniformly sampled in the interval [1, T/ ]. We set T = 2000 time instants, so the burst duration of symbols ensures overlapping among all the transmitted signals.

Wi-Fi systems are not limited by the noise level, which is typically small enough, but by the users' interference, which can be avoided by using a particular frequency channel for each user. Our goal is to show that cooperation of receivers in

Fig. 15. Plan of the considered office building. Circles represent receivers, and crosses represent transmitters. All transmitters and receivers are placed at a height of metres.
a Wi-Fi communication system can help recover the symbols transmitted by several users even when they simultaneously transmit over the same frequency channel, therefore allowing for a larger number of users in the system.

In our experiments, we vary L from to . Five channel taps correspond to the radio signal travelling a distance of 750 m, which should be enough given the dimensions of this office space (the signal suffers attenuation when it reflects off the walls, so we should expect it to be negligible in comparison to the line-of-sight ray after a 750-m travelling distance). Following the tempering procedure described above, we initialize the algorithm with σ_y^2 ≈ . and we linearly increase the SNR for around iterations, running additional iterations afterwards. We compare our IFFSM with a non-binary IFHMM model with state space cardinality |A ∪ {0}|^L, using FFBS sweeps for inference (we do not run the FFBS algorithm for L = 5 due to its computational complexity). In this case, we set the hyperparameters as σ_H^2 = 0., λ = 0., κ = 1, α = 1, β^1 = 2 and β^0 = 0..

We show in Table Ia the number of transmitters recovered after running the two inference algorithms, together with the inferred value of M+, averaged over the last iterations. We see that the IFFSM recovers the transmitters in all cases, and it does not tend to create spurious chains. In contrast, the IFHMM significantly overestimates the number of transmitters, which deteriorates the overall symbol estimates and, as a consequence, not all the transmitted symbols are recovered. In the best case, the IFHMM only recovers out of the transmitters.
In the extreme case of L = 4, the inference algorithm for the IFHMM estimates chains, but none of them corresponds to a true transmitter; rather, each transmitter is explained by a combination of several chains. For completeness, we additionally report in Table Ib the MSE of the channel coefficients, averaged over the last iterations. We sort the transmitters so that the MSE is minimized, and ignore the extra inferred transmitters. As expected, the MSE decreases as we consider a larger value of L, since the model better fits the true radio propagation model.

VI. CONCLUSIONS AND DISCUSSION
In this paper, we have proposed a fully blind approach forjoint channel estimation and detection of the transmitted datawhen the number of transmitters is unknown. Our approachis based on a BNP model, which we refer to as IFFSM,
TABLE I. Results for the Wi-Fi experiment for the IFFSM and the IFHMM, for each value of L: (a) inferred M+; (b) MSE of the channel coefficients.

and considers a potentially unbounded number of stochastic finite-memory FSMs that evolve independently over time. Our model and the PGAS-based inference algorithm allow us to readily account for frequency-selective channels, avoiding the exponential complexity with respect to the channel length of previous approaches. We have evaluated the performance of our approach through a comprehensive experimental design using both synthetic and real data.

These results are promising for the suitability of BNP models applied to signal processing for communications. As a result, this paper opens several further research lines in this area. Some examples of such possible future research lines are:

• Improving the scalability of the inference algorithm. This would allow us to account for both a larger number of transmitters and larger observation sequences at a sub-linear computational cost. We believe that further investigations on faster approximate inference algorithms would be of great interest and importance for the development of practical receivers that operate in real time.

• An online inference algorithm. In many cases, the receiver does not have access to a fixed window of observations, but data arrives instead as a never-ending stream. Thus, an online inference algorithm instead of a batch one is also of potential interest.

• Modelling of time-varying channels. Our current approach is restricted to static channels, i.e., the channel coefficients do not vary over time. A potentially useful research line may consist in taking into account the temporal evolution of the channel coefficients.

• An extension of the model that accounts for coding schemes. A channel coding scheme can be used in order to add redundancy to the user's data, effectively decreasing the resulting bit error probability. This redundancy can potentially be included in the model and exploited by the inference algorithm.
• An extension for sparse code multiple access (SCMA). Our approach can also be adapted to the particularities of uplink SCMA techniques [53], [54], which use non-orthogonal transmission signals in the random access channel to reduce latency. The signal structure and the sparsity of the code/pilot scheme may simplify the inference procedure of the Bayesian model, at the cost of an upper bound on the number of potential users in the network. This would result in a model that does not need to be nonparametric, but our joint approach may still reduce the errors of the current several-stage detectors.

An extension of our IFFSM, less related to the aforementioned research lines, consists in a semi-Markov approach. In this way, transmitters are assumed to send only a burst of symbols during the observation period, and a changepoint detection algorithm may detect the (de)activation instants.
CKNOWLEDGMENTS
Francisco J. R. Ruiz acknowledges the European Union H2020 program (Marie Skłodowska-Curie grant agreement 706760).
REFERENCES

[1] "Small cell market status," Informa Telecoms and Media, Issue 4, Tech. Rep., Dec. 2012.
[2] "M2M white paper: the growth of device connectivity," The FocalPoint Group, Tech. Rep., 2003. [Online]. Available: goo.gl/S0j5O
[3] "Global machine to machine communication," Vodafone, Tech. Rep., 2010. [Online]. Available: goo.gl/4V8az
[4] "Device connectivity unlocks value," Ericsson, Tech. Rep., Jan. 2011. [Online]. Available: goo.gl/alou6
[5] S.-Y. Lien, K.-C. Chen, and Y. Lin, "Toward ubiquitous massive accesses in 3GPP machine-to-machine communications," IEEE Communications Magazine, vol. 49, no. 4, pp. 66–74, 2011.
[6] H. S. Dhillon, H. Huang, H. Viswanathan, and R. A. Valenzuela, "Fundamentals of throughput maximization with random arrivals for M2M communications," IEEE Transactions on Communications, vol. 62, no. 11, pp. 4094–4109, Nov. 2014.
[7] Y. Chen and W. Wang, "Machine-to-machine communication in LTE-A," in , Sep. 2010.
[8] H. S. Dhillon, H. C. Huang, H. Viswanathan, and R. A. Valenzuela, "Power-efficient system design for cellular-based machine-to-machine communications," CoRR, vol. abs/1301.0859, 2013. [Online]. Available: http://arxiv.org/abs/1301.0859
[9] C.-Y. Tu, C.-Y. Ho, and C.-Y. Huang, "Energy-efficient algorithms and evaluations for massive access management in cellular based machine to machine communications," in Proceedings of 2011 IEEE Vehicular Technology Conference, Sep. 2011, pp. 1–5.
[10] C.-Y. Ho and C.-Y. Huang, "Energy-saving massive access control and resource allocation schemes for M2M communications in OFDMA cellular networks," IEEE Wireless Communications Letters, vol. 1, no. 3, pp. 209–212, Jun. 2012.
[11] J. Hoydis, S. T. Brink, and M. Debbah, "Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?" IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 160–171, Feb. 2013.
[12] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, "An overview of massive MIMO: Benefits and challenges," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 742–758, Oct. 2014.
[13] M. A. Vázquez and J. Míguez, "User activity tracking in DS-CDMA systems," IEEE Transactions on Vehicular Technology, vol. 62, no. 7, pp. 3188–3203, 2013.
[14] S. Verdu, Multiuser Detection. Cambridge University Press, 1998.
[15] M. Jiang and L. Hanzo, "Multiuser MIMO-OFDM detection for next generation wireless systems," Proceedings of the IEEE, vol. 95, no. 7, pp. 1430–1469, Jul. 2007.
[16] W.-C. Wu and K.-C. Chen, "Identification of active users in synchronous CDMA multiuser detection," IEEE Journal on Selected Areas in Communications, vol. 16, no. 9, pp. 1723–1735, Dec. 1998.
[17] P. Pad, A. Mousavi, A. Goli, and F. Marvasti, "Simplified MAP-MUD for active user CDMA," IEEE Communications Letters, vol. 15, no. 6, pp. 599–601, Jun. 2011.
[18] Z. Pi and U. Mitra, "On blind timing acquisition and channel estimation for wideband multiuser DS-CDMA systems," Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol. 30, no. 1–3, pp. 127–142, 2002.
[19] S. Buzzi, L. Venturino, A. Zappone, and A. De Maio, "Blind user detection in doubly dispersive DS/CDMA fading channels," IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1446–1451, Mar. 2010.
[20] A. Mousavi, P. Pad, P. Delgosha, and F. Marvasti, "A new decoding scheme for errorless codes for overloaded CDMA with active user detection," in , May 2011, pp. 201–205.
[21] K. W. Halford and M. Brandt-Pearce, "New-user identification in a CDMA system," IEEE Transactions on Communications, vol. 46, no. 1, pp. 144–155, Jan. 1998.
[22] A. Angelosante, E. Biglieri, and M. Lops, "Multiuser detection in a dynamic environment – Part II: Joint user identification and parameter estimation," IEEE Transactions on Information Theory, vol. 55, pp. 2365–2374, May 2009.
[23] H. Zhu and G. B. Giannakis, "Exploiting sparse user activity in multiuser detection," IEEE Transactions on Communications, vol. 59, no. 2, pp. 454–465, Feb. 2011.
[24] D. Angelosante, E. Grossi, G. B. Giannakis, and M. Lops, "Sparsity-aware estimation of CDMA system parameters," in IEEE 10th Workshop on Signal Processing Advances in Wireless Communications '09, Jun. 2009, pp. 697–701.
[25] D. Angelosante, E. Biglieri, and M. Lops, "Low-complexity receivers for multiuser detection with an unknown number of active users," Signal Processing, vol. 90, pp. 1486–1495, May 2010.
[26] I. Valera, F. J. R. Ruiz, and F. Perez-Cruz, "Infinite factorial unbounded hidden Markov model for blind multiuser channel estimation," in , May 2014.
[27] ——, "Infinite factorial unbounded-state hidden Markov model," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 9, pp. 1816–1828, 2015.
[28] I. Valera, F. J. R. Ruiz, L. Svensson, and F. Perez-Cruz, "A Bayesian nonparametric approach for blind multiuser channel estimation," in European Signal Processing Conference, 2015.
[29] M. S. Ali, E. Hossain, and D. I. Kim, "LTE/LTE-A random access for massive machine-type communications in smart cities," IEEE Communications Magazine, vol. 55, no. 1, pp. 76–83, Jan. 2017.
[30] M. Polese, M. Centenaro, A. Zanella, and M. Zorzi, "M2M massive access in LTE: RACH performance evaluation in a smart city scenario," in Proceedings of the International Conference on Communications, May 2016.
[31] F. Lindsten, M. I. Jordan, and T. B. Schön, "Particle Gibbs with ancestor sampling," Journal of Machine Learning Research, vol. 15, no. 1, pp. 2145–2184, 2014.
[32] C. Andrieu, A. Doucet, and R. Holenstein, "Particle Markov chain Monte Carlo methods," Journal of the Royal Statistical Society, Series B, vol. 72, no. 3, pp. 269–342, 2010.
[33] Z. Ghahramani and M. I. Jordan, "Factorial hidden Markov models," Machine Learning, vol. 29, no. 2–3, pp. 245–273, 1997.
[34] J. Van Gael, Y. W. Teh, and Z. Ghahramani, "The infinite factorial hidden Markov model," in Advances in Neural Information Processing Systems, vol. 21, 2009.
[35] J. E. Hopcroft, R. Motwani, and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2006.
[36] J. A. Anderson and T. J. Head, Automata Theory with Modern Applications. Cambridge University Press, 2006.
[37] J. Wang, Handbook of Finite State Based Models and Applications, 1st ed. Chapman & Hall/CRC, 2012.
[38] L. E. Baum and T. Petrie, "Statistical inference for probabilistic functions of finite state Markov chains," The Annals of Mathematical Statistics, vol. 37, no. 6, pp. 1554–1563, 1966.
[39] L. Rabiner and B. Juang, "An introduction to hidden Markov models," IEEE ASSP Magazine, vol. 3, no. 1, pp. 4–16, Jan. 1986.
[40] M. I. Jordan, Hierarchical Models, Nested Models and Completely Random Measures, M.-H. Chen, D. Dey, P. Mueller, D. Sun, and K. Ye, Eds. New York, NY: Springer, 2010.
[41] P. Orbanz and Y. W. Teh, "Bayesian nonparametric models," in Encyclopedia of Machine Learning. Springer, 2010.
[42] T. L. Griffiths and Z. Ghahramani, "The Indian buffet process: An introduction and review," Journal of Machine Learning Research, vol. 12, pp. 1185–1224, 2011.
[43] Y. W. Teh, D. Görür, and Z. Ghahramani, "Stick-breaking construction for the Indian buffet process," in Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 11, 2007.
[44] C. P. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer New York, 2005.
[45] W. R. Gilks and P. Wild, "Adaptive rejection sampling for Gibbs sampling," Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 41, no. 2, pp. 337–348, 1992.
[46] S. L. Scott, "Bayesian methods for hidden Markov models: Recursive computing in the 21st century," Journal of the American Statistical Association, vol. 97, no. 457, pp. 337–351, 2002.
[47] N. Whiteley, C. Andrieu, and A. Doucet, "Efficient Bayesian inference for switching state-space models using particle Markov chain Monte Carlo methods," Bristol Statistics Research, Tech. Rep., 2010.
[48] F. Lindsten and T. B. Schön, "Backward simulation methods for Monte Carlo statistical inference," Foundations and Trends in Machine Learning, vol. 6, no. 1, pp. 1–143, 2013.
[49] I. Valera, F. J. R. Ruiz, L. Svensson, and F. Perez-Cruz, "Infinite factorial dynamical model," in Advances in Neural Information Processing Systems, 2015.
[50] S. Sesia, I. Toufik, and M. Baker, LTE – The UMTS Long Term Evolution: From Theory to Practice, 2nd ed. Wiley, 2011.
[51] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 284–287, Mar. 1974.
[52] S. J. Fortune, D. M. Gay, B. W. Kernighan, O. Landron, R. A. Valenzuela, and M. H. Wright, "WISE design of indoor wireless systems: Practical computation and optimization," IEEE Computing in Science & Engineering, vol. 2, no. 1, pp. 58–68, Mar. 1995.
[53] K. Au, L. Zhang, H. Nikopour, E. Yi, A. Bayesteh, U. Vilaipornsawai, J. Ma, and P. Zhu, "Uplink contention based SCMA for 5G radio access," in Globecom Workshops (Austin, TX), 2014.
[54] A. Bayesteh, E. Yi, H. Nikopour, and H. Baligh, "Blind detection of SCMA for uplink grant-free multiple-access," in , 2014.
Francisco J. R. Ruiz received the B.Sc. degree in Electrical Engineering from the University of Seville, Spain, and the M.Sc. degree in Multimedia and Communications from the University Carlos III in Madrid, Spain, in 2010 and 2012, respectively. He received the Ph.D. degree in 2015 from the University Carlos III in Madrid. He is currently a Postdoctoral Research Scientist at the Department of Computer Science in Columbia University, and at the Engineering Department in the University of Cambridge. Francisco holds a Marie Skłodowska-Curie fellowship in the context of the E.U. Horizon 2020 program. His research is focused on statistical machine learning, especially Bayesian modeling and inference.
Isabel Valera is currently leading the Probabilistic Learning Group in the Department of Empirical Inference at the Max Planck Institute for Intelligent Systems in Tübingen. Prior to this she was a postdoctoral researcher at the University of Cambridge, and previously, she was a Humboldt postdoctoral fellowship holder at the Max Planck Institute for Software Systems. She obtained her Ph.D. in 2014 and her Master's degree in 2012, both from the University Carlos III in Madrid.
Lennart Svensson was born in Älvängen, Sweden, in 1976. He received the M.S. degree in electrical engineering in 1999 and the Ph.D. degree in 2004, both from Chalmers University of Technology, Gothenburg, Sweden. He is currently Professor of Signal Processing, again at Chalmers University of Technology. His main research interests include machine learning and Bayesian inference in general, and nonlinear filtering, deep learning, and multi-target tracking in particular.