An Efficient Active Set Algorithm for Covariance Based Joint Data and Activity Detection for Massive Random Access with Massive MIMO

Abstract

This paper proposes a computationally efficient algorithm to solve the joint data and activity detection problem for massive random access with massive multiple-input multiple-output (MIMO). The base station (BS) acquires the active devices and their data by detecting the transmitted preassigned nonorthogonal signature sequences. This paper employs a covariance based approach that formulates the detection problem as a maximum likelihood estimation (MLE) problem. To efficiently solve the problem, this paper designs a novel iterative algorithm with low complexity in the regime where the device activity pattern is sparse, a key feature that existing algorithmic designs have not previously exploited for reducing complexity. Specifically, at each iteration, the proposed algorithm focuses on only a small subset of all potential sequences, namely the active set, which contains a few most likely active sequences (i.e., transmitted sequences by all active devices), and performs the detection for the sequences in the active set. The active set is carefully selected at each iteration based on the current detection result and the first-order optimality condition of the MLE problem. Simulation results show that the proposed active set algorithm enjoys significantly better computational efficiency (in terms of the CPU time) than the state-of-the-art algorithms.


AN EFFICIENT ACTIVE SET ALGORITHM FOR COVARIANCE BASED JOINT DATA AND ACTIVITY DETECTION FOR MASSIVE RANDOM ACCESS WITH MASSIVE MIMO

Ziyue Wang⋆,§, Zhilin Chen†, Ya-Feng Liu§, Foad Sohrabi†, and Wei Yu†

⋆ School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
† Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
§ LSEC, ICMSEC, AMSS, Chinese Academy of Sciences, Beijing, China

Email: [email protected], {zchen, fsohrabi, weiyu}@ece.utoronto.ca, yafl[email protected]


Index Terms: Active set, joint data and activity detection, massive MIMO, massive random access.

1. INTRODUCTION

Massive machine-type communication (mMTC) is a main use case in the fifth generation (5G) cellular systems [1]. A challenging task in mMTC is the uncoordinated random access, in which a large number of sporadically active devices wish to send small data to the base station (BS) in the uplink [2]. To meet the low-latency requirement in mMTC, the grant-free random access scheme could be a promising solution [3, 4], in which each device is preassigned multiple signature sequences from a large set of nonorthogonal sequences, and the active device selects one sequence from the assigned sequences to transmit. The data and device identification are embedded in the sequence selection. The BS then detects the active devices and decodes their data by detecting which sequences are transmitted.

By exploiting the sporadic device traffic, the joint data and activity detection problem has been formulated as a compressed sensing problem [4], in which the data and the device activity are recovered along with the instantaneous channel state information (CSI) via the approximate message passing (AMP) algorithm. Similar methods have also been used for the scenario where each device is associated with only one sequence for the purpose of device activity detection [5, 6, 7]. However, if the CSI is not needed, it is actually possible to recover the data and activities without recovering the channel coefficients using a covariance based approach [8, 9], which outperforms the AMP method, especially in the massive multiple-input multiple-output (MIMO) systems. In the covariance based method, the detection problem is formulated as a maximum likelihood estimation (MLE) problem, in which the channel coefficients are treated as random samples and averaged out in computing the covariance. This covariance based method is first suggested in [10] for device activity detection, and it has also been used for a few related data/activity detection problems, e.g., data decoding for unsourced random access [11], cooperative activity detection in cell-free systems [12], and activity detection with interference [13].

The coordinate descent (CD) algorithm that iteratively updates the sequence selection for each device is commonly used in solving the detection problem in the covariance based approach, which achieves excellent detection performance; see [8, 10, 13] for more details. The possible reason for the popularity of the CD algorithm is that its subproblem (i.e., the original problem with respect to only one variable) admits a nice closed-form solution [10], which makes it easily implementable. To further speed up the convergence of the CD algorithm, a new coordinate sampling strategy is proposed in [14]. Other algorithms for solving the detection problem include the expectation maximization/minimization (EM) algorithm (i.e., sparse Bayesian learning) [15] and the SPICE algorithm [16]. However, none of the above mentioned solutions take advantage of the sparsity of the true solution of the detection problem to lower their algorithmic complexities, thus becoming less computationally efficient when the problem's dimension is huge, which is the case in mMTC.

In this paper, we propose a computationally efficient algorithm that carefully exploits the sparsity of the true solution to solve the joint data and activity detection problem in the covariance based approach. Specifically, we propose an iterative algorithm that attacks the original large-scale problem by solving a sequence of small-size problems. We focus on only a small subset of all sequences at each iteration, termed the active set, which contains only the most likely active sequences and can be seen as an approximation of the set of active sequences¹. We perform joint data and activity detection for only the sequences in the active set using a low-complexity spectral projected gradient (PG) algorithm [17]. We carefully update the active set at each iteration based on the current detection result and the first-order optimality condition of the joint detection problem. We also establish the convergence of the proposed active set algorithm. Simulation results show that, as compared to the commonly used CD algorithm in the covariance based approach, the proposed active set algorithm has much higher computational efficiency (in terms of the CPU time).

¹ Active sequences in this paper refer to transmitted sequences by all active devices in the joint data and activity detection problem.

arXiv preprint [eess.SP], Feb.

2. SYSTEM MODEL AND PROBLEM FORMULATION

2.1. System Model

Consider an uplink single-cell system consisting of one BS equipped with $M \gg 1$ antennas and $N$ devices, each equipped with a single antenna. Assume a quasi-static narrow-band channel model, where the wireless channels remain unchanged within each transmission block but may vary over different blocks. Let $\sqrt{g_n}\,\mathbf{h}_n \in \mathbb{C}^{M \times 1}$ denote the channel vector from device $n$ to the BS, where $g_n \ge 0$ is the large-scale fading component (depending on the device's location), and $\mathbf{h}_n \in \mathbb{C}^{M \times 1}$ is the Rayleigh fading component following the i.i.d. complex Gaussian distribution.

In each coherence block, only $K \ll N$ devices are active (due to the sporadic traffic), and each active device wishes to transmit $J$ bits of data to the BS, where $J$ is a small value in the mMTC scenario. Assume that each device $n$ has a unique signature sequence set $\mathcal{S}_n = \{\mathbf{s}_{n,1}, \mathbf{s}_{n,2}, \ldots, \mathbf{s}_{n,Q}\}$, where $\mathbf{s}_{n,q} \in \mathbb{C}^{L \times 1}$, $1 \le q \le Q \triangleq 2^J$, and $L$ is the signature sequence length. When device $n$ is active and needs to send $J$ bits of data, it selects one sequence from $\mathcal{S}_n$ to transmit. Finally, let $\chi_{n,q} \in \{0, 1\}$ indicate whether or not sequence $q$ of device $n$ (i.e., $\mathbf{s}_{n,q}$) is transmitted. Since at most one sequence is selected by each device, $\chi_{n,q}$ satisfies $\sum_{q=1}^{Q} \chi_{n,q} \in \{0, 1\}$, where $\sum_{q=1}^{Q} \chi_{n,q} = 0$ indicates that device $n$ is inactive, and $\sum_{q=1}^{Q} \chi_{n,q} = 1$ indicates that device $n$ is active.

Assume that the sequences transmitted by active devices are perfectly synchronized. Then the received signal $\mathbf{Y} \in \mathbb{C}^{L \times M}$ at the BS, which is a superposition of the transmitted signals from all active devices, can be expressed as

$$\mathbf{Y} = \sum_{n=1}^{N} \sum_{q=1}^{Q} \chi_{n,q}\, \mathbf{s}_{n,q} \sqrt{g_n}\, \mathbf{h}_n^T + \mathbf{W}, \tag{1}$$

where $\mathbf{W} \in \mathbb{C}^{L \times M}$ is the effective i.i.d. Gaussian noise whose variance $\sigma_w^2$ is the background noise power normalized by the device transmit power.

To obtain a more compact expression of the received signal in (1), we define $\mathbf{S}_n = [\mathbf{s}_{n,1}, \ldots, \mathbf{s}_{n,Q}] \in \mathbb{C}^{L \times Q}$, $\mathbf{D}_n = \sqrt{g_n}\, \mathrm{diag}\{\chi_{n,1}, \ldots, \chi_{n,Q}\} \in \mathbb{C}^{Q \times Q}$, and $\mathbf{H}_n = [\mathbf{h}_n, \ldots, \mathbf{h}_n]^T \in \mathbb{C}^{Q \times M}$ for all $n$. Based on them, we further define $\mathbf{S} = [\mathbf{S}_1, \ldots, \mathbf{S}_N] \in \mathbb{C}^{L \times NQ}$, $\boldsymbol{\Gamma}^{1/2} = \mathrm{diag}\{\mathbf{D}_1, \ldots, \mathbf{D}_N\} \in \mathbb{C}^{NQ \times NQ}$, and $\mathbf{H} = [\mathbf{H}_1^T, \ldots, \mathbf{H}_N^T]^T \in \mathbb{C}^{NQ \times M}$. Then, the received signal in (1) can be compactly expressed as

$$\mathbf{Y} = \mathbf{S} \boldsymbol{\Gamma}^{1/2} \mathbf{H} + \mathbf{W}. \tag{2}$$

Let $\boldsymbol{\gamma} \in \mathbb{C}^{NQ \times 1}$ denote the diagonal entries of $\boldsymbol{\Gamma}$, i.e., $\boldsymbol{\gamma} = [\boldsymbol{\gamma}_1^T, \ldots, \boldsymbol{\gamma}_N^T]^T$, where $\boldsymbol{\gamma}_n = [\gamma_{n,1}, \ldots, \gamma_{n,Q}]^T \in \mathbb{C}^{Q \times 1}$ with $\gamma_{n,q} = g_n \chi_{n,q}$. In the following, we will use $\boldsymbol{\gamma}$ and $\boldsymbol{\Gamma}$ interchangeably.

The joint activity and data detection problem is to detect the variables $\gamma_{n,q}$'s, which indicate both the activity of device $n$ and its data (if it is active), from the received signal $\mathbf{Y}$ based on the knowledge of the signature sequence matrix $\mathbf{S}$. Specifically, if $\gamma_{n,q} > 0$ for some $q$, then device $n$ is active and it transmits sequence $\mathbf{s}_{n,q}$; otherwise device $n$ is inactive.

As shown in [10, 8], the above joint activity and data detection problem can be mathematically formulated as an MLE problem. Specifically, it can be observed from (2) that given $\boldsymbol{\gamma}$, the columns of $\mathbf{Y}$, denoted as $\mathbf{y}_m \in \mathbb{C}^{L \times 1}$, $1 \le m \le M$, can be seen as independent samples from a complex Gaussian distribution as

$$\mathbf{y}_m \sim \mathcal{CN}\left(\mathbf{0},\; \mathbf{S} \boldsymbol{\Gamma}^{1/2} \boldsymbol{\Lambda} \boldsymbol{\Gamma}^{1/2} \mathbf{S}^H + \sigma_w^2 \mathbf{I}\right), \tag{3}$$

where the covariance matrix is obtained by computing $\mathbb{E}[\mathbf{y}_m \mathbf{y}_m^H]$ based on (2), $\boldsymbol{\Lambda}$ is a block diagonal matrix with each block being the all-one matrix $\mathbf{E} \in \mathbb{R}^{Q \times Q}$, and $\mathbf{I}$ is an identity matrix. Since there is at most one non-zero entry in each diagonal block $\mathbf{D}_n$ in $\boldsymbol{\Gamma}^{1/2}$, the covariance matrix in (3) can be simplified as

$$\mathbf{S} \boldsymbol{\Gamma}^{1/2} \boldsymbol{\Lambda} \boldsymbol{\Gamma}^{1/2} \mathbf{S}^H + \sigma_w^2 \mathbf{I} = \mathbf{S} \boldsymbol{\Gamma} \mathbf{S}^H + \sigma_w^2 \mathbf{I}.$$

Given $\boldsymbol{\gamma}$, we have $p(\mathbf{Y} \mid \boldsymbol{\gamma}) = \prod_{m=1}^{M} p(\mathbf{y}_m \mid \boldsymbol{\gamma})$. Based on this and (3), the minimization of $-\frac{1}{M} \log p(\mathbf{Y} \mid \boldsymbol{\gamma})$, equivalent to the maximization of $p(\mathbf{Y} \mid \boldsymbol{\gamma})$, can be formulated as

$$\min_{\boldsymbol{\gamma}} \; \log \left| \mathbf{S} \boldsymbol{\Gamma} \mathbf{S}^H + \sigma_w^2 \mathbf{I} \right| + \mathrm{Tr}\left( \left( \mathbf{S} \boldsymbol{\Gamma} \mathbf{S}^H + \sigma_w^2 \mathbf{I} \right)^{-1} \hat{\boldsymbol{\Sigma}} \right) \tag{4a}$$

$$\text{s.t.} \;\; \boldsymbol{\gamma} \ge \mathbf{0}, \tag{4b}$$

where $\hat{\boldsymbol{\Sigma}} = \mathbf{Y} \mathbf{Y}^H / M$ is the sample covariance matrix computed by averaging over different antennas, and $\boldsymbol{\gamma} \ge \mathbf{0}$ is due to the fact that $\gamma_{n,q} = g_n \chi_{n,q} \ge 0$ for all $n$ and $q$. Since the objective function in problem (4) depends on $\mathbf{Y}$ only through the sample covariance matrix $\hat{\boldsymbol{\Sigma}}$, the approach of estimating activity and associated data based on solving problem (4) is called the covariance based approach. It is worthwhile mentioning that problem (4) reduces to the activity detection problem in [10] if each device has only a single signature sequence (i.e., $J = 0$ and thus $Q = 1$).

Let $f(\boldsymbol{\gamma})$ denote the objective function of problem (4). Then, for any $q = 1, 2, \ldots, Q$, $n = 1, 2, \ldots, N$, the gradient of $f(\boldsymbol{\gamma})$ with respect to $\gamma_{n,q}$ is

$$[\nabla f(\boldsymbol{\gamma})]_{n,q} = \mathbf{s}_{n,q}^H \boldsymbol{\Sigma}^{-1} \mathbf{s}_{n,q} - \mathbf{s}_{n,q}^H \boldsymbol{\Sigma}^{-1} \hat{\boldsymbol{\Sigma}} \boldsymbol{\Sigma}^{-1} \mathbf{s}_{n,q},$$

where $\boldsymbol{\Sigma} = \mathbf{S} \boldsymbol{\Gamma} \mathbf{S}^H + \sigma_w^2 \mathbf{I}$ is the covariance matrix in (3). The first-order (necessary) optimality condition of problem (4) is

$$[\nabla f(\boldsymbol{\gamma})]_{n,q} \begin{cases} = 0, & \text{if } \gamma_{n,q} > 0, \\ \ge 0, & \text{if } \gamma_{n,q} = 0, \end{cases} \quad \forall\, q, n, \tag{5}$$

which is equivalent to

$$[\boldsymbol{\gamma} - \nabla f(\boldsymbol{\gamma})]_+ - \boldsymbol{\gamma} = \mathbf{0},$$

where $[\cdot]_+$ denotes the projection operator onto the nonnegative orthant. It can be checked that computing $\nabla f(\boldsymbol{\gamma})$ has a complexity of $O(NQL^2)$.
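The quantities defined in this section can be sketched numerically as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`covariance`, `objective`, `gradient`, `optimality_residual`), the small problem sizes, the unit large-scale fading ($g_n = 1$), and the noise variance are all assumptions chosen only to make the example run quickly.

```python
import numpy as np

def covariance(S, gamma, sigma2):
    """Model covariance Sigma = S diag(gamma) S^H + sigma2 * I, as in (3)."""
    return (S * gamma) @ S.conj().T + sigma2 * np.eye(S.shape[0])

def objective(S, gamma, sigma2, Sigma_hat):
    """f(gamma) = log det(Sigma) + Tr(Sigma^{-1} Sigma_hat), as in (4a)."""
    Sigma = covariance(S, gamma, sigma2)
    _, logdet = np.linalg.slogdet(Sigma)
    return logdet + np.trace(np.linalg.solve(Sigma, Sigma_hat)).real

def gradient(S, gamma, sigma2, Sigma_hat):
    """Entry j is s_j^H Sigma^{-1} s_j - s_j^H Sigma^{-1} Sigma_hat Sigma^{-1} s_j."""
    Sigma = covariance(S, gamma, sigma2)
    A = np.linalg.solve(Sigma, S)                  # Sigma^{-1} S, one column per sequence
    t1 = np.einsum('ij,ij->j', S.conj(), A).real   # s_j^H Sigma^{-1} s_j
    t2 = np.einsum('ij,ij->j', A.conj(), Sigma_hat @ A).real
    return t1 - t2

def optimality_residual(gamma, grad):
    """Norm of [gamma - grad]_+ - gamma, which is zero exactly when (5) holds."""
    return np.linalg.norm(np.maximum(gamma - grad, 0.0) - gamma)

# Toy instance of model (2); all sizes below are illustrative assumptions.
rng = np.random.default_rng(0)
N, Q, L, M, K, sigma2 = 20, 2, 12, 64, 2, 0.1
S = (rng.standard_normal((L, N * Q)) + 1j * rng.standard_normal((L, N * Q))) / np.sqrt(2)
devices = rng.choice(N, size=K, replace=False)          # active devices
active = devices * Q + rng.integers(0, Q, size=K)       # one sequence per active device
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
W = np.sqrt(sigma2 / 2) * (rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M)))
Y = S[:, active] @ H + W
Sigma_hat = Y @ Y.conj().T / M                           # sample covariance

gamma_true = np.zeros(N * Q)
gamma_true[active] = 1.0                                 # gamma_{n,q} = g_n * chi_{n,q} with g_n = 1
print(optimality_residual(gamma_true, gradient(S, gamma_true, sigma2, Sigma_hat)))
```

Here the vector `gamma` stacks the pairs $(n, q)$ in the same order as the columns of $\mathbf{S}$, so the flat index of $(n, q)$ is `n * Q + q` with zero-based indices. The gradient is evaluated with one linear solve against $\boldsymbol{\Sigma}$ followed by per-column inner products.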

3. PROPOSED ACTIVE SET ALGORITHM

The basic idea of the proposed active set algorithm for solving problem (4) is to fully exploit the sparsity of its true solution in the algorithmic design, which is in sharp contrast to all existing algorithms such as EM [15], CD [10, 8], and SPICE [16]. More specifically, at each iteration, the active set algorithm first judiciously selects an active set and then solves the subproblem defined over the variables in the active set with all the other variables fixed at zero. Since the true solution of problem (4) is sparse, it is expected that the cardinality of the carefully selected active set, i.e., the dimension of the subproblem, will be significantly less than the total number of variables of the original problem (4). Therefore, solving the subproblem defined over the variables in the active set will be much more computationally efficient than directly solving the original problem (4) (over all variables).

Selecting the active set.

In principle, a desirable active set should contain the indices of active sequences in order to correctly detect the active users and associated data; on the other hand, its cardinality should be as small as possible in order to avoid unnecessary computation and improve the computational efficiency. Our selection strategy of the active set $\mathcal{A}^k$ at a given feasible point $\boldsymbol{\gamma}^k$ is based on (i) the engineering insight of the joint activity and data detection problem and (ii) the first-order necessary optimality condition (5) of the joint detection problem. In particular, for any given feasible point $\boldsymbol{\gamma}^k$, the selected active set $\mathcal{A}^k$ contains the indices whose corresponding values of $\boldsymbol{\gamma}^k$ are positive and large (based on (i)) and the indices whose corresponding values of $\nabla f(\boldsymbol{\gamma}^k)$ are sufficiently negative (due to (ii)). Mathematically, the proposed selection strategy of the active set $\mathcal{A}^k$ is

$$\mathcal{A}^k = \left\{ (i, q) \;\middle|\; \gamma^k_{i,q} > \omega_k \ \text{ or } \ [\nabla f(\boldsymbol{\gamma}^k)]_{i,q} < -\nu_k \right\}, \tag{6}$$

where $\omega_k$ and $\nu_k$ are two positive parameters. The choices of the parameters $\omega_k$ and $\nu_k$ provide a trade-off between reducing the cardinality of the active set and not missing the active sequences. In general, the smaller these two parameters, the larger the cardinality of the selected active set and the lower the probability of missing the active sequences. To make sure of not missing the active sequences, we let $\omega_k \downarrow 0$ and $\nu_k \downarrow 0$ in (6), which means that $\omega_k$ and $\nu_k$ monotonically decrease and converge to zero.

Solving the subproblem.

At the $k$-th iteration, once the active set $\mathcal{A}^k$ is selected, we solve the following subproblem

$$\min_{\boldsymbol{\gamma}_{\mathcal{A}^k}} \; \hat{f}(\boldsymbol{\gamma}_{\mathcal{A}^k}) \tag{7a}$$

$$\text{s.t.} \;\; \boldsymbol{\gamma}_{\mathcal{A}^k} \ge \mathbf{0}, \tag{7b}$$

where $\boldsymbol{\gamma}_{\mathcal{A}^k}$ is the subvector of $\boldsymbol{\gamma}$ indexed by $\mathcal{A}^k$ and $\hat{f}(\boldsymbol{\gamma}_{\mathcal{A}^k})$ is $f(\boldsymbol{\gamma})$ defined over $\boldsymbol{\gamma}_{\mathcal{A}^k}$ with all the other variables fixed at zero. Problem (7) differs from problem (4) in that it is defined over $\boldsymbol{\gamma}_{\mathcal{A}^k}$ rather than over $\boldsymbol{\gamma}$. Therefore, the dimension of problem (7) is potentially much smaller than that of problem (4) (if the set $\mathcal{A}^k$ in (7) is properly chosen).

We apply the spectral PG algorithm [17] to solve the subproblem in (7) until a point $\boldsymbol{\gamma}^{k+1}_{\mathcal{A}^k}$ satisfying

$$\left\| \left[ \boldsymbol{\gamma}^{k+1}_{\mathcal{A}^k} - \nabla \hat{f}(\boldsymbol{\gamma}^{k+1}_{\mathcal{A}^k}) \right]_+ - \boldsymbol{\gamma}^{k+1}_{\mathcal{A}^k} \right\| < \varepsilon_k \tag{8}$$

is found, where $\varepsilon_k > 0$ is the solution tolerance at the $k$-th iteration. The spectral PG algorithm [17] is a PG algorithm with the spectral stepsizes, also called the Barzilai-Borwein (BB) stepsizes [18], which approximately solve the quasi-Newton equation. In the PG algorithm, we need to compute the gradient of the objective function $\hat{f}(\boldsymbol{\gamma}_{\mathcal{A}^k})$ (equivalent to computing the partial gradient of function $f(\boldsymbol{\gamma})$ with respect to the variables in $\mathcal{A}^k$), which has a complexity of $O(|\mathcal{A}^k| L^2)$. Two distinctive advantages of the spectral PG algorithm [17] in the context of solving problem (7) are as follows. First, the nonnegative constraint is easy to project onto, and thus the algorithm can be easily implemented to solve problem (7). Second, the algorithm enjoys quite good numerical performance due to the use of the alternating BB stepsizes [18, 19].

Now, we are ready to present the proposed active set PG algorithm for solving problem (4). The pseudocode of the proposed algorithm is given in Algorithm 1.
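The selection rule (6) amounts to two elementwise threshold tests. The sketch below is only illustrative: `select_active_set` is a hypothetical helper name, the vectors `gamma` and `grad` are made-up numbers rather than values from a real instance, and the flat index layout `n * Q + q` (zero-based) is an assumption about how the pairs $(i, q)$ are stored.

```python
import numpy as np

def select_active_set(gamma, grad, omega, nu):
    """Boolean mask of rule (6): keep entries with a large current value
    or a sufficiently negative gradient component."""
    return (gamma > omega) | (grad < -nu)

# Made-up current point and gradient for N = 3 devices with Q = 2 sequences each.
Q = 2
gamma = np.array([0.0, 0.9, 0.0, 1e-4, 0.0, 0.0])
grad = np.array([0.3, 0.0, -2.0, 0.1, -1e-3, 0.2])

# Large thresholds give a small active set.
mask_tight = select_active_set(gamma, grad, omega=1e-2, nu=0.5)
idx_tight = np.flatnonzero(mask_tight)               # flat indices 1 and 2
pairs_tight = [divmod(int(j), Q) for j in idx_tight]  # (device, sequence) pairs

# Smaller thresholds enlarge the set, lowering the risk of missing a sequence.
mask_loose = select_active_set(gamma, grad, omega=1e-5, nu=1e-4)
idx_loose = np.flatnonzero(mask_loose)                # flat indices 1, 2, 3, 4

# The subproblem (7) then only involves the selected columns of S,
# e.g. S[:, mask_tight] and gamma[mask_tight].
```

Shrinking both thresholds can only add indices to the set, which matches the trade-off described above between a small active set and the risk of missing active sequences.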

Algorithm 1 Proposed active set PG algorithm for solving problem (4)

Initialize: $\boldsymbol{\gamma}^0 = \mathbf{0}$, $k = 0$, $\{\omega_k, \nu_k, \varepsilon_k\}_{k \ge 0}$, and $\varepsilon > 0$
repeat
    Select the active set $\mathcal{A}^k$ according to (6);
    Apply the spectral PG algorithm [17] to solve the subproblem (7) until (8) is satisfied;
    Set $k \leftarrow k + 1$;
until $\left\| [\boldsymbol{\gamma}^k - \nabla f(\boldsymbol{\gamma}^k)]_+ - \boldsymbol{\gamma}^k \right\| < \varepsilon$
Output: $\boldsymbol{\gamma}^k$

Next, we present some convergence properties of the proposed active set PG Algorithm 1 (without rigorous proofs due to the space limitation). Note that a careless selection of the active set (and of the parameters in it) might lead to oscillation (and divergence) of the corresponding active set algorithm among different active sets. The following finite termination property is mainly due to the active set selection strategy in (6) (and careful choices of parameters $\omega_k$ and $\nu_k$) and the convergence property of the spectral PG algorithm.

Theorem 1. For any given tolerance $\varepsilon > 0$, suppose that the parameters $\omega_k$ and $\nu_k$ in (6) satisfy $\omega_k \downarrow 0$ and $\nu_k \downarrow 0$ and the parameter $\varepsilon_k$ in (8) satisfies $\lim_{k \to \infty} \varepsilon_k < \varepsilon$. Then the active set PG Algorithm 1 will terminate within a finite number of iterations.
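The overall loop of Algorithm 1 can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' code: the inner solver uses a BB stepsize $\alpha = \mathbf{s}^T\mathbf{s} / \mathbf{s}^T\mathbf{y}$ with a plain monotone Armijo backtracking instead of the nonmonotone line search of [17]; the schedules for `omega_k`, `nu_k`, and `eps_k` are placeholders that merely decrease as Theorem 1 requires; and all function names, problem sizes, and iteration caps are hypothetical.

```python
import numpy as np

def f_and_grad(S, gamma, sigma2, Sigma_hat):
    """Objective (4a) and its gradient over the columns of S that are passed in."""
    Sigma = (S * gamma) @ S.conj().T + sigma2 * np.eye(S.shape[0])
    A = np.linalg.solve(Sigma, S)
    f = np.linalg.slogdet(Sigma)[1] + np.trace(np.linalg.solve(Sigma, Sigma_hat)).real
    g = (np.einsum('ij,ij->j', S.conj(), A)
         - np.einsum('ij,ij->j', A.conj(), Sigma_hat @ A)).real
    return f, g

def residual(x, g):
    """Norm of [x - g]_+ - x, the stopping measure used in (8) and in Algorithm 1."""
    return np.linalg.norm(np.maximum(x - g, 0.0) - x)

def pg_bb(S, x, sigma2, Sigma_hat, tol, max_iter=200):
    """Projected gradient with BB stepsizes and Armijo backtracking (simplified)."""
    f, g = f_and_grad(S, x, sigma2, Sigma_hat)
    alpha = 1.0
    for _ in range(max_iter):
        if residual(x, g) < tol:
            break
        d = np.maximum(x - alpha * g, 0.0) - x      # feasible descent direction
        t = 1.0
        while True:
            x_new = x + t * d
            f_new, g_new = f_and_grad(S, x_new, sigma2, Sigma_hat)
            if f_new <= f + 1e-4 * t * (g @ d) or t < 1e-10:
                break
            t *= 0.5
        s, y = x_new - x, g_new - g
        sy = s @ y
        alpha = float(np.clip((s @ s) / sy, 1e-10, 1e10)) if sy > 0 else 1e4
        x, f, g = x_new, f_new, g_new
    return x

def active_set_pg(S, sigma2, Sigma_hat, eps=1e-3, max_outer=30):
    """Outer loop of Algorithm 1 with assumed (placeholder) parameter schedules."""
    n = S.shape[1]
    gamma = np.zeros(n)
    for k in range(max_outer):
        _, g = f_and_grad(S, gamma, sigma2, Sigma_hat)
        if residual(gamma, g) < eps:
            break
        omega_k = 10.0 ** (-2 - k)                       # assumed schedule
        nu_k = min(10.0 ** (-k), 0.5 * abs(g.min()))     # assumed schedule
        eps_k = max(10.0 ** (-k), 0.5 * eps)             # assumed; its limit is below eps
        mask = (gamma > omega_k) | (g < -nu_k)           # rule (6)
        new_gamma = np.zeros(n)                          # variables outside the set stay at zero
        if mask.any():
            new_gamma[mask] = pg_bb(S[:, mask], gamma[mask], sigma2, Sigma_hat, eps_k)
        gamma = new_gamma
    return gamma

# Toy instance (all sizes are illustrative assumptions, with g_n = 1).
rng = np.random.default_rng(0)
N, Q, L, M, K, sigma2 = 20, 2, 20, 400, 2, 0.1
S = (rng.standard_normal((L, N * Q)) + 1j * rng.standard_normal((L, N * Q))) / np.sqrt(2)
active = rng.choice(N, size=K, replace=False) * Q + rng.integers(0, Q, size=K)
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
W = np.sqrt(sigma2 / 2) * (rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M)))
Y = S[:, active] @ H + W
Sigma_hat = Y @ Y.conj().T / M
gamma_hat = active_set_pg(S, sigma2, Sigma_hat)
```

The iteration caps `max_iter` and `max_outer` are safeguards added for this sketch; they are not part of Algorithm 1. Every inner iterate stays feasible because it is a convex combination of two nonnegative points.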

4. SIMULATION RESULTS

In this section, we present some simulation results to show the efficiency of the proposed active set PG algorithm for solving the joint data and activity detection problem (4). We use the same simulation parameters as in [8]. More specifically, we consider a single cell of radius 1000 m and consider the worst-case scenario where all devices are located at the cell edge such that the large-scale fading components $g_n$'s are the same for all devices. The power spectrum density of the background noise is −169 dBm/Hz over 10 MHz and the transmit power of each device is set as 25 dBm. The number of BS antennas, the length of the signature sequence, and the number of data bits are set to be $M = 256$, $L = 150$, and $J = 1$ (and thus $Q = 2$), respectively. We generate all signature sequences from the i.i.d. complex Gaussian distribution with zero mean and unit variance. We set $K/N = 0.1$, which means that 10% of the total devices are active, and compare the performance of different algorithms as $N$ increases. The parameters in the proposed Algorithm 1 are ω k = 10 − − k , ε k = max{ − k , . ∗ − }, ν k = min{ − k , . |min n,q {[∇f(γ k )] n,q }|}, and ε = 10 − . All simulation results in this section are obtained by averaging over Monte-Carlo runs.

The upper subfigure in Fig. 1 plots the average ratio of the cardinality of the selected active sets during all iterations of the proposed Algorithm 1 and the number of active sequences (i.e., $K$). In the ideal case where the set of active sequences is always selected as the active set in Algorithm 1, the corresponding ratio is 1; in the worst case where the whole set is always selected as the active set in Algorithm 1, the corresponding ratio is $QN/K = 20$. This ratio measures the efficiency of the corresponding active set selection strategy: the smaller the ratio, the better the active set selection strategy. We can observe from the upper subfigure in Fig. 1 that the ratio is in the interval [1. , . ] (and in fact very close to for different $N$'s), which clearly shows that the proposed active set selection strategy (6) is very efficient. The lower subfigure in Fig. 1 plots the average number of iterations that the proposed algorithm needs to terminate. This subfigure shows that Algorithm 1 will generally terminate within 4–7 iterations, which is consistent with the finite termination result in Theorem 1.

[Fig. 1. Performance of the proposed active set PG algorithm. Upper panel: ratio versus total number of users. Lower panel: number of iterations versus total number of users.]

Next, we compare the proposed active set PG Algorithm 1 with the following three benchmark algorithms:

• Random CD [8, 10]: To the best of our knowledge, random CD is the state-of-the-art algorithm for solving the covariance based detection problem (4), which is much faster than EM [15] and SPICE [16]. Among various variants, one of the most efficient ones is the so-called random permuted CD [8], which randomly permutes the indices of all variables at each iteration and then updates the variables one by one according to the order in the permutation in closed form (see line 5 of [10, Algorithm 1]).

• Ideal CD: This refers to the algorithm which applies the random permuted CD algorithm to solve problem (4) defined over the variables in the (ideal) set of active sequences. Since this ideal set is not known at the BS in practice, ideal CD is not a practical algorithm. Ideal CD is compared here for the purpose of characterizing the best possible performance of the CD types of algorithms.

• Ideal PG: This refers to the algorithm which applies the PG algorithm [17] to solve problem (4) defined over the variables in the (ideal) set of active sequences. This algorithm is also not practical and is only theoretically interesting for characterizing the best possible performance of the PG algorithm.

We have observed in the simulations that the random CD and active set PG algorithms always find the same solution of problem (4), and thus below we focus on their CPU time comparison. Fig. 2 plots the CPU time comparison of the proposed algorithm with the above three benchmark algorithms. It can be clearly observed from Fig. 2 that the proposed active set PG algorithm significantly outperforms the state-of-the-art random CD algorithm [10, 8] in terms of the CPU time. The proposed algorithm even achieves slightly better computational efficiency than the ideal CD algorithm. This shows the importance of exploiting the sparsity of the true solution in order to efficiently solve problem (4). Note that we fix $K/N = 0.1$ in Fig. 2. It is expected that the proposed active set PG algorithm will become even more efficient relative to the random CD algorithm as the ratio $K/N$ becomes smaller (i.e., the solution of problem (4) becomes more sparse).

[Fig. 2. Average CPU time comparison of the proposed active set PG algorithm, the random CD algorithm, the ideal CD algorithm, and the ideal PG algorithm with $K/N = 0.1$. Axes: CPU time [sec] versus total number of users. Curves: Random CD, Active Set PG, Ideal CD, Ideal PG.]

We have also observed that directly applying the PG algorithm [17] to solve problem (4) is much slower than random CD. Fig. 2 shows that ideal PG is more efficient than ideal CD. These observations are consistent with our optimization practice that it is better to update all variables jointly rather than individually (except for very large-scale optimization problems where it might be computationally expensive to update all variables together).

In summary, the high computational efficiency of the proposed active set PG algorithm is mainly attributed to the following two factors. First, the active set selection strategy (6) is efficient and is able to substantially reduce the dimension of the subproblems (compared to the original problem). Second, it is important to choose an appropriate algorithm for solving the subproblems defined over the variables in the active set, and the PG algorithm [17] turns out to be a good option (which, in our simulations, is much more efficient than the state-of-the-art CD algorithm [8, 10]).
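As a rough illustration of the first factor, assume that the cost of one gradient evaluation scales linearly with the number of variables involved ($NQ$ for the full problem, $|\mathcal{A}^k|$ for the subproblem), and write $|\mathcal{A}^k| \approx r K$, where $r$ is the ratio plotted in Fig. 1. The symbol $r$ and the approximation are assumptions introduced only for this back-of-the-envelope estimate; constant factors and the cost of forming the inverse covariance are ignored.

```latex
\[
\frac{\text{cost of one full gradient}}{\text{cost of one subproblem gradient}}
\;\approx\; \frac{NQ}{|\mathcal{A}^k|}
\;\approx\; \frac{NQ}{rK}
\;=\; \frac{1}{r}\cdot\frac{Q}{K/N}
\;=\; \frac{1}{r}\cdot\frac{2}{0.1}
\;=\; \frac{20}{r}.
\]
```

With $r$ close to its ideal value of 1, each subproblem gradient would therefore be roughly 20 times cheaper than a full gradient under the simulation setting $Q = 2$ and $K/N = 0.1$, and the gap would widen as $K/N$ decreases.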

5. CONCLUSIONS

Scalable and efficient joint data and activity detection is essential for massive random access in mMTC. In this paper, we propose a novel active set PG algorithm that carefully exploits the sporadic nature of the device traffic. The proposed algorithm is much more efficient than the existing state-of-the-art algorithms (in terms of the CPU time). We have observed from simulation results that several first-order algorithms can find the same (global) solution of the nonconvex joint detection problem. It will be interesting to obtain some theoretical guarantees for this observation.

6. REFERENCES

[1] C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, "Massive machine-type communications in 5G: Physical and MAC-layer solutions," IEEE Commun. Mag., vol. 54, no. 9, pp. 59–65, Sept. 2016.

[2] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, "Massive access for 5G and beyond," IEEE J. Sel. Areas Commun. (to appear), 2020.

[3] L. Liu, E. G. Larsson, W. Yu, P. Popovski, Č. Stefanović, and E. de Carvalho, "Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the internet of things," IEEE Signal Process. Mag., vol. 35, no. 5, pp. 88–99, Sept. 2018.

[4] K. Senel and E. G. Larsson, "Grant-free massive MTC-enabled massive MIMO: A compressive sensing approach," IEEE Trans. Commun., vol. 66, no. 12, pp. 6164–6175, Dec. 2018.

[5] L. Liu and W. Yu, "Massive connectivity with massive MIMO - Part I: Device activity detection and channel estimation," IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, June 2018.

[6] Z. Chen, F. Sohrabi, and W. Yu, "Sparse activity detection for massive connectivity," IEEE Trans. Signal Process., vol. 66, no. 7, pp. 1890–1904, Apr. 2018.

[7] L. Liu and Y.-F. Liu, "An efficient algorithm for device detection and channel estimation in asynchronous IoT systems," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Toronto, Canada, 2021. [Online]. Available: https://arxiv.org/abs/2010.09979

[8] Z. Chen, F. Sohrabi, Y.-F. Liu, and W. Yu, "Covariance based joint activity and data detection for massive random access with massive MIMO," in Proc. IEEE Int. Conf. Commun. (ICC), Shanghai, China, May 2019, pp. 1–6.

[9] Z. Chen, F. Sohrabi, Y.-F. Liu, and W. Yu, "Phase transition analysis for covariance based massive random access with massive MIMO," 2020. [Online]. Available: https://arxiv.org/abs/2003.04175

[10] S. Haghighatshoar, P. Jung, and G. Caire, "Improved scaling law for activity detection in massive MIMO systems," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, June 2018, pp. 381–385.

[11] A. Fengler, G. Caire, P. Jung, and S. Haghighatshoar, "Massive MIMO unsourced random access," 2019. [Online]. Available: http://arxiv.org/abs/1901.00828

[12] X. Shao, X. Chen, D. W. K. Ng, C. Zhong, and Z. Zhang, "Covariance-based cooperative activity detection for massive grant-free random access," 2020. [Online]. Available: https://arxiv.org/abs/2008.10155

[13] D. Jiang and Y. Cui, "ML estimation and MAP estimation for device activities in grant-free random access with interference," in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), 2020, pp. 1–6.

[14] J. Dong, J. Zhang, Y. Shi, and J. H. Wang, "Faster activity and data detection in massive random access: A multi-armed bandit approach," 2020. [Online]. Available: https://arxiv.org/abs/2001.10237

[15] D. P. Wipf and B. D. Rao, "An empirical Bayesian strategy for solving the simultaneous sparse approximation problem," IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3704–3716, July 2007.

[16] Z. Yang, J. Li, P. Stoica, and L. Xie, "Sparse methods for direction-of-arrival estimation," in Academic Press Library in Signal Processing. Elsevier, 2018, vol. 7, pp. 509–581.

[17] E. G. Birgin, J. M. Martínez, and M. Raydan, "Nonmonotone spectral projected gradient methods on convex sets," SIAM J. Optim., vol. 10, no. 4, pp. 1196–1211, 2000.

[18] J. Barzilai and J. M. Borwein, "Two-point step size gradient methods," IMA J. Numer. Anal., vol. 8, no. 1, pp. 141–148, 1988.

[19] Y.-H. Dai and R. Fletcher, "Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming,"