Deep-Learned Approximate Message Passing for Asynchronous Massive Connectivity
Weifeng Zhu, Meixia Tao, Fellow, IEEE, Xiaojun Yuan, Senior Member, IEEE, and Yunfeng Guan
Abstract
This paper considers the massive connectivity problem in an asynchronous grant-free random access system, where a huge number of devices sporadically transmit data to a base station (BS) with imperfect synchronization. The goal is to design algorithms for joint user activity detection, delay detection, and channel estimation. By exploiting the sparsity in both user activity and delays, we formulate a hierarchical sparse signal recovery problem in both the single-antenna and the multiple-antenna scenarios. While traditional compressed sensing algorithms can be applied to these problems, they suffer from high computational complexity and often require perfect statistical information of the channels and devices. This paper addresses these issues by designing the Learned Approximate Message Passing (LAMP) network, which belongs to the class of model-driven deep learning approaches and ensures efficient performance without a tremendous amount of training data. In particular, in the multiple-antenna scenario, we design three different LAMP structures, namely distributed, centralized, and hybrid ones, to balance performance and complexity. Simulation results demonstrate that the proposed LAMP networks can significantly outperform the conventional AMP method thanks to their ability to learn parameters. It is also shown that LAMP is robust to the maximal delay spread of the asynchronous users.
Index Terms
Asynchronous massive connectivity, grant-free random access, massive machine-type communication, compressed sensing, approximate message passing, deep learning
This paper was presented in part at the IEEE International Conference on Communications (ICC) 2020 [1]. W. Zhu, M. Tao, and Y. Guan are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: {wf.zhu, mxtao, yfguan69}@sjtu.edu.cn). X. Yuan is with the Center for Intelligent Networking and Communication (CINC), University of Electronic Science and Technology of China, Chengdu 610000, China (e-mail: [email protected]).

I. INTRODUCTION
The fifth generation (5G) of wireless cellular networks has identified massive machine-type communications (mMTC) as one of its core services [2], [3]. The mMTC service is expected to provide cellular connectivity to a large number of low-cost machine-type devices for Internet of Things (IoT) applications. A key feature of IoT traffic is that the uplink transmission is usually sporadic and consists of short packets, so that only a small and random subset of devices is active at any time, each for a short while [4], [5]. The main challenge in supporting mMTC services is therefore to design new multiple-access schemes that can facilitate user activity detection, channel estimation, and data detection in a timely and accurate manner.

Grant-free (GF) random access is a promising way to establish sporadic connections between machine-type devices and their associated base stations (BSs) with minimal control overhead [5]. In GF random access, each active device directly transmits a unique pilot sequence followed by data packets without asking for permission from the BS. The pilot sequences are pre-designed for user identification. They are often non-orthogonal due to the large number of devices but limited time-frequency resources. In each time slot, the BS needs to identify all the active users by detecting which pilots are received, and then estimate their channels for data detection.

Note that many existing works on user activity detection in GF transmission assume that the transmissions of all the active devices are perfectly synchronized [5]–[14]. In practice, low-cost IoT devices usually operate in a narrow-band system, and they have bursty transmissions and inconsistent timing accuracies. Due to its large overhead, the conventional synchronization mechanism is hard to employ among a massive number of low-cost devices. Thus, it is necessary to consider a scalable scheme that tackles imperfect synchronization. If not appropriately handled, such asynchrony in grant-free random access may severely deteriorate the performance of user activity detection and channel estimation.

The goal of this paper is to investigate joint user activity detection, delay detection, and channel estimation in asynchronous massive access systems. By exploiting the sparsity in both the device activity pattern and the transmission delay pattern, this paper formulates a hierarchical sparse signal recovery problem. We solve the problem by designing the Learned Approximate Message Passing (LAMP) network [15], which offers efficient computation and outstanding recoverability. The LAMP network is obtained by unfolding approximate message passing (AMP) into a feedforward network, where the parameters of the AMP framework are learned to enhance performance with a medium-sized training data set. Both the single-antenna scenario and the multiple-antenna scenario are considered.
A. Related Work
The user activity detection and channel estimation in GF random access schemes are often cast as sparse signal recovery problems in compressed sensing (CS). The works [6], [7] propose to jointly detect the active users and estimate their channels without prior knowledge of the channel state information (CSI) by using orthogonal matching pursuit (OMP) and basis pursuit denoising (BPDN). By exploiting the statistical information of the channels and users, the computationally efficient AMP algorithm [16] is adopted in [8]–[11]. In particular, massive MIMO techniques are considered in [9], [10], and their analysis demonstrates that the user activity detection error can be driven to zero asymptotically as the number of antennas goes to infinity. Recently, a covariance matching algorithm has been proposed in [12] that is capable of detecting a much larger number of active users in the massive MIMO scenario. By exploiting BS cooperation, the works [13], [14] investigate sparse activity detection in multi-cell systems based on AMP-based algorithms to further improve the detection performance. In some cases, however, the BS is only interested in the transmitted information rather than the user identities, which motivates unsourced random access [17]. In such a scenario, all users share a common codebook and the BS applies a CS-based algorithm to decode the transmitted messages without detecting the user activity [17]–[19]. Note that the aforementioned works all assume that the transmit signals from all devices are perfectly synchronized when arriving at the receiver.

Several existing works have attempted to investigate the massive random access problem with imperfect synchronization. In contrast to [20], the uplink transmissions from different users to the BS are usually considered to be asynchronous in massive access. The work [21] introduces a blank time interval between the pilot and data whose length is large enough that the pilot detection and the data detection of different active users do not interfere with each other. In [22], a zero-padding approach is adopted to avoid interference between data sub-blocks with different indexes from different active users. In both [21], [22], OMP-based algorithms are proposed to detect the active users, whose complexity is high when the number of users is large. The work [23] introduces a simple signal model with no pilot for asynchronous systems. The user activity and data are then jointly detected based on the Turbo bilinear generalized approximate message passing (Turbo-BiG-AMP) algorithm, where the exact statistical information of the system is usually required.

Recently, deep learning (DL) has emerged as another powerful approach for user detection and channel estimation, in which a deep neural network (DNN) is trained on a vast amount of labeled training data. Previous works [24], [25] have shown that DL approaches have the potential to offer improvements in both recoverability and complexity. In contrast to traditional CS-based algorithms, which usually detect the active users based on their estimated channel power, DL approaches can directly provide a more accurate user activity detection solution and thereby further improve the channel estimation. In particular, [24] proposes a block-restrict neural network (BRNN) for fast multiuser detection, followed by channel estimation.
On the other hand, the system statistics employed in some iterative algorithms, e.g., message passing-based algorithms, may not be precisely estimated, and the approximations in these algorithms are also likely to be inaccurate in some cases. Therefore, DL can also be integrated into these algorithms to improve their performance by taking advantage of its parameter learning ability. In [25], a deep neural network-aided message passing-based block sparse Bayesian learning (DNN-MP-BSBL) algorithm is proposed, which achieves better channel estimation accuracy with fewer iterations than MP-BSBL. These existing DL approaches are all designed for synchronous systems. To the best of our knowledge, DL-based approaches have not yet been applied to asynchronous systems.
B. Contributions
This paper considers joint user activity detection, delay detection, and channel estimation in an asynchronous grant-free massive random access system. We adopt a transmission model with a guard time inserted between the pilot and data signals to capture both the sporadic communication pattern and the delay pattern of the asynchronous users, which has also been used in [21], [26]. We then formulate a hierarchical sparse signal recovery problem based on this signal model. Depending on whether the BS has one or multiple antennas, the joint user activity detection, delay detection, and channel estimation can be reformulated as a single measurement vector (SMV) problem or a multiple measurement vector (MMV) problem. In this work, we propose to solve these problems with the LAMP network [15], which leverages deep learning techniques and the AMP framework to offer improvements in both recoverability and complexity. Compared with other DL approaches [27], [28], the LAMP network allows feasible performance analysis and achieves better recoverability with more flexible shrinkage function choices. To exploit the common sparsity of the channels at all antennas, this paper further designs LAMP networks for the multiple-antenna scenario. Simulation results show that LAMP can outperform AMP by benefiting from deep learning techniques. In the multiple-antenna scenario, the LAMP network can be further improved by jointly estimating the channel coefficients at all antennas. It is also observed that the performance of the proposed LAMP networks is robust to the maximal symbol delay of the users.

The main contributions of this work are summarized as follows:

• We formulate a hierarchical sparse signal recovery problem with two-level sparsity in the user activity pattern and the delay pattern to perform joint user activity detection, delay detection, and channel estimation.

• We first design a LAMP network for the single-antenna scenario. Two types of learnable shrinkage functions, the soft thresholding function and the MMSE-optimal denoising function, are designed to improve the recovery performance of LAMP by learning their shrinkage parameters from the training data. In particular, the learnable MMSE-optimal denoising function takes the delay-level sparsity into account, which further enhances the performance. Moreover, the performance degradation caused by a measurement matrix with non-independent and identically distributed (i.i.d.) elements can also be mitigated by the matched-filter learning ability of LAMP.

• We extend the design to LAMP networks for the multiple-antenna scenario. Three network structures are designed to balance complexity and recoverability, namely distributed LAMP, centralized LAMP, and hybrid LAMP. Distributed LAMP is designed for situations with complexity limits, while centralized LAMP achieves the best performance among the three networks and admits feasible performance analysis. Hybrid LAMP balances performance and complexity so that it can be employed in more complicated systems.
C. Organizations and Notations
The remainder of this paper is organized as follows. Section II introduces the system model of the asynchronous massive connectivity system. In Section III, we formulate a hierarchical sparse signal recovery problem and introduce the basics of AMP. In Section IV, we present the LAMP network design for the single-antenna scenario. In Section V, the LAMP networks for the multiple-antenna scenario are designed. The performance of the proposed approaches is illustrated in Section VI. Finally, we conclude the paper in Section VII.
[Fig. 1. Frame structure: users $i$, $k$, and $j$ transmit asynchronously; each active user sends a pilot sequence $\mathbf{s}_i$ of length $L$, followed by a guard time of length $T_g$ and then the data $\mathbf{d}_i$. The pilot phase has expanded length $\tilde{L} = L + T_g$, and each symbol delay satisfies $t_i \leq D$.]
In this paper, upper-case and lower-case letters denote random variables and their realizations, respectively. Boldface lower-case, boldface upper-case, and calligraphic letters $\mathbf{x}$, $\mathbf{X}$, $\mathcal{X}$ denote a vector, a matrix, and a set, respectively. The superscript $(\cdot)^T$ denotes the transpose. In addition, $\mathbf{1}_N$ denotes the all-one vector of length $N$. Further, $\mathbb{E}[\cdot]$ denotes the expectation operation; $|\cdot|$ denotes the magnitude of a variable or the cardinality of a set, depending on the context; $\|\cdot\|_p$ denotes the $\ell_p$ norm of a vector; and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

II. SYSTEM MODEL
We consider an asynchronous massive access communication system consisting of one BS, equipped with a single antenna or multiple antennas, and a very large number $N$ of single-antenna users. In each transmission frame, each user sporadically transmits data to the BS with a small and fixed probability $p_a$, and the signals of the active users arrive at the BS with different and unknown time delays. The transmit power is identical for each user. Here, we focus on the system shown in Fig. 1, where the user signals are asynchronous at the frame level but synchronous at the symbol level. Specifically, the whole frame can be divided into many symbol intervals, and each symbol occupies exactly one symbol interval, but the signal sequence may not be calibrated to the start of the frame. This means that the delay of each user is an integer number of symbol intervals. The maximal symbol delays of all users are assumed to be much smaller than the frame length.

We adopt a grant-free random access scheme with a two-phase transmission consisting of a pilot phase and a data phase. The channel is assumed to be block fading, so that it remains unchanged within one frame and varies across frames. Specifically, Rayleigh fading channels are considered. Each user $n$ is assigned a unique pilot sequence $\mathbf{s}_n = [s_{n,1}, s_{n,2}, \ldots, s_{n,L}]^T \in \mathbb{C}^{L \times 1}$ for identification and channel estimation, where the pilot length $L$ is much smaller than the total number of users $N$, i.e., $L \ll N$. We generate the elements of all pilot sequences from the i.i.d. Gaussian distribution with zero mean and variance $1/L$, and then normalize each pilot sequence to have unit power. Denote by $t_n$ the symbol delay of the signal of user $n$ when it arrives at the BS. We assume that $t_n$ is unknown, discrete, and uniformly distributed in the set $\{0, 1, \ldots, D\}$, with $D$ being the maximal symbol delay spread. Note that there is no power transmission in the guard interval. The frame structure shown in Fig. 1 contains a guard time between the pilot and the data, where the length of the guard time, denoted by $T_g \in \mathbb{Z}^+$, is chosen to be equal to or larger than $D$, i.e., $T_g \geq D$. With such a $T_g$, the pilot transmission of each user does not overlap the data transmission of any other user. Though the maximal delay may not be known accurately, we can set $T_g$ to a sufficiently large value. In this paper, we simply set the length of the guard time equal to the known maximal delay, i.e., $T_g = D$, without affecting the algorithm design. Here, we assume that the BS knows neither the explicit value of the user active probability nor the channel statistics, while a large data set has been pre-collected so that these system statistics can be exploited implicitly. Let $\lambda_n \in \{0, 1\}$ denote whether user $n$ is active or not. We take both the random user activity $\lambda_n$ and the unknown symbol delay $t_n$ into account in the signal model.
For notational simplicity, we define $\tilde{L} = L + T_g$ and $\tilde{N} = N(T_g + 1)$, where $\tilde{L}$ can be viewed as the expanded pilot length.

When a single antenna is equipped at the BS, the received signal at the BS in the pilot phase can be expressed as
$$\mathbf{y} = \sum_{n=1}^{N} \tilde{\mathbf{s}}_{n, t_n+1} \lambda_n h_n + \mathbf{z}, \qquad (1)$$
where $\tilde{\mathbf{s}}_{n, t_n+1} \in \mathbb{R}^{\tilde{L} \times 1}$ is the expanded pilot sequence of user $n$ obtained by adding $t_n$ zeros before $\mathbf{s}_n$ and $(T_g - t_n)$ zeros after $\mathbf{s}_n$, i.e., $\tilde{\mathbf{s}}_{n, t_n+1} = [\mathbf{0}_{t_n}^T, \mathbf{s}_n^T, \mathbf{0}_{T_g - t_n}^T]^T$; $h_n = \sqrt{\varphi_n} g_n \in \mathbb{R}$ denotes the channel coefficient between the BS and user $n$, where $\varphi_n$ represents the large-scale fading attenuation and $g_n$ is the normalized small-scale fading with zero mean; and $\mathbf{z} \in \mathbb{R}^{\tilde{L} \times 1}$ denotes the additive Gaussian noise vector with variance $\sigma_z^2$, normalized by the transmit power.

Define $\tilde{\boldsymbol{\lambda}}_n = [\lambda_{n,1}, \lambda_{n,2}, \ldots, \lambda_{n,T_g+1}]^T \in \mathbb{R}^{(T_g+1) \times 1}$ to indicate both the activity state and the symbol delay of user $n$, where there is at most one non-zero entry. Specifically, we have $\lambda_{n,t} = 0, \forall t \in \{1, 2, \ldots, T_g+1\}$ if user $n$ is inactive; otherwise, if user $n$ is active and has a symbol delay of $t_n$, we have
$$\lambda_{n,t} = \begin{cases} 1, & t = t_n + 1, \\ 0, & \text{otherwise}. \end{cases} \qquad (2)$$

The received signal $\mathbf{y}$ in (1) can be rewritten in matrix-vector form as
$$\mathbf{y} = \tilde{\mathbf{S}} \tilde{\boldsymbol{\Lambda}} \mathbf{h} + \mathbf{z} = \tilde{\mathbf{S}} \mathbf{x} + \mathbf{z}, \qquad (3)$$
where $\tilde{\mathbf{S}} = [\tilde{\mathbf{S}}_1, \tilde{\mathbf{S}}_2, \ldots, \tilde{\mathbf{S}}_N] \in \mathbb{R}^{\tilde{L} \times \tilde{N}}$ is the expanded pilot matrix of all users, with $\tilde{\mathbf{S}}_n = [\tilde{\mathbf{s}}_{n,1}, \tilde{\mathbf{s}}_{n,2}, \ldots, \tilde{\mathbf{s}}_{n,T_g+1}] \in \mathbb{R}^{\tilde{L} \times (T_g+1)}$ denoting the expanded pilot matrix of user $n$; $\tilde{\boldsymbol{\Lambda}} = \mathrm{diag}(\tilde{\boldsymbol{\Lambda}}_1, \tilde{\boldsymbol{\Lambda}}_2, \ldots, \tilde{\boldsymbol{\Lambda}}_N) \in \mathbb{R}^{\tilde{N} \times \tilde{N}}$ is the indicator matrix with $\tilde{\boldsymbol{\Lambda}}_n = \mathrm{diag}(\tilde{\boldsymbol{\lambda}}_n) \in \mathbb{R}^{(T_g+1) \times (T_g+1)}$; $\mathbf{h} = [h_1 \mathbf{1}_{T_g+1}^T, h_2 \mathbf{1}_{T_g+1}^T, \ldots, h_N \mathbf{1}_{T_g+1}^T]^T \in \mathbb{R}^{\tilde{N} \times 1}$ is the overall channel vector; and, finally, $\mathbf{x} = \tilde{\boldsymbol{\Lambda}} \mathbf{h} \in \mathbb{R}^{\tilde{N} \times 1}$ represents the effective channel vector of all users, which contains the information of the user activity pattern, the channel gains, and the symbol delays.

Likewise, when there are $M$ antennas at the BS, the received signal $\mathbf{Y}$ during the pilot phase can be written as
$$\mathbf{Y} = \tilde{\mathbf{S}} \tilde{\boldsymbol{\Lambda}} \mathbf{H} + \mathbf{Z} = \tilde{\mathbf{S}} \mathbf{X} + \mathbf{Z}, \qquad (4)$$
where $\mathbf{H} = [\mathbf{h}_1 \mathbf{1}_{T_g+1}^T, \mathbf{h}_2 \mathbf{1}_{T_g+1}^T, \ldots, \mathbf{h}_N \mathbf{1}_{T_g+1}^T]^T \in \mathbb{R}^{\tilde{N} \times M}$ is the overall channel matrix and $\mathbf{h}_n = \sqrt{\varphi_n} \mathbf{g}_n \in \mathbb{R}^{M \times 1}$ is the channel vector between the BS and user $n$, with $\mathbf{g}_n$ denoting the small-scale fading vector whose elements are assumed to be i.i.d.; $\mathbf{X} = \tilde{\boldsymbol{\Lambda}} \mathbf{H} \in \mathbb{R}^{\tilde{N} \times M}$ represents the effective channel matrix of all users; and, finally, $\mathbf{Z} \in \mathbb{R}^{\tilde{L} \times M}$ denotes the additive Gaussian noise matrix normalized by the transmit power.

The problem is to recover $\mathbf{x}$ or $\mathbf{X}$ from the received signal $\mathbf{y}$ in (3) or $\mathbf{Y}$ in (4), respectively, given the measurement matrix $\tilde{\mathbf{S}}$. This is a classic underdetermined linear inverse problem in the SMV form or the MMV form. Due to the sporadic communication pattern and the single symbol delay of each asynchronous user, the effective channel $\mathbf{x}$ or $\mathbf{X}$ has a hierarchical sparse representation. As shown in Fig. 2, the hierarchical sparsity consists of two levels: the user-level sparsity and the delay-level sparsity. The user-level sparsity means that most $\mathbf{x}_n = [x_{n,1}, \ldots, x_{n,T_g+1}]^T = [\lambda_{n,1} h_n, \ldots, \lambda_{n,T_g+1} h_n]^T$ in (3) or $\mathbf{X}_n = [\mathbf{x}_{n,1}, \ldots, \mathbf{x}_{n,T_g+1}]^T = [\lambda_{n,1} \mathbf{h}_n, \ldots, \lambda_{n,T_g+1} \mathbf{h}_n]^T$ in (4) are zero, while the delay-level sparsity enforces that there is only one non-zero element in a non-zero $\mathbf{x}_n$, or only one non-zero row in a non-zero $\mathbf{X}_n$. Therefore, the underdetermined problem can be solved based on CS algorithms.
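To make the hierarchical model concrete, the following NumPy sketch generates the expanded pilot matrix $\tilde{\mathbf{S}}$ and a received signal according to (1)–(3). All numerical values (user count, pilot length, noise level) are illustrative assumptions, and a real-valued channel with $\varphi_n = 1$ is used for simplicity.

```python
import numpy as np

# Minimal sketch of the asynchronous signal model in (1)-(3).
N, L, Tg, pa = 100, 40, 3, 0.1          # users, pilot length, guard time, active prob. (illustrative)
L_t, N_t = L + Tg, N * (Tg + 1)         # expanded pilot length and signal dimension

rng = np.random.default_rng(0)
S = rng.normal(0.0, np.sqrt(1.0 / L), size=(L, N))
S /= np.linalg.norm(S, axis=0)          # normalize each pilot sequence to unit power

# Expanded pilot matrix S_tilde: column (n, t) is s_n shifted down by t symbols.
S_tilde = np.zeros((L_t, N_t))
for n in range(N):
    for t in range(Tg + 1):
        S_tilde[t:t + L, n * (Tg + 1) + t] = S[:, n]

# Effective channel x: each active user gets one non-zero entry at its delay slot.
x = np.zeros(N_t)
active = rng.random(N) < pa
delays = rng.integers(0, Tg + 1, size=N)
for n in np.flatnonzero(active):
    x[n * (Tg + 1) + delays[n]] = rng.normal()   # h_n with phi_n = 1

sigma_z = 0.1
y = S_tilde @ x + sigma_z * rng.normal(size=L_t)  # received pilot signal, cf. (3)
```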
III. PROBLEM FORMULATION AND AMP ALGORITHM
In this section, we formulate our problem of joint user activity detection, delay detection, and channel estimation in both the SMV and MMV forms. Then we introduce the AMP algorithm to solve the formulated problems. Note that, though the measurement matrix $\tilde{\mathbf{S}}$ may not satisfy the restricted isometry property (RIP), simulation results show that the problem can be well solved by CS-based algorithms.

[Fig. 2. Two-level sparse structure of the effective channel: the user-level sparsity (most groups $\mathbf{x}_n$ or $\mathbf{X}_n$ are zero) and the delay-level sparsity (each non-zero group has a single non-zero entry $\lambda_{n,t_n+1} h_n$, or a single non-zero row $\lambda_{n,t_n+1} \mathbf{h}_n^T$ in the MMV case).]
A. SMV problem for single antenna
We first consider the problem in the single-antenna scenario, which is an SMV problem. By accounting for the user-level sparsity and the delay-level sparsity in the effective channel vector, the problem is formulated as
$$\min_{\mathbf{x}} \; \|\mathbf{y} - \tilde{\mathbf{S}} \mathbf{x}\|_2^2 \qquad (5a)$$
$$\text{s.t.} \; \|\mathbf{x}\|_0 \leq C, \qquad (5b)$$
$$\|\mathbf{x}_n\|_0 \leq 1, \; n = 1, 2, \ldots, N, \qquad (5c)$$
where the constant $C \ll N$ denotes the maximal number of active users in one frame. The constraint (5b) comes from the user-level sparsity, with $\|\mathbf{x}\|_0 = \sum_{n=1}^{N} \|\mathbf{x}_n\|_0$, and $\|\mathbf{x}_n\|_0$ indicates whether user $n$ is active or not. The constraint (5c) expresses the delay-level sparsity, which ensures that each active user has only one single symbol delay.

The problem (5) is very difficult to solve due to the non-smooth constraints. We therefore reformulate the problem as
$$\min_{\mathbf{x}} \; \beta \|\mathbf{x}\|_0 + \frac{1}{2} \|\mathbf{y} - \tilde{\mathbf{S}} \mathbf{x}\|_2^2. \qquad (6)$$
Note that problem (6) can give a sparse channel estimate $\mathbf{x}$ close to the optimal solution of (5), though the constraint (5c) is not surely guaranteed. To refine the solution of (6), we introduce a common element selection operation that forces all elements except the one with the largest magnitude in each group to zero after the solution is obtained. User activity detection can then be performed by comparing the channel power with a predefined decision threshold $q_{th}$. The trade-off between the sparsity of the solution and the residual $\|\mathbf{y} - \tilde{\mathbf{S}} \mathbf{x}\|_2^2$ is adjusted by the value of the tuning parameter $\beta$. We then adopt the common strategy of relaxing the objective into a convex function by using the $\ell_1$ norm:
$$\min_{\mathbf{x}} \; \beta \|\mathbf{x}\|_1 + \frac{1}{2} \|\mathbf{y} - \tilde{\mathbf{S}} \mathbf{x}\|_2^2. \qquad (7)$$
Problem (7) is a LASSO problem and can be directly solved by BPDN using an interior point method with computational complexity $O(\tilde{N}^{3.5})$. OMP-based greedy methods can be used as well, but they involve a matrix inverse operation in each iteration, whose computational complexity is $O(K^3)$ per iteration with $K$ being the expected number of active users; this is still high in a large-scale system. The recently proposed AMP-based algorithms have much lower complexity in large-scale systems, namely $O(\tilde{L}\tilde{N})$ per iteration, while approaching the performance of LASSO asymptotically. Thus, we propose to employ AMP-based algorithms for our problem.

The AMP algorithm for the considered problem in the single-antenna scenario is described as follows. It starts with $\hat{\mathbf{x}}^0 = \mathbf{0}$ and $\mathbf{v}^0 = \mathbf{y}$, and performs the following computations in the $i$-th iteration [16]:
$$\hat{\mathbf{r}}^i = \hat{\mathbf{x}}^{i-1} + \tilde{\mathbf{S}}^T \mathbf{v}^{i-1}, \qquad (8a)$$
$$\hat{\mathbf{x}}^i = \boldsymbol{\eta}(\hat{\mathbf{r}}^i; \sigma_{i-1}, \boldsymbol{\vartheta}), \qquad (8b)$$
$$\mathbf{v}^i = \mathbf{y} - \tilde{\mathbf{S}} \hat{\mathbf{x}}^i + \frac{1}{\tilde{L}} b^i \mathbf{v}^{i-1}, \qquad (8c)$$
where $\sigma_{i-1} = \frac{1}{\sqrt{\tilde{L}}} \|\mathbf{v}^{i-1}\|_2$ is the estimated standard deviation of the corrupting noise in $\hat{\mathbf{r}}^i$; the variable $b^i = \sum_{n=1}^{N} \sum_{t=1}^{T_g+1} \frac{\partial [\boldsymbol{\eta}(\mathbf{r}; \sigma_{i-1}, \boldsymbol{\vartheta})]_{n,t}}{\partial r_{n,t}} \big|_{\mathbf{r} = \hat{\mathbf{r}}^i}$ is calculated to obtain the "Onsager correction" term $\frac{1}{\tilde{L}} b^i \mathbf{v}^{i-1}$; and the shrinkage function $\boldsymbol{\eta}(\cdot)$ is usually a non-linear component-wise function operating on each element of $\hat{\mathbf{r}}^i$ individually, with $\boldsymbol{\vartheta}$ being its parameter set. Note that any Lipschitz-continuous shrinkage function can be used [15], which enables the use of better-performing shrinkage functions if more prior knowledge of the channels and users is available. For example, a shrinkage function based on the MMSE-optimal criterion can be designed to achieve better recoverability when the system statistics are perfectly known.
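For illustration, the following is a minimal NumPy sketch of the iteration (8a)–(8c) with a fixed soft-thresholding shrinkage; the threshold value `theta` is an illustrative assumption, not a tuned choice.

```python
import numpy as np

def soft_threshold(r, tau):
    """Component-wise soft thresholding."""
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)

def amp_smv(y, S_tilde, num_iters=10, theta=1.4):
    """Sketch of the AMP iteration (8a)-(8c) with soft-thresholding shrinkage."""
    L_t, N_t = S_tilde.shape
    x_hat = np.zeros(N_t)
    v = y.copy()
    for _ in range(num_iters):
        r = x_hat + S_tilde.T @ v                   # (8a) matched filtering
        sigma = np.linalg.norm(v) / np.sqrt(L_t)    # estimated noise standard deviation
        x_hat = soft_threshold(r, theta * sigma)    # (8b) shrinkage
        b = np.count_nonzero(x_hat)                 # sum of shrinkage derivatives
        v = y - S_tilde @ x_hat + (b / L_t) * v     # (8c) Onsager-corrected residual
    return x_hat
```

For the soft-thresholding function, the sum of derivatives in $b^i$ reduces to counting the surviving (non-zero) entries, which is why the Onsager term costs only $O(\tilde{L})$ extra per iteration.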
Since the delay-level sparsity exists in the effective channel vector $\mathbf{x}_n$, a common component-wise function may not achieve the optimal performance of AMP. In the following, we design a non-separable shrinkage function based on the MMSE-optimal criterion that also exploits the delay-level sparsity in the effective channel $\mathbf{x}$. In AMP, the "Onsager correction" term makes the algorithm analyzable by state evolution in the asymptotic regime, i.e., $\tilde{L}, \tilde{N} \to \infty$ with their ratio and $p_a$ fixed, when $\tilde{\mathbf{S}}$ has i.i.d. sub-Gaussian elements [16]. The state evolution is given by
$$\delta_i = \frac{1}{\tilde{L}} \mathbb{E}_{\mathbf{d}} \left[ \|\boldsymbol{\eta}_{\boldsymbol{\vartheta}, \delta_{i-1}}(\mathbf{x} + \delta_{i-1} \mathbf{d}) - \mathbf{x}\|_2^2 \right] + \sigma_z^2, \qquad (9)$$
where $\mathbf{d}$ is a random vector with distribution $\mathcal{N}(\mathbf{0}, \mathbf{I}_{\tilde{N}})$. The input to the shrinkage function in the $i$-th iteration can be modeled as an AWGN-corrupted version of the true vector $\mathbf{x}$, i.e., $\hat{\mathbf{r}}^i = \mathbf{x} + \delta_{i-1} \mathbf{d}$, with estimated variance $\sigma_{i-1}^2$. The variable $\delta_{i-1}$ is usually estimated by the empirical value $\sigma_{i-1}$, i.e., $\hat{\delta}_{i-1} = \sigma_{i-1}$.

B. MMV problem for multiple antennas
When the BS has multiple antennas, the considered problem becomes an MMV problem, which is formulated as
$$\min_{\mathbf{X}} \; \|\mathbf{Y} - \tilde{\mathbf{S}} \mathbf{X}\|_F^2 \qquad (10a)$$
$$\text{s.t.} \; \sum_{n=1}^{N} \sum_{t=1}^{T_g+1} \mathbb{I}(\mathbf{x}_{n,t}) \leq C, \qquad (10b)$$
$$\sum_{t=1}^{T_g+1} \mathbb{I}(\mathbf{x}_{n,t}) \leq 1, \; n = 1, 2, \ldots, N, \qquad (10c)$$
where $\mathbb{I}(\cdot)$ is the indicator function with boolean output defined as
$$\mathbb{I}(\mathbf{x}) = \begin{cases} 1, & \text{if } \mathbf{x} \text{ has non-zero elements}, \\ 0, & \text{otherwise}. \end{cases} \qquad (11)$$
Problem (10) is also difficult to solve directly. Following operations similar to those for the SMV problem, we can reformulate the problem as
$$\min_{\mathbf{X}} \; \beta \sum_{n=1}^{N} \sum_{t=1}^{T_g+1} \|\mathbf{x}_{n,t}\|_2 + \frac{1}{2} \|\mathbf{Y} - \tilde{\mathbf{S}} \mathbf{X}\|_F^2. \qquad (12)$$
Here the indicator function is replaced by the $\ell_2$ norm, and the problem is relaxed to an $\ell_{2,1}$-norm regularized least squares (RLS) problem [29]. To solve the relaxed problem, conventional convex solvers and greedy algorithms still suffer from high computational complexity. Two recently proposed AMP-based algorithms can be applied to solve the MMV problem (12) with affordable complexity; a small sketch of the relaxed objective is given below.
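As a small illustration of the relaxation, the sketch below evaluates the $\ell_{2,1}$-regularized objective in (12) for a candidate estimate $\mathbf{X}$; it is a direct transcription of the formula, not a solver.

```python
import numpy as np

def l21_objective(X, Y, S_tilde, beta):
    """Value of the relaxed MMV objective (12): the l_{2,1} group penalty on the
    rows x_{n,t} of X plus the squared-Frobenius data-fit term."""
    penalty = np.linalg.norm(X, axis=1).sum()          # sum of row 2-norms
    fit = 0.5 * np.linalg.norm(Y - S_tilde @ X, 'fro') ** 2
    return beta * penalty + fit
```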
1) Parallel AMP-MMV:
The parallel AMP-MMV algorithm proposed in [30] solves the MMV problem in a distributed way. In each iteration, it first estimates the channel coefficients at the $M$ antennas separately and then exchanges soft information about user activity among the antennas. Specifically, each iteration of parallel AMP-MMV consists of four distinct phases, labeled with the mnemonics (into), (within), (out), and (across). In phase (into), the current beliefs about the user activity are calculated and conveyed into each AMP-SMV solver. In phase (within), each of the $M$ AMP-SMV solvers, given the current beliefs, follows (8) to solve the SMV problem of estimating the effective channel coefficients at the corresponding antenna in parallel. In phase (out), the solutions are used to refine the beliefs about the user activity in each AMP-SMV solver. Finally, in phase (across), the beliefs are conveyed across the different AMP-SMV solvers. The algorithm terminates after several iterations. Interested readers may refer to [8], [30] for more details.
2) AMP with vector shrinkage function:
The AMP algorithm equipped with the vector shrinkage functions proposed in [31] solves the MMV problem in a centralized way. Similar to the iterative procedure in (8), this algorithm starts with $\hat{\mathbf{X}}^0 = \mathbf{0}$ and $\mathbf{V}^0 = \mathbf{Y}$, and computes in the $i$-th iteration
$$\hat{\mathbf{R}}^i = \hat{\mathbf{X}}^{i-1} + \tilde{\mathbf{S}}^T \mathbf{V}^{i-1}, \qquad (13a)$$
$$\hat{\mathbf{X}}^i = \boldsymbol{\eta}(\hat{\mathbf{R}}^i; \sigma_{i-1}, \boldsymbol{\vartheta}), \qquad (13b)$$
$$\mathbf{V}^i = \mathbf{Y} - \tilde{\mathbf{S}} \hat{\mathbf{X}}^i + \frac{1}{\tilde{L}} \mathbf{V}^{i-1} \mathbf{B}^i, \qquad (13c)$$
where $\mathbf{R} = [\mathbf{R}_1^T, \ldots, \mathbf{R}_N^T]^T$ with $\mathbf{R}_n = [\mathbf{r}_{n,1}^T, \ldots, \mathbf{r}_{n,T_g+1}^T]^T$ is the input to the vector shrinkage function; $\sigma_{i-1} = \frac{1}{\sqrt{\tilde{L}M}} \|\mathbf{V}^{i-1}\|_F$ is the estimated standard deviation of the corrupting noise; and $\mathbf{B}^i = \sum_{n=1}^{N} \sum_{t=1}^{T_g+1} \frac{\partial [\boldsymbol{\eta}(\mathbf{R}; \sigma_{i-1}, \boldsymbol{\vartheta})]_{n,t}}{\partial \mathbf{r}_{n,t}} \big|_{\mathbf{R} = \hat{\mathbf{R}}^i}$ is calculated for the "Onsager correction" matrix $\frac{1}{\tilde{L}} \mathbf{V}^{i-1} \mathbf{B}^i$. In the asymptotic regime, this algorithm can also be analyzed by the state evolution $\boldsymbol{\Sigma}_i = \delta_i \mathbf{I}$, with $\delta_i$ determined by
$$\delta_i = \frac{1}{\tilde{L}M} \mathbb{E}_{\mathbf{D}} \left[ \|\boldsymbol{\eta}_{\boldsymbol{\vartheta}, \delta_{i-1}}(\mathbf{X} + \delta_{i-1} \mathbf{D}) - \mathbf{X}\|_F^2 \right] + \sigma_z^2, \qquad (14)$$
where each element of the random matrix $\mathbf{D} \in \mathbb{R}^{\tilde{N} \times M}$ follows the i.i.d. Gaussian distribution $\mathcal{N}(0, 1)$. The input to the vector shrinkage function in the $i$-th iteration can also be modeled as an AWGN-corrupted signal, i.e., $\hat{\mathbf{R}}^i = \mathbf{X} + \delta_{i-1} \mathbf{D}$. As before, $\delta_{i-1}$ can be estimated by $\sigma_{i-1}$.
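A minimal sketch of the iteration (13a)–(13c) follows, using a row-wise soft-thresholding function as the vector shrinkage; approximating the Onsager matrix $\mathbf{B}^i$ by a scaled identity computed from the exact per-row Jacobian traces is a simplifying assumption made here for brevity.

```python
import numpy as np

def row_soft_threshold(R, tau):
    """Row-wise vector soft thresholding: shrink each row x_{n,t} jointly."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    return np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0) * R

def amp_mmv(Y, S_tilde, num_iters=10, theta=1.4):
    """Sketch of the MMV AMP iteration (13a)-(13c) with a vector shrinkage function."""
    L_t, N_t = S_tilde.shape
    M = Y.shape[1]
    X_hat = np.zeros((N_t, M))
    V = Y.copy()
    for _ in range(num_iters):
        R = X_hat + S_tilde.T @ V                          # (13a)
        sigma = np.linalg.norm(V) / np.sqrt(L_t * M)       # noise std estimate
        tau = theta * np.sqrt(M) * sigma
        X_hat = row_soft_threshold(R, tau)                 # (13b)
        norms = np.linalg.norm(R, axis=1)
        act = norms > tau                                  # rows surviving the threshold
        # Trace of the row-wise Jacobian summed over rows; B^i is approximated
        # below by (trace / M) * I in the Onsager correction.
        b = np.sum(M - (M - 1) * tau / norms[act]) / M
        V = Y - S_tilde @ X_hat + (b / L_t) * V            # (13c)
    return X_hat
```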
IV. DEEP-LEARNED AMP IN THE SINGLE-ANTENNA SCENARIO
Deep learning is a powerful approach for accurately estimating the sparse vector $\mathbf{x}$ from the received signal $\mathbf{y}$: the network parameters of a DNN are trained to minimize the reconstruction MSE using a large amount of training data $\{(\mathbf{y}_d^T, \mathbf{x}_d^T)\}_{d=1}^{D_T}$ regarded as (feature, label) pairs. Once the training process is completed, the DNN can predict the unknown channels $\mathbf{x}_{\text{New}}$ from a newly received signal $\mathbf{y}_{\text{New}}$. In this section, we introduce the LAMP network, which combines DL techniques with the AMP framework, to solve our problem in the single-antenna scenario.

[Fig. 3. The network structure and the $i$-th layer of the LAMP network in the single-antenna scenario.]

A. Network structure
The LAMP network is built by unfolding the iterations of AMP in (8) into a feedforward neural network. The structure of the LAMP network and the details of its $i$-th layer are shown in Fig. 3; it differs from traditional deep neural networks consisting of multiple layers of perceptrons. The signal flow graph of the LAMP network is the same as that of the AMP algorithm, where $\mathbf{W}^i$ can be regarded as the matched filter matrix and $\boldsymbol{\vartheta}^i$ is the shrinkage parameter set. In the LAMP network, $\{\mathbf{W}^i, \boldsymbol{\vartheta}^i\}_{i=1}^{I}$ are the network parameters, which are learned from the training data. In the training process, the MSE loss function, defined as $\mathcal{L}(\hat{\mathbf{x}}) = \|\hat{\mathbf{x}} - \mathbf{x}\|_2^2$, is used, and the values of the learnable parameters are updated by following the back-propagation rule [32]. The back-propagation rule calculates the gradients $\frac{\partial \mathcal{L}(\mathbf{W}^i, \boldsymbol{\vartheta}^i)}{\partial \mathbf{W}^i}$ and $\frac{\partial \mathcal{L}(\mathbf{W}^i, \boldsymbol{\vartheta}^i)}{\partial \boldsymbol{\vartheta}^i}$ via the chain rule on the input data batch, and then updates the learnable parameters by gradient descent methods.

In LAMP, the shrinkage function also plays an important role. Apart from the commonly used soft thresholding (ST) function and the MMSE-optimal denoising function, many other functions, such as the piecewise linear function and the spline function, can be employed in LAMP as well. Note that the learnable shrinkage parameters in different layers of LAMP can take different values, which improves the denoising ability of each layer. In this paper, both the ST function and the MMSE-optimal denoising function are considered. We omit the iteration index $i$ in the following for simplicity. The ST function is given by
$$[\boldsymbol{\eta}(\hat{\mathbf{r}})]_{n,t} = \left( \hat{r}_{n,t} - \theta_{n,t} \sigma \frac{\hat{r}_{n,t}}{|\hat{r}_{n,t}|} \right) \cdot \mathbb{I}(|\hat{r}_{n,t}| > \theta_{n,t} \sigma), \qquad (15)$$
where $\theta_{n,t}$ is a learnable tuning parameter and $\sigma^2$ is the estimated variance of the corrupting noise in $\hat{\mathbf{r}}$. The learnable shrinkage parameter set is $\boldsymbol{\vartheta} = \{\theta_{n,t}\}_{n=1,t=1}^{N,T_g+1}$. Note that in AMP the value of $\theta_{n,t}$ is usually chosen empirically and hence may not be optimal, while the optimal value of $\theta_{n,t}$ can be learned in LAMP. Usually, all the tuning parameters share the same value, i.e., $\theta_{n,t} = \theta, \forall n, t$. Since the symbol delay of each user is assumed to be uniformly distributed in $\{0, 1, \ldots, T_g\}$, the active probability for each user with delay $t_n$ is $p_{a,T} = \frac{p_a}{T_g+1}$. The effective channel $\mathbf{x}_n$ of user $n$ can then be modeled by the joint distribution
$$p(\mathbf{x}_n) = (1 - p_a) \prod_{t=1}^{T_g+1} \delta(x_{n,t}) + p_{a,T} \sum_{t=1}^{T_g+1} \left( \mathcal{CN}(x_{n,t}; 0, \varphi_n) \prod_{t' \neq t} \delta(x_{n,t'}) \right). \qquad (16)$$
Accordingly, the MMSE-optimal denoising function is designed to recover all elements $x_{n,t}$ in each $\mathbf{x}_n$ jointly, rather than treating them independently as the ST function does. In this way, the delay-level sparsity is taken into account in the LAMP network. Based on the joint probability (16), the MMSE-optimal denoising function is given by
$$\hat{x}_{n,t} = \mathbb{E}[x_{n,t} | \hat{\mathbf{r}}_n] = \frac{\frac{\mathcal{CN}(\hat{r}_{n,t}; 0, \varphi_n + \sigma^2)}{\mathcal{CN}(\hat{r}_{n,t}; 0, \sigma^2)} \cdot \frac{\hat{r}_{n,t}}{1 + \sigma^2/\varphi_n}}{\sum_{t'=1}^{T_g+1} \frac{\mathcal{CN}(\hat{r}_{n,t'}; 0, \sigma^2 + \varphi_n)}{\mathcal{CN}(\hat{r}_{n,t'}; 0, \sigma^2)} + \frac{1 - p_a}{p_{a,T}}} = \frac{\exp\left( \frac{|\hat{r}_{n,t}|^2}{\sigma^2 (1 + \sigma^2/\varphi_n)} \right) \cdot \frac{\hat{r}_{n,t}}{1 + \sigma^2/\varphi_n}}{\sum_{t'=1}^{T_g+1} \exp\left( \frac{|\hat{r}_{n,t'}|^2}{\sigma^2 (1 + \sigma^2/\varphi_n)} \right) + \left(1 + \frac{\varphi_n}{\sigma^2}\right) \frac{1 - p_a}{p_{a,T}}}. \qquad (17)$$
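The following sketch implements the ST function (15) and the group-wise MMSE denoiser (17) for one user's group, assuming real-valued signals for simplicity (the paper's version uses complex Gaussians); the exponentials are rescaled to avoid overflow.

```python
import numpy as np

def st_shrink(r, sigma, theta):
    """Soft-thresholding shrinkage (15) with a common tuning parameter theta."""
    tau = theta * sigma
    return np.where(np.abs(r) > tau, r - tau * np.sign(r), 0.0)

def mmse_denoise_group(r_n, sigma2, phi, pa, Tg):
    """MMSE-optimal denoiser (17) for one user's group r_n (length Tg + 1)."""
    pa_T = pa / (Tg + 1)
    c = phi / (sigma2 * (sigma2 + phi))        # exponent scale in (17)
    expo = np.abs(r_n) ** 2 * c
    e = np.exp(expo - expo.max())              # likelihood ratios, rescaled for stability
    offset = np.exp(-expo.max())               # constant term rescaled identically
    num = e * r_n / (1.0 + sigma2 / phi)
    den = e.sum() + offset * (1.0 + phi / sigma2) * (1.0 - pa) / pa_T
    return num / den
```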
After obtaining (17), the learnable MMSE-optimal denoising function can then be defined as
$$[\boldsymbol{\eta}(\hat{\mathbf{r}})]_{n,t} = \theta_1 \frac{\exp\left( \frac{|\hat{r}_{n,t}|^2}{\sigma^2 (1 + \sigma^2/\theta_{0,n})} \right) \cdot \frac{\hat{r}_{n,t}}{1 + \sigma^2/\theta_{0,n}}}{\sum_{t'=1}^{T_g+1} \exp\left( \frac{|\hat{r}_{n,t'}|^2}{\sigma^2 (1 + \sigma^2/\theta_{0,n})} \right) + \left(1 + \frac{\theta_{0,n}}{\sigma^2}\right) \theta_2} - \theta_3 \hat{r}_{n,t}, \qquad (18)$$
where the learnable shrinkage parameter set is defined as $\boldsymbol{\vartheta} = \{\{\theta_{0,n}\}_{n=1}^{N}, \theta_1, \theta_2, \theta_3\}$, and we set $\theta_{0,n} = \varphi_n$ and $\theta_2 = \frac{1-p_a}{p_{a,T}}$. The parameters $\theta_1$ and $\theta_3$ serve as tuning parameters that mix the linear and non-linear shrinkage functions, which is promising for improving the performance of (17). When applying the MMSE-optimal denoising function in the traditional AMP algorithm, we usually set $\theta_1 = 1$, $\theta_3 = 0$ and determine the values of $\{\{\theta_{0,n}\}_{n=1}^{N}, \theta_2\}$ from the perfectly known system statistics $\{\{\varphi_n\}_{n=1}^{N}, p_a\}$. In the LAMP network, these parameters are all learned from the training data. If all users have the same large-scale channel attenuation, we can set $\theta_{0,n} = \theta_0, \forall n = 1, \ldots, N$; the number of learnable parameters can thus be significantly reduced, given that the total number of users $N$ is very large. The calculation of the Onsager term in the neural network requires the derivative of the shrinkage function, which is
given by
$$\frac{\partial [\boldsymbol{\eta}(\hat{\mathbf{r}})]_{n,t}}{\partial \hat{r}_{n,t}} = \frac{\theta_1}{(1 + \sigma^2/\theta_{0,n})\,[q(\hat{\mathbf{r}}_n)]_t} \left[ 1 + \frac{2\,\theta_{0,n}\, |\hat{r}_{n,t}|^2}{\sigma^2 (\theta_{0,n} + \sigma^2)} \left( 1 - \frac{1}{[q(\hat{\mathbf{r}}_n)]_t} \right) \right] - \theta_3, \qquad (19)$$
where the vector function $q(\hat{\mathbf{r}}_n) = [[q(\hat{\mathbf{r}}_n)]_1, [q(\hat{\mathbf{r}}_n)]_2, \ldots, [q(\hat{\mathbf{r}}_n)]_{T_g+1}]^T \in \mathbb{C}^{(T_g+1) \times 1}$ is defined to simplify the expression of (19). The function $[q(\hat{\mathbf{r}}_n)]_t$ is defined as
$$[q(\hat{\mathbf{r}}_n)]_t = \frac{\sum_{t'=1}^{T_g+1} \exp\left( \frac{|\hat{r}_{n,t'}|^2}{\sigma^2 (1 + \sigma^2/\theta_{0,n})} \right) + \left(1 + \frac{\theta_{0,n}}{\sigma^2}\right) \theta_2}{\exp\left( \frac{|\hat{r}_{n,t}|^2}{\sigma^2 (1 + \sigma^2/\theta_{0,n})} \right)}. \qquad (20)$$

Algorithm 1 Parameter training of the tied LAMP network via the layer-by-layer and denoiser-by-denoiser learning strategy
1: Initialize: $\mathbf{W} = \tilde{\mathbf{S}}^H$ and $\boldsymbol{\vartheta}^i = \boldsymbol{\vartheta}^0$, $i = 1, \ldots, I$.
2: for $i = 1$ to $I$ do
3:   Learn $\{\boldsymbol{\vartheta}^i\}$ with fixed $\{\mathbf{W}, \{\boldsymbol{\vartheta}^l\}_{l=1}^{i-1}\}$ with the loss function $\mathcal{L}(\hat{\mathbf{x}}^i) = \|\hat{\mathbf{x}}^i - \mathbf{x}\|_2^2$.
4:   Re-learn $\{\{\boldsymbol{\vartheta}^l\}_{l=1}^{i}\}$ with the loss function $\mathcal{L}(\hat{\mathbf{x}}^i) = \|\hat{\mathbf{x}}^i - \mathbf{x}\|_2^2$.
5: end for
6: Re-learn $\{\mathbf{W}, \{\boldsymbol{\vartheta}^l\}_{l=1}^{I}\}$ for refinement with the loss function $\mathcal{L}(\hat{\mathbf{x}}^I) = \|\hat{\mathbf{x}}^I - \mathbf{x}\|_2^2$.
7: Return $\{\mathbf{W}, \{\boldsymbol{\vartheta}^i\}_{i=1}^{I}\}$.
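The sketch below mirrors Algorithm 1 in TensorFlow-style code; the training strategy itself is detailed in the next subsection. The `model` interface (per-layer shrinkage variables `model.theta[i]`, a shared matched filter `model.W`, and depth-limited evaluation `model(y, depth=i)`) is a hypothetical assumption introduced for illustration.

```python
import tensorflow as tf

def train_tied_lamp(model, data, num_layers, patience):
    """Sketch of Algorithm 1 (layer-by-layer + denoiser-by-denoiser training).
    `model` is a hypothetical tied LAMP object, see the lead-in above."""
    opt = tf.keras.optimizers.Adam(1e-3)

    def fit(variables, depth):
        best, wait = float("inf"), 0
        for y_batch, x_batch in data:                 # (received, label) minibatches
            with tf.GradientTape() as tape:
                loss = tf.reduce_sum((model(y_batch, depth=depth) - x_batch) ** 2)
            opt.apply_gradients(zip(tape.gradient(loss, variables), variables))
            if float(loss) < best:
                best, wait = float(loss), 0
            else:
                wait += 1
                if wait > patience:                   # epoch ends after T_w non-improving steps
                    break

    for i in range(num_layers):
        fit([model.theta[i]], depth=i + 1)            # line 3: learn the i-th denoiser alone
        fit(list(model.theta[:i + 1]), depth=i + 1)   # line 4: re-learn all denoisers so far
    fit([model.W] + list(model.theta), depth=num_layers)  # line 6: final refinement
```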
B. Parameter training

In the learnable parameter set of the LAMP network, the matched filter matrices $\mathbf{W}^i$ can either be fixed across all layers, i.e., $\mathbf{W}^i = \mathbf{W}$, or vary with the layer index $i$. Accordingly, the LAMP network is referred to as "tied" or "untied", respectively. Intuitively, the tied LAMP network is a special case of the untied LAMP network, and the untied LAMP network is superior to the tied one if there is sufficient training data. However, in our simulation trials, we find that the untied network brings little performance improvement and may even make the network prone to overfitting if the training data size is not large enough. Thus, we employ the tied LAMP network in this work.

Apart from the network structure and the parameter set, the training strategy also determines the performance of the neural network. The standard training strategy is end-to-end training, where all the parameters are updated simultaneously by following the back-propagation rule. However, it is found that the LAMP network with end-to-end training can easily converge to a bad local optimum due to overfitting. The work [33] proposes the layer-by-layer training and the denoiser-by-denoiser training methods to avoid overfitting. In layer-by-layer training, there are in total $I$ epochs. We first train the learnable parameters of the first layer in the first epoch; then we train the 2-layer sub-network consisting of the first two layers in the second epoch. In the $i$-th epoch, the parameters of the $i$-layer sub-network consisting of the first $i$ layers are all trained. The training process repeats until the $I$-th epoch is finished. Specifically, at the start of the training process, we initialize the network parameters as stated in line 1 of Algorithm 1. Then, in each of the following epochs, we first decouple the shrinkage function, which recovers $\mathbf{x}$ from $\hat{\mathbf{r}}$, from the target sub-network and learn its shrinkage parameters $\boldsymbol{\vartheta}^i$, as stated in line 3; this is referred to as denoiser-by-denoiser training. Then all the shrinkage parameters $\{\boldsymbol{\vartheta}^l\}_{l=1}^{i}$ of the target sub-network in the current epoch are updated simultaneously, as stated in line 4. Finally, all the network parameters $\{\mathbf{W}, \{\boldsymbol{\vartheta}^i\}_{i=1}^{I}\}$ are updated for refinement. The whole training process, which integrates denoiser-by-denoiser training into the layer-by-layer training approach for the tied LAMP network, is outlined in Algorithm 1. It should be mentioned that the number of training iterations is not predetermined in each epoch: the $i$-th epoch finishes when the performance of the sub-network being updated has remained worse than its best performance achieved in this epoch for a certain number of iterations $T_w$.

It is proven in [33] that this training strategy enables the Learned denoising-based AMP (LDAMP) network to achieve MMSE optimality in theory, via state evolution, when the following conditions hold:

• The measurement matrix $\tilde{\mathbf{S}}$ has i.i.d. sub-Gaussian elements.
• The noise $\mathbf{z}$ is i.i.d. Gaussian.
• The shrinkage functions $\boldsymbol{\eta}(\cdot)$ are Lipschitz-continuous.

Though the elements of the measurement matrix $\tilde{\mathbf{S}}$ are not i.i.d. sub-Gaussian, we observe from the numerical results that LAMP can still closely approach the optimal performance.

C. Discussion
The LAMP network belongs to the model-driven deep learning category, which mixes hand-designed and data-driven methods [34]. Compared with traditional data-driven deep learning methods that use conventional multi-layer perceptrons or convolutional networks, the LAMP network has two main advantages. First, it does not require a very large volume of training data, owing to its well-designed structure, meaning that the cost of collecting the training data can be reduced. Second, there is no need to re-train the neural network when only the noise variance $\sigma_z^2$ changes, since the noise variance is not used in the AMP algorithm. It should be mentioned, however, that the LAMP network still needs to be re-trained when the other channel statistics change.

[Fig. 4. The proposed three LAMP network structures in the multiple-antenna scenario: (a) the distributed LAMP network, with $M$ parallel LAMP-SMV subnetworks processing $\mathbf{y}_1, \ldots, \mathbf{y}_M$; (b) the centralized LAMP network, a single LAMP network with vector shrinkage function operating on $\mathbf{Y}$; (c) the hybrid LAMP network, with $U$ parallel LAMP subnetworks with vector shrinkage functions.]

V. DEEP-LEARNED AMP IN THE MULTIPLE-ANTENNA SCENARIO
In this section, we design LAMP networks for the multiple-antenna scenario. Inspired by the AMP algorithms for the MMV problem reviewed in Section III-B, we propose three LAMP network structures, namely the distributed, centralized, and hybrid structures, as shown in Fig. 4. The distributed LAMP network (LAMP-D) takes advantage of distributed computation units to reduce the running time, while the centralized LAMP network (LAMP-C) achieves better recoverability and allows theoretical performance analysis via state evolution. The hybrid LAMP network (LAMP-H) combines the advantages of both the distributed and the centralized networks. The details are presented in the following subsections. In the rest of the paper, to avoid confusion, we denote the LAMP network designed in the previous section for the single-antenna case by LAMP-SMV.

A. Distributed LAMP Network
The structure of the LAMP-D network is shown in Fig. 4(a); it comprises $M$ parallel LAMP-SMV subnetworks. The parallel AMP-MMV algorithm adds extra operations of belief refinement and exchange to improve recoverability after solving the $M$ separate SMV problems by AMP in (8). However, the channel statistics are still needed in the AMP estimation for each SMV problem. By employing $M$ independent AMP-SMV solvers with known channel distributions and user active probability, we find that the MMV problem can also be solved well. Thus, we propose to construct the LAMP-D network from $M$ independent LAMP-SMV subnetworks that estimate the channel coefficients at their corresponding antennas in parallel.

In constructing the LAMP-D network, whether the $M$ LAMP-SMV subnetworks should share the same parameter values or have different parameter values needs to be studied first. When the fading coefficients at the antennas of the BS are i.i.d., all the LAMP-SMV subnetworks can share the same parameter values. Therefore, only one LAMP-SMV network is trained in practice, and the learned parameter values are then shared by all LAMP-SMV subnetworks. The received signal and the effective channel are denoted by $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_M]$ and $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M]$, and each pair $(\mathbf{y}_m, \mathbf{x}_m)$ is treated as one training sample, as sketched below. In this way, the quantity of training data for the neural network becomes $M D_T$, where $D_T$ is the number of received signals $\mathbf{Y}$ in the training data set. When the fading coefficients at the antennas are not i.i.d., $M$ independent LAMP-SMV subnetworks need to be trained. Each subnetwork has to be trained on the received signals at the corresponding antenna, so the size of the training data set for each subnetwork is only $D_T$. In this paper, the network in which all LAMP-SMV subnetworks share the same parameter values is adopted, since the channel distributions at the antennas are assumed i.i.d. and the LAMP-SMV subnetwork can then be trained with more training samples to achieve better recovery performance. The LAMP-SMV subnetworks in the LAMP-D network are trained by following Algorithm 1.
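The data arrangement described above can be sketched as follows; the function simply unstacks the antenna columns of each $(\mathbf{Y}, \mathbf{X})$ pair into SMV training samples.

```python
import numpy as np

def smv_training_pairs(Y_list, X_list):
    """Unstack each (Y, X) frame into per-antenna SMV samples: with i.i.d.
    antennas, one shared LAMP-SMV network is trained on all M * D_T pairs."""
    ys = [Y[:, m] for Y in Y_list for m in range(Y.shape[1])]
    xs = [X[:, m] for X in X_list for m in range(X.shape[1])]
    return np.stack(ys), np.stack(xs)
```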
In summary, the LAMP-D network concentrates on improving the recoverability at each single antenna, and the distributed arrangement of subnetworks can reduce the running time through parallel computation. However, the optimal performance of the AMP framework may not be achieved, since the common sparsity in the estimated channel matrix is not fully exploited.

[Fig. 5. The network structure and the $i$-th layer of the LAMP network with vector shrinkage function.]

B. Centralized LAMP Network
The centralized LAMP network contains only one LAMP network with vector shrinkage function, which "unfolds" the iterations of the AMP algorithm with vector shrinkage function. The vector shrinkage function recovers each row $\mathbf{x}_{n,t}$ of the true signal $\mathbf{X}$ as a whole, exploiting the common sparsity in the columns of $\mathbf{X}$, so the LAMP-C network can obtain a more accurate estimate of $\mathbf{X}$ than the LAMP-D network. The network structure and the details of the $i$-th layer of the LAMP network with vector shrinkage function, shown in Fig. 5, are similar to those of LAMP-SMV. In a neural network, the input and output are usually restricted to vector form, while the received signal and the estimated channel are both matrices. Thus, a simple reshaping operation is needed at the input and the output of the network, which is omitted in Fig. 5. In particular, this reshaping also needs to be added at the input and the output of each layer, since the layer-by-layer and denoiser-by-denoiser training strategy is also adopted to train the LAMP-C network. In addition, we again select the MSE function as the loss function of the LAMP-C network, i.e., $\mathcal{L}(\hat{\mathbf{X}}) = \|\mathrm{vec}(\hat{\mathbf{X}}) - \mathrm{vec}(\mathbf{X})\|_2^2 = \|\hat{\mathbf{X}} - \mathbf{X}\|_F^2$. We can also represent the learnable parameter set as $\{\mathbf{W}^i, \boldsymbol{\vartheta}^i\}_{i=1}^{I}$, where $\mathbf{W}^i$ is regarded as the matched filter matrix and $\boldsymbol{\vartheta}^i$ is the learnable parameter set of the vector shrinkage function in the $i$-th layer. We drop the index $i$ to simplify the expressions in the following. In the LAMP network with vector shrinkage function, the ST function and the MMSE-optimal denoising function are also adopted. The ST function can be written as
$$[\boldsymbol{\eta}(\hat{\mathbf{R}})]_{n,t} = \left( \hat{\mathbf{r}}_{n,t} - \theta_1 \sqrt{M} \sigma \frac{\hat{\mathbf{r}}_{n,t}}{\|\hat{\mathbf{r}}_{n,t}\|_2} \right) \cdot \mathbb{I}(\|\hat{\mathbf{r}}_{n,t}\|_2 > \theta_1 \sqrt{M} \sigma), \qquad (21)$$
where $\theta_1$ is the learnable tuning parameter and the learnable parameter set is $\boldsymbol{\vartheta} = \{\theta_1\}$.
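Anticipating the MMSE-optimal vector denoiser given in (23) below, the following sketch applies it to one user's group; it reduces to the single-antenna denoiser (17) when $M = 1$. Real-valued inputs are assumed for simplicity.

```python
import numpy as np

def mmse_denoise_group_mmv(R_n, sigma2, phi, pa, Tg):
    """MMSE-optimal vector denoiser, cf. (23), for one user's group R_n of
    shape (Tg + 1, M); reduces to (17) when M = 1."""
    M = R_n.shape[1]
    pa_T = pa / (Tg + 1)
    c = phi / (sigma2 * (sigma2 + phi))
    expo = np.sum(np.abs(R_n) ** 2, axis=1) * c        # row energies scale the exponents
    e = np.exp(expo - expo.max())                      # rescaled for numerical stability
    offset = np.exp(-expo.max())
    num = e[:, None] * R_n / (1.0 + sigma2 / phi)
    den = e.sum() + offset * (1.0 + phi / sigma2) ** M * (1.0 - pa) / pa_T
    return num / den
```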
In each layer of the LAMP-C network, the Jacobi matrix of thevector shrinkage function also needs to be calculated to obtain the Onsager term, which can bewritten as ∂ [ ηηη ( (cid:98) R )] n,t ∂ (cid:98) r n,t = θ [ Q ( (cid:98) R n )] t I M + θ ,n σ ( θ ,n + σ ) ([ Q ( (cid:98) R n )] t − (cid:98) r Hn,t (cid:98) r n,t (cid:16) σ θ ,n (cid:17) [ Q ( (cid:98) R n )] t − θ I M , (24)where Q ( R n ) = [[ Q ( R n )] , [ Q ( R n )] , . . . , [ Q ( R n )] T g +1 ] T ∈ C ( T g +1) × is defined for simplifyingthe expression of (24) and [ Q ( R n )] t is given as [ Q ( (cid:98) R n )] t = exp (cid:16) || (cid:98) R n || F σ (1+ σ θ ,n ) (cid:17) + (cid:0) θ ,n σ (cid:1) θ exp (cid:16) | (cid:98) r n,t | σ (1+ σ θ ,n ) (cid:17) . (25)The signal flow graph of the LAMP-C network is the same as the AMP algorithm with matrixshrinkage function. Thus, the denoising performance of each layer in the LAMP-C network canalso be described by state evolution ΣΣΣ i in the asymptotic region. Since the layer-by-layer anddenoiser-by-denoiser training strategy is proven to enable the LDAMP network [33] for theSMV problem to achieve MMSE optimality under the conditions mentioned in Section IV, it isreasonable to speculate that the strategy can also ensure the MMSE-optimality of the LAMP-Cnetwork under the same conditions. To prove the property, we first follow [33] to define a set Algorithm 2
Parameter training of the tied LAMP network with vector shrinkage function vialayer-by-layer and denoiser-by-denoiser learning strategy Initialize: W = ˜ S H and ϑϑϑ i = ϑϑϑ , = 1 , . . . , I . for i = 1 to I do Learn { ϑϑϑ i } with fixed { W , { ϑϑϑ l } i − l =1 } based on the loss function L ( (cid:98) X i ) = || (cid:98) X i − X || F . Re-learn {{ ϑϑϑ l } il =1 } based on the loss function L ( (cid:98) X i ) = || (cid:98) X i − X || F . end for Re-learn { W , { ϑϑϑ i } Ii =1 } for refinement based on the loss function L ( (cid:98) X I ) = || (cid:98) X I − X || F . Return { W , { ϑϑϑ i } Ii =1 } .of variables { τ i } Ii =1 as τ i = 1˜ N M E D [ || ηηη iϑϑϑ i ,δ i ( X + δ i − D ) − X || ] . (26)In addition, we also give the definition of the monotone denoising function that inf ϑϑϑ E D || ηηη ϑϑϑ,δ ( X + δ D ) − X || F is a non-decreasing function of δ for any X . With the above definition, we areready to present the following lemma. Lemma 1:
Suppose that the shrinkage functions $\boldsymbol{\eta}^i_{\boldsymbol{\vartheta}^i}(\cdot)$, $i = 1, \ldots, I$, are monotone denoising functions. Following the greedy selection strategy in [35], the parameters $\boldsymbol{\vartheta}^1$ are first updated to $\boldsymbol{\vartheta}^{1*}$ to minimize $\mathbb{E}_{\mathbf{X}}[\tau_1]$ and then fixed; next, the parameters $\boldsymbol{\vartheta}^2$ are updated to $\boldsymbol{\vartheta}^{2*}$ to minimize $\mathbb{E}_{\mathbf{X}}[\tau_2]$ and fixed, ..., and finally the parameters $\boldsymbol{\vartheta}^I$ are updated to $\boldsymbol{\vartheta}^{I*}$ to minimize $\mathbb{E}_{\mathbf{X}}[\tau_I]$. The LAMP network with all the updated learnable parameters $\{\boldsymbol{\vartheta}^{i*}\}_{i=1}^{I}$ then minimizes $\mathbb{E}_{\mathbf{X}}[\tau_I]$.

Proof:
This lemma can be proved by replacing $\tau_i$ with $\mathbb{E}_{\mathbf{X}}[\tau_i]$ in the proof of Lemma 3 in [35].

If the conditions in Lemma 1 and Section IV are all satisfied, the LAMP-C network trained under the layer-by-layer and denoiser-by-denoiser strategy achieves MMSE optimality. However, the measurement matrix $\tilde{\mathbf{S}}$ has non-i.i.d. elements, so there is no optimality guarantee in theory. From the numerical results, we can nevertheless see that the LAMP-C network trained with the layer-by-layer and denoiser-by-denoiser strategy can still approach the optimal performance.

Compared with the LAMP-SMV network, the learnable parameter set $\{\mathbf{W}^i, \boldsymbol{\vartheta}^i\}_{i=1}^{I}$ of the proposed LAMP-C network is unchanged, and the LAMP-C network can similarly be divided into a tied version and an untied version, depending on whether the matched filter matrices $\mathbf{W}^i$, $i = 1, \ldots, I$, share the same values in all layers or take different values across layers. Therefore, the training process of the LAMP-C network also remains nearly unchanged. We outline the training procedure of the tied LAMP-C network in Algorithm 2.

C. Hybrid LAMP Network
The neural networks proposed in the above two subsections solve the MMV problem in either a purely distributed or a purely centralized way, concentrating only on reducing the running time or on improving the recovery accuracy in the multiple-antenna scenario, respectively. Intuitively, we can also construct a LAMP network in a hybrid way that combines the structures of the LAMP-D and LAMP-C networks to save running time and improve recoverability simultaneously. The structure of the LAMP-H network, shown in Fig. 4(c), consists of $U$ parallel LAMP networks with vector shrinkage function, and each LAMP subnetwork recovers the channels only at its corresponding subset of antennas. When the channel coefficients at the antennas are assumed to be i.i.d., we can divide all antennas sequentially into $U$ non-overlapping subsets of the same size. The received signal matrix can thus be represented by $\mathbf{Y} = [\tilde{\mathbf{Y}}_1, \ldots, \tilde{\mathbf{Y}}_U]$, where $\tilde{\mathbf{Y}}_u \in \mathbb{R}^{\tilde{L} \times \frac{M}{U}}$ and $\frac{M}{U}$ is assumed to be an integer. The input of the $u$-th LAMP subnetwork is then denoted by $\tilde{\mathbf{y}}_u = \mathrm{vec}(\tilde{\mathbf{Y}}_u)$, and the output estimated channel matrix $\hat{\mathbf{X}}$ is obtained by concatenating the estimated sub-channels $\{\hat{\mathbf{X}}_u\}_{u=1}^{U}$, i.e., $\hat{\mathbf{X}} = [\hat{\mathbf{X}}_1, \ldots, \hat{\mathbf{X}}_U]$, as sketched below. Similar to the LAMP-D network, the learnable parameters of the different subnetworks in LAMP-H can also take different values, depending on the considered system. On the other hand, the antennas can be divided into subsets in various ways, and the numbers of antennas in different subsets can even differ in some cases. However, the best division is unknown and usually requires many trials before the neural network is deployed. In this work, the LAMP-H network is designed so that all LAMP subnetworks with vector shrinkage function share the same parameter values. Each LAMP subnetwork estimates the channel coefficients at the corresponding $\frac{M}{U}$ antennas, and the training data set for the shared subnetwork then contains $U D_T$ sub-matrix samples.
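The antenna partition used by LAMP-H can be sketched as follows; the recombination of the sub-estimates into $\hat{\mathbf{X}}$ is a simple horizontal concatenation.

```python
import numpy as np

def split_antennas(Y, U):
    """Partition the M antenna columns of Y into U equal, non-overlapping
    subsets, one per LAMP-H subnetwork (M / U is assumed to be an integer)."""
    M = Y.shape[1]
    assert M % U == 0, "M / U is assumed to be an integer"
    w = M // U
    return [Y[:, u * w:(u + 1) * w] for u in range(U)]

# Recombination of the sub-estimates, assuming hypothetical subnetworks `subnets`:
# X_hat = np.hstack([net(Y_u) for net, Y_u in zip(subnets, split_antennas(Y, U))])
```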
D. Discussion

Since the AMP algorithm with vector shrinkage function exhibits saturated recovery performance once $M$ exceeds a certain value, the performance of the LAMP-C network is also limited in this case, which is validated by the numerical results. Therefore, the LAMP-H network is usually the better choice when the number of antennas is very large: it can approach the optimal performance while simultaneously making use of distributed computing units to save running time.

TABLE I
COMPUTATIONAL COMPLEXITY OF DIFFERENT ALGORITHMS IN THE SINGLE-ANTENNA SCENARIO

Algorithm | BPDN | OMP | AMP | LAMP
Complexity | $O(\tilde{N}^{3.5})$ | $O(IK^3)$ | $O(I\tilde{L}\tilde{N})$ | $O(I\tilde{L}\tilde{N})$

TABLE II
COMPUTATIONAL COMPLEXITY OF THE PROPOSED LAMP NETWORKS IN THE MULTIPLE-ANTENNA SCENARIO

Network | LAMP-D | LAMP-C | LAMP-H
Complexity | $O(I\tilde{L}(\tilde{N}M + M))$ | $O(I\tilde{L}(\tilde{N}M + M^2))$ | $O(I\tilde{L}(\tilde{N}M + M^2/U))$

E. Computational Complexity Analysis
To implement the proposed algorithms, the computational complexity also has a great influence on hardware usage and power consumption. Here, we analyze the computational complexity of our proposed networks and compare them with the conventional CS algorithms. Table I gives the complexity of the proposed LAMP network as well as those of the conventional CS algorithms BPDN and OMP and of the AMP algorithm in the single-antenna scenario, and Table II lists the complexities of the three LAMP networks in the multiple-antenna scenario. The number of iterations, or of network layers, is denoted by $I$ in both tables. By comparison, the complexity of the AMP algorithm and the LAMP networks increases only linearly with $\tilde{L}$ and $\tilde{N}$, since no matrix inversion is needed. In Table II, the complexities of the LAMP-C and LAMP-H networks grow quadratically with $M$ through their Onsager terms, while that of the LAMP-D network grows only linearly with $M$. Additionally, when perfect system statistics are unavailable, better-performing shrinkage functions can be utilized in LAMP, which can also speed up the convergence of the AMP framework and reduce the number of needed iterations. Thus, the LAMP network is more computationally efficient for massive access.

VI. SIMULATION RESULTS
In this section, we present the simulation results of the proposed algorithms in asynchronousgrant-free random access systems.We consider a system with N = 100 users for illustration purpose, although the AMP-basedalgorithms can be used for a much larger-scale problem. Each user has a probability of p a = 0 . to be active. To simplify the demonstration, all users are placed at the edge of the cell centered by -2 -1 False alarm ratio -2 -1 M i ss ed de t e c t i on r a t i o OMPAMP-STLAMP-STAMP-MMSELAMP-MMSE
Fig. 6. The user activity detection performance comparison in the single-antenna scenario with T g = 3 . SNR -30-25-20-15-10-50 N M SE OMPAMP-STLAMP-STAMP-MMSELAMP-MMSE
Fig. 7. The channel estimation performance comparison in the single-antenna scenario with T g = 3 . the BS. Thus, the large-scale channel attenuation φ n of all users are equal, i.e., φ n = φ, ∀ n andthe signal-to-noise ratio is defined by SNR = φσ z . We consider SNR = 0 dB in the simulations ifnot specified otherwise. We set the pilot length L = 40 , and the maximal symbol delay of usersand the length of guide time are set to be , i.e., T g = D = 3 . The small-scale fading coefficientof each user at each antenna is generated according to the i.i.d. Gaussian distribution with zeromean and unit variance, i.e., g n,m ∼ CN (0 , , ∀ n, m . We generate independent sampleswith SNR = 0 dB for training and × independent samples with the same distribution forvalidation in both the single-antenna scenario and the multiple-antenna scenario. Then another × independent samples are generated for each SNR ( dB ) ∈ { , , , , } , which leadsto totally . × testing samples. In the training process, the training data is divided intominibatches of size . The performance of the neural network on the test set is also evaluatedby MSE of all the samples in this set, which is computed as MSE = B (cid:80) Bb =1 || (cid:98) x b − x b || with B SNR -2 -1 E rr o r de t e c t i on r a t i o OMPAMP-STLAMP-STAMP-MMSELAMP-MMSE
Fig. 8. The delay detection performance comparison in the single-antenna scenario with $T_g = 3$ under a fixed false alarm ratio $\epsilon$.

The number of layers in the LAMP networks in both the single-antenna and the multiple-antenna scenarios is set to $I = 10$; based on our simulation trials, the AMP algorithm converges within 10 iterations in our system setting, so we match the number of LAMP layers to it. The neural networks are all trained and tested using the deep learning library TensorFlow, and the Adam optimizer is adopted for training. The AMP algorithm and the OMP algorithm are employed as benchmarks and are evaluated on the same test data set.

In Fig. 6, we first evaluate the user activity detection performance of the LAMP network in terms of the missed detection ratio versus the false alarm ratio, obtained by varying the decision threshold $q_{\mathrm{th}}$. The missed detection ratio is defined as the ratio of the number of undetected active users to the number of active users, and the false alarm ratio is defined as the ratio of the number of inactive users falsely detected as active to the number of inactive users. Here, user $n$ is detected as active with symbol delay $t_{\max}$ if the element with maximal magnitude in $\widehat{\mathbf{x}}_n$, denoted as $\widehat{x}_{n, t_{\max}+1}$, is larger than $q_{\mathrm{th}}$. AMP-ST and LAMP-ST denote the schemes using the soft-threshold (ST) function, while AMP-MMSE and LAMP-MMSE employ the MMSE-optimal denoising function. When the MMSE-optimal denoising function is selected, the exact values of its parameters are unknown and learned in LAMP, whereas they are perfectly known in AMP. Since the tuning parameter $\theta$ of the ST function is fixed in AMP-ST and LAMP-ST, many elements in the output vector of the ST function are zeros. This means that the corresponding users can never be detected as active, so the false alarm ratios of the AMP-ST and LAMP-ST curves cannot approach one.
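The decision rule just described, together with the two error ratios plotted in Fig. 6, can be summarized by the following short sketch; this is our own restatement of the rule in code, with illustrative names.

```python
import numpy as np

def detect(x_hat, N, D, q_th):
    """User n is declared active with delay t_max when the largest-magnitude
    entry of its (D+1)-long block in x_hat exceeds the threshold q_th."""
    blocks = x_hat.reshape(N, D + 1)
    t_max = np.argmax(np.abs(blocks), axis=1)          # candidate delay per user
    peak = np.abs(blocks[np.arange(N), t_max])
    return peak > q_th, t_max                          # (activity decision, delay)

def error_ratios(active_hat, active_true):
    """Missed detection: undetected actives over actives; false alarm:
    inactives declared active over inactives (as defined in the text)."""
    md = np.sum(active_true & ~active_hat) / max(int(np.sum(active_true)), 1)
    fa = np.sum(active_hat & ~active_true) / max(int(np.sum(~active_true)), 1)
    return md, fa
```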
Fig. 9. The user activity detection performance in the multiple-antenna scenario with $T_g = 3$, for $M = 2$ and $M = 4$.

In this paper, only the $p_a N$ users that are most likely to be active are detected by the OMP algorithm, so its false alarm ratio also never approaches one. We observe that LAMP-ST significantly outperforms AMP-ST by learning the optimal tuning parameter value and the matched-filter matrix. It achieves performance similar to the OMP algorithm, since LAMP-ST implicitly solves the LASSO problem and approaches the performance of LASSO in the asymptotic regime. Moreover, the learnable MMSE-optimal shrinkage function enables LAMP to outperform both LAMP-ST and OMP by exploiting the statistical information of the system. It is also observed that the LAMP-MMSE network slightly outperforms the AMP-MMSE algorithm, benefiting from the deep learning techniques used to learn the optimal network parameters.

Fig. 7 shows the channel estimation performance in terms of the normalized mean square error (NMSE) versus SNR, where the metric is defined as $\mathrm{NMSE} = \|\widehat{\mathbf{h}} - \mathbf{h}\|^2 / \|\mathbf{h}\|^2$. Similar to the user activity detection performance, the DL techniques enable the LAMP network with the ST function to achieve a lower channel estimation error than AMP-ST. The LAMP-ST network is observed to outperform the OMP algorithm at $\mathrm{SNR} = 0$ dB, but the OMP algorithm achieves a much lower NMSE than the LAMP-ST network as the SNR increases, which implies that least-squares estimation offers better recoverability in the high-SNR regime. Compared with the ST function, the MMSE-optimal denoising function provides a significant performance improvement for the AMP framework at higher SNR, since the system statistics are exploited.

The delay detection performance is shown in Fig. 8. Here, the error detection ratio is defined as the ratio of the number of active users with wrongly detected delays to the number of active users. The error detection ratios are obtained under a fixed false alarm ratio $\epsilon$. Since the delays of the active users are detected based on the estimated channel $\widehat{\mathbf{x}}$ or $\widehat{\mathbf{X}}$, the delay detection performance is heavily influenced by the channel estimation performance. Accordingly, the delays are usually detected more accurately when more precise channel estimates are available. In particular, the error detection ratio of the LAMP-MMSE network decreases slightly faster than that of the AMP-MMSE algorithm as the SNR increases, which conversely implies that LAMP-MMSE provides slightly better channel estimates of the active users in the high-SNR regime.
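The MMSE-optimal denoiser credited above admits a simple closed form under a Bernoulli-Gaussian prior consistent with the activity-plus-Rayleigh model used here. The sketch below is our reconstruction, not necessarily the paper's exact denoiser; in LAMP, the parameters `p` and `phi` would be learned rather than fixed.

```python
import numpy as np

def eta_mmse(v, sigma2, p, phi):
    """Posterior mean E[x | v] for the prior x ~ (1-p) delta_0 + p CN(0, phi)
    and the observation v = x + CN(0, sigma2)."""
    s = phi + sigma2
    # likelihood ratio of the "inactive" to the "active" hypothesis per entry
    lr = ((1.0 - p) / p) * (s / sigma2) * np.exp(-np.abs(v) ** 2 * phi / (sigma2 * s))
    post_active = 1.0 / (1.0 + lr)           # posterior activity probability
    return post_active * (phi / s) * v       # Wiener gain weighted by activity belief
```

Intuitively, the factor `post_active` suppresses entries belonging to likely-inactive users, which the plain ST function can only mimic through its single threshold.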
Fig. 10. The channel estimation performance in the multiple-antenna scenario with $T_g = 3$: (a) $M = 2$; (b) $M = 4$.

In the following, we consider the scenario where the BS has multiple antennas. The ST function is employed only in the LAMP-C networks, while the MMSE-optimal denoising function is adopted in all three kinds of LAMP networks. The LAMP-H network evaluated here consists of two LAMP networks with vector shrinkage functions, each estimating the channels at two antennas, when $M = 4$. We first evaluate the user activity detection performance shown in Fig. 9. It is observed that increasing $M$ dramatically improves the user activity detection performance of the LAMP networks. It is also shown that the centralized network always outperforms the distributed and hybrid networks when the type of shrinkage function and $M$ are held fixed, which indicates that exploiting the common sparsity across all antennas provides a significant performance improvement in user activity detection. In particular, although the statistical information is not fully exploited in the LAMP-ST network (i.e., the LAMP-C network with the ST function), it outperforms LAMP-D-MMSE when $M = 4$. This result implies that the centralized structure can potentially provide a larger performance gain than a well-designed shrinkage function as the number of BS antennas increases. The performance of LAMP-H-MMSE lies between that of LAMP-D-MMSE and LAMP-C-MMSE, since it balances computational complexity and recoverability.
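The "common sparsity" argument can be made concrete in the shrinkage step: a centralized (LAMP-C-style) network shrinks all $M$ antenna observations of a user-delay row jointly, whereas a distributed (LAMP-D-style) network thresholds each antenna independently. A minimal sketch of the row-wise variant, ours and purely illustrative:

```python
import numpy as np

def soft_threshold_rows(V, tau):
    """Row-wise (vector) soft threshold on V of shape (N(D+1), M): the M antenna
    entries of each user-delay row are kept or zeroed together, exploiting the
    common support across antennas. Thresholding each column independently
    would correspond to the distributed structure instead."""
    row_norm = np.linalg.norm(V, axis=1, keepdims=True)
    gain = np.maximum(1.0 - tau / np.maximum(row_norm, 1e-12), 0.0)
    return gain * V
```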
Fig. 11. The impact of the maximal symbol delay on the performance of the proposed LAMP networks in the single-antenna and multiple-antenna scenarios: (a) the LAMP network with $M = 1$; (b) the LAMP-D network with $M = 4$; (c) the LAMP-C network with $M = 4$; (d) the LAMP-H network with $M = 4$.

Fig. 10 shows the channel estimation performance of the LAMP networks in the multiple-antenna scenario. The metric is defined as $\mathrm{NMSE} = \|\widehat{\mathbf{H}} - \mathbf{H}\|_F^2 / \|\mathbf{H}\|_F^2$. We can see that the channel estimation performance of the LAMP-D network remains almost unchanged as $M$ increases. This implies that the size of our training data is large enough for the LAMP network to work well, whereas it would usually be insufficient to train a traditional deep neural network well. It is also shown that the performance gap between LAMP-C-MMSE and the other LAMP networks, including LAMP-D-MMSE and LAMP-H-MMSE, becomes smaller as the SNR increases, and that the NMSE of LAMP-H-MMSE approaches that of LAMP-C-MMSE in the high-SNR regime. These results imply that the performance of LAMP-C-MMSE may saturate when $M$ or the SNR is large enough, due to the limits of the AMP framework. Thus, the LAMP-H network is better suited to practical scenarios, since it can approach the optimal performance while reducing the computational complexity.

Finally, Fig. 11 evaluates the impact of the maximal delay spread on the user activity detection performance of the LAMP networks with the MMSE-optimal shrinkage function. For comparison purposes, the synchronous system is also included as a special case of the asynchronous system (i.e., $T_g = D = 0$); its performance therefore serves as a lower bound. Both the single-antenna and multiple-antenna scenarios are considered. In the single-antenna scenario, the performance of the AMP algorithm with known statistical information is also evaluated. We find that LAMP performs similarly to AMP when $T_g = 0$, and that the performance of the LAMP network degrades only slightly as $T_g$ increases, which indicates that the LAMP network is robust to the maximal symbol delay. In the multiple-antenna scenario, the proposed networks are likewise insensitive to the maximal symbol delay. In particular, the gaps between systems with different maximal symbol delays are much smaller for LAMP-C and LAMP-H than for LAMP-D. Thus, the LAMP networks with vector shrinkage functions are more robust to the maximal symbol delay, which leads to a smaller performance loss for the asynchronous massive access system. Additionally, when the accurate maximal symbol delay is unknown in advance, we can set $T_g$ to a larger value in the signal model without causing much performance degradation.
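The robustness to an overestimated $T_g$ has a simple structural reading: enlarging $T_g$ only widens the delay-expanded dictionary. The sketch below is our own construction, consistent with a shifted-pilot signal model but not taken verbatim from the paper; in particular, the output dimension $L + T_g$ and the guard-time handling are assumptions.

```python
import numpy as np

def delay_expanded_dictionary(S, T_g):
    """Stack T_g+1 shifted copies of each length-L pilot s_n (columns of S) into
    a dictionary of size (L + T_g) x N(T_g+1); column n(T_g+1)+t holds user n's
    pilot delayed by t symbols. Overestimating T_g only appends extra columns."""
    L, N = S.shape
    A = np.zeros((L + T_g, N * (T_g + 1)), dtype=S.dtype)
    for n in range(N):
        for t in range(T_g + 1):
            A[t:t + L, n * (T_g + 1) + t] = S[:, n]   # pilot delayed by t symbols
    return A
```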
VII. CONCLUSION

This work shows that combining deep learning techniques with compressed sensing is effective for asynchronous grant-free massive connectivity under a signal model with an inserted guard time. Specifically, we propose to design neural networks based on the AMP framework to jointly detect active users, detect their delays, and estimate their channels when the system statistics of the channels and the users are unknown. Both the scenario where the BS has a single antenna and that where it has multiple antennas are considered. We first design the LAMP network for the single-antenna scenario to exploit the potential of AMP by learning the parameters from training data; the resulting network can slightly outperform the AMP algorithm even when the latter has perfectly known system statistics. Furthermore, three LAMP network structures are proposed for the multiple-antenna scenario. Specifically, the LAMP-D network can take advantage of distributed computation units to reduce the running time, the LAMP-C network improves recoverability by exploiting the common sparsity in the channel matrix and allows tractable performance analysis, and the LAMP-H network balances complexity and recoverability, making it suitable for more complicated systems. Simulation results show that significant performance improvements are achieved by the proposed LAMP networks when perfect system statistics are unavailable. Additionally, the performance of the proposed LAMP networks is shown to be robust to the maximal delay spread of the asynchronous users.

REFERENCES
[1] W. Zhu, M. Tao, X. Yuan, and Y. Guan, "Asynchronous massive connectivity with deep-learned approximate message passing," in Proc. IEEE Int. Conf. Commun., June 2020, pp. 1-6.
[2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, "What will 5G be?" IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065-1082, June 2014.
[3] C. Bockelmann et al., "Towards massive connectivity support for scalable mMTC communications in 5G networks," IEEE Access, vol. 6, pp. 28969-28992, 2018.
[4] C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, "Massive machine-type communications in 5G: physical and MAC-layer solutions," IEEE Commun. Mag., vol. 54, no. 9, pp. 59-65, Sep. 2016.
[5] L. Liu, E. G. Larsson, W. Yu, P. Popovski, C. Stefanovic, and E. de Carvalho, "Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the internet of things," IEEE Signal Process. Mag., vol. 35, no. 5, pp. 88-99, Sep. 2018.
[6] H. F. Schepker, C. Bockelmann, and A. Dekorsy, "Exploiting sparsity in channel and data estimation for sporadic multi-user communication," in Proc. Int. Symp. Wireless Commun. Syst., Aug. 2013, pp. 1-5.
[7] G. Wunder, P. Jung, and M. Ramadan, "Compressive random access using a common overloaded control channel," in Proc. IEEE Global Commun. Conf. Workshops, Dec. 2015, pp. 1-6.
[8] Z. Chen, F. Sohrabi, and W. Yu, "Sparse activity detection for massive connectivity," IEEE Trans. Signal Process., vol. 66, no. 7, pp. 1890-1904, Apr. 2018.
[9] L. Liu and W. Yu, "Massive connectivity with massive MIMO-Part I: Device activity detection and channel estimation," IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933-2946, June 2018.
[10] K. Senel and E. G. Larsson, "Grant-free massive MTC-enabled massive MIMO: A compressive sensing approach," IEEE Trans. Commun., vol. 66, no. 12, pp. 6164-6175, Dec. 2018.
[11] Z. Sun, Z. Wei, L. Yang, J. Yuan, X. Cheng, and L. Wan, "Exploiting transmission control for joint user identification and channel estimation in massive connectivity," IEEE Trans. Commun., vol. 67, no. 9, pp. 6311-6326, Sep. 2019.
[12] A. Fengler, S. Haghighatshoar, P. Jung, and G. Caire, "Non-Bayesian activity detection, large-scale fading coefficient estimation, and unsourced random access with a massive MIMO receiver," 2020. [Online]. Available: https://arxiv.org/abs/1910.11266v2
[13] Z. Chen, F. Sohrabi, and W. Yu, "Multi-cell sparse activity detection for massive random access: Massive MIMO versus cooperative MIMO," IEEE Trans. Wireless Commun., vol. 18, no. 8, pp. 4060-4074, 2019.
[14] M. Ke, Z. Gao, Y. Wu, X. Gao, and K. Wong, "Massive access in cell-free massive MIMO-based internet of things: Cloud computing and edge computing paradigms," IEEE J. Sel. Areas Commun., pp. 1-1, 2020.
[15] M. Borgerding, P. Schniter, and S. Rangan, "AMP-inspired deep networks for sparse linear inverse problems," IEEE Trans. Signal Process., vol. 65, no. 16, pp. 4293-4308, Aug. 2017.
[16] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914-18919, 2009.
[17] Y. Polyanskiy, "A perspective on massive random-access," in Proc. IEEE Int. Symp. Inf. Theory, June 2017, pp. 2523-2527.
[18] V. K. Amalladinne, A. K. Pradhan, C. Rush, J.-F. Chamberland, and K. R. Narayanan, "Unsourced random access with coded compressed sensing: Integrating AMP and belief propagation," 2020. [Online]. Available: https://arxiv.org/abs/2010.04364
[19] V. Shyianov, F. Bellili, A. Mezghani, and E. Hossain, "Massive unsourced random access based on uncoupled compressive sensing: Another blessing of massive MIMO," IEEE J. Sel. Areas Commun., pp. 1-1, 2020.
[20] E. Sadeghabadi, S. M. Azimi-Abarghouyi, B. Makki, M. Nasiri-Kenari, and T. Svensson, "Asynchronous downlink massive MIMO networks: A stochastic geometry approach," IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 579-594, 2020.
[21] A. T. Abebe and C. G. Kang, "Comprehensive grant-free random access for massive low latency communication," in Proc. IEEE Int. Conf. Commun., May 2017, pp. 1-6.
[22] V. K. Amalladinne, K. R. Narayanan, J. Chamberland, and D. Guo, "Asynchronous neighbor discovery using coupled compressive sensing," in Proc. IEEE Int. Conf. Acoustic, Speech, Signal Process., May 2019, pp. 4569-4573.
[23] T. Ding, X. Yuan, and S. C. Liew, "Sparsity learning-based multiuser detection in grant-free massive-device multiple access," IEEE Trans. Wireless Commun., vol. 18, no. 7, pp. 3569-3582, July 2019.
[24] Y. Bai, B. Ai, and W. Chen, "Deep learning based fast multiuser detection for massive machine-type communication," in Proc. IEEE 90th Veh. Technol. Conf., Sep. 2019, pp. 1-5.
[25] Z. Zhang, Y. Li, C. Huang, Q. Guo, C. Yuen, and Y. L. Guan, "DNN-aided block sparse Bayesian learning for user activity detection and channel estimation in grant-free non-orthogonal random access," IEEE Trans. Veh. Technol., vol. 68, no. 12, pp. 12000-12012, Dec. 2019.
[26] H. F. Schepker, C. Bockelmann, and A. Dekorsy, "Coping with CDMA asynchronicity in compressive sensing multi-user detection," in Proc. IEEE 77th Veh. Technol. Conf., 2013, pp. 1-5.
[27] K. Gregor and Y. LeCun, "Learning fast approximations of sparse coding," in Proc. Int. Conf. Mach. Learn., 2010, pp. 399-406.
[28] X. Chen, J. Liu, Z. Wang, and W. Yin, "Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds," in Proc. Neural Inf. Process. Syst. Conf., 2018, pp. 9061-9071.
[29] J. A. Tropp, "Algorithms for simultaneous sparse approximation. Part II: Convex relaxation," Signal Processing, vol. 86, no. 3, pp. 589-602, 2006.
[30] J. Ziniel and P. Schniter, "Efficient high-dimensional inference in the multiple measurement vector problem," IEEE Trans. Signal Process., vol. 61, no. 2, pp. 340-354, Jan. 2013.
[31] J. Kim, W. Chang, B. C. Jung, D. Baron, and J. C. Ye, "Belief propagation for joint sparse recovery," 2011. [Online]. Available: http://arxiv.org/abs/1102.3289
[32] D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors," Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 318-362, 1986.
[33] C. Metzler, A. Mousavi, and R. Baraniuk, "Learned D-AMP: Principled neural network based compressive image recovery," in Proc. Neural Inf. Process. Syst. Conf., 2017, pp. 1772-1783.
[34] H. He, S. Jin, C. Wen, F. Gao, G. Y. Li, and Z. Xu, "Model-driven deep learning for physical layer communications," IEEE Wireless Commun., vol. 26, no. 5, pp. 77-83, Oct. 2019.
[35] C. A. Metzler, A. Maleki, and R. G. Baraniuk, "From denoising to compressed sensing," IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 5117-5144, Sep. 2016.