A Layered Grouping Random Access Scheme Based on Dynamic Preamble Selection for Massive Machine Type Communications

Abstract

Massive machine type communication (mMTC) has been identified as an important use case in Beyond 5G networks and the future massive Internet of Things (IoT). However, for the massive multiple access in mMTC, there is a serious access preamble collision problem if the conventional 4-step random access (RA) scheme is employed. Consequently, a range of grant-free (GF) RA schemes were proposed. Nevertheless, if the number of cellular users (devices) significantly increases, both the energy and spectrum efficiency of the existing GF schemes still degrade rapidly owing to the much longer preambles required. In order to overcome this dilemma, a layered grouping strategy is proposed, where the cellular users are first divided into clusters based on their geographical locations, and then the users of the same cluster autonomously join different groups by using an optimum energy consumption (Opt-EC) based K-means algorithm. With this new layered cellular architecture, the RA process is divided into a cluster load estimation phase and an active group detection phase. Based on the state evolution theory of the approximated message passing algorithm, a tight lower bound on the minimum preamble length for achieving a certain detection accuracy is derived. Benefiting from the cluster load estimation, a dynamic preamble selection (DPS) strategy is invoked in the second phase, which yields preambles of the minimum required length. As evidenced by our simulation results, this two-phase DPS aided RA strategy results in a significant performance improvement.


arXiv preprint [cs.IT]

Gaofeng Cheng, Huan Chen, Pingzhi Fan, Li Li, Li Hao

Index Terms — grant-free, layered grouping, AMP algorithm, minimum preamble length.

I. INTRODUCTION

In the next generation cellular networks, massive machine-type communications (mMTC) will play an essential role in building the massive Internet of Things (IoT) and hence have been identified as one of the three main use cases in 5G and Beyond 5G (B5G) services. The mMTC service has its particular features: (1) compared with human-to-human (H2H) communications, the number of potential users (devices) in an mMTC scenario could reach up to millions [1]; (2) the user activity patterns are very sporadic; (3) mMTC demonstrates a salient short-packet-transmission property [2]; (4) the users are very sensitive to energy consumption [3].

Gaofeng Cheng, Huan Chen, Pingzhi Fan, L. Li, and L. Hao are with the School of Information Science and Technology, Southwest Jiaotong University, Chengdu, 610031, China (e-mail: [email protected]; [email protected]; {pzfan, ll5e08, lhao}@home.swjtu.edu.cn).

Let $N$ represent the total number of potential users in a cell and $N_{ac}$ the number of active users during one random access opportunity (RAO). Hence, the system sparsity could be defined as $\lambda = N_{ac}/N$. Even though $\lambda$ is normally small in mMTC scenarios, $N_{ac}$ is still significantly larger than the size of the LTE preamble pool, because $N$ is extremely large [4]. As a result, the traditional LTE grant-based four-step random access will encounter a serious preamble collision problem, which dramatically reduces the access success probability and increases the access latency [2]. Implementing the synchronization, active user detection, channel estimation, as well as the data recovery in a one-shot joint operation, as done in grant-free (GF) RA [5]–[7], becomes a promising direction. Since GF RA realizes an "arrival-and-go" transmission of payload, it has attracted significant attention in recent years.

Obviously, the non-orthogonal multiple access (NOMA) technology directly motivates the concept of GF RA, as NOMA allows multiple users to share the same time-frequency resource. The base station (BS) can distinguish multi-user data through their different signature patterns if the system overload does not exceed a certain level. Hence the cellular users can transmit their data whenever available.
A range of GF NOMA schemes were proposed, including power-based GF NOMA [8], spreading-based GF NOMA [9]–[11], interleaving-based GF NOMA [12], etc. However, before grant-free transmission, GF NOMA schemes still require overhead to handle the synchronization, identification and channel estimation of active users. Some of the proposals even assume that the active users are already connected to the BS, or that the active users and the BS know almost everything about each other, such as the number of multiplexed users and their modulation and coding schemes. These facts imply that realizing an idealized GF RA is extremely challenging, and the required overhead may be inevitably large in practice.

As a result, reducing the overhead for GF RA becomes an important issue. In this spirit, compressive sensing (CS) technology becomes another foundation to enable GF RA [13], [14]. Because CS is capable of recovering the desired signals from far fewer measurements than the total signal dimension if a certain signal sparsity is guaranteed, CS is normally employed in GF RA schemes to overcome the challenges of user identification and channel estimation [15], [16], or even to simultaneously recover the payload data [17].

In CS based GF RA, each device is assigned a user-specific pilot sequence, termed a preamble. According to CS theory, the preamble length for enabling a successful signal reconstruction is impacted by the total number of users, the sparsity, and the type of measurement matrix. This preamble length is also regarded as a dominant overhead metric of CS based GF RA. Therefore, investigating the minimum preamble length (MPL) is important for mitigating the high overhead problem. The associated theoretical analysis has been attempted in [18], [19]. However, only an asymptotic order of the MPL was obtained. Some uncertain parameters were still involved in this asymptotic order, whose values have to be experimentally tested according to specific scenarios.
This problem obviously limits the application of these theoretical results. On the other hand, with an increasing number of potential users or active users, the preamble length has to be increased accordingly. Consequently, in future ultra-dense cellular IoT networks (in terms of devices per km$^2$) [20], the high overhead problem may still exist even when CS based GF RA is employed, which limits the number of affordable users within the same cell and also aggravates the constrained power budget of the RA procedure.

Extending our horizon further, a range of other methods have been proposed to address the overload problem in the future random access channel (RACH) [21]. These existing approaches can be categorized into push-based and pull-based ones. In push-based approaches, the RA requests are triggered from the device side, while in pull-based approaches, the contention is controlled from the BS side. Among these methods, grouping is an efficient way to relax the cellular density, and hence facilitates massive connectivity and reduces the energy consumption of the RA procedure. That is, all users can be grouped according to various metrics including the quality of service (QoS) [22], the level of received energy [23], the maximum tolerable delay, etc. Most of the grouped transmissions fall under the category of the pull-based approach [24], where every group has its unique group identification (GID) and the users of a group will access the BS if and only if their GID is granted by the paging message sent from the BS [25]. In [26], every group further selects one of its members as the group head (GH). The GH acts as a relay node for the other members of the same group.

In general, grouped random access has demonstrated some particular advantages.
However, these pull-based schemes may cause a serious latency problem when the interval between two paging messages that grant the same GID is long [27].

In this paper, we focus on solving the active user detection challenge encountered in mMTC scenarios. More specifically, this paper aims at reducing the CS signaling overhead, saving random access energy, and accommodating significantly more cellular users. Hence, a layered grouping RA scheme based on dynamic preamble selection is proposed. The main contributions of this paper are summarized as follows:

1) A layered grouping network framework is presented, where a cell is divided into several large clusters based on the users' geographical locations, and each cluster is further partitioned into a number of small groups according to the proposed construction and maintenance algorithms. It is assumed that users are normally active in units of small groups, not individually.

2) In this layered grouping cellular architecture, the initialization, update, as well as group head (GH) selection procedure of every small group are implemented autonomously, where a self-organizing optimum energy consumption (Opt-EC) based K-means algorithm is designed and employed. The user that maximizes the energy efficiency of the entire group is selected as the GH.

3) Associated with the layered cellular architecture, the RA procedure is divided into a cluster load estimation phase (namely, phase-I, in which RA operates in a push-based manner) and an active group detection phase (namely, phase-II, in which RA operates in a pull-based manner). The conventional user ID (UID) based random access is replaced by a unique group ID (GID) based random access. Here, the layered cellular architecture, the two-phase access procedure, as well as the use of groups as access entities, allow a BS to connect with many more coexisting users in an energy-efficient manner.

4) Two kinds of preambles, short and long, are employed. The short preambles are orthogonal and allocated to clusters as their signatures, while the long preambles, which have a cluster-load dependent minimum length, are non-orthogonal and allocated to group heads as group identities. The state-of-the-art approximated message passing (AMP) algorithm [16], [17] is employed for realizing the active group detection.

5) To analyze the overhead problem and the impact of the preamble length in the AMP algorithm, a tight lower bound on the minimum preamble length (MPL) is derived based on the state-evolution method.

6) Based on the preamble categorization and the MPL lower bound analysis, a dynamic preamble selection (DPS) strategy is adopted in phase-II, where the required preambles having the cluster-load dependent minimum length are dynamically selected. It is shown that, benefiting from both the hierarchical preamble assignment and the DPS strategy, the overhead of the proposed RA strategy can be significantly reduced.

The rest of this paper is organized as follows. In Section II, the layered grouping network framework is proposed. Its associated construction and maintenance mechanism is designed, and the two-phase RA scheme is also depicted. Then, the critical techniques employed in the proposed two-phase DPS RA strategy, including the optimum energy consumption (Opt-EC) based K-means grouping algorithm, the AMP detection algorithm, and the dynamic preamble selection algorithm, are discussed in Section III. The tight lower bound on the minimum preamble length is derived in Section IV. Simulation results are demonstrated and analyzed in Section V. Finally, the paper is concluded in Section VI.

II. SYSTEM MODEL

A. Network Framework of Layered Grouping

We consider an mMTC cell of radius $R$, where the BS, equipped with a single antenna, is located at the cell center and a total of $N$ coexisting users are randomly distributed in the cell. It is assumed that all the mMTC users are mainly static, which is a typical scenario in mMTC applications [21], e.g. the interactions among machines in industrial automation, the monitoring in smart agriculture, the environment monitoring for public safety, etc.

According to predefined system configurations including cell size, QoS requirement, maximum number of coexisting cellular users, etc., the BS will divide all the cellular users into $K$ clusters. Typically, $K$ is a small value, e.g. $K = 16$ or smaller. Because the number of clusters is relatively small, every cluster can be distinguished by a very short orthogonal preamble during phase-I of the proposed RA, which reduces the overhead. On the other hand, in order to facilitate practical synchronization, it is better that the users in the same cluster experience similar transmission delays, i.e. that they are located in a geographical area having roughly the same distance to the BS. This could be achieved by estimating the received signal strength (RSS) at the BS [28], [29]. Ideally, the entire cell is divided into $K$ rings and all the users located in the same ring are assigned to the same cluster. This effect is visualized by the dashed ellipses in Fig. 1. In practice, owing to the limited localization accuracy, some edge users may be assigned to an adjacent cluster. However, this potential mismatch will not impact the proposed two-phase GF RA scheme, as the proposed RA scheme is capable of adapting to unbalanced cluster loads.

Fig. 1: Network topology of the layered grouping in an mMTC cell.
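The ring-based cluster assignment described above can be illustrated with a minimal sketch. It assumes equal-width rings and a distance estimate that is already available (e.g. from RSS); the paper does not specify how the ring boundaries are chosen, so `ring_boundaries` and `assign_cluster` are hypothetical helpers used only for illustration.

```python
from typing import List


def ring_boundaries(cell_radius: float, num_clusters: int) -> List[float]:
    """Outer radius of each ring, assuming K equal-width rings (an assumption)."""
    width = cell_radius / num_clusters
    return [width * (k + 1) for k in range(num_clusters)]


def assign_cluster(distance: float, boundaries: List[float]) -> int:
    """Return the 1-based cluster index k whose ring contains the distance."""
    for k, outer in enumerate(boundaries, start=1):
        if distance <= outer:
            return k
    # A user estimated beyond the cell edge is clamped to the outermost ring.
    return len(boundaries)


def cluster_sizes(distances: List[float], boundaries: List[float]) -> List[int]:
    """Count the users assigned to each cluster; loads may be unbalanced."""
    sizes = [0] * len(boundaries)
    for d in distances:
        sizes[assign_cluster(d, boundaries) - 1] += 1
    return sizes


if __name__ == "__main__":
    bounds = ring_boundaries(1000.0, 4)
    print(cluster_sizes([100.0, 260.0, 300.0, 990.0, 1200.0], bounds))
```

Because the scheme tolerates unbalanced cluster loads, an edge user placed in the neighbouring ring by a noisy distance estimate only changes the per-cluster counts, not the procedure itself.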

Then, the users assigned to the $k$th cluster $c_k$, $k = 1, 2, \cdots, K$, will further participate in $M^{(k)}$ groups. These groups are initialized and updated in a self-organizing manner. The $m$th group in the $k$th cluster is denoted by $g_{k,m}$. The users pertaining to a group are termed group members and the set of their user IDs is denoted by $G_{k,m}$. The number of group members is termed the group size and denoted by $|g_{k,m}|$. We constrain the group size to a small value, because a large group size normally results in longer distances between the GH and its group members, which implies less reliable device-to-device (D2D) links. Indeed, it was shown in Fig. 3 of [30] that the packet error rate over D2D links becomes non-negligible once the group size exceeds 20. Hence we can reasonably assume that all the group members are close to each other. As a benefit, the internal message exchange among group members can be reliably realized by the D2D communication technique [30], which can be interference-free with respect to other groups and hence spectrally efficient.

Moreover, a particular group member, namely $\dot{u}_{k,m}$, will be selected as the GH of group $g_{k,m}$. If a normal group member $u_n$ wants to communicate with the BS, it first sends the message to its GH $\dot{u}_{k,m}$. Then the GH $\dot{u}_{k,m}$ relays this message to the BS, and vice versa. This implies that throughout the GF RA procedure, the GH $\dot{u}_{k,m}$ communicates with the BS on behalf of all the members of the same group. The above-mentioned mechanism is visualized by the dotted ellipses in Fig. 1, where the GH is equivalently denoted as a relay node.

Finally, a GH $\dot{u}_{k,m}$ possesses two kinds of access preambles. The first one, namely $s^{\mathrm{I}}_k$, is only used in phase-I and is the signature of cluster $c_k$. All cluster signatures are orthogonal to each other, i.e.

$$\langle s^{\mathrm{I}}_i, s^{\mathrm{I}}_j \rangle = \begin{cases} 1, & \text{if } j = i, \\ 0, & \text{if } j \neq i, \end{cases}$$

where $i, j = 1, 2, \cdots, K$. Since $K$ is relatively small, this orthogonality can be easily satisfied, even for a short preamble length. The second one, namely $s^{\mathrm{II}}_{k,m}$, is only used in phase-II and is the signature of group $g_{k,m}$. Since $M^{(k)}$ is normally a very large number in mMTC scenarios, the sequences $s^{\mathrm{II}}_{k,m}$ employed by different groups in the same cluster have to be non-orthogonal in order to reduce the overhead. More details of the preamble assignment can be found in Section II-B and Section III-C.

B. Construction and Maintenance of the Layered Grouping

The construction of clusters can be controlled by the BS, where only an approximate distance from a user to the BS is required. On the other hand, the formation and update of groups in each cluster may also be implemented in a centralized manner [30], [31], where the BS controls the selection of GHs and assigns their group members. However, this centralized management requires a range of global information including the users' accurate positions, propagation conditions, data rates, battery levels, etc. Aggregating this information from millions of devices in the mMTC scenario may become prohibitive. Hence distributed self-organized formation and update of groups are advocated in this paper. Thus, the construction and maintenance procedures of the proposed layered network framework are designed as follows:

1) When a user (device) $u_n$, $n = 1, 2, \ldots, N$, first powers on in the cell and hears the system broadcast information (including the power level of the control channel) from the BS, its registration process starts by sending a registration message containing the user ID, the device type, and a couple of reference signals to the BS in a contention-free manner.

2) Based on the reference signals contained in the registration message, the BS is capable of approximating the distance between a user and itself by utilizing RSS aided positioning techniques [28], [29] and further assigning $u_n$ to an appropriate cluster $c_k$. Then, the BS assigns the generation method of a pair of preambles $s^{\mathrm{I}}_k$ and $s^{\mathrm{II}}_{k,m}$, the initial preamble lengths, as well as the group size $|g_{k,m}|$ to $u_n$. This information and a couple of reference signals are contained in the registration response message (RRM).

3) Based on the reference signals in the RRM, user $u_n$ is capable of estimating the channel from the BS, and the associated channel state information (CSI) is denoted by $h_{n,b}$. It is assumed that all the channels are reciprocal and have a relatively long coherence time, based on the fact that the mMTC devices are mainly static in our application scenarios. According to the RRM, user $u_n$ becomes aware of its cluster index $k$. Then, user $u_n$ will further autonomously select itself as a GH with a probability of $1/|g_{k,m}|$.

4) The BS will periodically broadcast a group update opportunity message (GUOM) to all cellular users. Bearing the quasi-static property of our application scenario in mind, the group update period can generally be long, say daily or even weekly, for reducing the system overhead.

5) Once the cellular users hear the GUOM, they will implement the group initialization or update procedure in a self-organizing manner and via D2D links. An Opt-EC based K-means grouping algorithm is designed to iteratively improve the grouping relationship and select energy-efficient GHs, which will be elaborated in Section III-A. With the aid of this K-means grouping algorithm, the groups in a cluster are constructed and updated.

6) If the role of a user changes (i.e. it switches from a normal group member to a GH or vice versa), it will inform the BS of its new state. The BS will add or remove the associated group ID from its group list.

Footnote to step 1: Since the number of users simultaneously switching on is normally extremely low, contention-free transmission of the registration message could be realized by predefining a small set of specific channel resources.

The above-mentioned construction and maintenance procedures are illustrated in Fig. 2.

C. Two-Phase Random Access

As depicted in Fig. 3, we divide the proposed RA procedure into phase-I and phase-II. During phase-I, after receiving the RAO message from the BS, the GHs of all the active groups in the cell first transmit their cluster preambles $s^{\mathrm{I}}_k$. Bear in mind that $s^{\mathrm{I}}_k$ of a GH $\dot{u}_{k,m}$ has been specified during the registration process introduced in Section II-B. Furthermore, a group $g_{k,m}$ is regarded as an active group if one or more members of this group want to transmit payload data to the BS. Accordingly, the signal received at the BS during phase-I can be written as

$$y^{\mathrm{I}} = \sum_{k=1}^{K} \sum_{m=1}^{M^{(k)}} a_{k,m} \sqrt{P_{k,m}}\, s^{\mathrm{I}}_k\, h_{k,m \to b} + \omega, \tag{1}$$

where the activity state $a_{k,m} \in \{0, 1\}$ indicates the activity of group $g_{k,m}$: if $g_{k,m}$ is active, $a_{k,m} = 1$; otherwise $a_{k,m} = 0$. $h_{k,m \to b}$ is the CSI of the channel from the GH $\dot{u}_{k,m}$ to the BS, which simultaneously encapsulates both the large-scale fading and the small-scale fading. $\omega$ is an AWGN vector, whose elements obey an i.i.d. complex Gaussian distribution with zero mean and variance $\sigma^2$. $P_{k,m}$ is the actual transmit power of the GH $\dot{u}_{k,m}$, which is given by

$$P_{k,m} = P \cdot \frac{\beta_{\min}}{\beta_{k,m}}, \tag{2}$$

where $P$ is a common transmit power that can be afforded by all the GHs. The value of $P$ could be assigned to the users during their registration process. $\beta_{k,m}$ is the average large-scale fading coefficient of the channel from the GH $\dot{u}_{k,m}$ to the BS. It could be continuously updated by measuring the reference signals transmitted by the BS, e.g. the reference signals included in the registration response message, the GUO message and the RAO message. $\beta_{\min}$ is the minimum value among $\beta_{k,m}$, $k \in \{1, 2, \cdots, K\}$, $m \in \{1, 2, \cdots, M^{(k)}\}$, which could be carried in the RAO message.

Fig. 2: Construction and maintenance of the layered grouping network framework.

Footnotes to step 5 of Section II-B: the group initialization applies in the case that a group has not been created before; the group update applies in the case that a group already exists.
By substituting (2) into (1), it is apparent that specifying the actual transmit power $P_{k,m}$ according to (2) is equivalent to employing an adaptive power control mechanism. Hence, at the BS, the average power of the signal received from $\dot{u}_{k,m}$ approximately equals $P \cdot \beta_{\min}$ [17]. During a RAO, the number of active groups in a cluster $c_k$ is termed its cluster load. Hence, based on the received signal $y^{\mathrm{I}}$, the BS is capable of estimating the cluster load of $c_k$, which is given by

$$\hat{M}^{(k)}_{ac} = \frac{\langle y^{\mathrm{I}}, s^{\mathrm{I}}_k \rangle}{\sqrt{P \beta_{\min}}}. \tag{3}$$

Correspondingly, the sparsity of cluster $c_k$ is approximated by

$$\hat{\lambda}_k = \frac{\hat{M}^{(k)}_{ac}}{M^{(k)}}. \tag{4}$$

After the cluster load estimation formulated by (3), the BS can rank the access priority of the different clusters according to their cluster loads $\hat{M}^{(k)}_{ac}$. A higher access priority is assigned to the cluster having a larger cluster load. As indicated by the largest streams in the middle and bottom of Fig. 3, the GHs in the most heavily loaded cluster will be the first to send their group access preambles and their payload data, respectively. Based on the cluster load estimation, as well as the compressive sensing theorem, the BS will adaptively select the group preamble length $L_k = |s^{\mathrm{II}}_{k,m}|$ for the different clusters. After ranking the access priority and selecting the group preamble length, the BS can arrange the access slots of every cluster. Then, the cluster-specific access slots and preamble lengths that will be used in phase-II are broadcast by the BS. This message is called the "Phase-II solution message" (PSM) in Fig. 3. More details of the DPS strategy are provided in Section III-C.

After receiving the phase-II solution message, the GHs of all the active groups will generate their unique preambles $s^{\mathrm{II}}_{k,m}$. The associated generation method has been determined in the user registration process and the preamble length is indicated by the phase-II solution message. At this moment, the proposed two-phase RA procedure starts its phase-II operations.
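The power control of (2) and the cluster load estimate of (3)–(4) can be checked numerically with a small sketch. It rests on simplifying assumptions that are not part of the paper's model: the channel is real-valued and consists of large-scale fading only ($h = \sqrt{\beta}$), there is no noise, and the orthonormal cluster signatures are taken to be normalized Walsh-Hadamard rows. All function names are hypothetical.

```python
import math
from typing import List


def walsh_signatures(num_clusters: int) -> List[List[float]]:
    """Unit-norm Walsh-Hadamard rows used as orthogonal cluster signatures."""
    rows = [[1.0]]
    while len(rows) < num_clusters:
        # Sylvester construction: H_2n = [[H, H], [H, -H]].
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    norm = math.sqrt(len(rows[0]))
    return [[v / norm for v in r] for r in rows[:num_clusters]]


def transmit_power(p_common: float, beta_min: float, beta: float) -> float:
    """Equation (2): channel-inverting power control."""
    return p_common * beta_min / beta


def received_phase1(active_betas: List[List[float]], signatures: List[List[float]],
                    p_common: float, beta_min: float) -> List[float]:
    """Noise-free version of (1) with h = sqrt(beta) (assumption)."""
    y = [0.0] * len(signatures[0])
    for s_k, betas in zip(signatures, active_betas):
        for beta in betas:  # one entry per active group head in cluster k
            amp = math.sqrt(transmit_power(p_common, beta_min, beta)) * math.sqrt(beta)
            for i, v in enumerate(s_k):
                y[i] += amp * v
    return y


def estimate_loads(y: List[float], signatures: List[List[float]],
                   p_common: float, beta_min: float) -> List[float]:
    """Equation (3): correlate with each signature and rescale."""
    scale = math.sqrt(p_common * beta_min)
    return [sum(a * b for a, b in zip(y, s_k)) / scale for s_k in signatures]


def estimate_sparsity(loads: List[float], groups: List[int]) -> List[float]:
    """Equation (4): estimated fraction of active groups per cluster."""
    return [m_ac / m for m_ac, m in zip(loads, groups)]


if __name__ == "__main__":
    sigs = walsh_signatures(4)
    betas = [[1e-6, 4e-6], [2e-6], [], [1e-6, 3e-6, 5e-6]]
    y = received_phase1(betas, sigs, p_common=0.1, beta_min=1e-6)
    print([round(v) for v in estimate_loads(y, sigs, 0.1, 1e-6)])
```

Under these idealized assumptions every active GH arrives with amplitude $\sqrt{P\beta_{\min}}$, so the correlation in (3) simply counts the active groups of each cluster; with small-scale fading and noise the estimate becomes approximate.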
Firstly, the GHs of all the active groups in the same cluster will simultaneously send their group preambles during a specific access slot that has been indicated by the phase-II solution message, as illustrated by the shadowed arrows having dashed, solid, and dotted borders in the middle of Fig. 3. During the access slot of cluster $c_k$, the signal received at the BS is given by

$$y^{\mathrm{II}}_k = \sum_{m=1}^{M^{(k)}} a_{k,m} \sqrt{P_k}\, s^{\mathrm{II}}_{k,m}\, h_{k,m \to b} + \omega_k = \sqrt{P_k}\, S^{\mathrm{II}}_k x_k + \omega_k, \tag{5}$$

where $a_{k,m}$ and $h_{k,m \to b}$ have been defined in (1). $\omega_k$ is the AWGN vector in the access slot of $c_k$. $P_k$ is the standard transmission power of all the GHs in phase-II. Let $L_k$ denote the length of $s^{\mathrm{II}}_{k,m}$; then we have $S^{\mathrm{II}}_k = [s^{\mathrm{II}}_{k,1}, s^{\mathrm{II}}_{k,2}, \cdots, s^{\mathrm{II}}_{k,M^{(k)}}] \in \mathbb{R}^{L_k \times M^{(k)}}$ and $x_k = [x_{k,1}, x_{k,2}, \cdots, x_{k,M^{(k)}}]^T$, where $x_{k,m} = a_{k,m} h_{k,m \to b}$.

Footnote on the group preambles: In this paper, complex Gaussian sequences having zero mean and variance $1/|s^{\mathrm{II}}_{k,m}|$ are employed as $s^{\mathrm{II}}_{k,m}$.

Fig. 3: Transmission stream of the two-phase grant-free random access procedure.

As illustrated in the middle of Fig. 3, after the active group detection of all the clusters is completed, the BS will broadcast a payload data transmission solution (PDTS) message to every cluster. The active group IDs detected by the BS and their payload data transmission time slots are carried in the PDTS message. Finally, as observed at the bottom of Fig. 3, the GH of every active group will relay the payload data of its group members to the BS in the time slot indicated by the PDTS message.

III. GROUP MAINTENANCE AND DETECTION ALGORITHMS

A. Group Initialization and Update Algorithms

As stated in Section II-B, during step 5 of the construction and maintenance procedure of the layered grouping framework, after receiving the group update opportunity message from the BS, the existing GHs $\dot{u}_{k,m}$, $1 \le m \le M^{(k)}$, will broadcast grouping messages. The group ID is contained in these grouping messages. A registered neighboring non-GH user $u_n$ will select the GH from which it hears the strongest grouping message as its intended GH. Then, $u_n$ broadcasts a subscribing message (SM), which carries its user ID and its intended group ID. After receiving all the subscribing messages, a GH $\dot{u}_{k,m}$ is capable of determining all of its group members, and then feeds back this temporary decision of $G_{k,m}$ to all the group members. The transmission of the above-mentioned intra-group signaling messages will rely on dedicated spectrum such as 5905-5925 MHz in 5G NR V2X (PC5 interface) [32], or on unlicensed spectrum technologies, namely D2D outband communications [30]. Hence, the impact of these extra overheads on the cellular radio access network (RAN) is negligible. This intra-group signaling is summarized as lines 3-5 in Algorithm 1.

The proposed Opt-EC based K-means grouping algorithm aims at selecting the best GH, which simultaneously minimizes the energy required by intra-group communications and that required by external cellular communications. In order to realize this optimization, every group member should be aware of all the channel conditions from the other group members to itself, namely $H_n = \{h_{i,n} : n \in G_{k,m},\ i = 1, 2, \cdots, |g_{k,m}|,\ i \neq n\}$. Again, this requirement could be satisfied by exploiting D2D outband communications and the associated overhead is negligible for the cellular RAN.
The update of $H_n$ for every group member is summarized as line 6 in Algorithm 1. Herein, we further assume that the transmit power, packet size and bandwidth of the reference signals contained in the intra-group signaling messages are fixed to $P$, $D$, and $B$, respectively.

Footnote on the two energy terms: This is because, during the ensuing payload transmission slot, the group members will first send their data to the GH, and then the GH relays all the data to the BS.

Algorithm 1: Opt-EC based K-means grouping algorithm

Initialize: cluster index $k$.
begin
    while (predefined max iterations has not been reached) do
        for ($m = 1$ to $M^{(k)}$) do
            $\dot{u}_{k,m}$ selected in last iteration broadcasts grouping message;
        end for
        for ($n = 1$ to $N$) do
            for ($m = 1$ to $M^{(k)}$) do
                $u_n$ attempts to hear from $\dot{u}_{k,m}$ and estimate $h_{\dot{u}_{k,m}, n}$;
            end for
            $u_n$ subscribes to the group $\arg\max_{\dot{u}_{k,m} \text{ heard by } u_n} h_{\dot{u}_{k,m}, n}$;
        end for
        for ($m = 1$ to $M^{(k)}$) do
            Update $G_{k,m}$ and every $H_n$;
            $\dot{u}_{k,m} = \arg\min_{u_n \in G_{k,m}} \gamma_{k,m}(u_n)$;
        end for
    end while
end

Assume that $u_n$, $u_i$ are group members of $g_{k,m}$, i.e. $n, i \in G_{k,m}$. The achievable error-free data rate from $u_i$ to $u_n$ could be characterized by

$$R_{i,n} = B \log_2\left[1 + \frac{P |h_{i,n}|^2}{N_o}\right],$$

where $N_o$ denotes the power density of the additive Gaussian noise. Accordingly, if we select $u_n$ as the GH $\dot{u}_{k,m}$, the energy required by intra-group communications in the ensuing payload transmission slot could be characterized by

$$\varepsilon^{(n)}_{\mathrm{inner}} = \sum_{i \in G_{k,m},\, i \neq n} \frac{P \cdot D}{R_{i,n}}.$$

Similarly, the achievable error-free data rate from $u_n$ to the BS could be characterized by

$$R_{n,b} = B \log_2\left[1 + \frac{P |h_{n,b}|^2}{N_o}\right].$$

Again, if we select $u_n$ as the GH $\dot{u}_{k,m}$, the energy required by external communications between $u_n$ and the BS could be characterized by

$$\varepsilon^{(n)}_{\mathrm{outer}} = \frac{P \cdot |g_{k,m}| \cdot D}{R_{n,b}}.$$

Finally, the energy efficiency of selecting $u_n$ as the GH $\dot{u}_{k,m}$ could be characterized by

$$\gamma_{k,m}(u_n) = \varepsilon^{(n)}_{\mathrm{inner}} + \varepsilon^{(n)}_{\mathrm{outer}},$$

where a smaller value of $\gamma_{k,m}(u_n)$ implies a better energy efficiency. $\gamma_{k,m}(u_n)$, $n \in G_{k,m}$, will be calculated at the group member $u_n$ and then be forwarded to the current GH.
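The GH selection metric can be evaluated directly from the rate and energy expressions above. The following minimal sketch assumes that squared channel gains $|h|^2$ are already known and symmetric on D2D links, and that the rate uses a base-2 logarithm; the dictionary layout and function names are hypothetical choices made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def rate(bandwidth: float, power: float, gain: float, noise: float) -> float:
    """Achievable rate B * log2(1 + P * |h|^2 / N_o)."""
    return bandwidth * math.log2(1.0 + power * gain / noise)


def energy_metric(candidate: int, members: Sequence[int],
                  d2d_gain: Dict[Tuple[int, int], float],
                  bs_gain: Dict[int, float], power: float,
                  packet_bits: float, bandwidth: float, noise: float) -> float:
    """gamma(u_n): intra-group energy plus relay-to-BS energy for a candidate GH."""
    inner = 0.0
    for i in members:
        if i == candidate:
            continue
        # Gains are stored once per unordered pair (reciprocal channels).
        gain = d2d_gain[(min(i, candidate), max(i, candidate))]
        inner += power * packet_bits / rate(bandwidth, power, gain, noise)
    # The GH relays one packet per group member to the BS.
    outer = power * len(members) * packet_bits / rate(
        bandwidth, power, bs_gain[candidate], noise)
    return inner + outer


def select_group_head(members: Sequence[int],
                      d2d_gain: Dict[Tuple[int, int], float],
                      bs_gain: Dict[int, float], power: float,
                      packet_bits: float, bandwidth: float, noise: float) -> int:
    """Pick the member minimizing gamma, i.e. the most energy-efficient GH."""
    return min(members, key=lambda n: energy_metric(
        n, members, d2d_gain, bs_gain, power, packet_bits, bandwidth, noise))


if __name__ == "__main__":
    d2d = {(0, 1): 3.0, (1, 2): 3.0, (0, 2): 1.0}
    bs = {0: 1.0, 1: 3.0, 2: 15.0}
    print(select_group_head([0, 1, 2], d2d, bs, 1.0, 1.0, 1.0, 1.0))
```

In this toy example member 2 wins despite a weaker D2D link to member 0, because its much better link to the BS lowers the relaying energy, which scales with the group size.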
Hence, by running the Opt-EC based K-means algorithm, the GH of $g_{k,m}$ could be selected according to $\dot{u}_{k,m} = \arg\min_{u_n \in G_{k,m}} \gamma_{k,m}(u_n)$, which is summarized as line 7 in Algorithm 1.

The above-mentioned operations can be repeated among the cellular users for further adjusting the grouping relationships and optimizing the GH selections. In practice, however, owing to the limited energy budget, the iterations of Algorithm 1 have to be terminated within a predefined maximum number.

B. MMSE Denoiser Based AMP Algorithm
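To make the AMP iteration formulated in (7)–(9) of this subsection concrete, here is a minimal sketch of the soft-thresholding variant. It is a simplification, not the MMSE-denoiser version of (10) used for active group detection: the model is real-valued and noise-free, the measurement matrix has i.i.d. Gaussian entries of variance $1/L$, and `alpha` is a hypothetical tuning multiplier applied to the threshold $\tau_t$.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def soft_threshold(value: float, tau: float) -> float:
    """Soft thresholding eta(v; tau): shrink v toward zero by tau."""
    return math.copysign(max(abs(value) - tau, 0.0), value)


def amp_recover(A: Matrix, y: List[float], iterations: int = 50,
                alpha: float = 1.5) -> List[float]:
    """Real-valued AMP with a soft-thresholding denoiser."""
    L, M = len(A), len(A[0])
    x = [0.0] * M   # x^0 = 0
    z = list(y)     # residual z^0 = y
    for _ in range(iterations):
        # Threshold from the residual energy, tau_t ~ ||z^t|| / sqrt(L),
        # scaled by the assumed multiplier alpha.
        tau = alpha * math.sqrt(sum(v * v for v in z) / L)
        # Pseudo-data A^T z^t + x^t (A^* reduces to A^T for a real matrix).
        r = [x[j] + sum(A[i][j] * z[i] for i in range(L)) for j in range(M)]
        x = [soft_threshold(v, tau) for v in r]
        # Average derivative of the denoiser: 1 on surviving entries, else 0.
        avg_derivative = sum(1 for v in x if v != 0.0) / M
        # Residual update with the Onsager term, where 1/mu = M/L.
        Ax = [sum(A[i][j] * x[j] for j in range(M)) for i in range(L)]
        z = [y[i] - Ax[i] + (M / L) * avg_derivative * z[i] for i in range(L)]
    return x


def demo(seed: int = 0) -> Tuple[float, bool]:
    """Recover a sparse 0/1 activity vector; returns (relative error, support ok)."""
    rng = random.Random(seed)
    M, L, active = 200, 100, 10
    support = set(rng.sample(range(M), active))
    x_true = [1.0 if j in support else 0.0 for j in range(M)]
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(L)) for _ in range(M)] for _ in range(L)]
    y = [sum(A[i][j] * x_true[j] for j in range(M)) for i in range(L)]
    x_hat = amp_recover(A, y)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, x_true)))
    detected = {j for j, v in enumerate(x_hat) if abs(v) > 0.5}
    return err / math.sqrt(active), detected == support


if __name__ == "__main__":
    print(demo())
```

The only difference from plain iterative thresholding is the last term of the residual update; dropping it typically slows convergence noticeably, which is why the Onsager term is singled out in the discussion that follows.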

The classical system model used in compressive sensing is represented as

$$y = Ax + \omega, \tag{6}$$

where $x$ is the original signal vector with $M$ elements, $A \in \mathbb{R}^{L \times M}$ is the measurement matrix and $\omega$ is an AWGN vector. Since $M \gg L$, $y$ is a compressed and corrupted observation of $x$. As an efficient method of recovering $x$ from $y$, the approximate message passing (AMP) algorithm was first proposed in [33]. Its theoretical derivation can be found in [34].

In more detail, the AMP algorithm is formulated by the following iterative procedure

$$x^{t+1} = \eta_t(A^{*} z^{t} + x^{t}, \tau_t), \tag{7}$$

$$z^{t+1} = y - A x^{t+1} + \frac{1}{\mu} z^{t} \left\langle \eta'_t(A^{*} z^{t} + x^{t}; \tau_t) \right\rangle, \tag{8}$$

$$\tau_t \approx \frac{1}{\sqrt{L}} \| z^{t} \|_2, \tag{9}$$

where $x$ is initialized to a zero vector, i.e. $x^{0} = 0$, $\eta_t(\cdot)$ is the soft thresholding function, $t$ is the iteration index, $x^{t}$ is the estimate of $x$ at the $t$-th iteration, $z^{t}$ is the residual, $A^{*}$ denotes the conjugate transpose of $A$, $\langle \cdot \rangle$ denotes the average of a vector, $\eta'_t$ is the first derivative of $\eta_t$ with respect to its first argument, and $\mu = L/M$ is the under-sampling ratio. In contrast to the conventional iterative thresholding algorithms, the term containing $\eta'_t(A^{*} z^{t} + x^{t}; \tau_t)$ is a new component introduced by the AMP algorithm. It is known as the "Onsager reaction term" and is identified as the fundamental improvement of the AMP algorithm.

Furthermore, in [15], the soft thresholding denoiser $\eta_t(\cdot)$ is developed into an MMSE denoiser as follows

$$\eta_t(\hat{x}^{t}_{m}, g_m) = \mathbb{E}\left[ X \mid \hat{X}^{t} = \hat{x}^{t}_{m}, G = g_m \right], \tag{10}$$

where $X$, $\hat{X}^{t}$, $\hat{x}^{t}_{m}$ and $g_m$ have the same definitions as in [15]. This MMSE denoiser can employ the large-scale fading coefficient $G$ known at the BS as a priori information for the AMP algorithm, and hence results in a better recovery accuracy. The above MMSE based AMP algorithm is employed to solve the active group detection problem by replacing the classical compressive sensing model in (6) with the group access model in (5). Accordingly, the variables $y$, $A$, $x$, $\omega$ involved in (6)-(9) are replaced by the variables $y^{II}_{k}$, $S^{II}_{k}$, $x_k$, $\omega_k$ given in (5), respectively. The under-sampling ratio $\mu$ in (8) is calculated as $L_k / M^{(k)}$. The total number of elements $M$ and the number of nonzero elements $M_{ac}$ in $x$ are replaced by the total number of groups $M^{(k)}$ and the number of active groups $M^{(k)}_{ac}$ in cluster $c_k$, respectively.

C. Dynamic Preamble Selection Algorithm

Algorithm 2: Dynamic preamble selection algorithm.
Input: $y^{I}$
Output: $V$, $L = [L_1, L_2, \cdots, L_K]$; /* $V$ indicates the access slot indices of every cluster, $L$ indicates the preamble lengths selected for every cluster. */
Initialize: $K$, target $pF$, target $pM$, $\{M^{(k)}, k = 1, 2, \cdots, K\}$, $\{[R^{(k)}_1, R^{(k)}_2], k = 1, 2, \cdots, K\}$; /* $R^{(k)}_1$, $R^{(k)}_2$ are the inner and outer radius of the $k$-th cluster, respectively. */
begin
  for $k = 1$ to $K$ do
    $\hat{M}^{(k)}_{ac}$ = Estimate Cluster Load($y^{I}$, $k$); /* according to (3). */
    $L_k$ = Lower Bound on MPL($\hat{M}^{(k)}_{ac}$, $M^{(k)}$, $R^{(k)}_1$, $R^{(k)}_2$, $pF$, $pM$); /* according to (18), (20) and (23). */
    $L[k] = 1.1 * L_k$; /* slightly enlarge $L_k$, see Section IV. */
    $\hat{M}_{ac}[k] = \hat{M}^{(k)}_{ac}$;
  end for
  $V$ = Allocate Access Slot($\hat{M}_{ac}$, $L$); /* arrange the access priority of every cluster in the descending order of $\hat{M}^{(k)}_{ac}$, then allocate the access slot indices of every cluster according to $L$. */
  Return: $V$, $L$;
end
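The control flow of Algorithm 2 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the per-cluster load estimates are passed in directly instead of being computed from $y^{I}$, the lower bound on the MPL is supplied as a hypothetical callable `lower_bound_mpl`, and the access slot allocation is interpreted as assigning consecutive start offsets in descending order of estimated load, which is one possible reading of the "Allocate Access Slot" step.

```python
import math
from typing import Callable, List, Sequence, Tuple


def dynamic_preamble_selection(
    est_loads: Sequence[float],
    lower_bound_mpl: Callable[[int, float], float],
    margin: float = 1.1,
) -> Tuple[List[int], List[int]]:
    """Return (slot_start, lengths) for every cluster.

    est_loads[k]       estimated number of active groups in cluster k
    lower_bound_mpl    hypothetical stand-in for the bound evaluated in Algorithm 2
    margin             enlargement factor applied to the bound (1.1 in Algorithm 2)
    """
    # Preamble length per cluster: enlarged lower bound, rounded up to an integer.
    lengths = [
        math.ceil(margin * lower_bound_mpl(k, load))
        for k, load in enumerate(est_loads)
    ]
    # Clusters with a higher estimated load get access priority.
    order = sorted(range(len(est_loads)), key=lambda k: est_loads[k], reverse=True)
    slot_start = [0] * len(est_loads)
    cursor = 0
    for k in order:
        slot_start[k] = cursor  # assumed interpretation of the slot index
        cursor += lengths[k]
    return slot_start, lengths


# Toy usage with an invented linear bound (illustrative only).
est_loads = [10.0, 30.0, 20.0]
toy_bound = lambda k, load: 4.3 * load
slot_start, lengths = dynamic_preamble_selection(est_loads, toy_bound)
print(slot_start, lengths)
```

In this sketch the most heavily loaded cluster transmits first, and each subsequent cluster starts where the previous preamble ends, so the total Phase-II preamble duration is the sum of the selected lengths.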

In the context of our active group detection, and according to CS theory [18], [35], the minimum preamble length (MPL) required for a given detection accuracy depends on the number of active groups in a cluster, $M^{(k)}_{ac}$, and on the total number of groups in that cluster, $M^{(k)}$. In practice, the values of $M^{(k)}_{ac}$ and $M^{(k)}$ differ markedly among clusters. Hence, employing a preamble of inappropriate length results in either a serious detection inaccuracy or an excessive overhead. Accordingly, the DPS algorithm shown in Algorithm 2 is designed and employed in Phase-II of the random access. The technical challenge of Algorithm 2 lies in its line 3, which evaluates the MPL. Owing to its analytical complexity and importance, it is discussed separately in Section IV.

IV. THEORETICAL ANALYSIS ON THE MINIMUM PREAMBLE LENGTH

According to Section III-C, finding the MPL required by the MMSE based AMP algorithm for achieving a given data recovery accuracy is of great importance. Similar attempts have been made in [18] and [35]. However, two deficiencies of the MPL calculation method given in [35] prevent us from applying it in our DPS algorithm: (1) two constant parameters are involved, i.e., instead of an exact value, it only provides an asymptotic order; (2) it is not related to a specific data reconstruction method. Furthermore, the authors of [36] and [37] attempted to answer this fundamental question from the perspective of classical asymptotic information-theoretic analysis. In [37], the search for the MPL is termed the "minimum user-identification cost" problem. In their Gaussian many-access channel (MnAC) model, the minimum number of channel uses for guaranteeing an error-free random user identification is given by

$$L = \frac{N \cdot H\left(\frac{N_{ac}}{N}\right)}{\log(1 + N_{ac}\gamma)}, \tag{11}$$

where $N$ denotes the total number of cellular users, $N_{ac}$ denotes the average number of active cellular users, and $\gamma$ denotes the signal-to-noise ratio (SNR). It is assumed in [37] that every active user is subject to the same power constraint $\gamma$. Besides, the entropy function is defined as $H(p) = -p \log(p) - (1-p) \log(1-p)$. However, the theorems provided in [37] are still not suitable for our scenario, for two reasons: (1) the result in (11) is not related to any specific active user detection algorithm, hence a significant gap may exist between the MPL required by the AMP algorithm and that predicted by (11), as illustrated in Fig. 4; (2) only Gaussian noisy channels are considered.
In contrast, both large-scale and small-scale fading effects are taken into account in our system in order to model a more practical random access scenario.

Fig. 4: MPL versus sparsity. The MPL required by the MMSE-AMP algorithm and that predicted by [37] are compared. AWGN channels are assumed.

In the following, we provide a tight lower bound on the MPL for the MMSE-AMP algorithm. The state evolution method proposed in [15] is employed, where the mean square error (MSE) of data reconstruction is regarded as a state variable. In more detail, at every iteration of the MMSE based AMP algorithm, $\hat{X}^{t}$ in (10) is modeled as a noise-corrupted signal. Hence, $\hat{X}^{t}$ can be formulated as

$$\hat{X}^{t} = X + \tau_t \cdot \omega, \tag{12}$$

where the random variable $\omega$ follows a complex Gaussian distribution with zero mean and unit variance. Then, $\tau_t^2$ is the variance of the noise in the $t$-th estimate $\hat{X}^{t}$. Particularly, according to [15], $\tau_t$ is given by

$$\tau_{t+1}^2 = \frac{\sigma^2}{P_k L_k} + \frac{M^{(k)}}{L_k} \mathrm{MSE}(\tau_t), \tag{13}$$

where $\sigma^2$ is the variance of the background noise $\omega$ involved in (1). The function $\mathrm{MSE}(\cdot)$ evaluates the MSE of its input variable and is specified in [15].

Based on the state evolution method [15], in order to achieve a high data reconstruction accuracy, the recursive reconstruction process formulated by (7)-(9) has to converge. This means that the variance of the AMP estimate, namely $\tau_t^2$, should steadily decrease towards $\frac{\sigma^2}{P_k L_k}$. Hence, the following inequality holds

$$\tau_{t+1}^2 \le \tau_t^2, \quad \forall t. \tag{14}$$

By substituting (13) into the inequality (14), a lower bound on the preamble length for guaranteeing the convergence of the AMP algorithm is given by

$$L_k \ge \frac{\frac{\sigma^2}{P_k} + M^{(k)} \mathrm{MSE}(\tau_t)}{\tau_t^2}. \tag{15}$$

In order to evaluate the performance of the AMP algorithm, two metrics are invoked: (1) the probability of missed detection ($pM$) in cluster $c_k$; (2) the probability of false alarm ($pF$) in cluster $c_k$. They are defined as

$$pM^{(k)} = \frac{\sum_{m=1}^{M^{(k)}} \mathbb{1}\{\hat{x}_{k,m} < \theta \;\&\; x_{k,m} = 1\}}{\sum_{m=1}^{M^{(k)}} \mathbb{1}\{x_{k,m} = 1\}}, \tag{16}$$

$$pF^{(k)} = \frac{\sum_{m=1}^{M^{(k)}} \mathbb{1}\{\hat{x}_{k,m} \ge \theta \;\&\; x_{k,m} = 0\}}{\sum_{m=1}^{M^{(k)}} \mathbb{1}\{x_{k,m} = 0\}}, \tag{17}$$

where $x_{k,m}$ is defined in (5), $\hat{x}_{k,m}$ is the estimate of $x_{k,m}$ given by the AMP based active group detection, and $\theta$ denotes the decision threshold employed by the AMP algorithm. When $\hat{x}_{k,m} \ge \theta$, the AMP algorithm regards $g_{k,m}$ as an active group. Then, according to the state evolution method, the $pM^{(k)}$ and $pF^{(k)}$ that can be achieved in the $t$-th AMP iteration are characterized by

$$\begin{cases} pF^{(k)} = e^{-\frac{\theta^2}{\tau_t^2}}, \\ pM^{(k)} = \frac{1}{M} \sum_{m=1}^{M} \left(1 - e^{-\frac{\theta^2}{\tau_t^2 + g_m^2}}\right) = \int \left(1 - e^{-\frac{\theta^2}{\tau_t^2 + g^2}}\right) \cdot P^{(k)}_G(g)\, dg, \end{cases} \tag{18}$$

where $P^{(k)}_G(g)$ is the probability density function of the large-scale fading coefficient $g$. The random variable $g$ takes both the path-loss effect and the shadowing effect into account. In more detail, the path-loss effect is modeled as $\alpha + \beta \log_{10}(d)$, where $d$ is the distance between a group head and the BS. The shadowing effect follows a log-normal distribution with a variance of $\sigma_s^2$.

In practical applications, we aim at a target performance of $pM$ and $pF$, namely $pM_{obj}$ and $pF_{obj}$, respectively. Then, by substituting the targets $pM_{obj}$ and $pF_{obj}$ into (18), the appropriate decision threshold $\theta$ and the required value of $\tau_t$ can be determined for a given large-scale fading model. The associated solutions of $\theta$ and $\tau_t$ are denoted by $\theta_{obj}$ and $\tau_{obj}$, respectively.

Bearing the above in mind, in order to obtain $\theta_{obj}$ and $\tau_{obj}$, we have to specify the large-scale fading model. According to the proposed layered grouping network framework shown in Fig. 1, the users of a cluster are uniformly located in the same ring, whose inner and outer radii are represented by $R_1$ and $R_2$, respectively. Hence the distance $d$ between a user and the BS lies in $[R_1, R_2]$.
Accordingly, the probability density function of the large-scale fading coefficient can be formulated as

$$P_G(g) \triangleq a_1 g^{-\gamma_1} Q_1(g) - a_2 g^{-\gamma_2} Q_2(g), \tag{19}$$

where

a = ( R − R ) β √ π exp ( ( ln 10 ) σ s β − ( ) αβ ) , a = R ( R − R ) β √ π exp ( ( ln 10 ) σ s β − ln ( ) αβ ) , γ , β + , γ , β + , Q i ( g ) = ∫ b ln g + c i b ln g + c i exp ( − s ) ds , i ∈ { , } c i = − α − β log ( R ) √ σ s − i β b , i ∈ { , } c i = − α − β log ( R ) √ σ s − i β b , i ∈ { , } b = − √ ( ) σ s . (20)

According to the state evolution method, if the AMP algorithm is to achieve the target performance of $pM_{obj}$ and $pF_{obj}$, then the following inequality has to hold once a sufficiently large iteration number $t$ is chosen

$$\tau_{t+1}^2 \le \tau_{obj}^2 \le \tau_t^2. \tag{21}$$

Substituting (13) into the above inequality results in

$$L_k \ge \frac{\frac{\sigma^2}{P_k} + M^{(k)} \mathrm{MSE}(\tau_t)}{\tau_{obj}^2}. \tag{22}$$

Then, it is provable that $\mathrm{MSE}(\cdot)$ is a monotonically increasing function over the relevant range of $g$. Hence we have $\mathrm{MSE}(\tau_{obj}) \le \mathrm{MSE}(\tau_t)$ in practical scenarios. This implies that replacing $\mathrm{MSE}(\tau_t)$ by $\mathrm{MSE}(\tau_{obj})$ in (22) yields a relaxed lower bound (LB) on the MPL, which is given by

$$L_k \ge \frac{\frac{\sigma^2}{P_k} + M^{(k)} \mathrm{MSE}(\tau_{obj})}{\tau_{obj}^2}. \tag{23}$$

Fig. 5: Comparison between simulated MPL and its lower bound with respect to different transmit powers ($P_k$ = 18, 24 and 30). The practical preamble length required by the conventional "no grouping" RA scheme [15] is provided for comparison.

For example, in practice, simultaneously achieving $pM_{obj} = 0.05$ and $pF_{obj} = 0.05$ may be an acceptable active group detection performance. Given the large-scale fading model in (19)-(20), the associated values of $\theta_{obj}$ and $\tau_{obj}$ are obtained by solving (18). Assume that the network configuration, including the SNR, the cluster size $M^{(k)}$ and the sparsity $\lambda$, is known. Then, by substituting the resulting $\tau_{obj}$ into (23), we can calculate the lower bound on the MPL that enables the AMP algorithm to achieve the target active user detection performance.
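The evaluation of the relaxed bound can be sketched in a few lines of Python. The sketch assumes that (23) reads $L_k \ge (\sigma^2/P_k + M^{(k)}\,\mathrm{MSE}(\tau_{obj}))/\tau_{obj}^2$ and that the false-alarm line of (18) reads $pF = e^{-\theta^2/\tau^2}$. The function $\mathrm{MSE}(\cdot)$ is specified in [15] and is not reproduced here, so it is passed in as a callable; the constant `toy_mse` and all numerical values below are invented placeholders and not results from the paper.

```python
import math
from typing import Callable


def threshold_for_pf(tau: float, pf_target: float) -> float:
    """Invert pF = exp(-theta^2 / tau^2) for the decision threshold theta."""
    return tau * math.sqrt(-math.log(pf_target))


def mpl_lower_bound(sigma2: float, p_k: float, num_groups: int,
                    tau_obj: float, mse: Callable[[float], float]) -> float:
    """Relaxed lower bound: (sigma^2 / P_k + M * MSE(tau_obj)) / tau_obj^2."""
    return (sigma2 / p_k + num_groups * mse(tau_obj)) / tau_obj ** 2


# Hypothetical stand-in for the MSE function of [15]; NOT the real expression.
toy_mse = lambda tau: 0.01

theta = threshold_for_pf(tau=0.5, pf_target=0.05)
bound = mpl_lower_bound(sigma2=2.0, p_k=4.0, num_groups=100,
                        tau_obj=0.5, mse=toy_mse)
print(theta, bound)
```

Passing the MSE function as an argument keeps the sketch independent of the specific denoiser, so the same routine can be reused once the expression from [15] is available.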

Fig. 5 shows that the lower bound on the MPL in (23) is quite close to the actual MPL estimated by Monte Carlo simulations for different transmit powers, although the discrepancy grows slightly as the sparsity $\lambda$ increases. The comparison between the simulated MPL and its lower bound with respect to different cluster coverages is illustrated in Fig. 6, which demonstrates that the discrepancy between the simulated MPL and its lower bound is affected by the cluster coverage. This is because the large-scale fading effect depends on the cluster coverage, as formulated in (19) and (20). For a low sparsity of $\lambda \le 0.05$, this discrepancy does not exceed 10% of the theoretical lower bound. Therefore, in Algorithm 2, we first calculate the lower bound on the MPL for a specific cluster $c_k$; the $L_k$ employed by the active groups in $c_k$ is then set 10% higher than the lower bound.

Furthermore, the conventional no grouping GF RA scheme proposed in [15], which also employs the MMSE based AMP algorithm, is shown in Fig. 5 and Fig. 6 as well. Both figures demonstrate that the MPL required by the proposed layered grouping based RA scheme is always significantly less than that required by its counterpart in [15], regardless of the SNR value, the sparsity and the coverage. For the sake of fair comparison, the total number of cellular users remains the same in Fig. 5 and Fig. 6.

Fig. 6: Comparison between simulated MPL and its lower bound with respect to different cluster coverages. The practical preamble length required by the conventional "no grouping" RA scheme [15] is provided for comparison. (Legend: $K = 2$, $M = 1000$; $N = 10000$, $[R^{(1)}_1, R^{(1)}_2] = [0, 500]$; $k = 1$, $[R^{(1)}_1, R^{(1)}_2] = [0, 500]$; $k = 2$, $[R^{(2)}_1, R^{(2)}_2] = [500, 1000]$.)

V. SIMULATION RESULTS AND DISCUSSIONS

In this section, the active user detection performance of the proposed two-phase DPS aided RA scheme is simulated. The results are compared with the classical CS aided RA that does not exploit any grouping strategy [15] and with the conventional group paging aided RA that does not exploit CS technologies [38]. Without loss of generality, the group size $|g_{k,m}|$ and the number of groups in different clusters $M^{(k)}$ are fixed to constants regardless of the indices $k$ and $m$. Other system parameters are listed in Table I.

TABLE I: System Configuration.
Parameter | Value
Radius of the cell | 1000 m
Path-loss model | 15.[...] + [...] (d (m))
Variance of shadowing $\sigma_s$ | [...]
$N_o$ | -99 dBm
Total cellular devices $N$, $K$, $|g_{k,m}|$, $M^{(k)}$ | [...]
Preamble type of $s^{I}_{k}$ | Walsh Seq., $|s^{I}_{k}|$ = [...]
TX power $P$ defined in (2) | 23 dBm
Preamble type of $s^{II}_{k,m}$ | Gaussian Random Seq.
TX power $P_k$ defined in (5) | 23 dBm
Cluster coverages $[R^{(k)}_1, R^{(k)}_2]$ | [...]

In Fig. 7, the $pF$ and $pM$ versus transmit power of the two-phase DPS aided RA are compared with those of the CS aided RA [15], which also employs the MMSE based AMP algorithm. It is generally accepted that $pF = pM$ implies a good performance balance of active user detection. Hence the decision threshold employed by the MMSE based AMP algorithm is adjusted to achieve $pF = pM$.

Comparing the solid line labelled by diamonds with the dashed line labelled by crosses in Fig. 7, the two-phase DPS aided RA achieves a dramatic power gain over the conventional CS based RA [15] while using the same preamble length of 400. Comparing the solid line labelled by triangles with the solid line labelled by squares in Fig. 7, the grouping strategy of [...]

Fig. 7: pF, pM versus transmit power of the two-phase DPS aided RA. It is clear that the layered grouping scheme employing a preamble length of only 800 can approach the performance of the no grouping scheme [15], which employs a very long preamble length of 2600. (Legend: $N = 20000$; no grouping with $L = 400$ and $L = 2600$; $K = 2$, $M = 2000$ with $|s^{II}_{k,m}| = 400$ and $800$; $K = 4$, $M = 1000$ with $|s^{II}_{k,m}| = 400$ and $800$.)

[...] ($pS$) in cluster $c_k$ as

$$pS^{(k)} = \frac{\sum_{m=1}^{M^{(k)}} \mathbb{1}\{\hat{x}_{k,m} \ge \theta \;\&\; x_{k,m} = 1\}}{\sum_{m=1}^{M^{(k)}} \mathbb{1}\{x_{k,m} = 1\}}. \tag{24}$$

According to this definition and equation (18), $pF$, $pM$ and $pS$ have the following relationships

$$\begin{cases} pS^{(k)} = 1 - pM^{(k)} = 1 - \int \left(1 - e^{-\frac{\theta^2}{\tau_t^2 + g^2}}\right) \cdot P^{(k)}_G(g)\, dg, \\ pS^{(k)} = 1 - \frac{1-\lambda_k}{\lambda_k} pF^{(k)} = 1 - \frac{1-\lambda_k}{\lambda_k} e^{-\frac{\theta^2}{\tau_t^2}}. \end{cases} \tag{25}$$

The parameter $\tau_t$ involved in (25) can be determined according to (13). Hence, (25) enables us to theoretically analyze the active group detection performance of the proposed RA strategy.

As a result, the $pS$ achieved by the preamble length fixed strategy is compared with that achieved by the DPS strategy in Fig. 8, where both of them employ the two-phase RA framework and the grouping strategy is fixed to $K = 4$, $M = 1000$. [...] the $pS$ of grouped RA still rapidly drops as the sparsity grows. In contrast, benefiting from the DPS aided RA scheme, the system is capable of achieving a high $pS$ throughout the entire sparsity region. Actually, the DPS strategy approaches a similar active group detection performance to a preamble length fixed counterpart having $|s^{II}_{k,m}|$ = [...]

Fig. 8: pS versus sparsity performance of the two-phase DPS aided RA, compared with the preamble length fixed strategy ($K = 4$, $M = 1000$). The preamble length required by the dynamic scheme is always less than 210 after averaging over all clusters.

[...] (1) [...], i.e. $\varepsilon_{pmb-I} = \frac{1}{N \cdot \lambda} \sum_{k=1}^{K} \sum_{m=1}^{M^{(k)}} a_{k,m} P_{k,m} |s^{I}_{k}|$; (2) the energy consumed by waiting for the Phase-II solution message, i.e. $\varepsilon_{wait-I} = P_{wait} \cdot T_{wait-I}$; (3) the energy required by processing the received Phase-II solution message, i.e. $\varepsilon_{pcs-I} = P_{pcs} \cdot T_{pcs-I}$; (4) the energy required by transmitting access preambles for the sake of active group identification, i.e. $\varepsilon_{pmb-II} = \frac{1}{N \cdot \lambda} \sum_{k=1}^{K} \sum_{m=1}^{M^{(k)}} a_{k,m} P_{k} |s^{II}_{k}|$; (5) the energy consumed by the waiting model (as defined in [39]) in the entire "Phase-II" duration, i.e. $\varepsilon_{wait-II} = P_{wait} \cdot T_{wait-II}$; (6) the energy required by processing the PDTS message, i.e. $\varepsilon_{pcs-II} = P_{pcs-II} \cdot T_{pcs-II}$. Hence, the average random access energy per active user can be calculated as

$$\varepsilon = \varepsilon_{pmb-I} + \varepsilon_{wait-I} + \varepsilon_{pcs-I} + \varepsilon_{pmb-II} + \varepsilon_{wait-II} + \varepsilon_{pcs-II}. \tag{26}$$

In the above energy terms, $P_{k,m}$ and $P_k$ are defined in (2) and (5), respectively. The values of $P_{wait}$, $P_{pcs}$, $T_{wait-I}$ and $T_{wait-II}$ are specified according to similar parameters given in [38]-[39]. Particularly, $T_{wait-II}$ is the average waiting time of an active group in the entire "Phase-II" duration, which equals the duration of "Phase-II" minus the length $|s^{II}_{k}|$ and $T_{pcs-II}$. The duration of "Phase-II" is determined by the system configuration ($N$, $K$, $\lambda$, etc.) and can be obtained experimentally, as shown in Fig. 12. More specifically, according to the LTE standard, up to 839 symbols can be transmitted within a single time slot (i.e. 0.5 ms). Hence, we equivalently employ the number of symbols as our time metric.

Furthermore, if we replace the practical preamble length $|s^{II}_{k}|$ used in (26) by its theoretical lower bound (LB) specified in (23), the LB of $\varepsilon_{pmb-II}$ is given by

$$\varepsilon^{*}_{pmb-II} = \frac{1}{N \cdot \lambda} \sum_{k=1}^{K} \sum_{m=1}^{M^{(k)}} a_{k,m} P_k \, \frac{\frac{\sigma^2}{P_k} + M^{(k)} \mathrm{MSE}(\tau_{obj})}{\tau_{obj}^2}. \tag{27}$$

Simultaneously, $T_{wait-II}$ is also minimized, which results in the minimization of $\varepsilon_{wait-II}$, namely $\varepsilon^{*}_{wait-II}$. Substituting $\varepsilon^{*}_{pmb-II}$ and $\varepsilon^{*}_{wait-II}$ into (26), we refer to the resultant $\varepsilon$ as the theoretical RA energy of our two-phase DPS aided RA, namely $\varepsilon^{*}$.

Fig. 9: RA energy comparison between the proposed scheme, the group-paging RA scheme [38] and the no grouping RA scheme [15], where $N$ = [...], $\lambda$ = [...] and $K$ increases from 2 to 8. (Legend: Group-paging [39]; No grouping [15]; Simulated RA Energy: $\varepsilon$; Theoretical RA Energy: $\varepsilon^{*}$.)

Fig. 10: RA energy comparison between the proposed scheme, the group-paging RA scheme [38] and the no grouping RA scheme [15], where $N$ = [...], $\lambda$ = [...] and $|g_{k,m}|$ increases from 5 to 20. (Legend: Group-paging [39]; No grouping [15]; Simulated RA Energy: $\varepsilon$; Theoretical RA Energy: $\varepsilon^{*}$.)

Observe in Fig. 9 and Fig. 10 that, on the condition of approaching the same $pS \ge$ [...] $pF$ and $pM$. As can be observed from Fig. 11, for the given condition, the proposed RA scheme is capable of supporting more than $10^{[\ldots]}$ users. In contrast, the no grouping strategy [15] can only support approximately 2500 users for the same amount of physical resources.

Fig. 11: Comparison of the affordable number of coexisting users in a cell between the proposed RA and the no grouping CS aided RA [15].

Finally, the average time required by an active group head $\dot{u}_{k,m}$ to complete the random access procedure is adopted as our delay metric. According to the proposed RA procedure in Fig. 3 and in line with our RA energy analysis, the RA delay of $\dot{u}_{k,m}$ consists of six major components: (1) the time $T$ required by all active GHs for transmitting their cluster preambles of length $|s^{I}_{k}|$, $k = 1, 2, \cdots, K$; (2) the waiting time $T_{wait-I}$ during "Phase-I"; (3) the time $T_{pcs-I}$ required for processing the Phase-II solution message; (4) the time $T$ required by an active GH for transmitting its group access preamble; (5) the waiting time $T_{wait-II}$ during "Phase-II"; and (6) the time $T_{pcs-II}$ required for processing the PDTS message. Again, the number of symbols is employed as the time metric.

Fig. 12: Access delay versus number of clusters in the proposed two-phase GF RA scheme. (Legend: No grouping [15]; $|g_{k,m}| = 5$; $|g_{k,m}| = 15$.)

The random access delay performance of the proposed DPS aided RA scheme is shown in Fig. 12, where $\lambda$ = [...] and $N$ = [...]. [...] $K$ grows. However, benefiting from the DPS scheme, employing a large group size can slightly mitigate the latency. Consequently, our proposal may be more suitable for mMTC devices that have a relatively high tolerance of time delay. Fortunately, benefiting from the DPS aided RA scheme, the random access delay does not increase linearly with the number of clusters.

VI. CONCLUSIONS

With the explosive growth of IoT devices (which may approach around 125 billion by 2030 [21]), mMTC will become one of the most important services of forthcoming B5G networks. An extremely large number of devices are accommodated in a single cell, and neighbouring devices are inclined to have similar communication behaviors and QoS requirements. These fundamental characteristics of the future mMTC scenario motivate us to propose a two-phase DPS aided RA scheme. Benefiting from the proposed layered grouping network framework, instead of a large number of active users directly accessing the BS, only the GHs of active groups access the BS on behalf of all the active members. The mechanism of constructing and maintaining this layered grouping network framework, as well as the two-phase RA procedure, are carefully designed, especially the Opt-EC based K-means grouping algorithm, the orthogonal sequence based cluster load estimation, the dynamic preamble selection and the MMSE based AMP algorithm. A tight lower bound on the MPL required by the AMP algorithm for achieving a given detection accuracy is provided. In summary, compared with the existing RA schemes, the proposed DPS aided RA scheme achieves three major improvements: a) reducing the access overhead, as shown in Fig. 5, Fig. 6 and Fig. 8; b) saving the access energy, as shown in Fig. 7 and Fig. 9; c) increasing the number of coexisting cellular users, as shown in Fig. 11, at the price of a relatively longer access delay.

REFERENCES

[1] 3GPP, "IMT-2020 connection density evaluation results (mMTC)," R1-1808866, Aug. 2018.
[2] C. Bockelmann et al., "Towards massive connectivity support for scalable mMTC communications in 5G networks," IEEE Access, vol. 6, pp. 28969–28992, 2018.
[3] 3GPP, "IMT-2020 self evaluation: mMTC coverage, data rate, latency & battery life," R1-1905187, Apr. 2019.
[4] G. Durisi, T. Koch, and P. Popovski, "Toward massive, ultrareliable, and low-latency wireless communication with short packets," Proceedings of the IEEE, vol. 104, no. 9, pp. 1711–1726, 2016.
[5] 3GPP, "Uplink multiple access schemes for NR," R1-165174, May 2016.
[6] 3GPP, "Discussion on grant-free transmission," R1-166095, Aug. 2016.
[7] 3GPP, "Grant-based and grant-free multiple access for mMTC," R1-164268, May 2016.
[8] Z. Ding, Z. Yang, P. Fan, and H. V. Poor, "On the performance of non-orthogonal multiple access in 5G systems with randomly deployed users," IEEE Signal Processing Letters, vol. 21, no. 12, pp. 1501–1505, 2014.
[9] A. Bayesteh, E. Yi, H. Nikopour, and H. Baligh, "Blind detection of SCMA for uplink grant-free multiple-access," in 2014 11th International Symposium on Wireless Communications Systems (ISWCS), 2014.
[10] W. Tang, S. Kang, B. Ren, and X. Yue, "Uplink grant-free pattern division multiple access (GF-PDMA) for 5G radio access," China Communications, 2018.
[11] Z. Yuan, C. Yan, Y. Yuan, and W. Li, "Blind multiple user detection for grant-free MUSA without reference signal," in 2017 IEEE 86th Vehicular Technology Conference (VTC-Fall), 2017.
[12] 3GPP, "Contention-based non-orthogonal multiple access for UL mMTC," R1-164269, May 2016.
[13] F. Fazel, M. Fazel, and M. Stojanovic, "Random access compressed sensing over fading and noisy communication channels," IEEE Transactions on Wireless Communications, vol. 12, no. 5, pp. 2114–2125, 2013.
[14] J. Hong, W. Choi, and B. D. Rao, "Sparsity controlled random multiple access with compressed sensing," IEEE Transactions on Wireless Communications, vol. 14, no. 2, pp. 998–1010, 2015.
[15] Z. Chen, F. Sohrabi, and W. Yu, "Sparse activity detection for massive connectivity," IEEE Transactions on Signal Processing, vol. 66, pp. 1890–1904, Apr. 2018.
[16] L. Liu and W. Yu, "Massive connectivity with massive MIMO part I: Device activity detection and channel estimation," IEEE Transactions on Signal Processing, vol. 66, pp. 2933–2946, June 2018.
[17] K. Senel and E. G. Larsson, "Grant-free massive MTC-enabled massive MIMO: A compressive sensing approach," IEEE Transactions on Communications, vol. 66, no. 12, pp. 6164–6175, 2018.
[18] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, pp. 4203–4215, Dec. 2005.
[19] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, pp. 2197–2202, 2003.
[20] 3GPP, "Considerations and evaluation results for IMT-2020 for mMTC connection density," R1-1903968, Apr. 2019.
[21] S. K. Sharma and X. Wang, "Toward massive machine type communications in ultra-dense cellular IoT networks: Current issues and machine learning-assisted solutions," IEEE Communications Surveys & Tutorials, vol. 22, no. 1, pp. 426–471, 2020.
[22] P. Si, J. Yang, S. Chen, and H. Xi, "Adaptive massive access management for QoS guarantees in M2M communications," IEEE Transactions on Vehicular Technology, vol. 64, pp. 3152–3166, July 2015.
[23] X. Hu, C. Zhong, X. Chen, W. Xu, and Z. Zhang, "Cluster grouping and power control for angle-domain mmWave MIMO NOMA systems," IEEE Journal of Selected Topics in Signal Processing, vol. 13, pp. 1167–1180, Sep. 2019.
[24] A. Biral, M. Centenaro, A. Zanella, L. Vangelista, and M. Zorzi, "The challenges of M2M massive access in wireless cellular networks," Digital Communications and Networks, vol. 1, no. 1, pp. 1–19, 2015.
[25] G. Tsoukaneri, M. Condoluci, T. Mahmoodi, M. Dohler, and M. K. Marina, "Group communications in narrowband-IoT: Architecture, procedures, and evaluation," IEEE Internet of Things Journal, vol. 5, pp. 1539–1549, June 2018.
[26] C. Tu, C. Ho, and C. Huang, "Energy-efficient algorithms and evaluations for massive access management in cellular based machine to machine communications," in 2011 IEEE Vehicular Technology Conference (VTC Fall), pp. 1–5, Sep. 2011.
[27] W. Zhan, X. Sun, Y. Li, F. Tian, and H. Wang, "Optimal group paging frequency for machine-to-machine communications in LTE networks with contention resolution," IEEE Internet of Things Journal, vol. 6, pp. 10534–10545, Dec. 2019.
[28] E. Lau and W. Chung, "Enhanced RSSI-based real-time user location tracking system for indoor and outdoor environments," in 2007 International Conference on Convergence Information Technology (ICCIT 2007), pp. 1213–1218, Nov. 2007.
[29] A. Bhattacharjee, M. R. Islam, and M. A. Ullah, "Weather independent positioning of MS using power slope patterning algorithm via one BS," in 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 1–3, Feb. 2019.
[30] B. Han, V. Sciancalepore, O. Holland, M. Dohler, and H. D. Schotten, "D2D-based grouped random access to mitigate mobile access congestion in 5G sensor networks," IEEE Communications Magazine, vol. 57, pp. 93–99, Sept. 2019.
[31] J. Lianghai, B. Han, M. Liu, and H. D. Schotten, "Applying device-to-device communication to enhance IoT services," IEEE Communications Standards Magazine, vol. 1, no. 2, pp. 85–91, 2017.
[32] FuTURE & TIAA V2X Working Group, "Spectrum requirements in automatic driving 5G NR V2X direct communications," World 5G Convention, Beijing, Nov. 2019.
[33] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing: II. Analysis and validation," in 2010 IEEE Information Theory Workshop on Information Theory (ITW 2010, Cairo), pp. 1–5, Jan. 2010.