Max-Min Fairness Based on Cooperative-NOMA Clustering for Ultra-Reliable and Low-Latency Communications

Fateme Salehi, Naaser Neda, and Mohammad-Hassan Majidi

F. Salehi, N. Neda (corresponding author), and M.-H. Majidi are with the Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran.

Abstract

In this paper, we study a cooperative relaying technique in a non-orthogonal multiple access (NOMA) system, called cooperative NOMA (C-NOMA), for short packet communications with finite blocklength (FBL) codes. We examine the performance of decode-and-forward (DF) relaying combined with selection combining (SC) and maximum ratio combining (MRC) at the receiver. Our goal is C-NOMA-based user clustering that maximizes fair throughput in a downlink (DL) NOMA scenario. In each cluster, the user with the stronger channel (strong user) acts as a relay for the other user (weak user), and power and blocklength are allocated optimally to achieve max-min throughput. To this end, we first consider a single cluster and propose an optimal resource allocation. We also propose a suboptimal algorithm that converges to a near-optimal solution. Finally, we extend the problem to a general scenario and propose a suboptimal C-NOMA-based clustering. Numerical results show that the proposed C-NOMA scheme, with both SC and MRC, significantly improves the users' fair throughput compared to NOMA. The results also show that the proposed C-NOMA-based clustering outperforms hybrid clustering in average throughput, while the fairness index degrades slightly.

Index Terms: finite blocklength, short packet communication, URLLC, cooperative NOMA, max-min fairness, user clustering

I. INTRODUCTION

The ever-increasing new demands, such as the tactile internet, high-resolution video streaming, virtual/augmented reality, and autonomous vehicles, come with diverse requirements that can be challenging to meet in terms of reliability and latency. Most existing mobile networks were designed for traditional mobile broadband (MBB) services. In contrast, the Internet of Things (IoT) aims to connect a large number of devices with minimal human intervention. IoT applications fall into two classes: massive machine-type communications (mMTC) and ultra-reliable low-latency communications (URLLC). mMTC involves many low-cost devices with massive connectivity and long battery-lifetime requirements. URLLC, on the other hand, mainly serves mission-critical services, in which uninterrupted and robust data exchange matters more than anything else. Short packets with FBL codes are used to reduce transmission delay and support low-latency communication. Shannon capacity assumes an infinite blocklength. In the FBL regime, by contrast, the decoding error probability at the receiver is not negligible because the blocklength is short [1]. Polyanskiy et al. derived a tight approximation of the achievable rate in the FBL regime over the AWGN channel [2]. Later work extended this line of research to MIMO channels with quasi-static fading [3] and to quasi-static fading channels with retransmissions [4]. Furthermore, [5] studied the effect of short packets on spectrum sharing, and [6] studied the scheduling of delay-sensitive packets. In [7], massive MIMO was advocated for maximizing the achievable uplink data rate in industrial applications, with both MRC and zero-forcing (ZF) receivers.

In [8], resource allocation for a secure mission-critical IoT communication system was studied under finite blocklength. Two optimization problems were addressed: weighted throughput maximization and total transmit power minimization. The authors in [9] proposed a cross-layer framework for optimizing user association, packet offloading rates, and bandwidth allocation in mission-critical IoT scenarios. NOMA performance in the FBL regime was studied in [10]-[13]. In [10], optimal power and blocklength allocation was considered in a high-SNR scenario, and the reduction in NOMA transmission delay relative to OMA was derived in closed form. In [11], the transmission rate and power allocation of the NOMA scheme were optimized to maximize the effective throughput of the strong user while guaranteeing the throughput of the other user at a certain level. The transmitter's energy with a hybrid transmission […] et al. considered optimal power and blocklength allocation in OMA, NOMA, relaying, and C-NOMA transmission schemes to minimize the weak user's decoding error probability, while guaranteeing the reliability of the strong user at a certain level.

In this work, we consider DL transmission with NOMA users and apply the cooperative relaying technique in a short packet communication scenario. We assume that the two users in the same cluster have a large channel gain difference. The strong user performs SIC, detects the weak user's data, and acts as a relay. The weak user receives its data from the BS and from the relay separately and can apply SC or MRC to detect it. Our main contributions are summarized as follows:

1) We derive each user's decoding error probability in the C-NOMA transmission scheme for both SC and MRC, assuming perfect CSI. To the best of our knowledge, this is the first application of MRC to the C-NOMA scheme in the FBL regime.

2) To guarantee the quality of service (QoS) of the weak user and to improve fairness, we jointly optimize power and blocklength in both the NOMA and relaying phases. The objective is to maximize the minimum throughput of the two users for each combining strategy, subject to latency, reliability, and energy constraints.

3) We propose a suboptimal solution with near-optimal performance that reduces the complexity of the optimal resource allocation, and we determine the computational complexity of both solutions.

4) We extend the proposed scheme to a multi-user scenario and propose a suboptimal 2-user clustering based on C-NOMA. Simulation results show that its performance is comparable to that of the exhaustive-search optimal algorithm.

The remainder of this paper is organized as follows. Section II presents the system model and the direct transmission analysis in the FBL regime. Section III analyzes the performance of C-NOMA transmission with the SC and MRC strategies. Section IV formulates the problem for a single cluster. Section V proposes an optimal solution and a suboptimal solution. Section VI extends the problem to a multi-user scenario and proposes a user clustering scheme. Section VII presents numerical results, and Section VIII concludes the paper.

II. PRELIMINARY ISSUES

A. System Model

As shown in Fig. 1(a), URLLC users with different QoS requirements are grouped into clusters. We consider a cooperative relaying scenario in a DL system with one BS and two NOMA users in each C-NOMA cluster. For simplicity, we first assume a single cluster. In phase I, the NOMA phase, the BS transmits a NOMA frame of $m^{\mathrm{I}}$ symbols that carries the data of both users ($N_1$ bits for user 1 and $N_2$ bits for user 2). User 1, the strong user, performs SIC, decodes user 2's data, and forwards it to user 2 in a frame of $m^{\mathrm{II}}$ symbols in phase II, the relaying phase. The instantaneous channel coefficients of the BS-user 1, BS-user 2, and user 1-user 2 links capture both small-scale and large-scale fading and are denoted by $h_1$, $h_2$, and $h_{1,2}$, respectively. The channels are assumed to be quasi-static Rayleigh fading: they stay constant during one frame and vary independently from frame to frame. Following the power-domain NOMA principle in a two-user scenario, the BS transmits $\sum_{i=1}^{2}\sqrt{p_i^{\mathrm{I}}}\,x_i$, where $x_i$ is the message of user $i$, $i \in \{1, 2\}$, and $p_i^{\mathrm{I}}$ is the power allocated to user $i$ in phase I. The received signal at user $i$ is therefore
$$y_i^{\mathrm{I}} = \left(\sqrt{p_1^{\mathrm{I}}}\,x_1 + \sqrt{p_2^{\mathrm{I}}}\,x_2\right)h_i + n_i,$$
where $n_i$ is complex additive white Gaussian noise with variance $\sigma^2$. Without loss of generality, we assume $|h_1|^2 > |h_2|^2$, so more power should be allocated to user 2. User 1 can then perform SIC to remove the interference, while user 2 suffers from the interference and cannot cancel it. If user 1 decodes $x_2$ correctly, it re-encodes it and transmits the signal $\sqrt{p_2^{\mathrm{II}}}\,\tilde{x}_2$. The received signal at user 2 in the relaying phase is
$$y_2^{\mathrm{II}} = \sqrt{p_2^{\mathrm{II}}}\,\tilde{x}_2\,h_{1,2} + n_{1,2},$$
where $p_2^{\mathrm{II}}$ is the power that the relay (user 1) allocates to user 2 in phase II, and $n_{1,2}$ is complex additive white Gaussian noise with variance $\sigma^2$. To implement this scheme, user 1 must know whether SIC succeeded. We therefore assume that the BS sends the channel coding information of both user 1 and user 2 to user 1 over an error-free dedicated channel. With the channel code, user 1 can check whether the decoded data are correct, and hence whether SIC succeeded [21].

Fig. 1. (a) System model; (b) frame structure.

B. Direct Transmission Analysis in the FBL Regime

According to [4], for a finite blocklength of $m$ symbols and an acceptable block error rate (BLER) $\epsilon$, the achievable data rate $R$ is tightly approximated by
$$R \approx C - \sqrt{\frac{V}{m}}\,\frac{Q^{-1}(\epsilon)}{\ln 2}, \tag{1}$$
where $C = \log_2(1+\gamma)$ is the Shannon capacity, $\gamma$ is the SNR/SINR, $Q^{-1}(\cdot)$ is the inverse Gaussian Q-function, and $V = 1 - (1+\gamma)^{-2}$ is the channel dispersion. In the FBL regime, transmission is not error-free even with perfect CSI, and the decoding error probability is
$$\epsilon \approx f(\gamma, R, m) = Q\left(\sqrt{\frac{m}{V}}\,(C - R)\ln 2\right). \tag{2}$$

Note that $x_2$ is user 2's data at rate $N_2/m^{\mathrm{I}}$, while $\tilde{x}_2$ is the same data at rate $N_2/m^{\mathrm{II}}$.

III. PERFORMANCE ANALYSIS OF C-NOMA TRANSMISSION

We assume that the receivers have perfect CSI and that the BS and each user have a single antenna. User 2 can use either combining strategy, SC or MRC. In phase I, user 2 detects $x_2$ directly and treats $x_1$ as interference. The decoding error probability of $x_2$ at user 2 in phase I, $\epsilon_{2,2}^{\mathrm{I}}$, follows from (2) as
$$\epsilon_{2,2}^{\mathrm{I}} \approx f\left(\gamma_{2,2}^{\mathrm{I}}, R_{2,2}^{\mathrm{I}}, m^{\mathrm{I}}\right), \tag{3}$$
where $\gamma_{2,2}^{\mathrm{I}} = \frac{p_2^{\mathrm{I}}|h_2|^2}{p_1^{\mathrm{I}}|h_2|^2 + \sigma^2}$ and $R_{2,2}^{\mathrm{I}} = N_2/m^{\mathrm{I}}$ are the received SINR and the achievable rate of user 2 for detecting $x_2$ in phase I. Because $x_2$ is detected directly, $\epsilon_{2,2}^{\mathrm{I}}$ is also the overall error probability of user 2 in phase I, i.e., $\epsilon_2^{\mathrm{I}} = \epsilon_{2,2}^{\mathrm{I}}$. User 1, in contrast, performs SIC: it first decodes $x_2$ while treating $x_1$ as interference. The decoding error probability of $x_2$ at user 1 in phase I, $\epsilon_{1,2}^{\mathrm{I}}$, is similarly approximated as
$$\epsilon_{1,2}^{\mathrm{I}} \approx f\left(\gamma_{1,2}^{\mathrm{I}}, R_{1,2}^{\mathrm{I}}, m^{\mathrm{I}}\right), \tag{4}$$
where $\gamma_{1,2}^{\mathrm{I}} = \frac{p_2^{\mathrm{I}}|h_1|^2}{p_1^{\mathrm{I}}|h_1|^2 + \sigma^2}$ and $R_{1,2}^{\mathrm{I}} = N_2/m^{\mathrm{I}}$ are the received SINR and the achievable rate of user 1 for detecting $x_2$ in phase I. If user 1 decodes and removes $x_2$ successfully, it can detect $x_1$ without interference. The decoding error probability of $x_1$ at user 1 in phase I, $\epsilon_{1,1}^{\mathrm{I}}$, is then
$$\epsilon_{1,1}^{\mathrm{I}} \approx f\left(\gamma_{1,1}^{\mathrm{I}}, R_{1,1}^{\mathrm{I}}, m^{\mathrm{I}}\right), \tag{5}$$
where $\gamma_{1,1}^{\mathrm{I}} = p_1^{\mathrm{I}}|h_1|^2/\sigma^2$ and $R_{1,1}^{\mathrm{I}} = N_1/m^{\mathrm{I}}$ are the received SNR and the achievable rate of user 1 for detecting $x_1$ in phase I. User 1 detects $x_1$ only after a successful SIC, and URLLC services usually target $\epsilon$ between $10^{-5}$ and $10^{-9}$. The overall decoding error probability at user 1 in phase I can therefore be approximated as
$$\epsilon_1^{\mathrm{I}} = \epsilon_{1,2}^{\mathrm{I}} + \left(1 - \epsilon_{1,2}^{\mathrm{I}}\right)\epsilon_{1,1}^{\mathrm{I}} \approx \epsilon_{1,2}^{\mathrm{I}} + \epsilon_{1,1}^{\mathrm{I}}. \tag{6}$$
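To make (2)-(6) concrete, the sketch below evaluates the phase I error probabilities for a given power split. It is a minimal illustration with three assumptions: the noise variance is normalized to $\sigma^2 = 1$, so `g1` and `g2` stand for $|h_1|^2/\sigma^2$ and $|h_2|^2/\sigma^2$; the function names are hypothetical; and the numbers are illustrative, not the paper's simulation settings.

```python
import math


def q_func(x: float) -> float:
    """Gaussian Q-function Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fbl_error(snr: float, rate: float, m: int) -> float:
    """Normal-approximation BLER f(gamma, R, m) of eq. (2)."""
    if snr <= 0.0:
        return 1.0  # nothing is decodable without signal power
    capacity = math.log2(1.0 + snr)            # C in bits per channel use
    dispersion = 1.0 - (1.0 + snr) ** -2       # V = 1 - (1 + gamma)^-2
    return q_func(math.sqrt(m / dispersion) * (capacity - rate) * math.log(2.0))


def phase1_errors(p1, p2, g1, g2, n1, n2, m1):
    """Phase I error probabilities of eqs. (3)-(6) (hypothetical helper).

    g1, g2 are |h_1|^2/sigma^2 and |h_2|^2/sigma^2 (noise normalized to 1),
    n1, n2 are the payloads in bits, and m1 is the phase I blocklength m^I.
    """
    sinr_22 = p2 * g2 / (p1 * g2 + 1.0)   # user 2 decodes x2, x1 is interference
    sinr_12 = p2 * g1 / (p1 * g1 + 1.0)   # user 1 decodes x2 first (SIC step)
    snr_11 = p1 * g1                      # user 1 decodes x1 after removing x2
    eps_22 = fbl_error(sinr_22, n2 / m1, m1)   # eq. (3)
    eps_12 = fbl_error(sinr_12, n2 / m1, m1)   # eq. (4)
    eps_11 = fbl_error(snr_11, n1 / m1, m1)    # eq. (5)
    eps_1 = eps_12 + (1.0 - eps_12) * eps_11   # exact form in eq. (6)
    return eps_22, eps_12, eps_11, eps_1


# Illustrative values only: p^I = (1, 4) W, |h_1|^2/sigma^2 = 20, |h_2|^2/sigma^2 = 2
e22, e12, e11, e1 = phase1_errors(1.0, 4.0, 20.0, 2.0, n1=150, n2=80, m1=100)
print(f"eps_22={e22:.2e} eps_12={e12:.2e} eps_11={e11:.2e} eps_1={e1:.2e}")
print(f"approximation eps_12 + eps_11 = {e12 + e11:.2e}")
```

The exact and approximate forms of (6) differ only by the product $\epsilon_{1,2}^{\mathrm{I}}\epsilon_{1,1}^{\mathrm{I}}$. This product is negligible at URLLC error levels, which is why the paper can use the sum.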

Since half-duplex operation is assumed, user 1 does not receive the relayed signal. The overall decoding error probability at user 1 is therefore $\epsilon_1 = \epsilon_1^{\mathrm{I}}$. The overall decoding error probability of user 2, in contrast, depends on the phase II signal and on the combining strategy. The following subsections derive the corresponding expressions.

A. Selection Combining (SC)

In this protocol, user 2 does not combine the NOMA-phase and relaying-phase signals. Instead, it decodes the messages from the BS and from the relay (user 1) separately and keeps the correctly decoded packet. It first decodes the message received from user 1 in the relaying phase. If this decoding fails, or if no signal arrives from user 1, it decodes the message sent by the BS in the NOMA phase. To distinguish the packets, a packet ID is inserted in the packet header for each device. An error therefore occurs only when both transmissions fail. The decoding error probability of $\tilde{x}_2$ at user 2 in phase II, $\epsilon_{2,2}^{\mathrm{II}}$, is
$$\epsilon_{2,2}^{\mathrm{II}} \approx f\left(\gamma_{2,2}^{\mathrm{II}}, R_{2,2}^{\mathrm{II}}, m^{\mathrm{II}}\right), \tag{7}$$
where $\gamma_{2,2}^{\mathrm{II}} = p_2^{\mathrm{II}}|h_{1,2}|^2/\sigma^2$ and $R_{2,2}^{\mathrm{II}} = N_2/m^{\mathrm{II}}$ are the received SNR and the achievable rate of user 2 for detecting $\tilde{x}_2$ in phase II. The phase II signal is transmitted only if user 1 decodes user 2's message correctly in phase I. The overall decoding error probability of user 2 in phase II is therefore approximated as
$$\epsilon_2^{\mathrm{II}} = \epsilon_{1,2}^{\mathrm{I}} + \left(1 - \epsilon_{1,2}^{\mathrm{I}}\right)\epsilon_{2,2}^{\mathrm{II}} \approx \epsilon_{1,2}^{\mathrm{I}} + \epsilon_{2,2}^{\mathrm{II}}. \tag{8}$$
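The relaying-phase terms (7) and (8) reuse the same normal approximation. The sketch below rests on several assumptions. The helper name is hypothetical. `g12` stands for $|h_{1,2}|^2/\sigma^2$ with the noise normalized to 1. `eps_12` is the SIC error from (4). Returning 1 when $m^{\mathrm{II}} = 0$ is a convention chosen for this sketch only, meaning that no relayed copy arrives.

```python
import math


def q_func(x: float) -> float:
    """Gaussian Q-function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fbl_error(snr: float, rate: float, m: int) -> float:
    """Normal-approximation BLER of eq. (2)."""
    if snr <= 0.0 or m <= 0:
        return 1.0
    capacity = math.log2(1.0 + snr)
    dispersion = 1.0 - (1.0 + snr) ** -2
    return q_func(math.sqrt(m / dispersion) * (capacity - rate) * math.log(2.0))


def relay_phase_error(p2_relay, g12, n2, m2, eps_12):
    """Return (eps_22^II, eps_2^II) from eqs. (7) and (8) (hypothetical helper)."""
    if m2 <= 0:
        # Convention of this sketch: without a relaying phase the relayed copy never arrives.
        return 1.0, 1.0
    eps_22_ii = fbl_error(p2_relay * g12, n2 / m2, m2)    # eq. (7)
    eps_2_ii = eps_12 + (1.0 - eps_12) * eps_22_ii        # exact form in eq. (8)
    return eps_22_ii, eps_2_ii


# Illustrative values only: p_2^II = 2 W, |h_{1,2}|^2/sigma^2 = 5, N_2 = 80 bits, m^II = 60
e22_ii, e2_ii = relay_phase_error(2.0, 5.0, n2=80, m2=60, eps_12=1e-6)
print(f"eps_22^II={e22_ii:.2e} eps_2^II={e2_ii:.2e}")
```

The floor set by `eps_12` shows why the relayed copy can never be more reliable than user 1's SIC step.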

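Finally, the overall decoding error probability of user 2 with the SC strategy is
$$\epsilon_2 = \epsilon_2^{\mathrm{I}}\,\epsilon_2^{\mathrm{II}} \approx \epsilon_{2,2}^{\mathrm{I}}\left(\epsilon_{1,2}^{\mathrm{I}} + \epsilon_{2,2}^{\mathrm{II}}\right). \tag{9}$$

B. Maximum Ratio Combining (MRC)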

When user 2 applies MRC, the coding rates of the BS-user 2 and user 1-user 2 links differ, so the bottleneck link, i.e., the link with the lowest coding rate, is decisive. The MRC-combined signal therefore has a frame length of $m^{\mathrm{C}} = \max(m^{\mathrm{I}}, m^{\mathrm{II}})$ symbols and the SINR
$$\gamma_{2,2}^{\mathrm{C}} = \frac{m^{\mathrm{I}}}{m^{\mathrm{C}}}\,\gamma_{2,2}^{\mathrm{I}} + \frac{m^{\mathrm{II}}}{m^{\mathrm{C}}}\,\gamma_{2,2}^{\mathrm{II}}. \tag{10}$$
The probability that user 2 fails to decode the MRC-combined signal is
$$\epsilon_{2,2}^{\mathrm{C}} \approx f\left(\gamma_{2,2}^{\mathrm{C}}, R_{2,2}^{\mathrm{C}}, m^{\mathrm{C}}\right), \tag{11}$$
where $R_{2,2}^{\mathrm{C}} = N_2/m^{\mathrm{C}}$ is the achievable rate of user 2 for the MRC-combined packet. User 2 fails in two cases. In the first, neither user 1 nor user 2 decodes $x_2$ correctly in phase I. In the second, user 1 decodes $x_2$ correctly but user 2 fails to decode the combined signal. The overall decoding error probability of user 2 with the MRC strategy is therefore
$$\epsilon_2 = \epsilon_{1,2}^{\mathrm{I}}\,\epsilon_{2,2}^{\mathrm{I}} + \left(1 - \epsilon_{1,2}^{\mathrm{I}}\right)\epsilon_{2,2}^{\mathrm{C}}. \tag{12}$$

IV. PROBLEM FORMULATION

In the considered URLLC system, the two users are served fairly over two phases within a total of $D_{\max}$ symbol periods. If channel feedback is available at the transmitter, the users' data rates can be adapted to their instantaneous channel conditions. In that case, max-min fairness is a suitable criterion [22]. The throughput of user $i$, $T_i$, is defined as the average number of correctly decoded bits per channel use (complex symbol):
$$T_i = \frac{(1 - \epsilon_i)\,m^{\mathrm{I}} R_{i,i}^{\mathrm{I}}}{D_{\max}}, \tag{13}$$
where $1 - \epsilon_i$ is the reliability of user $i$, whose required level is predefined for each URLLC use case. In the C-NOMA scheme, superposition coding is performed in the NOMA phase, so the BS can transmit both users' signals simultaneously with different powers within a frame of length $m^{\mathrm{I}}$. After decoding user 2's data, user 1 forwards it in the relaying phase within a frame of length $m^{\mathrm{II}}$. Fig. 1(b) shows the frame structure of C-NOMA. The optimization problem is therefore formulated as
$$
\begin{aligned}
\max_{m^{j},\,p_i^{j}} \quad & \min\{T_1, T_2\} && \text{(14a)}\\
\text{s.t.} \quad & m^{\mathrm{I}}\left(p_1^{\mathrm{I}} + p_2^{\mathrm{I}}\right) + m^{\mathrm{II}}p_2^{\mathrm{II}} \le D_{\max}P_{\mathrm{ave}}, && \text{(14b)}\\
& 0 \le p_1^{\mathrm{I}} + p_2^{\mathrm{I}} \le \rho_p P_{\mathrm{ave}},\; p_i^{\mathrm{I}} \ge 0,\; i = 1, 2, && \text{(14c)}\\
& 0 \le p_2^{\mathrm{II}} \le \rho_p P_{\mathrm{ave}}, && \text{(14d)}\\
& \epsilon_i \le \epsilon_i^{\mathrm{th}},\; i = 1, 2, && \text{(14e)}\\
& m^{\mathrm{I}} + m^{\mathrm{II}} \le D_{\max}. && \text{(14f)}
\end{aligned}
$$
The optimization variables are the blocklengths and the powers allocated to the two users in phases I and II. Constraint (14b) is the system's total energy budget. Constraints (14c) and (14d) are the general power constraints, where $P_{\mathrm{ave}}$ is the average power and $\rho_p$ is the peak-to-average power ratio (PAPR) factor. Constraint (14e) ensures that the decoding error probability of user $i$ does not exceed $\epsilon_i^{\mathrm{th}}$. Finally, (14f) is the latency constraint.

V. PROBLEM SOLVING

This section solves the optimization problem in (14) for the SC and MRC strategies. As a first step, we analyze the constraints and determine their status at the optimum. Consider first constraint (14e) on the acceptable BLER of the two users. Each URLLC use case requires a specific reliability, so spending extra resources to reach a BLER below the required $\epsilon_i^{\mathrm{th}}$ wastes scarce resources. Moreover, by (1), a lower target error probability leads to a lower data rate. Hence $\epsilon_i = \epsilon_i^{\mathrm{th}}$ is optimal. For constraint (14b), [13, Proposition 1] shows that the achievable data rate $R$ in (1) increases monotonically with the corresponding SNR/SINR. A proof by contradiction then shows that the energy constraint holds with equality at the throughput-maximizing solution [21], i.e., $m^{\mathrm{I}}(p_1^{\mathrm{I}} + p_2^{\mathrm{I}}) + m^{\mathrm{II}}p_2^{\mathrm{II}} = D_{\max}P_{\mathrm{ave}}$. In addition, the following proposition characterizes how the optimal energy consumption is split between the two transmission phases.

Proposition 1:

At the optimal solution, the total energy consumed by the two users in phase I is always greater than or equal to the energy consumed in phase II, i.e.,
$$m^{\mathrm{I}}P_{\mathrm{sum}}^{\mathrm{I}} \ge m^{\mathrm{II}}p_2^{\mathrm{II}},$$
where $P_{\mathrm{sum}}^{\mathrm{I}} = p_1^{\mathrm{I}} + p_2^{\mathrm{I}}$. (See Appendix A for the proof.)

Furthermore, by [13, Proposition 2], the throughputs of the two users are equal at the optimum of Problem (14), i.e., $T_1 = T_2$. Based on this discussion, we now solve the optimization problem in (14) for both the SC and MRC strategies.

A. Optimal Design of Max-Min Fairness in C-NOMA

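Since $T_1 = T_2$ at the optimal solution, (13) yields
$$R_{2,2}^{\mathrm{I}} = \frac{\left(1 - \epsilon_1^{\mathrm{th}}\right)R_{1,1}^{\mathrm{I}}}{1 - \epsilon_2^{\mathrm{th}}}.$$
Moreover, user 2's message contains the same number of bits in both phases, so
$$R_{2,2}^{\mathrm{II}} = \frac{m^{\mathrm{I}}}{m^{\mathrm{II}}}\,R_{2,2}^{\mathrm{I}}.$$
The optimization problem in (14) can therefore be rewritten as
$$
\begin{aligned}
\max_{m^{\mathrm{I}},\,p_1^{\mathrm{I}},\,p_2^{\mathrm{II}}} \quad & T_0 = \frac{\left(1 - \epsilon_1^{\mathrm{th}}\right)m^{\mathrm{I}}R_{1,1}^{\mathrm{I}}}{D_{\max}} && \text{(15a)}\\
\text{s.t.} \quad & m^{\mathrm{I}}P_{\mathrm{sum}}^{\mathrm{I}} + m^{\mathrm{II}}p_2^{\mathrm{II}} = D_{\max}P_{\mathrm{ave}}, && \text{(15b)}\\
& 0 \le P_{\mathrm{sum}}^{\mathrm{I}} \le \rho_p P_{\mathrm{ave}},\; 0 \le p_1^{\mathrm{I}} \le P_{\mathrm{sum}}^{\mathrm{I}}/2, && \text{(15c)}\\
& 0 \le p_2^{\mathrm{II}} \le \rho_p P_{\mathrm{ave}},\; m^{\mathrm{I}}P_{\mathrm{sum}}^{\mathrm{I}} \ge m^{\mathrm{II}}p_2^{\mathrm{II}}, && \text{(15d)}\\
& \epsilon_i = \epsilon_i^{\mathrm{th}},\; i = 1, 2, && \text{(15e)}\\
& m^{\mathrm{I}} + m^{\mathrm{II}} = D_{\max}. && \text{(15f)}
\end{aligned}
$$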
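Constraint (15e) requires $\epsilon_2$ under the chosen combining rule, i.e., (9) for SC or (12) for MRC. The sketch below computes both for one illustrative operating point. It assumes $\sigma^2 = 1$ (gains given as $|h|^2/\sigma^2$) and $m^{\mathrm{II}} \ge 1$, and the function names are hypothetical. The MRC branch uses the blocklength-weighted SINR of (10) over $m^{\mathrm{C}} = \max(m^{\mathrm{I}}, m^{\mathrm{II}})$. In this particular example MRC gives the lower error, but that is a property of these numbers, not a general claim.

```python
import math


def q_func(x: float) -> float:
    """Gaussian Q-function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fbl_error(snr: float, rate: float, m: int) -> float:
    """Normal-approximation BLER of eq. (2)."""
    if snr <= 0.0 or m <= 0:
        return 1.0
    capacity = math.log2(1.0 + snr)
    dispersion = 1.0 - (1.0 + snr) ** -2
    return q_func(math.sqrt(m / dispersion) * (capacity - rate) * math.log(2.0))


def combined_sinr(m1: int, m2: int, sinr_i: float, snr_ii: float) -> float:
    """MRC SINR of eq. (10): each phase weighted by its share of m^C = max(m^I, m^II)."""
    m_c = max(m1, m2)
    return (m1 / m_c) * sinr_i + (m2 / m_c) * snr_ii


def user2_error(p1, p2, p2_relay, g1, g2, g12, n2, m1, m2):
    """Overall error of user 2 under SC (eq. 9) and MRC (eq. 12); hypothetical helper."""
    sinr_22 = p2 * g2 / (p1 * g2 + 1.0)      # gamma_22^I
    sinr_12 = p2 * g1 / (p1 * g1 + 1.0)      # gamma_12^I
    snr_22_ii = p2_relay * g12               # gamma_22^II
    eps_22 = fbl_error(sinr_22, n2 / m1, m1)
    eps_12 = fbl_error(sinr_12, n2 / m1, m1)
    eps_22_ii = fbl_error(snr_22_ii, n2 / m2, m2)
    eps_sc = eps_22 * (eps_12 + eps_22_ii)                    # eq. (9)
    m_c = max(m1, m2)
    eps_c = fbl_error(combined_sinr(m1, m2, sinr_22, snr_22_ii), n2 / m_c, m_c)  # eq. (11)
    eps_mrc = eps_12 * eps_22 + (1.0 - eps_12) * eps_c        # eq. (12)
    return eps_sc, eps_mrc


# Illustrative values only (not the paper's simulation settings)
sc, mrc = user2_error(p1=1.0, p2=4.0, p2_relay=2.0, g1=20.0, g2=0.5, g12=3.0,
                      n2=120, m1=100, m2=60)
print(f"SC: {sc:.2e}  MRC: {mrc:.2e}")
```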

The restriction on $p_1^{\mathrm{I}}$ in (15c) follows from the assumption $|h_1|^2 > |h_2|^2$: for SIC to work in the NOMA phase, $p_2^{\mathrm{I}} \ge p_1^{\mathrm{I}}$ is required. Problem (15) can be solved by an exhaustive linear search. To reduce the computational complexity, however, we further narrow the search range of $p_1^{\mathrm{I}}$. The main idea is summarized as follows:

• First, we use user 1's decoding error probability, $\epsilon_1^{\mathrm{I}} \approx \epsilon_{1,2}^{\mathrm{I}} + \epsilon_{1,1}^{\mathrm{I}}$, to determine the bounds on $p_1^{\mathrm{I}}$ that guarantee $\epsilon_1 \le \epsilon_1^{\mathrm{th}}$. As shown in our previous work [13], $\epsilon_1^{\mathrm{I}}$ is convex in $p_1^{\mathrm{I}}$, and at most two values satisfy $\epsilon_1^{\mathrm{I}}(p_1^{\mathrm{I}}) = \epsilon_1^{\mathrm{th}}$. For a given $R_{1,1}^{\mathrm{I}}$ and fixed $m^{\mathrm{I}}$ and $P_{\mathrm{sum}}^{\mathrm{I}}$, we find the solutions of this equality in the range $0 \le p_1^{\mathrm{I}} \le P_{\mathrm{sum}}^{\mathrm{I}}/2$. Since $\epsilon_{1,1}^{\mathrm{I}}$ decreases monotonically with $p_1^{\mathrm{I}}$, the lower bound is $p_1^{\mathrm{I,min}} = \arg\{\epsilon_1^{\mathrm{I}}(p_1^{\mathrm{I}}) \approx \epsilon_{1,1}^{\mathrm{I}}(p_1^{\mathrm{I}}) = \epsilon_1^{\mathrm{th}}\}$. Since $\epsilon_{1,2}^{\mathrm{I}}$ increases monotonically with $p_1^{\mathrm{I}}$, the upper bound is $p_1^{\mathrm{I,max}} = \arg\{\epsilon_1^{\mathrm{I}}(p_1^{\mathrm{I}}) \approx \epsilon_{1,2}^{\mathrm{I}}(p_1^{\mathrm{I}}) = \epsilon_1^{\mathrm{th}}\}$. The search region of $p_1^{\mathrm{I}}$ is therefore $p_1^{\mathrm{I,min}} \le p_1^{\mathrm{I}} \le p_1^{\mathrm{I,max}}$.
• The decoding error probability increases monotonically with the transmission rate. For each $p_1^{\mathrm{I}}$ in the feasible range, $R_{1,1}^{\mathrm{I}}$ is therefore increased until user 1's decoding error probability reaches $\epsilon_1^{\mathrm{th}}$. Note that $R_{1,1}^{\mathrm{I}} < C(\gamma_{1,1}^{\mathrm{I}})$.
• Only values $p_1^{\mathrm{I,min}} \le p_1^{\mathrm{I}} \le p_1^{\mathrm{I,max}}$ that satisfy $\epsilon_2(p_1^{\mathrm{I}}) \le \epsilon_2^{\mathrm{th}}$ are acceptable. User 2's decoding error probability increases with $p_1^{\mathrm{I}}$ under both SC and MRC, (9) and (12), so the transmit power can be found by bisection.
• After a full search over the values of $m^{\mathrm{I}}$ and $P_{\mathrm{sum}}^{\mathrm{I}}$, the feasible solution that maximizes $T_0$ is optimal.

Algorithm 1 implements this procedure for Problem (15). For a fixed $m^{\mathrm{I}}$, it first finds the local maximum of $T_0$, denoted $T_0^{\dagger}$, by checking all possible values of $P_{\mathrm{sum}}^{\mathrm{I}}$ and $p_1^{\mathrm{I}}$, using bisection in each iteration to find the desired $p_1^{\mathrm{I}}$. Repeating this for all positive integer values of $m^{\mathrm{I}}$ yields the global maximum $T_0^*$. The global optimum is thus obtained with a three-dimensional (3-D) exhaustive linear search.

B. Suboptimal Design of Max-Min Fairness in C-NOMA

Although Algorithm 1 narrows the search range for the optimum of Problem (15), its computational complexity remains high. We therefore propose a suboptimal solution in which the power allocated to user 2 in phase II is fixed at its maximum value, i.e., $p_2^{\mathrm{II}} = \rho_p P_{\mathrm{ave}}$. The motivation is as follows. If the phase II transmission fails, part of the resources is wasted, and the system throughput can fall below that of the NOMA scheme. To avoid this and to lower the decoding error probability in phase II, $\tilde{x}_2$ is transmitted at maximum power. The sum of the two users' transmit powers in phase I is then
$$P_{\mathrm{sum}}^{\mathrm{I}} = \frac{\left[D_{\max}P_{\mathrm{ave}} - m^{\mathrm{II}}p_2^{\mathrm{II}}\right]^+}{m^{\mathrm{I}}},$$
where $[x]^+ = \max(x, 0)$. As before, the local maximum $T_0^{\dagger}$ is obtained by searching over $p_1^{\mathrm{I}} \in [p_1^{\mathrm{I,min}}, p_1^{\mathrm{I,max}}]$. Repeating this for all integer values of $m^{\mathrm{I}}$ that satisfy $m^{\mathrm{I}}P_{\mathrm{sum}}^{\mathrm{I}} \ge m^{\mathrm{II}}p_2^{\mathrm{II}}$ yields the global maximum $T_0^*$. If $m^{\mathrm{I}} = D_{\max}$, or equivalently $m^{\mathrm{II}} = 0$, then $P_{\mathrm{sum}}^{\mathrm{I}} = P_{\mathrm{ave}}$. In this case there is no phase II transmission, and the C-NOMA scheme reduces to NOMA. This suboptimal algorithm needs only a two-dimensional (2-D) linear search over $(p_1^{\mathrm{I}}, m^{\mathrm{I}})$, so its computational complexity is much lower. The numerical results in Section VII show that its performance is nevertheless near-optimal.

Algorithm 1:

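Optimum Power and Blocklength Allocation Algorithm in the C-NOMA Scheme with SC/MRC Strategy.

Input: total blocklength $D_{\max}$, overall BLER of user $i$ $\epsilon_i^{\mathrm{th}}$, BS average power $P_{\mathrm{ave}}$, required accuracy $\delta$.
Output: optimum powers $p_1^{\mathrm{I}*}$, $p_2^{\mathrm{I}*}$, $p_2^{\mathrm{II}*}$, blocklengths $m^{\mathrm{I}*}$, $m^{\mathrm{II}*}$, and fair throughput $T_0^* = T_1 = T_2$.
1: for $m^{\mathrm{I}} \le D_{\max}$ do
2:   for $P_{\mathrm{sum}}^{\mathrm{I}} \le \rho_p P_{\mathrm{ave}}$ do
3:     Set $m^{\mathrm{II}} := D_{\max} - m^{\mathrm{I}}$ and $p_2^{\mathrm{II}} := (D_{\max}P_{\mathrm{ave}} - m^{\mathrm{I}}P_{\mathrm{sum}}^{\mathrm{I}})/m^{\mathrm{II}}$.
4:     if $p_2^{\mathrm{II}} \le \rho_p P_{\mathrm{ave}}$ and $m^{\mathrm{I}}P_{\mathrm{sum}}^{\mathrm{I}} \ge m^{\mathrm{II}}p_2^{\mathrm{II}}$ then
5:       Calculate $p_1^{\mathrm{I,min}}$ and $p_1^{\mathrm{I,max}}$.
6:       Set $p_1^{\mathrm{I}} := p_1^{\mathrm{I,min}}$.
7:       while $\epsilon_2 < \epsilon_2^{\mathrm{th}}$ do
8:         Set $p_1^{\mathrm{I}} := \min(p_1^{\mathrm{I}} + \Delta p,\ p_1^{\mathrm{I,max}})$.
9:         Find $R_{1,1}^{\mathrm{I}\dagger} = \arg\{\epsilon_1 = \epsilon_1^{\mathrm{th}}\}$ via the bisection method with accuracy $\delta$.
10:        Calculate $\epsilon_2$ by (9)/(12) for SC/MRC.
11:      end while
12:      Set $p_1^{\mathrm{I,lb}} := p_1^{\mathrm{I}} - \Delta p$ and $p_1^{\mathrm{I,ub}} := p_1^{\mathrm{I}}$.
13:      Find $p_1^{\mathrm{I}\dagger} \in [p_1^{\mathrm{I,lb}}, p_1^{\mathrm{I,ub}}]$ that satisfies $\epsilon_2 = \epsilon_2^{\mathrm{th}}$ via the bisection method with accuracy $\delta$.
14:    end if
15:  end for
16:  Set $R_{1,1}^{\mathrm{I}\dagger\dagger} := \max_{\epsilon_2 \le \epsilon_2^{\mathrm{th}}} R_{1,1}^{\mathrm{I}\dagger}$ and $T_0^{\dagger} := (1 - \epsilon_1^{\mathrm{th}})\,m^{\mathrm{I}}R_{1,1}^{\mathrm{I}\dagger\dagger}/D_{\max}$.
17: end for
18: Set $T_0^* := \max T_0^{\dagger}$.
19: Return $(m^{\mathrm{I}*}, p_1^{\mathrm{I}*}, p_2^{\mathrm{II}*}) = \arg\max T_0^{\dagger}$, $m^{\mathrm{II}*} = D_{\max} - m^{\mathrm{I}*}$, and $p_2^{\mathrm{I}*} = (D_{\max}P_{\mathrm{ave}} - m^{\mathrm{II}*}p_2^{\mathrm{II}*})/m^{\mathrm{I}*} - p_1^{\mathrm{I}*}$.

C. Computational Complexity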

The computational complexity of Algorithm 1 is derived as follows. In the first step, a linear search is applied to obtain the bounds of $p_1^{\mathrm{I}}$. In the next step, $R_{1,1}^{\mathrm{I}}$ is found by bisection with a complexity of about $\log_2(\epsilon_1^{\mathrm{th}}/\delta)$, where $\delta$ is the desired accuracy. The complexity of computing $\epsilon_2$ is denoted by $\Omega$. This step is performed at most $K_1 = (p_1^{\mathrm{I,max}} - p_1^{\mathrm{I,min}})/\Delta p$ times, where $\Delta p$ is the search step, so its complexity is $\mathcal{O}\left(K_1\left(\log_2(\epsilon_1^{\mathrm{th}}/\delta) + \Omega\right)\right)$. In the last step, finding $p_1^{\mathrm{I}}$ by bisection has a complexity of about $\log_2(\epsilon_2^{\mathrm{th}}/\delta)$. These three steps are repeated over the possible values of $P_{\mathrm{sum}}^{\mathrm{I}}$ and $m^{\mathrm{I}}$, i.e., $K_2 = \rho_p P_{\mathrm{ave}}/\Delta p$ and $D_{\max}$ times, respectively. The worst-case complexity of Algorithm 1 is therefore
$$\mathcal{O}\left(K_2 D_{\max}\left(K_1\left(\log_2(\epsilon_1^{\mathrm{th}}/\delta) + \Omega\right) + \log_2(\epsilon_2^{\mathrm{th}}/\delta)\right)\right).$$
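The $\log_2(\cdot/\delta)$ terms above come from bisection, which halves the search bracket in each iteration. The sketch below applies bisection to line 9 of Algorithm 1 for a single-link error $f(\gamma, R, m)$ and checks the result against the closed-form rate in (1). Three assumptions apply. The accuracy $\delta$ is applied to the width of the rate bracket, so the iteration count is $\lceil \log_2(C/\delta) \rceil$. The function names are hypothetical. The SNR, blocklength, and target are illustrative.

```python
import math
from statistics import NormalDist


def q_func(x: float) -> float:
    """Gaussian Q-function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fbl_error(snr: float, rate: float, m: int) -> float:
    """Normal-approximation BLER of eq. (2)."""
    capacity = math.log2(1.0 + snr)
    dispersion = 1.0 - (1.0 + snr) ** -2
    return q_func(math.sqrt(m / dispersion) * (capacity - rate) * math.log(2.0))


def fbl_rate(snr: float, eps: float, m: int) -> float:
    """Closed-form rate of eq. (1): C - sqrt(V/m) * Q^{-1}(eps) / ln 2."""
    capacity = math.log2(1.0 + snr)
    dispersion = 1.0 - (1.0 + snr) ** -2
    q_inv = -NormalDist().inv_cdf(eps)   # Q^{-1}(eps) = -Phi^{-1}(eps)
    return capacity - math.sqrt(dispersion / m) * q_inv / math.log(2.0)


def bisect_rate(err_of_rate, target, r_hi, delta):
    """Largest rate in [0, r_hi], to within delta, whose error stays <= target.

    Assumes err_of_rate is increasing in the rate and err_of_rate(0) <= target.
    Returns (rate, number_of_halvings).
    """
    lo, hi, iters = 0.0, r_hi, 0
    while hi - lo > delta:
        mid = 0.5 * (lo + hi)
        if err_of_rate(mid) <= target:
            lo = mid      # mid still meets the BLER target: raise the lower end
        else:
            hi = mid      # mid violates the target: lower the upper end
        iters += 1
    return lo, iters


snr, m, target, delta = 3.0, 200, 1e-5, 1e-3   # illustrative values; C = 2 bits
rate, iters = bisect_rate(lambda r: fbl_error(snr, r, m), target, math.log2(1 + snr), delta)
print(rate, iters, fbl_rate(snr, target, m))   # iters == ceil(log2(2 / 1e-3)) == 11
```

Halving the accuracy $\delta$ adds one iteration, which is the logarithmic dependence that appears in the complexity expression.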

The complexity of the suboptimal algorithm follows from the same analysis. Since $p_2^{\mathrm{II}}$ is constant, however, $P_{\mathrm{sum}}^{\mathrm{I}}$ drops out of the search. The worst-case complexity of this algorithm is therefore
$$\mathcal{O}\left(D_{\max}\left(K_1\left(\log_2(\epsilon_1^{\mathrm{th}}/\delta) + \Omega\right) + \log_2(\epsilon_2^{\mathrm{th}}/\delta)\right)\right).$$
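The 2-D search of the suboptimal design can be sketched for the SC receiver as follows. This is an illustration under stated assumptions, not the authors' implementation. The noise variance is normalized to 1, so the gains are $|h|^2/\sigma^2$. The bounds $[p_1^{\mathrm{I,min}}, p_1^{\mathrm{I,max}}]$ are replaced by a uniform grid on $(0, P_{\mathrm{sum}}^{\mathrm{I}}/2]$ whose size `n_grid` is a hypothetical parameter. Bisection is applied to $R_{1,1}^{\mathrm{I}}$ with accuracy `delta`. The fairness relation $R_{2,2}^{\mathrm{I}} = (1-\epsilon_1^{\mathrm{th}})R_{1,1}^{\mathrm{I}}/(1-\epsilon_2^{\mathrm{th}})$ and $R_{2,2}^{\mathrm{II}} = (m^{\mathrm{I}}/m^{\mathrm{II}})R_{2,2}^{\mathrm{I}}$ are taken from Section V-A.

```python
import math


def q_func(x: float) -> float:
    """Gaussian Q-function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fbl_error(snr: float, rate: float, m: int) -> float:
    """Normal-approximation BLER of eq. (2); 1 when nothing is decodable."""
    if snr <= 0.0 or m <= 0:
        return 1.0
    capacity = math.log2(1.0 + snr)
    dispersion = 1.0 - (1.0 + snr) ** -2
    return q_func(math.sqrt(m / dispersion) * (capacity - rate) * math.log(2.0))


def suboptimal_sc(g1, g2, g12, eps1_th, eps2_th, p_ave, rho, d_max,
                  n_grid=20, delta=1e-4):
    """2-D search over (m^I, p_1^I) with p_2^II = rho * P_ave (SC receiver).

    Returns (T_0, (m^I, p_1^I, R_11^I)), or (0.0, None) if nothing is feasible.
    """
    kappa = (1.0 - eps1_th) / (1.0 - eps2_th)   # R_22^I = kappa * R_11^I (T_1 = T_2)
    p2_relay = rho * p_ave                      # phase II power fixed at its maximum
    best = (0.0, None)
    for m1 in range(1, d_max + 1):
        m2 = d_max - m1
        p_sum = max(d_max * p_ave - m2 * p2_relay, 0.0) / m1   # energy constraint (15b)
        if p_sum > rho * p_ave or m1 * p_sum < m2 * p2_relay:
            continue                                          # violates (15c) or Proposition 1
        for k in range(1, n_grid + 1):
            p1 = k * p_sum / (2 * n_grid)                     # grid on (0, P_sum / 2]
            p2 = p_sum - p1
            sinr_12 = p2 * g1 / (p1 * g1 + 1.0)
            sinr_22 = p2 * g2 / (p1 * g2 + 1.0)
            snr_11 = p1 * g1

            def eps1(r11, s12=sinr_12, s11=snr_11, m=m1):
                """User 1 error, eq. (6), with R_11^I = r11 and R_12^I = kappa * r11."""
                e12 = fbl_error(s12, kappa * r11, m)
                return e12 + (1.0 - e12) * fbl_error(s11, r11, m)

            lo, hi = 0.0, math.log2(1.0 + snr_11)
            if eps1(lo) > eps1_th:
                continue
            while hi - lo > delta:                 # largest R_11 with eps_1 <= eps_1^th
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if eps1(mid) <= eps1_th else (lo, mid)
            r22 = kappa * lo
            e22 = fbl_error(sinr_22, r22, m1)
            e12 = fbl_error(sinr_12, r22, m1)
            e22_ii = fbl_error(p2_relay * g12, r22 * m1 / m2, m2) if m2 > 0 else 1.0
            eps2 = e22 * (e12 + e22_ii)             # SC error, eq. (9)
            t0 = (1.0 - eps1_th) * m1 * lo / d_max  # fair throughput, eq. (15a)
            if eps2 <= eps2_th and t0 > best[0]:
                best = (t0, (m1, p1, lo))
    return best


t0, arg = suboptimal_sc(g1=100.0, g2=10.0, g12=100.0, eps1_th=1e-5, eps2_th=1e-5,
                        p_ave=1.0, rho=2.0, d_max=100)
print(t0, arg)
```

The outer loop over $m^{\mathrm{I}}$ contributes the $D_{\max}$ factor. The grid over $p_1^{\mathrm{I}}$ plays the role of $K_1$, and each bisection adds the logarithmic term. There is no loop over $P_{\mathrm{sum}}^{\mathrm{I}}$, which matches the complexity expression above.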

The proposed algorithms need the same number of iterations for SC and MRC. The number of basic operations needed to compute user 2's decoding error probability $\epsilon_2$, however, differs. By (9), computing $\epsilon_2$ with SC takes one summation and one multiplication. By (12), computing $\epsilon_2$ with MRC takes three summations (one of them for $\gamma_{2,2}^{\mathrm{C}}$) and two multiplications.

VI. EXTENSION TO MULTI-USER SCENARIO

This section considers the more general case of more than two users in a cell.

A. System Model and Problem Formulation

Let $K$ denote the total number of users and $\mathcal{K} = \{1, \dots, K\}$ the set of users. We assume that the users' channel gains are sorted in descending order, i.e., $|h_1|^2 \ge |h_2|^2 \ge \dots \ge |h_K|^2$. To implement the NOMA scheme, users are grouped into clusters. NOMA separates the users within a cluster, and an OMA technique separates different clusters. In practice, clusters usually contain no more than four users to limit receiver complexity. Here we form 2-user clusters and apply the C-NOMA scheme in each cluster. Because relaying requires the two users to be within each other's coverage area, clustering is based on their relative locations. The network contains $\lfloor K/2 \rfloor$ 2-user clusters, but the number of possible clustering states varies randomly with the network topology and is denoted by $Q$. The throughput function of clustering in State $q$, $q \in \{1, \dots, Q\}$, is defined as
$$f_{i,j}^q\left(p_{i,j}^{\mathrm{I}}, p_{i,j}^{\mathrm{II}}, m_{i,j}^{\mathrm{I}}; \mathbf{A}^q\right) = a_{i,j}^q\,T_0^{i,j}, \quad i, j \in \mathcal{K}. \tag{16}$$
Let $\mathbf{A}^q = [a_{i,j}^q]_{K \times K}$ be the clustering matrix in State $q$, where $a_{i,j}^q$ indicates the link between users $i$ and $j$ in State $q$:
$$a_{i,j}^q = \begin{cases} 1, & \text{if users } i \text{ and } j \text{ are in the same cluster},\\ 0, & \text{otherwise}. \end{cases} \tag{17}$$
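A clustering state can be stored directly as the 0/1 matrix of (17). The sketch below builds $\mathbf{A}^q$ from a list of 2-user clusters and checks the structural constraints (18b)-(18d) of the problem formulated next. The function names are hypothetical, users are numbered from 1, and the zero-diagonal check (no self-pairing) is an assumption added for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def clustering_matrix(k: int, pairs: Sequence[Tuple[int, int]]) -> Matrix:
    """Build A^q = [a_{i,j}] of eq. (17) from 2-user clusters (users numbered 1..K)."""
    a = [[0] * k for _ in range(k)]
    for i, j in pairs:
        a[i - 1][j - 1] = 1
        a[j - 1][i - 1] = 1
    return a


def is_valid_clustering(a: Matrix) -> bool:
    """Check symmetry and the one-cluster-per-user limits; zero diagonal is an extra assumption."""
    k = len(a)
    symmetric = all(a[i][j] == a[j][i] for i in range(k) for j in range(k))   # (18b)
    rows_ok = all(sum(a[i]) <= 1 for i in range(k))                            # (18c)
    cols_ok = all(sum(a[i][j] for i in range(k)) <= 1 for j in range(k))      # (18d)
    no_self = all(a[i][i] == 0 for i in range(k))
    return symmetric and rows_ok and cols_ok and no_self


a = clustering_matrix(4, [(1, 4), (2, 3)])     # clusters {1, 4} and {2, 3}
print(is_valid_clustering(a))                  # True
bad = clustering_matrix(3, [(1, 2), (1, 3)])   # user 1 placed in two clusters
print(is_valid_clustering(bad))                # False
```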

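The goal is to find the clustering that maximizes the minimum throughput of the users in the cell. The optimization problem can therefore be formulated as
$$
\begin{aligned}
\max_{\mathbf{A}^q,\; q \in \{1, \dots, Q\}} \quad & \min_{i,j:\; a_{i,j}^q = 1} f_{i,j}^q\left(p_{i,j}^{\mathrm{I}}, p_{i,j}^{\mathrm{II}}, m_{i,j}^{\mathrm{I}}; \mathbf{A}^q\right) && \text{(18a)}\\
\text{s.t.} \quad & a_{i,j}^q = a_{j,i}^q, \quad i, j \in \mathcal{K}, && \text{(18b)}\\
& \textstyle\sum_{j \ne i} a_{i,j}^q \le 1, \quad i \in \mathcal{K}, && \text{(18c)}\\
& \textstyle\sum_{i \ne j} a_{i,j}^q \le 1, \quad j \in \mathcal{K}. && \text{(18d)}
\end{aligned}
$$
Constraint (18b) states that the clustering matrix $\mathbf{A}^q$ is symmetric. Constraints (18c) and (18d) ensure that no user belongs to more than one cluster. The inner problem of Problem (18), which applies the C-NOMA scheme in each cluster, is
$$
\begin{aligned}
\max_{m_{i,j}^{\mathrm{I}},\,p_i^{\mathrm{I}},\,p_j^{\mathrm{I}},\,p_j^{\mathrm{II}}} \quad & T_0^{i,j} = \min\{T_i, T_j\}, \quad a_{i,j}^q = 1 && \text{(19a)}\\
\text{s.t.} \quad & m_{i,j}^{\mathrm{I}}\left(p_i^{\mathrm{I}} + p_j^{\mathrm{I}}\right) + m_{i,j}^{\mathrm{II}}p_j^{\mathrm{II}} \le \frac{2D_{\max}P_{\mathrm{ave}}}{K}, && \text{(19b)}\\
& 0 \le p_i^{\mathrm{I}} + p_j^{\mathrm{I}} \le \rho_p P_{\mathrm{ave}}, && \text{(19c)}\\
& 0 \le p_j^{\mathrm{II}} \le \rho_p P_{\mathrm{ave}}, && \text{(19d)}\\
& \epsilon_i \le \epsilon_i^{\mathrm{th}},\; \epsilon_j \le \epsilon_j^{\mathrm{th}}, && \text{(19e)}\\
& m_{i,j}^{\mathrm{I}} + m_{i,j}^{\mathrm{II}} \le D_{\max}. && \text{(19f)}
\end{aligned}
$$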

Here we assume $|h_i|^2 \ge |h_j|^2$, so $p_i^{\mathrm{I}} \le p_j^{\mathrm{I}}$. Constraint (19b) divides the system's total energy budget equally among the clusters. Solving Problem (18) requires solving Problem (19) for every potential user pair in State $q$, i.e., for all $a_{i,j}^q = 1$, $i, j \in \{1, \dots, K\}$. Finding the optimum clustering therefore requires solving the inner problem $Q\lfloor K/2 \rfloor$ times. An exhaustive search over all neighboring users pairs the users so that the minimum throughput in the cell is maximized. The complexity of this exhaustive search (i.e., the number of iterations needed to find the optimal pairing) is high and causes excessive scheduling delay when there are many users. To reduce the computational complexity, the following subsection proposes a suboptimal clustering algorithm.

B. The Proposed C-NOMA Clustering

Here we propose a suboptimal solution to Problem (18). The objective is to maximize the throughput of the weakest of the $K$ users by assigning users to clusters according to their geographic locations. The proposed user clustering first requires the graph matrix of the network topology. To build it, each user finds all users within its coverage area of radius $r$. Since the aim is to use C-NOMA to increase reliability and system capacity, C-NOMA clusters are formed first, starting from the weakest user. Users that are far from all others cannot use C-NOMA and use NOMA or OMA instead, depending on their channel conditions. Finally, the users not scheduled in C-NOMA clusters are rearranged into hybrid clusters, as described in the following subsection. Algorithm 2 details the proposed C-NOMA-based user clustering.

Algorithm 2:

Suboptimal C-NOMA-based user clustering.

Input: DL channel gains sorted in descending order, $|h_1|^2 \ge |h_2|^2 \ge \dots \ge |h_K|^2$, the corresponding D2D channel gains, the device coverage radius $r$, and the inputs of Algorithm 1.
Output: the user clustering $\mathbf{A} = [a_{i,j}]_{K \times K}$.
1: Determine the graph matrix of the network topology.
2: Set $i := K$.
3: while $i \ge 1$ do // allocating C-NOMA clusters
4:   if user $i$ has not been paired then
5:     Find the set of unpaired adjacent users of user $i$, $\boldsymbol{\psi}_i$.
6:     if $\mathrm{length}(\boldsymbol{\psi}_i) > 0$ then
7:       for $l = 1 : \mathrm{length}(\boldsymbol{\psi}_i)$ do
8:         Calculate $T_0^{i,\psi_i(l)}$ by Algorithm 1.
9:       end for
10:      Set $[T_0^*, \mathrm{index}] := \max_l T_0^{i,\psi_i(l)}$ and $j := \psi_i(\mathrm{index})$.
11:      Pair users $i$ and $j$, $a_{i,j} := 1$.
12:    end if
13:  end if
14:  Set $i := i - 1$.
15: end while
16: Set $i := K$ and $j := 1$.
17: while $i > j$ do // allocating hybrid clusters
18:   if user $i$ has not been paired then
19:     while user $j$ has been paired do
20:       Set $j := j + 1$.
21:     end while
22:     Pair users $i$ and $j$, $a_{i,j} := 1$.
23:     Set $j := j + 1$.
24:   end if
25:   Set $i := i - 1$.
26: end while
27: Return $\mathbf{A} = [a_{i,j}]_{K \times K}$.

C. Hybrid Clustering

To describe hybrid clustering, consider first the 2-user NOMA clustering scheme proposed in [24]. In this scheme, the first strong user is paired with the first weak user, the second strong user with the second weak user, and so on, so all users end up in 2-user groups. NOMA, however, relies on pairing users with a large difference in channel gains, and its performance degrades when this difference is small. For example, in Fig. 2, users 6 and 7 have almost the same channel conditions, so pairing them may reduce spectral efficiency and system capacity because SIC fails. Such unsuitable pairs should therefore be excluded from NOMA scheduling and served with OMA instead. In this scheme, the BS chooses between OMA and NOMA based on the CSI and the outcome of the user pairing process. We use hybrid clustering to schedule the users that the proposed C-NOMA-based clustering leaves isolated or unpaired. The two basic schemes, NOMA clustering and hybrid clustering, can also be implemented independently, and both serve as benchmark schemes in our simulations.

Fig. 2. The 2-user NOMA clustering scheme [24] (illustrated for Users 1 to 12).
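The pairing rule of Fig. 2 (strongest with weakest, second strongest with second weakest, and so on) and the OMA fallback of the hybrid scheme can be sketched as below. As a hedged illustration only, we assume that users are numbered 1 to K in decreasing order of channel gain, as in Fig. 2, and we replace the BS's CSI-based NOMA/OMA decision with a hypothetical rule: a pair is served by NOMA only if the ratio of its channel gains is at least `min_ratio`, a parameter we introduce for illustration.

```python
from typing import List, Sequence, Tuple


def hybrid_schedule(
    gains: Sequence[float], min_ratio: float
) -> List[Tuple[int, int, str]]:
    """Pair user k with user K + 1 - k and pick NOMA or OMA for each pair.

    `gains` must be sorted in decreasing order (user 1 is the strongest).
    `min_ratio` is a hypothetical threshold on the strong/weak gain ratio.
    If K is odd, the middle user is left unpaired.
    """
    K = len(gains)
    if any(gains[k] < gains[k + 1] for k in range(K - 1)):
        raise ValueError("gains must be sorted in decreasing order")
    schedule = []
    for k in range(1, K // 2 + 1):
        strong, weak = k, K + 1 - k
        ratio = gains[strong - 1] / gains[weak - 1]
        # Similar channel gains make SIC unreliable, so fall back to OMA.
        mode = "NOMA" if ratio >= min_ratio else "OMA"
        schedule.append((strong, weak, mode))
    return schedule


if __name__ == "__main__":
    # Hypothetical gains for four users, strongest first.
    print(hybrid_schedule([8.0, 4.0, 3.8, 1.0], min_ratio=2.0))
    # [(1, 4, 'NOMA'), (2, 3, 'OMA')]
```

With `min_ratio` set to 0, every pair is served by NOMA, which corresponds to the plain 2-user NOMA clustering; for 12 users this yields the pairs (1, 12), (2, 11), ..., (6, 7), where the last pair is the one with nearly equal channel conditions.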

VII. NUMERICAL RESULTS

In this section, the performance of the proposed C-NOMA scheme with the SC and MRC strategies is evaluated through numerical results based on our analytical solutions. A heterogeneous network consisting of URLLC users with different reliability requirements is considered. The PAPR factor and the required accuracy in Algorithm 1 are set to … and …, respectively. Also, unless otherwise stated, it is assumed that $P_{\mathrm{ave}} = 10$ W and $D_{\max} = \ldots$ channel uses. The numerical results are presented in the following two subsections: first for two users with fixed channel gains, and then for more than two users with random channel gains.

A. Two-user Network with Fixed Channel Gains

Throughout this subsection, to provide insight into the relationships between the proposed and the benchmark schemes, the channel gains of the two users are fixed: the normalized channel gains of user 1 (the strong user) and user 2 (the weak user) are set to 0.8 and 0.1, respectively. We investigate the performance of the proposed schemes under two relaying-link conditions: when the two users are close to each other and the relaying link is strong, its normalized gain is set to 0.5, and when the two users are far from each other and the relaying link is poor, it is set to 0.01. Meanwhile, the users' BLER thresholds are set to $\epsilon_1^{\mathrm{th}} = 10^{-7}$ and $\epsilon_2^{\mathrm{th}} = 10^{-5}$. In Fig. 3, the effect of the total blocklength, $D_{\max}$, on the fair throughput of the proposed C-NOMA with the SC and MRC strategies is assessed for both relaying-link conditions. The optimal NOMA results of our previous work [13] are also shown for comparison. It is observed that, with a strong relaying link, both combining strategies applied to C-NOMA effectively improve the fair throughput compared to NOMA. It is also observed that the MRC receiver outperforms the SC receiver, regardless of the blocklength. This is because combining the two received signals with MRC increases the SINR, which decreases the decoding error probability of user 2. Hence, the reliability of user 2 can still be guaranteed at the desired level with a shorter blocklength in phase II, so more blocklength can be allocated to phase I; as a result, the users' data rates and the system fair throughput increase. On the other hand, with a poor relaying link, the C-NOMA scheme (with both combining strategies) performs exactly the same as NOMA. In this case, the optimal decision favors the direct link, and C-NOMA reduces to NOMA. In a realistic wireless channel, however, both link conditions occur, and C-NOMA outperforms NOMA on average.
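The argument that MRC lets user 2 meet its reliability target with a shorter blocklength can be illustrated with the normal approximation of the finite-blocklength error probability [2], $\epsilon \approx Q\big((C(\gamma) - L/m)/\sqrt{V(\gamma)/m}\big)$, where $C(\gamma) = \log_2(1+\gamma)$ and $V(\gamma) = (1-(1+\gamma)^{-2})(\log_2 e)^2$. The sketch below is a simplified single-link view, not the two-phase model of this paper: we assume that SC keeps the stronger of two branch SNRs, that MRC adds them, and that both are decoded over the same blocklength. The branch SNRs and the 100-bit payload are hypothetical values; only the $10^{-5}$ target matches $\epsilon_2^{\mathrm{th}}$ above.

```python
import math
from typing import Optional

LOG2E = math.log2(math.e)


def q_func(x: float) -> float:
    """Gaussian Q-function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fbl_error(snr: float, m: int, bits: int) -> float:
    """Normal-approximation block error probability of an AWGN link."""
    capacity = math.log2(1.0 + snr)                      # bits/channel use
    dispersion = (1.0 - (1.0 + snr) ** -2) * LOG2E ** 2  # bits^2/channel use
    return q_func((capacity - bits / m) / math.sqrt(dispersion / m))


def min_blocklength(snr: float, bits: int, target: float,
                    m_max: int = 10_000) -> Optional[int]:
    """Smallest blocklength meeting the BLER target, or None if none <= m_max."""
    for m in range(1, m_max + 1):
        if fbl_error(snr, m, bits) <= target:
            return m
    return None


if __name__ == "__main__":
    snr_direct, snr_relay = 2.0, 3.0       # hypothetical branch SNRs (linear)
    snr_sc = max(snr_direct, snr_relay)    # selection combining
    snr_mrc = snr_direct + snr_relay       # maximum ratio combining
    for name, snr in (("SC", snr_sc), ("MRC", snr_mrc)):
        print(name, min_blocklength(snr, bits=100, target=1e-5))
```

In this toy setting the MRC branch reaches the target with a noticeably shorter blocklength than SC. In the paper's optimization, this is the slack that allows more blocklength to be moved to phase I.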
Moreover, it is observed that the suboptimal solutions for both the SC and MRC receivers converge to near-optimal solutions.

Fig. 3. Maximum fair throughput achieved by the C-NOMA and NOMA schemes versus $D_{\max}$, when $P_{\mathrm{ave}} = 10$ W.

Fig. 4. Maximum fair throughput achieved by the C-NOMA and NOMA schemes versus $P_{\mathrm{ave}}$, when $D_{\max} = \ldots$.

In Fig. 4, the effect of the average total power, $P_{\mathrm{ave}}$, on the fair throughput is investigated. With a strong relaying link, C-NOMA with the MRC receiver is notably superior to both C-NOMA with the SC receiver and NOMA. In addition, C-NOMA with the SC strategy outperforms NOMA in the low power/SNR range, while it coincides with NOMA for average powers greater than 20 W. This is because, with the SC strategy, the two signals are not combined, and the phase II transmission only serves to ensure the successful decoding of user 2's packet. Hence, at low SNRs, where the weak user's probability of successful decoding in phase I is not high enough, retransmission in phase II increases reliability. At high SNRs, however, the power allocated to user 2 in the NOMA phase already guarantees its reliability, and the phase II transmission is unnecessary. In this case, single-phase transmission is optimal compared with two-phase transmission, and the proposed scheme performs like NOMA. Moreover, with a poor relaying link, C-NOMA always coincides with NOMA. As a result, from the complexity perspective, using C-NOMA with the SC strategy appears worthwhile only in the low-SNR regime.

B. Multi-user Network with Random Channel Gains

Here, we assume that the BS is located at the center of a cell with a radius of 300 m. The system bandwidth is set to $B = \ldots$, which corresponds to a DL transmission duration of 100 μs and satisfies the low-latency criterion of URLLC standards. The noise power spectral density is −173 dBm/Hz, and the small-scale channel coefficients follow Rayleigh fading with a … distribution. The large-scale path loss $L$ is modeled as a function of the BS-to-user distance $d$ as in [19]. The results are averaged over 1000 independent channel realizations.

Fig. 5 illustrates the average achievable fair throughput versus the number of users, $K$, for the proposed C-NOMA clustering with the SC and MRC strategies, as well as for the NOMA and hybrid clustering schemes. The C-NOMA clustering scheme with MRC performs best, followed by the C-NOMA clustering scheme with SC. Fig. 5 also shows that the suboptimal C-NOMA-based clustering algorithm (with both MRC and SC) converges to a near-optimal solution. In contrast, the NOMA clustering scheme of [24] yields the lowest throughput, especially when the number of users is large. Finally, as expected, the hybrid clustering scheme outperforms the NOMA clustering but never outperforms the C-NOMA-based clustering.

To evaluate the fairness of the proposed C-NOMA-based clustering scheme, Fig. 6 shows Jain's fairness index for the proposed scheme and the benchmarks. Jain's fairness index is defined as [25]
$$J = \frac{\left(\sum_{k=1}^{K} T_k^*\right)^2}{K \sum_{k=1}^{K} \left(T_k^*\right)^2}, \qquad (20)$$
where $T_k^*$ denotes the optimal fair throughput of user $k$. Jain's fairness index lies in $[1/K, 1]$ and attains its maximum value of 1 when all users achieve equal throughput. As Fig. 6 illustrates, the hybrid clustering scheme is fairer than both the C-NOMA and the NOMA clustering schemes. The reason is that, in the C-NOMA-based clustering, it is generally not possible to form C-NOMA clusters for all users; hence, some users are unavoidably scheduled in hybrid clusters. Since the C-NOMA users achieve more throughput than the users in hybrid clusters, fairness degrades in this scheme. Moreover, given the logic behind hybrid clustering, it is always fairer than NOMA clustering. Interestingly, the C-NOMA clustering scheme (with both combining strategies) achieves more fairness than the NOMA clustering when the number of users is large. This is because the denser the network, the more users experience similar channel conditions, which causes more failures in NOMA scheduling; hence, the C-NOMA clustering obtains more fairness in that case.

Fig. 5. Average fair throughput achieved by the different clustering schemes versus the number of users.

Fig. 6. Fairness comparison between the different clustering schemes versus the number of users.

VIII. CONCLUSION

In this paper, the combination of NOMA with the cooperative relaying technique (i.e., C-NOMA) was considered in short packet communications to guarantee high reliability and low latency. The performance of two combining strategies, i.e., SC and MRC, was presented in terms of the decoding error probability in a quasi-static channel. Besides, the need to provide QoS to all users with critical services motivated us to consider max-min fairness as a design criterion in URLLC systems. To this end, an optimization problem was first formulated for a two-user DL C-NOMA system, and the optimal power, blocklength, and transmission rate were determined under the total energy consumption, reliability, and delay constraints. To decrease the computational complexity, a suboptimal algorithm with near-optimal performance was proposed. Numerical results showed that the proposed C-NOMA scheme significantly improves the users' fair throughput compared to the NOMA scheme. Moreover, it was demonstrated that the C-NOMA scheme with the MRC strategy outperforms the one with the SC strategy. Finally, the problem was extended to a multi-user scenario, and a clustering scheme based on C-NOMA was proposed. Monte Carlo simulations showed that the suboptimal C-NOMA clustering scheme performs close to the optimal solution with less computational complexity. Furthermore, the simulation results verify the superiority of the proposed user clustering (with both SC and MRC) over the NOMA and hybrid clustering schemes in boosting the average fair throughput, at the cost of a slight degradation in the fairness index. Extending the problem to systems with imperfect CSI, as well as to heterogeneous networks with different communication patterns, remains for future studies.

APPENDIX A

PROOF OF PROPOSITION

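Assume that $(p_1^{\mathrm{I}\dagger}, p_2^{\mathrm{I}\dagger}, p_2^{\mathrm{II}\dagger}, m^{\mathrm{I}\dagger}, m^{\mathrm{II}\dagger})$ is an optimal solution for which $m^{\mathrm{I}\dagger}\left(p_1^{\mathrm{I}\dagger}+p_2^{\mathrm{I}\dagger}\right) + m^{\mathrm{II}\dagger}p_2^{\mathrm{II}\dagger} < D_{\max}P_{\mathrm{ave}}$. It achieves the maximum value of $\min(T_1, T_2)$, which is denoted by $T_0^\dagger$. We increase $p_1^{\mathrm{I}\dagger}$ and $p_2^{\mathrm{I}\dagger}$ by multiplying them by a scalar $\alpha$ to obtain $p_1^{\mathrm{I}*} = \alpha p_1^{\mathrm{I}\dagger}$ and $p_2^{\mathrm{I}*} = \alpha p_2^{\mathrm{I}\dagger}$. It can be verified that the following holds:
$$m^{\mathrm{I}\dagger}\left(p_1^{\mathrm{I}\dagger}+p_2^{\mathrm{I}\dagger}\right) + m^{\mathrm{II}\dagger}p_2^{\mathrm{II}\dagger} < m^{\mathrm{I}\dagger}\left(p_1^{\mathrm{I}*}+p_2^{\mathrm{I}*}\right) + m^{\mathrm{II}\dagger}p_2^{\mathrm{II}*} = D_{\max}P_{\mathrm{ave}}. \qquad (21)$$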

We note that since   , so I* I†1 1 p p  and I* I†2 2 p p  . Hence, we have I * I †1,1 1,1   and (22) p hp hp h p hp hp h       

This means that, as $p_i^{\mathrm{I}\dagger}$, $i \in \{1, 2\}$, increases to $p_i^{\mathrm{I}*}$, the corresponding SNR/SINR increases, which increases $R_{i,i}^{\mathrm{I}}$ and, in turn, $T_i$ (invoking [13, Appendix A], the allowed $R_{i,i}^{\mathrm{I}}$ is a monotonically increasing function of $\gamma_{i,i}^{\mathrm{I}}$). On the other hand, $R_{i,i}^{\mathrm{I}}$, and hence $T_i$, are clearly increasing functions of $m^{\mathrm{I}}$. Then, we can construct a new solution $(p_1^{\mathrm{I}*}, p_2^{\mathrm{I}*}, p_2^{\mathrm{II}*}, m^{\mathrm{I}*}, m^{\mathrm{II}*})$ that corresponds to $T_0^*$, where $m^{\mathrm{I}*} = m^{\mathrm{I}\dagger} + \Delta m$ and $m^{\mathrm{II}*} = m^{\mathrm{II}\dagger} - \Delta m$ with $\Delta m > 0$. As before, this solution satisfies $m^{\mathrm{I}*}\left(p_1^{\mathrm{I}*}+p_2^{\mathrm{I}*}\right) + m^{\mathrm{II}*}p_2^{\mathrm{II}*} = D_{\max}P_{\mathrm{ave}}$. Since $p_i^{\mathrm{I}*} > p_i^{\mathrm{I}\dagger}$ and $m^{\mathrm{I}*} > m^{\mathrm{I}\dagger}$, we have $T_0^* > T_0^\dagger$, which contradicts the assumption that $T_0^\dagger$ is the optimal value. Therefore, a proper $\alpha$ and $\Delta m$ can always be found such that $m^{\mathrm{I}*}\left(p_1^{\mathrm{I}*}+p_2^{\mathrm{I}*}\right) + m^{\mathrm{II}*}p_2^{\mathrm{II}*} = D_{\max}P_{\mathrm{ave}}$.

REFERENCES

[1] G. Durisi, T. Koch, and P. Popovski, “Toward massive, ultrareliable, and low-latency wireless communication with short packets,” Proc. IEEE, vol. 104, no. 9, pp. 1711–1726, Aug. 2016.

[2] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.

[3] W. Yang, G. Durisi, T. Koch, and Y. Polyanskiy, “Quasi-static multiple-antenna fading channels at finite blocklength,” IEEE Trans. Inf. Theory, vol. 60, no. 7, pp. 4232–4265, Jul. 2014.

[4] P. Wu and N. Jindal, “Coding versus ARQ in fading channels: How reliable should the PHY be?,” IEEE Trans. Commun., vol. 59, no. 12, pp. 3363–3374, Dec. 2011.

[5] B. Makki, T. Svensson, and M. Zorzi, “Finite block-length analysis of the incremental redundancy HARQ,” IEEE Wireless Commun. Lett., vol. 3, no. 5, pp. 529–532, Oct. 2014.

[6] S. Xu, T.-H. Chang, S.-C. Lin, C. Shen, and G. Zhu, “Energy-efficient packet scheduling with finite blocklength codes: Convexity analysis and efficient algorithms,” IEEE Trans. Wireless Commun., vol. 15, no. 8, pp. 5527–5540, Aug. 2016.

[7] H. Ren, C. Pan, Y. Deng, M. Elkashlan, and A. Nallanathan, “Joint pilot and payload power allocation for massive-MIMO-enabled URLLC IoT networks,” IEEE J. Sel. Areas Commun., vol. 38, no. 5, pp. 816–830, May 2020.

[8] H. Ren, C. Pan, Y. Deng, M. Elkashlan, and A. Nallanathan, “Resource allocation for secure URLLC in mission-critical IoT scenarios,” IEEE Trans. Commun., vol. 68, no. 9, pp. 5793–5807, Sep. 2020.

[9] C. She, Y. Duan, G. Zhao, T. Q. S. Quek, Y. Li, and B. Vucetic, “Cross-layer design for mission-critical IoT in mobile edge computing systems,” IEEE Internet Things J., vol. 6, no. 6, pp. 9360–9374, Dec. 2019.

[10] Y. Yu, H. Chen, Y. Li, Z. Ding, and B. Vucetic, “On the performance of non-orthogonal multiple access in short-packet communications,” IEEE Commun. Lett., vol. 22, no. 3, pp. 590–593, Mar. 2018.

[11] X. Sun, S. Yan, N. Yang, Z. Ding, C. Shen, and Z. Zhong, “Short-packet downlink transmission with non-orthogonal multiple access,” IEEE Trans. Wireless Commun., vol. 17, no. 7, pp. 4550–4564, Jul. 2018.

[12] Y. Xu, C. Shen, T.-H. Chang, S.-C. Lin, Y. Zhao, and G. Zhu, “Transmission energy minimization for heterogeneous low-latency NOMA downlink,” IEEE Trans. Wireless Commun., pp. 1054–1069, 2019.

[13] F. Salehi, N. Neda, and M.-H. Majidi, “Max-min fairness in downlink non-orthogonal multiple access with short packet communications,” AEU - International Journal of Electronics and Communications, vol. 114, p. 153028, Feb. 2020.

[14] Y. Hu, J. Gross, and A. Schmeink, “On the performance advantage of relaying under the finite blocklength regime,” IEEE Commun. Lett., vol. 19, no. 5, pp. 779–782, May 2015.

[15] Y. Hu, A. Schmeink, and J. Gross, “Blocklength-limited performance of relaying under quasi-static Rayleigh channels,” IEEE Trans. Wireless Commun., vol. 15, no. 7, pp. 4548–4558, 2016.

[16] Y. Hu, A. Schmeink, and J. Gross, “Optimal scheduling of reliability-constrained relaying system under outdated CSI in the finite blocklength regime,” IEEE Trans. Veh. Technol., vol. 67, no. 7, pp. 6146–6155, 2018.

[17] Y. Hu, M. Serror, K. Wehrle, and J. Gross, “Finite blocklength performance of cooperative multi-terminal wireless industrial networks,” IEEE Trans. Veh. Technol., vol. 67, no. 7, pp. 5778–5792, Jul. 2018.

[18] Y. Hu, C. Schnelling, M. C. Gursoy, and A. Schmeink, “Multi-relay-assisted low-latency high-reliability communications with best single relay selection,” IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 7630–7642, Aug. 2019.

[19] P. Nouri, H. Alves, and M. Latva-aho, “On the performance of ultra-reliable decode and forward relaying under the finite blocklength,” in 2017 European Conference on Networks and Communications (EuCNC), Jun. 2017.

[20] Y. Hu, M. C. Gursoy, and A. Schmeink, “Efficient transmission schemes for low-latency networks: NOMA vs. relaying,” in 2017 IEEE PIMRC, Oct. 2017, pp. 1–6.

[21] H. Ren, C. Pan, Y. Deng, M. Elkashlan, and A. Nallanathan, “Joint power and blocklength optimization for URLLC in a factory automation scenario,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 1786–1801, Mar. 2020.

[22] S. Timotheou and I. Krikidis, “Fairness for non-orthogonal multiple access in 5G systems,” IEEE Signal Process. Lett., vol. 22, no. 10, pp. 1647–1651, Oct. 2015.

[23] 3GPP, “Evolved Universal Terrestrial Radio Access (E-UTRA); Further advancements for E-UTRA physical layer aspects,” 3GPP TR 36.814, Release 9, 2010.

[24]