Optimal SIC Ordering and Power Allocation in Downlink Multi-Cell NOMA Systems
Sepehr Rezvani, Eduard A. Jorswieck, Fellow, IEEE, Nader Mokari, Senior Member, IEEE, and Mohammad R. Javan, Senior Member, IEEE
arXiv [cs.IT], Feb.
Abstract
In this work, we consider the problem of finding the globally optimal joint successive interference cancellation (SIC) ordering and power allocation (JSPA) for the general sum-rate maximization problem in downlink multi-cell NOMA systems. We propose a globally optimal solution based on the exploration of base stations' (BSs') power consumption and distributed power allocation. The proposed centralized algorithm is still exponential in the number of BSs, but it scales well with the number of users. For any suboptimal decoding order, we address the problem of joint rate and power allocation (JRPA) to achieve the maximum users' sum-rate. Furthermore, we design semi-centralized and distributed JSPA frameworks with polynomial time complexity. Numerical results show that the optimal decoding order yields significant performance gains in terms of outage probability and users' total spectral efficiency compared to the channel-to-noise ratio (CNR)-based decoding order known from single-cell NOMA. Moreover, the performance gap between our proposed centralized and semi-centralized frameworks is shown to be small. Therefore, the low-complexity semi-centralized framework, with its near-to-optimal performance, is a good choice for larger numbers of BSs and users.
Index Terms
Multi-cell, NOMA, successive interference cancellation, optimal SIC ordering, power allocation.
S. Rezvani and E. A. Jorswieck are with the Department of Information Theory and Communication Systems, Technische Universität Braunschweig, Braunschweig, Germany (e-mails: {rezvani, jorswieck}@ifn.ing.tu-bs.de). N. Mokari is with the Department of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran (e-mail: [email protected]). M. R. Javan is with the Department of Electrical and Robotics Engineering, Shahrood University of Technology, Shahrood, Iran (e-mail: [email protected]).
I. INTRODUCTION
A. Motivations and Related Works
It is known that the capacity of degraded broadcast channels can be achieved by linear superposition coding (in the power domain) at the transmitter combined with multiuser detection algorithms, such as successive interference cancellation (SIC), at the receivers [1], [2]. This technique is known as power-domain non-orthogonal multiple access (NOMA), which is considered a new radio access technique for fifth-generation (5G) wireless networks and beyond [3]–[5]. From an information-theoretic viewpoint, the main purpose of NOMA is to attain the capacity region of degraded broadcast channels with lower complexity than dirty paper coding (DPC). In NOMA, the SIC decoding order among the multiplexed users plays an important role in achieving the capacity region of degraded broadcast channels [6]–[8]. It is well known that downlink single-input single-output (SISO) Gaussian broadcast channels are degraded [1], [2]. Hence, NOMA with the channel-to-noise ratio (CNR)-based decoding order is capacity-achieving in SISO Gaussian broadcast channels, meaning that any achievable rate region is a subset of the rate region of NOMA with the CNR-based decoding order [1], [2], [9]. The superiority of single-cell NOMA over single-cell OMA is also well known in information theory [5], [6], [10]. In single-cell NOMA, power allocation optimization has been shown to be necessary to achieve the maximum spectral efficiency of users [4]–[8]. From the optimization perspective, the optimal (CNR-based) decoding order is independent of the power allocation, which makes it robust and straightforward to apply. Moreover, the Hessian of the sum-rate function under the CNR-based decoding order has been shown to be negative definite, i.e., the sum-rate function is strictly concave in the powers [11]–[13]. Hence, the sum-rate maximization problem in downlink single-cell NOMA is convex.
The latter convex problem can be efficiently solved by the Lagrange dual method. Unfortunately, capacity-achieving schemes are unknown in downlink multi-cell networks, since the capacity region of the two-user downlink interference channel is still unknown in general [2], [6], [16]. In this work, we limit our study to the downlink single-antenna multi-cell network in which the signals of users not associated with a cell, called inter-cell interference (ICI), are fully treated as additive white Gaussian noise (AWGN) [17]–[28].
In this work, the term 'NOMA' refers to power-domain NOMA. The feasible region of the general power allocation problem in single-cell NOMA under the quality of service (QoS) requirement constraints is defined by linear constraints and is therefore convex [11]–[15].
Inspired by the degradedness of SISO Gaussian broadcast channels, NOMA with the channel-to-interference-plus-noise ratio (CINR)-based decoding order performs the same as DPC in each cell, so that it achieves the capacity of this multi-cell network, called multi-cell NOMA. In this system, 'ICI+AWGN' can be viewed as an equivalent noise power at the users. In contrast to single-cell NOMA, finding the optimal decoding order in multi-cell NOMA is challenging because of the impact of ICI on the CINR of the multiplexed users [18], [19], [22], [24]–[28]. The ICI in each cell is affected by the total power consumption of each neighboring (interfering) base station (BS). Therefore, the optimal SIC decoding order in each cell depends on the optimal power consumption of all the other interfering BSs. As a result, the optimal SIC decoding order in each cell should be determined jointly with centralized power allocation optimization in all the neighboring cells. It has been shown that, under the CINR-based decoding order, the ICI in the centralized total power minimization problem satisfies the basic properties of the standard interference function [18], [22], [25]. Hence, the optimal joint SIC ordering and power allocation (JSPA) can be obtained by the well-known Yates power control framework [17]. In other words, the globally optimal JSPA for the total power minimization problem in multi-cell NOMA can be found in an iterative distributed manner with fast convergence [18], [21], [22], [25]. However, the ICI in the centralized sum-rate maximization problem does not satisfy the basic properties of the standard interference function. As a result, the Yates power control framework does not guarantee global optimality for the sum-rate maximization problem [25]. It has also been shown that the sum-rate function in multi-cell NOMA is nonconcave in the powers due to the ICI, which makes the centralized sum-rate maximization problem nonconvex and strongly NP-hard [23]–[26].
The best candidate method for solving this problem is monotonic optimization, whose complexity is still approximately exponential in the number of users [23], [29]. A JSPA approach built on it needs to run the monotonic-optimization-based power allocation $(M!)^B$ times, where $(M!)^B$ is the total number of possible decoding orders in $B$ cells, each serving $M$ users. For example, with $B = 3$ cells of $M = 4$ users each, there are $(4!)^3 = 13{,}824$ decoding-order combinations. Therefore, joint optimization via monotonic optimization is impractical even for small numbers of BSs and users. In [23], [25], the authors address the sum-rate maximization problem in multi-cell NOMA for the CNR-based decoding order, which results in suboptimal performance. In [24], [26], the SIC decoding orders are updated in the NOMA clustering subproblem, where the power allocation is fixed, and the centralized power allocation subproblem is then solved for the fixed decoding orders. Since the optimal decoding order inherently depends on the ICI, the latter schemes cannot guarantee global optimality, meaning that NOMA then performs worse than DPC. To the best of our knowledge, the optimal JSPA for maximizing users' sum-rate in downlink multi-cell NOMA systems is still an open problem. Moreover, the performance gap between the optimal and suboptimal decoding orders in multi-cell NOMA has not yet been addressed in the literature.
B. Our Contributions
In this work, we study the fundamental impact of optimal/suboptimal decoding orders on the capacity region of the downlink multi-cell NOMA system with a single (fixed) NOMA cluster [1], [2], [18] in each cell. Our main contributions are as follows:
• We study the fundamentals of SIC in multi-cell NOMA systems. By analyzing the Karush-Kuhn-Tucker (KKT) optimality conditions, we prove that at any feasible power consumption level of the BSs, only the NOMA cluster-head user determined by the adaptive (optimal) SIC decoding order deserves additional power, while the users with lower decoding orders get just enough power to maintain their individual minimum rate demands. We then obtain closed-form expressions for the optimal power allocation and SIC ordering in multi-cell NOMA for a given power consumption of the BSs.
• We propose a globally optimal JSPA algorithm for maximizing users' sum-rate under individual minimum rate demands. This algorithm utilizes both the exploration of the BSs' power consumption and distributed power allocation. We show that this algorithm is a good solution for a large number of users and a small number of BSs.
• We analytically prove that under specific channel conditions, called the SIC sufficient condition, the CNR-based decoding order is optimal for a user pair independently of the power allocation.
• We analyze the impact of any suboptimal decoding order on the capacity region of multi-cell NOMA. In contrast to [20], [23]–[27], we show that under any fixed (thus suboptimal) decoding order, joint rate and power allocation (JRPA) is necessary to achieve the channel capacity of users. For a suboptimal decoding order, we propose a near-to-optimal JRPA algorithm based on sequential programming with polynomial complexity. The convergence and performance of the JRPA algorithm for different initialization methods are investigated. The SIC sufficient condition is utilized to reduce the complexity of the JRPA algorithm.
• We prove that under any suboptimal decoding order, guaranteeing successful SIC at all the users by imposing the commonly used SIC necessary constraint on power allocation [23]–[27] may significantly degrade the total spectral efficiency of users.
• We also propose a globally optimal power allocation for any fixed (suboptimal) decoding order under the SIC necessary constraint by modifying our proposed JSPA algorithm.
• We propose a semi-centralized framework for a two-tier heterogeneous network (HetNet) consisting of multiple femto BSs (FBSs) underlying a single macro BS (MBS). We numerically show that this framework achieves near-to-optimal performance with significantly reduced complexity, making it a good solution for practical implementations.
Imposing the commonly used SIC necessary condition in NOMA clustering would result in suboptimal performance. Global optimality can be guaranteed only if the optimal decoding orders are completely independent of the ICI levels in all the cells, which holds only for some channel conditions. In this work, we investigate the impact of ICI on the optimal decoding orders, which inherently depend on power allocation, in the sum-rate maximization problem.
Generalizing this model to multiple NOMA clusters within a cell is left for future work.
The NOMA cluster-head user is the user with the highest decoding order, which decodes and cancels the desired signals of all the other NOMA users within the cell.
In practice, the number of interfering BSs is small anyway.
C. Paper Organization
The rest of this paper is organized as follows. Section II describes the general multi-cell NOMA system and formulates the JSPA problem for maximizing users' sum-rate. The solution algorithms are presented in Section III. Numerical results are provided in Section IV. Our conclusions are presented in Section V.
II. GENERAL DOWNLINK MULTI-CELL NOMA SYSTEM
Consider the downlink transmission of a multi-user single-carrier multi-cell NOMA system. The set of single-antenna BSs and the set of users served by BS $b$ are denoted by $\mathcal{B}$ and $\mathcal{U}_b$, respectively. According to the NOMA protocol, the users associated with the same transmitter form a NOMA cluster. The signals of users associated with other transmitters (known as ICI) are fully treated as AWGN at the users within the NOMA cluster. Hence, we consider a single NOMA cluster in each cell $b$ [18] comprising $|\mathcal{U}_b|$ users, where $|\cdot|$ is the cardinality of a set. The notation $k \to i$ indicates that user $k$ has a higher decoding order than user $i$, such that user $k$ is scheduled (and enforced) to decode and cancel the whole signal of user $i$, while the whole signal of user $k$ is treated as intra-NOMA interference (INI) at user $i$. For instance, assume that each cell $b$ serves $M$ users, i.e., $|\mathcal{U}_b| = M$. In general, there are $M!$ possible decoding orders for the users within an $M$-user NOMA cluster. Without loss of generality, let $k \to i$ if $k > i$, i.e., the SIC of NOMA in each cell $b$ follows $M \to M-1 \to \cdots \to 1$. As shown in Fig. 1 in [18], under this SIC decoding order, the signal of each user $i$ is decoded prior to that of any user $k > i$. In general, each user $i$ first decodes and cancels the signals of users $1, \ldots, i-1$. Then, it decodes its desired signal while the signals of users $i+1, \ldots, M$ are treated as noise [5]. In this regard, in each cell, the NOMA cluster-head user $M$ does not experience any INI.
Let $s_{b,i} \sim \mathcal{CN}(0,1)$ be the desired signal of user $i \in \mathcal{U}_b$. Denote by $\lambda_{b,i,k} \in \{0,1\}$ the binary decoding decision indicator, where $\lambda_{b,i,k} = 1$ if user $k \in \mathcal{U}_b$ is scheduled to decode $s_{b,i}$ (and to cancel it when $k \neq i$), and $\lambda_{b,i,k} = 0$ otherwise. Since for each user pair within a NOMA cluster only one user can decode and cancel the signal of the other user, we have $\lambda_{b,i,k} + \lambda_{b,k,i} = 1,\ \forall b \in \mathcal{B},\ i,k \in \mathcal{U}_b,\ k \neq i$.
Moreover, the signal of each user should be decoded at that user, meaning that $\lambda_{b,i,i} = 1,\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b$. Due to the transitive nature of SIC ordering, if $\lambda_{b,i,k} = 1$ and $\lambda_{b,k,h} = 1$, then we should have $\lambda_{b,i,h} = 1$. In other words, $\lambda_{b,i,k}\lambda_{b,k,h} \le \lambda_{b,i,h},\ \forall b \in \mathcal{B},\ i,k,h \in \mathcal{U}_b$. According to the SIC protocol, $s_{b,i}$ should be decoded at user $i \in \mathcal{U}_b$ as well as at all the users in $\Phi_{b,i} = \{k \in \mathcal{U}_b \setminus \{i\} \mid \lambda_{b,i,k} = 1\}$, i.e., the set of users in cell $b$ with higher decoding orders than user $i$. Therefore, in the SIC of NOMA, each user $i$ first decodes and subtracts each signal $s_{b,j},\ \forall j \in \mathcal{U}_b \setminus (\{i\} \cup \Phi_{b,i})$; then it decodes its desired signal $s_{b,i}$ while the signals of the users in $\Phi_{b,i}$ are treated as noise (INI). Accordingly, the signal of user $i$ is decoded prior to that of user $k$ if $|\Phi_{b,i}| > |\Phi_{b,k}|$. The SIC decoding order among users is thus fully determined by $\lambda_{b,i,k}$: $\lambda_{b,i,k} = 1,\ k \neq i$, is equivalent to $k \to i$ in cell $b$. Similar to [11]–[14], [18]–[28], we assume that perfect channel state information (CSI) of all the users is available at the scheduler. The channel gain from BS $j \in \mathcal{B}$ to user $i \in \mathcal{U}_b$ is denoted by $g_{j,b,i}$, and the power allocated by BS $b$ to user $i \in \mathcal{U}_b$ is denoted by $p_{b,i}$. We assume perfect SIC, since this work aims to quantify the performance gain of finding the optimal decoding order and power allocation for maximizing users' total spectral efficiency in multi-cell NOMA systems; imperfect SIC is left for future work. After perfect SIC of the signals of the users in $\mathcal{U}_b \setminus (\{i\} \cup \Phi_{b,i})$, the received signal of user $i \in \mathcal{U}_b$ at user $k \in \{i\} \cup \Phi_{b,i}$ is given by
$$y_{b,i,k} = \underbrace{\sqrt{p_{b,i}}\, g_{b,b,k}\, s_{b,i}}_{\text{intended signal}} + \underbrace{\sum_{j \in \Phi_{b,i}} \sqrt{p_{b,j}}\, g_{b,b,k}\, s_{b,j}}_{\text{INI}} + \underbrace{\sum_{j \in \mathcal{B},\, j \neq b} \sum_{l \in \mathcal{U}_j} \sqrt{p_{j,l}}\, g_{j,b,k}\, s_{j,l}}_{\text{ICI}} + N_{b,k}, \tag{1}$$
where the first and second terms are the received desired signal and INI of user $i \in \mathcal{U}_b$ at user $k$, respectively, and the third term is the ICI at user $k$. Moreover, $N_{b,k}$ is the AWGN at user $k \in \mathcal{U}_b$ with zero mean and variance $\sigma_{b,k}^2$. Without loss of generality, assume that $\mathbb{E}[|s_{b,i}|^2] = 1,\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b$, and let $h_{j,b,k} = |g_{j,b,k}|^2$ [13], [18], [25]. According to (1), the SINR of user $k \in \Phi_{b,i}$ for decoding and canceling the signal of user $i \in \mathcal{U}_b$ is
$$\gamma_{b,i,k} = \frac{p_{b,i} h_{b,b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,b,k} + I_{b,k} + \sigma_{b,k}^2},$$
where $\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,b,k}$ is the INI power of user $i$ (after perfect SIC) received at user $k$, and $I_{b,k} = \sum_{j \in \mathcal{B},\, j \neq b} \sum_{l \in \mathcal{U}_j} p_{j,l} h_{j,b,k}$ is the ICI power received at user $k \in \mathcal{U}_b$. For $k = i$, $\gamma_{b,i,i}$ denotes the SINR of user $i \in \mathcal{U}_b$ for decoding its desired signal $s_{b,i}$ after perfect SIC. For convenience, let $h_{b,b,i} \equiv h_{b,i}$ and $\gamma_{b,i,i} \equiv \gamma_{b,i}$. Furthermore, let $\boldsymbol{\lambda} = [\lambda_{b,i,k}],\ \forall b \in \mathcal{B},\ i,k \in \mathcal{U}_b$, collect all the decoding indicators, in which $\boldsymbol{\lambda}_b = [\lambda_{b,i,k}],\ \forall i,k \in \mathcal{U}_b$, is the decoding indicator matrix of the users in $\mathcal{U}_b$. Moreover, $\mathbf{p}$ is the power allocation matrix of all the users, whose $b$-th row $\mathbf{p}_b$ is the power allocation vector of the users in cell $b$, and $\mathbf{p}_{-b}$ collects the powers of all the other cells. According to Shannon's capacity formula, the achievable spectral efficiency of user $i \in \mathcal{U}_b$ is [4], [5]
$$R_{b,i}(\mathbf{p}, \boldsymbol{\lambda}_b) = \min_{k \in \{i\} \cup \Phi_{b,i}} \log_2\!\left(1 + \frac{p_{b,i} h_{b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,k} + I_{b,k}(\mathbf{p}_{-b}) + \sigma_{b,k}^2}\right). \tag{2}$$
Note that the set $\Phi_{b,i}$ depends on $\boldsymbol{\lambda}_b$, although this is not explicitly shown in (2). The centralized total spectral efficiency maximization problem is formulated as
$$\begin{aligned}
\text{JSPA}:\ \max_{\mathbf{p} \ge 0,\ \boldsymbol{\lambda} \in \{0,1\}}\ & \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(\mathbf{p}, \boldsymbol{\lambda}_b) && \text{(3a)}\\
\text{s.t.}\ & \sum_{i \in \mathcal{U}_b} p_{b,i} \le P_b^{\max},\ \forall b \in \mathcal{B}, && \text{(3b)}\\
& R_{b,i}(\mathbf{p}, \boldsymbol{\lambda}_b) \ge R_{b,i}^{\min},\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b, && \text{(3c)}\\
& \lambda_{b,i,k} + \lambda_{b,k,i} = 1,\ \forall b \in \mathcal{B},\ i,k \in \mathcal{U}_b,\ k \neq i, && \text{(3d)}\\
& \lambda_{b,i,k}\lambda_{b,k,h} \le \lambda_{b,i,h},\ \forall b \in \mathcal{B},\ i,k,h \in \mathcal{U}_b, && \text{(3e)}\\
& \lambda_{b,i,i} = 1,\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b, && \text{(3f)}
\end{aligned}$$
where (3b) and (3c) are the per-BS maximum power and per-user minimum spectral efficiency constraints, respectively. $P_b^{\max}$ denotes the maximum power of BS $b$, and $R_{b,i}^{\min}$ is the minimum spectral efficiency demand of user $i \in \mathcal{U}_b$. The remaining constraints are described above.
III. SOLUTION ALGORITHMS FOR THE SUM-RATE MAXIMIZATION PROBLEM
In this section, we propose globally optimal and suboptimal solutions for the main problem (3) under centralized/decentralized resource management frameworks. Finally, we compare the computational complexity of the proposed resource allocation algorithms.
A. Centralized Resource Management Framework
In this subsection, we first propose a globally optimal JSPA algorithm for problem (3). Then, by considering a fixed SIC decoding order in (3), we propose two suboptimal rate adoption and power allocation algorithms.
1) Globally Optimal JSPA Algorithm:
Problem (3) is a mixed-integer nonlinear programming (MINLP) problem. The sum-rate function in (3a) is nonconcave in $\mathbf{p}$ and $\boldsymbol{\lambda}$, which makes (3) nonconvex and strongly NP-hard [23], [25]. Let us define the power consumption coefficient of BS $b$ as $\alpha_b \in [0,1]$ such that $\sum_{i \in \mathcal{U}_b} p_{b,i} = \alpha_b P_b^{\max}$. The received ICI power at user $i \in \mathcal{U}_b$ can then be written as $I_{b,i}(\boldsymbol{\alpha}_{-b}) = \sum_{j \in \mathcal{B},\, j \neq b} \alpha_j P_j^{\max} h_{j,b,i}$, where $\boldsymbol{\alpha} = [\alpha_b]_{1 \times B}$ is the BSs' power coefficient vector. We prove that for any given $\boldsymbol{\alpha}_{-b}$, the optimal $(\boldsymbol{\lambda}_b^*, \mathbf{p}_b^*)$ can be obtained in closed form as follows.
Proposition III.1.
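In multi-cell NOMA, the optimal decoding order for the user pair $i, k \in \mathcal{U}_b$ is $k \to i$ if and only if $\tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b}) \le \tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b})$, where $\tilde{h}_{b,l}(\boldsymbol{\alpha}_{-b}) = \frac{h_{b,l}}{I_{b,l}(\boldsymbol{\alpha}_{-b}) + \sigma_{b,l}^2},\ l = i, k$. Therefore, $\lambda_{b,i,k}^* = 1$ if and only if $\tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b}) \le \tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b})$.
Proof. Please see Appendix A.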
Remark III.1.1.
In multi-cell NOMA, the optimal SIC ordering is the decoding order based on the ascending order of the users' channel gains normalized by ICI-plus-noise (called the CINR-based decoding order). The optimal decoding order is independent of the power allocation policy within the cell. However, it depends on the received ICI and, consequently, on the total power consumption of the neighboring (interfering) BSs.
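To make Remark III.1.1 concrete, the following Python sketch computes the CINR-based decoding order of Proposition III.1 for a single cell and builds the corresponding decoding indicators $\lambda_{b,i,k}$. It assumes, for simplicity, a single interfering BS whose total transmit power `ici_power` stands in for $\alpha_j P_j^{\max}$; the function names `cinr_order` and `decoding_indicators` and the channel values are hypothetical and only illustrate how raising the neighboring BS's power can flip the optimal order.

```python
# Sketch: CINR-based decoding order (Proposition III.1) and indicators lambda_{b,i,k}.
# Hypothetical single-interferer setting: user i receives ICI power ici_power * h_ici[i].


def cinr_order(h_own, h_ici, sigma2, ici_power):
    """Return 0-based user indices from first decoded to cluster head."""
    ht = [g / (ici_power * gi + s) for g, gi, s in zip(h_own, h_ici, sigma2)]
    return sorted(range(len(ht)), key=lambda i: ht[i])  # ascending normalized gain


def decoding_indicators(order):
    """lam[i][k] = 1 iff user k decodes s_i, i.e. k == i or k is decoded after i."""
    pos = {user: rank for rank, user in enumerate(order)}
    n = len(order)
    return [[int(pos[k] >= pos[i]) for k in range(n)] for i in range(n)]


h_own, h_ici, sigma2 = [1.0, 4.0], [0.1, 2.0], [1.0, 1.0]
print(cinr_order(h_own, h_ici, sigma2, ici_power=0.0))   # [0, 1]: same as the CNR order
print(cinr_order(h_own, h_ici, sigma2, ici_power=10.0))  # [1, 0]: strong ICI flips the order
```

With these illustrative numbers, user 1 has the larger CNR but also receives much stronger ICI, so at high interfering power its CINR drops below that of user 0 and it is no longer the cluster head. Because the indicator matrix is derived from a single total order, it satisfies (3d)–(3f) by construction.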
Corollary III.1.1.
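For any given $\boldsymbol{\alpha}_{-b}$ with $\tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b}) \le \tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b})$, at the optimal $\boldsymbol{\lambda}_b^*$ we have
$$\log_2\!\left(1 + \frac{p_{b,i} h_{b,i}}{\sum_{j \in \Phi_{b,i}^*} p_{b,j} h_{b,i} + I_{b,i} + \sigma_{b,i}^2}\right) \le \log_2\!\left(1 + \frac{p_{b,i} h_{b,k}}{\sum_{j \in \Phi_{b,i}^*} p_{b,j} h_{b,k} + I_{b,k} + \sigma_{b,k}^2}\right). \tag{4}$$
Indeed, dividing the numerator and denominator of each SINR in (4) by the corresponding channel gain turns both sides into $p_{b,i}/\big(\sum_{j \in \Phi_{b,i}^*} p_{b,j} + 1/\tilde{h}_{b,l}\big)$, $l = i, k$, which is nondecreasing in $\tilde{h}_{b,l}$. According to (2) and (4), at the optimal $\boldsymbol{\lambda}_b^*$ we have $R_{b,i}(\mathbf{p}, \boldsymbol{\lambda}_b^*) = \log_2\!\left(1 + \frac{p_{b,i} h_{b,i}}{\sum_{j \in \Phi_{b,i}^*} p_{b,j} h_{b,i} + I_{b,i} + \sigma_{b,i}^2}\right)$. According to Remark III.1.1, the centralized power allocation and SIC ordering problems cannot be decoupled. However, for a given $\boldsymbol{\alpha}_{-b}$, and consequently a given $\boldsymbol{\lambda}_b^*$ (based on Proposition III.1), the optimal power $\mathbf{p}_b^*$ can be obtained in closed form according to the following proposition.
Proposition III.2.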
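Assume that $\boldsymbol{\alpha}_{-b}$ is fixed. For convenience, let $|\mathcal{U}_b| = M$, and $k > i$ if $\tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b}) > \tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b})$. According to Proposition III.1, the decoding order $M \to M-1 \to \cdots \to 1$ is optimal. The optimal powers in $\mathbf{p}_b^*$ can be obtained in closed form as follows:
$$p_{b,i}^* = \left[\beta_{b,i}\left(\prod_{j=1}^{i-1}(1-\beta_{b,j})\,\alpha_b P_b^{\max} + \frac{1}{\tilde{h}_{b,i}} - \sum_{j=1}^{i-1}\frac{\beta_{b,j}\prod_{k=j+1}^{i-1}(1-\beta_{b,k})}{\tilde{h}_{b,j}}\right)\right]^+,\ \forall i = 1, \ldots, M-1, \tag{5}$$
and
$$p_{b,M}^* = \alpha_b P_b^{\max} - \sum_{i=1}^{M-1}\left[\beta_{b,i}\left(\prod_{j=1}^{i-1}(1-\beta_{b,j})\,\alpha_b P_b^{\max} + \frac{1}{\tilde{h}_{b,i}} - \sum_{j=1}^{i-1}\frac{\beta_{b,j}\prod_{k=j+1}^{i-1}(1-\beta_{b,k})}{\tilde{h}_{b,j}}\right)\right]^+, \tag{6}$$
where $\beta_{b,i} = \frac{2^{R_{b,i}^{\min}} - 1}{2^{R_{b,i}^{\min}}},\ \forall i = 1, \ldots, M-1$, and $[\cdot]^+ = \max\{\cdot, 0\}$.
Proof. Please see Appendix B.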
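A minimal Python sketch of (5)–(6), assuming the users of cell $b$ are already indexed in ascending order of $\tilde{h}_{b,i}$ (0-based here, so index $M-1$ is the cluster head) and that `budget` stands for $\alpha_b P_b^{\max}$. The helper `rates` evaluates (2) under this order, using $p_{b,i}h_{b,k}/(\sum_{j \in \Phi_{b,i}} p_{b,j}h_{b,k} + I_{b,k} + \sigma_{b,k}^2) = p_{b,i}/(\sum_{j \in \Phi_{b,i}} p_{b,j} + 1/\tilde{h}_{b,k})$. The function names and numbers are illustrative, not taken from the paper; `math.prod` requires Python 3.8 or later.

```python
import math


def closed_form_powers(ht, rmin, budget):
    """Eqs. (5)-(6) for one cell.

    ht:     normalized gains h~_{b,i}, sorted ascending (index M-1 = cluster head)
    rmin:   minimum rates R^min_{b,i} in the same order (bps/Hz)
    budget: alpha_b * P_b^max
    """
    M = len(ht)
    beta = [(2 ** r - 1) / 2 ** r for r in rmin]
    p = []
    for i in range(M - 1):
        share = budget * math.prod(1 - beta[j] for j in range(i))
        loss = sum(beta[j] * math.prod(1 - beta[k] for k in range(j + 1, i)) / ht[j]
                   for j in range(i))
        p.append(max(beta[i] * (share + 1 / ht[i] - loss), 0.0))  # [.]^+ in (5)
    p.append(budget - sum(p))  # (6): the cluster head receives the remaining power
    return p


def rates(p, ht):
    """Eq. (2) under the same order: Phi_{b,i} = users i+1..M-1, min over k in {i} U Phi."""
    M = len(p)
    return [min(math.log2(1 + p[i] / (sum(p[i + 1:]) + 1 / ht[k])) for k in range(i, M))
            for i in range(M)]


ht = [0.5, 2.0, 10.0]
p = closed_form_powers(ht, [1.0, 1.0, 1.0], budget=10.0)
print(p)             # [6.0, 2.25, 1.75]
print(rates(p, ht))  # first two entries equal R_min = 1.0
```

In this example, the two weaker users end exactly at their 1 bps/Hz demand and all the remaining power goes to the cluster head, which is the structure Proposition III.2 describes; the minimum in (2) is attained at $k = i$, consistent with Corollary III.1.1.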
Remark III.2.1.
For sufficiently large normalized channel gains $\tilde{h}_{b,i},\ \forall i \in \mathcal{U}_b \setminus \{M\}$, (5) and (6) can be approximated by
$$p_{b,i}^* \approx \left[\alpha_b P_b^{\max}\,\beta_{b,i}\prod_{j=1}^{i-1}(1-\beta_{b,j})\right]^+,\ \forall i = 1, \ldots, M-1, \tag{7}$$
and
$$p_{b,M}^* \approx \left[\alpha_b P_b^{\max}\left(1 - \sum_{i=1}^{M-1}\beta_{b,i}\prod_{j=1}^{i-1}(1-\beta_{b,j})\right)\right]^+, \tag{8}$$
respectively.
Remark III.2.2.
For sufficiently large normalized channel gains $\tilde{h}_{b,i},\ \forall i \in \mathcal{U}_b \setminus \{M\}$, if the users in cell $b$ have the same minimum rate demand $R^{\min}$, the optimal power coefficient of each user $i \in \mathcal{U}_b$, denoted by $q_{b,i}^* = \frac{p_{b,i}^*}{\alpha_b P_b^{\max}}$, can be obtained from (7) and (8) as
$$q_{b,i}^* \approx \frac{2^{R^{\min}} - 1}{\left(2^{R^{\min}}\right)^{i}},\ \forall i = 1, \ldots, M-1, \qquad q_{b,M}^* \approx \frac{1}{\left(2^{R^{\min}}\right)^{M-1}}. \tag{9}$$
Remark III.2.1 shows that in the high-SINR regime, the optimal powers are approximately insensitive to the exact channel gains. Hence, (7) and (8) remain valid in fast-fading and/or imperfect-CSI scenarios in which the CSI variations are small enough that the optimal decoding order remains unchanged. (In this work, we consider the perfect CSI scenario; the impact of imperfect CSI on the closed-form optimal powers in multi-cell NOMA is left for future work.) Moreover, Remark III.2.2 shows that, for the same minimum rate demands, a weaker user always gets more power than a stronger user. For instance, for $R^{\min} = 1$ bps/Hz, regardless of the size of the NOMA cluster, nearly 50% of the available power $\alpha_b P_b^{\max}$ is allocated to the weakest user, since (9) gives $q_{b,1}^* \approx (2-1)/2 = 1/2$. The accuracy of these approximations is numerically evaluated in Subsection IV-F.
Our proposed globally optimal JSPA algorithm combines the exploration of different values of $\alpha_b$ in $\boldsymbol{\alpha}_{1 \times B}$ with distributed power allocation. In this algorithm, we perform a grid search over the sampled values of each $\alpha_b$ in $\boldsymbol{\alpha}_{1 \times B}$. For a given $\hat{\boldsymbol{\alpha}}$, we find the optimal $\boldsymbol{\lambda}^*$ and $\mathbf{p}^*$ according to Propositions III.1 and III.2, respectively. The pseudo code of the proposed globally optimal solution is presented in Alg. 1. This algorithm needs to explore all the possible values of $\boldsymbol{\alpha}_{1 \times B}$. With $S_\alpha$ samples for each $\alpha_b$, the complexity of Alg. 1 is $S_\alpha^B$. When each cell has $M$ users, the complexity of exhaustive search is $S_p^{BM} \times (M!)^B$, where $S_p$ is the number of samples for each $p_{b,i}$ in $\mathbf{p}$. Hence, when $S_\alpha = S_p$, Alg. 1 reduces the complexity of exhaustive search by a factor of $S_p^{B(M-1)} \times (M!)^B$. In fact, Alg. 1 has two main advantages: 1) its complexity is independent of the size of the NOMA clusters, which makes it a low-complexity method for scenarios with a large number of users and a small number of BSs; 2) the complexity of finding the optimal SIC ordering is negligible, since for a fixed $\boldsymbol{\alpha}$ the optimal decoding order is obtained in closed form.
Algorithm 1 Optimal JSPA for Sum-Rate Maximization Problem.
Initialize the step size $\epsilon_\alpha \ll 1$, and $R_{\text{tot}}^* = 0$.
for each sample $\hat{\boldsymbol{\alpha}}$ do
  Update $\tilde{h}_{b,i} = \frac{h_{b,i}}{\hat{I}_{b,i} + \sigma_{b,i}^2},\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b$, where $\hat{I}_{b,i} = \sum_{j \in \mathcal{B},\, j \neq b} \hat{\alpha}_j P_j^{\max} h_{j,b,i}$.
  Update $\boldsymbol{\lambda}$ according to $\lambda_{b,i,k} = 1$ if $\tilde{h}_{b,k} > \tilde{h}_{b,i}$, or equivalently update the users' indices according to $k > i$ if $\tilde{h}_{b,k} > \tilde{h}_{b,i}$.
  Find $\mathbf{p}$ according to (5) and (6).
  if $\sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(\mathbf{p}, \boldsymbol{\lambda}_b) > R_{\text{tot}}^*$ then
    Update $R_{\text{tot}}^* = \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(\mathbf{p}, \boldsymbol{\lambda}_b)$, $\mathbf{p}^* = \mathbf{p}$, and $\boldsymbol{\lambda}^* = \boldsymbol{\lambda}$.
  end if
end for
The outputs $\boldsymbol{\lambda}^*$ and $\mathbf{p}^*$ are the optimal solutions.
Since Alg. 1 is still exponential in the number of BSs, it is important to check the feasibility of problem (3) with a low-complexity algorithm before running Alg. 1. The feasibility problem of (3) can be formulated as
$$\min_{\mathbf{p} \ge 0,\ \boldsymbol{\lambda} \in \{0,1\}} f(\mathbf{p}, \boldsymbol{\lambda}) \quad \text{s.t.} \quad \text{(3b)–(3f)}, \tag{10}$$
where $f(\mathbf{p}, \boldsymbol{\lambda})$ can be any objective function whose domain contains the feasible set defined by (3b)–(3f). Finding a feasible solution of (10) is challenging due to the binary variables in $\boldsymbol{\lambda}$. For $f(\mathbf{p}) = \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} p_{b,i}$, it has been proved that the ICI (in the total power minimization problem) under the CINR-based decoding order satisfies the basic properties of the standard interference function [18]. Hence, the well-known iterative distributed power minimization framework can globally solve (10) with fast convergence [18]. Here, we briefly present the structure of the iterative distributed power minimization algorithm proposed in [18]. Let $\mathbf{p}^{(t-1)}$ be the output of iteration $t-1$, which is the initial power matrix for iteration $t$, denoted by $\mathbf{p}^{(t)}$. At iteration $t$, we first update the optimal decoding order in cell 1 based on the updated ICI (according to Proposition III.1). Then, we find $\mathbf{p}_1^{*(t)}$ for the fixed $\mathbf{p}_{-1}^{(t)}$. After that, we substitute $\mathbf{p}_1^{(t)}$ with $\mathbf{p}_1^{*(t)}$. The updated $\mathbf{p}^{(t)}$ is the initial power matrix for the next cell. We repeat these steps for all the remaining cells. The updated power at cell $B$ is the output of iteration $t$. We continue the iterations until convergence is achieved. For the fixed $I_{b,i}$, and consequently a given $\boldsymbol{\lambda}_b^*$ in cell $b$, the optimal powers can be obtained in closed form as
$$p_{b,i}^* = \beta_{b,i}\left(\frac{1}{\tilde{h}_{b,i}} + \sum_{j=i+1}^{M}\frac{\beta_{b,j}\prod_{k=i+1}^{j-1}(1+\beta_{b,k})}{\tilde{h}_{b,j}}\right),\ \forall i = 1, \ldots, M, \tag{11}$$
where $\beta_{b,i} = 2^{R_{b,i}^{\min}} - 1,\ \forall i = 1, \ldots, M$, and $\tilde{h}_{b,l} = \frac{h_{b,l}}{I_{b,l} + \sigma_{b,l}^2}$. For convenience, in (11) we assume that $|\mathcal{U}_b| = M$ and that the users are indexed in ascending order of $\tilde{h}_{b,i}$. For more details, please see Appendix C. The pseudo code of the power minimization algorithm is presented in Alg. 2.
Algorithm 2 Optimal Joint SIC Ordering and Power Allocation for Total Power Minimization Problem [18].
Initialize a feasible $\mathbf{p}^{(0)}$, and a sufficiently small tolerance $\epsilon_{\text{tol}}$.
while $\|\mathbf{p}^{(t-1)}\| - \|\mathbf{p}^{(t)}\| > \epsilon_{\text{tol}}$ do
  Set $t := t + 1$, and then update $\mathbf{p}^{(t)} := \mathbf{p}^{(t-1)}$.
  for $b = 1{:}B$ do
    Update the ICI term $I_{b,i} = \sum_{j \in \mathcal{B},\, j \neq b} \left(\sum_{l \in \mathcal{U}_j} p_{j,l}\right) h_{j,b,i}$ at cell $b$.
    Update $\boldsymbol{\lambda}_b$ in cell $b$ according to $k \to i$ if $\tilde{h}_{b,k} > \tilde{h}_{b,i}$, where $\tilde{h}_{b,l} = \frac{h_{b,l}}{I_{b,l} + \sigma_{b,l}^2},\ \forall l \in \mathcal{U}_b$.
    Find $\mathbf{p}_b^{*(t)}$ using (11). Then, substitute $\mathbf{p}_b^{(t)}$ with $\mathbf{p}_b^{*(t)}$.
  end for
end while
The outputs $\boldsymbol{\lambda}^*$ and $\mathbf{p}^*$ are the optimal solutions.
Numerical assessments show that Alg. 2 converges very fast [18]. The only concern is finding a feasible initial point $\mathbf{p}^{(0)}$. In Subsection IV-D, we provide a comprehensive discussion of the impact of feasible/infeasible initial points on the convergence of Alg. 2.
It has been proved that the optimal $\boldsymbol{\alpha}^*$ in the total power minimization problem is a component-wise minimum [18]. This means that for any feasible $\hat{\boldsymbol{\alpha}} = [\hat{\alpha}_b],\ \forall b \in \mathcal{B}$, in (10), it is guaranteed that $\alpha_b^* \le \hat{\alpha}_b$. As a result, the exploration interval of each $\alpha_b$ in Alg. 1 can be reduced from $[0,1]$ to $[\alpha_b^{\min}, 1]$, where $\alpha_b^{\min}$ denotes the optimal power consumption coefficient of BS $b$ in the total power minimization problem. The lower bounds $\alpha_b^{\min},\ \forall b \in \mathcal{B}$, can significantly reduce the complexity of Alg. 1 when $\alpha_b^{\min}$ grows, e.g., when the minimum rate demands of the users increase.
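The following Python sketch puts Alg. 2 and Alg. 1 together: `alpha_min` runs a cell-by-cell fixed-point iteration in which every user gets exactly its minimum rate (the per-cell minimum-power allocation used in Alg. 2), and `alg1` then grid-searches $\alpha_b \in [\alpha_b^{\min}, 1]$ with the closed form (5)–(6) in each cell. Several details are our assumptions rather than the paper's: the nested-list data layout, the uniform grid with `samples` points per BS, the zero-power starting point of the iteration (Alg. 2 is stated with a feasible initial point), and the early infeasibility exit, which relies on the iterates growing monotonically from zero for a standard interference function [18]. `closed_form_powers` here uses an equivalent recursion on the power left for users $i, \ldots, M$ instead of the explicit products in (5).

```python
import itertools
import math

# Hypothetical layout (not from the paper): h[j][b][i] = h_{j,b,i}, sigma2[b][i] = noise
# power, pmax[b] = P_b^max, rmin[b][i] = R^min_{b,i} in bps/Hz; users are 0-based.


def cinr_sorted(h, sigma2, tot, rmin, b):
    """Normalized gains and demands of cell b in ascending CINR order (Prop. III.1).

    tot[j] is the total transmit power of BS j, i.e. alpha_j * P_j^max.
    """
    ht = [h[b][b][i] / (sum(tot[j] * h[j][b][i] for j in range(len(tot)) if j != b)
                        + sigma2[b][i]) for i in range(len(sigma2[b]))]
    order = sorted(range(len(ht)), key=lambda i: ht[i])
    return [ht[i] for i in order], [rmin[b][i] for i in order]


def min_powers(ht, rmin):
    """Every user gets SINR exactly 2^Rmin - 1, solved from the cluster head downward."""
    p, later = [0.0] * len(ht), 0.0  # later = power of users decoded after user i (its INI)
    for i in reversed(range(len(ht))):
        p[i] = (2 ** rmin[i] - 1) * (later + 1 / ht[i])
        later += p[i]
    return p


def closed_form_powers(ht, rmin, budget):
    """Recursive form of (5)-(6): users 0..M-2 get exactly Rmin, the cluster head the rest."""
    p, left = [], budget  # left = power still available for users i..M-1
    for i in range(len(ht) - 1):
        beta = (2 ** rmin[i] - 1) / 2 ** rmin[i]
        p.append(max(beta * (left + 1 / ht[i]), 0.0))
        left -= p[-1]
    return p + [left]


def alpha_min(h, sigma2, pmax, rmin, tol=1e-12, max_iter=10_000):
    """Alg. 2 sketch: cell-by-cell minimum-power updates from zero power; returns alpha^min."""
    tot = [0.0] * len(pmax)
    for _ in range(max_iter):
        prev = tot[:]
        for b in range(len(pmax)):
            tot[b] = sum(min_powers(*cinr_sorted(h, sigma2, tot, rmin, b)))
        if any(t > pm for t, pm in zip(tot, pmax)):
            break  # iterates only grow from zero, so the demands cannot be met
        if max(abs(x - y) for x, y in zip(tot, prev)) <= tol:
            break
    return [t / pm for t, pm in zip(tot, pmax)]


def alg1(h, sigma2, pmax, rmin, samples=11):
    """Alg. 1 sketch: grid over alpha_b in [alpha_b^min, 1] with closed-form powers per cell."""
    lo = alpha_min(h, sigma2, pmax, rmin)
    if any(a > 1 for a in lo):
        return None  # problem (3) is infeasible
    grids = [[a + (1 - a) * s / (samples - 1) for s in range(samples)] for a in lo]
    best = None
    for alpha in itertools.product(*grids):
        tot = [a * pm for a, pm in zip(alpha, pmax)]
        total, ok = 0.0, True
        for b in range(len(pmax)):
            ht, rm = cinr_sorted(h, sigma2, tot, rmin, b)
            p = closed_form_powers(ht, rm, tot[b])
            if min(p) < 0:
                ok = False
                break
            r = [math.log2(1 + p[i] / (sum(p[i + 1:]) + 1 / ht[i])) for i in range(len(p))]
            if any(ri < di - 1e-6 for ri, di in zip(r, rm)):
                ok = False
                break
            total += sum(r)
        if ok and (best is None or total > best[0]):
            best = (total, alpha)
    return best


h = [[[1.0, 4.0], [0.05, 0.1]],  # BS 0 -> users of cell 0, BS 0 -> users of cell 1
     [[0.1, 0.2], [2.0, 6.0]]]   # BS 1 -> users of cell 0, BS 1 -> users of cell 1
sigma2, pmax = [[1.0, 1.0], [1.0, 1.0]], [10.0, 10.0]
rmin = [[1.0, 1.0], [1.0, 1.0]]
print(alpha_min(h, sigma2, pmax, rmin))  # lower ends of the search intervals
print(alg1(h, sigma2, pmax, rmin))       # (best sum-rate, best alpha)
```

The grid has `samples**B` points, which mirrors the $S_\alpha^B$ complexity of Alg. 1. With a fixed number of samples per BS, the lower bounds from `alpha_min` make the grid finer rather than smaller; with a fixed step size $\epsilon_\alpha$, they reduce the number of points instead.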
2) Suboptimal SIC Ordering: Joint Rate and Power Allocation:
According to Proposition III.1, since $\boldsymbol{\lambda}^*$ in (3) depends on $\boldsymbol{\alpha}_{-b}^*$, any decoding order fixed before power allocation is in general suboptimal. For any suboptimal decoding order, Corollary III.1.1 may not hold, and thus NOMA is not capacity-achieving. For a given $\boldsymbol{\lambda}$, or equivalently fixed $\Phi_{b,i}$, the main problem (3) can be rewritten as
$$\max_{\mathbf{p} \ge 0} \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(\mathbf{p}) \quad \text{s.t.} \quad \text{(3b), (3c)}. \tag{12}$$
Constraint (3c) can be equivalently transformed into the linear form
$$2^{R_{b,i}^{\min}}\left(\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,k} + I_{b,k} + \sigma_{b,k}^2\right) \le p_{b,i} h_{b,k} + \sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,k} + I_{b,k} + \sigma_{b,k}^2,\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b,\ k \in \{i\} \cup \Phi_{b,i}. \tag{13}$$
Hence, the feasible region of (12) is a polyhedron and therefore convex in $\mathbf{p}$. However, (12) is still strongly NP-hard due to the nonconcavity of the sum-rate function with respect to $\mathbf{p}$. The optimal powers in (5) and (6) are derived based on the optimal decoding order; hence, Alg. 1 is not applicable for solving (12). Besides, the complexity of exhaustive search for solving (12) is $S_p^{BM}$, which is exponential in the number of users. In the following, we apply the sequential programming approach [29] to find a suboptimal power allocation for (12). To tackle the non-differentiability of $R_{b,i}(\mathbf{p})$, we first apply the epigraph technique [15] and define $r_{b,i}$ as the adopted spectral efficiency of user $i \in \mathcal{U}_b$ such that $r_{b,i} \le \log_2\!\left(1 + \frac{p_{b,i} h_{b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,k} + I_{b,k} + \sigma_{b,k}^2}\right),\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b,\ k \in \{i\} \cup \Phi_{b,i}$. Hence, (12) can be equivalently transformed into the following problem:
$$\begin{aligned}
\text{JRPA}:\ \max_{\mathbf{p} \ge 0,\ \mathbf{r} \ge 0}\ & \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} r_{b,i} && \text{(14a)}\\
\text{s.t.}\ & \text{(3b)},\\
& r_{b,i} \ge R_{b,i}^{\min},\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b, && \text{(14b)}\\
& r_{b,i} \le \log_2\!\left(1 + \frac{p_{b,i} h_{b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,k} + I_{b,k} + \sigma_{b,k}^2}\right),\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b,\ k \in \{i\} \cup \Phi_{b,i}, && \text{(14c)}
\end{aligned}$$
where $\mathbf{r} = [r_{b,i}],\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b$. Although the objective function (14a) and constraints (3b) and (14b) are affine, problem (14) is still nonconvex due to the nonconvexity of (14c). The derivations of the proposed sequential programming method for solving (14) are presented in Appendix D. The pseudo code of our proposed JRPA algorithm is shown in Alg. 3.
Algorithm 3 Suboptimal JRPA Algorithm Based on the Sequential Programming.
Initialize $\mathbf{r}^{(0)}$, iteration index $t = 0$, and tolerance $\epsilon_s \ll 1$.
while $\|\mathbf{r}^{(t)} - \mathbf{r}^{(t-1)}\| > \epsilon_s$ do
  Set $t := t + 1$.
  Update the approximation parameter $\hat{g}(r_{b,i}^{(t)})$ based on $\mathbf{r}^{*(t-1)}$.
  Find $(\mathbf{r}^{*(t)}, \tilde{\mathbf{p}}^{*(t)})$ by solving the convex approximated problem (40).
end while
Set $p_{b,i}^* = e^{\tilde{p}_{b,i}^*},\ \forall b \in \mathcal{B},\ i \in \mathcal{U}_b$.
The outputs $(\mathbf{r}^{*(t)}, \mathbf{p}^{*(t)})$ are adopted for the network.
The decoding order for some user pairs is independent of the ICI, so it can be determined prior to power allocation optimization. For a specific channel gain condition, we prove that the CNR-based decoding order is optimal for a user pair independently of the power allocation.
Theorem 1. (SIC sufficient condition)
For each user pair $i, k \in \mathcal{U}_b$ with $\frac{h_{b,k}}{\sigma_{b,k}^2} \ge \frac{h_{b,i}}{\sigma_{b,i}^2}$, if
$$\frac{h_{b,k}}{\sigma_{b,k}^2} - \frac{h_{b,i}}{\sigma_{b,i}^2} \ge \sum_{j \in \mathcal{Q}_{b,i,k}} \frac{P_j^{\max}}{\sigma_{b,i}^2 \sigma_{b,k}^2}\left(h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k}\right), \tag{15}$$
where $\mathcal{Q}_{b,i,k} = \left\{ j \in \mathcal{B} \setminus \{b\} \,\middle|\, \frac{h_{b,k}}{h_{b,i}} < \frac{h_{j,b,k}}{h_{j,b,i}} \right\}$, then the decoding order $k \to i$ is optimal.
Proof. Please see Appendix E.
Theorem 1 shows that finding the optimal decoding order is challenging only for the user pairs whose ICIs can change the sign of their CINR difference. Assume that the suboptimal CNR-based decoding order is applied in (14), i.e., $k \to i$ ($\lambda_{b,i,k} = 1$) if $\frac{h_{b,k}}{\sigma_{b,k}^2} \ge \frac{h_{b,i}}{\sigma_{b,i}^2}$. Let us define the set of users with higher decoding orders than user $i \in \mathcal{U}_b$ that satisfy the SIC sufficient condition as $\Phi_{b,i}^{\text{CNR}} = \left\{ k \in \mathcal{U}_b \setminus \{i\} \,\middle|\, \lambda_{b,i,k} = 1,\ \frac{h_{b,k}}{\sigma_{b,k}^2} - \frac{h_{b,i}}{\sigma_{b,i}^2} \ge \sum_{j \in \mathcal{Q}_{b,i,k}} \frac{P_j^{\max}}{\sigma_{b,i}^2 \sigma_{b,k}^2}\left(h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k}\right) \right\}$. Clearly, $\Phi_{b,i}^{\text{CNR}} \subseteq \Phi_{b,i}$. Based on Theorem 1, constraint (14c) needs to be checked only for the users in $(\{i\} \cup \Phi_{b,i}) \setminus \Phi_{b,i}^{\text{CNR}}$, which significantly reduces the complexity of our proposed JRPA algorithm. When $\Phi_{b,i}^{\text{CNR}} = \Phi_{b,i}$, Corollary III.1.1 is guaranteed to hold for user $i \in \mathcal{U}_b$, meaning that $r_{b,i}^* = \tilde{R}_{b,i}(\mathbf{p}^*, \boldsymbol{\lambda}_b^*)$. Finally, when the SIC sufficient condition holds for every user pair within cell $b$, i.e., $\Phi_{b,i}^{\text{CNR}} = \Phi_{b,i},\ \forall i \in \mathcal{U}_b$, the CNR-based decoding order is optimal in cell $b$, and the constraints (14c) of each user in that cell reduce to the single constraint with $k = i$.
Similar to problem (10), it can be shown that the feasible region of (12) is the same as the feasible region of the following total power minimization problem:
$$\min_{\mathbf{p} \ge 0} \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} p_{b,i} \quad \text{s.t.} \quad \text{(3b), (13)}. \tag{16}$$
Since the objective function and all the constraints are affine, problem (16) is a linear program and hence convex. Therefore, (16) can be solved by Dantzig's simplex method or interior-point methods (IPMs) [15].
The solution of (16) can be used as an initial feasible point for Alg. 3.
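To illustrate how Theorem 1 prunes the SIC checks, the following Python sketch evaluates condition (15) for a single user pair. It is a minimal sketch under our own assumptions: the function name `sic_sufficient` and its dictionary-based inputs are hypothetical, all gains and noise powers are in linear scale, and the dictionaries contain only the interfering BSs $j \neq b$.

```python
def sic_sufficient(h_bi, h_bk, s_bi, s_bk, h_ji, h_jk, p_max):
    """Return True if condition (15) guarantees the decoding order k -> i.

    Hypothetical inputs (illustration only):
      h_bi, h_bk : gains from the serving BS b to users i and k
      s_bi, s_bk : noise powers of users i and k
      h_ji, h_jk : dicts {j: gain} from each interfering BS j != b to users i, k
      p_max      : dict {j: P_j^max} for the same interfering BSs
    """
    lhs = h_bk / s_bk - h_bi / s_bi  # CNR advantage of user k over user i
    if lhs < 0:
        return False  # Theorem 1 assumes user k has the larger CNR
    # Q_{b,i,k}: interferers that hurt user k relatively more than user i.
    q = [j for j in p_max if h_bk / h_bi < h_jk[j] / h_ji[j]]
    # Worst case over all power allocations: every BS in Q at full power.
    rhs = sum(p_max[j] / (s_bi * s_bk) * (h_jk[j] * h_bi - h_ji[j] * h_bk)
              for j in q)
    return lhs >= rhs


# Toy pair: BS 2 interferes much more strongly with user k than with user i,
# so the small CNR advantage of user k is not enough.
print(sic_sufficient(1.0, 1.1, 1.0, 1.0, {2: 0.01}, {2: 1.0}, {2: 10.0}))  # False
```

Pairs for which this check fails under the CNR-based order are the ones whose decoding order may depend on the ICI; counting them per realization gives a quantity like $\Psi$ in Fig. 2(a).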
3) Suboptimal SIC Ordering: Power Allocation for a Fixed Rate Region:
In Subsection III-A2, we showed that for any suboptimal decoding order (fixed $\boldsymbol{\lambda}$), Corollary III.1.1 may not hold, so joint power allocation and rate adaptation is necessary to achieve the maximum possible spectral efficiency of the users. A number of research studies on multi-cell NOMA assume a fixed rate region for each user, namely
$$\tilde{R}_{b,i} = \log_2\left( 1 + \frac{p_{b,i}\, h_{b,i}}{\sum_{j \in \Phi_{b,i}} p_{b,j}\, h_{b,i} + I_{b,i}(\mathbf{p}_{-b}) + \sigma_{b,i}} \right),$$
which removes the need to search for the optimal rate allocation [20], [23], [25]–[27]. According to (2), to guarantee that user $i \in \mathcal{U}_b$ achieves its Shannon capacity for decoding its desired signal after successful SIC, the condition in (4) should be satisfied for each user $k \in \Phi_{b,i}$. For a fixed $\boldsymbol{\lambda}$, and hence fixed $\Phi_{b,i}$, (4) can be rewritten as
$$\frac{h_{b,i}}{I_{b,i} + \sigma_{b,i}} \leq \frac{h_{b,k}}{I_{b,k} + \sigma_{b,k}}, \quad \forall b \in \mathcal{B},\ i, k \in \mathcal{U}_b,\ k \in \Phi_{b,i}. \tag{17}$$
The constraint in (17) is known as the SIC necessary condition [20], [23], [25].
Fact III.2.1.
In multi-cell NOMA, the SIC necessary condition (17) for each user pair $i, k \in \mathcal{U}_b$, $k \in \Phi_{b,i}$, implies additional maximum power constraints on the BSs in $\mathcal{B} \setminus \{b\}$.
Proof. Please see Appendix F.
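Fact III.2.1 can be made concrete in a toy setting that we assume purely for illustration (it is not the general proof of Appendix F): a single interfering BS $j$ with total transmit power $p_j$, so that $I_{b,i} = p_j h_{j,b,i}$ and $I_{b,k} = p_j h_{j,b,k}$. Cross-multiplying (17) gives
$$p_j \left( h_{b,i}\, h_{j,b,k} - h_{b,k}\, h_{j,b,i} \right) \leq h_{b,k}\,\sigma_{b,i} - h_{b,i}\,\sigma_{b,k}.$$
When the coefficient of $p_j$ is positive (for positive gains this is exactly the condition $j \in \mathcal{Q}_{b,i,k}$), the inequality is an upper bound on $p_j$, i.e., an extra maximum power constraint on BS $j$. The hypothetical helper below returns the admissible interval for $p_j$.

```python
import math


def interferer_power_interval(h_bi, h_bk, s_bi, s_bk, h_ji, h_jk):
    """Range of total BS-j power p_j that keeps the SIC condition (17) valid.

    Toy setting (assumption): BS j is the only interfering BS, so
    I_{b,i} = p_j * h_ji and I_{b,k} = p_j * h_jk.
    Returns (lo, hi), or None if no p_j >= 0 satisfies (17).
    """
    c = h_bi * h_jk - h_bk * h_ji  # coefficient of p_j
    d = h_bk * s_bi - h_bi * s_bk  # constant term
    # (17) after cross-multiplying: c * p_j <= d, together with p_j >= 0.
    if c > 0:
        return (0.0, d / c) if d >= 0 else None  # a maximum-power constraint
    if c < 0:
        return (max(0.0, d / c), math.inf)       # at most a lower bound
    return (0.0, math.inf) if d >= 0 else None


# User k is stronger, but BS j hits user k ten times harder than user i.
print(interferer_power_interval(1.0, 2.0, 1.0, 1.0, 0.01, 0.1))  # hi is about 12.5
```

If the returned upper bound is at least $P^{\max}_j$, (17) holds for every admissible $p_j$; in this single-interferer case that is the same inequality as the SIC sufficient condition (15).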
Remark III.2.3.
Restricting the rate region of users under a suboptimal decoding order imposes additional limitations on the BSs' power consumption. For the user pairs satisfying the SIC sufficient condition, the decoding order is optimal, and consequently the SIC constraint (17) holds regardless of the power allocation (see Corollary III.1.1). As a result, the negative side effect of the SIC necessary condition on the power allocation is eliminated for these pairs.

According to (3), for any fixed $\boldsymbol{\lambda}$, the power allocation problem for a suboptimal decoding order and a fixed rate region, called fixed-rate-region power allocation (FRPA), is formulated as
$$\text{FRPA}: \quad \max_{\mathbf{p} \geq 0}\ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} \tilde{R}_{b,i}(\mathbf{p}) \tag{18a}$$
$$\text{s.t.} \quad \text{(3b)}, \text{(17)}, \tag{18b}$$
$$\tilde{R}_{b,i}(\mathbf{p}) \geq R^{\min}_{b,i}, \quad \forall b \in \mathcal{B},\ i \in \mathcal{U}_b. \tag{18c}$$
Constraints (18c) and (17) can be equivalently transformed into linear forms in $\mathbf{p}$, as in (13) and (41), respectively. Therefore, the feasible region of (18) is defined by linear inequalities (a polyhedron) and is convex. However, the sum-rate function in (18a) is nonconcave due to the ICI, which makes problem (18) nonconvex and strongly NP-hard [23], [25]. Problem (18) can be solved by the monotonic optimization approach proposed in [29]; however, the complexity of monotonic optimization is still exponential in the number of users [29]. According to Corollary III.1.1, under the optimal decoding order $\boldsymbol{\lambda}^*$, the main problem (3) can be rewritten as
$$\max_{\mathbf{p} \geq 0}\ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} \tilde{R}_{b,i}(\mathbf{p}) \tag{19a}$$
$$\text{s.t.} \quad \text{(3b)}, \tag{19b}$$
$$\tilde{R}_{b,i}(\mathbf{p}) \geq R^{\min}_{b,i}, \quad \forall b \in \mathcal{B},\ i \in \mathcal{U}_b. \tag{19c}$$
The feasible regions of (18) and (19) coincide when the SIC necessary condition (17) is removed from (18). According to Fact III.2.1, constraint (17) further restricts the feasible region of (18). As a result, the feasible region of (18) is a subset of the feasible region of (19), and the optimal solution of (18) lies in the intersection of the feasible region of (19) and the set defined by constraint (17). Accordingly, Alg. 1 can be modified based on the fixed $\boldsymbol{\lambda}$ and the SIC constraint (17) to find the globally optimal solution of (18). The pseudo code of this globally optimal solution for (18) is presented in Alg. 4. The main difference between Algs. 1 and 4 lies in a single step: in Alg. 4, we check the SIC necessary condition for the fixed decoding order instead of updating the SIC decoding orders as in Alg. 1. Although Alg. 4 scales well with the number of multiplexed users, it is still exponential in the number of BSs (see Subsection III-A1). One alternative is to apply the well-known suboptimal power allocation based on the sequential programming method proposed in [23], [25], [29].
Algorithm 4
Optimal Power Allocation for Fixed SIC Ordering and Rate Region.
  Initialize the step size $\epsilon_\alpha = 1/S_\alpha \ll 1$, where $S_\alpha$ is the number of samples for each $\alpha_b$, and $R_{\mathrm{tot}} = 0$.
  for each sample $\hat{\boldsymbol{\alpha}}$ do
    Update $\tilde{h}_{b,i} = \frac{h_{b,i}}{\hat{I}_{b,i} + \sigma_{b,i}}, \forall b \in \mathcal{B}, i \in \mathcal{U}_b$, where $\hat{I}_{b,i} = \sum_{j \in \mathcal{B}, j \neq b} \hat{\alpha}_j P^{\max}_j h_{j,b,i}$.
    if $\tilde{h}_{b,i} \leq \tilde{h}_{b,k}, \forall b \in \mathcal{B}, i, k \in \mathcal{U}_b, k \in \Phi_{b,i}$ then
      Find $\mathbf{p}$ according to (5) and (6).
      if $\sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(\mathbf{p}) > R_{\mathrm{tot}}$ then
        Update $R_{\mathrm{tot}} = \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(\mathbf{p})$ and $\mathbf{p}^* = \mathbf{p}$.
      end if
    end if
  end for
  The output $\mathbf{p}^*$ is the optimal solution of (18).

A feasible solution for (18) can be obtained by solving the following total power minimization problem:
$$\min_{\mathbf{p} \geq 0}\ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} p_{b,i} \quad \text{s.t.} \quad \text{(3b)}, \text{(17)}, \text{(18c)}. \tag{20}$$
Problem (20) is a linear program, which can be solved by Dantzig's simplex method.
B. Decentralized Resource Management Frameworks
1) Fully Distributed JSPA Framework:
Although the globally optimal solution in Subsection III-A1 achieves the channel capacity of users, the complexity of Alg. 1 is still exponential in the number of BSs. Moreover, the centralized framework would cause a large signaling overhead. Here, we propose a fully distributed resource allocation framework in which each BS independently allocates power to its associated users. In other words, we decompose the main problem (3) into $B$ single-cell NOMA problems.
Fact III.2.2.
At the optimal point of the sum-rate maximization problem of single-cell NOMA, the BS operates at its maximum available power. That is, in the fully distributed framework the power constraint (3b) is active for each BS $b$, i.e., $\sum_{i \in \mathcal{U}_b} p^*_{b,i} = P^{\max}_b, \forall b \in \mathcal{B}$.
Proof.
Please see Appendix G.
According to Fact III.2.2, we set $\alpha^*_b = 1, \forall b \in \mathcal{B}$. Given $\boldsymbol{\alpha}^*$, the optimal decoding order of users under the fully distributed framework can be easily obtained by Proposition III.1. According to Subsection III-A1, the optimal power $\mathbf{p}^*$ under the fully distributed framework can then be obtained by Proposition III.2.
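The fully distributed framework can be sketched in Python as follows. This is a minimal sketch, not Propositions III.1 and III.2 themselves (they are not reproduced in this section). It assumes that (i) every other BS transmits at full power ($\alpha_j = 1$, per Fact III.2.2), so each user's CINR $\tilde{h}_{b,i} = h_{b,i}/(\sum_{j \neq b} P^{\max}_j h_{j,b,i} + \sigma_{b,i})$ is fixed; (ii) users are decoded in ascending order of this CINR; and (iii) the standard single-cell property holds that, under this order, every user except the strongest receives exactly its minimum rate while the strongest user receives all remaining power. The function names and the nested-list layout `h[b][i][j]` (gain from BS `j` to user `i` of cell `b`) are hypothetical.

```python
import math


def single_cell_power(h_eff, p_total, r_min):
    """Powers for one cell with fixed ICI, or None if the cell is infeasible.

    h_eff[i]: CINR h_{b,i} / (I_{b,i} + sigma_{b,i}) of user i.
    Weak users get exactly their minimum rate; the strongest gets the rest.
    """
    order = sorted(range(len(h_eff)), key=lambda k: h_eff[k])  # weakest first
    p = [0.0] * len(h_eff)
    q = p_total  # power of the current user plus all stronger users
    for i in order[:-1]:
        beta = 2.0 ** r_min[i] - 1.0
        # Solve p_i * h_i / (q_next * h_i + 1) = beta with p_i = q - q_next.
        q_next = (q - beta / h_eff[i]) / (1.0 + beta)
        if q_next < 0.0:
            return None  # not enough power for the weaker users' demands
        p[i], q = q - q_next, q_next
    top = order[-1]
    if math.log2(1.0 + q * h_eff[top]) < r_min[top]:
        return None  # strongest user misses its own minimum rate
    p[top] = q
    return p


def fully_distributed(h, sigma, p_max, r_min):
    """Each BS b solves its own cell assuming every other BS is at full power.

    h[b][i][j]: channel gain from BS j to user i of cell b (hypothetical layout).
    """
    n_bs = len(p_max)
    result = []
    for b in range(n_bs):
        h_eff = [
            h[b][i][b] / (sum(p_max[j] * h[b][i][j] for j in range(n_bs) if j != b)
                          + sigma[b][i])
            for i in range(len(h[b]))
        ]
        result.append(single_cell_power(h_eff, p_max[b], r_min[b]))
    return result
```

Each BS needs only its own users' CINRs under the full-power ICI assumption, which is what makes this decomposition fully distributed; the price is that $\alpha_j = 1$ may generate much more ICI than necessary.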
2) Semi-Centralized JSPA Framework:
The fully distributed framework works well when the ICI levels are significantly low, so that $\alpha^*_b \to 1, \forall b \in \mathcal{B}$, in problem (3). When the ICI levels are high, e.g., at femto-cells underlying a single MBS [28], the fully distributed framework may seriously degrade the spectral efficiency of femto-cell users. In the following, we propose a semi-centralized JSPA framework in which the low-power FBSs are assumed to operate at their maximum power, while the MBS power consumption is obtained by the joint power allocation and SIC ordering of all the users. Let $b = 1$ be the MBS's index and $b = 2, \dots, B$ be the indices of the FBSs. In this framework, we assume that $\alpha_b = 1, \forall b = 2, \dots, B$. Then, we utilize Alg. 1 to find the globally optimal $\mathbf{p}^*$ and $\boldsymbol{\lambda}^*$ for problem (3) under these fixed FBS coefficients. This algorithm performs a grid search on $0 \leq \alpha_1 \leq 1$. Hence, the complexity of finding the optimal JSPA is on the order of the total number of samples for $\alpha_1$, i.e., $S_\alpha$. In particular, the number of samples to be evaluated does not depend on the number of FBSs and users, which makes this framework a good solution for large-scale systems. The performance of the semi-centralized framework depends on the FBSs' optimal power consumption in (3). When $\alpha^*_b \to 1, \forall b = 2, \dots, B$, in (3), the performance gap between the semi-centralized and centralized frameworks tends to zero.
C. Computational Complexity Comparison Between Resource Allocation Algorithms
In this section, we compare the computational complexity of the proposed resource allocation algorithms for solving the sum-rate maximization problem (3). To simplify the analysis, we assume that each cell has $M$ users. In this comparison, we apply the barrier method with inner Newton's method to achieve an $\epsilon$-suboptimal solution of a convex problem. The number of barrier iterations required to achieve an $\epsilon$-suboptimal solution (i.e., a duality gap $m/t = \epsilon$) is exactly $\Upsilon = \left\lceil \frac{\log(m/(\epsilon t^{(0)}))}{\log \mu} \right\rceil$, where $m$ is the total number of inequality constraints, $t^{(0)}$ is the initial accuracy parameter for approximating the functions in the inequality constraints in standard form, and $\mu$ is the step size for updating the accuracy parameter $t$ [15]. The number of inner Newton iterations at barrier iteration $i$ is denoted by $N_i$. In general, $N_i$ depends on $\mu$ and on how good the initial point at barrier iteration $i$ is. The computational complexity of solving a convex problem is thus on the order of the total number of Newton iterations, $C_{\mathrm{cnvx}} = \sum_{i=1}^{\Upsilon} N_i$. If the sequential programming converges in $Q$ iterations, the complexity of this method is on the order of $C_{\mathrm{SP}} = \sum_{q=1}^{Q} \sum_{i=1}^{\Upsilon} N_{q,i}$, where $N_{q,i}$ denotes the number of inner Newton iterations at the $i$-th barrier iteration of the $q$-th sequential iteration. For convenience, assume that $N_{q,i} = N, \forall q = 1, \dots, Q, i = 1, \dots, \Upsilon$. Hence, we have $C_{\mathrm{cnvx}} = N\Upsilon = N \left\lceil \frac{\log(m/(\epsilon t^{(0)}))}{\log \mu} \right\rceil$ and $C_{\mathrm{SP}} = QN\Upsilon = QN \left\lceil \frac{\log(m/(\epsilon t^{(0)}))}{\log \mu} \right\rceil$. The complexity order of the different solution algorithms for (3) is presented in Table I. In this table, $S$ denotes the number of samples for each $p_{b,i}$ or $\alpha_b$. Moreover, the optimality status refers only to the simplified problem. For example, FRPA finds the globally optimal powers $\mathbf{p}$, and subsequently $\boldsymbol{\alpha}$, when $\boldsymbol{\lambda}$ and $\mathbf{r}$ are fixed; this does not mean that its output is the globally optimal solution of the main problem (3). In fact, only the first and second rows in Table I find the globally optimal solution of (3), and the rest of the algorithms are suboptimal. It is noteworthy that in Table I, we consider the highest possible (worst-case) computational complexity of each algorithm.

TABLE I
COMPUTATIONAL COMPLEXITY OF RESOURCE ALLOCATION ALGORITHMS FOR SOLVING THE SUM-RATE MAXIMIZATION PROBLEM

| Algorithm | Complexity | Framework | Optimal | $\boldsymbol{\lambda}$ | $\mathbf{r}$ | $\boldsymbol{\alpha}$ | $\mathbf{p}$ |
|---|---|---|---|---|---|---|---|
| JSPA (Alg. 1) | $S^B$ | Centralized | ✓ | ✓ | ✓ | ✓ | ✓ |
| Exhaustive Search | $(M!)^B \times S^{BM}$ | Centralized | ✓ | ✓ | ✓ | ✓ | ✓ |
| JRPA (Alg. 3) | $QN\left\lceil \frac{\log((B + BM + B(M - \ldots))/(\epsilon t^{(0)}))}{\log \mu} \right\rceil$ | Centralized | × | × | ✓ | ✓ | ✓ |
| FRPA | $S^B$ | Centralized | ✓ | × | × | ✓ | ✓ |
| Monotonic Optimization [23], [29] | $\approx S^{BM}$ | Centralized | ✓ | × | × | ✓ | ✓ |
| Sequential Program [23], [29] | $QN\left\lceil \frac{\log((B + BM + B(M - \ldots))/(\epsilon t^{(0)}))}{\log \mu} \right\rceil$ | Centralized | × | × | × | ✓ | ✓ |
| Subsection III-B2 | $S$ | Semi-Centralized | ✓ | ✓ | ✓ | $\alpha_1$ | ✓ |
| Subsection III-B1 | … | Fully Distributed | ✓ | ✓ | ✓ | × | ✓ |

IV. NUMERICAL RESULTS
In this section, we evaluate the performance of our proposed resource allocation algorithms via MATLAB Monte Carlo simulations over network realizations [18]. This comparison is divided into two parts: 1) a performance comparison among our proposed JSPA, JRPA, and FRPA algorithms to demonstrate the benefits of optimal SIC ordering, and of rate adaptation for any suboptimal decoding order (see Table I); 2) a performance comparison among the centralized and decentralized resource allocation frameworks to demonstrate the effect of the optimal $\boldsymbol{\alpha}^*$ (ICI management) in the main problem (3). In our simulations, we adopt the commonly used (suboptimal) CNR-based decoding order [23], [25] for the JRPA and FRPA algorithms. Finally, we evaluate the convergence of the iterative algorithms for solving (10) and (14), and the performance of the approximated optimal powers in Remark III.2.1. The complete source code is available on GitLab [30].

TABLE II
SYSTEM PARAMETERS

| Parameter | Notation | Value |
|---|---|---|
| Coverage of MBS | – | 500 m |
| Coverage of FBS | – | 40 m |
| Distance between MBS and FBS | – | 200 m |
| Number of macro-cell users | $\lvert\mathcal{U}_m\rvert$ | {…} |
| Number of femto-cell users | $\lvert\mathcal{U}_f\rvert$ | {…} |
| User distribution model | – | Uniform |
| Minimum distance of users to MBS | – | 20 m |
| Minimum distance of users to FBS | – | … |
| Wireless bandwidth | – | … MHz |
| MBS path loss | – | $\ldots + \ldots \log_{10}(d/\mathrm{km})$ dB |
| FBS path loss | – | $\ldots + \ldots \log_{10}(d/\mathrm{km})$ dB |
| Lognormal shadowing standard deviation | – | … dB |
| Small-scale fading | – | Rayleigh fading |
| AWGN power density | $N_{b,i}$ | −174 dBm/Hz |
| MBS transmit power | $P^{\max}_m$ | 46 dBm |
| FBS transmit power | $P^{\max}_f$ | 30 dBm |
| Minimum rate of macro-cell users | $R^{\min}_m$ | {0.5; 1; 2} bps/Hz |
| Minimum rate of femto-cell users | $R^{\min}_f$ | {0.25; 0.50; 0.75; 1; 2; 3} bps/Hz |
| Tolerance | $\epsilon_{\mathrm{tol}}$ | $10^{-\ldots}$ |
| Step size of each $\alpha_b \in [0, 1]$ | $\epsilon_\alpha$ | $10^{-\ldots}$ |
| Tolerance of Alg. 3 | $\epsilon_s$ | $10^{-\ldots}$ |

A. Simulation Settings
Here, we consider a two-tier HetNet consisting of one FBS underlying an MBS. Within each cell, there is one BS at the center of a circular area and $|\mathcal{U}_b|$ users inside it [18]. The system parameters and their corresponding notations are shown in Table II. The network topology and an exemplary user placement are shown in Fig. 1.
B. Performance of Centralized Resource Allocation Algorithms
In this subsection, we compare our proposed JSPA, JRPA, and FRPA algorithms in terms of outage probability and users' total spectral efficiency. Note that FRPA finds the globally optimal solution of the sum-rate maximization problem in [25] and of the single-carrier downlink power allocation problem in [23] with significantly reduced computational complexity (see Table I).
Although in practice there is a larger number of FBSs, in this experiment we aim to investigate the fundamental impact of the ICI from the MBS to the FBS, and vice versa, on the optimal decoding order of users.
[Figure: horizontal vs. vertical axis coordinates (m), showing the coverage areas of the MBS and FBS, the MBS, the FBS, macro-cell users, and femto-cell users.]
Fig. 1. Network topology and exemplary user placement for $|\mathcal{U}_m| = 3$ and $|\mathcal{U}_f| = 2$.
1) Outage Probability Performance:
The outage probability of the JSPA, JRPA, and FRPA schemes is obtained by optimally solving their corresponding total power minimization problems (10), (16), and (20), respectively. For each scheme, the outage probability is calculated by dividing the number of infeasible problem instances by the total number of samples [18]. Fig. 2 shows the impact of the order of NOMA clusters and the minimum rate demands on the outage probability of the JSPA, JRPA, and FRPA problems. According to Theorem 1, the ICI may affect the optimal decoding order of user pairs that do not satisfy the SIC sufficient condition. We call these user pairs the pairs depending on ICI when the CNR-based decoding order is applied. In Fig. 2(a), we calculate the average number of user pairs depending on ICI (denoted by $\Psi$) for different distances between the BSs and different orders of NOMA clusters under the CNR-based decoding order. $\Psi$ increases with 1) increasing $|\mathcal{U}_b|$, which inherently decreases the LHS of (15); and 2) increasing the inter-cell channel gain $h_{j,b,i}$, which increases the RHS of (15). For the femto-cell user pairs, the second effect becomes stronger as the distance between the BSs decreases, due to path loss. As shown, the impact of $|\mathcal{U}_b|$ is larger than the impact of the BSs' distance. The wide coverage of the macro-cell results in large differences between the users' channel gains; however, the CNR of macro-cell users is typically low, which reduces the LHS of (15). As a result, $\Psi$ is also affected by $|\mathcal{U}_m|$. It is noteworthy that the position of the FBS (the BSs' distance) within the coverage area of the MBS does not have a significant impact on $\Psi$ for the macro-cell users.
If at least one user pair within a cell does not satisfy the SIC sufficient condition, the CNR-based decoding order in that cell may not be optimal, resulting in reduced spectral efficiency and increased outage.
[Figure 2 panels: (a) Average number of user pairs which cannot satisfy the SIC sufficient condition vs. order of NOMA cluster for the CNR-based decoding order (macro-cell users with BS distance 200 m; femto-cell users with BS distances 25, 50, 100, and 300 m). (b) Outage probability vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz (JSPA-Opt, JRPA-CNR, and FRPA-CNR for $|\mathcal{U}_m| \in \{2, 3, 4\}$). (c), (d) Outage probability vs. users' minimum rate demand for …-order NOMA clusters (JSPA-Opt, JRPA-CNR, and FRPA-CNR for $R^{\min}_m \in \{0.5, 1, 2\}$ bps/Hz).]
Fig. 2. Outage probability of the centralized JSPA, JRPA, and FRPA algorithms for different numbers of users and minimum rate demands.
In Figs. 2(b)-2(d), we observe significant performance gaps between the JSPA and JRPA schemes, which shows the benefit of finding the optimal decoding order in multi-cell NOMA.
Besides, JRPA significantly reduces the outage probability compared with FRPA, which shows the importance of rate adaptation when a suboptimal decoding order is applied. The large performance gap between JSPA and FRPA shows that the SIC necessary condition in (17) seriously restricts the feasible region of the FRPA problem (see Fact III.2.1), resulting in high outage, especially for larger orders of NOMA clusters (see Fig. 2(b)). Last but not least, in Fig. 2(b), we observe that a larger order of NOMA clusters results in a high outage probability even for the optimal JSPA algorithm. Therefore, it is not wise to multiplex all the users when a cell contains a large number of users. The common solution is to divide the users into multiple groups, each operating in an isolated subband [18], [20], [25]. From a practical implementation perspective, further disadvantages of increasing the order of NOMA clusters are the increased receiver complexity and the error propagation due to SIC [5], [18]. Finding the optimal JSPA for maximizing the users' sum-rate in the general multi-carrier multi-cell NOMA system is left for future work.
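The outage metric used above (the fraction of infeasible problem instances) can be illustrated with a deliberately simplified Monte Carlo sketch. The assumptions are ours, not the simulator of [30]: a single cell whose ICI is folded into a fixed CINR per unit power, i.i.d. Rayleigh fading (exponentially distributed power gains), SIC in ascending CINR order, and feasibility decided by comparing the minimum total power that meets every rate demand with the power budget. Starting from the strongest user, that minimum power follows $p_i = \beta_i (q + 1/\tilde{h}_i)$ with $\beta_i = 2^{R^{\min}_i} - 1$ and $q$ the power already assigned to stronger users. The helper names are hypothetical.

```python
import random


def min_cell_power(h_eff, r_min):
    """Minimum total power meeting every rate demand in one cell (toy model)."""
    q = 0.0  # total power of the users stronger than the current one
    for i in sorted(range(len(h_eff)), key=lambda k: h_eff[k], reverse=True):
        beta = 2.0 ** r_min[i] - 1.0
        q += beta * (q + 1.0 / h_eff[i])  # p_i = beta * (q_stronger + 1 / h_i)
    return q


def outage_probability(num_users, r_min, p_max, mean_cinr, trials=10000, seed=1):
    """Fraction of fading realizations in which the cell is infeasible."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        # Rayleigh fading: the power gain is exponential with mean `mean_cinr`.
        h_eff = [rng.expovariate(1.0 / mean_cinr) for _ in range(num_users)]
        if min_cell_power(h_eff, [r_min] * num_users) > p_max:
            outages += 1
    return outages / trials


print(outage_probability(3, 1.0, 1.0, 10.0))
```

Calling `outage_probability` twice with the same `seed` reuses the same fading draws, so the effect of changing the rate demand is isolated from the randomness of the channel.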
2) Total Spectral Efficiency Performance:
Fig. 3 investigates the impact of the order of NOMA clusters and the minimum rate demands on the average total spectral efficiency of users. Here, we set the sum-rate to zero when the problem is infeasible. As shown, JSPA always outperforms the JRPA and FRPA algorithms. The resulting performance gap between JSPA and JRPA is an upper bound on the exact performance gap between the optimal and CNR-based decoding orders, since Alg. 3 provides a lower bound on the optimal value of (14) (see Table I). In Subsection IV-E, we show that JRPA is a near-optimal algorithm, so this lower bound is quite tight. Consequently, the performance gap between JRPA and FRPA is a lower bound on the exact performance gain of rate adaptation for the CNR-based decoding order. As can be seen, FRPA performs significantly worse than JRPA. In Fig. 3(a), we observe that for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz, the negative impact of increasing the order of NOMA clusters outweighs the multi-user diversity gain. As a result, increasing the order of NOMA clusters results in lower total spectral efficiency. Another reason is the increased outage probability (shown in Fig. 2(b)), which can significantly reduce the average total spectral efficiency. For $|\mathcal{U}_m| = |\mathcal{U}_f| = 2$, we observe that the performance gap between JSPA and JRPA is low, due to the small $\Psi$ in Fig. 2(a). This performance gap grows as $|\mathcal{U}_b|$ increases. As expected, Figs. 3(b) and 3(c) show that the total spectral efficiency of users decreases as the minimum rate demands increase. More importantly, there are huge performance gaps between JSPA and FRPA for any number of users and minimum rate demands.
C. Performance of Centralized and Decentralized Frameworks
In this subsection, we compare the performance of the centralized, semi-centralized, and fully distributed JSPA frameworks (see Table I).
[Figure 3 panels: (a) Total spectral efficiency (bps/Hz) vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz (JSPA-Opt, JRPA-CNR, and FRPA-CNR for $|\mathcal{U}_m| \in \{2, 3, 4\}$). (b), (c) Total spectral efficiency vs. users' minimum rate demand for …-order NOMA clusters (JSPA-Opt, JRPA-CNR, and FRPA-CNR for $R^{\min}_m \in \{0.5, 1, 2\}$ bps/Hz).]
Fig. 3. Total spectral efficiency of the centralized JSPA, JRPA, and FRPA algorithms for different numbers of users and minimum rate demands.
1) Outage Probability Performance:
Fig. 4 evaluates the outage probability of the different resource allocation frameworks. As can be seen, the fully distributed framework results in a very high outage probability. However, the outage probability gap between the semi-centralized and centralized frameworks decreases with larger $|\mathcal{U}_m|$ (Fig. 4(a)) and/or higher $R^{\min}_m$ (Figs. 4(b) and 4(c)). The large performance gap between the semi-centralized and fully distributed frameworks shows the importance of managing the ICI from the MBS to femto-cell users. It is noteworthy that the feasible point for the semi-centralized framework is obtained by a grid search on $\alpha$ with step size $\epsilon_\alpha = 10^{-\ldots}$. Although it can be shown that further reducing $\epsilon_\alpha$ would not yield a significantly higher total spectral efficiency, $\epsilon_\alpha = 10^{-\ldots}$ is not fine enough for calculating the outage probability. $\epsilon_\alpha$ can easily be reduced to $10^{-\ldots}$ in the semi-centralized framework; however, we set $\epsilon_\alpha = 10^{-\ldots}$ for both the centralized and semi-centralized frameworks to have a fair comparison. For the same step size $\epsilon_\alpha$, the feasible region of the semi-centralized problem is a subset of the feasible region of its corresponding centralized problem.
[Figure 4 panels: (a) Outage probability vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz (centralized, semi-centralized, and distributed frameworks for $|\mathcal{U}_m| \in \{2, 3, 4\}$). (b), (c) Outage probability vs. users' minimum rate demand for …-order NOMA clusters (centralized, semi-centralized, and distributed frameworks for $R^{\min}_m \in \{0.5, 1, 2\}$ bps/Hz).]
Fig. 4. Outage probability of the centralized and decentralized JSPA frameworks for different numbers of users and minimum rate demands.
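The grid search on $\alpha$ behind the semi-centralized framework can be sketched as follows. This is a toy version under our own assumptions, not Alg. 1: the MBS is index 0 in the code (index 1 in the text), every FBS is fixed at $\alpha_b = 1$, and for each sample of the MBS coefficient the ICI is evaluated at the sampled powers, users are ordered by CINR, and each cell gives its weaker users exactly their minimum rates and its remaining power to its strongest user (an assumed single-cell rule). The names and the nested-list layout `h[b][i][j]` (gain from BS `j` to user `i` of cell `b`) are hypothetical.

```python
import math


def cell_sum_rate(h_eff, p_total, r_min):
    """Best sum rate of one cell for a fixed ICI level, or None if infeasible."""
    order = sorted(range(len(h_eff)), key=lambda k: h_eff[k])  # weakest first
    q = p_total  # power left for the current user and all stronger users
    for i in order[:-1]:
        beta = 2.0 ** r_min[i] - 1.0
        q = (q - beta / h_eff[i]) / (1.0 + beta)  # weak user i gets exactly R_min
        if q < 0.0:
            return None
    top = order[-1]
    rate_top = math.log2(1.0 + q * h_eff[top])
    if rate_top < r_min[top]:
        return None
    return sum(r_min[i] for i in order[:-1]) + rate_top


def semi_centralized(h, sigma, p_max, r_min, step=0.01):
    """Grid search over the MBS coefficient; all FBSs transmit at full power.

    Returns (best_alpha_mbs, best_sum_rate), or (None, None) if no sample works.
    """
    n_bs = len(p_max)
    best_alpha, best_rate = None, None
    for s in range(int(round(1.0 / step)) + 1):
        alpha = [s * step] + [1.0] * (n_bs - 1)
        total = 0.0
        for b in range(n_bs):
            # CINR of each user with the ICI evaluated at the sampled alpha.
            h_eff = [
                h[b][i][b] / (sum(alpha[j] * p_max[j] * h[b][i][j]
                                  for j in range(n_bs) if j != b) + sigma[b][i])
                for i in range(len(h[b]))
            ]
            rate = cell_sum_rate(h_eff, alpha[b] * p_max[b], r_min[b])
            if rate is None:
                total = None
                break
            total += rate
        if total is not None and (best_rate is None or total > best_rate):
            best_alpha, best_rate = alpha[0], total
    return best_alpha, best_rate
```

A smaller `step` plays the role of a smaller $\epsilon_\alpha$: the number of evaluated samples grows as $1/\epsilon_\alpha$ but not with the number of FBSs, in line with the complexity entry $S$ of the semi-centralized framework in Table I.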
2) Total Spectral Efficiency Performance:
The performance of the decentralized frameworks depends on how well they approximate the BSs' power consumption in the centralized framework.
[Figure 5 panels: (a) FBS power consumption coefficient $\alpha_f$ in the centralized framework vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz. (b), (c) $\alpha_f$ in the centralized framework vs. minimum rate demand for …-order NOMA clusters. (d) MBS power consumption coefficient $\alpha_m$ in the centralized/semi-centralized frameworks vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz. (e), (f) $\alpha_m$ in the centralized/semi-centralized frameworks vs. minimum rate demand for …-order NOMA clusters.]
Fig. 5. Average power consumption coefficient of the femto/macro BSs in the centralized/semi-centralized frameworks for different numbers of macro/femto-cell users and minimum rate demands.
Figs. 5(a)-5(c) show the impact of the order of NOMA clusters and the minimum rate demands on the FBS power consumption coefficient $\alpha_f$ at the optimal point of the centralized framework. As expected, larger $|\mathcal{U}_f|$ and/or $R^{\min}_f$ result in larger FBS power consumption. Moreover, we observe that increasing $|\mathcal{U}_m|$ and/or $R^{\min}_m$ decreases $\alpha_f$. However, this impact of $|\mathcal{U}_m|$ and/or $R^{\min}_m$ on $\alpha_f$ is quite low, due to the low average ICI level from the low-power FBS to macro-cell users. More importantly, we observe that in most cases the FBS operates at up to …% of its available power. It is noteworthy that in both decentralized frameworks, we assume that the FBS operates at 100% of its available power.
Figs. 5(d)-5(f) evaluate the impact of the order of NOMA clusters and the minimum rate demands on the MBS power consumption coefficient $\alpha_m$ at the optimal point of the centralized and semi-centralized frameworks. As expected, $\alpha_m$ increases with $|\mathcal{U}_m|$ and/or $R^{\min}_m$, and decreases with $|\mathcal{U}_f|$ and/or $R^{\min}_f$. More importantly, we observe that
1) $\alpha_m$ in the semi-centralized framework is always upper-bounded by $\alpha_m$ in the centralized framework. This is due to the larger $\alpha_f$ in the semi-centralized framework compared to the centralized framework.
2) The MBS typically operates at less than …% of its available power. Hence, the fully distributed framework with $\alpha_m = 1$ results in significantly degraded spectral efficiency at femto-cell users, due to high ICI from the MBS to femto-cell users.
Last but not least, we observe that the MBS power consumption gap between the centralized and semi-centralized frameworks is quite low (see Figs. 5(d)-5(f)).
Fig. 6 investigates the total spectral efficiency of users in the centralized and decentralized frameworks. Consistent with the discussion of Fig. 5, we observe that in most cases the performance gap between the centralized (optimal) and semi-centralized frameworks is quite low, especially for lower orders of the femto-cell NOMA cluster.
Hence, the semi-centralized framework, with its low computational complexity (see Table I), is a good candidate solution for larger-scale systems. In contrast, the fully distributed framework performs poorly, for the reasons discussed for Fig. 5.
[Figure 6 panels: (a) Total spectral efficiency (bps/Hz) vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz (centralized, semi-centralized, and distributed frameworks for $|\mathcal{U}_m| \in \{2, 3, 4\}$). (b), (c) Total spectral efficiency vs. users' minimum rate demand for …-order NOMA clusters (centralized, semi-centralized, and distributed frameworks for $R^{\min}_m \in \{0.5, 1, 2\}$ bps/Hz).]
Fig. 6. Total spectral efficiency of the centralized and decentralized frameworks for different numbers of macro/femto-cell users and minimum rate demands.
D. Convergence of the Iterative Distributed Framework for Solving (10)
The feasible domain of problem (10) is empty in either of the following cases:
1) Problem (10) is infeasible even when the maximum power constraint (3b) is removed. The feasibility of (10) without (3b) can be determined by the Perron–Frobenius eigenvalues of the matrices arising from the power control subproblems (see Theorem 8 in [18]). That theorem shows that, regardless of the available power, (10) can be infeasible due to the ICI and the minimum rate demands.
2) Problem (10) is infeasible while (10) without (3b) is feasible. In this case, (10) is infeasible only because of the lack of power resources to meet the QoS constraints in (3c).
Since Alg. 2 is a component-wise minimum [18], for any feasible $\mathbf{p}^{(0)}$ it converges to a unique point, which is the globally optimal solution. More importantly, the results show that Alg. 2 also converges to the globally optimal solution for any infeasible but finite $\mathbf{p}^{(0)}$ if the feasible domain of problem (10) is nonempty. Fig. 7(b) shows the convergence of Alg. 2 for different initial points. $\alpha^{(0)}_b = 0$ denotes zero power consumption at the BSs, i.e., $p^{(0)}_{b,i} = 0, \forall b \in \mathcal{B}, i \in \mathcal{U}_b$, whereas $\alpha^{(0)}_b = 1$ denotes that the BSs operate at their maximum power at the initial point, which may violate (3c). Fig. 7(b) shows that Alg. 2 converges to the same unique point from both initial points. For $\alpha^{(0)}_b = 0$, the convergence of Alg. 2 corresponds to tightening a lower bound of the optimal value, since the ICI at each user is always upper-bounded by its ICI at the converged point. Conversely, since the ICI at each user takes its maximum possible value at $\alpha^{(0)}_b = 1$, the convergence of Alg. 2 for $\alpha^{(0)}_b = 1$ corresponds to tightening an upper bound of the optimal value. Based on the KKT conditions analysis in Appendix C, we observe that $\mathbf{p}^*$ is independent of the maximum power constraint (3b).
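The two initializations and the role of the Perron–Frobenius check can be reproduced on a much simpler stand-in for Alg. 2, which we assume only for illustration: two cells with one user each, SINR targets $\gamma_b = 2^{R^{\min}_b} - 1$, and the classical fixed-point power control iteration $p_b \leftarrow \gamma_b (\sum_{j \neq b} g_{b,j} p_j + \sigma_b)/g_{b,b}$. Starting from zero mimics $\alpha^{(0)}_b = 0$ and starting from a large point mimics $\alpha^{(0)}_b = 1$. For this linear iteration, a finite fixed point is reached exactly when the Perron root of the normalized gain matrix $F_{b,j} = \gamma_b g_{b,j}/g_{b,b}$ ($j \neq b$) is below one. All names and numbers are hypothetical.

```python
import math


def fixed_point_power(g, sigma, gamma, p0, iters=200):
    """Iterate p_b <- gamma_b * (sum_{j != b} g[b][j] * p_j + sigma_b) / g[b][b].

    g[b][j]: gain from BS j to the single user of cell b.
    gamma[b]: SINR target 2**R_min - 1 of that user.
    """
    p = list(p0)
    n = len(p)
    for _ in range(iters):
        p = [gamma[b] * (sum(g[b][j] * p[j] for j in range(n) if j != b)
                         + sigma[b]) / g[b][b]
             for b in range(n)]
    return p


def perron_root_2x2(g, gamma):
    """Perron root of F[b][j] = gamma_b * g[b][j] / g[b][b] (zero diagonal)."""
    f01 = gamma[0] * g[0][1] / g[0][0]
    f10 = gamma[1] * g[1][0] / g[1][1]
    return math.sqrt(f01 * f10)


g = [[1.0, 0.1], [0.2, 1.0]]
sigma = [0.1, 0.1]
low = fixed_point_power(g, sigma, [1.0, 1.0], [0.0, 0.0])       # zero start
high = fixed_point_power(g, sigma, [1.0, 1.0], [100.0, 100.0])  # large start
print(perron_root_2x2(g, [1.0, 1.0]), low, high)
```

With $\gamma_b = 15$ (i.e., $R^{\min}_b = 4$ bps/Hz) the same toy network has a Perron root above one and the iterates grow without bound, which is qualitatively the divergence behavior discussed for Fig. 7(c).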
It can be shown that if problem (10) is infeasible while (10) without (3b) is feasible (Case 2 of the infeasibility reasons of problem (10)), Alg. 2 converges to the optimal finite point, which violates (3b). For larger minimum rate demands (Fig. 7(c)), we observe that Alg. 2 diverges and the optimal value tends to infinity, regardless of the maximum power constraint (3b). This corresponds to the first case of the infeasibility reasons of (10). Hence, it is important to check the Perron–Frobenius eigenvalues of the matrices arising from the power control subproblems (see Theorem 8 in [18]) before searching for a feasible point of (10). Last but not least, Fig. 7(b) verifies the fast convergence of Alg. 2 for both initialization methods, although $\alpha^{(0)}_b = 0$ converges in fewer iterations than $\alpha^{(0)}_b = 1$.
[Figure 7 panels: (a) The network topology and users' placement for $|\mathcal{U}_m| = 3$ and $|\mathcal{U}_f| = 2$. (b) Convergence of the iterative distributed framework for $R^{\min}_m = 1$ bps/Hz and $R^{\min}_f = 1$ bps/Hz: total power consumption (dBm) vs. iteration index for the two initial points. (c) Divergence of the iterative distributed framework for $R^{\min}_m = 4$ bps/Hz and $R^{\min}_f = 4$ bps/Hz.]
Fig. 7. Convergence/divergence of Alg. 2 for different initial points, a channel realization with $|\mathcal{U}_m| = 3$ and $|\mathcal{U}_f| = 2$, and different minimum rate demands.
E. Convergence and Performance of the JRPA Algorithm
In Fig. 8, we investigate the convergence of our proposed JRPA algorithm, which is based on sequential programming. We assume that the CNR-based decoding order is applied. Since sequential programming converges to a locally optimal solution, the initial point may affect the performance of this method. In this study, we applied three initialization methods:
1) Minimum rate equality (MRE): In this method, we obtain $\mathbf{r}^{(0)}$ by solving the total power minimization problem (16). It is proved that at the optimal (feasible) point, the spectral efficiency of each user equals its minimum rate demand. Hence, we have $r^{(0)}_{b,i} = R^{\min}_{b,i}, \forall b \in \mathcal{B}, i \in \mathcal{U}_b$.
2) Approximated rate function (ARF): In this method, we substitute the strictly concave term $g(r_{b,i}) = \ln(2^{r_{b,i}} - 1)$ in (38c) with the linear function $m r_{b,i}$, where $m = \frac{\partial \ln(2^{R} - 1)}{\partial R} = \frac{2^{R}\ln 2}{2^{R} - 1}$ is evaluated at a significantly large $R$. Then, we solve the convex approximated version of (38) and obtain $\mathbf{r}^{(0)}$. Since $m > \ln 2$, we have $g(r_{b,i}) < r_{b,i}\ln 2 < m r_{b,i}$ for $r_{b,i} > 0$, i.e., $g(r_{b,i})$ is upper-bounded by $m r_{b,i}$.
3) Equal power allocation (EPA): In this method, we equally distribute $P^{\max}_b$ among all users in $\mathcal{U}_b$, and then obtain $\mathbf{r}^{(0)}$ according to (2). This method may lead to an infeasible $\mathbf{r}^{(0)}$.
MRE provides a feasible $\mathbf{r}^{(0)}$. However, this method does not consider the heterogeneity of the users' spectral efficiencies, leading to slower convergence and, in some situations, lower performance. MRE works well for low-SINR scenarios with significantly high minimum rate demands. The ARF method provides a better feasible lower bound for the total spectral efficiency of users at the initial point. Fig. 8(a) shows that for larger minimum rate demands, $\ln(2^{r} - 1) \approx m r$. The performance gap between ARF and the globally optimal solution stems from allocating more power to users operating in low spectral efficiency regions, which results in allocating less power to the stronger user that deserves additional power.
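To illustrate why the MRE and ARF starting points are feasible and why the iterations improve monotonically, consider a deliberately reduced toy instance of (38): a single user with no INI or ICI, so that (38c) collapses to $g(r) \le c$ with $c = \ln(P h / \sigma)$ and the optimum is $r^* = \log_2(1 + P h/\sigma)$. The sketch below (hypothetical helper names and SNR value; not the multi-cell JRPA) replaces $g$ by its tangent at the previous iterate, as in (39), and solves the resulting one-dimensional linear problem in closed form. It also builds the ARF starting point from the slope $m = g'(15)$ used in Fig. 8(a).

```python
import math


def g(r):
    """Concave term g(r) = ln(2**r - 1) of constraint (38c); defined for r > 0."""
    return math.log(2.0 ** r - 1.0)


def g_prime(r):
    """Derivative g'(r) = 2**r * ln 2 / (2**r - 1)."""
    return 2.0 ** r * math.log(2.0) / (2.0 ** r - 1.0)


def sca_single_user(c, r0, iters=40):
    """Toy sequential programming for max r s.t. g(r) <= c.

    Each iteration solves max r s.t. g(r_prev) + g'(r_prev) (r - r_prev) <= c,
    whose solution is the point where the tangent reaches c. Because g is
    concave, the tangent over-estimates g, so every iterate stays feasible.
    """
    if not g(r0) <= c:
        raise ValueError("initial point must be feasible")
    iterates = [r0]
    for _ in range(iters):
        r_prev = iterates[-1]
        iterates.append(r_prev + (c - g(r_prev)) / g_prime(r_prev))
    return iterates


snr = 100.0                      # hypothetical P*h/sigma of the single link
c = math.log(snr)
r_star = math.log2(1.0 + snr)    # optimum of the toy problem

m = g_prime(15.0)                # ARF slope, evaluated at R = 15 as in Fig. 8(a)
r0_arf = c / m                   # ARF: solve m*r <= c instead of g(r) <= c
r0_mre = 1.0                     # MRE: start at a minimum rate demand of 1 bps/Hz

traj_arf = sca_single_user(c, r0_arf)
traj_mre = sca_single_user(c, r0_mre)
```

In this scalar case each step coincides with a Newton step on $g(r) = c$, so both trajectories increase monotonically toward $r^*$ and converge to the same point; in the multi-cell problem the linearized subproblem (40) has no closed form and must be solved numerically.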
ARF also works well in scenarios where the low additional minimum rate demands do not have a significant impact on the users' total spectral efficiency, i.e., high-SINR regions. The EPA initialization method usually leads to an infeasible $\mathbf{r}^{(0)}$ violating (3c), due to INI and ICI at users. More importantly, we observed that EPA also leads to high outage at the next iteration of the JRPA algorithm. In Fig. 8, we selected a scenario in which EPA (violating (3c)) does not make the next iteration infeasible, to show the convergence behavior of this initialization method. The user placement is shown in Fig. 8(b). As shown, JRPA provides a sequence of improved solutions for any feasible initial point, such that it converges to a stationary point. Interestingly, we observe that both the MRE and ARF methods converge to the same point, which shows the low sensitivity of JRPA to these feasible initial points. In this scenario, EPA at iteration 0 results in an infeasible $\mathbf{r}^{(0)}$, and JRPA finds a feasible solution $\mathbf{r}^{(1)}$ based on the infeasible $\mathbf{r}^{(0)}$, which is indeed the updated initial feasible point. According to Figs. 8(d)-8(f), we observe that at the converged point, only the NOMA cluster-head users get additional power (leading to higher spectral efficiency than their minimum rate demand). The fast convergence of the individual rates in Figs. 8(d)-8(f) is consistent with the fast convergence of the total spectral efficiency of JRPA shown in Fig. 8(c). In our simulations, we applied the ARF initialization method.

As mentioned in Subsection III-A2, it is difficult to find the globally optimal JRPA for any fixed suboptimal decoding order. However, when the fixed decoding order coincides with the optimal decoding order, the performance gap between the optimal JSPA and JRPA algorithms is due only to the suboptimality of JRPA based on sequential programming. In Fig. 8, we chose a case in which the CNR-based decoding order satisfies Theorem 1 and is therefore optimal. The total spectral efficiency of users at the globally optimal point (optimal value) is shown in Fig. 8(c). As can be seen, sequential programming generates a sequence of improved solutions such that, after a few iterations, it converges to a near-optimal solution.

Fig. 8. Convergence of Alg. 3 for different initialization methods in a scenario with $|\mathcal{U}_m| = 3$, $|\mathcal{U}_f| = 2$, $R^{\min}_m = 1$ bps/Hz, and $R^{\min}_f = 2$ bps/Hz; the CNR-based decoding order is optimal. (a) Approximation of $y = \ln(2^r - 1)$ by the linear function $y = m r$, where $m = y'(r = 15)$. (b) Network topology and user placement for $|\mathcal{U}_m| = 3$ and $|\mathcal{U}_f| = 2$. (c) Total spectral efficiency of users (bps/Hz) vs. iteration index for different initialization methods (optimal value, ARF, EPA, MRE). (d)-(f) Spectral efficiency of each user (femto-cell users 1-2, macro-cell users 1-3) vs. iteration index for the MRE, ARF, and EPA initialization methods, respectively.

F. Performance of the Approximated Optimal Powers in Remark III.2.1
Here, we investigate the performance of the approximated closed form of the optimal powers in Remark III.2.1. The advantage of this approximation is its insensitivity to the exact CSI. Fig. 9 compares the average gap between the exact and approximated forms of the optimal powers in femto- and macro-cells, separately. Since ICI is fully treated as AWGN, we evaluate this performance gap for different AWGN power levels of users, which directly impact the CINR of users. In Fig. 9, $M$ is the number of users within the considered cell, and $R^{\min}$ is their minimum rate demand. To reduce the impact of randomness, we eliminated lognormal shadowing from the path loss model (see Table II). As can be seen in Figs. 9(a) and 9(c), the average gap between the optimal and approximated powers increases with the order of NOMA clusters, the minimum rate demands, and, specifically, the AWGN power. Interestingly, for lower AWGN powers, this performance gap tends to zero. As a result, this approximation works well for middle- and high-SINR scenarios. On the other hand, Figs. 9(b) and 9(d) show that the macro/femto-cell total spectral efficiency gaps between these two closed-form formulations are small for small cluster orders $M$, low minimum rate demands, and low AWGN powers. Hence, the results show that the optimal powers at macro/femto-cell users are highly insensitive to the CSI. The impact of the approximated closed-form optimal powers on the ergodic rate regions and/or imperfect CSI can be considered as future work.

Fig. 9. The performance gap between the approximated and exact closed-form expressions of the optimal powers in macro/femto-cells for different AWGN powers, numbers of macro/femto-cell users, and minimum rate demands (curves for $M \in \{2, 3, 4, 5\}$ and $R^{\min} \in \{1, 2, 3, 4\}$; noise power from $-124$ to $-94$ dBm). (a) Average gap (%) between optimal and approximated powers vs. AWGN power at macro-cell users. (b) Average total spectral efficiency gap (%) vs. AWGN power at macro-cell users. (c) Average gap (%) between optimal and approximated powers vs. AWGN power at femto-cell users. (d) Average total spectral efficiency gap (%) vs. AWGN power at femto-cell users.

V. CONCLUDING REMARKS
In this paper, we addressed the problem of optimal joint SIC ordering and power allocation in multi-cell NOMA systems to achieve the maximum users' sum-rate. We showed that the optimal SIC ordering depends on the ICI at users. For a given total power consumption of BSs, we obtained the optimal powers and SIC decoding orders in closed form. Then, we proposed a globally optimal JSPA algorithm with significantly reduced computational complexity. For any given suboptimal decoding order, we addressed the problem of joint rate and power allocation to maximize users' sum-rate. We showed that for specific channel conditions, the CNR-based decoding order is optimal for a user pair. We also devised two decentralized resource allocation frameworks. Numerical assessments show that the optimal SIC ordering results in significantly lower outage probability and higher users' sum-rate compared to the CNR-based decoding order. Moreover, for a fixed suboptimal SIC ordering, rate adoption is necessary to achieve the maximum possible sum-rate. As a result, restricting the rate region of users results in high outage and, subsequently, a severely low sum-rate. Besides, we observed that the semi-centralized framework has near-optimal performance with significantly lower computational complexity compared to the globally optimal JSPA algorithm in the centralized framework.

APPENDIX A
PROOF OF PROPOSITION III.1

Let $\tilde{h}_{b,k} = \frac{h_{b,k}}{I_{b,k} + \sigma_{b,k}}$. The rate function in (2) can be reformulated as
$$R_{b,i} = \min_{k \in \{i\} \cup \Phi_{b,i}} \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j}\tilde{h}_{b,k} + 1}\right),$$
which is the same as the achievable rate of single-cell NOMA with the equivalent noise. Since SISO Gaussian broadcast channels are degraded [1], [2], NOMA with the CINR-based decoding order is capacity achieving in each cell of multi-cell NOMA (where ICI is fully treated as AWGN), so the decoding order $\tilde{h}_{b,k} > \tilde{h}_{b,i} \Rightarrow k \to i$ is optimal [5], [10].

Similar to single-cell NOMA, in each cell $b$, we have $\frac{\partial}{\partial \tilde{h}_{b,k}} \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j}\tilde{h}_{b,k} + 1}\right) > 0$. Assume that $\boldsymbol{\alpha}_{-b}$ is fixed. In the following, we analytically show that for any given $\mathbf{p}_b$ (after linear superposition coding), the decoding order based on $\tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b}) > \tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b}) \Rightarrow k \to i$ achieves the maximum total spectral efficiency of users after SIC. Assume that cell $b$ has $M$ users, indexed such that $k > i$ if $\tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b}) > \tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b})$. We prove that the decoding order $M \to M-1 \to \cdots \to 1$ outperforms any other possible decoding order in terms of the total spectral efficiency of users, and is therefore optimal. To prove this, for two adjacent users $i$ and $i+1$ (with $\tilde{h}_{b,i+1} > \tilde{h}_{b,i}$), consider the following decoding orders:
A) $i+1 \to i$, and subsequently $M \to M-1 \to \cdots \to 1$;
B) $i \to i+1$, and subsequently $M \to M-1 \to \cdots \to i+2 \to i \to i+1 \to i-1 \to \cdots \to 1$.
According to (2), the achievable spectral efficiencies of users $i$ and $i+1$ in Case A (after SIC) are
$$R^A_{b,i}(\mathbf{p}_b) = \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}}{\sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i} + 1}\right), \quad R^A_{b,i+1}(\mathbf{p}_b) = \log_2\left(1 + \frac{p_{b,i+1}\tilde{h}_{b,i+1}}{\sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i+1} + 1}\right).$$
The achievable spectral efficiencies of users $i$ and $i+1$ in Case B (after SIC) are
$$R^B_{b,i}(\mathbf{p}_b) = \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}}{\sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i} + 1}\right), \quad R^B_{b,i+1}(\mathbf{p}_b) = \log_2\left(1 + \frac{p_{b,i+1}\tilde{h}_{b,i}}{\left(p_{b,i} + \sum_{j=i+2}^{M} p_{b,j}\right)\tilde{h}_{b,i} + 1}\right).$$
In both Cases A and B, the signals of users $i$ and $i+1$ are treated as INI at users $1, \ldots, i-1$. Moreover, the signals of users $i$ and $i+1$ are scheduled to be decoded and canceled by all users $i+2, \ldots, M$. Hence, the set $\Phi_{b,k}$ of each user $k \in \{1, \ldots, i-1, i+2, \ldots, M\}$ is the same in both Cases A and B, resulting in the same spectral efficiency formulated in (2). Since the signals of all users in $\mathcal{U}_b$ are fully treated as noise (called ICI) at the users of other cells, the set $\Phi_{b',i}$ of each user $i \in \mathcal{U}_{b'}$, $b' \in \mathcal{B} \setminus \{b\}$, is the same in both Cases A and B, resulting in the same spectral efficiency. Accordingly, for given $\mathbf{p}_b$, changing the decoding order of two adjacent users only changes the achievable rates of these two users. As a result, the total spectral efficiency gap between decoding orders A and B for given $\mathbf{p}_b$ can be formulated as
$$\begin{aligned}
R^{A-B}_{\text{gap}}(\mathbf{p}_b) &= \sum_{b\in\mathcal{B}}\sum_{i\in\mathcal{U}_b} R^A_{b,i}(\mathbf{p}_b) - \sum_{b\in\mathcal{B}}\sum_{i\in\mathcal{U}_b} R^B_{b,i}(\mathbf{p}_b) = \left(R^A_{b,i}(\mathbf{p}_b) + R^A_{b,i+1}(\mathbf{p}_b)\right) - \left(R^B_{b,i}(\mathbf{p}_b) + R^B_{b,i+1}(\mathbf{p}_b)\right) \\
&= \log_2 \frac{\left(1 + \sum_{j=i}^{M} p_{b,j}\tilde{h}_{b,i}\right)\left(1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i+1}\right)}{\left(1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i}\right)\left(1 + \sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i+1}\right)} + \log_2 \frac{\left(1 + \sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i}\right)\left(1 + \left(p_{b,i} + \sum_{j=i+2}^{M} p_{b,j}\right)\tilde{h}_{b,i}\right)}{\left(1 + \left(p_{b,i} + \sum_{j=i+2}^{M} p_{b,j}\right)\tilde{h}_{b,i}\right)\left(1 + \sum_{j=i}^{M} p_{b,j}\tilde{h}_{b,i}\right)} \\
&= \log_2 \frac{\left(1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i+1}\right)\left(1 + \sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i}\right)}{\left(1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i}\right)\left(1 + \sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i+1}\right)} \\
&= \log_2 \frac{1 + \left(\sum_{j=i+1}^{M} p_{b,j}\right)\tilde{h}_{b,i+1} + \left(\sum_{j=i+2}^{M} p_{b,j}\right)\tilde{h}_{b,i} + \left(\sum_{j=i+2}^{M} p_{b,j}\right)\left(\sum_{j=i+1}^{M} p_{b,j}\right)\tilde{h}_{b,i}\tilde{h}_{b,i+1}}{1 + \left(\sum_{j=i+1}^{M} p_{b,j}\right)\tilde{h}_{b,i} + \left(\sum_{j=i+2}^{M} p_{b,j}\right)\tilde{h}_{b,i+1} + \left(\sum_{j=i+2}^{M} p_{b,j}\right)\left(\sum_{j=i+1}^{M} p_{b,j}\right)\tilde{h}_{b,i}\tilde{h}_{b,i+1}}.
\end{aligned}$$
The difference between the numerator and the denominator of the latter fraction is $p_{b,i+1}\left(\tilde{h}_{b,i+1} - \tilde{h}_{b,i}\right)$, which is positive (for $p_{b,i+1} > 0$) since $\tilde{h}_{b,i+1} > \tilde{h}_{b,i}$; this results in $R^{A-B}_{\text{gap}}(\mathbf{p}_b) > 0$.
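The pairwise-swap argument can be checked numerically. The sketch below (hypothetical helper names, 0-based indices, users sorted by increasing normalized gain $\tilde{h}_{b,i}$) evaluates the two rates of the swapped pair under Cases A and B, compares their difference with the final log-ratio expression, and checks that the numerator/denominator difference equals $p_{b,i+1}(\tilde{h}_{b,i+1} - \tilde{h}_{b,i})$ on random instances.

```python
import math
import random


def rate(p_own, h, interference_power):
    """log2(1 + p_own*h / (interference_power*h + 1)) for a normalized gain h."""
    return math.log2(1.0 + p_own * h / (interference_power * h + 1.0))


def tail(p, m):
    """Sum of p[m:], i.e. the total power of users m, m+1, ..., M-1."""
    return sum(p[m:])


def pair_sum_rate_a(p, h, i):
    """Case A (i+1 -> i): user i sees users i+1.., user i+1 sees users i+2.."""
    return rate(p[i], h[i], tail(p, i + 1)) + rate(p[i + 1], h[i + 1], tail(p, i + 2))


def pair_sum_rate_b(p, h, i):
    """Case B (i -> i+1): user i+1's rate is limited by decoding at weaker user i."""
    return (rate(p[i], h[i], tail(p, i + 2))
            + rate(p[i + 1], h[i], p[i] + tail(p, i + 2)))


def gap_closed_form(p, h, i):
    """Final log-ratio expression of R_gap^{A-B} and its numerator minus denominator."""
    s1, s2 = tail(p, i + 1), tail(p, i + 2)
    num = (1 + s1 * h[i + 1]) * (1 + s2 * h[i])
    den = (1 + s1 * h[i]) * (1 + s2 * h[i + 1])
    return math.log2(num / den), num - den


rng = random.Random(7)
checks = []
for _ in range(200):
    M = rng.randint(2, 6)
    increments = [rng.uniform(1.0, 30.0) for _ in range(M)]
    h = [sum(increments[:k + 1]) for k in range(M)]   # strictly increasing gains
    p = [rng.uniform(0.01, 1.0) for _ in range(M)]
    i = rng.randrange(M - 1)
    gap = pair_sum_rate_a(p, h, i) - pair_sum_rate_b(p, h, i)
    closed, diff = gap_closed_form(p, h, i)
    checks.append((gap, closed, diff, p[i + 1] * (h[i + 1] - h[i])))
```

Each entry of `checks` should show a positive gap that matches the closed form, which is the per-pair inequality used to conclude optimality of the decoding order $M \to M-1 \to \cdots \to 1$.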
Therefore, for any feasible $\mathbf{p}_b$, the decoding order $i+1 \to i$ for each pair of adjacent users $i$ and $i+1$ in cell $b$ is optimal if and only if $\tilde{h}_{b,i+1} > \tilde{h}_{b,i}$. Imposing this optimality condition on each pair of adjacent users in cell $b$ results in the decoding order based on $\tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b}) > \tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b}) \Rightarrow k \to i$. As a result, $\lambda^*_{b,i,k} = 1$ if and only if $\tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b}) \le \tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b})$, and the proof is completed.

APPENDIX B
PROOF OF PROPOSITION III.2

According to Corollary III.1.1, the achievable spectral efficiency of each user $i \in \mathcal{U}_b$ for fixed $\boldsymbol{\alpha}_{-b}$ and the optimal decoding order $M \to M-1 \to \cdots \to 1$ can be formulated as
$$\tilde{R}_{b,i}(\mathbf{p}_b) = \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b})}{\sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b}) + 1}\right).$$
Note that $\tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b}) > \tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b})$ for each $i, k \in \mathcal{U}_b$, $k > i$. Moreover, $\tilde{R}_{b,i}$ is independent of $\mathbf{p}_{-b}$ for given $\boldsymbol{\alpha}_{-b}$, meaning that (3) can be equivalently divided into $B$ single-cell NOMA subproblems. In cell $b$, we find $\mathbf{p}^*_b$ by solving the following subproblem:
$$\max_{\mathbf{p}_b \ge 0} \; \sum_{i=1}^{M} \tilde{R}_{b,i}(\mathbf{p}_b) \tag{21a}$$
$$\text{s.t.} \quad \sum_{i\in\mathcal{U}_b} p_{b,i} = \alpha_b P^{\max}_b, \tag{21b}$$
$$\tilde{R}_{b,i}(\mathbf{p}_b) \ge R^{\min}_{b,i}, \quad \forall i \in \mathcal{U}_b. \tag{21c}$$
Similar to single-cell NOMA, it can be easily shown that the Hessian of the sum-rate function in (21a) is negative definite in $\mathbf{p}_b$ for $\tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b}) > \tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b})$, $\forall i, k \in \mathcal{U}_b$, $k > i$ [11]–[13], so the objective function (21a) is strictly concave in $\mathbf{p}_b$. Constraint (21c) can be rewritten as the following linear constraint:
$$2^{R^{\min}_{b,i}}\left(\sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i} + 1\right) \le \sum_{j=i}^{M} p_{b,j}\tilde{h}_{b,i} + 1.$$
Hence, the feasible region of (21) is defined by linear constraints and is therefore convex. Accordingly, (21) is a convex problem with a strictly concave objective in $\mathbf{p}_b$. Slater's condition holds in (21) since it is convex and there exists $\mathbf{p}_b \ge 0$ satisfying (21c) with strict inequalities. Therefore, strong duality holds in (21). As a result, the KKT conditions are satisfied and the optimal solution $\mathbf{p}^*$ can be obtained by using the Lagrange dual method [15]. The Lagrange function (upper bound) of (21) is given by
$$L(\mathbf{p}_b, \boldsymbol{\mu}, \boldsymbol{\delta}, \nu) = \sum_{i=1}^{M} \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}}{\sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i} + 1}\right) + \sum_{i=1}^{M} \mu_i\left(\log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}}{\sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i} + 1}\right) - R^{\min}_{b,i}\right) + \sum_{i=1}^{M} \delta_i p_{b,i} + \nu\left(\alpha_b P^{\max}_b - \sum_{i=1}^{M} p_{b,i}\right),$$
where $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_M]$, $\boldsymbol{\delta} = [\delta_1, \ldots, \delta_M]$, and $\nu$ are the Lagrange multipliers corresponding to constraints (21c), $p_{b,i} \ge 0$, $i = 1, \ldots, M$, and (21b), respectively. The Lagrange dual problem is given by
$$\min_{\boldsymbol{\mu}, \boldsymbol{\delta}, \nu} \; \sup_{\mathbf{p}} \{L(\mathbf{p}, \boldsymbol{\mu}, \boldsymbol{\delta}, \nu)\} \quad \text{s.t.} \quad \mu_i \ge 0, \; \delta_i \ge 0, \quad \forall i = 1, \ldots, M.$$
The KKT conditions are listed below:
1) Feasibility of the primal problem (21):
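C-1.1: $\log_2\left(1 + \frac{p^*_{b,i}\tilde{h}_{b,i}}{\sum_{j=i+1}^{M} p^*_{b,j}\tilde{h}_{b,i} + 1}\right) \ge R^{\min}_{b,i}, \; \forall i$; C-1.2: $p^*_{b,i} \ge 0, \; \forall i$; C-1.3: $\sum_{i=1}^{M} p^*_{b,i} = \alpha_b P^{\max}_b$.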
2) Feasibility of the dual problem:
C-2.1: $\mu^*_i \ge 0, \; \forall i = 1, \ldots, M$; C-2.2: $\delta^*_i \ge 0, \; \forall i = 1, \ldots, M$; C-2.3: $\nu^* \in \mathbb{R}$.
3) The complementary slackness condition:
C-3.1: $\mu^*_i\left(\log_2\left(1 + \frac{p^*_{b,i}\tilde{h}_{b,i}}{\sum_{j=i+1}^{M} p^*_{b,j}\tilde{h}_{b,i} + 1}\right) - R^{\min}_{b,i}\right) = 0, \; \forall i = 1, \ldots, M$; C-3.2: $\delta^*_i p^*_{b,i} = 0, \; \forall i = 1, \ldots, M$.
4) The condition $\nabla_{\mathbf{p}^*_b} L(\mathbf{p}^*_b, \boldsymbol{\mu}^*, \boldsymbol{\delta}^*, \nu^*) = 0$, which implies
C-4: $\frac{\partial L}{\partial p^*_{b,i}} = \sum_{k=1}^{i} \frac{1 + \mu^*_i}{\ln 2} \cdot \frac{\tilde{h}_{b,k}}{\sum_{j=k}^{M} p^*_{b,j}\tilde{h}_{b,k} + 1} - \sum_{k=1}^{i-1} \frac{1 + \mu^*_i}{\ln 2} \cdot \frac{\tilde{h}_{b,k}}{\sum_{j=k+1}^{M} p^*_{b,j}\tilde{h}_{b,k} + 1} + \delta^*_i - \nu^* = 0, \; \forall i = 1, \ldots, M.$
This equation can be reformulated as
$$\frac{\partial L}{\partial p^*_{b,i}} = \frac{1 + \mu^*_i}{\ln 2} \cdot \frac{\tilde{h}_{b,1}}{\sum_{j=1}^{M} p^*_{b,j}\tilde{h}_{b,1} + 1} + \sum_{k=1}^{i-1} \frac{1 + \mu^*_i}{\ln 2}\left(\frac{\tilde{h}_{b,k+1}}{\sum_{j=k+1}^{M} p^*_{b,j}\tilde{h}_{b,k+1} + 1} - \frac{\tilde{h}_{b,k}}{\sum_{j=k+1}^{M} p^*_{b,j}\tilde{h}_{b,k} + 1}\right) + \delta^*_i - \nu^* = 0, \; \forall i = 1, \ldots, M.$$
For convenience, we define $A(\mathbf{p}^*_b) = \frac{1}{\ln 2} \cdot \frac{\tilde{h}_{b,1}}{\sum_{j=1}^{M} p^*_{b,j}\tilde{h}_{b,1} + 1}$ and $B_k(\mathbf{p}^*_b) = \frac{1}{\ln 2}\left(\frac{\tilde{h}_{b,k+1}}{\sum_{j=k+1}^{M} p^*_{b,j}\tilde{h}_{b,k+1} + 1} - \frac{\tilde{h}_{b,k}}{\sum_{j=k+1}^{M} p^*_{b,j}\tilde{h}_{b,k} + 1}\right)$, $\forall k = 1, \ldots, M-1$. Then, the last KKT condition can be reformulated as
$$\text{C-4.1:} \quad \frac{\partial L}{\partial p^*_{b,i}} = (1 + \mu^*_i) A(\mathbf{p}^*_b) + (1 + \mu^*_i)\sum_{k=1}^{i-1} B_k(\mathbf{p}^*_b) + \delta^*_i - \nu^* = 0, \quad \forall i = 1, \ldots, M. \tag{23}$$
The dual variable $\delta^*_i$, $\forall i = 1, \ldots, M$, acts as a slack variable in C-4.1 (due to the KKT condition C-2.2), so it can be eliminated by reformulating the KKT conditions (C-4.1, C-2.2) and C-3.2, respectively, as
$$\nu^* \ge (1 + \mu^*_i)\left(A(\mathbf{p}^*_b) + \sum_{k=1}^{i-1} B_k(\mathbf{p}^*_b)\right), \quad \forall i = 1, \ldots, M, \tag{24}$$
and
$$p^*_{b,i}\left(\nu^* - (1 + \mu^*_i)\left(A(\mathbf{p}^*_b) + \sum_{k=1}^{i-1} B_k(\mathbf{p}^*_b)\right)\right) = 0, \quad \forall i = 1, \ldots, M. \tag{25}$$
Obviously, $A(\mathbf{p}^*_b)$ is positive at $\mathbf{p}^*_b$. According to Corollary III.1.1, $B_k(\mathbf{p}^*_b)$, $k = 1, \ldots, M-1$, is also positive at $\mathbf{p}^*_b$, since $\tilde{h}_{b,k+1} > \tilde{h}_{b,k}$ and $x/(Sx + 1)$ is increasing in $x$ for $S \ge 0$. To simplify the derivations, in the following, we assume that $R^{\min}_{b,i} > 0, \forall i = 1, \ldots, M$. Then, we show that the derivations are also valid for the case that $R^{\min}_{b,i} = 0$ for some $i \in \mathcal{U}_b$. The assumption $R^{\min}_{b,i} > 0, \forall i = 1, \ldots, M$, implies that $p^*_{b,i} > 0, \forall i = 1, \ldots, M$, in C-1.2. According to (25), we have
$$\nu^* = (1 + \mu^*_i)\left(A(\mathbf{p}^*_b) + \sum_{k=1}^{i-1} B_k(\mathbf{p}^*_b)\right), \quad \forall i = 1, \ldots, M. \tag{26}$$
Consider two adjacent users $i$ and $i+1$. According to (26), we have
$$(1 + \mu^*_i)\left(A(\mathbf{p}^*_b) + \sum_{k=1}^{i-1} B_k(\mathbf{p}^*_b)\right) = (1 + \mu^*_{i+1})\left(A(\mathbf{p}^*_b) + \sum_{k=1}^{i} B_k(\mathbf{p}^*_b)\right).$$
Since $A(\mathbf{p}^*_b) > 0$ and $B_k(\mathbf{p}^*_b) > 0, \forall k$, we have $A(\mathbf{p}^*_b) + \sum_{k=1}^{i} B_k(\mathbf{p}^*_b) > A(\mathbf{p}^*_b) + \sum_{k=1}^{i-1} B_k(\mathbf{p}^*_b)$. Accordingly, the latter equality requires $\mu^*_{i+1} < \mu^*_i$. Therefore, there exists $\nu^*$ satisfying (26) only if $\mu^*_{i+1} < \mu^*_i$ for each $i \in \mathcal{U}_b$, which gives the strict inequalities $\mu^*_M < \mu^*_{M-1} < \cdots < \mu^*_1$. Based on Condition C-2.1, we have $\mu^*_M \ge 0$, which implies that $\mu^*_i > 0, \forall i = 1, \ldots, M-1$. According to Condition C-3.1, the optimal powers of users $1, \ldots, M-1$ can be obtained from
$$\log_2\left(1 + \frac{p^*_{b,i}\tilde{h}_{b,i}}{\sum_{j=i+1}^{M} p^*_{b,j}\tilde{h}_{b,i} + 1}\right) = R^{\min}_{b,i}, \quad \forall i = 1, \ldots, M-1. \tag{27}$$
According to the power condition C-1.3, the optimal power of user $M$ can be obtained as
$$p^*_{b,M} = \alpha_b P^{\max}_b - \sum_{i=1}^{M-1} p^*_{b,i}. \tag{28}$$
Note that $\mu^*_M > 0$ would imply $\log_2(1 + p^*_{b,M}\tilde{h}_{b,M}) = R^{\min}_{b,M}$ due to Condition C-3.1, which may violate Condition C-1.3. Therefore, at the optimal point $p^*_{b,M}$ obtained by (28), we have $\mu^*_M = 0$. Additionally, $\nu^*$ can take any value since, at the optimal point, the KKT condition C-1.3 holds. From (27), it can be concluded that at the optimal point $\mathbf{p}^*_b$, the power allocated to all users with lower decoding order, i.e., users $i = 1, \ldots, M-1$, only maintains their minimum spectral efficiency demand $R^{\min}_{b,i}$. Moreover, (28) shows that only the NOMA cluster-head user $M$ receives additional power. According to (27), the optimal power of each user $i < M$ can be obtained as
$$p^*_{b,i} = \frac{T_{b,i}\left(\left(\alpha_b P^{\max}_b - \sum_{j=1}^{i-1} p^*_{b,j}\right)\tilde{h}_{b,i} + 1\right)}{1 + T_{b,i}\tilde{h}_{b,i}}, \quad \forall i = 1, \ldots, M-1, \tag{29}$$
where $T_{b,i} = \frac{2^{R^{\min}_{b,i}} - 1}{\tilde{h}_{b,i}}$, $\forall i = 1, \ldots, M-1$. For the case that $R^{\min}_{b,i} \to 0$ for each user $i = 1, \ldots, M-1$, we have $T_{b,i} \to 0$. Therefore, $p^*_{b,i} \to 0$, meaning that when the spectral efficiency demand of a weaker user is zero, no power is allocated to that user. According to (29), $p^*_{b,i}$ depends on the optimal powers $p^*_{b,j}$, $\forall j = 1, \ldots, i-1$. Hence, the optimal powers can be directly obtained by calculating $p^*_{b,1} \Rightarrow p^*_{b,2} \Rightarrow \cdots \Rightarrow p^*_{b,M-1}$ by (29), and finally $p^*_{b,M}$ according to (28). To find a closed-form expression for $p^*_{b,i}$, we rewrite (29) as
$$p^*_{b,i} = \beta_{b,i}\left(\alpha_b P^{\max}_b - \sum_{j=1}^{i-1} p^*_{b,j} + \frac{1}{\tilde{h}_{b,i}}\right), \quad \forall i = 1, \ldots, M-1,$$
where $\beta_{b,i} = \frac{2^{R^{\min}_{b,i}} - 1}{2^{R^{\min}_{b,i}}}$, $\forall i = 1, \ldots, M-1$. Then, we have
$$\begin{aligned}
p^*_{b,i} &= \beta_{b,i}\left(\alpha_b P^{\max}_b - p^*_{b,i-1} - \sum_{j=1}^{i-2} p^*_{b,j} + \frac{1}{\tilde{h}_{b,i}}\right) \\
&= \beta_{b,i}\left(\alpha_b P^{\max}_b - \beta_{b,i-1}\left(\alpha_b P^{\max}_b - \sum_{j=1}^{i-2} p^*_{b,j} + \frac{1}{\tilde{h}_{b,i-1}}\right) - \sum_{j=1}^{i-2} p^*_{b,j} + \frac{1}{\tilde{h}_{b,i}}\right) \\
&= \beta_{b,i}\left((1 - \beta_{b,i-1})\alpha_b P^{\max}_b - (1 - \beta_{b,i-1})\sum_{j=1}^{i-2} p^*_{b,j} + \frac{1}{\tilde{h}_{b,i}} - \frac{\beta_{b,i-1}}{\tilde{h}_{b,i-1}}\right) \\
&\;\;\vdots \\
&= \beta_{b,i}\left((1 - \beta_{b,i-1})(1 - \beta_{b,i-2})\cdots(1 - \beta_{b,1})\alpha_b P^{\max}_b + \frac{1}{\tilde{h}_{b,i}} - \frac{\beta_{b,i-1}}{\tilde{h}_{b,i-1}} - \frac{(1 - \beta_{b,i-1})\beta_{b,i-2}}{\tilde{h}_{b,i-2}} - \cdots - \frac{(1 - \beta_{b,i-1})(1 - \beta_{b,i-2})\cdots(1 - \beta_{b,2})\beta_{b,1}}{\tilde{h}_{b,1}}\right).
\end{aligned}$$
According to the above expansion, we have
$$p^*_{b,i} = \beta_{b,i}\left(\prod_{j=1}^{i-1}(1 - \beta_{b,j})\,\alpha_b P^{\max}_b + \frac{1}{\tilde{h}_{b,i}} - \sum_{j=1}^{i-1}\frac{\beta_{b,j}\prod_{k=j+1}^{i-1}(1 - \beta_{b,k})}{\tilde{h}_{b,j}}\right), \quad \forall i = 1, \ldots, M-1.$$
According to (28), the optimal power of the NOMA cluster-head user $M$ can be obtained as
$$p^*_{b,M} = \alpha_b P^{\max}_b - \sum_{i=1}^{M-1}\beta_{b,i}\left(\prod_{j=1}^{i-1}(1 - \beta_{b,j})\,\alpha_b P^{\max}_b + \frac{1}{\tilde{h}_{b,i}} - \sum_{j=1}^{i-1}\frac{\beta_{b,j}\prod_{k=j+1}^{i-1}(1 - \beta_{b,k})}{\tilde{h}_{b,j}}\right).$$

APPENDIX C
CLOSED-FORM EXPRESSION OF OPTIMAL POWERS FOR TOTAL POWER MINIMIZATION PROBLEM
Here, we first obtain the closed-form expression of the optimal powers for an $M$-user single-cell NOMA system under the CNR-based decoding order. Then, we extend the results to the case in which ICI is fixed in cell $b$ serving $M$ users, and find the closed-form expressions of the powers in $\mathbf{p}^*_b$ under the CINR-based decoding order.

In the power minimization problem of an $M$-user single-cell NOMA system with $\tilde{h}_1 < \tilde{h}_2 < \cdots < \tilde{h}_M$, and thus the optimal (CNR-based) decoding order $M \to M-1 \to \cdots \to 1$, the achievable spectral efficiency of user $i$ is $R_i(\mathbf{p}) = \log_2\left(1 + \frac{p_i\tilde{h}_i}{\sum_{j=i+1}^{M} p_j\tilde{h}_i + 1}\right)$. Here, $\tilde{h}_i = h_i/\sigma_i$ is the channel gain of user $i$ normalized by its noise power $\sigma_i$. The total power minimization problem under the CNR-based decoding order can be formulated as
$$\min_{\mathbf{p} \ge 0} \; \sum_{i=1}^{M} p_i \tag{30a}$$
$$\text{s.t.} \quad \sum_{i=1}^{M} p_i \le P^{\max}, \tag{30b}$$
$$\log_2\left(1 + \frac{p_i\tilde{h}_i}{\sum_{j=i+1}^{M} p_j\tilde{h}_i + 1}\right) \ge R^{\min}_i, \quad \forall i = 1, \ldots, M. \tag{30c}$$
The minimum rate constraint (30c) can be rewritten as $p_i\tilde{h}_i \ge \left(2^{R^{\min}_i} - 1\right)\left(\sum_{j=i+1}^{M} p_j\tilde{h}_i + 1\right)$, $\forall i = 1, \ldots, M$, which is affine in $\mathbf{p}$. Hence, problem (30) is convex in $\mathbf{p}$ with a polyhedral feasible set. It can be shown that in the power minimization problem, the power allocated to each user only maintains its minimum rate demand. In the following, we prove this by analyzing the KKT conditions. Slater's condition holds in (30) since it is convex and there exists $\mathbf{p} \ge 0$ satisfying (30b) and (30c) with strict inequalities. Therefore, strong duality holds in (30). Hence, the KKT conditions are satisfied and the optimal solution $\mathbf{p}^*$ can be obtained by using the Lagrange dual method [15]. The Lagrange function (lower bound) of (30) is given by
$$L(\mathbf{p}, \boldsymbol{\mu}, \boldsymbol{\delta}, \nu) = \sum_{i=1}^{M} p_i + \sum_{i=1}^{M}\mu_i\left(R^{\min}_i - \log_2\left(1 + \frac{p_i\tilde{h}_i}{\sum_{j=i+1}^{M} p_j\tilde{h}_i + 1}\right)\right) + \sum_{i=1}^{M}\delta_i(-p_i) + \nu\left(\sum_{i=1}^{M} p_i - P^{\max}\right),$$
where $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_M]$, $\boldsymbol{\delta} = [\delta_1, \ldots, \delta_M]$, and $\nu$ are the Lagrange multipliers corresponding to constraints (30c), $p_i \ge 0$, $i = 1, \ldots, M$, and (30b), respectively. Since (30) is a minimization problem, the Lagrange dual problem is given by
$$\max_{\boldsymbol{\mu}, \boldsymbol{\delta}, \nu} \; \inf_{\mathbf{p}} \{L(\mathbf{p}, \boldsymbol{\mu}, \boldsymbol{\delta}, \nu)\} \quad \text{s.t.} \quad \mu_i \ge 0, \; \delta_i \ge 0, \quad \forall i = 1, \ldots, M, \quad \nu \ge 0.$$
The KKT conditions are listed below.
1) Feasibility of the primal problem (30):
C-1.1: $\log_2\left(1 + \frac{p^*_i\tilde{h}_i}{\sum_{j=i+1}^{M} p^*_j\tilde{h}_i + 1}\right) \ge R^{\min}_i, \; \forall i$; C-1.2: $p^*_i \ge 0, \; \forall i$; C-1.3: $\sum_{i=1}^{M} p^*_i \le P^{\max}$.
2) Feasibility of the dual problem:
C-2.1: $\mu^*_i \ge 0, \; \forall i = 1, \ldots, M$; C-2.2: $\delta^*_i \ge 0, \; \forall i = 1, \ldots, M$; C-2.3: $\nu^* \ge 0$.
3) The complementary slackness condition:
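C-3.1: $\mu^*_i\left(R^{\min}_i - \log_2\left(1 + \frac{p^*_i\tilde{h}_i}{\sum_{j=i+1}^{M} p^*_j\tilde{h}_i + 1}\right)\right) = 0, \; \forall i = 1, \ldots, M$; C-3.2: $\delta^*_i p^*_i = 0, \; \forall i = 1, \ldots, M$; C-3.3: $\nu^*\left(\sum_{i=1}^{M} p^*_i - P^{\max}\right) = 0$.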
4) The condition $\nabla_{\mathbf{p}^*} L(\mathbf{p}^*, \boldsymbol{\mu}^*, \boldsymbol{\delta}^*, \nu^*) = 0$, which implies
C-4: $\frac{\partial L}{\partial p^*_i} = 1 - \sum_{j=1}^{i-1}\mu^*_j\left(2^{R^{\min}_j} - 1\right)\tilde{h}_j - \mu^*_i\tilde{h}_i - \delta^*_i + \nu^* = 0, \; \forall i = 1, \ldots, M.$
Let $B_j = \left(2^{R^{\min}_j} - 1\right)\tilde{h}_j$, $j = 1, \ldots, M$. The latter equation is rewritten as
C-4.1: $\frac{\partial L}{\partial p^*_i} = 1 - \sum_{j=1}^{i-1}\mu^*_j B_j - \mu^*_i\tilde{h}_i - \delta^*_i + \nu^* = 0, \; \forall i = 1, \ldots, M.$
The dual variable $\delta^*_i$, $\forall i = 1, \ldots, M$, acts as a slack variable in C-4.1 (due to the KKT condition C-2.2), so it can be eliminated by reformulating the KKT conditions (C-4.1, C-2.2) and C-3.2, respectively, as
$$\nu^* \ge \sum_{j=1}^{i-1}\mu^*_j B_j + \mu^*_i\tilde{h}_i - 1, \quad \forall i = 1, \ldots, M, \tag{32}$$
and
$$p^*_i\left(\nu^* - \left(\sum_{j=1}^{i-1}\mu^*_j B_j + \mu^*_i\tilde{h}_i - 1\right)\right) = 0, \quad \forall i = 1, \ldots, M. \tag{33}$$
We first assume that $R^{\min}_i > 0, \forall i = 1, \ldots, M$. Then, we show that the derivations are valid for $R^{\min}_i = 0$ for some $i$. The assumption $R^{\min}_i > 0, \forall i = 1, \ldots, M$, implies that $p^*_i > 0, \forall i = 1, \ldots, M$, in C-1.2. According to (33), we have
$$\nu^* = \sum_{j=1}^{i-1}\mu^*_j B_j + \mu^*_i\tilde{h}_i - 1, \quad \forall i = 1, \ldots, M. \tag{34}$$
Consider two adjacent users $i$ and $i+1$. According to (34), we have
$$\sum_{j=1}^{i-1}\mu^*_j B_j + \mu^*_i\tilde{h}_i = \left(\sum_{j=1}^{i-1}\mu^*_j B_j + \mu^*_i B_i\right) + \mu^*_{i+1}\tilde{h}_{i+1},$$
which can be simplified to $\mu^*_i\tilde{h}_i = \mu^*_i B_i + \mu^*_{i+1}\tilde{h}_{i+1} \Rightarrow \mu^*_i\left(\tilde{h}_i - B_i\right) = \mu^*_{i+1}\tilde{h}_{i+1}$. In the following, we prove that $\mu^*_{i+1} < \mu^*_i$. Suppose that $\mu^*_{i+1} \ge \mu^*_i$. This implies that $\tilde{h}_{i+1} \le \tilde{h}_i - B_i$, which is equivalent to $\tilde{h}_{i+1} + B_i \le \tilde{h}_i$. Since $B_i > 0$, it results in $\tilde{h}_{i+1} < \tilde{h}_i$, which violates our assumption $\tilde{h}_{i+1} > \tilde{h}_i$. Accordingly, (34) holds only if $\mu^*_{i+1} < \mu^*_i$ for each pair of adjacent users $i$ and $i+1$. Hence, there exists $\nu^*$ satisfying (34) only if $\mu^*_M < \mu^*_{M-1} < \cdots < \mu^*_1$. According to Condition C-2.1, we have $\mu^*_M \ge 0$, which implies that $\mu^*_i > 0, \forall i = 1, \ldots, M-1$. The optimal Lagrange multiplier $\mu^*_M$ of the NOMA cluster-head user is also positive. This is due to the fact that $r_M(p^*_M) = \log_2(1 + p^*_M\tilde{h}_M)$ in (30c) is monotonically increasing in $p^*_M$ and independent of the other optimal powers. Hence, at the optimal point, which corresponds to the minimal $p^*_M$, the spectral efficiency $r_M(p^*_M)$ reaches its lower bound $R^{\min}_M$, i.e., $\log_2(1 + p^*_M\tilde{h}_M) = R^{\min}_M$. According to the KKT condition C-3.1, $\mu^*_M > 0$. As a result, we have $0 < \mu^*_M < \mu^*_{M-1} < \cdots < \mu^*_1$. According to Condition C-3.1, the optimal power of each user $i = 1, \ldots, M$ can be obtained from
$$\log_2\left(1 + \frac{p^*_i\tilde{h}_i}{\sum_{j=i+1}^{M} p^*_j\tilde{h}_i + 1}\right) = R^{\min}_i, \quad \forall i = 1, \ldots, M. \tag{35}$$
It is noteworthy that the duality gap between the primal and dual problems is zero when Slater's condition holds. This condition implies that there exists $\mathbf{p}$ such that the KKT condition C-1.3 holds with strict inequality, meaning that the feasible region of (30) with the strict-inequality power constraint $\sum_{i=1}^{M} p_i < P^{\max}$ is nonempty. Since $\sum_{i=1}^{M} p_i = P^{\max}$ corresponds to the maximum value of the objective function (30a), satisfying Slater's condition ensures that $\sum_{i=1}^{M} p^*_i < P^{\max}$. According to Condition C-3.3, we have $\nu^* = 0$.¹ Based on the above, it can be concluded that at the optimal point $\mathbf{p}^*$, the power allocated to each user $i$ only maintains its minimum spectral efficiency demand $R^{\min}_i$. According to (35), the optimal power (in Watts) of each user $i$ can be obtained as
$$p^*_i = T_i\left(\sum_{j=i+1}^{M} p^*_j\tilde{h}_i + 1\right), \quad \forall i = 1, \ldots, M, \tag{36}$$
where $T_i = \frac{2^{R^{\min}_i} - 1}{\tilde{h}_i}$, $\forall i = 1, \ldots, M$. For the case that $R^{\min}_i \to 0$, we have $T_i \to 0$. Therefore, $p^*_i \to 0$, meaning that no power is allocated to user $i$. Similar to (29), it can be easily shown that the optimal powers can be obtained directly by (36), here starting from user $M$ (for which the sum is empty) and proceeding down to user $1$. To obtain a closed-form expression for $p^*_i$, we rewrite (36) as
$$p^*_i = \beta_i\left(\frac{1}{\tilde{h}_i} + \sum_{j=i+1}^{M} p^*_j\right), \quad \forall i = 1, \ldots, M,$$
where, in this appendix, $\beta_i = 2^{R^{\min}_i} - 1$, $\forall i = 1, \ldots, M$. The optimal power $p^*_i$ can be reformulated as
$$\begin{aligned}
p^*_i &= \beta_i\left(\frac{1}{\tilde{h}_i} + \sum_{j=i+1}^{M} p^*_j\right) = \beta_i\left(\frac{1}{\tilde{h}_i} + p^*_{i+1} + \sum_{j=i+2}^{M} p^*_j\right) \\
&= \beta_i\left(\frac{1}{\tilde{h}_i} + \beta_{i+1}\left(\frac{1}{\tilde{h}_{i+1}} + \sum_{j=i+2}^{M} p^*_j\right) + \sum_{j=i+2}^{M} p^*_j\right) = \beta_i\left((1 + \beta_{i+1})\sum_{j=i+2}^{M} p^*_j + \frac{1}{\tilde{h}_i} + \frac{\beta_{i+1}}{\tilde{h}_{i+1}}\right) \\
&= \beta_i\left((1 + \beta_{i+1})\left(p^*_{i+2} + \sum_{j=i+3}^{M} p^*_j\right) + \frac{1}{\tilde{h}_i} + \frac{\beta_{i+1}}{\tilde{h}_{i+1}}\right) \\
&= \beta_i\left((1 + \beta_{i+1})\left(\beta_{i+2}\left(\frac{1}{\tilde{h}_{i+2}} + \sum_{j=i+3}^{M} p^*_j\right) + \sum_{j=i+3}^{M} p^*_j\right) + \frac{1}{\tilde{h}_i} + \frac{\beta_{i+1}}{\tilde{h}_{i+1}}\right) \\
&= \beta_i\left((1 + \beta_{i+1})(1 + \beta_{i+2})\sum_{j=i+3}^{M} p^*_j + \frac{1}{\tilde{h}_i} + \frac{\beta_{i+1}}{\tilde{h}_{i+1}} + \frac{\beta_{i+2}(1 + \beta_{i+1})}{\tilde{h}_{i+2}}\right) \\
&\;\;\vdots \\
&= \beta_i\left(\frac{1}{\tilde{h}_i} + \frac{\beta_{i+1}}{\tilde{h}_{i+1}} + \frac{\beta_{i+2}(1 + \beta_{i+1})}{\tilde{h}_{i+2}} + \cdots + \frac{\beta_M(1 + \beta_{M-1})\cdots(1 + \beta_{i+1})}{\tilde{h}_M}\right),
\end{aligned}$$
where the last step uses $\sum_{j=M+1}^{M} p^*_j = 0$, so the term multiplied by $(1 + \beta_{i+1})\cdots(1 + \beta_M)$ vanishes. According to the above, we have
$$p^*_i = \beta_i\left(\frac{1}{\tilde{h}_i} + \sum_{j=i+1}^{M}\frac{\beta_j\prod_{k=i+1}^{j-1}(1 + \beta_k)}{\tilde{h}_j}\right), \quad \forall i = 1, \ldots, M. \tag{37}$$
In multi-cell NOMA, let cell $b$ have $M$ users with $\tilde{h}_{b,1} < \tilde{h}_{b,2} < \cdots < \tilde{h}_{b,M}$, where $\tilde{h}_{b,i} = \frac{h_{b,i}}{I_{b,i} + \sigma_{b,i}}$. According to Corollary III.1.1, the achievable spectral efficiency of each user $i \in \mathcal{U}_b$ under the optimal decoding order $M \to M-1 \to \cdots \to 1$ is $\tilde{R}_{b,i}(\mathbf{p}) = \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}}{\sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i} + 1}\right)$. In multi-cell NOMA, when ICI is fixed, the power minimization problem of cell $b$ under the optimal (CINR-based) decoding order corresponds to the power minimization problem of single-cell NOMA. According to (37), for $\tilde{h}_{b,1} < \tilde{h}_{b,2} < \cdots < \tilde{h}_{b,M}$, the optimal power of each user $i \in \mathcal{U}_b$ can be obtained in closed form as
$$p^*_{b,i} = \beta_{b,i}\left(\frac{1}{\tilde{h}_{b,i}} + \sum_{j=i+1}^{M}\frac{\beta_{b,j}\prod_{k=i+1}^{j-1}(1 + \beta_{b,k})}{\tilde{h}_{b,j}}\right), \quad \forall i = 1, \ldots, M,$$
where $\beta_{b,i} = 2^{R^{\min}_{b,i}} - 1$.

¹ The discussion of the optimal $\nu^* = 0$ is only an additional note on the impact of the power constraint. We proved that for any nonempty feasible set satisfying Slater's condition, the power constraint is not active.

APPENDIX D
JOINT POWER ALLOCATION AND RATE ADOPTION ALGORITHM
By taking the logarithm of both sides of (14c), we have
$$\ln\left(2^{r_{b,i}} - 1\right) + \ln\left(\sum_{j\in\Phi_{b,i}} p_{b,j}h_{b,k} + I_{b,k} + \sigma_{b,k}\right) \le \ln(p_{b,i}h_{b,k}), \quad \forall b \in \mathcal{B}, \; i, k \in \mathcal{U}_b, \; k \in \{i\} \cup \Phi_{b,i}.$$
Now, let $p_{b,i} = e^{\tilde{p}_{b,i}}$, and subsequently $I_{b,i}(\tilde{\mathbf{p}}_{-b}) = \sum_{j\in\mathcal{B}, j\ne b}\left(\sum_{l\in\mathcal{U}_j} e^{\tilde{p}_{j,l}}\right)h_{j,b,i}$. Accordingly, problem (3) can be rewritten as
$$\max_{\tilde{\mathbf{p}}, \mathbf{r} \ge 0} \; \sum_{b\in\mathcal{B}}\sum_{i\in\mathcal{U}_b} r_{b,i} \tag{38a}$$
$$\text{s.t.} \quad (14b), \quad \sum_{i\in\mathcal{U}_b} e^{\tilde{p}_{b,i}} \le P^{\max}_b, \quad \forall b \in \mathcal{B}, \tag{38b}$$
$$\ln\left(2^{r_{b,i}} - 1\right) + \ln\left(\sum_{j\in\Phi_{b,i}} e^{\tilde{p}_{b,j}}h_{b,k} + \sum_{j\in\mathcal{B}, j\ne b}\sum_{l\in\mathcal{U}_j} e^{\tilde{p}_{j,l}}h_{j,b,k} + \sigma_{b,k}\right) \le \tilde{p}_{b,i} + \ln(h_{b,k}), \quad \forall b \in \mathcal{B}, \; i, k \in \mathcal{U}_b, \; k \in \{i\} \cup \Phi_{b,i}. \tag{38c}$$
The objective function (38a) is affine, and hence concave, in $\mathbf{r}$. Constraint (14b) is affine, and hence convex. Constraint (38b) is also convex since log-sum-exp is convex [15]. However, (38c) is nonconvex. In the left-hand side of (38c), it can be easily shown that the first term $\ln(2^{r_{b,i}} - 1)$ is strictly concave in $r_{b,i}$, which makes (38) nonconvex and strongly NP-hard. Now, we apply the iterative sequential programming method. At each iteration $t$, we approximate the term $g\left(r^{(t)}_{b,i}\right) = \ln\left(2^{r^{(t)}_{b,i}} - 1\right)$ by its first-order Taylor series around $r^{(t-1)}_{b,i}$ obtained from the prior iteration $(t-1)$ as follows:
$$\hat{g}\left(r^{(t)}_{b,i}\right) = g\left(r^{(t-1)}_{b,i}\right) + g'\left(r^{(t-1)}_{b,i}\right)\left(r^{(t)}_{b,i} - r^{(t-1)}_{b,i}\right), \tag{39}$$
where $g'(r) = \frac{2^r \ln 2}{2^r - 1}$. By substituting $g\left(r^{(t)}_{b,i}\right)$ in (38c) with its affine approximation $\hat{g}\left(r^{(t)}_{b,i}\right)$ in (39), problem (38) at iteration $t$ is approximated by the following convex problem:
$$\max_{\tilde{\mathbf{p}}^{(t)}, \mathbf{r}^{(t)} \ge 0} \; \sum_{b\in\mathcal{B}}\sum_{i\in\mathcal{U}_b} r^{(t)}_{b,i} \tag{40a}$$
$$\text{s.t.} \quad (14b), (38b),$$
$$\hat{g}\left(r^{(t)}_{b,i}\right) + \ln\left(\sum_{j\in\Phi_{b,i}} e^{\tilde{p}^{(t)}_{b,j}}h_{b,k} + \sum_{j\in\mathcal{B}, j\ne b}\sum_{l\in\mathcal{U}_j} e^{\tilde{p}^{(t)}_{j,l}}h_{j,b,k} + \sigma_{b,k}\right) \le \tilde{p}^{(t)}_{b,i} + \ln(h_{b,k}), \quad \forall b \in \mathcal{B}, \; i, k \in \mathcal{U}_b, \; k \in \{i\} \cup \Phi_{b,i}. \tag{40b}$$
It can be shown that (40) satisfies the KKT conditions [29], so it can be solved by using the Lagrange dual method or IPMs [15]. In sequential programming, we first initialize $\mathbf{r}^{(0)}$. At each iteration $t$, we solve (40) and find $\left(\mathbf{r}^{*(t)}, \tilde{\mathbf{p}}^{*(t)}\right)$ according to the updated $\hat{g}\left(r^{(t)}_{b,i}\right)$ based on $\mathbf{r}^{*(t-1)}$. We continue the iterations until convergence is achieved. The solution of (40) remains in the feasible region of the main problem (38).
This is dueto the fact that at each iteration t , we have ˆ g (cid:16) r ( t ) b,i (cid:17) ≥ g (cid:16) r ( t ) b,i (cid:17) . Let (cid:16) ˆ r ( t ) , ˆ˜ p ( t ) (cid:17) be the feasiblesolution of (40). It implies that (40b) is satisfied. Thus, we have ˆ˜ p ( t ) b,i ln ( h b,k ) − ln X j ∈ Φ b,i e ˆ˜ p ( t ) b,j h b,k + X j ∈B j = b X l ∈U j e ˆ˜ p ( t ) j,l h j,b,i + σ b,k ≥ ˆ g (cid:16) ˆ r ( t ) b,i (cid:17) , ∀ b ∈ B ,i, k ∈ U b , k ∈ { i } ∪ Φ b,i . Since ˆ g (cid:16) r ( t ) b,i (cid:17) ≥ g (cid:16) r ( t ) b,i (cid:17) , we can guarantee that (38c) is satisfied, meaning that (40) remains inthe feasible region of (38). It can be easily shown that the sequential programming generates asequence of improved feasible solutions, such that it converges to a stationary point which is alocal maxima of (38) [29]. The performance and convergence of this algorithm is numericallyevaluated in Subsection IV-E. A PPENDIX EP ROOF OF T HEOREM i, k ∈ U b , if (4) holds at any power level p − b , the decoding order k → i isoptimal. Note that (4) for cell b is completely independent from p b (see Proposition III.1). Letus rewrite (4) for the user pair i, k ∈ U b as h b,k σ b,k − h b,i σ b,i ≥ X j ∈B j = b P i ∈U b p j,i ! σ b,i σ b,k ( h j,b,k h b,i − h j,b,i h b,k ) . (41)For the case that the left-hand side (LHS) is non-negative and each term in the right-hand side(RHS) is non-positive, (41) is satisfied at any power level p − b [20]. These conditions imply thatwith h b,k σ b,k ≥ h b,i σ b,i and h b,k h b,i ≥ h j,b,k h j,b,i , ∀ j ∈ B \ { b } , the decoding order k → i is optimal. Now,assume that for BSs in Q b,i,k ⊆ B \ { b } we have h j,b,k h b,i − h j,b,i h b,k > . Moreover, each BS j ∈ Q b,i,k operates at its maximum power P max j , meaning that P i ∈U b p j,i = P max j , ∀ j ∈ Q b,i,k . In the following, we show that if h b,k σ b,k − h b,i σ b,i ≥ P j ∈Q b,i,k P max j σ b,i σ b,k ( h j,b,k h b,i − h j,b,i h b,k ) , the inequality(41) holds for any power level p − b . 
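Before the proof, the claim just stated can be checked numerically on a toy cell. The Python sketch below is a hypothetical helper, not code from the paper. For one user pair $(i,k)$ of cell $b$, it takes the serving gains `h_bi`, `h_bk`, the noise powers `s_i`, `s_k`, the ICI gains `h_int_i[j]` $= h_{j,b,i}$ and `h_int_k[j]` $= h_{j,b,k}$ of the interfering BSs, and their maximum powers `p_max`. `robust_order` evaluates the worst-case condition by charging only the BSs in $\mathcal Q_{b,i,k}$ at full power, while `order_holds` evaluates (41) at one ICI level in its equivalent CINR form $h_{b,k}/(I_{b,k}+\sigma_{b,k}) \ge h_{b,i}/(I_{b,i}+\sigma_{b,i})$. All channel values are made up for illustration.

```python
import random


def ici_coefficient(h_bi, h_bk, h_jbi, h_jbk, s_i, s_k):
    """Coefficient of BS j's total power on the RHS of (41)."""
    return (h_jbk * h_bi - h_jbi * h_bk) / (s_i * s_k)


def robust_order(h_bi, h_bk, s_i, s_k, h_int_i, h_int_k, p_max):
    """Sufficient condition of Appendix E for k -> i at every p_{-b}.

    Only interfering BSs with a positive coefficient (the set Q_{b,i,k})
    are charged at maximum power; the others can only relax (41).
    """
    lhs = h_bk / s_k - h_bi / s_i
    worst_rhs = 0.0
    for pm, h_jbi, h_jbk in zip(p_max, h_int_i, h_int_k):
        c = ici_coefficient(h_bi, h_bk, h_jbi, h_jbk, s_i, s_k)
        if c > 0:
            worst_rhs += pm * c
    return lhs >= worst_rhs


def order_holds(h_bi, h_bk, s_i, s_k, h_int_i, h_int_k, powers):
    """Inequality (41) at one ICI level, written as CINR_k >= CINR_i."""
    I_i = sum(P * g for P, g in zip(powers, h_int_i))
    I_k = sum(P * g for P, g in zip(powers, h_int_k))
    return h_bk / (I_k + s_k) >= h_bi / (I_i + s_i)


# Hypothetical toy cell with two interfering BSs.
link = dict(h_bi=1.0, h_bk=5.0, s_i=1.0, s_k=1.0,
            h_int_i=[0.1, 0.05], h_int_k=[0.2, 0.5])
p_max = [10.0, 10.0]

rng = random.Random(0)
samples = [[rng.uniform(0.0, pm) for pm in p_max] for _ in range(1000)]
print(robust_order(p_max=p_max, **link))
print(all(order_holds(powers=p, **link) for p in samples))
```

With these numbers only the second BS belongs to $\mathcal Q_{b,i,k}$ and the worst-case RHS is $2.5 \le 4$, so the order $k \to i$ holds at every sampled ICI level. Raising that BS's maximum power to 20 breaks the condition, and the order indeed fails at $\mathbf p_{-b} = (0, 20)$.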
To prove this claim, we first note that the inequality

$$\sum_{j \in \mathcal Q_{b,i,k}} P_j^{\max} \frac{h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k}}{\sigma_{b,i}\sigma_{b,k}} \ge \sum_{j \in \mathcal Q_{b,i,k}} \Big(\sum_{l \in \mathcal U_j} p_{j,l}\Big) \frac{h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k}}{\sigma_{b,i}\sigma_{b,k}} \tag{42}$$

always holds, since $h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k}$ is positive for each BS $j \in \mathcal Q_{b,i,k}$ and $\sum_{l \in \mathcal U_j} p_{j,l} \le P_j^{\max}$, $\forall j \in \mathcal Q_{b,i,k}$. On the other hand, for each BS $j' \in \mathcal B \setminus (\mathcal Q_{b,i,k} \cup \{b\})$, we have $h_{j',b,k} h_{b,i} - h_{j',b,i} h_{b,k} \le 0$, so the corresponding terms on the RHS of (41) are nonpositive. Accordingly, if $\frac{h_{b,k}}{\sigma_{b,k}} - \frac{h_{b,i}}{\sigma_{b,i}} \ge \sum_{j \in \mathcal Q_{b,i,k}} P_j^{\max} \frac{h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k}}{\sigma_{b,i}\sigma_{b,k}}$, then by (42) and the nonpositivity of the remaining terms, inequality (41) holds for any power level $\mathbf p_{-b}$, so the decoding order $k \to i$ is optimal at any feasible $\mathbf p_{-b}$.

APPENDIX F
PROOF OF FACT III.2.1

For convenience, let all channel gains be normalized by the noise power. According to (41), for each user pair $i, k \in \mathcal U_b$, $k \in \Phi_{b,i}$, constraint (17) can be rewritten as the following linear constraint:

$$h_{b,k} - h_{b,i} \ge \sum_{j \in \mathcal B, j \neq b} \Big(\sum_{l \in \mathcal U_j} p_{j,l}\Big)\big(h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k}\big). \tag{43}$$

Constraint (43) can be equivalently transformed into $(|\mathcal B| - 1)$ maximum power constraints involving all the BSs in $\mathcal B \setminus \{b\}$. The power consumption of each BS $j \in \mathcal B \setminus \{b\}$ with $h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} > 0$ is upper-bounded by

$$\sum_{l \in \mathcal U_j} p_{j,l} \le \underbrace{\frac{(h_{b,k} - h_{b,i}) - \sum_{l \in \mathcal B \setminus \{j,b\}} \big(\sum_{m \in \mathcal U_l} p_{l,m}\big)\big(h_{l,b,k} h_{b,i} - h_{l,b,i} h_{b,k}\big)}{h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k}}}_{\Psi_{j,b,i,k}(\mathbf p_{-(j,b)})}. \tag{44}$$

In general, the maximum power constraint (3b) and the SIC constraint (17) can be equivalently combined as

$$\sum_{l \in \mathcal U_b} p_{b,l} \le \min\Big\{P_b^{\max},\ \min_{j \in \mathcal B \setminus \{b\},\ i,k \in \mathcal U_j,\ k \in \Phi_{j,i}} \Psi_{b,j,i,k}(\mathbf p_{-(b,j)})\Big\}, \quad \forall b \in \mathcal B. \tag{45}$$

The negative impact of the necessary SIC constraint (17) can be observed in (45). Whenever $\min_{j \in \mathcal B \setminus \{b\},\ i,k \in \mathcal U_j,\ k \in \Phi_{j,i}} \Psi_{b,j,i,k}(\mathbf p_{-(b,j)}) < P_b^{\max}$, constraint (17) imposes an additional limit on the total power consumption of BS $b$, which further restricts the feasible region defined by (3b).

APPENDIX G
PROOF OF FACT III.2.2

Consider the sum-rate maximization problem of an $M$-user single-cell NOMA system with normalized channel gains $h_1 < h_2 < \dots < h_M$, and thus the optimal (CNR-based) decoding order $M \to M-1 \to \dots \to 1$. The achievable spectral efficiency of user $i$ is $R_i(\mathbf p) = \log_2\Big(1 + \frac{p_i h_i}{\sum_{j=i+1}^{M} p_j h_i + 1}\Big)$. The sum-rate function $\sum_{i=1}^{M} \log_2\Big(1 + \frac{p_i h_i}{\sum_{j=i+1}^{M} p_j h_i + 1}\Big)$ is monotonically increasing in $p_1$, i.e., the power allocated to the user with the lowest decoding order. Indeed, the signal of this user is decoded and canceled by all the other users $i = 2, \dots, M$, so $p_1$ appears only in $R_1(\mathbf p)$. Due to the power constraint $\sum_{i=1}^{M} p_i \le P^{\max}$ and the fact that $R_1(\mathbf p)$ is monotonically increasing in $p_1$, at the optimal point $\mathbf p^*$ we have $p_1^* = P^{\max} - \sum_{i=2}^{M} p_i^*$. In other words, $\sum_{i=1}^{M} p_i^* = P^{\max}$ at the optimal point, which completes the proof.

REFERENCES

[1] H. Weingarten, Y. Steinberg, and S. S. Shamai, “The capacity region of the Gaussian multiple-input multiple-output broadcast channel,” IEEE Transactions on Information Theory, vol. 52, no. 9, pp. 3936–3964, 2006.
[2] A. E. Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
[3] Y. Saito, Y. Kishiyama, A. Benjebbour, T. Nakamura, A. Li, and K. Higuchi, “Non-orthogonal multiple access (NOMA) for cellular future radio access,” in Proc. IEEE 77th Vehicular Technology Conference (VTC Spring), 2013, pp. 1–5.
[4] Z. Ding, X. Lei, G. K. Karagiannidis, R. Schober, J. Yuan, and V. K. Bhargava, “A survey on non-orthogonal multiple access for 5G networks: Research challenges and future trends,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 10, pp. 2181–2195, 2017.
[5] S. M. R. Islam, N. Avazov, O. A. Dobre, and K. Kwak, “Power-domain non-orthogonal multiple access (NOMA) in 5G systems: Potentials and challenges,” IEEE Communications Surveys & Tutorials, vol. 19, no. 2, pp. 721–742, 2017.
[6] W. Shin, M. Vaezi, B. Lee, D. J. Love, J. Lee, and H. V. Poor, “Non-orthogonal multiple access in multi-cell networks: Theory, performance, and practical challenges,” IEEE Communications Magazine, vol. 55, no. 10, pp. 176–183, 2017.
[7] M. Vaezi, R. Schober, Z. Ding, and H. V. Poor, “Non-orthogonal multiple access: Common myths and critical questions,” IEEE Wireless Communications, vol. 26, no. 5, pp. 174–180, 2019.
[8] O. Maraqa, A. S. Rajasekaran, S. Al-Ahmadi, H. Yanikomeroglu, and S. M. Sait, “A survey of rate-optimal power domain NOMA with enabling technologies of future wireless networks,” IEEE Communications Surveys & Tutorials, vol. 22, no. 4, pp. 2192–2235, 2020.
[9] L. You and D. Yuan, “A note on decoding order in user grouping and power optimization for multi-cell NOMA with load coupling,” IEEE Transactions on Wireless Communications, vol. 20, no. 1, pp. 495–505, 2021.
[10] P. Xu, Z. Ding, X. Dai, and H. V. Poor, “A new evaluation criterion for non-orthogonal multiple access in 5G software defined networks,” IEEE Access, vol. 3, pp. 1633–1639, 2015.
[11] J. Zhu, J. Wang, Y. Huang, S. He, X. You, and L. Yang, “On optimal power allocation for downlink non-orthogonal multiple access systems,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 12, pp. 2744–2757, 2017.
[12] M. S. Ali, E. Hossain, A. Al-Dweik, and D. I. Kim, “Downlink power allocation for CoMP-NOMA in multi-cell networks,” IEEE Transactions on Communications, vol. 66, no. 9, pp. 3982–3998, Sept. 2018.
[13] W. U. Khan, F. Jameel, T. Ristaniemi, S. Khan, G. A. S. Sidhu, and J. Liu, “Joint spectral and energy efficiency optimization for downlink NOMA networks,” IEEE Transactions on Cognitive Communications and Networking, vol. 6, no. 2, pp. 645–656, 2020.
[14] M. S. Ali, H. Tabassum, and E. Hossain, “Dynamic user clustering and power allocation for uplink and downlink non-orthogonal multiple access (NOMA) systems,” IEEE Access, vol. 4, pp. 6325–6343, 2016.
[15] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2009.
[16] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.
[17] R. D. Yates, “A framework for uplink power control in cellular radio systems,” IEEE Journal on Selected Areas in Communications, vol. 13, no. 7, pp. 1341–1347, 1995.
[18] Y. Fu, Y. Chen, and C. W. Sung, “Distributed power control for the downlink of multi-cell NOMA systems,” IEEE Transactions on Wireless Communications, vol. 16, no. 9, pp. 6207–6220, 2017.
[19] J. Cui, Y. Liu, Z. Ding, P. Fan, and A. Nallanathan, “QoE-based resource allocation for multi-cell NOMA networks,” IEEE Transactions on Wireless Communications, vol. 17, no. 9, pp. 6160–6176, 2018.
[20] L. You, D. Yuan, L. Lei, S. Sun, S. Chatzinotas, and B. Ottersten, “Resource optimization with load coupling in multi-cell NOMA,” IEEE Transactions on Wireless Communications, vol. 17, no. 7, pp. 4735–4749, 2018.
[21] D. Ni, L. Hao, Q. T. Tran, and X. Qian, “Transmit power minimization for downlink multi-cell multi-carrier NOMA networks,” IEEE Communications Letters, vol. 22, no. 12, pp. 2459–2462, 2018.
[22] L. Lei, L. You, Y. Yang, D. Yuan, S. Chatzinotas, and B. Ottersten, “Load coupling and energy optimization in multi-cell and multi-carrier NOMA networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 11, pp. 11323–11337, 2019.
[23] Y. Sun, D. W. K. Ng, Z. Ding, and R. Schober, “Optimal joint power and subcarrier allocation for full-duplex multicarrier non-orthogonal multiple access systems,” IEEE Transactions on Communications, vol. 65, no. 3, pp. 1077–1091, 2017.
[24] J. Zhao, Y. Liu, K. K. Chai, A. Nallanathan, Y. Chen, and Z. Han, “Spectrum allocation and power control for non-orthogonal multiple access in HetNets,” IEEE Transactions on Wireless Communications, vol. 16, no. 9, pp. 5825–5837, 2017.
[25] Z. Yang, C. Pan, W. Xu, Y. Pan, M. Chen, and M. Elkashlan, “Power control for multi-cell networks with non-orthogonal multiple access,” IEEE Transactions on Wireless Communications, vol. 17, no. 2, pp. 927–942, 2018.
[26] K. Wang, Y. Liu, Z. Ding, A. Nallanathan, and M. Peng, “User association and power allocation for multi-cell non-orthogonal multiple access networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 11, pp. 5284–5298, 2019.
[27] A. B. M. Adam, X. Wan, and Z. Wang, “Energy efficiency maximization in downlink multi-cell multi-carrier NOMA networks with hardware impairments,” IEEE Access, vol. 8, pp. 210054–210065, 2020.
[28] C. Liu and D. Liang, “Heterogeneous networks with power-domain NOMA: Coverage, throughput, and power allocation analysis,” IEEE Transactions on Wireless Communications, vol. 17, no. 5, pp. 3524–3539, 2018.
[29] A. Zappone, E. Björnson, L. Sanguinetti, and E. Jorswieck, “Globally optimal energy-efficient power control and receiver design in wireless networks,”