Optimal SIC Ordering and Power Allocation in Downlink Multi-Cell NOMA Systems

Abstract

In this work, we propose a globally optimal joint successive interference cancellation (SIC) ordering and power allocation (JSPA) algorithm for the sum-rate maximization problem in downlink multi-cell non-orthogonal multiple access (NOMA) systems. The proposed algorithm is based on the exploration of the base stations' (BSs') power consumption and on closed-form expressions of the optimal powers in each cell. Although the optimal JSPA algorithm scales well with the number of users, it is still exponential in the number of cells. For any suboptimal decoding order, we propose a low-complexity near-optimal joint rate and power allocation (JRPA) strategy in which the complete rate region of users is exploited. Furthermore, we design a near-optimal semi-centralized JSPA framework for a two-tier heterogeneous network that scales well with the number of small BSs and users. Numerical results show that JRPA significantly outperforms the case in which the users are forced to achieve their channel capacity by imposing the well-known SIC necessary condition on power allocation. Moreover, the proposed semi-centralized JSPA framework significantly outperforms the fully distributed framework, in which all the BSs operate at their maximum power budget. Therefore, the centralized JRPA and semi-centralized JSPA algorithms, with near-optimal performance, are good choices for larger numbers of cells and users.


Optimal SIC Ordering and Power Allocation in Downlink Multi-Cell NOMA Systems

Sepehr Rezvani, Eduard A. Jorswieck, Fellow, IEEE, Nader Mokari, Senior Member, IEEE, and Mohammad R. Javan, Senior Member, IEEE

Abstract

In this work, we consider the problem of finding the globally optimal joint successive interference cancellation (SIC) ordering and power allocation (JSPA) for the general sum-rate maximization problem in downlink multi-cell NOMA systems. We propose a globally optimal solution based on the exploration of base stations' (BSs') power consumption and distributed power allocation. The proposed centralized algorithm is still exponential in the number of BSs, but it scales well with the number of users. For any suboptimal decoding order, we address the problem of joint rate and power allocation (JRPA) to achieve the maximum sum-rate of users. Furthermore, we design semi-centralized and distributed JSPA frameworks with polynomial time complexity. Numerical results show that the optimal decoding order results in significant performance gains in terms of outage probability and total spectral efficiency of users compared to the channel-to-noise ratio (CNR)-based decoding order known from single-cell NOMA. Moreover, it is shown that the performance gap between our proposed centralized and semi-centralized frameworks is quite low. Therefore, the low-complexity semi-centralized framework with near-to-optimal performance is a good choice for larger numbers of BSs and users.

Index Terms

Multi-cell, NOMA, successive interference cancellation, optimal SIC ordering, power allocation.

S. Rezvani and E. A. Jorswieck are with the Department of Information Theory and Communication Systems, Technische Universität Braunschweig, Braunschweig, Germany (e-mails: {rezvani, jorswieck}@ifn.ing.tu-bs.de). N. Mokari is with the Department of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran (e-mail: [email protected]). M. R. Javan is with the Department of Electrical and Robotics Engineering, Shahrood University of Technology, Shahrood, Iran (e-mail: [email protected]).

I. INTRODUCTION

A. Motivations and Related Works

It is known that the channel capacity of degraded broadcast channels can be achieved by performing linear superposition coding (in power domain) at the transmitter side combined with multiuser detection algorithms, such as successive interference cancellation (SIC), at the receiver side [1], [2]. This technique is known as power-domain non-orthogonal multiple access (NOMA), which is considered as a new radio access technique for the fifth generation (5G) wireless networks and beyond [3]–[5]. (In this work, the term 'NOMA' refers to power-domain NOMA.) In information theory, the main purpose of NOMA is reducing the complexity of dirty paper coding (DPC) to attain the capacity region of degraded broadcast channels. In NOMA, the SIC decoding order among multiplexed users plays an important role in achieving the capacity region of degraded broadcast channels [6]–[8].

It is well-known that the downlink single-input single-output (SISO) Gaussian broadcast channels are degraded [1], [2]. Hence, NOMA with channel-to-noise ratio (CNR)-based decoding order is capacity achieving in SISO Gaussian broadcast channels, meaning that any rate region is a subset of the rate region of NOMA with CNR-based decoding order [1], [2], [9]. The superiority of single-cell NOMA over single-cell OMA is also well-known in information theory [5], [6], [10]. In single-cell NOMA, it is verified that the power allocation optimization is necessary to achieve the maximum spectral efficiency of users [4]–[8]. From the optimization perspective, the optimal (CNR-based) decoding order is independent from the power allocation, so it is robust and straightforward. Moreover, it is shown that the sum-rate function under the CNR-based decoding order is strictly concave in the powers [11]–[13]. In this way, the sum-rate maximization problem in downlink single-cell NOMA is convex. (The feasible region of the general power allocation problem in single-cell NOMA under the quality of service (QoS) requirement constraints is affine, so it is convex [11]–[15].) The latter convex problem can be efficiently solved by using the Lagrange dual method.

Unfortunately, the capacity-achieving schemes are unknown in downlink multi-cell networks, since the capacity region of the two-user downlink interference channel is still unknown in general [2], [6], [16]. In this work, we limit our study to the downlink single-antenna multi-cell network, where the signal of users who do not belong to the associated cell is fully treated as additive white Gaussian noise (AWGN), also called inter-cell interference (ICI) [17]–[28].

Inspired by the degradation of SISO Gaussian broadcast channels, NOMA with channel-to-interference-plus-noise ratio (CINR)-based decoding order has the same performance as DPC at each cell such that it achieves the channel capacity of this multi-cell network, called multi-cell NOMA. In this system, 'ICI+AWGN' can be viewed as equivalent noise power at the users. In contrast to single-cell NOMA, finding the optimal decoding order in multi-cell NOMA is challenging, because of the impact of ICI on the CINR of multiplexed users [18], [19], [22], [24]–[28]. The ICI at each cell is affected by the total power consumption of each neighboring (interfering) base station (BS). Therefore, the optimal SIC decoding order in each cell depends on the optimal power consumption of all the other interfering BSs. As a result, the optimal SIC decoding order in each cell should be jointly determined with centralized power allocation optimization in all the neighboring cells. It is shown that under the CINR-based decoding order, the ICI in the centralized total power minimization problem verifies the basic properties of the standard interference function [18], [22], [25]. Hence, the optimal joint SIC ordering and power allocation (JSPA) can be obtained by using the well-known Yates power control framework [17]. In other words, the globally optimal JSPA for the total power minimization problem in multi-cell NOMA can be found in an iterative distributed manner with a fast convergence speed [18], [21], [22], [25]. However, the ICI in the centralized sum-rate maximization problem does not verify the basic properties of the standard interference function. As a result, the Yates power control framework does not guarantee any global optimality for the sum-rate maximization problem [25]. It is shown that the sum-rate function in multi-cell NOMA is nonconcave in powers, due to the existing ICI, which makes the centralized sum-rate maximization problem nonconvex and strongly NP-hard [23]–[26]. The best candidate solution for solving this problem is monotonic optimization, which is still approximately exponential in the number of users [23], [29]. The JSPA needs to examine the monotonic-based power allocation $(M!)^B$ times, where $(M!)^B$ is the total number of possible decoding orders in $B$ cells each having $M$ users. Therefore, the joint optimization via monotonic optimization is basically impractical even for small numbers of BSs and users. In [23], [25], the authors address the sum-rate maximization problem in multi-cell NOMA for the CNR-based decoding order, resulting in suboptimal performance. In [24], [26], the SIC decoding orders are updated at the NOMA clustering subproblem, where the power allocation is fixed, and the centralized power allocation subproblem is solved according to the fixed decoding orders. Since the optimal decoding order inherently depends on ICI, the latter schemes cannot guarantee any global optimality, meaning that NOMA results in a lower performance than DPC. (Imposing the commonly-used SIC necessary condition in NOMA clustering among users would result in a suboptimal performance. The global optimality can be guaranteed only if, for some channel conditions, the optimal decoding orders are completely independent from the ICI levels at all the cells.) To the best of our knowledge, the optimal JSPA for maximizing users sum-rate in downlink multi-cell NOMA systems is still an open problem. Moreover, the performance gap between the optimal and suboptimal decoding orders in multi-cell NOMA is not yet addressed in the literature.

B. Our Contributions

In this work, we study the fundamentals of optimal/suboptimal decoding orders on the capacity region of the downlink multi-cell NOMA system including a single (fixed) NOMA cluster [1], [2], [18] in each cell. (In this work, we aim to investigate the impact of ICI on optimal decoding orders in the sum-rate maximization problem, which inherently depends on power allocation. Generalizing this model to multiple NOMA clusters within a cell is considered as a future work.) Our main contributions are presented as follows:
• We study the fundamentals of SIC in multi-cell NOMA systems. By analyzing the Karush-Kuhn-Tucker (KKT) optimality conditions, we prove that at any feasible power consumption level of BSs, only the NOMA cluster-head user (the user with the highest decoding order, which cancels the desired signal of all the other NOMA users within the cell) determined by the adaptive (optimal) SIC decoding order deserves additional power, while the users with lower decoding order get power to only maintain their individual minimal rate demands. Then, we obtain closed-form expressions of the optimal power allocation and SIC ordering in multi-cell NOMA for the given BSs power consumption.
• We propose a globally optimal JSPA algorithm for maximizing users sum-rate under the individual minimum rate demand of users. This algorithm utilizes both the exploration of BSs power consumption and distributed power allocation. We show that this algorithm is a good solution for a large number of users and a small number of BSs. (In practice, the number of interfering BSs is small anyway.)
• We analytically prove that under specific channel conditions, called the SIC sufficient condition, the CNR-based decoding order is optimal for a user pair independent from power allocation.
• We analyze the impact of any suboptimal decoding order on the capacity region of multi-cell NOMA. In contrast to [20], [23]–[27], we show that under any fixed (thus suboptimal) decoding order, joint rate and power allocation (JRPA) is necessary to achieve the channel capacity of users. For a suboptimal decoding order, we propose a near-to-optimal JRPA algorithm based on sequential programming with polynomial complexity. The convergence and performance of the JRPA algorithm for different initialization methods are investigated. The SIC sufficient condition is utilized to reduce the complexity of the JRPA algorithm.
• We prove that under any suboptimal decoding order, guaranteeing successful SIC at all the users by imposing the commonly-used SIC necessary constraint on power allocation [23]–[27] may significantly degrade the total spectral efficiency of users.
• We also propose a globally optimal power allocation for any fixed (suboptimal) decoding order under the SIC necessary constraint by modifying our proposed JSPA algorithm.
• We propose a semi-centralized framework for a two-tier heterogeneous network (HetNet) consisting of multiple femto BSs (FBSs) underlying a single macro BS (MBS). We numerically show that this framework has a near-to-optimal performance with significantly reduced complexity, so it is a good solution for practical implementations.

C. Paper Organization

The rest of this paper is organized as follows. Section II describes the general multi-cell NOMA system, and formulates the JSPA problem for maximizing users sum-rate. The solution algorithms are presented in Section III. Numerical results are provided in Section IV. Our conclusions are presented in Section V.

II. GENERAL DOWNLINK MULTI-CELL NOMA SYSTEM

Consider the downlink transmission of a multi-user single-carrier multi-cell NOMA system. The set of single-antenna BSs and the set of users served by BS $b$ are indicated by $\mathcal{B}$ and $\mathcal{U}_b$, respectively. According to the NOMA protocol, the users associated to the same transmitter form a NOMA cluster. The signal of users associated to other transmitters (known as ICI) is fully treated as AWGN at users within the NOMA cluster. Hence, we consider a single NOMA cluster at each cell $b$ [18] including $|\mathcal{U}_b|$ users, where $|\cdot|$ is the cardinality of a set. The term $k \to i$ indicates that user $k$ has a higher decoding order than user $i$ such that user $k$ is scheduled (and enforced) to decode and cancel the whole signal of user $i$, while the whole signal of user $k$ is treated as intra-NOMA interference (INI) at user $i$. For instance, assume that each cell $b$ serves $M$ users, i.e., $|\mathcal{U}_b| = M$. Generally, there are $M!$ possible decoding orders for users within the $M$-order NOMA cluster. Without loss of generality, let $k \to i$ if $k > i$, i.e., the SIC of NOMA in each cell $b$ follows $M \to M-1 \to \cdots \to 1$. As shown in Fig. 1 in [18], in this SIC decoding order, the signal of each user $i$ will be decoded prior to user $k > i$. In general, each user $i$ first decodes and cancels the signal of users $1, \dots, i-1$. Then, it decodes its desired signal such that the signal of users $i+1, \dots, M$ is treated as noise at user $i$ [5]. In this regard, in each cell, the NOMA cluster-head user $M$ does not experience any INI.

Let $s_{b,i} \sim \mathcal{CN}(0,1)$ be the desired signal of user $i \in \mathcal{U}_b$. Let $\lambda_{b,i,k} \in \{0,1\}$ denote the binary decoding decision indicator, where $\lambda_{b,i,k} = 1$ if user $k \in \mathcal{U}_b$ is scheduled to decode (and cancel when $k \neq i$) $s_{b,i}$, and $\lambda_{b,i,k} = 0$ otherwise. Since for each user pair within a NOMA cluster only one user can decode and cancel the signal of the other user, we have $\lambda_{b,i,k} + \lambda_{b,k,i} = 1, \forall b \in \mathcal{B}, i, k \in \mathcal{U}_b, k \neq i$.
Moreover, the signal of each user should be decoded at that user, meaning that $\lambda_{b,i,i} = 1, \forall b \in \mathcal{B}, i \in \mathcal{U}_b$. Due to the transitive nature of SIC ordering, if $\lambda_{b,i,k} = 1$ and $\lambda_{b,k,h} = 1$, then we should have $\lambda_{b,i,h} = 1$. In other words, we have $\lambda_{b,i,k} \lambda_{b,k,h} \leq \lambda_{b,i,h}, \forall b \in \mathcal{B}, i, k, h \in \mathcal{U}_b$. According to the SIC protocol, $s_{b,i}$ should be decoded at user $i \in \mathcal{U}_b$ as well as at all the users in $\Phi_{b,i} = \{k \in \mathcal{U}_b \setminus \{i\} \mid \lambda_{b,i,k} = 1\}$. (The term $\Phi_{b,i}$ is the set of users in cell $b$ with higher decoding orders than user $i \in \mathcal{U}_b$.) Therefore, in the SIC of NOMA, each user $i$ first decodes and subtracts each signal $s_{b,j}, \forall j \in \mathcal{U}_b \setminus \{\{i\} \cup \Phi_{b,i}\}$, then it decodes its desired signal $s_{b,i}$ such that the signal of users in $\Phi_{b,i}$ is treated as noise (called INI). According to the SIC protocol, the signal of user $i$ will be decoded prior to user $k$ if $|\Phi_{b,i}| > |\Phi_{b,k}|$. According to the above, the SIC decoding order among users can be determined by finding $\lambda_{b,i,k}$. Actually, $\lambda_{b,i,k} = 1, \forall b \in \mathcal{B}, i, k \in \mathcal{U}_b, k \neq i$ is equivalent to $k \to i$ in cell $b$.

Similar to [11]–[14], [18]–[28], we assume that the perfect channel state information (CSI) of all the users is available at the scheduler. The channel gain from BS $j \in \mathcal{B}$ to user $i \in \mathcal{U}_b$ is denoted by $g_{j,b,i}$. The allocated power from BS $b$ to user $i \in \mathcal{U}_b$ is denoted by $p_{b,i}$. After performing perfect SIC at each user $l \in \mathcal{U}_b \setminus \Phi_{b,i}$, the received signal of user $i \in \mathcal{U}_b$ at user $k \in \{i\} \cup \Phi_{b,i}$ is given by

$$y_{b,i,k} = \underbrace{\sqrt{p_{b,i}}\, g_{b,b,k}\, s_{b,i}}_{\text{intended signal}} + \underbrace{\sum_{j \in \Phi_{b,i}} \sqrt{p_{b,j}}\, g_{b,b,k}\, s_{b,j}}_{\text{INI}} + \underbrace{\sum_{\substack{j \in \mathcal{B} \\ j \neq b}} \sum_{l \in \mathcal{U}_j} \sqrt{p_{j,l}}\, g_{j,b,k}\, s_{j,l}}_{\text{ICI}} + N_{b,k}, \tag{1}$$

where the first and second terms are the received desired signal and INI of user $i \in \mathcal{U}_b$ at user $k \in \mathcal{U}_b$, respectively. The third term represents the ICI at user $k \in \mathcal{U}_b$. Moreover, $N_{b,k}$ is the AWGN at user $k \in \mathcal{U}_b$ with zero mean and variance $\sigma^2_{b,k}$. (In this work, we aim to address the performance gain of finding the optimal decoding order and power allocation for maximizing users total spectral efficiency in multi-cell NOMA systems. Considering any imperfect SIC is left as a future work.) Without loss of generality, assume that $|s_{b,i}| = 1, \forall b \in \mathcal{B}, i \in \mathcal{U}_b$, and $h_{j,b,k} = |g_{j,b,k}|^2$ [13], [18], [25]. According to (1), the SINR of user $k \in \Phi_{b,i}$ for decoding and canceling the signal of user $i \in \mathcal{U}_b$ is

$$\gamma_{b,i,k} = \frac{p_{b,i} h_{b,b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,b,k} + (I_{b,k} + \sigma^2_{b,k})},$$

where $\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,b,k}$ is the INI power of user $i \in \mathcal{U}_b$ (after perfect SIC) received at user $k \in \mathcal{U}_b$, and $I_{b,k} = \sum_{j \in \mathcal{B}, j \neq b} \sum_{l \in \mathcal{U}_j} p_{j,l} h_{j,b,k}$ is the received ICI power at user $k \in \mathcal{U}_b$. For the case that $k = i$, $\gamma_{b,i,i}$ denotes the SINR of user $i \in \mathcal{U}_b$ for decoding its desired signal $s_{b,i}$ after perfect SIC. For convenience, let $h_{b,b,i} \equiv h_{b,i}$, and $\gamma_{b,i,i} \equiv \gamma_{b,i}$. Furthermore, let us denote the matrix of all the decoding indicators by $\boldsymbol{\lambda} = [\lambda_{b,i,k}], \forall b \in \mathcal{B}, i, k \in \mathcal{U}_b$, in which $\boldsymbol{\lambda}_b = [\lambda_{b,i,k}], \forall i, k \in \mathcal{U}_b$, represents the decoding indicator matrix of users in $\mathcal{U}_b$. Moreover, $\boldsymbol{p}$ is the matrix of power allocation among all the users, in which $\boldsymbol{p}_b$ is the $b$-th row of this matrix indicating the power allocation vector of users in cell $b$. According to Shannon's capacity formula, the achievable spectral efficiency of user $i \in \mathcal{U}_b$ can be obtained by [4], [5]

$$R_{b,i}(\boldsymbol{p}, \boldsymbol{\lambda}_b) = \min_{k \in \{i\} \cup \Phi_{b,i}} \left\{ \log_2 \left( 1 + \frac{p_{b,i} h_{b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,k} + (I_{b,k}(\boldsymbol{p}_{-b}) + \sigma^2_{b,k})} \right) \right\}. \tag{2}$$

Note that the set $\Phi_{b,i}$ depends on $\boldsymbol{\lambda}_b$, although it is not explicitly shown in (2). The centralized total spectral efficiency maximization problem is formulated as

$$\text{JSPA}: \max_{\boldsymbol{p} \geq 0,\ \boldsymbol{\lambda} \in \{0,1\}} \ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(\boldsymbol{p}, \boldsymbol{\lambda}_b) \tag{3a}$$
$$\text{s.t.} \quad \sum_{i \in \mathcal{U}_b} p_{b,i} \leq P^{\max}_b, \ \forall b \in \mathcal{B}, \tag{3b}$$
$$R_{b,i}(\boldsymbol{p}, \boldsymbol{\lambda}_b) \geq R^{\min}_{b,i}, \ \forall b \in \mathcal{B}, i \in \mathcal{U}_b, \tag{3c}$$
$$\lambda_{b,i,k} + \lambda_{b,k,i} = 1, \ \forall b \in \mathcal{B}, i, k \in \mathcal{U}_b, k \neq i, \tag{3d}$$
$$\lambda_{b,i,k} \lambda_{b,k,h} \leq \lambda_{b,i,h}, \ \forall b \in \mathcal{B}, i, k, h \in \mathcal{U}_b, \tag{3e}$$
$$\lambda_{b,i,i} = 1, \ \forall b \in \mathcal{B}, i \in \mathcal{U}_b, \tag{3f}$$

where (3b) and (3c) are the per-BS maximum power and per-user minimum spectral efficiency constraints, respectively. $P^{\max}_b$ denotes the maximum power of BS $b$, and $R^{\min}_{b,i}$ is the minimum spectral efficiency demand of user $i \in \mathcal{U}_b$. The rest of the constraints are described above.

III. SOLUTION ALGORITHMS FOR THE SUM-RATE MAXIMIZATION PROBLEM

In this section, we propose globally optimal and suboptimal solutions for the main problem (3) under the centralized/decentralized resource management frameworks. Finally, we compare the computational complexity of the proposed resource allocation algorithms.

A. Centralized Resource Management Framework

In this subsection, we first propose a globally optimal JSPA algorithm for problem (3). Then, by considering a fixed SIC decoding order in (3), we propose two suboptimal rate adoption and power allocation algorithms.

1) Globally Optimal JSPA Algorithm:

Problem (3) can be classified as a mixed-integer nonlinear programming (MINLP) problem. The sum-rate function in (3a) is nonconcave in $\boldsymbol{p}$ and $\boldsymbol{\lambda}$, which makes (3) nonconvex and strongly NP-hard [23], [25]. Let us define the power consumption coefficient of BS $b$ as $\alpha_b \in [0,1]$ such that $\sum_{i \in \mathcal{U}_b} p_{b,i} = \alpha_b P^{\max}_b$. The received ICI power at user $i \in \mathcal{U}_b$ can be reformulated as $I_{b,i}(\boldsymbol{\alpha}_{-b}) = \sum_{j \in \mathcal{B}, j \neq b} \alpha_j P^{\max}_j h_{j,b,i}$. Let $\boldsymbol{\alpha} = [\alpha_b]_{1 \times B}$ be the BSs power coefficient vector. We prove that for any given $\boldsymbol{\alpha}_{-b}$, the optimal $(\boldsymbol{\lambda}^*_b, \boldsymbol{p}^*_b)$ can be obtained in closed form as follows:

Proposition III.1.

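In multi-cell NOMA, the optimal decoding order for the user pair $i, k \in \mathcal{U}_b$ is $k \to i$ if and only if $\tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b}) \leq \tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b})$, where $\tilde{h}_{b,l}(\boldsymbol{\alpha}_{-b}) = \frac{h_{b,l}}{I_{b,l}(\boldsymbol{\alpha}_{-b}) + \sigma^2_{b,l}}, l = i, k$. Therefore, $\lambda^*_{b,i,k} = 1$ if and only if $\tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b}) \leq \tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b})$.

Proof. Please see Appendix A.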
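The ordering rule of Proposition III.1 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the list-based data layout (`h_cross[j][l]` for the gain from BS `j` to user `l` of the considered cell) and the numerical values are hypothetical, and `noise` holds noise powers.

```python
def normalized_gains(h_own, h_cross, alpha, p_max, noise, b):
    """CINR-normalized gains of all users in cell b for given alpha_{-b}.

    h_own[l]       : gain from the serving BS b to user l (hypothetical layout)
    h_cross[j][l]  : gain from BS j to user l of cell b (entry j == b is unused)
    alpha[j], p_max[j] : power coefficient and power budget of BS j
    noise[l]       : noise power at user l
    """
    gains = []
    for l in range(len(h_own)):
        # ICI caused by the total transmit power of every interfering BS
        ici = sum(alpha[j] * p_max[j] * h_cross[j][l]
                  for j in range(len(alpha)) if j != b)
        gains.append(h_own[l] / (ici + noise[l]))
    return gains


def cinr_decoding_order(h_tilde):
    """User indices in ascending normalized gain; the last one is the cluster head."""
    return sorted(range(len(h_tilde)), key=lambda l: h_tilde[l])


# Two users in cell 0; user 1 has the larger CNR but is much closer to BS 1.
h_own = [1.0, 2.0]
h_cross = [[0.0, 0.0], [0.01, 2.0]]
noise = [1.0, 1.0]
p_max = [1.0, 1.0]

silent = normalized_gains(h_own, h_cross, [1.0, 0.0], p_max, noise, b=0)
loaded = normalized_gains(h_own, h_cross, [1.0, 1.0], p_max, noise, b=0)
print(cinr_decoding_order(silent), cinr_decoding_order(loaded))
```

In this toy setting the cluster head changes from user 1 to user 0 once BS 1 transmits at full power, which is the dependence of the optimal order on the neighboring BSs' power consumption.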

Remark III.1.1.

In multi-cell NOMA, the optimal SIC ordering is the decoding order based on the ascending order of users' channel gain normalized by ICI-plus-noise (called CINR-based decoding order). The optimal decoding order is independent from the power allocation policy within the cell. However, it depends on the received ICI, and subsequently on the total power consumption of neighboring (interfering) BSs.

Corollary III.1.1.

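For any given $\boldsymbol{\alpha}_{-b}$ and subsequently $\tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b}) \leq \tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b})$, at the optimal $\boldsymbol{\lambda}^*_b$, we have

$$\log_2 \left( 1 + \frac{p_{b,i} h_{b,i}}{\sum_{j \in \Phi^*_{b,i}} p_{b,j} h_{b,i} + (I_{b,i} + \sigma^2_{b,i})} \right) \leq \log_2 \left( 1 + \frac{p_{b,i} h_{b,k}}{\sum_{j \in \Phi^*_{b,i}} p_{b,j} h_{b,k} + (I_{b,k} + \sigma^2_{b,k})} \right). \tag{4}$$

According to (2) and (4), at the optimal $\boldsymbol{\lambda}^*_b$, we have $R_{b,i}(\boldsymbol{p}, \boldsymbol{\lambda}^*_b) = \log_2 \left( 1 + \frac{p_{b,i} h_{b,i}}{\sum_{j \in \Phi^*_{b,i}} p_{b,j} h_{b,i} + (I_{b,i} + \sigma^2_{b,i})} \right)$.

According to Remark III.1.1, the centralized power allocation and SIC ordering problems cannot be decoupled. However, for given $\boldsymbol{\alpha}_{-b}$, and subsequently $\boldsymbol{\lambda}^*_b$ (based on Proposition III.1), the optimal power $\boldsymbol{p}^*_b$ can be obtained in closed form according to the following proposition:

Proposition III.2.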

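Assume that $\boldsymbol{\alpha}_{-b}$ is fixed. For convenience, let $|\mathcal{U}_b| = M$, and $k > i$ if $\tilde{h}_{b,k}(\boldsymbol{\alpha}_{-b}) > \tilde{h}_{b,i}(\boldsymbol{\alpha}_{-b})$. According to Proposition III.1, the decoding order $M \to M-1 \to \cdots \to 1$ is optimal. The optimal powers in $\boldsymbol{p}^*_b$ can be obtained in closed form as follows:

$$p^*_{b,i} = \left[ \beta_{b,i} \left( \prod_{j=1}^{i-1} (1 - \beta_{b,j})\, \alpha_b P^{\max}_b + \frac{1}{\tilde{h}_{b,i}} - \sum_{j=1}^{i-1} \frac{\prod_{k=j+1}^{i-1} (1 - \beta_{b,k})\, \beta_{b,j}}{\tilde{h}_{b,j}} \right) \right]^+, \ \forall i = 1, \dots, M-1, \tag{5}$$

and

$$p^*_{b,M} = \left[ \alpha_b P^{\max}_b - \sum_{i=1}^{M-1} \beta_{b,i} \left( \prod_{j=1}^{i-1} (1 - \beta_{b,j})\, \alpha_b P^{\max}_b + \frac{1}{\tilde{h}_{b,i}} - \sum_{j=1}^{i-1} \frac{\prod_{k=j+1}^{i-1} (1 - \beta_{b,k})\, \beta_{b,j}}{\tilde{h}_{b,j}} \right) \right]^+, \tag{6}$$

where $\beta_{b,i} = \frac{2^{R^{\min}_{b,i}} - 1}{2^{R^{\min}_{b,i}}}, \forall i = 1, \dots, M-1$, and $[\cdot]^+ = \max\{\cdot, 0\}$.

Proof. Please see Appendix B.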
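As a numerical sanity check of Proposition III.2, the following Python sketch evaluates the powers through the recursion $p_{b,i} = \beta_{b,i} \big( \alpha_b P^{\max}_b - \sum_{j<i} p_{b,j} + 1/\tilde{h}_{b,i} \big)$, which unrolls to (5) and (6) whenever no term is clipped by $[\cdot]^+$. The function names and the numerical values are illustrative assumptions, not part of the original paper.

```python
import math


def optimal_cell_powers(h_tilde, r_min, budget):
    """Powers of one cell under the decoding order M -> ... -> 1.

    h_tilde : normalized gains sorted in ascending order (user 1 first)
    r_min   : minimum rates in bps/Hz, in the same order
    budget  : alpha_b * P_b^max, the power actually spent by the BS
    """
    powers, remaining = [], budget
    for i in range(len(h_tilde) - 1):
        beta = (2.0 ** r_min[i] - 1.0) / 2.0 ** r_min[i]
        # user i gets just enough power to meet its minimum rate
        p_i = max(beta * (remaining + 1.0 / h_tilde[i]), 0.0)
        powers.append(p_i)
        remaining -= p_i
    powers.append(max(remaining, 0.0))  # the cluster head receives the rest
    return powers


def own_rates(h_tilde, powers):
    """Rate of each user decoding its own signal; users j > i act as INI."""
    rates = []
    for i, p_i in enumerate(powers):
        ini = sum(powers[i + 1:])
        rates.append(math.log2(1.0 + p_i * h_tilde[i] / (ini * h_tilde[i] + 1.0)))
    return rates


powers = optimal_cell_powers([1.0, 4.0, 10.0], [1.0, 1.0, 1.0], budget=10.0)
rates = own_rates([1.0, 4.0, 10.0], powers)
print(powers, rates)
```

With these example values, the two weaker users sit exactly at their 1 bps/Hz demand and the whole remaining budget goes to the cluster head, in line with the KKT-based structure stated in the contributions.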

Remark III.2.1.

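For sufficiently large normalized channel gains $\tilde{h}_{b,i}, \forall i \in \mathcal{U}_b \setminus \{M\}$, (5) and (6) can be approximated by

$$p^*_{b,i} \approx \left[ \alpha_b P^{\max}_b \left( \beta_{b,i} \prod_{j=1}^{i-1} (1 - \beta_{b,j}) \right) \right]^+, \ \forall i = 1, \dots, M-1, \tag{7}$$

and

$$p^*_{b,M} \approx \left[ \alpha_b P^{\max}_b \left( 1 - \sum_{i=1}^{M-1} \left( \beta_{b,i} \prod_{j=1}^{i-1} (1 - \beta_{b,j}) \right) \right) \right]^+, \tag{8}$$

respectively.

Remark III.2.2.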

For sufficiently large normalized channel gains $\tilde{h}_{b,i}, \forall i \in \mathcal{U}_b \setminus \{M\}$, if users have the same minimum rate demand $R^{\min}$ in cell $b$, the optimal power coefficient of each user $i \in \mathcal{U}_b$, denoted by $q^*_{b,i} = \frac{p^*_{b,i}}{\alpha_b P^{\max}_b}$, can be obtained based on (7) and (8) as

$$q^*_{b,i} \approx \frac{2^{R^{\min}} - 1}{(2^{R^{\min}})^i}, \ \forall i = 1, \dots, M-1, \qquad q^*_{b,M} \approx \frac{1}{(2^{R^{\min}})^{M-1}}. \tag{9}$$

Remark III.2.1 shows that in the high SINR regions, the optimal powers are approximately insensitive to the exact channel gains. Hence, (7) and (8) are valid for the fast fading and/or imperfect CSI scenarios, where the CSI variations are small such that the optimal decoding order remains constant. (In this work, we considered the perfect CSI scenario. The impact of imperfect CSI on the closed form of optimal powers in multi-cell NOMA can be considered as a future work.) Moreover, Remark III.2.2 shows that the weaker user always gets more power than the stronger user for the same minimum rate demands. For instance, for the case that $R^{\min} = 1$ bps/Hz, regardless of the order of the NOMA cluster, nearly 50% of the available power $\alpha_b P^{\max}_b$ will be allocated to the weakest user, since $q^*_{b,1} \approx (2-1)/2$. The performance of these approximations is numerically evaluated in Subsection IV-F.

Our proposed globally optimal JSPA algorithm utilizes both the exploration of different values of $\alpha_b$ in $\boldsymbol{\alpha}_{1 \times B}$ and the distributed power allocation optimization. In this algorithm, we perform a greedy search on different values of each $\alpha_b$ in $\boldsymbol{\alpha}_{1 \times B}$. For given $\hat{\boldsymbol{\alpha}}$, we find the optimal $\boldsymbol{\lambda}^*$ and $\boldsymbol{p}^*$ according to Propositions III.1 and III.2, respectively. The pseudo code of the proposed globally optimal solution is presented in Alg. 1. This algorithm needs to explore all the possible values in $\boldsymbol{\alpha}_{1 \times B}$. For the total number of samples $S_\alpha$ for each $\alpha_b$, the complexity of Alg. 1 is $S_\alpha^B$. For the case that each cell has $M$ users, the complexity of exhaustive search is $S_p^{BM} \times (M!)^B$, where $S_p$ is the total number of samples for each $p_{b,i}$ in $\boldsymbol{p}$. Hence, Alg. 1 reduces the complexity of exhaustive search by a factor of $S_p^{B(M-1)} \times (M!)^B$ when $S_\alpha = S_p$. In fact, Alg. 1 has two main advantages: 1) The complexity is independent from the order of NOMA clusters, resulting in a low-complexity method for the scenarios with a large number of users and a small number of BSs; 2) The complexity of finding the optimal SIC ordering is negligible since for a fixed $\boldsymbol{\alpha}$, the optimal decoding order is obtained in closed form.

Algorithm 1 Optimal JSPA for Sum-Rate Maximization Problem.
Initialize the step size $\epsilon_\alpha \ll 1$, and $R^*_{\text{tot}} = 0$.
for each sample $\hat{\boldsymbol{\alpha}}$ do
    Update $\tilde{h}_{b,i} = \frac{h_{b,i}}{\hat{I}_{b,i} + \sigma^2_{b,i}}, \forall b \in \mathcal{B}, i \in \mathcal{U}_b$, where $\hat{I}_{b,i} = \sum_{j \in \mathcal{B}, j \neq b} \hat{\alpha}_j P^{\max}_j h_{j,b,i}$.
    Update $\boldsymbol{\lambda}$ according to $\lambda_{b,i,k} = 1$ if $\tilde{h}_{b,k} > \tilde{h}_{b,i}$, or equivalently update the users index according to $k > i$ if $\tilde{h}_{b,k} > \tilde{h}_{b,i}$.
    Find $\boldsymbol{p}$ according to (5) and (6).
    if $\sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(\boldsymbol{p}, \boldsymbol{\lambda}_b) > R^*_{\text{tot}}$ then
        Update $R^*_{\text{tot}} = \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(\boldsymbol{p}, \boldsymbol{\lambda}_b)$, $\boldsymbol{p}^* = \boldsymbol{p}$, and $\boldsymbol{\lambda}^* = \boldsymbol{\lambda}$.
    end if
end for
The outputs $\boldsymbol{\lambda}^*$ and $\boldsymbol{p}^*$ are the optimal solutions.

Since the proposed Alg. 1 is still exponential in the number of BSs, it is important to check the feasibility of problem (3) with a low-complexity algorithm before performing Alg. 1. The feasibility problem of (3) can be formulated as

$$\min_{\boldsymbol{p} \geq 0,\ \boldsymbol{\lambda} \in \{0,1\}} f(\boldsymbol{p}, \boldsymbol{\lambda}) \quad \text{s.t. (3b)-(3f)}, \tag{10}$$

where $f(\boldsymbol{p}, \boldsymbol{\lambda})$ can be any objective function such that the feasible region defined by (3b)-(3f) is a subset of the domain of $f(\boldsymbol{p}, \boldsymbol{\lambda})$. Finding a feasible solution for (10) is challenging, due to the binary variables in $\boldsymbol{\lambda}$. For the case that $f(\boldsymbol{p}) = \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} p_{b,i}$, it is proved that the ICI (in the total power minimization problem) under the CINR-based decoding order verifies the basic properties of the standard interference function [18]. Hence, the well-known iterative distributed power minimization framework can globally solve (10) with a fast convergence speed [18]. Here, we briefly present the structure of the iterative distributed power minimization algorithm proposed in [18]. Let $\boldsymbol{p}^{(t-1)}$ be the output of iteration $t-1$, which is the initial power matrix for iteration $t$ denoted by $\boldsymbol{p}^{(t)}$. At iteration $t$, we first update the optimal decoding order in cell 1 based on the updated ICI (according to Proposition III.1). Then, we find $\boldsymbol{p}^{*(t)}_1$ for the fixed $\boldsymbol{p}^{(t)}_{-1}$. After that, we substitute $\boldsymbol{p}^{(t)}_1$ with $\boldsymbol{p}^{*(t)}_1$. The updated $\boldsymbol{p}^{(t)}$ is an initial power matrix for the next cell 2. We repeat these steps at all the remaining cells. The updated power at cell $B$ is the output of iteration $t$. We continue the iterations until the convergence is achieved.
For the fixed $I_{b,i}$ and subsequently given $\boldsymbol{\lambda}^*_b$ in cell $b$, the optimal powers can be obtained in closed form as

$$p^*_{b,i} = \beta_{b,i} \left( \prod_{j=i+1}^{M} (1 + \beta_{b,j}) + \frac{1}{\tilde{h}_{b,i}} + \sum_{j=i+1}^{M} \frac{\beta_{b,j} \prod_{k=i+1}^{j-1} (1 + \beta_{b,k})}{\tilde{h}_{b,j}} \right), \ \forall i = 1, \dots, M, \tag{11}$$

where $\beta_{b,i} = 2^{R^{\min}_{b,i}} - 1, \forall i = 1, \dots, M$ and $\tilde{h}_{b,l} = \frac{h_{b,l}}{I_{b,l} + \sigma^2_{b,l}}$. For convenience, in (11), we assumed that $|\mathcal{U}_b| = M$ and also updated the users index based on the ascending order of $\tilde{h}_{b,i}$. For more details, please see Appendix C. The pseudo code of the proposed power minimization algorithm is presented in Alg. 2. Numerical assessments show that Alg. 2 has a very fast convergence speed [18]. The only concern is finding a feasible initial point $\boldsymbol{p}^{(0)}$. In Subsection IV-D, we provide comprehensive discussions about the impact of feasible/infeasible initial points on the convergence of Alg. 2.

Algorithm 2 Optimal Joint SIC Ordering and Power Allocation for Total Power Minimization Problem [18].
Initialize feasible $\boldsymbol{p}^{(0)}$, and tolerance $\epsilon_{\text{tol}}$ (sufficiently small).
while $\|\boldsymbol{p}^{(t-1)}\| - \|\boldsymbol{p}^{(t)}\| > \epsilon_{\text{tol}}$ do
    Set $t := t + 1$, and then update $\boldsymbol{p}^{(t)} := \boldsymbol{p}^{(t-1)}$.
    for $b = 1 : B$ do
        Update the ICI term $I_{b,i} = \sum_{j \in \mathcal{B}, j \neq b} \left( \sum_{l \in \mathcal{U}_j} p_{j,l} \right) h_{j,b,i}$ at cell $b$.
        Update $\boldsymbol{\lambda}_b$ in cell $b$ according to $k \to i$ if $\tilde{h}_{b,k} > \tilde{h}_{b,i}$, where $\tilde{h}_{b,l} = \frac{h_{b,l}}{I_{b,l} + \sigma^2_{b,l}}, \forall b \in \mathcal{B}, l \in \mathcal{U}_b$.
        Find $\boldsymbol{p}^{*(t)}_b$ using (11). Then, substitute $\boldsymbol{p}^{(t)}_b$ with $\boldsymbol{p}^{*(t)}_b$.
    end for
end while
The outputs $\boldsymbol{\lambda}^*$ and $\boldsymbol{p}^*$ are the optimal solutions.

It is proved that the optimal $\boldsymbol{\alpha}^*$ in the total power minimization problem is indeed a component-wise minimum [18]. It means that for any feasible $\hat{\boldsymbol{\alpha}} = [\hat{\alpha}_b], \forall b \in \mathcal{B}$ in (10), it can be guaranteed that $\alpha^*_b \leq \hat{\alpha}_b$. As a result, the exploration area of each $\alpha_b$ in Alg. 1 can be reduced from $[0,1]$ to $[\alpha^{\min}_b, 1]$, where $\alpha^{\min}_b$ denotes the optimal power consumption coefficient of BS $b$ in the total power minimization problem. The lower bounds $\alpha^{\min}_b, \forall b \in \mathcal{B}$ can significantly reduce the complexity of Alg. 1 for the case that $\alpha^{\min}_b$ grows, e.g., when the minimum rate demand of users increases.
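The exploration performed by Alg. 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the grid is uniform with `samples` points per $\alpha_b$ in $(0, 1]$ (so $1/$`samples` plays the role of the step size), samples whose cluster-head user misses its minimum rate are discarded, and all function names, the data layout `h[j][b][l]` and the example values are hypothetical.

```python
import itertools
import math


def solve_cell(h_own, ici, noise, r_min, budget):
    """Closed-form order and powers of one cell for fixed ICI and budget.

    Returns (sum_rate, order, powers), or None if the sample is infeasible.
    """
    h_tilde = [h / (i + n) for h, i, n in zip(h_own, ici, noise)]
    order = sorted(range(len(h_own)), key=lambda l: h_tilde[l])  # CINR-based order
    powers, remaining = {}, budget
    for l in order[:-1]:  # users below the cluster head meet r_min with equality
        beta = 1.0 - 2.0 ** (-r_min[l])
        powers[l] = beta * (remaining + 1.0 / h_tilde[l])
        remaining -= powers[l]
    head = order[-1]
    powers[head] = remaining
    if remaining <= 0.0:
        return None
    head_rate = math.log2(1.0 + remaining * h_tilde[head])
    if head_rate < r_min[head]:
        return None
    return head_rate + sum(r_min[l] for l in order[:-1]), order, powers


def jspa_grid_search(h, noise, r_min, p_max, samples):
    """Evaluate every grid point of alpha and keep the best feasible one.

    h[j][b][l] : gain from BS j to user l of cell b (hypothetical layout)
    """
    n_cells = len(p_max)
    grid = [(s + 1) / samples for s in range(samples)]
    best = None
    for alpha in itertools.product(grid, repeat=n_cells):
        total, solution = 0.0, []
        for b in range(n_cells):
            ici = [sum(alpha[j] * p_max[j] * h[j][b][l]
                       for j in range(n_cells) if j != b)
                   for l in range(len(noise[b]))]
            cell = solve_cell(h[b][b], ici, noise[b], r_min[b], alpha[b] * p_max[b])
            if cell is None:
                break
            total += cell[0]
            solution.append(cell)
        else:  # reached only when every cell was feasible for this alpha
            if best is None or total > best[0]:
                best = (total, alpha, solution)
    return best


h = [[[1.0, 4.0], [0.05, 0.02]],
     [[0.03, 0.01], [2.0, 5.0]]]
noise = [[0.1, 0.1], [0.1, 0.1]]
r_min = [[1.0, 1.0], [1.0, 1.0]]
p_max = [1.0, 1.0]
best = jspa_grid_search(h, noise, r_min, p_max, samples=10)
print(best[0], best[1])
```

The loop visits `samples ** n_cells` grid points regardless of the number of users per cell, which mirrors the $S_\alpha^B$ complexity discussed above; restricting each grid to $[\alpha^{\min}_b, 1]$ would shrink it further.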

2) Suboptimal SIC Ordering: Joint Rate and Power Allocation:

According to Proposition III.1, since $\boldsymbol{\lambda}^*$ in (3) depends on $\boldsymbol{\alpha}^*_{-b}$, any decoding order determined before power allocation is in general suboptimal. For any suboptimal decoding order, Corollary III.1.1 may not hold and thus, NOMA is not capacity-achieving. The main problem (3) for given $\boldsymbol{\lambda}$, or equivalently fixed $\Phi_{b,i}$, can be rewritten as

$$\max_{\boldsymbol{p} \geq 0} \ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(\boldsymbol{p}) \quad \text{s.t. (3b), (3c)}. \tag{12}$$

Constraint (3c) can be equivalently transformed to a linear form as follows:

$$2^{R^{\min}_{b,i}} \left( \sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,k} + I_{b,k} + \sigma^2_{b,k} \right) \leq \left( p_{b,i} + \sum_{j \in \Phi_{b,i}} p_{b,j} \right) h_{b,k} + I_{b,k} + \sigma^2_{b,k}, \ \forall b \in \mathcal{B}, i, k \in \mathcal{U}_b, k \in \{i\} \cup \Phi_{b,i}. \tag{13}$$

Hence, the feasible region of (12) is affine, so it is convex in $\boldsymbol{p}$. However, (12) is still strongly NP-hard, due to the nonconcavity of the sum-rate function with respect to $\boldsymbol{p}$. The optimal powers in (5) and (6) are derived based on the optimal decoding order. Hence, Alg. 1 is not applicable for solving (12). Besides, the complexity of exhaustive search for solving (12) is $S_p^{BM}$, which is exponential in the number of users. In the following, we apply the sequential programming approach [29] to find a suboptimal power allocation for (12). To tackle the non-differentiability of $R_{b,i}(\boldsymbol{p})$, we first apply the epigraph technique [15], and define $r_{b,i}$ as the adopted spectral efficiency of user $i \in \mathcal{U}_b$ such that $r_{b,i} \leq \log_2 \left( 1 + \frac{p_{b,i} h_{b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,k} + I_{b,k} + \sigma^2_{b,k}} \right), \forall b \in \mathcal{B}, i, k \in \mathcal{U}_b, k \in \{i\} \cup \Phi_{b,i}$. Hence, (12) can be equivalently transformed to the following problem:

$$\text{JRPA}: \max_{\boldsymbol{p} \geq 0,\ \boldsymbol{r} \geq 0} \ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} r_{b,i} \tag{14a}$$
$$\text{s.t. (3b)}, \quad r_{b,i} \geq R^{\min}_{b,i}, \ \forall b \in \mathcal{B}, i \in \mathcal{U}_b, \tag{14b}$$
$$r_{b,i} \leq \log_2 \left( 1 + \frac{p_{b,i} h_{b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,k} + I_{b,k} + \sigma^2_{b,k}} \right), \ \forall b \in \mathcal{B}, i, k \in \mathcal{U}_b, k \in \{i\} \cup \Phi_{b,i}, \tag{14c}$$

where $\boldsymbol{r} = [r_{b,i}], \forall b \in \mathcal{B}, i \in \mathcal{U}_b$. Although the objective function (14a) and constraints (3b) and (14b) are affine, problem (14) is still nonconvex due to the nonconvexity of (14c). The derivations of the proposed sequential programming method for solving (14) are presented in Appendix D. The pseudo code of our proposed JRPA algorithm is shown in Alg. 3.

Algorithm 3 Suboptimal JRPA Algorithm Based on the Sequential Programming.
Initialize $\boldsymbol{r}^{(0)}$, iteration index $t = 0$, and tolerance $\epsilon_s \ll 1$.
while $\|\boldsymbol{r}^{(t)} - \boldsymbol{r}^{(t-1)}\| > \epsilon_s$ do
    Set $t := t + 1$.
    Update the approximation parameter $\hat{g}(r^{(t)}_{b,i})$ based on $\boldsymbol{r}^{*(t-1)}$.
    Find $(\boldsymbol{r}^{*(t)}, \tilde{\boldsymbol{p}}^{*(t)})$ by solving the convex approximated problem (40).
end while
Set $p^*_{b,i} = e^{\tilde{p}^*_{b,i}}, \forall b \in \mathcal{B}, i \in \mathcal{U}_b$.
The outputs $(\boldsymbol{r}^{*(t)}, \boldsymbol{p}^{*(t)})$ are adopted for the network.

The decoding order for some user pairs is independent from the ICI, so it can be determined prior to power allocation optimization. For a specific channel gain condition, we prove that the CNR-based decoding order is optimal for a user pair independent from power allocation.

Theorem 1. (SIC sufficient condition)

For each user pair $i, k \in \mathcal{U}_b$ with $\frac{h_{b,k}}{\sigma_{b,k}} \geq \frac{h_{b,i}}{\sigma_{b,i}}$, if

$$\frac{h_{b,k}}{\sigma_{b,k}} - \frac{h_{b,i}}{\sigma_{b,i}} \geq \sum_{j \in \mathcal{Q}_{b,i,k}} \frac{P^{\max}_j}{\sigma_{b,i} \sigma_{b,k}} \big( h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} \big), \tag{15}$$

where $\mathcal{Q}_{b,i,k} = \big\{ j \in \mathcal{B} \setminus \{b\} \ \big| \ \frac{h_{b,k}}{h_{b,i}} < \frac{h_{j,b,k}}{h_{j,b,i}} \big\}$, the decoding order $k \to i$ is optimal.

Proof. Please see Appendix E.

Theorem 1 shows that finding the optimal decoding order is challenging only for the user pairs whose ICI affects the sign of their CINR difference. Assume that the suboptimal CNR-based decoding order is applied in (14), i.e., $k \to i$ or $\lambda_{b,i,k} = 1$ if $\frac{h_{b,k}}{\sigma_{b,k}} \geq \frac{h_{b,i}}{\sigma_{b,i}}$. Let us define the set of users with higher decoding orders than user $i \in \mathcal{U}_b$ that satisfy the SIC sufficient condition by

$$\Phi^{\text{CNR}}_{b,i} = \Big\{ k \in \mathcal{U}_b \setminus \{i\} \ \Big| \ \lambda_{b,i,k} = 1,\ \frac{h_{b,k}}{\sigma_{b,k}} - \frac{h_{b,i}}{\sigma_{b,i}} \geq \sum_{j \in \mathcal{Q}_{b,i,k}} \frac{P^{\max}_j}{\sigma_{b,i} \sigma_{b,k}} \big( h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} \big) \Big\}.$$

Obviously, we have $\Phi^{\text{CNR}}_{b,i} \subseteq \Phi_{b,i}$. Based on Theorem 1, it is required to check the feasibility of constraint (14c) only for users in $\{i\} \cup \Phi_{b,i} \setminus \Phi^{\text{CNR}}_{b,i}$, which significantly reduces the complexity of our proposed JRPA algorithm. For the case that $\Phi^{\text{CNR}}_{b,i} = \Phi_{b,i}$, we can guarantee that Corollary III.1.1 holds for user $i \in \mathcal{U}_b$, meaning that $r^*_{b,i} = \tilde{R}_{b,i}(p^*, \lambda^*_b)$. Finally, for the case that the SIC sufficient condition holds for every user pair within cell $b$, i.e., $\Phi^{\text{CNR}}_{b,i} = \Phi_{b,i}, \forall i \in \mathcal{U}_b$, the CNR-based decoding order is optimal in cell $b$, meaning that the number of constraints in (14c) will be reduced to one in that cell.

Similar to problem (10), it can be shown that the feasible region of (12) is the same as the feasible region of the following total power minimization problem:

$$\min_{p \geq 0,\ r \geq 0} \ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} p_{b,i} \quad \text{s.t. (3b), (13).} \tag{16}$$

Since the objective function and all the constraints are affine, problem (16) is a linear program, and is therefore convex. Hence, (16) can be solved by using Dantzig's simplex method or interior-point methods (IPMs) [15]. The solution of (16) can be utilized as an initial feasible point for Alg. 3.
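The check in Theorem 1 only needs channel gains, noise levels, and the power budgets of the interfering BSs, so it can be evaluated before any power allocation. The following is a minimal sketch of that test for one user pair, not the authors' implementation; the function name `sic_sufficient` and the dictionary-based argument layout are assumptions made for illustration, and all gains are assumed strictly positive.

```python
from typing import Dict


def sic_sufficient(
    h_bk: float,
    h_bi: float,
    sigma_bk: float,
    sigma_bi: float,
    p_max: Dict[int, float],
    h_cross_k: Dict[int, float],
    h_cross_i: Dict[int, float],
) -> bool:
    """Evaluate condition (15) for the pair (i, k) served by BS b.

    p_max[j]     : power budget of interfering BS j (j != b)
    h_cross_k[j] : gain from BS j to user k, i.e. h_{j,b,k}
    h_cross_i[j] : gain from BS j to user i, i.e. h_{j,b,i}
    Returns True when the order k -> i is guaranteed by Theorem 1.
    (Hypothetical helper; names are not from the paper.)
    """
    # Precondition of the theorem: user k has the larger CNR.
    if h_bk / sigma_bk < h_bi / sigma_bi:
        return False
    lhs = h_bk / sigma_bk - h_bi / sigma_bi
    rhs = 0.0
    for j, budget in p_max.items():
        # j belongs to Q_{b,i,k} iff h_bk / h_bi < h_{j,b,k} / h_{j,b,i},
        # which for positive gains is the same as gap > 0.
        gap = h_cross_k[j] * h_bi - h_cross_i[j] * h_bk
        if gap > 0:
            rhs += budget * gap / (sigma_bi * sigma_bk)
    return lhs >= rhs
```

Pairs for which the function returns True can be placed in $\Phi^{\text{CNR}}_{b,i}$; only the remaining pairs need the extra constraints in (14c).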

3) Suboptimal SIC Ordering: Power Allocation for a Fixed Rate Region:

In Subsection III-A2, we showed that for any suboptimal decoding order (fixed $\lambda$), Corollary III.1.1 may not hold, so joint power allocation and rate adoption is necessary to achieve the maximum possible spectral efficiency of the users. A number of research studies on multi-cell NOMA assume a fixed rate region for each user as

$$\tilde{R}_{b,i} = \log_2 \Big( 1 + \frac{p_{b,i} h_{b,i}}{\sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,i} + \big( I_{b,i}(p_{-b}) + \sigma_{b,i} \big)} \Big),$$

which eliminates the search for the optimal rate allocation [20], [23], [25]–[27]. According to (2), to guarantee that user $i \in \mathcal{U}_b$ achieves its Shannon capacity for decoding its desired signal after successful SIC at users, the condition in (4) should be satisfied for each user $k \in \Phi_{b,i}$. For a fixed $\lambda$, and hence fixed $\Phi_{b,i}$, (4) can be rewritten as

$$\frac{h_{b,i}}{I_{b,i} + \sigma_{b,i}} \leq \frac{h_{b,k}}{I_{b,k} + \sigma_{b,k}}, \quad \forall b \in \mathcal{B},\ i, k \in \mathcal{U}_b,\ k \in \Phi_{b,i}. \tag{17}$$

The constraint in (17) is known as the SIC necessary condition [20], [23], [25].

Fact III.2.1.

In multi-cell NOMA, the SIC necessary condition (17) for each user pair $i, k \in \mathcal{U}_b$, $k \in \Phi_{b,i}$ implies additional maximum power constraints on the BSs in $\mathcal{B} \setminus \{b\}$.

Proof. Please see Appendix F.
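To see informally why (17) restricts the other BSs, one can cross-multiply it. The sketch below is an illustration rather than the proof in Appendix F; it assumes the ICI is written as $I_{b,i} = \sum_{j \in \mathcal{B} \setminus \{b\}} \alpha_j P^{\max}_j h_{j,b,i}$, where $\alpha_j \in [0, 1]$ is the power consumption coefficient of BS $j$.

```latex
% Cross-multiplying (17), with I_{b,i} = \sum_{j \ne b} \alpha_j P^{\max}_j h_{j,b,i} (assumed form):
\begin{align*}
\frac{h_{b,i}}{I_{b,i} + \sigma_{b,i}} \le \frac{h_{b,k}}{I_{b,k} + \sigma_{b,k}}
&\iff h_{b,i}\,(I_{b,k} + \sigma_{b,k}) \le h_{b,k}\,(I_{b,i} + \sigma_{b,i}) \\
&\iff \sum_{j \in \mathcal{B} \setminus \{b\}} \alpha_j P^{\max}_j
      \big( h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} \big)
      \le h_{b,k}\sigma_{b,i} - h_{b,i}\sigma_{b,k}.
\end{align*}
% With a single interfering BS j and a positive coefficient, this reads
% \alpha_j \le \frac{h_{b,k}\sigma_{b,i} - h_{b,i}\sigma_{b,k}}
%                   {P^{\max}_j \,( h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} )},
% i.e. an extra upper bound on the power of BS j.
```

The inequality is linear in the coefficients $\alpha_j$, and every BS $j$ with a positive coefficient sees its usable power capped. Dividing both sides by $\sigma_{b,i} \sigma_{b,k}$ and setting $\alpha_j = 1$ on the positive terms recovers the worst-case form used in (15).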

Remark III.2.3.

Restricting the rate region of users under the suboptimal decoding order results in additional limitations on the BSs power consumption. For the user pairs satisfying the SIC sufficient condition, the decoding order is optimal, and subsequently the SIC constraint (17) will be completely independent of the power allocation (see Corollary III.1.1). As a result, the negative side impact of the SIC necessary condition on power allocation will be eliminated.

According to (3), for any fixed $\lambda$, the power allocation problem for the suboptimal decoding order and fixed rate region, called fixed-rate-region power allocation (FRPA), is formulated by

$$\text{FRPA}: \ \max_{p \geq 0} \ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} \tilde{R}_{b,i}(p) \tag{18a}$$
$$\text{s.t. (3b), (17),} \tag{18b}$$
$$\tilde{R}_{b,i}(p) \geq R^{\min}_{b,i}, \quad \forall b \in \mathcal{B},\ i \in \mathcal{U}_b. \tag{18c}$$

Constraints (18c) and (17) can be equivalently transformed to linear forms in $p$ as (13) and (41), respectively. Therefore, the feasible region of (18) is described by affine constraints, and is convex. However, the sum-rate function in (18a) is nonconcave, due to the existing ICI, which makes the main problem (18) nonconvex and strongly NP-hard [23], [25]. Problem (18) can be solved by using the monotonic optimization proposed in [29]. However, the complexity of monotonic optimization is still exponential in the number of users [29]. According to Corollary III.1.1, under the optimal decoding order $\lambda^*$, the main problem (3) can be rewritten as

$$\max_{p \geq 0} \ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} \tilde{R}_{b,i}(p) \tag{19a}$$
$$\text{s.t. (3b),} \tag{19b}$$
$$\tilde{R}_{b,i}(p) \geq R^{\min}_{b,i}, \quad \forall b \in \mathcal{B},\ i \in \mathcal{U}_b. \tag{19c}$$

The feasible regions of (19) and (18) are the same when the SIC necessary condition (17) is removed from (18). According to Fact III.2.1, the constraint (17) adds an additional restriction on the feasible region of (18). As a result, the feasible region of (18) is a subset of the feasible region of (19). In this regard, the optimal solution of (18) lies in the intersection of the feasible region of (19) and constraint (17). Accordingly, Alg. 1 can be modified based on the fixed $\lambda$ and SIC constraint (17) to find the globally optimal solution of (18). The pseudocode of the globally optimal solution for (18) is presented in Alg. 4. The main difference of Algs. 1 and 4 is Step . In Alg. 4, we check the SIC necessary condition for the fixed decoding order instead of updating the SIC decoding orders as in Alg. 1. Although Alg. 4 scales well with any number of multiplexed users, it is still exponential in the number of BSs (see Subsection III-A1). One solution is to apply the well-known suboptimal power allocation based on the sequential programming method proposed in [23], [25], [29].

Algorithm 4

Optimal Power Allocation for Fixed SIC Ordering and Rate Region.
  Initialize the step size $\epsilon_\alpha = 1/S_\alpha \ll 1$, where $S_\alpha$ is the number of samples for each $\alpha_b$, and $R^*_{\text{tot}} = 0$.
  for each sample $\hat{\alpha}$ do
    Update $\tilde{h}_{b,i} = \frac{h_{b,i}}{\hat{I}_{b,i} + \sigma_{b,i}}, \forall b \in \mathcal{B}, i \in \mathcal{U}_b$, where $\hat{I}_{b,i} = \sum_{j \in \mathcal{B}, j \neq b} \hat{\alpha}_j P^{\max}_j h_{j,b,i}$.
    if $\frac{h_{b,i}}{I_{b,i} + \sigma_{b,i}} \leq \frac{h_{b,k}}{I_{b,k} + \sigma_{b,k}}, \forall b \in \mathcal{B}, i, k \in \mathcal{U}_b, k \in \Phi_{b,i}$ then
      Find $p$ according to (5) and (6).
      if $\sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(p) > R^*_{\text{tot}}$ then
        Update $R^*_{\text{tot}} = \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R_{b,i}(p)$, and $p^* = p$.
      end if
    end if
  end for
  The output $p^*$ is the optimal solution of (18).

A feasible solution for (18) can be obtained by solving the following total power minimization problem:

$$\min_{p \geq 0} \ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} p_{b,i} \quad \text{s.t. (3b), (17), (18c).} \tag{20}$$

Problem (20) is a linear program which can be solved by using Dantzig's simplex method.

B. Decentralized Resource Management Frameworks

1) Fully Distributed JSPA Framework:

Although the globally optimal solution in Subsection III-A1 achieves the channel capacity of users, the complexity of Alg. 1 is still exponential in the number of BSs. Moreover, the centralized framework would cause a large signaling overhead. Here, we propose a fully distributed resource allocation framework in which each BS independently allocates power to its associated users. In effect, we divide the main problem (3) into $B$ single-cell NOMA problems.

Fact III.2.2.

At the optimal point of the sum-rate maximization problem of single-cell NOMA, the BS operates at its maximum available power. It means that the power constraint (3b) is active for each BS $b$, i.e., $\sum_{i \in \mathcal{U}_b} p^*_{b,i} = P^{\max}_b, \forall b \in \mathcal{B}$ in the fully distributed framework.

Proof.

Please see Appendix G.

According to Fact III.2.2, we set $\alpha^*_b = 1, \forall b \in \mathcal{B}$. Based on the given $\alpha^*$, the optimal decoding order of users under the fully distributed framework can be easily obtained by Proposition III.1. According to Subsection III-A1, the optimal power $p^*$ under the fully distributed framework can be obtained by using Proposition III.2.
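Once every $\alpha_b$ is fixed, the ICI at each user is a known constant, so the users of a cell can be ranked without any iteration. The sketch below illustrates this step under the assumption that the ordering rule of Proposition III.1 (not restated in this section) ranks users by their CINR $h_{b,i} / (I_{b,i} + \sigma_{b,i})$, as suggested by (17); the function name and data layout are hypothetical.

```python
from typing import Dict, List


def cinr_decoding_order(
    h: Dict[str, float],
    h_cross: Dict[str, Dict[int, float]],
    sigma: Dict[str, float],
    alpha: Dict[int, float],
    p_max: Dict[int, float],
) -> List[str]:
    """Rank the users of one cell from lowest to highest CINR.

    h[u]          : gain from the serving BS to user u
    h_cross[u][j] : gain from interfering BS j to user u
    alpha[j]      : power consumption coefficient of BS j (1.0 = full power)
    Under the assumed rule, a user cancels the signals of all users that
    appear before it in the returned list.
    """
    cinr = {}
    for u in h:
        # ICI at user u for the given power coefficients of the other BSs.
        ici = sum(alpha[j] * p_max[j] * h_cross[u][j] for j in alpha)
        cinr[u] = h[u] / (ici + sigma[u])
    return sorted(cinr, key=cinr.get)
```

For example, with `alpha = {2: 1.0}` the fully distributed setting is reproduced; calling the function again with a smaller coefficient shows how a change in the interfering BS power can swap the order of two users.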

2) Semi-Centralized JSPA Framework:

The fully distributed framework works well for the case that the ICI levels are significantly low, so that $\alpha^*_b \to 1, \forall b \in \mathcal{B}$ in problem (3). For the case that the ICI levels are high, e.g., at femto-cells underlying a single MBS [28], the fully distributed framework may seriously degrade the spectral efficiency of femto-cell users. In the following, we propose a semi-centralized JSPA framework in which we assume that the low-power FBSs operate at their maximum power, while the MBS power consumption is obtained by the joint power allocation and SIC ordering of all the users. Let $b = 1$ be the MBS's index, and $b = 2, \dots, B$ be the indices of the FBSs. In this framework, we assume that $\alpha_b = 1, \forall b = 2, \dots, B$. Then, we utilize Alg. 1 to find the globally optimal $p^*$ and $\lambda^*$ for problem (3). This algorithm performs a grid search on $0 \leq \alpha_1 \leq 1$. Hence, the complexity of finding the optimal JSPA is on the order of the total number of samples for $\alpha_1$, i.e., $S_\alpha$. The computational complexity of this algorithm is therefore independent of the number of FBSs and users, which makes it a good solution for large-scale systems. The performance of the semi-centralized framework depends on the FBSs optimal power consumption in (3). For the case that $\alpha^*_b \to 1, \forall b = 2, \dots, B$ in (3), the performance gap between the semi-centralized and centralized frameworks tends to zero.

C. Computational Complexity Comparison Between Resource Allocation Algorithms

In this section, we compare the computational complexity of the proposed resource allocation algorithms for solving the sum-rate maximization problem (3). To simplify the analysis, we assume that each cell has $M$ users. In this comparison, we apply the barrier method with inner Newton's method to achieve an $\epsilon$-suboptimal solution for a convex problem. The number of barrier iterations required to achieve an $\frac{m}{t} = \epsilon$-suboptimal solution is exactly $\Upsilon = \Big\lceil \frac{\log ( m / ( \epsilon t^{(0)} ) )}{\log \mu} \Big\rceil$, where $m$ is the total number of inequality constraints, $t^{(0)}$ is the initial accuracy parameter for approximating the functions in inequality constraints in standard form, and $\mu$ is the step size for updating the accuracy parameter $t$ [15]. The number of inner Newton's iterations at each barrier iteration $i$ is denoted by $N_i$. In general, $N_i$ depends on $\mu$ and on how good the initial point is at barrier iteration $i$. The computational complexity of solving a convex problem is thus on the order of the total number of Newton's iterations, obtained by $C_{\text{cnvx}} = \sum_{i=1}^{\Upsilon} N_i$. For the case that the sequential programming converges in $Q$ iterations, the complexity of this method is on the order of $C_{\text{SP}} = \sum_{q=1}^{Q} \sum_{i=1}^{\Upsilon} N_{q,i}$, where $N_{q,i}$ denotes the number of inner Newton's iterations at the $i$-th barrier iteration of the $q$-th sequential iteration. For convenience, assume that $N_{q,i} = N, \forall q = 1, \dots, Q, i = 1, \dots, \Upsilon$. Hence, we have $C_{\text{cnvx}} = N \Upsilon = N \Big\lceil \frac{\log ( m / ( \epsilon t^{(0)} ) )}{\log \mu} \Big\rceil$, and $C_{\text{SP}} = Q N \Upsilon = Q N \Big\lceil \frac{\log ( m / ( \epsilon t^{(0)} ) )}{\log \mu} \Big\rceil$. The complexity order of different solution algorithms for (3) is presented in Table I. In this table, $S$ denotes the number of samples for each $p_{b,i}$ or $\alpha_b$. Moreover, the optimality status is for only the simplified problem. For example, FRPA finds the globally optimal powers in $p$, and subsequently $\alpha$, when $\lambda$ and $r$ are fixed. It does not mean that the output is the globally optimal solution for the main problem (3). Only the first and second rows in Table I can find the globally optimal solution of (3), and the rest of the algorithms are suboptimal. It is noteworthy that in Table I, we considered the highest possible computational complexity (worst case) of each algorithm.

TABLE I
COMPUTATIONAL COMPLEXITY OF RESOURCE ALLOCATION ALGORITHMS FOR SOLVING THE SUM-RATE MAXIMIZATION PROBLEM.

| Algorithm | Complexity | Framework | Optimal | λ | r | α | p |
|---|---|---|---|---|---|---|---|
| JSPA (Alg. 1) | $S^B$ | Centralized | ✓ | ✓ | ✓ | ✓ | ✓ |
| Exhaustive Search | $(M!)^B \times S^{BM}$ | Centralized | ✓ | ✓ | ✓ | ✓ | ✓ |
| JRPA (Alg. 3) | $QN \lceil \log((B + BM + B(M - ))/(\epsilon t^{(0)})) / \log \mu \rceil$ | Centralized | × | × | ✓ | ✓ | ✓ |
| FRPA | $S^B$ | Centralized | ✓ | × | × | ✓ | ✓ |
| Monotonic Optimization [23], [29] | $\approx S^{BM}$ | Centralized | ✓ | × | × | ✓ | ✓ |
| Sequential Program [23], [29] | $QN \lceil \log((B + BM + B(M - ))/(\epsilon t^{(0)})) / \log \mu \rceil$ | Centralized | × | × | × | ✓ | ✓ |
| Subsection III-B2 | $S$ | Semi-Centralized | ✓ | ✓ | ✓ | α | ✓ |
| Subsection III-B1 | | Fully Distributed | ✓ | ✓ | ✓ | × | ✓ |

IV. NUMERICAL RESULTS

In this section, we evaluate the performance of our proposed resource allocation algorithms via MATLAB Monte Carlo simulations through network realizations [18]. This comparison is divided into two subsections: 1) performance comparison among our proposed JSPA, JRPA, and FRPA algorithms to demonstrate the benefits of optimal SIC ordering, and rate adoption for any suboptimal decoding order (see Table I); 2) performance comparison among the centralized and decentralized resource allocation frameworks to demonstrate the effect of optimal $\alpha^*$ (ICI management) in the main problem (3). In our simulations, we adopt the commonly used (suboptimal) CNR-based decoding order [23], [25] for the JRPA and FRPA algorithms. Finally, we evaluate the convergence of the iterative algorithms for solving (10) and (14), and the performance of the approximated optimal powers in Remark III.2.1. The complete source code is available on GitLab [30].

TABLE II
SYSTEM PARAMETERS

| Parameter | Notation | Value | Parameter | Notation | Value |
|---|---|---|---|---|---|
| Coverage of MBS | × | 500 m | Lognormal shadowing standard deviation | × | dB |
| Coverage of FBS | × | 40 m | Small-scale fading | × | Rayleigh fading |
| Distance between MBS and FBS | × | 200 m | AWGN power density | $N_{b,i}$ | -174 dBm/Hz |
| Number of macro-cell users | $|\mathcal{U}_m|$ | { } | MBS transmit power | $P^{\max}_m$ | 46 dBm |
| Number of femto-cell users | $|\mathcal{U}_f|$ | { } | FBS transmit power | $P^{\max}_f$ | 30 dBm |
| User distribution model | × | Uniform | Minimum rate of macro-cell users | $R^{\min}_m$ | {0.5; 1; 2} bps/Hz |
| Minimum distance of users to MBS | × | 20 m | Minimum rate of femto-cell users | $R^{\min}_f$ | {0.25; 0.50; 0.75; 1; 2; 3} bps/Hz |
| Minimum distance of users to FBS | × | | | $\epsilon_{\text{tol}}$ | $10^{-}$ |
| Wireless bandwidth | × | MHz | Step size of each $\alpha_b \in [0, 1]$ | $\epsilon_\alpha$ | $10^{-}$ |
| MBS path loss | × | . . (d/Km) dB | Tolerance of Alg. 3 | $\epsilon_s$ | $10^{-}$ |
| FBS path loss | × | . . (d/Km) dB | - | - | - |

A. Simulation Settings

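Here, we consider a two-tier HetNet consisting of one FBS underlying an MBS. Although in practice there is a larger number of FBSs, in this experiment we aim to fundamentally investigate the impact of ICI from the MBS to the FBS and vice versa on the optimal decoding order of users. Within each cell, there is one BS at the center of a circular area and $U_b$ users inside it [18]. The system parameters and their corresponding notations are shown in Table II. The network topology and exemplary user placement are shown in Fig. 1.

B. Performance of Centralized Resource Allocation Algorithms

In this subsection, we compare the performance of our proposed JSPA, JRPA, and FRPA algorithms in terms of outage probability and users total spectral efficiency. Note that FRPA finds the globally optimal solution of the sum-rate maximization problem in [25] and the single-carrier-based downlink power allocation problem in [23] with significantly reduced computational complexity (see Table I).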

In this subsection, we compare the performance in terms of outage probability, and userstotal spectral efﬁciency of our proposed JSPA, JRPA, and FRPA algorithms. Note that FRPAﬁnds the globally optimal solution of the sum-rate maximization problem in [25] and the single-carrier-based downlink power allocation problem in [23] with signiﬁcantly reduced computationalcomplexity (see Table I). Although in practice there are larger number of FBSs, in this experiment, we aim to fundamentally investigate the impactof ICI from MBS to FBS and vice versa in optimal decoding order of users. -500 -300 -100 100 300 500 Horizontal axis coordinate (m) -400-300-200-1000100200300400500 V e r ti ca l a x i s c oo r d i n a t e ( m ) Coverage area of MBSCoverage area of FBSMBSFBSMacro-cell userFemto-cell user

Fig. 1. Network topology and exemplary user placement for $|\mathcal{U}_m| = 3$ and $|\mathcal{U}_f| = 2$.

1) Outage Probability Performance:

The outage probability of the JSPA, JRPA, and FRPA schemes is obtained by optimally solving their corresponding total power minimization problems (10), (16), and (20), respectively. For each scheme, the outage probability is calculated by dividing the number of infeasible problem instances by the total number of samples [18]. Fig. 2 shows the impact of the order of NOMA clusters and minimum rate demands on the outage probability of the JSPA, JRPA, and FRPA problems. According to Theorem 1, the ICI may affect the optimal decoding order of user pairs which cannot satisfy the SIC sufficient condition. We refer to these user pairs as the pairs depending on ICI when the CNR-based decoding order is applied. In Fig. 2(a), we calculate the average number of user pairs depending on ICI (denoted by $\Psi$) for different distances among BSs, and different orders of NOMA clusters in the CNR-based decoding order. The parameter $\Psi$ is increased by 1) increasing $|\mathcal{U}_b|$, which inherently decreases the LHS of (15); 2) increasing the inter-cell channel gain $h_{j,b,i}$, which increases the RHS of (15). The second effect for the femto-cell user pairs is inversely proportional to the BSs distance, due to the existing path loss. As shown, the impact of $|\mathcal{U}_b|$ is higher than the impact of the BSs distance. The wide coverage of the macro-cell results in large differences between the users channel gains; however, the CNR of macro-cell users is typically low, which reduces the LHS of (15). As a result, $\Psi$ is also affected by $|\mathcal{U}_m|$. It is noteworthy that the position of the FBS (BSs distance) in the coverage area of the MBS does not have a significant impact on $\Psi$ for the macro-cell users.
For the case that at least one user pair within a cell does not satisfy the SIC sufficient condition, the CNR-based decoding order in that cell may not be optimal, resulting in reduced spectral efficiency and increased outage.

Fig. 2. Outage probability of the centralized JSPA, JRPA, and FRPA algorithms for different number of users and minimum rate demands. (a) Average number of user pairs which cannot satisfy the SIC sufficient condition vs. order of NOMA cluster for the CNR-based decoding order. (b) Outage probability vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz. (c) Outage probability vs. users minimum rate demand for -order NOMA clusters. (d) Outage probability vs. users minimum rate demand for -order NOMA clusters.

In Figs. 2(b)-2(d), we observe that there exist significant performance gaps between the JSPA and JRPA schemes, which shows the superiority of finding the optimal decoding order in multi-cell NOMA.
Besides, JRPA significantly reduces the outage probability compared to the FRPA scheme, which shows the importance of rate adoption when a suboptimal decoding order is applied. The large performance gap between JSPA and FRPA shows that the SIC necessary condition in (17) seriously restricts the feasible region of the FRPA problem (see Fact III.2.1), resulting in high outage, specifically for larger orders of NOMA clusters (see Fig. 2(b)). Last but not least, in Fig. 2(b), we observe that a larger order of NOMA clusters results in high outage probability even for the optimal JSPA algorithm. Therefore, it is not wise to multiplex all the users when considering a large number of users within a cell. The common solution is to divide the users into multiple groups in which each user group operates in an isolated subband [18], [20], [25]. From the practical implementation perspective, other disadvantages of increasing the order of NOMA clusters are the increased receiver complexity and the error propagation due to SIC [5], [18]. Finding the optimal JSPA for maximizing users sum-rate in the general multi-carrier multi-cell NOMA system can be considered as future work.
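The outage figures discussed above rest on a simple counting rule: a channel realization counts as an outage when the corresponding power minimization problem has no feasible point. A minimal sketch of that bookkeeping is given below; the function name and the boolean input format are assumptions for illustration, and the feasibility flags would come from solving (10), (16), or (20) for each realization.

```python
from typing import Sequence


def outage_probability(feasible: Sequence[bool]) -> float:
    """Fraction of Monte Carlo realizations whose problem was infeasible.

    feasible[n] is True when the power minimization problem of the n-th
    realization admits a feasible point (hypothetical input format).
    """
    if len(feasible) == 0:
        raise ValueError("at least one realization is required")
    infeasible = sum(1 for ok in feasible if not ok)
    return infeasible / len(feasible)
```

Applying the same function to the feasibility flags of each scheme, on the same set of realizations, gives directly comparable curves.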

2) Total Spectral Efﬁciency Performance:

Fig. 3 investigates the impact of the order of NOMA clusters and minimum rate demands on the average total spectral efficiency of users. Here, we set the sum-rate to zero when the problem is infeasible. As shown, JSPA always outperforms the JRPA and FRPA algorithms. The resulting performance gap between JSPA and JRPA is an upper bound of the exact performance gap between the optimal and CNR-based decoding orders, since Alg. 3 provides a lower bound for the optimal value of (14) (see Table I). In Subsection IV-E, we show that JRPA is a near-optimal algorithm, so this lower bound is significantly tight. Subsequently, the performance gap between JRPA and FRPA is a lower bound of the exact performance gain of rate adoption for the CNR-based decoding order. As can be seen, FRPA has a significantly lower performance compared to JRPA. In Fig. 3(a), we observe that for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz, the negative side impact of increasing the order of NOMA clusters is higher than the multi-user diversity gain. As a result, increasing the order of NOMA clusters results in lower total spectral efficiency. Another reason is the increased outage probability (shown in Fig. 2(b)), which can significantly affect the average total spectral efficiency. For $|\mathcal{U}_m| = |\mathcal{U}_f| = 2$, we observed that the performance gap between JSPA and JRPA is low, due to the decreased $\Psi$ in Fig. 2(a). This performance gap grows when $|\mathcal{U}_b|$ increases. As expected, Figs. 3(b) and 3(c) show that the total spectral efficiency of users is decreasing in the minimum rate demands. More importantly, there exist huge performance gaps between JSPA and FRPA for any number of users and minimum rate demands.

C. Performance of Centralized and Decentralized Frameworks

In this subsection, we compare the performance of the centralized, semi-centralized, and fully distributed JSPA frameworks (shown in Table I).

Fig. 3. Total spectral efficiency of the centralized JSPA, JRPA, and FRPA algorithms for different number of users and minimum rate demands. (a) Total spectral efficiency vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz. (b) Total spectral efficiency vs. users minimum rate demand for -order NOMA clusters. (c) Total spectral efficiency vs. users minimum rate demand for -order NOMA clusters.

1) Outage Probability Performance:

Fig. 4 evaluates the outage probability of different resource allocation frameworks. As can be seen, the fully distributed framework results in huge outage probability. However, the outage probability gap between the semi-centralized and centralized frameworks decreases with larger $|\mathcal{U}_m|$ (Fig. 4(a)), and/or higher $R^{\min}_m$ (Figs. 4(b) and 4(c)). The large performance gap between the semi-centralized and fully distributed frameworks shows the importance of ICI management from the MBS to femto-cell users. It is noteworthy that the feasible point for the semi-centralized framework is obtained based on the grid search on $\alpha$ with step size $\epsilon_\alpha = 10^{-}$. Although it can be shown that further reducing $\epsilon_\alpha$ would not yield a significantly higher total spectral efficiency, $\epsilon_\alpha = 10^{-}$ is not good enough for calculating outage probability. $\epsilon_\alpha$ can be easily reduced to − in the semi-centralized framework; however, we set $\epsilon_\alpha = 10^{-}$ for both the centralized and semi-centralized frameworks to have a fair comparison. For the same step size $\epsilon_\alpha$, the feasible region of the semi-centralized problem is a subset of the feasible region of its corresponding centralized problem.

Fig. 4. Outage probability of the centralized and decentralized JSPA frameworks for different number of users and minimum rate demands. (a) Outage probability vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz. (b) Outage probability vs. users minimum rate demand for -order NOMA clusters. (c) Outage probability vs. users minimum rate demand for -order NOMA clusters.

2) Total Spectral Efﬁciency Performance:

The performance of the decentralized frameworks depends on how well the power consumption of the BSs is approximated compared to the centralized framework. Figs. 5(a)-5(c) show the impact of the order of NOMA clusters and minimum rate demands on the FBS power consumption coefficient $\alpha_f$ at the optimal point of the centralized framework. As expected, larger $|\mathcal{U}_f|$ and/or $R^{\min}_f$ results in larger FBS power consumption. Moreover, we observe that increasing $|\mathcal{U}_m|$ and/or $R^{\min}_m$ decreases $\alpha_f$. However, the impact of $|\mathcal{U}_m|$ and/or $R^{\min}_m$ on $\alpha_f$ is quite low and negligible, due to the low average ICI level from the low-power FBS to macro-cell users. More importantly, we observe that in most of the cases, the FBS operates in up to of its available power. It is noteworthy that in both the decentralized frameworks, we assume that the FBS operates at its maximum available power ($\alpha_f = 1$). Figs. 5(d)-5(f) evaluate the impact of the order of NOMA clusters and minimum rate demands on the MBS power consumption coefficient $\alpha_m$ at the optimal point of the centralized and semi-centralized frameworks. As expected, $\alpha_m$ is directly proportional to $|\mathcal{U}_m|$ and/or $R^{\min}_m$, while it is inversely proportional to $|\mathcal{U}_f|$ and/or $R^{\min}_f$. More importantly, we observe that
1) $\alpha_m$ in the semi-centralized framework is always upper-bounded by $\alpha_m$ in the centralized framework. This is due to the larger $\alpha_f$ in the semi-centralized framework compared to the centralized framework.
2) The MBS typically operates in less than of its available power. Hence, the fully distributed framework with $\alpha_m = 1$ results in significantly degraded spectral efficiency at femto-cell users, due to high ICI from the MBS to femto-cell users.
Last but not least, we observe that the MBS power consumption gap between the centralized and semi-centralized frameworks is quite low (see Figs. 5(d)-5(f)).

Fig. 5. Average power consumption coefficient of femto/macro BSs in the centralized/semi-centralized frameworks for different number of macro/femto-cell users and minimum rate demands. (a) FBS power consumption in the centralized framework vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz. (b) FBS power consumption in the centralized framework vs. minimum rate demand for -order NOMA clusters. (c) FBS power consumption in the centralized framework vs. minimum rate demand for -order NOMA clusters. (d) MBS power consumption in the centralized/semi-centralized frameworks vs. order of NOMA cluster for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz. (e) MBS power consumption in the centralized/semi-centralized frameworks vs. minimum rate demand for -order NOMA clusters. (f) MBS power consumption in the centralized/semi-centralized frameworks vs. minimum rate demand for -order NOMA clusters.

Fig. 6 investigates the total spectral efficiency of users in the centralized and decentralized frameworks. In line with the discussions for Fig. 5, we observed that in most of the cases, the performance gap between the centralized (optimal) and semi-centralized frameworks is quite low, specifically for the lower order of the femto-cell NOMA cluster.
Hence, the semi-centralized framework with its low computational complexity (see Table I) is a good candidate solution for larger-scale systems. Besides, the fully distributed framework results in quite low performance, for the reasons discussed for Fig. 5.

Fig. 6. Total spectral efficiency (bps/Hz) of the centralized, semi-centralized, and distributed frameworks for different numbers of macro/femto-cell users and minimum rate demands. (a) Total spectral efficiency vs. order of NOMA cluster $|\mathcal{U}_f|$ for $R^{\min}_m = R^{\min}_f = 1$ bps/Hz, with curves for $|\mathcal{U}_m| = 2, 3, 4$. (b), (c) Total spectral efficiency vs. users' minimum rate demand $R^{\min}_f$, each for a fixed NOMA cluster order, with curves for $R^{\min}_m = 0.5, 1, 2$ bps/Hz.

D. Convergence of the Iterative Distributed Framework for Solving (10)
The feasible domain of problem (10) is empty in either of the following cases:
1) Problem (10) is infeasible even when the maximum power constraint (3b) is removed. This case can be detected from the Perron–Frobenius eigenvalues of the matrices arising from the power control subproblems (see Theorem 8 in [18]). That theorem proves that, regardless of the available power, (10) can be infeasible due to the existing ICI and minimum rate demands.
2) Problem (10) is infeasible while (10) without (3b) is feasible. In this case, (10) is infeasible only because the power resources are insufficient to meet the QoS constraints in (3c).
Since Alg. 2 is a component-wise minimum [18], for any feasible $p^{(0)}$ it converges to a unique point, which is the globally optimal solution. More importantly, the results show that Alg. 2 also converges to the globally optimal solution for any infeasible but finite $p^{(0)}$ if the feasible domain of problem (10) is nonempty. Fig. 7(b) shows the convergence of Alg. 2 for different initial points. Here, $\alpha^{(0)}_b = 0$ denotes zero power consumption for the BSs, i.e., $p^{(0)}_{b,i} = 0$, $\forall b \in \mathcal{B}, i \in \mathcal{U}_b$, while $\alpha^{(0)}_b = 1$ denotes that the BSs operate at their maximum power at the initial point, which may violate (3c). Fig. 7(b) shows that Alg. 2 converges to the same unique point from both initial points. For $\alpha^{(0)}_b = 0$, the convergence of Alg. 2 corresponds to tightening the lower bound of the optimal value, since the ICI at each user is always upper-bounded by its ICI at the converged point. Conversely, since the ICI at each user reaches its maximum possible value at $\alpha^{(0)}_b = 1$, the convergence of Alg. 2 for $\alpha^{(0)}_b = 1$ corresponds to tightening the upper bound of the optimal value. Based on the KKT conditions analysis in Appendix C, we observed that $p^*$ is independent of the maximum power constraint (3b).
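The two-sided convergence described above can be illustrated with a deliberately simplified sketch. It is not the paper's Alg. 2: it assumes one user per cell, two cells, and a plain fixed-point update of the form p <- F p + u, where the names `build_system`, `perron_root`, and `iterate`, the gain matrix, and the noise level are all hypothetical. The sketch shows that the iteration reaches the same point from zero power and from maximum power when the Perron–Frobenius eigenvalue is below one, and that it diverges when the rate demands push that eigenvalue above one.

```python
import math

def build_system(gains, noise, r_min):
    """Return (F, u) such that the rate targets read p >= F p + u.

    gains[i][j] is a hypothetical channel gain from BS j to the user of
    cell i; r_min[i] is that user's minimum rate in bps/Hz.
    """
    gamma = [2.0 ** r - 1.0 for r in r_min]          # SINR targets
    F = [[0.0, gamma[0] * gains[0][1] / gains[0][0]],
         [gamma[1] * gains[1][0] / gains[1][1], 0.0]]
    u = [gamma[i] * noise / gains[i][i] for i in range(2)]
    return F, u

def perron_root(F):
    """Perron-Frobenius eigenvalue of a 2x2 nonnegative matrix with zero diagonal."""
    return math.sqrt(F[0][1] * F[1][0])

def iterate(p0, F, u, n_iter):
    """Apply p <- F p + u n_iter times and return every iterate."""
    traj = [list(p0)]
    for _ in range(n_iter):
        p = traj[-1]
        traj.append([F[0][1] * p[1] + u[0], F[1][0] * p[0] + u[1]])
    return traj

gains = [[1.0, 0.1], [0.2, 1.0]]

# Moderate rate demands: Perron root < 1, both starting points agree.
F, u = build_system(gains, noise=0.01, r_min=[1.0, 1.0])
from_zero = iterate([0.0, 0.0], F, u, 100)   # lower bound, increases
from_max = iterate([1.0, 1.0], F, u, 100)    # upper bound, decreases
print(perron_root(F), from_zero[-1], from_max[-1])

# High rate demands: Perron root > 1, the powers grow without bound.
F_bad, u_bad = build_system(gains, noise=0.01, r_min=[4.0, 4.0])
diverged = iterate([0.0, 0.0], F_bad, u_bad, 50)
print(perron_root(F_bad), diverged[-1])
```

In this toy setting the power budget never enters the update, which mirrors the observation that the converged point does not depend on (3b); a budget violation would only be detected by comparing the converged powers with the budget afterwards.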
It can be shown that if problem (10) is infeasible while (10) without (3b) is feasible (Case 2 of the infeasibility reasons of problem (10)), Alg. 2 converges to the optimal finite point violating (3b). For larger minimum rate demands (Fig. 7(c)), we observe that Alg. 2 diverges and the optimal value tends to infinity, regardless of the maximum power constraint (3b). This corresponds to the first case of the infeasibility reasons of (10). Hence, it is important to check the Perron–Frobenius eigenvalues of the matrices arising from the power control subproblems (see Theorem 8 in [18]) before searching for a feasible point of (10). Last but not least, Fig. 7(b) verifies the fast convergence of Alg. 2 for both initialization methods; however, $\alpha^{(0)}_b = 0$ converges in fewer iterations than $\alpha^{(0)}_b = 1$.

Fig. 7. Convergence/divergence of Alg. 2 for different initial points and a channel realization with $|\mathcal{U}_m| = 3$, $|\mathcal{U}_f| = 2$, and different minimum rate demands. (a) The network topology and users placement for $|\mathcal{U}_m| = 3$ and $|\mathcal{U}_f| = 2$ (horizontal vs. vertical coordinate in m). (b) Convergence of the iterative distributed framework (total power consumption in dBm vs. iteration index) for $R^{\min}_m = 1$ bps/Hz and $R^{\min}_f = 1$. (c) Divergence of the iterative distributed framework for $R^{\min}_m = 4$ bps/Hz and $R^{\min}_f = 4$.

E. Convergence and Performance of the JRPA Algorithm

In Fig. 8, we investigate the convergence of our proposed JRPA algorithm, which is based on sequential programming. We assume that the CNR-based decoding order is applied. Since sequential programming converges to a locally optimal solution, the initial point may affect the performance of this method. In this study, we applied three initialization methods:
1) Minimum rate equality (MRE): In this method, we obtain $r^{(0)}$ by solving the total power minimization problem (16). It is proved that at the optimal (feasible) point, the spectral efficiency of each user achieves its minimum rate demand. Hence, we have $r^{(0)}_{b,i} = R^{\min}_{b,i}$, $\forall b \in \mathcal{B}, i \in \mathcal{U}_b$.
2) Approximated rate function (ARF): In this method, we substitute the strictly concave term $g(r_{b,i}) = \ln(2^{r_{b,i}} - 1)$ with its approximated affine function $m\,r_{b,i}$ in (38c), where $m = \frac{\partial \ln(2^{R} - 1)}{\partial R}$ is evaluated at a significantly large $R$. Then, we solve the convex approximated problem of (38) and obtain $r^{(0)}$. For sufficiently large $R$, $g(r_{b,i})$ is upper-bounded by $m \times r_{b,i}$.
3) Equal power allocation (EPA): In this method, we equally distribute $P^{\max}_b$ to all the users in $\mathcal{U}_b$, and then obtain $r^{(0)}$ according to (2). This method may lead to an infeasible $r^{(0)}$.
MRE provides a feasible $r^{(0)}$. However, this method does not consider the heterogeneity of users' spectral efficiency, leading to slower convergence and, in some situations, lower performance. MRE works well for low-SINR scenarios with significantly high minimum rate demands. The ARF method provides a better feasible lower bound for the total spectral efficiency of users at the initial point. Fig. 8(a) shows that for larger minimum rate demands, $\ln(2^{r} - 1) \approx m r$. The performance gap between ARF and the globally optimal solution arises because ARF allocates more power to users operating in low spectral efficiency regions, which leaves less power for the stronger user deserving additional power.
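As a small numerical illustration of the ARF approximation, the sketch below evaluates $g(r) = \ln(2^r - 1)$ against the line $m r$, with the slope taken at the reference rate $R = 15$ used in Fig. 8(a). The function names `g` and `arf_slope` and the sample rates are illustrative choices, not part of the paper.

```python
import math

def g(r):
    """Strictly concave term g(r) = ln(2**r - 1), defined for r > 0."""
    return math.log(2.0 ** r - 1.0)

def arf_slope(R):
    """Slope m = d/dR ln(2**R - 1), evaluated at the reference rate R."""
    return math.log(2.0) * 2.0 ** R / (2.0 ** R - 1.0)

m = arf_slope(15.0)  # approaches ln(2) as R grows

# The line m*r stays above g(r), and the gap shrinks as r increases,
# so the affine surrogate is tight only for larger rates.
for r in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"r={r:5.1f}  g(r)={g(r):8.4f}  m*r={m * r:8.4f}  gap={m * r - g(r):.4f}")
```

The large gap at small rates is consistent with the remark that ARF over-serves users operating in low spectral efficiency regions.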
ARF also works well for scenarios in which the low additional minimum rate demands do not have a significant impact on the users' total spectral efficiency, i.e., high-SINR regions. The EPA initialization method usually leads to an infeasible $r^{(0)}$ violating (3c), due to INI and ICI at users. More importantly, we observed that EPA also leads to high outage at the next iteration of the JRPA algorithm. In Fig. 8, we selected a scenario in which EPA (violating (3c)) does not make the next iteration infeasible, in order to show the convergence behavior of this initialization method.
The users placement is shown in Fig. 8(b). As shown, JRPA provides a sequence of improved solutions for any feasible initial point such that it converges to a stationary point. Interestingly, we observe that both the MRE and ARF methods converge to a unique point, which shows the low sensitivity of JRPA to these feasible initial points. In this scenario, EPA yields an infeasible initial point $r^{(0)}$, and JRPA finds a feasible solution $r^{(1)}$ based on the infeasible $r^{(0)}$, which is in effect the updated initial feasible point. According to Figs. 8(d)-8(f), we observe that at the converged point, only the NOMA cluster-head users get additional power (leading to higher spectral efficiency than their minimum rate demand). The fast convergence of the individual rates in Figs. 8(d)-8(f) is consistent with the fast convergence of JRPA shown in Fig. 8(c). In our simulations, we applied the ARF initialization method.
As mentioned in Subsection III-A2, it is difficult to find the globally optimal JRPA for any fixed suboptimal decoding order. However, when the fixed decoding order is the same as the optimal decoding order, the performance gap between the optimal JSPA and suboptimal JRPA algorithms is due only to the suboptimality of JRPA based on sequential programming. In Fig. 8, we chose a case in which the CNR-based decoding order satisfies Theorem 1, and so is optimal. The total spectral efficiency of users (optimal value) at the globally optimal point is shown in Fig. 8(c). As can be seen, sequential programming generates a sequence of improved solutions such that, after a few iterations, it converges to a near-optimal solution.

Fig. 8. Convergence of Alg. 3 for different initialization methods for a scenario with $|\mathcal{U}_m| = 3$, $|\mathcal{U}_f| = 2$, $R^{\min}_m = 1$ bps/Hz, and $R^{\min}_f = 2$. The CNR-based decoding order is optimal. (a) Approximation of $y = \ln(2^{r} - 1)$ with the linear function $m \times r$, where $m = y'(r = 15)$. (b) The network topology and users placement for $|\mathcal{U}_m| = 3$ and $|\mathcal{U}_f| = 2$. (c) Total spectral efficiency vs. iteration index for different initialization methods (optimal value, ARF, EPA, MRE). (d) User spectral efficiency vs. iteration index for the MRE initialization method. (e) User spectral efficiency vs. iteration index for the ARF initialization method. (f) User spectral efficiency vs. iteration index for the EPA initialization method.

F. Performance of the Approximated Optimal Powers in Remark III.2.1

Here, we investigate the performance of the approximated closed form of optimal powers in Remark III.2.1. The advantage of this approximation is its insensitivity to the exact CSI. Fig. 9 compares the average gap between the exact and approximated forms of optimal powers in femto- and macro-cells, separately. Since ICI is fully treated as AWGN, we evaluate this performance gap at different AWGN power levels of users, which directly impact the CINR of users. In Fig. 9, $M$ is the number of users within the considered cell, and $R^{\min}$ is their minimum rate demand. To reduce the impact of randomness, we eliminated the lognormal shadowing from the path loss model (see Table II). As can be seen in Figs. 9(a) and 9(c), the average gap between the optimal and approximated powers is increasing in the order of NOMA clusters, minimum rate demands, and specifically AWGN power. Interestingly, for lower AWGN powers, e.g., less than − dBm, this performance gap tends to zero. As a result, this approximation works well for middle and high SINR scenarios. On the other hand, we observe that the macro/femto-cell total spectral efficiency gaps between these two closed-form formulations are less than . for $M \le$ , $R^{\min} \le$ bps/Hz, and AWGN power less than − dBm. Hence, the results show a high insensitivity level of optimal powers at macro/femto-cell users to the CSI. The impact of the approximated closed-form optimal powers on the ergodic rate regions and/or imperfect CSI can be considered as future work.

Fig. 9. The performance gap between the approximated and exact closed-form expressions of optimal powers in macro/femto-cell for different AWGN powers (from −124 dBm to −94 dBm), numbers of macro/femto-cell users ($M = 2, \dots, 5$), and minimum rate demands ($R^{\min} = 1, \dots, 4$). (a) Average optimal and approximated powers gap (%) vs. AWGN power at macro-cell users. (b) Average total spectral efficiency gap (%) vs. AWGN power at macro-cell users. (c) Average optimal and approximated powers gap (%) vs. AWGN power at femto-cell users. (d) Average total spectral efficiency gap (%) vs. AWGN power at femto-cell users.

V. CONCLUDING REMARKS

In this paper, we addressed the problem of optimal joint SIC ordering and power allocation in multi-cell NOMA systems to achieve the maximum users sum-rate. We showed that the optimal SIC ordering depends on the ICI at users. For the given total power consumption of BSs, we obtained the optimal powers and SIC decoding orders in closed form. Then, we proposed a globally optimal JSPA algorithm with a significantly reduced computational complexity. For any given suboptimal decoding order, we addressed the problem of joint rate and power allocation to maximize users sum-rate. We showed that for specific channel conditions, the CNR-based decoding order is optimal for a user pair. We also devised two decentralized resource allocation frameworks. Numerical assessments show that the optimal SIC ordering results in significantly lower outage probability and higher users sum-rate compared to the CNR-based decoding order. Moreover, for a fixed suboptimal SIC ordering, rate adoption is necessary to achieve the maximum possible sum-rate; restricting the rate region of users results in high outage and, subsequently, seriously low sum-rate. Besides, we observed that the semi-centralized framework has a near-to-optimal performance with a significantly lower computational complexity compared to the globally optimal JSPA algorithm in the centralized framework.

APPENDIX A
PROOF OF PROPOSITION

III.1

Let $\tilde{h}_{b,k} = \frac{h_{b,k}}{I_{b,k} + \sigma_{b,k}}$. The rate function in (2) can be reformulated as
$$R_{b,i} = \min_{k \in \{i\} \cup \Phi_{b,i}} \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j}\tilde{h}_{b,k} + 1}\right),$$
which is the same as the achievable rate of single-cell NOMA based on the equivalent noise. Since the SISO Gaussian broadcast channels are degraded [1], [2], NOMA with the CINR-based decoding order is capacity achieving in each cell of multi-cell NOMA (where ICI is fully treated as AWGN), so the decoding order $\tilde{h}_{b,k} > \tilde{h}_{b,i} \Rightarrow k \to i$ is optimal [5], [10].
Similar to single-cell NOMA, in each cell $b$, we have
$$\frac{\partial}{\partial \tilde{h}_{b,k}} \log_2\left(1 + \frac{p_{b,k}\tilde{h}_{b,k}}{\sum_{j \in \Phi_{b,i}} p_{b,j}\tilde{h}_{b,k} + 1}\right) > 0.$$
Assume that $\alpha_{-b}$ is fixed. In the following, we analytically show that at any given $p_b$ (after linear superposition coding), the decoding order based on $\tilde{h}_{b,k}(\alpha_{-b}) > \tilde{h}_{b,i}(\alpha_{-b}) \Rightarrow k \to i$ achieves the maximum total spectral efficiency of users after SIC. Assume that cell $b$ has $M$ users, and that the user indices are updated such that $k > i$ if $\tilde{h}_{b,k}(\alpha_{-b}) > \tilde{h}_{b,i}(\alpha_{-b})$. We prove that the decoding order $M \to M-1 \to \cdots \to 1$ outperforms any other possible decoding order in terms of total spectral efficiency of users, and so is optimal. To prove this, for two adjacent users $i$ and $i+1$ (with $\tilde{h}_{b,i+1} > \tilde{h}_{b,i}$), consider the following decoding orders:
A) $i+1 \to i$, and subsequently $M \to M-1 \to \cdots \to 1$;
B) $i \to i+1$, and subsequently $M \to M-1 \to \cdots \to i+2 \to i \to i+1 \to i-1 \to \cdots \to 1$.
According to (2), the achievable spectral efficiency of users $i$ and $i+1$ in Case A (after SIC) can be obtained by
$$R^{A}_{b,i}(p_b) = \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}}{\sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i} + 1}\right), \qquad R^{A}_{b,i+1}(p_b) = \log_2\left(1 + \frac{p_{b,i+1}\tilde{h}_{b,i+1}}{\sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i+1} + 1}\right).$$
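The comparison between Cases A and B can also be checked numerically. The following sketch evaluates the reformulated rate function for an arbitrary decoding sequence and compares the sum rate of the ordered sequence with every adjacent swap. The helper `sum_rate`, the normalized gains `h`, and the powers `p` are hypothetical values chosen for illustration; users are indexed from 0 in the code.

```python
import math

def sum_rate(order, p, h):
    """Sum rate (bps/Hz) for a decoding sequence.

    order lists user indices from the signal decoded first (weakest user)
    to the cluster head; p and h are powers and normalized channel gains.
    """
    total = 0.0
    for pos, i in enumerate(order):
        later = order[pos + 1:]             # users that cancel user i's signal
        interf = sum(p[j] for j in later)   # signals not yet cancelled
        # The rate of user i is limited by the worst decoder of its signal.
        total += min(math.log2(1.0 + p[i] * h[k] / (interf * h[k] + 1.0))
                     for k in [i] + later)
    return total

h = [1.0, 4.0, 9.0, 25.0]   # ascending normalized gains (illustrative)
p = [0.4, 0.3, 0.2, 0.1]    # any positive power vector (illustrative)
case_a = [0, 1, 2, 3]       # order A: stronger users cancel weaker ones

for a in range(len(h) - 1):
    case_b = case_a[:]
    case_b[a], case_b[a + 1] = case_b[a + 1], case_b[a]   # swap adjacent pair
    gap = sum_rate(case_a, p, h) - sum_rate(case_b, p, h)
    print(f"swap users {a} and {a + 1}: sum-rate gap = {gap:.4f} bps/Hz")
```

Under these assumptions every printed gap is positive, in agreement with the claim that swapping any adjacent pair away from the gain-sorted order cannot help.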
The achievable spectral efficiency of users $i$ and $i+1$ in Case B (after SIC) is given by
$$R^{B}_{b,i}(p_b) = \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}}{\sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i} + 1}\right), \qquad R^{B}_{b,i+1}(p_b) = \log_2\left(1 + \frac{p_{b,i+1}\tilde{h}_{b,i}}{\left(p_{b,i} + \sum_{j=i+2}^{M} p_{b,j}\right)\tilde{h}_{b,i} + 1}\right).$$
In both Cases A and B, the signal of users $i$ and $i+1$ is treated as INI at users $1, \dots, i-1$. Moreover, the signal of users $i$ and $i+1$ is scheduled to be decoded and canceled by all the users $i+2, \dots, M$. Hence, the set $\Phi_{b,k}$ of each user $k \in \{1, \dots, i-1, i+2, \dots, M\}$ is the same in both Cases A and B, resulting in the same spectral efficiency formulated in (2). Since the signal of all the users in $\mathcal{U}_b$ is fully treated as noise (called ICI), the set $\Phi_{b',i}$ of each user $i \in \mathcal{U}_{b'}$, $b' \in \mathcal{B} \setminus \{b\}$, is the same in both Cases A and B, resulting in the same spectral efficiency. Accordingly, changing the decoding order of two adjacent users only changes the achievable rates of these two users for given $p_b$. As a result, the total spectral efficiency gap between the decoding orders A and B for given $p_b$ can be formulated by
$$\begin{aligned}
R^{A-B}_{\mathrm{gap}}(p_b) &= \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R^{A}_{b,i}(p_b) - \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} R^{B}_{b,i}(p_b) \\
&= \left(R^{A}_{b,i}(p_b) + R^{A}_{b,i+1}(p_b)\right) - \left(R^{B}_{b,i}(p_b) + R^{B}_{b,i+1}(p_b)\right) \\
&= \log_2 \frac{\left(1 + \sum_{j=i}^{M} p_{b,j}\tilde{h}_{b,i}\right)\left(1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i+1}\right)}{\left(1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i}\right)\left(1 + \sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i+1}\right)} + \log_2 \frac{\left(1 + \sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i}\right)\left(1 + \left(p_{b,i} + \sum_{j=i+2}^{M} p_{b,j}\right)\tilde{h}_{b,i}\right)}{\left(1 + \left(p_{b,i} + \sum_{j=i+2}^{M} p_{b,j}\right)\tilde{h}_{b,i}\right)\left(1 + \sum_{j=i}^{M} p_{b,j}\tilde{h}_{b,i}\right)} \\
&= \log_2 \frac{\left(1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i+1}\right)\left(1 + \sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i}\right)}{\left(1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i}\right)\left(1 + \sum_{j=i+2}^{M} p_{b,j}\tilde{h}_{b,i+1}\right)} \\
&= \log_2 \frac{1 + \left(\sum_{j=i+1}^{M} p_{b,j}\right)\tilde{h}_{b,i+1} + \left(\sum_{j=i+2}^{M} p_{b,j}\right)\tilde{h}_{b,i} + \left(\sum_{j=i+2}^{M} p_{b,j}\right)\left(\sum_{j=i+1}^{M} p_{b,j}\right)\tilde{h}_{b,i}\tilde{h}_{b,i+1}}{1 + \left(\sum_{j=i+1}^{M} p_{b,j}\right)\tilde{h}_{b,i} + \left(\sum_{j=i+2}^{M} p_{b,j}\right)\tilde{h}_{b,i+1} + \left(\sum_{j=i+2}^{M} p_{b,j}\right)\left(\sum_{j=i+1}^{M} p_{b,j}\right)\tilde{h}_{b,i}\tilde{h}_{b,i+1}}.
\end{aligned}$$
The difference between the numerator and denominator of the latter fraction is $p_{b,i+1}\left(\tilde{h}_{b,i+1} - \tilde{h}_{b,i}\right)$, which is always positive since $\tilde{h}_{b,i+1} > \tilde{h}_{b,i}$, which results in $R^{A-B}_{\mathrm{gap}}(p_b) > 0$. Therefore, for any feasible $p_b$, the decoding order $i+1 \to i$ for each two adjacent users $i$ and $i+1$ in cell $b$ is optimal if and only if $\tilde{h}_{b,i+1} > \tilde{h}_{b,i}$. Imposing this optimality condition on each two adjacent users in cell $b$ results in the decoding order based on $\tilde{h}_{b,k}(\alpha_{-b}) > \tilde{h}_{b,i}(\alpha_{-b}) \Rightarrow k \to i$. As a result, $\lambda^*_{b,i,k} = 1$ if and only if $\tilde{h}_{b,i}(\alpha_{-b}) \le \tilde{h}_{b,k}(\alpha_{-b})$, and the proof is completed.

APPENDIX B
PROOF OF PROPOSITION

III.2

According to Corollary III.1.1, the achievable spectral efficiency of each user $i \in \mathcal{U}_b$ for the fixed $\alpha_{-b}$ and optimal decoding order $M \to M-1 \to \cdots \to 1$ can be formulated by
$$\tilde{R}_{b,i}(p_b) = \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}(\alpha_{-b})}{\sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i}(\alpha_{-b}) + 1}\right).$$
Note that $\tilde{h}_{b,k}(\alpha_{-b}) > \tilde{h}_{b,i}(\alpha_{-b})$ for each $i, k \in \mathcal{U}_b$, $k > i$. Moreover, $\tilde{R}_{b,i}$ is independent of $p_{-b}$ for given $\alpha_{-b}$, meaning that (3) can be equivalently divided into $B$ single-cell NOMA sub-problems. In cell $b$, we find $p^*_b$ by solving the following sub-problem:
$$\max_{p_b \ge 0} \ \sum_{i=1}^{M} \tilde{R}_{b,i}(p_b) \tag{21a}$$
$$\text{s.t.} \quad \sum_{i \in \mathcal{U}_b} p_{b,i} = \alpha_b P^{\max}_b, \tag{21b}$$
$$\tilde{R}_{b,i}(p_b) \ge R^{\min}_{b,i}, \ \forall i \in \mathcal{U}_b. \tag{21c}$$
Similar to single-cell NOMA, it can be easily shown that the Hessian of the sum-rate function in (21a) is negative definite in $p_b$ for $\tilde{h}_{b,k}(\alpha_{-b}) > \tilde{h}_{b,i}(\alpha_{-b})$ for each $i, k \in \mathcal{U}_b$, $k > i$ [11]–[13], so the objective function (21a) is strictly concave in $p_b$. Constraint (21c) can be rewritten as the following linear constraint:
$$2^{R^{\min}_{b,i}}\left(1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i}\right) \le 1 + \sum_{j=i}^{M} p_{b,j}\tilde{h}_{b,i}.$$
Hence, the feasible region of (21) is defined by affine constraints, and so is convex. Accordingly, problem (21) is strictly convex in $p_b$. Slater's condition holds in (21) since it is convex and there exists $p_b \ge 0$ satisfying (21c) with strict inequalities. Therefore, strong duality holds in (21). As a result, the KKT conditions are satisfied and the optimal solution $p^*$ can be obtained by using the Lagrange dual method [15]. The Lagrange function (upper-bound) of (21) is given by
$$L(p_b, \mu, \delta, \nu) = \sum_{i=1}^{M} \log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}}{1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i}}\right) + \sum_{i=1}^{M} \mu_i \left(\log_2\left(1 + \frac{p_{b,i}\tilde{h}_{b,i}}{1 + \sum_{j=i+1}^{M} p_{b,j}\tilde{h}_{b,i}}\right) - R^{\min}_{b,i}\right) + \sum_{i=1}^{M} \delta_i p_{b,i} + \nu\left(\alpha_b P^{\max}_b - \sum_{i=1}^{M} p_{b,i}\right),$$
where $\mu = [\mu_1, \dots, \mu_M]$, $\delta = [\delta_1, \dots, \delta_M]$, and $\nu$ are the Lagrangian multipliers corresponding to the constraints (21c), $p_{b,i} \ge 0$, $i = 1, \dots, M$, and (21b), respectively.
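Before the KKT analysis, the structure of sub-problem (21) can be explored by brute force for $M = 2$. The sketch below scans the power split on a grid and keeps the feasible split with the largest sum rate. The function names, the channel gains, the budget (standing in for $\alpha_b P^{\max}_b$), and the rate floors are illustrative assumptions, not values from the paper.

```python
import math

def rates(p1, p2, h1, h2):
    """Two-user rates under the order 2 -> 1 (user 2 cancels user 1's signal)."""
    r1 = math.log2(1.0 + p1 * h1 / (p2 * h1 + 1.0))
    r2 = math.log2(1.0 + p2 * h2)
    return r1, r2

def grid_search(h1, h2, budget, r_min1, r_min2, steps=20000):
    """Brute-force maximizer of r1 + r2 with p1 + p2 = budget and rate floors.

    Returns (best sum rate, best p1), or None if no grid point is feasible.
    """
    best = None
    for n in range(steps + 1):
        p1 = budget * n / steps
        r1, r2 = rates(p1, budget - p1, h1, h2)
        if r1 >= r_min1 and r2 >= r_min2:
            if best is None or r1 + r2 > best[0]:
                best = (r1 + r2, p1)
    return best

h1, h2, budget = 2.0, 20.0, 1.0          # weaker user 1, stronger user 2
best_rate, best_p1 = grid_search(h1, h2, budget, r_min1=1.0, r_min2=1.0)
r1, r2 = rates(best_p1, budget - best_p1, h1, h2)
print(f"p1={best_p1:.4f}  r1={r1:.4f}  r2={r2:.4f}  sum={best_rate:.4f}")
```

In this example the search settles on the smallest power for user 1 that still meets its rate floor, with the remaining budget going to the stronger user, which is the behavior the KKT analysis of this appendix establishes in general.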
The Lagrange dual problem is given by
$$\min_{\mu, \delta, \nu} \ \sup_{p_b} \ L(p_b, \mu, \delta, \nu) \quad \text{s.t.} \quad \mu_i \ge 0, \ \delta_i \ge 0, \ \forall i = 1, \dots, M.$$
The KKT conditions are listed below:
1) Feasibility of the primal problem (21):
C-1.1: $\log_2\left(1 + \frac{p^*_{b,i}\tilde{h}_{b,i}}{1 + \sum_{j=i+1}^{M} p^*_{b,j}\tilde{h}_{b,i}}\right) \ge R^{\min}_{b,i}, \ \forall i$,
C-1.2: $p^*_{b,i} \ge 0, \ \forall i$,
C-1.3: $\sum_{i=1}^{M} p^*_{b,i} = \alpha_b P^{\max}_b$.
2) Feasibility of the dual problem:
C-2.1: $\mu^*_i \ge 0, \ \forall i = 1, \dots, M$,
C-2.2: $\delta^*_i \ge 0, \ \forall i = 1, \dots, M$,
C-2.3: $\nu^* \in \mathbb{R}$.
3) The complementary slackness condition:
C-3.1: $\mu^*_i \left(\log_2\left(1 + \frac{p^*_{b,i}\tilde{h}_{b,i}}{1 + \sum_{j=i+1}^{M} p^*_{b,j}\tilde{h}_{b,i}}\right) - R^{\min}_{b,i}\right) = 0, \ \forall i = 1, \dots, M$,
C-3.2: $\delta^*_i p^*_{b,i} = 0, \ \forall i = 1, \dots, M$.

4) The condition $\nabla_{p^*_b} L(p^*_b, \mu^*, \delta^*, \nu^*) = 0$, which implies that
C-4: $$\frac{\partial L}{\partial p^*_{b,i}} = \sum_{k=1}^{i} \frac{1 + \mu^*_i}{\ln 2} \cdot \frac{\tilde{h}_{b,k}}{1 + \sum_{j=k}^{M} p^*_{b,j}\tilde{h}_{b,k}} - \sum_{k=1}^{i-1} \frac{1 + \mu^*_i}{\ln 2} \cdot \frac{\tilde{h}_{b,k}}{1 + \sum_{j=k+1}^{M} p^*_{b,j}\tilde{h}_{b,k}} + \delta^*_i - \nu^* = 0, \ \forall i = 1, \dots, M.$$
This equation can be reformulated as
$$\frac{\partial L}{\partial p^*_{b,i}} = \frac{1 + \mu^*_i}{\ln 2} \cdot \frac{\tilde{h}_{b,1}}{1 + \sum_{j=1}^{M} p^*_{b,j}\tilde{h}_{b,1}} + \sum_{k=1}^{i-1} \frac{1 + \mu^*_i}{\ln 2} \left(\frac{\tilde{h}_{b,k+1}}{1 + \sum_{j=k+1}^{M} p^*_{b,j}\tilde{h}_{b,k+1}} - \frac{\tilde{h}_{b,k}}{1 + \sum_{j=k+1}^{M} p^*_{b,j}\tilde{h}_{b,k}}\right) + \delta^*_i - \nu^* = 0, \ \forall i = 1, \dots, M.$$
For convenience, we denote
$$A(p^*_b) = \frac{1}{\ln 2} \cdot \frac{\tilde{h}_{b,1}}{1 + \sum_{j=1}^{M} p^*_{b,j}\tilde{h}_{b,1}}, \qquad B_k(p^*_b) = \frac{1}{\ln 2} \left(\frac{\tilde{h}_{b,k+1}}{1 + \sum_{j=k+1}^{M} p^*_{b,j}\tilde{h}_{b,k+1}} - \frac{\tilde{h}_{b,k}}{1 + \sum_{j=k+1}^{M} p^*_{b,j}\tilde{h}_{b,k}}\right), \ \forall k = 1, \dots, M-1.$$
Then, the last KKT condition can be reformulated as
C-4.1: $$\frac{\partial L}{\partial p^*_{b,i}} = (1 + \mu^*_i) A(p^*_b) + (1 + \mu^*_i) \sum_{k=1}^{i-1} B_k(p^*_b) + \delta^*_i - \nu^* = 0, \ \forall i = 1, \dots, M. \tag{23}$$
The dual variable $\delta^*_i$, $\forall i = 1, \dots, M$, acts as a slack variable in C-4.1 (due to the KKT condition C-2.2), so it can be eliminated by reformulating the KKT conditions (C-4.1, C-2.2) and C-3.2, respectively, as
$$\nu^* \ge (1 + \mu^*_i)\left(A(p^*_b) + \sum_{k=1}^{i-1} B_k(p^*_b)\right), \ \forall i = 1, \dots, M, \tag{24}$$
and
$$p^*_{b,i}\left(\nu^* - (1 + \mu^*_i)\left(A(p^*_b) + \sum_{k=1}^{i-1} B_k(p^*_b)\right)\right) = 0, \ \forall i = 1, \dots, M. \tag{25}$$
Obviously, $A(p^*_b)$ is positive at $p^*_b$. According to Corollary III.1.1, $B_k(p^*_b)$, $k = 1, \dots, M-1$, is also positive at $p^*_b$, since $\tilde{h}_{b,k+1} > \tilde{h}_{b,k}$. To simplify the derivations, in the following we assume that $R^{\min}_{b,i} > 0$, $\forall i = 1, \dots, M$. Then, we show that the derivations are valid for the case that $R^{\min}_{b,i} = 0$ for some $i \in \mathcal{U}_b$. The assumption $R^{\min}_{b,i} > 0$, $\forall i = 1, \dots, M$, implies that $p^*_{b,i} > 0$, $\forall i = 1, \dots, M$, in C-1.2. According to (25), we have
$$\nu^* = (1 + \mu^*_i)\left(A(p^*_b) + \sum_{k=1}^{i-1} B_k(p^*_b)\right), \ \forall i = 1, \dots, M. \tag{26}$$
Consider two adjacent users $i$ and $i+1$. According to (26), we have
$$(1 + \mu^*_i)\left(A(p^*_b) + \sum_{k=1}^{i-1} B_k(p^*_b)\right) = \left(1 + \mu^*_{i+1}\right)\left(A(p^*_b) + \sum_{k=1}^{i} B_k(p^*_b)\right).$$
Since $A(p^*_b) > 0$ and $B_k(p^*_b) > 0$, $\forall k$, we have $A(p^*_b) + \sum_{k=1}^{i} B_k(p^*_b) > A(p^*_b) + \sum_{k=1}^{i-1} B_k(p^*_b)$. Accordingly, the latter equality holds if $\mu^*_{i+1} < \mu^*_i$. Therefore, there exists $\nu^*$ satisfying (26) if $\mu^*_{i+1} < \mu^*_i$ for each $i \in \mathcal{U}_b$. Accordingly, we have the following strict inequalities: $\mu^*_M < \mu^*_{M-1} < \cdots < \mu^*_1$. Based on Condition C-2.1, we have $\mu^*_M \ge 0$, which implies that $\mu^*_i > 0$, $\forall i = 1, \dots, M-1$. According to Condition C-3.1, the optimal power for users $1, \dots, M-1$ can be obtained by
$$\log_2\left(1 + \frac{p^*_{b,i}\tilde{h}_{b,i}}{1 + \sum_{j=i+1}^{M} p^*_{b,j}\tilde{h}_{b,i}}\right) = R^{\min}_{b,i}, \ \forall i = 1, \dots, M-1. \tag{27}$$
According to the power condition C-1.3, the optimal power of user $M$ can be obtained by
$$p^*_{b,M} = \alpha_b P^{\max}_b - \sum_{i=1}^{M-1} p^*_{b,i}. \tag{28}$$
Note that $\mu^*_M > 0$ implies that $\log_2\left(1 + p^*_{b,M}\tilde{h}_{b,M}\right) = R^{\min}_{b,M}$ due to Condition C-3.1, which may violate Condition C-1.3. Therefore, at the optimal point $p^*_{b,M}$ obtained by (28), we have $\mu^*_M = 0$. Additionally, $\nu^*$ can take any value since at the optimal point, the KKT condition C-1.3 holds. From (27), it can be concluded that at the optimal point $p^*_b$, the power allocated to all the users with lower decoding order, i.e., users $i = 1, \dots, M-1$, only maintains their minimum spectral efficiency demand $R^{\min}_{b,i}$. Moreover, (28) proves that only the NOMA cluster-head user $M$ deserves additional power. According to (27), the optimal power for each user $i < M$ can be obtained by
$$p^*_{b,i} = \frac{T_{b,i}\left(1 + \left(\alpha_b P^{\max}_b - \sum_{j=1}^{i-1} p^*_{b,j}\right)\tilde{h}_{b,i}\right)}{1 + T_{b,i}\tilde{h}_{b,i}}, \ \forall i = 1, \dots, M-1, \tag{29}$$
where $T_{b,i} = \frac{2^{R^{\min}_{b,i}} - 1}{\tilde{h}_{b,i}}$, $\forall i = 1, \dots, M-1$. For the case that $R^{\min}_{b,i} \to 0$ for each user $i = 1, \dots, M-1$, we have $T_{b,i} \to 0$. Therefore, $p^*_{b,i} \to 0$, meaning that when the spectral efficiency demand of the weaker user is zero, no power will be allocated to that user. According to (29), $p^*_{b,i}$ depends on the optimal powers $p^*_{b,j}$, $\forall j = 1, \dots, i-1$. Hence, the optimal powers can be directly obtained by calculating $p^*_{b,1} \Rightarrow p^*_{b,2} \Rightarrow \cdots \Rightarrow p^*_{b,M-1}$ by (29), and finally $p^*_{b,M}$ according to (28). To find a closed-form expression for $p^*_{b,i}$, we rewrite (29) as
$$p^*_{b,i} = \beta_{b,i}\left(\alpha_b P^{\max}_b - \sum_{j=1}^{i-1} p^*_{b,j} + \frac{1}{\tilde{h}_{b,i}}\right), \ \forall i = 1, \dots, M-1,$$
where $\beta_{b,i} = \frac{2^{R^{\min}_{b,i}} - 1}{2^{R^{\min}_{b,i}}}$, $\forall i = 1, \dots, M-1$. Then, we have
$$\begin{aligned}
p^*_{b,i} &= \beta_{b,i}\left(\alpha_b P^{\max}_b - p^*_{b,i-1} - \sum_{j=1}^{i-2} p^*_{b,j} + \frac{1}{\tilde{h}_{b,i}}\right) \\
&= \beta_{b,i}\left(\alpha_b P^{\max}_b - \beta_{b,i-1}\left(\alpha_b P^{\max}_b - \sum_{j=1}^{i-2} p^*_{b,j} + \frac{1}{\tilde{h}_{b,i-1}}\right) - \sum_{j=1}^{i-2} p^*_{b,j} + \frac{1}{\tilde{h}_{b,i}}\right) \\
&= \beta_{b,i}\left((1 - \beta_{b,i-1})\alpha_b P^{\max}_b - (1 - \beta_{b,i-1})\sum_{j=1}^{i-2} p^*_{b,j} + \frac{1}{\tilde{h}_{b,i}} - \frac{\beta_{b,i-1}}{\tilde{h}_{b,i-1}}\right) \\
&\ \ \vdots \\
&= \beta_{b,i}\left((1 - \beta_{b,i-1})(1 - \beta_{b,i-2}) \cdots (1 - \beta_{b,1})\alpha_b P^{\max}_b + \frac{1}{\tilde{h}_{b,i}} - \frac{\beta_{b,i-1}}{\tilde{h}_{b,i-1}} - \frac{(1 - \beta_{b,i-1})\beta_{b,i-2}}{\tilde{h}_{b,i-2}} - \cdots - \frac{(1 - \beta_{b,i-1})(1 - \beta_{b,i-2}) \cdots (1 - \beta_{b,2})\beta_{b,1}}{\tilde{h}_{b,1}}\right).
\end{aligned}$$
According to the above, we have
$$p^*_{b,i} = \beta_{b,i}\left(\prod_{j=1}^{i-1}(1 - \beta_{b,j})\,\alpha_b P^{\max}_b + \frac{1}{\tilde{h}_{b,i}} - \sum_{j=1}^{i-1} \frac{\beta_{b,j}\prod_{k=j+1}^{i-1}(1 - \beta_{b,k})}{\tilde{h}_{b,j}}\right), \ \forall i = 1, \dots, M-1.$$
According to (28), the optimal power of the NOMA cluster-head user $M$ can be obtained by
$$p^*_{b,M} = \alpha_b P^{\max}_b - \sum_{i=1}^{M-1} \beta_{b,i}\left(\prod_{j=1}^{i-1}(1 - \beta_{b,j})\,\alpha_b P^{\max}_b + \frac{1}{\tilde{h}_{b,i}} - \sum_{j=1}^{i-1} \frac{\beta_{b,j}\prod_{k=j+1}^{i-1}(1 - \beta_{b,k})}{\tilde{h}_{b,j}}\right).$$

APPENDIX C
CLOSED-FORM EXPRESSION OF OPTIMAL POWERS FOR TOTAL POWER MINIMIZATION PROBLEM
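The total power minimization result treated in this appendix, namely that every user ends up exactly at its minimum rate, lends itself to a short numerical sketch. The code below computes the powers by back-substitution starting from the cluster-head user, recomputes them from the sum obtained by unrolling that recursion, and then evaluates the resulting rates. The function names, the normalized gains `h`, and the rate demands `r_min` are hypothetical; users are indexed from 0, with the last index being the cluster head.

```python
import math

def min_power_backsub(h, r_min):
    """Back-substitution p_i = beta_i * (1/h_i + sum_{j>i} p_j), from the cluster head down."""
    p = [0.0] * len(h)
    tail = 0.0                          # running sum of p_j over stronger users j > i
    for i in range(len(h) - 1, -1, -1):
        beta = 2.0 ** r_min[i] - 1.0
        p[i] = beta * (1.0 / h[i] + tail)
        tail += p[i]
    return p

def min_power_unrolled(h, r_min):
    """Same powers from the unrolled sum over stronger users."""
    beta = [2.0 ** r - 1.0 for r in r_min]
    p = []
    for i in range(len(h)):
        acc = 1.0 / h[i]
        prod = 1.0                      # product of (1 + beta_k) for i < k < j
        for j in range(i + 1, len(h)):
            acc += beta[j] * prod / h[j]
            prod *= 1.0 + beta[j]
        p.append(beta[i] * acc)
    return p

def rate(i, p, h):
    """Rate of user i when all stronger users' signals act as interference."""
    interf = sum(p[i + 1:]) * h[i]
    return math.log2(1.0 + p[i] * h[i] / (interf + 1.0))

h = [1.0, 4.0, 9.0, 25.0]               # ascending normalized gains (illustrative)
r_min = [1.0, 0.5, 2.0, 1.0]            # minimum rates in bps/Hz (illustrative)
p_rec = min_power_backsub(h, r_min)
p_sum = min_power_unrolled(h, r_min)
print(p_rec, sum(p_rec))
print([round(rate(i, p_rec, h), 6) for i in range(len(h))])
```

Because the power budget never appears in either routine, feasibility with respect to the budget is checked afterwards by comparing the total power with the available maximum.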

Here, we first obtain the closed-form expression of optimal powers for an $M$-user single-cell NOMA system under the CNR-based decoding order. Then, we extend the results to the case that ICI is fixed in cell $b$ serving $M$ users and find the closed-form expressions of powers in $p^*_b$ under the CINR-based decoding order.
In the power minimization problem of an $M$-user single-cell NOMA system with $\tilde{h}_1 < \tilde{h}_2 < \cdots < \tilde{h}_M$, and thus the optimal (CNR-based) decoding order $M \to M-1 \to \cdots \to 1$, the achievable spectral efficiency of user $i$ can be obtained by
$$R_i(p) = \log_2\left(1 + \frac{p_i\tilde{h}_i}{\sum_{j=i+1}^{M} p_j\tilde{h}_i + 1}\right).$$
Here, $\tilde{h}_i = \frac{h_i}{\sigma_i}$ is the channel gain of user $i$ normalized by its noise power $\sigma_i$. The total power minimization problem under the CNR-based decoding order can be formulated by
$$\min_{p \ge 0} \ \sum_{i=1}^{M} p_i \tag{30a}$$
$$\text{s.t.} \quad \sum_{i=1}^{M} p_i \le P^{\max}, \tag{30b}$$
$$\log_2\left(1 + \frac{p_i\tilde{h}_i}{\sum_{j=i+1}^{M} p_j\tilde{h}_i + 1}\right) \ge R^{\min}_i, \ \forall i = 1, \dots, M. \tag{30c}$$
The minimum rate constraint (30c) can be rewritten as
$$p_i\tilde{h}_i \ge \left(2^{R^{\min}_i} - 1\right)\left(\sum_{j=i+1}^{M} p_j\tilde{h}_i + 1\right), \ \forall i = 1, \dots, M,$$
which is affine in $p$. Hence, problem (30) is convex in $p$ with a feasible set defined by affine constraints. It can be shown that in the power minimization problem, the power allocated to each user only maintains its minimal rate demand. In the following, we prove this proposition by analyzing the KKT conditions. Slater's condition holds in (30) since it is convex and there exists $p \ge 0$ satisfying (30b) and (30c) with strict inequalities. Therefore, strong duality holds in (30). Hence, the KKT conditions are satisfied and the optimal solution $p^*$ can be obtained by using the Lagrange dual method [15]. The Lagrange function (lower-bound) of (30) is given by
$$L(p, \mu, \delta, \nu) = \sum_{i=1}^{M} p_i + \sum_{i=1}^{M} \mu_i\left(R^{\min}_i - \log_2\left(1 + \frac{p_i\tilde{h}_i}{1 + \sum_{j=i+1}^{M} p_j\tilde{h}_i}\right)\right) + \sum_{i=1}^{M} \delta_i(-p_i) + \nu\left(\sum_{i=1}^{M} p_i - P^{\max}\right),$$
where $\mu = [\mu_1, \dots, \mu_M]$, $\delta = [\delta_1, \dots, \delta_M]$, and $\nu$ are the Lagrangian multipliers corresponding to the constraints (30c), $p_i \ge 0$, $i = 1, \dots, M$, and (30b), respectively. Since (30) is a minimization problem and $L$ lower-bounds its objective, the Lagrange dual problem is given by
$$\max_{\mu, \delta, \nu} \ \inf_{p} \ L(p, \mu, \delta, \nu) \quad \text{s.t.} \quad \mu_i \ge 0, \ \delta_i \ge 0, \ \forall i = 1, \dots, M, \quad \nu \ge 0.$$
The KKT conditions are listed below.
1) Feasibility of the primal problem (30):
C-1.1: $\log_2\left(1 + \frac{p^*_i\tilde{h}_i}{1 + \sum_{j=i+1}^{M} p^*_j\tilde{h}_i}\right) \ge R^{\min}_i, \ \forall i$,
C-1.2: $p^*_i \ge 0, \ \forall i$,
C-1.3: $\sum_{i=1}^{M} p^*_i \le P^{\max}$.
2) Feasibility of the dual problem:
C-2.1: $\mu^*_i \ge 0, \ \forall i = 1, \dots, M$,
C-2.2: $\delta^*_i \ge 0, \ \forall i = 1, \dots, M$,
C-2.3: $\nu^* \ge 0$.
3) The complementary slackness condition:
C-3.1: $\mu^*_i\left(R^{\min}_i - \log_2\left(1 + \frac{p^*_i\tilde{h}_i}{1 + \sum_{j=i+1}^{M} p^*_j\tilde{h}_i}\right)\right) = 0, \ \forall i = 1, \dots, M$,
C-3.2: $\delta^*_i p^*_i = 0, \ \forall i = 1, \dots, M$,
C-3.3: $\nu^*\left(\sum_{i=1}^{M} p^*_i - P^{\max}\right) = 0$.

4) The condition $\nabla_{\boldsymbol{p}^*} L(\boldsymbol{p}^*, \boldsymbol{\mu}^*, \boldsymbol{\delta}^*, \nu^*) = 0$, which implies that

C-4: $$\frac{\partial L}{\partial p_i^*} = 1 - \sum_{j=1}^{i-1} \mu_j^* \left(2^{R_j^{\min}} - 1\right)\tilde{h}_j - \mu_i^* \tilde{h}_i - \delta_i^* + \nu^* = 0, \quad \forall i = 1, \dots, M.$$

Let $B_j = \left(2^{R_j^{\min}} - 1\right)\tilde{h}_j$, $j = 1, \dots, M$. The latter equation is rewritten as

C-4.1: $$\frac{\partial L}{\partial p_i^*} = 1 - \sum_{j=1}^{i-1} \mu_j^* B_j - \mu_i^* \tilde{h}_i - \delta_i^* + \nu^* = 0, \quad \forall i = 1, \dots, M.$$

The dual variable $\delta_i^*$, $\forall i = 1, \dots, M$, acts as a slack variable in C-4.1 (due to the KKT condition C-2.2), so it can be eliminated by reformulating the KKT conditions (C-4.1, C-2.2) and C-3.2, respectively, as

$$\nu^* \ge \sum_{j=1}^{i-1} \mu_j^* B_j + \mu_i^* \tilde{h}_i - 1, \quad \forall i = 1, \dots, M, \tag{32}$$

and

$$p_i^* \left( \nu^* - \left( \sum_{j=1}^{i-1} \mu_j^* B_j + \mu_i^* \tilde{h}_i - 1 \right) \right) = 0, \quad \forall i = 1, \dots, M. \tag{33}$$

We first assume that $R_i^{\min} > 0$, $\forall i = 1, \dots, M$. Then, we show that the derivations are valid for $R_i^{\min} = 0$ for some $i$. The assumption $R_i^{\min} > 0$, $\forall i = 1, \dots, M$, implies that $p_i^* > 0$, $\forall i = 1, \dots, M$, in C-1.2. According to (33), we have

$$\nu^* = \sum_{j=1}^{i-1} \mu_j^* B_j + \mu_i^* \tilde{h}_i - 1, \quad \forall i = 1, \dots, M. \tag{34}$$

Consider two adjacent users $i$ and $i+1$. According to (34), we have

$$\sum_{j=1}^{i-1} \mu_j^* B_j + \mu_i^* \tilde{h}_i = \left( \sum_{j=1}^{i-1} \mu_j^* B_j + \mu_i^* B_i \right) + \mu_{i+1}^* \tilde{h}_{i+1},$$

which can be simplified to

$$\mu_i^* \tilde{h}_i = \mu_i^* B_i + \mu_{i+1}^* \tilde{h}_{i+1} \;\Rightarrow\; \mu_i^* \left( \tilde{h}_i - B_i \right) = \mu_{i+1}^* \tilde{h}_{i+1}.$$

In the following, we prove that $\mu_{i+1}^* < \mu_i^*$. Suppose, to the contrary, that $\mu_{i+1}^* \ge \mu_i^*$. It implies that $\tilde{h}_{i+1} \le \tilde{h}_i - B_i$, which is equivalent to $\tilde{h}_{i+1} + B_i \le \tilde{h}_i$. Since $B_i > 0$, it results in $\tilde{h}_{i+1} \le \tilde{h}_i$, which violates our assumption $\tilde{h}_{i+1} > \tilde{h}_i$. Accordingly, (34) holds if $\mu_{i+1}^* < \mu_i^*$ for each two adjacent users $i$ and $i+1$. Hence, there exists $\nu^*$ satisfying (34) if $\mu_M^* < \mu_{M-1}^* < \dots < \mu_1^*$. According to Condition C-2.1, we have $\mu_M^* \ge 0$, which implies that $\mu_i^* > 0$, $\forall i = 1, \dots, M-1$.

The optimal Lagrangian multiplier $\mu_M^*$ of the NOMA cluster-head user is also positive. This is due to the fact that $r_M(p_M^*) = \log_2(1 + p_M^* \tilde{h}_M)$ in (30c) is monotonically increasing in $p_M^*$, and also independent of the other optimal powers. Hence, at the optimal point, which corresponds to the minimal $p_M^*$, the spectral efficiency $r_M(p_M^*)$ reaches its lower bound $R_M^{\min}$, i.e., $\log_2(1 + p_M^* \tilde{h}_M) = R_M^{\min}$. According to the KKT condition C-3.1, $\mu_M^* > 0$. As a result, we have $0 < \mu_M^* < \mu_{M-1}^* < \dots < \mu_1^*$. According to Condition C-3.1, the optimal power for each user $i = 1, \dots, M$ can be obtained from

$$\log_2\left( 1 + \frac{p_i^* \tilde{h}_i}{\sum_{j=i+1}^{M} p_j^* \tilde{h}_i + 1} \right) = R_i^{\min}, \quad \forall i = 1, \dots, M. \tag{35}$$

It is noteworthy that the duality gap between the primal and dual problems is zero when Slater's condition holds. This condition implies that there exists $\boldsymbol{p}$ such that the KKT condition C-1.3 holds with strict inequality, meaning that the feasible region of (30) with the strict power constraint $\sum_{i=1}^{M} p_i < P^{\max}$ is nonempty. Since $\sum_{i=1}^{M} p_i = P^{\max}$ corresponds to the maximum value of the objective function (30a), satisfying Slater's condition ensures that $\sum_{i=1}^{M} p_i^* < P^{\max}$. According to Condition C-3.3, we have $\nu^* = 0$. (Footnote: The discussion of the optimal $\nu^* = 0$ is only an additional note on the impact of the power constraint. We proved that for any nonempty feasible set satisfying Slater's condition, the power constraint will not be active.)

According to the above, it can be concluded that at the optimal point $\boldsymbol{p}^*$, the power allocated to each user $i$ only maintains its minimum spectral efficiency demand $R_i^{\min}$. According to (27), the optimal power (in Watts) for each user $i$ can be obtained by

$$p_i^* = T_i \left( \sum_{j=i+1}^{M} p_j^* \tilde{h}_i + 1 \right), \quad \forall i = 1, \dots, M, \tag{36}$$

where $T_i = \frac{2^{R_i^{\min}} - 1}{\tilde{h}_i}$, $\forall i = 1, \dots, M$. For the case that $R_i^{\min} \to 0$, we have $T_i \to 0$. Therefore, $p_i^* \to 0$, meaning that no power will be allocated to user $i$. Similar to (29), it can be easily shown that the optimal powers can be obtained directly by (36). To obtain a closed-form expression for $p_i^*$, we rewrite (36) as

$$p_i^* = \beta_i \left( \frac{1}{\tilde{h}_i} + \sum_{j=i+1}^{M} p_j^* \right), \quad \forall i = 1, \dots, M,$$

where $\beta_i = 2^{R_i^{\min}} - 1$, $\forall i = 1, \dots, M$. The optimal power $p_i^*$ can be reformulated as

$$\begin{aligned}
p_i^* &= \beta_i \left( \frac{1}{\tilde{h}_i} + \sum_{j=i+1}^{M} p_j^* \right) \\
&= \beta_i \left( \frac{1}{\tilde{h}_i} + p_{i+1}^* + \sum_{j=i+2}^{M} p_j^* \right) \\
&= \beta_i \left( \frac{1}{\tilde{h}_i} + \beta_{i+1} \left( \frac{1}{\tilde{h}_{i+1}} + \sum_{j=i+2}^{M} p_j^* \right) + \sum_{j=i+2}^{M} p_j^* \right) \\
&= \beta_i \left( (1 + \beta_{i+1}) \sum_{j=i+2}^{M} p_j^* + \frac{1}{\tilde{h}_i} + \frac{\beta_{i+1}}{\tilde{h}_{i+1}} \right) \\
&= \beta_i \left( (1 + \beta_{i+1}) \left( p_{i+2}^* + \sum_{j=i+3}^{M} p_j^* \right) + \frac{1}{\tilde{h}_i} + \frac{\beta_{i+1}}{\tilde{h}_{i+1}} \right) \\
&= \beta_i \left( (1 + \beta_{i+1}) \left( \beta_{i+2} \left( \frac{1}{\tilde{h}_{i+2}} + \sum_{j=i+3}^{M} p_j^* \right) + \sum_{j=i+3}^{M} p_j^* \right) + \frac{1}{\tilde{h}_i} + \frac{\beta_{i+1}}{\tilde{h}_{i+1}} \right) \\
&= \beta_i \left( (1 + \beta_{i+1})(1 + \beta_{i+2}) \sum_{j=i+3}^{M} p_j^* + \frac{1}{\tilde{h}_i} + \frac{\beta_{i+1}}{\tilde{h}_{i+1}} + \frac{\beta_{i+2}(1 + \beta_{i+1})}{\tilde{h}_{i+2}} \right) \\
&\;\;\vdots \\
&= \beta_i \left( \frac{1}{\tilde{h}_i} + \frac{\beta_{i+1}}{\tilde{h}_{i+1}} + \frac{\beta_{i+2}(1 + \beta_{i+1})}{\tilde{h}_{i+2}} + \dots + \frac{\beta_M (1 + \beta_{M-1}) \cdots (1 + \beta_{i+1})}{\tilde{h}_M} \right),
\end{aligned}$$

where the last step uses the fact that, after the expansion reaches user $M$, the residual sum $\sum_{j=M+1}^{M} p_j^*$ multiplying $(1 + \beta_{i+1}) \cdots (1 + \beta_M)$ is empty. According to the above, we have

$$p_i^* = \beta_i \left( \frac{1}{\tilde{h}_i} + \sum_{j=i+1}^{M} \frac{\beta_j \prod_{k=i+1}^{j-1} (1 + \beta_k)}{\tilde{h}_j} \right), \quad \forall i = 1, \dots, M. \tag{37}$$

In multi-cell NOMA, let cell $b$ have $M$ users with $\tilde{h}_{b,1} < \tilde{h}_{b,2} < \dots < \tilde{h}_{b,M}$, where $\tilde{h}_{b,i} = \frac{h_{b,i}}{I_{b,i} + \sigma_{b,i}}$. According to Corollary III.1.1, the achievable spectral efficiency of each user $i \in \mathcal{U}_b$ under the optimal decoding order $M \to M-1 \to \dots \to 1$ is

$$\tilde{R}_{b,i}(\boldsymbol{p}) = \log_2\left( 1 + \frac{p_{b,i} \tilde{h}_{b,i}}{\sum_{j=i+1}^{M} p_{b,j} \tilde{h}_{b,i} + 1} \right).$$

For the case that the ICI is fixed, the power minimization problem of cell $b$ under the optimal (CINR-based) decoding order corresponds to the power minimization problem of single-cell NOMA. According to (37), for the case that $\tilde{h}_{b,1} < \tilde{h}_{b,2} < \dots < \tilde{h}_{b,M}$, the optimal power of each user $i \in \mathcal{U}_b$ can be obtained in closed form as

$$p_{b,i}^* = \beta_{b,i} \left( \frac{1}{\tilde{h}_{b,i}} + \sum_{j=i+1}^{M} \frac{\beta_{b,j} \prod_{k=i+1}^{j-1} (1 + \beta_{b,k})}{\tilde{h}_{b,j}} \right), \quad \forall i = 1, \dots, M.$$

APPENDIX D
JOINT POWER ALLOCATION AND RATE ADOPTION ALGORITHM
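Before the JRPA derivation, the closed-form power minimization result of Appendix C can be sanity-checked numerically. The following is a minimal sketch, assuming normalized gains sorted in ascending order (weakest user first) and rates in bit/s/Hz. The function names and the sample values of `h` and `r_min` are hypothetical and only serve the illustration. It evaluates the backward recursion (36), unrolls it into sums and products, and confirms that every user meets its minimum rate with equality.

```python
import math


def beta(r_min):
    """SINR target 2^R - 1 for a minimum rate R in bit/s/Hz."""
    return 2.0 ** r_min - 1.0


def powers_by_recursion(h, r_min):
    """Backward recursion p_i = beta_i * (1/h_i + sum_{j>i} p_j), cf. (36)."""
    M = len(h)
    p = [0.0] * M
    tail = 0.0  # running sum of p_j over users j > i
    for i in range(M - 1, -1, -1):
        p[i] = beta(r_min[i]) * (1.0 / h[i] + tail)
        tail += p[i]
    return p


def powers_unrolled(h, r_min):
    """Same powers with the recursion unrolled into sums and products."""
    M = len(h)
    b = [beta(r) for r in r_min]
    p = []
    for i in range(M):
        acc = 1.0 / h[i]
        prod = 1.0  # product of (1 + beta_k) for k = i+1 .. j-1
        for j in range(i + 1, M):
            acc += b[j] * prod / h[j]
            prod *= 1.0 + b[j]
        p.append(b[i] * acc)
    return p


def rates(h, p):
    """Rates log2(1 + p_i h_i / (h_i * sum_{j>i} p_j + 1)) under SIC."""
    M = len(h)
    return [
        math.log2(1.0 + p[i] * h[i] / (h[i] * sum(p[i + 1:]) + 1.0))
        for i in range(M)
    ]


# Hypothetical example values (assumptions, not taken from the paper).
h = [1.0, 4.0, 16.0]      # normalized channel gains, weakest user first
r_min = [1.0, 1.0, 2.0]   # minimum rates in bit/s/Hz

p = powers_by_recursion(h, r_min)
print("powers:", p)
print("rates: ", rates(h, p))
print("total power:", sum(p))
```

Comparing `sum(p)` with the power budget gives a quick feasibility test for a given set of rate demands, under the same fixed-interference assumption.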

By taking $\ln$ of both sides of (14c), we have

$$\ln\left(2^{r_{b,i}} - 1\right) + \ln\left( \sum_{j \in \Phi_{b,i}} p_{b,j} h_{b,k} + I_{b,k} + \sigma_{b,k} \right) \le \ln\left( p_{b,i} h_{b,k} \right), \quad \forall b \in \mathcal{B},\ i, k \in \mathcal{U}_b,\ k \in \{i\} \cup \Phi_{b,i}.$$

Now, let $p_{b,i} = e^{\tilde{p}_{b,i}}$, and subsequently $I_{b,i}(\tilde{\boldsymbol{p}}_{-b}) = \sum_{j \in \mathcal{B},\, j \neq b} \left( \sum_{l \in \mathcal{U}_j} e^{\tilde{p}_{j,l}} \right) h_{j,b,i}$. Accordingly, problem (3) can be rewritten as

$$\max_{\tilde{\boldsymbol{p}},\, \boldsymbol{r} \ge 0} \ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} r_{b,i} \tag{38a}$$

s.t. (14b),

$$\sum_{i \in \mathcal{U}_b} e^{\tilde{p}_{b,i}} \le P_b^{\max}, \quad \forall b \in \mathcal{B}, \tag{38b}$$

$$\ln\left(2^{r_{b,i}} - 1\right) + \ln\left( \sum_{j \in \Phi_{b,i}} e^{\tilde{p}_{b,j}} h_{b,k} + \sum_{j \in \mathcal{B},\, j \neq b} \left( \sum_{l \in \mathcal{U}_j} e^{\tilde{p}_{j,l}} \right) h_{j,b,k} + \sigma_{b,k} \right) \le \tilde{p}_{b,i} + \ln\left( h_{b,k} \right), \quad \forall b \in \mathcal{B},\ i, k \in \mathcal{U}_b,\ k \in \{i\} \cup \Phi_{b,i}. \tag{38c}$$

The objective function (38a) is affine, so it is concave in $\boldsymbol{r}$. Constraint (14b) is affine, so it is convex. Constraint (38b) is also convex since log-sum-exp is convex [15]. However, (38c) is nonconvex. On the left-hand side of (38c), it can be easily shown that the first term $\ln\left(2^{r_{b,i}} - 1\right)$ is strictly concave in $r_{b,i}$, which makes (38) nonconvex and strongly NP-hard. Now, we apply the iterative sequential programming method. At each iteration $t$, we approximate the term $g\left(r_{b,i}^{(t)}\right) = \ln\left(2^{r_{b,i}^{(t)}} - 1\right)$ by its first-order Taylor series around $r_{b,i}^{(t-1)}$, obtained from the prior iteration $(t-1)$, as follows:

$$\hat{g}\left(r_{b,i}^{(t)}\right) = g\left(r_{b,i}^{(t-1)}\right) + g'\left(r_{b,i}^{(t-1)}\right) \left( r_{b,i}^{(t)} - r_{b,i}^{(t-1)} \right), \tag{39}$$

where $g'(r) = \frac{2^{r} \ln 2}{2^{r} - 1}$. By substituting $g\left(r_{b,i}^{(t)}\right)$ with its affine approximation $\hat{g}\left(r_{b,i}^{(t)}\right)$ in (39), problem (38) at iteration $t$ is approximated by the following convex problem:

$$\max_{\tilde{\boldsymbol{p}}^{(t)},\, \boldsymbol{r}^{(t)} \ge 0} \ \sum_{b \in \mathcal{B}} \sum_{i \in \mathcal{U}_b} r_{b,i}^{(t)} \tag{40a}$$

s.t. (14b), (38b),

$$\hat{g}\left(r_{b,i}^{(t)}\right) + \ln\left( \sum_{j \in \Phi_{b,i}} e^{\tilde{p}_{b,j}^{(t)}} h_{b,k} + \sum_{j \in \mathcal{B},\, j \neq b} \left( \sum_{l \in \mathcal{U}_j} e^{\tilde{p}_{j,l}^{(t)}} \right) h_{j,b,k} + \sigma_{b,k} \right) \le \tilde{p}_{b,i}^{(t)} + \ln\left( h_{b,k} \right), \quad \forall b \in \mathcal{B},\ i, k \in \mathcal{U}_b,\ k \in \{i\} \cup \Phi_{b,i}. \tag{40b}$$

It can be shown that (40) satisfies the KKT conditions [29], so it can be solved by using the Lagrange dual method or IPMs [15]. In the sequential programming, we first initialize $\boldsymbol{r}^{(0)}$. At each iteration $t$, we solve (40) and find $\left( \boldsymbol{r}^{*(t)}, \tilde{\boldsymbol{p}}^{*(t)} \right)$ according to the updated $\hat{g}\left(r_{b,i}^{(t)}\right)$ based on $\boldsymbol{r}^{*(t-1)}$. We continue the iterations until convergence is achieved.

The solution of (40) remains in the feasible region of the main problem (38).
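The role of the first-order surrogate can be illustrated on a toy scalar case. The sketch below is not the full problem (40): it assumes fixed powers, so that the rate constraint of a single user reduces to $g(r) \le c$ with $c = \ln(\mathrm{SINR})$, and the surrogate problem has a one-line solution. The function names, the SINR value, and the starting point `r0` are hypothetical. It checks that the tangent never falls below $g$ and that the iterates increase toward $\log_2(1 + \mathrm{SINR})$ without leaving the feasible region.

```python
import math


def g(r):
    """g(r) = ln(2^r - 1), defined for r > 0."""
    return math.log(2.0 ** r - 1.0)


def g_prime(r):
    """Derivative of g: ln(2) * 2^r / (2^r - 1)."""
    return math.log(2.0) * 2.0 ** r / (2.0 ** r - 1.0)


def g_hat(r, r_prev):
    """First-order Taylor expansion of g around r_prev."""
    return g(r_prev) + g_prime(r_prev) * (r - r_prev)


def sca_rate(sinr, r0=0.1, iters=30):
    """Toy sequential programming: maximize r subject to g_hat(r) <= ln(sinr).

    r0 must be feasible, i.e. g(r0) <= ln(sinr). Each surrogate problem is
    solved in closed form because g_hat is affine and increasing in r.
    """
    c = math.log(sinr)
    r = r0
    history = [r]
    for _ in range(iters):
        r = r + (c - g(r)) / g_prime(r)  # largest r with g_hat(r) <= c
        history.append(r)
    return history


# The tangent of a concave function lies on or above the function.
r_prev = 1.5  # hypothetical rate from the previous iteration
grid = [0.1 * n for n in range(1, 61)]
gaps = [g_hat(r, r_prev) - g(r) for r in grid]

# Hypothetical SINR of 3, for which the capacity is log2(1 + 3) = 2 bit/s/Hz.
history = sca_rate(sinr=3.0)
print("smallest surrogate gap:", min(gaps))
print("first iterates:", history[:5])
print("final rate:", history[-1])
```

In the full algorithm each surrogate problem is a convex program over both rates and powers, so a convex solver replaces the closed-form update used in this toy case.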
Feasibility is preserved because, at each iteration $t$, we have $\hat{g}\left(r_{b,i}^{(t)}\right) \ge g\left(r_{b,i}^{(t)}\right)$, since the tangent of a concave function lies on or above the function. Let $\left( \hat{\boldsymbol{r}}^{(t)}, \hat{\tilde{\boldsymbol{p}}}^{(t)} \right)$ be a feasible solution of (40). It implies that (40b) is satisfied. Thus, we have

$$\hat{\tilde{p}}_{b,i}^{(t)} + \ln\left( h_{b,k} \right) - \ln\left( \sum_{j \in \Phi_{b,i}} e^{\hat{\tilde{p}}_{b,j}^{(t)}} h_{b,k} + \sum_{j \in \mathcal{B},\, j \neq b} \left( \sum_{l \in \mathcal{U}_j} e^{\hat{\tilde{p}}_{j,l}^{(t)}} \right) h_{j,b,k} + \sigma_{b,k} \right) \ge \hat{g}\left( \hat{r}_{b,i}^{(t)} \right), \quad \forall b \in \mathcal{B},\ i, k \in \mathcal{U}_b,\ k \in \{i\} \cup \Phi_{b,i}.$$

Since $\hat{g}\left(r_{b,i}^{(t)}\right) \ge g\left(r_{b,i}^{(t)}\right)$, we can guarantee that (38c) is satisfied, meaning that the solution of (40) remains in the feasible region of (38). It can be easily shown that the sequential programming generates a sequence of improved feasible solutions that converges to a stationary point, which is a local maximum of (38) [29]. The performance and convergence of this algorithm are numerically evaluated in Subsection IV-E.

APPENDIX E
PROOF OF THEOREM

For each user pair $i, k \in \mathcal{U}_b$, if (4) holds at any power level $\boldsymbol{p}_{-b}$, the decoding order $k \to i$ is optimal. Note that (4) for cell $b$ is completely independent of $\boldsymbol{p}_b$ (see Proposition III.1). Let us rewrite (4) for the user pair $i, k \in \mathcal{U}_b$ as

$$\frac{h_{b,k}}{\sigma_{b,k}} - \frac{h_{b,i}}{\sigma_{b,i}} \ge \sum_{j \in \mathcal{B},\, j \neq b} \frac{\sum_{u \in \mathcal{U}_j} p_{j,u}}{\sigma_{b,i} \sigma_{b,k}} \left( h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} \right). \tag{41}$$

For the case that the left-hand side (LHS) is non-negative and each term on the right-hand side (RHS) is non-positive, (41) is satisfied at any power level $\boldsymbol{p}_{-b}$ [20]. These conditions imply that with $\frac{h_{b,k}}{\sigma_{b,k}} \ge \frac{h_{b,i}}{\sigma_{b,i}}$ and $\frac{h_{b,k}}{h_{b,i}} \ge \frac{h_{j,b,k}}{h_{j,b,i}}$, $\forall j \in \mathcal{B} \setminus \{b\}$, the decoding order $k \to i$ is optimal. Now, assume that for the BSs in $\mathcal{Q}_{b,i,k} \subseteq \mathcal{B} \setminus \{b\}$ we have $h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} > 0$. Moreover, assume that each BS $j \in \mathcal{Q}_{b,i,k}$ operates at its maximum power $P_j^{\max}$, meaning that $\sum_{u \in \mathcal{U}_j} p_{j,u} = P_j^{\max}$, $\forall j \in \mathcal{Q}_{b,i,k}$.
In the following, we show that if

$$\frac{h_{b,k}}{\sigma_{b,k}} - \frac{h_{b,i}}{\sigma_{b,i}} \ge \sum_{j \in \mathcal{Q}_{b,i,k}} \frac{P_j^{\max}}{\sigma_{b,i} \sigma_{b,k}} \left( h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} \right),$$

the inequality (41) holds for any power level $\boldsymbol{p}_{-b}$. To prove this, we first note that the inequality

$$\sum_{j \in \mathcal{Q}_{b,i,k}} \frac{P_j^{\max}}{\sigma_{b,i} \sigma_{b,k}} \left( h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} \right) \ge \sum_{j \in \mathcal{Q}_{b,i,k}} \frac{\sum_{u \in \mathcal{U}_j} p_{j,u}}{\sigma_{b,i} \sigma_{b,k}} \left( h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} \right) \tag{42}$$

always holds, since $\left( h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} \right)$ is positive for each BS $j \in \mathcal{Q}_{b,i,k}$, and $\sum_{u \in \mathcal{U}_j} p_{j,u} \le P_j^{\max}$, $\forall j \in \mathcal{Q}_{b,i,k}$. On the other hand, for each BS $j' \in \mathcal{B} \setminus \{\mathcal{Q}_{b,i,k} \cup \{b\}\}$, we have $\left( h_{j',b,k} h_{b,i} - h_{j',b,i} h_{b,k} \right) \le 0$. Accordingly, if $\frac{h_{b,k}}{\sigma_{b,k}} \ge \frac{h_{b,i}}{\sigma_{b,i}}$ and (42) holds, the inequality in (41) holds for any power level $\boldsymbol{p}_{-b}$, so the decoding order $k \to i$ is optimal at any feasible $\boldsymbol{p}_{-b}$.

APPENDIX F
PROOF OF FACT III.2.1

For convenience, let all the channel gains be normalized by noise. According to (41), for each user pair $i, k \in \mathcal{U}_b$, $k \in \Phi_{b,i}$, (17) can be rewritten as the following linear constraint:

$$h_{b,k} - h_{b,i} \ge \sum_{j \in \mathcal{B},\, j \neq b} \left( \sum_{u \in \mathcal{U}_j} p_{j,u} \right) \left( h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k} \right). \tag{43}$$

Constraint (43) can be equivalently transformed to $(B - 1)$ maximum power constraints including all the BSs in $\mathcal{B} \setminus \{b\}$. The power consumption of each BS $j \in \mathcal{B} \setminus \{b\}$ is upper-bounded by

$$\sum_{u \in \mathcal{U}_j} p_{j,u} \le \underbrace{\frac{\left( h_{b,k} - h_{b,i} \right) - \sum_{l \in \mathcal{B} \setminus \{j,b\}} \left( \sum_{u \in \mathcal{U}_l} p_{l,u} \right) \left( h_{l,b,k} h_{b,i} - h_{l,b,i} h_{b,k} \right)}{h_{j,b,k} h_{b,i} - h_{j,b,i} h_{b,k}}}_{\Psi_{j,b,i,k}\left( \boldsymbol{p}_{-(j,b)} \right)}. \tag{44}$$

In general, the maximum power constraint (3b) and the SIC constraint (17) can be equivalently combined as

$$\sum_{i \in \mathcal{U}_b} p_{b,i} \le \min\left\{ P_b^{\max},\ \min_{\substack{j \in \mathcal{B} \setminus \{b\}, \\ i, k \in \mathcal{U}_j,\ k \in \Phi_{j,i}}} \Psi_{b,j,i,k}\left( \boldsymbol{p}_{-(b,j)} \right) \right\}, \quad \forall b \in \mathcal{B}. \tag{45}$$

The negative side impact of the SIC necessary constraint (17) can be observed in (45). For the case that $\min_{j \in \mathcal{B} \setminus \{b\},\, i, k \in \mathcal{U}_j,\, k \in \Phi_{j,i}} \Psi_{b,j,i,k}\left( \boldsymbol{p}_{-(b,j)} \right) < P_b^{\max}$, (17) imposes additional limitations on the total power consumption of BS $b$, which restricts the feasible region of (3b).

APPENDIX G
PROOF OF FACT III.2.2

In the sum-rate maximization problem of an $M$-user single-cell NOMA system with normalized channel gains $h_1 < h_2 < \dots < h_M$, and thus optimal (CNR-based) decoding order $M \to M-1 \to \dots \to 1$, the achievable spectral efficiency of user $i$ is

$$R_i(\boldsymbol{p}) = \log_2\left( 1 + \frac{p_i h_i}{\sum_{j=i+1}^{M} p_j h_i + 1} \right).$$

The sum-rate function $\sum_{i=1}^{M} \log_2\left( 1 + \frac{p_i h_i}{\sum_{j=i+1}^{M} p_j h_i + 1} \right)$ is monotonically increasing in $p_1$, i.e., the power allocated to the user with the lowest decoding order. In fact, the signal of the user with the lowest decoding order is decoded and canceled by all the other users $i = 2, \dots, M$, so $p_1$ appears only in $R_1(\boldsymbol{p})$. Due to the power constraint $\sum_{i=1}^{M} p_i \le P^{\max}$, and the fact that $R_1(\boldsymbol{p})$ is monotonically increasing in $p_1$, at the optimal point $\boldsymbol{p}^*$ we have $p_1^* = P^{\max} - \sum_{i=2}^{M} p_i^*$. In other words, at the optimal point we have $\sum_{i=1}^{M} p_i^* = P^{\max}$, and the proof is completed.

REFERENCES

[1] H. Weingarten, Y. Steinberg, and S. S. Shamai, “The capacity region of the Gaussian multiple-input multiple-output broadcast channel,” IEEE Transactions on Information Theory, vol. 52, no. 9, pp. 3936–3964, 2006.
[2] A. E. Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
[3] Y. Saito, Y. Kishiyama, A. Benjebbour, T. Nakamura, A. Li, and K. Higuchi, “Non-orthogonal multiple access (NOMA) for cellular future radio access,” in Proc. IEEE 77th Vehicular Technology Conference (VTC Spring), 2013, pp. 1–5.
[4] Z. Ding, X. Lei, G. K. Karagiannidis, R. Schober, J. Yuan, and V. K. Bhargava, “A survey on non-orthogonal multiple access for 5G networks: Research challenges and future trends,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 10, pp. 2181–2195, 2017.
[5] S. M. R. Islam, N. Avazov, O. A. Dobre, and K. Kwak, “Power-domain non-orthogonal multiple access (NOMA) in 5G systems: Potentials and challenges,” IEEE Communications Surveys & Tutorials, vol. 19, no. 2, pp. 721–742, 2017.
[6] W. Shin, M. Vaezi, B. Lee, D. J. Love, J. Lee, and H. V. Poor, “Non-orthogonal multiple access in multi-cell networks: Theory, performance, and practical challenges,” IEEE Communications Magazine, vol. 55, no. 10, pp. 176–183, 2017.
[7] M. Vaezi, R. Schober, Z. Ding, and H. V. Poor, “Non-orthogonal multiple access: Common myths and critical questions,” IEEE Wireless Communications, vol. 26, no. 5, pp. 174–180, 2019.
[8] O. Maraqa, A. S. Rajasekaran, S. Al-Ahmadi, H. Yanikomeroglu, and S. M. Sait, “A survey of rate-optimal power domain NOMA with enabling technologies of future wireless networks,” IEEE Communications Surveys & Tutorials, vol. 22, no. 4, pp. 2192–2235, 2020.
[9] L. You and D. Yuan, “A note on decoding order in user grouping and power optimization for multi-cell NOMA with load coupling,” IEEE Transactions on Wireless Communications, vol. 20, no. 1, pp. 495–505, 2021.
[10] P. Xu, Z. Ding, X. Dai, and H. V. Poor, “A new evaluation criterion for non-orthogonal multiple access in 5G software defined networks,” IEEE Access, vol. 3, pp. 1633–1639, 2015.
[11] J. Zhu, J. Wang, Y. Huang, S. He, X. You, and L. Yang, “On optimal power allocation for downlink non-orthogonal multiple access systems,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 12, pp. 2744–2757, 2017.
[12] M. S. Ali, E. Hossain, A. Al-Dweik, and D. I. Kim, “Downlink power allocation for CoMP-NOMA in multi-cell networks,” IEEE Transactions on Communications, vol. 66, no. 9, pp. 3982–3998, Sept. 2018.
[13] W. U. Khan, F. Jameel, T. Ristaniemi, S. Khan, G. A. S. Sidhu, and J. Liu, “Joint spectral and energy efficiency optimization for downlink NOMA networks,” IEEE Transactions on Cognitive Communications and Networking, vol. 6, no. 2, pp. 645–656, 2020.
[14] M. S. Ali, H. Tabassum, and E. Hossain, “Dynamic user clustering and power allocation for uplink and downlink non-orthogonal multiple access (NOMA) systems,” IEEE Access, vol. 4, pp. 6325–6343, 2016.
[15] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2009.
[16] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.
[17] R. D. Yates, “A framework for uplink power control in cellular radio systems,” IEEE Journal on Selected Areas in Communications, vol. 13, no. 7, pp. 1341–1347, 1995.
[18] Y. Fu, Y. Chen, and C. W. Sung, “Distributed power control for the downlink of multi-cell NOMA systems,” IEEE Transactions on Wireless Communications, vol. 16, no. 9, pp. 6207–6220, 2017.
[19] J. Cui, Y. Liu, Z. Ding, P. Fan, and A. Nallanathan, “QoE-based resource allocation for multi-cell NOMA networks,” IEEE Transactions on Wireless Communications, vol. 17, no. 9, pp. 6160–6176, 2018.
[20] L. You, D. Yuan, L. Lei, S. Sun, S. Chatzinotas, and B. Ottersten, “Resource optimization with load coupling in multi-cell NOMA,” IEEE Transactions on Wireless Communications, vol. 17, no. 7, pp. 4735–4749, 2018.
[21] D. Ni, L. Hao, Q. T. Tran, and X. Qian, “Transmit power minimization for downlink multi-cell multi-carrier NOMA networks,” IEEE Communications Letters, vol. 22, no. 12, pp. 2459–2462, 2018.
[22] L. Lei, L. You, Y. Yang, D. Yuan, S. Chatzinotas, and B. Ottersten, “Load coupling and energy optimization in multi-cell and multi-carrier NOMA networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 11, pp. 11 323–11 337, 2019.
[23] Y. Sun, D. W. K. Ng, Z. Ding, and R. Schober, “Optimal joint power and subcarrier allocation for full-duplex multicarrier non-orthogonal multiple access systems,” IEEE Transactions on Communications, vol. 65, no. 3, pp. 1077–1091, 2017.
[24] J. Zhao, Y. Liu, K. K. Chai, A. Nallanathan, Y. Chen, and Z. Han, “Spectrum allocation and power control for non-orthogonal multiple access in HetNets,” IEEE Transactions on Wireless Communications, vol. 16, no. 9, pp. 5825–5837, 2017.
[25] Z. Yang, C. Pan, W. Xu, Y. Pan, M. Chen, and M. Elkashlan, “Power control for multi-cell networks with non-orthogonal multiple access,” IEEE Transactions on Wireless Communications, vol. 17, no. 2, pp. 927–942, 2018.
[26] K. Wang, Y. Liu, Z. Ding, A. Nallanathan, and M. Peng, “User association and power allocation for multi-cell non-orthogonal multiple access networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 11, pp. 5284–5298, 2019.
[27] A. B. M. Adam, X. Wan, and Z. Wang, “Energy efficiency maximization in downlink multi-cell multi-carrier NOMA networks with hardware impairments,” IEEE Access, vol. 8, pp. 210 054–210 065, 2020.
[28] C. Liu and D. Liang, “Heterogeneous networks with power-domain NOMA: Coverage, throughput, and power allocation analysis,” IEEE Transactions on Wireless Communications, vol. 17, no. 5, pp. 3524–3539, 2018.
[29] A. Zappone, E. Björnson, L. Sanguinetti, and E. Jorswieck, “Globally optimal energy-efficient power control and receiver design in wireless networks,”