Wireless Power Control via Counterfactual Optimization of Graph Neural Networks

Abstract

We consider the problem of downlink power control in wireless networks consisting of multiple transmitter-receiver pairs that communicate over a single shared wireless medium. To mitigate the interference among concurrent transmissions, we leverage the network topology to create a graph neural network architecture, and we then use an unsupervised primal-dual counterfactual optimization approach to learn optimal power allocation decisions. We show how the counterfactual optimization technique allows us to guarantee a minimum rate constraint that adapts to the network size, hence achieving the right balance between average and 5th percentile user rates across a range of network configurations.



Navid Naderializadeh∗, Mark Eisen∗, Alejandro Ribeiro†

Index Terms: Wireless power control, graph neural networks, counterfactual optimization, primal-dual learning

I. INTRODUCTION

With the proliferation of ubiquitous wireless devices and services, wireless communication networks are becoming increasingly complex. In particular, the arrival of fifth-generation (5G) mobile networks will provide connectivity to devices ranging from sensors and cell phones to vehicles and drones, shifting the paradigm of how things connect together. This will give rise to ultra-dense deployment scenarios, in which a massive number of transmissions compete for access to a limited amount of wireless resources.

One of the main drivers of higher throughput in 5G networks and beyond is leveraging the bandwidth available at higher frequency bands, such as the mmWave band. However, since physical wireless resources are inherently limited, another way to enhance the performance of wireless networks is to improve spectral efficiency. This becomes extremely challenging as networks become denser, since the interference among concurrent transmissions can significantly degrade network performance.

To deal with these challenges, there has been a plethora of work on radio resource management in wireless networks. The approaches proposed in the literature use a wide variety of techniques from optimization theory, information theory, and game theory to attack various radio resource management subproblems, including power control, link scheduling, cell association, subcarrier assignment, and beamforming [1]–[8]. Nevertheless, the radio resource management problem in its most general form is NP-hard, so deriving an optimal solution becomes increasingly difficult as the network size grows [9], [10]. That is why most prior works in the literature devise approximate solutions in various regimes of system parameters.

Supported by the Intel Science and Technology Center for Wireless Autonomous Systems. The authors are with ∗Intel Labs and †University of Pennsylvania.
With the recent success of machine learning, and particularly deep learning, over the past few years, learning-based algorithms have also been shown to yield promising solutions for resource management in wireless networks [11]–[14]. More recently, the natural graph structure of wireless interference patterns has been leveraged in graph neural network (GNN) architectures [15]–[17], which are better suited to scalability and transference.

In this paper, we consider a wireless interference channel comprising multiple transmitter-receiver pairs, and seek a power control policy that mitigates the interference among concurrent transmissions with respect to both overall system performance and fairness across pairs. We model the network topology by a conflict graph, in which each edge represents an interference link that is strong compared to the signal power levels, while absent edges correspond to interference links that are weak enough to be treated as noise [18]. We then feed the instantaneous conflict graph to a GNN that outputs a power allocation decision for each transmitter. We pose the power control problem as the optimization of the GNN filter weights such that a network-wide convex utility function is maximized subject to minimum rate constraints for all receivers.

Channel gains in wireless networks fluctuate over time and from topology to topology. Therefore, even for a given density of transmitters and receivers, a fixed and strict minimum rate constraint is hard to define a priori and may be unsatisfiable for receivers with poor channel conditions. Hence, we introduce a counterfactual optimization formulation, in which an adaptive slack variable is subtracted from the minimum rate constraints [19]. We then use a primal-dual optimization algorithm to learn optimal policies together with their associated optimal constraint slacks.
We demonstrate through simulation results how our proposed framework learns a power control strategy that strikes a balance between the sum-rate and cell-edge performance, quantified by the 5th percentile rate achieved by the users. In addition, we illustrate how the algorithm adaptively tunes the slack variable, and hence the minimum rate constraints for the receivers, according to the density of the network.

The rest of this paper is organized as follows. In Section II, we present the system model and formulate the problem. In Section III, we provide the details of the GNN architecture. In Section IV, we show how counterfactual optimization adapts the constraints as needed. In Section V, we present our simulation results. Finally, we conclude the paper in Section VI.

arXiv [eess.SP], Feb.

II. SYSTEM MODEL AND PROBLEM FORMULATION

We consider a wireless interference network with a set of $m$ transmitters $\{\mathrm{Tx}_i\}_{i=1}^m$ and a set of $m$ receivers $\{\mathrm{Rx}_j\}_{j=1}^m$, where each transmitter $\mathrm{Tx}_i$ intends to communicate with its corresponding receiver $\mathrm{Rx}_i$. The channel gain between transmitter $\mathrm{Tx}_i$ and receiver $\mathrm{Rx}_j$ is a random variable denoted by $h_{ij}$. We collect all the channel gains across the network in a square matrix $\mathbf{H} \in \mathcal{H} \subseteq \mathbb{C}^{m \times m}$. Each channel gain in $\mathbf{H}$ is composed of a constant long-term component, resulting from path loss and shadowing (signal attenuation due to the physical distance between the transmitter and receiver nodes, plus deviations caused by obstacles in the environment), and a short-term fast-fading component, resulting from multipath propagation and node mobility. In general, we assume that $\mathbf{H}$ is drawn from a joint probability distribution $f(\mathbf{H})$.

Since all transmissions occur at the same time and on the same frequency band, they interfere with each other. Therefore, each transmitter must set its transmit power such that a global network-wide objective function is optimized. In particular, for each channel realization $\mathbf{H}$, we denote the vector of power allocation variables by $\mathbf{p} \in \mathbb{R}^m$, whose $i$th component $p_i$ represents the transmit power allocated to transmitter $\mathrm{Tx}_i$. The signal-to-interference-plus-noise ratio (SINR) at each receiver $\mathrm{Rx}_i$ can then be written as

$$\mathrm{SINR}_i(\mathbf{H}, \mathbf{p}) = \frac{|h_{ii}|^2\, p_i(\mathbf{H})}{\sigma^2 + \sum_{j \neq i} |h_{ji}|^2\, p_j(\mathbf{H})}, \quad \forall i \in [m], \tag{1}$$

where $\sigma^2$ denotes the noise variance and $[m] \triangleq \{1, \dots, m\}$. The Shannon capacity of the link between transmitter $\mathrm{Tx}_i$ and receiver $\mathrm{Rx}_i$ is then given by

$$C_i(\mathbf{H}, \mathbf{p}) = \log_2\big(1 + \mathrm{SINR}_i(\mathbf{H}, \mathbf{p})\big). \tag{2}$$

Due to the aforementioned short-term fading, channel realizations vary over time, so the power allocation variables also need to be adapted over time.
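As a minimal sketch (not code from the paper), equations (1) and (2) map directly onto array operations. It assumes NumPy, a channel matrix `H` with `H[i, j]` holding the gain $h_{ij}$ from $\mathrm{Tx}_i$ to $\mathrm{Rx}_j$, linear-scale powers and noise variance, and a hypothetical function name `link_rates`.

```python
import numpy as np

def link_rates(H: np.ndarray, p: np.ndarray, sigma2: float) -> np.ndarray:
    """Shannon rates C_i(H, p) in bps/Hz for every link, following (1)-(2).

    H[i, j] is the complex gain h_ij from Tx i to Rx j; p[i] is the power of Tx i.
    """
    gains = np.abs(H) ** 2                 # |h_ij|^2
    signal = np.diag(gains) * p            # |h_ii|^2 p_i
    received = gains.T @ p                 # sum_j |h_ji|^2 p_j, including j = i
    interference = received - signal       # remove the desired-signal term
    sinr = signal / (sigma2 + interference)
    return np.log2(1.0 + sinr)

# Toy two-link example: strong direct gains, weak cross gains (made-up values).
H = np.array([[1.0, 0.1],
              [0.2, 1.0]], dtype=complex)
p = np.array([1.0, 1.0])
rates = link_rates(H, p, sigma2=0.1)
```

Because the entries of $\mathbf{H}$ are redrawn at every fading step, the rates returned for a fixed power vector change from one channel realization to the next.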
This motivates considering the ergodic average $x_i = \mathbb{E}[C_i(\mathbf{H}, \mathbf{p})] \in \mathbb{R}$ to capture the throughput experienced by each receiver over a long period of time. The goal is to determine a power allocation policy $\phi(\mathbf{H}, \boldsymbol{\theta})$, parameterized by a fixed parameter vector $\boldsymbol{\theta} \in \mathbb{R}^q$, where, for each channel realization $\mathbf{H}$, the transmit powers are determined by $\mathbf{p} = \phi(\mathbf{H}, \boldsymbol{\theta})$. We formulate the power allocation problem as finding the parameter vector $\boldsymbol{\theta}$ that yields the best-performing policy, i.e.,

$$\begin{aligned}
\max_{\boldsymbol{\theta},\, \mathbf{x}} \quad & U(\mathbf{x})\\
\text{s.t.} \quad & x_i \le \mathbb{E}_{\mathbf{H}}\big[C_i(\mathbf{H}, \phi(\mathbf{H}, \boldsymbol{\theta}))\big], \quad \forall i \in [m],\\
& x_i \ge C_{i,\min}, \quad \forall i \in [m],\\
& \phi(\mathbf{H}, \boldsymbol{\theta}) \in [0, P_{\max}]^m.
\end{aligned} \tag{3}$$

In the above optimization problem, $U(\mathbf{x})$ denotes a convex function of the receivers' ergodic rates throughout the network, $C_{i,\min}$ denotes the minimum capacity that each receiver needs to achieve, and $P_{\max}$ denotes the maximum transmit power. The minimum capacity constraints are included to avoid allocating all resources to "cell-center" receivers, thereby balancing the power control policy so that it treats "cell-center" and "cell-edge" receivers fairly.

The problem in (3) is generally challenging to solve, mainly due to the non-convexity of the constraints. Moreover, aside from the effort of solving (3), the choice of the parameterization $\phi$ is critical to obtaining an optimal policy with good practical performance. Fully-connected deep neural networks (DNNs) are a natural choice here, due to their universality property: given enough depth and/or width, they have sufficient expressive power to approximate any continuous function to any desired accuracy [13], [20].
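The expectations in (3) are rarely available in closed form; a common approach, assumed here, is to replace them with sample averages over channel draws. The minimal sketch below estimates $x_i \approx \mathbb{E}_{\mathbf{H}}[C_i(\mathbf{H}, \phi(\mathbf{H}, \boldsymbol{\theta}))]$ by Monte Carlo for an arbitrary policy callable and checks the three constraint families of (3). The i.i.d. unit-mean Rayleigh channel, the 0.3 cross-link amplitude scaling, the placeholder full-power policy, and all function names are hypothetical.

```python
import numpy as np

def link_rates(H, p, sigma2):
    """Per-link rates log2(1 + SINR_i); H[i, j] is the gain from Tx i to Rx j."""
    gains = np.abs(H) ** 2
    signal = np.diag(gains) * p
    interference = gains.T @ p - signal
    return np.log2(1.0 + signal / (sigma2 + interference))

def ergodic_rates(policy, sample_channel, n_samples, sigma2, rng):
    """Monte Carlo estimate of E_H[C_i(H, policy(H))] for every link i."""
    draws = [link_rates(H, policy(H), sigma2)
             for H in (sample_channel(rng) for _ in range(n_samples))]
    return np.mean(draws, axis=0)

def satisfies_problem_3(x, ergodic, c_min, powers, p_max):
    """Check the three constraint families of (3) for one candidate solution."""
    rate_ok = np.all(x <= ergodic)                           # x_i <= E_H[C_i(...)]
    min_ok = np.all(x >= c_min)                              # x_i >= C_{i,min}
    power_ok = np.all((powers >= 0.0) & (powers <= p_max))   # phi in [0, P_max]^m
    return bool(rate_ok and min_ok and power_ok)

# Toy setup (hypothetical): unit-mean Rayleigh fading, cross links scaled by 0.3.
rng = np.random.default_rng(0)
m, p_max, sigma2 = 4, 1.0, 0.1
scale = 0.3 + 0.7 * np.eye(m)

def sample_channel(rng):
    fading = (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))) / np.sqrt(2)
    return scale * fading

def full_power(H):
    """Placeholder policy phi(H, theta): every transmitter at P_max."""
    return np.full(m, p_max)

x_hat = ergodic_rates(full_power, sample_channel, 2000, sigma2, rng)
```

The estimator is agnostic to how $\phi$ is parameterized. A fully-connected DNN, for instance, would take all $m^2$ channel gains as its input, so even its first weight layer grows quadratically with $m$ for a fixed hidden width.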
However, despite the theoretical properties of fully-connected DNNs, such a parameterization does not scale well, as the parameter dimension $q$ grows with the number $m$ of transmitter-receiver pairs in the network, and, more critically, it does not generalize across varying network topologies. In the next section, we develop a graph neural network architecture suitable for solving the power allocation problem in networks of any size.

III. RANDOM EDGE GRAPH NEURAL NETWORKS

We present the random edge graph neural network (REGNN) architecture as a parameterization of the resource management policy. Broadly speaking, graph neural networks (GNNs) can be viewed as a generalization of convolutional neural network (CNN) architectures, whose popularity and practical benefits stem largely from their significantly reduced parameter dimension relative to traditional DNNs, their invariance to input size, and their translation equivariance.

Graph neural networks generalize the convolutional operations performed in CNNs to convolutions over arbitrarily structured data [21]. This structure is given in the form of a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} := [m]$ are the nodes of the graph, connected by weighted edges $\mathcal{E}$. We further use the matrix $\mathbf{S} \in \mathbb{R}_+^{m \times m}$ as a graph shift operator that encodes the weights of the edges $\mathcal{E}$. The element $S_{ij}$ takes higher values when node $i$ is closely related to node $j$, smaller values when they are less related, and the value 0 if they are unrelated. The graph convolution of an input signal $\mathbf{y} \in \mathbb{R}^m$, whose $i$th element $y_i$ is the signal value at transmitter $\mathrm{Tx}_i$, with a filter $\boldsymbol{\alpha} \in \mathbb{R}^K$ with respect to the graph encoded in $\mathbf{S}$ is a vector $\mathbf{z} \in \mathbb{R}^m$, whose $j$th component is defined as

$$z_j := [\boldsymbol{\alpha} *_{\mathbf{S}} \mathbf{y}]_j := \sum_{k=0}^{K-1} \alpha_k [\mathbf{S}^k \mathbf{y}]_j. \tag{4}$$

Observe that the term $\mathbf{S}^k$ shifts the elements of $\mathbf{y}$ $k$ times according to the weights and structure defined in $\mathbf{S}$.

A GNN is constructed as a sequence of $L$ so-called hidden layers, where the output of layer $l-1$ is fed as the input to layer $l$. Denote by $\mathbf{y}_l$ the input to layer $l$ and by $\boldsymbol{\alpha}_l$ the graph filter at layer $l$. With shift operator $\mathbf{S}$, the output of layer $l$, denoted by $\mathbf{y}_{l+1}$, is computed as the composition of the graph filter $\boldsymbol{\alpha}_l$ and a pointwise nonlinear function $\sigma_l(\cdot)$, i.e.,

$$\mathbf{y}_{l+1} := \sigma_l\big(\boldsymbol{\alpha}_l *_{\mathbf{S}} \mathbf{y}_l\big). \tag{5}$$

The full GNN is then formed as the composition of the layer operations in (5) for $l \in [L]$.
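The recursion in (4) and (5) can be sketched in a few lines of NumPy. This is a minimal single-feature version under stated assumptions: each layer has $K$ scalar taps $\alpha_0, \dots, \alpha_{K-1}$, every $\sigma_l$ is a ReLU, the random nonnegative $\mathbf{S}$ is scaled by its spectral norm, and the function names are hypothetical. The last lines check numerically that relabeling the nodes permutes the output in the same way.

```python
import numpy as np

def graph_filter(S, y, alpha):
    """Graph convolution (4): z = sum_k alpha[k] * S^k y, via repeated shifts."""
    z = np.zeros(len(y))
    shifted = np.asarray(y, dtype=float)   # S^0 y
    for a_k in alpha:
        z = z + a_k * shifted
        shifted = S @ shifted              # next power of the shift: S^{k+1} y
    return z

def gnn_forward(S, y, filters):
    """Compose layers (5): y_{l+1} = ReLU(alpha_l *_S y_l) for each filter in turn."""
    for alpha in filters:
        y = np.maximum(graph_filter(S, y, alpha), 0.0)
    return y

rng = np.random.default_rng(1)
m = 5
S = rng.random((m, m))
S /= np.linalg.norm(S, 2)                  # spectral-norm scaling (an assumption)
y = rng.random(m)
filters = [rng.standard_normal(3) for _ in range(2)]   # L = 2 layers, K = 3 taps each
out = gnn_forward(S, y, filters)

# Relabel nodes with a permutation matrix P: input (P S P^T, P y) should give P @ out.
P = np.eye(m)[rng.permutation(m)]
out_perm = gnn_forward(P @ S @ P.T, P @ y, filters)
equivariant = np.allclose(out_perm, P @ out)
```

The check passes because every layer is a polynomial in $\mathbf{S}$ followed by a pointwise map, and $(\mathbf{P}\mathbf{S}\mathbf{P}^T)^k \mathbf{P}\mathbf{y} = \mathbf{P}\mathbf{S}^k\mathbf{y}$ for any permutation matrix $\mathbf{P}$.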
The input to the GNN is given as the initial graph signal $\mathbf{y}_1 \in \mathbb{R}^m$, defined on the nodes $\mathcal{V}$. While standard applications feature a fixed graph, we may also consider, more generally, an input graph $\mathbf{S} \in \mathbb{R}_+^{m \times m}$, i.e., an input signal on the edges $\mathcal{E}$. When such inputs are drawn randomly from some distribution, the graph can be viewed as having random edge weights.

In the wireless interference network defined in Section II, a graph can be readily formed from the channel gains contained in the channel matrix $\mathbf{H}$. We define the graph $\mathbf{S} := g(\mathbf{H})$, where $g: \mathbb{R}_+^{m \times m} \to \mathbb{R}_+^{m \times m}$ is some function that preserves the sparsity pattern and node ordering of $\mathbf{H}$. Simple choices for $g(\cdot)$ include the element-wise magnitude $[g(\mathbf{H})]_{ij} := |h_{ij}|$ or squared magnitude $[g(\mathbf{H})]_{ij} := |h_{ij}|^2$. In this work, we use the information-theoretic optimality condition for treating interference as noise, derived in [18], to classify the interference links between all non-associated transmitter-receiver pairs as strong or weak. In particular, we take an approach similar to [5]: for each interference link between $\mathrm{Tx}_i$ and $\mathrm{Rx}_j$, $j \neq i$, we define the indicator variable

$$I_{ij} = 1 \iff \frac{P_{\max} |h_{ij}|^2}{\sigma^2} \ge M \left(\frac{P_{\max} \min\{|h_{ii}|^2, |h_{jj}|^2\}}{\sigma^2}\right)^{\eta}, \tag{6}$$

where $M$ and $\eta$ are design parameters that control the sparsity of the graph. We then define $g(\cdot)$ as $[g(\mathbf{H})]_{ij} := I_{ij} |h_{ij}|$, where we set $I_{ii} = 1$ for each direct link between $\mathrm{Tx}_i$ and $\mathrm{Rx}_i$. Finally, we normalize the resulting matrix $\mathbf{S}$ by its norm.

As the edge weights of $\mathbf{S}$ are derived from the random channel gains, we are in the previously described case of GNNs with random input graphs, called random edge graph neural networks (REGNNs), with edges drawn from the joint distribution $f(g(\mathbf{H}))$ [15].
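The edge classification in (6) can be sketched as below. The sketch follows the magnitude-based weights $[g(\mathbf{H})]_{ij} = I_{ij}|h_{ij}|$ described above; the spectral-norm normalization, the illustrative default $\eta = 0.5$, the toy channel values, and the function name `conflict_graph` are assumptions made for illustration.

```python
import numpy as np

def conflict_graph(H, p_max, sigma2, M=1.0, eta=0.5):
    """Build S = g(H): keep direct links and only the 'strong' interference links of (6)."""
    gains = np.abs(H) ** 2
    direct = np.diag(gains)
    # P_max * min(|h_ii|^2, |h_jj|^2) / sigma^2 for every pair (i, j)
    snr_min = p_max * np.minimum.outer(direct, direct) / sigma2
    inr = p_max * gains / sigma2                      # P_max |h_ij|^2 / sigma^2
    indicator = inr >= M * snr_min ** eta             # I_ij = 1 for strong links
    np.fill_diagonal(indicator, True)                 # I_ii = 1 for direct links
    S = indicator * np.abs(H)                         # [g(H)]_ij = I_ij |h_ij|
    return S / np.linalg.norm(S, 2)                   # assumed spectral-norm scaling

# Toy 3-pair example: only the Tx 1 -> Rx 2 link (index [0, 1]) is strong.
H = np.array([[1.0, 0.5, 0.01],
              [0.02, 1.0, 0.01],
              [0.01, 0.01, 1.0]])
S = conflict_graph(H, p_max=1.0, sigma2=0.01)
```

Raising $M$ (or, when the SNRs exceed one, raising $\eta$) raises the threshold in (6), so fewer interference links survive and the graph becomes sparser.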
The full REGNN parameterization $\phi(\mathbf{H}, \boldsymbol{\theta})$ of the resource management policy can then be described as a GNN with the constant input $\mathbf{y}_1 := \mathbf{1}$, i.e.,

$$\phi(\mathbf{H}, \boldsymbol{\theta}) := \sigma_L\Big(\boldsymbol{\alpha}_L *_{g(\mathbf{H})} \Big(\cdots \sigma_1\big(\boldsymbol{\alpha}_1 *_{g(\mathbf{H})} \mathbf{y}_1\big) \cdots\Big)\Big), \tag{7}$$

where the parameter $\boldsymbol{\theta}$ contains the $L$ sets of filter weights, i.e., $\boldsymbol{\theta} = \{\boldsymbol{\alpha}_l\}_{l=1}^L$. The final nonlinear activation $\sigma_L$ can be chosen to scale the output to the interval $[0, P_{\max}]$. Note that with a filter length of $K_l$ at the $l$th layer, the total number of parameters in a GNN is $q = \sum_{l=1}^L K_l$, a number significantly smaller than that of a fully-connected DNN and invariant to the size of the input graph, i.e., the number of transmitter-receiver pairs.

A key feature that makes REGNNs well suited for learning in wireless networks is a structural property called permutation equivariance. Permutation equivariance implies that any permutation of the rows and columns of the channel matrix $\mathbf{H}$, i.e., a relabeling of the transmitter-receiver pairs in the wireless network, results in an equally permuted output for any REGNN $\phi(\mathbf{H}, \boldsymbol{\theta})$ defined as in (7) [15]. This property is valuable in wireless networks because it facilitates training a REGNN to operate over many different geometric configurations of the transmitters and receivers, which invariably change over time in practice. We demonstrate the effectiveness of REGNNs in achieving strong performance over a wide range of network configurations in Section V.

IV. COUNTERFACTUAL OPTIMIZATION

While we may use the REGNN to parameterize the resource allocation policy, a training algorithm is needed to find a set of GNN filter weights $\boldsymbol{\theta} = \{\boldsymbol{\alpha}_l\}_{l=1}^L$ that performs well under the objective and constraints defined in (3). Training the filter weights here is not straightforward, in that the resulting policy must not only maximize the utility function $U(\mathbf{x})$, but also satisfy the minimum rate constraints $x_i \ge C_{i,\min}$. While such constraints can generally be handled through the Lagrangian dual function, this requires a priori knowledge of minimum rates $C_{i,\min}$ that are actually achievable. Such knowledge is generally unavailable in practice, as complex interference patterns between concurrent transmissions at different network densities may make some lower bounds infeasible.

We address this problem with what may be referred to as a counterfactual [19]. That is, we consider a slack term $s_i$ for the $i$th minimum capacity constraint and try to find the optimal policy under the loosened constraint. Any increase in the slack $s_i$ moves the solution further from the intended solution of (3); as such, we further seek to minimize $s_i$ subject to the problem remaining feasible. More formally, we augment (3) with the counterfactual slack variable $\mathbf{s} \in \mathbb{R}_+^m$ as

$$\begin{aligned}
\max_{\boldsymbol{\theta},\, \mathbf{x},\, \mathbf{s}} \quad & U(\mathbf{x}) - \tfrac{1}{2}\|\mathbf{s}\|_2^2\\
\text{s.t.} \quad & x_i \le \mathbb{E}_{\mathbf{H}}\big[C_i(\mathbf{H}, \phi(\mathbf{H}, \boldsymbol{\theta}))\big], \quad \forall i \in [m],\\
& x_i \ge C_{i,\min} - s_i, \quad \forall i \in [m],\\
& \phi(\mathbf{H}, \boldsymbol{\theta}) \in [0, P_{\max}]^m, \quad \mathbf{s} \ge \mathbf{0}.
\end{aligned} \tag{8}$$

In (8), along with optimizing the REGNN parameters $\boldsymbol{\theta}$ and the ergodic average rates $\mathbf{x}$, we also minimize the slack $\mathbf{s}$ needed to make the problem feasible. Increasing $\mathbf{s}$ lowers the objective value in (8). However, too small a slack may make the constraints too tight to satisfy, rendering the problem infeasible.
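A scalar toy instance, constructed here for illustration rather than taken from the paper, shows what the slack does when a minimum rate is unattainable. Assume a single receiver whose ergodic rate cannot exceed some $r_{\max} < C_{\min}$, the utility $U(x) = x$, and the quadratic slack penalty $\tfrac{1}{2}s^2$, which is the penalty consistent with the slack update (13):

$$\max_{x,\,s}\; x - \tfrac{1}{2}s^2 \quad \text{s.t.} \quad x \le r_{\max}, \quad x \ge C_{\min} - s, \quad s \ge 0.$$

Feasibility requires $C_{\min} - s \le x \le r_{\max}$, i.e., $s \ge C_{\min} - r_{\max} > 0$. For any feasible $s$, taking $x = r_{\max}$ is optimal, so the problem reduces to minimizing $\tfrac{1}{2}s^2$ over $s \ge C_{\min} - r_{\max}$:

$$s^\star = C_{\min} - r_{\max}, \qquad x^\star = r_{\max}.$$

Setting the derivative of the Lagrangian with respect to $s$ to zero, $-s + \mu = 0$, gives $\mu^\star = s^\star$, which is exactly the fixed point of the slack update (13). If instead $r_{\max} \ge C_{\min}$, the same argument gives $s^\star = 0$, and the original constraint is kept intact.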
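The value of the counterfactual formulation lies in the fact that, should the preferred $C_{i,\min}$ be unrealizable, the optimization of the slack variables implicitly loosens this requirement just enough to find a solution.

We proceed to derive the training algorithm by introducing the Lagrangian function, with non-negative dual multipliers $\boldsymbol{\lambda}, \boldsymbol{\mu} \in \mathbb{R}_+^m$ associated with the constraints in (8), as

$$\mathcal{L}(\boldsymbol{\theta}, \mathbf{x}, \mathbf{s}, \boldsymbol{\lambda}, \boldsymbol{\mu}) := U(\mathbf{x}) - \tfrac{1}{2}\|\mathbf{s}\|_2^2 - \boldsymbol{\lambda}^T\big[\mathbf{x} - \mathbb{E}_{\mathbf{H}}\mathbf{C}(\mathbf{H}, \phi(\mathbf{H}, \boldsymbol{\theta}))\big] - \boldsymbol{\mu}^T\big[\mathbf{C}_{\min} - \mathbf{s} - \mathbf{x}\big]. \tag{9}$$

The Lagrangian in (9) provides a single, unconstrained objective function, which we can optimize using gradient-based methods. In particular, we seek to maximize over the so-called primal variables $\boldsymbol{\theta}, \mathbf{x}, \mathbf{s}$, while minimizing over the dual variables $\boldsymbol{\lambda}, \boldsymbol{\mu}$, i.e.,

$$\min_{\boldsymbol{\lambda}, \boldsymbol{\mu} \ge \mathbf{0}} \; \max_{\boldsymbol{\theta}, \mathbf{x}, \mathbf{s}} \; \mathcal{L}(\boldsymbol{\theta}, \mathbf{x}, \mathbf{s}, \boldsymbol{\lambda}, \boldsymbol{\mu}). \tag{10}$$

We can now define the updates over an iteration index $k$ for each primal and dual variable by either adding or subtracting the partial gradient of $\mathcal{L}$ with respect to that variable. For the primal variables, this gives the updates

$$\boldsymbol{\theta}_{k+1} = \boldsymbol{\theta}_k + \gamma_{\boldsymbol{\theta}}\, \nabla_{\boldsymbol{\theta}} \mathbb{E}_{\mathbf{H}}\mathbf{C}(\mathbf{H}, \phi(\mathbf{H}, \boldsymbol{\theta}_k))\, \boldsymbol{\lambda}_k, \tag{11}$$

$$\mathbf{x}_{k+1} = \mathbf{x}_k + \gamma_{\mathbf{x}}\big(\nabla U(\mathbf{x}_k) - \boldsymbol{\lambda}_k + \boldsymbol{\mu}_k\big), \tag{12}$$

$$\mathbf{s}_{k+1} = \big[\mathbf{s}_k + \gamma_{\mathbf{s}}(\boldsymbol{\mu}_k - \mathbf{s}_k)\big]_+, \tag{13}$$

where $\gamma_{\boldsymbol{\theta}}, \gamma_{\mathbf{x}}, \gamma_{\mathbf{s}} > 0$ denote the learning rates of the primal variables. Note that, in addition to updating $\boldsymbol{\theta}_k$ and $\mathbf{x}_k$ in (11)-(12), the counterfactual formulation updates the slack variable $\mathbf{s}_k$ in (13) in proportion to the difference between the current dual variable $\boldsymbol{\mu}_k$ and the current slack. Likewise, we descend on the dual variables using the associated partial gradients of the Lagrangian, i.e.,

$$\boldsymbol{\lambda}_{k+1} = \boldsymbol{\lambda}_k - \gamma_{\boldsymbol{\lambda}}\big(\mathbb{E}_{\mathbf{H}}\mathbf{C}(\mathbf{H}, \phi(\mathbf{H}, \boldsymbol{\theta}_k)) - \mathbf{x}_k\big), \tag{14}$$

$$\boldsymbol{\mu}_{k+1} = \big[\boldsymbol{\mu}_k - \gamma_{\boldsymbol{\mu}}(\mathbf{x}_k + \mathbf{s}_k - \mathbf{C}_{\min})\big]_+, \tag{15}$$

with $\gamma_{\boldsymbol{\lambda}}, \gamma_{\boldsymbol{\mu}} > 0$ the learning rates of the dual variables and $[\cdot]_+$ denoting projection onto the non-negative orthant. The primal-dual gradient updates in (11)-(15) successively move the primal and dual variables toward a maximum and a minimum of the Lagrangian, respectively, i.e., toward a saddle point of (10).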
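The updates (11)-(15) can be exercised end to end on a toy problem. The minimal sketch below is not the REGNN training pipeline; it makes several simplifying assumptions: the policy is a per-link sigmoid $\phi(\mathbf{H}, \boldsymbol{\theta}) = P_{\max}\,\mathrm{sigmoid}(\boldsymbol{\theta})$ that ignores $\mathbf{H}$, the expectation over $\mathbf{H}$ is a sample average over a fixed batch of i.i.d. Rayleigh channels, $\nabla_{\boldsymbol{\theta}}$ is replaced by central finite differences (a simple stand-in for the model-free estimators mentioned in Remark 2), $U$ is the sum-rate, one step size is shared by all five updates, and all names are hypothetical. As in (11)-(15), every update uses the iteration-$k$ values.

```python
import numpy as np

def mean_rates(theta, H_batch, p_max, sigma2):
    """Sample average of C(H, phi(H, theta)) over a batch of channel matrices.

    Toy policy (hypothetical, not an REGNN): p = P_max * sigmoid(theta), ignoring H.
    """
    p = p_max / (1.0 + np.exp(-theta))
    gains = np.abs(H_batch) ** 2                              # (B, m, m), [b, i, j]: Tx i -> Rx j
    signal = np.diagonal(gains, axis1=1, axis2=2) * p         # |h_ii|^2 p_i
    received = np.einsum("bji,j->bi", gains, p)               # sum_j |h_ji|^2 p_j
    sinr = signal / (sigma2 + received - signal)
    return np.log2(1.0 + sinr).mean(axis=0)

def counterfactual_primal_dual(H_batch, c_min, p_max=1.0, sigma2=0.1,
                               steps=300, lr=0.05, eps=1e-4):
    """Run updates (11)-(15) with one shared step size and sum-rate utility."""
    m = H_batch.shape[1]
    theta, x, s, mu = np.zeros(m), np.zeros(m), np.zeros(m), np.zeros(m)
    lam = np.ones(m)
    for _ in range(steps):
        rates = mean_rates(theta, H_batch, p_max, sigma2)
        # (11): central finite differences of lam^T E[C] stand in for the gradient.
        grad = np.zeros(m)
        for j in range(m):
            e = eps * np.eye(m)[j]
            up = lam @ mean_rates(theta + e, H_batch, p_max, sigma2)
            down = lam @ mean_rates(theta - e, H_batch, p_max, sigma2)
            grad[j] = (up - down) / (2.0 * eps)
        theta_next = theta + lr * grad
        x_next = x + lr * (1.0 - lam + mu)                    # (12): grad of sum(x) is 1
        s_next = np.maximum(s + lr * (mu - s), 0.0)           # (13)
        lam_next = lam - lr * (rates - x)                     # (14)
        mu_next = np.maximum(mu - lr * (x + s - c_min), 0.0)  # (15)
        theta, x, s, lam, mu = theta_next, x_next, s_next, lam_next, mu_next
    return theta, x, s, lam, mu

rng = np.random.default_rng(0)
m = 3
H_batch = (rng.standard_normal((32, m, m)) + 1j * rng.standard_normal((32, m, m))) / np.sqrt(2)
theta, x, s, lam, mu = counterfactual_primal_dual(H_batch, c_min=np.full(m, 5.0))
```

In this toy, the 5 bps/Hz minimum is out of reach: in expectation, even an interference-free link achieves at most $\log_2(1 + P_{\max}/\sigma^2) = \log_2 11 \approx 3.46$ bps/Hz by Jensen's inequality. The multiplier $\boldsymbol{\mu}$ therefore turns positive and, through (13), pulls the slack $\mathbf{s}$ above zero, in line with the counterfactual behavior described above.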
The complete counterfactual primal-dual learning algorithm is summarized in Algorithm 1.

Remark 1

Our proposed method is unsupervised in the sense that we train the REGNN weights to optimize the utility and constraints in (8) directly, rather than from labeled solutions. Therefore, this algorithm can be applied to other radio resource management problems whose objectives and constraints can be formulated as in (8), without requiring optimal solutions beforehand.

Remark 2

We point out that evaluating the updates in (11)-(15) may require computing potentially challenging gradients and expectations. The gradients in these updates can be replaced with well-known model-free gradient estimation methods that rely only on function evaluations and channel sampling; see [13] for details on these approaches.

V. SIMULATION RESULTS

We consider wireless networks with $m \in \{6, 8, 10, 12, 14\}$ transmitter-receiver pairs, dropped randomly within a square area with a side length of 500 m. We drop the transmitters uniformly at random within the network area, ensuring a minimum pairwise distance of 35 m between them. Afterwards, for each transmitter, a receiver is dropped within an annulus centered at the transmitter, with inner and outer radii of 10 m and 100 m, respectively, according to a skewed distribution that biases the receiver's location toward its serving transmitter. Each drop is then run for 200 steps. The long-term channel model consists of a standard dual-slope path-loss model [22], [23] and log-normal shadowing with a 7 dB standard deviation. We also model short-term Rayleigh fading using the sum-of-sinusoids (SoS) technique proposed in [24]. The bandwidth is taken to be 10 MHz, the noise power spectral density is assumed to be −174 dBm/Hz, and the maximum transmit power is taken to be $P_{\max} = 10$ dBm. We use the sum-rate network utility function $U(\mathbf{x}) = \sum_{i=1}^m x_i$, and we set the minimum capacity to $C_{i,\min} = 2$ bps/Hz for all receivers.

Algorithm 1: Counterfactual Primal-Dual Learning
Parameters: REGNN model (e.g., filter lengths $\{K_l\}_{l=1}^L$)
Input: Initial values $\boldsymbol{\theta}_0, \mathbf{x}_0, \boldsymbol{\lambda}_0, \boldsymbol{\mu}_0$, and $\mathbf{s}_0 = \mathbf{0}$
for $k = 0, 1, 2, \dots$ do
  Update primal variables [cf. (11)-(12)]:
    $\boldsymbol{\theta}_{k+1} = \boldsymbol{\theta}_k + \gamma_{\boldsymbol{\theta}}\, \nabla_{\boldsymbol{\theta}} \mathbb{E}_{\mathbf{H}}\mathbf{C}(\mathbf{H}, \phi(\mathbf{H}, \boldsymbol{\theta}_k))\, \boldsymbol{\lambda}_k$
    $\mathbf{x}_{k+1} = \mathbf{x}_k + \gamma_{\mathbf{x}}\big(\nabla U(\mathbf{x}_k) - \boldsymbol{\lambda}_k + \boldsymbol{\mu}_k\big)$
  Update slack variable [cf. (13)]:
    $\mathbf{s}_{k+1} = \big[\mathbf{s}_k + \gamma_{\mathbf{s}}(\boldsymbol{\mu}_k - \mathbf{s}_k)\big]_+$
  Update dual variables [cf. (14)-(15)]:
    $\boldsymbol{\lambda}_{k+1} = \boldsymbol{\lambda}_k - \gamma_{\boldsymbol{\lambda}}\big(\mathbb{E}_{\mathbf{H}}\mathbf{C}(\mathbf{H}, \phi(\mathbf{H}, \boldsymbol{\theta}_k)) - \mathbf{x}_k\big)$
    $\boldsymbol{\mu}_{k+1} = \big[\boldsymbol{\mu}_k - \gamma_{\boldsymbol{\mu}}(\mathbf{x}_k + \mathbf{s}_k - \mathbf{C}_{\min})\big]_+$
end for

As for the learning parameters, we consider a GNN architecture with 3 hidden layers, each containing 4 features and a filter of size 4, with a ReLU activation function. We use $M = 1$ and $\eta = 0.\ldots$ to build the underlying GNN graph as in (6). We use a batch size of 200 consecutive samples within each drop. The learning rates for the primal, dual, and slack variables are set to …, …, and …, respectively. We use a single slack variable shared by all the minimum-rate constraints. We also restrict the power allocation decisions to be binary, i.e., at any step each transmitter either remains silent or transmits at full power. We perform training on 4000 random and independent drops (realizations of transmitter/receiver locations), and we then conduct a final test on a set of 500 test drops.

Figure 1 illustrates how the slack variable evolves during training for different network densities. Increasing the number of transmitter-receiver pairs gives rise to higher levels of interference and lowers the average achievable rate of each receiver, making it harder to satisfy its minimum rate constraint.
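The drop procedure described above can be sketched as follows, together with the per-link rate when every transmitter uses full power, which makes the density effect easy to probe. Only the 500 m square, the 35 m minimum spacing, the 10-100 m annulus, the 7 dB shadowing, the 10 MHz bandwidth, the −174 dBm/Hz noise density, and $P_{\max} = 10$ dBm come from the text. The dual-slope parameters, the squared-uniform radius used as the "skewed" receiver distribution, the per-step i.i.d. Rayleigh draws (standing in for the SoS model of [24]), and all names are hypothetical.

```python
import numpy as np

# Values stated in the text.
SIDE, MIN_TX_DIST, R_IN, R_OUT = 500.0, 35.0, 10.0, 100.0   # meters
SHADOW_STD_DB, P_MAX_DBM = 7.0, 10.0
NOISE_DBM = -174.0 + 10.0 * np.log10(10e6)                  # -174 dBm/Hz over 10 MHz
# Hypothetical dual-slope parameters (not from the paper).
PL_1M_DB, BREAK_M, ALPHA_NEAR, ALPHA_FAR = 38.5, 100.0, 2.0, 4.0

def drop(m, rng):
    """Tx uniform in the square, >= 35 m apart; one Rx per Tx in a 10-100 m annulus."""
    tx = []
    while len(tx) < m:
        cand = rng.uniform(0.0, SIDE, size=2)
        if all(np.linalg.norm(cand - t) >= MIN_TX_DIST for t in tx):
            tx.append(cand)
    tx = np.array(tx)
    radius = R_IN + (R_OUT - R_IN) * rng.uniform(size=m) ** 2   # hypothetical skew toward Tx
    angle = rng.uniform(0.0, 2.0 * np.pi, size=m)
    rx = tx + radius[:, None] * np.column_stack([np.cos(angle), np.sin(angle)])
    return tx, rx

def path_loss_db(d):
    """Continuous dual-slope path loss in dB."""
    d = np.maximum(d, 1.0)
    near = PL_1M_DB + 10.0 * ALPHA_NEAR * np.log10(d)
    far = (PL_1M_DB + 10.0 * ALPHA_NEAR * np.log10(BREAK_M)
           + 10.0 * ALPHA_FAR * np.log10(d / BREAK_M))
    return np.where(d <= BREAK_M, near, far)

def long_term_gain_db(tx, rx, rng):
    """Path loss plus log-normal shadowing, fixed for a drop; [i, j] is Tx i -> Rx j."""
    dist = np.linalg.norm(tx[:, None, :] - rx[None, :, :], axis=-1)
    return -path_loss_db(dist) + rng.normal(0.0, SHADOW_STD_DB, size=dist.shape)

def full_power_rates(gain_db, rng):
    """Per-link rates (bps/Hz) for one step with all Tx at P_max.

    Rayleigh fading power is drawn i.i.d. per step, a simplification of the SoS model.
    """
    fading = rng.exponential(1.0, size=gain_db.shape)
    power_mw = 10.0 ** ((P_MAX_DBM + gain_db) / 10.0) * fading
    signal = np.diag(power_mw)
    interference = power_mw.sum(axis=0) - signal      # sum over Tx i != j at each Rx j
    noise_mw = 10.0 ** (NOISE_DBM / 10.0)
    return np.log2(1.0 + signal / (noise_mw + interference))

rng = np.random.default_rng(0)
mean_rate = {}
for m in (6, 14):
    tx, rx = drop(m, rng)
    gain_db = long_term_gain_db(tx, rx, rng)
    mean_rate[m] = np.mean([full_power_rates(gain_db, rng) for _ in range(200)])
```

Adding pairs to the same 500 m square adds terms to each receiver's interference sum, which is the mechanism behind the lower per-receiver rates in denser networks noted above.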
Therefore, as Figure 1 shows, our proposed algorithm indeed learns to adaptively raise the slack variable for denser deployments, so as to make the optimization problem feasible while maximizing the desired utility function.

Moreover, Figure 2 shows the achievable sum-rates and 5th percentile rates of our proposed algorithm for different numbers of transmitter-receiver pairs, compared with two baselines: time division multiplexing (TDM), in which transmitters are activated in a round-robin fashion, and weighted minimum mean-squared error (WMMSE) [2]. As the figure shows, these two baselines represent two ends of a spectrum: TDM is completely fair across all pairs in the network, hence achieving excellent 5th percentile rate performance at the expense of poor sum-rate. WMMSE, on the other hand, optimizes only the sum-rate, hence sacrificing most of the pairs that experience poor channel conditions. Our proposed method, however, demonstrates a superior trade-off between sum-rate and 5th percentile rate, balancing the rates experienced by "cell-center" and "cell-edge" receivers. In particular, it achieves sum-rate gains of up to 110% and 5th percentile rate gains of up to 2740% over TDM and WMMSE, respectively.

Fig. 1: Evolution of the slack variable (vertical axis) over training drops (horizontal axis) for networks with 6 to 14 transmitter-receiver pairs.

Fig. 2: The trade-off between achievable sum-rate (bps/Hz, horizontal axis) and 5th percentile rate (bps/Hz, vertical axis) achieved by our proposed algorithm (CF-GNN) and the baseline algorithms (TDM, WMMSE) for 6 to 14 transmitter-receiver pairs.

VI. CONCLUDING REMARKS

In this paper, we considered the problem of downlink power control in wireless networks with multiple transmitter-receiver pairs. We parameterized the power control policy as a graph neural network whose edge weights are derived from the channel gains between the transmitters and receivers. We then proposed a primal-dual gradient-based optimization algorithm based on counterfactuals, which learns a power control policy that maximizes a convex network utility function with adaptive minimum rate constraints tuned to the actual network conditions. Simulation results show the superiority of our proposed algorithm compared to baseline methods in terms of the trade-off between average and 5th percentile user rates.

REFERENCES

[1] R. Madan, J. Borran, A. Sampath, N. Bhushan, A. Khandekar, and T. Ji, "Cell association and interference coordination in heterogeneous LTE-A cellular networks," IEEE Journal on Selected Areas in Communications, vol. 28, no. 9, pp. 1479–1489, 2010.
[2] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, "An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel," IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4331–4340, 2011.
[3] W. Yu, T. Kwon, and C. Shin, "Multicell coordination via joint scheduling, beamforming, and power spectrum adaptation," IEEE Transactions on Wireless Communications, vol. 12, no. 7, pp. 1–14, 2013.
[4] X. Wu, S. Tavildar, S. Shakkottai, T. Richardson, J. Li, R. Laroia, and A. Jovicic, "FlashLinQ: A synchronous distributed scheduler for peer-to-peer ad hoc networks," IEEE/ACM Transactions on Networking, vol. 21, no. 4, pp. 1215–1228, 2013.
[5] N. Naderializadeh and A. S. Avestimehr, "ITLinQ: A new approach for spectrum sharing in device-to-device communication systems," IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1139–1151, 2014.
[6] X. Yi and G. Caire, "ITLinQ+: An improved spectrum sharing mechanism for device-to-device communications," in …, IEEE, 2015, pp. 1310–1314.
[7] L. Song, Y. Li, and Z. Han, "Game-theoretic resource allocation for full-duplex communications," IEEE Wireless Communications, vol. 23, no. 3, pp. 50–56, 2016.
[8] K. Shen and W. Yu, "FPLinQ: A cooperative spectrum sharing strategy for device-to-device communications," in …, IEEE, 2017, pp. 2323–2327.
[9] Z.-Q. Luo and S. Zhang, "Dynamic spectrum management: Complexity and duality," IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 1, pp. 57–73, 2008.
[10] Y.-F. Liu and Y.-H. Dai, "On the complexity of joint subcarrier and power allocation for multi-user OFDMA systems," IEEE Transactions on Signal Processing, vol. 62, no. 3, pp. 583–596, 2013.
[11] H. Lee, S. H. Lee, and T. Q. Quek, "Deep learning for distributed optimization: Applications to wireless resource management," IEEE Journal on Selected Areas in Communications, vol. 37, no. 10, pp. 2251–2266, 2019.
[12] L. Liang, H. Ye, G. Yu, and G. Y. Li, "Deep-learning-based wireless resource allocation with application to vehicular networks," Proceedings of the IEEE, 2019.
[13] M. Eisen, C. Zhang, L. F. Chamon, D. D. Lee, and A. Ribeiro, "Learning optimal resource allocations in wireless systems," IEEE Transactions on Signal Processing, vol. 67, no. 10, pp. 2775–2790, 2019.
[14] N. Naderializadeh, J. Sydir, M. Simsek, H. Nikopour, and S. Talwar, "When multiple agents learn to schedule: A distributed radio resource management framework," arXiv preprint arXiv:1906.08792, 2019.
[15] M. Eisen and A. Ribeiro, "Optimal wireless resource allocation with random edge graph neural networks," arXiv preprint arXiv:1909.01865, 2019.
[16] M. Lee, G. Yu, and G. Y. Li, "Graph embedding based wireless link scheduling with few training samples," arXiv preprint arXiv:1906.02871, 2019.
[17] Y. Shen, Y. Shi, J. Zhang, and K. B. Letaief, "A graph neural network approach for scalable wireless power control," arXiv preprint arXiv:1907.08487, 2019.
[18] C. Geng, N. Naderializadeh, A. S. Avestimehr, and S. A. Jafar, "On the optimality of treating interference as noise," IEEE Transactions on Information Theory, vol. 61, no. 4, pp. 1753–1767, 2015.
[19] L. F. Chamon, S. Paternain, and A. Ribeiro, "Counterfactual programming for optimal control," arXiv preprint arXiv:2001.11116, 2020.
[20] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
[21] M. Henaff, J. Bruna, and Y. LeCun, "Deep convolutional networks on graph-structured data," arXiv preprint arXiv:1506.05163, 2015.
[22] X. Zhang and J. G. Andrews, "Downlink cellular network analysis with multi-slope path loss models," IEEE Transactions on Communications, vol. 63, no. 5, pp. 1881–1894, 2015.
[23] J. G. Andrews, X. Zhang, G. D. Durgin, and A. K. Gupta, "Are we approaching the fundamental limits of wireless network densification?" IEEE Communications Magazine, vol. 54, no. 10, pp. 184–190, 2016.
[24] Y. Li and X. Huang, "The simulation of independent Rayleigh faders,"