Distributed Spectrum and Power Allocation for D2D-U Networks: A Scheme based on NN and Federated Learning

Abstract

In this paper, a network enabled by device-to-device communication on unlicensed bands (D2D-U) is studied. To improve the spectrum efficiency (SE) on the unlicensed bands and fit the distributed structure of the network, while ensuring fairness among D2D-U links and harmonious coexistence with WiFi networks, a distributed joint power and spectrum scheme is proposed. In particular, a parameter named the price is defined, which is updated at each D2D-U pair by an online-trained neural network (NN) according to the channel state and traffic load. In addition, the parameters of the NN are updated in two ways, unsupervised self-iteration and federated learning, to guarantee fairness and harmonious coexistence. Then, a non-convex optimization problem with respect to the spectrum and power is formulated and solved on each D2D-U link to maximize its own data rate. Numerical simulation results are presented to verify the effectiveness of the proposed scheme.




Rui Yin · Zhiqun Zou · Celimuge Wu · Jiantao Yuan · Xianfu Chen


The article is an extended version of the MONAMI 2020 conference paper [1].

Rui Yin, School of Information and Electrical Engineering, Zhejiang University City College, Hangzhou 310015, China
Zhiqun Zou, College of Information Science and Electrical Engineering, Zhejiang University, Hangzhou, China
Celimuge Wu (Corresponding author), Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan
Jiantao Yuan, Institute of Ocean Sensing and Networking of the Ocean College, Zhejiang University, Hangzhou, China
Xianfu Chen, VTT Technical Research Centre of Finland, Finland

Keywords

D2D-U · Resource Allocation · Price Model · Neural Network · Federated Learning

The large-scale commercialization of fifth-generation (5G) mobile networks has brought a better communication experience with low latency and high data rates. As a key technology in 5G networks, device-to-device (D2D) communication allows direct transmission between D2D terminals instead of relaying through base stations, which improves the system spectrum efficiency (SE), the energy efficiency (EE), and the quality of service (QoS) of D2D pairs [2]. Conventional D2D communication mainly reuses the licensed channels of long-term evolution (LTE) cellular networks to increase the system SE and EE in the licensed spectrum [3]. However, the licensed spectrum is managed by mobile network operators and is expensive. In addition, with the explosive growth in the number of smart terminals, spectrum resources on the licensed bands are becoming increasingly scarce, and D2D communications may cause severe interference to the existing cellular users. To guarantee the transmission performance of the cellular users while improving the QoS of D2D users, D2D communication on unlicensed bands (D2D-U) has been proposed to enable D2D communication on the unlicensed spectrum [4]. As the spectrum resources of unlicensed bands are abundant and free to use, D2D-U may significantly increase the SE and EE of D2D systems while guaranteeing the QoS of cellular users [5].

Most existing works have studied mode selection and power and spectrum allocation mechanisms for D2D-enabled cellular networks. In [6], the impact of mode selection on the effective capacity has been investigated via a Markov service process model. The authors of [7] have proposed a centralized optimal mode selection and resource allocation scheme for D2D-enabled cellular networks. A distributed joint spectrum sharing and power allocation problem has been modeled as a non-convex optimization problem in [8], and a sub-optimal solution is obtained by convex approximation techniques. A similar problem is solved by a price-based model in [9], where a game-theoretic approach is proposed to mitigate interference among D2D pairs in a distributed way. Many machine learning-based methods have also been used to solve related problems in recent years. The authors of [10] have designed a transmit power control strategy for D2D pairs based on a deep neural network (DNN) structure, where the SE and interference are taken into account. A deep reinforcement learning-based method is utilized in [11] to maximize the sum rate of D2D links.

Recently, long-term evolution on unlicensed bands (LTE-U) has been introduced into the unlicensed spectrum. Listen-before-talk (LBT) and duty cycle method (DCM) access mechanisms have been proposed for LTE-U based cellular users to access the unlicensed spectrum while ensuring fair coexistence with WiFi networks [12,13,14,15,16]. In [17], the back-off window size of the LBT mechanism is adaptively adjusted according to the WiFi traffic load and the available bandwidth on the licensed spectrum, which improves the system spectrum efficiency. A Q-learning based scheme has also been proposed to adjust the back-off window size of LBT in [18]. The performance of the DCM mechanism has been analyzed in [19,20], where reinforcement learning based methods have been employed to achieve fair coexistence between LTE-U and WiFi networks.
A hybrid mechanism has been designed in [21], where both LBT and DCM are utilized and flexible handoff between the two mechanisms is achieved to meet the fairness constraint.

Only a small amount of work has focused on D2D transmission on unlicensed bands. The authors of [22] have shown that D2D-U can significantly mitigate congestion and conflicts and improve the system throughput. In [23], the sub-channel allocation of D2D-U enabled LTE cellular networks has been formulated as a many-to-many matching problem with externalities, and an iterative sub-channel swap algorithm has been proposed to improve the system performance. A reinforcement learning based scheme is proposed in [24], where a deep Q-network is utilized to learn the traffic load on the unlicensed spectrum, which allows the D2D-U system to model the joint allocation problem as a convex optimization problem. A decentralized joint spectrum and power allocation approach for D2D-U has been proposed in [25], which minimizes the global power consumption of the D2D-U pairs.

Most of the above-mentioned power and spectrum allocation schemes are centralized, which may bring large signaling overhead to the base station and lead to high latency. Besides, most of these works concentrate only on maximizing the system throughput without considering the different traffic loads of different D2D pairs. In this paper, we first define the unlicensed channel traffic load according to the number of competing WiFi users when the DCM scheme is applied in the D2D-U network. A price-based model, in which D2D-U users need to pay for channel resources, is then built, and a neural network (NN) is applied to estimate the price for using the unlicensed spectrum at each D2D-U link adaptively. To guarantee harmonious coexistence with WiFi networks and fairness among D2D-U pairs, the loss function is designed specifically to realize online unsupervised learning of the NN.

However, the training of NNs in a distributed system is often unstable. If individual D2D-U links experience excessive noise or overfitting, the performance of the whole system can be severely degraded. In addition, when new D2D-U links join the network, their randomly initialized NN parameters will also cause serious fluctuations in a system that has already converged. To solve these problems, the federated learning method [26] is utilized to update the parameters of the networks. In federated learning, distributed terminals train NNs themselves and then periodically integrate gradients or parameters at the central base station [27]. In our model, the integrated parameters help correct networks with large deviations and also initialize the parameters of new D2D-U users. Afterwards, with the corresponding channel prices, the spectrum and power allocation can be optimized jointly to maximize the transmission rate at each D2D-U pair.

The main contributions of the paper are summarized as follows.

1. A DCM-based channel access model is built for D2D-U networks to share the unlicensed spectrum with WiFi networks. Based on the carrier sense multiple access with collision avoidance (CSMA/CA) mechanism adopted in WiFi, a novel traffic load on the unlicensed channel is defined.
2. To balance the SE and traffic load of D2D-U pairs while ensuring fairness, a virtual variable named the price is defined, which is related to the traffic load and channel state of D2D-U links and the total traffic load on the unlicensed channels. With the price, the transmission power and unlicensed spectrum allocation can be optimized jointly in a distributed way at each D2D-U pair.
3. Since it is hard to formulate an explicit function that models the relationship among the price, the traffic load, and the channel state information, an online-trained NN with a specific loss function is applied to update the price at each D2D-U pair adaptively. The parameters of the NN are updated via unsupervised self-iteration as well as federated learning to stabilize the system performance.
4. The centralized optimal solution is presented for comparison. Moreover, simulation results are provided to verify the effectiveness of the scheme and the theoretical analysis.

The rest of this paper is organized as follows. Section 2 introduces the system model and a novel definition of the WiFi traffic load on unlicensed channels. The price-based model and the learning model are proposed in Section 3 and Section 4, respectively. We analyze the simulation results in Section 5 and conclude the paper in Section 6.

In this paper, we study the scenario where multiple D2D-U links share the unlicensed channels with WiFi access points (APs), as shown in Fig. 1. D2D-U links are able to use multiple unlicensed channels simultaneously, and a single unlicensed channel can be shared by more than one D2D-U pair.

The macro base station (MBS) can obtain information on the achievable data rates of each D2D-U pair and the parameters of the NNs via the control channel on licensed bands. It is worth mentioning that, to consider more realistic scenarios, the number of D2D-U links is not fixed, which means that D2D-U pairs may leave and join the network dynamically. To model the system mathematically, we use the set $\mathcal{D} = \{d_0, d_1, \ldots, d_{N-1}\}$ to denote the $N$ D2D-U links in the coverage of the MBS, where $N$ is not fixed. Moreover, there are $M$ accessible unlicensed channels, denoted by the set $\mathcal{U} = \{u_0, u_1, \ldots, u_{M-1}\}$, which are orthogonal to each other. To account for fairness among D2D-U links and harmonious coexistence with WiFi networks, $\mathcal{L}^D = \{l^D_0, l^D_1, \ldots, l^D_{N-1}\}$ and $\mathcal{L}^U = \{l^U_0, l^U_1, \ldots, l^U_{M-1}\}$ denote the traffic loads of the D2D-U links and of the WiFi systems on the unlicensed channels, respectively. In addition, WiFi APs adopt the CSMA/CA-based distributed coordination function (DCF) to access the unlicensed channels, while the DCM mechanism is applied at D2D-U links to access the unlicensed channels.

Fig. 1 System model.

Fig. 2 DCM mechanism.

Let $\theta_{i,j} \in [0, 1]$ denote the fraction of time (the 'on period' of the DCM cycle) during which D2D-U link $d_i$ occupies unlicensed channel $u_j$. Then, the achievable data rate of $d_i$ on the unlicensed channels can be calculated by

$$R_i = \sum_{j=0}^{M-1} \theta_{i,j} B_j \log_2\left(1 + \frac{p_{i,j} h_{i,j}}{N_0 B_j}\right), \tag{1}$$

where $B_j$ is the bandwidth of unlicensed channel $u_j$, and $h_{i,j}$ and $p_{i,j}$ are the channel power gain and the corresponding transmission power of D2D-U pair $d_i$ on $u_j$, respectively. $N_0$ is the noise power spectral density on the unlicensed channels, which is fixed in this paper.

2.2 WiFi traffic load definition

To ensure the transmission requirements of WiFi users, D2D-U links must decide the duration of the 'on period' based on the WiFi traffic load on the unlicensed channels, which means that D2D-U links need to obtain the WiFi traffic load before accessing the channels. The conventional traffic load of WiFi networks is mainly decided by the number of competing WiFi users. As demonstrated in [28,29], the extended Kalman filter (EKF) has been used to estimate the number of active WiFi users accurately, where D2D-U links filter the transmission collision probability on the channel and then obtain the number of WiFi users from the estimated collision probability. However, the impact of the number of WiFi users on the throughput of the WiFi system is non-linear, and D2D-U links cannot directly determine the duration of the 'on period' from the number of WiFi users. Therefore, a novel WiFi traffic load is first defined for the case where the DCM mechanism is employed at the D2D-U pairs.

As WiFi APs adopt the binary slotted exponential back-off scheme in DCF, the relationship between the total WiFi throughput on an unlicensed channel and the number of WiFi users can be obtained according to [31], as illustrated in Fig. 3. The size of the back-off contention window, denoted as $G$, is 32, and the maximum back-off stage, denoted as $m$, is set to 3 and 5, respectively. We can observe that, as the number of WiFi users increases, the achievable throughput on the unlicensed channel first increases and then decreases. The reason is that when a large number of WiFi users compete for the same unlicensed channel, the transmission collision probability increases, resulting in transmission failures and lower throughput.

Fig. 3 Relationship between throughput and number of WiFi users (x-axis: the number of WiFi users; y-axis: total throughput; curves for G = 32, m = 3 and G = 32, m = 5).

Herein, we use the number of WiFi users corresponding to the highest throughput, $n^{\max}_k$, to represent the maximum load that the WiFi network can handle on unlicensed channel $u_k$. If the number of WiFi users on the unlicensed channel is greater than or equal to $n^{\max}_k$, channel $u_k$ is considered inaccessible to D2D-U pairs.
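The rise-then-fall curve of Fig. 3, and hence $n^{\max}_k$ and the per-user guarantee $\hat{r}^{\max}_k = R^{\max}_k / n^{\max}_k$ used in the next paragraph, can be reproduced numerically. The sketch below assumes a Bianchi-style Markov-chain model of DCF saturation throughput with basic access (we assume, but this section does not confirm, that [31] uses a model of this kind); the slot, payload, success, and collision durations are illustrative placeholders in microseconds, and the function names are hypothetical.

```python
# Sketch: total DCF saturation throughput vs. number of WiFi users.
# Assumption: Bianchi-style Markov-chain model of 802.11 DCF, basic access.
# Timing values are illustrative placeholders (microseconds), not the paper's.


def attempt_probability(n: int, G: int, m: int, iters: int = 200) -> float:
    """Per-slot transmission probability tau of each of n saturated users.

    Solves tau = 2 / (G + 1 + p*G*sum_{k=0}^{m-1} (2p)^k) jointly with
    p = 1 - (1 - tau)^(n - 1) by bisection: tau minus the right-hand side
    is increasing in tau, negative at tau = 0 and positive at tau = 1.
    """
    def gap(tau: float) -> float:
        p = 1.0 - (1.0 - tau) ** (n - 1)           # conditional collision probability
        series = sum((2.0 * p) ** k for k in range(m))
        return tau - 2.0 / (G + 1 + p * G * series)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def total_throughput(n: int, G: int, m: int, slot: float = 50.0,
                     payload: float = 8184.0, t_success: float = 8982.0,
                     t_collision: float = 8713.0) -> float:
    """Normalized saturation throughput: fraction of time spent on payload."""
    tau = attempt_probability(n, G, m)
    p_tr = 1.0 - (1.0 - tau) ** n                   # at least one user transmits
    p_s = n * tau * (1.0 - tau) ** (n - 1) / p_tr   # exactly one, given p_tr
    mean_slot = ((1.0 - p_tr) * slot
                 + p_tr * p_s * t_success
                 + p_tr * (1.0 - p_s) * t_collision)
    return p_tr * p_s * payload / mean_slot


def n_max(G: int, m: int, max_users: int = 50) -> int:
    """Number of users n^max_k at which the total throughput peaks."""
    return max(range(1, max_users + 1), key=lambda n: total_throughput(n, G, m))


for m in (3, 5):                                    # the two curves of Fig. 3
    n_star = n_max(32, m)
    R_max = total_throughput(n_star, 32, m)
    print(f"m={m}: n_max={n_star}, R_max={R_max:.4f}, r_hat_max={R_max / n_star:.4f}")
```

The bisection uses the algebraically equivalent form of the back-off fixed point without the factor $(1 - 2p)$, which avoids the removable singularity at $p = 0.5$.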
Furthermore, to define the WiFi traffic load in a way that fits the DCM mechanism, the average throughput of each WiFi user when the throughput of the WiFi network reaches its maximum, $\hat{r}^{\max}_k$, is treated as the basic throughput guarantee of WiFi users. Let $R^{\max}_k$ denote the maximum system throughput on unlicensed channel $u_k$. Then, $\hat{r}^{\max}_k$ can be calculated as $\hat{r}^{\max}_k = R^{\max}_k / n^{\max}_k$. The basic throughput guarantee means that when D2D-U pairs reuse the unlicensed channels based on the DCM mechanism, the average throughput of WiFi users should not be less than $\hat{r}^{\max}_k$.

Fig. 4 Relationship between $\hat{r}'_k$ and $\hat{r}^{\max}_k$ (x-axis: the number of WiFi users; y-axis: average throughput of WiFi users; G = 32, m = 3).

Then we can calculate the minimum value of the 'off period' based on the above description. For unlicensed channel $u_k \in \mathcal{U}$, let $\hat{r}_k$ represent the average throughput of each WiFi user when no D2D-U link uses $u_k$. On the other hand, when the number of WiFi users is less than $n^{\max}_k$, D2D-U pairs are allowed to use $u_k$ with the DCM mechanism, and the average throughput of WiFi users is given by

$$\hat{r}'_k = \hat{r}_k \left(1 - \sum_{i=0}^{N-1} \theta_{i,k}\right). \tag{2}$$

According to the basic throughput guarantee, we further have

$$\hat{r}^{\max}_k = \frac{R^{\max}_k}{n^{\max}_k} \le \hat{r}'_k. \tag{3}$$

The relationship between $\hat{r}'_k$ and $\hat{r}^{\max}_k$ is shown in Fig. 4: when $\hat{r}'_k$ lies to the left of $\hat{r}^{\max}_k$, $u_k$ is available to D2D-U users. To adapt to the DCM access mechanism, the WiFi traffic load $l^U_k$ on $u_k$ is defined as the minimum 'off period' duration that meets the basic throughput guarantee of WiFi users. Combining (2) and (3), $l^U_k$ is given by

$$l^U_k = \frac{\hat{r}^{\max}_k}{\hat{r}_k} \le 1 - \sum_{i=0}^{N-1} \theta_{i,k}. \tag{4}$$

Since both $\hat{r}^{\max}_k$ and $\hat{r}_k$ can be calculated from the physical layer parameters of the WiFi networks [31], and the number of WiFi users can be accurately estimated by EKF-based methods [28,29] and learning-based methods [30], D2D-U links are able to sense the traffic load on the unlicensed channels and then decide their own resource allocation scheme. In the next section, the resource allocation model is built for D2D-U links and a price-based solution is proposed.

In this section, we first formulate a distributed optimization problem for each D2D-U link to maximize its own data rate. Then, to ensure fairness among D2D links, a price-based solution is proposed that assigns D2D-U links different prices for using the unlicensed spectrum under different traffic load and channel state conditions.

3.1 Problem formulation

For a single D2D link $d_i \in \mathcal{D}$, to maximize its transmission rate while guaranteeing the performance of WiFi networks, an optimization problem can be formulated as

$$\max_{\{\theta_{i,j},\, p_{i,j}\}} \; R_i \tag{5}$$

subject to

$$\text{C1: } \theta_{i,j} \le 1 - l^U_j, \quad \forall j \in [0, M-1], \tag{5a}$$
$$\text{C2: } \sum_{j=0}^{M-1} \theta_{i,j} p_{i,j} \le p_c, \tag{5b}$$
$$\text{C3: } \theta_{i,j} p_{i,j} \le p_u, \quad \forall j \in [0, M-1], \tag{5c}$$

where C1 guarantees the basic throughput of WiFi users on each unlicensed channel according to (4), C2 limits the total transmission power of $d_i$ to $p_c$, and C3 limits the transmission power on each unlicensed channel to $p_u$. Problem (5) is non-convex in $(\theta_{i,j}, p_{i,j})$. To address this, the auxiliary variable $\eta_{i,j} = \theta_{i,j} p_{i,j}$ is introduced, and (1) can be re-expressed as

$$R_i = \sum_{j=0}^{M-1} \theta_{i,j} B_j \log_2\left(1 + \frac{\eta_{i,j} h_{i,j}}{N_0 B_j \theta_{i,j}}\right), \tag{6}$$

so problem (5) is converted into

$$\max_{\{\theta_{i,j},\, \eta_{i,j}\}} \; R_i \tag{7}$$

subject to

$$\text{C1: } \theta_{i,j} \le 1 - l^U_j, \quad \forall j \in [0, M-1], \tag{7a}$$
$$\text{C2: } \sum_{j=0}^{M-1} \eta_{i,j} \le p_c, \tag{7b}$$
$$\text{C3: } \eta_{i,j} \le p_u, \quad \forall j \in [0, M-1]. \tag{7c}$$

Problem (7) is a convex optimization problem and can be solved by the Lagrangian multiplier method.
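For a single link, (4) and problem (7) can be exercised end to end. The sketch below is a minimal illustration under stated assumptions, not the authors' solver: because $R_i$ in (6) is non-decreasing in $\theta_{i,j}$ for fixed $\eta_{i,j}$, it sets C1 tight, $\theta_{i,j} = \max(0, 1 - l^U_j)$, and then allocates $\eta_{i,j}$ by capped water-filling, with the water level found by bisection so that C2 holds. The helper names and all numeric inputs are hypothetical.

```python
import math


def wifi_traffic_load(r_hat_max: float, r_hat: float) -> float:
    """(4): l^U_k = r_hat_max_k / r_hat_k, the minimum 'off period' fraction."""
    return r_hat_max / r_hat


def rate(theta, eta, B, h, N0):
    """(6): R_i = sum_j theta_j B_j log2(1 + eta_j h_j / (N0 B_j theta_j))."""
    return sum(t * b * math.log2(1.0 + e * g / (N0 * b * t))
               for t, e, b, g in zip(theta, eta, B, h) if t > 0.0)


def solve_problem7(l_U, B, h, N0, p_c, p_u, iters=200):
    """Sketch of problem (7) for one D2D-U link; returns (theta, eta).

    For fixed eta, R_i is non-decreasing in theta, so C1 is set tight.
    Powers then take the capped water-filling form
    eta_j = min(p_u, theta_j B_j (level - N0 / h_j)^+), with the water level
    chosen by bisection so that C2 (sum eta_j <= p_c) holds.
    """
    theta = [max(0.0, 1.0 - l) for l in l_U]   # a channel with l^U >= 1 is unusable

    def eta_for(level):
        return [min(p_u, max(0.0, t * b * (level - N0 / g))) if t > 0.0 else 0.0
                for t, b, g in zip(theta, B, h)]

    caps = [p_u if t > 0.0 else 0.0 for t in theta]
    if sum(caps) <= p_c:                       # C2 inactive: every channel at its cap
        return theta, caps

    # At level = hi every usable channel reaches p_u, so sum(eta) > p_c there.
    lo = 0.0
    hi = max(N0 / g + p_u / (t * b) for t, b, g in zip(theta, B, h) if t > 0.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(eta_for(mid)) <= p_c:
            lo = mid
        else:
            hi = mid
    return theta, eta_for(lo)                  # lo always satisfies C2


# Illustrative numbers (assumed, not from the paper): two 20 MHz channels.
l_U = [wifi_traffic_load(1.0, 2.5), wifi_traffic_load(1.0, 1.25)]
B, h, N0 = [20e6, 20e6], [1e-7, 4e-8], 4e-21
theta, eta = solve_problem7(l_U, B, h, N0, p_c=0.1, p_u=0.08)
p = [e / t if t > 0 else 0.0 for e, t in zip(eta, theta)]   # back to p = eta / theta
print(theta, eta, p, rate(theta, eta, B, h, N0))
```

The water level plays the role of $\log_2 e$ divided by the multiplier of C2, so the capped form is the same structure that (18) later derives for the priced problem.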
However, optimization problem (7) only allows a D2D-U link to maximize its own throughput under the constraint of fair coexistence with WiFi networks, without considering its impact on other D2D-U pairs. When multiple D2D-U links share the same unlicensed channel, the probability of transmission collision is extremely high, which degrades the performance of D2D-U transmission. Therefore, the model needs to be improved based on the respective traffic load conditions of D2D-U links, so that D2D-U links with heavy transmission tasks can use more spectrum resources while D2D-U links with light transmission tasks require only a small fraction of the unlicensed spectrum resources. In the next subsection, a price-based solution is applied to achieve this goal.

3.2 Price-based solution

Different from the Stackelberg game based methods in [9,33,34], which minimize users' overhead, our method adjusts the price when assets are limited. In the proposed price-based model, each D2D-U link is considered a consumer, and spectrum resources in unlicensed bands are provided to consumers as commodities. Every D2D-U link is given the same total amount of money, denoted by $C$. Define the price of unlicensed channel $u_j$ for D2D-U link $d_i$ as $c_{i,j}$; when $d_i$ transmits data on $u_j$, the amount $d_i$ needs to pay is $\theta_{i,j} \times c_{i,j}$. Accordingly, optimization problem (5) can be extended with an extra fairness constraint as

$$\max_{\{\theta_{i,j},\, p_{i,j}\}} \; R_i \tag{8}$$

subject to

$$\text{C1: } \theta_{i,j} \le 1 - l^U_j, \quad \forall j \in [0, M-1], \tag{8a}$$
$$\text{C2: } \sum_{j=0}^{M-1} \theta_{i,j} p_{i,j} \le p_c, \tag{8b}$$
$$\text{C3: } \theta_{i,j} p_{i,j} \le p_u, \quad \forall j \in [0, M-1], \tag{8c}$$
$$\text{C4: } \sum_{j=0}^{M-1} \theta_{i,j} c_{i,j} \le C. \tag{8d}$$
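The budget constraint (8d) couples all channels of one link: the same on-period fractions can become unaffordable when one channel's price rises. A minimal feasibility check for a candidate $(\theta_{i,j}, p_{i,j})$ against (8a)-(8d) is sketched below; the function name and all numbers are hypothetical.

```python
def check_problem8(theta, p, c, l_U, p_c, p_u, C, tol=1e-9):
    """Report which constraints of problem (8) a candidate (theta, p) satisfies.

    theta, p, c and l_U are per-channel lists for one D2D-U link d_i.
    """
    return {
        "C1": all(t <= 1.0 - l + tol for t, l in zip(theta, l_U)),   # (8a)
        "C2": sum(t * q for t, q in zip(theta, p)) <= p_c + tol,    # (8b)
        "C3": all(t * q <= p_u + tol for t, q in zip(theta, p)),    # (8c)
        "C4": sum(t * k for t, k in zip(theta, c)) <= C + tol,      # (8d): money spent
    }


# Same on-period fractions and powers, two different price vectors.
theta, p, l_U = [0.6, 0.2], [0.1, 0.1], [0.4, 0.8]
print(check_problem8(theta, p, c=[1.0, 3.0], l_U=l_U, p_c=0.1, p_u=0.08, C=1.0))
print(check_problem8(theta, p, c=[1.0, 1.0], l_U=l_U, p_c=0.1, p_u=0.08, C=1.0))
```

Raising the price of the second channel from 1 to 3 makes the same allocation violate C4 while C1 to C3 still hold, which is how the price steers $d_i$ away from expensive channels.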

The above problem is also a non-convex optimization problem and cannot be solved in its current form. As in Problem (5), substituting $p_{i,j} = \eta_{i,j} / \theta_{i,j}$ converts the above problem into

$$\max_{\{\theta_{i,j},\, \eta_{i,j}\}} \; R_i \tag{9}$$

subject to

$$\text{C1: } \theta_{i,j} \le 1 - l^U_j, \quad \forall j \in [0, M-1], \tag{9a}$$
$$\text{C2: } \sum_{j=0}^{M-1} \eta_{i,j} \le p_c, \tag{9b}$$
$$\text{C3: } \eta_{i,j} \le p_u, \quad \forall j \in [0, M-1], \tag{9c}$$
$$\text{C4: } \sum_{j=0}^{M-1} \theta_{i,j} c_{i,j} \le C. \tag{9d}$$

Herein, the key to solving (9) is to find $c_{i,j}$. If $c_{i,j}$ is known, the above problem is a convex optimization problem and the optimal solution can be obtained with the Lagrangian multiplier method. The Lagrangian function of Problem (9) is constructed as

$$L(\theta_{i,j}, \eta_{i,j}, \mu^{(1)}_j, \mu^{(2)}, \mu^{(3)}_j, \mu^{(4)}) = -R_i + \sum_{j=0}^{M-1} \mu^{(1)}_j \left(\theta_{i,j} + l^U_j - 1\right) + \mu^{(2)} \left(\sum_{j=0}^{M-1} \eta_{i,j} - p_c\right) + \sum_{j=0}^{M-1} \mu^{(3)}_j \left(\eta_{i,j} - p_u\right) + \mu^{(4)} \left(\sum_{j=0}^{M-1} \theta_{i,j} c_{i,j} - C\right), \tag{10}$$

where $\mu^{(1)}_j$, $\mu^{(2)}$, $\mu^{(3)}_j$, and $\mu^{(4)}$ are the Lagrangian multipliers. The Karush-Kuhn-Tucker (KKT) conditions of $L(\cdot)$ are derived from (10) as

$$\frac{\partial L}{\partial \theta_{i,j}} = 0, \quad \forall j \in [0, M-1], \tag{11}$$
$$\frac{\partial L}{\partial \eta_{i,j}} = 0, \quad \forall j \in [0, M-1], \tag{12}$$
$$\mu^{(1)}_j \left(\theta_{i,j} + l^U_j - 1\right) = 0, \quad \forall j \in [0, M-1], \tag{13}$$
$$\mu^{(2)} \left(\sum_{j=0}^{M-1} \eta_{i,j} - p_c\right) = 0, \tag{14}$$
$$\mu^{(3)}_j \left(\eta_{i,j} - p_u\right) = 0, \quad \forall j \in [0, M-1], \tag{15}$$
$$\mu^{(4)} \left(\sum_{j=0}^{M-1} \theta_{i,j} c_{i,j} - C\right) = 0, \tag{16}$$
$$\mu^{(1)}_j \ge 0, \; \mu^{(2)} \ge 0, \; \mu^{(3)}_j \ge 0, \; \mu^{(4)} \ge 0, \quad \forall j \in [0, M-1]. \tag{17}$$

On the basis of the KKT conditions, the optimal $\theta_{i,j}$ and $\eta_{i,j}$ should satisfy

$$\eta_{i,j} = \theta_{i,j} B_j \left(\frac{\log_2 e}{\mu^{(2)} + \mu^{(3)}_j} - \frac{N_0}{h_{i,j}}\right), \tag{18}$$

and

$$\log_2\left(1 + \frac{\eta_{i,j} h_{i,j}}{N_0 B_j \theta_{i,j}}\right) - \frac{\eta_{i,j} h_{i,j} \log_2 e}{N_0 B_j \theta_{i,j} + \eta_{i,j} h_{i,j}} = \frac{\mu^{(1)}_j + \mu^{(4)} c_{i,j}}{B_j}. \tag{19}$$

Then, according to (18) and (19), $\eta_{i,j}$ and $\theta_{i,j}$ can be obtained for different $d_i$ and $u_j$. Since (19) is a transcendental equation, a numerical method can be applied to find the solution.

Based on the above analysis, $\theta_{i,j}$ and $p_{i,j}$ can be obtained for a known price $c_{i,j}$, which can be used to ensure fairness among D2D-U pairs. To adjust prices adaptively to reach fairness, each D2D-U pair needs to determine $c_{i,j}$ based on its own traffic load, its channel state information, and the WiFi traffic load on the unlicensed channel. Accordingly, we denote $c_{i,j}$ as

$$c_{i,j} = F\left(l^D_i, l^U_j, h_{i,j} \mid s_c\right), \tag{20}$$

where $l^D_i$ represents the transmission task of $d_i$ and $h_{i,j}$ represents the channel power gain on unlicensed channel $u_j$. The function $F(\cdot)$ models the relationship between the price and the traffic loads of the D2D links and the unlicensed channels. When the traffic load of D2D-U link $d_i$ is heavy, $d_i$ is encouraged to use the unlicensed spectrum with a low price, while the price for a D2D-U link with less traffic load is high. Moreover, unlicensed channels with low traffic loads or strong channel power gain for the D2D link are priced low, while channels with high traffic loads or poor channel conditions are expensive. In addition, to mitigate channel access conflicts among D2D-U links, a feedback signal $s_c$ is set at $d_i$.
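For a known price $c_{i,j}$ and a given $\eta_{i,j} > 0$, the left-hand side of (19) decreases monotonically in $\theta_{i,j}$ (it is an increasing function of the SNR term $\eta_{i,j} h_{i,j} / (N_0 B_j \theta_{i,j})$ and grows without bound as $\theta_{i,j} \to 0^+$), so the numerical step mentioned after (19) can be carried out by bisection. The sketch below assumes the multiplier $\mu^{(4)}$ is supplied from outside (for example by an outer dual update, not shown) and handles C1 by clipping at $1 - l^U_j$; function names and the normalized numbers are hypothetical.

```python
import math

LOG2E = 1.0 / math.log(2.0)  # log2(e)


def lhs19(theta, eta, h, B, N0):
    """Left-hand side of (19), i.e. (1/B_j) * dR_i/dtheta_ij for fixed eta_ij > 0."""
    x = eta * h / (N0 * B * theta)            # SNR during the on period
    return math.log2(1.0 + x) - LOG2E * x / (1.0 + x)


def solve_theta(eta, h, B, N0, mu4, price, theta_max, iters=100):
    """Solve (19) for theta_ij in (0, theta_max] by bisection.

    Starts from mu1 = 0 (C1 inactive). If the root lies beyond
    theta_max = 1 - l^U_j, C1 binds (mu1 > 0) and theta = theta_max.
    """
    target = mu4 * price / B
    if lhs19(theta_max, eta, h, B, N0) >= target:
        return theta_max
    lo, hi = 0.0, theta_max                   # the root lies strictly inside
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs19(mid, eta, h, B, N0) > target:
            lo = mid                          # mid is left of the root
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Normalized, illustrative values: a higher price pushes theta down.
for price in (0.5, 8.0, 12.0):
    print(price, solve_theta(eta=100.0, h=1.0, B=1.0, N0=1.0,
                             mu4=1.0, price=price, theta_max=0.6))
```

With $\mu^{(4)} = 0$ the target is zero and the sketch returns $1 - l^U_j$, which matches the unpriced problem (7).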
If $d_i$ collides with other D2D-U links on the channel, $s_c$ is activated and the price should be increased accordingly. Therefore, function $F(\cdot)$ should have the following characteristics:

(1) $F(\cdot)$ should decrease monotonically with respect to $l^D_i$;
(2) $F(\cdot)$ should increase monotonically with respect to $l^U_j$;
(3) $F(\cdot)$ should decrease monotonically with respect to $h_{i,j}$;
(4) $F(\cdot)$ should increase with the activation of $s_c$.

Fig. 5 The structure of NN.

As for fairness among D2D-U links, we define the expected transmission time (ETT) as the ratio of a D2D-U link's traffic load to its achievable data rate. Then, fair sharing of the unlicensed channels among D2D-U links is defined as all D2D-U links having equal ETT values, which is written as

$$\frac{l^D_0}{R_0} = \frac{l^D_1}{R_1} = \cdots = \frac{l^D_{N-1}}{R_{N-1}}. \tag{21}$$

However, it is difficult to build an explicit mathematical model that formulates the function $F(\cdot)$ and achieves (21) directly. To address this issue, an online-trained neural network (NN) architecture is exploited to implement $F(\cdot)$, and the loss function for all D2D-U links is provided based on $s_c$ and the assistance of the MBS. Specific details of the adopted NN are given in the next section.

Because of the strong fitting performance and robustness of NNs, an online-trained NN is utilized to adjust the prices adaptively in a dynamic D2D-U environment, where the parameters of the NN are updated according to the loss function as well as federated learning. The structure of the whole distributed price system is illustrated in Fig. 5, where each D2D-U pair holds its own NN and determines its own prices for different channels. In this section, the structure and the loss function of the NN are first introduced to train the NN in an unsupervised online mode; then, to improve the stability of the system and better accommodate new D2D-U users, a federated learning based mode is proposed to adjust the parameters of the networks.

4.1 Unsupervised online training method

As demonstrated in Fig. 5, the output of the NN at D2D-U link $d_i$ can be calculated as

$$\hat{c}_{i,j} = \hat{F}\left(l^D_i, l^U_j, h_{i,j} \mid \phi\right), \tag{22}$$

where $\hat{c}_{i,j}$ is the output of the NN, i.e., the estimated price, and the input of the NN is $[l^D_i, l^U_j, h_{i,j}]$.
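A minimal sketch of one per-link price network follows, covering the forward pass (22), the $w \times \mathrm{Sigmoid}(\cdot)$ output layer, and the gradient step (23) on the squared loss (24) as described in this subsection. The hidden-layer size, the tanh activation, the initialization, and the class name `PriceNet` are assumptions for illustration; inputs are assumed to be already normalized to [0, 1], and `offsets` stands for $Q_1 + Q_2$ per channel.

```python
import math
import random


class PriceNet:
    """Hypothetical shallow price network: c_hat = F_hat(l_D, l_U, h | phi).

    One tanh hidden layer (assumed) and a w * Sigmoid output layer, so every
    price lies in (0, w). Inputs are assumed to be normalized to [0, 1].
    """

    def __init__(self, hidden: int = 8, w: float = 10.0, seed: int = 0):
        rng = random.Random(seed)
        self.w = w
        self.W1 = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.W2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        """Return hidden activations and the price for one input vector."""
        hid = [math.tanh(sum(a * b for a, b in zip(row, x)) + bias)
               for row, bias in zip(self.W1, self.b1)]
        z = sum(a * b for a, b in zip(self.W2, hid)) + self.b2
        return hid, self.w / (1.0 + math.exp(-z))    # w * Sigmoid(z)

    def forward(self, x):
        """(22): price for x = [l_D_i, l_U_j, h_ij] (normalized)."""
        return self._forward(x)[1]

    def train_step(self, batch, offsets, alpha: float = 1e-3):
        """(23)-(24): one gradient step on the batch mean of (T_j - c_hat_j)^2.

        offsets[j] = Q1 + Q2 for channel j; the target T_j = c_hat_j + offsets[j]
        is held fixed while differentiating, so dQ/dc_hat_j = -2 * offsets[j].
        """
        gW1 = [[0.0] * 3 for _ in self.W1]
        gb1 = [0.0] * len(self.b1)
        gW2 = [0.0] * len(self.W2)
        gb2 = 0.0
        for x, off in zip(batch, offsets):
            hid, c_hat = self._forward(x)
            # Chain rule through w * Sigmoid: dc_hat/dz = c_hat * (1 - c_hat / w).
            dz = -2.0 * off * c_hat * (1.0 - c_hat / self.w) / len(batch)
            gb2 += dz
            for k, a in enumerate(hid):
                gW2[k] += dz * a
                du = dz * self.W2[k] * (1.0 - a * a)  # back through tanh
                gb1[k] += du
                for n in range(3):
                    gW1[k][n] += du * x[n]
        # phi <- phi - alpha * dQ/dphi for every weight and bias.
        self.b2 -= alpha * gb2
        for k in range(len(self.W2)):
            self.W2[k] -= alpha * gW2[k]
            self.b1[k] -= alpha * gb1[k]
            for n in range(3):
                self.W1[k][n] -= alpha * gW1[k][n]


# One link, M = 2 channels: one normalized input row per channel forms a batch.
net = PriceNet()
batch = [[0.8, 0.2, 0.5], [0.8, 0.6, 0.3]]
prices = [net.forward(x) for x in batch]
net.train_step(batch, offsets=[0.5, 0.5])     # placeholder Q1 + Q2 values
print(prices, [net.forward(x) for x in batch])
```

Because the target moves with the output, minimizing (24) simply nudges each price by a step proportional to $Q_1 + Q_2$, which is why the sign conventions of $q$, $v_1$, and $v_2$ decide whether prices rise or fall.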
In particular, before $[l^D_i, l^U_j, h_{i,j}]$ is fed into the NN, the input data needs to be normalized to avoid problems caused by different orders of magnitude. $\hat{F}(\cdot)$ denotes the forward propagation of the NN, and $\phi$ represents all the weight and bias parameters. At each iteration, all the parameters in the NN are updated by gradient descent:

$$\phi \leftarrow \phi - \alpha \frac{\partial Q}{\partial \phi}, \tag{23}$$

where $\alpha$ is the learning rate of the NN and $Q$ is the loss function. Since it is hard to obtain the global optimal solution of problem (8), we cannot obtain labels to train the network with supervised learning. Therefore, based on (21) and collision detection, the loss function is formulated from two parts, $Q_1$ and $Q_2$, to train the NN in an unsupervised way. $Q_1$ is defined as:

(1) if $l^D_i / R_i$ is larger than the ETT values of $(N+1)/2$ D2D-U links (when $N$ is odd) or $N/2$ D2D-U links (when $N$ is even), $Q_1 = q$;
(2) if $l^D_i / R_i$ is smaller than the ETT values of $(N+1)/2$ D2D-U links (when $N$ is odd) or $N/2$ D2D-U links (when $N$ is even), $Q_1 = -q$;
(3) otherwise, $Q_1 = 0$;

where $q$ is the adjustment step size of the price and is set to a small positive value. $Q_2$ corresponds to the conflict feedback $s_c$ and is defined as:

(1) if $d_i$ collides with other D2D-U links, $Q_2 = v_1$;
(2) otherwise, $Q_2 = v_2$.

To mitigate transmission collisions among D2D-U links effectively in actual operation, $v_1$ is set to a large positive value to significantly increase the price of the unlicensed channel when a collision happens. $v_2$ is a negative value that decreases prices so that D2D-U links can use more spectrum resources when no collision happens. Herein, the value of $Q_1$ can be provided by the MBS, and $Q_2$ can be decided at $d_i$ according to its transmission collision situation. Accordingly, the target of the NN output is $T = \hat{c}_{i,j} + Q_1 + Q_2$, and $Q$ can be calculated by the mean-squared loss as

$$Q = \left(T - \hat{c}_{i,j}\right)^2 = \left(Q_1 + Q_2\right)^2. \tag{24}$$

More specifically, in each time slot, $d_i$ obtains the prices of the $M$ unlicensed channels and computes $M$ loss values based on (24). These $M$ training samples are treated as a batch to train the network jointly. Furthermore, to keep the NN convergent and its output stable, a sigmoid function is used to limit the output to a range set according to the actual conditions: the activation function of the output layer is $w \times \mathrm{Sigmoid}(\cdot)$, which limits the output to $(0, w)$.

It is noteworthy that each D2D-U link holds an NN independently to determine the prices for using the unlicensed channels. When the neural networks of the D2D-U links converge, the system reaches an equilibrium. When the traffic load of the D2D-U links or the WiFi system changes, the neural networks converge to a new equilibrium adaptively. However, in such an online dynamic environment, a network may be hard to converge due to environmental noise or overfitting. Because of the design of the loss function, the non-convergence of a single NN may affect the updates of other D2D-U users, which leads to poor system performance. Additionally, when new D2D-U pairs join the system, their NN parameters are initialized randomly, which can also make the system unstable. To solve these problems, the federated learning based mode is utilized to further adjust the NN parameters.

4.2 Federated learning based method

Federated learning based schemes have achieved significant performance in distributed scenarios [32]. Without a large amount of data exchange, federated learning ensures the safety and stability of the system by integrating the gradient or parameter information of distributed users at a central processor and then feeding it back to the distributed terminals. In our D2D-U model, each D2D-U pair is a distributed terminal and the MBS plays the role of the center.
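The integrate-and-feed-back cycle can be sketched with NN parameters flattened into plain lists: the MBS averages them as in (25), each link blends the average back in as in (26), and the blending intensity follows the sigmoid of (27). All three formulas are stated in the remainder of this subsection; the function names, the toy two-parameter "networks", and the constants $\gamma$ and $\epsilon$ below are hypothetical.

```python
import math


def fedavg(params):
    """(25): element-wise mean of the flattened parameter vectors phi_i of N links."""
    n = len(params)
    return [sum(col) / n for col in zip(*params)]


def update_intensity(q_sum, gamma, eps):
    """(27): beta = 1 / (1 + exp(-(gamma / eps * Q_sum - gamma))).

    beta is about 1 / (1 + e^gamma) at Q_sum = 0, equals 0.5 at Q_sum = eps,
    and approaches 1 as Q_sum grows.
    """
    return 1.0 / (1.0 + math.exp(-(gamma / eps * q_sum - gamma)))


def correct(phi_i, phi_hat, beta):
    """(26): phi_i <- beta * phi_hat + (1 - beta) * phi_i."""
    return [beta * a + (1.0 - beta) * b for a, b in zip(phi_hat, phi_i)]


# Three links with toy 2-parameter "networks" and their accumulated |loss|.
phis = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
q_sums = [0.01, 2.0, 40.0]                 # converged, borderline, unstable
gamma, eps = 10.0, 2.0                     # placeholder constants
phi_hat = fedavg(phis)                     # kept at the MBS
phis = [correct(p, phi_hat, update_intensity(q, gamma, eps))
        for p, q in zip(phis, q_sums)]
new_link_phi = list(phi_hat)               # a joining D2D-U pair starts from phi_hat
print(phi_hat, phis, new_link_phi)
```

A converged link is barely touched, a link whose accumulated loss equals $\epsilon$ moves halfway toward $\hat{\phi}$, and an unstable link is effectively reset to $\hat{\phi}$. For very large $\gamma$ the exponential in (27) can overflow in floating point, so a numerically stable sigmoid would be preferable in practice.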
The MBS then periodically collects the NN parameters of the D2D-U pairs in the region and averages them, which can be denoted as

$$\hat{\phi} = \frac{1}{N} \sum_{i=0}^{N-1} \phi_i, \tag{25}$$

where $\phi_i$ denotes the NN parameters of D2D-U link $d_i$. Since the traffic load and channel gain of different D2D-U links are not the same, the NN parameterized by the expected value $E(\hat{\phi})$ reflects the average traffic load and transmission status of all D2D-U users in the region.

Then, for new D2D-U pairs that join the system, instead of initializing the NN parameters randomly, the current $\hat{\phi}$ saved at the MBS is used as the starting point for training. This effectively reduces the instability of the initial training of the NN and accelerates network convergence. For an existing D2D-U pair $d_i$, in each federated learning update, $d_i$ adjusts $\phi_i$ according to $\hat{\phi}$. If the NN at $d_i$ has converged and outputs stable prices, $\phi_i$ should be kept or updated only slightly. On the other hand, if the output is unstable or the loss value is large, $\phi_i$ needs correction. The update of $\phi_i$ is calculated by

$$\phi_i \leftarrow \beta \hat{\phi} + (1 - \beta) \phi_i, \tag{26}$$

where $\beta \in (0, 1)$ is the update intensity parameter, which is determined adaptively by the network convergence. To judge network convergence, the accumulated absolute loss value of the NN is used. Let $Q_{sum,i}$ be the accumulated absolute loss value of $d_i$. If $Q_{sum,i}$ is tiny, the NN has converged and $\beta$ should be small; when $Q_{sum,i}$ is large, $\beta$ needs to be set large accordingly. Here we use the sigmoid function again to express the relationship between $Q_{sum,i}$ and $\beta$ as

$$\beta = \frac{1}{1 + e^{-\left(\frac{\gamma}{\epsilon} Q_{sum,i} - \gamma\right)}}, \tag{27}$$

where $\gamma$ and $\epsilon$ are constants decided by the specific environment and the loss values. Fig. 6 illustrates this relationship: when $Q_{sum,i}$ is small, $\beta$ is close to 0, and as $Q_{sum,i}$ increases, $\beta$ increases toward 1.

Fig. 6 The relationship between $Q_{sum,i}$ and $\beta$ (x-axis: $Q_{sum,i}$, with $\epsilon$ marked; y-axis: the value of $\beta$).

Based on the above description of the proposed NN and the federated learning based mode, the joint power and spectrum allocation algorithm for a single D2D-U link $d_i \in \mathcal{D}$ is summarized in Algorithm 1, where $T$ is the system clock recorded by the MBS and $T_{fl}$ is the length of the federated learning period.

Algorithm 1 Distributed joint spectrum and power allocation at $d_i$
1: Initialize the structure and the parameters of $d_i$;
2: Initialize the structure and the parameters of the NN from the MBS;
3: while $d_i$ is transmitting data do
4:     Estimate the number of WiFi users with the EKF-based method;
5:     Estimate the traffic load $\mathcal{L}^U$ on the unlicensed spectrum based on (4);
6:     Normalize the input of the NN;
7:     Calculate the prices of all unlicensed channels based on (22);
8:     Solve Problem (8) to obtain the resource allocation scheme of $d_i$;
9:     Calculate the loss value based on (24);
10:    Add the absolute loss value to $Q_{sum,i}$;
11:    $d_i$ trains the NN with the gradient descent algorithm in (23);
12:    if $T$ is divisible by $T_{fl}$ then
13:        The MBS collects the NN parameters and calculates $\hat{\phi}$ based on (25);
14:        $d_i$ updates $\phi_i$ based on (26);
15:        Reset $Q_{sum,i}$ to 0;
16:    end if
end while

In this section, simulation results are presented to verify the performance of the proposed distributed D2D-U communication scheme. In the simulation setup, the relevant parameters of the NN and federated learning are listed in Table 1, and the parameters related to the D2D-U and WiFi networks are given in Table 2. Since the algorithm must run in real time in practice, we use shallow fully connected neural networks to reduce the algorithm complexity and to converge quickly during online training. During each iteration time slot, a D2D-U link first estimates the WiFi traffic load with the EKF-based method mentioned before, and then the prices of the unlicensed channels are derived with the proposed method. With the unlicensed channel prices and the WiFi traffic load, the D2D-U link decides its spectrum and power allocation scheme. Finally, the NN is trained based on the transmission status of the D2D-U link and a small amount of signaling exchanged with the MBS.

Table 1

Parameters of NN.
Parameters: learning rate $\alpha$; $w \times$ sigmoid; maximum value of output $w$; $q$; $v$; $\gamma$; $\epsilon$; $T_{fl}$.

Table 2

Parameters of D2D-U links.
Parameters: total power; $p_c$; $p_u$; $N$; $C$.

A simple scenario with two D2D-U links and two independent unlicensed channels is considered first. The traffic load of one link is set to be larger than that of the other, and the WiFi traffic load on one unlicensed channel is set to be lower than that on the other. The prices output by the NNs of the two D2D-U links on the two unlicensed channels are illustrated in Fig. 7. It can be observed that after about 250 iterations the D2D-U system converges, which means that the NN loss is tiny and each D2D-U link can stably estimate the prices of the unlicensed channels.

Both unlicensed channels are much cheaper for the heavy-traffic link, which encourages it to use more unlicensed spectrum resources. In addition, since the WiFi traffic load on the less congested channel is lighter, the heavy-traffic link's price on that channel is lower than on the more congested one, so it selects the less congested channel first. For the light-traffic link, since the less congested channel is preferentially taken by the heavy-traffic link at a much lower price, its own price on that channel is trained to be higher than on the other channel to avoid transmission collisions, and it mainly transmits on the more congested channel. Fig. 8 shows the fairness between the two links: as the prices converge, the ETT values of the two links become equal, so fairness between them is achieved.

[Plot: price of unlicensed channels versus the number of time slots (the price determined by D2D-U links); curves $c_{11}$, $c_{12}$, $c_{21}$, $c_{22}$.]

Fig. 7

The price determined by D2D-U links.

[Plot: ETT versus the number of time slots (the fairness of D2D-U links); curves $d_1$, $d_2$.]

Fig. 8

The transmission fairness among D2D links.

Table 3

The comparison of transmission fairness.
Number of users | Normalized traffic load | ETT (price-based) | ETT (maximum throughput)

[Plot: normalized data rate of WiFi users versus the number of time slots (the fairness of the WiFi system); curves: WiFi transmission guarantee, WiFi data rate.]

Fig. 9

Achieved fairness of WiFi system.

In Fig. 9, the WiFi data rate is normalized by the basic WiFi throughput guarantee calculated in (3). The simulation results show that after the prices converge, the WiFi throughput is essentially equal to the basic WiFi throughput guarantee. Since $v$ is set to encourage D2D-U links to use more spectrum resources while keeping collisions low, a few transmission collisions still occur, which slightly degrade the WiFi system throughput.

5.3 Effectiveness of federated learning

To further verify the effectiveness of federated learning, the convergence of a new D2D-U pair with different NN parameter initializations is illustrated in Fig. 10 and Fig. 11, respectively. The new D2D-U link is identical in both figures except for its initial NN parameters, and the system has already converged before the new user joins. The ETT of the new D2D-U pair and the average ETT of all D2D-U pairs are plotted to demonstrate the performance of the proposed scheme. In Fig. 11, the NN parameters are initialized randomly, so the new D2D-U pair takes longer to reach the optimal output and the system performance fluctuates more. In contrast, with the federated learning parameters saved at the MBS, the new D2D-U pair in Fig. 10 converges faster and the system performance is more stable.

[Plot: ETT of the new D2D-U pair with federated learning; curves: ETT of new D2D-U pair, average ETT of all D2D-U pairs.]

Fig. 10

The ETT of the new D2D-U pair with federated learning.

[Plot: ETT of the new D2D-U pair without federated learning; curves: ETT of new D2D-U pair, average ETT of all D2D-U pairs.]

Fig. 11

The ETT of the new D2D-U pair without federated learning.
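The federated part of Algorithm 1 (steps 12 to 15), which produces the initialization advantage seen in Fig. 10, can be sketched in a few lines. This is a minimal, hypothetical Python sketch, not the authors' implementation: it assumes each NN's parameters are flattened into a list of floats, that the exponent in (27) is $(\gamma/\epsilon)Q_{sum,i} - \gamma$, and that $\gamma = 10$ and $\epsilon = 1$ (illustrative values only). The function names `fed_average`, `adaptive_beta`, `blend`, `is_federated_slot` and `federated_round` are assumptions introduced for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical representation: each link's NN parameters phi_i are
# flattened into a single list of floats.
Params = List[float]


def fed_average(all_params: List[Params]) -> Params:
    """Eq. (25): the MBS averages the N parameter vectors element-wise."""
    n = len(all_params)
    return [sum(column) / n for column in zip(*all_params)]


def adaptive_beta(q_sum: float, gamma: float, eps: float) -> float:
    """Eq. (27), assuming the exponent is (gamma / eps) * Q_sum - gamma.

    A small accumulated loss (converged NN) gives beta close to 0;
    beta equals 0.5 at Q_sum = eps and approaches 1 as Q_sum grows.
    """
    return 1.0 / (1.0 + math.exp(-(gamma / eps * q_sum - gamma)))


def blend(phi_i: Params, phi_hat: Params, beta: float) -> Params:
    """Eq. (26): phi_i <- beta * phi_hat + (1 - beta) * phi_i."""
    return [beta * g + (1.0 - beta) * p for g, p in zip(phi_hat, phi_i)]


def is_federated_slot(t: int, t_fl: int) -> bool:
    """Algorithm 1, step 12: update when the clock T is a multiple of T_fl."""
    return t % t_fl == 0


def federated_round(
    local_params: List[Params],
    q_sums: List[float],
    gamma: float = 10.0,  # illustrative value, not taken from the paper
    eps: float = 1.0,     # illustrative value, not taken from the paper
) -> Tuple[List[Params], Params, List[float]]:
    """Algorithm 1, steps 13 to 15, applied to every link at once for illustration."""
    phi_hat = fed_average(local_params)              # step 13
    updated = [                                      # step 14 for each d_i
        blend(phi, phi_hat, adaptive_beta(q, gamma, eps))
        for phi, q in zip(local_params, q_sums)
    ]
    reset_q = [0.0] * len(q_sums)                    # step 15: reset Q_sum,i
    return updated, phi_hat, reset_q


if __name__ == "__main__":
    params = [[1.0, 2.0], [3.0, 4.0]]
    q_sums = [0.0, 5.0]  # link 0 has converged; link 1 still has a large loss
    if is_federated_slot(t=20, t_fl=10):
        params, phi_hat, q_sums = federated_round(params, q_sums)
        print(phi_hat)  # [2.0, 3.0]
        print(params)   # link 0 barely moves; link 1 is pulled to phi_hat
```

Under these assumptions, a link whose accumulated loss is zero keeps almost all of its own parameters, while a link with a large accumulated loss is pulled almost entirely to $\hat{\phi}$, which mirrors the behavior of $\beta$ described for Fig. 6. A newly joined pair corresponds to the extreme case of simply copying $\hat{\phi}$.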

[Plot: system throughput versus the number of time slots (throughput performance comparison); curves: distributed method, maximum throughput method.]

Fig. 12

The comparison of system throughput between the price-based method and the maximum throughput method.

[Plot, two panels: power allocation when D2D-U traffic load is high; power allocation when D2D-U traffic load is low. x-axis: unlicensed channels u0, u1, u2, u3; y-axis: power allocation of the D2D-U pair; curves: distributed method, maximum throughput method.]

Fig. 13

The comparison of power allocation between the price-based method and the maximum throughput method.

In Fig. 13, the WiFi traffic load is low on two of the unlicensed channels and high on the other two. The abscissa of Fig. 13 indicates the unlicensed channels, and the ordinate is $\eta_{i,j}$ of the corresponding D2D-U pair and unlicensed channel. It can be observed that, in the price-based model, the D2D-U link with heavier traffic load reuses the spectrum resources of the more favorable channels, while the D2D-U link with lighter traffic load chooses to reuse the more crowded channels. In the centralized method, by contrast, a change in D2D-U traffic load has no effect on the power allocation scheme. Therefore, the simulation results show that the proposed method guarantees fairness among D2D-U pairs with little loss in data rate compared with the centralized optimal solution.

In this paper, a distributed power and spectrum allocation mechanism with an adaptive price adjustment scheme is proposed so that D2D-U links can reuse unlicensed spectrum resources to improve the performance of the D2D-U system. An unsupervised online learning structure is employed on each D2D-U link to estimate the prices of all perceived unlicensed channels, and a federated learning based approach is used to improve system stability and performance. With these prices, the power and spectrum optimization problems are formulated and solved by the D2D-U pairs to access the unlicensed spectrum. Numerical simulations show that the proposed algorithm allows each D2D-U link to maximize its data rate while ensuring fairness for the WiFi system and among different D2D-U users.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61771429 and No. 61703368, in part by the Zhejiang University City College Scientific Research Foundation under Grant No. JZD18002, in part by the Zhejiang Provincial Key Laboratory of Information Processing, Communication and Networking, in part by the selective Grants for Postdoctoral Programs ZJ2020035 in Zhejiang Province, in part by ROIS NII Open Collaborative Research 2020-20S0502, and in part by JSPS KAKENHI grant numbers 18KK0279, 19H04093 and 20H00592.

References

1. Z. Zou, R. Yin, C. Wu, J. Yuan, and X. Chen, “Distributed spectrum and power allocation for D2D-U Networks,” EAI MONAMI, Nov. 2020.
2. K. Doppler, M. Rinne, C. Wijting, C. Ribeiro, and K. Hugl, “Device-to-device communication as an underlay to LTE-Advanced networks,” IEEE Commun. Mag., vol. 47, no. 12, pp. 42-49, Dec. 2009.
3. G. Yu, L. Xu, D. Feng, R. Yin, G. Li, and Y. Jiang, “Joint mode selection and resource allocation for device-to-device communications,” IEEE Trans. Commun., vol. 62, no. 11, pp. 3814-3824, Nov. 2014.
4. Y. Wu, W. Guo, H. Yuan, L. Li, S. Wang, X. Chu, and J. Zhang, “Device-to-device meets LTE-unlicensed,” IEEE Commun. Mag., vol. 54, no. 5, pp. 154-159, May 2016.
5. R. Liu, G. Yu, F. Qu, and Z. Zhang, “Device-to-device communications in unlicensed spectrum: mode selection and resource allocation,” IEEE Access, vol. 4, pp. 4720-4729, Aug. 2016.
6. S. Shah, M. Rahman, A. Mian, A. Imran, S. Mumtaz, and O. Dobre, “On the impact of mode selection on effective capacity of device-to-device communication,” IEEE Wireless Commun. Letters, vol. 8, no. 3, pp. 945-948, Jun. 2019.
7. G. Yu, L. Xu, D. Feng, R. Yin, and G. Li, “Joint mode selection and resource allocation for device-to-device communications,” IEEE Trans. Commun., vol. 62, no. 11, pp. 3814-3824, Nov. 2014.
8. R. Yin, C. Zhong, G. Yu, Z. Zhang, K. Wong, and X. Chen, “Joint Spectrum and Power Allocation for D2D Communications Underlaying Cellular Networks,” IEEE Trans. Veh. Technol., vol. 65, no. 4, pp. 2182-2195, Apr. 2016.
9. R. Yin, G. Yu, H. Zhang, Z. Zhang, and G. Li, “Pricing-Based Interference Coordination for D2D Communications in Cellular Networks,” IEEE Trans. Wireless Commun., vol. 14, no. 3, pp. 1519-1532, Mar. 2015.
10. W. Lee, M. Kim, and D. Cho, “Transmit Power Control Using Deep Neural Network for Underlay Device-to-Device Communication,” IEEE Commun. Letters, vol. 8, no. 1, pp. 141-144, Feb. 2019.
11. A. Moussaid, W. Jaafar, W. Ajib, and H. Elbiaze, “Deep Reinforcement Learning-based Data Transmission for D2D Communications,” IEEE Int. Conf. WiMob, 2018, pp. 1-7.
12. Qualcomm, “Extending LTE advanced to unlicensed spectrum,” White Paper, San Diego, CA, USA, Dec. 2013.
13. Huawei, “U-LTE: Unlicensed spectrum utilization of LTE,” White Paper, Shenzhen, China, 2014.
14. “Study on licensed-assisted access to unlicensed spectrum (Release 13),” 3GPP, Sophia Antipolis Cedex, France, TR 36.889, Jul. 2015.
15. X. Sun and L. Dai, “Towards fair and efficient spectrum sharing between LTE and WiFi in unlicensed bands: fairness-constrained throughput maximization,” IEEE Trans. Wireless Commun., early access, Jan. 2020.
16. Q. Cui, W. Ni, S. Li, B. Zhao, R. Liu, and P. Zhang, “Learning-assisted clustered access of 5G/B5G networks to unlicensed spectrum,” IEEE Wireless Commun., vol. 27, no. 1, pp. 31-37, Feb. 2020.
17. R. Yin, G. Yu, A. Maaref, and G. Li, “LBT-based adaptive channel access for LTE-U systems,” IEEE Trans. Wireless Commun., vol. 15, no. 10, pp. 6585-6597, Oct. 2016.
18. V. Maglogiannis, D. Naudts, A. Shahid, and I. Moerman, “A Q-Learning Scheme for Fair Coexistence Between LTE and Wi-Fi in Unlicensed Spectrum,” IEEE Access, vol. 6, pp. 27278-27293, May 2018.
19. P. Santana, V. Sousa, F. Abinader, and J. Neto, “DM-CSAT: a LTE-U/Wi-Fi coexistence solution based on reinforcement learning,” Telecommun. Syst., vol. 71, no. 4, pp. 615-626, Aug. 2019.
20. J. Neto, S. Neto, P. Santana, and V. Sousa, “Multi-cell LTE-U/Wi-Fi coexistence evaluation using a reinforcement learning framework,” Sensors, vol. 20, no. 7, pp. 1855-1877, Mar. 2020.
21. S. Liu, R. Yin, and G. Yu, “Hybrid Adaptive Channel Access for LTE-U Systems,” IEEE Trans. Veh. Technol., vol. 68, no. 10, pp. 9820-9832, Oct. 2019.
22. S. Andreev, O. Galinina, A. Pyataev, K. Johnsson, and Y. Koucheryavy, “Analyzing assisted offloading of cellular user sessions onto D2D links in unlicensed bands,” IEEE J. Sel. Areas Commun., vol. 33, no. 1, pp. 67-80, Jan. 2015.
23. H. Zhang, Y. Liao, and L. Song, “D2D-U: Device-to-Device Communications in Unlicensed Bands for 5G System,” IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3507-3519, Jun. 2017.
24. Z. Zou, R. Yin, X. Chen, and C. Wu, “Deep Reinforcement Learning for D2D transmission in unlicensed bands,” IEEE/CIC ICCC Workshops, 2019, pp. 42-47.
25. R. Yin, Z. Wu, S. Liu, C. Wu, J. Yuan, and X. Chen, “Decentralized Radio Resource Adaptation in D2D-U Networks,” IEEE Internet of Things Journal, doi: 10.1109/JIOT.2020.3016019.
26. J. Konecny, H. Brendan McMahan, F. Yu, P. Richtarik, A. Suresh, and D. Bacon, “Federated Learning: Strategies for Improving Communication Efficiency,” arXiv preprint arXiv:1610.05492, 2017.
27. S. Niknam, H. S. Dhillon, and J. H. Reed, “Federated learning for wireless communications: Motivation, opportunities and challenges,” arXiv preprint arXiv:1908.06847, 2019.
28. G. Bianchi and I. Tinnirello, “Kalman filter estimation of the number of competing terminals in an IEEE 802.11 network,” IEEE INFOCOM, vol. 2, pp. 844-852, Feb. 2003.
29. F. Qin, X. Dai, and J. Mitchell, “Effective-SNR estimation for wireless sensor network using Kalman filter,” Ad Hoc Networks, vol. 11, no. 3, pp. 944-958, May 2013.
30. R. Yin, Z. Zou, C. Wu, J. Yuan, X. Chen, and G. Yu, “Learning-based WiFi Traffic Load Estimation in NR-U Systems,” IEICE Trans., doi: 10.1587/transfun.2020EAP1063, Aug. 2020.
31. G. Bianchi, “Performance analysis of the IEEE 802.11 distributed coordination function,” IEEE J. Sel. Areas Commun., vol. 18, no. 3, pp. 535-547, Mar. 2000.
32. Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated Machine Learning: Concept and Applications,” ACM Trans. Intell. Syst. Technol., vol. 10, no. 2, Feb. 2019.
33. Z. Zhou, B. Wang, B. Gu, B. Ai, S. Mumtaz, and J. Rodriguez, “Time-Dependent Pricing for Bandwidth Slicing Under Information Asymmetry and Price Discrimination,” IEEE Trans. Commun., vol. 68, no. 11, Nov. 2020.
34. B. Gu, X. Yang, Z. Lin, W. Hu, M. Alazab, and R. Kharel, “Multi-Agent Actor-Critic Network-based Incentive Mechanism for Mobile Crowdsensing in Industrial Systems,”