CFLMEC: Cooperative Federated Learning for Mobile Edge Computing

Abstract

We investigate a cooperative federated learning framework among devices for mobile edge computing, named CFLMEC, in which devices coexist in a shared spectrum with interference. Considering the time-average network throughput of the cooperative federated learning framework and spectrum scarcity, we focus on maximizing the admission data to the edge server or to nearby devices, which fills a gap in communication resource allocation for devices in federated learning. In CFLMEC, devices can transmit local models to the corresponding devices or to the edge server in a relay race manner. We use a decomposition approach to solve the resource optimization problem, considering the maximum data rate on each sub-channel, channel reuse, and wireless resource allocation; the approach establishes a primal-dual learning framework and uses batch gradient descent to learn the dynamic network from outdated information and predict the sub-channel condition. To maximize the throughput of devices, we propose communication resource allocation algorithms with and without sufficient sub-channels for devices with strong reliance on edge servers (SRs) in cellular links, and an interference-aware communication resource allocation algorithm for devices with less reliance on edge servers (LRs) in D2D links. Extensive simulation results demonstrate that CFLMEC achieves the highest throughput of local devices compared with existing works, while limiting the number of sub-channels.



Xinghan Wang+, Xiaoxiong Zhong+, ||, *, Yuanyuan Yang


Index Terms – federated learning; mobile edge computing

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61802221, 61802220), the Natural Science Foundation of Guangxi Province under grant 2017GXNSFAA198192, the Key Research and Development Program for Guangdong Province 2019B010136001, and the Peng Cheng Laboratory Project of Guangdong Province PCL2018KP005 and PCL2018KP004.

I. INTRODUCTION

With the rapid development of information and communications technology (ICT), more and more mobile devices are connected. These devices need more bandwidth and strain their own computing capacity and battery; if they rely on cloud computing, resource consumption and latency are high. Mobile Edge Computing (MEC) is a promising technology that extends computing and storage to the network edge, providing timely and reliable services and efficient bandwidth utilization [1]. At the same time, mobile devices generate huge amounts of privacy-sensitive data at the network edge, yet in this scenario devices would have to share their own data with the connected server. Federated Learning (FL) [2] is a promising solution to this problem: it allows devices to collaboratively build a consensus learning model while keeping all training data on the devices. Each device sends its learning model (with its gradient) to the server, which aggregates the updates and feeds the result back. However, when mobile devices follow an uncooperative training strategy, it is hard to improve communication efficiency while the model is updated during aggregation. Hence, a key challenge in FL is how devices can cooperate to build a high-quality global model while accounting for communication resource allocation. FL with resource allocation in MEC is a promising scheme for resource management in intelligent edge computing, improving resource utilization while preserving data privacy. Cooperative federated learning with adaptive resource optimization for MEC raises several challenges. How can we design an efficient resource optimization framework for cooperative FL, and how can we guarantee the optimality of the resource management scheme and of the learning performance under cooperative FL? To answer these questions, we propose a cooperative federated learning framework for the MEC system, named CFLMEC, which mainly considers the maximum data rate on each sub-channel, channel reuse, and wireless resource allocation. In CFLMEC, devices can transmit local models to the corresponding devices or to the edge server in a relay race manner. The contributions of this article are as follows:

1) To make efficient use of resources, we propose a cooperative federated learning framework for MEC whose goal is to maximize the admission data to the edge server or to nearby devices. In CFLMEC, we solve the problem with a decomposition approach that considers the maximum data rate on each sub-channel, channel reuse, and wireless resource allocation; it establishes a primal-dual learning framework and uses batch gradient descent to learn the dynamic network from outdated information and predict the sub-channel condition.

2) In CFLMEC, devices can transmit local models to the corresponding devices or to the edge server in a relay race manner, with the aim of maximizing the throughput of devices. To achieve this goal, we propose communication resource allocation algorithms with and without sufficient sub-channels for devices with strong reliance on edge servers (SRs) in cellular links, and an interference-aware communication resource allocation algorithm for devices with less reliance on edge servers (LRs) in D2D links.

3) We present a new proactive scheduling policy that allows the edge server to select SRs and assign sub-channels based on the sub-channel condition (Algorithm 4), outdated information from SRs (Algorithm 2), and instantaneous information from SRs (Algorithm 1). For efficient sub-channel utilization, we assume sub-channel reuse such that a sub-channel can be shared by at most two devices simultaneously. We therefore need to find an (LR, SR) pair for SRs on the same sub-channels and select a transmission power for the LRs (Algorithm 3).

4) We conduct extensive experiments to evaluate the performance of CFLMEC. The numerical results show that the proposed method achieves a higher throughput.

The remainder of this paper is organized as follows. Section II reviews related work. Section III presents the detailed description of CFLMEC. Section IV evaluates the performance of CFLMEC, and Section V concludes the paper.

II. RELATED WORK

As a promising machine learning technique, federated learning for wireless network performance optimization has recently attracted increasing attention due to its good trade-off between data privacy risks and communication costs. Most existing works on FL in wireless networks focus on resource allocation and scheduling. Dinh et al. [3] proposed the FEDL framework, which can handle heterogeneous mobile device data under the sole assumption of strongly convex and smooth loss functions. FEDL uses different update methods for the local and global models, based on the corresponding computation rounds, and the authors apply FEDL to resource allocation optimization in wireless networks with heterogeneous computing and power resources.

Ren et al. [4] focused on federated edge learning with gradient averaging over the devices selected in each communication round, using a novel scheduling policy that considers two types of diversity: channel quality and learning updates. Yang et al. [5] studied three scheduling policies for FL in wireless networks (random scheduling, round robin, and proportional fair) and developed a general model that accounts for scheduling schemes. Chen et al. [6] studied the joint optimization of device scheduling, learning, and resource allocation, minimizing the FL loss function under transmission delay constraints. Ding et al. [7] designed a multi-dimensional contract-theoretic incentive mechanism that is optimal for the server and considers training cost and communication delay; they also analyzed the impact of the level of information asymmetry on the server's optimal strategy and minimum cost. Xia et al. [8] formulated client scheduling as an optimization problem that minimizes the total training time, including transmission time and local computation time, in both ideal and non-ideal scenarios; they then used a multi-armed bandit scheme to learn to schedule clients online during FL training without knowing the wireless channel state information or the dynamics of the clients' computing resource usage. To accelerate FL training, Ren et al. [9] formulated training acceleration as the maximization of system learning efficiency in the CPU or GPU scenario, jointly considering batch size selection and communication resource allocation. Pandey et al. [10] proposed a novel incentive-based crowd-sourcing framework for FL, using a two-stage Stackelberg game to model the interaction between the participating clients and the MEC server and maximize their utilities.
Considering probabilistic queuing delays, Samarakoon et al. [11] studied federated-learning-based joint power and resource allocation in vehicular networks, minimizing the power consumption of vehicular users and estimating the queue length distribution, using Lyapunov optimization over wireless links with communication delays. Shi et al. [12] formulated joint bandwidth allocation and device scheduling as a convergence-rate maximization problem that captures the long-term convergence performance of FL. Several proposals have also been presented to optimize the FL mechanism itself in wireless networks. To optimize the expected convergence speed, Nguyen et al. [13] proposed a fast-convergent federated learning algorithm that handles the heterogeneity of device computation and communication by adapting the aggregation to each device's contribution to the update. Mills et al. [14] adapted FedAvg to use a distributed form of Adam optimization together with novel compression techniques, greatly reducing the number of rounds to convergence. Guo et al. [15] proposed a novel analog gradient aggregation scheme for wireless networks that improves gradient aggregation quality and accelerates convergence. Wang et al. [16] studied learning model parameters in the FL framework, analyzed the convergence bound of distributed gradient descent from a theoretical perspective, and proposed a control algorithm that minimizes the loss function under a resource budget constraint.

To the best of our knowledge, there are few works on decentralized FL in wireless networks. Luo et al. [17] presented a hierarchical federated edge learning (HFEL) framework in which model aggregation is partially migrated from the cloud to edge servers. In HFEL, they formulated the resource optimization problem as a global cost minimization and decomposed it into two subproblems: resource allocation and edge association. Savazzi et al. [18] proposed a device-cooperation FL framework based on the iterative exchange of both model updates and gradients, which improves convergence and minimizes the number of communication rounds in D2D networks. However, these works do not transmit local models in a fully cooperative manner: devices either transmit local models only to an edge server, or only to another device without considering channel allocation. All of the above works on federated learning focus on designing learning algorithms to improve training performance or on maximizing network performance; cooperative federated learning among devices is still under-explored, which can lead to poor system performance for FL-based MEC systems. Hence, designing an efficient cooperative federated learning framework with resource allocation for MEC, in which a device can transmit its local model not only to an edge server but also to nearby devices in a relay race manner, is a challenging issue. This paper proposes a solution to this problem.

III. MODEL FOR CFLMEC

In this section, we describe the architecture model, the mathematical model, and the communication model of the proposed cooperative federated learning framework, CFLMEC.

1. Cooperative federated learning architecture model

In this paper, we consider a cooperative federated learning system with an edge server and multiple local devices. The set of local devices is denoted as $\mathcal{M} = \{1, 2, 3, \ldots, M\}$.

Fig. 1. Cooperative federated learning architecture.

In the proposed architecture, local devices are divided into two types: local devices with less reliance on the edge server (LRs) and local devices with strong reliance on the edge server (SRs). The set of LRs, $\mathcal{K} = \{1, 2, 3, \ldots, K\}$, consists of all local devices that cannot connect directly to the edge server due to limited harvested energy and high transmission delay. The set of SRs, $\mathcal{H} = \{1, 2, 3, \ldots, H\}$, consists of all local devices that can connect to the edge server. Thus, cooperative federated learning requires LRs to send their local models to nearby SRs; each SR then both aggregates the local models received from its LRs and trains its own local model. Finally, the BS (edge server) aggregates the models received from the SRs and transmits the result to the associated devices. For example, in Fig. 1, devices 5 and 6 send their local models to device 3, so device 3 acts as an SR and devices 5 and 6 act as LRs; device 3 trains its local model using gradient descent and aggregates the local models from devices 5 and 6, while the edge server aggregates the model from device 3. Due to limited harvested energy and high transmission delay, an LR can transmit its local model to only one SR. To represent the device association, we introduce a binary indicator variable $x_{kh} \in \{0, 1\}$, where $k \in \mathcal{K}$ and $h \in \mathcal{H}$, with $x_{kh} = 1$ if LR $k$ sends its local model to SR $h$, and define the association profile as $\mathbf{x} = \{x_{kh} \mid k \in \mathcal{K}, h \in \mathcal{H}\}$.
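As a small illustration of the association variable $x_{kh}$, the sketch below groups LRs under the SR they report to and rejects an LR attached to zero or several SRs, reusing the Fig. 1 example. The function name `sr_groups`, the nested-dict encoding of $x_{kh}$, and the second SR (device 4) are hypothetical; this is a sketch, not the authors' implementation.

```python
from typing import Dict, List


def sr_groups(x: Dict[int, Dict[int, int]], srs: List[int]) -> Dict[int, List[int]]:
    """Group LRs by their SR; x[k][h] plays the role of the indicator x_kh (hypothetical encoding)."""
    groups: Dict[int, List[int]] = {h: [] for h in srs}
    for k, row in x.items():
        chosen = [h for h, v in row.items() if v == 1]
        # Each LR may forward its local model to exactly one SR.
        if len(chosen) != 1:
            raise ValueError(f"LR {k} must be associated with exactly one SR, got {chosen}")
        groups[chosen[0]].append(k)
    return groups


# Fig. 1 example: LRs 5 and 6 send their local models to SR 3.
# SR 4 is a hypothetical second SR with no LRs (it uploads its own model directly).
x = {5: {3: 1, 4: 0}, 6: {3: 1, 4: 0}}
print(sr_groups(x, srs=[3, 4]))  # {3: [5, 6], 4: []}
```

SRs with an empty group correspond to the cellular-only case in which an SR transmits its own local model straight to the edge server.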

2. Mathematical model

In this subsection, we introduce the learning process. As shown in Fig. 2, each LR is assigned to an SR, and the edge server collaboratively learns the global model with the help of the SRs.

Fig. 2. The cooperative federated learning weight update.

Each local device $m$ collects an input matrix $\mathbf{X}_m = \{x_{m1}, x_{m2}, \ldots, x_{mL_m}\}$, where $L_m$ is the number of samples collected by device $m$. The output data vector of local device $m$ for cooperative federated learning is $\mathbf{Y}_m = \{y_{m1}, y_{m2}, \ldots, y_{mL_m}\}$. Let $w_m$ denote the parameters of the model trained on $\mathbf{X}_m$ and $\mathbf{Y}_m$, and let $D_m$ denote the dataset of device $m$, with size $|D_m| = L_m$. Given an association, each SR collects models from its nearby LRs, and the edge server receives models only from the SRs. The aggregated dataset of SR $h$ is $D_h^{aggregate} = D_h \cup \bigcup_{k \in \mathcal{K}:\, x_{kh} = 1} D_k$, and the aggregated dataset of the edge server is $D_{edge\,server}^{aggregate} = \bigcup_{h \in \mathcal{H}} D_h^{aggregate}$. The objective of the overall cooperative learning process is to converge to $w^*$, which solves

$$\min_{w} F(w) = \frac{1}{N}\sum_{m=1}^{M}\sum_{n=1}^{L_m} f(w, x_{mn}, y_{mn}), \tag{1}$$

where $N = \sum_{m=1}^{M} L_m$ is the total number of samples held by the devices. For the $k$-th local device, the local parameters at time slot $t$ are updated as

$$w_k(t) = w_k(t-1) - \eta \nabla F_k(w_k(t-1)), \tag{2}$$

where $\eta$ is the learning rate and $F_k$ is the local loss of device $k$. In cooperative federated learning, the weights are synchronized across the LRs belonging to the same SR. Hence, at time slot $t$, the aggregated parameters of SR $h$ are

$$w_h^{aggregate}(t) = \sum_{k=1}^{K} \frac{x_{kh}|D_k|}{|D_h^{aggregate}|} w_k(t) + \frac{|D_h|}{|D_h^{aggregate}|} w_h(t). \tag{3}$$

Similarly, at time slot $t$, the edge server averages the weights across all SRs:

$$w_{edge\,server}(t) = \sum_{h=1}^{H} \frac{|D_h^{aggregate}|}{|D_{edge\,server}^{aggregate}|} w_h^{aggregate}(t). \tag{4}$$
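The two-level averaging in (3) and (4) can be sketched in a few lines of NumPy. The sketch assumes each model is a flat parameter vector and that the device datasets are disjoint, so that $|D_h^{aggregate}|$ is the sum of the member dataset sizes; the helper names `sr_aggregate` and `edge_aggregate` and the toy numbers are hypothetical. It also shows that the two-level result coincides with a flat, data-size-weighted average over all devices.

```python
import numpy as np


def sr_aggregate(w_sr, n_sr, w_lrs, n_lrs):
    """Eq. (3): weight the SR's own model and its LRs' models by dataset size."""
    total = n_sr + sum(n_lrs)  # |D_h^aggregate| for disjoint datasets
    agg = (n_sr / total) * w_sr
    for w_k, n_k in zip(w_lrs, n_lrs):
        agg = agg + (n_k / total) * w_k
    return agg, total


def edge_aggregate(sr_models, sr_sizes):
    """Eq. (4): weight each SR aggregate by its aggregated dataset size."""
    total = sum(sr_sizes)
    return sum((n / total) * w for w, n in zip(sr_models, sr_sizes))


# SR A holds 100 samples and serves two LRs with 50 samples each; SR B holds 200 samples and no LRs.
w_a, n_a = sr_aggregate(np.array([1.0, 0.0]), 100,
                        [np.array([0.0, 1.0]), np.array([1.0, 1.0])], [50, 50])
w_b, n_b = sr_aggregate(np.array([0.0, 0.0]), 200, [], [])
w_edge = edge_aggregate([w_a, w_b], [n_a, n_b])

# Flat FedAvg-style average over all four devices, for comparison.
flat = (100 * np.array([1.0, 0.0]) + 50 * np.array([0.0, 1.0])
        + 50 * np.array([1.0, 1.0]) + 200 * np.array([0.0, 0.0])) / 400
print(w_edge.tolist(), flat.tolist())  # [0.375, 0.25] [0.375, 0.25]
```

Because the SR-level weights are renormalized by $|D_h^{aggregate}|$ and then multiplied back by $|D_h^{aggregate}| / |D_{edge\,server}^{aggregate}|$, every sample ends up with the same weight as in a single-level average.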

3. Communication model

As shown in Fig. 3, we introduce the links and the local device association in our network. Each device transmits its trained local model to its connected device or to the edge server via a shared wireless interface with $N$ sub-channels. The links in the network are as follows:

Cellular link: an SR transmits its aggregated model to the edge server, or transmits its local model directly to the edge server if no LR belongs to it.

D2D link: an LR can establish a direct D2D link with the nearest SR within the maximum distance.

Fig. 3. An illustration of the cooperative federated learning with resource allocation.
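To make the reuse model concrete, the sketch below evaluates the cellular and D2D SINR expressions (5) and (6) and the per-sub-channel rate (7) given below, in normalized units. The function names, the list-of-(power, gain) encoding of the interfering links, the assumption $x_{kh} = 1$ for the D2D link, and all numbers are hypothetical.

```python
import math
from typing import List, Tuple


def cellular_sinr(p_h: float, g_h_edge: float, noise: float,
                  d2d_interferers: List[Tuple[float, float]]) -> float:
    """Eq. (5) sketch: SR -> edge server, interfered by D2D links reusing the sub-channel.

    d2d_interferers holds (LR transmit power, LR-to-edge channel gain) pairs.
    """
    return p_h * g_h_edge / (noise + sum(p * g for p, g in d2d_interferers))


def d2d_sinr(p_k: float, g_kh: float, noise: float,
             cellular_interferers: List[Tuple[float, float]]) -> float:
    """Eq. (6) sketch: LR -> SR, interfered by cellular uplinks on the same sub-channel."""
    return p_k * g_kh / (noise + sum(p * g for p, g in cellular_interferers))


def rate(bandwidth: float, sinr: float) -> float:
    """Eq. (7): Shannon rate of one sub-channel."""
    return bandwidth * math.log2(1.0 + sinr)


# One sub-channel shared by one cellular link and one D2D link (normalized values).
s_cell = cellular_sinr(6.0, 1.0, 1.0, [(1.0, 1.0)])  # 6 / (1 + 1) = 3
s_d2d = d2d_sinr(2.0, 1.0, 1.0, [(2.0, 0.5)])         # 2 / (1 + 1) = 1
print(rate(1e6, s_cell), rate(1e6, s_d2d))             # 2000000.0 1000000.0
```

Summing `rate` over the sub-channels held by a device gives its total rate as in (8).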

For efficient use of sub-channels, we assume channel reuse such that a sub-channel can be shared by at most two devices simultaneously. Reuse of a sub-channel is therefore allowed only between a cellular link and a D2D link, not among D2D links. Let $\mathcal{N} = \{1, 2, \ldots, N\}$ denote the $N$ sub-channels, i.e., the available bandwidth $B$ is divided into $N$ orthogonal sub-channels. In our case, a D2D link reuses the sub-channel of a cellular link, so interference must be considered. The SINR of the cellular link of SR $h$ on sub-channel $n$ is

$$\gamma_h^n = \frac{p_{h,edge}^n h_{h,edge}^n}{N_0 + \sum_{h'=1}^{H}\sum_{k=1}^{K} x_{kh'}\, p_{kh'}^n h_{k,edge}^n}, \tag{5}$$

where $p_{h,edge}^n$ is the transmission power from SR $h$ to the edge server and $h_{h,edge}^n$ is the channel gain between SR $h$ and the edge server on sub-channel $n$. $N_0$ denotes the noise power, $p_{kh'}^n$ is the transmission power of LR $k$ on its D2D link, and $h_{k,edge}^n$ is the channel gain of the interference link between LR $k$ and the edge server. Similarly, the SINR of the D2D link from LR $k$ to SR $h$ when it reuses sub-channel $n$ is

$$\gamma_k^n = \frac{x_{kh}\, p_{kh}^n h_{kh}^n}{N_0 + \sum_{h'=1}^{H} p_{h',edge}^n h_{h',h}^n}, \tag{6}$$

where $h_{kh}^n$ is the channel gain of the D2D link and $h_{h',h}^n$ is the channel gain of the interference link from SR $h'$ to SR $h$. The data rate of device $m$ on sub-channel $n$ is

$$r_m^n = B_0 \log_2\left(1 + \gamma_m^n\right), \tag{7}$$

where $B_0$ is the bandwidth of each sub-channel. The total data rate of device $m$ is

$$R_m = \sum_{n=1}^{N} r_m^n. \tag{8}$$

We next describe the state of the local devices. Let $S(t) = \{S_m(t)\}$ collect the states of all local devices at time slot $t$, where $S_m(t) = \{A_m(t), R_m(t)\}$ is the state of device $m$: $A_m(t)$ is the data size collected by device $m$ at time slot $t$, corresponding to the data size of parameter $w_m(t)$, and $R_m(t)$ is the instantaneous capacity of the sub-channels at device $m$. Requiring all local devices to report their states to the BS at every time slot may be impractical, so we distinguish two cases. In the first case, there are enough sub-channels for the local devices, so each local device can select the best-quality sub-channels. We first assign one sub-channel to each local device based on the maximum data rate; since each device initially holds only one sub-channel, it is allocated its maximum power. We then assign the remaining sub-channels to local devices, so a local device can hold more than one sub-channel. In the second case, the local devices form a large-scale network and there are not enough sub-channels, so sub-channels cannot be assigned to the local devices immediately. For example, a local device generates data at time slot $\tilde{T}_m(t)$ but cannot send it to the edge server or its nearby devices immediately, since the number of sub-channels can be dramatically smaller than the number of devices. In this scenario, the edge server can only schedule devices based on their outdated states. Let $T_m(t)$ be the time at which a sub-channel is assigned to device $m$, so each device keeps the data generated from $\tilde{T}_m(t)$ to $T_m(t)$ in a data queue. At time slot $t$, device $m$ can be admitted to transmit its queued data to the edge server or to a nearby SR. The admission data $a_m(t)$ satisfies

$$\overline{a_m(t) - R_m(t)} \le 0, \qquad 0 \le a_m(t) \le A_m(t), \tag{9}$$

where

$$\overline{x(t)} = \lim_{T \to \infty} \frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}[x(t)]$$

denotes the time average of a process $x(t)$. The first constraint states that the time-average admission data at a device does not exceed its data rate. The second constraint states that the admission data at a device is no more than the data that arrived at the device during time slot $t$. Let $\boldsymbol{\alpha}(t) = \{\alpha_m^n(t)\}$ denote the device scheduling decision at time slot $t$, where $\alpha_m^n(t) = 1$ if device $m$ selects sub-channel $n$ at time slot $t$, and $\alpha_m^n(t) = 0$ otherwise.

IV. PROBLEM FORMULATION

In this work, we aim to maximize the admission data of the devices that transmit their models to the edge server or to the SRs, i.e., to maximize the time-average network throughput of the cooperative federated learning framework. Our approach combines QoS-aware communication resource allocation when sub-channels are sufficient, batch gradient descent and primal-dual predictive learning when they are not, and an online learning method for optimal scheduling. Based on the system model in Section III, we formulate the communication resource allocation problem as

$$
\begin{aligned}
\mathrm{P1}:\ \max_{Q(t)}\ & \sum_{m=1}^{M} \overline{a_m(t)}\\
\text{s.t.}\ & \overline{a_m(t) - R_m(t)} \le 0, \qquad \text{(10-1)}\\
& 0 \le a_m(t) \le A_m(t), \qquad \text{(10-2)}\\
& \sum_{k \in \mathcal{K}} \alpha_k^n(t) + \sum_{h \in \mathcal{H}} \alpha_h^n(t) \le 2, \qquad \text{(10-3)}\\
& \sum_{h \in \mathcal{H}} x_{kh} = 1, \qquad \text{(10-4)}\\
& \sum_{h=1}^{H} \alpha_h^n(t) \le 1, \qquad \text{(10-5)}\\
& \sum_{k=1}^{K} \alpha_k^n(t) \le 1, \qquad \text{(10-6)}\\
& \sum_{n=1}^{N} \alpha_k^n(t) \le 1, \qquad \text{(10-7)}\\
& \sum_{n=1}^{N} p_m^n(t) \le P_m^{\max}, \qquad \text{(10-8)}
\end{aligned}
\tag{10}
$$

where $Q(t) = \{a_m(t), \boldsymbol{\alpha}_m(t)\}$ denotes the data admission and scheduling decisions of all devices across all sub-channels at time slot $t$. The objective maximizes the data admission of the network. Constraint (10-1) ensures that the admission data does not exceed the maximum data rate on the sub-channels. Constraint (10-2) keeps the admission data between zero and the data collected by the device in the time slot. In (10-3), a sub-channel can be shared by at most two links, and only a cellular link and a D2D link may reuse sub-channel $n$. Constraint (10-4) ensures that each LR connects to exactly one SR. Constraints (10-5) and (10-6) describe the sub-channel condition: (10-5) allows at most one cellular link per sub-channel, so two cellular links cannot share a sub-channel, and (10-6) allows at most one D2D link per sub-channel, so two D2D links cannot share a sub-channel. Constraint (10-7) limits each LR to at most one sub-channel, whereas an SR can occupy more than one sub-channel. Constraint (10-8) ensures that the transmission power of a local device does not exceed its maximum transmission power. We consider two cases: communication resource allocation with sufficient sub-channels, and communication resource allocation without sufficient sub-channels in large-scale federated learning.

Theorem 1: The objective function in P1 is strongly convex.

Proof.

Let

$$\mathrm{P1}: \max_{Q} \sum_{m=1}^{M}\overline{a_m(t)} = \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\sum_{m=1}^{M} a_m(t)$$

denote the time-average objective. The Hessian matrix of P1 is positive, and

$$\frac{\partial^2 \sum_{m=1}^{M}\overline{a_m(t)}}{\partial a_m(t)\,\partial a_{m'}(t')} = 0 \quad \text{if } m \ne m' \text{ or } t \ne t'.$$

Therefore, the convexity of the objective function is confirmed. This completes the proof.

Theorem 2: Given an edge server and a set of SRs and LRs, the aggregated weights $w_h^{aggregate}(t)$ are equal to the weights obtained by centralized gradient descent on the $h$-th aggregated dataset at time slot $t$:

$$w_h^{aggregate}(t) = w_h^{aggregate}(t-1) - \eta \nabla F_h^{aggregate}\big(w_h^{aggregate}(t-1)\big),$$

where $F_h^{aggregate}$ is the loss over the aggregated dataset $D_h^{aggregate}$.

Proof.

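From (3), and applying the local update (2) to each LR $k$ and to SR $h$, we have

$$
\begin{aligned}
w_h^{aggregate}(t) &= \sum_{k=1}^{K}\frac{x_{kh}|D_k|}{|D_h^{aggregate}|} w_k(t) + \frac{|D_h|}{|D_h^{aggregate}|} w_h(t)\\
&= \sum_{k=1}^{K}\frac{x_{kh}|D_k|}{|D_h^{aggregate}|}\big(w_k(t-1) - \eta\nabla F_k(w_k(t-1))\big) + \frac{|D_h|}{|D_h^{aggregate}|}\big(w_h(t-1) - \eta\nabla F_h(w_h(t-1))\big)\\
&= \sum_{k=1}^{K}\frac{x_{kh}|D_k|}{|D_h^{aggregate}|} w_k(t-1) + \frac{|D_h|}{|D_h^{aggregate}|} w_h(t-1)\\
&\quad - \eta\Big(\sum_{k=1}^{K}\frac{x_{kh}|D_k|}{|D_h^{aggregate}|}\nabla F_k\big(w_h^{aggregate}(t-1)\big) + \frac{|D_h|}{|D_h^{aggregate}|}\nabla F_h\big(w_h^{aggregate}(t-1)\big)\Big),
\end{aligned}
\tag{11}
$$

where the last step uses the fact that the weights of SR $h$ and its LRs are synchronized, i.e., $w_k(t-1) = w_h(t-1) = w_h^{aggregate}(t-1)$ for every $k$ with $x_{kh} = 1$. By (3) at time slot $t-1$,

$$\sum_{k=1}^{K}\frac{x_{kh}|D_k|}{|D_h^{aggregate}|} w_k(t-1) + \frac{|D_h|}{|D_h^{aggregate}|} w_h(t-1) = w_h^{aggregate}(t-1), \tag{12}$$

and, since the aggregated loss is the data-size-weighted average of the local losses,

$$\nabla F_h^{aggregate}\big(w_h^{aggregate}(t-1)\big) = \sum_{k=1}^{K}\frac{x_{kh}|D_k|}{|D_h^{aggregate}|}\nabla F_k\big(w_h^{aggregate}(t-1)\big) + \frac{|D_h|}{|D_h^{aggregate}|}\nabla F_h\big(w_h^{aggregate}(t-1)\big). \tag{13}$$

Substituting (12) and (13) into (11), we have $w_h^{aggregate}(t) = w_h^{aggregate}(t-1) - \eta\nabla F_h^{aggregate}\big(w_h^{aggregate}(t-1)\big)$. This completes the proof.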
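Theorem 2 can be checked numerically, assuming squared-error local losses so that each local loss is an average over that device's samples. The sketch below (hypothetical dataset sizes, step size, and random seed) takes one synchronized local step (2) on an SR and two LRs, averages the results with the weights of (3), and compares them with one centralized gradient-descent step on the union of the three datasets.

```python
import numpy as np

rng = np.random.default_rng(0)


def grad(w, X, y):
    """Gradient of the average squared loss 0.5 * mean((X @ w - y) ** 2)."""
    return X.T @ (X @ w - y) / len(y)


# Hypothetical data: SR h holds 40 samples, its two LRs hold 30 and 50 samples.
datasets = [(rng.normal(size=(n, 3)), rng.normal(size=n)) for n in (40, 30, 50)]
eta = 0.1
w_prev = rng.normal(size=3)  # synchronized weights w_h^aggregate(t-1)

# Every device takes one local step (2) from the synchronized weights ...
local = [w_prev - eta * grad(w_prev, X, y) for X, y in datasets]
sizes = np.array([len(y) for _, y in datasets])
# ... and the SR averages the results with weights |D_i| / |D_h^aggregate| as in (3).
w_agg = sum(s / sizes.sum() * w for s, w in zip(sizes, local))

# One centralized gradient-descent step on the union of the datasets.
X_all = np.vstack([X for X, _ in datasets])
y_all = np.concatenate([y for _, y in datasets])
w_central = w_prev - eta * grad(w_prev, X_all, y_all)

print(np.allclose(w_agg, w_central))  # True
```

The equality relies on the synchronization assumption: if the LRs started from different weights, the local gradients would be evaluated at different points and (11) would no longer collapse to a single centralized step.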

1. Communication resource allocation with sufficient sub-channels for SRs in cellular link.

We consider D2D-assisted cooperative federated learning in a cellular network. After assigning sub-channels to local devices based on the maximum data rate, we assign each remaining sub-channel to the device whose QoS is least satisfied, i.e., we select the best sub-channel for the weakest cellular link. We perform this communication resource allocation in Algorithm 1, which improves the data admission of SRs in cellular links. In Algorithm 1, we first assign a sub-channel to an SR based on the maximum data rate, such that

$$(h^*, n^*) = \arg\max_{h \in \mathcal{H}',\, n \in \mathcal{N}'} \frac{r_h^n(t)}{r_h^c},$$

where $\mathcal{H}'$ and $\mathcal{N}'$ are the sets of SRs and sub-channels that are still unassigned, and $r_h^c$ is the QoS rate requirement of the cellular link of SR $h$. Because sub-channels are sufficient, we can then supplement the SRs with low admission data until all sub-channels are used. Since each SR initially holds only one sub-channel, it is allowed its maximum power. We then assign the remaining sub-channels to SRs according to

$$h' = \arg\min_{h \in \mathcal{H}} \frac{a_h(t)}{a_h^c},$$

where $a_h^c$ is the QoS requirement on the admission data of SR $h$. Assigning more than one sub-channel to the weakest cellular link increases its admission data.

Algorithm 1: QoS-aware communication resource allocation for SRs in cellular link with sufficient sub-channels

Input: $\mathcal{H}$, $\mathcal{N}$, channel gains $h_{h,edge}^n$, QoS requirements $r_h^c$, $a_h^c$
Output: $\mathbf{a}$, $\mathbf{P}$, $\boldsymbol{\alpha}$
1: Initialize $\mathcal{H}' = \mathcal{H}$, $\mathcal{N}' = \mathcal{N}$, $\boldsymbol{\alpha} = \mathbf{0}$, $num_h^{sub\_channel} = 0$
2: while $\mathcal{H}' \neq \emptyset$ do
3:   Find $(h^*, n^*) = \arg\max_{h \in \mathcal{H}',\, n \in \mathcal{N}'} r_h^n(t) / r_h^c$
4:   Set $\alpha_{h^*}^{n^*}(t) = 1$ and update $num_{h^*}^{sub\_channel} = num_{h^*}^{sub\_channel} + 1$
5:   Set $p_{h^*,edge}^{n^*}(t) = p_{h^*,edge}^{\max}$ and $a_{h^*}(t) = r_{h^*}^{n^*}(t)$
6:   Update $\mathcal{H}' = \mathcal{H}' \setminus \{h^*\}$
7:   Update $\mathcal{N}' = \mathcal{N}' \setminus \{n^*\}$
8: end while
9: while $\mathcal{N}' \neq \emptyset$ do
10:  Find $h' = \arg\min_{h \in \mathcal{H}} a_h(t) / a_h^c$
11:  Find $n^* = \arg\max_{n \in \mathcal{N}'} r_{h'}^n(t)$
12:  Set $\alpha_{h'}^{n^*}(t) = 1$ and update $num_{h'}^{sub\_channel} = num_{h'}^{sub\_channel} + 1$
13:  Update the power allocation $p_{h',edge}^{n}(t) = p_{h',edge}^{\max} / num_{h'}^{sub\_channel}$ for every sub-channel $n$ assigned to $h'$
14:  Update $a_{h'}(t) = a_{h'}(t) + r_{h'}^{n^*}(t)$
15:  Update $\mathcal{N}' = \mathcal{N}' \setminus \{n^*\}$
16: end while

We now discuss the complexity of Algorithm 1. First, there are $H$ iterations that assign an initial sub-channel to each SR, and each search for an optimal pair costs $O(HN)$, so the complexity of the initial sub-channel assignment is $O(H^2N)$. The complexity of allocating the remaining sub-channels is $O(H(N-H))$. Hence, the overall complexity of Algorithm 1 is $O(H^2N) + O(H(N-H))$.
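A compact sketch of the two phases of Algorithm 1 follows. It assumes the achievable rates $r_h^n(t)$ are precomputed at maximum power and, like the algorithm's admission update, does not recompute them after the power is split; the QoS normalization uses per-SR requirements `r_c` and `a_c`. The function name, argument layout, and toy inputs are hypothetical.

```python
from typing import Dict, List, Tuple


def algorithm1_sketch(r: List[List[float]], r_c: List[float], a_c: List[float],
                      p_max: float) -> Tuple[List[float], Dict[int, List[int]], List[float]]:
    """Greedy sketch of Algorithm 1; r[h][n] is SR h's (precomputed) rate on sub-channel n."""
    H, N = len(r), len(r[0])
    if N < H:
        raise ValueError("this case assumes sufficient sub-channels (N >= H)")
    free_srs, free_chs = set(range(H)), set(range(N))
    assigned: Dict[int, List[int]] = {h: [] for h in range(H)}
    a = [0.0] * H
    # Phase 1: one sub-channel per SR, chosen by the QoS-normalized rate.
    while free_srs:
        h, n = max(((h, n) for h in free_srs for n in free_chs),
                   key=lambda hn: r[hn[0]][hn[1]] / r_c[hn[0]])
        assigned[h].append(n)
        a[h] = r[h][n]
        free_srs.remove(h)
        free_chs.remove(n)
    # Phase 2: each leftover sub-channel goes to the SR whose QoS is least satisfied.
    while free_chs:
        h = min(range(H), key=lambda i: a[i] / a_c[i])
        n = max(free_chs, key=lambda j: r[h][j])
        assigned[h].append(n)
        a[h] += r[h][n]
        free_chs.remove(n)
    # Each SR splits its maximum power equally over the sub-channels it holds.
    power = [p_max / len(assigned[h]) for h in range(H)]
    return a, assigned, power


a, assigned, power = algorithm1_sketch([[4.0, 1.0, 2.0], [3.0, 5.0, 1.0]],
                                       r_c=[1.0, 1.0], a_c=[1.0, 1.0], p_max=0.2)
print(a, assigned, power)  # [6.0, 5.0] {0: [0, 2], 1: [1]} [0.1, 0.2]
```

In the example, SR 1 first takes its best sub-channel (rate 5), SR 0 takes sub-channel 0 (rate 4), and the leftover sub-channel 2 goes to SR 0 because its admission data is the lower of the two.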

2. Communication resource allocation without sufficient sub-channels for SRs in cellular link.

In this case, there are not enough sub-channels for the SRs, which makes the problem challenging in two respects. First, the edge server collects the information of SR $h$ with a delay of $T_h(t) - \tilde{T}_h(t) + 1$ time slots, where $\tilde{T}_h(t)$ is the time at which the device begins to generate data and $T_h(t)$ is the time at which a sub-channel is assigned to the device. Second, the edge server must make scheduling decisions without knowing the current device states. To address this challenge, we develop batch gradient descent and primal-dual predictive learning for the case without sufficient sub-channels. The objective can be transformed into

$$\mathrm{P2}: \min_{Q(t)} -\sum_{h=1}^{H} a_h(t).$$

The Lagrangian of P2 is

$$L\big(Q(t), S(t), \boldsymbol{\lambda}(t)\big) = -\sum_{h=1}^{H} a_h(t) + \sum_{h=1}^{H} \lambda_h(t)\Big[a_h(t) - \sum_{n=1}^{N}\alpha_h^n(t)\, r_h^n(t)\Big], \tag{14}$$

where $\lambda_h(t)$ is the multiplier associated with the rate constraint of SR $h$. Given the convex objective and constraints, P2 can be handled by primal-dual predictive learning through its dual problem

$$\max_{\boldsymbol{\lambda}(t)} G\big(\boldsymbol{\lambda}(t)\big), \qquad G\big(\boldsymbol{\lambda}(t)\big) = \min_{Q(t)} L\big(Q(t), S(t), \boldsymbol{\lambda}(t)\big). \tag{15}$$

The duality gap can be reduced by finding the multipliers that maximize the dual function. Next, we need to find the optimal primal variables $Q^*(t) = \arg\min_{Q(t)} L\big(Q(t), S(t), \boldsymbol{\lambda}(t)\big)$ and the dual multipliers $\boldsymbol{\lambda}^*(t) = \arg\max_{\boldsymbol{\lambda}(t)} \min_{Q(t)} L\big(Q(t), S(t), \boldsymbol{\lambda}(t)\big)$.

Batch gradient descent: The edge server updates the multiplier of each scheduled device according to

$$\lambda_h(t+1) = \lambda_h(t) + \epsilon \sum_{t' = \tilde{T}_h(t)}^{T_h(t)} \Big(a_h(t') - \sum_{n=1}^{N} \alpha_h^n\big(T_h(t)\big)\, r_h^n\big(T_h(t)\big)\Big), \tag{16}$$

where $\epsilon$ is the step size. This update can be regarded as batch gradient descent with the outdated information of the SRs. Using the multipliers, we obtain the optimal primal variables by

$$Q^*(t) = \arg\min_{Q(t)} L\big(Q(t), S(t), \boldsymbol{\lambda}(t)\big). \tag{17}$$

Algorithm 2: QoS-aware communication resource allocation for SRs in cellular link without sufficient sub-channels
Input: $\mathcal{H}$, $\mathcal{N}$, channel gains $h_{h,edge}^n$, QoS requirements
Output: $\mathbf{a}$, $\mathbf{P}$, $\boldsymbol{\alpha}$
1: Initialize $\mathcal{H}' = \mathcal{H}$, $\mathcal{N}' = \mathcal{N}$, $\boldsymbol{\lambda}(t) = \mathbf{0}$
2: At $\tilde{T}_h(t)$, calculate $a_h(t)$ using primal-dual predictive learning
3: At $T_h(t)$, the edge server receives the state from the device
4: Update the multipliers $\lambda_h(t+1)$ according to (16)
5: If $\alpha_h^n(t) = 1$ then
6:   Calculate $\gamma_h^n(t)$ at the edge server, set $p_{h,edge}^n(t) = p_{h,edge}^{\max}$, and update the device states $S(t)$
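The multiplier step (16) folds every slot that elapsed between data generation and sub-channel assignment into one dual update. A minimal sketch, assuming the per-slot admissions $a_h(t')$ of the outdated window are available as a list and the granted rate $\sum_n \alpha_h^n r_h^n$ at $T_h(t)$ is a single number; the function and argument names are hypothetical.

```python
from typing import Sequence


def batch_multiplier_update(lam: float, eps: float, a_window: Sequence[float],
                            granted_rate: float) -> float:
    """Sketch of (16): one dual step that batches the outdated slots of SR h.

    a_window     -- a_h(t') for every slot t' in the outdated window
    granted_rate -- sum_n alpha_h^n(T_h(t)) * r_h^n(T_h(t)) when the report arrives
    """
    return lam + eps * sum(a - granted_rate for a in a_window)


# Admissions above the granted rate push the multiplier up (tighter rate constraint) ...
print(batch_multiplier_update(1.0, 0.5, [2.0, 1.0, 3.0], 1.0))  # 2.5
# ... while an over-served SR sees its multiplier fall.
print(batch_multiplier_update(1.0, 0.5, [0.0, 0.0], 1.0))       # 0.0
```

A larger multiplier makes admission more expensive in the Lagrangian (14), which is how outdated reports still throttle the SRs whose queues outpace their sub-channels.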

Then, we discuss the complexity of the

Algorithm 2 . Each SRs hold its multiplier ( ) h t  , and optimize its admission data ( ) h a t , the complexity of SRs is (1) O . The edge server calculate the ( +1) h t  , and update ( +1) h t  according to the batch gradient decent .The edge server assign the sub-channels to SRs and calculate ( ) nh t  . Hence, the complexity of edge server is ( ) O N H . Theorem 3 : Given the multipliers for the scheduled device ( ) h t  , the ( ) Q t can be given by:

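$$a_h(t) = \begin{cases} 0, & \text{if } a_h(t-1) + \epsilon\,(1 - \lambda_h(t)) \le 0,\\ A_h(t), & \text{if } a_h(t-1) + \epsilon\,(1 - \lambda_h(t)) \ge A_h(t),\\ a_h(t-1) + \epsilon\,(1 - \lambda_h(t)), & \text{otherwise,} \end{cases}$$

and, for every sub-channel $n \in \mathcal{N}$, $\beta_{nh}(t) = 1$ and $\beta_{nh'}(t) = 0$ for all $h' \neq h$, where

$$h = \arg\min_{h' \in \mathcal{H}} \Big( -\lambda_{h'}(t)\, r_{h'}^{n}(t) + \frac{1 - 2\beta_{nh'}(t-1)}{2\epsilon} \Big).$$

Proof: To obtain the optimal primal variables, we find the point that minimizes the linearized Lagrangian at $(Q(t-1), \lambda(t), S(t-1))$ plus a proximal distance term to $Q(t-1)$. Thus, the problem can be reformulated as:

$$Q(t) = \arg\min_{Q} \; \nabla_Q L(Q(t-1), \lambda(t), S(t-1))^{\top} (Q - Q(t-1)) + \frac{\|Q - Q(t-1)\|^2}{2\epsilon} \quad (18)$$

Since $Q = (a, \beta)$, (18) splits into:

$$a(t) = \arg\min_{a} \; \nabla_a L(Q(t-1), \lambda(t), S(t-1))^{\top} (a - a(t-1)) + \frac{\|a - a(t-1)\|^2}{2\epsilon},$$
$$\beta(t) = \arg\min_{\beta} \; \nabla_\beta L(Q(t-1), \lambda(t), S(t-1))^{\top} (\beta - \beta(t-1)) + \frac{\|\beta - \beta(t-1)\|^2}{2\epsilon} \quad (19)$$

Optimal admission data: The SRs solve the following problem for the admission data, where $L(\cdot)$ abbreviates $L(Q(t-1), \lambda(t), S(t-1))$ and $C_a$ does not depend on $a$:

$$\nabla_a L(\cdot)^{\top}(a - a(t-1)) + \frac{\|a - a(t-1)\|^2}{2\epsilon} = \sum_{h=1}^{H}\Big[(\lambda_h(t) - 1)(a_h - a_h(t-1)) + \frac{(a_h - a_h(t-1))^2}{2\epsilon}\Big] = \sum_{h=1}^{H}\Big[\frac{a_h^2}{2\epsilon} - \Big(1 - \lambda_h(t) + \frac{a_h(t-1)}{\epsilon}\Big) a_h\Big] + C_a \quad (20)$$

Similarly, for the sub-channel selection:

$$\nabla_\beta L(\cdot)^{\top}(\beta - \beta(t-1)) + \frac{\|\beta - \beta(t-1)\|^2}{2\epsilon} = \sum_{h=1}^{H}\sum_{n=1}^{N}\Big[-\lambda_h(t)\, r_h^{n}(t)\,(\beta_{nh} - \beta_{nh}(t-1)) + \frac{(\beta_{nh} - \beta_{nh}(t-1))^2}{2\epsilon}\Big] \quad (21)$$

The expressions in (20) and (21) involve only the variables $a_h(t)$ and $\beta_{nh}(t)$, respectively. Dropping the terms that do not depend on the decision variables, they simplify to:

$$\Phi(a) = \sum_{h=1}^{H}\Big[\frac{(a_h - a_h(t-1))^2}{2\epsilon} - (1 - \lambda_h(t))\, a_h\Big], \qquad \Psi(\beta) = \sum_{h=1}^{H}\sum_{n=1}^{N}\Big[\frac{(\beta_{nh} - \beta_{nh}(t-1))^2}{2\epsilon} - \lambda_h(t)\, r_h^{n}(t)\, \beta_{nh}\Big] \quad (22)$$

Minimizing $\Phi(a)$ gives the optimal admission data:

$$\min_{a} \; \Phi(a) \quad \text{s.t.} \quad 0 \le a_h(t) \le A_h(t) \quad (23)$$

$\Phi(a)$ is separable across SRs, and each term is a strictly convex quadratic whose unconstrained minimizer is $a_h(t-1) + \epsilon\,(1 - \lambda_h(t))$. Projecting this point onto $[0, A_h(t)]$ gives the optimal admission data:

$$a_h(t) = \begin{cases} 0, & \text{if } a_h(t-1) + \epsilon\,(1 - \lambda_h(t)) \le 0,\\ A_h(t), & \text{if } a_h(t-1) + \epsilon\,(1 - \lambda_h(t)) \ge A_h(t),\\ a_h(t-1) + \epsilon\,(1 - \lambda_h(t)), & \text{otherwise.} \end{cases} \quad (24)$$

Minimizing $\Psi(\beta)$ gives the sub-channel selection:

$$\min_{\beta} \; \Psi(\beta) \quad \text{s.t.} \quad \beta_{nh} \in \{0, 1\} \quad (25)$$

$\Psi(\beta)$ can be decoupled across sub-channels. Because $\beta$ is binary, $(\beta_{nh} - \beta_{nh}(t-1))^2 = \beta_{nh} + \beta_{nh}(t-1) - 2\beta_{nh}\beta_{nh}(t-1)$, so, up to a constant, assigning sub-channel $n$ to SR $h'$ costs $-\lambda_{h'}(t)\, r_{h'}^{n}(t) + (1 - 2\beta_{nh'}(t-1))/(2\epsilon)$. Selecting the SR with the minimum cost for each $n \in \mathcal{N}$ yields the rule stated in Theorem 3. This completes the proof.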
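Both closed-form primal updates of Theorem 3 are cheap to evaluate. The following is a minimal NumPy sketch, assuming the multipliers $\lambda_h(t)$, the predicted rates $r_h^n(t)$, and the previous selection $\beta(t-1)$ are given as arrays; the function names are hypothetical, and the sub-channel cost drops constant terms using the fact that $\beta$ is binary. It is an illustration of the update rules, not the authors' implementation. The single argmin per sub-channel row also makes the $O(NH)$ edge-server cost visible.

```python
import numpy as np


def admission_update(a_prev, lam, arrivals, eps):
    """Closed form (24): proximal step a_h(t-1) + eps*(1 - lambda_h(t)),
    projected onto [0, A_h(t)] for every SR at once."""
    unconstrained = a_prev + eps * (1.0 - lam)
    return np.clip(unconstrained, 0.0, arrivals)


def subchannel_selection(beta_prev, lam, rates, eps):
    """Per-sub-channel selection rule of Theorem 3 (hypothetical helper).

    beta_prev: (N, H) binary matrix beta_{nh}(t-1)
    lam:       (H,)   multipliers lambda_h(t)
    rates:     (N, H) predicted rates r_h^n(t)
    Returns an (N, H) binary matrix with exactly one SR per sub-channel.
    """
    # Cost of giving sub-channel n to SR h, constants dropped (binary beta).
    cost = -lam[None, :] * rates + (1.0 - 2.0 * beta_prev) / (2.0 * eps)
    winners = np.argmin(cost, axis=1)  # one argmin per row: O(N*H) in total
    beta = np.zeros_like(beta_prev)
    beta[np.arange(beta.shape[0]), winners] = 1
    return beta
```

For example, with $\epsilon = 2$, $\lambda = (2, 0.5, 0)$, and previous admissions $(0, 5, 39)$ under arrivals of 40, the first SR is throttled to 0, the second grows to 6, and the third is capped at its arrival of 40.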

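Theorem 4: The multipliers are bounded, i.e., $\lambda_h(t) \le \lambda_h^{max}$, where $\lambda_h^{max} = 1 + A_h^{max}/\epsilon + \epsilon A_h^{max}$ and $A_h^{max}$ upper-bounds the data arrivals $A_h(t)$. Proof: We argue by induction on $t$, starting from $\lambda_h(0) = 0 \le \lambda_h^{max}$. If $1 + a_h(t-1)/\epsilon - \lambda_h(t) \le 0$, then $\epsilon - \epsilon\lambda_h(t) + a_h(t-1) \le 0$; hence, $a_h(t) = 0$ according to (24), and $\lambda_h(t+1) \le \lambda_h(t) \le \lambda_h^{max}$ according to (16). Otherwise, $\lambda_h(t) < 1 + a_h(t-1)/\epsilon \le 1 + A_h^{max}/\epsilon$, and since $a_h(t) \le A_h^{max}$, we obtain $\lambda_h(t+1) \le \lambda_h(t) + \epsilon A_h^{max} \le 1 + A_h^{max}/\epsilon + \epsilon A_h^{max} = \lambda_h^{max}$. This completes the proof.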
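The multiplier bound can be sanity-checked numerically for a single SR. The sketch below iterates the admission update (24) together with a projected dual step $\lambda_h(t+1) = \max(0, \lambda_h(t) + \epsilon(a_h(t) - \text{served}(t)))$; this dual step is an assumption standing in for (16), and the step size, arrival range, and deliberately small service range are illustrative values only. It records $\lambda_h(t)$ and compares its maximum with the bound $1 + A_h^{max}/\epsilon + \epsilon A_h^{max}$.

```python
import numpy as np


def simulate_multiplier(eps, served, arrivals):
    """Run the SR-side primal update (24) and an assumed dual update for one SR.

    The dual step lambda(t+1) = max(0, lambda(t) + eps * (a(t) - served(t))) is a
    standard projected-gradient form used here in place of (16).
    """
    lam, a_prev = 0.0, 0.0
    history = []
    for s, arr in zip(served, arrivals):
        a = min(max(a_prev + eps * (1.0 - lam), 0.0), arr)  # closed form (24)
        lam = max(0.0, lam + eps * (a - s))                 # assumed form of (16)
        history.append(lam)
        a_prev = a
    return np.array(history)


rng = np.random.default_rng(0)
eps, a_max = 0.5, 40.0
arrivals = rng.uniform(0.0, a_max, size=2000)  # A_h(t) <= A_h^max
served = rng.uniform(0.0, 5.0, size=2000)      # little service, so lambda is pushed up
lam_hist = simulate_multiplier(eps, served, arrivals)
lam_max = 1.0 + a_max / eps + eps * a_max      # bound of Theorem 4
print(f"max lambda = {lam_hist.max():.3f}, bound = {lam_max:.1f}")
```

Whenever $\lambda_h(t)$ exceeds $1 + a_h(t-1)/\epsilon$, the admission drops to zero and the multiplier can only shrink, which is why the recorded maximum stays below the bound.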

3. Communication resource allocation for LRs in D2D link.

Initially, some SRs do not need to aggregate models from LRs, because no LRs belong to them. These SRs can directly transmit their local models to the edge server. At the same time, LRs need to share their local models with nearby SRs (these SRs cannot directly transmit their local models to the edge server) for aggregation. To avoid degrading the weak cellular links of SRs, we must guarantee the admission data of SRs. We first rank the SRs by admission data. We then find an (LR, SR) pair that uses the same sub-channel and select a transmission power for the LR. In this paper, we focus on the maximum transmission power for SRs, as shown in Algorithm 3, whose complexity is clearly $O(K(H-1))$.

Algorithm 3: Interference-aware communication resource allocation for LRs in D2D link

Input: …, $h^{n}_{k,edge}$
Output: $a$, $P$, $\beta$
1: Initialize
2: Rank the SRs in $\mathcal{H}$ in descending order of admission data
3: for $k = 1$ to $K$ do
4: Find a pair $(h, h')$ with minimum $h^{n}_{h'h}$, where $x_{kh} = 1$
5: Assign $\beta^{n}_{k}(t) = 1$ and $\beta^{n}_{h}(t) = 1$
6: Set ' '' ( )( ) Hn n nk h server h hhh hnkh n nkh kh N p hP x h
7: Set _ _ 02 _ -( ) n n nh edge h edge hnkh n nh k edge p h NP h
8: Update $P^{n}_{kh}(t) = \min\{(P^{n}_{kh})', (P^{n}_{kh})''\}$
9: if _ ( ) n nkh h edge P t P, return
10: else ' '' n n nn kh kh khk H n nh server h hhh h x p hN p h
11: Set $a_k(t) = r^{n}_{k}(t)$
12: $\mathcal{K} = \mathcal{K} \setminus \{k\}$, $\mathcal{H} = \mathcal{H} \setminus \{h\}$, $\mathcal{N} = \mathcal{N} \setminus \{n\}$
13: end for

Hence, the communication resource allocation for local devices is described in Algorithm 4.
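The greedy structure of Algorithm 3 (rank the SRs, pick the cellular link that interferes least with the D2D receiver, cap the LR power so the cellular link keeps its SINR, then admit the D2D rate) can be illustrated with a small Python sketch. Several formulas of Algorithm 3 are not recoverable from the text, so the SINR threshold `sinr_min`, the Shannon-rate expression, the dictionary layout, the extra cap at `p_max`, and the function name `allocate_lrs` are all assumptions; treat it as a hypothetical illustration rather than the authors' procedure.

```python
import math


def allocate_lrs(srs, lrs, gain, p_max, noise, sinr_min, bandwidth):
    """Greedy, interference-aware D2D sub-channel reuse (hypothetical sketch).

    srs:  {h: {"a": admission data, "p": uplink power (W), "n": sub-channel}}
    lrs:  {k: h}, the receiving SR of each LR (x_kh = 1)
    gain: {(tx, rx): channel gain}, where rx may be "edge"
    """
    # Cellular SRs whose sub-channel can still be reused, highest admission first.
    free = sorted(srs, key=lambda h: srs[h]["a"], reverse=True)
    plan = {}
    for k, h in lrs.items():
        options = [hp for hp in free if hp != h]
        if not options:
            break  # no sub-channel left to reuse
        # Cellular SR h' that interferes least with receiver h (ties -> higher a).
        hp = min(options, key=lambda x: gain[(x, h)])
        # Largest LR power that keeps the SINR of h' at the edge server >= sinr_min.
        cap = (srs[hp]["p"] * gain[(hp, "edge")] / sinr_min - noise) / gain[(k, "edge")]
        p = min(cap, p_max)
        if p <= 0:
            continue  # reuse would break the cellular link of h'; skip LR k
        sinr = p * gain[(k, h)] / (noise + srs[hp]["p"] * gain[(hp, h)])
        plan[k] = {"reuse": hp, "subchannel": srs[hp]["n"], "power": p,
                   "rate": bandwidth * math.log2(1.0 + sinr)}  # admission a_k(t)
        free.remove(hp)
    return plan
```

Removing each reused SR from `free` mirrors the set updates at the end of the loop, so every cellular sub-channel is shared with at most one LR.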

Algorithm 4: Communication resource allocation for local devices
Input: …, $h^{n}_{k,edge}$, $h^{n}_{h,edge}$
Output: $a$, $P$, $\beta$
1: Initialize the variables, including $num_{sub\_channel} = 0$, $R_h$, and the range $D_h$
2: Determine the local device association $X$
3: for $k = 1$ to $K$ do
4: for $h = 1$ to $H$ do
5: if $k$ is inside range $D_h$ then
6: $x_{kh} = 1$
7: end if
8: end for
9: end for
10: Based on the number of sub-channels, determine the communication resources for SRs in the cellular link
11: if $N \geq H$ (the number of sub-channels is at least the number of SRs) then
12: Execute Algorithm 1
13: else
14: Execute Algorithm 2
15: end if
16: Allocate communication resources to LRs using Algorithm 3
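Algorithm 4 is essentially a dispatcher: it builds the association matrix $X$ from the SR coverage ranges and then chooses the cellular allocator by comparing $N$ with $H$. A minimal sketch follows, assuming 2-D device coordinates and treating Algorithms 1 to 3 as opaque callables; the names `associate`, `allocate`, `alg1`, `alg2`, and `alg3` are hypothetical.

```python
import math


def associate(lr_pos, sr_pos, radius):
    """Steps 2-9: x[k][h] = 1 if LR k lies inside the coverage disc D_h of SR h."""
    return {k: {h: int(math.dist(pk, ph) <= radius) for h, ph in sr_pos.items()}
            for k, pk in lr_pos.items()}


def allocate(n_subchannels, sr_pos, lr_pos, radius, alg1, alg2, alg3):
    """Steps 10-16: pick the cellular allocator by sub-channel supply, then run D2D."""
    x = associate(lr_pos, sr_pos, radius)
    # Algorithm 1 when every SR can get its own sub-channel, otherwise Algorithm 2.
    cellular = alg1() if n_subchannels >= len(sr_pos) else alg2()
    return x, cellular, alg3(x, cellular)
```

Passing the sub-algorithms in as callables keeps the dispatch logic testable on its own, independent of how Algorithms 1 to 3 are implemented.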

IV. PERFORMANCE EVALUATION

In this part, we evaluate the performance of the proposed CFLMEC framework. The simulation parameters are as follows. We consider a 300 m × 300 m network topology consisting of one edge server and multiple randomly distributed local devices. The maximum transmission power of each mobile user is set to 100 mW. The Rayleigh fading model is adopted for small-scale fading. The bandwidth of the edge server is 10 MHz. The coverage radius of each SR is set to 50 m. The channel gain is modeled as an independent Rayleigh fading channel that incorporates path loss and shadowing effects. The average channel capacity of the devices follows a uniform distribution within [0, 125] Kbps. The number of sub-channels is 10, and the data arriving at each device within a time slot lies in [0, 40] Kbits. The baseline is the offline optimum.

Fig. 4 shows how the network throughput changes with the learning rate, for ε = 0.001, ε = 0.005, and ε = 0.00025. In this figure, ε = 0.00025 ranks first, ε = 0.005 lags far behind ε = 0.00025, and ε = 0.001 yields the smallest throughput. The network throughput of the proposed approach also increases with the number of devices. Fig. 5 shows the Lagrange multipliers of Algorithm 2: under all values of ε, they first increase and then stabilize at the same value over time. As the learning rate (step size) decreases from 0.002 to 0.0005, Algorithm 2 requires increasingly long convergence times to stabilize the system.
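The simulation settings above can be collected into a small configuration object and sampled once per time slot. This is a sketch under stated assumptions: fading power gains are drawn as unit-mean exponentials (the power distribution of Rayleigh fading) while path loss and shadowing are omitted, and the 10 MHz band is assumed to be split equally among the 10 sub-channels. `SimConfig` and `sample_slot` are hypothetical names.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class SimConfig:
    area_m: float = 300.0                 # side of the square topology
    p_max_w: float = 0.1                  # 100 mW maximum transmit power
    bandwidth_hz: float = 10e6            # edge server bandwidth
    sr_radius_m: float = 50.0             # SR coverage radius
    n_subchannels: int = 10
    capacity_kbps: tuple = (0.0, 125.0)   # average channel capacity range
    arrivals_kbits: tuple = (0.0, 40.0)   # data arrivals per time slot

    @property
    def subchannel_bw_hz(self) -> float:
        return self.bandwidth_hz / self.n_subchannels  # assumption: equal split


def sample_slot(cfg: SimConfig, n_devices: int, rng: np.random.Generator):
    """Draw one time slot: device positions, capacities, arrivals and fading gains."""
    positions = rng.uniform(0.0, cfg.area_m, size=(n_devices, 2))
    capacity = rng.uniform(*cfg.capacity_kbps, size=n_devices)
    arrivals = rng.uniform(*cfg.arrivals_kbits, size=n_devices)
    # Rayleigh small-scale fading: |h|^2 ~ Exp(1); path loss and shadowing omitted.
    fading = rng.exponential(1.0, size=(n_devices, cfg.n_subchannels))
    return positions, capacity, arrivals, fading
```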

Fig. 4. Number of devices vs. throughput.

Fig. 5. Time slot vs. value of Lagrange multipliers.

Fig. 6. Number of devices vs. run time.

Fig. 7. Number of devices vs. throughput.

Fig. 8. Number of sub-channels vs. throughput.

Fig. 6 shows the runtime for different numbers of devices and sub-channels, with the learning rate ε = 0.00025. The runtime of Algorithm 2 increases proportionally with the number of devices. Fig. 7 shows that, as the number of devices increases from 4 to 28, the network throughput of all three approaches increases: from 87 Kbps to 512.322 Kbps for Algorithm 1+Algorithm 3, from 39.31 Kbps to 274.3 Kbps for Random, and from 73 Kbps to 407.382 Kbps for Max-SNR. Algorithm 1+Algorithm 3 achieves the highest network throughput. Fig. 8 plots the network throughput of Algorithm 4 against the number of sub-channels with 30 devices, and shows that the throughput gradually increases as the number of sub-channels grows. This is because, when the number of sub-channels is smaller than the number of SRs, Algorithm 2 and Algorithm 3 are run; as the number of sub-channels grows beyond the number of SRs, Algorithm 1 and Algorithm 3 are run instead. Hence, communication resources are used efficiently under different numbers of sub-channels.
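The relative gains implied by the Fig. 7 endpoints can be checked with a few lines of arithmetic. The sketch uses only the 28-device values quoted above and assumes the Random values are in Kbps like the other two approaches.

```python
# Throughput at 28 devices as reported for Fig. 7 (Kbps).
final = {"Algorithm 1+Algorithm 3": 512.322, "Max-SNR": 407.382, "Random": 274.3}
ours = final["Algorithm 1+Algorithm 3"]

# Percentage by which Algorithm 1+Algorithm 3 exceeds each baseline.
gains = {name: (ours / value - 1.0) * 100.0
         for name, value in final.items() if name != "Algorithm 1+Algorithm 3"}

for name, pct in gains.items():
    print(f"{name}: +{pct:.1f}%")  # Max-SNR: +25.8%, Random: +86.8%
```

At 28 devices, Algorithm 1+Algorithm 3 therefore delivers roughly a quarter more throughput than Max-SNR and nearly twice the throughput of Random.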