Bandwidth Allocation for Multiple Federated Learning Services in Wireless Edge Networks

Abstract

This paper studies a federated learning (FL) system where multiple FL services co-exist in a wireless network and share common wireless resources. It fills the void of wireless resource allocation for multiple simultaneous FL services in the existing literature. We design a two-level resource allocation framework comprising intra-service resource allocation and inter-service resource allocation. The intra-service resource allocation problem aims to minimize the length of FL rounds by optimizing the bandwidth allocation among the clients of each FL service. Based on this, an inter-service resource allocation problem is further considered, which distributes bandwidth resources among multiple simultaneous FL services. We consider both cooperative and selfish providers of the FL services. For cooperative FL service providers, we design a distributed bandwidth allocation algorithm that optimizes the overall performance of multiple FL services while catering to fairness among FL services and the privacy of clients. For selfish FL service providers, a new auction scheme is designed with the FL service owners as the bidders and the network provider as the auctioneer. The designed auction scheme strikes a balance between the overall FL performance and fairness. Our simulation results show that the proposed algorithms outperform other benchmarks under various network conditions.


arXiv [cs.NI], Jan

Jie Xu, Heqiang Wang, Lixing Chen


I. INTRODUCTION

Today’s mobile devices are generating an unprecedented amount of data every day. Leveraging the recent success of machine learning (ML) and artificial intelligence (AI), this rich data has the potential to power a wide range of new functionalities and services, such as learning the activities of smart phone users, predicting health events from wearable devices or adapting to pedestrian behavior in autonomous vehicles. With the help of multi-access edge computing (MEC) servers, ML models can be quickly trained/updated using this data to adapt to the changing environment without moving the data to the remote cloud data center, which is envisioned in intelligent next-generation communication systems [1]. Furthermore, due to the growing storage and computational power of mobile devices as well as privacy concerns associated with uploading personal data, it is increasingly attractive to store and process data directly on mobile devices. Federated learning (FL) [2] is thus proposed as a new distributed ML framework, where mobile devices collaboratively train a shared ML model with the coordination of an edge server while keeping all the training data on device, thereby decoupling the ability to do ML from the need to upload/store the data to/in a public entity.

J. Xu, H. Wang and L. Chen are with the Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, USA.

A typical FL service involves a number of mobile devices (a.k.a. participating clients) and an edge server (a.k.a. a parameter server) to train an ML model, which lasts for a number of learning rounds. In each round, the clients download the current ML model from the server, improve it by learning from their local data, and then upload the individual model updates to the server; the server then aggregates the local updates to improve the shared model. For example, the seminal work [3] proposed the FedAvg algorithm, in which the global model is obtained by averaging the parameters of local models. Although other FL algorithms differ in the specifics, the majority of them follow the same procedure. Because the clients work in the same wireless network to download and upload models, how to allocate the limited wireless bandwidth among the participating clients has a crucial impact on the resulting FL speed and efficiency. Therefore, resource allocation for wireless FL systems has recently attracted much attention in the wireless communications community [4], [5]. Compared to resource allocation in traditional throughput-maximizing wireless networks, the resource allocation objective and outcome become considerably different for wireless FL due to its unique requirements and characteristics.

Although existing works have made meaningful progress towards efficient resource allocation for wireless FL, they share the common limitation that only a single FL service was considered. As ML-powered applications grow and become more diverse, it is anticipated that the wireless network will host multiple co-existing FL services, the set of which may also dynamically change over time. See Figure 1 for an illustration of the multi-FL service scenario. The presence of multiple FL services makes resource allocation for wireless FL much more challenging.

First, the achievable FL performance depends not only on intra-service resource allocation among the participating clients within each FL service but also on inter-service resource allocation among different FL services, and these two levels of allocation decisions are also strongly coupled. Second, the FL service providers may adopt different FL algorithms and choose different configurations (e.g., number of participating clients, number of epochs of local training, etc.), yet due to privacy concerns this information is not always available to the wireless network operator when making resource allocation decisions. Third, because FL service providers have their individual goals, they may have incentives to untruthfully reveal their valuation of the wireless bandwidth if by doing so they gain advantages in the inter-service bandwidth allocation. Without the correct information, there is no guarantee on the overall system performance. Finally, as in any multi-user system, resource allocation should strike a good balance between efficiency and fairness: every FL service provider should obtain a reasonable share of the wireless resource to train their ML models using FL.

Fig. 1. System Overview.

In this paper, we make an initial effort to study wireless FL with multiple co-existing FL services, which share the same bandwidth to train their respective ML models. Our focus is on the efficient bandwidth allocation among different FL services as well as among the participating clients within each FL service, thereby understanding the interplay between these two levels of allocation decisions. Our main contributions are summarized as follows:

• We formalize a two-level bandwidth allocation problem for multiple FL services co-existing in the wireless network, which may start and complete at different times depending on their own demand and FL requirements. The model is general enough for any FL algorithm that involves downloading, local learning, uploading and global aggregation in each learning round, and hence has wide applicability in real-world systems. In addition, we explicitly take fairness into consideration when optimizing bandwidth allocation to ensure that no FL service is starved of bandwidth.

• We consider two use cases depending on the nature/goals of the FL service providers. In the first case, FL service providers are fully cooperative to maximize the overall system performance. For this, we design a distributed optimization algorithm based on dual decomposition to solve the two-level bandwidth allocation problem. The algorithm keeps all FL-related information at the individual FL service provider side without sharing it with the network operator, thereby reducing the communication overhead and enhancing privacy protection.

• We further consider a second case where FL service providers are selfishly maximizing their own performance. To address the selfishness issue, we design a multi-bid auction mechanism, which is able to elicit the FL service providers’ truthful valuation of bandwidth based on their submitted bids. With a fairness-adjusted ex post charge, the proposed auction mechanism is able to make a tunable trade-off between efficiency and fairness.

The rest of this paper is organized as follows. Section II discusses related works. Section III builds the system model. Section IV formulates the problem for the cooperative case and develops a distributed bandwidth allocation algorithm. Section V studies the selfish service providers case and develops a multi-bid auction mechanism. Section VI performs simulations. Concluding remarks are made in Section VII.

II. RELATED WORK

A lot of research has been devoted to tackling various challenges of FL, including but not limited to developing new optimization and model aggregation methods [6]–[8], handling non-i.i.d. and unbalanced datasets [9]–[11], dealing with the straggler problem [12], preserving model and data privacy [13], [14], and ensuring fairness [15], [16]. A comprehensive review of these challenges can be found in [17]–[19]. In particular, the communication aspect of FL has been recognized as a primary bottleneck due to the tension between uploading a large amount of model data for aggregation and the limited network resource to support this transmission, especially in a wireless environment. In this regard, early research on communication-efficient FL largely focuses on reducing the amount of transmitted data while assuming that the underlying communication channel has been established, e.g., updating clients with significant training improvement [20], compressing the gradient vectors via quantization [21], or accelerating training using sparse or structured updates [22]. More recent research starts to address this problem from a communication system perspective, e.g., using a hierarchical FL network architecture [23] that allows partial model aggregation, and leveraging the wireless transmission property to perform analog model aggregation over the air [24].

As wireless networks are envisioned as a main deployment scenario of FL, wireless resource allocation for FL is another active research topic. Many existing works [5], [25], [26] study the trade-off between local model update and global model aggregation. Client selection is essential to enable FL at scale and address the straggler problem. Different types of joint bandwidth allocation and client scheduling policies [27]–[31] have been proposed to minimize either the training loss or the training time. In all these works, resource allocation is carried out among clients of a single FL service, while assuming that the FL service itself has already received dedicated resource. In stark contrast, our paper studies a network consisting of multiple co-existing FL services and performs resource allocation at both the FL service level and the client level. We notice that a related problem where multiple FL services are being trained at the same time is also considered in a recent work [32]. In that paper, different FL services run on the same set of clients and a joint computation and communication resource scheduling problem is studied. In our paper, different FL services have their separate client sets, which may experience very different channel qualities, and hence we focus only on the bandwidth allocation problem. Moreover, while [32] assumed that all clients are obedient, we study the possible selfish nature of FL service providers and highlight bandwidth allocation fairness.

Considering each FL service as a “user”, our problem is a special type of resource allocation problem for multi-user wireless networks. While many concepts and techniques adopted in this paper, including proportional fairness [33], dual decomposition [34] and multi-bid auction [35], have seen applications in other multi-user wireless resource allocation domains, applying them in multi-service FL requires special treatment as two levels of resource allocation are involved in our problem. In particular, there is no closed-form expression of how the performance (i.e., learning speed) of an FL service depends on the resource allocation among its clients. Therefore, understanding the inter-dependency of intra-service and inter-service bandwidth allocation is essential. Furthermore, we put an emphasis on the resource fairness among different FL services by designing a new fairness-adjusted multi-bid auction mechanism in the selfish FL service provider case, thereby achieving a tunable trade-off between efficiency and fairness. We point out that there are some existing works [36]–[39] on designing incentive mechanisms for client participation of a single FL service. These works are very different from our paper in terms of both the problem and the approaches, and do not consider fairness when designing the mechanism.

III. SYSTEM MODEL

We consider a wireless network where machine learning models are trained using Federated Learning (FL). The wireless network has a total bandwidth $B$, and the network operator has to allocate this bandwidth among concurrent FL services when needed to enable their individual training. Because new FL services may start and old FL services may finish over time, bandwidth allocation has to be periodically performed to adapt to the current active FL services. Therefore, we divide time into periods and let the length of a period be $T$. At the beginning of each period $i$, a set $\mathcal{N}_i$ of FL services are active and require wireless bandwidth to carry out their training. These services are either newly initiated services in period $i$ or continuing services from the previous period. An FL service finishes and hence exits the wireless network when a certain termination criterion is satisfied (e.g., the training loss is below a threshold, the testing accuracy is above a threshold, or another convergence criterion is met), which usually varies across FL services and is pre-specified by the corresponding service provider. Therefore, an FL service may span multiple periods. The wall clock time (i.e., the number of periods) that an FL service takes to finish depends on the difficulty and other inherent characteristics of the service itself, on how much wireless resource is allocated to this service in each period for which it stays, and on how this bandwidth is further allocated among its participating clients. In what follows, we first formulate the client-level (i.e., intra-service) bandwidth allocation problem and then describe the service-level (i.e., inter-service) bandwidth allocation problem.

A. Intra-Service Bandwidth Allocation

To understand how bandwidth allocation affects FL performance, let us consider a single representative FL service $n$ in one period (the period index $i$ is dropped for conciseness). Suppose that this service is allocated a bandwidth $b_n$ in this period, which is further allocated among its participating clients, the set of which is denoted by $\mathcal{K}_n$. For each client $k \in \mathcal{K}_n$, let $\phi_k$ be its computing speed, and $g^{ul}_k$ and $g^{dl}_k$ be the uplink and downlink wireless channel gains to the parameter server of service $n$, respectively, which are assumed to be invariant within a period. We consider a synchronized FL model for each FL service, where a number of FL rounds take place in a period. Nonetheless, different FL services do not have to be synchronized: they learn at their own pace. See Figure 2 for an illustration.

Fig. 2. Bandwidth Allocation among Multiple FL Services.

An FL round consists of four stages: download transmission, local computation, upload transmission and global computation:

• Download Transmission (DT). Each FL round starts with a DT stage in which each client $k$ downloads the current global model from its parameter server residing on the base station. Suppose client $k$ is allocated bandwidth $b_{n,k}$; then its DT rate is $b_{n,k} \log_2(1 + P_n g^{dl}_k / N_0)$ following Shannon’s equation, where $P_n$ is the transmission power of parameter server $n$ and $N_0$ is the noise power. For notational convenience, we denote $r^{DT}_k \triangleq \log_2(1 + P_n g^{dl}_k / N_0)$ as the DT base rate of client $k$. Let $s^{DT}_n$ be the download data size (e.g., the size of the global model); then the DT latency is $t^{DT}_{n,k} = s^{DT}_n / (b_{n,k} r^{DT}_k)$.

• Local Computation (LC). With the current global model, each client $k$ then updates its local model using its local dataset. Depending on the ML model complexity, the local dataset size and the number of epochs in local training, the per-round local computation workload is denoted by $w^{LC}_{n,k}$. Therefore, the LC latency of client $k$ is $t^{LC}_{n,k} = w^{LC}_{n,k} / \phi_k$.

• Upload Transmission (UT). Once the local update is finished, client $k$ transmits the result to the parameter server $n$. Given the bandwidth $b_{n,k}$, its UT rate is $b_{n,k} \log_2(1 + P_k g^{ul}_k / N_0)$, where $P_k$ is the transmission power of client $k$ and $N_0$ is the noise power. Again, for notational convenience, we denote $r^{UT}_k \triangleq \log_2(1 + P_k g^{ul}_k / N_0)$ as the UT base rate of client $k$. Let $s^{UT}_n$ be the data size that has to be transmitted to the parameter server; then the UT latency of client $k$ is $t^{UT}_{n,k} = s^{UT}_n / (b_{n,k} r^{UT}_k)$.

• Global Computation (GC). Finally, once the local updates of all clients are received by parameter server $n$, the global model is updated. Let $w^{GC}_n$ be the global model update workload and $\phi_n$ be the computing speed of parameter server $n$; then the GC latency is $t^{GC}_n = w^{GC}_n / \phi_n$.

Note that our framework is applicable to a vast set of FL algorithms (e.g., FedAvg, FedSGD) that can be chosen for service $n$. For instance, the downloaded/uploaded data may be the model itself, the compressed version of the model, or the model gradient information. For the purpose of bandwidth allocation, it is sufficient to describe the FL service as a tuple $\langle s^{DT}_n, \{w^{LC}_{n,k}\}_{k \in \mathcal{K}_n}, s^{UT}_n, w^{GC}_n \rangle$.

In synchronized FL, the parameter server updates the global model only after it has received the local updates from all participating clients. Hence, the length of an FL round of service $n$ is determined by the total latency of the slowest client, i.e.,

$$t_n = \max_{k \in \mathcal{K}_n} \left( t^{DT}_{n,k} + t^{LC}_{n,k} + t^{UT}_{n,k} + t^{GC}_n \right).$$

To minimize the FL round length $t_n$ of service $n$ so that more FL rounds can be executed in a period, one has to optimally allocate bandwidth $b_n$ among the clients of service $n$. Given $b_n$, the intra-service bandwidth allocation problem can be formulated as

$$\min_{b_{n,1}, \dots, b_{n,K}} \; t_n\left(\{b_{n,k}\}_{k \in \mathcal{K}_n}\right) \quad \text{subject to} \quad \sum_{k \in \mathcal{K}_n} b_{n,k} = b_n \tag{1}$$

Let $t^*_n(b_n)$ denote the optimal round length obtained from Eqn. (1). Then the optimal FL frequency of service $n$ is $f^*_n(b_n) = 1 / t^*_n(b_n)$, which is used to represent the FL speed of service $n$. Note that this means $T \cdot f^*_n(b_n)$ FL rounds can be performed in one period.
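The latency model and problem (1) can be explored numerically. The following Python sketch is illustrative only and not the authors' implementation: the function names and all numbers are hypothetical, and the solver relies on the observation (made in Section IV-A) that at the optimum every client finishes at the same time, so a candidate round length $t$ is achievable exactly when the bandwidth needed to meet it does not exceed $b_n$.

```python
import math


def base_rate(power, gain, noise):
    """Base rate log2(1 + P*g/N0), i.e. the Shannon spectral efficiency."""
    return math.log2(1.0 + power * gain / noise)


def round_length(alloc, s_dt, s_ut, r_dt, r_ut, t_lc, t_gc):
    """Round length t_n = max_k (t_DT + t_LC + t_UT + t_GC) for a given allocation."""
    return max(
        s_dt / (b * rd) + lc + s_ut / (b * ru) + t_gc
        for b, rd, ru, lc in zip(alloc, r_dt, r_ut, t_lc)
    )


def optimal_allocation(b_n, s_dt, s_ut, r_dt, r_ut, t_lc, t_gc, iters=200):
    """Bisection on the round length t; returns (per-client allocation, t)."""
    # alpha_k: transmission latency per unit of inverse bandwidth;
    # t_c[k]: bandwidth-independent computation latency of client k.
    alpha = [s_dt / rd + s_ut / ru for rd, ru in zip(r_dt, r_ut)]
    t_c = [lc + t_gc for lc in t_lc]

    def needed(t):
        # Total bandwidth required so that every client finishes by time t.
        return sum(a / (t - c) for a, c in zip(alpha, t_c))

    lo = max(t_c)                # infeasible: would need unbounded bandwidth
    hi = lo + sum(alpha) / b_n   # feasible: needed(hi) <= b_n
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if needed(mid) > b_n:
            lo = mid
        else:
            hi = mid
    return [a / (hi - c) for a, c in zip(alpha, t_c)], hi


# Hypothetical two-client service; every number here is made up.
params = dict(s_dt=1.0, s_ut=1.0, r_dt=[2.0, 1.0], r_ut=[1.0, 0.5],
              t_lc=[0.2, 0.4], t_gc=0.1)
alloc, t_opt = optimal_allocation(5.0, **params)
t_equal = round_length([2.5, 2.5], **params)  # naive equal split, for comparison
```

In this toy setting the equal split is limited by the client with the weaker channel and slower computation, whereas the optimized allocation shifts bandwidth towards that client until both finish together.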

B. Inter-Service Bandwidth Allocation

In a period, multiple FL services may be active and require wireless bandwidth to carry out learning. Since they share a total bandwidth $B$, how this bandwidth is allocated among different services will determine their achievable learning frequencies $f^*_n(b_n)$, and thus the convergence speed in terms of the wall clock time. In this paper, we consider two scenarios depending on the goals of the FL service providers and how inter-service bandwidth allocation is implemented. In the first scenario, all FL service providers are cooperative, and their goal is to maximize the FL performance of the overall system. Therefore, it is equivalent to the network operator solving a system-wide optimization problem. In the second scenario, the FL service providers are selfish and care only about their own FL performance. As these service providers are competing for the limited bandwidth resource, addressing their incentive issues is crucial. In this paper, we design a fairness-adjusted multi-bid auction mechanism for the inter-service bandwidth allocation in this case. In the following two sections, we discuss these two scenarios separately.

IV. COOPERATIVE SERVICE PROVIDERS

In the cooperative service providers scenario, the network operator directly decides the bandwidth allocation to maximize the overall system performance. As in any multi-user network, bandwidth allocation for multi-service FL has to address both efficiency and fairness: every active FL service should get a reasonable share of the bandwidth. Thus, we adopt the notion of proportional fairness [33], a metric widely used in multi-user resource allocation, and aim to solve the following optimization problem:

$$\max_{b_1, \dots, b_N} \; \sum_{n=1}^{N} \log\left(1 + f^*_n(b_n)\right) \quad \text{subject to} \quad \sum_{n=1}^{N} b_n = B \;\; \text{and} \;\; f^*_n(b_n) \text{ solves (1)}, \; \forall n \tag{2}$$

where we drop the period index $i$ and let $N$ be the number of active FL services in the period for conciseness. The objective function adds a “1” inside the logarithm to ensure that the function value is always non-negative. This change has very little impact on the final allocation since the frequency is often much larger than 1 in a period. Note also that the above inter-service bandwidth allocation problem Eqn. (2) implicitly incorporates the intra-service problem, as $f^*_n(b_n)$ is the solution to Eqn. (1).

A. Optimal Solution to the Intra-Service Problem

We first investigate the optimal solution to the intra-service bandwidth allocation problem and see how it can be used to solve the inter-service problem. According to our system model and Eqn. (1), the intra-service bandwidth allocation is equivalent to

$$\min_{b_{n,1}, \dots, b_{n,K}} \; t_n \tag{3}$$
$$\text{subject to} \quad t^C_{n,k} + \alpha_{n,k} / b_{n,k} \le t_n \tag{4}$$
$$\sum_k b_{n,k} = b_n \tag{5}$$

where we let $t^C_{n,k} \triangleq t^{LC}_{n,k} + t^{GC}_n$ and $\alpha_{n,k} \triangleq s^{DT}_n / r^{DT}_k + s^{UT}_n / r^{UT}_k$ for notational convenience. Clearly, the optimal solution $t^*_n$ must satisfy

$$t^C_{n,k} + \alpha_{n,k} / b_{n,k} = t^*_n, \quad \forall k \tag{6}$$

Therefore, the optimal $t^*_n$ solves the following equality,

$$\sum_k \frac{\alpha_{n,k}}{t^*_n - t^C_{n,k}} = b_n \tag{7}$$

Although we do not have a closed-form solution of $t^*_n(b_n)$, a bi-section algorithm can be constructed to easily solve the above problem to obtain the optimal $t^*_n(b_n)$ and consequently the optimal frequency $f^*_n(b_n) = 1 / t^*_n(b_n)$ as a function of $b_n$. Furthermore, the property of $f^*_n(b_n)$ can be characterized in the following lemma.

Lemma 1. $f^*_n(b_n)$ is a differentiable, increasing and concave function for $b_n > 0$.

Proof. Let us consider the inverse function $b_n(f_n)$ defined by Eqn. (7). Substituting $t_n = 1/f_n$ into Eqn. (7) gives $b_n(f_n) = \sum_k \alpha_{n,k} f_n / (1 - t^C_{n,k} f_n)$. It is easy to see that for $f_n \in [0, 1/\max_k t^C_{n,k})$, $b_n(f_n)$ is a monotonically increasing function in $f_n$ with $b_n(0) = 0$ and $b_n(f_n) \to \infty$ as $f_n \to 1/\max_k t^C_{n,k}$. Therefore, for $b_n \ge 0$, $f_n(b_n)$ is also monotonically increasing. The first-order derivative of $b_n(f_n)$ is

$$b'_n = \frac{db_n}{df_n} = \frac{db_n}{dt_n} \frac{dt_n}{df_n} = \sum_k \frac{\alpha_{n,k}}{(1 - t^C_{n,k} f_n)^2} > 0, \quad \forall f_n \in [0, 1/\max_k t^C_{n,k}) \tag{8}$$

Therefore, $f_n(b_n)$ is differentiable for $b_n \ge 0$ and

$$f'_n = \frac{df_n}{db_n} = \left( \sum_k \frac{\alpha_{n,k}}{(1 - t^C_{n,k} f_n)^2} \right)^{-1} > 0, \quad \forall b_n \ge 0 \tag{9}$$

The second-order derivative $f''_n$ can also be computed as follows:

$$f''_n = -\left( \sum_k \frac{\alpha_{n,k}}{(1 - t^C_{n,k} f_n)^2} \right)^{-3} \left( \sum_k \frac{2 \alpha_{n,k} t^C_{n,k}}{(1 - t^C_{n,k} f_n)^3} \right) < 0, \quad \forall b_n \ge 0 \tag{10}$$

This proves that $f_n(b_n)$ is a concave function for $b_n \ge 0$.

With Lemma 1, it is straightforward to see that the inter-service bandwidth allocation problem (2) is a convex optimization problem.

Proposition 1. The inter-service bandwidth allocation problem (2) is an equality-constrained convex optimization problem.

Proof. Because $f^*_n$ is concave and $\log$ is concave and increasing, the composition $\log(1 + f^*_n)$ is also a concave function. Then it is straightforward to see that the problem is a concave maximization problem with an equality constraint.

B. Distributed Algorithm for Inter-Service Bandwidth Allocation

We now proceed with solving the inter-service bandwidth allocation problem. While various centralized algorithms, such as Newton’s method, can efficiently solve the inter-service problem Eqn. (2) given the fact that it is a convex optimization problem, we prefer a distributed algorithm where individual FL service providers do not share their FL algorithm details and client-level information with each other or the network operator. This reduces the communication overhead and preserves the privacy of the client devices of individual FL service providers. Our algorithm is developed based on dual decomposition [34] as follows.

We first relax the total bandwidth constraint $\sum_n b_n = B$ to be $\sum_n b_n \le B$, and then form the Lagrangian by relaxing the coupling constraint:

$$L(b_1, \dots, b_N, \lambda) = \sum_n \log\left(1 + f^*_n(b_n)\right) - \lambda \left( \sum_n b_n - B \right) = \sum_n L_n(b_n, \lambda) + \lambda B \tag{11}$$

where $\lambda$ is the Lagrange multiplier associated with the total bandwidth constraint, and $L_n(b_n, \lambda) = \log(1 + f^*_n(b_n)) - \lambda b_n$ is the Lagrangian to be maximized by service provider $n$. Such dual decomposition results in each service provider $n$ solving, for a given $\lambda$, the following problem

$$b^*_n(\lambda) = \arg\max_{b_n \ge 0} L_n(b_n, \lambda) = \arg\max_{b_n \ge 0} \left( \log\left(1 + f^*_n(b_n)\right) - \lambda b_n \right) \tag{12}$$

where the solution is unique due to the strict concavity of $f^*_n$ according to Lemma 1. Specifically, to solve this maximization problem, we only need to solve its first-order condition,

$$\frac{f^{*\prime}_n(b_n)}{1 + f^*_n(b_n)} = \lambda \tag{13}$$

which, using Eqn. (9), can be converted to solving for $f^*_n$ from

$$(1 + f^*_n) \sum_{k \in \mathcal{K}_n} \frac{\alpha_{n,k}}{(1 - t^C_{n,k} f^*_n)^2} = \lambda^{-1} \tag{14}$$

Clearly, the left-hand side is an increasing function of $f^*_n$ for $f^*_n \in [0, 1/\max_k t^C_{n,k})$ and thus, a simple bi-section algorithm can be devised to solve Eqn. (14) to obtain $f^*_n(\lambda)$. Then plugging $f^*_n(\lambda)$ (hence $t^*_n(\lambda)$) into Eqn. (7) yields the optimal $b^*_n(\lambda)$. Let $g_n(\lambda) = \max_{b_n \ge 0} L_n(b_n, \lambda) = L_n(b^*_n(\lambda), \lambda)$ be the local dual function for service provider $n$. Then the master dual problem is

$$\min_{\lambda} \; g(\lambda) = \sum_n g_n(\lambda) + \lambda B \quad \text{subject to} \quad \lambda \ge 0 \tag{15}$$

Since $b^*_n(\lambda)$ is unique, it follows that the dual function $g_n(\lambda)$ is differentiable and the following gradient method can be used to iteratively update $\lambda$:

$$\lambda(j+1) = \left[ \lambda(j) - \gamma \left( B - \sum_n b^*_n(\lambda(j)) \right) \right]^+ \tag{16}$$

where $j$ is the iteration index, $\gamma > 0$ is a sufficiently small positive step-size, and $[\cdot]^+$ denotes the projection onto the non-negative orthant. The dual variable $\lambda(j)$ will converge to the dual optimum $\lambda^*$ as $j \to \infty$. Since the duality gap for the inter-service problem Eqn. (2) is zero and the solution to Eqn. (12) is unique, the primal variable $b^*_n(\lambda(j))$ will also converge to the primal optimal variable $b^*_n$.

Algorithm 1 summarizes the distributed inter-service bandwidth allocation (DISBA) algorithm. The algorithm works iteratively. In each iteration, the operator sends the current $\lambda(j)$ to all service providers. Then, each service provider solves for $b^*_n(\lambda(j))$ using its local information and sends the result to the network operator. The network operator finally updates $\lambda(j+1)$ for the next iteration’s computation. The algorithm terminates when $\lambda$ converges.

Algorithm 1 Distributed Inter-Service Bandwidth Allocation (DISBA)
Input to Network Operator: total bandwidth $B$, step size $\gamma$, convergence gap $\epsilon$
Input to service provider $n$: FL service $n$ parameters $\langle s^{DT}_n, \{w^{LC}_{n,k}\}_{k \in \mathcal{K}_n}, s^{UT}_n, w^{GC}_n \rangle$, channel gains and computing speed of its clients $\mathcal{K}_n$
Initialization: set $j = 0$ and $\lambda(0)$ equal to some non-negative value
while $|\lambda(j) - \lambda(j-1)| > \epsilon$ do
    Network Operator sends $\lambda(j)$ to all service providers
    Each service provider $n$ obtains $b^*_n(\lambda(j))$ by solving Eqn. (12) using bi-section
    Each service provider $n$ sends $b^*_n(\lambda(j))$ to Network Operator
    Network Operator updates $\lambda(j+1)$ according to Eqn. (16)
    $j \leftarrow j + 1$
end while

V. SELFISH SERVICE PROVIDERS
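As a reference point for this section, the DISBA iteration of Section IV-B can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' code: each service is summarized by hypothetical lists `alpha` and `t_c` holding $\alpha_{n,k}$ and $t^C_{n,k}$ (with at least one $t^C_{n,k} > 0$), the numbers are made up, and $\lambda$ is kept strictly positive by a small floor because the provider's demand is unbounded at $\lambda = 0$. The function `provider_response` is the quantity each provider computes privately and reports to the operator.

```python
def provider_response(lam, alpha, t_c, iters=200):
    """b_n^*(lambda): solve Eqn. (14) for f by bisection, then map f to b via Eqn. (7)."""
    target = 1.0 / lam

    def lhs(f):
        # Left-hand side of Eqn. (14); increasing in f on [0, 1/max(t_c)).
        return (1.0 + f) * sum(a / (1.0 - c * f) ** 2 for a, c in zip(alpha, t_c))

    if lhs(0.0) >= target:
        return 0.0  # marginal gain at b = 0 does not exceed lambda: request nothing
    lo, hi = 0.0, (1.0 - 1e-12) / max(t_c)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    f = 0.5 * (lo + hi)
    # Eqn. (7) rewritten in terms of f = 1/t.
    return sum(a * f / (1.0 - c * f) for a, c in zip(alpha, t_c))


def disba(services, total_bw, lam0=0.1, gamma=1e-3, eps=1e-10, max_iter=100000):
    """Projected gradient update of the dual variable, Eqn. (16)."""
    lam = lam0
    for _ in range(max_iter):
        responses = [provider_response(lam, a, c) for a, c in services]
        # Floor at a tiny positive value instead of 0 (a guard specific to this sketch).
        new_lam = max(lam - gamma * (total_bw - sum(responses)), 1e-12)
        converged = abs(new_lam - lam) <= eps
        lam = new_lam
        if converged:
            break
    return lam, [provider_response(lam, a, c) for a, c in services]


# Two hypothetical services: one with two clients, one with a single client.
services = [([1.0, 2.0], [0.5, 0.2]), ([1.5], [0.3])]
lam_star, alloc = disba(services, total_bw=10.0)
```

The step size and starting point were chosen by hand for this toy instance; other instances may need a smaller `gamma` for the gradient update to converge.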

In the previous section, the distributed inter-service bandwidth allocation works by letting each FL service provider compute the allocated bandwidth $b^*_n(\lambda(j))$ given $\lambda(j)$. This, however, creates an opportunity for a selfish service provider to mis-report its computation result in a way that favors itself but reduces the system performance as a whole. In fact, even if the inter-service bandwidth allocation problem (2) is solved in a centralized way, similar selfish behavior may still undermine the efficient system operation, as a selfish service provider may mis-report its FL service and client parameters (e.g., FL workload, client computing power and channel gains), which will alter the frequency function $f^*_n$ used at the operator side. With a wrong frequency function $f^*_n$, the operator will not be able to determine the true optimal bandwidth allocation.

In this section, we address the selfishness issue in inter-service bandwidth allocation by designing a multi-bid auction mechanism. This auction mechanism will ensure that the FL service providers are using their true FL frequency functions $f^*_n$ when making bandwidth bids.

A. Multi-bid Auction

First, we describe the general rules of the multi-bid auction mechanism.

1) Bidding:

At the beginning of each bandwidth allocation period, each service provider $n$ submits a set of $M$ bids $s_n = \{s^1_n, \dots, s^M_n\}$. For each $m \in \{1, \dots, M\}$, $s^m_n = (b^m_n, p^m_n)$ is a two-dimensional bid, where $b^m_n$ is the requested bandwidth and $p^m_n$ is the unit price that service provider $n$ is willing to pay to get the requested bandwidth $b^m_n$. Without loss of generality, we assume that bids are sorted according to the price such that $p^1_n \le p^2_n \le \dots \le p^M_n$. Let $\mathcal{S} \in \mathbb{R}_+ \times \mathbb{R}_+$ denote the set of multi-bids that a service provider can submit.
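One possible in-memory representation of such a multi-bid is sketched below in Python. The class and function names are hypothetical and only illustrate the structure described above: $M$ (bandwidth, unit price) pairs with non-negative entries, kept sorted by price.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Bid:
    bandwidth: float   # requested bandwidth b_n^m
    unit_price: float  # unit price p_n^m the provider is willing to pay


def make_multi_bid(pairs: Iterable[Tuple[float, float]]) -> List[Bid]:
    """Validate (bandwidth, unit price) pairs and return them sorted by price."""
    bids = [Bid(b, p) for b, p in pairs]
    if any(bid.bandwidth < 0 or bid.unit_price < 0 for bid in bids):
        raise ValueError("bandwidth and unit price must be non-negative")
    return sorted(bids, key=lambda bid: bid.unit_price)


# A hypothetical multi-bid with M = 3; the numbers are made up.
s_n = make_multi_bid([(2.0, 0.30), (6.0, 0.10), (4.0, 0.20)])
```

Sorting at construction time means the operator can rely on the price ordering assumed in the text without re-checking it.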

2) Bandwidth Allocation and Charges:

Once the network operator collects all multi-bids from all service providers, denoted by $s = \{s_n\}_{n \in \mathcal{N}}$, it computes and implements the inter-service bandwidth allocation $(b_1, \dots, b_N)$. Each service provider $n$ then further allocates $b_n$ to its clients to perform FL. At the end of the period, the network operator determines the charges $(c_1, \dots, c_N)$ for all service providers depending on the allocated bandwidth and the realized FL performance.

Now, a couple of issues remain to be addressed. First, how should the bandwidth allocation and the charges be computed given the multi-bids submitted by the service providers? Second, do the service providers have incentives to truthfully report their valuations of the bandwidth? These are the questions to be addressed in the next subsections.

B. Market Clearing Prices with Full Information

We first consider a simpler case where the service providers truthfully report the complete FL frequency function $f_n^*(b), \forall n$ to the network operator. This analysis will provide us with insights on how to design bandwidth allocation and charging rules in the more difficult multi-bid auction case.

Recall that $f_n^*(b)$ is the optimal FL frequency of service $n$ if it has bandwidth $b$. Taking into account the price paid to obtain this bandwidth, the (net) utility of service provider $n$ is

$$u_n(b; p) = f_n^*(b) - p \cdot b \tag{17}$$

Now, if the bandwidth were sold at the unit price $p$, then service provider $n$ would buy $b_n(p) = \arg\max_b u_n(b; p)$ bandwidth in order to maximize its utility. We call $b_n(p)$ the bandwidth demand function (BDF), and it is easy to show that $b_n(p) = (f_n^{*\prime})^{-1}(p)$ by checking the first-order condition of Eqn. (17). On the other hand, if service provider $n$ requires a bandwidth $b$, then the service provider would pay a unit price no more than $p_n(b) = f_n^{*\prime}(b)$. We call $p_n(b)$ the marginal valuation function (MVF).
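To make the relation between the BDF and the MVF concrete, here is a small sketch under a loudly stated assumption: the FL frequency is taken to be the hypothetical concave function $f(b) = b / (c\,b + a)$, i.e. one round lasts $c + a/b$ time units. This form, and the parameters `a` and `c`, are illustrative only and are not the paper's $f_n^*$.

```python
import math


def fl_frequency(b: float, a: float, c: float) -> float:
    """Hypothetical concave FL frequency: one round takes c + a / b time units."""
    return b / (c * b + a)


def marginal_valuation(b: float, a: float, c: float) -> float:
    """MVF p_n(b): derivative of fl_frequency with respect to b."""
    return a / (c * b + a) ** 2


def bandwidth_demand(p: float, a: float, c: float) -> float:
    """BDF b_n(p): inverse of the MVF, equal to zero when the price is too high."""
    if p >= 1.0 / a:  # price at or above the marginal valuation at b = 0
        return 0.0
    return (math.sqrt(a / p) - a) / c


def utility(b: float, p: float, a: float, c: float) -> float:
    """Net utility of Eqn. (17) for the hypothetical frequency function."""
    return fl_frequency(b, a, c) - p * b


a, c, p = 0.5, 0.1, 0.5
b_star = bandwidth_demand(p, a, c)
# At the demanded bandwidth, the marginal valuation equals the price.
print(b_star, marginal_valuation(b_star, a, c))
```

With these numbers the demanded bandwidth is 5 and the marginal valuation there equals the price 0.5, which is exactly the first-order condition of Eqn. (17).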

1) Market clearing price:

With the complete information of $f_n^*(b)$, and hence the BDF $b_n(p)$, for all service providers, the network operator can compute the market clearing price (MCP) $\rho$ so that $\sum_{n=1}^{N} b_n(\rho) = B$. One can prove that the MCP is unique and optimal in the sense that it maximizes the total (equivalently, average) FL frequency.

Proposition 2. The market clearing price $\rho$ is unique and maximizes the total FL frequency $\sum_{n=1}^{N} f_n^*(b_n)$.

Proof. According to Lemma 1, $f_n^*(b)$ is concave, so $f_n^{*\prime}(b)$ is a decreasing function. Therefore, the BDF, which is the inverse function of $f_n^{*\prime}(b)$, is also decreasing. As a result, the aggregate demand $\sum_{n=1}^{N} b_n(p)$ is monotone in $p$ and there exists a unique solution to $\sum_{n=1}^{N} b_n(\rho) = B$.

To show that $\rho$ maximizes $\sum_{n=1}^{N} f_n^*(b_n)$, consider the following maximization problem

$$\max_{b_1, \dots, b_N} \sum_{n=1}^{N} f_n^*(b_n) \quad \text{subject to} \quad \sum_{n=1}^{N} b_n = B \tag{18}$$

This is clearly a convex optimization problem. Consider its Karush-Kuhn-Tucker conditions. In particular, the stationarity condition is

$$\nabla \sum_{n=1}^{N} f_n^*(b_n) - \lambda \nabla \left( \sum_{n=1}^{N} b_n - B \right) = 0 \tag{19}$$

where $\lambda$ is the Lagrangian multiplier associated with the constraint. The solution requires

$$f_n^{*\prime}(b_n) = \lambda, \quad \forall n \tag{20}$$

Together with the feasibility constraint, this is equivalent to imposing a homogeneous market clearing price.

Because $b_n(p)$ is a monotonically decreasing function in $p$, a bisection algorithm can be easily designed to find the unique market clearing price so that $\sum_{n=1}^{N} b_n(\rho) = B$.
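The bisection search mentioned above can be sketched as follows. As an assumption made only for this example, each service uses the hypothetical concave frequency $f(b) = b / (c\,b + a)$, whose demand function has a closed form; the parameter pairs `(a, c)` and the function names are illustrative and not taken from the paper.

```python
import math
from typing import List, Tuple


def bandwidth_demand(p: float, a: float, c: float) -> float:
    """Demand of a service with hypothetical frequency f(b) = b / (c*b + a)."""
    if p >= 1.0 / a:
        return 0.0
    return (math.sqrt(a / p) - a) / c


def marginal_valuation(b: float, a: float, c: float) -> float:
    """Derivative of the hypothetical frequency function."""
    return a / (c * b + a) ** 2


def market_clearing_price(params: List[Tuple[float, float]], B: float,
                          tol: float = 1e-10) -> float:
    """Bisection on the price: aggregate demand is decreasing in p."""
    lo = 1e-12                            # assumed low enough that demand exceeds B
    hi = max(1.0 / a for a, _ in params)  # at this price every demand is zero
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        total = sum(bandwidth_demand(mid, a, c) for a, c in params)
        if total > B:
            lo = mid   # demand too high: raise the price
        else:
            hi = mid   # demand fits: try a lower price
    return 0.5 * (lo + hi)


params = [(0.5, 0.1), (1.0, 0.2), (2.0, 0.1)]  # hypothetical (a, c) per service
B = 10.0
rho = market_clearing_price(params, B)
allocation = [bandwidth_demand(rho, a, c) for a, c in params]
print(rho, allocation, sum(allocation))
```

At the returned price the demands sum to $B$ (up to the tolerance), and every service with a positive allocation has the same marginal valuation, which is the condition in Eqn. (20).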

2) Fairness-adjusted costs:

One major issue with the above pricing scheme is that it ignores fairness among the service providers: although it maximizes efficiency in terms of the average FL frequency according to Proposition 2, it is possible that the average FL frequency is maximized at an operating point where a few service providers are allocated most of the bandwidth while some service providers obtain very little bandwidth. In this paper, we design and incorporate a fairness-adjusted charging scheme into the above pricing scheme. The payment of service provider $n$ now consists of two parts as follows:

• The first part of the payment depends on the amount of bandwidth $b_n$ allocated to service provider $n$, and the unit price $p$ set by the operator. Specifically, this payment is $p \cdot b_n$.
• The second part of the payment depends on the realized FL frequency $f_n$ of service provider $n$. Specifically, service provider $n$ will be charged a fairness-adjusted cost of $\alpha \cdot (f_n - \log(1 + f_n))$ at the end of the period once $f_n$ has been realized, where $\alpha \in [0, 1]$ is a tunable parameter.

With these payments, service provider $n$'s utility becomes

$$u_n(b; p) = f_n^*(b) - p \cdot b - \alpha \cdot \left( f_n^*(b) - \log(1 + f_n^*(b)) \right) = g_n(b) - p \cdot b \tag{21}$$

where $g_n(b) \triangleq (1 - \alpha) f_n^*(b) + \alpha \log(1 + f_n^*(b))$. Comparing this new utility function Eqn. (21) with Eqn. (17), we make the following remarks. First, the fairness-adjusted cost essentially replaces $f_n^*(b)$ with $g_n(b)$. The decision problem remains largely the same except that now we have a different benefit function. Second, in the new utility function Eqn. (21), given any allocated bandwidth $b$, it is still in the service provider's interest to perform the optimal client-level bandwidth allocation to maximize $f_n(b)$. This is because $g_n(b)$ is an increasing function in $f_n(b)$ for $\alpha \in [0, 1]$. Therefore, we can directly write $g_n(b)$ as a function of the optimal FL frequency $f_n^*(b)$. Third, to charge the fairness-adjusted cost, the network operator does not need to know the exact function $f_n^*(b)$. Rather, it only has to know the realized FL frequency $f_n$ at the end of the current period. This is key to achieving fairness in the multi-bid auction, where FL service providers do not report the complete FL frequency function $f_n^*(b)$.

We call $d_n(p) = (g_n')^{-1}(p)$ the modified bandwidth demand function (mBDF). Likewise, we call $q_n(b) = g_n'(b)$ the modified marginal valuation function (mMVF). The network operator can similarly compute the modified market clearing price (mMCP) $\zeta$ so that $\sum_{n=1}^{N} d_n(\zeta) = B$. Using an argument similar to the one that proves Proposition 2, one can prove Proposition 3 as follows.

Proposition 3. The mMCP $\zeta$ is unique and the resulting bandwidth allocation $(b_1, \dots, b_N)$ maximizes $\sum_{n=1}^{N} \left[ (1 - \alpha) f_n^*(b_n) + \alpha \log(1 + f_n^*(b_n)) \right]$.

Proof. Because $f_n^*(b)$ is a concave increasing function, $\log(1 + f_n^*(b))$ is also concave and increasing. This further shows that $g_n(b)$ is concave and increasing. Following arguments similar to those in the proof of Proposition 2 shows that the bandwidth allocation resulting from the mMCP maximizes $\sum_{n=1}^{N} g_n(b_n)$.

The parameter $\alpha$ makes a tradeoff between efficiency and fairness. On the one hand, setting $\alpha = 0$ reduces the problem to the total FL frequency maximization problem. On the other hand, setting $\alpha = 1$ achieves proportional fairness among the service providers.

C. Bandwidth Allocation and Charging Rules

Now, we are ready to describe the bandwidth allocation and charging rules in the fairness-adjusted multi-bid auction. In this subsection, each service provider $n$ submits only a multi-bid $s_n = (s_n^1, \dots, s_n^M)$ instead of the complete FL frequency function $f_n^*(b)$. However, we will assume that the service providers truthfully submit their bids, which will be shown to hold (approximately) in the next subsection. Specifically, we say that a bid $s_n^m = (b_n^m, p_n^m)$ is truthful if the bandwidth demand $b_n^m$ and the price $p_n^m$ that FL service provider $n$ is willing to pay satisfy the mBDF, because it reveals FL service provider $n$'s true valuation of bandwidth after taking into consideration the fairness-adjusted costs. A multi-bid is truthful if all of its bids are truthful.

Definition 1. (Truthful Multi-bid) A multi-bid $s_n = (s_n^1, \dots, s_n^M)$ is truthful if $\forall m$, $s_n^m = (b_n^m, p_n^m)$ is such that $p_n^m = g_n'(b_n^m)$.

The network operator does not know the BDF (and hence the mBDF) of each FL service provider $n$ because it does not have access to the FL frequency function $f_n^*$. Nonetheless, if service provider $n$ submits a truthful multi-bid $s_n$, then the operator can compute a pseudo-mBDF from these bids to approximate the actual mBDF. Specifically, given the submitted multi-bid $s_n$, a left-continuous step function can be used to describe the pseudo-mBDF as follows,

$$\bar{d}_n(p) = \begin{cases} 0, & \text{if } p_n^M < p \\ \max_{1 \le m \le M} \{ b_n^m : p_n^m \ge p \}, & \text{otherwise} \end{cases} \tag{22}$$

Essentially, the pseudo-mBDF uses $b_n^m$ to approximate the bandwidth demand for prices in the range $(p_n^{m-1}, p_n^m]$. Similarly, the operator can also construct a pseudo-mMVF, an approximation of service provider $n$'s actual mMVF using the submitted multi-bid, as follows,

$$\bar{q}_n(b) = \begin{cases} 0, & \text{if } b_n^1 < b \\ \max_{1 \le m \le M} \{ p_n^m : b_n^m \ge b \}, & \text{otherwise} \end{cases} \tag{23}$$

In other words, the pseudo-mMVF uses $p_n^m$ to approximate the marginal value for bandwidth allocations in the range $(b_n^{m+1}, b_n^m]$.
We illustrate the pseudo-mBDF and pseudo-mMVF in Figure 3.

Fig. 3. Pseudo-mBDF and Pseudo-mMVF.
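The two step functions of Eqns. (22) and (23) can be sketched directly from a submitted multi-bid. In this minimal example a multi-bid is a list of (bandwidth, price) pairs; the function names and the sample bids are hypothetical.

```python
from typing import List, Tuple

MultiBid = List[Tuple[float, float]]  # (bandwidth b_n^m, price p_n^m) pairs


def pseudo_demand(bids: MultiBid, p: float) -> float:
    """Pseudo-mBDF of Eqn. (22): largest bandwidth among bids priced at least p."""
    eligible = [b for b, price in bids if price >= p]
    return max(eligible, default=0.0)


def pseudo_marginal_valuation(bids: MultiBid, b: float) -> float:
    """Pseudo-mMVF of Eqn. (23): highest price among bids requesting at least b."""
    eligible = [price for bw, price in bids if bw >= b]
    return max(eligible, default=0.0)


bids = [(6.0, 1.0), (4.0, 2.0), (2.0, 3.0)]  # sorted by increasing price

print(pseudo_demand(bids, 1.5))              # 4.0: bids priced 2 and 3 qualify
print(pseudo_demand(bids, 3.5))              # 0.0: price above the highest bid
print(pseudo_marginal_valuation(bids, 3.0))  # 2.0: bids for 6 and 4 units qualify
print(pseudo_marginal_valuation(bids, 7.0))  # 0.0: more than the largest request
```

Both functions are non-increasing step functions whose jumps sit at the submitted prices and bandwidths, which is why only the $M$ bids are needed to evaluate them.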

The aggregated pseudo-mBDF is the sum of the pseudo-mBDFs of all FL service providers:

$$\bar{d}(p) = \sum_{n=1}^{N} \bar{d}_n(p) \tag{24}$$

The pseudo-mMCP $\bar{\zeta}$ is the largest possible price so that the aggregated pseudo-mBDF exceeds the total available bandwidth, i.e.,

$$\bar{\zeta} = \sup \{ p : \bar{d}(p) > B \} \tag{25}$$

This implies that reducing the pseudo-mMCP by just a little bit will result in the supply (i.e., the total available bandwidth $B$) being no greater than the demand. Because every individual pseudo-mBDF is a step function with $M$ steps, the aggregated pseudo-mBDF is also a step function with at most $NM$ steps. Therefore, the complexity of computing $\bar{\zeta}$ is at most $O(NM)$.

Next, we describe our bandwidth allocation and charging rules. For notational convenience, we denote $y(x^+) = \lim_{z \to x, z > x} y(z)$ when this limit exists for a function $y : \mathbb{R} \to \mathbb{R}$ and all $x \in \mathbb{R}$.
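The pseudo-mMCP of Eqn. (25) can be found by scanning the submitted prices, since the aggregated pseudo-mBDF only changes at those points. The sketch below is a simple illustration with hypothetical names and bids; it re-evaluates the aggregate demand at each candidate price for clarity rather than for speed, and it returns 0.0 as an assumed fallback when demand never exceeds $B$.

```python
from typing import List, Tuple

MultiBid = List[Tuple[float, float]]  # (bandwidth, price) pairs of one provider


def pseudo_demand(bids: MultiBid, p: float) -> float:
    """Pseudo-mBDF of Eqn. (22) for one provider."""
    return max((b for b, price in bids if price >= p), default=0.0)


def aggregate_demand(multibids: List[MultiBid], p: float) -> float:
    """Aggregated pseudo-mBDF of Eqn. (24)."""
    return sum(pseudo_demand(bids, p) for bids in multibids)


def pseudo_clearing_price(multibids: List[MultiBid], B: float) -> float:
    """Largest submitted price at which aggregate pseudo-demand still exceeds B."""
    prices = sorted({price for bids in multibids for _, price in bids}, reverse=True)
    for p in prices:
        if aggregate_demand(multibids, p) > B:
            return p
    return 0.0  # assumption: demand never exceeds supply, so no price clears


multibids = [
    [(6.0, 1.0), (4.0, 2.0), (2.0, 3.0)],  # provider 1
    [(5.0, 1.0), (3.0, 2.5), (1.0, 4.0)],  # provider 2
]
print(pseudo_clearing_price(multibids, B=6.0))   # 2.0
print(pseudo_clearing_price(multibids, B=20.0))  # 0.0
```

In the first call the aggregate demand is 5 at price 2.5 and 7 at price 2, so 2 is the largest price at which demand still exceeds the supply of 6. Because the step function is left-continuous, the supremum in Eqn. (25) is attained at one of the submitted prices.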

1) Bandwidth allocation:

With the pseudo-mMCP $\bar{\zeta}$, our bandwidth allocation rule is as follows: if FL service provider $n$ submits the multi-bid $s_n$ (and thereby declares the associated functions $\bar{d}_n$ and $\bar{q}_n$), then it receives bandwidth $b_n(s_n, s_{-n})$, with

$$b_n(s_n, s_{-n}) = \bar{d}_n(\bar{\zeta}^+) + \frac{\bar{d}_n(\bar{\zeta}) - \bar{d}_n(\bar{\zeta}^+)}{\bar{d}(\bar{\zeta}) - \bar{d}(\bar{\zeta}^+)} \left( B - \bar{d}(\bar{\zeta}^+) \right) \tag{26}$$

In other words: (1) Each FL service provider $n$ receives the amount of bandwidth it asks for at the lowest price $\bar{\zeta}^+$ for which supply exceeds the pseudo-bandwidth demand. (2) If not all bandwidth has been allocated, the surplus $B - \bar{d}(\bar{\zeta}^+)$ is shared among the service providers. This sharing is done proportionally to $\bar{d}_n(\bar{\zeta}) - \bar{d}_n(\bar{\zeta}^+)$. Since $\bar{d}(\bar{\zeta}) - \bar{d}(\bar{\zeta}^+) = \sum_{n=1}^{N} \left( \bar{d}_n(\bar{\zeta}) - \bar{d}_n(\bar{\zeta}^+) \right)$, this ensures that all bandwidth is allocated.
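A minimal sketch of the allocation rule in Eqn. (26) follows. All names and the sample bids are hypothetical. The right limit $\bar{d}_n(\bar{\zeta}^+)$ is evaluated by keeping only bids priced strictly above $\bar{\zeta}$, and the case where demand never exceeds $B$ is handled by an assumed fallback that simply grants each provider its largest request.

```python
from typing import List, Tuple

MultiBid = List[Tuple[float, float]]  # (bandwidth, price) pairs of one provider


def pseudo_demand(bids: MultiBid, p: float, strict: bool = False) -> float:
    """Pseudo-mBDF at price p; strict=True gives the right limit at p."""
    eligible = [b for b, price in bids if (price > p if strict else price >= p)]
    return max(eligible, default=0.0)


def pseudo_clearing_price(multibids: List[MultiBid], B: float) -> float:
    """Largest submitted price at which aggregate pseudo-demand exceeds B."""
    prices = sorted({price for bids in multibids for _, price in bids}, reverse=True)
    for p in prices:
        if sum(pseudo_demand(bids, p) for bids in multibids) > B:
            return p
    return 0.0  # assumed fallback: demand never exceeds supply


def allocate(multibids: List[MultiBid], B: float) -> List[float]:
    """Bandwidth allocation of Eqn. (26)."""
    zeta = pseudo_clearing_price(multibids, B)
    base = [pseudo_demand(bids, zeta, strict=True) for bids in multibids]
    jump = [pseudo_demand(bids, zeta) - d for bids, d in zip(multibids, base)]
    total_jump = sum(jump)
    if total_jump == 0.0:
        return base  # nothing to share: every request is already satisfied
    surplus = B - sum(base)
    return [d + j / total_jump * surplus for d, j in zip(base, jump)]


multibids = [
    [(6.0, 1.0), (4.0, 2.0), (2.0, 3.0)],  # provider 1
    [(5.0, 1.0), (3.0, 2.5), (1.0, 4.0)],  # provider 2
]
print(allocate(multibids, B=6.0))  # [3.0, 3.0]
```

Here the pseudo-mMCP is 2. Just above that price the providers ask for 2 and 3 units, leaving a surplus of 1 unit. Only provider 1 has a demand jump at price 2, so it receives the whole surplus and the allocation sums to $B$.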

2) Charging:

Given the submitted multi-bids $s$, each service provider $n$ is charged a payment $c_n(s)$ as follows,

$$c_n(s_n, s_{-n}) = \sum_{j \ne n} \int_{b_j(s)}^{b_j(s_{-n})} \bar{q}_j(b) \, db + \alpha \cdot \left( f_n^*(b_n) - \log(1 + f_n^*(b_n)) \right) \tag{27}$$

The first term on the right-hand side is based on the exclusion-compensation principle in second-price auction mechanisms [40]: service provider $n$ pays so as to cover the “social opportunity cost”, namely the loss of utility it imposes on all other service providers by its presence. The second term on the right-hand side is the fairness-adjusted cost, which is charged at the end of each period after the actual FL frequency is realized and observed.

Considering both the achieved FL frequency and the payment, FL service provider $n$'s utility is therefore

$$u_n(s) = f_n^*(b_n(s)) - c_n(s) \tag{28}$$

D. Incentives of Truthful Reporting

In the previous subsection, we assumed that every service provider truthfully submits its bid. Now, we prove that this assumption indeed “approximately” holds under the designed bandwidth allocation and charging rules.

We first study the individual rationality of the designed mechanism.

Definition 2. A mechanism is said to be individually rational if no service provider can be worse off from participating in the auction than if it had declined to participate.

Proposition 4. If FL service provider $n$ submits a truthful multi-bid $s_n$, then $u_n(s) \ge 0$.

Proof. By Lemma 1, it is straightforward to see that $g_n(b)$ has the following properties:

• $g_n(b)$ is differentiable and $g_n(0) = 0$
• $g_n'(b)$ is positive, non-increasing and continuous
• $\exists \gamma_n > 0, \forall b \ge 0, \; g_n'(b) \ne 0 \Rightarrow \forall \tilde{b} < b, \; g_n'(b) \le g_n'(\tilde{b}) - \gamma_n (b - \tilde{b})$.

Therefore, $g_n(b)$ satisfies [Assumption 1, [35]]. According to [Property 10, [35]], we have

$$\sum_{j \ne n} \int_{b_j(s)}^{b_j(s_{-n})} \bar{q}_j(b) \, db \le g_n(b_n(s_n, s_{-n})) \tag{29}$$

which is equivalent to $c_n(s_n, s_{-n}) \le f_n^*(b_n(s_n, s_{-n}))$. Therefore, $u_n(s) \ge 0$.

Next, we show that truthful reporting is approximately incentive compatible, i.e., a service provider cannot do much better than simply reveal its true valuation.

Proposition 5. Consider any truthful multi-bid $s_n$ for service provider $n$, and any other multi-bid $\tilde{s}_n \ne s_n$. For all $s_{-n}$, we have

$$u_n(s_n, s_{-n}) \ge u_n(\tilde{s}_n, s_{-n}) - \Delta_n \tag{30}$$

where

$$\Delta_n = \max_{0 \le m \le M} \int_{d_n(p_n^{m+1})}^{d_n(p_n^m)} \left( q_n(b) - p_n^m \right) db \tag{31}$$

with $p_n^{M+1} = q_n(0)$ and $p_n^0 = p_0$.

Proof. The proof follows [Proposition 2, [35]].

The above proposition shows that if service provider $n$ submits a truthful multi-bid $s_n$, then every other multi-bid $\tilde{s}_n$ necessarily corresponds to an increase of utility no larger than $\Delta_n$. In other words, truthful bidding brings service provider $n$ the best utility possible up to a gap $\Delta_n$. Importantly, this value does not depend on the number of other service providers or the multi-bids they submit. In game-theoretic terminology, the situation where all service providers submit truthful multi-bids is an ex post $\Delta$-Nash equilibrium, where $\Delta = \max_n \Delta_n$, in the sense that no service provider could have improved its utility by more than $\Delta$ if it had submitted a different multi-bid.

E. A Uniform Multi-Bidding Example

To conclude the multi-bid auction mechanism design, we illustrate a uniform multi-bidding approach as an example of how to decide the multi-bid of an individual service provider. Instead of having the service provider submit both prices and bandwidth requests, the operator can announce $M$ prices $(p_n^1, \dots, p_n^M)$ to service provider $n$ and let service provider $n$ report its requested bandwidth $(b_n^1, \dots, b_n^M)$ at these price points. This way, the operator has better control over how the service providers make multi-bids and can avoid multi-bids that result in a large $\Delta_n$, which may reduce the service providers' incentives to report truthfully. Because the operator does not know the demand function of service provider $n$, a natural approach is to uniformly distribute these $M$ prices in the range $[p_0, p_n^{\max}]$, where $p_n^{\max}$ is the largest price at which the service provider may still request a positive amount of bandwidth. Specifically,

$$p_n^{\max} = p_n(0) = f_n^{*\prime}(0) = \left( \sum_{k=1}^{K_n} \alpha_{n,k} \right)^{-1} = \left( \sum_{k=1}^{K_n} \left( \frac{s_n^{DT}}{r_k^{DT}} + \frac{s_n^{UT}}{r_k^{UT}} \right) \right)^{-1} \tag{32}$$

Assume that the network operator has prior knowledge $K_n$, $s_n^{DT}$, $s_n^{UT}$, $\bar{r}_n^{DT}$ and $\bar{r}_n^{UT}$ on the lower/upper bounds on the parameters. Then $p_n^{\max}$ can be upper bounded by

$$p_n^{\max} \le K_n^{-1} \cdot \left( \frac{s_n^{DT}}{\bar{r}_n^{DT}} + \frac{s_n^{UT}}{\bar{r}_n^{UT}} \right)^{-1} \triangleq \bar{p}_n^{\max} \tag{33}$$

Thus, the operator can set the uniform prices as

$$p_n^m = p_0 + m \cdot \frac{\bar{p}_n^{\max} - p_0}{M + 1}, \quad \forall m \in \{1, \dots, M\} \tag{34}$$

Note that there is an intrinsic trade-off in the choice of $M$. On the one hand, a large $M$ allows the pseudo-BDF and pseudo-MVF to more accurately reflect the true BDF and MVF, at the cost of increased complexity and signaling overhead. On the other hand, a smaller $M$ makes multi-bidding easier, but the discrepancy between the pseudo functions and the true functions will introduce a larger performance loss.

VI. SIMULATIONS

In this section, we conduct simulations to evaluate the performance of the proposed methods.

A. Simulation Setup

The simulated wireless network adopts an OFDMA system with a total bandwidth of $B = 10$ MHz. The period length is set as $T = 20$ s. The number of clients of an FL service is drawn from a Normal distribution with mean 25. In every period, a new FL task may start following a scheduled plan, which is defined by a Poisson distribution with the mean interval $p_{\text{arrive}}$. By tuning $p_{\text{arrive}}$, we adjust the FL service demand: a smaller $p_{\text{arrive}}$ is more likely to lead to more concurrent FL services in a period, as an FL service often lasts multiple periods. Each FL service has a pre-determined target training accuracy, and when the accuracy reaches the target, the FL service terminates and exits the wireless network. The clients' wireless channel gain is modeled as independent free-space fading, where the average path loss is drawn from a Normal distribution with different mean and variance in different circumstances. The variance of the complex white Gaussian channel noise is set as − . For each client, the local training time is uniformly randomly drawn from [0 . , . s. We fix the global aggregation time to be × − . We consider typical neural network sizes in the range of [0 . , . Mbits. The upload transmission power is uniformly randomly drawn between 0.05 and 0.15 W, and the download transmission power is uniformly randomly drawn between 0.1 and 0.3 W.

B. Convergence of DISBA in the Cooperative Case

We first illustrate the convergence behavior of DISBA in the cooperative FL service provider case in a representative period with 5 concurrent FL services. These services have 10, 12, 14, 16, 18 clients, respectively. In Figure 4, we show the computed FL frequency for each service provider before convergence. As Figure 5 shows, the bandwidth allocation quickly converges to the optimal allocation for a convergence tolerance gap ε = 1e− . The resulting FL frequencies of these FL services in this period are reported in Table I. We further show in Table II the computation time of DISBA for different values of the tolerance gap and step size. The time values are measured on a desktop computer with an Intel Core i5-9400 2.9 GHz CPU and 16 GB memory.

[Plot: FL frequency versus number of iterations for Services 1 to 5]

Fig. 4. Frequency of Each Service before Convergence

[Plot: bandwidth allocation versus number of iterations for Services 1 to 5]

Fig. 5. Bandwidth of Each Service before Convergence

TABLE I
RESULTED BANDWIDTH ALLOCATION AND FREQUENCY OF EACH FL SERVICE (COOPERATIVE)

Service Index   Number of Clients   Bandwidth Ratio   Frequency
1               10                  0.182             113
2               12                  0.196             107
3               14                  0.209             102.6
4               16                  0.205             90.4
5               18                  0.205             81.2

C. Fairness-adjusted Multi-bid Auction in the Selfish Case

We perform the fairness-adjusted multi-bid auction in the same representative period as in the last subsection, with M = 5 and α = 0 . . The pseudo-mBDFs of the FL service providers and the aggregated pseudo-mBDF are illustrated in Figures 6 and 7, respectively. The pseudo-MCP is also shown in Figure 7. Table III reports the resulting bandwidth allocation and achieved FL frequency.

[Plot: pseudo-mBDF (bandwidth versus price) of Services 1, 3 and 5]

Fig. 6. Pseudo-mBDF of Individual FL Services

[Plot: aggregated pseudo-mBDF (total bandwidth versus price), with the pseudo-MCP marked]

Fig. 7. Aggregated Pseudo-mBDF and Pseudo-MCP

TABLE II
COMPUTATIONAL COMPLEXITY FOR THE COOPERATIVE PROVIDER CASE
[Table columns: Tolerated Gap, Step Size]

TABLE III
OPTIMAL BANDWIDTH AND FREQUENCY OF EACH SERVICE (SELFISH)

Service Index   Number of Clients   Bandwidth Ratio   Frequency
1               10                  0.164             105.82
2               12                  0.177             99.52
3               14                  0.217             105.46
4               16                  0.218             94.4
5               18                  0.223             86.56

As we briefly mentioned in Section V, there is a trade-off when selecting the number of bids $M$. On the one hand, a larger $M$ increases the computational complexity of searching for the pseudo-MCP and determining the eventual bandwidth allocation. On the other hand, a larger $M$ improves the precision of the pseudo-MCP, thereby improving the allocation performance. In Figure 8, we demonstrate the overall performance by varying $M$. As can be seen, as $M$ increases, the overall performance increases, but each FL service provider needs to submit more bids to the server, which causes transmission delays and data backlogs.

The parameter $\alpha$ plays an important role in the selfish owner case, as it makes a tradeoff between efficiency and fairness. With a larger $\alpha$, the system places more weight on fairness; conversely, with a smaller $\alpha$, the system is more concerned with the overall efficiency. The market clearing price is shown in Figure 9 and the overall utility is shown in Figure 10. As $\alpha$ increases, the market clearing price and the total utility decrease, which can be treated as a concession to achieve fairness between different FL services.

[Plot: overall performance versus the value of M]

Fig. 8. Overall Performance in the Selfish Service Providers Case with Different M

[Plot: pseudo-MCP versus the value of α]

Fig. 9. Pseudo-MCP with Different α

[Plot: total utility versus the value of α]

Fig. 10. Total Utility with Different α

D. Performance Comparison

In the following experiments, we compare our proposed algorithms with three benchmark algorithms.

• Equal-Client (EC): Bandwidth is equally allocated to the clients. Therefore, each client gets a bandwidth of $B / \sum_n K_n$.
• Equal-Service (ES): Bandwidth is equally allocated to the FL services. That is, each FL service gets a bandwidth of $B/N$. However, each FL service provider still performs the optimal intra-service bandwidth allocation among its clients.
• Proportional (PP): Each FL service obtains a bandwidth that is proportional to the number of its clients. That is, FL service $n$ obtains a bandwidth of $\frac{K_n}{\sum_j K_j} B$. This bandwidth is further allocated among its clients following the optimal intra-service bandwidth allocation.

We start by comparing the proposed algorithms with the benchmarks in the per-period setting. The overall performance is shown in Figure 11. In this setting, there are five FL services with a random number of clients drawn from a Normal distribution with mean 20 and variance 10 and random channel conditions drawn from a Normal distribution with mean 85 and variance 15, and the result is averaged over 20 runs. As can be seen, our DISBA algorithm for the cooperative case (labeled as Coop) has the best performance, and the auction mechanism for the selfish case (labeled as Self) also outperforms the other benchmarks. Although ES and PP also perform the intra-service bandwidth allocation, the heterogeneity of the client numbers and channel conditions renders them suboptimal.
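The three benchmark rules listed above (EC, ES and PP) reduce to simple arithmetic on the total bandwidth and the client counts. The following sketch uses hypothetical function names and, as an example input, the client counts of the representative period from Section VI-B.

```python
from typing import List


def equal_client_share(B: float, clients: List[int]) -> float:
    """EC: bandwidth received by every single client, B / sum_n K_n."""
    return B / sum(clients)


def equal_service(B: float, clients: List[int]) -> List[float]:
    """ES: every FL service receives B / N, regardless of its size."""
    n_services = len(clients)
    return [B / n_services for _ in clients]


def proportional(B: float, clients: List[int]) -> List[float]:
    """PP: service n receives K_n / sum_j K_j of the total bandwidth."""
    total_clients = sum(clients)
    return [B * k / total_clients for k in clients]


B = 10.0                        # total bandwidth in MHz
clients = [10, 12, 14, 16, 18]  # K_n for five services

print(equal_client_share(B, clients))  # per-client share under EC
print(equal_service(B, clients))       # per-service bandwidth under ES
print(proportional(B, clients))        # per-service bandwidth under PP
```

Note that EC and PP hand the same total bandwidth to each service; they differ only in that PP lets the provider redistribute its share optimally among its clients, whereas EC fixes an equal per-client share.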

[Bar chart: per-period FL performance of Coop, Self, EC, PP and ES]

Fig. 11. Per-period FL Performance of Different Algorithms

Because FL is a long-term process, we further investigate the long-term performance of the proposed algorithms. In the long-term setting, 10 FL services join the wireless network at different times controlled by the $p_{\text{arrive}}$-parameterized Poisson process, and an FL service is removed from the wireless network when its test accuracy has converged. Although the convergence of FL is affected in a complex way by many factors, including the adopted FL algorithm, the dataset and the selected clients, we assume that each of these 10 FL services requires 2000 FL rounds, which is a typical value observed in the literature [2], to reach convergence, in order to provide a meaningful comparison of the algorithms in a controlled environment. Whenever an FL service has been run for 2000 rounds, it exits the system.

Figure 12 illustrates the average duration (in terms of the number of periods) of all FL services by running different algorithms for $p_{\text{arrive}} = 5$, where the client number of an FL service is drawn from a Normal distribution with mean 25 and variance 15 and the channel condition of an FL service is drawn from a Normal distribution with mean 85 and variance 15. The results are averaged over 20 runs. We can see that the proposed algorithms achieve the smallest average duration compared to the benchmarks, confirming their fast FL convergence even in the long run.

[Bar chart: average duration (unit: period) of Coop, Self, EC, PP and ES]

Fig. 12. Average Duration Period of FL Services

Next, we study the impact of the client number heterogeneity (which reflects the FL service size heterogeneity) on the performance of different algorithms. To this end, the client number of an FL service is drawn from a Normal distribution with mean 25 and we change the variance between 0 and 15 to adjust the heterogeneity degree. The result is shown in Figure 13: as the variance increases (i.e., a higher degree of heterogeneity), the mean of the average duration decreases, while the standard deviation of the average duration increases. This is understandable because a higher degree of heterogeneity causes wireless bandwidth to be more unevenly distributed among the FL services, thereby degrading the overall FL performance. Notably, the performance gain of our proposed algorithms increases as the variance increases, which demonstrates the superior ability of our algorithms to handle the heterogeneous case.

Furthermore, we also investigate the impact of the channel condition heterogeneity on the FL performance. In these simulations, the average channel condition of an FL service is drawn from a Normal distribution with mean 85 and we change the variance between 0 and 15 to adjust the heterogeneity degree. The channel conditions of the clients of this FL service are further drawn from a Normal distribution with a mean equal to the instantiated average channel condition. In Figure 14, we observe a similar phenomenon as in Figure 13, which further confirms the advantage of adopting our proposed algorithms.

[Plot: average duration (unit: period) versus variance of the client number distribution, for Coop, Self, EC, PP and ES]

Fig. 13. Impact of Client Number Heterogeneity

[Plot: average duration (unit: period) versus variance of the channel condition distribution, for Coop, Self, EC, PP and ES]

Fig. 14. Impact of the Channel Condition Heterogeneity

Finally, we study the influence of the mean arrival interval parameter $p_{\text{arrive}}$ on the resulting average FL duration. In Figure 15, as $p_{\text{arrive}}$ increases, the average duration of the FL services decreases. This is because when $p_{\text{arrive}}$ is small, many FL services pile up and co-exist in the wireless network, thereby reducing the wireless bandwidth an individual FL service can receive.

[Plot: average duration (unit: period) versus mean service arrival interval $p_{\text{arrive}}$, for Coop and Self]

Fig. 15. Average Duration With Varying $p_{\text{arrive}}$

VII. CONCLUSION

This paper studied a bandwidth allocation problem for multiple FL services in a wireless network, which has not been well studied in the literature. The considered problem consists of two interconnected subproblems: intra-service resource allocation and inter-service resource allocation. By solving these problems, we optimally allocate bandwidth resources to multiple FL services and their corresponding clients to speed up the training process while guaranteeing fairness in both the cooperative and the selfish FL service provider cases. Our method has shown superior performance compared to the benchmarks. Several directions for future research can extend this work. For example, this paper takes FL frequency as the key metric to be optimized, but the true performance of FL is affected by the dataset, the federated optimization algorithm, and many other factors. In addition, when a client can simultaneously participate in multiple FL services, resource allocation has to consider both the wireless bandwidth and the client computing resources.

REFERENCES

[1] J. Park, S. Samarakoon, M. Bennis, and M. Debbah, “Wireless network intelligence at the edge,”

Proceedings of the IEEE, vol. 107, no. 11, pp. 2204–2239, 2019.
[2] J. Konecny, H. B. McMahan, F. X. Yu, P. Richtarik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” arXiv preprint arXiv:1610.05492, 2016.
[3] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
[4] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, “A joint learning and communications framework for federated learning over wireless networks,” IEEE Transactions on Wireless Communications, 2020.
[5] N. H. Tran, W. Bao, A. Zomaya, N. M. NH, and C. S. Hong, “Federated learning over wireless networks: Optimization model design and analysis,” in IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE, 2019, pp. 1387–1395.
[6] S. P. Karimireddy, S. Kale, M. Mohri, S. J. Reddi, S. U. Stich, and A. T. Suresh, “Scaffold: Stochastic controlled averaging for on-device federated learning,” arXiv preprint arXiv:1910.06378, 2019.
[7] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” arXiv preprint arXiv:1812.06127, 2018.
[8] F. Haddadpour and M. Mahdavi, “On the convergence of local descent methods in federated learning,” arXiv preprint arXiv:1910.14425, 2019.
[9] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergence of fedavg on non-iid data,” arXiv preprint arXiv:1907.02189, 2019.
[10] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, “Federated learning with non-iid data,” arXiv preprint arXiv:1806.00582, 2018.
[11] F. Sattler, S. Wiedemann, K.-R. Muller, and W. Samek, “Robust and communication-efficient federated learning from non-iid data,” IEEE Transactions on Neural Networks and Learning Systems, 2019.
[12] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated multi-task learning,” Advances in Neural Information Processing Systems, vol. 30, pp. 4424–4434, 2017.
[13] C. Fung, C. J. Yoon, and I. Beschastnikh, “Mitigating sybils in federated learning poisoning,” arXiv preprint arXiv:1808.04866, 2018.
[14] H. Kim, J. Park, M. Bennis, and S.-L. Kim, “Blockchained on-device federated learning,” IEEE Communications Letters, vol. 24, no. 6, pp. 1279–1283, 2019.
[15] M. Mohri, G. Sivek, and A. T. Suresh, “Agnostic federated learning,” arXiv preprint arXiv:1902.00146, 2019.
[16] T. Li, M. Sanjabi, A. Beirami, and V. Smith, “Fair resource allocation in federated learning,” arXiv preprint arXiv:1905.10497, 2019.
[17] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50–60, 2020.
[18] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, and C. Miao, “Federated learning in mobile edge networks: A comprehensive survey,” IEEE Communications Surveys & Tutorials, 2020.
[19] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 2, pp. 1–19, 2019.
[20] T. Chen, G. Giannakis, T. Sun, and W. Yin, “Lag: Lazily aggregated gradient for communication-efficient distributed learning,” in Advances in Neural Information Processing Systems, 2018, pp. 5050–5060.
[21] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally, “Deep gradient compression: Reducing the communication bandwidth for distributed training,” arXiv preprint arXiv:1712.01887, 2017.
[22] A. F. Aji and K. Heafield, “Sparse communication for distributed gradient descent,” arXiv preprint arXiv:1704.05021, 2017.
[23] L. Liu, J. Zhang, S. Song, and K. B. Letaief, “Client-edge-cloud hierarchical federated learning,” in ICC 2020-2020 IEEE International Conference on Communications (ICC). IEEE, 2020, pp. 1–6.
[24] K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” IEEE Transactions on Wireless Communications, vol. 19, no. 3, pp. 2022–2035, 2020.
[25] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, “Adaptive federated learning in resource constrained edge computing systems,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1205–1221, 2019.
[26] Y. Zhan, P. Li, and S. Guo, “Experience-driven computational resource allocation of federated learning by deep reinforcement learning,” in Proc. of IPDPS, 2020.
[27] J. Xu and H. Wang, “Client selection and bandwidth allocation in wireless federated learning networks: A long-term perspective,” arXiv preprint arXiv:2004.04314, 2020.
[28] Q. Zeng, Y. Du, K. Huang, and K. K. Leung, “Energy-efficient radio resource allocation for federated edge learning,” in . IEEE, 2020, pp. 1–6.
[29] W. Shi, S. Zhou, and Z. Niu, “Device scheduling with fast convergence for wireless federated learning,” in ICC 2020-2020 IEEE International Conference on Communications (ICC). IEEE, 2020, pp. 1–6.
[30] T. Nishio and R. Yonetani, “Client selection for federated learning with heterogeneous resources in mobile edge,” in ICC 2019-2019 IEEE International Conference on Communications (ICC). IEEE, 2019, pp. 1–7.
[31] M. Chen, H. V. Poor, W. Saad, and S. Cui, “Convergence time optimization for federated learning over wireless networks,” arXiv preprint arXiv:2001.07845, 2020.
[32] M. N. Nguyen, N. H. Tran, Y. K. Tun, Z. Han, and C. S. Hong, “Toward multiple federated learning services resource sharing in mobile edge networks,” arXiv preprint arXiv:2011.12469, 2020.
[33] L. Massoulie and J. Roberts, “Bandwidth sharing: objectives and algorithms,” in IEEE INFOCOM’99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No. 99CH36320), vol. 3. IEEE, 1999, pp. 1395–1403.
[34] D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1439–1451, 2006.
[35] P. Maille and B. Tuffin, “Multibid auctions for bandwidth allocation in communication networks,” in IEEE INFOCOM 2004, vol. 1. IEEE, 2004.
[36] S. Feng, D. Niyato, P. Wang, D. I. Kim, and Y.-C. Liang, “Joint service pricing and cooperative relay communication for federated learning,” in . IEEE, 2019, pp. 815–820.
[37] Y. Sarikaya and O. Ercetin, “Motivating workers in federated learning: A stackelberg game perspective,” IEEE Networking Letters, vol. 2, no. 1, pp. 23–27, 2019.
[38] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y.-C. Liang, and D. I. Kim, “Incentive design for efficient federated learning in mobile networks: A contract theory approach,” in . IEEE, 2019, pp. 1–5.
[39] T. H. T. Le, N. H. Tran, Y. K. Tun, Z. Han, and C. S. Hong, “Auction based incentive design for efficient federated learning in cellular wireless networks,” in . IEEE, 2020, pp. 1–6.
[40] W. Vickrey, “Counterspeculation, auctions, and competitive sealed tenders,”