Distributed Fair Scheduling for Information Exchange in Multi-Agent Systems

Abstract

Information exchange is a crucial component of many real-world multi-agent systems. However, the communication between the agents involves two major challenges: the limited bandwidth, and the shared communication medium between the agents, which restricts the number of agents that can simultaneously exchange information. While both of these issues need to be addressed in practice, the impact of the latter problem on the performance of the multi-agent systems has often been neglected. This becomes even more important when the agents' information or observations have different importance, in which case the agents require different priorities for accessing the medium and sharing their information. Representing the agents' priorities by fairness weights and normalizing each agent's share by the assigned fairness weight, the goal can be expressed as equalizing the agents' normalized shares of the communication medium. To achieve this goal, we adopt a queueing theoretic approach and propose a distributed fair scheduling algorithm for providing weighted fairness in single-hop networks. Our proposed algorithm guarantees an upper-bound on the normalized share disparity among any pair of agents. This can particularly improve the short-term fairness, which is important in real-time applications. Moreover, our scheduling algorithm adjusts itself dynamically to achieve a high throughput at the same time. The simulation results validate our claims and comparisons with the existing methods show our algorithm's superiority in providing short-term fairness, while achieving a high throughput.


Majid Raeis, S. Jamaloddin Golestani
University of Toronto, Toronto, Canada; Sharif University of Technology, Tehran


Introduction

Many real-world multi-agent systems rely on the information exchange between the agents. However, the communication of the agents involves major practical limitations such as the shared communication medium and the limited bandwidth. Whether the information exchange is required for the coordination of agents, or solely for the transfer of information from one point to another, the agents cannot transmit their messages simultaneously over the same communication medium. This becomes even more challenging when there is no coordinator to arbitrate access to the medium and therefore a distributed algorithm is required for scheduling the agents' transmissions.

Fairness is a key concern in distributed scheduling algorithms, where some agents might hog the communication medium for a long period of time and therefore lead to starvation of the other agents. Moreover, the agents' messages might have different importance or different latency requirements. Information exchange in multi-agent reinforcement learning (MARL) is one such example, where the importance of each agent's partially observed information can be different from one another (Kim et al. 2018). Weighted fair queueing, which has long been used as an effective basis for resource allocation and scheduling, is a natural candidate for dealing with these cases. The main goal of the weighted fair queueing algorithms is to provide service in proportion to some specified service shares (also known as fairness weights) to the competing users of a shared resource. The Generalized Processor Sharing (GPS) scheme (Parekh and Gallager 1993) serves as the reference for most of the existing fair queueing algorithms in the literature. The GPS algorithm uses a fluid flow model in which all the users can receive service simultaneously. However, because of the fluid flow assumption, the GPS scheme is not applicable in many real-world applications such as information exchange. Consequently, alternative fair queueing algorithms such as (Golestani 1994; Goyal, Vin, and Cheng 1997) have been proposed to mimic the behaviour of the GPS algorithm in real-world packet-based applications. However, the majority of these algorithms are centralized in implementation and cannot be used directly to provide fair queueing in a distributed manner.

In the information exchange context, the communication medium is the shared resource that needs to be scheduled among the agents. In wireless networks, the medium access control (MAC) protocol is responsible for scheduling the users. IEEE 802.11 DCF (Wi-Fi) is the dominant distributed MAC protocol in wireless networks. (We use the terms 802.11 DCF and Wi-Fi interchangeably to refer to the Distributed Coordination Function (DCF) protocol in IEEE 802.11.) Although there has been some effort in designing distributed fair scheduling algorithms based on 802.11 DCF, almost all these proposed methods take heuristic approaches to emulate a particular centralized fair queueing algorithm without providing any guarantees or theoretical analysis. Moreover, they suffer from short-term unfairness, which is particularly important in real-time applications. In contrast, we propose a distributed fair scheduling algorithm that provides deterministic guarantees on the normalized share disparity among any pair of agents. Specifically, we propose a distributed scheduling algorithm for information exchange in single-hop wireless networks, which guarantees a bounded disparity between the normalized services received by any two agents. To the best of our knowledge, this is the first time that a distributed scheduling algorithm is capable of providing a bounded service disparity among the users. This particularly improves the short-term fairness of our proposed algorithm compared to the existing ones. Furthermore, our algorithm dynamically adjusts itself to balance the trade-off between the fairness and the network throughput.

A Brief Background on 802.11 Wi-Fi

Wi-Fi is the dominant protocol for wireless communication scheduling and is based on CSMA (Carrier-Sense Multiple Access). The main idea of CSMA is that each agent is required to listen to the medium before its transmission. The agent is allowed to transmit its message only if the medium is idle. However, to avoid collisions after busy periods, it uses the Binary Exponential Backoff (BEB) mechanism, which requires the agents to choose random backoffs in the interval $[0, CW]$. Whenever the medium becomes idle, each agent discretizes the time into time slots of predefined length (denoted by $\sigma$) and decrements its backoff counter by one after each idle time slot. The backoff counter is frozen during the busy periods. When the counter reaches zero, the agent is allowed to attempt a transmission. The parameter $CW$ is called the contention window, which is doubled by the agent if its transmission is unsuccessful. $CW$ is reset to its default value once the agent transmits its message successfully. Moreover, 802.11 uses interframe spaces (IFS), which are specified waiting periods between transmission of frames, to prioritize different traffic types (such as control and data).

Related Work

Efficient and fair communication between agents of a system is a challenging task. Particularly, it is a key component in applications that require agents' coordination, such as cooperative multi-agent reinforcement learning. Most of the existing studies on multi-agent communication only focus on the bandwidth limitation of the medium and ignore the shared medium contention between the agents (Goldman and Zilberstein 2003; Mao et al. 2020; Foerster et al. 2016; Zhang and Lesser 2013). For instance, a message pruning mechanism is proposed by (Mao et al. 2020) to improve the bandwidth efficiency of the recent deep RL-based communication methods for multi-agent systems (Foerster et al. 2016; Jiang and Lu 2018; Peng, Zhang, and Luo 2018; Singh, Jain, and Sukhbaatar 2019). In another work, (Zhang, Zhang, and Lin 2019) introduce the Variance Based Control (VBC) technique for efficient communication in MARL, which limits the variance of the exchanged messages between the agents in the training phase. (Goldman and Zilberstein 2003) have developed a theoretical model for decentralized control of a cooperative multi-agent system with communication. This model can be used for studying the trade-off between the value and cost (including bandwidth cost) of the exchanged information, as well as its effect on the joint utility of the agents. While almost all these works focus on the bandwidth efficiency of the communication and ignore the shared medium contention between the agents, (Kim et al. 2018) study the problem of communication scheduling in MARL. The authors propose SchedNet for scheduling the communication of the agents, which uses a weight-based scheduling algorithm to prioritize the agents with more important information.

Figure 1: Modeling the communication medium as a shared server in a queueing model that needs to be scheduled between the agents (panels (a) and (b)).

On the other hand, the scheduling problem has been well studied in the area of wireless communication. Since Wi-Fi is the dominant distributed MAC protocol in wireless networks, most of the existing distributed scheduling algorithms have been proposed based on this protocol. In particular, these algorithms use one of the parameters of 802.11, such as the CW size (Qiao and Shin 2002), IFS (Shreedhar and Varghese 1996; Lee, Liao, and Chen 2007) or backoff intervals (Vaidya et al. 2005), to emulate centralized fair queueing service disciplines in a distributed manner. A common problem with these algorithms is short-term unfairness, which is mainly caused by the random backoffs that are used in the collision resolution mechanism of 802.11 (Choi, Yoo, and Kim 2008). In order to improve the short-term fairness, (Dugar, Vaidya, and Bahl 2001) propose a protocol that uses pulse transmissions for giving priority to the nodes that have experienced a collision in their last attempt. Although this protocol improves the short-term fairness, the order in which packets are served could still change drastically from that of the reference centralized fair queueing algorithm. Moreover, almost all these methods use heuristics to mimic the behaviour of the centralized fair queueing algorithms and do not provide any guarantees or theoretical backing on the fairness or the network throughput.

System Model and Problem Formulation

Consider a network consisting of $N$ agents that need to communicate with each other to achieve a common goal. The set of all agents is represented by $\mathcal{N}$ and each agent $n \in \mathcal{N}$ has a fairness weight of $\phi_n$, which corresponds to its share of the communication medium. Fairness weights are chosen independently and are based on the importance of each agent's information. We assume that the agents can sense the medium and therefore avoid transmitting their messages when the medium is busy. Without loss of generality and solely to simplify the discussion, we assume that all the agents can instantaneously detect if the communication medium becomes busy or idle. Although in practice the agents are only able to detect the medium's status with some delay, the concept of interframe spaces in the 802.11 DCF protocol can be easily merged into our algorithm to address this issue for real-world applications.

Figure 2: An example of using the splitting algorithm (with $m = 2$) for contention resolution.

In order to study this problem, we model the communication medium as a shared resource (server) that needs to be scheduled among the agents. Specifically, each agent has a transmission queue for storing the information that needs to be sent (Fig. 1). At a given time, the agents with non-empty queues are called backlogged agents, while the ones with no information to share are called absent agents. Let $\mathcal{B}(t)$ denote the set of backlogged agents at time $t$, and $\mathcal{A}(t)$ represent the absent agents. In the same way, $\mathcal{B}(t_1, t_2)$ and $\mathcal{A}(t_1, t_2)$ respectively represent the set of agents that are backlogged and absent during the whole interval $(t_1, t_2)$. Each time that an agent successfully transmits a packet, it receives a service equal to the transmitted packet size. We define $W_k(t)$ as the total service provided to agent $k$ during $(0, t)$. Based on our definition, if a packet of agent $k$ is under transmission at $t$, $W_k(t)$ includes any part of it transmitted by time $t$. The normalized service received by agent $k$, $w_k(t)$, is defined as the aggregate service provided to this agent during $(0, t)$ normalized to its fairness weight, i.e.,

$$w_k(t) = W_k(t) / \phi_k, \quad k \in \mathcal{N}.$$

Furthermore, we define the bivariate process $w_k(t_1, t_2)$ as

$$w_k(t_1, t_2) = w_k(t_2) - w_k(t_1), \quad k \in \mathcal{N}.$$

Now, our goal can be formulated as designing a distributed scheduling algorithm that guarantees a bounded service disparity among any pair of backlogged agents, i.e.,

$$|w_k(t_1, t_2) - w_j(t_1, t_2)| \le \varepsilon, \quad k, j \in \mathcal{B}(t_1, t_2), \qquad (1)$$

while achieving a high throughput at the same time. Since our proposed algorithm implements the centralized Self-Clocked Fair Queueing algorithm (Golestani 1994) in a distributed manner to provide the above guarantee, we refer to it as Distributed SCFQ (DSCFQ) in the rest of the paper.
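As a small illustration of these definitions, the sketch below computes normalized services and the largest pairwise disparity that appears on the left-hand side of Eq. (1). The function names, the dictionary layout and the sample numbers are assumptions made for this example; they are not part of the paper, and the set of backlogged agents is simply supplied by the caller.

```python
from itertools import combinations


def normalized_service(total_service, weights):
    """Return w_k(t) = W_k(t) / phi_k for every agent k."""
    return {k: total_service[k] / weights[k] for k in total_service}


def max_disparity(service_t1, service_t2, weights, backlogged):
    """Largest |w_k(t1,t2) - w_j(t1,t2)| over pairs of backlogged agents.

    service_t1 / service_t2 map agent -> W_k(t1) / W_k(t2), e.g. in bits.
    `backlogged` plays the role of the set B(t1, t2).
    """
    w1 = normalized_service(service_t1, weights)
    w2 = normalized_service(service_t2, weights)
    gain = {k: w2[k] - w1[k] for k in backlogged}  # w_k(t1, t2)
    pairs = combinations(sorted(backlogged), 2)
    return max((abs(gain[k] - gain[j]) for k, j in pairs), default=0.0)


# Hypothetical numbers: three agents with fairness weights 10, 8 and 2.
phi = {"a": 10, "b": 8, "c": 2}
W_t1 = {"a": 0, "b": 0, "c": 0}
W_t2 = {"a": 10000, "b": 8400, "c": 1800}
print(max_disparity(W_t1, W_t2, phi, {"a", "b", "c"}))
```

In this made-up example the normalized services are 1000, 1050 and 900, so the largest pairwise gap is 150; a scheduler satisfying Eq. (1) would keep this quantity below $\varepsilon$ on every interval.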

Distributed SCFQ (DSCFQ)

An important source of short-term unfairness in the existing scheduling schemes is that the agents with unsuccessful last attempts have no priority over other agents. In order to address this issue, we split the competing nodes into two classes: class I, which is defined as the set of agents that have to retransmit their messages because of their unsuccessful last attempts, and class II, which consists of the agents with new information to share. Moreover, each agent maintains a collision counter, a scaling factor and a compensation factor. The collision counter keeps track of the number of collisions that the agent has experienced since its last successful transmission. The scaling factor is a parameter for suitable scaling of the backoff intervals, in order to reach an acceptable throughput. Finally, the compensation factor is a parameter that is required for providing short-term fairness; it compensates for the fairness deviations that are caused by the discrete backoff intervals.

Now, let us introduce our distributed scheduling algorithm for providing weighted fairness in a multi-agent system. Let $p_k^i$ denote the $i$th message (packet) of agent $k$. (Throughout the paper, we use the terms message and packet interchangeably.) Moreover, the arrival time, transmission start time and transmission finish time of packet $p_k^i$ are denoted by $a_k^i$, $s_k^i$ and $d_k^i$, respectively. Now, each agent takes the following actions to transmit the backlogged messages in its queue.

Class II agents:

1. At the time message $p_k^i$ reaches the front of agent $k$'s transmitter queue, it is tagged with backoff tag $B_k^i$, calculated as follows:

$$B_k^i = \left\lfloor \alpha \left( \frac{L_k^i}{\phi_k} - \epsilon_k^i \right) \right\rfloor, \qquad (2)$$

where $L_k^i$ and $\alpha$ are the size of packet $p_k^i$ and the scaling factor, respectively, and $\epsilon_k^i$ represents the compensation factor, which is defined as

$$\epsilon_k^1 = 0, \qquad \epsilon_k^i = \epsilon_k^{i-1} + \left( \frac{B_k^{i-1}}{\alpha} - \frac{L_k^{i-1}}{\phi_k} \right). \qquad (3)$$

2. To send $p_k^i$, agent $k$ picks a backoff interval equal to $B_k^i$ slots. Class II agents can start decrementing their backoff counters only after sensing the medium idle for one time slot. The backoff counter is decremented by one after each idle slot and is frozen during the busy periods.
3. Whenever a node's backoff reaches zero, it starts the transmission process for the intended destination.
4. If a transmitting class II agent detects a collision, it stops the transmission, increments its collision counter by 1 and starts following the procedure for class I nodes.

Class I agents: Splitting Algorithm for Collision Resolution:

1. In contrast to class II agents, each class I node can access the medium as soon as it becomes idle, without waiting for an extra idle time slot. In this way, the nodes involved in the collision get prior access to the medium. However, to avoid a guaranteed collision with other class I nodes, we employ a mechanism based on the splitting algorithm (Bertsekas and Gallager 1992) for resolving the collision. In particular, agent $k$ chooses a uniformly distributed integer, $C_k^{q_k}$, in the interval $[(q_k - 1)m + 1,\ q_k m]$, where $q_k$ is node $k$'s collision counter and $m$ denotes the number of branches into which nodes involved in the latest collision are divided. As soon as the medium becomes idle, agent $k$ transmits a pulse for a duration of $C_k^{q_k}$ slots and then starts sensing the medium. If it concludes that the following slot is busy, it defers its transmission and repeats this step, without changing its collision counter. Otherwise, if the medium is idle after its pulse transmission, the agent starts its transmission. Hence, the node with the largest pulse transmission period among class I nodes will be the winner of the contention.
2. If a collision happens during the collision resolution period, nodes involved in the collision increase their collision counters and go to the previous step.
3. Each time a node transmits successfully, it resets its collision counter to 0 and follows the procedure for the class II agents.

Fairness Analysis of DSCFQ
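The analysis in this section builds on the class II tagging rule of Eqs. (2) and (3) and on the class I pulse draw. A minimal sketch of these two per-agent rules is given below, assuming exact rational arithmetic via Python's `fractions` module. The function names and the sample packet sizes are illustrative assumptions, and medium sensing, counter freezing and collisions are not modelled.

```python
import random
from fractions import Fraction
from math import floor


def backoff_tags(packet_sizes, phi, alpha):
    """Backoff tags B_k^i of Eq. (2) with the compensation factor of Eq. (3)."""
    alpha = Fraction(alpha)
    eps = Fraction(0)  # epsilon_k^1 = 0
    tags = []
    for size in packet_sizes:
        share = Fraction(size) / Fraction(phi)  # L_k^i / phi_k
        tag = floor(alpha * (share - eps))      # Eq. (2)
        tags.append(tag)
        eps += Fraction(tag) / alpha - share    # Eq. (3), for the next packet
    return tags, eps


def pulse_length(collision_counter, m, rng=random):
    """Class I pulse length: uniform integer in [(q - 1) * m + 1, q * m]."""
    q = collision_counter
    return rng.randint((q - 1) * m + 1, q * m)


# Hypothetical agent: seven 1500-bit packets, weight 7, scaling factor 1/100.
tags, eps = backoff_tags([1500] * 7, phi=7, alpha=Fraction(1, 100))
print(tags, eps)
```

With these made-up inputs each packet "deserves" 15/7 slots, which is not an integer. The compensation factor carries the rounding deficit forward, so six tags of 2 are followed by one tag of 3 and the accumulated deviation returns to zero. This carry-over is what the lemmas below bound by $1/\alpha$.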

Now let us introduce some definitions, which facilitate our fairness analysis of the proposed scheduling algorithm.

Definition 1. We define the Collision Resolution Period as the whole period of the splitting algorithm, during which all class I nodes receive service, and will refer to it as CRP in the rest of the paper. Furthermore, we define the set of nodes that were involved in the latest collision as the Collision Set.

Definition 2. A generalized time slot is defined as the interval between two consecutive times at which class II nodes are allowed to transmit a packet.

In other words, the generalized time slot equals an idle slot ($\sigma$) if the medium has been idle for at least one time slot. Otherwise, the generalized time slot will be equal to the time interval during which the medium is entirely busy until it gets idle for one time slot. These cases are shown in Fig. 3.

Definition 3. We define the network's virtual time, $v(t)$, as the number of idle slots during interval $(0, t)$, normalized by the scaling factor. Furthermore, for any interval $(t_1, t_2)$, $v(t_1, t_2)$ is defined as $v(t_2) - v(t_1)$.

It is obvious from Definition 3 that the system's virtual time is a cumulative function and as a result, it is a nondecreasing function of time. Whenever a node transmits at the beginning of a slot, all the other nodes will be able to detect that transmission. As a result, all the agents have similar observations about the status of a given slot and therefore, all the agents are capable of tracking the network's virtual time.

In order to compare the normalized services received by different agents, we define the virtual time and the service deviation of each agent as follows:

Definition 4. The virtual time of agent $k$, $v_k(t)$, is defined as

$$v_k(0) = 0, \quad \forall k \in \mathcal{N}, \qquad v_k(t_1, t_2) = \begin{cases} w_k(t_1, t_2), & \forall k \in \mathcal{B}(t_1, t_2), \\ v(t_1, t_2), & \forall k \in \mathcal{A}(t_1, t_2). \end{cases}$$

Figure 3: Illustration of the possible states of a generalized time slot (idle, successful transmission, or collision followed by collision resolution).

Definition 5. We define the normalized service deviation of agent $k$, which represents the difference between the network's and the agent's virtual times, as follows:

$$\delta_k(t) = v(t) - v_k(t), \quad \forall k \in \mathcal{N}.$$

Theorem 1. The maximum normalized service disparity between any pair of agents $k$ and $j$, $k, j \in \mathcal{B}(t_1, t_2)$, in a shared medium with our proposed scheduling algorithm, is bounded as

$$|w_k(t_1, t_2) - w_j(t_1, t_2)| \le \frac{L_k^{max}}{\phi_k} + \frac{L_j^{max}}{\phi_j} + \frac{2}{\alpha}, \qquad (4)$$

where $L_k^{max}$ and $L_j^{max}$ are the maximum packet sizes that can be transmitted by agents $k$ and $j$, respectively.

We prove this theorem through the following set of lemmas.
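To give a feel for the magnitude of the bound in Eq. (4), the following sketch evaluates its right-hand side for a few scaling factors. The maximum packet size of 8000 bits and the values of $\alpha$ are hypothetical choices made for illustration; only the formula itself comes from the theorem.

```python
def disparity_bound(l_max_k, phi_k, l_max_j, phi_j, alpha):
    """Right-hand side of Eq. (4): L_k^max/phi_k + L_j^max/phi_j + 2/alpha."""
    return l_max_k / phi_k + l_max_j / phi_j + 2.0 / alpha


# Hypothetical setting: 8000-bit maximum packets, fairness weights 10 and 1.
for alpha in (0.01, 0.1, 1.0):
    print(alpha, disparity_bound(8000, 10, 8000, 1, alpha))
```

The first two terms depend only on packet sizes and weights, while the last term shrinks as $\alpha$ grows, which is the discretization effect of the backoff tags.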

Lemma 1. Whenever the transmission of an agent's message finishes, the normalized service deviation of the corresponding agent satisfies

$$-\frac{1}{\alpha} < \delta_k(d_k^i) \le 0, \quad \forall i, \ \forall k \in \mathcal{N}. \qquad (5)$$

Proof. Let us define $b_k^i$ as $\max\{a_k^i, d_k^{i-1}\}$. If packet $p_k^i$ arrives before packet $p_k^{i-1}$ has finished its service, $b_k^i = d_k^{i-1}$ and hence

$$\delta_k(b_k^i) = \delta_k(d_k^{i-1}), \quad a_k^i < d_k^{i-1}. \qquad (6)$$

Otherwise, $b_k^i = a_k^i > d_k^{i-1}$ and therefore we have

$$\delta_k(b_k^i) = \delta_k(a_k^i) = \delta_k(d_k^{i-1}) + \delta_k(d_k^{i-1}, a_k^i). \qquad (7)$$

Since agent $k$ is not backlogged during the interval $(d_k^{i-1}, a_k^i)$, we get $\delta_k(d_k^{i-1}, a_k^i) = v(d_k^{i-1}, a_k^i) - v_k(d_k^{i-1}, a_k^i) = 0$. We conclude from Eqs. (6) and (7) that

$$\delta_k(b_k^i) = \delta_k(d_k^{i-1}). \qquad (8)$$

On the other hand, according to the definition of $b_k^i$, all the previous messages of agent $k$ have been sent by time $b_k^i$. So, $p_k^i$ is the only message that will be transmitted by agent $k$ during $(b_k^i, d_k^i)$. Furthermore, transmitting $p_k^i$ requires sensing the medium idle for $B_k^i$ generalized slots. Consequently, during the interval $(b_k^i, d_k^i)$, the normalized service of agent $k$ and the network's virtual time will be increased by $L_k^i/\phi_k$ and $B_k^i/\alpha$, respectively, i.e.,

$$v_k(b_k^i, d_k^i) = w_k(b_k^i, d_k^i) = \frac{L_k^i}{\phi_k}, \qquad (9)$$

$$v(b_k^i, d_k^i) = \frac{B_k^i}{\alpha} = \frac{\left\lfloor \alpha \left( \frac{L_k^i}{\phi_k} - \epsilon_k^i \right) \right\rfloor}{\alpha}. \qquad (10)$$

Using Eqs. (9) and (10), we get

$$\delta_k(b_k^i, d_k^i) = \frac{\left\lfloor \alpha \left( \frac{L_k^i}{\phi_k} - \epsilon_k^i \right) \right\rfloor}{\alpha} - \frac{L_k^i}{\phi_k}. \qquad (11)$$

Let us return to the definition of $\epsilon_k^i$. Comparing Eqs. (3) and (11) we get $\epsilon_k^{i+1} - \epsilon_k^i = \delta_k(b_k^i, d_k^i)$. Moreover, we have:

$$\epsilon_k^i = \epsilon_k^1 + \sum_{n=1}^{i-1} \left( \epsilon_k^{n+1} - \epsilon_k^n \right) = \sum_{n=1}^{i-1} \delta_k(b_k^n, d_k^n) = \delta_k(b_k^1, d_k^1) + \sum_{n=2}^{i-1} \delta_k(b_k^n, d_k^n).$$

Using Eq. (8) and considering the fact that $\delta_k(b_k^1) = 0$, since $b_k^1$ is the first time that agent $k$ gets backlogged and consequently $v(b_k^1) = v_k(b_k^1)$, we get the following equation:

$$\epsilon_k^i = \delta_k(d_k^1) + \sum_{n=2}^{i-1} \delta_k(d_k^{n-1}, d_k^n) = \delta_k(d_k^{i-1}). \qquad (12)$$

Therefore, the compensation factor for the $i$th message of agent $k$ is actually equal to the normalized service deviation after the transmission of message $p_k^{i-1}$. Using Eq. (8), we have $\delta_k(d_k^i) = \delta_k(d_k^{i-1}) + \delta_k(b_k^i, d_k^i)$. Now, defining $\gamma = \alpha \left( \frac{L_k^i}{\phi_k} - \delta_k(d_k^{i-1}) \right)$, and substituting $\epsilon_k^i = \delta_k(d_k^{i-1})$ into Eq. (11), we get

$$\delta_k(d_k^i) = \delta_k(d_k^{i-1}) + \delta_k(b_k^i, d_k^i) = \frac{\lfloor \gamma \rfloor - \gamma}{\alpha}. \qquad (13)$$

Since $-1 < \lfloor \gamma \rfloor - \gamma \le 0$, the proof is complete.

Lemma 2. The normalized service deviation of agent $k$, $k \in \mathcal{N}$, in a shared medium with our proposed scheduling algorithm, is bounded as

$$-\frac{1}{\alpha} \le \delta_k(t) \le \frac{L_k^{max}}{\phi_k}, \quad \forall k \in \mathcal{N}. \qquad (14)$$

Proof. We consider two possible cases. In the first case, agent $k$ is absent at time $t$, i.e. $k \in \mathcal{A}(t)$. If $p_k^{i-1}$ represents the last packet that was transmitted by this agent before $t$, we can easily conclude that this agent's queue has been empty in the interval $(d_k^{i-1}, t)$, which results in $v_k(d_k^{i-1}, t) = v(d_k^{i-1}, t)$, and therefore $\delta_k(d_k^{i-1}, t) = 0$. Now, $\delta_k(t)$ can be calculated as follows:

$$\delta_k(t) = \delta_k(d_k^{i-1}) + \delta_k(d_k^{i-1}, t) = \delta_k(d_k^{i-1}).$$

Finally, using Lemma 1 we get

$$-\frac{1}{\alpha} < \delta_k(t) \le 0, \quad k \in \mathcal{A}(t). \qquad (15)$$

Hence, while an agent is absent, its normalized service deviation equals $\delta_k(d_k^{i-1})$, where $p_k^{i-1}$ is the last packet sent by it before becoming absent.

Now, let us consider the other case in which agent $k$ is backlogged at $t$. Assuming that $p_k^i$ is the first packet that finishes its service after $t$, we get $b_k^i \le t \le d_k^i$, where $b_k^i = \max\{a_k^i, d_k^{i-1}\}$. So, we can calculate $\delta_k(t)$ as $\delta_k(t) = \delta_k(b_k^i) + \delta_k(b_k^i, t)$. Since the transmission of packet $p_k^i$ begins after sensing $B_k^i$ idle slots, the network's virtual time will have an increase of $B_k^i/\alpha$ by the time agent $k$ starts transmitting $p_k^i$, i.e., $t = s_k^i$. Furthermore, the medium will be busy during $(s_k^i, d_k^i)$ and therefore, the network's virtual time will remain unchanged during this interval. Therefore, for $t \in [b_k^i, d_k^i]$, $v(b_k^i, t)$ satisfies the following:

$$0 \le v(b_k^i, t) \le v(b_k^i, s_k^i) = \frac{B_k^i}{\alpha} = \frac{\left\lfloor \alpha \left( \frac{L_k^i}{\phi_k} - \delta_k(d_k^{i-1}) \right) \right\rfloor}{\alpha}, \qquad (16)$$

where we have used $\epsilon_k^i = \delta_k(d_k^{i-1})$ from Eq. (12). On the other hand, agent $k$ is backlogged during $(b_k^i, d_k^i)$ and therefore, $v_k(b_k^i, t) = w_k(b_k^i, t)$. Since $s_k^i$ represents the time that transmission of packet $p_k^i$ begins, $w_k(b_k^i, t)$ will remain zero until $s_k^i$, and then reaches $L_k^i/\phi_k$ at time $d_k^i$ because of transmitting packet $p_k^i$ during $(s_k^i, d_k^i)$. So we have

$$0 \le v_k(b_k^i, t) = w_k(b_k^i, t) \le w_k(b_k^i, d_k^i) = \frac{L_k^i}{\phi_k}. \qquad (17)$$

Since $\delta_k(b_k^i, t) = v(b_k^i, t) - v_k(b_k^i, t)$, from Eqs. (16) and (17) we deduce that $\delta_k(b_k^i, t)$ is zero for $t = b_k^i$, reaches its maximum at $t = s_k^i$ and finally decreases to $\delta_k(b_k^i, d_k^i)$ at $d_k^i$. So, this yields

$$\min\left\{0, \delta_k(b_k^i, d_k^i)\right\} \le \delta_k(b_k^i, t) \le \delta_k(b_k^i, s_k^i).$$

Adding $\delta_k(d_k^{i-1})$ to each side of the above inequalities, we get

$$\min\left\{\delta_k(d_k^{i-1}), \delta_k(d_k^i)\right\} \le \delta_k(t) \le \delta_k(d_k^{i-1}) + \delta_k(b_k^i, s_k^i), \qquad (18)$$

where we have used Eq. (8). Using Lemma 1, we have

$$-\frac{1}{\alpha} \le \min\left\{\delta_k(d_k^{i-1}), \delta_k(d_k^i)\right\}. \qquad (19)$$

Moreover, since $v_k(b_k^i, s_k^i) = w_k(b_k^i, s_k^i) = 0$, the upper bound in Eq. (18) can be simplified as

$$\delta_k(d_k^{i-1}) + \delta_k(b_k^i, s_k^i) = \delta_k(d_k^{i-1}) + v(b_k^i, s_k^i) = \delta_k(d_k^{i-1}) + \frac{\left\lfloor \alpha \left( \frac{L_k^i}{\phi_k} - \delta_k(d_k^{i-1}) \right) \right\rfloor}{\alpha} = \frac{\lfloor \gamma \rfloor - \gamma}{\alpha} + \frac{L_k^i}{\phi_k} \le \frac{L_k^{max}}{\phi_k}.$$

Therefore, using (18) and (19) we have

$$-\frac{1}{\alpha} \le \delta_k(t) \le \frac{L_k^{max}}{\phi_k}, \quad k \in \mathcal{B}(t). \qquad (20)$$

Finally, the lemma follows from (15) and (20).

It can be shown from Lemma 2 that

$$|\delta_k(t_1, t_2)| \le \frac{L_k^{max}}{\phi_k} + \frac{1}{\alpha}, \quad k \in \mathcal{N}. \qquad (21)$$

Using Definitions 4 and 5, and inequality (21), we get the following corollary.

Corollary 1. For any agent $k \in \mathcal{B}(t_1, t_2)$, we have

$$|v(t_1, t_2) - w_k(t_1, t_2)| \le \frac{L_k^{max}}{\phi_k} + \frac{1}{\alpha}. \qquad (22)$$

Now, Theorem 1 can be easily obtained by using Corollary 1 for any pair of agents $k$ and $j$, $k, j \in \mathcal{B}(t_1, t_2)$.

It can be observed from Theorem 1 that the maximum normalized service disparity between a pair of backlogged agents is a function of the maximum packet sizes, the fairness weights and the scaling factor. Furthermore, the effect of discretizing backoff times after scaling by $\alpha$ is reflected in the term $2/\alpha$ in Eq. (4). Hence, using small scaling factors results in large maximum service deviations and vice versa.

Throughput Analysis of DSCFQ

In this section, we study the network throughput achievable by our proposed algorithm. We begin by introducing some assumptions and parameters that will be used in the analysis.

We concentrate on saturation throughput, which represents the network throughput in overloaded conditions in which all the agents are always backlogged. Let us denote by $g_k$ the probability that agent $k \in$ class II attempts to transmit a packet in a given generalized slot. Furthermore, we define the total attempt rate $G$ as the expected number of transmission attempts in a generalized slot, i.e., $G = \sum_{k=1}^{N} g_k$. Now, we analyze the network throughput for the case in which $N \gg 1$, and assume that each agent's average access probability is very small compared to the total transmission attempt rate, i.e., $g_k \ll G, \ \forall k \in \mathcal{N}$. Since $G$ represents the average number of transmission attempts in a given generalized slot, and also each collision wastes much more time compared to an idle slot, we expect the optimal total attempt rate, which maximizes the network saturation throughput, to be less than one. So, we conduct the analysis for the case in which $G < 1$, and consequently $g_k \ll 1, \ \forall k \in \mathcal{N}$.

As discussed in the previous section, a generalized time slot may be an idle slot, may contain a successful transmission or may contain collisions. So, a generalized time slot might have a length of $T_{idle} = \sigma$, $T_{succ}$ or $T_{coll}$, as shown in Fig. 3. Let us denote by $T_s$ and $T_c$ the duration of a successful transmission and the length of the wasted time caused by the collision of class II agents, respectively. Hence, $T_{coll}$ and $T_{succ}$ can be calculated based on Fig. 3 as follows:

$$T_{coll} = T_c + T_{CRP} + \sigma, \qquad T_{succ} = T_s + \sigma,$$

where $T_{CRP}$ represents the duration of the collision resolution period (CRP). Now, we calculate the probabilities that a given generalized slot is idle ($P_{idle}$), contains a successful transmission ($P_{succ}$) or contains collisions ($P_{coll}$). Since each agent $k \in$ class II transmits with probability $g_k$ in a generalized slot, the total number of transmission attempts in a given slot has a Poisson Binomial distribution. Furthermore, Le Cam's theorem (Le Cam 1960) suggests that the Poisson Binomial distribution with $g_k \ll 1, \ \forall k \in \mathcal{N}$, can be approximated by a Poisson distribution with mean $G = \sum_{k=1}^{N} g_k$. Therefore, considering the fact that a given slot remains idle (contains a successful transmission) only if zero (one) transmission attempt has been made in that slot, we can calculate $P_{idle}$, $P_{succ}$ and $P_{coll}$ as follows:

$$P_{idle} \simeq e^{-G}, \qquad P_{succ} \simeq G e^{-G}, \qquad P_{coll} = 1 - P_{succ} - P_{idle} = 1 - G e^{-G} - e^{-G}. \qquad (23)$$

Now, let us define the normalized saturation throughput, which is represented by $S$, as the fraction of time the medium is used to successfully transmit payload bits. We can express $S$ as follows:

$$S = \frac{(P_s + \bar{n}_c P_c)(\bar{L}/C)}{P_s T_{succ} + P_i \sigma + P_c T_{coll}}, \qquad (24)$$

where $P_s$, $P_i$ and $P_c$ abbreviate $P_{succ}$, $P_{idle}$ and $P_{coll}$, and $\bar{L}$, $C$ and $\bar{n}_c$ denote the average message length, the transmission rate, and the average number of agents involved in a given collision of class II nodes, respectively. Let $n_t$ denote the number of agents attempting a transmission in a given generalized slot. Since a collision happens only if the number of transmitted packets in a given time slot is larger than one, $\bar{n}_c$ can be calculated as follows:

$$\bar{n}_c = E[n_t \mid \text{collision}] = E[n_t \mid n_t > 1] = \frac{G(1 - e^{-G})}{1 - e^{-G} - G e^{-G}}. \qquad (25)$$

Adaptive DSCFQ
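The optimal attempt rate $G^*$ used in this section is the maximizer of the saturation throughput in Eq. (24). As a rough numerical illustration, the sketch below evaluates Eqs. (23) to (25) under the Poisson approximation and searches a coarse grid for $G^*$. The timing values and the simplification $T_{CRP} \approx \bar{n}_c T_s$ are assumptions made only for this example and are not taken from the paper; the conditional mean in `mean_colliders` is the standard identity $(G - P(n_t = 1)) / P(n_t > 1)$ for a Poisson variable.

```python
import math


def slot_probabilities(G):
    """Poisson approximation of Eq. (23): (P_idle, P_succ, P_coll)."""
    p_idle = math.exp(-G)
    p_succ = G * math.exp(-G)
    return p_idle, p_succ, 1.0 - p_idle - p_succ


def mean_colliders(G):
    """E[n_t | n_t > 1] of Eq. (25) for a Poisson number of attempts."""
    p_idle, p_succ, p_coll = slot_probabilities(G)
    return G * (1.0 - p_idle) / p_coll


def saturation_throughput(G, t_s, t_c, sigma, payload_time):
    """Eq. (24), with the ASSUMPTION T_CRP = n_c * t_s (CRP overhead ignored)."""
    p_idle, p_succ, p_coll = slot_probabilities(G)
    n_c = mean_colliders(G)
    t_succ = t_s + sigma
    t_coll = t_c + n_c * t_s + sigma
    served = (p_succ + n_c * p_coll) * payload_time  # payload_time = L_bar / C
    return served / (p_succ * t_succ + p_idle * sigma + p_coll * t_coll)


# Hypothetical timings in microseconds; coarse grid search for G*.
grid = [g / 100 for g in range(1, 100)]
best_G = max(grid, key=lambda g: saturation_throughput(g, 800.0, 100.0, 9.0, 700.0))
print(best_G, saturation_throughput(best_G, 800.0, 100.0, 9.0, 700.0))
```

The location of the maximum depends entirely on the assumed timings, so the printed value should be read as an illustration of the procedure rather than as a result of the paper.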

As discussed earlier, the scaling factor is a parameter for backoff adjustment and has a direct impact on the network throughput. Choosing a small scaling factor increases the collision probability and reduces the idle intervals during which the medium is left unused. On the other hand, increasing the scaling factor causes fewer collisions, but larger idle intervals. Therefore, it is important to pick the optimal scaling factor that maximizes the throughput. In the following, we propose a simple adaptive method for updating the scaling factor to reach the peak saturation throughput.

Let us first define the optimal total attempt rate, $G^*$, as the value for which the saturation throughput in Eq. (24) is maximized. Substituting $G^*$ in Eq. (23), we get $P^*_{idle}$, $P^*_{succ}$ and $P^*_{coll}$, respectively. Now, we introduce an adaptive method to reach the maximum saturation throughput. As discussed earlier, each generalized slot is in one of three states: successful transmission, collision, or idle. Assume that all the agents agree on a common initial scaling factor, which is denoted by $\alpha$. Then each agent senses the medium and updates $\alpha$ based on each generalized slot's state as follows:

$$\alpha_{new} = \begin{cases} \alpha, & \text{success} \\ \alpha + \gamma, & \text{collision} \\ \alpha - \beta, & \text{idle} \end{cases} \tag{26}$$

where $P^*_{idle}\,\beta = P^*_{coll}\,\gamma$.

Now, let $\alpha^*$ denote the optimal scaling factor that maximizes the saturation throughput. Since $P_{idle}$ and $P_{coll}$ are, respectively, monotonically decreasing and increasing functions of $G$, and $G$ is inversely proportional to $\alpha$, we have

$$D_\alpha = \begin{cases} \gamma P_{coll} - \beta P_{idle} > 0, & \text{if } \alpha < \alpha^* \\ \gamma P_{coll} - \beta P_{idle} < 0, & \text{if } \alpha > \alpha^* \end{cases} \tag{27}$$

where $D_\alpha$ is the expected drift of the scaling factor from one generalized slot to the next one. Hence, as long as $\alpha < \alpha^*$, there will be a positive drift which increases $\alpha$ on average. On the other hand, $\alpha > \alpha^*$ results in a negative drift and therefore leads to the reduction of the scaling factor.

Table 1: Simulation Parameters

Parameter | Value | Definition
$C$ | 12 Mbps | data transmission rate
$C_{ctrl}$ | [...] | [...]
$L_{ACK}$ | 112 bits | ACK frame length
$L_{RTS}$ | 160 bits | RTS frame length
$L_{CTS}$ | 112 bits | CTS frame length
$\bar{L}$ | [...] | [...]
$\sigma$ | [...] μs | slot time
$SIFS$ | [...] μs | SIFS time

Performance Evaluation
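To make the update rule of Eq. (26) and the drift argument of Eq. (27) concrete, the following Python sketch simulates the scaling-factor adaptation under the Poisson slot model of Eq. (23). Several quantities here are assumptions introduced only for illustration: the relation $G = c/\alpha$ with a hypothetical constant `c`, the target attempt rate `G_STAR = 0.5` (the true maximizer depends on the timing parameters in Eq. (24)), the step size `GAMMA`, and the small positive floor that keeps $\alpha$ from becoming non-positive.

```python
import math
import random

def slot_probs(G):
    """Poisson slot model of Eq. (23): returns (P_idle, P_succ, P_coll)."""
    p_idle = math.exp(-G)
    p_succ = G * math.exp(-G)
    return p_idle, p_succ, 1.0 - p_idle - p_succ

def beta_for(G_star, gamma):
    """Choose beta so that P*_idle * beta = P*_coll * gamma."""
    p_idle, _, p_coll = slot_probs(G_star)
    return gamma * p_coll / p_idle

def expected_drift(alpha, c, gamma, beta):
    """D_alpha of Eq. (27), assuming G = c / alpha (hypothetical model)."""
    p_idle, _, p_coll = slot_probs(c / alpha)
    return gamma * p_coll - beta * p_idle

def simulate(alpha0, c, gamma, beta, n_slots, seed=0):
    """Apply the update of Eq. (26) over randomly drawn slot states."""
    rng = random.Random(seed)
    alpha = alpha0
    for _ in range(n_slots):
        p_idle, p_succ, _ = slot_probs(c / alpha)
        u = rng.random()
        if u < p_idle:                      # idle slot: decrease alpha
            alpha = max(alpha - beta, 1e-9)  # floor is an added safeguard
        elif u >= p_idle + p_succ:           # collision: increase alpha
            alpha += gamma
        # successful slot: alpha is left unchanged
    return alpha

# Hypothetical example values.
G_STAR, C_CONST, GAMMA = 0.5, 0.01, 1e-4
BETA = beta_for(G_STAR, GAMMA)
ALPHA_STAR = C_CONST / G_STAR               # fixed point of the drift

final_alpha = simulate(0.2, C_CONST, GAMMA, BETA, n_slots=50_000)
print("alpha* =", ALPHA_STAR, "final alpha =", final_alpha)
```

The drift is positive below `ALPHA_STAR`, negative above it, and vanishes at `ALPHA_STAR`, so the simulated scaling factor settles in a small neighbourhood of the fixed point whose width is governed by the step sizes.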

In this section, we present our evaluations of the proposed distributed scheduling algorithm. We first describe the experimental setup and the technical assumptions used in the experiments. We then explain the baselines and the metrics that have been used for the evaluations. Finally, we discuss the experiments and results.

Experimental Setup:

Consider a network of 10 agents that are always backlogged and need to share their information with each other. Furthermore, the importance of the information differs from one agent to another. Therefore, each agent chooses a fairness weight that reflects the importance of its messages. We consider a scenario in which three agents choose $\phi = 10$, three other agents choose $\phi = 8$, two agents pick $\phi = 2$ and finally two agents pick $\phi = 1$ as their fairness weights. Moreover, to study the performance of our algorithm in more realistic conditions, we implement our algorithm based on 802.11 DCF. Specifically, our implementation accounts for various practical requirements such as the control messages (RTS/CTS/ACK), the inter-frame spacing intervals, and the imperfections caused by the packet headers and the propagation delay. The parameters used in our implementation are summarized in Table 1.

Baselines:

To compare our algorithm with the existing fair scheduling algorithms, we classify these baselines into two general types. The first type, which we call Type I, includes algorithms that emulate weighted fair queueing by taking backoff intervals (or IFS values) in proportion to the packet size of the head-of-the-queue messages, i.e., $B_k^i = \lfloor \alpha (L_k^i / \phi_k) \rfloor$, while still using random mechanisms such as BEB for collision resolution. The DFS algorithm introduced by (Vaidya et al. 2005) belongs to this category. The other type (Type II) uses similar methods for providing fairness, with the difference of giving a higher priority to the agents with unsuccessful last transmission attempts. The algorithm proposed by (Dugar, Vaidya, and Bahl 2001) is an example of Type II algorithms. An advantage of the above classification is that the value and impact of each part of our algorithm on the performance of the system can be assessed separately.

Evaluation Metrics:

For evaluating the fairness of our algorithm, we use the fairness index (Jain et al. 1996), which is defined as follows:

$$\text{fairness index} = \frac{\left(\sum_k T_k/\phi_k\right)^2}{N \cdot \sum_k \left(T_k/\phi_k\right)^2}, \tag{28}$$

where $T_k$ denotes the throughput of flow $k$. Moreover, we compare the degree of fairness over different time-scales using the Sliding Window Method (SWM), introduced by (Koksal, Kassab, and Balakrishnan 2000). In this method, we slide a window of size $w$ across a packet trace of medium accesses. By calculating the fairness index for the transmissions that lie in this window after each sliding, we get a sequence of fairness indices. We report the average fairness index as the fairness metric for a given window size. The other metrics used in our evaluations are the throughput of the agents and the network throughput. The throughput of a particular agent in a given interval is defined as the ratio of the successfully transferred information to the length of the interval. The network throughput can be calculated by adding the throughput of the agents.

Results and Discussions
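The two fairness metrics used in the results that follow, the weighted fairness index of Eq. (28) and its sliding-window average, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the trace format (a list of `(agent, bits)` pairs, one per successful transmission), the window measured in number of transmissions, and all function names are hypothetical. Because the index is scale-invariant, the bits served within a window are used in place of throughput.

```python
def fairness_index(throughputs, weights):
    """Weighted Jain index of Eq. (28) over normalized shares T_k / phi_k."""
    shares = [t / w for t, w in zip(throughputs, weights)]
    sum_sq = sum(x * x for x in shares)
    if sum_sq == 0:
        return 0.0  # no service at all: index is undefined, report 0
    return sum(shares) ** 2 / (len(shares) * sum_sq)

def sliding_window_fairness(trace, weights, w):
    """Average fairness index over all windows of w consecutive transmissions.

    trace   : list of (agent, bits) for successive successful transmissions
    weights : dict mapping agent -> fairness weight phi
    """
    if len(trace) < w:
        raise ValueError("trace is shorter than the window size")
    agents = sorted(weights)
    phis = [weights[a] for a in agents]
    served = {a: 0 for a in agents}
    for agent, bits in trace[:w]:
        served[agent] += bits
    indices = [fairness_index([served[a] for a in agents], phis)]
    for i in range(w, len(trace)):
        old_agent, old_bits = trace[i - w]   # transmission leaving the window
        new_agent, new_bits = trace[i]       # transmission entering the window
        served[old_agent] -= old_bits
        served[new_agent] += new_bits
        indices.append(fairness_index([served[a] for a in agents], phis))
    return sum(indices) / len(indices)

# Hypothetical trace: agent 0 (phi = 2) sends twice as often as agent 1 (phi = 1).
weights = {0: 2, 1: 1}
trace = [(0, 1000), (0, 1000), (1, 1000)] * 4
print(sliding_window_fairness(trace, weights, w=3))
```

Agents that receive no service inside a window contribute a zero share, which is what pulls the index down at short time scales.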

Let us start our evaluations with the fairness analysis of the system. Fig. 4 shows the performance of our algorithm along with Type I and II scheduling schemes for different time scales ($w$ = 30, 50, 100 and 1000). As can be observed, our algorithm achieves the best performance among all the compared algorithms in both short-term ($w$ = 30, 50) and long-term ($w$ = 100, 1000) comparisons. While our algorithm achieves a relatively constant fairness index over a wide range of scaling factors ([0.0001, 0.02]), the performance of Type I and II algorithms deteriorates dramatically for small scaling factors. This is particularly important since large scaling factors can result in more idle time slots and, therefore, less efficient utilization of the medium. On the other hand, we can observe that the difference between the fairness indices of the compared algorithms starts to diminish as we increase the scaling factor. This can be explained by considering the main sources of unfairness in Type I and II algorithms. Specifically, the agents with unsuccessful last transmissions have no priority over the other agents in Type I algorithms, which creates unfairness in the case of collisions. Since increasing the scaling factor results in fewer collisions, the effect of this factor becomes less dominant. On the other hand, discretization of the backoff intervals, i.e., using $\lfloor \alpha (L_k^i/\phi_k) \rfloor$ instead of $\alpha (L_k^i/\phi_k)$, is another source of unfairness in both Type I and II algorithms. In a similar way, the impact of this factor is more noticeable for small scaling factors, where a large range of packet lengths can be mapped into the same backoff intervals after scaling and discretization. The compensation factor $\epsilon_k^i$ used in the backoff calculation step in our algorithm (Eq. (2)) deals with this issue effectively.
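The discretization effect described above can be illustrated with a short Python sketch that counts how many distinct backoff values $\lfloor \alpha (L/\phi) \rfloor$ a set of packet lengths maps to. The packet lengths (1000 to 12000 bits) and the function names are hypothetical and chosen only for illustration; the two scaling factors are the end points of the range [0.0001, 0.02] discussed above, and the weight $\phi = 10$ is one of the weights from the experimental setup.

```python
import math

def discretized_backoff(alpha, length_bits, phi):
    """Floor-based backoff of Type I and II schemes: floor(alpha * L / phi)."""
    return math.floor(alpha * length_bits / phi)

def distinct_backoffs(alpha, lengths, phi):
    """Number of different backoff values the given packet lengths map to."""
    return len({discretized_backoff(alpha, L, phi) for L in lengths})

lengths = range(1000, 13000, 1000)   # hypothetical packet lengths in bits
for alpha in (0.0001, 0.02):
    print(alpha, distinct_backoffs(alpha, lengths, phi=10))
```

With the smallest scaling factor all twelve lengths collapse onto a single backoff value, so the backoff no longer reflects the packet size, whereas the largest scaling factor keeps them distinct.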
Since our algorithm eliminates the effect of the two sources of unfairness, it achieves the best performance among all the compared baselines in different time scales (Fig. 4). Therefore, the difference between the fairness indices of Type I and II algorithms shows the amount of improvement in average fairness index obtained by using an algorithm that gives priority to the agents with unsuccessful last transmission attempts. On the other hand, the difference between our algorithm and the Type II class demonstrates the increase in the fairness index obtained by using the compensation factor in the backoff calculation as in Eq. (2). Another observation we make is that the fairness index generally decreases for all the algorithms as we consider shorter time scales. This phenomenon can also be observed in centralized scheduling algorithms, and is explained by the packet-based nature of the problem (in contrast to fluid flow models).

Figure 4: Average fairness index as a function of the scaling factor, for window sizes $w$ = 30, 50, 100, 1000. Panels: (a) $w$ = 30, (b) $w$ = 50, (c) $w$ = 100, (d) $w$ = 1000. Each panel plots the fairness index against the scaling factor for DSCFQ, Type II and Type I.

Figure 5: Convergence of the (a) scaling factor, (b) fairness index and (c) normalized throughput to their optimal values. Each panel is plotted against time (s).

Now, let us study the adaptive behaviour of our algorithm. As we discussed before, there is an intrinsic trade-off between the fairness and the network throughput, which can be controlled by the scaling factor. In the previous section, we proposed an adaptive method for updating the scaling factor to achieve the maximum throughput attainable by our algorithm. Fig. 5a shows the dynamic adaptation of the scaling factor for the case in which it is initially set to 0.2. We observe that as the scaling factor converges to its optimum value, the fairness index approaches one (Fig. 5b) and the network throughput reaches its maximum value (Fig. 5c). This can be validated from Fig. 6a, which shows the normalized network throughput obtained from the simulation and the theory (Eq. (24)). As can be seen, the analytically derived throughput from Eq. (24) closely approximates the throughput obtained from the simulations. Moreover, we can observe that the maximum throughput ($\simeq$ [...]) is achieved for a scaling factor around [...], which are the same values that our algorithm converges to in Figs. 5a and 5c.

Figure 6: (a) Comparison of the normalized network throughput obtained from the simulation and Eq. (24), as a function of the scaling factor. (b) Convergence of the normalized throughput of agents 1 and 2 (with $\phi = 2$ and $\phi = 8$) to the same shares.

Finally, Fig. 6b shows the convergence of the normalized throughput of two agents with fairness weights of $\phi = 2$ and $\phi = 8$. As can be observed, our algorithm equalizes the normalized throughput of these agents after its convergence.

Conclusion

In this paper, we considered one of the important, but understudied challenges in the communication of multi-agent systems. Specifically, we studied the shared medium contention problem in the communication of the agents and proposed a new distributed scheduling algorithm for providing fairness in these systems. This becomes even more important when the agents' information or observations have different importance, in which case the agents require different priorities for accessing the medium and sharing their information. We showed that our proposed algorithm can provide a deterministic bound on the maximum service disparity among any pair of agents. This can particularly improve the short-term fairness, which is important in real-time applications. Moreover, we designed an adaptive mechanism that enables our scheduling algorithm to adjust itself and achieve a high throughput at the same time.

References

Bertsekas, D.; and Gallager, R. 1992. Data Networks (2nd Ed.). Upper Saddle River, NJ, USA: Prentice-Hall, Inc.

Choi, J.; Yoo, J.; and Kim, C.-K. 2008. A Distributed Fair Scheduling Scheme With a New Analysis Model in IEEE 802.11 Wireless LANs. Vehicular Technology, IEEE Transactions on [...]

[...] MILCOM. Communications for Network-Centric Operations: Creating the Information Force. IEEE, volume 2, 993–997 vol.2.

Foerster, J.; Assael, I. A.; De Freitas, N.; and Whiteson, S. 2016. Learning to communicate with deep multi-agent reinforcement learning. In Advances in neural information processing systems, 2137–2145.

Goldman, C. V.; and Zilberstein, S. 2003. Optimizing information exchange in cooperative multi-agent systems. In Proceedings of the second international joint conference on Autonomous agents and multiagent systems, 137–144.

Golestani, S. 1994. A self-clocked fair queueing scheme for broadband applications. In INFOCOM '94. Networking for Global Communications., 13th Proceedings IEEE, 636–646 vol.2.

Goyal, P.; Vin, H.; and Cheng, H. 1997. Start-time fair queueing: a scheduling algorithm for integrated services packet switching networks. Networking, IEEE/ACM Transactions on [...]

[...] Advances in neural information processing systems, 7254–7264.

Kim, D.; Moon, S.; Hostallero, D.; Kang, W. J.; Lee, T.; Son, K.; and Yi, Y. 2018. Learning to Schedule Communication in Multi-agent Reinforcement Learning. In International Conference on Learning Representations.

Koksal, C. E.; Kassab, H.; and Balakrishnan, H. 2000. An Analysis of Short-Term Fairness in Wireless Media Access Protocols. In ACM Sigmetrics 2000. Santa Clara, CA.

Le Cam, L. 1960. An approximation theorem for the Poisson binomial distribution. Pacific J. Math. [...]

[...] Vehicular Technology, IEEE Transactions on [...]

[...] AAAI 2020, 5142–5149.

Parekh, A.; and Gallager, R. 1993. A generalized processor sharing approach to flow control in integrated services networks: the single-node case. Networking, IEEE/ACM Transactions on [...]

[...] Proceedings of the 31st International Conference on Computer Animation and Social Agents, 11–16.

Qiao, D.; and Shin, K. 2002. Achieving efficient channel utilization and weighted fairness for data communications in IEEE 802.11 WLAN under the DCF. In Quality of Service, 2002. Tenth IEEE International Workshop on, 227–236.

Shreedhar, M.; and Varghese, G. 1996. Efficient fair queuing using deficit round-robin. Networking, IEEE/ACM Transactions on [...]

[...] International Conference on Learning Representations. URL https://openreview.net/forum?id=rye7knCqK7.

Vaidya, N.; Dugar, A.; Gupta, S.; and Bahl, P. 2005. Distributed fair scheduling in a wireless LAN. Mobile Computing, IEEE Transactions on [...]

[...] Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, 1101–1108.

Zhang, S. Q.; Zhang, Q.; and Lin, J. 2019. Efficient communication in multi-agent reinforcement learning via variance based control. In