Distributed Learning for Proportional-Fair Resource Allocation in Coexisting WiFi Networks
Piotr Gawłowicz∗, Jean Walrand†, Adam Wolisz∗
∗Technische Universität Berlin, Germany, {gawlowicz, wolisz}@tkn.tu-berlin.de
†University of California, Berkeley, USA, [email protected]
Abstract: In this paper, we revisit the widely known performance anomaly that results in severe network utility degradation in WiFi networks when nodes use diverse modulation and coding schemes. The proportional-fair allocation was shown to mitigate this anomaly and provide a good throughput to the stations. It can be achieved through the selection of contention window values based on the explicit solution of an optimization problem or, as proposed recently, by following a learning-based approach that uses a centralized gradient descent algorithm. In this paper, we leverage our recent theoretical work on asynchronous distributed optimization and propose a simple algorithm that allows WiFi nodes to independently tune their contention window to achieve proportional fairness. We compare the throughputs and air-time allocation that this algorithm achieves to those of the standard WiFi binary exponential back-off and show the improvements.
Index Terms: WiFi, Contention Window, Distributed Optimization
I. INTRODUCTION
The widely deployed WiFi technology uses the Distributed Coordination Function (DCF), a simple decentralized channel access mechanism based on the CSMA/CA protocol, to share the limited wireless channel capacity. With DCF, the nodes adjust their channel access probability using a back-off mechanism. Specifically, they maintain a dynamically changing contention window (CW) and select the time lag between two consecutive transmission attempts as a random number within the scope of the CW. By design, DCF ensures equal long-term channel access probabilities to all involved stations, thus guaranteeing frame-level fairness. However, frame-level fairness leads to the so-called performance anomaly problem when stations use different transmission rates.
Stations employ various modulation and coding schemes (MCSs) to preserve transmission robustness in response to the quality of the radio link they experience, which in turn depends on their communication distance, mobility, and other factors. The lower MCSs offer increased robustness against errors at the cost of a drop in data rate. Consequently, the WiFi stations that use lower data rates occupy the channel for a longer period of air-time that could otherwise be used more efficiently by the faster clients. Precisely, under frame-level fairness, a station with poor transmission conditions captures an extensive air-time share and reduces the air-time available to other stations, and hence this station slows down all stations. This pathological behavior was first identified in 802.11b networks [1]. Then, Patras et al. [2] demonstrated that the effect is dramatically exacerbated in today's high-throughput networks (i.e., 802.11n/ac), where the data rates among stations may vary by orders of magnitude, e.g., the throughput of a station using the data rate of 780 Mbps becomes similar to that of a co-existing station with the data rate of 6 Mbps.
The proportional-fair allocation, introduced by Kelly et al. [3], was shown to address the performance anomaly problem appropriately [4]. By definition, it maximizes the network utility (defined as the sum of the logarithms of individual throughputs) subject to the constraints that the communication conditions impose on the individual stations. In [5], the proportional-fair allocation in WiFi networks was formulated as a convex optimization problem that is solved by selecting optimal contention window values for the stations.
The existing approaches to solve this problem are deployed in a central node (e.g., an AP or a controller node) and require knowledge of the average frame duration or throughput of each transmitting station. However, such centralized operation cannot be assumed in the case of overlapping but separately managed WiFi networks. Furthermore, the standard does not envision the possibility of communication between nodes belonging to different networks.
Consequently, in this paper, we propose a distributed learning-based approach where WiFi nodes independently tune their own contention window to achieve proportional fairness. Our algorithm is based on a stochastic convex optimization framework. Specifically, it is based on our previous work [6] that proves the convergence of the Kiefer-Wolfowitz algorithm [7] in a distributed and asynchronous setting. Those properties are highly beneficial for coexisting WiFi nodes as they allow learning channel access parameters without any coordination or explicit information exchange. Instead, the nodes use overheard frames to compute the network utility and independently follow a gradient descent method to optimize overall network performance. In general, we explore the usability of the distributed and asynchronous Kiefer-Wolfowitz (DA-KW) algorithm in a practical use-case of wireless optimization. Specifically, the contributions of this work are as follows:
• We propose a simple approach for distributed contention window tuning that achieves proportional fairness in coexisting WiFi scenarios. To this end, we apply the DA-KW algorithm to the WiFi domain while introducing slight modifications to address the practical issues.
• Using simulations, we evaluate the proposed approach in terms of convergence speed and achieved performance in multiple scenarios. Moreover, we compare its performance with the standard WiFi DCF.
• We investigate the impact of the level of coordination (i.e., synchronized execution). It appears that there is no significant gain from coordination in the case of a single collision domain. However, coordination and information exchange allow achieving optimal channel allocation in the case of overlapping collision domains.
II. RELATED WORK
Proportional fairness in WiFi networks has been extensively studied, e.g., [8]–[10]. Checco et al. [4] provided an analysis of proportional fairness in WiFi networks. The authors proved that a unique proportional-fair rate allocation exists and assigns equal total (i.e., spent on both colliding and successful transmissions) air-time to nodes. Patras et al. [2] extended this analysis to multi-rate networks, and confirmed that under the proportional-fair solution all the stations get an equal share of the air-time, which is inversely proportional to the number of active stations. The authors formulated network utility maximization as a convex optimization problem and provided a closed-form solution that can be solved explicitly. To this end, a WiFi access point (AP) estimates the average frame transmission duration for each station and passes it as an input to an optimization tool. The computed CW values are distributed to nodes in a beacon frame. The optimization is executed periodically (i.e., every beacon interval) to react to changes in the network (e.g., traffic or wireless conditions).
An alternative approach was proposed by Famitafreshi et al. [11]. The authors use a stochastic gradient descent (SGD) algorithm, which can iteratively learn the optimal contention window only by monitoring the network's throughput. Specifically, the learning agent resides in a WiFi AP, where it measures the uplink throughput of each connected station and sends the CW value updates to all the stations in a beacon frame. The algorithm combines two utility values measured under different CW values to compute the gradient and update the CW following the SGD algorithm.
Both proposed algorithms use global knowledge and centralized operation (i.e., they are deployed in an AP). However, in typical scenarios, multiple networks under separate management domains are co-located and have to coexist, and there is no central entity with the full knowledge to perform the optimization or learning.
III. RELEVANT WIFI BACKGROUND
In this section, we briefly describe the CSMA operation, present its analytical models, and summarize the throughput optimization in WiFi networks.
A. WiFi Random Back-off Operation
WiFi nodes use the DCF mechanism to access the channel. The DCF is based on the Carrier Sense Multiple Access/Collision Avoidance (CSMA/CA) method and employs binary exponential back-off (BEB) to control the contention window. Specifically, in DCF, before a frame transmission, a WiFi station has to perform a random back-off procedure. To this end, it initializes its back-off counter with a random number drawn uniformly from $\{0, \ldots, CW - 1\}$, where $CW$ is the contention window. Then, the station observes the wireless channel and decrements the counter whenever the channel is sensed idle during a DCF inter-frame space (DIFS) and freezes the back-off counter otherwise. Finally, when the back-off counter reaches zero, the station transmits a frame. If the transmission is successful (as indicated by the reception of an acknowledgment), the station sets $CW$ to the minimal value, i.e., $CW_{\min}$, for the next transmission. Otherwise, it doubles the previous contention window and performs the frame retransmission. The $CW$ is increased until it reaches the maximal value defined by $CW_{\max}$.
B. Analytical Model of Contention-based Medium Access
Here, we briefly describe the analytical model derived in [2], which allows computing the total throughput achieved by WiFi nodes. We assume that all nodes are in a single collision domain (i.e., each node overhears the transmissions of all other nodes). For the sake of clarity of presentation, in this section, we consider the case where all stations are saturated (i.e., they always have packets to transmit), but in Section VI, we also evaluate our algorithm in scenarios with non-saturated traffic. Note that the model allows for arbitrary packet sizes and selection of MCS.
Let us consider a set of $N$ wireless stations, where each active station $i$ accesses the channel with slot transmission probability $\lambda_i$. The relation between the channel access probability and a constant contention window is $CW_i = \frac{2 - \lambda_i}{\lambda_i}$ [12]. The transmission failure probability experienced by station $i$ equals $p_{f,i} = 1 - (1 - p_{n,i})(1 - p_i)$, where $p_{n,i}$ is the probability that the transmission fails due to channel errors (e.g., noise or interference), while $p_i = 1 - \prod_{j=1, j \neq i}^{N} (1 - \lambda_j)$ denotes the collision probability experienced by a packet transmitted by this station. Then, the throughput $S_i$ of station $i$ equals:
$$S_i = \frac{p_{s,i} D_i}{P_e T_e + P_s T_s + P_u T_u} \qquad (1)$$
where $p_{s,i} = \lambda_i (1 - p_{f,i})$ is the probability of a successful transmission performed by station $i$, while $D_i$ denotes its frame payload size in bits. $P_e = \prod_{i=1}^{N} (1 - \lambda_i)$ is the probability that the channel is idle during a slot of duration $T_e$ (e.g., 9 µs in 802.11n). $P_s = \sum_{i=1}^{N} p_{s,i}$ and $P_u = 1 - P_e - P_s$ are the probabilities of a successful and an unsuccessful transmission, with the expected durations $T_s = \sum_{i=1}^{N} \frac{p_{s,i}}{P_s} T_{s,i}$ and $T_u = \sum_{i=1}^{N} \frac{p_{u,i}}{P_u} T_{u,i}$, respectively.
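The per-slot probabilities defined above can be evaluated numerically. The following is a minimal sketch, assuming a constant contention window per station; the function names and the example CW values are illustrative and not part of the original model description.

```python
import math


def attempt_probability(cw: int) -> float:
    """Slot transmission probability for a constant CW: lambda = 2 / (CW + 1)."""
    return 2.0 / (cw + 1)


def slot_probabilities(lams, p_noise):
    """Return (p_s_i list, P_e, P_s, P_u) for attempt probabilities `lams`."""
    n = len(lams)
    p_success = []
    for i in range(n):
        # Collision: at least one other station transmits in the same slot.
        p_coll = 1.0 - math.prod(1.0 - lams[j] for j in range(n) if j != i)
        # Failure: either a collision or a channel error.
        p_fail = 1.0 - (1.0 - p_noise[i]) * (1.0 - p_coll)
        p_success.append(lams[i] * (1.0 - p_fail))
    p_idle = math.prod(1.0 - lam for lam in lams)   # P_e
    p_s = sum(p_success)                            # P_s
    p_u = 1.0 - p_idle - p_s                        # P_u
    return p_success, p_idle, p_s, p_u


# Example values (assumed): two aggressive stations and one conservative station.
lams = [attempt_probability(cw) for cw in (15, 15, 63)]
p_success, p_idle, p_s, p_u = slot_probabilities(lams, p_noise=[0.0, 0.0, 0.0])
```

In this example, the station with the larger contention window obtains a smaller success probability per slot, which is the lever that the CW tuning discussed later relies on.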
The quantities $T_{s,i}$ and $T_{u,i}$ are the durations of a successful and a failed transmission of station $i$. They depend on the fixed preamble duration, the variable duration of a header, the size of the payload transmitted with the PHY rate $C_i$, and whether an acknowledgment is sent (success) or not (failure). Finally,
$$p_{u,i} = \lambda_i \, p_{n,i} \prod_{j=1, j \neq i}^{N} (1 - \lambda_j) + \lambda_i \left( 1 - \prod_{j=1}^{i-1} (1 - \lambda_j) \right) \prod_{j=i+1}^{N} (1 - \lambda_j)$$
is the probability of an unsuccessful transmission (either due to collision or channel errors) in which station $i$ has the highest index among the transmitting stations (stations are labeled according to their transmission durations, in increasing order). Note that the proper labeling is needed as the duration of a collision is dominated by the frame with the longest duration involved in that collision, and collisions should only be counted once.
Using the transformed variable $y_i = \frac{\lambda_i}{1 - \lambda_i}$, the expression of a station's throughput (1) can be rewritten as:
$$S_i = (1 - p_{n,i}) \, \frac{y_i}{Y} \, D_i \qquad (2)$$
where $Y = T_e + \sum_{i=1}^{N} \left( y_i \, T_{s,i} \prod_{k=1}^{i-1} (1 + y_k) \right)$.
We refer to [2] for the details of the model and the transition from identity (1) to (2).
C. Proportional-fair Allocation
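Before the optimization problem is formulated, the throughput model of Section III-B can be cross-checked numerically. The sketch below evaluates Eq. (1) and Eq. (2) for the simplified case $p_{n,i} = 0$ and $T_{u,i} = T_{s,i}$ (both are assumptions made here for brevity), with stations sorted by increasing transmission duration. All numeric values and function names are illustrative.

```python
import math


def throughput_eq1(lams, durations, payload_bits, t_idle):
    """Eq. (1) with p_n = 0 and T_u,i = T_s,i (stations sorted by duration)."""
    n = len(lams)
    p_e = math.prod(1.0 - lam for lam in lams)
    p_s = [lams[i] * math.prod(1.0 - lams[j] for j in range(n) if j != i)
           for i in range(n)]
    # p_u,i: station i transmits together with a lower-indexed station,
    # and no higher-indexed station transmits.
    p_u = [lams[i]
           * (1.0 - math.prod(1.0 - lams[j] for j in range(i)))
           * math.prod(1.0 - lams[j] for j in range(i + 1, n))
           for i in range(n)]
    # P_s * T_s + P_u * T_u expands to the sums below.
    busy = sum((p_s[i] + p_u[i]) * durations[i] for i in range(n))
    denom = p_e * t_idle + busy
    return [p_s[i] * payload_bits[i] / denom for i in range(n)]


def throughput_eq2(lams, durations, payload_bits, t_idle):
    """Eq. (2) with the transformed variable y_i = lambda_i / (1 - lambda_i)."""
    y = [lam / (1.0 - lam) for lam in lams]
    big_y = t_idle + sum(
        y[i] * durations[i] * math.prod(1.0 + y[k] for k in range(i))
        for i in range(len(y)))
    return [y[i] / big_y * payload_bits[i] for i in range(len(y))]


lams = [0.05, 0.10, 0.02]                 # assumed attempt probabilities
durations = [200e-6, 500e-6, 1500e-6]     # seconds, increasing order
payload = [12000, 12000, 12000]           # bits
s_eq1 = throughput_eq1(lams, durations, payload, t_idle=9e-6)
s_eq2 = throughput_eq2(lams, durations, payload, t_idle=9e-6)
utility = sum(math.log(s) for s in s_eq2)  # sum of log-throughputs
```

Under these assumptions both expressions yield the same throughputs, and the last line computes the sum of log-throughputs that serves as the network utility in this subsection.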
Following [4], we formulate the proportional-fair allocation problem as a convex optimization problem. The global utility function is defined as the sum of the logarithms of individual throughputs, i.e., $U = \sum_{i=1}^{N} \tilde{S}_i$, where $\tilde{S}_i = \log(S_i)$. The utility maximization problem is as follows:
$$\begin{aligned} \text{maximize} \quad & \sum_{i=1}^{N} \tilde{S}_i \\ \text{s.t.} \quad & \tilde{S}_i \le \log\left( z_i \, \frac{y_i}{Y} \, D_i \right), \quad i = 1, 2, \ldots, N \\ \text{and} \quad & 0 \le y_i, \quad i = 1, 2, \ldots, N \quad (0 \le \lambda_i \le 1) \end{aligned}$$
where $z_i = 1 - p_{n,i}$. The constraints ensure that the optimal solution is feasible, i.e., it is within the log-transformed rate region $\tilde{R}$. The rate region $R$ is the set of achievable throughput vectors $S(\lambda) = [S_1, S_2, \ldots, S_N]$ as the vector $\lambda$ of attempt probabilities ranges over the domain $[0, 1]^N$. The set $R$ is known to be non-convex in 802.11 networks, but the log-transformed rate region $\tilde{R}$ is strictly convex [5]. Moreover, as the strong duality and the KKT (Karush-Kuhn-Tucker) conditions are satisfied, a global and unique solution exists.
IV. DISTRIBUTED STOCHASTIC OPTIMIZATION PRIMER
Here, we briefly introduce the stochastic convex optimization techniques in centralized (i.e., single agent) and distributed (i.e., multiple agents) settings.
A. Stochastic Convex Optimization
Stochastic convex optimization deals with minimizing the expected value of a function $F(x, \xi)$ that is convex in $x \in \mathbb{R}^d$, where $\xi$ is a random vector:
$$\text{Find } x^* = \operatorname*{argmin}_{x \in \mathcal{K}} f(x) := E(F(x, \xi)). \qquad (3)$$
The setup is that one has access to sample values of $F(x, \xi)$. If one could measure the gradient $\nabla f(x)$ of the function, one could use a gradient descent algorithm where the parameters at the $t$-th iteration, $x_t$, are updated according to $x_{t+1} = x_t - \eta_t \nabla f(x_t)$, where $\eta_t$ is the step size at iteration $t$. However, in practical scenarios there is no access to the actual gradient $\nabla f(x)$. Accordingly, the stochastic gradient descent algorithms rely on constructing noisy gradient estimates $\tilde{g}_t$, which are then used to adjust the parameters according to $x_{t+1} = x_t - \eta_t \tilde{g}_t$.
The Kiefer-Wolfowitz (KW) algorithm [7] is a gradient estimation method that combines two function evaluations with perturbed values of its variable to compute the estimate. The simultaneous perturbation stochastic approximation (SPSA) algorithm [13] is an extension of the KW algorithm towards multivariate problems. In SPSA, the partial derivatives with respect to the different variables are estimated by simultaneously perturbing each variable by an independent and zero-mean amount, instead of perturbing the variables one at a time.
The optimization procedure can be performed in a centralized or distributed setting. In the former case, a single agent knows and controls all the variables in vector $x$ and has the exclusive right to query the function (we refer to it as the environment), while in the latter case, those assumptions do not hold.
B. Distributed Convex Optimization
In the distributed setting, a set of $N$ distributed agents try to optimize a global objective function. The critical challenge is that agents make individual decisions simultaneously, and the value of the function depends on all agents' actions. Moreover, we assume that the agents are not synchronized (i.e., they query the function and update their variable asynchronously at a random point in time) and cannot communicate. Specifically, each agent adjusts its own variable $x_i$ without knowing the values of the other variables, i.e., $x_{-i}$. Fig. 1 shows the interaction between agents and the environment.
Fig. 1. The interaction between distributed agents and the environment: agents asynchronously submit their individual variables to the environment and get noisy observations of its value.
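As a point of reference for the distributed case, the single-agent Kiefer-Wolfowitz iteration of Section IV-A can be sketched as follows. This is a minimal illustration on a hypothetical noisy quadratic; the objective, the constant parameters, and the function names are assumptions made for this example.

```python
import random


def kw_minimize(sample_f, x0, delta, eta, iterations, rng):
    """One-dimensional KW iteration: two noisy evaluations per gradient estimate."""
    x = x0
    for _ in range(iterations):
        eps = rng.choice((-1, 1))              # random perturbation direction
        g_plus = sample_f(x + eps * delta)     # first noisy evaluation
        g_minus = sample_f(x - eps * delta)    # second noisy evaluation
        grad = (g_plus - g_minus) / (2 * eps * delta)
        x -= eta * grad                        # gradient descent step
    return x


rng = random.Random(0)


def noisy_quadratic(x):
    """Hypothetical objective with minimum at x = 3, observed with Gaussian noise."""
    return (x - 3.0) ** 2 + rng.gauss(0.0, 0.1)


x_hat = kw_minimize(noisy_quadratic, x0=0.0, delta=0.5, eta=0.05,
                    iterations=2000, rng=rng)
```

With constant `delta` and `eta`, the iterate settles in a neighborhood of the minimizer rather than converging exactly, which mirrors the constant-parameter choice discussed later for the WiFi setting.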
Following a naive application of the KW algorithm, each agent estimates the gradient by perturbing its own variable by a zero-mean change and querying the global function every time interval $\tau$. Then, it uses those measurements to determine the gradient and updates its variable in proportion to the computed estimate. Note that agents perform such experiments without any coordination. Thus, when an agent attempts to get a second evaluation of the function, the function may have already changed due to another agent's query. Intuitively, each agent gets gradient estimates corrupted due to the actions of other agents. However, we have recently proved that the KW algorithm can converge to the optimal solution in a distributed and asynchronous (DA) setting if the size of the perturbation is bounded [6]. The algorithm executed by each agent is presented as Algorithm 1.
V. DISTRIBUTED CONTENTION WINDOW LEARNING
Based on a distributed and asynchronous Kiefer-Wolfowitz algorithm, we propose a simple collaborative learning scheme for WiFi nodes, which allows them to learn the optimal contention window values without coordination and information exchange.
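The distributed and asynchronous KW procedure (Algorithm 1) can be illustrated on a toy problem before it is mapped to WiFi. The sketch below is a simplified simulation under stated assumptions: time advances in integer ticks, each agent owns one coordinate of a hypothetical noisy quadratic cost, observations are instantaneous, and the parameters are constant. To maximize a utility instead, the same loop would be applied to the negated utility.

```python
import random


def da_kw(targets, tau, delta, eta, ticks, seed=0):
    """Toy DA-KW: agent i tunes y[i] using only noisy values of a shared cost."""
    rng = random.Random(seed)
    n = len(targets)

    def f(x):
        # Hypothetical global cost, minimized at x == targets, observed with noise.
        return sum((xi - ci) ** 2 for xi, ci in zip(x, targets)) + rng.gauss(0.0, 0.01)

    y = [0.0] * n                     # current iterate y_k of each agent
    x = list(y)                       # values currently applied in the environment
    phase = [rng.randrange(tau) for _ in range(n)]   # random phase offsets p_i
    eps = [1] * n
    g_plus = [0.0] * n
    for t in range(ticks):
        for i in range(n):
            s = t - phase[i]
            if s < 0:
                continue              # agent i has not started yet
            stage = s % (2 * tau)
            if stage == 0:            # first half: apply y + eps * delta
                eps[i] = rng.choice((-1, 1))
                x[i] = y[i] + eps[i] * delta
                g_plus[i] = f(x)
            elif stage == tau:        # second half: apply y - eps * delta, update
                x[i] = y[i] - eps[i] * delta
                g_minus = f(x)
                y[i] -= eta * (g_plus[i] - g_minus) / (2 * eps[i] * delta)
    return y


targets = [1.0, -2.0, 0.5]
estimate = da_kw(targets, tau=5, delta=0.2, eta=0.02, ticks=20000)
```

Although each agent's two evaluations are taken while the other agents change their variables, the resulting disturbance is zero-mean with respect to the agent's own random sign, so the iterates still drift towards the minimizer in this toy setting.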
A. Practical Issues
There are several practical issues that have to be considered. First, the utility function evaluation is not immediate, as an agent cannot measure the instantaneous throughput. Instead, it has to count the amount of successfully transmitted data by overhearing frames and compute the mean throughput of neighboring nodes over the measurement time slot $\tau$. As agents follow this procedure simultaneously with random phase offsets, the utility that each agent observes is an average of multiple function evaluations during $\tau$ that correspond to the changing CW of the agents. Second, in contrast to the formal proof where the function is globally defined, the contention window takes values in a discrete set between $CW_{\min}$ and $CW_{\max}$. Finally, to make the algorithm responsive to changes (e.g., in the used data rates), we use constant exploration and learning parameters and remove the termination condition. Therefore, the algorithm cannot converge to the optimum value but only to its neighborhood.
B. Proposed Approach
We apply a modified version of the DA-KW algorithm to learn the IEEE 802.11 CW cooperatively. More specifically, as illustrated in Algorithm 2, we introduce modifications to match the discussed practical issues.
From the point of view of a single agent, our proposed technique works as follows. During initialization, a WiFi node selects a contention window value within the range of $[CW_{\min}, CW_{\max}]$. The integer $CW$ value is converted to the log-transformed variable $y$ using function $L$. Specifically, $L$ computes the channel access probability as $\lambda = \frac{2}{CW + 1}$, and then the transformed variable as $y = \log\left(\frac{\lambda}{1 - \lambda}\right)$. By $L^{-1}$, we designate the operation inverse to $L$.
Algorithm 1: DA-KW (executed by each agent $i$)
Input: Non-increasing sequences $(\delta_k)$, $(\eta_k)$
Input: Function sample interval $\tau \ge 1$
Input: Phase offset $p_i \in \{0, \ldots, \tau - 1\}$
Initialization: Choose arbitrary $y_0 \in \mathbb{R}$
for $k = 0, 1, \ldots$ do
    Draw $\varepsilon_k$ uniformly from $\{-1, 1\}$
    Let $t = 2k \times \tau + p_i$
    At $t$ set $x_t = y_k + \varepsilon_k \delta_k$
    Observe $g_k^+ = f(x_t)$
    At $t + \tau$ set $x_{t+\tau} = y_k - \varepsilon_k \delta_k$
    Observe $g_k^- = f(x_{t+\tau})$
    Compute gradient estimate $\tilde{g}_k = \frac{g_k^+ - g_k^-}{2 \varepsilon_k \delta_k}$
    Update $y_{k+1} = y_k - \eta_k \tilde{g}_k$
    if $\| x_t - x^* \| < \varepsilon^*$ break
end
Algorithm 2: Proposed Algorithm (executed by each agent $i$)
Input: $L$ converts $CW$ to log-transformed value $y$
Input: Constant parameters $\delta_k = \delta$, $\eta_k = \eta$
Input: Function sample interval $\tau$
Input: Random phase offset $p_i \in [0, \tau]$
Initialization: Choose $y_0 \in [L(CW_{\min}), L(CW_{\max})]$
for $k = 0, 1, \ldots$ do
    Draw $\varepsilon_k$ uniformly from $\{-1, 1\}$
    Let $t = 2k \times \tau + p_i$
    At $t$ set $CW_t = \lceil L^{-1}(y_k + \varepsilon_k \delta_k) \rceil$
    Observe transmissions of neighbors for $\tau$ and compute $g_k^+ = \sum_{i=1}^{N} \tilde{S}_i$
    At $t + \tau$ set $CW_{t+\tau} = \lceil L^{-1}(y_k - \varepsilon_k \delta_k) \rceil$
    Observe transmissions of neighbors for $\tau$ and compute $g_k^- = \sum_{i=1}^{N} \tilde{S}_i$
    Compute gradient estimate $\tilde{g}_k = \frac{g_k^+ - g_k^-}{2 \varepsilon_k \delta_k}$
    Update $y_{k+1} = \Pi_k(y_k - \eta_k \tilde{g}_k)$, where $\Pi_k$ is the projection operator onto $\mathcal{K}_{\delta_k}$
end
At a random time point $t$, node $i$ perturbs its log-transformed variable by a fixed exploration parameter $\delta_k$, i.e., it replaces $y_k$ by $y_k + \varepsilon_k \delta_k$ and converts that value back to the discrete $CW_t$ value. Next, for the duration of a single measurement slot $\tau$ (e.g., 100 ms), it transmits all its frames applying the $CW_t$ value to the back-off procedure. Simultaneously, the node observes the environment formed by all WiFi nodes. That is, by overhearing frames, it counts the amount of data transmitted by each neighbor. At the end of the measurement slot, it computes the value of the network utility function, i.e., the sum of the logarithms of the observed throughputs of the different nodes and the known own throughput. Then, at $t + \tau$, the node again perturbs its log-transformed variable, i.e., it replaces $y_k$ by $y_k - \varepsilon_k \delta_k$, and repeats the measurement procedure. Finally, at $t + 2\tau$, it combines both measurements to compute the gradient estimate and updates its log-transformed variable $y_{k+1}$ accordingly. The value is projected to the decision set defined as $\mathcal{K}_\alpha = [A + \alpha, B - \alpha]$, where $\alpha \le \frac{B - A}{2}$. The projection of $x$ to the nonempty interval $[a, b]$ is defined as $\Pi_{[a,b]}(x) = \max\{\min\{b, x\}, a\}$. Note that the gradient descent is performed in a continuous domain using the log-transformed variable $y$, which is then converted and discretized into an integer $CW$ value.
C. Impact of Diverse Coordination Levels
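Before the coordination levels are discussed, the conversion and projection steps of Section V-B can be made concrete. The sketch below is a minimal illustration, assuming the natural logarithm for $L$ and the bounds $CW_{\min} = 15$ and $CW_{\max} = 1023$; the function names and the final integer clamp are illustrative additions, not part of Algorithm 2.

```python
import math

CW_MIN, CW_MAX = 15, 1023   # assumed WiFi bounds


def to_log_domain(cw):
    """L: CW -> y, via lambda = 2 / (CW + 1) and y = log(lambda / (1 - lambda))."""
    lam = 2.0 / (cw + 1)
    return math.log(lam / (1.0 - lam))


def to_cw(y):
    """Inverse of L: since (1 - lambda) / lambda = exp(-y), CW = 1 + 2 * exp(-y)."""
    return 1.0 + 2.0 * math.exp(-y)


def project(x, a, b):
    """Projection of x onto the nonempty interval [a, b]."""
    return max(min(b, x), a)


def perturbed_cw(y, eps, delta):
    """Discrete CW applied during one measurement slot (ceiling as in Algorithm 2)."""
    cw = math.ceil(to_cw(y + eps * delta))
    return int(project(cw, CW_MIN, CW_MAX))   # extra safety clamp (assumption)


def update(y, grad, eta, delta):
    """Gradient step followed by projection onto [A + delta, B - delta]."""
    a, b = to_log_domain(CW_MAX), to_log_domain(CW_MIN)   # L is decreasing in CW
    return project(y - eta * grad, a + delta, b - delta)


y = to_log_domain(63)
cw_plus = perturbed_cw(y, +1, 0.2)    # larger y means a more aggressive node
cw_minus = perturbed_cw(y, -1, 0.2)
```

Because $L$ is decreasing in $CW$, a positive perturbation of $y$ yields a smaller contention window and a negative one a larger window, so the two measurement slots probe a more and a less aggressive setting around the current operating point.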
To evaluate the impact of action synchronization among the distributed nodes, we target the distributed scenario with a multi-step approach that gradually removes the coordination between nodes:
Coordinated Learning: In the case of full coordination, the measurement slots of agents are synchronized, i.e., $p_i = 0$ for $i \in \{1, \ldots, N\}$. Specifically, agents perform the gradient estimation procedure and update the CW at the same time. Therefore, the utility function is evaluated with constant variables in each time slot.
Slotted Learning: In the second step, we remove the stage coordination between agents, i.e., the time is still slotted, and agents enter the gradient estimation procedure at the slot boundaries. However, they might be at a different stage of the procedure as $p_i \in \{0, \tau\}$ for $i \in \{1, \ldots, N\}$. As a result, they perform a CW update at two different time points. The environment state does not change during a single measurement period. However, it changes in each slot. Hence, agents perform the first measurement (i.e., $g_k^+$) under a different environment state than the second one (i.e., $g_k^-$).
Uncoordinated Learning: In the general case, the actions of distributed agents are not synchronized, and at any point in time $t$ each agent is at a different stage of the algorithm. Note that from each agent's perspective, the environment state could change $N - 1$ times within a single measurement period.
VI. PERFORMANCE EVALUATION
We evaluate the distributed contention window tuning algorithm by means of simulations using the ns-3 network simulator [14] in conjunction with ns3-gym [15]. Specifically, we use the model for IEEE 802.11n in infrastructure-based mode and create multiple overlapping WiFi networks. Note that nodes belonging to separate networks cannot communicate. If not stated otherwise, we create a fully-connected topology (i.e., a single collision domain), where nodes are uniformly spread in an area of 10-by-10 m; hence, every transmitter can sense ongoing transmissions. In order to change network contention conditions, we vary the number of transmitting nodes. Moreover, we change the data rates and frame sizes to influence the solution of the proportional-fairness problem. To turn off the BEB procedure and enable simple uniform back-off, we assign the same value of $CW$ to $CW_{\min}$ and $CW_{\max}$. We bound the $CW$ value to the range used in WiFi, i.e., $CW \in \{15, 1023\}$, hence the log-transformed variable $y$ is confined to the corresponding range $[L(CW_{\min}), L(CW_{\max})]$ used in Algorithm 2. We show the convergence of the distributed algorithm using the evolution of the contention window, air-time share, and throughput. We compare the performance of our technique against the original IEEE 802.11 BEB technique.
A. Selection of Measurement Slot Duration
First, we evaluate the impact of the measurement slot duration on the quality of the network utility estimates and the algorithm's convergence.
To this end, we consider a scenario with five transmitting stations with homogeneous traffic, i.e., each station is backlogged with 1000 B UDP packets and transmits to its own AP with a data rate equal to 26 Mbps (MCS 3).
Fig. 2 shows the evolution of the contention window value (we show $c = \log_2(CW)$) for each of the transmitting nodes with four slot durations, $\tau \in \{25, 50, 100, 200\}$ ms. We observe higher variability for smaller values of $\tau$, which is expected due to the random nature of the frame transmissions. Specifically, with smaller values of $\tau$, the nodes cannot collect enough statistics to accurately estimate the network utility. These effects are alleviated by increasing the duration of $\tau$ so that the throughput estimates become more accurate, but at the cost of longer convergence time because updates are less frequent. For further evaluation, we select $\tau = 200$ ms, and leave its optimization as future work.
B. Homogeneous Traffic
Here, we examine the sensitivity of the proposed distributed algorithm to the type of coordination and learning parameters. Moreover, we vary the number of active nodes with $N \in \{2, 10\}$. We use the same homogeneous traffic parameters as in Section VI-A. Note that the exploration parameter $\delta$ and the learning rate $\eta$ were adequately selected to provide good performance.
Fig. 3 and Fig. 4 show a representative evolution of individual contention windows, total network throughput, as well as allocation of air-time in the scenario with two and ten nodes, respectively. The same initial $CW$ values are used in every setting. In the case of two nodes, one starts with $CW = 1023$ and the second one with $CW = 15$. In the case of ten nodes, each starts with a $CW$ value equal to $2^c$, where $c \in [4, 10]$. First, we observe that the distributed nodes converge to similar $CW$ values, as expected because of the homogeneous traffic, and the algorithm converges to equal air-time allocation and optimal total network throughput. Note that in the case of ten active nodes, the total throughput is around 20% higher than that achieved by WiFi. The algorithm converges faster in the case of ten nodes, as the utility function becomes steeper with an increased number of nodes [11]. The variability of $CW$ is higher for ten nodes, as the node estimates become noisier. Nevertheless, the network operates with approximately optimal utility. Moreover, in the case of two nodes, we observe an interesting cooperative behavior where the initially more aggressive node slows down to free more air-time for its peer, then both nodes increase their aggressiveness to maximize the network utility.
Fig. 2. Convergence of individual CW under various measurement slot durations $\tau \in \{25, 50, 100, 200\}$ ms. (Panels: one per slot duration, uncoordinated learning; y-axis: CW exp.)
Fig. 3. Convergence of the proposed algorithm with two transmitting nodes. (Columns: Coordinated Learning, Slotted Learning, and Uncoordinated Learning with three learning rates; rows: CW exp. of Node 1 and Node 2, Throughput [Mbps] with Total, WiFi, and Opt. curves, Air-Time [%].)
Fig. 4. Convergence of the proposed algorithm with ten transmitting nodes. (Columns: Coordinated Learning, Slotted Learning, and Uncoordinated Learning with three learning rates; rows: CW exp., Throughput [Mbps] with Total, WiFi, and Opt. curves, Air-Time [%].)
Our results also show that the distributed algorithm behaves similarly with and without learning coordination, i.e., there is no significant advantage to coordination.
Finally, the three right-most columns in Fig. 3 and Fig. 4 show the convergence behavior for three different learning rates $\eta$. Increasing the value of $\eta$ allows the algorithm to take bigger steps and to converge faster in the case of two nodes. However, when $N = 10$, the higher learning rate brings more fluctuations due to increased noise in the utility estimates. The evaluation with varying exploration parameters was skipped due to the space limit.
C. Heterogeneous Traffic
Here, we evaluate the performance of the algorithm under heterogeneous traffic, where the optimal CW values are not identical for all transmitting nodes. We consider two scenarios, where three nodes use diverse transmission parameters: i) the same data rate (MCS3, 26 Mbps), but three different packet sizes $D_i$; ii) the same packet size (1500 B), but three different data rates $M_i$.
In Fig. 5 and Fig. 6, we compare the performance of our distributed approach with standard WiFi operation. Specifically, we are interested in the air-time allocation, individual throughputs, and packet rates. Due to the symmetric contention process, WiFi assures an equal number of transmission opportunities to all nodes, i.e., frame-fairness. Our results show the performance anomaly problem in WiFi, i.e., despite using a higher data rate, the performance of the faster nodes is capped at that of the slowest station. In contrast, the proposed algorithm allows nodes to successfully and quickly converge to equal air-time allocation (i.e., proportional-fair allocation) in both scenarios.
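The contrast between frame-fairness and air-time fairness can be reproduced with a back-of-the-envelope calculation. The sketch below is an idealized illustration that ignores collisions, back-off, and protocol overheads; the three data rates are example values only and the function names are hypothetical.

```python
import math


def frame_fair(rates_mbps, payload_bits):
    """Each station sends one frame per round, so all obtain the same throughput."""
    round_time_us = sum(payload_bits / r for r in rates_mbps)  # bits / Mbps = µs
    throughput = payload_bits / round_time_us                  # Mbps per station
    airtime = [(payload_bits / r) / round_time_us for r in rates_mbps]
    return [throughput] * len(rates_mbps), airtime


def airtime_fair(rates_mbps):
    """Each station gets an equal air-time share, so throughput scales with rate."""
    n = len(rates_mbps)
    return [r / n for r in rates_mbps], [1.0 / n] * n


def utility(throughputs):
    """Network utility: sum of the logarithms of the individual throughputs."""
    return sum(math.log(s) for s in throughputs)


rates = [6.5, 26.0, 65.0]            # Mbps, example values
ff_tput, ff_air = frame_fair(rates, payload_bits=12000)
af_tput, af_air = airtime_fair(rates)
```

In this idealized setting, frame-fairness caps every station below the slowest data rate while the slowest station occupies most of the air-time, whereas the equal air-time split yields a higher sum of log-throughputs.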
D. Dynamic Scenario
Using a setup similar to that of the previous sections, we evaluate the adaptability of the proposed algorithm to network dynamics. Specifically, we consider a dynamic scenario with three transmitters that change data rates (e.g., due to a change of wireless propagation conditions). Fig. 7 shows the individual air-time allocation and throughput. The traffic is saturated, and nodes use a packet size of 1000 B. The nodes start with the same data rate of 13 Mbps (MCS2). At time $t = 20$ s, the nodes change their data rates, i.e., Node-1 switches to 65 Mbps (MCS7) and Node-2 switches to 6.5 Mbps (MCS0). At $t = 60$ s, the nodes change data rates again, i.e., Node-1 switches to 6.5 Mbps (MCS0) and Node-2 switches to 65 Mbps (MCS7). Node-3 never changes its data rate.
We observe in Fig. 7 that the algorithm can adapt to the changes in data rates. The convergence takes around 10 s after the change occurs.
Fig. 5. Air-time allocation, individual throughput and packet rate in case of heterogeneous traffic, i.e., all nodes use the same packet size of 1500 B, but different data rates $M_i$. (Panels: WiFi and Uncoordinated Learning; rows: Air-Time [%], TX Data [Mbps], TX Frames per second for Node 1, Node 2, Node 3.)
Fig. 6. Air-time allocation, individual throughput and packet rate in case of heterogeneous traffic, i.e., all nodes use the same data rate of 26 Mbps (MCS3), but different packet sizes $D_i$. (Panels: WiFi and Uncoordinated Learning; rows: Air-Time [%], TX Data [Mbps], TX Frames per second for Node 1, Node 2, Node 3.)
Fig. 7. Air-time allocation and individual throughput in the dynamic scenario. Nodes start with the same data rate. At $t = 20$ s and $t = 60$ s, nodes 1 and 2 change the used data rate. (Rows: Air-Time [%], TX Data [Mbps].)
E. Unsaturated Traffic
Here, we evaluate the behavior of the proposed algorithm under unsaturated traffic conditions. Specifically, we simulate two scenarios with three transmitting nodes, in which: i) the total offered load does not saturate the wireless channel; ii) the total offered load saturates the wireless channel as one node operates with saturated traffic. The per-node offered loads $r_i$ (in pkts/s) differ between the two scenarios. The nodes use homogeneous transmission parameters, namely a data rate of 26 Mbps (MCS3) and a packet size of 1000 B.
Fig. 8 shows the evolution of the individual contention window and packet rate in both scenarios. We observe that in the case of unsaturated traffic the nodes increase their aggressiveness until the offered load is satisfied. Afterward, they operate with a stable $CW$, i.e., the perturbation of the $CW$ does not change the value of the network utility, and hence the estimated gradient equals zero. In the second scenario, the node with a high traffic load increases its aggressiveness until it saturates the wireless channel, but without negatively affecting the slower nodes, i.e., they also adapt their CW properly to get enough transmission opportunities and satisfy their own traffic.
Fig. 8. Individual contention window and packet rate in the unsaturated scenario (left) and with one node saturated (right). (Uncoordinated Learning; rows: CW exp., TX Frames per second for Node 1, Node 2, Node 3.)
F. Flow-In-the-Middle Topology
Finally, we evaluate the behavior of the proposed scheme in a flow-in-the-middle (FIM) topology (Fig. 9). The FIM topology is a simple multi-collision domain scenario, where nodes have asymmetric contention information. Specifically, the central transmitter can carrier sense transmissions of both its neighbors, while the edge transmitters cannot carrier sense each other. Therefore, the middle transmitter defers its transmissions whenever at least one of its neighbors transmits a frame, while concurrent transmissions of the edge nodes can occur. Note that the transmissions of the edge nodes may interleave, leaving no silent periods for the middle node. As a result, the throughput of the middle node is lowered due to the lack of transmission opportunities. In the worst case, the middle node suffers from complete starvation [16].

Fig. 9. Flow-In-the-Middle (FIM) Topology. In the network graph, vertices, dotted lines, and arrows represent nodes, connectivity, and flows, respectively.

We simulate a scenario where all three transmitters use the same data rate of 26 Mbps (MCS3) and a packet size of 1500 B. The traffic is saturated. In the FIM topology, only the middle transmitter can estimate the global utility function, while the edge nodes can estimate the utility function only in their collision domain. We consider two cases of CW learning, namely uncoordinated learning, where nodes locally estimate the utility function, and coordinated learning, where nodes perform the gradient estimation procedure synchronously and the global function computed by the middle node is communicated to the edge nodes.

Fig. 10 shows the evolution of the individual CW and the air-time allocation, while Fig. 11 shows the air-time allocation averaged over the simulation duration (i.e., 100 s). Our results confirm the problem of starvation of the middle node in the case of standard WiFi BEB operation. Specifically, the middle node gets only 9% of the channel air-time, while the edge nodes get around 70%.
We also show the optimal air-time allocation found with extensive simulations. The proposed algorithm improves the fairness among nodes, i.e., the middle node gets assigned more air-time while the edge nodes get proportionally less. When using the global utility, the nodes can find the CW values leading to the optimal solution. However, with local utilities, the middle node becomes too aggressive. Moreover, as the goals of the three nodes are not consistent, the distributed algorithm cannot converge. Instead, the algorithm oscillates between two solutions, i.e., the middle node wants to achieve proportional fairness for three nodes, while the edge nodes optimize their operation for two nodes.

[Figure: panels "with Local Reward" and "with Global Reward" showing CW exp. and air-time [%]; Uncoordinated Learning, δ = 0 . η = 0 .]

Fig. 10. The individual contention window and air-time allocation in the FIM topology for uncoordinated learning with locally estimated utility (left) and coordinated learning with global utility (right).
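
The learning rule evaluated in this section, with perturbation δ and step size η, can be illustrated on a toy model. The following is a minimal sketch under stated assumptions, not the simulated implementation: it assumes a single collision domain with a slotted-ALOHA-style throughput model (node i succeeds in a slot with probability p_i ∏_{j≠i}(1 − p_j)), uses the attempt probability p_i as a stand-in for the contention window, and replaces the perturbation schedule of the evaluated algorithm with a deterministic central finite difference. All function names and parameter values are hypothetical.

```python
import math


def throughputs(p):
    """Per-slot success probability of each node (toy model, assumption)."""
    result = []
    for i, p_i in enumerate(p):
        others_idle = 1.0
        for j, p_j in enumerate(p):
            if j != i:
                others_idle *= 1.0 - p_j
        result.append(p_i * others_idle)
    return result


def utility(p):
    """Proportional-fair utility: sum of log-throughputs over all nodes."""
    return sum(math.log(x) for x in throughputs(p))


def kw_step(p, delta, eta, lo=0.01, hi=0.99):
    """One Kiefer-Wolfowitz-style ascent step.

    Each node perturbs its own attempt probability by +/- delta, estimates
    the utility gradient by a central finite difference, and moves by
    eta * gradient. Values are clipped to [lo, hi] so the logs stay finite.
    """
    new_p = []
    for i in range(len(p)):
        up = list(p)
        dn = list(p)
        up[i] = min(hi, p[i] + delta)
        dn[i] = max(lo, p[i] - delta)
        grad = (utility(up) - utility(dn)) / (up[i] - dn[i])
        new_p.append(min(hi, max(lo, p[i] + eta * grad)))
    return new_p


def run(p0, delta=0.01, eta=0.01, steps=2000):
    """Iterate the update from the initial attempt probabilities p0."""
    p = list(p0)
    for _ in range(steps):
        p = kw_step(p, delta, eta)
    return p


# Three nodes starting from deliberately different attempt probabilities.
p_final = run([0.05, 0.2, 0.6])
print([round(x, 3) for x in p_final])
```

In this toy model the utility is maximized at p_i = 1/n, so the three nodes should settle near 1/3 regardless of their starting points. Increasing η speeds up convergence but can cause overshoot, which is one reason δ and η need careful configuration.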
VII. CONCLUSIONS
In this paper, we apply a distributed Kiefer-Wolfowitz algorithm to a wireless network optimization problem. We address the distributed tuning of contention windows in WiFi networks and show that nodes can learn the proper contention window values yielding proportional-fair resource allocation even without explicit communication. Specifically, they can estimate the global utility function by overhearing each other's transmissions and use it to solve the log-convex optimization problem collaboratively. We evaluated the algorithm in multiple scenarios and with various levels of coordination between nodes (i.e., time and action synchronization). We conclude that based on the KW algorithm, one can design simple yet powerful learning-based collaboration schemes, which require neither communication nor coordination among agents. However, the parameters of such an algorithm have to be carefully configured to make the convergence speed practical. The automatic tuning of the parameters is left for further study.

[Figure: bar chart of air-time [%] for Left TX, Middle TX, and Right TX under WiFi, Uncoordinated Learning, Coordinated Learning, and Optimal.]

Fig. 11. The averaged air-time allocation attained by 802.11 WiFi and using the proposed learning-based approach in the FIM topology. Optimal air-time allocation is presented for comparison.

REFERENCES
[1] M. Heusse, F. Rousseau, G. Berger-Sabbatel, and A. Duda, “Performance anomaly of 802.11b,” in IEEE INFOCOM, 2003.
[2] P. Patras, A. Garcia-Saavedra, D. Malone, and D. J. Leith, “Rigorous and practical proportional-fair allocation for multi-rate wi-fi,” Ad Hoc Networks, 2016.
[3] F. Kelly, A. Maulloo, and D. Tan, “Rate control in communication networks: shadow prices, proportional fairness and stability,” Journal of the Operational Research Society, 1998.
[4] A. Checco and D. J. Leith, “Proportional Fairness in 802.11 Wireless LANs,” IEEE Communications Letters, 2011.
[5] D. J. Leith, V. G. Subramanian, and K. R. Duffy, “Log-convexity of rate region in 802.11e WLANs,” IEEE Communications Letters, 2010.
[6] J. Walrand, “Convergence of a Distributed Kiefer-Wolfowitz Algorithm,” arXiv e-prints, arXiv:2008.12856, Aug. 2020.
[7] J. Kiefer and J. Wolfowitz, “Stochastic estimation of the maximum of a regression function,” The Annals of Mathematical Statistics, 1952.
[8] Y. Le, L. Ma, W. Cheng, X. Cheng, and B. Chen, “Maximizing throughput when achieving time fairness in multi-rate wireless lans,” in IEEE INFOCOM, 2012.
[9] Y. Le, L. Ma, W. Cheng, X. Cheng, and B. Chen, “A Time Fairness-Based MAC Algorithm for Throughput Maximization in 802.11 Networks,” IEEE Transactions on Computers, 2015.
[10] M. Laddomada, F. Mesiti, M. Mondin, and F. Daneshgaran, “On the throughput performance of multirate IEEE 802.11 networks with variable-loaded stations: analysis, modeling, and a novel proportional fairness criterion,” IEEE Transactions on Wireless Comm., 2010.
[11] G. Famitafreshi and C. Cano, “Achieving Proportional Fairness in WiFi Networks via Bandit Convex Optimization,” in Machine Learning for Networking. Springer International Publishing, 2020.
[12] G. Bianchi, “Performance analysis of the IEEE 802.11 distributed coordination function,” IEEE Journal on Selected Areas in Com., 2000.
[13] J. C. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation,” IEEE Transactions on Automatic Control, 1992.
[14] ns-3 network simulator, , accessed: 2021-02-05.
[15] P. Gawłowicz and A. Zubow, “ns-3 meets OpenAI Gym: The Playground for Machine Learning in Networking Research,” in ACM MSWiM, 2019.
[16] E. Aryafar, T. Salonidis, J. Shi, and E. Knightly, “Synchronized CSMA Contention: Model, Implementation, and Evaluation,”