A Model of WiFi Performance With Bounded Latency
Bjørn Ivar Teigen, Neil Davies, Kai Olav Ellefsen, Tor Skeie, Jim Torresen
Bjørn Ivar Teigen, University of Oslo
Neil Davies, Predictable Network Solutions
Kai Olav Ellefsen, University of Oslo
Tor Skeie, University of Oslo
Jim Torresen, University of Oslo

Submitted for review to SIGCOMM, 2021
ABSTRACT
In September 2020, the Broadband Forum published a new industry standard for measuring network quality. The standard centers on the notion of quality attenuation. Quality attenuation is a measure of the distribution of latency and packet loss between two points connected by a network path. A vital feature of the quality attenuation idea is that we can express detailed application requirements and network performance measurements in the same mathematical framework. Performance requirements and measurements are both modeled as latency distributions. To the best of our knowledge, existing models of the 802.11 WiFi protocol do not permit the calculation of complete latency distributions without assuming steady-state operation. We present a novel model of the WiFi protocol. Instead of computing throughput numbers from a steady-state analysis of a Markov chain, we explicitly model latency and packet loss. Explicitly modeling latency and loss allows for both transient and steady-state analysis of latency distributions, and we can derive throughput numbers from the latency results. Our model is, therefore, more general than the standard Markov chain methods. We reproduce several known results with this method. Using transient analysis, we derive bounds on WiFi throughput under the requirement that latency and packet loss must be bounded.
In September 2020 the Broadband Forum published a new industry standard for measuring network quality [10]. The standard is called "Quality Attenuation Measurement Architecture and Requirements".
Quality attenuation is a measure of the latency and packet loss performance of packet-switched networks. In this light, we revisit established modeling methodologies for the WiFi protocol, because most previous work on modeling the 802.11 protocol has focused on analysis of throughput values only [2, 9, 19, 20]. Throughput analysis can be used to calculate the WiFi link's average latency [4], but average latency is not sufficient to model Quality of Experience (QoE) [15].

"'Performance' is typically considered as a positive attribute of a service. However, a perfect service would be one without error, failure or delay, whereas real services always fall short of this ideal; we can say that their quality is attenuated relative to the ideal" [18]. The quality attenuation (abbreviated ΔQ) concept has been developed through several decades of academic work [5, 7, 12, 15, 18]. Thompson and Davies [18] present a framework for performance management based on the notion of quality attenuation. The ΔQ framework centers on the assertion that network performance should be defined as the amount of latency and packet loss the network introduces. ΔQ can be modelled as the distribution of latency introduced by each hop along the network path, with packet loss modelled as infinite latency. [18] shows how the ΔQ of a network link can be used to model application performance over that link. In particular, the tail of the latency distribution at each hop is important for the end-to-end performance of an application, especially when many hops are involved in transmitting data. Understanding the tail of the latency distribution is therefore key to understanding network performance as seen from the end-user perspective.

The model described by Bianchi [2] is perhaps the most influential WiFi model in the literature. The analysis in [2] is performed using steady-state analysis of a Markov chain description of the WiFi protocol.
Such steady-state analysis is suitable for analyzing average throughput over long time-scales. Throughput is proportional to average latency when the system is saturated, as shown in [4], and throughput and average latency therefore represent the same information about the system performance at saturation. However, for the purposes of relating WiFi performance to application-level outcomes, we need a more complete description of the latency distribution. This work presents a novel WiFi model that describes complete latency distributions and packet loss probabilities. We can describe long-term average throughput by computing the average latency, and our method is thus more general than the steady-state Markov chain analysis approach.

This work analyzes the latency and packet loss performance of the WiFi protocol. We propose a method that explicitly models latency and packet loss. Evaluating our model requires more computational resources than a comparable Markov chain method, but we gain the ability to do transient analysis because we do not rely on the system being in a steady state. We compute throughput values from the latency results and show that our throughput values match those derived by Markov chain methods. We then derive bounds on throughput under the requirement that latency and packet loss must be bounded. We also reproduce some known results such as the WiFi performance anomaly [11]. Our model is suitable for analysis of application-layer performance using the methods described in [15, 18] because it accurately models the tail of the latency distribution.

Section 2 lays out the most relevant related work. We explain our method and its application to WiFi in section 3. In section 4, we validate the convergence and accuracy of our model.
In section 5, we use our model to find an upper bound on WiFi throughput with latency guarantees. We expand on the work on upper bounds in section 6 by exploring the impacts of several improvements to the WiFi protocol. Finally, we conclude the work in section 7. This work does not raise any ethical issues.
Reeve [15] shows that mean value analysis of network latency is not sufficient to model application performance. One reason why average latency does not capture the notion of performance is that network users care about how reliably an outcome is delivered on time. We require a model that can capture the risk of not delivering the desired outcome in a specified time. Mean latency is not at all sufficient to capture this risk. Consider a network that loads a website in 1 millisecond 99 out of 100 times, but once every 100th time, loading the website takes 10 seconds. The average delay of this outcome is only about 100 ms. An observer monitoring this network using average values only might conclude that a 100 ms load time is very reasonable, but the unpredictable behavior is likely to annoy users.

Thompson and Davies [18] show how the notion of quality attenuation is related to the probability of delivering application outcomes in time.

Bianchi [2] models the WiFi distributed coordination function (DCF) using a Markov chain, and in doing so makes a few key simplifications. The most important simplification is to abstract away the details of delays in the model. A time-step in the model is defined by the value of the back-off counters. That is to say, the model does not separate the case in which the medium is idle from the case in which the station (STA) has to wait for another transmission to complete before the back-off counter is decremented. In other words, the time-steps are defined in terms of the model state, not how much actual time has passed. Defining the time-steps in terms of back-off counter values is a useful simplification for a Markov chain analysis, but it comes at the cost of discarding timing information. Bianchi also points this out [2, Section IV, A].

Tinnirello et al. [19] extend the methodology of Bianchi [2]. Here, a Markov chain is solved for the steady-state distribution of back-off timer values.
While this approach was chosen to better model the different channel access probabilities of the Wireless Multi-Media (WMM) extension of WiFi, this method is also closer to modeling latency distributions. Tinnirello et al. deal with the distribution of back-off timers instead of simply the probability of packet transmission. Their model still makes simplifications that hide latency information because the model does not deal with differences in transmission times due to different data rates. Heusse [11] shows that differences in data rates are very important for WiFi performance. A time-step in Tinnirello's model is defined as the period from the end of one transmission or collision to the end of the next transmission or collision. Thus, this model does not take into account that the size of the transmission time interval may change. That is not to say this model is incorrect, only that it lacks fidelity in modeling latency.

In [9], Engelstad et al. expand the model of [2] to include the Access Categories of 802.11e and to handle both saturated and unsaturated networks. The model is validated by comparison to a simulation, but only throughput numbers are reported.

Youm and Kim [21] derive latency distributions from the model of Bianchi [2]. Their analysis begins by assuming the system is in the steady state and assigning the appropriate latency values to each of the transitions. This approach is similar to the one we present in that it computes latency distributions by explicitly modelling the latency of each possible transition from one state to the next. Our work differs from that of Youm and Kim by avoiding the assumption that the system starts out in the steady state. Avoiding this requirement allows us to perform transient analysis starting from any system state.

From our review of previous work we see a lack of analysis of the latency and packet loss performance of WiFi.
We address this shortcoming by developing a novel model of the 802.11 WiFi protocol that allows for the exact computation of the statistical distribution of latency and packet loss.
Our model is based on the quality attenuation framework (ΔQ framework) [18]. Conceptually, quality attenuation is the delay and loss introduced by a network element. It describes how far the network element is from delivering the "perfect service" of zero delay and no chance of loss. Quality attenuation, denoted by ΔQ, can be described by an improper random variable describing the probability distribution over the possible latency of an outcome. The variable is improper because there can be some chance that the outcome is never delivered. The possibility of an outcome never being achieved is modeled by including infinity in the domain of the random variable.

Analyzing the quality attenuation of the WiFi protocol is a matter of describing the latency of every possible path through the protocol state machine. The ΔQ framework provides the tools for adding up the contributions (the ΔQ's) from each possible path so that the complete system's latency distribution can be accurately described.

Our model of the WiFi protocol describes the protocol state machine as a labeled transition system. A Labeled Transition System (LTS) is a directed multigraph with labeled edges (see the example LTS in Figure 1). Nodes in the graph represent states of the protocol state machine, and edges represent transitions between the states. Each edge is labeled with a ΔQ value, which describes the distribution of time needed to complete the transition and includes the probability that the transition never terminates. Each edge also has a probability associated with it, which describes the likelihood of the system progressing through that edge conditioned on the system being in the source state of the edge. In the LTS shown in Figure 1, each transition is labeled with a distribution of possible delays and the probability of that transition from the source state.

Figure 1: An example labeled transition system.

To make the states of the LTS Markovian, we transform the LTS so that every possible path to each state ends up in a separate copy of that state.
In the unrolled LTS shown in Figure 2, new indices are added to distinguish different versions of the states from Figure 1. The cost of this transformation is a large (possibly infinitely large) increase in the size of the state space. If the LTS contains loops, this transformation will introduce infinite recursion, as shown in Figure 2. We can solve this problem by defining a maximum latency and equating any path that exceeds this limit with a loss. The WiFi protocol state machine contains no loops, so introducing a maximum latency is unnecessary in this case, but we include the idea here to show the generality of the method.

Figure 2: The unrolled labeled transition system.

Our goal is to compute the time required for the LTS to evolve from a given starting state to some state which represents the desired outcome. This calculation includes answering the question: "Knowing the starting state, how long does it take to reach a given state?" For states that can only be reached by a single path through the graph, we simply sum the latency contributions of each transition along that path. In the unrolled version of the LTS (see Figure 2) we have, by design, only one path to every state. Because the latency of each edge is described by an improper random variable, the total ΔQ of a sequence of transitions can be calculated by convolving the ΔQ of each transition [5] (see Figure 3).

Figure 3: ΔQ convolution.

When we have more than one path from the starting state to the target state, we cannot compute the arrival-time distribution to the target state with convolution alone. In this case, we create a mixture distribution consisting of the latency along each possible path. The weights in the mixture distribution are determined by the probability of taking each of the paths (see Figure 4).

Figure 4: ΔQ mixture density.

There are only two possible outcomes for a packet going through the WiFi protocol stack: successful transmission or packet loss. Consider the branches that go to the state "Done" in Figure 1. We unroll the LTS as illustrated in Figure 2 and perform the convolution operation above on each of the paths ending in a copy of the "Done" state. Now, we know the latency distribution associated with each possible path to the "Done" state. The total latency of the done state in the not-unrolled LTS is, therefore, the mixture density formed by weighting each of the possible latency distributions that terminate in a copy of the "Done" state by their respective probabilities (see Figure 5).

Figure 5: The reduced form of the WiFi protocol model.
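To make the two composition rules concrete, the following is a minimal numerical sketch (ours, not the paper's implementation) in which each ΔQ is discretized into a list of per-time-slot probabilities and the missing probability mass is the loss probability:

```python
# A discretized improper random variable (Delta-Q): pmf[i] is the probability
# that the transition takes i time slots. The pmf may sum to less than 1;
# the missing mass is the probability of loss (infinite latency).

def loss_prob(pmf):
    return 1.0 - sum(pmf)

def convolve(pmf_a, pmf_b):
    # Sequential composition: latencies add, so the pmfs convolve.
    # The loss mass stays missing: a packet lost in either transition
    # never reaches the target state.
    out = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, pa in enumerate(pmf_a):
        for j, pb in enumerate(pmf_b):
            out[i + j] += pa * pb
    return out

def mixture(paths):
    # Alternative paths to the same state: weight each path's pmf by the
    # probability of taking that path.
    n = max(len(pmf) for _, pmf in paths)
    out = [0.0] * n
    for weight, pmf in paths:
        for i, p in enumerate(pmf):
            out[i] += weight * p
    return out

# Example: one transition takes 1 slot with probability 0.9 (0.1 loss);
# the next takes 1 or 2 slots with equal probability and no loss.
a = [0.0, 0.9]
b = [0.0, 0.5, 0.5]
ab = convolve(a, b)  # 2 slots w.p. 0.45, 3 slots w.p. 0.45, loss w.p. 0.1
```

Note that the delivered mass composes multiplicatively under convolution: the sequence `ab` delivers with probability 0.9 × 1.0, so its loss probability remains 0.1.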
The WiFi protocol is "listen before talk": a WiFi station must check that the radio frequency is idle for a certain amount of time before starting a transmission. When a WiFi station begins a transmission procedure, it first selects a random number in the range [0, CW_min] and assigns the number to a back-off counter [17, Section 10.2.2]. "CW" here stands for contention window. If the back-off counter is not zero, the station will wait for a single "slot time". If the radio frequency is sensed to be idle during the waiting period, the back-off counter is decremented by one. When the back-off counter becomes zero, the station starts transmitting. The reason for this somewhat convoluted scheme is that a station cannot listen while transmitting. This limitation means a transmitting station cannot sense that another station is also transmitting at the same time. Therefore, collisions cannot be handled by both stations interrupting their ongoing transmissions. When many WiFi stations compete for access to the frequency, collisions can waste a significant amount of time. The back-off counter mechanism was introduced to reduce the risk of collisions.

When a station is involved in a collision, the station will again choose a random value for its back-off counter. The size of the interval from which random values can be selected is a function of how many times a packet has been retried. The back-off window size doubles with each new retry of the same packet, up to a maximum value of CW_max. maxretries determines how many times to retry a packet before it is dropped. See Table 2 for the values of each parameter.

We represent the state of a WiFi system with n competing stations as an allocation of each of the n stations to a back-off counter and retry counter value. Table 1 shows an example of the state representation. Each entry in the state representation counts how many stations have a specific combination of back-off counter and retry counter values. In the example in Table 1, two stations are at back-off counter value two after zero retries, and one station is at back-off counter value six after one retry. Note that our state representation is similar to that of Bianchi [2, Figure 4].

Table 1: The state representation for the 802.11 model

                      Back-off counter value
                      0   1   2   3   4   5   6   ...
retry 0               0   0   2   0   0   0   0   ...
retry 1               0   0   0   0   0   0   1   ...
...
retry maxretries      ...

For a given state, the 802.11 protocol defines the possible transitions to a next state. Only three cases are possible:

(1) No stations have a back-off counter value of zero, so all stations decrease their back-off counter value by one after one slot time.

(2) Exactly one station has a back-off counter value of zero. This station successfully transmits, spending the time required for transmission. The transmitting station is then finished sending its packet, and it either leaves the system or restarts the back-off procedure with a new packet. The transmitting station resets its retry counter to zero. The remaining stations hold their back-off counters constant for the duration of the transmission.

(3) More than one station has a back-off counter value of zero. This causes a collision, spending the amount of time required for the slowest of the colliding stations to transmit. All the colliding stations then increase their retry counters by one and select a random back-off counter value from the range [0, CW_r], where r is the new retry counter value. The stations not involved in the collision keep their back-off counter values constant for the duration of the collision.

We include the minimum interval between subsequent WiFi transmissions [17, Figure 10.4] in the time required for each transmission. The duration of this interval is SIFS + Slot time, such that the earliest possible time for a transmission following a period with busy medium is
SIFS + 2 ∗ Slot time. This simplifies the model because we do not need to keep track of whether a transmission just occurred or not. Designing the model this way makes it difficult to model the 802.11e extension of WiFi, where the inter-frame space varies for traffic from different access categories. Future work will expand our model in this direction.
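The three transition cases can be sketched as a single step function on the Table 1 state representation. This is our illustrative reading of the rules, not the authors' code: durations are passed in as parameters, the collision duration is simplified to a single fixed transmission time, and a successful station leaves the system as in the transient evaluation.

```python
import random

CW_MIN_EXP = 4    # initial back-off window is [0, 2**4 - 1]
CW_MAX_EXP = 10   # window capped at 2**10 slots
MAX_RETRIES = 6   # drop the packet once retries are exhausted

def new_backoff(retry):
    # The window doubles with each retry, capped at CW_max.
    return random.randrange(2 ** min(CW_MIN_EXP + retry, CW_MAX_EXP))

def step(state, slot_time, tx_time):
    """Advance the state (dict mapping (backoff, retry) -> station count)
    by one transition; return (next_state, elapsed_time) in seconds."""
    at_zero = sum(n for (b, _), n in state.items() if b == 0)
    nxt = {}
    if at_zero == 0:
        # Case 1: nobody transmits; every counter decreases by one slot.
        for (b, r), n in state.items():
            nxt[(b - 1, r)] = nxt.get((b - 1, r), 0) + n
        return nxt, slot_time
    for (b, r), n in state.items():
        if b > 0:
            # Bystanders hold their counters during the transmission.
            nxt[(b, r)] = nxt.get((b, r), 0) + n
        elif at_zero == 1:
            # Case 2: a lone station transmits successfully and (in the
            # transient evaluation) leaves the system.
            pass
        else:
            # Case 3: collision; each collider retries with a doubled
            # window, or is dropped once MAX_RETRIES is exceeded.
            for _ in range(n):
                if r + 1 <= MAX_RETRIES:
                    key = (new_backoff(r + 1), r + 1)
                    nxt[key] = nxt.get(key, 0) + 1
    return nxt, tx_time
```

Repeatedly applying `step` until the state dictionary is empty, while summing the elapsed times, yields one sample of the time-to-empty.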
In this section we explain how we evaluate our model, empirically test its accuracy, and verify that the model converges to the same latency distribution with each evaluation.

The state-space of our WiFi LTS model is large. Assuming the values for maximum retries, CW_min, and CW_max from Table 2, the number of possible configurations of a single station is ∑_{i=0}^{6} 2^{4+i} = 2032. With n stations, there are then 2032^n ways to assign them to back-off and retry counter values. Some of these assignments will be equivalent, but even so, evaluating the evolution of this system for all possible state configurations quickly becomes infeasible as n increases. Progress has been made in solving large-scale semi-Markov models similar to the one we use here, although these models have been solved only up to the order of tens of millions of states [6]. We approach the state-space explosion problem by using Monte Carlo simulation to approximate the evolution of the LTS model.

In this work we evaluate our model in two different ways: the case where every station always has a packet to send, and the case where every station has only a single packet to send. We label these "ergodic evaluation" and "transient evaluation".

Ergodic evaluation. When a station has either successfully transmitted its packet, or the packet has been dropped, the station immediately restarts its back-off process. This corresponds to the saturation conditions used by Bianchi [2]. We call this method of evaluation "ergodic" because it corresponds to the evaluation of an ergodic Markov chain. For this case we run the model forward from a random starting state until we have observed the outcome of 10 packets. We arrive at the throughput numbers by first calculating the latency distribution seen by the head-of-line packet at each station, and then calculating throughput using equation 2 (see section 5.1), appropriately scaled by the number of stations.
Transient evaluation. When a station has either successfully transmitted its packet, or the packet has been dropped, the station leaves the system. We record the time-to-empty, defined as the time at which the last station leaves the system. We call this method of evaluation "transient" because we are essentially modelling the transient response of the system to one packet arriving simultaneously at each of n stations. To compute the distribution of the measured time-to-empty, we evaluate the model starting from a random state 10 times. The ability to do transient analysis of latency distributions is the main advantage of our model over steady-state analysis of Markov chains.

We now establish empirical results for the rate of convergence of our Monte Carlo simulations for both the ergodic and the transient evaluation methods. For these experiments we use the parameters in Table 2 and 5 competing stations. This analysis does not confirm that the results produced are correct, but it shows how consistent the results are across different runs of the Monte Carlo simulation. We have chosen to look at the convergence of the 90th percentile of latency because we are interested in accurately characterizing the tail of the latency distribution. Since events in the tail of the distribution are by definition rare, the 90th percentile takes longer to converge. Therefore, these results are stronger than showing convergence of the mean latency.

Figure 6 shows the distribution of the 90th percentile latency as a function of the number of packet outcomes observed for the ergodic evaluation method described in section 4.0.1. We ran the simulation 1000 times to compute the distributions of the results.

Figure 6: Convergence of the 90th percentile latency estimate as a function of the number of packet outcomes observed for the ergodic evaluation method.

Figure 7 shows the distribution of the 90th percentile time-to-empty as a function of the number of evaluations for the transient evaluation method described in section 4.0.2. We ran 1000 separate simulations starting from a random state and recorded the 90th percentile time-to-empty for each of the simulations.

Figure 7: Convergence of the 90th percentile time-to-empty estimate as a function of the number of evaluations for the transient evaluation method.

We conclude that the 90th percentile of the distribution converges with high probability for both the ergodic and the transient evaluation methods with the number of packet outcomes or evaluations chosen.

To replicate the results of Bianchi [2] we evaluate our model using the ergodic method described in section 4.0.1. We perform the evaluation using the same parameters as Bianchi [2, Table 2], shown in Table 2. Figure 8 shows results for total system throughput as a function of initial back-off window size in the ergodic case, compared to results from [2, Figure 9]. We consider these results sufficiently close to those of [2], which demonstrates that our model accurately describes a saturated WiFi system for a set of different system parameters.

Table 2: Parameters used for comparison to the model of Bianchi [2]

Parameter             Value
Slot time (μs)        50
SIFS (μs)             28
DIFS (μs)             128
PHY Header (bits)     128
MAC Header (bits)     272
ACK (μs)              PHY Header + 14∗8 / base rate
Base rate (Mbit/s)    1
CW_min exponent       4
CW_min                16
CW_max                1024
maxretries            6
CW exponent           CW_min exponent + number of retries

Figure 8: System throughput as a function of initial back-off window size.
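The convergence check can be illustrated in miniature: repeatedly estimate a tail percentile from k samples of a latency distribution and observe that the spread of the estimates shrinks as k grows. The latency source below is a synthetic, heavy-tailed stand-in for the WiFi model, and all names are ours:

```python
import random
import statistics

def sample_latency(rng):
    # Placeholder for one observed packet latency; an exponential tail
    # stands in for the model's latency distribution.
    return rng.expovariate(1.0)

def p90(samples):
    # Empirical 90th percentile of a list of latencies.
    s = sorted(samples)
    return s[int(0.9 * (len(s) - 1))]

def p90_spread(k, runs=200, seed=1):
    # Standard deviation of the 90th-percentile estimate across `runs`
    # independent simulations of k packet outcomes each.
    rng = random.Random(seed)
    estimates = [p90([sample_latency(rng) for _ in range(k)])
                 for _ in range(runs)]
    return statistics.stdev(estimates)
```

With this stand-in, `p90_spread(1000)` is noticeably smaller than `p90_spread(50)`, mirroring how the tail estimates in Figures 6 and 7 stabilize as more packet outcomes or evaluations are observed.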
In this section we explore the relation between latency and throughput in the WiFi protocol. First we show how latency and throughput are related in the ergodic case, and show that latency for the ergodic case grows very quickly as the number of competing stations increases. We then discuss the notion of an upper bound on throughput under the condition that latency and packet loss must be bounded.
Time is related to throughput as shown in equation 1, where T is throughput in packets per second, N is the number of packets sent in some interval, and D is the duration of that interval (in seconds). If we also know the average packet size, for instance in bytes, we can calculate the average throughput in Mbit/s.

T = N / D    (1)

In our model we record the delay of each packet, and so we do not directly measure D in equation 1. Assuming the interval D ends with the transmission of a packet, and that there was no idle time which did not count towards the delay of any of the recorded packets, we can calculate D. Consider the packets from a single station which sends packets back-to-back, as is true in the ergodic case. We denote the latency of packet i by d_i, and assert D = ∑_{i=1}^{N} d_i. Observe that this means throughput is the inverse of the average per-packet delay, as shown in equation 2.

1/T = (1/N) ∑_{i=1}^{N} d_i    (2)

Note that we only consider packets that are not lost, or else the sum of delays would be infinite. This is correct because lost packets do not contribute to the throughput. Lost packets will, however, increase the average latency for the packets that are not lost. This is also in accordance with the method used by Bianchi to calculate mean latency from throughput values [4].

We now proceed to investigate latency and packet loss performance in the ergodic case.
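Equation 2 translates directly into code; the helper names and the example numbers below are ours, chosen for illustration:

```python
# Throughput from per-packet delays (equation 2): with packets sent
# back-to-back, the interval D is the sum of the recorded delays, so
# throughput is the inverse of the mean per-packet delay. Lost packets
# (infinite delay) are excluded from the sum.

def throughput_pps(delays):
    delivered = [d for d in delays if d != float("inf")]
    return len(delivered) / sum(delivered)

def throughput_mbps(delays, packet_bits):
    return throughput_pps(delays) * packet_bits / 1e6

# Example: four packets, one lost; the mean delay of the three delivered
# packets is 40 ms / 3, so throughput is 75 packets per second.
delays = [0.010, 0.020, float("inf"), 0.010]
```

As in the text, dropping the lost packet keeps the sum finite while its retries still inflate the delays of the packets that do get through.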
Figures 9 and 10 show the latency and packet loss performance of the WiFi DCF for different back-off timer values and different numbers of stations. We read packet loss values in Figures 9 and 10 by observing how far the maximum of each CDF is from 1 on the y-axis.

Figure 9: CDF for latency and packet loss with initial back-off window size of 8.

The quality attenuation found in these experiments is so large, especially for a high number of stations, that we argue the throughput results of Figure 8 are of little practical use. Even though total system throughput is very close to the theoretical optimum for the 50-station case with a back-off window size of 1024, the vast majority of user applications will not perform well when running over a network with this much latency and packet loss. Interactive applications such as gaming and video conferencing obviously cannot function well with this much latency, and TCP throughput is severely affected by loss rates as high as in the 50-station case, as shown by Padhye et al. [14]. Our results are consistent with those of Youm and Kim [21]. Note that the results show the latency of a head-of-line packet, and so queuing delays and potential packet loss due to full buffers will come in addition to the delays shown here.

Existing WiFi models mostly evaluate performance under the assumption that the system is in the steady state and that the system is saturated [3, 19, 20, 21]. The results presented here, along with those of [21], show that the latency of WiFi under these conditions is very large. A more complete way of modeling WiFi performance is needed. We therefore argue for a different perspective on WiFi performance modeling and optimization. Instead of looking for the system parameters that will give the best throughput, or the highest system utilization, we should look for the system parameters most likely to deliver good Quality of Experience with typical end-user applications. The main drivers of QoE, as argued in [15], are latency and packet loss. It is therefore crucial that our models accurately capture latency and packet loss performance under realistic conditions.
Figure 10: CDF for latency and packet loss with initial back-off window size of 1024.

We now compute throughput bounds under the condition that latency and packet loss must be kept bounded. We consider bounded latency and packet loss to be the absolute minimum requirement for a good user experience.

The worst-case scenario for a WiFi system with n stations is that all n stations begin their back-off procedure at the same time. We can think of this as all stations being maximally correlated, or as having worst-case correlation between the stations. This scenario has the highest risk of collisions and will therefore lead to the longest possible time-to-empty. We investigate this worst-case correlation scenario to establish an upper bound on the time-to-empty of a WiFi system with n stations.

The time-to-empty of an 802.11b WiFi link is shown in Figure 11 for one to nine stations. To make the results comparable to those of Bianchi [2] we use a channel rate of 1 Mbit/s. The time-to-empty represents the time from all stations simultaneously initiating the back-off process until all stations have either completed the transmission of, or dropped, a single packet. To calculate the distribution of time-to-empty we evaluate the model using the transient evaluation method described in section 4.0.2.

Figure 11: Time-to-empty for n competing stations starting back-off procedures at the same time.

A well-known result from queuing theory states that if, on average, packet arrivals to an unbounded queue occur more frequently than departures from the queue, then the queue length grows toward infinity. Thus, to avoid unbounded latency growth (or packet loss when queues are not infinitely large), the packet arrival rate must be smaller than the service rate. This relationship is expressed by the inequality in equation 3, where λ is the mean arrival rate in packets/second, and E[s] is the mean service time per packet (in seconds).

λ < 1 / E[s]    (3)

We are now ready to find the upper bound on throughput. Our logic is as follows: if the arrival rate is slower than the worst-case service rate, then we know that latency is bounded. Using the mean service time, we calculate an upper bound on the packet arrival rate by setting λ = 1 / E[s]. Because the time-to-empty represents the time for the system to process one packet from each station, the upper bound on throughput is reached when one packet arrives at each of the stations every mean time-to-empty. The throughput in Mbit/s can then be calculated, assuming we know the PHY rate and the packet size. The upper bound on throughput is shown in Figure 12 for one to seven stations.
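Once the mean worst-case time-to-empty is known, the bound itself is a one-line calculation. The function below is our sketch of that step; the numeric inputs are illustrative placeholders, not results from the paper:

```python
# Upper bound on offered load (equation 3): latency stays bounded if one
# packet arrives at each of the n stations at most once per mean
# worst-case time-to-empty, i.e. the arrival rate equals 1 / E[s].

def throughput_bound_mbps(n_stations, mean_time_to_empty_s, packet_bits):
    # n packets (one per station) are served per time-to-empty interval.
    packets_per_second = n_stations / mean_time_to_empty_s
    return packets_per_second * packet_bits / 1e6

# e.g. 5 stations, a 60 ms mean time-to-empty, and 8184-bit packets:
bound = throughput_bound_mbps(5, 0.060, 8184)
```

Because the mean time-to-empty grows faster than linearly with the number of stations, the per-station share of this bound shrinks as stations are added, which is the effect visible in Figure 13.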
The through-put shown in Figure 12 is total system throughput, so tocompute the per-station throughput, we must divide by thenumber of stations. The per station throughput is shown inFigure 13. These results use the parameters shown in table2, which are the same as those in Bianchi’s analysis[2]. Wenow have a tool for guiding the design and configurationof WiFi networks. The upper bound on throughput givesus a way to know whether a given WiFi configuration cansupport a certain set of applications without building queues.We can potentially use this to inform queuing and sched-uling algorithms about the available capacity of the WiFilink so that large delays due to unnecessary congestion canbe avoided. In particular, our results show that WiFi per-formance is very sensitive to the number of simultaneouslyactive stations on a channel. Several methods for improvingWiFi performance by reducing the number of simultaneouslyactive stations have been reported. Saeed et al. [16] proposeda token-based WiFi scheduling algorithm for reducing con-tention overhead. Maity et al. [13] proposed a WiFi schedulerfor TCP downloads which reduces the number of differentstations transmitting ACKs to an access point at the sametime. Channel planning and transmit power managementcan also reduce the number of concurrently active stations Model of WiFi Performance With Bounded Latency Submitted for review to SIGCOMM, 2021
Figure 12: Upper bound on total system throughput for 𝑛 competing stations with bounds on latency and packet loss.

Figure 13: Upper bound on throughput per station for 𝑛 competing stations with bounds on latency and packet loss.

and thus improve WiFi performance [1, 8]. We believe our model can help inform further work on WiFi optimization.

Readers familiar with WiFi performance might object that the throughput bounds presented above are too strict. Indeed, WiFi networks exist today with much greater throughput performance. In this section, we explore some of the improvements that have been made to the WiFi protocol and investigate how each improvement affects the throughput bounds.

Parameter: Value
Slot time (𝜇s): 9
SIFS (𝜇s): 10
DIFS (𝜇s): 28
PHY Header (𝜇s): 24
MAC Header (bits): 272
ACK (𝜇s): PHY Header + 14*8/base rate
Basic rate set (Mbit/s): [1, 2, 5.5, 11, 24]
CW_min exponent: 4
CW_min:
CW_max:
max retries: CW_min exponent + Number of retries
Table 3: Parameters used throughout section 6, unless otherwise specified. These represent the default parameters of 802.11n.
Figure 14: Upper bound on per-station throughput for 𝑛 competing stations using 802.11n parameters (see Table 3).

This section explores the impact of various protocol features introduced in 802.11n and its accompanying amendments. We compute the impact on the throughput bound of Request-to-Send/Clear-to-Send (RTS/CTS) in section 6.2 and of packet aggregation in section 6.3. We also reproduce the "WiFi performance anomaly" first reported by Heusse et al. [11] in section 6.4.
Figure 14 shows the upper bound on per-station throughput assuming throughput is fairly divided and latency and packet loss are bounded. Compared to Figure 13, we see very similar behavior. However, as expected, the throughput is significantly higher using the 802.11n parameters compared to those of [2]. The results in Figure 14 will serve as a baseline comparison for the protocol features explored in this section.

The request-to-send/clear-to-send mechanism in WiFi introduces an extra handshake between sender and receiver. Before a data packet is transmitted, the sender transmits an RTS packet. Upon hearing the RTS packet, the receiver responds with a CTS packet. If the sender hears the CTS packet, the data packet is transmitted. Because the RTS and CTS packets are small, and therefore take little time to transmit, the introduction of the extra handshake can reduce the amount of time spent transmitting whenever a collision occurs. The RTS/CTS mechanism also reduces the impact of hidden node problems.

We can model the RTS/CTS mechanism by increasing the time to complete each transmission by the time required to perform the RTS/CTS handshake. If a collision occurs, the elapsed time is decreased, because the collision is detected by all involved stations when they fail to receive a CTS frame. Because we do not consider hidden nodes in this work, RTS/CTS can only reduce the amount of time spent waiting for colliding transmissions to cease.

We now compare our RTS/CTS results to those of [2, 20]. We calculate the mean time-to-empty for a WiFi system with and without RTS/CTS enabled. We use WiFi parameters equal to those of [2] (see Table 2) and vary the number of stations and the packet size. The packet size is increased in steps of 100 bytes from 100 to 9900 bytes. Figure 15 shows the percentage change in the upper bound on throughput from enabling RTS/CTS.
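The airtime accounting behind this trade-off can be sketched as follows. The timing constants mirror the Table 2/3-style parameters, but the frame sizes and the structure of the functions are assumptions for this example, not the paper's exact model:

```python
# Illustrative airtime accounting for the RTS/CTS trade-off. RTS/CTS adds
# a fixed handshake cost to every success but truncates every collision
# at the length of the short RTS frame.

SLOT = 9e-6
SIFS = 10e-6
DIFS = 28e-6
PHY_HDR = 24e-6
RTS_BITS, CTS_BITS, ACK_BITS = 20 * 8, 14 * 8, 14 * 8  # frame body sizes

def data_airtime(packet_bytes: int, rate_bps: float) -> float:
    """Time on air for one data frame, including the PHY header."""
    return PHY_HDR + packet_bytes * 8 / rate_bps

def success_airtime(packet_bytes, rate_bps, basic_rate_bps, rts=False):
    """Channel time consumed by one successful exchange."""
    t = (data_airtime(packet_bytes, rate_bps)
         + SIFS + PHY_HDR + ACK_BITS / basic_rate_bps + DIFS)
    if rts:  # add the RTS/CTS handshake before the data frame
        t += (PHY_HDR + RTS_BITS / basic_rate_bps + SIFS
              + PHY_HDR + CTS_BITS / basic_rate_bps + SIFS)
    return t

def collision_airtime(packet_bytes, rate_bps, basic_rate_bps, rts=False):
    """Channel time lost to a collision. With RTS/CTS, only the short RTS
    frame collides; without it, the whole data frame is lost."""
    if rts:
        return PHY_HDR + RTS_BITS / basic_rate_bps + DIFS
    return data_airtime(packet_bytes, rate_bps) + DIFS
```

RTS/CTS therefore pays off only once packets are long enough, or collisions frequent enough, for the cheaper collisions to outweigh the per-success handshake cost.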
Positive values indicate that the system with RTS/CTS enabled allows for greater throughput. Our results show a trend similar to those found in [2, 20], but our results indicate that RTS/CTS should be enabled at a smaller packet size and for a smaller number of stations compared to the results of [2, 20]. Whereas Bianchi [2] sets the threshold for enabling RTS/CTS in a system with five stations running at 1 Mbit/s at 3160 bytes, and Tinnirello et al. [20] set the threshold at 800 bytes, our results indicate that RTS/CTS should be enabled for packets above 600 bytes. Figure 16 shows the impact of enabling RTS/CTS for 802.11n WiFi using the parameters listed in Table 3 and a channel rate of 144 Mbit/s. Compared to Figure 15, it is clear that we are more at risk of negatively impacting the system throughput in this case, because the overhead of the RTS/CTS handshake is relatively larger when the channel rate is high. Figure 17 compares the time-to-empty CDF with and without RTS/CTS enabled for 5 stations and a packet size of 1023 bytes. According to the results shown in Figure 16, enabling RTS/CTS reduces total system throughput using these parameters. Figure 17
Figure 15: Heatmap showing the percent impact of enabling RTS/CTS as a function of the number of stations and packet size. Positive values (towards the top right corner) indicate that enabling RTS/CTS increases the upper bound on throughput. The parameters are those listed in Table 2.
Figure 16: Heatmap showing the percent impact of enabling RTS/CTS as a function of the number of stations and packet size. Positive values (towards the top right corner) indicate that enabling RTS/CTS increases the upper bound on throughput. The parameters are those listed in Table 3. The channel rate is 144 Mbit/s.

clearly shows that enabling RTS/CTS reduces the likelihood of high latency, at the cost of added overhead. Increasing predictability at the cost of higher minimum latency may be a desirable trade-off for jitter-sensitive applications, even if the total system throughput decreases.
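Reading tail-latency figures out of a time-to-empty CDF, as in this comparison, can be sketched as below. The CDFs here are made-up (time, probability) points for illustration, not the model's actual output:

```python
# Hedged sketch: comparing tail latency of two hypothetical time-to-empty
# CDFs, each given as sorted (time_seconds, cumulative_probability) pairs.

def quantile(cdf_points, p):
    """Smallest time t with CDF(t) >= p, given sorted (t, F(t)) pairs."""
    for t, f in cdf_points:
        if f >= p:
            return t
    return float("inf")  # remaining mass is in the tail (or lost packets)

baseline = [(0.002, 0.60), (0.004, 0.90), (0.006, 0.99)]
rtscts = [(0.003, 0.70), (0.004, 0.95), (0.005, 0.99)]

# With these made-up numbers, RTS/CTS has the higher minimum latency but
# the shorter 99th-percentile tail.
p99_baseline = quantile(baseline, 0.99)  # 0.006
p99_rtscts = quantile(rtscts, 0.99)      # 0.005
```

Treating the mass beyond the last point as infinite latency matches the quality-attenuation convention of modeling packet loss as infinite delay.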
Figure 17: Time-to-empty distributions for the case of 5 stations and Tx and Rx rates of 144 Mbit/s with different WiFi extensions.
Packet aggregation is a mechanism by which several higher-layer packets (here typically IP packets) are transmitted together over a link, without individual MAC-layer headers. Packet aggregation increases the total throughput by reducing the MAC-layer overhead, because the number of transmit opportunities required to send a given number of IP packets is reduced. The 802.11 protocol describes the following two types of packet aggregation: MAC Service Data Unit (MSDU) aggregation and MAC Protocol Data Unit (MPDU) aggregation. To take advantage of either aggregation mechanism, the station must have buffered several packets with the same destination address. The WiFi protocol imposes a limit on the duration of each transmit opportunity. When this time expires, the station must perform the back-off mechanism.
MAC Service Data Unit aggregation works by grouping several IP-layer packets into a single MAC-layer packet for the WiFi transmission. This grouping reduces the protocol overhead, both in terms of transmission time and waiting time due to the back-off mechanism. Implementing this kind of aggregation in our model is straightforward: increasing the packet size parameter is sufficient. In 802.11n, the maximum A-MSDU size is 7935 octets. Using this packet size, we arrive at the maximum stable throughput per station shown in Figure 18.

We now compare the latency of a WiFi link that uses packet aggregation to one that does not. Figure 17 shows the time-to-empty CDF for a WiFi network with five stations and transmit and receive rates of 144 Mbit/s. "Baseline80211n" uses the exact parameters presented in Table 3. "RTSCTS" uses the RTS/CTS mechanism, and "AMSDU" uses packet aggregation with a packet size of 7935 bytes.
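The reason aggregation raises the throughput bound can be illustrated with a toy calculation: the fixed per-transmit-opportunity cost (back-off, headers, ACK) is paid once per aggregate instead of once per IP packet. The overhead constant below is an assumed round number, not a value from the model:

```python
# Sketch of overhead amortization under A-MSDU aggregation.

PER_TXOP_OVERHEAD_S = 300e-6  # assumed back-off + header + ACK time per TXOP
MAX_AMSDU_BYTES = 7935        # 802.11n A-MSDU size limit

def goodput_mbps(ip_packet_bytes, packets_per_txop, rate_bps):
    """Goodput when `packets_per_txop` IP packets share one transmit
    opportunity: payload divided by overhead plus payload airtime."""
    payload_bits = ip_packet_bytes * 8 * packets_per_txop
    airtime_s = PER_TXOP_OVERHEAD_S + payload_bits / rate_bps
    return payload_bits / airtime_s / 1e6

no_aggregation = goodput_mbps(1500, 1, 144e6)
with_amsdu = goodput_mbps(1500, MAX_AMSDU_BYTES // 1500, 144e6)
# `with_amsdu` exceeds `no_aggregation` because the same fixed overhead
# now carries five packets' worth of payload.
```

As the channel rate grows, the payload airtime shrinks while the overhead does not, so the relative gain from aggregation increases.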
Figure 18: Upper bound on per-station throughput for 𝑛 competing stations with A-MSDU packet aggregation.

MAC Protocol Data Unit aggregation groups several MAC-layer packets into a single transmit opportunity. Once a station has won a transmit opportunity, it can transmit several packets back-to-back without performing the back-off procedure or waiting for individual ACKs between each MAC-layer packet. The main difference between A-MPDU and A-MSDU is that with A-MPDU, each packet that is part of the aggregate can be ACKed separately. Separate ACKs mean the overhead of packet errors is smaller. Because we are investigating the latency induced by the WiFi DCF specifically, we do not explore the details of A-MPDU, other than to note that the DCF performance will be very similar to that of A-MSDU because both methods compete for transmit opportunities in the same manner. Both methods can send similar amounts of data in a single aggregate.
Heusse et al. [11] first described "The WiFi Performance Anomaly," a phenomenon by which a single low-rate station can lay claim to a large portion of the available airtime. This effect emerges because the WiFi DCF is designed to give each station an equal number of transmit opportunities. When one station holds on to its transmit opportunity for a longer period than all other stations each time it wins one, the result is an uneven allocation of airtime resources. Heusse et al. show that when this happens, all stations achieve the same throughput as the station using the lowest transmit rate.

Our WiFi model can readily reproduce this phenomenon. The state representation is modified to include station identifiers such that we can assign different parameters to each station. Then, we assign one station a rate of 1 Mbit/s and vary the rates of the other stations. The resulting throughput per station is shown in Figure 19. As expected, the per-station rate is bounded above by 1 Mbit/s. Our results match those of Heusse et al. [11].

Figure 19: Upper bound on throughput for each of 𝑛 stations when one of the stations transmits and receives at 1 Mbit/s.

In this paper, we have presented and validated a novel method for WiFi performance analysis. Our primary contribution is the modeling of complete latency distributions. At the cost of added computational complexity, explicit latency modeling allows more accurate performance analysis by directly modeling the reliability of network outcomes. Our model does this while retaining the ability to produce throughput numbers. We derive upper bounds for WiFi throughput under the requirement that latency and packet loss are bounded. We also investigate the consequences of RTS/CTS and packet aggregation, and reproduce the result known as "The WiFi performance anomaly".
Our model lets us quantify the impact of trade-offs such as RTS/CTS, where some stability is gained at the expense of added overhead. Using a single framework, we can measure these impacts in terms of throughput and in terms of average latency, jitter, and packet loss. This flexible modeling means we can determine which trade-offs should be made depending on the particular use-case of a given WiFi network. Future work will investigate how this new insight can be used to create WiFi control systems that automatically configure the WiFi network to best suit the needs of specific applications, such as video conferencing.
REFERENCES

[1] Aditya Akella et al. "Self-Management in Chaotic Wireless Deployments". In: Wireless Networks. Vol. 13.6. 2007, pp. 737–755. doi: 10.1007/s11276-006-9852-4.
[2] Giuseppe Bianchi. "Performance analysis of the IEEE 802.11 distributed coordination function". In: IEEE Journal on Selected Areas in Communications.
[3] In: IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC. Vol. 2. IEEE, 1996, pp. 392–396. doi: 10.1109/pimrc.1996.567423.
[4] Giuseppe Bianchi and Ilenia Tinnirello. "Remarks on IEEE 802.11 DCF performance analysis". In: IEEE Communications Letters.
In: Linear Algebra and Its Applications. Vol. 386. 1-3 SUPPL. North-Holland, July 2004, pp. 311–334. doi: 10.1016/j.laa.2003.12.018.
[7] Neil Davies, Judy Holyer, and Peter Thompson. "End-to-end management of mixed applications across networks". In: Proceedings - 1999 IEEE Workshop on Internet Applications. July (1999), pp. 12–19. doi: 10.1109/WIAPP.1999.788012.
[8] Frank Den Hartog et al. "A pathway to solving the Wi-Fi tragedy of the commons in apartment blocks". In:
In: ACM MSWiM 2005 - Proceedings of the Eighth ACM Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems. Vol. 2006. 2006, pp. 224–233. isbn: 1595931880. doi: 10.1145/1089444.1089485.
[10] Broadband Forum. TR-452.1 Quality Attenuation Measurement Architecture and Requirements.
[11] In: Proceedings - IEEE INFOCOM. Vol. 2. 2003, pp. 836–843. doi: 10.1109/infcom.2003.1208921.
[12] Lucian Leahu. "Analysis and predictive modeling of the performance of the ATLAS TDAQ network". PhD thesis. Bucharest, Tech. U., Jan. 2013.
[13] Mukulika Maity, Bhaskaran Raman, and Mythili Vutukuru. "TCP download performance in dense WiFi scenarios: Analysis and solution". In: IEEE Transactions on Mobile Computing.
In: IEEE/ACM Transactions on Networking.
In: Proceedings - International Conference on Network Protocols, ICNP. Vol. 2018-September. IEEE Computer Society, Nov. 2018, pp. 378–388. isbn: 9781538660430. doi: 10.1109/ICNP.2018.00053.
[17] The Institute of Electrical and Electronics Engineers. IEEE Standard for Information technology – Telecommunications and information exchange between systems LAN and MAN – Specific requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE Std 802.11-2016. 2013. url: http://ieeexplore.ieee.org/document/7786995/.
[18] Peter Thompson and Neil Davies. "Towards a RINA-Based Architecture for Performance Management of Large-Scale Distributed Systems". In: Computers.
[19] In: IEEE/ACM Transactions on Networking.
[20] In: Proceedings - 6th IEEE International Symposium on a World of Wireless Mobile and Multimedia Networks, WoWMoM 2005. 2005, pp. 240–248. isbn: 0769523420. doi: 10.1109/WOWMOM.2005.89.
[21] Sungkwan Youm and Eui-Jik Kim. "Latency and Jitter Analysis for IEEE 802.11e Wireless LANs". In: Journal of Applied Mathematics 2013 (2013). doi: 10.1155/2013/792529.