DiffPerf: Towards Performance Differentiation and Optimization with SDN Implementation

Walid Aljoby†, Xin Wang†, Dinil Mon Divakaran∗, Tom Z. J. Fu§, Richard T. B. Ma†
†School of Computing, National University of Singapore  ∗Trustwave, Singapore  §Bigo Technology Pte Ltd, Singapore
{algobi, xin.wang, tbma}@comp.nus.edu.sg, [email protected], [email protected]

Abstract — Continuing the current trend, Internet traffic is expected to grow significantly over the coming years, with video traffic consuming the biggest share. On the one hand, this growth poses challenges to access providers (APs), who have to upgrade their infrastructure to meet the growing traffic demands as well as find new ways to monetize their network resources. On the other hand, despite numerous optimizations of the underlying transport protocol, a user's utilization of network bandwidth, and thus the user's perceived quality, is still largely affected by network latency and buffer size. To address both concerns, we propose DiffPerf, a class-based differentiation framework that, at a macroscopic level, dynamically allocates bandwidth to service classes pre-defined by the APs, and at a microscopic level, statistically differentiates and isolates user flows to help them achieve better performance. We implement DiffPerf on the OpenDaylight SDN controller and a programmable Barefoot Tofino switch, and evaluate it from an application perspective for MPEG-DASH video streaming. Our evaluations demonstrate the practicality and flexibility that DiffPerf provides APs, with capabilities through which a spectrum of qualities is provisioned at multiple classes. Meanwhile, it assists in achieving better fairness and improving overall user-perceived quality within the same class.
I. INTRODUCTION
Today's Internet is dominated by content traffic, especially video streams. According to the Cisco Annual Internet Report [1], video will make up 82% of the total downstream Internet traffic by 2022. In today's home, Internet video drives our work and life, particularly during the COVID-19 pandemic [2], and video applications will continue to place significant demand on bandwidth in the future [1]. To accommodate high traffic, content providers (CPs) have been deploying wide-area infrastructures to bring content closer to users; e.g., Netflix uses third-party content delivery networks such as Akamai and Limelight, and builds its own [3]. However, as end-users rely on last-mile access providers (APs) for accessing the Internet, APs' bandwidth capacity still limits user throughput due to network congestion [4]. For example, the average throughput of Netflix users behind Comcast [5], the largest U.S. broadband provider, degraded 25% from over 2 Mbps in Oct 2013 to 1.5 Mbps in Jan 2014.

To sustain traffic growth, APs need to upgrade network infrastructures and expand capacities; however, their incentives depend on the business model and the corresponding mechanism used to monetize bottleneck bandwidth, which is crucial to the viability of the current Internet model in the future. A general approach used by APs is to differentiate services and prices; e.g., APs provide premium peering [6] options for CPs and multiple data plans for end-users with different data usage. However, the former can only be implemented with large CPs via peering agreements, while the latter does not guarantee the performance of end-users in any sense. Bandwidth allocation is typically a function of the application endpoints, and is traditionally embodied as part of the transport layer's congestion control mechanism. TCP CUBIC and BBR are the most popular such protocols, controlling the majority of Internet traffic.
However, both of them strive for efficient utilization of the bandwidth, while being unaware of how the user's Quality of Experience (QoE) is negatively biased by Round-Trip Times (RTTs) and network buffer size. Consequently, there exists a fundamental mismatch between the differentiated services and the underlying resource allocation that differentiates for predictable performance.

To resolve this mismatch, we consider a class-based differentiation approach, under which CPs and users can choose a service class (SC) to join. We propose DiffPerf, a dynamic performance differentiation framework, at the APs' vantage point to manage their bottleneck bandwidth resources in a principled and practical manner. From a macroscopic perspective, DiffPerf dynamically allocates bandwidth to each SC according to the changing number of active flows in each SC, by maximizing the weighted α-fair utilities, which enables APs to trade off fairness. Nevertheless, users in the same service class might not perceive a fair quality, due to the complex interaction between the transport protocol and inherent network conditions such as heterogeneous RTTs and buffer sizes, as shown in our experimental explorations and known by conventional wisdom [7]. Thus, at a microscopic level, DiffPerf uses a new performance-aware mechanism, called (β, γ)-fairness, to further optimize and make more fine-grained bandwidth allocation within each SC, so as to more efficiently utilize the aggregate capacity and achieve fairer performance for flows. Our main contributions are as follows:

1. We derive the closed-form bandwidth allocation solution and show that this solution achieves guaranteed performance differentiation in terms of controllable ratios of the average per-flow throughput across the different SCs.

2. Within each SC, we present (β, γ)-fairness and a neat statistical method to differentiate and isolate flows automatically based on their achieved throughput, to mitigate the bias brought by the TCP protocol due to its interaction with network latency (i.e., RTT) and buffer size.

3. By leveraging SDN capabilities, we develop a native OpenDaylight (ODL) control plane application that dynamically manages network resources, including tracking flows, inquiring flow statistics and allocating bandwidth capacity. Furthermore, to measure the impact of network buffer sizes, we also implement DiffPerf on a programmable Barefoot Tofino switch, which allows flexible buffer sizing and enables fine-grained and flexible line-rate telemetry.

4. We carry out comprehensive evaluations of DiffPerf from an application perspective for DASH video streaming, a mainstream application that accounts for the majority of Internet video traffic.

We believe that DiffPerf demonstrates a new avenue for APs to differentiate and optimize the performance of video flows and the corresponding perceived user QoE, so as to better monetize their bottleneck network resources. This will further incentivize APs to deploy more bandwidth capacity to accommodate the growth of Internet content traffic.

II. THE DIFFPERF FRAMEWORK
In this section, we present the DiffPerf framework in a top-down manner. We first describe how DiffPerf allocates bandwidth capacity among the SCs, based on an optimization approach. We derive the closed-form allocation solution and show its feature of guaranteed performance differentiation. We then discuss the performance issues arising from the TCP congestion control mechanism as it responds to the heterogeneity of flows' RTTs and the network's buffer sizes. To solve this problem, we show how DiffPerf classifies flows and optimizes bandwidth allocation within each SC.
A. Inter-Class Bandwidth Allocation
We consider an access provider that offers a set S of service classes over a bottleneck link with capacity C. We denote the set of active flows in any service class s ∈ S by F_s and the cardinality of F_s, i.e., the number of flows in class s, by n_s. To differentiate the performance for flows in different service classes, the access provider needs to allocate an appropriate amount of bandwidth to each service class. To accomplish this in a principled manner, we formulate the bandwidth allocation as an optimization of the allocation X = (X_s : s ∈ S) that solves a general utility maximization problem as follows:

  max_X  Σ_{s∈S} n_s U_s(X_s / n_s)                       (1)
  s.t.   Σ_{s∈S} X_s ≤ C  and  X_s ≥ 0,  ∀s ∈ S.          (2)

Under the link capacity constraint (2), the above mathematical program maximizes the aggregate utility over all service classes, where for each service class s, it counts the number of flows n_s multiplied by the per-flow utility U_s(X_s/n_s) over the average capacity X_s/n_s allocated to each flow. In particular, we adopt and generalize the well-known weighted α-fair family of utility functions [8]. In this family of utility functions, each service class s is assigned a weight w_s that indicates the relative importance of the service class, resulting in differentiated per-flow bandwidth allocation across the service classes. By controlling the parameter α, the access provider can express different preferences over various notions of fairness. When α approaches 0, the utility tends to be measured purely by the allocated bandwidth; when α approaches +∞, the solution converges to the weighted max-min fair allocation among the flows. In particular, a trade-off of a weighted proportional fair solution can be obtained by solving the optimization problem when α is set to 1. Thus, besides the differentiation factor w_s among service classes, the service operator can choose the value of α to trade off fairness.

Theorem 1.
If an allocation X maximizes the aggregate utility over all service classes, it must satisfy

  X_s = ( n_s · w_s^{1/α} / Σ_{s'∈S} n_{s'} · w_{s'}^{1/α} ) · C,  ∀s ∈ S.   (3)

Theorem 1 provides the closed-form solution of the utility maximization problem. Based on the optimal allocation solution in Equation (3), we derive the ratio of the average per-flow capacities of any two service classes s, s' ∈ S as

  (X_s / n_s) : (X_{s'} / n_{s'}) = (w_s / w_{s'})^{1/α}.   (4)

This result implies that performance differentiation is achieved by enforcing a fixed ratio for the per-flow bandwidth capacity across SCs, controlled by the weights w_s, w_{s'} and the fairness parameter α. Equation (4) explicitly shows that the optimal solution effectively allocates a higher average per-flow capacity to the service class with the larger weight, which is desirable and expected for the better service class. In particular, we also see that when α is set to 1, the weighted proportional fair allocation leads to average per-flow allocations that are proportional to the weights of the SCs.

B. Intra-Class Bandwidth Allocation
Motivation: Given X_s amount of bandwidth capacity allocated to the n_s flows in SC s, each flow f ∈ F_s is expected to achieve an average throughput of X_s/n_s. However, the actual throughput achieved, denoted by x_f, might be significantly less than the mean. This can adversely affect the QoE that the corresponding user perceives. At the last-mile bottleneck, parameters such as RTT, TCP congestion control algorithm (e.g., CUBIC vs. BBR), and buffer size affect the performance of flows [9], [10], [11]. The heterogeneity of RTTs experienced by the flows, as well as the TCP-based congestion control mechanisms responding differently to the RTTs and network buffer size, lead to multiple competing flows achieving different throughput. We analyzed the performance of 100 competing DASH flows on a testbed, where all flows run TCP BBR and share a bottleneck link connecting to a DASH server. The bottleneck link capacity is set to 120 Mbps, and 30% of the flows experience relatively longer RTTs than the rest. We run the experiments by changing one of the key parameters, i.e., the network buffer size. The experiment results show that the average stalling time of DASH flows at a 10 MB network buffer size is 35% higher than that of DASH flows when the network buffer size is 1 MB. However, by "isolating" flows that perceived dissimilar QoE at the last-mile bottleneck link, we observed that the average stalling time of DASH flows is reduced by 50% and 25% at buffer sizes of 1 MB and 10 MB, respectively, thereby improving the overall QoE significantly.

Motivated by this observation, we propose a practically scalable solution that classifies similar flows into sub-groups and isolates them into separate sub-classes by allocating an appropriate amount of bandwidth to them within each SC. Next, we describe 1) the flexible statistical method that DiffPerf uses to classify flows within a SC, and 2) the intra-class bandwidth allocation used by DiffPerf for sub-group isolation.
1) Flow Classification and Isolation:
Relying on QoE as a similarity metric to classify the flows would require explicit feedback from the receiver to the AP vantage point, which is difficult to afford in practice. We therefore leverage SDN functionalities to find other metrics usable at the vantage point. The first metric that comes to mind is RTT. However, real-time RTT samples cannot be taken solely as indicators of performance issues without other information such as the underlying congestion control mechanism, buffer size [9], and packet route security. Even if we assume the availability of this information, measuring flow RTT at the APs is unreliable: measuring RTT at the SDN control plane inflates it in a variable and significant way, based on our measurements in the ODL control plane, while measuring it in the SDN data plane [12] may not scale well due to memory space constraints. Instead, we emphasize that the throughput of TCP flows is an appropriate and robust metric that captures the collective impact of the interacting network parameters on user-perceived performance. Next, we show how to utilize the throughput measure as a proxy for determining whether flows are similar to each other and for effectively identifying which flows are affected.

Because the number of groups and the number of flows in each group may change and are not known in real scenarios, we adopt general statistical metrics for classification. Given the achieved throughput x_f of the flows f ∈ F_s in any SC s ∈ S, the mean and standard deviation of the flows' throughput are defined as

  ¯x_s = (1/n_s) Σ_{f∈F_s} x_f   and   σ_s = √( Σ_{f∈F_s} (x_f − ¯x_s)² / n_s ).

Because the achieved throughput x_f of each flow depends on the number n_s of competing flows and their characteristics, the buffer size, and the allocated capacity X_s that ultimately determines the network congestion imposed on the SC, instead of using absolute throughput thresholds to classify flows, we adopt the following statistical metric that orders and measures the relative throughput values among all flows in the same SC.

Definition 1. Given the mean ¯x_s and standard deviation σ_s, the standard score of a flow f's throughput is defined by z_f = (x_f − ¯x_s) / σ_s.

When a flow's throughput is above (or below) the mean, its standard score or z-score is positive (or negative, respectively). The z-score captures the signed fractional number of standard deviations by which the flow's throughput deviates from the mean value.

Without loss of generality, we divide a set F_s of flows into two sub-classes: a lower sub-class F_s^L and an upper sub-class F_s^H, based on each flow's z-score compared with a pre-defined threshold β, where F_s^L(β) = {f ∈ F_s : z_f < β} and F_s^H(β) = F_s \ F_s^L(β). The set F_s^L contains the flows that achieved the lowest throughput values (i.e., the negatively affected flows). Thus, our goal is to identify them so that we can isolate them and allocate an appropriate amount of bandwidth to them accordingly. We use a non-positive value of β to capture flows whose throughput is |β| deviations lower than the achieved average ¯x_s. Because the set F_s^L grows monotonically with the parameter β, i.e., F_s^L(β₁) ⊆ F_s^L(β₂) for all β₁ < β₂, a smaller value of β makes a more conservative decision on the lowest-throughput flows, avoiding mis-classifications. We will further study how the values of β affect the performance of flows in a later section via experimental evaluations.
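The z-score classification above can be sketched in a few lines of Python; this is an illustrative fragment rather than DiffPerf's implementation, and the flow IDs and throughput values below are made up:

```python
import statistics

def classify_flows(throughputs, beta=-0.5):
    """Split flows into lower/upper sub-classes by a z-score threshold beta.

    throughputs: dict mapping flow id -> measured throughput (e.g., Mbps).
    Returns (lower, upper): the lower sub-class F_s^L(beta) = {f : z_f < beta}
    holds the negatively affected flows; the upper sub-class is the rest.
    """
    xs = list(throughputs.values())
    mean = statistics.fmean(xs)
    # Population standard deviation, matching sigma_s = sqrt(sum (x_f - mean)^2 / n_s).
    sigma = statistics.pstdev(xs)
    if sigma == 0:  # all flows identical: nothing to isolate
        return set(), set(throughputs)
    lower = {f for f, x in throughputs.items() if (x - mean) / sigma < beta}
    return lower, set(throughputs) - lower

flows = {1: 0.4, 2: 0.5, 3: 1.2, 4: 1.3, 5: 1.6}
lower, upper = classify_flows(flows, beta=-0.5)   # -> lower == {1, 2}
```

Note the monotonicity used in the text: a smaller β can only shrink the lower sub-class (with β = −1.5 above, no flow is classified as negatively affected).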
2) Bandwidth Allocation Model:
After classifying the flows in each SC into two sub-groups, we isolate them into two sub-classes and determine how much bandwidth X_s^L and X_s^H to allocate to each sub-class. To fully utilize the bandwidth capacity, our solution needs to satisfy X_s^L + X_s^H = X_s.

The throughput of some flows might be naturally low, and such flows might not achieve the targeted throughput X_s/n_s even when allocated that amount of capacity. As a result, enforcing the per-flow allocation of X_s/n_s would waste resources. The key question is how much per-flow capacity we should allocate to the flows in F_s^L, whose innate throughputs are less than what is needed to achieve the average throughput ¯x_s or to utilize the per-flow allocated capacity X_s/n_s in theory. Since these flows might not be able to achieve the average throughput, the per-flow allocation should be no higher than X_s/n_s. On the other hand, by isolating the negatively affected flows from the high-throughput flows (i.e., the flows F_s^H that cause the performance issues of the flows F_s^L), we expect them to achieve higher throughput than what they currently achieve; therefore, we should allocate more capacity to the set F_s^L of flows than their aggregate achieved throughput. To this end, we allocate the average per-flow bandwidth capacity of the set F_s^L as

  X_s^L(β, γ) / |F_s^L(β)| = γ · ( Σ_{f∈F_s^−} x_f ) / |F_s^−| + (1 − γ) · X_s / n_s,   (5)

where we define the set of flows whose throughput is below the mean ¯x_s by F_s^− ≜ {f ∈ F_s : x_f < ¯x_s} and introduce a parameter γ ∈ [0, 1] to control the allocated capacity flexibly. In particular, at one extreme of γ = 1, the solution allocates the average throughput of the set F_s^− of flows as the per-flow capacity for the lower sub-class F_s^L(β), which must be lower than the average throughput ¯x_s and the average capacity X_s/n_s of all flows. In this case, the per-flow capacity allocated to the lower sub-class F_s^L is lower than that allocated to the upper sub-class F_s^H, under which resource wastage is reduced and resources are utilized more efficiently. At the other extreme of γ = 0, the solution simply isolates the two sub-classes and equally allocates the average capacity X_s/n_s as the per-flow capacity for both the upper and lower sub-classes, under which per-flow fairness is enforced regardless of how efficiently the resource is utilized. Thus, by choosing the value of γ between 0 and 1, we can make a trade-off between resource fairness and utilization. However, this depends on the interaction of the TCP algorithm with the network buffer size. As opposed to a shallow buffer, a deep buffer allows low-throughput TCP flows (especially those negatively affected by the heterogeneity of RTTs) to stabilize their transfers. Thus, from the vantage point of a deep buffer, if the low-throughput flows were crowded out by others, they can perform better with γ = 0. However, this is not the case with a shallow buffer, which does not allow negatively affected flows to ramp up quickly, and this may even lead to lower utilization.

By Eq. (5), we also have the next theorem showing 1) lower bounds on the per-flow capacities re-allocated to the lower and upper sub-classes F_s^L and F_s^H; and 2) the monotonicity, in the parameter β, of the average throughput of the flows within F_s^L and of the average per-flow capacity re-allocated to F_s^H.

Theorem 2. Given any fixed parameter γ, for any service class s ∈ S, 1) the average achieved throughput of the flows within the lower sub-class F_s^L is non-decreasing in β and always no higher than the average per-flow capacity re-allocated to F_s^L; 2) the average per-flow capacity re-allocated to the upper sub-class F_s^H is non-decreasing in β and always no lower than X_s/n_s.

Theorem 2 states that as the parameter β increases, the average throughput of the flows F_s^L of the lower sub-class also increases, because more high-throughput flows are classified into F_s^L. It also tells us that this achieved average throughput must be no higher than the per-flow capacity re-allocated to the flows in F_s^L. This property guarantees our design objective of allocating more capacity to the flows in the lower sub-class than their aggregate achieved throughput. Theorem 2 also states that as β increases, the flows within the upper sub-class F_s^H are re-allocated more per-flow bandwidth capacity, although fewer flows are classified into that sub-class. Thus, service operators can choose the value of β to control the scales of the sub-classes, and both β and γ to control the bandwidth capacity allocated to the flows of the two sub-classes, which we refer to as (β, γ)-fairness.

Before we close this section, we would like to emphasize that although DiffPerf classifies the flows in each SC into two sub-groups for simplicity, its statistical method of classification and the corresponding bandwidth allocation can be applied in a top-down recursive manner to further split any sub-group for a more fine-grained optimization.

III. IMPLEMENTATION

A. DiffPerf Prototype on OpenDaylight with OpenFlow
We implement DiffPerf as an application on the popular industry-grade open-source SDN platform, the OpenDaylight (ODL) controller. We particularly develop a native MD-SAL (Model-Driven Service Adaptation Layer) application on ODL, which comprises the use of different technologies such as OSGi, Karaf, YANG, the blueprint container, and messaging patterns such as RPC, publish-subscribe, and data store accesses [13]. We skip implementation details for the sake of brevity. Figure 1 depicts the implementation structure of DiffPerf on ODL.

Figure 1: DiffPerf implementation (SDN controller modules: Inter-class and Intra-class Allocators, Flow Processor, Bandwidth Optimizer, Bandwidth Enforcer, and Stats Collector, connected to the forwarding devices via the OpenFlow plugin and protocol).

There are four main modules: Flow Processor, Bandwidth Optimizer, Bandwidth Enforcer and Stats Collector, interconnected as shown in the figure; we briefly describe them below.
1) Flow Processor:
The Flow Processor module on ODL performs two primary functions. First, it assigns user-specified service classes to newly joining flows in the network. We use the YANG modeling language [14] to define the service classes. Second, the module carries out regular flow maintenance; i.e., the flow processor inserts new flows into the data store, determines the pre-defined service class and assigns the corresponding weight to new flows, removes inactive or completed flows, etc.
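The bookkeeping above can be sketched as follows. This is an illustrative Python model only, not the actual MD-SAL code: the class names, the in-memory dict standing in for the ODL data store, and the example weights are all our own.

```python
from dataclasses import dataclass, field

# Pre-defined service classes and weights; in DiffPerf these are defined in YANG.
SERVICE_CLASS_WEIGHTS = {"Golden": 3, "Silver": 2, "Bronze": 1}

@dataclass
class FlowEntry:
    flow_id: str
    service_class: str
    weight: int

@dataclass
class FlowProcessor:
    # Stands in for the ODL data store holding active flows.
    flows: dict = field(default_factory=dict)

    def on_new_flow(self, flow_id, service_class):
        """Assign a newly joining flow to its user-specified service class."""
        weight = SERVICE_CLASS_WEIGHTS[service_class]
        self.flows[flow_id] = FlowEntry(flow_id, service_class, weight)

    def on_flow_removed(self, flow_id):
        """Regular maintenance: drop inactive or completed flows."""
        self.flows.pop(flow_id, None)

    def flows_by_class(self):
        """Service class -> set of active flow ids (input to the optimizer)."""
        out = {s: set() for s in SERVICE_CLASS_WEIGHTS}
        for entry in self.flows.values():
            out[entry.service_class].add(entry.flow_id)
        return out
```

The `flows_by_class` view is what the Bandwidth Optimizer consumes each interval, alongside the throughput estimates from the Stats Collector.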
2) Statistics Collector:
DiffPerf performs in-network performance optimization. For DiffPerf to work, we need to estimate the throughput of each active flow. Let x_f(t) denote the average throughput of a flow f until time t. By measuring the instantaneous throughput x̃_f of flow f during the immediately preceding time period Δt, the average throughput for the next time period Δt is updated as follows:

  x_f(t + Δt) ← δ · x_f(t) + (1 − δ) · x̃_f,

where δ ∈ [0, 1] is a weight. Having measured the average throughput of the active flows, we use these estimates to group flows into sub-classes so that flows with similar achieved throughput fall into the same sub-class.

To obtain real-time estimates of the throughput of each active flow as well as the link bandwidth, we implement a Stats Collector module on ODL. This module registers per-flow rules to pull measurement information using event-based handlers from the operational data store in ODL. The data store in turn uses the OpenFlow plugin (as indicated in Figure 1) to request the switches to report flow measurements; the per-flow measures of interest are packet counts, byte counts and duration.

3) Bandwidth Optimizer:
The core part of DiffPerf is the Bandwidth Optimizer module, which is responsible for the inter- and intra-class bandwidth optimization described in Section II. The optimizer runs every Δt interval, getting input from the two modules described above, the Flow Processor and the Stats Collector (see Figure 1). While the former provides the mapping of flows to user-specified service classes, the latter provides real-time measurements on the active flows in the switch. Given this input, the inter- and intra-class optimizers are executed; the outputs of the optimization are: (i) the portion of bandwidth allocated to each service class (SC), and (ii) the portion of bandwidth for each sub-class within every service class.
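One optimizer round can be sketched end to end: the EWMA throughput update performed by the Stats Collector, the inter-class allocation of Eq. (3), and the intra-class (β, γ) allocation of Eq. (5). This is an illustrative Python sketch, not the ODL module; the function names and the example inputs at the bottom are our own.

```python
import statistics

def ewma_update(x_prev, x_inst, delta=0.5):
    """x_f(t + dt) <- delta * x_f(t) + (1 - delta) * x~_f, with delta in [0, 1]."""
    return delta * x_prev + (1 - delta) * x_inst

def inter_class_allocation(n, w, alpha, C):
    """Eq. (3): X_s proportional to n_s * w_s^(1/alpha), summing to capacity C."""
    score = {s: n[s] * (w[s] ** (1.0 / alpha)) for s in n}
    total = sum(score.values())
    return {s: score[s] / total * C for s in n}

def intra_class_allocation(throughputs, X_s, beta, gamma):
    """Eq. (5): split X_s between the lower sub-class F_s^L(beta) and F_s^H."""
    xs = list(throughputs.values())
    n_s = len(xs)
    mean, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    lower = {f for f, x in throughputs.items()
             if sigma > 0 and (x - mean) / sigma < beta}
    if lower:
        below = [x for x in xs if x < mean]   # F_s^-: flows below the mean
        per_flow_low = gamma * sum(below) / len(below) + (1 - gamma) * X_s / n_s
        X_low = per_flow_low * len(lower)
    else:
        X_low = 0.0
    return lower, X_low, X_s - X_low          # X_s^L + X_s^H = X_s

# One round for two service classes with weights 3:1 under alpha = 1 (weighted
# proportional fairness): per-flow capacities come out in the ratio 3:1.
X = inter_class_allocation(n={"G": 10, "B": 20}, w={"G": 3, "B": 1},
                           alpha=1, C=100.0)  # -> {"G": 60.0, "B": 40.0}
```

Within each class, `intra_class_allocation` then realizes the γ trade-off of Eq. (5): γ = 1 caps the lower sub-class at the average throughput of the below-mean flows, while γ = 0 gives both sub-classes the same X_s/n_s per-flow share.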
4) Bandwidth Enforcer:
To materialize the bandwidth allocation, each sub-class should use its designated bandwidth in an isolated manner. A naive approach to implementing this is to leverage multiple queues at the switch egress port, so that each sub-class maps to an isolated queue. However, there are two practical challenges. First, in commodity switches the number of queues at an egress port is usually limited to a small number [15][16], meaning that the number of available queues could be less than the number of flow sub-classes. Second, current OpenFlow switches do not expose APIs to update the weights of the queues dynamically. Without this capability (of dynamically changing queue weights), the bandwidth allocated to queues cannot be changed as and when required.

To overcome both limitations, we leverage the metering feature available in OpenFlow switches. Instead of defining queues and updating their bandwidth at the egress port, we develop a Bandwidth Enforcer module that performs enforcement at the ingress side of the switch. That is, multiple meters, corresponding to the number of sub-classes, are defined; and based on the output of the Bandwidth Optimizer, the flow rate of each sub-class is attached to a specific meter dynamically. The Bandwidth Enforcer uses the OpenFlow plugin to encapsulate the allocated bandwidth into OpenFlow messages and install them onto the switch(es).

B. DiffPerf Prototype with Programmable Data Plane
Next, we implement another prototype of DiffPerf on a lightweight C controller connected to a Barefoot Tofino programmable switch [17], which enables flexible buffer sizing and fine-grained line-rate telemetry. We particularly implement a statistics building block in the data plane to track the number of bytes transmitted by the active flows. Additionally, we re-implement the Bandwidth Enforcer and Statistics Collector modules in the control plane. We leverage the APIs exposed by the Tofino switch to update the weights of the queues dynamically and to configure their sizes from the control plane. The remaining modules are kept the same, with minor modifications.

IV. EXPERIMENTAL EVALUATION
We evaluate DiffPerf by carrying out experiments on a realistic testbed. We describe the details below.
A. Testbed setup
OpenFlow Brocade switch experiments:
We set up a testbed for video streaming between DASH clients (i.e., video players) running dash.js and a DASH server over an SDN network. Our testbed consists of 12 servers, 10 of which are used to host DASH clients, and one each for hosting the DASH server and the ODL controller. The 10 servers running DASH clients are connected to the DASH server such that they compete (for video segments) at a downstream bottleneck link from an SDN-enabled Brocade ICX-6610 24-port physical switch. We evaluated DiffPerf in three different scenarios. In Scenario 1 (Section IV-C1) and Scenario 2 (Section IV-C2), each physical server hosts several DASH clients, each client runs in a VM, and all clients are connected to the DASH server over a 50 Mbps downstream bottleneck link. For Scenario 3 (Section IV-C3), we scale up the number of DASH clients: each physical server hosts multiple DASH clients, all running as Docker containers and connected to the DASH server over a 200 Mbps downstream bottleneck link.
Barefoot Tofino switch experiments:
The experiments with the Tofino programmable switch concentrate on evaluating the impact of buffer size on the performance of bottlenecked flows, and on how DiffPerf enables the switch buffer to perform better (i.e., improve the overall flow performance). The evaluations are carried out with multiple switch buffer sizes, ranging from the KB scale to the MB scale. Tofino exposes a set of APIs for Traffic Manager applications to manage buffer allocation from both the ingress and egress ends. We use the bf_tm_q_app_pool_usage_set API to set the buffer size for the queues of the egress port attached to the bottleneck link. Buffer size is specified in terms of cells, where each cell is 80 bytes. The buffer precedes the bottleneck link that transfers video segments from the DASH server to the DASH clients. The results are presented in the last part of Scenario 2 (Section IV-C2).

Except for Scenario 1 (Section IV-C1), assuming that the majority of flows in the Internet have short RTTs [18], we partition the clients into two sets in a 70:30 ratio based on the configured RTT_min values: the mean and standard deviation of the bigger set are 64 ms and 16 ms, respectively, and those of the other set are 224 ms and 32 ms, respectively. We use the network emulator netem [19] at the server machines running the DASH clients to set the latency. For streaming, we use the Big Buck Bunny video sample, which lasts for 600 seconds and has been encoded into 3 bitrate levels (1.2 Mbps, 2.2 Mbps and 4.1 Mbps) of equal segments (i.e., each segment is 2 seconds long). Thus, a DASH client can choose the bitrate levels and segments for streaming video, based on the measured congestion level of the network. We compare DiffPerf against the two most popular TCP congestion control algorithms on the Internet [20]: TCP CUBIC [21] and TCP BBR [22].
B. Metrics for evaluation
To evaluate the performance of DiffPerf and the TCP variants, we use two metrics. One is the per-flow average throughput, which in our case corresponds to the average throughput of all DASH clients. The other metric of importance is the user-perceived quality of experience (QoE). The QoE metric is adopted from the widely used model proposed by [23], and is expressed as:

  QoE = Σ_{n=1}^{N} q(R_n) − λ Σ_{n=1}^{N−1} |q(R_{n+1}) − q(R_n)| − μ T_stall − μ_s T_s.

This QoE definition uses various performance factors: the average playback bitrate R_n over the total N segments of the video, the variability of consecutive segment bitrates represented by the second summation, the duration of rebuffering T_stall (i.e., the duration for which the player's playout buffer has no content to render), and the startup delay T_s (i.e., the lag between the user clicking and the time rendering begins). As in [24][23], q maps a bitrate to a quality value; λ is usually set to one, and μ and μ_s are set to the maximum bitrate of the video sample. We measure QoE for the entire duration of the video.

Figure 2: Achieved throughput and corresponding QoE of multiple service classes for different values of α. (a) Throughput; (b) QoE.

C. Results

1) Scenario 1: Evaluation of inter-class performance:
In this scenario we evaluate DiffPerf's inter-class bandwidth allocation model. We assume the access provider offers three classes of service: Golden (G), Silver (S) and Bronze (B), with weights of 3, 2 and 1, respectively. We run a set of experiments to evaluate the bandwidth allocated to users of different classes under different values of α. We assign 13 DASH clients to each service class, for a total of 39 DASH clients in this scenario. All flows experience homogeneous RTTs in this scenario.

Figure 2(a) plots, for each SC, the average throughput achieved by all flows in that class, for different values of α. Evidently, the ratios of the estimated average throughput of flows across the service classes closely follow the ratios obtained from our model (refer Eq. 4). In addition, the average throughputs converge with increasing α.

Figure 2(b) plots the average QoE of all flows in each SC. Observe that the QoE of service class B is low when the average throughput achieved (shown in Figure 2(a)) is low. The QoE of the three service classes converges with increasing values of α. With increasing α, as resources are shared more fairly among the competing flows, it is expected that the QoE also reflects this fair sharing, given that the flows have homogeneous RTTs.
2) Scenario 2: Evaluation of intra-class performance:
In this part, we evaluate our proposed performance-aware fairness, (β, γ)-fairness: that is, the capability of DiffPerf's β to mitigate the bias against flows affected by the interaction between TCP CUBIC, TCP BBR, heterogeneous flow RTTs, and switch buffer size. We also present the flexibility of γ in enabling a feature of practical interest, the trade-off between network efficiency and user QoE fairness. Recall that DiffPerf uses statistical flow classification, and allocates bandwidth to the classified sub-classes so as to assign higher capacity to the negatively affected flows.
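The classification step can be sketched as below (a minimal illustration, assuming the z-score is computed over the per-flow achieved throughputs within a service class; the function name is ours):

```python
from statistics import mean, stdev

def classify_flows(throughputs, beta):
    """Split the flows of one service class into lower/upper sub-classes.
    A flow whose z-score falls below the threshold beta (beta <= 0) has
    been achieving markedly less than the class mean, and is placed in
    the lower sub-class to receive protective bandwidth."""
    m, s = mean(throughputs), stdev(throughputs)
    if s == 0:  # all flows identical: nothing to protect
        return [], list(range(len(throughputs)))
    lower = [i for i, x in enumerate(throughputs) if (x - m) / s < beta]
    upper = [i for i in range(len(throughputs)) if i not in lower]
    return lower, upper

# Flows 0-3 are starved relative to flow 4; beta = 0 protects all of
# them, while a more negative beta shrinks the protected set.
print(classify_flows([1.0, 1.0, 1.0, 1.0, 10.0], beta=0))
# → ([0, 1, 2, 3], [4])
```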
DiffPerf based on CUBIC.
Figure 3: Aggregate throughput of the SC sub-classes flows
Figure 4: QoE of the SC sub-classes flows ((a) upper sub-class; (b) lower sub-class)
In Figure 3, we present the aggregate throughput achieved by DASH clients belonging to the two different sub-classes, under TCP CUBIC, TCP BBR, and
DiffPerf over CUBIC. We run experiments for different values of β. For better clarity, we show the results only for γ = 0. The numbers of flows classified into the lower sub-class by our statistical classification, for β = 0, −0.25, −0.50, −0.75, and −1, are 22, 17, 12, 9, and 5, respectively. The figure shows that DiffPerf's isolation enables the DASH clients in the lower sub-class to achieve higher throughput than under both TCP variants, while also achieving comparable aggregate throughput to TCP. Moreover, β gives an access provider the flexibility to decide on the flow classification based on a simple and intuitive metric (the z-score). While only a small number of flows get classified into the lower sub-class, we note that these were also the worst affected ones. (Notice that CUBIC, BBR, and DiffPerf all achieve aggregate throughput slightly higher than 50 Mbps; this is due to the burstiness tolerated by OpenFlow meters. Also, BBR achieves the highest aggregate throughput, as it fundamentally does not react to packet loss or delay as signals of congestion.)
Figures 4(a) and (b) plot the QoE achieved by each DASH client, for a fixed β and for different values of γ. The former plots the QoE of the upper sub-class of flows, and the latter plots the same for the lower sub-class. The key observation is that the flows in the lower sub-class perceive higher QoE under DiffPerf than under both TCP variants, and only at the cost of a small number of flows in the upper sub-class.
Figure 5: DiffPerf fairness-efficiency tradeoff
Another observation is that, with lower γ, DiffPerf gives fairer QoE to the clients;
DiffPerf with γ = 0 is the fairest. Beyond improving fairness, we calculated the overall QoE values for all flows; these show that DiffPerf, via flow isolation as well as performance-aware bandwidth allocation, significantly improves the overall QoE compared to TCP alone, achieving higher QoE than CUBIC at γ = 0, 0.5, and 1, and likewise higher QoE than BBR.
Fairness-efficiency tradeoff:
We run experiments for different values of γ to analyze the trade-off between efficiency (i.e., bandwidth utilization) and fairness (i.e., user-perceived quality fairness); β is kept fixed. Figure 5 shows that the aggregate throughput increases as γ increases. Meanwhile, the average throughput of the lower sub-class decreases and that of the upper sub-class increases as γ is increased from 0 to 1 (refer to the second Y-axis). Evidently, the parameter γ affects the average flow throughput of both sub-classes. This behavior is due to the fact that, when γ approaches 0, the clients are allocated equal bandwidth regardless of the sub-class; hence the DASH clients tend to achieve higher throughput (subject to their characteristics), and thus fairness is also improved. Conversely, when γ approaches 1, DiffPerf helps the network achieve better bandwidth utilization: it allocates higher bandwidth to the upper sub-class, which likely has flows with a greater tendency to exploit the provisioned network bandwidth, and hence results in better network utilization.
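Under our reading of Eq. (5) (via the proof of Theorem 2 in the appendix), the per-flow share of the lower sub-class interpolates between pure fairness (γ = 0, equal split) and pure efficiency (γ = 1, affected flows keep only their achieved average). A hedged sketch, with the averaging set approximated by the lower sub-class itself:

```python
def intraclass_allocation(X_s, throughputs, lower, gamma):
    """Per-flow shares within one service class of capacity X_s.
    gamma = 0: every flow gets the equal share X_s/n (max fairness).
    gamma = 1: lower sub-class flows keep their achieved average, and
    the remainder goes to the upper sub-class (max utilization)."""
    n = len(throughputs)
    fair = X_s / n
    if not lower or len(lower) == n:
        return [fair] * n
    avg_low = sum(throughputs[i] for i in lower) / len(lower)
    per_low = gamma * avg_low + (1 - gamma) * fair    # lower sub-class share
    per_high = (X_s - per_low * len(lower)) / (n - len(lower))
    return [per_low if i in lower else per_high for i in range(n)]

shares = intraclass_allocation(10.0, [1.0, 1.0, 4.0, 4.0], [0, 1], 0.0)
# gamma = 0 -> [2.5, 2.5, 2.5, 2.5]; gamma = 1 -> [1.0, 1.0, 4.0, 4.0]
```

Capacity is conserved for any γ; only its division between the two sub-classes shifts, which is exactly the tradeoff plotted in Figure 5.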
DiffPerf based on BBR.
As TCP BBR has recently gained widespread attention, we also evaluate DiffPerf over TCP BBR. Figure 6 shows that DiffPerf's isolation enables DASH clients in the lower sub-class (i.e., the affected flows) to achieve higher throughput than under BBR, while also achieving comparable aggregate throughput to BBR. The numbers of flows classified into the sub-classes are shown next to the DiffPerf bars. For example, at β = , flows are classified into the lower sub-class and flows into the upper sub-class. Figures 7(a) and (b) plot the perceived QoE of the DASH clients in the two sub-classes. The lower sub-class flows perceive better QoE with DiffPerf than with BBR.
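The QoE values compared throughout this section follow the model of [23] stated earlier; a minimal sketch (the quality mapping q is taken as the identity here, whereas the model allows any mapping):

```python
def qoe(bitrates, t_stall, t_startup, q=lambda r: r, lam=1.0):
    """QoE of one streamed video, per the model of [23]: sum of segment
    qualities, minus the bitrate-switching penalty (weight lam = 1),
    minus rebuffering and startup-delay penalties whose weights mu and
    mu_s are set to the maximum bitrate of the video sample."""
    mu = mu_s = max(bitrates)
    quality = sum(q(r) for r in bitrates)
    switching = sum(abs(q(b) - q(a)) for a, b in zip(bitrates, bitrates[1:]))
    return quality - lam * switching - mu * t_stall - mu_s * t_startup

# A steady 2 Mbps stream beats an oscillating one that stalls for 1 s:
print(qoe([2.0, 2.0, 2.0, 2.0], t_stall=0, t_startup=0))  # → 8.0
print(qoe([1.0, 3.0, 1.0, 3.0], t_stall=1, t_startup=0))  # → -1.0
```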
Impact of buffer size:
Based on the experiments carried out on the Tofino switch, this part presents the impact of buffer size on the performance of bottlenecked flows and the flow optimization achieved using DiffPerf. Note that the Tofino switch updates DiffPerf with flow statistics every 1 s (i.e., the sampling rate). At every interval of ∆t = 5 s, DiffPerf uses the last measured statistics for regulating network flows in the next immediate time interval.
Figure 6: Aggregate throughput of the SC sub-classes flows
Figure 7: QoE of the SC sub-classes flows ((a) upper sub-class; (b) lower sub-class)
Figure 8 plots the QoE achieved by
DASH clients competing for a Mbps bottleneck link, preceded in one experiment by a MB shallow buffer and in the other by a MB deep buffer. With the shallow buffer, BBR achieves three times higher QoE than with the deep buffer. We then run the same experiments for
DiffPerf, with a fixed β and γ = 0; it achieves higher QoE than BBR with the shallow buffer and with the deep buffer. We note that the shallow buffer leads to overall better user-perceived quality. However, quality worsens with a much smaller buffer size (e.g., KB). The deep buffer might help low-throughput flows, especially those affected by the interaction of TCP with flow RTT, to achieve better QoE, but it increases packet queuing delay. The very shallow buffer (e.g., KB) reduces packet queuing delay but increases packet losses. Hence, both of these extreme buffer sizes increase the DASH client's average stalling time (i.e., the duration of time the player's playout buffer has no content to render). Under DiffPerf, the client's average stalling time is reduced relative to BBR for all three buffer sizes (KB, MB, and MB).
Figure 8: Impact of switch buffer size
DiffPerf thus proves effective in improving the user-perceived quality across multiple buffer sizes. Lastly, it is worth noting from this set of experiments that the intermediate buffer size strikes a better trade-off between queuing delays and packet losses. Overall, DiffPerf is fairer than both CUBIC and BBR in terms of clients' throughput and QoE, and provides the highest overall QoE.
Scalability: DiffPerf operations on the Tofino switch are split across the controller and the data plane. The controller collects aggregate real-time statistics of the active flows, performs the optimization, and reacts to the data plane regularly. The data plane tracks the number of bytes transferred by the active flows. DiffPerf does not impose a high sampling rate, which could otherwise yield inaccurate statistics, especially when the DASH clients enter the OFF period. Hence, the communication between controller and data plane is only at the scale of seconds, which works well for long-running video flows over the Internet. This is also demonstrated by our experimental results.
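The controller-dataplane loop described above can be sketched as follows (a hypothetical skeleton, not DiffPerf's actual API; the three callbacks stand in for switch statistics collection, the optimization step, and meter installation):

```python
def control_loop(n_intervals, samples_per_interval, read_byte_counts,
                 compute_allocation, install_meters, sample_period=1.0):
    """Skeleton of the periodic control described above: the data plane
    reports per-flow byte counters every sample_period (1 s on Tofino);
    every delta_t = samples_per_interval * sample_period seconds, the
    controller converts counter deltas into throughputs (bits/s),
    re-runs the optimization, and installs rate limits for the next
    interval. All three callbacks are placeholders, not a real API."""
    prev = read_byte_counts()
    for _ in range(n_intervals):
        for _ in range(samples_per_interval):
            cur = read_byte_counts()          # one sampling-period report
        tput = {f: (cur[f] - prev[f]) * 8 / (samples_per_interval * sample_period)
                for f in cur}                 # avg bits/s over delta_t
        install_meters(compute_allocation(tput))
        prev = cur
```

A real deployment would sleep between samples; the iteration-count form here keeps the sketch deterministic.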
3) Scenario 3: The Dynamics of
DiffPerf: Finally, to understand how DiffPerf performs in real-world cases, we evaluate it in a dynamic scenario where users from different service classes join and leave the network at different times. In this set of experiments, we conduct the evaluation on an OpenFlow network centralized by the ODL controller, where 150 DASH clients share a Mbps bottleneck link; they have variable RTTs, with the ratio and distribution the same as in the previous scenario. The arrivals of the DASH client requests follow a Poisson process with rate λ = 1 client/s. A client exits after the entire video (which lasts for 600 seconds) is streamed. The DASH clients subscribe to the G, S, and B service classes in the ratio 1:2:3. The weights of the service classes are kept the same as before, i.e., G:S:B = 3:2:1. We set α, β, and γ to fixed values. At every interval of ∆t = 15 s, DiffPerf uses the last measured statistics, such as the number of active flows and each flow's instantaneous throughput (δ = 0), to send commands to the switch for regulating network flows in the next immediate time interval. The OpenFlow switch updates DiffPerf with flow statistics every 3 s (i.e., the default sampling rate in the Brocade ICX-6610 switch). Figure 9(a) shows the arrival and departure of DASH flows. The numbers of active flows in the two sub-classes, for each of the service classes, are depicted in Figures 9(b), (c), and (d). Although the video being streamed is 600 seconds long, observe that the G-class clients complete earlier than the S-class and B-class clients; this holds for both the lower and upper sub-classes. Similarly, S-class flows finish earlier than B-class flows. We observe a sudden decrease in the active flows a few times (the dips on the curves); this is not because the flows actually leave the system, but rather due to the expiry of the idle timeout of flows. When a DASH client does not receive video segment packets, the timeout causes the flow to be deemed inactive. However, once the client resumes receiving data,
DiffPerf promptly counts it as an active flow. Figure 10 plots the dynamic bandwidth allocation recommended by
DiffPerf, for each service class and the sub-classes within. The bandwidth allocated accounts for the number of active flows in each service class and their achieved throughput, optimized via the (β, γ) performance-aware mechanism. It also shows that DiffPerf adapts quickly to the departure of flows (observe the time period after 600 seconds), allocating the spare capacity to the remaining active flows.

V. RELATED WORK
A. TCP Congestion Control
The increase of network bandwidth also saw the emergence of 'high-speed' TCP variants such as FAST [25], BIC [26], CUBIC [21], and BBR [22] for transporting Internet traffic. Yet TCP's inability to fairly share the bandwidth of flows with heterogeneous RTTs, a problem known to the community for around two decades [27], [28], [29], still persists. As demonstrated by our experiments (and also other works, e.g., [30]), CUBIC exhibits such behavior, and so does BBR [9], [31]. This unfairness in achieved throughput worsens when flows with different TCP congestion control mechanisms compete [11]. Another interesting observation from the literature is that the relative performance degradation in throughput can be due to more than the single factor of RTT. In this context, we highlight that
DiffPerf is agnostic to the specific RTTs of flows and other router specifications (e.g., buffer size) in performing the optimization and the enforcement of the computed optimized bandwidth; indeed, DiffPerf classifies and isolates flows of dissimilar characteristics solely based on tracking their achieved throughput.
B. Service Differentiation
Service differentiation is at the core of network quality-of-service (QoS) provisioning for serving traffic from multiple classes over the network [32], [33], [34], [35]. While
IntServ [34] did not find adoption in the Internet,
DiffServ [32] inspired a body of work on providing differentiated services. However, many such solutions mandate sophisticated scheduling with manual configuration of QoS knobs on a per-service-class basis. Instead, we choose a well-known utility-function-based framework, which enables a service operator to practically specify the number of service classes and strike a good balance between bandwidth share and performance. In [36], the authors proposed an approach for rate-delay (RD) differentiation by maintaining two queues at the router's output link. While the aspirations resemble DiffPerf's, it is still best-effort and does not promise any rate or loss guarantees. [37] discussed a static service differentiation framework for ISPs. In short, class-based traffic control and service differentiation have been largely limited to theoretical analyses [36], [38], [39], [40], and have not been experimented with on hardware switches with real application traffic. DiffPerf's inter-class utility is general enough to make trade-offs among desirable performance metrics, and it operates dynamically based on active users and available bandwidth.
C. Fair Queuing
Fair queuing has been a topic of extensive research [41], [42], [43], [44], [45], [46]. FQ-CoDel [42], a recent AQM discipline, offers good performance gains in achieving fairness among flows, by classifying them into different
buckets and serving them in a round-robin manner. However, large memory is extremely expensive or unavailable in the data plane; hence it is practically infeasible to accommodate a very large number of buckets for hashing a large number of flows.
Figure 9: DASH clients arrival-departure and service classes cumulative sum of active flows ((a) arrival-departure of the clients; (b) G-class active flows; (c) S-class active flows; (d) B-class active flows)
Figure 10: Dynamic service classes bandwidth allocation ((a) G-class; (b) S-class; (c) B-class)
The DiffPerf optimizer operates simply by comparing a flow's z-score with a pre-defined β threshold (i.e., the classifier requires no training). Additionally, unlike AQM, DiffPerf is not limited to a specific congestion algorithm; it works on top of several interacting parameters such as buffer size, flow characteristics, and congestion algorithm. Also, DiffPerf is portable: it can be packaged as a virtual network function (VNF) over a SmartNIC [47] to handle the presence of extremely heavy workloads.
D. User Quality of Experience (QoE)
In the context of video streaming, several studies proposed to improve user QoE [24], [48] or to achieve QoE fairness [49]. These approaches continuously attempt to improve the adaptive bitrate (ABR) algorithms in the DASH Reference Player at the application layer, based on several performance metrics seen by the application. Our work differs from them in that we propose a bottom-up optimization. DiffPerf reacts to the interplay between several of the network's inherently coupled parameters by continuously improving the affected traffic flows. This in large part improves the performance metrics (e.g., QoE fairness) at the application.

VI. CONCLUSION
We propose
DiffPerf, which leverages the rapid development in network softwarization and enables agile and dynamic network bandwidth allocation at the AP vantage point. At a macroscopic level, DiffPerf offers access providers new capabilities for performance guarantees by dynamically allocating bandwidth to service classes, through which the trade-off between fairness and differentiation can be made. At a microscopic level, DiffPerf isolates and optimizes the flows affected by the interplay between several of the network's inherently coupled parameters, such as flow characteristics, buffer size, and congestion algorithm. We implemented two prototypes of DiffPerf: one in ODL with OpenFlow, and the other on the programmable Tofino switch. We evaluate DiffPerf from an application perspective, for MPEG-DASH video streaming. Our experimental results confirm DiffPerf's capabilities of QoE provisioning, fairness, and optimization.

APPENDIX A
PROOFS OF THEOREMS
Proof of Theorem 1: From optimization theory, our bandwidth allocation problem is a convex optimization problem. By the Karush-Kuhn-Tucker (KKT) conditions, it has a unique solution, which satisfies
$$w_s \left(\frac{X_s}{n_s}\right)^{-\alpha} - u + u_s = 0 \quad \text{and} \quad u_s X_s = 0, \ \forall s \in \mathcal{S}, \qquad u\left(\sum_{s \in \mathcal{S}} X_s - C\right) = 0,$$
where $u$ and $(u_s : s \in \mathcal{S})$ are KKT multipliers satisfying $u, u_s \ge 0$ for any $s \in \mathcal{S}$. By solving the above equations, we can derive that
$$X_s = \frac{n_s \sqrt[\alpha]{w_s}}{\sum_{s' \in \mathcal{S}} n_{s'} \sqrt[\alpha]{w_{s'}}}\, C, \quad \forall s \in \mathcal{S}.$$

Proof of Theorem 2: By the definition of the set $\mathcal{F}_s^L(\beta)$, we know that for any two thresholds $\beta_1 < \beta_2$, $\mathcal{F}_s^L(\beta_1) \subseteq \mathcal{F}_s^L(\beta_2)$. For any two flows $\bar{f}$ and $\tilde{f}$ satisfying $\bar{f} \in \mathcal{F}_s^L(\beta_1)$ and $\tilde{f} \in \mathcal{F}_s^L(\beta_2) \setminus \mathcal{F}_s^L(\beta_1)$, we have $x_{\bar{f}} < x_{\tilde{f}}$ because $z_{\bar{f}} < \beta_1 \le z_{\tilde{f}} < \beta_2$. Therefore,
$$\frac{\sum_{f \in \mathcal{F}_s^L(\beta_1)} x_f}{\left|\mathcal{F}_s^L(\beta_1)\right|} \le \frac{\sum_{f \in \mathcal{F}_s^L(\beta_2)} x_f}{\left|\mathcal{F}_s^L(\beta_2)\right|},$$
i.e., the average achieved throughput of the flows within the lower sub-class $\mathcal{F}_s^L$ is non-decreasing in $\beta$. By Eq. (5), when $\beta = 0$, we have
$$\frac{\sum_{f \in \mathcal{F}_s^L(\beta)} x_f}{\left|\mathcal{F}_s^L(\beta)\right|} = \frac{\sum_{f \in \mathcal{F}_s^-} x_f}{\left|\mathcal{F}_s^-\right|} \le \frac{X_s^L(\beta)}{\left|\mathcal{F}_s^L(\beta)\right|}$$
because $\mathcal{F}_s^L = \mathcal{F}_s^-$; in other words, the average achieved throughput of the flows within the lower sub-class $\mathcal{F}_s^L$ is no greater than the average per-flow capacity re-allocated to $\mathcal{F}_s^L$. Because $|\mathcal{F}_s^L|$ is non-decreasing in $\beta$, $\gamma n_s / (n_s - |\mathcal{F}_s^L|)$ is non-decreasing in $\beta$. By Eq. (5), the capacity allocated per flow of the upper sub-class satisfies
$$\frac{X_s^H}{\left|\mathcal{F}_s^H\right|} = \frac{X_s - X_s^L}{\left|\mathcal{F}_s^H\right|} = \frac{\gamma n_s}{n_s - \left|\mathcal{F}_s^L\right|}\left(\frac{X_s}{n_s} - \frac{\sum_{f \in \mathcal{F}_s^-} x_f}{\left|\mathcal{F}_s^-\right|}\right) + \gamma \frac{\sum_{f \in \mathcal{F}_s^-} x_f}{\left|\mathcal{F}_s^-\right|} + (1-\gamma)\frac{X_s}{n_s} \ge \frac{X_s}{n_s}.$$
Thus it is non-decreasing in $\beta$ and no lower than $X_s/n_s$.

REFERENCES
… Workshop on Buffer Sizing, 2019.
[5] "Netflix," https://ispspeedindex.netflix.com/country/us/, acc. May 2020.
[6] P. Faratin, D. D. Clark, S. Bauer, and W. Lehr, "Complexity of Internet interconnections: Tech., incentives and implications for policy," 2007.
[7] K. Bollen and R. Lennox, "Conventional wisdom on measurement: A structural equation perspective," Psychological Bulletin, 2018.
[8] R. Srikant, The Mathematics of Internet Congestion Control. Springer Science & Business Media, 2012.
[9] M. Hock, R. Bless, and M. Zitterbart, "Experimental evaluation of BBR congestion control," in Proc. IEEE ICNP, 2017.
[10] M. Mathis and A. McGregor, "Buffer sizing: a position paper," in Workshop on Buffer Sizing, 2019.
[11] J. Rüth, I. Kunze, and O. Hohlfeld, "An empirical view on content provider fairness," in Proc. TMA, 2019.
[12] X. Chen, H. Kim, J. M. Aman, W. Chang, M. Lee, and J. Rexford, "Measuring TCP RTT in the data plane," in Workshop on SPIN, 2020.
[13] "OpenDaylight Project," http://wiki.opendaylight.org, acc. Jul. 2019.
[14] M. Bjorklund et al. …
… IEEE Communications Magazine, 2015.
[19] "Network emulation with netem," https://man7.org/linux/man-pages/man8/tc-netem.8.html, acc. Aug. 2020.
[20] A. Mishra, X. Sun, A. Jain, S. Pande, R. Joshi, and B. Leong, "The great Internet TCP congestion control census," in Proc. ACM on MACS, 2019.
[21] S. Ha, I. Rhee, and L. Xu, "CUBIC: a new TCP-friendly high-speed TCP variant," ACM SIGOPS Operating Systems Review, 2008.
[22] N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson, "BBR: congestion-based congestion control," Queue, 2016.
[23] X. Yin, A. Jindal, V. Sekar, and B. Sinopoli, "A control-theoretic approach for dynamic adaptive video streaming over HTTP," in Proc. ACM SIGCOMM, 2015.
[24] P. K. Yadav, A. Shafiei, and W. T. Ooi, "QUETRA: a queuing theory approach to DASH rate adaptation," in Proc. ACM MM, 2017.
[25] D. X. Wei, C. Jin, S. H. Low, and S. Hegde, "FAST TCP: motivation, architecture, algorithms, performance," IEEE/ACM Trans. on Networking, 2006.
[26] L. Xu, K. Harfoush, and I. Rhee, "BIC for fast long-distance networks," in Proc. IEEE INFOCOM, 2004.
[27] C. Barakat, E. Altman, and W. Dabbous, "On TCP performance in a heterogeneous network: a survey," IEEE Communications Magazine, 2000.
[28] K. Tan, J. Song, Q. Zhang, and M. Sridharan, "A compound TCP approach for high-speed and long distance networks," in Proc. IEEE INFOCOM, 2006.
[29] Y. Li, D. Leith, and R. N. Shorten, "Experimental evaluation of TCP protocols for high-speed networks," IEEE/ACM Trans. on Networking, 2007.
[30] D. Miras, M. Bateman, and S. Bhatti, "Fairness of high-speed TCP stacks," in Proc. AINA, 2008.
[31] S. Ma, J. Jiang, W. Wang, and B. Li, "Towards RTT fairness of congestion-based congestion control," CoRR.
… Proc. IEEE INFOCOM …
… IEEE Trans. on NSM, 2012.
[36] M. Podlesny and S. Gorinsky, "RD network services: differentiation through performance incentives," in Proc. ACM SIGCOMM, 2008.
[37] V. Sivaraman, S. C. Madanapalli, H. Kumar, and H. H. Gharakheili, "OpenTD: open traffic differentiation in a post-neutral world," in Proc. ACM SOSR, 2019.
[38] B. Han, V. Sciancalepore, D. Feng, X. Costa-Perez, and H. D. Schotten, "A utility-driven multi-queue admission control solution for network slicing," in Proc. IEEE INFOCOM, 2019.
[39] Y.-W. E. Sung, C. Lund, M. Lyn, S. G. Rao, and S. Sen, "Modeling and understanding end-to-end class of service policies in operational networks," in Proc. ACM SIGCOMM, 2009.
[40] M. Zou, R. T. Ma, X. Wang, and Y. Xu, "On optimal service differentiation in congested network markets," IEEE/ACM Trans. on Networking, 2018.
[41] D. Katabi, M. Handley, and C. Rohrs, "Congestion control for high bandwidth-delay product networks," in Proc. ACM SIGCOMM, 2002.
[42] T. Hoeiland-Joergensen, P. McKenney, D. Taht, J. Gettys, and E. Dumazet, "The flow queue-CoDel packet scheduler and active queue management algorithm," IETF Draft, March 2016.
[43] J. C. Bennett and H. Zhang, "Hierarchical packet fair queueing algorithms," IEEE/ACM Trans. on Networking, 1997.
[44] N. K. Sharma, M. Liu, K. Atreya, and A. Krishnamurthy, "Approximating fair queueing on reconfigurable switches," in Proc. NSDI, 2018.
[45] M. Hedayati, K. Shen, M. L. Scott, and M. Marty, "Multi-queue fair queuing," in Proc. NSDI, 2019.
[46] A. Sivaraman, K. Winstein, S. Subramanian, and H. Balakrishnan, "No silver bullet: extending SDN to the data plane," in Proc. ACM Workshop on HotNets.
… Proc. ACM MM, 2016.
[49] P. Georgopoulos, Y. Elkhatib, M. Broadbent, M. Mu, and N. Race, "Towards network-wide QoE fairness using OpenFlow-assisted adaptive video streaming," in