DiffPerf: Towards Performance Differentiation and Optimization with SDN Implementation

Walid Aljoby†, Xin Wang†, Dinil Mon Divakaran∗, Tom Z. J. Fu§, Richard T. B. Ma†
†School of Computing, National University of Singapore  ∗Trustwave, Singapore  §Bigo Technology Pte Ltd, Singapore
{algobi, xin.wang, tbma}@comp.nus.edu.sg, [email protected], [email protected]

Abstract — Continuing the current trend, Internet traffic is expected to grow significantly over the coming years, with video traffic consuming the biggest share. On the one hand, this growth poses challenges to access providers (APs), who have to upgrade their infrastructure to meet the growing traffic demands as well as find new ways to monetize their network resources. On the other hand, despite numerous optimizations of the underlying transport protocol, a user's utilization of network bandwidth, and thus the user's perceived quality, is still largely affected by network latency and buffer size. To address both concerns, we propose DiffPerf, a class-based differentiation framework that, at a macroscopic level, dynamically allocates bandwidth to service classes pre-defined by the APs, and at a microscopic level, statistically differentiates and isolates user flows to help them achieve better performance. We implement DiffPerf on the OpenDaylight SDN controller and a programmable Barefoot Tofino switch, and evaluate it from an application perspective for MPEG-DASH video streaming. Our evaluations demonstrate the practicality and flexibility that DiffPerf provides APs, with capabilities through which a spectrum of qualities is provisioned at multiple classes. Meanwhile, it assists in achieving better fairness and improving overall user-perceived quality within the same class.
I. INTRODUCTION
Today's Internet is dominated by content traffic, especially video streams. According to the Cisco Annual Internet Report [1], video will make up 82% of the total downstream Internet traffic by 2022. In today's home, Internet video drives our work and life, particularly during the COVID-19 pandemic [2], and video applications will continue to place significant demand on bandwidth in the future [1]. To accommodate high traffic, content providers (CPs) have been deploying wide-area infrastructures to bring content closer to users; e.g., Netflix uses third-party content delivery networks such as Akamai and Limelight, and builds its own [3]. However, as end-users rely on last-mile access providers (APs) for accessing the Internet, APs' bandwidth capacity still limits user throughput due to network congestion [4]. For example, the average throughput of Netflix users behind Comcast [5], the largest U.S. broadband provider, degraded 25% from over 2 Mbps in Oct 2013 to 1.5 Mbps in Jan 2014.

To sustain traffic growth, APs need to upgrade network infrastructures and expand capacities; however, their incentives depend on the business model and the corresponding mechanism used to monetize bottleneck bandwidth, which is crucial to the viability of the current Internet model in the future. A general approach used by APs is to differentiate services and prices; e.g., APs provide premium peering [6] options for CPs and multiple data plans for end-users with different data usage. However, the former can only be implemented with large CPs via peering agreements, while the latter does not guarantee the performance of end-users in any sense. Bandwidth allocation is typically a function of the application endpoints, and is traditionally embodied as part of the transport layer's congestion control mechanism. TCP CUBIC and BBR are the most popular such protocols, controlling the majority of Internet traffic.
However, both of them strive for efficient utilization of the bandwidth, while being unaware of how the user's Quality of Experience (QoE) is negatively biased by Round-Trip Times (RTTs) and network buffer size. Consequently, there exists a fundamental mismatch between the differentiated services and the underlying resource allocation that differentiates for predictable performance.

To resolve this mismatch, we consider a class-based differentiation approach, under which CPs and users can choose a service class (SC) to join. We propose DiffPerf, a dynamic performance differentiation framework, at the APs' vantage point to manage their bottleneck bandwidth resources in a principled and practical manner. From a macroscopic perspective, DiffPerf dynamically allocates bandwidth to each SC according to the changing number of active flows in each SC, by maximizing the weighted α-fair utilities, which enables APs to trade off fairness. Nevertheless, users in the same service class might not perceive a fair quality, due to the complex interaction between the transport protocol and inherent network conditions such as heterogeneous RTTs and buffer sizes, as shown in our experimental explorations and known by conventional wisdom [7]. Thus, at a microscopic level, DiffPerf uses a new performance-aware mechanism, called (β, γ)-fairness, to further optimize and make more fine-grained bandwidth allocation within each SC, so as to more efficiently utilize the aggregate capacity and achieve fairer performance for flows. Our main contributions are as follows:

1. We derive the closed-form bandwidth allocation solution and show that this solution achieves guaranteed performance differentiation in terms of controllable ratios of the average per-flow throughput across the different SCs.

2. Within each SC, we present (β, γ)-fairness and a neat statistical method to differentiate and isolate flows automatically based on their achieved throughput, to mitigate the bias brought by the TCP protocol due to its interaction with network latency (i.e., RTT) and buffer size.

3. By leveraging SDN capabilities, we develop a native OpenDaylight (ODL) control plane application that dynamically manages network resources, including tracking flows, inquiring flow statistics and allocating bandwidth capacity. Furthermore, to measure the impact of network buffer sizes, we also implement DiffPerf on a programmable Barefoot Tofino switch, which allows flexible buffer sizing and enables fine-grained and flexible line-rate telemetry.

4. We carry out comprehensive evaluations of DiffPerf from an application perspective for DASH video streaming, a mainstream application that accounts for the majority of Internet video traffic.

We believe that DiffPerf demonstrates a new avenue for APs to differentiate and optimize the performance of video flows and the corresponding perceived user QoE, so as to better monetize their bottleneck network resources. This will further incentivize APs to deploy more bandwidth capacity to accommodate the growth of Internet content traffic.

II. THE DIFFPERF FRAMEWORK
In this section, we present the DiffPerf framework in a top-down manner. We first describe how DiffPerf allocates bandwidth capacity among the SCs, based on an optimization approach. We derive the closed-form allocation solution and show its feature of guaranteed performance differentiation. We then discuss the performance issues arising from the TCP congestion control mechanism as it responds to the heterogeneity of flows' RTTs and the network's buffer sizes. To solve this problem, we show how DiffPerf classifies flows and optimizes bandwidth allocation within each SC.
A. Inter-Class Bandwidth Allocation
We consider an access provider that offers a set S of service classes over a bottleneck link with capacity C. We denote the set of active flows in any service class s ∈ S by F_s and the cardinality of F_s, i.e., the number of flows in class s, by n_s. To differentiate the performance for flows in different service classes, the access provider needs to allocate an appropriate amount of bandwidth to each service class. To accomplish this in a principled manner, we formulate the bandwidth allocation as an optimization of the allocation X = (X_s : s ∈ S) that solves a general utility maximization problem as follows:

  max_X  Σ_{s∈S} n_s U_s(X_s / n_s)                       (1)
  s.t.   Σ_{s∈S} X_s ≤ C  and  X_s ≥ 0,  ∀s ∈ S.          (2)

Under the link capacity constraint (2), the above mathematical program maximizes the aggregate utility over all service classes, where for each service class s, it counts the number of flows n_s multiplied by the per-flow utility U_s(X_s/n_s) over the average capacity X_s/n_s allocated to each flow. In particular, we adopt and generalize the well-known weighted α-fair family of utility functions [8]. In this family of utility functions, each service class s is assigned a weight w_s that indicates the relative importance of the service class, resulting in differentiated per-flow bandwidth allocation across the service classes. By controlling the parameter α, the access provider can express different preferences over various notions of fairness. When α approaches 0, the utility tends to be measured purely by the allocated bandwidth; when α approaches +∞, the solution converges to the weighted max-min fair allocation among the flows. In particular, a trade-off of a weighted proportional fair solution can be obtained by solving the optimization problem when α is set to 1. Thus, besides the differentiation factor w_s among service classes, the service operator can choose the value of α to trade off fairness.

Theorem 1.
If an allocation X maximizes the aggregate utility over all service classes, it must satisfy

  X_s = ( n_s · w_s^{1/α} / Σ_{s'∈S} n_{s'} · w_{s'}^{1/α} ) · C,  ∀s ∈ S.   (3)

Theorem 1 provides the closed-form solution of the utility maximization problem. Based on the optimal allocation solution in Equation (3), we derive the ratio of the average per-flow capacities of any two service classes s, s' ∈ S as

  (X_s / n_s) : (X_{s'} / n_{s'}) = (w_s / w_{s'})^{1/α}.   (4)

This result implies that performance differentiation is achieved by enforcing a fixed ratio for the per-flow bandwidth capacity across SCs, controlled by the weights w_s, w_{s'} and the fairness parameter α. Equation (4) explicitly shows that the optimal solution effectively allocates a higher average per-flow capacity to the service class with the larger weight, which is desirable and expected for the better service class. In particular, we also see that when α is set to 1, the weighted proportional fair allocation leads to average per-flow allocations that are proportional to the weights of the SCs.

B. Intra-Class Bandwidth Allocation
Motivation: Given X_s amount of bandwidth capacity allocated to the n_s flows in SC s, each flow f ∈ F_s is expected to achieve an average throughput of X_s/n_s. However, the actual throughput achieved, denoted by x_f, might be significantly less than the mean. This can adversely affect the QoE that the corresponding user perceives. At the last-mile bottleneck, parameters such as RTT, TCP congestion control algorithm (e.g., CUBIC vs. BBR), and buffer size affect the performance of flows [9], [10], [11]. The heterogeneity of RTTs experienced by the flows, as well as the TCP-based congestion control mechanisms responding differently to the RTTs and network buffer size, lead to multiple competing flows achieving different throughput. We analyzed the performance of 100 competing DASH flows on a testbed, where all flows run TCP BBR and share a bottleneck link connecting to a DASH server. The bottleneck link capacity is set to 120 Mbps, and 30% of the flows experience relatively longer RTTs than the rest. We run the experiments by changing one of the key parameters, i.e., the network buffer size. The experiment results show that the average stalling time of DASH flows at a 10 MB network buffer size is 35% higher than that of DASH flows when the network buffer size is 1 MB. However, by "isolating" flows that perceived dissimilar QoE at the last-mile bottleneck link, we observed that the average stalling time of DASH flows is reduced by 50% and 25% at buffer sizes of 1 MB and 10 MB, respectively, thereby improving the overall QoE significantly.

Motivated by this observation, we propose a practically scalable solution that classifies similar flows into sub-groups and isolates them into separate sub-classes by allocating an appropriate amount of bandwidth to them within each SC. Next, we describe 1) the flexible statistical method that DiffPerf uses to classify flows within a SC, and 2) the intra-class bandwidth allocation used by DiffPerf for sub-group isolation.
1) Flow Classification and Isolation:
Relying on QoE as a similarity metric to classify the flows would require explicit feedback from the receiver to the AP vantage point, which is difficult to afford in practice. We therefore leverage SDN functionalities to find other metrics usable at the vantage point. The first metric that comes to mind is RTT. However, real-time RTT samples cannot be taken solely as indicators of performance issues without other information such as the underlying congestion control mechanism, buffer size [9], and packet route security. Even if we assume the availability of this information, measuring flow RTT at the APs is unreliable: measuring RTT at the SDN control plane inflates it in a variable and significant way, based on our measurements in the ODL control plane, while measuring it in the SDN data plane [12] may not scale well due to memory space constraints. Instead, we emphasize that the throughput of TCP flows is an appropriate and robust metric that captures the collective impact of the interacting network parameters on user-perceived performance. Next, we show how to utilize the throughput measure as a proxy for determining whether flows are similar to each other and for effectively identifying which flows are affected.

Because the number of groups and the number of flows in each group may change and are not known in real scenarios, we adopt general statistical metrics for classification. Given the achieved throughput x_f of the flows f ∈ F_s in any SC s ∈ S, the mean and standard deviation of the flows' throughput are defined as

  ¯x_s = (1/n_s) Σ_{f∈F_s} x_f   and   σ_s = √( Σ_{f∈F_s} (x_f − ¯x_s)² / n_s ).

Because the achieved throughput x_f of each flow depends on the number n_s of competing flows and their characteristics, the buffer size, and the allocated capacity X_s that ultimately determines the network congestion imposed on the SC, instead of using absolute throughput thresholds to classify flows, we adopt the following statistical metric that orders and measures the relative throughput values among all flows in the same SC.

Definition 1. Given the mean ¯x_s and standard deviation σ_s, the standard score of a flow f's throughput is defined by z_f = (x_f − ¯x_s) / σ_s.

When a flow's throughput is above (or below) the mean, its standard score or z-score is positive (or negative, respectively). The z-score captures the signed fractional number of standard deviations by which the flow's throughput deviates from the mean value.

Without loss of generality, we divide a set F_s of flows into two sub-classes: a lower sub-class F_s^L and an upper sub-class F_s^H, based on each flow's z-score compared with a pre-defined threshold β, where F_s^L(β) = {f ∈ F_s : z_f < β} and F_s^H(β) = F_s \ F_s^L(β). The set F_s^L contains the flows that achieved the lowest throughput values (i.e., the negatively affected flows). Thus, our goal is to identify them so that we can isolate them and allocate an appropriate amount of bandwidth to them accordingly. We use a non-positive value of β to capture flows whose throughput is |β| deviations lower than the achieved average ¯x_s. Because the set F_s^L grows monotonically with the parameter β, i.e., F_s^L(β₁) ⊆ F_s^L(β₂) for all β₁ < β₂, a smaller value of β makes a more conservative decision on the lowest-throughput flows, avoiding mis-classifications. We will further study how the values of β affect the performance of flows in a later section via experimental evaluations.
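The z-score classification above can be sketched in a few lines of Python; this is an illustrative fragment rather than DiffPerf's implementation, and the flow IDs and throughput values below are made up:

```python
import statistics

def classify_flows(throughputs, beta=-0.5):
    """Split flows into lower/upper sub-classes by a z-score threshold beta.

    throughputs: dict mapping flow id -> measured throughput (e.g., Mbps).
    Returns (lower, upper): the lower sub-class F_s^L(beta) = {f : z_f < beta}
    holds the negatively affected flows; the upper sub-class is the rest.
    """
    xs = list(throughputs.values())
    mean = statistics.fmean(xs)
    # Population standard deviation, matching sigma_s = sqrt(sum (x_f - mean)^2 / n_s).
    sigma = statistics.pstdev(xs)
    if sigma == 0:  # all flows identical: nothing to isolate
        return set(), set(throughputs)
    lower = {f for f, x in throughputs.items() if (x - mean) / sigma < beta}
    return lower, set(throughputs) - lower

flows = {1: 0.4, 2: 0.5, 3: 1.2, 4: 1.3, 5: 1.6}
lower, upper = classify_flows(flows, beta=-0.5)   # -> lower == {1, 2}
```

Note the monotonicity used in the text: a smaller β can only shrink the lower sub-class (with β = −1.5 above, no flow is classified as negatively affected).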
2) Bandwidth Allocation Model:
After classifying the flows in each SC into two sub-groups, we isolate them into two sub-classes and determine how much bandwidth X_s^L and X_s^H to allocate to each sub-class. To fully utilize the bandwidth capacity, our solution needs to satisfy X_s^L + X_s^H = X_s.

The throughput of some flows might be naturally low, and such flows might not achieve the targeted throughput X_s/n_s even when allocated that amount of capacity. As a result, enforcing the per-flow allocation of X_s/n_s would waste resources. The key question is how much per-flow capacity we should allocate to the flows in F_s^L, whose innate throughputs are less than what is needed to achieve the average throughput ¯x_s or to utilize the per-flow allocated capacity X_s/n_s in theory. Since these flows might not be able to achieve the average throughput, the per-flow allocation should be no higher than X_s/n_s. On the other hand, by isolating the negatively affected flows from the high-throughput flows (i.e., the flows F_s^H that cause the performance issues of the flows F_s^L), we expect them to achieve higher throughput than what they currently achieve; therefore, we should allocate more capacity to the set F_s^L of flows than their aggregate achieved throughput. To this end, we allocate the average per-flow bandwidth capacity of the set F_s^L as

  X_s^L(β, γ) / |F_s^L(β)| = γ · ( Σ_{f∈F_s^−} x_f ) / |F_s^−| + (1 − γ) · X_s / n_s,   (5)

where we define the set of flows whose throughput is below the mean ¯x_s by F_s^− ≜ {f ∈ F_s : x_f < ¯x_s} and introduce a parameter γ ∈ [0, 1] to control the allocated capacity flexibly. In particular, at one extreme of γ = 1, the solution allocates the average throughput of the set F_s^− of flows as the per-flow capacity for the lower sub-class F_s^L(β), which must be lower than the average throughput ¯x_s and the average capacity X_s/n_s of all flows. In this case, the per-flow capacity allocated to the lower sub-class F_s^L is lower than that allocated to the upper sub-class F_s^H, under which resource wastage is reduced and resources are utilized more efficiently. At the other extreme of γ = 0, the solution simply isolates the two sub-classes and equally allocates the average capacity X_s/n_s as the per-flow capacity for both the upper and lower sub-classes, under which per-flow fairness is enforced regardless of how efficiently the resource is utilized. Thus, by choosing the value of γ between 0 and 1, we can make a trade-off between resource fairness and utilization. However, this depends on the interaction of the TCP algorithm with the network buffer size. As opposed to a shallow buffer, a deep buffer allows low-throughput TCP flows (especially those negatively affected by the heterogeneity of RTTs) to stabilize their transfers. Thus, from the vantage point of a deep buffer, if the low-throughput flows were crowded out by others, they can perform better with γ = 0. However, this is not the case with a shallow buffer, which does not allow negatively affected flows to ramp up quickly, and this may even lead to lower utilization.

By Eq. (5), we also have the next theorem showing 1) lower bounds on the per-flow capacities re-allocated to the lower and upper sub-classes F_s^L and F_s^H; and 2) the monotonicity, in the parameter β, of the average throughput of the flows within F_s^L and of the average per-flow capacity re-allocated to F_s^H.

Theorem 2. Given any fixed parameter γ, for any service class s ∈ S, 1) the average achieved throughput of the flows within the lower sub-class F_s^L is non-decreasing in β and always no higher than the average per-flow capacity re-allocated to F_s^L; 2) the average per-flow capacity re-allocated to the upper sub-class F_s^H is non-decreasing in β and always no lower than X_s/n_s.

Theorem 2 states that as the parameter β increases, the average throughput of the flows F_s^L of the lower sub-class also increases, because more high-throughput flows are classified into F_s^L. It also tells us that this achieved average throughput must be no higher than the per-flow capacity re-allocated to the flows in F_s^L. This property guarantees our design objective of allocating more capacity to the flows in the lower sub-class than their aggregate achieved throughput. Theorem 2 also states that as β increases, the flows within the upper sub-class F_s^H are re-allocated more per-flow bandwidth capacity, although fewer flows are classified into that sub-class. Thus, service operators can choose the value of β to control the scales of the sub-classes, and both β and γ to control the bandwidth capacity allocated to the flows of the two sub-classes, which we refer to as (β, γ)-fairness.

Before we close this section, we would like to emphasize that although DiffPerf classifies the flows in each SC into two sub-groups for simplicity, its statistical method of classification and the corresponding bandwidth allocation can be applied in a top-down recursive manner to further split any sub-group for a more fine-grained optimization.

III. IMPLEMENTATION

A. DiffPerf Prototype on OpenDaylight with OpenFlow
We implement DiffPerf as an application on the popular industry-grade open-source SDN platform, the OpenDaylight (ODL) controller. We particularly develop a native MD-SAL (Model-Driven Service Adaptation Layer) application on ODL, which comprises the use of different technologies such as OSGi, Karaf, YANG, the blueprint container, and messaging patterns such as RPC, publish-subscribe, and data store accesses [13]. We skip implementation details for the sake of brevity. Figure 1 depicts the implementation structure of DiffPerf on ODL.

Figure 1: DiffPerf implementation (SDN controller modules: Inter-class and Intra-class Allocators, Flow Processor, Bandwidth Optimizer, Bandwidth Enforcer, and Stats Collector, connected to the forwarding devices via the OpenFlow plugin and protocol).

There are four main modules: Flow Processor, Bandwidth Optimizer, Bandwidth Enforcer and Stats Collector, interconnected as shown in the figure; we briefly describe them below.
1) Flow Processor:
The Flow Processor module on ODL performs two primary functions. First, it assigns user-specified service classes to newly joining flows in the network. We use the YANG modeling language [14] to define the service classes. Second, the module carries out regular flow maintenance; i.e., the flow processor inserts new flows into the data store, determines the pre-defined service class and assigns the corresponding weight to new flows, removes inactive or completed flows, etc.
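The bookkeeping above can be sketched as follows. This is an illustrative Python model only, not the actual MD-SAL code: the class names, the in-memory dict standing in for the ODL data store, and the example weights are all our own.

```python
from dataclasses import dataclass, field

# Pre-defined service classes and weights; in DiffPerf these are defined in YANG.
SERVICE_CLASS_WEIGHTS = {"Golden": 3, "Silver": 2, "Bronze": 1}

@dataclass
class FlowEntry:
    flow_id: str
    service_class: str
    weight: int

@dataclass
class FlowProcessor:
    # Stands in for the ODL data store holding active flows.
    flows: dict = field(default_factory=dict)

    def on_new_flow(self, flow_id, service_class):
        """Assign a newly joining flow to its user-specified service class."""
        weight = SERVICE_CLASS_WEIGHTS[service_class]
        self.flows[flow_id] = FlowEntry(flow_id, service_class, weight)

    def on_flow_removed(self, flow_id):
        """Regular maintenance: drop inactive or completed flows."""
        self.flows.pop(flow_id, None)

    def flows_by_class(self):
        """Service class -> set of active flow ids (input to the optimizer)."""
        out = {s: set() for s in SERVICE_CLASS_WEIGHTS}
        for entry in self.flows.values():
            out[entry.service_class].add(entry.flow_id)
        return out
```

The `flows_by_class` view is what the Bandwidth Optimizer consumes each interval, alongside the throughput estimates from the Stats Collector.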
2) Statistics Collector:
DiffPerf performs in-network performance optimization. For DiffPerf to work, we need to estimate the throughput of each active flow. Let x_f(t) denote the average throughput of a flow f until time t. By measuring the instantaneous throughput x̃_f of flow f during the immediately preceding time period Δt, the average throughput for the next time period Δt is updated as follows:

  x_f(t + Δt) ← δ · x_f(t) + (1 − δ) · x̃_f,

where δ ∈ [0, 1] is a weight. Having measured the average throughput of the active flows, we use these estimates to group flows into sub-classes so that flows with similar achieved throughput fall into the same sub-class.

To obtain real-time estimates of the throughput of each active flow as well as the link bandwidth, we implement a Stats Collector module on ODL. This module registers per-flow rules to pull measurement information using event-based handlers from the operational data store in ODL. The data store in turn uses the OpenFlow plugin (as indicated in Figure 1) to request the switches to report flow measurements; the per-flow measures of interest are packet counts, byte counts and duration.

3) Bandwidth Optimizer:
The core part of DiffPerf is the Bandwidth Optimizer module, which is responsible for the inter- and intra-class bandwidth optimization described in Section II. The optimizer runs every Δt interval, getting input from the two modules described above, the Flow Processor and the Stats Collector (see Figure 1). While the former provides the mapping of flows to user-specified service classes, the latter provides real-time measurements on the active flows in the switch. Given this input, the inter- and intra-class optimizers are executed; the outputs of the optimization are: (i) the portion of bandwidth allocated to each service class (SC), and (ii) the portion of bandwidth for each sub-class within every service class.
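One optimizer round can be sketched end to end: the EWMA throughput update performed by the Stats Collector, the inter-class allocation of Eq. (3), and the intra-class (β, γ) allocation of Eq. (5). This is an illustrative Python sketch, not the ODL module; the function names and the example inputs at the bottom are our own.

```python
import statistics

def ewma_update(x_prev, x_inst, delta=0.5):
    """x_f(t + dt) <- delta * x_f(t) + (1 - delta) * x~_f, with delta in [0, 1]."""
    return delta * x_prev + (1 - delta) * x_inst

def inter_class_allocation(n, w, alpha, C):
    """Eq. (3): X_s proportional to n_s * w_s^(1/alpha), summing to capacity C."""
    score = {s: n[s] * (w[s] ** (1.0 / alpha)) for s in n}
    total = sum(score.values())
    return {s: score[s] / total * C for s in n}

def intra_class_allocation(throughputs, X_s, beta, gamma):
    """Eq. (5): split X_s between the lower sub-class F_s^L(beta) and F_s^H."""
    xs = list(throughputs.values())
    n_s = len(xs)
    mean, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    lower = {f for f, x in throughputs.items()
             if sigma > 0 and (x - mean) / sigma < beta}
    if lower:
        below = [x for x in xs if x < mean]   # F_s^-: flows below the mean
        per_flow_low = gamma * sum(below) / len(below) + (1 - gamma) * X_s / n_s
        X_low = per_flow_low * len(lower)
    else:
        X_low = 0.0
    return lower, X_low, X_s - X_low          # X_s^L + X_s^H = X_s

# One round for two service classes with weights 3:1 under alpha = 1 (weighted
# proportional fairness): per-flow capacities come out in the ratio 3:1.
X = inter_class_allocation(n={"G": 10, "B": 20}, w={"G": 3, "B": 1},
                           alpha=1, C=100.0)  # -> {"G": 60.0, "B": 40.0}
```

Within each class, `intra_class_allocation` then realizes the γ trade-off of Eq. (5): γ = 1 caps the lower sub-class at the average throughput of the below-mean flows, while γ = 0 gives both sub-classes the same X_s/n_s per-flow share.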
4) Bandwidth Enforcer:
To materialize the bandwidth allocation, each sub-class should use its designated bandwidth in an isolated manner. A naive approach to implementing this is to leverage multiple queues at the switch egress port, so that each sub-class maps to an isolated queue. However, there are two practical challenges. First, in commodity switches the number of queues at an egress port is usually limited to a small number [15][16], meaning that the number of available queues could be less than the number of flow sub-classes. Second, current OpenFlow switches do not expose APIs to update the weights of the queues dynamically. Without this capability (of dynamically changing queue weights), the bandwidth allocated to queues cannot be changed as and when required.

To overcome both limitations, we leverage the metering feature available in OpenFlow switches. Instead of defining queues and updating their bandwidth at the egress port, we develop a Bandwidth Enforcer module that performs enforcement at the ingress side of the switch. That is, multiple meters, corresponding to the number of sub-classes, are defined; and based on the output of the Bandwidth Optimizer, the flow rate of each sub-class is attached to a specific meter dynamically. The Bandwidth Enforcer uses the OpenFlow plugin to encapsulate the allocated bandwidth into OpenFlow messages and install them onto the switch(es).

B. DiffPerf Prototype with Programmable Data Plane
Next, we implement another prototype of DiffPerf on a lightweight C controller connected to a Barefoot Tofino programmable switch [17], which enables flexible buffer sizing and fine-grained line-rate telemetry. We particularly implement a statistics building block in the data plane to track the number of bytes transmitted by the active flows. Additionally, we re-implement the Bandwidth Enforcer and Statistics Collector modules in the control plane. We leverage the APIs exposed by the Tofino switch to update the weights of the queues dynamically and to configure their sizes from the control plane. The remaining modules are kept the same, with minor modifications.

IV. EXPERIMENTAL EVALUATION
We evaluate DiffPerf by carrying out experiments on a realistic testbed. We describe the details below.
A. Testbed setup
OpenFlow Brocade switch experiments:
We set up a testbed for video streaming between DASH clients (i.e., video players) running dash.js and a DASH server over an SDN network. Our testbed consists of 12 servers, 10 of which are used to host DASH clients, and one each for hosting the DASH server and the ODL controller. The 10 servers running DASH clients are connected to the DASH server such that they compete (for video segments) at a downstream bottleneck link from an SDN-enabled Brocade ICX-6610 24-port physical switch. We evaluated DiffPerf in three different scenarios. In Scenario 1 (Section IV-C1) and Scenario 2 (Section IV-C2), each physical server hosts several DASH clients, each client runs in a VM, and all clients are connected to the DASH server over a 50 Mbps downstream bottleneck link. For Scenario 3 (Section IV-C3), we scale up the number of DASH clients: each physical server hosts multiple DASH clients, all running as Docker containers and connected to the DASH server over a 200 Mbps downstream bottleneck link.
Barefoot Tofino switch experiments:
The experiments with the Tofino programmable switch concentrate on evaluating the impact of buffer size on the performance of bottlenecked flows, and on how DiffPerf enables the switch buffer to perform better (i.e., improve the overall flow performance). The evaluations are carried out with multiple switch buffer sizes, ranging from the KB scale to the MB scale. Tofino exposes a set of APIs for Traffic Manager applications to manage buffer allocation from both the ingress and egress ends. We use the bf_tm_q_app_pool_usage_set API to set the buffer size for the queues of the egress port attached to the bottleneck link. Buffer size is specified in terms of cells, where each cell is 80 bytes. The buffer precedes the bottleneck link that transfers video segments from the DASH server to the DASH clients. The results are presented in the last part of Scenario 2 (Section IV-C2).

Except for Scenario 1 (Section IV-C1), assuming that the majority of flows in the Internet have short RTTs [18], we partition the clients into two sets in a 70:30 ratio based on the configured RTT_min values: the mean and standard deviation of the bigger set are 64 ms and 16 ms, respectively, and those of the other set are 224 ms and 32 ms, respectively. We use the network emulator netem [19] at the server machines running the DASH clients to set the latency. For streaming, we use the Big Buck Bunny video sample, which lasts for 600 seconds and has been encoded into 3 bitrate levels (1.2 Mbps, 2.2 Mbps and 4.1 Mbps) of equal segments (i.e., each segment is 2 seconds long). Thus, a DASH client can choose the bitrate levels and segments for streaming video, based on the measured congestion level of the network. We compare DiffPerf against the two most popular TCP congestion control algorithms on the Internet [20]: TCP CUBIC [21] and TCP BBR [22].
B. Metrics for evaluation
To evaluate the performance of DiffPerf and the TCP variants, we use two metrics. One is the per-flow average throughput, which in our case corresponds to the average throughput of all DASH clients. The other metric of importance is the user-perceived quality of experience (QoE). The QoE metric is adopted from the widely used model proposed by [23], and is expressed as:

  QoE = Σ_{n=1}^{N} q(R_n) − λ Σ_{n=1}^{N−1} |q(R_{n+1}) − q(R_n)| − μ T_stall − μ_s T_s.

This QoE definition uses various performance factors: the average playback bitrate R_n over the total N segments of the video, the variability of consecutive segment bitrates represented by the second summation, the duration of rebuffering T_stall (i.e., the duration for which the player's playout buffer has no content to render), and the startup delay T_s (i.e., the lag between the user clicking and the time rendering begins). As in [24][23], q maps a bitrate to a quality value; λ is usually set to one, and μ and μ_s are set to the maximum bitrate of the video sample. We measure QoE for the entire duration of the video.

Figure 2: Achieved throughput and corresponding QoE of multiple service classes for different values of α. (a) Throughput; (b) QoE.

C. Results

1) Scenario 1: Evaluation of inter-class performance:
In this scenario we evaluate DiffPerf's inter-class bandwidth allocation model. We assume the access provider offers three classes of service: Golden (G), Silver (S) and Bronze (B), with weights of 3, 2 and 1, respectively. We run a set of experiments to evaluate the bandwidth allocated to users of different classes under different values of α. We assign 13 DASH clients to each service class, for a total of 39 DASH clients in this scenario. All flows experience homogeneous RTTs in this scenario.

Figure 2(a) plots, for each SC, the average throughput achieved by all flows in that class, for different values of α. Evidently, the ratios of the estimated average throughput of flows across the service classes closely follow the ratios obtained from our model (refer Eq. 4). In addition, the average throughputs converge with increasing α.

Figure 2(b) plots the average QoE of all flows in each SC. Observe that the QoE of service class B is low when the average throughput achieved (shown in Figure 2(a)) is low. The QoE of the three service classes converges with increasing values of α. With increasing α, as resources are shared more fairly among the competing flows, it is expected that the QoE also reflects this fair sharing, given that the flows have homogeneous RTTs.
2) Scenario 2: Evaluation of intra-class performance:
In this part, we evaluate our proposed performance-aware fairness, (β, γ)-fairness: that is, the capability of DiffPerf's β to mitigate the bias against flows affected by the interaction between TCP CUBIC, TCP BBR, heterogeneous flow RTTs, and switch buffer size. We also present the flexibility of γ in enabling a feature of practical interest, the trade-off between network efficiency and user QoE fairness. Recall that DiffPerf uses statistical flow classification, and allocates bandwidth to the classified sub-classes so as to assign higher capacity to the negatively affected flows.
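The classification step can be sketched as below (a minimal illustration, assuming the z-score is computed over the per-flow achieved throughputs within a service class; the function name is ours):

```python
from statistics import mean, stdev

def classify_flows(throughputs, beta):
    """Split the flows of one service class into lower/upper sub-classes.
    A flow whose z-score falls below the threshold beta (beta <= 0) has
    been achieving markedly less than the class mean, and is placed in
    the lower sub-class to receive protective bandwidth."""
    m, s = mean(throughputs), stdev(throughputs)
    if s == 0:  # all flows identical: nothing to protect
        return [], list(range(len(throughputs)))
    lower = [i for i, x in enumerate(throughputs) if (x - m) / s < beta]
    upper = [i for i in range(len(throughputs)) if i not in lower]
    return lower, upper

# Flows 0-3 are starved relative to flow 4; beta = 0 protects all of
# them, while a more negative beta shrinks the protected set.
print(classify_flows([1.0, 1.0, 1.0, 1.0, 10.0], beta=0))
# → ([0, 1, 2, 3], [4])
```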
DiffPerf based on CUBIC.
Figure 3: Aggregate throughput of the SC sub-classes flows
Figure 4: QoE of the SC sub-classes flows ((a) upper sub-class; (b) lower sub-class)
In Figure 3, we present the aggregate throughput achieved by DASH clients belonging to the two different sub-classes, under TCP CUBIC, TCP BBR, and
DiffPerf over CUBIC. We run experiments for different values of β. For better clarity, we show the results only for γ = 0. The numbers of flows classified into the lower sub-class by our statistical classification, for β = 0, −0.25, −0.50, −0.75, and −1, are 22, 17, 12, 9, and 5, respectively. The figure shows that DiffPerf's isolation enables the DASH clients in the lower sub-class to achieve higher throughput than under both TCP variants, while also achieving comparable aggregate throughput to TCP. Moreover, β gives an access provider the flexibility to decide on the flow classification based on a simple and intuitive metric (the z-score). While only a small number of flows get classified into the lower sub-class, we note that these were also the worst affected ones. (Notice that CUBIC, BBR, and DiffPerf all achieve aggregate throughput slightly higher than 50 Mbps; this is due to the burstiness tolerated by OpenFlow meters. Also, BBR achieves the highest aggregate throughput, as it fundamentally does not react to packet loss or delay as signals of congestion.)
Figures 4(a) and (b) plot the QoE achieved by each DASH client, for a fixed β and for different values of γ. The former plots the QoE of the upper sub-class of flows, and the latter plots the same for the lower sub-class. The key observation is that the flows in the lower sub-class perceive higher QoE under DiffPerf than under both TCP variants, and only at the cost of a small number of flows in the upper sub-class.
Figure 5: DiffPerf fairness-efficiency tradeoff
Another observation is that, with lower γ, DiffPerf gives fairer QoE to the clients;
DiffPerf with γ = 0 is the fairest. Beyond improving fairness, we calculated the overall QoE values for all flows; these show that DiffPerf, via flow isolation as well as performance-aware bandwidth allocation, significantly improves the overall QoE compared to TCP alone, achieving higher QoE than CUBIC at γ = 0, 0.5, and 1, and likewise higher QoE than BBR.
Fairness-efficiency tradeoff:
We run experiments for different values of γ to analyze the trade-off between efficiency (i.e., bandwidth utilization) and fairness (i.e., user-perceived quality fairness); β is kept fixed. Figure 5 shows that the aggregate throughput increases as γ increases. Meanwhile, the average throughput of the lower sub-class decreases and that of the upper sub-class increases as γ is increased from 0 to 1 (refer to the second Y-axis). Evidently, the parameter γ affects the average flow throughput of both sub-classes. This behavior is due to the fact that, when γ approaches 0, the clients are allocated equal bandwidth regardless of the sub-class; hence the DASH clients tend to achieve higher throughput (subject to their characteristics), and thus fairness is also improved. Conversely, when γ approaches 1, DiffPerf helps the network achieve better bandwidth utilization: it allocates higher bandwidth to the upper sub-class, which likely has flows with a greater tendency to exploit the provisioned network bandwidth, and hence results in better network utilization.
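Under our reading of Eq. (5) (via the proof of Theorem 2 in the appendix), the per-flow share of the lower sub-class interpolates between pure fairness (γ = 0, equal split) and pure efficiency (γ = 1, affected flows keep only their achieved average). A hedged sketch, with the averaging set approximated by the lower sub-class itself:

```python
def intraclass_allocation(X_s, throughputs, lower, gamma):
    """Per-flow shares within one service class of capacity X_s.
    gamma = 0: every flow gets the equal share X_s/n (max fairness).
    gamma = 1: lower sub-class flows keep their achieved average, and
    the remainder goes to the upper sub-class (max utilization)."""
    n = len(throughputs)
    fair = X_s / n
    if not lower or len(lower) == n:
        return [fair] * n
    avg_low = sum(throughputs[i] for i in lower) / len(lower)
    per_low = gamma * avg_low + (1 - gamma) * fair    # lower sub-class share
    per_high = (X_s - per_low * len(lower)) / (n - len(lower))
    return [per_low if i in lower else per_high for i in range(n)]

shares = intraclass_allocation(10.0, [1.0, 1.0, 4.0, 4.0], [0, 1], 0.0)
# gamma = 0 -> [2.5, 2.5, 2.5, 2.5]; gamma = 1 -> [1.0, 1.0, 4.0, 4.0]
```

Capacity is conserved for any γ; only its division between the two sub-classes shifts, which is exactly the tradeoff plotted in Figure 5.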
DiffPerf based on BBR.
As TCP BBR has recently gained widespread attention, we also evaluate DiffPerf over TCP BBR. Figure 6 shows that DiffPerf's isolation enables DASH clients in the lower sub-class (i.e., the affected flows) to achieve higher throughput than under BBR, while also achieving comparable aggregate throughput to BBR. The numbers of flows classified into the sub-classes are shown next to the DiffPerf bars. For example, at β = , flows are classified into the lower sub-class and flows into the upper sub-class. Figures 7(a) and (b) plot the perceived QoE of the DASH clients in the two sub-classes. The lower sub-class flows perceive better QoE with DiffPerf than with BBR.
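The QoE values compared throughout this section follow the model of [23] stated earlier; a minimal sketch (the quality mapping q is taken as the identity here, whereas the model allows any mapping):

```python
def qoe(bitrates, t_stall, t_startup, q=lambda r: r, lam=1.0):
    """QoE of one streamed video, per the model of [23]: sum of segment
    qualities, minus the bitrate-switching penalty (weight lam = 1),
    minus rebuffering and startup-delay penalties whose weights mu and
    mu_s are set to the maximum bitrate of the video sample."""
    mu = mu_s = max(bitrates)
    quality = sum(q(r) for r in bitrates)
    switching = sum(abs(q(b) - q(a)) for a, b in zip(bitrates, bitrates[1:]))
    return quality - lam * switching - mu * t_stall - mu_s * t_startup

# A steady 2 Mbps stream beats an oscillating one that stalls for 1 s:
print(qoe([2.0, 2.0, 2.0, 2.0], t_stall=0, t_startup=0))  # → 8.0
print(qoe([1.0, 3.0, 1.0, 3.0], t_stall=1, t_startup=0))  # → -1.0
```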
Impact of buffer size:
Based on the experiments carried out on the Tofino switch, this part presents the impact of buffer size on the performance of bottlenecked flows and the flow optimization achieved using DiffPerf. Note that the Tofino switch updates DiffPerf with flow statistics every 1 s (i.e., the sampling rate). At every interval of ∆t = 5 s, DiffPerf uses the last measured statistics for regulating network flows in the next immediate time interval.
Figure 6: Aggregate throughput of the SC sub-classes flows
Figure 7: QoE of the SC sub-classes flows ((a) upper sub-class; (b) lower sub-class)
Figure 8 plots the QoE achieved by
DASH clients competing for a Mbps bottleneck link, preceded in one experiment by a MB shallow buffer and in the other by a MB deep buffer. With the shallow buffer, BBR achieves three times higher QoE than with the deep buffer. We then run the same experiments for
DiffPerf, with a fixed β and γ = 0; it achieves higher QoE than BBR with the shallow buffer and with the deep buffer. We note that the shallow buffer leads to overall better user-perceived quality. However, quality worsens with a much smaller buffer size (e.g., KB). The deep buffer might help low-throughput flows, especially those affected by the interaction of TCP with flow RTT, to achieve better QoE, but it increases packet queuing delay. The very shallow buffer (e.g., KB) reduces packet queuing delay but increases packet losses. Hence, both of these extreme buffer sizes increase the DASH client's average stalling time (i.e., the duration of time the player's playout buffer has no content to render). Under DiffPerf, the client's average stalling time is reduced relative to BBR for all three buffer sizes (KB, MB, and MB).
Figure 8: Impact of switch buffer size
DiffPerf thus proves effective in improving the user-perceived quality across multiple buffer sizes. Lastly, it is worth noting from this set of experiments that the intermediate buffer size strikes a better trade-off between queuing delays and packet losses. Overall, DiffPerf is fairer than both CUBIC and BBR in terms of clients' throughput and QoE, and provides the highest overall QoE.
Scalability: DiffPerf operations on the Tofino switch are split across the controller and the data plane. The controller collects aggregate real-time statistics of the active flows, performs the optimization, and reacts to the data plane regularly. The data plane tracks the number of bytes transferred by the active flows. DiffPerf does not impose a high sampling rate, which could otherwise yield inaccurate statistics, especially when the DASH clients enter the OFF period. Hence, the communication between controller and data plane is only at the scale of seconds, which works well for long-running video flows over the Internet. This is also demonstrated by our experimental results.
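The controller-dataplane loop described above can be sketched as follows (a hypothetical skeleton, not DiffPerf's actual API; the three callbacks stand in for switch statistics collection, the optimization step, and meter installation):

```python
def control_loop(n_intervals, samples_per_interval, read_byte_counts,
                 compute_allocation, install_meters, sample_period=1.0):
    """Skeleton of the periodic control described above: the data plane
    reports per-flow byte counters every sample_period (1 s on Tofino);
    every delta_t = samples_per_interval * sample_period seconds, the
    controller converts counter deltas into throughputs (bits/s),
    re-runs the optimization, and installs rate limits for the next
    interval. All three callbacks are placeholders, not a real API."""
    prev = read_byte_counts()
    for _ in range(n_intervals):
        for _ in range(samples_per_interval):
            cur = read_byte_counts()          # one sampling-period report
        tput = {f: (cur[f] - prev[f]) * 8 / (samples_per_interval * sample_period)
                for f in cur}                 # avg bits/s over delta_t
        install_meters(compute_allocation(tput))
        prev = cur
```

A real deployment would sleep between samples; the iteration-count form here keeps the sketch deterministic.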
3) Scenario 3: The Dynamics of
DiffPerf: Finally, to understand how DiffPerf performs in real-world cases, we evaluate it in a dynamic scenario where users from different service classes join and leave the network at different times. In this set of experiments, we conduct the evaluation on an OpenFlow network centralized by the ODL controller, where 150 DASH clients share a Mbps bottleneck link; they have variable RTTs, with the ratio and distribution the same as in the previous scenario. The arrivals of the DASH client requests follow a Poisson process with rate λ = 1 client/s. A client exits after the entire video (which lasts for 600 seconds) is streamed. The DASH clients subscribe to the G, S, and B service classes in the ratio 1:2:3. The weights of the service classes are kept the same as before, i.e., G:S:B = 3:2:1. We set α, β, and γ to fixed values. At every interval of ∆t = 15 s, DiffPerf uses the last measured statistics, such as the number of active flows and each flow's instantaneous throughput (δ = 0), to send commands to the switch for regulating network flows in the next immediate time interval. The OpenFlow switch updates DiffPerf with flow statistics every 3 s (i.e., the default sampling rate in the Brocade ICX-6610 switch). Figure 9(a) shows the arrival and departure of DASH flows. The numbers of active flows in the two sub-classes, for each of the service classes, are depicted in Figures 9(b), (c), and (d). Although the video being streamed is 600 seconds long, observe that the G-class clients complete earlier than the S-class and B-class clients; this holds for both the lower and upper sub-classes. Similarly, S-class flows finish earlier than B-class flows. We observe a sudden decrease in the active flows a few times (the dips on the curves); this is not because the flows actually leave the system, but rather due to the expiry of the idle timeout of flows. When a DASH client does not receive video segment packets, the timeout causes the flow to be deemed inactive. However, once the client resumes receiving data,
DiffPerf promptly counts it as an active flow. Figure 10 plots the dynamic bandwidth allocation recommended by
DiffPerf, for each service class and the sub-classes within. The bandwidth allocated accounts for the number of active flows in each service class and their achieved throughput, optimized via the (β, γ) performance-aware mechanism. It also shows that DiffPerf adapts quickly to the departure of flows (observe the time period after 600 seconds), allocating the spare capacity to the remaining active flows.

V. RELATED WORK
A. TCP Congestion Control
The increase of network bandwidth also saw the emergence of 'high-speed' TCP variants such as FAST [25], BIC [26], CUBIC [21], and BBR [22] for transporting Internet traffic. Yet TCP's inability to fairly share the bandwidth of flows with heterogeneous RTTs, a problem known to the community for around two decades [27], [28], [29], still persists. As demonstrated by our experiments (and also other works, e.g., [30]), CUBIC exhibits such behavior, and so does BBR [9], [31]. This unfairness in achieved throughput worsens when flows with different TCP congestion control mechanisms compete [11]. Another interesting observation from the literature is that the relative performance degradation in throughput can be due to more than the single factor of RTT. In this context, we highlight that
DiffPerf is agnostic to the specific RTTs of flows and other router specifications (e.g., buffer size) in performing the optimization and the enforcement of the computed optimized bandwidth; indeed, DiffPerf classifies and isolates flows of dissimilar characteristics solely based on tracking their achieved throughput.
B. Service Differentiation
Service differentiation is at the core of network quality-of-service (QoS) provisioning for serving traffic from multiple classes over the network [32], [33], [34], [35]. While
IntServ [34] did not find adoption in the Internet,
DiffServ [32] inspired a body of work on providing differentiated services. However, many such solutions mandate sophisticated scheduling with manual configuration of QoS knobs on a per-service-class basis. Instead, we choose a well-known utility-function-based framework, which enables a service operator to practically specify the number of service classes and strike a good balance between bandwidth share and performance. In [36], the authors proposed an approach for rate-delay (RD) differentiation by maintaining two queues at the router's output link. While the aspirations resemble DiffPerf's, it is still best-effort and does not promise any rate or loss guarantees. [37] discussed a static service differentiation framework for ISPs. In short, class-based traffic control and service differentiation have been largely limited to theoretical analyses [36], [38], [39], [40], and have not been experimented with on hardware switches with real application traffic. DiffPerf's inter-class utility is general enough to make trade-offs among desirable performance metrics, and it operates dynamically based on active users and available bandwidth.
C. Fair Queuing
Fair queuing has been a topic of extensive research [41], [42], [43], [44], [45], [46]. FQ-CoDel [42], a recent AQM discipline, offers good performance gains in achieving fairness among flows, by classifying them into different
buckets and serving them in a round-robin manner. However, large memory is extremely expensive or unavailable in the data plane; hence it is practically infeasible to accommodate a very large number of buckets for hashing a large number of flows.
Figure 9: DASH clients arrival-departure and service classes cumulative sum of active flows ((a) arrival-departure of the clients; (b) G-class active flows; (c) S-class active flows; (d) B-class active flows)
Figure 10: Dynamic service classes bandwidth allocation ((a) G-class; (b) S-class; (c) B-class)
The DiffPerf optimizer operates simply by comparing a flow's z-score with a pre-defined β threshold (i.e., the classifier requires no training). Additionally, unlike AQM, DiffPerf is not limited to a specific congestion algorithm; it works on top of several interacting parameters such as buffer size, flow characteristics, and congestion algorithm. Also, DiffPerf is portable: it can be packaged as a virtual network function (VNF) over a SmartNIC [47] to handle the presence of extremely heavy workloads.
D. User Quality of Experience (QoE)
In the context of video streaming, several studies proposed to improve user QoE [24], [48] or to achieve QoE fairness [49]. These approaches continuously attempt to improve the adaptive bitrate (ABR) algorithms in the DASH Reference Player at the application layer, based on several performance metrics seen by the application. Our work differs from them in that we propose a bottom-up optimization. DiffPerf reacts to the interplay between several of the network's inherently coupled parameters by continuously improving the affected traffic flows. This in large part improves the performance metrics (e.g., QoE fairness) at the application.

VI. CONCLUSION
We propose
DiffPerf, which leverages the rapid development in network softwarization and enables agile and dynamic network bandwidth allocation at the AP vantage point. At a macroscopic level, DiffPerf offers access providers new capabilities for performance guarantees by dynamically allocating bandwidth to service classes, through which the trade-off between fairness and differentiation can be made. At a microscopic level, DiffPerf isolates and optimizes the flows affected by the interplay between several of the network's inherently coupled parameters, such as flow characteristics, buffer size, and congestion algorithm. We implemented two prototypes of DiffPerf: one in ODL with OpenFlow, and the other on the programmable Tofino switch. We evaluate DiffPerf from an application perspective, for MPEG-DASH video streaming. Our experimental results confirm DiffPerf's capabilities of QoE provisioning, fairness, and optimization.

APPENDIX A
PROOFS OF THEOREMS
Proof of Theorem 1: From optimization theory, our bandwidth allocation problem is a convex optimization problem. By the Karush-Kuhn-Tucker (KKT) conditions, it has a unique solution, which satisfies
$$w_s \left(\frac{X_s}{n_s}\right)^{-\alpha} - u + u_s = 0 \quad \text{and} \quad u_s X_s = 0, \ \forall s \in \mathcal{S}, \qquad u\left(\sum_{s \in \mathcal{S}} X_s - C\right) = 0,$$
where $u$ and $(u_s : s \in \mathcal{S})$ are KKT multipliers satisfying $u, u_s \ge 0$ for any $s \in \mathcal{S}$. By solving the above equations, we can derive that
$$X_s = \frac{n_s \sqrt[\alpha]{w_s}}{\sum_{s' \in \mathcal{S}} n_{s'} \sqrt[\alpha]{w_{s'}}}\, C, \quad \forall s \in \mathcal{S}.$$

Proof of Theorem 2: By the definition of the set $\mathcal{F}_s^L(\beta)$, we know that for any two thresholds $\beta_1 < \beta_2$, $\mathcal{F}_s^L(\beta_1) \subseteq \mathcal{F}_s^L(\beta_2)$. For any two flows $\bar{f}$ and $\tilde{f}$ satisfying $\bar{f} \in \mathcal{F}_s^L(\beta_1)$ and $\tilde{f} \in \mathcal{F}_s^L(\beta_2) \setminus \mathcal{F}_s^L(\beta_1)$, we have $x_{\bar{f}} < x_{\tilde{f}}$ because $z_{\bar{f}} < \beta_1 \le z_{\tilde{f}} < \beta_2$. Therefore,
$$\frac{\sum_{f \in \mathcal{F}_s^L(\beta_1)} x_f}{\left|\mathcal{F}_s^L(\beta_1)\right|} \le \frac{\sum_{f \in \mathcal{F}_s^L(\beta_2)} x_f}{\left|\mathcal{F}_s^L(\beta_2)\right|},$$
i.e., the average achieved throughput of the flows within the lower sub-class $\mathcal{F}_s^L$ is non-decreasing in $\beta$. By Eq. (5), when $\beta = 0$, we have
$$\frac{\sum_{f \in \mathcal{F}_s^L(\beta)} x_f}{\left|\mathcal{F}_s^L(\beta)\right|} = \frac{\sum_{f \in \mathcal{F}_s^-} x_f}{\left|\mathcal{F}_s^-\right|} \le \frac{X_s^L(\beta)}{\left|\mathcal{F}_s^L(\beta)\right|}$$
because $\mathcal{F}_s^L = \mathcal{F}_s^-$; in other words, the average achieved throughput of the flows within the lower sub-class $\mathcal{F}_s^L$ is no greater than the average per-flow capacity re-allocated to $\mathcal{F}_s^L$. Because $|\mathcal{F}_s^L|$ is non-decreasing in $\beta$, $\gamma n_s / (n_s - |\mathcal{F}_s^L|)$ is non-decreasing in $\beta$. By Eq. (5), the capacity allocated per flow of the upper sub-class satisfies
$$\frac{X_s^H}{\left|\mathcal{F}_s^H\right|} = \frac{X_s - X_s^L}{\left|\mathcal{F}_s^H\right|} = \frac{\gamma n_s}{n_s - \left|\mathcal{F}_s^L\right|}\left(\frac{X_s}{n_s} - \frac{\sum_{f \in \mathcal{F}_s^-} x_f}{\left|\mathcal{F}_s^-\right|}\right) + \gamma \frac{\sum_{f \in \mathcal{F}_s^-} x_f}{\left|\mathcal{F}_s^-\right|} + (1-\gamma)\frac{X_s}{n_s} \ge \frac{X_s}{n_s}.$$
Thus it is non-decreasing in $\beta$ and no lower than $X_s/n_s$.

REFERENCES
… Workshop on Buffer Sizing, 2019.
[5] "Netflix," https://ispspeedindex.netflix.com/country/us/, acc. May 2020.
[6] P. Faratin, D. D. Clark, S. Bauer, and W. Lehr, "Complexity of Internet interconnections: Tech., incentives and implications for policy," 2007.
[7] K. Bollen and R. Lennox, "Conventional wisdom on measurement: A structural equation perspective," Psychological Bulletin, 2018.
[8] R. Srikant, The Mathematics of Internet Congestion Control. Springer Science & Business Media, 2012.
[9] M. Hock, R. Bless, and M. Zitterbart, "Experimental evaluation of BBR congestion control," in Proc. IEEE ICNP, 2017.
[10] M. Mathis and A. McGregor, "Buffer sizing: a position paper," in Workshop on Buffer Sizing, 2019.
[11] J. Rüth, I. Kunze, and O. Hohlfeld, "An empirical view on content provider fairness," in Proc. TMA, 2019.
[12] X. Chen, H. Kim, J. M. Aman, W. Chang, M. Lee, and J. Rexford, "Measuring TCP RTT in the data plane," in Workshop on SPIN, 2020.
[13] "OpenDaylight Project," http://wiki.opendaylight.org, acc. Jul. 2019.
[14] M. Bjorklund et al. …
… IEEE Communications Magazine, 2015.
[19] "Network emulation with netem," https://man7.org/linux/man-pages/man8/tc-netem.8.html, acc. Aug. 2020.
[20] A. Mishra, X. Sun, A. Jain, S. Pande, R. Joshi, and B. Leong, "The great Internet TCP congestion control census," in Proc. ACM on MACS, 2019.
[21] S. Ha, I. Rhee, and L. Xu, "CUBIC: a new TCP-friendly high-speed TCP variant," ACM SIGOPS Operating Systems Review, 2008.
[22] N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson, "BBR: congestion-based congestion control," Queue, 2016.
[23] X. Yin, A. Jindal, V. Sekar, and B. Sinopoli, "A control-theoretic approach for dynamic adaptive video streaming over HTTP," in Proc. ACM SIGCOMM, 2015.
[24] P. K. Yadav, A. Shafiei, and W. T. Ooi, "QUETRA: a queuing theory approach to DASH rate adaptation," in Proc. ACM MM, 2017.
[25] D. X. Wei, C. Jin, S. H. Low, and S. Hegde, "FAST TCP: motivation, architecture, algorithms, performance," IEEE/ACM Trans. on Networking, 2006.
[26] L. Xu, K. Harfoush, and I. Rhee, "BIC for fast long-distance networks," in Proc. IEEE INFOCOM, 2004.
[27] C. Barakat, E. Altman, and W. Dabbous, "On TCP performance in a heterogeneous network: a survey," IEEE Communications Magazine, 2000.
[28] K. Tan, J. Song, Q. Zhang, and M. Sridharan, "A compound TCP approach for high-speed and long distance networks," in Proc. IEEE INFOCOM, 2006.
[29] Y. Li, D. Leith, and R. N. Shorten, "Experimental evaluation of TCP protocols for high-speed networks," IEEE/ACM Trans. on Networking, 2007.
[30] D. Miras, M. Bateman, and S. Bhatti, "Fairness of high-speed TCP stacks," in Proc. AINA, 2008.
[31] S. Ma, J. Jiang, W. Wang, and B. Li, "Towards RTT fairness of congestion-based congestion control," CoRR.
… Proc. IEEE INFOCOM …
… IEEE Trans. on NSM, 2012.
[36] M. Podlesny and S. Gorinsky, "RD network services: differentiation through performance incentives," in Proc. ACM SIGCOMM, 2008.
[37] V. Sivaraman, S. C. Madanapalli, H. Kumar, and H. H. Gharakheili, "OpenTD: open traffic differentiation in a post-neutral world," in Proc. ACM SOSR, 2019.
[38] B. Han, V. Sciancalepore, D. Feng, X. Costa-Perez, and H. D. Schotten, "A utility-driven multi-queue admission control solution for network slicing," in Proc. IEEE INFOCOM, 2019.
[39] Y.-W. E. Sung, C. Lund, M. Lyn, S. G. Rao, and S. Sen, "Modeling and understanding end-to-end class of service policies in operational networks," in Proc. ACM SIGCOMM, 2009.
[40] M. Zou, R. T. Ma, X. Wang, and Y. Xu, "On optimal service differentiation in congested network markets," IEEE/ACM Trans. on Networking, 2018.
[41] D. Katabi, M. Handley, and C. Rohrs, "Congestion control for high bandwidth-delay product networks," in Proc. ACM SIGCOMM, 2002.
[42] T. Hoeiland-Joergensen, P. McKenney, D. Taht, J. Gettys, and E. Dumazet, "The flow queue-CoDel packet scheduler and active queue management algorithm," IETF Draft, March 2016.
[43] J. C. Bennett and H. Zhang, "Hierarchical packet fair queueing algorithms," IEEE/ACM Trans. on Networking, 1997.
[44] N. K. Sharma, M. Liu, K. Atreya, and A. Krishnamurthy, "Approximating fair queueing on reconfigurable switches," in Proc. NSDI, 2018.
[45] M. Hedayati, K. Shen, M. L. Scott, and M. Marty, "Multi-queue fair queuing," in Proc. NSDI, 2019.
[46] A. Sivaraman, K. Winstein, S. Subramanian, and H. Balakrishnan, "No silver bullet: extending SDN to the data plane," in Proc. ACM Workshop on HotNets.
… Proc. ACM MM, 2016.
[49] P. Georgopoulos, Y. Elkhatib, M. Broadbent, M. Mu, and N. Race, "Towards network-wide QoE fairness using OpenFlow-assisted adaptive video streaming," in