Scalable Application- and User-aware Resource Allocation in Enterprise Networks Using End-Host Pacing

CHRISTIAN SIEBER, Chair of Communication Networks, Technical University of Munich, Germany
SUSANNA SCHWARZMANN, FG INET, TU Berlin, Germany
ANDREAS BLENK, Chair of Communication Networks, Technical University of Munich, Germany and Faculty of Computer Science, University of Vienna, Austria
THOMAS ZINNER, FG INET, TU Berlin, Germany
WOLFGANG KELLERER, Chair of Communication Networks, Technical University of Munich, Germany
Providing scalable user- and application-aware resource allocation for heterogeneous applications sharing an enterprise network is still an unresolved problem. The main challenges are: (i) How to define user- and application-aware shares of resources? (ii) How to determine an allocation of shares of network resources to applications? (iii) How to allocate the shares per application in heterogeneous networks at scale? In this paper we propose solutions to the three challenges and introduce a system design for enterprise deployment. Defining the necessary resource shares per application is hard, as the intended use case, the user's environment, e.g., a big or small display, and the user's preferences influence the resource demand. We tackle this challenge by associating application flows with utility functions derived from subjective user experience models, selected Key Performance Indicators, and measurements. The specific utility functions then enable a mapping of network resources, in terms of throughput and latency budget, to a common user-level utility scale. A sensible distribution of the resources is determined by formulating a multi-objective mixed integer linear program which solves the throughput- and delay-aware embedding of each utility function in the network for a max-min fairness criterion. Resource allocation in traditional networks with policing and scheduling cannot distinguish large numbers of classes and interacts badly with congestion control algorithms. We therefore propose a resource allocation system design for enterprise networks based on Software-Defined Networking principles, which achieves delay-constrained routing in the network and application pacing at the end-hosts.

The system design is evaluated against best effort networks in a proof-of-concept set-up for scenarios with an increasing number of parallel applications competing for the throughput of a constrained link. The competing applications belong to the five application classes web browsing, file download, remote terminal work, video streaming, and Voice-over-IP. The results show that the proposed methodology improves the minimum and total utility, minimizes packet loss and queuing delay at bottlenecks, establishes fairness in terms of utility between applications, and achieves predictable application performance at high link utilization.

CCS Concepts: • Networks → Network design principles; • Human-centered computing → Human computer interaction (HCI); • Theory of computation → Linear programming; • Applied computing → Intranets.

Additional Key Words and Phrases: SDN, QoE, QoS, pacing, enterprise, network, HAS, VoIP, browsing
Authors' addresses: Christian Sieber, [email protected], Chair of Communication Networks, Technical University of Munich, Germany; Susanna Schwarzmann, [email protected], FG INET, TU Berlin, Germany; Andreas Blenk, [email protected], Chair of Communication Networks, Technical University of Munich, Germany, and Faculty of Computer Science, University of Vienna, Austria; Thomas Zinner, [email protected], FG INET, TU Berlin, Germany; Wolfgang Kellerer, [email protected], Chair of Communication Networks, Technical University of Munich, Germany.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
© 2020 Copyright held by the owner/author(s).
ACM Reference Format:
Christian Sieber, Susanna Schwarzmann, Andreas Blenk, Thomas Zinner, and Wolfgang Kellerer. 2020. Scalable Application- and User-aware Resource Allocation in Enterprise Networks Using End-Host Pacing. ACM Trans. Model. Perform. Eval. Comput. Syst. 1, 1, Article 1 (January 2020), 41 pages. https://doi.org/10.1145/3381996
Increasing bandwidth demands by multimedia-rich applications and low delay requirements for real-time communications present a challenge for modern enterprise network designs. Despite a variety of demands, an enterprise network has to support the employees by providing a reliable infrastructure for the deployed network applications. Alongside the employees, the network resources are drained by automated processes such as backup transfers and by Internet of Things (IoT) devices such as surveillance cameras or sensors. A network design is required which allocates every application its share of the available network resources while at the same time minimizing the need for over-provisioning. There are three main challenging research questions for application- and user-aware resource allocation in enterprise networks:

I) Define: How to define an application-aware allocation of resources in terms of Quality of Experience (QoE) of the user, considering the variety of application classes and their demands?

II) Determine: How to determine shares of resources for each application under resource constraints, considering the definition of application-awareness derived in I)?

III) Allocate: How to allocate each application its share of the network resources in heterogeneous enterprise networks, where the availability of QoS mechanisms at each hop highly depends on the deployed switching hardware?

Today, there are commonly two high-level approaches to resource allocation in enterprise networks: best effort transport with sender-based congestion control, and Quality of Service (QoS) mechanisms on the forwarding devices. The first is neither stable nor fair in terms of goodput when applications compete for a link's bandwidth [31], nor aware of the specific application or the user behind it. This can lead to bad application quality and, as a consequence, to user dissatisfaction. The second option, QoS configuration at the forwarding devices, either discards or delays data packets of an application or application class in favor of another class or application. However, enforcing QoS on intermediate devices has several drawbacks. Buffer space and scheduling QoS options are limited on the devices. Discarding packets along the way from the sender to the receiver interacts badly with the sender's congestion control [14] and increases the network load due to the retransmission of discarded packets.

Moving the QoS enforcement from the intermediate devices to the end-hosts is a viable third option, as shown by data-center operators: By using a central controller, network monitoring, and programmable application pacing at the sender and receiver, a specific amount of the available throughput can be allocated to each application. Congestion in the network is then prevented by limiting the total sending rate of all applications [28]. At the end-hosts, applications, i.e., the primary contributors to the network load, can be restricted from generating more data than the network can carry. Furthermore, the limited QoS options, such as interface queues, can be reserved for high-profile use cases such as critical real-time traffic and separating managed from best-effort traffic.

In this paper, we apply this concept to enterprise networks and show that, indeed, a global control strategy with end-host pacing can significantly improve user experience. Next we define the problems in detail.
In plain best effort networks, resource allocation is implemented on the transport level at the endpoints, e.g., at web servers and browsers, via TCP congestion control. Congestion control works at the sender side by increasing or decreasing the sending rate based on observed packet loss and the Round-Trip Time (RTT). TCP's goal is to divide the available data-rate equally between active TCP connections. In the network, the data packets of a sending application, e.g., a web server, are treated equally by the forwarding devices. If the receiving rate at a forwarding device's interface exceeds the maximum physical sending rate, packets are queued in a buffer, or dropped if the buffer is full.

The main problems with plain best effort networks are: (1) Some applications, such as web browsers, behave unfairly by opening multiple parallel TCP connections and can therefore receive a larger fraction of the available throughput. (2) Datagram-based applications, such as Voice-over-IP (VoIP), often do not implement any congestion control at all. (3) The effectiveness of TCP congestion control depends on factors such as the specific congestion control algorithm, delay, packet loss, the relative start times of competing TCP flows, and how active a TCP connection is. (4) Different demands of applications are not considered, e.g., in terms of minimum throughput and maximum delay. Thus, there is no application-awareness in best effort networks.

Commonly, enterprises address the problems of best effort networks by implementing QoS mechanisms in the network. QoS mechanisms on the forwarding devices allow to prioritize some packets over others based on matching rules. For example, priority queuing allows to put VoIP packets, based on the Type of Service (ToS) flag, VLAN tag, or specific UDP ports, into a queue with preferred treatment. That way, the delay and packet loss of VoIP calls are kept low and isolated from other traffic. Flow- or class-based Weighted Fair Queueing (WFQ) allows to put individual application flows or whole application classes into separate queues with guaranteed minimum bandwidth. Token bucket (TB) policing allows to limit the data-rate of individual flows or classes without the need for switch buffer space. For example, mobile service providers are known to use TB policing to limit the data-rate of video streaming services [14].

But implementing QoS in the network is costly and inefficient: (1) Buffers in forwarding devices are expensive and there is only a limited number of queues to configure per egress interface, typically about 8 (switch buffer sizes are surveyed by Jim Warner, https://people.ucsc.edu/~warner/buffer.html, last accessed: 11.10.2018). This is insufficient for implementing a sophisticated strategy to distinguish hundreds of active applications of multiple classes in a network. (2) Policing interacts badly with transport-level congestion avoidance algorithms, resulting in lost packets. Lost packets cause retransmissions and decrease transmission efficiency [14]. (3) Heterogeneous enterprise networks with diverse forwarding devices from different vendors are complex and error-prone to manage, hampering the enforcement of end-to-end QoS options. Furthermore, there are no common QoS abstractions across switching hardware vendors. Hence, deploying a single QoS strategy across devices might not be possible, especially if not all devices support the required features. (4) Encryption or header field ambiguity can prevent the correct identification of application classes in the network. Hence, with limited or incompatible QoS mechanisms and the issues regarding identification of application flows, a scalable and application-aware network design is hard to implement in the network (see also Section 2).

We realize the resource allocation by implementing centrally-controlled pacing of individual applications at the end-hosts. Packet pacing at the end-hosts ensures that a stream of packets conforms to a specified data-rate by adding artificial delays between consecutive packets during the sending process. Pacing prevents packet loss by smoothing out
packet bursts and allows for shallow buffers in the intermediate forwarding nodes. Shallow buffers reduce queuing delays and avoid expensive switch buffer space. Applications can reliably determine their available goodput, and it is unnecessary to probe the throughput via loss-based congestion control mechanisms. Furthermore, pacing at the end-host allows for the implementation of effective backpressure to the applications producing the data, reducing the amount of buffered data in the network stack. Pacing at the end-hosts can scale to thousands of traffic classes [37], congestion in the network can be avoided by a central management of the available resources [28], and application flows can be identified at the source. Recent work shows that bandwidth allocation to applications can be implemented hierarchically at global scale, enabling high percentages of link utilization [28]. Sender congestion control and QoS in the network are both downgraded to failsafe solutions and supportive roles in the overall QoS strategy, e.g., for cases where the central control fails or embedded devices cannot be modified.

Ultimately, a user of an application does not care what share of the resources is allocated to her/him as long as her/his user experience, or Quality of Experience (QoE), with the application is positive. For that reason, challenges I) and II), i.e., how to define and determine sensible allocations, are tackled based on the resulting user experience. We define the user experience as a per-application utility function of throughput and delay. The utility function is derived from user experience models from the literature and selected application Key Performance Indicators (KPIs). By jointly optimizing the utility and network resource usage, a fair share in terms of utility can be determined given a set of applications, utility functions, and constrained network resources. Challenge III) is the scalable allocation of the calculated application shares. We propose centrally-controlled application pacing at the end-hosts combined with per-application flow routing. Routing is solved implicitly by our problem formulation, which selects paths for application flows that satisfy capacity and delay requirements. Routing per flow can then be implemented through Software-Defined Networking (SDN) for all applications in the network. The identifiers of application flows in the network, e.g., source and destination TCP/UDP ports, are provided to the central controller by software agents at the end-hosts. Applications can then be subjected to routing and pacing as dictated by the network controller.

Fig. 1. Overview of the challenges and the proposed solution towards scalable application-aware resource allocation in enterprise networks. Based on the set of active applications, the network topology and resources, and the application utility functions, the allocation problem solver maximizes the minimum and the total utility over all active applications. As a result, delay-constrained flow routing and application pacing rates are implemented by a network controller in the network and by agents on the end-hosts. The inputs and outputs correspond to the three challenges: defining application- and user-awareness (I), determining sensible shares of resources (II), and allocating the shares (III).

Figure 1 summarizes the general methodology of the proposed solution. First, the active applications in the network are determined by end-host agents and network monitoring. Second, the network topology and available resources are known by the network controller.
Third, a suitable utility function based on subjective QoE models, application KPIs, and measurements is associated with each application. An allocation problem solver then determines the per-application routing and application pacing rates based on a fairness criterion. Routing rules are then implemented by the network controller on the forwarding devices, and pacing rates are enforced at the end-hosts.

The system is implemented as a proof-of-concept set-up with support for the following five application classes: web browsing, batch file transfer, VoIP, adaptive video streaming, and remote administration. In the set-up, we evaluate static scenarios where a fixed number of parallel clients with multiple applications have to use a resource-constrained link to communicate with central services, as is the case in SD-WAN or remote building scenarios. The results show that central pacing can provide dependable application performance and increases inter-application fairness at high link utilization. The contributions of this paper are as follows:

(1) We present a system design for scalable user-aware resource allocation in enterprise networks based on SDN principles and end-host pacing (Section 3). The design does not make assumptions about the availability of QoS mechanisms such as WFQ on the forwarding devices.

(2) We define throughput- and delay-dependent utility functions for five application classes. Furthermore, we discuss deployment options and trade-offs regarding the creation and accuracy of the utility functions (Section 4). Compared to other works, the utility functions are based on actual subjective studies and thus tied to the experience of the user instead of technical KPIs. Furthermore, measurements of the applications' behavior under limited available resources are used to determine the relationship between resources and user experience.

(3) We formulate the utility throughput- and delay-aware allocation problem as a 2-step Mixed Integer Linear Program (MILP) with a max-min fairness criterion. The first step maximizes the minimum utility in the network (max-min fairness), while the second step maximizes the sum of all utilities for a constrained minimum utility (Section 5 and in detail in Appendix A; a toy version of the 2-step idea is sketched below). While the max-min utility proportional fair bandwidth allocation problem is well studied in the literature, the combination of bandwidth allocation and delay-aware routing for arbitrary utility functions has not been formulated so far. Note that in this paper we provide an optimal algorithm for the allocation problem, but with limited scalability.

(4) We evaluate application mixes with over 100 parallel applications of 5 common use cases in a proof-of-concept set-up. The results show how pacing can improve delay and packet loss at bottlenecks and can significantly increase inter-application fairness in terms of utility. Furthermore, pacing leads to predictable application performance even at high levels of network utilization (Section 7).

(5) We provide all material to the paper, such as the automated applications, a virtual experimentation set-up, and the optimization formulation, as open source software (https://github.com/tum-lkn/appaware - supplemental material to this article).

The paper is structured as follows. Section 2 introduces the background and related work. Section 3 presents the proposed system architecture. Afterwards we define the shares (Section 4), determine shares under resource constraints (Section 5), discuss the allocation of the shares in an experimental set-up (Section 6), and evaluate the effectiveness of the proposed approach in the set-up (Section 7). Section 8 summarizes the results, discusses future research directions, and concludes this paper.
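To make the 2-step idea from contribution (3) concrete, the following toy sketch solves a single-link instance with the pulp LP library. The application names, utility slopes, and capacity are made-up illustration values; the actual MILP in Section 5 and Appendix A uses piecewise-linear utility functions and additionally embeds delay-constrained routing.

```python
import pulp

# Toy instance: three applications share one 10 Mbps link. Utilities are
# linear in the allocated rate here; the slopes are made-up values.
slopes = {"web": 0.8, "voip": 4.0, "download": 0.4}   # utility gained per Mbps
capacity = 10.0                                        # Mbps

rate = {a: pulp.LpVariable(f"rate_{a}", lowBound=0) for a in slopes}
util = {a: pulp.LpVariable(f"util_{a}", lowBound=1, upBound=5) for a in slopes}
z = pulp.LpVariable("min_util", lowBound=1, upBound=5)

def add_base_constraints(prob):
    for a, s in slopes.items():
        prob += util[a] <= 1 + s * rate[a]         # toy utility curve
    prob += pulp.lpSum(rate.values()) <= capacity  # shared link capacity

# Step 1: maximize the minimum utility over all applications (max-min fairness).
step1 = pulp.LpProblem("max_min_utility", pulp.LpMaximize)
step1 += z
add_base_constraints(step1)
for a in slopes:
    step1 += z <= util[a]
step1.solve(pulp.PULP_CBC_CMD(msg=False))
z_opt = z.value()

# Step 2: maximize the total utility without degrading the minimum utility.
step2 = pulp.LpProblem("max_total_utility", pulp.LpMaximize)
step2 += pulp.lpSum(util.values())
add_base_constraints(step2)
for a in slopes:
    step2 += util[a] >= z_opt
step2.solve(pulp.PULP_CBC_CMD(msg=False))

for a in slopes:
    print(f"{a}: {rate[a].value():.2f} Mbps -> utility {util[a].value():.2f}")
```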
This section introduces fundamental network QoS control techniques and, from this, motivates the usage of pacing. Besides the technical basics, we describe its benefits and implementations, and present some works targeting pacing of individual TCP flows. Finally, we summarize related works on multi-application QoE management. We start by defining the term enterprise network in the context of this paper.
Enterprise networks are not bound to the same net neutrality laws which govern most parts of the public Internet, and access to an enterprise network can be limited to approved devices. The network operator is in full control of the applications deployed in the network and on the end-hosts. This is due to security concerns (e.g., malware, leakage of sensitive documents/data) and the need for performance guarantees for mission-critical applications. This means that end-hosts are restricted to a small set of applications, depending on the role of the employee, and that the communication of each application can be monitored. HTTP(S) traffic passes through a proxy to perform Deep Packet Inspection (DPI) to identify sensitive documents being uploaded to an external website or malware being accidentally downloaded. The scale can range from small businesses housed in one building to global enterprises with multiple remote campuses connected to one or multiple central offices and millions of end-hosts. In order to adjust the available throughput according to the utility allocation, it is crucial to know or approximate the available throughput and to monitor the link utilization and packet loss. For delay-constrained routing per application flow and load balancing, Software-Defined Networking with fine-grained flow control is necessary. If SDN control is not available, allocation can still be done based on the available forwarding graph, e.g., based on shortest-path routing. QoS control mechanisms on the network nodes are not required, but can be used to support the overall QoS strategy. For example, two VLANs in combination with two queues can be used to isolate managed from best effort traffic using hierarchical token bucket (HTB) scheduling.
On a basic level, QoS enforcement relies on two options of treating packets in the network: they can either be dropped or enqueued. Mechanisms that decide how packets are treated form the fundamentals of QoS control techniques; e.g., flow prioritization and rate allocation with weighted fair queuing are widely applied in today's communication networks [32]. Table 1 summarizes and classifies the most relevant techniques and gives state-of-the-art examples. In the following, we shortly describe the listed mechanisms.

Active queue management (AQM) is applied within queues of network elements and describes the intelligent drop of network packets to control the queue length [35]. Excessively buffering packets causes bufferbloat and leads to increased delays. Random early detection (RED) [16] is one of the well-known and widely applied mechanisms for AQM. Conventional tail-drop mechanisms discard all incoming packets when the queue is full. RED drops incoming packets with a certain probability that increases with increasing queue length. To realize this, RED applies two thresholds: If the queue is (almost) empty, the probability to drop a packet is set to zero. If the queue is (almost) filled, all packets are definitely dropped. In between these two thresholds, the dropping probability increases linearly. That way, RED proactively prevents bufferbloat and reduces the bias of discarded packets against bursty traffic. Controlled Delay (CoDel) [34] keeps the queuing delay of packets below a certain threshold. Packets are marked with the current timestamp as they enter the buffer. When dequeuing a packet, the CoDel algorithm computes the time it spent in the buffer. When the maximum delay, by default 5 ms, is exceeded for a certain amount of time, subsequent packets are dropped at the head of the queue. In contrast to CoDel and RED, Explicit Congestion Notification (ECN) [15] does not proactively discard packets. Instead, it marks packets in case of impending congestion to inform the receiver, which in turn signals the impending congestion to the sender. As the ECN-aware endpoints adapt the sending rate accordingly, ECN performs the queue length control and bufferbloat prevention in an indirect manner.

Rate limiting mechanisms manage queues or flows to achieve a target traffic rate. One rate limiting example is policing, which controls the rate of a flow by dropping network packets. This is realized by applying token bucket or leaky bucket algorithms. Tokens are created according to the target rate. If not enough tokens are available, packets to be sent are dropped. In contrast to policing, where packets are dropped in case no tokens are available, shaping enqueues packets and allows them to wait for a token to be created.

Scheduling algorithms decide how packets are dequeued when several queues are active. Hence, they operate between queues. First, incoming packets are classified based on pre-defined QoS policies and are accordingly inserted into one of the queues. The scheduling algorithm then decides about the order and frequency with which packets can be released from the different queues. Allowing certain queues to transmit packets more often than others enables QoS enforcement in the sense of allocating higher bandwidth shares, i.e., different priorities, to different queues. Such mechanisms that govern how packets are queued and de-queued are often referred to as queueing disciplines (qdiscs). They can further be categorized as handling packets in either a classful or a classless manner. We omit the differentiation in Table 1, but shortly emphasize the difference in the following. Classless queueing disciplines are well suited for basic traffic management and come with decreased configuration overhead compared to classful queueing disciplines. Classful qdiscs allow a more differentiated treatment of different kinds of traffic at the cost of increased configuration efforts, like the definition of appropriate filters and classes. From the examples above, Class-Based Queueing (CBQ) [17], Hierarchical Token Bucket (HTB), and Weighted Round Robin (WRR) fall within the classful qdiscs, while Round Robin (RR) is an example of a classless qdisc.

The paradigm of smart queue management combines active queue management and scheduling. Weighted Random Early Detection (WRED) [44] allows to apply several thresholds for dropping packets in one queue. For example, while packets of one QoS class are dropped if the buffer is half filled, the packets belonging to another QoS class are only dropped if the buffer is completely filled. Furthermore, WRED supports applying several queues with different buffer lengths. On the one hand, this allows to additionally influence the packet dropping probability for different QoS classes. On the other hand, scheduling between the queues enables realizing further QoS policies, like packet prioritization. Flow Queue CoDel (fq_codel, RFC 8290) is an extension of CoDel. It uses multiple queues, whereby each of the queues employs CoDel. A scheduler decides, based on a modified Deficit Round Robin algorithm, from which queue a packet should be dequeued. fq_codel allows to enforce QoS policies by classifying the packets and allocating them accordingly to queues.

Although many QoS enforcement mechanisms exist and are applied in today's networks, they cannot be straightforwardly applied in our case. Some of the techniques listed in the table are not powerful enough. ECN, for example, is capable of influencing the sender's rate to prevent packet loss, but does not allow to set a specific rate. The major drawback when it comes to applying those mechanisms is the limited number of configurable queues in network
elements such as switches and routers. As a consequence, QoS can only be enforced on aggregated flows and QoS classes. Hence, the limited scalability hinders fine-granular QoS control. Shifting the QoS enforcement from network nodes to the end hosts constitutes a scalable method that allows for fine-grained QoS control. For that reason, we propose to apply TCP pacing to enforce traffic rates on a per-application basis.

Table 1. Overview of traffic QoS control/allocation techniques applied in communication networks

| Location | Technique | Action | Example | Description |
|---|---|---|---|---|
| Within queues | Active queue management: manage the queue length | Drop | RED | Drops packets based on statistical probabilities instead of conventional tail drop. Prevents high delays resulting from full buffers. |
| | | Drop | CoDel | Reduces packet transmission delays by preventing large and constantly full buffers. |
| | | Notify | ECN | Notification about network congestion without dropping packets. |
| | Rate limiting: achieve target traffic rate | Drop | Policing | Tokens are created at a rate corresponding to the target traffic rate. If no tokens are available, incoming packets are dropped. |
| | | Enqueue | Shaping | Tokens are created at a rate corresponding to the target traffic rate. If no tokens are available, incoming packets are enqueued. |
| Between queues | Scheduling: allocate resources to queues | Enqueue | RR | Round Robin lets every active data flow take turns in transferring packets on a shared channel in a periodically repeated order. |
| | | Enqueue | CBQ | Divides user traffic into a hierarchy of classes and performs class-based queueing so as to allocate bandwidth to traffic classes. |
| | | Enqueue | WRR | Allows to differentiate QoS classes by allowing certain queues to put more packets on the wire. |
| | | Enqueue | HTB | Hierarchical token bucket allows for setting bandwidth thresholds for different flow classes. |
| Hybrid: within and between queues | Smart queue management: QoS-aware queue management | Drop, Enqueue | WRED | Supports several queues that vary in buffer size and allows several thresholds per queue. Packets of higher-prioritized flows are less likely to be dropped. |
| | | Drop, Enqueue | FQ-CoDel | Flow queue CoDel (fq_codel) extends CoDel by applying several queues. Allows for differentiating QoS classes. |
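To make the policing/shaping distinction from Table 1 concrete, below is a minimal token-bucket sketch. It is an illustration of the Table 1 semantics only, not an implementation of a kernel qdisc; the class name and the burst parameter are our own.

```python
import time

class TokenBucket:
    """Illustrative token bucket (cf. Table 1). Tokens accrue at the target
    rate; sending a packet consumes tokens equal to its size in bytes."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes      # maximum burst (bucket depth)
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def police(self, pkt_len):
        """Policing: forward the packet only if enough tokens are available,
        otherwise drop it (no queuing, no added delay)."""
        self._refill()
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len
            return True   # forward
        return False      # drop

    def shape(self, pkt_len):
        """Shaping: instead of dropping, wait (i.e., keep the packet queued)
        until enough tokens have accrued, which inflates the RTT."""
        self._refill()
        if self.tokens < pkt_len:
            time.sleep((pkt_len - self.tokens) / self.rate)
            self._refill()
        self.tokens -= pkt_len
```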
The term pacing is used in different contexts, and it is important to distinguish where it is applied and who is dictating the pacing rate. There can be pacing per interface, per application, and per network socket, or a combination of all three. The pacing rate can be set autonomously, e.g., as in TCP pacing, or by an external entity, e.g., by a central network controller. The term TCP pacing is an example where the rate is set autonomously and refers to the technique where the packets of one TCP transmission window are spread out over the measured RTT [2]. The target pacing rate is determined by the congestion control algorithm based on observed packet loss or delay. In this work, when we use the term pacing, we apply pacing per application flow, and the rate is set by the central network controller. One application flow can include one or multiple stream-based (TCP) or datagram-based (UDP) transmissions which share the same source, destination, and network path. Hence, all packets sent by the sockets of an application flow have to share the allocated pacing rate. In the following, we shortly introduce the pacing implementation of the Linux kernel. Afterwards, we highlight the advantages of this technique compared to other rate limiting approaches, i.e., policing and shaping. Finally, other works relying on pacing are summarized.
Pacing follows the approach of placing gaps between outgoing packets so as to evenly space data transmissions [2, 8]. In the Linux pacing implementation, the departure time of the next packet, time_next_packet, is determined by the current time now, the size of the current packet pkt_len, and the target pacing rate target_rate:

    time_next_packet = now + pkt_len / target_rate

For details on the technical fundamentals and the way pacing is applied in this work, please refer to Section 6.2.
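As an illustration of this computation, the following user-space sketch spaces the packets of a flow according to a target rate. It mirrors the formula above; the class and parameter names are our own, and the actual enforcement in our set-up happens in the kernel's fq queuing discipline (Section 6.2).

```python
import time

class PacedFlow:
    """User-space sketch of the departure-time rule above (illustrative)."""

    def __init__(self, target_rate_bps):
        self.target_rate = target_rate_bps / 8.0  # bytes per second
        self.time_next_packet = 0.0               # earliest departure time

    def send(self, sock, packet: bytes):
        now = time.monotonic()
        if self.time_next_packet > now:
            # Delay the packet until its scheduled departure time.
            time.sleep(self.time_next_packet - now)
            now = self.time_next_packet
        sock.send(packet)
        # time_next_packet = now + pkt_len / target_rate
        self.time_next_packet = now + len(packet) / self.target_rate
```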
Google is currently putting much effort into developing efficient, rate-compliant, and scalable traffic control mechanisms, mainly for deployment in data centers, and implements pacing in many of its recent approaches. With TIMELY, they propose an RTT-based congestion control [33]. Their congestion-based congestion control (BBR) [6] is implemented in the Linux kernel and used by all Google and YouTube server connections. With Carousel [37] they present a scalable traffic shaping mechanism that controls packet release times, where the target rate can be set by external entities per traffic class. As we do not have requirements on scalability as strict as those of Carousel, we apply a custom version of the Linux fq implementation (Section 6.2).

Pacing eliminates several drawbacks of other strategies for traffic rate control. While policing drops packets exceeding the target rate and shaping enqueues those packets, pacing follows the approach of delaying packets so as to reach a certain rate. On the one hand, this eliminates the problem of increased overall network load resulting from retransmitting dropped packets of policed flows. On the other hand, there is no RTT inflation, as with shaping. Policing interacts poorly with TCP; as a result, policed flows suffer from low throughput even at low packet loss rates [14]. In contrast, pacing can increase the link utilization in shared environments. Delaying the outgoing packets at the sender in a controlled manner reduces burstiness, which implies less packet loss and results in fewer triggers of TCP's congestion control. Furthermore, configuring target rates at end hosts brings the advantage of scalability compared to other techniques. By shifting the QoS control to the involved end-hosts, pacing facilitates fine-grained control on flow- and application-level.

However, studies have also shown that this only applies to some cases, while even with all-paced TCP flows the performance is worsened in many cases [2, 20]. According to the authors of these studies, this can be attributed to the fact that TCP pacing delays the congestion signal and that pacing results in synchronized packet drops. A pacing system that shapes traffic under consideration of the buffer queue is proposed in [5]. The authors introduce Queue Length Based Pacing (QLBP) to shape the traffic at access networks, so as to smooth the traffic before it enters the core network. This is especially interesting for small-buffer networks, where packet loss is more likely to occur. The reduced packet loss, as a consequence of the decreased burstiness, results in a nearly full link utilization when using the proposed solution. The QLBP algorithm is also applied in [4] to study the impact of pacing on different network traffic conditions. The authors conclude that pacing is especially beneficial in networks with small buffers, where packet loss, as a result of bursty traffic, can significantly reduce network performance. They furthermore show that pacing can have a small negative impact on short-lived flows if the parameters are not set appropriately. Finally, it is shown that the fairness achieved by pacing only slightly differs from the fairness achieved by TCP. The performance of host traffic pacing and edge traffic pacing, i.e., pacing the traffic before it enters the core network, is compared for small-buffer networks in [19]. The results indicate for most of the evaluated scenarios that edge pacing performs at least as well as host pacing in terms of link utilization. Edge pacing also has practical benefits, as it does not require an adaptation of the involved clients. A critical analysis of pacing is performed in [43]. The authors evaluate the impact of pacing for several TCP implementations and scenarios. They conclude that the benefits of pacing depend on the used TCP implementation and on the performance metrics that are relevant for a specific application. However, due to the tendency towards high speed protocols, they predict an increasing motivation to use pacing in the future. Furthermore, they showed that in some cases pacing was capable of improving the performance of both paced and un-paced flows. As a drawback, the work highlights the unfairness among paced and non-paced flows in terms of bandwidth, as paced flows do not receive their fair share when competing with non-paced flows.

We summarize the approaches that rely on TCP pacing in Table 2. It shows that pacing is applied on per-interface, per-flow, and per-task levels, but so far not on a per-application level. Although these approaches might provide fairness on a per-flow level, they do not provide fairness between applications (10 connections opened by a web browser vs. 1 connection for a file download), and it remains unclear how this could be applied to UDP-based applications where there is no operating system support for TCP-style probing of the available throughput. This work aims at closing the gap of considering pacing from an application-centric perspective, i.e., to evaluate its feasibility for application-aware network management. We investigate the conformance of actual rates and delays to the target values, which dictates the degree of granularity to which QoE can be controlled. As we will find that pacing constitutes a feasible method to do so, we present a proof-of-concept architecture for optimizing QoE fairness in a multi-application environment.

Table 2. Summary of state-of-the-art approaches that make use of TCP pacing, along with their scope and the entity deciding about the pacing rate.

| Technique | Category | Scope | Rate set by |
|---|---|---|---|
| BBR [6], TIMELY [33] | TCP pacing | One TCP socket | TCP congestion control |
| Carousel [37] | Efficient and scalable pacing implementation | Flexible, based on traffic classes, evaluated per flow | External controller or TCP congestion control |
| FQ (https://lwn.net/Articles/564825/) | Linux kernel pacing implementation | Per flow | Primarily congestion control algorithm, can be manually overwritten |
| BwE [28] | Hierarchical bandwidth allocation | From global to per computing task | External controller |
| QLBP [4, 5] | Edge pacing | Single queue per interface | Adaptive based on queue length |
| Our work | End-user application pacing | Per subset of an application's sockets | External controller |
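Relating Table 2 to our setting: on Linux, an end-host agent could apply a controller-assigned rate to the sockets of an application via the SO_MAX_PACING_RATE socket option, which the fq qdisc honors. A minimal sketch follows; the rate value is arbitrary, and since Python's socket module does not export the constant on all versions, we fall back to its Linux value.

```python
import socket

# SO_MAX_PACING_RATE caps a socket's pacing rate (in bytes per second);
# the constant is 47 in <asm-generic/socket.h>.
SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, 2_000_000)  # 2 MB/s cap
```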
Table 3. Overview of related works targeting multi-application QoE-awareness and their classification in terms of utility function, determination of QoE-aware resource shares, and allocation of determined resources. 0 denotes that no utility functions are applied at all, + denotes that utility functions are applied mapping either AQoS or AQoS plus NQoS to QoE, ++ represents utility functions solely relying on configurable network resources, e.g., bandwidth or PSNR in radio access networks.

| Source | Classification | Utility function | Determine | Allocate |
|---|---|---|---|---|
| [21] | + | Mapping AQoS to QoE | Not specified | Generic control concepts |
| [40] | ++ | Mapping NQoS to QoE | Particle swarm optimization (PSO) based algorithm | Applying the proposed algorithm to a resource block allocation technique in LTE |
| [36] | ++ | Mapping NQoS to QoE | Game theoretic approach | Radio resource management applying the proposed game theoretic approach |
| [30] | + | Mapping NQoS and AQoS to QoE | Optimization based on the multi-choice knapsack problem (MCKP) | Carrier scheduling applying the proposed optimization algorithm |
| [11] | ++ | Mapping of network bandwidth to QoE | Solving a multi-objective optimization problem | Joint subcarrier and power allocation scheme |
| [12, 13] | 0 | Application feedback instead of utility functions | Not specified | Admission control, bandwidth guarantees |
| [38] | 0 | Hypothetical utility functions mapping NQoS to QoE | Proposed algorithm optimizing bandwidth allocation | WFQ scheduling with QoE-optimized weights |
| [18] | ++ | Mapping screen resolution and bitrate to SSIM | Branch and bound algorithm to find the optimal set of video bitrates | Video bitrate guidance for heterogeneous clients |
| [28] | + | Mapping bandwidth to an arbitrary fair share | Novel Multi-Path Fair Allocation (MFAA) algorithm | Enforced via pacing at the hosts |
Several efforts have been made towards QoE-awareness in multi-application scenarios. Some relevant approaches are summarized in Table 3. The first column denotes the investigated approaches. The remaining table columns represent the three challenges introduced beforehand: define, determine, and allocate. However, we found that none of the reviewed works explicitly defines the required resources to obtain a certain Mean Opinion Score (MOS) (ITU-T Recommendation P.800: Methods for objective and subjective assessment of quality), but they all utilize in some form utility functions that map application QoS (AQoS) and/or network QoS (NQoS) to express QoE. The MOS scale describes the experience of a user with the application on a scale of one to five, where the scale is labeled with {Bad, Poor, Fair, Good, Excellent}. For that reason, we replaced the define-step in the table by a classification and a short description of the applied utility function. In the following, when reporting on related work, we focus on how and which utility functions have been applied, how the appropriate resource shares are determined, and the applied methods to allocate the resources.

BwE [28] introduces a global hierarchical top-down bandwidth allocation scheme used in Google's internal network for distributed computing tasks. Bandwidth allocation is done via a function that maps bandwidth to a "relative priority on an arbitrary, dimensionless measure of available fair share capacity". The BwE reference is important as it shows that global and large-scale bandwidth allocation is indeed possible in production environments. But how to derive an allocation for end-user applications and how they benefit from it is not discussed in BwE. In contrast, the paper at hand focuses on end-user applications and the interplay with, and possibilities for, network control to guarantee a specific user experience to the end users.
[…] allocates respective weights to the flows. Simulation results show that the minimum utility can be increased significantly, while maintaining the same average utility in most of the cases, compared to a conventional max-min fairness approach.

[18] presents an SDN-based framework to support a fair video QoE for all clients within a shared network. The utility function maps a client's device resolution and bitrate to structural similarity (SSIM) [42]. Considering the current network capacity, a controller decides about the bitrate for each video client, so as to provide a similar quality to each of them. The bitrates are communicated to the streaming clients, which in turn request the respective quality level from the video content server.

The presented strategies are all steps towards QoE-awareness in multi-application systems. Some of the works rely on state-of-the-art control mechanisms, but propose novel resource scheduling or allocation techniques. However, the applied utility functions often depend on features which cannot be influenced in a direct manner. As a result, those approaches allow only for a qualitative, less targeted QoE control. For example, a low video quality implies a low MOS value. Providing more bandwidth will enhance the playback quality and increase the MOS, but it is not possible to quantify the impact of providing a certain amount of bandwidth on the MOS scale.

We present an approach that allows to quantitatively map the NQoS parameters bandwidth and delay to MOS. Furthermore, we propose to apply network pacing, which allows us to control both the bandwidth allocated to a flow and the end-to-end delay. Having utility functions which only rely on controllable parameters allows for a targeted, fine-grained QoE optimization.
The background on multi-application QoE architecture designs shows that previous proposals cannot combine accurate identification of application- and user-aware resource demands with scalable resource allocation. In the following we propose a new design considering the following aspects: a) awareness of the active applications in the network and their demands, b) per-application resource allocation, and c) per-application forwarding for delay- and capacity-constrained routing. Our design relies on a centralized control, whose major drawback is the introduction of a single point of failure (SPOF). Although not addressed in detail in our work, we would like to emphasize that several approaches exist for physically distributed, but logically centralized SDN control planes to overcome the SPOF problem. Proposed solutions rely either on hierarchical [22] or flat organizations [41] and enable scaling the load among several controllers.

Figure 2 illustrates the system design. A logically centralized network controller exerts control over the forwarding devices, the data plane, by applying network control through SDN protocols. SDN protocols, such as OpenFlow, enable per-flow routing by pushing simple match-action rules to SDN-enabled devices. Resources are allocated in the network by pacing the data-rate sent by traffic sources into the network. Pacing is applied at the edges of the network, i.e., at end-hosts such as clients and servers, or at gateways. Software agents on the end-hosts allow the network controller, on the one side, to know which applications access the network, and on the other side, to apply pacing to the applications at the host's networking stack. Applying pacing at the end-hosts' networking stacks can support tens of thousands of individual flows with low additional resource consumption for the host [37]. All conversations in the network are subject to the pacing set by the network controller. If a conversation cannot be paced at its networking stack, pacing can be applied, for example, at the first hop in the network. Delay requirements are fulfilled by selection of appropriate links and target link utilizations. We describe how pacing is implemented in our set-up as part of the experiment design in Section 6.2.
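As a sketch of the interplay between end-host agents and the network controller, the following shows hypothetical message formats for reporting an application flow and receiving a pacing configuration. The design does not prescribe a wire format, so every field name and value below is an illustrative assumption.

```python
import json

flow_report = {                       # end-host agent -> network controller
    "host": "client-17",
    "app_class": "video_streaming",
    "intent": "bbb_live",
    "flow": {"proto": "tcp", "src_port": 51834,
             "dst_ip": "10.0.2.5", "dst_port": 443},
}

pacing_config = {                     # network controller -> end-host agent
    "flow": flow_report["flow"],
    "pacing_rate_bps": 3_172_000,     # allocated share for this application flow
    "path_id": 2,                     # route installed via SDN match-action rules
}

def apply_pacing(config):
    # Agent-side stub: a real agent would install the rate at the host
    # networking stack, e.g., via an fq filter on the flow's sockets.
    print("pacing", json.dumps(config["flow"]), "at",
          config["pacing_rate_bps"], "bps")

apply_pacing(pacing_config)
```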
Fig. 2. Overall system design. A logically centralized network controller provides per-application delay-constrained routing and resource allocation. Applications are identified by software agents at the end-hosts, and resources are allocated by restricting the total sending rate of applications at the hosts' networking stacks and at the network edges. Per-application routing is implemented by using SDN protocols to push individual forwarding rules to the network devices. Per-application delay requirements are fulfilled by a careful selection of the flow path and target link utilizations.

Some application flows transported on the network, such as system monitoring or building surveillance, have predictable traffic patterns, and determining a suitable data-rate to allocate is straightforward. Furthermore, periodic background jobs, like backup data transfers, can be scheduled based on the approximate amount of data and the deadline for completion. However, for user-facing applications, the variety of demands is higher. Determining the appropriate pacing data-rate for such applications is challenging. It is insufficient to consider only the class of an application, e.g., web browser; one must also consider for what purpose the application is used. For example, modern web browsers are an execution environment for a variety of business applications, from employee and financial management to video streaming (DASH, HTMLMediaElement) and video conferencing (WebRTC). Hence, we distinguish between application classes and application intents. An intent can be specific, such as a video stream of a surveillance camera with specific encoder settings, or broad, such as general web browsing. A running application can also participate in multiple conversations with different intents and conversation endpoints. Hence, the resource demands of a conversation are defined by the tuple (class, intent).

Identifying the intent of an application accurately enables the specification of precise application demands and the selection of suitable user experience models. Both are essential to implement predictable application performance and to improve accuracy in terms of QoE for the user. We argue that in an enterprise deployment, a holistic identification of all classes and intents is infeasible. Therefore, we propose a hierarchy of intents as illustrated by Figure 3. The figure shows a possible enterprise intent hierarchy by example. At the root of the hierarchy, there is a default intent which offers basic guarantees in terms of throughput and delay to unidentified applications. The root intent is followed by the application classes such as video streaming and remote terminal work.

Intents can be specified with an arbitrary hierarchy depth. If an application's intent cannot be identified accurately, a higher-level intent can be selected. However, this comes at the cost that the allocated resources do not fit the targeted application performance. For example, the hierarchy in the figure specifies the two common voice codecs G.729 and G.711 as sub-intents for desktop VoIP phones. If the codec is known, e.g., based on the MAC/IP address of the phone or from a database, the demand and user experience model are well-defined. If the conversation from the phone can
only be identified as coming from a desktop phone, one may define the highest known demand over all codecs. How to create such a deep hierarchy is out of scope of this paper. We restrict our hierarchy to the five classes and six intents as highlighted in bold in the figure. One can imagine that a combination of user feedback and network/application monitoring, combined via machine learning and some manual work, results in an accurate representation of the enterprise environment.

Fig. 3. Illustration of a possible hierarchy of application classes and intents (the root default intent offers a pacing rate of 1 Mbps and a maximum delay of 200 ms). Classes and intents specify the utility function and user experience model to use for the target utility calculation and resource allocation. There is a trade-off between predictable application QoE and the effort for the company to construct the hierarchy. If an application's intent cannot be identified, fall-back rules can be applied to select a higher-level intent at the cost of reduced QoE accuracy. Highlighted intents are part of the evaluation.
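A minimal sketch of the fall-back lookup over such a hierarchy is shown below. The parent relation and database contents are illustrative stand-ins for the hierarchy of Figure 3; walking up the tree implements the fall-back rules described above.

```python
# Sketch of the fall-back lookup over an intent hierarchy (cf. Figure 3).
PARENT = {
    "g729": "desktop_phone", "g711": "desktop_phone",
    "desktop_phone": "voip", "softphone": "voip", "voip": "default",
    "bbb_live": "live", "live": "video_streaming", "video_streaming": "default",
}

UTILITY_DB = {                # intents with a stored utility function
    "g729": "M_voip_g729",
    "voip": "M_voip_generic",
    "default": "M_default",   # basic guarantees, e.g., 1 Mbps / 200 ms
}

def lookup_utility(intent):
    # Walk up the hierarchy until an intent with a known utility function
    # is reached; the root default intent always matches.
    while intent not in UTILITY_DB:
        intent = PARENT.get(intent, "default")
    return intent, UTILITY_DB[intent]

print(lookup_utility("g711"))   # -> ('voip', 'M_voip_generic')
```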
The question remains how applications can convey their classes and intents to the network controller. We propose local agents at the end hosts as an interface between applications and network control. There are two basic options from here. The first one is to modify the applications to report their identity and intent(s) to the agent. This can be done with a standardized API, for example through client or server extensions. The agent then forwards this information to the network controller and waits for the controller to decide on the appropriate pacing rate to apply. The second option is for the agent to monitor connection establishment or perform Deep Packet Inspection (DPI) and classify the conversations by matching them to known endpoints, header fields, packet payloads, process names, or function calls. Other techniques from the area of application performance management (APM), like code injection or tracing in the operating system, could also be used.

Without installing an agent, one could also capture packets of a conversation at the first hop, for example by using the SDN protocol OpenFlow and its packet_in feature (http://flowgrammable.org/sdn/openflow/message-layer/packetin/, last accessed: 11.10.2018), or by custom middle-boxes. However, in the network most traffic is encrypted and identification becomes difficult. There is no one-size-fits-all solution for how to identify applications and their intents, as the available options depend on the specific enterprise environment. One can expect that a combination of rule- and pattern-based matching and machine learning reduces the required manual work to a minimum.

Once the application class and intent of a conversation are identified, the controller looks up the utility function for the (class, intent) tuple from a database. The utility function describes the relationship between demand, in terms of minimum throughput and maximum delay, and benefit, in terms of utility. We define utility as a dimensionless unit in the range of [1, 5], which describes the satisfaction of the user with the service. The subsequent Section 4 introduces the utility functions in detail.

Table 4. Applications, Intents and Key Performance Indicators

| Class | Application | Intent(s) | Shorthand(s) | KPI(s) | QoE Model |
|---|---|---|---|---|---|
| Web Browsing | Firefox, selenium | science_lab | WEB | Page Load Time | Egger et al. [10] |
| File Download | Python requests | emailattach | DL | Download Time | Egger et al. [10] |
| Video Streaming | TAPAS [9] | bbb, bbb_live | VoD, Live | Average Quality | custom |
| Remote Terminal | SSHv2, paramiko | sshadmin | SSH | Response Time | Casas et al. [7] |
| Voice-over-IP | D-ITG [3] | g729.1 | VoIP | Delay, Loss, (+ Jitter) | Sun et al. [39] |

Comparing the performance of different applications with conceptually different KPIs requires mapping functions to a common scale. We denote this scale as the user-aware utility scale and define it as a dimensionless quantity in the range of [1, 5]. The utility functions then describe the relationship between the amount of resources allocated to an application and the resulting experience of the user with the application. In the following section we define the utility functions for selected classes of applications and intents. First, we present the considered application classes, intents, and KPIs of the deployed implementations. Second, we discuss the selected user experience models from the literature. Third, we define the utility functions based on measurements and the user experience models.

We consider five application classes: web browsing, file download, video streaming, remote terminal work, and Voice-over-IP (VoIP) (Table 4).
Web browsing covers a wide range of use cases, as modern web standards facilitate the move from proprietary and platform-dependent software to responsive web applications running in the browser. File download is the batch transfer of data the user is waiting for, such as an email attachment. Use cases for adaptive video streaming in the enterprise range from announcements to training videos, such as on-boarding lectures for new employees. Depending on the purpose, both video-on-demand and live transmissions are conceivable. In particular, major announcements are taxing for the infrastructure when viewed by a large fraction of the staff in a short time-frame. Remote terminal work by secure shell access allows administrators to access the terminals of servers, hosts, and switches from anywhere. The application class VoIP includes office phones, conferencing by software or in the browser, and VoIP applications on smartphones. We denote the combination of an application class and intent as application type and use the types WEB, DL, VoD, Live, SSH, and VoIP as shorthands for the investigated combinations of application classes and intents.
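As a sketch of such a mapping M onto the common [1, 5] utility scale (the per-class models are detailed in the remainder of this section), the following interpolates MOS support points over a KPI and rescales them. The response-time/MOS data points are placeholders, not the values from the cited studies.

```python
import numpy as np

resp_time_s = np.array([0.1, 0.3, 0.5, 1.2])   # SSH response time (KPI)
mos = np.array([4.5, 3.8, 3.0, 1.0])           # hypothetical opinion scores

def utility_ssh(t):
    m = np.interp(t, resp_time_s, mos)          # piece-wise linear interpolation
    # Rescale the model's MOS range to the common utility scale [1, 5].
    return (m - mos.min()) / (mos.max() - mos.min()) * 4.0 + 1.0

print(utility_ssh(0.2))   # utility of a 200 ms response time
```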
Next we discuss the implementations, KPIs, and intents per application class in detail. KPIs in parentheses in Table 4 are not inputs for the user experience models, but are part of the evaluation in this paper.
For remote terminal work we define the intent of an administrator typing commands over a Secure Shell (SSH) connection. An automated SSH client enters commands and measures the duration until the output of the command appears in the terminal. Only commands which require minimal processing on the server side, e.g., uptime and date, are entered. The SSH connection is established before the start of the experiment. OpenSSH 7.2 is used as server implementation on Ubuntu 16.04.4 LTS systems. Client-side automation is implemented using paramiko.

File download is the batch transfer of a chunk of data over one TCP connection. As intent we define emailattach, a file with random content and a size of 10 MB, which is placed on an HTTP server for download. In an enterprise environment this intent could represent the maximum size of email attachments. The download is implemented using a short Python script and the requests library. As KPI, the script measures the duration from when the GET request is sent up to the last received Byte.

Web browsing is implemented using Firefox in version 58.0.2, automated with selenium. The settings are left in the default state and the cache is cleared after every page view. The number of parallel connections is limited to six per server; HTTP pipelining is not supported anymore by recent Firefox versions. The connections are configured to be persistent between requests. The browser interface is disabled (headless mode) and no page rendering is performed in the experiments, to minimize the influence of system load and deployed testbed hardware. This is a scenario where a limited number of browser-based business applications are used frequently and/or all web browsing sessions are tunneled through an enterprise proxy. With proxies, connections can be persistent even when requesting content from different domains. General web browsing, where multiple domains are involved without a proxy, is not represented well by assuming persistent connections. This is due to the fact that connection establishment can significantly influence the page load time for longer transport delays. We define the KPI for one web browsing request as the duration from the initial GET request to the time all embedded resources are received (page load time). For web browsing we define the intent science_lab. The science_lab template is a web-site with 22 objects and a total size of about 1.3 MB.

HTTP adaptive video streaming is implemented using the TAPAS [9] DASH player. The conventional [29] bit-rate adaptation strategy is selected. We consider one video view as one request and select the average quality level of all downloaded segments as KPI. We define the intent bbb for on-demand video streaming. For this intent, we encode the open-source movie Big Buck Bunny in six quality levels with average bit-rates of 486 Kbps, 944 Kbps, 1389 Kbps, 1847 Kbps, 2291 Kbps, and 2750 Kbps. Only the first 60 s of the movie are selected and segmented into 15 chunks of 4 s each. The playback buffer is configured with a maximum size of 60 s. Additionally, we define the live-streaming intent bbb_live where the chunk size is reduced to 1 s and the buffer is limited to 10 s. Due to encoding overhead for the shorter chunk duration, the bit-rates increase to 572 Kbps, 1103 Kbps, 1625 Kbps, 2145 Kbps, 2660 Kbps, and 3172 Kbps.
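As an example of the KPI instrumentation described above, a minimal version of the file-download measurement could look as follows; the URL is a placeholder for the emailattach object on the HTTP server.

```python
import time
import requests

# Download Time KPI: the duration from sending the GET request until the
# last byte is received.
start = time.monotonic()
resp = requests.get("http://server.local/emailattach.bin", stream=True)
for _ in resp.iter_content(chunk_size=64 * 1024):
    pass                  # drain the body without storing it
download_time = time.monotonic() - start
print(f"Download Time KPI: {download_time:.2f} s")
```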
We emulate VoIP traffic using the Distributed Internet Traffic Generator (D-ITG) by Botta et al. [3]. D-ITG reproduces the inter-departure times and packet sizes of VoIP traffic and measures the KPIs jitter, packet loss, and delay of the resulting UDP packet stream. We define the intent G.729.1 for VoIP and configure D-ITG to emulate RTP VoIP calls with the codec G.729.1. In this configuration, a constant bit-rate stream with 50 packets per second is generated with a packet size of about 20 Bytes (≈ 8 Kbps payload rate).

We define the current utility value of an application as an estimation of the instantaneous satisfaction of a user with the interaction with the application. The relationship between KPIs and user experience has to be determined through subjective studies, either directly by conducting dedicated laboratory, field, or crowd-sourcing studies, or indirectly by measuring user-relevant success metrics such as task completion times. We denote this relationship as M : KPI ↦ Utility.

Fig. 4. Utility from application KPIs (M : KPI ↦ Utility) derived from subjective study results scaled to the range [1, 5]: (a) Terminal Work (M^(SSH)), (b) Web Browsing (M^(WEB)), (c) File Download (10M) (M^(DL)). M^(SSH) is derived from the subjective study [7, Fig. 5 (a)] by Casas et al. Plus signs indicate the MOS data points as collected by the authors in the study. Web and file download utility values are derived from subjective user studies in [10] by Egger et al.

In case there is a suitable QoE Mean Opinion Score (MOS) model available for the application based on subjective studies, we take a scaled version of the MOS model for M. Thus, the utility functions are based on the average user experience of the test subjects in the referenced studies. However, the range of some user experience models does not reach up to 5.0 (Excellent). In those cases, we define M by linearly scaling the experience model up to the range [1, 5]. If no model is available, we define M based on hand-picked application KPIs.

QoE is an active area of research and holistic models do not exist yet for most applications. There could be alternative or more complex models available for the selected user experience models. Furthermore, custom enterprise applications might require custom user experience studies. In any case, the presented system design and findings of this paper are independent of the concretely deployed user experience models. Therefore, the selected models in this work should be seen as rough approximations of the true underlying user experience.

We piece-wise interpolate M for remote typing from the results presented in [7, Fig. 5(a)]. There, Casas et al. study the QoE of remote desktop services for different use cases. For the investigated typing use case, the test subjects were asked to type a short text on a text processor in a remote desktop session. The higher the delay in the network, the longer the user has to wait until his actions, e.g., typing or deleting a character, appear on the screen. The delay until the actions result in visual feedback is denoted as response time and we choose it as the KPI for remote terminal work. Figure 4(a) illustrates the piece-wise interpolated model based on the presented opinion scores in [7]. The authors only investigated response time values up to 0.5 s. We linearly extrapolate the results up to 1.2 s, where the utility reaches 1. We define M as the linearly rescaled model M^(SSH)(t) := (MOS^(SSH)(t) − 1) · c + 1 ∈ [1, 5], where the constant c stretches the range of the study's MOS model to the full utility scale.
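As an illustration, the mapping from study data points to a scaled utility function can be implemented as piece-wise linear interpolation. The sketch below assumes hypothetical (response time, MOS) sample points; the actual values come from [7] and are not reproduced here.

    import numpy as np

    # Hypothetical (response time [s], MOS) samples; real values are in [7].
    response_times = np.array([0.0, 0.1, 0.25, 0.5, 1.2])
    mos_samples    = np.array([4.3, 4.0, 3.2, 2.1, 1.0])

    def scale_to_utility(mos: np.ndarray) -> np.ndarray:
        """Linearly rescale a MOS model whose maximum falls short of 5.0
        onto the full utility range [1, 5]."""
        c = (5.0 - 1.0) / (mos.max() - 1.0)
        return (mos - 1.0) * c + 1.0

    def m_ssh(t: float) -> float:
        """Piece-wise linear utility M^(SSH)(t) for a response time t [s]."""
        utilities = scale_to_utility(mos_samples)
        return float(np.interp(t, response_times, utilities))

    print(m_ssh(0.3))  # utility between the 0.25 s and 0.5 s samples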
Egger et al. [10] propose models for the user experience of web browsing and file downloads based on subjective user studies. The web browsing model uses the page load time as KPI. For the file download, the download time of a 10 MB file is used as KPI. The MOS value for web browsing is proposed as the logarithmic model MOS^(WEB)(t) := −a_WEB · ln(t) + b_WEB; for the file download, MOS^(DL)(t) := −a_DL · ln(t) + b_DL, with t the respective waiting time in seconds and the coefficients fitted to the subjective results in [10]. For web browsing (Figure 4(b)), already a short waiting time decreases the MOS from 5 (Excellent) to 4 (Good). With an additional 4.6 s waiting time, the MOS decreases to 3 (Fair). After a total waiting time of 20 s, the score ranges between Poor and Bad.
For file downloads (Figure 4(c)), the users are more willing to accept longer waiting times; for example, it takes a waiting time of 28 s for the opinion score to decrease to 4. We use the MOS^(DL) model as proposed by the authors directly as M, i.e., M^(DL)(t) := MOS^(DL)(t). M^(WEB) we define as the rescaled model M^(WEB)(t) := (MOS^(WEB)(t) − 1) · c + 1.

The user experience during an adaptive video streaming session depends on factors such as average presented quality, number and amplitude of quality switches, frequency and duration of stalling events, the device's screen size, viewing environment, user expectation, encoding, adaptation strategy, and content type [24]. To the best of our knowledge there is no holistic model for the user experience of adaptive streaming available at the moment. One option for enterprises is to create custom models, for example for onboarding videos for new employees. Studies show the average quality as a dominant influence factor [25] for the user QoE. We therefore assign a utility value to a streaming application based on the observed average quality q^(avg) and the maximum and minimum quality level, q^(max) and q^(min). The utility value is then determined by

M^(HAS)(q^(avg)) := (q^(avg) − q^(min)) / (q^(max) − q^(min)) · 4 + 1.

Sun et al. [39] propose a model for the MOS of VoIP depending on the used audio codec and a user's interactivity, i.e., whether the user is only listening or also conferencing. The MOS value is presented as a polynomial equation with constants a to j and with packet loss ratio and delay as input parameters. The constants depend on the used codec. We configure D-ITG to emulate G.729. The MOS model MOS^(VoIP)(loss, delay) is then described by Eq. 10 and Table II in [39]. We define M accordingly as the rescaled model M^(VoIP)(loss, delay) := (MOS^(VoIP)(loss, delay) − 1) · c + 1.

The utility function U_a : (Throughput [Kbps], Delay [ms]) ↦ [1, 5] approximates the QoE-aware utility of a specific application type a for a unidirectional pacing rate and maximum delay threshold using the utility model. Hence, the function solves the problem of linking network resource demands with the resulting user experience. The hereinafter described methodology for constructing the utility functions can be applied in an automated fashion to any enterprise application and its intents.

Fig. 5. Utility functions for (class, intent) are generated by first defining a measurement domain in terms of throughput and delay. Second, the domain is quantized and the application KPIs are measured in an emulated network environment using the quantized parameters for throughput and delay. Third, user experience models are used to derive the utility for the measured parameters.

Figure 5 illustrates the process of constructing the utility functions. A set-up measures the utility of each application and intent for different pacing rates and delays in an isolated environment. Two hosts (Host S and Host C) are connected through a network emulator. On the emulator, Linux netem adds delay to all packets passing through it. Host S is running the server endpoint of the application, e.g., in case of web browsing an HTTP web server. The client endpoint is assigned to Host C, e.g., the web browser. Host S egress traffic is paced using the cfq queuing discipline (Section 6.2). From the measurements we derive the 2-dimensional utility functions. Note that to account for asymmetric data-rates in a conversation, which is the case for most server-client traffic such as web traffic, the two directions of a conversation have to be described by different utility functions.
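The construction of a quantized two-dimensional utility function can be sketched as a nested sweep over the measurement domain. In the following sketch, configure_netem_delay, configure_pacing_rate, measure_kpi, and kpi_to_utility are hypothetical helpers standing in for the testbed automation, and the domain bounds are placeholders.

    import numpy as np

    # Placeholder quantization of the measurement domain (Kbps, ms).
    pacing_rates_kbps = np.linspace(100, 5000, 10)
    delays_ms = np.linspace(0, 100, 6)

    def build_utility_function(measure_kpi, kpi_to_utility,
                               configure_pacing_rate, configure_netem_delay):
        """Measure the KPI for every (rate, delay) grid point and map it to
        utility via the application's model M, yielding the quantized U_a."""
        utility = np.empty((len(pacing_rates_kbps), len(delays_ms)))
        for i, rate in enumerate(pacing_rates_kbps):
            configure_pacing_rate(rate)          # pacer on Host S
            for j, delay in enumerate(delays_ms):
                configure_netem_delay(delay)     # netem on the emulator
                kpi = measure_kpi()              # e.g., page load time
                utility[i, j] = kpi_to_utility(kpi)  # M : KPI -> [1, 5]
        return utility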
Fig. 6. Utility functions U_a : (Throughput, Delay) ↦ [1, 5] which map throughput and delay to utility for the application classes file download, web browsing, and video streaming and the intents defined in Table 4.

For the sake of simplicity, we consider only one direction per conversation as constrained and only present the server-to-client utility functions. For the throughput, we measure DL in the range of [ , ] Kbps, WEB in the range of [ , ] Kbps, VoD and Live in the range of [ , ] Kbps, and VoIP and SSH in the range of [ , ] Kbps. For the delay, we measure WEB, DL, VoD, and Live in the range of [ , ] ms and VoIP and SSH in the range of [ , ] ms. The maximum pacing rate per intent is set so that further increasing the pacing rate does not improve the utility for any delay demand.

Figure 6 presents the measurement results for the utility of the applications depending on delay and throughput. The intersections of the grid indicate the quantization as used by the resource allocation problem formulation. The figure shows that DL, WEB, and VoD depend strongly on the throughput, while only a minor dependency on delay is visible. Live depends on delay and throughput. SSH (not shown) depends solely on the delay. For DL (Fig. 6(a)), the impact of the delay is limited to the TCP handshake, the file request, and acknowledgement packets. The impact is insignificant compared to the download time and not visible in the figure. For VoD, the impact of delay additionally depends on the number and playtime duration of video segments and the adaptation strategy. As illustrated by Figure 6(c), the influence of delay for the intent VoD is minor. For Live there is a clear influence of delay on the utility (Fig. 6(d)). For SSH, the delay is the important influence factor, as every typed character triggers an outgoing packet and requires an immediate response packet. As we use persistent HTTP connections for web browsing, there is no influence of the delay on the WEB utility due to the TCP handshake. The influence of the delay is limited to the requests of the HTML index object and the embedded resources (Fig. 6(b)).

The maximum utility values an application can reach in the measurements are determined by implementation-specific factors and the domain and range of the utility function. For example, WEB is limited by the browser processing time and VoD/Live depend on the behavior of the adaptation algorithm. SSH can reach the highest utility of 5 with 100 Kbps throughput and 0 ms delay. VoIP can reach 5 with 100.0 Kbps and 34.5 ms. For WEB the highest utility is 4.5 with 11589.7 Kbps and 33.1 ms delay. VoD can reach its highest utility of 4.9 with 3479.3 Kbps and 41.4 ms. Live can reach 4.8 with 5000.0 Kbps and 41.4 ms. DL can reach a utility of 4.8 with 5000 Kbps and 99.3 ms delay.

The network controller performs the calculation of the shares to be allocated based on the number of applications, their utility functions, the network topology, the current network status, and fairness criteria. For this paper we define the utility
fairness criteria as follows. We first try to maximize the minimum utility over all applications (max-min fairness) and afterwards maximize the sum of utilities while allowing a small decrease in minimum utility. The complete allocation formulation is introduced in Appendix A.

We formulate the problem as a Mixed Integer Linear Program (MILP). The objective of the MILP in the first step is to maximize the minimal utility value θ^(min) over all applications. In the second step the MILP maximizes the sum of all utility values, while the minimum utility θ^(min,2) is restricted to the range θ^(min,2) ∈ [θ^(min,1) − ϵ, θ^(min,1)] with ϵ = 0.3. The MILP has to consider the two-dimensional utility function of every application, the capacities of all paths between application endpoints, and the delay at intermediate hops depending on the link utilization. The decision variables describe which pacing rate to apply to which application and how to configure the routing between application endpoints.

We allocate a specific data-rate per application. Hence, we do not consider how much data-rate is actually consumed by an application. On the one side, static allocation via application pacing can guarantee predictable application performance, as this work shows in the experiments. On the other side, there is no statistical multiplexing gain in case the applications use less resources than allocated to them. As a consequence, the network may be under-provisioned and available resources are potentially not made available to other applications.

In this work we configure all applications in the experiments to constantly use the link, which makes the number of active applications equal to the total number of applications on a given link. That puts the most stress on the link for a given number and mix of applications. Reducing the activity of an application would be equivalent to reducing the number of simultaneously active applications on the link. Existing research on Internet traffic and congestion can be leveraged by future work, e.g., to overprovision the links based on the actual number of active applications at a given point in time.

A, with a ∈ A, is the set of all unidirectional application flows a. We define Λ(a) as the target utility value of an application flow a. In the first step we maximize the minimum utility value (max-min fairness), subject to all application utilities being larger than the minimum utility value θ^(min):

maximize: θ^(min)
subject to: Λ(a) ≥ θ^(min) ∀ a ∈ A, and (7)–(21) in Appendix A2–A6.

We denote the optimal value of θ^(min) of the first step as θ^(min,1). A full definition of all symbols is provided in Appendix A. In the second step we relax the max-min constraint by ϵ and maximize the sum of all target utility values. We add the additional constraint to bound θ^(min) by θ^(min,1) − ϵ:

maximize: Σ_{a ∈ A} Λ(a)
subject to: θ^(min) ≥ θ^(min,1) − ϵ, and (7)–(21) in Appendix A2–A6.

For the remainder of the paper, if not otherwise stated, θ^(min) denotes the optimal value of the second step (θ^(min,2)). The complete formulation of the problem can be found in Appendix A.

The objective of the experiments is to show the dependability and scalability of resource allocation via end-host pacing and how the different application classes profit and/or suffer from the enforced packet pacing. The experiments are conducted in a set-up where we monitor sets of an increasing number of parallel applications sharing a throughput-constrained link. For each set of applications we measure the utility with and without resource allocation and discuss the differences in the evaluation. Dynamic embedding of applications at run-time and additional intents are out of scope of this evaluation. Next, we elaborate on the deployed experimental set-up (Section 6.1) and the custom pacing implementation (Section 6.2). Afterwards, we discuss the experiment parameters (Section 6.3). The results of the evaluation are presented in the subsequent Section 7.
Figure 7 illustrates the experiment set-up, consisting of two groups of hosts: one server and one client group. The link between the two groups is throughput-constrained and the applications running on the host groups have to share the limited bandwidth. The network consists of two switches, one SDN-enabled Pica8 P-3290 and one unmanaged off-the-shelf 100 Mbps switch. The link between the two switches constrains the available data-rate between the hosts on the left and on the right side to 100 Mbps. The Pica8 switch is equipped with a maximum queue size of 1 MB, corresponding to a maximum queuing delay of about 80 ms towards the 100 Mbps link. We deploy three modern desktop PCs on each side to meet the processing and memory resources required by the experiment scenarios.

Each application consists of a server and a client endpoint, e.g., a web server and a browser. All endpoints are confined to a separate network namespace and connected via virtual interfaces and a software bridge to the host's physical interface. Each namespace is configured with a unique IP and MAC address. Furthermore, every client is connected to an exclusive server application. That way, the pacing rate can be set per namespace and no further control is needed to assign outgoing server packets to different pacers. In case of web browsing, video streaming, and web download, each client is assigned to an exclusive light-weight HTTP server, but with shared content. The server endpoints are placed left of the bottleneck and the client endpoints right of the bottleneck, which makes the egress queue and interface of the Pica8 the bottleneck. Pacers (P) based on our cfq implementation (Section 6.2) restrict the egress rate of the namespaces/applications towards the hosts' software bridges.

All management and monitoring operations are performed out-of-band. The KPIs of each application are measured at the client endpoint, e.g., the page load time at the browser, and reported to the network controller by the applications' agents. Additionally, we frequently poll the statistics counters of all physical and virtual network interfaces to measure throughput, queue length, and packet loss.
Fig. 7. Experimental set-up. Two groups of hosts, one server and one client group, are connected via an SDN-capable switch and an unmanaged 100 Mbps link to each other. A network controller calculates fair shares, configures the pacers, and collects statistics.
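The per-application isolation can be reproduced with standard Linux tooling. The following sketch shows one plausible way to create a namespace with a veth pair and attach a rate limit on its egress; it uses the stock fq qdisc's maxrate option as a stand-in for the custom cfq discipline described in Section 6.2, and all interface, namespace, and address names are placeholders.

    import subprocess

    def sh(cmd: str) -> None:
        """Run a shell command and fail loudly on errors."""
        subprocess.run(cmd, shell=True, check=True)

    def setup_paced_namespace(ns: str, rate: str = "5mbit") -> None:
        """Confine an application endpoint to its own namespace and pace
        its egress traffic towards the host's software bridge."""
        sh(f"ip netns add {ns}")
        sh(f"ip link add veth-{ns} type veth peer name veth-{ns}-br")
        sh(f"ip link set veth-{ns} netns {ns}")
        sh(f"ip netns exec {ns} ip addr add 10.0.0.2/24 dev veth-{ns}")
        sh(f"ip netns exec {ns} ip link set veth-{ns} up")
        sh(f"ip link set veth-{ns}-br up")
        # Stand-in for cfq: fq with a per-flow maximum pacing rate.
        sh(f"ip netns exec {ns} tc qdisc add dev veth-{ns} root fq maxrate {rate}")

    setup_paced_namespace("app1", rate="2mbit")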
In Linux, pacing is implemented as a queuing discipline. Furthermore, a mechanism called TCP small queues (https://lwn.net/Articles/506237/, accessed 2018-10-12) exerts backpressure on the applications to mitigate buffer bloat and packet loss by limiting the allowed number of Bytes per flow in the queuing discipline and device queue (default: 128 Kilobytes). Other operating systems offer similar pacing mechanisms. We implemented a custom queuing discipline based on the existing Fair Queuing (fq) discipline (https://lwn.net/Articles/564825/, accessed 2018-10-12), referred to as Custom Fair Queuing (cfq). Every conversation, defined by (class, intent) and by one or multiple sockets, can be assigned to an exclusive queue with a target packet release rate as configured by the network controller through the local agent. Packets from the queues are released time-based. The departure time of the next packet, time_next_packet, is determined by the current time now, the size of the current packet pkt_len, and the target pacing rate target_rate:

time_next_packet = now + pkt_len / target_rate
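A minimal sketch of this release-time computation, assuming a monotonic clock in seconds, a packet size in bytes, and a pacing rate in bytes per second (the class and method names are ours, not the kernel's):

    import time

    class PacedQueue:
        """Toy model of the cfq release logic: each dequeued packet pushes
        the earliest departure time of the next packet further out."""

        def __init__(self, target_rate_bps: float):
            self.target_rate = target_rate_bps  # bytes per second
            self.time_next_packet = 0.0

        def may_send(self, pkt_len: int) -> bool:
            """Return True if the packet may leave now, and if so,
            schedule the next permitted departure time."""
            now = time.monotonic()
            if now < self.time_next_packet:
                return False  # a real qdisc would re-arm a timer and retry
            self.time_next_packet = now + pkt_len / self.target_rate
            return True

    queue = PacedQueue(target_rate_bps=125_000)  # 1 Mbps
    print(queue.may_send(1500))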
The parameter space of the experiments is limited to the number and types of the applications and whether the experiment is managed or best effort. In detail, the bottleneck link is shared by {2, 4, .., 24} applications per class, in total |A| ∈ {10, .., 120}. For video streaming, half of the applications are of type Live and the other half of type VoD.

At the start of the experiment, a configuration file is pushed to each host telling the host which number and type of applications to start. Each application is modified to start in its own network namespace and to report its type to the host-local agent (see Fig. 7). The local agent forwards this information to the network controller. Once all applications are registered with the network controller, the controller calculates the resource shares of utility for each application and pushes the corresponding static pacing rates to the agents. The agents configure the pacers of the applications' network namespaces accordingly. The pacing rate is not changed during an experiment run. The SDN-enabled Pica8 switch is configured via OpenFlow for simple forwarding. Besides the forwarding rule configuration, the OpenFlow connection is used to poll queue and interface statistics.

The duration of one experiment run is 15 minutes with an additional 1 minute warm-up and cool-down phase. The applications are started at random times during the warm-up phase, and requests during the warm-up or cool-down phase are discarded for the evaluation. Each experiment is repeated 11 times. If an application's request is finished, it initiates a new request after a pause time of 100 ms. One request equals one video view for VoD and Live. For VoIP, one request equals one 30 s phone call. The reason for the static pause time of 100 ms is that it results in an almost constant number of concurrent applications using the bottleneck link. Hence, each application in a specific scenario is active during almost the complete experiment run.

Cubic is configured as TCP congestion control algorithm. Cubic is chosen as it shows better performance on congested links compared to Compound and New Reno TCP [1] and it is the default algorithm for many Linux server variants. BBR congestion control proposed by Google fails to show performance benefits and fairness in heterogeneous environments [23] compared to Cubic.

There exist valid optimal solutions to the allocation problem formulation where applications of the same type are assigned different utility values. For easier presentation of the results, we constrain the problem formulation to choose one utility value per type. The bottleneck link is modeled with a capacity of 100 Mbps. As the sum of all paced flow rates does not exceed the available capacity, and due to the short pause times between application requests, the link in the managed case is slightly under-provisioned. Thus, a large queue build-up is unlikely and the link delay of the bottleneck is modeled with a constant delay of 2 ms. In the best effort case, the link is already over-utilized with 10 competing applications and experiences 58 ms delay and 0.5 % packet loss (discussed later in Section 7.3).

We provide details on how the experiment set-up is expressed in the terms of the variables of the theoretical problem formulation in Appendix B.
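The registration and configuration workflow between agents and controller can be sketched as a simple message exchange. The sketch below is our illustration only; the message format, endpoint names, and transport are assumptions and not the paper's implementation.

    import json
    import urllib.request

    CONTROLLER = "http://controller.example:8080"  # hypothetical endpoint

    def register_application(ns: str, app_class: str, intent: str) -> None:
        """Agent-side: announce a freshly started application to the controller."""
        payload = json.dumps({"namespace": ns, "class": app_class,
                              "intent": intent}).encode()
        req = urllib.request.Request(f"{CONTROLLER}/register", data=payload,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)

    def apply_pacing_plan(plan: dict) -> None:
        """Agent-side: receive {namespace: rate_kbps} from the controller and
        configure the local pacers (e.g., via the cfq qdisc)."""
        for ns, rate_kbps in plan.items():
            print(f"setting pacer of {ns} to {rate_kbps} Kbps")
            # a real agent would call into tc here, cf. the namespace sketch

    register_application("app1", "WEB", "science_lab")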
We evaluate the performance of an increasing number of applications sharing a throughput-constrained link with and without data-rate management. The evaluation pursues the following questions. i) How does the minimum and average utility of the applications compare between the managed and best effort scenarios? ii) Which applications benefit, which utility values are decreased, and why? iii) Can pacing result in configurable and thus predictable application performance in terms of the difference between the target and the measured utility? iv) How fair, in terms of utility, are the best effort and the managed utility distributions?

First, we evaluate how the available data-rate is distributed among the applications in a best effort scenario and present the resulting utility distribution. Second, we solve the allocation formulation for the scenario, implement static pacing in the set-up for each application, and present the gains in terms of utility. Third, we present how pacing affects the QoS parameters of the link, such as packet loss and jitter. Fourth, we conduct a parameter study on the number of parallel applications and show how the gains and fairness change with an increasing number of parallel applications. Error bars in the result figures indicate the standard deviation if not otherwise stated. In cases where the error bars are not clearly visible on the presented scale, they are omitted from the figures.
First, we take a close look at the best effort application performance for a single scenario with 16 clients per application class: 16 × WEB, 16 × DL, 16 × SSH, 16 × VoIP, 8 × VoD, 8 × Live, in total |A| = 80. The scenario with 80 applications is selected for a closer inspection due to the fact that among the investigated application counts (from 10 up to 120 applications), one of the highest gains is observed here. With the 80 applications competing for the bandwidth, the link is fully utilized, resulting in an average packet loss of 4 % and a queuing delay of 80 ms.

Fig. 8. Best effort throughput and utility of the different application types for 16 clients per application class. The markers at the top are for better visual indication of the application types.

Figure 8(a) presents the CDFs of the average throughput and Figure 8(b) the CDFs of the utility values of all requests per application type. Multiple observations can be made from the figures. First, the throughput as well as the utility is distributed non-uniformly between the application types. For example, while WEB enjoys high throughput and utility (median utility of about 3.8), Live's achieved throughput is less than 1 Mbps and its median utility is about 1.4. WEB's high throughput is due to the use of multiple persistent parallel TCP connections, while video streaming clients, DL, and SSH establish only one TCP connection. Parallel TCP connections allow an application to receive a proportionally larger fraction of the available throughput. As web download has no idle periods during the download, it exhibits a higher average throughput than video streaming.

Second, even VoD and Live, which belong to the same application class (video streaming) and achieve similar throughput rates, suffer from unfair utility distribution (1.3 vs. 2.1). This is due to the smaller playback buffer for live streaming and the increased encoding overhead for the shorter video chunks. Third, the average throughput of SSH and Voice-over-IP (VoIP) is below 100 Kbps, while the utility is 3.7 and 4.9, respectively. SSH's performance is influenced by delay, caused by queuing at the bottleneck link, and retransmissions, due to lost packets when the bottleneck's queue is overflowing. VoIP is barely influenced in this scenario, as the maximum delay and packet loss over the single bottleneck is acceptable for VoIP traffic according to the user experience model. Details on the performance of VoIP are given in Section 7.3. Fourth, the utility distributions per application type vary with a standard deviation of 0.2 (WEB) to 0.5 (DL), with the exception of VoIP. Hence, application performance is not consistent across requests of the same application type, and, as a consequence, there is an unfair distribution of shares even within the same application type.

In summary, best effort delivery is inadequate to provide fair and consistent application performance for multiple applications sharing a constrained link. Best effort delivery does not consider different demands (throughput vs. delay sensitivity), transport protocols (TCP vs. UDP), or multiple flows per application. Furthermore, the constrained link is overloaded, resulting in lost packets and queuing delay.
Next, we solve the allocation problem formulation with the max-min fairness criteria for the scenario with 80 parallel applications and apply the calculated pacing rates. Figures 9(a) to 9(f) illustrate the best effort (solid lines) and managed utility (dashed lines) for the scenario with 16 clients per application class. Improvements in median utility due to the data-rate management are indicated by (→, +). Deteriorations are shown by (←, −). The target utility per application type, as calculated by the allocation formulation, is indicated by (|, ⋆).

Figures 9(a) to 9(f) show that all application types, except WEB and VoIP, profit from the management. Live benefits most from the management, with a median increase of 3.1 (from 1.3 to 4.4). VoD, SSH, and DL's median utility improve by 2.0, 1.0, and 0.4, respectively. On the other hand, WEB's median utility decreases by 1.3 (from 3.8 to 2.5).

Fig. 9. Figures (a) to (f) show measured best effort and managed application utility for |A| = 80 applications sharing the constrained link. The dashed lines indicate the utility CDF for the managed scenario, the solid lines the best effort scenario. The star and vertical line mark the target utility Λ for the application type. Arrows to the right highlight the improvement in median utility. Figure (g) shows the standard deviations of a client's utility values per application type.

With pacing, WEB cannot get an unfair advantage over the other applications by using multiple parallel TCP connections. No noteworthy improvement or deterioration in utility is measurable for VoIP.

Live (b) exhibits a deviation of about 0.5 between the target and measured utility. The deviation is the result of an inaccuracy in the live streaming utility function. The samples collected from the utility measurement set-up are supplemented with interpolated values to build the quantized utility function. In the case of live streaming and low delay values, the interpolation results in a utility error of about 0.5. The error can be reduced by collecting more measurement samples from the throughput-delay parameter space and/or fine-tuning the interpolation algorithm.

Figure 9(g) presents the standard deviations per client of a specific type for the best effort scenario. The smaller the standard deviation, the more consistent the experience of a single user. The dashed vertical line indicates the maximum (0.05) of the standard deviations in the managed case (per-type CDFs are not shown for the managed case). DL (⃝) clients exhibit the largest median standard deviation (0.64) among the application types, followed by SSH (△) with 0.41. WEB (⋄) clients' median deviation is the second smallest with 0.25. There is no visible variation for VoIP (◦). The figure also shows that not only the utility value per client request varies, but also the behavior of each client. For example, for VoD (◁), the standard deviation varies between 0.1 and 0.43. Hence, some clients experience a smaller quality variation for their video views than other clients.

Next, we take a closer look at the QoS metrics of the constrained link in terms of packet loss, queuing delay, and jitter for an increasing number of parallel applications. In the best effort case we expect the link QoS parameters to degrade
because the link is fully saturated and the interface queue is overflowing. In the managed case we do not expect any degradation, as the level of link saturation is managed. As the MOS and utility functions of VoIP are based on the QoS metrics, we also discuss why the QoS metrics have only a minor influence on the VoIP performance in the evaluation.

Fig. 10. Quality of Service metrics of the constrained link in terms of packet loss, delay, and jitter for an increasing number of applications (|A|) as recorded by the VoIP clients. The dashed lines without markers indicate the 95th percentile. The markers indicate the managed (□) and best effort (◦) median values. Without data-rate management the queue at the bottleneck overflows quickly, even at low numbers of parallel applications, and thus causes packet loss and delay.

Figure 10(a) shows the median packet loss as measured by the VoIP clients during a call for 10 to 120 parallel applications. The dashed lines indicate the 95th percentile. The figure shows that there is no packet loss for the investigated numbers of applications in the managed experiments. In the best effort experiments, the packet loss increases linearly from 0.5 % to 7.1 % (0.9 % to 8.1 % for the 95th percentile).

Figure 10(b) shows the Round-Trip-Time (RTT). Note that the client-to-server flow direction of the constrained link is only lightly utilized and therefore the given RTT approximates the one-way delay experienced by the applications. In the best effort case, the delay increases roughly logarithmically from 53 ms for 10 applications and saturates for 70 parallel applications at 79 ms. The 95th percentile shows that even with 10 parallel applications the experienced RTT is in 5 % of the cases already greater than 79 ms. In the managed experiments the measured median RTT increases linearly from 0.9 ms to 1.1 ms (95th percentile: 1.5 ms to 2.4 ms).

Figure 10(c) shows the median and 95th percentile of the average jitter as measured by the VoIP clients during a call. In general, the figure shows that in the best effort case the jitter decreases for increasing application count, while for the managed experiments the jitter increases. The decrease in jitter in the best effort case shows that due to the link saturation, there are almost constant inter-arrival times of packets. The high link utilization results in a full link queue and packets are processed at line-rate by the switch's outgoing interface. In the managed case, the arrivals of the multiplexed requests of the clients result in minor RTT variations, but even for 120 applications the 95th percentile of the jitter stays below 0.9 ms.

As there are no retransmissions for VoIP, the maximum delay for the successful transmission of a voice sample is about 80 ms in our set-up. For 8 % packet loss and 80 ms delay, the utility for VoIP is still estimated as 4.9 (U^(VoIP)(8 %, 80 ms) = 4.9). Given the domain of U^(VoIP), there is a maximum utility difference of 0.1 in the set-up (5 − 4.9).

In summary, data-rate management significantly improves the QoS metrics of the constrained link. There is no packet loss, the RTT stays in most cases far below 2.5 ms, and the jitter is at least halved. Regarding the influence of the QoS metrics on the VoIP utility, the VoIP clients in combination with the selected audio codec are marginally affected
by the unmanaged link degradation. However, one can imagine how applications with stricter QoS requirements or VoIP calls over longer network paths profit from the QoS improvements.

Fig. 11. Comparison of managed measurements (□), target utility (×), and best effort (◦) measurements per application type for an increasing number of applications sharing the constrained link. In Figure (g), the crosses and the dashed line indicate the solution to the allocation formulation (θ^(min)) over all types. Figure (f) summarizes differences in measured utility between best effort and managed. Results are shown over the mean of the 10 % tail of all requests of an application.

Figure 11 illustrates the gain in utility per application type for an increasing number of simultaneous applications. Results are shown as the mean of the 10th percentiles of the utility values over all requests of an application. The 10 % tail as summary metric is chosen to allow for a small budget of random error compared to the minimal utility over all requests, e.g., for random delays in processing on the experiment PCs or requests which take longer due to rare latency spikes in the network. Hence, on average 90 % of the requests of a client result in a utility equal to or better than the given value.

Figures (a) to (e) present the findings per application type. The application class VoIP is omitted as there is no significant difference between the managed and best effort scenario. Figure 11(f) summarizes the difference in utility per application type between the managed and best effort experiments. Application types with a positive difference (top half of the figure) profit from management. The performance of application types with negative differences deteriorates. The following general observations can be made based on the figures.

First, the utility for all shown types decreases with an increasing number of applications in the best effort case. This is expected, as with increasing |A| more flows compete for the scarce constrained link capacity. In the managed case, only DL and WEB exhibit an equivalent degradation in utility. VoD, Live, and SSH, on the other hand, can sustain a high utility in the managed experiments even while the number of competing flows increases. Second, for |A| < 40 the potential gain is low, as the available capacity is sufficient to reach close to maximum utility for all applications in the managed and best effort cases. Third, the performance of WEB deteriorates while all other classes (except VoIP) profit for most of the evaluated values of |A|. Fourth, the minimum utility over all applications (θ^(min)) in the managed case is mostly determined by WEB and Live. The minimum for the best effort case is mostly dictated by SSH for |A| < 30 and by Live for |A| ≥ 30. Fifth, the measurements from the managed scenario deviate less than 0.5 from the target utility as determined by the solution to the optimization formulation for all application types. For DL, WEB, VoD, and SSH the deviation is even less than 0.2 for the investigated numbers of parallel applications. Hence, data-rate management leads to predictability of application performance. Furthermore, the results show that pacing can implement the output of the allocation optimization formulation accurately.

Next, we investigate the measurement results for each application type in detail. For VoD (Fig. 11(a)), the utility decreases approximately linearly with an increasing number of parallel applications for |A| > 30. For |A| ≤ 30, best effort management is sufficient to provide a utility of 4.5 or higher. With data-rate management, the fairness formulation can allocate enough resources to the VoD clients to sustain a high utility value even for up to |A| = 120. At |A| = 120 the utility gain is about 3.1. For Live (Fig. 11(b)), the figure shows that the utility decreases rapidly without data-rate management. There, data-rate management is most effective at 60 to 70 parallel applications, where the increase is up to 3.4. In terms of predictable performance, the target utility is met most of the time with a deviation of 0.1 to 0.3. However, for the largest investigated |A|, the managed utility decreases towards the value Live reaches in the best effort case for the same number of applications. For SSH (Fig. 11(c)), the profit increases roughly linearly with |A|, from about 0.7 up to 2.1 for |A| = 120. SSH can sustain a high utility even for large |A|.

For WEB (Fig. 11(d)), the difference between managed and best effort is 1 utility or less (maximum difference of 0.9 at |A| = 90). For |A| < 90 and |A| > 90 the difference decreases. As our pacing applies on application level, not flow level, WEB cannot gain an unfair advantage by opening multiple TCP connections anymore. Furthermore, the utility function of WEB (Fig. 6(b)) shows that WEB is expensive in terms of required throughput, which makes the optimization likely to sacrifice the target utility of WEB in the second optimization step in order to increase the average utility of all applications. DL (Fig. 11(e)) exhibits the smallest utility gains (besides VoIP). The gain is below 0.8 for |A| ≤ 90 and drops to around zero just above that; the decrease in utility with |A| is roughly linear for the managed and best effort experiments. For the largest investigated |A|, the gain rises again to close to 1.0. Managing the utility is accurate and the deviation from the target utility can be neglected for all investigated numbers of parallel applications. VoIP exhibits no benefit or degradation from the activated management according to the user experience model (further discussed in Section 7.3).

Figure 11(g) shows the minimum 10th percentile utility as measured in the best effort and managed experiments and as calculated by the fairness formulation. The figure shows that in the managed scenario, every client's utility is at least 3.0 up to 80 parallel applications, which is denoted as fair on the MOS scale. In the best effort case, the observed minimum utility drops below 3 for 40 applications and down to 1.0 for 80. When comparing θ^(min) (×) and managed (□), the managed minimum utility does not differ more than 0.1 from the calculated minimum utility.

In summary, the presented measurements for an increasing number of parallel applications sharing the constrained link highlight the benefits of the proposed approach. VoD, Live, DL, and SSH exhibit gains in utility between 0.5 and up to 3.3, even for 100 and more applications sharing the 100 Mbps link.
WEB ’s utility degrades, but the decrease is less than1.0. The minimum utility θ ( min ) can be greatly increased, especially for |A| >
30, and the target utility is mostly met,
Fig. 12. Comparison of the F-index between managed (□) and best effort (◦) scenarios per application type for an increasing number of applications sharing the constrained link. An F-index of 1.0 denotes perfect fairness.
To the best of our knowledge, there is no fairness measure to quantify the fairness for different application types with orthogonal resource demands, e.g., throughput-sensitive and delay-sensitive demands. For example, VoIP is in our set-up always close to a utility of 5.0, independent of other applications. Hence, any fairness measure which considers only differences between values will consider this as unfair. But enforcing equal utility for all application types, including artificially restricting VoIP, would result in a non-Pareto-optimal utility distribution where the target utility of VoIP could be increased without negatively impacting other applications. Therefore, we evaluate the inter-application fairness per application type. Note that for the evaluation we restrict the allocation formulation to allocate only one target utility value per application type. Hence, the target utilities per type always exhibit perfect fairness and are omitted.

We evaluate the inter-application fairness using the F-index [26], defined as F = 1 − 2σ/(H − L), which yields F = 1 − σ/2 for a utility scale of L = 1 to H = 5. The F-index is selected as fairness measure as it is specifically designed and evaluated for user experience fairness. An F-index of 1.0 indicates perfect fairness between the applications. An F-index of 0.0 is the result of half of the applications experiencing a utility of 1.0 and the other half a utility of 5.0. Figures 12(a) to 12(e) illustrate the F-index per application type for an increasing number of applications sharing the constrained link, measured for the best effort and managed scenarios. From the figures, we conclude that in the managed case, the F-index does not drop below 0.98 for any of the evaluated scenarios and application types.

In the best effort case, the fairness depends strongly on the application type and the number of parallel applications. The WEB clients exhibit a fairness similar to that in the managed scenario. For SSH and DL, the fairness fluctuations are larger, but in general the fairness is still high. VoD and Live suffer the most in the best effort scenarios. For Live, the fairness drops down to 0.7 for |A| = 44 and for VoD down to 0.77 for |A| = 55. However, for video streaming there is a high level of fairness for |A| < 30 and for large |A|. In summary, VoD and Live profit the most from the management. SSH and DL show some improvement. WEB and VoIP improve only marginally. In the managed measurements, we observe nearly perfect fairness for all application types.
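For reference, the F-index computation is a one-liner over the per-request utility values. A small sketch with made-up utility samples (the index definition follows [26]; the sample values are ours):

    import numpy as np

    def f_index(utilities: np.ndarray, low: float = 1.0, high: float = 5.0) -> float:
        """QoE fairness index F = 1 - 2*sigma/(high - low), cf. [26].
        F = 1.0 means all users experience the same utility."""
        sigma = float(np.std(utilities))
        return 1.0 - 2.0 * sigma / (high - low)

    # Perfectly fair vs. maximally unfair example populations.
    print(f_index(np.array([3.0, 3.0, 3.0])))       # -> 1.0
    print(f_index(np.array([1.0, 1.0, 5.0, 5.0])))  # -> 0.0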
The evaluation set out to discuss the following four subjects: i) comparison of minimum and average utility for managed and best effort scenarios, ii) advantages and disadvantages of central data-rate management for each application class, iii) predictability of application performance, and iv) fairness between the applications.

First, a scenario with 80 applications sharing a 100 Mbps link is presented. The measurements show that for the best effort case, web browsing consumes about four times more of the available throughput than the other applications. This is due to web browsers using multiple parallel TCP connections. As a consequence, the utility of the web browsing sessions is high (3.5 to 4.0), while other applications like live video streaming suffer (utility below 1.5). With management, the minimum utility is raised substantially, with the largest gains for video streaming, followed by DL and SSH. The WEB clients do not profit much from the management in terms of fairness.

No or little benefit can be expected from the management when the link is only lightly utilized, as the applications do not have to compete for resources and there is no queuing time at the bottleneck. For highly utilized links, throughput-sensitive applications cannot profit as the available resources are insufficient for all applications. In such situations some applications could be evicted from the network to provide a satisfying experience for critical applications. However, delay-sensitive applications like SSH still profit from the reduced queuing at the link.

In summary, the results show that there is a significant benefit of centrally controlled application pacing in terms of utility, inter-application fairness, and predictability. Furthermore, compared to classical Quality of Service measures in the network, the approach can be implemented with heterogeneous forwarding devices without any special features, it does not require expensive switch buffer space, and it is fully software-based.
In this paper we propose a design for resource allocation in enterprise networks based on central software-defined network control, fine-grained per-application pacing at the end-hosts, and utility functions derived from measurements and user experience models. Pacing refers to the method of restricting the amount of data an application is allowed to send into the network by implementing local back-pressure to the application sockets and introducing artificial delays between packets. Traditional methods of QoS control in the network, such as policing or scheduling, interact badly with end-host congestion control and do not scale to large numbers of applications and application classes. Moving application pacing from in-network QoS methods to the end-hosts, e.g., to user PCs, servers, smartphones, and tablets, is scalable, increases transmission efficiency, reduces the required complexity of forwarding devices, and allows cost-efficient high link utilization. To the best of our knowledge, this is the first work proposing, formulating, and evaluating a scalable architecture for resource allocation for end-user applications in enterprise environments based on real applications and user experience models.

We define application- and user-level utility using selected user experience models from the literature. Based on the models, we derive per-application utility models for the five common network use cases web browsing, file download, remote terminal work, video streaming, and Voice-over-IP.
ACKNOWLEDGMENTS
This work has been supported, in part, by the German Research Foundation (DFG) under the grant numbers KE1863/6-1, ZI1334/2-1, and TR257/43-1, and in part by the European Union's Horizon 2020 research and innovation program (grant agreement No 647158 - FlexNets). This work reflects only the authors' view and the funding agency is not responsible for any use that may be made of the information it contains.
REFERENCES
[1] Abdeljaouad, I., Rachidi, H., Fernandes, S., and Karmouch, A. Performance analysis of modern TCP variants: A comparison of Cubic, Compound and New Reno. In (2010), IEEE, pp. 80–83.
[2] Aggarwal, A., Savage, S., and Anderson, T. Understanding the performance of TCP pacing. In Proc. of IEEE INFOCOM (2000), pp. 1157–1165.
[3] Botta, A., Dainotti, A., and Pescapè, A. A tool for the generation of realistic network workload for emerging networking scenarios. Elsevier Computer Networks 56, 15 (2012), 3531–3547.
[4] Cai, Y., Hanay, Y. S., and Wolf, T. A Study of the Impact of Network Traffic Pacing from Network and End-User Perspectives. In Proc. of IEEE International Conference on Computer Communications and Networks (ICCCN) (2011).
[5] Cai, Y., Jiang, B., Wolf, T., and Gong, W. A Practical On-line Pacing Scheme at Edges of Small Buffer Networks. In Proc. of IEEE INFOCOM (2010).
[6] Cardwell, N., Cheng, Y., Gunn, C. S., Yeganeh, S. H., and Jacobson, V. BBR: Congestion-based congestion control. ACM Queue 14, 5 (2016), 50.
[7] Casas, P., Seufert, M., Egger, S., and Schatz, R. Quality of experience in remote virtual desktop services. In Proc. of IFIP/IEEE International Symposium on Integrated Network Management (IM) (2013).
[8] Cheng, Y., and Cardwell, N. Making Linux TCP Fast. In Netdev Conference (2016).
[9] De Cicco, L., Caldaralo, V., Palmisano, V., and Mascolo, S. TAPAS: a tool for rapid prototyping of adaptive streaming algorithms. In Proc. of ACM Workshop on Design, Quality and Deployment of Adaptive Video Streaming (2014).
[10] Egger, S., Reichl, P., Hossfeld, T., and Schatz, R. Time is bandwidth? Narrowing the gap between subjective time perception and Quality of Experience. In Proc. of IEEE International Conference on Communications (ICC) (2012), pp. 1325–1330.
[11] Fei, Z., Xing, C., and Li, N. QoE-driven resource allocation for mobile IP services in wireless network. Springer Science China Information Sciences 58, 1 (2015), 1–10.
[12] Ferguson, A. D., Guha, A., Liang, C., Fonseca, R., and Krishnamurthi, S. Participatory networking: an API for application control of SDNs. In Proc. of ACM SIGCOMM (2013), vol. 43, pp. 327–338.
[13] Ferguson, A. D., Guha, A., Place, J., Fonseca, R., and Krishnamurthi, S. Participatory Networking. In USENIX Workshop on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services (Hot-ICE) (2012).
[14] Flach, T., Papageorge, P., Terzis, A., Pedrosa, L., Cheng, Y., Karim, T., Katz-Bassett, E., and Govindan, R. An Internet-Wide Analysis of Traffic Policing. In Proc. of ACM SIGCOMM (2016), ACM, pp. 468–482.
[15] Floyd, S. TCP and explicit congestion notification. ACM SIGCOMM Computer Communication Review 24, 5 (1994), 8–23.
[16] Floyd, S., and Jacobson, V. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking 1, 4 (1993), 397–413.
[17] Floyd, S., and Jacobson, V. Link-sharing and resource management models for packet networks. IEEE/ACM Transactions on Networking 3, 4 (1995), 365–386.
[18] Georgopoulos, P., Elkhatib, Y., Broadbent, M., Mu, M., and Race, N. Towards network-wide QoE fairness using openflow-assisted adaptive video streaming. In Proc. of ACM SIGCOMM Workshop on Future Human-centric Multimedia Networking (FhMN) (2013), pp. 15–20.
[19] Gharakheili, H. H., Vishwanath, A., and Sivaraman, V. Comparing edge and host traffic pacing in small buffer networks. Elsevier Computer Networks 77, C (2015), 103–116.
[20] Ghobadi, M., and Ganjali, Y. TCP pacing in data center networks. In IEEE 21st Annual Symposium on High-Performance Interconnects (2013), IEEE, pp. 25–32.
[21] Gómez, G., Lorca, J., García, R., and Pérez, Q. Towards a QoE-driven resource control in LTE and LTE-A networks. Journal of Computer Networks and Communications, Article ID 505910 (2013).
[22] Hassas Yeganeh, S., and Ganjali, Y. Kandoo: a framework for efficient and scalable offloading of control applications. In Proceedings of the first workshop on Hot topics in software defined networks (2012), ACM, pp. 19–24.
[23] Hock, M., Bless, R., and Zitterbart, M. Experimental evaluation of BBR congestion control. In (2017), IEEE.
[24] Hossfeld, T., Seufert, M., Sieber, C., and Zinner, T. Assessing effect sizes of influence factors towards a QoE model for HTTP adaptive streaming. In Proc. of Sixth International Workshop on Quality of Multimedia Experience (QoMEX) (2014), pp. 111–116.
[25] Hossfeld, T., Seufert, M., Sieber, C., Zinner, T., and Tran-Gia, P. Identifying QoE optimal adaptation of HTTP adaptive streaming based on subjective studies. Elsevier Computer Networks 81 (2015), 320–332.
[26] Hossfeld, T., Skorin-Kapov, L., Heegaard, P. E., and Varela, M. Definition of QoE fairness in shared systems. IEEE Communications Letters 21, 1 (2017), 184–187.
[27] Hu, T. C. Multi-commodity network flows. Operations Research 11, 3 (1963), 344–360.
[28] Kumar, A., Jain, S., Naik, U., Raghuraman, A., Kasinadhuni, N., Zermeno, E. C., Gunn, C. S., Ai, J., Carlin, B., Amarandei-Stavila, M., and others. BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing. In Proc. of ACM SIGCOMM (2015), vol. 45.
[29] Li, Z., Zhu, X., Gahm, J., Pan, R., Hu, H., Begen, A. C., and Oran, D. Probe and Adapt: Rate Adaptation for HTTP Video Streaming At Scale. IEEE Journal on Selected Areas in Communications 32, 4 (2014), 719–733.
[30] Liu, F., Xiang, W., Zhang, Y., Zheng, K., and Zhao, H. A Novel QoE-Based Carrier Scheduling Scheme in LTE-Advanced Networks with Multi-Service. In Proc. of IEEE Vehicular Technology Conference (VTC Fall) (2012), IEEE.
[31] Lukaseder, T., Bradatsch, L., Erb, B., Van Der Heijden, R. W., and Kargl, F. A Comparison of TCP Congestion Control Algorithms in 10G Networks. In Proc. of IEEE Local Computer Networks (LCN) (2016), pp. 706–714.
[32] Mirchev, A. Survey of Concepts for QoS improvements via SDN. Future Internet (FI) and Innovative Internet Technologies and Mobile Communications (IITM) 33 (2015), 1.
[33] Mittal, R., Dukkipati, N., Blem, E., Wassel, H., Ghobadi, M., Vahdat, A., Wang, Y., Wetherall, D., Zats, D., and others. TIMELY: RTT-based Congestion Control for the Datacenter. In Proc. of ACM SIGCOMM (2015), vol. 45, ACM, pp. 537–550.
[34] Nichols, K., and Jacobson, V. Controlling queue delay. Communications of the ACM 55, 7 (2012), 42–50.
[35] Ryu, S., Rump, C., and Qiao, C. Advances in Active Queue Management (AQM) Based TCP Congestion Control. Springer Telecommunication Systems 25, 3-4 (2004), 317–351.
[36] Sacchi, C., Granelli, F., and Schlegel, C. A QoE-oriented strategy for OFDMA radio resource allocation based on min-MOS maximization. IEEE Communications Letters 15, 5 (2011), 494–496.
[37] Saeed, A., Dukkipati, N., Valancius, V., Contavalli, C., Vahdat, A., and others. Carousel: Scalable Traffic Shaping at End Hosts. In Proc. of ACM SIGCOMM (2017), ACM, pp. 404–417.
[38] Salles, R. M., and Barria, J. A. Fair and efficient dynamic bandwidth allocation for multi-application networks. Computer Networks 49, 6 (2005), 856–877.
[39] Sun, L., and Ifeachor, E. C. Voice quality prediction models and their application in VoIP networks. IEEE Trans. on Multimedia 8, 4 (2006), 809–820.
[40] Tang, P., Wang, P., Wang, N., and Ngoc, V. N. QoE-Based Resource Allocation Algorithm for Multi-Applications in Downlink LTE Systems. In Proc. of International Conference on Computer, Communications and Information Technology (CCIT) (2014), Atlantis Press, pp. 1011–1016.
[41] Tootoonchian, A., and Ganjali, Y. HyperFlow: A distributed control plane for OpenFlow. In Proceedings of the 2010 internet network management conference on Research on enterprise networking (2010), vol. 3.
[42] Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612.
[43] Wei, D., Cao, P., Low, S., and EAS, C. TCP Pacing Revisited. In Proc. of IEEE INFOCOM (2006).
[44] Wurtzler, M. Analysis and simulation of weighted random early detection (WRED) queues.
A ALLOCATION PROBLEM FORMULATION
Next we give the complete description of the resource allocation problem formulated as a MILP. The MILP has to consider the two-dimensional utility function of every application, the capacities of all links, and the delay on intermediate links depending on the link utilization. The decision variables describe which pacing rate to apply to which application and how to configure the routing between application endpoints. The problem can be summarized with the following inputs, objectives, high-level constraints, and outputs.
Inputs: (I) Number of applications. (II) Utility function $U$ of each application. (III) Network topology with link capacity information and delay on the links based on link utilization.
Objectives: (I) Max-min utility fairness in the first step. (II) Increasing average utility in the second step.
Constraints:
Unidirectional application routing (source to destination) has to be valid, considering link capacity and maximum delay per application.
Outputs: (I) Target utility value and allocated throughput per application. (II) Application flow routing.
A.1 Notation
Table 5 summarizes the notation. $\mathcal{A}$, $a \in \mathcal{A}$, is the set of all unidirectional application flows $a$. For simplification, application flow $a$ and intent $i$ are merged in the notation to only $a$, and each application consists of only one application flow. The two directions of a bidirectional application flow are considered as two independent applications by the formulation. This allows different paths and utility functions for both flow directions. We define the topology as a directed graph $G(V, E)$ with nodes $v \in V$, edges $(u, v) \in E$, and edge capacity $C_{u,v}$. A flow $a$ is defined by its source node $S_a$, target node $T_a$, and its utility function $U$.
The utility function describes the relationship between allocated throughput and delay and the application's resulting utility (Fig. 6). It can be determined, for example, through measurements and user experience models, as we do in the paper at hand in Section 4. Mathematically, the utility function is split into its three components: the throughput demands ($\tau \in \mathbb{R}_+^{|\mathcal{A}| \times n}$), the delay demands ($\delta \in \mathbb{R}_+^{|\mathcal{A}| \times n}$), and the utility values ($U \in [1, 5]^{|\mathcal{A}| \times |\tau| \times |\delta|}$), where $n$ denotes the quantization bin. $F$ describes the application flow routing: an edge $(u, v)$ is traversed by an application $a$ if $F_{a,u,v}$ equals 1. Delay on a link is described as a function of the link usage: $\psi$ and $\Psi$ (both $\in \mathbb{R}_+^{|E| \times |E| \times m}$) define the piece-wise relationship between usage ($\psi$) and resulting delay ($\Psi$) for each edge in the graph $G$ and quantization bin $m$.
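To make the notation concrete, the following sketch shows one possible way to hold the inputs of Table 5 in Python for the two-application setup of Appendix B. The container layout and the placeholder utility values are our illustration, not part of the formulation.

```python
import numpy as np

# Illustrative container for the MILP inputs of Table 5 (names are ours).
# Two applications share one bidirectional link between two nodes.
inputs = {
    "V": [0, 1],                                  # nodes
    "E": [(0, 1), (1, 0)],                        # directed edges
    "C": {(0, 1): 100_000, (1, 0): 100_000},      # capacity [Kbps]
    "S": [0, 0],                                  # source node per application
    "T": [1, 1],                                  # target node per application
    # quantized utility functions per application (n = 3 bins here):
    "tau":   np.array([[100, 500, 1000],          # throughput demands [Kbps]
                       [100, 500, 1000]]),
    "delta": np.array([[150, 100, 50],            # delay demands [ms]
                       [150, 100, 50]]),
    # U[a, i, j]: utility when throughput bin i and delay bin j are selected;
    # placeholder values, the real values come from the models of Section 4:
    "U": np.random.uniform(1.0, 5.0, size=(2, 3, 3)),
    # piece-wise link delay: usage breakpoints [Kbps] -> delay values [ms]
    "psi": {(0, 1): [0, 100_000], (1, 0): [0, 100_000]},
    "Psi": {(0, 1): [2, 2],       (1, 0): [2, 2]},
}
```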
A.2 Objective
The objective of the MILP is, in the first step, to maximize the minimal utility value $\theta^{(\min)}$ over all applications. In the second step, the MILP maximizes the sum of all utility values while the minimum utility is restricted to a range based on the minimum value determined by the first step, denoted as $\theta^{(\min,1)}$: $\theta^{(\min)} \in [\theta^{(\min,1)} - \epsilon, \theta^{(\min,1)}]$ with $\epsilon = 0.3$. The second step allows the problem formulation to improve the average utility over all applications by relaxing the max-min fairness constraint using the slack parameter $\epsilon$. This prevents solutions where the optimization would stop once the utility of a single application cannot be increased further, even though plenty of resources are left to increase the utility of other applications.
Table 5. Notation of the Allocation Problem Formulation
Symbol | Type | Unit | Description
Constants
$G(V, E)$ | | | Network topology graph with nodes $V$ and edges $(u, v) \in E$.
$\mathcal{A}, a \in \mathcal{A}$ | | | Set of all unidirectional application flows.
$S, T$ | $\in V^{|\mathcal{A}|}$ | | Start and target nodes of application flows.
$\psi, \Psi$ | $\in \mathbb{R}_+^{|E| \times |E| \times m}$ | | Translation between link usage and delay for a specific link.
$C$ | $\in \mathbb{R}_+^{|V| \times |E|}$ | Kbps | Unidirectional link capacity between $u$ and $v$.
$\tau$ | $\in \mathbb{R}_+^{|\mathcal{A}| \times n}$ | Kbps | Utility functions' throughput demands of the applications.
$\delta$ | $\in \mathbb{R}_+^{|\mathcal{A}| \times n}$ | ms | Utility functions' delay demands of the applications.
$U$ | $\in [1, 5]^{|\mathcal{A}| \times |\tau| \times |\delta|}$ | | Utility functions' utility values of the applications.
Decision Variables
$\theta^{(\min)}$ | $\in [1, 5]$ | | Minimum utility over all applications.
$\mathrm{T}$ | $\in \{0, 1\}^{|\mathcal{A}| \times |\tau|}$ | | Throughput demand selection per application (one-hot).
$\Delta$ | $\in \{0, 1\}^{|\mathcal{A}| \times |\delta|}$ | | Delay demand selection per application (one-hot).
$F$ | $\in \{0, 1\}^{|\mathcal{A}| \times |E| \times |E|}$ | | Application flow routing.
$\eta(a)$ | $\mathcal{A} \mapsto \mathbb{R}_+$ | Kbps | Selected throughput for application $a$.
$D(a)$ | $\mathcal{A} \mapsto \mathbb{R}_+$ | ms | Selected delay requirement for application $a$.
$\Lambda(a)$ | $\mathcal{A} \mapsto [1, 5]$ | | Target utility value of application $a$.
$\Omega(u, v)$ | $E \mapsto \mathbb{R}_+$ | Kbps | Assigned throughput to link $(u, v)$ in Kbps.
$\omega(u, v)$ | $E \mapsto \mathbb{R}_+$ | ms | Delay on link $(u, v)$ in milliseconds.
$\Upsilon(a)$ | $\mathcal{A} \mapsto \mathbb{R}_+$ | ms | End-to-end delay of application $a$ in milliseconds.
Miscellaneous
$\theta^{(\min,\{1|2\})}$ | $\in [1, 5]$ | | Solution of $\theta^{(\min)}$ in the first and second step.
$\epsilon = 0.3$ | $\in \mathbb{R}_+$ | | Slack parameter for $\theta^{(\min)}$ in the second step.
$n, m$ | $\in \mathbb{N}$ | | Quantization factors for the utility and link delay functions.

We define $\theta_a$ as the utility value of an application $a$. In the first step we maximize the minimum utility value (max-min fairness), subject to all application utilities being at least the minimum utility value $\theta^{(\min)}$:

maximize: $\theta^{(\min)}$ (1)
subject to: $\Lambda(a) \geq \theta^{(\min)} \quad \forall a \in \mathcal{A}$ (2)
and (7)-(21) (3)

We denote the optimal value of $\theta^{(\min)}$ of the first step as $\theta^{(\min,1)}$. In the second step we relax the max-min constraint by $\epsilon$ and maximize the sum of all utility values. We denote the optimal value of $\theta^{(\min)}$ of the second step as $\theta^{(\min,2)}$ and add the additional constraint to bound $\theta^{(\min)}$ from below by $\theta^{(\min,1)} - \epsilon$:

maximize: $\sum_{a \in \mathcal{A}} \Lambda(a)$ (4)
subject to: $\theta^{(\min)} \geq \theta^{(\min,1)} - \epsilon$ (5)
and (7)-(21) (6)

For the remainder of this formulation, and if not otherwise stated, $\theta^{(\min)}$ denotes the optimal value as determined by the second step ($\theta^{(\min,2)}$). Next we formulate the constraints. Table 6 summarizes the constraints.
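The two-step procedure maps directly to two consecutive solver runs on the same model. The following is a minimal sketch with gurobipy (our evaluation uses the Gurobi solver); it assumes a model built elsewhere with a variable `theta_min`, per-application utility variables `Lam`, and constraints (7)-(21) already added. The function signature and names are illustrative, not the paper's implementation.

```python
import gurobipy as gp
from gurobipy import GRB

EPS = 0.3  # slack parameter epsilon of Eq. 5

def solve_two_step(model, theta_min, Lam):
    # Step 1: max-min fairness, maximize the minimum utility (Eqs. 1-3).
    for lam in Lam.values():
        model.addConstr(lam >= theta_min)              # Eq. 2
    model.setObjective(theta_min, GRB.MAXIMIZE)        # Eq. 1
    model.optimize()
    theta_min_1 = theta_min.X                          # optimum of step 1

    # Step 2: relax the bound by EPS, maximize the utility sum (Eqs. 4-6).
    model.addConstr(theta_min >= theta_min_1 - EPS)    # Eq. 5
    model.setObjective(gp.quicksum(Lam.values()), GRB.MAXIMIZE)  # Eq. 4
    model.optimize()
    return theta_min.X, {a: lam.X for a, lam in Lam.items()}
```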
A.3 Utility Selection Constraints
For each application, one throughput, delay, and target utility value have to be selected. We first introduce the equations and afterwards illustrate the selection process by a simplified example. Eq. 7 and Eq. 8 dictate that only one throughput and delay demand for application $a$ can be chosen at a time:

$\sum_{i=1}^{|\mathrm{T}_a|} \mathrm{T}_{a,i} = 1 \quad \forall a \in \mathcal{A}$ (7)
$\sum_{i=1}^{|\Delta_a|} \Delta_{a,i} = 1 \quad \forall a \in \mathcal{A}$ (8)

Hence the chosen throughput demand in Kbps, $\eta(a)$, and delay requirement in milliseconds, $D(a)$, for application $a$ are given by the following element-wise multiplications:

$\eta(a) := \mathrm{T}_a^{\mathsf{T}} \cdot \tau_a$ (9)
$D(a) := \Delta_a^{\mathsf{T}} \cdot \delta_a$ (10)

The resulting utility value of application $a$, $\Lambda(a)$, is then selected from the quantized utility functions (Fig. 6) by the following equation:

Table 6. Overview of All Constraints
Type | Constraints | Description
Objectives | (2), (5) | Maximize minimum utility (first step) and sum of utilities (second step).
Utility | (7)-(11) | Select target utility, throughput allocation, and maximum allowed delay per application.
Routing | (12)-(14) | Application routing (multi-commodity flow problem).
Capacity | (15)-(16) | Link capacity (in Kbps) cannot be exceeded by applications.
Delay | (17)-(21) | Determine delay per link (in milliseconds) depending on link usage. Ensure applications' maximum delay demand is not exceeded.
$\Lambda(a) := \sum_{tp=1}^{|\mathrm{T}|} \sum_{d=1}^{|\Delta|} \mathrm{T}_{a,tp} \cdot \Delta_{a,d} \cdot U_{a,tp,d}$ (11)

Next we give an example of the target utility, throughput, and delay demand calculations for an arbitrary application $a$. The discretized utility function $U_a$ has a domain of [100, 500, 1000] Kbps for the throughput and [150, 100, 50] milliseconds for the delay demand. At an allocation of 1000 Kbps and 50 ms the utility of the application reaches its highest point with 4.9, while for 100 Kbps and 150 ms the target utility drops to 1.3. In the following example the decision variables $\mathrm{T}_{a,2}$ and $\Delta_{a,2}$ are set to 1 by the solver based on other constraints like the available link capacity. Hence, an allocation of $\eta(a) = 500$ Kbps is chosen with a target utility of $\Lambda(a) = U_{a,2,2}$:

$\Lambda(a) = \begin{pmatrix} \mathrm{T}_{a,1} & \mathrm{T}_{a,2} & \mathrm{T}_{a,3} \end{pmatrix} \cdot \begin{pmatrix} U_{a,1,1} & U_{a,1,2} & U_{a,1,3} \\ U_{a,2,1} & U_{a,2,2} & U_{a,2,3} \\ U_{a,3,1} & U_{a,3,2} & U_{a,3,3} \end{pmatrix} \cdot \begin{pmatrix} \Delta_{a,1} \\ \Delta_{a,2} \\ \Delta_{a,3} \end{pmatrix} = U_{a,2,2}$

$\eta(a) = \begin{pmatrix} 100 & 500 & 1000 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = 500 \text{ Kbps}$

$D(a) = \begin{pmatrix} 150 & 100 & 50 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = 100 \text{ ms}$
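The same selection can be checked numerically. The following numpy sketch reproduces the example; the interior entries of $U$ are placeholders (only the corner values 1.3 and 4.9 are given above), so the printed $\Lambda(a)$ is illustrative.

```python
import numpy as np

tau   = np.array([100, 500, 1000])   # throughput bins [Kbps]
delta = np.array([150, 100, 50])     # delay bins [ms]
# utility values U[tp, d]; interior entries are placeholders, only the
# corners 1.3 and 4.9 are from the text:
U = np.array([[1.3, 2.0, 2.4],
              [2.2, 3.0, 3.6],
              [2.8, 3.9, 4.9]])
T_sel = np.array([0, 1, 0])          # one-hot throughput selection (Eq. 7)
D_sel = np.array([0, 1, 0])          # one-hot delay selection (Eq. 8)

eta = T_sel @ tau        # Eq. 9  -> 500 (Kbps)
D   = D_sel @ delta      # Eq. 10 -> 100 (ms)
Lam = T_sel @ U @ D_sel  # Eq. 11 -> U[1, 1] = 3.0 (placeholder value)
print(eta, D, Lam)
```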
A.4 Routing Constraints
We formulate the application flow routing problem as the multi-commodity flow problem [27] with non-fractional flows. First we formulate the constraints required to route the flow from source to destination. Afterwards we formulate the link capacity and application delay constraints. A flow is subject to the following routing constraints. The number of incoming and outgoing edges of in-between nodes has to be equal (flow conservation):

$\sum_{w \in V} F_{a,u,w} = \sum_{w \in V} F_{a,w,u} \;\; | \; u \neq T_a, S_a \quad \forall a \in \mathcal{A}, \forall u \in V$ (12)

Flow conservation at the source (Eq. 13) and destination (Eq. 14):

$\sum_{w \in V} F_{a,S_a,w} - \sum_{w \in V} F_{a,w,S_a} = 1 \quad \forall a \in \mathcal{A}$ (13)
$\sum_{w \in V} F_{a,w,T_a} - \sum_{w \in V} F_{a,T_a,w} = 1 \quad \forall a \in \mathcal{A}$ (14)
A.5 Capacity Constraints
Capacity constraints ensure that the assigned throughput to a link does not exceed the capacity of the link. We define the link usage in Kbps, $\Omega(u, v)$, on the directed edge $(u, v)$ as the sum of the throughput values of all applications traversing that edge/link:

$\Omega(u, v) := \sum_{a \in \mathcal{A}} F_{a,u,v} \cdot \eta(a)$ (15)

The assigned throughput cannot exceed the capacity:

$\Omega(u, v) \leq C_{u,v} \quad \forall (u, v) \in E$ (16)
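Constraints (12)-(16) translate almost one-to-one into solver code. The following gurobipy sketch is our illustration, not the paper's implementation: it treats $\eta(a)$ as a constant for readability, whereas in the full MILP $\eta(a)$ is itself a decision variable and the product $F \cdot \eta$ needs the usual linearization via the one-hot selection variables.

```python
import gurobipy as gp
from gurobipy import GRB

def add_routing_and_capacity(m, A, V, E, S, T, C, eta):
    # F[a, u, v] = 1 iff application a traverses directed edge (u, v).
    F = m.addVars([(a, u, v) for a in A for (u, v) in E],
                  vtype=GRB.BINARY, name="F")
    for a in A:
        for u in V:
            out_f = gp.quicksum(F[a, u, w] for (x, w) in E if x == u)
            in_f  = gp.quicksum(F[a, w, u] for (w, x) in E if x == u)
            if u == S[a]:
                m.addConstr(out_f - in_f == 1)   # Eq. 13 (source)
            elif u == T[a]:
                m.addConstr(in_f - out_f == 1)   # Eq. 14 (destination)
            else:
                m.addConstr(out_f == in_f)       # Eq. 12 (conservation)
    # Eqs. 15-16: the summed rates routed over a link must fit its capacity.
    for (u, v) in E:
        m.addConstr(gp.quicksum(F[a, u, v] * eta[a] for a in A) <= C[(u, v)])
    return F
```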
A.6 Delay Constraints
We define the delay of each link as a function of the link usage. That way, the delay function can express a combination of constant components, e.g., propagation delay, and dynamic components, e.g., queuing and processing delay. For example, an added constant delay can describe significant propagation delay, or the queuing delay can be modeled based on the target link utilization. We first provide the necessary equations and then a simple example.

We use piece-wise linear interpolation to approximate the link delay for edge $(u, v)$, denoted as $\omega(u, v)$, for a given link usage $\Omega(u, v)$ of the edge. $\psi_{u,v,i}$ and $\Psi_{u,v,j}$ describe the piece-wise defined translation sets between a usage in Kbps with index $i$ and a delay in milliseconds with index $j$ for a link $(u, v)$, with $|\psi_{u,v}| = |\Psi_{u,v}|$. We introduce the variables $l_{u,v,p} \in \{0, 1\}$ and $S_{u,v,p} \in [0, 1]$ for $p = \{1, 2, .., |\psi_{u,v}| - 1\}$. Variable $l$ selects the closest lower link usage from $\psi$, and $S$ is the linear scaling factor. $l$ and $S$ are subject to:

$S_{u,v,p} \leq l_{u,v,p} \quad \forall (u, v) \in E, \; p = \{1, 2, .., |\psi_{u,v}| - 1\}$ (17)

The selection variable $l_{u,v,p}$ and scale variable $S_{u,v,p}$ are constrained according to the link usage $\Omega(u, v)$:

$\Omega(u, v) - \sum_{p=1}^{|\psi_{u,v}|-1} \left[ l_{u,v,p} \cdot \psi_{u,v,p} + (\psi_{u,v,p+1} - \psi_{u,v,p}) \cdot S_{u,v,p} \right] = 0 \quad \forall (u, v) \in E$ (18)

$\omega(u, v)$ then defines the delay for the given link usage:

$\omega(u, v) := \sum_{p=1}^{|\psi_{u,v}|-1} \left[ l_{u,v,p} \cdot \Psi_{u,v,p} + (\Psi_{u,v,p+1} - \Psi_{u,v,p}) \cdot S_{u,v,p} \right]$ (19)

Let us consider the following simple example. A hypothetical link $(u, v)$ has a maximum capacity of 1000 Kbps and a propagation delay of 10 ms. Up to a link usage of 100 Kbps there is no queuing delay. Between 100 Kbps and 1000 Kbps the queuing delay increases linearly up to a maximum of 70 ms. Hence, at a link usage of 1000 Kbps the delay on the link is 70 ms + 10 ms = 80 ms. We can model this by setting $\psi$ and $\Psi$ as follows:

$\psi_{u,v} = (\psi_{u,v,1}, \psi_{u,v,2}, \psi_{u,v,3}) = (0, 100, 1000)$ Kbps
$\Psi_{u,v} = (\Psi_{u,v,1}, \Psi_{u,v,2}, \Psi_{u,v,3}) = (10, 10, 80)$ ms

Let us assume the decision variables assign link $(u, v)$ a total link usage of 500 Kbps. The resulting total delay on that link can then be calculated by first determining $l$ and $S$:

$\Omega(u, v) - \sum_{p=1}^{|\psi_{u,v}|-1} \left[ l_{u,v,p} \cdot \psi_{u,v,p} + (\psi_{u,v,p+1} - \psi_{u,v,p}) \cdot S_{u,v,p} \right] = 0$
$\leftrightarrow \; 500 - \left( \left[ l_{u,v,1} \cdot 0 + (100 - 0) \cdot S_{u,v,1} \right] + \left[ l_{u,v,2} \cdot 100 + (1000 - 100) \cdot S_{u,v,2} \right] \right) = 0$

This yields $l_{u,v} = [0, 1]$ and $S_{u,v} = [0, 0.4\overline{4}]$. The delay on the link is then calculated as follows:

$\omega(u, v) = \sum_{p=1}^{|\psi_{u,v}|-1} \left[ l_{u,v,p} \cdot \Psi_{u,v,p} + (\Psi_{u,v,p+1} - \Psi_{u,v,p}) \cdot S_{u,v,p} \right] = \left[ 0 \cdot 10 + (10 - 10) \cdot 0 \right] + \left[ 1 \cdot 10 + (80 - 10) \cdot 0.4\overline{4} \right] \approx 41$ ms

The end-to-end delay of an application is then the sum of delays on the links traversed by the application. We denote the end-to-end delay of application $a$ with $\Upsilon(a)$:

$\Upsilon(a) := \sum_{(u,v) \in E} \omega(u, v) \cdot F_{a,u,v}$ (20)

Finally, the delay of the flow is not allowed to exceed the requirement:

$\Upsilon(a) \leq D(a) \quad \forall a \in \mathcal{A}$ (21)
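Outside the MILP, the piece-wise mapping of Eqs. (17)-(19) is plain linear interpolation. The following Python sketch recomputes the example above; the function name and structure are ours.

```python
def link_delay(usage, psi, Psi):
    # Piece-wise linear interpolation of Eqs. (17)-(19): find the segment p
    # with psi[p] <= usage <= psi[p+1] (the role of l) and scale linearly
    # within it (the role of S).
    for p in range(len(psi) - 1):
        if psi[p] <= usage <= psi[p + 1]:
            S = (usage - psi[p]) / (psi[p + 1] - psi[p])  # scaling factor
            return Psi[p] + (Psi[p + 1] - Psi[p]) * S
    raise ValueError("usage exceeds the last breakpoint")

# Example from the text: 10 ms propagation delay; queuing delay grows
# linearly from 100 Kbps to 1000 Kbps by up to 70 ms.
psi = [0, 100, 1000]             # Kbps
Psi = [10, 10, 80]               # ms
print(link_delay(500, psi, Psi)) # -> 41.1 ms (l = [0, 1], S = [0, 0.444..])
```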
A.7 Problem Complexity and Possible Solving Strategies
The optimization formulation combines variations of the non-splittable multi-commodity flow problem (routing) [27] and of the knapsack problem (balancing demand and utility), both known to be NP-hard. Hence, approximation algorithms have to be found to solve the formulation in a reasonable runtime for larger topologies with potentially multiple bottleneck links and a large number of simultaneous applications. The efficient and fast solving of the problem is out of the scope of this work and is left to future work. This work provides the necessary abstractions and an implementation proof that, once the allocation decision is made, it can be efficiently and accurately implemented in the network. As with other network resource allocation problems, such as the virtual network embedding (VNE) problem, the efficient solving of the theoretical problem can now be explored independently of the implementation concepts.
Solving the problem for our evaluation scenario (one bottleneck link, ≤ 120 applications) takes on average less than one minute on a standard eight-core Intel Core i7-4770 3.4 GHz desktop PC with 32 GB RAM using the commercial Gurobi solver. In detail, Figures 13(a) to 13(c) illustrate the solving time, the total number of variables, and the total number of constraints of the problem instances with increasing number of applications and one bottleneck link. The solving time stays below 10 s up to approximately 50 applications. Above 90 applications the solving time increases drastically, up to 66.2 s. Afterwards, when a high number of applications does not leave much room for allocating higher utility values, the solving time decreases again. Figures 13(b) and 13(c) show that the number of variables increases linearly with the number of applications, with 2889 variables and 86 constraints for each additional application. Thus, the total number of variables and constraints depends on the number of applications, on the used quantization of the utility and link delay functions, and on the size of the network topology.
Fig. 13. Problem size and solving time of the optimization formulation for increasing number of applications ($|\mathcal{A}|$) sharing one bottleneck link. Maximum of 66.2 s solving time for 110 applications. 2896 variables and 86 constraints for each additional application. (Panels: (a) Solving Time [s]; (b) Variables; (c) Constraints.)
One greedy algorithm for finding a viable solution could start with a target utility of 1.0 for all application flows and shortest-path routing. Subsequently, the utility is increased in increments of 0.1 in a round-robin order until an allocation is reached where no application's utility can be increased further without violating capacity or delay constraints; a sketch of this procedure is given below. One problem with this algorithm is that it does not find sophisticated solutions where the utilization of one path is kept low to support low-volume, low-delay applications, e.g., web browsing, while other paths are dedicated to batch transfers, e.g., file downloads.
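A minimal sketch of this greedy procedure, assuming a hypothetical feasibility oracle `feasible` that checks constraints (12)-(21) for a given assignment of target utilities:

```python
def greedy_allocate(apps, feasible, step=0.1, u_min=1.0, u_max=5.0):
    # Round-robin utility increments: start every flow at the minimum
    # utility and raise utilities by `step` while the capacity/delay
    # check still passes. `feasible(target)` is a hypothetical oracle
    # verifying that routing, capacity, and delay constraints hold.
    target = {a: u_min for a in apps}
    progress = True
    while progress:
        progress = False
        for a in apps:
            trial = dict(target)
            trial[a] = min(trial[a] + step, u_max)
            if trial[a] > target[a] and feasible(trial):
                target = trial
                progress = True
    return target
```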
Even with sophisticated approximations, there may be a delay between a change to the global state, e.g., a new application, and the availability of a new allocation. Applications may have to wait before they can join the network, lower-priority applications may have to be disconnected, or some throughput has to be reserved for yet unknown applications. This reserved capacity can then be allocated to new applications without requiring the solver to recalculate.

B EXPERIMENT VALUE SETUP
In this section we describe the value setup used in the experiments in this paper for a two-application setup. For an increasing number of applications, all variables with a dependency on $\mathcal{A}$ grow in size along their first dimension. Table 7 summarizes the problem input variables. In the experiments we have a network topology $G$ with one bidirectional link ($E = [(0, 1), (1, 0)]$) between two nodes ($V := [0, 1]$). The link is shared by two application flows ($\mathcal{A} := [0, 1]$), which both send data from node 0 to node 1 ($S := [0, 0]$, $T := [1, 1]$). The link has a capacity of 100 Mbps in both flow directions ($C := [100\,\text{Mbps}, 100\,\text{Mbps}]$). The link delay is modeled as constant with 2 ms for our managed scenarios, where the combined paced throughput does not exceed the link capacity ($\psi := [0, 100\,\text{Mbps}]$, $\Psi := [2\,\text{ms}, 2\,\text{ms}]$). $\tau$, $\delta$, and $U$ describe the quantized utility functions from Fig. 6. An example for the quantization of the utility functions can be found in Section A.3.

Table 7. Experiment Value Setup for Two Applications