An Adaptive Flow-Aware Packet Scheduling Algorithm for Multipath Tunnelling
Richard Sailer, Jörg Hähner
Organic Computing Group, University of Augsburg, Augsburg, Germany
Email: [email protected], [email protected]
Abstract—This paper proposes AFMT, a packet scheduling algorithm to achieve adaptive flow-aware multipath tunnelling. AFMT has two unique properties. Firstly, it implements robust adaptive traffic splitting for the subtunnels. Secondly, it detects and schedules bursts of packets cohesively, a scheme that has already enabled traffic splitting for load balancing with little to no packet reordering. Several NS-3 experiments over different network topologies show that AFMT successfully deals with changing path characteristics due to background traffic while increasing throughput and reliability.
I. INTRODUCTION
Network paths can be unreliable and slow [12], [5]. Redundancy is an obvious solution for this issue. Redundancy has been used to provide reliability and higher throughput in many fields of computer engineering, e.g. databases, storage, and power supply, but seldom for network paths. There are several proposed concepts for redundancy: Multipath TCP [6] and Multipath Tunnelling [3].

Multipath TCP (MPTCP) grants more reliability, but needs every client and every server to have direct full access to all network uplinks (figure 1) [6]. This complicates wiring. Additionally, it needs all clients to implement MPTCP, a complex protocol. It also does not solve the problem for UDP flows. So while MPTCP performs better than load balancing, it still leaves a lot of issues unaddressed.

Multipath Tunnelling (MT) addresses these issues. As visible in figure 2, only the tunnel endpoints T_entry and T_exit need to see and understand the subtunnels. The clients and servers don't know they're connected by multiple paths; they need no additional wiring or implementation of a new network protocol. Since all flows between the two networks are tunnelled, this also works for UDP. Lastly, all current prototypes and their concepts are less complex than the Multipath TCP concept and its implementations [9].

Our novel contributions include: We introduce AFMT, a packet scheduling algorithm for adaptive flow-aware multipath tunnelling. AFMT aims to overcome the throughput, reordering and adaptivity issues of existing MT approaches. We evaluate AFMT for diverse networks with changing path characteristics. This shows that AFMT improves reliability and throughput compared to a classic packet scheduling scheme. The evaluation in this paper focuses on wired connections; nevertheless, AFMT research focuses on providing a general solution.

Fig. 1. Multipath TCP network topology. For every packet p sent from client C to server S, C decides (schedules) the path P_i to use.
For this, C and S need an implementation of MPTCP and direct access to all the paths.

Fig. 2. Multipath Tunnelling network topology. A packet p sent from C is encapsulated at the tunnel entry T_entry and sent via P_i to the tunnel exit T_exit. There, it is decapsulated and sent to S.

II. BACKGROUND
TCP is sensitive to packet reordering [4]: it interprets reordering as a sign of packet loss and reacts with spurious retransmits and throughput reduction. TCP uses a sliding window (congestion window, cwnd) algorithm to adjust its send rate to the path's capacity [11]. Often, TCP sends the contents of a cwnd as one burst or flowlet [7]. The set of all transmitted packets in a transport layer association is defined as a flow [12]. As illustrated in figure 3, it is possible to utilise flowlets to achieve traffic splitting with little to no packet reordering.

Multipath tunnelling is known to induce heavy packet reordering [3]. This research proposes to reduce it by using flowlet switching [7], a scheme that reduced packet reordering for a similar problem (ISP load balancing) to very little to no occurrence.

In a multipath routing context, packet scheduling refers to the task of choosing an output queue for every packet from an input queue [10]. In an MT system, one output queue maps to one subtunnel. The behaviour of the packet scheduling algorithm is central to the behaviour of the whole MT system [5].

Fig. 3. When an inter-flowlet delay δ is larger than the delay difference between two paths (MTBS), it is possible to send the two flowlets via the different paths and no packet reordering will occur [7].

III. AFMT

DCCP and TCP maintain an estimation of the path's RTT and capacity for their congestion control. Decades of research have been invested to optimise these estimators [12], [8]. AFMT is built on the assumption that using these results gives comparable or even better results than inventing its own path estimation scheme. The following subsections describe the realisation of flow awareness and adaptivity.
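To make the flowlet condition of figure 3 concrete, the following is a minimal Python sketch of the switching test. All names and the MTBS value here are assumptions for illustration, not part of [7]:

```python
# Sketch of flowlet switching [7]: a flow may change paths only when
# the gap since its last packet exceeds the maximum delay difference
# between the paths (MTBS), so later packets cannot overtake earlier ones.

MTBS = 0.020  # assumed 20 ms worst-case delay difference between paths

last_sent = {}  # flow_id -> send time of the flow's last packet

def may_switch_path(flow_id, now):
    """Return True iff switching paths cannot reorder this flow."""
    t_last = last_sent.get(flow_id)
    return t_last is None or (now - t_last) > MTBS
```

A first packet of a flow (no table entry) may always be scheduled freely; subsequent packets may only leave the current path after an inter-flowlet gap larger than MTBS.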
A. Flow-Awareness
To find the applicable subtunnels and match the flow of a packet, a central data structure that tracks all flows is necessary. Therefore AFMT uses a flow table with the flow id as key and a tuple of last subtunnel and timestamp as value. For every flow id it returns the last subtunnel used by this flow and a timestamp. This timestamp indicates the absolute time when the last packet of our flow was sent through the associated subtunnel.

With this data AFMT can determine the applicable subtunnels as shown in Algorithm 1. Relevant entities are the subtunnels s_i, the to-schedule packet p and the absolute time points t_i. Initially AFMT obtains the flow id of p (p.flow_id). It can be obtained from the operating system: every operating system that supports network address translation (NAT) needs to track flows and can (more or less directly) provide flow ids. For Linux this is possible relatively simply with the conntrack netfilter module. In most use cases AFMT targets, T_entry is the internet gateway of an organisation's local network, a device that already implements NAT. Therefore no additional overhead for flow identification and tracking is necessary.

Next we look up the flow id in the flow table (Line 6). If it exists, we assign the data to two local variables (Line 7) and calculate δ, the time that has passed since the last packet of our flow was sent.

Then the smoothed round trip time (SRTT) of a subtunnel is obtained from the transport protocol implementation. It is more resistant to fluctuations and therefore a more meaningful proposition about the path than the raw RTT [11]. Knowing s_i.SRTT, the SRTT of a subtunnel i, it is possible to predict when p will arrive at T_exit: namely in s_i.SRTT time from now, and s_i.SRTT + δ time from when p_last was sent. Comparatively, s_last.SRTT gives the arrival time of p_last from when it was sent. Therefore, if s_i.SRTT + δ is larger than s_last.SRTT, p will arrive after p_last, and we can add s_i to the list of applicable subtunnels (Lines 11-12). This is done for all subtunnels other than s_last.

After acquiring the list of applicable subtunnels, AFMT selects the best of them (s_opt) adaptivity-wise, which is explained in more detail in the next subsection (Line 15). Then the flow table is updated with the new values of s_opt and the current time t_now, and p is finally sent via s_opt (Lines 16-17).

If p.flow_id is not found in the flow table, i.e. it starts a new flow, AFMT directly calls the adaptive selection process with all available subtunnels s_1, ..., s_n to determine s_opt (Line 19). Then, as previously, we update the flow table and send p via s_opt (Lines 20-21).
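As an illustration, the scheduling step could be sketched in Python as follows. The subtunnel attributes (srtt, send()) and the select_adaptively placeholder are assumptions for this sketch; the real adaptive selection is described in the next subsection:

```python
import time

flow_table = {}  # flow_id -> (last subtunnel, timestamp of flow's last packet)

def select_adaptively(candidates, p):
    # Placeholder for the adaptive selection of Section III-B;
    # here simply the candidate with the lowest SRTT.
    return min(candidates, key=lambda s: s.srtt)

def schedule(p, subtunnels):
    """Choose a subtunnel for packet p without reordering its flow."""
    now = time.monotonic()
    entry = flow_table.get(p.flow_id)
    if entry is not None:
        s_last, t_last = entry
        delta = now - t_last  # time since the flow's last packet was sent
        # s_i is applicable if p would still arrive after the previous
        # packet: s_i.SRTT + delta > s_last.SRTT (slower paths are safe).
        applicable = [s_last] + [s for s in subtunnels
                                 if s is not s_last
                                 and s.srtt + delta > s_last.srtt]
    else:
        applicable = list(subtunnels)  # new flow: all subtunnels applicable
    s_opt = select_adaptively(applicable, p)
    flow_table[p.flow_id] = (s_opt, now)
    s_opt.send(p)
```

Note that switching to a slower subtunnel is always safe with respect to reordering; only a faster subtunnel can let p overtake p_last, which is exactly what the SRTT comparison rules out.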
Algorithm 1: AFMT: Flow-Awareness

    Input: Queue q; Subtunnels s_1, ..., s_n
    p ← q.deque()
    if defined(flow_table[p.flow_id]) then
        (s_last, t_last) ← flow_table[p.flow_id]
        δ ← t_now − t_last
        s_applicable ← {s_last}
        for s_i in all other subtunnels do
            if s_i.SRTT + δ > s_last.SRTT then
                s_applicable.append(s_i)
            end
        end
        s_opt ← select_adaptively(s_applicable)
        flow_table[p.flow_id] ← {s_opt, t_now}
        s_opt.send(p)
    else
        s_opt ← select_adaptively(s_1, ..., s_n)
        flow_table[p.flow_id] ← {s_opt, t_now}
        s_opt.send(p)
    end

B. Adaptivity
Algorithm 2 illustrates how the best subtunnel adaptivity-wise is chosen. Line 1 iterates over all s_i and calculates the weighted_fill for each. Then, the subtunnel with the lowest one is selected. weighted_fill aims to model the current load of the subtunnel. It considers the fill of the buffer associated with s_i: s_i.fill. s_i.fill is added to the size of p to get the full load this subtunnel would have to shoulder. This value is divided by a value comparable to the "bandwidth-delay quotient": s_i.cwnd/s_i.SRTT. (For Linux it is possible to get these values via getsockopt().)
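This selection rule can be rendered in a few lines of Python. The attribute names fill, cwnd and srtt are assumptions of this sketch:

```python
def select_adaptively(candidates, p):
    """Sketch of the adaptive selection: pick the subtunnel with minimal
    weighted_fill, where weighted_fill = (fill + p.size) / (cwnd / SRTT),
    i.e. (fill + p.size) * SRTT / cwnd."""
    return min(candidates, key=lambda s: (s.fill + p.size) * s.srtt / s.cwnd)
```

A subtunnel with a large cwnd and small SRTT (high "bandwidth-delay quotient") thus tolerates a fuller buffer before it loses out to its peers.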
Algorithm 2: AFMT: Adaptivity (select_adaptively())

    s_opt ← s_i with minimal s_i.weighted_fill, where
    s_i.weighted_fill = (s_i.fill + p.size) / (s_i.cwnd / s_i.SRTT)
                      = ((s_i.fill + p.size) / s_i.cwnd) · s_i.SRTT

IV. EVALUATION
To evaluate AFMT, it was implemented in NS-3 and used with several network topologies and multiple payload flows.
A. NS-3 and The Network
For modelling wired links, NS-3 provides a PointToPoint and a CSMA model. We chose the CSMA model for all links, since it provides the closest model of the Ethernet and DSL links of a typical application case [1]. All CSMA net device queues are configured as drop tail queues.

Data rates of 16, 32 and 50 Mbit/s were chosen to model a fast cable internet connection and two slower DSL subscriber lines. We also conducted a second set of experiments with only two uplinks, modelling a 32 and a 50 Mbit/s line. Intermediate routers between T_entry and T_exit were introduced to partially model the IP layer routing overhead and different backbone paths of a real client-to-online-server path. All links in the backbone network are modelled with 1 Gbit/s. The same goes for the CSMA link from T_exit to the server, which represents a connection in a data center. The local network configuration of the clients models a local gigabit Ethernet link. As a baseline we also evaluated how the three payload flows perform if they are routed single path via the fastest 50 Mbit/s link without any tunnelling or multipath aggregation.

For the CsmaChannel delay, which models the propagation delay between two nodes including all switches and hubs, we chose a fixed value; considering an average switch overhead of 600 ns [2], this models 1-2 switches and 1-2 km of medium. Serialisation delay (sometimes called transmission delay) and queuing delay are modelled by NS-3 based on the channel data rates. NS-3 does not simulate processing delays.

The simulation duration is 30 seconds. The payload flows start at second 4 and cease at second 24. Between second 8 and 16 a background bulk TCP flow occurs on the 32 Mbit/s uplink. This models a sudden decrease in the path's capacity, to observe how the different scheduling algorithms handle it. To simulate application traffic we used the NS-3 packet-sink application on the three client nodes and three bulk send applications on S. This simulates three full-speed downloads via TCP.

B. AFMT and Round Robin Implementation
For a first prototype we used TCP as the transport protocol for the subtunnels. Since the T_entry and T_exit nodes are under the control of the AFMT system, we can fully configure the TCP socket to benefit AFMT. The delayed acknowledgement extension allows TCP to only send an acknowledgement for every second received data segment, or if a timeout occurs, to reduce overhead. NS-3 used an unusually high default of 200 ms for the timeout; for accurate tunnel stats we reduced it to the same value the Linux kernel uses: 40 ms. TcpNoDelay was enabled to get fast and interactive tunnel behaviour.

When tunnelling datagrams, TCP blurs the packet boundaries, since it is basically a stream transport protocol [12]. To re-distinguish the payload packets we introduced a small 8-byte header preceding every payload datagram with its size.

For comparison all experiments were also conducted with a round robin MT system in the same network, with the same payload. The round robin (RR) scheduler used UDP for its subtunnels.

TABLE I
Goodput over the experiment duration in Mebibytes for the different scheduling algorithms in different network topologies

    Subtunnels                            RR      AFMT
    Three subtunnels: 16, 32, 50 Mbit/s   63.02   105.89
    Two subtunnels: 32, 50 Mbit/s         87.63   111
    No tunnel, single path, 50 Mbit/s       52.7 (baseline)
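The length-prefix framing described above can be sketched as follows. The exact header encoding is an assumption of this sketch; here an 8-byte big-endian length field is used:

```python
import struct

def frame(datagram: bytes) -> bytes:
    """Prefix a payload datagram with its size so that packet boundaries
    survive the byte-stream semantics of the TCP subtunnel."""
    return struct.pack("!Q", len(datagram)) + datagram

def deframe(stream: bytes):
    """Split a received byte stream back into the original datagrams."""
    datagrams, off = [], 0
    while off + 8 <= len(stream):
        size = struct.unpack_from("!Q", stream, off)[0]
        datagrams.append(stream[off + 8 : off + 8 + size])
        off += 8 + size
    return datagrams
```

A real receiver would additionally have to buffer partial headers and partial datagrams across read() calls, which is omitted here for brevity.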
C. General Results
The total goodput is shown in Table I. All MT systems provide a higher throughput than a single path solution with the fastest uplink. The increase ranges from 21% (three subtunnels, RR) to 110% (two subtunnels, AFMT). For three subtunnels AFMT has a 68% higher overall goodput than RR, with 105.89 MiB. With two subtunnels, however, the gap is much smaller at 27%.

This indicates that the dynamic AFMT is better at dealing with diverse paths and path capacity changes. The stable throughput additionally indicates successful flow-aware traffic splitting. Low flow-awareness would have resulted in packet reordering and therefore cwnd reduction and sudden throughput rate drops. Both are considerably reduced compared to RR, as described in the following two subsections.
D. Three Subtunnels
For three subtunnels, the overall goodput is plotted in figure 4. AFMT has a consistently higher throughput than RR. It starts at second 4, when the slow start algorithm of the subtunnels opens up the cwnds at the same time the cwnds of the payload flows do, so no initial inertia is visible. After that, until second 8, the goodputs lie around 6.5 MiB/s and 4.2 MiB/s.

At second 8 the gap widens, as the AFMT goodput drops to about 4 MiB/s and RR to about 1 MiB/s. We assume this is because RR continues to send the same amount of packets over the impaired path, which brings congestion and packet loss. At second 12 AFMT drops down to the same goodput as RR; 1.5 seconds later it recovers back to 4 MiB/s. At second 16, when the background traffic stops, both systems recover. The AFMT goodput stays at an average of 5.8 MiB/s, while the RR goodput stays at 4.1 MiB/s.

Fig. 4. Total goodput of the AFMT and RR systems when using three subtunnels.

Fig. 5. Total goodput of the AFMT and RR systems when using two subtunnels.
E. Two Subtunnels
The total goodput over time for both algorithms without the 16 Mbit/s subtunnel is plotted in figure 5. While AFMT still opens its cwnds fast, the difference is smaller: AFMT transports about 6.5 MiB/s, while RR oscillates around 5.8 MiB/s. At second 8 both throughputs drop, and again RR's throughput drops further, to about 2 MiB/s compared to AFMT's 4 MiB/s. However, it is notable that for both systems the throughputs oscillate with larger amplitudes than for three subtunnels. After second 16 both systems recover to their previous throughput rate.

V. RELATED WORK
A. Multipath TCP
An MPTCP scheduler has more options to avoid packet reordering than an MT scheduler. It does not have to consider flowlets and can define new flowlets suitable to avoid packet reordering. For adaptivity, MPTCP trusts the congestion control of its subflows. Every time space in a subflow's cwnd opens and there is data to send, the scheduler is invoked [9].

LowRTT [10] is a simple scheduler currently used as the default in the Linux kernel. When invoked, it picks the available subflow with the lowest RTT. It reduces head-of-line blocking and delay variation by about 20%.

DPSAF [13] is a sophisticated, computation-intensive scheduler for vehicular networks. It tries to predict when the packets will arrive and sends them out of order, so that they will arrive in order. While DPSAF might be a good solution for vehicular networks with bad connectivity, it is unclear how feasible it is for high speed internet usage.
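The core decision of LowRTT as described in [10] can be sketched roughly as follows. The attribute names (inflight, cwnd, srtt) are assumptions of this sketch, not the Linux implementation:

```python
def lowrtt_pick(subflows):
    """Sketch of the LowRTT scheduler: among the subflows with free
    cwnd space, pick the one with the lowest smoothed RTT."""
    available = [s for s in subflows if s.inflight < s.cwnd]
    return min(available, key=lambda s: s.srtt) if available else None
```

When the lowest-RTT subflow's cwnd is full, LowRTT falls back to the next-fastest available subflow rather than waiting, which is what keeps head-of-line blocking low.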
B. Multipath Tunnelling

[3] proposes an MT system for non-TCP loss-tolerant media traffic and two subtunnel paths with fixed characteristics: a DSL path with high stable bandwidth, and an LTE path with low varying bandwidth as an overflow vault. It detects packet loss via sequence numbers in its own header and adapts the round robin weights accordingly. However, this reimplements existing transport protocol functionality. It is not flow-aware and has to reorder the packets with a reordering buffer.

[5] researches multipath access networking in general. Additionally, the author designed an HTTP extension that splits videos into chunks of fixed size and downloads them on separate paths. This needs changes to the client application and only works for a specific case.

VI. CONCLUSIONS AND OUTLOOK
In this paper we proposed AFMT, a packet scheduling algorithm for multipath tunnelling that increases throughput and reliability. Several NS-3 simulations including changing path capacities have shown that AFMT effectively deals with diverse and changing network paths.

These results were obtained although the experiments used the suboptimal TCP protocol for the subtunnels. Our future work will evaluate and optimise AFMT characteristics using DCCP with diverse payload traffic.
REFERENCES

[1] ns-3 Model Library Documentation, 2019.
[2] M. Barreiros and P. Lundqvist, QoS-Enabled Networks: Tools and Foundations. John Wiley & Sons, 2015.
[3] M. Bednarek, G. Barrenetxea, M. Kühlewind, and B. Trammell, "Multipath bonding at layer 3," in ANRW, 2016, pp. 7-12.
[4] S. Bohacek, J. P. Hespanha, J. Lee, C. Lim, and K. Obraczka, "A new TCP for persistent packet reordering," IEEE/ACM Transactions on Networking (TON), vol. 14, no. 2, pp. 369-382, 2006.
[5] D. Kaspar, "Multipath aggregation of heterogeneous access networks," Ph.D. dissertation, University of Oslo, 2011.
[6] A. Ford, C. Raiciu, M. Handley, and O. Bonaventure, "TCP Extensions for Multipath Operation with Multiple Addresses," RFC 6824, 2013.
[7] S. Kandula, D. Katabi, S. Sinha, and A. Berger, "Dynamic load balancing without packet reordering," SIGCOMM Comput. Commun. Rev., vol. 37, no. 2, pp. 51-62, Mar. 2007.
[8] E. Kohler, M. Handley, and S. Floyd, "Datagram Congestion Control Protocol (DCCP)," RFC 4340, 2006.
[9] C. Paasch, "Improving Multipath TCP," Ph.D. dissertation, Université catholique de Louvain (UCL), 2014.
[10] C. Paasch, S. Ferlin, O. Alay, and O. Bonaventure, "Experimental evaluation of multipath TCP schedulers," in Proc. of the 2014 ACM SIGCOMM Workshop on Capacity Sharing. ACM, 2014, pp. 27-32.
[11] J. Postel, "Transmission Control Protocol," RFC 793, 1981.
[12] A. S. Tanenbaum, Computer Networks, 4th ed., 2003.
[13] K. Xue, J. Han, D. Ni, W. Wei, Y. Cai, Q. Xu, and P. Hong, "DPSAF: forward prediction based dynamic packet scheduling and adjusting with feedback for multipath TCP in lossy heterogeneous networks,"