An Adaptive Flow-Aware Packet Scheduling Algorithm for Multipath Tunnelling
Richard Sailer, Jörg Hähner
Organic Computing Group, University of Augsburg, Augsburg, Germany
Email: [email protected], [email protected]
Abstract—This paper proposes AFMT, a packet scheduling algorithm to achieve adaptive flow-aware multipath tunnelling. AFMT has two unique properties. Firstly, it implements robust adaptive traffic splitting for the subtunnels. Secondly, it detects and schedules bursts of packets cohesively, a scheme that has already enabled traffic splitting for load balancing with little to no packet reordering. Several NS-3 experiments over different network topologies show that AFMT successfully deals with changing path characteristics due to background traffic while increasing throughput and reliability.
I. INTRODUCTION
Network paths can be unreliable and slow [12], [5]. Redundancy is an obvious solution for this issue. Redundancy has been used to provide reliability and higher throughput in many fields of computer engineering, e.g. databases, storage, and power supply, but seldom for network paths. There are several proposed concepts for redundancy: Multipath TCP [6] and Multipath Tunnelling [3].

Multipath TCP (MPTCP) grants more reliability, but needs every client and every server to have direct full access to all network uplinks (figure 1) [6]. This complicates wiring. Additionally, it needs all clients to implement MPTCP, a complex protocol. It also does not solve the problem for UDP flows. So while MPTCP performs better than load balancing, it still leaves a lot of issues unaddressed.

Multipath Tunnelling (MT) addresses these issues. As visible in figure 2, only the tunnel endpoints T_entry and T_exit need to see and understand the subtunnels. The clients and servers don't know they're connected by multiple paths; they need no additional wiring or implementation of a new network protocol. Since all flows between the two networks are tunnelled, this also works for UDP. Lastly, all current prototypes and their concepts are less complex than the Multipath TCP concept and its implementations [9].

Our novel contributions include: We introduce AFMT, a packet scheduling algorithm for adaptive flow-aware multipath tunnelling. AFMT aims to overcome the throughput, reordering and adaptivity issues of existing MT approaches. We evaluate AFMT for diverse networks with changing path characteristics. This shows that AFMT improves reliability and throughput compared to a classic packet scheduling scheme. The evaluation in this paper focuses on wired connections; nevertheless, AFMT research focuses on providing a general solution.

Fig. 1. Multipath TCP network topology. For every packet p sent from client C to server S, C decides (schedules) the path P_i to use.
For this, C and S need an implementation of MPTCP and direct access to all the paths.

Fig. 2. Multipath Tunnelling network topology. A packet p sent from C is encapsulated at the tunnel entry T_entry and sent via P_i to the tunnel exit T_exit. There, it is decapsulated and sent to S.

II. BACKGROUND
TCP is sensitive to packet reordering [4]: it interprets reordering as a sign of packet loss and reacts with spurious retransmits and throughput reduction. TCP uses a sliding window (congestion window, cwnd) algorithm to adjust its send rate to the path's capacity [11]. Often, TCP sends the contents of a cwnd as one burst or flowlet [7]. The set of all transmitted packets in a transport layer association is defined as a flow [12]. As illustrated in figure 3, it is possible to utilise flowlets to achieve traffic splitting with little to no packet reordering.

Multipath tunnelling is known to induce heavy packet reordering [3]. This research proposes to reduce it by using flowlet switching [7], a scheme that reduced packet reordering for a similar problem (ISP load balancing) to very little to no occurrence.

In a multipath routing context, packet scheduling refers to the task of choosing an output queue for every packet from an input queue [10]. In an MT system, one output queue maps to one subtunnel. The behaviour of the packet scheduling algorithm is central to the behaviour of the whole MT system [5].

Fig. 3. When an inter-flowlet delay δ is larger than the delay difference between two paths (MTBS), it is possible to send the two flowlets via the different paths and no packet reordering will occur [7].

III. AFMT

DCCP and TCP maintain an estimation of the path's RTT and capacity for their congestion control. Decades of research have been invested to optimise these estimators [12], [8]. AFMT is built on the assumption that using these results gives comparable or even better results than inventing its own path estimation scheme. The following subsections describe the realisation of flow awareness and adaptivity.
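To make the flowlet condition of figure 3 concrete, the following is a minimal Python sketch of the switching test. All names and the MTBS value here are assumptions for illustration, not part of [7]:

```python
# Sketch of flowlet switching [7]: a flow may change paths only when
# the gap since its last packet exceeds the maximum delay difference
# between the paths (MTBS), so later packets cannot overtake earlier ones.

MTBS = 0.020  # assumed 20 ms worst-case delay difference between paths

last_sent = {}  # flow_id -> send time of the flow's last packet

def may_switch_path(flow_id, now):
    """Return True iff switching paths cannot reorder this flow."""
    t_last = last_sent.get(flow_id)
    return t_last is None or (now - t_last) > MTBS
```

A first packet of a flow (no table entry) may always be scheduled freely; subsequent packets may only leave the current path after an inter-flowlet gap larger than MTBS.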
A. Flow-Awareness
To find the applicable subtunnels and match the flow of a packet, a central data structure that tracks all flows is necessary. Therefore AFMT uses a flow table with the flow id as key and a tuple of last subtunnel and timestamp as value. For every flow id it returns the last subtunnel used by this flow and a timestamp. This timestamp indicates the absolute time when the last packet of our flow was sent through the associated subtunnel.

With this data AFMT can determine the applicable subtunnels as shown in Algorithm 1. Relevant entities are the subtunnels s_i, the to-schedule packet p and the absolute time points t_i. Initially AFMT obtains the flow id of p (p.flow_id). It can be obtained from the operating system: every operating system that supports network address translation (NAT) needs to track flows and can (more or less directly) provide flow ids. For Linux this is possible relatively simply with the conntrack netfilter module. In most use cases AFMT targets, T_entry is the internet gateway of an organisation's local network, a device that already implements NAT. Therefore no additional overhead for flow identification and tracking is necessary.

Next we look up the flow id in the flow table (Line 6). If it exists, we assign the data to two local variables (Line 7) and calculate δ, the time that has passed since the last packet of our flow was sent.

Then the smoothed round trip time (SRTT) of a subtunnel is obtained from the transport protocol implementation. It is more resistant to fluctuations and therefore a more meaningful proposition about the path than the raw RTT [11]. Knowing s_i.SRTT, the SRTT of a subtunnel i, it is possible to predict when p will arrive at T_exit: namely in s_i.SRTT time from now, and s_i.SRTT + δ time from when p_last was sent. Comparatively, s_last.SRTT gives the arrival time of p_last from when it was sent. Therefore, if s_i.SRTT + δ is larger than s_last.SRTT, p will arrive after p_last, and we can add s_i to the list of applicable subtunnels (Lines 11-12). This is done for all subtunnels other than s_last.

After acquiring the list of applicable subtunnels, AFMT selects the best of them (s_opt) adaptivity-wise, which is explained in more detail in the next subsection (Line 15). Then the flow table is updated with the new values of s_opt and the current time t_now, and p is finally sent via s_opt (Lines 16-17).

If p.flow_id is not found in the flow table, i.e. it starts a new flow, AFMT directly calls the adaptive selection process with all available subtunnels s_1, ..., s_n to determine s_opt (Line 19). Then, as previously, we update the flow table and send p via s_opt (Lines 20-21).
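As an illustration, the scheduling step could be sketched in Python as follows. The subtunnel attributes (srtt, send()) and the select_adaptively placeholder are assumptions for this sketch; the real adaptive selection is described in the next subsection:

```python
import time

flow_table = {}  # flow_id -> (last subtunnel, timestamp of flow's last packet)

def select_adaptively(candidates, p):
    # Placeholder for the adaptive selection of Section III-B;
    # here simply the candidate with the lowest SRTT.
    return min(candidates, key=lambda s: s.srtt)

def schedule(p, subtunnels):
    """Choose a subtunnel for packet p without reordering its flow."""
    now = time.monotonic()
    entry = flow_table.get(p.flow_id)
    if entry is not None:
        s_last, t_last = entry
        delta = now - t_last  # time since the flow's last packet was sent
        # s_i is applicable if p would still arrive after the previous
        # packet: s_i.SRTT + delta > s_last.SRTT (slower paths are safe).
        applicable = [s_last] + [s for s in subtunnels
                                 if s is not s_last
                                 and s.srtt + delta > s_last.srtt]
    else:
        applicable = list(subtunnels)  # new flow: all subtunnels applicable
    s_opt = select_adaptively(applicable, p)
    flow_table[p.flow_id] = (s_opt, now)
    s_opt.send(p)
```

Note that switching to a slower subtunnel is always safe with respect to reordering; only a faster subtunnel can let p overtake p_last, which is exactly what the SRTT comparison rules out.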
Algorithm 1: AFMT: Flow-Awareness

    Input: Queue q; Subtunnels s_1, ..., s_n
    p ← q.deque()
    if defined(flow_table[p.flow_id]) then
        (s_last, t_last) ← flow_table[p.flow_id]
        δ ← t_now − t_last
        s_applicable ← {s_last}
        for s_i in all other subtunnels do
            if s_i.SRTT + δ > s_last.SRTT then
                s_applicable.append(s_i)
            end
        end
        s_opt ← select_adaptively(s_applicable)
        flow_table[p.flow_id] ← {s_opt, t_now}
        s_opt.send(p)
    else
        s_opt ← select_adaptively(s_1, ..., s_n)
        flow_table[p.flow_id] ← {s_opt, t_now}
        s_opt.send(p)
    end

B. Adaptivity
Algorithm 2 illustrates how the best subtunnel adaptivity-wise is chosen. Line 1 iterates over all s_i and calculates the weighted_fill for each. Then, the subtunnel with the lowest one is selected. weighted_fill aims to model the current load of the subtunnel. It considers the fill of the buffer associated with s_i: s_i.fill. s_i.fill is added to the size of p to get the full load this subtunnel would have to shoulder. This value is divided by a value comparable to the "bandwidth-delay quotient": s_i.cwnd/s_i.SRTT. (For Linux it is possible to get these values via getsockopt().)
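This selection rule can be rendered in a few lines of Python. The attribute names fill, cwnd and srtt are assumptions of this sketch:

```python
def select_adaptively(candidates, p):
    """Sketch of the adaptive selection: pick the subtunnel with minimal
    weighted_fill, where weighted_fill = (fill + p.size) / (cwnd / SRTT),
    i.e. (fill + p.size) * SRTT / cwnd."""
    return min(candidates, key=lambda s: (s.fill + p.size) * s.srtt / s.cwnd)
```

A subtunnel with a large cwnd and small SRTT (high "bandwidth-delay quotient") thus tolerates a fuller buffer before it loses out to its peers.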
Algorithm 2: AFMT: Adaptivity (select_adaptively())

    s_opt ← s_i with minimal s_i.weighted_fill, where
    s_i.weighted_fill = (s_i.fill + p.size) / (s_i.cwnd / s_i.SRTT)
                      = ((s_i.fill + p.size) / s_i.cwnd) · s_i.SRTT

IV. EVALUATION
To evaluate AFMT, it was implemented in NS-3 and used with several network topologies and multiple payload flows.
A. NS-3 and The Network
For modelling wired links, NS-3 provides a PointToPoint and a CSMA model. We chose the CSMA model for all links, since it provides the closest model of the Ethernet and DSL links of a typical application case [1]. All CSMA net device queues are configured as drop tail queues.

Data rates of 16, 32 and 50 Mbit/s were chosen to model a fast cable internet connection and two slower DSL subscriber lines. We also conducted a second set of experiments with only two uplinks, modelling a 32 and a 50 Mbit/s line. Intermediate routers between T_entry and T_exit were introduced to partially model the IP layer routing overhead and different backbone paths of a real client-to-online-server path. All links in the backbone network are modelled with 1 Gbit/s. The same goes for the CSMA link from T_exit to the server, which represents a connection in a data center. The local network configuration of the clients models a local gigabit Ethernet link. As a baseline we also evaluated how the three payload flows perform if they are routed single path via the fastest 50 Mbit/s link without any tunnelling or multipath aggregation.

For the CsmaChannel delay, which models the propagation delay between two nodes including all switches and hubs, we chose a fixed value; considering an average switch overhead of 600 ns [2], this models 1-2 switches and 1-2 km of medium. Serialisation delay (sometimes called transmission delay) and queuing delay are modelled by NS-3 based on the channel data rates. NS-3 does not simulate processing delays.

The simulation duration is 30 seconds. The payload flows start at second 4 and cease at second 24. Between second 8 and 16 a background bulk TCP flow occurs on the 32 Mbit/s uplink. This models a sudden decrease in the path's capacity, to observe how the different scheduling algorithms handle it. To simulate application traffic we used the NS-3 packet-sink application on the three client nodes and three bulk send applications on S. This simulates three full-speed downloads via TCP.

B. AFMT and Round Robin Implementation
For a first prototype we used TCP as the transport protocol for the subtunnels. Since the T_entry and T_exit nodes are under the control of the AFMT system, we can fully configure the TCP socket to benefit AFMT. The delayed acknowledgement extension allows TCP to only send an acknowledgement for every second received data segment, or if a timeout occurs, to reduce overhead. NS-3 used an unusually high default of 200 ms for the timeout; for accurate tunnel stats we reduced it to the same value the Linux kernel uses: 40 ms. TcpNoDelay was enabled to get fast and interactive tunnel behaviour.

When tunnelling datagrams, TCP blurs the packet boundaries, since it is basically a stream transport protocol [12]. To re-distinguish the payload packets we introduced a small 8-byte header preceding every payload datagram with its size.

For comparison all experiments were also conducted with a round robin MT system in the same network, with the same payload. The round robin (RR) scheduler used UDP for its subtunnels.

TABLE I
Goodput over the experiment duration in Mebibytes for the different scheduling algorithms in different network topologies

    Subtunnels                            RR      AFMT
    Three subtunnels: 16, 32, 50 Mbit/s   63.02   105.89
    Two subtunnels: 32, 50 Mbit/s         87.63   111
    No tunnel, single path, 50 Mbit/s       52.7 (baseline)
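The length-prefix framing described above can be sketched as follows. The exact header encoding is an assumption of this sketch; here an 8-byte big-endian length field is used:

```python
import struct

def frame(datagram: bytes) -> bytes:
    """Prefix a payload datagram with its size so that packet boundaries
    survive the byte-stream semantics of the TCP subtunnel."""
    return struct.pack("!Q", len(datagram)) + datagram

def deframe(stream: bytes):
    """Split a received byte stream back into the original datagrams."""
    datagrams, off = [], 0
    while off + 8 <= len(stream):
        size = struct.unpack_from("!Q", stream, off)[0]
        datagrams.append(stream[off + 8 : off + 8 + size])
        off += 8 + size
    return datagrams
```

A real receiver would additionally have to buffer partial headers and partial datagrams across read() calls, which is omitted here for brevity.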
C. General Results
The total goodput is shown in Table I. All MT systems provide a higher throughput than a single path solution with the fastest uplink. The increase ranges from 21% (three subtunnels, RR) to 110% (two subtunnels, AFMT). For three subtunnels AFMT has a 68% higher overall goodput than RR, with 105.89 MiB. With two subtunnels, however, the gap is much smaller at 27%.

This indicates that the dynamic AFMT is better at dealing with diverse paths and path capacity changes. The stable throughput additionally indicates successful flow-aware traffic splitting. Low flow-awareness would have resulted in packet reordering and therefore cwnd reduction and sudden throughput rate drops. Both are considerably reduced compared to RR, as described in the following two subsections.
D. Three Subtunnels
For three subtunnels, the overall goodput is plotted in figure 4. AFMT has a consistently higher throughput than RR. It starts at second 4, when the slow start algorithm of the subtunnels opens up the cwnds at the same time the cwnds of the payload flows do, so no initial inertia is visible. After that, until second 8, the goodputs lie around 6.5 MiB/s and 4.2 MiB/s.

At second 8 the gap widens, as the AFMT goodput drops to about 4 MiB/s and RR to about 1 MiB/s. We assume this is because RR continues to send the same amount of packets over the impaired path, which brings congestion and packet loss. At second 12 AFMT drops down to the same goodput as RR; 1.5 seconds later it recovers back to 4 MiB/s. At second 16, when the background traffic stops, both systems recover. The AFMT goodput stays at an average of 5.8 MiB/s, while the RR goodput stays at 4.1 MiB/s.

Fig. 4. Total goodput of the AFMT and RR systems when using three subtunnels.

Fig. 5. Total goodput of the AFMT and RR systems when using two subtunnels.
E. Two Subtunnels
The total goodput over time for both algorithms without the 16 Mbit/s subtunnel is plotted in figure 5. While AFMT still opens its cwnds fast, the difference is smaller: AFMT transports about 6.5 MiB/s, while RR oscillates around 5.8 MiB/s. At second 8 both throughputs drop, and again RR's throughput drops further, to about 2 MiB/s compared to AFMT's 4 MiB/s. However, it is notable that for both systems the throughputs oscillate with larger amplitudes than for three subtunnels. After second 16 both systems recover to their previous throughput rate.

V. RELATED WORK
A. Multipath TCP
An MPTCP scheduler has more options to avoid packet reordering than an MT scheduler. It does not have to consider flowlets and can define new flowlets suitable to avoid packet reordering. For adaptivity, MPTCP trusts the congestion control of its subflows. Every time space in a subflow's cwnd opens and there is data to send, the scheduler is invoked [9].

LowRTT [10] is a simple scheduler currently used as the default in the Linux kernel. When invoked, it picks the available subflow with the lowest RTT. It reduces head-of-line blocking and delay variation by about 20%.

DPSAF [13] is a sophisticated, computation-intensive scheduler for vehicular networks. It tries to predict when the packets will arrive and sends them out of order, so that they will arrive in order. While DPSAF might be a good solution for vehicular networks with bad connectivity, it is unclear how feasible it is for high speed internet usage.
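The core decision of LowRTT as described in [10] can be sketched roughly as follows. The attribute names (inflight, cwnd, srtt) are assumptions of this sketch, not the Linux implementation:

```python
def lowrtt_pick(subflows):
    """Sketch of the LowRTT scheduler: among the subflows with free
    cwnd space, pick the one with the lowest smoothed RTT."""
    available = [s for s in subflows if s.inflight < s.cwnd]
    return min(available, key=lambda s: s.srtt) if available else None
```

When the lowest-RTT subflow's cwnd is full, LowRTT falls back to the next-fastest available subflow rather than waiting, which is what keeps head-of-line blocking low.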
B. Multipath Tunnelling

[3] proposes an MT system for non-TCP loss-tolerant media traffic and two subtunnel paths with fixed characteristics: a DSL path with high stable bandwidth, and an LTE path with low varying bandwidth as an overflow vault. It detects packet loss via sequence numbers in its own header and adapts the round robin weights accordingly. However, this reimplements existing transport protocol functionality. It is not flow-aware and has to reorder the packets with a reordering buffer.

[5] researches multipath access networking in general. Additionally, the author designed an HTTP extension that splits videos into chunks of fixed size and downloads them on separate paths. This needs changes to the client application and only works for a specific case.

VI. CONCLUSIONS AND OUTLOOK
In this paper we proposed AFMT, a packet scheduling algorithm for multipath tunnelling that increases throughput and reliability. Several NS-3 simulations including changing path capacities have shown that AFMT effectively deals with diverse and changing network paths.

These results were obtained although the experiments used the suboptimal TCP protocol for the subtunnels. Our future work will evaluate and optimise AFMT characteristics using DCCP with diverse payload traffic.
REFERENCES

[1] ns-3 Model Library Documentation, 2019.
[2] M. Barreiros and P. Lundqvist, QoS-Enabled Networks: Tools and Foundations. John Wiley & Sons, 2015.
[3] M. Bednarek, G. Barrenetxea, M. Kühlewind, and B. Trammell, "Multipath bonding at layer 3," in ANRW, 2016, pp. 7-12.
[4] S. Bohacek, J. P. Hespanha, J. Lee, C. Lim, and K. Obraczka, "A new TCP for persistent packet reordering," IEEE/ACM Transactions on Networking (TON), vol. 14, no. 2, pp. 369-382, 2006.
[5] D. Kaspar, "Multipath aggregation of heterogeneous access networks," Ph.D. dissertation, University of Oslo, 2011.
[6] A. Ford, C. Raiciu, M. Handley, and O. Bonaventure, "TCP Extensions for Multipath Operation with Multiple Addresses," RFC 6824, 2013.
[7] S. Kandula, D. Katabi, S. Sinha, and A. Berger, "Dynamic load balancing without packet reordering," SIGCOMM Comput. Commun. Rev., vol. 37, no. 2, pp. 51-62, Mar. 2007.
[8] E. Kohler, M. Handley, and S. Floyd, "Datagram Congestion Control Protocol (DCCP)," RFC 4340, 2006.
[9] C. Paasch, "Improving Multipath TCP," Ph.D. dissertation, Université catholique de Louvain (UCL), 2014.
[10] C. Paasch, S. Ferlin, O. Alay, and O. Bonaventure, "Experimental evaluation of multipath TCP schedulers," in Proc. of the 2014 ACM SIGCOMM Workshop on Capacity Sharing. ACM, 2014, pp. 27-32.
[11] J. Postel, "Transmission Control Protocol," RFC 793, 1981.
[12] A. S. Tanenbaum, Computer Networks, 4th ed., 2003.
[13] K. Xue, J. Han, D. Ni, W. Wei, Y. Cai, Q. Xu, and P. Hong, "DPSAF: forward prediction based dynamic packet scheduling and adjusting with feedback for multipath TCP in lossy heterogeneous networks,"