Fast Flow Volume Estimation
Ran Ben Basat (Technion)
Gil Einziger (Nokia Bell Labs)
Roy Friedman (Technion)
ABSTRACT
The increasing popularity of jumbo frames means growing variance in the size of packets transmitted in modern networks. Consequently, network monitoring tools must maintain explicit traffic volume statistics rather than settle for packet counting as before. We present constant time algorithms for volume estimation in streams and sliding windows, which are faster than previous work. Our solutions are formally analyzed and extensively evaluated over multiple real-world packet traces as well as synthetic ones. For streams, we demonstrate a run-time improvement of up to 2.4X compared to the state of the art. On sliding windows, we exhibit a memory reduction of over 100X on all traces and an asymptotic runtime improvement to a constant. Finally, we apply our approach to hierarchical heavy hitters and achieve an empirical 2.4-7X speedup.
1. INTRODUCTION
Traffic measurement is vital for many network algorithms such as routing, load balancing, quality of service, caching and anomaly/intrusion detection [18, 20, 28, 37]. Typically, networking devices handle millions of flows [38, 41]. Often, monitoring applications track the most frequently appearing flows, known as heavy hitters, as their impact is most significant.
Most works on heavy hitter identification have focused on packet counting [3, 17, 42]. However, in recent years jumbo frames and large TCP packets have become increasingly popular, and so the variability in packet sizes grows. Consequently, plain packet counting may no longer serve as a good approximation for bandwidth utilization. For example, in data collected by [23] in 2014, less than 1% of the packets account for over 25% of the total traffic. Here, packet count based heavy hitters algorithms might fail to identify some heavy hitter flows in terms of bandwidth consumption.
Hence, in this paper we explicitly address monitoring of flow volume rather than plain packet counting. Further, given the rapid line rates and the high volume of accumulating data, an aging mechanism such as a sliding window is essential for ensuring data freshness and the estimation's relevance. Hence, we study estimation of flow volumes in both streams and sliding windows.
Finally, per flow measurements are not enough for certain functionalities like anomaly detection and Distributed Denial of Service (DDoS) attack detection [40, 43]. In such attacks, each attacking device only generates a small portion of the traffic and is not a heavy hitter. Yet, their combined traffic volume is overwhelming.
Hierarchical heavy hitters (HHH) aggregate traffic from IP addresses that share a common prefix [6]. In a DDoS attack, when the attacking devices share common IP prefixes, HHH can discover the attack. To that end, we consider volume based HHH detection as well.
Before explaining our contribution, let us first motivate why packet counting solutions are not easily adaptable to volume estimation. Counter algorithms typically maintain a fixed set of counters [3, 4, 16, 29, 34, 35, 39] that is considerably smaller than the number of flows. Ideally, counters are allocated to the heavy hitters. When a packet from an unmonitored flow arrives, the corresponding flow is allocated the minimal counter [35] or a counter whose value has dropped below a dynamically increasing threshold [34].
We refer to a stream in which each packet is associated with a weight as a weighted stream. Similarly, we refer to streams without weights, or where all packets receive the same weight, as unweighted. For unweighted streams, ordered data structures allow constant time updates and queries [3, 35], since when a counter is incremented, its relative order among all counters changes by at most one. Unfortunately, keeping the counters sorted after a counter increment in a weighted stream either requires searching for the counter's new location, which incurs a logarithmic cost, or resorting to logarithmic time data structures like heaps. The reason is that if a counter is incremented by some value w, its relative position might change by up to w positions. This difficulty motivates our work. The most naive approach treats a packet of size w as w consecutive arrivals of the same packet in the unweighted case, resulting in linear update times, which is even worse.
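To see the difficulty concretely, consider a toy sorted array of counter values (a hypothetical illustration of the general phenomenon, not any particular algorithm's data structure): a unit increment moves a counter past at most one neighbor, while a weighted increment can move it past many.

```python
import bisect

counters = [1, 3, 4, 7, 9]          # counter values, kept sorted ascending
old_pos = counters.index(4)

# Unweighted update: value 4 -> 5 crosses at most one neighbor,
# so a single local swap restores sortedness in O(1).
moves_unweighted = bisect.bisect_left(counters, 4 + 1) - old_pos

# Weighted update: value 4 -> 104 may cross up to w counters,
# so the new slot must be searched for (O(log n)) or kept in a heap.
moves_weighted = bisect.bisect_left(counters, 4 + 100) - old_pos

print(moves_unweighted, moves_weighted)  # 1 3
```

Here the unit increment displaces the counter by one position, while the weighted increment displaces it by three; with larger arrays and weights, the displacement grows with w.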
We contribute to the following network measurement problems: (i) stream heavy hitters, (ii) sliding window heavy hitters, (iii) stream hierarchical heavy hitters.
Specifically, our first contribution is Frequent items Algorithm with a Semi-structured Table (FAST), a novel algorithm for monitoring flow volumes and finding heavy hitters. FAST processes elements in worst case O(1) time using asymptotically optimal space. We formally prove and analyze the performance of FAST. We then evaluate FAST on 5 real Internet packet traces from a data center and backbone networks, demonstrating a 2.4X performance gain compared to previous works.
Our second contribution is Windowed Frequent items Algorithm with a Semi-structured Table (WFAST), a novel algorithm for monitoring flow volumes and finding heavy hitters in sliding windows. We evaluate WFAST on five Internet traces and show that its runtime is reasonably fast, and that it requires as little as 1% of the memory of previous work [27]. We analyze WFAST and show that it operates in constant time and is space optimal, which asymptotically improves both the runtime and the space consumption of previous work. We believe that such a dramatic improvement makes volume estimation over a sliding window practical!
Our third contribution is
Hierarchical Frequent items Algorithm with a Semi-structured Table (HFAST), which finds hierarchical heavy hitters. HFAST is created by replacing the underlying heavy hitters algorithm in [36] (Space Saving) with FAST. We evaluate HFAST and demonstrate an asymptotic update time improvement as well as an empirical 2.4-7X speedup on real Internet traces.
2. RELATED WORK
2.1 Streams
Sketches such as
Count Sketch (CS) [8] and
CountMin Sketch (CMS) [15] are attractive as they enable counter sharing and need not maintain a flow to counter mapping for all flows. Sketches typically provide only a probabilistic estimation, and often do not store flow identifiers. Thus, they cannot find the heavy hitters, but only address the volume estimation problem. Advanced sketches, such as Counter Braids [32], Randomized Counter Sharing [31] and Counter Tree [9], improve accuracy, but their queries require complex decoding.
In counter based algorithms, a flow table is maintained, but only a small number of flows are monitored. These algorithms differ from each other in the size and maintenance policy of the flow table, e.g.,
Lossy Counting [34] and its extensions [16, 39],
Frequent [29] and
Space Saving [35]. Given ideal conditions, counter algorithms are considered superior to sketch based techniques. In particular, Space Saving was empirically shown to be the most accurate [11, 12, 33]. Many counter based algorithms were developed by the databases community and are mostly suitable for software implementation. The work of [3] suggests a compact static memory implementation of Space Saving that may be more accessible for hardware design. Yet, software implementations are becoming increasingly relevant in networking as emerging technologies such as NFVs become popular. Alas, most previous works rely on sorted data structures such as
Stream Summary [35] or SAIL [3] that only operate in constant time for unweighted updates. Thus, a logarithmic time heap based implementation of Space Saving was suggested [12] for the more general volume counting problem. IM-SUM, DIM-SUM [5] and BUS-SS [19] are very recent algorithms developed for the volume heavy hitters problem (only for streams, with no sliding window support). BUS-SS is a randomized algorithm that operates in constant time. IM-SUM operates in amortized O(1) time and DIM-SUM in worst case constant time. Empirically, DIM-SUM is slower than FAST. Additionally, DIM-SUM requires 2(1+φ)/ε counters, for some φ >
0, for guaranteeing an N · M · ε error and operating in O(1/φ) time. FAST only needs half as many counters for the same time and error guarantees.
2.2 Sliding Windows
Heavy hitters on sliding windows were first studied by [1]. Given an accuracy parameter (ε), a window size (W) and a maximal increment size (M), such algorithms estimate flows' volume over the sliding window with an additive error of at most W · M · ε. Their algorithm requires O((1/ε) log(1/ε)) counters and O((1/ε) log(1/ε)) time for queries and updates. The work of [30] reduces the space requirement and update time to O(1/ε). An improved algorithm with a constant update time is given in [26]. Further, [3] provided an algorithm that answers heavy hitters queries in O(1/ε) time and supports constant time updates and item frequency queries.
The weighted variant of the problem was only studied by [27], whose algorithm operates in O(A/ε) time and requires O(A/ε) space for a W · M · ε approximation; here, A ∈ [1, M] is the average packet size in the window. In this work, we suggest an algorithm for the weighted problem that (i) uses optimal O(1/ε) space, (ii) performs heavy hitters queries in optimal O(1/ε) time, and (iii) performs volume queries and updates in constant time.
2.3 Hierarchical Heavy Hitters
Hierarchical Heavy Hitters (HHH) were addressed, e.g., in [13, 14, 21, 36, 43]. HHH algorithms monitor aggregates of flows that share a common prefix. To do so, HHH algorithms treat flow identifiers as a hierarchical domain. We denote by H the size of this domain.
The full and partial ancestry algorithms [14] are trie based algorithms that require O((H/ε) log(εN)) space and operate in O(H log(εN)) time. The state of the art algorithm [36] requires O(H/ε) space, and its update time for weighted inputs is O(H log(1/ε)).
It solves the approximate HHH problem by dividing it into multiple simpler heavy hitters problems. In our work, we replace the underlying heavy hitters algorithm of [36] with FAST, which yields a space complexity of O(H/ε) and an update complexity of O(H). That is, we improve the update complexity from O(H log(1/ε)) to O(H).
3. PRELIMINARIES
Given a set U and a positive integer M ∈ ℕ⁺, we say that S is a (U, M)-weighted stream if it is a sequence of ⟨id, weight⟩ pairs. Specifically: S = ⟨p₁, p₂, ..., p_N⟩, where ∀i ∈ {1, ..., N}: p_i ∈ U × {1, ..., M}. Given a packet p_i = (d_i, w_i), we say that d_i is p_i's id while w_i is its weight; N is the stream length, and M is the maximal packet size. Notice that the same packet id may appear multiple times in the stream, and each such occurrence may potentially be associated with a different weight. Given a (U, M)-weighted stream S, we denote by v_x, the volume of id x, the total weight of all packets with id x. That is: v_x ≜ Σ_{i ∈ {1,...,N}: d_i = x} w_i. For a window size W ∈ ℕ⁺, we denote the window volume of id x as the total weight of packets with id x among the last W packets, that is: v^W_x ≜ Σ_{i ∈ {N−W+1,...,N}: d_i = x} w_i. We seek algorithms that support the following operations:
ADD(⟨x, w⟩): append a packet with identifier x and weight w to S.
Query(x): return an estimate v̂_x of v_x.
WinQuery(x): return an estimate v̂^W_x of v^W_x.
We now formally define the main problems addressed in this work:
(ε, M)-Volume Estimation: Query(x) returns an estimate v̂_x that satisfies v_x ≤ v̂_x ≤ v_x + N · M · ε.
(W, ε, M)-Volume Estimation: WinQuery(x) returns an estimate v̂^W_x that satisfies v^W_x ≤ v̂^W_x ≤ v^W_x + W · M · ε.
(θ, ε, M)-Approximate Weighted Heavy Hitters: returns a set H ⊆ U such that ∀x ∈ U: (v_x > N · M · θ ⟹ x ∈ H) ∧ (v_x < N · M · (θ − ε) ⟹ x ∉ H).
(W, θ, ε, M)-Approximate Weighted Heavy Hitters: returns a set H ⊆ U such that ∀x ∈ U: (v^W_x > W · M · θ ⟹ x ∈ H) ∧ (v^W_x < W · M · (θ − ε) ⟹ x ∉ H).
Our heavy hitter definitions are asymmetric. That is, they require that flows whose volume is above the threshold of N · M · θ (or W · M · θ) are included in the list, but flows whose volume is slightly below the threshold can be either included or excluded. This relaxation is necessary, as it enables reducing the required amount of space to sub-linear. Let us emphasize that the identities of the heavy hitter flows are not known in advance. Hence, it is impossible to a priori allocate counters only to these flows. The basic notation used in this work is listed in Table 1.

Symbol: Meaning
S: the stream
N: number of elements in the stream
M: maximal value of an element in the stream
W: window size
U: the universe of elements
[r]: the set {0, 1, ..., r−1}
φ: FAST performance parameter
v_x: the volume of an element x in S
v̂_x: an estimate of v_x
v^W_x: the volume of element x in the last W elements of S
v̂^W_x: an estimate of v^W_x
ε: estimation accuracy parameter
θ: heavy hitters threshold parameter
Table 1: List of Symbols

Figure 1: An example of how FAST utilizes the SOS structure. Here, flows are partially ordered according to their third digit (multiples of 100), and each flow maintains its own remainder; e.g., the estimated volume of D is v̂_D = 583.
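As a concrete instance of the definitions above, the following toy computation (hypothetical ids and weights, with M = 5) evaluates v_x and v^W_x directly from the stream:

```python
# A toy (U, M)-weighted stream: <id, weight> pairs with weights in {1, ..., M}.
stream = [('a', 3), ('b', 5), ('a', 2), ('c', 4), ('a', 1)]

def volume(stream, x):
    """v_x: total weight of all packets whose id is x."""
    return sum(w for d, w in stream if d == x)

def window_volume(stream, x, W):
    """v^W_x: total weight of x's packets among the last W packets."""
    return sum(w for d, w in stream[-W:] if d == x)

print(volume(stream, 'a'))            # 3 + 2 + 1 = 6
print(window_volume(stream, 'a', 3))  # last 3 packets -> 2 + 1 = 3
```

Of course, such an exact computation stores the whole stream; the algorithms in this paper approximate these quantities in sub-linear space.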
4. FREQUENT ITEMS ALGORITHM WITH A SEMI-STRUCTURED TABLE (FAST)
In this section, we present
Frequent items Algorithm with a Semi-structured Table (FAST), a novel algorithm that achieves constant time weighted updates. FAST uses a data structure called
Semi Ordered Summary (SOS), which maintains flow entries in a semi ordered manner. That is, similarly to previous works, SOS groups flows according to their volume; each such group is called a volume group. The volume groups are maintained in an ordered list. Each volume group is associated with a value C that determines the volume of its nodes. Unlike existing data structures, counters within each volume group are kept unordered.
Unlike previous works, the grouping is done at coarse granularity. Each node (inside a group) includes a variable called Remainder (denoted R). The volume estimate of a flow is C + R, where R is the remainder of its volume node and C is the value of its volume group. This semi-ordered structure is unique to SOS and enables it to serve weighted updates in O(1). Volume queries are satisfied in constant time using a separate aggregate hash table which maps each flow identifier to its SOS node. FAST then uses SOS to find a near-minimum flow when needed.

Algorithm 1 FAST(M, ε, φ)
Initialization: C ← ∅, ∀x: c_x ← 0, r_x ← 0, s ← ⌊M · φ⌋, cap ← ⌈(1+φ)/ε⌉
function Add(Item x, Weight w)
    if x ∈ C or |C| < cap then
        c_x ← c_x + ⌊(r_x + w)/s⌋
        r_x ← (r_x + w) mod s
        C ← C ∪ {x}
    else
        Let m ∈ argmin_{y∈C}(c_y)    ▷ an arbitrary minimal item
        c_x ← c_m + ⌊(s − 1 + w)/s⌋
        r_x ← (s − 1 + w) mod s
        C ← (C \ {m}) ∪ {x}
function Query(x)
    if x ∈ C or |C| < cap then
        return r_x + s · c_x
    else
        return s − 1 + s · min_{y∈C} c_y

Figure 1 provides an intuitive example. The estimated volume of each flow is the sum of its group's value (C) and the item's remainder (R); e.g., the volume of A is 400 + 32 = 432. Flows are partially ordered according to their third digit, i.e., in multiples of 100, or M/
10. Within a specific group, however, items are unordered; e.g., A, B and J are unordered, but all appear before items with volume of at least 500. As the number of lists to skip prior to an addition is O(1), the update complexity is also O(1).
Intuitively, flows are only ordered according to volume groups, and if we make sure that the maximal weight can only advance a flow by a constant number of flow groups, then SOS operates in constant time. Alas, keeping the flows only partially ordered increases the error. We compensate for this increase by requiring a larger number of SOS entries compared to previously suggested fully ordered structures. The main challenge in realizing this idea is to analyze the accuracy impact and provide strong estimation guarantees.
FAST employs ⌈(1+φ)/ε⌉ counters, for some non-negative constant φ ≥ 0. φ determines how ordered SOS is: for φ = 0, we get full order, while for φ >
0, it is only ordered at a granularity of M · φ. The update complexity is O(1/φ), and is therefore constant for any fixed φ. We note that an Ω(1/ε) counters lower bound is known [35]. Thus, FAST is asymptotically optimal for constant φ. The pseudo code of FAST appears in Algorithm 1. We start with a simple, useful observation.
Observation 1. Let a, b ∈ ℕ. Then a = b · ⌊a/b⌋ + (a mod b).
For the analysis, we use the following notation: for every item x ∈ U and stream length t, we denote by q_t(x) the value of Query(x) after seeing t elements. We slightly abuse notation and refer to t also as the time at which the t-th element arrived, where time here is discrete. We denote by C_t the set of elements with an allocated counter at time t, by r_{x,t} the value of r_x, and by c_{x,t} the value of c_x. Also, we denote the volume at time t as v_{x,t} ≜ Σ_{i ∈ {1,...,t}: d_i = x} w_i. All missing proofs appear in Appendix A.
We now show that FAST has a one-sided error.
Lemma 1. For any t ∈ ℕ, after seeing any (U, M)-weighted stream S of length t, for any x ∈ U: v_x ≤ v̂_x.
We continue by showing that FAST is accurate when there are only a few distinct items.
Lemma 2. If the stream contains at most ⌈(1+φ)/ε⌉ distinct elements, then FAST provides an exact estimate of an item's volume upon query.
We now analyze the sum of counters in C.
Lemma 3. For any t ∈ ℕ, after seeing any (U, M)-weighted stream S of length t, FAST satisfies: Σ_{x ∈ C_t} Query(x) ≤ t · M · (1 + φ).
Next, we show a bound on FAST's estimation error.
Lemma 4. For any t ∈ ℕ, after seeing any (U, M)-weighted stream S of length t, for any x ∈ U: v̂_x ≤ v_x + t · M · ε.
Next, we prove a bound on the run time of FAST.
Lemma 5. Let φ > 0. FAST performs Add operations in O(1/φ) time.
Next, we combine Lemmas 1, 4 and 5 to conclude the correctness of the FAST algorithm.
Theorem 1. For any constant φ > 0, when allocated C ≜ ⌈(1+φ)/ε⌉ counters, FAST operates in constant time and solves the (ε, M)-Volume Estimation problem.
Finally, FAST also solves the heavy hitters problem:
Theorem 2. For any fixed φ > 0, when allocated C ≜ ⌈(1+φ)/ε⌉ counters, by returning {x ∈ U | v̂_x ≥ N · M · θ}, FAST solves the (θ, ε, M)-Approximate Weighted Heavy Hitters problem.
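To make the mechanics of Algorithm 1 concrete, here is a minimal Python sketch of FAST's update and query logic. It follows the pseudocode as reconstructed above, but it is a simplification, not the paper's implementation: the SOS structure is replaced by a plain dictionary with a linear argmin, so evictions are not O(1) here, and clamping s = ⌊M · φ⌋ to at least 1 is our assumption.

```python
import math

class FAST:
    """Sketch of FAST (Algorithm 1). Estimates are one-sided: query(x) >= v_x."""

    def __init__(self, M, eps, phi):
        self.s = max(1, math.floor(M * phi))   # group width / remainder modulus
        self.cap = math.ceil((1 + phi) / eps)  # number of counters
        self.c = {}                            # per-flow group counter c_x
        self.r = {}                            # per-flow remainder r_x

    def add(self, x, w):
        if x in self.c or len(self.c) < self.cap:
            cx, rx = self.c.get(x, 0), self.r.get(x, 0)
            self.c[x] = cx + (rx + w) // self.s
            self.r[x] = (rx + w) % self.s
        else:
            # Evict an arbitrary minimal flow; the real SOS finds it in O(1).
            m = min(self.c, key=self.c.get)
            cm = self.c.pop(m)
            del self.r[m]
            self.c[x] = cm + (self.s - 1 + w) // self.s
            self.r[x] = (self.s - 1 + w) % self.s

    def query(self, x):
        if x in self.c:
            return self.r[x] + self.s * self.c[x]
        # Unmonitored flow: upper bound via the minimal group counter.
        return self.s - 1 + self.s * min(self.c.values(), default=0)
```

For instance, with M = 100, ε = 0.5 and φ = 1, the sketch keeps ⌈(1+1)/0.5⌉ = 4 counters of width s = 100; adding a flow of volume 432 yields query = 432 exactly while it is monitored, and every estimate remains an overestimate after evictions.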
5. WINDOWED FAST (WFAST)
We now present
Windowed Frequent items Algorithm with a Semi-structured Table (WFAST), an efficient algorithm for the (W, ε, M)-Volume Estimation and (W, θ, ε, M)-Weighted Heavy Hitters problems.
We partition the stream into consecutive sequences of size W called frames. Each frame is further divided into k ≜ ⌈4/ε⌉ blocks, each of size W/k, which we assume is an integer for simplicity. Figure 2 illustrates the setting.
Figure 2: The stream is divided into intervals of size W called frames, and each frame is partitioned into k equal-sized blocks. The window of interest is also of size W, and overlaps with at most 2 frames and k+1 blocks.

k: a constant, k ≜ ⌈4/ε⌉
y: a FAST instance using k(1 + φ) counters
b: a queue of k + 1 queues; an efficient implementation appears in [3]
B: the histogram of b, implemented using a hash table
o: the offset within the current frame
Table 2: Variables used by the WFAST algorithm.

WFAST uses a FAST instance y to estimate the volume of each flow within the current frame. Once a frame ends (the stream length is divisible by W), we "flush" the instance, i.e., reset all counters and remainders to 0. Yet, we do not "forget" all information in a flush, as high volume flows are recorded in a dedicated data structure. Specifically, we say that an element x overflowed at time t if ⌊q_t(x)/(MW/k)⌋ > ⌊q_{t−1}(x)/(MW/k)⌋. We use a queue of queues structure b to keep track of which elements have overflowed in each block. That is, each node of the main queue represents a block and contains a queue of all elements that overflowed in its block; specifically, the secondary queues maintain the ids of overflowing elements. Once a block ends, we remove the oldest block's node (queue) from the main queue and initialize a new queue for the starting block. Finally, we answer queries about the window volume of an item x by multiplying its overflow count by MW/k, adding the residual count from y (i.e., the part that is not recorded in b), plus 2MW/k to ensure an overestimate.
For O(1) time queries, we also maintain a hash table B that tracks the overflow count of each item. That is, for each element x, B[x] contains the number of times x is recorded in b. Since multiple items may overflow in the same block, we cannot update B in constant time once a block ends. We address this issue by deamortizing B's update: on each arrival, we remove a single item from the queue of the oldest block (if such an item exists).
The pseudo code of WFAST appears in Algorithm 2, and its variables are described in Table 2. An efficient implementation of the queue of queues b is described in [3].
Algorithm 2 WFAST(W, M, φ)
Initialization: y ← FAST(M, 4/k, φ), o ← 0, B ← empty hash table, b ← queue of k + 1 empty queues
function Add(Item x, Weight w)
    o ← (o + 1) mod W
    if o = 0 then    ▷ a new frame starts
        y.flush()
    if o mod (W/k) = 0 then    ▷ a new block starts
        b.pop()
        b.append(new empty queue)
    if b.tail is not empty then    ▷ remove the oldest item
        oldID ← b.tail.pop()
        B[oldID] ← B[oldID] − 1
        if B[oldID] = 0 then B.remove(oldID)
    prevOverflowCount ← ⌊y.Query(x)/(MW/k)⌋
    y.Add(x, w)    ▷ add the item
    if ⌊y.Query(x)/(MW/k)⌋ > prevOverflowCount then    ▷ overflow
        b.head.push(x)
        if B.contains(x) then B[x] ← B[x] + 1
        else B[x] ← 1    ▷ adding x to B
function WinQuery(Item x)
    if B.contains(x) then
        return (MW/k) · (B[x] + 2) + (y.Query(x) mod MW/k)
    else    ▷ x has no overflows
        return 2MW/k + y.Query(x)

We now introduce the notation used in this section. We mark the queried element by x and the current time by W + o, and assume that element W + 1 is the first element of the current frame. For convenience, denote v_x(t₁, t₂) ≜ Σ_{i ∈ {t₁,...,t₂}: d_i = x} w_i, i.e., the volume of x between times t₁ and t₂. The goal is then to approximate the window volume of x, which is defined as v^W_x ≜ v_x(o + 1, W + o), i.e., the sum of the weights of x's packets among the timestamps ⟨o + 1, o + 2, ..., W + o⟩. We next state the main correctness theorem for WFAST.
Theorem 3. Algorithm 2 solves the (W, ε, M)-Volume Estimation problem.
Due to lack of space, the proof of the theorem appears in the Appendix. As a corollary, Algorithm 2 can also find heavy hitters.
Theorem 4. By returning all items x ∈ U for which v̂^W_x ≥ M · W · θ, Algorithm 2 solves the (W, θ, ε, M)-Weighted Heavy Hitters problem.
WFAST runtime analysis:
As listed in the pseudo code of WFAST (Algorithm 2) and the description above, processing a new element requires adding it to the FAST instance y, which takes O(1/φ) time, plus another O(1) operations. Query processing consists of O(1) operations and hash table accesses. For returning the heavy hitters, we go over all items with allocated counters in O((1+φ)/ε) time. In summary, we get the following theorem:
Theorem 5. For any fixed φ > 0, WFAST processes new elements and answers window-volume queries in constant time, while finding the window's weighted heavy hitters in O(1/ε) time.
Figure 3: Runtime comparison for a given error guarantee ((a) SanJose14, (b) YouTube, (c) Chicago16, (d) SanJose13, (e) DC1, (f) Chicago15). All algorithms provide the same guarantees, and FAST uses different φ values to show the speedup gained from allocating additional counters.
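To illustrate the frame/block bookkeeping of Algorithm 2 in isolation, the sketch below replaces the FAST instance y with an exact per-frame counter, so only the queue-of-queues mechanics is shown; the class and variable names are ours, and it assumes k divides W.

```python
from collections import deque, defaultdict

class WindowSketch:
    """Bookkeeping sketch of WFAST (Algorithm 2) with an exact per-frame
    counter standing in for the FAST instance y. win_query(x) upper-bounds
    the volume of x within the last W packets."""

    def __init__(self, W, M, k):
        self.W, self.k = W, k
        self.unit = M * W // k                    # overflow unit MW/k
        self.frame = defaultdict(int)             # exact stand-in for y
        self.o = 0                                # offset within current frame
        self.b = deque(deque() for _ in range(k + 1))  # queue of k+1 queues
        self.B = defaultdict(int)                 # histogram of b

    def add(self, x, w):
        self.o = (self.o + 1) % self.W
        if self.o == 0:                           # a new frame starts: flush y
            self.frame.clear()
        if self.o % (self.W // self.k) == 0:      # a new block starts
            self.b.pop()                          # drop the oldest block's queue
            self.b.appendleft(deque())
        if self.b[-1]:                            # deamortized cleanup
            old = self.b[-1].popleft()
            self.B[old] -= 1
            if self.B[old] == 0:
                del self.B[old]
        prev = self.frame[x] // self.unit
        self.frame[x] += w
        if self.frame[x] // self.unit > prev:     # overflow: record x's id
            self.b[0].append(x)
            self.B[x] += 1

    def win_query(self, x):
        # overflows * MW/k + residual count + 2MW/k slack for overestimation
        return self.unit * (self.B.get(x, 0) + 2) + self.frame.get(x, 0) % self.unit
```

For example, with W = 8, M = 10 and k = 4 (unit MW/k = 20), two 10-byte packets of the same flow trigger one overflow, and the query answer 20·(1+2) = 60 upper-bounds the true window volume of 20, matching the overestimation guarantee.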
6. HIERARCHICAL HEAVY HITTERS
Hierarchical heavy hitters (HHH) algorithms treat IP addresses as a hierarchical domain. At the bottom are fully specified
IP addresses, such as p₁ = 101.⋯. The patterns p₂ = 101.⋯.∗ and p₃ = 101.⋯.∗.∗, which generalize p₁'s last byte and last two bytes, are level 1 and level 2 prefixes of p₁, respectively. Such prefixes generalize an IP address. In this example, p₁ ≺ p₂ ≺ p₃, indicating that p₁ satisfies the pattern of p₂, and any IP address that satisfies p₂ also satisfies p₃. The above example refers to a single dimension (e.g., the source IP), and can be generalized to multiple dimensions (e.g., pairs of source IP and destination IP). HHH algorithms need to find the heavy hitter prefixes at each level of the induced hierarchy. For example, this enables identifying heavy hitter subnets, which may be suspected of generating a DDoS attack. The problem is formally defined in [14, 36].
Hierarchical FAST (HFAST) is derived from the algo-rithm of [36]. Specifically, the work of [36] suggests
Hierarchical Space Saving with a Heap (HSSH). In their work, the HHH prefixes are distilled from multiple solutions of plain heavy hitter problems. That is, each prefix pattern has its own separate heavy hitters algorithm that is updated on each packet arrival. For example, consider a packet whose source IP address is
101.⋯.104, where the (one dimensional) HHH measurements are carried out according to source addresses. In this case, the packet arrival is translated into five heavy hitter update operations, one per hierarchy level: the fully specified address 101.⋯.104, its three increasingly general prefixes (ending with 101.∗), and the root wildcard ∗. Finally, HHHs are identified by calculating the heavy hitters of each separate heavy hitters algorithm.
HFAST is derived by replacing the underlying heavy hitters algorithm in [36] from Space Saving with a heap [35] to FAST. This asymptotically improves the update complexity from O(H log(1/ε)) to O(H), where H is the size of the hierarchy. Since the analysis of [36] is indifferent to the internal implementation of the heavy hitters algorithm, no new analysis is required for HFAST.
Finally, we note that a hierarchical heavy hitters algorithm on sliding windows can be constructed from [36] by replacing each Space Saving instance with our WFAST. The resulting algorithm requires O(H/ε) space and O(H) update time. To our knowledge, there is no prior work for this problem.
Figure 4: Runtime comparison as a function of the accuracy guarantee (ε) provided by the algorithms ((a) SanJose14, (b) YouTube, (c) Chicago16, (d) SanJose13, (e) DC1, (f) Chicago15).
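The per-packet fan-out described above can be sketched as follows; the helper name and the fully written-out example address are ours, assuming a one-dimensional, byte-granularity source-IP hierarchy (H = 5):

```python
def prefix_updates(ip):
    """Return the five heavy hitter update keys for one source address:
    the full IP, its byte-granularity prefixes, and the root wildcard."""
    octets = ip.split('.')
    keys = [ip]
    for i in (3, 2, 1):                       # generalize 1, 2, then 3 bytes
        keys.append('.'.join(octets[:i]) + '.*')
    keys.append('*')                          # the root of the hierarchy
    return keys

print(prefix_updates('101.102.103.104'))
# ['101.102.103.104', '101.102.103.*', '101.102.*', '101.*', '*']
```

Each returned key is fed to its own heavy hitters instance on every packet arrival, which is why the per-packet update cost scales with H.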
7. EVALUATION
Our evaluation is performed on an Intel i7-5500U CPU with a clock speed of 2.4GHz, 16 GB RAM and a Windows 8.1 operating system. We compare our C++ prototypes to the following alternatives:
Count Min Sketch (CMS) [15] – a sketch based solution that can only solve the volume estimation problem.
Space Saving Heap (SSH) – a heap based implementation [12] of Space Saving [35] that has a logarithmic runtime complexity.
Hierarchical Space Saving Heap (HSSH) – a hierarchical heavy hitters algorithm [36] that uses SSH as a building block and operates in O(H log(1/ε)) time.
Full Ancestry – a trie based HHH algorithm suggested by [14], which operates in O(H log(εN)) time.
Partial Ancestry – a trie based HHH algorithm suggested by [14], which operates in O(H log(εN)) time and is considered faster than Full Ancestry.
Related work implementations were taken from open source libraries released by [11] for streams and by [36] for hierarchical heavy hitters. As we have no access to a concrete implementation of a competing sliding window protocol, we compare WFAST to Hung and Ting's algorithm [27] by conservatively estimating the space needed by their approach. Each data point we report is the average of 10 runs.
Our evaluation includes the following datasets, whose characteristics are summarized in Table 3: CAIDA backbone Internet traces that monitor links in Chicago [24, 25] and San Jose [22, 23], a datacenter trace from a large university [7], and a trace of 436K YouTube video accesses [10]. The weight of a video is its length in seconds.
As shown in Table 3, the impact of jumbo frames varies between backbone links. Yet, the weight of large packets increases over time in both. In the San Jose link, the number and volume of large packets have increased by 50% within a period of 6 months. In the Chicago link, large packets are still insignificant, but their number and volume have increased by 50% in two months.
The Effect of φ on Runtime
Recall that smaller φ yields space efficiency, while the runtime is proportional to 1/φ, i.e., smaller φ is expected to yield a slower runtime. In Appendix B, we show the runtime of FAST as a function of φ for three different ε values. While we indeed obtained a speedup with larger φ values, increasing φ beyond a certain small threshold has little impact on performance.
For the rest of our evaluation, we focus on φ = 0.25, which offers an attractive space/time trade-off, as well as on φ = 4, which yields higher performance at the expense of more space.
To explain the trade-off proposed by FAST, we measured the runtime of the various algorithms for a fixed error guarantee. Here, SSH and CMS are fully determined by the error guarantee and thus have a single measurement point each. CMS requires more counters, as it uses 10 rows of ⌈e/ε⌉ counters each, while SSH only requires 1/ε. FAST can provide the same error guarantee for different φ values, which affect both runtime and the number of counters; hence, FAST is represented by a curve. As Figure 3 shows, in all traces, allocating a few additional counters beyond the 1/ε required by SSH allows FAST to achieve higher throughput. Additionally, on all traces, FAST provides higher throughput than CMS with far fewer counters. While FAST has larger per-counter overheads than CMS, its ID to counter mapping allows it to solve the Weighted Heavy Hitters problem, which CMS cannot.
Figure 5: Space overheads of WFAST compared to previous works ((a) SanJose14, (b) YouTube, (c) Chicago16). Note that WFAST operates in constant time, while the other algorithm requires a linear scan of all counters.
Figure 6: WFAST with varying window sizes and varying ε ((a) Chicago16, (b) YouTube, (c) DC1, (d) Chicago16, (e) YouTube, (f) DC1).
Figure 7: Runtime comparison of HHH algorithms as a function of their accuracy guarantee ε ((a) SanJose14, (b) Chicago15, (c) Chicago16).
Table 3: A summary of key characteristics of the real Internet traces used in this work.
Figure 4 presents a comparative analysis of the operation speed of previous approaches. Recall that CMS is a probabilistic scheme; we configured it with a small fixed failure probability. We ran FAST with φ = 4 (4FAST) and φ = 0.
25 (0.25FAST). As can be observed, 4FAST and 0.25FAST are considerably faster than the alternatives in Chicago16 and YouTube. In SanJose14 and SanJose13, SSH is as fast as 4FAST for a large ε (small number of counters). Yet, as ε decreases and the number of counters increases, SSH becomes slower due to its logarithmic complexity. In contrast, CMS is almost workload independent. When considering only previous work, in some workloads CMS is faster than SSH, mainly because SSH's performance is workload dependent.
We compare WFAST to Hung and Ting's algorithm [27], which is the only prior algorithm that supports weighted updates on sliding windows. Figure 5 shows the memory consumption of WFAST with parameters φ = 4 and φ = 0.25 (4WFAST, 0.25WFAST) compared to Hung and Ting's algorithm. All algorithms are configured to provide the same worst-case error guarantee. As shown, WFAST is up to 100 times more space efficient than Hung and Ting's algorithm. Unfortunately, we could not obtain an implementation of Hung and Ting's algorithm and thus do not compare its runtime to that of WFAST. However, WFAST improves their update complexity from O(A/ε), where A is the average packet size, to O(1).

Figure 6 shows the operation speed of WFAST for different window sizes and different ε values. The speed depends little on the window size and on ε, with the exception of the DC1 dataset. In that dataset, the average and maximal packet sizes are similar, so the inner workings of WFAST cause overflows to become more frequent when 1/ε is close to the window size. Thus, achieving performance similar to the other traces requires a sufficiently large window size on this trace.

In Figure 7, we evaluate the speed of HFAST compared to the algorithm of [36], denoted HSSH, as well as to the Partial Ancestry and Full Ancestry algorithms of [14]. We used the library of [36] for its HSSH implementation as well as for the Partial Ancestry and Full Ancestry implementations. Since the library was released for Linux, we used a different machine for the HFAST evaluation: a Dell 730 server running Ubuntu 16.04.01, with 128GB of RAM and an Intel Xeon E5-2667 v4 @ 3.20GHz processor. We used two-dimensional source/destination hierarchies at byte granularity, where network IDs are assumed to be 8, 16, or 24 bits long. The weight of each packet is its byte volume, including both the payload and the header. As depicted, HFAST is up to 7 times faster than the best alternative and at least 2.4 times faster at every data point. For large ε values, HSSH is faster than the Partial Ancestry and Full Ancestry algorithms; yet, for small ε values, all previous algorithms operate at similar speed.

9. DISCUSSION
In this paper, we presented algorithms for estimating per-flow traffic volume in streams, sliding windows, and hierarchical domains. Our algorithms offer both asymptotic and empirical improvements for these problems.

For streams, FAST processes packets in constant time while being asymptotically space optimal. This is enabled by our novel approach of maintaining only a partial order between counters. An evaluation over real-world traffic traces yielded a speed improvement of up to 2.4X compared to previous work.

For the sliding window case, we showed that WFAST runs reasonably fast and offers a 100X reduction in required space, bringing sliding windows into the realm of possibility. For a given error of W · M · ε, WFAST requires O(1/ε) counters, while previous work uses O(A/ε), where A is the average packet size. Moreover, WFAST runs in constant time, while previous work requires O(A/ε) time per update.

For hierarchical domains, we presented HFAST, which requires O(H/ε) space and has O(H) update complexity. This improves over the O(H log(1/ε)) update complexity of previous work. Additionally, we demonstrated a speedup of 2.4X-7X on real Internet traces. To our knowledge, there is no prior work on that problem, and we plan to examine its possible applications in the future.

The code of FAST is available as open source [2]. We thank Yechiel Kimchi for helpful code optimization suggestions.
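The constant-time stream algorithm discussed above stores each flow's estimate as a small remainder plus a scaled counter, so that ordering flows only requires comparing the small counters. The following minimal sketch illustrates that layout; the function and variable names are ours, and this is a simplification of the idea, not FAST's actual implementation.

```python
# Minimal sketch of the remainder-plus-scaled-counter layout: a volume estimate
# is kept as q = r + s*c with 0 <= r < s, so comparing flows only requires the
# small integers c. Names are illustrative, not the paper's pseudocode.

def add_weight(r, c, w, s):
    """Add weight w to an estimate stored as (remainder r, counter c) with scale s."""
    total = r + w
    # By the identity a = (a mod s) + s * (a // s), the value r + s*c grows by exactly w.
    return total % s, c + total // s

s = 751  # illustrative scale; in FAST it is derived from the maximal packet size M and phi
r, c = 0, 0
for w in [64, 1500, 900]:
    r, c = add_weight(r, c, w, s)
assert r + s * c == 64 + 1500 + 900  # the layout tracks the exact total volume
```

Because c grows by a bounded number of units per update, a flow moves past only a bounded number of counter groups, which is what makes constant-time updates possible.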
10. REFERENCES

[1] Arasu, A., and Manku, G. S. Approximate counts and quantiles over sliding windows. In ACM PODS (2004).
[2] Ben-Basat, R., and Einziger, G. FAST code. Available: https://github.com/ranbenbasat/FAST.
[3] Ben-Basat, R., Einziger, G., Friedman, R., and Kassner, Y. Heavy hitters in streams and sliding windows. In IEEE INFOCOM (2016).
[4] Ben-Basat, R., Einziger, G., Friedman, R., and Kassner, Y. Randomized admission policy for efficient top-k and frequency estimation. In IEEE INFOCOM (2017).
[5] Ben-Basat, R., Einziger, G., Friedman, R., and Kassner, Y. Optimal elephant flow detection. In IEEE INFOCOM (2017).
[6] Ben Basat, R., Einziger, G., Friedman, R., Luizelli, M. C., and Waisbard, E. Constant time updates in hierarchical heavy hitters. In ACM SIGCOMM (2017).
[7] Benson, T., Akella, A., and Maltz, D. A. Network traffic characteristics of data centers in the wild. In ACM IMC (2010).
[8] Charikar, M., Chen, K., and Farach-Colton, M. Finding frequent items in data streams. In EATCS ICALP (2002).
[9] Chen, M., and Chen, S. Counter Tree: A scalable counter architecture for per-flow traffic measurement. In IEEE ICNP (2015).
[10] Cheng, X., Dale, C., and Liu, J. Statistics and social network of YouTube videos. In IWQoS (2008).
[11] Cormode, G., and Hadjieleftheriou, M. Finding frequent items in data streams. VLDB 1, 2 (2008).
[12] Cormode, G., and Hadjieleftheriou, M. Methods for finding frequent items in data streams. J. VLDB 19, 1 (2010).
[13] Cormode, G., Korn, F., Muthukrishnan, S., and Srivastava, D. Diamond in the rough: Finding hierarchical heavy hitters in multi-dimensional data. In ACM SIGMOD (2004).
[14] Cormode, G., Korn, F., Muthukrishnan, S., and Srivastava, D. Finding hierarchical heavy hitters in streaming data. ACM Trans. Knowl. Discov. Data 1, 4 (2008).
[15] Cormode, G., and Muthukrishnan, S. An improved data stream summary: The count-min sketch and its applications. J. Algorithms (2005).
[16] Dimitropoulos, X., Hurley, P., and Kind, A. Probabilistic lossy counting: An efficient algorithm for finding heavy hitters. ACM SIGCOMM CCR 38, 1 (2008).
[17] Einziger, G., Fellman, B., and Kassner, Y. Independent counter estimation buckets. In IEEE INFOCOM (2015).
[18] Einziger, G., and Friedman, R. TinyLFU: A highly efficient cache admission policy. In Euromicro PDP (2014).
[19] Einziger, G., Luizelli, M. C., and Waisbard, E. Constant time weighted frequency estimation for virtual network functionalities (2017).
[20] Garcia-Teodoro, P., Diaz-Verdejo, J. E., Macia-Fernandez, G., and Vazquez, E. Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers and Security (2009).
[21] Hershberger, J., Shrivastava, N., Suri, S., and Tóth, C. D. Space complexity of hierarchical heavy hitters in multi-dimensional data streams. In ACM PODS (2005).
[22] Hick, P. CAIDA Anonymized Internet Trace, equinix-sanjose 2013-06-19 13:00-13:05 UTC, Direction B, 2014.
[23] Hick, P. CAIDA Anonymized Internet Trace, equinix-sanjose 2013-12-19 13:00-13:05 UTC, Direction B, 2014.
[24] Hick, P. CAIDA Anonymized Internet Trace, equinix-chicago 2015-12-17 13:00-13:05 UTC, Direction A, 2015.
[25] Hick, P. CAIDA Anonymized Internet Trace, equinix-chicago 2016-02-18 13:00-13:05 UTC, Direction A, 2016.
[26] Hung, R. Y. S., Lee, L., and Ting, H. Finding frequent items over sliding windows with constant update time. Inf. Proc. Lett. 110, 7 (2010).
[27] Hung, R. Y. S., and Ting, H. F. Finding heavy hitters over the sliding window of a weighted data stream. In LATIN (2008).
[28] Kabbani, A., Alizadeh, M., Yasuda, M., Pan, R., and Prabhakar, B. AF-QCN: Approximate fairness with quantized congestion notification for multi-tenanted data centers. In IEEE HOTI (2010).
[29] Karp, R. M., Shenker, S., and Papadimitriou, C. H. A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Systems 28, 1 (2003).
[30] Lee, L., and Ting, H. F. A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In ACM PODS (2006).
[31] Li, T., Chen, S., and Ling, Y. Per-flow traffic measurement through randomized counter sharing. IEEE/ACM Trans. on Networking (2012).
[32] Lu, Y., Montanari, A., Prabhakar, B., Dharmapurikar, S., and Kabbani, A. Counter Braids: A novel counter architecture for per-flow measurement. In ACM SIGMETRICS (2008).
[33] Manerikar, N., and Palpanas, T. Frequent items in streaming data: An experimental evaluation of the state-of-the-art. Data Knowl. Eng. (2009).
[34] Manku, G. S., and Motwani, R. Approximate frequency counts over data streams. In VLDB (2002).
[35] Metwally, A., Agrawal, D., and Abbadi, A. E. Efficient computation of frequent and top-k elements in data streams. In ICDT (2005).
[36] Mitzenmacher, M., Steinke, T., and Thaler, J. Hierarchical heavy hitters with the Space Saving algorithm. In ALENEX (2012).
[37] Mukherjee, B., Heberlein, L., and Levitt, K. Network intrusion detection. IEEE Network 8, 3 (1994).
[38] Ramabhadran, S., and Varghese, G. Efficient implementation of a statistics counter architecture. In ACM SIGMETRICS (2003).
[39] Rong, Q., Zhang, G., Xie, G., and Salamatian, K. Mnemonic lossy counting: An efficient and accurate heavy-hitters identification algorithm. In IEEE IPCCC (2010).
[40] Sekar, V., Duffield, N., Spatscheck, O., van der Merwe, J., and Zhang, H. LADS: Large-scale automated DDoS detection system. In USENIX ATEC (2006).
[41] Shah, D., Iyer, S., Prabhakar, B., and McKeown, N. Maintaining statistics counters in router line cards. IEEE Micro (2002).
[42] Tsidon, E., Hanniel, I., and Keslassy, I. Estimators also need shared values to grow together. In IEEE INFOCOM (2012).
[43] Zhang, Y., Singh, S., Sen, S., Duffield, N., and Lund, C. Online identification of hierarchical heavy hitters: Algorithms, evaluation, and applications. In ACM IMC.

APPENDIX

A. MISSING PROOFS

Proof of Lemma 1
Proof. We prove that v_{x,t} ≤ q_t(x) by induction over t.

Basis: t = 0. Here, we have v_{x,t} = 0 = q_t(x).

Hypothesis: v_{x,t−1} ≤ q_{t−1}(x).

Step: ⟨x_t, w_t⟩ arrives at time t. By case analysis:

Consider the case where the queried item x is not the arriving one (i.e., x ≠ x_t). In this case, we have v_{x,t} = v_{x,t−1}. If x ∈ C_{t−1} but was evicted (Line 10), then c_x ∈ argmin_{y∈C_{t−1}}(c_{y,t−1}). This means that

  q_{t−1}(x) = r_{x,t−1} + s · min_{y∈C_{t−1}}(c_{y,t−1}) ≤ s − 1 + s · min_{y∈C_t}(c_{y,t}) = q_t(x),

where the last equality follows from the query for x ∉ C_t (Line 15). Next, if x ∈ C_{t−1} and x ∈ C_t, its estimated volume is determined by Line 13 and we get q_t(x) = q_{t−1}(x) ≥ v_{x,t−1} = v_{x,t}. If x ∉ C_{t−1} then x ∉ C_t, so the values of q_t(x), q_{t−1}(x) are determined by Line 15. Since the value of min_{y∈C} c_y can only increase over time, we have q_t(x) ≥ q_{t−1}(x) ≥ v_{x,t} and the claim holds.

On the other hand, assume that we are queried about the last item, i.e., x = x_t. In this case, we get v_{x,t} = v_{x,t−1} + w_t. We consider the following cases. First, if x ∈ C_{t−1}, then q_t(x) = q_{t−1}(x) + w_t. Using the hypothesis, we conclude that v_{x,t} = v_{x,t−1} + w_t ≤ q_{t−1}(x) + w_t = q_t(x), as required. Next, if |C_{t−1}| < C, we also have q_t(x) = q_{t−1}(x) + w_t and the above analysis holds. Finally, if x ∉ C_{t−1} and |C_{t−1}| = C, then

  q_{t−1}(x) = s − 1 + s · min_{y∈C_{t−1}} c_{y,t−1}.   (1)

On the other hand, when x arrives, the condition of Line 2 is not satisfied, and thus

  q_t(x) = r_{x,t} + s · c_{x,t}
  = (s − 1 + w) mod s + s · (min_{y∈C_{t−1}} c_{y,t−1} + ⌊(s − 1 + w)/s⌋)
  = s · min_{y∈C_{t−1}} c_{y,t−1} + s − 1 + w   (Observation 1)
  = q_{t−1}(x) + w   (by (1))
  ≥ v_{x,t−1} + w = v_{x,t}.   (induction hypothesis)

Proof of Lemma 2
Proof. Since |C| ≤ C, the conditions in Line 2 and Line 13 are always satisfied. Before the queried element x first appears, we have r_x = c_x = 0 and thus Query(x) = 0. Once x appears, it gets a counter, and upon every arrival with weight w the estimate for x increases by exactly w, since x is never evicted (eviction can only happen in Line 7).

Proof of Lemma 3
Proof. We prove the claim by induction on the stream length t.

Basis: t = 0. In this case, all counters have value 0 and thus Σ_{x∈C_t} q_t(x) = 0 = t · M · (1 + φ/2).

Hypothesis: Σ_{x∈C_{t−1}} q_{t−1}(x) ≤ (t − 1) · M · (1 + φ/2).

Step: ⟨x_t, w_t⟩ arrives at time t. We consider the following cases:

1. x ∈ C_{t−1} or |C_{t−1}| < ⌈(1 + φ)/ε⌉. In this case, the condition in Line 2 is satisfied, and thus c_{x,t} = c_{x,t−1} + ⌊(r_{x,t−1} + w)/s⌋ (Line 3) and r_{x,t} = (r_{x,t−1} + w) mod s (Line 4). By Observation 1 we get

  q_t(x) = r_{x,t} + s · c_{x,t}   (by Line 13)
  = (r_{x,t−1} + w) mod s + s · (c_{x,t−1} + ⌊(r_{x,t−1} + w)/s⌋)
  = w + s · c_{x,t−1} + r_{x,t−1} = q_{t−1}(x) + w.   (2)

Since the value of a query for every y ∈ C_t \ {x} remains unchanged, we get that

  Σ_{y∈C_t} q_t(y) = q_t(x) + Σ_{y∈C_{t−1}, y≠x} q_{t−1}(y)
  = w + Σ_{y∈C_{t−1}} q_{t−1}(y)   (by (2))
  ≤ w + (t − 1) · M · (1 + φ/2)   (induction hypothesis)
  ≤ M + (t − 1) · M · (1 + φ/2)
  ≤ t · M · (1 + φ/2).   (φ ≥ 0)

2. x ∉ C_{t−1} and |C_{t−1}| = ⌈(1 + φ)/ε⌉. In this case, the condition of Line 2 is false and therefore c_{x,t} = c_{m,t−1} + ⌊(s − 1 + w)/s⌋ (Line 8) and r_{x,t} = (s − 1 + w) mod s (Line 9). From Observation 1 and s − 1 = ⌊Mφ/2⌋ we get that

  q_t(x) = r_{x,t} + s · c_{x,t}   (by Line 13)
  = (s − 1 + w) mod s + s · (c_{m,t−1} + ⌊(s − 1 + w)/s⌋)
  = w + s · c_{m,t−1} + s − 1
  = q_{t−1}(m) − r_{m,t−1} + ⌊Mφ/2⌋ + w
  ≤ q_{t−1}(m) + ⌊Mφ/2⌋ + w.   (3)

As before, the value of a query for every y ∈ C_t \ {x} is unchanged, and since C_{t−1} \ C_t = {m},

  Σ_{y∈C_t} q_t(y) = q_t(x) − q_{t−1}(m) + Σ_{y∈C_{t−1}} q_{t−1}(y)
  ≤ ⌊Mφ/2⌋ + w + Σ_{y∈C_{t−1}} q_{t−1}(y)   (by (3))
  ≤ ⌊Mφ/2⌋ + w + (t − 1) · M · (1 + φ/2)   (induction hypothesis)
  ≤ ⌊Mφ/2⌋ + M + (t − 1) · M · (1 + φ/2)
  ≤ t · M · (1 + φ/2).

Figure 8: The effect of parameter φ on operation speed for different error guarantees (ε). φ influences the space requirement, as the algorithm is allocated ⌈(1 + φ)/ε⌉ counters. Panels: (a) SanJose14, (b) YouTube, (c) Chicago16, (d) SanJose13, (e) DC1, (f) Chicago15.

Proof of Lemma 4
Proof. First, consider the case where the stream contains at most ⌈(1 + φ)/ε⌉ distinct elements. By Lemma 2, the algorithm counts x exactly and the claim holds. Otherwise, we have seen more than ⌈(1 + φ)/ε⌉ distinct elements, and specifically

  t > ⌈(1 + φ)/ε⌉.   (4)

From Lemma 3, it follows that

  min_{y∈C_t} Query(y) ≤ t · M · (1 + φ/2) / ⌈(1 + φ)/ε⌉ ≤ t · M · ε · (1 + φ/2)/(1 + φ).   (5)

Notice that for every x ∈ C_t, Query(x) is determined in Line 13; that is, q_t(x) = r_{x,t} + s · c_{x,t}. Next, observe that an item's remainder value is bounded by s − 1, and hence

  ∀x, y ∈ C_t: q_t(x) ≥ s + q_t(y) ⟹ c_{x,t} > c_{y,t}.   (6)

By choosing y ∈ argmin_{y∈C_t} q_t(y), we get that if v_{x,t} ≥ q_t(y) + s, then q_t(x) ≥ q_t(y) + s and thus c_{x,t} > c_{y,t}. Next, we show that if v_{x,t} ≥ t · M · ε, then c_x > min_{y∈C_t} c_y and thus x will never be the "victim" in Line 7:

  q_t(x) ≥ v_{x,t} ≥ t · M · ε = t · M · ε · (1 + φ/2)/(1 + φ) + (Mφ/2) · tε/(1 + φ)
  ≥ q_t(y) + (Mφ/2) · tε/(1 + φ)   (by (5))
  > q_t(y) + Mφ/2.   (by (4))

Next, since q_t(x) and q_t(y) are integers, it follows that q_t(x) ≥ q_t(y) + ⌊M · φ/2⌋ + 1 = q_t(y) + s. Finally, we apply (6) to conclude that once x arrives with a cumulative volume of t · M · ε, it will never be evicted (Line 7) and from that moment on its volume will be measured exactly.

Proof of Lemma 5
Proof. As mentioned before, FAST utilizes the SOS data structure, which answers queries in O(1). Updates are a bit more complex, as we need to handle weights and thus may be required to move the flow more than once upon a counter increase. Whenever we wish to increase the value of a counter (Line 3 and Line 8), we need to remove the item from its current group and place it in a group that has the increased c value. This means that increasing a counter by n ∈ ℕ requires traversing at most n groups until we find the correct location. Since the remainder value is at most s − 1, the counter increase is at most ⌊(s − 1 + w)/s⌋ (Line 3 and Line 8). Finally, since s = ⌊M · φ/2⌋ + 1, we get that the counter increase is bounded by

  ⌊(⌊M · φ/2⌋ + w) / (⌊M · φ/2⌋ + 1)⌋ < 1 + w/(M · φ/2) ≤ 1 + 2/φ = O(1 + 1/φ).

B. MISSING FIGURE
Figure 8 shows a runtime evaluation of FAST as a function of φ for three different ε values. While we indeed obtained a speedup with larger φ values, increasing φ also increases the number of allocated counters and thus the space consumption.
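The speedup from larger φ is consistent with the update bound derived in the proof of Lemma 5: with s = ⌊M · φ/2⌋ + 1, an update of weight w ≤ M advances a counter by at most ⌊(s − 1 + w)/s⌋ ≤ 1 + 2/φ group moves. The following quick numeric check of that bound uses arbitrary test values for M and φ (they are our own, not from the paper):

```python
import math

# Numeric check of the counter-increase bound from the proof of Lemma 5:
# with scale s = floor(M * phi / 2) + 1, an update of weight w <= M increases
# the counter by floor((s - 1 + w) / s), which is at most 1 + 2/phi.
# The M and phi values below are arbitrary test points.
for M in (1500, 65535):
    for phi in (0.25, 1, 4):
        s = math.floor(M * phi / 2) + 1
        worst = max((s - 1 + w) // s for w in range(M + 1))
        assert worst <= 1 + 2 / phi, (M, phi, worst)
```

Larger φ thus caps the number of group traversals per update, at the cost of the additional counters shown in Figure 8's space/speed trade-off.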