Fast Flow Volume Estimation
Ran Ben Basat (Technion)
Gil Einziger (Nokia Bell Labs)
Roy Friedman (Technion)
ABSTRACT
The increasing popularity of jumbo frames means growing variance in the size of packets transmitted in modern networks. Consequently, network monitoring tools must maintain explicit traffic volume statistics rather than settle for packet counting as before. We present constant time algorithms for volume estimation in streams and sliding windows, which are faster than previous work. Our solutions are formally analyzed and extensively evaluated over multiple real-world packet traces as well as synthetic ones. For streams, we demonstrate a run-time improvement of up to 2.4X compared to the state of the art. On sliding windows, we exhibit a memory reduction of over 100X on all traces and an asymptotic runtime improvement to a constant. Finally, we apply our approach to hierarchical heavy hitters and achieve an empirical 2.4-7X speedup.
1. INTRODUCTION
Traffic measurement is vital for many network algorithms such as routing, load balancing, quality of service, caching and anomaly/intrusion detection [18, 20, 28, 37]. Typically, networking devices handle millions of flows [38, 41]. Often, monitoring applications track the most frequently appearing flows, known as heavy hitters, as their impact is most significant.
Most works on heavy hitter identification have focused on packet counting [3, 17, 42]. However, in recent years jumbo frames and large TCP packets have become increasingly popular, and so the variability in packet sizes grows. Consequently, plain packet counting may no longer serve as a good approximation for bandwidth utilization. For example, in data collected by [23] in 2014, less than 1% of the packets account for over 25% of the total traffic. Here, packet count based heavy hitters algorithms might fail to identify some heavy hitter flows in terms of bandwidth consumption.
Hence, in this paper we explicitly address monitoring of flow volume rather than plain packet counting. Further, given the rapid line rates and the high volume of accumulating data, an aging mechanism such as a sliding window is essential for ensuring data freshness and the estimation's relevance. Hence, we study estimation of flow volumes in both streams and sliding windows.
Finally, per flow measurements are not enough for certain functionalities like anomaly detection and Distributed Denial of Service (DDoS) attack detection [40, 43]. In such attacks, each attacking device only generates a small portion of the traffic and is not a heavy hitter. Yet, their combined traffic volume is overwhelming.
Hierarchical heavy hitters (HHH) aggregate traffic from IP addresses that share a common prefix [6]. In a DDoS attack, when the attacking devices share common IP prefixes, HHH can discover the attack. To that end, we consider volume based HHH detection as well.
Before explaining our contribution, let us first motivate why packet counting solutions are not easily adaptable to volume estimation. Counter algorithms typically maintain a fixed set of counters [3, 4, 16, 29, 34, 35, 39] that is considerably smaller than the number of flows. Ideally, counters are allocated to the heavy hitters. When a packet from an unmonitored flow arrives, the corresponding flow is allocated the minimal counter [35] or a counter whose value has dropped below a dynamically increasing threshold [34].
We refer to a stream in which each packet is associated with a weight as a weighted stream. Similarly, we refer to streams without weights, or where all packets receive the same weight, as unweighted. For unweighted streams, ordered data structures allow constant time updates and queries [3, 35], since when a counter is incremented, its relative order among all counters changes by at most one. Unfortunately, keeping the counters sorted after a counter increment in a weighted stream either requires searching for the counter's new location, which incurs a logarithmic cost, or resorting to logarithmic time data structures like heaps. The reason is that if a counter is incremented by some value w, its relative position might change by up to w positions. This difficulty motivates our work. The most naive approach treats a packet of size w as w consecutive arrivals of the same packet in the unweighted case, resulting in linear update times, which is even worse.
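To see the difficulty concretely, consider a toy sorted array of counter values (a hypothetical illustration of the general phenomenon, not any particular algorithm's data structure): a unit increment moves a counter past at most one neighbor, while a weighted increment can move it past many.

```python
import bisect

counters = [1, 3, 4, 7, 9]          # counter values, kept sorted ascending
old_pos = counters.index(4)

# Unweighted update: value 4 -> 5 crosses at most one neighbor,
# so a single local swap restores sortedness in O(1).
moves_unweighted = bisect.bisect_left(counters, 4 + 1) - old_pos

# Weighted update: value 4 -> 104 may cross up to w counters,
# so the new slot must be searched for (O(log n)) or kept in a heap.
moves_weighted = bisect.bisect_left(counters, 4 + 100) - old_pos

print(moves_unweighted, moves_weighted)  # 1 3
```

Here the unit increment displaces the counter by one position, while the weighted increment displaces it by three; with larger arrays and weights, the displacement grows with w.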
We contribute to the following network measurement problems: (i) stream heavy hitters, (ii) sliding window heavy hitters, (iii) stream hierarchical heavy hitters.
Specifically, our first contribution is Frequent items Algorithm with a Semi-structured Table (FAST), a novel algorithm for monitoring flow volumes and finding heavy hitters. FAST processes elements in worst case O(1) time using asymptotically optimal space. We formally prove and analyze the performance of FAST. We then evaluate FAST on 5 real Internet packet traces from a data center and backbone networks, demonstrating a 2.4X performance gain compared to previous works.
Our second contribution is Windowed Frequent items Algorithm with a Semi-structured Table (WFAST), a novel algorithm for monitoring flow volumes and finding heavy hitters in sliding windows. We evaluate WFAST on five Internet traces and show that its runtime is reasonably fast, and that it requires as little as 1% of the memory of previous work [27]. We analyze WFAST and show that it operates in constant time and is space optimal, which asymptotically improves both the runtime and the space consumption of previous work. We believe that such a dramatic improvement makes volume estimation over a sliding window practical!
Our third contribution is
Hierarchical Frequent items Algorithm with a Semi-structured Table (HFAST), which finds hierarchical heavy hitters. HFAST is created by replacing the underlying heavy hitters algorithm in [36] (Space Saving) with FAST. We evaluate HFAST and demonstrate an asymptotic update time improvement as well as an empirical 2.4-7X speedup on real Internet traces.
2. RELATED WORK
2.1 Streams
Sketches such as
Count Sketch (CS) [8] and
CountMin Sketch (CMS) [15] are attractive as they enable counter sharing and need not maintain a flow to counter mapping for all flows. Sketches typically provide only a probabilistic estimation, and often do not store flow identifiers. Thus, they cannot find the heavy hitters, but only address the volume estimation problem. Advanced sketches, such as Counter Braids [32], Randomized Counter Sharing [31] and Counter Tree [9], improve accuracy, but their queries require complex decoding.
In counter based algorithms, a flow table is maintained, but only a small number of flows are monitored. These algorithms differ from each other in the size and maintenance policy of the flow table, e.g.,
Lossy Counting [34] and its extensions [16, 39],
Frequent [29] and
Space Saving [35]. Given ideal conditions, counter algorithms are considered superior to sketch based techniques. In particular, Space Saving was empirically shown to be the most accurate [11, 12, 33]. Many counter based algorithms were developed by the databases community and are mostly suitable for software implementation. The work of [3] suggests a compact static memory implementation of Space Saving that may be more accessible for hardware design. Yet, software implementations are becoming increasingly relevant in networking as emerging technologies such as NFVs become popular. Alas, most previous works rely on sorted data structures such as
Stream Summary [35] or SAIL [3] that only operate in constant time for unweighted updates. Thus, a logarithmic time heap based implementation of Space Saving was suggested [12] for the more general volume counting problem. IM-SUM, DIM-SUM [5] and BUS-SS [19] are very recent algorithms developed for the volume heavy hitters problem (only for streams, with no sliding window support). BUS-SS is a randomized algorithm that operates in constant time. IM-SUM operates in amortized O(1) time and DIM-SUM in worst case constant time. Empirically, DIM-SUM is slower than FAST. Additionally, DIM-SUM requires 2(1+φ)/ε counters, for some φ >
0, for guaranteeing an N · M · ε error and operating in O(1/φ) time. FAST only needs half as many counters for the same time and error guarantees.
2.2 Sliding Windows
Heavy hitters on sliding windows were first studied by [1]. Given an accuracy parameter (ε), a window size (W) and a maximal increment size (M), such algorithms estimate flows' volume over the sliding window with an additive error of at most W · M · ε. Their algorithm requires O((1/ε) log(1/ε)) counters and O((1/ε) log(1/ε)) time for queries and updates. The work of [30] reduces the space requirement and update time to O(1/ε). An improved algorithm with a constant update time is given in [26]. Further, [3] provided an algorithm that answers heavy hitters queries in O(1/ε) time and supports constant time updates and item frequency queries.
The weighted variant of the problem was only studied by [27], whose algorithm operates in O(A/ε) time and requires O(A/ε) space for a W · M · ε approximation; here, A ∈ [1, M] is the average packet size in the window. In this work, we suggest an algorithm for the weighted problem that (i) uses optimal O(1/ε) space, (ii) performs heavy hitters queries in optimal O(1/ε) time, and (iii) performs volume queries and updates in constant time.
2.3 Hierarchical Heavy Hitters
Hierarchical Heavy Hitters (HHH) were addressed, e.g., in [13, 14, 21, 36, 43]. HHH algorithms monitor aggregates of flows that share a common prefix. To do so, HHH algorithms treat flow identifiers as a hierarchical domain. We denote by H the size of this domain.
The full and partial ancestry algorithms [14] are trie based algorithms that require O((H/ε) log(εN)) space and operate in O(H log(εN)) time. The state of the art algorithm [36] requires O(H/ε) space, and its update time for weighted inputs is O(H log(1/ε)).
It solves the approximate HHH problem by dividing it into multiple simpler heavy hitters problems. In our work, we replace the underlying heavy hitters algorithm of [36] with FAST, which yields a space complexity of O(H/ε) and an update complexity of O(H). That is, we improve the update complexity from O(H log(1/ε)) to O(H).
3. PRELIMINARIES
Given a set U and a positive integer M ∈ ℕ⁺, we say that S is a (U, M)-weighted stream if it is a sequence of ⟨id, weight⟩ pairs. Specifically: S = ⟨p₁, p₂, ..., p_N⟩, where ∀i ∈ {1, ..., N}: p_i ∈ U × {1, ..., M}. Given a packet p_i = (d_i, w_i), we say that d_i is p_i's id while w_i is its weight; N is the stream length, and M is the maximal packet size. Notice that the same packet id may appear multiple times in the stream, and each such occurrence may potentially be associated with a different weight. Given a (U, M)-weighted stream S, we denote by v_x, the volume of id x, the total weight of all packets with id x. That is: v_x ≜ Σ_{i ∈ {1,...,N}: d_i = x} w_i. For a window size W ∈ ℕ⁺, we denote the window volume of id x as the total weight of packets with id x among the last W packets, that is: v^W_x ≜ Σ_{i ∈ {N−W+1,...,N}: d_i = x} w_i. We seek algorithms that support the following operations:
ADD(⟨x, w⟩): append a packet with identifier x and weight w to S.
Query(x): return an estimate v̂_x of v_x.
WinQuery(x): return an estimate v̂^W_x of v^W_x.
We now formally define the main problems addressed in this work:
(ε, M)-Volume Estimation: Query(x) returns an estimate v̂_x that satisfies v_x ≤ v̂_x ≤ v_x + N · M · ε.
(W, ε, M)-Volume Estimation: WinQuery(x) returns an estimate v̂^W_x that satisfies v^W_x ≤ v̂^W_x ≤ v^W_x + W · M · ε.
(θ, ε, M)-Approximate Weighted Heavy Hitters: returns a set H ⊆ U such that ∀x ∈ U: (v_x > N · M · θ ⟹ x ∈ H) ∧ (v_x < N · M · (θ − ε) ⟹ x ∉ H).
(W, θ, ε, M)-Approximate Weighted Heavy Hitters: returns a set H ⊆ U such that ∀x ∈ U: (v^W_x > W · M · θ ⟹ x ∈ H) ∧ (v^W_x < W · M · (θ − ε) ⟹ x ∉ H).
Our heavy hitter definitions are asymmetric. That is, they require that flows whose volume is above the threshold of N · M · θ (or W · M · θ) are included in the list, but flows whose volume is slightly below the threshold can be either included or excluded. This relaxation is necessary, as it enables reducing the required amount of space to sub-linear. Let us emphasize that the identities of the heavy hitter flows are not known in advance. Hence, it is impossible to a priori allocate counters only to these flows. The basic notation used in this work is listed in Table 1.

Symbol: Meaning
S: the stream
N: number of elements in the stream
M: maximal value of an element in the stream
W: window size
U: the universe of elements
[r]: the set {0, 1, ..., r−1}
φ: FAST performance parameter
v_x: the volume of an element x in S
v̂_x: an estimate of v_x
v^W_x: the volume of element x in the last W elements of S
v̂^W_x: an estimate of v^W_x
ε: estimation accuracy parameter
θ: heavy hitters threshold parameter
Table 1: List of Symbols

Figure 1: An example of how FAST utilizes the SOS structure. Here, flows are partially ordered according to their third digit (multiples of 100), and each flow maintains its own remainder; e.g., the estimated volume of D is v̂_D = 583.
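As a concrete instance of the definitions above, the following toy computation (hypothetical ids and weights, with M = 5) evaluates v_x and v^W_x directly from the stream:

```python
# A toy (U, M)-weighted stream: <id, weight> pairs with weights in {1, ..., M}.
stream = [('a', 3), ('b', 5), ('a', 2), ('c', 4), ('a', 1)]

def volume(stream, x):
    """v_x: total weight of all packets whose id is x."""
    return sum(w for d, w in stream if d == x)

def window_volume(stream, x, W):
    """v^W_x: total weight of x's packets among the last W packets."""
    return sum(w for d, w in stream[-W:] if d == x)

print(volume(stream, 'a'))            # 3 + 2 + 1 = 6
print(window_volume(stream, 'a', 3))  # last 3 packets -> 2 + 1 = 3
```

Of course, such an exact computation stores the whole stream; the algorithms in this paper approximate these quantities in sub-linear space.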
4. FREQUENT ITEMS ALGORITHM WITH A SEMI-STRUCTURED TABLE (FAST)
In this section, we present
Frequent items Algorithm with a Semi-structured Table (FAST), a novel algorithm that achieves constant time weighted updates. FAST uses a data structure called
Semi Ordered Summary (SOS), which maintains flow entries in a semi ordered manner. That is, similarly to previous works, SOS groups flows according to their volume; each such group is called a volume group. The volume groups are maintained in an ordered list. Each volume group is associated with a value C that determines the volume of its nodes. Unlike existing data structures, counters within each volume group are kept unordered.
Unlike previous works, the grouping is done at coarse granularity. Each node (inside a group) includes a variable called Remainder (denoted R). The volume estimate of a flow is C + R, where R is the remainder of its volume node and C is the value of its volume group. This semi-ordered structure is unique to SOS and enables it to serve weighted updates in O(1). Volume queries are satisfied in constant time using a separate aggregate hash table which maps each flow identifier to its SOS node. FAST then uses SOS to find a near-minimum flow when needed.

Algorithm 1 FAST(M, ε, φ)
Initialization: C ← ∅, ∀x: c_x ← 0, r_x ← 0, s ← ⌊M · φ⌋, cap ← ⌈(1+φ)/ε⌉
function Add(Item x, Weight w)
    if x ∈ C or |C| < cap then
        c_x ← c_x + ⌊(r_x + w)/s⌋
        r_x ← (r_x + w) mod s
        C ← C ∪ {x}
    else
        Let m ∈ argmin_{y∈C}(c_y)    ▷ an arbitrary minimal item
        c_x ← c_m + ⌊(s − 1 + w)/s⌋
        r_x ← (s − 1 + w) mod s
        C ← (C \ {m}) ∪ {x}
function Query(x)
    if x ∈ C or |C| < cap then
        return r_x + s · c_x
    else
        return s − 1 + s · min_{y∈C} c_y

Figure 1 provides an intuitive example. The estimated volume of each flow is the sum of its group's value (C) and the item's remainder (R); e.g., the volume of A is 400 + 32 = 432. Flows are partially ordered according to their third digit, i.e., in multiples of 100, or M/
10. Within a specific group, however, items are unordered; e.g., A, B and J are unordered, but all appear before items with volume of at least 500. As the number of lists to skip prior to an addition is O(1), the update complexity is also O(1).
Intuitively, flows are only ordered according to volume groups, and if we make sure that the maximal weight can only advance a flow by a constant number of flow groups, then SOS operates in constant time. Alas, keeping the flows only partially ordered increases the error. We compensate for this increase by requiring a larger number of SOS entries compared to previously suggested fully ordered structures. The main challenge in realizing this idea is to analyze the accuracy impact and provide strong estimation guarantees.
FAST employs ⌈(1+φ)/ε⌉ counters, for some non-negative constant φ ≥ 0. φ determines how ordered SOS is: for φ = 0, we get full order, while for φ >
0, it is only ordered at a granularity of M · φ. The update complexity is O(1/φ), and is therefore constant for any fixed φ. We note that an Ω(1/ε) counters lower bound is known [35]. Thus, FAST is asymptotically optimal for constant φ. The pseudo code of FAST appears in Algorithm 1. We start with a simple, useful observation.
Observation 1. Let a, b ∈ ℕ. Then a = b · ⌊a/b⌋ + (a mod b).
For the analysis, we use the following notation: for every item x ∈ U and stream length t, we denote by q_t(x) the value of Query(x) after seeing t elements. We slightly abuse notation and refer to t also as the time at which the t-th element arrived, where time here is discrete. We denote by C_t the set of elements with an allocated counter at time t, by r_{x,t} the value of r_x, and by c_{x,t} the value of c_x. Also, we denote the volume at time t as v_{x,t} ≜ Σ_{i ∈ {1,...,t}: d_i = x} w_i. All missing proofs appear in Appendix A.
We now show that FAST has a one-sided error.
Lemma 1. For any t ∈ ℕ, after seeing any (U, M)-weighted stream S of length t, for any x ∈ U: v_x ≤ v̂_x.
We continue by showing that FAST is accurate when there are only a few distinct items.
Lemma 2. If the stream contains at most ⌈(1+φ)/ε⌉ distinct elements, then FAST provides an exact estimate of an item's volume upon query.
We now analyze the sum of counters in C.
Lemma 3. For any t ∈ ℕ, after seeing any (U, M)-weighted stream S of length t, FAST satisfies: Σ_{x ∈ C_t} Query(x) ≤ t · M · (1 + φ).
Next, we show a bound on FAST's estimation error.
Lemma 4. For any t ∈ ℕ, after seeing any (U, M)-weighted stream S of length t, for any x ∈ U: v̂_x ≤ v_x + t · M · ε.
Next, we prove a bound on the run time of FAST.
Lemma 5. Let φ > 0. FAST performs Add operations in O(1/φ) time.
Next, we combine Lemmas 1, 4 and 5 to conclude the correctness of the FAST algorithm.
Theorem 1. For any constant φ > 0, when allocated C ≜ ⌈(1+φ)/ε⌉ counters, FAST operates in constant time and solves the (ε, M)-Volume Estimation problem.
Finally, FAST also solves the heavy hitters problem:
Theorem 2. For any fixed φ > 0, when allocated C ≜ ⌈(1+φ)/ε⌉ counters, by returning {x ∈ U | v̂_x ≥ N · M · θ}, FAST solves the (θ, ε, M)-Approximate Weighted Heavy Hitters problem.
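To make the mechanics of Algorithm 1 concrete, here is a minimal Python sketch of FAST's update and query logic. It follows the pseudocode as reconstructed above, but it is a simplification, not the paper's implementation: the SOS structure is replaced by a plain dictionary with a linear argmin, so evictions are not O(1) here, and clamping s = ⌊M · φ⌋ to at least 1 is our assumption.

```python
import math

class FAST:
    """Sketch of FAST (Algorithm 1). Estimates are one-sided: query(x) >= v_x."""

    def __init__(self, M, eps, phi):
        self.s = max(1, math.floor(M * phi))   # group width / remainder modulus
        self.cap = math.ceil((1 + phi) / eps)  # number of counters
        self.c = {}                            # per-flow group counter c_x
        self.r = {}                            # per-flow remainder r_x

    def add(self, x, w):
        if x in self.c or len(self.c) < self.cap:
            cx, rx = self.c.get(x, 0), self.r.get(x, 0)
            self.c[x] = cx + (rx + w) // self.s
            self.r[x] = (rx + w) % self.s
        else:
            # Evict an arbitrary minimal flow; the real SOS finds it in O(1).
            m = min(self.c, key=self.c.get)
            cm = self.c.pop(m)
            del self.r[m]
            self.c[x] = cm + (self.s - 1 + w) // self.s
            self.r[x] = (self.s - 1 + w) % self.s

    def query(self, x):
        if x in self.c:
            return self.r[x] + self.s * self.c[x]
        # Unmonitored flow: upper bound via the minimal group counter.
        return self.s - 1 + self.s * min(self.c.values(), default=0)
```

For instance, with M = 100, ε = 0.5 and φ = 1, the sketch keeps ⌈(1+1)/0.5⌉ = 4 counters of width s = 100; adding a flow of volume 432 yields query = 432 exactly while it is monitored, and every estimate remains an overestimate after evictions.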
5. WINDOWED FAST (WFAST)
We now present
Windowed Frequent items Algorithm with a Semi-structured Table (WFAST), an efficient algorithm for the (W, ε, M)-Volume Estimation and (W, θ, ε, M)-Weighted Heavy Hitters problems.
We partition the stream into consecutive sequences of size W called frames. Each frame is further divided into k ≜ ⌈4/ε⌉ blocks, each of size W/k, which we assume is an integer for simplicity. Figure 2 illustrates the setting.
Figure 2: The stream is divided into intervals of size W called frames, and each frame is partitioned into k equal-sized blocks. The window of interest is also of size W, and overlaps with at most 2 frames and k+1 blocks.

k: a constant, k ≜ ⌈4/ε⌉
y: a FAST instance using k(1 + φ) counters
b: a queue of k + 1 queues; an efficient implementation appears in [3]
B: the histogram of b, implemented using a hash table
o: the offset within the current frame
Table 2: Variables used by the WFAST algorithm.

WFAST uses a FAST instance y to estimate the volume of each flow within the current frame. Once a frame ends (the stream length is divisible by W), we "flush" the instance, i.e., reset all counters and remainders to 0. Yet, we do not "forget" all information in a flush, as high volume flows are recorded in a dedicated data structure. Specifically, we say that an element x overflowed at time t if ⌊q_t(x)/(MW/k)⌋ > ⌊q_{t−1}(x)/(MW/k)⌋. We use a queue of queues structure b to keep track of which elements have overflowed in each block. That is, each node of the main queue represents a block and contains a queue of all elements that overflowed in its block; specifically, the secondary queues maintain the ids of overflowing elements. Once a block ends, we remove the oldest block's node (queue) from the main queue and initialize a new queue for the starting block. Finally, we answer queries about the window volume of an item x by multiplying its overflow count by MW/k, adding the residual count from y (i.e., the part that is not recorded in b), plus 2MW/k to ensure an overestimate.
For O(1) time queries, we also maintain a hash table B that tracks the overflow count of each item. That is, for each element x, B[x] contains the number of times x is recorded in b. Since multiple items may overflow in the same block, we cannot update B in constant time once a block ends. We address this issue by deamortizing B's update: on each arrival, we remove a single item from the queue of the oldest block (if such an item exists).
The pseudo code of WFAST appears in Algorithm 2, and its variables are described in Table 2. An efficient implementation of the queue of queues b is described in [3].
Algorithm 2 WFAST(W, M, φ)
Initialization: y ← FAST(M, 4/k, φ), o ← 0, B ← empty hash table, b ← queue of k + 1 empty queues
function Add(Item x, Weight w)
    o ← (o + 1) mod W
    if o = 0 then    ▷ a new frame starts
        y.flush()
    if o mod (W/k) = 0 then    ▷ a new block starts
        b.pop()
        b.append(new empty queue)
    if b.tail is not empty then    ▷ remove the oldest item
        oldID ← b.tail.pop()
        B[oldID] ← B[oldID] − 1
        if B[oldID] = 0 then B.remove(oldID)
    prevOverflowCount ← ⌊y.Query(x)/(MW/k)⌋
    y.Add(x, w)    ▷ add the item
    if ⌊y.Query(x)/(MW/k)⌋ > prevOverflowCount then    ▷ overflow
        b.head.push(x)
        if B.contains(x) then B[x] ← B[x] + 1
        else B[x] ← 1    ▷ adding x to B
function WinQuery(Item x)
    if B.contains(x) then
        return (MW/k) · (B[x] + 2) + (y.Query(x) mod MW/k)
    else    ▷ x has no overflows
        return 2MW/k + y.Query(x)

We now introduce the notation used in this section. We mark the queried element by x and the current time by W + o, and assume that element W + 1 is the first element of the current frame. For convenience, denote v_x(t₁, t₂) ≜ Σ_{i ∈ {t₁,...,t₂}: d_i = x} w_i, i.e., the volume of x between times t₁ and t₂. The goal is then to approximate the window volume of x, which is defined as v^W_x ≜ v_x(o + 1, W + o), i.e., the sum of the weights of x's packets among the timestamps ⟨o + 1, o + 2, ..., W + o⟩. We next state the main correctness theorem for WFAST.
Theorem 3. Algorithm 2 solves the (W, ε, M)-Volume Estimation problem.
Due to lack of space, the proof of the theorem appears in the Appendix. As a corollary, Algorithm 2 can also find heavy hitters.
Theorem 4. By returning all items x ∈ U for which v̂^W_x ≥ M · W · θ, Algorithm 2 solves the (W, θ, ε, M)-Weighted Heavy Hitters problem.
WFAST runtime analysis:
As listed in the pseudo code of WFAST (Algorithm 2) and the description above, processing a new element requires adding it to the FAST instance y, which takes O(1/φ) time, plus another O(1) operations. Query processing consists of O(1) operations and hash table accesses. For returning the heavy hitters, we go over all items with allocated counters in O((1+φ)/ε) time. In summary, we get the following theorem:
Theorem 5. For any fixed φ > 0, WFAST processes new elements and answers window-volume queries in constant time, while finding the window's weighted heavy hitters in O(1/ε) time.
Figure 3: Runtime comparison for a given error guarantee ((a) SanJose14, (b) YouTube, (c) Chicago16, (d) SanJose13, (e) DC1, (f) Chicago15). All algorithms provide the same guarantees, and FAST uses different φ values to show the speedup gained from allocating additional counters.
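To illustrate the frame/block bookkeeping of Algorithm 2 in isolation, the sketch below replaces the FAST instance y with an exact per-frame counter, so only the queue-of-queues mechanics is shown; the class and variable names are ours, and it assumes k divides W.

```python
from collections import deque, defaultdict

class WindowSketch:
    """Bookkeeping sketch of WFAST (Algorithm 2) with an exact per-frame
    counter standing in for the FAST instance y. win_query(x) upper-bounds
    the volume of x within the last W packets."""

    def __init__(self, W, M, k):
        self.W, self.k = W, k
        self.unit = M * W // k                    # overflow unit MW/k
        self.frame = defaultdict(int)             # exact stand-in for y
        self.o = 0                                # offset within current frame
        self.b = deque(deque() for _ in range(k + 1))  # queue of k+1 queues
        self.B = defaultdict(int)                 # histogram of b

    def add(self, x, w):
        self.o = (self.o + 1) % self.W
        if self.o == 0:                           # a new frame starts: flush y
            self.frame.clear()
        if self.o % (self.W // self.k) == 0:      # a new block starts
            self.b.pop()                          # drop the oldest block's queue
            self.b.appendleft(deque())
        if self.b[-1]:                            # deamortized cleanup
            old = self.b[-1].popleft()
            self.B[old] -= 1
            if self.B[old] == 0:
                del self.B[old]
        prev = self.frame[x] // self.unit
        self.frame[x] += w
        if self.frame[x] // self.unit > prev:     # overflow: record x's id
            self.b[0].append(x)
            self.B[x] += 1

    def win_query(self, x):
        # overflows * MW/k + residual count + 2MW/k slack for overestimation
        return self.unit * (self.B.get(x, 0) + 2) + self.frame.get(x, 0) % self.unit
```

For example, with W = 8, M = 10 and k = 4 (unit MW/k = 20), two 10-byte packets of the same flow trigger one overflow, and the query answer 20·(1+2) = 60 upper-bounds the true window volume of 20, matching the overestimation guarantee.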
6. HIERARCHICAL HEAVY HITTERS
Hierarchical heavy hitters (HHH) algorithms treat IP addresses as a hierarchical domain. At the bottom are fully specified
IP addresses, such as p₁ = 101.⋯. The patterns p₂ = 101.⋯.∗ and p₃ = 101.⋯.∗.∗, which generalize p₁'s last byte and last two bytes, are level 1 and level 2 prefixes of p₁, respectively. Such prefixes generalize an IP address. In this example, p₁ ≺ p₂ ≺ p₃, indicating that p₁ satisfies the pattern of p₂, and any IP address that satisfies p₂ also satisfies p₃. The above example refers to a single dimension (e.g., the source IP), and can be generalized to multiple dimensions (e.g., pairs of source IP and destination IP). HHH algorithms need to find the heavy hitter prefixes at each level of the induced hierarchy. For example, this enables identifying heavy hitter subnets, which may be suspected of generating a DDoS attack. The problem is formally defined in [14, 36].
Hierarchical FAST (HFAST) is derived from the algo-rithm of [36]. Specifically, the work of [36] suggests
Hierarchical Space Saving with a Heap (HSSH). In their work, the HHH prefixes are distilled from multiple solutions of plain heavy hitter problems. That is, each prefix pattern has its own separate heavy hitters algorithm that is updated on each packet arrival. For example, consider a packet whose source IP address is
101.⋯.104, where the (one dimensional) HHH measurements are carried out according to source addresses. In this case, the packet arrival is translated into five heavy hitter update operations, one per hierarchy level: the fully specified address 101.⋯.104, its three increasingly general prefixes (ending with 101.∗), and the root wildcard ∗. Finally, HHHs are identified by calculating the heavy hitters of each separate heavy hitters algorithm.
HFAST is derived by replacing the underlying heavy hitters algorithm in [36] from Space Saving with a heap [35] to FAST. This asymptotically improves the update complexity from O(H log(1/ε)) to O(H), where H is the size of the hierarchy. Since the analysis of [36] is indifferent to the internal implementation of the heavy hitters algorithm, no new analysis is required for HFAST.
Finally, we note that a hierarchical heavy hitters algorithm on sliding windows can be constructed from [36] by replacing each Space Saving instance with our WFAST. The resulting algorithm requires O(H/ε) space and O(H) update time. To our knowledge, there is no prior work for this problem.
Figure 4: Runtime comparison as a function of the accuracy guarantee (ε) provided by the algorithms ((a) SanJose14, (b) YouTube, (c) Chicago16, (d) SanJose13, (e) DC1, (f) Chicago15).
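The per-packet fan-out described above can be sketched as follows; the helper name and the fully written-out example address are ours, assuming a one-dimensional, byte-granularity source-IP hierarchy (H = 5):

```python
def prefix_updates(ip):
    """Return the five heavy hitter update keys for one source address:
    the full IP, its byte-granularity prefixes, and the root wildcard."""
    octets = ip.split('.')
    keys = [ip]
    for i in (3, 2, 1):                       # generalize 1, 2, then 3 bytes
        keys.append('.'.join(octets[:i]) + '.*')
    keys.append('*')                          # the root of the hierarchy
    return keys

print(prefix_updates('101.102.103.104'))
# ['101.102.103.104', '101.102.103.*', '101.102.*', '101.*', '*']
```

Each returned key is fed to its own heavy hitters instance on every packet arrival, which is why the per-packet update cost scales with H.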
7. EVALUATION
Our evaluation is performed on an Intel i7-5500U CPU with a clock speed of 2.4GHz, 16 GB RAM and a Windows 8.1 operating system. We compare our C++ prototypes to the following alternatives:
Count Min Sketch (CMS) [15] – a sketch based solution that can only solve the volume estimation problem.
Space Saving Heap (SSH) – a heap based implementation [12] of Space Saving [35] that has a logarithmic runtime complexity.
Hierarchical Space Saving Heap (HSSH) – a hierarchical heavy hitters algorithm [36] that uses SSH as a building block and operates in O(H log(1/ε)) time.
Full Ancestry – a trie based HHH algorithm suggested by [14], which operates in O(H log(εN)) time.
Partial Ancestry – a trie based HHH algorithm suggested by [14], which operates in O(H log(εN)) time and is considered faster than Full Ancestry.
Related work implementations were taken from open source libraries released by [11] for streams and by [36] for hierarchical heavy hitters. As we have no access to a concrete implementation of a competing sliding window protocol, we compare WFAST to Hung and Ting's algorithm [27] by conservatively estimating the space needed by their approach. Each data point we report is the average of 10 runs.
Our evaluation includes the following datasets, whose characteristics are summarized in Table 3: CAIDA backbone Internet traces that monitor links in Chicago [24, 25] and San Jose [22, 23], a datacenter trace from a large university [7], and a trace of 436K YouTube video accesses [10]. The weight of a video is its length in seconds.
As shown in Table 3, the impact of jumbo frames varies between backbone links. Yet, the weight of large packets increases over time in both. In the San Jose link, the number and volume of large packets have increased by 50% within a period of 6 months. In the Chicago link, large packets are still insignificant, but their number and volume have increased by 50% in two months.
The Effect of φ on Runtime
Recall that smaller φ yields space efficiency, while the runtime is proportional to 1/φ, i.e., smaller φ is expected to yield a slower runtime. In Appendix B, we show the runtime of FAST as a function of φ for three different ε values. While we indeed obtained a speedup with larger φ values, increasing φ beyond a certain small threshold has little impact on performance.
For the rest of our evaluation, we focus on φ = 0.25, which offers an attractive space/time trade-off, as well as on φ = 4, which yields higher performance at the expense of more space.
To explain the trade-off proposed by FAST, we measured the runtime of the various algorithms for a fixed error guarantee. Here, SSH and CMS are fully determined by the error guarantee and thus have a single measurement point each. CMS requires more counters, as it uses 10 rows of ⌈e/ε⌉ counters each, while SSH only requires 1/ε. FAST can provide the same error guarantee for different φ values, which affect both runtime and the number of counters; hence, FAST is represented by a curve. As Figure 3 shows, in all traces, allocating a few additional counters beyond the 1/ε required by SSH allows FAST to achieve higher throughput. Additionally, on all traces, FAST provides higher throughput than CMS with far fewer counters. While FAST has larger per-counter overheads than CMS, its ID to counter mapping allows it to solve the Weighted Heavy Hitters problem, which CMS cannot.
Figure 5: Space overheads of WFAST compared to previous works ((a) SanJose14, (b) YouTube, (c) Chicago16). Note that WFAST operates in constant time, while the other algorithm requires a linear scan of all counters.
Figure 6: WFAST with varying window sizes and varying ε ((a) Chicago16, (b) YouTube, (c) DC1, (d) Chicago16, (e) YouTube, (f) DC1).
Figure 7: Runtime comparison of HHH algorithms as a function of their accuracy guarantee ε ((a) SanJose14, (b) Chicago15, (c) Chicago16).
Table 3: A summary of key characteristics of the real Internet traces used in this work.
Figure 4 presents a comparative analysis of the operation speed of previous approaches. Recall that CMS is a probabilistic scheme; we configured it with a small fixed failure probability. We ran FAST with φ = 4 (4FAST) and φ = 0.
25 (0.25FAST). As can be observed, 4FAST and 0.25FAST are considerably faster than the alternatives in Chicago16 and YouTube. In SanJose14 and SanJose13, SSH is as fast as 4FAST for a large ε (small number of counters). Yet, as ε decreases and the number of counters increases, SSH becomes slower due to its logarithmic complexity. In contrast, CMS is almost workload independent. When considering only previous work, in some workloads CMS is faster than SSH, mainly because SSH's performance is workload dependent.
We compare WFAST to Hung and Ting's algorithm [27], which is the only prior algorithm that supports weighted updates on sliding windows. Figure 5 shows the memory consumption of WFAST with parameters φ = 4 and φ = 0.25 (4WFAST, 0.25WFAST) compared to Hung and Ting's algorithm. All algorithms are configured to provide the same worst-case error guarantee. As shown, WFAST is up to 100 times more space efficient than Hung and Ting's algorithm. Unfortunately, we could not obtain an implementation of Hung and Ting's algorithm and thus do not compare its runtime to that of WFAST. However, WFAST improves their update complexity from O(A/ε), where A is the average packet size, to O(1).

Figure 6 shows the operation speed of WFAST for different window sizes and different ε values. The speed depends little on the window size and on ε, with the exception of the DC1 dataset. In that dataset, the average and maximal packet sizes are similar, so the inner workings of WFAST cause overflows to become more frequent when 1/ε is close to the window size. Thus, achieving performance similar to the other traces requires a sufficiently large window size on this trace.

In Figure 7, we evaluate the speed of HFAST compared to the algorithm of [36], denoted HSSH, as well as to the Partial Ancestry and Full Ancestry algorithms of [14]. We used the library of [36] for its HSSH implementation as well as for the Partial Ancestry and Full Ancestry implementations. Since the library was released for Linux, we used a different machine for the HFAST evaluation: a Dell 730 server running Ubuntu 16.04.01, with 128GB of RAM and an Intel Xeon E5-2667 v4 @ 3.20GHz processor. We used two-dimensional source/destination hierarchies at byte granularity, where network IDs are assumed to be 8, 16, or 24 bits long. The weight of each packet is its byte volume, including both the payload and the header. As depicted, HFAST is up to 7 times faster than the best alternative and at least 2.4 times faster at every data point. For large ε values, HSSH is faster than the Partial Ancestry and Full Ancestry algorithms; yet, for small ε values, all previous algorithms operate at similar speed.

9. DISCUSSION
In this paper, we presented algorithms for estimating per-flow traffic volume in streams, sliding windows, and hierarchical domains. Our algorithms offer both asymptotic and empirical improvements for these problems.

For streams, FAST processes packets in constant time while being asymptotically space optimal. This is enabled by our novel approach of maintaining only a partial order between counters. An evaluation over real-world traffic traces yielded a speed improvement of up to 2.4X compared to previous work.

For the sliding window case, we showed that WFAST runs reasonably fast and offers a 100X reduction in required space, bringing sliding windows into the realm of possibility. For a given error of W · M · ε, WFAST requires O(1/ε) counters, while previous work uses O(A/ε), where A is the average packet size. Moreover, WFAST runs in constant time, while previous work requires O(A/ε) time per update.

For hierarchical domains, we presented HFAST, which requires O(H/ε) space and has O(H) update complexity. This improves over the O(H log(1/ε)) update complexity of previous work. Additionally, we demonstrated a speedup of 2.4X-7X on real Internet traces. To our knowledge, there is no prior work on that problem, and we plan to examine its possible applications in the future.

The code of FAST is available as open source [2]. We thank Yechiel Kimchi for helpful code optimization suggestions.
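The constant-time stream algorithm discussed above stores each flow's estimate as a small remainder plus a scaled counter, so that ordering flows only requires comparing the small counters. The following minimal sketch illustrates that layout; the function and variable names are ours, and this is a simplification of the idea, not FAST's actual implementation.

```python
# Minimal sketch of the remainder-plus-scaled-counter layout: a volume estimate
# is kept as q = r + s*c with 0 <= r < s, so comparing flows only requires the
# small integers c. Names are illustrative, not the paper's pseudocode.

def add_weight(r, c, w, s):
    """Add weight w to an estimate stored as (remainder r, counter c) with scale s."""
    total = r + w
    # By the identity a = (a mod s) + s * (a // s), the value r + s*c grows by exactly w.
    return total % s, c + total // s

s = 751  # illustrative scale; in FAST it is derived from the maximal packet size M and phi
r, c = 0, 0
for w in [64, 1500, 900]:
    r, c = add_weight(r, c, w, s)
assert r + s * c == 64 + 1500 + 900  # the layout tracks the exact total volume
```

Because c grows by a bounded number of units per update, a flow moves past only a bounded number of counter groups, which is what makes constant-time updates possible.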
10. REFERENCES

[1] Arasu, A., and Manku, G. S. Approximate counts and quantiles over sliding windows. In ACM PODS (2004).
[2] Ben-Basat, R., and Einziger, G. FAST code. Available: https://github.com/ranbenbasat/FAST.
[3] Ben-Basat, R., Einziger, G., Friedman, R., and Kassner, Y. Heavy hitters in streams and sliding windows. In IEEE INFOCOM (2016).
[4] Ben-Basat, R., Einziger, G., Friedman, R., and Kassner, Y. Randomized admission policy for efficient top-k and frequency estimation. In IEEE INFOCOM (2017).
[5] Ben-Basat, R., Einziger, G., Friedman, R., and Kassner, Y. Optimal elephant flow detection. In IEEE INFOCOM (2017).
[6] Ben Basat, R., Einziger, G., Friedman, R., Luizelli, M. C., and Waisbard, E. Constant time updates in hierarchical heavy hitters. In ACM SIGCOMM (2017).
[7] Benson, T., Akella, A., and Maltz, D. A. Network traffic characteristics of data centers in the wild. In ACM IMC (2010).
[8] Charikar, M., Chen, K., and Farach-Colton, M. Finding frequent items in data streams. In EATCS ICALP (2002).
[9] Chen, M., and Chen, S. Counter Tree: A scalable counter architecture for per-flow traffic measurement. In IEEE ICNP (2015).
[10] Cheng, X., Dale, C., and Liu, J. Statistics and social network of YouTube videos. In IWQoS (2008).
[11] Cormode, G., and Hadjieleftheriou, M. Finding frequent items in data streams. VLDB 1, 2 (2008).
[12] Cormode, G., and Hadjieleftheriou, M. Methods for finding frequent items in data streams. J. VLDB 19, 1 (2010).
[13] Cormode, G., Korn, F., Muthukrishnan, S., and Srivastava, D. Diamond in the rough: Finding hierarchical heavy hitters in multi-dimensional data. In ACM SIGMOD (2004).
[14] Cormode, G., Korn, F., Muthukrishnan, S., and Srivastava, D. Finding hierarchical heavy hitters in streaming data. ACM Trans. Knowl. Discov. Data 1, 4 (2008).
[15] Cormode, G., and Muthukrishnan, S. An improved data stream summary: The count-min sketch and its applications. J. Algorithms (2005).
[16] Dimitropoulos, X., Hurley, P., and Kind, A. Probabilistic lossy counting: An efficient algorithm for finding heavy hitters. ACM SIGCOMM CCR 38, 1 (2008).
[17] Einziger, G., Fellman, B., and Kassner, Y. Independent counter estimation buckets. In IEEE INFOCOM (2015).
[18] Einziger, G., and Friedman, R. TinyLFU: A highly efficient cache admission policy. In Euromicro PDP (2014).
[19] Einziger, G., Luizelli, M. C., and Waisbard, E. Constant time weighted frequency estimation for virtual network functionalities (2017).
[20] Garcia-Teodoro, P., Diaz-Verdejo, J. E., Macia-Fernandez, G., and Vazquez, E. Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers and Security (2009).
[21] Hershberger, J., Shrivastava, N., Suri, S., and Tóth, C. D. Space complexity of hierarchical heavy hitters in multi-dimensional data streams. In ACM PODS (2005).
[22] Hick, P. CAIDA Anonymized Internet Trace, equinix-sanjose 2013-06-19 13:00-13:05 UTC, Direction B, 2014.
[23] Hick, P. CAIDA Anonymized Internet Trace, equinix-sanjose 2013-12-19 13:00-13:05 UTC, Direction B, 2014.
[24] Hick, P. CAIDA Anonymized Internet Trace, equinix-chicago 2015-12-17 13:00-13:05 UTC, Direction A, 2015.
[25] Hick, P. CAIDA Anonymized Internet Trace, equinix-chicago 2016-02-18 13:00-13:05 UTC, Direction A, 2016.
[26] Hung, R. Y. S., Lee, L., and Ting, H. Finding frequent items over sliding windows with constant update time. Inf. Proc. Lett. 110, 7 (2010).
[27] Hung, R. Y. S., and Ting, H. F. Finding heavy hitters over the sliding window of a weighted data stream. In LATIN (2008).
[28] Kabbani, A., Alizadeh, M., Yasuda, M., Pan, R., and Prabhakar, B. AF-QCN: Approximate fairness with quantized congestion notification for multi-tenanted data centers. In IEEE HOTI (2010).
[29] Karp, R. M., Shenker, S., and Papadimitriou, C. H. A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Systems 28, 1 (2003).
[30] Lee, L., and Ting, H. F. A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In ACM PODS (2006).
[31] Li, T., Chen, S., and Ling, Y. Per-flow traffic measurement through randomized counter sharing. IEEE/ACM Trans. on Networking (2012).
[32] Lu, Y., Montanari, A., Prabhakar, B., Dharmapurikar, S., and Kabbani, A. Counter Braids: A novel counter architecture for per-flow measurement. In ACM SIGMETRICS (2008).
[33] Manerikar, N., and Palpanas, T. Frequent items in streaming data: An experimental evaluation of the state-of-the-art. Data Knowl. Eng. (2009).
[34] Manku, G. S., and Motwani, R. Approximate frequency counts over data streams. In VLDB (2002).
[35] Metwally, A., Agrawal, D., and Abbadi, A. E. Efficient computation of frequent and top-k elements in data streams. In ICDT (2005).
[36] Mitzenmacher, M., Steinke, T., and Thaler, J. Hierarchical heavy hitters with the Space Saving algorithm. In ALENEX (2012).
[37] Mukherjee, B., Heberlein, L., and Levitt, K. Network intrusion detection. IEEE Network 8, 3 (1994).
[38] Ramabhadran, S., and Varghese, G. Efficient implementation of a statistics counter architecture. In ACM SIGMETRICS (2003).
[39] Rong, Q., Zhang, G., Xie, G., and Salamatian, K. Mnemonic lossy counting: An efficient and accurate heavy-hitters identification algorithm. In IEEE IPCCC (2010).
[40] Sekar, V., Duffield, N., Spatscheck, O., van der Merwe, J., and Zhang, H. LADS: Large-scale automated DDoS detection system. In USENIX ATEC (2006).
[41] Shah, D., Iyer, S., Prabhakar, B., and McKeown, N. Maintaining statistics counters in router line cards. IEEE Micro (2002).
[42] Tsidon, E., Hanniel, I., and Keslassy, I. Estimators also need shared values to grow together. In IEEE INFOCOM (2012).
[43] Zhang, Y., Singh, S., Sen, S., Duffield, N., and Lund, C. Online identification of hierarchical heavy hitters: Algorithms, evaluation, and applications. In ACM IMC.

APPENDIX

A. MISSING PROOFS

Proof of Lemma 1
Proof. We prove that v_{x,t} ≤ q_t(x) by induction over t.

Basis: t = 0. Here, we have v_{x,t} = 0 = q_t(x).

Hypothesis: v_{x,t−1} ≤ q_{t−1}(x).

Step: ⟨x_t, w_t⟩ arrives at time t. By case analysis:

Consider the case where the queried item x is not the arriving one (i.e., x ≠ x_t). In this case, we have v_{x,t} = v_{x,t−1}. If x ∈ C_{t−1} but was evicted (Line 10), then c_x ∈ argmin_{y∈C_{t−1}}(c_{y,t−1}). This means that

  q_{t−1}(x) = r_{x,t−1} + s · min_{y∈C_{t−1}}(c_{y,t−1}) ≤ s − 1 + s · min_{y∈C_t}(c_{y,t}) = q_t(x),

where the last equality follows from the query for x ∉ C_t (Line 15). Next, if x ∈ C_{t−1} and x ∈ C_t, its estimated volume is determined by Line 13 and we get q_t(x) = q_{t−1}(x) ≥ v_{x,t−1} = v_{x,t}. If x ∉ C_{t−1} then x ∉ C_t, so the values of q_t(x), q_{t−1}(x) are determined by Line 15. Since the value of min_{y∈C} c_y can only increase over time, we have q_t(x) ≥ q_{t−1}(x) ≥ v_{x,t} and the claim holds.

On the other hand, assume that we are queried about the last item, i.e., x = x_t. In this case, we get v_{x,t} = v_{x,t−1} + w_t. We consider the following cases. First, if x ∈ C_{t−1}, then q_t(x) = q_{t−1}(x) + w_t. Using the hypothesis, we conclude that v_{x,t} = v_{x,t−1} + w_t ≤ q_{t−1}(x) + w_t = q_t(x), as required. Next, if |C_{t−1}| < C, we also have q_t(x) = q_{t−1}(x) + w_t and the above analysis holds. Finally, if x ∉ C_{t−1} and |C_{t−1}| = C, then

  q_{t−1}(x) = s − 1 + s · min_{y∈C_{t−1}} c_{y,t−1}.   (1)

On the other hand, when x arrives, the condition of Line 2 is not satisfied, and thus

  q_t(x) = r_{x,t} + s · c_{x,t}
  = (s − 1 + w) mod s + s · (min_{y∈C_{t−1}} c_{y,t−1} + ⌊(s − 1 + w)/s⌋)
  = s · min_{y∈C_{t−1}} c_{y,t−1} + s − 1 + w   (Observation 1)
  = q_{t−1}(x) + w   (by (1))
  ≥ v_{x,t−1} + w = v_{x,t}.   (induction hypothesis)

Proof of Lemma 2
Proof. Since |C| ≤ C, the conditions in Line 2 and Line 13 are always satisfied. Before the queried element x first appears, we have r_x = c_x = 0 and thus Query(x) = 0. Once x appears, it gets a counter, and upon every arrival with weight w the estimate for x increases by exactly w, since x is never evicted (eviction can only happen in Line 7).

Proof of Lemma 3
Proof. We prove the claim by induction on the stream length t.

Basis: t = 0. In this case, all counters have value 0 and thus Σ_{x∈C_t} q_t(x) = 0 = t · M · (1 + φ/2).

Hypothesis: Σ_{x∈C_{t−1}} q_{t−1}(x) ≤ (t − 1) · M · (1 + φ/2).

Step: ⟨x_t, w_t⟩ arrives at time t. We consider the following cases:

1. x ∈ C_{t−1} or |C_{t−1}| < ⌈(1 + φ)/ε⌉. In this case, the condition in Line 2 is satisfied, and thus c_{x,t} = c_{x,t−1} + ⌊(r_{x,t−1} + w)/s⌋ (Line 3) and r_{x,t} = (r_{x,t−1} + w) mod s (Line 4). By Observation 1 we get

  q_t(x) = r_{x,t} + s · c_{x,t}   (by Line 13)
  = (r_{x,t−1} + w) mod s + s · (c_{x,t−1} + ⌊(r_{x,t−1} + w)/s⌋)
  = w + s · c_{x,t−1} + r_{x,t−1} = q_{t−1}(x) + w.   (2)

Since the value of a query for every y ∈ C_t \ {x} remains unchanged, we get that

  Σ_{y∈C_t} q_t(y) = q_t(x) + Σ_{y∈C_{t−1}, y≠x} q_{t−1}(y)
  = w + Σ_{y∈C_{t−1}} q_{t−1}(y)   (by (2))
  ≤ w + (t − 1) · M · (1 + φ/2)   (induction hypothesis)
  ≤ M + (t − 1) · M · (1 + φ/2)
  ≤ t · M · (1 + φ/2).   (φ ≥ 0)

2. x ∉ C_{t−1} and |C_{t−1}| = ⌈(1 + φ)/ε⌉. In this case, the condition of Line 2 is false and therefore c_{x,t} = c_{m,t−1} + ⌊(s − 1 + w)/s⌋ (Line 8) and r_{x,t} = (s − 1 + w) mod s (Line 9). From Observation 1 and s − 1 = ⌊Mφ/2⌋ we get that

  q_t(x) = r_{x,t} + s · c_{x,t}   (by Line 13)
  = (s − 1 + w) mod s + s · (c_{m,t−1} + ⌊(s − 1 + w)/s⌋)
  = w + s · c_{m,t−1} + s − 1
  = q_{t−1}(m) − r_{m,t−1} + ⌊Mφ/2⌋ + w
  ≤ q_{t−1}(m) + ⌊Mφ/2⌋ + w.   (3)

As before, the value of a query for every y ∈ C_t \ {x} is unchanged, and since C_{t−1} \ C_t = {m},

  Σ_{y∈C_t} q_t(y) = q_t(x) − q_{t−1}(m) + Σ_{y∈C_{t−1}} q_{t−1}(y)
  ≤ ⌊Mφ/2⌋ + w + Σ_{y∈C_{t−1}} q_{t−1}(y)   (by (3))
  ≤ ⌊Mφ/2⌋ + w + (t − 1) · M · (1 + φ/2)   (induction hypothesis)
  ≤ ⌊Mφ/2⌋ + M + (t − 1) · M · (1 + φ/2)
  ≤ t · M · (1 + φ/2).

Figure 8: The effect of parameter φ on operation speed for different error guarantees (ε). φ influences the space requirement, as the algorithm is allocated ⌈(1 + φ)/ε⌉ counters. Panels: (a) SanJose14, (b) YouTube, (c) Chicago16, (d) SanJose13, (e) DC1, (f) Chicago15.

Proof of Lemma 4
Proof. First, consider the case where the stream contains at most ⌈(1 + φ)/ε⌉ distinct elements. By Lemma 2, the algorithm counts x exactly and the claim holds. Otherwise, we have seen more than ⌈(1 + φ)/ε⌉ distinct elements, and specifically

  t > ⌈(1 + φ)/ε⌉.   (4)

From Lemma 3, it follows that

  min_{y∈C_t} Query(y) ≤ t · M · (1 + φ/2) / ⌈(1 + φ)/ε⌉ ≤ t · M · ε · (1 + φ/2)/(1 + φ).   (5)

Notice that for every x ∈ C_t, Query(x) is determined in Line 13; that is, q_t(x) = r_{x,t} + s · c_{x,t}. Next, observe that an item's remainder value is bounded by s − 1, and hence

  ∀x, y ∈ C_t: q_t(x) ≥ s + q_t(y) ⟹ c_{x,t} > c_{y,t}.   (6)

By choosing y ∈ argmin_{y∈C_t} q_t(y), we get that if v_{x,t} ≥ q_t(y) + s, then q_t(x) ≥ q_t(y) + s and thus c_{x,t} > c_{y,t}. Next, we show that if v_{x,t} ≥ t · M · ε, then c_x > min_{y∈C_t} c_y and thus x will never be the "victim" in Line 7:

  q_t(x) ≥ v_{x,t} ≥ t · M · ε = t · M · ε · (1 + φ/2)/(1 + φ) + (Mφ/2) · tε/(1 + φ)
  ≥ q_t(y) + (Mφ/2) · tε/(1 + φ)   (by (5))
  > q_t(y) + Mφ/2.   (by (4))

Next, since q_t(x) and q_t(y) are integers, it follows that q_t(x) ≥ q_t(y) + ⌊M · φ/2⌋ + 1 = q_t(y) + s. Finally, we apply (6) to conclude that once x arrives with a cumulative volume of t · M · ε, it will never be evicted (Line 7) and from that moment on its volume will be measured exactly.

Proof of Lemma 5
Proof. As mentioned before, FAST utilizes the SOS data structure, which answers queries in O(1). Updates are a bit more complex, as we need to handle weights and thus may be required to move the flow more than once upon a counter increase. Whenever we wish to increase the value of a counter (Line 3 and Line 8), we need to remove the item from its current group and place it in a group that has the increased c value. This means that increasing a counter by n ∈ ℕ requires traversing at most n groups until we find the correct location. Since the remainder value is at most s − 1, the counter increase is at most ⌊(s − 1 + w)/s⌋ (Line 3 and Line 8). Finally, since s = ⌊M · φ/2⌋ + 1, we get that the counter increase is bounded by

  ⌊(⌊M · φ/2⌋ + w) / (⌊M · φ/2⌋ + 1)⌋ < 1 + w/(M · φ/2) ≤ 1 + 2/φ = O(1 + 1/φ).

B. MISSING FIGURE
Figure 8 shows a runtime evaluation of FAST as a function of φ for three different ε values. While we indeed obtained a speedup with larger φ values, increasing φ also increases the number of allocated counters and thus the space consumption.
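The speedup from larger φ is consistent with the update bound derived in the proof of Lemma 5: with s = ⌊M · φ/2⌋ + 1, an update of weight w ≤ M advances a counter by at most ⌊(s − 1 + w)/s⌋ ≤ 1 + 2/φ group moves. The following quick numeric check of that bound uses arbitrary test values for M and φ (they are our own, not from the paper):

```python
import math

# Numeric check of the counter-increase bound from the proof of Lemma 5:
# with scale s = floor(M * phi / 2) + 1, an update of weight w <= M increases
# the counter by floor((s - 1 + w) / s), which is at most 1 + 2/phi.
# The M and phi values below are arbitrary test points.
for M in (1500, 65535):
    for phi in (0.25, 1, 4):
        s = math.floor(M * phi / 2) + 1
        worst = max((s - 1 + w) // s for w in range(M + 1))
        assert worst <= 1 + 2 / phi, (M, phi, worst)
```

Larger φ thus caps the number of group traversals per update, at the cost of the additional counters shown in Figure 8's space/speed trade-off.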