SRLA: A real time sliding time window super point cardinality estimation algorithm for high speed network based on GPU

Abstract

A super point is a host that communicates with many other hosts in a given time period. The number of hosts contacting a super point is called its cardinality. Cardinality estimation plays an important role in network management and security. All existing works focus on estimating super point cardinality under a discrete time window. But a discrete time window causes great delay, and the accuracy of the estimate depends on where the window starts. A sliding time window, which moves forward a small slice at a time, offers a more accurate and timely scale for monitoring super point cardinality. On the other hand, estimating super point cardinality under a sliding time window is more difficult, because it requires an algorithm to record the cardinality incrementally and report it immediately at the end of each sliding step. This paper is the first to solve this problem, by devising SRLA, an algorithm that works under a sliding time window. SRLA records host cardinality with a novel structure that can be updated incrementally. To reduce the estimation time at the end of every sliding time window, SRLA generates a super point candidate list while scanning packets and calculates the cardinality only of the hosts in the candidate list. It can also run in parallel to handle a high speed network at line speed. This paper shows how to deploy SRLA on a common GPU. Experiments on real-world traffic with 40 GB/s bandwidth show that SRLA estimates super point cardinality within 100 milliseconds under a sliding time window when running on a low-cost Nvidia GPU, a GTX650 with 1 GB memory. The estimating time of SRLA is much smaller than that of other algorithms, which consume more than 2000 milliseconds under a discrete time window.


SRLA: A real time sliding time window super point cardinality estimation algorithm for high speed network based on GPU

Jie Xu a,∗, Wei Ding b, Jian Gong b, Xiaoyan Hu b

a School of Computer Science and Engineering, South East University, Nanjing, China
b School of Cyber Science and Engineering, South East University, Nanjing, China

Abstract

A super point is a host that communicates with many other hosts in a given time period. The number of hosts contacting a super point is called its cardinality. Cardinality estimation plays an important role in network management and security. All existing works focus on estimating super point cardinality under a discrete time window. But a discrete time window causes great delay, and the accuracy of the estimate depends on where the window starts. A sliding time window, which moves forward a small slice at a time, offers a more accurate and timely scale for monitoring super point cardinality. On the other hand, estimating super point cardinality under a sliding time window is more difficult, because it requires an algorithm that can record the cardinality incrementally and report it immediately at the end of each sliding step. This paper is the first to solve this problem, by devising SRLA, an algorithm that works under a sliding time window. SRLA consists of two cardinality estimation algorithms, the sliding rough estimator SRE and the sliding linear estimator SLE. SRE is used to detect super points while scanning packets, and it generates a candidate super point list at the end of a sliding time window. With this candidate super point list, SLE estimates the cardinality of every candidate super point quickly and accurately. SRLA's ability to work under a sliding time window comes from a novel cardinality recorder, the distance recorder DR. DR records the time at which a host appears and helps SRE and SLE judge whether a host contacting a super point falls in a certain time period. SRLA can run in parallel to handle a high speed network at line speed. This paper also shows how to deploy SRLA on a common GPU. Experiments on real-world traffic with 40 GB/s bandwidth show that SRLA estimates super point cardinality within 100 milliseconds under a sliding time window when running on an Nvidia GTX650 GPU with 1 GB memory. The estimating time of SRLA is much smaller than that of other algorithms, which consume more than 2000 milliseconds under a discrete time window.

1. Introduction

Super point cardinality estimation has been researched for a long time because of its importance [1][2][3], and many excellent algorithms have been proposed in recent years [4][5]. But these algorithms only work under a discrete time window, in which two adjacent windows share no time period. These algorithms reinitialize at the beginning of every window and discard the hosts' cardinality information from previous time [6]. The discrete time window splits host cardinality into discrete pieces and does not report super point cardinality until the end of a window, which introduces a latency equal to the size of the time window. A sliding time window, which moves forward smoothly by a small unit, gives a better measurement result than a discrete time window. It stores and updates host cardinality information incrementally. A sliding time window estimates super point cardinality more precisely because it is not affected by where the window starts. It also reports super points in a more timely manner, because the moving step is much smaller than the size of a discrete time window and super point cardinality is estimated immediately at the end of each moving step. But super point detection and cardinality estimation under a sliding time window are more complex than under a discrete time window, because the algorithm must maintain host state from some previous time and estimate super point cardinality more frequently.

∗ Corresponding author. Email addresses: [email protected] (Jie Xu), [email protected] (Wei Ding), [email protected] (Jian Gong), [email protected] (Xiaoyan Hu)
Preprint submitted to Elsevier, November 5, 2018

Super point cardinality estimation can be divided into three procedures: packet scanning, super point detection and cardinality estimation. The first procedure scans every packet and records the necessary information about hosts in different networks. Procedures 2 and 3 detect super points and estimate their cardinalities according to the recorded information. In all existing algorithms they run together at the end of a time window [7][8]. But merging super point detection and cardinality estimation consumes a lot of time at the end of a time window, because restoring super points from a huge number of hosts is a complex procedure. Under a discrete time window, the end-of-window processing time has little influence, because two adjacent discrete time windows share no time period and the duration between their end points equals their length. But under a sliding time window, the duration between the end points of two windows is only a small part of the window size, and super point cardinality is estimated more frequently than under a discrete time window. To estimate super point cardinality under a sliding time window in real time, the end-of-window processing time must be much smaller than the sliding step. If we detect super points while scanning packets in the time window and only estimate the cardinalities of these detected candidate super points, the processing time at the end of the time window is reduced greatly.

Millions of packets pass through a high speed network every second [9][10]. So a super point detection algorithm that runs during packet scanning must be lightweight: small memory requirement and fast processing speed.
This lightweight detection algorithm generates a candidate super point list while scanning packets. At the end of a time window, a more accurate algorithm is used to estimate the cardinality of every candidate super point quickly. This paper devises two estimators: the sliding rough estimator SRE and the sliding linear estimator SLE. SRE is a lightweight estimator that judges whether a host is a super point while scanning packets, and SLE calculates the cardinality of a given host. Based on these two estimators, a novel sliding super point cardinality estimation algorithm, SRLA, is proposed. To work under a sliding time window, SRLA uses a new method called the distance recorder DR to record the appearance of a host. DR helps SRLA judge whether a host appears in a certain sliding time window by updating itself incrementally.

Network bandwidth keeps growing [11][12]. To estimate super point cardinality from a high speed network in real time, parallel processing technology is necessary. Most previous algorithms tried to accelerate packet processing by using fast SRAM memory. But the small size of SRAM limits the accuracy of these algorithms in a high-speed network. What's more, an estimation algorithm requires many computation operations, and the computing ability of the CPU is also a bottleneck. The parallel computing ability of a GPU (Graphics Processing Unit) is stronger than that of a CPU because of its many operating cores. Using a GPU to scan packets in parallel yields a high throughput.

Motivated by these ideas, this paper proposes SRLA, the first super point cardinality estimation algorithm that works under a sliding time window. The main contributions of this paper are listed below.

1. Devise a novel lightweight method to judge whether a host is a super point under a sliding time window.
2. Propose the first super point detection and cardinality estimation algorithm under a sliding time window.
3. Deploy the sliding super point detection and cardinality estimation algorithm on a common GPU to deal with a core network in real time.

In the next section, we introduce previous super point detection algorithms under a discrete time window and analyze their merits and weaknesses. In section 3, two cardinality estimators that work under a sliding time window, the sliding rough estimator and the sliding linear estimator, are proposed. Section 4 introduces the novel algorithm SRLA and describes how it detects super points and estimates their cardinalities under a sliding time window. In that section, a method to deploy SRLA on a GPU is also proposed. Section 5 shows experiments on real-world 40 Gb/s core network traffic. We conclude in the last section.

2. Related work

Super point detection is a hot topic in the network research field. Shobha et al. [7] proposed an algorithm that does not keep the state of every host, so it scales very well. Cao et al. [6] used a pair-based sampling method to eliminate the majority of hosts with low opposite numbers and reserved more resources to estimate the opposite numbers of the remaining hosts. Estan et al. [13] proposed two bitmap algorithms based on sampling flows. Several hosts can share a bit of the map to reduce memory consumption. All of these methods are based on sampling flows, which limits their accuracy.

Wang et al. [14] devised a novel structure, called the double connection degree sketch (DCDS), to store and estimate the cardinalities of different hosts. They updated DCDS simply by setting several bits. To restore super points at the end of a time period, the bits to be updated were determined by the Chinese Remainder Theorem (CRT) when scanning packets. By using CRT, every bit of DCDS could be shared by different hosts. But the computing process of CRT is very complex, which limits the speed of this algorithm.

Liu et al. [15] proposed a simple method to restore super hosts based on the Bloom filter, which they called the Vector Bloom Filter (VBF). VBF uses bits extracted from an IP address to decide which bits to update when scanning packets. Compared with CRT, bit extraction needs only a small operation. But VBF consumes much time restoring super points when the number of super points is very large, because it uses only four bit arrays to record cardinalities.

Most previous works focused only on accelerating speed by adopting fast memory and neglected the calculation ability of processors. Seon-Ho et al. [16] were the first to use a GPU to estimate hosts' opposite numbers. They devised a collision-tolerant hash table to filter flows from the original traffic and used a bitmap data structure to record and estimate hosts' opposite numbers. But this method needs to store the IP address of every flow while scanning traffic, because super points cannot be restored from the bitmap directly. The additional space for storing candidate IP addresses increases the memory requirement of this algorithm.

None of these algorithms can work under a sliding time window, because they must reinitialize their data structures at the beginning of every window. They also consume too much time when estimating cardinalities, longer than the sliding step of the window. In the following part, an incrementally updating and fast estimating algorithm is proposed to detect super points and estimate their cardinalities under a sliding time window.

3. Sliding cardinality estimation

There is a huge number of hosts in a high speed network, but super points make up only a small proportion of them. Estimating super point cardinality under a sliding time window consists of two parts: detecting super points under the sliding time window, and estimating their cardinalities. Both parts face the same issue: how to estimate a host's cardinality under a sliding time window. Two sliding estimators, the sliding rough estimator (SRE) and the sliding linear estimator (SLE), are introduced in this section for super point detection and cardinality estimation respectively. SRE judges whether a host is a super point with a small amount of memory, and SLE gives the cardinality estimate for a given host. For clarity, we first give the definition of the sliding time window.

Suppose there are two networks $A$ and $B$. These two networks communicate with each other through an edge router $ER$. $A$ might be a city-wide network or even a country-wide network, and $B$ might be another city-wide network or the Internet. All traffic between $A$ and $B$ can be observed from $ER$. Split this traffic into successive time slices as shown in figure 1.

Figure 1: Sliding time window and discrete time window

These time slices have the same duration. The length of a time slice could be 1 second, 1 minute or any other period, depending on the situation. Every time slice is identified by a number. A sliding time window $W(t, k)$ contains $k$ successive slices starting from time slice $t$, as shown in the top part of figure 1. The sliding time window moves forward one slice at a time, so two adjacent sliding time windows share $k - 1$ slices. When $k$ is set to 1, there is no duplicate time period between two adjacent windows, which is the case of the discrete time window in the bottom part of figure 1.

Let $A$ be the network in which we want to detect super points. A host's packets stream in a sliding time window is defined as below.

Definition 1 (Packets stream of a host). For a host $aip \in A$, the packets passing through $ER$ in sliding time window $W(t, k)$ that have $aip$ as source or destination address compose the packets stream of $aip$, written as $Pkt(aip, t, k)$.

The opposite hosts stream of $aip$, $ST(aip, t, k)$, can be derived from $Pkt(aip, t, k)$ by extracting from each packet the IP address other than $aip$. An IP address $bip$ may appear several times in $ST(aip, t, k)$, because $aip$ can send several packets to $bip$ or receive many packets from $bip$. The hosts in $ST(aip, t, k)$ make up the opposite hosts set of $aip$, written as $OP(aip, t, k)$. The number of elements in $OP(aip, t, k)$, denoted as $|OP(aip, t, k)|$, is no bigger than that of $ST(aip, t, k)$. $|OP(aip, t, k)|$ is the cardinality of $aip$ in sliding time window $W(t, k)$. A sliding super point is defined according to a host's cardinality.

Definition 2 (Sliding super point). For a host $aip \in A$, if $|OP(aip, t, k)| \ge \theta$, where $\theta$ is a positive integer, $aip$ is a sliding super point in sliding time window $W(t, k)$.

The threshold $\theta$ is defined by users for different applications. It could be selected according to the average cardinality of all hosts in the past or the normal cardinality of a server. Getting $|OP(aip, t, k)|$ from $ST(aip, t, k)$ is a hard task.
This is hard because packets pass through $ER$ at high speed and every packet in the stream can only be scanned once. How to process every incoming packet, judge whether its host is a super point and give an accurate estimate of $|OP(aip, t, k)|$ at the end of the last time slice of $W(t, k)$ is the key step of the whole algorithm.

A cardinality estimator scans the IP pair stream in a window and gives estimates of hosts' cardinalities at the end of this window. An IP pair is extracted from a packet passing through $ER$.

Definition 3 (IP pair). An IP pair is a tuple of two IP addresses extracted from a packet, like $\langle aip_i, bip_j \rangle$ where $aip_i \in A$ and $bip_j \in B$. The IP pair stream in $W(t, k)$ is the stream of IP pairs extracted from every packet passing through $ER$ in $W(t, k)$, and it is denoted by $IPpair(A, t, k)$.

For a host $aip$ in $A$, its IP pair stream $IPpair(aip, t, k)$ is the sub-stream of IP pairs that have $aip$ as the first IP address. Let $OP(aip, t, k)$ represent the set of the second IP addresses of the IP pairs in $IPpair(aip, t, k)$. The task of estimating the cardinality of $aip$ is to get the number of hosts in $OP(aip, t, k)$, written as $|OP(aip, t, k)|$, by scanning every IP pair in $IPpair(aip, t, k)$. To calculate $|OP(aip, t, k)|$, a key step is to find how many distinct IP pairs appear in $W(t, k)$. In other words, for a given IP pair $\langle aip, bip_j \rangle$, which may appear several times in a slice, the problem is to determine whether it appears in $W(t, k)$. This problem is simple in a discrete time window, where $k = 1$: a single bit is set to 1 if $\langle aip, bip_j \rangle$ appears. But in a sliding time window with $k > 1$, when $W(t, k)$ slides to $W(t + 1, k)$ there are $k - 1$ slices, from slice $t + 1$ to slice $t + k - 1$, that belong to both windows. $\langle aip, bip_j \rangle$ may appear in slice $t$ or in some slices after $t$. A sliding cardinality estimator must distinguish these cases and judge whether $\langle aip, bip_j \rangle$ appears in the new window after sliding. This paper devises a new recorder, the distance recorder $DR$, to solve this problem.

$DR$ is a recorder that consists of $z$ bits. It records the distance between the nearest slice in which $\langle aip, bip_j \rangle$ appears and the slice currently being scanned. For example, suppose that the estimator is now scanning $IPpair(aip, t, k)$, and $\langle aip, bip_j \rangle$ appears in slice $t - d$ and does not appear in any slice after $t - d$. Then the value of $DR$ is $d$. When $d = 0$, the distance is 0, which means that $\langle aip, bip_j \rangle$ appears in the current slice. Host $bip_j$ appears in the sliding time window $W(t, k)$ only when the distance is smaller than $k$. So $z$ determines the maximum number of slices in a sliding time window, and the maximum value of $k$ is $2^z - 1$. In a discrete time window, 1 bit is enough for $DR$. When all of the $z$ bits in $DR$ are set to 1, the distance is no smaller than $k$, and $DR$ is also initialized to this value.

$DR$ has four operations, listed below, where $dr$, $dr_1$ and $dr_2$ are instances of $DR$.

DRinit($dr$): set every bit of $dr$ to 1;
DRset($dr$): set every bit of $dr$ to 0;
DRslide($dr$): if the value of $dr$ is smaller than $2^z - 1$, increment $dr$ by 1;
DRjoin($dr_1$, $dr_2$): return a new $DR$ holding the maximum of the values of $dr_1$ and $dr_2$.

These operations make sure that $DR$ holds the correct distance for a certain IP pair or a certain host in network $B$. A precise way to calculate $|OP(aip, t, k)|$ is to allocate a $DR$ for every host in $OP(aip, t, k)$ and store these $DR$s in a hash table or tree structure. This method acquires the exact value of $|OP(aip, t, k)|$ by counting the number of $DR$s whose values are smaller than $k$ at the end of every slice. But it also consumes a lot of memory and computing resources. For every host in $OP(aip, t, k)$, the precise method requires $32 + z$ bits: 32 bits for the IP address and $z$ bits for $DR$. The total memory requirement is more than $|OP(aip, t, k)| \cdot (32 + z) / 8$ bytes. When $|OP(aip, t, k)|$ is very big, locating the $DR$ of every host is also a hard task. So the precise method is run offline to acquire a baseline for evaluating the accuracy of other algorithms.

To save memory and reduce processing time, estimation methods are required. When detecting super points, an estimator only needs to tell whether a host is a super point or not. Under this requirement, a memory-efficient algorithm, the sliding rough estimator $SRE$, is devised.

For a host $aip$, the task of judging a super point is to determine whether $|OP(aip, t, k)| \ge \theta$ by scanning every host in $OP(aip, t, k)$ once. The $SRE$ proposed in this paper is a memory-efficient algorithm that can tell whether a host is a super point in a time period with only $g$ $DR$s; 8 $DR$s are enough for IPv4 addresses. Its weight $|SRE|_k$ is the number of $DR$s in it whose values are smaller than $k$. These $g$ $DR$s are initialized to $2^z - 1$. $SRE$ samples and records hosts in $IPpair(aip, t, k)$ by the least significant bits of their hashed values. The least significant bit of an integer is defined below.

Definition 4 (Least significant bit, LSB). Given an integer $i$, let $BIN(i)$ represent its binary format. The least significant bit of $i$, $LSB(i)$, is the index of the first '1' bit of $BIN(i)$, counting from the right and starting at index 0.

For example, $LSB(3) = 0$ and $LSB(40) = 3$. The binary formats of 3 and 40 are "11" and "101000". The first bit of $BIN(3)$ from the right is 1, so $LSB(3)$ equals 0, while $BIN(40)$ does not meet its first '1' until the fourth bit, so its $LSB$ is 3. For every host $bip$ in $IPpair(aip, t, k)$, $SRE$ hashes it to a random integer value with a hash function $H_1$. If $LSB(H_1(bip))$ is smaller than an integer $\tau$, this IP is not recorded by $SRE$, where $\tau$ is derived from $\theta$ by equation 1.

$$\tau = \lceil \log_2(\theta / g) \rceil \quad (1)$$

When $LSB(H_1(bip)) \ge \tau$, a $DR$ selected by $H_2(bip)$ is set, where $H_2$ is another hash function mapping $bip$ to a value between 0 and $g - 1$. After the $DR$ is set, if $|SRE|_k$ is no smaller than $\rho \cdot g$, $|OP(aip, t, k)|$ is judged to be bigger than $\theta$ by $SRE$, where ρ = 0 . ∗ (1 − e − / ); $\rho$ is acquired from [18]. $SRE$ deals with every host in $IPpair(aip, t, k)$ in this way.

$SRE$ has a high probability of reporting a super point. Its mathematical analysis is given next.
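The distance recorder operations and the SRE update rule described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyed `blake2b` hash standing in for the two hash functions, the 32-bit hash width, the base-2 logarithm in equation 1, the clamping of τ at 0 and all default parameter values are choices made here for the example. The threshold factor `rho` is left as an argument because its value is taken from [18].

```python
import hashlib
import math


def lsb(i: int) -> int:
    """Index of the lowest 1 bit of i (Definition 4); returns -1 when i == 0."""
    return (i & -i).bit_length() - 1


def h(tag: str, ip: str) -> int:
    """Hypothetical stand-in for the paper's hash functions (32-bit output assumed)."""
    digest = hashlib.blake2b(f"{tag}|{ip}".encode(), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def dr_join(dr1: int, dr2: int) -> int:
    """DRjoin: keep the larger of two distances."""
    return max(dr1, dr2)


class SRE:
    """Sliding rough estimator made of g distance recorders of z bits each."""

    def __init__(self, theta: int, g: int = 8, z: int = 4, k: int = 5):
        assert 1 <= k <= (1 << z) - 1            # k cannot exceed 2^z - 1
        self.g, self.k = g, k
        self.dr_max = (1 << z) - 1               # DRinit value: all z bits set
        # Equation 1 with a base-2 logarithm (assumed), clamped at 0 for theta < g.
        self.tau = max(0, math.ceil(math.log2(theta / g)))
        self.drs = [self.dr_max] * g             # DRinit on every recorder

    def update(self, bip: str) -> None:
        """Record one opposite host seen in the slice being scanned."""
        if lsb(h("H1", bip)) < self.tau:         # sampled out, not recorded
            return
        self.drs[h("H2", bip) % self.g] = 0      # DRset: distance 0 = current slice

    def slide(self) -> None:
        """DRslide on every recorder: saturating increment at a slice boundary."""
        self.drs = [min(d + 1, self.dr_max) for d in self.drs]

    def weight(self) -> int:
        """|SRE|_k: number of recorders whose distance is still below k."""
        return sum(1 for d in self.drs if d < self.k)

    def is_super_point(self, rho: float) -> bool:
        """Rough judgement: weight no smaller than rho * g."""
        return self.weight() >= rho * self.g
```

Calling `update` for every opposite host of a slice and `slide` once per slice boundary keeps the weight restricted to hosts seen within the last k slices, which is the property the sliding window needs.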

Lemma 1. Suppose there are $\alpha$ different balls and $g$ different boxes, with $\alpha \ge g$. Throw all of these balls randomly into the boxes. Let $FN(\alpha, g)$ represent the number of outcomes in which every one of the $g$ boxes holds at least one ball. Then $FN(\alpha, g) = g^{\alpha} - \sum_{i=1}^{g-1} C_g^i \cdot FN(\alpha, i)$ and $FN(\alpha, 1) = 1$.

Proof. There are $g^{\alpha}$ outcomes in total when throwing $\alpha$ balls into $g$ boxes. When there is only one box, there is only one outcome: all balls are thrown into it. When all balls land in exactly $i$ of the boxes and each of these $i$ boxes holds at least one ball, there are $C_g^i \cdot FN(\alpha, i)$ outcomes. Deducting from $g^{\alpha}$ all outcomes in which the balls land in a proper subset of the $g$ boxes leaves the number of outcomes with no empty box.

Theorem 1. Throw $\alpha$ balls into $g$ boxes. Let $\hat{g}$ represent the number of boxes that contain at least one ball. The number of outcomes in which exactly $\hat{g}$ boxes are non-empty, denoted by $FN(\alpha, g, \hat{g})$, is $C_g^{\hat{g}} \cdot FN(\alpha, \hat{g})$, where $1 \le \hat{g} \le g$.

Proof. The remaining $g - \hat{g}$ boxes are empty. There are $C_g^{\hat{g}}$ ways to choose the $\hat{g}$ non-empty boxes, and each choice has $FN(\alpha, \hat{g})$ ways to throw the $\alpha$ balls. So the total number of outcomes is $C_g^{\hat{g}} \cdot FN(\alpha, \hat{g})$.

$OP(aip, t, k)$ can be regarded as the set of balls and the $g$ $DR$s of $SRE$ as the boxes in theorem 1. $|SRE|_k$ is the number of $DR$s whose values are smaller than $k$. Suppose $\alpha$ hosts in $OP(aip, t, k)$ update $SRE$. The probability that $|SRE|_k = \hat{g}$ is:

$$Pr\{\alpha, g, \hat{g}\} = \frac{FN(\alpha, g, \hat{g})}{g^{\alpha}} \quad (2)$$

Every host in $OP(aip, t, k)$ updates $SRE$ with probability $2^{-\tau}$, the probability that the LSB of its hashed value is no smaller than $\tau$. So the probability that $\alpha$ hosts in $OP(aip, t, k)$ update $SRE$ is:

$$Pr\{|OP(aip, t, k)|, \alpha\} = C_{|OP(aip, t, k)|}^{\alpha} \cdot (2^{-\tau})^{\alpha} \cdot (1 - 2^{-\tau})^{|OP(aip, t, k)| - \alpha} \quad (3)$$

Combining equations 2 and 3 gives the probability that $\hat{g}$ $DR$s are set in $SRE$ after scanning $ST(aip, t, k)$, as shown in equation 4.

$$Pr\{|OP(aip, t, k)|, g, \tau, \hat{g}\} = \sum_{\alpha = \hat{g}}^{|OP(aip, t, k)|} Pr\{|OP(aip, t, k)|, \alpha\} \cdot Pr\{\alpha, g, \hat{g}\} \quad (4)$$

The probability that at least $n$ $DR$s are set in $SRE$ after scanning $ST(aip, t, k)$ can be derived from equation 4, as shown in equation 5.

$$Pr^{+}\{|OP(aip, t, k)|, g, \tau, n\} = \sum_{\hat{g} = n}^{g} Pr\{|OP(aip, t, k)|, g, \tau, \hat{g}\} \quad (5)$$

Equation 5 shows that $SRE$ has a high probability of detecting a super point. But $SRE$ is a lightweight estimator and cannot give an accurate cardinality estimate. The sliding linear estimator introduced next makes up for this shortcoming.
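To make the analysis above concrete, the recursion of Lemma 1 and the probabilities of equations 2 to 5 can be evaluated numerically. The sketch below is an illustration only. It assumes that each opposite host is sampled independently with probability 2^-τ (the chance that the lowest set bit of a uniform hash value has index at least τ), and the function names are chosen here. It is intended for small inputs, since the binomial coefficient is converted to a float.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def fn(alpha: int, g: int) -> int:
    """FN(alpha, g): ways to throw alpha distinct balls into g boxes, none empty."""
    if g == 1:
        return 1
    return g ** alpha - sum(comb(g, i) * fn(alpha, i) for i in range(1, g))


def pr_weight(alpha: int, g: int, g_hat: int) -> float:
    """Equation 2: probability that alpha balls hit exactly g_hat of g boxes."""
    if alpha == 0:                      # no ball thrown: every box stays empty
        return 1.0 if g_hat == 0 else 0.0
    return comb(g, g_hat) * fn(alpha, g_hat) / g ** alpha


def pr_detect(n_hosts: int, g: int, tau: int, n: int) -> float:
    """Equation 5: probability that at least n recorders are set.

    Sums equation 4 over g_hat = n..g, with the binomial term of equation 3
    using the assumed sampling probability 2^-tau. Keep n_hosts below about
    1000 so that comb(n_hosts, alpha) still fits in a float.
    """
    p = 2.0 ** -tau
    total = 0.0
    for alpha in range(n_hosts + 1):
        p_alpha = comb(n_hosts, alpha) * p ** alpha * (1 - p) ** (n_hosts - alpha)
        total += p_alpha * sum(pr_weight(alpha, g, gh) for gh in range(n, g + 1))
    return total
```

For a fixed g and τ, `pr_detect` grows with the number of opposite hosts, which is the behaviour the rough estimator relies on.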

The linear estimator, $LE$, is a well-known cardinality estimation algorithm [19]. It uses $g'$ bits, which are initialized to 0 at the beginning of a discrete time window, to estimate a host's cardinality. When scanning a host $bip$ in $IPpair(aip, t, k)$, one bit in $LE$, selected by a hash function $H_3$, is set. $H_3(bip)$ maps $bip$ to a random value between 0 and $g' - 1$. Let $|LE|$ represent the weight of $LE$, that is, the number of 1 bits in it. At the end of a discrete time window, $|OP(aip, t, 1)|$ is estimated by the following equation.

$$|OP(aip, t, 1)|' = -g' \cdot \ln\left(\frac{g' - |LE|}{g'}\right) \quad (6)$$

But $LE$ only works when $k = 1$. To estimate cardinality under a sliding time window, the sliding linear estimator $SLE$ replaces the $g'$ bits in $LE$ with $g'$ $DR$s. The weight of $SLE$, denoted as $|SLE|_k$, is the number of $DR$s (stored as short integers) whose values are smaller than $k$. $SLE$ estimates a host's cardinality by equation 7.

$$|OP(aip, t, k)|' = -g' \cdot \ln\left(\frac{g' - |SLE|_k}{g'}\right) \quad (7)$$

According to [19], the estimating accuracy of $SLE$ depends on the value of $g'$: the bigger $g'$ is, the more accurate the estimate. But a big $g'$ requires more time to calculate $|SLE|_k$, which increases the estimating time. So $SLE$ is only suited to estimating the cardinality of candidate super points at the end of a slice, not to estimating on every scanned IP pair. Combining $SRE$ with $SLE$ yields a novel sliding time window super point cardinality estimation algorithm, $SRLA$.
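As an illustration of equations 6 and 7, the following Python sketch implements a sliding linear estimator whose cells are distance recorders. It is a sketch under assumptions rather than the paper's code: the `blake2b` hash standing in for the cell-selecting hash function, the default sizes and the handling of a saturated estimator are all choices made here.

```python
import hashlib
import math


def linear_estimate(weight: int, g_le: int) -> float:
    """Equations 6 and 7: -g' * ln((g' - weight) / g')."""
    if weight >= g_le:
        return math.inf                      # saturated: every cell is occupied
    return -g_le * math.log((g_le - weight) / g_le)


class SLE:
    """Sliding linear estimator: g' distance recorders instead of g' bits."""

    def __init__(self, g_le: int = 4096, z: int = 4, k: int = 5):
        assert 1 <= k <= (1 << z) - 1
        self.g_le, self.k = g_le, k
        self.dr_max = (1 << z) - 1
        self.drs = [self.dr_max] * g_le      # every recorder starts "out of window"

    def update(self, bip: str) -> None:
        """Set the recorder selected by the (hypothetical) hash of bip to 0."""
        digest = hashlib.blake2b(bip.encode(), digest_size=8).digest()
        self.drs[int.from_bytes(digest, "big") % self.g_le] = 0

    def slide(self) -> None:
        """Advance one slice: saturating increment of every recorder."""
        self.drs = [min(d + 1, self.dr_max) for d in self.drs]

    def weight(self) -> int:
        """|SLE|_k: recorders whose distance is smaller than k."""
        return sum(1 for d in self.drs if d < self.k)

    def estimate(self) -> float:
        return linear_estimate(self.weight(), self.g_le)
```

Computing the weight touches all g' recorders, which is why the text reserves this estimator for the candidate list at the end of a slice.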

4. Detect super points and estimate their cardinalities on GPU

Network $A$ contains a great number of hosts, and it is not efficient to allocate an $SRE$ and an $SLE$ for every host. This section introduces a novel algorithm that detects super points and estimates their cardinalities under a sliding time window with a fixed number of estimators.

Because 8 $DR$s are enough for $SRE$ to judge whether a host is a super point, it can detect super points quickly. When $SRE$ is used together with $LE$, super point cardinality can be estimated more quickly. Motivated by this idea, we design a novel estimator, the sliding estimator $SE$. $SE$ consists of an $SRE$, an $LE$ (whose $g'$ cells are $DR$s, as in $SLE$) and 16 bits. The 16 bits in $SE$ indicate that a host has been judged to be a super point in the time slice, and we call them the super point indicator $SI$. When a host $aip \in A$ is first judged to be a super point, a bit in $SI$, selected by a hash function $H_4(aip)$ that hashes $aip$ to a random value in [0, 15], is set to 1. Let $SI[i]$ denote the $i$-th bit of $SI$. An array of $SE$s with $u$ rows and $v$ columns, denoted by $SEA$, is used to detect super points in network $A$ and estimate their cardinalities. Figure 2 illustrates the structure of $SEA$.

Figure 2: Structure of SEA

Every IP pair $\langle aip, bip \rangle$ updates $u$ $SE$s, one selected from each of the $u$ rows of $SEA$ by $u$ hash functions $RH_i(aip)$, where $0 \le i \le u - 1$. $USE(aip)$ represents the union $SE$ of these $u$ $SE$s in $SEA$, and $USI(aip)$, $URE(aip)$ and $ULE(aip)$ represent the $SI$, $SRE$ and $LE$ in $USE(aip)$ respectively. For a host $aip \in A$, its union $SE$ in $SEA$ is acquired by algorithm 1. $SI[i, j]$, $RE[i, j]$ and $LE[i, j]$ are the $SI$, $SRE$ and $LE$ of the $SE$ in the $i$-th row and $j$-th column. After updating these $SE$s, $aip$ is checked by $URE(aip)$ to test whether it is a super point. If it is and $USI(aip)[H_4(aip)]$ is zero, $aip$ is inserted into a candidate super point list and the $H_4(aip)$-th bit of every $SI[i, RH_i(aip)]$ is set to 1. This avoids adding $aip$ to the candidate super point list more than once.

Calculating $USE(aip)$ every time an IP pair is scanned is time consuming, especially since $g'$ is very big, often more than one thousand. We only need to acquire $USI(aip)$ and $URE(aip)$ for super point judging and candidate super point list insertion. $SI$ contains only 16 bits and $SRE$ consists of only 8 $DR$s for IPv4 addresses, so the merging time is reduced greatly. Algorithm 2 describes how to update $SEA$ for every IP pair.

Algorithm 1 UnionSE
Input: aip ∈ A; SEA
Output: USE(aip), the union of the SEs relating to aip
    Init USE(aip):
        set every bit of SI in USE(aip) to 1
        set every DR in URE(aip) to 0    (the identity of DRjoin, which keeps the maximum)
        set every DR in ULE(aip) to 0
    for i ∈ [0, u − 1] do
        USI(aip) ⇐ USI(aip) & SI[i, RH_i(aip)]
        for j ∈ [0, g − 1] do
            URE(aip)[j] ⇐ DRjoin(URE(aip)[j], RE[i, RH_i(aip)][j])
        end for
        for j ∈ [0, g' − 1] do
            ULE(aip)[j] ⇐ DRjoin(ULE(aip)[j], LE[i, RH_i(aip)][j])
        end for
    end for
    Return USE(aip)

Algorithm 2 ScanIPpair
Input: SEA; IP pair ⟨aip, bip⟩
Output: candidate super point list CSIP
    leidx ⇐ H_3(bip)
    for i ∈ [0, u − 1] do
        LE[i, RH_i(aip)][leidx] ⇐ 0
    end for
    if LSB(H_1(bip)) < τ then
        Return
    end if
    reidx ⇐ H_2(bip)
    siidx ⇐ H_4(aip)
    for i ∈ [0, u − 1] do
        RE[i, RH_i(aip)][reidx] ⇐ 0
    end for
    if |URE(aip)|_k ≥ ρ ∗ g then
        if USI(aip)[siidx] equal to 0 then
            insert aip into CSIP
            for i ∈ [0, u − 1] do
                SI[i, RH_i(aip)][siidx] ⇐ 1
            end for
        end if
    end if

Algorithm 2 first updates $u$ $LE$s by setting one $DR$ in each of them to 0. Then it updates $SRE$ after checking $bip$. Not every IP pair passes this check: only a fraction $1/2^{\tau}$ of them update $SRE$. This checking process accelerates the scanning speed greatly. When an IP pair updates $SRE$, its first IP address is checked against the union $SRE$ to see whether it is a super point. A super point reported by $SRE$ is inserted into the candidate list $CSIP$. Algorithm 2 deals with every IP pair in a slice. After scanning all IP pairs in the slice, the cardinalities of the hosts in the candidate super point list can be acquired from $SEA$ by the algorithm described in the next section.
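The per-packet update of Algorithm 2 can be sketched on the CPU in Python as below. This is a single-threaded illustration under assumptions, not the GPU implementation: the sizes, the sampling level `TAU`, the detection threshold `MIN_WEIGHT` (which plays the role of ρ·g) and the keyed `blake2b` hash used for every hash function are hypothetical values and helpers chosen for the example. The sampling test is applied to the opposite host, following the SRE description in section 3.

```python
import hashlib

Z, K = 4, 5                  # DR width in bits, window length in slices (assumed)
DR_MAX = (1 << Z) - 1        # "all ones": outside any window
U, V = 3, 64                 # rows and columns of SEA (assumed)
G, G_LE = 8, 256             # DRs per SRE (g) and per LE (g')
TAU = 2                      # sampling level; equation 1 would derive it from theta
MIN_WEIGHT = 6               # stands in for rho * g; hypothetical value


def h(tag, ip, mod):
    """Hypothetical keyed hash standing in for H_1..H_4 and RH_i."""
    digest = hashlib.blake2b(f"{tag}|{ip}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % mod


def lsb(i):
    """Index of the lowest 1 bit; -1 for zero."""
    return (i & -i).bit_length() - 1


def new_sea():
    """A u x v array of sliding estimators, each with an SI, an SRE and an LE."""
    return [[{"si": [0] * 16, "re": [DR_MAX] * G, "le": [DR_MAX] * G_LE}
             for _ in range(V)] for _ in range(U)]


def row_cells(sea, aip):
    """The u SEs selected for aip, one per row."""
    return [sea[i][h(f"RH{i}", aip, V)] for i in range(U)]


def union_re_weight(cells):
    """|URE(aip)|_k: join (max) each DR across rows, count distances below K."""
    return sum(1 for j in range(G) if max(c["re"][j] for c in cells) < K)


def scan_ip_pair(sea, csip, aip, bip):
    """Update SEA for one IP pair and extend the candidate list if needed."""
    cells = row_cells(sea, aip)
    le_idx = h("H3", bip, G_LE)
    for c in cells:
        c["le"][le_idx] = 0                  # every pair updates the LEs
    if lsb(h("H1", bip, 1 << 32)) < TAU:     # sampled out: skip the SRE update
        return
    re_idx = h("H2", bip, G)
    for c in cells:
        c["re"][re_idx] = 0
    si_idx = h("H4", aip, 16)
    if union_re_weight(cells) >= MIN_WEIGHT:
        # The union SI bit is the AND over rows; insert aip only once.
        if not all(c["si"][si_idx] for c in cells):
            csip.append(aip)
            for c in cells:
                c["si"][si_idx] = 1
```

Only the 16-bit indicators and the g recorders of the SRE are merged per packet here; the much larger LE is merged only when a candidate's cardinality is estimated.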

$SEA$ uses a fixed number of $LE$s, one in each of its $u \cdot v$ $SE$s, to estimate the cardinalities of all hosts in $A$. As a result, an $LE$ records the cardinalities of more than one host, and the result is an overestimate. To reduce this influence, $u$ $LE$s are used together and a host's cardinality is estimated from the union $LE$. But when there are many distinct IP pairs in a slice, many $DR$s in the union $LE$ are still set by other hosts. Estimating the number of these erroneous $DR$s and removing them from the union $LE$ helps to improve the accuracy of cardinality estimation.

Let $|LDR(i)|_k$ represent the number of $DR$s, over all $LE$s in the $i$-th row, whose values are smaller than $k$. Then the probability that a $DR$ of an $LE$ in the $i$-th row is set by some host is $P_{LEdr}(i) = \frac{|LDR(i)|_k}{g' \cdot v}$. $|LDR(i)|_k$ can be acquired by scanning every $LE$ in the $i$-th row. Suppose an $LE$ is used to record the cardinality of a host $aip$ exclusively. Then $|LE|_k$ is expected to be $d = g' - g' \cdot e^{-|OP(aip, t, k)| / g'}$, according to equation 7. In the union $LE$, each of the remaining $g' - d$ $DR$s is set by some other hosts with probability $UP_{LEdr}$, as shown in the following equation.

$$UP_{LEdr} = \prod_{i=0}^{u-1} P_{LEdr}(i) \quad (8)$$

Let $|ULE(aip)|_k$ represent the number of $DR$s in the union $LE$ whose values are smaller than $k$. Then $|ULE(aip)|_k = d + (g' - d) \cdot UP_{LEdr}$, and the cardinality of $aip$ can be estimated by the following equation.

$$|OP(aip, t, k)|' = -g' \cdot \ln\left(1 - \frac{|ULE(aip)|_k - g' \cdot UP_{LEdr}}{g' \cdot (1 - UP_{LEdr})}\right) \quad (9)$$

Equation 9 gives a more accurate estimate by removing the erroneously set $DR$s from $ULE$. The cardinality of every host in the candidate super point list is estimated in this way.
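Equations 8 and 9 translate into a few lines of Python. The sketch below is illustrative: the function and parameter names are chosen here, and the clamping of the corrected fraction to the range [0, 1] is an added safeguard that is not part of the equations.

```python
import math


def corrected_estimate(union_weight, row_weights, g_le, v):
    """Equation 9: cardinality estimate from the union LE with noise removed.

    union_weight -- |ULE(aip)|_k, DRs of the union LE with value below k
    row_weights  -- |LDR(i)|_k for each of the u rows of SEA
    g_le, v      -- g' (DRs per LE) and the number of columns of SEA
    """
    # Equation 8: probability that a DR of the union LE is set by other hosts.
    up = math.prod(w / (g_le * v) for w in row_weights)
    if up >= 1.0:
        return math.inf                      # every recorder is occupied
    # Solve |ULE|_k = d + (g' - d) * UP for d / g'.
    own = (union_weight - g_le * up) / (g_le * (1.0 - up))
    own = min(max(own, 0.0), 1.0)            # safeguard against sampling noise
    if own >= 1.0:
        return math.inf
    return -g_le * math.log(1.0 - own)
```

When every row weight is 0, the noise probability vanishes and the function reduces to the plain linear estimate of equation 7.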

SRLA works under a sliding time window. To do this, SRLA updates SEA incrementally instead of reinitializing it before every time slice. After estimating the super points' cardinalities, SRLA updates all SI, DR and the candidate super point list by algorithm 3.

Algorithm 3 SEA updating before sliding
Input: SEA, candidate super point list CSIP
Output: New candidate super point list NCSIP
  for si in SI of every SE in SEA do
    si ⇐ 0
  end for
  for dr in DR of all SRE and LE of SE in SEA do
    if dr < 2^z − 1 then
      dr++
    end if
  end for
  for aip in CSIP do
    if |URE(aip)|_k ≥ g ∗ ρ then
      insert aip into NCSIP
      for ridx ∈ [0, u − 1] do
        SI[ridx, RH_ridx(aip)][siidx] ⇐ 1
      end for
    end if
  end for
  Return NCSIP
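The three passes of algorithm 3 can be sketched as follows. This is a minimal illustration, not the paper's GPU implementation: SEA is flattened into two plain lists, and `ure_count`, `si_positions` and `threshold` are hypothetical stand-ins for $|URE(aip)|_k$, the SI bits selected by the row hashes of `aip`, and $g \cdot \rho$. Resetting SI to 0, setting it to 1 for kept candidates, and saturating DR at $2^z - 1$ are assumptions about values that are not legible in the extracted pseudocode.

```python
def slide_update(si_bits, dr_values, z, candidates,
                 ure_count, threshold, si_positions):
    """Prepare the estimator state for the next time slice."""
    # Pass 1: clear every slice indicator bit.
    for i in range(len(si_bits)):
        si_bits[i] = 0

    # Pass 2: age every distance recorder, saturating at 2**z - 1.
    dr_max = (1 << z) - 1
    for i, dr in enumerate(dr_values):
        if dr < dr_max:
            dr_values[i] = dr + 1

    # Pass 3: keep candidates that still look like super points and
    # mark their SI bits so they are not inserted a second time.
    new_candidates = []
    for aip in candidates:
        if ure_count(aip) >= threshold:
            new_candidates.append(aip)
            for pos in si_positions(aip):
                si_bits[pos] = 1
    return new_candidates
```

Because each pass touches independent cells, the loops map naturally onto one GPU thread per cell or per candidate.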

Algorithm 3 not only updates all DRs in SEA but also derives a new candidate super point list from the current one for the next time window. This makes sure that no super point is neglected. For example, suppose $aip$ is a super point in $W(t+1, k)$ and all of its opposite hosts appear in time slices $t+1$ to $t+k-1$. In this case, $aip$ will not be inserted into the candidate super point list while scanning IP pairs in time slice $t+k$, but it can still be found in the candidate super point list of $W(t, k)$.

While scanning IP pairs, SRLA only sets some SIs and DRs. Both SIs and DRs can be set by several threads at the same time without causing any mistakes, because a bit or a DR is still 1 or zero after being set several times. So several IP pairs can be processed concurrently. A GPU is a special device which contains plenty of computing cores and has high memory access throughput. Although a single CPU core is a little stronger than a single GPU core, the total computing resource of a GPU card is much more abundant than that of a CPU, considering the large number of cores a GPU contains.

A GPU is good at tasks which process huge amounts of data with the same instructions. SRLA is such a task: it scans different IP pairs by algorithm 2. But a GPU can only access its own memory directly, so these IP pairs must be stored in a buffer and then copied to the GPU's graphic memory, as shown in figure 3.
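The claim that concurrent setting is harmless can be illustrated on the CPU. The sketch below is an assumption-laden toy: `positions` is a hypothetical hash mapping (algorithm 2 defines the real one), and Python threads stand in for GPU threads. Since every write stores a constant (1 into an SI bit, 0 into a DR), the final state does not depend on the order or repetition of the writes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def positions(src, dst, n_si, n_dr):
    """Hypothetical mapping of an IP pair to one SI bit and one DR."""
    key = f"{src}|{dst}".encode()
    return zlib.crc32(key) % n_si, zlib.crc32(key[::-1]) % n_dr


def scan_pair(si, dr, pair):
    si_idx, dr_idx = positions(pair[0], pair[1], len(si), len(dr))
    si[si_idx] = 1  # setting an already set bit changes nothing
    dr[dr_idx] = 0  # zeroing an already zero DR changes nothing


def scan_sequential(pairs, n_si, n_dr, dr_init):
    si, dr = [0] * n_si, [dr_init] * n_dr
    for pair in pairs:
        scan_pair(si, dr, pair)
    return si, dr


def scan_threaded(pairs, n_si, n_dr, dr_init, workers=4):
    si, dr = [0] * n_si, [dr_init] * n_dr
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all tasks to finish before the state is returned.
        list(pool.map(lambda pair: scan_pair(si, dr, pair), pairs))
    return si, dr
```

Both functions should return the same state for any input, which is the property that lets the scan run without locks.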

Figure 3: Structure of SEA

Before SRLA starts, SEA is initialized in the GPU's graphic memory so that GPU threads can access it directly. When the IP pair buffer is full, it is sent to the GPU's global memory over the PCIe bus. After receiving these IP pairs, the GPU launches thousands of cores to deal with them at the same time. A stream processor (SP) is a set of hundreds of computing cores, and a GPU card contains several SPs. Every SP reads a part of the IP pairs in the buffer and distributes them to different cores for further processing. Every core runs algorithm 2 to update SEA and the candidate super point list CSIP in a time slice. After scanning all IP pairs in a slice, every computing core estimates the cardinality of a candidate super point by equation 9.

Let $C_u$ represent the time of IP pair scanning, $C_e$ represent the time of candidate super point cardinality estimation and $C_s$ represent the duration of a slice. In order to deal with high speed network traffic in real time, $C_u + C_e$ must be smaller than $C_s$. The cardinality of a candidate super point is not estimated until the end of a slice, so the estimating latency under sliding time window is $C_s + C_e$. Experiments prove that for a 40Gb/s network, SRLA's $C_e$ is as small as 300 milliseconds with a common GPU card and $C_u$ is smaller than 150 milliseconds. This shows that SRLA works well under a sliding time window whose sliding step can be as small as 1 second when running on a GPU.
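The real-time condition and the latency above reduce to two lines of arithmetic. The helper below is a hypothetical illustration (its name is not from the paper) that plugs in the bounds quoted in the text: $C_u = 150$ ms, $C_e = 300$ ms and $C_s = 1000$ ms.

```python
def realtime_budget(c_u_ms, c_e_ms, c_s_ms):
    """Return (feasible, latency_ms) for one slice.

    feasible:   scanning plus estimation fits inside one slice,
                i.e. C_u + C_e < C_s.
    latency_ms: delay until a result is reported, i.e. C_s + C_e.
    """
    feasible = c_u_ms + c_e_ms < c_s_ms
    latency_ms = c_s_ms + c_e_ms
    return feasible, latency_ms


feasible, latency_ms = realtime_budget(150, 300, 1000)
print(feasible, latency_ms)
```

With these bounds the work takes at most 450 ms of each 1000 ms slice, and a result is available 1300 ms after the slice begins.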

5. Experiment

To evaluate the performance of SRLA, we use real-world traffic collected from the JiangSu province node of CERNET. The experiment data are two one-hour traffic traces starting from 13:00 on October 21 and 23, 2017. There are two parts in our experiments: super point cardinality estimation under discrete time window and super point cardinality estimation under sliding time window. In both parts, the super point threshold $\theta$ is set to 1024. The experiment runs on a PC with an Nvidia GTX 650 GPU card with 1 GB of graphic memory.

The parameters of the discrete time window are set to $C_s = 300$ seconds, $k = 1$ and $z = 1$. There are 12 discrete time windows in each one-hour traffic trace, and the average information of these two traces is listed in table 1.

Table 1: Traffic information

In table 1, "ANetIP" and "BNetIP" mean the numbers of hosts in $A$ and $B$ respectively, and "Flow" means the average number of distinct IP pairs in a discrete time window. From it we can see that the average packet speed of this traffic is 4.5 Mpps (million packets per second) and that super points make up less than 0.047 percent of the total hosts in $A$.

Accuracy is a key merit of cardinality estimation. We measure accuracy by the false positive rate (FPR) and the false negative rate (FNR), as defined below.

Definition 5 (FPR/FNR). For a traffic with $N$ super points, an algorithm detects $N'$ super points. Among the $N'$ detected super points, there are $N^+$ hosts which are not super points. And there are $N^-$ super points which are not detected by the algorithm. FPR means the ratio of $N^+$ to $N$ and FNR means the ratio of $N^-$ to $N$.

FPR may decrease with the increase of FNR. If an algorithm reports more hosts as super points, its FNR will decrease but its FPR will increase. So we use the sum of FPR and FNR, the total false rate (TFR), to evaluate the accuracy of an algorithm.

The parameters of SEA influence the accuracy of SRLA. We first compare the accuracy of SRLA with different $u$, $v$ and $g'$. Figures 4 and 5 show the average accuracy of SRLA over the 12 discrete time windows for different traffic traces and parameters.

Figure 4: SRLA accuracy comparing on traffic Oct, 21, 2017

Figure 5: SRLA accuracy comparing on traffic Oct, 23, 2017

Every subfigure compares the accuracy of SRLA under different $g'$, changing from 1024 to 8192. A big $g'$ helps to reduce TFR in most cases. But a big $g'$ requires more time to acquire $|SLE|_k$, which causes a big $C_e$. TFR also decreases gradually with the increase of $v$. But when $v$ grows from 65536 to 131072, TFR decreases only slightly while memory doubles. SRLA has the lowest false rate when $g'$, $u$ and $v$ are set to their biggest values, but the memory requirement and $C_e$ also grow rapidly. When $g' = 1024$, $v = 65536$ and $u = 4$, SRLA's false rates are small enough, and in the following experiments SRLA's parameters are set to these values. When running under discrete time window, $z = 1$ is enough for DR.

To compare the performance of SRLA with other algorithms, we use DCDS [14], VBFA [15] and GSE [16]. Table 2 lists the average result of all the 24 discrete time windows.

Table 2: Comparing result under

GSE has a lower FPR than the other algorithms. It can remove fake super points according to the estimated flow number. But GSE may remove some true super points too, which gives it a higher FNR. Because it uses discrete bits to record a host's cardinality, collecting all of these bits together when estimating a super point's cardinality takes a lot of time. DCDS uses CRT when storing a host's cardinality. CRT has better randomness, which gives DCDS a lower FNR. But CRT is very complex and contains many operations, so DCDS's speed is the lowest among all of these algorithms. VBFA has the fastest speed but its TFR is higher than that of SRLA.

From table 2 we can see that SRLA uses the smallest memory, smaller than one-twentieth of the others' memory. Because SRLA generates a candidate super point list while scanning packets, it has the smallest $C_e$, only 4 milliseconds. And SRLA is the only one which can run under sliding time window.
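The accuracy measures of Definition 5 can be computed directly from the set of true super points and the set of reported hosts. The sketch below is illustrative only; the function name and the use of Python sets are assumptions, and the host labels in the example are made up.

```python
def false_rates(true_super_points, detected):
    """Return (FPR, FNR, TFR) as in Definition 5.

    FPR = N+ / N, FNR = N- / N and TFR = FPR + FNR, where N is the
    number of true super points, N+ the detected hosts that are not
    super points, and N- the super points that were missed.
    """
    true_set, detected_set = set(true_super_points), set(detected)
    n = len(true_set)
    if n == 0:
        raise ValueError("the traffic contains no super points")
    n_plus = len(detected_set - true_set)
    n_minus = len(true_set - detected_set)
    fpr = n_plus / n
    fnr = n_minus / n
    return fpr, fnr, fpr + fnr


# Four true super points; the detector reports three hosts, one of them wrong.
print(false_rates({"h1", "h2", "h3", "h4"}, {"h1", "h2", "h9"}))
```

Note that both rates are normalised by $N$, the number of true super points, so FPR here is not the classical false positive rate over all negatives.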

In the sliding time window experiments, a time slice is set to 1 second, $k$ is 300 and $z$ equals 16. We let the window slide from $W(0, k)$ to $W(2999, k)$. SRLA runs on traffic 2017-10-21. SRLA's FPR, FNR and TFR are illustrated in figures 6, 7 and 8.

Under most sliding time windows, SRLA has a low FNR, smaller than 1 . SRLA has similar accuracy when it runs under discrete time window. This proves that SRLA estimates super point cardinality successfully under sliding time window on a GPU. In the sliding time window experiments, SRLA's average $C_e$ is 100 milliseconds, which is more than that under discrete time window, because in sliding time window $z$ is set to 16 and SRLA requires more time to calculate $|SLE|_k$. But $C_e + C_u$ is still much smaller than $C_s$: SRLA's average $C_u$ is 109 milliseconds for every single slice. So SRLA can detect super points and estimate their cardinalities in real time under sliding time window.

6. Conclusion

Super point cardinality estimation is an important and difficult task in network management. Incremental updating and a small estimating time are two special difficulties in it. SRLA, proposed in this paper, is the first algorithm to solve this problem in real time with a common GPU. SRLA's capability of incremental updating comes from DR, a new recorder which can determine whether it was updated within a certain sliding time window. In order to reduce the super point cardinality estimation time, SRLA generates a candidate super point list while scanning IP pairs. This candidate super point list is acquired by the lightweight sliding estimator SRE. SRE is memory efficient and fast, which makes sure that it does not add much time to IP pair scanning. At the end of a time slice, SRLA estimates the cardinality of every candidate super point in the list by the sliding linear estimator SLE. SLE gives a highly accurate estimation of a host's cardinality. When running on a common GPU, SRLA estimates super point cardinality in real time for a 40Gb/s network.

Figure 6: FPR under sliding time window

Figure 7: FNR under sliding time window

Figure 8: TFR under sliding time window

7. References

[1] S. Khattak, Z. Ahmed, A. A. Syed, and S. A. Khayam, "Botflex: A community-driven tool for botnet detection," Journal of Network and Computer Applications
Computers & Electrical Engineering
Journal of Network and Computer Applications, Aug 2014, pp. 333–337.
[5] P. Wang, X. Guan, J. Zhao, J. Tao, and T. Qin, "A new sketch method for measuring host connection degree distribution," IEEE Transactions on Information Forensics and Security, vol. 9, no. 6, pp. 948–960, June 2014.
[6] J. Cao, Y. Jin, A. Chen, T. Bu, and Z. L. Zhang, "Identifying high cardinality internet hosts," in IEEE INFOCOM 2009, April 2009, pp. 810–818.
[7] S. Venkataraman, D. Song, P. B. Gibbons, and A. Blum, "New streaming algorithms for fast detection of superspreaders," in Proceedings of Network and Distributed System Security Symposium (NDSS), 2005, pp. 149–166.
[8] G. Cormode and M. Hadjieleftheriou, "Finding the frequent items in streams of data," Commun. ACM, vol. 52, no. 10, pp. 97–105, Oct. 2009. [Online]. Available: http://doi.acm.org/10.1145/1562764.1562789
[9] A. B. Paul, S. Biswas, S. Nandi, and S. Chakraborty, "Matem: A unified framework based on trust and mcdm for assuring security, reliability and qos in dtn routing," Journal of Network and Computer Applications
Journal of Network and Computer Applications
Journal of Network and Computer Applications
IEEE/ACM Trans. Netw., vol. 14, no. 5, pp. 925–937, Oct. 2006. [Online]. Available: http://dx.doi.org/10.1109/TNET.2006.882836
[14] P. Wang, X. Guan, T. Qin, and Q. Huang, "A data streaming method for monitoring host connection degrees of high-speed links," IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 1086–1098, Sept 2011.
[15] W. Liu, W. Qu, J. Gong, and K. Li, "Detection of superpoints using a vector bloom filter," IEEE Transactions on Information Forensics and Security, vol. 11, no. 3, pp. 514–527, March 2016.
[16] S.-H. Shin, E.-J. Im, and M. Yoon, "A grand spread estimator using a graphics processing unit," Journal of Parallel and Distributed Computing
Journal of Algorithms
Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, ser. PODS '10. New York, NY, USA: ACM, 2010, pp. 41–52. [Online]. Available: http://doi.acm.org/10.1145/1807085.1807094
[19] K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor, "A linear-time probabilistic counting algorithm for database applications," ACM Trans. Database Syst., vol. 15, no. 2, pp. 208–229, Jun. 1990. [Online]. Available: http://doi.acm.org/10.1145/78922.78925
[20] CERNET, "China education and research network," http://iptas.edu.cn/src/system.php, 2017, online; accessed 2017.