SRLA: A real time sliding time window super point cardinality estimation algorithm for high speed network based on GPU
Jie Xu a,∗, Wei Ding b, Jian Gong b, Xiaoyan Hu b
a School of Computer Science and Engineering, South East University, Nanjing, China
b School of Cyber Science and Engineering, South East University, Nanjing, China
Abstract
A super point is a host that communicates with many other hosts in a given time period. The number of hosts contacting a super point is called its cardinality. Cardinality estimation plays an important role in network management and security. All existing works focus on estimating super point cardinality under a discrete time window. But a discrete time window causes a large delay, and the accuracy of the estimate depends on where the window starts. A sliding time window, which moves forward by a small slice each time, offers a more accurate and timely scale for monitoring super point cardinality. On the other hand, estimating super point cardinality under a sliding time window is more difficult, because it requires an algorithm that records cardinality incrementally and reports it immediately at the end of each sliding step. This paper is the first to solve this problem, by devising SRLA, an algorithm that works under a sliding time window. SRLA consists of two cardinality estimators, the sliding rough estimator SRE and the sliding linear estimator SLE. SRE detects super points while scanning packets and generates a candidate super point list at the end of a sliding time window. With this list, SLE estimates the cardinality of every candidate super point quickly and accurately. SRLA's ability to work under a sliding time window comes from a novel cardinality recorder, the distance recorder DR. DR records the time at which a host appears and helps SRE and SLE judge whether a host contacting a super point falls in a certain time period. SRLA can run in parallel to process a high speed network at line speed. This paper also shows how to deploy SRLA on a common GPU. Experiments on real world traffic with 40 Gb/s bandwidth show that SRLA estimates super point cardinality within 100 milliseconds under a sliding time window when running on an Nvidia GTX 650 GPU with 1 GB of memory. The estimating time of SRLA is much smaller than that of other algorithms, which consume more than 2000 milliseconds under a discrete time window.
1. Introduction
Super point cardinality estimation has been studied for a long time because of its importance [1][2][3], and many excellent algorithms have been proposed in recent years [4][5]. But these algorithms only work under a discrete time window, in which two adjacent windows share no time period. They reinitialize at the beginning of every window and discard the hosts' cardinality information from the previous window [6]. The discrete time window splits host cardinality into discrete pieces and does not report super point cardinality until the end of a window, so its latency equals the window size. A sliding time window, which moves forward smoothly by a small unit, gives a better measurement result than a discrete time window. It stores and updates host cardinality information incrementally. A sliding time window estimates super point cardinality more precisely because it is not affected by where the window starts. It also reports super points in a more timely way, because the moving step is much smaller than the size of a discrete time window and super point cardinality is estimated immediately at the end of each moving step. But super point detection and cardinality estimation under a sliding time window are more complex than under a discrete time window, because the algorithm must maintain host state from previous slices and estimate super point cardinality more frequently.

∗ Corresponding author
Preprint submitted to Elsevier, November 5, 2018

Super point cardinality estimation can be divided into three procedures: packet scanning, super point detection and cardinality estimation. The first procedure scans every packet and records the necessary information about hosts in different networks. Procedures 2 and 3 detect super points and estimate their cardinalities from the recorded information. In all existing algorithms they run together at the end of a time window [7][8]. But merging super point detection and cardinality estimation consumes a lot of time at the end of a time window, because restoring super points from a huge number of hosts is a complex procedure. Under a discrete time window this end-of-window processing time has little influence, because two adjacent windows share no time period and the duration between their end points equals the window length. But under a sliding time window, the duration between the end points of two adjacent windows is only a small part of the window size, and super point cardinality is estimated much more frequently. To estimate super point cardinality under a sliding time window in real time, the end-of-window processing time must be much smaller than the sliding step. If super points are detected while packets are scanned, and only the cardinalities of these detected candidate super points are estimated, the processing time at the end of the time window is reduced greatly.

Millions of packets pass through a high speed network every second [9][10]. So a super point detection algorithm that runs together with packet scanning must be lightweight: a small memory requirement and a fast processing speed.
This lightweight detection algorithm generates a candidate super point list while scanning packets. At the end of a time window, a more accurate algorithm is used to estimate every candidate super point's cardinality quickly. This paper devises two estimators: the sliding rough estimator SRE and the sliding linear estimator SLE. SRE is a lightweight estimator which judges whether a host is a super point while scanning packets, and SLE calculates the cardinality of a given host. Based on these two estimators, a novel sliding super point cardinality estimation algorithm, SRLA, is proposed. To work under a sliding time window, SRLA uses a new structure called the distance recorder DR to record the appearance of a host. DR helps SRLA judge whether a host appears in a certain sliding time window by updating itself incrementally.

Network bandwidth keeps growing [11][12]. To estimate super point cardinality from a high speed network in real time, parallel processing is necessary. Most previous algorithms tried to accelerate packet processing by using fast SRAM. But the small size of SRAM limits the accuracy of these algorithms in a high speed network. What is more, an estimation algorithm requires many computing operations, so the computing ability of the CPU is also a bottleneck. The parallel computing ability of a GPU (Graphics Processing Unit) is stronger than that of a CPU because of its many cores. Scanning packets in parallel on a GPU yields a high throughput.

Motivated by these ideas, this paper proposes SRLA, the first super point cardinality estimation algorithm that works under a sliding time window. The main contributions of this paper are listed below.
1. Devise a novel lightweight method to judge whether a host is a super point under a sliding time window.
2. Propose the first super point detection and cardinality estimation algorithm under a sliding time window.
3. Deploy the sliding super point detection and cardinality estimation algorithm on a common GPU to deal with a core network in real time.

In the next section, we introduce previous super point detection algorithms under a discrete time window and analyze their merits and weaknesses. In section 3, two cardinality estimators that work under a sliding time window, the sliding rough estimator and the sliding linear estimator, are proposed. Section 4 introduces the novel algorithm SRLA and describes how it detects super points and estimates their cardinalities under a sliding time window. A method to deploy SRLA on a GPU is also proposed in that section. Section 5 shows experiments on real world 40 Gb/s core network traffic. We conclude in the last section.
2. Related work
Super point detection is a hot topic in the network research field. Shobha et al. [7] proposed an algorithm that does not keep the state of every host, so it scales very well. Cao et al. [6] used a pair-based sampling method to eliminate the majority of hosts with a low opposite number and reserved more resources to estimate the opposite number of the remaining hosts. Estan et al. [13] proposed two bitmap algorithms based on sampling flows. Several hosts can share a bit of the map to reduce memory consumption. All of these methods are based on sampling flows, which limits their accuracy.

Wang et al. [14] devised a novel structure, called the double connection degree sketch (DCDS), to store and estimate the cardinalities of different hosts. They updated DCDS simply by setting several bits. In order to restore super points at the end of a time period, the bits to be updated were determined by the Chinese Remainder Theorem (CRT) when scanning packets. By using CRT, every bit of DCDS can be shared by different hosts. But the computing process of CRT is very complex, which limits the speed of this algorithm.

Liu et al. [15] proposed a simple method to restore super hosts based on a Bloom filter. They called this algorithm the Vector Bloom Filter (VBF). VBF uses bits extracted from an IP address to decide which bits to update when scanning packets. Compared with CRT, bit extraction needs only a small operation. But VBF consumes much time to restore super points when the number of super points is very big, because it uses only four bit arrays to record cardinalities.

Most of the previous works focused only on accelerating speed by adopting fast memory and neglected the computing ability of processors. Seon-Ho et al. [16] first used a GPU to estimate hosts' opposite numbers. They devised a collision-tolerant hash table to filter flows from the original traffic and used a bitmap data structure to record and estimate hosts' opposite numbers. But this method needs to store the IP address of every flow while scanning traffic, because super points cannot be restored from the bitmap directly. The additional space for storing candidate IP addresses increases the memory requirement of this algorithm.

None of these algorithms can work under a sliding time window, because they must reinitialize their data structures at the beginning of every window. They also consume too much time when estimating cardinalities, longer than the step of the sliding window. In the following part, an incrementally updating and fast estimating algorithm is proposed to detect super points and estimate their cardinalities under a sliding time window.
3. Sliding cardinality estimation
There are a huge number of hosts in a high speed network, but super points make up only a small proportion of them. Estimating super point cardinality under a sliding time window consists of two parts: detecting super points under the sliding time window, and estimating their cardinalities. Both parts share the same issue: how to estimate a host's cardinality under a sliding time window. Two sliding estimators, the sliding rough estimator (SRE) and the sliding linear estimator (SLE), are introduced in this section for super point detection and cardinality estimation respectively. SRE judges whether a host is a super point with a small amount of memory, and SLE gives the cardinality estimate for a given host. For clarity, we first give the definition of the sliding time window.

Suppose there are two networks A and B, which communicate with each other through an edge router ER. A might be a city-wide or even a country-wide network, and B might be another city-wide network or the Internet. All traffic between A and B can be observed at ER. Split this traffic into successive time slices as shown in figure 1.

Figure 1: Sliding time window and discrete time window
These time slices have the same duration. The length of a time slice could be 1 second, 1 minute or any other period, depending on the situation. Every time slice is identified by a number. A sliding time window W(t, k) contains k successive slices starting from time slice t, as shown in the top part of figure 1. The sliding time window moves forward one slice at a time, so two adjacent sliding time windows share k − 1 slices. When k is set to 1, two adjacent windows share no time period, which is the case of the discrete time window in the bottom part of figure 1.

Let A be the network in which we want to detect super points. A host's packet stream in a sliding time window is defined as below.

Definition 1 (Packet stream of a host). For a host aip ∈ A, the packets passing through ER in sliding time window W(t, k) that have aip as source or destination address compose the packet stream of aip, written as Pkt(aip, t, k).

The opposite host stream of aip, ST(aip, t, k), is derived from Pkt(aip, t, k) by extracting the IP address other than aip from every packet. An IP address bip may appear several times in ST(aip, t, k), because aip can send several packets to bip or receive many packets from bip. The distinct hosts in ST(aip, t, k) make up the opposite host set of aip, written as OP(aip, t, k). The number of elements in OP(aip, t, k), denoted |OP(aip, t, k)|, is no bigger than that of ST(aip, t, k). |OP(aip, t, k)| is the cardinality of aip in sliding time window W(t, k). A sliding super point is defined according to a host's cardinality.

Definition 2 (Sliding super point). For a host aip ∈ A, if |OP(aip, t, k)| ≥ θ, where θ is a positive integer, aip is a sliding super point in sliding time window W(t, k).

The threshold θ is defined by users for different applications. It could be selected according to the average cardinality of all hosts in the past, or the normal cardinality of a server. Getting |OP(aip, t, k)| from ST(aip, t, k) is a hard task.
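As a minimal sketch of Definitions 1 and 2, the following Python computes |OP(aip, t, k)| exactly by brute force. The record layout (slice number, aip, bip) and the function names are assumptions made for illustration only; this approach stores every opposite host, so it is practical only offline.

```python
from collections import defaultdict

def exact_cardinalities(records, t, k):
    """Return {aip: |OP(aip, t, k)|} for the window W(t, k).

    records: iterable of (slice_no, aip, bip) tuples (hypothetical layout).
    W(t, k) covers slices t, t+1, ..., t+k-1.
    """
    opposite = defaultdict(set)           # aip -> OP(aip, t, k)
    for slice_no, aip, bip in records:
        if t <= slice_no < t + k:         # the packet falls inside the window
            opposite[aip].add(bip)        # repeats in ST collapse in OP
    return {aip: len(hosts) for aip, hosts in opposite.items()}

def sliding_super_points(records, t, k, theta):
    """Hosts whose cardinality in W(t, k) is at least theta (Definition 2)."""
    cards = exact_cardinalities(records, t, k)
    return {aip for aip, card in cards.items() if card >= theta}

records = [(0, "a1", "b1"), (0, "a1", "b1"), (1, "a1", "b2"),
           (2, "a1", "b3"), (2, "a2", "b1")]
print(exact_cardinalities(records, t=0, k=2))
print(sliding_super_points(records, t=1, k=2, theta=2))
```

Moving from W(0, 2) to W(1, 2) reuses slice 1, which is exactly the overlap that an incremental estimator has to handle without rescanning.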
Packets pass through ER at high speed, and every packet in the stream can be scanned only once. How to process every incoming packet, judge whether its host is a super point, and give an accurate estimate of |OP(aip, t, k)| at the end of the last time slice of W(t, k) is the key step of the whole algorithm.

A cardinality estimator scans the IP pair stream in a window and gives an estimate of hosts' cardinalities at the end of this window. An IP pair is extracted from a packet passing through ER.

Definition 3 (IP pair). An IP pair is a tuple of two IP addresses extracted from a packet, like <aip_i, bip_j>, where aip_i ∈ A and bip_j ∈ B. The IP pair stream in W(t, k) is the stream of IP pairs extracted from every packet passing through ER in W(t, k), and it is denoted by IPpair(A, t, k).

For a host aip in A, its IP pair stream IPpair(aip, t, k) is the sub stream of IP pairs which have aip as the first IP address. OP(aip, t, k) is then the set of the second IP addresses of the IP pairs in IPpair(aip, t, k). The task of estimating aip's cardinality is to get |OP(aip, t, k)| by scanning every IP pair in IPpair(aip, t, k). A key step is to find how many distinct IP pairs appear in W(t, k). In other words, for a given IP pair <aip, bip_j>, which may appear several times in a slice, the problem is to determine whether it appears in W(t, k). This problem is simple in a discrete time window, where k = 1: a single bit is set to 1 if <aip, bip_j> appears. But in a sliding time window with k > 1, when W(t, k) slides to W(t + 1, k), the k − 1 slices from slice t + 1 to slice t + k − 1 belong to both windows. <aip, bip_j> may appear in slice t or in some slices after t. A sliding cardinality estimator must distinguish these cases and judge whether <aip, bip_j> appears in the new window after sliding. This paper devises a new recorder, the distance recorder DR, to solve this problem.

DR is a recorder which consists of z bits. It records the distance between the nearest slice in which <aip, bip_j> appears and the slice currently being scanned. For example, suppose the estimator is now scanning slice c, and <aip, bip_j> appears in slice c − d and does not appear in any slice after c − d. Then the value of DR is d. When d = 0, <aip, bip_j> appears in the current slice. Host bip_j appears in the sliding time window only when the distance is smaller than k. So z determines the maximum number of slices in a sliding time window, and the maximum value of k is $2^z - 1$. In a discrete time window, 1 bit is enough for DR. When all of the z bits in DR are set to 1, the distance is no smaller than k; DR is also initialized to this value.

DR has four operations as listed below, where dr, dr_1 and dr_2 are instances of DR.
DRinit(dr): set every bit of dr to 1;
DRset(dr): set every bit of dr to 0;
DRslide(dr): if the value of dr is smaller than $2^z - 1$, increment dr by 1;
DRjoin(dr_1, dr_2): return a new DR whose value is the maximum of dr_1 and dr_2.

These operations make sure that DR holds the correct distance for a certain IP pair or a certain host in network B. A precise way to calculate |OP(aip, t, k)| is to allocate a DR for every host in OP(aip, t, k) and store these DRs in a hash table or a tree structure. This method obtains the exact value of |OP(aip, t, k)| by counting the number of DRs whose values are smaller than k at the end of every slice. But it also consumes a lot of memory and computing resources. For every host in OP(aip, t, k), the precise method requires 32 + z bits: 32 bits for the IP address and z bits for DR. The total memory requirement is more than |OP(aip, t, k)| ∗ (32 + z)/8 bytes. When |OP(aip, t, k)| is very big, locating the DR of every host is also a hard task. So the precise method is run offline to obtain a baseline for evaluating the accuracy of other algorithms.

To save memory and reduce processing time, estimators are required. When detecting super points, an estimator only needs to tell whether a host is a super point or not. Under this requirement, a memory efficient algorithm, the sliding rough estimator SRE, is devised.

For a host aip, the task of judging a super point is to determine whether |OP(aip, t, k)| ≥ θ by scanning every host in OP(aip, t, k) once. SRE can tell whether a host is a super point in a time period with only g DRs, and 8 DRs are enough for IPv4 addresses. Its weight $|SRE|_k$ is the number of DRs in it whose values are smaller than k. These g DRs are initialized to $2^z - 1$. SRE samples and records hosts in IPpair(aip, t, k) by the least significant bits of their hashed values. The least significant bit of an integer is defined below.

Definition 4 (Least significant bit, LSB). Given an integer i, let BIN(i) represent its binary format. The least significant bit of i, LSB(i), is the index of the first "1" bit of BIN(i), counting from the right and starting at 0.

For example, LSB(3) = 0 and LSB(40) = 3. The binary formats of 3 and 40 are "11" and "101000". The first bit of BIN(3) is 1, so LSB(3) equals 0, while BIN(40) does not meet its first "1" until the fourth bit, so its LSB is 3. For every host bip in IPpair(aip, t, k), SRE hashes it to a random value between 0 and $2^{32} - 1$ by a hash function $H_1$. If $LSB(H_1(bip))$ is smaller than an integer τ, this IP is not recorded by SRE, where τ is derived from θ by equation 1.

$$\tau = \lceil \log_2(\theta / g) \rceil$$ (1)

When $LSB(H_1(bip)) \ge \tau$, the DR selected by $H_2(bip)$ is set to 0, where $H_2$ is another hash function mapping bip to a value between 0 and g − 1. After updating the DR, if $|SRE|_k$ is no smaller than ρ ∗ g, |OP(aip, t, k)| is judged by SRE to be bigger than θ, where ρ = 0.… ∗ (1 − e^{−…/…}) is acquired from [18]. SRE deals with every host in IPpair(aip, t, k) in this way.

SRE has a high probability of reporting a super point. Its mathematical analysis is given next.
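The distance recorder operations and the SRE update rule described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the seeded hash `h` (built on BLAKE2b), the 32-bit hash range, and the default `rho=0.3` are assumptions, since the exact hash functions and the numeric value of ρ are not given here.

```python
import hashlib
import math

def h(value, seed, modulus):
    """Hypothetical seeded hash: maps value to an integer in [0, modulus)."""
    digest = hashlib.blake2b(str(value).encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little") % modulus

def lsb(i, width=32):
    """Index of the lowest 1 bit of i (Definition 4); width if i == 0."""
    return (i & -i).bit_length() - 1 if i else width

# The four DR operations, with a DR stored as a plain integer of z bits.
def dr_init(z):
    return (1 << z) - 1                  # every bit 1: "not seen recently"

def dr_set():
    return 0                             # every bit 0: seen in current slice

def dr_slide(dr, z):
    return dr + 1 if dr < (1 << z) - 1 else dr

def dr_join(dr1, dr2):
    return max(dr1, dr2)

class SRE:
    def __init__(self, theta, g=8, z=4, rho=0.3):   # rho is a placeholder
        self.g, self.z, self.rho = g, z, rho
        self.tau = math.ceil(math.log2(theta / g))  # equation 1
        self.drs = [dr_init(z) for _ in range(g)]

    def update(self, bip):
        if lsb(h(bip, 1, 1 << 32)) < self.tau:      # sampled out
            return
        self.drs[h(bip, 2, self.g)] = dr_set()

    def slide(self):
        self.drs = [dr_slide(d, self.z) for d in self.drs]

    def weight(self, k):
        return sum(1 for d in self.drs if d < k)    # |SRE|_k

    def is_super_point(self, k):
        return self.weight(k) >= self.rho * self.g

sre = SRE(theta=1024)
for n in range(20000):
    sre.update(f"10.0.{n // 256}.{n % 256}")
print(sre.tau, sre.weight(k=3), sre.is_super_point(k=3))
```

Because an update only writes 0 and a slide only increments, a DR never needs to know the absolute slice number, which is what keeps the recorder down to z bits.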
Lemma 1. Suppose there are α different balls and g different boxes, with α ≥ g. Throw all of these balls randomly into these boxes. Let FN(α, g) represent the number of situations in which each of the g boxes contains at least one ball. Then

$$FN(\alpha, g) = g^{\alpha} - \sum_{i=1}^{g-1} C_g^i \cdot FN(\alpha, i)$$

and FN(α, 1) = 1.

Proof. There are $g^{\alpha}$ situations in total when throwing α balls into g boxes. When there is only one box, there is only one situation: throwing all balls into it. When all balls are thrown into i of the g boxes and each of these i boxes contains at least one ball, there are $C_g^i \cdot FN(\alpha, i)$ situations. Deducting from $g^{\alpha}$ all situations in which the balls land in a proper subset of the g boxes leaves the number of situations with no empty box.

Theorem 1. Throw α balls into g boxes. Let n represent the number of boxes that contain at least one ball. The number of situations in which exactly n boxes are non-empty, denoted by FN(α, g, n), is $C_g^n \cdot FN(\alpha, n)$, where 1 ≤ n ≤ g.

Proof. The other g − n boxes are empty. There are $C_g^n$ ways to choose the n non-empty boxes, and each choice has FN(α, n) ways to throw the α balls. So the total number of situations is $C_g^n \cdot FN(\alpha, n)$.

The hosts in OP(aip, t, k) can be regarded as the balls and the g DRs as the boxes in theorem 1. $|SRE|_k$ is the number of DRs whose values are smaller than k. Suppose α hosts in OP(aip, t, k) update SRE. The probability that $|SRE|_k = n$ is:

$$Pr\{\alpha, g, n\} = \frac{FN(\alpha, g, n)}{g^{\alpha}}$$ (2)

Every host in OP(aip, t, k) updates SRE with probability $2^{-\tau}$, which is the probability that $LSB(H_1(bip)) \ge \tau$. So the probability that α hosts in OP(aip, t, k) update SRE is:

$$Pr\{|OP(aip,t,k)|, \alpha\} = C_{|OP(aip,t,k)|}^{\alpha} \cdot (2^{-\tau})^{\alpha} \cdot (1 - 2^{-\tau})^{|OP(aip,t,k)| - \alpha}$$ (3)

Combining equations 2 and 3 gives the probability that n DRs are set in SRE after scanning ST(aip, t, k), as shown in equation 4.

$$Pr\{|OP(aip,t,k)|, g, \tau, n\} = \sum_{\alpha=n}^{|OP(aip,t,k)|} Pr\{|OP(aip,t,k)|, \alpha\} \cdot Pr\{\alpha, g, n\}$$ (4)

The probability that no fewer than n DRs are set in SRE after scanning ST(aip, t, k) can be derived from equation 4 as shown in equation 5.

$$Pr^{+}\{|OP(aip,t,k)|, g, \tau, n\} = \sum_{i=n}^{g} Pr\{|OP(aip,t,k)|, g, \tau, i\}$$ (5)

Equation 5 shows that SRE has a high probability of detecting super points. But SRE is a lightweight estimator and cannot give an accurate cardinality estimate. The sliding linear estimator introduced next makes up for this shortcoming.
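To make the analysis concrete, the recurrence of Lemma 1 and equations 2 to 5 can be evaluated exactly with rational arithmetic. The sketch below assumes that each host is sampled with probability 2^-τ (the probability that the LSB of a uniform hash is at least τ) and that n ≥ 1; the function names are chosen here for illustration.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def fn(alpha, g):
    """Lemma 1: ways to throw alpha distinct balls into g boxes, none empty."""
    if g == 1:
        return 1
    return g ** alpha - sum(comb(g, i) * fn(alpha, i) for i in range(1, g))

def pr_boxes(alpha, g, n):
    """Equation 2: probability that exactly n of the g boxes are non-empty."""
    return Fraction(comb(g, n) * fn(alpha, n), g ** alpha)

def pr_sampled(card, alpha, tau):
    """Equation 3: probability that alpha of card hosts pass the sampling."""
    p = Fraction(1, 2 ** tau)
    return comb(card, alpha) * p ** alpha * (1 - p) ** (card - alpha)

def pr_weight(card, g, tau, n):
    """Equation 4: probability that exactly n DRs are set (n >= 1)."""
    return sum(pr_sampled(card, a, tau) * pr_boxes(a, g, n)
               for a in range(n, card + 1))

def pr_at_least(card, g, tau, n):
    """Equation 5: probability that at least n DRs are set (n >= 1)."""
    return sum(pr_weight(card, g, tau, i) for i in range(n, g + 1))

# Example: a host whose cardinality equals theta = 256, with g = 8, tau = 5.
print(float(pr_at_least(256, 8, 5, 3)))
```

Exact fractions avoid floating point underflow for the large powers involved; for much larger cardinalities a floating point or log-domain version would be faster.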
The linear estimator LE is a famous cardinality estimation algorithm [19]. It uses g' bits, which are initialized to 0 at the beginning of a discrete time window, to estimate a host's cardinality. When scanning a host bip in IPpair(aip, t, k), one bit in LE, selected by a hash function $H_3$, is set. $H_3(bip)$ maps bip to a random value between 0 and g' − 1. Let |LE| represent the weight of LE, which is the number of "1" bits in it. At the end of a discrete time window, |OP(aip, t, 1)| is estimated by the following equation.

$$|OP(aip, t, 1)| = -g' \cdot \ln\left(\frac{g' - |LE|}{g'}\right)$$ (6)

But LE only works when k = 1. To estimate cardinality under a sliding time window, the sliding linear estimator SLE replaces the g' bits in LE with g' DRs. The weight of SLE, denoted $|SLE|_k$, is the number of DRs in it whose values are smaller than k. SLE estimates a host's cardinality by equation 7.

$$|OP(aip, t, k)|' = -g' \cdot \ln\left(\frac{g' - |SLE|_k}{g'}\right)$$ (7)

According to [19], the estimating accuracy of SLE depends on g': the bigger g' is, the more accurate the estimate will be. But a big g' requires more time to calculate $|SLE|_k$, which increases the estimating time. So SLE is only suited to estimating the cardinality of candidate super points at the end of a slice, instead of estimating every time an IP pair is scanned. Combining SRE with SLE yields a novel sliding time window super point cardinality estimation algorithm, SRLA.
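Equations 6 and 7 share one formula and differ only in how the weight is counted. A minimal Python sketch, assuming DRs are stored as small integers and using a BLAKE2b-based index function as a stand-in for the hash that selects a recorder:

```python
import hashlib
import math

def linear_estimate(weight, g_prime):
    """Equations 6 and 7: -g' * ln((g' - weight) / g')."""
    if weight >= g_prime:
        raise ValueError("estimator saturated: every recorder is set")
    return -g_prime * math.log((g_prime - weight) / g_prime)

def sle_weight(drs, k):
    """|SLE|_k: number of DRs whose distance is smaller than k."""
    return sum(1 for d in drs if d < k)

def index_of(bip, g_prime):
    """Hypothetical hash mapping a host to a recorder index in [0, g')."""
    digest = hashlib.blake2b(bip.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % g_prime

g_prime, z, k = 1024, 4, 3
drs = [(1 << z) - 1] * g_prime            # all recorders start at 2^z - 1
for n in range(800):                      # 800 distinct opposite hosts
    drs[index_of(f"host-{n}", g_prime)] = 0
print(round(linear_estimate(sle_weight(drs, k), g_prime)))
```

The estimate grows without bound as the weight approaches g', which is why g' has to be sized well above the largest cardinality of interest.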
4. Detect super points and estimate their cardinalities on GPU
Network A contains a great number of hosts, and it is not efficient to allocate an SRE and an SLE for every host. This section introduces a novel algorithm which can detect super points and estimate their cardinalities under a sliding time window with a fixed number of estimators.
Because 8 DRs are enough for SRE to judge whether a host is a super point, it can detect super points fast. When used together with LE, it can estimate super point cardinality more quickly. Motivated by this idea, we design a novel estimator, the sliding estimator SE. SE consists of an SRE, an LE (built from DRs, as in SLE) and 16 bits. The 16 bits in SE are used to indicate that a host has been judged as a super point in the time slice, and we call them the super point indicator SI. When a host aip ∈ A is first judged as a super point, a bit in SI, selected by a hash function $H_4(aip)$ which hashes aip to a random value in [0, 15], is set to 1. Let SI[i] point to the i-th bit in SI. An array of SEs with u rows and v columns, denoted by SEA, is used to detect super points in network A and estimate their cardinalities. Figure 2 illustrates the structure of SEA.

Figure 2: Structure of SEA
Every IP pair <aip, bip> updates u SEs, one selected from each of the u rows of SEA by u hash functions $RH_i(aip)$, where 0 ≤ i ≤ u − 1. USE(aip) represents the union of these u SEs in SEA, and USI(aip), URE(aip) and ULE(aip) represent the SI, SRE and LE in USE(aip) respectively. For a host aip ∈ A, its union SE in SEA is acquired by algorithm 1. SI[i, j], RE[i, j] and LE[i, j] are the SI, SRE and LE of the SE in the i-th row and j-th column. After updating these SEs, aip is checked with URE(aip) to test whether it is a super point. If it is and $USI(aip)[H_4(aip)]$ is zero, aip is inserted into a candidate super point list and the $H_4(aip)$-th bit of every $SI[i, RH_i(aip)]$ is set to 1. This avoids adding aip to the candidate super point list more than once.

Calculating USE(aip) every time an IP pair is scanned is time consuming, especially because g' is very big, often more than one thousand. We only need to acquire USI(aip) and URE(aip) for super point judging and candidate super point list insertion. SI contains only 16 bits and SRE consists of only 8 DRs for IPv4 addresses, so the merging time is reduced greatly. Algorithm 2 describes how to update SEA for every IP pair.

Algorithm 1 UnionSE
Input: aip ∈ A; SEA
Output: USE(aip), the union of the SEs relating to aip
  Init USE(aip)
  set every bit of SI in USE(aip) to 1
  set every DR in URE(aip) to 0
  set every DR in ULE(aip) to 0
  for i ∈ [0, u − 1] do
    USI(aip) ⇐ USI(aip) & SI[i, RH_i(aip)]
    for j ∈ [0, g − 1] do
      URE(aip)[j] ⇐ DRjoin(URE(aip)[j], RE[i, RH_i(aip)][j])
    end for
    for j ∈ [0, g' − 1] do
      ULE(aip)[j] ⇐ DRjoin(ULE(aip)[j], LE[i, RH_i(aip)][j])
    end for
  end for
  Return USE(aip)

In algorithm 1 the union DRs start from 0 so that DRjoin, which keeps the maximum, yields the largest distance over the u rows. A DR of the union is therefore smaller than k only if it is smaller than k in every one of the u SEs, in the same way that a bit of USI is 1 only if it is 1 in every SI.

Algorithm 2 ScanIPpair
Input: SEA; IP pair <aip, bip>
Output: Candidate super point list CSIP
  leidx ⇐ H_3(bip)
  for i ∈ [0, u − 1] do
    LE[i, RH_i(aip)][leidx] ⇐ 0
  end for
  if LSB(H_1(bip)) < τ then
    Return
  end if
  reidx ⇐ H_2(bip)
  siidx ⇐ H_4(aip)
  for i ∈ [0, u − 1] do
    RE[i, RH_i(aip)][reidx] ⇐ 0
  end for
  if |URE(aip)|_k ≥ ρ ∗ g then
    if USI(aip)[siidx] equal to 0 then
      insert aip into CSIP
      for i ∈ [0, u − 1] do
        SI[i, RH_i(aip)][siidx] ⇐ 1
      end for
    end if
  end if

Algorithm 2 first updates u LEs by setting one DR in each of them to 0. Then it updates SRE after checking bip. Not every IP pair passes this check: only a fraction $1/2^{\tau}$ of them updates SRE. This checking process accelerates the scanning speed greatly. When an IP pair updates SRE, its first IP address is checked by the union SRE to see whether it is a super point. A super point reported by SRE is inserted into the candidate list CSIP. Algorithm 2 deals with every IP pair in a slice. After scanning all IP pairs in this slice, the cardinalities of the hosts in the candidate super point list can be acquired from SEA by the algorithm described next.
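The scanning procedure can be sketched as a small single-threaded Python model of SEA. This is an illustration under several assumptions: the seeded hash `h` stands in for the hash functions of the paper, the default `rho=0.3` and the array sizes are placeholders, and the union of SRE recorders is taken as the maximum distance over the u rows, following the definition of DRjoin.

```python
import hashlib
import math

def h(value, seed, modulus):
    """Hypothetical seeded hash into [0, modulus)."""
    d = hashlib.blake2b(str(value).encode(), digest_size=8,
                        salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(d, "little") % modulus

def lsb(i, width=32):
    return (i & -i).bit_length() - 1 if i else width

class SEA:
    def __init__(self, u=2, v=64, g=8, g_prime=256, z=4, theta=64, rho=0.3):
        self.u, self.v, self.g, self.gp, self.rho = u, v, g, g_prime, rho
        far = (1 << z) - 1                      # DRinit value
        self.tau = math.ceil(math.log2(theta / g))
        self.si = [[0] * v for _ in range(u)]   # 16-bit indicators
        self.re = [[[far] * g for _ in range(v)] for _ in range(u)]
        self.le = [[[far] * g_prime for _ in range(v)] for _ in range(u)]
        self.csip = []                          # candidate super point list

    def col(self, i, aip):                      # RH_i(aip)
        return h(aip, 100 + i, self.v)

    def union_re_weight(self, aip, k):
        """|URE(aip)|_k: join the u SREs by max, count distances below k."""
        joined = [max(self.re[i][self.col(i, aip)][j] for i in range(self.u))
                  for j in range(self.g)]
        return sum(1 for d in joined if d < k)

    def union_si_bit(self, aip, siidx):
        """Bit siidx of USI(aip): set only if set in all u rows."""
        return all((self.si[i][self.col(i, aip)] >> siidx) & 1
                   for i in range(self.u))

    def scan(self, aip, bip, k):
        """Process one IP pair, in the spirit of Algorithm 2."""
        leidx = h(bip, 3, self.gp)
        for i in range(self.u):
            self.le[i][self.col(i, aip)][leidx] = 0
        if lsb(h(bip, 1, 1 << 32)) < self.tau:  # bip is sampled out
            return
        reidx, siidx = h(bip, 2, self.g), h(aip, 4, 16)
        for i in range(self.u):
            self.re[i][self.col(i, aip)][reidx] = 0
        if self.union_re_weight(aip, k) >= self.rho * self.g:
            if not self.union_si_bit(aip, siidx):
                self.csip.append(aip)
                for i in range(self.u):
                    self.si[i][self.col(i, aip)] |= 1 << siidx

sea = SEA()
for n in range(2000):                           # one busy host
    sea.scan("a0", f"b{n}", k=3)
print(sea.csip)
```

The indicator bits keep a busy host from being appended again on every later packet, so the candidate list stays short even when the host keeps receiving traffic.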
SEA uses a fixed number of LEs, one in each of its u ∗ v SEs, to estimate the cardinalities of all hosts in A. As a result, one LE records the cardinalities of more than one host and the result is overestimated. To reduce this influence, u LEs are used together and a host's cardinality is estimated from the union LE. But when there are many distinct IP pairs in a slice, many DRs in the union LE are still set by other hosts. Estimating the number of these erroneous DRs and removing them from the union LE helps to improve the accuracy of cardinality estimation.

Let $|LDR(i)|_k$ represent the number of DRs, over all LEs in the i-th row, whose values are smaller than k. Then the probability that a DR of an LE in the i-th row is set by some host is $P_{LEdr}(i) = \frac{|LDR(i)|_k}{g' \cdot v}$. $|LDR(i)|_k$ is acquired by scanning every LE in the i-th row. Suppose an LE is used to record the cardinality of a host aip exclusively. Then $|LE|_k$ is expected to be $d = g' - g' \cdot e^{-|OP(aip,t,k)|/g'}$, according to equation 7. In the union LE, each of the remaining g' − d DRs is set by some other hosts with probability $UP_{LEdr}$, as shown in the following equation.

$$UP_{LEdr} = \prod_{i=0}^{u-1} P_{LEdr}(i)$$ (8)

Let $|ULE(aip)|_k$ represent the number of DRs in the union LE whose values are smaller than k. Then $|ULE(aip)|_k = d + (g' - d) \cdot UP_{LEdr}$, and aip's cardinality can be estimated by the following equation.

$$|OP(aip, t, k)|' = -g' \cdot \ln\left(1 - \frac{|ULE(aip)|_k - g' \cdot UP_{LEdr}}{g' \cdot (1 - UP_{LEdr})}\right)$$ (9)

Equation 9 gives a more accurate estimate by removing the erroneously set DRs from ULE. The cardinality of every host in the candidate super point list is estimated in this way.
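Equations 8 and 9 amount to solving the union weight for d and then applying the linear estimator. A minimal Python sketch, where the function name, the argument layout and the clamping of d at 0 are assumptions added for illustration:

```python
import math

def corrected_estimate(union_weight, row_weights, g_prime, v):
    """Estimate a host's cardinality from its union LE.

    union_weight: |ULE(aip)|_k, set DRs in the union LE.
    row_weights:  row_weights[i] = |LDR(i)|_k, set DRs over all LEs of row i.
    """
    # Equation 8: probability that a DR is set by other hosts in every row.
    up = math.prod(w / (g_prime * v) for w in row_weights)
    if up >= 1.0:
        raise ValueError("every DR of every row is set; cannot correct")
    # Solve |ULE|_k = d + (g' - d) * UP for d, the host's own contribution.
    d = (union_weight - g_prime * up) / (1.0 - up)
    d = max(d, 0.0)                      # guard against negative noise
    if d >= g_prime:
        raise ValueError("union LE saturated")
    return -g_prime * math.log(1.0 - d / g_prime)   # equation 9

# Round trip: build the expected union weight for a known cardinality.
g_prime, v, true_card = 1024, 8, 500
d = g_prime * (1 - math.exp(-true_card / g_prime))
up = 0.5 * 0.5                            # two rows, each half full
union = d + (g_prime - d) * up
print(corrected_estimate(union, [0.5 * g_prime * v] * 2, g_prime, v))
```

With u rows the noise term shrinks geometrically, since a spurious DR has to be set in all u selected LEs to survive the union.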
SRLA works under a sliding time window. To do this, SRLA updates SEA incrementally instead of reinitializing it before every time slice. After estimating super point cardinalities, SRLA updates all SIs, all DRs and the candidate super point list by algorithm 3.

Algorithm 3 SEA updating before sliding
Input: SEA; candidate super point list CSIP
Output: New candidate super point list NCSIP
  for si in SI of every SE in SEA do
    si ⇐ 0
  end for
  for dr in DR of all SRE and LE of SE in SEA do
    if dr < 2^z − 1 then
      dr++
    end if
  end for
  for aip in CSIP do
    if |URE(aip)|_k ≥ g ∗ ρ then
      insert aip into NCSIP
      siidx ⇐ H_4(aip)
      for i ∈ [0, u − 1] do
        SI[i, RH_i(aip)][siidx] ⇐ 1
      end for
    end if
  end for
  Return NCSIP
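The update before sliding can be sketched in Python on simplified containers. The flat lists, the callbacks `union_re` and `mark`, and the value `rho=0.3` are assumptions for illustration; in a full implementation they would be backed by SEA and its hash functions.

```python
def update_before_sliding(si, recorders, csip, z, g, rho, k, union_re, mark):
    """One pass of the update before sliding, in the spirit of Algorithm 3.

    si:        list of indicator words, one per SE (cleared in place)
    recorders: list of DR arrays (every SRE and LE in SEA), aged in place
    union_re:  callable aip -> list of g joined SRE distances (assumed)
    mark:      callable aip -> None, sets the SI bit of aip again (assumed)
    """
    far = (1 << z) - 1
    for idx in range(len(si)):
        si[idx] = 0                       # forget last slice's indicators
    for drs in recorders:
        for j, d in enumerate(drs):
            if d < far:
                drs[j] = d + 1            # DRslide, saturating at 2^z - 1
    ncsip = []
    for aip in csip:
        weight = sum(1 for d in union_re(aip) if d < k)
        if weight >= g * rho:             # still looks like a super point
            ncsip.append(aip)
            mark(aip)
    return ncsip

# Tiny model: one SE whose SRE saw three hosts in the current slice.
re = [0, 0, 0, 15, 15, 15, 15, 15]
si = [1]
args = dict(z=4, g=8, rho=0.3, k=3,
            union_re=lambda aip: re,
            mark=lambda aip: si.__setitem__(0, 1))
cand = ["a0"]
for step in range(3):
    cand = update_before_sliding(si, [re], cand, **args)
    print(step, re[:3], cand)
```

In this model the host stays a candidate for two slides and is dropped on the third, once its recorders reach distance k.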
Algorithm 3 not only updates all DRs in SEA but also derives a new candidate super point list from the current one for the next time window. This makes sure that no super point is neglected. For example, suppose aip is a super point in W(t + 1, k) and all of its opposite hosts appear in time slices t + 1 to t + k − 1. In this case, aip is not inserted into the candidate super point list while scanning IP pairs in time slice t + k. But it can be found from the candidate super point list of W(t, k).

While scanning IP pairs, SRLA only sets some SI bits and DRs. Both SI bits and DRs can be set by several threads at the same time without causing any mistakes, because a bit is still 1, and a DR is still 0, after being set several times. So several IP pairs can be processed concurrently. A GPU is a device which contains many computing cores and has high memory access throughput. Although a single CPU core is somewhat more powerful than a single GPU core, the total computing resource of a GPU card is much larger than that of a CPU because of the large number of cores a GPU contains.

A GPU is good at tasks which process huge amounts of data with the same instructions. SRLA is such a task: it scans different IP pairs with algorithm 2. But a GPU can only access its own memory directly, so the IP pairs are stored in a buffer and then copied to the GPU's graphic memory as shown in figure 3.
Figure 3: Structure of SEA
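The claim that several threads may set the same SI bit or DR without coordination rests on the writes being idempotent. The sketch below illustrates this on the CPU with Python threads. It is not the GPU kernel of the paper: the flat lists `si` and `dr`, the function `record_pair` and the precomputed `slots` are hypothetical stand-ins for the hashed positions that Algorithm 2 would produce.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def record_pair(si: List[int], dr: List[int], slot: int) -> None:
    """Mark one hashed position; repeating the call leaves the same state."""
    si[slot] = 1   # setting a bit twice still yields 1
    dr[slot] = 0   # resetting a recorder twice still yields 0


si = [0] * 8
dr = [7] * 8               # 7 is an arbitrary "old" value for illustration
slots = [3, 3, 5, 3, 5]    # several IP pairs hashing to the same positions

# The order in which the workers run does not affect the final state.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda s: record_pair(si, dr, s), slots))
```

Because every write stores a constant, any interleaving of the workers ends in the same arrays, which is why no locks or atomic read-modify-write operations are needed for this step.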
Before SRLA starts, SEA is initialized in the GPU's graphic memory so that GPU threads can access it directly. When the IP pairs buffer is full, it is sent to the GPU's global memory over the PCIe bus. After receiving these IP pairs, the GPU launches thousands of cores to deal with them at the same time. A stream processor SP is a set of hundreds of computing cores, and a GPU card contains several SPs. Every SP reads a part of the IP pairs in the buffer and distributes them to different cores for further processing. Every core runs Algorithm 2 to update SEA and the candidate super point list CSIP in a time slice.

After scanning all IP pairs in a slice, every computing core estimates the cardinality of a candidate super point by equation 9.

Let C_u represent the time of IP pairs scanning, C_e the time of candidate super points' cardinality estimation and C_s the duration of a slice. In order to deal with high speed network traffic in real time, C_u + C_e must be smaller than C_s. The cardinality of candidate super points is not estimated until the end of a slice, so the estimating latency under a sliding time window is C_s + C_e. Experiments show that for a 40 Gb/s network, SRLA's C_e is as small as 300 milliseconds with a common GPU card and C_u is smaller than 150 milliseconds. This shows that SRLA works well under a sliding time window whose sliding step can be as small as 1 second when running on a GPU.
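The real-time condition and the reporting latency stated above can be checked with a few lines of arithmetic. The sketch below plugs in the bounds quoted in this paragraph (C_u below 150 ms, C_e of 300 ms, C_s of 1 second); the function names are illustrative only.

```python
def can_keep_up(c_u_ms: float, c_e_ms: float, c_s_ms: float) -> bool:
    """Real-time condition: scanning plus estimation must fit inside one slice."""
    return c_u_ms + c_e_ms < c_s_ms


def reporting_latency_ms(c_e_ms: float, c_s_ms: float) -> float:
    """Estimation starts only when the slice ends, so latency is C_s + C_e."""
    return c_s_ms + c_e_ms


C_U_MS = 150.0    # upper bound on scanning time quoted in the text
C_E_MS = 300.0    # estimation time quoted in the text
C_S_MS = 1000.0   # sliding step of 1 second

feasible = can_keep_up(C_U_MS, C_E_MS, C_S_MS)
latency = reporting_latency_ms(C_E_MS, C_S_MS)
```

With these values the two phases use 450 ms of each 1000 ms slice, and a result for a slice becomes available 1300 ms after the slice begins.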
5. Experiment
To evaluate the performance of SRLA, we use real-world traffic collected from the JiangSu province node of CERNET. The experiment data are two one-hour traffic traces starting from 13:00 on October 21 and 23, 2017. There are two parts in our experiments: super point cardinality estimation under discrete time window and super point cardinality estimation under sliding time window. In both parts, the super point threshold θ is set to 1024. The experiments run on a PC with an Nvidia GTX 650 GPU card with 1 GB of graphic memory.

The parameters of the discrete time window are set to C_s = 300 seconds, k = 1 and z = 1. There are 12 discrete time windows in a one-hour traffic trace, and the average information of the two traces is listed in table 1.

In table 1, “ANetIP” and “BNetIP” mean the number of hosts in A and B respectively, and “Flow” means the average number of distinct IP pairs in a discrete time window. From it we can see that the average packet speed of this traffic is 4.5 mpps (million packets per second) and super points make up less than 0.047 percent of the total hosts in A.

Accuracy is a key merit of cardinality estimation. We measure the accuracy by false positive rate (FPR) and false negative rate (FNR) as defined below.

Definition 5 (FPR/FNR). For a traffic with N super points, an algorithm detects N′ super points. Among the N′ detected super points, there are N+ hosts which are not super points, and there are N− super points which are not detected by the algorithm. FPR means the ratio of N+ to N and FNR means the ratio of N− to N.

Table 1: Traffic information
Figure 4: SRLA accuracy comparing on traffic Oct, 21, 2017

FPR may decrease with the increase of FNR: if an algorithm reports more hosts as super points, its FNR will decrease but its FPR will increase. So we use the sum of FPR and FNR, the total false rate TFR, to evaluate the accuracy of an algorithm.

The parameters of SEA influence the accuracy of SRLA. We first compare the accuracy of SRLA with different u, v and g′. Figures 4 and 5 show the average accuracy of SRLA over the 12 discrete time windows of different traffic traces and parameters.

Every sub-figure compares the accuracy of SRLA under different g′, changing from 1024 to 8192. A big g′ helps to reduce TFR in most cases. But a big g′ requires more time to acquire |SLE|_k, which causes a big C_e. TFR also decreases gradually with the increase of v. But when v grows from 65536 to 131072, TFR decreases only slightly while memory doubles. SRLA has the lowest false rate when g′, u and v are set to the biggest values, but the memory requirement and C_e also grow rapidly. When g′ = 1024, v = 65536 and u = 4, SRLA's false rates are small enough, and in the following experiments SRLA's parameters are set to these values. When running under discrete time window, z = 1 is enough for DR.

We compare the performance of SRLA with DCDS [14], VBFA [15] and GSE [16]. Table 2 lists the average result of all the 24 discrete time windows.

GSE has a lower FPR than the other algorithms. It can remove fake super points according to the estimated flow number. But GSE may remove some super points too, which gives it a higher FNR. Because it uses discrete bits to record a host's cardinality, collecting all of these bits together when estimating super points' cardinality takes a lot of time. DCDS uses CRT when storing a host's cardinality. CRT has better randomness, which gives DCDS a lower FNR. But CRT is very complex and contains many operations, so DCDS's speed is the lowest among all of these algorithms. VBFA has the fastest speed but its TFR is higher than that of SRLA.

Figure 5: SRLA accuracy comparing on traffic Oct, 23, 2017
Table 2: Comparing result under

From table 2 we can see that SRLA uses the smallest memory, less than one-twentieth of the others' memory. Because SRLA generates a candidate super point list while scanning packets, it has the smallest C_e, only 4 milliseconds. And SRLA is the only one which can run under sliding time window.
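The accuracy metrics of Definition 5 can be computed directly from the set of true super points and the set of detected hosts. The sketch below follows the definition as written, so both rates are normalized by N, the number of true super points. The function name and the host labels are illustrative only.

```python
from typing import Set, Tuple


def false_rates(true_super_points: Set[str], detected: Set[str]) -> Tuple[float, float, float]:
    """Return (FPR, FNR, TFR) as in Definition 5, all relative to N."""
    n = len(true_super_points)
    if n == 0:
        raise ValueError("Definition 5 needs at least one true super point")
    n_plus = len(detected - true_super_points)    # detected hosts that are not super points
    n_minus = len(true_super_points - detected)   # super points the algorithm missed
    fpr = n_plus / n
    fnr = n_minus / n
    return fpr, fnr, fpr + fnr


# Hypothetical hosts: four true super points, one missed and one false alarm.
truth = {"a", "b", "c", "d"}
reported = {"a", "b", "c", "x"}
fpr, fnr, tfr = false_rates(truth, reported)
```

In this toy case FPR and FNR are both 0.25, so TFR is 0.5.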
In the sliding time window experiments, a time slice is set to 1 second, k is 300 and z equals 16. We let the window slide from W (0 , W (2999 , SRLA runs on traffic 2017-10-21. SRLA's FPR, FNR and TFR are illustrated in figures 6, 7 and 8.

Under most sliding time windows, SRLA has a low FNR, smaller than 1 . SRLA has similar accuracy to that under discrete time window. This shows that SRLA estimates super point cardinality successfully under sliding time window on GPU. In the sliding time window experiments, SRLA's average C_e is 100 milliseconds, which is more than that under discrete time window, because in sliding time window z is set to 16 and SRLA requires more time to calculate |SLE|_k. But C_e + C_u is still much smaller than C_s: SRLA's average C_u is 109 milliseconds for every single slice. So SRLA can detect super points and estimate their cardinality in real time under sliding time window.
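As an illustration of how a linear estimate for a window of k slices could be read from a set of distance recorders, the sketch below applies the linear counting formula of Whang et al. [19], n ≈ −m ln(V), where m is the number of positions and V is the fraction of empty positions. It is a generic sketch and not the estimator of SLE from this paper. It assumes that each recorder holds the number of slices since it was last reset and that a recorder counts as occupied when that age is smaller than k. Both function names are hypothetical.

```python
import math
from typing import Sequence


def linear_count(bits: Sequence[bool]) -> float:
    """Linear counting: -m * ln(fraction of empty positions)."""
    m = len(bits)
    zeros = sum(1 for b in bits if not b)
    if zeros == 0:
        raise ValueError("bitmap is saturated; the estimate is undefined")
    return -m * math.log(zeros / m)


def windowed_estimate(ages: Sequence[int], k: int) -> float:
    """Treat a recorder as occupied if it was reset within the last k slices (assumption)."""
    return linear_count([age < k for age in ages])


# Illustrative ages for four recorders and a window of three slices.
ages = [0, 5, 2, 9]
estimate = windowed_estimate(ages, k=3)
```

Here two of the four recorders are recent enough, so the estimate is 4 ln 2, roughly 2.77. Changing k changes which recorders count without rescanning any traffic, which is the property a sliding window needs.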
Figure 6: FPR under sliding time window
Figure 7: FNR under sliding time window
Figure 8: TFR under sliding time window

6. Conclusion

Super point cardinality estimation is an important and difficult task in network management. Incremental updating and small estimating time are two special difficulties in it. SRLA, proposed in this paper, is the first algorithm to solve this problem in real time with a common GPU. SRLA's capability of incremental updating comes from DR, a new recorder which can determine whether it has been updated in a certain sliding time window. In order to reduce the super point cardinality estimation time, SRLA generates a candidate super point list while scanning IP pairs. This candidate super point list is acquired by the lightweight sliding estimator SRE. SRE is memory efficient and fast, which makes sure that it does not add much time to IP pairs scanning. At the end of a time slice, SRLA estimates every candidate super point in the list by the sliding linear estimator SLE. SLE gives a high accuracy estimation of a host's cardinality. When running on a common GPU, SRLA estimates super point cardinality in real time for a 40 Gb/s network.
7. References

[1] S. Khattak, Z. Ahmed, A. A. Syed, and S. A. Khayam, “Botflex: A community-driven tool for botnet detection,” Journal of Network and Computer Applications
Computers & Electrical Engineering
Journal of Network and Computer Applications, Aug 2014, pp. 333–337.
[5] P. Wang, X. Guan, J. Zhao, J. Tao, and T. Qin, “A new sketch method for measuring host connection degree distribution,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 6, pp. 948–960, June 2014.
[6] J. Cao, Y. Jin, A. Chen, T. Bu, and Z. L. Zhang, “Identifying high cardinality internet hosts,” in IEEE INFOCOM 2009, April 2009, pp. 810–818.
[7] S. Venkataraman, D. Song, P. B. Gibbons, and A. Blum, “New streaming algorithms for fast detection of superspreaders,” in Proceedings of Network and Distributed System Security Symposium (NDSS), 2005, pp. 149–166.
[8] G. Cormode and M. Hadjieleftheriou, “Finding the frequent items in streams of data,” Commun. ACM, vol. 52, no. 10, pp. 97–105, Oct. 2009. [Online]. Available: http://doi.acm.org/10.1145/1562764.1562789
[9] A. B. Paul, S. Biswas, S. Nandi, and S. Chakraborty, “Matem: A unified framework based on trust and mcdm for assuring security, reliability and qos in dtn routing,” Journal of Network and Computer Applications
Journal of Network and Computer Applications
Journal of Network and Computer Applications
Journal of Network and Computer Applications
IEEE/ACM Trans. Netw., vol. 14, no. 5, pp. 925–937, Oct. 2006. [Online]. Available: http://dx.doi.org/10.1109/TNET.2006.882836
[14] P. Wang, X. Guan, T. Qin, and Q. Huang, “A data streaming method for monitoring host connection degrees of high-speed links,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 1086–1098, Sept 2011.
[15] W. Liu, W. Qu, J. Gong, and K. Li, “Detection of superpoints using a vector bloom filter,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 3, pp. 514–527, March 2016.
[16] S.-H. Shin, E.-J. Im, and M. Yoon, “A grand spread estimator using a graphics processing unit,” Journal of Parallel and Distributed Computing
Journal of Algorithms
Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, ser. PODS ’10. New York, NY, USA: ACM, 2010, pp. 41–52. [Online]. Available: http://doi.acm.org/10.1145/1807085.1807094
[19] K.-Y. Whang, B. T. Vander-Zanden, and H. M. Taylor, “A linear-time probabilistic counting algorithm for database applications,” ACM Trans. Database Syst., vol. 15, no. 2, pp. 208–229, Jun. 1990. [Online]. Available: http://doi.acm.org/10.1145/78922.78925
[20] CERNET, “China education and research network,” http://iptas.edu.cn/src/system.php, 2017, online; accessed 2017.