Data stream fusion for accurate quantile tracking and analysis

Massimo Cafaro, Catiuscia Melle, Italo Epicoco, Marco Pulimeno

University of Salento, Lecce, Italy
Abstract
UDDSketch is a recent algorithm for accurate tracking of quantiles in data streams, derived from the DDSketch algorithm. UDDSketch provides accuracy guarantees covering the full range of quantiles, independently of the input distribution, and greatly improves the accuracy with regard to DDSketch. In this paper we show how to compress and fuse data streams (or datasets) by using UDDSketch data summaries that are fused into a new summary related to the union of the streams (or datasets) processed by the input summaries, whilst preserving both the error and size guarantees provided by UDDSketch. This property of sketches, known as mergeability, enables parallel and distributed processing. We prove that UDDSketch is fully mergeable and introduce a parallel version of UDDSketch suitable for message-passing based architectures. We formally prove its correctness and compare it to a parallel version of DDSketch, showing through extensive experimental results that our parallel algorithm almost always outperforms the parallel DDSketch algorithm with regard to the overall accuracy in determining the quantiles.
Keywords:
Quantiles, sketches, message-passing.
1. Introduction
Mergeability of data summaries is an important property [1], since it allows parallel and distributed processing of datasets. In general, given two summaries on two datasets, mergeability means that there exists an algorithm to merge the two summaries into a single summary related to the union of the two datasets, simultaneously preserving the error and size guarantees. Big volume data streams (or big data) can therefore be compressed and fused by means of a suitable, mergeable sketch data structure.

To formally define the concept of mergeability, we shall denote by S() a summarization algorithm, by D a dataset, by ε an error parameter and by S(D, ε) a valid summary for D with error ε produced by S(). The summarization algorithm S() is mergeable if there is an algorithm A that, given two input summaries S(D₁, ε) and S(D₂, ε), outputs a summary S(D₁ ⊎ D₂, ε) (here ⊎ stands for the multiset sum operation [12]).

∗ Corresponding author. Email addresses: [email protected] (Massimo Cafaro), [email protected] (Catiuscia Melle), [email protected] (Italo Epicoco), [email protected] (Marco Pulimeno). Preprint submitted to Elsevier, January 19, 2021.

Even though mergeability is a fundamental property of a data summary, merging algorithms are not necessarily simple, and may be complex to formally prove correct. In particular, merging algorithms for the problems of heavy hitters and quantiles were not known until a few years ago.

Regarding heavy hitters, Cormode and Hadjieleftheriou presented in 2009 [5] a survey of existing algorithms, classifying them as either counter-based or sketch-based. In the concluding remarks, Cormode and Hadjieleftheriou stated that “In the distributed data case, different parts of the input are seen by different parties (different routers in a network, or different stores making sales). The problem is then to find items which are frequent over the union of all the inputs. Again due to their linearity properties, sketches can easily solve such problems. It is less clear whether one can merge together multiple counter-based summaries to obtain a summary with the same accuracy and worst-case space bounds”.

The first merging algorithm for summaries obtained by running the Misra-Gries algorithm [11] (rediscovered and improved by [6] and [8], and also known as Frequent) was published in 2011 [4]. One year later, [1] provided a new merge algorithm for Frequent and Space Saving [10], showing that the summaries of these algorithms are isomorphic. The same paper also provided a merging algorithm for Greenwald-Khanna quantile summaries. Later, improved merging algorithms for both Misra-Gries and Space Saving summaries were presented [2], [3].

We formally prove that our UDDSketch [7] data summary for tracking quantiles is mergeable, design and analyze a corresponding parallel algorithm, and provide extensive experimental results showing the excellent scalability and accuracy achieved. This result enables parallel and distributed processing of big volume data streams (or big data), which can be compressed and fused for accurate quantile tracking and analysis.

The rest of this paper is organized as follows. We recall related work in Section 2. The merge procedure is presented in Section 3 and formally proved correct in Section 4. Experimental results are provided and discussed in Section 5. Finally, we draw our conclusions in Section 6.
2. Related Work
UDDSketch is based on the DDSketch algorithm [9], and achieves better accuracy by using a different, carefully designed collapsing procedure. Basically, DDSketch allows computing quantiles in a streaming setting, with accuracy defined as follows. Let S be a multiset of size n over ℝ and R(x) the rank of the element x (the number of elements in S smaller than or equal to x). Then, the item x whose rank R(x) in the sorted multiset S is ⌊1 + q(n − 1)⌋ (respectively ⌈1 + q(n − 1)⌉), for 0 ≤ q ≤ 1, is the q-quantile item x_q ∈ S. For instance, x₀ and x₁ are respectively the minimum and maximum element of S, whilst x_0.5 is the median. We are now ready to define relative accuracy.

Definition 1. Relative accuracy. x̃_q is an α-accurate q-quantile if |x̃_q − x_q| ≤ α x_q for a given q-quantile item x_q ∈ S. A sketch data structure is an α-accurate (q₀, q₁)-sketch if it can output α-accurate q-quantiles for q₀ ≤ q ≤ q₁.

The DDSketch data summary is a collection of buckets. The algorithm handles items x ∈ ℝ_{>0} and requires in input two parameters: the first one, α, is related to the user-defined accuracy; the second one, m, represents the maximum number of buckets allowed. Using α, the algorithm derives the quantity γ = (1 + α)/(1 − α), which is used to define the boundaries of the i-th bucket B_i. All of the values x such that γ^(i−1) < x ≤ γ^i fall in the bucket B_i, with i = ⌈log_γ x⌉; the bucket is just a counter variable, initially set to zero. We recall here that DDSketch can also handle negative values by using another sketch in which an item x ∈ ℝ_{<0} is handled by inserting −x.

Inserting a value is done by simply incrementing the corresponding counter by one; similarly, deleting a value requires decrementing the corresponding counter by one (when a counter reaches the value zero, the corresponding bucket is discarded). Initially the summary is empty, and buckets are dynamically added as needed. It is worth noting here that bucket indexes are dynamic as well, depending just on the input value x to be inserted and on the γ value. In order to prevent the summary from growing without bounds, when the number of buckets in the summary exceeds the maximum number m of allowed buckets, a collapsing procedure is executed. The collapse is done on the first two buckets with counts greater than zero (alternatively, it can be done on the last two buckets). Let the first two such buckets be respectively B_y and B_z, with y < z. Collapsing works as follows: the count stored by B_y is added to B_z, and B_y is removed from the summary. Algorithm 1 presents the pseudo-code for the insertion of a value x into the summary S.

Algorithm 1 DDSketch-Insert(x, S)
Require: x ∈ ℝ_{>0}
  i ← ⌈log_γ x⌉
  if B_i ∈ S then
    B_i ← B_i + 1
  else
    B_i ← 1
    S ← S ∪ {B_i}
  end if
  if |S| > m then
    let B_y and B_z be the first two buckets
    B_z ← B_y + B_z
    S ← S \ {B_y}
  end if

UDDSketch uses a uniform collapsing procedure that provides far better accuracy with regard to DDSketch. In practice, we collapse all of the buckets, two by two. Given a pair of indices (i, i + 1), with i an odd index and B_i, B_{i+1} ≠ 0, we create and add to the summary a new bucket with index j = ⌈i/2⌉, whose counter value is equal to the sum of the B_i and B_{i+1} counters. The new bucket replaces the two collapsed buckets. Algorithm 2 reports the pseudocode of the uniform collapse procedure.

Algorithm 2 UniformCollapse(S)
Require: sketch S = {B_i}_i
  for each {i : B_i > 0} do
    j ← ⌈i/2⌉
    B′_j ← B′_j + B_i
  end for
  return S ← {B′_i}_i

In [7] we provide a theoretical bound on the accuracy achieved by the UDDSketch data summary.
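To make the bucket arithmetic concrete, the following is a minimal C++ sketch of DDSketch-style insertion and of the UDDSketch uniform collapse. It is illustrative only: the names and the structure are ours, not the authors' implementation, and it handles only positive items.

```cpp
#include <cmath>
#include <map>

// A sketch is a gamma value plus a sparse set of bucket counters.
struct Sketch {
    double gamma;                 // derived from alpha as (1 + alpha) / (1 - alpha)
    std::map<int, long> buckets;  // bucket index -> counter
};

// Bucket index for a positive item x: i = ceil(log_gamma(x)),
// so that gamma^(i-1) < x <= gamma^i.
int bucket_index(double x, double gamma) {
    return static_cast<int>(std::ceil(std::log(x) / std::log(gamma)));
}

// DDSketch-style insertion: increment the counter of the bucket x falls into
// (the map creates the bucket on first use).
void insert(Sketch& s, double x) {
    s.buckets[bucket_index(x, s.gamma)] += 1;
}

// UDDSketch uniform collapse: fold every bucket i into bucket ceil(i/2),
// merging buckets pairwise; gamma is squared, halving the resolution.
void uniform_collapse(Sketch& s) {
    std::map<int, long> collapsed;
    for (auto [i, count] : s.buckets)
        collapsed[static_cast<int>(std::ceil(i / 2.0))] += count;
    s.buckets = std::move(collapsed);
    s.gamma = s.gamma * s.gamma;
}
```

Note that each uniform collapse preserves the total count while roughly halving the number of buckets, which is why UDDSketch needs far fewer collapses than DDSketch's first-two-buckets strategy.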
3. Mergeability of UDDSketch
Letting k(n, ε) be the maximum size of a summary S(D, ε) for any D consisting of n items, the size of the merged summary S(D₁ ⊎ D₂, ε) is, in general, at most k(|D₁| + |D₂|, ε). In our case, the maximum size m of the UDDSketch data summary is independent of n, the sketch being a collection of at most m buckets with m = O(1) (from a practical perspective, m can be a small constant; as an example, m = 500 is already enough to provide good accuracy). Therefore, we shall denote the maximum size of our summary as k(m, ε). We shall show that the size of a merged summary S(D₁ ⊎ D₂, ε) for UDDSketch is still k(m, ε).

Our parallel UDDSketch algorithm is both simple and fast. Basically, the input dataset, consisting of n items, is partitioned among the available p processes, so that each process p_i is in charge of processing either ⌈n/p⌉ or ⌊n/p⌋ items using its own UDDSketch data structure S_i. Next, all of the processes execute a parallel reduction, using as user-defined reduction operator Algorithm 3, which works as follows.

We shall denote by {B_{ik}}_k the set of buckets of the sketch S_i, and by m the maximum number of buckets related to the size of a sketch. The algorithm merges two input sketches S₁ and S₂; without loss of generality, we assume that the γ values for the two sketches are the same (full details shall be provided in the next Section, in which we formally prove the correctness of our merge procedure).

A UDDSketch data structure S_m, which shall be returned as the merged sketch, is initialized. The merge procedure is based on the fact that, given the common γ value, each bucket interval is fixed. Therefore, in order to merge two sketches it is enough to add the counters of buckets covering the same interval. For the remaining buckets in S₁ and S₂ we just create a bucket in the merged sketch with the same count. As a consequence, merging is done by scanning the buckets of S₁ and S₂ and considering only those buckets whose counter is greater than zero. However, the newly created S_m sketch may exceed the size limit. Therefore, we check if the size of S_m exceeds m buckets and, if so, we invoke the UDDSketch UniformCollapse() procedure to enforce the constraint on the size. Finally, we return the merged sketch S_m.

We now analyze the computational complexity of Algorithm 3. Initializing the merged sketch S_m requires O(1) constant time in the worst case. Scanning S₁ and S₂ requires O(m) time in the worst case. Indeed, there are m buckets in each of the input sketches, and for each one we execute O(1) operations, taking into account that searching for corresponding buckets is done through a hash table. Finally, the UniformCollapse() operation requires at most O(m) time in the worst case (again, we just need to scan at most m buckets). Taking into account that m = O(1), overall the worst case computational complexity of Algorithm 3 is O(1).

The computational complexity of the parallel UDDSketch algorithm is therefore O(n/p + log p), since each process p_i spends O(n/p) time to insert its share of the input items in its sketch, and the parallel reduction requires O(log p) time (there are log p steps, each one costing O(1)). Finally, we remark here that Algorithm 3 can also be used in a distributed setting.
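The merge step can be sketched in C++ as follows. This is an illustrative rendering with our own names, not the authors' code: with a common γ, buckets with equal keys cover identical intervals, so merging is bucket-wise addition followed by enforcement of the size limit; we also include the γ-alignment step (repeatedly collapsing the sketch with the smaller γ) that the correctness proof relies on.

```cpp
#include <cmath>
#include <cstddef>
#include <map>

struct Sketch {
    double gamma;                  // squared at each collapse
    std::map<int, long> buckets;   // bucket key -> counter
    std::size_t max_buckets;       // the size limit m
};

// Uniform collapse: bucket i folds into bucket ceil(i/2); gamma is squared.
void uniform_collapse(Sketch& s) {
    std::map<int, long> collapsed;
    for (auto [i, c] : s.buckets)
        collapsed[static_cast<int>(std::ceil(i / 2.0))] += c;
    s.buckets = std::move(collapsed);
    s.gamma *= s.gamma;
}

// Both sketches start from the same gamma_0, so their gammas lie on the
// chain gamma_0, gamma_0^2, gamma_0^4, ...; collapsing the one with the
// smaller gamma eventually makes the two mapping functions coincide.
void align_gammas(Sketch& a, Sketch& b) {
    while (a.gamma < b.gamma) uniform_collapse(a);
    while (b.gamma < a.gamma) uniform_collapse(b);
}

// Merge(S1, S2): align gammas, add counters bucket-wise, enforce the limit m.
Sketch merge(Sketch s1, Sketch s2) {
    align_gammas(s1, s2);
    for (auto [key, count] : s2.buckets)
        s1.buckets[key] += count;            // same key -> same interval
    while (s1.buckets.size() > s1.max_buckets)
        uniform_collapse(s1);                // restore the m-bucket bound
    return s1;
}
```

Here the lookup of corresponding buckets uses an ordered map for brevity; a hash table, as in the complexity analysis above, gives the O(1) per-bucket cost.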
4. Correctness
In this Section we formally prove that our parallel UDDSketch algorithm is correct when executed on p processors (or cores). We need the following definition.

Algorithm 3 Merge(S₁, S₂)
Require: S₁ = {B₁ᵢ}ᵢ, S₂ = {B₂ⱼ}ⱼ: sketches to be merged
Ensure: S_m = {B_{mk}}_k: merged sketch
  Init(S_m)
  for each {i : B₁ᵢ > 0 ∨ B₂ᵢ > 0} do
    B_{mi} ← B₁ᵢ + B₂ᵢ
  end for
  if S_m.size > m then
    UniformCollapse(S_m)
  end if
  return S_m

Definition 2. A multiset 𝒩 = (N, f) is a pair where N is some set, called the underlying set of 𝒩, and f : N → ℕ is a function. The generalized indicator function of 𝒩 is

  I_𝒩(x) := { f(x)  if x ∈ N,
            { 0     if x ∉ N,                      (1)

where the integer-valued function f, for each x ∈ N, provides its multiplicity, i.e., the number of occurrences of x in 𝒩. The cardinality of 𝒩 is expressed by

  |𝒩| := Card(𝒩) = Σ_{x ∈ N} I_𝒩(x),              (2)

whilst the cardinality of the underlying set N is

  |N| := Card(N) = Σ_{x ∈ N} 1.                    (3)

A multiset (also called a bag) is essentially a set in which the duplication of elements is allowed. We also need the definition of the sum operation [12] for multisets.
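As a toy model of Definition 2, assuming integer items (the representation and names are ours), a multiset can be stored sparsely as its multiplicity function, making the two cardinalities of Eqs. (2) and (3) easy to compute:

```cpp
#include <map>

// A multiset over the integers, stored as its multiplicity function f:
// the keys form the underlying set N, the values are the multiplicities.
using Multiset = std::map<int, long>;   // x -> f(x)

// |multiset| = sum of the multiplicities, as in Eq. (2).
long card(const Multiset& m) {
    long n = 0;
    for (auto [x, f] : m) n += f;
    return n;
}

// |N| = size of the underlying set, as in Eq. (3).
long underlying_card(const Multiset& m) {
    return static_cast<long>(m.size());
}
```

For instance, the multiset {1, 1, 7, 7, 7} has cardinality 5 and underlying set {1, 7} of cardinality 2.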
Definition 3. Let 𝒜 = (A, f) and ℬ = (B, g) be two multisets. The sum of 𝒜 and ℬ is the multiset whose underlying set is the union of the underlying sets and whose multiplicity function is the sum of the multiplicity functions: 𝒜 ⊎ ℬ = ((A ∪ B), f + g).

In the sequel, 𝒩 will play the role of a finite input dataset, containing n items. We partition the original dataset 𝒩, considered as a multiset, into p datasets 𝒩ᵢ (i = 0, …, p − 1), namely 𝒩 = ⊎ᵢ 𝒩ᵢ. Let the dataset 𝒩ᵢ be assigned to the processor p_i, whose rank is denoted by id, with id = 0, …, p − 1. Let also |𝒩ᵢ| denote the cardinality of 𝒩ᵢ, with Σᵢ |𝒩ᵢ| = |𝒩| = n.

The first step of the algorithm consists in the execution of the sequential UDDSketch algorithm (which has already been proved to be correct) on the dataset assigned to each processor p_i. Therefore, in order to prove the overall correctness of the algorithm, we just need to demonstrate that the parallel reduction is correct.

Our strategy is to prove that a single sub-step of the parallel reduction (i.e., Algorithm 3) is correct, and then naturally extend the proof to the O(log p) steps of the whole parallel reduction. We begin by proving the following Lemma, which states that UDDSketch is permutation invariant with regard to insertion-only streams.

Lemma 1.
UDDSketch is permutation invariant with regard to insertion-only streams, i.e., it produces the same sketch regardless of the order in which the input items are inserted.

Proof. Let D = (∆, µ) be a multiset representing an insertion-only input stream (i.e., deleting an item is not allowed). ∆ ⊂ ℝ⁺ is the underlying set of D and µ : ∆ → ℕ_{≥1} is its multiplicity function. Let i_γ : ∆ → ℤ : i_γ(x) = ⌈log_γ x⌉ denote the function which maps each item x ∈ ∆ to the corresponding bucket in the sketch built by UDDSketch processing D, and assume that the sketch can grow unbounded. Then i_γ(∆), the image of ∆ through the mapping function i_γ, corresponds to the set of bucket keys in the sketch summarizing the multiset D with a guaranteed accuracy of α = (γ − 1)/(γ + 1), and |i_γ(∆)| is the number of such buckets, i.e., the size of the sketch.

Moreover, for each bucket key k ∈ i_γ(∆), the preimage of k under i_γ, denoted by i_γ⁻¹(k), is the set of items assigned to the bucket B_k, and we can compute the value of a bucket B_k as the sum of the multiplicities of its items in the input dataset, i.e., B_k = Σ_{x ∈ i_γ⁻¹(k)} µ(x).

Therefore, the sketch computed by UDDSketch on a dataset D is completely determined by the sets i_γ(∆) and i_γ⁻¹(k) ∀k ∈ i_γ(∆), which do not depend on the order in which the items in D are processed. We can represent the sketch produced by UDDSketch for the dataset D as the multiset S = (i_γ(∆), β), where β : i_γ(∆) → ℕ_{≥1} : β(k) = Σ_{x ∈ i_γ⁻¹(k)} µ(x).

When the sketch is allowed to grow unbounded, the value of γ, and consequently the accuracy of the sketch, is not constrained; it can be set arbitrarily and is not modified by UDDSketch. On the contrary, when a limit on the number of buckets is imposed, UDDSketch must determine the value of γ that allows respecting that limit, i.e., the value of γ also becomes an output of the algorithm. In fact, the collapsing procedure of UDDSketch is equivalent to a change of the value of γ, which is squared in each collapse operation, followed by a sketch reconstruction through the mapping function using the new γ value. When a limit of m buckets is imposed on the size of the sketch and that limit is exceeded with the current value of γ, UDDSketch squares that value and reconstructs the sketch until the constraint |i_γ(∆)| ≤ m is satisfied.

The characterization of the sketch as the multiset (i_γ(∆), β) continues to hold even if collapsing operations are executed with γ set to the value needed to respect the sketch size constraint, and the sketch remains invariant with regard to the order in which the items are processed or the order in which the collapsing operations are executed, thus proving that UDDSketch is permutation invariant when processing insertion-only streams.

We consider now a single step of the parallel reduction, i.e., the case when the input dataset, represented by a multiset D, is partitioned into the multisets D₁ and D₂, so that D = D₁ ⊎ D₂, where ⊎ represents the sum operation [12]. We independently process D₁ and D₂ with two instances of UDDSketch initialized with the same initial value of the parameter γ and the same limit on the number of buckets.

Without loss of generality, we assume that the final values of γ for the two sketches are the same. In fact, we prove here that this is not restrictive. Setting the same initial conditions, the sequence of values that γ can assume due to collapses of the two sketches is the same, i.e., γ ∈ {γ₀, γ₀², γ₀⁴, γ₀⁸, …} holds for both sketches. If the final values of γ do not match, we can always repeatedly collapse the sketch with the smaller γ until it matches the γ of the other sketch. We shall show that the following Theorem holds.

Theorem 1.
Let D₁ = (∆₁, µ₁) and D₂ = (∆₂, µ₂) be two multisets and S₁ and S₂ the sketches produced by UDDSketch respectively processing D₁ and D₂ with a limit on the number of buckets, m, and an initial value of γ = γ₀. Denote by S_m the sketch obtained by merging S₁ and S₂ on the basis of the UDDSketch merge procedure and denote by S_g the sketch that UDDSketch would produce on the multiset D = (∆, µ) = D₁ ⊎ D₂ with the same size limit m and the same initial value of γ = γ₀. Then, S_g = S_m.

Proof. We shall prove that separately computing S₁ and S₂ and then merging them in order to obtain S_m results in the same sequence of operations related to sequentially processing through UDDSketch all of the items in D, but in a particular order. Without loss of generality, we assume that the final value of γ for S₁ is larger than that for S₂, the other case being symmetric.

To make it possible to merge S₁ and S₂, we need to repeatedly collapse S₂ until its γ value (and consequently its mapping function) matches the one of S₁. After this preliminary operation, all of the items available both in D₁ and D₂ turn out to be processed by the same mapping function, although by two separate sketches. This also means that buckets with the same key in the two sketches have the same boundaries.

Denote by T the sketch computed by sequentially processing D. We start the sequential procedure by first inserting in T all of the items in D₁. Therefore, at the end, it holds that T = S₁. Then, we continue to insert in T all of the items in D₂ that fall in buckets already present in T. This produces the same result that we obtain in the merging procedure, when we set S_m = S₁ and increment the count of each bucket in S_m with the count of the bucket with the same key in the sketch S₂, if it exists. Now, we continue to insert in T all of the remaining items of D₂, which leads to the creation of new buckets in T, but we do not collapse the sketch for now. This corresponds to adding to the sketch S_m all of the buckets in S₂ with keys that are not yet in S_m, and this concludes the first step of the merging procedure. Up to this point, consisting of the same operations, the sequential procedure on D and the merging procedure on S₁ and S₂ produce two identical sketches, T = S_m. The second step of the merging procedure consists of collapsing S_m until the constraint on the sketch size, m, is satisfied, but this constraint also holds for T, which is subject to the same number of collapses. Thus, the equality is maintained.

T is the sketch that we obtain processing through UDDSketch the dataset D in a particular order of insertions and collapses, but we know from Lemma 1 that the order of insertions and collapses is not relevant; therefore, we can conclude that T = S_g, which finally proves the thesis S_m = S_g.

Lemma 1 and Theorem 1 hold for insertion-only input streams. When the input stream also includes deletions, the permutation invariance of UDDSketch, and consequently the equality between the two sketches S_m and S_g, can not be guaranteed. Nevertheless, the following Theorem holds even when deletions are allowed.

Theorem 2. Let σ₁ and σ₂ be two streams including insertions and deletions of items drawn from the universe set U = [x_min, x_max] ⊂ ℝ⁺, and let S₁ and S₂ be the sketches produced by UDDSketch processing respectively σ₁ and σ₂ with the sketch size limited to m buckets and an initial value of γ = γ₀. Denote by S_m the sketch obtained by merging S₁ and S₂ on the basis of the UDDSketch merge procedure and denote by S_g the sketch that UDDSketch would produce on the stream σ = σ₁ ⊎ σ₂ with the sketch size limited to the same number of buckets, m, and the same initial value of γ = γ₀. Then, S_g and S_m have the same error bound.

Proof. The value of γ during the execution of UDDSketch can only grow, due to the collapses of the sketch, and its final value depends on the order in which deletions are interleaved with insertions. The worst case scenario, in which γ reaches its largest value, happens when all of the deletions are postponed after all of the insertions. This particular order of insertions and deletions, in turn, produces a sketch with the same final value of γ that one would obtain by processing only the insertions of the input stream and completely ignoring the deletions. In fact, deletions may change the bucket counters' values in a sketch, but not its γ value. On the other hand, an insertion-only stream falls in the hypothesis of Theorem 1. Thus, if we consider only the insertions in σ₁, σ₂ and their concatenation σ, and ignore deletions, the two sketches S_m and S_g would be equal and have the same final value of γ. Denote by γ̃ this value; then, with regard to the original input streams with deletions, γ̃ is an upper bound on the values of γ both for S_m and S_g.

We know that the value of γ for S_g is guaranteed to be bounded by Theorem 3 of [7], i.e., γ ≤ γ̃ ≤ ᵐ√(x_max / x_min). Therefore, the guarantee on the accuracy of UDDSketch stated by Theorem 3 of [7] continues to hold also for a sketch computed through the merge procedure.

Lemma 2.
The parallel reduction in which the sketches S₁, …, S_p are processed on p processors or cores of execution is correct.

Proof. Consider a single step of the reduction, in which two sketches S_i and S_j are merged producing the sketch S_m. By Theorems 1 and 2, the sketch S_m is correct and subject to the same error bound of both S_i and S_j. Now consider the whole reduction operation. Let 𝒜 = (A, f_A), ℬ = (B, f_B) and 𝒞 = (C, f_C). It can be easily shown, by reduction to the analogous properties holding in the ring of the integers, that the multiset sum operation has the following properties.

1. Commutativity: 𝒜 ⊎ ℬ = ℬ ⊎ 𝒜.
2. Associativity: (𝒜 ⊎ ℬ) ⊎ 𝒞 = 𝒜 ⊎ (ℬ ⊎ 𝒞).
3. There exists a multiset, the null multiset ε = (∅, g : x → 0), such that 𝒜 ⊎ ε = 𝒜.

Regarding commutativity,

  𝒜 ⊎ ℬ = ((A ∪ B), f_A + f_B) = ((B ∪ A), f_B + f_A) = ℬ ⊎ 𝒜.    (4)

For associativity,

  𝒜 ⊎ (ℬ ⊎ 𝒞) = ((A ∪ B ∪ C), f_A + (f_B + f_C)) = ((A ∪ B ∪ C), (f_A + f_B) + f_C) = (𝒜 ⊎ ℬ) ⊎ 𝒞.    (5)

Finally, let ε = (∅, g : x → 0) be the empty multiset, i.e. the unique multiset with an empty underlying set, so that Card(ε) = 0; then

  𝒜 ⊎ ε = ((A ∪ ∅), f_A + g) = (A, f_A) = 𝒜.    (6)

Therefore, the merge procedure described for two multisets can be used, being associative, as a parallel reduction operator. Moreover, being also commutative, the order of evaluation need not be fixed (e.g., for non-commutative user-defined operators, MPI defines the evaluation order to be ascending, in process rank order, beginning with process zero) but can be changed, taking advantage of commutativity and associativity. Moreover, the final sketch obtained by the parallel reduction operator is also subject to the same error bound of the input sketches.

Table 1: Synthetic datasets

Dataset      Min value       Max value      Distribution
beta         3.04 × 10^−…    …              Beta(5, 1.5)
exponential  1.19 × 10^−…    …              Exp(…)
lognormal    1.08 × 10^−…    … × 10^…       Lognormal(1, 1.5)
normal       39.7            60.5           Normal(…, 20000)
uniform      2.18 × 10^−…    … × 10^…       Unif(5, 10)
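The three algebraic properties above can be checked mechanically on a toy multiset model, with integer items and multiplicities stored sparsely in a map (the representation and names are ours). With a common γ and no collapses triggered, the sketch merge reduces to exactly this bucket-wise addition:

```cpp
#include <map>

// A multiset as its multiplicity function: item -> multiplicity.
using Multiset = std::map<int, long>;

// Multiset sum (Definition 3): unite the underlying sets and add the
// multiplicity functions. This is the operation the reduction is built on.
Multiset msum(const Multiset& a, const Multiset& b) {
    Multiset out = a;
    for (auto [x, f] : b)
        out[x] += f;
    return out;
}

// Cardinality: sum of multiplicities; additive under msum.
long card(const Multiset& m) {
    long n = 0;
    for (auto [x, f] : m) n += f;
    return n;
}
```

Commutativity, associativity and the identity element can then be asserted directly, e.g. `msum(a, b) == msum(b, a)`, mirroring Eqs. (4)-(6).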
5. Experimental Results
In this Section, we present and discuss the results of the experiments carried out for both UDDSketch and DDSketch. The aim is twofold: i) we aim at showing that the accuracy does not decrease when executing the algorithm in parallel; ii) we aim at showing that the running time of UDDSketch is similar to that of DDSketch.

Both algorithms have been implemented in C++. The tests have been executed on two supercomputers: Marconi100 (at CINECA, Italy) and Zeus (at the Euro-Mediterranean Center on Climate Change Foundation, Italy). Marconi100 is made of 980 computing nodes, each equipped with two 16-core IBM Power9 processors, 256 GB of main memory and a Mellanox Infiniband EDR DragonFly+ network; the code has been compiled with the PGI compiler pgc++ version 20.9-0 with optimization level O3. Zeus is a parallel cluster made of 384 computing nodes, each one equipped with two 18-core Intel Xeon Gold processors, 96 GB of main memory and a Mellanox Infiniband EDR network; the code has been compiled with the Intel compiler icpc v19.0.5 with optimization level O3. The source code is freely available for inspection and reproducibility of results at https://github.com/cafaro/PUDDSKETCH.

The tests have been performed on 5 synthetic datasets, whose properties are summarized in Table 1. The experiments have been executed varying the number of parallel processes and measuring the execution time, the q₀-accuracy, the final value of α and the total number of collapses for both algorithms. We recall that for UDDSketch the q₀-accuracy is equal to 0 by construction, and for DDSketch the final value of α is equal to its initial value. The stream length and the sketch size have been kept constant for every experiment, as reported in Table 2. The results obtained on both parallel computers are totally equivalent and showed the same behaviour; for this reason, we report here only the results on Marconi100.

Table 2: Experiments parameters

Parameter                  Set of values
Number of procs. (M100)    {32, 64, 128, 256, 512}
Number of procs. (Zeus)    {36, 72, 144, 288, 576}
Stream length (M100)       16 · 10^…
Stream length (Zeus)       18 · 10^…
User α                     …

Fig. 1 reports the total number of collapses for DDSketch and UDDSketch. As expected, DDSketch performs a number of collapses which is about three orders of magnitude greater than those performed by UDDSketch. Even though the running time of both a DDSketch and a UDDSketch collapse is O(1), the asymptotic notation hides a bigger constant in the case of UDDSketch.

The parallel computation performance is shown in Fig. 2, in which we use log-log plots to represent the parallel running time of DDSketch and UDDSketch with different input distributions. The log-log plots also give clear evidence of the parallel scalability of the algorithms: indeed, the ideal parallel speedup is represented by a curve with slope equal to −1. The results clearly show that our UDDSketch algorithm provides good parallel scalability and that its parallel running time is equal to that of DDSketch; only with the exponential distribution is UDDSketch slightly slower than DDSketch (the difference in the execution time is less than 5%).

Moreover, UDDSketch outperforms DDSketch with regard to accuracy. Table 3 reports the q₀-accuracy and the α value at the end of the computation; as shown, UDDSketch has a q₀-accuracy equal to 0 for every distribution, which means that it can provide an accurate estimation for all of the quantiles, with a relative error less than α; instead, DDSketch is accurate only for those quantiles greater than q₀ which, for some distributions like the exponential and the lognormal, is greater than 0.99, demonstrating that the sketch size is not big enough to guarantee a quantile estimation with
an error less than α = 0.001. The UDDSketch algorithm, instead, is self-adaptive and consistently makes good use of the available space: for the exponential and the lognormal distributions it uses a greater value of α, so as to guarantee a quantile estimation over the whole range of quantiles with an error as small as possible within the sketch size defined by the user. Therefore, the results confirm that the parallel version of the UDDSketch algorithm outperforms DDSketch with regard to accuracy, while simultaneously exhibiting good parallel scalability and a running time comparable with that of DDSketch.

Figure 1: Number of sketch collapses varying the input distribution: (a) DDSketch; (b) UDDSketch.

Table 3: Accuracy

             DDSketch               UDDSketch
Dataset      q₀-accuracy    α       q₀-accuracy    α
beta         0.798          0.001   0              0.019
exponential  0.998          0.001   0              0.031
lognormal    0.999          0.001   0              0.031
normal       0              0.001   0              0.001
uniform      0.360          0.001   0              0.016
6. Conclusions
In this paper we have introduced a parallel version of the UDDSketch algorithm for accurate quantile tracking and analysis, suitable for message-passing based architectures. The algorithm allows compressing and fusing big volume data streams (or big data) while retaining the error and size guarantees provided by the sequential UDDSketch algorithm. We have formally proved its correctness and compared it to a parallel version of DDSketch. The extensive experimental results confirm the validity of our approach, since our algorithm almost always outperforms the parallel DDSketch algorithm with regard to the overall accuracy in determining the quantiles, while simultaneously providing good parallel scalability.
Acknowledgments
The authors would like to thank CINECA for granting access to the Marconi100 supercomputer through grant IsC80 PDQAHP10CZD477, and the Euro-Mediterranean Center on Climate Change Foundation, Italy, for granting access to the Zeus supercomputer.

Figure 2: Parallel running time varying the number of processes, in log-log plots: (a) beta; (b) exponential; (c) lognormal; (d) normal; (e) uniform distribution.

References

[1] Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi, Mergeable summaries, Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2012, pp. 23–34.
[2] Massimo Cafaro and Marco Pulimeno, Merging frequent summaries, Proceedings of the 17th Italian Conference on Theoretical Computer Science (ICTCS 2016), volume 1720, 2016, pp. 280–285.
[3] Massimo Cafaro, Marco Pulimeno, and Piergiulio Tempesta, A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution, Information Sciences (2016), 1–19.
[4] Massimo Cafaro and Piergiulio Tempesta, Finding frequent items in parallel, Concurrency and Computation: Practice and Experience (October 2011), no. 15, 1774–1788.
[5] Graham Cormode and Marios Hadjieleftheriou, Finding the frequent items in streams of data, Commun. ACM (2009), no. 10, 97–105.
[6] Erik D. Demaine, Alejandro López-Ortiz, and J. Ian Munro, Frequency estimation of internet packet streams with limited space, ESA, 2002, pp. 348–360.
[7] I. Epicoco, C. Melle, M. Cafaro, M. Pulimeno, and G. Morleo, UDDSketch: Accurate tracking of quantiles in data streams, IEEE Access (2020), 147604–147617.
[8] Richard M. Karp, Scott Shenker, and Christos H. Papadimitriou, A simple algorithm for finding frequent elements in streams and bags, ACM Trans. Database Syst. (2003), no. 1, 51–55.
[9] Charles Masson, Jee E. Rim, and Homin K. Lee, DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees, Proc. VLDB Endow. (August 2019), no. 12, 2195–2205.
[10] Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi, An integrated efficient solution for computing frequent and top-k elements in data streams, ACM Trans. Database Syst. (September 2006), no. 3, 1095–1133.
[11] Jayadev Misra and David Gries, Finding repeated elements, Sci. Comput. Program. (1982), no. 2, 143–152.
[12] Apostolos Syropoulos, Mathematics of multisets, Multiset Processing: Mathematical, Computer Science, and Molecular Computing Points of View, LNCS 2235, 2001, pp. 347–358.