Data stream fusion for accurate quantile tracking and analysis

Massimo Cafaro, Catiuscia Melle, Italo Epicoco, Marco Pulimeno

University of Salento, Lecce, Italy
Abstract
UDDSketch is a recent algorithm for accurate tracking of quantiles in data streams, derived from the DDSketch algorithm. UDDSketch provides accuracy guarantees covering the full range of quantiles, independently of the input distribution, and greatly improves the accuracy with regard to DDSketch. In this paper we show how to compress and fuse data streams (or datasets) by using UDDSketch data summaries that are fused into a new summary related to the union of the streams (or datasets) processed by the input summaries, whilst preserving both the error and size guarantees provided by UDDSketch. This property of sketches, known as mergeability, enables parallel and distributed processing. We prove that UDDSketch is fully mergeable and introduce a parallel version of UDDSketch suitable for message-passing based architectures. We formally prove its correctness and compare it to a parallel version of DDSketch, showing through extensive experimental results that our parallel algorithm almost always outperforms the parallel DDSketch algorithm with regard to the overall accuracy in determining the quantiles.
Keywords:
Quantiles, sketches, message-passing.
1. Introduction
Mergeability of data summaries is an important property [1], since it allows parallel and distributed processing of datasets. In general, given two summaries on two datasets, mergeability means that there exists an algorithm to merge the two summaries into a single summary related to the union of the two datasets, simultaneously preserving the error and size guarantees. Big volume data streams (or big data) can therefore be compressed and fused by means of a suitable, mergeable sketch data structure.

To formally define the concept of mergeability, we shall denote by S() a summarization algorithm, by D a dataset, by ε an error parameter and by S(D, ε) a valid summary for D with error ε produced by S(). The summarization algorithm S() is mergeable if there is an algorithm A that, given two input summaries S(D₁, ε) and S(D₂, ε), outputs a summary S(D₁ ⊎ D₂, ε) (here ⊎ stands for the multiset sum operation [12]).

∗ Corresponding author. Email addresses: [email protected] (Massimo Cafaro), [email protected] (Catiuscia Melle), [email protected] (Italo Epicoco), [email protected] (Marco Pulimeno). Preprint submitted to Elsevier, January 19, 2021.

Even though mergeability is a fundamental property of a data summary, merging algorithms are not necessarily simple, and may be complex to formally prove correct. In particular, merging algorithms for the problems of heavy hitters and quantiles were not known until a few years ago.

Regarding heavy hitters, Cormode and Hadjieleftheriou presented in 2009 [5] a survey of existing algorithms, classifying them as either counter-based or sketch-based. In the concluding remarks, Cormode and Hadjieleftheriou stated that “In the distributed data case, different parts of the input are seen by different parties (different routers in a network, or different stores making sales). The problem is then to find items which are frequent over the union of all the inputs. Again due to their linearity properties, sketches can easily solve such problems. It is less clear whether one can merge together multiple counter-based summaries to obtain a summary with the same accuracy and worst-case space bounds”.

The first merging algorithm for summaries obtained by running the Misra-Gries algorithm [11] (rediscovered and improved by [6] and [8], and also known as Frequent) was published in 2011 [4]. One year later, [1] provided a new merge algorithm for Frequent and Space Saving [10], showing that the summaries of these algorithms are isomorphic. The same paper also provided a merging algorithm for Greenwald-Khanna quantile summaries. Later, improved merging algorithms for both Misra-Gries and Space Saving summaries were presented [2], [3].

We formally prove that our UDDSketch [7] data summary for tracking quantiles is mergeable, design and analyze a corresponding parallel algorithm, and provide extensive experimental results showing the excellent scalability and accuracy achieved. This result enables parallel and distributed processing of big volume data streams (or big data), which can be compressed and fused for accurate quantile tracking and analysis.

The rest of this paper is organized as follows. We recall related work in Section 2. The merge procedure is presented in Section 3 and formally proved correct in Section 4. Experimental results are provided and discussed in Section 5. Finally, we draw our conclusions in Section 6.
2. Related Work
UDDSketch is based on the DDSketch algorithm [9], and achieves better accuracy by using a different, carefully designed collapsing procedure. Basically, DDSketch allows computing quantiles in a streaming setting, with accuracy defined as follows. Let S be a multiset of size n over ℝ and R(x) the rank of the element x (the number of elements in S smaller than or equal to x). Then, the item x whose rank R(x) in the sorted multiset S is ⌊1 + q(n − 1)⌋ (respectively ⌈1 + q(n − 1)⌉), for 0 ≤ q ≤ 1, is the q-quantile item x_q ∈ S. For instance, x₀ and x₁ are respectively the minimum and maximum element of S, whilst x_0.5 is the median. We are now ready to define relative accuracy.

Definition 1. Relative accuracy. x̃_q is an α-accurate q-quantile if |x̃_q − x_q| ≤ α x_q for a given q-quantile item x_q ∈ S. A sketch data structure is an α-accurate (q₀, q₁)-sketch if it can output α-accurate q-quantiles for q₀ ≤ q ≤ q₁.

The DDSketch data summary is a collection of buckets. The algorithm handles items x ∈ ℝ_{>0} and requires in input two parameters: the first one, α, is related to the user-defined accuracy; the second one, m, represents the maximum number of buckets allowed. Using α, the algorithm derives the quantity γ = (1 + α)/(1 − α), which is used to define the boundaries of the i-th bucket B_i. All of the values x such that γ^(i−1) < x ≤ γ^i fall in the bucket B_i, with i = ⌈log_γ x⌉; the bucket is just a counter variable, initially set to zero. We recall here that DDSketch can also handle negative values by using another sketch in which an item x ∈ ℝ_{<0} is handled by inserting −x.

Inserting a value is done by simply incrementing the corresponding counter by one; similarly, deleting a value requires decrementing the corresponding counter by one (when a counter reaches the value zero, the corresponding bucket is discarded). Initially the summary is empty, and buckets are dynamically added as needed. It is worth noting here that bucket indexes are dynamic as well, depending just on the input value x to be inserted and on the γ value. In order to prevent the summary from growing without bounds, when the number of buckets in the summary exceeds the maximum number m of allowed buckets, a collapsing procedure is executed. The collapse is done on the first two buckets with counts greater than zero (alternatively, it can be done on the last two buckets). Let the first two such buckets be respectively B_y and B_z, with y < z. Collapsing works as follows: the count stored by B_y is added to B_z, and B_y is removed from the summary. Algorithm 1 presents the pseudo-code for the insertion of a value x into the summary S.

Algorithm 1 DDSketch-Insert(x, S)
Require: x ∈ ℝ_{>0}
  i ← ⌈log_γ x⌉
  if B_i ∈ S then
    B_i ← B_i + 1
  else
    B_i ← 1
    S ← S ∪ {B_i}
  end if
  if |S| > m then
    let B_y and B_z be the first two buckets
    B_z ← B_y + B_z
    S ← S \ {B_y}
  end if

UDDSketch uses a uniform collapsing procedure that provides far better accuracy with regard to DDSketch. In practice, we collapse all of the buckets, two by two. Given a pair of indices (i, i + 1), with i an odd index and B_i, B_{i+1} ≠ 0, we create and add to the summary a new bucket with index j = ⌈i/2⌉, whose counter value is equal to the sum of the B_i and B_{i+1} counters. The new bucket replaces the two collapsed buckets. Algorithm 2 reports the pseudocode of the uniform collapse procedure.

Algorithm 2 UniformCollapse(S)
Require: sketch S = {B_i}_i
  for each {i : B_i > 0} do
    j ← ⌈i/2⌉
    B′_j ← B′_j + B_i
  end for
  return S ← {B′_i}_i

In [7] we provide a theoretical bound on the accuracy achieved by the UDDSketch data summary.
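To make the bucket arithmetic concrete, the following is a minimal C++ sketch of DDSketch-style insertion and of the UDDSketch uniform collapse. It is illustrative only: the names and the structure are ours, not the authors' implementation, and it handles only positive items.

```cpp
#include <cmath>
#include <map>

// A sketch is a gamma value plus a sparse set of bucket counters.
struct Sketch {
    double gamma;                 // derived from alpha as (1 + alpha) / (1 - alpha)
    std::map<int, long> buckets;  // bucket index -> counter
};

// Bucket index for a positive item x: i = ceil(log_gamma(x)),
// so that gamma^(i-1) < x <= gamma^i.
int bucket_index(double x, double gamma) {
    return static_cast<int>(std::ceil(std::log(x) / std::log(gamma)));
}

// DDSketch-style insertion: increment the counter of the bucket x falls into
// (the map creates the bucket on first use).
void insert(Sketch& s, double x) {
    s.buckets[bucket_index(x, s.gamma)] += 1;
}

// UDDSketch uniform collapse: fold every bucket i into bucket ceil(i/2),
// merging buckets pairwise; gamma is squared, halving the resolution.
void uniform_collapse(Sketch& s) {
    std::map<int, long> collapsed;
    for (auto [i, count] : s.buckets)
        collapsed[static_cast<int>(std::ceil(i / 2.0))] += count;
    s.buckets = std::move(collapsed);
    s.gamma = s.gamma * s.gamma;
}
```

Note that each uniform collapse preserves the total count while roughly halving the number of buckets, which is why UDDSketch needs far fewer collapses than DDSketch's first-two-buckets strategy.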
3. Mergeability of UDDSketch
Letting k(n, ε) be the maximum size of a summary S(D, ε) for any D consisting of n items, the size of the merged summary S(D₁ ⊎ D₂, ε) is, in general, at most k(|D₁| + |D₂|, ε). In our case, the maximum size m of the UDDSketch data summary is independent of n, the sketch being a collection of at most m buckets with m = O(1) (from a practical perspective, m can be a small constant; as an example, m = 500 is already enough to provide good accuracy). Therefore, we shall denote the maximum size of our summary as k(m, ε). We shall show that the size of a merged summary S(D₁ ⊎ D₂, ε) for UDDSketch is still k(m, ε).

Our parallel UDDSketch algorithm is both simple and fast. Basically, the input dataset, consisting of n items, is partitioned among the available p processes, so that each process p_i is in charge of processing either ⌈n/p⌉ or ⌊n/p⌋ items using its own UDDSketch data structure S_i. Next, all of the processes execute a parallel reduction, using as user-defined reduction operator Algorithm 3, which works as follows.

We shall denote by {B_{ik}}_k the set of buckets of the sketch S_i, and by m the maximum number of buckets related to the size of a sketch. The algorithm merges two input sketches S₁ and S₂; without loss of generality, we assume that the γ values for the two sketches are the same (full details shall be provided in the next Section, in which we formally prove the correctness of our merge procedure).

A UDDSketch data structure S_m, which shall be returned as the merged sketch, is initialized. The merge procedure is based on the fact that, given the common γ value, each bucket interval is fixed. Therefore, in order to merge two sketches it is enough to add the counters of buckets covering the same interval. For the remaining buckets in S₁ and S₂ we just create a bucket in the merged sketch with the same count. As a consequence, merging is done by scanning the buckets of S₁ and S₂ and considering only those buckets whose counter is greater than zero. However, the newly created S_m sketch may exceed the size limit. Therefore, we check if the size of S_m exceeds m buckets and, if so, we invoke the UDDSketch UniformCollapse() procedure to enforce the constraint on the size. Finally, we return the merged sketch S_m.

We now analyze the computational complexity of Algorithm 3. Initializing the merged sketch S_m requires O(1) constant time in the worst case. Scanning S₁ and S₂ requires O(m) time in the worst case. Indeed, there are m buckets in each of the input sketches, and for each one we execute O(1) operations, taking into account that searching for corresponding buckets is done through a hash table. Finally, the UniformCollapse() operation requires at most O(m) time in the worst case (again, we just need to scan at most m buckets). Taking into account that m = O(1), overall the worst case computational complexity of Algorithm 3 is O(1).

The computational complexity of the parallel UDDSketch algorithm is therefore O(n/p + log p), since each process p_i spends O(n/p) time to insert its share of the input items in its sketch, and the parallel reduction requires O(log p) time (there are log p steps, each one costing O(1)). Finally, we remark here that Algorithm 3 can also be used in a distributed setting.
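The merge step can be sketched in C++ as follows. This is an illustrative rendering with our own names, not the authors' code: with a common γ, buckets with equal keys cover identical intervals, so merging is bucket-wise addition followed by enforcement of the size limit; we also include the γ-alignment step (repeatedly collapsing the sketch with the smaller γ) that the correctness proof relies on.

```cpp
#include <cmath>
#include <cstddef>
#include <map>

struct Sketch {
    double gamma;                  // squared at each collapse
    std::map<int, long> buckets;   // bucket key -> counter
    std::size_t max_buckets;       // the size limit m
};

// Uniform collapse: bucket i folds into bucket ceil(i/2); gamma is squared.
void uniform_collapse(Sketch& s) {
    std::map<int, long> collapsed;
    for (auto [i, c] : s.buckets)
        collapsed[static_cast<int>(std::ceil(i / 2.0))] += c;
    s.buckets = std::move(collapsed);
    s.gamma *= s.gamma;
}

// Both sketches start from the same gamma_0, so their gammas lie on the
// chain gamma_0, gamma_0^2, gamma_0^4, ...; collapsing the one with the
// smaller gamma eventually makes the two mapping functions coincide.
void align_gammas(Sketch& a, Sketch& b) {
    while (a.gamma < b.gamma) uniform_collapse(a);
    while (b.gamma < a.gamma) uniform_collapse(b);
}

// Merge(S1, S2): align gammas, add counters bucket-wise, enforce the limit m.
Sketch merge(Sketch s1, Sketch s2) {
    align_gammas(s1, s2);
    for (auto [key, count] : s2.buckets)
        s1.buckets[key] += count;            // same key -> same interval
    while (s1.buckets.size() > s1.max_buckets)
        uniform_collapse(s1);                // restore the m-bucket bound
    return s1;
}
```

Here the lookup of corresponding buckets uses an ordered map for brevity; a hash table, as in the complexity analysis above, gives the O(1) per-bucket cost.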
4. Correctness
In this Section we formally prove that our parallel UDDSketch algorithm is correct when executed on p processors (or cores). We need the following definition.

Algorithm 3 Merge(S₁, S₂)
Require: S₁ = {B₁ᵢ}ᵢ, S₂ = {B₂ⱼ}ⱼ: sketches to be merged
Ensure: S_m = {B_{mk}}_k: merged sketch
  Init(S_m)
  for each {i : B₁ᵢ > 0 ∨ B₂ᵢ > 0} do
    B_{mi} ← B₁ᵢ + B₂ᵢ
  end for
  if S_m.size > m then
    UniformCollapse(S_m)
  end if
  return S_m

Definition 2. A multiset 𝒩 = (N, f) is a pair where N is some set, called the underlying set of 𝒩, and f : N → ℕ is a function. The generalized indicator function of 𝒩 is

  I_𝒩(x) := { f(x)  if x ∈ N,
            { 0     if x ∉ N,                      (1)

where the integer-valued function f, for each x ∈ N, provides its multiplicity, i.e., the number of occurrences of x in 𝒩. The cardinality of 𝒩 is expressed by

  |𝒩| := Card(𝒩) = Σ_{x ∈ N} I_𝒩(x),              (2)

whilst the cardinality of the underlying set N is

  |N| := Card(N) = Σ_{x ∈ N} 1.                    (3)

A multiset (also called a bag) is essentially a set in which the duplication of elements is allowed. We also need the definition of the sum operation [12] for multisets.
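As a toy model of Definition 2, assuming integer items (the representation and names are ours), a multiset can be stored sparsely as its multiplicity function, making the two cardinalities of Eqs. (2) and (3) easy to compute:

```cpp
#include <map>

// A multiset over the integers, stored as its multiplicity function f:
// the keys form the underlying set N, the values are the multiplicities.
using Multiset = std::map<int, long>;   // x -> f(x)

// |multiset| = sum of the multiplicities, as in Eq. (2).
long card(const Multiset& m) {
    long n = 0;
    for (auto [x, f] : m) n += f;
    return n;
}

// |N| = size of the underlying set, as in Eq. (3).
long underlying_card(const Multiset& m) {
    return static_cast<long>(m.size());
}
```

For instance, the multiset {1, 1, 7, 7, 7} has cardinality 5 and underlying set {1, 7} of cardinality 2.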
Definition 3. Let 𝒜 = (A, f) and ℬ = (B, g) be two multisets. The sum of 𝒜 and ℬ is the multiset whose underlying set is the union of the underlying sets and whose multiplicity function is the sum of the multiplicity functions: 𝒜 ⊎ ℬ = ((A ∪ B), f + g).

In the sequel, 𝒩 will play the role of a finite input dataset, containing n items. We partition the original dataset 𝒩, considered as a multiset, into p datasets 𝒩ᵢ (i = 0, …, p − 1), namely 𝒩 = ⊎ᵢ 𝒩ᵢ. Let the dataset 𝒩ᵢ be assigned to the processor p_i, whose rank is denoted by id, with id = 0, …, p − 1. Let also |𝒩ᵢ| denote the cardinality of 𝒩ᵢ, with Σᵢ |𝒩ᵢ| = |𝒩| = n.

The first step of the algorithm consists in the execution of the sequential UDDSketch algorithm (which has already been proved to be correct) on the dataset assigned to each processor p_i. Therefore, in order to prove the overall correctness of the algorithm, we just need to demonstrate that the parallel reduction is correct.

Our strategy is to prove that a single sub-step of the parallel reduction (i.e., Algorithm 3) is correct, and then naturally extend the proof to the O(log p) steps of the whole parallel reduction. We begin by proving the following Lemma, which states that UDDSketch is permutation invariant with regard to insertion-only streams.

Lemma 1.
UDDSketch is permutation invariant with regard to insertion-only streams, i.e., it produces the same sketch regardless of the order in which the input items are inserted.

Proof. Let D = (∆, µ) be a multiset representing an insertion-only input stream (i.e., deleting an item is not allowed). ∆ ⊂ ℝ⁺ is the underlying set of D and µ : ∆ → ℕ_{≥1} is its multiplicity function. Let i_γ : ∆ → ℤ : i_γ(x) = ⌈log_γ x⌉ denote the function which maps each item x ∈ ∆ to the corresponding bucket in the sketch built by UDDSketch processing D, and assume that the sketch can grow unbounded. Then i_γ(∆), the image of ∆ through the mapping function i_γ, corresponds to the set of bucket keys in the sketch summarizing the multiset D with a guaranteed accuracy of α = (γ − 1)/(γ + 1), and |i_γ(∆)| is the number of such buckets, i.e., the size of the sketch.

Moreover, for each bucket key k ∈ i_γ(∆), the preimage of k under i_γ, denoted by i_γ⁻¹(k), is the set of items assigned to the bucket B_k, and we can compute the value of a bucket B_k as the sum of the multiplicities of its items in the input dataset, i.e., B_k = Σ_{x ∈ i_γ⁻¹(k)} µ(x).

Therefore, the sketch computed by UDDSketch on a dataset D is completely determined by the sets i_γ(∆) and i_γ⁻¹(k) ∀k ∈ i_γ(∆), which do not depend on the order in which the items in D are processed. We can represent the sketch produced by UDDSketch for the dataset D as the multiset S = (i_γ(∆), β), where β : i_γ(∆) → ℕ_{≥1} : β(k) = Σ_{x ∈ i_γ⁻¹(k)} µ(x).

When the sketch is allowed to grow unbounded, the value of γ, and consequently the accuracy of the sketch, is not constrained; it can be set arbitrarily and is not modified by UDDSketch. On the contrary, when a limit on the number of buckets is imposed, UDDSketch must determine the value of γ that allows respecting that limit, i.e., the value of γ also becomes an output of the algorithm. In fact, the collapsing procedure of UDDSketch is equivalent to a change of the value of γ, which is squared in each collapse operation, followed by a sketch reconstruction through the mapping function using the new γ value. When a limit of m buckets is imposed on the size of the sketch and that limit is exceeded with the current value of γ, UDDSketch squares that value and reconstructs the sketch until the constraint |i_γ(∆)| ≤ m is satisfied.

The characterization of the sketch as the multiset (i_γ(∆), β) continues to hold even if collapsing operations are executed with γ set to the value needed to respect the sketch size constraint, and the sketch remains invariant with regard to the order in which the items are processed or the order in which the collapsing operations are executed, thus proving that UDDSketch is permutation invariant when processing insertion-only streams.

We consider now a single step of the parallel reduction, i.e., the case when the input dataset, represented by a multiset D, is partitioned into the multisets D₁ and D₂, so that D = D₁ ⊎ D₂, where ⊎ represents the sum operation [12]. We independently process D₁ and D₂ with two instances of UDDSketch initialized with the same initial value of the parameter γ and the same limit on the number of buckets.

Without loss of generality, we assume that the final values of γ for the two sketches are the same. In fact, we prove here that this is not restrictive. Setting the same initial conditions, the sequence of values that γ can assume due to collapses of the two sketches is the same, i.e., γ ∈ {γ₀, γ₀², γ₀⁴, γ₀⁸, …} holds for both sketches. If the final values of γ do not match, we can always repeatedly collapse the sketch with the smaller γ until it matches the γ of the other sketch. We shall show that the following Theorem holds.

Theorem 1.
Let D₁ = (∆₁, µ₁) and D₂ = (∆₂, µ₂) be two multisets and S₁ and S₂ the sketches produced by UDDSketch respectively processing D₁ and D₂ with a limit on the number of buckets, m, and an initial value of γ = γ₀. Denote by S_m the sketch obtained by merging S₁ and S₂ on the basis of the UDDSketch merge procedure and denote by S_g the sketch that UDDSketch would produce on the multiset D = (∆, µ) = D₁ ⊎ D₂ with the same size limit m and the same initial value of γ = γ₀. Then, S_g = S_m.

Proof. We shall prove that separately computing S₁ and S₂ and then merging them in order to obtain S_m results in the same sequence of operations related to sequentially processing through UDDSketch all of the items in D, but in a particular order. Without loss of generality, we assume that the final value of γ for S₁ is larger than that for S₂, the other case being symmetric.

To make it possible to merge S₁ and S₂, we need to repeatedly collapse S₂ until its γ value (and consequently its mapping function) matches the one of S₁. After this preliminary operation, all of the items available both in D₁ and D₂ turn out to be processed by the same mapping function, although by two separate sketches. This also means that buckets with the same key in the two sketches have the same boundaries.

Denote by T the sketch computed by sequentially processing D. We start the sequential procedure by first inserting in T all of the items in D₁. Therefore, at the end, it holds that T = S₁. Then, we continue to insert in T all of the items in D₂ that fall in buckets already present in T. This produces the same result that we obtain in the merging procedure, when we set S_m = S₁ and increment the count of each bucket in S_m with the count of the bucket with the same key in the sketch S₂, if it exists. Now, we continue to insert in T all of the remaining items of D₂, which leads to the creation of new buckets in T, but we do not collapse the sketch for now. This corresponds to adding to the sketch S_m all of the buckets in S₂ with keys that are not yet in S_m, and this concludes the first step of the merging procedure. Up to this point, consisting of the same operations, the sequential procedure on D and the merging procedure on S₁ and S₂ produce two identical sketches, T = S_m. The second step of the merging procedure consists of collapsing S_m until the constraint on the sketch size, m, is satisfied, but this constraint also holds for T, which is subject to the same number of collapses. Thus, the equality is maintained.

T is the sketch that we obtain processing through UDDSketch the dataset D in a particular order of insertions and collapses, but we know from Lemma 1 that the order of insertions and collapses is not relevant; therefore, we can conclude that T = S_g, which finally proves the thesis S_m = S_g.

Lemma 1 and Theorem 1 hold for insertion-only input streams. When the input stream also includes deletions, the permutation invariance of UDDSketch, and consequently the equality between the two sketches S_m and S_g, can not be guaranteed. Nevertheless, the following Theorem holds even when deletions are allowed.

Theorem 2. Let σ₁ and σ₂ be two streams including insertions and deletions of items drawn from the universe set U = [x_min, x_max] ⊂ ℝ⁺, and let S₁ and S₂ be the sketches produced by UDDSketch processing respectively σ₁ and σ₂ with the sketch size limited to m buckets and an initial value of γ = γ₀. Denote by S_m the sketch obtained by merging S₁ and S₂ on the basis of the UDDSketch merge procedure and denote by S_g the sketch that UDDSketch would produce on the stream σ = σ₁ ⊎ σ₂ with the sketch size limited to the same number of buckets, m, and the same initial value of γ = γ₀. Then, S_g and S_m have the same error bound.

Proof. The value of γ during the execution of UDDSketch can only grow, due to the collapses of the sketch, and its final value depends on the order in which deletions are interleaved with insertions. The worst case scenario, in which γ reaches its largest value, happens when all of the deletions are postponed after all of the insertions. This particular order of insertions and deletions, in turn, produces a sketch with the same final value of γ that one would obtain by processing only the insertions of the input stream and completely ignoring the deletions. In fact, deletions may change the bucket counters' values in a sketch, but not its γ value. On the other hand, an insertion-only stream falls in the hypothesis of Theorem 1. Thus, if we consider only the insertions in σ₁, σ₂ and their concatenation σ, and ignore deletions, the two sketches S_m and S_g would be equal and have the same final value of γ. Denote by γ̃ this value; then, with regard to the original input streams with deletions, γ̃ is an upper bound on the values of γ both for S_m and S_g.

We know that the value of γ for S_g is guaranteed to be bounded by Theorem 3 of [7], i.e., γ ≤ γ̃ ≤ ᵐ√(x_max / x_min). Therefore, the guarantee on the accuracy of UDDSketch stated by Theorem 3 of [7] continues to hold also for a sketch computed through the merge procedure.

Lemma 2.
The parallel reduction in which the sketches S₁, …, S_p are processed on p processors or cores of execution is correct.

Proof. Consider a single step of the reduction, in which two sketches S_i and S_j are merged producing the sketch S_m. By Theorems 1 and 2, the sketch S_m is correct and subject to the same error bound of both S_i and S_j. Now consider the whole reduction operation. Let 𝒜 = (A, f_A), ℬ = (B, f_B) and 𝒞 = (C, f_C). It can be easily shown, by reduction to the analogous properties holding in the ring of the integers, that the multiset sum operation has the following properties.

1. Commutativity: 𝒜 ⊎ ℬ = ℬ ⊎ 𝒜.
2. Associativity: (𝒜 ⊎ ℬ) ⊎ 𝒞 = 𝒜 ⊎ (ℬ ⊎ 𝒞).
3. There exists a multiset, the null multiset ε = (∅, g : x → 0), such that 𝒜 ⊎ ε = 𝒜.

Regarding commutativity,

  𝒜 ⊎ ℬ = ((A ∪ B), f_A + f_B) = ((B ∪ A), f_B + f_A) = ℬ ⊎ 𝒜.    (4)

For associativity,

  𝒜 ⊎ (ℬ ⊎ 𝒞) = ((A ∪ B ∪ C), f_A + (f_B + f_C)) = ((A ∪ B ∪ C), (f_A + f_B) + f_C) = (𝒜 ⊎ ℬ) ⊎ 𝒞.    (5)

Finally, let ε = (∅, g : x → 0) be the empty multiset, i.e. the unique multiset with an empty underlying set, so that Card(ε) = 0; then

  𝒜 ⊎ ε = ((A ∪ ∅), f_A + g) = (A, f_A) = 𝒜.    (6)

Therefore, the merge procedure described for two multisets can be used, being associative, as a parallel reduction operator. Moreover, being also commutative, the order of evaluation need not be fixed (e.g., for non-commutative user-defined operators, MPI defines the evaluation order to be ascending, in process rank order, beginning with process zero) but can be changed, taking advantage of commutativity and associativity. Moreover, the final sketch obtained by the parallel reduction operator is also subject to the same error bound of the input sketches.

Table 1: Synthetic datasets

Dataset      Min value       Max value      Distribution
beta         3.04 × 10^−…    …              Beta(5, 1.5)
exponential  1.19 × 10^−…    …              Exp(…)
lognormal    1.08 × 10^−…    … × 10^…       Lognormal(1, 1.5)
normal       39.7            60.5           Normal(…, 20000)
uniform      2.18 × 10^−…    … × 10^…       Unif(5, 10)
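The three algebraic properties above can be checked mechanically on a toy multiset model, with integer items and multiplicities stored sparsely in a map (the representation and names are ours). With a common γ and no collapses triggered, the sketch merge reduces to exactly this bucket-wise addition:

```cpp
#include <map>

// A multiset as its multiplicity function: item -> multiplicity.
using Multiset = std::map<int, long>;

// Multiset sum (Definition 3): unite the underlying sets and add the
// multiplicity functions. This is the operation the reduction is built on.
Multiset msum(const Multiset& a, const Multiset& b) {
    Multiset out = a;
    for (auto [x, f] : b)
        out[x] += f;
    return out;
}

// Cardinality: sum of multiplicities; additive under msum.
long card(const Multiset& m) {
    long n = 0;
    for (auto [x, f] : m) n += f;
    return n;
}
```

Commutativity, associativity and the identity element can then be asserted directly, e.g. `msum(a, b) == msum(b, a)`, mirroring Eqs. (4)-(6).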
5. Experimental Results
In this Section, we present and discuss the results of the experiments carried out for both UDDSketch and DDSketch. The aim is twofold: i) we aim at showing that the accuracy does not decrease when executing the algorithm in parallel; ii) we aim at showing that the running time of UDDSketch is similar to that of DDSketch.

Both algorithms have been implemented in C++. The tests have been executed on two supercomputers: Marconi100 (at CINECA, Italy) and Zeus (at the Euro-Mediterranean Center on Climate Change Foundation, Italy). Marconi100 is made of 980 computing nodes, each equipped with two 16-core IBM Power9 processors, 256 GB of main memory and a Mellanox Infiniband EDR DragonFly+ network; the code has been compiled with the PGI compiler pgc++ version 20.9-0 with optimization level O3. Zeus is a parallel cluster made of 384 computing nodes, each one equipped with two 18-core Intel Xeon Gold processors, 96 GB of main memory and a Mellanox Infiniband EDR network; the code has been compiled with the Intel compiler icpc v19.0.5 with optimization level O3. The source code is freely available for inspection and reproducibility of results at https://github.com/cafaro/PUDDSKETCH.

The tests have been performed on 5 synthetic datasets, whose properties are summarized in Table 1. The experiments have been executed varying the number of parallel processes and measuring the execution time, the q₀-accuracy, the final value of α and the total number of collapses for both algorithms. We recall that for UDDSketch the q₀-accuracy is equal to 0 by construction, and for DDSketch the final value of α is equal to its initial value. The stream length and the sketch size have been kept constant for every experiment, as reported in Table 2. The results obtained on both parallel computers are totally equivalent and showed the same behaviour; for this reason, we report here only the results on Marconi100.

Table 2: Experiments parameters

Parameter                  Set of values
Number of procs. (M100)    {32, 64, 128, 256, 512}
Number of procs. (Zeus)    {36, 72, 144, 288, 576}
Stream length (M100)       16 · 10^…
Stream length (Zeus)       18 · 10^…
User α                     …

Fig. 1 reports the total number of collapses for DDSketch and UDDSketch. As expected, DDSketch performs a number of collapses which is about three orders of magnitude greater than those performed by UDDSketch. Even though the running time of both a DDSketch and a UDDSketch collapse is O(1), the asymptotic notation hides a bigger constant in the case of UDDSketch.

The parallel computation performance is shown in Fig. 2, in which we use log-log plots to represent the parallel running time of DDSketch and UDDSketch with different input distributions. The log-log plots also give clear evidence of the parallel scalability of the algorithms: indeed, the ideal parallel speedup is represented by a curve with slope equal to −1. The results clearly show that our UDDSketch algorithm provides good parallel scalability and that its parallel running time is equal to that of DDSketch; only with the exponential distribution is UDDSketch slightly slower than DDSketch (the difference in the execution time is less than 5%).

Moreover, UDDSketch outperforms DDSketch with regard to accuracy. Table 3 reports the q₀-accuracy and the α value at the end of the computation; as shown, UDDSketch has a q₀-accuracy equal to 0 for every distribution, which means that it can provide an accurate estimation for all of the quantiles, with a relative error less than α; instead, DDSketch is accurate only for those quantiles greater than q₀ which, for some distributions like the exponential and the lognormal, is greater than 0.99, demonstrating that the sketch size is not big enough to guarantee a quantile estimation with
an error less than α = 0.001. The UDDSketch algorithm, instead, is self-adaptive and consistently makes good use of the available space: for the exponential and the lognormal distributions it uses a greater value of α, so as to guarantee a quantile estimation over the whole range of quantiles with an error as small as possible within the sketch size defined by the user. Therefore, the results confirm that the parallel version of the UDDSketch algorithm outperforms DDSketch with regard to accuracy, while simultaneously exhibiting good parallel scalability and a running time comparable with that of DDSketch.

Figure 1: Number of sketch collapses varying the input distribution: (a) DDSketch; (b) UDDSketch.

Table 3: Accuracy

             DDSketch               UDDSketch
Dataset      q₀-accuracy    α       q₀-accuracy    α
beta         0.798          0.001   0              0.019
exponential  0.998          0.001   0              0.031
lognormal    0.999          0.001   0              0.031
normal       0              0.001   0              0.001
uniform      0.360          0.001   0              0.016
6. Conclusions
In this paper we have introduced a parallel version of the UDDSketch algorithm for accurate quantile tracking and analysis, suitable for message-passing based architectures. The algorithm allows compressing and fusing big volume data streams (or big data) while retaining the error and size guarantees provided by the sequential UDDSketch algorithm. We have formally proved its correctness and compared it to a parallel version of DDSketch. The extensive experimental results confirm the validity of our approach, since our algorithm almost always outperforms the parallel DDSketch algorithm with regard to the overall accuracy in determining the quantiles, while simultaneously providing good parallel scalability.
Acknowledgments
The authors would like to thank CINECA for granting access to the Marconi100 supercomputer through grant IsC80 PDQAHP10CZD477, and the Euro-Mediterranean Center on Climate Change Foundation, Italy, for granting access to the Zeus supercomputer.

Figure 2: Parallel running time varying the number of processes, in log-log plots: (a) beta; (b) exponential; (c) lognormal; (d) normal; (e) uniform distribution.

References

[1] Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi, Mergeable summaries, Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 2012, pp. 23–34.
[2] Massimo Cafaro and Marco Pulimeno, Merging frequent summaries, Proceedings of the 17th Italian Conference on Theoretical Computer Science (ICTCS 2016), volume 1720, 2016, pp. 280–285.
[3] Massimo Cafaro, Marco Pulimeno, and Piergiulio Tempesta, A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution, Information Sciences (2016), 1–19.
[4] Massimo Cafaro and Piergiulio Tempesta, Finding frequent items in parallel, Concurrency and Computation: Practice and Experience (October 2011), no. 15, 1774–1788.
[5] Graham Cormode and Marios Hadjieleftheriou, Finding the frequent items in streams of data, Commun. ACM (2009), no. 10, 97–105.
[6] Erik D. Demaine, Alejandro López-Ortiz, and J. Ian Munro, Frequency estimation of internet packet streams with limited space, ESA, 2002, pp. 348–360.
[7] I. Epicoco, C. Melle, M. Cafaro, M. Pulimeno, and G. Morleo, UDDSketch: Accurate tracking of quantiles in data streams, IEEE Access (2020), 147604–147617.
[8] Richard M. Karp, Scott Shenker, and Christos H. Papadimitriou, A simple algorithm for finding frequent elements in streams and bags, ACM Trans. Database Syst. (2003), no. 1, 51–55.
[9] Charles Masson, Jee E. Rim, and Homin K. Lee, DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees, Proc. VLDB Endow. (August 2019), no. 12, 2195–2205.
[10] Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi, An integrated efficient solution for computing frequent and top-k elements in data streams, ACM Trans. Database Syst. (September 2006), no. 3, 1095–1133.
[11] Jayadev Misra and David Gries, Finding repeated elements, Sci. Comput. Program. (1982), no. 2, 143–152.
[12] Apostolos Syropoulos, Mathematics of multisets, Multiset Processing: Mathematical, Computer Science, and Molecular Computing Points of View, LNCS 2235, 2001, pp. 347–358.