A Case for Partitioned Bloom Filters

Abstract

In a partitioned Bloom filter the m-bit vector is split into k disjoint parts of m/k bits, one per hash function. Contrary to hardware designs, where they prevail, software implementations mostly adopt standard Bloom filters, considering partitioned filters slightly worse due to their slightly larger false positive rate (FPR). In this paper, by performing an in-depth analysis, we first show that the FPR advantage of standard Bloom filters is smaller than thought; more importantly, by studying the per-element FPR, we show that standard Bloom filters have weak spots in the domain: elements which will be tested as false positives much more frequently than expected. This is relevant in scenarios where an element is tested against many filters, e.g., in packet forwarding. Moreover, standard Bloom filters are prone to exhibit extremely weak spots if naive double hashing is used, something occurring in several, even mainstream, libraries. Partitioned Bloom filters exhibit a uniform distribution of the FPR over the domain and are robust to the naive use of double hashing, having no weak spots. Finally, by surveying several usages other than testing set membership, we point out the many advantages of having disjoint parts: they can be individually sampled, extracted, added or retired, leading to superior designs for, e.g., SIMD usage, size reduction, test of set disjointness, or duplicate detection in streams. Partitioned Bloom filters are better, and should replace the standard form, both in general-purpose libraries and as the base for novel designs.



Paulo Sérgio Almeida, INESC TEC and University of Minho

A Bloom filter [3] is a probabilistic data structure to represent a set in a compact way. An element which has been inserted will always be reported as present; an element not in the set may erroneously be reported as present (i.e., false positives may arise), but the Bloom filter can be configured so that the probability of false positives is as low as desired.
Bloom filters are used in many settings, such as networking [7] and distributed systems [33].

A standard Bloom filter is a single array of m bits over which k independent hash functions range. When inserting an element, each of the k functions is used to produce an index, and the corresponding bit is set. When querying, an element is considered present if all bits in the positions given by the k hash functions are set.

A variant, partitioned Bloom filters, proposed by Mullin [25], divides the array into k disjoint parts of size m/k (assuming m is a multiple of k). Each of the k hash functions ranges over m/k, and is used to set or test a bit in the corresponding part. The most obvious feature of partitioned Bloom filters is the complete independence of each of the k parts and of each corresponding bit setting/testing. This has some obvious advantages, such as parallel access to each part, which has made partitioned Bloom filters widely adopted in hardware implementations, such as in [9, 31], where they are sometimes called parallel Bloom signatures.

A hybrid variant divides the filter into k/h parts, with h hash functions per part, as in a hardware implementation in [12], where k/h independent multi-port memory cores, each allowing h accesses per cycle, are used. For hardware designs, an important consideration [31] is that using single-port SRAM for the partitioned scheme requires much less area than using k-ported SRAM for the standard scheme, or h-ported SRAM for the hybrid scheme, because the size of an SRAM cell increases quadratically with the number of ports. This settles the standard-versus-partitioned choice for hardware designs, leading them to opt for the partitioned variant.

Concerning software implementations, standard Bloom filters prevail. The general feeling towards partitioned Bloom filters is that they are almost the same as standard ones, but produce slightly worse false positive rates, especially in small Bloom filters.
This comes from the observation [21] that partitioned Bloom filters will have slightly more bits set than standard ones, and this slightly higher fill ratio (proportion of set bits) will result in a correspondingly higher false positive rate. As we will demonstrate in this paper, the issue is more subtle, and this slight advantage comes at a substantial cost, including in the false positive rate itself.

The main contributions of this paper are:

• Perform an in-depth analysis of the false positive rate in Bloom filters, where we: provide a simpler explanation, compared with current literature, of why the standard formula is a strict lower bound of the true false positive rate; address the effect of different hash functions colliding for a given element; and obtain for the first time an exact formula for the per-element false positive rate, i.e., the expected false positive rate for each specific element of the domain, over the range of filters that do not contain it.
• Point out the consequences for standard Bloom filters of the above hash collision problem, namely the occurrence of weak spots in the domain: elements which will be tested as false positives much more frequently than expected. This can be a problem both for standard small-capacity Bloom filters and for blocked Bloom filters [29], and its unexpectedly frequent occurrence can be as surprising as the Birthday Problem [8].
• Expose pitfalls when using Double Hashing with standard Bloom filters, of which many widespread libraries seem to be unaware, and contrast it with the robustness of partitioned Bloom filters in this matter.
• Survey usages for Bloom filters other than testing set membership, identifying many advantages that result from having disjoint parts that can be individually sampled, extracted, added or retired. We identify how the partitioned scheme leads to superior designs for SIMD techniques, testing set disjointness, reducing filter size, and duplicate detection in streams.

Figure 1: Standard Bloom filter using 4 hash functions.
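The two layouts described above, one shared m-bit array versus k disjoint parts of m/k bits, can be contrasted in a short Python sketch. It is illustrative only. The k hash functions are simulated by BLAKE2b (from Python's hashlib) applied to the function index and the element, which is an assumption made to keep the example self-contained. The names StandardBloom, PartitionedBloom and _hash are hypothetical and not taken from the paper or from any library.

```python
import hashlib


def _hash(item: str, i: int, modulus: int) -> int:
    """i-th hash of `item`, reduced modulo `modulus`.

    Illustrative assumption: BLAKE2b over "i:item" stands in for k independent hashes.
    """
    digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % modulus


class StandardBloom:
    """k hash functions range over one m-bit array, so two of them may pick the same bit."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = [False] * m

    def positions(self, item: str) -> list[int]:
        return [_hash(item, i, self.m) for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self.positions(item):
            self.bits[p] = True

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p] for p in self.positions(item))


class PartitionedBloom:
    """k disjoint parts of m/k bits; hash function i only ever touches part i."""

    def __init__(self, m: int, k: int) -> None:
        if m % k:
            raise ValueError("m must be a multiple of k")
        self.k, self.part_size = k, m // k
        self.parts = [[False] * self.part_size for _ in range(k)]

    def positions(self, item: str) -> list[tuple[int, int]]:
        # (part, offset) pairs lie in distinct parts, so always exactly k distinct bits
        return [(i, _hash(item, i, self.part_size)) for i in range(self.k)]

    def add(self, item: str) -> None:
        for part, off in self.positions(item):
            self.parts[part][off] = True

    def __contains__(self, item: str) -> bool:
        return all(self.parts[part][off] for part, off in self.positions(item))


if __name__ == "__main__":
    words = ["alice", "bob", "carol"]
    std, par = StandardBloom(64, 8), PartitionedBloom(64, 8)
    for w in words:
        std.add(w)
        par.add(w)
    print(all(w in std and w in par for w in words))  # True: no false negatives
    print(len(set(std.positions("dave"))), len(set(par.positions("dave"))))
```

The only structural difference is where hash i may land. In the standard filter it can land anywhere in the array. In the partitioned filter it stays inside part i, which is why the partitioned variant always touches exactly k distinct bits.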

While most Bloom filters are used to represent large sets, in some scenarios small Bloom filters are used. If a small false positive rate is also wanted, the combination of a small m and a (relatively) large k will cause, for a standard Bloom filter, a non-negligible probability that two or more of the k hash functions (applied to a given element) collide, i.e., produce the same index. Such a collision is illustrated in Figure 1, in yellow, where two of the 4 hash functions applied to y produce the same index, resulting in a total of three bits being set for y instead of the expected 4 bits. Such intra-element hash collisions are not normally illustrated (or discussed) in Bloom filter presentations, which just focus on inter-element collisions, such as the one between x and y, in red.

In fact, the surprisingly high probability of intra-element hash collisions is precisely an instance of the Birthday Problem, stated in 1927 by H. Davenport*, as described in [8]. The probability that, for a given element, two or more of the k independent hash functions return the same value is:

$$1 - \frac{P(m, k)}{m^k}, \tag{1}$$

where P(m, k) = m!/(m − k)! denotes the number of k-permutations of m. We now give some examples.

Sets of words in small strings

Mullin [26] used Bloom filters to store sets of words occurring in strings (e.g., titles and authors of articles), typically up to 15 words per string, with filters ranging from 32 up to 256 bits, the most common one being 96 bits, and using 8 hash functions per filter. With m = 96 and k = 8, two or more hash functions will collide in one out of four cases (25.88%). For such elements one bit fewer is tested, so the false positive error will be at least twice what the classic formula predicts for filters that reached design capacity (where about half of the bits are set), or much higher than expected for filters still far away from design capacity.

* But frequently misattributed to von Mises, who stated a similar but different version of the problem. Some archaeology about its origin can be found at [2].

Figure 2: Partitioned Bloom filter using 4 hash functions, represented as a bidimensional bit array with one row per part.

Packet forwarding

Whitaker and Wetherall [34] used small Bloom filters in packets to detect possible forwarding loops in experimental routing protocols. In this case 64-bit filters were used, with “4 bits set to one”. With m = 64 and k = 4, two or more hash functions will collide 9.1 percent of the time. Interestingly, and differently from the more common usage, in this case a given element (node) is tested against many Bloom filters (packets), and instead of using k hash functions for the element, a Bloom mask with exactly 4 ones at random positions is computed at start time, overcoming the collision problem.
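Equation 1 is straightforward to evaluate. The following Python sketch uses math.perm for P(m, k). The function name collision_probability is hypothetical. It prints the probability for the (m, k) settings of the examples in this section, including the blocked Bloom filter settings discussed next, and the printed values should agree with the percentages quoted in the text up to rounding.

```python
from math import perm


def collision_probability(m: int, k: int) -> float:
    """Equation 1: P(two or more of k independent hashes over m values coincide).

    perm(m, k) = m!/(m-k)! counts the outcomes in which all k indices are distinct;
    dividing by m**k gives the probability of no intra-element collision.
    """
    return 1 - perm(m, k) / m**k


# (m, k) settings used in the examples of this section
for m, k in [(96, 8), (64, 4), (512, 16), (512, 8), (64, 8)]:
    print(f"m={m:4d} k={k:2d}  P(collision) = {collision_probability(m, k):.3f}")
```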

Blocked Bloom filters

One problem with Bloom filters is the spreading of memory accesses, which hurts performance. This is avoided by blocked Bloom filters (BBFs) [29], where the filter is divided into many blocks, each block being a Bloom filter that fits into a single cache line (e.g., 512 bits), and an extra hash function is used to select the block. For a very high precision filter, with k = 16 and m = 512, hash collisions will occur for 21 percent of elements, and even for a more normal setting of k = 8 there will be collisions for 5.3 percent of elements. For an extreme-performance BBF that requires a single memory access, using word-sized blocks (m = 64), for k = 8 we have collisions 36 percent of the time. So, the collision problem occurs in practice for BBFs.

It should be emphasized that using blocking is the only way that Bloom filters can remain performance-wise competitive with dictionary-based approaches (such as Cuckoo Filters [15] or Morton Filters [6]). Therefore, the scenario of a small Bloom filter (a block of a BBF) is important, even for “big data” using huge BBFs.

The above-mentioned hash collision possibility is not a problem in partitioned Bloom filters, because each of the k functions is used to set/test bits in a different part. While in standard Bloom filters hash collisions will lead to bit collisions (the same bit being used for different functions), in partitioned Bloom filters such hash collisions will not lead to bit collisions. This is illustrated in Figure 2, which shows a partitioned Bloom filter using 4 parts, represented as a bidimensional bit array with one row per part. It can be seen that even if two of the 4 hash functions applied to y produce the same value (column index), two different bits in the filter are set.

So, while for partitioned Bloom filters exactly k distinct bits in the filter are accessed, in standard Bloom filters up to k distinct bits are accessed (most times k bits, but sometimes fewer than k).
As we will see, this makes the standard false positive formula incorrect, producing a value lower than the actual one, and complicates the exact false positive calculation (something that has been addressed before). It also produces a non-uniform distribution of the false positive rate, with the occurrence of weak spots in the domain, something that we address here for the first time.

Interestingly, in the original proposal by Bloom exactly k bits are set/tested. From [3]: “each message in the set to be stored is hash coded into a number of distinct bit addresses” and “where d is the number of distinct bits set to 1 for each message in the given set”. The original formula for the false positive rate is consistent with this behavior. This fact seems to have been mostly ignored in the literature, one notable exception being [23], “In [Bl70], the assumption was that the k locations are chosen without repetitions; it is also possible to allow repetitions, which makes the program simpler”, and, more recently, [17], which compares the original proposal with standard Bloom filters.

The original Bloom proposal is not practical, as it demands some extra effort to ensure exactly k distinct addresses. Examples are iterating over an unbounded family of hash functions until k different values have been produced (with the need to compare each new value to all the previous ones), or directly producing a pseudo-random k-permutation of m, keyed by the element. And even if little cost seems to be required [30], practitioners typically would not be aware of the problem or its solution, and would not bother to address such minutiae. So, it is not surprising that what became adopted as standard Bloom filters differs from the original proposal.

Partitioned Bloom filters, which differ both from the original and from the standard ones, are not only immune to the birthday problem (being, in a sense, more in the spirit of the original proposal) but also practical to implement.
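The three ways of choosing bit positions discussed above (standard, Bloom's original, and partitioned) can be compared empirically. In this sketch a seeded random.Random instance stands in for hash functions keyed by the element. That substitution is an assumption made only to keep the example self-contained, and the function names are hypothetical. The original proposal is implemented as the text describes it, by drawing until k distinct addresses have appeared.

```python
import random


def standard_indices(element: str, m: int, k: int) -> list[int]:
    """Standard filter: k independent indices over [0, m); repetitions can occur."""
    rng = random.Random(element)  # seeded PRNG as a stand-in for k hash functions
    return [rng.randrange(m) for _ in range(k)]


def original_bloom_indices(element: str, m: int, k: int) -> list[int]:
    """Bloom's original proposal: keep drawing until k distinct addresses are found.

    Each new value is compared with all previous ones (the extra effort noted above).
    """
    if k > m:
        raise ValueError("cannot choose k distinct addresses out of m < k")
    rng = random.Random(element)
    chosen: list[int] = []
    while len(chosen) < k:
        x = rng.randrange(m)
        if x not in chosen:
            chosen.append(x)
    return chosen


def partitioned_indices(element: str, m: int, k: int) -> list[int]:
    """Partitioned filter: index i falls in part i, i.e. bits [i*m/k, (i+1)*m/k)."""
    rng = random.Random(element)
    part = m // k
    return [i * part + rng.randrange(part) for i in range(k)]


def fraction_with_collisions(index_fn, m: int, k: int, trials: int = 20_000) -> float:
    """Fraction of sampled elements that address fewer than k distinct bits."""
    short = sum(len(set(index_fn(f"e{t}", m, k))) < k for t in range(trials))
    return short / trials


for fn in (standard_indices, original_bloom_indices, partitioned_indices):
    print(fn.__name__, fraction_with_collisions(fn, m=64, k=8))
```

For m = 64 and k = 8 the standard variant should report a fraction close to the value of Equation 1 (about 0.366). The other two variants never access fewer than k distinct bits. The partitioned variant achieves this without the rejection loop.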
We now do a theoretical analysis of the false positive rate. We revisit Bloom’s analysis, the standard analysis, and existing improvements to the standard analysis that produce a correct formula. We then derive the formula for partitioned Bloom filters and compare standard with partitioned Bloom filters. In the next section we present a novel per-element false positive analysis, showing how the expected false positive rate behaves for different elements in the domain.

Original Bloom’s analysis

Bloom’s analysis [3] states that the probability of a bit still being zero after n elements are added is

$$\left(1 - \frac{k}{m}\right)^{n}, \tag{2}$$

which, contrary to what is sometimes said, is correct for the original Bloom proposal, where exactly k distinct bits are set. It also states that the false positive rate is

$$\left(1 - \left(1 - \frac{k}{m}\right)^{n}\right)^{k}. \tag{3}$$

The analysis is almost correct, but it suffers from the same problem as the standard analysis below. In any case it is irrelevant for standard Bloom filters used in practice, as they differ from the original Bloom proposal.

The standard analysis, by Mullin [25], which is widely used, states that the probability of a bit still being zero after n elements are added is

$$\left(1 - \frac{1}{m}\right)^{kn}, \tag{4}$$

which is correct, and that the false positive rate is

$$F_a(n, m, k) = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}, \tag{5}$$

which is only approximate, as we discuss below.

There is one problem with the standard analysis, which has already been detected and corrected before. The standard analysis derives the false positive rate only as a function of the mean fill ratio p, as p^k. Even though this gives a very good approximation for large Bloom filters, given the high concentration of the fill ratio around its mean [24], it is not an exact formula.

Exact formulas for standard Bloom filters were developed [5, 11] by deriving the probability distribution of the fill ratio and weighing the false positive rate incurred by each concrete fill ratio with the probability of it occurring.
A similar result had already been derived in [23] for a Bloom filter variant divided into pages (essentially, a blocked Bloom filter with typically large blocks), and a formula for the original Bloom filters was derived more recently in [17].

A simpler strict lower bound argument

The standard formula, in Equation 5, has also been proven to be a strict lower bound for the true false positive rate in [5], using considerations of conditional probability, and to be a lower bound in [11], by resorting to Hölder’s inequality [18]. We now present a simpler and more elegant argument for why it is a strict lower bound. It results from a direct application of Jensen’s inequality [20]: for a strictly convex function, such as f(x) = x^k when k > 1 and x > 0, and for a non-constant random variable R, such as the fill ratio,

$$f(E[R]) < E[f(R)]. \tag{6}$$

This means that, for k > 1, raising the expected fill ratio to the power of k, as done in the standard formula, always produces a value smaller than the expected value of the fill ratio raised to the power of k, which is what gives the exact average false positive rate.

As presented by the above-mentioned works, computing the fill ratio distribution is an instance of the well-known balls-into-bins experiment. It can be computed by resorting to the number of surjective functions from an n-set to an i-set, e_n^i [16], which can be directly derived using the inclusion-exclusion principle (in the complementary form) as:

$$e_n^i = \sum_{j=0}^{i} (-1)^j \binom{i}{j} (i - j)^n. \tag{7}$$

The probability B(n, m, i) of having exactly i non-empty bins, after throwing n balls randomly into m bins, is then:

$$B(n, m, i) = \frac{\binom{m}{i}\, e_n^i}{m^n}. \tag{8}$$
The probability of having exactly i bits set after inserting n elements into an m-sized standard Bloom filter using k hash functions is then:

$$S(n, m, k, i) = B(nk, m, i). \tag{9}$$

The false positive rate for a standard Bloom filter is then:

$$F_s(n, m, k) = \sum_{i=1}^{m} S(n, m, k, i) \left(\frac{i}{m}\right)^{k}. \tag{10}$$

As the k parts of a partitioned Bloom filter are independently set/tested, its expected false positive rate is the product of the individual expected rates, and so is computed as the rate for one part raised to the power of k. For each part, the standard formula with k = 1 gives the exact part false positive rate, as the inequality in Equation 6 becomes an equality when k = 1. So, for a partitioned Bloom filter of size m, made up of k parts of m/k bits each, the exact false positive rate when n elements were inserted is given by:

$$F_p(n, m, k) = \left(1 - \left(1 - \frac{k}{m}\right)^{n}\right)^{k}, \tag{11}$$

which is much simpler than the exact formula for standard Bloom filters (as well as the one for original Bloom filters, described in [17]). Interestingly, it coincides with Bloom’s formula for his original proposal, while being exact.

The simplicity of this formula results from the conceptual simplicity of the structure: a partitioned Bloom filter can be seen as an AND of k independent single-hash filters, all used for insertions. It also translates into a simplicity of presentation, which is pedagogically better than for standard Bloom filters, as it allows deriving a more complex (composite) concept in terms of a simpler one (each part).

Common folklore is that partitioned Bloom filters are not worth using over standard ones, e.g., in [21]: “partitioned filters tend to have more 1’s than nonpartitioned filters, resulting in larger false positive probabilities”. But hash collisions, even though they decrease the fill ratio, increase the false positives for the elements suffering the collision, and so the question is more subtle.

Table 1: Comparison between partitioned and standard Bloom filter false positive rates, for different combinations of m and k, for filters at nominal occupation (n = (m/k) ln 2), showing both the approximate (F_a) and the exact (F_s) values for standard filters, the value for partitioned filters (F_p), and the ratio F_p/F_s.

m      k    F_a          F_s          F_p          F_p/F_s
64     4    0.06244514   0.06423247   0.06676410   1.03941360
64     8    0.00227672   0.00260362   0.00316870   1.21703762
512    4    0.06126247   0.06148344   0.06176528   1.00458411
512    8    0.00375309   0.00381650   0.00389940   1.02172097
512    16   0.00001409   0.00001513   0.00001661   1.09783475
4096   4    0.06233016   0.06235819   0.06239353   1.00056676
4096   8    0.00385474   0.00386284   0.00387308   1.00265094
4096   16   0.00001486   0.00001499   0.00001516   1.01143019
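Equations 5, 7, 8, 10 and 11 can be evaluated directly to check entries of Table 1. The Python sketch below is a minimal illustration, not the authors' code, and its function names are hypothetical. Equation 7 is computed with exact integers to avoid cancellation in the alternating sum. The nominal capacity is taken as ⌊(m/k) ln 2⌋. That rounding is an assumption of this sketch, but it reproduces, for example, the m = 64, k = 4 row (n = 11).

```python
from math import comb, floor, log


def surjections(n: int, i: int) -> int:
    """Equation 7: number of surjections from an n-set onto an i-set (exact integers)."""
    return sum((-1) ** j * comb(i, j) * (i - j) ** n for j in range(i + 1))


def balls_into_bins(n: int, m: int, i: int) -> float:
    """Equation 8, B(n, m, i): P(exactly i of m bins are non-empty after n random balls)."""
    return comb(m, i) * surjections(n, i) / m**n


def fpr_standard_approx(n: int, m: int, k: int) -> float:
    """Equation 5, F_a: the standard (approximate) formula."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k


def fpr_standard_exact(n: int, m: int, k: int) -> float:
    """Equation 10, F_s, using S(n, m, k, i) = B(nk, m, i) from Equation 9."""
    # B(nk, m, i) = 0 for i > nk (no surjection exists), so those terms are skipped.
    top = min(m, n * k)
    return sum(balls_into_bins(n * k, m, i) * (i / m) ** k for i in range(1, top + 1))


def fpr_partitioned(n: int, m: int, k: int) -> float:
    """Equation 11, F_p: exact rate for k parts of m/k bits each."""
    return (1 - (1 - k / m) ** n) ** k


def nominal_capacity(m: int, k: int) -> int:
    # Assumption: n = floor((m/k) ln 2), e.g. n = 11 for m = 64, k = 4.
    return floor(m / k * log(2))


for m, k in [(64, 4), (64, 8), (512, 8)]:
    n = nominal_capacity(m, k)
    fa = fpr_standard_approx(n, m, k)
    fs = fpr_standard_exact(n, m, k)
    fp = fpr_partitioned(n, m, k)
    print(f"m={m:4d} k={k:2d} n={n:3d}  F_a={fa:.8f}  F_s={fs:.8f}  "
          f"F_p={fp:.8f}  F_p/F_s={fp / fs:.8f}")
```

The ordering F_a < F_s < F_p visible in the output is the Jensen bound of Equation 6 followed by the small partitioned penalty. Evaluating Equation 7 with exact integers is fine for the small filters shown, but it becomes slow for m in the thousands. For those sizes, a dynamic program over the balls that tracks the distribution of occupied bins would be a faster alternative.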
Using the exact formulas for each case, Table 1 shows how partitioned and standard Bloom filters compare, namely the ratio of false positives F_p/F_s, for some combinations of m and k, for filters at full capacity with n = (m/k) ln 2. It can be seen that although partitioned filters do indeed have slightly more false positives, the difference is less than what the standard formula (F_a) would […] k = 8, with a 22% higher false positive rate, but even blocked Bloom filters normally aim for blocks of cache-line size (m = 512).

Table 2: Ratio between partitioned and standard Bloom filter false positive rates, F_p/F_s, for different combinations of m, k, and occupation (fraction of the nominal capacity n = (m/k) ln 2).

Table 2 shows the ratio of false positives F_p/F_s for filters at three different occupations, expressed as fractions of the nominal capacity. The ratio increases somewhat for word-sized filters and small occupations, but those occupations for those filters are degenerate cases, with just a few elements inserted and negligible false positive rates, whether for standard or partitioned filters. So, the average false positive rate is not relevant for choosing between standard and partitioned Bloom filters. But, as we discuss next, a more relevant issue is the distribution of false positives over the elements in the domain that are subject to being tested.

There are two ways that Bloom filters can be used, and two different points of view regarding false positives:
1. Filter point of view: having a filter, in which elements were inserted over time, test new elements using the filter.
2. Element point of view: for a specific element, test it against many different filters that show up, to see if the element is present in them.

The first usage is the more common one, for which we want to know the global average false positive rate.
The second usage corresponds to the packet forwarding scenario, where many different filters arrive at each node (representing an element), each one representing a path that a packet took to reach the node. For this second usage we want to know, for each specific element in the domain, the average false positive rate over all possible filters (considering some fixed k, m, and n) that do not include the element. Particularly relevant is the question of whether this per-element rate is the same for all elements (the global average) or whether it is non-uniform, varying for different elements.

For partitioned Bloom filters, with k independent parts accessed by k independent hash functions, the per-element false positive rate is the same for all elements, and equal to the global average. But for standard Bloom filters, the possibility of hash collisions makes some elements have fewer than k independent bits to test. We thus have a non-uniform distribution of false positives: for a given element having d < k different bit positions to test, the average false positive rate will be higher than for those elements for which no collisions occurred. Elements suffering collisions are then weak spots in the domain: they will be considered more often than expected as belonging to the filters against which they are tested. As we will see, for elements suffering several hash function collisions, the false positive rate can be more than one order of magnitude larger than expected.

We now derive an exact formula for the per-element false positive rate. Consider a specific element e of the domain, having d different bit positions resulting from the k independent hash functions, where d ≤ k.
We want to know the average false positive rate F_s(n, m, k, d) when e is tested against standard Bloom filters of size m into which a set of n elements not containing e was inserted. A first observation is that the per-element rate cannot be obtained by simply taking the exact formula in Equation 10, where the fill ratio is raised to the power of k, and replacing (i/m)^k with (i/m)^d, i.e.,

$$F_s(n, m, k, d) \neq \sum_{i=1}^{m} S(n, m, k, i) \left(\frac{i}{m}\right)^{d}. \tag{12}$$

The reason is that, once we know there are d different positions, the outcomes of testing them are not independent, and we cannot use the independent testing assumption as for the k positions. This can be seen with a simple example of a filter with k = 2, m = 2, n = 1, computing the false positive rate for elements with d = 2 different bits. In the case i = 1, i.e., one bit set in the filter and a fill ratio of 1/2, there is no possibility of a false positive for d = 2, while using (i/m)^d would give the erroneous (1/2)^2 = 1/4.

Table 3: Ratio between per-element and global false positive rate for standard Bloom filters, F_s(n, m, k, d)/F_s(n, m, k), for different combinations of m, k, and hash collisions c = k − d, for filters at different occupations.

The correct formula for the probability of d different bits being set when i of the m bits in the filter are set is:

$$\prod_{j=0}^{d-1} \frac{i - j}{m - j}, \tag{13}$$

as the first of the d positions is one of the i bits set, the second is one of the remaining i − 1, the third one of the remaining i − 2, and so on.
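Equation 13 and the counterexample above can be checked numerically. The following Python sketch implements three functions, all with hypothetical names: Equation 13, the naive shortcut of Equation 12, and the average of Equation 13 over the fill distribution S(n, m, k, i) = B(nk, m, i) from Equation 9, which is the averaging step the derivation takes next. For the toy filter with k = 2, m = 2, n = 1, the correct per-element rate for d = 2 is 1/2, while the shortcut gives 5/8.

```python
from math import comb, prod


def surjections(n: int, i: int) -> int:
    """Equation 7 (exact integers)."""
    return sum((-1) ** j * comb(i, j) * (i - j) ** n for j in range(i + 1))


def balls_into_bins(n: int, m: int, i: int) -> float:
    """Equation 8: P(exactly i of m bins non-empty after n random balls)."""
    return comb(m, i) * surjections(n, i) / m**n


def p_d_bits_set(i: int, m: int, d: int) -> float:
    """Equation 13: P(d distinct positions all fall on set bits | i of m bits set)."""
    if d > i:
        return 0.0  # fewer set bits than positions to cover
    return prod((i - j) / (m - j) for j in range(d))


def fpr_per_element(n: int, m: int, k: int, d: int) -> float:
    """Average of Equation 13 over S(n, m, k, i) = B(nk, m, i) (Equation 9)."""
    return sum(balls_into_bins(n * k, m, i) * p_d_bits_set(i, m, d) for i in range(1, m + 1))


def fpr_naive(n: int, m: int, k: int, d: int) -> float:
    """The incorrect shortcut of Equation 12: (i/m)**d instead of Equation 13."""
    return sum(balls_into_bins(n * k, m, i) * (i / m) ** d for i in range(1, m + 1))


# Toy case from the text: k = 2, m = 2, n = 1, element with d = 2 distinct bits.
print(fpr_per_element(1, 2, 2, 2), fpr_naive(1, 2, 2, 2))  # 0.5 0.625
```

As a consistency check, weighting these per-element rates by the probability B(k, m, d) that an element has d distinct positions recovers the global rate of Equation 10 exactly. This holds because, given d distinct positions, those positions form a uniformly random d-subset of the m bits.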
The probability is zero for d > i.

The correct formula for the per-element false positive rate is then obtained by averaging over the different possible numbers of bits set, weighted by their probability of occurring, as before, resulting in:

$$F_s(n, m, k, d) = \sum_{i=1}^{m} S(n, m, k, i) \prod_{j=0}^{d-1} \frac{i - j}{m - j}. \tag{14}$$

Table 3 shows how the per-element false positive rate compares with the (global) average false positive rate. It gives the ratio F_s(n, m, k, d)/F_s(n, m, k) for different numbers of hash collisions c = k − d, from no collision (d = k) up to three collisions (d = k − 3), for filters at different occupations (ratios relative to the nominal capacity n = (m/k) ln 2).

It can be seen that, relative to the global average rate for the filter, the false positive rate increases noticeably with the number of hash collisions that occur for the element being tested. This effect is more prevalent for small occupations, with false positive rates reaching two orders of magnitude above the global average at low occupation with three collisions. This may cause surprises in scenarios where a filter is dimensioned with some expectations about the false positive rate over its lifetime, from empty to full: some elements will incur many more false positives than planned for, whether the standard or the exact formula for the global average is used.

Table 4: Probability of having some (one or more) hash collisions, and of having exactly c hash collisions, 0 ≤ c ≤ 3, for some combinations of k and m.

m     k    some     c = 0    c = 1    c = 2    c = 3
64    4    0.0911   0.9089   0.0894   0.0017   0.0000
64    8    0.3660   0.6340   0.3115   0.0510   0.0034
512   8    0.0535   0.9465   0.0525   0.0010   0.0000
512   16   0.2108   0.7892   0.1905   0.0192   0.0011

The question of how frequent those weak spots in the domain are, especially the “very weak” spots having more than one hash collision, is easily answered. The probability of an element being a weak spot is an instance of the birthday problem, as discussed above, with value given by Equation 1.
For an m-sized Bloom filter, the probability of the k hashes resulting in d different bits (i.e., c = k − d collisions) is an instance of the balls-into-bins experiment, with value B(k, m, d) as given by Equation 8.

Table 4 shows the probability of having some (one or more) hash collisions, and of having exactly c collisions, 0 ≤ c ≤ 3, for some combinations of k and m. It can be seen that collisions happen frequently not only in word-sized filters (36% of elements for m = 64 and k = 8) but also in the important case of cache-line-sized (m = 512) blocks in blocked Bloom filters, reaching 21% for very high accuracy (k = 16) filters. Two collisions can happen with non-negligible frequency: in 5 percent of elements for the word-sized filters with k = 8, or in two percent of elements in the (m = 512, k = 16) case. And while three collisions are indeed very rare, 3 in a thousand for the (m = 64, k = 8) filter or one in a thousand for the (m = 512, k = 16) filter, this is no consolation when those “unlucky” elements are subject to being tested against many filters.

One technique used to improve performance, by avoiding the need to compute k hash functions, is to resort to double hashing, which amounts to using two hash functions {h_1, h_2} to simulate k hash functions. In its most naive form it amounts to computing g_0, ..., g_{k−1} as:

$$g_i(x) = h_1(x) + i\, h_2(x) \bmod m.$$

Figure 3: Effects of double hashing when inserting an element x in a standard (left) versus partitioned (right) Bloom filter, when b = h_2(x) is 0, 1/2, or 1/4 of the size of the vector being indexed (filter or part).

The first application of double hashing to Bloom filters seems to have been by Dillinger and Manolios [14], for model checking.
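The naive scheme g_i(x) = h_1(x) + i·h_2(x) mod m can be explored without real hash functions by passing a = h_1(x) and b = h_2(x) directly, a simplification made only for illustration. In this hypothetical Python sketch the partitioned variant applies the same recurrence modulo the part size m/k, with index i addressing part i. For a 512-bit block with k = 8, it reproduces the degenerate steps of Issue 1 and the mirrored pair of Issue 2 discussed below.

```python
def naive_double_hashing(a: int, b: int, k: int, size: int) -> list[int]:
    """g_i = a + i*b mod size for i = 0..k-1, with a = h1(x) and b = h2(x) given directly."""
    return [(a + i * b) % size for i in range(k)]


def standard_bits(a: int, b: int, k: int, m: int) -> set[int]:
    # One m-bit array: the indices only matter as a set, so repeats collapse.
    return set(naive_double_hashing(a, b, k, m))


def partitioned_bits(a: int, b: int, k: int, m: int) -> set[tuple[int, int]]:
    # k parts of m/k bits, hashes reduced modulo the part size: index i goes to part i.
    return set(enumerate(naive_double_hashing(a, b, k, m // k)))


def mirrored(a: int, b: int, k: int, size: int) -> tuple[int, int]:
    """Hash pair (h1(y), h2(y)) that walks x's indices backwards (the Issue 2 condition)."""
    return (a + (k - 1) * b) % size, (size - b) % size


m, k = 512, 8  # a 512-bit BBF block with 8 hash functions
for b in (0, m // 2, m // 4):  # degenerate steps (Issue 1)
    print(b, len(standard_bits(3, b, k, m)), len(partitioned_bits(3, b, k, m)))
# prints 0 1 8, then 256 2 8, then 128 4 8

x = (3, 5)
y_std, y_par = mirrored(*x, k, m), mirrored(*x, k, m // k)
print(standard_bits(*x, k, m) == standard_bits(*y_std, k, m))           # True: full overlap
print(len(partitioned_bits(*x, k, m) & partitioned_bits(*y_par, k, m)))  # 0: no overlap
```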
Double hashing was popularized after Mitzenmacher [21] showed that it could be used to implement a Bloom filter without any loss in the asymptotic false positive probability, and validated it experimentally for medium-sized Bloom filters, starting with m = 10000 bits. However, small Bloom filters (e.g., a 512-bit block in a BBF) were not considered and, as usual, only the global false positive rate was considered.

Here we address small filters and the possibility of a non-uniform distribution of false positives, with weak spots in the domain. We show that standard Bloom filters, but not partitioned ones, are indeed prone to even more problematic weak spots caused by the use of double hashing. Although more sophisticated variants, like enhanced double hashing or triple hashing, have been proposed, naive double hashing in particular has become relatively popular, and can be found in many Bloom filter implementations. Therefore, these issues have practical consequences.

Dillinger’s PhD dissertation [13], which includes a detailed study of different forms of double and triple hashing, already recognized the existence of pitfalls, especially in naive double hashing. It identified three issues, which we now show affect only standard, and not partitioned, Bloom filters.

Issue 1:

Some values of b = h_2(x) can result in many repetitions of the same index. The worst case is b = 0 (mod m), in which case all indices are the same, but the existence of common factors between b and m also causes problems. Figure 3 shows some examples, with b = 0, b = m/2 and b = m/4. On the left, for standard Bloom filters, there is overwhelming index collision, which causes bit collisions, resulting in very weak spots. In a BBF with 512-bit blocks, one out of 512 elements in the domain will have b = 0 and thus a single bit set/tested, resulting in a disastrous 1/2 probability of such an element being reported as a false positive in filters at nominal capacity (1/2 fill ratio). Then, another one out of 512 elements will have b = 256 and only two distinct bits, giving a 1/4 probability, and so on. For partitioned Bloom filters, index collisions do not cause bit collisions, always resulting in k bits being set/tested.

Figure 4: Full overlap between x and y when using double hashing in a standard Bloom filter, when h_1(y) = h_1(x) + (k − 1) h_2(x) mod m and h_2(y) = m − h_2(x) mod m (left), and the lack of such overlap in a partitioned Bloom filter (right).

Figure 5: Partial overlap (yellow) between x (green) and y (blue) when using double hashing in a standard Bloom filter, when h_2(x) = h_2(y) mod m (left), and the lack of such overlap in a partitioned Bloom filter (right).

Issue 2:

The indices generated by double hashing, when used to index a standard Bloom filter, are treated as a set rather than a sequence, and the same set can be computed going “forward” or going “backward”. Two elements x and y can have a full overlap of the k bits without both h_1 and h_2 colliding, if h_1(y) = h_1(x) + (k − 1) h_2(x) mod m and h_2(y) = m − h_2(x) mod m. For a partitioned Bloom filter such overlap does not occur, as the different parts are indexed in order, so we effectively have a sequence of indices. Figure 4 illustrates the full overlap between x and y for a standard Bloom filter and the absence of overlap in a partitioned Bloom filter.

Issue 3:

Using double hashing in a standard Bloom ﬁlter is prone to partialoverlapping of the k indices, namely when h ( x ) = h ( y ) mod m . This isillustrated in Figure 5. In the same ﬁgure, it can be seen that in partitionedBloom ﬁlters such overlap does not occur.14tandard Bloom ﬁlters are thus subject to these anomalies, the more seriousbeing the possibility of extreme weak spots, if naive double hashing is used. Intheory, Issue 1 (which causes weak spots) is easy to overcome, by ensuring thereare no collisions, e.g., in the popular case when m is a power of two by restricting b = h ( x ) to produce odd numbers. In practice, implementers have been soldthe idea that double hashing can be used harmlessly, and commonly do nottake precautions, namely when the ﬁlter is parameterized, being m arbitraryand possibly small. This occurs even in mainstream libraries, such as in GoogleCore Libraries for Java [1]. Partitioned Bloom ﬁlters have the advantage of notbeing subject to such weak spots, and thus are robust to naive double hashingimplementations.It should be noted that if Issue 1 is addressed, the impact of double hashingon the global false positive rate is larger for partitioned Bloom ﬁlters than forthe standard ones. This impact comes from the probability of the pair of indicesfor one element colliding with the pair from another element, i.e., h ( x ) = h ( y ) and h ( x ) = h ( y ) (modulo vector size). Between two elements it is /m forstandard Bloom ﬁlters and / ( m/k ) for partitioned.In practice, for large Bloom ﬁlters the contribution of double hashing for theglobal false positive rate is negligible, unless high accuracy ﬁlters are wanted,in which case care must be taken and triple hashing may be needed. For smallﬁlters, or in general when BBFs are used, neither double nor triple hashingshould be used, as only a few bits per index are needed, and a single hash wordcan be split to obtain the k indices. 
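The three double hashing issues can be reproduced in a few lines. This is a sketch under stated assumptions: the standard filter uses naive double hashing $g_i = h_1 + i\,h_2 \bmod m$, and the partitioned filter is assumed to take index i modulo the part size m/k inside part i. Hash values are passed in directly, and the concrete numbers (h_1 = 5, 10, 16 and h_2 = 3) are arbitrary illustrations. The hypothetical helper `standard_indices_odd` applies the odd-stride fix for power-of-two m.

```python
M, K = 512, 8   # a 512-bit (block) filter with k = 8, as in the BBF example

def standard_indices(h1, h2, m=M, k=K):
    """Naive double hashing into a standard filter: the set {h1 + i*h2 mod m}."""
    return {(h1 + i * h2) % m for i in range(k)}

def standard_indices_odd(h1, h2, m=M, k=K):
    """Issue 1 fix for power-of-two m: force the stride odd (coprime with m)."""
    return standard_indices(h1, h2 | 1, m, k)

def partitioned_positions(h1, h2, m=M, k=K):
    """Naive double hashing into a partitioned filter: index i lands in part i."""
    part = m // k
    return {i * part + (h1 + i * h2) % part for i in range(k)}

# Issue 1: strides sharing factors with m collapse the k indices.
for b in (0, M // 4, M // 2):
    d = len(standard_indices(5, b))
    # at fill ratio 1/2, testing d distinct bits gives a false positive w.p. (1/2)**d
    print(f"b={b}: standard {d} bit(s), FP {0.5 ** d}; "
          f"partitioned {len(partitioned_positions(5, b))} bits")

# Issue 2: y walks x's indices "backward" and hits exactly the same set.
hx = (10, 3)
hy = ((10 + (K - 1) * 3) % M, (M - 3) % M)
print(standard_indices(*hx) == standard_indices(*hy))                # True
print(partitioned_positions(*hx) & partitioned_positions(*hy))       # set()

# Issue 3: same h2, h1 shifted by 2*h2: partial overlap, even with odd strides.
hz = (16, 3)
print(len(standard_indices_odd(*hx) & standard_indices_odd(*hz)))    # 6
print(len(partitioned_positions(*hx) & partitioned_positions(*hz)))  # 0
```

The odd stride removes Issue 1 (every element sets k distinct bits), but it does nothing for Issues 2 and 3, whereas the partitioned layout needs no precaution for any of them in these examples.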
As a concrete case, consider a BBF with 512-bit blocks and k = 8: we need 9 bits per index for standard and 6 bits per index for partitioned filters. This means that a partitioned scheme needs $8 \times 6 = 48$ bits per block, and a single 64-bit hash word is enough for filters of up to $2^{64-48} = 65536$ blocks, i.e., $2^{25} = 33554432$ bits, while if standard filters are used, $8 \times 9 = 72$ bits per block are needed, and a 64-bit hash word is not enough even for small filters. This reinforces the superiority of partitioned Bloom filters over standard ones.

Regardless of the false positive rate itself, the disjointness of the parts in a partitioned Bloom filter provides several advantages over standard filters, either in terms of obtaining fast implementations, or by making the partitioned scheme more flexible for use in more scenarios, or as the base for further extensions. Each disjoint part can be sampled, extracted, added, or retired individually, leading to interesting outcomes. We conclude our case by surveying some of these usages and advantages.

In addition to improving memory accesses through blocked Bloom filters, another way to improve performance is to use Single Instruction Multiple Data (SIMD) processor extensions to test multiple bits in a single processor cycle. However, standard Bloom filters are not directly suitable for SIMD, because the k bits are spread over memory, needing an extra gather step to collect and place them appropriately, which causes some slowdown.

A sophisticated SIMD approach [28] for standard Bloom filters uses precisely gather instructions to collect bits spread over memory. It achieves higher throughput by testing different hashes of different elements at each step, but not lower latency for individual query operations.

Even BBFs based on standard Bloom filter blocks are not directly suitable for SIMD, because the k bits are not placed over independent disjoint parts of the cache line (e.g., words) that can be used together as a vector register. When introducing BBFs, the authors already discussed SIMD usage, and to overcome this problem they proposed using a table of block-sized patterns with k bits set. However, to avoid collisions between elements when indexing, the table cannot be too small, and so it competes for cache space.

Partitioned Bloom filters are more directly suitable for SIMD. A blocked Bloom filter using the partitioned scheme, with cache-line-sized blocks and word-sized parts, is a perfect fit for SIMD, and arises as the natural combination of blocking and partitioning. This is precisely what Ultra-Fast Bloom Filters [22] have recently proposed. We may conjecture that, had partitioned Bloom filters been the norm when BBFs were introduced, this combination would have appeared a decade earlier.
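A minimal sketch of the blocked-plus-partitioned layout, using the 512-bit block and k = 8 numbers from the hash-splitting discussion: one 64-bit hash word yields eight 6-bit in-word indices plus 16 bits to select a block. The BLAKE2b-based `hash64`, the helper `split_hash`, and the class name are assumptions of this sketch, not the design of [22]. Python has no SIMD, so the point is only the data layout: each query is eight independent word-wise AND-and-compare steps on one block.

```python
import hashlib

K = 8                 # hash functions = parts per block
WORD_BITS = 64        # one part = one 64-bit word, so a block is 8 * 64 = 512 bits
IDX_BITS = 6          # log2(64): bits needed per in-part index
BLOCK_SEL_BITS = 64 - K * IDX_BITS   # 16 bits left to select the block

def hash64(item: str) -> int:
    """Hypothetical 64-bit hash for the sketch (BLAKE2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")

def split_hash(h: int):
    """Split one 64-bit hash word into a block number and K one-bit word masks."""
    masks = []
    for _ in range(K):
        masks.append(1 << (h & (WORD_BITS - 1)))  # low 6 bits: position in this word
        h >>= IDX_BITS
    return h, masks   # the remaining 16 high bits pick one of 2**16 blocks

class PartitionedBlockedBloom:
    """Blocked Bloom filter whose 512-bit blocks are split into K word-sized
    parts, with exactly one bit set per part for each element."""

    def __init__(self, num_blocks: int):
        assert 0 < num_blocks <= 1 << BLOCK_SEL_BITS
        self.blocks = [[0] * K for _ in range(num_blocks)]

    def _block(self, block_no: int):
        # modulo is slightly biased unless num_blocks is a power of two
        return self.blocks[block_no % len(self.blocks)]

    def add(self, item: str) -> None:
        block_no, masks = split_hash(hash64(item))
        block = self._block(block_no)
        for i in range(K):
            block[i] |= masks[i]

    def __contains__(self, item: str) -> bool:
        block_no, masks = split_hash(hash64(item))
        block = self._block(block_no)
        # Eight independent lane-wise AND-and-compare steps on a single block:
        # in a compiled language they map naturally onto 512-bit vector operations.
        return all((block[i] & masks[i]) == masks[i] for i in range(K))
```

For example, after `f = PartitionedBlockedBloom(1024)` and `f.add("alpha")`, the test `"alpha" in f` holds. A standard-block variant would need 9 bits per index, i.e., 72 bits per element, which no longer fits in the same 64-bit word.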

Bloom filters can also be used for set union and intersection. Unlike union (bitwise OR), which is exact, the intersection of filters (bitwise AND) over-represents the filter for the intersection: given sets $S_1$ and $S_2$, we have $F(S_1) \wedge F(S_2) \geq F(S_1 \cap S_2)$. In addition to testing for the presence of some element, an important use case is testing for set disjointness, i.e., that the intersection is the empty set. An example is checking whether two sets of addresses, representing a read-set and a write-set, are disjoint when implementing transactional memory.

Using standard Bloom filters, one can be sure that the sets are disjoint only when the resulting filter intersection is completely empty (all zeroes). Having fewer than k one bits is not enough, due to weak spots. As already noticed [19], even if the intersection had a single bit set, it could (even if extremely unlikely) be due to an element, present in both sets, having all k hash functions collide.

Partitioned Bloom filters are much better suited for testing set disjointness, as it is enough that one of the k parts of the filter intersection is empty to conclude that the set intersection is empty. This was already exploited [9] for Speculative Multithreading. A comparison of set disjointness testing concluded [19] that the probability of false set-overlap reporting was substantially smaller for partitioned Bloom filters than for standard ones. This probability, for standard ($P_s$) and partitioned ($P_p$) m-sized filters with k hash functions, representing sets with $n_1$ and $n_2$ elements, compares as:

$$P_s = 1 - \left(1 - \frac{1}{m}\right)^{k^2 n_1 n_2} > 1 - \left(1 - \frac{k}{m}\right)^{n_1 n_2} > \left(1 - \left(1 - \frac{k}{m}\right)^{n_1 n_2}\right)^{k} = P_p.$$
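The three terms of this inequality, and the two disjointness rules, can be written down in a short sketch. Filters are represented here as Python integers used as bit vectors (one integer for a standard filter, a list of k integers for a partitioned one), and the parameter values m = 1024, k = 4, n1 = n2 = 8 are arbitrary illustrations; the function names are hypothetical.

```python
def p_false_overlap_standard(m: int, k: int, n1: int, n2: int) -> float:
    """P_s: some of the k*n1 x k*n2 index pairs collide in the m-bit vector."""
    return 1 - (1 - 1 / m) ** (k * k * n1 * n2)

def p_overlap_one_part(m: int, k: int, n1: int, n2: int) -> float:
    """Middle term: some of the n1 x n2 pairs collide in one m/k-sized part."""
    return 1 - (1 - k / m) ** (n1 * n2)

def p_false_overlap_partitioned(m: int, k: int, n1: int, n2: int) -> float:
    """P_p: a collision must happen in every one of the k parts."""
    return p_overlap_one_part(m, k, n1, n2) ** k

def disjoint_standard(bits1: int, bits2: int) -> bool:
    """Standard filter: only an all-zero intersection proves disjointness."""
    return (bits1 & bits2) == 0

def disjoint_partitioned(parts1: list, parts2: list) -> bool:
    """Partitioned filter: a single empty part in the intersection is enough."""
    return any((p1 & p2) == 0 for p1, p2 in zip(parts1, parts2))

m, k, n1, n2 = 1024, 4, 8, 8
ps = p_false_overlap_standard(m, k, n1, n2)
pmid = p_overlap_one_part(m, k, n1, n2)
pp = p_false_overlap_partitioned(m, k, n1, n2)
print(f"P_s = {ps:.4f} > {pmid:.4f} > P_p = {pp:.6f}")
```

With these parameters $P_s$ is above 0.6 while $P_p$ is well below 0.01, matching the claim that partitioned filters report false set-overlaps far less often.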
The ordering of these probabilities is intuitive: the probability of a false set-overlap for a standard m-sized filter, due to some of the $k n_1 \cdot k n_2$ pairs of indices colliding, is greater than the probability of such an overlap in a given m/k-sized part for the partitioned scheme, which is in turn substantially greater than the probability that there is an overlap in each of the k parts.

Sometimes it is useful to obtain a smaller, lower-accuracy version of a Bloom filter, either because the filter was overdimensioned and we do not need the resulting overly high accuracy, or because we want an explicitly lower-accuracy view (but enough for some purpose), e.g., to ship over the network while saving bandwidth.

A standard Bloom filter is not suitable for this purpose because of the mingling of bits from different hash functions. What can be done is to use the same k hashes but remap the indices to a smaller $m'$-sized vector (preferably with m some multiple of $m'$), moving the bit in position i to position i modulo $m'$, and using modulo-$m'$ indexing for the new filter. The problem is that the resulting fill rate leaves the filter, if not immediately useless, with an overly high false positive rate compared with the optimum for the new smaller size and the same number of elements [27].

Partitioned Bloom filters are much better for this purpose. Due to the disjointness of the k parts, we can simply take the first $k'$ parts as a smaller Bloom filter, e.g., to be shipped elsewhere. In the worst case, of a filter already at full capacity, the new one will provide the optimal false positive rate for the new smaller size. Considerable size reductions are viable, which would render a standard Bloom filter useless due to the fill rate approaching 1. The same paper proposes Block-partitioned Bloom filters, composed of several blocks (each a standard filter, with insertions in each, and using AND for queries), to be able to extract some blocks as a new filter.
It mentions that maximum size flexibility is achieved by using one hash per block, i.e., by using a partitioned Bloom filter.
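Extracting the first $k'$ parts can be sketched with a minimal partitioned filter. The class, its `shrink` method, and the per-part hash (BLAKE2b over the part number and the item) are hypothetical choices for illustration; the point is that the smaller filter needs no rehashing, since part i keeps being queried with the same $h_i$.

```python
import hashlib

class PartitionedBloom:
    """Minimal partitioned Bloom filter: k parts of part_bits bits each, part i
    indexed by its own hash h_i (hypothetical: BLAKE2b over i and the item)."""

    def __init__(self, k: int, part_bits: int, parts=None):
        self.k, self.part_bits = k, part_bits
        # each part is a bit vector held in a Python int
        self.parts = list(parts) if parts is not None else [0] * k

    def _index(self, i: int, item: str) -> int:
        d = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(d, "little") % self.part_bits

    def add(self, item: str) -> None:
        for i in range(self.k):
            self.parts[i] |= 1 << self._index(i, item)

    def __contains__(self, item: str) -> bool:
        return all((self.parts[i] >> self._index(i, item)) & 1 for i in range(self.k))

    def shrink(self, k_prime: int) -> "PartitionedBloom":
        """Keep only the first k' parts: a smaller filter that queries with
        h_0..h_{k'-1} exactly as the original did; nothing is rehashed."""
        assert 1 <= k_prime <= self.k
        return PartitionedBloom(k_prime, self.part_bits, self.parts[:k_prime])

f = PartitionedBloom(k=8, part_bits=256)
for w in ("alpha", "beta", "gamma"):
    f.add(w)
small = f.shrink(3)   # 3/8 of the original size, e.g., to ship over the network
assert all(w in small for w in ("alpha", "beta", "gamma"))  # no false negatives
```

At nominal capacity each part is about half full, so keeping $k'$ parts gives a false positive rate of about $(1/2)^{k'}$, with no change to the fill rate of the surviving parts.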

Bloom filter based approaches to support queries over a sliding window of an infinite stream tend to be space inefficient. Traditionally they have been based either on some variation of Counting Bloom filters [4], on storing the insertion timestamp [35], or on using several disjoint segments which can be individually added and retired, one example being Double Buffering [10]. This uses a pair of active and warm-up Bloom filters, using the active one for queries and inserting into both until the warm-up is half-full, at which point it becomes the active one, the previous active one is discarded, and a new empty warm-up is added.

While with standard Bloom filters a segment must be a whole filter, partitioned Bloom filters can be used as a base for better designs, in which each disjoint part can be treated as a segment.
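To make the part-as-segment idea concrete, here is a minimal sketch simplified from the Age-Partitioned Bloom Filter design [32]: k + l slices in a circular buffer, insertions into the k newest slices, the oldest slice zeroed and recycled at each new generation, and a query that answers "present" when k logically adjacent slices all match. Assumptions of this sketch, not taken from [32]: each slice is a Python integer used as a bit vector, each slice hashes with BLAKE2b over its physical slot number and the item, and a generation closes after `generation_size` insertions or on an explicit `next_generation()` call.

```python
import hashlib

class AgePartitionedBloomSketch:
    """Simplified sketch of the Age-Partitioned Bloom Filter idea [32]."""

    def __init__(self, k: int, l: int, slice_bits: int, generation_size: int):
        self.k, self.l = k, l
        self.n = k + l                    # total number of slices
        self.slice_bits = slice_bits
        self.generation_size = generation_size
        self.slices = [0] * self.n        # each slice is a bit vector held in an int
        self.newest = 0                   # physical slot of the logically newest slice
        self.count = 0                    # insertions in the current generation

    def _index(self, slot: int, item: str) -> int:
        # Hypothetical per-slot hash, tied to the physical slot, so an element's
        # bits keep matching while its slices age through the buffer.
        d = hashlib.blake2b(f"{slot}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(d, "little") % self.slice_bits

    def _slot(self, age: int) -> int:
        # age 0 is the newest slice, age n - 1 the oldest
        return (self.newest + age) % self.n

    def next_generation(self) -> None:
        # Retire the oldest slice: zero it and reuse it as the new newest slice.
        self.newest = (self.newest - 1) % self.n
        self.slices[self.newest] = 0
        self.count = 0

    def insert(self, item: str) -> None:
        for age in range(self.k):         # only the k newest slices
            slot = self._slot(age)
            self.slices[slot] |= 1 << self._index(slot, item)
        self.count += 1
        if self.count == self.generation_size:
            self.next_generation()

    def contains(self, item: str) -> bool:
        run = 0                           # length of the current run of matches
        for age in range(self.n):
            slot = self._slot(age)
            if (self.slices[slot] >> self._index(slot, item)) & 1:
                run += 1
                if run == self.k:         # k logically adjacent matches
                    return True
            else:
                run = 0
        return False
```

In this sketch an element stays detectable while l further generations are started, and is forgotten (barring false positives) when the (l+1)-th starts, because one of its k slices has then been recycled.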

Age-Partitioned Bloom Filters [32] use k + l (for some configurable l) parts in a circular buffer, using the k most “recent” parts for insertions, discarding (zeroing) the “oldest” part after each generation (batch of insertions), and testing for the presence of k adjacent matches for queries. This results in what is currently the best Bloom filter based design for querying a sliding window over a stream.

Frequently, a focus on one small difference in a quantitative aspect misses the whole picture. Partitioned Bloom filters have thus been considered worse than standard ones, and frequently not adopted, due to having slightly more false positives. This is ironic given that the difference amounts to a negligible variation of capacity for the same false positive rate.

In this paper we have shown how much simpler, more elegant, robust, and versatile partitioned Bloom filters are. The simplicity of the exact formula results from the conceptual simplicity of them being essentially the AND of single-hash filters. Standard Bloom filters have a more complex nature due to the possibility of intra-element hash collisions, with a resulting complex exact formula, normally approximated, which sometimes leads to surprises.

But essentially, we have shown how standard Bloom filters exhibit a non-uniform distribution of the false positive probability, with weak spots in the domain: elements that are reported as false positives much more frequently than expected. This is an aspect that has been neglected in the literature. Moreover, the issue of weak spots is much aggravated when naive double hashing is used. Even though it is easily circumventable, many libraries, including mainstream ones, suffer from this anomaly. The lesson seems to be that practitioners frequently skim over results, failing to notice subtle problems. Partitioned Bloom filters have a uniform distribution of false positives over the domain, with no weak spots, even if naive double hashing is used.
Moreover, needing fewer hash bits makes such schemes less necessary.

Finally, going beyond set-membership tests, surveying other usages makes clear the flexibility of being able to sample, extract, add, or retire individual parts, showing the partitioned scheme to be better. As the hardware community already did, software implementers should widely adopt partitioned Bloom filters, replacing standard Bloom filters as the new normal.

References

[1] Dimitris Andreou and Kurt Alfred Kluever. BloomFilterStrategies in Google Core Libraries for Java. https://github.com/google/guava/blob/master/guava/src/com/google/common/hash/BloomFilterStrategies.java, 2011 (accessed September 8, 2020).
[2] Pat's Blog. Who created the birthday problem, and even one more version. https://pballew.blogspot.com/2011/01/who-created-birthday-problem-and-even.html, 2011 (accessed May 26, 2020).
[3] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.
[4] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, and George Varghese. An improved construction for counting Bloom filters. In Algorithms - ESA 2006, 14th Annual European Symposium, Zurich, Switzerland, September 11-13, 2006, Proceedings, pages 684–695, 2006.
[5] Prosenjit Bose, Hua Guo, Evangelos Kranakis, Anil Maheshwari, Pat Morin, Jason Morrison, Michiel Smid, and Yihui Tang. On the false-positive rate of Bloom filters. Information Processing Letters, 108(4):210–213, 2008.
[6] Alexander Breslow and Nuwan Jayasena. Morton filters: Faster, space-efficient cuckoo filters via biasing, compression, and decoupled logical sparsity. PVLDB, 11(9):1041–1055, 2018.
[7] Andrei Broder and Michael Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485–509, 2004.
[8] W. W. Rouse Ball. Revised by H. S. M. Coxeter. Mathematical Recreations and Essays. Macmillan, 11th edition, 1939.
[9] Luis Ceze, James Tuck, Josep Torrellas, and Calin Cascaval. Bulk disambiguation of speculative threads in multiprocessors. ACM SIGARCH Computer Architecture News, 34(2):227–238, 2006.
[10] Francis Chang, Kang Li, and Wu-chang Feng. Approximate caches for packet classification. In Proceedings IEEE INFOCOM 2004, The 23rd Annual Joint Conference of the IEEE Computer and Communications Societies, Hong Kong, China, March 7-11, 2004, pages 2196–2207, 2004.
[11] Ken Christensen, Allen Roginsky, and Miguel Jimeno. A new analysis of the false positive rate of a Bloom filter. Information Processing Letters, 110(21):944–949, 2010.
[12] Sarang Dharmapurikar, Praveen Krishnamurthy, Todd Sproull, and John Lockwood. Deep packet inspection using parallel Bloom filters. In High Performance Interconnects, 2003. Proceedings. 11th Symposium on, pages 44–51. IEEE, 2003.
[13] Peter C. Dillinger. Adaptive approximate state storage. PhD thesis, Northeastern University, 2010.
[14] Peter C. Dillinger and Panagiotis Manolios. Fast and accurate bitstate verification for SPIN. In Susanne Graf and Laurent Mounier, editors, Model Checking Software, 11th International SPIN Workshop, Barcelona, Spain, April 1-3, 2004, Proceedings, volume 2989 of Lecture Notes in Computer Science, pages 57–75. Springer, 2004.
[15] Bin Fan, David G. Andersen, Michael Kaminsky, and Michael Mitzenmacher. Cuckoo filter: Practically better than Bloom. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies, CoNEXT 2014, Sydney, Australia, December 2-5, 2014, pages 75–88, 2014.
[16] F. Gerrish. 63.29. Surjections from an m-set to an n-set. The Mathematical Gazette, 63(426):259–261, 1979.
[17] Fabio Grandi. On the analysis of Bloom filters. Inf. Process. Lett., 129:35–39, 2018.
[18] O. Hölder. Ueber einen Mittelwertsatz. Nachrichten von der Königl. Gesellschaft der Wissenschaften und der Georg-Augusts-Universität zu Göttingen, (2):38–47, 1889.
[19] Mark C. Jeffrey and J. Gregory Steffan. Understanding Bloom filter intersection for lazy address-set disambiguation. In Rajmohan Rajaraman and Friedhelm Meyer auf der Heide, editors, SPAA 2011: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, San Jose, CA, USA, June 4-6, 2011 (Co-located with FCRC 2011), pages 345–354. ACM, 2011.
[20] J. L. W. V. Jensen. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica, 30:175–193, 1906.
[21] Adam Kirsch and Michael Mitzenmacher. Less hashing, same performance: Building a better Bloom filter. Random Struct. Algorithms, 33(2):187–218, 2008.
[22] Jianyuan Lu, Ying Wan, Yang Li, Chuwen Zhang, Huichen Dai, Yi Wang, Gong Zhang, and Bin Liu. Ultra-fast Bloom filters using SIMD techniques. IEEE Trans. Parallel Distrib. Syst., 30(4):953–964, 2019.
[23] Udi Manber and Sun Wu. An algorithm for approximate membership checking with application to password security. Information Processing Letters, 50(4):191–197, 1994.
[24] Michael Mitzenmacher. Compressed Bloom filters. IEEE/ACM Transactions on Networking (TON), 10(5):604–612, 2002.
[25] James K. Mullin. A second look at Bloom filters. Communications of the ACM, 26(8):570–571, 1983.
[26] James K. Mullin. Accessing textual documents using compressed indexes of arrays of small Bloom filters. The Computer Journal, 30(4):343–348, 1987.
[27] Odysseas Papapetrou, Wolf Siberski, and Wolfgang Nejdl. Cardinality estimation and dynamic length adaptation for Bloom filters. Distributed Parallel Databases, 28(2-3):119–156, 2010.
[28] Orestis Polychroniou and Kenneth A. Ross. Vectorized Bloom filters for advanced SIMD processors. In Alfons Kemper and Ippokratis Pandis, editors, Tenth International Workshop on Data Management on New Hardware, DaMoN 2014, Snowbird, UT, USA, June 23, 2014, pages 6:1–6:6. ACM, 2014.
[29] Felix Putze, Peter Sanders, and Johannes Singler. Cache-, hash-, and space-efficient Bloom filters. ACM Journal of Experimental Algorithmics, 14, 2009.
[30] Charles S. Roberts. Partial-match retrieval via the method of superimposed codes. Proceedings of the IEEE, 67(12):1624–1642, 1979.
[31] Daniel Sanchez, Luke Yen, Mark D. Hill, and Karthikeyan Sankaralingam. Implementing signatures for transactional memory. Pages 123–133. IEEE, 2007.
[32] Ariel Shtul, Carlos Baquero, and Paulo Sérgio Almeida. Age-partitioned Bloom filters. CoRR, abs/2001.03147, 2020.
[33] Sasu Tarkoma, Christian Esteve Rothenberg, and Eemil Lagerspetz. Theory and practice of Bloom filters for distributed systems. IEEE Communications Surveys & Tutorials, 14(1):131–155, 2012.
[34] Andrew Whitaker and David Wetherall. Forwarding without loops in Icarus. In Open Architectures and Network Programming Proceedings, 2002 IEEE, pages 63–75. IEEE, 2002.
[35] Linfeng Zhang and Yong Guan. Detecting click fraud in pay-per-click streams of online advertising networks. In 28th IEEE International Conference on Distributed Computing Systems (ICDCS 2008), 17-20 June 2008, Beijing, China.