Lecture Notes on
Linear Probing with 5-Independent Hashing
Mikkel Thorup

May 12, 2017
Abstract
These lecture notes show that linear probing takes expected constant time if the hash function is 5-independent. This result was first proved by Pagh et al. [STOC'07, SICOMP'09]. The simple proof here is essentially taken from [Pătrașcu and Thorup ICALP'10]. We will also consider a smaller-space version of linear probing that may have false positives like Bloom filters. These lecture notes illustrate the use of higher moments in data structures, and could be used in a course on randomized algorithms.

1 k-independence

The concept of k-independence was introduced by Wegman and Carter [21] in FOCS'79 and has been the cornerstone of our understanding of hash functions ever since. A hash function is a random function h : [u] → [t] mapping keys to hash values. Here [s] = {0, ..., s−1}. We can also think of h as a random variable distributed over [t]^[u]. We say that h is k-independent if for any distinct keys x_0, ..., x_{k−1} ∈ [u] and (possibly non-distinct) hash values y_0, ..., y_{k−1} ∈ [t], we have

    Pr[h(x_0) = y_0 ∧ ··· ∧ h(x_{k−1}) = y_{k−1}] = 1/t^k.

Equivalently, we can define k-independence via two separate conditions; namely,

(a) for any distinct keys x_0, ..., x_{k−1} ∈ [u], the hash values h(x_0), ..., h(x_{k−1}) are independent random variables, that is, for any (possibly non-distinct) hash values y_0, ..., y_{k−1} ∈ [t] and i ∈ [k],

    Pr[h(x_i) = y_i] = Pr[h(x_i) = y_i | ∧_{j ∈ [k]\{i}} h(x_j) = y_j], and

(b) for any x ∈ [u], h(x) is uniformly distributed in [t].

As the concept of independence is fundamental to probabilistic analysis, k-independent hash functions are both natural and powerful in algorithm analysis. They allow us to replace the heuristic assumption of truly random hash functions that are uniformly distributed in [t]^[u], hence needing u lg t random bits (lg = log_2), with real implementable hash functions that are still "independent enough" to yield provable performance guarantees similar to those proved with true randomness. We are then left with the natural goal of understanding the independence required by algorithms. Once we have proved that k-independence suffices for a hashing-based randomized algorithm, we are free to use any k-independent hash function.

The canonical construction of a k-independent hash function is based on polynomials of degree k − 1. Let p ≥ u be prime. Picking random a_0, ..., a_{k−1} ∈ [p] = {0, ..., p−1}, the hash function is defined by:

    h(x) = (a_{k−1} x^{k−1} + ··· + a_1 x + a_0) mod p                                    (1)

If we want to limit the range of hash values to [t], we use h′(x) = h(x) mod t. This preserves requirement (a) of independence among k hash values. Requirement (b) of uniformity is close to satisfied if p ≫ t. More precisely, for any key x ∈ [p] and hash value y ∈ [t], we get 1/t − 1/p < Pr[h′(x) = y] < 1/t + 1/p.
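To make the construction concrete, here is a small Python sketch of (1); it is only an illustration, and the Mersenne prime p = 2^61 − 1 and the name PolyHash are example choices, not fixed by the notes:

    import random

    class PolyHash:
        """k-independent hashing by a random degree-(k-1) polynomial mod p,
        as in (1), with the range reduced to [t] by a final mod t."""

        def __init__(self, k, t, p=(1 << 61) - 1):   # p must be a prime >= u
            self.p, self.t = p, t
            self.a = [random.randrange(p) for _ in range(k)]   # a_0, ..., a_{k-1}

        def __call__(self, x):
            h = 0
            for a in reversed(self.a):               # Horner's rule, all mod p
                h = (h * x + a) % self.p
            return h % self.t                        # h'(x): only near-uniform on [t]

    h5 = PolyHash(k=5, t=1024)   # a 5-independent hash function into [1024]
    print(h5(42), h5(43))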
Sometimes 2-independence suffices. For example, 2-independence implies so-called universality [5]; namely that the probability of two keys x and y colliding with h(x) = h(y) is 1/t, or close to 1/t if the uniformity of (b) is only approximate. Universality implies expected constant time performance of hash tables implemented with chaining. Universality also suffices for the 2-level hashing of Fredman et al. [7], yielding static hash tables with constant query time.

At the other end of the spectrum, when dealing with problems involving n objects, O(lg n)-independence suffices in a vast majority of applications. One reason for this is the Chernoff bounds of [18] for k-independent events, whose probability bounds differ from the full-independence Chernoff bound by 2^{−Ω(k)}. Another reason is that random graphs with O(lg n)-independent edges [2] share many of the properties of truly random graphs.

The independence measure has long been central to the study of randomized algorithms. It applies not only to hash functions, but also to pseudo-random number generators viewed as assigning hash values to 0, 1, 2, .... For example, [10] considers variants of QuickSort, [1] considers the maximal bucket size for hashing with chaining, and [9, 6] consider Cuckoo hashing. In several cases [1, 6, 10], it is proved that linear transformations x ↦ ((ax + b) mod p) do not suffice for good performance, hence that 2-independence is not in itself sufficient.

Our focus in these notes is linear probing, described below.

2 Linear probing

Linear probing is a classic implementation of hash tables. It uses a hash function h to map a dynamic set S of keys into an array T of size t > |S|. The entries of T are keys, but we can also see if an entry is "empty". This could be coded either via an extra bit or via a distinguished nil-key. We start with an empty set S and all empty locations. When inserting x, if the desired location h(x) ∈ [t] is already occupied, the algorithm scans h(x) + 1, h(x) + 2, ..., t − 1, 0, 1, ... until an empty location is found, and places x there. Below, for simplicity, we ignore the wrap-around from t − 1 to 0, so x is always placed in a location i ≥ h(x).

To search a key x, the query algorithm starts at h(x) and scans either until it finds x, or runs into an empty position, which certifies that x is not in the hash table. When the query search is unsuccessful, that is, when x is not stored, the query algorithm scans exactly the same locations as an insert of x. A general bound on the query time is hence also a bound on the insertion time.
Deletions are slightly more complicated. The invariant we want to preserve is that if a key x is stored at some location i ∈ [t], then all locations from h(x) to i are filled; for otherwise the above search would not get to x. Suppose now that x is deleted from location i. We then scan locations j = i + 1, i + 2, ... for a key y with h(y) ≤ i. If such a y is found at location j, we move y to location i, but then, recursively, we have to try refilling j, looking for a later key z with h(z) ≤ j. The deletion process terminates when we reach an empty location d, for then the invariant says that there cannot be a key y at a location j > d with h(y) ≤ d. The recursive refillings always visit successive locations, so the total time spent on deleting x is proportional to the number of locations from that of x to the first empty location.
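The three operations are summarized in the following Python sketch; it is a minimal illustration that keeps the wrap-around the notes otherwise ignore, and all names are example choices:

    class LinearProbingTable:
        """Minimal sketch of linear probing with search, insert, and delete.
        Assumes the table never fills completely."""

        def __init__(self, t, h):
            self.t = t                  # table size
            self.h = h                  # hash function mapping keys to [t]
            self.table = [None] * t     # None marks an empty location

        def search(self, x):
            i = self.h(x)
            while self.table[i] is not None:
                if self.table[i] == x:
                    return True
                i = (i + 1) % self.t    # scan h(x), h(x)+1, ...
            return False                # empty location certifies absence

        def insert(self, x):
            i = self.h(x)
            while self.table[i] is not None:
                if self.table[i] == x:
                    return              # already present
                i = (i + 1) % self.t
            self.table[i] = x           # first empty location

        def delete(self, x):
            i = self.h(x)
            while self.table[i] != x:
                if self.table[i] is None:
                    return              # x was not stored
                i = (i + 1) % self.t
            self.table[i] = None
            # Refill the hole at i: scan j = i+1, i+2, ... for a key y whose
            # hash location lies (cyclically) at or before i, move it back,
            # and recurse on the new hole until an empty location is reached.
            j = i
            while True:
                j = (j + 1) % self.t
                y = self.table[j]
                if y is None:
                    return
                if (i - self.h(y)) % self.t <= (j - self.h(y)) % self.t:
                    self.table[i], self.table[j] = y, None
                    i = j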
Summing up, we have:

Theorem 1 With linear probing, the time it takes to search, insert, or delete a key x is at most proportional to the number of locations from h(x) to the first empty location.

With n the number of keys and t the size of the table, we call n/t the load of our table. We generally assume that the load is bounded away from 1, e.g., that the number of keys is n ≤ 2t/3. With a good distribution of keys, we would then hope that the number of locations from h(x) to an empty location is O(1).

This classic data structure is one of the most popular implementations of hash tables, due to its unmatched simplicity and efficiency. The practical use of linear probing dates back at least to 1954 to an assembly program by Samuel, Amdahl, and Boehme (c.f. [12]). On modern architectures, access to memory is done in cache lines (of much more than a word), so inspecting a few consecutive values is typically only slightly worse than a single memory access. Even if the scan straddles a cache line, the behavior will still be better than a second random memory access on architectures with prefetching. Empirical evaluations [3, 8, 14] confirm the practical advantage of linear probing over other known schemes, e.g., chaining, but caution [8, 20] that it behaves quite unreliably with weak hash functions. Taken together, these findings form a strong motivation for theoretical analysis.

Linear probing was shown to take expected constant time for any operation in 1963 by Knuth [11], in a report which is now regarded as the birth of algorithm analysis. This analysis, however, assumed a truly random hash function.

A central open question of Wegman and Carter [21] was how linear probing behaves with k-independence. Siegel and Schmidt [17, 19] showed that O(lg n)-independence suffices for any operation to take expected constant time. Pagh et al. [13] showed that just 5-independence suffices for expected constant operation time. They also showed that linear transformations do not suffice, hence that 2-independence is not in itself sufficient.

Pătrașcu and Thorup [16] proved that 4-independence is not in itself sufficient for expected constant operation time. They display a concrete combination of keys and a 4-independent random hash function where searching certain keys takes super-constant expected time. This shows that the 5-independence result of Pagh et al. [13] is best possible. In fact, [16] provided a complete understanding of linear probing with low independence, as summarized in Table 1.

    Independence        2           3           4           ≥ 5
    Query time          Θ(√n)       Θ(lg n)     Θ(lg n)     Θ(1)
    Construction time   Θ(n lg n)   Θ(n lg n)   Θ(n)        Θ(n)

    Table 1: Expected time bounds for linear probing with a poor k-independent hash function. The bounds are worst-case expected, e.g., a lower bound for the query means that there is a concrete combination of stored set, query key, and k-independent hash function with this expected search time, while the upper bound means that this is the worst expected time for any such combination. Construction time refers to the worst-case expected total time for inserting n keys starting from an empty table.

Considering loads close to 1, that is, load (1 − ε), Pătrașcu and Thorup [15] proved that the expected operation time is O(1/ε²) with 5-independent hashing, matching the bound of Knuth [11] assuming true randomness. The analysis from [15] also works for something called simple tabulation hashing that we shall return to in Section 3.2.

3 5-independence

Below we present the simplified version of the proof from [15] of the result from [13] that 5-independent hashing suffices for expected constant time with linear probing. For simplicity, we assume that the load is at most 2/3. Thus we study a set S of n keys stored in a linear probing table of size t ≥ 3n/2. We assume that t is a power of two.

A crucial concept is a run R, which is a maximal interval of filled positions. We have an empty position before R, which means that all keys x ∈ S landing in R must also hash into R in the sense that h(x) ∈ R. Also, we must have exactly r = |R| keys hashing to R since the position after R is empty.
By Theorem 1, the time it takes for any operation on a key q is at most proportional to the number of locations from h(q) to the first empty location. We upper bound this number by r + 1 where r is the length of the run containing h(q). Here r = 0 if the location h(q) is empty. We note that the query key q might itself be in R, and hence be part of the run, e.g., in the case of deletions.

We want to give an expected upper bound on r. In order to limit the number of different events leading to a long run, we focus on dyadic intervals: a (dyadic) ℓ-interval is an interval of length 2^ℓ of the form [i·2^ℓ, (i + 1)·2^ℓ) where i ∈ [t/2^ℓ]. Assuming that the hashing maps S uniformly into [t], we expect n·2^ℓ/t ≤ (2/3)·2^ℓ keys to hash into a given ℓ-interval. We say that an ℓ-interval I is "near-full" if at least (3/4)·2^ℓ keys from S \ {q} hash into I. We claim that a long run implies that some dyadic interval of similar size is near-full. More precisely:
Lemma 2 Consider a run R of length r ≥ 2^{ℓ+2}. Then one of the first four ℓ-intervals intersecting R must be near-full.
Proof Let I_0, ..., I_3 be the first four ℓ-intervals intersecting R. Then I_0 may only have its last end-point in R, while I_1, I_2, I_3 are contained in R since r ≥ 4·2^ℓ. In particular, this means that L = (∪_{i∈[4]} I_i) ∩ R has length at least 3·2^ℓ + 1.

But L is a prefix of R, so all keys landing in L must hash into L. Since L is full, we must have at least 3·2^ℓ + 1 keys hashing into L. Even if this includes the query key q, at least 3·2^ℓ of these keys are from S \ {q}, so one of our four intervals I_i must have at least 3·2^ℓ/4 = (3/4)·2^ℓ keys from S \ {q} hashing into it, implying that I_i is near-full.

Getting back to our original question, we are considering the run R containing the hash of the query q.
Lemma 3 If the run containing the hash of the query key q is of length r ∈ [2^{ℓ+2}, 2^{ℓ+3}), then one of the following 12 consecutive ℓ-intervals is near-full: the ℓ-interval containing h(q), the 8 nearest ℓ-intervals to its left, and the 3 nearest ℓ-intervals to its right.
Proof Let R be the run containing h(q). To apply Lemma 2, we want to show that the first four ℓ-intervals intersecting R have to be among the 12 mentioned in Lemma 3. Since the run R containing h(q) has length less than 8·2^ℓ, the first ℓ-interval intersecting R can be at most 8 before the one containing h(q). The 3 following intervals are then trivially contained among the 12.

For our analysis, in the random choice of the hash function h, we first fix the hash value h(q) of the query key q. Conditioned on this value of h(q), for each ℓ, let P_ℓ be an upper bound on the probability that any given ℓ-interval is near-full. Then, by Lemma 3, the probability that the run containing h(q) has length r ∈ [2^{ℓ+2}, 2^{ℓ+3}) is bounded by 12·P_ℓ. Of course, this only gives us a bound for runs of length r ≥ 4; runs of length r ≤ 3 contribute at most 3 to the expectation. Hence the expected length of the run containing h(q) is bounded by

    3 + Σ_{ℓ=0}^{log₂ t} 12 · 2^{ℓ+3} · P_ℓ = O(Σ_{ℓ=0}^{log₂ t} 2^ℓ · P_ℓ).

Combined with Theorem 1, we have now proved:
Theorem 4
Consider storing a set S of keys in a linear probing table of size t where t is a power of two. Conditioned on the hash of a key q, let P_ℓ bound the probability that at least (3/4)·2^ℓ keys from S \ {q} hash to any given ℓ-interval. Then the expected time to search, insert, or delete q is bounded by

    O(Σ_{ℓ=0}^{log₂ t} 2^ℓ · P_ℓ).

We note that Theorem 4 does not mention the size of S. However, as mentioned earlier, with a uniform distribution, the expected number of elements hashing to an ℓ-interval is ≤ 2^ℓ·|S|/t, so for P_ℓ to be small, we want this expectation to be significantly smaller than (3/4)·2^ℓ. Assuming |S| ≤ 2t/3, the expected number is ≤ (2/3)·2^ℓ.

To get constant expected cost for linear probing, we are going to assume that the hash function used is 5-independent. This means that no matter the hash value h(q) of q, conditioned on h(q), the keys from S \ {q} are hashed 4-independently. Consequently, if X_x is the indicator variable for a key x ∈ S \ {q} hashing to a given interval I, then the variables X_x, x ∈ S \ {q}, are 4-wise independent.

The probabilistic tool we shall use here to analyze 4-wise independent variables is a 4th moment bound. For i ∈ [n], let X_i ∈ [2] = {0, 1}, p_i = Pr[X_i = 1] = E[X_i], X = Σ_{i∈[n]} X_i, and µ = E[X] = Σ_{i∈[n]} p_i. Also

    σ_i² = Var[X_i] = E[(X_i − p_i)²] = p_i(1 − p_i)² + (1 − p_i)p_i² = p_i − p_i².

As long as the X_i are pairwise independent, the variance of the sum is the sum of the variances, so we define

    σ² = Var[X] = Σ_{i∈[n]} Var[X_i] = Σ_{i∈[n]} σ_i² ≤ µ.

By Chebyshev's inequality, we have

    Pr[|X − µ| ≥ d√µ] ≤ Pr[|X − µ| ≥ dσ] ≤ 1/d².                                          (2)

We are going to prove a stronger bound if the variables are 4-wise independent and µ ≥ 1.

Theorem 5
If the variables X_0, ..., X_{n−1} ∈ {0, 1} are 4-wise independent, X = Σ_{i∈[n]} X_i, and µ = E[X] ≥ 1, then

    Pr[|X − µ| ≥ d√µ] ≤ 4/d⁴.
Proof Note that X − µ = Σ_{i∈[n]}(X_i − p_i). By linearity of expectation, the fourth moment is:

    E[(X − µ)⁴] = E[(Σ_{i∈[n]}(X_i − p_i))⁴] = Σ_{i,j,k,l∈[n]} E[(X_i − p_i)(X_j − p_j)(X_k − p_k)(X_l − p_l)].

Our goal is to get a good bound on the fourth moment. Consider a term E[(X_i − p_i)(X_j − p_j)(X_k − p_k)(X_l − p_l)]. The at most 4 distinct variables are completely independent. Suppose one of them, say X_i, appears only once. By definition, E[X_i − p_i] = 0, and since it is independent of the other factors, we get E[(X_i − p_i)(X_j − p_j)(X_k − p_k)(X_l − p_l)] = 0. We can therefore ignore all terms where any variable appears exactly once, and may thus assume that each variable appears either twice or four times. In terms with variables appearing twice, we have two indices a < b where a is assigned to two of i, j, k, l, while b is assigned to the other two, yielding (4 choose 2) = 6 combinations based on a < b. Thus we get

    E[(X − µ)⁴] = Σ_{i,j,k,l∈[n]} E[(X_i − p_i)(X_j − p_j)(X_k − p_k)(X_l − p_l)]
                = Σ_i E[(X_i − p_i)⁴] + 6 Σ_{a<b} E[(X_a − p_a)²] E[(X_b − p_b)²]
                = Σ_i E[(X_i − p_i)⁴] + 6 Σ_{a<b} σ_a² σ_b².
Now, for any integer m ≥ 2, we have |X_i − p_i| ≤ 1, so (X_i − p_i)^{m−2} ≤ 1, and therefore

    (X_i − p_i)^m ≤ (X_i − p_i)², hence E[(X_i − p_i)^m] ≤ σ_i².                           (3)
Continuing our calculation, we get

    E[(X − µ)⁴] = Σ_i E[(X_i − p_i)⁴] + 6 Σ_{a<b} σ_a² σ_b²
                ≤ Σ_i σ_i² + 3 (Σ_a σ_a²)² = σ² + 3σ⁴.                                     (4)
Since σ² ≤ µ and µ ≥ 1, we get

    E[(X − µ)⁴] ≤ µ + 3µ² ≤ 4µ²,                                                           (5)

which is our desired bound on the fourth moment. By Markov's inequality,

    Pr[|X − µ| ≥ d√µ] = Pr[(X − µ)⁴ ≥ (d√µ)⁴] ≤ E[(X − µ)⁴]/(d√µ)⁴ ≤ 4/d⁴.

This completes the proof of Theorem 5.
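As a quick numerical sanity check of Theorem 5 (an illustration, not part of the proof), the following sketch estimates the tail probability for fully random indicator variables, which are in particular 4-wise independent, and compares it with the Chebyshev bound (2) and the fourth-moment bound; all parameters are made up for the example:

    import random

    n, p, trials = 1000, 0.1, 20_000
    mu = n * p                       # mu = 100 >= 1
    d = 3.0
    threshold = d * mu ** 0.5        # d * sqrt(mu) = 30

    # Empirical Pr[|X - mu| >= d*sqrt(mu)] for X a sum of n independent
    # Bernoulli(p) indicators (full independence implies 4-wise independence).
    hits = sum(abs(sum(random.random() < p for _ in range(n)) - mu) >= threshold
               for _ in range(trials))

    print("empirical:       ", hits / trials)   # typically around 0.002
    print("Chebyshev 1/d^2: ", 1 / d**2)        # ~0.111
    print("4th moment 4/d^4:", 4 / d**4)        # ~0.049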
We are now ready to prove that 5-independence suffices for linear probing.

Theorem 6
Suppose we use a 5-independent hash function h to store a set S of n keys in a linear probing table of size t ≥ 3n/2 where t is a power of two. Then it takes expected constant time to search, insert, or delete a key.
Proof First we fix the hash of the query key q. To apply Theorem 4, we need to find a bound P_ℓ on the probability that at least (3/4)·2^ℓ keys from S \ {q} hash to any given ℓ-interval I. For each key x ∈ S \ {q}, let X_x be the indicator variable for h(x) ∈ I. Then X = Σ_{x∈S\{q}} X_x is the number of keys hashing to I, and the expectation of X is µ = E[X] ≤ n·2^ℓ/t ≤ (2/3)·2^ℓ. Our concern is the event that

    X ≥ (3/4)·2^ℓ  ⟹  X − µ ≥ (1/12)·2^ℓ ≥ (2^{ℓ/2}/12)·√µ.

Since h is 5-independent, the X_x are 4-independent, so by Theorem 5 applied with d = 2^{ℓ/2}/12, we get

    Pr[X ≥ (3/4)·2^ℓ] ≤ 4·12⁴/2^{2ℓ} = O(1/4^ℓ).

Thus we can use P_ℓ = O(1/4^ℓ) in Theorem 4, and then we get that the expected operation cost is

    O(Σ_{ℓ=0}^{log₂ t} 2^ℓ · P_ℓ) = O(Σ_{ℓ=0}^{log₂ t} 2^ℓ/4^ℓ) = O(1).
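As an end-to-end illustration (with made-up parameters), the following experiment reuses the PolyHash and LinearProbingTable sketches from above, fills tables of growing size to load 2/3, and prints the average displacement from h(x) to the location holding x, which Theorem 6 predicts stays O(1):

    import random

    for t in (1 << 10, 1 << 14, 1 << 18):
        h = PolyHash(k=5, t=t)                           # 5-independent
        table = LinearProbingTable(t, h)
        keys = random.sample(range(1 << 40), 2 * t // 3)  # load 2/3
        for x in keys:
            table.insert(x)
        # Average distance from h(x) to the location actually holding x.
        cost = sum((i - h(x)) % t
                   for i, x in enumerate(table.table) if x is not None)
        print(t, cost / len(keys))                       # stays a small constant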
Problem 1 Above we assumed that the range of our hash function is [t] where t is a power of two. As suggested in the introduction, we can instead use a hash function based on a degree 4 polynomial over a prime field ℤ_p where p ≫ t, that is, we pick independent random coefficients a_0, ..., a_4 ∈ [p], and define the hash function h′ : [p] → [t] by

    h′(x) = ((a_4x⁴ + a_3x³ + a_2x² + a_1x + a_0) mod p) mod t.

Then for any distinct x_0, ..., x_4, the hash values h′(x_0), ..., h′(x_4) are independent. Moreover, we have almost uniformity in the sense that for any x ∈ [p] and y ∈ [t], we have 1/t − 1/p < Pr[h′(x) = y] < 1/t + 1/p. Prove that Theorem 6 still holds with constant operation time if p ≥ t².
Problem 2 Assuming full randomness, use Chernoff bounds to prove that the longest run in the hash table has length O(log n) with probability at least 1 − 1/n.
Hint. You can use Lemma 2 to prove that if there is a run of length r ≥ 2^{ℓ+2}, then some ℓ-interval is near-full. You can then pick the smallest ℓ such that 2^ℓ ≥ C ln n for some large enough constant C.
Problem 3 Using Chebyshev's inequality, show that with 3-independent hashing, the expected operation time is O(log n).

3.2 Fourth moment and simple tabulation hashing

In the preceding analysis we use the 5-independence of the hash function as follows. First we fix the hash of the query key. Conditioned on this fixing, we still have 4-independence in the hashes of the stored keys, and we use this 4-independence to prove the 4th moment bound (5) on the number of stored keys hashing to any given interval. This was all we needed about the hash function to conclude that linear probing takes expected constant time per operation.

Pătrașcu and Thorup [15] have proved that something called simple tabulation hashing, which is only 3-independent, provides within a constant factor the same 4th moment bound (5) on the number of stored keys hashing to any given interval, conditioned on a fixed hash of the query key. Linear probing therefore also works in expected constant time with simple tabulation. This is important because simple tabulation is 10 times faster than 5-independent hashing implemented with a polynomial as in (1).

Simple tabulation hashing was invented by Zobrist [22] in 1970 for chess computers. The basic idea is to view a key x as consisting of c characters for some constant c, e.g., a 32-bit key could be viewed as consisting of c = 4 characters of 8 bits. We initialize c tables T_1, ..., T_c mapping characters to random hash values that are bit-strings of a certain length. A key x = (x_1, ..., x_c) is then hashed to T_1[x_1] ⊕ ··· ⊕ T_c[x_c], where ⊕ denotes bit-wise xor.
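As an illustration, here is a Python sketch of simple tabulation; the 8-bit characters and 64-bit output are example choices:

    import random

    class SimpleTabulation:
        """Zobrist / simple tabulation hashing: split the key into c characters
        and xor together one random table lookup per character."""

        def __init__(self, c=4, char_bits=8, out_bits=64):
            self.c, self.char_bits = c, char_bits
            self.mask = (1 << char_bits) - 1
            # One table of 2^char_bits random out_bits-bit values per character.
            self.tables = [[random.getrandbits(out_bits)
                            for _ in range(1 << char_bits)] for _ in range(c)]

        def __call__(self, x):
            h = 0
            for i in range(self.c):
                h ^= self.tables[i][(x >> (i * self.char_bits)) & self.mask]
            return h

    st = SimpleTabulation()      # hashes 32-bit keys: c = 4 characters of 8 bits
    print(hex(st(0xDEADBEEF)))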
4 k-th moment

The 4th moment bound used above generalizes to any even moment. First we need:

Theorem 7 Let X_0, ..., X_{n−1} ∈ {0, 1} be k-wise independent variables for some (possibly odd) k ≥ 2. Let p_i = Pr[X_i = 1] and σ_i² = Var[X_i] = p_i − p_i². Moreover, let X = Σ_{i∈[n]} X_i, µ = E[X] = Σ_{i∈[n]} p_i, and σ² = Var[X] = Σ_{i∈[n]} σ_i². Then

    E[(X − µ)^k] ≤ O(σ² + σ^k) = O(µ + µ^{k/2}).
Proof The proof is a simple generalization of the proof of Theorem 5 up to (4). We have

    (X − µ)^k = Σ_{i_0,...,i_{k−1}∈[n]} (X_{i_0} − p_{i_0})(X_{i_1} − p_{i_1}) ··· (X_{i_{k−1}} − p_{i_{k−1}}).

By linearity of expectation,

    E[(X − µ)^k] = Σ_{i_0,...,i_{k−1}∈[n]} E[(X_{i_0} − p_{i_0})(X_{i_1} − p_{i_1}) ··· (X_{i_{k−1}} − p_{i_{k−1}})].

We now consider a specific term (X_{i_0} − p_{i_0})(X_{i_1} − p_{i_1}) ··· (X_{i_{k−1}} − p_{i_{k−1}}). Let j_0 < j_1 < ··· < j_{c−1} be the distinct indices among i_0, i_1, ..., i_{k−1}, and let m_h be the multiplicity of j_h. Then

    (X_{i_0} − p_{i_0})(X_{i_1} − p_{i_1}) ··· (X_{i_{k−1}} − p_{i_{k−1}}) = (X_{j_0} − p_{j_0})^{m_0} (X_{j_1} − p_{j_1})^{m_1} ··· (X_{j_{c−1}} − p_{j_{c−1}})^{m_{c−1}}.

There are at most k different variables, so they are all independent, and therefore

    E[(X_{j_0} − p_{j_0})^{m_0} ··· (X_{j_{c−1}} − p_{j_{c−1}})^{m_{c−1}}] = E[(X_{j_0} − p_{j_0})^{m_0}] ··· E[(X_{j_{c−1}} − p_{j_{c−1}})^{m_{c−1}}].

Now, for any i ∈ [n], E[X_i − p_i] = 0, so if any multiplicity is 1, the expected value is zero. We therefore only need to count terms where all multiplicities m_h are at least 2. The sum of multiplicities is Σ_{h∈[c]} m_h = k, so we conclude that there are c ≤ ⌊k/2⌋ distinct indices j_0, ..., j_{c−1}. Now by (3),

    E[(X_{j_0} − p_{j_0})^{m_0}] ··· E[(X_{j_{c−1}} − p_{j_{c−1}})^{m_{c−1}}] ≤ σ_{j_0}² σ_{j_1}² ··· σ_{j_{c−1}}².

We now want to bound the number of tuples (i_0, i_1, ..., i_{k−1}) that have the same c distinct indices j_0 < j_1 < ··· < j_{c−1}. A crude upper bound is that we have c choices for each i_h, hence c^k tuples. We therefore conclude that

    E[(X − µ)^k] ≤ Σ_{c=1}^{⌊k/2⌋} c^k Σ_{j_0<···<j_{c−1}} σ_{j_0}² ··· σ_{j_{c−1}}² ≤ Σ_{c=1}^{⌊k/2⌋} c^k σ^{2c} = O(σ² + σ^k),

where the last step uses that k is a constant, so each factor c^k = O(1), and that the sum over c is dominated by the extreme terms c = 1 and c = ⌊k/2⌋. Finally, O(σ² + σ^k) = O(µ + µ^{k/2}) since σ² ≤ µ.

Theorem 8 Let X_0, ..., X_{n−1} ∈ {0, 1} be k-wise independent variables for some even constant k ≥ 2. Let p_i = Pr[X_i = 1] and σ_i² = Var[X_i] = p_i − p_i². Moreover, let X = Σ_{i∈[n]} X_i, µ = E[X] = Σ_{i∈[n]} p_i, and σ² = Var[X] = Σ_{i∈[n]} σ_i². If µ = Ω(1), then

    Pr[|X − µ| ≥ d√µ] = O(1/d^k).

Proof By Theorem 7 and Markov's inequality, we get

    Pr[|X − µ| ≥ d√µ] = Pr[(X − µ)^k ≥ d^k µ^{k/2}] ≤ E[(X − µ)^k]/(d^k µ^{k/2}) = O(µ + µ^{k/2})/(d^k µ^{k/2}) = O(1/d^k).

Problem 4 In the proofs of this section, where and why do we need (a) that k is a constant, and (b) that k is even?

5 Linear probing as a filter with false positives

We will now show how we can reduce the space of a linear probing table if we are willing to allow for a small chance of false positives, that is, the table attempts to answer if a query q is in the current stored set S. If it answers "no", then q ∉ S. If q ∈ S, then it always answers "yes". However, even if q ∉ S, then with some probability ≤ P, the table may answer "yes". Bloom [4] was the first to suggest creating such a filter using less space than one giving exact answers. Our implementation here, using linear probing, is completely different. The author suggested this use of linear probing to various people in the late 90s, but it was never written down.

To create a filter, we use a universal hash function s : [u] → [2^b]. We call s(x) the signature of x. The point is that s(x) should be much smaller than x, that is, b ≪ log u. The linear probing array T is now only an array of t signatures. We still use the hash function h : [u] → [t] to start the search for a key in the array.
Thus, to check if a key q is positive in the filter, we look for s(q) among the signatures in T, from location h(q) and onwards until the first empty location. If s(q) is found, we report "yes"; otherwise "no". If we want to add q to the filter, we only do something if s(q) was not found; then we place s(q) in the first empty location. Our filter does not support deletion of keys (c.f. Problem 6).
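A minimal Python sketch of this filter, reusing the PolyHash sketch from Section 1 both for h and for the universal signature function s (all names and parameters are example choices):

    class LinearProbingFilter:
        """Sketch of the linear probing filter: the table stores only b-bit
        signatures, so it may report false positives, never false negatives."""

        def __init__(self, t, h, s):
            self.t, self.h, self.s = t, h, s   # h: key -> [t], s: key -> [2^b]
            self.table = [None] * t            # None marks an empty location

        def query(self, q):
            sig, i = self.s(q), self.h(q)
            while self.table[i] is not None:   # scan until first empty location
                if self.table[i] == sig:
                    return True                # "yes" (possibly a false positive)
                i = (i + 1) % self.t
            return False                       # "no" is always correct

        def insert(self, q):
            sig, i = self.s(q), self.h(q)
            while self.table[i] is not None:
                if self.table[i] == sig:
                    return                     # signature found: do nothing
                i = (i + 1) % self.t
            self.table[i] = sig                # place s(q) in first empty location

    # Example with b = 8 bit signatures; a 2-independent PolyHash is universal.
    t, b = 1024, 8
    f = LinearProbingFilter(t, h=PolyHash(k=5, t=t), s=PolyHash(k=2, t=1 << b))
    f.insert(12345)
    print(f.query(12345), f.query(99999))      # True, and False w.h.p.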
Theorem 9 Assume that the hash function h and the signature function s are independent, that h is 5-independent, and that s is universal. Then the probability of a false positive on a given key q ∉ S is O(1/2^b).

Proof The keys from S have been inserted in some given order. Let us assume that h is fixed. Suppose we inserted the keys exactly, that is, not just their signatures, and let X(q) be the set of keys encountered when searching for q, that is, X(q) is the set of keys from h(q) and till the first empty location. Note that X(q) depends only on h, not on s.

In Problem 5 you will argue that if q is a false positive, then s(q) = s(x) for some x ∈ X(q). For every key x ∈ [u] \ {q}, by universality of s, we have Pr[s(x) = s(q)] ≤ 1/2^b. Since q ∉ S ⊇ X(q), by a union bound, Pr[∃x ∈ X(q) : s(x) = s(q)] ≤ |X(q)|/2^b. It follows that the probability that q is a false positive is bounded by

    Σ_{Y⊆S} Pr[X(q) = Y] · |Y|/2^b = E[|X(q)|]/2^b.

By Theorem 6, E[|X(q)|] = O(1) when h is 5-independent.

Problem 5 To complete the proof of Theorem 9, consider a sequence x_1, ..., x_n of distinct keys inserted in an exact linear probing table (as defined in Section 2). Also, let x_{i_1}, ..., x_{i_m} be a subsequence of these keys, that is, 1 ≤ i_1 < i_2 < ··· < i_m ≤ n. The task is to prove, for any fixed h : [u] → [t] and any fixed j ∈ [t], that when only the subsequence is inserted, then the sequence of keys encountered from location j and till the first empty location is a subsequence of those encountered when the full sequence is inserted.

Hint. Using induction on n, show that the above statement is preserved when a new key x_{n+1} is added. Here x_{n+1} may or may not be part of the subsequence.

The relation to the proof of Theorem 9 is that when we insert keys in a filter, we skip keys whose signatures are found as false positives. This means that only a subsequence of the keys have their signatures inserted. When searching for a key q starting from location j = h(q), we have thus proved that we only consider (signatures of) a subset of the set X(q) of keys that we would have considered if all keys were inserted. In particular, this means that if we from j encounter a key x with s(x) = s(q), then x ∈ X(q), as required for the proof of Theorem 9.

Problem 6 Discuss why we cannot support deletions.

Problem 7 What would happen if we instead used h(s(x)) as the hash function to place or find x? What would be the probability of a false positive?

Sometimes it is faster to generate the hash values and signatures together so that the pairs (h(x), s(x)) are 5-independent while the hash values and signatures are not necessarily independent of each other. An example is if we generate a larger hash value, using high-order bits for h(x) and low-order bits for s(x). In this situation we get a somewhat weaker bound than that in Theorem 9.

Theorem 10 Assuming that x ↦ (h(x), s(x)) is 5-independent, the probability of a false positive on a given key q ∉ S is O(1/2^{b/2}).

Proof Considering the exact insertion of all keys, we consider two cases: either (a) there is a run of length at least 2^{b/2} around h(q), or (b) there is no such run.

For case (a), we use Lemma 3 together with the bound P_ℓ = O(1/4^ℓ) from the proof of Theorem 6. We get that the probability of getting a run of length at least 2^{b/2} is bounded by

    Σ_{ℓ=b/2−2}^{∞} 12·P_ℓ = O(1/2^{b/2}).

We now consider case (b). By the statement proved in Problem 5, we know that any signature s(x) considered is from a key x from the set X(q) of keys that we would have considered from j = h(q) if all keys were inserted exactly. With no run of length at least 2^{b/2}, all keys in X(q) must hash to (h(q) − 2^{b/2}, h(q) + 2^{b/2}]. Thus, if we get a false positive in case (b), it is because there is a key x ∈ S with s(x) = s(q) and h(x) ∈ (h(q) − 2^{b/2}, h(q) + 2^{b/2}]. Since (h(x), s(x)) and (h(q), s(q)) are independent, the probability that this happens for x is bounded by 2^{b/2+1}/(t·2^b) = O(1/(n·2^{b/2})), yielding O(1/2^{b/2}) when we sum over all n keys in S. By a union bound, the probability of a false positive in case (a) or (b) is bounded by O(1/2^{b/2}), as desired.

We note that with the simple tabulation hashing mentioned in Section 3.2, we can put hash-signature pairs as concatenated bit strings in the character tables T_1, ..., T_c. Then (h(x), s(x)) = T_1[x_1] ⊕ ··· ⊕ T_c[x_c]. The nice thing here is that with simple tabulation hashing, the output bits are all completely independent, which means that Theorem 9 applies even though we generate the hash-signature pairs together.
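A sketch of this trick, building on the hypothetical SimpleTabulation class from Section 3.2 (the bit widths are example choices, and t must be a power of two here):

    class TabulatedPair(SimpleTabulation):
        """Generate (h(x), s(x)) together from one simple tabulation value:
        high-order bits give h(x) in [t], low-order bits give the b-bit
        signature s(x)."""

        def __init__(self, t, b, c=4, char_bits=8):
            super().__init__(c=c, char_bits=char_bits,
                             out_bits=t.bit_length() - 1 + b)  # log2(t) + b bits
            self.b = b

        def pair(self, x):
            v = self(x)                # one lookup-xor pass yields both values
            return v >> self.b, v & ((1 << self.b) - 1)        # (h(x), s(x))

    hs = TabulatedPair(t=1024, b=8)
    print(hs.pair(0xDEADBEEF))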
References

[1] Noga Alon, Martin Dietzfelbinger, Peter Bro Miltersen, Erez Petrank, and Gábor Tardos. Linear hash functions. Journal of the ACM, 46(5):667–683, 1999.

[2] Noga Alon and Asaf Nussboim. k-wise independent random graphs. In Proc. 49th IEEE Symposium on Foundations of Computer Science (FOCS), pages 813–822, 2008.

[3] John R. Black, Charles U. Martel, and Hongbin Qi. Graph and hashing algorithms for modern architectures: Design and performance. In Proc. 2nd International Workshop on Algorithm Engineering (WAE), pages 37–48, 1998.

[4] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.

[5] Larry Carter and Mark N. Wegman. Universal classes of hash functions. Journal of Computer and System Sciences, 18(2):143–154, 1979. Announced at STOC'77.

[6] Martin Dietzfelbinger and Ulf Schellbach. On risks of using cuckoo hashing with simple universal hash classes. In Proc. 20th ACM/SIAM Symposium on Discrete Algorithms (SODA), pages 795–804, 2009.

[7] Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31(3):538–544, 1984. Announced at FOCS'82.

[8] Gregory L. Heileman and Wenbin Luo. How caching affects hashing. In Proc. 7th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 141–154, 2005.

[9] Jeffery S. Cohen and Daniel M. Kane. Bounds on the independence required for cuckoo hashing. Manuscript, 2009.

[10] Howard J. Karloff and Prabhakar Raghavan. Randomized algorithms and pseudorandom numbers. Journal of the ACM, 40(3):454–476, 1993.

[11] Donald E. Knuth. Notes on open addressing. Unpublished memorandum. See http://citeseer.ist.psu.edu/knuth63notes.html, 1963.

[12] Donald E. Knuth. The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley, 1973.

[13] Anna Pagh, Rasmus Pagh, and Milan Ružić. Linear probing with constant independence. SIAM Journal on Computing, 39(3):1107–1120, 2009. Announced at STOC'07.

[14] Rasmus Pagh and Flemming Friche Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122–144, 2004. Announced at ESA'01.

[15] Mihai Pătrașcu and Mikkel Thorup. The power of simple tabulation hashing. Journal of the ACM, 59(3):Article 14, 2012. Announced at STOC'11.

[16] Mihai Pătrașcu and Mikkel Thorup. On the k-independence required by linear probing and minwise independence. ACM Transactions on Algorithms, 12(1):Article 8, 2016. Announced at ICALP'10.

[17] Jeanette P. Schmidt and Alan Siegel. The analysis of closed hashing under limited randomness. In Proc. 22nd ACM Symposium on Theory of Computing (STOC), pages 224–234, 1990.

[18] Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-Hoeffding bounds for applications with limited independence. SIAM Journal on Discrete Mathematics, 8(2):223–250, 1995. Announced at SODA'93.

[19] Alan Siegel and Jeanette P. Schmidt. Closed hashing is computable and optimally randomizable with universal hash functions. Technical Report TR1995-687, Courant Institute, New York University, 1995.

[20] Mikkel Thorup and Yin Zhang. Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM Journal on Computing, 41(2):293–331, 2012. Announced at SODA'04 and ALENEX'10.

[21] Mark N. Wegman and Larry Carter. New classes and applications of hash functions. In Proc. 20th IEEE Symposium on Foundations of Computer Science (FOCS), pages 175–182, 1979.

[22] Albert L. Zobrist. A new hashing method with application for game playing. Technical Report 88, Computer Sciences Department, University of Wisconsin, Madison, 1970.