On the Hardness of Set Disjointness and Set Intersection with Bounded Universe
Isaac Goldstein∗, Moshe Lewenstein†, and Ely Porat‡
{goldshi,moshe,porately}@cs.biu.ac.il

Abstract
In the SetDisjointness problem, a collection of $m$ sets $S_1, S_2, \ldots, S_m$ from some universe $U$ is preprocessed in order to answer queries on the emptiness of the intersection of two query sets from the collection. In the SetIntersection variant, all the elements in the intersection of the query sets must be reported. These are two fundamental problems that have been considered in several papers from both the upper bound and lower bound perspectives.

Several conditional lower bounds for these problems have been proven for the tradeoff between preprocessing and query time or the tradeoff between space and query time. Moreover, there are several unconditional hardness results for these problems in some specific computational models. The fundamental nature of the SetDisjointness and SetIntersection problems makes them useful for proving the conditional hardness of other problems from various areas. However, the universe of the elements in the sets may be very large, which may make the reduction to some other problems inefficient and therefore not useful for proving their conditional hardness.

In this paper, we prove the conditional hardness of SetDisjointness and SetIntersection with bounded universe. This conditional hardness is shown for both the interplay between preprocessing and query time and the interplay between space and query time. Moreover, we present several applications of these new conditional lower bounds. These applications demonstrate the strength of our new conditional lower bounds as they exploit the limited universe size. We believe that this new framework of conditional lower bounds with bounded universe can be useful for further significant applications.

F.2 ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
Keywords and phrases set disjointness, set intersection, 3SUM, space-time tradeoff, conditional lower bounds

∗ This research is supported by the Adams Foundation of the Israel Academy of Sciences and Humanities
† This work was partially supported by ISF grant
‡ This work was partially supported by ISF grant
The emerging field of fine-grained complexity has received much attention in recent years. One of the most notable pillars of this field is the celebrated 3SUM conjecture. In the 3SUM problem, given a set of $n$ numbers, we are required to decide whether there are 3 numbers in this set that sum to zero. It is conjectured that no truly subquadratic solution to this problem exists. This conjecture has been used extensively to prove the conditional hardness of other problems in a variety of research areas, see e.g. [9, 10, 1, 2, 3, 4, 11, 20, 19, 23, 25, 31]. The 3SUM problem is closely related to the fundamental SetDisjointness problem. In the SetDisjointness problem we are given $m$ sets $S_1, S_2, \ldots, S_m$ from some universe $U$ for preprocessing. After the preprocessing phase, given a query pair of indices $(i, j)$, we are required to decide whether the intersection $S_i \cap S_j$ is empty. In the SetIntersection variant, all the elements in the intersection $S_i \cap S_j$ must be reported.

Cohen and Porat [15] investigated the upper bound of both problems. Specifically, they showed that SetDisjointness can be solved almost trivially in linear space and $O(\sqrt{N})$ query time, where $N$ is the total number of elements in all sets. This solution can be generalized to a full tradeoff between the space $S$ and the query time $T$ such that $S \cdot T^2 = O(N^2)$. For the SetIntersection problem, Cohen and Porat demonstrated a linear space solution with $O(N\sqrt{N})$ preprocessing time and $O(\sqrt{N}\sqrt{out} + out)$ query time, where $out$ is the output size. This was further generalized by Cohen [14] to a solution that uses $O(N^{2-2t})$ space with $O(N^{2-t})$ preprocessing time and $O(N^{t} out^{1-t} + out)$ query time for $0 \le t \le 1/2$.

Pătraşcu [25] proved the conditional time hardness of the multiphase problem, which is a dynamic version of the SetDisjointness problem, based on the 3SUM conjecture. He also proved a connection between 3SUM and reporting triangles in a graph, which is closely related to the SetIntersection problem.
His conditional hardness results were improved by Kopelowitz et al. [23], who considered the preprocessing and query time tradeoff of both SetDisjointness and SetIntersection. Specifically, they proved, based on the 3SUM conjecture, that SetDisjointness has the following lower bound on the tradeoff between preprocessing time $T_p$ and query time $T_q$ for any $0 < \gamma < 1$: $T_p + N^{\frac{1+\gamma}{2-\gamma}} T_q = \Omega(N^{\frac{2}{2-\gamma} - o(1)})$. Moreover, based on the 3SUM conjecture they also proved that SetIntersection has the following lower bound on the tradeoff between preprocessing, query and reporting (per output element) time for any $0 \le \gamma < 1$, $\delta > 0$: $T_p + N^{\frac{2(1+\gamma)}{3+\delta-\gamma}} T_q + N^{\frac{2(2+\delta)}{3+\delta-\gamma}} T_r = \Omega(N^{\frac{4}{3+\delta-\gamma} - o(1)})$. Kopelowitz et al. [22] also proved the conditional time hardness of the dynamic versions of SetDisjointness and SetIntersection.

The lower bound on the space-query time tradeoff for solving SetDisjointness was considered by Cohen and Porat [16] and Pătraşcu and Roditty [26]. They state the following conjecture regarding the hardness of SetDisjointness (this is the formulation of Cohen and Porat; Pătraşcu and Roditty use a slightly different formulation):

▶ Conjecture 1.
SetDisjointness Conjecture. Any data structure for the SetDisjointness problem with constant query time must use $\tilde{\Omega}(N^2)$ space.

Recently, Goldstein et al. [21] considered space conditional hardness in a broader sense and demonstrated the conditional hardness of SetDisjointness and SetIntersection with regard to their space-query time tradeoff. They stated a generalized form of Conjecture 1, which claims that the whole (simple) space-time tradeoff upper bound for SetDisjointness is tight:

▶ Conjecture 2.
Strong SetDisjointness Conjecture. Any data structure for the SetDisjointness problem that answers queries in $T$ time must use $S = \tilde{\Omega}(N^2/T^2)$ space.

Moreover, they also presented a conjecture regarding the space-time tradeoff for SetIntersection:

▶ Conjecture 3.
Strong SetIntersection Conjecture. Any data structure for the SetIntersection problem that answers queries in $O(T + out)$ time, where $out$ is the size of the output of the query, must use $S = \tilde{\Omega}(N^2/T)$ space.

Goldstein et al. [21] showed connections between these conjectures and other problems such as 3SUM-Indexing (a data structure variant of 3SUM), k-Reachability and Boolean matrix multiplication. Unconditional lower bounds for the space-time tradeoff of SetDisjointness and SetIntersection were proven by Dietz et al. [18] and Afshani and Nielsen [5] for specific models of computation. The results of Dietz et al. [18] imply that Conjecture 2 is true in the semi-group model. Afshani and Nielsen [5] proved Conjecture 3 in the pointer-machine model.

The fundamental nature of SetDisjointness and SetIntersection makes them useful for proving conditional lower bounds, especially when considering their connection to the 3SUM problem. Indeed, several conditional lower bounds were proven using these problems (see [16, 17, 20, 23, 26, 27]). One major problem with this approach is that the universe of the elements in the sets of the SetDisjointness and SetIntersection problems can be large. This may cause the reduction from these problems to other problems, whose conditional hardness we wish to prove, to be inefficient. Therefore, it is of utmost interest to obtain a conditional lower bound on the hardness of SetDisjointness and SetIntersection with bounded universe, which in turn will be fruitful for achieving conditional lower bounds for other applications.

Our Results. In this paper we prove several conditional lower bounds for SetDisjointness and SetIntersection with bounded universe. We obtain the following results regarding the interplay between space and query time for solving these problems:
(1) Based on the Strong SetDisjointness Conjecture, we prove that SetDisjointness with $m$ sets from universe $[u]$ must either use $\Omega(m^{2-o(1)})$ space or have $\Omega(u^{1/2-o(1)})$ query time.
(2) Based on the Strong SetDisjointness Conjecture, we prove that SetIntersection with $m$ sets from universe $[u]$ must either use $\Omega(m^{2-o(1)})$ space or have $\tilde{\Omega}(u^{\alpha-o(1)} + out)$ query time, for any $1/2 \le \alpha \le 1$ and any output size $out$ such that $out = \Omega(u^{2\alpha-1-\delta})$ and $\delta > 0$.
(3) Based on the Strong SetIntersection Conjecture, we prove that SetIntersection with $m$ sets from universe $[u]$ must either use $\Omega((m^2 u^{\alpha})^{1-o(1)})$ space or have $\tilde{\Omega}(u^{\alpha-o(1)} + out)$ query time, for any $1/2 \le \alpha \le 1$ and any output size $out$ such that $out = \Omega(u^{2\alpha-1-\delta})$ and $\delta > 0$.

Based on the 3SUM conjecture, we also obtain the following results regarding the interplay between preprocessing and query time:
(i) Any solution to SetDisjointness with $m$ sets from universe $[u]$ must either have $\Omega(m^{2-o(1)})$ preprocessing time or have $\Omega(u^{1/2-o(1)})$ query time.
(ii) Any solution to SetIntersection with $m$ sets from universe $[u]$ must either have $\Omega(m^{2-o(1)})$ preprocessing time or have $\Omega(u^{1-o(1)})$ query time.

These new conditional lower bounds are useful in proving conditional lower bounds for other problems that exploit the small universe size, as explained above. We give some examples of such applications.

(1) Range Mode. The Range Mode problem is a classic problem that was studied in several papers (see e.g. [12, 24]). In this problem, an array $A$ with $n$ elements is given for preprocessing. Then, we are required to answer range mode queries. That is, given a range $[i, j]$ we have to find the mode (the most frequent element) in the range $[i, j]$ of $A$. The best known upper bound for the space-query time tradeoff of this problem is $S \cdot T^2 = \tilde{O}(n^2)$, where $S$ is the space usage and $T$ is the query time ([12, 24]). We prove, using our new lower bound for SetDisjointness with bounded universe, the following lower bound on the tradeoff between space and query time: $S \cdot T^4 = \Omega(n^{2-o(1)})$. We note that if the query time in the lower bound on SetDisjointness (in Theorem 4, see (1) above) were $\tilde{\Omega}(u^{1-o(1)})$, then the lower and upper bounds would be tight.

(2) Distance Oracles. A distance oracle is a data structure for computing the shortest path between any two vertices in a graph.
We say that a distance oracle has stretch $t$ if for any two vertices in the graph the distance it returns is no more than $t$ times the true distance between these vertices. Approximate distance oracles were investigated in many papers (see for example [7, 6, 13, 26, 27, 28]). Agarwal [6] showed a ( )-stretch distance oracle for a graph $G = (V, E)$ that uses $\tilde{O}(|E| + |V|^2/\alpha)$ space and has $O(\alpha |E| / |V|)$ query time for any 1 ≤ α ≤ ( | V | | E | ). We prove that this tradeoff is the best that can be achieved for any stretch-less-than-2 distance oracle, based on our new lower bound on SetDisjointness with bounded universe (see a more detailed discussion in Section 4).

(3) 3SUM-Indexing. 3SUM-Indexing is a data structure variant of 3SUM. In this problem, two arrays $A$ and $B$ with $n$ numbers in each of them are preprocessed. Then, given a query number $z$, we are required to decide whether there are $x \in A$ and $y \in B$ such that $x + y = z$. Goldstein et al. [21] conjecture that there is no $\tilde{O}(1)$ query time solution to 3SUM-Indexing using truly subquadratic space. In a stronger form of this conjecture they claim that there is no truly sublinear query time solution to 3SUM-Indexing using truly subquadratic space. Goldstein et al. [21] proved some connections between 3SUM-Indexing, SetDisjointness and SetIntersection. In this paper we strengthen their results using our new lower bounds for SetDisjointness and SetIntersection with bounded universe. Specifically, we prove, based on our new lower bound on SetDisjointness with bounded universe, that any solution to 3SUM-Indexing where the universe of the numbers within arrays $A$ and $B$ is $[n^{2+\epsilon}]$ for any $\epsilon > 0$ must satisfy the following lower bound on the tradeoff between space ($S$) and query time ($T$): $S \cdot T^2 = \Omega(n^{2-o(1)})$. Moreover, we prove the same lower bound on the tradeoff between preprocessing ($T_p$) and query time ($T_q$): $T_p \cdot T_q^2 = \tilde{\Omega}(n^{2-o(1)})$.
The latter is proven based on the 3SUM conjecture following our reduction to SetDisjointness with bounded universe.

In the 3SUM conjecture the universe of the numbers in the given instance is assumed to be [ n ] (see [25]) or even [ n ] (see [30]). It is known that 3SUM can be easily solved in $O(u \log u)$ time if the universe is $[u]$ by using FFT. Therefore, 3SUM with numbers from universe $[n^{2-\epsilon}]$ for any $\epsilon > 0$ can be solved in truly subquadratic time, and assuming hardness for universe $[n^2]$ seems to be a much stronger conjecture (it was used once in [9]). Solving 3SUM-Indexing can be done easily with $\tilde{O}(n^2)$ preprocessing time, $O(n^2)$ space and $\tilde{O}(1)$ query time. Our results demonstrate that this is tight even if the universe of the numbers in $A$ and $B$ and the query numbers is $[n^{2+\epsilon}]$ for any $\epsilon > 0$. This is a very strong lower bound, as 3SUM-Indexing with numbers from universe $[u]$ can be solved with $\tilde{O}(u)$ preprocessing time, $O(u)$ space and $\tilde{O}(1)$ query time. This is done in a similar way to solving 3SUM with numbers from universe $[u]$. Consequently, for any $\epsilon > 0$, 3SUM-Indexing with numbers from universe $[n^{2-\epsilon}]$ can be solved by a data structure that has constant query time, while the preprocessing time and space are subquadratic. Our new conditional lower bound demonstrates that having such a data structure for a slightly larger universe seems to be impossible.

We prove the hardness of SetDisjointness with bounded universe in the following theorem:

▶ Theorem 4.
Any solution to SetDisjointness with sets $S_1, S_2, \ldots, S_m \subseteq [u]$ for any value of $u \in [N^{\delta}, N]$, such that $N = \sum_{i=1}^{m} |S_i|$ and $\delta > 0$, must either use $\Omega(m^{2-o(1)})$ space or have $\tilde{\Omega}(u^{1/2-o(1)})$ query time, unless the Strong SetDisjointness Conjecture is false.

Proof.
Let us assume to the contrary that the Strong SetDisjointness Conjecture is true, but there is an algorithm $A$ that solves SetDisjointness on $m$ sets from a universe $[u]$ and creates a data structure $D$, such that the space complexity of the data structure $D$ is $O(m^{2-\epsilon_1})$ for some $\epsilon_1 > 0$ and the query time of $A$ is $O(u^{1/2-\epsilon_2})$ for some $0 < \epsilon_2 \le 1/2$. We define $\epsilon = \min(\epsilon_1, \epsilon_2)$.

Now, given an instance of SetDisjointness with sets $S'_1, S'_2, \ldots, S'_{m'}$, we denote by $N'$ the total number of elements in all sets, that is $N' = \sum_{i=1}^{m'} |S'_i|$. We rename the elements of all the sets such that each element $e_i$ is mapped to some integer $x_i \in [N']$.

We distinguish between 3 types of sets:
(a) Large sets are all the sets with more than $\sqrt{u}$ elements. Denote by $d$ the number of large sets. Let $S_{p_1}, S_{p_2}, \ldots, S_{p_d}$ be some ordering of the large sets. Let $p$ be a function such that $p(i) = j$ if $S_i$ is the set $S_{p_j}$ in the ordering of the large sets.
(b) Small sets are all the sets with $O(u^{1/2-\epsilon})$ elements.
(c) Medium sets are all the sets that are neither large nor small. Denote by $e$ the number of medium sets. Let $S_{q_1}, S_{q_2}, \ldots, S_{q_e}$ be some ordering of the medium sets. Let $q$ be a function such that $q(i) = j$ if $S_i$ is the set $S_{q_j}$ in the ordering of the medium sets.

Now, we can solve SetDisjointness in the following way.

Preprocessing:
(1) For every set $S_i$, use static hashing to store all elements of the set in a table $T_i$, such that we can check whether some element exists in the set in $O(1)$ time and the size of $T_i$ is $O(|S_i|)$.
(2) Maintain a $d \times (d + e)$ matrix $M$. The $\ell$th row in this matrix represents the set $S_{p_\ell}$. For $1 \le \ell \le d$, the $\ell$th column represents $S_{p_\ell}$, and for $d + 1 \le \ell \le d + e$, the $\ell$th column represents $S_{q_{\ell-d}}$.
(3) For all pairs of sets $S_i$ and $S_j$ such that $S_i$ is a large set and $S_j$ is a large or medium set, save an explicit answer to the emptiness of the intersection of $S_i$ and $S_j$ in $M[p(i), p(j)]$ and $M[p(j), p(i)]$ if $S_j$ is a large set, and in $M[p(i), d + q(j)]$ if $S_j$ is a medium set.
(4) Pick $\log n$ hash functions $h_i : \mathbb{N} \to [8u]$, for $1 \le i \le \log n$. Apply each $h_i$ to all elements in all medium sets.
Denote by $h_i(S_j)$ the set $S_j$ after $h_i$ has been applied to its elements.
(5) For every $i, j \in [e]$, if $S_{q_i} \cap S_{q_j} = \emptyset$ do the following: Check if for all $k \in [\log n]$ there are $x \in S_{q_i}$ and $x' \in S_{q_j}$ such that $x \ne x'$ but $h_k(x) = h_k(x')$. If so, go back to step (4).
(6) For every $k \in [\log n]$:
(6.1) Apply $h_k$ to all the elements of all the medium sets.
(6.2) Use algorithm $A$ to create a data structure $D_k$ that solves the set disjointness problem on the medium sets $S_{q_1}, S_{q_2}, \ldots, S_{q_e}$ after $h_k$ has been applied to their elements.

Query:
Given a pair of indices $i$ and $j$, we need to determine whether $S_i \cap S_j$ is empty. Without loss of generality we assume that $|S_i| \le |S_j|$ and do the following:
(1) If $S_i$ is a small set:
(1.1) For each element $x \in S_i$: Check if $x \in S_j$ using table $T_j$. If so, return 0.
(1.2) Return 1.
(2) If $S_j$ is a large set:
(2.1) If $S_i$ is a large set: Return $M[p(i), p(j)]$.
(2.2) If $S_i$ is a medium set: Return $M[p(j), d + q(i)]$.
(3) Else (if both $S_i$ and $S_j$ are medium sets):
For every $k \in [\log n]$, check by using algorithm $A$ and the data structure $D_k$ if $S_i$ and $S_j$ are disjoint.
If there is at least one value of $k$ for which these sets are disjoint, return 1.
Otherwise, return 0.

Correctness.
If at least one of the query sets is small, then we can check whether any of its elements is in the other query set using the hash tables that were created in step (1) of the preprocessing phase. This is done in step (1) of the query algorithm. If at least one of the sets is large, we can find the answer immediately by looking at the right position of the matrix $M$ that was created in steps (2)-(3) of the preprocessing phase. The last option is that both query sets are medium. In this case we use the data structures that were created in step (6) of the preprocessing phase. In steps (4) and (5) of the preprocessing phase we look for $\log n$ hash functions such that any pair of disjoint sets remains disjoint after applying at least one of the $\log n$ hash functions to their elements. Therefore, if any of the data structures created in step (6) of the preprocessing phase reports that a pair of sets are disjoint, they must be disjoint. Moreover, if a pair of sets are disjoint, then there must be at least one data structure that reports that they are disjoint. This is checked in step (3) of the query algorithm.

The last thing that needs to be justified is the existence of $\log n$ hash functions such that every pair of disjoint sets $S_i$ and $S_j$ remains disjoint after applying at least one of the $\log n$ hash functions. The range of the hash functions is $[8u]$. The number of elements in each medium set is no more than $\sqrt{u}$. Therefore, for any two medium sets $S_i$ and $S_j$ and a hash function $h_k : \mathbb{N} \to [8u]$ we have by the union bound that $\Pr[\exists x_1 \in S_i, x_2 \in S_j : x_1 \ne x_2 \wedge h_k(x_1) = h_k(x_2)] \le \frac{\sqrt{u} \cdot \sqrt{u}}{8u} = 1/8$. Consequently, the probability that a pair of disjoint medium sets $S_i$ and $S_j$ are not disjoint when applying $h_k$ for all $k \in [\log n]$ is no more than $(1/8)^{\log n} = 1/n^3$. Therefore, the probability that any pair of disjoint medium sets are not disjoint when applying $h_k$ for all $k \in [\log n]$ is no more than $n^2/n^3 = 1/n$ by the union bound. Using the probabilistic method, we get that there must be $\log n$ hash functions such that every pair of disjoint sets $S_i$ and $S_j$ remains disjoint after applying at least one of the $\log n$ hash functions.

Complexity analysis.

Space complexity. The space for the tables in step (1) of the preprocessing is clearly $O(N)$, linear in the total number of elements. The total number of large sets $d$ is at most $O(N/u^{1/2})$. The total number of medium sets $e$ is at most $O(N/u^{1/2-\epsilon})$. Therefore, the size of the matrix $M$ is at most $O(N/u^{1/2} \cdot (N/u^{1/2} + N/u^{1/2-\epsilon})) = O(N^2/u^{1-\epsilon})$. There are $\log n$ data structures that are created in step (6). Each data structure uses at most $O((N/u^{1/2-\epsilon})^{2-\epsilon}) = O(N^{2-\epsilon}/u^{1-5\epsilon/2+\epsilon^2})$ space. Consequently, the total space complexity is $S = \tilde{O}(N^2/u^{1-\epsilon} + N^{2-\epsilon}/u^{1-5\epsilon/2+\epsilon^2})$.

Query time complexity. Step (1) of the query algorithm can be done in $O(u^{1/2-\epsilon})$ time, as this is the size of the largest small set. Step (2) is done in constant time by looking at the right position in $M$. In step (3) we do $\log n$ queries using algorithm $A$ and the data structures $D_k$. The query time for each query is $O(u^{1/2-\epsilon})$, as the universe of the sets after applying any hash function $h_k$ is $[8u]$. Therefore, the total query time is $T = \tilde{O}(u^{1/2-\epsilon})$.

Following our analysis we have that $S \cdot T^2 = \tilde{O}((N^2/u^{1-\epsilon} + N^{2-\epsilon}/u^{1-5\epsilon/2+\epsilon^2}) \cdot (u^{1/2-\epsilon})^2) = \tilde{O}(N^2 u^{-\epsilon} + N^{2-\epsilon} u^{\epsilon/2-\epsilon^2}) = \tilde{O}(N^2 u^{-\epsilon} + N^2 u^{-\epsilon^2})$ (the last equality follows from the fact that $u \le N$). Since $u \ge N^{\delta}$, this bound is polynomially smaller than $N^2$. This contradicts the Strong SetDisjointness Conjecture and therefore our assumption is false.
◀

From the proof of the above theorem we get a specific range for the value of $m$ for hard instances of SetDisjointness. Bounding the value of $m$ for hard instances may be useful for some specific applications. Therefore, we state the following corollary of the proof of Theorem 4:

▶ Corollary 5.
For any $\epsilon > 0$, any solution to SetDisjointness with sets $S_1, S_2, \ldots, S_m \subseteq [u]$ for any value of $u \in [N^{\delta}, N]$, such that $N = \sum_{i=1}^{m} |S_i|$, $\delta > 0$ and the solution works for any value of $m$ in the range $[N/u^{1/2}, N/u^{1/2-\epsilon}]$, must either use $\Omega(m^{2-o(1)})$ space or have $\Omega(u^{1/2-o(1)})$ query time, unless the Strong SetDisjointness Conjecture is false.

We also prove conditional lower bounds on SetIntersection with bounded universe based on the Strong SetDisjointness Conjecture and the Strong SetIntersection Conjecture by generalizing the ideas from the previous proof. These results appear in Appendix A.
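The data structure used in the proof of Theorem 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `build` and `is_disjoint` are hypothetical, a dictionary stands in for the matrix $M$, the hypothetical algorithm $A$ is replaced by a direct disjointness test on the hashed medium sets, the hash functions come from a simple modular family chosen here for convenience, and the number of hash functions is $\lceil \log_2 m \rceil$ rather than $\log n$. Answers are returned as booleans (True for "disjoint") instead of 1 and 0.

```python
import math
import random
from itertools import combinations

P = (1 << 61) - 1  # Mersenne prime for the illustrative hash family (an assumption)


def build(sets, u, eps, seed=0):
    """Preprocess a list of Python sets of ints, following steps (1)-(6)."""
    rng = random.Random(seed)
    m = len(sets)
    kind = []
    for s in sets:
        if len(s) > math.sqrt(u):
            kind.append("large")
        elif len(s) <= u ** (0.5 - eps):
            kind.append("small")
        else:
            kind.append("medium")
    # Steps (2)-(3): explicit answers whenever one set is large and the
    # other is large or medium (a dict stands in for the matrix M).
    M = {}
    for i, j in combinations(range(m), 2):
        if "small" not in (kind[i], kind[j]) and "large" in (kind[i], kind[j]):
            M[i, j] = M[j, i] = sets[i].isdisjoint(sets[j])
    medium = [i for i in range(m) if kind[i] == "medium"]
    rounds = max(1, math.ceil(math.log2(max(2, m))))
    # Steps (4)-(5): resample until every disjoint pair of medium sets
    # stays disjoint under at least one hash function into [8u].
    while True:
        params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(rounds)]
        hashed = [
            {i: {((a * x + b) % P) % (8 * u) for x in sets[i]} for i in medium}
            for a, b in params
        ]
        if all(
            any(h[i].isdisjoint(h[j]) for h in hashed)
            for i, j in combinations(medium, 2)
            if sets[i].isdisjoint(sets[j])
        ):
            break
    # Step (6): hashed[k] plays the role of the data structure D_k.
    return {"sets": sets, "kind": kind, "M": M, "hashed": hashed}


def is_disjoint(ds, i, j):
    """Answer a query for two distinct indices i and j."""
    sets, kind = ds["sets"], ds["kind"]
    if len(sets[i]) > len(sets[j]):
        i, j = j, i  # without loss of generality |S_i| <= |S_j|
    if kind[i] == "small":  # query step (1): scan the small set
        return all(x not in sets[j] for x in sets[i])
    if kind[j] == "large":  # query step (2): table lookup
        return ds["M"][i, j]
    # Query step (3): both medium; disjoint iff some D_k says so.
    return any(h[i].isdisjoint(h[j]) for h in ds["hashed"])
```

Hashing can only merge elements, so intersecting sets always remain intersecting after hashing. This is why a single "disjoint" answer among the hashed structures is enough in query step (3).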
We combine the ideas of Goldstein et al. [20] and Kopelowitz et al. [23] to get conditional lower bounds on the complexity of SetDisjointness with bounded universe. To achieve these bounds we prove the following lemma:

▶ Lemma 6.
Let $X$ be any integer in $[n^{\delta}, n]$ for any $\delta > 0$. For any $\epsilon > 0$, an instance of 3SUM-Indexing that contains 2 arrays with $n$ integers can be reduced to $2\epsilon \log X$ instances of SetDisjointness $SD_1, SD_2, \ldots, SD_{2\epsilon \log X}$. For any $1 \le i \le 2\epsilon \log X$, instance $SD_i$ has $N_i = n\sqrt{u_i}$ elements from universe $[u_i]$ and $m = n\sqrt{X/u_i}$ sets, each of size $O(\sqrt{u_i})$, where $u_i = X^{1+\epsilon}/2^{i-1}$. The time and space complexity of the reduction is truly subquadratic in $n$. Each query to the 3SUM-Indexing instance can be answered by at most $O(n/\sqrt{X})$ queries to each instance $SD_i$ plus some additional time that is truly sublinear in $n$.

Proof.
We begin with an instance of 3SUM-Indexing with arrays $A$ and $B$ and do the following construction in order to reduce this 3SUM-Indexing instance to $2\epsilon \log n$ instances of SetDisjointness. The construction uses almost-linear and almost-balanced hash functions, which serve as a useful tool in many reductions from 3SUM. We briefly define this notion here (see full details in [23, 29]). Let $H$ be a family of hash functions from $[u]$ to $[m]$. $H$ is called linear if for any $h \in H$ and any $x, x' \in [u]$, we have $h(x) + h(x') \equiv h(x + x') \pmod{m}$. $H$ is called almost-linear if for any $h \in H$ and any $x, x' \in [u]$, we have either $h(x) + h(x') \equiv h(x + x') + c_h \pmod{m}$, or $h(x) + h(x') \equiv h(x + x') + c_h + 1 \pmod{m}$, where $c_h$ is an integer that depends only on the choice of $h$. For a function $h : [u] \to [m]$ and a set $S \subset [u]$ where $|S| = n$, we say that $i \in [m]$ is an overflowed value of $h$ if $|\{x \in S : h(x) = i\}| > 3n/m$. $H$ is called almost-balanced if for a random $h \in H$ and any set $S \subset [u]$ where $|S| = n$, the expected number of elements from $S$ that are mapped to overflowed values is $O(m)$. For simplicity of presentation, we treat the almost-linear hash functions as linear; this only affects some constant factors in our analysis.
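To make the almost-linearity condition concrete, the following Python sketch uses multiplicative hashing, $h(x) = \lfloor (a x \bmod 2^W) / 2^{W-s} \rfloor$ with a random odd $a$, which is one standard family with this property. It is an assumption made here for illustration: the text above defers to [23, 29] for the family that is actually used, and the names `make_hash` and `linearity_defects` as well as the word size `W` are hypothetical. For this family the condition holds with $c_h = -1$, because adding the two products modulo $2^W$ produces a carry of 0 or 1 into the top $s$ bits.

```python
import random

W = 32  # word size in bits (illustrative assumption)


def make_hash(s, rng):
    """Return h(x) = top s bits of (a * x mod 2^W) for a random odd a."""
    a = rng.randrange(1, 1 << W, 2)

    def h(x):
        return ((a * x) & ((1 << W) - 1)) >> (W - s)

    return h


def linearity_defects(h, m, pairs):
    """Collect (h(x) + h(y) - h(x + y)) mod m over the given pairs."""
    return {(h(x) + h(y) - h(x + y)) % m for x, y in pairs}
```

For a range of size $m = 2^s$, every value returned by `linearity_defects` is either $0$ or $m - 1$, that is, $h(x) + h(y)$ equals $h(x + y) + c_h$ or $h(x + y) + c_h + 1$ modulo $m$ with $c_h = -1$.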
Construction.

Initial Construction. We use an almost-linear almost-balanced hash function $h_1 : U \to [R]$ to map the elements of $A$ to $R$ buckets $A_1, A_2, \ldots, A_R$ such that $A_i = \{x \in A : h_1(x) = i\}$, and the elements of $B$ to $R$ buckets $B_1, B_2, \ldots, B_R$ such that $B_i = \{x \in B : h_1(x) = i\}$. As $h_1$ is almost-balanced, the expected size of each bucket is $O(n/R)$. Moreover, buckets with more than $3n/R$ elements, called overflowed buckets, have no more than $O(R)$ elements in total. We save these $O(R)$ elements in lists $L_A$ and $L_B$ (we put elements from overflowed buckets of $A$ in $L_A$ and elements from overflowed buckets of $B$ in $L_B$). We also sort $A$ and $B$ and save lookup tables for both $A$ and $B$.

We pick another almost-linear almost-balanced hash function $h_2 : U \to [n]$. For each bucket $A_i$, we create an $n$-length characteristic vector $v_{A_i}$ such that $v_{A_i}[j] = 1$ if there is $x \in A_i$ such that $h_2(x) = j$, and $v_{A_i}[j] = 0$ if there is no $x \in A_i$ such that $h_2(x) = j$. In the same way we create an $n$-length characteristic vector $v_{B_j}$ for each bucket $B_j$.

Quad Trees Construction. We create a search quad tree for each pair of buckets $A_i$ and $B_j$ following the idea of Goldstein et al. [20]. The construction involves calculating the convolution of many pairs of vectors. The convolution of two vectors $u, v \in (\mathbb{R}^{+} \cup \{0\})^n$ is a vector $c$ such that $c[k] = \sum_{i=0}^{k} u[i]\, v[k - i]$ for $0 \le k \le 2n - 2$. Constructing the quad tree is done as follows:
Quad-Tree-Construction($v_{A_i}$, $v_{B_j}$, $X$)
(1) For the bottom level of the quad tree:
(1.1) Partition the characteristic vector $v_{A_i}$ into $\lceil n/X \rceil$ sub-vectors $v_{A_i}^{1}, \ldots, v_{A_i}^{\lceil n/X \rceil}$, each of them of length $X$.
(1.2) Pad the last sub-vector with zeroes if needed.
(1.3) Let $i_1, i_2, \ldots, i_Y$ be the indices of the ones in some sub-vector $v_{A_i}^{k}$. If $Y > X/R$:
(1.3.1) Duplicate $v_{A_i}^{k}$ $t = \lceil Y/(X/R) \rceil$ times.
(1.3.2) For every $p \in [t]$: Save in the $p$th copy of $v_{A_i}^{k}$ just the ones in the indices $i_{(p-1) \cdot (X/R) + 1}, \ldots, i_{p \cdot (X/R)}$. Replace all other ones by zeroes.
(1.4) Denote the sequence of sub-vectors of $v_{A_i}$ and their duplicates by $P_{A_i} = v_{A_i}^{1}, v_{A_i}^{2}, \ldots, v_{A_i}^{cn/X}$ for some constant $c \ge 1$. Order the sub-vectors in $P_{A_i}$ by the locations of the ones. That is, sub-vector $w$ occurs before $u$ in $P_{A_i}$ if the ones in $w$ appear before the ones of $u$ in $v_{A_i}$. A sub-vector $w$ that contains only zeroes, and therefore represents a sub-vector $v_{A_i}^{k}$ for some $1 < k \le \lceil n/X \rceil$ without any duplicates, appears before all sub-vectors $v_{A_i}^{k'}$ for $k' > k$ and their duplicates.
(1.5) Repeat steps (1.1)-(1.4) for $v_{B_j}$ and create a sequence of sub-vectors $P_{B_j} = v_{B_j}^{1}, v_{B_j}^{2}, \ldots, v_{B_j}^{c'n/X}$ for some constant $c' \ge 1$.
(1.6) Assume that $c \ge c'$. Add to the end of the sequence $P_{B_j}$ the vectors $v_{B_j}^{c'n/X+1}, \ldots, v_{B_j}^{cn/X}$, such that each of these vectors contains exactly $X$ zeroes.
(1.7) For each pair of sub-vectors $v_{A_i}^{k}$ and $v_{B_j}^{\ell}$:
(1.7.1) Create a node $c_{i,j}^{k,\ell}$ in the quad tree.
(1.7.2) Calculate the convolution of $v_{A_i}^{k}$ and $v_{B_j}^{\ell}$ and save the result in $c_{i,j}^{k,\ell}$.
(2) For the next level of the quad tree upward:
(2.1) Create a sequence of sub-vectors $v_{A_i}'^{1}, v_{A_i}'^{2}, \ldots, v_{A_i}'^{cn/2X}$ such that $v_{A_i}'^{k}$ is the concatenation of $v_{A_i}^{2k-1}$ and $v_{A_i}^{2k}$ from the previous level.
(2.2) For every $v_{A_i}'^{k}$, if there are overlapping locations in $v_{A_i}^{2k-1}$ and $v_{A_i}^{2k}$, merge them. That is, if there are elements in both sub-vectors that represent the same interval of $v_{A_i}$, merge all of them in $v_{A_i}'^{k}$ by setting each overlapping location to 1 if any of the two overlapping elements in this location is 1, and setting each overlapping location to 0 otherwise.
(2.3) Repeat steps (2.1) and (2.2) for $v_{B_j}$ and create a sequence $v_{B_j}'^{1}, v_{B_j}'^{2}, \ldots, v_{B_j}'^{cn/2X}$.
(2.4) For each pair of sub-vectors $v_{A_i}'^{k}$ and $v_{B_j}'^{\ell}$ create a node $c_{i,j}'^{k,\ell}$ in the quad tree.
(2.5) Make the node $c_{i,j}'^{k,\ell}$ the parent of 4 nodes from the previous level: $c_{i,j}^{2k-1,2\ell-1}$, $c_{i,j}^{2k-1,2\ell}$, $c_{i,j}^{2k,2\ell-1}$, $c_{i,j}^{2k,2\ell}$.
(2.6) Calculate the convolution of $v_{A_i}'^{k}$ and $v_{B_j}'^{\ell}$ and save the result in $c_{i,j}'^{k,\ell}$.
The convolution of $v_{A_i}'^{k}$ and $v_{B_j}'^{\ell}$ can be easily calculated using the convolution results that are saved in $c_{i,j}^{2k-1,2\ell-1}$, $c_{i,j}^{2k-1,2\ell}$, $c_{i,j}^{2k,2\ell-1}$, $c_{i,j}^{2k,2\ell}$ from the previous level.
(3) Repeat step (2) for all the levels up to the root. Notice that in the root we have the complete vectors $v_{A_i}$ and $v_{B_j}$, and we calculate and save their convolution within the root node.

We emphasize that in the bottom level of the quad tree the number of sub-vectors of $v_{A_i}$ including all duplicates is no more than $cn/X$ for some constant $c \ge 1$, as the total number of ones in $v_{A_i}$ is $O(n/R)$. Therefore, the size of the sequence in step (1.4) is $cn/X$.

We call a quad tree in which the length of the sub-vectors in the bottom level is $X$ an $X$-quad-tree. We denote the level of the quad tree with sub-vectors of length $Z$ by $\ell_Z$. We emphasize that for this notation we consider the length the sub-vectors would have if no merging were done in any level of the quad tree.

Convolution by SetDisjointness. The convolution $c$ of two $X$-length vectors $v$ and $u$ can be calculated using SetDisjointness in the following way: Let us denote by $v_i$ (for any $0 \le i \le X - 1$) a $(2X - 1)$-length vector such that $v_i[j + i] = v[j]$ for every $0 \le j \le X - 1$ and all other elements of $v_i$ are zeroes. It is clear that $v_i$ is the vector $v$ whose elements were shifted by $i$ locations, with the empty locations filled with zeroes. Therefore, we call the vector $v_i$ an $i$-shift of $v$. We define $u_i$ in a similar way. Let us denote by $v^R$ the vector $v$ in reverse order of elements. It is straightforward to observe that $c[j]$ (the $j$th element in the convolution result of $v$ and $u$) equals the inner product of $v^R_j$ (we note that the reverse operation is done before the shift operation) and $u_{X-1}$. Informally, the complete convolution of $v$ and $u$ can be calculated by the inner product of (padded) $u$ and the reversed version of (padded) $v$ under different shifts of $v$. The number of shifted vectors can be reduced by shifting both $v$ and $u$. Specifically, the value of $c[j]$ can be obtained by the inner product of $v^R_{j \bmod \sqrt{X}}$ and $u_{X-1-\lfloor j/\sqrt{X} \rfloor \cdot \sqrt{X}}$. Therefore, the convolution of $v$ and $u$ can be calculated by the inner product of $O(\sqrt{X})$ shifted versions of both $v$ and $u$.

Each of the $(2X - 1)$-length vectors can be represented as a set: for a vector $w$ we construct a set $S_w$ such that $S_w = \{j \mid w[j] = 1\}$. Instead of calculating the inner product of $v^R_{j \bmod \sqrt{X}}$ and $u_{X-1-\lfloor j/\sqrt{X} \rfloor \cdot \sqrt{X}}$, we can calculate $|S_{v^R_{j \bmod \sqrt{X}}} \cap S_{u_{X-1-\lfloor j/\sqrt{X} \rfloor \cdot \sqrt{X}}}|$ and get the same result. In our query process through the quad tree we just need to know in each node whether the value in some position of the convolution within that node is zero or not. Thus, instead of calculating $|S_{v^R_{j \bmod \sqrt{X}}} \cap S_{u_{X-1-\lfloor j/\sqrt{X} \rfloor \cdot \sqrt{X}}}|$ we just need to determine whether $S_{v^R_{j \bmod \sqrt{X}}} \cap S_{u_{X-1-\lfloor j/\sqrt{X} \rfloor \cdot \sqrt{X}}} = \emptyset$ or not. All in all, the convolution of two $X$-length vectors $v$ and $u$ can be determined by a SetDisjointness instance that contains $O(\sqrt{X})$ sets whose size equals the number of ones in either $v$ or $u$.
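The shifting idea can be checked with a small Python sketch. It is a minimal illustration under stated assumptions: the names `build_shift_sets` and `conv_nonzero` are hypothetical, $X$ is taken to be a perfect square, the vectors are 0/1 lists, and positions are stored as plain integers instead of entries of padded $(2X - 1)$-length vectors, so that the same sets also answer positions $j \ge X$ of the convolution. A position $p$ lies in both sets exactly when $v[s] = u[i] = 1$ for some $s + i = j$.

```python
import math


def build_shift_sets(v, u):
    """Build O(sqrt(X)) position sets for two 0/1 vectors of length X."""
    X = len(v)
    r = math.isqrt(X)
    assert r * r == X and len(u) == X, "X is assumed to be a perfect square"
    # sv[a]: positions of the ones of the reversed v after an a-shift.
    sv = [{a + X - 1 - s for s in range(X) if v[s]} for a in range(r)]
    # su[b]: positions of the ones of u after a shift of X - 1 - b * r.
    su = [
        {X - 1 - b * r + i for i in range(X) if u[i]}
        for b in range((2 * X - 2) // r + 1)
    ]
    return r, sv, su


def conv_nonzero(r, sv, su, j):
    """Decide whether c[j] != 0 with a single set disjointness test."""
    return not sv[j % r].isdisjoint(su[j // r])
```

Each set has as many elements as there are ones in $v$ or in $u$, and only $O(\sqrt{X})$ sets are built, which matches the counts stated above.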
Consequently, instead of saving explicitly the convolution result in each node in some level of the quad tree that represents sub-vectors of length $X$, we can create an instance of SetDisjointness that can be used to determine whether a specific position in a convolution result is zero or not.

Hybrid Quad Tree Construction. Using the idea from the previous paragraph we modify the quad tree construction in the following way: We construct each of the quad trees in the regular way, which is explained in detail above, until level $\ell_{X^{1-\epsilon}}$. From level $\ell_{X^{1-\epsilon}}$ to level $\ell_{X^{1+\epsilon}}$ we do not save the convolution results explicitly in the quad tree for each level, but rather we create a SetDisjointness instance that can be used to answer whether a specific position in a convolution result is zero or not. This is a hybrid construction in which we create an ( X ǫ )-quad-tree in which the bottom X ǫ levels are not saved explicitly. Instead, the information for these bottom levels is determined by the SetDisjointness instances we create. These levels are called the implicit levels of the hybrid quad tree, while the levels in which we save the convolution results explicitly are called the explicit levels of the hybrid quad tree.

Query.
Given a query integer $z$, we search for a pair of integers $x \in A$ and $y \in B$ such that $x + y = z$. First of all, we check for each element $x \in L_A$ whether there is $y \in B$ such that $x + y = z$, and we also check for each element $y \in L_B$ whether there is $x \in A$ such that $x + y = z$. This can be done easily in $\tilde{O}(R)$ time using the sorted versions of $A$ and $B$. Then, if $x$ is in bucket $A_i$, by the (almost) linearity property of $h_1$ we expect $y$ to be in bucket $B_j$ such that $j \equiv h_1(z) - i \pmod{R}$. In order to find out whether there are $x \in A_i$ and $y \in B_j$ such that $x + y = z$, we can calculate the convolution of $v_{A_i}$ and $v_{B_j}$. Denote the vector that contains their convolution result by $C_{i,j}$. If $C_{i,j}[h_2(z)] = 0$ then there are no $x \in A_i$ and $y \in B_j$ such that $x + y = z$.
However, if $C_{i,j}[h(z)] \neq 0$ then there may be $x \in A_i$ and $y \in B_j$ such that $x + y = z$, but it may also be the case that $h(x) + h(y) = h(z)$ while $x + y \neq z$. Therefore, in order to verify if there are $x \in A_i$ and $y \in B_j$ such that $x + y = z$, we need to find all pairs of $x' \in A_i$ and $y' \in B_j$ such that $h(x') + h(y') = h(z)$ and check if indeed $x' + y' = z$. There are exactly $C_{i,j}[h(z)]$ such pairs, which are also called witnesses.

In order to efficiently find the witnesses of $C_{i,j}[h(z)]$, we use the hybrid quad tree we have constructed for buckets $A_i$ and $B_j$ in the following way: We start at the root of the hybrid quad tree. If the convolution result in the root is non-zero at location $h(z)$, we look at the children of the root node and continue the search at each child that contains a non-zero value in the convolution result it saves in the index that corresponds to index $h(z)$ of the convolution in the root. This way we continue downward all the way to the leaves. In the levels of the hybrid quad tree in which the convolution results are not saved explicitly we query the SetDisjointness instances in order to get an indication for the existence of a witness in the search path from the root.

If we reach a leaf of the quad tree and the convolution result within this leaf is non-zero in the location that corresponds to the index $h(z)$ of the convolution in the root, then we do a "2SUM-like" search within this leaf.

The "2SUM-like" search is done as follows: Let us assume that the leaf represents 2 sub-vectors $v^k_{A_i}$ and $v^\ell_{B_j}$. We recover the original elements that these sub-vectors represent. Let the array $A^k_i$ contain all $x \in A_i$ such that there is a one in $v^k_{A_i}$ that corresponds to $h(x)$. In the same way we construct array $B^\ell_j$. We sort both $A^k_i$ and $B^\ell_j$. Let $d$ be the size of $A^k_i$. Then, if $A^k_i[d-1] + B^\ell_j[0] = z$ we are done. Otherwise, if the sum is greater than $z$ we check if $A^k_i[d-2] + B^\ell_j[0] = z$ and if it is smaller than $z$ we check if $A^k_i[d-1] + B^\ell_j[1] = z$. This way we continue until we get to the end of one of the arrays or find a pair of elements whose sum equals $z$.

Analysis. There are $R^2$ possible pairs of buckets $A_i$ and $B_j$. Therefore, we construct $R^2$ quad trees. In order to save the convolution results in all the nodes in an explicit level $\ell_Z$ of some hybrid quad tree, the space we need to use is $O(n^2/Z)$ (for each pair $A_i$ and $B_j$, there are $O(n^2/Z^2)$ pairs of sub-vectors, one from $v_{A_i}$ and the other from $v_{B_j}$. The size of the convolution of the two sub-vectors is $O(Z)$). Therefore, the total space for constructing the explicit levels of the hybrid quad trees is $\tilde{O}(n^2/X^{1+\epsilon} \cdot R^2)$ (a level that is closer to the root requires less space than a level that is farther away from the root. There are at most $\log n$ levels in each quad tree. The bottom explicit level is $\ell_{X^{1+\epsilon}}$). This is also the preprocessing time for constructing these levels of the hybrid quad trees, as the convolution of two $n$-length vectors can be calculated in $\tilde{O}(n)$ time.

From level $\ell_{X^{1-\epsilon}}$ to level $\ell_{X^{1+\epsilon}}$ we do not save the convolution results explicitly in the quad tree for each level, but rather we create a SetDisjointness instance that can be used to answer if a specific position in a convolution result is zero or not, as explained in detail previously. Let us analyse the cost of the SetDisjointness instance for some implicit level $\ell_Z$. We have $O(R)$ buckets. Each bucket is represented by a characteristic vector that is partitioned into $O(n/Z)$ parts of length $Z$, such that each part contains $O(Z/R)$ ones. For each sub-vector we create $O(\sqrt{Z})$ sets that represent $O(\sqrt{Z})$ shifts of the sub-vector as explained previously. Therefore, the total number of sets we have is $O(R \cdot n/Z \cdot \sqrt{Z}) = O(nR/\sqrt{Z})$. Each set contains $O(Z/R)$ elements, so the total number of elements in all sets is $O(R \cdot n/\sqrt{Z} \cdot Z/R) = O(n\sqrt{Z})$.
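As a sanity check on these counts, the parameters of the SetDisjointness instance for an implicit level $\ell_Z$ can be written out explicitly. The substitution $R = \sqrt{X}$ is the choice made later in the analysis and is assumed here only for illustration:

```latex
\begin{align*}
\#\text{sets}     &= O(R) \cdot O(n/Z) \cdot O(\sqrt{Z}) = O\!\left(nR/\sqrt{Z}\right), \\
\#\text{elements} &= O\!\left(nR/\sqrt{Z}\right) \cdot O(Z/R) = O\!\left(n\sqrt{Z}\right), \\
R = \sqrt{X}:\quad \#\text{sets} &= O\!\left(n\sqrt{X/Z}\right), \qquad \text{set size} = O\!\left(Z/\sqrt{X}\right).
\end{align*}
```

With $Z$ playing the role of the universe size, these are the values of $m$ and $N$ that the proof of Theorem 7 uses for each instance.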
The size of the universe of the elements in the sets is $O(Z)$.

For a query integer $z$ we have $O(R)$ pairs of buckets $A_i$ and $B_j$ in which we may have two elements, one from each array, that sum up to $z$ (as $j = i - h(z)$). For a pair of buckets $A_i$ and $B_j$, we search for all the witnesses of $C_{i,j}[h(z)]$ in the quad tree of $A_i$ and $B_j$. Searching for a witness from the root to a leaf of the quad tree can be done in $O(\log n)$ time in the levels in which we save the convolution explicitly, plus a constant number of queries to each SetDisjointness instance. Within a leaf we do a "2SUM-like" search on 2 arrays that contain $O(X^{1-\epsilon}/R)$ elements. Therefore, the total search time per witness is at most $\tilde{O}(X^{1-\epsilon}/R)$. A false witness is a witness pair of elements $(x, y)$ such that $x + y \neq z$, but $h(x) + h(y) = h(z)$. The probability that a pair of numbers $(x, y)$ is a false witness is $1/n$ (because the range of $h$ is $[n]$). Therefore, the expected number of false witnesses within a specific pair of buckets is at most $O((n/R)^2 \cdot 1/n) = O(n/R^2)$ by the union bound (notice that the number of elements in each bucket is $O(n/R)$). Consequently, the total expected number of false witnesses is at most $O(R \cdot n/R^2) = O(n/R)$. As explained before, the total search time per witness is at most $\tilde{O}(X^{1-\epsilon}/R)$. Thus, the total query time is $\tilde{O}(nX^{1-\epsilon}/R^2)$.

All in all, the total space and preprocessing time required by the explicit levels of the $O(R^2)$ hybrid quad trees is $\tilde{O}(n^2/X^{1+\epsilon} \cdot R^2)$, which is truly subquadratic in $n$ if we set $R = \sqrt{X}$. Moreover, the total query time is $\tilde{O}(nX^{1-\epsilon}/R^2)$, which is truly sublinear in $n$ if we set $R = \sqrt{X}$. Therefore, by setting $R = \sqrt{X}$ we have that the space and preprocessing time of the reduction is truly subquadratic in $n$. Additionally, a query can be answered by at most $O(n/\sqrt{X})$ queries to each SetDisjointness instance plus some additional time that is truly sublinear in $n$. ◭

◮ Theorem 7.
Any solution to SetDisjointness with sets $S_1, S_2, ..., S_m \subseteq [u]$ for any value of $u \in [N^{\delta}, N]$, such that $N = \sum_{i=1}^{m} |S_i|$ and $\delta > 0$, must either have $\Omega(m^{2-o(1)})$ preprocessing time or have $\Omega(u^{1/2-o(1)})$ query time, unless the 3SUM Conjecture is false. Proof.
Given an instance of the 3SUM problem that contains 3 arrays $A$, $B$ and $C$ with $n$ numbers in each of them, we can solve this instance simply by creating a 3SUM-Indexing instance with arrays $A$ and $B$ and $n$ queries, one for each number in $C$. Thus, using the previous lemma, the given 3SUM instance can be reduced, for any integer value of $X$ in $[n^{\delta}, n]$ (for any $\delta > 0$) and for any $\epsilon > 0$, to $2\epsilon \log X$ instances of SetDisjointness $SD_1, SD_2, ..., SD_{2\epsilon \log X}$. For any $1 \le i \le 2\epsilon \log X$, instance $SD_i$ has $N = n\sqrt{u_i}$ elements from universe $[u_i]$ and $m = n\sqrt{X/u_i}$ sets, each of size $O(\sqrt{u_i})$, where $u_i = X^{1+\epsilon}/2^{i-1}$. The total time for this reduction is $O(n^{2-\epsilon_1})$ for some $\epsilon_1 > 0$, and the total number of queries is $\tilde{O}(n^2/\sqrt{X})$. Consequently, if we assume to the contrary that there is an algorithm that solves SetDisjointness on $m$ sets from a universe $[u]$ with $O(m^{2-\epsilon_2})$ preprocessing time for some $\epsilon_2 > 0$ and $O(u^{1/2-\epsilon_3})$ query time for some $0 < \epsilon_3 \le 1/2$, then the 3SUM instance can be solved in
$$O(n^{2-\epsilon_1}) + \sum_{i=1}^{2\epsilon \log X} O\left(\left(n\sqrt{X/u_i}\right)^{2-\epsilon_2} + \frac{n^2}{\sqrt{X}}\, u_i^{1/2-\epsilon_3}\right)$$
time. We have that for any $i$, $u_i \le X^{1+\epsilon}$ and $\sqrt{X/u_i} \le \sqrt{X/X^{1-\epsilon}} = X^{\epsilon/2}$. Therefore,
$$\sum_{i=1}^{2\epsilon \log X} O\left(\left(n\sqrt{X/u_i}\right)^{2-\epsilon_2} + \frac{n^2}{\sqrt{X}}\, u_i^{1/2-\epsilon_3}\right) = \tilde{O}\left(n^{(1+\epsilon/2)(2-\epsilon_2)} + \frac{n^2}{\sqrt{X}}\, X^{(1+\epsilon)(1/2-\epsilon_3)}\right).$$
Thus, by setting $\epsilon = \min(\epsilon_2, \epsilon_3)$ we have a total running time that is truly subquadratic in $n$. This contradicts the 3SUM Conjecture. ◭

Another implication of our reduction in Lemma 6 is a similar reduction from 3SUM to SetIntersection. This reduction leads to a similar conditional lower bound on the preprocessing and query time tradeoff of SetIntersection with bounded universe. This is done in Appendix A.
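The first step in the proof of Theorem 7, answering a 3SUM instance with $n$ queries to a 3SUM-Indexing structure built on $A$ and $B$, can be sketched as follows. The class `NaiveThreeSumIndex` is a hypothetical placeholder with quadratic space, included only so the sketch runs; any structure exposing the same `query` interface could be substituted.

```python
class NaiveThreeSumIndex:
    """Placeholder 3SUM-Indexing structure: stores every pairwise sum of A and B."""

    def __init__(self, A, B):
        self._sums = {a + b for a in A for b in B}

    def query(self, z):
        """Return True iff some a in A and b in B satisfy a + b == z."""
        return z in self._sums


def three_sum(A, B, C, index_factory=NaiveThreeSumIndex):
    """Decide 3SUM (a + b == c) with one indexing query per element of C."""
    index = index_factory(A, B)   # preprocessing on A and B only
    return any(index.query(c) for c in C)
```

The running time of `three_sum` is the preprocessing time of the index plus $n$ query times, which is the accounting used in the proof.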
In this section we present several applications of our lower bounds on SetDisjointness and SetIntersection with bounded universe. Several hardness results on the reporting variants of the problems in this section appear in Appendix B.
As mentioned in the introduction, the range mode problem can be solved using $S$ space and $T$ query time such that $S \cdot T^2 = \tilde{O}(n^2)$ [12, 24]. In the following theorem we prove that $S \cdot T^4 = \tilde{\Omega}(n^2)$. This lower bound is proved based on the Strong SetDisjointness Conjecture using Theorem 4. We note that if the lower bound on the query time in Theorem 4 were $\Omega(u^{1-o(1)})$ instead of $\Omega(u^{1/2-o(1)})$, then the lower bound and the upper bound would be tight.

◮ Theorem 8. Any data structure that answers Range Mode Queries in $T$ time on a string of length $n$ must use $S = \tilde{\Omega}(n^2/T^4)$ space, unless the Strong SetDisjointness Conjecture is false. Proof.
We use the idea of Chan et al. [12] and apply our theorem on the hardness of SetDisjointness with bounded universe. We begin with an instance of SetDisjointness with sets $S_1, S_2, ..., S_m \subseteq [u]$ such that $u \in [N^{\delta}, N]$, $N = \sum_{i=1}^{m} |S_i|$ and $\delta > 0$. We create a string $STR$ that is the concatenation of two strings $T^1$ and $T^2$ of equal length. The string $T^1$ is the concatenation of the strings $T^1_1, T^1_2, ..., T^1_m$. For each $i$ the string $T^1_i$ is of length $u$ and each character in it is a different number in $[u]$. The prefix of $T^1_i$ contains all the numbers in $[u] \setminus S_i$ in sorted order. This prefix is followed by all the numbers in $S_i$ in sorted order. This part is called the suffix of $T^1_i$. $T^2$ is constructed similarly to $T^1$ but with the order of the suffix and prefix swapped. Specifically, the string $T^2$ is given by the concatenation of the strings $T^2_1, T^2_2, ..., T^2_m$. For each $i$ the string $T^2_i$ is of length $u$ and each character in it is a different number in $[u]$. The prefix of $T^2_i$ contains all the numbers in $S_i$ in sorted order. This prefix is followed by all the numbers in $[u] \setminus S_i$ in sorted order. This part is called the suffix of $T^2_i$. For every $1 \le i \le m$, let us denote by $a_i$ the index where the prefix of $T^1_i$ ends and by $b_i$ the index where the prefix of $T^2_i$ ends.

The string $STR$ is preprocessed for range mode queries. Then, given a query pair $(i, j)$ for SetDisjointness, we need to decide if $S_i \cap S_j = \emptyset$ or not. This is done by a range mode query for the range $[a_i + 1, b_j]$. For every $p \in [2]$ and $q \in [m]$, the string $T^p_q$ contains characters that represent all the numbers in $[u]$, such that each of these numbers occurs exactly once in the string. Between $T^1_i$ and $T^2_j$ we have $m - i + j - 1$ strings, each of which contains all the characters in $[u]$. Therefore, each character occurs $m - i + j - 1$ times between $T^1_i$ and $T^2_j$. The suffix of $T^1_i$ starting at index $a_i + 1$ contains all the characters that represent the elements of $S_i$, while the prefix of $T^2_j$ ending at index $b_j$ contains all the characters that represent the elements of $S_j$. Consequently, if there is an intersection between $S_i$ and $S_j$ we will have at least one character that occurs in both the suffix of $T^1_i$ and the prefix of $T^2_j$.
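The string construction and the way a disjointness query maps to a range can be made concrete with a short sketch. The names `build_range_mode_string`, `mode_frequency` and `are_disjoint` are hypothetical, indices are 0-based with half-open ranges, and the range mode query is answered naively by counting; a real range mode data structure would replace `mode_frequency`. The decision rule assumed here is that the mode frequency reaches $m - i + j + 1$ exactly when the two sets share an element, a threshold that is the same for 0-based and 1-based indices.

```python
from collections import Counter


def build_range_mode_string(sets, u):
    """Build STR = T1 T2 for sets over {1..u}; return it with the query endpoints."""
    m = len(sets)
    universe = range(1, u + 1)
    first, second = [], []
    suffix_start, prefix_end = [], []
    for s in sets:
        # T1_i: complement of S_i first, then S_i (the suffix).
        first.extend(x for x in universe if x not in s)
        suffix_start.append(len(first))
        first.extend(sorted(s))
    for s in sets:
        # T2_i: S_i first (the prefix), then its complement.
        second.extend(sorted(s))
        prefix_end.append(m * u + len(second))
        second.extend(x for x in universe if x not in s)
    return first + second, suffix_start, prefix_end


def mode_frequency(string, lo, hi):
    """Naive stand-in for a range mode query on string[lo:hi]."""
    return max(Counter(string[lo:hi]).values(), default=0)


def are_disjoint(i, j, m, string, suffix_start, prefix_end):
    """S_i and S_j are disjoint iff no character reaches frequency m - i + j + 1."""
    freq = mode_frequency(string, suffix_start[i], prefix_end[j])
    return freq < m - i + j + 1
```

The resulting string has length $2mu$, and each disjointness query costs a single range mode query.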
Thus, the mode of the range [ a i + 1 , b j ] will be m − i + j + 1 if S i ∩ S j = ∅ , and less than m − i + j + 1 if the S i ∩ S j = ∅ . Therefore, if we get from therange mode query a character c that occurs m − i + j + 1 times in the query range we knowthat the intersection is not empty, and if not we know that the intersection is empty. Evenif the range mode query does not return the frequency of the mode within the query range,but rather just the mode element itself, we can save a hash table for every input set and usethis tables to check in constant time if the returned element occurs in both S i and S j .Consequently, an instance of SetDisjointness with m sets from universe [ u ] (such that u ∈ [ N δ , N ], N = P mi =1 | S i | and δ > n = 2 mu , such that every query to the SetDisjointnessinstance can be answered by a query to the range mode instance. Let us assume to thecontrary that the range mode problem can be solved by a data structure that answers queriesin ˜ O ( T ) time per query using ˜ O ( S ) space such that S · T = ˜ O ( n − ǫ ). Let T = ˜ O ( u / − ǫ/ ),we have that S = ˜ O ( n ǫ /T ) = ˜ O (( mu ) − ǫ /u / − ǫ/ ) = ˜ O ( m − ǫ u − ǫ /u − ǫ ) ) = ˜ O ( m − ǫ ).Therefore, we have a solution to SetDisjointness with m sets from universe [ u ] with querytime ˜ O ( u / − ǫ/ ) and space ˜ O ( mu + m − ǫ ) (we add mu to the space usage, as we must at leastsave the string ST ). According to Corollary 5 the reduction from general SetDisjointnessto SetDisjointness with bounded universe holds for N/ √ u ≤ m . Therefore, for any value of u ≤ N / − ǫ we have that √ u ≤ N / − ǫ/ . Thus, the following holds: √ u ≤ N / − ǫ/ ⇒ N / − ǫ/ ≤ √ u ⇒ NN / − ǫ/ ≤ N √ u ⇒ N / ǫ/ ≤ N √ u ≤ m ⇒ N / ǫ/ − ǫ − ǫ ≤ m − ǫ .Consequently, we have that u ≤ N / − ǫ < N / − ǫ/ − ǫ/ ≤ N / − ǫ/ − ǫ / ≤ m . All inall, for any u ≤ N / − ǫ the reduction holds and mu = ˜ O ( m − ǫ ). 
Consequently, the totalspace for solving SetDisjointness with bounded universe using our reduction to the rangemode problem is ˜ O ( m − ǫ ) and the query time is ˜ O ( u / − ǫ/ ). This contradicts the StrongSetDisjointness Conjecture according to Corollary 5. ◭ Using Theorem 7 and the same idea from the proof of Theorem 8, we obtain the follow-ing result regarding the preprocessing and query time tradeoff for solving the range modeproblem: ◮ Corollary 9.
Any data structure that answers Range Mode Queries in T time on a stringof length n must have P = ˜Ω( n /T ) preprocessing time, unless the 3SUM Conjecture isfalse. Agarwal [6] presented space-time tradeoffs for distance oracles for undirected graph G =( V, E ) with average degree µ (that is, µ = | E || V | ): (i) (1 + k )-stretch distance oracles thatuse ˜ O ( | E | + | V | α ) space and have O (( αµ ) k ) query time, for any 1 ≤ α ≤ | V | (ii) (1+ k +0 . )-stretch distance oracles that use ˜ O ( | E | + | V | α ) space and have O ( α ( αµ ) k ) query time, for any1 ≤ α ≤ | V | . (iii) (1+ )-stretch distance oracle that uses ˜ O ( | E | + | V | α ) space and has O ( αµ )query time for any 1 ≤ α ≤ ( | V | | E | ) . In the last result ((iii)) Agarwal managed to shave an α factor of the query time in (ii) (for k = 1). Therefore, both -stretch distance oracle and2-stretch distance oracle (by setting k = 1 in (i)) have the same space-time tradeoff. It isknown that 3-stretch distance oracle has a better tradeoff (see [8]). Moreover, by (i) and (ii)the tradeoff for stretch less than 5 / / t ∈ [ , ◮ Theorem 10.
Any distance oracle for an undirected graph $G = (V, E)$ with stretch less than 2 must either use $\Omega(|V|^{2-o(1)})$ space or have $\Omega(\mu^{1-o(1)})$ query time, where $\mu$ is the average degree of a vertex in $G$, unless the Strong SetDisjointness Conjecture is false. Proof.
We use the idea of Cohen and Porat [16] with our hardness results for SetDisjointness with bounded universe. Given an instance of SetDisjointness with sets $S_1, S_2, ..., S_m \subseteq [u]$ such that $u \in [N^{\delta}, N]$, $N = \sum_{i=1}^{m} |S_i|$ and $\delta > 0$, we construct a bipartite graph $G = (V, E)$ as follows: On one side, we create a vertex $v_i$ for each set $S_i$. On the other side, we create a vertex $u_j$ for each element $j \in [u]$. For each element $x$ in some set $S_i$ we create an edge $(v_i, u_x)$. Formally, $V = \{v_i \mid 1 \le i \le m\} \cup \{u_j \mid j \in [u]\}$ and $E = \{(v_i, u_x) \mid x \in S_i\}$. For any $i, j \in [m]$, if $S_i \cap S_j \neq \emptyset$ then it is clear that the distance between $v_i$ and $v_j$ is exactly 2. Otherwise, the distance is at least 4. A distance oracle with stretch less than 2 can distinguish between these two possibilities, and therefore a SetDisjointness query can be answered by one query to a distance oracle with stretch less than 2 for $G$.

It is clear that $|V| = m + u$ and $|E| = N$. We assume to the contrary that there is a distance oracle with stretch less than 2 that uses $\tilde{O}(|V|^{2-\epsilon_1})$ space and answers queries in $\tilde{O}(\mu^{1-\epsilon_2}) = \tilde{O}((|E|/|V|)^{1-\epsilon_2})$ time, for some $\epsilon_1, \epsilon_2 >$
0. Therefore, SetDisjointness with boundeduniverse can be solved using ˜ O (( m + u ) − ǫ ) space and queries can be answered using˜ O (( Nm + u ) − ǫ ) time. According to Corollary 5 the reduction from general SetDisjointnessto SetDisjointness with bounded universe holds for N/ √ u ≤ m . Therefore, for any value of u ≤ N / we have that the reduction holds and u ≤ m (see the full details in the proof ofTheorem 8). Moreover, we have that N/ ( m + u ) ≤ N/m ≤ √ u . Consequently, for any u ≤ N / we have a solution to SetDisjointness with bounded universe that uses ˜ O (( m + u ) − ǫ ) =˜ O ( m − ǫ ) space and answers queries in ˜ O ( Nm + u − ǫ ) = ˜ O (( √ u ) − ǫ ) = ˜ O ( u / − ǫ / ) time.This contradicts Strong SetDisjointness Conjecture according to Corollary 5. ◭ The previous theorem can be stated in a different way that makes it clear that the space-time tradeoff of Agarwal [6] is tight for distance oracles with stretch t such that 5 / ≤ t < ◮ Corollary 11.
There is no stretch less-than- distance oracle for undirected graph G =( V, E ) that uses ˜ O ( | V | α ) space and have ˜ O ( α − ǫ µ ) query time for any | V | δ ≤ α and any δ, ǫ > , unless conjecture 1 is false. . Goldstein, M. Lewenstein and E. Porat 15 Using Theorem 7 and the same idea from the proof of Theorem 10, we obtain the followingresult regarding the preprocessing and query time tradeoff for distance oracles with stretchless-than-2: ◮ Theorem 12.
Any distance oracle for an undirected graph $G = (V, E)$ with stretch less than 2 must either be constructed in $\Omega(|V|^{2-o(1)})$ preprocessing time or have $\Omega(\mu^{1-o(1)})$ query time, where $\mu$ is the average degree of a vertex in $G$, unless the 3SUM Conjecture is false.

In the following theorem we prove a conditional lower bound on the space-time tradeoff for solving 3SUM-Indexing with universe size that is $[n^{2+\epsilon}]$ for any $\epsilon > 0$ ($n$ is the size of the input arrays). ◮ Theorem 13.
For any $\epsilon > 0$ and $0 < \delta \le 1$, any solution to 3SUM-Indexing with arrays $A = a_1, a_2, ..., a_n$ and $B = b_1, b_2, ..., b_n$ such that for every $i \in [n]$, $a_i, b_i \in [n^{2+\epsilon}]$ must either use $\Omega(n^{2-\delta-o(1)})$ space or have $\Omega(n^{\delta/2-o(1)})$ query time, unless the Strong SetDisjointness Conjecture is false. Proof.
We use the idea of Goldstein et al. [21] with our hardness for SetDisjointness withbounded universe. We begin with an instance of SetDisjointness with sets S , S , ..., S m ⊆ [ u ]such that u = N δ , m ∈ [ Nu / , Nu / − ǫ ′ ], N = P mi =1 | S i | , ǫ ′ = ǫ/ δ > x in some set S i we create two numbers x ,i and x ,i . The number x ,i consists of 3 blocks of bits (ordered from the least significant bit toward the most significantbit): (i) A block of log m bits that contains the value of the index i . (ii) A block of log m padding zero bits. (iii) A block of log u bits that contains the value of x −
1. The number x ,i consists of 3 blocks of bits (ordered from the least significant bit toward the most significantbit): (i) A block of log m padding zero bits. (ii) A block of log m bits that contains thevalue of the index i . (iii) A block of log u bits that contains the value of u − x . We placethe number x ,i in array A and the number x ,i in array B . The number of elements ineach of these arrays is N , as we add a number to each array for every element in the inputsets. These two arrays form an instance of 3SUM-Indexing which is preprocessed in orderto answer queries.Given a query asking whether S i ∩ S j = ∅ or not, we can answer it by creating a querynumber z to the 3SUM-Indexing instance as follows: The number z consists of 3 blocks ofbits (ordered from the least significant bit toward the most significant bit): (i) A block oflog m bits that contain the value of the index i . (ii) A block of log m that contain the valueof the index j . (iii) A block of log u bits that contains the value of u −
1. It straightforwardto see that we get a positive answer to the query number z iff S i ∩ S j = ∅ : (i) If we have x ,k ∈ A and y ,k ∈ B such that x ,k + y ,k = z , then we must have that: (1) k = i which means that x is in S i . (2) k = j which means that y is in S j . (3) x − u − y = u − x = y . (ii) If S i ∩ S j = ∅ then there is an element x such that x ∈ S i and x ∈ S j . From our construction it is clear that indeed x ,i + x ,j = z .Thus, we have reduced our SetDisjointness instance to an instance of 3SUM-Indexingsuch that each query to the SetDisjointness instance can be answered by a query to the3SUM-Indexing instance. The size of each array in the 3SUM-Indexing instance is N . All thenumbers in these arrays have 2 log m +log u bits. Let u = N δ and m ∈ [ Nu / , Nu / − ǫ ′ ], for ǫ ′ ≤ ǫ , then the number of bits in each number of A and B is bounded by 2 log Nu / − ǫ ′ + log N δ =2 log N − δ/ ǫ ′ δ + log N δ = 2(1 − δ/ ǫ ′ δ ) log N + δ log N = (2 + 2 ǫ ′ δ ) log N ≤ (2 + ǫ ) log N . By setting n = N we have that both A and B have n elements and all the numbers are in[ n ǫ ].We assume to the contradiction that 3SUM-Indexing with universe [ n ǫ ] can be solvedusing ˜ O ( n − δ − γ ) space, while answering queries in ˜ O ( n δ − γ ) time, for some γ , γ > m from universe[ u ] using S = ˜ O ( n − δ − γ ) space, while answering queries in T = ˜ O ( n δ − γ ) time. Wehave that u = n δ , so n = u /δ . Moreover, m ≥ n − δ/ , so n ≤ m / (1 − δ/ . Therefore, S = ˜ O ( m (2 − δ − γ ) / (1 − δ/ ) = ˜ O ( m − γ − δ ) and T = ˜ O ( u ( δ − γ ) /δ ) = ˜ O ( u / − γ /δ ). Thiscontradicts Corollary 5. ◭ Using Theorem 7 and the same idea from the proof of Theorem 13, we obtain the followingresult regarding the preprocessing and query time tradeoff for distance oracles with stretchless-than-2: ◮ Theorem 14.
For any $\epsilon > 0$ and $0 < \delta \le 1$, any solution to 3SUM-Indexing with arrays $A = a_1, a_2, ..., a_n$ and $B = b_1, b_2, ..., b_n$ such that for every $i \in [n]$, $a_i, b_i \in [n^{2+\epsilon}]$ must either have $\Omega(n^{2-\delta-o(1)})$ preprocessing time or have $\Omega(n^{\delta/2-o(1)})$ query time, unless the 3SUM Conjecture is false.

References
[1] Amir Abboud and Kevin Lewi. Exact weight subgraphs and the k-sum conjecture. In International Colloquium on Automata, Languages and Programming, ICALP 2013, pages 1–12, 2013.
[2] Amir Abboud and Virginia Vassilevska Williams. Popular conjectures imply strong lower bounds for dynamic problems. In Foundations of Computer Science, FOCS 2014, pages 434–443, 2014.
[3] Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. Consequences of faster alignment of sequences. In International Colloquium on Automata, Languages and Programming, ICALP 2014, pages 39–51, 2014.
[4] Amir Abboud, Virginia Vassilevska Williams, and Huacheng Yu. Matching triangles and basing hardness on an extremely popular conjecture. In Symposium on Theory of Computing, STOC 2015, pages 41–50, 2015.
[5] Peyman Afshani and Jesper Sindahl Nielsen. Data structure lower bounds for document indexing problems. In International Colloquium on Automata, Languages, and Programming, ICALP 2016, pages 93:1–93:15, 2016.
[6] Rachit Agarwal. The space-stretch-time tradeoff in distance oracles. In Algorithms - ESA 2014 - 22th Annual European Symposium, Wroclaw, Poland, September 8-10, 2014. Proceedings, pages 49–60, 2014.
[7] Rachit Agarwal and Philip Brighten Godfrey. Distance oracles for stretch less than 2. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 526–538, 2013.
[8] Rachit Agarwal, Philip Brighten Godfrey, and Sariel Har-Peled. Approximate distance queries and compact routing in sparse graphs. In INFOCOM 2011. 30th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 10-15 April 2011, Shanghai, China, pages 1754–1762, 2011.
[9] Amihood Amir, Timothy M. Chan, Moshe Lewenstein, and Noa Lewenstein. On hardness of jumbled indexing. In International Colloquium on Automata, Languages and Programming, ICALP 2014, pages 114–125, 2014.
[10] Amihood Amir, Tsvi Kopelowitz, Avivit Levy, Seth Pettie, Ely Porat, and B. Riva Shalom. Mind the gap: Essentially optimal algorithms for online dictionary matching with one gap. In International Symposium on Algorithms and Computation, ISAAC 2016, pages 12:1–12:12, 2016.
[11] Gill Barequet and Sariel Har-Peled. Polygon-containment and translational min-hausdorff-distance between segment sets are 3sum-hard. In Symposium on Discrete Algorithms, SODA 1999, pages 862–863, 1999.
[12] Timothy M. Chan, Stephane Durocher, Kasper Green Larsen, Jason Morrison, and Bryan T. Wilkinson. Linear-space data structures for range mode query in arrays. Theory Comput. Syst., 55(4):719–741, 2014.
[13] Shiri Chechik. Approximate distance oracles with constant query time. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 654–663, 2014.
[14] Hagai Cohen. Fast set intersection and two-patterns matching. Master’s thesis, Bar-Ilan University, Ramat-Gan, Israel, 2010.
[15] Hagai Cohen and Ely Porat. Fast set intersection and two-patterns matching. Theor. Comput. Sci., 411(40-42):3795–3800, 2010.
[16] Hagai Cohen and Ely Porat. On the hardness of distance oracle for sparse graph. CoRR, abs/1006.1117, 2010.
[17] Pooya Davoodi, Michiel H. M. Smid, and Freek van Walderveen. Two-dimensional range diameter queries. In LATIN 2012: Theoretical Informatics - 10th Latin American Symposium, Arequipa, Peru, April 16-20, 2012. Proceedings, pages 219–230, 2012.
[18] Paul F. Dietz, Kurt Mehlhorn, Rajeev Raman, and Christian Uhrig. Lower bounds for set intersection queries. Algorithmica, 14(2):154–168, 1995.
[19] Anka Gajentaan and Mark H. Overmars. On a class of $O(n^2)$ problems in computational geometry. Comput. Geom., 5:165–185, 1995.
[20] Isaac Goldstein, Tsvi Kopelowitz, Moshe Lewenstein, and Ely Porat. How hard is it to find (honest) witnesses? In European Symposium on Algorithms, ESA 2016, pages 45:1–45:16, 2016.
[21] Isaac Goldstein, Tsvi Kopelowitz, Moshe Lewenstein, and Ely Porat. Conditional lower bounds for space/time tradeoffs. In Algorithms and Data Structures Symposium, WADS 2017, pages 421–436, 2017.
[22] Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Dynamic set intersection. In Algorithms and Data Structures - 14th International Symposium, WADS 2015, Victoria, BC, Canada, August 5-7, 2015. Proceedings, pages 470–481, 2015.
[23] Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Higher lower bounds from the 3SUM conjecture. In Symposium on Discrete Algorithms, SODA 2016, pages 1272–1287, 2016.
[24] Danny Krizanc, Pat Morin, and Michiel H. M. Smid. Range mode and range median queries on lists and trees. Nord. J. Comput., 12(1):1–17, 2005.
[25] Mihai Patrascu. Towards polynomial lower bounds for dynamic problems. In Symposium on Theory of Computing, STOC 2010, pages 603–610, 2010.
[26] Mihai Patrascu and Liam Roditty. Distance oracles beyond the Thorup-Zwick bound. SIAM J. Comput., 43(1):300–311, 2014.
[27] Mihai Patrascu, Liam Roditty, and Mikkel Thorup. A new infinity of distance oracles for sparse graphs. In Foundations of Computer Science, FOCS 2012, pages 738–747, 2012.
[28] Mikkel Thorup and Uri Zwick. Approximate distance oracles. J. ACM, 52(1):1–24, 2005.
[29] Joshua R. Wang. Space-efficient randomized algorithms for K-SUM. In European Symposium on Algorithms, ESA 2014, pages 810–829, 2014.
[30] Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity. In International Congress of Mathematicians, ICM 2018, 2018.
[31] Virginia Vassilevska Williams and Ryan Williams. Finding, minimizing, and counting weighted subgraphs. SIAM J. Comput., 42(3):831–854, 2013.
Appendix
A Conditional Lower Bounds for SetIntersection
In the following theorem we prove a conditional lower bound on SetIntersection with bounded universe based on the Strong SetDisjointness Conjecture by generalizing the ideas from Theorem 4. Specifically, we demonstrate that for SetIntersection we either have the same space lower bound as for SetDisjointness or we have a $\tilde{\Omega}(u^{1-o(1)} + out)$ bound on the query time. The query time bound is stronger than the $\Omega(u^{1/2-o(1)})$ bound that we have for SetDisjointness. However, we argue that this lower bound for SetIntersection holds only when the output is large. If we have an upper bound on the size of the output we still have a lower bound on the query time, but this lower bound gets closer to $\tilde{\Omega}(u^{1/2-o(1)} + out)$ as the size of the output gets smaller. Eventually, this coincides with the lower bound we have for SetDisjointness (notice that in order to answer SetDisjointness queries we just need to output a single element from the intersection if there is any).

◮ Theorem 15. Any solution to SetIntersection with sets $S_1, S_2, ..., S_m \subseteq [u]$ for any value of $u \in [N^{\delta}, N]$, such that $N = \sum_{i=1}^{m} |S_i|$ and $\delta > 0$, must either use $\Omega(m^{2-o(1)})$ space or have $\tilde{\Omega}(u^{\alpha-o(1)} + out)$ query time, for any $1/2 \le \alpha \le 1$ and any output size $out$ such that $out = \Omega(u^{2\alpha-1-\delta})$ and $\delta > 0$, unless the Strong SetDisjointness Conjecture is false. Proof.
We use the same idea as in the proof of Theorem 4. Let us assume to the contradic-tion that Strong SetDisjointness Conjecture is true but there is an algorithm A ′ that solvesSetIntersection on m sets from a universe [ u ] and creates a data structure D , such that thespace complexity of the data structure D is O ( m − ǫ ) for some ǫ > A ′ is O ( u α − ǫ ) for some 0 < ǫ ≤ /
2. We define ǫ = min( ǫ , ǫ ).In the proof, we call those sets with at least u α − / ǫ elements large sets and those setswith at most O ( u α − ǫ ) elements small sets . All other sets are called medium sets .SetIntersection (for general universe) can be solved in the following way:The preprocessing phase is similar to the one that is done in the proof of Theorem 4with the following changes: 1. In step (5) we check for each pair of medium sets S i and S j such that S i ∩ S j = ∅ that the size of h k ( S i ) ∩ h k ( S j ) is no more than u α − − / ǫ for at leastone h k : U → [8 u ] that we pick in step (4). This is done instead of just checking for theemptiness of h k ( S i ) ∩ h k ( S j ). 2. In step (6.2) we use algorithm A ′ to create a data structure D k that solves the SetIntersection problem instead of the SetDisjointness problem.The query phase is also very similar to the one from Theorem 4 with the following change:In step (3), for each k , we get one by one the elements in the intersection of h k ( S i ) and h k ( S j ) by querying the data structure D k . For each element e in that intersection we verifythat it is contained in both S i and S j using the tables T i and T j . If this is the case, thenwe return that the sets are not disjoint. Otherwise, we add one to a counter of the numberof elements in ( h k ( S i ) ∩ h k ( S j )) \ ( S i ∩ S j ). If this counter exceeds u α − − / ǫ we stop thequery immediately and continue to the next value of k .The correctness of this reduction follows from the same arguments as in the proof ofTheorem 4. The difference is in analysing the hash functions and their properties. For anytwo unequal elements x ∈ S i and x ∈ S j , where both S i and S j are medium sets, and forany hash function h k : N → [8 u ] we have that Pr[ h k ( x ) = h k ( x )] ≤ / (8 u ). We call twounequal elements x ∈ S i and x ∈ S j such that h k ( x ) = h k ( x ) a false-positive of h k . 
Thenumber of elements in the medium sets is no more than u α − / ǫ . Consequently, the expectednumber of false-positives in h k ( S i ) ∩ h k ( S j ) is no more than ( u α − / ǫ ) / u = u α − − / ǫ / h k is more than u α − − / ǫ is no more than 1/8. Therefore, the probability that a pair of mediumsets S i and S j has more than u α − − / ǫ false-positives when applying h k for all k ∈ [log n ]is no more than (1 / log n = 1 /n . Thus, the probability that the number of false-positivesfor any pair of medium sets is more than u α − − / ǫ when applying h k for all k ∈ [log n ] isno more than n /n = 1 /n by the union-bound. Using the probability method we get thatthere must be log n hash functions such that for every pair of medium sets S i and S j thenumber of false-positives is no more than u α − − / ǫ after applying the hash functions byat least one of the log n hash functions. Complexity analysis . Space complexity . The space for the tables in step (1) of the preprocessing is clearly O ( N ) - linear in the total number of elements. The total number of large sets d is at most O ( N/u α − / ǫ ). The total number of medium sets e is at most O ( N/u α − ǫ ). Therefore,the size of the matrix M is at most O ( N/u α − / ǫ · N/u α − ǫ ) = O ( N /u α − / ǫ ). Thereare log n data structures that are created in step (6). Each data structure uses at most O (( N/u α − ǫ ) − ǫ ) = O ( N − ǫ /u α − (2+ α ) ǫ + ǫ ) space. Consequently, the total space complexityis S = ˜ O ( N /u α − / ǫ + N − ǫ /u α − (2+ α ) ǫ + ǫ ). Query time complexity . Step (1) of the query algorithm can be done in O ( u α − ǫ )as this is the size of the largest small set. Step (2) is done in constant time by lookingat the right position in M . In step (3) we do log n queries using algorithm A ′ and thedata structures D k . the universe of the sets after applying any hash function h k is [8 u ], sothe query time for each query is O ( u α − ǫ + out ) ( out is the size of the output we get fromthe query). 
We do not allow the query to output more than u α − − / ε < u α − ε elements. Therefore, the total query time is T = O ( u α − ε ).

Following our analysis we have that S · T = ˜ O (( N /u α − / ε + N − ε /u α − (2+ α ) ε + ε ) · ( u α − ε ) ) = ˜ O ( N u − / ε + N − ε u αε − ε ). As $\alpha \le 1$ and $u \le N$, we have that u αε ≤ N ε . Therefore, S · T = ˜ O ( N u − / ε + N u − ε ). This contradicts the Strong SetDisjointness Conjecture and therefore our assumption is false. ◭

A better lower bound on the space complexity for solving SetIntersection can be obtained based on the Strong SetIntersection Conjecture. This is demonstrated by the following theorem:

◮ Theorem 16.
Any solution to SetIntersection with sets $S_1, S_2, \ldots, S_m \subseteq [u]$ for any value of $u \in [N^{\delta}, N]$, such that $N = \sum_{i=1}^{m} |S_i|$ and $\delta > 0$, must either use Ω(( m u α ) − o (1) ) space or have ˜Ω( u α − o (1) + out ) query time for any / ≤ α ≤ and any output size out such that out = Ω( u α − − δ ) and δ > , unless the Strong SetIntersection Conjecture is false. Proof.
The proof is very similar to the proof of Theorem 4. Let us assume, for the sake of contradiction, that the Strong SetIntersection Conjecture is true but there is an algorithm $A'$ that solves SetIntersection on $m$ sets from a universe $[u]$ and creates a data structure $D$, such that the space complexity of the data structure $D$ is O ( m u α ) − ε_1 ) for some $\epsilon_1 > 0$ and the query time of $A'$ is O ( u α − ε_2 ) for some 0 < ε_2 ≤ /
2. We define $\epsilon = \min(\epsilon_1, \epsilon_2)$.

In order to solve SetIntersection for general universe we use almost the same preprocessing and query procedures as in the proof of Theorem 4, except for the following changes: 1. In the preprocessing phase, we do not save in the entries $M[p(i), p(j)]$ or $M[p(i), d + q(j)]$ of matrix $M$ just the answer to the emptiness of the intersection of $S_i$ and $S_j$, but rather a list of all the elements within the intersection of $S_i$ and $S_j$. 2. In the query phase, in step (2) we return a list of elements and not just a single bit. 3. In the query phase, in step (3), for each $k$ we get the intersection of $h_k(S_i)$ and $h_k(S_j)$ by querying the data structure $D_k$. For each element $e$ in that intersection we return it after verifying that it is contained in both $S_i$ and $S_j$ using the tables $T_i$ and $T_j$. Moreover, we count the number of elements in $(h_k(S_i) \cap h_k(S_j)) \setminus (S_i \cap S_j)$ as we get them from the query, and if they exceed u α − − / ε we stop the query immediately and continue with the next value of $k$.

The correctness of the above solution to set intersection follows from the same arguments as in the proof of Theorem 15.

Complexity analysis.

Space complexity. The space for the tables in step (1) of the preprocessing is clearly $O(N)$. Matrix $M$ in this solution contains in each entry the complete list of elements in the intersection of some pair of sets. The total number of large sets $d$ is at most O ( N/u α − / ε ). The total number of medium sets $e$ is at most O ( N/u α − ε ). The total number of elements in all sets is $N$. Therefore, the size of the matrix $M$ is at most O ( N/u α − / ε · N/u α − ε · u α − ε ) = O ( N /u α − / ε ) (see the full details in Appendix C). There are $\log n$ data structures that are created in step (6). Each data structure uses at most O ((( N/u α − ε ) u α ) − ε ) = O ( N − ε /u α − (2+ α ) ε +2 ε ) space.
Consequently, the total space complexity is S = ˜ O ( N /u α − / ε + N − ε /u α − (2+ α ) ε +2 ε ).

Query time complexity. Step (1) of the query algorithm can be done in O ( u α − ε ) time, as this is the size of the largest small set. Step (2) is done in constant time plus the output size by looking at the right position in $M$. In step (3) we do $\log n$ queries using algorithm $A'$ and the data structures $D_k$. The universe of the sets after applying any hash function $h_k$ is [ u ], so the query time for each query is O ( u α − ε + out ) (out is the size of the output we get from the query). We do not allow the query to output more than u α − − / ε < u α − ε false-positive elements. Therefore, the total query time is $O(T + out)$, where T = O ( u α − ε ).

Following our analysis we have that S · T = ˜ O (( N /u α − / ε + N − ε /u α − (2+ α ) ε +2 ε ) · ( u α − ε )) = ˜ O ( N u − / ε + N − ε u (1+ α ) ε − ε ). As $\alpha \le 1$ and $u \le N$, we have that u (1+ α ) ε ≤ N ε . Therefore, S · T = ˜ O ( N u − / ε + N u − ε ). This contradicts the Strong SetIntersection Conjecture and therefore our assumption is false. ◭

The construction in the proof of Lemma 6 can be modified in order to obtain the following reduction from 3SUM-Indexing to SetIntersection:

◮ Lemma 17.
For any < γ < δ ≤ , an instance of 3SUM-Indexing that contains 2 arrays with $n$ integers can be reduced to an instance SI of SetIntersection. The instance SI has N = n √ u elements from universe $[u]$ and m = n γ − δ/ sets, each of size $O(\sqrt{u})$, where u = n δ and < γ < δ ≤ . The time and space complexity of the reduction is ˜ O ( n γ − δ ). Each query to the 3SUM-Indexing instance can be answered by at most O ( n γ − δ ) queries to SI plus some additional $O(\log n)$ time. Proof.
We follow the construction from the proof of Lemma 6. In each quad tree we construct for some two buckets $A_i$ and $B_j$, we save the convolution results of the corresponding sub-vectors until the bottom level, in which the size of each sub-vector is $X$. In this level, for each pair of sub-vectors we create $O(\sqrt{X})$ sets (representing different shifts) in the same way we construct the sets for the SetDisjointness instances in the proof of Lemma 6. These sets form a SetIntersection instance that contains $O(R \cdot n/X \cdot \sqrt{X}) = O(nR/\sqrt{X})$ sets. In the query phase, whenever we search a quad tree and get to a leaf node we can immediately report all pairs of elements that are witnesses for $C_{i,j}[h(z)]$. This is easily done by a single SetIntersection query. The number of sub-vectors in the bottom level is $O(n/X)$ for both $v_{A_i}$ and $v_{B_j}$. For every sub-vector of $v_{A_i}$ there are at most $O(1)$ sub-vectors of $v_{B_j}$ whose convolution with it may contain a witness pair for $C_{i,j}[h(z)]$. Consequently, we do at most $O(n/X)$ intersection queries within each quad tree.

Therefore, the total space for constructing the quad trees' levels with explicit convolution results is ˜ O ( n /X · R ) (see the full analysis in the proof of Lemma 6). This is also the preprocessing time for constructing these quad trees, as the convolution of two $n$-length vectors can be calculated in $\tilde{O}(n)$ time. It is clear that the space and preprocessing time are truly subquadratic in $n$ for any $\delta > \gamma > 0$. Moreover, the query time overhead is no more than $O(\log n)$ for every query (a search through a path from the root to a leaf in some quad tree). ◭

◮ Theorem 18.
Any solution to SetIntersection with sets $S_1, S_2, \ldots, S_m \subseteq [u]$ for any value of $u \in [N^{\delta}, N]$, such that $N = \sum_{i=1}^{m} |S_i|$ and $\delta > 0$, must either have Ω( m − o (1) ) preprocessing time or have ˜Ω( u − o (1) + out ) query time, unless the 3SUM Conjecture is false. Proof.
Given an instance of the 3SUM problem that contains 3 arrays
$A$, $B$ and $C$ with $n$ numbers in each of them, we can solve this instance simply by creating a 3SUM-Indexing instance with arrays $A$ and $B$ and $n$ queries, one for each number in $C$. Thus, using the previous lemma the given 3SUM instance can be reduced to an instance of SetIntersection with m = n γ − δ/ sets from universe $[u]$ using O ( n γ − δ ) time for preprocessing, where the total number of queries to these instances is O ( n γ − δ ).

We assume, for the sake of contradiction, that there is an algorithm that solves SetIntersection on $m$ sets from a universe $[u]$ with O ( m − ε_1 ) preprocessing time for some $\epsilon_1 > 0$ and O ( u − ε_2 + out ) query time for some $0 < \epsilon_2 \le 1$. If we choose the value of δ such that δ > max 2 , ε , then we have a solution to 3SUM with truly subquadratic running time. This contradicts the 3SUM Conjecture. ◭

B Hardness of Reporting Problems

B.1 Range Mode Reporting
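As a baseline for the reporting variant treated in this subsection, the following Python sketch reports every mode of a query range by brute force. It is not the construction used in the lower-bound proofs; the function name `range_modes` and the 0-indexed inclusive range convention are assumptions made for illustration only.

```python
from collections import Counter


def range_modes(seq, i, j):
    """Return all modes of seq[i..j] (inclusive, 0-indexed), in sorted order.

    Brute-force baseline: no preprocessing, query time linear in the
    range length. Assumes 0 <= i <= j < len(seq).
    """
    counts = Counter(seq[i:j + 1])
    top = max(counts.values())
    # Every element whose frequency ties the maximum is a mode and is reported.
    return sorted(x for x, c in counts.items() if c == top)
```

The sketch uses no extra space beyond the input but spends time proportional to the range length on every query, which is the opposite end of the space versus query time interplay that the theorems of this subsection bound.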
In the reporting variant of the Range Mode problem we are required to report all elements in the query range that are the mode of this range. We have stronger lower bounds for this variant using the same construction as in the proof of Theorem 8 with the conditional lower bounds for SetIntersection with bounded universe. The results refer to both the interplay between space and query time and the interplay between preprocessing and query time.

◮ Theorem 19.
Any data structure that answers Range Mode Reporting in $O(T + out)$ time on a string of length $n$, where out is the output size, must use S = ˜Ω( n /T ) space, unless the Strong SetIntersection Conjecture is false.

◮ Theorem 20.
Any data structure that answers Range Mode Reporting in $O(T + out)$ time on a string of length $n$, where out is the output size, must have P = ˜Ω( n /T ) preprocessing time, unless the 3SUM Conjecture is false.

B.2 3SUM-Indexing Reporting
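To make the reporting variant of 3SUM-Indexing discussed in this subsection concrete, here is a minimal Python sketch of a linear-space baseline: it indexes the values of B in a hash table and, for a query number z, scans A and reports every index pair whose values sum to z. The names `build_index` and `report_pairs` are hypothetical, and this baseline is not one of the paper's constructions.

```python
from collections import defaultdict


def build_index(B):
    """Preprocessing: map each value of B to the list of positions holding it."""
    positions = defaultdict(list)
    for j, b in enumerate(B):
        positions[b].append(j)
    return positions


def report_pairs(A, positions, z):
    """Query: report all index pairs (i, j) with A[i] + B[j] == z.

    Runs in O(len(A) + out) expected time, where out is the number of pairs.
    """
    return [(i, j)
            for i, a in enumerate(A)
            for j in positions.get(z - a, ())]
```

For example, with `A = [1, 4, 6]` and `B = [5, 2, 5]`, calling `report_pairs(A, build_index(B), 6)` reports the pairs `(0, 0)`, `(0, 2)` and `(1, 1)`.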
In the reporting variant of 3SUM-Indexing we are required to report all pairs of numbers $a \in A$ and $b \in B$ such that their sum equals the query number. Using our hardness results for SetIntersection with bounded universe we prove the following conditional lower bounds on 3SUM-Indexing reporting. These results are obtained by applying the same techniques as in the proof of Theorem 13.

◮ Theorem 21.
For any ε > and < δ ≤ , any solution to 3SUM-Indexing reporting with arrays $A = a_1, a_2, \ldots, a_n$ and $B = b_1, b_2, \ldots, b_n$ such that for every $i \in [n]$, a_i , b_i ∈ [ n ε − δ ], must either use Ω( n − δ − o (1) ) space or have ˜Ω( n δ − o (1) + out ) query time, where out is the output size, unless the Strong SetIntersection Conjecture is false.

◮ Theorem 22.
For any ε > and < δ ≤ , any solution to 3SUM-Indexing reporting with arrays $A = a_1, a_2, \ldots, a_n$ and $B = b_1, b_2, \ldots, b_n$ such that for every $i \in [n]$, a_i , b_i ∈ [ n ε − δ ], must either have Ω( n − δ − o (1) ) preprocessing time or have ˜Ω( n δ − o (1) + out ) query time, where out is the output size, unless the 3SUM Conjecture is false.

C Algorithms for Solving SetIntersection
The Strong SetIntersection Conjecture states that any solution to SetIntersection whose query time is $O(T + out)$, where out is the output size, must use $S = \tilde{\Omega}(N^2/T)$ space. In the following subsections we present three simple algorithms that demonstrate how to achieve the $S \cdot T = O(N^2)$ tradeoff. This tradeoff is superior to the tradeoff of Cohen and Porat [15] and Cohen [14] where the output size is large (see the discussion in Section 3).

C.1 Algorithm 1
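The threshold scheme of this subsection (precomputed intersections for pairs of large sets, hash-table scans otherwise) can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the authors' code: the class name `ThresholdSetIntersection`, the use of a Python dictionary in place of the $p \times p$ matrix, and 0-based set indices are all choices made here for brevity.

```python
class ThresholdSetIntersection:
    """Sketch of Algorithm 1: sets larger than r get precomputed pairwise
    intersections; any query involving a smaller set scans that set."""

    def __init__(self, sets, r):
        self.r = r
        # One hash table per input set.
        self.tables = [set(s) for s in sets]
        # Indices of the sets whose size is larger than r.
        large = [i for i, t in enumerate(self.tables) if len(t) > r]
        # Dictionary standing in for the matrix M, keyed by (a, b) with a <= b.
        self.matrix = {}
        for a in large:
            for b in large:
                if a <= b:
                    self.matrix[(a, b)] = list(self.tables[a] & self.tables[b])

    def query(self, i, j):
        """Return the elements of S_i ∩ S_j as a list (order unspecified)."""
        key = (min(i, j), max(i, j))
        if key in self.matrix:
            # Both sets are large: the answer was stored, so O(out) time.
            return self.matrix[key]
        # At least one set has at most r elements: scan the smaller one
        # and probe the hash table of the other, O(r) time.
        small, other = (i, j) if len(self.tables[i]) <= len(self.tables[j]) else (j, i)
        return [e for e in self.tables[small] if e in self.tables[other]]
```

A short usage example: with `sets = [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [5]]` and `r = 2`, the pair `(0, 1)` is answered from the stored intersection, while the pair `(0, 2)` is answered by scanning the one-element set.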
We are given an instance of SetIntersection with sets $S_1, S_2, \ldots, S_m$. Let $S_{k_1}, S_{k_2}, \ldots, S_{k_p}$ be all the sets in the input instance whose size is larger than $r$, for some integer $r \ge 1$. We create a $p \times p$ matrix $M$, such that the $i$th row and column represent the set $S_{k_i}$. At location $(s, t)$ of matrix $M$, for $1 \le s, t \le p$, we save a list of all elements in the intersection of $S_{k_s}$ and $S_{k_t}$. Moreover, for every set $S_i$, for $1 \le i \le m$, we save a hash table that contains all the elements in $S_i$. Given a query pair $(i, j)$, if one of the sets $S_i$ or $S_j$ has no more than $r$ elements then we go over the elements of this set and check in the hash table of the other set whether each of them is also contained in the other set. If both sets have at least $r$ elements then the intersection can be found at the location of matrix $M$ that corresponds to $S_i$ and $S_j$.

Analysis. If at least one of the sets has no more than $r$ elements then the query time is $O(r)$ using the hash table of the other set. If both sets have at least $r$ elements then the answer is kept in matrix $M$, so the query time is $O(out)$. That is, the total query time is $O(r + out)$. The total number of elements in all sets is $O(N)$. Therefore, the size of the hash tables is $O(N)$. The number of sets with at least $r$ elements is no more than $N/r$. The size of the intersection between two sets, one of which has no more than $2r$ elements, is bounded by $2r$. Therefore, the total space in $M$ for the rows and columns representing sets whose size is within the range $[r, 2r]$ is at most $(N/r)^2 \cdot 2r = 2N^2/r$. Using the same argument, the total space in $M$ for the rows and columns representing sets whose size is within the range $[2^i r, 2^{i+1} r]$, for $0 \le i \le \log\frac{N}{r} - 1$, is at most $\left(\frac{N}{2^i r}\right)^2 \cdot 2^{i+1} r = \frac{2N^2}{2^i r}$. Consequently, the total space used by the algorithm is $\sum_{i=0}^{\log\frac{N}{r} - 1} \frac{2N^2}{2^i r} = \frac{2N^2}{r} \sum_{i=0}^{\log\frac{N}{r} - 1} \frac{1}{2^i} = O(N^2/r)$.

C.2 Algorithm 2
The idea is similar to the solution of Cohen and Porat [15] with some modifications.

We create a binary tree. The tree has $\log r + 1$ levels for some $r \ge 1$. Let $C = \{S_{k_1}, S_{k_2}, \ldots, S_{k_p}\}$ be the collection of all the sets in the input instance whose size is larger than $r$. In the root we create a $p \times p$ boolean matrix $M$ such that the $i$th row and column represent $S_{k_i}$. For every $s, t \in [p]$, we set $M[s, t]$ to 1 if $S_{k_s} \cap S_{k_t} = \emptyset$, and to 0 otherwise. For the next level downward, we ignore all sets with less than $r$ elements. That is, we continue just with the sets in $C$. Let $e_1, e_2, \ldots, e_q$ be all the elements in the sets of $C$. For every $1 \le i \le q$, let $f_i$ be the number of sets in $C$ that contain $e_i$. Let $z$ be the largest integer such that $\sum_{i=1}^{z} f_i \le N/2$. The left child of the root handles all the sets in $C$ ignoring all their elements in $\{e_{z+1}, \ldots, e_q\}$, while the right child of the root handles all the sets in $C$ ignoring all their elements in $\{e_1, \ldots, e_{z+1}\}$. The element $e_{z+1}$ is kept in the root. It is clear that the number of elements that each child node handles is no more than $N/2$.

We continue the construction recursively downward the tree. That is, a node $v$ in the $i$th level (the root level is 0) represents a collection $C'$ of sets $S'_1, S'_2, \ldots, S'_{m'}$. For all the sets whose size is at least $r/2^i$, a matrix is created that contains the answers to disjointness queries of two sets with at least $r/2^i$ elements. Then, we create two child nodes. Within these child nodes, we continue with a collection $C''$ that contains all sets in $C'$ with at least $r/2^i$ elements. Let $e'_1, e'_2, \ldots, e'_{q'}$ be all the elements in the sets of $C''$. For every $1 \le j \le q'$, let $f'_j$ be the number of sets in $C''$ that contain $e'_j$. Let $z'$ be the largest integer such that $\sum_{j=1}^{z'} f'_j \le N/2^{i+1}$.
The left child of $v$ handles all the sets in $C''$ ignoring all their elements in $\{e'_{z'+1}, \ldots, e'_{q'}\}$, while the right child of $v$ handles all the sets in $C''$ ignoring all their elements in $\{e'_1, \ldots, e'_{z'+1}\}$. The element $e'_{z'+1}$ is kept in $v$. It is clear that the number of elements that each child node handles is no more than $N/2^{i+1}$.

In any node $u$ of the $(\log r)$-level of the binary tree, the matrix we create does not contain just answers to disjointness queries; instead, it contains the complete answers to the intersection queries for each pair of the sets that $u$ represents. The nodes in the $(\log r)$-level have no child nodes.

Finally, besides the binary tree, for every set $S_i$, for $1 \le i \le m$, we save a hash table $T_i$ that contains all the elements in $S_i$.

The query is handled as follows. Given a query pair $(i, j)$, we start handling the query from the root of the tree we have constructed. If either $S_i$ or $S_j$ has less than $r$ elements, we can answer the query immediately using the hash table $T_i$ or the hash table $T_j$. If both sets have at least $r$ elements we look at the proper location in the matrix in the root node in order to know if $S_i \cap S_j = \emptyset$ or not. If the intersection is empty we are done. Otherwise, we first check if the element $e_{z+1}$ is contained in both sets; if so, it is reported. Then, we continue the search recursively in the two child nodes. Within the search in the left child node we ignore all the elements of the sets $S_i$ and $S_j$ that are in $\{e_{z+1}, \ldots, e_q\}$, while in the search in the right child we ignore all their elements in $\{e_1, \ldots, e_{z+1}\}$. Finally, if we reach a node in the $(\log r)$-level then we report the elements of the intersection of the relevant subsets of $S_i$ and $S_j$ that remain when reaching that node. This is done by using the matrix that is saved in that node.

Analysis. In the $i$th level of the binary tree the total number of elements that are propagated to each node in this level is no more than $N/2^i$.
Therefore, the number of sets in a node in the $i$th level whose size is at least $r/2^i$ is at most $\frac{N/2^i}{r/2^i} = N/r$. Consequently, the matrix size in each node of the binary tree is bounded by $(N/r)^2$. In the $(\log r)$-level of the binary tree we keep the complete intersection between each pair of the sets in a node. The total number of elements in a node in the $(\log r)$-level is at most $N/2^{\log r} = N/r$. The number of sets in each node in the $(\log r)$-level is at most $N/r$. Therefore, the size of the matrix in each node in the $(\log r)$-level is also $(N/r)^2$, as the worst case scenario is when the same element occurs in all sets. In this case, it appears in the intersection of all pairs of sets, that is, it appears at most $(N/r)^2$ times. All in all, the size of the matrix in all nodes is no more than $(N/r)^2$. In the $i$th level there are $2^i$ nodes. Thus, the total size of the binary tree is $\sum_{i=1}^{\log r} 2^i (N/r)^2 = \tilde{O}(N^2/r)$.

The query time in each node during the traversal of the binary tree is $O(1)$ if both sets in question have at least $r/2^i$ elements. Otherwise, it is $O(r/2^i)$, as for each element of the small set we check if it is in the other set using the hash table. If one of the sets has less than $r/2^i$ elements in a node in the $i$th level, the search in that path is stopped. This node is called a stopper node. Let us denote by $x$ the number of stopper nodes. Let $v_1, v_2, \ldots, v_x$ be the stopper nodes and let us denote by $\ell_i$ the level of the stopper node $v_i$. The total query time is dominated by the query time in the stopper nodes. Searching until the stopper node is done in $O(\log r)$ time and it is guaranteed that at least one output element is found in the stopper node. Therefore, the query time is $\tilde{O}\left(\sum_{i=1}^{x} \frac{r}{2^{\ell_i}}\right) = \tilde{O}\left(r \sum_{i=1}^{x} \frac{1}{2^{\ell_i}}\right) = \tilde{O}(r)$. The last equality follows from Kraft's inequality, which states that for any binary tree $T$, $\sum_{\ell \in \mathrm{leaves}(T)} 2^{-\mathrm{depth}(\ell)} \le 1$. If we get to a node $v$ in the $(\log r)$-level then we report all the elements of the intersection that are saved at the proper location in the matrix of $v$. Therefore, the time we spend in such a node is proportional to the size of the output, which is optimal. All in all, the total query time is $\tilde{O}(r + out)$.

C.3 Algorithm 3
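The frequency-based scheme of this subsection (a list of the $r$ most frequent elements plus a dictionary of per-pair intersections over the remaining elements) can be sketched in Python as follows. The function names `preprocess` and `query`, the tuple used to bundle the three structures, and the assumption that a query involves two distinct set indices are illustrative choices, not part of the original description.

```python
from collections import Counter, defaultdict
from itertools import combinations


def preprocess(sets, r):
    """Build hash tables, the list L of the r most frequent elements,
    and the dictionary D of pairwise intersections excluding L."""
    tables = [set(s) for s in sets]
    # Frequency of an element = number of sets that contain it.
    freq = Counter(e for t in tables for e in t)
    heavy = [e for e, _ in freq.most_common(r)]
    heavy_set = set(heavy)
    # For every non-heavy element, the (increasing) indices of sets containing it.
    occurrences = defaultdict(list)
    for idx, t in enumerate(tables):
        for e in t:
            if e not in heavy_set:
                occurrences[e].append(idx)
    # D[(i, j)] with i < j lists the non-heavy elements of S_i ∩ S_j.
    dictionary = defaultdict(list)
    for e, idxs in occurrences.items():
        for i, j in combinations(idxs, 2):
            dictionary[(i, j)].append(e)
    return tables, heavy, dict(dictionary)


def query(struct, i, j):
    """Report S_i ∩ S_j for i != j in O(r + out) expected time."""
    tables, heavy, dictionary = struct
    # Probe each heavy element in both hash tables.
    out = [e for e in heavy if e in tables[i] and e in tables[j]]
    # Append the stored non-heavy part of the intersection, if any.
    out.extend(dictionary.get((min(i, j), max(i, j)), []))
    return out
```

Each non-heavy element contained in $f$ sets contributes one dictionary entry per pair of those sets, which is the quantity that the space analysis of this subsection sums over.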
In the preprocessing phase, we first calculate the frequency of each element that occurs in at least one set of the input collection. We save a list $L$ of the $r$ most frequent elements in the input sets. Moreover, we create a dictionary $D$. We create an entry $(i, j)$ in the dictionary for each pair of sets $S_i$ and $S_j$ whose intersection is not empty. This entry contains all the elements in $S_i \cap S_j$ excluding those that are in $L$. Additionally, for every set $S_i$ we save a hash table to quickly verify the existence of some element in the set. Given a query pair of sets $S_i$ and $S_j$, we first go over all the elements in $L$ and check for each one of them if it is in both $S_i$ and $S_j$ using their hash tables. Then, we output all the elements in $D$ in the entry that corresponds to $S_i$ and $S_j$, if there is such an entry.

Analysis. The query time is obviously $O(r + out)$, as the number of elements in $L$ is $r$ and all the elements that are found in the dictionary are part of the output. Let $e_1, e_2, \ldots, e_k$ be all the elements in the input sets sorted in non-increasing order of frequency. Additionally, let us denote by $f_i$ the number of sets in which an element $e_i$ occurs. If an element $e_i$ does not appear in the list $L$ then its frequency $f_i$ is at most $N/r$. The number of intersections of two sets in which element $e_i$ occurs is no more than $f_i^2$. Therefore, the total size of the dictionary $D$ is $O(f_{r+1}^2 + f_{r+2}^2 + \ldots + f_k^2)$. We know that $f_{r+1} + f_{r+2} + \ldots + f_k \le N$, and that each element in the sum is bounded by $N/r$. The sum $f_{r+1}^2 + f_{r+2}^2 + \ldots + f_k^2$ is maximized when all elements in this sum are as large as possible. Consequently, $f_{r+1}^2 + f_{r+2}^2 + \ldots + f_k^2 \le r (N/r)^2 = N^2/r$. Therefore, the total space complexity of the algorithm is $O(N^2/r)$.

C.4 Hybrid Solution
In the previous subsections we presented several algorithms that solve SetIntersection with $O(T + out)$ query time (where out is the output size) and $S = O(N^2/T)$ space. Cohen [14] demonstrated a solution that uses O ( N − t ) space and answers queries in O ( N t out − t + out ) time for 0 ≤ t ≤ /
2. As mentioned in Section 3, the last solution is better than the first one whenever out < N t − t . Otherwise, the first solution is better. It is desirable to obtain a solution that combines these two tradeoffs. The simple idea to do so is as follows. We fix a specific amount of space $S$ that we are allowed to use. Then, we maintain 3 data structures, each of which uses $S$ space: (1) A data structure for SetDisjointness that, for every pair of sets $S_i$ and $S_j$ from the input collection, does not just answer whether their intersection is empty or not, but rather returns the size of their intersection. This can be easily done using the same space-time tradeoff that is known for SetDisjointness. That is, S × T = O ( N ), where $S$ is the space complexity and $T$ is the query time. The idea is to maintain a matrix for all sets whose size is larger than $T$ that contains an explicit answer for the size of the intersection of two sets that are both larger than $T$. For the intersection of sets at least one of which is smaller than $T$, we can just go over all the elements of the small set and check how many of them occur in the larger set using a hash table for the large set. (2) A data structure for solving SetIntersection using $S$ space with $O(N^2/S + out)$ query time. This data structure can be constructed using any of the 3 algorithms that are described in the previous subsections. (3) A data structure for solving SetIntersection using $S$ space with O ( N out − t / √ S + out ) query time. This data structure can be obtained using the solution by Cohen [14] (see also the discussion in Section 3).

Using these 3 data structures, when we are given as a query two sets $S_i$ and $S_j$ we can find the size of their intersection by the first data structure. Then, if out < N t − t (the value of t is fully determined by the fact that S = O ( N − tt