A Faster Algorithm for Finding Closest Pairs in Hamming Metric
Andre Esser, Robert Kübler, and Floyd Zweydinger

Cryptography Research Center, Technology Innovation Institute, Abu Dhabi, UAE, [email protected]
Medion AG, Essen, Germany, [email protected]
Ruhr University Bochum, Germany, [email protected]
Abstract
We study the Closest Pair Problem in Hamming metric, which asks to find the pair with the smallest Hamming distance in a collection of binary vectors. We give a new randomized algorithm for the problem on uniformly random input outperforming previous approaches whenever the dimension of input points is small compared to the dataset size. For moderate to large dimensions, our algorithm matches the time complexity of the previously best-known locality sensitive hashing based algorithms. Technically, our algorithm follows similar design principles as Dubiner (IEEE Trans. Inf. Theory 2010) and May-Ozerov (Eurocrypt 2015). Besides improving the time complexity in the aforementioned areas, we significantly simplify the analysis of these previous works. We give a modular analysis, which allows us to investigate the performance of the algorithm also on non-uniform input distributions. Furthermore, we give a proof of concept implementation of our algorithm which performs well in comparison to a quadratic search baseline. This is the first step towards answering an open question raised by May and Ozerov regarding the practicability of algorithms following these design principles.
Keywords: closest pair problem, nearest neighbor, LSH
1 Introduction

Finding closest pairs in a given dataset of binary vectors is a fundamental problem in theoretical computer science with numerous applications in data science, machine learning, computer vision, cryptography, and many others. Image data, for example, is often represented via compact binary codes to allow for efficient closest pair search in applications like similarity search in images or facial recognition systems [7, 15, 20]. The usage of binary codes also allows for decoding the represented data to common codewords. Here, the most efficient algorithms known for decoding such random binary linear codes also heavily benefit from improved algorithms for the Closest Pair Problem [6, 17]. Another common application lies in the field of bioinformatics, where the analysis of genomes involves closest pair search on large datasets to identify the most correlated genetic markers [16, 19].

To be more precise, the Closest Pair Problem asks to find the pair of vectors with the minimal Hamming distance among n given binary vectors. While the general version of this problem does not make any restrictions on the distribution of input points, several settings imply a uniform distribution of dataset elements [6, 16, 17, 19]. Usually, in such settings, there is a planted pair, which attains relative distance ω ∈ [0, 1/2] and has to be found. This uniform version is also known as the light bulb problem [22]. The problem can be solved in time linear in the dataset size as long as the dimension of the vectors is constant [5, 14]. As soon as the dimension is non-constant, an effect known as the curse of dimensionality occurs, which makes the problem much harder.

The most common framework to assess the problem is based on locality-sensitive hashing (LSH), whose research was initiated in the pioneering work of Indyk and Motwani [12]. Roughly speaking, a locality-sensitive hash function is more likely to hash points that are close to each other to the same value than points that are far apart. To solve the Closest Pair Problem leveraging an LSH family, one chooses a random hash function of the family and computes the hash value of all points in the dataset. In a next step, one computes the pairwise distance only for those pairs hashing to the same value. This process is then repeated for different hash functions until the closest pair is found. The initial algorithm by Indyk-Motwani achieves a time complexity of n^{1−log(1−ω)} (here and in the following we ignore polylogarithmic factors in the dataset size). In general, a time lower bound of n^{1/(1−ω)} is known for LSH based algorithms [8, 18]. In [8] Dubiner also gives an abstract idea of an algorithm achieving this lower bound. Later, May and Ozerov [17] gave the first concrete algorithmic description following similar design principles, also achieving the mentioned lower bound. Additionally, current data-dependent hashing schemes [2], where the hash function depends also on the actual points in the dataset, improve on the initial idea by Indyk-Motwani and also match the time lower bound of [8, 18].

In the uniform setting, Valiant [21] was able to circumvent the lower bound by leveraging fast matrix multiplication, hence breaking out of the LSH framework, to give an algorithm that runs in time n^{1.62} · poly(d). Remarkably, the complexity exponent of Valiant's algorithm does not depend on the relative distance ω at all. Later this bound was improved to n^{1.582} · poly(d) by Karppa et al. [13] and simplified in an elegant algorithm by Alman [1] achieving the same complexity.
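To make the LSH framework sketched above concrete, the following is a minimal bit-sampling closest-pair search in Python (our illustration, not the algorithm of this paper; vectors are encoded as integers and the repetition count follows the n^{1−log(1−ω)} heuristic):

```python
import math
import random
from collections import defaultdict

def hamming(u, v):
    return bin(u ^ v).count("1")

def lsh_closest_pair(points, d, omega):
    """Bit-sampling LSH in the spirit of Indyk-Motwani [12]: bucket by
    ~log2(n) randomly sampled coordinates and compare only colliding
    points; repeat until the planted pair collides with constant
    probability, giving ~n^{1-log2(1-omega)} total work on uniform data."""
    n = len(points)
    k = max(1, int(math.log2(n)))              # sampled coordinates per hash
    reps = int((1 - omega) ** (-k)) + 1        # ~ (1-omega)^{-k} repetitions
    best = (d + 1, None)
    for _ in range(reps):
        coords = random.sample(range(d), k)
        buckets = defaultdict(list)
        for p in points:
            key = tuple((p >> c) & 1 for c in coords)
            buckets[key].append(p)
        for bucket in buckets.values():
            for i, u in enumerate(bucket):
                for v in bucket[i + 1:]:
                    dist = hamming(u, v)
                    if dist < best[0]:
                        best = (dist, (u, v))
    return best
```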
All mentioned algorithms have in common that they assume a dimension of d = c(n) · log(n), where c(n) is at least a big constant; the results by [2, 8, 21], for example, take c(n) = ω(1). Here, the algorithm by May-Ozerov forms an exception by being applicable for any c(n) ≥ 1/(1 − H(ω)), where H(·) denotes the binary entropy function. Nevertheless, the mentioned lower bound is only achieved for c(n) approaching infinity. Recently, Xie, Xu and Xu [23] proposed a new algorithm based on decoding the points of the dataset according to some random code, exploiting that close vectors are more likely to be decoded to the same word. Their algorithm is also applicable for any c(n) that allows to bound the number of pairs attaining relative distance ω by a constant with high probability. The authors are able to derandomize their approach and thus obtain the fastest known deterministic algorithm for small constants c(n). However, if one also considers probabilistic procedures, their method is inferior to the one by May-Ozerov.

We design a randomized algorithm which achieves the best-known running time for solving the Closest Pair Problem on uniformly random input when the dimension d is small, which means for small constants c(n). Additionally, our algorithm matches the running time of the best known LSH algorithms for larger values of c(n) and still matches the time lower bound for LSH based schemes if c(n) = ω(1). To quantify this, we give in Figure 1 the achieved runtime exponent of our algorithm for three different constants c(n), in comparison to May-Ozerov. As indicated by the figure, our approach also performs exceptionally well for large closest pair distances, where common LSH based techniques usually fail [9]. Moreover, we show that for large distances our algorithm is indeed optimal.

Technically, our algorithm follows similar design principles as [8, 17]. At its core, these algorithms group the elements of the given datasets recursively into buckets according to some criterion which fulfills properties similar to those of locality-sensitive hash functions. As the buckets in the recursion are decreasing in size, at the end of the recursion they become small enough to compute the pairwise distance of all contained elements naively.
Figure 1: Time complexity exponent ϑ as a function of the relative distance ω of the closest pair for different dimensions: (a) d = 4 log(n), (b) d = 2 log(n), (c) d = c · log(n) for a smaller constant c between 1 and 2. The running time is of the form n^ϑ · poly(d), where the dashed line represents May-Ozerov's algorithm and the solid line depicts the exponent of our new algorithm. The dotted line gives the maximal ω for which the algorithm by May-Ozerov is still applicable.

In contrast to previous works, we exchange the used bucket criteria, which allows us to significantly simplify the algorithms' analysis as well as to improve in the mentioned parameter regimes. Also, our approach is applicable for any c(n); thus we are able to remove the restriction c(n) ≥ 1/(1 − H(ω)).

Following May-Ozerov and Dubiner, we study the bichromatic version of the Closest Pair Problem, which takes as input two datasets rather than one, the goal being to find the closest pair between those given datasets. Obviously, there exists a randomized reduction between the Closest Pair Problem and its bichromatic version, but our algorithm can also easily be adapted to the single dataset case. However, May and Ozerov require the elements within each dataset to be pairwise independent of each other; as a minor contribution, we get rid of this restriction, too.

Also, we investigate the algorithm's performance on different input distributions. Therefore, we give a modular analysis, which allows for an easy exchange of the dataset distribution as well as of the choice of bucketing criterion. We also give numerical upper bounds for the algorithm's complexity exponent on some exemplary input distributions. These examples suggest that the chosen criterion is well suited as long as the distance between input elements concentrates around d/2 (as in the case of random input lists), while being non-optimal as soon as the expected distance decreases.

We also address an open research question raised by May and Ozerov regarding the practical applicability of algorithms following the design of [8, 17]. As their algorithm inherits a huge polynomial overhead in time and space, they left it as an open problem to give a more practical algorithm following a similar design. While our analysis at first suggests an equally high overhead, we are able to give an efficient implementation of our algorithm, which requires in addition to the input dataset only constant space. Also, our practical experiments show that most of the overhead of our algorithm is an artifact of the analysis and can be circumvented in practice, so that our algorithm performs well compared to a quadratic search baseline.

The rest of the paper is organized as follows: In the subsequent section, we introduce the necessary notation and state the exact definition of the Closest Pair Problem under consideration. In Section 3, we then give a detailed description of our new algorithm and establish a proof of its running time as well as its correctness. In the following Section 4, we investigate the performance of our algorithm on different input distributions. Finally, in Section 5, we give practical improvements of the algorithm and runtime results of our implementation compared to a quadratic search baseline.

2 Preliminaries
For a, b ∈ N, a ≤ b, we denote [a, b] := {a, a+1, ..., b−1, b}. In particular, let [b] := [1, b]. For a vector v ∈ F_2^d and I ⊆ [d], let v_I be the projection of v onto the coordinates indexed by I, i.e., for v = (v_1, v_2, ..., v_d) and I = {i_1, i_2, ..., i_k} we have v_I = (v_{i_1}, ..., v_{i_k}) ∈ F_2^k. We denote the uniform distribution on F_2^d as U(F_2^d). We define f(n) = Õ(g(n)) :⇔ ∃ i ∈ N : f(n) = O(g(n) · log^i(g(n))), i.e., the tilde additionally suppresses polylogarithmic factors in comparison to the standard Landau notation O. Furthermore, we consider all logarithms to have base 2. Define the binary entropy function as H(x) = −x log(x) − (1−x) log(1−x) for x ∈ (0, 1) and H(0) = H(1) := 0. Using this together with Stirling's formula n! = Θ(√(2πn) · (n/e)^n), we obtain \binom{n}{ωn} = Θ̃(2^{H(ω)n}). We additionally define H^{−1}: [0, 1] → [0, 1/2] to be the inverse of the left branch of H.
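For later reference, the entropy function and the left-branch inverse H^{−1} are easy to evaluate numerically; the following Python sketch (our addition) uses bisection, exploiting that H is increasing on [0, 1/2]:

```python
import math

def H(x):
    """Binary entropy with the convention H(0) = H(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def H_inv(y, tol=1e-12):
    """Inverse of the left branch of H, mapping [0, 1] -> [0, 1/2],
    computed by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if H(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```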
Lemma 1. Let v_1, ..., v_n ∼ U(F_2^d) be independent and M ∈ F_2^{n×n} an invertible matrix. Then for

(v'_1, ..., v'_n)^T := M · (v_1, ..., v_n)^T

it also holds that v'_1, ..., v'_n ∼ U(F_2^d) are independently and uniformly distributed.

Corollary 1.
For v, w, z ∼ U(F_2^d) independent, v+z, w+z ∼ U(F_2^d) are also uniform and independent.

Proof. We have

(v+z, w+z, z)^T = M · (v, w, z)^T,   where M = (1 0 1; 0 1 1; 0 0 1).

Since the matrix M is invertible over F_2, we can apply Lemma 1.
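The counting argument behind Corollary 1 can be checked exhaustively for a tiny dimension; the following snippet (our addition) verifies that every pair (v+z, w+z) occurs equally often, i.e., that the joint distribution is uniform:

```python
from collections import Counter
from itertools import product

d = 3
counts = Counter()
for v, w, z in product(range(2 ** d), repeat=3):
    counts[(v ^ z, w ^ z)] += 1   # XOR is addition over F_2
# each of the 2^d * 2^d pairs arises from exactly 2^d triples (v, w, z)
assert all(c == 2 ** d for c in counts.values())
```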
In this work, we consider the Bichromatic Closest Pair Problem in Hamming metric. Here, the inputs are two lists of equal size containing elements drawn uniformly at random from F_2^d, plus a planted pair whose Hamming distance is ωd for some known ω. More formally, we state the problem in the following definition. To allow for easy comparison to the result of May-Ozerov, we follow their notation using the dimension as the primary difficulty parameter. Thus, we let the list sizes be n := 2^{λd}, which means λ = 1/c(n), where d = c(n) log n.

Definition 1 (Bichromatic Closest Pair Problem). Let d ∈ N, ω ∈ [0, 1/2] and λ ∈ (0, 1]. Let L_1 = (v_i)_{i ∈ [2^{λd}]}, L_2 = (w_i)_{i ∈ [2^{λd}]} ∈ (F_2^d)^{2^{λd}} be two lists containing elements drawn uniformly at random, together with a distinguished pair (x, y) ∈ L_1 × L_2 with wt(x + y) = ωd. We further assume that for each i, j the vectors v_i and w_j are pairwise stochastically independent. The Closest Pair Problem CP_{d,λ,ω} asks to find this closest pair (x, y) given L_1, L_2 and the weight parameter ω. We call (x, y) the solution of the CP_{d,λ,ω} problem.

First, note that λ ≤ 1 is a natural restriction, since for λ > 1 the input lists would necessarily contain repeated elements.
We also consider the Closest Pair Problem on input lists whose elements are distributed according to some distribution D different from the uniform one used in Definition 1. To indicate this, we refer to the CP_{d,λ,ω} over distribution D. Note that in this case, the meaningful upper bound for λ is the entropy of D. Technically speaking, it is also not necessary to know the value of ω, as the time complexity of appropriate algorithms to solve the CP_{d,λ,ω} problem is solely increasing in ω. Thus, if ω is unknown, one would apply the algorithm for each ωd = 0, 1, 2, ... until the solution is found, which results in at most polynomial overhead.

It is well known that any LSH based algorithm solving the problem of Definition 1 with non-negligible probability needs time at least |L_1|^{1/(1−ω)} = 2^{λd/(1−ω)} [8, 18]. However, this lower bound assumes the promised pair to be uniquely distinguishable from all other pairs in L_1 × L_2. Obviously, if the relation of ω and λ lets us expect more than the promised pair of distance ωd in the input lists, an algorithm solving the Closest Pair Problem needs to find all (or at least a non-negligible fraction) of these closest pairs. Such scenarios, for example, frequently occur when the solution to the CP_{d,λ,ω} problem actually is a solution to some different problem [4, 10, 11, 17], which enables a distinction from other closest pairs. (Note that in such a scenario the searched (x, y) is probably not the pair with the smallest Hamming distance; however, we still refer to elements attaining Hamming distance ωd as closest pairs.) Hence, if the input lists contain E closest pairs, the optimal time complexity becomes Ω̃(max(2^{λd/(1−ω)}, E)).

Let (v, w) ∈ L_1 × L_2 \ {(x, y)} be arbitrary list elements. If the elements are chosen independently and uniformly at random, as stated in Definition 1, we expect E to be of size

E[|E|] = (|L_1 × L_2| − 1) · Pr[wt(v + w) = ωd] + 1 = (2^{2λd} − 1) · \binom{d}{ωd} · 2^{−d} + 1 = Θ̃(2^{(2λ + H(ω) − 1)d} + 1),

where the +1 accounts for the planted pair (x, y). Thus, the optimal time complexity to solve the CP_{d,λ,ω} problem becomes

T_opt = Ω̃( max( 2^{λd/(1−ω)}, 2^{(2λ + H(ω) − 1)d} ) ).   (1)

3 Our Algorithm

Our algorithm groups the input elements according to some criterion into several buckets, each one representing a new closest pair instance with smaller list size. We then apply this bucketing procedure recursively until the buckets contain few enough elements to eventually solve the Closest Pair Problem represented by them via a naive quadratic search algorithm, the exhaustive search.

As a bucketing criterion, we choose the weight of the vectors after adding a randomly drawn vector z from F_2^d. Thus, each bucket is represented by a vector z, and only those elements v are added to the bucket which satisfy wt(v + z) = δd, where δ is determined later. More precisely, in each recursive iteration our algorithm works only on equally large blocks of the input vectors and not on the full d coordinates, i.e., the weight condition is only checked on the current block. This is a technical necessity to obtain independence of vectors in the same bucket on fresh blocks. Let us formally define the notion of blocks.

Definition 2 (Block). Let d, r ∈ N with r | d and i ∈ [r]. Then we denote the i-th block of [d] as

B^d_{i,r} := [ (i−1) · d/r + 1, i · d/r ].

Note that [d] = ⋃_{i ∈ [r]} B^d_{i,r} and |B^d_{i,r}| = d/r for each i ∈ [r]. For a leaner notation, and since the role of d does not change in the course of this paper, we omit the index d in the following; thus we write B_{i,r} := B^d_{i,r}.
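As a numerical companion to Equation (1), the optimal-time exponent can be evaluated as follows (our addition, reusing H from the preliminaries sketch):

```python
def t_opt_exponent(lam, omega):
    """Exponent e with T_opt = Omega-tilde(2^{e*d}), following Eq. (1)."""
    lsh_bound = lam / (1 - omega)           # 2^{lambda*d/(1-omega)}
    pair_count = 2 * lam + H(omega) - 1     # expected number of closest pairs
    return max(lsh_bound, pair_count)
```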
Figure 2: We start off on the left side of the illustration with the two input lists L_1, L_2 of length 2^{λd} containing the closest pair (x, y); each block has width k = d/r. Going right, in each iteration of the algorithm, N different z_i^{(j)} are randomly chosen and all of the list elements are tested whether they fulfill the bucketing criterion. The crosshatched pattern indicates the parts where the bucket criterion is fulfilled, i.e., the list vectors differ from z_i^{(j)} in δk positions, corresponding to relative weight δ.

In each iteration, we choose the number N of buckets in such a way that with overwhelming probability the closest pair lands in at least one of the buckets. Hence, our algorithm creates a tree with branching factor N, with the distinguished pair being contained in one of the leaves. The deeper we get into the tree, the smaller and, hence, the easier the closest pair instances get. An algorithmic description of the whole procedure is given in pseudocode in Algorithm 1. The following theorem gives the time complexity of our algorithm to solve the CP_{d,λ,ω}.

Theorem 1.
Let ω ∈ [0, 1/2] and λ ∈ [0, 1]. Then Algorithm 1 solves the CP_{d,λ,ω} problem with overwhelming success probability in expected time 2^{ϑd(1+o(1))}, where

ϑ = (1 − ω) · ( 1 − H((δ⋆ − ω/2)/(1 − ω)) )   for ω ≤ ω⋆,
ϑ = 2λ + H(ω) − 1                             for ω > ω⋆,

with δ⋆ := H^{−1}(1 − λ) and ω⋆ := 2δ⋆(1 − δ⋆).

The case distinction can intuitively be explained as follows: As long as the number of pairs with distance ωd in the input lists is small enough, the algorithm is optimal for a choice of δ such that the lists at the leaves of the tree become polynomial in size. However, if too many closest pairs exist in the input lists, enforcing polynomial size of the leaf nodes lets the probability of the solution being contained in one of them drop immensely. Thus, to still ensure that the algorithm succeeds in finding the solution, an enormous branching factor would be required. Hence, instead the choice of δ is adapted, which leads to larger leaf nodes and in total to a time complexity that is linear in the number of closest pairs, matching the lower bound from Equation (1).

Algorithm 1 Closest-Pair(L_1, L_2, ω)
Input: lists L_1, L_2 ∈ (F_2^d)^{2^{λd}}, weight parameter ω ∈ [0, 1/2]
Output: list L containing the solution (x, y) ∈ L_1 × L_2 to the CP_{d,λ,ω}
 1: Set r, P, N ∈ N, δ ∈ [0, 1/2] properly and define k := d/r   ▷ see Equation (8)
 2: for P permutations π do                  ▷ permutation on the bit positions
 3:     Stack S := [(π(L_1), π(L_2), 0)]
 4:     L ← ∅
 5:     while S is not empty do
 6:         (A, B, i) ← S.pop()
 7:         if i < r then
 8:             for N randomly chosen z ∈ F_2^k do
 9:                 A′ ← (v ∈ A | wt((v + z)_{B_{i+1,r}}) = δk)
10:                 B′ ← (w ∈ B | wt((w + z)_{B_{i+1,r}}) = δk)
11:                 S.push((A′, B′, i + 1))
12:         else
13:             for v ∈ A, w ∈ B do          ▷ naive search
14:                 if wt(v + w) = ωd then
15:                     L ← L ∪ {(v, w)}
16: return L

We establish the proof of Theorem 1 in a series of lemmata and theorems. Note that any bucketing algorithm heavily depends on two probabilities specific to the chosen bucketing criterion. First, the probability that any element falls into a bucket, which we call p in the remainder of this work. This probability is mainly responsible for the lists' sizes throughout the algorithm. The second relevant probability, which we call q, describes the event of both x and y falling into the same bucket, where (x, y) is the solution to the CP_{d,λ,ω} problem. This is the probability of (x, y) surviving one iteration, meaning that q determines the success probability of the algorithm. In summary, for our choice of bucketing criterion we get

p := Pr_z[ wt((v + z)_{B_{i,r}}) = δk ] for any v   and   q := Pr_z[ wt((x + z)_{B_{i,r}}) = wt((y + z)_{B_{i,r}}) = δk ],   (2)

where k = d/r is the block width. If we assume that the ωd differing coordinates of x and y distribute evenly into the r blocks, i.e., wt((x + y)_{B_{i,r}}) = ωk for each i, these probabilities are independent of i for δk fixed. This property is ensured for at least one of the P permutations in Algorithm 1 with overwhelming probability, as we will see in the proof of Theorem 1. We determine the exact form of q and p later. First, we are going to prove the following statement about the expected running time of Algorithm 1 in dependence on both probabilities.
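The following Python sketch (our addition) mirrors one permutation of Algorithm 1 in recursive, depth-first form; vectors are integer-encoded, r is assumed to divide d, and the parameters r, N, δ are left to the caller:

```python
import random

def block_weight(u, i, k):
    """Hamming weight of the integer-encoded vector u on block i of width k."""
    return bin((u >> (i * k)) & ((1 << k) - 1)).count("1")

def closest_pair(L1, L2, d, omega, r, N, delta):
    """One-permutation sketch of Algorithm 1: recursively keep only the
    elements whose current block has weight delta*k after adding a random
    z, and match the surviving lists naively at depth r."""
    k = d // r                       # block width, assumes r | d
    target = round(delta * k)
    wd = round(omega * d)
    solutions = set()

    def recurse(A, B, i):
        if i == r:                   # leaf: naive quadratic search
            solutions.update((v, w) for v in A for w in B
                             if bin(v ^ w).count("1") == wd)
            return
        for _ in range(N):           # N buckets on the next level
            z = random.getrandbits(k) << (i * k)   # random z on block i
            recurse([v for v in A if block_weight(v ^ z, i, k) == target],
                    [w for w in B if block_weight(w ^ z, i, k) == target],
                    i + 1)

    recurse(list(L1), list(L2), 0)
    return solutions
```

The full Algorithm 1 additionally repeats this procedure for P random bit permutations to guarantee that the weight of x + y spreads evenly over the blocks.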
Theorem 2. Let q and p be as defined in Equation (2), ω ∈ [0, 1/2], λ ∈ [0, 1] and r = λd/log²(d). Then Algorithm 1 solves the CP_{d,λ,ω} problem in expected time

max( q^{−r}, 2^{λd} · p^{r−1}/q^r, (2^{λd} · p^r)²/q^r )^{1+o(1)}

with a success probability overwhelming in d.

Proof. First, we are going to prove the statement about the time complexity. The algorithm maintains a stack containing list pairs together with an associated counter. In every iteration of the loop in line 5, one element is removed from the stack and, if the counter i associated with this element is smaller than r, N additional elements (A′, B′, i+1) are pushed to the stack in line 11. Let us consider the elements on the stack as nodes in a tree of depth r, where all elements with associated counter i are siblings on level i of the tree. Also, depict the elements pushed to the stack in line 11 as child nodes of the currently processed node (A, B, i). Then the total number of elements with associated counter i pushed to the stack is bounded by the number of nodes on level i in a tree with branching factor N, which is N^i.

Next, let us determine the lists' sizes on level i of that tree. Therefore, let the expected size of lists on level i be L_i. As these lists are constructed from the lists of the previous level by testing the weight condition in lines 9 and 10, it holds that

L_i = L_{i−1} · Pr[ wt((v + z)_{B_{i,r}}) = δk ] := L_{i−1} · p,

where i > 0 and L_0 = |L_1|. By substitution, we get L_i = |L_1| · p^i for i = 0, ..., r.

Now we are able to compute the time needed to create the nodes on level i of the tree. Observe that for the creation of a level-i node we need to linearly scan through the larger lists of a node on level i − 1. Since there are at most N^i nodes on level i, we need a total time of

T_i = Õ( L_{i−1} · N^i ) = Õ( |L_1| · p^{i−1} · N^i )

for each 0 < i ≤ r. Eventually, the list pairs on level r are matched by a naive search with quadratic runtime, resulting in

T_{r+1} = Õ( N^r · E[|A_r| · |B_r|] ),

where A_r, B_r describe the lists of a level-r node. The expected value of the product now depends on the chosen input distribution. We next argue that for the given input distribution we have

E[|A_r| · |B_r|] = O( E[|A_r|] · E[|B_r|] ) = O( L_r² ).

To see this, first note that for v, w, z independent and uniform, v + z and w + z are also independent and uniform according to Corollary 1. This in turn implies

Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ] = Pr[ wt((v + z)_{B_{i,r}}) = δk ] · Pr[ wt((w + z)_{B_{i,r}}) = δk ] = p²,

since deterministic functions of independent random variables are still independent. This also works for either v = x or w = y, but not for (v, w) = (x, y). In this case, however, we have

Pr[ wt((x + z)_{B_{i,r}}) = δk, wt((y + z)_{B_{i,r}}) = δk ] = q

by definition.
With this insight, we can express E[|A_i| · |B_i|] in terms of E[|A_{i−1}| · |B_{i−1}|] for each i via

E[|A_i| · |B_i| | A_{i−1}, B_{i−1}] = Σ_{(v,w) ∈ A_{i−1} × B_{i−1}, (v,w) ≠ (x,y)} Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ] + Pr[ wt((x + z)_{B_{i,r}}) = δk, wt((y + z)_{B_{i,r}}) = δk ]
= (|A_{i−1}| · |B_{i−1}| − 1) · p² + q ≤ |A_{i−1}| · |B_{i−1}| · p² + 1.

Applying the law of total expectation, we obtain

E[|A_i| · |B_i|] = E[ E[|A_i| · |B_i| | A_{i−1}, B_{i−1}] ] ≤ E[|A_{i−1}| · |B_{i−1}|] · p² + 1.   (3)

Successive application of Equation (3) yields

E[|A_r| · |B_r|] ≤ E[|L_1| · |L_2|] · p^{2r} + r = 2^{2λd} · p^{2r} + r = O(L_r² + r),   (4)

where the additive r is dominated by the tree-traversal cost T″ below. Finally, the algorithm is repeated for P different permutations of the bit positions of the elements in L_1, L_2. In summary, the expected time complexity to build all lists becomes the sum of the T_i multiplied by P; thus, by choosing N := d/q and P = (d + 1)^{r+1} we get

T′ = P · Σ_{i=1}^{r+1} T_i ≤ (d + 1)^{r+1} · ( Σ_{i=1}^{r} N^i · |L_1| · p^{i−1} + (|L_1| · p^r)² · N^r )
   = (d + 1)^{r+1} · ( Σ_{i=1}^{r} |L_1| · (d^i/q) · (p/q)^{i−1} + (|L_1| · p^r)² · d^r/q^r )
   ≤ (d + 1)^{r+1} · d^r · ( r · |L_1| · p^{r−1}/q^r + (|L_1| · p^r)²/q^r )
   = max( 2^{λd} · p^{r−1}/q^r, (2^{λd} · p^r)²/q^r )^{1+o(1)},

where the inequality follows from p/q ≥ 1, since

q = Pr[ wt((x + z)_{B_{i,r}}) = wt((y + z)_{B_{i,r}}) = δk ] ≤ Pr[ wt((x + z)_{B_{i,r}}) = δk ] = p,

and the final equality stems from the fact that |L_1| = 2^{λd} and r = o(λd/log d) as given in the theorem.

Note that T′ disregards the fact that, no matter how small the lists in the tree become, the algorithm needs to traverse all

T″ = Õ(N^r) = Õ( (d/q)^r )

nodes of the tree. Hence, the expected time complexity of the whole algorithm is T = max(T′, T″), which proves the claim.

Let us now consider the success probability of the algorithm. Therefore, we assume that the chosen permutation distributes the weight on x + y such that in every block of length d/r the weight is equal to ωd/r, which we describe as a good permutation. The probability of a random permutation π distributing the weight in such a way is

Pr[good π] = Pr[ wt( π(x + y)_{B_{i,r}} ) = ωd/r for i = 1, ..., r ] = \binom{d/r}{ωd/r}^r / \binom{d}{ωd} ≥ (d/r + 1)^{−r}.

Thus, the probability that at least one out of the (d + 1)^{r+1} chosen permutations is good is

p_1 := Pr[at least one good π] = 1 − (1 − Pr[good π])^{(d+1)^{r+1}} ≥ 1 − (1 − (d/r + 1)^{−r})^{(d+1)^{r+1}} ≥ 1 − e^{−d}.

The algorithm succeeds whenever there exists a leaf node in the tree containing the distinguished pair (x, y). As every node in the tree is constructed based on its parent, it follows that all nodes on the path from the root to that leaf need to contain (x, y). By definition, the probability of x and y satisfying the bucket criterion at the same time (thus for the same z) is q, and since we condition on a good permutation, q is equal for every considered block. Let us define indicator variables X_j for the first level, where X_j = 1 iff the j-th node contains (x, y). Observe that the X_j for independent choices of z are independent. Thus, the number of trials until (x, y) is contained in some node on level one is distributed geometrically with parameter q.
Hence, the probability of the solution being contained in at least one node on the first level is

p_2 := Pr[ ∃(A, B, 1) ∈ S : (x, y) ∈ A × B ] = 1 − (1 − q)^N = 1 − (1 − q)^{d/q} ≥ 1 − e^{−d}.

Now, imagine the pair being contained in some level-i node. Considering that node, we have with the same probability p_2 again that at least one child contains the solution, and the same argument holds until we reach the leaves. Also, by the independent choices of z the events remain independent, which implies that the probability of (x, y) being contained in a level-r list is at least p_2^r. In summary, the success probability is

Pr[success] = p_1 · p_2^r ≥ (1 − e^{−d})^{r+1} ≥ 1 − (r + 1)/e^d ≥ 1 − d/e^d.

The proof of Theorem 2 already shows how different distributions may affect the complexity of the algorithm by changing the expected value E[|A_r| · |B_r|]. This influence of different input distributions on the algorithm's complexity is further investigated in Section 4. In the next two lemmata, we prove the exact forms of q and p to conduct the runtime analysis.
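For concrete parameters, the bound of Theorem 2 is easy to evaluate; the following helper (our addition) returns the base-2 logarithm of the three terms of the max, ignoring the 2^{o(d)} factor:

```python
def theorem2_log2_time(d, lam, log2_p, log2_q, r):
    """log2 of max(q^-r, 2^{lam d} p^{r-1}/q^r, (2^{lam d} p^r)^2/q^r)."""
    a = -r * log2_q                               # tree traversal T''
    b = lam * d + (r - 1) * log2_p - r * log2_q   # list construction
    c = 2 * (lam * d + r * log2_p) - r * log2_q   # leaf-level naive search
    return max(a, b, c)
```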
Lemma 2. Let k ∈ N, δ ∈ [0, 1/2]. If x ∈ F_2^k and z ∼ U(F_2^k), then

Pr_z[ wt(x + z) = δk ] = \binom{k}{δk} · (1/2)^k.

Proof.
Since z ∼ U(F_2^k), the probability is

|{ z ∈ F_2^k | wt(x + z) = δk }| / |F_2^k|.

To compute the numerator, note that wt(x + z) = δk means that x and z differ in δk out of k coordinates, for which there are \binom{k}{δk} possibilities. Using |F_2^k| = 2^k, the lemma follows.

Before we continue, let us make a small definition.

Definition 3.
Let k ∈ N and x, y ∈ F_2^k. Then we define D(x, y) ⊆ [k] to be the set of coordinates where x and y differ, i.e.,

D(x, y) := { i ∈ [k] | x_i ≠ y_i }.

Furthermore, let S(x, y) := [k] \ D(x, y) be the set of coordinates where they are the same.
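For integer-encoded vectors, the sets of Definition 3 can be computed directly (our addition):

```python
def differing_coords(x, y, k):
    """D(x, y): coordinates on which x and y differ (Definition 3)."""
    return {i for i in range(k) if (x >> i) & 1 != (y >> i) & 1}

def agreeing_coords(x, y, k):
    """S(x, y) = [k] \\ D(x, y)."""
    return set(range(k)) - differing_coords(x, y, k)
```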
Now we derive the exact form of the probability q of a pair with difference ωk falling into the same bucket.

Lemma 3. Let k ∈ N, δ ∈ [0, 1/2], and let x, y ∈ F_2^k with wt(x + y) = ωk and z ∼ U(F_2^k). Then

Pr_z[ wt(x + z) = wt(y + z) = δk ] = \binom{ωk}{ωk/2} · \binom{(1−ω)k}{(δ − ω/2)k} · (1/2)^k.

Proof. Let A := { z ∈ F_2^k | wt(x + z) = wt(y + z) = δk }. In analogy to Lemma 2, the probability we search for is |A| / |F_2^k| = |A| · (1/2)^k.

In the following, let ω_x := wt(x + z) and analogously ω_y := wt(y + z). Now observe that every coordinate z_i of z with i ∈ S(x, y), so belonging to the set of equal coordinates between x and y, either contributes to both ω_x and ω_y or does not affect either one of them. Let us define the number of the z_i with i ∈ S(x, y) that contribute to the weight as a := |S(x, y) ∩ D(x, z)|.

Now consider the z_i with i ∈ D(x, y). Clearly, any such z_i contributes either to ω_x or to ω_y. Thus, let us define the number of those z_i with i ∈ D(x, y) that contribute to ω_x as b_x := |D(x, y) ∩ D(x, z)| and analogously those which contribute to ω_y as b_y := |D(x, y) ∩ D(y, z)|. Obviously, we have

b_x + b_y = |D(x, y)| = ωk.   (5)

On the other hand, we are only interested in those z for which ω_x = ω_y = δk, which yields the two equations

ω_x = a + b_x = δk,   (6)
ω_y = a + b_y = δk.   (7)

All three equations together yield the unique solution

b_x = b_y = ωk/2,   a = (δ − ω/2)k.

This shows the following: If z ∈ A, it is necessary that z differs from x (analogously y) in exactly

- ωk/2 out of the ωk coordinates of D(x, y), and
- (δ − ω/2)k out of the (1 − ω)k coordinates of S(x, y).

Thus, because we can freely combine both conditions, in total there are

|A| = \binom{ωk}{ωk/2} · \binom{(1−ω)k}{(δ − ω/2)k}

different values for z, finishing the proof.
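Both bucketing probabilities can now be computed exactly from Lemmas 2 and 3 (our addition; math.comb conveniently returns 0 whenever the lower index exceeds the upper one):

```python
from math import comb

def p_bucket(k, delta):
    """Lemma 2: probability that a fixed vector has weight delta*k after
    adding a uniform z of length k."""
    return comb(k, round(delta * k)) / 2 ** k

def q_bucket(k, delta, omega):
    """Lemma 3: probability that both elements of a pair at distance
    omega*k fall into the same bucket."""
    wk = round(omega * k)
    dk = round((delta - omega / 2) * k)
    if wk % 2 or dk < 0:          # weight cannot split evenly / infeasible
        return 0.0
    return comb(wk, wk // 2) * comb(k - wk, dk) / 2 ** k
```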
Now we are ready to prove Theorem 1 about the time complexity of Algorithm 1 for solving the CP_{d,λ,ω} problem. For convenience, we restate the theorem here.

Theorem 1. Let ω ∈ [0, 1/2] and λ ∈ [0, 1]. Then Algorithm 1 solves the CP_{d,λ,ω} problem with overwhelming success probability in expected time 2^{ϑd(1+o(1))}, where

ϑ = (1 − ω) · ( 1 − H((δ⋆ − ω/2)/(1 − ω)) )   for ω ≤ ω⋆,
ϑ = 2λ + H(ω) − 1                             for ω > ω⋆,

with δ⋆ := H^{−1}(1 − λ) and ω⋆ := 2δ⋆(1 − δ⋆).

Proof. First, let us give the exact form of log p and log q, using Stirling's formula to approximate the binomial coefficients in Lemmas 2 and 3. By setting the block width k = d/r we get

log q = (1 − ω) · ( H((δ − ω/2)/(1 − ω)) − 1 ) · (d/r) · (1 + o(1))   and
log p = ( H(δ) − 1 ) · (d/r) · (1 + o(1)).

Now, let us reconsider the running time given in Theorem 2 as

T = max( q^{−r} (a), 2^{λd} · p^{r−1}/q^r (b), (2^{λd} · p^r)²/q^r (c) )^{1+o(1)},

where r = λd/log²(d). We now show that the running time for all values of δ ≥ δ⋆ := H^{−1}(1 − λ) is solely dominated by (c). Observe that we have (c) ≥ (b) whenever

2^{λd} · p^{r+1} ≥ 1 ⇔ H(δ) ≥ 1 − λ · r/(r + 1) ⇔ δ ≥ H^{−1}(1 − λ · r/(r + 1)) → H^{−1}(1 − λ) = δ⋆,

since r = ω(1). Also, we have (c) ≥ (a) for the same choice of δ, as

2^{λd} · p^r ≥ 1 ⇔ δ ≥ H^{−1}(1 − λ) = δ⋆.

Thus, for all choices of δ ≥ δ⋆ the running time is (T_δ)^{1+o(1)} with

ϑ⋆(δ) := log(T_δ)/d = 2(λ + H(δ) − 1) + (1 − ω) · ( 1 − H((δ − ω/2)/(1 − ω)) ).
Now, minimizing ϑ⋆ yields a global minimum at δ_min = (1 − √(1 − 2ω))/2, attaining a value of

ϑ⋆(δ_min) = 2λ + H(ω) − 1.

As we are restricted to values of δ which are at least δ⋆, solving δ_min ≥ δ⋆ for ω yields

δ_min ≥ δ⋆ ⇔ ω ≥ 2δ⋆(1 − δ⋆) = ω⋆.

This proves the claim of the theorem whenever ω > ω⋆. For all other values of ω we simply choose δ = δ⋆, which yields

ϑ = ϑ⋆(δ⋆) = (1 − ω) · ( 1 − H((δ⋆ − ω/2)/(1 − ω)) )   for ω ≤ ω⋆,

as claimed.

Now, to boost the expected running time 2^{ϑd(1+o(1))} of the algorithm to actually being attained with overwhelming probability, we use a standard Markov argument. Let X denote the random variable describing the running time of the algorithm. Then the probability that the algorithm needs more time than 2^{√d} · E[X] to finish is

Pr[ X ≥ 2^{√d} · E[X] ] ≤ E[X] / (2^{√d} · E[X]) = 2^{−√d};

equivalently, the algorithm finishes in less time than 2^{√d} · E[X] = 2^{ϑd(1+o(1))} with overwhelming probability. Also, a standard application of the union bound yields that the probability that the algorithm both finishes within the claimed time and succeeds in finding the solution is still overwhelming.

The theorem shows that whenever ω > ω⋆ our algorithm attains the optimal time complexity for uniformly random lists as given in Equation (1). Additionally, our algorithm reaches the time lower bound for locality-sensitive hashing based algorithms for all values of ω whenever the input list sizes are subexponential in the dimension d, which is shown in the following lemma.
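Numerically, the exponent ϑ of Theorem 1 is now a one-liner on top of the earlier sketches (our addition); evaluating it over ω reproduces the solid curves of Figure 1:

```python
def theta(lam, omega):
    """Runtime exponent of Theorem 1: running time is 2^{theta(lam, omega)*d}
    up to subexponential factors (uses H and H_inv from the sketch above)."""
    delta_star = H_inv(1 - lam)
    omega_star = 2 * delta_star * (1 - delta_star)
    if omega > omega_star:
        return 2 * lam + H(omega) - 1
    return (1 - omega) * (1 - H((delta_star - omega / 2) / (1 - omega)))
```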
Lemma 4. Let ω ∈ [0, 1/2], and let ϑ be as defined in Theorem 1. Then we have

lim_{λ→0} ϑ/λ = 1/(1 − ω).

Proof.
Note that for λ converging to zero, δ⋆ = H^{−1}(1 − λ) approaches 1/2. This implies ω⋆ := 2δ⋆(1 − δ⋆) = 1/2, and hence for all choices of ω we have

ϑ = (1 − ω) · ( 1 − H((δ⋆ − ω/2)/(1 − ω)) ).

Now, for this choice of ϑ, May and Ozerov [17, Corollary 1] already showed the statement of this lemma by applying L'Hôpital's rule twice.

For convenience, we restate all parameter choices of Algorithm 1 for solving the CP_{d,λ,ω} in the following overview:

r = λd/log²(d),   P = (d + 1)^{r+1},   k = d/r,   N = d/q,

where

q = \binom{ωk}{ωk/2} · \binom{(1−ω)k}{(δ − ω/2)k} · (1/2)^k,
δ = δ⋆ for ω ≤ 2δ⋆(1 − δ⋆), and δ = (1 − √(1 − 2ω))/2 else, with δ⋆ := H^{−1}(1 − λ).   (8)
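The overview of Equation (8) translates into the following parameter selection (our addition, reusing H_inv and q_bucket from the earlier sketches; the asymptotic choice of r is rounded and forced to divide d, so concrete instantiations may require tuning):

```python
import math

def parameters(d, lam, omega):
    """Parameter choices following Equation (8)."""
    r = max(1, round(lam * d / math.log2(d) ** 2))
    while d % r:                            # the analysis assumes r | d
        r -= 1
    k = d // r                              # block width
    delta_star = H_inv(1 - lam)
    if omega <= 2 * delta_star * (1 - delta_star):
        delta = delta_star
    else:
        delta = (1 - math.sqrt(1 - 2 * omega)) / 2
    q = q_bucket(k, delta, omega)           # Lemma 3
    N = round(d / q) if q > 0 else None     # buckets per level
    P = (d + 1) ** (r + 1)                  # permutations
    return dict(r=r, k=k, delta=delta, N=N, P=P)
```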
4 Different Input Distributions

In this section, we show how to adapt the analysis of Algorithm 1 to variable input distributions. Therefore, we first reformulate Theorem 2 in Corollary 2 for the case of considering the CP_{d,λ,ω} over an arbitrary distribution D. As already indicated in the proof of Theorem 2, this reformulation depends on the expected value E of the cost of the naive search at the bottom of the computation tree, which is highly influenced by the distribution D. Then, we show how to compute E and how to upper bound it effectively. Finally, we give upper bounds for the time complexity of the algorithm to solve the CP_{d,λ,ω} over some generic distributions. These examples suggest that the algorithm is best suited for distributions D where the weight of the sum v + w of elements v, w ∼ D concentrates at d/2; this behavior seems quite natural, as in this case the solution is most distinguishable from random input pairs. Let us start with the reformulation of the theorem.

Corollary 2.
Let D be some distribution over F_2^d, let q and p be as defined in Equation (2), ω ∈ [0, 1/2], λ ∈ [0, 1] and r = λd/log²(d). Also let E = E[|A| · |B|] for A and B in line 13 of Algorithm 1 (where the expectation is taken over the distribution of the input lists and the random choices of the algorithm). Then Algorithm 1 solves the CP_{d,λ,ω} problem over D in time

max( q^{−r}, 2^{λd} · p^{r−1}/q^r, E/q^r )^{1+o(1)}

with success probability overwhelming in d.

Proof. The proof follows along the lines of the proof of Theorem 2, by observing that T_{r+1} = N^r · E and that the expected time complexity is again amplified to being attained with overwhelming probability by a Markov argument similar to the proof of Theorem 1.

In the next lemma, we show how to upper bound the value of E.

Lemma 5 (Expectation of Naive Search). Let D be some distribution over F_2^d, ω ∈ [0, 1/2], λ ∈ [0, 1] and r = λd/log²(d). Also let E = E[|A| · |B|] for A and B in line 13 of Algorithm 1 when solving some instance of the CP_{d,λ,ω} over D (where the expectation is taken over the distribution of the input lists and the random choices of the algorithm). Then we have

E ≤ 2^{2λd} · ∏_{i=1}^{r} α_i + 4r · 2^{λd} · p^r,

where α_i := Pr_{v,w∼D}[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ].

Proof. Given in Appendix A.

While Lemma 5 gives an upper bound on the required expectation, it is not very handy. In the next lemma, we show how to further bound this expectation and how it affects the running time of the algorithm.
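For a concrete input distribution, the α_i of Lemma 5 can be estimated by Monte-Carlo sampling. The following sketch (our addition) models a simple example distribution in which the difference v + w has fixed relative weight γ on the block and v itself is uniform on the block:

```python
import random

def alpha_mc(k, delta, gamma, trials=100_000):
    """Monte-Carlo estimate of alpha_i from Lemma 5 for one block of width k."""
    target = round(delta * k)
    hits = 0
    for _ in range(trials):
        diff = sum(1 << i for i in random.sample(range(k), round(gamma * k)))
        v = random.getrandbits(k)     # block of v
        w = v ^ diff                  # block of w: differs in gamma*k coordinates
        z = random.getrandbits(k)     # bucket center
        if (bin(v ^ z).count("1") == target
                and bin(w ^ z).count("1") == target):
            hits += 1
    return hits / trials
```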
Lemma 6 (Complexity for Arbitrary Distributions). Let D be some distribution over F_2^d, r := λd/log²(d), ω ∈ [0, 1/2] and λ ∈ [0, 1]. Also let E = E[|A| · |B|] for A and B in line 13 of Algorithm 1 when solving some instance of the CP_{d,λ,ω} over D (where the expectation is taken over the distribution of the input lists and the random choices of the algorithm). Then Algorithm 1 solves the CP_{d,λ,ω} over D in time

max( q^{−r}, 2^{λd} · p^{r−1}/q^r, 2^{εd}/q^r )^{1+o(1)},

where

ε = 2λ − min_{i ∈ [r], γ ∈ [0,1]} [ (1 − γ) · ( 1 − H((δ − γ/2)/(1 − γ)) ) − r · log(p_{i,γk})/d ],

with p_{i,γk} := Pr[ wt((v + w)_{B_{i,r}}) = γk ].

Proof. Given in Appendix A.

Note that if it further holds that for v ∼ D each of the r blocks of v is identically distributed, we can further simplify the term ε from Lemma 6. In this case we have p_{i,γk}^r ≤ Pr[wt(v + w) = γd] := p_{γd}, and thus we get

ε = 2λ − min_{γ ∈ [0,1]} [ (1 − γ) · ( 1 − H((δ − γ/2)/(1 − γ)) ) − log(p_{γd})/d ].

Now, if we are given an arbitrary distribution D, we can maximize ε over γ. Then, similarly to the proof of Theorem 1, we can derive a value for δ minimizing the overall time complexity.

We performed this maximization and optimization numerically for some generic input distributions. We considered distributions where the weight of input vectors is distributed binomially, chosen according to a Poisson distribution, or fixed to a specific value. This means, first a weight is sampled according to the chosen distribution and then a vector of that weight is selected uniformly among all vectors of that weight.

Figure 3: Time complexity exponent ϑ as a function of the weight ω of the closest pair for different input list distributions and two different list sizes (panels (a) and (b)); the expected weight of the input elements increases from left to right among the depicted curves.

The running time of Algorithm 1 for solving the CP_{d,λ,ω} over the considered distributions seems to depend only on the expected weight of the vectors contained in the input lists. That means the time complexity for input lists containing random vectors whose weight is either fixed to γd or binomially or Poisson distributed with expectation γd is equal. This can possibly be explained by the low variance of all these distributions, which implies a high concentration around this expected weight.

We see in Figure 3 that the value of ω from which on the complexity becomes quadratic in the list sizes shifts to the left. This behavior stems from the fact that the expected weight of a sum of elements is no longer d/2, but roughly 2γ(1 − γ)d. What also stands out is that the complexity for ω = 0 is no longer linear in the list sizes. The reason for this is that the probability of random pairs falling into the same bucket and the probability of the closest pair falling into the same bucket converge for decreasing weight of the input list elements. This indicates that for input distributions with smaller expected weight a different bucketing criterion might be beneficial. We pose this as an open question for further research.

5 Practical Results

In this section, we give experimental results on the performance of a proof of concept implementation of our new algorithm. These experiments verify the performance gain of our algorithm over a naive quadratic search approach.
We also verify the numerical estimates of the algorithm's performance on different input distributions from the previous section and give some related practical improvements to our algorithm. Our implementation is publicly available at https://github.com/submission-nn/nn-algorithm.

Before discussing the benchmark results, let us first briefly describe some of the practical improvements we introduced in our implementation, which differ from the description in Section 3. We implemented a true depth-first search rather than the iterative description given previously; the iterative description just allowed for a more convenient analysis. Thus, our algorithm needs to store only the lists of a single path from the root to a leaf node at any time. Also, as all lists of subsequent levels are subsets of previous ones, we do not create r different lists. We rather rearrange the elements of the input list such that elements belonging to the list of the subsequent level are consecutive, making it sufficient to just memorize the range of elements that belong to the next-level list (see the sketch below). This way, we only need to store the input list plus two integer markers for each level.

Also, it turns out that in practice often a small depth of the tree (not exceeding 8 in our experiments) is already sufficient to achieve good runtime results. Regarding the branching factor N of the tree, we achieve optimal results either for values close to the analytic choice d/q or for values being significantly smaller. The case of using a very small branching factor can be seen as a pruning strategy, similar to the one used in lattice enumeration algorithms for shortest vector search [3]. Additionally, we benchmarked three different strategies for the weight criterion:

1. Strictly enforcing a weight of δk in each block, as described in our algorithm.
2. Allowing for a small deviation ±ε around δk.
3. Allowing for weights of at most δk.

Additionally, we introduced a threshold for the size of the lists in the tree from which on the computation of further leaves is aborted and naive search is used instead.

Figure 4: Runtime results in seconds on a logarithmic scale (y-axis) as a function of the distance ω of the closest pair (x-axis) on small random input lists, for d ∈ {32, 64, 128, 256}. The dotted, dashed and dash-dotted lines indicate the runtime results for the different bucketing strategies used (ε = 0, ε = 1, ≤ δk). The straight horizontal line is the time used by a naive quadratic search.

Figure 4 shows the runtime results for the different bucket criteria on small input lists containing random elements. Here, each data point was averaged over 50 measurements. The experimental results clearly indicate a significant gain over the quadratic search approach. The less significant gain for small dimension d is due to the reduced number of possible blocks, or equivalently the low depth of the computation tree, which lets the algorithm not reach its full potential. In the case of small input lists, we observe that a bucketing strategy that allows a deviation of ε = 1 from δk is beneficial for most values of d.

Figure 5 shows the same experiments performed on larger input lists. Besides a more significant improvement over the naive search, we can observe that the bucketing criterion that uses δk as an upper bound becomes more beneficial for nearly all values of ω and d.
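The in-place rearrangement described above can be sketched as follows (our addition; the actual implementation in the repository is C++ and more involved):

```python
def partition_level(L, lo, hi, keep):
    """Partition L[lo:hi] in place so that the elements satisfying the
    bucket criterion `keep` form the consecutive prefix L[lo:mid]."""
    mid = lo
    for j in range(lo, hi):
        if keep(L[j]):
            L[mid], L[j] = L[j], L[mid]
            mid += 1
    return lo, mid      # the next level works on L[lo:mid]
```

Each level of the depth-first search then remembers only the pair (lo, mid) instead of a copy of the filtered list; when the recursion returns, the full range L[lo:hi] is still available (merely permuted) for the sibling buckets.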
Figure 5: Runtime results in seconds on a logarithmic scale (y-axis) as a function of the distance ω of the closest pair (x-axis) on larger random input lists, for d ∈ {32, 64, 128, 256}. The dotted, dashed and dash-dotted lines indicate the runtime results for the different bucketing strategies used (ε = 0, ε = 1, ≤ δk). The straight horizontal line is the time used by a naive quadratic search.

Eventually, Figure 6 shows the experimental runtime results on input lists whose elements are drawn from a different input distribution, analyzed in Section 4. Here the distribution is the uniform distribution over vectors of weight γd. One can observe that for growing d the shape of the graph resembles the theoretical results from Figure 3.

Figure 6: Runtime results in seconds on a logarithmic scale (y-axis) as a function of the distance ω of the closest pair (x-axis) on input lists containing random elements of weight γd, for d ∈ {32, 64, 128} and several values of γ. The densely dashed line (U) indicates the runtime on uniformly random lists.

References

[1] Alman, J.: An illuminating algorithm for the light bulb problem. In: Fineman, J.T., Mitzenmacher, M. (eds.) 2nd Symposium on Simplicity in Algorithms, SOSA@SODA 2019, January 8-9, 2019, San Diego, CA, USA. OASIcs, vol. 69, pp. 2:1–2:11. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019), https://doi.org/10.4230/OASIcs.SOSA.2019.2
[2] Andoni, A., Razenshteyn, I.: Optimal data-dependent hashing for approximate near neighbors. In: Proceedings of the forty-seventh annual ACM Symposium on Theory of Computing. pp. 793–801 (2015)
[3] Aono, Y., Nguyen, P.Q., Seito, T., Shikata, J.: Lower bounds on lattice enumeration with extreme pruning. In: Annual International Cryptology Conference. pp. 608–637. Springer (2018)
[4] Becker, A., Ducas, L., Gama, N., Laarhoven, T.: New directions in nearest neighbor searching with applications to lattice sieving. In: Krauthgamer, R. (ed.) 27th Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 10–24. ACM-SIAM, Arlington, VA, USA (Jan 10–12, 2016)
[5] Bentley, J.L.: Multidimensional divide-and-conquer. Communications of the ACM 23(4), 214–229 (1980)
[6] Both, L., May, A.: Decoding linear codes with high error rate and its impact for LPN security. In: Lange, T., Steinwandt, R. (eds.) Post-Quantum Cryptography - 9th International Conference, PQCrypto 2018. pp. 25–46. Springer, Heidelberg, Germany, Fort Lauderdale, Florida, United States (Apr 9–11, 2018)
[7] Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: Binary robust independent elementary features. In: European Conference on Computer Vision. pp. 778–792. Springer (2010)
[8] Dubiner, M.: Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem. IEEE Transactions on Information Theory 56(8), 4166–4179 (2010)
[9] Esmaeili, M.M., Ward, R.K., Fatourechi, M.: A fast approximate nearest neighbor search algorithm in the Hamming space. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(12), 2481–2488 (2012)
[10] Gueye, C.T., Klamti, J.B., Hirose, S.: Generalization of BJMM-ISD using May-Ozerov nearest neighbor algorithm over an arbitrary finite field F_q. In: International Conference on Codes, Cryptology, and Information Security. pp. 96–109. Springer (2017)
[11] Hirose, S.: May-Ozerov algorithm for nearest-neighbor problem over F_q and its application to information set decoding. In: International Conference for Information Technology and Communications. pp. 115–126. Springer (2016)
[12] Indyk, P., Motwani, R.: Approximate nearest neighbors: Towards removing the curse of dimensionality. In: 30th Annual ACM Symposium on Theory of Computing. pp. 604–613. ACM Press, Dallas, TX, USA (May 23–26, 1998)
[13] Karppa, M., Kaski, P., Kohonen, J.: A faster subquadratic algorithm for finding outlier correlations. In: Krauthgamer, R. (ed.) 27th Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 1288–1305. ACM-SIAM, Arlington, VA, USA (Jan 10–12, 2016)
[14] Khuller, S., Matias, Y.: A simple randomized sieve algorithm for the closest-pair problem. Information and Computation 118(1), 34–37 (1995)
[15] Lu, J., Liong, V.E., Zhou, X., Zhou, J.: Learning compact binary face descriptor for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(10), 2041–2056 (2015)
[16] Marchini, J., Donnelly, P., Cardon, L.R.: Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature Genetics 37(4), 413–417 (2005)
[17] May, A., Ozerov, I.: On computing nearest neighbors with applications to decoding of binary linear codes. In: Oswald, E., Fischlin, M. (eds.) Advances in Cryptology – EUROCRYPT 2015, Part I. Lecture Notes in Computer Science, vol. 9056, pp. 203–228. Springer, Heidelberg, Germany, Sofia, Bulgaria (Apr 26–30, 2015)
[18] Motwani, R., Naor, A., Panigrahy, R.: Lower bounds on locality sensitive hashing.
In: Proceedings of the twenty-second annual Symposium on Computational Geometry. pp. 154–157 (2006)
[19] Musani, S.K., Shriner, D., Liu, N., Feng, R., Coffey, C.S., Yi, N., Tiwari, H.K., Allison, D.B.: Detection of gene × gene interactions in genome-wide association studies of human population data. Human Heredity 63(2), 67–84 (2007)
[20] Strecha, C., Bronstein, A., Bronstein, M., Fua, P.: LDAHash: Improved matching with smaller descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(1), 66–78 (2011)
[21] Valiant, G.: Finding correlations in subquadratic time, with applications to learning parities and juntas. In: 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science. pp. 11–20. IEEE (2012)
[22] Valiant, L.G.: Functionality in neural nets. In: COLT. vol. 88, pp. 28–39 (1988)
[23] Xie, N., Xu, S., Xu, Y.: A new coding-based algorithm for finding closest pair of vectors. Theoretical Computer Science 782, 129–144 (2019)

A Proofs for General Distributions
In this section, we give the proofs for the lemmata regarding the performance of our algorithm on different input distributions, which were omitted in the main body of the paper.
A.1 Proof of Lemma 5
Similar to the proof of Theorem 2, let us bound E[|A_i| · |B_i|] in terms of E[|A_{i−1}| · |B_{i−1}|], E[|A_{i−1}|] and E[|B_{i−1}|] for each i:

E[|A_i| · |B_i| | A_{i−1}, B_{i−1}]
= Σ_{v ∈ A_{i−1}\{x}, w ∈ B_{i−1}\{y}} Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ]   (each summand being α_i)
+ Σ_{v ∈ A_{i−1}} Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((y + z)_{B_{i,r}}) = δk ]   (each summand being ≤ p)
+ Σ_{w ∈ B_{i−1}} Pr[ wt((x + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ]   (each summand being ≤ p)
+ Pr[ wt((x + z)_{B_{i,r}}) = δk, wt((y + z)_{B_{i,r}}) = δk ]
≤ α_i · |A_{i−1}| · |B_{i−1}| + p · (|A_{i−1}| + |B_{i−1}| + 1),

and hence E[|A_i| · |B_i|] ≤ α_i · E[|A_{i−1}| · |B_{i−1}|] + p · (E[|A_{i−1}|] + E[|B_{i−1}|] + 1). Again, applying this equation successively, we obtain

E = E[|A_r| · |B_r|] ≤ 2^{2λd} · ∏_{i=1}^{r} α_i + 4 · 2^{λd} · Σ_{i=1}^{r} ( ∏_{j=0}^{i−2} α_{r−j} ) · p^{r−i+1} ≤ 2^{2λd} · ∏_{i=1}^{r} α_i + 4r · 2^{λd} · p^r.

A.2 Proof of Lemma 6
Taking the result for E from Lemma 5 and plugging it into the runtime formula from Corollary 2, we get that the CP_{d,λ,ω} problem over D can be solved with probability overwhelming in d in time

max( q^{−r}, 2^{λd} · p^{r−1}/q^r, E/q^r )^{1+o(1)} ≤ max( q^{−r}, 2^{λd} · p^{r−1}/q^r, 2^{2λd} · ∏_{i=1}^{r} α_i / q^r )^{1+o(1)},

since the right summand 4r · 2^{λd} · p^r/q^r of E/q^r is asymptotically smaller than the second entry in the max, i.e., 2^{λd} · p^{r−1}/q^r. Thus, it suffices to find an easier upper bound for the first summand S := 2^{2λd} · ∏_{i=1}^{r} α_i. Recalling α_i = Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ], we receive

S ≤ 2^{2λd} · ( max_{i ∈ [r]} α_i )^r
  = 2^{2λd} · max_{i ∈ [r]} ( Σ_{j=0}^{k} q_{i,j} · Pr[ wt((v + w)_{B_{i,r}}) = j ] )^r
  ≤ 2^{2λd + o(d)} · ( max_{i ∈ [r], j ∈ [k] ∪ {0}} q_{i,j} · Pr[ wt((v + w)_{B_{i,r}}) = j ] )^r
  = 2^{2λd + o(d)} · ( max_{i ∈ [r], γ ∈ [0,1]} q_{i,γk} · Pr[ wt((v + w)_{B_{i,r}}) = γk ] )^r,

where q_{i,γk} = Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk | wt((v + w)_{B_{i,r}}) = γk ]. Lemma 3 lets us rewrite this probability as

q_{i,γk} = \binom{γk}{γk/2} · \binom{(1−γ)k}{(δ − γ/2)k} · (1/2)^k ≤ 2^{−( 1 − H((δ − γ/2)/(1 − γ)) ) · (1−γ)k}.

We end up with

S ≤ 2^{2λd + o(d)} · max_{i ∈ [r], γ ∈ [0,1]} 2^{ r · ( −(1 − H((δ − γ/2)/(1 − γ))) · (1−γ)k + log p_{i,γk} ) }
  = 2^{ ( 2λ + max_{i ∈ [r], γ ∈ [0,1]} [ −(1 − H((δ − γ/2)/(1 − γ))) · (1−γ) + (r/d) · log p_{i,γk} ] ) d + o(d) }
  = 2^{ ( 2λ − min_{i ∈ [r], γ ∈ [0,1]} [ (1 − H((δ − γ/2)/(1 − γ))) · (1−γ) − (r/d) · log p_{i,γk} ] ) d + o(d) },

with p_{i,γk} := Pr[ wt((v + w)_{B_{i,r}}) = γk ], which proves the claim.