A Faster Algorithm for Finding Closest Pairs in Hamming Metric
Andre Esser, Robert Kübler, and Floyd Zweydinger

Cryptography Research Center, Technology Innovation Institute, Abu Dhabi, UAE, [email protected]
Medion AG, Essen, Germany, [email protected]
Ruhr University Bochum, Germany, [email protected]
Abstract
We study the Closest Pair Problem in Hamming metric, which asks to find the pair with the smallest Hamming distance in a collection of binary vectors. We give a new randomized algorithm for the problem on uniformly random input outperforming previous approaches whenever the dimension of input points is small compared to the dataset size. For moderate to large dimensions, our algorithm matches the time complexity of the previously best-known locality sensitive hashing based algorithms. Technically, our algorithm follows similar design principles as Dubiner (IEEE Trans. Inf. Theory 2010) and May-Ozerov (Eurocrypt 2015). Besides improving the time complexity in the aforementioned areas, we significantly simplify the analysis of these previous works. We give a modular analysis, which allows us to investigate the performance of the algorithm also on non-uniform input distributions. Furthermore, we give a proof of concept implementation of our algorithm which performs well in comparison to a quadratic search baseline. This is the first step towards answering an open question raised by May and Ozerov regarding the practicability of algorithms following these design principles.
Keywords: closest pair problem, nearest neighbor, LSH
1 Introduction

Finding closest pairs in a given dataset of binary vectors is a fundamental problem in theoretical computer science with numerous applications in data science, machine learning, computer vision, cryptography, and many others. Image data, for example, is often represented via compact binary codes to allow for efficient closest pair search in applications like similarity search in images or facial recognition systems [7, 15, 20]. The usage of binary codes also allows for decoding the represented data to common codewords. Here, the most efficient algorithms known for decoding such random binary linear codes also heavily benefit from improved algorithms for the Closest Pair Problem [6, 17]. Another common application lies in the field of bioinformatics, where the analysis of genomes involves closest pair search on large datasets to identify the most correlated genetic markers [16, 19].

To be more precise, the Closest Pair Problem asks to find the pair of vectors with the minimal Hamming distance among n given binary vectors. While the general version of this problem does not make any restrictions on the distribution of input points, several settings imply a uniform distribution of dataset elements [6, 16, 17, 19]. Usually, in such settings, there is a planted pair, which attains relative distance ω ∈ [0, 1/2] and has to be found. This uniform version is also known as the light bulb problem [22]. The problem can be solved in time linear in the dataset size as long as the dimension of the vectors is constant [5, 14]. As soon as the dimension is non-constant, an effect known as the curse of dimensionality occurs, which makes the problem much harder.

The most common framework to assess the problem is based on locality-sensitive hashing (LSH), whose research was initiated in the pioneering work of Indyk and Motwani [12]. Roughly speaking, a locality-sensitive hash function is more likely to hash points that are close to each other to the same value than points that are far apart. To solve the Closest Pair Problem leveraging an LSH family, one chooses a random hash function of the family and computes the hash value of all points in the dataset. In a next step, one computes the pairwise distance only for those pairs hashing to the same value. This process is then repeated for different hash functions until the closest pair is found. The initial algorithm by Indyk-Motwani achieves a time complexity of n^{1−log(1−ω)} (here and in the following we ignore polylogarithmic factors in the dataset size). In general, a time lower bound of n^{1/(1−ω)} is known for LSH based algorithms [8, 18]. In [8] Dubiner also gives an abstract idea of an algorithm achieving this lower bound. Later, May and Ozerov [17] gave the first concrete algorithmic description following similar design principles, also achieving the mentioned lower bound. Additionally, current data-dependent hashing schemes [2], where the hash function depends also on the actual points in the dataset, improve on the initial idea by Indyk-Motwani and also match the time lower bound of [8, 18].

In the uniform setting, Valiant [21] was able to circumvent the lower bound by leveraging fast matrix multiplication, hence breaking out of the LSH framework, to give an algorithm that runs in time n^{1.62} · poly(d). Remarkably, the complexity exponent of Valiant's algorithm does not depend on the relative distance ω at all. Later this bound was improved to n^{1.582} · poly(d) by Karppa et al. [13] and simplified in an elegant algorithm by Alman [1] achieving the same complexity.
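To make the LSH framework sketched above concrete, the following is a minimal bit-sampling closest-pair search in Python (our illustration, not the algorithm of this paper; vectors are encoded as integers and the repetition count follows the n^{1−log(1−ω)} heuristic):

```python
import math
import random
from collections import defaultdict

def hamming(u, v):
    return bin(u ^ v).count("1")

def lsh_closest_pair(points, d, omega):
    """Bit-sampling LSH in the spirit of Indyk-Motwani [12]: bucket by
    ~log2(n) randomly sampled coordinates and compare only colliding
    points; repeat until the planted pair collides with constant
    probability, giving ~n^{1-log2(1-omega)} total work on uniform data."""
    n = len(points)
    k = max(1, int(math.log2(n)))              # sampled coordinates per hash
    reps = int((1 - omega) ** (-k)) + 1        # ~ (1-omega)^{-k} repetitions
    best = (d + 1, None)
    for _ in range(reps):
        coords = random.sample(range(d), k)
        buckets = defaultdict(list)
        for p in points:
            key = tuple((p >> c) & 1 for c in coords)
            buckets[key].append(p)
        for bucket in buckets.values():
            for i, u in enumerate(bucket):
                for v in bucket[i + 1:]:
                    dist = hamming(u, v)
                    if dist < best[0]:
                        best = (dist, (u, v))
    return best
```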
All mentioned algorithms have in common that they assume a dimension of d = c(n) · log(n), where c(n) is at least a big constant; the results by [2, 8, 21], for example, take c(n) = ω(1). Here, the algorithm by May-Ozerov forms an exception by being applicable for any c(n) ≥ 1/(1 − H(ω)), where H(·) denotes the binary entropy function. Nevertheless, the mentioned lower bound is only achieved for c(n) approaching infinity. Recently, Xie, Xu and Xu [23] proposed a new algorithm based on decoding the points of the dataset according to some random code, exploiting that close vectors are more likely to be decoded to the same word. Their algorithm is also applicable for any c(n) that allows to bound the number of pairs attaining relative distance ω by a constant with high probability. The authors are able to derandomize their approach and thus obtain the fastest known deterministic algorithm for small constants c(n). However, if one also considers probabilistic procedures, their method is inferior to the one by May-Ozerov.

We design a randomized algorithm which achieves the best-known running time for solving the Closest Pair Problem on uniformly random input when the dimension d is small, which means for small constants c(n). Additionally, our algorithm matches the running time of the best known LSH algorithms for larger values of c(n) and still matches the time lower bound for LSH based schemes if c(n) = ω(1). To quantify this, we give in Figure 1 the achieved runtime exponent of our algorithm for three different constants c(n), in comparison to May-Ozerov. As indicated by the figure, our approach also performs exceptionally well for large closest pair distances, where common LSH based techniques usually fail [9]. Moreover, we show that for large distances our algorithm is indeed optimal.

Technically, our algorithm follows similar design principles as [8, 17]. At its core, these algorithms group the elements of the given datasets recursively into buckets according to some criterion which fulfills properties similar to those of locality-sensitive hash functions. As the buckets in the recursion are decreasing in size, at the end of the recursion they become small enough to compute the pairwise distance of all contained elements naively.
Figure 1: Time complexity exponent ϑ as a function of the relative distance ω of the closest pair for different dimensions: (a) d = 4 log(n), (b) d = 2 log(n), (c) d = c · log(n) for a smaller constant c between 1 and 2. The running time is of the form n^ϑ · poly(d), where the dashed line represents May-Ozerov's algorithm and the solid line depicts the exponent of our new algorithm. The dotted line gives the maximal ω for which the algorithm by May-Ozerov is still applicable.

In contrast to previous works, we exchange the used bucket criteria, which allows us to significantly simplify the algorithms' analysis as well as to improve in the mentioned parameter regimes. Also, our approach is applicable for any c(n); thus we are able to remove the restriction c(n) ≥ 1/(1 − H(ω)).

Following May-Ozerov and Dubiner, we study the bichromatic version of the Closest Pair Problem, which takes as input two datasets rather than one, the goal being to find the closest pair between those given datasets. Obviously, there exists a randomized reduction between the Closest Pair Problem and its bichromatic version, but our algorithm can also easily be adapted to the single dataset case. However, May and Ozerov require the elements within each dataset to be pairwise independent of each other; as a minor contribution, we get rid of this restriction, too.

Also, we investigate the algorithm's performance on different input distributions. Therefore, we give a modular analysis, which allows for an easy exchange of the dataset distribution as well as of the choice of bucketing criterion. We also give numerical upper bounds for the algorithm's complexity exponent on some exemplary input distributions. These examples suggest that the chosen criterion is well suited as long as the distance between input elements concentrates around d/2 (as in the case of random input lists), while being non-optimal as soon as the expected distance decreases.

We also address an open research question raised by May and Ozerov regarding the practical applicability of algorithms following the design of [8, 17]. As their algorithm inherits a huge polynomial overhead in time and space, they left it as an open problem to give a more practical algorithm following a similar design. While our analysis at first suggests an equally high overhead, we are able to give an efficient implementation of our algorithm, which requires in addition to the input dataset only constant space. Also, our practical experiments show that most of the overhead of our algorithm is an artifact of the analysis and can be circumvented in practice, so that our algorithm performs well compared to a quadratic search baseline.

The rest of the paper is organized as follows: In the subsequent section, we introduce the necessary notation and state the exact definition of the Closest Pair Problem under consideration. In Section 3, we then give a detailed description of our new algorithm and establish a proof of its running time as well as its correctness. In the following Section 4, we investigate the performance of our algorithm on different input distributions. Finally, in Section 5, we give practical improvements of the algorithm and runtime results of our implementation compared to a quadratic search baseline.

2 Preliminaries
For a, b ∈ N, a ≤ b, we denote [a, b] := {a, a+1, ..., b−1, b}. In particular, let [b] := [1, b]. For a vector v ∈ F_2^d and I ⊆ [d], let v_I be the projection of v onto the coordinates indexed by I, i.e., for v = (v_1, v_2, ..., v_d) and I = {i_1, i_2, ..., i_k} we have v_I = (v_{i_1}, ..., v_{i_k}) ∈ F_2^k. We denote the uniform distribution on F_2^d as U(F_2^d). We define f(n) = Õ(g(n)) :⇔ ∃ i ∈ N : f(n) = O(g(n) · log^i(g(n))), i.e., the tilde additionally suppresses polylogarithmic factors in comparison to the standard Landau notation O. Furthermore, we consider all logarithms to have base 2. Define the binary entropy function as H(x) = −x log(x) − (1−x) log(1−x) for x ∈ (0, 1) and H(0) = H(1) := 0. Using this together with Stirling's formula n! = Θ(√(2πn) · (n/e)^n), we obtain \binom{n}{ωn} = Θ̃(2^{H(ω)n}). We additionally define H^{−1}: [0, 1] → [0, 1/2] to be the inverse of the left branch of H.
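For later reference, the entropy function and the left-branch inverse H^{−1} are easy to evaluate numerically; the following Python sketch (our addition) uses bisection, exploiting that H is increasing on [0, 1/2]:

```python
import math

def H(x):
    """Binary entropy with the convention H(0) = H(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def H_inv(y, tol=1e-12):
    """Inverse of the left branch of H, mapping [0, 1] -> [0, 1/2],
    computed by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if H(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```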
Lemma 1. Let v_1, ..., v_n ∼ U(F_2^d) be independent and M ∈ F_2^{n×n} an invertible matrix. Then for

(v'_1, ..., v'_n)^T := M · (v_1, ..., v_n)^T

it also holds that v'_1, ..., v'_n ∼ U(F_2^d) are independently and uniformly distributed.

Corollary 1.
For v, w, z ∼ U(F_2^d) independent, v+z, w+z ∼ U(F_2^d) are also uniform and independent.

Proof. We have

(v+z, w+z, z)^T = M · (v, w, z)^T,   where M = (1 0 1; 0 1 1; 0 0 1).

Since the matrix M is invertible over F_2, we can apply Lemma 1.
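The counting argument behind Corollary 1 can be checked exhaustively for a tiny dimension; the following snippet (our addition) verifies that every pair (v+z, w+z) occurs equally often, i.e., that the joint distribution is uniform:

```python
from collections import Counter
from itertools import product

d = 3
counts = Counter()
for v, w, z in product(range(2 ** d), repeat=3):
    counts[(v ^ z, w ^ z)] += 1   # XOR is addition over F_2
# each of the 2^d * 2^d pairs arises from exactly 2^d triples (v, w, z)
assert all(c == 2 ** d for c in counts.values())
```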
In this work, we consider the Bichromatic Closest Pair Problem in Hamming metric. Here, the inputs are two lists of equal size containing elements drawn uniformly at random from F_2^d, plus a planted pair whose Hamming distance is ωd for some known ω. More formally, we state the problem in the following definition. To allow for easy comparison to the result of May-Ozerov, we follow their notation using the dimension as the primary difficulty parameter. Thus, we let the list sizes be n := 2^{λd}, which means λ = 1/c(n), where d = c(n) log n.

Definition 1 (Bichromatic Closest Pair Problem). Let d ∈ N, ω ∈ [0, 1/2] and λ ∈ (0, 1]. Let L_1 = (v_i)_{i ∈ [2^{λd}]}, L_2 = (w_i)_{i ∈ [2^{λd}]} ∈ (F_2^d)^{2^{λd}} be two lists containing elements drawn uniformly at random, together with a distinguished pair (x, y) ∈ L_1 × L_2 with wt(x + y) = ωd. We further assume that for each i, j the vectors v_i and w_j are pairwise stochastically independent. The Closest Pair Problem CP_{d,λ,ω} asks to find this closest pair (x, y) given L_1, L_2 and the weight parameter ω. We call (x, y) the solution of the CP_{d,λ,ω} problem.

First, note that λ ≤ 1 is a natural restriction, since for λ > 1 the input lists would necessarily contain repeated elements.
We also consider the Closest Pair Problem on input lists whose elements are distributed according to some distribution D different from the uniform one used in Definition 1. To indicate this, we refer to the CP_{d,λ,ω} over distribution D. Note that in this case, the meaningful upper bound for λ is the entropy of D. Technically speaking, it is also not necessary to know the value of ω, as the time complexity of appropriate algorithms to solve the CP_{d,λ,ω} problem is solely increasing in ω. Thus, if ω is unknown, one would apply the algorithm for each ωd = 0, 1, 2, ... until the solution is found, which results in at most polynomial overhead.

It is well known that any LSH based algorithm solving the problem of Definition 1 with non-negligible probability needs time at least |L_1|^{1/(1−ω)} = 2^{λd/(1−ω)} [8, 18]. However, this lower bound assumes the promised pair to be uniquely distinguishable from all other pairs in L_1 × L_2. Obviously, if the relation of ω and λ lets us expect more than the promised pair of distance ωd in the input lists, an algorithm solving the Closest Pair Problem needs to find all (or at least a non-negligible fraction) of these closest pairs. Such scenarios, for example, frequently occur when the solution to the CP_{d,λ,ω} problem actually is a solution to some different problem [4, 10, 11, 17], which enables a distinction from other closest pairs. (Note that in such a scenario the searched (x, y) is probably not the pair with the smallest Hamming distance; however, we still refer to elements attaining Hamming distance ωd as closest pairs.) Hence, if the input lists contain E closest pairs, the optimal time complexity becomes Ω̃(max(2^{λd/(1−ω)}, E)).

Let (v, w) ∈ L_1 × L_2 \ {(x, y)} be arbitrary list elements. If the elements are chosen independently and uniformly at random, as stated in Definition 1, we expect E to be of size

E[|E|] = (|L_1 × L_2| − 1) · Pr[wt(v + w) = ωd] + 1 = (2^{2λd} − 1) · \binom{d}{ωd} · 2^{−d} + 1 = Θ̃(2^{(2λ + H(ω) − 1)d} + 1),

where the +1 accounts for the planted pair (x, y). Thus, the optimal time complexity to solve the CP_{d,λ,ω} problem becomes

T_opt = Ω̃( max( 2^{λd/(1−ω)}, 2^{(2λ + H(ω) − 1)d} ) ).   (1)

3 Our Algorithm

Our algorithm groups the input elements according to some criterion into several buckets, each one representing a new closest pair instance with smaller list size. We then apply this bucketing procedure recursively until the buckets contain few enough elements to eventually solve the Closest Pair Problem represented by them via a naive quadratic search algorithm, the exhaustive search.

As a bucketing criterion, we choose the weight of the vectors after adding a randomly drawn vector z from F_2^d. Thus, each bucket is represented by a vector z, and only those elements v are added to the bucket which satisfy wt(v + z) = δd, where δ is determined later. More precisely, in each recursive iteration our algorithm works only on equally large blocks of the input vectors and not on the full d coordinates, i.e., the weight condition is only checked on the current block. This is a technical necessity to obtain independence of vectors in the same bucket on fresh blocks. Let us formally define the notion of blocks.

Definition 2 (Block). Let d, r ∈ N with r | d and i ∈ [r]. Then we denote the i-th block of [d] as

B^d_{i,r} := [ (i−1) · d/r + 1, i · d/r ].

Note that [d] = ⋃_{i ∈ [r]} B^d_{i,r} and |B^d_{i,r}| = d/r for each i ∈ [r]. For a leaner notation, and since the role of d does not change in the course of this paper, we omit the index d in the following; thus we write B_{i,r} := B^d_{i,r}.
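As a numerical companion to Equation (1), the optimal-time exponent can be evaluated as follows (our addition, reusing H from the preliminaries sketch):

```python
def t_opt_exponent(lam, omega):
    """Exponent e with T_opt = Omega-tilde(2^{e*d}), following Eq. (1)."""
    lsh_bound = lam / (1 - omega)           # 2^{lambda*d/(1-omega)}
    pair_count = 2 * lam + H(omega) - 1     # expected number of closest pairs
    return max(lsh_bound, pair_count)
```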
Figure 2: We start off on the left side of the illustration with the two input lists L_1, L_2 of length 2^{λd} containing the closest pair (x, y); each block has width k = d/r. Going right, in each iteration of the algorithm, N different z_i^{(j)} are randomly chosen and all of the list elements are tested whether they fulfill the bucketing criterion. The crosshatched pattern indicates the parts where the bucket criterion is fulfilled, i.e., the list vectors differ from z_i^{(j)} in δk positions, corresponding to relative weight δ.

In each iteration, we choose the number N of buckets in such a way that with overwhelming probability the closest pair lands in at least one of the buckets. Hence, our algorithm creates a tree with branching factor N, with the distinguished pair being contained in one of the leaves. The deeper we get into the tree, the smaller and, hence, the easier the closest pair instances get. An algorithmic description of the whole procedure is given in pseudocode in Algorithm 1. The following theorem gives the time complexity of our algorithm to solve the CP_{d,λ,ω}.

Theorem 1.
Let ω ∈ [0, 1/2] and λ ∈ [0, 1]. Then Algorithm 1 solves the CP_{d,λ,ω} problem with overwhelming success probability in expected time 2^{ϑd(1+o(1))}, where

ϑ = (1 − ω) · ( 1 − H((δ⋆ − ω/2)/(1 − ω)) )   for ω ≤ ω⋆,
ϑ = 2λ + H(ω) − 1                             for ω > ω⋆,

with δ⋆ := H^{−1}(1 − λ) and ω⋆ := 2δ⋆(1 − δ⋆).

The case distinction can intuitively be explained as follows: As long as the number of pairs with distance ωd in the input lists is small enough, the algorithm is optimal for a choice of δ such that the lists at the leaves of the tree become polynomial in size. However, if too many closest pairs exist in the input lists, enforcing polynomial size of the leaf nodes lets the probability of the solution being contained in one of them drop immensely. Thus, to still ensure that the algorithm succeeds in finding the solution, an enormous branching factor would be required. Hence, instead the choice of δ is adapted, which leads to larger leaf nodes and in total to a time complexity that is linear in the number of closest pairs, matching the lower bound from Equation (1).

Algorithm 1 Closest-Pair(L_1, L_2, ω)
Input: lists L_1, L_2 ∈ (F_2^d)^{2^{λd}}, weight parameter ω ∈ [0, 1/2]
Output: list L containing the solution (x, y) ∈ L_1 × L_2 to the CP_{d,λ,ω}
 1: Set r, P, N ∈ N, δ ∈ [0, 1/2] properly and define k := d/r   ▷ see Equation (8)
 2: for P permutations π do                  ▷ permutation on the bit positions
 3:     Stack S := [(π(L_1), π(L_2), 0)]
 4:     L ← ∅
 5:     while S is not empty do
 6:         (A, B, i) ← S.pop()
 7:         if i < r then
 8:             for N randomly chosen z ∈ F_2^k do
 9:                 A′ ← (v ∈ A | wt((v + z)_{B_{i+1,r}}) = δk)
10:                 B′ ← (w ∈ B | wt((w + z)_{B_{i+1,r}}) = δk)
11:                 S.push((A′, B′, i + 1))
12:         else
13:             for v ∈ A, w ∈ B do          ▷ naive search
14:                 if wt(v + w) = ωd then
15:                     L ← L ∪ {(v, w)}
16: return L

We establish the proof of Theorem 1 in a series of lemmata and theorems. Note that any bucketing algorithm heavily depends on two probabilities specific to the chosen bucketing criterion. First, the probability that any element falls into a bucket, which we call p in the remainder of this work. This probability is mainly responsible for the lists' sizes throughout the algorithm. The second relevant probability, which we call q, describes the event of both x and y falling into the same bucket, where (x, y) is the solution to the CP_{d,λ,ω} problem. This is the probability of (x, y) surviving one iteration, meaning that q determines the success probability of the algorithm. In summary, for our choice of bucketing criterion we get

p := Pr_z[ wt((v + z)_{B_{i,r}}) = δk ] for any v   and   q := Pr_z[ wt((x + z)_{B_{i,r}}) = wt((y + z)_{B_{i,r}}) = δk ],   (2)

where k = d/r is the block width. If we assume that the ωd differing coordinates of x and y distribute evenly into the r blocks, i.e., wt((x + y)_{B_{i,r}}) = ωk for each i, these probabilities are independent of i for δk fixed. This property is ensured for at least one of the P permutations in Algorithm 1 with overwhelming probability, as we will see in the proof of Theorem 1. We determine the exact form of q and p later. First, we are going to prove the following statement about the expected running time of Algorithm 1 in dependence on both probabilities.
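The following Python sketch (our addition) mirrors one permutation of Algorithm 1 in recursive, depth-first form; vectors are integer-encoded, r is assumed to divide d, and the parameters r, N, δ are left to the caller:

```python
import random

def block_weight(u, i, k):
    """Hamming weight of the integer-encoded vector u on block i of width k."""
    return bin((u >> (i * k)) & ((1 << k) - 1)).count("1")

def closest_pair(L1, L2, d, omega, r, N, delta):
    """One-permutation sketch of Algorithm 1: recursively keep only the
    elements whose current block has weight delta*k after adding a random
    z, and match the surviving lists naively at depth r."""
    k = d // r                       # block width, assumes r | d
    target = round(delta * k)
    wd = round(omega * d)
    solutions = set()

    def recurse(A, B, i):
        if i == r:                   # leaf: naive quadratic search
            solutions.update((v, w) for v in A for w in B
                             if bin(v ^ w).count("1") == wd)
            return
        for _ in range(N):           # N buckets on the next level
            z = random.getrandbits(k) << (i * k)   # random z on block i
            recurse([v for v in A if block_weight(v ^ z, i, k) == target],
                    [w for w in B if block_weight(w ^ z, i, k) == target],
                    i + 1)

    recurse(list(L1), list(L2), 0)
    return solutions
```

The full Algorithm 1 additionally repeats this procedure for P random bit permutations to guarantee that the weight of x + y spreads evenly over the blocks.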
Theorem 2. Let q and p be as defined in Equation (2), ω ∈ [0, 1/2], λ ∈ [0, 1] and r = λd/log²(d). Then Algorithm 1 solves the CP_{d,λ,ω} problem in expected time

max( q^{−r}, 2^{λd} · p^{r−1}/q^r, (2^{λd} · p^r)²/q^r )^{1+o(1)}

with a success probability overwhelming in d.

Proof. First, we are going to prove the statement about the time complexity. The algorithm maintains a stack containing list pairs together with an associated counter. In every iteration of the loop in line 5, one element is removed from the stack and, if the counter i associated with this element is smaller than r, N additional elements (A′, B′, i+1) are pushed to the stack in line 11. Let us consider the elements on the stack as nodes in a tree of depth r, where all elements with associated counter i are siblings on level i of the tree. Also, depict the elements pushed to the stack in line 11 as child nodes of the currently processed node (A, B, i). Then the total number of elements with associated counter i pushed to the stack is bounded by the number of nodes on level i in a tree with branching factor N, which is N^i.

Next, let us determine the lists' sizes on level i of that tree. Therefore, let the expected size of lists on level i be L_i. As these lists are constructed from the lists of the previous level by testing the weight condition in lines 9 and 10, it holds that

L_i = L_{i−1} · Pr[ wt((v + z)_{B_{i,r}}) = δk ] := L_{i−1} · p,

where i > 0 and L_0 = |L_1|. By substitution, we get L_i = |L_1| · p^i for i = 0, ..., r.

Now we are able to compute the time needed to create the nodes on level i of the tree. Observe that for the creation of a level-i node we need to linearly scan through the larger lists of a node on level i − 1. Since there are at most N^i nodes on level i, we need a total time of

T_i = Õ( L_{i−1} · N^i ) = Õ( |L_1| · p^{i−1} · N^i )

for each 0 < i ≤ r. Eventually, the list pairs on level r are matched by a naive search with quadratic runtime, resulting in

T_{r+1} = Õ( N^r · E[|A_r| · |B_r|] ),

where A_r, B_r describe the lists of a level-r node. The expected value of the product now depends on the chosen input distribution. We next argue that for the given input distribution we have

E[|A_r| · |B_r|] = O( E[|A_r|] · E[|B_r|] ) = O( L_r² ).

To see this, first note that for v, w, z independent and uniform, v + z and w + z are also independent and uniform according to Corollary 1. This in turn implies

Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ] = Pr[ wt((v + z)_{B_{i,r}}) = δk ] · Pr[ wt((w + z)_{B_{i,r}}) = δk ] = p²,

since deterministic functions of independent random variables are still independent. This also works for either v = x or w = y, but not for (v, w) = (x, y). In this case, however, we have

Pr[ wt((x + z)_{B_{i,r}}) = δk, wt((y + z)_{B_{i,r}}) = δk ] = q

by definition.
With this insight, we can express E[|A_i| · |B_i|] in terms of E[|A_{i−1}| · |B_{i−1}|] for each i via

E[|A_i| · |B_i| | A_{i−1}, B_{i−1}] = Σ_{(v,w) ∈ A_{i−1} × B_{i−1}, (v,w) ≠ (x,y)} Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ] + Pr[ wt((x + z)_{B_{i,r}}) = δk, wt((y + z)_{B_{i,r}}) = δk ]
= (|A_{i−1}| · |B_{i−1}| − 1) · p² + q ≤ |A_{i−1}| · |B_{i−1}| · p² + 1.

Applying the law of total expectation, we obtain

E[|A_i| · |B_i|] = E[ E[|A_i| · |B_i| | A_{i−1}, B_{i−1}] ] ≤ E[|A_{i−1}| · |B_{i−1}|] · p² + 1.   (3)

Successive application of Equation (3) yields

E[|A_r| · |B_r|] ≤ E[|L_1| · |L_2|] · p^{2r} + r = 2^{2λd} · p^{2r} + r = O(L_r² + r),   (4)

where the additive r is dominated by the tree-traversal cost T″ below. Finally, the algorithm is repeated for P different permutations of the bit positions of the elements in L_1, L_2. In summary, the expected time complexity to build all lists becomes the sum of the T_i multiplied by P; thus, by choosing N := d/q and P = (d + 1)^{r+1} we get

T′ = P · Σ_{i=1}^{r+1} T_i ≤ (d + 1)^{r+1} · ( Σ_{i=1}^{r} N^i · |L_1| · p^{i−1} + (|L_1| · p^r)² · N^r )
   = (d + 1)^{r+1} · ( Σ_{i=1}^{r} |L_1| · (d^i/q) · (p/q)^{i−1} + (|L_1| · p^r)² · d^r/q^r )
   ≤ (d + 1)^{r+1} · d^r · ( r · |L_1| · p^{r−1}/q^r + (|L_1| · p^r)²/q^r )
   = max( 2^{λd} · p^{r−1}/q^r, (2^{λd} · p^r)²/q^r )^{1+o(1)},

where the inequality follows from p/q ≥ 1, since

q = Pr[ wt((x + z)_{B_{i,r}}) = wt((y + z)_{B_{i,r}}) = δk ] ≤ Pr[ wt((x + z)_{B_{i,r}}) = δk ] = p,

and the final equality stems from the fact that |L_1| = 2^{λd} and r = o(λd/log d) as given in the theorem.

Note that T′ disregards the fact that, no matter how small the lists in the tree become, the algorithm needs to traverse all

T″ = Õ(N^r) = Õ( (d/q)^r )

nodes of the tree. Hence, the expected time complexity of the whole algorithm is T = max(T′, T″), which proves the claim.

Let us now consider the success probability of the algorithm. Therefore, we assume that the chosen permutation distributes the weight on x + y such that in every block of length d/r the weight is equal to ωd/r, which we describe as a good permutation. The probability of a random permutation π distributing the weight in such a way is

Pr[good π] = Pr[ wt( π(x + y)_{B_{i,r}} ) = ωd/r for i = 1, ..., r ] = \binom{d/r}{ωd/r}^r / \binom{d}{ωd} ≥ (d/r + 1)^{−r}.

Thus, the probability that at least one out of the (d + 1)^{r+1} chosen permutations is good is

p_1 := Pr[at least one good π] = 1 − (1 − Pr[good π])^{(d+1)^{r+1}} ≥ 1 − (1 − (d/r + 1)^{−r})^{(d+1)^{r+1}} ≥ 1 − e^{−d}.

The algorithm succeeds whenever there exists a leaf node in the tree containing the distinguished pair (x, y). As every node in the tree is constructed based on its parent, it follows that all nodes on the path from the root to that leaf need to contain (x, y). By definition, the probability of x and y satisfying the bucket criterion at the same time (thus for the same z) is q, and since we condition on a good permutation, q is equal for every considered block. Let us define indicator variables X_j for the first level, where X_j = 1 iff the j-th node contains (x, y). Observe that the X_j for independent choices of z are independent. Thus, the number of trials until (x, y) is contained in some node on level one is distributed geometrically with parameter q.
Hence, the probability of the solution being contained in at least one node on the first level is

p_2 := Pr[ ∃(A, B, 1) ∈ S : (x, y) ∈ A × B ] = 1 − (1 − q)^N = 1 − (1 − q)^{d/q} ≥ 1 − e^{−d}.

Now, imagine the pair being contained in some level-i node. Considering that node, we have with the same probability p_2 again that at least one child contains the solution, and the same argument holds until we reach the leaves. Also, by the independent choices of z the events remain independent, which implies that the probability of (x, y) being contained in a level-r list is at least p_2^r. In summary, the success probability is

Pr[success] = p_1 · p_2^r ≥ (1 − e^{−d})^{r+1} ≥ 1 − (r + 1)/e^d ≥ 1 − d/e^d.

The proof of Theorem 2 already shows how different distributions may affect the complexity of the algorithm by changing the expected value E[|A_r| · |B_r|]. This influence of different input distributions on the algorithm's complexity is further investigated in Section 4. In the next two lemmata, we prove the exact forms of q and p to conduct the runtime analysis.
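For concrete parameters, the bound of Theorem 2 is easy to evaluate; the following helper (our addition) returns the base-2 logarithm of the three terms of the max, ignoring the 2^{o(d)} factor:

```python
def theorem2_log2_time(d, lam, log2_p, log2_q, r):
    """log2 of max(q^-r, 2^{lam d} p^{r-1}/q^r, (2^{lam d} p^r)^2/q^r)."""
    a = -r * log2_q                               # tree traversal T''
    b = lam * d + (r - 1) * log2_p - r * log2_q   # list construction
    c = 2 * (lam * d + r * log2_p) - r * log2_q   # leaf-level naive search
    return max(a, b, c)
```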
Lemma 2. Let k ∈ N, δ ∈ [0, 1/2]. If x ∈ F_2^k and z ∼ U(F_2^k), then

Pr_z[ wt(x + z) = δk ] = \binom{k}{δk} · (1/2)^k.

Proof.
Since z ∼ U(F_2^k), the probability is

|{ z ∈ F_2^k | wt(x + z) = δk }| / |F_2^k|.

To compute the numerator, note that wt(x + z) = δk means that x and z differ in δk out of k coordinates, for which there are \binom{k}{δk} possibilities. Using |F_2^k| = 2^k, the lemma follows.

Before we continue, let us make a small definition.

Definition 3.
Let k ∈ N and x, y ∈ F_2^k. Then we define D(x, y) ⊆ [k] to be the set of coordinates where x and y differ, i.e.,

D(x, y) := { i ∈ [k] | x_i ≠ y_i }.

Furthermore, let S(x, y) := [k] \ D(x, y) be the set of coordinates where they are the same.
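For integer-encoded vectors, the sets of Definition 3 can be computed directly (our addition):

```python
def differing_coords(x, y, k):
    """D(x, y): coordinates on which x and y differ (Definition 3)."""
    return {i for i in range(k) if (x >> i) & 1 != (y >> i) & 1}

def agreeing_coords(x, y, k):
    """S(x, y) = [k] \\ D(x, y)."""
    return set(range(k)) - differing_coords(x, y, k)
```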
Now we derive the exact form of the probability q of a pair with difference ωk falling into the same bucket.

Lemma 3. Let k ∈ N, δ ∈ [0, 1/2], and let x, y ∈ F_2^k with wt(x + y) = ωk and z ∼ U(F_2^k). Then

Pr_z[ wt(x + z) = wt(y + z) = δk ] = \binom{ωk}{ωk/2} · \binom{(1−ω)k}{(δ − ω/2)k} · (1/2)^k.

Proof. Let A := { z ∈ F_2^k | wt(x + z) = wt(y + z) = δk }. In analogy to Lemma 2, the probability we search for is |A| / |F_2^k| = |A| · (1/2)^k.

In the following, let ω_x := wt(x + z) and analogously ω_y := wt(y + z). Now observe that every coordinate z_i of z with i ∈ S(x, y), so belonging to the set of equal coordinates between x and y, either contributes to both ω_x and ω_y or does not affect either one of them. Let us define the number of the z_i with i ∈ S(x, y) that contribute to the weight as a := |S(x, y) ∩ D(x, z)|.

Now consider the z_i with i ∈ D(x, y). Clearly, any such z_i contributes either to ω_x or to ω_y. Thus, let us define the number of those z_i with i ∈ D(x, y) that contribute to ω_x as b_x := |D(x, y) ∩ D(x, z)| and analogously those which contribute to ω_y as b_y := |D(x, y) ∩ D(y, z)|. Obviously, we have

b_x + b_y = |D(x, y)| = ωk.   (5)

On the other hand, we are only interested in those z for which ω_x = ω_y = δk, which yields the two equations

ω_x = a + b_x = δk,   (6)
ω_y = a + b_y = δk.   (7)

All three equations together yield the unique solution

b_x = b_y = ωk/2,   a = (δ − ω/2)k.

This shows the following: If z ∈ A, it is necessary that z differs from x (analogously y) in exactly

- ωk/2 out of the ωk coordinates of D(x, y), and
- (δ − ω/2)k out of the (1 − ω)k coordinates of S(x, y).

Thus, because we can freely combine both conditions, in total there are

|A| = \binom{ωk}{ωk/2} · \binom{(1−ω)k}{(δ − ω/2)k}

different values for z, finishing the proof.
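Both bucketing probabilities can now be computed exactly from Lemmas 2 and 3 (our addition; math.comb conveniently returns 0 whenever the lower index exceeds the upper one):

```python
from math import comb

def p_bucket(k, delta):
    """Lemma 2: probability that a fixed vector has weight delta*k after
    adding a uniform z of length k."""
    return comb(k, round(delta * k)) / 2 ** k

def q_bucket(k, delta, omega):
    """Lemma 3: probability that both elements of a pair at distance
    omega*k fall into the same bucket."""
    wk = round(omega * k)
    dk = round((delta - omega / 2) * k)
    if wk % 2 or dk < 0:          # weight cannot split evenly / infeasible
        return 0.0
    return comb(wk, wk // 2) * comb(k - wk, dk) / 2 ** k
```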
Now we are ready to prove Theorem 1 about the time complexity of Algorithm 1 for solving the CP_{d,λ,ω} problem. For convenience, we restate the theorem here.

Theorem 1. Let ω ∈ [0, 1/2] and λ ∈ [0, 1]. Then Algorithm 1 solves the CP_{d,λ,ω} problem with overwhelming success probability in expected time 2^{ϑd(1+o(1))}, where

ϑ = (1 − ω) · ( 1 − H((δ⋆ − ω/2)/(1 − ω)) )   for ω ≤ ω⋆,
ϑ = 2λ + H(ω) − 1                             for ω > ω⋆,

with δ⋆ := H^{−1}(1 − λ) and ω⋆ := 2δ⋆(1 − δ⋆).

Proof. First, let us give the exact form of log p and log q, using Stirling's formula to approximate the binomial coefficients in Lemmas 2 and 3. By setting the block width k = d/r we get

log q = (1 − ω) · ( H((δ − ω/2)/(1 − ω)) − 1 ) · (d/r) · (1 + o(1))   and
log p = ( H(δ) − 1 ) · (d/r) · (1 + o(1)).

Now, let us reconsider the running time given in Theorem 2 as

T = max( q^{−r} (a), 2^{λd} · p^{r−1}/q^r (b), (2^{λd} · p^r)²/q^r (c) )^{1+o(1)},

where r = λd/log²(d). We now show that the running time for all values of δ ≥ δ⋆ := H^{−1}(1 − λ) is solely dominated by (c). Observe that we have (c) ≥ (b) whenever

2^{λd} · p^{r+1} ≥ 1 ⇔ H(δ) ≥ 1 − λ · r/(r + 1) ⇔ δ ≥ H^{−1}(1 − λ · r/(r + 1)) → H^{−1}(1 − λ) = δ⋆,

since r = ω(1). Also, we have (c) ≥ (a) for the same choice of δ, as

2^{λd} · p^r ≥ 1 ⇔ δ ≥ H^{−1}(1 − λ) = δ⋆.

Thus, for all choices of δ ≥ δ⋆ the running time is (T_δ)^{1+o(1)} with

ϑ⋆(δ) := log(T_δ)/d = 2(λ + H(δ) − 1) + (1 − ω) · ( 1 − H((δ − ω/2)/(1 − ω)) ).
Now, minimizing ϑ⋆ yields a global minimum at δ_min = (1 − √(1 − 2ω))/2, attaining a value of

ϑ⋆(δ_min) = 2λ + H(ω) − 1.

As we are restricted to values of δ which are at least δ⋆, solving δ_min ≥ δ⋆ for ω yields

δ_min ≥ δ⋆ ⇔ ω ≥ 2δ⋆(1 − δ⋆) = ω⋆.

This proves the claim of the theorem whenever ω > ω⋆. For all other values of ω we simply choose δ = δ⋆, which yields

ϑ = ϑ⋆(δ⋆) = (1 − ω) · ( 1 − H((δ⋆ − ω/2)/(1 − ω)) )   for ω ≤ ω⋆,

as claimed.

Now, to boost the expected running time 2^{ϑd(1+o(1))} of the algorithm to actually being attained with overwhelming probability, we use a standard Markov argument. Let X denote the random variable describing the running time of the algorithm. Then the probability that the algorithm needs more time than 2^{√d} · E[X] to finish is

Pr[ X ≥ 2^{√d} · E[X] ] ≤ E[X] / (2^{√d} · E[X]) = 2^{−√d};

equivalently, the algorithm finishes in less time than 2^{√d} · E[X] = 2^{ϑd(1+o(1))} with overwhelming probability. Also, a standard application of the union bound yields that the probability that the algorithm both finishes within the claimed time and succeeds in finding the solution is still overwhelming.

The theorem shows that whenever ω > ω⋆ our algorithm attains the optimal time complexity for uniformly random lists as given in Equation (1). Additionally, our algorithm reaches the time lower bound for locality-sensitive hashing based algorithms for all values of ω whenever the input list sizes are subexponential in the dimension d, which is shown in the following lemma.
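Numerically, the exponent ϑ of Theorem 1 is now a one-liner on top of the earlier sketches (our addition); evaluating it over ω reproduces the solid curves of Figure 1:

```python
def theta(lam, omega):
    """Runtime exponent of Theorem 1: running time is 2^{theta(lam, omega)*d}
    up to subexponential factors (uses H and H_inv from the sketch above)."""
    delta_star = H_inv(1 - lam)
    omega_star = 2 * delta_star * (1 - delta_star)
    if omega > omega_star:
        return 2 * lam + H(omega) - 1
    return (1 - omega) * (1 - H((delta_star - omega / 2) / (1 - omega)))
```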
Lemma 4. Let ω ∈ [0, 1/2], and let ϑ be as defined in Theorem 1. Then we have

lim_{λ→0} ϑ/λ = 1/(1 − ω).

Proof.
Note that for λ converging to zero, δ⋆ = H^{−1}(1 − λ) approaches 1/2. This implies ω⋆ := 2δ⋆(1 − δ⋆) = 1/2, and hence for all choices of ω we have

ϑ = (1 − ω) · ( 1 − H((δ⋆ − ω/2)/(1 − ω)) ).

Now, for this choice of ϑ, May and Ozerov [17, Corollary 1] already showed the statement of this lemma by applying L'Hôpital's rule twice.

For convenience, we restate all parameter choices of Algorithm 1 for solving the CP_{d,λ,ω} in the following overview:

r = λd/log²(d),   P = (d + 1)^{r+1},   k = d/r,   N = d/q,

where

q = \binom{ωk}{ωk/2} · \binom{(1−ω)k}{(δ − ω/2)k} · (1/2)^k,
δ = δ⋆ for ω ≤ 2δ⋆(1 − δ⋆), and δ = (1 − √(1 − 2ω))/2 else, with δ⋆ := H^{−1}(1 − λ).   (8)
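The overview of Equation (8) translates into the following parameter selection (our addition, reusing H_inv and q_bucket from the earlier sketches; the asymptotic choice of r is rounded and forced to divide d, so concrete instantiations may require tuning):

```python
import math

def parameters(d, lam, omega):
    """Parameter choices following Equation (8)."""
    r = max(1, round(lam * d / math.log2(d) ** 2))
    while d % r:                            # the analysis assumes r | d
        r -= 1
    k = d // r                              # block width
    delta_star = H_inv(1 - lam)
    if omega <= 2 * delta_star * (1 - delta_star):
        delta = delta_star
    else:
        delta = (1 - math.sqrt(1 - 2 * omega)) / 2
    q = q_bucket(k, delta, omega)           # Lemma 3
    N = round(d / q) if q > 0 else None     # buckets per level
    P = (d + 1) ** (r + 1)                  # permutations
    return dict(r=r, k=k, delta=delta, N=N, P=P)
```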
4 Different Input Distributions

In this section, we show how to adapt the analysis of Algorithm 1 to variable input distributions. Therefore, we first reformulate Theorem 2 in Corollary 2 for the case of considering the CP_{d,λ,ω} over an arbitrary distribution D. As already indicated in the proof of Theorem 2, this reformulation depends on the expected value E of the cost of the naive search at the bottom of the computation tree, which is highly influenced by the distribution D. Then, we show how to compute E and how to upper bound it effectively. Finally, we give upper bounds for the time complexity of the algorithm to solve the CP_{d,λ,ω} over some generic distributions. These examples suggest that the algorithm is best suited for distributions D where the weight of the sum v + w of elements v, w ∼ D concentrates at d/2; this behavior seems quite natural, as in this case the solution is most distinguishable from random input pairs. Let us start with the reformulation of the theorem.

Corollary 2.
Let D be some distribution over F_2^d, let q and p be as defined in Equation (2), ω ∈ [0, 1/2], λ ∈ [0, 1] and r = λd/log²(d). Also let E = E[|A| · |B|] for A and B in line 13 of Algorithm 1 (where the expectation is taken over the distribution of the input lists and the random choices of the algorithm). Then Algorithm 1 solves the CP_{d,λ,ω} problem over D in time

max( q^{−r}, 2^{λd} · p^{r−1}/q^r, E/q^r )^{1+o(1)}

with success probability overwhelming in d.

Proof. The proof follows along the lines of the proof of Theorem 2, by observing that T_{r+1} = N^r · E and that the expected time complexity is again amplified to being attained with overwhelming probability by a Markov argument similar to the proof of Theorem 1.

In the next lemma, we show how to upper bound the value of E.

Lemma 5 (Expectation of Naive Search). Let D be some distribution over F_2^d, ω ∈ [0, 1/2], λ ∈ [0, 1] and r = λd/log²(d). Also let E = E[|A| · |B|] for A and B in line 13 of Algorithm 1 when solving some instance of the CP_{d,λ,ω} over D (where the expectation is taken over the distribution of the input lists and the random choices of the algorithm). Then we have

E ≤ 2^{2λd} · ∏_{i=1}^{r} α_i + 4r · 2^{λd} · p^r,

where α_i := Pr_{v,w∼D}[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ].

Proof. Given in Appendix A.

While Lemma 5 gives an upper bound on the required expectation, it is not very handy. In the next lemma, we show how to further bound this expectation and how it affects the running time of the algorithm.
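For a concrete input distribution, the α_i of Lemma 5 can be estimated by Monte-Carlo sampling. The following sketch (our addition) models a simple example distribution in which the difference v + w has fixed relative weight γ on the block and v itself is uniform on the block:

```python
import random

def alpha_mc(k, delta, gamma, trials=100_000):
    """Monte-Carlo estimate of alpha_i from Lemma 5 for one block of width k."""
    target = round(delta * k)
    hits = 0
    for _ in range(trials):
        diff = sum(1 << i for i in random.sample(range(k), round(gamma * k)))
        v = random.getrandbits(k)     # block of v
        w = v ^ diff                  # block of w: differs in gamma*k coordinates
        z = random.getrandbits(k)     # bucket center
        if (bin(v ^ z).count("1") == target
                and bin(w ^ z).count("1") == target):
            hits += 1
    return hits / trials
```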
Lemma 6 (Complexity for Arbitrary Distributions). Let D be some distribution over F_2^d, r := λd/log²(d), ω ∈ [0, 1/2] and λ ∈ [0, 1]. Also let E = E[|A| · |B|] for A and B in line 13 of Algorithm 1 when solving some instance of the CP_{d,λ,ω} over D (where the expectation is taken over the distribution of the input lists and the random choices of the algorithm). Then Algorithm 1 solves the CP_{d,λ,ω} over D in time

max( q^{−r}, 2^{λd} · p^{r−1}/q^r, 2^{εd}/q^r )^{1+o(1)},

where

ε = 2λ − min_{i ∈ [r], γ ∈ [0,1]} [ (1 − γ) · ( 1 − H((δ − γ/2)/(1 − γ)) ) − r · log(p_{i,γk})/d ],

with p_{i,γk} := Pr[ wt((v + w)_{B_{i,r}}) = γk ].

Proof. Given in Appendix A.

Note that if it further holds that for v ∼ D each of the r blocks of v is identically distributed, we can further simplify the term ε from Lemma 6. In this case we have p_{i,γk}^r ≤ Pr[wt(v + w) = γd] := p_{γd}, and thus we get

ε = 2λ − min_{γ ∈ [0,1]} [ (1 − γ) · ( 1 − H((δ − γ/2)/(1 − γ)) ) − log(p_{γd})/d ].

Now, if we are given an arbitrary distribution D, we can maximize ε over γ. Then, similarly to the proof of Theorem 1, we can derive a value for δ minimizing the overall time complexity.

We performed this maximization and optimization numerically for some generic input distributions. We considered distributions where the weight of input vectors is distributed binomially, chosen according to a Poisson distribution, or fixed to a specific value. This means, first a weight is sampled according to the chosen distribution and then a vector of that weight is selected uniformly among all vectors of that weight.

Figure 3: Time complexity exponent ϑ as a function of the weight ω of the closest pair for different input list distributions and two different list sizes (panels (a) and (b)); the expected weight of the input elements increases from left to right among the depicted curves.

The running time of Algorithm 1 for solving the CP_{d,λ,ω} over the considered distributions seems to depend only on the expected weight of the vectors contained in the input lists. That means the time complexity for input lists containing random vectors whose weight is either fixed to γd or binomially or Poisson distributed with expectation γd is equal. This can possibly be explained by the low variance of all these distributions, which implies a high concentration around this expected weight.

We see in Figure 3 that the value of ω from which on the complexity becomes quadratic in the list sizes shifts to the left. This behavior stems from the fact that the expected weight of a sum of elements is no longer d/2, but roughly 2γ(1 − γ)d. What also stands out is that the complexity for ω = 0 is no longer linear in the list sizes. The reason for this is that the probability of random pairs falling into the same bucket and the probability of the closest pair falling into the same bucket converge for decreasing weight of the input list elements. This indicates that for input distributions with smaller expected weight a different bucketing criterion might be beneficial. We pose this as an open question for further research.

5 Practical Results

In this section, we give experimental results on the performance of a proof of concept implementation of our new algorithm. These experiments verify the performance gain of our algorithm over a naive quadratic search approach.
We also verify the numerical estimates of the algorithm's performance on different input distributions from the previous section and give some related practical improvements to our algorithm. Our implementation is publicly available at https://github.com/submission-nn/nn-algorithm.

Before discussing the benchmark results, let us first briefly describe some of the practical improvements we introduced in our implementation, which differ from the description in Section 3. We implemented a true depth-first search rather than the iterative description given previously; the iterative description just allowed for a more convenient analysis. Thus, our algorithm needs to store only the lists of a single path from the root to a leaf node at any time. Also, as all lists of subsequent levels are subsets of previous ones, we do not create r different lists. We rather rearrange the elements of the input list such that elements belonging to the list of the subsequent level are consecutive, making it sufficient to just memorize the range of elements that belong to the next-level list (see the sketch below). This way, we only need to store the input list plus two integer markers for each level.

Also, it turns out that in practice often a small depth of the tree (not exceeding 8 in our experiments) is already sufficient to achieve good runtime results. Regarding the branching factor N of the tree, we achieve optimal results either for values close to the analytic choice d/q or for values being significantly smaller. The case of using a very small branching factor can be seen as a pruning strategy, similar to the one used in lattice enumeration algorithms for shortest vector search [3]. Additionally, we benchmarked three different strategies for the weight criterion:

1. Strictly enforcing a weight of δk in each block, as described in our algorithm.
2. Allowing for a small deviation ±ε around δk.
3. Allowing for weights of at most δk.

Additionally, we introduced a threshold for the size of the lists in the tree from which on the computation of further leaves is aborted and naive search is used instead.

Figure 4: Runtime results in seconds on a logarithmic scale (y-axis) as a function of the distance ω of the closest pair (x-axis) on small random input lists, for d ∈ {32, 64, 128, 256}. The dotted, dashed and dash-dotted lines indicate the runtime results for the different bucketing strategies used (ε = 0, ε = 1, ≤ δk). The straight horizontal line is the time used by a naive quadratic search.

Figure 4 shows the runtime results for the different bucket criteria on small input lists containing random elements. Here, each data point was averaged over 50 measurements. The experimental results clearly indicate a significant gain over the quadratic search approach. The less significant gain for small dimension d is due to the reduced number of possible blocks, or equivalently the low depth of the computation tree, which lets the algorithm not reach its full potential. In the case of small input lists, we observe that a bucketing strategy that allows a deviation of ε = 1 from δk is beneficial for most values of d.

Figure 5 shows the same experiments performed on larger input lists. Besides a more significant improvement over the naive search, we can observe that the bucketing criterion that uses δk as an upper bound becomes more beneficial for nearly all values of ω and d.
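The in-place rearrangement described above can be sketched as follows (our addition; the actual implementation in the repository is C++ and more involved):

```python
def partition_level(L, lo, hi, keep):
    """Partition L[lo:hi] in place so that the elements satisfying the
    bucket criterion `keep` form the consecutive prefix L[lo:mid]."""
    mid = lo
    for j in range(lo, hi):
        if keep(L[j]):
            L[mid], L[j] = L[j], L[mid]
            mid += 1
    return lo, mid      # the next level works on L[lo:mid]
```

Each level of the depth-first search then remembers only the pair (lo, mid) instead of a copy of the filtered list; when the recursion returns, the full range L[lo:hi] is still available (merely permuted) for the sibling buckets.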
Figure 5: Runtime results in seconds on a logarithmic scale (y-axis) as a function of the distance ω of the closest pair (x-axis) on larger random input lists, for d ∈ {32, 64, 128, 256}. The dotted, dashed and dash-dotted lines indicate the runtime results for the different bucketing strategies used (ε = 0, ε = 1, ≤ δk). The straight horizontal line is the time used by a naive quadratic search.

Eventually, Figure 6 shows the experimental runtime results on input lists whose elements are drawn from a different input distribution, analyzed in Section 4. Here the distribution is the uniform distribution over vectors of weight γd. One can observe that for growing d the shape of the graph resembles the theoretical results from Figure 3.

Figure 6: Runtime results in seconds on a logarithmic scale (y-axis) as a function of the distance ω of the closest pair (x-axis) on input lists containing random elements of weight γd, for d ∈ {32, 64, 128} and several values of γ. The densely dashed line (U) indicates the runtime on uniformly random lists.

References

[1] Alman, J.: An illuminating algorithm for the light bulb problem. In: Fineman, J.T., Mitzenmacher, M. (eds.) 2nd Symposium on Simplicity in Algorithms, SOSA@SODA 2019, January 8-9, 2019, San Diego, CA, USA. OASIcs, vol. 69, pp. 2:1–2:11. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019), https://doi.org/10.4230/OASIcs.SOSA.2019.2
[2] Andoni, A., Razenshteyn, I.: Optimal data-dependent hashing for approximate near neighbors. In: Proceedings of the forty-seventh annual ACM Symposium on Theory of Computing. pp. 793–801 (2015)
[3] Aono, Y., Nguyen, P.Q., Seito, T., Shikata, J.: Lower bounds on lattice enumeration with extreme pruning. In: Annual International Cryptology Conference. pp. 608–637. Springer (2018)
[4] Becker, A., Ducas, L., Gama, N., Laarhoven, T.: New directions in nearest neighbor searching with applications to lattice sieving. In: Krauthgamer, R. (ed.) 27th Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 10–24. ACM-SIAM, Arlington, VA, USA (Jan 10–12, 2016)
[5] Bentley, J.L.: Multidimensional divide-and-conquer. Communications of the ACM 23(4), 214–229 (1980)
[6] Both, L., May, A.: Decoding linear codes with high error rate and its impact for LPN security. In: Lange, T., Steinwandt, R. (eds.) Post-Quantum Cryptography - 9th International Conference, PQCrypto 2018. pp. 25–46. Springer, Heidelberg, Germany, Fort Lauderdale, Florida, United States (Apr 9–11, 2018)
[7] Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: Binary robust independent elementary features. In: European Conference on Computer Vision. pp. 778–792. Springer (2010)
[8] Dubiner, M.: Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem. IEEE Transactions on Information Theory 56(8), 4166–4179 (2010)
[9] Esmaeili, M.M., Ward, R.K., Fatourechi, M.: A fast approximate nearest neighbor search algorithm in the Hamming space. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(12), 2481–2488 (2012)
[10] Gueye, C.T., Klamti, J.B., Hirose, S.: Generalization of BJMM-ISD using May-Ozerov nearest neighbor algorithm over an arbitrary finite field F_q. In: International Conference on Codes, Cryptology, and Information Security. pp. 96–109. Springer (2017)
[11] Hirose, S.: May-Ozerov algorithm for nearest-neighbor problem over F_q and its application to information set decoding. In: International Conference for Information Technology and Communications. pp. 115–126. Springer (2016)
[12] Indyk, P., Motwani, R.: Approximate nearest neighbors: Towards removing the curse of dimensionality. In: 30th Annual ACM Symposium on Theory of Computing. pp. 604–613. ACM Press, Dallas, TX, USA (May 23–26, 1998)
[13] Karppa, M., Kaski, P., Kohonen, J.: A faster subquadratic algorithm for finding outlier correlations. In: Krauthgamer, R. (ed.) 27th Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 1288–1305. ACM-SIAM, Arlington, VA, USA (Jan 10–12, 2016)
[14] Khuller, S., Matias, Y.: A simple randomized sieve algorithm for the closest-pair problem. Information and Computation 118(1), 34–37 (1995)
[15] Lu, J., Liong, V.E., Zhou, X., Zhou, J.: Learning compact binary face descriptor for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(10), 2041–2056 (2015)
[16] Marchini, J., Donnelly, P., Cardon, L.R.: Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature Genetics 37(4), 413–417 (2005)
[17] May, A., Ozerov, I.: On computing nearest neighbors with applications to decoding of binary linear codes. In: Oswald, E., Fischlin, M. (eds.) Advances in Cryptology – EUROCRYPT 2015, Part I. Lecture Notes in Computer Science, vol. 9056, pp. 203–228. Springer, Heidelberg, Germany, Sofia, Bulgaria (Apr 26–30, 2015)
[18] Motwani, R., Naor, A., Panigrahy, R.: Lower bounds on locality sensitive hashing.
In: Proceedings of the twenty-second annual Symposium on Computational Geometry. pp. 154–157 (2006)
[19] Musani, S.K., Shriner, D., Liu, N., Feng, R., Coffey, C.S., Yi, N., Tiwari, H.K., Allison, D.B.: Detection of gene × gene interactions in genome-wide association studies of human population data. Human Heredity 63(2), 67–84 (2007)
[20] Strecha, C., Bronstein, A., Bronstein, M., Fua, P.: LDAHash: Improved matching with smaller descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(1), 66–78 (2011)
[21] Valiant, G.: Finding correlations in subquadratic time, with applications to learning parities and juntas. In: 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science. pp. 11–20. IEEE (2012)
[22] Valiant, L.G.: Functionality in neural nets. In: COLT. vol. 88, pp. 28–39 (1988)
[23] Xie, N., Xu, S., Xu, Y.: A new coding-based algorithm for finding closest pair of vectors. Theoretical Computer Science 782, 129–144 (2019)

A Proofs for General Distributions
In this section, we give the proofs for the lemmata regarding the performance of our algorithm on different input distributions, which were omitted in the main body of the paper.
A.1 Proof of Lemma 5
Similar to the proof of Theorem 2, let us bound E[|A_i| · |B_i|] in terms of E[|A_{i−1}| · |B_{i−1}|], E[|A_{i−1}|] and E[|B_{i−1}|] for each i:

E[|A_i| · |B_i| | A_{i−1}, B_{i−1}]
= Σ_{v ∈ A_{i−1}\{x}, w ∈ B_{i−1}\{y}} Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ]   (each summand being α_i)
+ Σ_{v ∈ A_{i−1}} Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((y + z)_{B_{i,r}}) = δk ]   (each summand being ≤ p)
+ Σ_{w ∈ B_{i−1}} Pr[ wt((x + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ]   (each summand being ≤ p)
+ Pr[ wt((x + z)_{B_{i,r}}) = δk, wt((y + z)_{B_{i,r}}) = δk ]
≤ α_i · |A_{i−1}| · |B_{i−1}| + p · (|A_{i−1}| + |B_{i−1}| + 1),

and hence E[|A_i| · |B_i|] ≤ α_i · E[|A_{i−1}| · |B_{i−1}|] + p · (E[|A_{i−1}|] + E[|B_{i−1}|] + 1). Again, applying this equation successively, we obtain

E = E[|A_r| · |B_r|] ≤ 2^{2λd} · ∏_{i=1}^{r} α_i + 4 · 2^{λd} · Σ_{i=1}^{r} ( ∏_{j=0}^{i−2} α_{r−j} ) · p^{r−i+1} ≤ 2^{2λd} · ∏_{i=1}^{r} α_i + 4r · 2^{λd} · p^r.

A.2 Proof of Lemma 6
Taking the result for E from Lemma 5 and plugging it into the runtime formula from Corollary 2, we get that the CP_{d,λ,ω} problem over D can be solved with probability overwhelming in d in time

max( q^{−r}, 2^{λd} · p^{r−1}/q^r, E/q^r )^{1+o(1)} ≤ max( q^{−r}, 2^{λd} · p^{r−1}/q^r, 2^{2λd} · ∏_{i=1}^{r} α_i / q^r )^{1+o(1)},

since the right summand 4r · 2^{λd} · p^r/q^r of E/q^r is asymptotically smaller than the second entry in the max, i.e., 2^{λd} · p^{r−1}/q^r. Thus, it suffices to find an easier upper bound for the first summand S := 2^{2λd} · ∏_{i=1}^{r} α_i. Recalling α_i = Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk ], we receive

S ≤ 2^{2λd} · ( max_{i ∈ [r]} α_i )^r
  = 2^{2λd} · max_{i ∈ [r]} ( Σ_{j=0}^{k} q_{i,j} · Pr[ wt((v + w)_{B_{i,r}}) = j ] )^r
  ≤ 2^{2λd + o(d)} · ( max_{i ∈ [r], j ∈ [k] ∪ {0}} q_{i,j} · Pr[ wt((v + w)_{B_{i,r}}) = j ] )^r
  = 2^{2λd + o(d)} · ( max_{i ∈ [r], γ ∈ [0,1]} q_{i,γk} · Pr[ wt((v + w)_{B_{i,r}}) = γk ] )^r,

where q_{i,γk} = Pr[ wt((v + z)_{B_{i,r}}) = δk, wt((w + z)_{B_{i,r}}) = δk | wt((v + w)_{B_{i,r}}) = γk ]. Lemma 3 lets us rewrite this probability as

q_{i,γk} = \binom{γk}{γk/2} · \binom{(1−γ)k}{(δ − γ/2)k} · (1/2)^k ≤ 2^{−( 1 − H((δ − γ/2)/(1 − γ)) ) · (1−γ)k}.

We end up with

S ≤ 2^{2λd + o(d)} · max_{i ∈ [r], γ ∈ [0,1]} 2^{ r · ( −(1 − H((δ − γ/2)/(1 − γ))) · (1−γ)k + log p_{i,γk} ) }
  = 2^{ ( 2λ + max_{i ∈ [r], γ ∈ [0,1]} [ −(1 − H((δ − γ/2)/(1 − γ))) · (1−γ) + (r/d) · log p_{i,γk} ] ) d + o(d) }
  = 2^{ ( 2λ − min_{i ∈ [r], γ ∈ [0,1]} [ (1 − H((δ − γ/2)/(1 − γ))) · (1−γ) − (r/d) · log p_{i,γk} ] ) d + o(d) },

with p_{i,γk} := Pr[ wt((v + w)_{B_{i,r}}) = γk ], which proves the claim.