Continuous LWE
CContinuous LWE
Joan Bruna ∗ a,b,c , Oded Regev † a , Min Jae Song ‡ a , and Yi Tang aa Courant Institute of Mathematical Sciences, New York University, New York b Center for Data Science, New York University c Institute for Advanced Study, PrincetonMay 20, 2020
Abstract
We introduce a continuous analogue of the Learning with Errors (LWE) problem, which wename CLWE. We give a polynomial-time quantum reduction from worst-case lattice problems toCLWE, showing that CLWE enjoys similar hardness guarantees to those of LWE. Alternatively,our result can also be seen as opening new avenues of (quantum) attacks on lattice problems. Ourwork resolves an open problem regarding the computational complexity of learning mixtures ofGaussians without separability assumptions (Diakonikolas 2016, Moitra 2018). As an additionalmotivation, (a slight variant of) CLWE was considered in the context of robust machine learning(Diakonikolas et al. FOCS 2017), where hardness in the statistical query (SQ) model was shown;our work addresses the open question regarding its computational hardness (Bubeck et al. ICML2019).
The Learning with Errors (LWE) problem has served as a foundation for many lattice-based cryp-tographic schemes [Pei16]. Informally, LWE asks one to solve noisy random linear equations. To bemore precise, the goal is to find a secret vector s ∈ Z nq given polynomially many samples of the form( a i , b i ), where a i ∈ Z nq is uniformly chosen and b i ≈ (cid:104) a i , s (cid:105) (mod q ). In the absence of noise, LWEcan be efficiently solved using Gaussian elimination. However, LWE is known to be hard assuminghardness of worst-case lattice problems such as Gap Shortest Vector Problem (GapSVP) or Short-est Independent Vectors Problem (SIVP) in the sense that there is a polynomial-time quantumreduction from these worst-case lattice problems to LWE [Reg05].In this work, we introduce a new problem, called Continuous LWE (CLWE). As the namesuggests, this problem can be seen as a continuous analogue of LWE, where equations in Z nq arereplaced with vectors in R n (see Figure 1). More precisely, CLWE considers noisy inner products z i ≈ γ (cid:104) y i , w (cid:105) (mod 1), where the noise is drawn from a Gaussian distribution of width β > γ > w ∈ R n is a secret unit vector, and the public vectors y i ∈ R n aredrawn from the standard Gaussian. Given polynomially many samples of the form ( y i , z i ), CLWEasks one to find the secret direction w . ∗ This work is partially supported by the Alfred P. Sloan Foundation, NSF RI-1816753, NSF CAREER CIF1845360, and the Institute for Advanced Study. † Research supported by the Simons Collaboration on Algorithms and Geometry, a Simons Investigator Award,and by the National Science Foundation (NSF) under Grant No. CCF-1814524. ‡ Research supported by the National Science Foundation (NSF) under Grant No. CCF-1814524. a r X i v : . [ c s . CC ] M a y igure 1: Scatter plot of two-dimensional CLWE samples. Color indicates the last ( z ) coordinate.One can also consider a closely related homogeneous variant of CLWE (see Figure 2). Thisdistribution, which we call homogeneous CLWE, can be obtained by essentially conditioning on z i ≈
0. It is a mixture of “Gaussian pancakes” of width ≈ β/γ in the secret direction and width 1in the remaining n − ≈ /γ . (See Definition 2.19 for the precise statement.)Figure 2: Left: Scatter plot of two-dimensional homogeneous CLWE samples. Right: Unnormal-ized probability densities of homogeneous CLWE (blue) and Gaussian (orange) along the hiddendirection.Our main result is that CLWE (and homogeneous CLWE) enjoy hardness guarantees similar tothose of LWE. Theorem 1.1 (Informal) . Let n be an integer, β = β ( n ) ∈ (0 , and γ = γ ( n ) ≥ √ n such thatthe ratio γ/β is polynomially bounded. If there exists an efficient algorithm that solves CLWE β,γ , hen there exists an efficient quantum algorithm that approximates worst-case lattice problems towithin polynomial factors. Although we defined CLWE above as a search problem of finding the hidden direction, Theo-rem 1.1 is actually stronger, and applies to the decision variant of CLWE in which the goal is todistinguish CLWE samples ( y i , z i ) from samples where the noisy inner product z i is replaced by arandom number distributed uniformly on [0 ,
1) (and similarly for the homogeneous variant).
Motivation: Lattice algorithms.
Our original motivation to consider CLWE is as a possibleapproach to finding quantum algorithms for lattice problems. Indeed, the reduction above (just likethe reduction to LWE [Reg05]), can be interpreted in an algorithmic way: in order to quantumlysolve worst-case lattice problems, “all” we have to do is solve CLWE (classically or quantumly).The elegant geometric nature of CLWE opens up a new toolbox of techniques that can potentiallybe used for solving lattice problems, such as sum-of-squares-based techniques and algorithms forlearning mixtures of Gaussians [MV10]. Indeed, some recent algorithms (e.g., [KKK19, RY20])solve problems that include CLWE or homogeneous CLWE as a special case (or nearly so), yet asfar as we can tell, so far none of the known results leads to an improvement over the state of theart in lattice algorithms.To demonstrate the usefulness of CLWE as an algorithmic target, we show in Section 7 asimple moment-based algorithm that solves CLWE in time exp( γ ). Even though this does notimply subexponential time algorithms for lattice problems (since Theorem 1.1 requires γ > √ n ),it is interesting to contrast this algorithm with an analogous algorithm for LWE by Arora andGe [AG11]. The two algorithms have the same running time (where γ is replaced by the absolutenoise αq in the LWE samples), and both rely on related techniques (moments in our case, poweringin Arora-Ge’s), yet the Arora-Ge algorithm is technically more involved than our rather trivialalgorithm (which just amounts to computing the empirical covariance matrix). We interpret thisas an encouraging sign that CLWE might be a better algorithmic target than LWE. Motivation: Hardness of learning Gaussian mixtures.
Learning mixtures of Gaussians isa classical problem in machine learning [Pea94]. Efficient algorithms are known for the task if theGaussian components are guaranteed to be sufficiently well separated (e.g., [Das99, VW02, AK05,DS07, BV08, RV17, HL18, KSS18, DKS18]). Without such strong separation requirements, it isknown that efficiently recovering the individual components of a mixture (technically known as“parameter estimation”) is in general impossible [MV10]; intuitively, this exponential informationtheoretic lower bound holds because the Gaussian components “blur into each other”, despite beingmildly separated pairwise.This leads to the question of whether there exists an efficient algorithm that can learn mixturesof Gaussians without strong separation requirement, not in the above strong parameter estimationsense (which is impossible), but rather in the much weaker density estimation sense, where the goalis merely to output an approximation of the given distribution’s density function. See [Dia16, Moi18]for the precise statement and [DKS17] where a super-polynomial lower bound for density estimationis shown in the restricted statistical query (SQ) model [Kea98, Fel+17]. Our work provides anegative answer to this open question, showing that learning Gaussian mixtures is computationallydifficult even if the goal is only to output an estimate of the density (see Proposition 5.2). It isworth noting that our hard instance has almost non-overlapping components, i.e., the pairwisestatistical distance between distinct Gaussian components is essentially 1, a property shared by theSQ-hard instance of [DKS17]. 3 otivation: Robust machine learning.
Variants of CLWE have already been analyzed in thecontext of robust machine learning [Bub+19], in which the goal is to learn a classifier that is robustagainst adversarial examples at test time [Sze+14]. In particular, Bubeck et al. [Bub+19] use theSQ-hard Gaussian mixture instance of Diakonikolas et al. [DKS17] to establish SQ lower bounds forlearning a certain binary classification task, which can be seen as a variant of homogeneous CLWE.The key difference between our distribution and that of [DKS17, Bub+19] is that our distributionhas equal spacing between the “layers” along the hidden direction, whereas their “layers” arecentered around roots of Hermite polynomials (the goal being to exactly match the lower momentsof the standard Gaussian). The connection to lattices, which we make for the first time here,answers an open question by Bubeck et al. [Bub+19].As additional evidence of the similarity between homogeneous CLWE and the distribution con-sidered in [DKS17, Bub+19], we prove a super-polynomial SQ lower bound for homogeneous CLWE(even with super-polynomial precision). For γ = Ω( √ n ), this result translates to an exponentialSQ lower bound for exponential precision, which corroborates our computational hardness resultbased on worst-case lattice problems. The uniform spacing in the hidden structure of homogeneousCLWE leads to a simplified proof of the SQ lower bound compared to previous works, which con-sidered non-uniform spacing between the Gaussian components. Note that computational hardnessdoes not automatically imply SQ hardness.Bubeck et al. [Bub+19] were also interested in a variant of the learning problem where insteadof one hidden direction, there are m ≥ m = 1 case above are replaced with “Gaussian baguettes” in the case m = 2, forming an orthogonal grid in the secret two-dimensional space. As we show in Section 9,our computational hardness easily extends to the m > m > ≈ /γ (whichcan be as high as ≈ / √ n if we want our hardness to hold) to ≈ √ m/γ (which can be as high as ≈ m ≈ n ). This is a desirable feature for showing hardness of robust machine learning. Motivation: Cryptographic applications.
Given the wide range of cryptographic applica-tions of LWE [Pei16], it is only natural to expect that CLWE would also be useful for somecryptographic tasks, a question we leave for future work. CLWE’s clean and highly symmetricdefinition should make it a better fit for some applications; its continuous nature, however, mightrequire a discretization step due to efficiency considerations.
Strong analogy with LWE.
As argued above, there are apparently nontrivial differences be-tween CLWE and LWE, especially in terms of possible algorithmic approaches. However, there isundoubtedly also strong similarity between the two. Indeed, the origins of both LWE and CLWEcan be traced back to the seminal work of Ajtai and Dwork [AD97] (see also [Reg04]). In termsof parameters, the γ parameter in CLWE (density of layers) plays the role of the absolute noiselevel αq in LWE. And the β parameter in CLWE plays the role of the relative noise parameter α in LWE. Using this correspondence between the parameters, the hardness proved for CLWE inTheorem 1.1 is essentially identical to the one proved for LWE in [Reg05]. The similarity extendseven to the noiseless case, where α = 0 in LWE and β = 0 in CLWE. In particular, in Section 6we present an efficient LLL-based algorithm for solving noiseless CLWE, which is analogous toGaussian elimination for noiseless LWE. 4 cknowledgements. We thank Aravindan Vijayaraghavan and Ilias Diakonikolas for usefulcomments.
Broadly speaking, our proof follows the iterative structure of the original LWE hardness proof [Reg05](in fact, one might say most of the ingredients for CLWE were already present in that 2005 pa-per!). We also make use of some recent techniques, such as a way to reduce to decision problemsdirectly [PRS17].In more detail, as in previous work, our main theorem boils down to solving the followingproblem: we are given a CLWE β,γ oracle and polynomially many samples from D L,r , the discreteGaussian distribution on L of width r , and our goal is to solve BDD L ∗ ,γ/r , which is the problemof finding the closest vector in the dual lattice L ∗ given a vector t that is within distance γ/r of L ∗ . (It is known that BDD L ∗ , /r can be efficiently solved even if all we are given is polynomiallymany samples from D L,r , without any need for an oracle [AR05]; the point here is that the CLWEoracle allows us to extend the decoding radius from 1 /r to γ/r .) Once this is established, the maintheorem follows from previous work [PRS17, Reg05]. Very briefly, the resulting BDD solution isused in a quantum procedure to produce discrete Gaussian samples that are shorter than the oneswe started with. This process is then repeated, until eventually we end up with the desired shortdiscrete Gaussian samples. We remark that this process incurs a √ n loss in the Gaussian width(Lemma 3.4), and the reason we require γ ≥ √ n is to overcome this loss.We now explain how we solve the above problem. For simplicity, assume for now that we have a search CLWE oracle that recovers the secret exactly. Let the given BDD instance be u + w , where u ∈ L ∗ and (cid:107) w (cid:107) = γ/r . We will consider the general case of (cid:107) w (cid:107) ≤ γ/r in Section 3. The mainidea is to generate CLWE samples whose secret is essentially the desired BDD solution w , whichwould then complete the proof. To begin, take a sample from the discrete Gaussian distribution y ∼ D L,r (as provided to us) and consider the inner product (cid:104) y , u + w (cid:105) = (cid:104) y , w (cid:105) (mod 1) , where the equality holds since (cid:104) y , u (cid:105) ∈ Z by definition. The ( n +1)-dimensional vector ( y , (cid:104) y , w (cid:105) mod1) is almost a CLWE sample (with parameter γ since γ = r (cid:107) w (cid:107) is the width of (cid:104) y , w (cid:105) ) — the onlyproblem is that in CLWE the y ’s need to be distributed according to a standard Gaussian, buthere the y ’s are distributed according to a discrete Gaussian over L . To complete the transforma-tion into bonafide CLWE samples, we add Gaussian noise of appropriate variance to both y and (cid:104) y , w (cid:105) (and rescale y so that it is distributed according to the standard Gaussian distribution).We then apply the search CLWE β,γ oracle on these CLWE samples to recover w and thereby solveBDD L ∗ ,γ/r .Note that our main result actually involves a decision CLWE oracle, which does not recoverthe secret w immediately. Working with this decision oracle requires some care. To that end, ourproof will incorporate the “oracle hidden center” finding procedure from [PRS17], the details ofwhich can be found in Section 3.3. We actually require samples from D L,r i for polynomially many r i ’s satisfying r i ≥ r , see Section 3. Preliminaries
Definition 2.1 (Statistical distance) . For two distributions D and D over R n with density func-tions φ and φ , respectively, we define the statistical distance between them as ∆( D , D ) = 12 (cid:90) R n | φ ( x ) − φ ( x ) | d x . We denote the statistical distance by ∆( φ , φ ) if only the density functions are specified.Moreover, for random variables X ∼ D and X ∼ D , we also denote ∆( X , X ) = ∆( D , D ).One important fact is that applying (possibly a randomized) function cannot increase statisticaldistance, i.e., for random variables X, Y and function f ,∆( f ( X ) , f ( Y )) ≤ ∆( X, Y ) . We define the advantage of an algorithm A solving the decision problem of distinguishing twodistributions D n and D (cid:48) n parameterized by n as (cid:12)(cid:12)(cid:12) Pr x ∼D n [ A ( x ) = YES] − Pr x ∼D (cid:48) n [ A ( x ) = YES] (cid:12)(cid:12)(cid:12) . Moreover, we define the advantage of an algorithm A solving the average-case decision problem ofdistinguishing two distributions D n,s and D (cid:48) n,s parameterized by n and s , where s is equipped withsome distribution S n , as (cid:12)(cid:12)(cid:12) Pr s ∼S n [ A B n,s (1 n ) = YES] − Pr s ∼S n [ A B (cid:48) n,s (1 n ) = YES] (cid:12)(cid:12)(cid:12) , where B n,s and B n,s are respectively the sampling oracles of D n,s and D (cid:48) n,s . We say that an algorithm A has non-negligible advantage if its advantage is a non-negligible function in n , i.e., a function inΩ( n − c ) for some constant c > Lattices. A lattice is a discrete additive subgroup of R n . Unless specified otherwise, we assumeall lattices are full rank, i.e., their linear span is R n . For an n -dimensional lattice L , a set of linearlyindependent vectors { b , . . . , b n } is called a basis of L if L is generated by the set, i.e., L = B Z n where B = [ b , . . . , b n ]. The determinant of a lattice L with basis B is defined as det( L ) = | det( B ) | ;it is easy to verify that the determinant does not depend on the choice of basis.The dual lattice of a lattice L , denoted by L ∗ , is defined as L ∗ = { y ∈ R n | (cid:104) x , y (cid:105) ∈ Z for all x ∈ L } . If B is a basis of L then ( B T ) − is a basis of L ∗ ; in particular, det( L ∗ ) = det( L ) − . Definition 2.2.
For an n -dimensional lattice L and ≤ i ≤ n , the i -th successive minimum of L is defined as λ i ( L ) = inf { r | dim(span( L ∩ B ( , r ))) ≥ i } , where B ( , r ) is the closed ball of radius r centered at the origin. We define the function ρ s ( x ) = exp( − π (cid:107) x /s (cid:107) ). Note that ρ s ( x ) /s n , where n is the dimensionof x , is the probability density of the Gaussian distribution with covariance s / (2 π ) · I n .6 efinition 2.3 (Discrete Gaussian) . For lattice L ⊂ R n , vector y ∈ R n , and parameter r > , the discrete Gaussian distribution D y + L,r on coset y + L with width r is defined to have support y + L and probability mass function proportional to ρ r . For y = , we simply denote the discrete Gaussian distribution on lattice L with width r by D L,r . Abusing notation, we denote the n -dimensional continuous Gaussian distribution with zeromean and isotropic variance r / (2 π ) as D R n ,r . Finally, we omit the subscript r when r = 1 andrefer to D R n as the standard Gaussian (despite it having covariance I n / (2 π )). Claim 2.4 ([Pei10, Fact 2.1]) . For any r , r > and vectors x , c , c ∈ R n , let r = (cid:112) r + r , r = r r /r , and c = ( r /r ) c + ( r /r ) c . Then ρ r ( x − c ) · ρ r ( x − c ) = ρ r ( c − c ) · ρ r ( x − c ) . Fourier analysis.
We briefly review basic tools of Fourier analysis required later on. The Fouriertransform of a function f : R n → C is defined to beˆ f ( w ) = (cid:90) R n f ( x ) e − πi (cid:104) x , w (cid:105) d x . An elementary property of the Fourier transform is that if f ( w ) = g ( w + v ) for some v ∈ R n ,then ˆ f ( w ) = e πi (cid:104) v , w (cid:105) ˆ g ( w ). Another important fact is that the Fourier transform of a Gaussian isalso a Gaussian, i.e., ˆ ρ = ρ ; more generally, ˆ ρ s = s n ρ /s . We also exploit the Poisson summationformula stated below. Note that we denote by f ( A ) = (cid:80) x ∈ A f ( x ) for any function f and anydiscrete set A . Lemma 2.5 (Poisson summation formula) . For any lattice L and any function f , f ( L ) = det( L ∗ ) · (cid:98) f ( L ∗ ) . Smoothing parameter.
An important lattice parameter induced by discrete Gaussian whichwill repeatedly appear in our work is the smoothing parameter , defined as follows.
Definition 2.6 (Smoothing parameter) . For lattice L and real ε > , we define the smoothingparameter η ε ( L ) as η ε ( L ) = inf { s | ρ /s ( L ∗ \ { } ) ≤ ε } . Intuitively, this parameter is the width beyond which the discrete Gaussian distribution behaveslike a continuous Gaussian. This is formalized in the lemmas below.
Lemma 2.7 ([Reg05, Claim 3.9]) . For any n -dimensional lattice L , vector u ∈ R n , and r, s > satisfying rs/t ≥ η ε ( L ) for some ε < , where t = √ r + s , the statistical distance between D u + L,r + D R n ,s and D R n ,t is at most ε . Lemma 2.8 ([PRS17, Lemma 2.5]) . For any n -dimensional lattice L , real ε > , and r ≥ η ε ( L ) ,the statistical distance between D R n ,r mod L and the uniform distribution over R n /L is at most ε/ . To be precise, f needs to satisfy some niceness conditions; this will always hold in our applications. D L,r and add continuous Gaussian noise D R n ,s to the sample, the resulting distribution is statistically close to D R n , √ r + s , which is preciselywhat one gets by adding two continuous Gaussian distributions of width r and s . Unless specifiedotherwise, we always assume ε is negligibly small in n , say ε = exp( − n ). The following are someuseful upper and lower bounds on the smoothing parameter η ε ( L ). Lemma 2.9 ([PRS17, Lemma 2.6]) . For any n -dimensional lattice L and ε = exp( − c n ) , η ε ( L ) ≤ c √ n/λ ( L ∗ ) . Lemma 2.10 ([MR07, Lemma 3.3]) . For any n -dimensional lattice L and ε > , η ε ( L ) ≤ (cid:114) ln(2 n (1 + 1 /ε )) π · λ n ( L ) . Lemma 2.11 ([Reg05, Claim 2.13]) . For any n -dimensional lattice L and ε > , η ε ( L ) ≥ (cid:114) ln 1 /επ · λ ( L ∗ ) . Computational problems.
GapSVP and SIVP are among the main computational problemson lattices and are believed to be computationally hard (even with quantum computation) forpolynomial approximation factor α ( n ). We also define two additional problems, DGS and BDD. Definition 2.12 (GapSVP) . For an approximation factor α = α ( n ) , an instance of GapSVP α isgiven by an n -dimensional lattice L and a number d > . In YES instances, λ ( L ) ≤ d , whereasin NO instances, λ ( L ) > α · d . Definition 2.13 (SIVP) . For an approximation factor α = α ( n ) , an instance of SIVP α is givenby an n -dimensional lattice L . The goal is to output a set of n linearly independent lattice vectorsof length at most α · λ n ( L ) . Definition 2.14 (DGS) . For a function ϕ that maps lattices to non-negative reals, an instance of DGS ϕ is given by a lattice L and a parameter r ≥ ϕ ( L ) . The goal is to output an independentsample whose distribution is within negligible statistical distance of D L,r . Definition 2.15 (BDD) . For an n -dimensional lattice L and distance bound d > , an instance of BDD
L,d is given by a vector t = w + u , where u ∈ L and (cid:107) w (cid:107) ≤ d . The goal is to output w . We now define the learning with errors (LWE) problem. This definition will not be used in thesequel, and is included for completeness. Let n and q be positive integers, and α > q as Z q = Z /q Z and quotient group of reals modulothe integers as T = R / Z = [0 , Definition 2.16 (LWE distribution) . For integer q ≥ and vector s ∈ Z nq , the LWE distribution A s ,α over Z nq × T is sampled by independently choosing uniformly random a ∈ Z nq and e ∼ D R ,α ,and outputting ( a , ( (cid:104) a , s (cid:105) /q + e ) mod 1) . Definition 2.17.
For an integer q = q ( n ) ≥ and error parameter α = α ( n ) > , the average-casedecision problem LWE q,α is to distinguish the following two distributions over Z nq × T : (1) the LWEdistribution A s ,α for some uniformly random s ∈ Z nq (which is fixed for all samples), or (2) theuniform distribution. .3 Continuous learning with errors We now define the CLWE distribution, which is the central subject of our analysis.
Definition 2.18 (CLWE distribution) . For unit vector w ∈ R n and parameters β, γ > , definethe CLWE distribution A w ,β,γ over R n +1 to have density at ( y , z ) proportional to ρ ( y ) · (cid:88) k ∈ Z ρ β ( z + k − γ (cid:104) y , w (cid:105) ) . Equivalently, a sample ( y , z ) from the CLWE distribution A w ,β,γ is given by the ( n + 1)-dimensional vector ( y , z ) where y ∼ D R n and z = ( γ (cid:104) y , w (cid:105) + e ) mod 1 where e ∼ D R ,β . The vector w is the hidden direction, γ is the density of layers, and β is the noise added to each equation. Fromthe CLWE distribution, we can arrive at the homogeneous CLWE distribution by conditioning on z = 0. A formal definition is given as follows. Definition 2.19 (Homogeneous CLWE distribution) . For unit vector w ∈ R n and parameters β, γ > , define the homogeneous CLWE distribution H w ,β,γ over R n to have density at y propor-tional to ρ ( y ) · (cid:88) k ∈ Z ρ β ( k − γ (cid:104) y , w (cid:105) ) . (1)The homogeneous CLWE distribution can be equivalently defined as a mixture of Gaussians.To see this, notice that Eq. (1) is equal to (cid:88) k ∈ Z ρ √ β + γ ( k ) · ρ ( π w ⊥ ( y )) · ρ β/ √ β + γ (cid:16) (cid:104) y , w (cid:105) − γβ + γ k (cid:17) , (2)where π w ⊥ denotes the projection on the orthogonal space to w . Hence, H w ,β,γ can be viewed asa mixture of Gaussian components of width β/ (cid:112) β + γ (which is roughly β/γ for β (cid:28) γ ) in thesecret direction, and width 1 in the orthogonal space. The components are equally spaced, with aseparation of γ/ ( β + γ ) between them (which is roughly 1 /γ for β (cid:28) γ ).We remark that the integral of (1) (or equivalently, of (2)) over all y is Z = β (cid:112) β + γ · ρ (cid:32) (cid:112) β + γ Z (cid:33) . (3)This is easy to see since the integral over y of the product of the last two ρ terms in (2) is β/ (cid:112) β + γ independently of k . Definition 2.20.
For parameters β, γ > , the average-case decision problem CLWE β,γ is todistinguish the following two distributions over R n × T : (1) the CLWE distribution A w ,β,γ for someuniformly random unit vector w ∈ R n (which is fixed for all samples), or (2) D R n × U . Definition 2.21.
For parameters β, γ > , the average-case decision problem hCLWE β,γ is todistinguish the following two distributions over R n : (1) the homogeneous CLWE distribution H w ,β,γ for some uniformly random unit vector w ∈ R n (which is fixed for all samples), or (2) D R n . Note that CLWE β,γ and hCLWE β,γ are defined as average-case problems. We could haveequally well defined them to be worst-case problems, requiring the algorithm to distinguish thedistributions for all hidden directions w ∈ R n . The following claim shows that the two formulationsare equivalent. 9 laim 2.22. For any β, γ > , there is a polynomial-time reduction from worst-case CLWE β,γ to(average-case)
CLWE β,γ .Proof.
Given CLWE samples { ( y i , z i ) } poly( n ) i =1 from A w ,β,γ , we apply a random rotation R , givingus samples of the form { ( Ry i , z i } poly( n ) i =1 . Since the standard Gaussian is rotationally invariant and (cid:104) y , w (cid:105) = (cid:104) Ry , R T w (cid:105) , the rotated CLWE samples are distributed according to A R T w ,β,γ . Since R is a random rotation, the random direction R T w is uniformly distributed on the sphere. In this section, we give an overview of the quantum reduction from worst-case lattice problems toCLWE. Our goal is to show that we can efficiently solve worst-case lattice problems, in particularGapSVP and SIVP, using an oracle for CLWE (and with quantum computation). We first stateour main theorem, which was stated informally as Theorem 1.1 in the introduction.
Theorem 3.1.
Let β = β ( n ) ∈ (0 , and γ = γ ( n ) ≥ √ n be such that γ/β is polynomiallybounded. Then there is a polynomial-time quantum reduction from DGS √ nη ε ( L ) /β to CLWE β,γ . Using standard reductions from GapSVP and SIVP to DGS (see, e.g., [Reg05, Section 3.3]),our main theorem immediately implies the following corollary.
Corollary 3.2.
Let β = β ( n ) ∈ (0 , and γ = γ ( n ) ≥ √ n such that γ/β is polynomially bounded.Then, there is a polynomial-time quantum reduction from SIVP α and GapSVP α to CLWE β,γ forsome α = ˜ O ( n/β ) . Based on previous work, to prove Theorem 3.1, it suffices to prove the following lemma, whichis the goal of this section.
Lemma 3.3.
Let β = β ( n ) ∈ (0 , and γ = γ ( n ) ≥ √ n such that q = γ/β is polynomiallybounded. There exists a probabilistic polynomial-time (classical) algorithm with access to an oraclethat solves CLWE β,γ , that takes as input a lattice L ⊂ R n , parameters β, γ , and r ≥ q · η ε ( L ) , and poly( n ) many samples from the discrete Gaussian distribution D L,r i for poly( n ) parameters r i ≥ r and solves BDD L ∗ ,d for d = γ/ ( √ r ) . In other words, we can implement an oracle for BDD L ∗ ,γ/ ( √ r ) using polynomially many discreteGaussian samples and the CLWE oracle as a sub-routine. The proof of Lemma 3.3 will be given inSection 3.2 (which is the novel contribution) and Section 3.3 (which mainly follows [PRS17]).In the rest of this subsection, we briefly explain how Theorem 3.1 follows from Lemma 3.3.This derivation is already implicit in past work [PRS17, Reg05], and is included here mainly forcompleteness. Readers familiar with the reduction may skip directly to Section 3.2.The basic idea is to start with samples from a very wide discrete Gaussian (which can beefficiently sampled) and then iteratively sample from narrower discrete Gaussians, until eventuallywe end up with short discrete Gaussian samples, as required (see Figure 3). Each iteration consistsof two steps: the first classical step is given by Lemma 3.3, allowing us to solve BDD on the duallattice; the second step is quantum and is given in Lemma 3.4 below, which shows that solvingBDD leads to sampling from narrower discrete Gaussian. Lemma 3.4 ([Reg05, Lemma 3.14]) . There exists an efficient quantum algorithm that, given any n -dimensional lattice L , a number d < λ ( L ∗ ) / , and an oracle that solves BDD L ∗ ,d , outputs asample from D L, √ n/ ( √ d ) . . . poly samplesfrom D L,r oracle forBDD L ∗ ,γ/ ( √ r ) poly samplesfrom D L,r √ n/γ oracle forBDD L ∗ ,γ / ( √ nr ) poly samplesfrom D L,rn/γ . . . c l a ss i c a l, u s e s C L W E q u a n t u m c l a ss i c a l, u s e s C L W E q u a n t u m Figure 3: Two iterations of the reduction.Similar to [PRS17], there is a subtle requirement in Lemma 3.3 that we need discrete Gaussiansamples from several different parameters r (cid:48) ≥ r . However, this is a non-issue since an oracle forBDD L ∗ ,γ/ ( √ r ) also solves BDD L ∗ ,γ/ ( √ r (cid:48) ) for any r (cid:48) ≥ r , so Lemma 3.4 in fact allows us to efficientlysample from D L,r (cid:48) √ n/γ for any r (cid:48) ≥ r . In this subsection we prove Lemma 3.5, showing how to generate CLWE samples from the givenBDD instance using discrete Gaussian samples. In the next subsection we will show how to solvethe BDD instance by applying the decision CLWE oracle to these samples, thereby completing theproof of Lemma 3.3.
Lemma 3.5.
There is an efficient algorithm that takes as input an n -dimensional lattice L , avector w + u where u ∈ L ∗ , reals r, s , s > such that rs / (cid:112) (cid:107) w (cid:107) ( rs /s ) + t ≥ η ε ( L ) for some ε < and t = (cid:112) r + s , and samples from D L,r , and outputs samples that are within statisticaldistance ε of the CLWE distribution A w (cid:48) ,β,γ for w (cid:48) = w / (cid:107) w (cid:107) , β = (cid:107) w (cid:107) (cid:112) ( rs /t ) + ( s / (cid:107) w (cid:107) ) and γ = (cid:107) w (cid:107) r /t .Proof. We start by describing the algorithm. For each x from the given samples from D L,r , do thefollowing. First, take the inner product (cid:104) x , w + u (cid:105) , which gives us (cid:104) x , w + u (cid:105) = (cid:104) x , w (cid:105) mod 1 . Appending this inner product modulo 1 to the sample x , we get ( x , (cid:104) x , w (cid:105) mod 1). Next, we“smooth out” the lattice structure of x by adding Gaussian noise v ∼ D R n ,s to x and e ∼ D R ,s to (cid:104) x , w (cid:105) (modulo 1). Then, we have( x + v , ( (cid:104) x , w (cid:105) + e ) mod 1) . (4)Finally, we normalize the first component by t so that its marginal distribution has unit width,giving us (( x + v ) /t, ( (cid:104) x , w (cid:105) + e ) mod 1) , (5)11hich the algorithm outputs.Our goal is to show that the distribution of (5) is within statistical distance 8 ε of the CLWEdistribution A w (cid:48) ,β,γ , given by ( y (cid:48) , ( γ (cid:104) y (cid:48) , w (cid:48) (cid:105) + e (cid:48) ) mod 1) , where y (cid:48) ∼ D R n and e (cid:48) ∼ D R ,β . Because applying a function cannot increase statistical distance(specifically, dividing the first component by t and taking mod 1 of the second), it suffices to showthat the distribution of ( x + v , (cid:104) x , w (cid:105) + e ) , (6)is within statistical distance 8 ε of that of( y , ( r/t ) (cid:104) y , w (cid:105) + e (cid:48) ) , (7)where y ∼ D R n ,t and e (cid:48) ∼ D R ,β . First, observe that by Lemma 2.7, the statistical distance betweenthe marginals on the first component (i.e., between x + v and y ) is at most 4 ε . It is thereforesufficient to bound the statistical distance between the second components conditioned on anyfixed value y of the first component. Conditioned on the first component being y , the secondcomponent in (6) has the same distribution as (cid:104) x + h , w (cid:105) (8)where h ∼ D R n ,s / (cid:107) w (cid:107) , and the second component in (7) has the same distribution as (cid:104) ( r/t ) y + h (cid:48) , w (cid:105) (9)where h (cid:48) ∼ D R n ,β/ (cid:107) w (cid:107) .By Claim 3.6 below, conditioned on x + v = y , the distribution of x is ( r/t ) y + D L − ( r/t ) y ,rs /t .Therefore, by Lemma 2.7, the conditional distribution of x + h given x + v = y is within statis-tical distance 4 ε of that of ( r/t ) y + h (cid:48) . Since statistical distance cannot increase by applying afunction (inner product with w in this case), (8) is within statistical distance 4 ε of (9). Hence, thedistribution of (6) is within statistical distance 8 ε of that of (7). Claim 3.6.
Let y = x + v , where x ∼ D L,r and v ∼ D R n ,s . Then, the conditional distribution of x given y = y is ( r/t ) y + D L − ( r/t ) y ,rs/t where t = √ r + s .Proof. Observe that x conditioned on y = y is a discrete random variable supported on L . Theprobability of x given y = y is proportional to ρ r ( x ) · ρ s ( y − x ) = ρ t ( y ) · ρ rs/t ( x − ( r/t ) y ) ∝ ρ rs/t ( x − ( r/t ) y ) , where the equality follows from Claim 2.4. Hence, the conditional distribution of x − ( r/t ) y given y = y is D L − ( r/t ) y ,rs/t . In this subsection, we complete the proof of Lemma 3.3. We first give some necessary backgroundon the Oracle Hidden Center Problem (OHCP) [PRS17]. The problem asks one to search fora “hidden center” w ∗ using a decision oracle whose acceptance probability depends only on thedistance to w ∗ . The problem’s precise statement is as follows.12 efinition 3.7 (OHCP) . For parameters ε, δ ∈ [0 , and ζ ≥ , the ( ε, δ, ζ ) - OHCP is an approxi-mate search problem that tries to find the “hidden” center w ∗ . Given a scale parameter d > andaccess to a randomized oracle O : R n × R ≥ → { , } such that its acceptance probability p ( w , t ) only depends on exp( t ) (cid:107) w − w ∗ (cid:107) for some (unknown) “hidden center” w ∗ ∈ R n with δd ≤ (cid:107) w ∗ (cid:107) ≤ d and for any w ∈ R n with (cid:107) w − w ∗ (cid:107) ≤ ζd , the goal is to output w s.t. (cid:107) w − w ∗ (cid:107) ≤ εd . Notice that OHCP corresponds to our problem since we want to solve BDD, which is equivalentto finding the “hidden” offset vector w ∗ , using a decision oracle for CLWE β,γ . The acceptanceprobability of the CLWE β,γ oracle will depend on the distance between our guess w and the trueoffset w ∗ . For OHCP, we have the following result from [PRS17]. Lemma 3.8 ([PRS17], Proposition 4.4) . There is a poly ( κ, n ) -time algorithm that takes as input aconfidence parameter κ ≥
20 log( n +1) (and the scale parameter d > ) and solves (exp( − κ ) , exp( − κ ) , /κ ) -OHCP in dimension n except with probability exp( − κ ) , provided that the oracle O correspond-ing to the OHCP instance satisfies the following conditions. For some p ( ∞ ) ∈ [0 , and t ∗ ≥ ,1. p ( , t ∗ ) − p ( ∞ ) ≥ /κ ;2. | p ( , t ) − p ( ∞ ) | ≤ − t/κ ) for any t ≥ ; and3. p ( w , t ) is κ -Lipschitz in t for any w ∈ R n such that (cid:107) w (cid:107) ≤ (1 + 1 /κ ) d .Furthermore, each of the algorithm’s oracle calls takes the form O ( · , i ∆) for some ∆ < thatdepends only on κ and n and ≤ i ≤ poly( κ, n ) . The main idea in the proof of Lemma 3.8 is performing a guided random walk with advicefrom the decision oracle O . The decision oracle O rejects a random step with high probability ifit increases the distance (cid:107) w − w ∗ (cid:107) . Moreover, there is non-negligible probability of decreasing thedistance by a factor exp(1 /n ) unless log (cid:107) w − w ∗ (cid:107) ≤ − κ . Hence, with sufficiently many steps, therandom walk will reach (cid:98) w , a guess of the hidden center, which is within exp( − κ ) distance to w ∗ with high probability.Our goal is to show that we can construct an oracle O satisfying the above conditions using anoracle for CLWE β,γ . Then, it follows from Lemma 3.8 that BDD with discrete Gaussian samples canbe solved using an oracle for CLWE. We first state some lemmas useful for our proof. Lemma 3.9is Babai’s closest plane algorithm and Lemma 3.10 is an upper bound on the statistical distancebetween two one-dimensional Gaussian distributions. Lemma 3.9 ([LLL82, Bab86]) . For any n -dimensional lattice L , there is an efficient algorithmthat solves BDD
L,d for d = 2 − n/ · λ ( L ) . Lemma 3.10 ([DMR18, Theorem 1.3]) . For all µ , µ ∈ R , and σ , σ > , we have ∆ (cid:0) N ( µ , σ ) , N ( µ , σ ) (cid:1) ≤ | σ − σ | σ , σ ) + | µ − µ | σ , σ ) , where N ( µ, σ ) denotes the Gaussian distribution with mean µ and standard deviation σ . Now, we prove Lemma 3.3, restated below.
Lemma 3.3.
Let β = β ( n ) ∈ (0 , and γ = γ ( n ) ≥ √ n such that q = γ/β is polynomiallybounded. There exists a probabilistic polynomial-time (classical) algorithm with access to an oraclethat solves CLWE β,γ , that takes as input a lattice L ⊂ R n , parameters β, γ , and r ≥ q · η ε ( L ) , and poly( n ) many samples from the discrete Gaussian distribution D L,r i for poly( n ) parameters r i ≥ r and solves BDD L ∗ ,d for d = γ/ ( √ r ) . roof. Let d (cid:48) = (1 − / (2 n )) · d . By [LM09, Corollary 2], it suffices to solve BDD L ∗ ,d (cid:48) . Let κ = poly( n ) with κ ≥ qn(cid:96) be such that the advantage of our CLWE β,γ oracle is at least 1 /κ ,where (cid:96) ≥ L ⊂ R n , a parameter r ≥ q · η ε ( L ), samples from D L,r i for 1 ≤ i ≤ poly( n ), and a BDD instance w ∗ + u where u ∈ L ∗ and (cid:107) w ∗ (cid:107) ≤ d (cid:48) , we want to recover w ∗ .Without loss of generality, we can assume that (cid:107) w ∗ (cid:107) ≥ exp( − n/ · λ ( L ∗ ) ≥ (2 q/r ) · exp( − n/ w ∗ efficiently using Babai’s closest plane algorithm(Lemma 3.9).We will use the CLWE oracle to simulate an oracle O : R n × R ≥ → { , } such that theprobability that O ( w , t ) outputs 1 (“accepts”) only depends on exp( t ) (cid:107) w − w ∗ (cid:107) . Our oracle O corresponds to the oracle in Definition 3.7 with w ∗ as the “hidden center”. We will use Lemma 3.8to find w ∗ .On input ( w , t ), our oracle O receives (cid:96) independent samples from D L, exp( t ) r . Then, we generateCLWE samples using the procedure from Lemma 3.5. The procedure takes as input these (cid:96) samples,the vector u + w ∗ − w where u ∈ L ∗ , and parameters exp( t ) r, exp( t ) s , s . Our choice of s and s will be specified below. Note that the CLWE oracle requires the “hidden direction” ( w − w ∗ ) / (cid:107) w − w ∗ (cid:107) to be uniformly distributed on the unit sphere. To this end, we apply the worst-to-averagecase reduction from Claim 2.22. Let S w ,t be the resulting CLWE distribution. Our oracle O thencalls the CLWE β,γ oracle on S (cid:96) w ,t and outputs 1 if and only if it accepts.Using the oracle O , we can run the procedure from Lemma 3.8 with confidence parameter κ and scale parameter d (cid:48) . The output of this procedure will be some approximation (cid:98) w to the oracle’s“hidden center” with the guarantee that (cid:107) (cid:98) w − w ∗ (cid:107) ≤ exp( − κ ) d (cid:48) . Finally, running Babai’s algorithmon the vector u + w ∗ − (cid:98) w will give us w ∗ exactly since (cid:107) (cid:98) w − w ∗ (cid:107) ≤ exp( − κ ) d ≤ β exp( − κ ) /η ε ( L ) ≤ − n λ ( L ∗ ) , where the last inequality is from Lemma 2.9.The running time of the above procedure is clearly polynomial in n . It remains to check thatour oracle O (1) is a valid instance of (exp( − κ ) , exp( − κ ) , /κ )-OHCP with hidden center w ∗ and (2) satisfies all the conditions of Lemma 3.8. First, note that S w ,t will be negligibly close instatistical distance to the CLWE distribution with parameters β (cid:48) = (cid:113) (exp( t ) (cid:107) w − w ∗ (cid:107) ) s (cid:48) + s ,γ (cid:48) = exp( t ) (cid:107) w − w ∗ (cid:107) r (cid:48) , where r (cid:48) = r / (cid:112) r + s and s (cid:48) = rs / (cid:112) r + s as long as r, s , s satisfy the conditions ofLemma 3.5. Then, we set s = r/ ( √ q ) and choose s such that s = β − ( s (cid:48) /r (cid:48) ) γ = β − ( s /r ) γ = β / . Lemma 3.5 requires rs / (cid:112) r (cid:107) w − w ∗ (cid:107) ( s /s ) + r + s ≥ η ε ( L ). 
We know that r ≥ q · η ε ( L )and s ≥ √ · η ε ( L ), so it remains to determine a sufficient condition for the aforementionedinequality. Observe that for any w such that (cid:107) w − w ∗ (cid:107) ≤ d , the condition s ≥ d · η ε ( L ) issufficient. Since r ≥ γ/β ) · η ε ( L ), this translates to s ≥ β/ ( √ s and s aslong as (cid:107) w − w ∗ (cid:107) ≤ d (beyond the BDD distance bound d (cid:48) ).Since S w ,t is negligibly close to the CLWE distribution, the acceptance probability p ( w , t ) of O only depends on exp( t ) (cid:107) w − w ∗ (cid:107) . Moreover, by assumption (cid:107) w ∗ (cid:107) ≥ exp( − n/ · (2 q/r ) ≥ exp( − κ ) d (cid:48) .14ence, O , κ, d (cid:48) correspond to a valid instance of (exp( − κ ) , exp( − κ ) , /κ )-OHCP with “hiddencenter” w ∗ .Next, we show that p ( w , t ) of O satisfies all three conditions of Lemma 3.8 with p ( ∞ ) taken tobe the acceptance probability of the CLWE oracle on samples from D R n × U . Item 1 of Lemma 3.8follows from our assumption that our CLWE β,γ oracle has advantage 1 /κ , and by our choice of r , s , and s , when t ∗ = log( γ/ ( (cid:107) w ∗ (cid:107) r (cid:48) )) > log( √ γ (cid:48) ( t ∗ ) = γ and β (cid:48) ( t ∗ ) = β . Hence, p ( , t ∗ ) − p ( ∞ ) ≥ /κ .We now show that Item 2 holds, which states that | p ( , t ) − p ( ∞ ) | ≤ − t/κ ) for any t > S ,t converges exponentially fast to D R n × U in statistical distance. Let f ( y , z )be the probability density of S ,t . Then,∆( S ,t , D R n × U ) = 12 (cid:90) | f ( z | y ) − U ( z ) | ρ ( y ) d y dz = 12 (cid:90) (cid:16) (cid:90) | f ( z | y ) − U ( z ) | dz (cid:17) ρ ( y ) d y . Hence, it suffices to show that the conditional density of z given y for S ,t converges exponentiallyfast to the uniform distribution on T . Notice that the conditional distribution of z given y is theGaussian distribution with width parameter β (cid:48) ≥ exp( t ) (cid:107) w ∗ (cid:107) r/ (2 q ) ≥ exp( t − n/ (cid:107) w ∗ (cid:107) ≥ (2 q/r ) · exp( − n/ Z , we know that β (cid:48) is larger than η ε ( Z ) for ε = exp( − exp(2 t − n )). Hence, one sample from this conditional distributionis within statistical distance ε of the uniform distribution by Lemma 2.8. By the triangle inequalityapplied to (cid:96) samples,∆ (cid:16) S (cid:96) ,t , ( D R n × U ) (cid:96) (cid:17) ≤ min(1 , (cid:96) exp( − exp(2 t − n ))) ≤ − t/κ ) , where in the last inequality, we use the the fact that we can choose κ to be such that 2 exp( − t/κ ) ≥ t ≥ κ/
2. And when t ≥ κ/ ≥ qn(cid:96) , we have (cid:96) exp( − exp(2 t − n )) (cid:28) exp( − t/κ ).It remains to verify Item 3, which states that p ( w , t ) is κ -Lipschitz in t for any (cid:107) w (cid:107) ≤ (1 +1 /κ ) d (cid:48) ≤ d . We show this by bounding the statistical distance between S w ,t and S w ,t for t ≥ t .With a slight abuse in notation, let f t i ( y , z ) be the probability density of S w ,t i and let ( β i , γ i ) bethe corresponding CLWE distribution parameters. For simplicity, also denote the hidden directionby w (cid:48) = ( w − w ∗ ) / (cid:107) w − w ∗ (cid:107) . Then,∆( f t , f t ) = 12 (cid:90) (cid:16) (cid:90) | f t ( z | y ) − f t ( z | y ) | dz (cid:17) ρ ( y ) d y = (cid:90) ∆ (cid:16) N ( γ (cid:104) y , w (cid:48) (cid:105) , β / √ π ) , N ( γ (cid:104) y , w (cid:48) (cid:105) , β / √ π ) (cid:17) ρ ( y ) d y ≤ (cid:90) (cid:16) − ( β /β ) ) + √ π ( γ − γ ) /β · |(cid:104) y , w (cid:48) (cid:105)| (cid:17) · ρ ( y ) d y (10) ≤ E y ∼ ρ [ M ( y )] · (cid:16) − exp( − t − t )) (cid:17) where M ( y ) = 12 (cid:16) √ πq · |(cid:104) y , w (cid:48) (cid:105)| (cid:17) ≤ E y ∼ ρ [ M ( y )] · t − t ) (11) ≤ ( κ/(cid:96) ) · ( t − t ) , (12)where (10) follows from Lemma 3.10, (11) uses the fact that 1 − exp( − t − t )) ≤ t − t ), and(12) uses the fact that E y ∼ ρ [ M ( y )] ≤ q ≤ κ/ (2 (cid:96) ). Using the triangle inequality over (cid:96) samples,15he statistical distance between S (cid:96) w ,t and S (cid:96) w ,t is at mostmin(1 , (cid:96) · ( κ/(cid:96) )( t − t )) ≤ κ ( t − t ) . Therefore, p ( w , t ) is κ -Lipschitz in t . In this section, we show the hardness of homogeneous CLWE by reducing from CLWE, whosehardness was established in the previous section. The main step of the reduction is to transformCLWE samples to homogeneous CLWE samples using rejection sampling (Lemma 4.1).Consider the samples ( y , z ) ∼ A w ,β,γ in CLWE β,γ . If we condition y on z = 0 (mod 1) thenwe get exactly samples y ∼ H w ,β,γ for hCLWE β,γ . However, this approach is impractical as z = 0(mod 1) happens with probability 0. Instead we condition y on z ≈ y will still have a “wavy” probability density in the directionof w with spacing 1 /γ , which accords with the picture of homogeneous CLWE. To avoid throwingaway too many samples, we will do rejection sampling with some small “window” δ = 1 / poly( n ).Formally, we have the following lemma. Lemma 4.1.
There is a poly( n, /δ ) -time probabilistic algorithm that takes as input a parameter δ ∈ (0 , and samples from A w ,β,γ , and outputs samples from H w , √ β + δ ,γ .Proof. Without loss of generality assume that w = e . By definition, the probability density ofsample ( y , z ) ∼ A w ,β,γ is p ( y , z ) = 1 β · ρ ( y ) · (cid:88) k ∈ Z ρ β ( z + k − γy ) . Let g : T → [0 ,
1] be the function g ( z ) = g ( z ) /M , where g ( z ) = (cid:80) k ∈ Z ρ δ ( z + k ) and M =sup z ∈ T g ( z ). We perform rejection sampling on the samples ( y , z ) with acceptance probabilityPr[accept | y , z ] = g ( z ). We remark that g ( z ) is efficiently computable (see [Bra+13, Section 5.2]).The probability density of outputting y and accept is (cid:90) T p ( y , z ) g ( z ) dz = ρ ( y ) βM · (cid:90) T (cid:88) k ,k ∈ Z ρ β ( z + k − γy ) ρ δ ( z + k ) dz = ρ ( y ) βM · (cid:90) T (cid:88) k,k ∈ Z ρ √ β + δ ( k − γy ) ρ βδ/ √ β + δ (cid:16) z + k + δ ( k − γy ) β + δ (cid:17) dz = δM (cid:112) β + δ · ρ ( y ) · (cid:88) k ∈ Z ρ √ β + δ ( k − γy ) , where the second equality follows from Claim 2.4. This shows that the conditional distribution of y upon acceptance is indeed H e , √ β + δ ,γ . Moreover, a byproduct of this calculation is that theexpected acceptance probability is Pr[accept] = Zδ/ ( M (cid:112) β + δ ), where, according to Eq. (3), Z = (cid:115) β + δ β + δ + γ · ρ √ β + δ + γ ( Z )= (cid:112) β + δ · ρ / √ β + δ + γ ( Z ) ≥ (cid:112) β + δ , g ( z ) = (cid:88) k ∈ Z ρ δ ( z + k ) ≤ · ∞ (cid:88) k =0 ρ δ ( k ) < · ∞ (cid:88) k =0 exp( − πk ) < δ <
1, implying that M ≤
4. Therefore, Pr[accept] ≥ δ/
4, and so the rejection samplingprocedure has poly( n, /δ ) expected running time.The above lemma reduces CLWE to homogeneous CLWE with slightly worse parameters. Hence,homogeneous CLWE is as hard as CLWE. Specifically, combining Theorem 3.1 (with β taken to be β/ √
2) and Lemma 4.1 (with δ also taken to be β/ √ Corollary 4.2.
For any β = β ( n ) ∈ (0 , and γ = γ ( n ) ≥ √ n such that γ/β is polynomiallybounded, there is a polynomial-time quantum reduction from DGS √ nη ε ( L ) /β to hCLWE β,γ . In this section, we prove the hardness of density estimation for k -mixtures of n -dimensional Gaus-sians by showing a reduction from homogeneous CLWE. This answers an open question regarding itscomputational complexity [Dia16, Moi18]. We first formally define density estimation for Gaussianmixtures. Definition 5.1 (Density estimation of Gaussian mixtures) . Let G n,k be the family of k -mixtures of n -dimensional Gaussians. The problem of density estimation for G n,k is the following. Given δ > and sample access to an unknown P ∈ G n,k , with probability / , output a hypothesis distribution Q (in the form of an evaluation oracle) such that ∆( Q, P ) ≤ δ . For our purposes, we fix the precision parameter δ to a very small constant, say, δ = 10 − . Nowwe show a reduction from hCLWE β,γ to the problem of density estimation for Gaussian mixtures.Corollary 4.2 shows that hCLWE β,γ is hard for γ ≥ √ n (assuming worst-case lattice problems arehard). Hence, by taking γ = 2 √ n and g ( n ) = O (log n ) in Proposition 5.2, we rule out the possibilityof a poly( n, k )-time density estimation algorithm for G n,k under the same hardness assumption. Proposition 5.2.
Let β = β ( n ) ∈ (0 , / , γ = γ ( n ) ≥ , and g ( n ) ≥ π . For k = 2 γ (cid:112) g ( n ) /π ,if there is an exp( g ( n )) -time algorithm that solves density estimation for G n, k +1 , then there is a O (exp( g ( n ))) -time algorithm that solves hCLWE β,γ .Proof. We apply the density estimation algorithm A to the unknown given distribution P . As wewill show below, with constant probability, it outputs a density estimate f that satisfies ∆( f, P ) < δ = 2 · − (and this is even though H w ,β,γ has infinitely many components). We then test whether P = D R n or not using the following procedure. We repeat the following procedure m = 1 / (6 √ δ )times. We draw x ∼ D R n and check whether the following holds f ( x ) D ( x ) ∈ [1 − √ δ, √ δ ] , (13)17here D denotes the density of D R n . We output P = D R n if Eq. (13) holds for all m independenttrials and P = H w ,β,γ otherwise. Since ∆( H w ,β,γ , D R n ) > / β,γ with probability at least 2 / O (exp( g ( n )) since this test uses a constantnumber of samples.If P = D R n , it is obvious that A outputs a close density estimate with constant probabilitysince D R n ∈ G n, k +1 . It remains to consider the case P = H w ,β,γ . To this end, we observe that H w ,β,γ is close to a (2 k + 1)-mixture of Gaussians. Indeed, by Claim 5.4 below,∆( H w ,β,γ , H ( k ) ) ≤ − π · k / ( β + γ )) < − π · k / (2 γ )) , where H ( k ) is the distribution given by truncating H w ,β,γ to the (2 k +1) central mixture components.Hence, the statistical distance between the joint distribution of exp( g ( n )) samples from H w ,β,γ andthat of exp( g ( n )) samples from H ( k ) is bounded by2 exp( − π · k / (2 γ )) · exp( g ( n )) = 2 exp( − g ( n )) ≤ − π ) . Since the two distributions are statistically close, a standard argument shows that A will output f satisfying ∆( f, H w ,β,γ ) ≤ ∆( f, H ( k ) ) + ∆( H ( k ) , H w ,β,γ ) < δ with constant probability. Claim 5.3.
Let β = β ( n ) ∈ (0 , / and γ = γ ( n ) ≥ . Then, ∆( H w ,β,γ , D R n ) > / . Proof.
Let γ (cid:48) = (cid:112) β + γ > γ . Let y ∈ R n be a random vector distributed according to H w ,β,γ .Using the Gaussian mixture form of (2), we observe that (cid:104) y , w (cid:105) mod γ/γ (cid:48) is distributed accordingto D β/γ (cid:48) mod γ/γ (cid:48) . Since statistical distance cannot increase by applying a function (inner productwith w and then applying the modulo operation in this case), it suffices to lower bound the statisticaldistance between D β/γ (cid:48) mod γ/γ (cid:48) and D mod γ/γ (cid:48) , where D denotes the 1-dimensional standardGaussian.By Chernoff, for all ζ >
0, at least 1 − ζ mass of D β/γ (cid:48) is contained in [ − a · ( β/γ (cid:48) ) , a · ( β/γ (cid:48) )], where a = (cid:112) log(1 /ζ ). Hence, D β/γ (cid:48) mod γ/γ (cid:48) is at least 1 − aβγ (cid:48) /γ − ζ far in statistical distance fromthe uniform distribution over R / ( γ/γ (cid:48) ) Z , which we denote by U . Moreover, by Lemma 2.8 andLemma 2.9, D mod γ/γ (cid:48) is within statistical distance ε/ − γ (cid:48) /γ ) / U . Therefore,∆( D β/γ (cid:48) mod γ/γ (cid:48) , D mod γ/γ (cid:48) ) ≥ ∆( D β/γ (cid:48) mod γ/γ (cid:48) , U ) − ∆( U, D mod γ/γ (cid:48) ) ≥ − aβγ (cid:48) /γ − ζ − ε/ > − √ aβ − ζ − exp( − γ ) / > / , where we set ζ = exp( −
2) and use the fact that β ≤ /
32 and γ ≥ Claim 5.4.
Let β = β ( n ) ∈ (0 , , γ = γ ( n ) ≥ , and k ∈ Z + . Then, ∆( H w ,β,γ , H ( k ) ) ≤ − π · k / ( β + γ )) , where H ( k ) is the distribution given by truncating H w ,β,γ to the central (2 k +1) mixture components. roof. We express H w ,β,γ in its Gaussian mixture form given in Eq. (2) and define a randomvariable X taking on values in Z such that the probability of X = i is equal to the probability thata sample comes from the i -th component in H w ,β,γ . Then, we observe that H ( k ) is the distributiongiven by conditioning on | X | ≤ k . Since X is a discrete Gaussian random variable with distribution D Z , √ β + γ , we observe that Pr[ | X | > k ] ≤ ε := 2 exp( − π · k / ( β + γ )) by [MP12, Lemma 2.8].Since conditioning on an event of probability 1 − ε cannot change the statistical distance by morethan ε , we have ∆( H w ,β,γ , H ( k ) ) ≤ ε . The noiseless CLWE problem ( β = 0) can be solved in polynomial time using LLL. This appliesboth to the homogeneous and the inhomogeneous versions, as well as to the search version. Theargument can be extended to the case of exponentially small β > y i , z i ), and find integer coefficients c , . . . , c m such that y = (cid:80) mi =1 c i y i is short, say (cid:107) y (cid:107) (cid:28) /γ . By Cauchy-Schwarz, we then have that γ (cid:104) y , w (cid:105) = (cid:80) mi =1 c i z i over the reals (not modulo 1!). This is formalized in Theorem 6.2. We first state Minkowski’sConvex Body Theorem, which we will use in the proof of our procedure. Lemma 6.1 ([Min10]) . Let L be a full-rank n -dimensional lattice. Then, for any centrally-symmetric convex set S , if vol( S ) > n · | det( L ) | , then S contains a non-zero lattice point. Theorem 6.2.
Let γ = γ ( n ) be a polynomial in n . Then, there exists a polynomial-time algorithmfor solving CLWE ,γ .Proof. Take n + 1 CLWE samples { ( y i , z i ) } n +1 i =1 and consider the matrix Y = (cid:20) y · · · y n y n +1 · · · δ (cid:21) , where δ = 2 − n .Consider the lattice L generated by the columns of Y . Since y i ’s are drawn from the Gaus-sian distribution, L is full rank. By Hadamard’s inequality, and the fact that with probabilityexponentially close to 1, (cid:107) y i (cid:107) ≤ √ n for all i , we have | det( L ) | ≤ δ · n n/ < − n . Now consider the n -dimensional cube S centered at with side length 2 − n . Then, vol( S ) = 2 − n ,and by Lemma 6.1, L contains a vector v satisfying (cid:107) v (cid:107) ∞ ≤ − n and so (cid:107) v (cid:107) ≤ √ n · − n . Applyingthe LLL algorithm [LLL82] gives us an integer combination of the columns of Y whose lengthis within 2 ( n +1) / factor of the shortest vector in L , which will therefore have (cid:96) norm less than √ n · − ( n − / . Let y be the corresponding combination of the y i vectors (which is equivalentlygiven by the first n coordinates of the output of LLL) and z ∈ ( − / , /
2] a representative of thecorresponding integer combination of the z i mod 1. Then, we have (cid:107) y (cid:107) ≤ √ n · − ( n − / andtherefore we obtain the linear equation γ · (cid:104) y , w (cid:105) = z over the reals (without mod 1).We now repeat the above procedure n times, and recover w by solving the resulting n linearequations. It remains to argue why the n vectors y we collect are linearly independent. First,19ote that the output y is guaranteed to be a non-zero vector since with probability 1, no integercombination of the Gaussian distributed y i is . Next, note that LLL is equivariant to rotations,i.e., if we rotate the input basis then the output vector will also be rotated by the same rotation.Moreover, spherical Gaussians are rotationally invariant. Hence, the distribution of the outputvector y ∈ R n is also rotationally invariant. Therefore, repeating the above procedure n times willgive us n linearly independent vectors. For γ = o ( √ n ), the covariance matrix will reveal the discrete structure of homogeneous CLWE,which will lead to a subexponential time algorithm for the problem. This clarifies why the hardnessresults of homogeneous CLWE do not extend beyond γ ≥ √ n .We define noiseless homogeneous CLWE distribution H w ,γ as H w ,β,γ with β = 0. We beginwith a claim that will allow us to focus on the noiseless case. Claim 7.1.
By adding Gaussian noise D R n ,β/γ to H w ,γ and then rescaling by a factor of γ/ (cid:112) β + γ ,the resulting distribution is H w , ˜ β, ˜ γ , where ˜ γ = γ/ (cid:112) β/γ ) and ˜ β = ˜ γ ( β/γ ) . Proof.
Proof. Without loss of generality, suppose w = e_1. Let z ∼ H_{w,γ} + D_{R^n,β/γ} and z̃ = γz/√(β²+γ²). It is easy to verify that the marginal density of z̃ on the subspace e_1^⊥ is simply ρ. Hence we focus on calculating the densities of z_1 and z̃_1. The density of z_1 can be computed by convolving the probability densities of H_{w,γ} and D_{R,β/γ} as follows:

(H_{w,γ} ∗ D_{R,β/γ})(z_1) ∝ Σ_{k∈Z} ρ(k/γ)·ρ_{β/γ}(z_1 − k/γ)
= ρ_{√(β²+γ²)/γ}(z_1) · Σ_{k∈Z} ρ_{β/√(β²+γ²)}(k/γ − γ²z_1/(β²+γ²))
= ρ(z̃_1) · Σ_{k∈Z} ρ_{β̃}(k − γ̃·z̃_1),

where the second-to-last equality follows from Claim 2.4. This verifies that the resulting distribution is indeed H_{w,β̃,γ̃}.

Claim 7.1 implies that a homogeneous CLWE distribution with β > 0 can be obtained from the noiseless (β = 0) case by adding independent Gaussian noise and rescaling. Equivalently, in terms of the Gaussian mixture representation of Eq. (2), the resulting distribution has layers spaced 1/√(β²+γ²) apart, each of width β/√(β²+γ²).
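Claim 7.1 can be checked numerically along the hidden direction. The sketch below (ours; the grid resolution and truncation range are arbitrary choices) convolves the noiseless layered density with the added noise, applies the rescaling, and compares against the hCLWE density with the claimed parameters β̃, γ̃.

```python
import numpy as np

beta, gamma = 0.5, 3.0
gt = gamma / np.sqrt(1 + (beta / gamma) ** 2)        # claimed gamma-tilde
bt = gt * beta / gamma                               # claimed beta-tilde
rho = lambda x, s=1.0: np.exp(-np.pi * (x / s) ** 2)

t = np.linspace(-2, 2, 2001)                         # rescaled coordinate
z = t * np.sqrt(beta**2 + gamma**2) / gamma          # undo the rescaling
ks = np.arange(-40, 41)
lhs = sum(rho(k / gamma) * rho(z - k / gamma, beta / gamma) for k in ks)
rhs = rho(t) * sum(rho(k - gt * t, bt) for k in ks)
lhs /= lhs.sum(); rhs /= rhs.sum()                   # identical normalization
print("max density gap:", np.abs(lhs - rhs).max())   # numerically zero
```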
Lemma 7.2.
Let Σ be the covariance matrix of the n-dimensional noiseless homogeneous CLWE distribution H_{w,γ} with γ ≥ 1. Then,

‖Σ − (1/(2π))·I_n‖ ≥ γ²·exp(−πγ²),

where ‖·‖ denotes the spectral norm.

Proof. Without loss of generality, let w = e_1. Then H_{w,γ} = D_L × D_{R^{n−1}}, where L is the one-dimensional lattice (1/γ)Z. Hence, Σ = diag(E_{x∼D_L}[x²], 1/(2π), ..., 1/(2π)), so it suffices to show that

|E_{x∼D_L}[x²] − 1/(2π)| ≥ γ²·exp(−πγ²).

Define g(x) = x²·ρ(x). The Fourier transform of ρ is ρ itself; the Fourier transform of g is given by

ĝ(y) = (1/(2π) − y²)·ρ(y).

By definition and the Poisson summation formula (Lemma 2.5), we have

E_{x∼D_L}[x²] = g(L)/ρ(L) = (det(L*)·ĝ(L*))/(det(L*)·ρ(L*)) = ĝ(L*)/ρ(L*),

where L* = ((1/γ)Z)* = γZ. Combining this with the expression for ĝ, we have

|E_{x∼D_L}[x²] − 1/(2π)| = (Σ_{y∈L*\{0}} y²·ρ(y)) / (1 + ρ(L*\{0})) ≥ γ²·exp(−πγ²),

where we use the fact that for γ ≥ 1,

ρ(γZ\{0}) ≤ ρ(Z\{0}) < 2·Σ_{k=1}^∞ exp(−πk) = 2exp(−π)/(1 − exp(−π)) < 1/10.
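A direct numerical check of this deviation (the truncation range is ours):

```python
import numpy as np

for gamma in (1.0, 1.5, 2.0, 3.0):
    x = np.arange(-400, 401) / gamma                 # the lattice L = (1/gamma) Z
    p = np.exp(-np.pi * x**2); p /= p.sum()          # D_L probabilities
    dev = abs((p * x**2).sum() - 1 / (2 * np.pi))
    bound = gamma**2 * np.exp(-np.pi * gamma**2)
    print(f"gamma={gamma}: |E[x^2] - 1/(2pi)| = {dev:.3e} >= {bound:.3e}: {dev >= bound}")
```

The deviation decays like exp(−πγ²), which is why the test of Theorem 7.5 below needs exp(O(γ²)) samples.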
Combining Claim 7.1 and Lemma 7.2, we get the following corollary for the noisy case.

Corollary 7.3.
Let Σ be the covariance matrix of the n-dimensional homogeneous CLWE distribution H_{w,β,γ} with γ ≥ 1 and β > 0. Then,

‖Σ − (1/(2π))·I_n‖ ≥ γ²·exp(−π(β²+γ²)),

where ‖·‖ denotes the spectral norm.

Proof. Using Claim 7.1, we can view samples from H_{w,β,γ} as samples from H_{w,γ′} with independent Gaussian noise of width β′/γ′ added and rescaled by γ′/√(β′²+γ′²), where β′, γ′ are given by

β′ = β·√(1+(β/γ)²),  γ′ = √(β²+γ²).

Let Σ′ be the covariance of H_{w,γ′}. Since the Gaussian noise added to H_{w,γ′} is independent and β′/γ′ = β/γ,

Σ = (1/(1+(β/γ)²)) · (Σ′ + ((β/γ)²/(2π))·I_n).

Hence,

‖Σ − (1/(2π))·I_n‖ = (1/(1+(β/γ)²)) · ‖(Σ′ + ((β/γ)²/(2π))·I_n) − (1+(β/γ)²)·(1/(2π))·I_n‖
= (1/(1+(β/γ)²)) · ‖Σ′ − (1/(2π))·I_n‖
≥ γ²·exp(−π(β²+γ²)),

where the last inequality follows from Lemma 7.2 applied with parameter γ′ ≥ 1, together with γ′²/(1+(β/γ)²) = γ².

We use the following lemma, which provides an upper bound on the error in estimating the covariance matrix from samples. The sub-gaussian norm of a random variable Y is defined as ‖Y‖_{ψ₂} = inf{t > 0 : E[exp(Y²/t²)] ≤ 2}, and that of an n-dimensional random vector y is defined as ‖y‖_{ψ₂} = sup_{u∈S^{n−1}} ‖⟨y, u⟩‖_{ψ₂}.

Lemma 7.4 ([Ver18, Theorem 4.6.1]). Let A be an m × n matrix whose rows A_i are independent, mean-zero, sub-gaussian isotropic random vectors in R^n. Then for any u ≥ 0 we have

‖(1/m)·AᵀA − I_n‖ ≤ K²·max(δ, δ²), where δ = C·(√(n/m) + u/√m),

with probability at least 1 − 2e^{−u²} for some constant C > 0. Here, K = max_i ‖A_i‖_{ψ₂}.
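As a quick empirical illustration of the √(n/m) rate in Lemma 7.4 (sizes and constants are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
for m in (500, 5_000, 50_000):
    A = rng.normal(size=(m, n))                 # isotropic sub-gaussian rows
    err = np.linalg.norm(A.T @ A / m - np.eye(n), 2)
    print(f"m={m:6d}  spectral error={err:.4f}  sqrt(n/m)={np.sqrt(n / m):.4f}")
```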
Combining Corollary 7.3 and Lemma 7.4, we obtain the following theorem for distinguishing the homogeneous CLWE distribution from the Gaussian distribution.

Theorem 7.5.
Let γ = n^ε, where 0 < ε < 1/2 is a constant, and let β = β(n) ∈ (0, 1). Then, there exists an exp(O(n^{2ε}))-time algorithm that solves hCLWE_{β,γ}.

Proof. Our algorithm takes m samples from the unknown input distribution P, computes the sample covariance matrix Σ_m = (1/m)·AᵀA, where the rows of A are the samples, and computes its eigenvalues µ_1, ..., µ_n. Then, it determines whether P is a homogeneous CLWE distribution or not by testing whether

|µ_i − 1/(2π)| ≤ (1/2)·γ²·exp(−π(β²+γ²)) for all i ∈ [n].

The running time of this algorithm is O(n²m) = exp(O(n^{2ε})). To show correctness, we first consider the case P = D_{R^n}. The standard Gaussian distribution satisfies the conditions of Lemma 7.4 after rescaling the samples by √(2π) (which makes them isotropic). Hence, the eigenvalues of Σ_m will be within distance O(√(n/m)) from 1/(2π) with high probability.

Now consider the case P = H_{w,β,γ}. We can assume w = e_1 without loss of generality since eigenvalues are invariant under rotations. Denote by y a random vector distributed according to H_{w,β,γ} and let σ² = E_{y∼H_{w,β,γ}}[y_1²]. The covariance of P is given by

Σ = ( σ²  0 ; 0  (1/(2π))·I_{n−1} ).   (15)

Now consider the sample covariance Σ_m of P and denote σ_m² = wᵀΣ_m w = (1/m)·Σ_{i=1}^m A_{i1}². Since the A_{i1}'s are sub-gaussian random variables [MP12, Lemma 2.8], σ_m² − σ² is an average of m independent, mean-zero, sub-exponential random variables. For m = ω(n), Bernstein's inequality [Ver18, Corollary 2.8.3] implies that |σ_m² − σ²| = O(√(n/m)) with high probability. By Corollary 7.3, we know that

|σ² − 1/(2π)| ≥ γ²·exp(−π(β²+γ²)).

Hence, if we choose m = exp(cγ²) with a sufficiently large constant c, then Σ_m will have an eigenvalue that is noticeably far from 1/(2π) with high probability.
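The following sketch implements the eigenvalue test on toy parameters (all concrete values are ours). It samples H_{w,β,γ} through its Gaussian mixture form with hidden direction e_1, which is harmless here because eigenvalues are rotation-invariant.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, gamma, m = 8, 0.2, 1.2, 200_000
s2 = beta**2 + gamma**2

def sample_hclwe(num):
    ks = np.arange(-30, 31)
    pk = np.exp(-np.pi * ks**2 / s2); pk /= pk.sum()       # layer weights
    k = rng.choice(ks, size=num, p=pk)
    x = rng.normal(size=(num, n)) / np.sqrt(2 * np.pi)     # width-1 coordinates
    x[:, 0] = k * gamma / s2 + rng.normal(size=num) * beta / np.sqrt(s2 * 2 * np.pi)
    return x                                               # hidden direction w = e_1

thr = 0.5 * gamma**2 * np.exp(-np.pi * s2)                 # test radius from the proof
for name, X in (("hCLWE", sample_hclwe(m)),
                ("gauss", rng.normal(size=(m, n)) / np.sqrt(2 * np.pi))):
    mu = np.linalg.eigvalsh(X.T @ X / m)
    print(name, "flagged:", bool(np.abs(mu - 1 / (2 * np.pi)).max() > thr))
```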
8 SQ Lower Bound for Homogeneous CLWE

Statistical Query (SQ) algorithms [Kea98] are a restricted class of algorithms that are only allowed to query expectations of functions of the input distribution without directly accessing individual samples. To be more precise, SQ algorithms access the input distribution indirectly via the STAT(τ) oracle, which, given a query function f and data distribution D, returns a value contained in the interval E_{x∼D}[f(x)] + [−τ, τ] for some precision parameter τ.

In this section, we prove SQ hardness of distinguishing homogeneous CLWE distributions from the standard Gaussian. In particular, we show that SQ algorithms that solve homogeneous CLWE require a super-polynomial number of queries even with super-polynomial precision. This is formalized in Theorem 8.1.

Theorem 8.1. Let β = β(n) ∈ (0, 1) and γ = γ(n) ≥ √2. Then, any (randomized) SQ algorithm with precision τ ≥ 4·exp(−π·γ²/4) that successfully solves hCLWE_{β,γ} with probability η > 1/2 requires at least (2η−1)·exp(cn)·τ²β²/(4γ²) statistical queries of precision τ, for some constant c > 0.

Note that when γ = Ω(√n) and γ/β = poly(n), even exponential precision τ = exp(−O(n)) results in a query lower bound that grows as exp(Ω̃(n)). This establishes an unconditional hardness result for SQ algorithms in the parameter regime γ = Ω(√n), which is consistent with our computational hardness result based on worst-case lattice problems. The uniform spacing in homogeneous CLWE distributions gives us tight control over their pairwise correlation (see the definition in (16)), which leads to a simple proof of the SQ lower bound.

We first provide some necessary background on the SQ framework. We denote by B(U, D) the decision problem in which the input distribution P either equals D or belongs to U, and the goal of the algorithm is to identify whether P = D or P ∈ U. For our purposes, D will be the standard Gaussian D_{R^n} and U will be a finite set of homogeneous CLWE distributions. Abusing notation, we denote by D(x) the density of D. Following [Fel+17], we define the pairwise correlation between two distributions P, Q relative to D as

χ_D(P, Q) := E_{x∼D}[ (P(x)/D(x) − 1) · (Q(x)/D(x) − 1) ] = E_{x∼D}[ P(x)·Q(x)/D(x)² ] − 1.   (16)

Lemma 8.2 below establishes a lower bound on the number of statistical queries required to solve B(U, D) in terms of the pairwise correlation between distributions in U.

Lemma 8.2 ([Fel+17, Lemma 3.10]). Let D be a distribution and U be a set of distributions, both over a domain X, such that for any P, Q ∈ U,

|χ_D(P, Q)| ≤ δ if P = Q, and |χ_D(P, Q)| ≤ ε otherwise.

Let τ ≥ √ε. Then, any (randomized) SQ algorithm that solves B(U, D) with success probability η > 1/2 requires at least (2η−1)·|U|·τ²/(2δ) queries to STAT(τ).

To apply Lemma 8.2, we take a set U of unit vectors such that any two distinct vectors v, w ∈ U satisfy |⟨v, w⟩| ≤ 1/√2, and identify it with the set of homogeneous CLWE distributions {H_{w,β,γ}}_{w∈U}. A standard probabilistic argument shows that such a U can be as large as exp(Ω(n)), which, combined with Proposition 8.3 below, proves Theorem 8.1.
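The probabilistic argument is easy to visualize: random unit vectors in high dimension are nearly orthogonal, so exponentially many of them satisfy the pairwise constraint. A small demo (sizes ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 100, 2000
V = rng.normal(size=(N, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)       # N random unit vectors
G = np.abs(V @ V.T) - np.eye(N)                     # pairwise |<v, w>|
print("max pairwise |<v,w>|:", G.max())             # ~0.5, below 1/sqrt(2) ~ 0.707
```

For larger n the maximum pairwise inner product shrinks like √(log N / n), so the margin below 1/√2 only improves.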
Proposition 8.3. Let v, w ∈ R^n be unit vectors and let H_v, H_w be n-dimensional homogeneous CLWE distributions with parameters γ ≥ 1, β ∈ (0, 1), and hidden directions v and w, respectively. Then, for any 0 ≤ α < 1 that satisfies γ·√(1−α²) ≥ 1,

|χ_D(H_v, H_w)| ≤ 2·(γ/β)² if v = w, and |χ_D(H_v, H_w)| ≤ 8·exp(−π·γ²(1−α²)) if |⟨v, w⟩| ≤ α.
Proof. We will show that computing χ_D(H_v, H_w) reduces to evaluating the Gaussian mass of two lattices L_1 and L_2 defined below. Then, we will tightly bound the Gaussian mass using Lemma 2.5 and Lemma 2.10, which will result in upper bounds on |χ_D(H_v, H_w)|. We define L_1 and L_2 by specifying their bases B_1 and B_2 (with basis vectors as columns):

B_1 = (1/√(β²+γ²)) · ( 1  0 ; 0  1 ),
B_2 = ( 1/√(β²+γ²)          0   )
      ( −αγ²/(ζ(β²+γ²))     1/ζ ),

where ζ = √(((β²+γ²)² − α²γ⁴)/(β²+γ²)). The bases of the dual lattices L_1* and L_2* are then B_1^{−T} and B_2^{−T}, respectively. Note that λ_2(L_1) = 1/√(β²+γ²), and that the two columns of B_2 have the same norm 1/ζ, so that

λ_2(L_2) ≤ 1/ζ   (17)
≤ 1/(γ·√(1−α²)).   (18)

Now define the density ratio a(t) := H(t)/D(t), where D is the standard Gaussian and H is the marginal distribution of homogeneous CLWE with parameters β, γ along the hidden direction. We immediately obtain

a(t) = (1/Z) · Σ_{k∈Z} ρ_{β/γ}(t − k/γ),   (19)

where Z = ∫_R ρ(t)·Σ_{k∈Z} ρ_{β/γ}(t − k/γ) dt. By Eq. (3), Z is given by

Z = (β/√(β²+γ²)) · ρ((1/√(β²+γ²))·Z).

Moreover, we can express Z² in terms of the Gaussian mass of L_1 as

Z² = (β²/(β²+γ²)) · ρ(L_1).

Next, χ_D(H_v, H_w) can be expressed in terms of a(t) as

χ_D(H_v, H_w) = E_{x∼D}[ a(⟨x, w⟩)·a(⟨x, v⟩) ] − 1.   (20)

Without loss of generality, assume v = e_1 and w = α·e_1 + ξ·e_2, where ξ = √(1−α²). We first compute the pairwise correlation for v ≠ w. For notational convenience, we denote ε = 8·exp(−π·γ²(1−α²)). Then,

χ_D(H_v, H_w) + 1 = E_{x∼D}[ a(x_1)·a(αx_1 + ξx_2) ]
= (1/Z²) · Σ_{k,ℓ∈Z} ∫∫ ρ_β(γx_1 − k)·ρ_β((γαx_1 + γξx_2) − ℓ)·ρ(x_1)·ρ(x_2) dx_1 dx_2
= (1/Z²) · (β/√((γξ)²+β²)) · Σ_{k,ℓ∈Z} ∫ ρ_β(γx_1 − k)·ρ(x_1)·ρ_{√((γξ)²+β²)/(γξ)}(ℓ/(γξ) − (α/ξ)x_1) dx_1
= (1/Z²) · (β/√((γξ)²+β²)) · (β·√((γξ)²+β²)/(ζ·√(β²+γ²))) · Σ_{k,ℓ∈Z} ρ_{√(β²+γ²)}(k)·ρ_ζ(ℓ − γ²α·k/(β²+γ²))
= (√(β²+γ²)/ζ) · ( Σ_{k,ℓ∈Z} ρ_{√(β²+γ²)}(k)·ρ_ζ(ℓ − γ²α·k/(β²+γ²)) ) / ρ(L_1)
= (√(β²+γ²)/ζ) · ρ(L_2)/ρ(L_1)
= (√(β²+γ²)/ζ) · (det(L_2*)/det(L_1*)) · ρ(L_2*)/ρ(L_1*)
= ρ(L_2*)/ρ(L_1*)   (21)
∈ [ 1/(1+ε), 1+ε ].

In (21), we used the Poisson summation formula (Lemma 2.5). The last line follows from (18) and Lemma 2.10, which implies that for any 2-dimensional lattice L satisfying λ_2(L) ≤ 1,

ρ(L*\{0}) ≤ 8·exp(−π/λ_2(L)²).   (22)

Indeed, applying (22) to L_2 via (18) gives ρ(L_2*\{0}) ≤ ε, and applying it to L_1 (using λ_2(L_1) = 1/√(β²+γ²) ≤ 1) gives ρ(L_1*\{0}) ≤ 8·exp(−π(β²+γ²)) ≤ ε.

Now consider the case v = w. Using (17), we get the upper bound λ_2(L_2) ≤ 1/β when α = 1. It follows that λ_2((β/γ)·L_2) ≤ 1/γ ≤ 1. Hence,

χ_D(H_v, H_v) + 1 = (√(β²+γ²)/ζ) · ρ(L_2)/ρ(L_1)
≤ (√(β²+γ²)/ζ) · ρ((β/γ)·L_2)/ρ(L_1)
= (√(β²+γ²)/ζ) · (det((γ/β)·L_2*)/det(L_1*)) · ρ((γ/β)·L_2*)/ρ(L_1*)
= (γ/β)² · ρ((γ/β)·L_2*)/ρ(L_1*)   (23)
≤ 2·(γ/β)².   (24)

Here we used Lemma 2.5 in (23), and in (24) we used (22) together with the fact that λ_2((β/γ)·L_2) ≤ 1/γ, which implies ρ((γ/β)·L_2*\{0}) ≤ 8·exp(−πγ²) ≤ 1.
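The two regimes of Proposition 8.3 can be observed numerically by integrating (20) on a grid. The sketch below (ours; the grids, truncations, and parameter values are arbitrary) estimates χ_D(H_v, H_w) for v = w and for orthogonal v, w and compares with the bounds; the orthogonal estimate is zero only up to grid error.

```python
import numpy as np

beta, gamma = 0.3, 1.5
tg = np.linspace(-6, 6, 4001)
dt = tg[1] - tg[0]
ks = np.arange(-25, 26)

D = np.exp(-np.pi * tg**2)                       # standard Gaussian density here
h = sum(np.exp(-np.pi * ((tg - k / gamma) * gamma / beta) ** 2) for k in ks)
Z = (D * h).sum() * dt
a = h / Z                                        # density ratio a(t) = H(t)/D(t)

def chi(alpha):
    t2 = tg[::4]; d2 = t2[1] - t2[0]             # coarser 2-D grid
    x1, x2 = np.meshgrid(t2, t2)
    vals = np.interp(x1, tg, a) * np.interp(alpha * x1 + np.sqrt(1 - alpha**2) * x2, tg, a)
    return (np.exp(-np.pi * (x1**2 + x2**2)) * vals).sum() * d2**2 - 1.0

print("v = w :", chi(1.0), "<= bound", 2 * (gamma / beta) ** 2)
print("v ⊥ w :", chi(0.0), "vs bound", 8 * np.exp(-np.pi * gamma**2))
```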
9 m ≥ 2 Hidden Directions

In this section, we generalize the hardness result to the setting where the homogeneous CLWE distribution has m ≥ 2 hidden directions.

Definition 9.1 (m-homogeneous CLWE distribution). For 0 ≤ m ≤ n, a matrix W ∈ R^{n×m} with orthonormal columns w_1, ..., w_m, and β, γ > 0, define the m-homogeneous CLWE distribution H_{W,β,γ} over R^n to have density at y proportional to

ρ(y) · ∏_{i=1}^m Σ_{k∈Z} ρ_β(k − γ⟨y, w_i⟩).

Note that the 0-homogeneous CLWE distribution is just D_{R^n}, regardless of β and γ.
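Since W has orthonormal columns, the density factorizes in W-adapted coordinates, so a sampler only needs the one-dimensional mixture form along each hidden direction. A sketch (ours; the function name, truncation, and parameter values are our choices, and the QR step gives a Haar rotation up to column signs):

```python
import numpy as np

def sample_hclwe_m(n, m, beta, gamma, num, rng):
    """Sketch sampler for H_{W,beta,gamma}; returns samples and W."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))     # random rotation
    s2 = beta**2 + gamma**2
    ks = np.arange(-30, 31)
    pk = np.exp(-np.pi * ks**2 / s2); pk /= pk.sum()
    x = rng.normal(size=(num, n)) / np.sqrt(2 * np.pi)          # width-1 coordinates
    for i in range(m):                                          # m hidden coordinates
        k = rng.choice(ks, size=num, p=pk)
        x[:, i] = k * gamma / s2 + rng.normal(size=num) * beta / np.sqrt(s2 * 2 * np.pi)
    return x @ Q.T, Q[:, :m]                                    # samples, hidden W

rng = np.random.default_rng(0)
X, W = sample_hclwe_m(n=10, m=3, beta=0.3, gamma=1.5, num=1000, rng=rng)
print(X.shape, W.shape)
```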
Definition 9.2. For parameters β, γ > 0 and 1 ≤ m ≤ n, the average-case decision problem hCLWE^{(m)}_{β,γ} is to distinguish the following two distributions over R^n: (1) the m-homogeneous CLWE distribution H_{W,β,γ} for some matrix W ∈ R^{n×m} (which is fixed for all samples) with orthonormal columns chosen uniformly from the set of all such matrices, or (2) D_{R^n}.
Lemma 9.3. For any β, γ > 0 and positive integer m = m(n) such that m ≤ n and n − m = Ω(n^c) for some constant c > 0, if there exists an efficient algorithm that solves hCLWE^{(m)}_{β,γ} with non-negligible advantage, then there exists an efficient algorithm that solves hCLWE_{β,γ} with non-negligible advantage.

Proof. Suppose A is an efficient algorithm that solves hCLWE^{(m)}_{β,γ} with non-negligible advantage in dimension n. Then consider the following algorithm B that uses A as an oracle and solves hCLWE_{β,γ} in dimension n′ = n − m + 1 (a code sketch follows the proof):
1. Input: n′-dimensional samples, drawn from either hCLWE_{β,γ} or D_{R^{n′}};
2. Choose 0 ≤ i ≤ m − 1 uniformly at random;
3. Append m − 1 = n − n′ coordinates to the given samples, where the first i appended coordinates are drawn from H_{I_i,β,γ} (with I_i denoting the rank-i identity matrix) and the rest of the coordinates are drawn from D_{R^{m−i−1}};
4. Rotate the augmented samples using a uniformly random rotation from the orthogonal group O(n);
5. Call A with the samples and output the result.
As n = O(n′^{1/c}), B is an efficient algorithm. Moreover, the samples passed to A are effectively drawn from either hCLWE^{(i+1)}_{β,γ} or hCLWE^{(i)}_{β,γ}, depending on whether the input was hCLWE_{β,γ} or Gaussian. Averaging over the uniformly random i, the advantage of B telescopes to at least a 1/m fraction of the advantage of A, and is therefore non-negligible (in terms of n, and thus also in terms of n′) as well.
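A sketch of B's padding-and-rotation step (ours; it draws the appended hCLWE coordinates from the same mixture form used in the earlier sketches, and the function name is our invention):

```python
import numpy as np

def pad_and_rotate(samples, m, beta, gamma, rng):
    """Steps 2-4 of algorithm B: n'-dim samples -> n = n' + m - 1 dims."""
    num, n_prime = samples.shape
    n = n_prime + m - 1
    i = int(rng.integers(0, m))                      # step 2: uniform 0 <= i <= m-1
    s2 = beta**2 + gamma**2
    ks = np.arange(-30, 31)
    pk = np.exp(-np.pi * ks**2 / s2); pk /= pk.sum()
    k = rng.choice(ks, size=(num, i), p=pk)          # i hCLWE coordinates (W = I_i)
    hid = k * gamma / s2 + rng.normal(size=(num, i)) * beta / np.sqrt(s2 * 2 * np.pi)
    gau = rng.normal(size=(num, m - 1 - i)) / np.sqrt(2 * np.pi)
    X = np.hstack([samples, hid, gau])               # step 3: append coordinates
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))     # step 4: random rotation
    return X @ Q.T
```

Feeding the output to A and returning its answer completes B; by the hybrid argument above, its advantage is at least a 1/m fraction of A's.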
Combining Corollary 4.2 and Lemma 9.3, we obtain the following corollary.

Corollary 9.4. For any β = β(n) ∈ (0, 1) and γ = γ(n) ≥ √n such that γ/β is polynomially bounded, and any positive integer m = m(n) such that m ≤ n and n − m = Ω(n^c) for some constant c > 0, there is a polynomial-time quantum reduction from DGS_{√n·η_ε(L)/β} to hCLWE^{(m)}_{β,γ}.

References

[AD97] M. Ajtai and C. Dwork. A public-key cryptosystem with worst-case/average-case equivalence. STOC. 1997, pp. 284–293.
[AG11] S. Arora and R. Ge. New algorithms for learning in presence of errors. ICALP. 2011, pp. 403–415.
[AK05] S. Arora and R. Kannan. Learning mixtures of separated nonspherical Gaussians. Ann. Appl. Probab. 15:1A (2005), pp. 69–92.
[AR05] D. Aharonov and O. Regev. Lattice problems in NP ∩ coNP. J. ACM 52:5 (2005), pp. 749–765.
[Bab86] L. Babai. On Lovász' lattice reduction and the nearest lattice point problem. Combinatorica 6:1 (1986), pp. 1–13.
[Bra+13] Z. Brakerski, A. Langlois, C. Peikert, O. Regev, and D. Stehlé. Classical hardness of learning with errors. STOC. 2013, pp. 575–584.
[Bub+19] S. Bubeck, Y. T. Lee, E. Price, and I. Razenshteyn. Adversarial examples from computational constraints. ICML. Vol. 97. 2019, pp. 831–840.
[BV08] S. C. Brubaker and S. Vempala. Isotropic PCA and affine-invariant clustering. FOCS. 2008, pp. 551–560.
[Das99] S. Dasgupta. Learning mixtures of Gaussians. FOCS. 1999, p. 634.
[Dia16] I. Diakonikolas. Learning structured distributions. Handbook of Big Data. 2016, pp. 267–284.
[DKS17] I. Diakonikolas, D. M. Kane, and A. Stewart. Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures. FOCS. 2017, pp. 73–84.
[DKS18] I. Diakonikolas, D. M. Kane, and A. Stewart. List-decodable robust mean estimation and learning mixtures of spherical Gaussians. STOC. 2018, pp. 1047–1060.
[DMR18] L. Devroye, A. Mehrabian, and T. Reddad. The total variation distance between high-dimensional Gaussians. 2018. arXiv preprint.
[DS07] S. Dasgupta and L. Schulman. A probabilistic analysis of EM for mixtures of separated, spherical Gaussians. JMLR 8 (2007), pp. 203–226.
[Fel+17] V. Feldman, E. Grigorescu, L. Reyzin, S. Vempala, and Y. Xiao. Statistical algorithms and a lower bound for detecting planted cliques. J. ACM 64:2 (2017).
[HL18] S. B. Hopkins and J. Li. Mixture models, robustness, and sum of squares proofs. STOC. 2018, pp. 1021–1034.
[Kea98] M. Kearns. Efficient noise-tolerant learning from statistical queries. J. ACM 45:6 (1998), pp. 983–1006.
[KKK19] S. Karmalkar, A. Klivans, and P. Kothari. List-decodable linear regression. NeurIPS. 2019, pp. 7425–7434.
[KSS18] P. K. Kothari, J. Steinhardt, and D. Steurer. Robust moment estimation and improved clustering via sum of squares. STOC. 2018, pp. 1035–1046.
[LLL82] A. K. Lenstra, H. W. Lenstra, and L. Lovász. Factoring polynomials with rational coefficients. Mathematische Annalen 261:4 (1982), pp. 515–534.
[LM09] V. Lyubashevsky and D. Micciancio. On bounded distance decoding, unique shortest vectors, and the minimum distance problem. CRYPTO. 2009, pp. 577–594.
[Min10] H. Minkowski. Geometrie der Zahlen. B.G. Teubner, 1910.
[Moi18] A. Moitra. Algorithmic aspects of machine learning. Cambridge University Press, 2018.
[MP12] D. Micciancio and C. Peikert. Trapdoors for lattices: simpler, tighter, faster, smaller. EUROCRYPT. 2012, pp. 700–718.
[MR07] D. Micciancio and O. Regev. Worst-case to average-case reductions based on Gaussian measures. SIAM J. Comput. 37:1 (2007), pp. 267–302.
[MV10] A. Moitra and G. Valiant. Settling the polynomial learnability of mixtures of Gaussians. FOCS. 2010, pp. 93–102.
[Pea94] K. Pearson. Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London. A 185 (1894), pp. 71–110.
[Pei10] C. Peikert. An efficient and parallel Gaussian sampler for lattices. CRYPTO. 2010, pp. 80–97.
[Pei16] C. Peikert. A decade of lattice cryptography. Foundations and Trends in Theoretical Computer Science 10:4 (2016), pp. 283–424.
[PRS17] C. Peikert, O. Regev, and N. Stephens-Davidowitz. Pseudorandomness of ring-LWE for any ring and modulus. STOC. 2017, pp. 461–473.
[Reg04] O. Regev. New lattice-based cryptographic constructions. J. ACM 51:6 (2004), pp. 899–942.
[Reg05] O. Regev. On lattices, learning with errors, random linear codes, and cryptography. STOC. 2005, pp. 84–93.
[RS09] R. Rubinfeld and R. A. Servedio. Testing monotone high-dimensional distributions. Random Structures & Algorithms 34:1 (2009), pp. 24–44.
[RV17] O. Regev and A. Vijayaraghavan. On learning mixtures of well-separated Gaussians. FOCS. 2017, pp. 85–96.
[RY20] P. Raghavendra and M. Yau. List decodable learning via sum of squares. SODA. 2020, pp. 161–180.
[Sze+14] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. ICLR. 2014.
[Ver18] R. Vershynin. High-dimensional probability: an introduction with applications in data science. Cambridge University Press, 2018.
[VW02] S. Vempala and G. Wang. A spectral algorithm for learning mixtures of distributions. FOCS. 2002.