On the Sample Complexity of solving LWE using BKW-Style Algorithms
Qian Guo, Erik Mårtensson and Paul Stankovski Wagner
Department of Electrical and Information Technology
Lund University, Lund, Sweden
Email: {qian.guo, erik.martensson, paul.stankovski wagner}@eit.lth.se

Abstract—The Learning with Errors (LWE) problem receives much attention in cryptography, mainly due to its fundamental significance in post-quantum cryptography. Among its solving algorithms, the Blum-Kalai-Wasserman (BKW) algorithm, originally proposed for solving the Learning Parity with Noise (LPN) problem, performs well, especially for certain parameter settings with cryptographic importance. The BKW algorithm consists of two phases, the reduction phase and the solving phase. In this work, we study the performance of distinguishers used in the solving phase. We show that the Fast Fourier Transform (FFT) distinguisher from Eurocrypt'15 has the same sample complexity as the optimal distinguisher, when making the same number of hypotheses. We also show that it performs much better than theory predicts and introduce an improvement of it called the pruned FFT distinguisher. Finally, we indicate, via extensive experiments, that the sample dependency due to both LF2 and sample amplification is limited.
I. INTRODUCTION
Post-quantum cryptography studies replacements of cryptographic primitives based on the factoring or discrete-log problem, since both can be efficiently solved by a quantum computer [1]. Lattice-based cryptography is its main area. In the NIST Post-Quantum Cryptography Standardization [2], 5 out of 7 finalists and 2 out of 8 alternates are lattice-based.

The Learning with Errors (LWE) problem, introduced by Regev [3], is the major problem in lattice-based cryptography. Its average-case hardness can be based on the worst-case hardness of some standard lattice problems, which is extremely interesting in theoretical cryptography. The most famous of its many cryptographic applications is the design of Fully Homomorphic Encryption (FHE) schemes. Its binary counterpart, the Learning Parity with Noise (LPN) problem, also plays a significant role in cryptography (see [4]), especially in lightweight cryptography for very constrained environments such as RFID tags and low-power devices.

The algorithms for solving LWE can be divided into lattice-based, algebraic, and combinatorial methods. The last class of algorithms all inherit from the famous Blum-Kalai-Wasserman (BKW) algorithm [5], [6], and are the most relevant to our study. We refer interested readers to [7] for concrete complexity estimation for solving LWE instances, and to [8], [9] for asymptotic complexity estimations.

The BKW-type algorithms include two phases, the reduction phase and the solving phase. The former consists of a series of operations, called BKW steps, iteratively reducing the dimension of the problem at the cost of increasing its noise level. At the end of the reduction phase, the original LWE problem is transformed into a new problem with a much smaller dimension. The new problem can be solved efficiently by a procedure called distinguishing in the solving phase.

One of the main challenges in understanding the precise performance of BKW variants for solving the LWE problem comes from the lack of extensive experimental studies, especially on the various distinguishers proposed for the solving phase. Firstly, many heuristics have been borrowed from BKW variants on the LPN problem, but they have only been very roughly, or not at all, verified for the LWE problem. Secondly, the tightness of the nice theoretical bound in [10] on the sample complexity of the FFT distinguisher also needs to be experimentally checked. Lastly, a performance comparison of the different known distinguishers is still lacking.
A. Related Work
The BKW algorithm proposed by Blum et al. [5], [6] is the first sub-exponential algorithm for solving the LPN problem. Its initial distinguisher, an exhaustive search method in the binary field, recovers one bit of the secret by employing majority voting. Later, Levieil and Fouque [11] applied the fast Walsh-Hadamard transform (FWHT) technique to accelerate the distinguishing process and recovered a number of secret bits in one pass. They also proposed some heuristic versions and tested these assumptions by experiments. In [12] Kirchner proposed a secret-noise transform technique to change the secret distribution to be sparse. This technique is an application of the transform technique proposed in [13] for solving LWE. Bernstein and Lange [14] further instantiated an attack on the Ring-LPN problem, a variant of LPN with algebraic ring structures. In [15], [16], Guo, Johansson, and Löndahl proposed a new distinguishing method called subspace hypothesis testing. Though this distinguisher can handle an instance with larger dimension by using covering codes, its inherent nature is still an FWHT distinguisher. Improvements of the BKW algorithm were further studied by Zhang et al. [17] and Bogos-Vaudenay [18]. An elaborate survey with experimental results on the BKW algorithm for solving LPN can be found in [19].

BKW for solving LWE follows a similar research line. Albrecht et al. initiated the study in [20]. In PKC 2014 [21], a new reduction technique called lazy modulus switching was proposed. In both works, the solving phase uses an exhaustive search approach. In [10] Duc et al. introduced the fast Fourier transform (FFT) technique in the distinguishing process and bounded the sample complexity theoretically via the Hoeffding inequality. Note that the actual performance relative to the bound has not been experimentally verified, and the information loss in the FFT distinguisher is unclear. There are new reduction methods in [22]–[24], and in [22] the authors also proposed a new method with polynomial reconstruction in the solving phase. This method has the same sample complexity as that of the exhaustive search approach, but requires (q + 1) FFT operations rather than only one FFT as in [10]. BKW variants with memory constraints were recently studied in [25]–[27].
B. Contributions
In this paper, we compare the performance of the known distinguishers empirically. We investigate the performance of the optimal distinguisher and the FFT distinguisher. We also test the sample dependency when using LF2 or sample amplification. We have the following contributions.
1) We show that the FFT distinguisher and the optimal distinguisher have the same sample complexity, if we make sure that the distinguishers make the same number of hypotheses. Thus, except for very sparse secrets, the FFT distinguisher is always preferable. This also makes the polynomial reconstruction method of [22] obsolete.
2) We indicate that the formula from [10] for the number of samples needed for distinguishing is off by roughly an order of magnitude.
3) We introduce a pruned FFT method. By only testing probable hypotheses, we improve the performance of the FFT method from [10] with no computational overhead.
4) We indicate that the sample dependency due to using LF2 or sample amplification is limited.
C. Organization
The rest of the paper is organized as follows. Section II introduces some necessary background. In Section III we cover the basic BKW algorithm. Section IV goes over distinguishers used for hypothesis testing when solving LWE using BKW and introduces the pruned FFT method. Next, in Section V we show why the FFT distinguisher and the optimal distinguisher perform identically for our setting, followed by simulation results in Section VI. Section VII concludes the paper.

II. BACKGROUND
Let us introduce some notation. Bold small letters denote vectors. Let $\langle \cdot, \cdot \rangle$ denote the scalar product of two vectors with the same dimension. By $|x|$ we denote the absolute value of $x$ for a real number $x \in \mathbb{R}$. We also denote by $\Re(y)$ the real part and by $\|y\|$ the absolute value of a complex number $y \in \mathbb{C}$.

A. LWE
Let us define the LWE problem.
Definition 1 (LWE):
Let $n$ be a positive integer, $q$ an odd prime. Let $\mathbf{s}$ be a uniformly random secret vector in $\mathbb{Z}_q^n$. Assume access to $m$ noisy scalar products between $\mathbf{s}$ and known vectors $\mathbf{a}_i$, i.e.,

$$b_i = \langle \mathbf{a}_i, \mathbf{s} \rangle + e_i, \quad (1)$$

for $i = 1, \ldots, m$. The error terms $e_i$ are drawn from a distribution $\chi$. The (search) LWE problem is to find $\mathbf{s}$.

Thus, when solving LWE you have access to a large set of pairs $(\mathbf{a}_i, b_i)$ and want to find the corresponding secret vector $\mathbf{s}$. Some versions restrict the number of available samples. If we let $\mathbf{b} = (b_1, b_2, \ldots, b_m)$, $\mathbf{e} = (e_1, e_2, \ldots, e_m)$ and $\mathbf{A} = [\mathbf{a}_1^T, \mathbf{a}_2^T \cdots \mathbf{a}_m^T]$, we can write the problem on matrix form as

$$\mathbf{b} = \mathbf{s}\mathbf{A} + \mathbf{e}. \quad (2)$$
B. Rounded Gaussian Distribution

For the error we use the rounded Gaussian distribution (it is also common to use the Discrete Gaussian distribution, which is similar). Let $f(x \mid 0, \sigma^2)$ denote the PDF of the normal distribution with mean 0 and standard deviation $\sigma$, this distribution in turn being denoted $\mathcal{N}(0, \sigma^2)$. The rounded Gaussian distribution samples from $\mathcal{N}(0, \sigma^2)$, rounds to the nearest integer and wraps to the interval $[-(q-1)/2, (q-1)/2]$. In other words, the probability of choosing a certain error $e$ is equal to

$$\sum_{k=-\infty}^{\infty} \int_{e - 1/2 + k \cdot q}^{e + 1/2 + k \cdot q} f(x \mid 0, \sigma^2)\, dx,$$

for $e \in [-(q-1)/2, (q-1)/2]$. We denote this distribution by $\bar{\Psi}_{\sigma,q}$. We use the well-known heuristic approximation that the sum of two independent distributions $X_1$ and $X_2$, drawn from $\bar{\Psi}_{\sigma_1,q}$ and $\bar{\Psi}_{\sigma_2,q}$, is drawn from $\bar{\Psi}_{\sqrt{\sigma_1^2 + \sigma_2^2},q}$. We also use the notation $\alpha = \sigma/q$. Finally, we let $U(a, b)$ denote the discrete uniform distribution taking values from $a$ up to $b$.
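To make the setting concrete, the following is a minimal Python sketch (our own illustration, not the paper's FBBL implementation [31]; all function names are hypothetical) of generating LWE samples with rounded Gaussian noise, as in Definition 1 and Section II-B.

```python
# Minimal sketch (not the paper's FBBL code [31]; all names are our own) of
# generating LWE samples with rounded Gaussian noise, as in Definition 1.
import numpy as np

rng = np.random.default_rng(0)

def rounded_gaussian(sigma, q, size):
    # Sample N(0, sigma^2), round to nearest integer, wrap to [-(q-1)/2, (q-1)/2].
    x = np.rint(rng.normal(0.0, sigma, size)).astype(np.int64)
    return (x + (q - 1) // 2) % q - (q - 1) // 2

def lwe_samples(n, q, sigma, m):
    # b_i = <a_i, s> + e_i (mod q), i.e. b = sA + e on matrix form, as in (2).
    s = rng.integers(0, q, n)
    A = rng.integers(0, q, (m, n))
    e = rounded_gaussian(sigma, q, m)
    b = (A @ s + e) % q
    return A, b, s

A, b, s = lwe_samples(n=40, q=1601, sigma=0.005 * 1601, m=1600)
```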
III. BKW

The BKW algorithm was originally invented to solve LPN. It was first used for LWE in [20]. The BKW algorithm consists of two parts, reduction and hypothesis testing.

A. Reduction

We divide samples into categories based on $b$ position values in the $\mathbf{a}$ vectors. Two samples should be in the same category if and only if the $b$ position values get canceled when adding or subtracting the $\mathbf{a}$ vectors. Consider two samples $([\pm\mathbf{a}_1, \mathbf{a}_1'], b_1)$ and $([\pm\mathbf{a}_1, \mathbf{a}_2'], b_2)$ within the same category. By adding/subtracting the $\mathbf{a}$ vectors we get

$$\mathbf{a}_{1,2} = [\underbrace{0\ 0 \cdots 0}_{b \text{ symbols}}\ {*}\ {*} \cdots {*}].$$

The corresponding $b$ value is $b_{1,2} = b_1 \pm b_2$. Now we have a new sample $(\mathbf{a}_{1,2}, b_{1,2})$. The corresponding noise variable is $e_{1,2} = e_1 \pm e_2$, with variance $2\sigma^2$, where $\sigma^2$ is the variance of the original noise. By calculating a suitable number of new samples for each category we have reduced the dimensionality of the problem by $b$, but increased the noise variance to $2\sigma^2$. If we repeat the reduction process $t$ times we end up with a dimensionality of $n - tb$, and a noise variance of $2^t \cdot \sigma^2$. A sketch of one such step is given below.
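The sketch below shows one plain BKW step under the same assumptions as above; the category keying up to sign and the helper name are our own, not FBBL's [31].

```python
# Minimal sketch of one plain BKW step in the LF1 style (our own helper, not
# FBBL's [31]): the first sample seen in a category becomes its representative
# and all later members of the category are combined with it.
import numpy as np

def bkw_step(A, b, q, width):
    reps = {}              # category key -> index of the representative
    newA, newb = [], []
    for i in range(len(b)):
        head = tuple(A[i, :width] % q)
        neg = tuple(-A[i, :width] % q)
        if head in reps:   # equal heads: subtract to cancel the first positions
            j = reps[head]
            newA.append((A[i] - A[j]) % q)
            newb.append((b[i] - b[j]) % q)
        elif neg in reps:  # negated heads: add to cancel the first positions
            j = reps[neg]
            newA.append((A[i] + A[j]) % q)
            newb.append((b[i] + b[j]) % q)
        else:
            reps[head] = i
    # The first `width` positions are now zero and can be dropped;
    # the noise variance of the new samples has doubled.
    return np.array(newA)[:, width:], np.array(newb)
```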
1) LF1 and LF2: LF1 and LF2 are two implementation tricks originally proposed for solving LPN in [11]. Both can naturally be generalized for solving LWE.

In LF1 we choose one representative per category. We form new samples by combining the other samples with the representative. This way all samples at the hypothesis testing stage are independent of each other. However, the sample size shrinks by $(q^b - 1)/2$ samples per generation, requiring a large initial sample size.

In LF2 we allow combining any pair of samples within a category, creating many more samples. If we form every possible sample, a sample size of $3(q^b - 1)/2$ is enough to keep the sample size constant between steps. The disadvantage of this approach is that the samples are no longer independent, leading to higher noise levels in the hypothesis testing stage of BKW. It is generally assumed that this effect is quite small. This assumption is well tested for solving the LPN problem [11].
2) Sample Amplification:
Some versions of LWE limit the number of samples. We can get more samples using sample amplification. For example, by adding/subtracting triples of samples we can increase the initial sample size $m$ up to a maximum of $4 \cdot \binom{m}{3}$. This does increase the noise by a factor of $\sqrt{3}$. It also leads to an increased dependency between samples in the hypothesis testing phase, similar in principle to LF2. A sketch is given below.
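The following minimal sketch (our own helper, with hypothetical names) illustrates the amplification by triples; each triple gives four sign patterns up to global negation, hence the $4 \cdot \binom{m}{3}$ maximum, and each new sample has noise variance $3\sigma^2$, i.e. a factor $\sqrt{3}$ in $\sigma$.

```python
# Minimal sketch of sample amplification by triples (our own helper, not the
# paper's code): at most 4 * C(m, 3) new samples, noise grows by sqrt(3).
import numpy as np
from itertools import combinations, product

def amplify(A, b, q, target):
    newA, newb = [], []
    for (i, j, k), (s2, s3) in product(combinations(range(len(b)), 3),
                                       product((1, -1), repeat=2)):
        newA.append((A[i] + s2 * A[j] + s3 * A[k]) % q)
        newb.append((b[i] + s2 * b[j] + s3 * b[k]) % q)
        if len(newb) == target:
            break
    return np.array(newA), np.array(newb)
```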
3) Secret-Noise Transformation:
There is a transformation of the LWE problem that makes the distribution of the secret vector follow the distribution of the noise [12], [13].
4) Improved Reduction Steps:
There are many improvements of the plain BKW steps. Lazy modulus switching (LMS) was introduced in [21] and further developed in [23]. In [22] coded-BKW was introduced. Coded-BKW with sieving was introduced in [24] and improved in [9], [28]. Since only the final noise level, not the type of steps, matters for the distinguishers, we only use plain steps in this paper.
B. Hypothesis Testing
Assume that we have reduced all but $k$ positions to 0, leaving $k$ positions for the hypothesis testing phase. After the reduction phase we have samples of the form

$$b = \sum_{i=1}^{k} a_i \cdot s_i + e \iff b - \sum_{i=1}^{k} a_i \cdot s_i = e, \quad (3)$$

where $e$ is (approximately) rounded Gaussian distributed with a standard deviation of $\sigma_f = 2^{t/2} \cdot \sigma$ and mean 0. Now the problem is to distinguish the correct guess $\mathbf{s} = (s_1, s_2, \ldots, s_k)$ from all the incorrect ones, among all $q^k$ guesses (after the secret-noise transformation most of these hypotheses are almost guaranteed to be incorrect, simplifying the hypothesis testing a bit). For each guess $\hat{\mathbf{s}}$ we calculate the corresponding error terms in (3). For the correct guess the observed values of $e$ are rounded Gaussian distributed, while for the wrong guesses they are uniformly random. How to distinguish the right guess from all the wrong ones is explained in Section IV. A worked example of the noise growth is given below.
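As a worked arithmetic example of the final noise level (a minimal sketch; the specific $\alpha$ is one of the values later used in Section VI):

```python
# Worked example of the noise growth: after t plain BKW steps the error
# standard deviation is sigma_f = 2^(t/2) * sigma. Parameters as in Section VI.
q, alpha, t = 1601, 0.0056, 13
sigma = alpha * q               # ~8.97, the original noise level
sigma_f = 2 ** (t / 2) * sigma  # ~811, the noise level at hypothesis testing
print(sigma, sigma_f)
```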
IV. DISTINGUISHERS
For the hypothesis testing we study the optimal distinguisher, which is an exhaustive search method, and a faster method based on the fast Fourier transform.
A. Optimal Distinguisher
Let $D_{\hat{\mathbf{s}}}$ denote the distribution of the $e$ values for a given guess $\hat{\mathbf{s}}$ of the secret vector. As is shown in [29, Prop. 1], to optimally distinguish the hypothesis $D_{\hat{\mathbf{s}}} = U(0, q-1)$ against $D_{\hat{\mathbf{s}}} = \bar{\Psi}_{\sigma_f,q}$ we calculate the log-likelihood ratio

$$\sum_{e=0}^{q-1} N(e) \log \frac{\Pr_{\bar{\Psi}_{\sigma_f,q}}(e)}{\Pr_{U(0,q-1)}(e)} = \sum_{e=0}^{q-1} N(e) \log\left(q \cdot \Pr_{\bar{\Psi}_{\sigma_f,q}}(e)\right), \quad (4)$$

where $N(e)$ denotes the number of times $e$ occurs for the guess $\hat{\mathbf{s}}$, $\sigma_f$ denotes the standard deviation of the samples after the reduction phase and $\Pr_D(e)$ denotes the probability of drawing $e$ from the distribution $D$. We choose the value $\hat{\mathbf{s}}$ that maximizes (4). The time complexity of this distinguisher is

$$O(m \cdot q^k), \quad (5)$$

if we try all possible hypotheses. After performing the secret-noise transformation of Section III-A3 we can limit ourselves to assuming that the $k$ values in $\mathbf{s}$ have an absolute value of at most $d$, reducing the complexity to

$$O(m \cdot (2d + 1)^k). \quad (6)$$

By only testing the likely hypotheses we have a lower risk of choosing an incorrect one, as long as the correct one is among our hypotheses. This trick of limiting the number of hypotheses can of course also be applied to the FFT method of Section IV-B, which we do in Section IV-D. A sketch of the optimal distinguisher is given below.
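A minimal sketch of the optimal distinguisher follows, assuming SciPy is available for the Gaussian CDF; the helper computing $\Pr_{\bar{\Psi}_{\sigma_f,q}}(e)$ from Section II-B truncates the infinite sum, and all names are our own.

```python
# Minimal sketch of the optimal distinguisher (4). Assumptions: (A, b) are the
# reduced samples with k positions left, noise ~ rounded Gaussian with std.
# dev. sigma_f. Names are our own, not the paper's.
import numpy as np
from itertools import product
from scipy.stats import norm

def pr_rounded_gaussian(q, sigma_f):
    # Pr_{Psi-bar}(e) from Section II-B; the infinite sum over k is truncated.
    e = np.arange(q) - (q - 1) // 2
    p = np.zeros(q)
    for k in range(-5, 6):
        p += norm.cdf(e + 0.5 + k * q, 0, sigma_f) \
           - norm.cdf(e - 0.5 + k * q, 0, sigma_f)
    return p  # index e + (q-1)//2 holds Pr(e)

def optimal_distinguisher(A, b, q, sigma_f, d):
    k = A.shape[1]
    logp = np.log(q * pr_rounded_gaussian(q, sigma_f))
    best, best_llr = None, -np.inf
    for guess in product(range(-d, d + 1), repeat=k):  # only likely hypotheses
        e = (b - A @ np.array(guess)) % q              # error terms from (3)
        llr = logp[(e + (q - 1) // 2) % q].sum()       # log-likelihood ratio (4)
        if llr > best_llr:
            best_llr, best = llr, guess
    return np.array(best) % q
```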
B. Fast Fourier Transform Method

For LWE, the idea of using a transform to speed up the distinguishing was introduced in [10]. Consider the function

$$f(\mathbf{x}) = \sum_{j=1}^{m} \mathbb{1}_{\mathbf{a}_j = \mathbf{x}}\, \theta_q^{b_j}, \quad (7)$$

where $\mathbf{x} \in \mathbb{Z}_q^k$, $\mathbb{1}_{\mathbf{a}_j = \mathbf{x}}$ is equal to 1 if and only if $\mathbf{x} = \mathbf{a}_j$ and 0 otherwise, and $\theta_q$ denotes the $q$-th root of unity. The idea of the FFT distinguisher is to calculate the FFT of $f$, that is

$$\hat{f}(\boldsymbol{\alpha}) = \sum_{\mathbf{x} \in \mathbb{Z}_q^k} f(\mathbf{x})\, \theta_q^{-\langle \mathbf{x}, \boldsymbol{\alpha} \rangle} = \sum_{j=1}^{m} \theta_q^{-(\langle \mathbf{a}_j, \boldsymbol{\alpha} \rangle - b_j)}. \quad (8)$$

Given enough samples compared to the noise level, the correct guess $\boldsymbol{\alpha} = \mathbf{s}$ maximizes $\Re(\hat{f}(\boldsymbol{\alpha}))$ in (8). The time complexity of the FFT distinguisher is

$$O(m + k \cdot q^k \cdot \log(q)). \quad (9)$$

In general this complexity is much lower than the one in (5). However, it does depend on the sparsity of the secret $\mathbf{s}$. For a binary $\mathbf{s}$, the exhaustive methods are better.

From [10, Thm. 16] we have the following (upper limit) formula for the sample complexity of the FFT distinguisher

$$8 \cdot \ln\left(\frac{q^k}{\epsilon}\right) \cdot \left(\frac{q}{\pi} \sin\left(\frac{\pi}{q}\right) e^{-2\pi^2\sigma^2/q^2}\right)^{-2^{t+1}}, \quad (10)$$

where $\epsilon$ is the probability of guessing $\mathbf{s}$ incorrectly. Notice that the expression is slightly modified to fit our notation and that a minor error in the formula is corrected: using our notation, $k$ should be within the logarithm and not a factor in front of it, as in [10]. A sketch of the FFT distinguisher is given below.
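The following minimal numpy sketch shows the FFT distinguisher; note that numpy's forward FFT uses exactly the negative-exponent convention of (8). The names are our own, not FBBL's [31].

```python
# Minimal numpy sketch of the FFT distinguisher (7)-(8). The q^k-sized table
# assumes a small k; numpy's forward FFT matches the sign convention of (8).
import numpy as np

def fft_distinguisher(A, b, q):
    k = A.shape[1]
    theta = np.exp(2j * np.pi / q)            # q-th root of unity
    f = np.zeros((q,) * k, dtype=complex)
    np.add.at(f, tuple(A.T % q), theta ** b)  # f(x) = sum_{a_j = x} theta^{b_j}, as in (7)
    fhat = np.fft.fftn(f)                     # fhat(alpha) = sum_j theta^{-(<a_j,alpha> - b_j)}
    guess = np.unravel_index(np.argmax(fhat.real), fhat.shape)
    return np.array(guess)                    # argmax of Re(fhat), cf. (8)
```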
C. Polynomial Reconstruction Method

In [22], a method combining exhaustive search and the FFT was introduced. It achieves optimal distinguishing information theoretically, while being more efficient than the optimal distinguisher. However, its complexity is roughly a factor $q$ higher than the complexity of the FFT distinguisher.
D. Pruned FFT Distinguisher

Also when using an FFT distinguisher we can limit the number of hypotheses. We only need a small subset of the output values of the FFT distinguisher in (8), so we can speed up the calculations using a pruned FFT. In general, if we only need $K$ out of all $N$ output values, the time complexity of calculating the FFT improves from $O(N \log(N))$ to $O(N \log(K))$ [30]. Limiting the magnitude when guessing the last $k$ positions of $\mathbf{s}$ to $d$, this changes the time complexity from (9) to

$$O(m + k \cdot q^k \cdot \log(2d + 1)). \quad (11)$$

More importantly, this method reduces the sample complexity. In the formula for sample complexity (10), the numerator $q^k$ corresponds to the number of values $\mathbf{s}$ can take on the last $k$ positions. Re-doing the proofs of [10, Thm. 16], limiting the magnitude of the guess in each position to $d$, we get

$$8 \cdot \ln\left(\frac{(2d + 1)^k}{\epsilon}\right) \cdot \left(\frac{q}{\pi} \sin\left(\frac{\pi}{q}\right) e^{-2\pi^2\sigma^2/q^2}\right)^{-2^{t+1}}. \quad (12)$$

This reduced sample complexity comes at no extra cost. A sketch of the pruning idea is given below.
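The sketch below illustrates the pruning idea. For simplicity it evaluates (8) at the retained hypotheses by direct summation, which costs $O(m \cdot (2d+1)^k)$; an actual pruned FFT [30] is what attains the complexity in (11). The names are our own.

```python
# Minimal sketch of the pruning idea: only hypotheses with entries of magnitude
# at most d are tested. Direct evaluation of (8) at those points, for clarity;
# a real pruned FFT [30] reaches the complexity in (11).
import numpy as np
from itertools import product

def pruned_fft_distinguisher(A, b, q, d):
    k = A.shape[1]
    best, best_val = None, -np.inf
    for alpha in product(range(-d, d + 1), repeat=k):
        phase = (b - A @ np.array(alpha)) % q      # b_j - <a_j, alpha> mod q
        val = np.cos(2 * np.pi * phase / q).sum()  # Re(fhat(alpha)), cf. (8), (14)
        if val > best_val:
            best_val, best = val, alpha
    return np.array(best) % q
```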
V. EQUAL PERFORMANCE OF OPTIMAL AND FFT DISTINGUISHERS
When starting to run simulations, we noticed that the FFT distinguisher and the optimal distinguisher performed identically, in terms of the number of samples needed to correctly guess the secret. We explain this phenomenon in Appendix A. There are two immediate effects of this finding.
• The polynomial reconstruction method is obsolete.
• Unless the secret is very sparse, the FFT distinguisher is strictly better than the optimal distinguisher, since it is computationally cheaper.
Hence we limit our investigation to the FFT distinguisher in Section VI. We do not make any claims about the equivalence between the sample complexity of the two distinguishers outside of our context of solving LWE using BKW, when having large rounded (or Discrete) Gaussian noise (although it could be interesting to investigate).

VI. SIMULATIONS AND RESULTS
This section covers the simulations we ran, using the FBBL library [31] from [32], and the results they yielded. For all figures, each point corresponds to running plain BKW plus distinguishing at least 30 times. For most points we ran slightly more iterations. See Appendix B for details on the number of iterations for all the points. We chose our parameters inspired by the Darmstadt LWE Challenge [33]. The challenges are a set of (search) LWE instances used to compare LWE solving methods. Each instance consists of the dimension $n$, the modulus $q \approx n^2$, the relative error size $\alpha$ and $m \approx n^2$ equations of the form (1). Our simulations mostly use parameters inspired by the LWE challenges. We mostly let $q = 1601$ (corresponding to $n = 40$) and vary $\alpha$ to get problem instances that require a suitable number of samples for simulating hypothesis testing. The records for the LWE challenges are set using lattice sieving [34].
In the upper part of Figure 1 we compare the theoretical sample complexity from (10) with simulation results from an implementation of the FFT distinguisher of [10] and our pruned FFT distinguisher. The latter distinguisher guesses values of absolute value up to $3\sigma$, rounded upwards. The simulated points are the median values of our simulations and the theoretical values correspond to setting $\epsilon = 0.5$ in (10). We use $q = 1601$, $n = 28$, and we take $t = 13$ steps of plain BKW, reducing 2 positions per step. Finally we guess the last 2 positions and measure the minimum number of samples needed to correctly guess the secret. We vary $\alpha$ between 0.005 and 0.006. We use LF1 to guarantee that the samples are independent.

We notice that there is a gap of roughly a factor 10 between theory and simulation. More exactly, the gap is a factor [10.8277, 8.6816, 10.1037, 8.6776, 10.5218, 10.1564] for the six points, counting in increasing order of noise level.

We also see a gap between the FFT distinguisher and the pruned FFT distinguisher. We can estimate the gap by comparing (12) and (10). Counting in increasing level of noise, by theory we expect the pruned version to need [1.8056, 1.8056, 1.7895, 1.7743, 1.7598, 1.7461] times fewer samples for the 6 data points. The numbers from the simulation were [2.0244, 1.8610, 1.8433, 2.1905, 2.0665, 2.2060], pretty close to theory.

Fig. 1: Theoretical values vs. simulated values.
B. Varying q

In the lower part of Figure 1 we show how the number of samples needed for distinguishing varies with $q$. For $q$ we use the values [101, 201, 401, 801, 1601, 3201], for $\alpha$ we use the values [0.0896, 0.0448, 0.0224, 0.0112, 0.0056, 0.0028] and the number of steps were [5, 7, 9, 11, 13, 15]. Thereby the final noise level and the original $\mathbf{s}$ vectors have almost the same distribution, making the $q$ values the only varying factor. We use LF1 to guarantee that the samples are independent. Notice that the number of samples needed to guess the secret is roughly an order of magnitude lower than theory predicts; counting in increasing order of $q$, the gain is a factor [11.4537, 10.6112, 9.2315, 10.4473, 9.5561, 9.7822] for the six points.

Also notice that the pruned version is an improvement that increases with $q$. This is because the total number of hypotheses divided by the number of hypotheses we make increases with $q$. By comparing (12) and (10), we expect the improvement to be a factor [1.1303, 1.2871, 1.4563, 1.6152, 1.7743, 1.9334]. This is pretty close to the factors [1.1435, 1.4551, 1.6215, 1.8507, 2.0121, 2.3045] from simulation.

C. LF1 vs LF2
We investigate the increased number of samples needed due to dependencies when using LF2. For LF2, depending on the number of samples needed for guessing, we used either the minimum number of samples needed to produce a new generation of the same size or a sample size roughly equal to the size needed for guessing at the end. To test the limit of LF2 we made sure to produce every possible sample from each category. See the upper part of Figure 2 for details. The setting is the same as in Section VI-A. We only use the pruned FFT distinguisher. Notice that the performance is almost exactly the same in both the LF1 and the LF2 cases, as is generally assumed [11].
D. Sample Amplification
The lower part of Figure 2 shows the increased number of samples needed due to sample amplification. We use $q = 1601$ and 1600 initial samples. We form new samples by combining triples of samples to get a large enough sample size. We vary the noise level between $\alpha = 0.005/\sqrt{3}$ and $\alpha = 0.006/\sqrt{3}$. We take 13 steps of plain BKW, reducing 2 positions per step. Finally we guess the last 2 positions and measure the minimum number of samples needed to guess correctly. We use LF1 and we compare the results against starting with as many samples as we want and noise levels between $\alpha = 0.005$ and $\alpha = 0.006$, both tricks to isolate the dependency due to sample amplification. We only use the pruned FFT distinguisher. The difference between the points is small, implying that the dependency due to sample amplification is limited.

Fig. 2: Testing the effect of sample dependence.
VII. CONCLUSIONS
We have shown that the FFT distinguisher and the optimal distinguisher have the same sample complexity for solving LWE using BKW. We have also shown that the FFT distinguisher performs roughly an order of magnitude better than the upper limit formula from [10, Thm. 16] predicts. Our pruned version of the FFT method improves the sample complexity of the FFT solver at no cost. Finally, we have indicated that the sample dependency due to both LF2 and sample amplification is limited.
REFERENCES
[1] P. W. Shor, "Algorithms for quantum computation: Discrete logarithms and factoring," in 35th Annual Symposium on Foundations of Computer Science. Santa Fe, NM, USA: IEEE Computer Society Press, Nov. 20–22, 1994, pp. 124–134.
[2] "NIST Post-Quantum Cryptography Standardization," https://csrc.nist.gov/Projects/Post-Quantum-Cryptography/Post-Quantum-Cryptography-Standardization, accessed: 2019-09-24.
[3] O. Regev, "On lattices, learning with errors, random linear codes, and cryptography," in 37th Annual ACM Symposium on Theory of Computing, H. N. Gabow and R. Fagin, Eds. Baltimore, MA, USA: ACM Press, May 22–24, 2005, pp. 84–93.
[4] A. Blum, M. L. Furst, M. J. Kearns, and R. J. Lipton, "Cryptographic primitives based on hard learning problems," in Advances in Cryptology – CRYPTO'93, ser. Lecture Notes in Computer Science, D. R. Stinson, Ed., vol. 773. Santa Barbara, CA, USA: Springer, Heidelberg, Germany, Aug. 22–26, 1994, pp. 278–291.
[5] A. Blum, A. Kalai, and H. Wasserman, "Noise-tolerant learning, the parity problem, and the statistical query model," in 32nd Annual ACM Symposium on Theory of Computing. Portland, OR, USA: ACM Press, May 21–23, 2000, pp. 435–440.
[6] ——, "Noise-tolerant learning, the parity problem, and the statistical query model," J. ACM, vol. 50, no. 4, pp. 506–519, 2003. [Online]. Available: https://doi.org/10.1145/792538.792543
[7] M. R. Albrecht, R. Player, and S. Scott, "On the concrete hardness of Learning with Errors," J. Mathematical Cryptology, vol. 9, no. 3, pp. 169–203, 2015.
[8] G. Herold, E. Kirshanova, and A. May, "On the asymptotic complexity of solving LWE," Des. Codes Cryptogr., vol. 86, no. 1, pp. 55–83, 2018. [Online]. Available: https://doi.org/10.1007/s10623-016-0326-0
[9] Q. Guo, T. Johansson, E. Mårtensson, and P. Stankovski Wagner, "On the asymptotics of solving the LWE problem using coded-BKW with sieving," IEEE Trans. Information Theory, vol. 65, no. 8, pp. 5243–5259, 2019. [Online]. Available: https://doi.org/10.1109/TIT.2019.2906233
[10] A. Duc, F. Tramèr, and S. Vaudenay, "Better algorithms for LWE and LWR," in Advances in Cryptology – EUROCRYPT 2015, Part I, ser. Lecture Notes in Computer Science, E. Oswald and M. Fischlin, Eds., vol. 9056. Sofia, Bulgaria: Springer, Heidelberg, Germany, Apr. 26–30, 2015, pp. 173–202.
[11] É. Levieil and P.-A. Fouque, "An improved LPN algorithm," in SCN 06: 5th International Conference on Security in Communication Networks, ser. Lecture Notes in Computer Science, R. D. Prisco and M. Yung, Eds., vol. 4116. Maiori, Italy: Springer, Heidelberg, Germany, Sep. 6–8, 2006, pp. 348–359.
[12] P. Kirchner, "Improved generalized birthday attack," Cryptology ePrint Archive, Report 2011/377, 2011, http://eprint.iacr.org/2011/377.
[13] B. Applebaum, D. Cash, C. Peikert, and A. Sahai, "Fast cryptographic primitives and circular-secure encryption based on hard learning problems," in Advances in Cryptology – CRYPTO 2009, ser. Lecture Notes in Computer Science, S. Halevi, Ed., vol. 5677. Santa Barbara, CA, USA: Springer, Heidelberg, Germany, Aug. 16–20, 2009, pp. 595–618.
[14] D. J. Bernstein and T. Lange, "Never trust a bunny," Cryptology ePrint Archive, Report 2012/355, 2012, http://eprint.iacr.org/2012/355.
[15] Q. Guo, T. Johansson, and C. Löndahl, "Solving LPN using covering codes," in Advances in Cryptology – ASIACRYPT 2014, Part I, ser. Lecture Notes in Computer Science, P. Sarkar and T. Iwata, Eds., vol. 8873. Kaoshiung, Taiwan, R.O.C.: Springer, Heidelberg, Germany, Dec. 7–11, 2014, pp. 1–20.
[16] Q. Guo, T. Johansson, and C. Löndahl, "Solving LPN using covering codes," J. Cryptology, vol. 33, no. 1, pp. 1–33, 2020. [Online]. Available: https://doi.org/10.1007/s00145-019-09338-8
[17] B. Zhang, L. Jiao, and M. Wang, "Faster algorithms for solving LPN," in Advances in Cryptology – EUROCRYPT 2016, Part I, ser. Lecture Notes in Computer Science, M. Fischlin and J.-S. Coron, Eds., vol. 9665. Vienna, Austria: Springer, Heidelberg, Germany, May 8–12, 2016, pp. 168–195.
[18] S. Bogos and S. Vaudenay, "Optimization of LPN solving algorithms," in Advances in Cryptology – ASIACRYPT 2016, Part I, ser. Lecture Notes in Computer Science, J. H. Cheon and T. Takagi, Eds., vol. 10031. Hanoi, Vietnam: Springer, Heidelberg, Germany, Dec. 4–8, 2016, pp. 703–728.
[19] S. Bogos, F. Tramèr, and S. Vaudenay, "On solving LPN using BKW and variants - implementation and analysis," Cryptography and Communications, vol. 8, no. 3, pp. 331–369, 2016. [Online]. Available: https://doi.org/10.1007/s12095-015-0149-2
[20] M. R. Albrecht, C. Cid, J.-C. Faugère, R. Fitzpatrick, and L. Perret, "On the complexity of the BKW algorithm on LWE," Designs, Codes and Cryptography, vol. 74, no. 2, pp. 325–354, 2015.
[21] M. R. Albrecht, J.-C. Faugère, R. Fitzpatrick, and L. Perret, "Lazy modulus switching for the BKW algorithm on LWE," in PKC 2014: 17th International Conference on Theory and Practice of Public Key Cryptography, ser. Lecture Notes in Computer Science, H. Krawczyk, Ed., vol. 8383. Buenos Aires, Argentina: Springer, Heidelberg, Germany, Mar. 26–28, 2014, pp. 429–445.
[22] Q. Guo, T. Johansson, and P. Stankovski, "Coded-BKW: Solving LWE using lattice codes," in Advances in Cryptology – CRYPTO 2015, Part I, ser. Lecture Notes in Computer Science, R. Gennaro and M. J. B. Robshaw, Eds., vol. 9215. Santa Barbara, CA, USA: Springer, Heidelberg, Germany, Aug. 16–20, 2015, pp. 23–42.
[23] P. Kirchner and P.-A. Fouque, "An improved BKW algorithm for LWE with applications to cryptography and lattices," in Advances in Cryptology – CRYPTO 2015, Part I, ser. Lecture Notes in Computer Science, R. Gennaro and M. J. B. Robshaw, Eds., vol. 9215. Santa Barbara, CA, USA: Springer, Heidelberg, Germany, Aug. 16–20, 2015, pp. 43–62.
[24] Q. Guo, T. Johansson, E. Mårtensson, and P. Stankovski, "Coded-BKW with sieving," in Advances in Cryptology – ASIACRYPT 2017, Part I, ser. Lecture Notes in Computer Science, T. Takagi and T. Peyrin, Eds., vol. 10624. Hong Kong, China: Springer, Heidelberg, Germany, Dec. 3–7, 2017, pp. 323–346.
[25] A. Esser, R. Kübler, and A. May, "LPN decoded," in Advances in Cryptology – CRYPTO 2017, Part II, ser. Lecture Notes in Computer Science, J. Katz and H. Shacham, Eds., vol. 10402. Santa Barbara, CA, USA: Springer, Heidelberg, Germany, Aug. 20–24, 2017, pp. 486–514.
[26] A. Esser, F. Heuer, R. Kübler, A. May, and C. Sohler, "Dissection-BKW," in Advances in Cryptology – CRYPTO 2018, Part II, ser. Lecture Notes in Computer Science, H. Shacham and A. Boldyreva, Eds., vol. 10992. Santa Barbara, CA, USA: Springer, Heidelberg, Germany, Aug. 19–23, 2018, pp. 638–666.
[27] C. Delaplace, A. Esser, and A. May, "Improved low-memory subset sum and LPN algorithms via multiple collisions," in 17th IMA International Conference on Cryptography and Coding, ser. Lecture Notes in Computer Science, M. Albrecht, Ed., vol. 11929. Oxford, UK: Springer, Heidelberg, Germany, Dec. 16–18, 2019, pp. 178–199.
[28] E. Mårtensson, "The asymptotic complexity of coded-BKW with sieving using increasing reduction factors," in IEEE International Symposium on Information Theory, ISIT 2019, Paris, France, July 7–12, 2019. IEEE, 2019, pp. 2579–2583. [Online]. Available: https://doi.org/10.1109/ISIT.2019.8849218
[29] T. Baignères, P. Junod, and S. Vaudenay, "How far can we go beyond linear cryptanalysis?" in Advances in Cryptology – ASIACRYPT 2004, ser. Lecture Notes in Computer Science, P. J. Lee, Ed., vol. 3329. Jeju Island, Korea: Springer, Heidelberg, Germany, Dec. 5–9, 2004, pp. 432–450.
[30] H. V. Sorensen and C. S. Burrus, "Efficient computation of the DFT with only a subset of input or output points," IEEE Transactions on Signal Processing, vol. 41, no. 3, pp. 1184–1200, 1993.
[31] A. Budroni, E. Mårtensson, and P. Stankovski Wagner, "FBBL - File-Based BKW for LWE," https://github.com/FBBL/fbbl, 2020.
[32] A. Budroni, Q. Guo, T. Johansson, E. Mårtensson, and P. Stankovski Wagner, "Making the BKW algorithm practical for LWE," in Progress in Cryptology – INDOCRYPT 2020, ser. Lecture Notes in Computer Science. Springer, 2020.
[33] "TU Darmstadt Learning with Errors Challenge," https://www.latticechallenge.org/lwe_challenge/challenge.php.
[34] M. R. Albrecht, L. Ducas, G. Herold, E. Kirshanova, E. W. Postlethwaite, and M. Stevens, "The general sieve kernel and new records in lattice reduction," in Advances in Cryptology – EUROCRYPT 2019, Part II, ser. Lecture Notes in Computer Science, Y. Ishai and V. Rijmen, Eds., vol. 11477. Darmstadt, Germany: Springer, Heidelberg, Germany, May 19–23, 2019, pp. 717–746.
APPENDIX A
EXPLAINING THE OPTIMALITY OF THE FFT DISTINGUISHER
Consider a sample of the form (3). By making a guess $\hat{\mathbf{s}}$ we calculate the corresponding error term $\hat{e}$. The Fourier transform of the FFT distinguisher in (8) can now be written as

$$\sum_{j=1}^{m} \theta_q^{\hat{e}_j}. \quad (13)$$

The real part of (13) is equal to

$$\sum_{j=1}^{m} \cos(2\pi \hat{e}_j / q). \quad (14)$$

The FFT distinguisher picks the guess that maximizes (14). Now, let us rewrite (4) for the optimal distinguisher as

$$\sum_{j=1}^{m} \log\left(q \cdot \Pr_{\bar{\Psi}_{\sigma_f,q}}(\hat{e}_j)\right). \quad (15)$$

It turns out that with increasing noise level, the terms in (15) can be approximated as cosine functions with a period of $q$, as illustrated in Figure 3. The terms correspond to $q = 1601$, starting with rounded Gaussian noise with $\alpha = 0.005$, $\sigma = \alpha \cdot q = 8.005$, and taking 12 or 13 steps of plain BKW, respectively. Notice that the approximation gets drastically better with increasing noise level (the approximation is not necessarily the best cosine approximation; it is simply the approximation that matches the largest and the smallest value of the curve). The 13-step picture corresponds to the setting used in most of the experiments in Section VI. For a large-scale problem, the noise level would of course be much larger, resulting in an even better cosine approximation.

Since both distinguishers pick the $\hat{\mathbf{s}}$ that maximizes a sum of cosine functions with the same period, they will pick the same $\hat{\mathbf{s}}$, hence they will perform identically.

(a) Taking 12 plain BKW steps. (b) Taking 13 plain BKW steps.
Fig. 3: Approximating the terms in (15) as cosine functions.
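As a minimal numeric check of this cosine approximation (our own illustration, reusing the hypothetical pr_rounded_gaussian helper sketched in Section IV-A), the cosine below is fitted to the largest and smallest values of the curve, as described above.

```python
# Minimal numeric check of the cosine approximation of the terms in (15).
# Assumes the pr_rounded_gaussian helper from the earlier sketch is in scope.
import numpy as np

q, t = 1601, 13
sigma_f = 2 ** (t / 2) * 0.005 * q                  # noise after 13 plain steps
logp = np.log(q * pr_rounded_gaussian(q, sigma_f))  # terms of (15)
e = np.arange(q) - (q - 1) // 2
hi, lo = logp.max(), logp.min()                     # at e = 0 and e = +-(q-1)/2
cosine = (hi + lo) / 2 + (hi - lo) / 2 * np.cos(2 * np.pi * e / q)
print(np.abs(logp - cosine).max())                  # small when sigma_f is large
```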
APPENDIX B
NUMBER OF ITERATIONS IN THE SIMULATIONS
The following is a collection of lists of the number of iterations used for each point to get the estimations of the median values in Figures 1-2. For each figure and curve we list the number of iterations from left to right, in other words in increasing level of the noise level α or the modulus q.

Figure 1 - Varying α
Simulated FFT:        31 51 52 59 50 52
Simulated Pruned FFT: 33 41 56 35 30 49
Figure 1 - Varying q
Simulated FFT:        100 100 95 80 67 82
Simulated Pruned FFT: 100 100 95 80 67 82
Figure 2 - LF1 vs. LF2
LF1: 33 41 56 35 30 49
LF2: 43 46 69 37 69 50