Quantum-classical reinforcement learning for decoding noisy classical parity information
Daniel K. Park,1,2,∗ Jonghun Park,1,2 and June-Koo Kevin Rhee1,2
1School of Electrical Engineering, KAIST, Daejeon, 34141, Republic of Korea
2ITRC of Quantum Computing for AI, KAIST, Daejeon, 34141, Republic of Korea
Abstract
Learning a hidden parity function from noisy data, known as learning parity with noise (LPN), is an example of intelligent behavior that aims to generalize a concept based on noisy examples. The solution to LPN immediately leads to decoding a random binary linear code in the presence of classification noise. This problem is thought to be intractable classically, but can be solved efficiently if a quantum oracle can be queried. However, in practice, a learner is more likely to receive data from classical oracles. In this work, we show that a naive application of the quantum LPN algorithm to classical data encoded in an equal superposition state requires an exponential sample complexity. We then propose a quantum-classical reinforcement learning algorithm to solve the LPN problem for data generated by a classical oracle and demonstrate a significant reduction in the sample complexity. Simulations with a hidden bit string of length up to 12 show that the quantum-classical reinforcement learning performs better than known classical algorithms when the sample complexity, run time, and robustness to classical noise are collectively considered. Our algorithm is robust to any noise in the quantum circuit that effectively appears as Pauli errors on the final state.

∗ [email protected]

I. INTRODUCTION

Recent discoveries of quantum algorithms for machine learning and artificial intelligence have gained much attention, and stimulated further exploration of quantum technologies for applications in complex data analysis. The type of machine learning considered in this work is related to the ability to construct a general concept based on examples that contain errors. This task can be formulated in the probably approximately correct (PAC) framework [1], in which a learner constructs a hypothesis h with high probability based on a training set of input-output pairs such that h(x) agrees with f(x) on a large fraction of the inputs.
In this context, important metrics for characterizing the learnability are the sample complexity and the time complexity, which correspond to the minimum number of examples required to reach the goal and the run time of the algorithm, respectively.

A famous example of such tasks is the problem of learning a hidden Boolean function that outputs a binary inner product of an input and a hidden bit string of length n by making queries to an oracle that draws an input uniformly at random. This problem can also be tackled by making queries to a quantum oracle that produces all possible input-output pairs in an equal superposition state. Learning the parity function from a noiseless oracle is easy in both classical and quantum cases. When the outcomes from an oracle are altered by noise, learning from a classical oracle becomes intractable while the quantum version remains efficient [2]. This problem is also known as learning parity with noise (LPN). The LPN problem is equivalent to decoding a random linear code in the presence of noise [3], and several cryptographic applications have been suggested based on the hardness of this problem and its generalizations [4, 5]. Furthermore, the robustness of the quantum learning against noise opens up possibilities for achieving a quantum advantage with near-term quantum devices without relying on quantum error correction. However, the existence of a quantum oracle for solving a specific problem is highly hypothetical. In practice, a learner often has to learn from a classical data set. The ability to exhibit the advantage of quantum learning, especially when training examples are classical, remains an interesting and important open problem.

In this work, we show that a naive application of the quantum LPN algorithm to classical data requires an exponential amount of examples (i.e., training samples) or computing resources, thereby nullifying the quantum advantage.
We then propose a quantum-classical hybrid algorithm based on the reinforcement learning framework for solving the LPN problem in the absence of the quantum oracle. The proposed algorithm uses noisy classical samples to prepare an input quantum state that is compatible with the original quantum LPN algorithm. Based on the outcome of the quantum algorithm, a reward is classically evaluated and an action is chosen by a greedy algorithm to update the quantum state in the next learning cycle. Numerical calculations show that the required number of samples and the run time can be significantly reduced. Furthermore, simulations show that in the regime of small n, the quantum-classical hybrid algorithm performs comparably to or better than the classical algorithm that performs best in this regime in terms of the sample complexity. Simulation results also suggest that our algorithm is more robust to noise than the classical algorithms. Our algorithm is expected to improve on the run time of the classical algorithms, provided that an efficient means to update a quantum state with classical data, such as quantum random access memory, exists. Another notable feature of our algorithm is that it is robust to noise that accumulates to depolarizing error on the outcome prior to the measurement.

Our algorithm also fits within the framework of variational quantum algorithms, a classical-quantum hybrid approach that has been developed as a promising avenue to demonstrate the quantum advantage with noisy intermediate-scale quantum (NISQ) devices. Variational quantum algorithms have been used with success to find near-optimal solutions for various important problems in quantum chemistry [6–10], quantum machine learning [11–17], and quantum control [18]. A unique challenge of the problem considered in this work is that the algorithm must find the exact and unique solution, i.e., the hidden bit string, with high probability.
Thus, our work serves as an intriguing example that utilizes the concept of the variational method for finding the exact solution of a problem.

The remainder of the paper is organized as follows. Section II reviews LPN. Section III shows that in the absence of the quantum oracle, the naive application of the quantum algorithm to classical data results in an exponential complexity. In Sec. IV, we present a quantum-classical hybrid algorithm based on reinforcement learning for solving the LPN problem. Numerical calculations in Sec. IV B demonstrate that both the sample and time complexities of the hybrid algorithm are significantly reduced compared to the naive application of the quantum LPN algorithm. Section IV C compares the performance of our algorithm and known classical algorithms via simulations. Section V discusses the resilience to depolarizing errors on the final state, and Section VI concludes.

FIG. 1. A pictorial representation of the LPN problem: on input x, the noisy oracle returns the label f′(x) = x · s ⊕ e mod 2.
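As a concrete illustration of the problem set up above, a classical noisy parity oracle can be sketched in a few lines of Python. This is our own minimal sketch (function and variable names are not from the paper):

```python
import random

def lpn_oracle(s, eta, rng=random):
    """Return one noisy LPN example (x, x.s XOR e mod 2).

    s   : hidden bit string, a sequence of 0/1 of length n
    eta : label-flip probability, 0 <= eta < 1/2
    """
    n = len(s)
    x = [rng.randint(0, 1) for _ in range(n)]          # uniformly random input
    parity = sum(xi * si for xi, si in zip(x, s)) % 2  # x . s mod 2
    e = 1 if rng.random() < eta else 0                 # Bernoulli(eta) noise bit
    return x, parity ^ e
```

With eta = 0 the label is always the true parity x · s mod 2; for 0 < eta < 1/2, a fraction eta of the labels is flipped on average, which is exactly the classification noise that makes the learning problem hard classically.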
II. LEARNING PARITY WITH NOISE
The goal of the parity learning problem is to find a hidden bit string s ∈ {0, 1}^n by making queries to an example oracle that returns a training data pair consisting of a uniformly random input x ∈ {0, 1}^n and an output of a Boolean function,

f(x) = x · s mod 2. (1)

A noisy oracle outputs (x, f(x) ⊕ e), where e ∈ {0, 1} has the Bernoulli distribution with parameter η, i.e., P(e = 1) = η, and η < 1/2. A quantum oracle instead returns an equal superposition of |x⟩|f(x)⟩ for all possible inputs x. By applying Hadamard gates to all qubits at the query output, the learner acquires an entangled state

(1/√2)(|0⟩^⊗n |0⟩ + |s⟩|1⟩). (2)

Thus, whenever the label (last) qubit is 1 (which occurs with probability 1/2), measuring the n query register qubits in their computational bases reveals s. Note that this algorithm is very similar to the Bernstein-Vazirani (BV) algorithm [21], except that in the BV problem the learner can choose an example in each query and the input state of the label qubit is prepared in |−⟩. In the quantum case, since all example data are queried in superposition, the ability to choose an example is irrelevant. On the other hand, the quantum LPN algorithm requires the extra post-selection step since the input of the label qubit is prepared in |0⟩.

A noisy quantum oracle can be modeled with the local depolarizing channel D_η(ρ) = (1 − η)ρ + η 1l/2 acting independently on all qubits at the oracle's output with a known constant noise rate of η < 1/2. The quantum circuit for solving the LPN problem is depicted in Fig. 2. In this example, s is 101...0, and it is encoded via a series of controlled-NOT (c-NOT) gates targeting the result qubit controlled by the data qubits. The shaded area in the figure represents the quantum oracle whose structure is hidden from the learner.

FIG. 2. The quantum circuit for learning parity with noise introduced in Ref. [2]. Hadamard operations (H) prepare the equal superposition of all possible input states. The dotted box represents the quantum oracle that encodes the hidden parity function, and is realized using controlled-NOT gates between the query register (control) and label (target) qubits. The hidden bit string in this example is s = 101...0. Before measurement, all qubits experience independent depolarizing noise denoted by D_η with a noise rate η < 1/2.

Learning the hidden parity function from noiseless examples is efficient for both classical and quantum oracles. However, in the presence of noise, the best-known classical algorithms have superpolynomial complexities [3, 19, 20, 22], while the quantum learning based on the bit-wise majority vote remains efficient [2]. The query and time complexities of the LPN problem for classical and quantum oracles are summarized in Tab. I.
Reference                            Oracle     Queries (samples)   Time
Angluin & Laird (AL) [19]            Classical  O(n)                O(2^n)
Blum, Kalai & Wasserman (BKW) [20]   Classical  2^O(n/log n)        2^O(n/log n)
Lyubashevsky (L) [3]                 Classical  O(n^(1+ε))          2^O(n/log log n)
Cross, Smith & Smolin (CSS) [2]      Quantum    O(log n)            O(n)

TABLE I. Summary of the query (or sample) and time complexities of various LPN algorithms reported in previous works.

The advantage of having a quantum oracle for solving an LPN problem was demonstrated experimentally with superconducting qubits in Ref. [23]. Furthermore, a quantum advantage can be demonstrated even when all query register qubits are fully depolarized by using deterministic quantum computation with one qubit [24, 25].

In the following sections, we discuss quantum techniques to solve the LPN problem in the absence of the quantum oracle. The general strategy considered in this work is to prepare a specific quantum state based on M classical noisy training samples, and to apply the measurement scheme developed in Ref. [2]. The measurement outcome of the query qubits in the computational basis, ˜s_M, yields the hypothesis function. The goal of our algorithms is to minimize M with which the probability to guess the correct hidden bit string is greater than 2/3, i.e.,

P(˜s_M = s | M) = γ, γ > 2/3. (3)

Then, by repeating the algorithm a constant number of times and taking a majority vote, s can be found with high probability. Note that this is not a strictly necessary condition, as the majority vote can find the correct answer efficiently with high probability for γ ≥ 1/2 + 1/δ as long as δ is at most poly(n), since the algorithm is to be repeated O(δ²) times. However, Eq. (3) is a sufficient condition to solve the problem.

III. NAIVE APPLICATION OF QUANTUM ALGORITHM TO CLASSICAL DATA

A. Learning from a sparse set of training samples
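As a numerical sanity check of the sparse-sample scheme analyzed in this subsection, the post-selected state can be simulated classically for small n by brute-force amplitude bookkeeping. The sketch below is our own illustration (names and structure are assumptions, not from the paper); for M distinct noiseless samples it returns exactly M/2^n:

```python
import itertools

def naive_success_prob(n, s, samples):
    """Probability of measuring the hidden string s when M classical samples
    are encoded in an equal superposition, Hadamards are applied to all
    qubits, and the label qubit is post-selected on outcome 1.
    The unnormalized amplitude of candidate z is sum_j (-1)^(x_j.z + label_j)."""
    weights = {}
    for z in itertools.product((0, 1), repeat=n):
        amp = sum((-1) ** ((sum(a * b for a, b in zip(x, z)) + y) % 2)
                  for x, y in samples)
        weights[z] = amp * amp
    total = sum(weights.values())
    return weights[tuple(s)] / total
```

For eta = 0 and M distinct inputs, the amplitude on s is M/√(M 2^n), so the success probability is exactly M/2^n, i.e., the learner must collect an exponential number of samples before the correct string dominates the measurement statistics.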
Given M < 2^n examples of data, (x_j, f′(x_j)), where j = 1, ..., M, a naive way to apply the quantum LPN algorithm is to create a quantum state,

|Ψ⟩ = (1/√M) Σ_{j=1}^{M} |x_j⟩|f′(x_j)⟩, (4)

and treat it as the output of the quantum oracle. Then, as in the quantum LPN algorithm, single-qubit Hadamard gates are applied to all qubits and the label qubit is measured in the computational basis. The measurement outcome of 1, which occurs with probability 1/2, is post-selected to leave the query register qubits in the state

|ψ⟩ = (1/√(M 2^n)) Σ_{j=1}^{M} [(−1)^{e_j} |s⟩ + Σ_{y≠0^n} (−1)^{x_j·y ⊕ e_j} |s ⊕ y⟩]. (5)

From the above state, the probability to guess s correctly is

P(˜s_M = s) = M(1 − 2η)² / 2^n. (6)

However, this result also implies that even when the quantum oracle outputs only a fraction of all possible examples as an equal superposition state, and the noise does not act coherently on all x_j as in Refs. [2, 23], the LPN problem can still be solved. As a side note, since x_j is drawn uniformly at random, P(x_j·y = 1 | y ≠ 0^n) = P(x_j·y = 0 | y ≠ 0^n) = 1/2 and E[Σ_{j}^{M} (−1)^{x_j·y} | y ≠ 0^n] = 0. Thus, the probability to measure an incorrect bit string s ⊕ y (see the second term in Eq. (5)) depends on the error distribution that determines e_j. For instance, the worst case occurs when the error bits e_j coincide with x_j·y ∀ j. In this case, the probability to obtain the incorrect answer is P(˜s_M = s ⊕ y) = M/2^n ≥ P(˜s_M = s). Thus, in the worst-case scenario, the naive application of the quantum algorithm to a limited number of classical samples cannot solve the LPN problem.

B. Data span
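The central quantitative point of this subsection — that XOR-ing d independent noisy labels amplifies the label error rate to (1 − (1 − 2η)^d)/2 — is easy to verify numerically. A minimal sketch (ours, not the paper's):

```python
from math import comb

def spanned_error_rate(d, eta):
    """Error probability of a parity bit formed by XOR-ing d independent
    labels, each flipped with probability eta (closed form)."""
    return (1.0 - (1.0 - 2.0 * eta) ** d) / 2.0

def spanned_error_rate_exact(d, eta):
    """Same quantity by direct enumeration: the XOR is wrong exactly when an
    odd number of the d labels is flipped."""
    return sum(comb(d, k) * eta ** k * (1 - eta) ** (d - k)
               for k in range(1, d + 1, 2))
```

For d = 2 both expressions reduce to 2η(1 − η), and for any 0 < η < 1/2 the rate increases monotonically with d toward 1/2, which is why the data span always degrades the label quality.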
In order to reduce the number of queries, linearly independent data can be combined to generate artificial data, which can be used in creating a state of the form of Eq. (4). However, this not only requires classical pre-processing, but can also increase the error probability of the generated parity bit. For example, from two data pairs, (x_1, f′(x_1)) and (x_2, f′(x_2)), with linearly independent inputs, a new data pair (x_3 = x_1 ⊕ x_2, f′(x_3) = f′(x_1) ⊕ f′(x_2)) can be created. Since f′(x_1) ⊕ f′(x_2) = (x_1 ⊕ x_2)·s ⊕ (e_1 ⊕ e_2) mod 2, the error probability of the artificial parity bit is

η′ = P(e_1 ⊕ e_2 = 1) = P(e_1 = 1)P(e_2 = 0) + P(e_1 = 0)P(e_2 = 1) = 2η(1 − η). (7)

The above equation can be generalized for a new data pair generated from d linearly independent data as follows:

η′ = P(Σ_{i=1}^{d} e_i mod 2 = 1) = Σ_{odd j} C(d, j) η^j (1 − η)^{d−j} = [1 − (1 − 2η)^d]/2. (8)

For d > 1 and 0 < η < 1/2, [1 − (1 − 2η)^d]/2 > η. Therefore, the data span always increases the error rate in addition to increasing the time complexity for pre-processing the data. However, when η is sufficiently small so that η′ also remains reasonably small, one may consider using the data span trick in order to reduce the sample complexity.

C. Generation of artificial data with a parity guess function

The next strategy we consider is to generate the missing data by guessing the parity function, and to design an iterative algorithm that improves the accuracy of the guess. In this approach, the rate of accuracy improvement with respect to the number of queries determines the efficiency of the algorithm.

A brief description of the iterative LPN (I-LPN) algorithm is as follows. First, all 2^n examples are provided as an equal quantum superposition state using M real data and 2^n − M artificial data generated by a parity guess function.
The quantum state can be prepared by guessing the quantum oracle of the quantum LPN algorithm, and inserting the output state of the oracle as an input to a quantum random access memory (QRAM) [26–30] to update its entries according to the real data. The circuit-based QRAM introduced in Ref. [30] can use flip-register-flop processes to update an output of a guessed quantum oracle with real data using a number of steps that increases at least linearly with the number of samples. Then the usual quantum LPN protocol, which consists of applying Hadamard gates, projective measurements, and the post-selection, outputs an n-bit string in the register qubits. This string is used to construct a new parity guess function in the next iteration, for which a new sample is also acquired. The learner can also repeat the measurement procedure for guessing a new parity function without querying a new sample. This repetition is referred to as an epoch. A more detailed description of the algorithm is given in the steps below.

Algorithm 1 Iterative LPN (I-LPN)
1: Make an initial guess of s as ˜s_0 = 0^n
2: for m = 1 to M do
3:   Collect (x_m, f′(x_m))
4:   if x_m = 0^n then set f′(x_m) = 0
5:   for i = 1 to the number of epochs do
6:     Use ˜s_{m−1} to implement a quantum oracle and prepare (1/√2^n) Σ_{j=1}^{2^n} |x_j⟩|g(x_j)⟩, where g(x_j) = x_j·˜s_{m−1} mod 2
7:     Update the above state as |Ψ⟩ = (1/√2^n)(Σ_{j=1}^{m} |x_j⟩|f′(x_j)⟩ + Σ_{j=m+1}^{2^n} |x_j⟩|g(x_j)⟩)
8:     Apply Hadamard gates on all qubits
9:     Measure the label qubit in the computational basis and post-select the state with the measurement outcome of |1⟩
10:    Measure the query registers of the post-selected state in the computational basis
11:    Set ˜s_{m−1} to the measured bit string
12:  Set ˜s_m = ˜s_{m−1}

Now we analyze the performance of I-LPN. In the iterative algorithm, the time complexity is dominated by the state preparation step. Since the quantum oracle implementation and the QRAM process given M training samples require O(n) and O(M) run times, respectively, we focus on the estimation of the sample complexity. The I-LPN algorithm uses the aforementioned procedure to prepare a quantum state

|Ψ⟩ = (1/√2^n) Σ_{j=1}^{2^n} |x_j⟩|h(x_j)⟩, (9)

where

h(x_j) = f′(x_j) = x_j·s ⊕ e_j mod 2 if j ≤ M, and h(x_j) = g(x_j) = x_j·˜s_{M−1} mod 2 if j > M, (10)

and ˜s_{M−1} is the parity guess function from the previous round. Now let ε_j be a Bernoulli random variable that is 0 if h(x_j) = f(x_j) and 1 otherwise. In other words, the weight of the string defined as ε := ε_1 ε_2 ... ε_{2^n}, denoted by w(ε), is the number of different bits between h_{1:2^n} and f_{1:2^n}, where □_{i:j} denotes a binary string □(x_i)□(x_{i+1})...□(x_j). With this definition, we can write

h(x_j) = x_j·s ⊕ ε_j, (11)

where

ε_j = e_j if j ≤ M, and ε_j = x_j·(s ⊕ ˜s_{M−1}) if j > M. (12)

The post-selected state is

|ψ⟩ = (1/2^n) Σ_{j=1}^{2^n} [(−1)^{ε_j} |s⟩ + Σ_{y≠0^n} (−1)^{x_j·y ⊕ ε_j} |s ⊕ y⟩]. (13)

The probability to obtain s from the projective measurement in the computational basis is

P_M := P(˜s_M = s) = (1 − 2r(M))², (14)

where r(M) = w(ε)/2^n is the error probability in estimating f from h given M noisy data. From Eq. (12), one can see that if ˜s_{M−1} = s, then ε_j = 0 ∀ j > M, and only the errors in the real samples yield non-zero values in ε. However, if ˜s_{M−1} ≠ s, then since x_j is chosen uniformly at random, ε_j = 0 for 1/2 of the set of inputs x_j for j > M on average. With this, the expectation value of the weight of ε can be calculated as

E(w(ε)) = Mη if ˜s_{M−1} = s, and E(w(ε)) = Mη + (2^n − M)/2 if ˜s_{M−1} ≠ s. (15)

We start the algorithm with an initial sample (x_1, f′(x_1)) and the initial guess ˜s_0 = 0^n. Then the initial error probability is

r(1) = (1/2^n)(η/2^n) + (1 − 1/2^n)[η + (2^n − 1)/2]/2^n. (16)

The first term takes into consideration that ˜s_0 = 0^n is the right answer with probability 1/2^n; in this case, only the real data can carry incorrect parity bits, with probability η. The second term indicates that when ˜s_0 is incorrect, 1/2 of the artificial samples carry incorrect parity bits on average. The error probability at subsequent rounds can be written recursively as

r(2) = P_1 (2η/2^n) + (1 − P_1)[2η + (2^n − 2)/2]/2^n,
r(3) = P_2 (3η/2^n) + (1 − P_2)[3η + (2^n − 3)/2]/2^n,
...
r(M) = P_{M−1} (Mη/2^n) + (1 − P_{M−1})[Mη + (2^n − M)/2]/2^n. (17)

FIG. 3. Success probability of the iterative LPN algorithm with respect to the number of samples, in which the parity guess function is obtained from the quantum LPN circuit and used in the subsequent round. Curves correspond to (n, η) = (5, 0), (10, 0), (25, 0), (25, 0.05), and (25, 0.1); the success probability decreases as n increases, and the horizontal line marks the success probability of 2/3.
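The recursion in Eqs. (16) and (17) is straightforward to evaluate numerically. A minimal transcription (our own sketch) makes the exponential sample cost of I-LPN explicit: for η = 0 the success probability reaches unity only when M approaches 2^n:

```python
def ilpn_success_probs(n, eta, M_max):
    """Iterate r(M) of Eqs. (16)-(17) and return P_M = (1 - 2 r(M))^2
    for M = 1..M_max."""
    N = 2 ** n
    # Eq. (16): initial error probability with one real sample and guess 0^n
    r = (1 / N) * (eta / N) + (1 - 1 / N) * (eta + (N - 1) / 2) / N
    probs = {1: (1 - 2 * r) ** 2}
    for M in range(2, M_max + 1):
        P_prev = probs[M - 1]
        # Eq. (17): a correct guess leaves only real-sample errors, while a
        # wrong guess also mislabels half of the 2^n - M artificial samples
        r = P_prev * (M * eta / N) + (1 - P_prev) * (M * eta + (N - M) / 2) / N
        probs[M] = (1 - 2 * r) ** 2
    return probs
```

Even with η = 0, P_M stays small until M is a sizable fraction of 2^n, in line with the conclusion below that the sample complexity of I-LPN is exponential in n.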
For brevity, we denote ˜η_1 = Mη/2^n and ˜η_2 = [Mη + (2^n − M)/2]/2^n throughout the manuscript. We plot the success probability as a function of the number of samples for various values of n and η in Fig. 3. If one increases the number of epochs, the fidelity curve simply converges faster to the limiting curve obtained for an infinite number of epochs. Thus, the number of samples needed to achieve the desired success probability is exponential in n.

In the following section, we show that the introduction of a simple policy for updating the parity guess function, as done in reinforcement learning, significantly enhances the learning performance.

IV. REINFORCEMENT LEARNING

A. Greedy algorithm
To improve the performance of the iterative algorithm introduced in the previous section, we use the concepts of reinforcement learning, such as state, reward, policy, and action. The key addition to the previous iterative algorithm is the use of a greedy algorithm, which always exploits current knowledge to maximize the immediate reward, as the policy for making an action. We refer to this algorithm as reinforcement-learning parity with noise (R-LPN).

The underlying idea of R-LPN can be described as follows. The state in each iteration is the guessed bit string after performing the usual quantum LPN algorithm. The reward is determined by the Hamming distance between the parity bits generated by the guess and the parity bits of the real data. At the Mth query, the learner obtains M guessed bit strings as well as M reward values. The greedy algorithm then selects the guessed bit string that maximizes the reward, and uses it to construct the guessed quantum oracle. Our algorithm can be viewed as a variational quantum algorithm, as the guessed quantum oracle can be parameterized with controlled-NOT gates and is updated in each iteration. The detailed description of the R-LPN algorithm is provided below, and a schematic representation of the algorithm is shown in Fig. 4.

Algorithm 2 Reinforcement LPN (R-LPN)
1: Make an initial guess of s as ˜s_0 = 0^n
2: for m = 1 to M do
3:   Collect (x_m, f′(x_m))
4:   if x_m = 0^n then set f′(x_m) = 0
5:   for i = 1 to the number of epochs do
6:     Use ˜s_{m−1} to implement the oracle in the quantum LPN algorithm and prepare (1/√2^n) Σ_{j=1}^{2^n} |x_j⟩|g(x_j)⟩, where g(x_j) = x_j·˜s_{m−1} mod 2
7:     Create a state |Ψ⟩ = (1/√2^n)(Σ_{j=1}^{m} |x_j⟩|f′(x_j)⟩ + Σ_{j=m+1}^{2^n} |x_j⟩|g(x_j)⟩)
8:     Apply Hadamard gates on all qubits
9:     Measure the label qubit in the computational basis and post-select the state with the measurement outcome of |1⟩
10:    Measure the query registers of the post-selected state in the computational basis
11:    Set ˜s_m to the measured bit string
12:    Generate m sets of guessed parity bits g^(j)_{1:m}, 1 ≤ j ≤ m, using ˜s_1, ..., ˜s_m
13:    Calculate the Hamming distance d_H(g^(j)_{1:m}, f′_{1:m}) ∀ 1 ≤ j ≤ m
14:    Set ˜s_{m−1} = argmin_{1≤j≤m} d_H(g^(j)_{1:m}, f′_{1:m})
15:  Set ˜s_m = ˜s_{m−1}

FIG. 4. Schematic of the quantum-classical hybrid algorithm for solving the learning parity with noise problem, explained using the terminology of reinforcement learning. State preparation encodes the M true data pairs (x_M, f′(x_M)) together with artificial data from the guessed oracle U_g in |Ψ⟩ = (1/√2^n)(Σ_{j=1}^{M} |x_j⟩|f′(x_j)⟩ + Σ_{j′=M+1}^{2^n} |x_j′⟩|g_M(x_j′)⟩). After the Hadamard layer, post-selection on the label outcome 1, and measurement, the measured string ˜s_M (the state) is assigned the reward r(j) = 1 − d_H(g^(j)_{1:M}, f′_{1:M}); the policy π: ˜s_M = argmax_{j≤M} r(j) determines the action of updating U_g.
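The classical part of the greedy policy — generating parity bits from each candidate string, scoring them against the observed labels, and keeping the best candidate — amounts to a few lines of code. The following is our own illustrative sketch (names are not from the paper):

```python
def greedy_select(candidates, data):
    """Greedy policy of R-LPN (classical part): pick the candidate bit string
    whose generated parities best match the observed labels.

    candidates: list of guessed bit strings (tuples of 0/1), e.g. s~_1..s~_m
    data:       list of observed examples (x, f'(x))
    Returns the candidate minimizing the Hamming distance between generated
    and observed parity bits (equivalently, maximizing the reward)."""
    def hamming(cand):
        return sum((sum(ci * xi for ci, xi in zip(cand, x)) % 2) != y
                   for x, y in data)
    return min(candidates, key=hamming)
```

With noiseless data and enough linearly independent examples, the true s is the unique candidate at Hamming distance zero, so the policy returns it whenever it appears among the measured strings.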
B. Numerical analysis
We analyze the performance of R-LPN by numerically calculating the error probability, i.e., the probability to measure ˜s_M ≠ s in the round with M samples, similar to the recursive calculation shown in Sec. III C. The algorithm uses the parity guess functions from all measurements up to the present round, i.e., ˜s_1, ..., ˜s_M. To construct the recursive formula, we consider two situations. First, the set of parity guess functions does not contain the answer, i.e., s ∉ {˜s_1, ..., ˜s_M}. This occurs with the probability p_M = Π_{j=1}^{M−1} (1 − (1 − 2r(j))²), where r(j) is the error probability at the jth round, and (1 − 2r(j))² corresponds to the probability to obtain s (see Eq. (14)). When the parity guess function is wrong, the probability to measure the wrong hidden bit string in the given round is ˜η_2, as explained in the previous section.

When s ∈ {˜s_1, ..., ˜s_M}, we further consider two situations. First, the M parity bits in the training examples are error-free, which occurs with a probability of (1 − η)^M. In this case, given M linearly independent examples, which can be produced with the probability Π_{k=0}^{M−1} (1 − 2^{k−M}) > 1/4 for M > 1, there are ⌈2^{n−M}⌉ choices out of all possible parity guess functions ˜s_M ∈ {0, 1}^n that can generate the same parity bit string as f_{1:M}. Note that when x_j is uniformly zero, then f(x_j) = 0 for any s. Thus, we exclude this example when calculating the Hamming distance between the guessed parity bits and the true parity bits. We define c_M = (⌈2^{n−M}⌉ − 1)/(2^n − 1) as the probability to pick a wrong parity guess function that produces the same parity bits as s. Since the selection is made among M candidates rather than all 2^n − M wrong parity guess functions, the probability to obtain an incorrect parity function is actually less than c_M; however, we use c_M to make a reasonable estimation. The incorrectly guessed parity function produces (2^n − M)/2 wrong parity bits on average. If the M-bit parity string, f′_{1:M}, contains errors, then the probability to obtain a wrong parity guess function can be estimated as β_M = Σ_{k=1}^{⌊Mη⌉} C(M, k) c_M. This means that, for simplicity, there are up to ⌊Mη⌉ (the nearest integer to Mη) errors in the true parity bit string.

Combining all the cases considered above, the error probability — the probability to obtain an incorrect parity guess function in the round with M examples — can be estimated as

r(M) = (1 − p_{M−1}) [ (1 − η)^M c_M ((2^n − M)/2)/2^n + (1 − (1 − η)^M)(˜η_1(1 − β_M) + ˜η_2 β_M) ] + p_{M−1} ˜η_2, (18)

where the initial error probability, r(1), is given in Eq. (16).

In the R-LPN algorithm, the time complexity is again dominated by state preparation, for which the number of steps increases at least linearly with the number of samples, as mentioned in the previous section. The computation time for calculating the Hamming distance between M guessed parity bit strings and the actual parity bit string is O(nM). Since these computation times depend on M, we focus on estimating the sample cost. Using the above equation, the number of samples required for achieving P(˜s_M = s) > 2/3, denoted by M_{2/3}, can be calculated numerically. Figure 5 shows M_{2/3} as a function of n for several values of the error probability, η = {0, 0.1, 0.2}. For each error rate, the number of epochs is given as 1, n, and n². When n = 40, there are 2^40 ≈ 10^12 possibilities for s, but even in the presence of a relatively high error probability of 20%, a number of examples many orders of magnitude smaller than 2^40 suffices to solve the problem. Figure 5 also suggests that the number of samples can be further reduced by increasing the number of epochs.

FIG. 5. Number of samples required for achieving P(˜s_M = s) > 2/3, denoted by M_{2/3}, as a function of the length of the hidden bit string, n, for various error rates, η = {0, 0.1, 0.2}. For each error rate, the number of epochs is also varied among 1, n, and n². The number of samples needed increases (decreases) as the error rate (number of epochs) increases.

We compare M_{2/3} of I-LPN and R-LPN as a function of n for several values of the error probability, η = {0, 0.1, 0.2}, in Fig. 6. In this comparison, the number of epochs is n. The result shows that R-LPN reduces the sample complexity by several orders of magnitude already when n is only 15 or so, and this improvement continues to grow as n increases. When n is about 15 to 30, the curves qualitatively suggest that R-LPN enhances the sample complexity exponentially in n. However, our analysis does not provide a definitive conclusion about the rate of improvement in the asymptotic limit.

C. Simulation
We use simulations to verify the performance of the R-LPN algorithm, and to compare it to the known classical methods listed in Tab. I. Each iteration starts with the quantum state of the form shown in step 7 of Alg. 2, using classical data that are provided uniformly at random. The simulation then proceeds by following the subsequent steps in Alg. 2. For a fixed value of s, all simulations are repeated 200 times to average over the set of examples drawn uniformly at random.
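For small n, the quantum part of each R-LPN iteration can be simulated classically by computing the post-measurement distribution over candidate strings directly from the 2^n amplitudes. The sketch below is our own minimal version of the modified ("Markov") variant discussed later, under the stated assumptions; the helper `measurement_distribution` and all other names are ours, not the paper's:

```python
import itertools, random

def measurement_distribution(n, h):
    """Probability p(z) of measuring the query register in |z> after the
    Hadamard layer and post-selection of the label qubit on 1, for a label
    function h: {0,1}^n -> {0,1}.
    p(z) is proportional to |sum_x (-1)^(x.z XOR h(x))|^2."""
    xs = list(itertools.product((0, 1), repeat=n))
    weights = {}
    for z in xs:
        amp = sum((-1) ** ((sum(a * b for a, b in zip(x, z)) + h(x)) % 2)
                  for x in xs)
        weights[z] = amp * amp
    total = sum(weights.values())
    return {z: w / total for z, w in weights.items()}

def rlpn_markov(n, s, eta, M, rng):
    """Modified ('Markov') R-LPN sketch: after each new noisy sample, simulate
    one quantum measurement, then greedily keep whichever of the new and
    previous guesses better matches the observed labels."""
    data = {}                    # observed examples: x -> noisy label
    guess = (0,) * n             # initial guess s~_0 = 0^n
    for _ in range(M):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        y = (sum(a * b for a, b in zip(x, s)) + (rng.random() < eta)) % 2
        if x == (0,) * n:
            y = 0                # the all-zero input carries no information
        data[x] = y
        # state preparation: true data patched over the guessed parity function
        h = lambda xx: data[xx] if xx in data else sum(
            a * b for a, b in zip(xx, guess)) % 2
        p = measurement_distribution(n, h)
        zs = list(p)
        new = rng.choices(zs, weights=[p[z] for z in zs])[0]
        dist = lambda c: sum((sum(a * b for a, b in zip(c, xx)) % 2) != yy
                             for xx, yy in data.items())
        guess = min((new, guess), key=dist)  # greedy reward comparison
    return guess
```

When the label function agrees with the true parity on all inputs (e.g., η = 0 and full data coverage), the distribution concentrates entirely on s, so the measurement returns the hidden string deterministically.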
FIG. 6. Ratio between the numbers of samples required for achieving P(˜s_M = s) > 2/3 in I-LPN, denoted by M^I_{2/3}, and in R-LPN, denoted by M^R_{2/3}, as a function of the length of the hidden bit string, n, for various error rates, η = {0, 0.1, 0.2}. For all calculations, the number of epochs is n.
1. Data filtering
All simulations used an additional pre-processing step, which we refer to as data filtering ,as an optional attempt to filter out erroneous examples and improve the success probability.Data filtering counts the number of occurrence of example pairs, ( x j , f (cid:48) ( x j )), denoted by o j .Then, an example with a label k that appears less than some fraction of max j ( o j ) times (i.e., o k < w max j ( o j )/2) is discard. An intuitive motivation behind this procedure is that sinceexamples are randomly drawn from a uniform distribution, the same data can be drawnmultiple times and erroneous samples are less frequently queried than correct samples for η < /
2. The optimal choice of the filtering coefficient, w , depends on the error rate. Forexample, when η = 0, such data filtering is not desired since all data are error-free. However,in our simulations, we assume that η is unknown, and we used w = 0 . (cid:112) d H ( g M M , f (cid:48) M )+0 . η . This choicemeans that the level of data filtering increases as the number of samples, and hence thesuccess probability, increases. 16 . Results We first simulate an R-LPN algorithm with a slight modification, which is intended tosave the memory and time cost for storing all M sets of guessed parity bits to calculatetheir Hamming distances with respect to the real parity bits. Namely, only the guessedparity bits from the previous iteration is kept in the memory, and the greedy algorithmupdates the parity guess function for the next round by only comparing the rewards givenby the present and the previous guesses. We compare the performance of this modifiedalgorithm to the originally proposed R-LPN by analyzing the success probability with respectto the number of samples via simulations. The simulation results are depicted in Fig. 7 for n = 6, 7, and 8 as an example, and show that the modified R-LPN does not introduce anyconsiderable change in the sample complexity, especially around the region for P (˜ s M = s ) =2 /
3. Since the modified R-LPN algorithm uses only the current state for making an actionin the subsequent round, the simulation results of this algorithm are denoted by
Markov inthe figure legend. Hereinafter, all simulations use the modified R-LPN, since it performssimilarly to the original version in terms of the sample complexity while the memory andtime cost for calculating the rewards does not increase with M .Figure 8 shows the simulation results of the number of training samples required forsucceeding various LPN algorithms as a function of n for several values of the error rate, η . The number of epoch is n in all simulations in this figure. The simulation results showthat the R-LPN algorithm performs better than the known classical algorithms, AL [19]and BKW [20] (see Tab. I), in the regime of n ≤
12. In this regime, the algorithm by Lyubashevsky (denoted by L) [3] consumes the least number of samples among the classical methods. When η = 0, R-LPN appears to perform slightly worse than L. However, R-LPN becomes advantageous for learning in the presence of noise, especially as n increases. As summarized in Tab. I, the run time of L is subexponentially greater than its sample complexity. Moreover, the run time of BKW is comparable to its sample complexity, and the run time of AL increases exponentially with respect to n. In contrast, the run time of R-LPN is expected to be comparable to its sample complexity. Thus, we expect the R-LPN algorithm to provide faster learning than the classical algorithms.

The R-LPN algorithm is also more resilient to noise, as demonstrated in Fig. 9. The simulation results show the number of samples needed for the aforementioned LPN algorithms to succeed as a function of the error rate, η, for various n. The number of epochs in all simulations in this figure is n. The R-LPN algorithm requires fewer samples than AL and BKW in all simulated instances. The R-LPN and L algorithms perform similarly for small n, but one can see that R-LPN prevails as n increases to 10. From this trend, we speculate that the advantage of R-LPN over the classical algorithms in terms of robustness to classical noise can become greater as the problem size increases. Note that by increasing the number of epochs, the number of required samples can be further reduced, at the cost of increasing the run time.

FIG. 7. The plots show the probability of finding a hidden bit string s in the R-LPN algorithms as a function of the normalized number of samples, M/n. Dotted lines represent the simulation results of the R-LPN algorithm described in Alg. 2, and solid lines represent the simulation results of the modified R-LPN algorithm that keeps only the parity guess function from the previous round, labelled as "Markov" in the legend. Simulations are performed for (a) η = 0, (b) η = 0.1, and (c) η = 0.2, and for n = 6, 7, and 8. The number of epochs is 30 in all simulations in this figure.

ε-greedy Algorithm

We also tested an ε-greedy algorithm as the policy for making an action via simulation with n ≤
12. Here, the s̃_M that maximizes the reward is used to query the quantum oracle with probability 1 − ε, and a randomly guessed n-bit string is chosen as s̃_M with probability ε. The simulation shows that the ε-greedy algorithm does not provide any noticeable improvement.

FIG. 8. Simulation results for the number of samples required for an LPN algorithm to succeed as a function of n for (a) η = 0, (b) η = 0.1, and (c) η = 0.2. Curves without symbols represent the simulation results for the R-LPN algorithm of this work. Simulation results of the known classical methods listed in Table I are also plotted, indicated by squares for AL, triangles for BKW, and circles for L. The number of epochs is n in all simulations in this figure.

V. ROBUSTNESS TO PAULI ERRORS
Without loss of generality, we assume that the eigenstates of the σ_z operator constitute the computational basis. Then, since the R-LPN algorithm performs the measurement in the σ_z basis, it is not affected by any error that effectively appears as an unwanted phase rotation at the end of the quantum circuit.

According to Ref. [2], when independent bit-flip errors occur on the final state with a probability η_x, a bit-wise majority vote on k post-selected bit strings gives an estimate ŝ such that the error can be bounded as P(ŝ ≠ s) < n exp(−k O(poly(1/2 − η_x))). When the R-LPN algorithm is completed, it outputs the same final state as in Ref. [2] with high probability. Hence, the above result can be directly applied to our algorithm for bit-flip errors on the final state. In this case, the algorithm needs to perform the bit-wise majority vote at each cycle of querying a sample, increasing the total run time. Therefore, quantum noise in the R-LPN algorithm that effectively accumulates on the final state as bit-flip errors with an error rate of η_x < 1/2 can be tolerated at the cost of increasing the run time by a factor of O(log(n) poly(1/(1/2 − η_x))), while the sample complexity remains the same.

FIG. 9. Simulation results for the number of samples required for an LPN algorithm to succeed as a function of η for (a) n = 6, (b) n = 8, and (c) n = 10. Curves without symbols represent the simulation results for the R-LPN algorithm. Simulation results of the known classical methods listed in Tab. I are also plotted, indicated by squares for AL, triangles for BKW, and circles for L. The number of epochs is n in all simulations in this figure.

VI. CONCLUSION
The quantum speed-up in the learning parity with noise problem diminishes in the absence of the quantum oracle that, upon a query, provides a quantum state encoding all possible examples in superposition. We developed a quantum-classical hybrid algorithm for solving the LPN problem with classical examples. The LPN problem is particularly challenging as it requires the exact solution to be found. Our work demonstrates that the concept of variational quantum algorithms can be extended to solving such problems. The reinforcement learning significantly reduces both the sample and the time cost of the quantum LPN algorithm in the absence of the quantum oracle. Simulations in the regime of small problem size, i.e., n ≤ 12, show that our algorithm performs comparably to or better than the classical algorithm that performs best in this regime in terms of the sample complexity. The sample cost can be further reduced by increasing the number of epochs, at the cost of increasing the run time. In terms of vulnerability to noise, our algorithm performs better than the classical algorithms in this regime. Furthermore, the time complexity can be reduced substantially if an efficient procedure for updating the quantum state is available.

The ability to utilize quantum mechanical properties to enhance existing classical methods for learning from classical data is a significant milestone towards practical quantum learning. In particular, whether the known advantage of oracle-based quantum algorithms can be retained in the absence of the quantum oracle is an interesting open problem. We showed that for the LPN problem, a quantum advantage can be achieved with the integration of classical reinforcement learning.

Our results motivate future works that employ similar strategies for known oracle-based quantum algorithms in order to extend their applicability to classical data. For example, extending the idea of quantum-classical reinforcement learning to the learning with errors problem [31] would be an interesting future work. This work only considered classical noise and a simple quantum noise model; detailed studies on the effects of actual experimental quantum errors remain as future work.
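As a concrete illustration of the classical post-processing pieces discussed above, the following Python sketch implements a greedy single-bit-flip update in the spirit of the modified ("Markov") R-LPN policy, together with the bit-wise majority vote used in Sec. V. It is a classical stand-in under stated assumptions: the reward here is simply the agreement of a candidate s̃ with the collected samples, whereas in the actual algorithm the reward is estimated from measurements of the quantum circuit. All function names are illustrative, not the paper's implementation.

```python
import random

def parity(s, x):
    # Inner product of bit strings s and x, modulo 2.
    return sum(si & xi for si, xi in zip(s, x)) % 2

def noisy_example(s, eta, rng):
    # One classical LPN example: uniformly random x whose label
    # <s, x> is flipped with probability eta (classification noise).
    x = [rng.randint(0, 1) for _ in range(len(s))]
    y = parity(s, x) ^ (1 if rng.random() < eta else 0)
    return x, y

def agreement_reward(candidate, samples):
    # Placeholder reward: fraction of samples whose labels agree with
    # the parity predicted by the candidate.  In R-LPN the reward comes
    # from quantum-circuit measurements instead of this classical count.
    return sum(parity(candidate, x) == y for x, y in samples) / len(samples)

def markov_greedy_update(samples, n, epochs, rng):
    # "Markov" greedy policy: keep only the previous guess in memory and
    # accept a single-bit flip whenever it strictly improves the reward.
    guess = [rng.randint(0, 1) for _ in range(n)]
    best = agreement_reward(guess, samples)
    for _ in range(epochs):
        for i in range(n):
            trial = guess[:i] + [guess[i] ^ 1] + guess[i + 1:]
            r = agreement_reward(trial, samples)
            if r > best:
                guess, best = trial, r
    return guess

def bitwise_majority(bitstrings):
    # Bit-wise majority vote over k post-selected n-bit strings, as in
    # the bit-flip robustness argument of Sec. V.
    k = len(bitstrings)
    return [1 if 2 * sum(col) > k else 0 for col in zip(*bitstrings)]
```

Note that with a purely classical agreement reward the landscape is nearly flat (two distinct parity functions agree on only half of the uniformly random inputs), which is consistent with the classical hardness of LPN; the sketch therefore illustrates the update and voting structure, not a working classical solver.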
ACKNOWLEDGMENTS
We thank Suhwang Jeong and Jeongseok Ha for stimulating discussions. This work was supported by the Ministry of Science and ICT, Korea, under an ITRC Program, IITP-2019-2018-0-01402, and by the National Research Foundation of Korea (NRF-2019R1I1A1A01050161).
CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.

[1] L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, Nov 1984.
[2] Andrew W. Cross, Graeme Smith, and John A. Smolin. Quantum learning robust against noise. Phys. Rev. A, 92:012327, Jul 2015.
[3] Vadim Lyubashevsky. The Parity Problem in the Presence of Noise, Decoding Random Linear Codes, and the Subset Sum Problem, volume 3624 of Lecture Notes in Computer Science, pages 378–389. Springer, Berlin, Heidelberg, 2005.
[4] Oded Regev. On lattices, learning with errors, random linear codes, and cryptography. In Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing, STOC '05, pages 84–93, New York, NY, USA, 2005. ACM.
[5] Krzysztof Pietrzak. Cryptography from learning parity with noise. In Mária Bieliková, Gerhard Friedrich, Georg Gottlob, Stefan Katzenbeisser, and György Turán, editors, SOFSEM 2012: Theory and Practice of Computer Science, pages 99–114, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
[6] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and Jeremy L. O'Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5:4213, Jul 2014.
[7] Jarrod R. McClean, Jonathan Romero, Ryan Babbush, and Alán Aspuru-Guzik. The theory of variational hybrid quantum-classical algorithms. New Journal of Physics, 18(2):023023, Feb 2016.
[8] Jonathan Romero, Jonathan P. Olson, and Alan Aspuru-Guzik. Quantum autoencoders for efficient compression of quantum data. Quantum Science and Technology, 2(4):045001, Aug 2017.
[9] Abhinav Kandala, Antonio Mezzacapo, Kristan Temme, Maika Takita, Markus Brink, Jerry M. Chow, and Jay M. Gambetta. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature, 549:242, Sep 2017.
[10] Nikolaj Moll, Panagiotis Barkoutsos, Lev S. Bishop, Jerry M. Chow, Andrew Cross, Daniel J. Egger, Stefan Filipp, Andreas Fuhrer, Jay M. Gambetta, Marc Ganzhorn, Abhinav Kandala, Antonio Mezzacapo, Peter Müller, Walter Riess, Gian Salis, John Smolin, Ivano Tavernelli, and Kristan Temme. Quantum optimization using variational algorithms on near-term quantum devices. Quantum Science and Technology, 3(3):030503, Jun 2018.
[11] J. S. Otterbach, R. Manenti, N. Alidoust, A. Bestwick, M. Block, B. Bloom, S. Caldwell, N. Didier, E. Schuyler Fried, S. Hong, P. Karalekas, C. B. Osborn, A. Papageorge, E. C. Peterson, G. Prawiroatmodjo, N. Rubin, Colm A. Ryan, D. Scarabelli, M. Scheer, E. A. Sete, P. Sivarajah, Robert S. Smith, A. Staley, N. Tezak, W. J. Zeng, A. Hudson, Blake R. Johnson, M. Reagor, M. P. da Silva, and C. Rigetti. Unsupervised machine learning on a hybrid quantum computer. arXiv:1712.05771, Dec 2017.
[12] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii. Quantum circuit learning. Phys. Rev. A, 98:032309, Sep 2018.
[13] Maria Schuld, Alex Bocharov, Krysta Svore, and Nathan Wiebe. Circuit-centric quantum classifiers. arXiv:1804.00633, Apr 2018.
[14] Pierre-Luc Dallaire-Demers and Nathan Killoran. Quantum generative adversarial networks. Phys. Rev. A, 98:012324, Jul 2018.
[15] Thomas Fösel, Petru Tighineanu, Talitha Weiss, and Florian Marquardt. Reinforcement learning with neural networks for quantum feedback. Phys. Rev. X, 8:031084, Sep 2018.
[16] Christa Zoufal, Aurélien Lucchi, and Stefan Woerner. Quantum generative adversarial networks for learning and loading random distributions. arXiv:1904.00043, Mar 2019.
[17] Jonathan Romero and Alan Aspuru-Guzik. Variational quantum generators: Generative adversarial quantum machine learning for continuous distributions. arXiv:1901.00848, Jan 2019.
[18] Jun Li, Xiaodong Yang, Xinhua Peng, and Chang-Pu Sun. Hybrid quantum-classical approach to quantum optimal control. Phys. Rev. Lett., 118:150503, Apr 2017.
[19] Dana Angluin and Philip Laird. Learning from noisy examples. Mach. Learn., 2(4):343–370, Apr 1988.
[20] Avrim Blum, Adam Kalai, and Hal Wasserman. Noise-tolerant learning, the parity problem, and the statistical query model. J. ACM, 50(4):506–519, Jul 2003.
[21] Ethan Bernstein and Umesh Vazirani. Quantum complexity theory. SIAM J. Comput., 26(5):1411–1473, Oct 1997.
[22] Éric Levieil and Pierre-Alain Fouque. An Improved LPN Algorithm, volume 4116 of Lecture Notes in Computer Science, pages 348–359. Springer, Berlin, Heidelberg, 2006.
[23] Diego Ristè, Marcus P. da Silva, Colm A. Ryan, Andrew W. Cross, Antonio D. Córcoles, John A. Smolin, Jay M. Gambetta, Jerry M. Chow, and Blake R. Johnson. Demonstration of quantum advantage in machine learning. npj Quantum Inf., 3(1):16, 2017.
[24] E. Knill and R. Laflamme. Power of one bit of quantum information. Phys. Rev. Lett., 81:5672–5675, Dec 1998.
[25] Daniel K. Park, June-Koo K. Rhee, and Soonchil Lee. Noise-tolerant parity learning with one quantum bit. Phys. Rev. A, 97:032327, Mar 2018.
[26] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Quantum random access memory. Phys. Rev. Lett., 100:160501, Apr 2008.
[27] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Architectures for a quantum random access memory. Phys. Rev. A, 78:052310, Nov 2008.
[28] Fang-Yu Hong, Yang Xiang, Zhi-Yan Zhu, Li-zhen Jiang, and Liang-neng Wu. Robust quantum random access memory. Phys. Rev. A, 86:010306, Jul 2012.
[29] Srinivasan Arunachalam, Vlad Gheorghiu, Tomas Jochym-O'Connor, Michele Mosca, and Priyaa Varshinee Srinivasan. On the robustness of bucket brigade quantum RAM. New Journal of Physics, 17(12):123010, 2015.
[30] Daniel K. Park, Francesco Petruccione, and June-Koo Kevin Rhee. Circuit-based quantum random access memory for classical data. Scientific Reports, 9(1):3949, 2019.
[31] Alex B. Grilo, Iordanis Kerenidis, and Timo Zijlstra. Learning-with-errors problem is easy with quantum samples. Phys. Rev. A, 99:032314, Mar 2019.