# Quantum machine learning with adaptive linear optics


Ulysse Chabaud (Department of Computing and Mathematical Sciences, California Institute of Technology; Université de Paris, IRIF, CNRS, France), Damian Markham (Laboratoire d'Informatique de Paris 6, CNRS, Sorbonne Université, 4 place Jussieu, 75005 Paris, France; JFLI, CNRS, National Institute of Informatics, University of Tokyo, Tokyo, Japan), and Adel Sohbi (School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea)

We study supervised learning algorithms in which a quantum device is used to perform a computational subroutine—either for prediction via probability estimation, or to compute a kernel via estimation of quantum states overlap. We design implementations of these quantum subroutines using Boson Sampling architectures in linear optics, supplemented by adaptive measurements. We then challenge these quantum algorithms by deriving classical simulation algorithms for the tasks of output probability estimation and overlap estimation. We obtain different classical simulability regimes for these two computational tasks in terms of the number of adaptive measurements and input photons. In both cases, our results set explicit limits to the range of parameters for which a quantum advantage can be envisaged with adaptive linear optics compared to classical machine learning algorithms: we show that the number of input photons and the number of adaptive measurements cannot be simultaneously small compared to the number of modes. Interestingly, our analysis leaves open the possibility of a near-term quantum advantage with a single adaptive measurement.

Ulysse Chabaud: [email protected]. Adel Sohbi: [email protected].

Quantum computers promise dramatic advantages over their classical counterparts [1, 2], but a fault-tolerant universal quantum computer is still far from being available [3]. The quest for near-term quantum speedup has thus led to the introduction of various subuniversal models—models that are believed to have an intermediate computational power between classical and universal quantum computing—such as Boson Sampling [4] or IQP circuits [5], recently culminating with the experimental demonstration of random circuit sampling [6] and Gaussian Boson Sampling [7].

Finding practical applications for these subuniversal models, other than the demonstration of quantum speedup, is a timely issue, as it may enable interesting quantum advantages in the era of Noisy Intermediate-Scale Quantum devices [3]. Recently, there has been an increased interest in the possibility of enhancing classical machine learning algorithms using quantum computers [8], which includes the development of quantum neural networks [9–18] and of quantum kernel methods [19–24]. In particular, recent proposals have been driven by subuniversal models such as Gaussian Boson Sampling [25, 26] or IQP circuits [5, 19]. In the latter, the authors considered supervised learning algorithms in which some computational subroutines are executed in a quantum way, namely the estimation of the output probabilities of quantum circuits, or the estimation of the overlap of the output states of quantum circuits. They showed that IQP circuits alone could not provide a quantum advantage for these subroutines and therefore considered minimal extensions of these circuits, in terms of circuit depth.

Hereafter, we study the use of Boson Sampling linear optical interferometers [4], with input photons, for similar quantum machine learning tasks. Instead of extending the depth, which was shown to improve machine learning model performances [19] while making the training phase challenging [27–29], we allow for adaptive measurements—intermediate measurements that drive the rest of the computation—which provide a natural analogy with the circuit depth in the linear optics picture [30]: by encoding qubits into single photons and using a sufficient number of adaptive measurements, one can perform universal quantum computing [31].

We give a detailed prescription for performing quantum machine learning with classical data, using adaptive linear optics for computational subroutines such as probability estimation and overlap estimation. We also examine the classical simulability of these quantum subroutines. More precisely, we give classical simulation algorithms whose runtimes are explicitly dependent on: (i) the number of modes $m$, (ii) the number of adaptive measurements $k$, (iii) the number of input photons $n$, and (iv) the number of photons $r$ detected during the adaptive measurements. This effectively sets a limit on the range of parameters for which adaptive linear optics may provide an advantage for machine learning over classical computers using our methods, thus identifying the regimes where a quantum advantage can be envisaged. Boson Sampling instances [4] correspond to the case with no adaptive measurement, while the Knill–Laflamme–Milburn scheme for universal quantum computing [31] corresponds to the case where the number of adaptive measurements scales linearly with the size of the computation.

For probability estimation, we show that classical simulation is efficient whenever the number of adaptive measurements or the number of input photons is constant. Moreover, the number of input photons and the number of adaptive measurements cannot be simultaneously small compared to the number of modes (see Table 2). For overlap estimation, we show a similar behaviour, although in this case our results do not rule out the possibility of a quantum advantage with a single adaptive measurement (see Tables 3 and 4).
Our main technical contribution is an expression for the inner product of the output states of two adaptive unitary interferometers which is essentially independent of the number of adaptive measurements.

The rest of the paper is organised as follows. In section 2, we provide background on quantum machine learning with classical data and classical simulation of quantum computations. In section 3, we introduce the model of adaptive linear optics which we consider. We give a prescription for performing probability estimation and overlap estimation with instances of this model, and detail how to use these as subroutines for machine learning problems. In section 4, we derive two classical simulation algorithms, one for each of these two tasks, and we analyse their running time. We conclude in section 5.

We consider a typical machine learning problem, such as a classification problem, where a classical dataset $D = \{\vec{x}_1, \dots, \vec{x}_{|D|}\}$ from an input set $X$ is given. One way to use quantum computers to solve such problems is to encode classical data onto quantum states, such that there exists a so-called feature map $\vec{x}_l \mapsto |\phi(\vec{x}_l)\rangle$ which can be processed by a quantum computer.

Definition 1 (Feature map [20]). Let $F$ be a Hilbert space, called feature Hilbert space, $X$ a non-empty set, called input set, and $\vec{x} \in X$. A feature map is a map $\phi : X \to F$ from inputs to vectors in the Hilbert space.

Many machine learning algorithms perform well in linear cases, such as the support vector machines which will be used hereafter. However, many real-world problems require non-linearity to make successful predictions. By using kernel methods, one can introduce non-linearity and use estimation methods that are linear in terms of the kernel evaluations.

Definition 2 (Kernel [20]). Let $X$ be an input set. A function $\kappa : X \times X \to \mathbb{C}$ is called a kernel if for any finite subset $D = \{\vec{x}_1, \dots, \vec{x}_M\}$ with $M \ge 2$, the Gram matrix $K$ with entries $K_{l,l'} = \kappa(\vec{x}_l, \vec{x}_{l'})$ is positive semidefinite.

The kernel corresponds to a dot product in a feature space (here a high-dimensional Hilbert space). In [20], it is shown that the notions of feature map in Hilbert space and kernel can be connected. A straightforward way is to define a kernel $\kappa$ from a feature map $\phi$ as follows:

$$\kappa(\vec{x}_l, \vec{x}_{l'}) = \langle \phi(\vec{x}_l) | \phi(\vec{x}_{l'}) \rangle_F, \qquad (1)$$

where $\vec{x}_l, \vec{x}_{l'} \in X$ and $\langle\cdot|\cdot\rangle_F$ is the inner product over the Hilbert space $F$.

There are two main ways to use feature Hilbert spaces:

- The quantum variational classification [19] or explicit approach [20]: the entire model computation is performed on a quantum device trained by a hybrid variational quantum–classical or classical algorithm [14, 15, 32]. In this case, the probability distribution over the possible outcomes is used for the classification. Hence, while the feature map is used to encode the data, the kernel is not known directly from the quantum device. In [19], the probability to obtain a certain binary output $b_i$ for the classical data input $\vec{x}$ is given by:

$$\Pr(b_i) = \left|\langle b_i | W(\vec{\theta})\, U_{\phi(\vec{x})} |0\rangle^{\otimes n}\right|^2, \qquad (2)$$

where $U_{\phi(\vec{x})}$ encodes the feature map and $W(\vec{\theta})$ corresponds to the quantum support vector machine.

- The quantum kernel estimation [19] or implicit approach [20]: a quantum-assisted method where the quantum device is only used to evaluate the kernel, while the rest of the machine learning algorithm is carried out by a classical algorithm. In [19], the kernel between two classical data vectors $\vec{x}_l$ and $\vec{x}_{l'}$ is given by:

$$K_{l,l'} = \langle 0|^{\otimes n}\, U^\dagger_{\phi(\vec{x}_l)} U_{\phi(\vec{x}_{l'})} |0\rangle^{\otimes n}. \qquad (3)$$

In order to challenge and justify the use of a quantum-assisted method, it is important to consider how hard it would be for a classical machine to compute the same quantities.
Indeed, if the output probability or kernel could be estimated directly classically, there would be no need for a quantum computer for this task. Hence, when it comes to classical hardness, the first method requires that estimating the output probability in Eq. (2) is hard, while the second method requires that estimating the overlap in Eq. (3) is hard (for classical computers). We give more formal definitions for these classical simulation tasks in the following section.
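As a minimal illustration of Eq. (1), the sketch below builds a kernel Gram matrix from a feature map and checks that it is positive semidefinite, as required by Definition 2. The feature map here is a hypothetical classical stand-in (a normalised polynomial embedding), not the quantum feature maps discussed in this paper.

```python
import numpy as np

def feature_map(x):
    """Toy classical stand-in for a feature map x -> |phi(x)>:
    a normalised complex amplitude vector (hypothetical choice)."""
    v = np.array([1.0, x, x**2, x**3], dtype=complex)
    return v / np.linalg.norm(v)

def gram_matrix(xs):
    """Kernel matrix K[l, l'] = <phi(x_l)|phi(x_l')>, as in Eq. (1)."""
    phis = [feature_map(x) for x in xs]
    return np.array([[np.vdot(p, q) for q in phis] for p in phis])

xs = [0.1, 0.5, 0.9, 1.3]
K = gram_matrix(xs)
# A valid kernel's Gram matrix is Hermitian and positive semidefinite.
assert np.allclose(K, K.conj().T)
assert np.min(np.linalg.eigvalsh(K)) >= -1e-12
```

Any kernel defined through an inner product of feature vectors passes these checks by construction; the point of the quantum-assisted approach is that the entries of $K$ may be hard to compute classically.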

Depending on the approach used for simulating classically the functioning of quantum devices, several notions of simulability are commonly used. One could ask the classical simulation algorithm to mimic the output of the quantum computation [33, 34]: informally, a quantum computation is weakly simulable if there exists a classical algorithm which outputs samples from its output probability distribution in time polynomial in the size of the quantum computation. Various relaxations of this definition are possible, allowing the classical sampling to be approximate rather than exact, or to abort with a small probability. The existence of such an efficient classical simulation has been ruled out for various subuniversal models of quantum computing, such as IQP circuits [5] or Boson Sampling [4], under complexity-theoretic conjectures—both in the exact case and the approximate case, up to additional conjectures.

While weak simulation of quantum computations is arguably the most commonly studied, other notions of classical simulation may be useful: if the output samples of a quantum computation are used to compute a quantity which may be computed efficiently classically by other means, it is no longer necessary to simulate the whole quantum device. We consider two concrete examples which are prominent for variational quantum algorithms in quantum machine learning: probability estimation and overlap estimation [19, 20].

Definition 3 (Probability estimation). Let $P$ be a probability distribution over $M$ outcomes and let $\epsilon, \delta > 0$. Given any outcome $x$ in the sample space of $P$, probability estimation refers to the computational task of outputting an estimate $\tilde{P}[x]$ such that

$$\left| \tilde{P}[x] - P[x] \right| \le \epsilon, \qquad (4)$$

with probability greater than $1 - \delta$, in time $O(\mathrm{poly}(M, 1/\epsilon) \log(1/\delta))$.

Efficient probability estimation thus amounts to outputting an estimate of the probability of a fixed outcome with polynomially small additive error, with exponentially small probability of failure, in polynomial time in the size of the computation. One may use the samples from a quantum computation in order to perform probability estimation efficiently for any given outcome: given a quantum device of size $M$ which outputs samples from some probability distribution and a fixed outcome $x$ in the sample space, one may run the device $\mathrm{poly}(M)$ times, recording the value 1 whenever the outcome $x$ is obtained and the value 0 otherwise. Then, the frequency of the outcome $x$ over the $\mathrm{poly}(M)$ uses of the quantum device is a polynomially precise additive estimate of the probability of the outcome $x$, with exponentially small probability of failure, by virtue of Hoeffding's inequality [35].

Weak simulation is at least as hard as probability estimation, since by the previous reasoning one may obtain polynomially precise additive estimates of probabilities from samples of the probability distribution. Moreover, there are some quantum computations for which weak simulation is hard for classical computers (assuming widely believed conjectures from complexity theory), but probability estimation can be done efficiently classically. This is the case for IQP circuits [5, 19], Boson Sampling interferometers [4] and even the circuit implementing the period-finding subroutine of Shor's factoring algorithm [2].
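The frequency estimator described above can be sketched as follows, with the shot count chosen from Hoeffding's inequality: $T \ge \ln(2/\delta)/(2\epsilon^2)$ samples suffice for additive error $\epsilon$ with failure probability at most $\delta$. The "device" here is a hypothetical classical sampler standing in for a quantum computation.

```python
import math, random

def estimate_probability(sampler, outcome, eps, delta, rng):
    """Estimate P[outcome] to additive error eps with failure prob. <= delta,
    using the frequency estimator; Hoeffding's inequality gives
    T >= ln(2/delta) / (2 eps^2) samples."""
    T = math.ceil(math.log(2 / delta) / (2 * eps**2))
    hits = sum(sampler(rng) == outcome for _ in range(T))
    return hits / T

# Toy "quantum device": samples from a known distribution over 3 outcomes.
dist = {"00": 0.5, "01": 0.3, "10": 0.2}
def sampler(rng):
    return rng.choices(list(dist), weights=list(dist.values()))[0]

rng = random.Random(42)
est = estimate_probability(sampler, "01", eps=0.02, delta=0.01, rng=rng)
```

With these parameters the estimate lies within $0.02$ of the true value $0.3$ except with probability at most $1\%$.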
We detail the latter case in Appendix A, for the sake of clarifying the relations between these different notions of classical simulation. Note also that computing exactly output probabilities of quantum circuits or interferometers is a #P-hard problem, therefore expected to be hard both classically and quantumly [36].

A more general computational task than probability estimation in the context of quantum computing is the following:

Definition 4 (Overlap estimation). Let $|\phi\rangle$ and $|\psi\rangle$ be output states of two efficiently describable quantum computations of size $M$ and let $\epsilon, \delta > 0$. Overlap estimation refers to the computational task of outputting an estimate $\tilde{O}$ such that

$$\left| \tilde{O} - |\langle \phi | \psi \rangle|^2 \right| \le \epsilon, \qquad (5)$$

with probability greater than $1 - \delta$, in time $O(\mathrm{poly}(M, 1/\epsilon) \log(1/\delta))$.

The overlap between two quantum states is a measure of their distinguishability [37], and overlap estimation is thus related to quantum state discrimination. Several techniques exist to perform overlap estimation of two states $|\phi\rangle$ and $|\psi\rangle$ quantumly [38]. One of them is to perform the swap test [39] with various copies of both states.

Overlap estimation can also be done efficiently classically for IQP circuits. Nonetheless, this family of circuits has been identified as a promising venue for implementing quantum machine learning algorithms [19], when enlarged to contain similar circuits with bigger depth. Motivated by this approach, we consider hereafter the case of another subuniversal model: Boson Sampling [4] with input photons, supplemented with adaptive measurements.

In what follows, we study Boson Sampling architectures [4], supplemented with a given number of adaptive measurements—that is, some of the modes are measured throughout the computation and the rest of the computation can depend on their outcome—which we refer to as adaptive linear optics.
In this section, we derive quantum algorithms for performing probability and overlap estimation with adaptive linear optics, together with a prescription for utilising these algorithms as subroutines in supervised learning algorithms.
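The swap test mentioned in the previous section can be sketched numerically: its ancilla qubit reads $0$ with probability $(1 + |\langle\phi|\psi\rangle|^2)/2$, so the frequency of $0$ outcomes yields an estimate of the squared overlap. The sketch below samples the measurement statistics directly from this probability rather than simulating the full circuit.

```python
import numpy as np

rng = np.random.default_rng(7)

def random_state(d):
    """Haar-ish random pure state on d amplitudes (for illustration only)."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def swap_test_overlap(phi, psi, shots):
    """Estimate |<phi|psi>|^2 via swap-test statistics: the ancilla gives 0
    with probability (1 + |<phi|psi>|^2)/2, so 2*freq(0) - 1 is the estimate."""
    p0 = (1 + abs(np.vdot(phi, psi))**2) / 2
    freq0 = rng.binomial(shots, p0) / shots
    return 2 * freq0 - 1

phi, psi = random_state(4), random_state(4)
exact = abs(np.vdot(phi, psi))**2
est = swap_test_overlap(phi, psi, shots=100_000)
```

With $10^5$ shots the statistical error on the estimate is of order $10^{-3}$, consistent with the $O(\mathrm{poly}(M, 1/\epsilon))$ scaling of Definition 4.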

Hereafter, we detail the computational model of adaptive linear optics which we consider. We first introduce a few notations.

In the linear optics picture, the input states we consider are multimode photon number Fock states over $m$ modes (we use bold math for multi-index notations, see Table 1):

$$|\boldsymbol{s}\rangle = \frac{1}{\sqrt{\boldsymbol{s}!}}\, \hat{a}_1^{\dagger s_1} \cdots \hat{a}_m^{\dagger s_m} |0\rangle^{\otimes m}, \qquad (6)$$

where $s_i$ and $\hat{a}_i^\dagger$ are respectively the number of photons and the creation operator for the $i$-th mode [40]. We identify these states with $m$-tuples of integers $\boldsymbol{s} = (s_1, \dots, s_m) \in \mathbb{N}^m$. Following [4], let us define, for all $n \in \mathbb{N}$,

$$\Phi_{m,n} := \{ \boldsymbol{s} = (s_1, \dots, s_m) \in \mathbb{N}^m \mid |\boldsymbol{s}| = n \}. \qquad (7)$$

This set corresponds to the $m$-mode Fock states with total number of photons equal to $n$, and we have $|\Phi_{m,n}| = \binom{m+n-1}{n}$.

Table 1: Multi-index notations. For $m \in \mathbb{N}^*$, we write $\boldsymbol{s} = (s_1, \dots, s_m) \in \mathbb{N}^m$ and $\boldsymbol{t} = (t_1, \dots, t_m) \in \mathbb{N}^m$.

| Notation | Meaning |
|---|---|
| $\lvert\boldsymbol{s}\rvert$ | $s_1 + \cdots + s_m$ |
| $\boldsymbol{s}!$ | $s_1!\, s_2! \cdots s_m!$ |
| $\lvert\boldsymbol{s}\rangle$ | $\lvert s_1 \dots s_m\rangle$ |
| $\boldsymbol{s} + \boldsymbol{t}$ | $(s_1 + t_1, \dots, s_m + t_m)$ |
| $\boldsymbol{0}_m$ | $(0, \dots, 0)$ |
| $\boldsymbol{1}_m$ | $(1, \dots, 1)$ |

We consider a unitary interferometer of size $m$, described by an $m \times m$ unitary matrix $U = (u_{ij})_{1 \le i,j \le m}$. Unlike in the circuit picture, the matrix $U$ does not act on the computational basis, which is in this case the infinite multimode Fock basis, but rather describes the linear evolution of the creation operator of each mode. More precisely,

$$U : \begin{pmatrix} \hat{a}_1^\dagger \\ \vdots \\ \hat{a}_m^\dagger \end{pmatrix} \mapsto \begin{pmatrix} \sum_{k=1}^m u_{1k}\, \hat{a}_k^\dagger \\ \vdots \\ \sum_{k=1}^m u_{mk}\, \hat{a}_k^\dagger \end{pmatrix}. \qquad (8)$$

We write $\hat{U}$ for the unitary action of the interferometer on the multimode Fock basis. Because the interferometer conserves the total number of photons, for all $p, q \in \mathbb{N}$, all $\boldsymbol{s} \in \Phi_{m,p}$ and all $\boldsymbol{t} \in \Phi_{m,q}$,

$$\langle \boldsymbol{s} | \hat{U} | \boldsymbol{t} \rangle = 0 \qquad (9)$$

whenever $p \neq q$. Let $n \in \mathbb{N}$, $\boldsymbol{s} = (s_1, \dots, s_m) \in \Phi_{m,n}$ and $\boldsymbol{t} = (t_1, \dots, t_m) \in \Phi_{m,n}$. Combining Eq. (6) and Eq. (8) we obtain [4]

$$\langle \boldsymbol{s} | \hat{U} | \boldsymbol{t} \rangle = \frac{\mathrm{Per}(U_{\boldsymbol{s},\boldsymbol{t}})}{\sqrt{\boldsymbol{s}!}\,\sqrt{\boldsymbol{t}!}}, \qquad (10)$$

where $U_{\boldsymbol{s},\boldsymbol{t}}$ is the $n \times n$ matrix obtained from $U$ by repeating $s_i$ times its $i$-th row and $t_j$ times its $j$-th column, for $i, j = 1, \dots, m$, and where the permanent of an $r \times r$ matrix $A = (a_{ij})_{1 \le i,j \le r}$ is defined as

$$\mathrm{Per}\,A = \sum_{\sigma \in S_r} \prod_{i=1}^r a_{i\sigma(i)}, \qquad (11)$$

where $S_r$ is the symmetric group over $\{1, \dots, r\}$. We write $\Pr_{m,n}[\,\cdot \mid \boldsymbol{t}]$ the probability distribution of the outputs over $\Phi_{m,n}$ of the unitary interferometer $U$ acting on an input $|\boldsymbol{t}\rangle$. With the previous notations we obtain, for all $p, q \in \mathbb{N}$, all $\boldsymbol{s} \in \Phi_{m,p}$ and all $\boldsymbol{t} \in \Phi_{m,q}$,

$$\Pr\nolimits_{m,n}[\boldsymbol{s} \mid \boldsymbol{t}] = \frac{|\mathrm{Per}(U_{\boldsymbol{s},\boldsymbol{t}})|^2}{\boldsymbol{s}!\,\boldsymbol{t}!}\, \delta_{pq}, \qquad (12)$$

where $\delta_{pq}$ is the Kronecker symbol.

In what follows, we fix the input state $|\boldsymbol{t}\rangle = |\boldsymbol{1}_n \boldsymbol{0}_{m-n}\rangle$, with single-photon states in the first $n$ modes and vacuum states in all other modes, where the subscript indicates the size of the string $(0, \dots, 0)$ or $(1, \dots, 1)$ when there is a possible ambiguity. We consider the case of linear optical quantum computing with adaptive photon-number measurements, which we refer to as adaptive linear optics (see Fig. 1). We denote by $k \in \{0, \dots, m\}$ the number of single-mode adaptive measurements. Without loss of generality, we assume that the first $k$ modes are measured adaptively throughout the computation, and we write $\boldsymbol{p} = (p_1, \dots, p_k)$ the adaptive measurement outcomes. For $r \in \mathbb{N}$ and $\boldsymbol{p} \in \Phi_{k,r}$, let us define

$$U_{\boldsymbol{p}} := [\mathbb{1}_k \oplus U_k(p_1, \dots, p_k)] \cdots [\mathbb{1}_1 \oplus U_1(p_1)]\, U_0, \qquad (13)$$

where $\mathbb{1}_j$ is the identity matrix of size $j$ and where the unitary matrices $U_j$ depend on the measurement outcomes $p_1, \dots, p_j$ for all $j \in \{1, \dots, k\}$. An adaptive interferometer $U$ over $m$ modes with $n$ input photons and $k$ adaptive measurements is then represented as a family of nonadaptive unitary interferometers

$$\mathcal{U} := \{ U_{\boldsymbol{p}} \mid \boldsymbol{p} \in \Phi_{k,r},\ 0 \le r \le n \}, \qquad (14)$$

one for each possible adaptive measurement outcome $\boldsymbol{p}$. The matrix $U_{\boldsymbol{p}}$ describes the interferometer in Fig. 1, when the adaptive measurement outcome $\boldsymbol{p} = (p_1, \dots, p_k) \in \Phi_{k,r}$ has been obtained, where $r$ is the number of photons detected during the adaptive measurements. In this case, the output state is a pure state which reads

$$\mathrm{Tr}_k\!\left[ \left( |\boldsymbol{p}\rangle\langle\boldsymbol{p}| \otimes \mathbb{1}_{m-k} \right) \hat{U}_{\boldsymbol{p}} |\boldsymbol{t}\rangle\langle\boldsymbol{t}| \hat{U}_{\boldsymbol{p}}^\dagger \right], \qquad (15)$$

where the partial trace is over the first $k$ modes and where $|\boldsymbol{p}\rangle$ denotes the $k$-mode Fock state $|p_1 \dots p_k\rangle$. At the end of the computation, all the remaining $m-k$ modes are measured with photon-number detection, yielding the final outcome $\boldsymbol{s} = (s_1, \dots, s_{m-k}) \in \Phi_{m-k,n-r}$.

[Figure 1: schematic of an adaptive interferometer over $m$ modes, composed of the interferometers $U_0, \dots, U_k$ interleaved with $k$ single-mode adaptive measurements; not reproduced in this extraction.]
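The combinatorial objects above are easy to check numerically. The sketch below enumerates $\Phi_{m,n}$, verifies the counting formula $|\Phi_{m,n}| = \binom{m+n-1}{n}$ from Eq. (7), and builds the row/column-repeated submatrix $U_{\boldsymbol{s},\boldsymbol{t}}$ entering Eq. (10); the function names are ours, not the paper's.

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def phi(m, n):
    """All m-tuples of non-negative integers summing to n, i.e. Phi_{m,n}:
    place n indistinguishable photons into m modes."""
    states = set()
    for modes in combinations_with_replacement(range(m), n):
        s = [0] * m
        for i in modes:
            s[i] += 1
        states.add(tuple(s))
    return sorted(states)

# |Phi_{m,n}| = C(m+n-1, n), as stated below Eq. (7).
assert len(phi(4, 3)) == comb(4 + 3 - 1, 3)

def submatrix(U, s, t):
    """U_{s,t}: repeat row i of U s_i times and column j t_j times (Eq. (10))."""
    rows = [i for i, si in enumerate(s) for _ in range(si)]
    cols = [j for j, tj in enumerate(t) for _ in range(tj)]
    return U[np.ix_(rows, cols)]
```

For $|\boldsymbol{s}| = |\boldsymbol{t}| = n$, `submatrix` returns an $n \times n$ matrix, whose permanent gives the transition amplitude of Eq. (10).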

**Algorithm 1** (Overlap estimation for adaptive linear optics)

*Input:* Adaptive interferometer $\mathcal{U} = \{ U_{\boldsymbol{p}} \mid \boldsymbol{p} \in \Phi_{k,r},\ 0 \le r \le n \}$, adaptive measurement outcomes $\boldsymbol{p}, \boldsymbol{q}$.
*Parameters:* Input $|\boldsymbol{t}\rangle$ and number of shots $T$.

1. Set the state preparation counter $c_{sp} = 0$ and the overlap counter $c_{over} = 0$.
2. While $c_{sp} < T$:
   - Run $\mathcal{U}$ on input $|\boldsymbol{t}\rangle$, obtaining adaptive measurement outcome $\boldsymbol{r}$ and output state $|\chi\rangle$.
   - If $\boldsymbol{r} = \boldsymbol{p}$: set $c_{sp} \to c_{sp} + 1$; run $U_{\boldsymbol{q}}^\dagger$ on input $|\boldsymbol{q}\rangle \otimes |\chi\rangle$; measure in the photon number basis and record the outcome $\boldsymbol{s}$; if $\boldsymbol{s} = \boldsymbol{t}$, set $c_{over} \to c_{over} + 1$.
   - Else, if $\boldsymbol{r} = \boldsymbol{q}$: set $c_{sp} \to c_{sp} + 1$; run $U_{\boldsymbol{p}}^\dagger$ on input $|\boldsymbol{p}\rangle \otimes |\chi\rangle$; measure in the photon number basis and record the outcome $\boldsymbol{s}$; if $\boldsymbol{s} = \boldsymbol{t}$, set $c_{over} \to c_{over} + 1$.
3. Return $c_{over}/T$.

We consider a training set $\mathcal{T}$ with $\vec{x}_l \in \mathbb{R}^d$ and $y_l \in C$ for all $l \in \{1, \dots, |\mathcal{T}|\}$, where $C = \{-1, +1\}$ in the case of binary classification. We also consider a feature map that is given by a subset of the family introduced in Eq. (14): $\mathcal{U}_\phi = \{ U_l^{\boldsymbol{p}_l} \}_l$, in which for each classical data point $\vec{x}_l$ there is a unitary operator $U_l^{\boldsymbol{p}_l}$.

The goal of a support vector machine [41–43] is to find the maximum-margin hyperplane that divides the set of points by their $y$ values. Such a hyperplane is defined by a vector $\vec{w} \in \mathbb{R}^d$ and $b \in \mathbb{R}$. With the hard-margin condition, we have

$$y_l (\vec{w} \cdot \vec{x}_l + b) \ge 1, \quad \forall l \in \{1, \dots, |\mathcal{T}|\}. \qquad (18)$$

In this case, a decision function over a new point $\vec{x}$ can be constructed directly from the hyperplane as follows:

$$f(\vec{x}) = \mathrm{sign}(\vec{w} \cdot \vec{x} + b). \qquad (19)$$

When we have access to a feature map $\phi$, the vector $\vec{w}$ can be decomposed as

$$\vec{w} = \sum_{l=1}^{|\mathcal{T}|} \alpha_l\, y_l\, \phi(\vec{x}_l), \qquad (20)$$

for some $\alpha_l \in \mathbb{C}$, and the decision function becomes

$$f(\vec{x}) = \mathrm{sign}\left( \sum_{l=1}^{|\mathcal{T}|} \alpha_l\, y_l\, \kappa(\vec{x}_l, \vec{x}) + b \right), \qquad (21)$$

where the kernel $\kappa(\vec{x}_l, \vec{x})$ has an interpretation in terms of the feature map $\phi$, as given in Eq. (1). In order to find the best separating hyperplane, one may use a quadratic programming solver. In section 3.3.2, we consider the case where the programming solver is composed of a hybrid quantum-classical algorithm, whereas in section 3.3.3, such a solver is external to the quantum device.
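The kernelized decision function of Eq. (21) can be sketched with a toy classical kernel standing in for the quantum one. The sketch below optimises a simplified soft-margin dual (omitting the bias and the equality constraint for brevity, so $b = 0$, which is adequate for this symmetric toy dataset) by projected gradient ascent; the data, kernel, and constants are hypothetical.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian kernel, a classical stand-in for the kernel of Eq. (1)."""
    return float(np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2)))

# Toy, well-separated 1D training set (hypothetical data).
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
K = np.array([[rbf(a, b) for b in X] for a in X])

# Projected gradient ascent on a simplified soft-margin dual:
# maximise sum(alpha) - 1/2 alpha^T (yy^T * K) alpha with 0 <= alpha <= C.
Q = np.outer(y, y) * K
alpha = np.zeros(len(y))
C, lr = 10.0, 0.1
for _ in range(1000):
    alpha = np.clip(alpha + lr * (1.0 - Q @ alpha), 0.0, C)

def decision(x):
    """Decision function of Eq. (21), with b = 0 for this symmetric toy set."""
    return np.sign(sum(a * yl * rbf(xl, x) for a, yl, xl in zip(alpha, y, X)))

assert all(decision(xl) == yl for xl, yl in zip(X, y))
```

In the implicit approach described below, the only change is that the entries of `K` would come from a quantum estimate of the kernel rather than from a classical formula.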
The optimisation program that needs to be solved is:

$$\begin{aligned} \underset{\alpha_l}{\text{maximize}} \quad & \sum_{l=1}^{|\mathcal{T}|} \alpha_l - \frac{1}{2} \sum_{l,l'} \alpha_l\, y_l\, \alpha_{l'}\, y_{l'}\, \kappa(\vec{x}_l, \vec{x}_{l'}) \\ \text{subject to} \quad & \sum_{l=1}^{|\mathcal{T}|} \alpha_l\, y_l = 0, \\ & 0 \le \alpha_l \le \frac{1}{n\lambda}, \quad \forall l \in \{1, \dots, |\mathcal{T}|\}, \end{aligned} \qquad (22)$$

where $\lambda$ is a parameter that is introduced for the soft-margin condition. By solving such a problem, one obtains the optimal coefficients $\{\alpha^*_l\}$ and $b^*$ of the hyperplane (see Eq. (20)). This formulation is for the quantum case, but similar methods can be applied in the classical case [42].

The explicit method corresponds to the case where the prediction—assigning a classification label to the data—is obtained from the probability distribution when using a quantum device. Hereafter, we consider using adaptive linear optics for the feature map, and use its output state as an input state of a second quantum device for the support vector machine [19]. The choice of the feature map ultimately depends on the architecture at hand: one may use, e.g., reconfigurable interferometers [44] in the linear optics picture. We show that Boson Sampling can be used to realise a support vector machine algorithm by using a hybrid variational quantum-classical algorithm for the training phase [14, 15, 32]. We write the Boson Sampling interferometer operation $BS(\vec{\theta})$, where $\vec{\theta}$ are the parameters of the beam splitters and phase shifters of the interferometer, since any interferometer can be implemented efficiently with only phase shifters and beam splitters [45].

In order to make a prediction, we need to bin the outcomes. For a given function $g : \Phi_{m,n} \to \{-1, +1\}$, we can write the following observable for the binning:

$$N = \sum_{\boldsymbol{s} \in \Phi_{m,n}} g(\boldsymbol{s})\, |\boldsymbol{s}\rangle\langle\boldsymbol{s}|. \qquad (23)$$

The observable $N$ can also be decomposed in terms of projectors as $N = \Pi_{+1} - \Pi_{-1}$, where $\Pi_y = (\mathbb{1} + yN)/2$. For a given data point $\vec{x}$ with the associated operation $\hat{U}_x^{\boldsymbol{p}_x}$, the probability to obtain the outcome $y$ is:

$$\begin{aligned} \Pr(y \mid \boldsymbol{p}_x) &= \mathrm{Tr}\!\left[ \Pi_y\, BS(\vec{\theta})\, \hat{U}_x^{\boldsymbol{p}_x} |\boldsymbol{t}\rangle\langle\boldsymbol{t}|\, \hat{U}_x^{\boldsymbol{p}_x\dagger} BS(\vec{\theta})^\dagger \right] \\ &= \frac{1}{2}\left( 1 + y\, \langle\boldsymbol{t}|\, \hat{U}_x^{\boldsymbol{p}_x\dagger} BS(\vec{\theta})^\dagger\, N\, BS(\vec{\theta})\, \hat{U}_x^{\boldsymbol{p}_x} |\boldsymbol{t}\rangle \right). \end{aligned} \qquad (24)$$

In the quantum circuit model, it has been explicitly shown in [19] that the equivalent of Eq. (24) in the circuit picture is related to the decision function of a support vector machine. This proof relies on decomposing the variational quantum circuit in the Pauli basis. Moreover, in [46], the authors provide a quantum circuit that simulates Boson Sampling interferometers with input photons, using the quantum Schur transform. Hence, the unitary $BS(\vec{\theta})$ can be decomposed in the Pauli basis (for qudits), thus providing the same relation between Eq. (24) and the decision function of a support vector machine.

Experimentally, the probability $\Pr(y \mid \boldsymbol{p}_x)$ can be estimated using the following approximation:

$$\Pr(y \mid \boldsymbol{p}_x) \approx \frac{T_{y|x}}{T_x}, \qquad (25)$$

where $T_x$ is the number of times the value $\boldsymbol{p}_x$ has been recorded and $T_{y|x}$ is the number of times the value $y$ has been recorded after the value $\boldsymbol{p}_x$ has also been recorded.

**Algorithm 2** (Explicit method: training phase)

*Input:* Training dataset $\mathcal{T}$, feature maps $\mathcal{U}_\phi$, optimisation routine with cost function $J(\vec{\theta})$.
*Parameters:* Input $|\boldsymbol{t}\rangle$, number of shots $T$, initial parameters $\vec{\theta}_0$.

1. Set $\vec{\theta} = \vec{\theta}_0$.
2. While $J(\vec{\theta})$ has not converged:
   - For $l = 1$ to $|\mathcal{T}|$:
     - Set $c_{\boldsymbol{p}_l} = 0$ and $c_y = 0$ for all $y \in C$.
     - While $c_{\boldsymbol{p}_l} < T$: run $BS(\vec{\theta})\, U_l^{\boldsymbol{p}_l}$ on $|\boldsymbol{t}\rangle$, obtaining adaptive measurement outcome $\boldsymbol{r}$ and outcome label $y$; if $\boldsymbol{r} = \boldsymbol{p}_l$, set $c_{\boldsymbol{p}_l} \to c_{\boldsymbol{p}_l} + 1$ and $c_y \to c_y + 1$.
     - Compute $\Pr(y \mid \boldsymbol{p}_l) = c_y / T$.
   - Compute $J(\vec{\theta})$ and update $\vec{\theta}$.
3. Return the value $J(\vec{\theta}^*)$ and the final parameters $\vec{\theta}^*$.

**Algorithm 3** (Explicit method: prediction phase)

*Input:* Unlabeled data $\vec{x}$, feature map $U_{\phi(\vec{x})}$, optimal parameters $\vec{\theta}^*$.
*Parameters:* Input $|\boldsymbol{t}\rangle$, number of shots $T_x$.

1. Set $c_{\boldsymbol{p}_x} = 0$ and $c_y = 0$ for all $y \in C$.
2. While $c_{\boldsymbol{p}_x} < T_x$: run $BS(\vec{\theta}^*)\, U_{\phi(\vec{x})}$ on $|\boldsymbol{t}\rangle$, obtaining adaptive measurement outcome $\boldsymbol{r}$ and outcome label $y$; if $\boldsymbol{r} = \boldsymbol{p}_x$, set $c_{\boldsymbol{p}_x} \to c_{\boldsymbol{p}_x} + 1$ and $c_y \to c_y + 1$.
3. Compute $\Pr(y \mid \boldsymbol{p}_x) = c_y / T_x$.
4. Return $\mathrm{argmax}_y \{ \Pr(y \mid \boldsymbol{p}_x) \}$.

In order to train the Boson Sampling device, it is necessary to use a cost function $J(\vec{\theta})$ and an optimisation routine. For the optimisation routine, standard variational optimisation methods can be used, such as hybrid variational quantum-classical or classical algorithms [14, 15, 32]. Different cost functions can be used, such as the empirical risk function [19] or the square-loss function [20].
However, other types of cost function may not be suitable for this kind of problem, such as the ones usually used in gate synthesis or quantum state preparation [47, 48]. The algorithm for the training phase is described in Algorithm 2 and the prediction phase is described in Algorithm 3.

The implicit method corresponds to the case where the kernel is computed by a quantum device, while the rest of the machine learning algorithm is performed on a classical machine. In adaptive linear optics, the kernel can be estimated using Algorithm 1. It is possible to proceed to both the training and prediction phases using this hybrid quantum-classical method. The algorithm for the training phase is described in Algorithm 4 and the prediction phase is described in Algorithm 5.

**Algorithm 4** (Implicit method: training phase)

*Input:* Dataset $\mathcal{T}$, feature maps $\mathcal{U}$, quadratic programming solver.
*Parameters:* Input $|\boldsymbol{t}\rangle$ and number of shots $T$.

1. For $l = 1$ to $|\mathcal{T}|$:
   - For $l' = 1$ to $|\mathcal{T}|$:
     - Run $(U_l^{\boldsymbol{p}_l})^{\otimes T}$ on input $|\boldsymbol{t}\rangle^{\otimes T}$.
     - Run Algorithm 1 with inputs $U_l^{\boldsymbol{p}_l}$ and $U_{l'}^{\boldsymbol{p}_{l'}}$.
     - Store the output at kernel matrix entry $K_{l,l'} = \kappa(\vec{x}_l, \vec{x}_{l'})$.
2. Solve the optimisation problem in Eq. (22) with $\mathcal{T}$ and $K$.
3. Return $\{\alpha^*_l\}$ and $b^*$.

**Algorithm 5** (Implicit method: prediction phase)

*Input:* Dataset $\mathcal{T}$, unlabeled data $\vec{x}$, optimal parameters $\{\alpha^*_l\}$ and $b^*$.
*Parameters:* Input $|\boldsymbol{t}\rangle$ and number of shots $T$.

1. Set the variable $f = b^*$.
2. For $l = 1$ to $|\mathcal{T}|$:
   - Run $(U_x^{\boldsymbol{p}_x})^{\otimes T}$ on input $|\boldsymbol{t}\rangle^{\otimes T}$.
   - Run Algorithm 1 with inputs $U_x^{\boldsymbol{p}_x}$ and $U_l^{\boldsymbol{p}_l}$.
   - Update $f \to f + \alpha^*_l\, y_l\, \kappa(\vec{x}_l, \vec{x})$.
3. Return $\mathrm{sign}(f)$.

The complexity of probability estimation and overlap estimation of quantum computations has been well studied in the circuit model [49, 50]. In this section, we challenge the quantum algorithms introduced in the previous section by deriving classical algorithms for probability estimation and overlap estimation for adaptive linear optics over $m$ modes. We identify various complexity regimes for different numbers of input photons $n \le m$, different numbers of adaptive measurements $k \le m$, and different numbers of photons $r \le n$ detected during the adaptive measurements. To do so, we derive generic expressions for the output probabilities and for the overlap of output states of linear optical interferometers with adaptive measurements.

Firstly, we obtain a classical algorithm for probability estimation of adaptive linear optics over $m$ modes with $n$ input photons and $k$ adaptive measurements. We first consider the case $k = 0$, i.e., Boson Sampling. The probability of an outcome $\boldsymbol{s} \in \Phi_{m,n}$ for a unitary interferometer $U$, given the input $\boldsymbol{t} = (\boldsymbol{1}_n, \boldsymbol{0}_{m-n}) \in \Phi_{m,n}$, is given by Eq. (12):

$$\Pr\nolimits_{m,n}[\boldsymbol{s}] = \frac{1}{\boldsymbol{s}!}\, |\mathrm{Per}(U_{\boldsymbol{s},\boldsymbol{t}})|^2, \qquad (26)$$

where $U_{\boldsymbol{s},\boldsymbol{t}}$ is the $n \times n$ matrix obtained from $U$ by repeating $s_i$ times its $i$-th row for $i \in \{1, \dots, m\}$ and removing its $j$-th column for $j \in \{n+1, \dots, m\}$. When $|\boldsymbol{s}| \neq n$, however, the probability is $0$, since the input $\boldsymbol{t}$ has $n$ photons and the linear interferometer does not change the total number of photons. The permanent of a square matrix of size $n$ can be computed exactly in time $O(n 2^n)$, thanks to Ryser's formula [51].
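Ryser's formula can be sketched as follows, checked against the definition of the permanent in Eq. (11). (The $O(n 2^n)$ bound requires Gray-code ordering of the subsets; this simple version enumerates subsets directly and is a bit slower, but computes the same quantity.)

```python
import numpy as np
from itertools import permutations, combinations

def permanent_ryser(A):
    """Permanent via Ryser's formula:
    Per(A) = (-1)^n * sum over non-empty column subsets S of
             (-1)^{|S|} * prod_i sum_{j in S} A[i, j]."""
    n = A.shape[0]
    total = 0.0
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            rowsums = A[:, list(S)].sum(axis=1)
            total += (-1) ** size * np.prod(rowsums)
    return (-1) ** n * total

def permanent_naive(A):
    """Definition (11): sum over all permutations (O(n * n!))."""
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

A = np.arange(1.0, 10.0).reshape(3, 3)
assert np.isclose(permanent_ryser(A), permanent_naive(A))  # both give 450
```

Plugging the repeated-row submatrix $U_{\boldsymbol{s},\boldsymbol{t}}$ into either function and applying Eq. (26) gives exact Boson Sampling output probabilities for small $n$.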
However, polynomially precise estimates of the permanent of a square unitary matrix can be obtained in polynomial time in the size of the matrix, using an algorithm due to Gurvits [52], later generalised to matrices with repeated lines or columns [53]. Probability estimation can therefore be done classically efficiently, as was already noted in [4].

We now turn to the case $k > 0$—which to our knowledge has not been treated elsewhere—using the notations of section 3.1. This case is a direct extension of the case $k = 0$. With Eq. (12), for $r \in \mathbb{N}$, $\boldsymbol{p} \in \Phi_{k,r}$ and $\boldsymbol{s} \in \Phi_{m-k,n-r}$, the probability of a total outcome $(\boldsymbol{p}, \boldsymbol{s}) \in \Phi_{m,n}$ (adaptive measurements and final outcome) is given by

$$\Pr\nolimits^{\text{total}}_{m,n}[\boldsymbol{p}, \boldsymbol{s}] = \frac{1}{\boldsymbol{p}!\,\boldsymbol{s}!} \left| \mathrm{Per}\!\left( (U_{\boldsymbol{p}})_{(\boldsymbol{p},\boldsymbol{s}),\boldsymbol{t}} \right) \right|^2. \qquad (27)$$

Let $r \in \{0, \dots, n\}$ and let $\boldsymbol{s} \in \Phi_{m-k,n-r}$. Then, the probability of obtaining the final outcome $\boldsymbol{s}$ after the adaptive measurements reads

$$\Pr\nolimits^{\text{final}}_{m,n}[\boldsymbol{s}] = \sum_{\boldsymbol{p} \in \Phi_{k,r}} \Pr\nolimits^{\text{total}}_{m,n}[\boldsymbol{p}, \boldsymbol{s}] = \frac{1}{\boldsymbol{s}!} \sum_{\boldsymbol{p} \in \Phi_{k,r}} \frac{1}{\boldsymbol{p}!} \left| \mathrm{Per}\!\left( (U_{\boldsymbol{p}})_{(\boldsymbol{p},\boldsymbol{s}),\boldsymbol{t}} \right) \right|^2. \qquad (28)$$

The sum is taken over the elements of $\Phi_{k,r}$, which has $\binom{k+r-1}{r}$ elements, where $r \le n$ is the total number of photons detected during the adaptive measurements. Each permanent in the sum can be estimated additively with polynomial precision in time $O(\mathrm{poly}\,m)$, using the generalised algorithm from [53] (see Appendix B for details). Hence, probability estimation can be done classically efficiently whenever the sum has a polynomial number of terms. In particular, as long as both $k$ and $r$ are $O(\log m)$, the output probability can be estimated efficiently.
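The randomised estimator underlying Gurvits's algorithm (often called the Glynn estimator) admits a short sketch: for $x$ drawn uniformly from $\{-1, 1\}^n$, the random variable $\prod_i x_i \cdot \prod_j \left( \sum_i x_i A_{ij} \right)$ has expectation $\mathrm{Per}(A)$, and its fluctuations are bounded when $A$ has operator norm at most $1$ (e.g., a unitary submatrix), giving additive estimates in polynomial time. The implementation below is our illustration, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def gurvits_estimate(A, samples):
    """Monte Carlo permanent estimator: average of
    prod_i(x_i) * prod_j (sum_i x_i A[i, j]) over x uniform in {-1,1}^n.
    Unbiased: non-bijective terms vanish since E[x_i] = 0, x_i^2 = 1."""
    n = A.shape[0]
    X = rng.choice([-1.0, 1.0], size=(samples, n))
    vals = X.prod(axis=1) * (X @ A).prod(axis=1)
    return vals.mean()

# Check against the exact permanent of a small unitary (2x2 rotation).
theta = 0.3
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
exact = U[0, 0] * U[1, 1] + U[0, 1] * U[1, 0]  # = cos(2*theta)
est = gurvits_estimate(U, samples=200_000)
```

With $2 \times 10^5$ samples the additive error is of order $10^{-2}$ or better for matrices of norm at most $1$, independent of exponentially small multiplicative precision.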
The simulability regimes are summarised in Table 2. The universal quantum computing regime corresponds to $n = O(m)$ and $k = O(m)$ [31]. The time complexity of the classical simulation is $O\!\left( \binom{k+r-1}{r} \mathrm{poly}\,m \right)$, and there is a possibility of subuniversal quantum advantage for probability estimation for $n = O(\log m)$ and $k = O(m)$, or $n = O(m)$ and $k = O(\log m)$. Moreover, the fraction $\frac{r}{n}$ of input photons detected during the adaptive measurements has to be sufficiently large to prevent efficient classical simulation.

Table 2: Simulability regimes for probability estimation as a function of the parameters $r$ (the total number of photons detected during the adaptive measurements) and $k$ (the number of single-mode adaptive measurements). Cells marked "efficient" (green in the original) indicate the parameter regime for which the classical probability estimation is efficient, i.e., takes polynomial time in $m$, while cells marked "inefficient" (red in the original) indicate the regime where it is no longer efficient.

| | $k = O(1)$ | $k = O(\log m)$ | $k = O(m)$ |
|---|---|---|---|
| $r = O(1)$ | efficient | efficient | efficient |
| $r = O(\log m)$ | efficient | efficient | inefficient |
| $r = O(m)$ | efficient | inefficient | inefficient |

In this section, we obtain a classical algorithm for overlap estimation of output states of adaptive linear optical interferometers over $m$ modes with $n$ input photons and $k$ adaptive measurements. Once again, we start with $k = 0$. With Eq. (10), the output state of an $m$-mode interferometer $U$ with input state $\boldsymbol{t} \in \Phi_{m,n}$ reads

$$|\phi\rangle = \sum_{\boldsymbol{s} \in \Phi_{m,n}} \langle \boldsymbol{s} | \hat{U} | \boldsymbol{t} \rangle\, |\boldsymbol{s}\rangle = \sum_{\boldsymbol{s} \in \Phi_{m,n}} \frac{\mathrm{Per}(U_{\boldsymbol{s},\boldsymbol{t}})}{\sqrt{\boldsymbol{s}!\,\boldsymbol{t}!}}\, |\boldsymbol{s}\rangle, \qquad (29)$$

where $U_{\boldsymbol{s},\boldsymbol{t}}$ is the $n \times n$ matrix obtained from $U$ by repeating $s_i$ times its $i$-th row for $i \in \{1, \dots, m\}$ and repeating $t_j$ times its $j$-th column for $j \in \{1, \dots, m\}$.
The composition of two interferometers is another interferometer, whose unitary representation is the product of the unitary representations of the composed interferometers. Hence, the inner product of the output states $|\phi\rangle$ and $|\psi\rangle$ of two $m$-mode interferometers $U$ and $V$ with the same input state $\bm t\in\Phi_{m,n}$ is equal to the matrix element $(\bm t,\bm t)$ of $\hat U^\dagger\hat V$:
$$\langle\phi|\psi\rangle=\sum_{\bm u,\bm v\in\Phi_{m,n}}\langle\bm t|\hat U^\dagger|\bm u\rangle\langle\bm v|\hat V|\bm t\rangle\langle\bm u|\bm v\rangle=\sum_{\bm s\in\Phi_{m,n}}\langle\bm t|\hat U^\dagger|\bm s\rangle\langle\bm s|\hat V|\bm t\rangle=\langle\bm t|\hat U^\dagger\hat V|\bm t\rangle=\frac{\operatorname{Per}\left[(U^\dagger V)_{\bm t,\bm t}\right]}{\bm t!},\qquad(30)$$
where we used in the third line $\bm t\in\Phi_{m,n}$ and the fact that $\hat U^\dagger\hat V$ preserves the space spanned by $\Phi_{m,n}$. With the input $\bm t=(1^n,0^{m-n})$ with $n$ photons in $m$ modes, this reduces to
$$\langle\phi|\psi\rangle=\operatorname{Per}\left[(U^\dagger V)_n\right],\qquad(31)$$
where $(U^\dagger V)_n$ is the $n\times n$ top left submatrix of $U^\dagger V$. Hence, the inner product and the overlap may be approximated to a polynomial precision efficiently, since this is the case for the permanent [52, 53], as detailed in the previous section.

We now consider the case $k>0$. Let $r\in\mathbb N$ and let $\bm p\in\Phi_{k,r}$. With Eq. (10) and Eq. (15), writing $\Pr^{\text{adap}}_{m,n}[\bm p]$ the probability of obtaining the adaptive measurement outcome $\bm p$, the output state of the interferometer $U^{\bm p}$ with $k$ adaptive measurements with input $\bm t=(1^n,0^{m-n})$ in Fig. 1, when the adaptive measurement outcome $\bm p$ is obtained, reads
$$\frac{1}{\sqrt{\Pr^{\text{adap}}_{m,n}[\bm p]}}\,|\psi_{\bm p}\rangle,\qquad(32)$$
where
$$|\psi_{\bm p}\rangle:=\sum_{\bm s\in\Phi_{m-k,n-r}}\frac{\operatorname{Per}\left(U^{\bm p}_{(\bm p,\bm s),\bm t}\right)}{\sqrt{\bm p!\,\bm s!}}\,|\bm s\rangle,\qquad(33)$$
and where $\Pr^{\text{adap}}_{m,n}[\bm p]=\langle\psi_{\bm p}|\psi_{\bm p}\rangle$. The inner product of two (not normalised) output states $|\psi_{\bm p}\rangle$ and $|\psi_{\bm q}\rangle$ of $m$-mode interferometers $U^{\bm p}$ and $V^{\bm q}$ with $k$ adaptive measurements is zero if $|\bm p|\neq|\bm q|$, and when $|\bm p|=|\bm q|=r$ it is given by
$$\langle\psi_{\bm p}|\psi_{\bm q}\rangle=\frac{1}{\sqrt{\bm p!\,\bm q!}}\sum_{\bm s\in\Phi_{m-k,n-r}}\frac{1}{\bm s!}\operatorname{Per}\left(U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}\right)\operatorname{Per}\left(V^{\bm q}_{(\bm q,\bm s),\bm t}\right).\qquad(34)$$
This expression is a sum of $|\Phi_{m-k,n-r}|$ terms, which is exponential in $m$ whenever $n$ is not constant. Using properties of the permanent [54], we show that the expression in Eq. (34) may actually be rewritten as a sum over fewer terms, which constitutes our main technical result:

                 k = O(1)       k = O(log m)        k = O(m)
  n = O(1)       efficient      efficient           efficient
  n = O(log m)   efficient      efficient           efficient (...)
  n = O(m)       inefficient    inefficient (...)   inefficient (...)

Table 3: Simulability regimes for classical overlap estimation as a function of the parameters $n$ and $k$. "Efficient" marks the parameter regime for which the classical overlap estimation may be efficient (i.e., may take polynomial time in $m$) depending on the value of $r$, while "inefficient" marks the regime where the classical algorithm is no longer efficient. The cells containing the symbol "(...)" correspond to parameter regimes for which the quantum algorithm for estimating the overlap is no longer efficient.

Lemma 1.

Let $r\in\mathbb N$. The inner product of two (not normalised) output states $|\psi_{\bm p}\rangle$ and $|\psi_{\bm q}\rangle$ of $m$-mode interferometers $U^{\bm p}$ and $V^{\bm q}$ with adaptive measurement outcomes $\bm p,\bm q\in\Phi_{k,r}$ is given by
$$\langle\psi_{\bm p}|\psi_{\bm q}\rangle=\frac{1}{\sqrt{\bm p!\,\bm q!}}\sum_{\substack{\bm i,\bm j\in\{0,1\}^n\\|\bm i|=|\bm j|=r}}\operatorname{Per}\left(A^{\bm i}\right)\operatorname{Per}\left(B^{\bm j}\right)\operatorname{Per}\left(C^{\bm i,\bm j}\right),\qquad(35)$$
where for all $\bm i,\bm j\in\{0,1\}^n$ such that $|\bm i|=|\bm j|=r$,
$$A^{\bm i}=U^{\bm p\dagger}_{(\bm i,0^{m-n}),(\bm p,0^{m-k})}\qquad(36)$$
is an $r\times r$ matrix which can be obtained efficiently from $U^{\bm p}$,
$$B^{\bm j}=V^{\bm q}_{(\bm q,0^{m-k}),(\bm j,0^{m-n})}\qquad(37)$$
is an $r\times r$ matrix which can be obtained efficiently from $V^{\bm q}$, and
$$C^{\bm i,\bm j}=U^{\bm p\dagger}_{(\bm 1^n-\bm i,\,0^{m-n}),(0^k,\,1^{m-k})}\,V^{\bm q}_{(0^k,\,1^{m-k}),(\bm 1^n-\bm j,\,0^{m-n})}\qquad(38)$$
is an $(n-r)\times(n-r)$ matrix which can be obtained efficiently from $U^{\bm p}$ and $V^{\bm q}$.

We give a detailed proof in Appendix C. By Lemma 1, the inner product is expressed as a sum over $\binom{n}{r}^2$ products of three permanents, of square matrices of sizes $|\bm p|=r$, $|\bm q|=r$ and $n-r$, respectively. In the worst case, when $r=n/2$, the sum has at most $O(4^n)$ terms, up to a polynomial factor in $n$. In particular, when $n=O(\log m)$ or $r=O(1)$, the inner product reduces to a sum of a polynomial number of terms, which can all be computed in time $O(\operatorname{poly}m)$ with Ryser's formula [51]. An interesting fact is that the cost of computing the inner product does not depend explicitly on the number $k>0$ of adaptive measurements.

                 r = O(1)     r = O(log n)   r = O(n)
  n = O(1)       efficient    efficient      efficient
  n = O(log m)   efficient    efficient      efficient
  n = O(m)       efficient    inefficient    inefficient

Table 4: Simulability regimes for classical overlap estimation as a function of the parameters $n$ (total number of input photons) and $r$ (total number of photons detected in the adaptive measurements). The regimes are obtained from the running time of the classical algorithm using Stirling's equivalent $n!\sim\sqrt{2\pi n}\,(n/e)^n$; "efficient" marks polynomial time in $m$.
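The gain promised by Lemma 1 is purely combinatorial and easy to tabulate. The sketch below (function names and parameter values are arbitrary illustrative choices) compares the naive number of terms $|\Phi_{m-k,n-r}|$ in Eq. (34) with the $\binom{n}{r}^2$ terms of Lemma 1:

```python
from math import comb

def naive_terms(m, k, n, r):
    """|Phi_{m-k, n-r}|: number of Fock states of n-r photons in m-k modes,
    i.e. the number of terms in the direct sum of Eq. (34)."""
    return comb((m - k) + (n - r) - 1, n - r)

def lemma1_terms(n, r):
    """Number of pairs (i, j) of weight-r binary strings of length n
    in the rewritten sum of Eq. (35)."""
    return comb(n, r) ** 2

# example: m = 30 modes, k = 1 adaptive measurement, n = 6 photons, r = 3 detected
print(naive_terms(30, 1, 6, 3), lemma1_terms(6, 3))
```

For these values the naive sum has 4495 terms versus 400 for Lemma 1; more importantly, the first count grows with the number of modes $m$ while the second does not, matching the discussion above.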
However, it does depend explicitly on $r$, the total number of photons detected during the adaptive measurements, and a larger number of adaptive measurements $k$ favours the detection of a larger number of photons $r$.

The overlap of normalised output states is given by
$$\frac{|\langle\psi_{\bm p}|\psi_{\bm q}\rangle|^2}{\langle\psi_{\bm p}|\psi_{\bm p}\rangle\,\langle\psi_{\bm q}|\psi_{\bm q}\rangle},\qquad(39)$$
which may also be computed efficiently when $n=O(\log m)$. In this case, the classical algorithm for overlap estimation simply computes the above expression, using Lemma 1 for each of the inner products. The running time of this classical algorithm thus is $O\left(\binom{n}{r}^2\operatorname{poly}m\right)$ and its efficiency is summarised as a function of $n$ and $k$ in Table 3 and as a function of $n$ and $r$ in Table 4. Since the quantum efficient regime corresponds to $\binom{k+n}{n}=O(\operatorname{poly}m)$, there is a possibility of quantum advantage for overlap estimation when $k=O(1)$ and $n=O(m)$. However, as for probability estimation, the fraction $r/n$ of input photons detected during the adaptive measurements has to be sufficiently large to prevent efficient classical simulation. In this case, when the number of adaptive measurements satisfies $k=O(1)$, the interferometers used must concentrate many photons on the adaptive measurements.

In this work, we have given a roadmap for performing quantum variational classification and quantum kernel estimation using adaptive linear optical interferometers. We have investigated the transition of classical simulation between Boson Sampling [4] and the Knill–Laflamme–Milburn scheme for universal quantum computing [31], in terms of the number of adaptive measurements performed.
In particular, we have derived classical algorithms for simulating the quantum computational subroutines involved: output probability estimation and output state overlap estimation.

In the case of probability estimation, the possible regimes for quantum advantage are incompatible with near-term implementations: both the number of adaptive measurements $k$ and the number of input photons $n$ must be greater than $\log m$, where $m$ is the number of modes. On the other hand, for overlap estimation, there is a possibility of near-term computational advantage using adaptive linear optics with a single adaptive measurement, requiring the preparation of photon number states. Note that the interferometer should concentrate many photons $r$ at the stage of the adaptive measurements in order to obtain overlaps that are possibly hard to estimate. Using more adaptive measurements does not increase the complexity (apart from polynomial factors in $m$), but may ease the detection of a larger number of photons during the adaptive measurements.

Our results suggest regimes where quantum advantage for machine learning with adaptive linear optics is possible, namely the parameter regimes where our classical simulation algorithms fail to be efficient: it is an interesting open question whether better classical simulation algorithms for the computational subroutines involved can be found, or even classical algorithms solving the machine learning problems directly and efficiently. In any case, our results restrict the parameter regimes for which such a quantum advantage may be possible: a support vector machine with adaptive linear optics has to operate either in a bunching regime, concentrating many photons in the adaptive measurements, or using a number of adaptive measurements scaling with the size of the problem.
In both cases, this imposes strong experimental requirements.

Our results identify a sweet spot for quantum kernel estimation with adaptive linear optics using a single adaptive measurement, which would be interesting to demonstrate experimentally. In the longer term, variational classification with adaptive linear optics could also be interesting, since it may enable quantum advantage in a regime where the quantum algorithm for overlap estimation is no longer efficient. As previously mentioned, it is a pressing question whether more efficient classical algorithms may be derived. In practical settings, taking into account photon losses could help provide more efficient classical simulation algorithms [55]. We leave these considerations for future work.

Acknowledgements

AS has been supported by a KIAS individual grant (CG070301) at Korea Institute for Advanced Study. DM acknowledges support from the ANR through project ANR-17-CE24-0035 VanQuTe.

References

[1] R. P. Feynman, "Simulating physics with computers," Int. J. Theor. Phys. 21, 467–488 (1982).
[2] P. W. Shor, "Algorithms for quantum computation: discrete logarithms and factoring," in Proceedings 35th Annual Symposium on Foundations of Computer Science, pp. 124–134, IEEE, 1994.
[3] J. Preskill, "Quantum Computing in the NISQ era and beyond," Quantum 2, 79 (2018).
[4] S. Aaronson and A. Arkhipov, "The computational complexity of linear optics," Theory of Computing 9, 143 (2013).
[5] M. J. Bremner, R. Jozsa, and D. Shepherd, "Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy," Proc. R. Soc. A 467, 459 (2010).
[6] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., "Quantum supremacy using a programmable superconducting processor," Nature 574, 505–510 (2019).
[7] H.-S. Zhong, H. Wang, Y.-H. Deng, M.-C. Chen, L.-C. Peng, Y.-H. Luo, J. Qin, D. Wu, X. Ding, Y. Hu, et al., "Quantum computational advantage using photons," Science 370, 1460–1463 (2020).
[8] A. W. Harrow, A. Hassidim, and S. Lloyd, "Quantum algorithm for linear systems of equations," Physical Review Letters 103, 150502 (2009).
[9] M. Schuld, I. Sinayskiy, and F. Petruccione, "The quest for a quantum neural network," Quantum Information Processing 13, 2567–2586 (2014).
[10] J. Romero, J. P. Olson, and A. Aspuru-Guzik, "Quantum autoencoders for efficient compression of quantum data," Quantum Science and Technology 2, 045001 (2017).
[11] C. Ciliberto, M. Herbster, A. D. Ialongo, M. Pontil, A. Rocchetto, S. Severini, and L. Wossnig, "Quantum machine learning: a classical perspective," Proceedings of the Royal Society A 474, 20170551 (2018).
[12] V. Dunjko and H. J. Briegel, "Machine learning & artificial intelligence in the quantum domain: a review of recent progress," Reports on Progress in Physics 81, 074001 (2018).
[13] C. Zoufal, A. Lucchi, and S. Woerner, "Quantum generative adversarial networks for learning and loading random distributions," npj Quantum Information 5, 103 (2019).
[14] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and S. Lloyd, "Continuous-variable quantum neural networks," Phys. Rev. Research 1, 033063 (2019).
[15] E. Farhi and H. Neven, "Classification with quantum neural networks on near term processors," arXiv:1802.06002.
[16] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, "Training deep quantum neural networks," Nature Communications 11, 808 (2020).
[17] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner, "The power of quantum neural networks," arXiv:2011.00027.
[18] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, "Variational quantum algorithms," arXiv:2012.09265.
[19] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, "Supervised learning with quantum-enhanced feature spaces," Nature 567, 209 (2019).
[20] M. Schuld and N. Killoran, "Quantum machine learning in feature Hilbert spaces," Physical Review Letters 122, 040504 (2019).
[21] C. Blank, D. K. Park, J.-K. K. Rhee, and F. Petruccione, "Quantum classifier with tailored quantum kernel," npj Quantum Information 6, 41 (2020).
[22] K. Bartkiewicz, C. Gneiting, A. Černoch, K. Jiráková, K. Lemr, and F. Nori, "Experimental kernel-based quantum machine learning in finite feature space," Scientific Reports 10, 12356 (2020).
[23] Y. Liu, S. Arunachalam, and K. Temme, "A rigorous and robust quantum speed-up in supervised machine learning," arXiv:2010.02174.
[24] H.-Y. Huang, M. Broughton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean, "Power of data in quantum machine learning," arXiv:2011.01938.
[25] C. S. Hamilton, R. Kruse, L. Sansoni, S. Barkhofen, C. Silberhorn, and I. Jex, "Gaussian boson sampling," Physical Review Letters 119, 170501 (2017).
[26] M. Schuld, K. Brádler, R. Israel, D. Su, and B. Gupt, "Measuring the similarity of graphs with a Gaussian boson sampler," Physical Review A 101, 032314 (2020).
[27] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, "Barren plateaus in quantum neural network training landscapes," Nature Communications 9, 4812 (2018).
[28] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, "Cost-function-dependent barren plateaus in shallow quantum neural networks," arXiv:2001.00550.
[29] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, "Noise-induced barren plateaus in variational quantum algorithms," arXiv:2007.14384.
[30] H. J. Briegel, D. E. Browne, W. Dür, R. Raussendorf, and M. Van den Nest, "Measurement-based quantum computation," Nature Physics 5, 19–26 (2009).
[31] E. Knill, R. Laflamme, and G. J. Milburn, "A scheme for efficient quantum computation with linear optics," Nature 409, 46–52 (2001).
[32] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, "Quantum circuit learning," Phys. Rev. A 98, 032309 (2018).
[33] B. M. Terhal and D. P. DiVincenzo, "Classical simulation of noninteracting-fermion quantum circuits," Physical Review A 65, 032325 (2002).
[34] H. Pashayan, S. D. Bartlett, and D. Gross, "From estimation of quantum probabilities to simulation of quantum circuits," Quantum 4, 223 (2020).
[35] W. Hoeffding, "Probability inequalities for sums of bounded random variables," Journal of the American Statistical Association 58, 13–30 (1963).
[36] M. Van den Nest, "Classical simulation of quantum computation, the Gottesman–Knill theorem, and slightly beyond," Quantum Information & Computation 10, 258–271 (2010), arXiv:0811.0898.
[37] D. Dieks, "Overlap and distinguishability of quantum states," Physics Letters A 126, 303–306 (1988).
[38] M. Fanizza, M. Rosati, M. Skotiniotis, J. Calsamiglia, and V. Giovannetti, "Beyond the swap test: optimal estimation of quantum state overlap," Physical Review Letters 124, 060503 (2020).
[39] H. Buhrman, R. Cleve, J. Watrous, and R. de Wolf, "Quantum fingerprinting," Physical Review Letters 87, 167902 (2001).
[40] U. Leonhardt, Essential Quantum Optics. Cambridge University Press, Cambridge, UK, 1st ed., 2010.
[41] V. N. Vapnik, The Nature of Statistical Learning Theory. Springer-Verlag, Berlin, Heidelberg, 1995.
[42] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery 2, 121–167 (1998).
[43] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated, 2014.
[44] A. Politi, J. C. Matthews, M. G. Thompson, and J. L. O'Brien, "Integrated quantum photonics," IEEE Journal of Selected Topics in Quantum Electronics 15, 1673–1684 (2009).
[45] M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, "Experimental realization of any discrete unitary operator," Physical Review Letters 73, 58 (1994).
[46] A. E. Moylett and P. S. Turner, "Quantum simulation of partially distinguishable boson sampling," Phys. Rev. A 97, 062329 (2018).
[47] J. M. Arrazola, T. R. Bromley, J. Izaac, C. R. Myers, K. Brádler, and N. Killoran, "Machine learning method for state preparation and gate synthesis on photonic quantum computers," arXiv:1807.10781.
[48] K. Heya, Y. Suzuki, Y. Nakamura, and K. Fujii, "Variational quantum gate optimization," arXiv:1810.12745.
[49] H. Pashayan, J. J. Wallman, and S. D. Bartlett, "Estimating outcome probabilities of quantum circuits using quasiprobabilities," Physical Review Letters 115, 070501 (2015).
[50] S. Bravyi, D. Gosset, and R. Movassagh, "Classical algorithms for quantum mean values," arXiv:1909.11485.
[51] A. Nijenhuis and H. S. Wilf, Combinatorial Algorithms: For Computers and Calculators. Academic Press, 1978.
[52] L. Gurvits, "On the complexity of mixed discriminants and related problems," in International Symposium on Mathematical Foundations of Computer Science, pp. 447–458, Springer, 2005.
[53] S. Aaronson and T. Hance, "Generalizing and derandomizing Gurvits's approximation algorithm for the permanent," arXiv:1212.0025.
[54] J. K. Percus, Combinatorial Methods, vol. 4. Springer Science & Business Media, 2012.
[55] R. García-Patrón, J. J. Renema, and V. Shchesnovich, "Simulating boson sampling in lossy architectures," Quantum 3, 169 (2019).

Appendix A Classical probability estimation for Shor's period-finding algorithm

In this section, we detail the classical probability estimation for Shor's period-finding algorithm [2]. If $N$ is an $n$-bit integer to factor, the period-finding subroutine measures the output state
$$\frac{1}{N}\sum_{x=0}^{N-1}\sum_{y=0}^{N-1}e^{\frac{2i\pi xy}{N}}\,|y\rangle|f(x)\rangle\qquad(40)$$
in the computational basis, where $f$ is a periodic function over $\{0,\dots,N-1\}$ which can be evaluated efficiently. The probability of obtaining an outcome $(y,f(x_0))$ is given by
$$\Pr[y,f(x_0)]=\left|\frac{1}{N}\sum_{f(x)=f(x_0)}e^{\frac{2i\pi xy}{N}}\right|^2.\qquad(41)$$
Now let
$$g_{x_0,y}:x\mapsto\begin{cases}e^{\frac{2i\pi xy}{N}}&\text{if }f(x)=f(x_0),\\0&\text{otherwise.}\end{cases}\qquad(42)$$
The function $g_{x_0,y}$ can be evaluated efficiently and we have
$$\Pr[y,f(x_0)]=\left|\underset{x\leftarrow N}{\mathbb E}\,[g_{x_0,y}(x)]\right|^2,\qquad(43)$$
where $\mathbb E_{x\leftarrow N}$ denotes the expected value for $x$ drawn uniformly randomly from $\{0,\dots,N-1\}$. By virtue of Hoeffding's inequality, this quantity may be estimated efficiently (in $n$, the number of bits of $N$) classically, by sampling uniformly a polynomial number of values in $\{0,\dots,N-1\}$ and computing the modulus squared of the mean of $g_{x_0,y}$ over these values.

However, note that the decision version of probability estimation of quantum circuits is a BQP-complete problem almost by definition, since given a polynomially precise estimate of the probability of acceptance of an input $x$ to a quantum circuit, one may determine whether it is accepted or rejected by the circuit. In particular, unless factoring is in P, probability estimation for the quantum circuit corresponding to Shor's algorithm as a whole is hard for classical computers, and weak simulation of the period-finding subroutine is also hard, since in Shor's algorithm the output samples from the period-finding subroutine are used for a different classical computation than probability estimation, namely obtaining promising candidates for the period.

B Efficiency of classical output probability estimation

For an interferometer $U$ over $m$ modes with $n$ input photons and $k$ adaptive measurements with outcomes $\bm p\in\Phi_{k,r}$, the expression of the probability of an outcome $\bm s$ obtained in the main text reads:
$$\Pr^{\text{final}}_{m,n}[\bm s]=\frac{1}{\bm s!}\sum_{\bm p\in\Phi_{k,r}}\frac{1}{\bm p!}\left|\operatorname{Per}\left(U^{\bm p}_{(\bm p,\bm s),\bm t}\right)\right|^2,\qquad(44)$$
where $\bm t=(1^n,0^{m-n})$. This is a sum over $|\Phi_{k,r}|=\binom{k+r-1}{r}$ moduli squared of permanents of square matrices of size $n\le m$. Permanents can be approximated efficiently using a simple randomized algorithm due to Gurvits:

Lemma 2 ([52]). Let $A$ be an $m\times m$ matrix. Then, $\operatorname{Per}A$ may be estimated classically with additive precision $\pm\epsilon\|A\|^m$ with high probability, in time $O(m^2/\epsilon^2)$, where $\|A\|$ is the largest singular value of $A$.

This algorithm has been refined in the case of matrices with repeated rows or repeated columns:

Lemma 3 ([53]). Let $B$ be an $m\times n$ matrix. Let $\bm q=(q_1,\dots,q_m)\in\Phi_{m,n}$ and let $A=B_{\bm q,1^n}$ be the $n\times n$ matrix obtained from $B$ by repeating $q_i$ times its $i$th row. Then, $\operatorname{Per}A$ may be estimated classically with additive precision $\pm\epsilon\cdot\frac{q_1!\cdots q_m!}{\sqrt{q_1^{q_1}\cdots q_m^{q_m}}}\|B\|^n$ with high probability, in time $O(mn/\epsilon^2)$, where $\|B\|$ is the largest singular value of $B$. Moreover,
$$|\operatorname{Per}A|\le\frac{q_1!\cdots q_m!}{\sqrt{q_1^{q_1}\cdots q_m^{q_m}}}\,\|B\|^n.\qquad(45)$$

This lemma also allows one to approximate $|\operatorname{Per}A|^2$ efficiently. Indeed, let $0<\epsilon<1$ and let $z$ be an estimate of $\operatorname{Per}A$ with additive error $\pm\epsilon\cdot\frac{q_1!\cdots q_m!}{\sqrt{q_1^{q_1}\cdots q_m^{q_m}}}\|B\|^n$; then $|z|^2$ is a good estimate of $|\operatorname{Per}A|^2$:
$$\left||z|^2-|\operatorname{Per}A|^2\right|=\big||z|-|\operatorname{Per}A|\big|\cdot\big(|z|+|\operatorname{Per}A|\big)\le\epsilon\cdot\frac{q_1!\cdots q_m!}{\sqrt{q_1^{q_1}\cdots q_m^{q_m}}}\|B\|^n\cdot\left(\epsilon\cdot\frac{q_1!\cdots q_m!}{\sqrt{q_1^{q_1}\cdots q_m^{q_m}}}\|B\|^n+2\,|\operatorname{Per}A|\right)\le3\epsilon\cdot\frac{(q_1!\cdots q_m!)^2}{q_1^{q_1}\cdots q_m^{q_m}}\,\|B\|^{2n},\qquad(46)$$
where we used Eq. (45) in the last inequality.

In particular, when $B$ is an $m\times n$ submatrix of a unitary matrix (implying $\|B\|\le1$), we may compute an estimate of $|\operatorname{Per}A|^2$, where $A=B_{\bm q,1^n}$ is the $n\times n$ matrix obtained from $B$ by repeating $q_i$ times its $i$th row, with additive precision $\pm3\epsilon\cdot\frac{(q_1!\cdots q_m!)^2}{q_1^{q_1}\cdots q_m^{q_m}}$ with high probability, in time $O(mn/\epsilon^2)$. The matrices in Eq. (44) are submatrices of unitary matrices, with repeated rows. Hence, estimating independently all the terms in the sum in Eq. (44), we obtain, in time $O\left(\frac{mn}{\epsilon^2}\cdot|\Phi_{k,r}|\right)$ and with high probability, an estimate $\tilde P$ of the probability $\Pr^{\text{final}}_{m,n}[\bm s]$ such that
$$\left|\tilde P-\Pr^{\text{final}}_{m,n}[\bm s]\right|\le\frac{3\epsilon}{s_1!\cdots s_{m-k}!}\sum_{\bm p\in\Phi_{k,r}}\frac{1}{p_1!\cdots p_k!}\cdot\frac{(p_1!\cdots p_k!\,s_1!\cdots s_{m-k}!)^2}{p_1^{p_1}\cdots p_k^{p_k}\,s_1^{s_1}\cdots s_{m-k}^{s_{m-k}}}\le3\epsilon\cdot\frac{s_1!\cdots s_{m-k}!}{s_1^{s_1}\cdots s_{m-k}^{s_{m-k}}}\sum_{\bm p\in\Phi_{k,r}}\frac{p_1!\cdots p_k!}{p_1^{p_1}\cdots p_k^{p_k}}\le3\epsilon\cdot|\Phi_{k,r}|,\qquad(47)$$
where we used that for all $n\in\mathbb N$, $n!\le n^n$. Note that a tighter bound may be obtained, but the above one is sufficient for our needs. In particular, when $|\Phi_{k,r}|=O(\operatorname{poly}m)$, the above procedure provides a polynomially precise additive estimate of the probability $\Pr^{\text{final}}_{m,n}[\bm s]$ in time $O(\operatorname{poly}m)$, with high probability.

We have
$$|\Phi_{k,r}|=\binom{k+r-1}{r}=\frac{k}{k+r}\cdot\frac{(k+r)!}{k!\,r!}.\qquad(48)$$
This quantity is polynomial in $k$ (resp. in $r$) when $r=O(1)$ (resp. $k=O(1)$). Moreover,
$$\binom{k+r-1}{r}\le\sum_{j=0}^{k+r-1}\binom{k+r-1}{j}=2^{k+r-1},\qquad(49)$$
so for $k=O(\log m)$ and $r=O(\log m)$, we have $|\Phi_{k,r}|=O(\operatorname{poly}m)$. For all other cases, i.e. $k=O(\log m)$ and $r=O(m)$, or $k=O(m)$ and $r=O(\log m)$, or $k=O(m)$ and $r=O(m)$, $|\Phi_{k,r}|$ is superpolynomial in $m$, using Stirling's equivalent $n!\sim\sqrt{2\pi n}\,(n/e)^n$.

C Proof of Lemma 1

We recall Lemma 1 from the main text:

Lemma 1.

Let $r\in\mathbb N$. The inner product of two (not normalised) output states $|\psi_{\bm p}\rangle$ and $|\psi_{\bm q}\rangle$ of $m$-mode interferometers $U^{\bm p}$ and $V^{\bm q}$ with adaptive measurement outcomes $\bm p,\bm q\in\Phi_{k,r}$ is given by
$$\langle\psi_{\bm p}|\psi_{\bm q}\rangle=\frac{1}{\sqrt{\bm p!\,\bm q!}}\sum_{\substack{\bm i,\bm j\in\{0,1\}^n\\|\bm i|=|\bm j|=r}}\operatorname{Per}\left(A^{\bm i}\right)\operatorname{Per}\left(B^{\bm j}\right)\operatorname{Per}\left(C^{\bm i,\bm j}\right),\qquad(50)$$
where for all $\bm i,\bm j\in\{0,1\}^n$ such that $|\bm i|=|\bm j|=r$,
$$A^{\bm i}=U^{\bm p\dagger}_{(\bm i,0^{m-n}),(\bm p,0^{m-k})}\qquad(51)$$
is an $r\times r$ matrix which can be obtained efficiently from $U^{\bm p}$,
$$B^{\bm j}=V^{\bm q}_{(\bm q,0^{m-k}),(\bm j,0^{m-n})}\qquad(52)$$
is an $r\times r$ matrix which can be obtained efficiently from $V^{\bm q}$, and
$$C^{\bm i,\bm j}=U^{\bm p\dagger}_{(\bm 1^n-\bm i,\,0^{m-n}),(0^k,\,1^{m-k})}\,V^{\bm q}_{(0^k,\,1^{m-k}),(\bm 1^n-\bm j,\,0^{m-n})}\qquad(53)$$
is an $(n-r)\times(n-r)$ matrix which can be obtained efficiently from $U^{\bm p}$ and $V^{\bm q}$.

Proof. Consider the expression for the inner product obtained in Eq. (34):
$$\langle\psi_{\bm p}|\psi_{\bm q}\rangle=\frac{1}{\sqrt{\bm p!\,\bm q!}}\sum_{\bm s\in\Phi_{m-k,n-r}}\frac{1}{\bm s!}\operatorname{Per}\left(U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}\right)\operatorname{Per}\left(V^{\bm q}_{(\bm q,\bm s),\bm t}\right).\qquad(54)$$
It is reminiscent of the permanent composition formula [54]: for all $m,n,c\in\mathbb N^*$, all $s\in\mathbb N$, all $\bm u\in\Phi_{m,s}$ and all $\bm v\in\Phi_{n,s}$,
$$\operatorname{Per}\left[(MN)_{\bm u,\bm v}\right]=\sum_{\bm s\in\Phi_{c,s}}\frac{1}{\bm s!}\operatorname{Per}(M_{\bm u,\bm s})\operatorname{Per}(N_{\bm s,\bm v}),\qquad(55)$$
where $M$ is an $m\times c$ matrix and $N$ is a $c\times n$ matrix. However, this formula is not directly applicable to the expression in Eq. (54). In order to obtain a suitable expression, we first make use of the Laplace formula for the permanent: we expand the permanent of $U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}$ along the columns that are repeated according to $\bm p$, and we expand the permanent of $V^{\bm q}_{(\bm q,\bm s),\bm t}$ along the rows that are repeated according to $\bm q$. The generalised Laplace column expansion formula for the permanent reads: let $n\in\mathbb N^*$, let $W$ be an $n\times n$ matrix, and let $\bm j\in\{0,1\}^n$.
Then,
$$\operatorname{Per}(W)=\sum_{\substack{\bm i\in\{0,1\}^n\\|\bm i|=|\bm j|}}\operatorname{Per}(W_{\bm i,\bm j})\operatorname{Per}(W_{\bm 1^n-\bm i,\bm 1^n-\bm j}),\qquad(56)$$
where $W_{\bm i,\bm j}$ is the matrix obtained from $W$ by keeping only the $k$th rows and $l$th columns such that $i_k=1$ and $j_l=1$, respectively, and $W_{\bm 1^n-\bm i,\bm 1^n-\bm j}$ is the matrix obtained from $W$ by keeping only the $k$th rows and $l$th columns such that $i_k=0$ and $j_l=0$, respectively. This formula is obtained by applying the Laplace expansion formula for one column various times, for each column with index $l$ such that $j_l=1$, and the same formula holds for rows.

We first apply the general column expansion formula in Eq. (56) to the matrix $U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}$ with $\bm j=(1^r,0^{n-r})\in\{0,1\}^n$, obtaining
$$\operatorname{Per}\left(U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}\right)=\sum_{\substack{\bm i\in\{0,1\}^n\\|\bm i|=r}}\operatorname{Per}\left[\left(U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}\right)_{\bm i,\bm j}\right]\operatorname{Per}\left[\left(U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}\right)_{\bm 1^n-\bm i,\bm 1^n-\bm j}\right].\qquad(57)$$
Let us consider the matrix $\left(U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}\right)_{\bm i,\bm j}$ appearing in this last expression, for $\bm i\in\{0,1\}^n$. Its rows are obtained by keeping the first $n$ rows of $U^{\bm p\dagger}$, since $\bm t=(1^n,0^{m-n})$, then by keeping only the $l$th rows such that $i_l=1$. Its columns are obtained by repeating $p_l$ times the $l$th column for $l\in\{1,\dots,k\}$ and $s_l$ times for $l\in\{k+1,\dots,m\}$, then by only keeping the first $r$ columns, since $\bm j=(1^r,0^{n-r})$. However, since $|\bm p|=|\bm j|=r$, these are the columns repeated according to $\bm p$. Hence,
$$\left(U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}\right)_{\bm i,\bm j}=U^{\bm p\dagger}_{(\bm i,0^{m-n}),(\bm p,0^{m-k})},\qquad(58)$$
where $U^{\bm p\dagger}_{(\bm i,0^{m-n}),(\bm p,0^{m-k})}$ is the matrix obtained from $U^{\bm p\dagger}$ by keeping only the $l$th rows such that $i_l=1$ and removing the others, and by repeating $p_l$ times the $l$th column for $l\in\{1,\dots,k\}$ and removing the others.
Similarly, with $|\bm s|=|\bm 1^n-\bm j|=n-r$,
$$\left(U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}\right)_{\bm 1^n-\bm i,\bm 1^n-\bm j}=U^{\bm p\dagger}_{(\bm 1^n-\bm i,0^{m-n}),(0^k,\bm s)},\qquad(59)$$
where $U^{\bm p\dagger}_{(\bm 1^n-\bm i,0^{m-n}),(0^k,\bm s)}$ is the matrix obtained from $U^{\bm p\dagger}$ by keeping only the $l$th rows such that $i_l=0$ and removing the others, and by repeating $s_l$ times the $l$th column for $l\in\{k+1,\dots,m\}$ and removing the others. With Eqs. (57), (58) and (59) we obtain
$$\operatorname{Per}\left(U^{\bm p\dagger}_{\bm t,(\bm p,\bm s)}\right)=\sum_{\substack{\bm i\in\{0,1\}^n\\|\bm i|=r}}\operatorname{Per}\left(U^{\bm p\dagger}_{(\bm i,0^{m-n}),(\bm p,0^{m-k})}\right)\operatorname{Per}\left(U^{\bm p\dagger}_{(\bm 1^n-\bm i,0^{m-n}),(0^k,\bm s)}\right)=\sum_{\substack{\bm i\in\{0,1\}^n\\|\bm i|=r}}\operatorname{Per}\left(A^{\bm i}\right)\operatorname{Per}\left(U^{\bm p\dagger}_{(\bm 1^n-\bm i,0^{m-n}),(0^k,\bm s)}\right),\qquad(60)$$
where we have defined, for all $\bm i\in\{0,1\}^n$ such that $|\bm i|=r$,
$$A^{\bm i}:=U^{\bm p\dagger}_{(\bm i,0^{m-n}),(\bm p,0^{m-k})},\qquad(61)$$
which is an $r\times r$ matrix independent of $\bm s$ that can be obtained efficiently from $U^{\bm p}$.

The same reasoning with the general row expansion formula for the matrix $V^{\bm q}_{(\bm q,\bm s),\bm t}$ and the rows $\bm i=(1^r,0^{n-r})$ gives
$$\operatorname{Per}\left(V^{\bm q}_{(\bm q,\bm s),\bm t}\right)=\sum_{\substack{\bm j\in\{0,1\}^n\\|\bm j|=r}}\operatorname{Per}\left[\left(V^{\bm q}_{(\bm q,\bm s),\bm t}\right)_{\bm i,\bm j}\right]\operatorname{Per}\left[\left(V^{\bm q}_{(\bm q,\bm s),\bm t}\right)_{\bm 1^n-\bm i,\bm 1^n-\bm j}\right]=\sum_{\substack{\bm j\in\{0,1\}^n\\|\bm j|=r}}\operatorname{Per}\left(V^{\bm q}_{(\bm q,0^{m-k}),(\bm j,0^{m-n})}\right)\operatorname{Per}\left(V^{\bm q}_{(0^k,\bm s),(\bm 1^n-\bm j,0^{m-n})}\right),\qquad(62)$$
where $V^{\bm q}_{(\bm q,0^{m-k}),(\bm j,0^{m-n})}$ is the matrix obtained from $V^{\bm q}$ by repeating $q_l$ times the $l$th row for $l\in\{1,\dots,k\}$ and removing the others, and by keeping only the $l$th columns such that $j_l=1$, and where $V^{\bm q}_{(0^k,\bm s),(\bm 1^n-\bm j,0^{m-n})}$ is the matrix obtained from $V^{\bm q}$ by repeating $s_l$ times the $l$th row for $l\in\{k+1,\dots,m\}$ and removing the others, and by keeping only the $l$th columns such that $j_l=0$. Defining, for all $\bm j\in\{0,1\}^n$ such that $|\bm j|=r$,
$$B^{\bm j}:=V^{\bm q}_{(\bm q,0^{m-k}),(\bm j,0^{m-n})},\qquad(63)$$
the expression in Eq. (62) rewrites
$$\operatorname{Per}\left(V^{\bm q}_{(\bm q,\bm s),\bm t}\right)=\sum_{\substack{\bm j\in\{0,1\}^n\\|\bm j|=r}}\operatorname{Per}\left(B^{\bm j}\right)\operatorname{Per}\left(V^{\bm q}_{(0^k,\bm s),(\bm 1^n-\bm j,0^{m-n})}\right),\qquad(64)$$
where the $B^{\bm j}$ are $r\times r$ matrices independent of $\bm s$ that can be obtained efficiently from $V^{\bm q}$.

Plugging Eqs. (60) and (64) in Eq. (54) we obtain
$$\langle\psi_{\bm p}|\psi_{\bm q}\rangle=\frac{1}{\sqrt{\bm p!\,\bm q!}}\sum_{\substack{\bm i,\bm j\in\{0,1\}^n\\|\bm i|=|\bm j|=r}}\left[\operatorname{Per}\left(A^{\bm i}\right)\operatorname{Per}\left(B^{\bm j}\right)\sum_{\bm s\in\Phi_{m-k,n-r}}\frac{1}{\bm s!}\operatorname{Per}\left(U^{\bm p\dagger}_{(\bm 1^n-\bm i,0^{m-n}),(0^k,\bm s)}\right)\operatorname{Per}\left(V^{\bm q}_{(0^k,\bm s),(\bm 1^n-\bm j,0^{m-n})}\right)\right].\qquad(65)$$
The inner sum may now be expressed as a single permanent using the permanent composition formula: for all $\bm i,\bm j\in\{0,1\}^n$ such that $|\bm i|=|\bm j|=r$, let us define the $(n-r)\times(m-k)$ matrix
$$\tilde U^{\bm p,\bm i}:=U^{\bm p\dagger}_{(\bm 1^n-\bm i,0^{m-n}),(0^k,1^{m-k})},\qquad(66)$$
and the $(m-k)\times(n-r)$ matrix
$$\tilde V^{\bm q,\bm j}:=V^{\bm q}_{(0^k,1^{m-k}),(\bm 1^n-\bm j,0^{m-n})},\qquad(67)$$
so that
$$U^{\bm p\dagger}_{(\bm 1^n-\bm i,0^{m-n}),(0^k,\bm s)}=\tilde U^{\bm p,\bm i}_{1^{n-r},\bm s}\quad\text{and}\quad V^{\bm q}_{(0^k,\bm s),(\bm 1^n-\bm j,0^{m-n})}=\tilde V^{\bm q,\bm j}_{\bm s,1^{n-r}}.\qquad(68)$$
With the permanent composition formula in Eq. (55) we obtain
$$\sum_{\bm s\in\Phi_{m-k,n-r}}\frac{1}{\bm s!}\operatorname{Per}\left(U^{\bm p\dagger}_{(\bm 1^n-\bm i,0^{m-n}),(0^k,\bm s)}\right)\operatorname{Per}\left(V^{\bm q}_{(0^k,\bm s),(\bm 1^n-\bm j,0^{m-n})}\right)=\operatorname{Per}\left[\left(\tilde U^{\bm p,\bm i}\tilde V^{\bm q,\bm j}\right)_{1^{n-r},1^{n-r}}\right].\qquad(69)$$
Since $\tilde U^{\bm p,\bm i}\tilde V^{\bm q,\bm j}$ is an $(n-r)\times(n-r)$ matrix, we thus have
$$\sum_{\bm s\in\Phi_{m-k,n-r}}\frac{1}{\bm s!}\operatorname{Per}\left(U^{\bm p\dagger}_{(\bm 1^n-\bm i,0^{m-n}),(0^k,\bm s)}\right)\operatorname{Per}\left(V^{\bm q}_{(0^k,\bm s),(\bm 1^n-\bm j,0^{m-n})}\right)=\operatorname{Per}\left(\tilde U^{\bm p,\bm i}\tilde V^{\bm q,\bm j}\right).\qquad(70)$$
Then, Eq. (65) rewrites
$$\langle\psi_{\bm p}|\psi_{\bm q}\rangle=\frac{1}{\sqrt{\bm p!\,\bm q!}}\sum_{\substack{\bm i,\bm j\in\{0,1\}^n\\|\bm i|=|\bm j|=r}}\operatorname{Per}\left(A^{\bm i}\right)\operatorname{Per}\left(B^{\bm j}\right)\operatorname{Per}\left(C^{\bm i,\bm j}\right),\qquad(71)$$
where we have defined
$$C^{\bm i,\bm j}:=\tilde U^{\bm p,\bm i}\tilde V^{\bm q,\bm j}=U^{\bm p\dagger}_{(\bm 1^n-\bm i,0^{m-n}),(0^k,1^{m-k})}\,V^{\bm q}_{(0^k,1^{m-k}),(\bm 1^n-\bm j,0^{m-n})},\qquad(72)$$
which is an $(n-r)\times(n-r)$ matrix that can be computed efficiently from $U^{\bm p}$ and $V^{\bm q}$.
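Both ingredients of the proof, the permanent composition formula of Eq. (55) and the generalised Laplace expansion of Eq. (56), can be checked numerically on small random matrices. A sketch (the matrix sizes, photon numbers, and helper names are arbitrary small choices):

```python
import itertools
from math import factorial, prod
import numpy as np

def permanent(A):
    """Exact permanent via Ryser's formula."""
    n = A.shape[0]
    if n == 0:
        return 1.0
    total = 0.0
    for size in range(1, n + 1):
        for cols in itertools.combinations(range(n), size):
            total += (-1) ** size * np.prod(A[:, list(cols)].sum(axis=1))
    return (-1) ** n * total

def fock_states(m, n):
    """All m-tuples of non-negative integers summing to n (the set Phi_{m,n})."""
    if m == 1:
        yield (n,)
        return
    for head in range(n + 1):
        for rest in fock_states(m - 1, n - head):
            yield (head,) + rest

def repeat_rows_cols(M, u, v):
    """M_{u,v}: repeat u_i times row i and v_j times column j."""
    rows = [i for i, ui in enumerate(u) for _ in range(ui)]
    cols = [j for j, vj in enumerate(v) for _ in range(vj)]
    return M[np.ix_(rows, cols)]

rng = np.random.default_rng(3)

# composition formula, Eq. (55): M is m x c, N is c x n, u in Phi_{m,s}, v in Phi_{n,s}
c, s_tot = 3, 2
M, N = rng.standard_normal((2, c)), rng.standard_normal((c, 2))
u, v = (1, 1), (2, 0)
lhs = permanent(repeat_rows_cols(M @ N, u, v))
rhs = sum(permanent(repeat_rows_cols(M, u, s)) * permanent(repeat_rows_cols(N, s, v))
          / prod(factorial(x) for x in s)
          for s in fock_states(c, s_tot))
assert abs(lhs - rhs) < 1e-9

# Laplace expansion, Eq. (56), along the first two columns of a 4 x 4 matrix
W = rng.standard_normal((4, 4))
j = (1, 1, 0, 0)  # expand along the columns with j_l = 1, so |i| must equal |j| = 2

def keep(Wm, ii, jj):
    """Submatrix keeping rows/columns flagged by the binary vectors ii and jj."""
    return Wm[np.ix_([a for a, x in enumerate(ii) if x],
                     [b for b, x in enumerate(jj) if x])]

total = sum(permanent(keep(W, i, j))
            * permanent(keep(W, tuple(1 - x for x in i), tuple(1 - x for x in j)))
            for i in itertools.product((0, 1), repeat=4) if sum(i) == 2)
assert abs(total - permanent(W)) < 1e-9
```

The first check is the permanent analogue of the Cauchy–Binet identity used in Eqs. (69)–(70); the second is the row/column bookkeeping behind Eqs. (57) and (62).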