Effects of quantum resources on the statistical complexity of quantum circuits

Abstract

We investigate how the addition of quantum resources changes the statistical complexity of quantum circuits by utilizing the framework of quantum resource theories. Measures of statistical complexity that we consider include the Rademacher complexity and the Gaussian complexity, which are well-known measures in computational learning theory that quantify the richness of classes of real-valued functions. We derive bounds for the statistical complexities of quantum circuits that have limited access to certain resources and apply our results to two special cases: (1) stabilizer circuits that are supplemented with a limited number of T gates and (2) instantaneous quantum polynomial-time Clifford circuits that are supplemented with a limited number of CCZ gates. We show that the increase in the statistical complexity of a quantum circuit when an additional quantum channel is added to it is upper bounded by the free robustness of the added channel. Finally, we derive bounds for the generalization error associated with learning from training data arising from quantum circuits.



Kaifeng Bu,1,∗ Dax Enshan Koh,2,† Lu Li,3,4 Qingxian Luo,4,5 and Yaobo Zhang6,7

1 Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA
2 Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way
3 Department of Mathematics, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
4 School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang 310027, China
5 Center for Data Science, Zhejiang University, Hangzhou, Zhejiang 310027, China
6 Zhejiang Institute of Modern Physics, Zhejiang University, Hangzhou, Zhejiang 310027, China
7 Department of Physics, Zhejiang University, Hangzhou, Zhejiang 310027, China


I. INTRODUCTION

Quantum machine learning, which aims to harness the power of quantum computing to perform machine learning tasks, has attracted considerable interest in recent years [1–5]. This interest is accompanied by the hope that quantum algorithms can outperform their classical counterparts at solving certain machine learning problems. This hope is fuelled, in part, both by the observation that quantum computers are capable of efficiently producing patterns in data that classical computers are believed to not be able to produce efficiently [6–9] and by the proposal of quantum algorithms with a provable exponential speedup over known classical algorithms that may be adapted for use as subroutines in certain machine learning algorithms. An example of such an algorithm is the Harrow-Hassidim-Lloyd (HHL) algorithm [10] for solving linear systems of equations, which has been applied to various machine learning problems, such as recommendation systems [11], support vector machines [12], and principal component analysis [13].

Central to many quantum machine learning algorithms is the need to train quantum variational circuits to perform certain tasks. These circuits are the central building block used in variational quantum algorithms, which have been described as a leading candidate for achieving a practical quantum advantage using noisy intermediate-scale quantum (NISQ [14, 15]) devices [16]. Examples of variational quantum algorithms include the variational quantum eigensolver (VQE) for quantum chemistry [17–19], the quantum approximate optimization algorithm (QAOA) for optimization [20], and the quantum neural network (QNN) that generalizes the classical neural network [21–26].

While quantum circuits are believed to provide an advantage over their classical counterparts, not all of them are capable of doing so. There are several well-known restricted classes of quantum circuits that can be shown to be efficiently simulable by a classical computer. These include the stabilizer circuits [27], the matchgate circuits [28, 29], as well as these circuits augmented with various supplementary resources [30–34]. These classical simulation results show that if one hopes to outperform classical algorithms at a given machine learning task, it is necessary to utilize resources outside these classically simulable restricted classes of quantum circuits.

A key step in both classical and quantum machine learning is the building of learning models based on training data. Here, the power of a learning model depends on its statistical complexity, i.e., its ability to fit functions, which has been quantified by various measures. These measures include the Vapnik-Chervonenkis (VC) dimension [35, 36] (which has been used to determine the sample complexity of PAC learning [37] and classical neural networks [38]), the metric entropy (also known as covering number) [39], the Rademacher complexity [40] (which has been studied in the context of classical neural networks [41–44]), and the Gaussian complexity [40].

In building learning models using quantum circuits, various quantum resources, or quantum effects, are typically at play. These include magic [45–47], entanglement [48, 49], and coherence [50–52]. But how do changes in the amounts of these quantum resources affect the statistical complexity of these quantum-circuit-based learning models? In recent work [53], we partially addressed this question by focusing on a specific resource, namely the resource of magic. In particular, we utilized the $(p, q)$ group norm to quantify the amount of magic in quantum circuits and showed how the statistical complexity of the quantum circuit scales with the depth and width of the circuit and the amount of magic it contains. In this work, we extend our previous results and address the above question more generally by considering the Rademacher and Gaussian complexities as measures of model complexity and utilizing the framework of general resource theories [54, 55], which offers a powerful paradigm for the quantification and operational interpretation of quantum effects [46].

In a quantum resource theory, quantum channels are categorized as being either free channels or resource channels. Free channels are those that are available or inexpensive, and resource channels are those that are limited or expensive to use. In this work, we consider quantum-circuit-based learning models in the following two contexts: (1) quantum circuits with access to only a restricted set of channels $\mathcal{O}$, and (2) quantum circuits with access to a restricted set of channels $\mathcal{O}$ together with an additional resource channel $\Psi \notin \mathcal{O}$ (for example, we could take $\mathcal{O}$ to be the set of stabilizer circuits and $\Psi$ to be the $T$ gate). We show that by adding a resource channel to a set of free channels, the Rademacher and Gaussian complexities are increased by an amount that is bounded by the free robustness of the resource channel multiplied by the number of times the channel is used. Using this result, we derive an upper bound on the generalization error associated with learning from the training data arising from such circuits.

II. PRELIMINARIES

A. Quantum-generated function classes

Consider an $n$-qubit quantum circuit that implements a quantum channel $\Phi$. For example, $\Phi = \Phi(\theta)$ could represent a parametrized quantum circuit with gates parametrized by the parameters $\theta \in \mathbb{R}^{\alpha}$ (for an example, see Fig. 1). Let $\vec{x} \in \mathbb{F}_2^n$ be an $n$-bit input string (for example, $\vec{x}$ could be the binary representation of a collection of pixel values of an image of a handwritten digit). By Born's rule, if we feed the computational basis state $|\vec{x}\rangle$ into the circuit $\Phi$ and make a measurement in the computational basis, the probability of measuring $\vec{y} \in \mathbb{F}_2^n$ is given by

$$p_{\Phi,\vec{x}}(\vec{y}) = f_\Phi(\vec{x},\vec{y}) := \mathrm{Tr}\left[\Phi(|\vec{x}\rangle\langle\vec{x}|)\,|\vec{y}\rangle\langle\vec{y}|\right], \tag{1}$$

where $f_\Phi : \mathbb{F}_2^n \times \mathbb{F}_2^n \to [0, 1]$ is a real-valued function induced by the channel $\Phi$ that maps input-output pairs $(\vec{x},\vec{y})$ to probability values. Let $\Omega$ be a set of quantum channels. We define the function class $\mathcal{F}(\Omega)$ as follows:

$$\mathcal{F}(\Omega) = \{ f_\Phi \mid \Phi \in \Omega \}. \tag{2}$$

B. Statistical complexity
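Before the complexity measures are introduced, it may help to see how the functions $f_\Phi$ of Eq. (1) can be evaluated in the simplest setting. The following is a minimal sketch that assumes the channel is unitary, $\Phi(\rho) = U\rho U^\dagger$, so that Eq. (1) reduces to $|\langle \vec{y}|U|\vec{x}\rangle|^2$. The helper names `basis_index` and `f_phi`, and the choice of the one-qubit Hadamard gate, are illustrative assumptions and do not appear in the paper.

```python
import math

def basis_index(bits):
    """Index of the computational basis state |bits>, most significant bit first."""
    index = 0
    for b in bits:
        index = 2 * index + b
    return index

def f_phi(unitary, x, y):
    """Eq. (1) for a unitary channel: |<y|U|x>|^2, read off from entry (y, x) of U."""
    amplitude = unitary[basis_index(y)][basis_index(x)]
    return abs(amplitude) ** 2

# Hypothetical example: the one-qubit Hadamard gate stored as a nested list.
s = 1 / math.sqrt(2)
HADAMARD = [[s, s], [s, -s]]

# Output distribution for the input string x = (0,): both outcomes are equally likely.
probs = [f_phi(HADAMARD, (0,), (y,)) for y in (0, 1)]
```

For a fixed input $\vec{x}$, the values $f_\Phi(\vec{x},\vec{y})$ sum to one over $\vec{y}$, which is a quick sanity check on any such implementation.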

We now introduce the Rademacher and Gaussian complexities [40], which quantify the richness of sets of real-valued functions and can be used to provide bounds for the generalization error associated with learning from training data. Let $G$ be a set of real-valued functions and let $S = (z_1, \ldots, z_m) \in \mathbb{R}^m$ be a set of $m$ samples. The (empirical) Rademacher complexity of $G$ with respect to $S$ is

$$\hat{R}_S(G) = \mathbb{E}_{\varepsilon_1,\ldots,\varepsilon_m \sim \mathrm{Rad}} \left[ \sup_{g \in G} \frac{1}{m} \sum_{i=1}^{m} \varepsilon_i g(z_i) \right], \tag{3}$$

where the expectation is taken over i.i.d. Rademacher random variables, i.e., $\varepsilon_i \sim \mathrm{Rad}$ for each $i \in \{1, \ldots, m\}$. Recall that the Rademacher random variable $X$ has probability mass function

$$\Pr(X = k) = \begin{cases} 1/2, & k \in \{-1, 1\}, \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$

Similarly, the (empirical) Gaussian complexity of $G$ with respect to $S$ is

$$\hat{G}_S(G) = \mathbb{E}_{\varepsilon_1,\ldots,\varepsilon_m \sim N(0,1)} \left[ \sup_{g \in G} \frac{1}{m} \sum_{i=1}^{m} \varepsilon_i g(z_i) \right], \tag{5}$$

where the expectation is taken over i.i.d. Gaussian random variables with zero mean and unit variance, i.e., $\varepsilon_i \sim N(0, 1)$ for each $i \in \{1, \ldots, m\}$.

Note that the empirical Rademacher and Gaussian complexities depend on the samples $S = (z_1, \ldots, z_m)$. By averaging over samples $S$ taken from a product distribution $D^m$, we obtain the expected Rademacher and Gaussian complexities:

$$R_D(G) = \mathbb{E}_{S \sim D^m} \left[ \hat{R}_S(G) \right], \tag{6}$$

$$G_D(G) = \mathbb{E}_{S \sim D^m} \left[ \hat{G}_S(G) \right]. \tag{7}$$

In the rest of the main text, we will focus on the Rademacher complexity; similar results hold for the Gaussian complexity, which we relegate to Appendix E.

FIG. 1. An example of a parametrized quantum circuit with parametrized gates.

Footnote: Subsequently, we will identify the circuit with the channel it implements and denote both by $\Phi$.

C. Statistical complexity in the quantum resource theory framework
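Before turning to resource theories, the empirical Rademacher complexity of Eq. (3) can be made concrete for a finite function class. The sketch below computes the expectation exactly by enumerating all $2^m$ sign vectors, so it is only practical for small $m$. The function name `empirical_rademacher` and the toy class are illustrative assumptions, not part of the paper.

```python
from itertools import product

def empirical_rademacher(function_values):
    """Exact empirical Rademacher complexity of a finite function class.

    function_values[j][i] holds g_j(z_i). The expectation in Eq. (3) is
    evaluated by summing over all 2^m sign vectors and dividing by 2^m.
    """
    m = len(function_values[0])
    total = 0.0
    for signs in product((-1, 1), repeat=m):
        # Supremum over the class of the signed empirical average.
        total += max(
            sum(e * v for e, v in zip(signs, values)) / m
            for values in function_values
        )
    return total / 2 ** m

# Hypothetical toy class {g, -g} with g(z_1) = g(z_2) = 1.
toy_class = [[1.0, 1.0], [-1.0, -1.0]]
toy_value = empirical_rademacher(toy_class)  # equals 0.5 for this toy class

# A class containing a single function has zero Rademacher complexity.
single_value = empirical_rademacher([[0.3, 0.7]])
```

Replacing the enumeration by an average over randomly drawn sign vectors gives a Monte Carlo estimate when $m$ is too large to enumerate; replacing the signs by standard normal draws gives the analogous estimate of the Gaussian complexity in Eq. (5).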

Quantum resource theories are characterized by a restricted set of channels, called free channels, which map free states to free states; any channel that is not a free channel is called a resource channel. Let $\mathcal{O}$ be a set of $n$-qubit free channels and let $\Psi \notin \mathcal{O}$ be an $n$-qubit resource channel. Define $\mathcal{O}_\Psi := \mathcal{O} \cup \{\Psi\}$ to be the class of channels formed by appending $\Psi$ to $\mathcal{O}$. In addition, to take into account the case where the resource channel is used more than once, for each $k \in \mathbb{Z}^+$, define

$$\mathcal{O}_\Psi^{(k)} = \left\{ \prod_{i=1}^{l} \Phi_i \;\middle|\; l = \mathrm{poly}(n);\ \Phi_i \in \mathcal{O}_\Psi\ \forall i \in \{1, \ldots, l\};\ \text{and at most } k \text{ of the } \Phi_i\text{'s are } \Psi \right\}. \tag{8}$$

It is easy to see that the above sets form a nested hierarchy

$$\mathcal{O} \subset \mathcal{O}_\Psi \subset \mathcal{O}_\Psi^{(1)} \subset \mathcal{O}_\Psi^{(2)} \subset \ldots \subset \mathcal{O}_\Psi^{(k)} \subset \mathcal{O}_\Psi^{(k+1)} \subset \ldots \tag{9}$$

In this work, we will be interested in the statistical complexities of the function classes $\mathcal{F}(\Omega)$ formed by taking $\Omega$ to be the sets in the nested hierarchy in Eq. (9), where $\mathcal{F}(\cdot)$ is given by Eq. (2).

III. RESULTS

A. Statistical complexity bounds

We first consider the Rademacher complexity of $\mathcal{F}(\mathcal{O}_\Psi)$.

Theorem 1. Given $m$ independent samples $S = (\vec{z}_1, \ldots, \vec{z}_m)$ and a resource channel $\Psi$, the Rademacher complexity of $\mathcal{F}(\mathcal{O}_\Psi)$ is bounded as follows:

$$\hat{R}_S(\mathcal{F}(\mathcal{O})) \le \hat{R}_S(\mathcal{F}(\mathcal{O}_\Psi)) \le (1 + 2\gamma(\Psi))\, \hat{R}_S(\mathcal{F}(\mathcal{O})), \tag{10}$$

where $\gamma(\Psi)$ is the free robustness of $\Psi$ with respect to the set $\mathcal{O}$, defined as

$$\gamma(\Psi) := \min \left\{ \lambda \;\middle|\; \exists\, \Phi \in \mathrm{Conv}(\mathcal{O}) : \frac{\Psi + \lambda \Phi}{1 + \lambda} \in \mathrm{Conv}(\mathcal{O}) \right\}.$$

Therefore, for any probability distribution $D$ on the sample space, if each sample $\vec{z}_i$ is chosen independently according to $D$ for $i = 1, \ldots, m$, we have

$$R_D(\mathcal{F}(\mathcal{O})) \le R_D(\mathcal{F}(\mathcal{O}_\Psi)) \le (1 + 2\gamma(\Psi))\, R_D(\mathcal{F}(\mathcal{O})). \tag{11}$$

The proof of Theorem 1 is presented in Appendix B. Theorem 1 tells us that with access to the resource channel, the Rademacher complexity grows by at most a multiplicative factor determined by the free robustness of the channel.

Next, let us consider the case where the resource channel can be used multiple times. In this case, the relevant function class is that defined by Eq. (8). By Eq. (9), the following relationship follows immediately:

$$\hat{R}_S(\mathcal{F}(\mathcal{O}_\Psi^{(k)})) \le \hat{R}_S(\mathcal{F}(\mathcal{O}_\Psi^{(k+1)})). \tag{12}$$

Theorem 2. Given $m$ independent samples $S = (\vec{z}_1, \ldots, \vec{z}_m)$ and a resource channel $\Psi$, we have the following bound:

$$\hat{R}_S(\mathcal{F}(\mathcal{O}_\Psi^{(k)})) \le \gamma^* \hat{R}_S(\mathcal{F}(\mathcal{O})), \tag{13}$$

where $\gamma^* = \min\{1 + 2\gamma_{\max,n},\ (1 + 2\gamma(\Psi))^k\}$, and $\gamma_{\max,n}$ is the maximal free robustness over quantum channels on $n$ qubits. Therefore, for any probability distribution $D$ on the sample space, with each sample $\vec{z}_i$ chosen independently according to $D$ for $i = 1, \ldots, m$, we have

$$R_D(\mathcal{F}(\mathcal{O}_\Psi^{(k)})) \le \gamma^* R_D(\mathcal{F}(\mathcal{O})). \tag{14}$$

The proof of Theorem 2 is presented in Appendix C. Theorem 2 tells us that the Rademacher complexity for the case where we have access to multiple copies of a resource channel has an upper bound that depends on the Rademacher complexity for the case where there is no resource channel, the free robustness of the resource channel $\Psi$, and the number of times $\Psi$ is used.

Now, let us give some examples to illustrate our results.

Example 1: Consider quantum circuits whose gates all belong to the Clifford group, and denote the set of Clifford channels associated with such circuits by $\mathcal{STAB}$. Of interest to us is the Rademacher complexity of $\mathcal{F}(\mathcal{STAB})$ with respect to $m$ independent samples $S = (\vec{z}_i)_{i=1}^{m}$, denoted by $\hat{R}_S(\mathcal{F}(\mathcal{STAB}))$. As we shall show in Appendix D, we get the following bound for stabilizer circuits:

$$\hat{R}_S(\mathcal{F}(\mathcal{STAB})) \le (1 + o(1)) \frac{n}{\sqrt{m}} \max_{\Phi \in \mathcal{STAB}} \left\| \vec{f}_\Phi \right\|_\infty, \tag{15}$$

where $\vec{f}_\Phi = (f_\Phi(\vec{z}_i))_{i=1}^{m}$. Now, while such circuits can be efficiently simulated on a classical computer, by the Gottesman-Knill theorem [27], circuits formed from the Clifford+$T$ universal gate set, where $T = \mathrm{diag}[1, e^{i\pi/4}]$, are believed to preclude efficient classical simulation [56, 57]. This motivates us to consider quantum circuits consisting of both Clifford gates and the $T$ gate. We define the set $\mathcal{STAB}_T^{(k)}$ to be the set of quantum channels formed from Clifford unitaries and at most $k$ $T$ gates. As we shall show in Appendix D, the following upper bound holds for the Rademacher complexity of $\mathcal{STAB}_T^{(k)}$:

$$\hat{R}_S(\mathcal{F}(\mathcal{STAB}_T^{(k)})) \le \left(1 + \sqrt{2}\right)^k \hat{R}_S(\mathcal{F}(\mathcal{STAB})) \le O\!\left( \left(1 + \sqrt{2}\right)^k \frac{n}{\sqrt{m}} \right) \max_{\Phi \in \mathcal{STAB}} \left\| \vec{f}_\Phi \right\|_\infty, \tag{16}$$

where we used the fact that the free robustness of the $T$ gate is upper bounded by $\sqrt{2}/2$.

Example 2: Consider the instantaneous quantum polynomial-time (IQP) circuits, a restricted model of quantum computation that has been proposed as a candidate for demonstrating quantum computational supremacy in the near term [6, 8, 9, 58]. The structure of IQP circuits is quite simple: each circuit has the form $H^{\otimes n} D H^{\otimes n}$, where $D$ is a subcircuit with gates chosen from $\{Z, CZ, CCZ\}$ (see Fig. 2). Let us define $\mathcal{I}$ to be the set of IQP circuits for which the gates in $D$ are from the gate set $\{Z, CZ\}$ and which contain at least one $CZ$ gate (the case in which the circuits do not contain a $CZ$ gate is trivial). Note that each circuit in $\mathcal{I}$ is also a Clifford circuit. Moreover, $\mathcal{I}$ is a finite set, and the size of $\mathcal{I}$ is $2^{O(n^2)}$. Thus, we have the following bound:

$$\hat{R}_S(\mathcal{F}(\mathcal{I})) \le \frac{O(n)}{\sqrt{m}} \max_{\Phi \in \mathcal{I}} \left\| \vec{f}_\Phi \right\|_\infty. \tag{17}$$

While $\mathcal{I}$ can be efficiently simulated on a classical computer [27], IQP circuits formed from the gate set $\mathcal{I} + CCZ$ are hard to simulate classically [6, 8, 58], which motivates us to consider the set $\mathcal{I}_{CCZ}^{(k)}$ of IQP circuits with at most $k$ $CCZ$ gates. As we shall show in Appendix D, the following bound holds:

$$\hat{R}_S(\mathcal{F}(\mathcal{I}_{CCZ}^{(k)})) \le \frac{O\!\left( (n^2 + k \log n)^{1/2} \right)}{\sqrt{m}} \max_{\Phi \in \mathcal{I}_{CCZ}^{(k)}} \left\| \vec{f}_\Phi \right\|_\infty. \tag{18}$$

FIG. 2. An example of an IQP circuit, which has the form $H^{\otimes n} D H^{\otimes n}$, where the gates in $D$ may be chosen only from the gate set $\{Z, CZ, CCZ\}$.

B. Generalization error bounds
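Before moving on, a small numerical sketch may help to show how the multiplicative factor in Theorem 2 behaves as the number $k$ of resource-channel uses grows. It assumes that the factor reads $\gamma^* = \min\{1 + 2\gamma_{\max,n}, (1 + 2\gamma(\Psi))^k\}$, which is consistent with the $(1 + \sqrt{2})^k$ factor of Eq. (16) when $\gamma(T) \le \sqrt{2}/2$. The function name `gamma_star` is hypothetical, and since the text does not give a value for $\gamma_{\max,n}$, it is left as a parameter.

```python
import math

def gamma_star(gamma, k, gamma_max):
    """Assumed form of the Theorem 2 factor: min{1 + 2*gamma_max, (1 + 2*gamma)^k}."""
    return min(1 + 2 * gamma_max, (1 + 2 * gamma) ** k)

# Upper bound on the free robustness of the T gate quoted in Example 1.
T_ROBUSTNESS_BOUND = math.sqrt(2) / 2

# With no cap (gamma_max = infinity) the factor grows as (1 + sqrt(2))^k.
uncapped = [gamma_star(T_ROBUSTNESS_BOUND, k, gamma_max=math.inf) for k in range(4)]

# With a hypothetical finite cap the factor saturates at 1 + 2*gamma_max.
capped = gamma_star(0.5, 10, gamma_max=3.0)
```

The exponential growth in $k$ is what the minimum with $1 + 2\gamma_{\max,n}$ guards against: once $k$ is large enough, the bound no longer depends on how often the resource channel is used.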

Given a sample $\vec{z} = (\vec{x}, \vec{y})$ (e.g., $\vec{y} = g(\vec{x})$ for some unknown function $g$), let us consider the loss function $l(\vec{z}_i, \Phi) = 1 - f_\Phi(\vec{z}_i)$, where $f_\Phi$ is defined by Eq. (1). Then the expected error with respect to some unknown probability distribution $D$ on $\mathbb{Z}_2^n \times \mathbb{Z}_2^n$ is

$$\mathrm{er}_D(\Phi) = \mathbb{E}_{\vec{z} \sim D}\, l(\vec{z}, \Phi). \tag{19}$$

Given $m$ independent samples $S = (\vec{z}_1, \ldots, \vec{z}_m)$, the empirical error is

$$\mathrm{er}_S(\Phi) = \frac{1}{m} \sum_{i=1}^{m} l(\vec{z}_i, \Phi). \tag{20}$$

The difference between $\mathrm{er}_S$ and $\mathrm{er}_D$ is called the generalization error, which determines the performance of the function $f$ on unseen data drawn from the unknown probability distribution. The Rademacher complexity provides a bound on the generalization error by the following result.

Lemma 3 ([40]). If the loss function $l(f(\vec{x}), \vec{y})$ takes values in $[0, B]$, then for any $\delta > 0$, the following statement holds for any function $f \in \mathcal{F}$ with probability at least $1 - \delta$:

$$\mathrm{er}_D(f) \le \mathrm{er}_S(f) + c_1 B\, \hat{R}_S(l_{\mathcal{F}}) + c_2 B \sqrt{\frac{\log(c_3/\delta)}{m}},$$

where $c_1$, $c_2$, and $c_3$ denote positive numerical constants, the function class $l_{\mathcal{F}} := \{ l_f : (\vec{x}, \vec{y}) \to l(f(\vec{x}), \vec{y}) \mid f \in \mathcal{F} \}$, and $\hat{R}_S(l_{\mathcal{F}})$ is the Rademacher complexity of the function class $l_{\mathcal{F}}$ on the $m$ given samples $S = \{(\vec{x}_i, \vec{y}_i)\}_{i=1}^{m}$.

Using this result and Theorem 2, we get the following upper bound on the generalization error for the function class $\mathcal{F}(\mathcal{O}_\Psi^{(k)})$ in terms of the Rademacher complexity of $\mathcal{F}(\mathcal{O})$ and $\gamma^*$.

Proposition 4. Consider a set of quantum circuits $\mathcal{O}$ and let $\Psi \notin \mathcal{O}$. For any $\delta > 0$, the following statement holds for all $\Phi \in \mathcal{O}_\Psi^{(k)}$ with probability at least $1 - \delta$:

$$\mathrm{er}_D(\Phi) \le \mathrm{er}_S(\Phi) + c_1 \gamma^* \hat{R}_S(\mathcal{F}(\mathcal{O})) + c_2 \sqrt{\frac{\log(c_3/\delta)}{m}},$$

where $\gamma^* = \min\{(1 + 2\gamma(\Psi))^k,\ 1 + 2\gamma_{\max,n}\}$ and $c_1$, $c_2$, $c_3$ are the numerical constants of Lemma 3.

IV. CONCLUSION

In this paper, we investigated the effects of quantum resources on the statistical complexity of quantum circuits. We considered the Rademacher and Gaussian complexities of the quantum-circuit-based learning model in two cases: (1) quantum circuits with access to only a restricted set of channels $\mathcal{O}$, and (2) quantum circuits with access to a restricted set of channels $\mathcal{O}$ together with an additional resource channel $\Psi \notin \mathcal{O}$. We showed that by adding a resource channel to a set of free channels, the Rademacher and Gaussian complexities are increased by an amount that is bounded by the free robustness of the resource channel multiplied by the number of times the channel is used. We applied our results to two special cases: (1) stabilizer circuits that are supplemented with a limited number of $T$ gates and (2) instantaneous quantum polynomial-time Clifford circuits that are supplemented with a limited number of $CCZ$ gates. Using this result, we derived an upper bound on the generalization error associated with learning from the training data arising from such circuits.

Our results reveal a new connection between quantum resources and the statistical complexity of quantum circuits, which paves the way for further research into the statistical complexity of learning models based on quantum circuits, such as the variational quantum eigensolver and the quantum neural network. Furthermore, from a quantum resource theoretic point of view, our results also provide a new operational interpretation of free robustness in general resource theories.

While we focused on the quantum circuit model in this paper, it will be interesting to generalize our results to other computational models such as measurement-based quantum computation (MBQC) and tensor networks. Besides the Rademacher and Gaussian complexities, there are also other measures of statistical complexity of function classes, such as the metric entropy, the VC dimension (or more generally, the pseudo-dimension [59]), and the topological entropy [60]. It will be interesting to see the effects of quantum resources using these other measures of statistical complexity. We leave this problem for further research.

ACKNOWLEDGMENTS

K. B. thanks Arthur Jaffe and Zhengwei Liu for their help and support during the outbreak of the COVID-19 pandemic. K. B. acknowledges the support of ARO Grants W911NF-19-1-0302 and W911NF-20-1-0082, and the support from Yau Mathematical Science Center at Tsinghua University during the visit.

[1] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost, “Quantum algorithms for supervised and unsupervised machine learning,” arXiv preprint arXiv:1307.0411 (2013).
[2] Peter Wittek, Quantum machine learning: what quantum computing means to data mining (Academic Press, 2014).
[3] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd, “Quantum machine learning,” Nature, 195–202 (2017).
[4] Carlo Ciliberto, Mark Herbster, Alessandro Davide Ialongo, Massimiliano Pontil, Andrea Rocchetto, Simone Severini, and Leonard Wossnig, “Quantum machine learning: a classical perspective,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 20170551 (2018).
[5] Vedran Dunjko and Hans J Briegel, “Machine learning & artificial intelligence in the quantum domain: a review of recent progress,” Reports on Progress in Physics, 074001 (2018).
[6] Michael J Bremner, Richard Jozsa, and Dan J Shepherd, “Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy,” Proc. Roy. Soc. London Ser. A, 459–472 (2010).
[7] Scott Aaronson and Alex Arkhipov, “The computational complexity of linear optics,” in Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, STOC ’11 (Association for Computing Machinery, New York, NY, USA, 2011) pp. 333–342.
[8] Michael J. Bremner, Ashley Montanaro, and Dan J. Shepherd, “Average-case complexity versus approximate simulation of commuting quantum computations,” Phys. Rev. Lett., 080501 (2016).
[9] Alexander M. Dalzell, Aram W. Harrow, Dax Enshan Koh, and Rolando L. La Placa, “How many qubits are needed for quantum computational supremacy?” Quantum, 264 (2020).
[10] Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd, “Quantum algorithm for linear systems of equations,” Phys. Rev. Lett., 150502 (2009).
[11] Iordanis Kerenidis and Anupam Prakash, “Quantum Recommendation Systems,” in Leibniz International Proceedings in Informatics (LIPIcs), Vol. 67, edited by Christos H. Papadimitriou (Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 2017) pp. 49:1–49:21.
[12] Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd, “Quantum support vector machine for big data classification,” Phys. Rev. Lett., 130503 (2014).
[13] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost, “Quantum principal component analysis,” Nature Physics, 631–633 (2014).
[14] John Preskill, “Quantum Computing in the NISQ era and beyond,” Quantum, 79 (2018).
[15] Kishor Bharti, Alba Cervera-Lierta, Thi Ha Kyaw, Tobias Haug, Sumner Alperin-Lea, Abhinav Anand, Matthias Degroote, Hermanni Heimonen, Jakob S Kottmann, Tim Menke, et al., “Noisy intermediate-scale quantum (NISQ) algorithms,” arXiv preprint arXiv:2101.08448 (2021).
[16] M Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, et al., “Variational quantum algorithms,” arXiv preprint arXiv:2012.09265 (2020).
[17] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J Love, Alán Aspuru-Guzik, and Jeremy L O’Brien, “A variational eigenvalue solver on a photonic quantum processor,” Nature communications, 4213 (2014).
[18] Yudong Cao, Jonathan Romero, Jonathan P Olson, Matthias Degroote, Peter D Johnson, Mária Kieferová, Ian D Kivlichan, Tim Menke, Borja Peropadre, Nicolas PD Sawaya, et al., “Quantum chemistry in the age of quantum computing,” Chemical reviews, 10856–10915 (2019).
[19] Joonho Lee, William J. Huggins, Martin Head-Gordon, and K. Birgitta Whaley, “Generalized unitary coupled cluster wave functions for quantum computation,” Journal of Chemical Theory and Computation, 311–324 (2019).
[20] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann, “A quantum approximate optimization algorithm,” arXiv preprint arXiv:1411.4028 (2014).
[21] Edward Farhi and Hartmut Neven, “Classification with quantum neural networks on near term processors,” arXiv preprint arXiv:1802.06002 (2018).
[22] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, “Quantum circuit learning,” Phys. Rev. A, 032309 (2018).
[23] Maria Schuld and Nathan Killoran, “Quantum machine learning in feature Hilbert spaces,” Phys. Rev. Lett., 040504 (2019).
[24] Vojtěch Havlíček, Antonio D Córcoles, Kristan Temme, Aram W Harrow, Abhinav Kandala, Jerry M Chow, and Jay M Gambetta, “Supervised learning with quantum-enhanced feature spaces,” Nature, 209–212 (2019).
[25] Kunal Sharma, Marco Cerezo, Lukasz Cincio, and Patrick J Coles, “Trainability of dissipative perceptron-based quantum neural networks,” arXiv preprint arXiv:2005.12458 (2020).
[26] Kerstin Beer, Dmytro Bondarenko, Terry Farrelly, Tobias J. Osborne, Robert Salzmann, Daniel Scheiermann, and Ramona Wolf, “Training deep quantum neural networks,” Nat. Commun., 1–6 (2020).
[27] Daniel Gottesman, “The Heisenberg representation of quantum computers,” Group22: Proceedings of the XXII International Colloquium on Group Theoretical Methods in Physics, 32–43 (1999).
[28] Leslie G. Valiant, “Quantum circuits that can be simulated classically in polynomial time,” SIAM Journal on Computing, 1229–1254 (2002).
[29] Richard Jozsa and Akimasa Miyake, “Matchgates and classical simulation of quantum circuits,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 3089–3106 (2008).
[30] Richard Jozsa and Maarten Van den Nest, “Classical simulation complexity of extended Clifford circuits,” Quantum Information & Computation, 633–648 (2014).
[31] Dax Enshan Koh, “Further extensions of Clifford circuits and their classical simulation complexities,” Quantum Information & Computation, 0262–0282 (2017).
[32] Daniel J. Brod, “Efficient classical simulation of matchgate circuits with generalized inputs and measurements,” Phys. Rev. A, 062332 (2016).
[33] Kaifeng Bu and Dax Enshan Koh, “Efficient classical simulation of Clifford circuits with nonstabilizer input states,” Phys. Rev. Lett., 170502 (2019).
[34] M. Hebenstreit, R. Jozsa, B. Kraus, and S. Strelchuk, “Computational power of matchgates with supplementary resources,” Phys. Rev. A, 052604 (2020).
[35] V. N. Vapnik and A. Ya. Chervonenkis, “On the uniform convergence of relative frequencies of events to their probabilities,” Theory of Probability & Its Applications, 264–280 (1971).
[36] V. N. Vapnik and A. Ya. Chervonenkis, “Necessary and sufficient conditions for the uniform convergence of means to their expectations,” Theory of Probability & Its Applications, 532–553 (1982).
[37] Anselm Blumer, A. Ehrenfeucht, David Haussler, and Manfred K. Warmuth, “Learnability and the Vapnik-Chervonenkis dimension,” J. ACM, 929–965 (1989).
[38] Nick Harvey, Christopher Liaw, and Abbas Mehrabian, “Nearly-tight VC-dimension bounds for piecewise linear neural networks,” in Proceedings of the 2017 Conference on Learning Theory, Proceedings of Machine Learning Research, Vol. 65, edited by Satyen Kale and Ohad Shamir (PMLR, Amsterdam, Netherlands, 2017) pp. 1064–1068.
[39] VM Tikhomirov, “ε-entropy and ε-capacity of sets in functional spaces,” in Selected works of AN Kolmogorov (Springer, 1993) pp. 86–170.
[40] Peter L. Bartlett and Shahar Mendelson, “Rademacher and Gaussian complexities: Risk bounds and structural results,” J. Mach. Learn. Res., 463–482 (2003).
[41] Behnam Neyshabur, Ryota Tomioka, and Nathan Srebro, “Norm-based capacity control in neural networks,” in Proceedings of The 28th Conference on Learning Theory, Proceedings of Machine Learning Research, Vol. 40 (PMLR, Paris, France, 2015) pp. 1376–1401.
[42] Peter L. Bartlett, Dylan J. Foster, and Matus Telgarsky, “Spectrally-normalized margin bounds for neural networks,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17 (Curran Associates Inc., Red Hook, NY, USA, 2017) pp. 6241–6250.
[43] Behnam Neyshabur, Srinadh Bhojanapalli, David Mcallester, and Nati Srebro, “Exploring generalization in deep learning,” in Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc., 2017) pp. 5947–5956.
[44] Noah Golowich, Alexander Rakhlin, and Ohad Shamir, “Size-independent sample complexity of neural networks,” in Proceedings of the 31st Conference On Learning Theory, Proceedings of Machine Learning Research, Vol. 75 (PMLR, 2018) pp. 297–299.
[45] Victor Veitch, S A Hamed Mousavian, Daniel Gottesman, and Joseph Emerson, “The resource theory of stabilizer quantum computation,” New J. Phys., 013009 (2014).
[46] Mark Howard and Earl Campbell, “Application of a Resource Theory for Magic States to Fault-Tolerant Quantum Computing,” Phys. Rev. Lett., 090501 (2017).
[47] Xin Wang, Mark M Wilde, and Yuan Su, “Quantifying the magic of quantum channels,” New Journal of Physics, 103002 (2019).
[48] Ryszard Horodecki, Paweł Horodecki, Michał Horodecki, and Karol Horodecki, “Quantum entanglement,” Rev. Mod. Phys., 865–942 (2009).
[49] Martin B. Plenio and Shashank Virmani, “An introduction to entanglement measures,” Quantum Information & Computation, 1–51 (2007).
[50] Johan Aberg, “Quantifying superposition,” arXiv preprint quant-ph/0612146 (2006).
[51] T. Baumgratz, M. Cramer, and M. B. Plenio, “Quantifying coherence,” Phys. Rev. Lett., 140401 (2014).
[52] Alexander Streltsov, Gerardo Adesso, and Martin B. Plenio, “Colloquium: Quantum coherence as a resource,” Rev. Mod. Phys., 041003 (2017).
[53] Kaifeng Bu, Dax Enshan Koh, Lu Li, Qingxian Luo, and Yaobo Zhang, “On the statistical complexity of quantum circuits,” arXiv preprint arXiv:2101.06154 (2021).
[54] Bob Coecke, Tobias Fritz, and Robert W. Spekkens, “A mathematical theory of resources,” Information and Computation, 59–86 (2016), Quantum Physics and Logic.
[55] Eric Chitambar and Gilad Gour, “Quantum resource theories,” Rev. Mod. Phys., 025001 (2019).
[56] Barbara M Terhal and David P DiVincenzo, “Adaptive quantum computation, constant depth quantum circuits and Arthur-Merlin games,” Quantum Information & Computation, 134–145 (2004).
[57] M. Van den Nest, “Classical simulation of quantum computation, the Gottesman-Knill theorem, and slightly beyond,” Quantum Information & Computation, 0258–0271 (2010).
[58] Michael J. Bremner, Ashley Montanaro, and Dan J. Shepherd, “Achieving quantum supremacy with sparse and noisy commuting quantum computations,” Quantum, 8 (2017).
[59] Matthias C. Caro and Ishaun Datta, “Pseudo-dimension of quantum circuits,” Quantum Mach. Intell., 14 (2020).
[60] Kaifeng Bu, Yaobo Zhang, and Qingxian Luo, “Depth-width trade-offs for neural networks via topological entropy,” arXiv preprint arXiv:2010.07587 (2020).
[61] Shai Shalev-Shwartz and Shai Ben-David, Understanding machine learning: From theory to algorithms (Cambridge University Press, 2014).
[62] John Watrous, The theory of quantum information (Cambridge University Press, 2018).
[63] Scott Aaronson and Daniel Gottesman, “Improved simulation of stabilizer circuits,” Phys. Rev. A, 052328 (2004).

Appendix A: Basic properties of Rademacher complexity

In this section, we list several basic properties of the Rademacher complexity, which may be found in [61]. Given a subset $A$ of $\mathbb{R}^m$, the Rademacher complexity of $A$ is defined as
$$\hat{R}(A) = \mathbb{E}_{\vec{\varepsilon}} \sup_{\vec{v} \in A} \frac{1}{m} \sum_{i=1}^{m} \varepsilon_i v_i, \tag{A1}$$
where $\{\varepsilon_i\}_i$ are i.i.d. Rademacher random variables.

Proposition 5 ([40, 61]). The Rademacher complexity satisfies the following properties:

(1) $$\hat{R}(A) = \hat{R}(\mathrm{Conv}(A)), \tag{A2}$$
where $\mathrm{Conv}(A) = \{\sum_i \lambda_i \vec{v}_i : \vec{v}_i \in A, \lambda_i \ge 0, \sum_i \lambda_i = 1\}$.

(2) For any $c \in \mathbb{R}$, we have
$$\hat{R}(cA) = |c| \, \hat{R}(A), \tag{A3}$$
where $cA := \{c\vec{v} : \vec{v} \in A\}$.

(3) For any $\vec{c} \in \mathbb{R}^m$, we have
$$\hat{R}(A + \vec{c}) = \hat{R}(A), \tag{A4}$$
where $A + \vec{c} := \{\vec{v} + \vec{c} : \vec{v} \in A\}$.

(4) For any $A_1, A_2 \subset \mathbb{R}^m$, we have
$$\hat{R}(A_1 + A_2) = \hat{R}(A_1) + \hat{R}(A_2), \tag{A5}$$
where $A_1 + A_2 := \{\vec{v}_1 + \vec{v}_2 : \vec{v}_1 \in A_1, \vec{v}_2 \in A_2\}$.

(5) Given a Lipschitz function $\phi : \mathbb{R} \to \mathbb{R}$ with Lipschitz constant $L$ and $\phi(0) = 0$, we have
$$\hat{R}(\phi \circ A) \le L \, \hat{R}(A), \tag{A6}$$
where $\phi \circ A := \{(\phi(x_1), \phi(x_2), \dots, \phi(x_m)) : (x_1, x_2, \dots, x_m) \in A\}$.

When the set $A$ is finite, Massart's lemma gives an upper bound for the Rademacher complexity of $A$.

Lemma 6 (Massart's lemma [61]). Given a finite set $A \subset \mathbb{R}^m$, we have
$$\hat{R}(A) \le \max_{\vec{v} \in A} \|\vec{v}\|_2 \, \frac{\sqrt{2 \log |A|}}{m}, \tag{A7}$$
where $|A|$ denotes the size of the finite set $A$.

We now state an important result about the Rademacher complexity, which allows it to be estimated from a single sample set $S = (z_1, \dots, z_m)$.

Lemma 7 ([40]). Let $G$ be a set of functions $\mathcal{X} \to [a, b]$, where $a < b$. Let $m \in \mathbb{Z}_+$ be a positive integer and $D$ be a probability distribution. Let $t > 0$.
Then,
$$\Pr_{\substack{(z_1, \dots, z_m) \sim D^m \\ \varepsilon_1, \dots, \varepsilon_m \sim \mathrm{Rad}}} \left[ \left| R_D(G) - \frac{1}{m} \sup_{g \in G} \sum_{i=1}^{m} \varepsilon_i g(z_i) \right| \ge t \right] \le 2 \exp\left[ -\frac{2 m t^2}{(b-a)^2 + 4\max\{a^2, b^2\}} \right], \tag{A8}$$
and
$$\Pr_{S \sim D^m} \left[ \left| R_D(G) - \hat{R}_S(G) \right| \ge t \right] \le 2 e^{-2mt^2/(b-a)^2}. \tag{A9}$$

Appendix B: Proof of Theorem 1
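Before the proof, the two-sided bound of Theorem 1, $\hat{R}_S(\mathcal{F}(\mathcal{O})) \le \hat{R}_S(\mathcal{F}(\mathcal{O}_\Psi)) \le (1+\gamma(\Psi)) \hat{R}_S(\mathcal{F}(\mathcal{O}))$, can be sanity-checked on a toy instance. The following is a minimal sketch and not part of the argument: the rows of `free` are hypothetical stand-ins for the vectors $(f_\Phi(\vec{z}_1), \dots, f_\Phi(\vec{z}_m))$ of free channels, the weight `gamma` and the convex weights are arbitrary choices, and `f_psi` is built as $(1+\gamma)\vec{f}_1 - \gamma\vec{f}_2$ with $\vec{f}_1, \vec{f}_2$ in the convex hull of the free vectors. The expectation over $\vec{\varepsilon}$ is computed exactly by enumerating $\{\pm 1\}^m$.

```python
from itertools import product


def rademacher(vectors):
    """Exact empirical Rademacher complexity of a finite set of vectors in R^m."""
    m = len(vectors[0])
    total = 0.0
    for eps in product((-1, 1), repeat=m):
        # Supremum over the set of the (unnormalized) inner product with eps.
        total += max(sum(e * x for e, x in zip(eps, v)) for v in vectors)
    # 1/m comes from the normalized inner product, 1/2^m from the expectation.
    return total / (m * 2 ** m)


def convex_combination(vectors, weights):
    """Weighted average of the given vectors (weights assumed >= 0, summing to 1)."""
    m = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(m)]


# Hypothetical data: each row plays the role of (f_Phi(z_1), ..., f_Phi(z_m)).
free = [
    [0.10, 0.80, 0.35, 0.60, 0.25],
    [0.90, 0.15, 0.50, 0.05, 0.70],
    [0.40, 0.45, 0.95, 0.30, 0.10],
    [0.20, 0.65, 0.05, 0.85, 0.55],
]
gamma = 0.7  # assumed weight in Psi = (1 + gamma) Phi_1 - gamma Phi_2
f1 = convex_combination(free, [0.5, 0.5, 0.0, 0.0])
f2 = convex_combination(free, [0.0, 0.2, 0.3, 0.5])
f_psi = [(1 + gamma) * a - gamma * b for a, b in zip(f1, f2)]

r_free = rademacher(free)           # complexity of the free set
r_ext = rademacher(free + [f_psi])  # complexity after adding the resource vector
upper = (1 + gamma) * r_free        # upper bound claimed by Theorem 1
print(r_free, r_ext, upper)
```

The enumeration costs $2^m$ evaluations, so this is only practical for small $m$; for larger samples one would replace it with a Monte Carlo average over random sign vectors.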

Proof.

First, let us rewrite the Rademacher complexity as follows:
$$\hat{R}_S(\mathcal{F}) = \mathbb{E} \sup_{f \in \mathcal{F}} \frac{1}{m} \sum_{i=1}^{m} \varepsilon_i f(\vec{z}_i) = \mathbb{E}_{\vec{\varepsilon}} \sup_{f \in \mathcal{F}} \langle \vec{\varepsilon}, \vec{f} \rangle, \tag{B1}$$
where $\vec{\varepsilon} = (\varepsilon_1, \dots, \varepsilon_m) \in \{\pm 1\}^m$, $\vec{f} = (f(\vec{z}_1), \dots, f(\vec{z}_m))$ and $\langle \vec{\varepsilon}, \vec{f} \rangle = \frac{1}{m} \sum_{i=1}^{m} \varepsilon_i f(\vec{z}_i)$.

The inequality $\hat{R}_S(\mathcal{F}(\mathcal{O})) \le \hat{R}_S(\mathcal{F}(\mathcal{O}_\Psi))$ comes directly from the definition of the Rademacher complexity and the fact that $\mathcal{O} \subset \mathcal{O}_\Psi$. Hence, we only need to prove that
$$\hat{R}_S(\mathcal{F}(\mathcal{O}_\Psi)) \le (1 + \gamma(\Psi)) \, \hat{R}_S(\mathcal{F}(\mathcal{O})). \tag{B2}$$
Let us define the set $A$ as follows:
$$A = \left\{ \vec{\varepsilon} \in \{\pm 1\}^m \;\middle|\; \langle \vec{\varepsilon}, \vec{f}_\Psi \rangle > \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle \right\}. \tag{B3}$$
To finish the proof, we need the following two lemmas about the basic properties of the set $A$ defined in (B3).

Lemma 8.

Given the set $A$ defined in (B3), we have
$$A \cap (-A) = \emptyset, \tag{B4}$$
where $-A := \{-\vec{\varepsilon} \mid \vec{\varepsilon} \in A\}$.

Proof. Based on the definition of the set $A$, we have $\langle \vec{\varepsilon}, \vec{f}_\Psi \rangle > \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle$ for any $\vec{\varepsilon} \in A$. Thus, we have
$$\langle -\vec{\varepsilon}, \vec{f}_\Psi \rangle < -\sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle = \inf_{f \in \mathcal{F}(\mathcal{O})} \langle -\vec{\varepsilon}, \vec{f} \rangle \le \sup_{f \in \mathcal{F}(\mathcal{O})} \langle -\vec{\varepsilon}, \vec{f} \rangle.$$
That is, $-\vec{\varepsilon} \in A^c$ for any $\vec{\varepsilon} \in A$. Therefore, we have $A \cap (-A) = \emptyset$.

Lemma 9.

Given the set $A$ defined in Eq. (B3), we have
$$\sum_{\vec{\varepsilon} \in A} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle + \sum_{\vec{\varepsilon} \in -A} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle \le \sum_{\vec{\varepsilon} \in \{\pm 1\}^m} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle. \tag{B5}$$

Proof.

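First, due to Lemma 8, we have
$$\sum_{\vec{\varepsilon} \in A} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle + \sum_{\vec{\varepsilon} \in -A} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle = \sum_{\vec{\varepsilon} \in A \cup (-A)} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle.$$
Hence, we only need to prove that
$$\sum_{\vec{\varepsilon} \in A \cup (-A)} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle \le \sum_{\vec{\varepsilon} \in \{\pm 1\}^m} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle.$$
Let us define the set $B := A \cup (-A)$; then it is easy to verify that $-\vec{\varepsilon} \in B^c$ for any $\vec{\varepsilon} \in B^c$. Then, for any fixed $f_0 \in \mathcal{F}(\mathcal{O})$, we have
$$\sum_{\vec{\varepsilon} \in \{\pm 1\}^m} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle - \sum_{\vec{\varepsilon} \in A \cup (-A)} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle = \sum_{\vec{\varepsilon} \in B^c} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle \ge \sum_{\vec{\varepsilon} \in B^c} \langle \vec{\varepsilon}, \vec{f}_0 \rangle = \Big\langle \sum_{\vec{\varepsilon} \in B^c} \vec{\varepsilon}, \vec{f}_0 \Big\rangle = 0,$$
where the last equality holds because $B^c$ is closed under $\vec{\varepsilon} \mapsto -\vec{\varepsilon}$. This proves (B5).

We now return to the proof of Theorem 1. Based on the definition of free robustness, there exist channels $\Phi_1, \Phi_2 \in \mathrm{Conv}(\mathcal{O})$ such that $\Psi = (1 + \gamma(\Psi)) \Phi_1 - \gamma(\Psi) \Phi_2$. Due to the linearity of the function $f_\Phi$ with respect to $\Phi$, we have $f_\Psi = (1 + \gamma(\Psi)) f_{\Phi_1} - \gamma(\Psi) f_{\Phi_2}$.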
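Therefore,
$$\begin{aligned}
\hat{R}_S(\mathcal{F}(\mathcal{O}_\Psi)) &= \frac{1}{2^m} \sum_{\vec{\varepsilon} \in A} \langle \vec{\varepsilon}, \vec{f}_\Psi \rangle + \frac{1}{2^m} \sum_{\vec{\varepsilon} \in A^c} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle \\
&= \frac{1}{2^m} \sum_{\vec{\varepsilon} \in A} \left[ (1 + \gamma(\Psi)) \langle \vec{\varepsilon}, \vec{f}_{\Phi_1} \rangle - \gamma(\Psi) \langle \vec{\varepsilon}, \vec{f}_{\Phi_2} \rangle \right] + \frac{1}{2^m} \sum_{\vec{\varepsilon} \in A^c} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle \\
&= \frac{1}{2^m} \sum_{\vec{\varepsilon} \in A} \langle \vec{\varepsilon}, \vec{f}_{\Phi_1} \rangle + \frac{1}{2^m} \sum_{\vec{\varepsilon} \in A^c} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle + \frac{\gamma(\Psi)}{2^m} \sum_{\vec{\varepsilon} \in A} \left[ \langle \vec{\varepsilon}, \vec{f}_{\Phi_1} \rangle - \langle \vec{\varepsilon}, \vec{f}_{\Phi_2} \rangle \right] \\
&\le \frac{1}{2^m} \sum_{\vec{\varepsilon} \in A} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle + \frac{1}{2^m} \sum_{\vec{\varepsilon} \in A^c} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle + \frac{\gamma(\Psi)}{2^m} \sum_{\vec{\varepsilon} \in A} \left[ \langle \vec{\varepsilon}, \vec{f}_{\Phi_1} \rangle - \langle \vec{\varepsilon}, \vec{f}_{\Phi_2} \rangle \right] \\
&= \hat{R}_S(\mathcal{F}(\mathcal{O})) + \frac{\gamma(\Psi)}{2^m} \sum_{\vec{\varepsilon} \in A} \left[ \langle \vec{\varepsilon}, \vec{f}_{\Phi_1} \rangle - \langle \vec{\varepsilon}, \vec{f}_{\Phi_2} \rangle \right] \\
&= \hat{R}_S(\mathcal{F}(\mathcal{O})) + \frac{\gamma(\Psi)}{2^m} \left[ \sum_{\vec{\varepsilon} \in A} \langle \vec{\varepsilon}, \vec{f}_{\Phi_1} \rangle + \sum_{\vec{\varepsilon} \in -A} \langle \vec{\varepsilon}, \vec{f}_{\Phi_2} \rangle \right] \\
&\le \hat{R}_S(\mathcal{F}(\mathcal{O})) + \frac{\gamma(\Psi)}{2^m} \left[ \sum_{\vec{\varepsilon} \in A} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle + \sum_{\vec{\varepsilon} \in -A} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle \right] \\
&\le \hat{R}_S(\mathcal{F}(\mathcal{O})) + \frac{\gamma(\Psi)}{2^m} \sum_{\vec{\varepsilon} \in \{\pm 1\}^m} \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle \\
&= (1 + \gamma(\Psi)) \, \hat{R}_S(\mathcal{F}(\mathcal{O})),
\end{aligned}$$
where the first and second inequalities come from the fact that $\Phi_1, \Phi_2 \in \mathrm{Conv}(\mathcal{O})$, and the last inequality comes from Lemma 9.

Appendix C: Proof of Theorem 2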
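The argument in this appendix rests on three properties from Proposition 5: invariance under convex hulls (A2), scaling by $|c|$ (A3), and additivity over Minkowski sums (A5). Together they turn a set of the form $(1+\gamma) A - \gamma A$ into a factor $(1 + 2\gamma)$. The sketch below checks this identity exactly on a small hypothetical set `A` (convex hulls are omitted, since by (A2) they do not change the value); the names `scale` and `minkowski_sum` are illustrative helpers, not notation from the paper.

```python
import math
from itertools import product


def rademacher(vectors):
    """Exact empirical Rademacher complexity of a finite set of vectors in R^m."""
    m = len(vectors[0])
    total = 0.0
    for eps in product((-1, 1), repeat=m):
        total += max(sum(e * x for e, x in zip(eps, v)) for v in vectors)
    return total / (m * 2 ** m)


def scale(c, vectors):
    """The set cA = {c v : v in A}."""
    return [[c * x for x in v] for v in vectors]


def minkowski_sum(a, b):
    """The set A1 + A2 = {v1 + v2 : v1 in A1, v2 in A2}."""
    return [[x + y for x, y in zip(u, v)] for u in a for v in b]


# Hypothetical finite set of function-value vectors.
A = [[0.1, 0.9, 0.4], [0.7, 0.2, 0.5], [0.3, 0.3, 0.8]]
gamma = 0.25  # assumed robustness-like weight

combined = minkowski_sum(scale(1 + gamma, A), scale(-gamma, A))
lhs = rademacher(combined)            # R((1 + gamma) A - gamma A)
rhs = (1 + 2 * gamma) * rademacher(A)  # (1 + 2 gamma) R(A)
print(lhs, rhs)
```

The negative scalar contributes $|-\gamma| = \gamma$ rather than $-\gamma$, which is why the two terms add up to $1 + 2\gamma$ instead of cancelling to $1$.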

First, let us prove the following lemma about the relationship between the Rademacher complexities of $\mathcal{O}^{(k+1)}_\Psi$ and $\mathcal{O}^{(k)}_\Psi$.

Lemma 10.

Given $m$ independent samples $S = (\vec{z}_1, \dots, \vec{z}_m)$ and a resource channel $\Psi$, we have
$$\hat{R}_S\left( \mathcal{F}(\mathcal{O}^{(k+1)}_\Psi) \right) \le (1 + 2\gamma(\Psi)) \, \hat{R}_S\left( \mathcal{F}(\mathcal{O}^{(k)}_\Psi) \right), \tag{C1}$$
for any $k \ge 0$.

Proof. By the definition of free robustness, there exist channels $\Phi_1, \Phi_2 \in \mathrm{Conv}(\mathcal{O})$ such that $\Psi = (1 + \gamma(\Psi)) \Phi_1 - \gamma(\Psi) \Phi_2$. Therefore, for any channel $\Phi \in \mathcal{O}^{(k+1)}_\Psi$, there exist two channels $\Phi', \Phi'' \in \mathrm{Conv}(\mathcal{O}^{(k)}_\Psi)$ such that $\Phi = (1 + \gamma(\Psi)) \Phi' - \gamma(\Psi) \Phi''$. Therefore,
$$\mathcal{O}^{(k+1)}_\Psi \subset (1 + \gamma(\Psi)) \, \mathrm{Conv}(\mathcal{O}^{(k)}_\Psi) - \gamma(\Psi) \, \mathrm{Conv}(\mathcal{O}^{(k)}_\Psi).$$
Therefore,
$$\begin{aligned}
\hat{R}_S\left( \mathcal{F}(\mathcal{O}^{(k+1)}_\Psi) \right) &\le \hat{R}_S\left[ \mathcal{F}\left( (1 + \gamma(\Psi)) \, \mathrm{Conv}(\mathcal{O}^{(k)}_\Psi) - \gamma(\Psi) \, \mathrm{Conv}(\mathcal{O}^{(k)}_\Psi) \right) \right] \\
&= (1 + \gamma(\Psi)) \, \hat{R}_S\left( \mathcal{F}(\mathcal{O}^{(k)}_\Psi) \right) + \gamma(\Psi) \, \hat{R}_S\left( \mathcal{F}(\mathcal{O}^{(k)}_\Psi) \right) \\
&= (1 + 2\gamma(\Psi)) \, \hat{R}_S\left( \mathcal{F}(\mathcal{O}^{(k)}_\Psi) \right),
\end{aligned}$$
where the first equality comes from the fact that $\hat{R}_S(\sum_i \mathcal{F}_i) = \sum_i \hat{R}_S(\mathcal{F}_i)$, where each $\mathcal{F}_i$ is a function class, and the facts that the Rademacher complexity is invariant under convex combination and $\hat{R}_S(c\mathcal{F}) = |c| \, \hat{R}_S(\mathcal{F})$.

We are now ready to prove Theorem 2.

Proof.

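Applying Lemma 10 iteratively, we have the following inequality:
$$\hat{R}_S\left( \mathcal{F}(\mathcal{O}^{(k)}_\Psi) \right) \le (1 + 2\gamma(\Psi))^k \, \hat{R}_S(\mathcal{F}(\mathcal{O})).$$
Besides, for any $\Phi \in \mathcal{O}^{(k)}_\Psi$, there exist $\Phi_1, \Phi_2 \in \mathrm{Conv}(\mathcal{O})$ such that
$$\Phi = (1 + \gamma) \Phi_1 - \gamma \Phi_2, \tag{C2}$$
where $\gamma \le \gamma_{\max,n}$. Therefore, we have
$$\mathcal{O}^{(k)}_\Psi \subset (1 + \gamma_{\max,n}) \, \mathrm{Conv}(\mathcal{O}) - \gamma_{\max,n} \, \mathrm{Conv}(\mathcal{O}),$$
for any integer $k$. Hence, we have
$$\begin{aligned}
\hat{R}_S\left( \mathcal{F}(\mathcal{O}^{(k)}_\Psi) \right) &\le \hat{R}_S\left[ \mathcal{F}\left( (1 + \gamma_{\max,n}) \, \mathrm{Conv}(\mathcal{O}) - \gamma_{\max,n} \, \mathrm{Conv}(\mathcal{O}) \right) \right] \\
&= (1 + \gamma_{\max,n}) \, \hat{R}_S(\mathcal{F}(\mathcal{O})) + \gamma_{\max,n} \, \hat{R}_S(\mathcal{F}(\mathcal{O})) \\
&= (1 + 2\gamma_{\max,n}) \, \hat{R}_S(\mathcal{F}(\mathcal{O})),
\end{aligned}$$
for any integer $k$. Therefore, we have
$$\hat{R}_S\left( \mathcal{F}(\mathcal{O}^{(k)}_\Psi) \right) \le \min\left\{ 1 + 2\gamma_{\max,n}, \, (1 + 2\gamma(\Psi))^k \right\} \hat{R}_S(\mathcal{F}(\mathcal{O})).$$

Appendix D: Proofs of Examples 1 and 2

1. Example 1: Proofs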

By Choi's representation of quantum channels [62], the function $f_\Phi$ can be written as follows:
$$f_\Phi(\vec{x}, \vec{y}) = 2^n \, \mathrm{Tr}\left[ \Phi \otimes I(|\Lambda\rangle\langle\Lambda|) \; |\vec{x}\rangle\langle\vec{x}| \otimes |\vec{y}\rangle\langle\vec{y}| \right], \tag{D1}$$
where $|\Lambda\rangle = \frac{1}{\sqrt{2^n}} \sum_{\vec{x}} |\vec{x}\rangle |\vec{x}\rangle$. Since $\Phi$ is a (unitary) stabilizer circuit and $|\Lambda\rangle$ is a pure stabilizer state, $\Phi \otimes I(|\Lambda\rangle\langle\Lambda|)$ is a stabilizer state on $2n$ qubits. Since the number of pure stabilizer states on $2n$ qubits is $2^{(0.5 + o(1))(2n)^2}$ [63], let us consider the vector $\vec{f}_\Phi = (f_\Phi(\vec{z}_i))_{i=1}^{m}$, where the set $\{\vec{f}_\Phi\}_{\Phi \in \mathrm{STAB}}$ is a finite set satisfying
$$\left| \{\vec{f}_\Phi\}_{\Phi \in \mathrm{STAB}} \right| \le 2^{(0.5 + o(1))(2n)^2}. \tag{D2}$$
Hence, we have
$$\hat{R}_S(\mathcal{F}(\mathrm{STAB})) \le \frac{(2 + o(1)) \, n}{m} \max_{\Phi \in \mathrm{STAB}} \|\vec{f}_\Phi\|_2 \le \frac{(2 + o(1)) \, n}{\sqrt{m}} \max_{\Phi \in \mathrm{STAB}} \|\vec{f}_\Phi\|_\infty,$$
where the first inequality comes from Massart's Lemma (see Lemma 6) and the second inequality comes from the fact that $\|\cdot\|_2 \le \sqrt{m} \, \|\cdot\|_\infty$.

Now, let us assume that we have access to the $T$ gate. In this case, let us define the corresponding sets of quantum channels $\mathrm{STAB}_T$ and $\mathrm{STAB}^{(k)}_T$. The free robustness of the $T$ gate satisfies $\gamma(T) \le \sqrt{2}/2$, since the $T$ gate written as a quantum channel $\Phi_T$ may be decomposed as follows:
$$\Phi_T(\cdot) = \left( \frac{1}{2} + \frac{\sqrt{2}}{2} \right) \Phi_S(\cdot) + \frac{1}{2} \Phi_{SZ}(\cdot) - \frac{\sqrt{2}}{2} \Phi_Z(\cdot), \tag{D3}$$
where $S = \mathrm{diag}[1, i]$ is the phase gate and $Z = \mathrm{diag}[1, -1]$ is the Pauli $Z$ gate.
By Theorems 1 and 2, we get the following upper bounds on the Rademacher complexities of $\mathrm{STAB}_T$ and $\mathrm{STAB}^{(k)}_T$:
$$\hat{R}_S(\mathcal{F}(\mathrm{STAB}_T)) \le \left( 1 + \sqrt{2}/2 \right) \hat{R}_S(\mathcal{F}(\mathrm{STAB})) \le O\left( \left( 1 + \sqrt{2}/2 \right) \frac{n}{\sqrt{m}} \right) \max_{\Phi \in \mathrm{STAB}} \|\vec{f}_\Phi\|_\infty,$$
$$\hat{R}_S(\mathcal{F}(\mathrm{STAB}^{(k)}_T)) \le \left( 1 + \sqrt{2} \right)^k \hat{R}_S(\mathcal{F}(\mathrm{STAB})) \le O\left( \left( 1 + \sqrt{2} \right)^k \frac{n}{\sqrt{m}} \right) \max_{\Phi \in \mathrm{STAB}} \|\vec{f}_\Phi\|_\infty.$$
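The bound $\gamma(T) \le \sqrt{2}/2$ only needs some decomposition of $\Phi_T$ into Clifford channels whose negative weight is $\sqrt{2}/2$. As a hedged numerical check, the sketch below verifies one such decomposition, $\Phi_T = (\tfrac{1}{2} + \tfrac{\sqrt{2}}{2}) \Phi_S + \tfrac{1}{2} \Phi_{SZ} - \tfrac{\sqrt{2}}{2} \Phi_Z$, under two assumptions that are not spelled out in this appendix: $T = \mathrm{diag}[1, e^{i\pi/4}]$, and $\Phi_U(\rho) = U \rho U^\dagger$. Since all maps involved are linear, it suffices to compare them on the four matrix units of a single qubit.

```python
import cmath
import math


def conjugate_by_diagonal(diag, rho):
    """Return U rho U^dagger for the diagonal unitary U = diag(diag[0], diag[1])."""
    return [[diag[i] * rho[i][j] * diag[j].conjugate() for j in range(2)]
            for i in range(2)]


# Diagonal entries of the gates (T convention is an assumption, see lead-in).
T = (1 + 0j, cmath.exp(1j * math.pi / 4))
S = (1 + 0j, 1j)
Z = (1 + 0j, -1 + 0j)
SZ = (1 + 0j, -1j)  # the product S Z = diag[1, -i]

r2 = math.sqrt(2)
# Quasi-probability weights of the decomposition being checked.
weights = [(0.5 + r2 / 2, S), (0.5, SZ), (-r2 / 2, Z)]

max_dev = 0.0
for a in range(2):
    for b in range(2):
        # Matrix unit |a><b|; linear maps agreeing on all four agree everywhere.
        unit = [[1 + 0j if (i, j) == (a, b) else 0j for j in range(2)]
                for i in range(2)]
        target = conjugate_by_diagonal(T, unit)
        images = [(w, conjugate_by_diagonal(u, unit)) for w, u in weights]
        for i in range(2):
            for j in range(2):
                mix = sum(w * img[i][j] for w, img in images)
                max_dev = max(max_dev, abs(target[i][j] - mix))

total_weight = sum(w for w, _ in weights)
negativity = sum(-w for w, _ in weights if w < 0)
print(max_dev, total_weight, negativity)
```

With negative weight $\sqrt{2}/2$, the factors $1 + \gamma$ and $1 + 2\gamma$ from Theorems 1 and 2 become $1 + \sqrt{2}/2$ and $1 + \sqrt{2}$, matching the two bounds above. This check does not show that $\sqrt{2}/2$ is the smallest achievable negative weight, only that it is achievable.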

2. Example 2: Proofs

Since $\mathcal{I}$ is the set of IQP circuits with only $Z$ and $CZ$ as internal gates, $\mathcal{I}$ is a finite set with size $|\mathcal{I}| = 2^{O(n^2)}$. Hence
$$\hat{R}_S(\mathcal{F}(\mathcal{I})) \le \frac{O(n)}{m} \max_{\Phi \in \mathcal{I}} \|\vec{f}_\Phi\|_2 \le \frac{O(n)}{\sqrt{m}} \max_{\Phi \in \mathcal{I}} \|\vec{f}_\Phi\|_\infty,$$
where the first inequality comes from Massart's Lemma (see Lemma 6) and the second inequality comes from the fact that $\|\cdot\|_2 \le \sqrt{m} \, \|\cdot\|_\infty$.

Now, let us consider IQP circuits with access to the $CCZ$ gate. Let us define $\mathcal{I}^{(k)}_{CCZ}$ to be the set of IQP circuits with at most $k$ $CCZ$ gates. Then, the size of $\mathcal{I}^{(k)}_{CCZ}$ is
$$\left| \mathcal{I}^{(k)}_{CCZ} \right| \le |\mathcal{I}| \times \left( \sum_{j=0}^{k} \binom{n^3}{j} \right) \le 2^{O(n^2)} \, n^{3k}. \tag{D4}$$
Therefore, by Massart's Lemma, we have
$$\hat{R}_S\left( \mathcal{F}(\mathcal{I}^{(k)}_{CCZ}) \right) \le \frac{O\left( (n^2 + k \log n)^{1/2} \right)}{\sqrt{m}} \max_{\Phi \in \mathcal{I}^{(k)}_{CCZ}} \|\vec{f}_\Phi\|_\infty. \tag{D5}$$

Appendix E: Results about the Gaussian complexity

In the main text, we focused on the Rademacher complexity. In this appendix, we will show that similar results hold for the Gaussian complexity.
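For orientation, the Gaussian complexity replaces the random signs $\varepsilon_i$ by standard normal variables $g_i$. The sketch below is a minimal Monte Carlo estimator, assuming the definition $\hat{G}_S(\mathcal{F}) = \mathbb{E} \sup_{f \in \mathcal{F}} \frac{1}{m} \sum_{i=1}^{m} g_i f(z_i)$ with $g_i \sim \mathcal{N}(0, 1)$ (the version without absolute values; the main-text definition is not restated in this appendix). The function name `gaussian_complexity_mc`, the sample count, and the test vector are illustrative choices. For the two-element set $\{\vec{v}, -\vec{v}\}$ the exact value is $\sqrt{2/\pi} \, \|\vec{v}\|_2 / m$, which gives a reference to compare against.

```python
import math
import random


def gaussian_complexity_mc(vectors, n_samples=20000, seed=0):
    """Monte Carlo estimate of the empirical Gaussian complexity of a finite set."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    m = len(vectors[0])
    total = 0.0
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Supremum over the set of the (unnormalized) inner product with g.
        total += max(sum(gi * vi for gi, vi in zip(g, v)) for v in vectors)
    return total / (m * n_samples)


v = [0.3, 0.4, 1.2]  # hypothetical vector of function values
estimate = gaussian_complexity_mc([v, [-x for x in v]])
# For {v, -v} the supremum equals |<g, v>|, whose mean is sqrt(2/pi) * ||v||_2.
exact = math.sqrt(2 / math.pi) * math.sqrt(sum(x * x for x in v)) / len(v)
print(estimate, exact)
```

Unlike the Rademacher case, the expectation cannot be computed by finite enumeration, so sampling (or a closed form, when available) is needed.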

Theorem 11.

Given $m$ independent samples $S = (\vec{z}_1, \dots, \vec{z}_m)$ and a resource channel $\Psi$, we have the following bound:
$$\hat{G}_S(\mathcal{F}(\mathcal{O})) \le \hat{G}_S(\mathcal{F}(\mathcal{O}_\Psi)) \le (1 + \gamma(\Psi)) \, \hat{G}_S(\mathcal{F}(\mathcal{O})), \tag{E1}$$
where $\gamma(\Psi)$ is the free robustness with respect to the set $\mathcal{O}$, that is,
$$\gamma(\Psi) := \min\left\{ \lambda \;\middle|\; \frac{\Psi + \lambda \Phi}{1 + \lambda} \in \mathrm{Conv}(\mathcal{O}), \; \Phi \in \mathrm{Conv}(\mathcal{O}) \right\}.$$
Therefore, for any probability distribution $D$ on the sample space, if each sample $\vec{z}_i$ is chosen independently according to $D$ for $i = 1, \dots, m$, then we have
$$G_D(\mathcal{F}(\mathcal{O})) \le G_D(\mathcal{F}(\mathcal{O}_\Psi)) \le (1 + \gamma(\Psi)) \, G_D(\mathcal{F}(\mathcal{O})). \tag{E2}$$

Proof.

The proof is the same as that for the Rademacher complexity, except that we will need to replace the set $A$ in Lemma 8 and Lemma 9 by
$$A' = \left\{ \vec{\varepsilon} \in \mathbb{R}^m \;\middle|\; \langle \vec{\varepsilon}, \vec{f}_\Psi \rangle > \sup_{f \in \mathcal{F}(\mathcal{O})} \langle \vec{\varepsilon}, \vec{f} \rangle \right\}. \tag{E3}$$

Theorem 12.

Given $m$ independent samples $S = (\vec{z}_1, \dots, \vec{z}_m)$ and a resource channel $\Psi$, we have the following bound:
$$\hat{G}_S(\mathcal{F}(\mathcal{O}^{(k)}_\Psi)) \le \gamma^* \, \hat{G}_S(\mathcal{F}(\mathcal{O})), \tag{E4}$$
where $\gamma^* = \min\{ 1 + 2\gamma_{\max,n}, (1 + 2\gamma(\Psi))^k \}$, and $\gamma_{\max,n}$ is the maximal free robustness over quantum channels on $n$ qubits. Given a probability distribution $D$ on the sample space, if each sample $\vec{z}_i$ is chosen independently according to $D$ for $i = 1, \dots, m$, then we have
$$G_D(\mathcal{F}(\mathcal{O}^{(k)}_\Psi)) \le \gamma^* \, G_D(\mathcal{F}(\mathcal{O})). \tag{E5}$$

Proof.

This result also holds for the Gaussian complexity because the Gaussian complexity also satisfies convexity and invariance under convex combination.

Appendix F: Alternative definition of Rademacher and Gaussian complexity involving absolute values

Given a set of real-valued functions $\mathcal{F}$, the Rademacher and Gaussian complexity with respect to a given sample $S = (z_1, \dots, z_m)$ may alternatively be defined as follows:
$$\bar{R}_S(\mathcal{F}) = \mathbb{E} \sup_{f \in \mathcal{F}} \frac{1}{m} \left| \sum_{i=1}^{m} \varepsilon_i f(z_i) \right|, \tag{F1}$$
where the expectation is taken over the i.i.d. Rademacher variables $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_m$, and
$$\bar{G}_S(\mathcal{F}) = \mathbb{E} \sup_{f \in \mathcal{F}} \frac{1}{m} \left| \sum_{i=1}^{m} g_i f(z_i) \right|, \tag{F2}$$
where the expectation is taken over i.i.d. random Gaussian variables with zero mean and variance 1, i.e., $g_i \sim \mathcal{N}(0, 1)$.

As the only difference between $\hat{R}_S$ and $\bar{R}_S$ is that the latter involves taking an absolute value $|\cdot|$, it is easy to see that $\bar{R}_S$ and $\bar{G}_S$ satisfy the following bounds.

Theorem 13.

Given $m$ independent samples $S = (\vec{z}_1, \dots, \vec{z}_m)$ and a resource channel $\Psi$, we have the following bounds:
$$\begin{aligned}
\bar{R}_S(\mathcal{F}(\mathcal{O})) &\le \bar{R}_S(\mathcal{F}(\mathcal{O}^{(k)}_\Psi)) \le \gamma^* \, \bar{R}_S(\mathcal{F}(\mathcal{O})), \\
\bar{G}_S(\mathcal{F}(\mathcal{O})) &\le \bar{G}_S(\mathcal{F}(\mathcal{O}^{(k)}_\Psi)) \le \gamma^* \, \bar{G}_S(\mathcal{F}(\mathcal{O})),
\end{aligned} \tag{F3}$$
where $\gamma^* = \min\{ (1 + 2\gamma(\Psi))^k, 1 + 2\gamma_{\max,n} \}$ and $\gamma_{\max,n}$ is the maximal free robustness over quantum channels on $n$ qubits.
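One way to relate the two definitions, offered here as an illustration rather than as the authors' argument, is the identity $\bar{R}_S(\mathcal{F}) = \hat{R}_S(\mathcal{F} \cup (-\mathcal{F}))$: taking an absolute value inside the supremum is the same as also allowing the negated functions. The sketch below checks this exactly on a small hypothetical set of function-value vectors, and also confirms $\hat{R}_S \le \bar{R}_S$; the `absolute` flag is an illustrative parameter.

```python
import math
from itertools import product


def rademacher(vectors, absolute=False):
    """Exact empirical Rademacher complexity of a finite set of vectors.

    With absolute=True, the absolute value of each inner product is taken
    before the supremum, as in the alternative definition (F1).
    """
    m = len(vectors[0])
    total = 0.0
    for eps in product((-1, 1), repeat=m):
        inner = [sum(e * x for e, x in zip(eps, v)) for v in vectors]
        total += max(abs(s) for s in inner) if absolute else max(inner)
    return total / (m * 2 ** m)


# Hypothetical vectors (f(z_1), ..., f(z_m)) for two functions f.
A = [[0.2, 0.9, 0.5, 0.1], [0.6, 0.4, 0.7, 0.3]]
negated = [[-x for x in v] for v in A]

plain = rademacher(A)                   # definition without absolute values
with_abs = rademacher(A, absolute=True)  # definition (F1)
symmetrized = rademacher(A + negated)    # plain complexity of A united with -A
print(plain, with_abs, symmetrized)
```

The same symmetrization applies verbatim to the Gaussian version (F2), with the sign vectors replaced by Gaussian samples.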