Capacity and quantum geometry of parametrized quantum circuits

Abstract

To harness the potential of noisy intermediate-scale quantum devices, it is paramount to find the best type of circuits to run hybrid quantum-classical algorithms. Key candidates are parametrized quantum circuits that can be effectively implemented on current devices. Here, we evaluate the capacity and trainability of these circuits using the geometric structure of the parameter space via the effective quantum dimension, which reveals the expressive power of circuits in general as well as of particular initialization strategies. We assess the representation power of various popular circuit types and find striking differences depending on the type of entangling gates used. Particular circuits are characterized by scaling laws in their expressiveness. We identify a transition in the quantum geometry of the parameter space, which leads to a decay of the quantum natural gradient for deep circuits. For shallow circuits, the quantum natural gradient can be orders of magnitude larger in value compared to the regular gradient; however, both of them can suffer from vanishing gradients. By tuning a fixed set of circuit parameters to randomized ones, we find a region where the circuit is expressive, but does not suffer from barren plateaus, hinting at a good way to initialize circuits. Our results enhance the understanding of parametrized quantum circuits for improving variational quantum algorithms.



Tobias Haug, ∗ Kishor Bharti, and M. S. Kim QOLS, Blackett Laboratory, Imperial College London SW7 2AZ, UK Centre for Quantum Technologies, National University of Singapore 117543, Singapore


Quantum computers promise to tackle challenging problems for classical computers such as drug design, combinatorial optimisation and simulation of many-body physics. While fully-fledged large-scale quantum computers with error correction are not expected to be available for many years, noisy intermediate-scale quantum (NISQ) devices have been investigated as a way to approach computationally hard problems with quantum processors available now and in the near future [1, 2]. Variational quantum algorithms (VQA) [3–6] have been a major hope in achieving a quantum speedup with NISQ devices. The core idea is to update a parametrized quantum circuit (PQC) in a hybrid quantum-classical fashion. Measurements performed on the PQC are fed into a classical computer to propose a new set of variational parameters. A key challenge has been the occurrence of barren plateaus, i.e. the gradients used for optimisation vanish exponentially with increasing number of qubits [7], as well as for various types of cost functions [8], entanglement [9] and noise [10]. Further, the classical optimization part of variational algorithms was shown to be NP-hard [11]. Quantum algorithms that avoid the feedback loop to circumvent the barren plateau problem have been proposed [12–17]. Besides this approach, initialization strategies [18–20] and layer-wise learning [21] for VQA could help to solve the aforementioned problems. However, tools to evaluate the power of these strategies are lacking.

Hardware efficient ansätze have been proposed to tailor a PQC to the restrictions of the hardware [22]. A widely used choice is quantum circuits arranged in layers of single-qubit rotations followed by two-qubit entangling gates. However, a key question is the space of possible states this ansatz type can represent [23, 24].

Here, we introduce the effective quantum dimension $G_C$ and parameter dimension $D_C$ as a quantitative measure of the capacity of a PQC.
Parameter dimension $D_C$ measures the total number of independent parameters a quantum state defined by the PQC can express. In contrast, the effective quantum dimension $G_C$ [25, 26] is a local measure to quantify the space of states that can be accessed by locally perturbing the parameters of the PQC. Both measures can be derived from the quantum geometric structure of the PQC via the quantum Fisher information metric (QFI) $F$ [27, 28]. From the QFI, one can obtain the quantum natural gradient (QNG) for a more efficient optimisation via gradients [27–29]. These methods allow us to evaluate the expressive power, trainability and number of redundant parameters of different PQCs, and find better initialization strategies.

As a demonstration of our tools, we provide an in-depth investigation of popular hardware-efficient circuits, composed of layered single-qubit rotations and two-qubit entangling gates in various arrangements. We find striking differences depending on the choice of circuit structure that affect both the expressive power of the PQC in general as well as the quality of specific initialization strategies. We calculate the number of redundant parameters of various PQC types, as well as how fast they converge towards random quantum states as a function of the number of layers. The choice of entangling gate has a pronounced effect on the representation power of particular initialization strategies.

We reveal a transition in the spectrum of the QFI in deep circuits, which leads to a decay of the QNG. For shallow circuits, the QNG can be orders of magnitude larger than the regular gradient. However, both suffer from the barren plateau problem. By tuning the PQC's parameters from zero to a random set of parameters, we find a region where both large gradients and large effective quantum dimension $G_C$ coexist, which could serve as a good set of initial parameters for the training of variational algorithms.

FIG. 1. a) Sketch of hardware-efficient parametrized quantum circuit (PQC) $U(\theta)|0\rangle^{\otimes N} = \prod_{l=p}^{1} [W_l V_l(\theta_l)] \sqrt{H_d}^{\otimes N} |0\rangle^{\otimes N}$ with parameters $\theta$ and initial state $|0\rangle$ of all $N$ qubits being in state zero. The PQC consists of an initial layer of $\sqrt{H_d}$ gates applied to each qubit, where $H_d$ is the Hadamard gate, followed by $p$ repeated layers of parametrized single-qubit rotations $V_l(\theta_l)$ and entangling gates $W_l$. $V_l(\theta_l)$ consists of single-qubit rotations $R_\alpha(\theta_{l,n}) = \exp(-i\sigma^\alpha_n \theta_{l,n}/2)$ at layer $l$ and qubit $n$ around axis $\alpha \in \{x, y, z\}$. b) Two-qubit entangling gates $w_l$ considered are CNOT gates (control-$\sigma^x$), CPHASE gates (control-$\sigma^z$, diag(1, 1, 1, −1)) and $\sqrt{\mathrm{iSWAP}}$ gates. c) Entangling layer $W_l$ is composed of the two-qubit entangling gates $w_l$, which are arranged in either a nearest-neighbor one-dimensional chain topology (denoted as CHAIN), all-to-all connection (ALL) or in an alternating fashion (ALT) for even and odd layers $l$.

I. PARAMETRIZED QUANTUM CIRCUITS

A PQC generates a quantum state of $N$ qubits

$$|\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes N}, \tag{1}$$

with the unitary $U(\theta)$, the $M$-dimensional parameter vector $\theta$ and product state $|0\rangle^{\otimes N}$ (see Fig. 1). The structure of the PQC influences its power to represent quantum states [23, 24, 30]. One way to measure expressiveness is by determining the distance between the distribution of states generated by the circuit and the Haar random distribution of states [23, 24]. This tells us how well the PQC can represent arbitrary states across the Hilbert space. The appearance of barren plateaus or vanishing gradients is connected to the aforementioned measure [7, 31]. The variance of the gradient $\mathrm{var}(\partial_i E) = \langle(\partial_i E)^2\rangle - \langle\partial_i E\rangle^2$ ($\langle\cdot\rangle$ denoting the statistical average over many random instances) with respect to the expectation value of a local Hamiltonian $H$, $E = \langle 0|U^\dagger(\theta) H U(\theta)|0\rangle$, can vanish exponentially with the number of qubits for PQCs with a random choice of parameters. The variance also decreases with the number of layers $p$ of the PQC until a specific $p_r$, where it remains constant upon further increase of $p > p_r$. For local cost functions, it has been shown that in most cases low variance of the gradient of such PQCs correlates with high expressibility [31].

II. PARAMETER DIMENSION

We now introduce the parameter dimension $D_C$ of a PQC as another measure of capacity. As an introduction, we take a generic quantum state which is parametrized by a total of $M = 2^{N+1}$ real coefficients

$$|\psi(a, b)\rangle = \sum_{j=1}^{2^N} (a_j + i b_j)|j\rangle, \tag{2}$$

where $|j\rangle$ is the $j$-th computational basis state and $a_j, b_j \in \mathbb{R}$. One can map the above state to $D_C = 2^{N+1} - 2$ independent parameters, because normalization and the global phase each remove one parameter. If we constrain $b_j = 0$, we find $D_C = 2^N - 1$. We define for a PQC $C$ the parameter dimension $D_C$ as the number of independent parameters that can be represented in the space of quantum states. In general, $D_C$ for $N$ qubits is upper bounded by the generic state Eq. (2). We define the redundancy

$$R = \frac{M - D_C}{M}, \tag{3}$$

which is the fraction of parameters of the PQC that do not contribute to changing the quantum state.

FIG. 2. Example to demonstrate the effective quantum dimension $G_C$ and parameter dimension $D_C$ for a single qubit parametrized as $|\psi(\theta, \varphi)\rangle = \cos(\theta/2)|0\rangle + \exp(i\varphi)\sin(\theta/2)|1\rangle$. $D_C = 2$ is the total number of independent parameters of the quantum state. $G_C$ denotes the number of independent directions a quantum state can move by locally perturbing its parameters $\theta$, $\varphi$. For a random state $|v\rangle$ ($\theta \notin \{0, \pi\}$) two possible directions exist, along $v_\theta$ and $v_\varphi$. The particular state $|w(\theta = \pi, \varphi)\rangle$ can only be perturbed in direction $w_\theta$, as adjusting $\varphi$ does not change the state (e.g. $|w(\pi, \varphi + \epsilon)\rangle = |w(\pi, \varphi)\rangle$), thus $G_C = 1$.

III. EFFECTIVE QUANTUM DIMENSION

Now, we explain how the QFI $F(\theta)$ quantifies the expressive power of a PQC (see Supplemental materials B for an introduction to the QFI and QNG). One can relate $F(\theta)$ to the distance in the space of pure quantum states, which is given by the Fubini-Study distance

$$\mathrm{Dist}_Q\big(|\psi(\theta)\rangle, |\psi(\theta + d\theta)\rangle\big)^2 = \frac{1}{4}\sum_{i,j} F_{ij}(\theta)\, d\theta_i\, d\theta_j, \tag{4}$$

where $\mathrm{Dist}_Q(x, y) = \arccos|\langle x|y\rangle|$ and the QFI [27, 28]

$$F_{ij}(\theta) = 4\,\mathrm{Re}\big(\langle\partial_i\psi|\partial_j\psi\rangle - \langle\partial_i\psi|\psi\rangle\langle\psi|\partial_j\psi\rangle\big). \tag{5}$$

$F(\theta)$ quantifies the change of the quantum state when adjusting its parameter $\theta$ infinitesimally to $\theta + d\theta$. The singular value decomposition

$$F = V S V^T \tag{6}$$

gives us $V$, which is a real orthogonal matrix with the $i$-th eigenvector $\alpha^{(i)}$ placed at the $i$-th column of $V$, and $S$, which is a diagonal matrix with the $M$ non-negative eigenvalues $\lambda^{(i)}$ of $F(\theta)$ along the diagonal. The eigenvalues and eigenvectors obey the equation $F(\theta)\alpha^{(i)} = \lambda^{(i)}\alpha^{(i)}$. Inserting Eq. (6) into Eq. (4) gives us

$$\mathrm{Dist}_Q\big(|\psi(\theta)\rangle, |\psi(\theta + d\theta)\rangle\big)^2 = \frac{1}{4}\, d\theta^T F\, d\theta = \frac{1}{4}\, d\theta^T V S V^T d\theta. \tag{7}$$

Now, we assume that the small variations in $\theta$ are in the direction of the $i$-th eigenvector of $F$ with $d\theta = d\mu\,\alpha^{(i)}$, where $d\mu$ is an infinitesimal scalar. We find

$$\mathrm{Dist}_Q\big(|\psi(\theta)\rangle, |\psi(\theta + d\mu\,\alpha^{(i)})\rangle\big)^2 = \frac{1}{4}\, d\mu\, \alpha^{(i)T} V S V^T \alpha^{(i)}\, d\mu = \frac{1}{4}\lambda^{(i)} d\mu^2,$$

where we have used $V^T\alpha^{(i)} = e^{(i)}$, with $e^{(i)}$ the $i$-th basis vector. When updating $\theta' = \theta + d\mu\,\alpha^{(i)}$, the quantum state changes at a rate that is proportional to $\lambda^{(i)}$. Eigenvalues $\lambda^{(i)} = 0$ are called singularities as there is no change in the quantum state at all, i.e. $|\langle\psi(\theta)|\psi(\theta + d\mu\,\alpha^{(i)})\rangle| = 1$. The case of $\lambda^{(i)}$ being very small, i.e. $1 \gg \lambda^{(i)} > 0$, is called a near singularity and is associated with plateaus in classical machine learning where training slows down [32].

We now define the effective quantum dimension $G_C(\theta)$ for a PQC denoted as $C$. It is given as the total number of non-zero eigenvalues $\lambda^{(i)}(\theta)$ of the QFI $F(\theta)$ initialized with parameters $\theta$ [25, 26]

$$G_C(\theta) = \sum_{i=1}^{M} I\big(\lambda^{(i)}(\theta)\big), \tag{8}$$

where $I(x) = 0$ for $x = 0$ and $I(x) = 1$ for $x \neq 0$. $G_C(\theta)$ is a local measure of expressiveness that counts the number of independent directions in the state space that can be accessed by an infinitesimal update of $\theta$.

A straightforward example is a generic single-qubit quantum state (see Fig. 2)

$$|\psi(\theta, \varphi)\rangle = \cos\left(\frac{\theta}{2}\right)|0\rangle + \exp(i\varphi)\sin\left(\frac{\theta}{2}\right)|1\rangle, \tag{9}$$

$$F = \begin{pmatrix} 1 & 0 \\ 0 & \sin^2(\theta) \end{pmatrix}. \tag{10}$$

The eigenvalues and eigenvectors of the QFI $F$ are straightforward to calculate with $\lambda^{(1)} = 1$, $\alpha^{(1)} = \{1, 0\}$ and $\lambda^{(2)} = \sin^2(\theta)$, $\alpha^{(2)} = \{0, 1\}$. The effective quantum dimension is $G_C(\theta, \varphi) = D_C = 2$, except for the special case $\theta = n\pi$, $n$ integer, where the eigenvalue is $\lambda^{(2)} = 0$ and thus $G_C(n\pi, \varphi) = 1$. Here, any change in the direction of eigenvector $\alpha^{(2)}$ (corresponding to changing $\varphi$) will not yield any change in the underlying quantum state. However, note that except for these singular parameters we find $G_C = 2$, which is equivalent to the maximal number of independent parameters of a qubit.

Further, consider the single-qubit circuit with Pauli $z$ matrix $\sigma^z$ and Hadamard gate $H_d$

$$U(\theta)|0\rangle = \prod_{i=1}^{M}\left[\exp\left(-i\frac{\theta_i}{2}\sigma^z\right)\right] H_d|0\rangle. \tag{11}$$

Here, we find $F = J_{M,M}$, where $J_{M,M}$ is an $M \times M$ matrix filled with ones. A diagonalization gives us $M - 1$ eigenvalues $\lambda = 0$, and one eigenvalue $\lambda = M$ with eigenvector $\alpha = \frac{1}{\sqrt{M}} J_{M,1}$. This circuit has a low parameter dimension $D_C = 1$ and a large redundancy of $R = (M - 1)/M$, i.e. there are $M - 1$ redundant parameters. The effective quantum dimension $G_C$ is upper bounded by the parameter dimension $D_C$ and the total number of parameters $M$

$$G_C(\theta) \leq D_C \leq M. \tag{12}$$

Given the aforementioned PQC type with a random set of parameters $\theta_{\mathrm{rand}} \in \mathrm{rand}(0, \pi)$, we find numeric evidence that $G_C(\theta_{\mathrm{rand}})$ is approximately equivalent to $D_C$

$$G_C(\theta_{\mathrm{rand}}) \simeq D_C. \tag{13}$$

The core intuition is that starting from a sufficiently random initial parameter set, a change of the PQC parameters in the right direction is able to bring one closer to any quantum state that can be expressed by the PQC. For specific choices of parameters such as $\theta = 0$ we find $G_C < D_C$. Moving sufficiently away from these special points, we recover that $G_C \simeq D_C$ (see Fig. 6).

We stress that Eq. (13) is not valid for arbitrary quantum circuits, e.g. circuits where the parameters do not enjoy a $2\pi$ periodicity. As a simple example, take the evolution of a single qubit with a single parameter $t$, $U(t)|0\rangle = \exp(-i\sqrt{\,\cdot\,}\,\sigma^z t)\exp(-i\sigma^x t)|0\rangle$. The evolution over all possible $t$ (note the absence of $2\pi$ periodicity) will cover all possible quantum states and thus $D_C = 2$, whereas the effective quantum dimension (with only a single parameter $t$) is $G_C = 1 < D_C$.

We now consider different types of hardware efficient PQC $|\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes N}$, which are circuits that can be efficiently run on NISQ quantum processors. We choose an initial state $|0\rangle^{\otimes N}$, followed by a single layer of the square root of the Hadamard gate ($\sqrt{H_d}$) on every qubit. Then, we repeat $p$ layers composed of parametrized single-qubit rotations and a set of two-qubit entangling gates (see Fig. 1a). The single-qubit rotations are either chosen randomly to be around the $\{x, y, z\}$ axis, or fixed to a specific axis. The two-qubit entangling gates are either CNOT, CPHASE or $\sqrt{\mathrm{iSWAP}}$ gates (see Fig. 1b), which are common native gates in current quantum processors [33]. The entangling gates in each layer are arranged in either a nearest-neighbor chain topology (CHAIN), all-to-all connections (ALL) or in an alternating nearest-neighbor fashion (ALT) (see Fig. 1c).
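The two single-qubit examples of Eqs. (9) and (11) can be checked numerically. The following is a minimal sketch, not the code used for the paper: it assumes the QFI convention with prefactor 4 (the one that reproduces Eq. (10)), approximates the state derivatives by central finite differences, and counts eigenvalues above a small numerical threshold instead of testing for exact zeros. The helper names `qfi`, `effective_dimension`, `single_qubit` and `z_rotations` are hypothetical.

```python
import numpy as np


def qfi(state_fn, theta, eps=1e-6):
    """Numerical QFI, F_ij = 4 Re(<d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi>)."""
    theta = np.asarray(theta, dtype=float)
    psi = state_fn(theta)
    # Central finite differences approximate each partial derivative |d_i psi>.
    grads = []
    for i in range(theta.size):
        shift = np.zeros_like(theta)
        shift[i] = eps
        grads.append((state_fn(theta + shift) - state_fn(theta - shift)) / (2 * eps))
    grads = np.array(grads)                      # shape (M, Hilbert-space dimension)
    gram = grads.conj() @ grads.T                # <d_i psi|d_j psi>
    overlap = grads.conj() @ psi                 # <d_i psi|psi>
    return 4 * np.real(gram - np.outer(overlap, overlap.conj()))


def effective_dimension(fisher, tol=1e-6):
    """G_C: number of QFI eigenvalues above a numerical threshold (assumed tol)."""
    return int(np.sum(np.linalg.eigvalsh(fisher) > tol))


def single_qubit(params):
    """Generic single-qubit state of Eq. (9)."""
    theta, phi = params
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])


def z_rotations(params):
    """Eq. (11): M commuting z rotations acting on H_d|0>."""
    total = np.sum(params)
    return np.array([np.exp(-0.5j * total), np.exp(0.5j * total)]) / np.sqrt(2)


print(effective_dimension(qfi(single_qubit, [0.7, 0.3])))           # generic point
print(effective_dimension(qfi(single_qubit, [np.pi, 0.3])))         # singular point
print(effective_dimension(qfi(z_rotations, [0.1, 0.5, 1.2, 2.0])))  # redundant circuit
```

Following Eqs. (10) and (11), the three calls are expected to report $G_C = 2$, $G_C = 1$ and $G_C = 1$ respectively. The threshold `tol` decides when a near singularity is counted as a singularity, so it should be chosen well above the finite-difference error.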
The numerical calculations are performed using Yao [34].

IV. RESULTS

As a demonstration of our methods, in Fig. 3 we provide an in-depth characterization of a particular PQC as a function of the number of layers $p$. Each layer consists of parametric single-qubit rotations around the $x$, $y$ or $z$ axis, which are randomly chosen at every qubit and layer, as well as CNOT gates in a chain topology (see CHAIN in Fig. 1). The parameter dimension $D_C$ (i.e. the number of independent parameters of the quantum state that can be represented by the PQC) increases linearly with $p$ in Fig. 3a, until it reaches the maximal possible value $D_C = 2^{N+1} - 2$ at a characteristic number of layers $p_c$. This point is reflected in the spectrum of the QFI $F$, averaged over random instances of the PQC (see Fig. 3b-d). Most notably, the variance of the logarithm of the non-zero eigenvalues reaches a maximum for $p_c$ (Fig. 3b). Further, the minimum taken over all eigenvalues becomes minimal (Fig. 3c). We can see this more clearly in the distribution of eigenvalues (Fig. 3d). With increasing $p$, the distribution becomes broader, with a pronounced tail of small eigenvalues of $F$ appearing close to the transition at $p_c$. Above the transition $p > p_c$, the small eigenvalues suddenly disappear from the distribution. We investigate the variance of the gradient and QNG in Fig. 3e. We note that the variance of the regular gradient decays with $p$, reaching a minimum around $p \approx 20$ [7], upon which it remains constant. The variance of the QNG remains larger than the regular gradient, however the QNG decays for $p > p_c$. In Fig. 3f, we numerically find that the variance of both regular gradient and QNG vanishes exponentially with increasing number of qubits $N$, demonstrating the barren plateau problem.

FIG. 3. Properties of a PQC consisting of $p$ layers of randomly chosen $x$, $y$, $z$ rotations, followed by CNOT gates in a chain topology (see Fig. 1) for $N = 10$ qubits. a) The parameter dimension $D_C$ of the circuit scales linearly with $p$, until it levels at a characteristic value $p_c \approx$ […]. b) Variance of the logarithm of the non-zero eigenvalues of $F$. The variance peaks around $p \approx p_c$. c) Minimal non-zero eigenvalue of $F$ against $p$. It increases for $p > p_c$. d) Histogram of the logarithm of the eigenvalues of the Fisher information matrix $F$. The width of the distribution increases with $p$, with a pronounced tail at small $F$ developing around $p \approx p_c$, which disappears for $p > p_c$. e) Variance of the gradient $\mathrm{var}(\partial_k E)$ and QNG $\mathrm{var}(F^{-1}\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z_1\sigma^z_2$. The gradient decays until $p \approx 20$, after which it remains constant. The QNG remains larger than the regular gradient, but decreases for $p > p_c$. f) Variance of gradients and QNG for varying qubit number $N$ for depth $p = 2N$, showing approximate exponential decrease with $N$.

In Fig. 4, we compare different types of PQCs with different entangling gates and arrangements. We keep the single-qubit rotations as randomly chosen rotations around the $\{x, y, z\}$ axis. We note that all circuits show the same qualitative behavior regarding the transition in the QFI (see Fig. 3 and supplemental materials) and also suffer from an exponential decrease of the variance of the gradient with increasing number of qubits. However, key differences between the PQCs remain, as we demonstrate below. We show the variance of the gradient for random PQC parameters in Fig. 4a,c,e for different arrangements of the entangling gates (CHAIN, ALL, ALT) as well as different types of entangling gates (CNOT, CPHASE, $\sqrt{\mathrm{iSWAP}}$). The variance decays with increasing $p$, until it reaches a constant level, the value of which is the same for all gates and arrangements. However, CPHASE requires the most layers $p$ to converge, followed by $\sqrt{\mathrm{iSWAP}}$ and CNOT.

FIG. 4. Variance of gradient and redundancy of different hardware efficient PQCs plotted against number of layers $p$ for $N = 10$ qubits. Each layer consists of single-qubit rotations around randomly chosen $\{x, y, z\}$ axes. We plot different arrangements of entangling gates (as shown in Fig. 1c) with a,b) nearest-neighbor one-dimensional chain, c,d) all-to-all, e,f) alternating nearest-neighbor connections. a,c,e) Variance of the gradient $\mathrm{var}(\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z_1\sigma^z_2$. b,d,f) Redundancy $R$ (Eq. (3)), which is the fraction of redundant parameters of the PQC.
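The decay of the gradient variance with the number of layers can be explored on a small scale with a plain state-vector simulation. The sketch below is illustrative only and is not the Yao-based code used for the figures. It assumes a CNOT chain, rotation axes drawn uniformly from $\{x, y, z\}$, angles drawn uniformly from $[0, 2\pi)$, a cost Hamiltonian acting with $\sigma^z$ on the first two qubits, and the gradient taken with respect to the first rotation angle via the parameter-shift rule. All function names are hypothetical.

```python
import numpy as np

PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}
HADAMARD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
# Principal square root of the Hadamard gate (SQRT_H @ SQRT_H equals HADAMARD).
SQRT_H = 0.5 * (1 + 1j) * np.eye(2) + 0.5 * (1 - 1j) * HADAMARD


def apply_1q(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit (qubit 0 is the most significant bit)."""
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [qubit]))
    return np.moveaxis(psi, 0, qubit).reshape(-1)


def apply_cnot(state, control, target, n):
    """CNOT as a permutation of the basis-state amplitudes."""
    idx = np.arange(2 ** n)
    control_bit = (idx >> (n - 1 - control)) & 1
    return state[idx ^ (control_bit << (n - 1 - target))]


def pqc_state(thetas, axes, n):
    """sqrt(H_d) layer, then p layers of rotations followed by a CNOT chain."""
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for q in range(n):
        state = apply_1q(state, SQRT_H, q, n)
    for layer_thetas, layer_axes in zip(thetas, axes):
        for q in range(n):
            t = layer_thetas[q]
            rot = np.cos(t / 2) * np.eye(2) - 1j * np.sin(t / 2) * PAULI[layer_axes[q]]
            state = apply_1q(state, rot, q, n)
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1, n)
    return state


def energy(state, n):
    """Expectation value of sigma^z on qubit 0 times sigma^z on qubit 1."""
    idx = np.arange(2 ** n)
    z0 = 1 - 2 * ((idx >> (n - 1)) & 1)
    z1 = 1 - 2 * ((idx >> (n - 2)) & 1)
    return float(np.sum(np.abs(state) ** 2 * z0 * z1))


def gradient(thetas, axes, n, layer=0, qubit=0):
    """Parameter-shift gradient, exact for rotations exp(-i theta sigma / 2)."""
    shift = np.zeros_like(thetas)
    shift[layer, qubit] = np.pi / 2
    return 0.5 * (energy(pqc_state(thetas + shift, axes, n), n)
                  - energy(pqc_state(thetas - shift, axes, n), n))


def gradient_variance(n, p, samples=200, seed=0):
    """Sample variance of the gradient over random circuit instances."""
    rng = np.random.default_rng(seed)
    grads = []
    for _ in range(samples):
        thetas = rng.uniform(0, 2 * np.pi, size=(p, n))
        axes = rng.choice(["x", "y", "z"], size=(p, n))
        grads.append(gradient(thetas, axes, n))
    return float(np.var(grads))


for p in (1, 4, 16):
    print(p, gradient_variance(n=4, p=p))
```

The qubit number and sample count here are far smaller than in Figs. 3 and 4, so the printed values only indicate the trend and carry sizeable statistical noise. Swapping `apply_cnot` for another two-qubit gate would be the natural way to compare entanglers.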
Fig. 4b,d,f shows the redundancy $R$, which is the fraction of redundant parameters of the PQC. It quickly reaches a constant level with increasing $p$. $\sqrt{\mathrm{iSWAP}}$ has consistently low $R$, while for CNOT it varies strongly depending on the arrangement of entangling gates. For CPHASE, we have consistently larger $R$. This can be easily understood when considering that $z$ rotations commute with the entangling CPHASE layer. When two $z$ rotations appear consecutively on the same qubit, they yield a redundant parameter.

We note that for these PQCs the number of layers $p_c$ at which the transition of the QFI occurs can be estimated from the value of redundancy $R$. We find $p_c \approx (1 - R_C) D_C / N$, where $R_C$ is the converged value of $R$. The eigenvalue spectrum of these PQCs and further types of PQCs are discussed in the supplemental materials C.

In Fig. 5, we fix the single-qubit rotations around the $y$-axis and investigate different entangling gates arranged in a nearest-neighbor one-dimensional chain. Depending on the choice of entangling gates, we find that the variance of the gradient decays to a different constant level with increasing $p$ (see Fig. 5a). $y$ $\sqrt{\mathrm{iSWAP}}$ matches the variance found in Fig. 3e, whereas $y$ CNOT and $y$ CPHASE have higher variance. In Fig. 5b we show the maximal $D_C$ for many layers $p$. $D_C$ scales exponentially for $y$ CNOT ($D_C \propto 2^N$) and $y$ $\sqrt{\mathrm{iSWAP}}$ ($D_C \propto 2^{N+1}$), whereas for $y$ CPHASE we find numerically an approximate quadratic scaling $D_C \propto N^2$.

In Fig. 6 we show how $G_C$ and the variance of the gradient change when tuning the parameters of a PQC defined as $U(a\theta_{\mathrm{rand}})|0\rangle$, $\theta_{\mathrm{rand}} \in [0, \pi)$, $a \in [0, 1]$. Tuning from $a = 0$ to $a = 1$ corresponds to changing the PQC from parameters all zero to a PQC with random parameters. As an example, we show a PQC consisting of layered randomly chosen single-qubit rotations around the $x$, $y$, $z$ axis and entangling gates arranged in a chain. In Fig. 6a, we show different types of entangling gates.
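The tuning $\theta = a\theta_{\mathrm{rand}}$ described above can be sketched for a few qubits by building the circuit from explicit matrices and counting the non-zero QFI eigenvalues at each $a$. This is a rough illustration under stated assumptions, not the authors' implementation: it uses a CNOT chain on $N = 3$ qubits, angles drawn from $[0, 2\pi)$, finite-difference derivatives, the QFI with prefactor 4, and an arbitrary eigenvalue threshold `tol`. All names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}
HADAMARD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
SQRT_H = 0.5 * (1 + 1j) * I2 + 0.5 * (1 - 1j) * HADAMARD  # squares to HADAMARD
P0 = np.diag([1, 0]).astype(complex)  # projector on |0>
P1 = np.diag([0, 1]).astype(complex)  # projector on |1>


def embed(gate, qubit, n):
    """Full 2^n x 2^n operator acting with `gate` on a single qubit."""
    full = np.eye(1, dtype=complex)
    for q in range(n):
        full = np.kron(full, gate if q == qubit else I2)
    return full


def cnot_chain(n):
    """Product of CNOT(q, q+1) gates along a nearest-neighbor chain."""
    layer = np.eye(2 ** n, dtype=complex)
    for q in range(n - 1):
        cnot = embed(P0, q, n) + embed(P1, q, n) @ embed(PAULI["x"], q + 1, n)
        layer = cnot @ layer
    return layer


def pqc_state(flat_thetas, axes, n, entangler):
    """sqrt(H_d) layer, then layers of rotations followed by the entangler."""
    thetas = np.reshape(flat_thetas, axes.shape)
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for q in range(n):
        state = embed(SQRT_H, q, n) @ state
    for layer_thetas, layer_axes in zip(thetas, axes):
        for q in range(n):
            t = layer_thetas[q]
            rot = np.cos(t / 2) * I2 - 1j * np.sin(t / 2) * PAULI[layer_axes[q]]
            state = embed(rot, q, n) @ state
        state = entangler @ state
    return state


def effective_dimension(flat_thetas, axes, n, entangler, eps=1e-6, tol=1e-6):
    """Count QFI eigenvalues above tol, with finite-difference derivatives."""
    psi = pqc_state(flat_thetas, axes, n, entangler)
    grads = []
    for i in range(flat_thetas.size):
        shift = np.zeros_like(flat_thetas)
        shift[i] = eps
        grads.append((pqc_state(flat_thetas + shift, axes, n, entangler)
                      - pqc_state(flat_thetas - shift, axes, n, entangler)) / (2 * eps))
    grads = np.array(grads)
    overlap = grads.conj() @ psi
    fisher = 4 * np.real(grads.conj() @ grads.T - np.outer(overlap, overlap.conj()))
    return int(np.sum(np.linalg.eigvalsh(fisher) > tol))


n, p = 3, 6
rng = np.random.default_rng(0)
axes = rng.choice(["x", "y", "z"], size=(p, n))
theta_rand = rng.uniform(0, 2 * np.pi, size=p * n)
entangler = cnot_chain(n)
for a in (0.0, 0.01, 0.1, 1.0):
    print(a, effective_dimension(a * theta_rand, axes, n, entangler))
```

Because near singularities have small but non-zero eigenvalues, the counts at small $a$ depend on the chosen `tol`. For three qubits the count can never exceed $2^{N+1} - 2 = 14$, whatever the number of parameters.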
$G_C$ increases with $a$, reaching the parameter dimension $D_C$ for $a = 1$. CNOT and $\sqrt{\mathrm{iSWAP}}$ increase faster with $a$ compared to the PQC with CPHASE gates. The variance of the gradient decreases sharply once a particular $a$ is reached. Note that there is a specific range of parameters, $\log(a) \approx -$[…], where both $G_C$ and the variance of the gradients remain large.

FIG. 5. Capacity of PQCs with $y$ rotations and different entangling gates. The entangling layer is arranged as a nearest-neighbor one-dimensional chain. Three of the PQCs have $y$ rotations, and as reference we show a PQC with randomized $x$, $y$ or $z$ rotations and CNOT gates. a) Variance of the gradient $\mathrm{var}(\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z_1\sigma^z_2$. b) Maximal parameter dimension $D_C$ of the PQCs as a function of the number of qubits $N$. For $y$ CPHASE we find an approximate power law $D_C \propto N^2$.

FIG. 6. Tuning the PQC parameters $\theta = a\theta_{\mathrm{rand}}$, where $a \in (0, 1]$ and $\theta_{\mathrm{rand}} \in [0, \pi)$, for circuits composed of random $x$, $y$ and $z$ rotations and entangling gates arranged in a chain configuration. a) The effective quantum dimension $G_C$ as a function of $\log(a)$. The black dashed-dotted line is the number of parameters $M$. b) Variance of the gradient $\mathrm{var}(\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z_1\sigma^z_2$. All plots show number of layers $p = 100$ and $N = 10$ qubits.

Finally, in Fig. 7 we show the scaling of $G_C(\theta = 0, N)$ with the number of qubits $N$ for a PQC with entangling gates in a chain arrangement initialized with $\theta = 0$, corresponding to the point $a = 0$ in Fig. 6. Numerically, we find linear scaling of $G_C(\theta = 0, N)$ for CPHASE entangling gates, quadratic scaling for CNOT gates and higher order polynomial or even exponential scaling for $\sqrt{\mathrm{iSWAP}}$ gates.

V. DISCUSSION

We investigated the capacity and trainability of hardware efficient PQCs using the quantum geometric structure of the parameter space. We introduced the notion of parameter dimension $D_C$ and effective quantum dimension $G_C$, which are global and local measures respectively of the space of quantum states that can be accessed by the PQC. Both can be derived from the QFI. We applied these concepts to exemplary PQCs composed of layers of single-qubit rotations and different types of entangling gates arranged in various geometries (see Fig. 1). For comparable circuit depth $p$, we find strong numerical evidence that PQCs constructed from CNOT or $\sqrt{\mathrm{iSWAP}}$ gates have lower variance of the gradient, and thus higher expressibility compared to PQCs with CPHASE gates. For a specific type of PQC composed of $y$ rotations and CPHASE gates, $D_C$ scales only quadratically with the number of qubits, which may imply that this PQC can be efficiently simulated on classical computers. We find that the redundancy of parameters varies strongly depending on the configuration of the PQC as well as the type of gates. The redundancy could be systematically reduced by choosing appropriate single-qubit rotations or by using methods of [35] combined with the QFI.

FIG. 7. Effective quantum dimension $G_C(\theta = 0)$ plotted against number of qubits $N$ for a circuit consisting of randomly chosen parametrized rotations around the $x$, $y$ or $z$ axis with parameters $\theta = 0$, and two-qubit entangling gates arranged in a nearest-neighbor chain. We compare CNOT, CPHASE and $\sqrt{\mathrm{iSWAP}}$ entangling gates. From numerical results, we find that $G_C$ scales quadratically for CNOT gates, linearly for CPHASE gates and as a higher order polynomial or even exponentially for $\sqrt{\mathrm{iSWAP}}$. The number of layers $p$ is chosen such that $G_C(\theta = 0)$ is maximized.

The effective quantum dimension $G_C$ shows the expressive power of a PQC by local variations around a specific parameter set.
We find that depending on the entangling gates, $G_C$ can scale very differently with the number of qubits, with the largest value found for $\sqrt{\mathrm{iSWAP}}$ gates. While we only studied the case $\theta = 0$, PQCs with correlated parameters could feature similar behavior [18]. Tuning the parameters of a PQC from zero to a random set of parameters yields a crossover from large gradients and small $G_C$ to vanishing gradients and large $G_C$. For the PQCs investigated, we can find a range of parameters that combines large gradients with a nearly maximal $G_C$, which could be an optimal starting point for gradient based optimisation. Trade-offs between the expressibility of a circuit and the magnitude of its gradients are a key challenge in finding good initialization strategies [31].

When increasing the number of layers $p$ to a value $p_c$, a transition occurs in the QFI when $D_C$ reaches its maximal possible value. The transition is characterized by a disappearance of small eigenvalues of the QFI and a peak in the variance of the logarithm of eigenvalues. This transition may be related to a phase transition in the optimization landscape of control theory. When the number of parameters reaches a threshold, the optimization landscape changes from being spin-glass like with many near-degenerate minima to one with many degenerate global minima [36, 37]. For deep circuits $p > p_c$, the transition leads to a decay of the QNG as small eigenvalues are suppressed. For shallow circuits $p < p_c$, the QNG can be orders of magnitude larger in value compared to the regular gradient, however our numerical results suggest that both regular gradient and QNG decrease exponentially with the number of qubits. Thus, the QNG most likely cannot help to solve the barren plateau problem.
This contrasts with the natural gradient in classical machine learning, which is known to be able to overcome the plateau phenomena that lead to a slowdown of optimization [32].

Imaginary-time evolution and variational quantum simulation use a matrix related to the QFI to update the parameters of the PQC [28, 38]. The effective quantum dimension $G_C$ could give major insights on the convergence properties of these algorithms. Recent proposals for adaptively generating ansätze could benefit from the QFI by taking the geometry of the PQC into account when designing PQCs [20].

While cumbersome, we note that the QFI can be determined via measurement of overlaps on the quantum processor [28, 39, 40]. However, in order to evaluate a type of PQC, it is often sufficient to study circuits of a few qubits via classical simulation [41], and extrapolate the results.

During the training of a hardware efficient PQC, the eigenvalue spectrum of the QFI can gain specific features, as has been shown for restricted Boltzmann machines [42]. We show that the PQCs have characteristic eigenvalue spectra depending on their configuration (see also supplemental materials). The eigenvalues hold important information about the trainability and generalization of a model. For example, a model that generalizes well is known to have a low effective dimension in classical machine learning [26]. It would be interesting to study in what way these statements translate to quantum machine learning. Further, connections to complementary measures of capacity based on classical Fisher information [43] and memory capacity [44] respectively could be explored.

It would be straightforward to extend the concepts of quantum geometry to evaluate the capacity and trainability of noisy PQCs [45], convolutional PQCs [46], optimal control [47], quantum metrology [48] and programmable analog quantum simulators [49].

Python code for the numerical calculations performed in this work is available at [50].

Acknowledgements.

This work is supported by a Samsung GRP project and the UK Hub in Quantum Computing and Simulation, part of the UK National Quantum Technologies Programme with funding from UKRI EPSRC grant EP/T001062/1. We are grateful to the National Research Foundation and the Ministry of Education, Singapore for financial support.

[1] J. Preskill, Quantum, 79 (2018).
[2] K. Bharti, A. Cervera-Lierta, T. H. Kyaw, T. Haug, S. Alperin-Lea, A. Anand, M. Degroote, H. Heimonen, J. S. Kottmann, T. Menke, W.-K. Mok, S. Sim, L.-C. Kwek, and A. Aspuru-Guzik, arXiv:2101.08448 (2021).
[3] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, Nature communications, 4213 (2014).
[4] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics, 023023 (2016).
[5] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al., arXiv preprint arXiv:2012.09265 (2020).
[6] Y. Cao, J. Romero, J. P. Olson, M. Degroote, P. D. Johnson, M. Kieferová, I. D. Kivlichan, T. Menke, B. Peropadre, N. P. Sawaya, et al., Chemical reviews, 10856 (2019).
[7] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature communications, 4812 (2018).
[8] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, arXiv:2001.00550 (2020).
[9] C. O. Marrero, M. Kieferová, and N. Wiebe, arXiv:2010.15968 (2020).
[10] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, arXiv:2007.14384 (2020).
[11] L. Bittel and M. Kliesch, arXiv:2101.07267 (2021).
[12] H.-Y. Huang, K. Bharti, and P. Rebentrost, arXiv:1909.07344 (2019).
[13] K. Bharti, arXiv:2009.11001 (2020).
[14] K. Bharti and T. Haug, arXiv:2011.06911 (2020).
[15] K. Bharti and T. Haug, arXiv:2010.05638 (2020).
[16] T. Haug and K. Bharti, arXiv:2011.14737 (2020).
[17] J. W. Z. Lau, K. Bharti, T. Haug, and L. C. Kwek, arXiv:2101.07677 (2021).
[18] T. Volkoff and P. J. Coles, arXiv preprint arXiv:2005.12200 (2020).
[19] E. Grant, L. Wossnig, M. Ostaszewski, and M. Benedetti, Quantum, 214 (2019).
[20] H. R. Grimsley, S. E. Economou, E. Barnes, and N. J. Mayhall, Nature communications, 1 (2019).
[21] A. Skolik, J. R. McClean, M. Mohseni, P. van der Smagt, and M. Leib, arXiv preprint arXiv:2006.14904 (2020).
[22] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Nature, 242 (2017).
[23] K. Nakaji and N. Yamamoto, arXiv:2005.12537 (2020).
[24] S. Sim, P. D. Johnson, and A. Aspuru-Guzik, Advanced Quantum Technologies, 1900070 (2019).
[25] D. J. MacKay, in Advances in neural information processing systems (1992) pp. 839–846.
[26] W. J. Maddox, G. Benton, and A. G. Wilson, arXiv preprint arXiv:2003.02139 (2020).
[27] N. Yamamoto, arXiv:1909.05074 (2019).
[28] J. Stokes, J. Izaac, N. Killoran, and G. Carleo, Quantum, 269 (2020).
[29] D. Wierichs, C. Gogolin, and M. Kastoryano, arXiv preprint arXiv:2004.14666 (2020).
[30] Y. Du, M.-H. Hsieh, T. Liu, and D. Tao, Phys. Rev. Res., 033125 (2020).
[31] Z. Holmes, K. Sharma, M. Cerezo, and P. J. Coles, arXiv preprint arXiv:2101.02138 (2021).
[32] S.-i. Amari, Information geometry and its applications, Vol. 194 (Springer, 2016).
[33] P. Krantz, M. Kjaergaard, F. Yan, T. P. Orlando, S. Gustavsson, and W. D. Oliver, Applied Physics Reviews, 021318 (2019).
[34] X.-Z. Luo, J.-G. Liu, P. Zhang, and L. Wang, Quantum, 341 (2020).
[35] L. Funcke, T. Hartung, K. Jansen, S. Kühn, and P. Stornati, arXiv preprint arXiv:2011.03532 (2020).
[36] M. Bukov, A. G. R. Day, D. Sels, P. Weinberg, A. Polkovnikov, and P. Mehta, Phys. Rev. X, 031086 (2018).
[37] H. A. Rabitz, M. M. Hsieh, and C. M. Rosenthal, Science, 1998 (2004).
[38] S. McArdle, T. Jones, S. Endo, Y. Li, S. C. Benjamin, and X. Yuan, npj Quantum Information, 1 (2019).
[39] X. Yuan, S. Endo, Q. Zhao, Y. Li, and S. C. Benjamin, Quantum, 191 (2019).
[40] K. Mitarai and K. Fujii, Physical Review Research, 013006 (2019).
[41] T. Jones, arXiv preprint arXiv:2011.02991 (2020).
[42] C.-Y. Park and M. J. Kastoryano, Physical Review Research, 023232 (2020).
[43] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner, arXiv preprint arXiv:2011.00027 (2020).
[44] L. G. Wright and P. L. McMahon, in CLEO: QELS Fundamental Science (Optical Society of America, 2020) pp. JM4G–5.
[45] B. Koczor and S. C. Benjamin, arXiv preprint arXiv:1912.08660 (2019).
[46] I. Cong, S. Choi, and M. D. Lukin, Nature Physics, 1273 (2019).
[47] A. B. Magann, C. Arenz, M. D. Grace, T.-S. Ho, R. L. Kosut, J. R. McClean, H. A. Rabitz, and M. Sarovar, PRX Quantum, 010101 (2020).
[48] J. J. Meyer, J. Borregaard, and J. Eisert, arXiv preprint arXiv:2006.06303 (2020).
[49] V. Bastidas, T. Haug, C. Gravel, L.-C. Kwek, W. Munro, and K. Nemoto, arXiv:2009.00823 (2020).
[50] T. Haug, "Quantum geometry of parametrized quantum circuits," https://github.com/txhaug/quantum-geometry.
[51] S.-I. Amari, Neural computation, 251 (1998).

Appendix A: Variational quantum eigensolver

The core idea of the variational quantum eigensolver (VQE) is to find the ground state of a Hamiltonian $H$ by optimizing the parameters $\theta$ of a PQC with respect to an objective function that represents the energy of the given Hamiltonian, $E(\theta) = \langle 0 | U^\dagger(\theta) H U(\theta) | 0 \rangle$ [3]. The minimisation is performed with a classical optimisation algorithm, whereas the energy is measured on a quantum device. According to the Ritz variational principle, the objective function is lower bounded by the ground state energy of $H$, i.e. $E(\theta) \ge E_g$, where $E_g$ is the true ground state energy of $H$.

Appendix B: Quantum Fisher information metric

For VQE, the objective function is minimized by a hybrid classical-quantum algorithm in an iterative manner. At step $n$ of the procedure, the objective function is evaluated on the quantum computer for a given $\theta_n$. Based on the result, a classical computer selects the next choice $\theta_{n+1}$ such that it (hopefully) decreases the objective function. A common scheme to update the parameters is ordinary gradient descent

$$\theta_{n+1} = \theta_n - \eta \frac{\partial E(\theta)}{\partial \theta}, \tag{B1}$$

where $\eta$ is a small coefficient and $\partial E(\theta)/\partial \theta$ is the gradient of the objective function.

The above update rule assumes that the parameter space for $\theta$ is a flat Euclidean space. However, in general this is not the case, as the underlying PQC and cost function do not have such simple forms. Recent studies have proposed the quantum natural gradient (QNG), inspired by the natural gradient in classical machine learning [51], to minimize the objective function [27, 28]. The main idea is to use information about how fast the quantum state changes when adjusting the parameter $\theta$ in a particular direction. Optimisation with the natural gradient updates the parameters according to

$$\theta_{k+1} = \theta_k - \eta_k F^{-1}(\theta) \frac{\partial E(\theta)}{\partial \theta}, \tag{B2}$$

where $F(\theta)$ is the Fubini-Study metric tensor or quantum Fisher information metric (QFI)

$$F_{ij} = \mathrm{Re}\left( \langle \partial_i \psi | \partial_j \psi \rangle - \langle \partial_i \psi | \psi \rangle \langle \psi | \partial_j \psi \rangle \right), \tag{B3}$$

where $|\partial_i \psi\rangle = \frac{\partial}{\partial \theta_i} |\psi(\theta)\rangle$ denotes the partial derivative of $|\psi(\theta)\rangle$. One can relate $F(\theta)$ to the distance in the space of pure quantum states, which is the Fubini-Study distance given by

$$\mathrm{Dist}_Q\big( |\psi(\theta)\rangle, |\psi(\theta + \mathrm{d}\theta)\rangle \big)^2 = \sum_{i,j} F_{ij}(\theta)\, \mathrm{d}\theta_i\, \mathrm{d}\theta_j, \tag{B4}$$

where $\mathrm{Dist}_Q(x, y) = \arccos |\langle x | y \rangle|$.

The QNG has been demonstrated to speed up gradient-based optimization techniques for PQCs [27, 28] and to avoid local minima [29].
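As a rough numerical illustration of Eqs. (B1)-(B3), the sketch below evaluates the QFI of a toy two-qubit state by central finite differences and compares one ordinary gradient step with one QNG step. It is not the authors' code from [50]. The toy circuit (an $R_y$ rotation on each qubit followed by a CNOT), the Hamiltonian $\sigma^z \otimes \sigma^z$, the step size, the use of a pseudo-inverse in place of $F^{-1}$, and all function names are assumptions made purely for illustration.

```python
import numpy as np

Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
ZZ = np.diag([1.0, -1.0, -1.0, 1.0])  # assumed toy Hamiltonian sigma^z (x) sigma^z


def state(theta):
    """Toy state (illustrative): R_y(theta_0) (x) R_y(theta_1) on |00>, then CNOT."""
    ry = lambda t: np.cos(t / 2) * I2 - 1j * np.sin(t / 2) * Y
    psi0 = np.zeros(4, dtype=complex)
    psi0[0] = 1.0
    return CNOT @ np.kron(ry(theta[0]), ry(theta[1])) @ psi0


def energy(theta):
    """Objective E(theta) = <psi(theta)| H |psi(theta)>."""
    psi = state(theta)
    return float(np.real(psi.conj() @ ZZ @ psi))


def central_diff(f, theta, eps=1e-6):
    """Row i is d f / d theta_i, estimated by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    rows = []
    for i in range(theta.size):
        shift = np.zeros_like(theta)
        shift[i] = eps
        rows.append((f(theta + shift) - f(theta - shift)) / (2 * eps))
    return np.array(rows)


def qfi(theta):
    """Eq. (B3): F_ij = Re(<d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi>)."""
    psi = state(theta)
    dpsi = central_diff(state, theta)       # shape (M, 4), row i = |d_i psi>
    gram = dpsi.conj() @ dpsi.T             # <d_i psi | d_j psi>
    overlap = dpsi.conj() @ psi             # <d_i psi | psi>
    return np.real(gram - np.outer(overlap, overlap.conj()))


def gd_step(theta, eta=0.1):
    """Eq. (B1): ordinary gradient descent."""
    return theta - eta * central_diff(energy, theta)


def qng_step(theta, eta=0.1):
    """Eq. (B2); the pseudo-inverse is an assumption to guard against a singular F."""
    grad = central_diff(energy, theta)
    return theta - eta * np.linalg.pinv(qfi(theta)) @ grad


theta = np.array([0.3, 0.7])
print("QFI:\n", qfi(theta))
print("gradient step:", gd_step(theta))
print("QNG step:     ", qng_step(theta))
```

For this particular toy state the QFI works out analytically to one quarter of the identity, so the QNG step is simply four times the ordinary gradient step. For generic PQCs the metric is not proportional to the identity, and the two update directions differ.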
Efficient classical methods to calculate the quantum Fisher information matrix have been proposed [41].

For the purpose of this study, we investigate PQCs of the form $|\psi(\theta)\rangle = U(\theta)|0\rangle$, where we assume $U(\theta)$ to be a hardware-efficient variational circuit, which consists of $p$ layers of parametrized single-qubit rotations followed by fixed entangling gates

$$U(\theta)|0\rangle = \prod_{l=p}^{1} \left[ W_l V_l(\theta_l) \right] P |0\rangle, \tag{B5}$$

where $V_l(\theta_l)$ consists of parametric single-qubit rotations, $P$ is the first layer of single-qubit rotations and $W_l$ are non-parametric entangling gates.

We assume that the quantum processor has a tensored structure of $N$ qubits with Hilbert space dimension $\mathcal{N} = 2^N$. The quantum circuit is parametrized by $\theta$, a vector of $M = pN$ real numbers, where we assume that $\theta_l$ are the $N$ parameters that govern the unitary $V_l(\theta_l)$ for the $l$-th layer of the circuit [28]. We define the following short form for representing subcircuits between layers $l_1 \le l_2$

$$U_{[l_1:l_2]} := W_{l_2} V_{l_2} \cdots W_{l_1} V_{l_1}. \tag{B6}$$

We now show how to calculate the derivative with respect to a parameter $\theta_{l,n}$, which is the parameter of the single-qubit rotation on the $n$-th qubit within the $l$-th layer. We write $V_l = \prod_{n=1}^{N} \exp\left(-i \frac{\theta_{l,n}}{2} \sigma_n^{\alpha}\right)$, where $\sigma_n^{\alpha}$ is a Pauli operator with $\alpha \in \{x, y, z\}$ acting on qubit $n$. We define $\partial_{l,n} = \frac{\partial}{\partial \theta_{l,n}}$ as the partial derivative for the parameter $\theta_{l,n}$, which controls the single-qubit rotation on the $n$-th qubit in the $l$-th layer

$$\partial_{l,n} V_l(\theta_l) = -\frac{i}{2} \sigma_n^{\alpha} V_l(\theta_l), \tag{B7}$$

where $\alpha$ is a function of $n$ and $l$. For the full unitary, with the final layer index $L = p$ and using that $\sigma_n^{\alpha}$ commutes with $V_l(\theta_l)$, we find

$$\begin{aligned}
\partial_{l,n} U(\theta)|0\rangle &= U_{(l:L]} W_l\, \partial_{l,n} V_l(\theta_l)\, U_{[1:l)} P |0\rangle \\
&= U_{(l:L]} W_l V_l(\theta_l) \left(-\frac{i}{2} \sigma_n^{\alpha}\right) U_{[1:l)} P |0\rangle \\
&= U_{[l:L]} \left(-\frac{i}{2} \sigma_n^{\alpha}\right) U_{[1:l)} P |0\rangle.
\end{aligned} \tag{B8}$$

With Eq. (B8) inserted into Eq. (B3), we can calculate the QFI.

Appendix C: Further data on the PQCs

In Fig. 8, we show further types of PQCs as defined in the caption. We highlight that the PQC rand($xyw$) CPHASE has lower redundancy compared to rand($xyz$) CPHASE. The reason is that the $z$ rotations, which can commute with the CPHASE layer, are replaced with non-commuting $(x+y)/\sqrt{2}$ rotations. We also show the PQC $zxz$ CNOT, which was first introduced in [22]. We note that while it has three rotations per qubit and layer, the decay of the variance of the gradient as a function of $p$ is the same as for rand($xyz$) CNOT. Finally, we show further examples of the transition in the QFI, visible both in the peak of the variance of the logarithm of the eigenvalues and in the decay of the QNG.

[Figure 8: panels a-d plotted against $p$; legend: rand($xyz$) CNOT, rand($xyz$) CPHASE, rand($xyw$) CPHASE, $zxz$ CNOT]

FIG. 8. Properties of further PQCs plotted against the number of layers $p$ for $N = 10$ qubits. PQCs have nearest-neighbor chain entangling layers. We define the type of PQCs in the legend: rand($xyz$) denotes randomized single-qubit rotations around the $\{x, y, z\}$ axes. rand($xyw$) denotes randomized single-qubit rotations around the $\{x, y, (x+y)/\sqrt{2}\}$ axes. $zxz$ denotes that for every layer there are three single-qubit rotations, around the $z$, $x$ and $z$ axes. a) Variance of the gradient $\mathrm{var}(\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z \sigma^z$. b) Variance of the QNG $\mathrm{var}(F^{-1} \partial_k E)$. c) Redundancy $R$ of parameters of the PQCs. d) Variance of the logarithm of the eigenvalues of the QFI.
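The eigenvalue statistic of panel d can be explored in miniature with the following sketch, which builds the QFI of a small layered circuit from the analytic state derivatives of Eq. (B8) and then summarizes its spectrum. It is a minimal illustration under stated assumptions, not the code of [50]: the system size ($N = 3$, $p = 4$), the choice $P = R_y(\pi/4)$ on every qubit, the CNOT chain, the base-10 logarithm, the eigenvalue cutoff `tol`, and the helper names `embed`, `rotation`, `cnot_chain` and `spectrum_summary` are all hypothetical.

```python
import numpy as np

PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def embed(op, n, N):
    """Place a single-qubit operator `op` on qubit n of an N-qubit register."""
    out = np.ones((1, 1), dtype=complex)
    for q in range(N):
        out = np.kron(out, op if q == n else np.eye(2))
    return out


def rotation(axis, angle, n, N):
    """exp(-i * angle / 2 * sigma^axis) acting on qubit n."""
    return (np.cos(angle / 2) * np.eye(2**N)
            - 1j * np.sin(angle / 2) * embed(PAULI[axis], n, N))


def cnot_chain(N):
    """Entangling layer W: CNOTs on neighbouring qubits (q, q+1) along a chain."""
    p0, p1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
    W = np.eye(2**N, dtype=complex)
    for q in range(N - 1):
        cnot = embed(p0, q, N) + embed(p1, q, N) @ embed(PAULI["x"], q + 1, N)
        W = cnot @ W
    return W


def qfi(thetas, axes):
    """QFI of Eq. (B3) for the layered circuit of Eq. (B5); inputs have shape (p, N)."""
    p, N = thetas.shape
    W = cnot_chain(N)
    psi = np.zeros(2**N, dtype=complex)
    psi[0] = 1.0
    for n in range(N):  # fixed first layer P (assumed: R_y(pi/4) on each qubit)
        psi = rotation("y", np.pi / 4, n, N) @ psi
    derivs = []
    for l in range(p):
        V = np.eye(2**N, dtype=complex)
        for n in range(N):
            V = rotation(str(axes[l, n]), thetas[l, n], n, N) @ V
        psi = V @ psi
        derivs = [V @ d for d in derivs]
        # Eq. (B7): the derivative inserts -(i/2) sigma_n right after V_l
        derivs += [-0.5j * embed(PAULI[str(axes[l, n])], n, N) @ psi for n in range(N)]
        psi = W @ psi
        derivs = [W @ d for d in derivs]  # propagate through the rest of the circuit
    D = np.array(derivs)                  # row i = |d_i psi>
    overlap = D.conj() @ psi              # <d_i psi | psi>
    return np.real(D.conj() @ D.T - np.outer(overlap, overlap.conj()))


def spectrum_summary(F, tol=1e-10):
    """Eigenvalues, number of eigenvalues above tol, variance of log10 of those."""
    eig = np.linalg.eigvalsh(F)
    nonzero = eig[eig > tol]
    return eig, len(nonzero), float(np.var(np.log10(nonzero)))


rng = np.random.default_rng(0)
N, p = 3, 4
thetas = rng.uniform(0, 2 * np.pi, size=(p, N))
axes = rng.choice(["x", "y", "z"], size=(p, N))  # rand(xyz)-style random axes
F = qfi(thetas, axes)
eig, rank, var_log = spectrum_summary(F)
print("non-zero eigenvalues:", rank, "of", F.shape[0])
print("var(log10(eig(F))):", var_log)
```

Dense $2^N \times 2^N$ matrices keep the sketch short but limit it to a handful of qubits. Reproducing the $N = 10$ data of Fig. 8 would call for a state-vector simulator that applies gates without building full matrices, together with averaging over many random parameter draws.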

Appendix D: Histograms of eigenvalues

In Fig. 9 we show the distribution of eigenvalues for the PQCs of Fig. 4 in the main text. We find a characteristic spectrum for the different PQC types. Note that CPHASE appears to have more pronounced tails in all cases.

[Figure 9: panels a-c, $P(F)/P_{\max}$ plotted against $\log(F)$; legend: CNOT, CPHASE, iSWAP]

FIG. 9. Distribution of eigenvalues of the QFI for the PQCs shown in Fig. 4 in the main text. a) nearest-neighbor chain arrangement of entangling gates b) all-to-all connectivity c) alternating nearest-neighbor. All graphs for $N = 10$ qubits and number of layers $p$