Capacity and quantum geometry of parametrized quantum circuits
Tobias Haug, Kishor Bharti, and M. S. Kim
QOLS, Blackett Laboratory, Imperial College London SW7 2AZ, UK
Centre for Quantum Technologies, National University of Singapore 117543, Singapore
To harness the potential of noisy intermediate-scale quantum devices, it is paramount to find the best type of circuits to run hybrid quantum-classical algorithms. Key candidates are parametrized quantum circuits that can be effectively implemented on current devices. Here, we evaluate the capacity and trainability of these circuits using the geometric structure of the parameter space via the effective quantum dimension, which reveals the expressive power of circuits in general as well as of particular initialization strategies. We assess the representation power of various popular circuit types and find striking differences depending on the type of entangling gates used. Particular circuits are characterized by scaling laws in their expressiveness. We identify a transition in the quantum geometry of the parameter space, which leads to a decay of the quantum natural gradient for deep circuits. For shallow circuits, the quantum natural gradient can be orders of magnitude larger in value compared to the regular gradient; however, both of them can suffer from vanishing gradients. By tuning a fixed set of circuit parameters to randomized ones, we find a region where the circuit is expressive, but does not suffer from barren plateaus, hinting at a good way to initialize circuits. Our results enhance the understanding of parametrized quantum circuits for improving variational quantum algorithms.
Quantum computers promise to tackle problems that are challenging for classical computers, such as drug design, combinatorial optimisation and simulation of many-body physics. While fully-fledged large-scale quantum computers with error correction are not expected to be available for many years, noisy intermediate-scale quantum (NISQ) devices have been investigated as a way to approach computationally hard problems with quantum processors available now and in the near future [1, 2].

Variational quantum algorithms (VQA) [3–6] have been a major hope in achieving a quantum speedup with NISQ devices. The core idea is to update a parametrized quantum circuit (PQC) in a hybrid quantum-classical fashion. Measurements performed on the PQC are fed into a classical computer to propose a new set of variational parameters. A key challenge has been the occurrence of barren plateaus, i.e. the gradients used for optimisation vanish exponentially with increasing number of qubits [7], as well as for various types of cost functions [8], entanglement [9] and noise [10]. Further, the classical optimization part of variational algorithms was shown to be NP-hard [11]. Quantum algorithms that avoid the feedback loop to circumvent the barren plateau problem have been proposed [12–17]. Besides this approach, initialization strategies [18–20] and layer-wise learning [21] for VQA could help to solve the aforementioned problems. However, tools to evaluate the power of these strategies are lacking.

Hardware efficient ansätze have been proposed to tailor a PQC to the restrictions of the hardware [22]. A widely used choice is quantum circuits arranged in layers of single-qubit rotations followed by two-qubit entangling gates. However, a key question is the space of possible states this ansatz type can represent [23, 24].

Here, we introduce the effective quantum dimension $G_C$ and parameter dimension $D_C$ as quantitative measures of the capacity of a PQC.
Parameter dimension $D_C$ measures the total number of independent parameters a quantum state defined by the PQC can express. In contrast, the effective quantum dimension $G_C$ [25, 26] is a local measure that quantifies the space of states that can be accessed by locally perturbing the parameters of the PQC. Both measures can be derived from the quantum geometric structure of the PQC via the quantum Fisher information metric (QFI) $F$ [27, 28]. From the QFI, one can obtain the quantum natural gradient (QNG) for a more efficient optimisation via gradients [27–29]. These methods allow us to evaluate the expressive power, trainability and number of redundant parameters of different PQCs, and find better initialization strategies.

As a demonstration of our tools, we provide an in-depth investigation of popular hardware-efficient circuits, composed of layered single-qubit rotations and two-qubit entangling gates in various arrangements. We find striking differences depending on the choice of circuit structure that affect both the expressive power of the PQC in general as well as the quality of specific initialization strategies. We calculate the number of redundant parameters of various PQC types, as well as how fast they converge towards random quantum states as a function of the number of layers. The choice of entangling gate has a pronounced effect on the representation power of particular initialization strategies.

We reveal a transition in the spectrum of the QFI in deep circuits, which leads to a decay of the QNG. For shallow circuits, the QNG can be orders of magnitude larger than the regular gradient. However, both suffer from the barren plateau problem. By tuning the PQC's parameters from zero to a random set of parameters, we find a region where both large gradients and large effective quantum dimension $G_C$ coexist, which could serve as a good set of initial parameters for the training of variational algorithms.

FIG. 1. a) Sketch of hardware-efficient parametrized quantum circuit (PQC) $U(\theta)|0\rangle^{\otimes N} = \prod_{l=p}^{1} [W_l V_l(\theta_l)] \sqrt{H_d}^{\otimes N} |0\rangle^{\otimes N}$ with parameters $\theta$ and initial state $|0\rangle$ of all $N$ qubits being in state zero. The PQC consists of an initial layer of $\sqrt{H_d}$ gates applied to each qubit, where $H_d$ is the Hadamard gate, followed by $p$ repeated layers of parametrized single-qubit rotations $V_l(\theta_l)$ and entangling gates $W_l$. $V_l(\theta_l)$ consists of single-qubit rotations $R_\alpha(\theta_{l,n}) = \exp(-i \sigma^\alpha_n \theta_{l,n}/2)$ at layer $l$ and qubit $n$ around axis $\alpha \in \{x, y, z\}$. b) Two-qubit entangling gates $w_l$ considered are CNOT gates (control-$\sigma^x$), CPHASE gates (control-$\sigma^z$, diag(1, 1, 1, −1)) and √iSWAP gates. c) Entangling layer $W_l$ is composed of the two-qubit entangling gates $w_l$, which are arranged in either a nearest-neighbor one-dimensional chain topology (denoted as CHAIN), all-to-all connection (ALL) or in an alternating fashion (ALT) for even and odd layers $l$.

I. PARAMETRIZED QUANTUM CIRCUITS
A PQC generates a quantum state of $N$ qubits
$$|\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes N}, \tag{1}$$
with the unitary $U(\theta)$, the $M$-dimensional parameter vector $\theta$ and product state $|0\rangle^{\otimes N}$ (see Fig. 1). The structure of the PQC influences its power to represent quantum states [23, 24, 30]. One way to measure expressiveness is by determining the distance between the distribution of states generated by the circuit and the Haar random distribution of states [23, 24]. This tells us how well the PQC can represent arbitrary states across the Hilbert space. The appearance of barren plateaus or vanishing gradients is connected to the aforementioned measure [7, 31]. The variance of the gradient $\mathrm{var}(\partial_i E) = \langle (\partial_i E)^2 \rangle - \langle \partial_i E \rangle^2$ ($\langle \cdot \rangle$ denoting the statistical average over many random instances) with respect to the expectation value of a local Hamiltonian $H$, $E = \langle 0|U^\dagger(\theta) H U(\theta)|0\rangle$, can vanish exponentially with the number of qubits for PQCs with a random choice of parameters. The variance also decreases with the number of layers $p$ of the PQC up to a specific depth $p_r$, and remains constant upon further increase $p > p_r$. For local cost functions, it has been shown that in most cases low variance of the gradient of such PQCs correlates with high expressibility [31].

II. PARAMETER DIMENSION
We now introduce the parameter dimension $D_C$ of a PQC as another measure of capacity. As an introduction, we take a generic quantum state which is parametrized by a total of $2^N$ complex coefficients, i.e. $2^{N+1}$ real numbers,
$$|\psi(a, b)\rangle = \sum_{j=1}^{2^N} (a_j + i b_j)|j\rangle, \tag{2}$$
where $|j\rangle$ is the $j$-th computational basis state and $a_j, b_j \in \mathbb{R}$. One can map the above state to $D_C = 2^{N+1} - 2$ independent parameters, since normalization and the global phase each remove one parameter. For a state with only real coefficients, $b_j = 0$, we find $D_C = 2^N - 1$. We define for a PQC $C$ the parameter dimension $D_C$ as the number of independent parameters that can be represented in the space of quantum states. In general, $D_C$ for $N$ qubits is upper bounded by that of the generic state in Eq. (2). We define the redundancy
$$R = \frac{M - D_C}{M}, \tag{3}$$
which is the fraction of parameters of the PQC that do not contribute to changing the quantum state.

FIG. 2. Example to demonstrate the effective quantum dimension $G_C$ and parameter dimension $D_C$ for a single qubit parametrized as $|\psi(\theta, \varphi)\rangle = \cos(\theta/2)|0\rangle + \exp(i\varphi)\sin(\theta/2)|1\rangle$. $D_C = 2$ is the total number of independent parameters of the quantum state. $G_C$ denotes the number of independent directions in which a quantum state can move by locally perturbing its parameters $\theta$, $\varphi$. For a random state $|v\rangle$ ($\theta \notin \{0, \pi\}$) two possible directions exist, along $v_\theta$ and $v_\varphi$. The particular state $|w(\theta = \pi, \varphi)\rangle$ can only be perturbed in direction $w_\theta$, as adjusting $\varphi$ does not change the state (e.g. $|w(\pi, \varphi + \epsilon)\rangle = |w(\pi, \varphi)\rangle$ up to a global phase), thus $G_C = 1$.

III. EFFECTIVE QUANTUM DIMENSION
Now, we explain how the QFI $F(\theta)$ quantifies the expressive power of a PQC (see Supplemental materials B for an introduction to the QFI and QNG). One can relate $F(\theta)$ to the distance in the space of pure quantum states, which is given to leading order in $\mathrm{d}\theta$ by the Fubini-Study distance
$$\mathrm{Dist}_Q\big(|\psi(\theta)\rangle, |\psi(\theta + \mathrm{d}\theta)\rangle\big) = \sum_{i,j} F_{ij}(\theta)\,\mathrm{d}\theta_i \mathrm{d}\theta_j, \tag{4}$$
where $\mathrm{Dist}_Q(x, y) = 1 - |\langle x|y\rangle|^2$ and the QFI [27, 28]
$$F_{ij}(\theta) = \mathrm{Re}\big(\langle \partial_i\psi|\partial_j\psi\rangle - \langle \partial_i\psi|\psi\rangle\langle\psi|\partial_j\psi\rangle\big). \tag{5}$$
$F(\theta)$ quantifies the change of the quantum state when adjusting its parameter $\theta$ infinitesimally to $\theta + \mathrm{d}\theta$. The singular value decomposition
$$F = V S V^T \tag{6}$$
gives us $V$, which is a real orthogonal matrix with the $i$-th eigenvector $\alpha^{(i)}$ placed at the $i$-th column of $V$, and $S$, which is a diagonal matrix with the $M$ non-negative eigenvalues $\lambda^{(i)}$ of $F(\theta)$ along the diagonal. The eigenvalues and eigenvectors obey the equation $F(\theta)\alpha^{(i)} = \lambda^{(i)}\alpha^{(i)}$. Inserting Eq. (6) into Eq. (4) gives us
$$\mathrm{Dist}_Q\big(|\psi(\theta)\rangle, |\psi(\theta + \mathrm{d}\theta)\rangle\big) = \mathrm{d}\theta^T F\,\mathrm{d}\theta = \mathrm{d}\theta^T V S V^T \mathrm{d}\theta. \tag{7}$$
Now, we assume that the small variations in $\theta$ are in the direction of the $i$-th eigenvector of $F$ with $\mathrm{d}\theta = \mathrm{d}\mu\,\alpha^{(i)}$, where $\mathrm{d}\mu$ is an infinitesimal scalar. We find
$$\mathrm{Dist}_Q\big(|\psi(\theta)\rangle, |\psi(\theta + \mathrm{d}\mu\,\alpha^{(i)})\rangle\big) = \mathrm{d}\mu\,\alpha^{(i)T} V S V^T \alpha^{(i)}\,\mathrm{d}\mu = \lambda^{(i)}\,\mathrm{d}\mu^2,$$
where we have used $V^T\alpha^{(i)} = e^{(i)}$, with $e^{(i)}$ the $i$-th basis vector. When updating $\theta' = \theta + \mathrm{d}\mu\,\alpha^{(i)}$, the quantum state changes at a rate that is proportional to $\lambda^{(i)}$. Eigenvalues $\lambda^{(i)} = 0$ are called singularities, as there is no change in the quantum state at all, i.e. $|\langle\psi(\theta)|\psi(\theta + \mathrm{d}\mu\,\alpha^{(i)})\rangle| = 1$. The case of $\lambda^{(i)}$ being very small, i.e. $1 \gg \lambda^{(i)} > 0$, is called a near singularity and is associated with plateaus in classical machine learning where training slows down [32].

We now define the effective quantum dimension $G_C(\theta)$ for a PQC denoted as $C$. It is given as the total number of non-zero eigenvalues $\lambda^{(i)}(\theta)$ of the QFI $F(\theta)$ initialized with parameters $\theta$ [25, 26]
$$G_C(\theta) = \sum_{i=1}^{M} I\big(\lambda^{(i)}(\theta)\big), \tag{8}$$
where $I(x) = 0$ for $x = 0$ and $I(x) = 1$ for $x \neq 0$. $G_C(\theta)$ is a local measure of expressiveness that counts the number of independent directions in the state space that can be accessed by an infinitesimal update of $\theta$.

A straightforward example is a generic single-qubit quantum state (see Fig. 2)
$$|\psi(\theta, \varphi)\rangle = \cos\left(\frac{\theta}{2}\right)|0\rangle + \exp(i\varphi)\sin\left(\frac{\theta}{2}\right)|1\rangle. \tag{9}$$
Up to an overall constant factor, which does not affect $G_C$, its QFI is
$$F = \begin{bmatrix} 1 & 0 \\ 0 & \sin^2(\theta) \end{bmatrix}. \tag{10}$$
The eigenvalues and eigenvectors of the QFI $F$ are straightforward to calculate, with $\lambda^{(1)} = 1$, $\alpha^{(1)} = \{1, 0\}$ and $\lambda^{(2)} = \sin^2(\theta)$, $\alpha^{(2)} = \{0, 1\}$. The effective quantum dimension is $G_C(\theta, \varphi) = D_C = 2$, except for the special case $\theta = n\pi$, $n$ integer, where $\lambda^{(2)} = 0$ and thus $G_C(n\pi, \varphi) = 1$. Here, any change in the direction of eigenvector $\alpha^{(2)}$ (corresponding to changing $\varphi$) will not yield any change in the underlying quantum state. Note, however, that except for these singular parameters we find $G_C = 2$, which is equivalent to the maximal number of independent parameters of a qubit.

Further, consider the single-qubit circuit with Pauli $z$ matrix $\sigma^z$ and Hadamard gate $H_d$
$$U(\theta)|0\rangle = \prod_{i=1}^{M}\left[\exp\left(-i\frac{\theta_i}{2}\sigma^z\right)\right] H_d|0\rangle. \tag{11}$$
Here, again up to a constant factor, we find $F = J_{M,M}$, where $J_{M,M}$ is an $M \times M$ matrix filled with ones. A diagonalization gives us $M - 1$ eigenvalues $\lambda = 0$, and one eigenvalue $\lambda = M$ with eigenvector $\alpha = \frac{1}{\sqrt{M}} J_{M,1}$. This circuit has a low parameter dimension $D_C = 1$ and a large redundancy of $R = (M - 1)/M$, i.e. there are $M - 1$ redundant parameters. In general, the effective quantum dimension $G_C$ is at most the parameter dimension $D_C$, which in turn is at most the total number of parameters $M$:
$$G_C(\theta) \le D_C \le M. \tag{12}$$
Given the aforementioned PQC type with a random set of parameters $\theta_\mathrm{rand}$ drawn uniformly from $[0, 2\pi)$, we find numeric evidence that $G_C(\theta_\mathrm{rand})$ is approximately equivalent to $D_C$,
$$G_C(\theta_\mathrm{rand}) \simeq D_C. \tag{13}$$
The core intuition is that starting from a sufficiently random initial parameter set, a change of the PQC parameters in the right direction is able to bring one closer to any quantum state that can be expressed by the PQC. For specific choices of parameters such as $\theta = 0$ we find $G_C < D_C$. Moving sufficiently away from these special points, we recover $G_C \simeq D_C$ (see Fig. 6).

We stress that Eq. (13) is not valid for arbitrary quantum circuits, e.g. circuits where the parameters do not enjoy a $2\pi$ periodicity. As a simple example, take the evolution of a single qubit with a single parameter $t$, $U(t)|0\rangle = \exp(-i\sqrt{2}\sigma^z t)\exp(-i\sigma^x t)|0\rangle$. The evolution over all possible $t$ (note the absence of $2\pi$ periodicity) will cover all possible quantum states and thus $D_C = 2$, whereas the effective quantum dimension (with only a single parameter $t$) is $G_C = 1 < D_C$.

We now consider different types of hardware efficient PQCs $|\psi(\theta)\rangle = U(\theta)|0\rangle^{\otimes N}$, which are circuits that can be efficiently run on NISQ quantum processors. We choose an initial state $|0\rangle^{\otimes N}$, followed by a single layer of the square root of the Hadamard gate ($\sqrt{H_d}$) on every qubit. Then, we repeat $p$ layers composed of parametrized single-qubit rotations and a set of two-qubit entangling gates (see Fig. 1a). The single-qubit rotations are either chosen randomly to be around the $\{x, y, z\}$ axis, or fixed to a specific axis. The two-qubit entangling gates are either CNOT, CPHASE or √iSWAP gates (see Fig. 1b), which are common native gates in current quantum processors [33]. The entangling gates in each layer are arranged in either a nearest-neighbor chain topology (CHAIN), all-to-all connections (ALL) or in an alternating nearest-neighbor fashion (ALT) (see Fig. 1c).
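The single-qubit examples of Eqs. (9) and (11) can be checked numerically. The following is a minimal sketch, assuming NumPy and central finite differences for the derivatives in Eq. (5); the function names (`qfi`, `effective_dimension`, `single_qubit`, `stacked_z_rotations`), the step size and the eigenvalue tolerance are illustrative choices and are not taken from the authors' code. With Eq. (5) evaluated exactly as written, the matrices carry an overall factor of 1/4 relative to diag(1, sin²θ) and $J_{M,M}$; this prefactor does not change the number of non-zero eigenvalues.

```python
import numpy as np

def qfi(state_fn, theta, eps=1e-6):
    """QFI of Eq. (5); derivatives by central finite differences (illustrative choice)."""
    theta = np.asarray(theta, dtype=float)
    psi = state_fn(theta)
    grads = []
    for i in range(theta.size):
        shift = np.zeros_like(theta)
        shift[i] = eps
        grads.append((state_fn(theta + shift) - state_fn(theta - shift)) / (2 * eps))
    d = np.array(grads)                  # row i holds |d_i psi>
    overlap = d.conj() @ d.T             # <d_i psi | d_j psi>
    proj = d.conj() @ psi                # <d_i psi | psi>
    return np.real(overlap - np.outer(proj, proj.conj()))

def effective_dimension(state_fn, theta, tol=1e-6):
    """G_C of Eq. (8): number of QFI eigenvalues above a numerical tolerance."""
    eigvals = np.linalg.eigvalsh(qfi(state_fn, theta))
    return int(np.sum(eigvals > tol))

def single_qubit(params):
    """State of Eq. (9)."""
    theta, phi = params
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def stacked_z_rotations(params):
    """State of Eq. (11): M consecutive z rotations acting on H_d|0>."""
    total = np.sum(params)
    return np.array([np.exp(-0.5j * total), np.exp(0.5j * total)]) / np.sqrt(2)

if __name__ == "__main__":
    print(effective_dimension(single_qubit, [1.0, 0.3]))      # generic point
    print(effective_dimension(single_qubit, [np.pi, 0.3]))    # singular point
    params = [0.1, 0.2, 0.3, 0.4]
    g = effective_dimension(stacked_z_rotations, params)
    print(g, (len(params) - g) / len(params))                 # G_C and redundancy
```

For larger circuits, finite differences become inaccurate and expensive; analytic derivatives would be the more robust choice, and the tolerance used to decide which eigenvalues count as zero would need care.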
The numerical calculations are performed using Yao [34].

IV. RESULTS
As a demonstration of our methods, in Fig. 3 we provide an in-depth characterization of a particular PQC as a function of the number of layers $p$. Each layer consists of parametric single-qubit rotations around the $x$, $y$ or $z$ axis, which are randomly chosen at every qubit and layer, as well as CNOT gates in a chain topology (see CHAIN in Fig. 1). The parameter dimension $D_C$ (i.e. the number of independent parameters of the quantum state that can be represented by the PQC) increases linearly with $p$ in Fig. 3a, until it reaches the maximal possible value $D_C = 2^{N+1} - 2$ at a characteristic number of layers $p_c$. This point is reflected in the spectrum of the QFI $F$, averaged over random instances of the PQC (see Fig. 3b-d). Most notably, the variance of the logarithm of the non-zero eigenvalues reaches a maximum at $p_c$ (Fig. 3b). Further, the smallest non-zero eigenvalue reaches its minimum there (Fig. 3c). We can see this more clearly in the distribution of eigenvalues (Fig. 3d). With increasing $p$, the distribution becomes broader, with a pronounced tail of small eigenvalues of $F$ appearing close to the transition at $p_c$. Above the transition, $p > p_c$, the small eigenvalues suddenly disappear from the distribution. We investigate the variance of the gradient and QNG in Fig. 3e. We note that the variance of the regular gradient decays with $p$, reaching a minimum around $p \approx 20$ [7], after which it remains constant. The variance of the QNG remains larger than that of the regular gradient; however, the QNG decays for $p > p_c$. In Fig. 3f, we numerically find that the variances of both the regular gradient and the QNG vanish exponentially with increasing number of qubits $N$, demonstrating the barren plateau problem.

FIG. 3. Properties of a PQC consisting of $p$ layers of randomly chosen $x$, $y$, $z$ rotations, followed by CNOT gates in a chain topology (see Fig. 1) for $N = 10$ qubits. a) The parameter dimension $D_C$ of the circuit scales linearly with $p$, until it levels off at a characteristic value $p_c$. b) Variance of the logarithm of the non-zero eigenvalues of $F$. The variance peaks around $p \approx p_c$. c) Minimal non-zero eigenvalue of $F$ against $p$. It increases for $p > p_c$. d) Histogram of the logarithm of the eigenvalues of the Fisher information matrix $F$ for $p = 10$, 50, 210 and 300. The width of the distribution increases with $p$, with a pronounced tail at small $F$ developing around $p \approx p_c$, which disappears for $p > p_c$. e) Variance of the gradient $\mathrm{var}(\partial_k E)$ and QNG $\mathrm{var}(F^{-1}\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z_1\sigma^z_2$. The gradient decays until $p \approx 20$, after which it remains constant. The QNG remains larger than the regular gradient, but decreases for $p > p_c$. f) Variance of gradients and QNG for varying qubit number $N$ for depth $p = 2N$, showing an approximately exponential decrease with $N$.

FIG. 4. Variance of gradient and redundancy of different hardware efficient PQCs plotted against number of layers $p$ for $N = 10$ qubits. Each layer consists of single-qubit rotations around randomly chosen axes $\{x, y, z\}$. We plot different arrangements of entangling gates (as shown in Fig. 1c) with a,b) nearest-neighbor one-dimensional chain, c,d) all-to-all, e,f) alternating nearest-neighbor connections. a,c,e) Variance of the gradient $\mathrm{var}(\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z_1\sigma^z_2$. b,d,f) Redundancy $R$ (Eq. (3)), which is the fraction of redundant parameters of the PQC.

In Fig. 4, we compare different types of PQCs with different entangling gates and arrangements. We keep the single-qubit rotations as randomly chosen rotations around the $\{x, y, z\}$ axes. We note that all circuits show the same qualitative behavior regarding the transition in the QFI (see Fig. 3 and supplemental materials) and all suffer from an exponential decrease of the variance of the gradient with increasing number of qubits. However, key differences between the PQCs remain, as we demonstrate below. We show the variance of the gradient for random PQC parameters in Fig. 4a,c,e for different arrangements of the entangling gates (CHAIN, ALL, ALT) as well as different types of entangling gates (CNOT, CPHASE, √iSWAP). The variance decays with increasing $p$, until it reaches a constant level, the value of which is the same for all gates and arrangements. However, CPHASE requires the most layers $p$ to converge, followed by √iSWAP and CNOT.
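The gradient statistics discussed above can be reproduced in miniature. The sketch below is an assumption-laden illustration, not the authors' Yao-based code: it uses a small NumPy state-vector simulator, the CHAIN arrangement with CNOT gates, $H = \sigma^z_1\sigma^z_2$ on the first two qubits, and the parameter-shift rule, which is exact for rotations of the form $\exp(-i\sigma\theta/2)$. All function names and the particular branch chosen for $\sqrt{H_d}$ are hypothetical.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # y
    np.array([[1, 0], [0, -1]], dtype=complex),    # z
]
HADAMARD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
# One branch of the matrix square root of the Hadamard gate (assumed choice).
SQRT_H = 0.5 * (1 + 1j) * np.eye(2) + 0.5 * (1 - 1j) * HADAMARD

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector."""
    psi = np.tensordot(gate, state.reshape([2] * n), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cnot(state, control, target, n):
    """Flip the target qubit on the control = 1 block of the state."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    axis = target - 1 if target > control else target  # control axis is dropped
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=axis).copy()
    return psi.reshape(-1)

def energy(theta, axes):
    """E = <sigma^z_1 sigma^z_2> for p layers of rotations and a CNOT chain (n >= 2)."""
    p, n = theta.shape
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for q in range(n):
        state = apply_1q(state, SQRT_H, q, n)
    for l in range(p):
        for q in range(n):
            t = theta[l, q]
            rot = np.cos(t / 2) * np.eye(2) - 1j * np.sin(t / 2) * PAULIS[axes[l, q]]
            state = apply_1q(state, rot, q, n)
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1, n)
    probs = (np.abs(state) ** 2).reshape(2, 2, -1).sum(axis=2)
    return probs[0, 0] + probs[1, 1] - probs[0, 1] - probs[1, 0]

def gradient(theta, axes, l=0, q=0):
    """Parameter-shift derivative of E with respect to theta[l, q]."""
    shift = np.zeros_like(theta)
    shift[l, q] = np.pi / 2
    return 0.5 * (energy(theta + shift, axes) - energy(theta - shift, axes))

def gradient_variance(n, p, samples=50, seed=0):
    """Sample variance of the gradient over random axes and parameters."""
    rng = np.random.default_rng(seed)
    grads = []
    for _ in range(samples):
        axes = rng.integers(0, 3, size=(p, n))
        theta = rng.uniform(0, 2 * np.pi, size=(p, n))
        grads.append(gradient(theta, axes))
    return float(np.var(grads))

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, gradient_variance(n, p=2 * n))
```

With so few qubits and samples the estimates are noisy, so this only illustrates the procedure; resolving the scaling with $N$ or $p$ requires larger systems and many more random instances.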
Fig. 4b,d,f shows the redundancy $R$, which is the fraction of redundant parameters of the PQC. It quickly reaches a constant level with increasing $p$. √iSWAP has consistently low $R$, while for CNOT it varies strongly depending on the arrangement of entangling gates. For CPHASE, we have consistently larger $R$. This can be easily understood when considering that $z$ rotations commute with the entangling CPHASE layer. When two $z$ rotations appear consecutively on the same qubit, they yield a redundant parameter.

We note that for these PQCs the number of layers $p_c$ at which the transition of the QFI occurs can be estimated from the value of the redundancy $R$. With $M = Np$ parameters, Eq. (3) gives $p_c \approx D_C / [(1 - R_C) N]$, where $R_C$ is the converged value of $R$ and $D_C$ is the maximal parameter dimension. The eigenvalue spectra of these and further types of PQCs are discussed in the supplemental materials C.

In Fig. 5, we fix the single-qubit rotations around the $y$-axis and investigate different entangling gates arranged in a nearest-neighbor one-dimensional chain. Depending on the choice of entangling gates, we find that the variance of the gradient decays to a different constant level with increasing $p$ (see Fig. 5a). $y$ √iSWAP matches the variance found in Fig. 3e, whereas $y$ CNOT and $y$ CPHASE have higher variance. In Fig. 5b we show the maximal $D_C$ for many layers $p$. $D_C$ scales exponentially for $y$ CNOT ($D_C \propto 2^N$) and $y$ √iSWAP ($D_C \propto 2^{N+1}$), whereas for $y$ CPHASE we numerically find an approximately quadratic scaling $D_C \propto N^2$.

In Fig. 6 we show how $G_C$ and the variance of the gradient change when tuning the parameters of a PQC defined as $U(a\theta_\mathrm{rand})|0\rangle$, $\theta_\mathrm{rand} \in [0, 2\pi)$, $a \in [0, 1]$. Tuning $a$ from $a = 0$ to $a = 1$ corresponds to changing the PQC from parameters all zero to a PQC with random parameters. As an example, we show a PQC consisting of layered, randomly chosen single-qubit rotations around the $x$, $y$, $z$ axes and entangling gates arranged in a chain. In Fig. 6a, we show different types of entangling gates.
$G_C$ increases with $a$, reaching the parameter dimension $D_C$ for $a = 1$. CNOT and √iSWAP increase faster with $a$ compared to the PQC with CPHASE gates. The variance of the gradient decreases sharply once a particular $a$ is reached. Note that there is a specific intermediate range of $a$ in which both $G_C$ and the variance of the gradient remain large.

FIG. 5. Capacity of PQCs with $y$ rotations and different entangling gates. The entangling layer is arranged as a nearest-neighbor one-dimensional chain. Three of the PQCs have $y$ rotations, and as reference we show a PQC with randomized $x$, $y$ or $z$ rotations and CNOT gates. a) Variance of the gradient $\mathrm{var}(\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z_1\sigma^z_2$. b) Maximal parameter dimension $D_C$ of the PQCs as a function of the number of qubits $N$. For $y$ CPHASE we find an approximate power law $D_C \propto N^2$.

FIG. 6. Tuning the PQC parameters $\theta = a\theta_\mathrm{rand}$, where $a \in (0, 1]$ and $\theta_\mathrm{rand} \in [0, 2\pi)$, for circuits composed of random $x$, $y$ and $z$ rotations and entangling gates (CNOT, CPHASE, √iSWAP) arranged in a chain configuration. a) The effective quantum dimension $G_C$ as a function of $\log(a)$. The black dashed-dotted line is the number of parameters $M$. b) Variance of the gradient $\mathrm{var}(\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z_1\sigma^z_2$. All plots show number of layers $p = 100$ and $N = 10$ qubits.

Finally, in Fig. 7 we show the scaling of $G_C(\theta = 0, N)$ with the number of qubits $N$ for a PQC with entangling gates in a chain arrangement initialized with $\theta = 0$, corresponding to the point $a = 0$ in Fig. 6. Numerically, we find linear scaling of $G_C(\theta = 0, N)$ for CPHASE entangling gates, quadratic scaling for CNOT gates and higher-order polynomial or even exponential scaling for √iSWAP gates.

V. DISCUSSION
FIG. 7. Effective quantum dimension $G_C(\theta = 0)$ plotted against number of qubits $N$ for a circuit consisting of randomly chosen parametrized rotations around the $x$, $y$ or $z$ axis with parameters $\theta = 0$, and two-qubit entangling gates arranged in a nearest-neighbor chain. We compare CNOT, CPHASE and √iSWAP entangling gates. From numerical results, we find that $G_C$ scales quadratically for CNOT gates, linearly for CPHASE gates and as a higher-order polynomial or even exponentially for √iSWAP. The number of layers $p$ is chosen such that $G_C(\theta = 0)$ is maximized.

We investigated the capacity and trainability of hardware efficient PQCs using the quantum geometric structure of the parameter space. We introduced the notion of parameter dimension $D_C$ and effective quantum dimension $G_C$, which are global and local measures respectively of the space of quantum states that can be accessed by the PQC. Both can be derived from the QFI. We applied these concepts to exemplary PQCs composed of layers of single-qubit rotations and different types of entangling gates arranged in various geometries (see Fig. 1). For comparable circuit depth $p$, we find strong numerical evidence that PQCs constructed from CNOT or √iSWAP gates have lower variance of the gradient, and thus higher expressibility, compared to PQCs with CPHASE gates. For a specific type of PQC composed of $y$ rotations and CPHASE gates, $D_C$ scales only quadratically with the number of qubits, which may imply that this PQC can be efficiently simulated on classical computers. We find that the redundancy of parameters varies strongly depending on the configuration of the PQC as well as the type of gates. The redundancy could be systematically reduced by choosing appropriate single-qubit rotations or by using the methods of [35] combined with the QFI.

The effective quantum dimension $G_C$ shows the expressive power of a PQC by local variations around a specific parameter set.
We find that, depending on the entangling gates, $G_C$ can scale very differently with the number of qubits, with the largest value found for √iSWAP gates. While we only studied the case $\theta = 0$, PQCs with correlated parameters could feature similar behavior [18]. Tuning the parameters of a PQC from zero to a random set of parameters yields a crossover from large gradients and small $G_C$ to vanishing gradients and large $G_C$. For the PQCs investigated, we can find a range of parameters that combines large gradients with a nearly maximal $G_C$, which could be an optimal starting point for gradient-based optimisation. Trade-offs between the expressibility of a circuit and the magnitude of its gradients are a key challenge in finding good initialization strategies [31].

When increasing the number of layers $p$ to a value $p_c$, a transition occurs in the QFI when $D_C$ reaches its maximal possible value. The transition is characterized by a disappearance of small eigenvalues of the QFI and a peak in the variance of the logarithm of eigenvalues. This transition may be related to a phase transition in the optimization landscape of control theory. When the number of parameters reaches a threshold, the optimization landscape changes from being spin-glass like with many near-degenerate minima to one with many degenerate global minima [36, 37]. For deep circuits $p > p_c$, the transition leads to a decay of the QNG as small eigenvalues are suppressed. For shallow circuits $p < p_c$, the QNG can be orders of magnitude larger in value compared to the regular gradient; however, our numerical results suggest that both the regular gradient and the QNG decrease exponentially with the number of qubits. Thus, the QNG most likely cannot help to solve the barren plateau problem.
This contrasts with the natural gradient in classical machine learning, which is known to be able to overcome the plateau phenomena that lead to a slowdown of optimization [32].

Imaginary-time evolution and variational quantum simulation use a matrix related to the QFI to update the parameters of the PQC [28, 38]. The effective quantum dimension $G_C$ could give major insights into the convergence properties of these algorithms. Recent proposals for adaptively generating ansätze could benefit from the QFI by taking the geometry of the PQC into account when designing PQCs [20].

We note that the QFI can be determined, although this is cumbersome, via measurement of overlaps on the quantum processor [28, 39, 40]. However, in order to evaluate a type of PQC, it is often sufficient to study circuits of a few qubits via classical simulation [41], and extrapolate the results.

During the training of a hardware efficient PQC, the eigenvalue spectrum of the QFI can gain specific features, as has been shown for restricted Boltzmann machines [42]. We show that the PQCs have characteristic eigenvalue spectra depending on their configuration (see also supplemental materials). The eigenvalues hold important information about the trainability and generalization of a model. For example, a model that generalizes well is known to have a low effective dimension in classical machine learning [26]. It would be interesting to study in what way these statements translate to quantum machine learning. Further, connections to complementary measures of capacity based on classical Fisher information [43] and memory capacity [44] respectively could be explored.

It would be straightforward to extend the concepts of quantum geometry to evaluate the capacity and trainability of noisy PQCs [45], convolutional PQCs [46], optimal control [47], quantum metrology [48] and programmable analog quantum simulators [49].

Python code for the numerical calculations performed in this work is available at [50].

Acknowledgements:
This work is supported by a Samsung GRP project and the UK Hub in Quantum Computing and Simulation, part of the UK National Quantum Technologies Programme with funding from UKRI EPSRC grant EP/T001062/1. We are grateful to the National Research Foundation and the Ministry of Education, Singapore for financial support.

[1] J. Preskill, Quantum , 79 (2018).
[2] K. Bharti, A. Cervera-Lierta, T. H. Kyaw, T. Haug, S. Alperin-Lea, A. Anand, M. Degroote, H. Heimonen, J. S. Kottmann, T. Menke, W.-K. Mok, S. Sim, L.-C. Kwek, and A. Aspuru-Guzik, arXiv:2101.08448 (2021).
[3] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, Nature communications , 4213 (2014).
[4] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics , 023023 (2016).
[5] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al., arXiv preprint arXiv:2012.09265 (2020).
[6] Y. Cao, J. Romero, J. P. Olson, M. Degroote, P. D. Johnson, M. Kieferová, I. D. Kivlichan, T. Menke, B. Peropadre, N. P. Sawaya, et al., Chemical reviews , 10856 (2019).
[7] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature communications , 4812 (2018).
[8] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, arXiv:2001.00550 (2020).
[9] C. O. Marrero, M. Kieferová, and N. Wiebe, arXiv:2010.15968 (2020).
[10] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, arXiv:2007.14384 (2020).
[11] L. Bittel and M. Kliesch, arXiv:2101.07267 (2021).
[12] H.-Y. Huang, K. Bharti, and P. Rebentrost, arXiv:1909.07344 (2019).
[13] K. Bharti, arXiv:2009.11001 (2020).
[14] K. Bharti and T. Haug, arXiv:2011.06911 (2020).
[15] K. Bharti and T. Haug, arXiv:2010.05638 (2020).
[16] T. Haug and K. Bharti, arXiv:2011.14737 (2020).
[17] J. W. Z. Lau, K. Bharti, T. Haug, and L. C. Kwek, arXiv:2101.07677 (2021).
[18] T. Volkoff and P. J. Coles, arXiv preprint arXiv:2005.12200 (2020).
[19] E. Grant, L. Wossnig, M. Ostaszewski, and M. Benedetti, Quantum , 214 (2019).
[20] H. R. Grimsley, S. E. Economou, E. Barnes, and N. J. Mayhall, Nature communications , 1 (2019).
[21] A. Skolik, J. R. McClean, M. Mohseni, P. van der Smagt, and M. Leib, arXiv preprint arXiv:2006.14904 (2020).
[22] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Nature , 242 (2017).
[23] K. Nakaji and N. Yamamoto, arXiv:2005.12537 (2020).
[24] S. Sim, P. D. Johnson, and A. Aspuru-Guzik, Advanced Quantum Technologies , 1900070 (2019).
[25] D. J. MacKay, in Advances in neural information processing systems (1992) pp. 839–846.
[26] W. J. Maddox, G. Benton, and A. G. Wilson, arXiv preprint arXiv:2003.02139 (2020).
[27] N. Yamamoto, arXiv:1909.05074 (2019).
[28] J. Stokes, J. Izaac, N. Killoran, and G. Carleo, Quantum , 269 (2020).
[29] D. Wierichs, C. Gogolin, and M. Kastoryano, arXiv preprint arXiv:2004.14666 (2020).
[30] Y. Du, M.-H. Hsieh, T. Liu, and D. Tao, Phys. Rev. Res. , 033125 (2020).
[31] Z. Holmes, K. Sharma, M. Cerezo, and P. J. Coles, arXiv preprint arXiv:2101.02138 (2021).
[32] S.-i. Amari, Information geometry and its applications, Vol. 194 (Springer, 2016).
[33] P. Krantz, M. Kjaergaard, F. Yan, T. P. Orlando, S. Gustavsson, and W. D. Oliver, Applied Physics Reviews , 021318 (2019).
[34] X.-Z. Luo, J.-G. Liu, P. Zhang, and L. Wang, Quantum , 341 (2020).
[35] L. Funcke, T. Hartung, K. Jansen, S. Kühn, and P. Stornati, arXiv preprint arXiv:2011.03532 (2020).
[36] M. Bukov, A. G. R. Day, D. Sels, P. Weinberg, A. Polkovnikov, and P. Mehta, Phys. Rev. X , 031086 (2018).
[37] H. A. Rabitz, M. M. Hsieh, and C. M. Rosenthal, Science , 1998 (2004).
[38] S. McArdle, T. Jones, S. Endo, Y. Li, S. C. Benjamin, and X. Yuan, npj Quantum Information , 1 (2019).
[39] X. Yuan, S. Endo, Q. Zhao, Y. Li, and S. C. Benjamin, Quantum , 191 (2019).
[40] K. Mitarai and K. Fujii, Physical Review Research , 013006 (2019).
[41] T. Jones, arXiv preprint arXiv:2011.02991 (2020).
[42] C.-Y. Park and M. J. Kastoryano, Physical Review Research , 023232 (2020).
[43] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner, arXiv preprint arXiv:2011.00027 (2020).
[44] L. G. Wright and P. L. McMahon, in CLEO: QELS Fundamental Science (Optical Society of America, 2020) pp. JM4G–5.
[45] B. Koczor and S. C. Benjamin, arXiv preprint arXiv:1912.08660 (2019).
[46] I. Cong, S. Choi, and M. D. Lukin, Nature Physics , 1273 (2019).
[47] A. B. Magann, C. Arenz, M. D. Grace, T.-S. Ho, R. L. Kosut, J. R. McClean, H. A. Rabitz, and M. Sarovar, PRX Quantum , 010101 (2020).
[48] J. J. Meyer, J. Borregaard, and J. Eisert, arXiv preprint arXiv:2006.06303 (2020).
[49] V. Bastidas, T. Haug, C. Gravel, L.-C. Kwek, W. Munro, and K. Nemoto, arXiv:2009.00823 (2020).
[50] T. Haug, "Quantum geometry of parametrized quantum circuits," https://github.com/txhaug/quantum-geometry .
[51] S.-I. Amari, Neural computation , 251 (1998).

Appendix A: Variational quantum eigensolver
The core idea of the variational quantum eigensolver (VQE) is to find the ground state of a Hamiltonian $H$ by optimizing the parameters $\theta$ of a PQC so as to minimize an objective function that represents the energy of the given Hamiltonian, $E(\theta) = \langle 0 | U^\dagger(\theta) H U(\theta) | 0 \rangle$ [3]. The minimisation is performed with a classical optimisation algorithm, whereas the energy is measured on a quantum device. According to the Ritz variational principle, the objective function is lower bounded by the ground state energy of $H$, i.e. $E(\theta) \ge E_g$, where $E_g$ is the true ground state energy of $H$.

Appendix B: Quantum Fisher information metric
For VQE, the objective function is minimized iteratively in a hybrid classical-quantum algorithm. At step $n$ of the procedure, the objective function is evaluated on the quantum computer for a given $\theta_n$. Based on the result, a classical computer selects the next choice $\theta_{n+1}$ such that it (hopefully) decreases the objective function. A common scheme to update parameters is ordinary gradient descent

$$\theta_{n+1} = \theta_n - \eta \frac{\partial E(\theta)}{\partial \theta}, \tag{B1}$$

where $\eta$ is a small coefficient and $\partial E(\theta)/\partial\theta$ is the gradient of the objective function.

The above update rule assumes that the parameter space for $\theta$ is a flat Euclidean space. However, in general this is not the case, as the underlying PQC and cost function do not have such simple forms. Recent studies have proposed the quantum natural gradient (QNG), inspired by the natural gradient in classical machine learning [51], to minimize the objective function [27, 28]. The main idea is to use information about how fast the quantum state changes when adjusting the parameter $\theta$ in a particular direction. Optimisation with the natural gradient updates the parameters according to

$$\theta_{n+1} = \theta_n - \eta F(\theta)^{-1} \frac{\partial E(\theta)}{\partial \theta}, \tag{B2}$$

where $F(\theta)$ is the Fubini-Study metric tensor or quantum Fisher information metric (QFI)

$$F_{ij} = \mathrm{Re}\left( \langle \partial_i \psi | \partial_j \psi \rangle - \langle \partial_i \psi | \psi \rangle \langle \psi | \partial_j \psi \rangle \right), \tag{B3}$$

where $|\partial_i \psi\rangle = \frac{\partial}{\partial\theta_i} |\psi(\theta)\rangle$ denotes the partial derivative of $|\psi(\theta)\rangle$ with respect to $\theta_i$. One can relate $F(\theta)$ to the distance in the space of pure quantum states, the Fubini-Study distance, which for infinitesimally close states is given by

$$\mathrm{Dist}_Q\big( |\psi(\theta)\rangle, |\psi(\theta + \mathrm{d}\theta)\rangle \big)^2 = \sum_{i,j} F_{ij}(\theta)\, \mathrm{d}\theta_i\, \mathrm{d}\theta_j, \tag{B4}$$

where $\mathrm{Dist}_Q(x, y) = \arccos |\langle x | y \rangle|$.

The QNG has been demonstrated to speed up gradient-based optimization techniques for PQCs [27, 28] and to avoid local minima [29].
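As a rough numerical illustration of Eqs. (B1)-(B3), the following NumPy sketch evaluates the QFI from central finite differences of the state vector and performs a single QNG update. The two-qubit, two-layer circuit of $R_y$ rotations and CNOT gates, the Hamiltonian $Z \otimes Z$, the step size `eta`, the finite-difference width `eps` and the regularizer `reg` added to $F$ before solving are all assumptions chosen for illustration; none of them is taken from the text, and this is not the authors' implementation.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
H = np.kron(Z, Z)  # example Hamiltonian (assumption, not from the text)


def ry(angle):
    """Single-qubit rotation exp(-i * angle / 2 * Y)."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * Y


def state(theta):
    """Hypothetical ansatz: each layer applies R_y on both qubits, then a CNOT."""
    psi = np.zeros(4, dtype=complex)
    psi[0] = 1.0  # start in |00>
    for a, b in np.reshape(theta, (-1, 2)):
        psi = CNOT @ np.kron(ry(a), ry(b)) @ psi
    return psi


def energy(theta):
    psi = state(theta)
    return float(np.real(np.vdot(psi, H @ psi)))


def gradient(theta, eps=1e-5):
    """Central finite-difference gradient of the energy."""
    g = np.zeros(len(theta))
    for i in range(len(theta)):
        d = np.zeros(len(theta))
        d[i] = eps
        g[i] = (energy(theta + d) - energy(theta - d)) / (2 * eps)
    return g


def qfi(theta, eps=1e-5):
    """Eq. (B3), with |d_i psi> approximated by central finite differences."""
    psi = state(theta)
    m = len(theta)
    dpsi = []
    for i in range(m):
        d = np.zeros(m)
        d[i] = eps
        dpsi.append((state(theta + d) - state(theta - d)) / (2 * eps))
    dpsi = np.array(dpsi)            # row i holds |d_i psi>
    gram = dpsi.conj() @ dpsi.T      # <d_i psi | d_j psi>
    overlap = dpsi.conj() @ psi      # <d_i psi | psi>
    return np.real(gram - np.outer(overlap, overlap.conj()))


def qng_step(theta, eta=0.1, reg=1e-3):
    """One update of Eq. (B2); reg * identity keeps the linear solve well posed."""
    F = qfi(theta)
    g = gradient(theta)
    return theta - eta * np.linalg.solve(F + reg * np.eye(len(theta)), g)


theta = np.array([0.3, 1.1, -0.7, 0.5])
print("QFI eigenvalues:", np.linalg.eigvalsh(qfi(theta)))
print("energy before:", energy(theta), "after one QNG step:", energy(qng_step(theta)))
```

Because this example circuit only produces real amplitudes, its four parameters span at most three independent directions in state space, so $F$ is rank-deficient here; that is why the sketch regularizes $F$ rather than inverting it directly. On a quantum device the derivatives would be estimated from measurements rather than from finite differences of a simulated state vector.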
Efficient classical methods to calculate the quantum Fisher information matrix have been proposed [41].

For the purpose of this study, we investigate PQCs of the form $|\psi(\theta)\rangle = U(\theta)|0\rangle$, where $U(\theta)$ is a hardware-efficient variational circuit consisting of $p$ layers of parametrized single-qubit rotations followed by fixed entangling gates

$$U(\theta)|0\rangle = \prod_{l=p}^{1} \left[ W_l V_l(\theta_l) \right] P |0\rangle, \tag{B5}$$

where $V_l(\theta_l)$ consists of parametric single-qubit rotations, $P$ is the first layer of single-qubit rotations and $W_l$ are non-parametric entangling gates. The product is ordered such that layer $l = 1$ acts first.

We assume that the quantum processor has a tensor-product structure of $N$ qubits with Hilbert space dimension $2^N$. The quantum circuit is parametrized by $\theta$, a vector of $M = pN$ real numbers, where we assume that $\theta_l$ are the $N$ parameters that govern the unitary $V_l(\theta_l)$ for the $l$-th layer of the circuit [28]. We define the following short form for representing subcircuits between layers $l_1 \le l_2$:

$$U_{[l_1 : l_2]} := W_{l_2} V_{l_2} \cdots W_{l_1} V_{l_1}. \tag{B6}$$

A round bracket excludes the corresponding boundary layer, e.g. $U_{(l:p]} = U_{[l+1:p]}$ and $U_{[1:l)} = U_{[1:l-1]}$.

We now show how to calculate the derivative with respect to a parameter $\theta_{l,n}$, which controls the single-qubit rotation on the $n$-th qubit in the $l$-th layer. We write $V_l = \prod_{n=1}^{N} \exp(-i \theta_{l,n} \sigma_n^\alpha / 2)$, where $\sigma_n^\alpha$ is a Pauli operator with $\alpha \in \{x, y, z\}$ acting on qubit $n$. We define $\partial_{l,n} = \frac{\partial}{\partial \theta_{l,n}}$ as the partial derivative with respect to $\theta_{l,n}$, which gives

$$\partial_{l,n} V_l(\theta_l) = -\frac{i}{2} \sigma_n^\alpha V_l(\theta_l), \tag{B7}$$

where $\alpha$ is a function of $n$ and $l$. For the full unitary we find

$$\begin{aligned}
\partial_{l,n} U(\theta)|0\rangle &= U_{(l:p]} W_l\, \partial_{l,n} V_l(\theta_l)\, U_{[1:l)} P |0\rangle \\
&= U_{(l:p]} W_l V_l(\theta_l) \left( -\tfrac{i}{2} \sigma_n^\alpha \right) U_{[1:l)} P |0\rangle \\
&= U_{[l:p]} \left( -\tfrac{i}{2} \sigma_n^\alpha \right) U_{[1:l)} P |0\rangle.
\end{aligned} \tag{B8}$$

In the second line we used that $\sigma_n^\alpha$ commutes with $V_l(\theta_l)$. With Eq. (B8) inserted into Eq. (B3), we can calculate the QFI.

Appendix C: Further data on the PQCs
In Fig. 8, we show further types of PQCs as defined in the caption. We highlight that the PQC rand(xyw) CPHASE has lower redundancy compared to rand(xyz) CPHASE. The reason is that the $z$ rotations, which can commute with the CPHASE layer, are replaced with non-commuting $(x + y)/\sqrt{2}$ rotations. We also show the PQC zxz CNOT, which was first introduced in [22]. We note that while it has three rotations per qubit and layer, the decay of the variance of the gradient as a function of $p$ is the same as for rand(xyz) CNOT. Finally, we show further examples of the transition in the QFI, visible both in the peak of the variance of the logarithm of the eigenvalues and in the decay of the QNG.

FIG. 8. Properties of further PQCs plotted against the number of layers $p$ for $N = 10$ qubits. PQCs have nearest-neighbor chain entangling layers. The legend labels the types of PQCs: rand(xyz) denotes randomized single-qubit rotations around the $\{x, y, z\}$ axes; rand(xyw) denotes randomized single-qubit rotations around the $\{x, y, (x + y)/\sqrt{2}\}$ axes; zxz denotes that every layer has three single-qubit rotations, around the $z$, $x$ and $z$ axes. a) Variance of the gradient $\mathrm{var}(\partial_k E)$ with respect to the Hamiltonian $H = \sigma^z \sigma^z$. b) Variance of the QNG $\mathrm{var}(F^{-1} \partial_k E)$. c) Redundancy $R$ of the parameters of the PQCs. d) Variance of the logarithm of the eigenvalues of the QFI, $\mathrm{var}(\log(\mathrm{eig}(F)))$.
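A quantity like $\mathrm{var}(\partial_k E)$ in Fig. 8a can be estimated on a small scale with a state-vector simulation, as in the sketch below. It is a minimal illustration under several assumptions that are not taken from the text: the initial layer $P$ is omitted, the CNOT chain runs from qubit $q$ to $q+1$, angles are drawn uniformly from $[0, 2\pi)$, the gradient is taken for the single parameter of the first layer on qubit 0, and the derivative is evaluated with the parameter-shift rule, which is exact for rotations of the form $\exp(-i\theta\sigma/2)$. The Hamiltonian is taken as $\sigma^z$ on the first two qubits, and all function names are hypothetical.

```python
import numpy as np

PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]]),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def apply_1q(psi, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector."""
    psi = psi.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [q]))  # output index becomes axis 0
    return np.moveaxis(psi, 0, q).reshape(-1)


def apply_cnot(psi, c, t, n):
    """Flip target qubit t on the part of the state where control c is 1."""
    psi = psi.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[c] = 1
    idx = tuple(idx)
    t_ax = t if t < c else t - 1  # target axis after the control axis is dropped
    psi[idx] = np.flip(psi[idx], axis=t_ax).copy()
    return psi.reshape(-1)


def energy(theta, axes, n):
    """E = <psi| Z_0 Z_1 |psi> for the layered circuit applied to |0...0>."""
    theta = np.asarray(theta, dtype=float)
    psi = np.zeros(2 ** n, dtype=complex)
    psi[0] = 1.0
    for l in range(theta.shape[0]):
        for q in range(n):
            a = theta[l, q]
            gate = np.cos(a / 2) * np.eye(2) - 1j * np.sin(a / 2) * PAULI[axes[l][q]]
            psi = apply_1q(psi, gate, q, n)
        for q in range(n - 1):  # nearest-neighbor chain of CNOTs
            psi = apply_cnot(psi, q, q + 1, n)
    k = np.arange(2 ** n)
    parity = ((k >> (n - 1)) ^ (k >> (n - 2))) & 1  # qubit 0 is the leading bit
    return float(np.sum((1 - 2 * parity) * np.abs(psi) ** 2))


def grad_param_shift(theta, axes, n, l, q):
    """Parameter-shift derivative of the energy with respect to theta[l, q]."""
    theta = np.asarray(theta, dtype=float)
    shift = np.zeros_like(theta)
    shift[l, q] = np.pi / 2
    return 0.5 * (energy(theta + shift, axes, n) - energy(theta - shift, axes, n))


def gradient_variance(n, p, samples, rng):
    """Variance of one gradient component over random rand(xyz) CNOT circuits."""
    grads = []
    for _ in range(samples):
        axes = rng.choice(list("xyz"), size=(p, n))
        theta = rng.uniform(0, 2 * np.pi, size=(p, n))
        grads.append(grad_param_shift(theta, axes, n, 0, 0))
    return float(np.var(grads))


rng = np.random.default_rng(0)
for p in (1, 2, 4, 8):
    print(p, gradient_variance(n=4, p=p, samples=200, rng=rng))
```

The sketch uses $N = 4$ qubits to keep the run time negligible, whereas the figure reports $N = 10$, so the printed numbers are not expected to reproduce the plotted curves.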
Appendix D: Histograms of eigenvalues
In Fig. 9 we show the distribution of the QFI eigenvalues for the PQCs of Fig. 4 in the main text. We find a characteristic spectrum for each PQC type. Note that CPHASE appears to have more pronounced tails in all cases.

FIG. 9. Distribution of eigenvalues of the QFI for the PQCs shown in Fig. 4 in the main text, plotted as $P(F)/P_{\max}$ against $\log(F)$ for CNOT, CPHASE and iSWAP entangling gates. a) Nearest-neighbor chain arrangement of entangling gates. b) All-to-all connectivity. c) Alternating nearest-neighbor. All graphs for $N = 10$ qubits and number of layers $p$