Analyzing the barren plateau phenomenon in training quantum neural networks with the ZX-calculus

Abstract

In this paper, we propose a general scheme to analyze the gradient vanishing phenomenon, also known as the barren plateau phenomenon, in training quantum neural networks with the ZX-calculus. More precisely, we extend the barren plateaus theorem from unitary 2-design circuits to any parameterized quantum circuits under certain reasonable assumptions. The main technical contribution of this paper is representing certain integrations as ZX-diagrams and computing them with the ZX-calculus. The method is used to analyze four concrete quantum neural networks with different structures. It is shown that, for the hardware efficient ansatz and the MPS-inspired ansatz, there exist barren plateaus, while for the QCNN ansatz and the tree tensor network ansatz, there exists no barren plateau.


Chen Zhao and Xiao-Shan Gao Academy of Mathematics and Systems Science, Chinese Academy of Sciences University of Chinese Academy of Sciences


In recent years, hybrid quantum-classical algorithms have been widely used in quantum chemistry [1–4], combinatorial optimization [5, 6], and quantum machine learning [7–12]. In these algorithms, the goal is usually to train parameterized quantum circuits (PQCs) with classical optimizers. The PQC is applied to an initial state, and the resulting state is measured on a quantum device. The classical optimizer then updates the parameters of the PQC according to the measurement results. Because PQCs can be run on noisy intermediate-scale quantum (NISQ [13]) devices, these algorithms are regarded as near-term practical quantum algorithms with potential quantum advantages.

There exist many methods to train PQCs. Some are gradient-based [14–16] and some are not [17, 18]. In quantum machine learning, gradient-based methods are widely used. When using gradient-based methods to train PQCs, one may suffer from the barren plateau (BP) phenomenon, which was first studied in [19]: the gradient with respect to the parameters of the PQC vanishes exponentially in the system size. It was proved that if the PQC is a 2-design, then barren plateaus exist [19]. Even if the PQC is shallow and locally a 2-design, barren plateaus still exist if the cost function is global [20]. It was also proved that if the cost function is local, then a PQC of $\log(n)$ depth is trainable [20]. Too much entanglement induces barren plateaus [21], and noise from quantum hardware also causes barren plateaus, which are called noise-induced barren plateaus [22].

The above results are obtained under certain assumptions of unitary $t$-designs, and it is still difficult to analyze the BP phenomenon for PQCs other than those containing $t$-design parts. In this paper, we develop a general scheme to analyze whether BP phenomena exist when training a concrete PQC. We focus on BP phenomena induced by the structure of PQCs; noise-induced barren plateaus are not considered in this paper. The most important tool used is the ZX-calculus, a graphical language for describing and reasoning about quantum processes. The ZX-calculus was developed by Coecke and Duncan in [23, 24] and has various applications including

quantum circuit synthesis [25–28], measurement-based quantum computing [29, 30], quantum error correction [31, 32], condensed matter physics [33], and quantum natural language processing [34]. In the ZX-calculus, the objects to deal with are ZX-diagrams, which consist of two kinds of tensors, Z-spiders and X-spiders. ZX-diagrams can be rewritten with the ZX-calculus rules. Moreover, every quantum circuit can be converted into a ZX-diagram.

Let $\vec\theta = (\theta_1, \dots, \theta_m)$ be a set of parameters. To analyze the gradient of a PQC $U(\vec\theta)$ with respect to a Hamiltonian $H$, we need to estimate the following expectation and variance:

$$E\left[\frac{\partial\langle H\rangle}{\partial\theta_j}\right], \qquad \mathrm{Var}\left[\frac{\partial\langle H\rangle}{\partial\theta_j}\right], \qquad (1)$$

where $\langle H\rangle$ is defined in (3). It will be shown that the expectation in (1) is always zero. The PQC is said to have barren plateaus if the variance in (1) vanishes exponentially in the size of the PQC. The PQC is said to have no barren plateau, or to be trainable, if the variance in (1) vanishes only polynomially in the size of the PQC.

To estimate the expectation and variance in (1), we first represent them as ZX-diagrams. Since the expectation and the variance are integrals, the main technical contribution of this paper is representing these integrals as ZX-diagrams and computing them with the ZX-calculus. More precisely, with the rewriting rules of the ZX-calculus, we prove that (1) is equal to the contraction of a tensor network with a structure similar to that of the PQC.
Hence, the existence of barren plateaus is completely characterized by the scaling property of the tensor network.

We use these techniques to analyze whether BP phenomena exist in the hardware-efficient ansatz [2], the QCNN [35], the tree tensor network ansatz [36], and the MPS-inspired ansatz [37]. We show that there exist barren plateaus in the hardware-efficient ansatz and the MPS-inspired ansatz, and that there is no barren plateau in the QCNN and the tree tensor network ansatz.

This paper is organized as follows. A brief introduction to the PQC, the BP phenomenon, and the ZX-calculus is given in section 2. We prove the main result, which characterizes (1), in section 3. The analysis of 4 concrete PQCs is given in section 4.

In a hybrid quantum-classical algorithm, there is an ansatz, which is a PQC of the form

$$U(\vec\theta) = \prod_{j=1}^{M} \left[ U_j(\theta_j) \cdot V_j \right]. \qquad (2)$$

Here $U_j(\theta_j)$, $j = 1, \dots, M$, are parameterized gates, for example the rotation gates $R_X$, $R_Y$, $R_Z$, and $V_j$, $j = 1, \dots, M$, are non-parameterized gates, for example the Hadamard gate $H$ and the CNOT gate. The PQC is applied to an initial state $|\psi\rangle$ and then the state is measured. This procedure, called the quantum part of the algorithm, is run on quantum processors.

Meanwhile, there is a classical part, which consists of classical processors that optimize the parameters of the PQC in the quantum part. A cost function $L(\vec\theta)$ is estimated in the classical part based on the measurement results. In many tasks, the expectation

$$\langle H\rangle = \langle\psi| U^\dagger(\vec\theta)\, H\, U(\vec\theta) |\psi\rangle \qquad (3)$$

of a given Hamiltonian $H$ is used as the cost function.

As demonstrated in figure 1, the quantum part runs the PQC and obtains the measurement results, and the classical part estimates the cost function and updates the parameters. After several iterations, the cost function may converge and be optimized.
Then the training is stopped. This is the main idea of the hybrid quantum-classical algorithm.

[Figure 1: The hybrid quantum-classical algorithm. Quantum processors apply $U(\vec\theta)$ to $|\psi\rangle$ and measure $H$; classical processors evaluate $L(\vec\theta)$ and update the parameters to $\vec\theta'$.]
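For a few qubits, the quantum part of this loop can be mimicked classically. The following is a minimal sketch, assuming a toy 2-qubit ansatz and a dense state-vector simulation; the circuit layout, the helper names `rot`, `ansatz` and `cost`, and the choice $H = Z \otimes I$ are illustrative assumptions, not taken from the paper. It evaluates the cost function (3) with $|\psi\rangle = |00\rangle$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
# CNOT with the first (most significant) qubit as control.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)


def rot(pauli, theta):
    """Rotation gate exp(-i * theta / 2 * P) for a Pauli matrix P."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * pauli


def ansatz(thetas):
    """Toy PQC (an assumption): R_X on each qubit, a CNOT, R_Z on each qubit."""
    layer_x = np.kron(rot(X, thetas[0]), rot(X, thetas[1]))
    layer_z = np.kron(rot(Z, thetas[2]), rot(Z, thetas[3]))
    return layer_z @ CNOT @ layer_x


def cost(thetas, hamiltonian):
    """<H> = <psi| U^dagger H U |psi> with |psi> = |00>."""
    psi = np.zeros(4, dtype=complex)
    psi[0] = 1.0
    out = ansatz(thetas) @ psi
    return float(np.real(np.vdot(out, hamiltonian @ out)))


H = np.kron(Z, I2)  # illustrative observable: Z on the first qubit
value = cost([0.3, 1.1, -0.7, 2.0], H)
print(value)
```

For this particular toy circuit the observable commutes with everything after the first $R_X$ gate, so the cost reduces to $\cos\theta_1$, which gives a simple way to sanity-check the simulation.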

When the parameterized gates are of the form $U_j(\theta_j) = e^{-i\frac{\theta_j}{2} H_j}$, where $H_j$ satisfies $H_j^2 = I$, the gradient $\frac{\partial\langle H\rangle}{\partial\theta_j}$ can be estimated by the parameter-shift rule without changing the structure of the PQC [14]. Once we have the gradient, we can use gradient-based optimization methods, such as gradient descent, to optimize the parameters.

Ideally, if the gradient does not vanish too fast as the size of the PQC grows, then the gradient can be estimated efficiently and the PQC can be trained easily. However, the BP phenomenon tells us that in many cases the gradient vanishes exponentially as the system grows. When this happens, the PQC is difficult to train. The first rigorous proof of the BP phenomenon is stated below.

Theorem 1 ([19]). Consider a PQC $U(\vec\theta) = V(\theta_M, \dots, \theta_{j+1})\, U(\theta_j)\, W(\theta_{j-1}, \dots, \theta_1)$ and a Hamiltonian $H$. The expectation of the gradient is 0 if $V$ and $W$ are 1-designs, and the variance of the gradient $\mathrm{Var}\left[\frac{\partial\langle H\rangle}{\partial\theta_j}\right]$ vanishes exponentially in the number of qubits if $V$ or $W$ is a 2-design.

Hence, when designing the ansatz PQC for a hybrid quantum-classical algorithm, we should analyze whether BP phenomena exist in it, to ensure that it can be trained efficiently.
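The parameter-shift rule mentioned above can be illustrated numerically. The sketch below assumes the common form of the rule for gates $e^{-i\frac{\theta}{2}P}$ with $P^2 = I$, namely $\frac{\partial\langle H\rangle}{\partial\theta_j} = \frac{1}{2}\left[\langle H\rangle(\theta_j + \frac{\pi}{2}) - \langle H\rangle(\theta_j - \frac{\pi}{2})\right]$, and compares it with a central finite difference. The toy circuit, the observable $X \otimes Z$, and all function names are assumptions made for the example.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
H_OBS = np.kron(X, Z)  # illustrative observable


def rot(pauli, theta):
    """Rotation gate exp(-i * theta / 2 * P) for a Pauli matrix P."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * pauli


def cost(thetas):
    """<00| U^dagger H U |00> for a toy 2-qubit circuit."""
    layer_x = np.kron(rot(X, thetas[0]), rot(X, thetas[1]))
    layer_z = np.kron(rot(Z, thetas[2]), rot(Z, thetas[3]))
    out = (layer_z @ CNOT @ layer_x)[:, 0]  # U|00> is the first column of U
    return float(np.real(np.vdot(out, H_OBS @ out)))


def parameter_shift(f, thetas, j):
    """Gradient with respect to thetas[j] from two shifted evaluations."""
    shift = np.zeros(len(thetas))
    shift[j] = np.pi / 2
    return 0.5 * (f(thetas + shift) - f(thetas - shift))


def finite_difference(f, thetas, j, eps=1e-6):
    """Central finite difference, used only as a reference value."""
    step = np.zeros(len(thetas))
    step[j] = eps
    return (f(thetas + step) - f(thetas - step)) / (2 * eps)


thetas = np.array([0.3, 1.1, -0.7, 2.0])
ps = np.array([parameter_shift(cost, thetas, j) for j in range(4)])
fd = np.array([finite_difference(cost, thetas, j) for j in range(4)])
print(ps)
print(fd)
```

The two shifted evaluations use the same circuit structure as the cost function itself, which is why the rule can be run on hardware without modifying the PQC.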

We provide a brief introduction to the ZX-calculus here. For more details, please refer to [38, 39]. In the ZX-calculus, quantum states and their transformations are represented as ZX-diagrams, which consist of 2 kinds of tensors, Z-spiders and X-spiders. The Z-spider is drawn as a green node, and the X-spider is drawn as a red node. With $m$ inputs and $n$ outputs, they can be written explicitly in Dirac notation as follows.

$$\text{Z-spider with phase } \theta \;:=\; \underbrace{|0\cdots 0\rangle}_{n}\underbrace{\langle 0\cdots 0|}_{m} + e^{i\theta}\underbrace{|1\cdots 1\rangle}_{n}\underbrace{\langle 1\cdots 1|}_{m}$$

$$\text{X-spider with phase } \theta \;:=\; \underbrace{|{+}\cdots{+}\rangle}_{n}\underbrace{\langle {+}\cdots{+}|}_{m} + e^{i\theta}\underbrace{|{-}\cdots{-}\rangle}_{n}\underbrace{\langle {-}\cdots{-}|}_{m}$$

For a spider, the edges on the left are called inputs and the edges on the right are called outputs. The angle $\theta$ is called the phase of the spider. For simplicity, we omit the phase when it is zero. Spiders can be connected with wires. Hence, ZX-diagrams can be regarded as tensor networks generated by Z-spiders and X-spiders. For example, we can use ZX-diagrams to represent the following quantum states and quantum gates.

[Equation (4), ZX-diagrams omitted: a phase-free Z-spider with one output equals $|0\rangle + |1\rangle = \sqrt{2}\,|+\rangle$; a phase-free X-spider with one output equals $|+\rangle + |-\rangle = \sqrt{2}\,|0\rangle$; a Z-spider (X-spider) with one input, one output and phase $\theta$ represents $R_Z(\theta)$ ($R_X(\theta)$); the CNOT gate is $\sqrt{2}$ times a Z-spider connected to an X-spider; the Hadamard gate $H$ is drawn as a yellow box.]

Here we introduce a new notation, the yellow box, to represent the Hadamard gate

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Since the gate set $\{R_Z, R_X, \mathrm{CNOT}\}$ is universal for quantum computing, in principle one can convert every quantum circuit to a ZX-diagram with the equations in (4).

Moreover, the ZX-calculus is a powerful tool for reasoning. It has several rewriting rules with which one can rewrite a ZX-diagram into another equivalent form. Figure 2 gives some basic rewriting rules in the ZX-calculus. Here, two ZX-diagrams $A$, $B$ are said to be equivalent if and only if there exists a non-zero constant $c \in \mathbb{C}$ such that $A = c \cdot B$.

[Figure 2: Some basic rewriting rules in the ZX-calculus: (f), (π), (c), (h), (i) and (b). Here, '...' means 0 or more. (This figure is from [25].)]
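Because spiders are ordinary tensors, the statements above can be checked with small matrices. The following sketch builds Z- and X-spiders directly from their Dirac-notation definitions and tests a few of the facts listed for (4), together with the spider-fusion rule (f). The helper names (`z_spider`, `x_spider`, `kron_power`) are assumptions for this example, and the global phases $e^{i\theta/2}$ appearing in the checks reflect the convention $R_Z(\theta) = e^{-i\theta Z/2}$ assumed here.

```python
import numpy as np

HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)


def kron_power(mat, k):
    """k-fold tensor power of mat (the 1x1 identity when k == 0)."""
    out = np.eye(1, dtype=complex)
    for _ in range(k):
        out = np.kron(out, mat)
    return out


def z_spider(n_in, n_out, theta=0.0):
    """|0..0><0..0| + exp(i theta) |1..1><1..1| as a 2^n_out x 2^n_in matrix."""
    mat = np.zeros((2 ** n_out, 2 ** n_in), dtype=complex)
    mat[0, 0] += 1.0
    mat[-1, -1] += np.exp(1j * theta)
    return mat


def x_spider(n_in, n_out, theta=0.0):
    """Same as the Z-spider but in the |+>, |-> basis (Hadamards on every leg)."""
    return kron_power(HAD, n_out) @ z_spider(n_in, n_out, theta) @ kron_power(HAD, n_in)


def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])


def rx(theta):
    return HAD @ rz(theta) @ HAD


theta, alpha, beta = 0.8, 0.4, 1.3
phase = np.exp(1j * theta / 2)

# A CNOT drawn as a Z-spider (control) wired to an X-spider (target).
cnot_diagram = np.kron(np.eye(2), x_spider(2, 1)) @ np.kron(z_spider(1, 2), np.eye(2))

checks = {
    "z_state": np.allclose(z_spider(0, 1)[:, 0], [1, 1]),
    "x_state": np.allclose(x_spider(0, 1)[:, 0], [np.sqrt(2), 0]),
    "rz": np.allclose(z_spider(1, 1, theta), phase * rz(theta)),
    "rx": np.allclose(x_spider(1, 1, theta), phase * rx(theta)),
    "cnot": np.allclose(np.sqrt(2) * cnot_diagram, CNOT),
    "fusion": np.allclose(z_spider(1, 2, beta) @ z_spider(1, 1, alpha),
                          z_spider(1, 2, alpha + beta)),
}
print(checks)
```

Defining the X-spider as a Z-spider with Hadamards on every leg is itself an instance of the colour-change rule (h).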

Note that the ZX-calculus is universal: any linear transformation can be represented as a ZX-diagram. Moreover, the rules in figure 2 are complete for stabilizer quantum mechanics, where phases can only be multiples of $\frac{\pi}{2}$ [40, 41]. That is, if two ZX-diagrams are equivalent, then there exists a sequence of rewriting rules in figure 2 that rewrites one into the other. There are also completeness results for Clifford+T quantum mechanics, where phases can be multiples of $\frac{\pi}{4}$, and for arbitrary ZX-diagrams [42–45].

In this paper, we focus on a canonical form of the ZX-diagram, the graph-like ZX-diagram, which is defined in [25].

Definition 1 ([25]). A ZX-diagram is graph-like if
1. All spiders are Z-spiders.
2. Z-spiders are only connected via Hadamard edges.
3. There exist no parallel Hadamard edges or self-loops.
4. Every input or output is connected to a Z-spider and every Z-spider is connected to at most one input or output.
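The four conditions of Definition 1 are simple enough to state as a predicate on a graph description of a diagram. The sketch below uses a hypothetical, minimal encoding (a dict of spider colours, a list of typed edges, and a list of the spiders that boundary wires attach to); this data layout is an assumption for illustration and is not the representation used in [25].

```python
from collections import Counter


def is_graph_like(spiders, edges, boundaries):
    """Check the four conditions of Definition 1.

    spiders:    dict mapping a spider id to its colour, "Z" or "X"
    edges:      list of (u, v, kind) between spiders, kind in {"plain", "hadamard"}
    boundaries: list of spider ids, one entry per input or output wire
    """
    # 1. All spiders are Z-spiders.
    if any(colour != "Z" for colour in spiders.values()):
        return False
    # 2. Spiders are only connected via Hadamard edges.
    if any(kind != "hadamard" for _, _, kind in edges):
        return False
    # 3. No self-loops and no parallel Hadamard edges.
    if any(u == v for u, v, _ in edges):
        return False
    pair_counts = Counter(frozenset((u, v)) for u, v, _ in edges)
    if any(count > 1 for count in pair_counts.values()):
        return False
    # 4. Every boundary wire attaches to a (Z-)spider, at most one per spider.
    if any(spider not in spiders for spider in boundaries):
        return False
    return all(count <= 1 for count in Counter(boundaries).values())


two_spiders = {0: "Z", 1: "Z"}
good = is_graph_like(two_spiders, [(0, 1, "hadamard")], [0, 1])
plain_edge = is_graph_like(two_spiders, [(0, 1, "plain")], [0, 1])
parallel = is_graph_like(two_spiders, [(0, 1, "hadamard"), (1, 0, "hadamard")], [0, 1])
print(good, plain_edge, parallel)
```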

Two spiders being connected via a Hadamard edge means that they are connected through a Hadamard box. Alternatively, we also use a dashed blue edge to represent a Hadamard edge.

[ZX-diagram omitted: a dashed blue edge between two spiders is defined as an edge carrying a Hadamard box.]

All X-spiders can be rewritten to Z-spiders by using the rule (h) in figure 2. Connected Hadamard boxes can be canceled with the rule (i), and normal edges can be removed with the rule (f). Furthermore, parallel Hadamard edges and self-loops can be removed with the rules in figure 3. Hence, every ZX-diagram is equivalent to a graph-like ZX-diagram [25].

[Figure 3: Rules for canceling parallel edges and self-loops: (hopf), (hsl) and (sl) [25].]

In this section, we show how to analyze the BP phenomenon with the ZX-calculus. More precisely, we show how to estimate the expectation and the variance of the gradient of the cost function of a PQC with respect to a Hamiltonian. The main technique we use is to compute integrals over unitaries with the ZX-calculus.

Scalars are ignored in the rules in section 2.3. However, scalars are necessary when considering the BP phenomenon. Hence, we first give the precise rules with scalars in figure 4.

[Figure 4: Rewriting rules with scalars.]

In this paper, we consider PQCs under the following assumptions.

Assumption 1.

The PQC $U(\vec\theta)$ satisfies
1. Each gate in $U$ is one of $\{R_X, R_Z, H, \mathrm{CNOT}\}$.
2. The parameters in $\vec\theta = (\theta_1, \dots, \theta_m)$ are independent uniform random variables in the interval $[-\pi, \pi]$.

3.1 Representing gradients as ZX-diagrams

Consider a PQC $U(\vec\theta)$ on $n$ qubits and a Hamiltonian $H$. Without loss of generality, we also assume that the PQC is applied to the initial state $|0\rangle$. Then the expectation of $H$ can be expressed as

$$\langle H\rangle = \langle 0| U^\dagger(\vec\theta)\, H\, U(\vec\theta) |0\rangle. \qquad (5)$$

As shown in section 2.3, we can convert the PQC $U(\vec\theta)$ to a parameterized graph-like ZX-diagram $G_U(\vec\theta)$ with (4). Suppose that $U(\vec\theta) = c \cdot G_U(\vec\theta)$ for a constant $c$. Then $\langle H\rangle$ can also be expressed as a ZX-diagram, as demonstrated in the following equation.

[Equation (6), ZX-diagram omitted: $\langle H\rangle = \frac{|c|^2}{2^n} \cdot D$, where $D$ is the diagram obtained by composing the input states, $G_U(\vec\theta)$, $H$, and $G_{U^\dagger}(-\vec\theta)$.]

If we expand the spider by the definition of the Z-spider, we can prove that the gradient $\frac{\partial\langle H\rangle}{\partial\theta_j}$ can be represented as a ZX-diagram.

Theorem 2.

The gradient can be represented by the following equation.

[Equation, ZX-diagrams omitted: $\frac{2^n}{|c|^2} \cdot \frac{\partial\langle H\rangle}{\partial\theta_j}$ equals the derivative with respect to $\theta_j$ of the diagram in (6), which is again a single ZX-diagram: the diagram in (6) with shifted phases on the spiders carrying $\theta_j$ and $-\theta_j$.]

Proof. The proof is given in appendix A.
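As a numerical companion to the expectation and variance in (1), the gradient can also be sampled directly. The sketch below draws parameters uniformly from $[-\pi, \pi]$ as in Assumption 1 and estimates the mean and variance of one gradient component by Monte Carlo. The toy 2-qubit circuit, the observable $Z \otimes Z$, the sample size, and the use of the parameter-shift rule are all assumptions made for the example; for this circuit the cost is $\cos\theta_2$, so the exact values are mean $0$ and variance $\frac{1}{2}$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
H_OBS = np.kron(Z, Z)  # illustrative observable


def rot(pauli, theta):
    """Rotation gate exp(-i * theta / 2 * P) for a Pauli matrix P."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * pauli


def cost(thetas):
    """<00| U^dagger H U |00> for a toy circuit with 4 parameters."""
    layer_x = np.kron(rot(X, thetas[0]), rot(X, thetas[1]))
    layer_z = np.kron(rot(Z, thetas[2]), rot(Z, thetas[3]))
    out = (layer_z @ CNOT @ layer_x)[:, 0]
    return float(np.real(np.vdot(out, H_OBS @ out)))


def gradient(thetas, j):
    """Parameter-shift gradient with respect to thetas[j]."""
    shift = np.zeros(len(thetas))
    shift[j] = np.pi / 2
    return 0.5 * (cost(thetas + shift) - cost(thetas - shift))


rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
samples = rng.uniform(-np.pi, np.pi, size=(2000, 4))
grads = np.array([gradient(t, 1) for t in samples])
mean, var = grads.mean(), grads.var()
print(mean, var)
```

Sampling of this kind scales exponentially with the number of qubits when done by state-vector simulation, which is part of the motivation for an analytic treatment.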

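To analyze the BP phenomenon, we need to compute the expectation

$$E\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right) = \int_{\vec\theta} p(\vec\theta)\, \frac{\partial\langle H\rangle}{\partial\theta_j}\, d\vec\theta, \qquad (7)$$

and the variance

$$\mathrm{Var}\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right) = E\left(\left|\frac{\partial\langle H\rangle}{\partial\theta_j}\right|^2\right) - \left(E\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right)\right)^2 = \int_{\vec\theta} p(\vec\theta) \left|\frac{\partial\langle H\rangle}{\partial\theta_j}\right|^2 d\vec\theta - \left(E\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right)\right)^2, \qquad (8)$$

for $j = 1, \dots, m$. Here $p(\vec\theta)$ is the probability density of the parameters $\vec\theta$. By assumption 1, (7) and (8) can be written as

$$E\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right) = \frac{1}{(2\pi)^m} \int_{\theta_1} \cdots \int_{\theta_m} \frac{\partial\langle H\rangle}{\partial\theta_j}\, d\theta_1 \dots d\theta_m, \qquad (9)$$

and

$$\mathrm{Var}\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right) = \frac{1}{(2\pi)^m} \int_{\theta_1} \cdots \int_{\theta_m} \left|\frac{\partial\langle H\rangle}{\partial\theta_j}\right|^2 d\theta_1 \dots d\theta_m - \left(E\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right)\right)^2, \qquad (10)$$

for $j = 1, \dots, m$. We compute the expectation and the variance of the gradients in the next two sections.

In this section, we compute the expectation in (9). As shown in theorem 2, the integral

$$\frac{1}{2\pi} \int_{\theta_k} \frac{\partial\langle H\rangle}{\partial\theta_j}\, d\theta_k,$$

for $k = 1, \dots, m$, is also an integral of a ZX-diagram over its parameter $\theta_k$. With the following lemma, the integral can be represented as a ZX-diagram again.

Lemma 1.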

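The following equation holds.

[Equation, ZX-diagrams omitted: $\frac{1}{2\pi} \int_\alpha (\cdot)\, d\alpha$ applied to a pair of Z-spiders with phases $\alpha$ and $-\alpha$, each with $m$ inputs and $n$ outputs, equals a ZX-diagram that no longer depends on $\alpha$.]

Proof. The proof is given in appendix B.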
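The kind of identity behind Lemma 1 can be checked numerically. Writing a Z-spider as $A + e^{i\alpha} B$ with $A = |0\cdots0\rangle\langle0\cdots0|$ and $B = |1\cdots1\rangle\langle1\cdots1|$, the product of spiders with phases $\alpha$ and $-\alpha$ is $A \otimes A + B \otimes B + e^{-i\alpha} A \otimes B + e^{i\alpha} B \otimes A$, and averaging over $\alpha$ removes the last two terms. What remains is a single phase-free Z-spider on all legs. The sketch below verifies this elementary fact; it is a sanity check under that reading, not a transcription of the lemma's diagram, and the helper names are assumptions.

```python
import numpy as np


def z_spider(n_in, n_out, theta=0.0):
    """|0..0><0..0| + exp(i theta) |1..1><1..1| as a 2^n_out x 2^n_in matrix."""
    mat = np.zeros((2 ** n_out, 2 ** n_in), dtype=complex)
    mat[0, 0] += 1.0
    mat[-1, -1] += np.exp(1j * theta)
    return mat


def averaged_pair(n_in, n_out, num_points=16):
    """(1 / 2 pi) * integral over alpha of Z(alpha) (x) Z(-alpha).

    A uniform grid integrates exp(i k alpha) exactly for 0 < |k| < num_points,
    so the grid average equals the integral here.
    """
    alphas = -np.pi + 2 * np.pi * np.arange(num_points) / num_points
    total = sum(np.kron(z_spider(n_in, n_out, a), z_spider(n_in, n_out, -a))
                for a in alphas)
    return total / num_points


m, n = 2, 1  # inputs and outputs of each spider (illustrative values)
lhs = averaged_pair(m, n)
rhs = z_spider(2 * m, 2 * n)  # one phase-free spider on all legs
print(np.allclose(lhs, rhs))
```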

With this lemma, we can prove the following theorem.

Theorem 3.

Under assumption 1, the integral

$$\frac{1}{2\pi} \int_{\theta_j} \frac{\partial\langle H\rangle}{\partial\theta_j}\, d\theta_j = 0.$$

Proof. Using the relation in lemma 1 on the ZX-diagram in theorem 2, we have

[Chain of ZX-diagram equalities omitted: $\frac{2^n}{|c|^2} \cdot \frac{1}{2\pi} \int_{\theta_j} \frac{\partial\langle H\rangle}{\partial\theta_j}\, d\theta_j$ is rewritten step by step to $0$.]

As a corollary, the expectation of the gradient in (9) is zero.

Corollary 1.

Under assumption 1, the expectation

$$E\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right) = \frac{1}{(2\pi)^m} \int_{\theta_1} \cdots \int_{\theta_m} \frac{\partial\langle H\rangle}{\partial\theta_j}\, d\theta_1 \dots d\theta_m = 0, \quad \text{for } j = 1, \dots, m.$$

In this section, we compute the variance in (8). Because the gradient $\frac{\partial\langle H\rangle}{\partial\theta_j}$ is a real number and the expectation is 0, by (8), the variance is the expectation of $\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right)^2$, which can be represented as follows by theorem 2.

[Equation (11), ZX-diagram omitted: $\frac{4^n}{|c|^4} \cdot \left[\frac{\partial\langle H\rangle}{\partial\theta_j}\right]^2$ equals two copies of the diagram in theorem 2 placed side by side.]

Similar to lemma 1, we can prove the following lemma.

Lemma 2.

The following equation holds.

[Equation, ZX-diagrams omitted: $\frac{1}{2\pi} \int_\alpha (\cdot)\, d\alpha$ applied to four Z-spiders with phases $\pm\alpha$ (two of each sign), each with $m$ inputs and $n$ outputs, equals a sum of three ZX-diagrams that no longer depend on $\alpha$.]

Proof. Refer to appendix C.
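A small numerical experiment shows why more than one term survives in Lemma 2. For four single-wire Z-spiders with phases $\alpha, -\alpha, \alpha, -\alpha$, the tensor product is diagonal with entries $e^{i\alpha(b_1 - b_2 + b_3 - b_4)}$, and averaging over $\alpha$ keeps exactly the bit strings with $b_1 - b_2 + b_3 - b_4 = 0$. The sketch below checks that these are the strings selected by "pair wires 1-2 and 3-4" plus "pair wires 1-4 and 2-3" minus "all four equal". This inclusion-exclusion grouping is one way to organise the surviving terms and is an assumption of the example; it is not claimed to coincide with the three diagrams of the lemma.

```python
import itertools

import numpy as np


def z_phase(alpha):
    """Single-wire Z-spider: diag(1, exp(i alpha))."""
    return np.diag([1.0, np.exp(1j * alpha)])


def fourfold(alpha):
    """Z(alpha) (x) Z(-alpha) (x) Z(alpha) (x) Z(-alpha)."""
    out = np.eye(1, dtype=complex)
    for sign in (+1, -1, +1, -1):
        out = np.kron(out, z_phase(sign * alpha))
    return out


def indicator(predicate):
    """Diagonal 16x16 matrix with 1 on the bit strings satisfying predicate."""
    bits = itertools.product((0, 1), repeat=4)  # same ordering as np.kron
    return np.diag([1.0 if predicate(b) else 0.0 for b in bits])


num_points = 16  # a uniform grid is exact for the frequencies |k| <= 2 present
alphas = -np.pi + 2 * np.pi * np.arange(num_points) / num_points
average = sum(fourfold(a) for a in alphas) / num_points

pair_12_34 = indicator(lambda b: b[0] == b[1] and b[2] == b[3])
pair_14_23 = indicator(lambda b: b[0] == b[3] and b[1] == b[2])
all_equal = indicator(lambda b: len(set(b)) == 1)

surviving = int(round(np.trace(average).real))
print(surviving, np.allclose(average, pair_12_34 + pair_14_23 - all_equal))
```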

There are three terms after integration. Hence, computing the variance of the gradient is much more complicated than computing the expectation. We denote the three ZX-diagrams in lemma 2 as

[Equation (12), ZX-diagrams omitted: the definitions of $T_1$, $T_2$ and $T_3$.]

We also introduce a new notation $V_U^{a_1, \dots, a_m}$, $a_j \in \{T_1, T_2, T_3\}$, to represent the following ZX-diagram.

[Equation (13), ZX-diagram omitted: two copies of the diagram in (6), in which the spiders corresponding to $\theta_1, \dots, \theta_m$ are replaced by the diagrams $a_1, \dots, a_m$.]

Here, $U(\theta_1, \dots, \theta_m)$ is a PQC with $m$ parameters and $G_U$ is the graph-like ZX-diagram corresponding to $U$. With this notation, we have the following theorem.

Theorem 4.

Under assumption 1, the following equation holds.

$$\mathrm{Var}\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right) = \frac{|c|^4}{4^n} \cdot \sum_{a_k \in \{T_1, T_2, T_3\},\; k \neq j} V_U^{a_1, \dots, a_{j-1}, T_2, a_{j+1}, \dots, a_m}.$$

Proof. By (11) and lemma 2, we have the following equation.

[Equation, ZX-diagrams omitted: $\frac{4^n}{|c|^4} \cdot \frac{1}{2\pi} \int_{\theta_j} \left[\frac{\partial\langle H\rangle}{\partial\theta_j}\right]^2 d\theta_j$ equals a sum over $a_j \in \{T_1, T_2, T_3\}$ of diagrams of the form (11) with $a_j$ in place of the spiders carrying $\theta_j$.]

By the following equations,

[ZX-diagram identities omitted: two of the resulting diagrams are shown to be equal to 0.]

we have

[Equation, ZX-diagram omitted: $\frac{4^n}{|c|^4} \cdot \frac{1}{2\pi} \int_{\theta_j} \left[\frac{\partial\langle H\rangle}{\partial\theta_j}\right]^2 d\theta_j$ equals the single remaining diagram.]

Then, by the definition of $V_U^{a_1, \dots, a_m}$, we get

$$\mathrm{Var}\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right) = \frac{|c|^4}{4^n} \cdot \sum_{a_k \in \{T_1, T_2, T_3\},\; k \neq j} V_U^{a_1, \dots, a_{j-1}, T_2, a_{j+1}, \dots, a_m}.$$

Hence, to compute the variance, we need to sum over $3^{m-1}$ terms of the tensor $V_U^{a_1, \dots, a_{j-1}, T_2, a_{j+1}, \dots, a_m}$. This seems intractable when $m$ is large, but in many cases there are simple ways to compute the sum.

Let us consider two spiders $W_j$, $W_k$ corresponding to the parameters $\theta_j$, $\theta_k$ in $G_U$. Suppose that $W_j$ and $W_k$ are connected with a Hadamard edge. Then, by the following lemma, the Hadamard edge can be removed after integration over $\theta_j$ and $\theta_k$.

Lemma 3.

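The following equation holds.

[Equation, ZX-diagrams omitted: $\frac{1}{(2\pi)^2} \int_{\theta_j} \int_{\theta_k} (\cdot)\, d\theta_j\, d\theta_k$ applied to the copies of the spiders with phases $\pm\theta_j$ and $\pm\theta_k$ joined by Hadamard edges equals $\sum_{a_j, a_k \in \{T_1, T_2, T_3\}} M_{a_j, a_k} \cdot (\text{the disconnected diagrams } a_j \text{ and } a_k)$.]

Here $M_{a_j, a_k}$ is an entry of the $3 \times 3$ matrix $M = \frac{1}{4}(\cdots)$ [matrix entries lost in extraction].

Proof. Refer to appendix D.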

Applying this lemma to the variance recursively, we can remove all the Hadamard edges connecting two parameterized spiders. The big tensor $V_U^{a_1, \dots, a_{j-1}, T_2, a_{j+1}, \dots, a_m}$ is then broken into smaller tensors that are connected by $M$. The result is a new tensor network whose structure is similar to that of $G_U$. To compute the variance, the only thing we need to do is to contract this new tensor network. Figure 5 demonstrates this procedure for the case in which all spiders in $G_U$ are parameterized and connected with Hadamard edges. The tensor $\tilde I_{a_1, \dots, a_k}$ is related to the input state, while the tensor $\tilde H_{c_1, \dots, c_m}$ is related to the Hamiltonian $H$. $P$ is a projection that has only one non-zero entry, namely $P(2, 2, \dots, 2) = 1$.

Also note that there is a scalar for each internal copy tensor. This scalar comes from the following equation.

[Equation (14), ZX-diagrams omitted: a chain of diagram equalities evaluating to the scalar 2.]

In conclusion, computing the variance of the gradient is reduced to contracting a tensor network corresponding to the circuit. In the next section, we use these techniques to analyze the BP phenomenon for several concrete PQCs.
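To make the phrase "contracting a tensor network" concrete, the sketch below contracts a tiny network of the kind described: two input legs and one output leg meet at a three-index node, every leg passes through a $3 \times 3$ matrix, and the node is either a copy tensor or a projection with a single non-zero entry. All numerical values (`M_TOY`, the input vectors, the output vector) are hypothetical placeholders chosen for the example; they are not the matrix $M$ or the tensors $\tilde I$, $\tilde H$ of the paper.

```python
import numpy as np

DIM = 3  # every leg ranges over three labels


def copy_tensor(rank):
    """Tensor equal to 1 when all indices coincide and 0 otherwise."""
    tensor = np.zeros((DIM,) * rank)
    for a in range(DIM):
        tensor[(a,) * rank] = 1.0
    return tensor


def projection(rank):
    """Tensor whose only non-zero entry is at index (2, ..., 2), counting from 1."""
    tensor = np.zeros((DIM,) * rank)
    tensor[(1,) * rank] = 1.0  # zero-based index 1 is the label "2"
    return tensor


# Hypothetical placeholder data, NOT the values from the paper.
M_TOY = np.array([[0.50, 0.25, 0.25],
                  [0.25, 0.50, -0.25],
                  [0.25, -0.25, 0.50]])
in_1 = np.array([1.0, 0.0, 1.0])
in_2 = np.array([1.0, 1.0, 0.0])
out = np.array([0.0, 2.0, 1.0])


def contract(node):
    """Contract input vectors, M_TOY on every leg, the node, and the output vector."""
    return float(np.einsum("a,b,abc,c->", M_TOY @ in_1, M_TOY @ in_2, node, M_TOY @ out))


with_copy = contract(copy_tensor(3))
with_projection = contract(projection(3))
print(with_copy, with_projection)
```

Swapping the copy tensor for the projection keeps only one term of the sum over the shared index, which is the mechanism by which a single parameter is singled out in the network.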

In this section, we analyze the BP phenomenon for 4 PQCs with the techniques introduced in section 3.

[Figure 5: Computing the variance with the tensor network. By lemma 3, the variance is rewritten as a tensor network built from the input tensor $\tilde I$, the matrices $M$, copy tensors, the projection $P$, and the Hamiltonian tensor $\tilde H$.]

Consider a hardware-efficient ansatz [2] PQC of the following form.

[Equation (15), circuit diagram omitted: every qubit starts in $|0\rangle$; layers of $R_X$ and $R_Z$ rotations on every qubit are interleaved with CNOT gates, and the block is repeated $L$ times.]

Suppose the circuit has $n$ qubits. By the conversion rules in (4), we get a graph-like ZX-diagram in which all spiders are parameterized.

[ZX-diagram omitted: the block is repeated $L + 1$ times.]

Here, a Z-spider labeled "..." represents a Z-spider with a parameter. By lemma 3, we can remove most Hadamard edges. The remaining part consists of

[ZX-diagram omitted: a block of three parameterized spiders with phases $\alpha$, $\beta$, $\gamma$.]

By techniques similar to those in lemma 3, we can prove the following lemma.

Lemma 4.

Refer to appendix E.

Then we can construct a tensor network similar to figure 5 as follows.

[Equation (16), tensor network omitted: repeated layers of tensors $ET$ connected by matrices $M$, closed by the Hamiltonian tensor $\tilde H_{c_1, \dots, c_n}$.]

If we want to compute the variance $\mathrm{Var}\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right)$, we simply replace the copy tensor corresponding to $\theta_j$ in the above tensor network with the projection $P$. If we denote by $EM$ the tensor $ET$ contracted with copies of $M$ [diagram omitted], then a layer can be represented as a transfer matrix $LT$ built from the tensors $EM$ [diagram omitted], and the whole tensor network in (16) consists of $L + 1$ copies of $LT$ [diagram omitted]. Hence, the variance is

[Equation (17), tensor network omitted: the network of (16) in which the copy tensor corresponding to $\theta_j$ is replaced by $P$, with $L_1$ copies of $LT$ on one side of $P$ and $L_2$ copies on the other.]

We can prove that exactly 2 eigenvalues of the matrix $LT$ are equal to 1 and that the absolute values of the other eigenvalues are less than 1 (for the complete proof, please refer to appendix F). Moreover, the eigenspace $E$ corresponding to the eigenvalue 1 is spanned by two product vectors.

[Equation (18), garbled: $E$ is the span of two vectors of the form $v \otimes \cdots \otimes v$; the single-site vectors are not recoverable.]

Hence, $LT^d$ converges to $P_E$, the projection onto the eigenspace $E$, exponentially fast as $d \to \infty$. If we replace $LT^{L_1}$ and $LT^{L_2}$ with the projection $P_E$, then (17) becomes

[Equation (19), tensor network omitted: $\lim_{L_1, L_2 \to \infty} \mathrm{Var}\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right)$, given by the network of (17) with every power of $LT$ replaced by $P_E$.]

This term is

[Equation (20), garbled: a trace expression in $H$ with a prefactor that decays exponentially in $n$.]

which is exponentially small. Thus, there exist barren plateaus in the hardware-efficient ansatz. More precisely, the variance $\mathrm{Var}\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right)$ is exponentially (in $L$) close to an exponentially small (in $n$) value (20).

Theorem 5. The variance of the gradients in the hardware-efficient ansatz defined in (15) vanishes exponentially as the number of qubits $n$ and the number of layers $L$ grow.

Note that the above analysis can be generalized to any hardware-efficient ansatz whose entangler connects all of the qubits.
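The convergence argument used above (a matrix with eigenvalue 1 of multiplicity two and all other eigenvalues strictly inside the unit disc has powers that approach the projection onto the eigenvalue-1 eigenspace at an exponential rate) can be illustrated on a toy matrix. The matrix below is a randomly rotated diagonal matrix with hand-picked eigenvalues; it is an assumption for illustration and is not the transfer matrix $LT$ of the hardware-efficient ansatz.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random orthonormal basis, obtained from the QR decomposition of a Gaussian matrix.
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))

# Toy spectrum: eigenvalue 1 twice, the others strictly smaller than 1.
eigenvalues = np.array([1.0, 1.0, 0.5, 0.2])
lt_toy = q @ np.diag(eigenvalues) @ q.T

# Projection onto the eigenspace with eigenvalue 1.
p_e = q[:, :2] @ q[:, :2].T

# Spectral-norm distance between the d-th power and the projection.
errors = [np.linalg.norm(np.linalg.matrix_power(lt_toy, d) - p_e, 2)
          for d in range(1, 9)]
print(errors)
```

For this symmetric toy matrix the error after $d$ steps is exactly the $d$-th power of the largest eigenvalue below 1, here $0.5^d$, so the gap between 1 and the next eigenvalue sets the convergence rate.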

The tree tensor network is a special kind of tensor network with a tree structure. Its quantum analog was developed in [36]. In [46], it was proved that the sum of the variances

$$\sum_{j=1}^{n} \mathrm{Var}\left(\frac{\partial\langle H\rangle}{\partial\theta_j}\right)$$

does not vanish exponentially. In this section, we prove that not only the sum of the variances but also the variance for each parameter vanishes only polynomially.

Consider the tree tensor network ansatz on $n$ qubits of the following form.

[Circuit diagram omitted: layers of $R_Y$ rotations and CNOT gates arranged in a tree, with a measurement on the last remaining qubit.]

To analyze the BP phenomenon in this ansatz, we first use the gate decomposition

$$R_Y(\theta) = R_Z\left(\frac{\pi}{2}\right) R_X(\theta)\, R_Z\left(-\frac{\pi}{2}\right)$$

to convert the PQC to a ZX-diagram as follows.

[ZX-diagram omitted.]

The X-spiders with phase "..." are spiders with parameters. The ZX-diagram can be rewritten to a graph-like ZX-diagram as follows.

[ZX-diagram omitted.]

By using the rewriting rule (lc) in [25], we can remove the spiders with phases $\pm\frac{\pi}{2}$.

[ZX-diagram omitted.]

By (11), the building block of the variance of $\frac{\partial\langle H\rangle}{\partial\theta_j}$ is

[ZX-diagram omitted: copies of a block of three parameterized spiders with phases $\pm\alpha$, $\pm\beta$, $\pm\gamma$.]

We can prove (for the complete proof, please refer to appendix F) that, after integration over the parameters $\alpha$, $\beta$, $\gamma$, the building block becomes

[Equation (21), ZX-diagrams omitted: $\frac{1}{(2\pi)^3} \int_{\alpha, \beta, \gamma} (\cdot)\, d\alpha\, d\beta\, d\gamma = \sum_{a, b, c \in \{T_1, T_2, T_3\}} T_{\mathrm{TTN}}^{a, b, c} \cdot (\text{the diagrams } a, b, c)$.]

Here, $T_{\mathrm{TTN}}$ is a rank-3 tensor defined by its three slices $T_{\mathrm{TTN}}[1, \cdot, \cdot]$, $T_{\mathrm{TTN}}[2, \cdot, \cdot]$ and $T_{\mathrm{TTN}}[3, \cdot, \cdot]$, each a $3 \times 3$ matrix with prefactor $\frac{1}{16}$ [matrix entries lost in extraction].

Hence, the variance of $\frac{\partial\langle H\rangle}{\partial\theta_j}$ can be obtained by replacing one of the copy tensors with the projection $P$ in the following tensor network.

[Equation (22), tensor network omitted: a tree of tensors $T_{\mathrm{TTN}}^{a, b, c}$ with the input tensor $\tilde I$ at the leaves and the Hamiltonian tensor $\tilde H$ at the root.]

Now let us analyze this tensor network. Since the Hamiltonian $H$ is a 1-qubit Hermitian operator, it can be expressed as

$$H = k_1 I + k_2 X + k_3 Y + k_4 Z, \quad k_j \in \mathbb{R}.$$

Then $\tilde H$ is a linear combination of three fixed vectors, written as $v$ with one or two subscripts.

[Equations garbled: the expansion of $\tilde H$ in terms of the coefficients $k_j$, and the definitions of the three vectors.]

Note that the building block of this tensor network is a multiple of $T_{\mathrm{TTN}}$. By the definition of $T_{\mathrm{TTN}}$, we have

[Equation (23), garbled: contraction identities for $T_{\mathrm{TTN}}$ applied to these vectors.]

With these equations, the variance can be computed easily. For example, consider the following variance.

[Tensor network omitted: the variance for one parameter of a small tree, with $P$ in place of one copy tensor.]

It is a linear function of $\tilde H$. Hence, we can analyze each term of $\tilde H$ individually. Since $P$ annihilates the vector in the first term of $\tilde H$, the first term contributes 0. Now let us consider the second term. With (23), we can expand the variance as follows.

[Chain of tensor-network equalities omitted.]

Expanding recursively, the variance can be represented as a sum of terms of the following form.

[Equation (24), garbled: $\tilde I(u_1, \dots, u_n)$, the contraction of $\tilde I$ with vectors $u_1, \dots, u_n$, each chosen from two of the vectors above.]

By the definition of $\tilde I$, each term satisfies $\tilde I(u_1, \dots, u_n) \geq 0$. Hence, we get a lower bound.

[Chain of inequalities omitted.]

Similarly, we have a lower bound for the third term of $\tilde H$. For the general case of $n$ qubits, we can prove that the variance has a lower bound.

Theorem 6.

Note that $\tilde I(u_1, \dots, u_n)$ only depends on the input state. If $\tilde I(u_1, \dots, u_n) \in \Omega\left(\frac{1}{\mathrm{poly}(n)}\right)$ or $\tilde I(w_1, \dots, w_n) \in \Omega\left(\frac{1}{\mathrm{poly}(n)}\right)$, then there exists no barren plateau in the tree tensor network ansatz.

The QCNN was developed in [35]. It was proved that there exists no barren plateau in the QCNN ansatz if the sub-blocks are unitary 2-designs [47]. In this section, we use the ZX-calculus to analyze the BP phenomenon in a QCNN ansatz without the assumption of unitary 2-designs.

Consider a QCNN ansatz as follows.

[Circuit diagram omitted: layers of $R_X$ and $R_Z$ rotations and CNOT gates, followed by a measurement.]

It can be represented as the following ZX-diagram.

[ZX-diagram omitted.]

Here, the Z-spiders labeled "..." are parameterized. Note that this is a graph-like ZX-diagram whose spiders are all parameterized. Hence, by using lemma 3, the variance can be obtained by replacing one of the copy tensors with the projection $P$ in the following tensor network.

[Tensor network omitted.]

If we denote the building block of this network by $T_{\mathrm{QCNN}}$, then we have the following equations.

[Equation (26), garbled: contraction identities for $T_{\mathrm{QCNN}}$ applied to the vectors introduced for the tree tensor network ansatz.]

By using (26), we can expand the variance as a sum of terms of the form $\tilde I(u_1, \dots, u_n)$, where each $u_j$ is one of those three vectors. Each of these terms is non-negative. Hence, similarly to the analysis of the tree tensor network ansatz, we can prove that the variance of the gradients in the QCNN ansatz has a lower bound.

Theorem 7.

Hence, if $\tilde{I}(u_1, \ldots, u_n) \in \Omega(1/\mathrm{poly}(n))$ or $\tilde{I}(w_1, \ldots, w_n) \in \Omega(1/\mathrm{poly}(n))$, then there exists no barren plateau in the QCNN ansatz.

4.4 MPS-inspired ansatz

The matrix product state (MPS) is a special tensor network structure that is widely used in quantum physics and machine learning [48, 49]. There are also PQCs with a structure similar to MPS, which we call the MPS-inspired ansatz. It has been shown that the MPS-inspired ansatz can be implemented efficiently on quantum computers with fewer qubits [37]. We analyze the BP phenomenon in the MPS-inspired ansatz in this section.

Let us consider the following MPS-inspired ansatz

[Circuit diagram: MPS-inspired ansatz built from $R_X$ and $R_Z$ rotation gates applied to the input qubits.]

and the Hamiltonian $H = I \otimes I \otimes \cdots \otimes I \otimes X$. We will prove that the variance $\mathrm{Var}\left(\frac{\partial \langle H \rangle}{\partial \theta}\right)$ is exponentially small. Here $\theta$ is the parameter of the first $R_X$ gate applied to the first qubit.

First, we convert the PQC into a ZX-diagram as follows.

[ZX-diagram of the MPS-inspired ansatz.]

This is a graph-like ZX-diagram whose spiders are all parameterized. We can use lemma 3 to represent the variance as the following tensor network.

[Tensor network representing $\mathrm{Var}\left(\frac{\partial \langle H \rangle}{\partial \theta}\right)$, with the projection $P$ in place of one copy tensor.]

By using M v = 1/2 ( v + v − , ), M v , = v , , and the accompanying tensor identities [diagrams], we can simplify the variance to a chain of contractions that evaluates in closed form.

[Equation (28): closed-form value of $\mathrm{Var}\left(\frac{\partial \langle H \rangle}{\partial \theta}\right)$ as a function of $n$.] (28)

It is exponentially small in the number of qubits $n$. Hence, there exist barren plateaus in the MPS-inspired ansatz.

We developed techniques based on the ZX-calculus to analyze the BP phenomenon in training certain quantum neural networks. The quantum neural networks under consideration are PQCs, and the cost function is the expectation $\langle H \rangle$ of the PQC with respect to a given Hamiltonian $H$. The basic idea of the method is to represent the PQC, the cost function $\langle H \rangle$, and the gradients $\frac{\partial \langle H \rangle}{\partial \theta_j}$ as ZX-diagrams. Computing the expectation and the variance of the gradient of $\langle H \rangle$ then becomes computing the integration of certain ZX-diagrams. We show that these integrations are also ZX-diagrams, which can be computed explicitly in many cases.

In principle, these techniques can be applied to any given ansatz under assumption 1. We remark that they can also be used to analyze the BP phenomenon for PQCs which contain $t$-design sub-blocks, for example, the PQCs considered in [20, 47], because the $t$-design sub-blocks can be replaced with concrete $t$-design PQCs, after which the techniques proposed in this paper can be applied. In conclusion, we extend the barren plateaus theorem from unitary 2-design circuits to any parameterized quantum circuits under assumption 1.

Using the techniques proposed in this paper, we analyzed four kinds of ansätze: the hardware-efficient ansatz, the tree tensor network ansatz, the QCNN ansatz, and the MPS-inspired ansatz. It is shown that there exist barren plateaus in the hardware-efficient ansatz and the MPS-inspired ansatz, while there exists no barren plateau in the tree tensor network ansatz and the QCNN ansatz.

Acknowledgment.

This work is partially supported by an NSFC grant No. 11688101 and by an NKRDP grant No. 2018YFA0306702.

References

[1] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J Love, Alán Aspuru-Guzik, and Jeremy L O'Brien. A variational eigenvalue solver on a photonic quantum processor. Nature communications, 5:4213, 2014.
[2] Abhinav Kandala, Antonio Mezzacapo, Kristan Temme, Maika Takita, Markus Brink, Jerry M Chow, and Jay M Gambetta. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature, 549(7671):242–246, 2017.
[3] Yudong Cao, Jonathan Romero, Jonathan P Olson, Matthias Degroote, Peter D Johnson, Mária Kieferová, Ian D Kivlichan, Tim Menke, Borja Peropadre, Nicolas PD Sawaya, et al. Quantum chemistry in the age of quantum computing. Chemical reviews, 119(19):10856–10915, 2019.
[4] Bela Bauer, Sergey Bravyi, Mario Motta, and Garnet Kin Chan. Quantum algorithms for quantum chemistry and quantum materials science. arXiv preprint arXiv:2001.03685, 2020.

[5] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm. arXiv preprint arXiv:1411.4028, 2014.
[6] Leo Zhou, Sheng-Tao Wang, Soonwon Choi, Hannes Pichler, and Mikhail D Lukin. Quantum approximate optimization algorithm: performance, mechanism, and implementation on near-term devices. arXiv preprint arXiv:1812.01041, 2018.
[7] Jin-Guo Liu and Lei Wang. Differentiable learning of quantum circuit born machines. Physical Review A, 98(6):062324, 2018.
[8] Seth Lloyd and Christian Weedbrook. Quantum generative adversarial learning. Physical review letters, 121(4):040502, 2018.
[9] Vojtěch Havlíček, Antonio D Córcoles, Kristan Temme, Aram W Harrow, Abhinav Kandala, Jerry M Chow, and Jay M Gambetta. Supervised learning with quantum-enhanced feature spaces. Nature, 567(7747):209–212, 2019.
[10] Maria Schuld, Alex Bocharov, Krysta M Svore, and Nathan Wiebe. Circuit-centric quantum classifiers. Physical Review A, 101(3):032308, 2020.
[11] Marcello Benedetti, Erika Lloyd, Stefan Sack, and Mattia Fiorentini. Parameterized quantum circuits as machine learning models. Quantum Science and Technology, 4(4):043001, 2019.
[12] Chen Zhao and Xiao-Shan Gao. Qdnn: Dnn with quantum neural network layers. arXiv preprint arXiv:1912.12660, 2019.
[13] John Preskill. Quantum computing in the nisq era and beyond. Quantum, 2:79, 2018.
[14] Maria Schuld, Ville Bergholm, Christian Gogolin, Josh Izaac, and Nathan Killoran. Evaluating analytic gradients on quantum hardware. Physical Review A, 99(3):032331, 2019.
[15] Andrea Mari, Thomas R Bromley, and Nathan Killoran. Estimating the gradient and higher-order derivatives on quantum hardware. Physical Review A, 103(1):012405, 2020.
[16] James Stokes, Josh Izaac, Nathan Killoran, and Giuseppe Carleo. Quantum natural gradient. Quantum, 4:269, 2020.
[17] Ken M Nakanishi, Keisuke Fujii, and Synge Todo. Sequential minimal optimization for quantum-classical hybrid algorithms. Physical Review Research, 2(4):043158, 2020.
[18] Jonas M Kübler, Andrew Arrasmith, Lukasz Cincio, and Patrick J Coles. An adaptive optimizer for measurement-frugal variational algorithms. Quantum, 4:263, 2020.
[19] Jarrod R McClean, Sergio Boixo, Vadim N Smelyanskiy, Ryan Babbush, and Hartmut Neven. Barren plateaus in quantum neural network training landscapes. Nature communications, 9(1):1–6, 2018.
[20] Marco Cerezo, Akira Sone, Tyler Volkoff, Lukasz Cincio, and Patrick J Coles. Cost-function-dependent barren plateaus in shallow quantum neural networks. arXiv preprint arXiv:2001.00550, 2020.
[21] Carlos Ortiz Marrero, Mária Kieferová, and Nathan Wiebe. Entanglement induced barren plateaus. arXiv preprint arXiv:2010.15968, 2020.
[22] Samson Wang, Enrico Fontana, Marco Cerezo, Kunal Sharma, Akira Sone, Lukasz Cincio, and Patrick J Coles. Noise-induced barren plateaus in variational quantum algorithms. arXiv preprint arXiv:2007.14384, 2020.
[23] Bob Coecke and Ross Duncan. Interacting quantum observables. In Proceedings of the 37th International Colloquium on Automata, Languages and Programming (ICALP), Lecture Notes in Computer Science, 2008. DOI: 10.1007/978-3-540-70583-3_25.
[24] Bob Coecke and Ross Duncan. Interacting quantum observables: categorical algebra and diagrammatics. New Journal of Physics, 13:043016, 2011. DOI: 10.1088/1367-2630/13/4/043016.
[25] Ross Duncan, Aleks Kissinger, Simon Perdrix, and John van de Wetering. Graph-theoretic Simplification of Quantum Circuits with the ZX-calculus. Quantum, 4:279, 6 2020. ISSN 2521-327X. DOI: 10.22331/q-2020-06-04-279.
[26] Aleks Kissinger and John van de Wetering. Reducing T-count with the ZX-calculus. Physical Review A, 102:022406, 8 2020. DOI: 10.1103/PhysRevA.102.022406.
[27] Alexander Cowtan, Silas Dilkes, Ross Duncan, Will Simmons, and Seyon Sivarajah. Phase Gadget Synthesis for Shallow Circuits. In Bob Coecke and Matthew Leifer, editors, Proceedings 16th International Conference on Quantum Physics and Logic, Chapman University, Orange, CA, USA, 10-14 June 2019, volume 318 of Electronic Proceedings in Theoretical Computer Science, pages 213–228. Open Publishing Association, 2020. DOI: 10.4204/EPTCS.318.13.

[28] Michael Hanks, Marta P. Estarellas, William J. Munro, and Kae Nemoto. Effective Compression of Quantum Braided Circuits Aided by ZX-Calculus. Physical Review X, 10:041030, 2020. DOI: 10.1103/PhysRevX.10.041030.
[29] Ross Duncan. A graphical approach to measurement-based quantum computing. In Chris Heunen, Mehrnoosh Sadrzadeh, and Edward Grefenstette, editors, Quantum Physics and Linguistics: A Compositional, Diagrammatic Discourse. 2013. ISBN 9780199646296. DOI: 10.1093/acprof:oso/9780199646296.001.0001.
[30] Miriam Backens, Hector Miller-Bakewell, Giovanni de Felice, Leo Lobski, and John van de Wetering. There and back again: A circuit extraction tale. arXiv preprint arXiv:2003.01664, 2020.
[31] Nicholas Chancellor, Aleks Kissinger, Joschka Roffe, Stefan Zohren, and Dominic Horsman. Graphical Structures for Design and Verification of Quantum Error Correction. arXiv preprint arXiv:1611.08012, 2016.
[32] Niel de Beaudrap and Dominic Horsman. The ZX calculus is a language for surface code lattice surgery. Quantum, 4, 2020. DOI: 10.22331/q-2020-01-09-218.
[33] Richard D. P. East, John van de Wetering, Nicholas Chancellor, and Adolfo G. Grushin. AKLT-states as ZX-diagrams: diagrammatic reasoning for quantum states. arXiv preprint arXiv:2012.01219, 2020.
[34] Bob Coecke, Giovanni de Felice, Konstantinos Meichanetzidis, and Alexis Toumi. Foundations for Near-Term Quantum Natural Language Processing. arXiv preprint arXiv:2012.03755, 2020.
[35] Iris Cong, Soonwon Choi, and Mikhail D Lukin. Quantum convolutional neural networks. Nature Physics, 15(12):1273–1278, 2019.
[36] Edward Grant, Marcello Benedetti, Shuxiang Cao, Andrew Hallam, Joshua Lockhart, Vid Stojevic, Andrew G Green, and Simone Severini. Hierarchical quantum classifiers. npj Quantum Information, 4(1):1–8, 2018.
[37] Jin-Guo Liu, Yi-Hong Zhang, Yuan Wan, and Lei Wang. Variational quantum eigensolver with fewer qubits. Phys. Rev. Research, 1:023025, Sep 2019. DOI: 10.1103/PhysRevResearch.1.023025. URL https://link.aps.org/doi/10.1103/PhysRevResearch.1.023025.
[38] Bob Coecke and Aleks Kissinger. Picturing Quantum Processes. Cambridge University Press, 2017. DOI: 10.1007/978-3-319-91376-6_6.
[39] John van de Wetering. ZX-calculus for the working quantum computer scientist. arXiv preprint arXiv:2012.13966, 2020.
[40] Miriam Backens. The ZX-calculus is complete for stabilizer quantum mechanics. New Journal of Physics, 16(9):093021, 2014. DOI: 10.1088/1367-2630/16/9/093021.
[41] Miriam Backens. Making the stabilizer ZX-calculus complete for scalars. In Chris Heunen, Peter Selinger, and Jamie Vicary, editors, Proceedings of the 12th International Workshop on Quantum Physics and Logic (QPL 2015), volume 195 of Electronic Proceedings in Theoretical Computer Science, pages 17–32, 2015. DOI: 10.4204/EPTCS.195.2.
[42] Emmanuel Jeandel, Simon Perdrix, and Renaud Vilmart. A Complete Axiomatisation of the ZX-Calculus for Clifford+T Quantum Mechanics. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS '18, pages 559–568, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5583-4. DOI: 10.1145/3209108.3209131.
[43] Emmanuel Jeandel, Simon Perdrix, and Renaud Vilmart. Diagrammatic Reasoning Beyond Clifford+T Quantum Mechanics. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS '18, pages 569–578, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5583-4. DOI: 10.1145/3209108.3209139.
[44] Quanlong Wang. Completeness of the ZX-calculus. PhD thesis, University of Oxford, 2018.
[45] Emmanuel Jeandel, Simon Perdrix, and Renaud Vilmart. Completeness of the zx-calculus. Logical Methods in Computer Science, 6 2020. DOI: 10.23638/LMCS-16(2:11)2020.
[46] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, and Dacheng Tao. Toward trainability of quantum neural networks. arXiv preprint arXiv:2011.06258, 2020.
[47] Arthur Pesah, M Cerezo, Samson Wang, Tyler Volkoff, Andrew T Sornborger, and Patrick J Coles. Absence of barren plateaus in quantum convolutional neural networks. arXiv preprint arXiv:2011.02966, 2020.

[48] F. Verstraete, V. Murg, and J.I. Cirac. Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems. Advances in Physics, 57(2):143–224, 2008. DOI: 10.1080/14789940801912366. URL https://doi.org/10.1080/14789940801912366.
[49] Zhao-Yu Han, Jun Wang, Heng Fan, Lei Wang, and Pan Zhang. Unsupervised generative modeling using matrix product states. Phys. Rev. X, 8:031012, Jul 2018. DOI: 10.1103/PhysRevX.8.031012. URL https://link.aps.org/doi/10.1103/PhysRevX.8.031012.

A Proof of theorem 2

Theorem 2.

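Consider the spiders corresponding to $\theta_j$. We expand the pair of spiders with phases $\theta_j$ and $-\theta_j$ (each with $m$ inputs and $n$ outputs) as follows.

$$[\text{spiders } \theta_j, -\theta_j] = |0\rangle^{\otimes n}\langle 0|^{\otimes m} \otimes |0\rangle^{\otimes n}\langle 0|^{\otimes m} + e^{i\theta_j}\, |1\rangle^{\otimes n}\langle 1|^{\otimes m} \otimes |0\rangle^{\otimes n}\langle 0|^{\otimes m} + e^{-i\theta_j}\, |0\rangle^{\otimes n}\langle 0|^{\otimes m} \otimes |1\rangle^{\otimes n}\langle 1|^{\otimes m} + |1\rangle^{\otimes n}\langle 1|^{\otimes m} \otimes |1\rangle^{\otimes n}\langle 1|^{\otimes m}.$$

We take the partial derivative with respect to $\theta_j$ on both sides and get

$$\frac{\partial}{\partial \theta_j}[\text{spiders } \theta_j, -\theta_j] = i e^{i\theta_j}\, |1\rangle^{\otimes n}\langle 1|^{\otimes m} \otimes |0\rangle^{\otimes n}\langle 0|^{\otimes m} - i e^{-i\theta_j}\, |0\rangle^{\otimes n}\langle 0|^{\otimes m} \otimes |1\rangle^{\otimes n}\langle 1|^{\otimes m}$$

$$= e^{i(\theta_j + \pi/2)}\, |1\rangle^{\otimes n}\langle 1|^{\otimes m} \otimes |0\rangle^{\otimes n}\langle 0|^{\otimes m} + e^{-i(\theta_j + \pi/2)}\, |0\rangle^{\otimes n}\langle 0|^{\otimes m} \otimes |1\rangle^{\otimes n}\langle 1|^{\otimes m},$$

which is represented by the following ZX-diagram.

[ZX-diagram with spider phases $\theta_j + \pi/2$ and $-\theta_j - \pi/2$.]

B Proof of lemma 1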
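The identity proved in this appendix says that averaging a Z-spider of phase α, tensored with a Z-spider of phase −α, over α in [−π, π) keeps only the two terms in which both spiders carry the same basis label. The following is a minimal numerical sanity check, assuming the standard matrix form of a Z-spider, |0…0⟩⟨0…0| + e^{iα}|1…1⟩⟨1…1|. The helper names `z_spider`, `averaged_pair`, and `expected_pair` are illustrative and do not come from the paper.

```python
import numpy as np

def z_spider(alpha, n_out, n_in):
    """Z-spider matrix |0..0><0..0| + e^{i*alpha} |1..1><1..1| (assumed standard form)."""
    mat = np.zeros((2 ** n_out, 2 ** n_in), dtype=complex)
    mat[0, 0] += 1.0
    mat[-1, -1] += np.exp(1j * alpha)
    return mat

def averaged_pair(n_out, n_in, samples=64):
    """Average spider(alpha) (x) spider(-alpha) over a uniform grid on [-pi, pi).

    A uniform grid integrates e^{i*k*alpha} exactly for 0 < |k| < samples,
    so the grid average equals (1 / 2pi) times the integral over alpha here.
    """
    alphas = -np.pi + 2.0 * np.pi * np.arange(samples) / samples
    total = np.zeros((2 ** (n_out + n_in), 2 ** (n_in + n_out)), dtype=complex)
    for alpha in alphas:
        total += np.kron(z_spider(alpha, n_out, n_in),
                         z_spider(-alpha, n_in, n_out))
    return total / samples

def expected_pair(n_out, n_in):
    """The two surviving terms: both spiders in the 0 branch, or both in the 1 branch."""
    p0 = np.zeros((2 ** n_out, 2 ** n_in)); p0[0, 0] = 1.0
    p1 = np.zeros((2 ** n_out, 2 ** n_in)); p1[-1, -1] = 1.0
    q0 = np.zeros((2 ** n_in, 2 ** n_out)); q0[0, 0] = 1.0
    q1 = np.zeros((2 ** n_in, 2 ** n_out)); q1[-1, -1] = 1.0
    return np.kron(p0, q0) + np.kron(p1, q1)
```

For example, `np.allclose(averaged_pair(2, 1), expected_pair(2, 1))` compares the averaged operator with the two-term sum for a spider with one input and two outputs.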

Lemma 1.

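The following equation holds.

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} [\text{spiders } \alpha, -\alpha]\, d\alpha = [\text{ZX-diagram}].$$

Proof. We expand the spiders as follows.

$$[\text{spiders } \alpha, -\alpha] = |0\rangle^{\otimes n}\langle 0|^{\otimes m} \otimes |0\rangle^{\otimes m}\langle 0|^{\otimes n} + e^{i\alpha}\, |1\rangle^{\otimes n}\langle 1|^{\otimes m} \otimes |0\rangle^{\otimes m}\langle 0|^{\otimes n} + e^{-i\alpha}\, |0\rangle^{\otimes n}\langle 0|^{\otimes m} \otimes |1\rangle^{\otimes m}\langle 1|^{\otimes n} + |1\rangle^{\otimes n}\langle 1|^{\otimes m} \otimes |1\rangle^{\otimes m}\langle 1|^{\otimes n}.$$

By $\int_{-\pi}^{\pi} e^{ik\alpha}\, d\alpha = 0$ for $k = \pm 1$, we have

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} [\text{spiders } \alpha, -\alpha]\, d\alpha = |0\rangle^{\otimes n}\langle 0|^{\otimes m} \otimes |0\rangle^{\otimes m}\langle 0|^{\otimes n} + |1\rangle^{\otimes n}\langle 1|^{\otimes m} \otimes |1\rangle^{\otimes m}\langle 1|^{\otimes n}.$$

C Proof of lemma 2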
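The proof in this appendix expands four spiders into 16 terms and keeps those whose total phase factor is independent of α. The bookkeeping can be sketched as follows, under the assumption (read off from the order of the phase factors in the expansion) that the four spiders carry phases α, −α, −α, α. The names `SIGNS`, `surviving_terms`, and `mean_phase` are hypothetical helpers, not notation from the paper.

```python
from itertools import product

import numpy as np

# Assumed phase signs of the four spiders: (+alpha, -alpha, -alpha, +alpha).
SIGNS = (1, -1, -1, 1)

def net_frequency(bits):
    """Integer k such that the term for these branch labels carries e^{i*k*alpha}."""
    return sum(sign * bit for sign, bit in zip(SIGNS, bits))

def surviving_terms():
    """Branch labels (b1, b2, b3, b4) whose phase factor does not depend on alpha."""
    return [bits for bits in product((0, 1), repeat=4) if net_frequency(bits) == 0]

def mean_phase(k, samples=64):
    """Uniform-grid average of e^{i*k*alpha} over one period (exact for |k| < samples)."""
    alphas = 2.0 * np.pi * np.arange(samples) / samples
    return np.mean(np.exp(1j * k * alphas))
```

Calling `surviving_terms()` lists the branch labels that remain after integration, and `mean_phase(k)` confirms numerically that the factors with k = ±1, ±2 average to zero while k = 0 averages to one.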

Lemma 2.

We expand each spider on the left hand side of the equation. Writing $A_b = |b\rangle^{\otimes n}\langle b|^{\otimes m}$ and $B_b = |b\rangle^{\otimes m}\langle b|^{\otimes n}$ for $b \in \{0, 1\}$, the four spiders with phases $\alpha, -\alpha, -\alpha, \alpha$ expand into 16 terms:

$$[\text{spiders } \alpha, -\alpha, -\alpha, \alpha] = \sum_{b_1, b_2, b_3, b_4 \in \{0,1\}} e^{i(b_1 - b_2 - b_3 + b_4)\alpha}\; A_{b_1} \otimes B_{b_2} \otimes A_{b_3} \otimes B_{b_4}.$$

The phase factors that occur are $1$, $e^{\pm i\alpha}$ and $e^{\pm 2i\alpha}$. Since $\int_{-\pi}^{\pi} e^{ik\alpha}\, d\alpha = 0$ for $k = \pm 1, \pm 2$, we integrate over $\alpha$ on each side and obtain the six terms with $b_1 + b_4 = b_2 + b_3$:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} [\text{spiders } \alpha, -\alpha, -\alpha, \alpha]\, d\alpha = A_0 \otimes B_0 \otimes A_0 \otimes B_0 + A_0 \otimes B_0 \otimes A_1 \otimes B_1 + A_0 \otimes B_1 \otimes A_0 \otimes B_1 + A_1 \otimes B_0 \otimes A_1 \otimes B_0 + A_1 \otimes B_1 \otimes A_0 \otimes B_0 + A_1 \otimes B_1 \otimes A_1 \otimes B_1,$$

which equals the following sum of three ZX-diagrams.

[Sum of three ZX-diagrams, two of which contain spiders with phase $\pi$.]

D Proof of lemma 3

Lemma 3.

By lemma 2, we have

[Equation: $\frac{1}{(2\pi)^2}\int_{\theta_j}\int_{\theta_k}$ of the diagram containing the spiders with phases $\pm\theta_j$ and $\pm\theta_k$, $d\theta_j\, d\theta_k$, equals $\sum_{a_j, a_k \in \{T_1, T_2, T_3\}}$ of the diagram with $a_j$ and $a_k$ inserted in their place.]

Hence, it is sufficient to prove

[Equation: the diagram with $a_j$ and $a_k$ inserted equals $M_{a_j, a_k}$ times the simplified diagram.]

for $a_j, a_k \in \{T_1, T_2, T_3\}$.

For a_j = a_k = T , we have

[ZX-diagram derivation using rules (f), (hopf), (f).] (29)

For ( a_j , a_k ) ∉ { ( T , T ) , ( T , T ) }, the proof is almost the same as that of (29).

Now, let us consider the case when ( a_j , a_k ) = ( T , T ). We can use the rules in figure 4 as follows.

[ZX-diagram derivation using rules (f), (π), (h), (i), (hopf).]

E Proof of lemma 4

Lemma 4.

The following equation holds.

[Equation: $\frac{1}{(2\pi)^3}\int_{\theta_1, \theta_2, \theta_3}$ of the diagram containing the spiders with phases $\pm\theta_1, \pm\theta_2, \pm\theta_3$, $d\theta_1\, d\theta_2\, d\theta_3$, equals $\sum_{a_1, a_2, a_3 \in \{T_1, T_2, T_3\}} ET_{a_1, a_2, a_3}\, \cdot$ (the diagram with $a_1, a_2, a_3$ inserted).]

Here $ET_{a_1, a_2, a_3}$ can be regarded as a $3 \times 3 \times 3$ tensor whose slices $ET[1, \cdot, \cdot]$, $ET[2, \cdot, \cdot]$ and $ET[3, \cdot, \cdot]$ are each $\frac{1}{8}$ times a $3 \times 3$ matrix.

[Matrices for $ET[1, \cdot, \cdot]$, $ET[2, \cdot, \cdot]$, $ET[3, \cdot, \cdot]$.]

Proof. By lemma 2, it is sufficient to prove that

[Equation: the diagram with $a_1, a_2, a_3$ inserted equals $ET_{a_1, a_2, a_3}$ times the simplified diagram.]

for $a_1, a_2, a_3 \in \{T_1, T_2, T_3\}$.

We first consider the first case for $a_1$, which gives the slice $ET[1, \cdot, \cdot]$. For this choice of $a_1$, we have

[ZX-diagram derivation using rules (f), (h), (b).]

We then treat the three choices of $a_2$ in turn. For each choice, the diagram is first simplified with rules (f), (hopf), (i), (π) and (h), and the three choices of $a_3$ are then evaluated; several of them give 0 by rules (f), (h) and (hopf). Each choice of $a_2$ yields one row $ET[1, a_2, \cdot]$.

[ZX-diagram derivations for the three choices of $a_2$ and the resulting rows.]

By now, we have proved the value of $ET[1, \cdot, \cdot]$ stated in the lemma. (30)

Now, let us consider the second case for $a_1$. For this choice of $a_1$, we have

[ZX-diagram derivation using rules (f), (h), (b), with additional spiders of phase $\pi$.]

Again we treat the three choices of $a_2$ in turn, simplifying with rules (f), (hopf), (i), (π) and (h), and then evaluate the three choices of $a_3$.

[ZX-diagram derivations for the three choices of $a_2$ and the resulting rows.]

By now, we have proved the value of $ET[2, \cdot, \cdot]$ stated in the lemma. (31)

Finally, let us consider the third case for $a_1$. In this case, we have

[ZX-diagram derivation using rules (f), (h), (b).]

According to the symmetry, we can use the result of an earlier case, and we get the value of $ET[3, \cdot, \cdot]$ stated in the lemma. (32)

F Analysis of PQCs in section 4

F.1 Hardware-eﬃcient ansatz

We will prove some properties of LT which are used in the analysis in section 4.1.

Theorem 8.

Suppose that $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of LT, with $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. Then we have $\lambda_1 = \lambda_2 = 1$ and $|\lambda_j| < 1$ for $j > 2$.

Proof. By the definition of EM, we can compute the matrix EM explicitly.

[Equation (33): the matrix EM, whose non-zero entries are $\pm\frac{1}{4}$ and $\frac{3}{4}$.] (33)

By computation, EM can be diagonalized. Four of its eigenvalues are 1 and the other eigenvalues are in the interval $(-1, 1)$. The eigenspace corresponding to the eigenvalue 1 is spanned by

{ v ⊗ v , , v ⊗ v , , v , ⊗ v , , v , ⊗ v , }, (34)

where v , v , and v , are fixed vectors in $\mathbb{R}^3$.

LT is an operator on the tensor product of $n$ copies of $\mathbb{R}^3$. We denote the operator EM acting on the $i$-th and $j$-th copies of $\mathbb{R}^3$ as $EM_{i,j}$. Then the eigenspace of LT corresponding to the eigenvalue 1 is the intersection of the eigenspaces corresponding to the eigenvalue 1 of $EM_{1,2}, EM_{2,3}, \ldots, EM_{n-1,n}, EM_{n,1}$. Hence, by eq. (34), we have

E = span { v , ⊗ · · · ⊗ v , , v , ⊗ · · · ⊗ v , }.

F.2 Tree tensor network ansatz

In section 4.2, we used (21). Here we will prove this equation.

Proof of (21). By lemma 2, we only need to prove that

[Equation: the diagram with $a$, $b$, $c$ inserted equals $T_{\mathrm{TTN}}^{a,b,c}$ times the simplified diagram.]

Similar to the proof of lemma 4, we treat the three choices of $a$ in turn.

For the first choice of $a$, we have

[ZX-diagram derivation using rules (f), (hopf), (i).]

Now, $a$ is disconnected from $b$ and $c$. Hence, we can consider $b$ and $c$ individually. For each of the three choices of $b$, the diagram is simplified with rules (f), (hopf), (i), (π) and (h), and the three choices of $c$ are then evaluated. Each choice of $b$ gives one row $T_{\mathrm{TTN}}[1, b, \cdot]$, equal to $\frac{1}{16}$ times a row vector.

[ZX-diagram derivations for the three choices of $b$ and the resulting rows.]

By now, we have proved the value of $T_{\mathrm{TTN}}[1, \cdot, \cdot]$, which is $\frac{1}{16}$ times a $3 \times 3$ matrix. (35)

Now, let us consider the second choice of $a$. In this case, we have

[ZX-diagram derivation using rules (f), (hopf), (π), (h), with additional spiders of phase $\pi$.]

Now, $a$ is disconnected from $b$ and $c$. Hence, we can consider $b$ and $c$ individually. For each of the three choices of $b$, the diagram is simplified with rules (h), (π), (i), (f) and (hopf), and the three choices of $c$ are then evaluated. Each choice of $b$ gives one row $T_{\mathrm{TTN}}[2, b, \cdot]$; some of these rows contain negative entries.

[ZX-diagram derivations for the three choices of $b$ and the resulting rows.]

By now, we have proved the value of $T_{\mathrm{TTN}}[2, \cdot, \cdot]$, which is $\frac{1}{16}$ times a $3 \times 3$ matrix with some negative entries. (36)

Now, let us consider the third choice of $a$. In this case, we have

[ZX-diagram derivation using rules (f), (hopf), (i), with spiders of phase $\pi$.]

Now, $a$ is disconnected from $b$ and $c$, and the part containing $b$ and $c$ is the same as in an earlier case. Hence, we obtain the value of $T_{\mathrm{TTN}}[3, \cdot, \cdot]$, which is $\frac{1}{16}$ times a $3 \times 3$ matrix. (37)

Now we are going to prove that there is a lower bound for the variance of gradients in the tree tensor network ansatz.

Theorem 6.

For the tree tensor network ansatz shown in section 4.2, if H = k I + k X + k Y + k Z, then we have

Var(∂⟨H⟩/∂θ) ≥ k + k n Ĩ(u_1, . . . , u_n) + k n Ĩ(w_1, . . . , w_n), (25)

for some u_j, w_j ∈ { v , , v , v − , }. Here $\tilde{I}(u_1, \ldots, u_n)$ is defined in (24), and $\tilde{I}$ is an $n$-dimensional tensor which only depends on the input state. If the input state is $\rho$, then $\tilde{I}$ is defined as follows.

[Tensor network: $\tilde{I}_{a_1, \ldots, a_n}$ is built from two copies of $\rho$, spiders with phases $\pm\pi$, and the tensors $T_{a_1}, T_{a_2}, \ldots, T_{a_n}$, where $a_j \in \{1, 2, 3\}$.]

Proof. We first prove that $\tilde{I}(u_1, \ldots, u_n) \ge 0$ for u_j ∈ { v , v , , v − , }, where v , v , and v − , are fixed vectors in $\mathbb{R}^3$.

Note that if u is v , or v − , , then the corresponding leg of $\tilde{I}$ is contracted with a sum T + T or a difference T − T :

[Tensor networks for Ĩ( v , , . . . ) and Ĩ( v − , , . . . ).]

The combination T + T is a sum of four computational-basis terms that factorizes as ( |·⟩⟨·| + |·⟩⟨·| ) ⊗ ( |·⟩⟨·| + |·⟩⟨·| ), the combination T − T is the corresponding signed sum that factorizes as ( |·⟩⟨·| − |·⟩⟨·| ) ⊗ ( |·⟩⟨·| − |·⟩⟨·| ), and the remaining T is a sum of two product terms. Using these identities, we can expand $\tilde{I}(u_1, \ldots, u_n)$ as a sum of squares. Thus, we have proved that $\tilde{I}(u_1, \ldots, u_n) \ge 0$.

Now let us consider the lower bound of $\mathrm{Var}\left(\frac{\partial \langle H \rangle}{\partial \theta}\right)$. From the graph-like ZX-diagram, we have $\tilde{H} = (\tilde{H}_1, \tilde{H}_2, \tilde{H}_3)^T$, which is defined as follows.

[Tensor network: $\tilde{H}_j$ is the contraction of two copies of $H$ with $T_j$ and spiders with phases $\pm\pi$, for $j = 1, 2, 3$.]

With H = k I + k X + k Y + k Z, we can get

˜H = 2 k v , + 2( k + k ) v + 2 k v − , . (38)

By the definition of $T_{\mathrm{TTN}}$, we can get (23). Hence, we can expand the variance as

$$\mathrm{Var}\left(\frac{\partial \langle H \rangle}{\partial \theta}\right) = \sum_{u_j} a(u_1, \ldots, u_n) \cdot \tilde{I}(u_1, \ldots, u_n),$$

where each u_j ranges over { v , v , , v − , } and $a(u_1, \ldots, u_n)$ is a non-negative number. We will prove that there exists one choice of $(u_1, \ldots, u_n)$ such that $a(u_1, \ldots, u_n) \in \Omega(1/\mathrm{poly}(n))$. When we use (23) to expand the variance, the only case that causes a coefficient smaller than 1 is the use of the second equation. Hence, $a(u_1, \ldots, u_n)$ only depends on the number of times that we use the second equation, which in turn depends on the location of $\theta$. Note that the worst case is that $\theta$ is the parameter of the first gate applied to the first qubit. In this case, we need to use the second equation in (23) on the order of $2\log(n)$ times. Hence, we have

Var(∂⟨H⟩/∂θ) ≥ k + k n Ĩ(u_1, . . . , u_n) + k n Ĩ(w_1, . . . , w_n),

for some u_j, w_j ∈ { v , , v , v − , }.

Note that $\tilde{I}(u_1, \ldots, u_n)$ only depends on the input state $\rho$. If $\tilde{I}(u_1, \ldots, u_n) \in \Omega(1/\mathrm{poly}(n))$ or $\tilde{I}(w_1, \ldots, w_n) \in \Omega(1/\mathrm{poly}(n))$, then there exists no barren plateau in the tree tensor network ansatz.

F.3 QCNN

Theorem 7.

For the QCNN ansatz shown in section 4.3, if H = k I + k X + k Y + k Z, then we have

Var(∂⟨H⟩/∂θ) ≥ k + k n Ĩ(u_1, …, u_n) + k n Ĩ(w_1, …, w_n), (27)

for some u_j, w_j ∈ { v , , v , v − , }. Here Ĩ is an n-dimensional tensor which depends only on the input state ρ:

Ĩ_{a_1, …, a_n} = [ZX-diagram: two copies of ρ and T_{a_1}, T_{a_2}, …, T_{a_n}],

where a_j ∈ {1, 2, 3}.

Proof. The proof is similar to that of the tree tensor network ansatz (Theorem 6), so we only give a sketch. We also have

Ĩ(u_1, …, u_n) ≥ 0, u_j ∈ { v , , v , v − , }.

From the graph-like ZX-diagram, we have

H̃_j = [ZX-diagram with H, T_j, and H], for j = 1, 2, 3.

Hence,

H̃ = 2 k v , + 2(k + k) v + 2 k v − , .

We will analyze each term of H̃ in the variance. Using (26), the term 2 k v , will become 0 by P v , = 0. The terms 2(k + k) v and 2 k v − , will generate terms containing v after expanding using (26). Each time terms containing v are generated, the variance is multiplied by a coefficient ≥ […]. Hence, if we want to bring v to P, we need to generate terms containing v l times, where l is the length of a path from the location of v to the location of P.

By the structure of the QCNN ansatz, we have l ≥ […]. It will generate a coefficient ≥ […]. Hence we have

Var(∂⟨H⟩/∂θ) ≥ k + k n Ĩ(u_1, …, u_n) + k n Ĩ(w_1, …, w_n),

for some u_j, w_j ∈ { v , , v , v − , }.
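Both proofs above rest on the same counting argument: a constant factor is picked up once per expansion step, and the number of steps is logarithmic in n. The following Python sketch illustrates why this yields a coefficient in Ω(1/poly(n)). It assumes, purely for illustration, a per-step factor c in (0, 1), a step count of a · log2(n), and base-2 logarithms; the names c, a, `coeff_after_steps`, `log_depth_coeff`, and `poly_form` are hypothetical and are not the constants or notation of Theorems 6 and 7.

```python
import math


def coeff_after_steps(c: float, steps: float) -> float:
    """Coefficient accumulated after `steps` expansions, each contributing a factor c."""
    return c ** steps


def log_depth_coeff(c: float, a: float, n: int) -> float:
    """Accumulated coefficient when the number of steps is a * log2(n) (assumed form)."""
    return coeff_after_steps(c, a * math.log2(n))


def poly_form(c: float, a: float, n: int) -> float:
    """Closed form of the same quantity: c**(a*log2(n)) == n**(a*log2(c))."""
    return n ** (a * math.log2(c))


if __name__ == "__main__":
    c, a = 0.5, 2.0  # hypothetical values chosen only for illustration
    for n in (4, 16, 64, 256):
        log_case = log_depth_coeff(c, a, n)   # logarithmically many steps
        lin_case = coeff_after_steps(c, n)    # linearly many steps, for contrast
        print(f"n={n:4d}  log-depth={log_case:.3e}  "
              f"closed-form={poly_form(c, a, n):.3e}  linear-depth={lin_case:.3e}")
```

With c = 0.5 and a = 2 the logarithmic case equals n^(-2), a polynomial decay, while a step count linear in n gives 2^(-n), the exponential decay characteristic of a barren plateau.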