Classical restrictions of generic matrix product states are quasi-locally Gibbsian
Yaiza Aragonés-Soria, Johan Åberg, Chae-Yeun Park, Michael J. Kastoryano
Institute for Theoretical Physics, University of Cologne, Germany; Amazon Quantum Solutions Lab, Seattle, Washington 98170, USA; AWS Center for Quantum Computing, Pasadena, California 91125, USA
October 23, 2020
Abstract
We show that the norm-squared amplitudes with respect to a local orthonormal basis (the classical restriction) of finite quantum systems on one-dimensional lattices can be exponentially well approximated by Gibbs states of local Hamiltonians (i.e., are quasi-locally Gibbsian) if the classical conditional mutual information (CMI) of any connected tripartition of the lattice is rapidly decaying in the width of the middle region. For injective matrix product states, we moreover show that the classical CMI decays exponentially whenever the collection of matrix product operators satisfies a ‘purity condition’, a notion previously established in the theory of random matrix products. We furthermore show that violations of the purity condition enable a generalized notion of error correction on the virtual space, thus indicating the non-generic nature of such violations. The proof of our main result makes extensive use of the theory of random matrix products, and may find applications elsewhere.
Considerable effort has been devoted to understanding the entanglement properties of many-body quantum states. For finite one-dimensional lattice systems, the theory of Matrix Product States (MPSs) provides a complete framework for describing entanglement of gapped many-body systems [1], and allows for efficient high-precision simulations via the DMRG algorithm [2, 3, 4]. Similarly impressive degrees of numerical precision can be reached in other settings, such as disordered systems [5], open systems [6], time evolution [7], or critical systems [8]. The success of these simulation methods can be traced back to the accurate parametrization of entanglement in MPSs. There exist extensions to lattices of higher dimensions (projected entangled pair states), but these have been far less useful for simulations, due to their extensive entanglement growth.

In contrast, quantum Monte Carlo simulations are largely based on heuristic assumptions on the weights and phases of the underlying state. Indeed, if the system under study can be cast in a form with only positive weights, then Monte Carlo methods often work well, though convergence guarantees are only known in very special cases [9]. This success is in turn believed to be due to the local Gibbsian nature of the classical restriction of the state. Classical Monte Carlo sampling is known to converge rapidly for Ising-type problems [10, 11], while quantum variational Monte Carlo is often successful when using a locally restricted Gibbs Ansatz, such as the Jastrow Ansatz. Further evidence of the importance of locality in the Ansatz wavefunction has been observed for more expressive Ansätze, such as the complex Restricted Boltzmann machine [12], where the activations naturally preserve locality in many cases.
Hence, whereas tensor network states explicitly encode the local entanglement structure in their construction, quantum (variational) Monte Carlo implicitly invokes locality through the pervasive Gibbsian nature of probability distributions.

Here, we connect these two pictures by showing that generic injective MPSs [13] have classical restrictions that are quasi-locally Gibbsian. More precisely, we refer to a probability distribution as locally Gibbsian if it can be written as the equilibrium distribution of a local Hamiltonian, i.e., of a sum of terms that each span at most $\ell$ adjacent sites. Well-known examples include the Ising and Potts models. We similarly say that a distribution is quasi-locally Gibbsian if it can be approximated by Gibbs distributions corresponding to local Hamiltonians $h^{\ell}$, where the error of the approximation in some sense decays exponentially with increasing $\ell$. Such notions appear in various guises in the literature, e.g., [14], which requires that the coefficients in the cluster expansion of $\log(p)$ decay rapidly with the order of the cluster.

As the first step towards proving the generic quasi-local Gibbs property of injective MPSs, we show (in Section 4) that probability distributions on a one-dimensional lattice with open boundary conditions are quasi-locally Gibbsian if the Conditional Mutual Information (CMI) between any tripartition of the lattice decays rapidly in the width of the middle region. The stronger the decay of the CMI, the more local the Gibbs distribution. In the case of zero correlation length, the distribution is (strictly) locally Gibbs [15]. A number of recent studies in quantum information theory have revealed connections between the CMI and the Gibbsian nature of density matrices. In Ref. [16], the authors show that the quantum CMI of a full-rank density matrix on a one-dimensional lattice is small if and only if the state is Gibbsian.
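As a minimal sketch of the locality notions above (an illustration, not taken from the paper), the following snippet builds the Gibbs distribution of a classical nearest-neighbour Ising chain, which is strictly locally Gibbsian, and evaluates its classical CMI $I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC)$. The coupling `J`, field `h`, inverse temperature `beta` and chain length `n` are arbitrary illustrative choices. The CMI vanishes when $B$ separates $A$ from $C$ (the Markov property of a nearest-neighbour chain), even though $A$ and $C$ are correlated; this is the easy direction of the CMI-Gibbs correspondence described here.

```python
import itertools
import math

# Illustrative parameters (assumed, not from the paper).
J, h, beta, n = 1.0, 0.3, 0.7, 5

def energy(s):
    """Nearest-neighbour Ising energy of a configuration s in {-1, +1}^n."""
    bonds = sum(s[i] * s[i + 1] for i in range(len(s) - 1))
    return -J * bonds - h * sum(s)

configs = list(itertools.product((-1, 1), repeat=n))
weights = [math.exp(-beta * energy(s)) for s in configs]
Z = sum(weights)
p = {s: w / Z for s, w in zip(configs, weights)}   # Gibbs distribution

def entropy(sites):
    """Shannon entropy (natural log) of the marginal of p on `sites`."""
    marg = {}
    for s, q in p.items():
        key = tuple(s[i] for i in sites)
        marg[key] = marg.get(key, 0.0) + q
    return -sum(q * math.log(q) for q in marg.values() if q > 0)

def cmi(A, B, C):
    """Classical CMI I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC)."""
    return entropy(A + B) + entropy(B + C) - entropy(B) - entropy(A + B + C)

cmi_markov = cmi([0, 1], [2], [3, 4])   # B = {2} separates A from C
mi_plain = cmi([0, 1], [], [3, 4])      # no conditioning: ordinary mutual information
print(cmi_markov, mi_plain)
```

The first printed value is zero up to floating-point rounding, while the unconditioned mutual information is strictly positive.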
The Gibbsian nature of states has important implications for the edge states of topologically ordered systems [17, 18]. Our results show that similar equivalences hold for classical restrictions of quantum states.

The second step towards establishing generic quasi-locality is also our main result: the classical restriction of an injective MPS has an exponentially decaying CMI if the matrix product operators satisfy a condition referred to as purity (see Def. 3). This condition has previously been shown [19, 20] to imply the ‘purification’ of quantum trajectories resulting from the application of sequences of random matrices to an initial state. In our setting, the classical CMI can be bounded in terms of the expected entanglement entropy after measurements on the conditioning subsystem. A vanishing entanglement entropy is thus equivalent to the purification of the state trajectory induced by the sequence of measurements on the virtual system. The purification of trajectories implies that the system asymptotically jumps between pure states of a specific stationary measure, irrespective of the (mixed) state in which the system started. We are currently not aware of a meaningful operational interpretation of the stationary stochastic process, and believe it to be quite hard to evaluate in practice [21]. Furthermore, and perhaps counter-intuitively, we observe that the rate of decay towards the stationary measure is unrelated to the gap of the transfer operator of the MPS. We moreover do not know of a closed functional form for the decay rate in terms of the matrix product operators.

One may note that our setting, which focuses on the degree of conditional post-measurement entanglement, is closely related to the notion of localizable entanglement [22, 23, 24]. The latter is obtained by optimizing the measurements over all possible local bases, while we consider a fixed basis.
However, to the best of our knowledge, a general proof of the exponential decay of the localizable entanglement has not been given previously.

As a further attempt to gain a better understanding of the purity condition, we moreover investigate the conspicuous similarity between (the violation of) the purity condition (see Def. 3) and the Knill-Laflamme error correction condition [25]. Indeed, we find (in Section 6.2) that the purity condition can be regarded as the non-existence of a non-trivial correctable subspace that persists indefinitely throughout iterated applications of an error model, in a somewhat unconventional error-correction scenario. One may note that invariant subspaces are special cases of such correctable spaces. As an example, MPSs with symmetry-protected topological order are associated with invariant subspaces [26] and would thus violate the purity condition.

The proof of our main theorem relies heavily on the theory of random matrix products, and in particular on the work of Benoist et al. [19] and Maassen et al. [20]. Since these results involve notions from probability theory that are likely unfamiliar to most of the quantum information community, we reproduce in the appendix many of the basic results in a language that should be more familiar to the quantum information reader. We hope that this will facilitate access to a rich and extensive body of work that should see many more applications in the fields of quantum information and many-body physics. For instance, the theory of random matrix products has recently been leveraged in a different setting to show ergodicity for ensembles of quantum channels [27, 28].

Concerning the structure of the paper, we begin by introducing the notation in Section 2, while Section 3 focuses on the central object of this investigation, namely the CMI with respect to classical restrictions of MPSs. Section 4 presents the first result of the paper: an exponentially decaying CMI implies quasi-local Gibbs distributions.
Section 5 is devoted to the main result, namely the exponentially decaying CMI for a broad class of MPSs. Section 6 provides examples and observations: in Section 6.1 we observe that MPSs corresponding to symmetry-protected phases violate the purity condition, and in Section 6.2 we further investigate the purity condition and show that its violation can be regarded as a type of error-correction condition. Section 6.3 compares the convergence rate of the CMI with the rate of convergence to the fixed point of the transfer operator. Concrete examples are provided in Section 6.4. We finish with an outlook in Section 7.

We consider pure states defined on a finite one-dimensional lattice, $\Lambda$, and associate a finite-dimensional Hilbert space of dimension $d$ with each site. We index the sites of the lattice according to a tripartition $\Lambda = ABC$ as follows: we denote sites in region $A$ as $-|A|+1, -|A|+2, \dots, -1, 0$; sites in region $B$ as $1, \dots, N$; and sites in region $C$ as $N+1, \dots, |BC|$. This peculiar indexing of sites will make sense later on when considering the CMI for MPSs.

Figure 1: We consider an MPS on a finite lattice, $\Lambda$, which is broken up into three contiguous regions such that $\Lambda = ABC$. We denote sites in region $A$ as $-|A|+1, -|A|+2, \dots, -1, 0$; sites in region $B$ as $1, \dots, N$; and sites in region $C$ as $N+1, \dots, |BC|$.

Let $|x_\Lambda\rangle = |x_{-|A|+1}, \dots, x_0, x_1, \dots, x_N, \dots, x_{|BC|}\rangle$ denote a local orthonormal basis, where $\{|x_i\rangle\}_{x_i=0}^{d-1}$
is the local basis at site $i$. Unless specified otherwise, we will be working with translationally invariant MPSs with open boundary conditions,
$$|\Psi\rangle = \frac{1}{\sqrt{K}} \sum_{x_{-|A|+1},\dots,x_{|BC|}=0}^{d-1} \langle R|A_{x_{|BC|}}\cdots A_{x_{-|A|+1}}|L\rangle\, |x_{-|A|+1}\cdots x_{|BC|}\rangle, \tag{1}$$
where $K$ is a normalization factor. Here, $A_{x_i}$ are $D\times D$ matrices encoding correlations in the system, and $|L\rangle$ and $|R\rangle$ are normalized states on the $D$-dimensional virtual space specifying the boundary conditions, where $D$ is known as the bond dimension of the MPS. Without loss of generality, we consider (left-)normalized MPSs, which enforces $\sum_{x_i=0}^{d-1} A_{x_i}^\dagger A_{x_i} = \mathbb{1}$. Left normalization guarantees that the completely positive map
$$\mathcal{E}(\cdot) := \sum_{x_i=0}^{d-1} A_{x_i}\,\cdot\,A_{x_i}^\dagger \tag{2}$$
is trace preserving. The map $\mathcal{E}$ is often referred to as the transfer operator; it maps density matrices on the virtual space to density matrices, from left to right along the chain. The adjoint map, $\mathcal{E}^*$, maps operators from right to left along the chain. Our choice of boundary conditions serves mainly for notational simplicity. The results in the paper extend naturally to periodic or mixed boundary conditions. For periodic boundary conditions, the regions $ABC$ need to be chosen differently to ensure that $B$ separates $A$ from $C$.

The normalization constant can be expressed concisely as $K = \mathrm{Tr}[\mathcal{E}^{|\Lambda|}(L)R]$, where as shorthand we write $R = |R\rangle\langle R|$ and $L = |L\rangle\langle L|$.

Classical Restrictions
For a given local basis $\{|x_\Lambda\rangle\}$, we define the quantum channel
$$\Phi_\Lambda(\psi) = \sum_{x_\Lambda} |x_\Lambda\rangle\langle x_\Lambda|\,\langle x_\Lambda|\psi|x_\Lambda\rangle. \tag{3}$$
In other words, $\Phi_\Lambda$ generates a state that is diagonal with respect to the basis $\{|x_\Lambda\rangle\}$ by deleting the off-diagonal elements of the input $\psi$. We refer to $\Phi_\Lambda$ as the classical restriction (also commonly referred to as a ‘dephasing map’ or ‘pinching’). Since $\Phi_\Lambda(\psi)$ is diagonal, the map $\Phi_\Lambda$ effectively defines a classical probability distribution, $p_\psi(x_\Lambda) = \langle x_\Lambda|\psi|x_\Lambda\rangle$, for any choice of basis $\{|x_\Lambda\rangle\}$. We also consider the channel that measures a subset of systems $B \subset \Lambda$, which we denote as
$$\Phi_B(\psi) = \sum_{x_B} |x_B\rangle\langle x_B| \otimes \langle x_B|\psi|x_B\rangle \tag{4}$$
$$= \sum_{x_B} p_\psi(x_B)\,\psi(x_B), \tag{5}$$
with $|x_B\rangle = \bigotimes_{i\in B}|x_i\rangle$. Here, the channel $\Phi_B$ similarly defines a classical probability distribution on the sites in $B$ by $p_\psi(x_B) = \langle x_B|\psi_B|x_B\rangle$, where $\psi_B := \mathrm{Tr}_{AC}\,\psi$ is the reduced state of $\psi$ on $B$. Note that
$$p_\psi(x_B) = \sum_{x_{AC}} p_\psi(x), \tag{6}$$
where we recall that $\Lambda = ABC$. Moreover, we refer to the post-measurement state after obtaining the measurement outcome $x_B$ as
$$\psi(x_B) = \frac{1}{p_\psi(x_B)}\,|x_B\rangle\langle x_B| \otimes \langle x_B|\psi|x_B\rangle. \tag{7}$$

Consider now the MPS defined in Eq. (1). The probability distribution on $B$ is
$$p_\Psi(x_B) = \frac{1}{K}\,\mathrm{Tr}\!\left[A_{x_N}\cdots A_{x_1}\,\mathcal{E}^{|A|}(L)\,A_{x_1}^\dagger\cdots A_{x_N}^\dagger\,\mathcal{E}^{*|C|}(R)\right], \tag{8}$$
where $\mathcal{E}^n$ denotes the $n$-fold composition of the map and $x_B := x_1,\dots,x_N$, with $x_i = 0,\dots,d-1$. For an injective MPS, the transfer operator $\mathcal{E}$ is primitive [29]. In particular, $\mathcal{E}$ then has a unique fixed point, which is full rank, i.e., there exists a unique full-rank density operator $\rho$ such that $\mathcal{E}(\rho) = \rho$.
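To make Eqs. (1), (2) and (8) concrete, here is a small, self-contained check in pure Python. The tensors $A_0 = \mathrm{diag}(0.8, 0.6)$ and $A_1 = \begin{pmatrix}0 & 0.8\\ 0.6 & 0\end{pmatrix}$ and the boundary vectors $|L\rangle = |R\rangle = (1,1)/\sqrt{2}$ are illustrative assumptions, not taken from the paper; they are real and left-normalized, and their length-2 products span all $2\times 2$ matrices, so the MPS is injective. The sketch compares $K = \mathrm{Tr}[\mathcal{E}^{|\Lambda|}(L)R]$ with the brute-force squared norm of the unnormalized state, compares Eq. (8) with the marginal of the classical restriction, and finds the fixed point $\rho$ by power iteration.

```python
import itertools
import math

# Assumed example (not from the paper): d = D = 2, real, left-normalized tensors,
# sum_x A_x^T A_x = identity; length-2 products span all 2x2 matrices (injective).
A = [[[0.8, 0.0], [0.0, 0.6]],   # A_0
     [[0.0, 0.8], [0.6, 0.0]]]   # A_1
Lvec = Rvec = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # boundary states |L>, |R>
nA, N, nC = 2, 2, 2                                  # |A|, |B|, |C|
n = nA + N + nC

def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def T(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def tr(X):
    return X[0][0] + X[1][1]

def transfer(X, adjoint=False):
    """E(X) = sum_x A_x X A_x^T, or the adjoint E*(X) = sum_x A_x^T X A_x (real case)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for Ax in A:
        Y = mm(mm(T(Ax), X), Ax) if adjoint else mm(mm(Ax, X), T(Ax))
        out = [[out[i][j] + Y[i][j] for j in range(2)] for i in range(2)]
    return out

def iterate(X, times, adjoint=False):
    for _ in range(times):
        X = transfer(X, adjoint)
    return X

def amplitude(xs):
    """<R| A_{x_n} ... A_{x_1} |L>, with x_1 acting first on |L>."""
    v = Lvec[:]
    for x in xs:
        v = [A[x][i][0] * v[0] + A[x][i][1] * v[1] for i in range(2)]
    return Rvec[0] * v[0] + Rvec[1] * v[1]

Lop = [[a * b for b in Lvec] for a in Lvec]
Rop = [[a * b for b in Rvec] for a in Rvec]
K = tr(mm(iterate(Lop, n), Rop))   # K = Tr[E^{|Lambda|}(L) R]
K_brute = sum(amplitude(xs) ** 2 for xs in itertools.product(range(2), repeat=n))

def p_B(xB):
    """Marginal on region B via Eq. (8)."""
    X = iterate(Lop, nA)                    # E^{|A|}(L)
    for x in xB:                            # A_{x_N} ... A_{x_1} (.) A_{x_1}^T ... A_{x_N}^T
        X = mm(mm(A[x], X), T(A[x]))
    return tr(mm(X, iterate(Rop, nC, adjoint=True))) / K

def p_B_brute(xB):
    """Same marginal, summing the classical restriction over x_A and x_C (Eq. (6))."""
    total = sum(amplitude(xa + tuple(xB) + xc) ** 2
                for xa in itertools.product(range(2), repeat=nA)
                for xc in itertools.product(range(2), repeat=nC))
    return total / K

rho = iterate([[0.5, 0.0], [0.0, 0.5]], 100)   # fixed point of E by power iteration
print(K, K_brute, rho)
```

For these particular tensors one can verify by hand that $\mathcal{E}(\mathrm{diag}(a,b)) = \mathrm{diag}(0.64(a+b), 0.36(a+b))$, so the iteration lands on $\rho = \mathrm{diag}(0.64, 0.36)$ after a single step.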
A consequence of the injectivity of the MPS is thus that $\lim_{|A|\to\infty}\mathcal{E}^{|A|}(\chi) = \rho\,\mathrm{Tr}(\chi)$, $\lim_{|C|\to\infty}\mathcal{E}^{*|C|}(Q) = \mathrm{Tr}(Q\rho)\,\mathbb{1}$, and $\lim_{|A|\to\infty,\,|C|\to\infty} K = \mathrm{Tr}(R\rho) \neq 0$. Hence, if region $B$ is kept fixed while regions $A$ and $C$ both grow to infinity, the probability distribution (8) on $B$ reduces to
$$p_\Psi(x_B) = \mathrm{Tr}\!\left[A_{x_N}\cdots A_{x_1}\,\rho\,A_{x_1}^\dagger\cdots A_{x_N}^\dagger\right]. \tag{9}$$

Throughout the paper, we use a number of entropic quantities, which we introduce in this section. In particular, we switch back and forth between classical and quantum systems. The quantum von Neumann entropy of a mixed state, $\chi$, is denoted as $S(\chi) = -\mathrm{Tr}\,\chi\log\chi$, while the classical entropy of a probability distribution $p(x)$ is denoted as $H(p) = -\sum_x p(x)\log p(x)$. Here, $\log$ denotes the natural logarithm. In the next section, we use the classical relative entropy as a measure of distinguishability between probability distributions. The classical relative entropy of $p_1(x)$ with respect to $p_2(x)$ is defined as
$$S(p_1\|p_2) = \sum_x p_1(x)\log\left[\frac{p_1(x)}{p_2(x)}\right]. \tag{10}$$
The (quantum) CMI between regions $A$ and $C$ conditioned on region $B$ is given by
$$I_\chi(A:C|B) = S(\chi_{AB}) + S(\chi_{BC}) - S(\chi_B) - S(\chi_{ABC}). \tag{11}$$
After applying the classical restriction map $\Phi_\Lambda$ of Eq. (3) to a quantum state $\chi$, we get the classical CMI
$$I_{\Phi_\Lambda(\chi)}(A:C|B) = I_{p_\chi}(A:C|B) = H(p_{\chi,AB}) + H(p_{\chi,BC}) - H(p_{\chi,B}) - H(p_{\chi,ABC}), \tag{12}$$
where $p_{\chi,A} := p_\chi(x_A) = \langle x_A|\chi_A|x_A\rangle$.

We now point out an important observation on the CMI [30]. Suppose that we have a pure state, $\psi = |\psi\rangle\langle\psi|$, and we measure all spins in region $B$.
Then, the classical CMI is bounded by the quantum CMI of the post-measurement state:
$$I_{p_\psi}(A:C|B) \le I_{\psi(x_B)}(A:C|B) \tag{13}$$
$$= \langle S[\psi_A(x_B)]\rangle_{p_\psi(x_B)} + \langle S[\psi_C(x_B)]\rangle_{p_\psi(x_B)} = 2\,\langle S[\psi_C(x_B)]\rangle_{p_\psi(x_B)}, \tag{14}$$
where $\psi_X(x_B)$ is the reduced state in region $X$ of the post-measurement state $\psi(x_B)$, and $\langle S[\psi(x)]\rangle_{p_\psi(x)}$ is the average von Neumann entropy of $\psi(x)$ over $p_\psi(x)$, i.e.,
$$\langle S[\psi(x)]\rangle_{p_\psi(x)} := \sum_x p_\psi(x)\,S[\psi(x)]. \tag{15}$$
The inequality in Eq. (13) follows from the monotonicity of the relative entropy, since the classical restriction is obtained from the post-measurement state by dephasing $A$ and $C$. Note that $S[\psi_A(x_B)] = S[\psi_C(x_B)]$, since $\langle x_B|\psi|x_B\rangle/p_\psi(x_B)$ is a pure state on the bipartition $AC$. Eqs. (13) and (14) allow us to characterize the states that have a small post-measurement CMI by finding the states that have a small average entropy of $\psi_C(x_B)$.

Let us now go back to the MPS described in the previous section. With the injective MPS in the canonical form of Eq. (1), it can be shown that the reduced post-measurement state, $\Psi_C(x_B)$, is (up to zero eigenvalues) isospectral to
$$\frac{1}{p_\Psi(x_B)K}\sqrt{\mathcal{E}^{*|C|}(R)}\,A_{x_N}\cdots A_{x_1}\,\mathcal{E}^{|A|}(L)\,A_{x_1}^\dagger\cdots A_{x_N}^\dagger\sqrt{\mathcal{E}^{*|C|}(R)}. \tag{16}$$
The average von Neumann entropy of the reduced state of a post-measurement translationally invariant injective MPS is then
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} = \sum_{x_B} p_\Psi(x_B)\,S\!\left(\frac{F A_{x_N}\cdots A_{x_1}\,\sigma\,A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger}{p_\Psi(x_B)K}\right), \tag{17}$$
where $\sigma := \mathcal{E}^{|A|}(L)$ and $F^\dagger F := \mathcal{E}^{*|C|}(R)$. Eq. (17) will be the main object of study throughout this paper. As mentioned earlier, a translationally invariant injective MPS results in a primitive channel $\mathcal{E}$. On a finite-dimensional space, this implies that both $\sigma$ and $F$ are full-rank operators for sufficiently large $|A|$ and $|C|$.
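The chain of relations (12) to (17) can be checked numerically on a small example. The sketch below uses illustrative assumptions that are not from the paper: real left-normalized tensors $A_0 = \mathrm{diag}(0.8, 0.6)$ and $A_1 = \begin{pmatrix}0 & 0.8\\ 0.6 & 0\end{pmatrix}$, boundary vectors $|L\rangle = |R\rangle = (1,1)/\sqrt{2}$, and $|A| = |B| = |C| = 2$. It computes the classical CMI of the restriction by brute force and the average entropy of $\Psi_C(x_B)$ from Eq. (17). To avoid a matrix square root it uses the fact that $F X F^\dagger$ and $X F^\dagger F$ have the same non-zero eigenvalues, so only $\sigma$ and $F^\dagger F = \mathcal{E}^{*|C|}(R)$ are needed.

```python
import itertools
import math

# Assumed example (not from the paper): d = D = 2 real, left-normalized tensors.
A = [[[0.8, 0.0], [0.0, 0.6]], [[0.0, 0.8], [0.6, 0.0]]]
Lvec = Rvec = [1 / math.sqrt(2), 1 / math.sqrt(2)]
nA, N, nC = 2, 2, 2
n = nA + N + nC

def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def T(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def transfer(X, adjoint=False):
    """E(X) = sum_x A_x X A_x^T, or E*(X) = sum_x A_x^T X A_x."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for Ax in A:
        Y = mm(mm(T(Ax), X), Ax) if adjoint else mm(mm(Ax, X), T(Ax))
        out = [[out[i][j] + Y[i][j] for j in range(2)] for i in range(2)]
    return out

def amplitude(xs):
    v = Lvec[:]
    for x in xs:
        v = [A[x][i][0] * v[0] + A[x][i][1] * v[1] for i in range(2)]
    return Rvec[0] * v[0] + Rvec[1] * v[1]

# Classical restriction p(x_A, x_B, x_C) and its classical CMI, Eq. (12).
raw = {xs: amplitude(xs) ** 2 for xs in itertools.product(range(2), repeat=n)}
K = sum(raw.values())
p = {xs: w / K for xs, w in raw.items()}

def H(sites):
    marg = {}
    for xs, q in p.items():
        key = tuple(xs[i] for i in sites)
        marg[key] = marg.get(key, 0.0) + q
    return -sum(q * math.log(q) for q in marg.values() if q > 0)

sA, sB, sC = list(range(nA)), list(range(nA, nA + N)), list(range(nA + N, n))
I_classical = H(sA + sB) + H(sB + sC) - H(sB) - H(sA + sB + sC)

# Average entropy of Psi_C(x_B) via Eq. (17).
sigma = [[a * b for b in Lvec] for a in Lvec]
FtF = [[a * b for b in Rvec] for a in Rvec]
for _ in range(nA):
    sigma = transfer(sigma)
for _ in range(nC):
    FtF = transfer(FtF, adjoint=True)

avg_S = 0.0
for xB in itertools.product(range(2), repeat=N):
    X = sigma
    for x in xB:
        X = mm(mm(A[x], X), T(A[x]))
    M = mm(X, FtF)                                  # same non-zero spectrum as F X F^T
    pB = (M[0][0] + M[1][1]) / K                    # p_Psi(x_B), Eq. (8)
    det = (M[0][0] * M[1][1] - M[0][1] * M[1][0]) / (pB * K) ** 2
    disc = math.sqrt(max(1.0 - 4.0 * det, 0.0))    # normalized state has trace 1
    eigs = [(1.0 + disc) / 2, (1.0 - disc) / 2]
    avg_S += pB * -sum(lam * math.log(lam) for lam in eigs if lam > 1e-15)

print(I_classical, 2 * avg_S)   # Eqs. (13)-(14): the first is at most the second
```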
We also recall that $\lim_{|A|\to\infty}\mathcal{E}^{|A|}(\chi) = \rho\,\mathrm{Tr}(\chi)$, $\lim_{|C|\to\infty}\mathcal{E}^{*|C|}(Q) = \mathrm{Tr}(Q\rho)\,\mathbb{1}$, and $\lim_{|A|\to\infty,\,|C|\to\infty}K = \mathrm{Tr}(R\rho) \neq 0$; consequently, for infinite chains, (17) reduces to
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} = \sum_{x_B} p_\Psi(x_B)\,S\!\left(\frac{A_{x_N}\cdots A_{x_1}\,\rho\,A_{x_1}^\dagger\cdots A_{x_N}^\dagger}{p_\Psi(x_B)}\right). \tag{18}$$

In this section we consider probability distributions $p_{1,\dots,|\Lambda|}$ on finite one-dimensional lattices, $\Lambda$, and discuss conditions under which these can be well approximated by Gibbs distributions of local Hamiltonians. We say that a Hamiltonian is $\ell$-local if it can be written as a sum of terms that each span at most $\ell$ consecutive sites. A distribution is $\ell$-local if it is the Gibbs distribution of some $\ell$-local Hamiltonian. In a similar spirit, we say that a distribution is quasi-locally Gibbs if it can be approximated by $\ell$-local distributions, where the error of this approximation in some sense decays fast with increasing $\ell$. In this section we show that, if the CMI $I_{p_{1,\dots,|\Lambda|}}(A:C|B)$ of the distribution $p_{1,\dots,|\Lambda|}$ decays sufficiently fast with increasing size $|B|$ of the bridging region in a contiguous tripartition $\Lambda = ABC$ of the lattice, then $p_{1,\dots,|\Lambda|}$ is quasi-locally Gibbs. (For convenience, we change the notation in this section and enumerate the sites of the entire lattice as $1,\dots,|\Lambda|$.) This result is similar in spirit to Kozlov's theorem [14] (see also [30]). Although this section exclusively focuses on probability distributions, the application to quantum states becomes apparent in Section 5, where we consider classical restrictions of underlying injective MPSs and show that these are generically quasi-locally Gibbsian.

Let $p_{1,\dots,|\Lambda|}$ be a probability distribution over a finite sub-chain $\Lambda$ of a one-dimensional lattice. We let $p_j$ denote the marginal distribution at site $j$.
For $1 \le j < k \le |\Lambda|$ we let $p_{j,\dots,k}$ denote the marginal distribution of the chain $j,\dots,k$. In the following, we assume that
$$p_{1,\dots,|\Lambda|}(x_1,\dots,x_{|\Lambda|}) > 0, \quad \forall x_1,\dots,x_{|\Lambda|}, \tag{19}$$
which consequently leads to $p_{j,\dots,k}(x_j,\dots,x_k) > 0$. With these assumptions, we can define
$$h_{j,\dots,k} := -\log p_{j,\dots,k}, \quad 1\le j\le k\le|\Lambda|, \tag{20}$$
and thus $p_{j,\dots,k} = e^{-h_{j,\dots,k}}$. Hence, we have constructed $h_{j,\dots,k}$ such that $p_{j,\dots,k}$ is Gibbs distributed with respect to $h_{j,\dots,k}$, with $\beta = 1$ in $e^{-\beta h_{j,\dots,k}}/Z(h_{j,\dots,k})$, where one may note that $Z(h_{j,\dots,k}) := \sum_{x_j,\dots,x_k} e^{-h_{j,\dots,k}(x_j,\dots,x_k)} = 1$. For $j \le k$ we define
$$H_{j,k} := \begin{cases} h_j & \text{if } k = j,\ |\Lambda| \ge j \ge 1,\\ h_{j,j+1} - h_j - h_{j+1} & \text{if } k = j+1,\ |\Lambda|-1 \ge j \ge 1,\\ h_{j+1,\dots,k-1} + h_{j,\dots,k} - h_{j,\dots,k-1} - h_{j+1,\dots,k} & \text{if } |\Lambda|-2 \ge k-2 \ge j \ge 1. \end{cases} \tag{21}$$
Next, we define
$$h^{\ell}_{1,\dots,|\Lambda|} := \sum_{1\le j\le k\le|\Lambda|,\ k-j\le\ell} H_{j,k}. \tag{22}$$
Hence, $h^\ell_{1,\dots,|\Lambda|}$ only includes the terms $H_{j,k}$ for which the range of the sub-chain $j,\dots,k$ does not exceed $\ell$. In other words, $h^\ell_{1,\dots,|\Lambda|}$ is an $\ell$-local Hamiltonian. The associated $\ell$-local Gibbs distribution is
$$p^\ell_{1,\dots,|\Lambda|}(x_1,\dots,x_{|\Lambda|}) := \frac{e^{-h^\ell_{1,\dots,|\Lambda|}(x_1,\dots,x_{|\Lambda|})}}{Z(h^\ell_{1,\dots,|\Lambda|})}, \tag{23}$$
with $Z(h^\ell_{1,\dots,|\Lambda|}) := \sum_{x'_1,\dots,x'_{|\Lambda|}} e^{-h^\ell_{1,\dots,|\Lambda|}(x'_1,\dots,x'_{|\Lambda|})}$.

The following proposition expresses the classical relative entropy (see Eq. (10)) between the Gibbs distribution $p_{1,\dots,|\Lambda|}$ associated with the full Hamiltonian, $h_{1,\dots,|\Lambda|}$, and the Gibbs distribution $p^\ell_{1,\dots,|\Lambda|}$ associated with the $\ell$-local Hamiltonian, $h^\ell_{1,\dots,|\Lambda|}$, in terms of the CMI between suitable regions of the chain. Hence, if the latter are sufficiently small, then the approximating $\ell$-local distribution $p^\ell_{1,\dots,|\Lambda|}$ is close to the original distribution $p_{1,\dots,|\Lambda|}$.

Proposition 1.
For $p_{1,\dots,|\Lambda|}(x_1,\dots,x_{|\Lambda|}) > 0$, let $p^\ell_{1,\dots,|\Lambda|}$ be as defined in Eqs. (20)-(23). For $2 \le \ell \le |\Lambda|-2$ it is the case that
$$S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) = \sum_{1\le j\le k\le|\Lambda|,\ k-j>\ell} I(j:k\,|\,j+1,\dots,k-1). \tag{24}$$
Proof.
Let $\langle\cdot\rangle_{p_{1,\dots,|\Lambda|}}$ denote the expectation value with respect to the distribution $p_{1,\dots,|\Lambda|}$. One can confirm that
$$\langle H_{j,k}\rangle_{p_{1,\dots,|\Lambda|}} = \begin{cases} H(p_j) & \text{if } k=j,\ |\Lambda|\ge j\ge 1,\\ -I(j:k) & \text{if } k=j+1,\ |\Lambda|-1\ge j\ge 1,\\ -I(j:k\,|\,j+1,\dots,k-1) & \text{if } |\Lambda|-2\ge k-2\ge j\ge 1.\end{cases} \tag{25}$$
One can also confirm that
$$\sum_{1\le j\le k\le|\Lambda|} H_{j,k} = h_{1,\dots,|\Lambda|}. \tag{26}$$
With a somewhat lengthy but straightforward calculation, one can moreover show that
$$Z(h^\ell_{1,\dots,|\Lambda|}) = 1, \quad 2\le\ell\le|\Lambda|-2. \tag{27}$$
For $2\le\ell\le|\Lambda|-2$, we can combine the definition of the relative entropy with the definition of $p^\ell_{1,\dots,|\Lambda|}$ in (23), with $h_{1,\dots,|\Lambda|} = -\log p_{1,\dots,|\Lambda|}$, and with the fact that $p_{1,\dots,|\Lambda|}$ is a probability distribution. Writing $p \equiv p_{1,\dots,|\Lambda|}$, $p^\ell \equiv p^\ell_{1,\dots,|\Lambda|}$ and $x \equiv (x_1,\dots,x_{|\Lambda|})$ for brevity, we get
$$\begin{aligned}
S(p\,\|\,p^\ell) &= -\sum_{x} p(x)\,h_{1,\dots,|\Lambda|}(x) + \sum_{x} p(x)\,h^\ell_{1,\dots,|\Lambda|}(x) + \log Z(h^\ell_{1,\dots,|\Lambda|})\\
&= -\sum_{x} p(x)\sum_{1\le j\le k\le|\Lambda|} H_{j,k}(x) + \sum_{x} p(x)\sum_{1\le j\le k\le|\Lambda|,\ k-j\le\ell} H_{j,k}(x) &&\text{[by (27), (26), and the definition of } h^\ell \text{ in (22)]}\\
&= -\sum_{1\le j\le k\le|\Lambda|,\ k-j>\ell}\langle H_{j,k}\rangle_{p}\\
&= \sum_{1\le j\le k\le|\Lambda|,\ k-j>\ell} I(j:k\,|\,j+1,\dots,k-1). &&\text{[by (25), for the case } k-j>\ell\ge 1]
\end{aligned} \tag{28}$$
Loosely speaking, the above proposition tells us that, if the CMIs $I(j:k\,|\,j+1,\dots,k-1)$ in some sense decrease sufficiently fast with increasing $k-j$, then the $\ell$-local Gibbs distribution $p^\ell_{1,\dots,|\Lambda|}$ approaches the true distribution $p_{1,\dots,|\Lambda|}$. The following lemma formalizes this intuition.

Lemma 2.
Suppose that the probability distribution $p_{1,\dots,|\Lambda|}(x_1,\dots,x_{|\Lambda|}) > 0$ is such that there exists a monotonically decreasing function $\xi:\mathbb{N}\to\mathbb{R}$ such that, for every contiguous partition $\Lambda = ABC$,
$$I_p(A:C|B) \le \xi(|B|). \tag{29}$$
Let $p^\ell_{1,\dots,|\Lambda|}$ be defined as in Eqs. (20)-(23). Then, for $2\le\ell\le|\Lambda|-2$, we have
$$S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) \le |\Lambda|^2\,\xi(\ell). \tag{30}$$
Proof.
With the general observation that $I(A_2:C_1|B) \le I(A_1A_2:C_1C_2|B)$ for $A = A_1A_2$ and $C = C_1C_2$, we can use $A_1 = \{1,\dots,j-1\}$, $A_2 = \{j\}$, $B = \{j+1,\dots,k-1\}$, $C_1 = \{k\}$, and $C_2 = \{k+1,\dots,|\Lambda|\}$ in (24) and (29), which yields
$$S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) \le \sum_{1\le j\le k\le|\Lambda|,\ k-j>\ell}\xi(|\{j+1,\dots,k-1\}|). \tag{31}$$
With the observation that $k-j>\ell$ implies $k-j-1\ge\ell$, and thus $|\{j+1,\dots,k-1\}| = k-j-1\ge\ell$, together with the assumption that the function $\xi$ is monotonically decreasing, we thus get
$$S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) \le \xi(\ell)\sum_{1\le j\le k\le|\Lambda|,\ \ell<k-j} 1 \le |\Lambda|^2\,\xi(\ell). \tag{32}$$

Definition 3 (Purity [19]). Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a complex finite-dimensional Hilbert space, $\mathcal{H}$. We say that $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition if the following implication holds: if $P$ is a non-zero orthogonal projector on $\mathcal{H}$ such that
$$P A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1} P \propto P, \quad \forall N\in\mathbb{N},\ \forall (x_1,\dots,x_N)\in\{0,\dots,d-1\}^{\times N},$$
$$\text{then } \mathrm{rank}(P) = 1. \tag{33}$$
Note that the condition $P A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1} P \propto P$ is trivially true whenever $P$ is a rank-one projector. Hence, the purity condition means that this proportionality only holds for rank-one projectors. The purity condition bears some resemblance to the Knill-Laflamme condition [25]. We discuss the relationship between the purity condition and error correction/detection in Section 6.2.

In terms of the purity condition, our main theorem is phrased as follows.

Theorem 4. Let $\Psi$ be an injective MPS on a finite one-dimensional lattice, $\Lambda$, with finite bond dimension, $D$, and open boundary conditions. If the purity condition holds for the matrix product operators associated with a specific local basis, $\{|x\rangle\}$, then there exist constants $1 > \kappa \ge 0$ and $c \ge 0$ such that, for any three contiguous regions $\Lambda = ABC$ as in Fig. 1, we have
$$I_{p_\Psi}(A:C|B) \le c\,\kappa^{|B|}. \tag{34}$$
The constants $c$ and $\kappa$ are independent of $|A|$, $|B|$, $|C|$, $|L\rangle$, and $|R\rangle$.

The following gives an overview of the essential steps of the proof. For a more detailed account, see the proof of Theorem 11 in Appendix A.

Proof. The first step in proving Theorem 4 is to bound the post-measurement CMI, $I_{p_\Psi}(A:C|B)$, in terms of the quantity $f(N)$ defined below in Eq. (41). The second step is to show that $f(N)$ decays exponentially if the purity condition is satisfied; this step is shown independently in Prop. 5. We relegate many of the technical details of the proof to the appendix to allow for a clearer presentation of the main ideas.

To start with, we bound the average entropy (Eq. (15)) in terms of a quantity that can be interpreted as the average purity, and we get
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \le -Q\log Q + Q\,[1+\log(D-1)], \tag{35}$$
with $Q := 1 - \sum_{x_B}p_\Psi(x_B)\,\|\Psi_C(x_B)\|$, where $\|\cdot\|$ denotes the operator norm. The proof, which is deferred to Lemma 6 in Appendix A, follows from concavity of the entropy functional. It is clear that exponential decay of $Q$ implies exponential decay of $I_{\Psi(x_B)}(A:C|B)$ by Eq. (13).

Next, we show that $Q$ can be bounded above by a function of the ordered singular values of the matrix product defining the classical post-measurement MPS. Indeed, Lemma 7 in Appendix A first establishes an upper bound on $Q$ in terms of the average second eigenvalue of the matrix product in Eq.
(17), as
$$Q \le \frac{D-1}{K}\sum_{x_B}\lambda_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\,\sigma\,A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right), \tag{36}$$
where $\{\lambda_j^\downarrow(O)\}$ and $\{\nu_j^\downarrow(O)\}$ denote the eigenvalues and singular values of an operator $O$ in decreasing order, i.e., $\lambda_1^\downarrow(O)\ge\cdots\ge\lambda_D^\downarrow(O)$ and $\nu_1^\downarrow(O)\ge\cdots\ge\nu_D^\downarrow(O)$, respectively.

Then, recalling that $\lambda_j^\downarrow(OO^\dagger) = \nu_j^\downarrow(O)^2$ for any operator $O$, we get
$$Q \le \frac{D-1}{K}\sum_{x_B}\lambda_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\,\sigma\,A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right) \tag{37}$$
$$\le \frac{D-1}{K}\sum_{x_B}\sqrt{\lambda_1^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right)\lambda_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right)} \tag{38}$$
$$= \frac{D-1}{K}\sum_{x_B}\nu_1^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sqrt{\sigma}\right)\nu_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sqrt{\sigma}\right) =: \frac{D-1}{K}f(N), \tag{39}$$
where we recall that $|B| = N$.

Next, we need to take into account the fact that $K$ depends on the sizes of the regions $A$, $B$ and $C$, and in principle $K$ could approach zero. However, the assumption that the MPS is injective implies that $\mathcal{E}(\cdot) = \sum_x A_x\cdot A_x^\dagger$ is primitive, which means that $\mathcal{E}$ has a unique full-rank fixed point. The latter is used in Lemma 10 in Appendix A to show that, for all sufficiently large $|B|$, there exists a number $r > 0$ such that
$$K = \langle R|\mathcal{E}^{|\Lambda|}(|L\rangle\langle L|)|R\rangle = \langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle \ge r, \tag{40}$$
where $r$ is independent of $|A|$, $|C|$, $|L\rangle$, and $|R\rangle$. We use this to obtain an upper bound on $Q$ that only depends on $N$ via $f(N)$.

Finally, in Proposition 5 below, the function $f(N)$ is shown to decay exponentially if the purity condition holds. Moreover, the constants $c$ and $\gamma$ in the bound (42) can be chosen to be independent of $|A|$, $|B|$, $|C|$, which follows from the fact that $c$ and $\gamma$ are independent of $\sigma$ and $F$.

Note that the bound in Eq. (38) is likely quite sub-optimal.
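For bond dimension $D = 2$ the quantity $f(N)$ of Eq. (39) has a closed form that makes its exponential decay explicit: $\nu_1^\downarrow(X)\,\nu_2^\downarrow(X) = |\det X|$ for any $2\times 2$ matrix, so $f(N) = |\det F|\,\sqrt{\det\sigma}\,\big(\sum_x|\det A_x|\big)^N$. The sketch below checks this against a brute-force evaluation of the singular values. The tensors, $\sigma = \mathrm{diag}(0.7, 0.3)$ and $F = \mathrm{diag}(0.9, 0.5)$ (so that $F^\dagger F \le \mathbb{1}$) are assumptions chosen for illustration only.

```python
import itertools
import math

# Assumed example (not from the paper): D = 2 real tensors with sum_x A_x^T A_x = 1,
# sigma = diag(0.7, 0.3) (a density operator) and F = diag(0.9, 0.5) (F^T F <= 1).
A = [[[0.8, 0.0], [0.0, 0.6]], [[0.0, 0.8], [0.6, 0.0]]]
sqrt_sigma = [[math.sqrt(0.7), 0.0], [0.0, math.sqrt(0.3)]]
F = [[0.9, 0.0], [0.0, 0.5]]

def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def singular_values(X):
    """(nu_1, nu_2) of a real 2x2 matrix, from the eigenvalues of X^T X."""
    G = mm([[X[0][0], X[1][0]], [X[0][1], X[1][1]]], X)
    t, d = G[0][0] + G[1][1], det(G)
    disc = math.sqrt(max(t * t - 4 * d, 0.0))
    return math.sqrt((t + disc) / 2), math.sqrt(max((t - disc) / 2, 0.0))

def f(N):
    """Eq. (41) by brute-force enumeration of all words x_1 ... x_N."""
    total = 0.0
    for xs in itertools.product(range(2), repeat=N):
        X = sqrt_sigma
        for x in xs:               # builds A_{x_N} ... A_{x_1} sqrt(sigma)
            X = mm(A[x], X)
        nu1, nu2 = singular_values(mm(F, X))
        total += nu1 * nu2
    return total

gamma = sum(abs(det(Ax)) for Ax in A)            # 0.48 + 0.48 = 0.96 here
prefactor = abs(det(F)) * math.sqrt(0.7 * 0.3)   # |det F| * sqrt(det sigma)
for N in range(1, 9):
    print(N, f(N), prefactor * gamma ** N)
```

By the AM-GM inequality, $|\det A_x| = \nu_1^\downarrow\nu_2^\downarrow \le \mathrm{Tr}(A_x^\dagger A_x)/2$, so for $D = 2$ and $\sum_x A_x^\dagger A_x = \mathbb{1}$ one always has $\sum_x|\det A_x| \le 1$, with equality exactly when every $A_x$ is proportional to a unitary. For the tensors above the decay factor is $0.96$.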
It is an interesting open question whether there exists a more direct bound on the average purity that does not rely on bounding the function $f(N)$. The main reason to work with $f(N)$ rather than the average purity is that $f(N)$ is explicitly submultiplicative.

We now state the key proposition, adapted from Ref. [19] and references therein.

Proposition 5 ([19]). Let $\{A_x\}_{x=0}^{d-1}$ be operators on a finite-dimensional complex Hilbert space, $\mathcal{H}$, such that $\sum_{x=0}^{d-1}A_x^\dagger A_x = \mathbb{1}$. For operators $\sigma$ and $F$ on $\mathcal{H}$, define
$$f(N) := \sum_{x_1,\dots,x_N=0}^{d-1}\nu_1^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sqrt{\sigma}\right)\nu_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sqrt{\sigma}\right). \tag{41}$$
If $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition in Definition 3, then there exist real constants $0 \le c$ and $0 < \gamma < 1$ such that, for all density operators $\sigma$ and all $F$ such that $F^\dagger F \le \mathbb{1}$, it is the case that
$$f(N) \le c\,\gamma^N, \quad \forall N\in\mathbb{N}. \tag{42}$$
Conversely, if there exist constants $0\le c$ and $0<\gamma<1$ such that (42) holds for some $\sigma$ and $F$ that are both full-rank operators, then $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition.

Theorem 4 provides the necessary bound $\xi(|B|) = c\,\kappa^{|B|}$ in Lemma 2 for showing the quasi-locality of the classical restriction $p_{1,\dots,|\Lambda|}(x_1,\dots,x_{|\Lambda|}) = \langle x_\Lambda|\Psi|x_\Lambda\rangle$. Theorem 4 and Lemma 2 thus yield as a corollary (for a more exact formulation, see Corollary 12 in Appendix A)
$$S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) \le c\,|\Lambda|^2\,\kappa^\ell, \quad 2\le\ell\le|\Lambda|-2. \tag{43}$$
A simple example that leads to an exponential decay of the relative entropy is when $\ell$ is a constant fraction of $|\Lambda|$, i.e.,
$$\ell = \alpha|\Lambda|, \quad 0<\alpha<1. \tag{44}$$
The result is that the relative entropy decays exponentially in $|\Lambda|$, and the family of $\ell$-local distributions thus approaches $p_{1,\dots,|\Lambda|}$ exponentially fast. The classical restriction of typical injective MPSs is thus in this sense quasi-locally Gibbs.
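Proposition 1 and the corollary (43) can be illustrated on the classical restriction of a small MPS. The sketch below assumes illustrative tensors $A_0 = \mathrm{diag}(0.8, 0.6)$, $A_1 = \begin{pmatrix}0 & 0.8\\ 0.6 & 0\end{pmatrix}$, boundary vectors $|L\rangle = |R\rangle = (1,1)/\sqrt{2}$ and $|\Lambda| = 6$; for these choices every amplitude is strictly positive, so assumption (19) holds. It builds $h^\ell$ from the marginals via Eqs. (20) to (22), checks that the partition function equals one as stated in (27), and compares $S(p\|p^\ell)$ with the sum of CMIs in (24). The terms $H_{j,k}$ are implemented by a single inclusion-exclusion formula with $h = 0$ on empty windows, which reproduces all three cases of Eq. (21).

```python
import itertools
import math

# Assumed example (not from the paper): classical restriction of a small injective
# MPS with d = D = 2 on |Lambda| = 6 sites; all probabilities are strictly positive.
A = [[[0.8, 0.0], [0.0, 0.6]], [[0.0, 0.8], [0.6, 0.0]]]
Lvec = Rvec = [1 / math.sqrt(2), 1 / math.sqrt(2)]
n = 6

def amplitude(xs):
    v = Lvec[:]
    for x in xs:
        v = [A[x][i][0] * v[0] + A[x][i][1] * v[1] for i in range(2)]
    return Rvec[0] * v[0] + Rvec[1] * v[1]

configs = list(itertools.product(range(2), repeat=n))
raw = {x: amplitude(x) ** 2 for x in configs}
K = sum(raw.values())
p = {x: w / K for x, w in raw.items()}

_marginals = {}
def marginal(j, k):
    """Marginal of p on sites j..k (0-based, inclusive); an empty window gives {(): 1}."""
    if (j, k) not in _marginals:
        m = {}
        for x, q in p.items():
            key = x[j:k + 1]   # empty tuple when j > k
            m[key] = m.get(key, 0.0) + q
        _marginals[(j, k)] = m
    return _marginals[(j, k)]

def h(j, k, x):
    """h_{j..k}(x) = -log p_{j..k}(x_j, ..., x_k), Eq. (20); zero on empty windows."""
    return -math.log(marginal(j, k)[x[j:k + 1]])

def H_term(j, k, x):
    """H_{j,k} of Eq. (21), written as one inclusion-exclusion formula."""
    return h(j, k, x) - h(j, k - 1, x) - h(j + 1, k, x) + h(j + 1, k - 1, x)

def entropy(j, k):
    return -sum(q * math.log(q) for q in marginal(j, k).values())

def cmi(j, k):
    """I(j : k | j+1, ..., k-1) for k - j >= 2."""
    return entropy(j, k - 1) + entropy(j + 1, k) - entropy(j + 1, k - 1) - entropy(j, k)

results = []
for ell in range(2, n - 1):                       # 2 <= ell <= |Lambda| - 2
    hl = {x: sum(H_term(j, k, x) for j in range(n)
                 for k in range(j, min(j + ell, n - 1) + 1)) for x in configs}
    Z = sum(math.exp(-v) for v in hl.values())    # Eq. (27) predicts Z = 1
    rel = sum(p[x] * (math.log(p[x]) + hl[x] + math.log(Z)) for x in configs)
    cmi_sum = sum(cmi(j, k) for j in range(n) for k in range(j + ell + 1, n))
    results.append((ell, Z, rel, cmi_sum))
    print(ell, Z, rel, cmi_sum)
```

Since every CMI is non-negative, the relative entropy can only decrease as $\ell$ grows; the printed values show how quickly it does so for this particular example.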
Here, we give a brief overview of the general structure and ideas behind the proof of Proposition 5, i.e., that $f(N)$ decays exponentially if $\{A_x\}_x$ satisfies the purity condition. Although we do not always follow exactly the same track, the essence of the proof is due to [19, 20], which we have adapted to our particular setting and cast in a language that is hopefully more accessible to the quantum information theory community. The proof in the appendix is essentially self-contained, only referencing some standard results in the theory of martingales, which can be found in a number of classic textbooks on the subject.

As one may note from Eq. (41), the sequence $f(N)$ depends not only on the operators $A_x$, but also on the operators $\sigma$ and $F$. It turns out to be convenient to first focus on the function
$$w(N) = \sum_{x_1,\dots,x_N=0}^{d-1}\nu_1^\downarrow(A_{x_N}\cdots A_{x_1})\,\nu_2^\downarrow(A_{x_N}\cdots A_{x_1}). \tag{45}$$
Once we have established the purity condition as a necessary and sufficient condition for exponential decay of $w(N)$, we extend this result to $f(N)$ (Proposition 31 in Section D.5), which yields the statement of Proposition 5.

The proof of the exponential convergence of $w(N)$ is essentially done in two steps. First, it is shown that $w(N)$ converges to zero. Next, it is shown that $w(N)$ is submultiplicative, in the sense that $w(N+M) \le w(N)\,w(M)$, and thus $\log w(N)$ is subadditive. This observation is used, together with Fekete's subadditive lemma, to show that $w(N)$ goes to zero exponentially fast. These steps are incorporated into the proof of Proposition 30.

The essential approach for proving that $w(N)$ converges to zero is to interpret $w(N)$ as an average over a stochastic process. This process can be viewed as the random measurement outcomes $x_1,\dots,x_N$ of a repeated sequential measurement of the POVM $\{A_x^\dagger A_x\}_{x=0}^{d-1}$. (This process is described more precisely in Appendix C.)
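The function $w(N)$ of Eq. (45) can be evaluated by brute force for small $N$. The sketch below, for illustrative $D = 2$ tensors (assumed, not from the paper), computes $w(N)$ from singular values, checks the submultiplicativity $w(N+M) \le w(N)\,w(M)$ used together with Fekete's lemma and, anticipating Eq. (46) below, checks the representation $w(N) = D\,\mathbb{E}\big(\sqrt{\lambda_1^\downarrow(M_N)\lambda_2^\downarrow(M_N)}\big)$. For the outcome distribution we assume that the sequential POVM measurement starts from the maximally mixed state $\mathbb{1}/D$, so that the sequence $x_1,\dots,x_N$ occurs with probability $\mathrm{Tr}(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1})/D$; the precise process used in the proof is the one described in Appendix C.

```python
import itertools
import math

# Assumed example (not from the paper): D = 2 real tensors with sum_x A_x^T A_x = 1.
A = [[[0.8, 0.0], [0.0, 0.6]], [[0.0, 0.8], [0.6, 0.0]]]
D = 2

def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def T(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def eig_sym(G):
    """Eigenvalues (descending) of a real symmetric 2x2 matrix."""
    mean = (G[0][0] + G[1][1]) / 2
    r = math.hypot((G[0][0] - G[1][1]) / 2, G[0][1])
    return mean + r, max(mean - r, 0.0)

def word(xs):
    """A_{x_N} ... A_{x_1} for xs = (x_1, ..., x_N)."""
    G = [[1.0, 0.0], [0.0, 1.0]]
    for x in xs:
        G = mm(A[x], G)
    return G

def w(N):
    """Eq. (45): sum over all words of nu_1 * nu_2."""
    total = 0.0
    for xs in itertools.product(range(2), repeat=N):
        G = word(xs)
        l1, l2 = eig_sym(mm(T(G), G))   # squared singular values of G
        total += math.sqrt(l1) * math.sqrt(l2)
    return total

def w_via_M(N):
    """D * E[sqrt(lambda_1(M_N) lambda_2(M_N))], with M_N as in Eq. (46)."""
    expectation = 0.0
    for xs in itertools.product(range(2), repeat=N):
        GtG = mm(T(word(xs)), word(xs))   # A_{x_1}^T ... A_{x_N}^T A_{x_N} ... A_{x_1}
        weight = GtG[0][0] + GtG[1][1]
        prob = weight / D                 # assumed outcome law, starting from 1/D
        M = [[GtG[i][j] / weight for j in range(2)] for i in range(2)]
        l1, l2 = eig_sym(M)
        expectation += prob * math.sqrt(l1 * l2)
    return D * expectation

for N in range(1, 7):
    print(N, w(N), w_via_M(N))
```

For these tensors $w(N) = 0.96^N$ exactly (again because $\nu_1^\downarrow\nu_2^\downarrow = |\det|$ for $2\times 2$ matrices), so submultiplicativity holds with equality, and Fekete's lemma gives the rate $\lim_N \log w(N)/N = \log 0.96$.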
For the proof, it is useful to introduce the operator
$$M_N = \frac{A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}}{\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}\big)}, \qquad (46)$$
which thus depends on the sequence of random measurement outcomes $x_1,\ldots,x_N$. It turns out that one can express $w(N)$ in terms of $M_N$ via the relation
$$w(N) = D\,\mathbb{E}\Big(\sqrt{\lambda_1^{\downarrow}(M_N)\,\lambda_2^{\downarrow}(M_N)}\Big).$$
Here $\lambda_1^{\downarrow}(M_N)$ and $\lambda_2^{\downarrow}(M_N)$ denote the largest and the second largest eigenvalue of $M_N$, respectively, and $D$ is the dimension of the underlying Hilbert space. Moreover, $\mathbb{E}$ denotes the expectation value over all possible measurement outcomes. The operator $M_N$ is positive semi-definite and has unit trace, so it can be interpreted as a density operator. The main point is that if $M_N$ were a rank-one operator, and thus corresponded to a pure state, then $\lambda_2^{\downarrow}(M_N)$ would be zero. Intuitively, it thus seems reasonable that $w(N)$ converges to zero if it is 'sufficiently likely' that $M_N$ converges to a rank-one operator.

The starting point for demonstrating that $M_N$ converges to a rank-one operator is to show (Lemma 23) that the sequence $(M_N)_{N\in\mathbb{N}}$ is a martingale relative to the sequence of measurement outcomes $(x_N)_{N\in\mathbb{N}}$. This enables us to show (Lemma 24) that $(M_N)_{N\in\mathbb{N}}$ almost surely converges to a positive operator $M_\infty$. (All these notions are reviewed in Section B.) Once this is established, the bulk of the proof is focused on showing that $M_\infty$ is (almost surely) a rank-one operator if and only if $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition.

Arguably the least transparent part of the proof is showing that the purity condition is sufficient for $M_\infty$ to be a rank-one operator. The first part of this argument (Lemma 25) shows that $M_{N+p}$ and $M_N$ in some sense 'approach' each other, even when conditioned on $x_1,\ldots,x_N$. The second part (Lemma 27), loosely speaking, shows that $M_{N+p}$ gives rise to a term of the form
$$\sqrt{M_N}\,U_N^\dagger A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_N\sqrt{M_N}$$
for a unitary operator $U_N$, while $M_N$ gives rise to a term that is proportional to $M_N$. As these operators approach each other when $N$ approaches infinity, one can use this to show that
$$\sqrt{M_\infty}\,U_\infty^\dagger A_{x'_1}^\dagger\cdots A_{x'_p}^\dagger A_{x'_p}\cdots A_{x'_1}U_\infty\sqrt{M_\infty} \;\propto\; \sqrt{M_\infty}\,U_\infty^\dagger U_\infty\sqrt{M_\infty}. \qquad (47)$$
In a reformulation of the purity condition (Lemma 26), the projector $P$ is replaced by a general operator $O$, again with the conclusion that $O$ must be a rank-one operator. Taking $O = U_\infty\sqrt{M_\infty}$, it follows that $M_\infty$ is a rank-one operator.

Showing conversely (Lemma 28) that the purity condition is necessary is somewhat less involved. Assume that a projector $P$ of rank larger than one satisfies the proportionality in (33). Then the only way in which $M_\infty$ can be a rank-one operator is if $PM_\infty P = 0$. However, this contradicts $\sum_{x=0}^{d-1}A_x^\dagger A_x = \mathbb{1}$.

Remark. Using the same tools as above, Benoist et al. show in [19] that the stochastic process defined in Appendix C equilibrates exponentially fast. It is worth noting that the average purity can converge to zero much faster than the stochastic process equilibrates. For instance, suppose that $\sigma$ is a rank-one operator. Then $FA_{x_N}\cdots A_{x_1}\sqrt{\sigma}$ has rank at most one, so $\nu_2^{\downarrow}$ vanishes and $f(N)$ is identically zero for all $N$. This holds irrespective of whether $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition or not.

In this section, we discuss the purity condition and the decay of the CMI in the context of quantum information theory. We specifically study the behaviour of the CMI for symmetry-protected phases and find that it remains constant.
Moreover, by constructing two simple examples, we show that the decay rate of the CMI is unrelated to the decay of the transfer operator of the corresponding MPS. The purity condition is discussed from the point of view of quantum error correction. Finally, we work out some examples.

We briefly discuss which systems do not satisfy the purity condition and comment on the relation to symmetry-protected phases in one dimension. Consider an MPS $|\Psi\rangle$ of the form of Eq. (1) with matrices $A_{x_i}$ that have a tensor product decomposition into two subsystems,
$$A_{x_i} = U_{x_i}\otimes T_{x_i}, \qquad (48)$$
where $U_{x_i}$ is a unitary matrix and $T_{x_i}$ is an arbitrary matrix. Then, in the infinite-chain case (see Eq. (18)), the reduced post-measurement state $\Psi_C(x_B)$ is isospectral to
$$\Psi_C(x_B) \simeq \frac{1}{p_\Psi(x_B)K}\big(U_{x_N}\cdots U_{x_1}\otimes T_{x_N}\cdots T_{x_1}\big)\,\rho\,\big(U_{x_1}^\dagger\cdots U_{x_N}^\dagger\otimes T_{x_1}^\dagger\cdots T_{x_N}^\dagger\big).$$
For simplicity, let us further consider the case where the unique fixed point of the transfer operator is proportional to the identity, i.e., $\rho = \mathbb{1}/(D_1D_2)$. Here $D_1$ and $D_2$ are the dimensions of the two subsystems. Since the unitaries then cancel, we obtain
$$\Psi_C(x_B) \simeq \frac{1}{D_1D_2\,p_\Psi(x_B)K}\big(\mathbb{1}\otimes T_{x_N}\cdots T_{x_1}T_{x_1}^\dagger\cdots T_{x_N}^\dagger\big),$$
and hence
$$S[\Psi_C(x_B)] = \log D_1 + S\Big(\frac{1}{p_\Psi(x_B)D_2K}\,T_{x_N}\cdots T_{x_1}T_{x_1}^\dagger\cdots T_{x_N}^\dagger\Big).$$
Consequently, the average entanglement entropy of $\Psi_C(x_B)$ always has a constant contribution $\log D_1$, independent of the length of the middle region $B$. The same holds for the post-measurement CMI $I_{\Psi(x_B)}(A:C|B)$ (see Eq. (13)). More generally, the CMI is non-vanishing for MPS in a basis where the matrices $A_{x_i}$ can be isometrically mapped to a form as in Eq. (48) [24].

It was shown in Ref. [26] that for a symmetry-protected phase in the MPS framework, there always exists a local basis in which the matrices have the form of Eq. (48). In addition, the unitary matrices in this basis form a representation of the symmetry group. The AKLT model (see Sec.
6.4.1) is such an example.

In this section we explore the purity condition (see Def. 3) in more detail. The purity condition states that the only projectors $P$ that satisfy
$$PA_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}P \propto P \qquad (49)$$
for all $N\in\mathbb{N}$ and all $(x_1,\ldots,x_N)\in\{0,\ldots,d-1\}^{\times N}$ are those that have rank one. Here we investigate the relation between this condition (or rather its violation) and the Knill-Laflamme error correction condition [25].

Suppose that there exists a projector $P$ onto a subspace $\mathcal{C}$, with $\dim\mathcal{C}\geq 2$, such that
$$PA_x^\dagger A_xP = \lambda_xP, \qquad \forall x. \qquad (50)$$
This looks suspiciously similar to the Knill-Laflamme error correction condition, which reads
$$PA_x^\dagger A_yP = c_{xy}P, \qquad \forall x,y. \qquad (51)$$
How can one understand the apparent similarity between Eqs. (50) and (51)? To this end, let us first recall the error correction scenario. Let $A_x$ be operators on a Hilbert space $\mathcal{H}$ with $\sum_x A_x^\dagger A_x = \mathbb{1}$, and define the corresponding noise channel
$$\mathcal{E}(\chi) := \sum_x A_x\chi A_x^\dagger. \qquad (52)$$
For any state $\chi$ with support on the subspace $\mathcal{C}\subseteq\mathcal{H}$, $\mathcal{E}(\chi)$ can be restored to $\chi$ if and only if (51) holds. More precisely, (51) holds if and only if there exists a recovery operation $\mathcal{R}$ (that does not depend on $\chi$) such that $\mathcal{R}\circ\mathcal{E}(\chi) = \chi$ for all density operators $\chi$ with support on $\mathcal{C}$.

It turns out that Eq. (50) is also a necessary and sufficient condition for error correction, but for a different type of error model. In the standard error-correction scenario, the channel $\mathcal{E}$ is the effect of a unitary evolution that acts on $\mathcal{H}$ and on an environment $\mathcal{H}_E$, where the latter is inaccessible to us. In the alternative scenario, we assume that there exists an ancilla system $A$ which we do have access to, and which we can use to help us restore the initial state on $\mathcal{H}$.
More precisely, we assume an error model of the form
$$\tilde{\mathcal{E}}(\chi) = \sum_x |x\rangle_A\langle x|\otimes A_x\chi A_x^\dagger, \qquad (53)$$
where $\{|x\rangle_A\}_x$ is an orthonormal basis of the Hilbert space $\mathcal{H}_A$ associated with the ancillary system. We can interpret this as having access, in the register $A$, to additional classical information about the error. We use this additional information to restore the state on $\mathcal{C}$. Note that if we have no access to $A$, then we are back to the standard scenario, where the channel on $\mathcal{H}$ is $\mathcal{E} = \mathrm{Tr}_A\tilde{\mathcal{E}}$. It turns out that (50) is a necessary and sufficient condition for the existence of a recovery channel $\tilde{\mathcal{R}}:\mathcal{L}(\mathcal{H}\otimes\mathcal{H}_A)\to\mathcal{L}(\mathcal{H})$ such that $\tilde{\mathcal{R}}\circ\tilde{\mathcal{E}}(\chi) = \chi$ for all density operators $\chi$ on $\mathcal{C}$. The proof of this statement is nearly identical to that of the original Knill-Laflamme theorem and is omitted here.

Suppose one finds a non-trivial projector $P$ (i.e., with $\mathrm{Tr}(P) = \dim\mathcal{C}\geq 2$) such that Eq. (50) holds. Then one can explicitly construct a collection of unitary operators $U_x$ such that
$$\sum_x U_xA_x\chi A_x^\dagger U_x^\dagger = \chi \qquad (54)$$
for all density operators $\chi$ on $\mathcal{C}$. Indeed, by (50), $A_xP/\sqrt{\lambda_x}$ is an isometry on $\mathcal{C}$ whenever $\lambda_x>0$. We can therefore choose $U_x$ to map its range back onto $\mathcal{C}$, and $\sum_x\lambda_x = 1$ then gives (54). In other words, the operators $U_x$ perform the error correction on the subspace $\mathcal{C}$.

More precisely, take a set $\{A_x\}$ with $\sum_xA_x^\dagger A_x = \mathbb{1}$ for which some non-trivial projector $P$ satisfies Eq. (50). From it we can construct a new "error-corrected" set $\{A'_x\}$ with $A'_x := U_xA_x$, which again satisfies $\sum_xA_x'^\dagger A'_x = \mathbb{1}$. For this new set, the average entropy (Eq. (17)) does not decay to zero, no matter how long a chain $A'_{x_N}\cdots A'_{x_1}$ we construct.

Nothing prevents us from repeating the above reasoning for products $\{A_{x_2}A_{x_1}\}_{x_1,x_2}$. That is, we can try to find the largest subspace $\mathcal{C}_2$, with corresponding projector $P_2$, such that
$$P_2A_{x_1}^\dagger A_{x_2}^\dagger A_{x_2}A_{x_1}P_2 = \lambda_{x_1,x_2}P_2. \qquad (55)$$
We can similarly ask for the largest subspace $\mathcal{C}_3$ that is correctable for $\{A_{x_3}A_{x_2}A_{x_1}\}_{x_1,x_2,x_3}$.
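The difference between (50) and (51), and the construction behind Eq. (54), can be sketched numerically. One admissible choice of $U_x$, which is ours and not necessarily the authors', is a unitary that maps the range of $A_x$ restricted to $\mathcal{C}$ back onto $\mathcal{C}$.

The sketch first contrasts (50) with the Knill-Laflamme condition (51). For this it uses rescaled random unitaries $A_x=\sqrt{p_x}V_x$, an assumed toy model for which (50) holds trivially on any subspace while (51) generically fails. It then builds the $U_x$ for the Jordan-block operators of Eq. (66) with $D=3$ and checks (54) on a random state. The final line checks that, for these operators, $P$ no longer satisfies the two-site condition (55), in line with the shrinking correctable subspaces $\mathcal{C}_n$.

```python
import numpy as np

def is_prop_to(X, P, tol=1e-10):
    """True if X = c P for some scalar c (P a nonzero projector)."""
    c = np.trace(X) / np.trace(P)
    return np.linalg.norm(X - c * P) < tol

def condition_50(kraus, P):
    return all(is_prop_to(P @ A.conj().T @ A @ P, P) for A in kraus)

def knill_laflamme_51(kraus, P):
    return all(is_prop_to(P @ A.conj().T @ B @ P, P) for A in kraus for B in kraus)

def recovery_unitaries(kraus, Q, tol=1e-12):
    """Unitaries U_x with sum_x U_x A_x chi A_x^dag U_x^dag = chi for chi on C = range(Q).

    Q is a D x k matrix with orthonormal columns; Eq. (50) is assumed to hold on C.
    """
    D, k = Q.shape
    Q_perp = np.linalg.svd(Q)[0][:, k:]            # orthonormal complement of C
    unitaries = []
    for A in kraus:
        AQ = A @ Q
        lam = np.trace(AQ.conj().T @ AQ).real / k  # lambda_x of Eq. (50)
        if lam < tol:                              # A_x annihilates C: any U_x works
            unitaries.append(np.eye(D, dtype=complex))
            continue
        W = AQ / np.sqrt(lam)                      # isometry from C into H
        W_perp = np.linalg.svd(W)[0][:, k:]        # complement of range(W)
        unitaries.append(Q @ W.conj().T + Q_perp @ W_perp.conj().T)
    return unitaries

rng = np.random.default_rng(2)

def random_unitary(D):
    # Haar-random unitary from the QR decomposition of a complex Gaussian matrix
    Z = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
    Qz, R = np.linalg.qr(Z)
    return Qz * (np.diag(R) / np.abs(np.diag(R)))

# Toy model (assumption): A_x = sqrt(p_x) V_x with random unitaries V_x.
scaled = [np.sqrt(p) * random_unitary(3) for p in (0.5, 0.3, 0.2)]
P2 = np.diag([1.0, 1.0, 0.0])
print(condition_50(scaled, P2), knill_laflamme_51(scaled, P2))

# Eq. (66) with D = 3: dim H = D + 1 = 4 and C = span{|0>, |1>, |2>}.
D = 3
A0 = np.zeros((D + 1, D + 1), dtype=complex)
for k in range(D):
    A0[k + 1, k] = 1.0                              # |k+1><k|
A1 = np.zeros((D + 1, D + 1), dtype=complex)
A1[D, D] = 1.0                                      # |D><D|
Q = np.eye(D + 1)[:, :D]
P = Q @ Q.T
U = recovery_unitaries([A0, A1], Q)

G = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
chi = Q @ (G @ G.conj().T) @ Q.T
chi /= np.trace(chi)
recovered = sum(Ux @ A @ chi @ A.conj().T @ Ux.conj().T for Ux, A in zip(U, [A0, A1]))
print(np.allclose(recovered, chi))                                  # Eq. (54) on C
print(is_prop_to(P @ A0.conj().T @ A0.conj().T @ A0 @ A0 @ P, P))   # Eq. (55) fails on P
```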
One can realize that we always have $\mathcal{C}_n\subseteq\mathcal{C}_{n-1}$.

The purity condition is violated if and only if there exists a non-trivial projector $P$ such that (49) holds for all $N$. By the above reasoning, the purity condition thus fails if and only if there exists a fixed non-trivial subspace $\mathcal{C}$ that is correctable for every $N$. Loosely speaking, we can alternatively phrase the purity condition as the non-existence of a non-trivial correctable subspace that persists indefinitely under iterated applications of the error channel. Intuitively, this observation suggests that the violation of the purity condition is a rather "brittle" and non-generic phenomenon.

As we have seen in Section 5, the CMI of an MPS decays exponentially to zero when the matrices of the MPS satisfy the purity condition (see Def. 3). Moreover, the transfer operator $\mathcal{E}$ of an injective MPS converges to its unique fixed point exponentially fast, at a rate lower bounded by the spectral gap of the channel. This decay rate sets what is often referred to as the correlation length. One could expect a relation between the correlation length and the decay rate of the classical CMI of Theorem 4. In this section, we consider two examples which give clear evidence that no such relation exists.

Consider an MPS of the form of Eq. (1) such that all matrices $A_{x_i}$ have rank one. After a measurement of system $B$, the reduced state $\Psi_C(x_B)$ is pure for any measurement outcome (see Eq. (16)). Therefore, the average von Neumann entropy of $\Psi_C(x_B)$ vanishes already when region $B$ is a single site. By Eq. (13), so does the post-measurement CMI. On the other hand, the correlation length need not be zero.
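Both examples of this subsection can be checked numerically: the rank-one Markov construction and the rescaled clock-and-shift unitaries defined in the following paragraphs. In the sketch below, the transition matrix (a lazy two-state chain with flip probability 0.05) and $D=3$ are arbitrary choices of ours. The function `average_entropy` starts the measurement sequence from the maximally mixed state and ignores boundary operators; both are simplifying assumptions. For the Markov example, the average post-measurement entropy vanishes, while the second-largest eigenvalue of the transfer matrix is 0.9, so correlations decay slowly. For the clock-and-shift example, the entropy stays at $\log D$, while the transfer operator is the replacement map.

```python
import itertools
import numpy as np

def entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def markov_kraus(T):
    """A_(i,j) = sqrt(P(j|i)) |j><i| for a row-stochastic matrix T[i, j] = P(j|i)."""
    n = T.shape[0]
    ops = []
    for i in range(n):
        for j in range(n):
            A = np.zeros((n, n), dtype=complex)
            A[j, i] = np.sqrt(T[i, j])
            ops.append(A)
    return ops

def clock_shift_kraus(D):
    """A_(xi,xj) = (1/D) sum_k exp(2 pi i k xj / D) |k><(k + xi) mod D|."""
    ops = []
    for xi in range(D):
        for xj in range(D):
            A = np.zeros((D, D), dtype=complex)
            for k in range(D):
                A[k, (k + xi) % D] = np.exp(2j * np.pi * k * xj / D) / D
            ops.append(A)
    return ops

def transfer_matrix(kraus):
    # Superoperator of E(chi) = sum_x A_x chi A_x^dag acting on column-stacked vec(chi)
    return sum(np.kron(A.conj(), A) for A in kraus)

def average_entropy(kraus, rho, N):
    """Average entropy of the normalized states A_xN...A_x1 rho (A_xN...A_x1)^dag."""
    avg = 0.0
    for xs in itertools.product(range(len(kraus)), repeat=N):
        out = rho
        for x in xs:
            out = kraus[x] @ out @ kraus[x].conj().T
        p = np.trace(out).real
        if p > 1e-14:
            avg += p * entropy(out / p)
    return avg

# Rank-one example: lazy two-state Markov chain (slow mixing)
T = np.array([[0.95, 0.05], [0.05, 0.95]])
markov = markov_kraus(T)
print(average_entropy(markov, np.eye(2) / 2, 4))                       # pure outcomes
print(np.sort(np.abs(np.linalg.eigvals(transfer_matrix(markov))))[::-1])

# Unitary example: rescaled clock-and-shift operators with D = 3
weyl = clock_shift_kraus(3)
print(average_entropy(weyl, np.eye(3) / 3, 2), np.log(3))              # no decay
```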
For example, take a collection of transition probabilities $\{P(x_j|x_i)\}_{x_i,x_j=0}^{d-1}$ and an orthonormal basis $\{|x_i\rangle\}_{x_i=0}^{d-1}$, and consider the MPS with $A_{x_{ij}} = \sqrt{P(x_j|x_i)}\,|x_j\rangle\langle x_i|$. Repeated application of its transfer operator effectively implements a classical Markov process with transition probability $P(x_j|x_i)$, such that
$$\mathcal{E}^{\circ N}(\chi) = \sum_{x_0,\ldots,x_N=0}^{d-1}P(x_1|x_0)\cdots P(x_N|x_{N-1})\,\langle x_0|\chi|x_0\rangle\,|x_N\rangle\langle x_N|.$$
Nothing prevents this Markov chain from converging slowly to its equilibrium distribution.

Conversely, consider an MPS with matrices $A_{x_i}$ proportional to unitary operators. As discussed in Section 6.1, this implies that the von Neumann entropy does not decay, and thus the CMI remains constant. However, an injective MPS always has a finite correlation length. As a concrete example, consider
$$A_{x_{ij}} := \frac{1}{D}\sum_{k=0}^{D-1}e^{2\pi i\,kx_j/D}\,|k\rangle\langle(k+x_i)\bmod D|,$$
where $\{|k\rangle\}_{k=0}^{D-1}$ is an orthonormal basis. One can easily check that the $A_{x_{ij}}$ are proportional to unitary operators, and hence the average von Neumann entropy $\langle S[\Psi_C(x_B)]\rangle$ does not decay. The transfer operator of this MPS is the replacement map that replaces any input state $\chi$ with the maximally mixed state, i.e.,
$$\mathcal{E}(\chi) = \sum_{x_i,x_j=0}^{D-1}A_{x_{ij}}\chi A_{x_{ij}}^\dagger = \mathrm{Tr}(\chi)\,\frac{\mathbb{1}}{D},$$
so the correlation length vanishes.

The decay of classical and quantum correlations is also studied in [22, 23]. There, the authors introduce an entanglement measure called Localizable Entanglement (LE). The LE is defined as the maximal amount of entanglement that can be created on average between two spins at positions $i$ and $j$ of a chain by performing local measurements on the other spins. The LE is closely related to the scenario that we consider in this paper (see Eq. (15)).
Indeed, the only difference is that the LE optimises over the measurement basis, while we pick a concrete basis. When the measured spins are spin-1/2, it is shown in [22, 23] that the connected correlation function provides a lower bound on the LE.

In this section we consider examples that illustrate some features of the process under study. As a prototypical example, we look at the AKLT model and obtain the exact convergence rate in a specific basis. Then, we consider an MPS with a strictly contractive transfer operator and a pure fixed point. This second example shows that primitivity of the transfer operator is not a necessary condition for the exponential convergence of the post-measurement CMI. In the last example, the purity condition is violated up to a fixed length $|B| = N$ but satisfied thereafter.

The first state we consider is the 1D AKLT state. The AKLT state on a chain has a well-known MPS description with bond dimension $D = 2$ and physical dimension $d = 3$. The matrices in the MPS picture are given by
$$A_0 = -\frac{1}{\sqrt{3}}\begin{pmatrix}1&0\\0&-1\end{pmatrix},\qquad A_+ = \sqrt{\frac{2}{3}}\begin{pmatrix}0&1\\0&0\end{pmatrix},\qquad A_- = -\sqrt{\frac{2}{3}}\begin{pmatrix}0&0\\1&0\end{pmatrix}. \qquad (56)$$
We take $\{x_i\} = \{0,+,-\}$ as the basis for our physical space. It can be seen by inspection that the transfer operator $\mathcal{E}(\chi) = \sum_{x_i}A_{x_i}\chi A_{x_i}^\dagger$ has the unique stationary state $\rho = \mathbb{1}/2$. In the infinite-chain setting, the probability of a measurement outcome $x_B = x_1,\ldots,x_N$ is given by
$$p_\Psi(x_B) = \frac{1}{2}\mathrm{Tr}\big[A_{x_N}\cdots A_{x_1}A_{x_1}^\dagger\cdots A_{x_N}^\dagger\big]. \qquad (57)$$
We want to calculate the average entropy on $C$ after measurement of system $B$, i.e., we need to estimate
$$\langle S[\Psi_C(x_B)]\rangle = \sum_{x_B\in\{0,+,-\}^N}p_\Psi(x_B)\,S\!\left(\frac{A_{x_N}\cdots A_{x_1}A_{x_1}^\dagger\cdots A_{x_N}^\dagger}{\mathrm{Tr}\big(A_{x_N}\cdots A_{x_1}A_{x_1}^\dagger\cdots A_{x_N}^\dagger\big)}\right). \qquad (58)$$
We distinguish two cases: $p_\Psi(x_B) = 0$ and $p_\Psi(x_B)\neq 0$.
Since $A_+A_+ = A_-A_- = 0$, and since $A_0$ is diagonal, $p_\Psi(x_B) = 0$ whenever the string $x_B$ contains two (or more) successive $+$ (or $-$), ignoring any 0's in between. In other words, the only strings that give $p_\Psi(x_B)\neq 0$ are those in which $+$ and $-$ alternate (e.g., $+-+-$), possibly interspersed with 0's. However, the only string with non-zero entropy is the one with all 0's, because any product containing at least one $A_\pm$ has rank one. This string occurs with probability $1/3^N$. Thus, we get
$$\langle S[\Psi_C(x_B)]\rangle = \frac{1}{3^N}S[\Psi_C(x_{\mathbf{0}})], \qquad (59)$$
where $x_{\mathbf{0}} := 0,\ldots,0$ and $S[\Psi_C(x_{\mathbf{0}})] = \log 2$, since $A_0^N$ is proportional to a unitary. Hence, in the standard basis the AKLT state has a post-measurement CMI that decays exponentially in the size of $B$, for large $A$ and $C$. Coincidentally, in this basis the correlation length matches the decay rate of the classical CMI: the subleading eigenvalues of the transfer operator are $-1/3$.

Let us now consider a change of basis. The 1D AKLT state is also given by the MPS representation with matrices
$$\tilde{A}_1 := \sqrt{\tfrac{1}{3}}\,\hat{\sigma}_x = \sqrt{\tfrac{1}{3}}\begin{pmatrix}0&1\\1&0\end{pmatrix},\qquad \tilde{A}_2 := \sqrt{\tfrac{1}{3}}\,\hat{\sigma}_y = \sqrt{\tfrac{1}{3}}\begin{pmatrix}0&-i\\i&0\end{pmatrix},\qquad \tilde{A}_3 := \sqrt{\tfrac{1}{3}}\,\hat{\sigma}_z = \sqrt{\tfrac{1}{3}}\begin{pmatrix}1&0\\0&-1\end{pmatrix},$$
where $\hat{\sigma}_i$ are the Pauli matrices. The Pauli matrices are unitary, and thus the average von Neumann entropy of $\Psi_C(x_B)$ (Eq. (58)) is constant, namely $\log 2$. In other words, when the AKLT chain is measured in the basis corresponding to the Pauli matrices, its post-measurement CMI does not decay. This is consistent with the discussion in Section 6.1. The AKLT chain is in the Haldane phase, a symmetry-protected phase protected by the $\mathbb{Z}_2\times\mathbb{Z}_2$ symmetry generated by $\pi$ rotations around three orthogonal axes.

As mentioned in Section 2, translationally invariant MPS are injective if and only if the transfer operator $\mathcal{E}$, defined in (2), is primitive [29]. Here, primitivity means that the channel possesses a unique full-rank fixed point.
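Both AKLT calculations above, and the primitivity notion just recalled, can be reproduced by brute-force enumeration. The sketch below evaluates Eq. (58) with $\rho = \mathbb{1}/2$ for both sets of matrices, Eq. (56) and the Pauli representation. It also computes the spectrum of the transfer operator; the variable names are ours. It should give $\log 2/3^N$ in the standard basis and the constant $\log 2$ in the Pauli basis. The transfer operator should have eigenvalues $\{1,-1/3,-1/3,-1/3\}$ and the full-rank fixed point $\mathbb{1}/2$.

```python
import itertools
import numpy as np

# AKLT matrices in the {0, +, -} basis, Eq. (56)
A_std = [-np.array([[1, 0], [0, -1]], dtype=complex) / np.sqrt(3),
         np.sqrt(2 / 3) * np.array([[0, 1], [0, 0]], dtype=complex),
         -np.sqrt(2 / 3) * np.array([[0, 0], [1, 0]], dtype=complex)]
# Equivalent representation by rescaled Pauli matrices
A_pauli = [np.array([[0, 1], [1, 0]], dtype=complex) / np.sqrt(3),
           np.array([[0, -1j], [1j, 0]]) / np.sqrt(3),
           np.array([[1, 0], [0, -1]], dtype=complex) / np.sqrt(3)]

def entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def average_entropy(kraus, N):
    """Eq. (58) with outcome probabilities from Eq. (57)."""
    avg = 0.0
    for xs in itertools.product(range(len(kraus)), repeat=N):
        X = np.eye(2, dtype=complex)
        for x in xs:
            X = kraus[x] @ X
        out = X @ X.conj().T / 2        # rho = 1/2, so Tr(out) = p_Psi(x_B)
        p = np.trace(out).real
        if p > 1e-14:
            avg += p * entropy(out / p)
    return avg

def transfer_matrix(kraus):
    # E(chi) = sum_x A_x chi A_x^dag acting on column-stacked vec(chi)
    return sum(np.kron(A.conj(), A) for A in kraus)

for N in range(1, 6):
    print(N, average_entropy(A_std, N), np.log(2) / 3 ** N, average_entropy(A_pauli, N))
print(np.sort(np.linalg.eigvals(transfer_matrix(A_std)).real))
```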
Primitivity in turn guarantees the existence of a gapped parent Hamiltonian and exponential decay of correlations [13, 32]. Primitivity thus plays an essential role in the decay of correlations, and one may ask how it relates to the purity condition for the exponential decay of the CMI, in the sense of Theorem 4. In this section, we present an example which shows that primitivity is not a necessary condition for the exponential decay of the CMI. In other words, we consider an MPS with a non-primitive transfer operator which nevertheless yields an exponentially decaying CMI due to purity.

We say that a channel $\Phi$ is strictly contractive if there exists a number $0\leq\alpha<1$ such that $\|\Phi(\chi_1)-\Phi(\chi_2)\|\leq\alpha\|\chi_1-\chi_2\|$ for all density operators $\chi_1$ and $\chi_2$.

Consider an MPS of the form of Eq. (1) which has a strictly contractive transfer operator $\mathcal{E}$ with a pure fixed point, denoted by $|\phi\rangle$. Note that if a channel is strictly contractive, then its fixed point is unique. Note further that, since the fixed point is pure, the transfer operator is, by definition, not primitive. Let us also assume that $F = \sqrt{\mathcal{E}^{*|C|}(R)}$ is full rank. Under these assumptions, our aim is to find an exponentially decaying bound on the average von Neumann entropy of the reduced post-measurement state (see Eq. (17)). We start by using the concavity of the entropy and obtain
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \leq S\Big(\frac{1}{K}\sum_{x_B}FA_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\Big) = S\Big(\frac{F\mathcal{E}^N(\sigma)F^\dagger}{\mathrm{Tr}[F\mathcal{E}^N(\sigma)F^\dagger]}\Big),$$
where $K = \mathrm{Tr}\big[F\mathcal{E}^N(\sigma)F^\dagger\big]$.
The purity of the fixed point of $\mathcal{E}$ allows us to rewrite the above inequality as
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \leq \left|S\Big(\frac{F\mathcal{E}^N(\sigma)F^\dagger}{\mathrm{Tr}[F\mathcal{E}^N(\sigma)F^\dagger]}\Big) - S\Big(\frac{F\mathcal{E}^N(|\phi\rangle\langle\phi|)F^\dagger}{\mathrm{Tr}[F\mathcal{E}^N(|\phi\rangle\langle\phi|)F^\dagger]}\Big)\right|,$$
since $\mathcal{E}^N(|\phi\rangle\langle\phi|) = |\phi\rangle\langle\phi|$, so that the second state is pure and has zero entropy. We can further bound the average von Neumann entropy using the Fannes-Audenaert inequality [33, 34]. This yields
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \leq t\log(D-1) + H_B(t), \qquad (60)$$
where $H_B$ is the binary entropy, i.e., $H_B(t) = -t\log t-(1-t)\log(1-t)$, with $H_B(0) := 0$ and $H_B(1) := 0$, and where $t$ is defined as
$$t := \frac{1}{2}\left\|\frac{F\mathcal{E}^N(\sigma)F^\dagger}{\mathrm{Tr}[F\mathcal{E}^N(\sigma)F^\dagger]} - \frac{F\mathcal{E}^N(|\phi\rangle\langle\phi|)F^\dagger}{\mathrm{Tr}[F\mathcal{E}^N(|\phi\rangle\langle\phi|)F^\dagger]}\right\|, \qquad (61)$$
with $\|\cdot\|$ denoting the trace norm.

For any full-rank $F$ and any pair of density operators $\chi_1,\chi_2$ on a finite-dimensional Hilbert space, one can show that
$$\left\|\frac{F\chi_1F^\dagger}{\mathrm{Tr}(F\chi_1F^\dagger)} - \frac{F\chi_2F^\dagger}{\mathrm{Tr}(F\chi_2F^\dagger)}\right\| \leq 2\left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\|\chi_1-\chi_2\|. \qquad (62)$$
Here $\nu_1(F)$ and $\nu_D(F)$ denote the largest and the smallest singular value of $F$, and $\nu_D(F)>0$ since $F$ is full rank on a finite-dimensional space. The bound follows from $\|F\Delta F^\dagger\|\leq\nu_1(F)^2\|\Delta\|$ together with $\mathrm{Tr}(F\chi F^\dagger)\geq\nu_D(F)^2$.

By combining (61) and (62) with an iterative use of the strict contractivity of $\mathcal{E}$, we find the upper bound
$$t \leq \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\big\|\mathcal{E}^N(\sigma)-\mathcal{E}^N(|\phi\rangle\langle\phi|)\big\| \leq \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\alpha^N\big\|\sigma-|\phi\rangle\langle\phi|\big\|. \qquad (63)$$
If it were possible to choose $\alpha = 0$, then $\|\mathcal{E}(\chi_1)-\mathcal{E}(\chi_2)\| = 0$, and thus $\mathcal{E}(\chi_1) = \mathcal{E}(\chi_2)$. In that case the convergence is not only exponential, but immediate. Hence, without loss of generality, we may in the following assume that $0<\alpha<1$.

It remains to bound $H_B(t)$. For that, we define the function $g(t) := t-t\log t$, with $g(0) := 0$. One can show that $g$ is monotonically increasing on $t\in[0,1]$ and satisfies $H_B(t)\leq g(t)$ for $0\leq t\leq 1$ (Lemma 8). We combine these two properties of $g$ with inequality (63), for all $N$ large enough that the right-hand side of (63) is at most 1. This leads to
$$H_B(t) \leq \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\alpha^N\|\sigma-|\phi\rangle\langle\phi|\| - \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\alpha^N\|\sigma-|\phi\rangle\langle\phi|\|\log\left[\left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\|\sigma-|\phi\rangle\langle\phi|\|\right] - \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\|\sigma-|\phi\rangle\langle\phi|\|\,N\alpha^N\log\alpha. \qquad (64)$$
By combining (60) with (64), and again using inequality (63), we find that
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \leq c_1\alpha^N + c_2N\alpha^N, \qquad (65)$$
where $c_1$ and $c_2$ are defined as
$$c_1 := \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\|\sigma-|\phi\rangle\langle\phi|\|\left[\log(D-1)+1-\log\left(\left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\|\sigma-|\phi\rangle\langle\phi|\|\right)\right],$$
$$c_2 := \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\|\sigma-|\phi\rangle\langle\phi|\|\,(-\log\alpha).$$
The right-hand side of Eq. (65) decays exponentially to zero as $N$ grows to infinity, because $0<\alpha<1$. Hence, we have found an exponentially decaying bound on the post-measurement CMI for an MPS with a transfer operator that is strictly contractive and has a pure fixed point. This shows that primitivity of the transfer operator is not a necessary condition for the CMI to converge exponentially.
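The following is a minimal numerical sketch of this argument. It assumes a concrete channel that the text does not specify: the qubit amplitude-damping channel with Kraus operators $A_0 = |0\rangle\langle 0| + \sqrt{1-\gamma}\,|1\rangle\langle 1|$ and $A_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$. Its fixed point $|0\rangle\langle 0|$ is pure, so the channel is not primitive. It is strictly contractive in trace norm with $\alpha = \sqrt{1-\gamma}$ for $0<\gamma\leq 1$, since it shrinks Bloch-vector differences by at most that factor. We take $F = \mathbb{1}$, which is full rank with $\nu_1 = \nu_D = 1$, and an arbitrary $\sigma$ of our choosing. The checks mirror the chain of inequalities: the concavity step, the contraction bound behind (63), and $H_B(t)\leq g(t)$ from Lemma 8.

```python
import itertools
import numpy as np

def amplitude_damping(gamma):
    # Kraus operators of the qubit amplitude-damping channel (fixed point |0><0|)
    return [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
            np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

def channel_power(kraus, rho, N):
    for _ in range(N):
        rho = sum(A @ rho @ A.conj().T for A in kraus)
    return rho

def trace_norm(X):
    return float(np.sum(np.abs(np.linalg.eigvalsh(X))))  # X is Hermitian here

def entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def average_entropy(kraus, sigma, N):
    """Average entropy of the normalized post-measurement states, with F = 1."""
    avg = 0.0
    for xs in itertools.product(range(len(kraus)), repeat=N):
        out = sigma
        for x in xs:
            out = kraus[x] @ out @ kraus[x].conj().T
        p = np.trace(out).real
        if p > 1e-14:
            avg += p * entropy(out / p)
    return avg

def binary_entropy(t):
    return 0.0 if t in (0.0, 1.0) else float(-t * np.log(t) - (1 - t) * np.log(1 - t))

def g(t):
    return 0.0 if t == 0.0 else float(t - t * np.log(t))

gamma = 0.3
kraus = amplitude_damping(gamma)
alpha = np.sqrt(1 - gamma)
phi = np.diag([1.0, 0.0]).astype(complex)
sigma = np.array([[0.4, 0.2], [0.2, 0.6]], dtype=complex)
for N in range(1, 9):
    t = 0.5 * trace_norm(channel_power(kraus, sigma, N) - phi)  # Eq. (61) with F = 1
    print(N, average_entropy(kraus, sigma, N), t, alpha ** N * trace_norm(sigma - phi))
```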
Here, we construct a simple example of a two-element set $\{A_0,A_1\}$ that has a non-trivial correctable subspace (in the sense of Section 6.2) for small enough $N$, but where the purity condition nevertheless holds. For the Hilbert space $\mathcal{H}$, we let $\dim\mathcal{H} = D+1$ and let $|0\rangle,\ldots,|D-1\rangle,|D\rangle$ be an orthonormal basis of $\mathcal{H}$. We define the projector $P := \sum_{k=0}^{D-1}|k\rangle\langle k|$ and the operators
$$A_0 := \sum_{k=0}^{D-1}|k+1\rangle\langle k|, \qquad\text{and}\qquad A_1 := |D\rangle\langle D|. \qquad (66)$$
We note that $A_0$ is a Jordan block with zeros on the diagonal, and that
$$A_0^\dagger A_0 = \sum_{k=0}^{D-1}|k\rangle\langle k| = P, \qquad (67)$$
while $A_1^\dagger A_1 = A_1 = |D\rangle\langle D|$. Thus, $A_0^\dagger A_0 + A_1^\dagger A_1 = \mathbb{1}$. Moreover, observe that
$$PA_0^\dagger A_0P = \lambda_0P,\quad \lambda_0 = 1, \qquad\qquad PA_1^\dagger A_1P = \lambda_1P,\quad \lambda_1 = 0. \qquad (68)$$
Hence, for a single site, the correctable subspace is $\mathcal{C} := \mathrm{span}\{|0\rangle,\ldots,|D-1\rangle\}$.

Now consider the case of several sites, where we construct the sequence $A_{x_N}\cdots A_{x_1}$. It is not difficult to see that
$$A_0^N = \sum_{k=0}^{D-N}|k+N\rangle\langle k|. \qquad (69)$$
Hence, the dimension of the correctable subspace decreases by one with each step along the sequence, until it is exhausted. As a consequence, the CMI in this example decays exponentially, with a pre-factor that grows exponentially in $D$. Examples that do not have a block-diagonal structure can also be constructed. We note that the example above is similar in spirit to a bosonic annihilation operator. Indeed, in the infinite-dimensional case where the Kraus operators are bosonic creation and annihilation operators, the purity condition no longer makes sense, and the theory breaks down.

We have shown that the amplitudes of an injective MPS in a specific local basis follow a quasi-local Gibbs distribution with exponentially decaying tails if a condition on the Kraus operators of the MPS, called 'purity', is satisfied.
The purity condition reflects the fact that, on average, no information can be preserved in the virtual subspace under measurements. Our proof makes extensive use of the theory of random matrix products.

A number of open questions remain. Perhaps the most obvious is whether the methods used in this paper can be applied in higher dimensions or in the context of matrix product operators, and whether this leads to new insights or algorithmic improvements. In the setting of matrix product operators, the purity condition would no longer be sufficient to prevent information transmission along the chain. There, one would likely have to bound the stochastic process upon measurements from above and below. Some recent progress in this direction has been communicated to us [35]. Another place where the present tools might be applied is the rigorous analysis of the Wave Function Monte Carlo algorithm. A first attempt in this direction has been made in Ref. [21]. Some work nevertheless remains to be done in connecting these mathematical results to more realistic physical settings and particular examples. Yet another extension would be to continuous MPS [36]. On a more technical level, it would be valuable to get a better handle on the decay rate of the stochastic process. In particular, does there exist a closed-form expression for it, as there is for the correlation length (given by the spectral gap of the transfer operator)?

Acknowledgements. We thank T. Benoist for clarifying some details in Ref. [19]. We thank David Gross for helpful discussions. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy - Cluster of Excellence Matter and Light for Quantum Computing (ML4Q) EXC 2004/1-390534769. This work was completed while MJK was at the University of Cologne.

References

[1] M. B. Hastings, Solving Gapped Hamiltonians Locally, Phys. Rev. B , 085115 (2006).
[2] S. White, Density matrix formulation for quantum renormalization groups, Phys. Rev. Lett.
, 2863 (1992).
[3] U. Schollwoeck, The density-matrix renormalization group, Rev. Mod. Phys. , 259 (2005).
[4] U. Schollwoeck, The density-matrix renormalization group in the age of matrix product states, Annals of Physics , 96 (2011).
[5] J. C. Xavier, J. A. Hoyos, E. Miranda, Adaptive Density Matrix Renormalization Group for Disordered Systems, Phys. Rev. B , 195115 (2018).
[6] F. Verstraete, J. J. Garcia-Ripoll, and J. I. Cirac, Matrix product density operators: Simulation of finite-temperature and dissipative systems, Phys. Rev. Lett. , 207204 (2004).
[7] S. Paeckel, T. Köhler, A. Swoboda, S. R. Manmana, U. Schollwöck, C. Hubig, Time-evolution methods for matrix-product states, Annals of Physics , 167998 (2019).
[8] J. Almeida, M. A. Martin-Delgado, and G. Sierra, DMRG applied to critical systems: spin chains, AIP Conference Proceedings , 261 (2007).
[9] S. Bravyi, D. Gosset, Polynomial-time classical simulation of quantum ferromagnets, Phys. Rev. Lett. , 100503 (2017).
[10] M. Jerrum and A. Sinclair, Polynomial-time approximation algorithm for the Ising model, SIAM Journal on Computing , 1087 (1993).
[11] F. Martinelli, E. Olivieri, Approach to equilibrium of Glauber dynamics in the one phase region. I. The attractive case, Comm. Math. Phys. , 3, 447-486 (1994).
[12] G. Carleo, M. Troyer, Solving the Quantum Many-Body Problem with Artificial Neural Networks, Science , 602 (2017).
[13] D. Perez-Garcia, F. Verstraete, M. M. Wolf, J. I. Cirac, Matrix Product State Representations, Quantum Inf. Comput. , 401 (2007).
[14] O. K. Kozlov, Gibbs Description of a System of Random Variables, Probl. Peredachi Inf. :3 (1974), 94–103; Problems Inform. Transmission, :3 (1974), 258–265.
[15] W. Brown, D. Poulin, Quantum Markov Networks and Commuting Hamiltonians, arXiv:1206.0755 (2012).
[16] K. Kato, F. G. S. L. Brandao, Quantum Approximate Markov Chains are Thermal, Commun. Math. Phys. , 117 (2019).
[17] K. Kato, F. G. S. L.
Brandao, Locality of Edge States and Entanglement Spectrum from Strong Subadditivity, Phys. Rev. B , 195124 (2019).
[18] M. J. Kastoryano, A. Lucia, D. Perez-Garcia, Locality at the boundary implies gap in the bulk for 2D PEPS, Comm. Math. Phys. (2019) 366: 895.
[19] T. Benoist, M. Fraas, Y. Pautrat, and C. Pellegrini, Invariant measure for quantum trajectories, Probability Theory and Related Fields , 307–334 (2019).
[20] H. Maassen and B. Kümmerer, Purification of quantum trajectories, Lecture Notes-Monograph Series , 252–261 (2006).
[21] T. Benoist, M. Fraas, Y. Pautrat, C. Pellegrini, Invariant measure for stochastic Schrödinger equations, Ann. Henri Poincaré.
[22] M. Popp, F. Verstraete, J. I. Cirac, Entanglement versus Correlations in Spin Systems, Phys. Rev. Lett. , 027901 (2004).
[23] M. Popp, F. Verstraete, M. A. Martin-Delgado, J. I. Cirac, Localizable Entanglement, Phys. Rev. A , 042306 (2005).
[24] T. B. Wahl, D. Perez-Garcia, J. I. Cirac, Matrix Product States with long-range Localizable Entanglement, Phys. Rev. A , 062314 (2012).
[25] E. Knill, R. Laflamme, A Theory of Quantum Error-Correcting Codes, Phys. Rev. Lett. , 2525-2528 (2000).
[26] D. V. Else, I. Schwarz, S. D. Bartlett, A. C. Doherty, Symmetry-protected phases for measurement-based quantum computation, Phys. Rev. Lett. , 240505 (2012).
[27] R. Movassagh, and Jeffrey Schenker, An ergodic theorem for homogeneously distributed quantum channels with applications to matrix product states, arXiv:1909.11769 (2019).
[28] R. Movassagh, and Jeffrey Schenker, Theory of Ergodic Quantum Processes, arXiv:2004.14397 (2020).
[29] M. Sanz, D. Pérez-García, M. M. Wolf, J. I. Cirac, A quantum version of Wielandt's inequality, IEEE Transactions on Information Theory , 4668 (2010).
[30] M. B. Hastings, How Quantum Are Non-Negative Wavefunctions?, J. Math. Phys. , 015210 (2016).
[31] P. Bougerol, J.
Lacroix, Products of Random Matrices with Applications to Schrödinger Operators, Birkhäuser, Boston – Basel – Stuttgart (1985).
[32] M. Fannes, B. Nachtergaele, R. F. Werner, Finitely correlated states on quantum spin chains, Commun. Math. Phys. , 443 (1992).
[33] M. Fannes, A Continuity Property of the Entropy Density for Spin Lattice Systems, Commun. Math. Phys. , 291 (1973).
[34] K. M. R. Audenaert, A Sharp Fannes-type Inequality for the von Neumann Entropy, J. Phys. A , 8127–8136 (2007).
[35] C-F Chen, K. Kato, F. G. S. L. Brandao, When do matrix-product-density-operators have a local parent Hamiltonian?, private communication.
[36] F. Verstraete, J. I. Cirac, Continuous Matrix Product States for Quantum Fields, Phys. Rev. Lett. , 190405 (2010).
[37] A. Gut, Probability: A Graduate Course, Springer Texts in Statistics (Springer, New York, 2005).
[38] R. A. Horn and C. R. Johnson, Matrix Analysis.
[39] Über die Verteilung der Wurzeln bei gewissen algebraischen Gleichungen mit ganzzahligen Koeffizienten, Math. Zeit. , 228 (1923).
[40] J. M. Steele, Probability Theory and Combinatorial Optimization, CBMS-NSF regional conference series in applied mathematics; 69 (SIAM, 1996).

Elements of the proof of Theorem 4

In order to show Theorem 4 in Section 5, we use two key bounds, one on the average von Neumann entropy and a second on the average purity. Here, we state these bounds in the form of two lemmas. In the following, we let $\|Q\| := \sup_{\|\psi\|=1}\|Q|\psi\rangle\|$ denote the standard operator norm.

Lemma 6. Let $\{\rho_x\}_{x=0}^{d-1}$ be a collection of density operators on a Hilbert space $\mathcal{H}$, with $D = \dim\mathcal{H}$. Let $\{p(x)\}_{x=0}^{d-1}$ be real numbers such that $p(x)\geq 0$ and $\sum_{x=0}^{d-1}p(x) = 1$. Then,
$$\sum_{x=0}^{d-1}p(x)S(\rho_x) \leq -Q\log Q + Q\big[\log(D-1)+1\big], \qquad (70)$$
where we refer to $Q$ as the average purity and define it as
$$Q := 1-\sum_{x=0}^{d-1}p(x)\|\rho_x\|. \qquad (71)$$
Proof.
To begin with, let us first consider a single density operator $\rho_x$ and define the channel
$$\Gamma(\rho_x) := |\phi\rangle\langle\phi|\rho_x|\phi\rangle\langle\phi| + \Phi^\perp\rho_x\Phi^\perp,$$
where $|\phi\rangle$ is a pure state in $\mathcal{H}$ and $\Phi^\perp := \mathbb{1}-|\phi\rangle\langle\phi|$. The channel $\Gamma$ is mixing-enhancing, i.e., $S(\rho_x)\leq S[\Gamma(\rho_x)]$. Moreover, $\Gamma$ transforms any input state into a block-diagonal state. This implies that for any function $f$ with $f(0) = 0$ and any input state $\rho$, it holds that $f[\Gamma(\rho)] = f(|\phi\rangle\langle\phi|\rho|\phi\rangle\langle\phi|) + f(\Phi^\perp\rho\Phi^\perp)$. Using these two properties of $\Gamma$, we obtain
$$S(\rho_x) \leq H_B[q(x)] + q(x)\,S\!\left[\frac{\Phi^\perp\rho_x\Phi^\perp}{\mathrm{Tr}(\Phi^\perp\rho_x)}\right] \leq H_B[q(x)] + q(x)\log(\dim\mathcal{H}-1).$$
Here we have defined $q(x) := \mathrm{Tr}(\Phi^\perp\rho_x) = 1-\langle\phi|\rho_x|\phi\rangle$ for $x = 0,\ldots,d-1$, and we recall that $H_B(t) = -t\log t-(1-t)\log(1-t)$ is the binary entropy. Note that we can choose $|\phi\rangle$ to be the normalized eigenvector corresponding to the largest eigenvalue of $\rho_x$, which we denote by $\lambda_1^\downarrow(\rho_x)$. Then, $q(x) = 1-\lambda_1^\downarrow(\rho_x) = 1-\|\rho_x\|$.

Considering now the whole set of density operators $\{\rho_x\}_{x=0}^{d-1}$, we have by the concavity of the binary entropy that
$$\sum_{x=0}^{d-1}p(x)S(\rho_x) \leq H_B(Q) + Q\log(\dim\mathcal{H}-1), \qquad (72)$$
where $Q = \sum_xp(x)q(x)$ is defined in Eq. (71). One can next bound the binary entropy as $H_B(t)\leq t-t\log t$ for $0\leq t\leq 1$. By combining this observation with (72), we obtain (70).

The average purity $Q$ defined in Eq. (71) can be bounded if one considers some structure on the density operators and the probabilities. In particular, take $\rho_x = \Psi_C(x_B)$ and $p(x) = p_\Psi(x_B)$ (see Eq. (1) and Eq. (8)). Then the average purity is $Q = 1-K^{-1}\sum_{\mathbf{x}}\|FA_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\|$, where we recall that $\sigma = \mathcal{E}^{|A|}(L)$ and $F^\dagger F = \mathcal{E}^{*|C|}(R)$.
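The following is a quick numerical sanity check of Lemma 6, and of the two-sided bound on $Q$ in Lemma 7, stated next, on randomly drawn instances. The helpers `random_density` and `random_kraus`, the dimensions and the random ensembles are our own hypothetical choices. $F$ is rescaled by its operator norm so that $F^\dagger F\leq\mathbb{1}$, as Lemma 7 requires. $\|\cdot\|$ is the operator norm, i.e., the largest eigenvalue for positive operators.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

def random_density(D):
    G = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def random_kraus(d, D):
    # Hypothetical helper: blocks of a random isometry, so sum_x A_x^dag A_x = 1.
    G = rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D))
    V, _ = np.linalg.qr(G)
    return [V[x * D:(x + 1) * D, :] for x in range(d)]

def entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def lemma6_sides(probs, states):
    """Left- and right-hand sides of Eq. (70), with Q from Eq. (71)."""
    D = states[0].shape[0]
    lhs = sum(p * entropy(r) for p, r in zip(probs, states))
    Q = 1 - sum(p * np.linalg.eigvalsh(r)[-1] for p, r in zip(probs, states))
    rhs = -Q * np.log(Q) + Q * (np.log(D - 1) + 1) if Q > 0 else 0.0
    return lhs, rhs

def lemma7_sides(kraus, sigma, F, N):
    """Lower bound, Q (Eq. (75)) and upper bound of Eq. (73)."""
    D = sigma.shape[0]
    lam1, lam2, K = 0.0, 0.0, 0.0
    for xs in itertools.product(range(len(kraus)), repeat=N):
        Y = np.eye(D, dtype=complex)
        for x in xs:
            Y = kraus[x] @ Y
        X = F @ Y @ sigma @ Y.conj().T @ F.conj().T
        ev = np.linalg.eigvalsh(X)  # ascending order
        lam1, lam2, K = lam1 + ev[-1], lam2 + ev[-2], K + np.trace(X).real
    Q = 1 - lam1 / K
    return lam2 / K, Q, (D - 1) * lam2 / K

D, d = 3, 2
probs = rng.dirichlet(np.ones(4))
states = [random_density(D) for _ in range(4)]
print(lemma6_sides(probs, states))

G = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
F = G / np.linalg.norm(G, 2)  # operator norm 1, so F^dag F <= 1
print(lemma7_sides(random_kraus(d, D), random_density(D), F, 3))
```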
and shown in the following lemma.

Lemma 7. Let $\{A_x\}_{x=0}^{d-1}$ be a collection of operators on a Hilbert space $\mathcal{H}$, with $D := \dim\mathcal{H} < +\infty$, such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$. Let $\sigma$ be a density operator on $\mathcal{H}$, and $F$ an operator on $\mathcal{H}$ such that $F^\dagger F \le \mathbb{1}$. Then,
$$\frac{1}{K}\sum_{\mathbf{x}}\lambda_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right) \le Q \le \frac{D-1}{K}\sum_{\mathbf{x}}\lambda_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right), \tag{73}$$
where $\mathbf{x} := x_1, \ldots, x_N$, with each $x_j$ ranging over $0, \ldots, d-1$; $K$ is the normalization constant
$$K := \sum_{\mathbf{x}}\mathrm{Tr}\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right); \tag{74}$$
and $Q$ is
$$Q = 1 - \frac{1}{K}\sum_{\mathbf{x}}\left\|F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right\|. \tag{75}$$

Proof. Consider a positive semi-definite operator, $\rho \ge 0$, and define the function
$$L(\rho) := \sum_{j=2}^{D}\lambda_j^\downarrow(\rho).$$
This function can be upper and lower bounded as
$$\lambda_2^\downarrow(\rho) \le L(\rho) \le (D-1)\,\lambda_2^\downarrow(\rho). \tag{76}$$
Moreover, it holds that
$$\lambda_1^\downarrow(\rho) + L(\rho) = \mathrm{Tr}(\rho). \tag{77}$$
If we insert into Eq. (77) the positive operator $\rho := K^{-1}F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger$ and sum over all values of $\mathbf{x} := x_1, \ldots, x_N$, we obtain
$$\begin{aligned} &\frac{1}{K}\sum_{\mathbf{x}}\lambda_1^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right) + \frac{1}{K}\sum_{\mathbf{x}} L\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right) \\ &\quad= \frac{1}{K}\sum_{\mathbf{x}}\mathrm{Tr}\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right) = 1, \end{aligned}$$
where the last equality holds due to the definition of $K$ in Eq. (74). This implies that
$$\begin{aligned} \frac{1}{K}\sum_{\mathbf{x}} L\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right) &= 1 - \frac{1}{K}\sum_{\mathbf{x}}\lambda_1^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right) \\ &= 1 - \frac{1}{K}\sum_{\mathbf{x}}\left\|F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right\| \\ &= Q. \end{aligned}$$
Using the bounds of Eq.
(76), we finish the proof, since we then obtain the bounds on $Q$ in Eq. (73).

Recall that throughout this investigation, log denotes the natural logarithm. We state the following two lemmas without proof.

Lemma 8. Let
$$H_B(t) := -t\log t - (1-t)\log(1-t), \quad 0 < t < 1, \tag{78}$$
and $H_B(0) := 0$ and $H_B(1) := 0$. Let
$$g(t) := t - t\log t, \quad 0 < t \le 1, \tag{79}$$
and $g(0) := 0$. Then, $g$ is monotonically increasing on $[0, 1]$, and
$$H_B(t) \le g(t), \quad 0 \le t \le 1. \tag{80}$$

Lemma 9.
$$-t\log t \le \frac{1}{\epsilon}\,t^{1-\epsilon}, \quad 0 \le t \le 1, \quad 0 < \epsilon < 1. \tag{81}$$

We recall that if the channel $\mathcal{E}$ is primitive, then $\mathcal{E}$ has a unique full-rank fixpoint $\rho$ [29]. With the replacement map $\mathcal{R}(\sigma) := \rho\,\mathrm{Tr}(\sigma)$, the fact that every initial state $\sigma$ converges to $\rho$ can be expressed as $\lim_{N\to\infty}\mathcal{E}^N = \mathcal{R}$. Since the underlying Hilbert space is finite-dimensional, we can express the convergence in terms of any norm. It is convenient to express it in terms of the norm
$$\|\mathcal{F}\| := \sup_{\|Q\|_1 = 1}\|\mathcal{F}(Q)\|_1, \tag{82}$$
and thus $\lim_{N\to\infty}\|\mathcal{E}^N - \mathcal{R}\| = 0$, where $\|Q\|_1 := \mathrm{Tr}\sqrt{Q^\dagger Q}$ is the trace norm.

Lemma 10. Let $\{A_x\}_{x=0}^{d-1}$ be operators on a Hilbert space $\mathcal{H}$, with $D := \dim\mathcal{H} < +\infty$, such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$, and such that $\mathcal{E}(\cdot) := \sum_{x=0}^{d-1} A_x \cdot A_x^\dagger$ is primitive. Then, there exist a real number $r > 0$ and a natural number $N_0$ such that
$$\langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle \ge r, \quad \forall |A|, |C|, \quad \forall \|R\| = \|L\| = 1, \quad \forall |B| \ge N_0. \tag{83}$$

Proof.
For the map $\mathcal{R}(\sigma) := \rho\,\mathrm{Tr}(\sigma)$, with $\rho$ the unique fixpoint of $\mathcal{E}$, we first observe that
$$\begin{aligned}
\left|\langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle - \langle R|\rho|R\rangle\right| &= \left|\mathrm{Tr}\!\left(|R\rangle\langle R|\left(\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|) - \rho\right)\right)\right| \\
&\le \big\||R\rangle\langle R|\big\|\,\big\|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|) - \rho\big\|_1 \\
&= \big\|\mathcal{E}^{|B|}\big(\mathcal{E}^{|A|+|C|}(|L\rangle\langle L|)\big) - \mathcal{R}\big(\mathcal{E}^{|A|+|C|}(|L\rangle\langle L|)\big)\big\|_1 \\
&\le \big\|\mathcal{E}^{|B|} - \mathcal{R}\big\|\,\big\|\mathcal{E}^{|A|+|C|}(|L\rangle\langle L|)\big\|_1 \\
&= \big\|\mathcal{E}^{|B|} - \mathcal{R}\big\|.
\end{aligned} \tag{84}$$
Here, the first inequality is Hölder's inequality with $\||R\rangle\langle R|\| = 1$; the following equality uses that $\mathcal{E}$ is trace preserving, so that $\mathcal{R}$ maps the state $\mathcal{E}^{|A|+|C|}(|L\rangle\langle L|)$ to $\rho$; and the last equality holds because this state has unit trace norm. Since $\mathcal{E}$ is assumed to be primitive, it has a unique full-rank fixed point $\rho$. Since $\mathcal{H}$ is assumed to be finite-dimensional, it follows that the minimal eigenvalue of $\rho$ satisfies $\lambda_{\min}(\rho) > 0$. By (84), it follows that
$$\lambda_{\min}(\rho) - \|\mathcal{E}^{|B|} - \mathcal{R}\| \le \langle R|\rho|R\rangle - \|\mathcal{E}^{|B|} - \mathcal{R}\| \le \langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle. \tag{85}$$
Since $\lim_{|B|\to\infty}\|\mathcal{E}^{|B|} - \mathcal{R}\| = 0$ and $\lambda_{\min}(\rho) > 0$, it follows that there exist an $r$ with $\lambda_{\min}(\rho) > r > 0$ and a natural number $N_0$ such that
$$\langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle \ge r, \quad \forall |B| \ge N_0. \tag{86}$$
Note that $r$ and $N_0$ are independent of $|A|$, $|C|$, and of all normalized $|R\rangle$ and $|L\rangle$.

Theorem 4 in the main text follows as a direct corollary of Theorem 11 with $\kappa := \gamma^{1-\epsilon}$ and $c := c_\epsilon$ for any fixed $0 < \epsilon < 1$.
In essence, we use the bound $f(N) \le c\gamma^N$ in Proposition 5 to prove the bound in Theorem 11, and thus the same $\gamma$ appears in both bounds. Loosely speaking, the transition from $\gamma$ to $\gamma^{1-\epsilon}$ is due to a leading-order term proportional to $|B|\gamma^{|B|}$. This term appears in a bound on the CMI and can be accommodated by an arbitrarily small sacrifice of the rate of the exponential decay. However, since we are not only interested in the asymptotics but wish to obtain a bound that is valid for all values of $|B|$, the construction in the proof becomes more elaborate.

Theorem 11. For a set of operators $\{A_x\}_{x=0}^{d-1}$ on a Hilbert space $\mathcal{H}$ with $D := \dim\mathcal{H} \ge 2$, and normalized $|R\rangle, |L\rangle \in \mathcal{H}$, let $\Psi$ be the MPS as defined in (1) on a region $\Lambda = ABC$. Suppose that the set $\{A_x\}_{x=0}^{d-1}$ satisfies $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$ and the purity condition in Definition 3, and that $\mathcal{E}(\cdot) := \sum_{x=0}^{d-1} A_x \cdot A_x^\dagger$ is primitive. For the constant $\gamma$ guaranteed by Proposition 5, and for every $0 < \epsilon < 1$, there exists a constant $c_\epsilon \ge 0$ such that
$$I_{p_\psi,B}(A:C|B) \le c_\epsilon\,\gamma^{|B|(1-\epsilon)}, \quad |B| = 1, 2, \ldots. \tag{87}$$
The constant $\gamma$ is independent of $|A|$, $|B|$, $|C|$, $|L\rangle$, $|R\rangle$ and $\epsilon$. The constant $c_\epsilon$ is independent of $|A|$, $|B|$, $|C|$, $|L\rangle$ and $|R\rangle$, but may depend on $\epsilon$.

Proof.
We first note that
$$\begin{aligned} I_{p_\psi,B}(A:C|B) &\le I_{\psi(x_B)}(A:C|B) \\ &= \langle S[\psi_A(x_B)]\rangle_{p_\psi(x_B)} + \langle S[\psi_C(x_B)]\rangle_{p_\psi(x_B)} \\ &= 2\,\langle S[\psi_C(x_B)]\rangle_{p_\psi(x_B)}, \end{aligned} \tag{88}$$
where $\psi_X(x_B)$ is the reduced state in region $X$ of the post-measurement state $\psi(x_B)$, and $\langle S[\psi(x)]\rangle_{p_\psi(x)}$ denotes the average von Neumann entropy
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} = \sum_{x_B} p_\Psi(x_B)\, S(\rho_{x_B}), \tag{89}$$
with $p_\Psi(x_B)$ as in (8), and
$$\rho_{x_B} := \frac{1}{p_\Psi(x_B)\,K}\,F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger.$$
The identity (89) follows from the fact that (up to zero eigenvalues) $\Psi_C(x_B)$ is isospectral to $\rho_{x_B}$, as discussed in Section 3. By Lemma 6 (more precisely, by (72) in its proof), we know that
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \le H_B(Q) + Q\log(D-1), \tag{90}$$
with
$$Q := 1 - \sum_{x_B} p_\Psi(x_B)\,\|\rho_{x_B}\|. \tag{91}$$
By Lemma 8, we know that the function $g(t) = t - t\log t$ is monotonically increasing on the interval $[0, 1]$ and satisfies $H_B(t) \le g(t)$. By combining this observation with (90), we get
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \le g(Q) + Q\log(D-1) = -Q\log Q + Q + Q\log(D-1). \tag{92}$$
By Lemma 7, we furthermore know that
$$Q \le \frac{D-1}{K}\sum_{x_B}\lambda_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right), \tag{93}$$
where $\lambda_j^\downarrow(O)$ are the eigenvalues of an operator $O$ in non-increasing order, i.e., $\lambda_1^\downarrow(O) \ge \cdots \ge \lambda_D^\downarrow(O)$. Similarly, we let in the following $\nu_j^\downarrow(O)$ denote the singular values of $O$ in non-increasing order, $\nu_1^\downarrow(O) \ge \cdots \ge \nu_D^\downarrow(O)$.
Then, recalling that for any operator $O$ we have $\lambda_j^\downarrow(OO^\dagger) = \nu_j^\downarrow(O)^2$, we get
$$\begin{aligned}
Q &\le \frac{D-1}{K}\sum_{x_B}\lambda_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right) \\
&\le \frac{D-1}{K}\sum_{x_B}\sqrt{\lambda_1^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right)\lambda_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\right)} \\
&= \frac{D-1}{K}\sum_{x_B}\nu_1^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sqrt{\sigma}\right)\nu_2^\downarrow\!\left(F A_{x_N}\cdots A_{x_1}\sqrt{\sigma}\right) \\
&= \frac{D-1}{K}\,f(N) \\
&\le \frac{D-1}{K}\,c\gamma^N \qquad \text{[by Proposition 5]} \\
&= \frac{D-1}{\langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle}\,c\gamma^{|B|},
\end{aligned} \tag{94}$$
where we recall that $N = |B|$ and that the constants $c$ and $0 < \gamma < 1$ in Proposition 5 are independent of $\sigma := \mathcal{E}^{|A|}(L)$ and $F := \sqrt{\mathcal{E}^{*|C|}(R)}$, and consequently are independent of $|A|$ and $|C|$ (as well as of $|B|$). In the last step, we used that, by (74) and the definitions of $\sigma$ and $F$, $K = \langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle$.

By Lemma 10, there exist constants $r > 0$ and $N_0$ such that $\langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle \ge r$ for all $|B| \ge N_0$, and $r$ and $N_0$ do not depend on $|A|$, $|B|$, $|C|$, $|R\rangle$, $|L\rangle$. By combining this observation with (94), we can conclude that
$$Q \le \tilde{c}\,\gamma^{|B|}, \quad \text{with } \tilde{c} := \frac{D-1}{r}\,c, \quad \forall |B| \ge N_0, \tag{95}$$
where we note that $\tilde{c}$ and $N_0$ do not depend on $|A|$, $|B|$, $|C|$, $|R\rangle$, $|L\rangle$. By inspection of the definition of $Q$ in (91), one can see that
$$0 \le Q \le 1. \tag{96}$$
Hence,
$$Q \le t, \quad \forall |B| \ge N_0, \quad \text{with } t := \min\!\left[1, \tilde{c}\gamma^{|B|}\right], \tag{97}$$
where $t$ by necessity is contained in the interval $[0, 1]$. We can thus use the monotonicity of $g$ to obtain
$$\begin{aligned}
\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} &\le g(Q) + Q\log(D-1) \\
&\le g(t) + t\log(D-1) && \text{[monotonicity of $g$, Lemma 8, together with (97)]} \\
&= -t\log t + t\,[1 + \log(D-1)] \\
&\le \frac{1}{\epsilon}\,t^{1-\epsilon} + t\,[1 + \log(D-1)] && \text{[by Lemma 9]} \\
&\le \left[\frac{1}{\epsilon} + 1 + \log(D-1)\right] t^{1-\epsilon}. && \text{[by $t \le t^{1-\epsilon}$, $0 \le t \le 1$, $0 < \epsilon < 1$]}
\end{aligned} \tag{98}$$
Since $t = \min[1, \tilde{c}\gamma^{|B|}] \le \tilde{c}\gamma^{|B|}$, we get $t^{1-\epsilon} \le \tilde{c}^{1-\epsilon}\gamma^{|B|(1-\epsilon)}$, and thus
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \le \left[\frac{1}{\epsilon} + 1 + \log(D-1)\right]\tilde{c}^{1-\epsilon}\,\gamma^{|B|(1-\epsilon)}. \tag{99}$$
By combining this with (88), we get
$$I_{p_\psi,B}(A:C|B) \le \tilde{c}_\epsilon\,\gamma^{|B|(1-\epsilon)}, \quad \forall |B| \ge N_0, \quad \tilde{c}_\epsilon := 2\left[\frac{1}{\epsilon} + 1 + \log(D-1)\right]\tilde{c}^{1-\epsilon}. \tag{100}$$
Finally, we remove the restriction $|B| \ge N_0$. By (88) and (89), we can conclude that $I_{p_\psi,B}(A:C|B) \le 2\sum_{x_B} p_\Psi(x_B)\,S(\rho_{x_B}) \le 2\log D$, where the last inequality follows since $\rho_{x_B}$ is a density operator on $\mathcal{H}$, which has dimension $D$. Let
$$c_\epsilon := \max\!\left(\tilde{c}_\epsilon,\ 2\log(D)\,\gamma^{-(N_0-1)(1-\epsilon)}\right). \tag{101}$$
One can confirm that this guarantees that
$$I_{p_\psi,B}(A:C|B) \le c_\epsilon\,\gamma^{|B|(1-\epsilon)} \tag{102}$$
for all $|B| = 1, 2, \ldots$: for $|B| \ge N_0$ this is (100), while for $|B| \le N_0 - 1$ we have $\gamma^{|B|(1-\epsilon)} \ge \gamma^{(N_0-1)(1-\epsilon)}$ (since $0 < \gamma < 1$), so that the right-hand side is at least $2\log D$. The resulting constant $c_\epsilon$ is independent of $|A|$, $|B|$, $|C|$, $|L\rangle$ and $|R\rangle$.

By combining Lemma 2 with Theorem 11, and defining $\kappa := \gamma^{1-\epsilon}$ and $c := c_\epsilon$ for some arbitrary but fixed $0 < \epsilon < 1$, we get the following.

Corollary 12. For a set of operators $\{A_x\}_{x=0}^{d-1}$ on a Hilbert space $\mathcal{H}$ with $D := \dim\mathcal{H} \ge 2$, and normalized $|R\rangle, |L\rangle \in \mathcal{H}$, let $\Psi$ be the MPS as defined in (1) on a region $\Lambda = ABC$. Suppose that the set $\{A_x\}_{x=0}^{d-1}$ satisfies $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$ and the purity condition in Definition 3, and that $\mathcal{E}(\cdot) := \sum_{x=0}^{d-1} A_x \cdot A_x^\dagger$ is primitive. Let $p_{1,\ldots,|\Lambda|}(x_1, \ldots, x_{|\Lambda|}) = \langle x_\Lambda|\Psi|x_\Lambda\rangle$ be the classical restriction of $\Psi$, and assume that this restriction is such that $p_{1,\ldots,|\Lambda|}(x_1, \ldots, x_{|\Lambda|}) > 0$ for all $x_1, \ldots, x_{|\Lambda|}$. Let $p^{\ell}_{1,\ldots,|\Lambda|}$ be as defined in (20–23).
Then, there exist constants $0 \le c$ and $0 < \kappa < 1$ such that
$$S\!\left(p_{1,\ldots,|\Lambda|}\,\middle\|\,p^{\ell}_{1,\ldots,|\Lambda|}\right) \le c\,|\Lambda|\,\kappa^{\ell}, \quad 1 \le \ell \le |\Lambda| - 1. \tag{103}$$

B Notions from probability theory

As mentioned in the main text and in the proof overview, the proof of Proposition 5 relies on various probabilistic concepts. Here, we briefly review the pertinent notions and collect the technical results that we need at various points along the proof.

Throughout these derivations, we use bold letters, such as $\mathbf{x}$, $\mathbf{Y}$, etc., to denote random variables and random operators (where 'random variables' by default are real-valued measurable functions on the underlying probability space, while 'random operators' are operator-valued measurable functions). In the following, $\mathbb{E}(\mathbf{x})$, $\mathbb{E}(\mathbf{y})$, etc., denote expectation values, and $\mathbb{E}(\mathbf{y}|\mathbf{x})$ denotes the expectation value of $\mathbf{y}$ conditioned on $\mathbf{x}$. One should keep in mind that $\mathbb{E}(\mathbf{y}|\mathbf{x})$ is itself a random variable (due to $\mathbf{x}$), and that in general $\mathbb{E}\big(\mathbb{E}(\mathbf{y}|\mathbf{x})\big) = \mathbb{E}(\mathbf{y})$.

B.1 Almost surely

When we say that a relation for one or several random variables holds almost surely (a.s.), we mean that the relation is true apart from a set of probability zero. Put differently, the relation is true with probability one. For example, $\mathbf{x} = \mathbf{y}$ a.s. means that $P(\{\omega \in \Omega : \mathbf{x}(\omega) = \mathbf{y}(\omega)\}) = 1$, where $\Omega$ denotes the underlying sample space and $\omega$ an element of the sample space.

As examples, one can consider various notions that intuitively remain true even if 'a few' points are excluded. For instance, if $\mathbf{x} \le \mathbf{y}$, then $\mathbb{E}(\mathbf{x}) \le \mathbb{E}(\mathbf{y})$ (provided the expectations exist). This conclusion still holds if the inequality only holds almost everywhere (see, e.g., Theorem 4.4 in chapter 2 of [37]):

Lemma 13. If $\mathbf{x}$ and $\mathbf{y}$ are non-negative random variables, then $\mathbf{x} \le \mathbf{y}$ a.s. implies $\mathbb{E}(\mathbf{x}) \le \mathbb{E}(\mathbf{y})$.

Another statement in a similar spirit is the following.
(The claim of the lemma is contained in Theorem 4.4 in chapter 2 of [37].)

Lemma 14. If $\mathbf{x}$ is a non-negative random variable, then $\mathbf{x} = 0$ a.s. if and only if $\mathbb{E}(\mathbf{x}) = 0$.

For a random variable $\mathbf{x}$, we define the positive and negative components by $\mathbf{x}^+ := \max(\mathbf{x}, 0)$ and $\mathbf{x}^- := \max(-\mathbf{x}, 0)$, so that $\mathbf{x}^+ \ge 0$, $\mathbf{x}^- \ge 0$ and $\mathbf{x} = \mathbf{x}^+ - \mathbf{x}^-$. The expectation value $\mathbb{E}(\mathbf{x})$ of a random variable $\mathbf{x}$ is defined as $\mathbb{E}(\mathbf{x}) = \mathbb{E}(\mathbf{x}^+) - \mathbb{E}(\mathbf{x}^-)$ if at least one of $\mathbb{E}(\mathbf{x}^+)$ and $\mathbb{E}(\mathbf{x}^-)$ is finite.

Lemma 15. If $\mathbf{x}$ is a random variable such that $\mathbf{x} \ge 0$ a.s. and $\mathbb{E}(\mathbf{x}) = 0$, then $\mathbf{x} = 0$ a.s.

Proof. Since $\mathbf{x} \ge 0$ a.s., it follows that $\mathbf{x}^- = 0$ a.s. Since $\mathbf{x}^-$ is by construction a non-negative random variable, it follows by Lemma 14 that $\mathbb{E}(\mathbf{x}^-) = 0$. We can conclude that $\mathbb{E}(\mathbf{x})$ is well defined and $\mathbb{E}(\mathbf{x}) = \mathbb{E}(\mathbf{x}^+) - \mathbb{E}(\mathbf{x}^-) = \mathbb{E}(\mathbf{x}^+)$, and thus $\mathbb{E}(\mathbf{x}) = 0$ implies $\mathbb{E}(\mathbf{x}^+) = 0$. Since $\mathbf{x}^+$ is non-negative by construction, Lemma 14 implies $\mathbf{x}^+ = 0$ a.s. We can thus conclude that $\mathbf{x} = \mathbf{x}^+ - \mathbf{x}^- = 0$ a.s.

B.2 Stochastic convergence of real-valued sequences

A sequence of random variables $(\mathbf{x}_N)_{N\in\mathbb{N}}$ is said to converge almost surely to a random variable $\mathbf{x}_\infty$ (denoted $\lim_{N\to\infty}\mathbf{x}_N = \mathbf{x}_\infty$ a.s.) if
$$P\big(\{\omega \in \Omega : \lim_{N\to\infty}\mathbf{x}_N(\omega) = \mathbf{x}_\infty(\omega)\}\big) = 1. \tag{104}$$
As mentioned above, $\omega$ is an element of the underlying sample space $\Omega$, and $\mathbf{x}_N(\omega)$ is a specific realization of the stochastic process. (If one thinks of an infinite sequence of coin tosses, then $\mathbf{x}_N$ loosely stands for all possible sequences of coin tosses, while $\mathbf{x}_N(\omega)$ is one specific sequence of heads and tails.) Equation (104) essentially says that the set of all $\omega$ for which $\mathbf{x}_N(\omega)$ actually converges to $\mathbf{x}_\infty(\omega)$ has probability 1.

In these derivations, we will often start with a process that converges almost surely, $\lim_{N\to\infty}\mathbf{x}_N = \mathbf{x}_\infty$, but want to show that $\lim_{N\to\infty}\mathbb{E}(\mathbf{x}_N) = \mathbb{E}(\mathbf{x}_\infty)$.
This is not true in general, but as a consequence of Lebesgue's dominated convergence theorem (see, e.g., Theorem 5.3 in chapter 2 of [37]) we have the following.

Proposition 16. Suppose that $\mathbf{x}_N$, $\mathbf{x}_\infty$ and $\mathbf{y}$ are random variables such that $|\mathbf{x}_N| \le \mathbf{y}$ for all $N$, where $\mathbb{E}(\mathbf{y}) < +\infty$, and that $\mathbf{x}_N \to \mathbf{x}_\infty$ a.s. Then,
$$\lim_{N\to\infty}\mathbb{E}(\mathbf{x}_N) = \mathbb{E}(\mathbf{x}_\infty). \tag{105}$$
In the special case that $\mathbf{y}$ is equal to a constant $C$, one obtains the following (sometimes referred to as the bounded convergence theorem).

Proposition 17. Suppose that $\mathbf{x}_N$, $\mathbf{x}_\infty$ are random variables, and that there exists a constant $C < +\infty$ such that $|\mathbf{x}_N| \le C$ for all $N$. If $\mathbf{x}_N \to \mathbf{x}_\infty$ a.s., then
$$\lim_{N\to\infty}\mathbb{E}(\mathbf{x}_N) = \mathbb{E}(\mathbf{x}_\infty). \tag{106}$$
The following lemma is a consequence of the Borel–Cantelli lemma (and is included in Theorem 3.1 in chapter 5 of [37]).

Lemma 18. Let $(\mathbf{x}_n)_{n\in\mathbb{N}}$ and $\mathbf{x}$ be random variables such that $\sum_{n=1}^{\infty} P(|\mathbf{x}_n - \mathbf{x}| > \epsilon) < +\infty$ for all $\epsilon > 0$ (sometimes referred to as complete convergence of $(\mathbf{x}_n)_{n\in\mathbb{N}}$ to $\mathbf{x}$). Then, $(\mathbf{x}_n)_{n\in\mathbb{N}}$ converges almost surely to $\mathbf{x}$.

The above can be used to obtain the following.

Lemma 19. Let $(\mathbf{r}_N)_{N\in\mathbb{N}}$ be a sequence of random variables such that $\mathbf{r}_N \ge 0$ and such that the expectation values $\mathbb{E}(\mathbf{r}_N)$ exist and are finite. Suppose that there exists a number $R$ such that $\lim_{k\to\infty}\mathbb{E}\big(\sum_{N=1}^{k}\mathbf{r}_N\big) = R < +\infty$. Then, $\lim_{N\to\infty}\mathbf{r}_N = 0$ a.s.

Proof. We first note that if $(a_N)_{N\in\mathbb{N}}$ is a sequence of real numbers such that $a_N \ge 0$, and if $A_k := \sum_{N=1}^{k} a_N$ is such that $\lim_{k\to\infty} A_k = R < +\infty$, then $\lim_{N\to\infty} a_N = 0$. With $a_N := \mathbb{E}(\mathbf{r}_N)$ and $A_k := \sum_{N=1}^{k} a_N = \mathbb{E}\big(\sum_{N=1}^{k}\mathbf{r}_N\big)$, it thus follows by the assumptions of the lemma that $\lim_{N\to\infty}\mathbb{E}(\mathbf{r}_N) = \lim_{N\to\infty} a_N = 0$. By assumption, $\mathbf{r}_N \ge 0$ and the expectations $\mathbb{E}(\mathbf{r}_N)$ are well defined and finite. Hence, by Markov's inequality, $P(\mathbf{r}_N > \epsilon) \le \mathbb{E}(\mathbf{r}_N)/\epsilon$ for all $\epsilon > 0$.
Consequently, $\sum_{N=1}^{k} P(\mathbf{r}_N > \epsilon) \le \frac{1}{\epsilon}\,\mathbb{E}\big(\sum_{N=1}^{k}\mathbf{r}_N\big)$, and thus
$$\sum_{N=1}^{\infty} P(|\mathbf{r}_N| > \epsilon) \le \frac{1}{\epsilon}\lim_{k\to\infty}\mathbb{E}\left(\sum_{N=1}^{k}\mathbf{r}_N\right) = \frac{R}{\epsilon} < +\infty, \quad \forall \epsilon > 0. \tag{107}$$
Hence, $(\mathbf{r}_N)_{N\in\mathbb{N}}$ converges completely to 0. By Lemma 18, we can conclude that $(\mathbf{r}_N)_{N\in\mathbb{N}}$ converges almost surely to 0.

B.3 Stochastic convergence of operator-valued sequences

Convergence of various sequences of operators plays an important role in this investigation. Since we exclusively deal with finite-dimensional spaces, one may argue that the distinction between 'random variables' and 'random operators' is not very dramatic. For the sake of clarity, we nevertheless distinguish between these two types throughout these derivations: small bold letters $\mathbf{x}$, $\mathbf{y}$, etc. denote random variables, while capital bold letters $\mathbf{X}$, $\mathbf{Y}$, etc. denote random operators.

We recall that on finite-dimensional Hilbert spaces all norms are equivalent (see, e.g., Corollary 5.4.5 in [38]). Hence, we do not need to distinguish between different norms when we discuss convergence of sequences of operators, and we can equivalently consider the element-wise convergence of the matrix representation in some arbitrary basis. In what follows, we switch between these equivalent manifestations of convergence without further comment. For the operator norms, we mainly use the supremum norm, $\|O\| := \sup_{\|\psi\|=1}\|O|\psi\rangle\|$, and the trace norm, $\|O\|_1 := \mathrm{Tr}\sqrt{O^\dagger O}$, but also the Hilbert–Schmidt norm, $\|O\|_2 := \sqrt{\mathrm{Tr}(O^\dagger O)}$.

Let us now consider a sequence of random operators $(\mathbf{X}_N)_{N\in\mathbb{N}}$ and $\mathbf{X}_\infty$ on a finite-dimensional Hilbert space. We interpret the convergence $\mathbf{X}_N \to \mathbf{X}_\infty$ a.s.
as
$$\lim_{N\to\infty}\|\mathbf{X}_N - \mathbf{X}_\infty\| = 0 \quad \text{a.s.}, \tag{108}$$
or equivalently for any other operator norm (since the underlying Hilbert space is finite-dimensional), or as
$$\lim_{N\to\infty}\langle k|\mathbf{X}_N|k'\rangle = \langle k|\mathbf{X}_\infty|k'\rangle \quad \text{a.s.}, \quad \forall k, k' = 1, \ldots, D. \tag{109}$$
The following is a counterpart of Proposition 17, which can be obtained by applying Proposition 17 to the real and imaginary parts of the matrix elements with respect to a basis, i.e., $\mathrm{Re}\langle k|\mathbf{X}_N|k'\rangle$ and $\mathrm{Im}\langle k|\mathbf{X}_N|k'\rangle$.

Proposition 20. Suppose that $\mathbf{X}_N$ and $\mathbf{X}_\infty$ are random operators on a complex finite-dimensional Hilbert space, and that there exists a constant $C < +\infty$ such that $\|\mathbf{X}_N\| \le C$ for all $N$. If $\mathbf{X}_N \to \mathbf{X}_\infty$ a.s., then
$$\lim_{N\to\infty}\mathbb{E}(\mathbf{X}_N) = \mathbb{E}(\mathbf{X}_\infty). \tag{110}$$

B.4 Martingales

Our primary interest in martingales is that they allow for statements concerning the stochastic convergence of sequences of random variables. However, in order to connect to the manner in which these convergence theorems are typically phrased in the literature, we need to briefly discuss some technical concepts. (For a more thorough introduction, see, e.g., chapter 10 in [37].)

Consider a sequence of random variables $(\mathbf{y}_N)_{N\in\mathbb{N}}$ on a probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is the sample space, $\mathcal{F}$ is a $\sigma$-algebra (the event space), and $P$ is a probability measure. We also consider a filtration, i.e., a non-decreasing sequence of $\sigma$-subalgebras $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}$. A sequence $(\mathbf{y}_N)_{N\in\mathbb{N}}$ of random variables is said to be adapted to $(\mathcal{F}_N)_{N\in\mathbb{N}}$ if each $\mathbf{y}_N$ is measurable with respect to $\mathcal{F}_N$. A sequence $(\mathbf{y}_N)_{N\in\mathbb{N}}$ is a martingale with respect to $(\mathcal{F}_N)_{N\in\mathbb{N}}$ if it is adapted to $(\mathcal{F}_N)_{N\in\mathbb{N}}$, satisfies $\mathbb{E}(\mathbf{y}_{N+1}|\mathcal{F}_N) = \mathbf{y}_N$ a.s., and $\mathbb{E}(|\mathbf{y}_N|) < +\infty$. Intuitively, $\mathcal{F}_N$ stands for the information available to us at step $N$. In our setting, this information corresponds to the variables $\mathbf{x}_1, \ldots, \mathbf{x}_N$ (which are assumed to also be random variables on the same underlying probability space $(\Omega, \mathcal{F}, P)$). More precisely, $\mathcal{F}_N := \sigma(\mathbf{x}_N, \ldots, \mathbf{x}_1)$, which denotes the $\sigma$-algebra generated by $\mathbf{x}_1, \ldots, \mathbf{x}_N$ and is often referred to as the natural filtration of $\mathbf{x}_1, \ldots, \mathbf{x}_N$. Since in the following we exclusively use natural filtrations, we employ the more succinct notation $\mathbb{E}(\mathbf{y}_{N+1}|\mathbf{x}_N, \ldots, \mathbf{x}_1) := \mathbb{E}(\mathbf{y}_{N+1}|\mathcal{F}_N)$, with $\mathcal{F}_N := \sigma(\mathbf{x}_1, \ldots, \mathbf{x}_N)$. We moreover say that $(\mathbf{y}_N)_{N\in\mathbb{N}}$ is a martingale with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$ if
$$\mathbb{E}(\mathbf{y}_{N+1}|\mathbf{x}_N, \ldots, \mathbf{x}_1) = \mathbf{y}_N \ \text{a.s.}, \quad \mathbb{E}(|\mathbf{y}_N|) < +\infty, \quad \text{with } \mathbf{y}_N = f_N(\mathbf{x}_N, \ldots, \mathbf{x}_1), \tag{111}$$
for (Borel measurable) functions $f_N$. The construction with the functions $f_N$ guarantees that $(\mathbf{y}_N)_{N\in\mathbb{N}}$ is adapted to the natural filtration of $(\mathbf{x}_N)_{N\in\mathbb{N}}$. The following proposition is obtained as a special case of Theorem 12.1 in chapter 10 of [37].

Proposition 21. Let $(\mathbf{y}_N)_{N\in\mathbb{N}}$ be a martingale with respect to another process $(\mathbf{x}_N)_{N\in\mathbb{N}}$, and suppose that there exists a real number $C$ such that $|\mathbf{y}_N| \le C$ for all $N \in \mathbb{N}$. Then, there exists a random variable $\mathbf{y}_\infty$ such that
$$\lim_{N\to\infty}\mathbf{y}_N = \mathbf{y}_\infty \ \text{a.s.} \quad \text{and} \quad \mathbb{E}(|\mathbf{y}_\infty|) < +\infty. \tag{112}$$
As a technical remark concerning the relation to Theorem 12.1 in chapter 10 of [37], one may note that the condition $|\mathbf{y}_N| \le C$ implies that $(\mathbf{y}_N)_{N\in\mathbb{N}}$ is uniformly integrable.

Our main interest is, however, not in these 'standard' real-valued martingales, but rather in operator-valued martingales.
It is again worth recalling that we here only consider finite-dimensional spaces, and hence we can represent each operator as a finite matrix with respect to some choice of basis. With this in mind, we say that an operator-valued process $(\mathbf{Y}_N)_{N\in\mathbb{N}}$ on a finite-dimensional Hilbert space is an operator-valued martingale with respect to a stochastic process $(\mathbf{x}_N)_{N\in\mathbb{N}}$ if each of $\mathrm{Re}\langle k|\mathbf{Y}_N|k'\rangle$ and $\mathrm{Im}\langle k|\mathbf{Y}_N|k'\rangle$ is a martingale with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$, for some fixed orthonormal basis $\{|k\rangle\}_{k=1}^{D}$. One may note that, in the finite-dimensional case, the conditions $\mathbb{E}(|\mathrm{Re}\langle k|\mathbf{Y}_N|k'\rangle|) < +\infty$ and $\mathbb{E}(|\mathrm{Im}\langle k|\mathbf{Y}_N|k'\rangle|) < +\infty$ are equivalent to
$$\mathbb{E}(\|\mathbf{Y}_N\|) < +\infty. \tag{113}$$
Similarly, the conditions
$$\begin{aligned} \mathbb{E}\big(\mathrm{Re}\langle k|\mathbf{Y}_{N+1}|k'\rangle \,\big|\, \mathbf{x}_N, \ldots, \mathbf{x}_1\big) &= \mathrm{Re}\langle k|\mathbf{Y}_N|k'\rangle \ \text{a.s.}, \\ \mathbb{E}\big(\mathrm{Im}\langle k|\mathbf{Y}_{N+1}|k'\rangle \,\big|\, \mathbf{x}_N, \ldots, \mathbf{x}_1\big) &= \mathrm{Im}\langle k|\mathbf{Y}_N|k'\rangle \ \text{a.s.}, \end{aligned} \tag{114}$$
can equivalently be stated as
$$\mathbb{E}(\mathbf{Y}_{N+1}|\mathbf{x}_N, \ldots, \mathbf{x}_1) = \mathbf{Y}_N \ \text{a.s.} \tag{115}$$
In a similar manner, Proposition 21 can be applied to the real and imaginary components of an operator-valued martingale, which yields the following operator counterpart of Proposition 21.

Proposition 22. Let $(\mathbf{Y}_N)_{N\in\mathbb{N}}$ be an operator-valued martingale on a finite-dimensional complex Hilbert space with respect to a real-valued process $(\mathbf{x}_N)_{N\in\mathbb{N}}$, and suppose that there exists a real number $C$ such that $\|\mathbf{Y}_N\| \le C$ for all $N \in \mathbb{N}$. Then, there exists a random operator $\mathbf{Y}_\infty$ such that
$$\lim_{N\to\infty}\mathbf{Y}_N = \mathbf{Y}_\infty \ \text{a.s.} \quad \text{and} \quad \mathbb{E}(\|\mathbf{Y}_\infty\|) < +\infty. \tag{116}$$

C Stochastic process of measurements

Consider a set of operators $\{A_x\}_{x=0}^{d-1}$ on a finite-dimensional Hilbert space $\mathcal{H}$, with $D := \dim\mathcal{H}$, such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$. We introduce the stochastic process $(\mathbf{x}_N)_{N\in\mathbb{N}}$ with a joint distribution such that, for each $N$, the marginal distribution of $\mathbf{x}_1, \ldots, \mathbf{x}_N$ is given by
$$P(\mathbf{x}_N = x_N, \ldots, \mathbf{x}_1 = x_1) = \frac{1}{D}\,\mathrm{Tr}\big(A_{x_N}\cdots A_{x_1}A_{x_1}^\dagger\cdots A_{x_N}^\dagger\big). \tag{117}$$
This means that $\mathbf{x}_1, \ldots, \mathbf{x}_N$ can be interpreted as the outcomes of a sequence of measurements, where the initial state is maximally mixed, i.e., $\sigma = \mathbb{1}/D$. Note that whenever we in the following refer to an expectation value, the underlying probability distribution is assumed to be (117) unless otherwise stated. It will be useful to note that
$$P(\mathbf{x}_{N+1} = x_{N+1}|\mathbf{x}_N = x_N, \ldots, \mathbf{x}_1 = x_1) = \frac{\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_{N+1}}^\dagger A_{x_{N+1}}A_{x_N}\cdots A_{x_1}\big)}{\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}\big)}, \tag{118}$$
where $P(y|x) := P(y, x)/P(x)$ denotes the conditional probability.

Based on the process $(\mathbf{x}_N)_{N\in\mathbb{N}}$, we define the sequences of random operators
$$\mathbf{A}_N := A_{\mathbf{x}_N}, \qquad \mathbf{W}_N := \mathbf{A}_N\cdots\mathbf{A}_1 = A_{\mathbf{x}_N}\cdots A_{\mathbf{x}_1}, \qquad \mathbf{M}_N := \begin{cases} \dfrac{\mathbf{W}_N^\dagger\mathbf{W}_N}{\mathrm{Tr}(\mathbf{W}_N^\dagger\mathbf{W}_N)} & \text{if } \mathrm{Tr}(\mathbf{W}_N^\dagger\mathbf{W}_N) \ne 0, \\ 0 & \text{if } \mathrm{Tr}(\mathbf{W}_N^\dagger\mathbf{W}_N) = 0. \end{cases} \tag{119}$$
One should note that $\mathbf{M}_N$ is Hermitian, i.e., $\mathbf{M}_N^\dagger = \mathbf{M}_N$. Moreover, $\mathbf{M}_N$ is positive semidefinite and has either trace 1 or trace 0. Hence, $\mathbf{M}_N$ is either a density operator or the zero operator. One may further note that
$$\mathrm{Tr}(\mathbf{W}_N^\dagger\mathbf{W}_N) = \mathrm{Tr}\big(A_{\mathbf{x}_1}^\dagger\cdots A_{\mathbf{x}_N}^\dagger A_{\mathbf{x}_N}\cdots A_{\mathbf{x}_1}\big) = P(\mathbf{x}_N, \ldots, \mathbf{x}_1)\,D, \tag{120}$$
from which we can conclude that $\mathbf{M}_N = 0$ with probability zero, i.e., $\mathbf{M}_N$ is almost surely a density operator.

As a side remark, one might note that $\mathbf{M}_N$ is not the post-measurement state of the measurement process. The post-measurement state would rather be $\boldsymbol{\rho}_N := \mathbf{W}_N\mathbf{W}_N^\dagger/\mathrm{Tr}(\mathbf{W}_N^\dagger\mathbf{W}_N)$. However, $\mathbf{M}_N$ and $\boldsymbol{\rho}_N$ have the same non-zero eigenvalues (as can be seen from a singular value decomposition of $\mathbf{W}_N$).
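The measurement process (117)–(119) can be made concrete with a small, hypothetical example: a qubit ($D = 2$) Kraus set built from amplitude-damping operators followed by a fixed rotation. It is chosen only because it satisfies $\sum_x A_x^\dagger A_x = \mathbb{1}$. It is not an example from this paper, and no claim is made that it satisfies the purity condition. The pure-Python sketch below enumerates all outcome histories of length $N = 4$. It then checks numerically that (117) is normalized, that $\mathbf{M}_N$ has unit trace and the same spectrum as $\boldsymbol{\rho}_N$, and that the one-step identity behind the martingale property of Lemma 23, $\mathbb{E}(\mathbf{M}_{N+1}|\mathbf{x}_N, \ldots, \mathbf{x}_1) = \mathbf{M}_N$, holds up to rounding. The parameters `q` and `theta` and all helper names are illustrative assumptions.

```python
import itertools
import math

D = 2                 # dimension of the Hilbert space in this toy example
q, theta = 0.6, 0.3   # hypothetical parameters, chosen only for illustration


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(D)) for j in range(D)] for i in range(D)]


def dagger(X):
    return [[X[j][i].conjugate() for j in range(D)] for i in range(D)]


def trace(X):
    return sum(X[i][i] for i in range(D))


def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def scale(c, X):
    return [[c * v for v in row] for row in X]


# Hypothetical Kraus set: amplitude-damping operators B_x followed by a rotation U.
# Since U is unitary, sum_x (B_x U)^dagger (B_x U) = U^dagger U = identity.
U = [[complex(math.cos(theta)), complex(-math.sin(theta))],
     [complex(math.sin(theta)), complex(math.cos(theta))]]
B = [[[1 + 0j, 0j], [0j, complex(math.sqrt(q))]],
     [[0j, complex(math.sqrt(1 - q))], [0j, 0j]]]
A = [matmul(Bx, U) for Bx in B]


def word(xs):
    """W_N = A_{x_N} ... A_{x_1} for the outcome tuple xs = (x_1, ..., x_N)."""
    W = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for x in xs:  # x_1 acts first, so it ends up as the rightmost factor
        W = matmul(A[x], W)
    return W


def prob(xs):
    """Eq. (117): Tr(W_N W_N^dagger) / D, i.e. measurements on the maximally mixed state."""
    W = word(xs)
    return trace(matmul(W, dagger(W))).real / D


def M(xs):
    """Eq. (119): W^dagger W / Tr(W^dagger W), or the zero operator if that trace vanishes."""
    W = word(xs)
    P = matmul(dagger(W), W)
    t = trace(P).real
    return scale(0, P) if t == 0 else scale(1 / t, P)


def rho(xs):
    """Post-measurement state W W^dagger / Tr(W^dagger W); assumes prob(xs) > 0."""
    W = word(xs)
    return scale(1 / trace(matmul(dagger(W), W)).real, matmul(W, dagger(W)))


def next_step_expectation(xs):
    """E(M_{N+1} | x_1, ..., x_N = xs), with outcome weights from (118); needs prob(xs) > 0."""
    p = prob(xs)
    out = scale(0, A[0])
    for x in range(len(A)):
        w = prob(xs + (x,)) / p
        Mx = M(xs + (x,))
        out = [[out[i][j] + w * Mx[i][j] for j in range(D)] for i in range(D)]
    return out


N = 4
histories = [xs for xs in itertools.product(range(len(A)), repeat=N) if prob(xs) > 0]
total_prob = sum(prob(xs) for xs in histories)
martingale_gap = max(abs(a - b)
                     for xs in histories
                     for row_a, row_b in zip(next_step_expectation(xs), M(xs))
                     for a, b in zip(row_a, row_b))
# For 2x2 matrices, equal trace and determinant means equal eigenvalues.
spectral_gap = max(abs(trace(M(xs)) - trace(rho(xs))) + abs(det2(M(xs)) - det2(rho(xs)))
                   for xs in histories)
print(f"sum of (117) over histories: {total_prob:.12f}")
print(f"max |E(M_N+1 | history) - M_N|: {martingale_gap:.1e}")
print(f"max spectral mismatch between M_N and rho_N: {spectral_gap:.1e}")
```

The printed sum should equal 1 and both gaps should be at the level of floating-point rounding. Replacing `A` by any other list of $2\times 2$ Kraus matrices that satisfies the completeness relation should not change this. Larger $D$ would require generalizing the $2\times 2$ helpers (`det2` and the identity matrix in `word`).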
The main reason why it is convenient to use $\mathbf{M}_N$ rather than $\boldsymbol{\rho}_N$ is that for $\mathbf{M}_N$ we can directly utilize $\sum_x A_x^\dagger A_x = \mathbb{1}$, which, for example, is used in the proof of the martingale property in Lemma 23.

In the following, it will be useful to observe that since $\mathbf{A}_N$ is a (deterministic) function of $\mathbf{x}_N$ [as seen from (119)], it is the case that
$$\mathbb{E}(\mathbf{A}_N|\mathbf{x}_N) = A_{\mathbf{x}_N} = \mathbf{A}_N. \tag{121}$$
Analogously, $\mathbb{E}(\mathbf{W}_N|\mathbf{x}_N, \ldots, \mathbf{x}_1) = \mathbf{W}_N$, and similarly $\mathbb{E}(\mathbf{M}_N|\mathbf{x}_N, \ldots, \mathbf{x}_1) = \mathbf{M}_N$.

D Elements of the proof of Proposition 5

D.1 $\lim_{N\to\infty}\mathbf{M}_N = \mathbf{M}_\infty$ a.s.

The purpose of this section is to show that $\mathbf{M}_N$ has a limit operator $\mathbf{M}_\infty$ in a sufficiently strong sense, and that this limit operator has 'nice' properties. We do this by first showing that $\mathbf{M}_N$ is a martingale relative to the sequence of measurement outcomes $\mathbf{x}_N$. This in turn yields almost sure convergence to a limiting operator $\mathbf{M}_\infty$. Recall that the underlying probability distribution is assumed to be (117), and that all expectations are taken with respect to this distribution.

Lemma 23. Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a finite-dimensional complex Hilbert space such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$. Then, $(\mathbf{M}_N)_{N\in\mathbb{N}}$, defined by (119), is an operator-valued martingale with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$ with distribution (117).

Proof. Since each $\mathbf{M}_N$ is a density operator or the zero operator, it follows that $\|\mathbf{M}_N\| \le 1$, and thus in particular that $\mathbb{E}(\|\mathbf{M}_N\|) \le 1 < +\infty$. Moreover, by the construction in (119), $\mathbf{M}_N$ (and thus its matrix elements with respect to a given basis) is a function of $\mathbf{x}_N, \ldots, \mathbf{x}_1$. By (118) and $\sum_{x_{N+1}=0}^{d-1} A_{x_{N+1}}^\dagger A_{x_{N+1}} = \mathbb{1}$, we find that
$$\begin{aligned}
&\mathbb{E}(\mathbf{M}_{N+1}|\mathbf{x}_N = x_N, \ldots, \mathbf{x}_1 = x_1) \\
&\quad= \sum_{x_{N+1}}\frac{A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_{N+1}}^\dagger A_{x_{N+1}}A_{x_N}\cdots A_{x_1}}{\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_{N+1}}^\dagger A_{x_{N+1}}A_{x_N}\cdots A_{x_1}\big)}\,P(\mathbf{x}_{N+1} = x_{N+1}|\mathbf{x}_N = x_N, \ldots, \mathbf{x}_1 = x_1) \\
&\quad= \frac{A_{x_1}^\dagger\cdots A_{x_N}^\dagger\big(\sum_{x_{N+1}}A_{x_{N+1}}^\dagger A_{x_{N+1}}\big)A_{x_N}\cdots A_{x_1}}{\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}\big)} \\
&\quad= \mathbb{E}(\mathbf{M}_N|\mathbf{x}_N = x_N, \ldots, \mathbf{x}_1 = x_1).
\end{aligned}$$
We can conclude that $\mathbb{E}(\mathbf{M}_{N+1}|\mathbf{x}_N, \ldots, \mathbf{x}_1) = \mathbb{E}(\mathbf{M}_N|\mathbf{x}_N, \ldots, \mathbf{x}_1) = \mathbf{M}_N$. Hence, $(\mathbf{M}_N)_{N\in\mathbb{N}}$ is a martingale with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$.

Lemma 24. Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a finite-dimensional complex Hilbert space such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$. Let $(\mathbf{M}_N)_{N\in\mathbb{N}}$ be as defined in (119) with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$ distributed as in (117). Then, there exists a random operator $\mathbf{M}_\infty$ such that
$$\lim_{N\to\infty}\mathbf{M}_N = \mathbf{M}_\infty \ \text{a.s.}, \tag{122}$$
$$\mathbf{M}_\infty \text{ is almost surely a density operator}, \tag{123}$$
$$\lim_{N\to\infty}\mathbb{E}(\mathbf{M}_N) = \mathbb{E}(\mathbf{M}_\infty), \tag{124}$$
$$\lim_{N\to\infty}\mathbb{E}(\|\mathbf{M}_N\|_2^2) = \mathbb{E}(\|\mathbf{M}_\infty\|_2^2), \tag{125}$$
$$\mathbb{E}(\|\mathbf{M}_\infty\|) < +\infty. \tag{126}$$

Proof. By Lemma 23, we know that $(\mathbf{M}_N)_{N\in\mathbb{N}}$ is a martingale with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$. Since each $\mathbf{M}_N$ is a density operator or the zero operator, it follows that
$$\|\mathbf{M}_N\| \le 1, \quad \forall N \in \mathbb{N}. \tag{127}$$
By Proposition 22, there exists a random operator $\mathbf{M}_\infty$ such that
$$\lim_{N\to\infty}\mathbf{M}_N = \mathbf{M}_\infty \ \text{a.s.}, \tag{128}$$
with $\mathbb{E}(\|\mathbf{M}_\infty\|) < +\infty$. By combining (128) with (127), Proposition 20 yields $\mathbb{E}(\mathbf{M}_N) \to \mathbb{E}(\mathbf{M}_\infty)$. Moreover, (128) yields $\lim_{N\to\infty}\|\mathbf{M}_N\|_2^2 = \|\mathbf{M}_\infty\|_2^2$ a.s. Since also $\|\mathbf{M}_N\|_2^2 = \mathrm{Tr}(\mathbf{M}_N^2) \le 1$, Proposition 17 with $\mathbf{x}_N := \|\mathbf{M}_N\|_2^2$ and $\mathbf{x}_\infty := \|\mathbf{M}_\infty\|_2^2$ yields $\mathbb{E}(\|\mathbf{M}_N\|_2^2) \to \mathbb{E}(\|\mathbf{M}_\infty\|_2^2)$. Finally, we show that $\mathbf{M}_\infty$ almost surely is a density operator, i.e., that $\mathbf{M}_\infty \ge 0$ and $\mathrm{Tr}\,\mathbf{M}_\infty = 1$ almost surely. From (128), it follows that $\lim_{N\to\infty}\langle\psi|\mathbf{M}_N|\psi\rangle = \langle\psi|\mathbf{M}_\infty|\psi\rangle$ a.s. Since $\langle\psi|\mathbf{M}_N|\psi\rangle \ge 0$, it follows that $\langle\psi|\mathbf{M}_\infty|\psi\rangle \ge 0$ a.s. Analogously, since $\mathrm{Tr}\,\mathbf{M}_N = 1$ almost surely, it follows that $\lim_{N\to\infty}\mathrm{Tr}\,\mathbf{M}_N = \mathrm{Tr}\,\mathbf{M}_\infty = 1$ a.s. Hence, $\mathbf{M}_\infty$ is almost surely a density operator.

D.2 If $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition, then $\mathrm{rank}(\mathbf{M}_\infty) = 1$ a.s.
The purpose of this section is to show that the limit operator $M_\infty$ is almost surely a rank-one operator whenever $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition. The first step (Lemma 25) is to show that the difference between the operators $M_{N+p}$ and $M_N$ vanishes as $N$ increases, even when conditioned on all the measurement outcomes $x_N,\ldots,x_1$.

Lemma 25. Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a finite-dimensional complex Hilbert space, such that $\sum_{x=0}^{d-1}A_x^\dagger A_x = \mathbb{1}$. Let $(M_N)_{N\in\mathbb{N}}$ be as defined in (119) with respect to $(x_N)_{N\in\mathbb{N}}$ and distributed as in (117). Then, for every $p\in\mathbb{N}$,
$$\lim_{N\to\infty}E\big(\|M_{N+p}-M_N\|\,\big|\,x_N,\ldots,x_1\big) = 0\quad\text{a.s.}\tag{129}$$

Proof. Recall that $M_N$ is a deterministic function of $x_N,\ldots,x_1$, and thus
$$E(M_N\,|\,x_N,\ldots,x_1) = M_N.\tag{130}$$
A direct consequence is that $M_N$ and $M_{N+p}$ are independent when conditioned on $x_N,\ldots,x_1$, and thus
$$E(M_{N+p}M_N\,|\,x_N,\ldots,x_1) = E(M_{N+p}\,|\,x_N,\ldots,x_1)\,E(M_N\,|\,x_N,\ldots,x_1).\tag{131}$$
By expanding $E\big((M_{N+p}-M_N)^2\big)$ one obtains cross-terms such as
$$E(M_{N+p}M_N) = E\Big(E(M_{N+p}M_N\,|\,x_N,\ldots,x_1)\Big) = E\Big(E(M_{N+p}\,|\,x_N,\ldots,x_1)\,E(M_N\,|\,x_N,\ldots,x_1)\Big),\tag{132}$$
where the last equality follows from the conditional independence in (131). By combining these observations with the martingale property shown in Lemma 23 (which, by the tower property, gives $E(M_{N+p}\,|\,x_N,\ldots,x_1) = M_N$) and with (130), we obtain
$$\begin{aligned}E\big((M_{N+p}-M_N)^2\big) &= E(M_{N+p}^2)+E(M_N^2)-2E\Big(E(M_N\,|\,x_N,\ldots,x_1)\,E(M_N\,|\,x_N,\ldots,x_1)\Big)\\ &= E(M_{N+p}^2)-E(M_N^2),\end{aligned}\tag{133}$$
where in the last step we have used (130).

Since $M_N$ is Hermitian, and $\|O\|_2^2 = \mathrm{Tr}(O^2)$ for Hermitian $O$, it follows that $\mathrm{Tr}\,E(M_{N+p}^2) = E(\|M_{N+p}\|_2^2)$, $\mathrm{Tr}\,E(M_N^2) = E(\|M_N\|_2^2)$ and $\mathrm{Tr}\,E\big((M_{N+p}-M_N)^2\big) = E(\|M_{N+p}-M_N\|_2^2)$. Together with (133), this yields
$$E(\|M_{N+p}\|_2^2)-E(\|M_N\|_2^2)\ge 0.\tag{134}$$
By using (133), we next observe that
$$\begin{aligned}\sum_{N=0}^{p-1}E(M_{N+k+1}^2)-\sum_{N=0}^{p-1}E(M_N^2) &= \sum_{N=0}^{k}E(M_{N+p}^2)-\sum_{N=0}^{k}E(M_N^2)\\ &= \sum_{N=0}^{k}E\big((M_{N+p}-M_N)^2\big)\\ &= E\Big(\sum_{N=0}^{k}E\big((M_{N+p}-M_N)^2\,\big|\,x_N,\ldots,x_1\big)\Big),\end{aligned}\tag{135}$$
where the first step is a telescoping rearrangement and the last step uses the general relation $E(E(y\,|\,x)) = E(y)$. By applying the trace to (135) we obtain
$$E\Big(\sum_{N=0}^{k}E\big(\|M_{N+p}-M_N\|_2^2\,\big|\,x_N,\ldots,x_1\big)\Big) = \sum_{N=0}^{p-1}\Big[E(\|M_{N+k+1}\|_2^2)-E(\|M_N\|_2^2)\Big].$$
By Lemma 24, $\lim_{N\to\infty}M_N = M_\infty$ almost surely, and hence $\lim_{k\to\infty}\|M_{N+k+1}\|_2^2 = \|M_\infty\|_2^2$ a.s. Next, since $M_N$ is a density operator or the zero operator, $\|M_N\|_2^2\le 1$. With $x_k := \|M_{N+k+1}\|_2^2$ and $x_\infty := \|M_\infty\|_2^2$, Proposition 17 yields
$$\lim_{k\to\infty}E(\|M_{N+k+1}\|_2^2) = E(\|M_\infty\|_2^2).\tag{136}$$
By Lemma 24, $M_\infty$ is almost surely a density operator, and thus $\|M_\infty\|_2^2\le 1$ a.s. With $x := \|M_\infty\|_2^2$ and $y := 1$ in Lemma 13, we get
$$E(\|M_\infty\|_2^2)\le 1.\tag{137}$$
With $p := k+1$ in the inequality (134), it follows that $E(\|M_{N+k+1}\|_2^2)-E(\|M_N\|_2^2)\ge 0$, which by (136) implies $E(\|M_\infty\|_2^2)-E(\|M_N\|_2^2)\ge 0$. Hence,
$$\lim_{k\to\infty}E\Big(\sum_{N=0}^{k}E\big(\|M_{N+p}-M_N\|_2^2\,\big|\,x_N,\ldots,x_1\big)\Big) = \sum_{N=0}^{p-1}\Big[E(\|M_\infty\|_2^2)-E(\|M_N\|_2^2)\Big] =: R(p).\tag{138}$$
Since $E(\|M_\infty\|_2^2)-E(\|M_N\|_2^2)\ge 0$, it follows that $R(p)\ge 0$, and by (137), $R(p)\le p<+\infty$.

Define $r_N := E(\|M_{N+p}-M_N\|_2^2\,|\,x_N,\ldots,x_1)$. Note that $r_N\ge 0$ and $E(r_N) = E(\|M_{N+p}-M_N\|_2^2)$. Since $M_N$ is either a density operator or the zero operator, $\|M_N\|_2\le 1$, and thus $\|M_{N+p}-M_N\|_2^2\le(\|M_{N+p}\|_2+\|M_N\|_2)^2\le 4$. Hence $E(r_N)\le 4$. Moreover, by (138), $\lim_{k\to\infty}E\big(\sum_{N=0}^{k}r_N\big) = R(p)<+\infty$. All the conditions of Lemma 19 are thus satisfied, and it yields, for almost every $\omega$,
$$\lim_{N\to\infty}E\big(\|M_{N+p}(\omega)-M_N(\omega)\|_2^2\,\big|\,x_N(\omega),\ldots,x_1(\omega)\big) = 0.\tag{139}$$
Next, since $x\mapsto x^2$ is convex, Jensen's inequality gives $\big(E(X\,|\,\cdot)\big)^2\le E(X^2\,|\,\cdot)$. By combining this observation with (139) we obtain
$$\lim_{N\to\infty}E\big(\|M_{N+p}-M_N\|_2\,\big|\,x_N,\ldots,x_1\big) = 0\quad\text{a.s.}\tag{140}$$
By the general relation $\|R\|\le\|R\|_2$ between the supremum norm and the Hilbert-Schmidt norm, we get $E(\|R\|\,|\,x_N,\ldots,x_1)\le E(\|R\|_2\,|\,x_N,\ldots,x_1)$, and thus (140) yields (129).

The following lemma provides a reformulation of the purity condition that is better suited to the proof technique that we employ.

Lemma 26. Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a complex finite-dimensional Hilbert space, $\mathcal{H}$. Then, $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition if and only if the following condition holds: if $O$ is an operator on $\mathcal{H}$ such that
$$O^\dagger A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}O\propto O^\dagger O,\quad\forall N\in\mathbb{N},\ \forall(x_1,\ldots,x_N)\in\{0,\ldots,d-1\}^{\times N},\quad\text{then }\mathrm{rank}(O) = 1.\tag{141}$$

Proof. We first prove that if $\{A_x\}_{x=0}^{d-1}$ satisfies condition (141), then it also satisfies the purity condition. Suppose that condition (141) holds. Restricting to operators $O = P$, where $P$ is a projector, we find that condition (33) holds, and hence $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition.

Conversely, we show that if $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition, then it also satisfies condition (141). Hence, assume that $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition. Let $O$ be any operator on $\mathcal{H}$ such that
$$O^\dagger A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}O\propto O^\dagger O\tag{142}$$
for all $N$ and all $x_1,\ldots,x_N$. We note that $OO^\dagger$ is positive semidefinite. Let $(OO^\dagger)^{\ominus}$ denote the inverse on the support of $OO^\dagger$, such that $(OO^\dagger)^{\ominus}OO^\dagger = OO^\dagger(OO^\dagger)^{\ominus} = P$, where $P$ is the projector onto the support of $OO^\dagger$. Multiplying (142) from the left with $(OO^\dagger)^{\ominus}O$ and from the right with $O^\dagger(OO^\dagger)^{\ominus}$ results in $PA_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}P\propto P$, since the right-hand side becomes $(OO^\dagger)^{\ominus}OO^\dagger\,OO^\dagger(OO^\dagger)^{\ominus} = P$. Since the purity condition is assumed to hold, it follows that $\mathrm{rank}(P) = 1$. Moreover, $\mathrm{rank}(P) = \mathrm{rank}(OO^\dagger) = \mathrm{rank}(O)$. We can thus conclude that if $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition, then it also satisfies condition (141).

In the following lemma we use the convergence in (140) to show that $M_\infty$ is almost surely a rank-one operator. A key step in the proof is the equality (144) below. Together with (140) and the observation that $\|\sqrt{M_N}\|\le 1$, it yields (145). Since $\lim_{N\to\infty}M_N = M_\infty$ a.s., it seems reasonable that in the limit $N\to\infty$ we obtain the proportionality in (149). Via Lemma 26, this proportionality implies the desired result that $M_\infty$ is almost surely a rank-one operator.

However, this reasoning has a complication, namely the sequence of unitary operators $U_N$. These unitary operators result from a polar decomposition of the operators $A_{x_N}\cdots A_{x_1}/\sqrt{\mathrm{Tr}(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1})}$. We have very little control over the sequence $(U_N)_{N\in\mathbb{N}}$, and in particular over whether it possesses a limit $U_\infty$. We can mend this issue by using the fact that the set of unitary operators on a finite-dimensional Hilbert space is sequentially compact. Recall that a topological space, $C$, is sequentially compact if, for every sequence $(x_j)_{j\in\mathbb{N}}\subset C$, there exists a subsequence $(x_{j_k})_{k\in\mathbb{N}}$ that converges to an element of $C$. On a complex Hilbert space of finite dimension $D$, the set of unitary operators, $U(D)$, forms a sequentially compact (as well as compact) space. Hence, for every sequence $(U_j)_{j\in\mathbb{N}}$ in $U(D)$ there exists a subsequence $(U_{j_k})_{k\in\mathbb{N}}$ that converges to an element of $U(D)$.

Lemma 27. With the assumptions in Lemma 24, let $M_\infty$ be the random operator guaranteed by Lemma 24. If $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition, then
$$\mathrm{rank}(M_\infty) = 1\quad\text{a.s.}\tag{143}$$

Proof. We define $M_{x_N,\ldots,x_1} := A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}/\mathrm{Tr}(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1})$, and thus $M_N = M_{\mathbf{x}_N,\ldots,\mathbf{x}_1}$, where bold symbols denote the random outcomes.
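The polar decomposition used in this proof can be made concrete for $D = 2$. The sketch below makes three assumptions:

- The word $W = A_{x_N}\cdots A_{x_1}$ is invertible. For singular $W$ the unitary factor is not unique.
- The example matrix is arbitrary.
- The helper names are illustrative only.

It computes $M = W^\dagger W/\mathrm{Tr}(W^\dagger W)$ and its square root via the $2\times 2$ identity $\sqrt{M} = (M+\sqrt{\det M}\,\mathbb{1})/\sqrt{\mathrm{Tr}\,M+2\sqrt{\det M}}$ (a consequence of the Cayley-Hamilton theorem). It then sets $U = \big(W/\sqrt{\mathrm{Tr}(W^\dagger W)}\big)\sqrt{M}^{-1}$ and checks that $U$ is unitary with $U\sqrt{M} = W/\sqrt{\mathrm{Tr}(W^\dagger W)}$.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dag(X):
    return [[complex(X[j][i]).conjugate() for j in range(2)] for i in range(2)]


def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))


def sqrt_psd_2x2(M):
    """sqrt(M) = (M + s I) / t with s = sqrt(det M), t = sqrt(Tr M + 2 s), for 2x2 M >= 0."""
    s = math.sqrt(max((M[0][0] * M[1][1] - M[0][1] * M[1][0]).real, 0.0))
    t = math.sqrt((M[0][0] + M[1][1]).real + 2 * s)
    return [[(M[i][j] + (s if i == j else 0.0)) / t for j in range(2)] for i in range(2)]


def inv_2x2(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]


def polar_normalized(W):
    """Return (U, sqrt(M), M, Wn) with Wn = W / sqrt(Tr W^dag W) = U sqrt(M)."""
    G = matmul(dag(W), W)
    tr = (G[0][0] + G[1][1]).real
    M = [[G[i][j] / tr for j in range(2)] for i in range(2)]
    R = sqrt_psd_2x2(M)
    Wn = [[W[i][j] / math.sqrt(tr) for j in range(2)] for i in range(2)]
    U = matmul(Wn, inv_2x2(R))  # requires W (hence sqrt(M)) to be invertible
    return U, R, M, Wn


W = [[1.0, 2j], [0.5, -1.0]]  # arbitrary invertible example
U, R, M, Wn = polar_normalized(W)
I2 = [[1.0, 0.0], [0.0, 1.0]]
print("U unitary:", close(matmul(dag(U), U), I2))
print("U sqrt(M) == W / sqrt(Tr W^dag W):", close(matmul(U, R), Wn))
print("sqrt(M)^2 == M:", close(matmul(R, R), M))
```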
We make a polar decomposition with a unitary operator $U_{x_N,\ldots,x_1}$ such that $U_{x_N,\ldots,x_1}\sqrt{M_{x_N,\ldots,x_1}} = A_{x_N}\cdots A_{x_1}/\sqrt{\mathrm{Tr}(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1})}$, and define $U_N := U_{\mathbf{x}_N,\ldots,\mathbf{x}_1}$. Then, abbreviating $M := M_{x_N,\ldots,x_1}$ and $U := U_{x_N,\ldots,x_1}$, we have
$$\begin{aligned}&E\big(\|M_{N+p}-M_N\|\,\big|\,\mathbf{x}_N = x_N,\ldots,\mathbf{x}_1 = x_1\big)\\ &\quad= \sum_{x_{N+p},\ldots,x_{N+1}}\Big\|\sqrt{M}\,U^\dagger A_{x_{N+1}}^\dagger\cdots A_{x_{N+p}}^\dagger A_{x_{N+p}}\cdots A_{x_{N+1}}U\sqrt{M}-M\,\mathrm{Tr}\big(A_{x_{N+1}}^\dagger\cdots A_{x_{N+p}}^\dagger A_{x_{N+p}}\cdots A_{x_{N+1}}UMU^\dagger\big)\Big\|\\ &\quad= E\Big(\sum_{x'_p,\ldots,x'_1}\Big\|\sqrt{M_N}\,U_N^\dagger A_{x'_1}^\dagger\cdots A_{x'_p}^\dagger A_{x'_p}\cdots A_{x'_1}U_N\sqrt{M_N}-M_N\,\mathrm{Tr}\big(A_{x'_1}^\dagger\cdots A_{x'_p}^\dagger A_{x'_p}\cdots A_{x'_1}U_NM_NU_N^\dagger\big)\Big\|\,\Big|\,\mathbf{x}_N = x_N,\ldots,\mathbf{x}_1 = x_1\Big).\end{aligned}$$
In the first equality, each summand is the conditional probability $\mathrm{Tr}(A_{x_{N+1}}^\dagger\cdots A_{x_{N+p}}^\dagger A_{x_{N+p}}\cdots A_{x_{N+1}}UMU^\dagger)$ of the outcomes $x_{N+1},\ldots,x_{N+p}$ multiplied by $\|M_{N+p}-M_N\|$. In the second equality, we have renamed the indices $x_{N+1},\ldots,x_{N+p}$ to $x'_1,\ldots,x'_p$. Consequently,
$$E\big(\|M_{N+p}-M_N\|\,\big|\,\mathbf{x}_N,\ldots,\mathbf{x}_1\big) = \sum_{x'_p,\ldots,x'_1}\Big\|\sqrt{M_N}\,U_N^\dagger A_{x'_1}^\dagger\cdots A_{x'_p}^\dagger A_{x'_p}\cdots A_{x'_1}U_N\sqrt{M_N}-M_N\,\mathrm{Tr}\big(A_{x'_1}^\dagger\cdots A_{x'_p}^\dagger A_{x'_p}\cdots A_{x'_1}U_NM_NU_N^\dagger\big)\Big\|,\tag{144}$$
where we have used that $M_N$ and $U_N$ are deterministic functions of $\mathbf{x}_N,\ldots,\mathbf{x}_1$. Since $M_N$ is a density operator or the zero operator, it follows that $\|\sqrt{M_N}\|\le 1$.
By combining $\|\sqrt{M_N}\|\le 1$ with (144) and with Lemma 25 (multiplying the operator inside the norm by $\sqrt{M_N}$ from both sides does not increase the norm), it follows that
$$\lim_{N\to\infty}\sum_{x_p,\ldots,x_1}\Big\|M_NU_N^\dagger A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_NM_N-M_N^2\,\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_NM_NU_N^\dagger\big)\Big\| = 0\quad\text{a.s.}\tag{145}$$
Next, we recall that Lemma 24 guarantees that $\lim_{N\to\infty}M_N = M_\infty$ a.s., where $M_\infty$ is almost surely a density operator. Let $\omega\in\Omega$ satisfy the following:

- $\lim_{N\to\infty}M_N(\omega) = M_\infty(\omega)$.
- $M_\infty(\omega)$ is a density operator.
- The limit in (145) holds for every $p\in\mathbb{N}$. This is possible because a countable intersection of almost sure events is again almost sure.

For this $\omega$ we obtain a sequence of unitary operators $(U_N(\omega))_{N\in\mathbb{N}}\subset U(D)$. By the sequential compactness of $U(D)$, there exist a subsequence $(U_{N_k}(\omega))_{k\in\mathbb{N}}$ and an element $U_\infty(\omega)\in U(D)$ such that $\lim_{k\to\infty}U_{N_k}(\omega) = U_\infty(\omega)$. It remains true that $\lim_{k\to\infty}M_{N_k}(\omega) = M_\infty(\omega)$, and similarly the limit in (145) remains true with $N$ replaced by $N_k$. With the definition
$$B_k(\omega) := \sum_{x_p,\ldots,x_1}\Big\|M_{N_k}(\omega)U_{N_k}(\omega)^\dagger A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_{N_k}(\omega)M_{N_k}(\omega)-M_{N_k}(\omega)^2\,\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_{N_k}(\omega)M_{N_k}(\omega)U_{N_k}(\omega)^\dagger\big)\Big\|,$$
it thus follows by (145) that $B_k(\omega)\to 0$. Define
$$B_\infty(\omega) := \sum_{x_p,\ldots,x_1}\Big\|M_\infty(\omega)U_\infty(\omega)^\dagger A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_\infty(\omega)M_\infty(\omega)-M_\infty(\omega)^2\,\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_\infty(\omega)M_\infty(\omega)U_\infty(\omega)^\dagger\big)\Big\|.\tag{146}$$
Next we wish to show that $B_k(\omega)\to B_\infty(\omega)$.
By the reverse triangle inequality, a rearrangement, and the triangle inequality, one obtains
$$\begin{aligned}|B_\infty(\omega)-B_k(\omega)|\le{}&\sum_{x_p,\ldots,x_1}\Big\|M_\infty(\omega)U_\infty(\omega)^\dagger A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_\infty(\omega)M_\infty(\omega)-M_{N_k}(\omega)U_{N_k}(\omega)^\dagger A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_{N_k}(\omega)M_{N_k}(\omega)\Big\|\\ &+\sum_{x_p,\ldots,x_1}\Big\|M_\infty(\omega)^2\,\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_\infty(\omega)M_\infty(\omega)U_\infty(\omega)^\dagger\big)-M_{N_k}(\omega)^2\,\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_{N_k}(\omega)M_{N_k}(\omega)U_{N_k}(\omega)^\dagger\big)\Big\|.\end{aligned}\tag{147}$$
The goal is to use the facts that $M_{N_k}(\omega)\to M_\infty(\omega)$, and consequently $M_{N_k}(\omega)^2\to M_\infty(\omega)^2$, and that $U_{N_k}(\omega)\to U_\infty(\omega)$. To this end, in the first sum in (147), one can subtract and add $M_{N_k}(\omega)U_{N_k}(\omega)^\dagger A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_\infty(\omega)M_\infty(\omega)$ inside the norm. Similarly, in the second sum, one can subtract and add $M_{N_k}(\omega)^2\,\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_\infty(\omega)M_\infty(\omega)U_\infty(\omega)^\dagger\big)$ inside the norm.
One can repeatedly use the triangle inequality and such subtractions and additions to bound (147). The ingredients are:

- the general relations $\|AB\|\le\|A\|\,\|B\|$ and $|\mathrm{Tr}(AB)|\le\|A\|\,\|B\|_1$;
- the observations $\|U_\infty(\omega)M_\infty(\omega)\|\le 1$, $\|M_{N_k}(\omega)U_{N_k}(\omega)^\dagger\|\le 1$, $\|U_\infty(\omega)M_\infty(\omega)U_\infty(\omega)^\dagger\|_1 = \|M_\infty(\omega)\|_1 = 1$ and $\|M_{N_k}(\omega)\|\le 1$;
- the bound $\sum_{x_p,\ldots,x_1}\|A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}\|\le d^p$.

This gives
$$|B_\infty(\omega)-B_k(\omega)|\le C_k(\omega),$$
where $C_k(\omega)$ is a sum of eight terms. Each term is $d^p$ times a norm of one of the differences $M_\infty(\omega)-M_{N_k}(\omega)$, $M_\infty(\omega)^2-M_{N_k}(\omega)^2$, $U_\infty(\omega)-U_{N_k}(\omega)$ or $U_\infty(\omega)^\dagger-U_{N_k}(\omega)^\dagger$. Since all of these differences vanish as $k\to\infty$, we have $C_k(\omega)\to 0$. We can conclude that $B_\infty(\omega)-B_k(\omega)\le|B_\infty(\omega)-B_k(\omega)|\le C_k(\omega)$, which implies $0\le B_\infty(\omega)\le B_k(\omega)+C_k(\omega)$. By combining this observation with $B_k(\omega)\to 0$ and $C_k(\omega)\to 0$, as well as with the definition of $B_\infty(\omega)$ in (146), we can conclude that
$$B_\infty(\omega) = 0,\tag{148}$$
i.e., every term of the sum in (146) vanishes. This in turn implies
$$M_\infty(\omega)U_\infty(\omega)^\dagger A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_\infty(\omega)M_\infty(\omega)\propto M_\infty(\omega)U_\infty(\omega)^\dagger U_\infty(\omega)M_\infty(\omega).\tag{149}$$
Since $p\in\mathbb{N}$ and $x_1,\ldots,x_p$ are arbitrary, and we have assumed the purity condition, it follows by Lemma 26, with $O := U_\infty(\omega)M_\infty(\omega)$, that $\mathrm{rank}(M_\infty(\omega)) = \mathrm{rank}(U_\infty(\omega)M_\infty(\omega)) = 1$.
Since this holds for almost all elements $\omega$ of the sample space, we can conclude that $\mathrm{rank}(M_\infty) = 1$ a.s.

D.3 $\mathrm{rank}(M_\infty) = 1$ a.s. implies the purity condition

In the previous section we demonstrated that the purity condition is sufficient for $M_\infty$ to be a rank-one operator. Here we show that it is also necessary. The idea is to assume that $M_\infty$ has rank one, but that the purity condition does not hold. The latter means that there exists a projector $P$ with $\mathrm{rank}(P)>1$ such that $PA_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}P\propto P$. We then show that this implies $PM_\infty(\omega)P\propto P$. However, since $\mathrm{rank}(M_\infty) = 1$, the only possibility is that $PM_\infty(\omega)P = 0$. This turns out to contradict $\sum_{x=0}^{d-1}A_x^\dagger A_x = \mathbb{1}$.

Lemma 28. Let $\{A_x\}_{x=0}^{d-1}$ be operators on a finite-dimensional complex Hilbert space, $\mathcal{H}$, such that $\sum_{x=0}^{d-1}A_x^\dagger A_x = \mathbb{1}$. Let $(M_N)_{N\in\mathbb{N}}$ be as defined in (119) with respect to $(x_N)_{N\in\mathbb{N}}$ and distributed as in (117). Let $M_\infty := \lim_{N\to\infty}M_N$ a.s., as guaranteed by Lemma 24. If $\mathrm{rank}(M_\infty) = 1$ a.s., then $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition in Definition 3.

Proof. We proceed by contradiction, and thus assume that $\{A_x\}_{x=0}^{d-1}$ is such that $\mathrm{rank}(M_\infty) = 1$ a.s., but that the purity condition does not hold. The latter means that there exists a projector $P$ such that
$$PA_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}P\propto P,\quad\forall N\in\mathbb{N},\ \forall(x_1,\ldots,x_N)\in\{0,\ldots,d-1\}^{\times N},\tag{150}$$
but $\mathrm{rank}(P)>1$. Recall that
$$M_N = \begin{cases}\dfrac{A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}}{\mathrm{Tr}(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1})} & \text{if }\mathrm{Tr}(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1})\neq 0,\\ 0 & \text{if }\mathrm{Tr}(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}) = 0,\end{cases}\tag{151}$$
where we note that $\mathrm{Tr}(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}) = 0$ if and only if $A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1} = 0$. By (150), it thus follows that $PM_NP\propto P$. Let $\omega\in\Omega$ be such that $\lim_{N\to\infty}M_N(\omega) = M_\infty(\omega)$. Consequently,
$$\lim_{N\to\infty}\|PM_N(\omega)P-PM_\infty(\omega)P\| = 0.\tag{152}$$
Since $PM_NP\propto P$, there exists a proportionality constant, $a_N(\omega)$, for each $N$ and $\omega$, such that
$$PM_N(\omega)P = a_N(\omega)P.\tag{153}$$
Next we use the general relation $|\mathrm{Tr}(AB)|\le\|A\|_1\|B\|$ to show
$$\big|a_N(\omega)\mathrm{Tr}(P)-\mathrm{Tr}(PM_\infty(\omega)P)\big| = \Big|\mathrm{Tr}\Big(\mathbb{1}\big(a_N(\omega)P-PM_\infty(\omega)P\big)\Big)\Big|\le D\,\|PM_N(\omega)P-PM_\infty(\omega)P\|\to 0,\tag{154}$$
where we have used $\|\mathbb{1}\|_1 = D$, (153) and (152). With $a_\infty(\omega) := \mathrm{Tr}(PM_\infty(\omega))/\mathrm{Tr}(P)$, we can thus conclude that $\lim_{N\to\infty}|a_N(\omega)-a_\infty(\omega)| = 0$. Hence,
$$\begin{aligned}\|a_\infty(\omega)P-PM_\infty(\omega)P\| &= \|a_\infty(\omega)P-a_N(\omega)P+a_N(\omega)P-PM_\infty(\omega)P\|\\ &\le|a_\infty(\omega)-a_N(\omega)|+\|PM_N(\omega)P-PM_\infty(\omega)P\|\to 0.\end{aligned}$$
We can thus conclude that $PM_\infty(\omega)P\propto P$, and hence $PM_\infty P\propto P$ a.s. However, since $M_\infty$ is by assumption rank-one a.s., and $\mathrm{rank}(P)>1$, the only possibility is that the proportionality constant is zero, i.e., that $PM_\infty P = 0$ a.s. Next, we note that $E(M_N) = \mathbb{1}/D$. By (124) in Lemma 24, $E(M_N)\to E(M_\infty)$, and thus $E(M_\infty) = \mathbb{1}/D$. However, this contradicts $PM_\infty P = 0$ a.s., since it would give $P/D = PE(M_\infty)P = E(PM_\infty P) = 0$, whereas $P\neq 0$.

D.4 $w(N)$ goes to zero exponentially if and only if the purity condition holds

In the previous sections we have shown that $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition if and only if $\mathrm{rank}(M_\infty) = 1$ a.s. Here we show that the latter is in turn equivalent to $\lim_{N\to\infty}w(N) = 0$, and that this is in turn equivalent to $w(N)$ converging exponentially fast to zero.

If $M_\infty$ has rank one, then, being a density operator, it satisfies $\|M_\infty\| = 1$. We can then relate $\|M_\infty\|-\|M_N\| = 1-\|M_N\|$ to the eigenvalues of $M_N$ and the singular values of $W_N$. The latter directly connects to the definition of $w(N)$ in (157).
To this end, we introduce the following notation. For a general operator, $O$, on a space of finite dimension, $D$, let $\nu_1^\downarrow(O)\ge\cdots\ge\nu_D^\downarrow(O)$ be the ordered singular values of $O$. Similarly, for a Hermitian operator, $J$, let $\lambda_1^\downarrow(J)\ge\cdots\ge\lambda_D^\downarrow(J)$ be the ordered eigenvalues of $J$.

The fact that $w(N)$ converges to zero does not, of course, guarantee that $w(N)$ converges exponentially fast to zero. We obtain the latter by first showing that $w(N)$ is submultiplicative, i.e., $w(N+M)\le w(N)w(M)$, which implies that $\log w(N)$ is subadditive. We obtain the submultiplicativity by rewriting $w(N)$ in terms of the norm of the second-order exterior power of $A_{x_N}\cdots A_{x_1}$.

To introduce the exterior power of an operator, consider a Hilbert space, $\mathcal{H}$, with an orthonormal basis, $\{|j\rangle\}_{j=1}^{D}$, where $D = \dim\mathcal{H}$. On the product space $\mathcal{H}\otimes\mathcal{H}$, we construct the swap operator, $S := \sum_{j,k=1}^{D}|j\rangle\langle k|\otimes|k\rangle\langle j|$, where one may note that $S^2 = \mathbb{1}\otimes\mathbb{1}$ and $S^\dagger = S$. We also define the projector $P_A := \frac{1}{2}(\mathbb{1}\otimes\mathbb{1}-S)$ onto the antisymmetric subspace of $\mathcal{H}\otimes\mathcal{H}$. For an operator $O$ on $\mathcal{H}$, we define the exterior power (of degree two) of $O$ as $\wedge^2(O) := P_A[O\otimes O]P_A$. A consequence of this definition is that $\|\wedge^2(O)\| = \nu_1^\downarrow(O)\nu_2^\downarrow(O)$. Moreover, since $O\otimes O$ commutes with $S$ and hence with $P_A$, we have $\wedge^2(O_1O_2) = \wedge^2(O_1)\wedge^2(O_2)$, and consequently $\|\wedge^2(O_1O_2)\|\le\|\wedge^2(O_1)\|\,\|\wedge^2(O_2)\|$. By comparing these definitions with (157) below, we can conclude that $w(N) = \sum_{x_1,\ldots,x_N=0}^{d-1}\|\wedge^2(A_{x_N}\cdots A_{x_1})\|$. A further observation is that
$$\|\wedge^2(O)\|\le\|O\|^2.\tag{155}$$
The exponential decay of $w(N)$ is obtained by combining $\lim_{N\to\infty}w(N) = 0$ with the subadditivity of $\log w(N)$ and Fekete's subadditive lemma. Fekete's lemma is commonly attributed to [39].
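The exterior-power identities above can be checked concretely in the smallest case, $D = 2$. There the antisymmetric subspace is one-dimensional, so one expects $\wedge^2(O) = \det(O)\,P_A$ and hence $\|\wedge^2(O)\| = |\det O| = \nu_1^\downarrow(O)\nu_2^\downarrow(O)$. This is a minimal sketch; the basis order $|00\rangle,|01\rangle,|10\rangle,|11\rangle$ and the two example matrices are arbitrary choices of ours. It builds $S$, $P_A$ and $\wedge^2(O)$ as explicit $4\times 4$ matrices and checks three things:

- the identity $\wedge^2(O) = \det(O)\,P_A$;
- the multiplicativity $\wedge^2(O_1O_2) = \wedge^2(O_1)\wedge^2(O_2)$;
- the bound (155).

```python
import math


def mm(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def kron(X, Y):
    """Tensor product of 2x2 matrices in the basis |00>, |01>, |10>, |11>."""
    return [[X[i // 2][j // 2] * Y[i % 2][j % 2] for j in range(4)] for i in range(4)]


def det2(O):
    return O[0][0] * O[1][1] - O[0][1] * O[1][0]


def singular_values(O):
    """(nu_1, nu_2) from the eigenvalues of the Hermitian 2x2 matrix O^dag O."""
    Od = [[complex(O[j][i]).conjugate() for j in range(2)] for i in range(2)]
    H = mm(Od, O)
    tr = (H[0][0] + H[1][1]).real
    dt = (H[0][0] * H[1][1] - H[0][1] * H[1][0]).real
    disc = math.sqrt(max(tr * tr / 4 - dt, 0.0))
    return math.sqrt(tr / 2 + disc), math.sqrt(max(tr / 2 - disc, 0.0))


# Swap S|a b> = |b a>; P_A = (1 - S)/2 projects onto the antisymmetric subspace.
S = [[1.0 if i == 2 * (j % 2) + j // 2 else 0.0 for j in range(4)] for i in range(4)]
PA = [[((1.0 if i == j else 0.0) - S[i][j]) / 2 for j in range(4)] for i in range(4)]


def wedge2(O):
    """Degree-two exterior power P_A (O x O) P_A."""
    return mm(mm(PA, kron(O, O)), PA)


def close4(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(4) for j in range(4))


O1 = [[1.0, 2.0], [3.0, 4.0]]  # arbitrary examples
O2 = [[0.5, 1j], [-1.0, 2.0]]
for O in (O1, O2):
    n1, n2 = singular_values(O)
    expected = [[det2(O) * PA[i][j] for j in range(4)] for i in range(4)]
    print("wedge2(O) == det(O) P_A:", close4(wedge2(O), expected),
          "| nu1*nu2 == |det O|:", abs(n1 * n2 - abs(det2(O))) < 1e-9,
          "| (155):", abs(det2(O)) <= n1 ** 2)
print("multiplicative:", close4(wedge2(mm(O1, O2)), mm(wedge2(O1), wedge2(O2))))
```

For $D>2$ the antisymmetric subspace has dimension $D(D-1)/2$, and $\wedge^2(O)$ is no longer a multiple of $P_A$. The same construction still applies with larger matrices.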
For a proof of Fekete's lemma, see Lemma 1.2.1 in [40], and for a historical overview, see Section 1.10 in [40].

Lemma 29 (Fekete's subadditive lemma). Let $(a_N)_{N\in\mathbb{N}}$ be a subadditive sequence of real numbers, i.e., $a_{N+M}\le a_N+a_M$. Then the limit $\lim_{N\to\infty}a_N/N$ is well defined (but may be $-\infty$) and
$$\lim_{N\to\infty}\frac{a_N}{N} = \inf_{N\in\mathbb{N}}\frac{a_N}{N}.\tag{156}$$

Proposition 30. Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a finite-dimensional Hilbert space, such that $\sum_{x=0}^{d-1}A_x^\dagger A_x = \mathbb{1}$. Define
$$w(N) := \sum_{x_1,\ldots,x_N=0}^{d-1}\nu_1^\downarrow(A_{x_N}\cdots A_{x_1})\,\nu_2^\downarrow(A_{x_N}\cdots A_{x_1}).\tag{157}$$
The following statements are equivalent:
1. $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition in Definition 3.
2. $\lim_{N\to\infty}w(N) = 0$.
3. There exist real constants $C'\ge 0$ and $0<\gamma<1$ such that
$$w(N)\le C'\gamma^N,\quad\forall N\in\mathbb{N}.\tag{158}$$

Proof. $1\Rightarrow 2$: Let $M_N$ be as defined in (119). We first distinguish the two cases that $M_N$ is a density operator, or that it is the zero operator. In the case that $M_N$ is a density operator, $1 = \mathrm{Tr}(M_N)\ge\lambda_1^\downarrow(M_N)+\lambda_2^\downarrow(M_N)$, and thus $1\ge 1-\lambda_1^\downarrow(M_N)\ge\lambda_2^\downarrow(M_N)\ge 0$. Noting that $\|M_N\| = \lambda_1^\downarrow(M_N)$, we thus get $\sqrt{\|M_N\|(1-\|M_N\|)}\ge\sqrt{\lambda_1^\downarrow(M_N)\lambda_2^\downarrow(M_N)}$. Since $M_N$ is a density operator, $\|M_N\|\le 1$, and thus
$$\sqrt{|1-\|M_N\||}\ge\sqrt{\lambda_1^\downarrow(M_N)\lambda_2^\downarrow(M_N)}.\tag{159}$$
In the case that $M_N$ is the zero operator, (159) is trivially true.

By Lemma 24, $M_\infty$ is almost surely a density operator. By Lemma 27, $M_\infty$ is also almost surely a rank-one operator. Hence, $M_\infty$ almost surely corresponds to a pure state, and consequently $\|M_\infty\| = 1$ a.s. Combining this observation with the reverse triangle inequality yields
$$\sqrt{\|M_\infty-M_N\|}\ge\sqrt{\big|\|M_\infty\|-\|M_N\|\big|} = \sqrt{|1-\|M_N\||}\quad\text{a.s.}\tag{160}$$
Combining (159) with (160) yields
$$\sqrt{\|M_\infty-M_N\|}\ge\sqrt{\lambda_1^\downarrow(M_N)\lambda_2^\downarrow(M_N)}\quad\text{a.s.}\tag{161}$$
We next observe that
$$\nu_k^\downarrow\Big(\frac{W_N}{\sqrt{\mathrm{Tr}(W_N^\dagger W_N)}}\Big) = \sqrt{\lambda_k^\downarrow\Big(\frac{W_N^\dagger W_N}{\mathrm{Tr}(W_N^\dagger W_N)}\Big)} = \sqrt{\lambda_k^\downarrow(M_N)}.\tag{162}$$
Thus, (161) and (162) yield
$$\sqrt{\|M_\infty-M_N\|}\ge\frac{\nu_1^\downarrow(W_N)\nu_2^\downarrow(W_N)}{\mathrm{Tr}(W_N^\dagger W_N)}\quad\text{a.s.},\tag{163}$$
which, by Lemma 13, results in
$$D\,E\Big(\sqrt{\|M_\infty-M_N\|}\Big)\ge D\,E\Big(\frac{\nu_1^\downarrow(W_N)\nu_2^\downarrow(W_N)}{\mathrm{Tr}(W_N^\dagger W_N)}\Big) = w(N).\tag{164}$$
By Lemma 24, $M_N\to M_\infty$ almost surely. Since the underlying Hilbert space is finite-dimensional, we then have $\|M_\infty-M_N\|\to 0$ a.s., and consequently
$$x_N := \sqrt{\|M_\infty-M_N\|}\to 0\quad\text{a.s.}\tag{165}$$
We next observe that $M_N$ is a density operator or the zero operator, and thus $\|M_N\|\le 1$. Hence, $\|M_\infty-M_N\|\le\|M_\infty\|+\|M_N\|\le 1+\|M_\infty\|$, which yields
$$x_N = \sqrt{\|M_\infty-M_N\|}\le\sqrt{1+\|M_\infty\|}\le 1+\|M_\infty\| =: y.\tag{166}$$
By Lemma 24, $E(\|M_\infty\|)<+\infty$, and thus $E(y) = 1+E(\|M_\infty\|)<+\infty$. Applying Proposition 16 with (165) and (166), we can conclude that $\lim_{N\to\infty}E\big(\sqrt{\|M_\infty-M_N\|}\big) = 0$. By combining this with (164), it follows that $\lim_{N\to\infty}w(N) = 0$. Hence, statement 1 implies statement 2.

$2\Rightarrow 3$: We first observe that
$$\|\wedge^2(A_{x_{N+M}}\cdots A_{x_1})\|\le\|\wedge^2(A_{x_{N+M}}\cdots A_{x_{N+1}})\|\,\|\wedge^2(A_{x_N}\cdots A_{x_1})\|,\tag{167}$$
which, after summing over $x_1,\ldots,x_{N+M}$, yields $w(N+M)\le w(M)w(N)$. Hence, $w$ is submultiplicative, and thus $\log w(N)$ is subadditive. By statement 2, $\lim_{N\to\infty}w(N) = 0$.
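For $D = 2$ the submultiplicativity in (167) is in fact an equality. Since $\nu_1^\downarrow(O)\nu_2^\downarrow(O) = |\det O|$ for $2\times 2$ matrices and the determinant is multiplicative, (157) gives $w(N) = s^N$ with $s := \sum_x|\det A_x|$. Moreover, $|\det A_x|\le\mathrm{Tr}(A_x^\dagger A_x)/2$, so $s\le 1$. Equality holds exactly when every $A_x$ is proportional to a unitary, in which case the rank-two projector $P = \mathbb{1}$ violates the purity condition.

This worked example is our own illustration, not taken from the text. The sketch below uses two hypothetical Kraus pairs, computes $w(N)$ by brute-force enumeration, and lets one compare it with $s^N$.

```python
import itertools
import math


def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def w(A, N):
    """w(N) from (157) for D = 2, using nu_1 * nu_2 = |det| for 2x2 matrices."""
    total = 0.0
    for xs in itertools.product(range(len(A)), repeat=N):
        W = [[1.0, 0.0], [0.0, 1.0]]
        for x in xs:  # W = A_{x_N} ... A_{x_1}
            W = mm(A[x], W)
        total += abs(det2(W))
    return total


r = math.sqrt(0.5)
# Hypothetical examples; both satisfy sum_x A_x^dag A_x = identity.
amp_damp = [[[1.0, 0.0], [0.0, r]], [[0.0, r], [0.0, 0.0]]]
unitary_mix = [[[r, 0.0], [0.0, r]], [[0.0, r], [r, 0.0]]]

for name, A in (("amplitude damping", amp_damp), ("unitary mixture", unitary_mix)):
    s = sum(abs(det2(a)) for a in A)
    ws = [w(A, N) for N in range(1, 7)]
    print(name, "s =", round(s, 6), "w(N) =", [round(v, 6) for v in ws])
```

For the amplitude-damping pair $s = 1/\sqrt{2}<1$, so $w(N)$ decays exponentially. For the mixture of unitaries $s = 1$ and $w(N)\equiv 1$.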
Since $\lim_{N\to\infty}w(N) = 0$, there exists an $N_0\in\mathbb{N}$ such that $\log w(N_0)<0$. Hence, since $\log w(N)$ is subadditive, Lemma 29 yields
$$0>\frac{\log w(N_0)}{N_0}\ge\inf_N\frac{\log w(N)}{N} = \lim_{N\to\infty}\frac{\log w(N)}{N}.\tag{168}$$
In the case that the limit is finite, let $l := \lim_{N\to\infty}\log w(N)/N$, so that $l<0$ by (168). By definition of the limit, for any $\epsilon>0$ there exists an $N_\epsilon$ such that $[\log w(N)]/N-l\le\epsilon$ for all $N\ge N_\epsilon$. We fix some $\epsilon$ with $0<\epsilon<-l$. Then $\gamma := e^{l+\epsilon}$ satisfies $0<\gamma<1$, and $w(N)\le\gamma^N$ for all $N\ge N_\epsilon$. Define $C' := \max\{1,\max_{N=1,\ldots,N_\epsilon}w(N)/\gamma^N\}$; then (158) holds.

Let us finally consider the case that $\lim_{N\to\infty}\log w(N)/N = -\infty$. This means that for every $a>0$ there exists an $N_a$ such that $[\log w(N)]/N\le -a$ for all $N\ge N_a$, which we can rewrite as $w(N)\le e^{-aN}$. Hence, fixing any $a>0$, with $\gamma := e^{-a}$ and $C' := \max\{1,\max_{N=1,\ldots,N_a}w(N)/\gamma^N\}$ we again obtain (158). We can conclude that statement 2 implies statement 3.

$3\Rightarrow 2$: This implication is trivial.

$2\Rightarrow 1$: In our first step, we show that $\lim_{N\to\infty}w(N) = 0$ implies that $\|M_\infty\| = 1$ a.s. We first observe that if $\eta$ is a density operator on a complex Hilbert space of finite dimension $D$, then $1-\|\eta\|\le\sqrt{(D-1)D}\,\sqrt{\lambda_1^\downarrow(\eta)\lambda_2^\downarrow(\eta)}$. Indeed, $1-\lambda_1^\downarrow(\eta) = \sum_{k\ge 2}\lambda_k^\downarrow(\eta)\le(D-1)\lambda_2^\downarrow(\eta)\le(D-1)\sqrt{\lambda_1^\downarrow(\eta)\lambda_2^\downarrow(\eta)}$. We know that $M_N$ is either a density operator or the zero operator, and thus $1-\|M_N\|\ge 0$. We moreover know that $M_N$ is almost surely a density operator. With $x := 1-\|M_N\|$ and $y := \sqrt{D(D-1)}\sqrt{\lambda_1^\downarrow(M_N)\lambda_2^\downarrow(M_N)}$, the above observations give $0\le x\le y$ a.s. Moreover, by Lemma 13, we obtain
$$1-E(\|M_N\|) = E(1-\|M_N\|)\le D\sqrt{\frac{D-1}{D}}\,E\Big(\sqrt{\lambda_1^\downarrow(M_N)\lambda_2^\downarrow(M_N)}\Big).\tag{169}$$
Next, we note that the observation in (162) yields
$$D\,E\Big(\sqrt{\lambda_1^\downarrow(M_N)\lambda_2^\downarrow(M_N)}\Big) = D\,E\Big(\frac{\nu_1^\downarrow(W_N)\nu_2^\downarrow(W_N)}{\mathrm{Tr}(W_N^\dagger W_N)}\Big) = w(N).\tag{170}$$
By combining (169) and (170), one obtains $1-E(\|M_N\|)\le w(N)\sqrt{(D-1)/D}$. By the assumption that $\lim_{N\to\infty}w(N) = 0$, it follows that $\lim_{N\to\infty}E(\|M_N\|) = 1$. By (125) in Lemma 24, $E(\|M_N\|)\to E(\|M_\infty\|)$. We can thus conclude that $E(\|M_\infty\|) = 1$. With $x := 1-\|M_\infty\|$, it follows that $E(x) = 0$. Since $M_\infty$ is almost surely a density operator, $\|M_\infty\|\le 1$ almost surely. Hence, $x = 1-\|M_\infty\|\ge 0$ a.s. By combining this observation and $E(x) = 0$ with Lemma 15, we obtain $x = 0$ a.s., and thus $\|M_\infty\| = 1$ a.s.

By Lemma 24, $M_\infty$ is almost surely a density operator, and a density operator has rank one if and only if its operator norm equals 1. We can thus conclude that $\mathrm{rank}(M_\infty) = 1$ a.s. By Lemma 28, this implies that $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition in Definition 3. Hence, statement 2 implies statement 1.

D.5 Generalization to $F$ and $\sigma$

Up to this point, the proof has concerned the exponential decay of $w(N)$, while we actually wish to find conditions for the exponential decay of $f(N)$. Here, we find necessary as well as sufficient conditions for the exponential decay of $f(N)$. We state a slightly more elaborate version of Proposition 5.

Proposition 31. Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on the finite-dimensional complex Hilbert space $\mathcal{H}$, such that $\sum_{x=0}^{d-1}A_x^\dagger A_x = \mathbb{1}$. For operators $\sigma$ and $F$ on $\mathcal{H}$, define
$$f(N) := \sum_{x_N,\ldots,x_1=0}^{d-1}\nu_1^\downarrow(FA_{x_N}\cdots A_{x_1}\sqrt{\sigma})\,\nu_2^\downarrow(FA_{x_N}\cdots A_{x_1}\sqrt{\sigma}).\tag{171}$$
If $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition in Definition 3, then there exist real constants $c\ge 0$ and $0<\gamma<1$ which satisfy
$$f(N)\le c\gamma^N,\quad\forall N\in\mathbb{N},\tag{172}$$
for all density operators $\sigma$ and all $F$ such that $F^\dagger F\le\mathbb{1}$.

Conversely, suppose that $f$ is defined with respect to some full-rank operators $\sigma$ and $F$, and that there exist constants $c_{\sigma,F}\ge 0$ and $0<\gamma<1$ which fulfill
$$f(N)\le c_{\sigma,F}\gamma^N,\quad\forall N\in\mathbb{N}.\tag{173}$$
Then $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition.

Proof. We begin by proving the first claim of the proposition. We first note that
$$\begin{aligned}f(N) &= \sum_{x_1,\ldots,x_N}\|\wedge^2(FA_{x_N}\cdots A_{x_1}\sqrt{\sigma})\|\\ &\le\|\wedge^2(F)\|\,\|\wedge^2(\sqrt{\sigma})\|\,w(N)\\ &\le\|F\|^2\,\|\sqrt{\sigma}\|^2\,w(N)\\ &\le w(N),\end{aligned}\tag{174}$$
where $w$ is as defined in (157), and where the next-to-last inequality follows from (155). The last inequality follows since $\sigma$ is assumed to be a density operator, and thus $\|\sqrt{\sigma}\|\le 1$, and similarly $F^\dagger F\le\mathbb{1}$ implies $\|F\|\le 1$. If $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition, then by Proposition 30, $w(N)\le C'\gamma^N$. By combining this observation with (174), we obtain (172) with $c := C'$. Note that Proposition 30 makes no reference to $F$ or $\sigma$, and thus $c$ is independent of these.

Next, we turn to the second claim of the proposition. Since $F$ and $\sigma$ (and thus $\sqrt{\sigma}$) are full-rank operators on a finite-dimensional space, the inverses $F^{-1}$ and $\sqrt{\sigma}^{-1}$ exist. With $w$ as defined in (157), we thus find
$$w(N) = \sum_{x_1,\ldots,x_N}\|\wedge^2(A_{x_N}\cdots A_{x_1})\|\le\|\wedge^2(F^{-1})\|\,\|\wedge^2(\sqrt{\sigma}^{-1})\|\,f(N).\tag{175}$$
Hence, with $c'_{\sigma,F} := \|\wedge^2(F^{-1})\|\,\|\wedge^2(\sqrt{\sigma}^{-1})\|$, we get $w(N)\le c'_{\sigma,F}f(N)$.
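For $D = 2$, the bounds (174) and (175) can be made explicit. There $\nu_1^\downarrow\nu_2^\downarrow(FW\sqrt{\sigma}) = |\det F|\sqrt{\det\sigma}\,|\det W|$, so $f(N) = |\det F|\sqrt{\det\sigma}\,w(N)$. Moreover, $\|\wedge^2(F^{-1})\|\,\|\wedge^2(\sqrt{\sigma}^{-1})\| = 1/(|\det F|\sqrt{\det\sigma})$ for $2\times 2$ matrices, so (175) then holds with equality.

This is an illustration under our own assumptions: the Kraus pair, the full-rank density operator $\sigma$, and $F$ (with $F^\dagger F\le\mathbb{1}$) are hypothetical choices. The sketch checks the relation $f(N) = |\det F|\sqrt{\det\sigma}\,w(N)$ and the bound (174) numerically.

```python
import itertools
import math


def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def sqrt_psd_2x2(M):
    """sqrt(M) = (M + s I) / t, s = sqrt(det M), t = sqrt(Tr M + 2 s), for real 2x2 M >= 0."""
    s = math.sqrt(max(det2(M), 0.0))
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[i][j] + (s if i == j else 0.0)) / t for j in range(2)] for i in range(2)]


def largest_eig_sym(H):
    """Largest eigenvalue of a real symmetric 2x2 matrix."""
    tr, dt = H[0][0] + H[1][1], det2(H)
    return tr / 2 + math.sqrt(max(tr * tr / 4 - dt, 0.0))


def word(A, xs):
    W = [[1.0, 0.0], [0.0, 1.0]]
    for x in xs:  # W = A_{x_N} ... A_{x_1}
        W = mm(A[x], W)
    return W


def w(A, N):
    return sum(abs(det2(word(A, xs))) for xs in itertools.product(range(len(A)), repeat=N))


def f(A, F, sigma, N):
    """f(N) from (171) for D = 2, again using nu_1 * nu_2 = |det|."""
    R = sqrt_psd_2x2(sigma)
    return sum(abs(det2(mm(mm(F, word(A, xs)), R)))
               for xs in itertools.product(range(len(A)), repeat=N))


r = math.sqrt(0.5)
A = [[[1.0, 0.0], [0.0, r]], [[0.0, r], [0.0, 0.0]]]  # hypothetical Kraus pair
sigma = [[0.7, 0.2], [0.2, 0.3]]                      # hypothetical full-rank density operator
F = [[0.6, 0.3], [0.0, 0.5]]                          # hypothetical F with F^T F <= 1

FtF = mm([[F[j][i] for j in range(2)] for i in range(2)], F)
bound_174 = largest_eig_sym(FtF) * largest_eig_sym(sigma)  # ||F||^2 ||sqrt(sigma)||^2
for N in range(1, 6):
    fN, wN = f(A, F, sigma, N), w(A, N)
    print(N, fN, abs(det2(F)) * math.sqrt(det2(sigma)) * wN, fN <= bound_174 * wN)
```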
Combining $w(N)\le c'_{\sigma,F}f(N)$ with the assumption (173), it follows that
$$w(N)\le c_{\sigma,F}\,c'_{\sigma,F}\,\gamma^N,\quad\forall N\in\mathbb{N},\tag{176}$$
where $w$ is as defined in Proposition 30, and $0<\gamma<1$. With $C' := c_{\sigma,F}\,c'_{\sigma,F}$ in Proposition 30, it follows that $\{A_x\}_{x=0}^{d-1}$