Classical restrictions of generic matrix product states are quasi-locally Gibbsian

Abstract

We show that the norm squared amplitudes with respect to a local orthonormal basis (the classical restriction) of finite quantum systems on one-dimensional lattices can be exponentially well approximated by Gibbs states of local Hamiltonians (i.e., are quasi-locally Gibbsian) if the classical conditional mutual information (CMI) of any connected tripartition of the lattice is rapidly decaying in the width of the middle region. For injective matrix product states, we moreover show that the classical CMI decays exponentially whenever the collection of matrix product operators satisfies a 'purity condition', a notion previously established in the theory of random matrix products. We furthermore show that violations of the purity condition enable a generalized notion of error correction on the virtual space, thus indicating the non-generic nature of such violations. The proof of our main result makes extensive use of the theory of random matrix products, and may find applications elsewhere.



Yaiza Aragonés-Soria, Johan Åberg, Chae-Yeun Park, and Michael J. Kastoryano. Institute for Theoretical Physics, University of Cologne, Germany; Amazon Quantum Solutions Lab, Seattle, Washington 98170, USA; AWS Center for Quantum Computing, Pasadena, California 91125, USA

October 23, 2020


Considerable effort has been devoted to understanding the entanglement properties of many-body quantum states. For finite one-dimensional lattice systems, the theory of Matrix Product States (MPSs) provides a complete framework for describing the entanglement of gapped many-body systems [1], and allows for efficient high-precision simulations via the DMRG algorithm [2, 3, 4]. Similarly impressive degrees of numerical precision can be reached in other settings, such as disordered systems [5], open systems [6], time evolution [7], or critical systems [8]. The success of these simulation methods can be traced back to the accurate parametrization of entanglement in MPSs. There exist extensions to lattices of higher dimensions (projected entangled pair states), but these have been far less useful for simulations, due to their extensive entanglement growth.

In contrast, quantum Monte-Carlo simulations are largely based on heuristic assumptions about the weights and phases of the underlying state. Indeed, if the system under study can be cast in a form with only positive weights, then Monte-Carlo methods often work well, though convergence guarantees are only known in very special cases [9]. This good performance is in turn believed to be due to the local Gibbsian nature of the classical restriction of the state. Classical Monte-Carlo sampling is known to converge rapidly for Ising-type problems [10, 11], while quantum variational Monte-Carlo is often successful when using a locally restricted Gibbs Ansatz, such as the Jastrow Ansatz. Further evidence of the importance of locality in the Ansatz wavefunction has been observed for more expressive Ansätze, such as the complex Restricted Boltzmann Machine [12], where the activations naturally preserve locality in many cases.
Hence, whereas tensor network states explicitly encode the local entanglement structure in their construction, quantum (variational) Monte Carlo implicitly invokes locality through the pervasive Gibbsian nature of probability distributions.

Here, we connect these two pictures by showing that generic injective MPSs [13] have classical restrictions that are quasi-locally Gibbsian. More precisely, we refer to a probability distribution as locally Gibbsian if it can be written as the equilibrium distribution of a local Hamiltonian, i.e., a sum of terms that each span at most $\ell$ adjacent sites. Well-known examples include the Ising and Potts models. We similarly say that a distribution is quasi-locally Gibbsian if it can be approximated by Gibbs distributions corresponding to local Hamiltonians $h_\ell$, where the error of the approximation in some sense decays exponentially with increasing $\ell$. Such notions appear in various guises in the literature, e.g., [14], which requires that the coefficients in the cluster expansion of $\log(p)$ decay rapidly with the order of the cluster.

As the first step towards proving the generic quasi-local Gibbs property of injective MPSs, we show (in Section 4) that probability distributions on a one-dimensional lattice with open boundary conditions are quasi-locally Gibbsian if the Conditional Mutual Information (CMI) of any tripartition of the lattice decays rapidly in the width of the middle region. The faster the decay of the CMI, the more local the Gibbs distribution. In the case of zero correlation length, the distribution is (strictly) locally Gibbsian [15]. A number of recent studies in quantum information theory have revealed connections between the CMI and the Gibbsian nature of density matrices. In Ref. [16], the authors show that the quantum CMI of a full-rank density matrix on a one-dimensional lattice is small if and only if the state is Gibbsian.
The Gibbsian nature of states has important implications for the nature of edge states of topologically ordered systems [17, 18]. Our results show that similar equivalences hold for classical restrictions of quantum states.

The second step towards establishing generic quasi-locality is also our main result: the classical restriction of an injective MPS has an exponentially decaying CMI if the matrix product operators satisfy a condition referred to as purity (see Def. 3). This condition has previously been shown [19, 20] to imply the 'purification' of quantum trajectories resulting from the application of sequences of random matrices to an initial state. In our setting, the classical CMI can be rewritten in terms of the expected entanglement entropy after measurements on the conditioning subsystem. A vanishing entanglement entropy is thus equivalent to the purification of the state trajectory induced by the sequence of measurements on the virtual system. The purification of trajectories implies that the system asymptotically jumps between pure states of a specific stationary measure, irrespective of the (mixed) state in which the system started. We are currently not aware of a meaningful operational interpretation of the stationary stochastic process, and believe it to be quite hard to evaluate in practice [21]. Furthermore, and perhaps counter-intuitively, we observe that the rate of decay towards the stationary measure is unrelated to the gap of the transfer operator of the MPS. We moreover do not know of a closed functional form for the decay rate in terms of the matrix product operators.

One may note that our setting, which focuses on the degree of conditional post-measurement entanglement, is closely related to the notion of localizable entanglement [22, 23, 24]. The latter is obtained by optimizing the measurements over all possible local bases, while we consider a fixed basis.
However, to the best of our knowledge, a general proof of the exponential decay of the localizable entanglement has not been given previously.

As a further attempt to gain a better understanding of the purity condition, we moreover investigate the conspicuous similarity between (the violation of) the purity condition (see Def. 3) and the Knill-Laflamme error correction condition [25]. Indeed, we find (in Section 6.2) that the purity condition can be regarded as the non-existence of a non-trivial correctable subspace that persists indefinitely throughout iterated applications of an error model, in a somewhat unconventional error-correction scenario. One may note that invariant subspaces are special cases of such correctable spaces. As an example, MPSs with symmetry-protected topological order are associated with invariant subspaces [26] and thus violate the purity condition.

The proof of our main theorem relies heavily on the theory of random matrix products, and in particular on the work of Benoist et al. [19] and Maassen et al. [20]. Since these results involve notions from probability theory that are likely unfamiliar to most of the quantum information community, we reproduce in the appendix many of the basic results in a language that should be more familiar to quantum-information readers. We hope that this will facilitate access to a rich and extensive body of work that should see many more applications in the fields of quantum information and many-body physics. For instance, the theory of random matrix products has recently been leveraged in a different setting to show ergodicity for ensembles of quantum channels [27, 28].

Concerning the structure of the paper, we begin by introducing the notation in Section 2, while Section 3 focuses on the central object of this investigation, namely the CMI with respect to classical restrictions of MPSs. Section 4 presents the first result of the paper: an exponentially decaying CMI implies quasi-local Gibbs distributions.
Section 5 is devoted to the main result, namely the exponentially decaying CMI for a broad class of MPSs. Section 6 provides examples and observations: in Section 6.1 we observe that MPSs corresponding to symmetry-protected phases violate the purity condition. In Section 6.2 we further investigate the purity condition and show that its violation can be regarded as a type of error-correction condition. Section 6.3 compares the convergence rate of the CMI with the rate of convergence to the fixed point of the transfer operator. Concrete examples are provided in Section 6.4. We finish with an outlook in Section 7.

We consider pure states defined on a finite one-dimensional lattice, $\Lambda$, and associate a finite-dimensional Hilbert space of dimension $d$ with each site. We index the sites of the lattice according to a tripartition $\Lambda = ABC$ as follows: sites in region $A$ are labeled $-|A|+1, -|A|+2, \dots, -1, 0$; sites in region $B$ are labeled $1, \dots, N$; and sites in region $C$ are labeled $N+1, \dots, |BC|$. This peculiar indexing of sites will make sense later on when considering the CMI for MPSs.

Figure 1: We consider an MPS on a finite lattice, $\Lambda$, which is broken up into three contiguous regions such that $\Lambda = ABC$. We denote sites in region $A$ as $-|A|+1, -|A|+2, \dots, -1, 0$; sites in region $B$ as $1, \dots, N$; and sites in region $C$ as $N+1, \dots, |BC|$.

Let $|x_\Lambda\rangle = |x_{-|A|+1}, \dots, x_0, x_1, \dots, x_N, \dots, x_{|BC|}\rangle$ be a local orthonormal basis, where $\{|x_i\rangle\}_{x_i=0}^{d-1}$ is the local basis at site $i$. Unless specified otherwise, we will be working with translationally invariant MPSs with open boundary conditions,

$$|\Psi\rangle = \frac{1}{\sqrt{K}} \sum_{x_{-|A|+1},\dots,x_{|BC|}=0}^{d-1} \langle R|\, A_{x_{|BC|}} \cdots A_{x_{-|A|+1}} \,|L\rangle\, |x_{-|A|+1} \cdots x_{|BC|}\rangle, \tag{1}$$

where $K$ is a normalization factor. Here, the $A_{x_i}$ are $D \times D$ matrices encoding correlations in the system, and $|L\rangle$ and $|R\rangle$ are normalized states on the $D$-dimensional virtual space specifying the boundary conditions, where $D$ is known as the bond dimension of the MPS. Without loss of generality, we consider (left-)normalized MPSs, which enforces $\sum_{x_i=0}^{d-1} A_{x_i}^\dagger A_{x_i} = \mathbb{1}$. Left normalization guarantees that the completely positive map

$$\mathcal{E}(\cdot) := \sum_{x_i=0}^{d-1} A_{x_i} \cdot A_{x_i}^\dagger \tag{2}$$

is trace preserving. The map $\mathcal{E}$ is often referred to as the transfer operator; it maps density matrices on the virtual space to density matrices, from left to right along the chain. The adjoint map, $\mathcal{E}^*$, maps operators from right to left along the chain. Our choice of boundary conditions serves mainly for notational simplicity. The results in the paper extend naturally to periodic or mixed boundary conditions. For periodic boundary conditions, the regions $ABC$ need to be chosen differently to ensure that $B$ separates $A$ from $C$.

The normalization constant can be expressed concisely as $K = \mathrm{Tr}[\mathcal{E}^{|\Lambda|}(L)\, R]$, where for shorthand we write $R = |R\rangle\langle R|$ and $L = |L\rangle\langle L|$.

Classical Restrictions

For a given local basis $\{|x_\Lambda\rangle\}$, we define the quantum channel

$$\Phi_\Lambda(\psi) = \sum_{x_\Lambda} |x_\Lambda\rangle\langle x_\Lambda|\, \langle x_\Lambda|\psi|x_\Lambda\rangle. \tag{3}$$

In other words, $\Phi_\Lambda$ generates a state that is diagonal with respect to the basis $\{|x_\Lambda\rangle\}$ by deleting the off-diagonal elements of the input $\psi$. We refer to $\Phi_\Lambda$ as the classical restriction (also commonly referred to as a 'dephasing map' or 'pinching'). Since $\Phi_\Lambda(\psi)$ is diagonal, the map $\Phi_\Lambda$ effectively defines a classical probability distribution, $p_\psi(x_\Lambda) = \langle x_\Lambda|\psi|x_\Lambda\rangle$, for any choice of basis $\{|x_\Lambda\rangle\}$. We also consider the channel that measures a subset of systems $B \subset \Lambda$, which we denote as

$$\begin{aligned}
\Phi_B(\psi) &= \sum_{x_B} |x_B\rangle\langle x_B| \otimes \langle x_B|\psi|x_B\rangle \qquad (4)\\
&= \sum_{x_B} p_\psi(x_B)\, \psi(x_B), \qquad (5)
\end{aligned}$$

with $|x_B\rangle = \bigotimes_{i \in B} |x_i\rangle$. Here, the channel $\Phi_B$ similarly defines a classical probability distribution on the sites in $B$ by $p_\psi(x_B) = \langle x_B|\psi_B|x_B\rangle$, where $\psi_B := \mathrm{Tr}_{AC}\,\psi$ is the reduced state of $\psi$ on $B$. Note that

$$p_\psi(x_B) = \sum_{x_{AC}} p_\psi(x), \tag{6}$$

where we recall that $\Lambda = ABC$. Moreover, we refer to the post-measurement state after obtaining the measurement outcome $x_B$ as

$$\psi(x_B) = \frac{1}{p_\psi(x_B)}\, |x_B\rangle\langle x_B| \otimes \langle x_B|\psi|x_B\rangle. \tag{7}$$

Consider now the MPS defined in Eq. (1). The probability distribution on $B$ is

$$p_\Psi(x_B) = \frac{1}{K}\, \mathrm{Tr}\!\left[A_{x_N} \cdots A_{x_1}\, \mathcal{E}^{|A|}(L)\, A_{x_1}^\dagger \cdots A_{x_N}^\dagger\, \mathcal{E}^{*|C|}(R)\right], \tag{8}$$

where $\mathcal{E}^n$ denotes the $n$-fold composition of the map, and $x_B := x_1, \dots, x_N$ with $x_i = 0, \dots, d-1$. Throughout, we assume that the MPS is injective, so that the transfer operator $\mathcal{E}$ is primitive [29]. The latter means that $\mathcal{E}$ has a unique full-rank fixed point, i.e., there exists a unique full-rank density operator $\rho$ such that $\mathcal{E}(\rho) = \rho$.
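The definitions above can be explored numerically on small instances. The following Python/NumPy sketch is our own illustration, not code from the paper: the QR-based construction of left-normalized matrices, all helper names (`random_left_normalized`, `fixed_point`, `block_distribution`, ...), and the sizes $d = 2$, $D = 3$ are assumptions. It checks that $\mathcal{E}$ in Eq. (2) is trace preserving, compares $K = \mathrm{Tr}[\mathcal{E}^{|\Lambda|}(L)R]$ with a brute-force sum over all configurations of Eq. (1), computes the fixed point $\rho$, and evaluates the block distribution in the infinite-chain limit, Eq. (9). It assumes that the randomly drawn $\mathcal{E}$ is primitive, which is expected for generic matrices but is not verified by the code.

```python
import itertools
import numpy as np

def random_left_normalized(d, D, seed=0):
    """Hypothetical helper: d random D x D matrices with sum_x A_x^dagger A_x = 1."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D))
    Q, _ = np.linalg.qr(G)  # (d*D) x D isometry: Q^dagger Q = 1
    return [Q[x * D:(x + 1) * D, :] for x in range(d)]

def transfer(As, X):
    """Transfer operator E(X) = sum_x A_x X A_x^dagger, Eq. (2)."""
    return sum(A @ X @ A.conj().T for A in As)

def normalization(As, L, R, n_sites):
    """K = Tr[E^{|Lambda|}(L) R] with L = |L><L| and R = |R><R|."""
    X = np.outer(L, L.conj())
    for _ in range(n_sites):
        X = transfer(As, X)
    return np.trace(X @ np.outer(R, R.conj())).real

def brute_force_norm(As, L, R, n_sites):
    """Sum of |<R| A_{x_n} ... A_{x_1} |L>|^2 over all d^n configurations, cf. Eq. (1)."""
    total = 0.0
    for xs in itertools.product(range(len(As)), repeat=n_sites):
        v = L.astype(complex)
        for x in xs:  # A_{x_1} acts first, A_{x_n} last
            v = As[x] @ v
        total += abs(np.vdot(R, v)) ** 2
    return total

def fixed_point(As):
    """Fixed point rho of E from the eigenvalue-1 eigenvector of its matrix representation."""
    D = As[0].shape[0]
    T = sum(np.kron(A, A.conj()) for A in As)  # X -> A X A^dagger on row-major vec(X)
    vals, vecs = np.linalg.eig(T)
    rho = vecs[:, np.argmin(abs(vals - 1))].reshape(D, D)
    rho = rho / np.trace(rho)  # removes the arbitrary phase and sets Tr(rho) = 1
    return (rho + rho.conj().T) / 2

def block_distribution(As, rho, N):
    """Eq. (9): p(x_1..x_N) = Tr[A_{x_N}...A_{x_1} rho A_{x_1}^dagger...A_{x_N}^dagger]."""
    probs = {}
    for xs in itertools.product(range(len(As)), repeat=N):
        M = np.eye(rho.shape[0], dtype=complex)
        for x in xs:
            M = As[x] @ M  # accumulates A_{x_N} ... A_{x_1}
        probs[xs] = np.trace(M @ rho @ M.conj().T).real
    return probs

As = random_left_normalized(d=2, D=3)
L = np.array([1, 0, 0], dtype=complex)
R = np.array([0, 1, 0], dtype=complex)
K = normalization(As, L, R, n_sites=6)
rho = fixed_point(As)
p = block_distribution(As, rho, N=4)
```

Representing $\mathcal{E}$ as the $D^2 \times D^2$ matrix $\sum_x A_x \otimes \bar{A}_x$ turns the fixed point into an ordinary eigenvector problem; power iteration would also work for a primitive map.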
A consequence of the injectivity of the MPS is thus that $\lim_{|A|\to\infty} \mathcal{E}^{|A|}(\chi) = \rho\,\mathrm{Tr}(\chi)$, $\lim_{|C|\to\infty} \mathcal{E}^{*|C|}(Q) = \mathrm{Tr}(Q\rho)\,\mathbb{1}$, and $\lim_{|A|,|C|\to\infty} K = \mathrm{Tr}(R\rho) \neq 0$. Hence, if region $B$ is kept fixed while regions $A$ and $C$ both grow to infinity, the probability distribution (8) on $B$ reduces to

$$p_\Psi(x_B) = \mathrm{Tr}\!\left[A_{x_N} \cdots A_{x_1}\, \rho\, A_{x_1}^\dagger \cdots A_{x_N}^\dagger\right]. \tag{9}$$

Throughout the paper, we use a number of entropic quantities, which we introduce in this section. In particular, we switch back and forth between classical and quantum systems. The quantum von Neumann entropy of a mixed state $\chi$ is denoted $S(\chi) = -\mathrm{Tr}\,\chi\log\chi$, while the classical entropy of a probability distribution $p(x)$ is denoted $H(p) = -\sum_x p(x)\log p(x)$. Here, $\log$ denotes the natural logarithm. In the next section, we use the classical relative entropy as a measure of distinguishability between probability distributions. The classical relative entropy of $p(x)$ with respect to $p'(x)$ is defined as

$$S(p\,\|\,p') = \sum_x p(x) \log\!\left[\frac{p(x)}{p'(x)}\right]. \tag{10}$$

The (quantum) CMI between regions $A$ and $C$ conditioned on region $B$ is given by

$$I_\chi(A:C|B) = S(\chi_{AB}) + S(\chi_{BC}) - S(\chi_B) - S(\chi_{ABC}). \tag{11}$$

After applying the classical restriction map $\Phi_\Lambda$ of Eq. (3) to a quantum state $\chi$, we get the classical CMI

$$I_{\Phi(\chi)}(A:C|B) = I_{p_\chi}(A:C|B) = H(p_{\chi,AB}) + H(p_{\chi,BC}) - H(p_{\chi,B}) - H(p_{\chi,ABC}), \tag{12}$$

where $p_{\chi,A} := p_\chi(x_A) = \langle x_A|\chi_A|x_A\rangle$.

We now point out an important observation on the CMI [30]. Suppose that we have a pure state, $\psi = |\psi\rangle\langle\psi|$, and we measure all spins in region $B$.
Then, the quantum CMI of the post-measurement state satisfies

$$\begin{aligned}
I_{p_{\psi,B}}(A:C|B) &\le I_{\psi(x_B)}(A:C|B) \qquad (13)\\
&= \langle S[\psi_A(x_B)]\rangle_{p_\psi(x_B)} + \langle S[\psi_C(x_B)]\rangle_{p_\psi(x_B)}\\
&= 2\,\langle S[\psi_C(x_B)]\rangle_{p_\psi(x_B)}, \qquad (14)
\end{aligned}$$

where $\psi_X(x_B)$ is the reduced state in region $X$ of the post-measurement state $\psi(x_B)$, and $\langle S[\psi(x)]\rangle_{p_\psi(x)}$ is the average von Neumann entropy of $\psi(x)$ over $p_\psi(x)$, i.e.,

$$\langle S[\psi(x)]\rangle_{p_\psi(x)} := \sum_x p_\psi(x)\, S[\psi(x)]. \tag{15}$$

The inequality in Eq. (13) follows from the monotonicity of the relative entropy. Note that $S[\psi_A(x_B)] = S[\psi_C(x_B)]$ since $\langle x_B|\psi|x_B\rangle / p_\psi(x_B)$ is a pure state on the bipartition $AC$. Eq. (13) allows us to characterize the states that have a small post-measurement CMI by finding the states that have a small average entropy of $\psi_C(x_B)$.

Let us now go back to the MPS described in the previous section. With the injective MPS in the canonical form of Eq. (1), it can be shown that the reduced post-measurement state $\Psi_C(x_B)$ is (up to zero eigenvalues) isospectral to

$$\frac{1}{p_\Psi(x_B)\, K}\, \sqrt{\mathcal{E}^{*|C|}(R)}\; A_{x_N} \cdots A_{x_1}\, \mathcal{E}^{|A|}(L)\, A_{x_1}^\dagger \cdots A_{x_N}^\dagger\, \sqrt{\mathcal{E}^{*|C|}(R)}. \tag{16}$$

The average von Neumann entropy of the reduced state of a post-measurement translationally invariant injective MPS is then

$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} = \sum_{x_B} p_\Psi(x_B)\, S\!\left(\frac{1}{p_\Psi(x_B)\, K}\, F A_{x_N} \cdots A_{x_1}\, \sigma\, A_{x_1}^\dagger \cdots A_{x_N}^\dagger F^\dagger\right), \tag{17}$$

where $\sigma := \mathcal{E}^{|A|}(L)$ and $F^\dagger F := \mathcal{E}^{*|C|}(R)$. Eq. (17) will be the main object of study throughout this paper. As mentioned earlier, a translationally invariant injective MPS results in a primitive channel $\mathcal{E}$. On a finite-dimensional space, this implies that both $\sigma$ and $F$ are full-rank operators for sufficiently large $|A|$ and $|C|$.
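As a sanity check of Eqs. (12)-(14), the sketch below (our own illustration; the local dimensions, random seed, and helper names are assumptions) computes the classical CMI of the classical restriction of a random pure state on $ABC$ and compares it with twice the average entanglement entropy of $\psi_C(x_B)$ after measuring $B$ in the computational basis. The inequality should hold for every pure state, so the check does not depend on the particular random draw.

```python
import numpy as np

def shannon(p):
    """Shannon entropy H(p) with natural log; zero entries are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def classical_cmi(p_abc):
    """Eq. (12): I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC) for an array indexed [a, b, c]."""
    return (shannon(p_abc.sum(axis=2)) + shannon(p_abc.sum(axis=0))
            - shannon(p_abc.sum(axis=(0, 2))) - shannon(p_abc))

def average_post_measurement_entropy(psi):
    """<S[psi_C(x_B)]> of Eq. (15) after measuring B in the computational basis."""
    avg = 0.0
    for b in range(psi.shape[1]):
        block = psi[:, b, :]  # unnormalized pure state on AC for outcome b
        p_b = float(np.sum(abs(block) ** 2))
        if p_b == 0:
            continue
        s = np.linalg.svd(block / np.sqrt(p_b), compute_uv=False)
        avg += p_b * shannon(s ** 2)  # squared Schmidt coefficients = spectrum of psi_C(x_B)
    return avg

rng = np.random.default_rng(1)
dA, dB, dC = 2, 3, 2
psi = rng.normal(size=(dA, dB, dC)) + 1j * rng.normal(size=(dA, dB, dC))
psi /= np.linalg.norm(psi)
p_abc = abs(psi) ** 2  # classical restriction of |psi>
lhs = classical_cmi(p_abc)
rhs = 2 * average_post_measurement_entropy(psi)
```

Since $B$ is classical after the measurement, the classical CMI equals the average over $x_B$ of the classical mutual information between $A$ and $C$, which local dephasing can only decrease relative to the quantum value $2S[\psi_C(x_B)]$.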
We also recall that $\lim_{|A|\to\infty} \mathcal{E}^{|A|}(\chi) = \rho\,\mathrm{Tr}(\chi)$, $\lim_{|C|\to\infty} \mathcal{E}^{*|C|}(Q) = \mathrm{Tr}(Q\rho)\,\mathbb{1}$, and $\lim_{|A|,|C|\to\infty} K = \mathrm{Tr}(R\rho) \neq 0$; consequently, for infinite chains (17) reduces to

$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} = \sum_{x_B} p_\Psi(x_B)\, S\!\left(\frac{A_{x_N} \cdots A_{x_1}\, \rho\, A_{x_1}^\dagger \cdots A_{x_N}^\dagger}{p_\Psi(x_B)}\right). \tag{18}$$

In this section we consider probability distributions $p_{1,\dots,|\Lambda|}$ on finite one-dimensional lattices $\Lambda$, and discuss conditions under which these can be well approximated by Gibbs distributions of local Hamiltonians. We say that a Hamiltonian is $\ell$-local if it can be written as a sum of terms that each span at most $\ell$ consecutive sites. A distribution is $\ell$-local if it is the Gibbs distribution of some $\ell$-local Hamiltonian. In a similar spirit, we say that a distribution is quasi-locally Gibbs if it can be approximated by $\ell$-local distributions, where the error of this approximation in some sense decays fast with increasing $\ell$. In this section we show that, if the CMI $I_{p_{1,\dots,|\Lambda|}}(A:C|B)$ of the distribution $p_{1,\dots,|\Lambda|}$ decays sufficiently fast with increasing size $|B|$ of the bridging region in a contiguous tripartition $\Lambda = ABC$ of the lattice, then $p_{1,\dots,|\Lambda|}$ is quasi-locally Gibbs. (For convenience, we change the notation in this section and enumerate the sites of the entire lattice as $1, \dots, |\Lambda|$.) This result is similar in spirit to Kozlov's theorem [14] (see also [30]). Although this section focuses exclusively on probability distributions, the application to quantum states becomes apparent in Section 5, where we consider classical restrictions of underlying injective MPSs and show that these are generically quasi-locally Gibbsian.

Let $p_{1,\dots,|\Lambda|}$ be a probability distribution over a finite sub-chain $\Lambda$ of a one-dimensional lattice. We let $p_j$ denote the marginal distribution at site $j$.
For $1 \le j < k \le |\Lambda|$ we let $p_{j,\dots,k}$ denote the marginal distribution of the chain $j, \dots, k$. In the following, we assume that

$$p_{1,\dots,|\Lambda|}(x_1, \dots, x_{|\Lambda|}) > 0, \quad \forall\, x_1, \dots, x_{|\Lambda|}, \tag{19}$$

which consequently leads to $p_{j,\dots,k}(x_j, \dots, x_k) > 0$. With these assumptions, we can define

$$h_{j,\dots,k} := -\log p_{j,\dots,k}, \quad 1 \le j \le k \le |\Lambda|, \tag{20}$$

and thus $p_{j,\dots,k} = e^{-h_{j,\dots,k}}$. Hence, we have constructed $h_{j,\dots,k}$ such that $p_{j,\dots,k}$ is Gibbs distributed with respect to $h_{j,\dots,k}$, with $\beta = 1$ in $e^{-\beta h_{j,\dots,k}}/Z(h_{j,\dots,k})$, where one may note that $Z(h_{j,\dots,k}) := \sum_{x_j,\dots,x_k} e^{-h_{j,\dots,k}(x_j,\dots,x_k)} = 1$. For $j \le k$ we define

$$H_{j,k} := \begin{cases} h_j & \text{if } k = j,\ |\Lambda| \ge j \ge 1,\\ h_{j,j+1} - h_j - h_{j+1} & \text{if } k = j+1,\ |\Lambda|-1 \ge j \ge 1,\\ h_{j+1,\dots,k-1} + h_{j,\dots,k} - h_{j,\dots,k-1} - h_{j+1,\dots,k} & \text{if } |\Lambda|-2 \ge k-2 \ge j \ge 1. \end{cases} \tag{21}$$

Next, we define

$$h^\ell_{1,\dots,|\Lambda|} := \sum_{1 \le j \le k \le |\Lambda|,\ k-j \le \ell} H_{j,k}. \tag{22}$$

Hence, $h^\ell_{1,\dots,|\Lambda|}$ only includes the terms $H_{j,k}$ for which the range of the sub-chain $j, \dots, k$ does not exceed $\ell$. In other words, $h^\ell_{1,\dots,|\Lambda|}$ is an $\ell$-local Hamiltonian. The associated $\ell$-local Gibbs distribution is

$$p^\ell_{1,\dots,|\Lambda|}(x_1, \dots, x_{|\Lambda|}) := \frac{e^{-h^\ell_{1,\dots,|\Lambda|}(x_1,\dots,x_{|\Lambda|})}}{Z(h^\ell_{1,\dots,|\Lambda|})}, \tag{23}$$

with $Z(h^\ell_{1,\dots,|\Lambda|}) := \sum_{x'_1,\dots,x'_{|\Lambda|}} e^{-h^\ell_{1,\dots,|\Lambda|}(x'_1,\dots,x'_{|\Lambda|})}$.

The following proposition expresses the classical relative entropy (see Eq. (10)) between the Gibbs distribution $p_{1,\dots,|\Lambda|}$ associated with the full Hamiltonian $h_{1,\dots,|\Lambda|}$ and the Gibbs distribution $p^\ell_{1,\dots,|\Lambda|}$ associated with the $\ell$-local Hamiltonian $h^\ell_{1,\dots,|\Lambda|}$ in terms of the CMI between suitable regions of the chain. Hence, if the latter are sufficiently small, then the approximating $\ell$-local distribution $p^\ell_{1,\dots,|\Lambda|}$ is close to the original distribution $p_{1,\dots,|\Lambda|}$.

Proposition 1.

For $p_{1,\dots,|\Lambda|}(x_1, \dots, x_{|\Lambda|}) > 0$, let $p^\ell_{1,\dots,|\Lambda|}$ be as defined in Eqns. (20-23). For $1 \le \ell \le |\Lambda|-1$ it is the case that

$$S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) = \sum_{1 \le j \le k \le |\Lambda|,\ k-j > \ell} I(j:k\,|\,j+1, \dots, k-1). \tag{24}$$

Proof.

Let $\langle\cdot\rangle_{p_{1,\dots,|\Lambda|}}$ denote the expectation value with respect to the distribution $p_{1,\dots,|\Lambda|}$. One can confirm that

$$\langle H_{j,k}\rangle_{p_{1,\dots,|\Lambda|}} = \begin{cases} S(p_j) & \text{if } k = j,\ |\Lambda| \ge j \ge 1,\\ -I(j:k) & \text{if } k = j+1,\ |\Lambda|-1 \ge j \ge 1,\\ -I(j:k\,|\,j+1, \dots, k-1) & \text{if } |\Lambda|-2 \ge k-2 \ge j \ge 1. \end{cases} \tag{25}$$

One can also confirm that

$$\sum_{1 \le j \le k \le |\Lambda|} H_{j,k} = h_{1,\dots,|\Lambda|}. \tag{26}$$

With a somewhat lengthy but straightforward calculation, one can moreover show that

$$Z(h^\ell_{1,\dots,|\Lambda|}) = 1, \quad 1 \le \ell \le |\Lambda|-1. \tag{27}$$

For $2 \le \ell \le |\Lambda|-2$, we can combine the definition of the relative entropy with the definition of $p^\ell_{1,\dots,|\Lambda|}$ in (23), with $h_{1,\dots,|\Lambda|} = -\log p_{1,\dots,|\Lambda|}$, and with the fact that $p_{1,\dots,|\Lambda|}$ is a probability distribution, to get (writing $x = (x_1, \dots, x_{|\Lambda|})$)

$$\begin{aligned}
S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) ={}& -\sum_{x} p_{1,\dots,|\Lambda|}(x)\, h_{1,\dots,|\Lambda|}(x) + \sum_{x} p_{1,\dots,|\Lambda|}(x)\, h^\ell_{1,\dots,|\Lambda|}(x) + \log Z(h^\ell_{1,\dots,|\Lambda|})\\
&\text{[by (27), (26), and the definition of } h^\ell_{1,\dots,|\Lambda|} \text{ in (22)]}\\
={}& -\sum_{x} p_{1,\dots,|\Lambda|}(x) \sum_{1 \le j \le k \le |\Lambda|} H_{j,k}(x) + \sum_{x} p_{1,\dots,|\Lambda|}(x) \sum_{1 \le j \le k \le |\Lambda|,\ k-j \le \ell} H_{j,k}(x)\\
={}& -\sum_{1 \le j \le k \le |\Lambda|,\ k-j > \ell} \langle H_{j,k}\rangle_{p_{1,\dots,|\Lambda|}}\\
&\text{[by (25), since } k-j > \ell \ge 1\text{]}\\
={}& \sum_{1 \le j \le k \le |\Lambda|,\ k-j > \ell} I(j:k\,|\,j+1, \dots, k-1). \qquad (28)
\end{aligned}$$

Loosely speaking, the above proposition tells us that, if the CMIs $I(j:k\,|\,j+1, \dots, k-1)$ in some sense decrease sufficiently fast with increasing $k-j$, then the $\ell$-local Gibbs distribution $p^\ell_{1,\dots,|\Lambda|}$ approaches the true distribution $p_{1,\dots,|\Lambda|}$. The following lemma formalizes this intuition.

Lemma 2.

Suppose that the probability distribution $p_{1,\dots,|\Lambda|}(x_1, \dots, x_{|\Lambda|}) > 0$ is such that there exists a monotonically decreasing function $\xi: \mathbb{N} \to \mathbb{R}$ such that for every contiguous partition $\Lambda = ABC$ it is the case that

$$I_p(A:C|B) \le \xi(|B|). \tag{29}$$

Let $p^\ell_{1,\dots,|\Lambda|}$ be defined as in Eqns. (20-23). Then, for $1 \le \ell \le |\Lambda|-1$, we have

$$S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) \le |\Lambda|^2\, \xi(\ell). \tag{30}$$

Proof.

With the general observation that $I(A_2:C_1|B) \le I(A_1A_2:C_1C_2|B)$ for $A = A_1A_2$ and $C = C_1C_2$, we can use $A_1 = \{1, \dots, j-1\}$, $A_2 = \{j\}$, $B = \{j+1, \dots, k-1\}$, $C_1 = \{k\}$, and $C_2 = \{k+1, \dots, |\Lambda|\}$ in (24) and (29), which yields

$$S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) \le \sum_{1 \le j \le k \le |\Lambda|,\ k-j > \ell} \xi(|\{j+1, \dots, k-1\}|). \tag{31}$$

With the observation that $k-j > \ell$ implies $k-j-1 \ge \ell$, and thus $|\{j+1, \dots, k-1\}| = k-j-1 \ge \ell$, together with the assumption that the function $\xi$ is monotonically decreasing, we get

$$S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) \le \xi(\ell) \sum_{1 \le j \le k \le |\Lambda|,\ \ell < k-j} 1 \le |\Lambda|^2\, \xi(\ell). \tag{32}$$
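Proposition 1 and Eq. (27) can be checked by brute force on a short chain. The following sketch is our own illustration (the random positive distribution on five binary sites, the seed, and all helper names are assumptions): it builds the terms $H_{j,k}$ of Eq. (21), forms the $\ell$-local Gibbs distribution of Eqs. (22)-(23), and compares $S(p_{1,\dots,|\Lambda|}\|p^\ell_{1,\dots,|\Lambda|})$ with the sum of CMIs in Eq. (24) for each $\ell = 1, \dots, |\Lambda|-1$. Sites are indexed from 0 in the code.

```python
import numpy as np

def marginal(p, j, k):
    """Marginal of p on sites j..k (0-indexed, inclusive), kept broadcastable to p.shape."""
    other = tuple(i for i in range(p.ndim) if i < j or i > k)
    return p.sum(axis=other, keepdims=True)

def h(p, j, k):
    """h_{j..k} = -log p_{j..k}, Eq. (20); an empty range (j > k) contributes 0."""
    return np.zeros((1,) * p.ndim) if j > k else -np.log(marginal(p, j, k))

def H(p, j, k):
    """The terms H_{j,k} of Eq. (21)."""
    if k == j:
        return h(p, j, j)
    if k == j + 1:
        return h(p, j, k) - h(p, j, j) - h(p, k, k)
    return h(p, j + 1, k - 1) + h(p, j, k) - h(p, j, k - 1) - h(p, j + 1, k)

def local_gibbs(p, ell):
    """The ell-local Gibbs distribution of Eqs. (22)-(23) and its partition function."""
    n = p.ndim
    h_ell = sum(H(p, j, k) for j in range(n) for k in range(j, n) if k - j <= ell)
    weights = np.exp(-h_ell)  # broadcasting restores the full (d,)*n shape
    Z = weights.sum()
    return weights / Z, Z

def entropy(p, j, k):
    """Shannon entropy of the marginal on sites j..k (0 for an empty range)."""
    if j > k:
        return 0.0
    q = marginal(p, j, k).ravel()
    return float(-(q * np.log(q)).sum())

def cmi(p, j, k):
    """I(j : k | j+1, ..., k-1) for sites j < k."""
    return (entropy(p, j, k - 1) + entropy(p, j + 1, k)
            - entropy(p, j + 1, k - 1) - entropy(p, j, k))

def check_proposition_1(p, ell):
    """Returns Z(h^ell), S(p || p^ell), and the CMI sum on the right-hand side of Eq. (24)."""
    n = p.ndim
    q, Z = local_gibbs(p, ell)
    lhs = float((p * np.log(p / q)).sum())
    rhs = sum(cmi(p, j, k) for j in range(n) for k in range(j, n) if k - j > ell)
    return Z, lhs, rhs

rng = np.random.default_rng(2)
n_sites = 5
p = rng.random((2,) * n_sites) + 0.05  # strictly positive, as required by Eq. (19)
p /= p.sum()
results = {ell: check_proposition_1(p, ell) for ell in range(1, n_sites)}
```

For $\ell = |\Lambda|-1$ the right-hand side of Eq. (24) is an empty sum and $p^\ell$ coincides with $p$, consistent with Eq. (26).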

Definition 3 (Purity [19]). Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a complex finite-dimensional Hilbert space $\mathcal{H}$. We say that $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition if the following implication holds: if $P$ is an orthogonal projector on $\mathcal{H}$ such that

$$P A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1} P \propto P, \quad \forall N \in \mathbb{N},\ \forall (x_1, \dots, x_N) \in \{0, \dots, d-1\}^{\times N}, \quad \text{then } \operatorname{rank}(P) = 1. \tag{33}$$

Note that the condition $P A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1} P \propto P$ is trivially true whenever $P$ is a rank-one projector. Hence, the purity condition means that $P A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1} P \propto P$ holds only for rank-one projectors. The purity condition bears some resemblance to the Knill-Laflamme condition [25]. We discuss the relationship between the purity condition and error correction/detection in Section 6.2.

In terms of the purity condition, our main theorem is phrased as follows.

Theorem 4.

Let $\Psi$ be an injective MPS on a finite one-dimensional lattice, $\Lambda$, with finite bond dimension, $D$, and open boundary conditions. If the purity condition holds for the matrix product operators associated with a specific local basis, $\{|x\rangle\}$, then there exist constants $1 > \kappa \ge 0$ and $c \ge 0$ such that for any three contiguous regions $\Lambda = ABC$ as in Fig. 1, we have

$$I_{p_{\Psi,B}}(A:C|B) \le c\,\kappa^{|B|}. \tag{34}$$

The constants $c$ and $\kappa$ are independent of $|A|$, $|B|$, $|C|$, $|L\rangle$, and $|R\rangle$. The following gives an overview of the essential steps of the proof. For a more detailed account, see the proof of Theorem 11 in Appendix A.

Proof.

The first step in proving Theorem 4 is to bound the post-measurement CMI, $I_{p_{\Psi,B}}(A:C|B)$, in terms of the quantity $f(N)$ defined below in Eq. (41). The second step is to show that $f(N)$ decays exponentially if the purity condition is satisfied; this step is shown independently in Prop. 5. We relegate much of the technical detail to the appendix to allow for a clearer presentation of the main ideas.

To start with, we bound the average entropy (Eq. (15)) in terms of a quantity that can be interpreted as the average purity:

$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \le -Q\log Q + Q\,[1 + \log(D-1)], \tag{35}$$

with $Q := 1 - \sum_{x_B} p_\Psi(x_B)\, \|\Psi_C(x_B)\|$. The proof, which is deferred to Lemma 6 in Appendix A, follows from the concavity of the entropy functional. By Eqs. (14) and (35), exponential decay of $Q$ implies exponential decay of $I_{\Psi(x_B)}(A:C|B)$, and hence, by Eq. (13), of the classical CMI.

Next, we show that $Q$ can be bounded from above by a function of the ordered singular values of the matrix product defining the classical post-measurement MPS. Indeed, Lemma 7 in Appendix A first establishes an upper bound on $Q$ in terms of the average second eigenvalue of the matrix product in Eq. (17),

$$Q \le \frac{D-1}{K} \sum_{x_B} \lambda^\downarrow_2\!\left(F A_{x_N} \cdots A_{x_1}\, \sigma\, A_{x_1}^\dagger \cdots A_{x_N}^\dagger F^\dagger\right), \tag{36}$$

where $\{\lambda^\downarrow_j(O)\}$ and $\{\nu^\downarrow_j(O)\}$ denote the eigenvalues and singular values of an operator $O$ in decreasing order, i.e., $\lambda^\downarrow_1(O) \ge \dots \ge \lambda^\downarrow_D(O)$ and $\nu^\downarrow_1(O) \ge \dots \ge \nu^\downarrow_D(O)$, respectively. Then, recalling that $\lambda^\downarrow_j(OO^\dagger) = \nu^\downarrow_j(O)^2$ for any operator $O$, we get

$$\begin{aligned}
Q &\le \frac{D-1}{K} \sum_{x_B} \lambda^\downarrow_2\!\left(F A_{x_N} \cdots A_{x_1}\, \sigma\, A_{x_1}^\dagger \cdots A_{x_N}^\dagger F^\dagger\right) \qquad (37)\\
&\le \frac{D-1}{K} \sum_{x_B} \sqrt{\lambda^\downarrow_1\!\left(F A_{x_N} \cdots A_{x_1}\, \sigma\, A_{x_1}^\dagger \cdots A_{x_N}^\dagger F^\dagger\right)\, \lambda^\downarrow_2\!\left(F A_{x_N} \cdots A_{x_1}\, \sigma\, A_{x_1}^\dagger \cdots A_{x_N}^\dagger F^\dagger\right)} \qquad (38)\\
&= \frac{D-1}{K} \sum_{x_B} \nu^\downarrow_1\!\left(F A_{x_N} \cdots A_{x_1} \sqrt{\sigma}\right) \nu^\downarrow_2\!\left(F A_{x_N} \cdots A_{x_1} \sqrt{\sigma}\right) =: \frac{D-1}{K}\, f(N), \qquad (39)
\end{aligned}$$

where we recall that $|B| = N$.

Next, we need to take into account that $K$ depends on the sizes of the regions $A$, $B$ and $C$, and could in principle approach zero. However, the assumption that the MPS is injective implies that $\mathcal{E}(\cdot) = \sum_x A_x \cdot A_x^\dagger$ is primitive, which means that $\mathcal{E}$ has a unique full-rank fixed point. The latter is used in Lemma 10 in Appendix A to show that for all sufficiently large $|B|$ there exists a number $r > 0$ such that

$$K = \langle R|\,\mathcal{E}^{|\Lambda|}(|L\rangle\langle L|)\,|R\rangle = \langle R|\,\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)\,|R\rangle \ge r, \tag{40}$$

where $r$ is independent of $|A|$, $|C|$, $|L\rangle$ and $|R\rangle$. We use this to obtain an upper bound on $Q$ that depends on $N$ only via $f(N)$.

Finally, in Proposition 5 below, the function $f(N)$ is shown to decay exponentially if the purity condition holds. Moreover, the constants $c$ and $\gamma$ in the bound (42) can be chosen independently of $|A|$, $|B|$, $|C|$, which follows from the fact that $c$ and $\gamma$ are independent of $\sigma$ and $F$.

Note that the bound in Eq. (38) is likely quite sub-optimal.
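The quantity $f(N)$ of Eq. (39) (defined formally in Eq. (41)) is straightforward to evaluate for small $N$ by enumerating all words. The sketch below is our own illustration; the random operators, the choices $\sigma = \mathrm{diag}(0.5, 0.3, 0.2)$ and $F = 0.9\,\mathbb{1}$, and the helper names are assumptions. It also checks the identity $\lambda^\downarrow_j(OO^\dagger) = \nu^\downarrow_j(O)^2$ and the step from (37) to (38). The enumeration costs $d^N$ singular value decompositions, so it is only practical for small $N$.

```python
import itertools
import numpy as np

def random_left_normalized(d, D, seed=0):
    """Hypothetical helper: d random D x D matrices with sum_x A_x^dagger A_x = 1."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D))
    Q, _ = np.linalg.qr(G)
    return [Q[x * D:(x + 1) * D, :] for x in range(d)]

def word_products(As, N):
    """Yields A_{x_N} ... A_{x_1} for every word (x_1, ..., x_N)."""
    D = As[0].shape[0]
    for xs in itertools.product(range(len(As)), repeat=N):
        M = np.eye(D, dtype=complex)
        for x in xs:
            M = As[x] @ M
        yield M

def psd_sqrt(sigma):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    evals, U = np.linalg.eigh(sigma)
    return U @ np.diag(np.sqrt(np.clip(evals, 0, None))) @ U.conj().T

def f(As, sigma, F, N):
    """Eq. (41): sum over words of nu_1 * nu_2 of F A_{x_N} ... A_{x_1} sqrt(sigma)."""
    sqrt_sigma = psd_sqrt(sigma)
    total = 0.0
    for M in word_products(As, N):
        nu = np.linalg.svd(F @ M @ sqrt_sigma, compute_uv=False)  # descending order
        total += nu[0] * nu[1]
    return total

As = random_left_normalized(d=2, D=3, seed=3)
sigma = np.diag([0.5, 0.3, 0.2]).astype(complex)  # a full-rank density operator
F = 0.9 * np.eye(3, dtype=complex)                # F^dagger F = 0.81 * 1 <= 1
f_values = [f(As, sigma, F, N) for N in range(1, 9)]

# Identity used in Eqs. (37)-(39): lambda_j(O O^dagger) = nu_j(O)^2 for O = F A... sqrt(sigma).
O = F @ next(word_products(As, 3)) @ psd_sqrt(sigma)
lam = np.sort(np.linalg.eigvalsh(O @ O.conj().T))[::-1]
nu = np.linalg.svd(O, compute_uv=False)
```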
It is an interesting open question whether there exists a more direct bound on the average purity that does not rely on bounding the function $f(N)$. The main reason to work with $f(N)$ rather than the average purity is that $f(N)$ is explicitly submultiplicative.

We now state the key proposition, adapted from Ref. [19] and references therein.

Proposition 5 ([19]). Let $\{A_x\}_{x=0}^{d-1}$ be operators on a finite-dimensional complex Hilbert space $\mathcal{H}$ such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$. For operators $\sigma$ and $F$ on $\mathcal{H}$, define

$$f(N) := \sum_{x_1,\dots,x_N=0}^{d-1} \nu^\downarrow_1\!\left(F A_{x_N} \cdots A_{x_1} \sqrt{\sigma}\right) \nu^\downarrow_2\!\left(F A_{x_N} \cdots A_{x_1} \sqrt{\sigma}\right). \tag{41}$$

If $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition in Definition 3, then there exist real constants $0 \le c$ and $0 < \gamma < 1$ such that for all density operators $\sigma$ and all $F$ with $F^\dagger F \le \mathbb{1}$, it is the case that

$$f(N) \le c\,\gamma^N, \quad \forall N \in \mathbb{N}. \tag{42}$$

Conversely, if there exist constants $0 \le c$ and $0 < \gamma < 1$ such that (42) holds for some $\sigma$ and $F$ that are both full-rank operators, then $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition.

Theorem 4 provides the bound $\xi(|B|) = c\,\kappa^{|B|}$ needed in Lemma 2 to show the quasi-locality of the classical restriction $p_{1,\dots,|\Lambda|}(x_1, \dots, x_{|\Lambda|}) = \langle x_\Lambda|\Psi|x_\Lambda\rangle$. Theorem 4 and Lemma 2 thus yield as a corollary (for a more exact formulation, see Corollary 12 in Appendix A)

$$S(p_{1,\dots,|\Lambda|}\,\|\,p^\ell_{1,\dots,|\Lambda|}) \le c\,|\Lambda|^2\,\kappa^{\ell}, \quad 1 \le \ell \le |\Lambda|-1. \tag{43}$$

A simple example that leads to an exponential decay of the relative entropy is when $\ell$ is a constant fraction of $|\Lambda|$, i.e.,

$$\ell = \alpha|\Lambda|, \quad 0 < \alpha < 1. \tag{44}$$

The result is that the relative entropy decays exponentially, and the family of $\ell$-local distributions thus approaches $p_{1,\dots,|\Lambda|}$ exponentially fast. The classical restriction of typical injective MPSs is thus, in this sense, quasi-locally Gibbs.
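To see why a constant fraction $\ell = \alpha|\Lambda|$ suffices, here is a short worked estimate. It assumes only a bound of the form $c\,|\Lambda|^m\kappa^\ell$ with a fixed power $m$ and $0 < \kappa < 1$; counting the pairs $(j, k)$ with $k - j > \ell$ gives at most $|\Lambda|^2/2$ terms, so $m = 2$ suffices when $\xi(\ell) = c\,\kappa^\ell$. For any $0 < \epsilon < 1$,

$$c\,|\Lambda|^{m}\,\kappa^{\alpha|\Lambda|} = c\,\exp\!\big(m\ln|\Lambda| - \alpha\ln(1/\kappa)\,|\Lambda|\big) \le c\,C_\epsilon\,\exp\!\big(-(1-\epsilon)\,\alpha\ln(1/\kappa)\,|\Lambda|\big),\qquad C_\epsilon := \sup_{n\ge 1}\exp\!\big(m\ln n - \epsilon\,\alpha\ln(1/\kappa)\,n\big) < \infty,$$

where $C_\epsilon$ is finite because the linear term eventually dominates $m\ln n$. The polynomial prefactor therefore only reduces the exponential rate by an arbitrarily small fraction.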
Here, we give a brief overview of the general structure and ideas behind the proof of Proposition 5, i.e., that $f(N)$ decays exponentially if $\{A_x\}_x$ satisfies the purity condition. Although we do not always follow exactly the same route, the essence of the proof is due to [19, 20], which we have adapted to our particular setting and cast in a language that is hopefully more accessible to the quantum information theory community. The proof in the appendix is essentially self-contained, referencing only some standard results in the theory of martingales, which can be found in a number of classic textbooks on the subject.

As one may note from Eq. (41), the sequence $f(N)$ depends not only on the operators $A_x$, but also on the operators $\sigma$ and $F$. It turns out to be convenient to first focus on the function

$$w(N) = \sum_{x_1,\ldots,x_N=0}^{d-1}\nu_1^{\downarrow}\big(A_{x_N}\cdots A_{x_1}\big)\,\nu_2^{\downarrow}\big(A_{x_N}\cdots A_{x_1}\big). \tag{45}$$

Once we have established the purity condition as a necessary and sufficient condition for the exponential decay of $w(N)$, we extend this result to $f(N)$ (Proposition 31 in Section D.5), which yields the statement of Proposition 5.

The proof of the exponential convergence of $w(N)$ essentially proceeds in two steps. First, it is shown that $w(N)$ converges to zero. Next, it is shown that $w(N)$ is submultiplicative, in the sense that $w(N+M) \le w(N)w(M)$, and thus $\log w(N)$ is subadditive. This observation is used, together with Fekete's subadditive lemma, to show that $w(N)$ goes to zero exponentially fast. These steps are incorporated into the proof of Proposition 30.

The essential approach for proving that $w(N)$ converges to zero is to interpret $w(N)$ as an average over a stochastic process. This process can be viewed as the sequence of random measurement outcomes $x_1,\ldots,x_N$ of a repeated sequential measurement of the POVM $\{A_x^\dagger A_x\}_{x=0}^{d-1}$. (This process is described more precisely in Appendix C.)
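The two ingredients just described, the submultiplicativity of $w(N)$ and its interpretation as an average over the sequential measurement process, can be checked numerically on small examples. The Python/NumPy sketch below (hypothetical helper names) computes $w(N)$ from Eq. (45) by brute-force enumeration, checks $w(N+M)\le w(N)w(M)$, evaluates $D\,\mathbb{E}\big(\sqrt{\lambda_1^\downarrow(M_N)\lambda_2^\downarrow(M_N)}\big)$ with $M_N$ as in Eq. (46) and a maximally mixed initial state (an assumption of this sketch), and contrasts a random Kraus set with one made of operators proportional to unitaries, for which $w(N) = 1$ for all $N$. A random Kraus set is only expected, not guaranteed, to satisfy the purity condition.

```python
import itertools
import numpy as np

def random_kraus(d, D, seed):
    """Hypothetical helper: blocks of a random isometry, so sum_x A_x^dag A_x = 1."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D))
    V, _ = np.linalg.qr(G)
    return [V[x * D:(x + 1) * D, :] for x in range(d)]

def products(kraus, N):
    """Yield A_{x_N}...A_{x_1} for every outcome string (x_1, ..., x_N)."""
    D = kraus[0].shape[0]
    for xs in itertools.product(range(len(kraus)), repeat=N):
        P = np.eye(D, dtype=complex)
        for x in xs:
            P = kraus[x] @ P
        yield P

def w(kraus, N):
    """Eq. (45): sum over strings of nu_1 * nu_2 of the product."""
    total = 0.0
    for P in products(kraus, N):
        s = np.linalg.svd(P, compute_uv=False)
        total += s[0] * s[1]
    return total

def w_via_process(kraus, N):
    """D * E[sqrt(lambda_1 lambda_2)(M_N)], summed exactly over all strings,
    with Prob(x_1, ..., x_N) = Tr(A_{x_1}^dag...A_{x_N}^dag A_{x_N}...A_{x_1}) / D."""
    D = kraus[0].shape[0]
    expectation = 0.0
    for P in products(kraus, N):
        G = P.conj().T @ P
        tr = np.trace(G).real
        if tr < 1e-15:
            continue
        lam = np.sort(np.linalg.eigvalsh(G / tr))[::-1]  # spectrum of M_N, decreasing
        expectation += (tr / D) * np.sqrt(max(lam[0], 0.0) * max(lam[1], 0.0))
    return D * expectation

kraus = random_kraus(d=2, D=3, seed=0)
ws = [w(kraus, N) for N in range(1, 7)]
# Fekete: log w(N) / N converges to inf_N log w(N) / N
rates = [np.log(v) / N for N, v in enumerate(ws, start=1)]
print(ws)
print(rates)

# Operators proportional to unitaries violate purity: w(N) = 1 for all N
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
unitary_kraus = [X / np.sqrt(2), Z / np.sqrt(2)]
print([w(unitary_kraus, N) for N in range(1, 5)])
```

Submultiplicativity holds because $\nu_1\nu_2$ of a product is bounded by the product of the $\nu_1\nu_2$ of its factors, so the inequality checked here is exact rather than approximate; the Fekete ratios are only indicative for such small $N$.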
For the proof, it is useful to introduce the operator

$$M_N = \frac{A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}}{\mathrm{Tr}\big(A_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}\big)}, \tag{46}$$

which thus depends on the sequence of random measurement outcomes $x_1,\ldots,x_N$. It turns out that one can express $w(N)$ in terms of $M_N$ via the relation $w(N) = D\,\mathbb{E}\big(\sqrt{\lambda_1^{\downarrow}(M_N)\lambda_2^{\downarrow}(M_N)}\big)$, where $\lambda_1^{\downarrow}(M_N)$ and $\lambda_2^{\downarrow}(M_N)$ denote the largest and the second largest eigenvalue of $M_N$, respectively, and $D$ is the dimension of the underlying Hilbert space. Moreover, $\mathbb{E}$ denotes the expectation value over all possible measurement outcomes. One can see that $M_N$ is positive semi-definite and has trace 1, and can thus be interpreted as a density operator. The main point is that if $M_N$ were a rank-one operator, and thus corresponded to a pure state, then $\lambda_2^{\downarrow}(M_N)$ would be zero. Intuitively, it thus seems reasonable that $w(N)$ converges to zero if it is 'sufficiently likely' that $M_N$ converges to a rank-one operator.

The starting point for demonstrating that $M_N$ converges to a rank-one operator is to show (Lemma 23) that the sequence $(M_N)_{N\in\mathbb{N}}$ is a martingale relative to the sequence of measurement outcomes $(x_N)_{N\in\mathbb{N}}$. This enables us to show (Lemma 24) that $(M_N)_{N\in\mathbb{N}}$ almost surely converges to a positive operator $M_\infty$. (All these notions are reviewed in Section B.) Once this is established, the bulk of the proof is focused on showing that $M_\infty$ is (almost surely) a rank-one operator if and only if $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition.

Arguably the least transparent part of the proof is showing that the purity condition is sufficient for $M_\infty$ to be a rank-one operator. The first part of the proof (Lemma 25) shows that $M_{N+p}$ and $M_N$ in some sense 'approach' each other, even when conditioned on $x_1,\ldots,x_N$. The second part (Lemma 27), loosely speaking, shows that $M_{N+p}$ gives rise to a term of the form $\sqrt{M_N}\,U_N^\dagger A_{x_1}^\dagger\cdots A_{x_p}^\dagger A_{x_p}\cdots A_{x_1}U_N\sqrt{M_N}$ for a unitary operator $U_N$, while $M_N$ gives rise to a term that is proportional to $M_N$. As these operators approach each other when $N$ approaches infinity, one can use this to show that

$$\sqrt{M_\infty}\,U_\infty^\dagger A_{x'_1}^\dagger\cdots A_{x'_p}^\dagger A_{x'_p}\cdots A_{x'_1}U_\infty\sqrt{M_\infty} \propto \sqrt{M_\infty}\,U_\infty^\dagger U_\infty\sqrt{M_\infty}. \tag{47}$$

In a reformulation (Lemma 26) of the purity condition, the projector $P$ is replaced by a general operator $O$, again with the conclusion that $O$ must be a rank-one operator. With $O = U_\infty\sqrt{M_\infty}$ it follows that $M_\infty$ is a rank-one operator.

Showing conversely (Lemma 28) that the purity condition is a necessary condition is somewhat less involved. If a projector $P$ satisfies the proportionality in (33) while having rank larger than one, then the only way in which $M_\infty$ can be a rank-one operator is if $PM_\infty P = 0$. However, this leads to a contradiction with $\sum_{x=0}^{d-1}A_x^\dagger A_x = \mathbb{1}$.

Remark. Using the same tools as above, Benoist et al. show in [19] that the stochastic process defined in Appendix C equilibrates exponentially. It is worth noting that the average purity can converge to zero much faster than the stochastic process. For instance, if $\sigma$ is a rank-one operator, then $f(N)$ is trivially zero for all $N$, irrespective of whether $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition or not.

In this section, we discuss the purity condition and the decay of the CMI in the context of quantum information theory. We specifically study the behaviour of the CMI for symmetry-protected phases and find that it remains constant.
Moreover, by constructing two simple examples, we show that the decay rate of the CMI is unrelated to the decay rate of the transfer operator of the corresponding MPS. We also discuss the purity condition from the point of view of quantum error correction. Finally, we work out some examples.

We briefly discuss which systems do not satisfy the purity condition and comment on the relation to symmetry-protected phases in one dimension. Consider an MPS $|\Psi\rangle$ of the form of Eq. (1) with matrices $A_{x_i}$ that have a tensor product decomposition into two subsystems such that

$$A_{x_i} = U_{x_i}\otimes T_{x_i}, \tag{48}$$

where $U_{x_i}$ is a unitary matrix and $T_{x_i}$ is an arbitrary matrix. Then, in the infinite chain case (see Eq. (18)), the reduced post-measurement state $\Psi_C(x_B)$ is isospectral to

$$\Psi_C(x_B) \simeq \frac{1}{p_\Psi(x_B)K}\big(U_{x_N}\cdots U_{x_1}\otimes T_{x_N}\cdots T_{x_1}\big)\,\rho\,\big(U_{x_1}^\dagger\cdots U_{x_N}^\dagger\otimes T_{x_1}^\dagger\cdots T_{x_N}^\dagger\big).$$

For simplicity, let us further consider the case where the unique fixed point of the transfer operator is proportional to the identity, i.e., $\rho = \mathbb{1}/(D_1D_2)$, where $D_1$ and $D_2$ are the dimensions of the two subsystems, respectively. We obtain

$$\Psi_C(x_B) \simeq \frac{1}{D_1D_2\,p_\Psi(x_B)K}\Big(\mathbb{1}\otimes T_{x_N}\cdots T_{x_1}T_{x_1}^\dagger\cdots T_{x_N}^\dagger\Big),$$

and hence

$$S[\Psi_C(x_B)] = \log D_1 + S\Big(\frac{1}{p_\Psi(x_B)D_2K}\,T_{x_N}\cdots T_{x_1}T_{x_1}^\dagger\cdots T_{x_N}^\dagger\Big).$$

Consequently, the average entropy of entanglement of $\Psi_C(x_B)$, and thus the post-measurement CMI $I_{\Psi(x_B)}(A:C|B)$ (see Eq. (13)), always has a constant contribution $\log D_1$, independent of the length of the middle region $B$. More generally, the CMI is non-vanishing for MPS in a basis where the matrices $A_{x_i}$ can be isometrically mapped to a form as in Eq. (48) [24].

It was shown in Ref. [26] that for a symmetry-protected phase in the MPS framework, there always exists a local basis in which the matrices have the form of Eq. (48), with the additional property that the unitary matrices form a representation of the symmetry group. The AKLT model (see Sec. 6.4.1) is such an example.

In this section we explore the purity condition (see Def. 3) in more detail. The purity condition states that the only projectors $P$ that satisfy

$$PA_{x_1}^\dagger\cdots A_{x_N}^\dagger A_{x_N}\cdots A_{x_1}P \propto P \tag{49}$$

for all $N\in\mathbb{N}$ and all $(x_1,\ldots,x_N)\in\{0,\ldots,d-1\}^{\times N}$ are those that have rank one. Here we investigate the relation between this condition (or rather its violation) and the Knill-Laflamme error correction condition [25].

Suppose that there exists a projector $P$ onto a subspace $\mathcal{C}$ with $\dim\mathcal{C}\ge 2$ such that

$$PA_x^\dagger A_xP = \lambda_xP, \quad \forall x. \tag{50}$$

This looks suspiciously similar to the Knill-Laflamme error correction condition, which is

$$PA_x^\dagger A_yP = c_{xy}P, \quad \forall x,y. \tag{51}$$

The question is how one can understand the apparent similarity between Eqs. (50) and (51). To this end, let us first recall the error correction scenario. If $A_x$ are operators on a Hilbert space $\mathcal{H}$ with $\sum_xA_x^\dagger A_x = \mathbb{1}$, we define the corresponding noise channel

$$\mathcal{E}(\chi) := \sum_xA_x\chi A_x^\dagger. \tag{52}$$

For any state $\chi$ with support on the subspace $\mathcal{C}\subseteq\mathcal{H}$, $\mathcal{E}(\chi)$ can be restored to $\chi$ if and only if (51) holds. More precisely, there exists a recovery operation $\mathcal{R}$ (that does not depend on $\chi$) such that $\mathcal{R}\circ\mathcal{E}(\chi) = \chi$ for all density operators $\chi$ with support on $\mathcal{C}$.

It turns out that Eq. (50) is also a necessary and sufficient condition for error correction, but for a different type of error model. The channel $\mathcal{E}$ in the standard error-correction scenario is the effect of a unitary evolution that acts on $\mathcal{H}$ and on an environment $\mathcal{H}_E$, where the latter is inaccessible to us. In the alternative scenario, we assume that there exists an ancilla system $A$ which we do have access to, and which we can use to help us restore the initial state on $\mathcal{H}$. More precisely, we assume an error model of the form

$$\tilde{\mathcal{E}}(\chi) = \sum_x|x\rangle_A\langle x|\otimes A_x\chi A_x^\dagger, \tag{53}$$

where $\{|x\rangle_A\}_x$ is an orthonormal basis of the Hilbert space $\mathcal{H}_A$ associated with the ancillary system. We can interpret this as having access to additional classical information about the error in the register $A$, which we use in order to restore the state on $\mathcal{C}$. Note that if we have no access to $A$, then we are back to the standard scenario, where the channel on $\mathcal{H}$ is $\mathcal{E} = \mathrm{Tr}_A\tilde{\mathcal{E}}$.
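The difference between the ancilla-assisted condition (50) and the Knill-Laflamme condition (51) can be illustrated numerically. The Python/NumPy sketch below (hypothetical helper names) tests both conditions for a given projector, and builds unitaries $U_x$ of the kind used in Eq. (54) by completing the isometry $A_xP/\sqrt{\lambda_x}$ to a unitary that maps it back onto $\mathcal{C}$; this particular construction is our own illustrative choice, not necessarily the one intended in the text. As test cases we use the operators $\hat\sigma_i/\sqrt{3}$ (the Pauli-basis AKLT matrices of Sec. 6.4.1) with $P = \mathbb{1}$, which satisfy (50) but not (51), and the operators $A_0$, $A_1$ of Eq. (66) with the projector $P$ defined there.

```python
import numpy as np

def proportional(M, P, tol=1e-10):
    """True if P M P = c P for some scalar c."""
    c = np.trace(P @ M @ P) / np.trace(P)
    return np.allclose(P @ M @ P, c * P, atol=tol)

def satisfies_50(kraus, P):
    # ancilla-assisted condition: P A_x^dag A_x P = lambda_x P for each x
    return all(proportional(A.conj().T @ A, P) for A in kraus)

def satisfies_51(kraus, P):
    # Knill-Laflamme condition: P A_x^dag A_y P = c_xy P for all pairs x, y
    return all(proportional(A.conj().T @ B, P) for A in kraus for B in kraus)

def correcting_unitaries(kraus, basis, seed=0):
    """Given orthonormal columns `basis` spanning C and Kraus operators obeying (50),
    return unitaries U_x with U_x A_x |c> = sqrt(lambda_x) |c> for all |c> in C."""
    D, k = basis.shape
    rng = np.random.default_rng(seed)
    unitaries = []
    for A in kraus:
        image = A @ basis
        lam = np.linalg.norm(image[:, 0]) ** 2     # by (50), each column has norm^2 lambda_x
        if lam < 1e-12:                            # A_x annihilates C; any unitary will do
            unitaries.append(np.eye(D, dtype=complex))
            continue
        W = image / np.sqrt(lam)                   # orthonormal columns
        fill = rng.normal(size=(D, D - k)) + 1j * rng.normal(size=(D, D - k))
        W_full, _ = np.linalg.qr(np.hstack([W, fill]))
        B_full, _ = np.linalg.qr(np.hstack([basis, fill]))
        W_full[:, :k], B_full[:, :k] = W, basis    # fix the phases of the first k columns
        unitaries.append(B_full @ W_full.conj().T)  # maps the columns of W onto basis
    return unitaries

def recover(kraus, unitaries, chi):
    return sum(U @ A @ chi @ A.conj().T @ U.conj().T for U, A in zip(unitaries, kraus))

# Pauli-basis AKLT matrices, C = the whole qubit
s3 = np.sqrt(1 / 3)
pauli = [s3 * np.array([[0, 1], [1, 0]], dtype=complex),
         s3 * np.array([[0, -1j], [1j, 0]], dtype=complex),
         s3 * np.array([[1, 0], [0, -1]], dtype=complex)]
P2 = np.eye(2, dtype=complex)
chi = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
print(satisfies_50(pauli, P2), satisfies_51(pauli, P2))   # (50) holds, (51) fails
print(np.allclose(recover(pauli, correcting_unitaries(pauli, P2), chi), chi))

# Operators of Eq. (66) with D = 3, C = span{|0>, |1>, |2>} inside C^4
Dj = 3
A0 = np.zeros((Dj + 1, Dj + 1), dtype=complex)
for j in range(Dj):
    A0[j + 1, j] = 1.0
A1 = np.zeros((Dj + 1, Dj + 1), dtype=complex)
A1[Dj, Dj] = 1.0
basis = np.eye(Dj + 1, dtype=complex)[:, :Dj]
Pj = basis @ basis.conj().T
chi_j = np.diag([0.5, 0.3, 0.2, 0.0]).astype(complex)
print(satisfies_50([A0, A1], Pj))
print(np.allclose(recover([A0, A1], correcting_unitaries([A0, A1], basis), chi_j), chi_j))
```

In the Pauli case each $U_x$ reduces to the corresponding Pauli matrix, so the classical register $A$ in (53) simply tells us which Pauli to undo, whereas without it the errors $\hat\sigma_x$, $\hat\sigma_y$, $\hat\sigma_z$ are indistinguishable on the full qubit.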
It turns out that (50) is a necessary and sufficient condition for the existence of a recovery channel $\tilde{\mathcal{R}}:\mathcal{L}(\mathcal{H}\otimes\mathcal{H}_A)\to\mathcal{L}(\mathcal{H})$ such that $\tilde{\mathcal{R}}\circ\tilde{\mathcal{E}}(\chi) = \chi$ for all density operators $\chi$ on $\mathcal{C}$. The proof of this statement is nearly identical to that of the original Knill-Laflamme theorem and is omitted here.

If one finds a non-trivial projector $P$ (i.e., if $\mathrm{Tr}(P) = \dim\mathcal{C}\ge 2$) such that Eq. (50) holds, then one can explicitly construct a collection of unitary operators $U_x$ such that

$$\sum_xU_xA_x\chi A_x^\dagger U_x^\dagger = \chi \tag{54}$$

for all density operators $\chi$ on $\mathcal{C}$. In other words, the operators $U_x$ perform the error correction on the subspace $\mathcal{C}$. Indeed, (50) implies that $A_x$ restricted to $\mathcal{C}$ is $\sqrt{\lambda_x}$ times an isometry, which $U_x$ can map back onto $\mathcal{C}$; summing over $x$ and using $\sum_x\lambda_x = 1$ yields (54). More precisely, if we have a set $\{A_x\}$ with $\sum_xA_x^\dagger A_x = \mathbb{1}$ for which there exists a non-trivial projector $P$ that satisfies Eq. (50), then we can construct a new 'error-corrected' set $\{A'_x\}$ with $A'_x := U_xA_x$ (and $\sum_xA_x'^\dagger A'_x = \mathbb{1}$). For this new set, the average entropy (Eq. (17)) does not decay to zero, no matter how long a chain $A'_{x_N}\cdots A'_{x_1}$ we construct.

Nothing prevents us from repeating the above reasoning for products $\{A_{x_2}A_{x_1}\}_{x_1,x_2}$, i.e., we can try to find the largest subspace $\mathcal{C}_2$ with corresponding projector $P_2$ such that

$$P_2A_{x_1}^\dagger A_{x_2}^\dagger A_{x_2}A_{x_1}P_2 = \lambda_{x_1,x_2}P_2. \tag{55}$$

We can similarly ask for the largest subspace $\mathcal{C}_3$ that is correctable for $\{A_{x_3}A_{x_2}A_{x_1}\}_{x_1,x_2,x_3}$. One can see that we always have $\mathcal{C}_n\subseteq\mathcal{C}_{n-1}$.

The purity condition is violated if and only if there exists a non-trivial projector $P$ such that (49) holds for all $N$. By the above reasoning, we can thus conclude that the purity condition fails if and only if there exists a fixed non-trivial subspace $\mathcal{C}$ that is correctable for all $N$. Loosely speaking, we can alternatively phrase the purity condition as the non-existence of a non-trivial correctable subspace that persists indefinitely throughout iterated applications of the error channel. Intuitively, this observation suggests that the violation of the purity condition is a rather 'brittle' and non-generic phenomenon.

As we have seen in Section 5, the CMI of an MPS decays exponentially to zero when the matrices of the MPS satisfy the purity condition (see Def. 3). Moreover, the iterates of the transfer operator $\mathcal{E}$ of an injective MPS converge to its unique fixed point exponentially fast, at a rate lower bounded by the gap of the channel.
The corresponding length scale is often referred to as the correlation length. One could expect a relation between the correlation length and the decay rate of the classical CMI in Theorem 4. In this section, we consider two examples which give clear evidence of the absence of such a relation.

Consider an MPS of the form of Eq. (1) such that all matrices $A_{x_i}$ have rank one. After a measurement of system $B$, the reduced state $\Psi_C(x_B)$ becomes pure for any measurement outcome (see Eq. (16)). Therefore, the average von Neumann entropy of $\Psi_C(x_B)$, and thus the post-measurement CMI according to Eq. (13), vanish instantaneously, even if region $B$ is a single site. On the other hand, the correlation length need not be zero. For example, given a collection of transition probabilities $\{P(x_j|x_i)\}_{x_i,x_j=0}^{d-1}$ and an orthonormal basis $\{|x_i\rangle\}_{x_i=0}^{d-1}$, the repeated application of the transfer operator of an MPS with $A_{x_{ij}} = \sqrt{P(x_j|x_i)}\,|x_j\rangle\langle x_i|$ effectively implements a classical Markov process with transition probabilities $P(x_j|x_i)$, such that

$$\mathcal{E}^{\circ N}(\chi) = \sum_{x_0,\ldots,x_N=0}^{d-1}P(x_1|x_0)\cdots P(x_N|x_{N-1})\,\langle x_0|\chi|x_0\rangle\,|x_N\rangle\langle x_N|.$$

Nothing prevents this Markov chain from converging slowly to its equilibrium distribution.

Conversely, consider an MPS with matrices $A_{x_i}$ proportional to unitary operators. As discussed in Section 6.1, this implies that the von Neumann entropy does not decay, and thus the CMI remains constant. However, an injective MPS always has a finite correlation length. As a concrete example, consider

$$A_{x_{ij}} := \frac{1}{D}\sum_{k=0}^{D-1}e^{2\pi i\,kx_j/D}\,|k\rangle\langle(k+x_i)\bmod D|,$$

where $\{|k\rangle\}_{k=0}^{D-1}$ is an orthonormal basis. One can easily check that the $A_{x_{ij}}$ are proportional to unitary operators, and hence the average von Neumann entropy $\langle S[\Psi_C(x_B)]\rangle$ does not decay.
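Both claims about this clock-and-shift example, that each $A_{x_{ij}}$ is proportional to a unitary and that the transfer operator is the replacement map given below, can be checked numerically. The Python/NumPy sketch builds the $D^2$ operators $A_{x_{ij}}$ exactly as defined above (helper names are hypothetical) and also confirms that a product along an arbitrary outcome string leaves the spectrum of a normalized input state unchanged, which is the mechanism behind the non-decaying entropy.

```python
import numpy as np

def clock_shift_kraus(D):
    """A_{x_ij} = (1/D) sum_k exp(2 pi i k x_j / D) |k><(k + x_i) mod D|."""
    kraus = []
    for xi in range(D):
        for xj in range(D):
            A = np.zeros((D, D), dtype=complex)
            for k in range(D):
                A[k, (k + xi) % D] = np.exp(2j * np.pi * k * xj / D) / D
            kraus.append(A)
    return kraus

def transfer(kraus, chi):
    return sum(A @ chi @ A.conj().T for A in kraus)

D = 3
kraus = clock_shift_kraus(D)
rng = np.random.default_rng(0)
G = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
chi = G @ G.conj().T
chi /= np.trace(chi).real

# Each D * A_{x_ij} is unitary (a permutation matrix times phases)
print(all(np.allclose((D * A) @ (D * A).conj().T, np.eye(D)) for A in kraus))
# The transfer operator replaces any state by the maximally mixed state
print(np.allclose(transfer(kraus, chi), np.eye(D) / D))

# A product along a random outcome string is proportional to a unitary,
# so the normalized post-measurement state is isospectral to chi
string = rng.integers(0, D * D, size=5)
P = np.eye(D, dtype=complex)
for x in string:
    P = kraus[x] @ P
out = P @ chi @ P.conj().T
out /= np.trace(out).real
print(np.allclose(np.linalg.eigvalsh(out), np.linalg.eigvalsh(chi)))
```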
The transfer operator of this MPS is the replacement map that replaces any input state $\chi$ with the maximally mixed state, i.e.,

$$\mathcal{E}(\chi) = \sum_{x_i,x_j=0}^{D-1}A_{x_{ij}}\chi A_{x_{ij}}^\dagger = \mathrm{Tr}(\chi)\,\frac{\mathbb{1}}{D}.$$

In [22, 23], the decay of classical and quantum correlations is also studied. There, the authors introduce an entanglement measure called Localizable Entanglement (LE). The LE is defined as the maximal amount of entanglement that can be created on average between two spins at positions $i$ and $j$ of a chain by performing local measurements on the other spins. The LE is closely related to the scenario that we consider in this paper (see Eq. (15)): the only difference is that the LE optimises over the measurement basis, whereas we fix a concrete basis. For the case where the measured spins are spin-1/2, it is shown in [22, 23] that the connected correlation function provides a lower bound on the LE.

In this section we consider examples that illustrate some features of the process under study. As a prototypical example, we look at the AKLT model and obtain the exact convergence rate in a specific basis. Then, we consider an MPS with a strictly contractive transfer operator and a pure fixed point. This second example shows that primitivity of the transfer operator is not a necessary condition for the exponential convergence of the post-measurement CMI. In the last example, the purity condition is violated up to a fixed length $|B| = N$ but satisfied thereafter.

The first state we want to consider is the 1D AKLT model. The AKLT state on a chain has a well-known MPS description with bond dimension $D = 2$ and physical dimension $d = 3$. The matrices in the MPS picture are given by

$$A_0 = -\frac{1}{\sqrt{3}}\begin{pmatrix}1&0\\0&-1\end{pmatrix},\quad A_+ = \sqrt{\frac{2}{3}}\begin{pmatrix}0&1\\0&0\end{pmatrix},\quad A_- = -\sqrt{\frac{2}{3}}\begin{pmatrix}0&0\\1&0\end{pmatrix}. \tag{56}$$

We take $\{0,+,-\}$ as the basis for the physical space.
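The claims made about the AKLT matrices in this subsection can be checked by brute force for small $N$: the normalization $\sum_x A_x^\dagger A_x = \mathbb{1}$, the stationary state $\rho = \mathbb{1}/2$, the decay $\langle S[\Psi_C(x_B)]\rangle = 3^{-N}\log 2$ in the standard basis (Eqs. (57) to (59), using that $A_0^N(A_0^N)^\dagger\propto\mathbb{1}$), and the constant value $\log 2$ in the Pauli basis $\sqrt{1/3}\,\hat\sigma_i$ discussed subsequently. This Python/NumPy sketch assumes the sign and orientation conventions written in (56) and natural logarithms; the helper names are hypothetical.

```python
import itertools
import numpy as np

s3 = np.sqrt(1 / 3)
# Standard basis {0, +, -}, Eq. (56)
A_std = [-s3 * np.array([[1, 0], [0, -1]], dtype=complex),
         np.sqrt(2 / 3) * np.array([[0, 1], [0, 0]], dtype=complex),
         -np.sqrt(2 / 3) * np.array([[0, 0], [1, 0]], dtype=complex)]
# Pauli basis: sqrt(1/3) * sigma_x, sigma_y, sigma_z
A_pauli = [s3 * np.array([[0, 1], [1, 0]], dtype=complex),
           s3 * np.array([[0, -1j], [1j, 0]], dtype=complex),
           s3 * np.array([[1, 0], [0, -1]], dtype=complex)]

def entropy(rho):
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]            # drop numerically zero eigenvalues
    return float(-np.sum(vals * np.log(vals)))

def average_entropy(kraus, N):
    """Eq. (58) with p(x_B) from Eq. (57), i.e. stationary state 1/2."""
    total = 0.0
    for xs in itertools.product(range(len(kraus)), repeat=N):
        P = np.eye(2, dtype=complex)
        for x in xs:
            P = kraus[x] @ P
        O = P @ P.conj().T
        tr = np.trace(O).real
        if tr < 1e-15:                   # e.g. strings containing "++" or "--"
            continue
        total += 0.5 * tr * entropy(O / tr)
    return total

I2 = np.eye(2)
print(np.allclose(sum(A.conj().T @ A for A in A_std), I2))
print(np.allclose(sum(A @ (I2 / 2) @ A.conj().T for A in A_std), I2 / 2))
for N in range(1, 6):
    print(N, average_entropy(A_std, N), np.log(2) / 3**N, average_entropy(A_pauli, N))
```

Only the all-zero string contributes in the standard basis, because every product containing $A_+$ or $A_-$ has rank one; in the Pauli basis every product is proportional to a unitary, so each outcome leaves the state maximally mixed.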
It can be seen by inspection that the transfer operator $\mathcal{E}(\chi) = \sum_{x_i}A_{x_i}\chi A_{x_i}^\dagger$ has the unique stationary state $\rho = \mathbb{1}/2$. In the infinite chain setting, the probability of a measurement outcome $x_B = x_1,\ldots,x_N$ is given by

$$p_\Psi(x_B) = \frac{1}{2}\mathrm{Tr}\big[A_{x_N}\cdots A_{x_1}A_{x_1}^\dagger\cdots A_{x_N}^\dagger\big]. \tag{57}$$

We want to calculate the average entropy on $C$ after measurement of system $B$, i.e., we need to estimate

$$\langle S[\Psi_C(x_B)]\rangle = \sum_{x_B\in\{0,+,-\}^N}p_\Psi(x_B)\,S\left(\frac{A_{x_N}\cdots A_{x_1}A_{x_1}^\dagger\cdots A_{x_N}^\dagger}{\mathrm{Tr}\big(A_{x_N}\cdots A_{x_1}A_{x_1}^\dagger\cdots A_{x_N}^\dagger\big)}\right). \tag{58}$$

We distinguish two cases: $p_\Psi(x_B) = 0$ and $p_\Psi(x_B)\neq 0$. Since $A_+A_+ = A_-A_- = 0$, we have $p_\Psi(x_B) = 0$ whenever the string $x_B$ contains two (or more) successive $+$ (or $-$). In other words, the only strings with $p_\Psi(x_B)\neq 0$ are those in which the $\pm$ entries alternate (e.g., $+-+-$), possibly interspersed with $0$s. However, the only string with non-zero entropy is the one consisting of all $0$s, because any product containing $A_+$ or $A_-$ has rank one. This string occurs with probability $1/3^N$. Thus,

$$\langle S[\Psi_C(x_B)]\rangle = \frac{1}{3^N}\,S[\Psi_C(x_0)], \tag{59}$$

where $x_0 := 0,\ldots,0$; since $A_0^N(A_0^N)^\dagger\propto\mathbb{1}$, we have $S[\Psi_C(x_0)] = \log 2$. Hence, the AKLT model in the standard basis has a post-measurement CMI that decays exponentially in the size of $B$ for large $A$ and $C$. Coincidentally, the correlation length equals the decay length of the classical CMI in this basis.

Let us now consider a change of basis. The 1D AKLT state is also given by the MPS representation with matrices

$$\tilde{A}_x := \sqrt{\frac{1}{3}}\,\hat{\sigma}_x = \sqrt{\frac{1}{3}}\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad \tilde{A}_y := \sqrt{\frac{1}{3}}\,\hat{\sigma}_y = \sqrt{\frac{1}{3}}\begin{pmatrix}0&-i\\i&0\end{pmatrix},\quad \tilde{A}_z := \sqrt{\frac{1}{3}}\,\hat{\sigma}_z = \sqrt{\frac{1}{3}}\begin{pmatrix}1&0\\0&-1\end{pmatrix},$$

where $\hat{\sigma}_i$ are the Pauli matrices. The Pauli matrices are unitary, and thus the average von Neumann entropy of $\Psi_C(x_B)$ (Eq. (58)) is constant, namely $\log 2$. In other words, the post-measurement CMI of the AKLT chain measured in the basis corresponding to the Pauli matrices does not decay. This is consistent with the discussion in Section 6.1, because the AKLT chain is in the Haldane phase, which is a symmetry-protected phase (SPP) protected by the $\mathbb{Z}_2\times\mathbb{Z}_2$ symmetry generated by $\pi$ rotations around three orthogonal axes.

As mentioned in Section 2, translationally invariant MPS are injective if and only if the transfer operator $\mathcal{E}$, defined in (2), is primitive [29], where primitivity means that the channel possesses a unique full-rank fixed point. Primitivity in turn guarantees the existence of a gapped parent Hamiltonian and exponential decay of correlations [13, 32].

In view of the essential role played by primitivity for the decay of correlations, one may ask how it relates to the purity condition for the exponential decay of the CMI, in the sense of Theorem 4. In this section, we present an example which shows that primitivity is not a necessary condition for the exponential decay of the CMI. In other words, we consider an MPS with a non-primitive transfer operator which nevertheless yields an exponentially decaying CMI due to purity.

We say that a channel $\Phi$ is strictly contractive if there exists a number $0\le\alpha<1$ such that $\|\Phi(\chi_1)-\Phi(\chi_2)\|\le\alpha\|\chi_1-\chi_2\|$ for all density operators $\chi_1$ and $\chi_2$.

Consider an MPS of the form of Eq. (1) which has a strictly contractive transfer operator $\mathcal{E}$ with a pure fixed point, denoted by $|\phi\rangle$. Note that if a channel is strictly contractive, then its fixed point is unique. Note further that, since the fixed point is pure, the transfer operator is, by definition, not primitive. Let us also assume that $F = \sqrt{\mathcal{E}^{*|C|}(R)}$ is full rank.
Under these assumptions, our aim is to find an exponentially decaying bound on the average von Neumann entropy of the reduced post-measurement state (see Eq. (17)). We start by using the concavity of the entropy and obtain

$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \le S\Big(\frac{1}{K}\sum_{x_B}FA_{x_N}\cdots A_{x_1}\sigma A_{x_1}^\dagger\cdots A_{x_N}^\dagger F^\dagger\Big) = S\left(\frac{F\mathcal{E}^N(\sigma)F^\dagger}{\mathrm{Tr}[F\mathcal{E}^N(\sigma)F^\dagger]}\right),$$

where $K = \mathrm{Tr}[F\mathcal{E}^N(\sigma)F^\dagger]$. The purity of the fixed point of $\mathcal{E}$ allows us to rewrite the above inequality as

$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \le \left|S\left(\frac{F\mathcal{E}^N(\sigma)F^\dagger}{\mathrm{Tr}[F\mathcal{E}^N(\sigma)F^\dagger]}\right) - S\left(\frac{F\mathcal{E}^N(|\phi\rangle\langle\phi|)F^\dagger}{\mathrm{Tr}[F\mathcal{E}^N(|\phi\rangle\langle\phi|)F^\dagger]}\right)\right|,$$

since the second entropy vanishes: $\mathcal{E}^N(|\phi\rangle\langle\phi|) = |\phi\rangle\langle\phi|$ is pure, and so is $F|\phi\rangle\langle\phi|F^\dagger$ after normalization. We can further bound the average von Neumann entropy using the Fannes-Audenaert inequality [33, 34]. This yields

$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \le t\log(D-1) + H_B(t), \tag{60}$$

where $H_B$ is the binary entropy, i.e., $H_B(t) = -t\log t - (1-t)\log(1-t)$ with $H_B(0) := 0$ and $H_B(1) := 0$, and where $t$ is defined as

$$t := \frac{1}{2}\left\|\frac{F\mathcal{E}^N(\sigma)F^\dagger}{\mathrm{Tr}[F\mathcal{E}^N(\sigma)F^\dagger]} - \frac{F\mathcal{E}^N(|\phi\rangle\langle\phi|)F^\dagger}{\mathrm{Tr}[F\mathcal{E}^N(|\phi\rangle\langle\phi|)F^\dagger]}\right\|, \tag{61}$$

with $\|\cdot\|$ denoting the trace norm.

For any full-rank $F$, and any pair of density operators $\chi_1$, $\chi_2$ on a finite-dimensional Hilbert space, one can show that

$$\left\|\frac{F\chi_1F^\dagger}{\mathrm{Tr}(F\chi_1F^\dagger)} - \frac{F\chi_2F^\dagger}{\mathrm{Tr}(F\chi_2F^\dagger)}\right\| \le 2\left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\|\chi_1-\chi_2\|, \tag{62}$$

where $\nu_1(F)$ and $\nu_D(F)$ denote the largest and the smallest singular values of $F$, and where we note that $\nu_D(F) > 0$ since $F$ is full rank on a finite-dimensional space.

By combining (61) and (62) with an iterative use of the strict contractivity of $\mathcal{E}$, we find an upper bound on $t$:

$$t \le \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\big\|\mathcal{E}^N(\sigma)-\mathcal{E}^N(|\phi\rangle\langle\phi|)\big\| \le \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\alpha^N\big\|\sigma-|\phi\rangle\langle\phi|\big\|. \tag{63}$$

If it were possible to choose $\alpha = 0$, then $\|\mathcal{E}(\chi_1)-\mathcal{E}(\chi_2)\| = 0$, and thus $\mathcal{E}(\chi_1) = \mathcal{E}(\chi_2)$, which implies that the convergence is not only exponential but immediate. Hence, without loss of generality, we may in the following assume that $0<\alpha<1$. It remains to bound $H_B(t)$. For that, we define the function $g(t) := t - t\log t$, with $g(0) := 0$. One can show that $g$ is monotonically increasing on $t\in[0,1]$ and satisfies $H_B(t)\le g(t)$ for $0\le t\le 1$. These two properties of $g$, together with inequality (63), lead to

$$
\begin{aligned}
H_B(t) \le{}& \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\alpha^N\big\|\sigma-|\phi\rangle\langle\phi|\big\| \\
&- \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\alpha^N\big\|\sigma-|\phi\rangle\langle\phi|\big\|\log\left[\left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\big\|\sigma-|\phi\rangle\langle\phi|\big\|\right] \\
&- \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\big\|\sigma-|\phi\rangle\langle\phi|\big\|\,N\alpha^N\log\alpha.
\end{aligned}
\tag{64}
$$

By combining (60) with (64), and again using inequality (63), we find that

$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} \le c_1\alpha^N + c_2N\alpha^N, \tag{65}$$

where $c_1$ and $c_2$ are defined as

$$c_1 := \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\big\|\sigma-|\phi\rangle\langle\phi|\big\|\left[\log(D-1)+1-\log\left(\left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\big\|\sigma-|\phi\rangle\langle\phi|\big\|\right)\right],\qquad c_2 := \left(\frac{\nu_1(F)}{\nu_D(F)}\right)^2\big\|\sigma-|\phi\rangle\langle\phi|\big\|\,(-\log\alpha).$$

The right-hand side of Eq. (65) decays exponentially to zero as $N$ grows to infinity, because $0<\alpha<1$. Hence, we have found an exponentially decaying bound on the post-measurement CMI for an MPS with a transfer operator that is strictly contractive and has a pure fixed point. This shows that primitivity of the transfer operator is not a necessary condition for the CMI to converge.
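A concrete instance of this scenario, chosen here for illustration and not taken from the text, is the qubit amplitude-damping channel with Kraus operators $A_0 = |0\rangle\langle 0| + \sqrt{1-g}\,|1\rangle\langle 1|$ and $A_1 = \sqrt{g}\,|0\rangle\langle 1|$, $0<g<1$. Its fixed point $|0\rangle\langle 0|$ is pure, and from its action on the Bloch ball (transverse components shrink by $\sqrt{1-g}$, the longitudinal one by $1-g$) one expects it to be strictly contractive in trace distance with $\alpha = \sqrt{1-g}$. Taking $F = \mathbb{1}$ and $\sigma = \mathbb{1}/2$ (assumptions of this sketch), every string containing $A_1$ yields a pure state, so only the all-zero string contributes, and with $r = 1-g$ the average entropy equals $\tfrac{1+r^N}{2}$ times the entropy of the spectrum $\big(\tfrac{1}{1+r^N},\tfrac{r^N}{1+r^N}\big)$. The Python/NumPy sketch below (hypothetical helper names) enumerates all strings, compares with this closed form, and checks the concavity bound used at the start of the argument.

```python
import itertools
import numpy as np

def entropy(rho):
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log(vals)))

def amplitude_damping(g):
    A0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
    A1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
    return [A0, A1]

def average_entropy(kraus, sigma, N):
    """Average entropy of the normalized states A_{x_N}...A_{x_1} sigma (...)^dag;
    with F = identity, K = 1 and the weights are the traces themselves."""
    total = 0.0
    for xs in itertools.product(range(len(kraus)), repeat=N):
        O = sigma
        for x in xs:
            O = kraus[x] @ O @ kraus[x].conj().T
        tr = np.trace(O).real
        if tr > 1e-15:
            total += tr * entropy(O / tr)
    return total

def transfer_power(kraus, chi, N):
    for _ in range(N):
        chi = sum(A @ chi @ A.conj().T for A in kraus)
    return chi

g = 0.3
r = 1 - g
kraus = amplitude_damping(g)
sigma = np.eye(2, dtype=complex) / 2
ket0 = np.array([[1, 0], [0, 0]], dtype=complex)
values = [average_entropy(kraus, sigma, N) for N in range(1, 9)]

def closed_form(N):
    q = np.array([1.0, r**N]) / (1 + r**N)   # spectrum of the all-zero-string state
    return (1 + r**N) / 2 * float(-np.sum(q * np.log(q)))

for N, v in enumerate(values, start=1):
    # average entropy, closed form, and the concavity bound S(E^N(sigma))
    print(N, v, closed_form(N), entropy(transfer_power(kraus, sigma, N)))
```

The observed decay behaves like $N r^N$, which is compatible with the $c_1\alpha^N + c_2 N\alpha^N$ form of (65) since $r^N = \alpha^{2N}$ for $\alpha = \sqrt{1-g}$.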

Here, we construct a simple example of a two-element set $\{A_0,A_1\}$ that has a non-trivial correctable subspace (in the sense of Section 6.2) for small enough $N$, but where the purity condition nevertheless holds. For the Hilbert space $\mathcal{H}$, we let $\dim\mathcal{H} = D+1$ and let $|0\rangle,\ldots,|D-1\rangle,|D\rangle$ be an orthonormal basis of $\mathcal{H}$. We define the projector $P := \sum_{k=0}^{D-1}|k\rangle\langle k|$ and the operators

$$A_0 := \sum_{k=0}^{D-1}|k+1\rangle\langle k|, \quad\text{and}\quad A_1 := |D\rangle\langle D|. \tag{66}$$

We note that $A_0$ is a Jordan block with zeros on the diagonal, and that

$$A_0^\dagger A_0 = \sum_{k=0}^{D-1}|k\rangle\langle k| = P, \tag{67}$$

while $A_1^\dagger A_1 = A_1 = |D\rangle\langle D|$. Thus, $A_0^\dagger A_0 + A_1^\dagger A_1 = \mathbb{1}$. Moreover, observe that

$$PA_0^\dagger A_0P = \lambda_0P,\quad \lambda_0 = 1, \qquad PA_1^\dagger A_1P = \lambda_1P,\quad \lambda_1 = 0. \tag{68}$$

Hence, for a single site, the correctable subspace is $\mathcal{C} := \mathrm{span}\{|0\rangle,\ldots,|D-1\rangle\}$.

Now consider the case of several sites, where we construct the sequence $A_{x_N}\cdots A_{x_1}$. It is not difficult to see that

$$A_0^N = \sum_{k=0}^{D-N}|k+N\rangle\langle k|. \tag{69}$$

Hence, the dimension of the correctable subspace decreases by one with each step along the sequence, until it is exhausted. As a consequence, the CMI in this example decays exponentially, but with a pre-factor that grows exponentially in $D$. Examples that do not have a block-diagonal structure can also be constructed. We note that the example above is similar in spirit to a bosonic annihilation operator. Indeed, in the infinite system case where the Kraus operators are bosonic creation and annihilation operators, the purity condition no longer makes sense, and the theory breaks down.

We have shown that the amplitudes of an injective MPS in a specific local basis follow a quasi-local Gibbs distribution with exponentially decaying tails if a condition on the Kraus operators of the MPS, called the 'purity condition', is satisfied.
The purity condition reflects the fact that, on average, no information can be preserved in the virtual subspace upon measurements. Our proof makes extensive use of the theory of random matrix products.

A number of open questions remain. Perhaps the most obvious is whether the methods used in this paper can be applied in higher dimensions or in the context of matrix product operators, and whether this leads to new insights or algorithmic improvements. In the setting of matrix product operators, the purity condition would no longer be sufficient to prevent information transmission along the chain; there, one would likely have to bound the stochastic process upon measurements from above and below. Some recent progress in this direction has been communicated to us [35]. Another place where the present tools might be applied is the rigorous analysis of the Wave Function Monte Carlo algorithm. A first attempt in this direction has been made in Ref. [21], yet some work remains to be done in connecting these mathematical results to more realistic physical settings and particular examples. Yet another extension would be to continuous MPS [36].

On a more technical level, it would be valuable to get a better handle on the decay rate of the stochastic process, in particular on whether there exists a closed-form expression for it, as is the case for the correlation length (given by the spectral gap of the transfer operator).

Acknowledgements

We thank T. Benoist for clarifying some details in Ref. [19]. We thank David Gross for helpful discussions. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy - Cluster of Excellence Matter and Light for Quantum Computing (ML4Q) EXC 2004/1-390534769. This work was completed while MJK was at the University of Cologne.

References

[1] M. B. Hastings, Solving Gapped Hamiltonians Locally, Phys. Rev. B, 085115 (2006).
[2] S. White, Density matrix formulation for quantum renormalization groups, Phys. Rev. Lett., 2863 (1992).
[3] U. Schollwoeck, The density-matrix renormalization group, Rev. Mod. Phys., 259 (2005).
[4] U. Schollwoeck, The density-matrix renormalization group in the age of matrix product states, Annals of Physics, 96 (2011).
[5] J. C. Xavier, J. A. Hoyos, E. Miranda, Adaptive Density Matrix Renormalization Group for Disordered Systems, Phys. Rev. B, 195115 (2018).
[6] F. Verstraete, J. J. Garcia-Ripoll, and J. I. Cirac, Matrix product density operators: Simulation of finite-temperature and dissipative systems, Phys. Rev. Lett., 207204 (2004).
[7] S. Paeckel, T. Köhler, A. Swoboda, S. R. Manmana, U. Schollwöck, C. Hubig, Time-evolution methods for matrix-product states, Annals of Physics, 167998 (2019).
[8] J. Almeida, M. A. Martin-Delgado, and G. Sierra, DMRG applied to critical systems: spin chains, AIP Conference Proceedings, 261 (2007).
[9] S. Bravyi, D. Gosset, Polynomial-time classical simulation of quantum ferromagnets, Phys. Rev. Lett., 100503 (2017).
[10] M. Jerrum and A. Sinclair, Polynomial-time approximation algorithm for the Ising model, SIAM Journal on Computing, 1087 (1993).
[11] F. Martinelli, E. Olivieri, Approach to equilibrium of Glauber dynamics in the one phase region. I. The attractive case, Comm. Math. Phys., 3, 447–486 (1994).
[12] G. Carleo, M. Troyer, Solving the Quantum Many-Body Problem with Artificial Neural Networks, Science, 602 (2017).
[13] D. Perez-Garcia, F. Verstraete, M. M. Wolf, J. I. Cirac, Matrix Product State Representations, Quantum Inf. Comput., 401 (2007).
[14] O. K. Kozlov, Gibbs Description of a System of Random Variables, Probl. Peredachi Inf. :3 (1974), 94–103; Problems Inform. Transmission :3 (1974), 258–265.
[15] W. Brown, D. Poulin, Quantum Markov Networks and Commuting Hamiltonians, arXiv:1206.0755 (2012).
[16] K. Kato, F. G. S. L. Brandao, Quantum Approximate Markov Chains are Thermal, Commun. Math. Phys., 117 (2019).
[17] K. Kato, F. G. S. L. Brandao, Locality of Edge States and Entanglement Spectrum from Strong Subadditivity, Phys. Rev. B, 195124 (2019).
[18] M. J. Kastoryano, A. Lucia, D. Perez-Garcia, Locality at the boundary implies gap in the bulk for 2D PEPS, Comm. Math. Phys. (2019) 366: 895.
[19] T. Benoist, M. Fraas, Y. Pautrat, and C. Pellegrini, Invariant measure for quantum trajectories, Probability Theory and Related Fields, 307–334 (2019).
[20] H. Maassen and B. Kümmerer, Purification of quantum trajectories, Lecture Notes-Monograph Series, 252–261 (2006).
[21] T. Benoist, M. Fraas, Y. Pautrat, C. Pellegrini, Invariant measure for stochastic Schrödinger equations, Ann. Henri Poincaré.
[22] M. Popp, F. Verstraete, J. I. Cirac, Entanglement versus Correlations in Spin Systems, Phys. Rev. Lett., 027901 (2004).
[23] M. Popp, F. Verstraete, M. A. Martin-Delgado, J. I. Cirac, Localizable Entanglement, Phys. Rev. A, 042306 (2005).
[24] T. B. Wahl, D. Perez-Garcia, J. I. Cirac, Matrix Product States with long-range Localizable Entanglement, Phys. Rev. A, 062314 (2012).
[25] E. Knill, R. Laflamme, A Theory of Quantum Error-Correcting Codes, Phys. Rev. Lett., 2525–2528 (2000).
[26] D. V. Else, I. Schwarz, S. D. Bartlett, A. C. Doherty, Symmetry-protected phases for measurement-based quantum computation, Phys. Rev. Lett., 240505 (2012).
[27] R. Movassagh and Jeffrey Schenker, An ergodic theorem for homogeneously distributed quantum channels with applications to matrix product states, arXiv:1909.11769 (2019).
[28] R. Movassagh and Jeffrey Schenker, Theory of Ergodic Quantum Processes, arXiv:2004.14397 (2020).
[29] M. Sanz, D. Pérez-García, M. M. Wolf, J. I. Cirac, A quantum version of Wielandt's inequality, IEEE Transactions on Information Theory, 4668 (2010).
[30] M. B. Hastings, How Quantum Are Non-Negative Wavefunctions?, J. Math. Phys., 015210 (2016).
[31] P. Bougerol, J. Lacroix, Products of Random Matrices with Applications to Schrödinger Operators, Birkhäuser, Boston – Basel – Stuttgart (1985).
[32] M. Fannes, B. Nachtergaele, R. F. Werner, Finitely correlated states on quantum spin chains, Commun. Math. Phys., 443 (1992).
[33] M. Fannes, A Continuity Property of the Entropy Density for Spin Lattice Systems, Commun. Math. Phys., 291 (1973).
[34] K. M. R. Audenaert, A Sharp Fannes-type Inequality for the von Neumann Entropy, J. Phys. A, 8127–8136 (2007).
[35] C.-F. Chen, K. Kato, F. G. S. L. Brandao, When do matrix-product-density-operators have a local parent Hamiltonian?, private communication.
[36] F. Verstraete, J. I. Cirac, Continuous Matrix Product States for Quantum Fields, Phys. Rev. Lett., 190405 (2010).
[37] A. Gut, Probability: A Graduate Course, Springer Texts in Statistics (Springer, New York, 2005).
[38] R. A. Horn and C. R. Johnson, Matrix Analysis.
[39] Über die Verteilung der Wurzeln bei gewissen algebraischen Gleichungen mit ganzzahligen Koeffizienten, Math. Zeit., 228 (1923).
[40] J. M. Steele, Probability Theory and Combinatorial Optimization, CBMS-NSF Regional Conference Series in Applied Mathematics 69 (SIAM, 1996).

A Elements of the proof of Theorem 4

In order to show Theorem 4 in Section 5, we use two key bounds, one on the average von Neumann entropy and a second on the average purity. Here, we state these bounds in the form of two lemmas. In the following we let $\|Q\| := \sup_{\|\psi\|=1}\|Q|\psi\rangle\|$ denote the standard operator norm.

Lemma 6.

Let $\{\rho_x\}_{x=0}^{d-1}$ be a collection of density operators on a Hilbert space $\mathcal{H}$, with $D = \dim\mathcal{H}$, and let $\{p(x)\}_{x=0}^{d-1}$ be real numbers such that $p(x)\geq 0$ and $\sum_{x=0}^{d-1}p(x) = 1$. Then,
$$\sum_{x=0}^{d-1}p(x)\,S(\rho_x)\leq -Q\log Q + Q\,[\log(D-1)+1], \tag{70}$$
where we refer to $Q$ as the average purity and define it as
$$Q := 1 - \sum_{x=0}^{d-1}p(x)\,\|\rho_x\|. \tag{71}$$

Proof.

To begin with, let us first consider a single density operator, $\rho_x$, and define the channel $\Gamma(\rho_x) := |\phi\rangle\langle\phi|\rho_x|\phi\rangle\langle\phi| + \Phi^{\perp}\rho_x\Phi^{\perp}$, where $|\phi\rangle$ is a pure state in $\mathcal{H}$, and $\Phi^{\perp} := \mathbb{1} - |\phi\rangle\langle\phi|$. The channel $\Gamma$ is mixing-enhancing, i.e., $S(\rho_x)\leq S[\Gamma(\rho_x)]$. Moreover, $\Gamma$ transforms any input state into a block-diagonal state, which implies that for any function $f$ and any input state $\rho$, it holds that $f[\Gamma(\rho)] = f(|\phi\rangle\langle\phi|\rho|\phi\rangle\langle\phi|) + f(\Phi^{\perp}\rho\Phi^{\perp})$. Using these two properties of $\Gamma$, we obtain
$$\begin{aligned}
S(\rho_x) &\leq H_B[q(x)] + q(x)\,S\!\left[\frac{\Phi^{\perp}\rho_x\Phi^{\perp}}{\mathrm{Tr}(\Phi^{\perp}\rho_x)}\right]\\
&\leq H_B[q(x)] + q(x)\log(\dim\mathcal{H}-1),
\end{aligned}$$
where we have defined $q(x) := \mathrm{Tr}(\Phi^{\perp}\rho_x) = 1 - \langle\phi|\rho_x|\phi\rangle$ for $x = 0,\ldots,d-1$, and recall that $H_B(t) = -t\log t - (1-t)\log(1-t)$ is the binary entropy. Note that we can choose $|\phi\rangle$ to be the normalized eigenvector corresponding to the largest eigenvalue of $\rho_x$, which we denote as $\lambda_1^{\downarrow}(\rho_x)$. Then, we have $q(x) = 1 - \lambda_1^{\downarrow}(\rho_x) = 1 - \|\rho_x\|$.

Considering now the whole set of density operators, $\{\rho_x\}_{x=0}^{d-1}$, we have by the concavity of the binary entropy that
$$\sum_{x=0}^{d-1}p(x)\,S(\rho_x)\leq H_B(Q) + Q\log(\dim\mathcal{H}-1), \tag{72}$$
where $Q = \sum_{x}p(x)\,q(x)$ is defined in Eq. (71). One can next bound the binary entropy as $H_B(t)\leq t - t\log t$ on $0\leq t\leq 1$. By combining this observation with (72), we obtain (70).

The average purity, $Q$, defined in Eq. (71) can be bounded if one considers some structure on the density operators and the probabilities. In particular, taking $\rho_x = \Psi_C(x_B)$ and $p(x) = p_{\Psi}(x_B)$ (see Eq. (1) and Eq. (8)), the average purity is $Q = 1 - K^{-1}\sum_{\mathbf{x}=0}^{d-1}\|F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\|$, where we recall that $\sigma = \mathcal{E}^{|A|}(L)$ and $F^{\dagger}F = \mathcal{E}^{*|C|}(R)$. An upper and a lower bound on $Q$ are stated and shown in the following lemma.

Lemma 7.
Let $\{A_x\}_{x=0}^{d-1}$ be a collection of operators on a Hilbert space, $\mathcal{H}$, with $D := \dim\mathcal{H} < +\infty$, such that $\sum_{x=0}^{d-1}A_x^{\dagger}A_x = \mathbb{1}$. Let $\sigma$ be a density operator on $\mathcal{H}$, and $F$ an operator on $\mathcal{H}$, such that $F^{\dagger}F\leq\mathbb{1}$. Then,
$$\frac{1}{K}\sum_{\mathbf{x}=0}^{d-1}\lambda_2^{\downarrow}\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big)\leq Q\leq\frac{D-1}{K}\sum_{\mathbf{x}=0}^{d-1}\lambda_2^{\downarrow}\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big), \tag{73}$$
where $\mathbf{x} := x_1,\ldots,x_N$; $K$ is a normalization constant such that
$$K := \sum_{\mathbf{x}=0}^{d-1}\mathrm{Tr}\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big); \tag{74}$$
and $Q$ is
$$Q = 1 - \frac{1}{K}\sum_{\mathbf{x}=0}^{d-1}\big\|F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big\|. \tag{75}$$

Proof.

Consider a positive semi-definite operator, $\rho\geq 0$, and define a function, $L$, on $\rho$ such that $L(\rho) := \sum_{j=2}^{D}\lambda_j^{\downarrow}(\rho)$. This function $L(\rho)$ can be upper and lower bounded as
$$\lambda_2^{\downarrow}(\rho)\leq L(\rho)\leq(D-1)\,\lambda_2^{\downarrow}(\rho). \tag{76}$$
Moreover, it holds that
$$\lambda_1^{\downarrow}(\rho) + L(\rho) = \mathrm{Tr}(\rho). \tag{77}$$
If we introduce in Eq. (77) the positive operator $\rho := K^{-1}F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}$ and we sum over all possible values of $\mathbf{x} := x_1,\ldots,x_N$, we obtain
$$\begin{aligned}
&\frac{1}{K}\sum_{\mathbf{x}=0}^{d-1}\lambda_1^{\downarrow}\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big) + \frac{1}{K}\sum_{\mathbf{x}=0}^{d-1}L\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big)\\
&\quad= \frac{1}{K}\sum_{\mathbf{x}=0}^{d-1}\mathrm{Tr}\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big) = 1,
\end{aligned}$$
where the last equality holds due to the definition of $K$ in Eq. (74). This implies that
$$\begin{aligned}
\frac{1}{K}\sum_{\mathbf{x}=0}^{d-1}L\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big) &= 1 - \frac{1}{K}\sum_{\mathbf{x}=0}^{d-1}\lambda_1^{\downarrow}\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big)\\
&= 1 - \frac{1}{K}\sum_{\mathbf{x}=0}^{d-1}\big\|F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big\|\\
&= Q.
\end{aligned}$$
Using the bounds of Eq. (76), we finish the proof, since we obtain the bounds on $Q$ in Eq. (73).

Recall that throughout this investigation we assume that $\log$ denotes the natural logarithm. We state the following two lemmas without proof.

Lemma 8.

Let
$$H_B(t) := -t\log t - (1-t)\log(1-t),\quad 0<t<1, \tag{78}$$
and $H_B(0) := 0$ and $H_B(1) := 0$. Let
$$g(t) := t - t\log t,\quad 0<t\leq 1, \tag{79}$$
and $g(0) := 0$. Then, $g$ is monotonically increasing on $[0,1]$, and
$$H_B(t)\leq g(t),\quad 0\leq t\leq 1. \tag{80}$$

Lemma 9.
$$-t\log t\leq\frac{1}{\epsilon}\,t^{1-\epsilon},\quad 0\leq t\leq 1,\quad 0<\epsilon<1. \tag{81}$$

We recall that if the channel $\mathcal{E}$ is primitive, then $\mathcal{E}$ has a unique full-rank fixed point $\rho$ [29]. With the replacement map $\mathcal{R}(\sigma) := \rho\,\mathrm{Tr}(\sigma)$, the fact that every initial state $\sigma$ converges to $\rho$ can be expressed as $\lim_{N\to\infty}\mathcal{E}^N = \mathcal{R}$. Since the underlying Hilbert space is finite-dimensional, we can express the convergence in terms of any norm. It is convenient to express the convergence in terms of the norm
$$\|\mathcal{F}\| := \sup_{\|Q\|_1=1}\|\mathcal{F}(Q)\|_1, \tag{82}$$
and thus $\lim_{N\to\infty}\|\mathcal{E}^N - \mathcal{R}\| = 0$, where $\|Q\|_1 := \mathrm{Tr}\sqrt{Q^{\dagger}Q}$ is the trace norm.

Lemma 10.

Let $\{A_x\}_{x=0}^{d-1}$ be operators on a Hilbert space $\mathcal{H}$, with $D := \dim\mathcal{H} < +\infty$, such that $\sum_{x=0}^{d-1}A_x^{\dagger}A_x = \mathbb{1}$, and $\mathcal{E}(\cdot) := \sum_{x=0}^{d-1}A_x\cdot A_x^{\dagger}$ is primitive. Then, there exist a real number $r>0$ and a natural number $N_0$ such that
$$\langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle\geq r,\quad\forall\,|A|,|C|,\quad\forall\,\|R\| = 1,\ \|L\| = 1,\quad\forall\,|B|\geq N_0. \tag{83}$$

Proof.

For the map $\mathcal{R}(\sigma) := \rho\,\mathrm{Tr}(\sigma)$, with $\rho$ the unique fixed point of $\mathcal{E}$, we first observe that
$$\begin{aligned}
\Big|\langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle - \langle R|\rho|R\rangle\Big| &= \Big|\mathrm{Tr}\Big(|R\rangle\langle R|\big(\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|) - \rho\big)\Big)\Big|\\
&\leq \big\||R\rangle\langle R|\big\|\,\big\|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|) - \rho\big\|_1\\
&= \Big\|\mathcal{E}^{|B|}\big(\mathcal{E}^{|A|+|C|}(|L\rangle\langle L|)\big) - \mathcal{R}\big(\mathcal{E}^{|A|+|C|}(|L\rangle\langle L|)\big)\Big\|_1\\
&\leq \big\|\mathcal{E}^{|B|} - \mathcal{R}\big\|\,\big\|\mathcal{E}^{|A|+|C|}(|L\rangle\langle L|)\big\|_1\\
&= \big\|\mathcal{E}^{|B|} - \mathcal{R}\big\|.
\end{aligned}\tag{84}$$
Here, the first inequality is Hölder's inequality, the subsequent equality uses $\||R\rangle\langle R|\| = 1$ and $\mathcal{R}(\tau) = \rho$ for every density operator $\tau$, and the last equality uses that $\mathcal{E}$ is trace preserving. Since $\mathcal{E}$ is assumed to be primitive, it follows that $\mathcal{E}$ has a unique full-rank fixed point $\rho$. Since $\mathcal{H}$ is assumed to be finite-dimensional, it follows that the minimal eigenvalue of $\rho$ satisfies $\lambda_{\min}(\rho)>0$. By (84), it follows that
$$\begin{aligned}
\lambda_{\min}(\rho) - \big\|\mathcal{E}^{|B|} - \mathcal{R}\big\| &\leq \langle R|\rho|R\rangle - \big\|\mathcal{E}^{|B|} - \mathcal{R}\big\|\\
&\leq \langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle.
\end{aligned}\tag{85}$$
Since $\lim_{|B|\to\infty}\|\mathcal{E}^{|B|} - \mathcal{R}\| = 0$ and $\lambda_{\min}(\rho)>0$, it follows that there exist a number $r$ with $\lambda_{\min}(\rho)>r>0$ and a natural number $N_0$ such that
$$\langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle\geq r,\quad\forall\,|B|\geq N_0. \tag{86}$$
One should note that $r$ and $N_0$ are independent of $|A|$, $|C|$, and all normalized $|R\rangle$ and $|L\rangle$.

Theorem 4 in the main text follows as a direct corollary of Theorem 11 with $\kappa := \gamma^{1-\epsilon}$ and $c := c_\epsilon$ for any fixed $0<\epsilon<1$. In essence, we use the bound $f(N)\leq c\gamma^N$ in Proposition 5 in order to prove the bound in Theorem 11, and thus the same $\gamma$ appears in both bounds. Loosely speaking, the transition from $\gamma$ to $\gamma^{1-\epsilon}$ is due to a leading-order term proportional to $|B|\gamma^{|B|}$. This term appears in a bound on the CMI and can be accommodated by an arbitrarily small sacrifice of the rate in the exponential decay. However, since we are here not only interested in the asymptotics, but rather wish to achieve a general bound valid for all values of $|B|$, the construction in the proof becomes more elaborate.

Theorem 11.

For a set of operators $\{A_x\}_{x=0}^{d-1}$ on a Hilbert space $\mathcal{H}$ with $D := \dim\mathcal{H}\geq 2$, and normalized $|R\rangle,|L\rangle\in\mathcal{H}$, let $\Psi$ be the MPS as defined in (1) on a region $\Lambda = ABC$. The set $\{A_x\}_{x=0}^{d-1}$ is such that $\sum_{x=0}^{d-1}A_x^{\dagger}A_x = \mathbb{1}$, satisfies the purity condition in Definition 3, and is such that $\mathcal{E}(\cdot) := \sum_{x=0}^{d-1}A_x\cdot A_x^{\dagger}$ is primitive. For the constant $\gamma$ as guaranteed by Proposition 5, and for every $0<\epsilon<1$, there exists a constant $c_\epsilon\geq 0$ such that
$$I_{p_\psi,B}(A:C|B)\leq c_\epsilon\,\gamma^{|B|(1-\epsilon)},\quad |B| = 1,2,\ldots. \tag{87}$$
The constant $\gamma$ is independent of $|A|$, $|B|$, $|C|$, $|L\rangle$, $|R\rangle$ and $\epsilon$. The constant $c_\epsilon$ is independent of $|A|$, $|B|$, $|C|$, $|L\rangle$ and $|R\rangle$, but may depend on $\epsilon$.

Proof. We first note that
$$\begin{aligned}
I_{p_\psi,B}(A:C|B) &\leq I_{\psi(x_B)}(A:C|B)\\
&= \langle S[\psi_A(x_B)]\rangle_{p_\psi(x_B)} + \langle S[\psi_C(x_B)]\rangle_{p_\psi(x_B)}\\
&= 2\,\langle S[\psi_C(x_B)]\rangle_{p_\psi(x_B)},
\end{aligned}\tag{88}$$
where the state $\psi_X(x_B)$ is the reduced state in region $X$ of the post-measurement state, $\psi(x_B)$, and $\langle S[\psi(x)]\rangle_{p_\psi(x)}$ is the average von Neumann entropy
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} = \sum_{x_B}p_\Psi(x_B)\,S(\rho_{x_B}), \tag{89}$$
with $p_\Psi(x_B)$ as in (8), and
$$\rho_{x_B} := \frac{1}{p_\Psi(x_B)\,K}\,F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}.$$
The inequality (88) follows from the fact that (up to zero eigenvalues) $\Psi_C(x_B)$ is isospectral to $\rho_{x_B}$, as discussed in Section 3. By Lemma 6, we know that
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)}\leq H_B(Q) + Q\log(D-1), \tag{90}$$
with
$$Q := 1 - \sum_{x_B}p_\Psi(x_B)\,\|\rho_{x_B}\|. \tag{91}$$
By Lemma 8 we know that the function $g(t) = t - t\log t$ is monotonically increasing on the interval $[0,1]$ and satisfies $H_B(t)\leq g(t)$. By combining this observation with (90), we get
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)}\leq g(Q) + Q\log(D-1) = -Q\log Q + Q + Q\log(D-1). \tag{92}$$
By Lemma 7 we furthermore know that
$$Q\leq\frac{D-1}{K}\sum_{x_B}\lambda_2^{\downarrow}\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big), \tag{93}$$
where $\lambda_j^{\downarrow}(O)$ are the eigenvalues of an operator $O$ in non-increasing order, i.e., $\lambda_1^{\downarrow}(O)\geq\cdots\geq\lambda_D^{\downarrow}(O)$. Similarly, we let in the following $\nu_j^{\downarrow}(O)$ denote the singular values of $O$ in non-increasing order, $\nu_1^{\downarrow}(O)\geq\cdots\geq\nu_D^{\downarrow}(O)$. Then, recalling that for any operator $O$ we have $\lambda_j^{\downarrow}(OO^{\dagger}) = \nu_j^{\downarrow}(O)^2$, we get
$$\begin{aligned}
Q &\leq \frac{D-1}{K}\sum_{x_B}\lambda_2^{\downarrow}\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big)\\
&\leq \frac{D-1}{K}\sum_{x_B}\sqrt{\lambda_1^{\downarrow}\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big)\,\lambda_2^{\downarrow}\big(F A_{x_N}\cdots A_{x_1}\sigma A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}F^{\dagger}\big)}\\
&= \frac{D-1}{K}\sum_{x_B}\nu_1^{\downarrow}\big(F A_{x_N}\cdots A_{x_1}\sqrt{\sigma}\big)\,\nu_2^{\downarrow}\big(F A_{x_N}\cdots A_{x_1}\sqrt{\sigma}\big)\\
&= \frac{D-1}{K}\,f(N)\\
&\leq \frac{D-1}{K}\,c\gamma^N &&\text{[By Proposition 5]}\\
&= \frac{D-1}{\langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle}\,c\gamma^{|B|},
\end{aligned}\tag{94}$$
where the last equality uses $K = \mathrm{Tr}[F^{\dagger}F\,\mathcal{E}^{|B|}(\sigma)] = \langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle$. Here we recall that $N = |B|$ and that the constants $c$ and $0<\gamma<1$ of Proposition 5 are independent of $\sigma := \mathcal{E}^{|A|}(L)$ and $F := \sqrt{\mathcal{E}^{*|C|}(R)}$, and consequently are independent of $|A|$ and $|C|$ (as well as of $|B|$). By Lemma 10, there exist constants $r>0$ and $N_0$ such that $\langle R|\mathcal{E}^{|A|+|B|+|C|}(|L\rangle\langle L|)|R\rangle\geq r$ for all $|B|\geq N_0$, where $r$ and $N_0$ do not depend on $|A|$, $|B|$, $|C|$, $|R\rangle$, $|L\rangle$. By combining this observation with (94), we can conclude that
$$Q\leq\tilde{c}\,\gamma^{|B|},\quad\text{with}\quad\tilde{c} := \frac{D-1}{r}\,c,\quad\forall\,|B|\geq N_0, \tag{95}$$
where we note that $\tilde{c}$ and $N_0$ do not depend on $|A|$, $|B|$, $|C|$, $|R\rangle$, $|L\rangle$.
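As an aside to the chain (94), the following Python sketch (assuming NumPy; the helper names `random_kraus` and `f_bruteforce` are our own and not from this paper) evaluates $f(N) = \sum_{x_B}\nu_1^{\downarrow}\nu_2^{\downarrow}(F A_{x_N}\cdots A_{x_1}\sqrt{\sigma})$ by brute force for a randomly drawn Kraus set with $d = D = 2$, $F = \mathbb{1}$ and $\sigma = \mathbb{1}/2$. In this special case $\nu_1^{\downarrow}\nu_2^{\downarrow}(O) = |\det O|$, so the sum factorizes into $f(N) = \tfrac{1}{2}s^N$ with $s := \sum_x|\det A_x|\leq\tfrac{1}{2}\sum_x\mathrm{Tr}(A_x^{\dagger}A_x) = 1$, where equality requires every $A_x$ to be proportional to a unitary. It illustrates exponential decay in one toy case and is not a substitute for Proposition 5.

```python
import itertools

import numpy as np


def random_kraus(d, D, seed=0):
    """Hypothetical helper: random {A_x} with sum_x A_x^dagger A_x = 1.

    The d blocks of a random isometry V (shape (d*D, D), V^dagger V = 1)
    are used as Kraus operators.
    """
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D))
    V, _ = np.linalg.qr(G)  # reduced QR: V has orthonormal columns
    return [V[x * D:(x + 1) * D, :] for x in range(d)]


def f_bruteforce(kraus, N, F, sqrt_sigma):
    """f(N): sum over (x_1, ..., x_N) of nu_1 * nu_2 of F A_{x_N}...A_{x_1} sqrt(sigma)."""
    D = kraus[0].shape[0]
    total = 0.0
    for xs in itertools.product(range(len(kraus)), repeat=N):
        W = np.eye(D, dtype=complex)
        for x in xs:  # left-multiply, so W = A_{x_N} ... A_{x_1}
            W = kraus[x] @ W
        nu = np.linalg.svd(F @ W @ sqrt_sigma, compute_uv=False)  # descending order
        total += nu[0] * nu[1]
    return total


d, D = 2, 2
kraus = random_kraus(d, D)
assert np.allclose(sum(A.conj().T @ A for A in kraus), np.eye(D))

F = np.eye(D)                          # satisfies F^dagger F <= 1
sqrt_sigma = np.eye(D) / np.sqrt(D)    # sigma = 1/2
s = sum(abs(np.linalg.det(A)) for A in kraus)

for N in range(1, 9):
    print(N, f_bruteforce(kraus, N, F, sqrt_sigma), 0.5 * s**N)
```

The two printed columns (brute force and closed form) should agree to floating-point precision. For $D>2$ the product $\nu_1^{\downarrow}\nu_2^{\downarrow}$ is only submultiplicative along the word, so no closed form of this kind is available.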
By inspection of the definition of $Q$ in (91), one can see that
$$0\leq Q\leq 1. \tag{96}$$
By combining (95) and (96), we obtain
$$Q\leq t,\quad\forall\,|B|\geq N_0,\quad\text{with}\quad t := \min\big[1,\ \tilde{c}\gamma^{|B|}\big], \tag{97}$$
where $t$ by necessity is contained in the interval $[0,1]$. We can thus use the monotonicity of $g$ to obtain
$$\begin{aligned}
\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)} &\leq g(Q) + Q\log(D-1)\\
&\leq g(t) + t\log(D-1) &&\text{[Monotonicity of } g\text{, Lemma 8, together with (97)]}\\
&= -t\log t + t\,[1+\log(D-1)]\\
&\leq \frac{1}{\epsilon}\,t^{1-\epsilon} + t\,[1+\log(D-1)] &&\text{[By Lemma 9]}\\
&\leq \Big[\frac{1}{\epsilon} + 1 + \log(D-1)\Big]\,t^{1-\epsilon}. &&\text{[By } t\leq t^{1-\epsilon},\ 0\leq t\leq 1,\ 0<\epsilon<1\text{]}
\end{aligned}\tag{98}$$
Since $t = \min[1,\tilde{c}\gamma^{|B|}]\leq\tilde{c}\gamma^{|B|}$, we get $t^{1-\epsilon}\leq\tilde{c}^{1-\epsilon}\gamma^{|B|(1-\epsilon)}$, and thus
$$\langle S[\Psi_C(x_B)]\rangle_{p_\Psi(x_B)}\leq\Big[\frac{1}{\epsilon} + 1 + \log(D-1)\Big]\,\tilde{c}^{1-\epsilon}\gamma^{|B|(1-\epsilon)}. \tag{99}$$
By combining this with (88), we get
$$I_{p_\psi,B}(A:C|B)\leq\tilde{c}_\epsilon\,\gamma^{|B|(1-\epsilon)},\quad\forall\,|B|\geq N_0,\qquad\tilde{c}_\epsilon := 2\Big[\frac{1}{\epsilon} + 1 + \log(D-1)\Big]\,\tilde{c}^{1-\epsilon}. \tag{100}$$
Finally, we should remove the restriction that $|B|\geq N_0$. By (88) and (89) we can conclude that $I_{p_\psi,B}(A:C|B)\leq 2\sum_{x_B}p_\Psi(x_B)\,S(\rho_{x_B})\leq 2\log D$, where the last inequality follows since $\rho_{x_B}$ is a density operator on $\mathcal{H}$, which has dimension $D$. Let
$$c_\epsilon := \max\Big(\tilde{c}_\epsilon,\ 2\log(D)\,\gamma^{-(N_0-1)(1-\epsilon)}\Big). \tag{101}$$
Since $\gamma<1$, we have $\gamma^{|B|(1-\epsilon)}\geq\gamma^{(N_0-1)(1-\epsilon)}$ whenever $|B|\leq N_0-1$, and one can thus confirm that this choice guarantees
$$I_{p_\psi,B}(A:C|B)\leq c_\epsilon\,\gamma^{|B|(1-\epsilon)} \tag{102}$$
for all $|B| = 1,2,\ldots$. The resulting constant $c_\epsilon$ is independent of $|A|$, $|B|$, $|C|$, $|L\rangle$ and $|R\rangle$.

By combining Lemma 2 with Theorem 11, and defining $\kappa := \gamma^{1-\epsilon}$ and $c := c_\epsilon$ for some arbitrary but fixed $0<\epsilon<1$, we get the following.
1, we get the following.

Corollary 12.

For a set of operators $\{A_x\}_{x=0}^{d-1}$ on a Hilbert space $\mathcal{H}$ with $D := \dim\mathcal{H}\geq 2$, and normalized $|R\rangle,|L\rangle\in\mathcal{H}$, let $\Psi$ be the MPS as defined in (1) on a region $\Lambda = ABC$. The set $\{A_x\}_{x=0}^{d-1}$ is such that $\sum_{x=0}^{d-1}A_x^{\dagger}A_x = \mathbb{1}$, satisfies the purity condition in Definition 3, and is such that $\mathcal{E}(\cdot) := \sum_{x=0}^{d-1}A_x\cdot A_x^{\dagger}$ is primitive. Let $p_{1,\ldots,|\Lambda|}(x_1,\ldots,x_{|\Lambda|}) = \langle x_\Lambda|\Psi|x_\Lambda\rangle$ be the classical restriction of $\Psi$, and assume that this restriction is such that $p_{1,\ldots,|\Lambda|}(x_1,\ldots,x_{|\Lambda|})>0$ for all $x_1,\ldots,x_{|\Lambda|}$. Let $p^{\ell}_{1,\ldots,|\Lambda|}$ be as defined in (20)–(23). Then, there exist constants $0\leq c$ and $0<\kappa<1$ such that
$$S\big(p_{1,\ldots,|\Lambda|}\,\big\|\,p^{\ell}_{1,\ldots,|\Lambda|}\big)\leq c\,|\Lambda|\,\kappa^{\ell},\quad 1\leq\ell\leq|\Lambda|-1. \tag{103}$$

B Notions from probability theory

As mentioned in the main text, and in the proof overview, the proof of Proposition 5 relies on various probabilistic concepts. Here, we briefly review the pertinent notions, and also collect the technical results that we will need at various points along the proof.

Throughout these derivations we will use bold letters, such as $\mathbf{x}$, $\mathbf{Y}$, etc., to denote random variables and random operators (where 'random variables' by default are real-valued measurable functions on the underlying probability space, while 'random operators' are operator-valued measurable functions). In the following, $\mathbb{E}(\mathbf{x})$, $\mathbb{E}(\mathbf{y})$, etc., denote the expectation value, and $\mathbb{E}(\mathbf{y}|\mathbf{x})$ denotes the expectation value of $\mathbf{y}$ conditioned on $\mathbf{x}$. One should keep in mind that $\mathbb{E}(\mathbf{y}|\mathbf{x})$ is a random variable (due to $\mathbf{x}$). One should also keep in mind the general relation $\mathbb{E}\big(\mathbb{E}(\mathbf{y}|\mathbf{x})\big) = \mathbb{E}(\mathbf{y})$.

B.1 Almost surely

When we say that a relation for one, or several, random variables holds almost surely (a.s.), it means that the relation is true apart from a set of probability zero. Put differently, the relation is true with probability one. For example, $\mathbf{x} = \mathbf{y}$ a.s. means that $P(\{\omega\in\Omega : \mathbf{x}(\omega) = \mathbf{y}(\omega)\}) = 1$, where $\Omega$ denotes the underlying sample space, and $\omega$ an element of the sample space.

As examples, one can consider various notions that intuitively remain true even if 'a few' points are excluded. For example, if $\mathbf{x}$ and $\mathbf{y}$ are such that $\mathbf{x}\leq\mathbf{y}$, then (if the expectations exist) $\mathbb{E}(\mathbf{x})\leq\mathbb{E}(\mathbf{y})$. This conclusion still holds even if the inequality only holds almost everywhere (see e.g. Theorem 4.4 in chapter 2 of [37]):

Lemma 13. If $\mathbf{x}$ and $\mathbf{y}$ are non-negative random variables, then $\mathbf{x}\leq\mathbf{y}$ a.s. implies $\mathbb{E}(\mathbf{x})\leq\mathbb{E}(\mathbf{y})$.

Another statement in a similar spirit is the following. (The claim of the lemma is contained in Theorem 4.4 in chapter 2 of [37].)

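Lemma 14. If $\mathbf{x}$ is a non-negative random variable, then $\mathbf{x} = 0$ a.s. if and only if $\mathbb{E}(\mathbf{x}) = 0$.

For a random variable $\mathbf{x}$, we define the positive and negative components by $\mathbf{x}^+ := \max(\mathbf{x},0)$ and $\mathbf{x}^- := \max(-\mathbf{x},0)$, such that $\mathbf{x}^+\geq 0$, $\mathbf{x}^-\geq 0$, and $\mathbf{x} = \mathbf{x}^+ - \mathbf{x}^-$. The expectation value $\mathbb{E}(\mathbf{x})$ of a random variable $\mathbf{x}$ is defined as $\mathbb{E}(\mathbf{x}) = \mathbb{E}(\mathbf{x}^+) - \mathbb{E}(\mathbf{x}^-)$ if at least one of $\mathbb{E}(\mathbf{x}^+)$ and $\mathbb{E}(\mathbf{x}^-)$ is finite.

Lemma 15. If $\mathbf{x}$ is a random variable such that $\mathbf{x}\geq 0$ a.s. and $\mathbb{E}(\mathbf{x}) = 0$, then $\mathbf{x} = 0$ a.s.

Proof.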

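Since $\mathbf{x}\geq 0$ a.s., it follows that $\mathbf{x}^- = 0$ almost surely. Since $\mathbf{x}^-$ by construction is a non-negative random variable, it follows by Lemma 14 that $\mathbb{E}(\mathbf{x}^-) = 0$. We can conclude that $\mathbb{E}(\mathbf{x})$ is well defined, and $\mathbb{E}(\mathbf{x}) = \mathbb{E}(\mathbf{x}^+) - \mathbb{E}(\mathbf{x}^-)$, and thus $\mathbb{E}(\mathbf{x}) = 0$ implies $\mathbb{E}(\mathbf{x}^+) = \mathbb{E}(\mathbf{x}^-) = 0$. Since $\mathbf{x}^+$ by construction is non-negative, Lemma 14 implies $\mathbf{x}^+ = 0$ a.s. We can thus conclude that $\mathbf{x} = \mathbf{x}^+ - \mathbf{x}^- = 0$ a.s.

B.2 Stochastic convergence of real-valued sequences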

A sequence of random variables $(\mathbf{x}_N)_{N\in\mathbb{N}}$ is said to converge almost surely to a random variable $\mathbf{x}_\infty$ (denoted $\lim_{N\to\infty}\mathbf{x}_N = \mathbf{x}_\infty$ a.s.) if
$$P\big(\{\omega\in\Omega : \lim_{N\to\infty}\mathbf{x}_N(\omega) = \mathbf{x}_\infty(\omega)\}\big) = 1. \tag{104}$$
As mentioned above, $\omega$ is an element of the underlying sample space, $\Omega$, and $\mathbf{x}_N(\omega)$ is a specific realization of the stochastic process. (If one thinks of an infinite sequence of coin tosses, then $\mathbf{x}_N$ vaguely stands for all possible sequences of coin tosses, while $\mathbf{x}_N(\omega)$ means a specific sequence of heads and tails.) What (104) essentially says is that if we look at the set of all sequences $\mathbf{x}_N(\omega)$ and $\mathbf{x}_\infty(\omega)$ such that $\mathbf{x}_N(\omega)$ actually does converge to $\mathbf{x}_\infty(\omega)$, then this set has probability 1.

In these derivations, we will often start with a process that converges, $\lim_{N\to\infty}\mathbf{x}_N = \mathbf{x}_\infty$ almost surely, but we want to show that $\lim_{N\to\infty}\mathbb{E}(\mathbf{x}_N) = \mathbb{E}(\mathbf{x}_\infty)$. This is not generally true, but as a consequence of Lebesgue's dominated convergence theorem (see e.g. Theorem 5.3 in chapter 2 of [37]) we have the following.

Proposition 16.

Suppose that $\mathbf{x}_N$, $\mathbf{x}_\infty$ and $\mathbf{y}$ are random variables such that $|\mathbf{x}_N|\leq\mathbf{y}$ for all $N$, where $\mathbb{E}(\mathbf{y})<+\infty$, and that $\mathbf{x}_N\to\mathbf{x}_\infty$ a.s. Then,
$$\lim_{N\to\infty}\mathbb{E}(\mathbf{x}_N) = \mathbb{E}(\mathbf{x}_\infty). \tag{105}$$
In the special case that $\mathbf{y}$ is equal to a constant $C$, one obtains the following (sometimes referred to as the bounded convergence theorem).

Proposition 17.

Suppose that $\mathbf{x}_N$, $\mathbf{x}_\infty$ are random variables, and that there exists a constant $C<+\infty$ such that $|\mathbf{x}_N|\leq C$ for all $N$. If $\mathbf{x}_N\to\mathbf{x}_\infty$ a.s., then
$$\lim_{N\to\infty}\mathbb{E}(\mathbf{x}_N) = \mathbb{E}(\mathbf{x}_\infty). \tag{106}$$
The following lemma is a consequence of the Borel–Cantelli lemma (and is included in Theorem 3.1 in chapter 5 of [37]).

Lemma 18.

Let $(\mathbf{x}_n)_{n\in\mathbb{N}}$ and $\mathbf{x}$ be random variables such that $\sum_{n=1}^{\infty}P(|\mathbf{x}_n - \mathbf{x}|>\epsilon)<+\infty$ for all $\epsilon>0$ (sometimes referred to as complete convergence of $(\mathbf{x}_n)_{n\in\mathbb{N}}$ to $\mathbf{x}$). Then, $(\mathbf{x}_n)_{n\in\mathbb{N}}$ converges almost surely to $\mathbf{x}$.

The above can be used to obtain the following.

Lemma 19.

Let $(\mathbf{r}_N)_{N\in\mathbb{N}}$ be a sequence of random variables such that $\mathbf{r}_N\geq 0$, and such that the expectation values $\mathbb{E}(\mathbf{r}_N)$ exist and are finite. Suppose that there exists a number $R$ such that $\lim_{k\to\infty}\mathbb{E}\big(\sum_{N=1}^{k}\mathbf{r}_N\big) = R<+\infty$. Then, $\lim_{N\to\infty}\mathbf{r}_N = 0$ a.s.

Proof.

We first note that if $(a_N)_{N\in\mathbb{N}}$ is a sequence of real numbers such that $a_N\geq 0$, and if $A_k := \sum_{N=1}^{k}a_N$ is such that $\lim_{k\to\infty}A_k = R<+\infty$, then $\lim_{N\to\infty}a_N = 0$. With $a_N := \mathbb{E}(\mathbf{r}_N)$ and $A_k := \sum_{N=1}^{k}a_N = \mathbb{E}\big(\sum_{N=1}^{k}\mathbf{r}_N\big)$, it thus follows, by the assumptions of the lemma, that $\lim_{N\to\infty}\mathbb{E}(\mathbf{r}_N) = \lim_{N\to\infty}a_N = 0$. By assumption, $\mathbf{r}_N\geq 0$ and the expectations $\mathbb{E}(\mathbf{r}_N)$ are well defined and finite. Hence, by Markov's inequality, it follows that $P(\mathbf{r}_N>\epsilon)\leq\mathbb{E}(\mathbf{r}_N)/\epsilon$ for all $\epsilon>0$. Consequently, $\sum_{N=1}^{k}P(\mathbf{r}_N>\epsilon)\leq\frac{1}{\epsilon}\,\mathbb{E}\big(\sum_{N=1}^{k}\mathbf{r}_N\big)$, and thus
$$\sum_{N=1}^{\infty}P(|\mathbf{r}_N|>\epsilon)\leq\frac{1}{\epsilon}\lim_{k\to\infty}\mathbb{E}\Big(\sum_{N=1}^{k}\mathbf{r}_N\Big) = \frac{R}{\epsilon}<+\infty,\quad\forall\,\epsilon>0. \tag{107}$$
Hence, $(\mathbf{r}_N)_{N\in\mathbb{N}}$ converges completely to 0. By Lemma 18, we can conclude that $(\mathbf{r}_N)_{N\in\mathbb{N}}$ converges almost surely to 0.

B.3 Stochastic convergence of operator-valued sequences

Convergence of various sequences of operators plays an important role in this investigation. Since we here exclusively deal with finite-dimensional spaces, one may argue that the distinction between 'random variables' and 'random operators' is not very dramatic. For the sake of clarity, we nevertheless distinguish between these two types throughout these derivations and, to further this, we use small bold letters, $\mathbf{x}$, $\mathbf{y}$, etc., to denote random variables, while capital bold letters, $\mathbf{X}$, $\mathbf{Y}$, etc., denote random operators.

Here, we briefly recall that, on finite-dimensional Hilbert spaces, all norms are equivalent. Due to this equivalence in finite dimensions (see e.g. Corollary 5.4.5 in [38]), we do not need to distinguish between different norms when we discuss convergence of sequences of operators, and we can equivalently consider the element-wise convergence of the matrix representation in some arbitrary basis. In what follows we will switch between these equivalent manifestations of convergence without further comment. For the operator norms, we will mainly be using the supremum norm, $\|O\| := \sup_{\|\psi\|=1}\|O|\psi\rangle\|$, and the trace norm, $\|O\|_1 := \mathrm{Tr}\sqrt{O^{\dagger}O}$, but also the Hilbert–Schmidt norm, $\|O\|_2 := \sqrt{\mathrm{Tr}(O^{\dagger}O)}$.

Let us now consider a sequence of random operators $(\mathbf{X}_N)_{N\in\mathbb{N}}$ and $\mathbf{X}_\infty$ on a finite-dimensional Hilbert space. We interpret the convergence $\mathbf{X}_N\to\mathbf{X}_\infty$ a.s. as
$$\lim_{N\to\infty}\|\mathbf{X}_N - \mathbf{X}_\infty\| = 0\quad\text{a.s.}, \tag{108}$$
or equivalently for any other operator norm (since the underlying Hilbert space is finite-dimensional), or as
$$\lim_{N\to\infty}\langle k|\mathbf{X}_N|k'\rangle = \langle k|\mathbf{X}_\infty|k'\rangle\quad\text{a.s.},\quad\forall\,k,k' = 1,\ldots,D. \tag{109}$$
The following is a counterpart of Proposition 17, which can be obtained by applying Proposition 17 to the real and imaginary matrix components with respect to a basis, i.e., $\mathrm{Re}\langle k|\mathbf{X}_N|k'\rangle$ and $\mathrm{Im}\langle k|\mathbf{X}_N|k'\rangle$.

Proposition 20.

Suppose that $\mathbf{X}_N$ and $\mathbf{X}_\infty$ are random operators on a complex finite-dimensional Hilbert space, and that there exists a constant $C<+\infty$ such that $\|\mathbf{X}_N\|\leq C$ for all $N$. If $\mathbf{X}_N\to\mathbf{X}_\infty$ a.s., then
$$\lim_{N\to\infty}\mathbb{E}(\mathbf{X}_N) = \mathbb{E}(\mathbf{X}_\infty). \tag{110}$$

B.4 Martingales

Our primary interest in martingales is that they allow for statements concerning the stochastic convergence of sequences of random variables. However, in order to connect to the manner in which these convergence theorems typically are phrased in the literature, we need to briefly discuss some technical concepts. (For a more thorough introduction, see, e.g., chapter 10 in [37].)

Consider a sequence of random variables $(\mathbf{y}_N)_{N\in\mathbb{N}}$ on a probability space $(\Omega,\mathcal{F},P)$, where $\Omega$ is the sample space, $\mathcal{F}$ is a $\sigma$-algebra (the event space), and $P$ a probability measure. We also consider a filtration, i.e., a non-decreasing sequence of $\sigma$-subalgebras $\mathcal{F}_1\subset\mathcal{F}_2\subset\cdots\subset\mathcal{F}$. A sequence $(\mathbf{y}_N)_{N\in\mathbb{N}}$ of random variables is said to be adapted to $(\mathcal{F}_N)_{N\in\mathbb{N}}$ if each $\mathbf{y}_N$ is measurable with respect to $\mathcal{F}_N$. A sequence $(\mathbf{y}_N)_{N\in\mathbb{N}}$ is a martingale with respect to $(\mathcal{F}_N)_{N\in\mathbb{N}}$ if $(\mathbf{y}_N)_{N\in\mathbb{N}}$ is adapted to $(\mathcal{F}_N)_{N\in\mathbb{N}}$, satisfies $\mathbb{E}(\mathbf{y}_{N+1}|\mathcal{F}_N) = \mathbf{y}_N$ a.s., as well as $\mathbb{E}(|\mathbf{y}_N|)<+\infty$. Intuitively, $\mathcal{F}_N$ stands for the information available to us at step $N$. In our setting, this information corresponds to the variables $\mathbf{x}_1,\ldots,\mathbf{x}_N$ (which are assumed to also be random variables on the same underlying probability space $(\Omega,\mathcal{F},P)$). More precisely, $\mathcal{F}_N := \sigma(\mathbf{x}_N,\ldots,\mathbf{x}_1)$, which denotes the $\sigma$-algebra generated by $\mathbf{x}_1,\ldots,\mathbf{x}_N$ and often is referred to as the natural filtration of $\mathbf{x}_1,\ldots,\mathbf{x}_N$. Since in the following we exclusively use natural filtrations, we employ the more succinct notation $\mathbb{E}(\mathbf{y}_{N+1}|\mathbf{x}_N,\ldots,\mathbf{x}_1) := \mathbb{E}(\mathbf{y}_{N+1}|\mathcal{F}_N)$, with $\mathcal{F}_N := \sigma(\mathbf{x}_1,\ldots,\mathbf{x}_N)$. We moreover say that $(\mathbf{y}_N)_{N\in\mathbb{N}}$ is a martingale with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$ if
$$\mathbb{E}(\mathbf{y}_{N+1}|\mathbf{x}_N,\ldots,\mathbf{x}_1) = \mathbf{y}_N\ \text{a.s.},\quad\mathbb{E}(|\mathbf{y}_N|)<+\infty,\quad\text{with}\quad\mathbf{y}_N = f_N(\mathbf{x}_N,\ldots,\mathbf{x}_1), \tag{111}$$
for (Borel measurable) functions $f_N$. The construction with the functions $f_N$ guarantees that $(\mathbf{y}_N)_{N\in\mathbb{N}}$ is adapted to the natural filtration of $(\mathbf{x}_N)_{N\in\mathbb{N}}$.
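To make the condition in (111) concrete for finite alphabets, the following Python sketch checks $\mathbb{E}(\mathbf{y}_{N+1}|\mathbf{x}_N,\ldots,\mathbf{x}_1) = \mathbf{y}_N$ exactly, by enumerating every history of length $N$. The function `is_martingale_step` and the random-walk examples are illustrative assumptions and not part of the original text. Adaptedness is automatic because each $\mathbf{y}_N$ is supplied as a function $f_N$ of the history, and integrability is trivial on a finite sample space.

```python
import itertools
from fractions import Fraction


def is_martingale_step(f_N, f_next, cond_prob, N, alphabet):
    """Exactly check E(y_{N+1} | x_1, ..., x_N) = y_N for all histories of length N.

    f_N(xs) and f_next(xs) evaluate y_N and y_{N+1} on a history tuple xs;
    cond_prob(x, xs) is P(x_{N+1} = x | history xs).
    Exact rational arithmetic avoids floating-point tolerances.
    """
    for xs in itertools.product(alphabet, repeat=N):
        cond_exp = sum(cond_prob(x, xs) * f_next(xs + (x,)) for x in alphabet)
        if cond_exp != f_N(xs):
            return False
    return True


alphabet = (-1, 1)


def fair(x, xs):
    return Fraction(1, 2)


def biased(x, xs):
    return Fraction(2, 3) if x == 1 else Fraction(1, 3)


def walk(xs):
    # y_N = x_1 + ... + x_N (simple random walk)
    return sum(xs)


def walk_sq_minus_n(xs):
    # y_N = (x_1 + ... + x_N)^2 - N (compensated square)
    return sum(xs) ** 2 - len(xs)


for N in range(5):
    print(N,
          is_martingale_step(walk, walk, fair, N, alphabet),
          is_martingale_step(walk, walk, biased, N, alphabet),
          is_martingale_step(walk_sq_minus_n, walk_sq_minus_n, fair, N, alphabet))
```

For every $N$, the fair walk and its compensated square pass the check, while the biased walk fails it, since its conditional drift is $1/3$ per step.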
The following proposition is obtained as a special case of Theorem 12.1 in chapter 10 of [37].

Proposition 21.

Let $(\mathbf{y}_N)_{N\in\mathbb{N}}$ be a martingale with respect to another process $(\mathbf{x}_N)_{N\in\mathbb{N}}$, and suppose that there exists a real number $C$ such that $|\mathbf{y}_N|\leq C$ for all $N\in\mathbb{N}$. Then, there exists a random variable $\mathbf{y}_\infty$ such that
$$\lim_{N\to\infty}\mathbf{y}_N = \mathbf{y}_\infty\ \text{a.s.}\quad\text{and}\quad\mathbb{E}(|\mathbf{y}_\infty|)<+\infty. \tag{112}$$
As a technical remark concerning the relation to Theorem 12.1 in chapter 10 of [37], one may note that the condition $|\mathbf{y}_N|\leq C$ implies that $(\mathbf{y}_N)_{N\in\mathbb{N}}$ is uniformly integrable.

Our main interest is not these 'standard' real-valued martingales, but rather operator-valued martingales. It is again worth recalling that we here only consider finite-dimensional spaces, and hence we can represent each operator as a finite matrix with respect to some choice of basis. With this in mind, we say that an operator-valued process $(\mathbf{Y}_N)_{N\in\mathbb{N}}$ on a finite-dimensional Hilbert space is an operator-valued martingale with respect to a stochastic process $(\mathbf{x}_N)_{N\in\mathbb{N}}$ if each of $\mathrm{Re}\langle k|\mathbf{Y}_N|k'\rangle$ and $\mathrm{Im}\langle k|\mathbf{Y}_N|k'\rangle$ is a martingale with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$ for some fixed orthonormal basis $\{|k\rangle\}_{k=1}^{D}$. One may note that, in the finite-dimensional case, the conditions $\mathbb{E}(|\mathrm{Re}\langle k|\mathbf{Y}_N|k'\rangle|)<+\infty$ and $\mathbb{E}(|\mathrm{Im}\langle k|\mathbf{Y}_N|k'\rangle|)<+\infty$ are equivalent to
$$\mathbb{E}(\|\mathbf{Y}_N\|)<+\infty. \tag{113}$$
Similarly, the conditions
$$\begin{aligned}
\mathbb{E}\big(\mathrm{Re}\langle k|\mathbf{Y}_{N+1}|k'\rangle\,\big|\,\mathbf{x}_N,\ldots,\mathbf{x}_1\big) &= \mathrm{Re}\langle k|\mathbf{Y}_N|k'\rangle\ \text{a.s.},\\
\mathbb{E}\big(\mathrm{Im}\langle k|\mathbf{Y}_{N+1}|k'\rangle\,\big|\,\mathbf{x}_N,\ldots,\mathbf{x}_1\big) &= \mathrm{Im}\langle k|\mathbf{Y}_N|k'\rangle\ \text{a.s.},
\end{aligned}\tag{114}$$
can equivalently be stated as
$$\mathbb{E}(\mathbf{Y}_{N+1}|\mathbf{x}_N,\ldots,\mathbf{x}_1) = \mathbf{Y}_N\ \text{a.s.} \tag{115}$$
In a similar manner, Proposition 21 can be applied to the real and imaginary components of an operator-valued martingale, which yields the following 'operator counterpart' to Proposition 21.

Proposition 22.

Let $(\mathbf{Y}_N)_{N\in\mathbb{N}}$ be an operator-valued martingale on a finite-dimensional complex Hilbert space with respect to a real-valued process $(\mathbf{x}_N)_{N\in\mathbb{N}}$, and suppose that there exists a real number $C$ such that $\|\mathbf{Y}_N\|\leq C$ for all $N\in\mathbb{N}$. Then, there exists a random operator, $\mathbf{Y}_\infty$, such that
$$\lim_{N\to\infty}\mathbf{Y}_N = \mathbf{Y}_\infty\ \text{a.s.}\quad\text{and}\quad\mathbb{E}(\|\mathbf{Y}_\infty\|)<+\infty. \tag{116}$$

C Stochastic process of measurements

Consider a set of operators $\{A_x\}_{x=0}^{d-1}$ on a finite-dimensional Hilbert space, $\mathcal{H}$, with $D := \dim\mathcal{H}$, such that $\sum_{x=0}^{d-1}A_x^{\dagger}A_x = \mathbb{1}$. We introduce the stochastic process $(\mathbf{x}_N)_{N\in\mathbb{N}}$ with a joint distribution such that, for each $N$, the marginal distribution of $\mathbf{x}_1,\ldots,\mathbf{x}_N$ is given by
$$P(\mathbf{x}_N = x_N,\ldots,\mathbf{x}_1 = x_1) = \frac{1}{D}\,\mathrm{Tr}\big(A_{x_N}\cdots A_{x_1}A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}\big). \tag{117}$$
This means that $\mathbf{x}_1,\ldots,\mathbf{x}_N$ can be interpreted as the outcomes of a sequence of measurements, where the initial state is maximally mixed, i.e., $\sigma = \mathbb{1}/D$. Note that when we in the following refer to an expectation value, the underlying probability distribution is assumed to be (117) unless otherwise stated. It will be useful to note that
$$P(\mathbf{x}_{N+1} = x_{N+1}|\mathbf{x}_N = x_N,\ldots,\mathbf{x}_1 = x_1) = \frac{\mathrm{Tr}\big(A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}A_{x_{N+1}}^{\dagger}A_{x_{N+1}}A_{x_N}\cdots A_{x_1}\big)}{\mathrm{Tr}\big(A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}A_{x_N}\cdots A_{x_1}\big)}, \tag{118}$$
where $P(y|x) := P(y,x)/P(x)$ denotes the conditional probability.

Based on the process $(\mathbf{x}_N)_{N\in\mathbb{N}}$, we define sequences of random operators
$$\mathbf{A}_N := A_{\mathbf{x}_N},\qquad\mathbf{W}_N := \mathbf{A}_N\cdots\mathbf{A}_1 = A_{\mathbf{x}_N}\cdots A_{\mathbf{x}_1},\qquad\mathbf{M}_N := \begin{cases}\dfrac{\mathbf{W}_N^{\dagger}\mathbf{W}_N}{\mathrm{Tr}(\mathbf{W}_N^{\dagger}\mathbf{W}_N)} & \text{if }\mathrm{Tr}(\mathbf{W}_N^{\dagger}\mathbf{W}_N)\neq 0,\\[1ex] 0 & \text{if }\mathrm{Tr}(\mathbf{W}_N^{\dagger}\mathbf{W}_N) = 0.\end{cases} \tag{119}$$
One should note that $\mathbf{M}_N$ is Hermitian, i.e., $\mathbf{M}_N^{\dagger} = \mathbf{M}_N$. Moreover, $\mathbf{M}_N$ is positive semidefinite, and has either trace 1 or trace 0. Hence, $\mathbf{M}_N$ is either a density operator or the zero operator. One may further note that
$$\mathrm{Tr}(\mathbf{W}_N^{\dagger}\mathbf{W}_N) = \mathrm{Tr}\big(A_{\mathbf{x}_1}^{\dagger}\cdots A_{\mathbf{x}_N}^{\dagger}A_{\mathbf{x}_N}\cdots A_{\mathbf{x}_1}\big) = P(\mathbf{x}_N,\ldots,\mathbf{x}_1)\,D, \tag{120}$$
from which we can conclude that $\mathbf{M}_N = 0$ with probability zero, i.e., $\mathbf{M}_N$ is almost surely a density operator.

As a side remark, one might note that $\mathbf{M}_N$ is not the post-measurement state of the measurement process. The post-measurement state would rather be $\boldsymbol{\rho}_N := \mathbf{W}_N\mathbf{W}_N^{\dagger}/\mathrm{Tr}(\mathbf{W}_N^{\dagger}\mathbf{W}_N)$. However, $\mathbf{M}_N$ and $\boldsymbol{\rho}_N$ have the same non-zero eigenvalues (as can be seen by a singular value decomposition of $\mathbf{W}_N$).
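The objects in (117)–(120) can be checked numerically by exact enumeration over histories. The Python sketch below assumes NumPy; `random_kraus`, `word`, `prob`, `m_op`, `rho_op` and `histories` are hypothetical helper names, and the Kraus set is randomly drawn rather than taken from this paper. It checks that (117) is normalized, that $M_N$ and $\rho_N$ share their eigenvalues, and, anticipating Lemma 23, that $\sum_{x}P(x|x_N,\ldots,x_1)\,M_{N+1} = M_N$ holds for every history.

```python
import itertools

import numpy as np


def random_kraus(d, D, seed=1):
    """Hypothetical helper: random {A_x} with sum_x A_x^dagger A_x = 1 (blocks of an isometry)."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D))
    V, _ = np.linalg.qr(G)
    return [V[x * D:(x + 1) * D, :] for x in range(d)]


def word(kraus, xs):
    """W_N = A_{x_N} ... A_{x_1} for the history xs = (x_1, ..., x_N)."""
    W = np.eye(kraus[0].shape[0], dtype=complex)
    for x in xs:
        W = kraus[x] @ W
    return W


def prob(kraus, xs):
    """Joint probability (117): Tr(W_N W_N^dagger) / D (maximally mixed input)."""
    W = word(kraus, xs)
    return np.trace(W @ W.conj().T).real / W.shape[0]


def m_op(kraus, xs):
    """M_N from (119): W^dagger W / Tr(W^dagger W), or 0 if the trace vanishes."""
    W = word(kraus, xs)
    G = W.conj().T @ W
    t = np.trace(G).real
    return G / t if t > 0 else np.zeros_like(G)


def rho_op(kraus, xs):
    """Post-measurement state rho_N = W W^dagger / Tr(W^dagger W)."""
    W = word(kraus, xs)
    return W @ W.conj().T / np.trace(W.conj().T @ W).real


d, D = 3, 2
kraus = random_kraus(d, D)


def histories(N):
    return itertools.product(range(d), repeat=N)


for N in range(1, 5):
    total = sum(prob(kraus, xs) for xs in histories(N))
    print("N =", N, "total probability:", total)

for N in range(0, 4):
    worst = 0.0
    for xs in histories(N):
        # E(M_{N+1} | history) = sum_x P(x | history) * M_{N+1}(history + (x,))
        cond = sum(prob(kraus, xs + (x,)) / prob(kraus, xs) * m_op(kraus, xs + (x,))
                   for x in range(d))
        worst = max(worst, np.abs(cond - m_op(kraus, xs)).max())
    print("N =", N, "max martingale defect:", worst)
```

The martingale defect should be at the level of floating-point round-off, because each term $P(x|\cdot)\,M_{N+1}$ reduces to $W_N^{\dagger}A_x^{\dagger}A_xW_N/\mathrm{Tr}(W_N^{\dagger}W_N)$ and the sum over $x$ uses $\sum_x A_x^{\dagger}A_x = \mathbb{1}$.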
The main reason why it is convenient to use $\mathbf{M}_N$, rather than $\boldsymbol{\rho}_N$, is that on $\mathbf{M}_N$ we can directly utilize $\sum_x A_x^{\dagger}A_x = \mathbb{1}$, which, for example, is used in the proof of the martingale property in Lemma 23.

In the following it will be useful to observe that since $\mathbf{A}_N$ is a (deterministic) function of $\mathbf{x}_N$ [as seen by (119)], it is the case that
$$\mathbb{E}(\mathbf{A}_N|\mathbf{x}_N) = A_{\mathbf{x}_N} = \mathbf{A}_N. \tag{121}$$
Analogously, $\mathbb{E}(\mathbf{W}_N|\mathbf{x}_N,\ldots,\mathbf{x}_1) = \mathbf{W}_N$, and similarly $\mathbb{E}(\mathbf{M}_N|\mathbf{x}_N,\ldots,\mathbf{x}_1) = \mathbf{M}_N$.

D Elements of the proof of Proposition 5

D.1 $\lim_{N\to\infty}\mathbf{M}_N = \mathbf{M}_\infty$ a.s.

The purpose of this section is to show that $\mathbf{M}_N$ has a limit operator $\mathbf{M}_\infty$ in a sufficiently strong sense, and that this limit operator has 'nice' properties. We do this by first showing that $\mathbf{M}_N$ is a martingale relative to the sequence of measurement outcomes $\mathbf{x}_N$. This in turn yields almost sure convergence to a limiting operator $\mathbf{M}_\infty$. Recall that the underlying probability distribution is assumed to be (117), and that all expectations are taken with respect to this distribution.

Lemma 23.

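Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a finite-dimensional complex Hilbert space, such that $\sum_{x=0}^{d-1}A_x^{\dagger}A_x = \mathbb{1}$. Then, $(\mathbf{M}_N)_{N\in\mathbb{N}}$, defined by (119), is an operator-valued martingale with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$ with distribution (117).

Proof. From the fact that each $\mathbf{M}_N$ is a density operator or the zero operator, it follows that $\|\mathbf{M}_N\|\leq 1$, and thus in particular that $\mathbb{E}(\|\mathbf{M}_N\|)\leq 1<+\infty$. Moreover, by the construction in (119), $\mathbf{M}_N$ (and thus its matrix elements with respect to a given basis) is a function of $\mathbf{x}_N,\ldots,\mathbf{x}_1$. By (118) and $\sum_{x_{N+1}=0}^{d-1}A_{x_{N+1}}^{\dagger}A_{x_{N+1}} = \mathbb{1}$, we find that
$$\begin{aligned}
&\mathbb{E}(\mathbf{M}_{N+1}|\mathbf{x}_N = x_N,\ldots,\mathbf{x}_1 = x_1)\\
&\quad= \sum_{x_{N+1}}\frac{A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}A_{x_{N+1}}^{\dagger}A_{x_{N+1}}A_{x_N}\cdots A_{x_1}}{\mathrm{Tr}\big(A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}A_{x_{N+1}}^{\dagger}A_{x_{N+1}}A_{x_N}\cdots A_{x_1}\big)}\,P(\mathbf{x}_{N+1} = x_{N+1}|\mathbf{x}_N = x_N,\ldots,\mathbf{x}_1 = x_1)\\
&\quad= \frac{A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}A_{x_N}\cdots A_{x_1}}{\mathrm{Tr}\big(A_{x_1}^{\dagger}\cdots A_{x_N}^{\dagger}A_{x_N}\cdots A_{x_1}\big)}\\
&\quad= \mathbb{E}(\mathbf{M}_N|\mathbf{x}_N = x_N,\ldots,\mathbf{x}_1 = x_1).
\end{aligned}$$
We can conclude that $\mathbb{E}(\mathbf{M}_{N+1}|\mathbf{x}_N,\ldots,\mathbf{x}_1) = \mathbb{E}(\mathbf{M}_N|\mathbf{x}_N,\ldots,\mathbf{x}_1) = \mathbf{M}_N$. Hence, $(\mathbf{M}_N)_{N\in\mathbb{N}}$ is a martingale with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$.

Lemma 24.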

Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a finite-dimensional complex Hilbert space such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$. Let $(M_N)_{N\in\mathbb{N}}$ be as defined in (119) with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$ distributed as in (117). Then there exists a random operator $M_\infty$ such that
$$\lim_{N\to\infty} M_N = M_\infty \quad \text{a.s.}, \tag{122}$$
$$M_\infty \text{ is almost surely a density operator}, \tag{123}$$
$$\lim_{N\to\infty} E(M_N) = E(M_\infty), \tag{124}$$
$$\lim_{N\to\infty} E(\|M_N\|) = E(\|M_\infty\|), \tag{125}$$
$$E(\|M_\infty\|) < +\infty. \tag{126}$$

Proof.

By Lemma 23 we know that $(M_N)_{N\in\mathbb{N}}$ is a martingale with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$. Since each $M_N$ is a density operator or the zero operator, it follows that
$$\|M_N\| \le 1, \quad \forall N \in \mathbb{N}. \tag{127}$$
By Proposition 22 it follows that there exists a random operator $M_\infty$ such that
$$\lim_{N\to\infty} M_N = M_\infty \quad \text{a.s.}, \tag{128}$$
with $E(\|M_\infty\|) < +\infty$. By combining (128) with (127), Proposition 20 yields $E(M_N) \to E(M_\infty)$. Moreover, (128) yields $\lim_{N\to\infty}\|M_N\| = \|M_\infty\|$ a.s. By this observation together with (127), Proposition 17 with $x_N := \|M_N\|$ and $x_\infty := \|M_\infty\|$ yields $E(\|M_N\|) \to E(\|M_\infty\|)$. Finally, we show that $M_\infty$ is almost surely a density operator, i.e., that $M_\infty \ge 0$ and $\operatorname{Tr} M_\infty = 1$ almost surely. On the event where (128) holds, $\lim_{N\to\infty}\langle\psi|M_N|\psi\rangle = \langle\psi|M_\infty|\psi\rangle$ for every vector $|\psi\rangle$. Since $\langle\psi|M_N|\psi\rangle \ge 0$, it follows that $\langle\psi|M_\infty|\psi\rangle \ge 0$ a.s. Analogously, since $\operatorname{Tr} M_N = 1$ almost surely, it follows that $\lim_{N\to\infty}\operatorname{Tr} M_N = \operatorname{Tr} M_\infty = 1$ a.s. Hence, $M_\infty$ is almost surely a density operator.

D.2 If $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition, then $\operatorname{rank}(M_\infty) = 1$ a.s.

The purpose of this section is to show that the limit operator $M_\infty$ is almost surely a rank-one operator whenever $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition. The first step (Lemma 25) is to show that the difference between the operators $M_{N+p}$ and $M_N$ tends to vanish as $N$ increases, even when conditioned on all the measurement outcomes $\mathbf{x}_N, \ldots, \mathbf{x}_1$.

Lemma 25.

Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a finite-dimensional complex Hilbert space such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$. Let $(M_N)_{N\in\mathbb{N}}$ be as defined in (119) with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$ distributed as in (117). Then, for every $p \in \mathbb{N}$,
$$\lim_{N\to\infty} E\big(\|M_{N+p} - M_N\| \,\big|\, \mathbf{x}_N, \ldots, \mathbf{x}_1\big) = 0 \quad \text{a.s.} \tag{129}$$

Proof.

Recall that $M_N$ is a deterministic function of $\mathbf{x}_N, \ldots, \mathbf{x}_1$, and thus
$$E(M_N \mid \mathbf{x}_N, \ldots, \mathbf{x}_1) = M_N. \tag{130}$$
A direct consequence is that $M_N$ and $M_{N+p}$ are independent when conditioned on $\mathbf{x}_N, \ldots, \mathbf{x}_1$, and thus
$$E(M_{N+p} M_N \mid \mathbf{x}_N, \ldots, \mathbf{x}_1) = E(M_{N+p} \mid \mathbf{x}_N, \ldots, \mathbf{x}_1)\, E(M_N \mid \mathbf{x}_N, \ldots, \mathbf{x}_1). \tag{131}$$
By expanding $E\big((M_{N+p} - M_N)^2\big)$ one obtains cross-terms such as
$$\begin{aligned} E(M_{N+p} M_N) &= E\big(E(M_{N+p} M_N \mid \mathbf{x}_N, \ldots, \mathbf{x}_1)\big) \\ &= E\big(E(M_{N+p} \mid \mathbf{x}_N, \ldots, \mathbf{x}_1)\, E(M_N \mid \mathbf{x}_N, \ldots, \mathbf{x}_1)\big), \end{aligned} \tag{132}$$
where the last equality follows from the conditional independence in (131). By the martingale property shown in Lemma 23 (applied $p$ times via the tower property), $E(M_{N+p} \mid \mathbf{x}_N, \ldots, \mathbf{x}_1) = E(M_N \mid \mathbf{x}_N, \ldots, \mathbf{x}_1)$. Combining these observations with (130) yields
$$\begin{aligned} E\big((M_{N+p} - M_N)^2\big) &= E(M_{N+p}^2) + E(M_N^2) - 2E\big(E(M_N \mid \mathbf{x}_N, \ldots, \mathbf{x}_1)\, E(M_N \mid \mathbf{x}_N, \ldots, \mathbf{x}_1)\big) \\ &= E(M_{N+p}^2) - E(M_N^2), \end{aligned} \tag{133}$$
where in the last step we have used (130).

Since $M_N$ is Hermitian, it follows that $\operatorname{Tr} E(M_{N+p}^2) = E(\|M_{N+p}\|_2^2)$, $\operatorname{Tr} E(M_N^2) = E(\|M_N\|_2^2)$ and $\operatorname{Tr} E\big((M_{N+p} - M_N)^2\big) = E(\|M_{N+p} - M_N\|_2^2) \ge 0$, which together with (133) yields
$$E(\|M_{N+p}\|_2^2) - E(\|M_N\|_2^2) \ge 0. \tag{134}$$
Using (133), we next observe that, for $k \ge p-1$,
$$\begin{aligned} \sum_{N=0}^{p-1} E(M_{N+k+1}^2) - \sum_{N=0}^{p-1} E(M_N^2) &= \sum_{N=0}^{k} E(M_{N+p}^2) - \sum_{N=0}^{k} E(M_N^2) \\ &= \sum_{N=0}^{k} E\big((M_{N+p} - M_N)^2\big) \\ &= E\Big(\sum_{N=0}^{k} E\big((M_{N+p} - M_N)^2 \,\big|\, \mathbf{x}_N, \ldots, \mathbf{x}_1\big)\Big), \end{aligned} \tag{135}$$
where in the last step we use the general relation $E\big(E(y \mid x)\big) = E(y)$. Recall that $\|O\|_2^2 := \operatorname{Tr}(O^\dagger O)$. Applying the trace to (135), we obtain
$$E\Big(\sum_{N=0}^{k} E\big(\|M_{N+p} - M_N\|_2^2 \,\big|\, \mathbf{x}_N, \ldots, \mathbf{x}_1\big)\Big) = \sum_{N=0}^{p-1} \Big[E(\|M_{N+k+1}\|_2^2) - E(\|M_N\|_2^2)\Big].$$
By Lemma 24 we know that $\lim_{N\to\infty} M_N = M_\infty$ almost surely. From this observation it follows that $\lim_{k\to\infty} \|M_{N+k+1}\|_2 = \|M_\infty\|_2$ a.s. Next, we note that $M_N$ is a density operator, or the zero operator, and thus $\|M_N\|_2 \le 1$. With $x_k := \|M_{N+k+1}\|_2^2$ and $x_\infty := \|M_\infty\|_2^2$, it follows by Proposition 17 that
$$\lim_{k\to\infty} E(\|M_{N+k+1}\|_2^2) = E(\|M_\infty\|_2^2). \tag{136}$$
By Lemma 24 we know that $M_\infty$ is almost surely a density operator, from which it follows that $\|M_\infty\|_2^2 \le 1$ a.s. With $x := \|M_\infty\|_2^2$ and $y := 1$ in Lemma 13, we get
$$E(\|M_\infty\|_2^2) \le 1. \tag{137}$$
With $p := k+1$ in the inequality (134), it follows that $E(\|M_{N+k+1}\|_2^2) - E(\|M_N\|_2^2) \ge 0$, which by (136) implies $E(\|M_\infty\|_2^2) - E(\|M_N\|_2^2) \ge 0$. Hence,
$$\lim_{k\to\infty} E\Big(\sum_{N=0}^{k} E\big(\|M_{N+p} - M_N\|_2^2 \,\big|\, \mathbf{x}_N, \ldots, \mathbf{x}_1\big)\Big) = \sum_{N=0}^{p-1} \Big[E(\|M_\infty\|_2^2) - E(\|M_N\|_2^2)\Big] =: R(p). \tag{138}$$
Since $E(\|M_\infty\|_2^2) - E(\|M_N\|_2^2) \ge 0$, it follows that $R(p) \ge 0$, and by (137), $R(p) \le p < +\infty$.

Define $r_N := E(\|M_{N+p} - M_N\|_2^2 \mid \mathbf{x}_N, \ldots, \mathbf{x}_1)$. Note that $r_N \ge 0$. Moreover, $E(r_N) = E(\|M_{N+p} - M_N\|_2^2)$. Since $M_N$ is either a density operator or the zero operator, it follows that $\|M_N\|_2 \le 1$, and thus $\|M_{N+p} - M_N\|_2^2 \le (\|M_{N+p}\|_2 + \|M_N\|_2)^2 \le 4$. We conclude that $E(\|M_{N+p} - M_N\|_2^2) \le 4$, which together with $E(r_N) = E(\|M_{N+p} - M_N\|_2^2)$ yields $E(r_N) \le 4$. Moreover, by (138), $\lim_{k\to\infty} E\big(\sum_{N=0}^{k} r_N\big) = R(p) < +\infty$. All the conditions of Lemma 19 are thus satisfied, and it yields
$$\lim_{N\to\infty} E\big(\|M_{N+p}(\omega) - M_N(\omega)\|_2^2 \,\big|\, \mathbf{x}_N(\omega), \ldots, \mathbf{x}_1(\omega)\big) = 0 \quad \text{a.s.} \tag{139}$$
Next, we note that $x \mapsto x^2$ is a convex function, and thus by Jensen's inequality $(E(X))^2 \le E(X^2)$. By combining this observation with (139) we obtain
$$\lim_{N\to\infty} E\big(\|M_{N+p} - M_N\|_2 \,\big|\, \mathbf{x}_N, \ldots, \mathbf{x}_1\big) = 0 \quad \text{a.s.} \tag{140}$$
By the general relation $\|R\| \le \|R\|_2$ between the supremum norm and the Hilbert-Schmidt norm, we get $E(\|R\|) \le E(\|R\|_2)$, and thus (140) yields (129).

The following lemma provides a reformulation of the purity condition that is better suited for the proof technique that we employ.

Lemma 26.

Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a complex finite-dimensional Hilbert space $\mathcal{H}$. Then $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition if and only if the following condition holds: for every nonzero operator $O$ on $\mathcal{H}$,
$$\text{if } O^\dagger A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1} O \propto O^\dagger O \ \ \forall N \in \mathbb{N},\ \forall (x_1, \ldots, x_N) \in \{0, \ldots, d-1\}^{\times N}, \text{ then } \operatorname{rank}(O) = 1. \tag{141}$$

Proof.

We start by proving that if $\{A_x\}_{x=0}^{d-1}$ satisfies condition (141), then $\{A_x\}_{x=0}^{d-1}$ also satisfies the purity condition. Suppose that condition (141) holds. Restricting to operators of the form $O = P$ for projectors $P$, we find that condition (33) holds, and hence $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition.

Conversely, we wish to show that if $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition, then $\{A_x\}_{x=0}^{d-1}$ also satisfies condition (141). Hence, assume that $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition. Let $O$ be any nonzero operator on $\mathcal{H}$ such that
$$O^\dagger A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1} O \propto O^\dagger O \tag{142}$$
for all $N$ and all $x_1, \ldots, x_N$. We next note that $OO^\dagger$ is positive semidefinite, and let $(OO^\dagger)^{\ominus}$ denote the inverse on the support of $OO^\dagger$, such that $(OO^\dagger)^{\ominus} OO^\dagger = OO^\dagger (OO^\dagger)^{\ominus} = P$, where $P$ is the projector onto the support of $OO^\dagger$. Multiplying (142) from the left with $(OO^\dagger)^{\ominus} O$ and from the right with $O^\dagger (OO^\dagger)^{\ominus}$ results in $P A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1} P \propto P$. Since the purity condition is assumed to hold, it follows that $\operatorname{rank}(P) = 1$. However, $\operatorname{rank}(P) = \operatorname{rank}(OO^\dagger) = \operatorname{rank}(O)$. We can thus conclude that if $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition, then $\{A_x\}_{x=0}^{d-1}$ also satisfies condition (141).

In the following lemma we use the convergence in (140) to show that $M_\infty$ is almost surely a rank-one operator. A key step in the proof is the equality (144) below, which together with (140) and the observation that $\|\sqrt{M_N}\| \le 1$ yields the limit (145). Since $\lim_{N\to\infty} M_N = M_\infty$ a.s., it seems reasonable that in the limit $N \to \infty$ we obtain the proportionality in (149). Via Lemma 26, the latter implies the desired result that $M_\infty$ is almost surely a rank-one operator. However, there is a complication to this reasoning, namely the sequence of unitary operators $U_N$. These unitary operators are the result of a polar decomposition of the operators $A_{x_N} \cdots A_{x_1} / \sqrt{\operatorname{Tr}(A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1})}$, and we have very little control over the sequence $(U_N)_{N\in\mathbb{N}}$, in particular over whether it possesses a limit $U_\infty$. However, we can mend this issue by using the fact that the set of unitary operators on a finite-dimensional Hilbert space is sequentially compact. Recall that a topological space $C$ is sequentially compact if, for every sequence $(x_j)_{j\in\mathbb{N}} \subset C$, there exists a subsequence $(x_{j_k})_{k\in\mathbb{N}}$ that converges to an element in $C$. On a finite-dimensional complex Hilbert space with dimension $D$, the set of unitary operators $\mathrm{U}(D)$ forms a sequentially compact (as well as compact) space. Hence, whenever we have a sequence $(U_j)_{j\in\mathbb{N}}$ in $\mathrm{U}(D)$, there exists a subsequence $(U_{j_k})_{k\in\mathbb{N}}$ such that $U_{j_k}$ converges to an element in $\mathrm{U}(D)$.

Lemma 27.

With the assumptions in Lemma 24, let $M_\infty$ be the random operator guaranteed by Lemma 24. If $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition, then
$$\operatorname{rank}(M_\infty) = 1 \quad \text{a.s.} \tag{143}$$

Proof.

We define
$$M_{x_N,\ldots,x_1} := \frac{A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1}}{\operatorname{Tr}(A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1})},$$
and thus $M_N = M_{\mathbf{x}_N,\ldots,\mathbf{x}_1}$. We make a polar decomposition with a unitary operator $U_{x_N,\ldots,x_1}$ such that
$$U_{x_N,\ldots,x_1} \sqrt{M_{x_N,\ldots,x_1}} = \frac{A_{x_N} \cdots A_{x_1}}{\sqrt{\operatorname{Tr}(A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1})}},$$
and define $U_N := U_{\mathbf{x}_N,\ldots,\mathbf{x}_1}$. Then we have
$$\begin{aligned} &E\big(\|M_{N+p} - M_N\| \,\big|\, \mathbf{x}_N = x_N, \ldots, \mathbf{x}_1 = x_1\big) \\ &= \sum_{x_{N+p},\ldots,x_{N+1}} \Big\| \sqrt{M_{x_N,\ldots,x_1}}\, U_{x_N,\ldots,x_1}^\dagger A_{x_{N+1}}^\dagger \cdots A_{x_{N+p}}^\dagger A_{x_{N+p}} \cdots A_{x_{N+1}} U_{x_N,\ldots,x_1} \sqrt{M_{x_N,\ldots,x_1}} \\ &\qquad\qquad - M_{x_N,\ldots,x_1} \operatorname{Tr}\big(A_{x_{N+1}}^\dagger \cdots A_{x_{N+p}}^\dagger A_{x_{N+p}} \cdots A_{x_{N+1}} U_{x_N,\ldots,x_1} M_{x_N,\ldots,x_1} U_{x_N,\ldots,x_1}^\dagger\big) \Big\| \\ &= E\Big( \sum_{x'_p,\ldots,x'_1} \Big\| \sqrt{M_N}\, U_N^\dagger A_{x'_1}^\dagger \cdots A_{x'_p}^\dagger A_{x'_p} \cdots A_{x'_1} U_N \sqrt{M_N} - M_N \operatorname{Tr}\big(A_{x'_1}^\dagger \cdots A_{x'_p}^\dagger A_{x'_p} \cdots A_{x'_1} U_N M_N U_N^\dagger\big) \Big\| \,\Big|\, \mathbf{x}_N = x_N, \ldots, \mathbf{x}_1 = x_1 \Big), \end{aligned}$$
where in the second equality we have renamed the indices $x_{N+1}, \ldots, x_{N+p}$ to $x'_1, \ldots, x'_p$. Consequently,
$$E\big(\|M_{N+p} - M_N\| \,\big|\, \mathbf{x}_N, \ldots, \mathbf{x}_1\big) = \sum_{x'_p,\ldots,x'_1} \Big\| \sqrt{M_N}\, U_N^\dagger A_{x'_1}^\dagger \cdots A_{x'_p}^\dagger A_{x'_p} \cdots A_{x'_1} U_N \sqrt{M_N} - M_N \operatorname{Tr}\big(A_{x'_1}^\dagger \cdots A_{x'_p}^\dagger A_{x'_p} \cdots A_{x'_1} U_N M_N U_N^\dagger\big) \Big\|, \tag{144}$$
where we have used that $M_N$ and $U_N$ are deterministic functions of $\mathbf{x}_N, \ldots, \mathbf{x}_1$. Since $M_N$ is a density operator, or the zero operator, it follows that $\|\sqrt{M_N}\| \le 1$, so that multiplying the operator inside each norm in (144) by $\sqrt{M_N}$ from both sides does not increase the norm. By combining this observation with (144) and with Lemma 25, it follows that
$$\lim_{N\to\infty} \sum_{x_p,\ldots,x_1} \Big\| M_N U_N^\dagger A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_N M_N - M_N^2 \operatorname{Tr}\big(A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_N M_N U_N^\dagger\big) \Big\| = 0 \quad \text{a.s.} \tag{145}$$
Next, we recall that Lemma 24 guarantees that $\lim_{N\to\infty} M_N = M_\infty$ a.s., where $M_\infty$ is almost surely a density operator. Let $\omega \in \Omega$ be such that $\lim_{N\to\infty} M_N(\omega) = M_\infty(\omega)$, where $M_\infty(\omega)$ is a density operator, and such that the limit in (145) holds. The polar decompositions above yield a sequence of unitary operators $(U_N(\omega))_{N\in\mathbb{N}} \subset \mathrm{U}(D)$. By the sequential compactness of $\mathrm{U}(D)$, there exists a subsequence $(U_{N_k}(\omega))_{k\in\mathbb{N}}$ and an element $U_\infty(\omega) \in \mathrm{U}(D)$ such that $\lim_{k\to\infty} U_{N_k}(\omega) = U_\infty(\omega)$. It still remains true that $\lim_{k\to\infty} M_{N_k}(\omega) = M_\infty(\omega)$, and similarly the limit in (145) remains true with $N$ replaced by $N_k$. With the definition
$$B_k(\omega) := \sum_{x_p,\ldots,x_1} \Big\| M_{N_k}(\omega) U_{N_k}(\omega)^\dagger A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_{N_k}(\omega) M_{N_k}(\omega) - M_{N_k}(\omega)^2 \operatorname{Tr}\big(A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_{N_k}(\omega) M_{N_k}(\omega) U_{N_k}(\omega)^\dagger\big) \Big\|,$$
it thus follows by (145) that $B_k(\omega) \to 0$. Define
$$B_\infty(\omega) := \sum_{x_p,\ldots,x_1} \Big\| M_\infty(\omega) U_\infty(\omega)^\dagger A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_\infty(\omega) M_\infty(\omega) - M_\infty(\omega)^2 \operatorname{Tr}\big(A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_\infty(\omega) M_\infty(\omega) U_\infty(\omega)^\dagger\big) \Big\|. \tag{146}$$
Next we wish to show that $B_k(\omega) \to B_\infty(\omega)$. By the inverted triangle inequality, a rearrangement, and the triangle inequality, one obtains
$$\begin{aligned} |B_\infty(\omega) - B_k(\omega)| \le{}& \sum_{x_p,\ldots,x_1} \Big\| M_\infty(\omega) U_\infty(\omega)^\dagger A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_\infty(\omega) M_\infty(\omega) - M_{N_k}(\omega) U_{N_k}(\omega)^\dagger A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_{N_k}(\omega) M_{N_k}(\omega) \Big\| \\ &+ \sum_{x_p,\ldots,x_1} \Big\| M_\infty(\omega)^2 \operatorname{Tr}\big(A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_\infty(\omega) M_\infty(\omega) U_\infty(\omega)^\dagger\big) - M_{N_k}(\omega)^2 \operatorname{Tr}\big(A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_{N_k}(\omega) M_{N_k}(\omega) U_{N_k}(\omega)^\dagger\big) \Big\|. \end{aligned} \tag{147}$$
The goal is to utilize the fact that $M_{N_k}(\omega) \to M_\infty(\omega)$, and consequently $M_{N_k}(\omega)^2 \to M_\infty(\omega)^2$, and similarly that $U_{N_k}(\omega) \to U_\infty(\omega)$. To this end, in the first sum in (147), inside the norm, one can subtract and add $M_{N_k}(\omega) U_{N_k}(\omega)^\dagger A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_\infty(\omega) M_\infty(\omega)$. Similarly, in the second sum, we subtract and add $M_{N_k}(\omega)^2 \operatorname{Tr}\big(A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_\infty(\omega) M_\infty(\omega) U_\infty(\omega)^\dagger\big)$ inside the norm.
One can repeatedly use the triangle inequality, together with subtractions and additions in the same spirit as above, and general relations such as $\|AB\| \le \|A\|\|B\|$ and $|\operatorname{Tr}(AB)| \le \|A\|\|B\|_1$. One can also use observations such as $\|U_\infty(\omega) M_\infty(\omega)\| \le 1$, $\|M_{N_k}(\omega) U_{N_k}(\omega)^\dagger\| \le 1$, $\|U_\infty(\omega) M_\infty(\omega) U_\infty(\omega)^\dagger\|_1 = \|M_\infty(\omega)\|_1 = 1$, $\|M_{N_k}(\omega)\| \le 1$, and $\sum_{x_p,\ldots,x_1} \|A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1}\| \le d^p$. Together, these show that
$$\begin{aligned} |B_\infty(\omega) - B_k(\omega)| \le{}& d^p \|M_\infty(\omega) - M_{N_k}(\omega)\| + d^p \|U_\infty(\omega)^\dagger - U_{N_k}(\omega)^\dagger\| + d^p \|M_\infty(\omega) - M_{N_k}(\omega)\| + d^p \|U_\infty(\omega) - U_{N_k}(\omega)\| \\ &+ d^p \|M_\infty(\omega) - M_{N_k}(\omega)\| + d^p \|U_\infty(\omega) - U_{N_k}(\omega)\| + d^p \|M_\infty(\omega) - M_{N_k}(\omega)\| + d^p \|U_\infty(\omega)^\dagger - U_{N_k}(\omega)^\dagger\| \\ =:{}& C_k(\omega), \end{aligned}$$
and thus $C_k(\omega) \to 0$. We can conclude that $B_\infty(\omega) - B_k(\omega) \le |B_\infty(\omega) - B_k(\omega)| \le C_k(\omega)$, which implies $0 \le B_\infty(\omega) \le B_k(\omega) + C_k(\omega)$. By combining this observation with $B_k(\omega) \to 0$ and $C_k(\omega) \to 0$, as well as with the definition of $B_\infty(\omega)$ in (146), we can conclude that
$$B_\infty(\omega) = \sum_{x_p,\ldots,x_1} \Big\| M_\infty(\omega) U_\infty(\omega)^\dagger A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_\infty(\omega) M_\infty(\omega) - M_\infty(\omega)^2 \operatorname{Tr}\big(A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_\infty(\omega) M_\infty(\omega) U_\infty(\omega)^\dagger\big) \Big\| = 0. \tag{148}$$
This in turn implies
$$M_\infty(\omega) U_\infty(\omega)^\dagger A_{x_1}^\dagger \cdots A_{x_p}^\dagger A_{x_p} \cdots A_{x_1} U_\infty(\omega) M_\infty(\omega) \propto M_\infty(\omega) U_\infty(\omega)^\dagger U_\infty(\omega) M_\infty(\omega). \tag{149}$$
Since $p \in \mathbb{N}$ is arbitrary, and a countable union of null sets is a null set, (149) holds for all $p$ and all $x_1, \ldots, x_p$. Since we have assumed the purity condition, it follows by Lemma 26, with $O := U_\infty(\omega) M_\infty(\omega)$, that $\operatorname{rank}(M_\infty(\omega)) = \operatorname{rank}(U_\infty(\omega) M_\infty(\omega)) = 1$. Since this holds for almost all elements $\omega$ in the sample space, we can conclude that $\operatorname{rank}(M_\infty) = 1$ a.s.

D.3 $\operatorname{rank}(M_\infty) = 1$ a.s. implies the purity condition

While in the previous section we demonstrated that the purity condition is sufficient for $M_\infty$ to be a rank-one operator, here we show that it is also a necessary condition. The idea is to assume that $M_\infty$ has rank one, but that the purity condition does not hold. The latter means that there exists a projector $P$ with $\operatorname{rank}(P) > 1$ such that nevertheless $P A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1} P \propto P$. This is then shown to imply $P M_\infty(\omega) P \propto P$. However, since $\operatorname{rank}(M_\infty) = 1$, the only possibility is that $P M_\infty(\omega) P = 0$. This turns out to contradict $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$.

Lemma 28.

Let $\{A_x\}_{x=0}^{d-1}$ be operators on a finite-dimensional complex Hilbert space $\mathcal{H}$ such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$. Let $(M_N)_{N\in\mathbb{N}}$ be as defined in (119) with respect to $(\mathbf{x}_N)_{N\in\mathbb{N}}$ distributed as in (117). Let $M_\infty := \lim_{N\to\infty} M_N$ a.s., as guaranteed by Lemma 24. If $\operatorname{rank}(M_\infty) = 1$ a.s., then $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition in Definition 3.

Proof.

We proceed by contradiction, and thus assume that $\{A_x\}_{x=0}^{d-1}$ is such that $\operatorname{rank}(M_\infty) = 1$ a.s., but that the purity condition does not hold. The latter means that there exists a projector $P$ such that
$$P A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1} P \propto P, \quad \forall N \in \mathbb{N},\ \forall (x_1, \ldots, x_N) \in \{0, \ldots, d-1\}^{\times N}, \tag{150}$$
but $\operatorname{rank}(P) > 1$. Recall that
$$M_N = \begin{cases} \dfrac{A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1}}{\operatorname{Tr}(A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1})} & \text{if } \operatorname{Tr}(A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1}) \neq 0, \\[2ex] 0 & \text{if } \operatorname{Tr}(A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1}) = 0, \end{cases} \tag{151}$$
where we note that $\operatorname{Tr}(A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1}) = 0$ if and only if $A_{x_1}^\dagger \cdots A_{x_N}^\dagger A_{x_N} \cdots A_{x_1} = 0$. By (150), it thus follows that $P M_N P \propto P$. Let $\omega \in \Omega$ be such that $\lim_{N\to\infty} M_N(\omega) = M_\infty(\omega)$. Consequently,
$$\lim_{N\to\infty} \|P M_N(\omega) P - P M_\infty(\omega) P\| = 0. \tag{152}$$
Since $P M_N P \propto P$, there exists a proportionality constant $a_N(\omega)$ for each $N$ and $\omega$ such that
$$P M_N(\omega) P = a_N(\omega) P. \tag{153}$$
Next we use the general relation $|\operatorname{Tr}(AB)| \le \|A\|_1 \|B\|$ to show
$$\begin{aligned} |a_N(\omega) \operatorname{Tr}(P) - \operatorname{Tr}(P M_\infty(\omega) P)| &= \Big|\operatorname{Tr}\big(\mathbb{1}\,(a_N(\omega) P - P M_\infty(\omega) P)\big)\Big| \\ &\le D\, \|P M_N(\omega) P - P M_\infty(\omega) P\| \to 0, \end{aligned} \tag{154}$$
where we have used $\|\mathbb{1}\|_1 = D$, (153), and (152). With $a_\infty(\omega) := \operatorname{Tr}(P M_\infty(\omega)) / \operatorname{Tr}(P)$, we can thus conclude that $\lim_{N\to\infty} |a_N(\omega) - a_\infty(\omega)| = 0$. Hence,
$$\begin{aligned} \|a_\infty(\omega) P - P M_\infty(\omega) P\| &= \|a_\infty(\omega) P - a_N(\omega) P + a_N(\omega) P - P M_\infty(\omega) P\| \\ &\le |a_\infty(\omega) - a_N(\omega)| + \|P M_N(\omega) P - P M_\infty(\omega) P\| \to 0. \end{aligned}$$
We can thus conclude that $P M_\infty(\omega) P \propto P$, and hence $P M_\infty P \propto P$ a.s.

However, since $M_\infty$ is by assumption rank-one a.s., and $\operatorname{rank}(P) > 1$, the only possibility is that the proportionality constant is zero, i.e., that $P M_\infty P = 0$ a.s. Next, we note that $E(M_N) = \mathbb{1}/D$. By (124) in Lemma 24, we know that $E(M_N) \to E(M_\infty)$, and thus $E(M_\infty) = \mathbb{1}/D$. However, this contradicts $P M_\infty P = 0$ a.s., since the latter would give $P/D = P E(M_\infty) P = E(P M_\infty P) = 0$, which is impossible for $P \neq 0$.

D.4 $w(N)$ goes to zero exponentially if and only if the purity condition holds

In the previous sections we have shown that $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition if and only if $\operatorname{rank}(M_\infty) = 1$ a.s. Here we show that the latter is in turn equivalent to $\lim_{N\to\infty} w(N) = 0$, and that this is in turn equivalent to $w(N)$ converging exponentially fast to zero.

If $M_\infty$ has rank one, i.e., $\operatorname{rank}(M_\infty) = 1$, then $\|M_\infty\| = 1$, and we can relate $\|M_\infty\| - \|M_N\| = 1 - \|M_N\|$ to the eigenvalues of $M_N$ and the singular values of $W_N$. The latter directly connects to the definition of $w(N)$ in (157). To this end, we introduce the following notation. For a general operator $O$ on a space of finite dimension $D$, let $\nu_1^\downarrow(O) \ge \cdots \ge \nu_D^\downarrow(O)$ be the ordered singular values of $O$. Similarly, for a Hermitian operator $J$, let $\lambda_1^\downarrow(J) \ge \cdots \ge \lambda_D^\downarrow(J)$ be the ordered eigenvalues of $J$.

The fact that $w(N)$ converges to zero does not, of course, guarantee that $w(N)$ converges exponentially fast to zero. We obtain the latter by first showing that $w(N)$ is submultiplicative, i.e., $w(N+M) \le w(N) w(M)$, which implies that $\log w(N)$ is subadditive. We obtain the submultiplicativity by rewriting $w(N)$ in terms of the norm of the second-order exterior power of $A_{x_N} \cdots A_{x_1}$. To introduce the exterior power of an operator, consider a Hilbert space $\mathcal{H}$ with an orthonormal basis $\{|j\rangle\}_{j=1}^{D}$, where $D = \dim \mathcal{H}$. On the product space $\mathcal{H} \otimes \mathcal{H}$, we construct the swap operator $S := \sum_{j,k=1}^{D} |j\rangle\langle k| \otimes |k\rangle\langle j|$, where one may note that $S^2 = \mathbb{1} \otimes \mathbb{1}$ and $S^\dagger = S$.
We also define the projector $P_A := \frac{1}{2}(\mathbb{1} \otimes \mathbb{1} - S)$ onto the antisymmetric subspace of $\mathcal{H} \otimes \mathcal{H}$. For an operator $O$ on $\mathcal{H}$, we define the exterior power (of degree two) of $O$ as $\wedge^2(O) := P_A [O \otimes O] P_A$. A consequence of this definition is that $\|\wedge^2(O)\| = \nu_1^\downarrow(O)\, \nu_2^\downarrow(O)$. Moreover, $\wedge^2(O_1 O_2) = \wedge^2(O_1) \wedge^2(O_2)$, and consequently $\|\wedge^2(O_1 O_2)\| \le \|\wedge^2(O_1)\| \|\wedge^2(O_2)\|$. By comparing these definitions with (157) below, we can conclude that $w(N) = \sum_{x_1,\ldots,x_N=0}^{d-1} \|\wedge^2(A_{x_N} \cdots A_{x_1})\|$. A further observation is that
$$\|\wedge^2(O)\| \le \|O\|^2. \tag{155}$$
The exponential decay of $w(N)$ is obtained by combining $\lim_{N\to\infty} w(N) = 0$ with the subadditivity of $\log w(N)$ and Fekete's subadditive lemma. Fekete's lemma is commonly attributed to [39]. For a proof, see Lemma 1.2.1 in [40], and for a historical overview, see Section 1.10 in [40].

Lemma 29 (Fekete's subadditive lemma). Let $(a_N)_{N\in\mathbb{N}}$ be a subadditive sequence of real numbers, i.e., $a_{N+M} \le a_N + a_M$. Then the limit $\lim_{N\to\infty} a_N / N$ is well defined (but may be $-\infty$) and
$$\lim_{N\to\infty} \frac{a_N}{N} = \inf_{N\in\mathbb{N}} \frac{a_N}{N}. \tag{156}$$
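The two facts about $\wedge^2$ used above, $\|\wedge^2(O)\| = \nu_1^\downarrow(O)\nu_2^\downarrow(O)$ and $\wedge^2(O_1 O_2) = \wedge^2(O_1)\wedge^2(O_2)$, and the resulting submultiplicativity $w(N+M) \le w(N) w(M)$ can be sanity-checked numerically. The following Python/NumPy sketch is an illustration only, under stated assumptions. The test family `random_kraus` consists of the $D \times D$ blocks of a random isometry, so that $\sum_x A_x^\dagger A_x = \mathbb{1}$. The dimensions $d = 2$, $D = 3$, the random seed, and all function names are hypothetical choices for this example and do not come from the text.

```python
import itertools
import numpy as np

# Hypothetical helper names; a numerical illustration, not the paper's method.

def swap(D):
    """Swap operator S = sum_{j,k} |j><k| (x) |k><j| on C^D (x) C^D."""
    S = np.zeros((D * D, D * D))
    for j in range(D):
        for k in range(D):
            S[j * D + k, k * D + j] = 1.0  # maps |k>|j> to |j>|k>
    return S

def wedge2(O):
    """Degree-two exterior power P_A (O (x) O) P_A with P_A = (1 - S) / 2."""
    D = O.shape[0]
    P_A = 0.5 * (np.eye(D * D) - swap(D))
    return P_A @ np.kron(O, O) @ P_A

def nu(O):
    """Singular values of O in non-increasing order."""
    return np.linalg.svd(O, compute_uv=False)

def random_kraus(d, D, rng):
    """Hypothetical test family: D x D blocks of a random (dD x D) isometry V."""
    G = rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D))
    V, _ = np.linalg.qr(G)  # orthonormal columns, so V^dagger V = 1
    return [V[x * D:(x + 1) * D, :] for x in range(d)]

def word_product(ops, word):
    """Return A_{x_N} ... A_{x_1} for word = (x_1, ..., x_N)."""
    W = np.eye(ops[0].shape[0], dtype=complex)
    for x in word:
        W = ops[x] @ W
    return W

def w_of(ops, N):
    """w(N) as in (157), by brute force over all d**N words."""
    total = 0.0
    for word in itertools.product(range(len(ops)), repeat=N):
        s = nu(word_product(ops, word))
        total += s[0] * s[1]
    return total

rng = np.random.default_rng(0)
d, D = 2, 3
ops = random_kraus(d, D, rng)
assert np.allclose(sum(A.conj().T @ A for A in ops), np.eye(D))

O1 = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
O2 = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
s1 = nu(O1)
# ||wedge^2(O)|| = nu_1(O) nu_2(O), and the bound (155)
assert np.isclose(np.linalg.norm(wedge2(O1), 2), s1[0] * s1[1])
assert np.linalg.norm(wedge2(O1), 2) <= s1[0] ** 2 * (1 + 1e-9)
# multiplicativity: wedge^2(O1 O2) = wedge^2(O1) wedge^2(O2)
assert np.allclose(wedge2(O1 @ O2), wedge2(O1) @ wedge2(O2))
# submultiplicativity: w(N + M) <= w(N) w(M)
for N, M in [(1, 1), (1, 2), (2, 3)]:
    assert w_of(ops, N + M) <= w_of(ops, N) * w_of(ops, M) * (1 + 1e-9)
```

Here $w(N)$ is evaluated by brute force over all $d^N$ words, which is only feasible for small $N$. The sketch is meant to check the inequalities, not to estimate decay rates.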

Proposition 30.

Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on a finite-dimensional Hilbert space such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$. Define
$$w(N) := \sum_{x_1,\ldots,x_N=0}^{d-1} \nu_1^\downarrow(A_{x_N} \cdots A_{x_1})\, \nu_2^\downarrow(A_{x_N} \cdots A_{x_1}). \tag{157}$$
The following statements are equivalent:
1. $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition in Definition 3.
2. $\lim_{N\to\infty} w(N) = 0$.
3. There exist real constants $C' \ge 0$ and $0 < \gamma < 1$ such that
$$w(N) \le C' \gamma^N, \quad \forall N \in \mathbb{N}. \tag{158}$$

Proof.

1 ⇒ 2: Let $M_N$ be as defined in (119). We first distinguish the two cases that $M_N$ is a density operator, or that it is the zero operator. In the case that $M_N$ is a density operator, it follows that $1 = \operatorname{Tr}(M_N) \ge \lambda_1^\downarrow(M_N) + \lambda_2^\downarrow(M_N)$, and thus $1 \ge 1 - \lambda_1^\downarrow(M_N) \ge \lambda_2^\downarrow(M_N) \ge 0$. By noting that $\|M_N\| = \lambda_1^\downarrow(M_N)$, we thus get $\sqrt{\|M_N\|(1 - \|M_N\|)} \ge \sqrt{\lambda_1^\downarrow(M_N) \lambda_2^\downarrow(M_N)}$. Since $M_N$ is assumed to be a density operator, it moreover follows that $\|M_N\| \le 1$, and thus
$$\sqrt{|1 - \|M_N\||} \ge \sqrt{\lambda_1^\downarrow(M_N) \lambda_2^\downarrow(M_N)}. \tag{159}$$
In the case that $M_N$ is the zero operator, (159) is trivially true.

By Lemma 24, we know that $M_\infty$ is almost surely a density operator. By Lemma 27, we also know that $M_\infty$ is almost surely a rank-one operator. Hence, $M_\infty$ almost surely corresponds to a pure state. Consequently, $\|M_\infty\| = 1$ a.s. Combining this observation with the inverted triangle inequality yields
$$\sqrt{\|M_\infty - M_N\|} \ge \sqrt{|\|M_\infty\| - \|M_N\||} = \sqrt{|1 - \|M_N\||} \quad \text{a.s.} \tag{160}$$
Combining (159) with (160) yields
$$\sqrt{\|M_\infty - M_N\|} \ge \sqrt{\lambda_1^\downarrow(M_N) \lambda_2^\downarrow(M_N)} \quad \text{a.s.} \tag{161}$$
We next observe that
$$\nu_k^\downarrow\!\left(\frac{W_N}{\sqrt{\operatorname{Tr}(W_N^\dagger W_N)}}\right) = \sqrt{\lambda_k^\downarrow\!\left(\frac{W_N^\dagger W_N}{\operatorname{Tr}(W_N^\dagger W_N)}\right)} = \sqrt{\lambda_k^\downarrow(M_N)}. \tag{162}$$
Thus, (161) and (162) yield
$$\sqrt{\|M_\infty - M_N\|} \ge \frac{\nu_1^\downarrow(W_N)\, \nu_2^\downarrow(W_N)}{\operatorname{Tr}(W_N^\dagger W_N)} \quad \text{a.s.,} \tag{163}$$
which, by Lemma 13, results in
$$D\, E\big(\sqrt{\|M_\infty - M_N\|}\big) \ge D\, E\!\left(\frac{\nu_1^\downarrow(W_N)\, \nu_2^\downarrow(W_N)}{\operatorname{Tr}(W_N^\dagger W_N)}\right) = w(N). \tag{164}$$
By Lemma 24, we know that $M_N \to M_\infty$ almost surely. Since the underlying Hilbert space is finite-dimensional, we then have $\|M_\infty - M_N\| \to 0$ a.s., and consequently
$$x_N := \sqrt{\|M_\infty - M_N\|} \to 0 \quad \text{a.s.} \tag{165}$$
We next observe that $M_N$ is a density operator, or the zero operator, and thus $\|M_N\| \le 1$. Hence, $\|M_\infty - M_N\| \le \|M_\infty\| + \|M_N\| \le 1 + \|M_\infty\|$, which yields
$$x_N = \sqrt{\|M_\infty - M_N\|} \le \sqrt{1 + \|M_\infty\|} \le 1 + \|M_\infty\| =: y. \tag{166}$$
By Lemma 24 we know that $E(\|M_\infty\|) < +\infty$, and thus $E(y) = 1 + E(\|M_\infty\|) < +\infty$. Using this observation together with (165) and (166) in Proposition 16, we can conclude that $\lim_{N\to\infty} E\big(\sqrt{\|M_\infty - M_N\|}\big) = 0$. By combining this with (164), it follows that $\lim_{N\to\infty} w(N) = 0$. Hence, statement 1 implies statement 2.

2 ⇒ 3: We first make the observation that
$$\|\wedge^2(A_{x_{N+M}} \cdots A_{x_1})\| \le \|\wedge^2(A_{x_{N+M}} \cdots A_{x_{N+1}})\|\, \|\wedge^2(A_{x_N} \cdots A_{x_1})\|, \tag{167}$$
which in turn yields $w(N+M) \le w(M) w(N)$. Hence, $w$ is submultiplicative, and thus $\log w(N)$ is subadditive. By statement 2 we know that $\lim_{N\to\infty} w(N) = 0$. It follows that there exists an $N_0 \in \mathbb{N}$ such that $\log w(N_0) < 0$. Hence, since $\log w(N)$ is subadditive, it follows by Lemma 29 that
$$0 > \frac{\log w(N_0)}{N_0} \ge \inf_N \frac{\log w(N)}{N} = \lim_{N\to\infty} \frac{\log w(N)}{N}. \tag{168}$$
In the case that the limit is finite, let $l := \lim_{N\to\infty} \log w(N)/N$, so that $l < 0$ by (168). By definition of the limit, for any $\epsilon > 0$ there exists an $N_\epsilon$ such that $[\log w(N)]/N - l \le \epsilon$ for all $N \ge N_\epsilon$. We choose an arbitrary but fixed $\epsilon$ with $0 < \epsilon < -l$, and thus $w(N) \le \gamma^N$ for all $N \ge N_\epsilon$, where $\gamma := e^{l+\epsilon} < 1$. Define $C' := \max\{1, \max_{N=1,\ldots,N_\epsilon} w(N)/\gamma^N\}$, and thus (158) holds.

Let us finally consider the case that $\lim_{N\to\infty} \log w(N)/N = -\infty$. This means that for every $a > 0$ there exists an $N_a$ such that $[\log w(N)]/N \le -a$ for all $N \ge N_a$, which we can rewrite as $w(N) \le e^{-aN}$. Hence, with $\gamma := e^{-a}$ and $C' := \max\{1, \max_{N=1,\ldots,N_a} w(N)/\gamma^N\}$, we again obtain (158). We can conclude that statement 2 implies statement 3.

3 ⇒ 2: This implication is trivial.

2 ⇒ 1: In our first step, we show that $\lim_{N\to\infty} w(N) = 0$ implies that $\|M_\infty\| = 1$ a.s. We first observe that if $\eta$ is a density operator on a complex Hilbert space with finite dimension $D$, then $1 - \|\eta\| \le \sqrt{(D-1)D}\, \sqrt{\lambda_1^\downarrow(\eta) \lambda_2^\downarrow(\eta)}$. This follows by multiplying $1 - \lambda_1^\downarrow(\eta) = \sum_{k\ge 2} \lambda_k^\downarrow(\eta) \le (D-1)\lambda_2^\downarrow(\eta)$ with $1 - \lambda_1^\downarrow(\eta) \le 1 \le D \lambda_1^\downarrow(\eta)$. We know that $M_N$ is either a density operator or the zero operator, and thus $1 - \|M_N\| \ge 0$. We moreover know that $M_N$ is almost surely a density operator. With $x := 1 - \|M_N\|$ and $y := \sqrt{D(D-1)}\, \sqrt{\lambda_1^\downarrow(M_N) \lambda_2^\downarrow(M_N)}$, we can use the above observations to conclude that $0 \le x \le y$ a.s. Moreover, by Lemma 13, we obtain
$$1 - E(\|M_N\|) = E(1 - \|M_N\|) \le D \sqrt{\frac{D-1}{D}}\, E\Big(\sqrt{\lambda_1^\downarrow(M_N) \lambda_2^\downarrow(M_N)}\Big). \tag{169}$$
Next we note that the observation in (162) yields
$$D\, E\Big(\sqrt{\lambda_1^\downarrow(M_N) \lambda_2^\downarrow(M_N)}\Big) = D\, E\!\left(\frac{\nu_1^\downarrow(W_N)\, \nu_2^\downarrow(W_N)}{\operatorname{Tr}(W_N^\dagger W_N)}\right) = w(N). \tag{170}$$
By combining (169) and (170), one obtains $1 - E(\|M_N\|) \le w(N) \sqrt{(D-1)/D}$. By the assumption that $\lim_{N\to\infty} w(N) = 0$, it follows that $\lim_{N\to\infty} E(\|M_N\|) = 1$. By (125) in Lemma 24, we know that $E(\|M_N\|) \to E(\|M_\infty\|)$. We can thus conclude that $E(\|M_\infty\|) = 1$. With $x := 1 - \|M_\infty\|$, it follows that $E(x) = 0$. Since $M_\infty$ is almost surely a density operator, it follows that $1 \ge \|M_\infty\|$ almost surely. Hence, $x = 1 - \|M_\infty\| \ge 0$ a.s. By combining this observation and $E(x) = 0$ with Lemma 15, we obtain $x = 0$ a.s., and thus $\|M_\infty\| = 1$ a.s. By Lemma 24, we know that $M_\infty$ is almost surely a density operator. If $M_\infty$ is a density operator, then $M_\infty$ is a rank-one operator if and only if $\|M_\infty\| = 1$. We can thus conclude that $\operatorname{rank}(M_\infty) = 1$ a.s. By Lemma 28, this implies that $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition in Definition 3. Hence, statement 2 implies statement 1.

D.5 Generalization to $F$ and $\sigma$

Up to this point, the entire proof has concerned the exponential decay of $w(N)$, while we actually wish to find conditions for the exponential decay of $f(N)$. Here, we find necessary as well as sufficient conditions for the exponential decay of $f(N)$. We state a slightly more elaborate version of Proposition 5.
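Before the formal statement, note that the reduction in Proposition 31 below rests on two comparisons between $f(N)$ as in (171) and $w(N)$ as in (157). The first is $f(N) \le w(N)$ when $\sigma$ is a density operator and $F^\dagger F \le \mathbb{1}$, cf. (174). The second is $w(N) \le c'_{\sigma,F} f(N)$ for full-rank $\sigma$ and $F$, cf. (175). The following Python/NumPy sketch checks both inequalities by brute force, as an illustration under stated assumptions only. It uses a randomly generated family $\{A_x\}$, a random full-rank density operator $\sigma$, and a random $F$ rescaled to operator norm one. The random construction, the sizes $d = 2$, $D = 3$, the seed, and every function name are hypothetical choices for this example, not part of the text.

```python
import itertools
import numpy as np

# Illustration only: sizes, seed, and helper names are hypothetical choices.
rng = np.random.default_rng(1)
d, D = 2, 3  # d operators A_x acting on C^D

def random_complex(shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

def nu12(X):
    """nu_1(X) * nu_2(X): product of the two largest singular values."""
    s = np.linalg.svd(X, compute_uv=False)  # returned in non-increasing order
    return s[0] * s[1]

# Hypothetical family: D x D blocks of a random isometry, so sum_x A_x^dagger A_x = 1.
V, _ = np.linalg.qr(random_complex((d * D, D)))
ops = [V[x * D:(x + 1) * D, :] for x in range(d)]

def word_product(word):
    """Return A_{x_N} ... A_{x_1} for word = (x_1, ..., x_N)."""
    W = np.eye(D, dtype=complex)
    for x in word:
        W = ops[x] @ W
    return W

def w_of(N):
    """w(N) as in (157), by brute force over all d**N words."""
    return sum(nu12(word_product(wd)) for wd in itertools.product(range(d), repeat=N))

def f_of(N, F, sqrt_sigma):
    """f(N) as in (171), by brute force over all d**N words."""
    return sum(nu12(F @ word_product(wd) @ sqrt_sigma)
               for wd in itertools.product(range(d), repeat=N))

# Random full-rank density operator sigma and its positive square root.
G = random_complex((D, D))
sigma = G @ G.conj().T
sigma /= np.trace(sigma).real
evals, evecs = np.linalg.eigh(sigma)
sqrt_sigma = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.conj().T

# Random F rescaled to operator norm 1, hence F^dagger F <= 1.
F = random_complex((D, D))
F /= np.linalg.norm(F, 2)

# c'_{sigma,F} = ||wedge^2(F^{-1})|| * ||wedge^2(sqrt(sigma)^{-1})|| as in (175),
# using ||wedge^2(O)|| = nu_1(O) nu_2(O).
c_prime = nu12(np.linalg.inv(F)) * nu12(np.linalg.inv(sqrt_sigma))

for N in range(1, 7):
    wN, fN = w_of(N), f_of(N, F, sqrt_sigma)
    assert fN <= wN * (1 + 1e-9)            # first claim, cf. (174)
    assert wN <= c_prime * fN * (1 + 1e-9)  # second claim, cf. (175)
    print(f"N={N}: w(N)={wN:.3e}  f(N)={fN:.3e}")
```

Only the two inequalities are asserted. Whether $w(N)$ visibly decays for a particular random draw depends on that draw and is not claimed here.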
We state a slightly more elaborate versionof Proposition 5. Proposition 31.

Let $\{A_x\}_{x=0}^{d-1}$ be linear operators on the finite-dimensional complex Hilbert space $\mathcal{H}$ such that $\sum_{x=0}^{d-1} A_x^\dagger A_x = \mathbb{1}$. For operators $\sigma$ and $F$ on $\mathcal{H}$, define
$$f(N) := \sum_{x_N,\ldots,x_1=0}^{d-1} \nu_1^\downarrow(F A_{x_N} \cdots A_{x_1} \sqrt{\sigma})\, \nu_2^\downarrow(F A_{x_N} \cdots A_{x_1} \sqrt{\sigma}). \tag{171}$$
If $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition in Definition 3, then there exist real constants $0 \le c$ and $0 < \gamma < 1$ which satisfy
$$f(N) \le c \gamma^N, \quad \forall N \in \mathbb{N}, \tag{172}$$
for all density operators $\sigma$ and all $F$ such that $F^\dagger F \le \mathbb{1}$.

Conversely, if $f$ is defined with respect to some full-rank operators $\sigma$ and $F$ such that there exist constants $0 \le c_{\sigma,F}$ and $0 < \gamma < 1$ which fulfill
$$f(N) \le c_{\sigma,F} \gamma^N, \quad \forall N \in \mathbb{N}, \tag{173}$$
then $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition.

Proof.

We begin by proving the first claim of the proposition. We first note that
$$\begin{aligned} f(N) &= \sum_{x_1,\ldots,x_N} \|\wedge^2(F A_{x_N} \cdots A_{x_1} \sqrt{\sigma})\| \\ &\le \|\wedge^2(F)\|\, \|\wedge^2(\sqrt{\sigma})\|\, w(N) \\ &\le \|F\|^2 \|\sqrt{\sigma}\|^2\, w(N) \\ &\le w(N), \end{aligned} \tag{174}$$
where $w$ is as defined in (157), and where the next-to-last inequality follows from (155). The last inequality follows since $\sigma$ is assumed to be a density operator, and thus $\|\sqrt{\sigma}\| \le 1$, and similarly $F^\dagger F \le \mathbb{1}$ implies $\|F\| \le 1$. If $\{A_x\}_{x=0}^{d-1}$ satisfies the purity condition, then it follows by Proposition 30 that $w(N) \le C' \gamma^N$. By combining this observation with (174), we obtain (172) with $c := C'$. Note that Proposition 30 makes no reference to $F$ or $\sigma$, and thus $c$ is independent of these.

Next, we turn to the second claim of the proposition. For this purpose, we first note that since $F$ and $\sigma$ (and thus $\sqrt{\sigma}$) are full-rank operators on a finite-dimensional space, the inverses $F^{-1}$ and $\sqrt{\sigma}^{-1}$ exist. With $w$ as defined in (157), we thus find
$$w(N) = \sum_{x_1,\ldots,x_N} \|\wedge^2(A_{x_N} \cdots A_{x_1})\| \le \|\wedge^2(F^{-1})\|\, \|\wedge^2(\sqrt{\sigma}^{-1})\|\, f(N). \tag{175}$$
Hence, with $c'_{\sigma,F} := \|\wedge^2(F^{-1})\|\, \|\wedge^2(\sqrt{\sigma}^{-1})\|$, we get $w(N) \le c'_{\sigma,F} f(N)$. Combined with the assumption (173), it follows that
$$w(N) \le c_{\sigma,F}\, c'_{\sigma,F}\, \gamma^N, \quad \forall N \in \mathbb{N}, \tag{176}$$
where $w$ is as defined in Proposition 30, and $0 < \gamma < 1$. With $C' := c_{\sigma,F}\, c'_{\sigma,F}$ in Proposition 30, it follows that $\{A_x\}_{x=0}^{d-1}$