Local asymptotic equivalence of pure quantum states ensembles and quantum Gaussian white noise
arXiv [math.ST], May
CRISTINA BUTUCEA, MĂDĂLIN GUŢĂ, MICHAEL NUSSBAUM
Abstract.
Quantum technology is increasingly relying on specialised statistical inference methods for analysing quantum measurement data. This motivates the development of “quantum statistics”, a field that is shaping up at the overlap of quantum physics and “classical” statistics. One of the less investigated topics to date is that of statistical inference for infinite dimensional quantum systems, which can be seen as the quantum counterpart of non-parametric statistics. In this paper we analyse the asymptotic theory of quantum statistical models consisting of ensembles of quantum systems which are identically prepared in a pure state. In the limit of large ensembles we establish the local asymptotic equivalence (LAE) of this i.i.d. model to a quantum Gaussian white noise model. We use the LAE result in order to establish minimax rates for the estimation of pure states belonging to Hermite-Sobolev classes of wave functions. Moreover, for quadratic functional estimation of the same states we note an elbow effect in the rates, whereas for testing a pure state a sharp parametric rate is attained over the nonparametric Hermite-Sobolev class.
Keywords and phrases: Le Cam distance, local asymptotic equivalence, quantum Gaussian process, quantum Gaussian sequence, quantum states ensemble, nonparametric estimation, quadratic functionals, non-parametric sharp testing rates.
1. Introduction
A striking insight of quantum mechanics is that randomness is a fundamental feature of the physical world at the microscopic level. Any observation made on a quantum system, such as an atom or a light pulse, results in a non-deterministic, stochastic outcome. The study of the direct map from the system’s state or preparation to the probability distribution of the measurement outcomes has been one of the core topics in traditional quantum theory. In recent decades the focus of research has shifted from fundamental physics towards applications at the interface with information theory, computer science, and metrology, sharing the paradigm that individual quantum systems are carriers of a new type of information [52].
In many quantum protocols, the experimenter has incomplete knowledge and control of the system and its environment, or is interested in estimating an external field parameter which affects the system dynamics. In this case one deals with a statistical inverse problem of inferring unknown state parameters from the measurement data obtained by probing a large number of individual quantum systems. The theory and practice arising from tackling such questions is shaping up into the field of quantum statistics, which lies at the intersection of quantum theory and statistical inference [39, 37, 36, 56, 6, 1].
One of the central problems in quantum statistics is state estimation: given an ensemble of identically prepared, independent systems with unknown state, the task is to estimate the state by performing appropriate measurements and devising estimators based on the measurement data. A landmark experiment aimed at creating multipartite entangled states [34] highlighted the direct practical relevance of efficient estimation techniques for large dimensional systems, the complexity of estimating large dimensional states, and the need for solid statistical methodology in computing reliable “error bars”.
This has motivated the development of new methods such as compressed sensing and matrix $\ell_1$-minimisation [29, 28, 22], spectral thresholding for low rank states [14], and confidence regions [17, 18, 65, 62, 21].
Author affiliations: CREST, ENSAE, Université Paris-Saclay, 3, ave. P. Larousse 92245 MALAKOFF Cedex, FRANCE; University of Nottingham, School of Mathematical Sciences, University Park, NG7 2RD Nottingham, UK; Department of Mathematics, Malott Hall, Cornell University, Ithaca, NY 14853, USA.
Another important research direction is towards developing a quantum decision theory as the overall mathematical framework for inference involving quantum systems seen as a form of “statistical data”. Typically, the route to finding the building blocks of this theory starts with a decision problem (e.g. testing between two states, or estimating certain parameters of a state) and the problem of finding optimal measurement settings and statistical procedures for treating the (classical, random) measurement data. For instance, in the context of asymptotic binary hypothesis testing, two key results are the quantum Stein lemma [38, 55] and the quantum Chernoff bound [2, 53, 3, 50]. As in the classical case, they describe the exponential decay of appropriate error probabilities for optimal measurements, and they provide operational interpretations for the quantum relative entropy and the quantum Chernoff distance, respectively. Similarly, an important problem in state estimation is to identify measurements which allow for the smallest possible estimation error. A traditional approach has been to establish a “quantum Cramér-Rao bound” (QCRB) [39, 37, 10] for the covariance of unbiased estimators, where the right side is the inverse of the “quantum Fisher information matrix”, the latter depending only on the structure of the quantum statistical model.
However, while the QCRB is achievable asymptotically for one-dimensional parameters, this is not the case for multi-parameter models, due to the fact that the measurements which are optimal for different one-dimensional components are generally incompatible with each other.
These difficulties can be overcome by developing a fundamental theory of comparison and convergence of quantum statistical models, as an extension of its classical counterpart [63, 48]. While classical “data processing” is described by randomisations, physical transformations of quantum systems are described by quantum channels [52]. Following up on this idea, Petz and Jencova [58] have obtained a general characterisation of equivalent models, as families of states that are related by quantum channels in both directions. This naturally leads to the notion of
Le Cam distance between quantum statistical models as the least trace-norm error incurred when trying to map one model into another via quantum channels [43]. In this framework, the asymptotic theory of state estimation can be investigated by adopting ideas from the classical local asymptotic normality (LAN) theory [48]. Quantum LAN theory [31, 33, 43] shows that the sequence of models describing large samples of identically prepared systems can be approximated by a simpler quantum Gaussian shift model, in the neighbourhood of an interior point of the parameter space. The original optimal state estimation problem is then solved by combining LAN theory with known procedures for estimation of Gaussian states [30, 32, 25].
In this paper we extend the scope of the quantum LAN theory to cover non-parametric quantum models; more precisely, we will be interested in the set of pure states (one-dimensional projections) on infinite dimensional Hilbert spaces. Infinite dimensional systems such as light pulses or free particles are commonly encountered in quantum physics, and their estimation is an important topic in quantum optics [49]. The minimax results derived in this paper can serve as a benchmark for the performance of specific methods, such as quantum homodyne tomography [1, 13], by comparing their risk with the minimax risk derived here.
The paper is organised as follows. In Section 2 we review the basic notions of quantum mechanics needed for understanding the physical context of our investigation. In particular, we define the concepts of state, measurement and quantum channel, which can loosely be seen as quantum analogs of probability distributions and Markov kernels, respectively. We further introduce the formalism of quantum Gaussian states, the Fock spaces and second quantisation, which establish the quantum analogs of Gaussian distributions, Gaussian sequences and Gaussian processes in continuous time.
In Section 3.1 we review results in classical statistics on non-parametric asymptotic equivalence which serve as motivation and comparison to our work. In Section 3.2 we introduce the general notion of a quantum statistical model and the Le Cam distance between two models. In particular, in Section 3.3 we define the i.i.d. and Gaussian quantum models which are analysed in the remainder of the paper.
One of the main results is Theorem 4.1, giving the local asymptotic equivalence (LAE) between the non-parametric i.i.d. pure states model and the Gaussian shift model. This extends the existing local asymptotic normality theory from parametric to non-parametric (infinite dimensional) models. Section 5 details three applications of the LAE result in Theorem 4.1. In Section 5.1 we derive the asymptotic minimax rates and provide concrete estimation procedures for state estimation with respect to the trace-norm and Bures distances, which are analogues of the norm-one and Hellinger distances respectively. The main results are Theorems 5.1 and 5.3, which deal with the upper and lower bounds, respectively, for a model consisting of an ensemble of $n$ independent identically prepared systems in a pure state belonging to a Hermite-Sobolev class $S^\alpha(L)$ of wave functions. In Theorem 5.1 we describe a specific measurement procedure which provides an estimator whose risk attains the nonparametric rate $n^{-\alpha/(2\alpha+1)}$. The lower bound follows by using the LAE result to approximate the model with a Gaussian one, combined with the lower bound for the corresponding quantum Gaussian model derived in Theorem 5.2. In Section 5.2 we consider the estimation of a state functional corresponding to the expectation of a power $N^\beta$ of the number operator. Theorems 5.4 and 5.5 establish the upper and lower bounds for functional estimation for the Hermite-Sobolev class $S^\alpha(L)$. The minimax rates are $n^{-1/2}$ (parametric) if $\alpha \ge 2\beta$, and $n^{-1+\beta/\alpha}$ if $\beta < \alpha < 2\beta$.
In Section 5.3 we investigate non-parametric testing between a single state and a composite hypothesis consisting of all states outside a ball of shrinking radius. Surprisingly, we find that the minimax testing rates are parametric, in contrast to the non-parametric estimation rates. This is closely related to the fact that the optimal estimation and testing measurements are incompatible with each other, so that no single measurement strategy can allow for minimax estimation and testing at the same time. Results on the minimax optimal rate for testing and the sharp asymptotics are given in Theorems 5.6 and 5.7 respectively.
Notation.
Following physics convention, the vectors of a Hilbert space $\mathcal H$ will be denoted by the “ket” $|v\rangle$, so that the inner product of two vectors is the “bra-ket” $\langle u|v\rangle \in \mathbb C$, which is linear with respect to the right entry and anti-linear with respect to the left entry. Similarly, $M := |u\rangle\langle v|$ is the rank one operator acting as $M : |w\rangle \mapsto M|w\rangle = \langle v|w\rangle\,|u\rangle$. We denote by $\mathcal L(\mathcal H)$ the space of bounded linear operators on $\mathcal H$, which is a $C^*$-algebra with respect to the operator norm $\|A\| := \sup_{\psi \neq 0} \|A\psi\| / \|\psi\|$. Additionally, $\mathcal T_1(\mathcal H) \subset \mathcal L(\mathcal H)$ is the space of trace-class operators equipped with the norm-one $\|\tau\|_1 := \mathrm{Tr}(|\tau|)$, where the operator $|\tau| := (\tau^*\tau)^{1/2}$ is the absolute value of $\tau$, and $\tau^*$ is the adjoint of $\tau$. Finally, we denote by $\mathcal T_2(\mathcal H) \subset \mathcal L(\mathcal H)$ the space of Hilbert-Schmidt operators equipped with the norm-two $\|\tau\|_2^2 := \mathrm{Tr}(|\tau|^2)$, which is a Hilbert space with respect to the inner product $(\tau, \sigma) := \mathrm{Tr}(\tau^*\sigma)$.
2. Quantum mechanics background
In this section we review some basic notions of quantum mechanics (QM), in as much as they are required for understanding the subsequent results of the paper. Since QM is a probabilistic theory of quantum phenomena, it is helpful to approach the formalism from the perspective of analogies and differences with “classical” probability. We refer to [52] for more details on the quantum formalism.
2.1. States, measurements, channels.
The QM formalism assigns to each quantum mechanical system (e.g. an atom, light pulse, quantum spin) a complex Hilbert space $\mathcal H$, called the space of states. For instance, the finite dimensional space $\mathbb C^d$ is the Hilbert space of a system with $d$ “energy levels”, while $L^2(\mathbb R)$ is the space of “wave functions” of a particle moving in one dimension, or of a monochromatic light pulse. The state of a quantum system is represented mathematically by a density matrix.
Definition 1.
Let $\mathcal H$ be the Hilbert space of a quantum system. A density matrix (or state) on $\mathcal H$ is a linear operator $\rho : \mathcal H \to \mathcal H$ which is positive (i.e. it is selfadjoint and has non-negative eigenvalues), and has trace one.
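In finite dimension the conditions of Definition 1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `is_density_matrix` and `pure_state`, the tolerance and the two-level example are illustrative choices and are not taken from the paper.

```python
import numpy as np

def is_density_matrix(rho, tol=1e-10):
    """Check the conditions of Definition 1 for a finite square matrix."""
    rho = np.asarray(rho, dtype=complex)
    # Selfadjointness: rho must equal its conjugate transpose.
    if not np.allclose(rho, rho.conj().T, atol=tol):
        return False
    # Positivity: all eigenvalues non-negative (up to rounding errors).
    eigenvalues = np.linalg.eigvalsh(rho)
    positive = bool(np.all(eigenvalues >= -tol))
    # Normalisation: trace equal to one.
    trace_one = bool(abs(np.trace(rho) - 1.0) < tol)
    return positive and trace_one

def pure_state(psi):
    """Rank-one projection |psi><psi| onto the normalised vector psi."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

# Hypothetical two-level system (d = 2).
rho_a = pure_state([1, 0])
rho_b = pure_state([1, 1j])
mixture = 0.3 * rho_a + 0.7 * rho_b          # a convex combination of states

# Trace one and selfadjoint, but with a negative eigenvalue: not a state.
not_a_state = np.array([[1.5, 0.0], [0.0, -0.5]])
```

Here `is_density_matrix(rho_a)` and `is_density_matrix(mixture)` should return `True`, while `is_density_matrix(not_a_state)` should return `False`.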
We denote by $\mathcal S(\mathcal H)$ the convex set of states on $\mathcal H$. Its linear span is the space of trace class operators $\mathcal T_1(\mathcal H)$, which is the non-commutative analogue of the space of absolutely integrable functions $L^1(\Omega, \Sigma, \mathbb P)$ on a probability space. For any states $\rho_1$ and $\rho_2$, the convex combination $\lambda\rho_1 + (1-\lambda)\rho_2$ is also a state, which corresponds to randomly preparing the system in either the state $\rho_1$ or $\rho_2$ with probabilities $\lambda$ and $1-\lambda$, respectively. The extremal elements of the convex set $\mathcal S(\mathcal H)$ are the one dimensional projections $P_\psi = |\psi\rangle\langle\psi|$, where $|\psi\rangle$ is a normalised vector, i.e. $\|\psi\| = 1$. Such states are called pure (as opposed to mixed states, which are convex combinations of pure ones), and are uniquely determined by the vector $|\psi\rangle$. Conversely, the vector $|\psi\rangle$ is fixed by the state up to a complex phase factor, i.e. $|\psi\rangle$ and $|\psi'\rangle := e^{i\phi}|\psi\rangle$ represent the same state.
Although the quantum state encodes all information about the preparation of the system, it is not a directly observable property. Instead, any measurement produces a random outcome whose distribution depends on the state, and thus reveals in a probabilistic way a certain aspect of the system’s preparation. The simplest type of measurement is determined by an orthonormal basis (ONB) $\{|i\rangle\}_{i=1}^{\dim \mathcal H}$ and a set of possible outcomes $\{\lambda_i\}_{i=1}^{\dim \mathcal H}$ in the following way: the outcome is a random variable $X$ taking the value $\lambda_i$ with probability given by the diagonal elements of $\rho$ in this particular basis,
$$\mathbb P_\rho([X = \lambda_i]) = \rho_{ii} = \langle i|\rho|i\rangle.$$
More generally, a measurement $M$ with outcomes in a measurable space $(\Omega, \Sigma)$ is determined by a positive operator valued measure.
Definition 2.
A positive operator valued measure (POVM) is a map $M : \Sigma \to \mathcal L(\mathcal H)$ having the following properties:
1) positivity: $M(E) \ge 0$ for all events $E \in \Sigma$;
2) sigma additivity: $M(\cup_i E_i) = \sum_i M(E_i)$ for any countable set of mutually disjoint events $E_i$;
3) normalization: $M(\Omega) = \mathbf 1$.
The outcome of the corresponding measurement associated to $M$ has probability distribution $\mathbb P_\rho(E) = \mathrm{Tr}(\rho M(E))$, $E \in \Sigma$.
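For a finite outcome set, a POVM is simply a list of positive matrices summing to the identity, and the outcome distribution is obtained from the trace formula above. The sketch below illustrates this with NumPy; the qubit state `rho` and the three-outcome “trine” POVM are hypothetical examples chosen only for illustration, and the function names are assumptions of this sketch.

```python
import numpy as np

def is_povm(elements, tol=1e-10):
    """Check positivity and normalisation of a finite list of POVM elements."""
    dim = elements[0].shape[0]
    positive = all(np.all(np.linalg.eigvalsh(m) >= -tol) for m in elements)
    normalised = np.allclose(sum(elements), np.eye(dim), atol=tol)
    return bool(positive and normalised)

def outcome_distribution(rho, elements):
    """Outcome probabilities P_rho(i) = Tr(rho M_i)."""
    return np.array([np.trace(rho @ m).real for m in elements])

# Hypothetical qubit state (positive, trace one).
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)

# Measurement in the standard ONB: probabilities are the diagonal of rho.
basis = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
basis_probs = outcome_distribution(rho, basis)

# A non-projective example: three rank-one elements with weights 2/3.
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
vectors = [np.array([np.cos(t / 2), np.sin(t / 2)], dtype=complex) for t in angles]
trine = [(2.0 / 3.0) * np.outer(v, v.conj()) for v in vectors]
trine_probs = outcome_distribution(rho, trine)
```

In both cases the resulting probabilities are non-negative and sum to one, as guaranteed by positivity and normalisation of the POVM.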
The most important example of a POVM is that associated to the measurement of an observable, the latter being represented mathematically by a selfadjoint operator $A : \mathcal H \to \mathcal H$. The Spectral Theorem shows that such operators can be “diagonalised”, i.e. they have a spectral decomposition
$$A = \int_{\sigma(A)} x\, P(dx),$$
where $\sigma(A)$ is the spectrum of $A$, and $\{P(E) : E \in \Sigma\}$ is the collection of spectral projections of $A$. The corresponding measurement has outcome $a \in \sigma(A)$ with probability distribution $\mathbb P_\rho[a \in E] = \mathrm{Tr}(\rho P(E))$.
Unlike “classical” systems, which can be observed without disturbing their state, quantum systems are typically perturbed by the measurement, so the system needs to be reprepared in order to obtain more information about the state. In this sense, the system can be seen as a “quantum sample” which can be converted into a “classical” sample only by performing a measurement. Thus, a measurement can be seen as a “quantum-to-classical randomisation”, i.e. a linear map $M$ which sends a state $\rho$ to the probability density $M(\rho) \equiv p_\rho := \frac{d\mathbb P_\rho}{d\mathbb P}$ with respect to a reference measure $\mathbb P$. The latter can be taken to be $\mathbb P_{\rho_0}$ for a strictly positive density matrix $\rho_0$. The following lemma summarises this perspective on measurements.
Lemma 2.1.
Let $\mathcal H$ be a Hilbert space, and let $(\Omega, \Sigma)$ be a measurable space. For any fixed state $\rho_0 > 0$ on $\mathcal H$, there is a one-to-one correspondence between POVMs $M$ over $(\Omega, \Sigma)$ and quantum-to-classical randomisations, i.e. linear maps $M : \mathcal T_1(\mathcal H) \to L^1(\Omega, \Sigma, \mathbb P)$, with $\mathbb P = \mathbb P_{\rho_0}$, which are positive and normalised (they map states into probability densities). The correspondence is given by
$$\mathbb P_\rho(E) = \mathrm{Tr}(M(E)\rho) = \int_E p_\rho(\omega)\, \mathbb P_{\rho_0}(d\omega), \qquad M(\rho) \equiv p_\rho := \frac{d\mathbb P_\rho}{d\mathbb P}.$$
For comparison, recall that a linear map $R : L^1(\Omega', \Sigma', \mathbb P') \to L^1(\Omega, \Sigma, \mathbb P)$ is a stochastic operator if it maps probability densities into probability densities [61]. Typically such maps arise from Markov kernels and describe randomizations of dominated statistical experiments (models).
While a measurement is a quantum-to-classical randomization, a “quantum-to-quantum randomization” describes how the system’s state changes as a result of time evolution or interaction with other systems. The maps describing such transformations are called quantum channels.
Definition 3.
A quantum channel between systems with Hilbert spaces $\mathcal H_1$ and $\mathcal H_2$ is a trace preserving, completely positive linear map $T : \mathcal T_1(\mathcal H_1) \to \mathcal T_1(\mathcal H_2)$.
The two properties mentioned above are similar to those of a classical randomization, so in particular $T$ maps states into states. However, unlike the classical case, $T$ is required to satisfy a stronger positivity property: $T$ is completely positive if $\mathrm{Id}_m \otimes T$ is positive for all $m \ge 1$, where $\mathrm{Id}_m$ is the identity map on the space of $m$ dimensional matrices. This ensures that when the system is correlated with an ancillary system $\mathbb C^m$, and the latter undergoes the identity transformation, the final joint state is still positive, as expected on physical grounds.
The simplest example of a quantum channel is a unitary transformation $\rho \mapsto U\rho U^*$, where $U$ is a unitary operator on $\mathcal H$. More generally, if $|\varphi\rangle \in \mathcal K$ is a pure state of an ancillary system, and $V$ is a unitary on $\mathcal H \otimes \mathcal K$, then
$$\rho \mapsto T(\rho) := \mathrm{Tr}_{\mathcal K}\big(V(\rho \otimes |\varphi\rangle\langle\varphi|)V^*\big)$$
is a quantum channel describing the system state after interacting with the ancilla. By computing the partial trace $\mathrm{Tr}_{\mathcal K}$ over $\mathcal K$ with respect to an orthonormal basis $\{|f_i\rangle\}_{i=1}^{\dim \mathcal K}$ we obtain the following expression
$$T(\rho) = \sum_i K_i \rho K_i^* \qquad (1)$$
where $K_i$ are operators on $\mathcal H$ defined by $\langle\psi|K_i|\psi'\rangle := \langle\psi \otimes f_i|V|\psi' \otimes \varphi\rangle$. Note that by definition, these operators satisfy the normalisation condition $\sum_i K_i^* K_i = \mathbf 1$. Conversely, the Kraus Theorem shows that any quantum channel is of the form (1) with operators $K_i$ respecting the normalisation condition.
2.2. Continuous variables, Fock spaces and Gaussian states.
In this section we look at the class of “continuous variables” (cv) systems, which model a variety of physical systems such as light pulses or free particles. Such systems play an important role in this work as “carriers” of quantum Gaussian states, and in particular in the local asymptotic equivalence result. We refer to [49] for further reading.
One mode systems.
We start with the simplest case of a “one-mode” cv system, after which we show how this construction can be extended to more general “multi-mode” cv systems. The Hilbert space of a one-mode system is $L^2(\mathbb R)$, i.e. the space of square integrable wave functions on the real line. On this we define the selfadjoint operators acting on appropriately defined domains as
$$(Q\psi)(q) = q\,\psi(q), \qquad (P\psi)(q) = -i\,\frac{d\psi(q)}{dq},$$
which satisfy the “canonical commutation relations” $QP - PQ = i\mathbf 1$. To better understand the meaning of the observable $Q$, let us consider its measurement for a pure state $\rho = |\psi\rangle\langle\psi|$ with wave function $|\psi\rangle$. The outcome takes values in $\mathbb R$, and its probability distribution has density $p^Q_\rho(x) = |\psi(x)|^2$ with respect to the Lebesgue measure. Similarly, the probability density of the observable $P$ is given by $p^P_\rho(x) = |\tilde\psi(x)|^2$, where $\tilde\psi \in L^2(\mathbb R)$ is the Fourier transform of the function $\psi(\cdot)$. When the system under consideration is the free particle, $Q$ and $P$ are usually associated to the position and momentum observables, while for a monochromatic light mode they correspond to the electric and magnetic fields. Note that the distributions of $P$ and $Q$ are not sufficient to identify the state, even in the case of a pure state. However, it turns out that the state is uniquely determined by the collection of probability distributions of all quadrature observables $X_\phi := \cos(\phi)\cdot Q + \sin(\phi)\cdot P$ for angles $\phi \in [0, \pi]$. To understand this, it is helpful to think of the state of the one-mode cv system as a quantum analogue of a joint distribution of two real valued variables, i.e. a 2D distribution. Indeed, in the latter case, the distribution is determined by the collection of marginals along all directions in the plane (its Radon transform); this fact is exploited in PET tomography, which aims at estimating the 2D distribution from samples of its Radon transform.
In the quantum case, since $Q$ and $P$ do not commute with each other, they cannot be measured simultaneously and cannot be assigned a joint distribution in a meaningful way. However, the “quasi-distribution” defined below has some of the desired properties, and is very helpful in visualising the quantum state.
Definition 4.
For any state $\rho \in \mathcal T_1(L^2(\mathbb R))$ we define the quantum characteristic function of $\rho$
$$\widetilde W_\rho(u, v) := \mathrm{Tr}\big(\exp(-iuQ - ivP)\,\rho\big).$$
The inverse Fourier transform of $\widetilde W_\rho$ with respect to both variables is called the Wigner function $W_\rho$, or quasi-distribution associated to $\rho$:
$$W_\rho(q, p) = \frac{1}{(2\pi)^2} \iint \exp(iuq + ivp)\, \widetilde W_\rho(u, v)\, du\, dv.$$
A consequence of this definition is that the marginal of $W_\rho(q, p)$ along an arbitrary direction with angle $\phi$ is the probability density of the quadrature $X_\phi$ introduced above. This is the basis of a quantum state estimation scheme called “quantum homodyne tomography” [49, 1], where the Wigner function plays the role of the 2D distribution from “classical” PET tomography. One of the important differences, however, is that Wigner functions need not be positive in general, and satisfy other constraints which are specific to the quantum setting and can be exploited in the estimation procedure.
The Wigner function representation offers an intuitive route to defining the notion of Gaussian state.
Definition 5.
A state $\rho$ of a one-mode cv system is called Gaussian if its Wigner function $W_\rho$ is a Gaussian probability density, or equivalently if it has the quantum characteristic function
$$\widetilde W_\rho(u, v) = \exp\left(-\frac{1}{2}(u, v)\, V\, (u, v)^T\right) \cdot \exp(iuq_0 + ivp_0),$$
where $(q_0, p_0) \in \mathbb R^2$ and $V$ (a real positive $2 \times 2$ matrix) are the mean and variance of $W_\rho$, respectively.
In particular, all the quadratures $X_\phi$ of a Gaussian state have Gaussian distribution. As a consequence of the commutation relation $QP - PQ = i\mathbf 1$, the observables $Q$ and $P$ cannot have arbitrarily small variance simultaneously; in particular, the covariance matrix $V$ must satisfy the “uncertainty principle” $\mathrm{Det}(V) \ge 1/4$, where the equality is achieved if and only if the state is a pure Gaussian state.
We will be particularly interested in coherent states $|G(z)\rangle$, which are pure Gaussian states whose Wigner functions have covariance matrix $V = I_2/2$, where $I_2$ is the $2 \times 2$ identity matrix. To give a concrete Hilbert space representation, it is convenient to introduce a special orthonormal basis of $L^2(\mathbb R)$, consisting of the eigenvectors $\{|0\rangle, |1\rangle, \dots\}$ of the number operator $N = a^*a$, with $N|k\rangle = k|k\rangle$. Here, the operators $a^* = (Q - iP)/\sqrt 2$ and $a = (Q + iP)/\sqrt 2$ are called creation and annihilation operators and act as “ladder operators” on the number basis vectors (or Fock states):
$$a|k\rangle = \sqrt{k}\,|k-1\rangle, \qquad a^*|k\rangle = \sqrt{k+1}\,|k+1\rangle.$$
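The ladder relations can be illustrated with finite matrices by truncating the number basis at a level `dim`. This is only a sketch under that truncation assumption (the true operators are unbounded and act on an infinite dimensional space), and the variable names are illustrative: in the truncated model the commutation relation $QP - PQ = i\mathbf 1$ holds on all basis vectors except the last one.

```python
import numpy as np

def annihilation(dim):
    """Matrix of a in the truncated number basis |0>, ..., |dim-1>."""
    # a|k> = sqrt(k)|k-1>, so sqrt(k) sits on the first superdiagonal.
    return np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)

dim = 8                                   # truncation level (arbitrary choice)
a = annihilation(dim)
a_star = a.conj().T                       # creation operator
N = a_star @ a                            # number operator: diag(0, ..., dim-1)

# Invert a = (Q + iP)/sqrt(2), a* = (Q - iP)/sqrt(2).
Q = (a + a_star) / np.sqrt(2)
P = (a - a_star) / (1j * np.sqrt(2))

# Equals i * identity except for the last diagonal entry, an artefact
# of cutting the basis off at level dim - 1.
commutator = Q @ P - P @ Q

# Ladder action on the Fock state |3>.
e3 = np.zeros(dim, dtype=complex)
e3[3] = 1.0
lowered = a @ e3                          # sqrt(3) |2>
raised = a_star @ e3                      # sqrt(4) |4>
```

Increasing `dim` pushes the truncation artefact to higher levels, so states concentrated on low number levels are well described by the finite matrices.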
The coherent states denoted by $|G(z)\rangle$ are obtained by applying the unitary Weyl (displacement) operators to the vacuum state $|0\rangle$:
$$|G(z)\rangle = \exp(za^* - \bar z a)\,|0\rangle = \exp(-|z|^2/2) \sum_{k=0}^\infty \frac{z^k}{\sqrt{k!}}\,|k\rangle, \qquad (2)$$
where $z = x + iy \in \mathbb C$ is the eigenvalue of the annihilation operator, $a|G(z)\rangle = z|G(z)\rangle$; in particular, the quadrature means are $\langle G(z)|Q|G(z)\rangle = \sqrt 2\,\mathrm{Re}(z)$ and $\langle G(z)|P|G(z)\rangle = \sqrt 2\,\mathrm{Im}(z)$, and the Wigner function is given by
$$W_{|G(z)\rangle}(q, p) = \frac{1}{\pi} \exp\left(-(q - \sqrt 2 x)^2 - (p - \sqrt 2 y)^2\right), \qquad q, p \in \mathbb R. \qquad (3)$$
Equation (2) implies that the number operator $N$ has a Poisson distribution with mean $|z|^2$. Additionally, it can be seen from the Fourier expansion in the second equality that the unitary $\Gamma(\phi) = \exp(i\phi N)$ acts by rotating the coherent states by an angle $\phi$ in the complex plane, i.e. $\Gamma(\phi)|G(z)\rangle = |G(e^{i\phi}z)\rangle$.
Another important class of Gaussian states are the mixed diagonal states
$$\Phi(r) = (1 - r) \sum_{k=0}^\infty r^k\, |k\rangle\langle k|, \qquad 0 < r < 1, \qquad (4)$$
which are also called thermal states, cf. section 3.3 in [49]. The corresponding Wigner function is a centred Gaussian
$$W_{\Phi(r)}(q, p) = \frac{1}{2\pi\sigma^2(r)} \exp\left(-\frac{q^2 + p^2}{2\sigma^2(r)}\right) \qquad (5)$$
with covariance matrix $V = \sigma^2(r) \cdot I_2$, where $\sigma^2(r) = \frac{1}{2}\,\frac{1+r}{1-r}$.
Proposition 2.2.
Consider the family of coherent states $\{|G(z)\rangle\langle G(z)|,\ z \in \mathbb C\}$, with random displacement (location) $z$ distributed according to $\Pi(dz)$, having a Gaussian law with covariance matrix $\sigma^2 \cdot I_2$. Then, the mixed state $\Phi = \int |G(z)\rangle\langle G(z)|\, \Pi(dz)$ is a thermal state $\Phi(r)$, with $r = \frac{2\sigma^2}{2\sigma^2 + 1}$.
Proof.
Consider the corresponding Wigner function, with $z = x + iy$:
$$W_\Phi(q, p) = \iint W_{|G(z)\rangle}(q, p)\, \exp\left(-\frac{1}{2\sigma^2}(x^2 + y^2)\right) \frac{1}{2\pi\sigma^2}\, dx\, dy$$
$$= \frac{1}{\pi\sigma^2} \int \exp\left(-(q - \sqrt 2 x)^2 - \frac{x^2}{2\sigma^2}\right) \frac{dx}{\sqrt{2\pi}} \cdot \int \exp\left(-(p - \sqrt 2 y)^2 - \frac{y^2}{2\sigma^2}\right) \frac{dy}{\sqrt{2\pi}}$$
$$= \frac{1}{\pi(4\sigma^2 + 1)} \exp\left(-\frac{q^2 + p^2}{2(2\sigma^2 + 1/2)}\right). \qquad (6)$$
Therefore, the state $\Phi$ is identical to the thermal state $\Phi(r)$ with $2\sigma^2 + \frac{1}{2} = \frac{1}{2}\,\frac{1+r}{1-r}$, or equivalently $r = \frac{2\sigma^2}{2\sigma^2 + 1}$. □
This fact will be used later in Section 5, in applications to functional estimation and testing.
2.2.2. Fock spaces and second quantisation.
The above construction can be generalised to multimode systems by tensoring several one-mode systems. Thus, the Hilbert space of a $k$-mode system is $L^2(\mathbb R)^{\otimes k} \cong L^2(\mathbb R^k)$, upon which we define “canonical pairs” $(Q_i, P_i)$ acting on the $i$-th tensor as above, and as identity on the other tensors. Similarly we define the one-mode operators $a_i, a_i^*, N_i$. The number basis consists now of tensor products $|\mathbf n\rangle := \otimes_{i=1}^k |n_i\rangle$ indexed by the sequences of integers $\mathbf n = (n_1, \dots, n_k)$. A multimode coherent state is a tensor product of one-mode coherent states
$$|G(\mathbf z)\rangle = \bigotimes_{i=1}^k |G(z_i)\rangle = \exp\left(\mathbf z \mathbf a^\dagger - \mathbf a \mathbf z^\dagger\right)|\mathbf 0\rangle = \exp(-\|\mathbf z\|^2/2) \sum_{\mathbf n = \mathbf 0}^\infty \left(\prod_{i=1}^k \frac{z_i^{n_i}}{\sqrt{n_i!}}\right) |\mathbf n\rangle \in L^2(\mathbb R)^{\otimes k} \qquad (7)$$
where $\mathbf z = (z_1, \dots, z_k)$ is the vector of means, $\mathbf a = (a_1, \dots, a_k)$, and $\dagger$ denotes the transposition and adjoint (complex conjugation) of individual entries.
We will now extend this construction to systems with infinitely many modes. One way to do this is by defining an infinite tensor product of one-mode spaces, as the completion of the space spanned by tensors in which all but a finite number of modes are in the vacuum state. Instead, we will present an equivalent but more elegant construction called second quantisation, which will be useful for later considerations.
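As a numerical sanity check of the one-mode expansion (2), on which the multimode formula (7) is built, the sketch below computes the number-basis coefficients of coherent states in a truncated basis and verifies normalisation, the Poisson mean $|z|^2$, the eigenvalue relation for $a$, and the overlap $|\langle G(z)|G(w)\rangle|^2 = \exp(-|z - w|^2)$. The truncation level `dim`, the sample values of `z` and `w`, and the helper name `coherent_coefficients` are assumptions made for illustration.

```python
import math
import numpy as np

def coherent_coefficients(z, dim):
    """Coefficients exp(-|z|^2/2) z^k / sqrt(k!) of |G(z)> for k < dim."""
    ks = np.arange(dim)
    sqrt_factorials = np.array([math.sqrt(math.factorial(k)) for k in range(dim)])
    return np.exp(-abs(z) ** 2 / 2) * np.power(complex(z), ks) / sqrt_factorials

dim = 30                                   # truncation; adequate for small |z|
z, w = 0.8 + 0.6j, -0.3 + 0.5j
g_z = coherent_coefficients(z, dim)
g_w = coherent_coefficients(w, dim)

norm_sq = np.vdot(g_z, g_z).real                           # close to 1
mean_number = np.sum(np.arange(dim) * np.abs(g_z) ** 2)    # close to |z|^2
overlap_sq = abs(np.vdot(g_z, g_w)) ** 2                   # close to exp(-|z-w|^2)

# Eigenvalue relation a|G(z)> = z|G(z)> in the truncated basis.
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)
eigen_residual = np.max(np.abs(a @ g_z - z * g_z))         # close to 0

# Two-mode coherent state as a tensor product of one-mode states, cf. (7).
g_two_mode = np.kron(g_z, g_w)
two_mode_norm_sq = np.vdot(g_two_mode, g_two_mode).real    # close to 1
```

The truncation error is governed by the Poisson tail beyond level `dim`, so larger displacements require a larger `dim`.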
Definition 6.
Let $\mathcal K$ be a Hilbert space. The Fock space over $\mathcal K$ is the Hilbert space
$$\mathcal F(\mathcal K) = \bigoplus_{n \ge 0} \mathcal K^{\otimes_s n} \qquad (8)$$
where $\mathcal K^{\otimes_s n}$ denotes the $n$-fold symmetric tensor product, i.e. the subspace of $\mathcal K^{\otimes n}$ consisting of vectors which are symmetric under permutations of the tensors. The term $\mathcal K^{\otimes_s 0} =: \mathbb C|0\rangle$ is called the vacuum state.
In this definition the space $\mathcal K$ should be regarded as the “space of modes” rather than physical states. As we will see below, by fixing an orthonormal basis in $\mathcal K$, we can establish an isomorphism between the Fock space $\mathcal F(\mathcal K)$ and a tensor product of one-mode cv spaces, one for each basis vector. In particular, if $\mathcal K = \mathbb C$, then $\mathcal F(\mathbb C) \cong L^2(\mathbb R)$, so that the one-dimensional subspaces in the direct sum in (8) correspond to the number basis vectors $|0\rangle, |1\rangle, \dots \in L^2(\mathbb R)$ of a one-mode cv system.
We now introduce the general notion of coherent state on a Fock space.
Definition 7.
Let $\mathcal F(\mathcal K)$ be the Fock space over $\mathcal K$. For each $|v\rangle \in \mathcal K$ we define an associated coherent state
$$|G(v)\rangle := e^{-\|v\|^2/2} \bigoplus_{n \ge 0} \frac{1}{\sqrt{n!}}\, |v\rangle^{\otimes n} \in \mathcal F(\mathcal K).$$
The coherent vectors span a dense subspace of $\mathcal F(\mathcal K)$. This fact can be used to prove the following factorisation property, and to define the annihilation operators below. Let $\mathcal K = \mathcal K_1 \oplus \mathcal K_2$ be a direct sum decomposition of $\mathcal K$ into orthogonal subspaces, and let $|v\rangle = |v_1\rangle \oplus |v_2\rangle$ be the decomposition of a generic vector $|v\rangle \in \mathcal K$. Then the map
$$U : \mathcal F(\mathcal K) \to \mathcal F(\mathcal K_1) \otimes \mathcal F(\mathcal K_2), \qquad U : |G(v)\rangle \mapsto |G(v_1)\rangle \otimes |G(v_2)\rangle$$
is unitary. We will use this correspondence to identify $\mathcal F(\mathcal K)$ with the tensor product $\mathcal F(\mathcal K_1) \otimes \mathcal F(\mathcal K_2)$. By the same argument, for any orthonormal basis $\{|e_1\rangle, |e_2\rangle, \dots\}$ of $\mathcal K$, the Fock space $\mathcal F(\mathcal K)$ is isomorphic with the tensor product of one mode spaces $\mathcal F_i := \mathcal F(\mathbb C|e_i\rangle)$, and the coherent states factorise as
$$\mathcal F(\mathcal K) \cong \bigotimes_i \mathcal F_i, \qquad |G(u)\rangle \cong \bigotimes_i |G(u_i)\rangle, \quad u_i = \langle e_i|u\rangle, \qquad (9)$$
so that we recover the formula (7).
We define the annihilation operators through their action on coherent states as follows: for each mode $|u\rangle \in \mathcal K$ the associated annihilator $a(u) : \mathcal F(\mathcal K) \to \mathcal F(\mathcal K)$ is given by
$$a(u) : |G(v)\rangle \mapsto \langle u|v\rangle\, |G(v)\rangle, \qquad |v\rangle \in \mathcal K.$$
Then the annihilation operators and their adjoints, the creation operators, satisfy the commutation relations
$$a(u)\,a^*(w) - a^*(w)\,a(u) = \langle u|w\rangle\, \mathbf 1.$$
For each mode we can also define the canonical operators $Q(u), P(u)$ and the number operator $N(u)$ in terms of $a(u), a^*(u)$ as in the one-mode case. Moreover, if $|u\rangle = |u_1\rangle \oplus |u_2\rangle$ is the decomposition of $|u\rangle$ as above, then $a(u_1)$ acts as $a(u_1) \otimes \mathbf 1_{\mathcal F(\mathcal K_2)}$ when the Fock space is represented in the tensor product form. Similar decompositions hold for $a^*(u_1), N(u_1), a(u_2), a^*(u_2), N(u_2)$.
The second quantisation has the following functorial properties which will be used later on.
Definition 8.
Let $W : \mathcal K \to \mathcal K$ be a unitary operator. The quantisation operator $\Gamma(W)$ is the unitary $\Gamma(W) : \mathcal F(\mathcal K) \to \mathcal F(\mathcal K)$ defined by
$$\Gamma(W) := \bigoplus_{n \ge 0} W^{\otimes n} \qquad (10)$$
where $W^{\otimes n}$ acts on the $n$-th level of the Fock space $\mathcal K^{\otimes_s n}$.
From the definition it follows that the action of $\Gamma(W)$ on coherent states is covariant in the sense that
$$\Gamma(W) : \mathcal F(\mathcal K) \to \mathcal F(\mathcal K), \qquad \Gamma(W) : |G(v)\rangle \mapsto |G(Wv)\rangle.$$
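The covariance property can be checked numerically in the simplest case $\mathcal K = \mathbb C$, where a unitary $W$ is a phase $e^{i\phi}$ and, by (10), $\Gamma(W)$ multiplies the $n$-th level by $e^{in\phi}$. The following sketch works in a truncated number basis; the truncation level `dim`, the sample values of `z` and `phi`, and the helper names are illustrative assumptions.

```python
import math
import numpy as np

def coherent_coefficients(z, dim):
    """Coefficients exp(-|z|^2/2) z^k / sqrt(k!) of |G(z)> for k < dim."""
    ks = np.arange(dim)
    sqrt_factorials = np.array([math.sqrt(math.factorial(k)) for k in range(dim)])
    return np.exp(-abs(z) ** 2 / 2) * np.power(complex(z), ks) / sqrt_factorials

def gamma_phase(phi, dim):
    """Gamma(e^{i phi}) on levels n < dim: multiplication by e^{i n phi}."""
    return np.diag(np.exp(1j * phi * np.arange(dim)))

dim, z, phi = 30, 0.7 - 0.4j, 1.1
rotated = gamma_phase(phi, dim) @ coherent_coefficients(z, dim)
expected = coherent_coefficients(np.exp(1j * phi) * z, dim)

# Largest coefficient-wise difference between the two sides; close to 0.
max_deviation = np.max(np.abs(rotated - expected))

# The rotated coherent state is not a multiple of the original one:
# the overlap modulus is strictly smaller than 1.
overlap = abs(np.vdot(coherent_coefficients(z, dim), rotated))
```

The second computation shows that a phase change of the mode vector produces a genuinely different coherent state, since the overlap equals $\exp(-|z|^2 |1 - e^{i\phi}|^2 / 2) < 1$.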
In particular, it follows from the definitions that $\Gamma(e^{i\phi}) = \exp(i\phi N)$, where $N$ is the total number operator, whose action on the $n$-th level of the Fock space is $N|v\rangle^{\otimes n} = n|v\rangle^{\otimes n}$. Note that while $|v\rangle$ and $e^{i\phi}|v\rangle$ differ only by a phase and hence represent the same state, the corresponding coherent states $|G(v)\rangle$ and $\Gamma(e^{i\phi})|G(v)\rangle = |G(e^{i\phi}v)\rangle$ are linearly independent and represent different states.
As in the single mode case, the coherent states can be obtained by acting with the unitary displacement (or Weyl) operators onto the vacuum:
$$|G(u)\rangle = \exp(a^*(u) - a(u))\,|0\rangle.$$
Moreover, the coherent states $|G(u)\rangle$ are Gaussian with respect to all coordinates. The means of annihilation operators are given by $\langle G(u)|a(w)|G(u)\rangle = \langle w|u\rangle$, from which we can deduce that the coordinates $(Q(w), P(w))$ have means $(\sqrt 2\,\mathrm{Re}\langle w|u\rangle, \sqrt 2\,\mathrm{Im}\langle w|u\rangle)$. The covariance of coherent states is constant (independent of the displacement $u$), and is given by $\langle 0|a(w)a^*(v)|0\rangle = \langle w|v\rangle$. This implies that orthogonal modes (i.e. $\langle w|v\rangle = 0$) have independent pairs of coordinates.
2.3. Metrics on the space of states.
For future reference we review here the metrics on the state space used in the paper. Recall that the space of states $\mathcal{S}(\mathcal{H})$ on a Hilbert space $\mathcal{H}$ is the convex set of positive, trace-one operators in $\mathcal{T}_1(\mathcal{H})$. The norm-one (or trace-norm) distance between two states $\rho_0, \rho_1 \in \mathcal{S}(\mathcal{H})$ is given by
$$\|\rho_0 - \rho_1\|_1 := \mathrm{Tr}(|\rho_0 - \rho_1|),$$
where $|\tau| := \sqrt{\tau^* \tau}$ denotes the absolute value of $\tau$. The norm-one distance can be interpreted as the maximum difference between expectations of bounded observables:
$$\|\rho_0 - \rho_1\|_1 = \sup_{A : \|A\| \leq 1} |\mathrm{Tr}(\rho_0 A) - \mathrm{Tr}(\rho_1 A)|.$$
Another interpretation is in terms of quantum testing. Let $M = (M_0, M_1)$ be a binary POVM used to test between the hypotheses $H_0 := \{\text{measured state is } \rho_0\}$ and $H_1 := \{\text{measured state is } \rho_1\}$. The sum of error probabilities is
$$P_e^M = \mathrm{Tr}(M_1 \rho_0) + \mathrm{Tr}(M_0 \rho_1).$$
By optimising over all possible POVMs we obtain [37] the optimal error probability sum
$$P_e^* := \inf_M P_e^M = 1 - \frac{1}{2}\|\rho_0 - \rho_1\|_1. \tag{11}$$
In the special case of pure states, the norm-one distance is given by
$$\big\| |\psi_0\rangle\langle\psi_0| - |\psi_1\rangle\langle\psi_1| \big\|_1 = 2\sqrt{1 - |\langle \psi_0 | \psi_1 \rangle|^2}, \tag{12}$$
as proven e.g. in [44]. For coherent states the previous formula becomes
$$\big\| |G(\psi_0)\rangle\langle G(\psi_0)| - |G(\psi_1)\rangle\langle G(\psi_1)| \big\|_1 = 2\sqrt{1 - \exp(-\|\psi_0 - \psi_1\|^2)}.$$
The second important metric is the
Bures distance whose square is given by
$$d_b^2(\rho_0, \rho_1) := 2\left(1 - \mathrm{Tr}\left(\sqrt{\sqrt{\rho_0}\,\rho_1\,\sqrt{\rho_0}}\right)\right)$$
and is a quantum extension of the Hellinger distance. In the case of pure states the Bures distance becomes
$$d_b^2\big(|\psi_0\rangle\langle\psi_0|, |\psi_1\rangle\langle\psi_1|\big) = 2\big(1 - |\langle \psi_0 | \psi_1 \rangle|\big), \tag{13}$$
so for coherent states it is given by
$$d_b^2\big(|G(\psi_0)\rangle\langle G(\psi_0)|, |G(\psi_1)\rangle\langle G(\psi_1)|\big) = 2\left(1 - \exp\left(-\tfrac{1}{2}\|\psi_0 - \psi_1\|^2\right)\right).$$
Similarly to the classical case, the following inequality holds for arbitrary states [23]:
$$d_b^2(\rho_0, \rho_1) \leq \|\rho_0 - \rho_1\|_1 \leq 2\, d_b(\rho_0, \rho_1). \tag{14}$$
Moreover, since $|\langle \psi_0 | \psi_1 \rangle|^2 \leq |\langle \psi_0 | \psi_1 \rangle|$, the additional inequality holds for pure states:
$$\|\rho_0 - \rho_1\|_1 \geq \sqrt{2}\, d_b(\rho_0, \rho_1). \tag{15}$$
This means that for pure states, the trace and Bures distances are equivalent (up to constants).

Finally, we will be using the fact that both the norm-one and the Bures distance are contractive under quantum channels $T : \mathcal{T}_1(\mathcal{H}) \to \mathcal{T}_1(\mathcal{H}')$, i.e.
$$\|T(\rho_0) - T(\rho_1)\|_1 \leq \|\rho_0 - \rho_1\|_1, \qquad d_b(T(\rho_0), T(\rho_1)) \leq d_b(\rho_0, \rho_1).$$

3. Classical and quantum statistical models
In this section we review key elements of quantum statistics, and introduce the quantum statistical models which will be analysed later on. For comparison, we review certain asymptotic equivalence results for related classical statistical models.

3.1. Classical models.
Here we review several asymptotic normality results for classical models which are analogous to the quantum models investigated in the paper.

A classical statistical model is defined as a family of probability distributions $\mathcal{Q} = \{\mathbb{P}_f : f \in \mathcal{W}\}$ on a measurable space $(\mathcal{X}, \mathcal{A})$, indexed by an unknown, possibly infinite dimensional parameter $f$ to be estimated, which belongs to a parameter space $\mathcal{W}$. In the asymptotic framework considered here we assume that we are given a (large) number $n$ of independent, identically distributed samples $X_1, \dots, X_n$ from $\mathbb{P}_f$, from which we would like to estimate $f$. If $d : \mathcal{W} \times \mathcal{W} \to \mathbb{R}_+$ is a chosen loss function, then the risk of an estimator $\hat{f}_n = \hat{f}_n(X_1, \dots, X_n)$ is
$$R(\hat{f}_n, f) = \mathbb{E}_f\big[d(\hat{f}_n, f)\big].$$
In nonparametric statistics, the parameter $f$ of the model is often a function that belongs to a smoothness class. We consider two classes $\mathcal{W}$: the periodic Sobolev class $S^\alpha(L)$ of functions on $[0,1]$ with smoothness $\alpha > 1/2$, and the Hölder class $\Lambda^\alpha(L)$ with smoothness $\alpha > 0$. For any $f \in L^2[0,1]$, let $\{f_j, j \in \mathbb{Z}\}$ be the set of Fourier coefficients with respect to the standard trigonometric basis. The classes are defined as
$$S^\alpha(L) := \Big\{ f : [0,1] \to \mathbb{R} \,:\, \sum_{j \in \mathbb{Z}} |f_j|^2 |j|^{2\alpha} \leq L \Big\}$$
and
$$\Lambda^\alpha(L) := \big\{ f : [0,1] \to \mathbb{R} \,:\, |f(x) - f(y)| \leq L |x - y|^\alpha, \ x, y \in [0,1] \big\}.$$
In addition, when densities $f$ are considered, we will assume that $\mathcal{W}$ includes an additional restriction to a class
$$\mathcal{D}_\varepsilon = \Big\{ f : [0,1] \to [\varepsilon, \infty) \,:\, \int_{[0,1]} f(x)\,dx = 1 \Big\}$$
for some $\varepsilon > 0$.

Density model.
The classical density model consists of $n$ observations $X_1, \dots, X_n$ which are independent, identically distributed (i.i.d.) with common probability density $f$:
$$\mathcal{P}_n = \big\{ \mathbb{P}_f^{\otimes n} : f \in \mathcal{W} \big\}.$$

Gaussian regression model with fixed equidistant design.
In this model, we observe $Y_1, \dots, Y_n$ such that
$$Y_i = f^{1/2}\left(\frac{i}{n}\right) + \xi_i, \qquad i = 1, \dots, n,$$
where the errors $\xi_1, \dots, \xi_n$ are i.i.d. standard Gaussian variables. Denote the Gaussian regression model by
$$\mathcal{R}_n = \Big\{ \bigotimes_{i=1}^n N\Big( f^{1/2}\Big(\frac{i}{n}\Big), 1 \Big) : f \in \mathcal{W} \Big\}.$$

Gaussian white noise model.
In this model the square-root density $f^{1/2}$ is observed with Gaussian white noise of variance $n^{-1}$, i.e.
$$dY_t = f^{1/2}(t)\,dt + \frac{1}{\sqrt{n}}\,dW_t, \qquad t \in [0,1]. \tag{16}$$
If we denote by $\mathbb{Q}_f$ the probability distribution of $\{Y(t) : t \in [0,1]\}$, the corresponding model is
$$\mathcal{F}_n := \{ \mathbb{Q}_f : f \in \mathcal{W} \}.$$

Gaussian sequence model.
In this model we observe a sequence of Gaussian random variables with means equal to the coefficients of $f^{1/2}$ in some orthonormal basis of $L^2[0,1]$, for $f \in \mathcal{W}$:
$$y_j = \theta_j(f^{1/2}) + \frac{1}{\sqrt{n}}\,\xi_j, \qquad j = 1, 2, \dots \tag{17}$$
where $\{\xi_j\}_{j \geq 1}$ are i.i.d. standard Gaussian random variables. We denote this model
$$\mathcal{N}_n = \Big\{ \bigotimes_{j \geq 1} N\Big( \theta_j\big(f^{1/2}\big), \frac{1}{n} \Big) : f \in \mathcal{W} \Big\}.$$
In [54] it was shown that the sequences of models $\mathcal{P}_n$ and $\mathcal{F}_n$ are asymptotically equivalent, in the sense that their Le Cam distance converges to zero as $n \to \infty$, when $\mathcal{W} = \Lambda^\alpha(L) \cap \mathcal{D}_\varepsilon$ with $\alpha > 1/2$; in [12], a similar result was established for $\mathcal{R}_n$ and $\mathcal{F}_n$ (more precisely, with $f^{1/2}$ any real valued function $f^{1/2} \in \Lambda^\alpha(L)$). Later, [60] showed that the models $\mathcal{F}_n$ and $\mathcal{N}_n$ are asymptotically equivalent over periodic Sobolev classes $f^{1/2} \in S^\alpha(L)$ with smoothness $\alpha > 1/2$. Among many other results, [27] considered generalized linear models, [11] regression models with random design, [59] multivariate and random design, and [26] compared the stationary Gaussian process with the Gaussian white noise model $\mathcal{F}_n$.

In all classical results, the underlying nonparametric function was assumed to belong to a smoothness class in order to establish asymptotic equivalence of models. In the quantum setup of pure states and Gaussian states that we discuss later on, no such smoothness assumption is needed.

3.2. Quantum models, randomisations and convergence.
In this subsection we introduce the basic notions of a theory of quantum statistical models which is currently still in its early stages, cf. [33, 25] for more details. We will focus on the notions of quantum-to-classical randomisation carried out through measurements, and quantum-to-quantum randomisations implemented by quantum channels, which allow us to define the equivalence and the Le Cam distance between models.

In analogy to the classical case, we make the following definition.
Definition 9.
A quantum statistical model over a parameter space $\Theta$ consists of a family of quantum states $\mathcal{Q} = \{\rho_\theta : \theta \in \Theta\}$ on a Hilbert space $\mathcal{H}$, indexed by an unknown parameter $\theta \in \Theta$.

A simple example is a family of pure states $\{\rho_\theta = |\psi_\theta\rangle\langle\psi_\theta| : \theta \in \mathbb{R}\}$ with $|\psi_\theta\rangle := \exp(i\theta H)|\psi\rangle$, where $H$ is a selfadjoint operator generating the one-dimensional family of unitaries $\exp(i\theta H)$, and $|\psi\rangle \in \mathcal{H}$ is a fixed vector. Physically, the parameter $\theta$ could be for instance time, a phase, or an external magnetic field. Another example is that of a completely unknown state of a finite dimensional system, which can be parametrised in terms of its density matrix elements, or the eigenvalues and eigenvectors. In order to increase the estimation precision one typically prepares a number $n$ of identical and independent copies of the state $\rho_\theta$, in which case the corresponding model is $\mathcal{Q}_n := \{\rho_\theta^{\otimes n} : \theta \in \Theta\}$. Our work deals with non-parametric quantum statistical models for which the underlying Hilbert space is infinite dimensional, as we will detail below.

In order to obtain information about the parameter $\theta$, we need to perform measurements on the system prepared in $\rho_\theta$. Using the random measurement data, we then employ statistical methods to solve specific decision problems. For instance, the task of estimating an unknown quantum state (also known as quantum tomography) is a key component of quantum engineering experiments [34]. In particular, the estimation of large dimensional states has received significant attention in the context of compressed sensing [29, 22], and estimation of low rank states [14]. Suppose that we perform a measurement $M$ on the system in state $\rho_\theta$, and obtain a random outcome $O \in \Omega$ with distribution $\mathbb{P}_\theta^M(E) := \mathrm{Tr}(\rho_\theta M(E))$, cf. section 2. The measurement data is therefore described by the classical model $\mathcal{P}^M := \{\mathbb{P}_\theta^M : \theta \in \Theta\}$, and the estimation problem can be treated using “classical” statistical methods.
The measurement map
$$M : \mathcal{T}_1(\mathcal{H}) \to L^1(\Omega, \Sigma, \mathbb{P}), \qquad M : \rho_\theta \mapsto p_\theta := \frac{d\mathbb{P}_\theta}{d\mathbb{P}}$$
can be seen as a randomisation from a quantum to a classical model, which intuitively means that $\mathcal{Q}$ is more informative than $\mathcal{P}^M$ for any measurement $M$. Here $\mathbb{P}$ can be chosen to be the distribution corresponding to an arbitrary full rank (strictly positive) state $\rho$, which ensures the existence of all probability densities $p_\theta$. One of the distinguishing features of quantum statistics is the possibility to choose appropriate measurements for specific statistical problems (e.g. estimation, testing) and the fact that optimal measurements for different problems may be incompatible with each other. In the applications section we will discuss specific instances of this phenomenon.
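To make the loss of information under a measurement concrete, the following minimal sketch considers a hypothetical real qubit family $|\psi_\theta\rangle = (\cos\theta, \sin\theta)$ (an assumption made for illustration, not a model from the paper) and compares the $L^1$ distance between outcome distributions with the trace-norm distance (12) between the states. The computational-basis measurement cannot distinguish $\theta$ from $-\theta$, while a measurement in the rotated (diagonal) basis attains the trace-norm distance for this pair.

```python
import math

def state(theta):
    """Real qubit vector (cos theta, sin theta): a hypothetical one-parameter pure-state family."""
    return (math.cos(theta), math.sin(theta))

def outcome_probs(psi, basis):
    """Born rule p(k) = |<b_k|psi>|^2 for a projective measurement in a real orthonormal basis."""
    return [sum(b_i * p_i for b_i, p_i in zip(b, psi)) ** 2 for b in basis]

def l1_distance(p, q):
    """L1 distance between two outcome distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def trace_norm_pure(psi, phi):
    """Pure-state trace-norm distance 2 * sqrt(1 - |<psi|phi>|^2) for real unit vectors."""
    overlap = sum(a * b for a, b in zip(psi, phi))
    return 2 * math.sqrt(max(0.0, 1 - overlap ** 2))

computational = [(1.0, 0.0), (0.0, 1.0)]
s = 1 / math.sqrt(2)
diagonal = [(s, s), (s, -s)]

psi0, psi1 = state(-0.2), state(0.2)
quantum = trace_norm_pure(psi0, psi1)
classical_comp = l1_distance(outcome_probs(psi0, computational),
                             outcome_probs(psi1, computational))
classical_diag = l1_distance(outcome_probs(psi0, diagonal),
                             outcome_probs(psi1, diagonal))
```

Neither classical distance can exceed the quantum one, which is the sense in which the quantum model is at least as informative as any classical model obtained from it by measurement.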
Besides measurements, the quantum model $\mathcal{Q}$ can be transformed into another quantum model $\mathcal{Q}' := \{\rho'_\theta : \theta \in \Theta\}$ on a Hilbert space $\mathcal{H}'$ by means of a quantum randomisation, i.e. by applying a quantum channel
$$T : \mathcal{T}_1(\mathcal{H}) \to \mathcal{T}_1(\mathcal{H}'), \qquad T : \rho_\theta \mapsto \rho'_\theta.$$
The model $\mathcal{Q}'$ is less informative than $\mathcal{Q}$ in the sense that for any measurement $M'$ on $\mathcal{H}'$ one can construct the measurement $M := M' \circ T$ on $\mathcal{H}$ such that $\mathbb{P}_\theta^{M'} = \mathbb{P}_\theta^{M}$ for all $\theta$. If there exists another channel $S$ such that $S(\rho'_\theta) = \rho_\theta$ for all $\theta$, we say (in analogy to the classical case) that the models $\mathcal{Q}$ and $\mathcal{Q}'$ are equivalent; in particular, for any statistical decision problem, a procedure for one model can be matched with a procedure with the same risk for the other model. A closely related concept is that of quantum sufficiency, whose theory was developed in [58]. More generally, we define the Le Cam distance in analogy to the classical case [48].

Definition 10.
Let $\mathcal{Q}$ and $\mathcal{Q}'$ be two quantum models over $\Theta$. The deficiency between $\mathcal{Q}$ and $\mathcal{Q}'$ is defined by
$$\delta(\mathcal{Q}, \mathcal{Q}') := \inf_T \sup_{\theta \in \Theta} \|T(\rho_\theta) - \rho'_\theta\|_1,$$
where the infimum is taken over all channels $T$. The Le Cam distance between $\mathcal{Q}$ and $\mathcal{Q}'$ is defined as
$$\Delta(\mathcal{Q}, \mathcal{Q}') := \max\big(\delta(\mathcal{Q}, \mathcal{Q}'), \delta(\mathcal{Q}', \mathcal{Q})\big). \tag{18}$$
Its interpretation is that models which are “close” in the Le Cam distance have similar statistical properties. In practice, this metric is often used to approximate a sequence of models by another sequence of simpler models, providing a method to establish asymptotic minimax risks. In particular, the approximation of i.i.d. quantum statistical models by quantum Gaussian ones has been investigated in [31, 33, 43], in the case of finite dimensional systems with arbitrary mixed states. Our goal is to extend these results to non-parametric models consisting of pure states on infinite dimensional Hilbert spaces. The following lemma will be used later on.

Lemma 3.1.
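Let $\mathcal{Q}$, $\mathcal{Q}'$ be two quantum models as defined above. Let $\rho_i = \sum_j \mu_{i,j}\, \rho_{\theta_{i,j}}$ be two arbitrary mixtures ($i = 1, 2$) of states in $\mathcal{Q}$ and let $\rho'_i = \sum_j \mu_{i,j}\, \rho'_{\theta_{i,j}}$ be their counterparts in $\mathcal{Q}'$. Then
$$\|\rho'_1 - \rho'_2\|_1 - 2\Delta(\mathcal{Q}, \mathcal{Q}') \leq \|\rho_1 - \rho_2\|_1 \leq \|\rho'_1 - \rho'_2\|_1 + 2\Delta(\mathcal{Q}, \mathcal{Q}').$$

Proof.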
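Fix $\epsilon > 0$ and let $S$ be a channel such that $\sup_\theta \|S(\rho'_\theta) - \rho_\theta\|_1 \leq \delta(\mathcal{Q}', \mathcal{Q}) + \epsilon$. By linearity of $S$ and convexity of the norm, $\|\rho_i - S(\rho'_i)\|_1 \leq \Delta(\mathcal{Q}, \mathcal{Q}') + \epsilon$ for $i = 1, 2$. Since quantum channels are contractive with respect to the norm-one distance, $\|S(\rho'_1) - S(\rho'_2)\|_1 \leq \|\rho'_1 - \rho'_2\|_1$, and by the triangle inequality we get
$$\|\rho_1 - \rho_2\|_1 \leq \|\rho_1 - S(\rho'_1)\|_1 + \|S(\rho'_1) - S(\rho'_2)\|_1 + \|S(\rho'_2) - \rho_2\|_1 \leq 2\Delta(\mathcal{Q}, \mathcal{Q}') + 2\epsilon + \|\rho'_1 - \rho'_2\|_1.$$
Letting $\epsilon \to 0$ gives the right-hand inequality. The second inequality can be shown in a similar way. $\square$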
3.3. The i.i.d. and the quantum white noise models.
We now introduce the non-parametric quantum models investigated in the paper, and discuss the relationship with the classical models described in section 3.1.

Let $\mathcal{H}$ be an infinite dimensional Hilbert space and let $\mathcal{B} := \{|e_0\rangle, |e_1\rangle, \dots\}$ be a fixed orthonormal basis in $\mathcal{H}$. The Fourier decomposition of an arbitrary vector is written as $|\psi\rangle = \sum_{j=0}^\infty \psi_j |e_j\rangle$. Since most of the models will consist of pure states, we will sometimes define them in terms of the Hilbert space vectors rather than the density matrices, but keep in mind that the vectors are uniquely defined only up to a complex phase.

Let us consider the general problem of estimating an unknown pure quantum state in $\mathcal{H}$. For finite dimensional systems, the risk with respect to typical rotation invariant loss functions scales linearly with the number of parameters [24], hence with the dimension of the space. Therefore, since $\mathcal{H}$ is infinite dimensional, it is not possible to develop a meaningful estimation theory without any prior information about the state. Motivated by physical principles and statistical methodology we introduce the following Hermite-Sobolev classes [8, 9] of pure states, characterised by an appropriate decay of the coefficients with respect to the basis $\mathcal{B}$:
$$S^\alpha(L) := \Big\{ |\psi\rangle\langle\psi| \,:\, \sum_{j=0}^\infty |\psi_j|^2 j^{2\alpha} \leq L, \ \text{and } \|\psi\| = 1 \Big\}, \qquad \alpha > 0, \ L > 0. \tag{19}$$
To gain some intuition about the meaning of this class, let us assume that $\mathcal{B}$ is the Fock basis of a one-mode continuous variables (cv) system. Then the constraint translates into the moment condition $\langle \psi | N^{2\alpha} | \psi \rangle \leq L$ for the number operator; this is a mild assumption considering that all experimentally feasible states have finite moments to all orders. Even more, the coefficients of typical states such as coherent, squeezed, and Fock states decay exponentially with the photon number.

Our first model describes $n$ identical copies of a pure state belonging to the Sobolev class:
$$\mathcal{Q}_n := \big\{ |\psi\rangle\langle\psi|^{\otimes n} \,:\, |\psi\rangle\langle\psi| \in S^\alpha(L) \big\}.$$
(20)

In section 5.1 we show that the minimax rate of $\mathcal{Q}_n$ for loss functions given by the squared norm-one and Bures distances is $n^{-2\alpha/(2\alpha+1)}$. This is identical to the minimax rate of the classical i.i.d. model described in section 3.1.

We now introduce the corresponding quantum Gaussian model. Let $\mathcal{F} := \mathcal{F}(\mathcal{H})$ be the Fock space over $\mathcal{H}$, and let $|G(\sqrt{n}\psi)\rangle \in \mathcal{F}$ be the coherent state with “displacement” vector $\sqrt{n}\psi$. As discussed in section 2.2.2, the vector $\sqrt{n}\psi$ should be seen now as the expectation of the infinite dimensional Gaussian state rather than a quantum state in itself, for which reason we have omitted the ket notation. We define the coherent states model
$$\mathcal{G}_n = \big\{ |G(\sqrt{n}\psi)\rangle\langle G(\sqrt{n}\psi)| \,:\, |\psi\rangle \in \mathcal{H} \text{ such that } |\psi\rangle\langle\psi| \in S^\alpha(L) \big\}. \tag{21}$$
Using the factorisation property (9) with respect to the orthonormal basis $\mathcal{B}$, we see that the model is equivalent to the product of independent one-mode coherent Gaussian states of mean $\sqrt{n}\psi_i$,
$$|G(\sqrt{n}\psi)\rangle \cong \bigotimes_{i=0}^\infty |G(\sqrt{n}\psi_i)\rangle,$$
which is analogous to the classical Gaussian sequence model $\mathcal{N}_n$ defined in equation (17).

Similarly, we can draw an analogy with the white noise model $\mathcal{F}_n$ by realising $\mathcal{H}$ as $L^2([0,1])$. Let us define the quantum stochastic process [57] on $\mathcal{F}(L^2([0,1]))$
$$B(t) := a\big(\chi_{[0,t]}\big) + a^*\big(\chi_{[0,t]}\big)$$
and note that $[B(t), B(s)] = 0$ for all $t, s \in [0,1]$, so that $\{B(t) : t \in [0,1]\}$ is a commutative family of operators. This implies that $\{B(t) : t \in [0,1]\}$ have a joint probability distribution which is uniquely determined by the quantum state, and can be regarded as a classical stochastic process. If the state is the vacuum $|0\rangle$, the process is Gaussian and has the same distribution as the Brownian motion. Consider now the process
$$X(t) := W(\sqrt{n}\psi)^*\, B(t)\, W(\sqrt{n}\psi),$$
which is obtained by applying a unitary Weyl transformation to $B(t)$.
In physics terms we work here in the “Heisenberg picture”, where the transformation acts on operators while the state is fixed. Using quantum stochastic calculus one can derive the following differential equation for $X(t)/\sqrt{n}$:
$$\frac{1}{\sqrt{n}}\,dX(t) = \psi(t)\,dt + \frac{1}{\sqrt{n}}\,dB(t).$$
Therefore, $X(t)/\sqrt{n}$ is similar to the process (16), with the exception that it has a complex rather than real valued drift function. Note that in this correspondence $\psi(t)$ plays the role of $f^{1/2}$, which agrees with the intuitive interpretation of the wave function as square root of the state $|\psi\rangle\langle\psi|$. Alternatively, one can use the Schrödinger picture, where the state is $|\sqrt{n}\psi\rangle = W(\sqrt{n}\psi)|0\rangle$, such that the process $B(t)$ has the same law as $X(t)$ under the vacuum state.

In section 5.1 we show that the minimax rate of $\mathcal{G}_n$ for loss functions based on the squared norm-one and Bures distances is $n^{-2\alpha/(2\alpha+1)}$. Although the rate is identical to that of the corresponding classical model, the result does not follow from the classical case but relies on an explicit measurement strategy for the upper bounds, and on the quantum local asymptotic equivalence Theorem 4.1 for the lower bound. Furthermore, the minimax rates for the estimation of certain quadratic functionals are established in section 5.2, and the minimax testing rates are derived in section 5.3. While the former are similar to the classical ones, the quantum testing rates are parametric as opposed to non-parametric in the classical case. This reflects the fact that in the quantum case, the optimal measurements for different statistical problems are in general incompatible with each other, and in some cases they differ significantly from what is expected on a classical basis.

4. Local asymptotic equivalence for quantum models
In this section we prove that the sequence (20) of non-parametric pure states models is locally asymptotically equivalent (LAE) with the sequence (21) of quantum Gaussian models, in the sense of the Le Cam distance. This is one of the main results of the paper and will be subsequently used in the applications. Throughout the section $|\psi_0\rangle$ is a fixed but arbitrary state in an infinite dimensional Hilbert space $\mathcal{H}$. We let $\mathcal{H}_0 := \{|\psi\rangle \in \mathcal{H} : \langle \psi_0 | \psi \rangle = 0\}$ denote the orthogonal complement of $\mathbb{C}|\psi_0\rangle$. Any vector state $|\psi\rangle \in \mathcal{H}$ decomposes uniquely as
$$|\psi\rangle = |\psi_u\rangle := \sqrt{1 - \|u\|^2}\,|\psi_0\rangle + |u\rangle, \qquad |u\rangle \in \mathcal{H}_0, \tag{22}$$
where the phase has been chosen such that the overlap $\langle \psi | \psi_0 \rangle$ is real and positive. Therefore, the pure states are uniquely parametrised by vectors $|u\rangle \in \mathcal{H}_0$.
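The parametrisation (22) can be illustrated in finite dimensions. The sketch below, written for a hypothetical three-dimensional example with illustrative helper names, fixes the phase of a unit vector so that its overlap with the reference vector is real and positive, extracts the local parameter $u$, and rebuilds the state from $u$ alone. It assumes the overlap with the reference vector is nonzero.

```python
import cmath
import math

def inner(a, b):
    """Inner product <a|b>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def local_parameter(psi0, psi):
    """Return u orthogonal to psi0 such that psi equals sqrt(1-||u||^2) psi0 + u up to a phase.

    The phase of psi is first rotated so that <psi0|psi> is real and positive.
    Assumes <psi0|psi> is nonzero.
    """
    c = inner(psi0, psi)
    phase = c / abs(c)
    psi_fixed = [x / phase for x in psi]  # now <psi0|psi_fixed> = |c| > 0
    return [x - abs(c) * y for x, y in zip(psi_fixed, psi0)]

def psi_u(psi0, u):
    """Rebuild sqrt(1 - ||u||^2) psi0 + u from the local parameter u."""
    norm_u_sq = inner(u, u).real
    return [math.sqrt(1 - norm_u_sq) * y + x for x, y in zip(u, psi0)]

psi0 = [1.0, 0.0, 0.0]                       # reference vector (illustrative)
raw = [0.9, 0.3j, -0.1]
norm = math.sqrt(inner(raw, raw).real)
psi = [cmath.exp(0.8j) * x / norm for x in raw]  # unit vector with an arbitrary phase

u = local_parameter(psi0, psi)
rebuilt = psi_u(psi0, u)
overlap = abs(inner(rebuilt, psi))       # equals 1 when both vectors give the same pure state
orthogonality = abs(inner(psi0, u))      # should vanish, since u lies in the complement of psi0
```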
Further to the i.i.d. and Gaussian models $\mathcal{Q}_n$ and $\mathcal{G}_n$ defined in (20) and (21) respectively, we now introduce their local counterparts, which are parametrised by the local parameter $|u\rangle$ rather than by $|\psi\rangle$. Let $\gamma_n$ be a sequence such that $\gamma_n = o(n^{-1/4})$, and define the pure state models
$$\mathcal{Q}_n(\psi_0, \gamma_n) := \big\{ |\psi_u^{\otimes n}\rangle \in \mathcal{H}^{\otimes n} \,:\, |u\rangle \in \mathcal{H}_0, \ \|u\| \leq \gamma_n \big\}, \tag{23}$$
$$\mathcal{G}_n(\psi_0, \gamma_n) := \big\{ |G(\sqrt{n}u)\rangle \in \mathcal{F}(\mathcal{H}_0) \,:\, |u\rangle \in \mathcal{H}_0, \ \|u\| \leq \gamma_n \big\}. \tag{24}$$
The LAE Theorem below shows that these local models are asymptotically equivalent. An interesting fact is that LAE holds without imposing global restrictions such as those defined by the Sobolev classes; rather, it suffices that the local balls shrink at a rate $\gamma_n$ which is faster than $n^{-1/4}$. This contrasts with the classical case where both types of conditions are needed, as explained in section 3.1. However, since the state cannot be “localised” without any prior knowledge, in applications we need to make additional assumptions which allow us to work in a small neighbourhood and make use of local asymptotic equivalence. In particular, the convergence holds for the restricted models where the Sobolev condition is imposed on top of the local one. This will be used in establishing the estimation, testing, and functional estimation results.

Theorem 4.1.
Let $\mathcal{Q}_n(\psi_0, \gamma_n)$ and $\mathcal{G}_n(\psi_0, \gamma_n)$ be the models defined in (23) and (24) respectively. Then the following convergence holds uniformly over states $|\psi_0\rangle$:
$$\limsup_{n \to \infty} \sup_{|\psi_0\rangle \in \mathcal{H}} \Delta\big(\mathcal{Q}_n(\psi_0, \gamma_n), \mathcal{G}_n(\psi_0, \gamma_n)\big) = 0, \tag{25}$$
where $\Delta(\cdot, \cdot)$ is the quantum Le Cam distance defined in equation (18).

Proof. The direct map channel $T_n$ is defined as an isometric embedding
$$T_n : \mathcal{T}_1(\mathcal{H}^{\otimes_s n}) \to \mathcal{T}_1(\mathcal{F}(\mathcal{H}_0)), \qquad \rho \mapsto V_n \rho V_n^*,$$
where $V_n : \mathcal{H}^{\otimes_s n} \to \mathcal{F}(\mathcal{H}_0)$ is an isometry defined below. Since we deal with pure states, it suffices to prove that
$$\limsup_{n \to \infty} \sup_{|\psi_0\rangle \in \mathcal{H}} \sup_{\|u\| \leq \gamma_n} \big\| V_n \psi_u^{\otimes n} - G(\sqrt{n}u) \big\| = 0.$$
We now define the isometric embedding $V_n$ by showing its explicit action on the vectors of an orthonormal basis (ONB). For any permutation $\sigma \in S_n$, let
$$U_\sigma : |u_1\rangle \otimes \cdots \otimes |u_n\rangle \mapsto |u_{\sigma^{-1}(1)}\rangle \otimes \cdots \otimes |u_{\sigma^{-1}(n)}\rangle$$
be the unitary action on $\mathcal{H}^{\otimes n}$ by tensor permutations. Then $P_s := \frac{1}{n!} \sum_{\sigma \in S_n} U_\sigma$ is the orthogonal projector onto the subspace of symmetric tensors $\mathcal{H}^{\otimes_s n}$. We construct an orthonormal basis in $\mathcal{H}^{\otimes_s n}$ as follows.

Let $\mathcal{B}_0 := \{|e_1\rangle, |e_2\rangle, \dots\}$ be an orthonormal basis in $\mathcal{H}_0$. Let $\tilde{\mathbf{n}} = (n_0, \mathbf{n}) = (n_0, n_1, \dots)$ be an infinite sequence of integers such that $\sum_{i \geq 0} n_i = n$, and note that only a finite number of the $n_i$ are different from zero. Then the symmetric vectors
$$|\tilde{\mathbf{n}}\rangle = |n_0, n_1, n_2, \dots\rangle := \sqrt{\frac{n!}{n_0! \cdot n_1! \cdots}}\; P_s \Big( |\psi_0\rangle^{\otimes n_0} \otimes \bigotimes_{i \geq 1} |e_i\rangle^{\otimes n_i} \Big)$$
form an ONB of $\mathcal{H}^{\otimes_s n}$.

As discussed in section 2.2.2, the Fock space $\mathcal{F}(\mathcal{H}_0)$ can be identified with the infinite tensor product of one-mode Fock spaces $\bigotimes_{i \geq 1} \mathcal{F}(\mathbb{C}|e_i\rangle)$, which has an orthonormal number basis (or Fock basis) consisting of products of number basis vectors of individual modes
$$|\mathbf{n}\rangle := \bigotimes_{i \geq 1} |n_i\rangle,$$
where $n_i \neq 0$ only for a finite number of indices. We define $V_n : \mathcal{H}^{\otimes_s n} \to \mathcal{F}(\mathcal{H}_0)$ as follows:
$$V_n : |\tilde{\mathbf{n}}\rangle \mapsto |\mathbf{n}\rangle.$$
Its image consists of states with at most $n$ “excitations”, with $|\psi_0\rangle^{\otimes n}$ being mapped to the vacuum state $|\mathbf{0}\rangle$.
We would like to show that the embedded states $V_n |\psi_u\rangle^{\otimes n}$ are well approximated by the coherent states $|G(\sqrt{n}u)\rangle$ uniformly over the local neighbourhood $\|u\| \leq \gamma_n$. For this we will make use of the covariance and functorial properties of the second quantisation construction in order to reduce the non-parametric LAE statement to the corresponding one for 2-dimensional systems.

Let $|u_0\rangle \in \mathcal{H}_0$ be a fixed unit vector. Let $j : \mathbb{C}^2 \to \mathcal{H}$ be the isometric embedding
$$j : |0\rangle \mapsto |\psi_0\rangle, \qquad j : |1\rangle \mapsto |u_0\rangle,$$
and let $j_0 : \mathbb{C}|1\rangle \to \mathcal{H}_0$ be the restriction of $j$ to the one dimensional subspace $\mathbb{C}|1\rangle$. Since second quantisation is functorial under contractive maps, there is a corresponding isometric embedding $J = \Gamma(j_0)$ satisfying
$$J : \mathcal{F}(\mathbb{C}|1\rangle) \to \mathcal{F}(\mathcal{H}_0), \qquad |G(\alpha)\rangle \mapsto |G(j_0(\alpha))\rangle = |G(\alpha u_0)\rangle. \tag{26}$$
Let $\tilde{V}_n : (\mathbb{C}^2)^{\otimes_s n} \to \mathcal{F}(\mathbb{C}|1\rangle)$ be the isometry constructed in the same way as $V_n$, where $|0\rangle$ plays the role of $|\psi_0\rangle$ and $\mathbb{C}|1\rangle$ is the analogue of $\mathcal{H}_0$. As before, let $|\psi_\alpha\rangle = \sqrt{1 - |\alpha|^2}\,|0\rangle + \alpha|1\rangle$, with $|\alpha| \leq 1$. Then by the properties of the embedding map $V_n$ we have
$$J \tilde{V}_n |\psi_\alpha\rangle^{\otimes n} = V_n |\psi_{\alpha u_0}\rangle^{\otimes n}. \tag{27}$$
From equations (26) and (27) we find
$$\sup_{|\alpha| \leq \gamma_n} \big\| V_n \psi_{\alpha u_0}^{\otimes n} - G(\sqrt{n}\alpha u_0) \big\| = \sup_{|\alpha| \leq \gamma_n} \big\| \tilde{V}_n \psi_\alpha^{\otimes n} - G(\sqrt{n}\alpha) \big\|.$$
Since the right-hand side of the above equality is independent of $|u_0\rangle$, the same equality holds with the supremum on the left side taken over all $|u\rangle \in \mathcal{H}_0$ with $\|u\| \leq \gamma_n$. The LAE for the non-parametric models has thus been reduced to that of a two-dimensional (qubit) model, which has already been established in [31]. We therefore obtain a first version of LAE in which the i.i.d. and Gaussian models are expressed in terms of the local parameter $|u\rangle$:
$$\limsup_{n \to \infty} \sup_{|\psi_0\rangle \in \mathcal{H}} \sup_{\|u\| \leq \gamma_n} \big\| V_n \psi_u^{\otimes n} - G(\sqrt{n}u) \big\| = 0.$$
Conversely, we define the reverse channel $S_n : \mathcal{T}_1(\mathcal{F}(\mathcal{H}_0)) \to \mathcal{T}_1(\mathcal{H}^{\otimes_s n})$ as follows. Let $P_n$ denote the orthogonal projection in $\mathcal{F}(\mathcal{H}_0)$ onto the image space of $V_n$, i.e. the subspace with total excitation number at most $n$,
$$\mathcal{F}_{\leq n}(\mathcal{H}_0) := \mathrm{Lin}\Big\{ |n_1, n_2, \dots\rangle : \sum_{i \geq 1} n_i \leq n \Big\}.$$
Let $R_n : \mathcal{F}(\mathcal{H}_0) \to \mathcal{H}^{\otimes_s n}$ be a left inverse of $V_n$, i.e. $R_n V_n = \mathbf{1}$. Then the reverse channel is defined as
$$S_n(\rho) = R_n P_n \rho P_n R_n^* + \mathrm{Tr}\big(\rho(\mathbf{1} - P_n)\big)\, |\psi_0\rangle\langle\psi_0|^{\otimes n}.$$
Operationally, the action of $S_n$ consists of two steps.
We first perform a projection measurement with projections $P_n$ and $(\mathbf{1} - P_n)$; if the first outcome occurs the conditional state of the system is $P_n \rho P_n / \mathrm{Tr}(P_n \rho)$, while if the second outcome occurs the state is $(\mathbf{1} - P_n)\rho(\mathbf{1} - P_n) / \mathrm{Tr}((\mathbf{1} - P_n)\rho)$. In the second stage, if the first outcome was obtained we map the projected state through the map $R_n$ into a state in $\mathcal{H}^{\otimes n}$, while if the second outcome was obtained, we prepare the fixed state $|\psi_0\rangle\langle\psi_0|^{\otimes n}$.

When applied to the pure Gaussian states $|G(\sqrt{n}u)\rangle$, the output of $S_n$ is the mixed state
$$S_n\big(|G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\big) = p_u^n\, |\phi_u^n\rangle\langle\phi_u^n| + (1 - p_u^n)\, |\psi_0\rangle\langle\psi_0|^{\otimes n},$$
where
$$|\phi_u^n\rangle := \frac{R_n P_n |G(\sqrt{n}u)\rangle}{\sqrt{p_u^n}}, \qquad p_u^n = \|P_n G(\sqrt{n}u)\|^2.$$
The key observation is that the Gaussian states are almost completely supported by the subspace $\mathcal{F}_{\leq n}(\mathcal{H}_0)$, uniformly with respect to the ball $\|u\| \leq \gamma_n$. This means that
$$\limsup_{n \to \infty} \sup_{|\psi_0\rangle} \sup_{\|u\| \leq \gamma_n} (1 - p_u^n) = 0,$$
which, together with the fact that $R_n$ inverts $V_n$ on its image, implies
$$\limsup_{n \to \infty} \sup_{|\psi_0\rangle} \sup_{\|u\| \leq \gamma_n} \big\| S_n\big(|G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\big) - |\psi_u\rangle\langle\psi_u|^{\otimes n} \big\|_1 = 0.$$
This completes the proof of (25). $\square$

5. Applications
In this section we discuss three major applications of the local asymptotic equivalence result in Theorem 4.1, namely to the estimation of pure states, estimation of a physically meaningful quadratic functional, and finally to testing between pure states. We stress that local asymptotic equivalence allows us to translate these problems into similar but easier ones involving Gaussian states. This strategy has already been successfully employed [31] in finding asymptotically optimal estimation procedures for finite dimensional mixed states, which otherwise appeared to be a difficult problem due to the complexity of the set of possible measurements.
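The closeness of the embedded i.i.d. states to coherent states, which underlies Theorem 4.1, can be probed numerically in the two-dimensional reduction used in its proof. The sketch below assumes a real local parameter $0 < a < 1$, for which the embedded state $\tilde{V}_n \psi_a^{\otimes n}$ has square-root binomial amplitudes on the number states $|k\rangle$, $k \leq n$, while $|G(\sqrt{n}a)\rangle$ has square-root Poisson amplitudes. The radius $a = n^{-0.4}$ and the function names are illustrative choices, not taken from the paper.

```python
import math

def embedding_error(n, a):
    """Norm of V_n psi_a^{(x)n} - G(sqrt(n) a) for the qubit model with real 0 < a < 1."""
    overlap = 0.0
    for k in range(n + 1):
        # log of sqrt(C(n,k) (1-a^2)^(n-k) a^(2k)): amplitude of the embedded i.i.d. state
        log_binomial = 0.5 * (math.lgamma(n + 1) - math.lgamma(k + 1)
                              - math.lgamma(n - k + 1)
                              + (n - k) * math.log1p(-a * a)
                              + 2 * k * math.log(a))
        # log of e^{-n a^2/2} (sqrt(n) a)^k / sqrt(k!): amplitude of the coherent state
        log_poisson = (-n * a * a / 2 + k * math.log(math.sqrt(n) * a)
                       - 0.5 * math.lgamma(k + 1))
        overlap += math.exp(log_binomial + log_poisson)
    # ||x - y||^2 = 2 - 2 <x|y> for unit vectors with real inner product
    return math.sqrt(max(0.0, 2 - 2 * overlap))

sizes = [100, 1000, 10000]
errors = [embedding_error(n, n ** (-0.4)) for n in sizes]
```

In this sketch the error at the boundary of the local ball decreases as the ensemble size grows; the components of the coherent state with more than $n$ excitations are simply dropped, so their mass is counted in the error.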
As discussed in section 3.3, we will assume that we are given $n$ independent systems, each prepared in a state $|\psi\rangle \in \mathcal{H}$ belonging to the Sobolev ellipsoid $S^\alpha(L)$ defined in equation (19). The corresponding quantum statistical model $\mathcal{Q}_n$ was defined in equation (20), and the Gaussian counterpart model $\mathcal{G}_n$ was defined in equation (21).

Here is a summary of the results. In Theorem 5.2 we show that the estimation rates over such ellipsoids are $n^{-2\alpha/(2\alpha+1)}$; this is similar to the well-known rates, e.g. for density estimation, in nonparametric statistics (see [64]). The estimation of the quadratic functional
$$F(\psi) = \sum_{j \geq 0} |\psi_j|^2 j^{2\beta}, \qquad \text{for some fixed } \beta > 0,$$
of the unknown pure state presents two regimes: a parametric rate $n^{-1}$ for the MSE is attained when the unknown state has enough “smoothness” (that is, $\alpha \geq 2\beta$), whereas a nonparametric rate $n^{-2(1 - \beta/\alpha)}$ is obtained when $\beta < \alpha < 2\beta$. This double regime is known in nonparametric estimation for the density model, with different values for both the rates and the values of the parameters where the phase transition occurs, cf. [15], [45] and references therein.

Parametric rates and sharp asymptotic constants are obtained for the testing problem of a pure state against an alternative described by the Sobolev-type ellipsoid with an $L_2$-ball removed. In the classical density model only nonparametric rates for testing of order $n^{-2\alpha/(4\alpha+1)}$ can be obtained for the $L_2$ norm. In our quantum i.i.d. model, the parametric rate $n^{-1/2}$ is shown to be minimax for testing $H_0 : \psi = \psi_0$, for some $\psi_0$ in $S^\alpha(L)$, over the nonparametric set of alternatives
$$H_1 : \psi \in S^\alpha(L) \text{ is such that } \big\| |\psi\rangle\langle\psi| - |\psi_0\rangle\langle\psi_0| \big\|_1 \geq c\, n^{-1/2}.$$
The sharp asymptotic constant we obtain for testing is specific for ensembles of pure states. As we discuss in the sequel, quantum testing of states allows us to optimize over the measurements, and thus to obtain the most distinguishable likelihoods for the underlying unknown quantum state.

5.1. Estimation.
We consider the problem of estimating an unknown pure state belonging to the Hermite-Sobolev class $S^\alpha(L)$ given an ensemble of $n$ independent, identically prepared systems. The corresponding sequence of statistical models $\mathcal{Q}_n$ was defined in equation (20). We first describe a specific measurement procedure which provides an estimator whose risk attains the nonparametric rate $n^{-2\alpha/(2\alpha+1)}$. We prove the lower bounds for estimating a Gaussian state in the model $\mathcal{G}_n$ defined in (21). Subsequently we use LAE to establish a lower bound showing that the rate is optimal in the i.i.d. model as well.

Before deriving the bounds we briefly review the definitions of the loss functions used here and the relations between them, cf. section 2.3. Recall that the trace norm distance between states $\rho$ and $\rho'$ is given by $\|\rho - \rho'\|_1 := \mathrm{Tr}(|\rho - \rho'|)$, and is the quantum analogue of the norm-one distance between probability densities. The square of the Bures distance is given by $d_b^2(\rho, \rho') := 2\big(1 - \mathrm{Tr}\big(\sqrt{\sqrt{\rho}\,\rho'\sqrt{\rho}}\big)\big)$, and is a quantum extension of the Hellinger distance. These distances satisfy the inequalities (14).

In the case of pure states (i.e. $\rho = |\psi\rangle\langle\psi|$ and $\rho' = |\psi'\rangle\langle\psi'|$) these metrics become (cf. (12) and (13))
$$\|\rho - \rho'\|_1 = 2\sqrt{1 - |\langle \psi | \psi' \rangle|^2}, \qquad d_b^2(\rho, \rho') = 2\big(1 - |\langle \psi | \psi' \rangle|\big).$$
Since vectors are not uniquely defined by the states, the distances cannot be expressed directly in terms of the length $\|\psi - \psi'\|$. However, if we consider a reference vector $|\psi_0\rangle$ and define the representative vector $|\psi\rangle$ such that $\langle \psi | \psi_0 \rangle \geq 0$, then we can write (as in section 4)
$$|\psi_u\rangle = \sqrt{1 - \|u\|^2}\,|\psi_0\rangle + |u\rangle, \qquad |\psi_{u'}\rangle = \sqrt{1 - \|u'\|^2}\,|\psi_0\rangle + |u'\rangle, \qquad |u\rangle, |u'\rangle \perp |\psi_0\rangle,$$
and the distances have the same (up to a constant) quadratic approximation
$$\|\rho_u - \rho_{u'}\|_1^2 = 4\|u - u'\|^2 + O\big(\max(\|u\|, \|u'\|)^4\big), \qquad d_b^2(\rho_u, \rho_{u'}) = \|u - u'\|^2 + O\big(\max(\|u\|, \|u'\|)^4\big), \tag{28}$$
where the correction terms are of order 4 as $\|u\|$ and $\|u'\|$ tend to $0$.
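The quadratic approximation (28) and the inequalities (14) and (15) can be checked numerically for pure states. The following minimal sketch takes the reference vector to be the first basis vector of a four-dimensional space; the particular local parameters are arbitrary illustrative choices. Halving the scale of $u$ and $u'$ should shrink the approximation gaps by roughly a factor of sixteen if the corrections are of order four.

```python
import math

def inner(a, b):
    """Inner product <a|b>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def psi_u(u):
    """Vector sqrt(1-||u||^2)|psi_0> + |u>, with psi_0 the first basis vector."""
    norm_sq = inner(u, u).real
    return [math.sqrt(1 - norm_sq)] + list(u)

def trace_norm_sq(psi, phi):
    """Squared trace-norm distance between pure states: 4 (1 - |<psi|phi>|^2)."""
    return 4 * (1 - abs(inner(psi, phi)) ** 2)

def bures_sq(psi, phi):
    """Squared Bures distance between pure states: 2 (1 - |<psi|phi>|)."""
    return 2 * (1 - abs(inner(psi, phi)))

def dist_sq(u, v):
    """Squared Hilbert-space distance ||u - v||^2."""
    return sum(abs(x - y) ** 2 for x, y in zip(u, v))

def approximation_gaps(scale):
    """Absolute errors of the two quadratic approximations at a given scale of u, u'."""
    u = [scale * c for c in (0.6 + 0.2j, -0.3j, 0.1)]
    v = [scale * c for c in (-0.2, 0.5 + 0.1j, 0.4j)]
    a, b = psi_u(u), psi_u(v)
    return (abs(trace_norm_sq(a, b) - 4 * dist_sq(u, v)),
            abs(bures_sq(a, b) - dist_sq(u, v)))

gaps_coarse = approximation_gaps(0.2)
gaps_fine = approximation_gaps(0.1)

# Inequalities between trace-norm and Bures distance for one pair of pure states.
a, b = psi_u([0.3, 0.2j]), psi_u([-0.1j, 0.4])
tn, db = math.sqrt(trace_norm_sq(a, b)), math.sqrt(bures_sq(a, b))
inequalities_hold = db ** 2 <= tn <= 2 * db and tn >= math.sqrt(2) * db
```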
Below we show that, asymptotically in $n$, the estimation risks for the squared norm-one and squared Bures distances have the same rate as that of estimating the local parameter $u$ with respect to the Hilbert space distance.

5.1.1. Upper bounds.
We first describe a two-step measurement procedure, which provides an estimator whose risk has rate $n^{-2\alpha/(2\alpha+1)}$.

Theorem 5.1.
Consider the i.i.d. quantum model $\mathcal{Q}_n$ given by equation (20). There exists an estimator $\hat{\rho}_n := |\hat{\psi}_n\rangle\langle\hat{\psi}_n|$ such that
$$\limsup_{n \to \infty} \sup_{|\psi\rangle \in S^\alpha(L)} n^{2\alpha/(2\alpha+1)}\, \mathbb{E}_\rho\big[d^2(\hat{\rho}_n, \rho)\big] \leq C,$$
where $\rho := |\psi\rangle\langle\psi|$, $d(\hat{\rho}_n, \rho)$ denotes either the trace-norm distance or the Bures distance, and $C > 0$ is a constant depending only on $\alpha > 0$ and $L > 0$.

Proof of Theorem 5.1. According to inequalities (14) and (15) the two distances are equivalent on pure states, so it suffices to prove the upper bound for the trace-norm distance.

Firstly, a projective operation is applied to each of the $n$ copies separately, whose aim is to truncate the state to a finite dimensional subspace of dimension $d_n = [n^{1/(2\alpha+1)}] + 1$. Let $P_n$ be the projection onto the subspace $\mathcal{H}_n$ spanned by the first $d_n$ basis vectors $\{|e_0\rangle, \dots, |e_{d_n - 1}\rangle\}$. For a given state $|\psi\rangle$ the operation consists of randomly projecting the state with $P_n$ or $(\mathbf{1} - P_n)$, which produces i.i.d. outcomes $O_i \in \{0, 1\}$ with $\mathbb{P}(O_i = 1) = p_n = \|P_n \psi\|^2$. The posterior state conditioned on the measurement outcome is
$$|\psi\rangle\langle\psi| \mapsto \begin{cases} |\psi^{(n)}\rangle\langle\psi^{(n)}| := \dfrac{P_n |\psi\rangle\langle\psi| P_n}{p_n} & \text{with probability } p_n, \\[2ex] \dfrac{(\mathbf{1} - P_n)|\psi\rangle\langle\psi|(\mathbf{1} - P_n)}{1 - p_n} & \text{with probability } 1 - p_n. \end{cases}$$
Since $|\psi\rangle\langle\psi| \in S^\alpha(L)$, the probability $1 - p_n$ is bounded as
$$1 - p_n = \sum_{i = d_n}^\infty |\psi_i|^2 = \sum_{i = d_n}^\infty i^{-2\alpha}\, i^{2\alpha} |\psi_i|^2 \leq d_n^{-2\alpha} \sum_{i=1}^\infty i^{2\alpha} |\psi_i|^2 \leq n^{-2\alpha/(2\alpha+1)} L. \tag{29}$$
Let $\tilde{n} = \sum_{i=1}^n O_i$ be the number of systems for which the outcome was equal to 1, so that $\tilde{n}$ has binomial distribution $\mathrm{Bin}(n, p_n)$. Then
$$\mathbb{E}(\tilde{n}/n) = p_n \quad \text{and} \quad \mathrm{Var}(\tilde{n}/n) = p_n(1 - p_n)/n = O(1/n).$$
Therefore $\tilde{n}/n \to 1$ in probability, and we collect the systems with outcome 1, so that the joint state is $|\psi^{(n)}\rangle\langle\psi^{(n)}|^{\otimes \tilde{n}}$, which is supported by the symmetric subspace $\mathcal{H}_n^{\otimes_s \tilde{n}}$. In order to estimate
In order to estimate the truncated state $|\psi^{(n)}\rangle$ (and by implication $|\psi\rangle$), we perform a covariant measurement $M_n$ [35] whose space of outcomes is the space of pure states $\hat\rho_n = |\hat\psi_n\rangle\langle\hat\psi_n|$ over $\mathcal{H}_n$, and the infinitesimal POVM element is
\[
M_n(d\hat\rho) = \binom{\tilde n + d_n - 1}{d_n - 1}\,\hat\rho^{\otimes\tilde n}\, d\hat\rho. \tag{30}
\]
The covariance property means that the unitary group has a covariant action on states and their corresponding probability distributions
\[
P^{M_n}_{U\rho U^*}(d\hat\rho) = \mathrm{Tr}(U\rho U^*\cdot d\hat\rho) = P^{M_n}_{\rho}(d(U^*\hat\rho U)).
\]
Recall that the trace-norm distance squared for pure states is given by $d^2(\rho,\rho') := \|\rho-\rho'\|_1^2 = 4(1-|\langle\psi|\psi'\rangle|^2)$. In [35] it has been shown that, conditionally on $\tilde n$, the risk of the estimator $\hat\rho_n$ with respect to the trace-norm square distance is
\[
E_{\tilde n}\left[d^2(\hat\rho_n,\rho^{(n)})\right] = \frac{4(d_n-1)}{d_n+\tilde n}.
\]
Using the triangle inequality we have $d^2(\hat\rho_n,\rho) \le 2d^2(\hat\rho_n,\rho^{(n)}) + 2d^2(\rho,\rho^{(n)})$. Since $|\psi^{(n)}\rangle = P_n|\psi\rangle/\sqrt{p_n}$, the bias term is $d^2(\rho,\rho^{(n)}) = 4(1-p_n)$, which by (29) is bounded by $4n^{-2\alpha/(2\alpha+1)}L$. Therefore
\[
E\left[d^2(\hat\rho_n,\rho)\right] \le 8\,E\left[\frac{d_n-1}{d_n+\tilde n}\right] + 8n^{-2\alpha/(2\alpha+1)}L.
\]
For an arbitrarily small $\varepsilon>0$, we have
\[
E\left[\frac{d_n-1}{d_n+\tilde n}\right] \le P\left[\frac{\tilde n}{n} < 1-\varepsilon\right] + E\left[\frac{d_n-1}{d_n+n\cdot\tilde n/n}\cdot I\left(\frac{\tilde n}{n}\ge 1-\varepsilon\right)\right] \le O\left(\frac{1}{n}\right) + C\,\frac{d_n}{n}.
\]
Since $d_n/n \asymp n^{-2\alpha/(2\alpha+1)}$, putting together the last two upper bounds concludes the proof. □
Footnote: Reference [35] uses a fidelity distance erroneously called "Bures distance", which for pure states coincides with the trace-norm distance up to a constant.
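The two terms in the bound of the proof above can be evaluated numerically. The following is only a minimal sketch, assuming for simplicity that the random count $\tilde n$ is replaced by n (the proof shows $\tilde n/n \to 1$); the function names `truncation_dim`, `risk_upper_bound` and `normalized_risk` are hypothetical and not part of the paper.

```python
import math

def truncation_dim(n: int, alpha: float) -> int:
    """Truncation dimension d_n = [n^(1/(2*alpha+1))] + 1 used in the proof."""
    return math.floor(n ** (1.0 / (2.0 * alpha + 1.0))) + 1

def risk_upper_bound(n: int, alpha: float, L: float) -> float:
    """Heuristic bound 8(d_n-1)/(d_n+n) + 8*L*n^(-2alpha/(2alpha+1)).

    Assumption: n_tilde is replaced by n, which ignores the O(1/n)
    probability that many copies fall outside the truncated subspace.
    """
    d_n = truncation_dim(n, alpha)
    variance_term = 8.0 * (d_n - 1) / (d_n + n)   # covariant-measurement risk
    bias_term = 8.0 * L * n ** (-2.0 * alpha / (2.0 * alpha + 1.0))  # truncation bias
    return variance_term + bias_term

def normalized_risk(n: int, alpha: float, L: float) -> float:
    """Bound rescaled by n^(2alpha/(2alpha+1)); it should stay bounded in n."""
    return n ** (2.0 * alpha / (2.0 * alpha + 1.0)) * risk_upper_bound(n, alpha, L)

if __name__ == "__main__":
    for n in (10, 10**3, 10**6):
        print(n, truncation_dim(n, 1.0), normalized_risk(n, 1.0, 1.0))
```

Under these assumptions the rescaled bound stays below $8 + 8L$ for every n, which is the boundedness asserted in Theorem 5.1.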
Lower bounds - Unimprovable rates.
We will first consider the Gaussian model $\mathcal{G}_n$ given by equation (21), which is indexed by Hilbert space vectors $\psi\in\mathcal{H}$ in the Sobolev class $S^\alpha(L)$, playing the role of means of quantum Gaussian states $|G(\sqrt{n}\psi)\rangle$. In Theorem 5.2 we find a lower bound for the mean square error of any estimator $\hat\psi$. This is then used in conjunction with the local asymptotic equivalence Theorem 4.1 to obtain a lower bound for the risk of the i.i.d. model $\mathcal{Q}_n$, with respect to the norm-one and Bures distances.
Theorem 5.2.
Consider the quantum Gaussian model $\mathcal{G}_n$ given by equation (21). There exists some constant $c>0$ depending only on $\alpha$ and $L$ such that
\[
\liminf_{n\to\infty}\ \inf_{\hat\psi_n}\ \sup_{\psi\in S^\alpha(L)} n^{2\alpha/(2\alpha+1)}\, E_\psi\left[\|\hat\psi_n-\psi\|^2\right] \ge c,
\]
where the infimum is taken over all estimators $\hat\psi_n$, understood as combinations of measurements and classical estimators. The proof is given in Section 6.
We now proceed to consider the i.i.d. model $\mathcal{Q}_n$ defined in (20). We are given n copies of an unknown pure state $|\psi\rangle\langle\psi|$, with $\psi$ in the Sobolev class $S^\alpha(L)$. The goal is to find an asymptotic lower bound for the estimation risk (with respect to the Bures or norm-one loss functions) which matches the upper bound derived in section 5.1.1. Since both loss functions satisfy the triangle inequality, it can be shown that by choosing estimators which are mixed states, rather than pure states, one can improve the risk by at most a constant factor 2. Therefore we consider estimators which are pure states. In order to fix the phase of the vector representing the true and the estimated state, we will assume that $\langle\psi|e_0\rangle\ge 0$ and $\langle\hat\psi|e_0\rangle\ge 0$.
Theorem 5.3.
Consider the i.i.d. quantum model $\mathcal{Q}_n$ given by equation (20). There exists some constant $c>0$ depending only on $\alpha>1/2$ and $L>0$ such that
\[
\liminf_{n\to\infty}\ \inf_{|\hat\psi_n\rangle}\ \sup_{|\psi\rangle\in S^\alpha(L)} n^{2\alpha/(2\alpha+1)}\, E_\rho\left[d^2(\hat\rho_n,\rho)\right] \ge c,
\]
where $\rho := |\psi\rangle\langle\psi|$, the infimum is taken over all estimators $\hat\rho_n := |\hat\psi_n\rangle\langle\hat\psi_n|$ (defined by a combination of a measurement and a classical estimator), and the loss function $d(\hat\rho,\rho)$ is either the norm-one or the Bures distance. The proof is given in Section 6.
5.2. Quadratic functionals.
This section deals with the estimation of the quadratic functional
\[
F(\psi) = \sum_{j\ge 0}|\psi_j|^2\cdot j^{2\beta}, \qquad \text{for some fixed } 0<\beta<\alpha,
\]
which is well defined for all pure states $|\psi\rangle$ in the ellipsoid $S^\alpha(L)$. If the Hilbert space $\mathcal{H}$ is represented as $L^2(\mathbb{R})$ and $\{|j\rangle : j\ge 0\}$ is the Fock basis (cf. section 2.2.1), then $F(\psi)$ is the moment of order $2\beta$ of the number operator $N$: $F(\psi) = \mathrm{Tr}(|\psi\rangle\langle\psi|\cdot N^{2\beta})$. Below we derive upper and lower bounds for the rate of the quadratic risk for estimating $F(\psi)$, which is of order $n^{-1}$ if $\alpha\ge 2\beta$, and $n^{-2(1-\beta/\alpha)}$ if $\beta<\alpha<2\beta$.
5.2.1. Upper bounds.
Let us describe an estimator $\hat F_n$ of $F(\psi)$ in the quantum i.i.d. model. We consider the measurement of the number operator with projections $\{|j\rangle\langle j|\}_{j\ge 0}$. For a pure state $|\psi\rangle = \sum_{j\ge 0}\psi_j|j\rangle$, we obtain an outcome $X$ taking values $j\in\mathbb{N}$ with probabilities $p_j := P_\psi(X=j) = |\psi_j|^2$, for $j\ge 0$. By measuring each quantum sample $|\psi\rangle$ separately, we obtain i.i.d. copies $X_1,\dots,X_n$ of $X$, allowing us to estimate each $p_j$ empirically by
\[
\hat p_j = \frac{1}{n}\sum_{k=1}^n I(X_k=j), \qquad j\ge 0,
\]
which is an unbiased estimator of $p_j$ with variance $p_j(1-p_j)/n$. The estimator of the quadratic functional is defined as
\[
\hat F_n = \sum_{j=1}^N \hat p_j\cdot j^{2\beta} \tag{31}
\]
for an appropriately chosen truncation parameter $N$ defined below. The next theorem shows that a parametric rate can be attained for estimating the quadratic functional $F(\psi)$ if $\alpha\ge 2\beta$, whereas a nonparametric rate is attained if $\beta<\alpha<2\beta$.
Theorem 5.4.
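Consider the i.i.d. quantum model $\mathcal{Q}_n$ given by equation (20). Let $\hat F_n$ be the estimator (31) of $F(\psi)$ with $N\asymp n^{1/(4(\alpha-\beta))}$ for $\alpha\ge 2\beta$, respectively $N\asymp n^{1/(2\alpha)}$ for $\beta<\alpha<2\beta$. Then
\[
\sup_{\psi\in S^\alpha(L)} E_\psi\left(\hat F_n - F(\psi)\right)^2 = O(1)\cdot\eta_n^2,
\]
where
\[
\eta_n^2 = \begin{cases} n^{-1}, & \text{if } \alpha\ge 2\beta,\\ n^{-2(1-\beta/\alpha)}, & \text{if } \beta<\alpha<2\beta. \end{cases} \tag{32}
\]
Proof of Theorem 5.4.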
The usual bias-variance decomposition yields
\[
E_\psi\left(\hat F_n - F(\psi)\right)^2 = \left(E_\psi\hat F_n - F(\psi)\right)^2 + \mathrm{Var}_\psi\left(\hat F_n\right).
\]
The bias can be upper bounded as
\[
\left|F(\psi) - E_\psi\hat F_n\right| = \left|F(\psi) - \sum_{j=1}^N p_j\cdot j^{2\beta}\right| = \sum_{j\ge N+1} p_j\cdot j^{2\beta} \le N^{-2(\alpha-\beta)}\sum_{j\ge N+1} p_j\cdot j^{2\alpha} \le L N^{-2(\alpha-\beta)}.
\]
For the variance, let us note that the vector $\hat V = n\cdot(\hat p_1,\dots,\hat p_N,\hat p^*_{N+1})$, with $\hat p^*_{N+1} = n^{-1}\sum_{k=1}^n I(X_k\ge N+1)$, has a multinomial distribution with parameters $n$ and probability vector $V := (p_1,\dots,p_N,\ p^*_{N+1} = \sum_{j\ge N+1}p_j)^\top$. The covariance matrix of a multinomial vector is $n\cdot(\mathrm{Diag}(V) - V\cdot V^\top)$, where $\mathrm{Diag}(V)$ denotes the diagonal matrix with entries from $V$. In particular, if $\hat p := (\hat p_1,\dots,\hat p_N)^\top$, $p := (p_1,\dots,p_N)^\top$ and $B := (1, 2^{2\beta},\dots,N^{2\beta})^\top$, then
\[
\mathrm{Cov}_\psi(\hat F_n) = \mathrm{Cov}_\psi(B^\top\cdot\hat p) = B^\top\cdot\mathrm{Cov}_\psi(\hat p)\cdot B = \frac{1}{n}\cdot B^\top\cdot(\mathrm{Diag}(p) - p\cdot p^\top)\cdot B.
\]
This gives
\[
\mathrm{Cov}_\psi(\hat F_n) \le \frac{1}{n}\cdot B^\top\cdot\mathrm{Diag}(p)\cdot B = \frac{1}{n}\sum_{j=1}^N p_j\cdot j^{4\beta}.
\]
The bound of this last term and the resulting bound of the risk are treated separately for the two cases.
a) Case $\alpha\ge 2\beta$. In that case,
\[
\sum_{j=1}^N p_j\cdot j^{4\beta} \le \sum_{j=1}^N p_j\cdot j^{2\alpha} \le L,
\]
implying that $\mathrm{Var}(\hat F_n)\le L/n$. The upper bound of the risk is, in this case,
\[
E_\psi\left(\hat F_n - F(\psi)\right)^2 \le L^2 N^{-4(\alpha-\beta)} + \frac{L}{n}.
\]
If we choose $N\asymp n^{1/(4(\alpha-\beta))}$ or larger, then the parametric rate is attained for the risk: $E_\psi(\hat F_n - F(\psi))^2 = O(1)\cdot n^{-1}$.
b) Case $\beta<\alpha<2\beta$. Here we have
\[
\mathrm{Cov}_\psi(\hat F_n) \le \frac{1}{n}\sum_{j=1}^N p_j\cdot j^{4\beta} = \frac{1}{n}\sum_{j=1}^N j^{4\beta-2\alpha}\, j^{2\alpha} p_j \le \frac{N^{4\beta-2\alpha}}{n} L.
\]
The upper bound of the risk becomes
\[
E_\psi\left(\hat F_n - F(\psi)\right)^2 \le L^2 N^{-4(\alpha-\beta)} + \frac{N^{4\beta-2\alpha}}{n} L.
\]
The optimal choice of the parameter $N$ that balances the two previous terms is $N\asymp n^{1/(2\alpha)}$, giving the attainable rate for the quadratic risk $E_\psi(\hat F_n - F(\psi))^2 = O(1)\cdot n^{-2(1-\beta/\alpha)}$.
Cases a) and b) together prove that the rate $\eta_n^2$ is attainable. □
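The estimator (31) and the truncation rule of Theorem 5.4 can be illustrated with a short simulation. This is a sketch under stated assumptions only: the measurement outcomes are drawn from a hypothetical law $p_j \propto j^{-4}$ that is not taken from the paper, the truncation level is rounded up to an integer (our choice, since the theorem fixes N only up to constants), and the names `rate_exponent`, `truncation_level` and `plug_in_estimate` are illustrative.

```python
import numpy as np

def rate_exponent(alpha, beta):
    """Exponent r with eta_n^2 = n^(-r): r = min(1, 2(1 - beta/alpha)), cf. (32)."""
    return min(1.0, 2.0 * (1.0 - beta / alpha))

def truncation_level(n, alpha, beta):
    """Truncation N of Theorem 5.4, rounded up (the rounding is an assumption)."""
    if alpha >= 2.0 * beta:
        return int(np.ceil(n ** (1.0 / (4.0 * (alpha - beta)))))
    return int(np.ceil(n ** (1.0 / (2.0 * alpha))))

def plug_in_estimate(samples, N, beta):
    """Estimator (31): sum_{j=1}^N p_hat_j * j^(2 beta),
    where p_hat_j is the empirical frequency of outcome j."""
    samples = np.asarray(samples, dtype=np.int64)
    counts = np.bincount(samples, minlength=N + 1)[1:N + 1]  # counts of j = 1..N
    p_hat = counts / samples.size
    j = np.arange(1, N + 1)
    return float(np.sum(p_hat * j ** (2.0 * beta)))

if __name__ == "__main__":
    # Hypothetical outcome law on j = 1..200, chosen only for illustration.
    rng = np.random.default_rng(0)
    support = np.arange(1, 201)
    p = support ** -4.0
    p /= p.sum()
    alpha, beta, n = 1.0, 0.5, 10_000
    X = rng.choice(support, size=n, p=p)
    N = truncation_level(n, alpha, beta)
    truth = float(np.sum(p * support ** (2.0 * beta)))
    print(N, plug_in_estimate(X, N, beta), truth)
```

Only the number-operator outcomes enter the estimator, so the simulation is purely classical once the law $p_j = |\psi_j|^2$ is fixed.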
Lower bounds.
The next Theorem proves the optimality of the previously attained rate for the estimationof quadratic functionals.
Theorem 5.5.
Consider the i.i.d. quantum model $\mathcal{Q}_n$ given by equation (20). Then there exists some constant $c>0$ depending only on $\alpha$, $\beta$ (with $\alpha>\beta>0$), and $L>0$ such that
\[
\liminf_{n\to\infty}\ \inf_{\hat F_n}\ \sup_{\psi\in S^\alpha(L)} \eta_n^{-2}\cdot E_\psi\left(\hat F_n - F(\psi)\right)^2 \ge c,
\]
where the infimum is taken over all measurements and resulting estimators $\hat F_n$ of $F(\psi)$. The proof is given in Section 6.
5.3. Testing.
In the problem of testing for signal in classical Gaussian white noise, over a smoothness class with an $L_2$-ball removed, minimax rates of convergence (separation rates) are well known [41]; they are expressed in the rate of the ball radius tending to zero along with noise intensity, such that a nontrivial asymptotic power is possible. We will consider an analogous testing problem here for pure states. Accordingly, let $\rho = |\psi\rangle\langle\psi|$ denote pure states, let $\rho_0 = |\psi_0\rangle\langle\psi_0|$ be a fixed pure state to serve as the null hypothesis, and let
\[
B(\varphi) = \{\|\rho-\rho_0\|_1\ge\varphi\} \tag{33}
\]
be the complement of a trace norm ball around $\rho_0$. We want to test in the i.i.d. quantum model $\mathcal{Q}_n$ given by equation (20) the following hypotheses about a pure state $\rho$:
\[
H_0: \rho=\rho_0, \qquad H_1(\varphi_n): \rho\in S^\alpha(L)\cap B(\varphi_n) \tag{34}
\]
for $\{\varphi_n\}_{n\ge 1}$ a decreasing sequence of positive real numbers. Consider a binary POVM $M=(M_0,M_1)$, acting on the product states $\rho^{\otimes n}$, cf. Definition 2. We denote the testing risk between two fixed hypotheses by the sum of the two error probabilities
\[
R_n^M = R_n^M(\rho_0^{\otimes n},\rho^{\otimes n},M) = \mathrm{Tr}(\rho_0^{\otimes n}\cdot M_1) + \mathrm{Tr}(\rho^{\otimes n}\cdot M_0).
\]
In the minimax $\alpha$-testing approach which dominates the literature on the classical Gaussian white noise case, one would require $\mathrm{Tr}(\rho_0^{\otimes n}\cdot M_1)\le\alpha$ while trying to minimize the worst case type 2 error $\sup_{\rho\in S^\alpha(L)\cap B(\varphi_n)}\mathrm{Tr}(\rho^{\otimes n}\cdot M_0)$. However, we will consider here the so-called detection problem [40] where the target is the worst case total error probability
\[
P_e^M(\varphi_n) = \sup_{\rho\in S^\alpha(L)\cap B(\varphi_n)} R_n^M(\rho_0^{\otimes n},\rho^{\otimes n},M) = \mathrm{Tr}(\rho_0^{\otimes n}\cdot M_1) + \sup_{\rho\in S^\alpha(L)\cap B(\varphi_n)}\mathrm{Tr}(\rho^{\otimes n}\cdot M_0).
\]
The minimax total error probability is then obtained by optimizing over $M$:
\[
P_e^*(\varphi_n) = \inf_{M \text{ binary POVM}} P_e^M(\varphi_n).
\]
Separation rate.
A sequence $\{\varphi_n^*\}_{n\ge 1}$ is called a minimax separation rate if any other sequence $\{\varphi_n\}_{n\ge 1}$ fulfills
\[
P_e^*(\varphi_n)\to 1 \ \text{ if } \varphi_n/\varphi_n^*\to 0 \qquad\text{and}\qquad P_e^*(\varphi_n)\to 0 \ \text{ if } \varphi_n/\varphi_n^*\to\infty. \tag{35}
\]
Below we establish that $\varphi_n^* = n^{-1/2}$ is a separation rate in the current problem, even though the alternative $H_1(\cdot)$ in (34) is a nonparametric set of pure states. Recall relations (11), (12) describing the total optimal error for testing between simple hypotheses given by two pure states.
Theorem 5.6.
Consider the i.i.d. quantum model $\mathcal{Q}_n$ given by equation (20), and the testing problem (34). Assume that $\rho_0$ is in the interior of $S^\alpha(L)$, i.e. $\rho_0\in S^\alpha(L')$ for some $L'<L$. Then $\varphi_n^* = n^{-1/2}$ is a minimax separation rate. The proof is given in Section 6.
Sharp asymptotics.
Having identified the optimal rate of convergence in the testing problem (i.e. the minimax separation rate), we will go a step further and aim at a sharp asymptotics for the minimax testing error. We will adopt the approach of [20], extended in [41], where testing analogs of the Pinsker-type sharp risk asymptotics in nonparametric estimation were obtained. The result will be framed as follows: if the radius is chosen as $\varphi_n\sim cn^{-1/2}$ for a certain $c>0$, then the minimax testing error behaves as $P_e^*(\varphi_n)\sim\exp(-c^2/4)$. Thus the sharp asymptotics is expressed as a type of scaling result: a choice of constant $c$ in the radius implies a certain minimax error asymptotics depending on $c$.
To outline the problem, consider the upper and lower error bounds obtained in the proof of the separation rate. In (62) we obtained the bound
\[
P_e^{M_n}(\varphi_n)\le\exp\left(-c_n^2/4\right) \tag{36}
\]
if $\varphi_n = c_n n^{-1/2}$, where $M_n$ was the sequence of projection tests $M_n = (\rho_0^{\otimes n}, I-\rho_0^{\otimes n})$. The lower risk bound obtained in the course of proving Theorem 5.6 was
\[
\inf_{M \text{ binary POVM}} P_e^M(\varphi_n) \ge 1-\sqrt{1-\left(1-c_n^2 n^{-1}/4\right)^n}.
\]
If $c_n = c$ we can summarize this as
\[
1-\sqrt{1-\exp(-c^2/4)} + o(1) \le P_e^*(\varphi_n) \le \exp\left(-c^2/4\right).
\]
Our result will be that the upper bound is sharp and represents the minimax risk asymptotics.
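The gap between the two bounds above can be made concrete numerically. The sketch below assumes a single alternative at trace-norm distance exactly $\varphi_n = cn^{-1/2}$ from $\rho_0$, so that the fidelity is $|\langle\psi_0|\psi\rangle|^2 = 1-c^2/(4n)$; the function names are hypothetical and the computation only evaluates the closed-form expressions, it does not simulate a quantum measurement.

```python
import math

def projection_test_error(c: float, n: int) -> float:
    """Total error of the projection test (rho_0^{(x)n}, I - rho_0^{(x)n})
    against one alternative at trace distance c/sqrt(n).
    The type 1 error is zero; the type 2 error is the n-copy fidelity."""
    fidelity = 1.0 - c ** 2 / (4.0 * n)   # |<psi_0|psi>|^2, requires c^2 <= 4n
    return fidelity ** n

def two_point_lower_bound(c: float, n: int) -> float:
    """Optimal total error 1 - sqrt(1 - F^n) for two pure states with fidelity F."""
    fidelity = 1.0 - c ** 2 / (4.0 * n)
    return 1.0 - math.sqrt(1.0 - fidelity ** n)

if __name__ == "__main__":
    n = 10**6
    for c in (0.5, 1.0, 2.0, 4.0):
        print(c, two_point_lower_bound(c, n), projection_test_error(c, n),
              math.exp(-c ** 2 / 4.0))
```

For large n the projection test error approaches $\exp(-c^2/4)$ from below, while the two-point bound stays strictly smaller, which is the gap that the sharp asymptotics closes.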
Theorem 5.7.
Consider the i.i.d. quantum model $\mathcal{Q}_n$ given by equation (20), and the testing problem (34). Assume that $\rho_0\in S^\alpha(L')$ for some $L'<L$. At the minimax separation rate for the radius, i.e. for $\varphi_n\asymp n^{-1/2}$, we have
\[
\lim_n \frac{1}{n\varphi_n^2}\log P_e^*(\varphi_n) = -\frac{1}{4}.
\]
The proof is given in Section 6.
5.4. Discussion.
State estimation.
Tomography and optimal rates.
Consider a model where the Sobolev-type assumption ρ ∈ S α ( L ) about thepure state ρ = | ψ i h ψ | (cf. (19)) is replaced by a finite dimensionality assumption: ρ ∈ H d where H d = {| ψ i h ψ | : ψ j = 0 , j ≥ d } and d is known. One observes n identical copies of the pure state ρ = | ψ i h ψ | , with possibly d = d n → ∞ , i. e.the model Q n of (20) is replaced by Q n := (cid:8) ρ ⊗ n : ρ ∈ H d (cid:9) . Since H d can be written H d = S ,d where S r,d := { ρ : h e i | ρ | e j i = 0 , i, j ≥ d, rank( ρ ) = r } , the model is effectively a special case of the d × d density matrices of rank( ρ ) = r considered in [47]. In [47]however, it is not known in advance that r = 1 but ρ is a density matrix of possibly low rank r , and the aimis estimation of ρ using quantum state tomography performed on n identical copies of ρ . Data are obtained bydefining an observable ⊗ ni =1 E i where E , . . . , E n are i.i.d. uniformly selected elements of the Pauli basis of thelinear space of d × d Hermitian matrices, and applying the corresponding measurement to ρ ⊗ n . Let ˆ ρ ∗ n denotean arbitrary estimator of ρ based on that measurement. A lower asymptotic risk bound for norm-one risk isestablished; in the special case d r = o ( n ) it reads as inf ˆ ρ ∗ n sup ρ ∈S r,d E ρ h k ˆ ρ ∗ n − ρ k i ≥ c r d n (37)for some c > (Theorem 10 in [47]). It is also shown in [47] that (37) is attained, up to a different constantand logarithmic terms, by an entropy penalized least squares type estimator based on measurement of ⊗ ni =1 E i ,even when the rank r is unknown. 
Analogous optimal rates for $d\times d$ mixed states $\rho$ with Pauli measurements, but under sparsity assumptions on the entries of the matrix $\rho$, have been obtained in [16].
Returning to our setting of pure states, where $r=1$ is known, with an infimum over all measurements of $\rho^{\otimes n}$ and corresponding estimators $\hat\rho_n$, according to [35] one has
\[
\inf_{\hat\rho_n}\ \sup_{\rho\in\mathcal{S}_{1,d}} E_\rho\left[\|\hat\rho_n-\rho\|_1^2\right] = \frac{4(d-1)}{d+n} \tag{38}
\]
and the bound is attained by an estimator of the pure state $\rho$ based on the covariant measurement (cf. equation (30)). Comparing (37) for $r=1$ and $d_n\to\infty$, $d_n = o(n)$ with (38), we find that the latter bound is of order $d_n/n$ whereas the former is of order $d_n^2/n$. It means that for estimation of finite dimensional pure states, estimators based on the Pauli type measurement $\otimes_{i=1}^n E_i$ do not attain the optimal rate when $d_n\to\infty$. It may be conjectured that the same holds for the optimal rate over $\rho\in S^\alpha(L)$, i.e. our rate of Theorem 5.1.
We emphasize again that our results establish lower asymptotic risk bounds over all quantum measurements and estimators, whereas lower risk bounds within one specific measurement scheme [46], [47], [16] are essentially results of non-quantum classical statistics.
Separate measurements.
A notable fact is also that $\otimes_{i=1}^n E_i$ is a separate (or local) measurement, i.e. it produces independent random variables (or random elements) $Y_1,\dots,Y_n$, each based on a measurement of a copy of $\rho$, whereas the covariant measurement (cp. (30)) we used to attain our risk bound of Theorem 5.1 is of collective (or joint) type with regard to the product $\rho^{\otimes n}$. Separate measurements are of interest from a practical point of view since collective measurements of large quantum systems may be unfeasible in implementations [51]. In [5] it is shown that for fixed $d=2$, the bound (38) can be attained asymptotically as $n\to\infty$ (up to a factor $1+o(1)$) by a separate measurement of $\rho^{\otimes n}$; it is an open question whether in our infinite dimensional setting, the optimal rate of Theorem 5.1 can be attained by a separate measurement. For mixed qubits ($d=2$), an asymptotic efficiency gap between separate and collective measurements is known to exist [4].
Quadratic functionals.
The elbow phenomenon.
The change of regime which occurs in the optimal MSE rate $\eta_n^2$ in (32) has been described as the elbow phenomenon in the literature [15]. In the classical Gaussian sequence model, it takes the following shape. Consider observations introduced in (17):
\[
y_j = \vartheta_j + n^{-1/2}\xi_j, \qquad j = 1,2,\dots,
\]
where $\{\xi_j\}$ are i.i.d. standard normal, and the parameter $\vartheta = (\vartheta_j)_{j=1}^\infty$ satisfies a restriction $\sum_{j=1}^\infty j^{2\alpha}\vartheta_j^2\le L$ for some $\alpha>0$. For estimation of the quadratic functional $\tilde F(\vartheta) = \sum_{j=1}^\infty j^{2\beta}\vartheta_j^2$ with $\beta<\alpha$, the minimax MSE rate of convergence is
\[
\tilde\eta_n^2 = \begin{cases} n^{-1} & \text{if } \alpha\ge 2\beta+1/4,\\ n^{-8(\alpha-\beta)/(4\alpha+1)} & \text{if } \beta<\alpha<2\beta+1/4 \end{cases}
\;=\; n^{-\tilde r} \quad\text{for } \tilde r = \min\left(1,\frac{8(\alpha-\beta)}{4\alpha+1}\right)
\]
(cf. [45] and references cited therein). The same rate holds for estimation of the squared $L_2$-norm of the $\beta$-th derivative of a density in an $\alpha$-Hölder class, cf. [7]. Comparing with our rate $\eta_n^2$ in (32), which can be written $\eta_n^2 = n^{-r}$ for $r = \min\left(1,\frac{8(\alpha-\beta)}{4\alpha}\right)$, we see that both rates exhibit the elbow phenomenon, but at different critical values for $(\alpha,\beta)$, and the rate for the quantum case is slightly faster in the region $\alpha<2\beta+1/4$.
A tail functional of a discrete distribution.
Our method of proof for the optimal rate $\eta_n^2 = n^{-r}$ shows that it is also the optimal rate in the following non-quantum problem: suppose $P = \{p_j\}_{j=0}^\infty$ is a probability measure on the nonnegative integers, satisfying a restriction $\sum_{j=0}^\infty j^{2\alpha}p_j\le L$, and the aim is to estimate the linear functional $F(P) = \sum_{j=0}^\infty j^{2\beta}p_j$ (which might be called a linear tail functional) on the basis of n i.i.d. observations $X_1,\dots,X_n$ having law $P$. Indeed, Theorem 5.4 shows that the estimator $\hat F_n = \sum_{j=0}^N j^{2\beta}\hat p_j$ with $\hat p_j = n^{-1}\sum_{i=1}^n I(X_i=j)$ attains the rate $\eta_n^2$ for mean square error, for an appropriate choice of $N$. On the other hand, the observations $X_1,\dots,X_n$ are obtained from one specific measurement in the quantum model (20), in such a way that $p_j = |\psi_j|^2$ for $j\ge 0$ and $F(P) = F(\psi)$. If the rate $\eta_n^2$ is unimprovable in the quantum model then it certainly is in the present derived (less informative) classical model. In the latter model, we note that since $F(P)$ is linear and the law $P$ is restricted to a convex body, optimality of the rate $\eta_n^2$ can be confirmed by standard methods, e.g. based on the concept of modulus of continuity [19]. The current problem is thus an example where the elbow phenomenon is present for estimation of a linear functional; a specific feature here is that the probability measure $P$ is discrete.
Fuzzy quantum hypotheses.
Our method of proof of the lower bound for quadratic functionals, which works in the approximating quantum Gaussian model, utilizes the well-known idea of setting up two prior distributions and then invoking a testing bound between simple hypotheses. This has been described as the method of fuzzy hypotheses in the literature [64]. A summary of the present quantum variant could be as follows. First, the Gaussian quantum model is represented in a fashion analogous to the classical sequence model (17), where the $\vartheta_j$ correspond to the displacement parameters $u_j$ in certain Gaussian pure states (the coherent states). These displacement parameters are then assumed to be random, independent and non-identically distributed normal, for $j=1,\dots,N$ where $N=o(n)$. Now Gaussian averaging over the displacements $u_j$ leads to certain non-pure Gaussian states, i.e. the thermal states as the alternative, which happen to commute with the vacuum pure state (corresponding to $u_j=0$) as the null hypothesis. Even though both are again Gaussian states, by commutation the problem is reduced to testing between two ordinary discrete probability distributions, i.e. the point mass at 0 and a certain geometric distribution with parameter $r_j$, depending on $j=1,\dots,N$. The combined error probability for this classical testing problem with $N$ independent observations gives the lower risk bound.
Nonparametric testing.
The separation rate $n^{-1/2}$. Recall that for the classical Gaussian sequence model (17), for the testing problem
\[
H_0: \vartheta = 0, \qquad H_1(\varphi_n): \sum_{j=1}^\infty j^{2\alpha}\vartheta_j^2\le L \ \text{ and } \ \|\vartheta\|_2\ge\varphi_n \tag{39}
\]
(Sobolev ellipsoid with an $L_2$-ball removed), the separation rate is $\varphi_n = n^{-2\alpha/(4\alpha+1)}$ [41]. We established that $\varphi_n = n^{-1/2}$ is the separation rate for the quantum nonparametric testing problem (34) involving a pure state $\rho_0$. While this "parametric" rate for a nonparametric problem is somewhat surprising, it should be noted that there also exist testing problems for classical i.i.d. data with nonparametric alternative where that separation rate applies; cf. [41], sec. 2.6.2.
In our case, the rate $n^{-1/2}$ appears to be related to the fast rate $\varphi_n = n^{-1}$ in the following nonparametric classical problem: given n i.i.d. observations $X_1,\dots,X_n$ having law $P=\{p_j\}_{j=0}^\infty$ on the nonnegative integers, the hypotheses are
\[
H_0: P=\delta_0 \ \text{(the degenerate law at 0)}, \qquad H_1(\varphi_n): \|P-\delta_0\|_1\ge\varphi_n. \tag{40}
\]
For that, note first that
\[
\|P-\delta_0\|_1 = 1-p_0+\sum_{j=1}^\infty p_j = 2(1-p_0).
\]
The likelihood ratio test for $\delta_0$ against any $P\in H_1(\varphi_n)$ rejects if $\max_{1\le j\le n}X_j>0$, thus it does not depend on $P$. The pertaining sum of error probabilities is
\[
P\left(\max_{1\le j\le n}X_j = 0\right) = p_0^n = \left(1-\frac{\|P-\delta_0\|_1}{2}\right)^n \le \left(1-\frac{\varphi_n}{2}\right)^n
\]
and with a supremum over $P\in H_1(\varphi_n)$, the upper bound is attained. This means that for $\varphi_n = cn^{-1}$, the minimax sum of error probabilities tends to $\exp(-c/2)$, so that $\varphi_n = n^{-1}$ is the separation rate here as claimed.
In fact there is a direct connection to the quantum nonparametric testing problem (34): in the latter, for $n=1$, consider a measurement defined as follows. Let $\{|\tilde e_j\rangle\}_{j=0}^\infty$ be an orthonormal basis in $\mathcal{H}$ such that $\rho_0 = |\tilde e_0\rangle\langle\tilde e_0|$ and consider the POVM $\{|\tilde e_j\rangle\langle\tilde e_j|\}_{j=0}^\infty$; the corresponding measurement yields a probability measure $P$ on the nonnegative integers.
Here the state $\rho_0$ is mapped into $\delta_0$ and an alternative state $\rho$ is mapped into $P=\{p_j\}_{j=0}^\infty$ such that $p_0 = \mathrm{Tr}(\rho_0\rho)$. Condition (33) on the distance of the two states implies (cp. (12))
\[
\varphi_n \le \|\rho-\rho_0\|_1 = 2\sqrt{1-\mathrm{Tr}(\rho_0\rho)} = 2\sqrt{1-p_0} = \sqrt{2\|P-\delta_0\|_1},
\]
so that up to a constant, the testing problem (40) is obtained (with $\varphi_n^2/2$ in place of $\varphi_n$).
In the quantum problem (34), we noted that the optimal test between $\rho_0$ and a specific alternative $\rho$ depends on $\rho$, but found that the test (binary POVM) $M_n = \{\rho_0^{\otimes n}, I-\rho_0^{\otimes n}\}$ is minimax optimal in the sense of the rate and also in the sense of a sharp risk asymptotics. The sharp minimax optimality seems to be a specific result for the quantum case. We note that the optimal test $M_n$ can be realized via a measurement $\{|\tilde e_j\rangle\langle\tilde e_j|\}_{j=0}^\infty$ as described above, applied separately to each component of $\rho^{\otimes n}$, resulting in independent identically distributed r.v.'s $X_1,\dots,X_n$. The test $M_n$ then amounts to rejecting $H_0$ if $\max_{1\le j\le n}X_j>0$. Note that this measurement is incompatible with the one (30) providing the optimal rate for state estimation.
Other separation rates.
In our proof of the lower bound for quadratic functionals, we formulated the nonparametric testing problem for pure states (50) where the alternative includes the restriction $\sum_{j\ge 1}|\psi_j|^2 j^{2\beta}\ge\eta_n$, and established that the rate $\eta_n = n^{-1+\beta/\alpha}$ is unimprovable there. Introduce a seminorm
\[
\|\psi\|_{2,\beta} = \left(\sum_{j\ge 1}|\psi_j|^2 j^{2\beta}\right)^{1/2}
\]
(excluding the term for $j=0$) and write the restriction as
\[
\|\psi\|_{2,\beta}\ge\varphi_n = \eta_n^{1/2}; \tag{41}
\]
then the case $\beta=0$ gives (cp. (12))
\[
\varphi_n^2 \le \sum_{j\ge 1}|\psi_j|^2 = 1-|\psi_0|^2 = 1-|\langle\psi|e_0\rangle|^2 = \frac{1}{4}\left\||e_0\rangle\langle e_0| - |\psi\rangle\langle\psi|\right\|_1^2,
\]
in other words, for $\rho_0 = |e_0\rangle\langle e_0|$ and $\rho = |\psi\rangle\langle\psi|$, the restriction (41) is equivalent to $\|\rho-\rho_0\|_1\ge 2\varphi_n$. In that sense, the testing problems (34) and (50) are equivalent up to a constant, if $\beta=0$ and $\rho_0 = |e_0\rangle\langle e_0|$. For $\beta>0$, the testing problem (50) is a quantum pure state analog of the generalization of the classical problem (39) where $\|\vartheta\|_2\ge\varphi_n$ is replaced by $\|\vartheta\|_{2,\beta}\ge\varphi_n$ ($\alpha$-ellipsoid with a $\beta$-ellipsoid removed); the separation rate in the latter is $\varphi_n = n^{-2(\alpha-\beta)/(4\alpha+1)}$, cf. [41], sec. 6.2.1. In (50) the separation rate is $\varphi_n = n^{-1/2+\beta/(2\alpha)}$, i.e. of the more typical nonparametric form as well.
References
[1] Artiles, L., Gill, R., and Guţă, M. An invitation to quantum tomography.
J. Royal Statist. Soc. B (Methodological), 67:109–134, 2005.
[2] K. M. R. Audenaert, J. Calsamiglia, R. Muñoz Tapia, E. Bagan, Ll. Masanes, A. Acin, and F. Verstraete. Discriminating states: The quantum Chernoff bound. Phys. Rev. Lett., 98:160501, 2007.
[3] K. M. R. Audenaert, M. Nussbaum, A. Szkola, and F. Verstraete. Asymptotic error rates in quantum hypothesis testing. Commun. Math. Phys., 279:251–283, 2008.
[4] E. Bagan, M. A. Ballester, R. D. Gill, R. Muñoz Tapia, and O. Romero-Isart. Separable measurement estimation of density matrices and its fidelity gap with collective protocols. Phys. Rev. Lett., 97:130501, Sep 2006.
[5] E. Bagan, A. Monras, and R. Muñoz Tapia. Comprehensive analysis of quantum pure-state estimation for two-level systems. Phys. Rev. A, 71:062318, Jun 2005.
[6] O. E. Barndorff-Nielsen, Gill, R., and Jupp, P. E. On quantum statistical inference (with discussion). J. R. Statist. Soc. B, 65:775–816, 2003.
[7] P. J. Bickel and Y. Ritov. Estimating integrated squared density derivatives: sharp best order of convergence estimates. Sankhyā Ser. A, 50(3):381–393, 1988.
[8] B. Bongioanni and J. L. Torrea. Sobolev spaces associated to the harmonic oscillator. Proc. Indian Acad. Sci. Math. Sci., 116(3):337–360, 2006.
[9] Bruno Bongioanni. Sobolev spaces diversification. Rev. Un. Mat. Argentina, 52(2):23–34, 2011.
[10] S. L. Braunstein and Caves C. M. Statistical distance and the geometry of quantum states. Phys. Rev. Lett., 72:3439–3443, 1994.
[11] Lawrence D. Brown, T. Tony Cai, Mark G. Low, and Cun-Hui Zhang. Asymptotic equivalence theory for nonparametric regression with random design. Ann. Statist., 30(3):688–707, 2002. Dedicated to the memory of Lucien Le Cam.
[12] Lawrence D. Brown and Mark G. Low. Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist., 24(6):2384–2398, 1996.
[13] C. Butucea, M. Guţă, and L. Artiles. Minimax and adaptive estimation of the Wigner function in quantum homodyne tomography with noisy data. Annals of Statistics, 35:465–494, 2007.
[14] C. Butucea, M. Guţă, and T. Kypraios. Spectral thresholding quantum tomography for low rank states. New Journal of Physics, 17:113050, 2015.
[15] T. Tony Cai and Mark G. Low. Nonquadratic estimators of a quadratic functional. Ann. Statist., 33(6):2930–2956, 2005.
[16] Tony Cai, Donggyu Kim, Yazhen Wang, Ming Yuan, and Harrison H. Zhou. Optimal large-scale quantum state tomography with Pauli measurements. Ann. Statist., 44(2):682–712, 2016.
[17] A. Carpentier, J. Eisert, D. Gross, and R. Nickl. Uncertainty Quantification for Matrix Compressed Sensing and Quantum Tomography Problems. arXiv:1504.03234.
[18] M. Christandl and R. Renner. Reliable quantum state tomography. Phys. Rev. Lett., 109:120403, 2012.
[19] David L. Donoho and Richard C. Liu. Geometrizing rates of convergence ii,iii. Ann. Statist., 19(2):633–667, 668–701, 1991.
[20] M. S. Ermakov. Minimax detection of a signal in Gaussian white noise. Teor. Veroyatnost. i Primenen., 35(4):704–715, 1990.
[21] P. Faist and R. Renner. Practical and reliable error bars in quantum tomography. Phys. Rev. Lett., 117:010404, 2016.
[22] S. T. Flammia, D. Gross, Y.-K. Liu, and J. Eisert. Quantum Tomography via Compressed Sensing: Error Bounds, Sample Complexity, and Efficient Estimators. New. J. Phys., 14:095022, 2012.
[23] C. A. Fuchs and J. van de Graaf. Cryptographic distinguishability measures for quantum-mechanical states.
IEEE Transactions Information Theory, 45, 1999.
[24] R. D. Gill and S. Massar. State estimation for large ensembles. Phys. Rev. A, 61:042312, 2000.
[25] Richard D. Gill and Mădălin I. Guţă. On asymptotic quantum statistical inference. In From probability to statistics and back: high-dimensional models and processes, volume 9 of Inst. Math. Stat. (IMS) Collect., pages 105–127. Inst. Math. Statist., Beachwood, OH, 2013.
[26] Georgi K. Golubev, Michael Nussbaum, and Harrison H. Zhou. Asymptotic equivalence of spectral density estimation and Gaussian white noise. Ann. Statist., 38(1):181–214, 2010.
[27] Ion Grama and Michael Nussbaum. Asymptotic equivalence for nonparametric generalized linear models. Probab. Theory Related Fields, 111(2):167–214, 1998.
[28] D. Gross. Recovering Low-Rank Matrices From Few Coefficients in Any Basis. IEEE Transactions on Information Theory, 57:1548–1566, 2011.
[29] D. Gross, Y.-K. Liu, S.T. Flammia, S. Becker, and J. Eisert. Quantum State Tomography via Compressed Sensing. Physical Review Letters, 105:150401, 2010.
[30] M. Guţă, B. Janssens, and J. Kahn. Optimal estimation of qubit states with continuous time measurements. Commun. Math. Phys., 277:127–160, 2008.
[31] M. Guţă and J. Kahn. Local asymptotic normality for qubit states. Phys. Rev. A, 73(5):052108, 2006.
[32] M. Guta and J. Kahn. Local asymptotic normality and optimal estimation for d-dimensional quantum systems. In Quantum Stochastics and Information: statistics, filtering and control, pages 300–322. World Scientific, 2008.
[33] Mădălin Guţă and Anna Jenčová. Local asymptotic normality in quantum statistics. Comm. Math. Phys., 276(2):341–379, 2007.
[34] H. Häffner, W. Hänsel, C. F. Roos, J. Benhelm, D. Chek-al kar, M. Chwalla, T. Körber, U. D. Rapol, M. Riebe, P. O. Schmidt, C. Becher, O. Gühne, W. Dür, and R. Blatt. Scalable multiparticle entanglement of trapped ions. Nature, 438:643, 2005.
[35] M. Hayashi. Asymptotic estimation theory for a finite-dimensional pure state model. Journal of Physics A: Mathematical and General, 31(20), 1998.
[36] Masahito Hayashi, editor. Asymptotic theory of quantum statistical inference: selected papers. World Scientific, 2005.
[37] Carl W. Helstrom. Quantum Detection and Estimation Theory. Academic Press, New York, 1976.
[38] F. Hiai and D. Petz. The proper formula for relative entropy and its asymptotics in quantum probability. Commun. Math. Phys., 143:99–114, 1991.
[39] A. S. Holevo. Probabilistic and Statistical Aspects of Quantum Theory. North-Holland, 1982.
[40] Yuri Ingster and Natalia Stepanova. Estimation and detection of functions from anisotropic Sobolev classes. Electron. J. Stat., 5:484–506, 2011.
[41] Yuri I. Ingster and Irina A. Suslina. Nonparametric goodness-of-fit testing under Gaussian models, volume 169 of Lecture Notes in Statistics. Springer-Verlag, New York, 2003.
[42] P. Ji and M. Nussbaum. Sharp minimax adaptation over Sobolev ellipsoids in nonparametric testing. To appear, Electron. J. Stat.
[43] J. Kahn and M. Guţă. Local asymptotic normality for finite dimensional quantum systems.
Commun. Math. Phys., 289:597–652, 2009.
[44] Vladislav Kargin. On the Chernoff bound for efficiency of quantum hypothesis testing. Ann. Statist., 33(2):959–976, 2005.
[45] Jussi Klemelä. Sharp adaptive estimation of quadratic functionals. Probab. Theory Related Fields, 134(4):539–564, 2006.
[46] Vladimir Koltchinskii. Von Neumann entropy penalization and low-rank matrix estimation. Ann. Statist., 39(6):2936–2973, 2011.
[47] Vladimir Koltchinskii and Dong Xia. Optimal estimation of low rank density matrices. J. Mach. Learn. Res., 16:1757–1792, 2015.
[48] L. Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer Verlag, New York, 1986.
[49] Ulf Leonhardt. Measuring the Quantum State of Light. Cambridge University Press, 1997.
[50] K. Li. Discriminating quantum states: The multiple Chernoff distance. Ann. Statist., 44:1661–1679, 2016.
[51] Ranjith Nair, Saikat Guha, and Si-Hui Tan. Realizable receivers for discriminating coherent and multicopy quantum states near the quantum limit. Phys. Rev. A, 89:032318, Mar 2014.
[52] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, Cambridge, 2000.
[53] M. Nussbaum and A. Szkola. The Chernoff lower bound for symmetric quantum hypothesis testing. Ann. Statist., 37:1040–1057, 2006.
[54] Michael Nussbaum. Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Statist., 24(6):2399–2430, 1996.
[55] T. Ogawa and H. Nagaoka. Strong converse and Stein’s lemma in quantum hypothesis testing. IEEE Transactions on Information Theory, 46:2428–2433, 2000.
[56] M. G. A. Paris and J. Řeháček, editors. Quantum State Estimation, 2004.
[57] K. R. Parthasarathy. An Introduction to Quantum Stochastic Calculus. Modern Birkhäuser Classics. Birkhäuser/Springer Basel AG, Basel, 1992.
[58] D. Petz and Jencova, A. Sufficiency in quantum statistical inference. Commun. Math. Phys., 263:259–276, 2006.
[59] Markus Reiß. Asymptotic equivalence for nonparametric regression with multivariate and random design. Ann. Statist., 36(4):1957–1982, 2008.
[60] Angelika Rohde. On the asymptotic equivalence and rate of convergence of nonparametric regression and Gaussian white noise. Statist. Decisions, 22(3):235–243, 2004.
[61] H. Strasser. Mathematical Theory of Statistics. de Gruyter, 1985.
[62] K. Temme and F. Verstraete. Quantum chi-squared and goodness of fit testing. J. Math. Phys., 56:012202, 2015.
[63] E. Torgersen. Comparison of Statistical Experiments. Cambridge University Press, 1991.
[64] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York, 2009.
[65] M. Walter and J. M. Renes. Lower bounds for quantum parameter estimation.
IEEE Transactions onInformation Theory , 60:8007–8023, 2014. 6.
Proofs
Proof of Theorem 5.2.
Let us denote by
\[
R^E_n = \inf_{\hat\psi_n} \sup_{\psi \in S^\alpha(L)} E_\psi\left[ \|\hat\psi_n - \psi\|^2 \right]
\]
the minimax risk.

The first step is to reduce the set of states $S^\alpha(L)$ to a finite hypercube denoted $S^\alpha_N(L)$ consisting of certain "truncated" vectors $|\psi\rangle = \sum_{1 \le i \le N} \psi_i |e_i\rangle$ which have $N \asymp n^{1/(2\alpha+1)}$ non-zero coefficients with respect to the standard basis. This will provide a lower bound to the minimax risk. The coefficients are chosen as
\[
\psi_j = \pm \frac{\sigma_j}{\sqrt{n}}, \qquad \sigma_j^2 = \lambda \left(1 - (j/N)^{2\alpha}\right), \qquad j = 1, \dots, N,
\]
for some fixed $\lambda > 0$, and we check that they satisfy the ellipsoid constraint
\[
\sum_{j \ge 1} |\psi_j|^2 j^{2\alpha} = \frac{\lambda}{n} \sum_{j=1}^N \left( j^{2\alpha} - j^{4\alpha} N^{-2\alpha} \right) \le \frac{N^{2\alpha+1}}{n} \cdot \frac{2\alpha\lambda}{(2\alpha+1)(4\alpha+1)} (1 + o(1)) \le L
\]
for an appropriate choice of $\lambda > 0$.

Using the factorisation property (9) we can identify the corresponding Gaussian states with the $N$-mode state defined by $|\phi\rangle = \otimes_{j=1}^N |G(\sqrt{n}\psi_j)\rangle$, where the remaining modes are in the vacuum state and can be ignored. Thus
\[
R^E_n \ge \inf_{\hat\psi} \sup_{\psi \in S^\alpha_N(L)} E_\psi\left[ \|\hat\psi - \psi\|^2 \right] = \inf_{\hat\psi} \sup_{\psi \in S^\alpha_N(L)} E_\psi \sum_{j=1}^N |\hat\psi_j - \psi_j|^2 .
\]
The supremum over the finite hypercube $S^\alpha_N(L)$ is bounded from below by the average over all its elements. This turns the previous maximal risk into a Bayesian risk, that we can further bound from below as follows:
\[
\begin{aligned}
R^E_n &\ge \inf_{\hat\psi} \frac{1}{2^N} \sum_{\psi \in S^\alpha_N(L)} \sum_{j=1}^N E_\psi\left[ |\hat\psi_j - \psi_j|^2 \right] \\
&= \inf_{\hat\psi} \sum_{j=1}^N \frac{1}{2^N} \sum_{\psi \in S^\alpha_N(L)} E_\psi\left[ |\hat\psi_j - \psi_j|^2 \right] \\
&\ge \sum_{j=1}^N \inf_{\hat\psi_j} \frac{1}{2^N} \sum_{\psi \in S^\alpha_N(L)} E_\psi\left[ |\hat\psi_j - \psi_j|^2 \right].
\end{aligned} \tag{42}
\]
In the second line $\hat\psi$ is the result of an arbitrary measurement and estimation procedure of the state $|G(\sqrt{n}\psi)\rangle$. In the third line each infimum is over procedures for estimating the component $\psi_j$ only; since such procedures may not be compatible with a single measurement, the third line is upper bounded by the second.

The second major step in the proof of the lower bounds is to reduce the risk over all measurements to testing two simple hypotheses.
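The ellipsoid computation above can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names and the parameter values ($\alpha = 1$, $\lambda = 0.5$) are illustrative assumptions. It compares the exact sum $\sum_j |\psi_j|^2 j^{2\alpha}$ with the leading term $\frac{N^{2\alpha+1}}{n} \cdot \frac{2\alpha\lambda}{(2\alpha+1)(4\alpha+1)}$.

```python
def ellipsoid_sum(n, N, alpha, lam):
    """Exact value of sum_j |psi_j|^2 j^(2 alpha) with |psi_j|^2 = sigma_j^2 / n
    and sigma_j^2 = lam * (1 - (j/N)^(2 alpha))."""
    total = 0.0
    for j in range(1, N + 1):
        sigma2 = lam * (1.0 - (j / N) ** (2 * alpha))
        total += sigma2 / n * j ** (2 * alpha)
    return total


def ellipsoid_leading_term(n, N, alpha, lam):
    """Leading-order value N^(2a+1)/n * 2 a lam / ((2a+1)(4a+1))."""
    return (N ** (2 * alpha + 1) / n
            * 2 * alpha * lam / ((2 * alpha + 1) * (4 * alpha + 1)))


if __name__ == "__main__":
    alpha, lam = 1.0, 0.5  # illustrative values, not taken from the paper
    for n in (10**3, 10**6, 10**9):
        N = round(n ** (1.0 / (2 * alpha + 1)))  # N of order n^(1/(2 alpha + 1))
        print(n, N,
              ellipsoid_sum(n, N, alpha, lam),
              ellipsoid_leading_term(n, N, alpha, lam))
```

With $N \asymp n^{1/(2\alpha+1)}$ the leading term stays bounded in $n$, so the constraint can be met by taking $\lambda$ small enough.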
Let us bound from below the term (42) for arbitrary fixed $j$ between 1 and $N$:
\[
\begin{aligned}
\frac{1}{2^N} \sum_{\psi \in S^\alpha_N(L)} E_\psi\left[ |\hat\psi_j - \psi_j|^2 \right]
&= \frac12 \cdot \frac{1}{2^{N-1}} \sum_{\psi \in S^\alpha_{(j+)}(L)} E_\psi\left[ |\hat\psi_j - \sigma_j/\sqrt{n}|^2 \right]
+ \frac12 \cdot \frac{1}{2^{N-1}} \sum_{\psi \in S^\alpha_{(j-)}(L)} E_\psi\left[ |\hat\psi_j - (-\sigma_j/\sqrt{n})|^2 \right] \\
&= \frac12 \left\{ E_{\rho_j^+}\left[ |\hat\psi_j - \sigma_j/\sqrt{n}|^2 \right] + E_{\rho_j^-}\left[ |\hat\psi_j - (-\sigma_j/\sqrt{n})|^2 \right] \right\},
\end{aligned} \tag{43}
\]
where the sum over $\psi \in S^\alpha_{(j\pm)}(L)$ means that the $j$th coordinate is fixed to $\pm\sigma_j/\sqrt{n}$ and all $k$th coordinates, for $k \ne j$, take values in $\{\sigma_k/\sqrt{n}, -\sigma_k/\sqrt{n}\}$. In the last line, we denote by $\rho_j^\pm$ the average state over states in $S^\alpha_{(j\pm)}(L)$.

Let us define the testing problem of the two hypotheses $H_0: \rho = \rho_j^+$ against $H_1: \rho = \rho_j^-$. For a given estimator $\hat\psi_j$ we construct the test
\[
\Delta = I\left( \left| \hat\psi_j - \frac{\sigma_j}{\sqrt{n}} \right| > \left| \hat\psi_j - \left(-\frac{\sigma_j}{\sqrt{n}}\right) \right| \right),
\]
and decide $H_1$ or $H_0$ if $\Delta$ equals 1 or 0, respectively. By the Markov inequality, we get that
\[
E_{\rho_j^\pm}\left[ \left| \hat\psi_j - \left(\pm\frac{\sigma_j}{\sqrt{n}}\right) \right|^2 \right] \ge \frac{\sigma_j^2}{n} P_{\rho_j^\pm}\left( \left| \hat\psi_j - \left(\pm\frac{\sigma_j}{\sqrt{n}}\right) \right| \ge \frac{\sigma_j}{\sqrt{n}} \right).
\]
On the one hand,
\[
P_{\rho_j^+}\left( |\hat\psi_j - \sigma_j/\sqrt{n}| \ge \frac{\sigma_j}{\sqrt{n}} \right) \ge P_{\rho_j^+}(\Delta = 1). \tag{44}
\]
Indeed, under $P_{\rho_j^+}$, the event $\Delta = 1$ implies that $|\hat\psi_j - \frac{\sigma_j}{\sqrt{n}}| > |\hat\psi_j + \frac{\sigma_j}{\sqrt{n}}|$, which further implies by the triangle inequality that
\[
\left| \hat\psi_j - \frac{\sigma_j}{\sqrt{n}} \right| \ge \frac{2\sigma_j}{\sqrt{n}} - \left| \hat\psi_j + \frac{\sigma_j}{\sqrt{n}} \right| \ge \frac{2\sigma_j}{\sqrt{n}} - \left| \hat\psi_j - \frac{\sigma_j}{\sqrt{n}} \right|,
\]
giving $|\hat\psi_j - \psi_j| \ge \frac{\sigma_j}{\sqrt{n}}$. By a similar reasoning for the $P_{\rho_j^-}$ distribution we get
\[
P_{\rho_j^-}\left( |\hat\psi_j + \sigma_j/\sqrt{n}| \ge \frac{\sigma_j}{\sqrt{n}} \right) \ge P_{\rho_j^-}(\Delta = 0). \tag{45}
\]
By using (44) and (45) in (43),
\[
\frac12 \left\{ E_{\rho_j^+}\left[ \left| \hat\psi_j - \sigma_j/\sqrt{n} \right|^2 \right] + E_{\rho_j^-}\left[ \left| \hat\psi_j - (-\sigma_j/\sqrt{n}) \right|^2 \right] \right\} \ge \frac{\sigma_j^2}{2n} \left( P_{\rho_j^+}(\Delta = 1) + P_{\rho_j^-}(\Delta = 0) \right).
\]
To summarise, we have lower bounded the MSE by the probability of error for testing between the states $\rho_j^\pm$. At closer inspection, these states are of the form $|G(\sigma_j)\rangle\langle G(\sigma_j)| \otimes \rho$ and $|G(-\sigma_j)\rangle\langle G(-\sigma_j)| \otimes \rho$, where $\rho$ is a fixed state obtained by averaging the coherent states of all the modes except $j$. Recall that the optimal testing error in (11) gives a further bound from below
\[
P_{\rho_j^+}(\Delta = 1) + P_{\rho_j^-}(\Delta = 0) \ge 1 - \frac12 \|\rho_j^+ - \rho_j^-\|_1 .
\]
Moreover, the state $\rho$ can be dropped without changing the optimal testing error. By the pure-state formula (61) and the overlap $|\langle G(\sigma_j) | G(-\sigma_j) \rangle|^2 = \exp(-4\sigma_j^2)$ of two coherent states,
\[
\|\rho_j^+ - \rho_j^-\|_1 = \left\| |G(\sigma_j)\rangle\langle G(\sigma_j)| - |G(-\sigma_j)\rangle\langle G(-\sigma_j)| \right\|_1 = 2\sqrt{1 - \exp(-4\sigma_j^2)} .
\]
Since $1 - \sqrt{1 - x} \ge x/2$ for $x \in [0, 1]$, we conclude that
\[
\inf_{\hat\psi_j} \frac12 \left\{ E_{\rho_j^+}\left[ \left| \hat\psi_j - \sigma_j/\sqrt{n} \right|^2 \right] + E_{\rho_j^-}\left[ \left| \hat\psi_j - (-\sigma_j/\sqrt{n}) \right|^2 \right] \right\} \ge \frac{\sigma_j^2}{4n} \cdot \exp(-4\sigma_j^2),
\]
and we further use this in (43) and (42) to get
\[
R^E_n \ge \sum_{j=1}^N \frac{\sigma_j^2}{4n} \cdot \exp(-4\sigma_j^2) = \frac{N}{n} \cdot \frac{\lambda}{4N} \sum_{j=1}^N \left( 1 - \left(\frac{j}{N}\right)^{2\alpha} \right) \exp\left( -4\lambda \left( 1 - \left(\frac{j}{N}\right)^{2\alpha} \right) \right) \ge c\, \frac{N}{n} .
\]
Indeed, the average over $j$ is the Riemann sum associated to the integral of a positive function and can be bounded from below by some constant $c > 0$ depending on $\alpha$. Moreover, $N/n \asymp n^{-2\alpha/(2\alpha+1)}$ and thus we finish the proof of the theorem. $\square$

Proof of Theorem 5.3.
Let
\[
\tilde R^E_n = \inf_{|\hat\psi_n\rangle} \sup_{|\psi\rangle \in S^\alpha(L)} E_\rho\left[ d^2(\hat\rho_n, \rho) \right]
\]
be the minimax risk for $\mathcal{Q}_n$.

We bound the risk from below by restricting to (pure) states in a neighbourhood $\Sigma_n(e_0)$ of the basis vector $|e_0\rangle$ defined as follows. As in (22) we write the state and the estimator in terms of their corresponding local vectors
\[
|\psi\rangle = \sqrt{1 - \|u\|^2}\, |e_0\rangle + |u\rangle, \qquad |\hat\psi\rangle = \sqrt{1 - \|\hat u\|^2}\, |e_0\rangle + |\hat u\rangle, \qquad |u\rangle, |\hat u\rangle \perp |e_0\rangle .
\]
Then the neighbourhood is given by $\Sigma_n(e_0) := \{ |\psi_u\rangle : \|u\| \le \gamma_n \}$, where $\gamma_n = (n \log n)^{-1/4}$. Such states are described by the local model $\mathcal{Q}_n(e_0, \gamma_n)$, cf. equation (23). The risk is bounded from below by
\[
\tilde R^E_n \ge \inf_{|\hat\psi_n\rangle} \sup_{|\psi\rangle \in S^\alpha(L) \cap \Sigma_n(e_0)} E_\rho\left[ d^2(\hat\rho_n, \rho) \right] .
\]
By using the triangle inequality we can assume that $\hat\psi \in \Sigma_n(e_0)$, while incurring at most a factor 2 in the risk. By using the quadratic approximation (28) we find that
\[
d^2(\hat\rho_n, \rho) = k \|u - \hat u\|^2 + o(n^{-1}) \tag{46}
\]
where $k = 1$ or $k = 4$ depending on which distance we use. Since $n^{-1}$ decreases faster than $n^{-2\alpha/(2\alpha+1)}$, the second term does not contribute to the asymptotic rate and can be neglected, so that the problem has been reduced to that of estimating the local parameter $u$ with respect to the Hilbert space distance. To study the latter, we further restrict the set of states to a hypercube similar to the one in the proof of Theorem 5.2, consisting of states $|\psi_u\rangle$ with "truncated" local vectors $|u\rangle = \sum_{1 \le i \le N} u_i |e_i\rangle$ belonging to $S^\alpha_N(L)$. As before, there are $N \asymp n^{1/(2\alpha+1)}$ non-zero coefficients of the form
\[
u_j = \pm \frac{\sigma_j}{\sqrt{n}}, \qquad \sigma_j^2 = \lambda \left(1 - (j/N)^{2\alpha}\right), \qquad j = 1, \dots, N .
\]
It has already been shown that such vectors belong to the ellipsoid $S^\alpha(L)$. Additionally, we show that they also belong to the local ball $\Sigma_n(e_0)$.
Indeed
\[
\|u\|^2 = \sum_{j=1}^N |u_j|^2 = \frac1n \sum_{j=1}^N \sigma_j^2 = \frac1n \sum_{j=1}^N \lambda \left( 1 - (j/N)^{2\alpha} \right) = \frac{N}{n} \left( \frac1N \sum_{j=1}^N \lambda \left( 1 - (j/N)^{2\alpha} \right) \right) \le C\, \frac{N}{n},
\]
where we used that as $N \to \infty$ the expression between the parentheses tends to a finite integral. As $N$ scales as $n^{1/(2\alpha+1)}$, the upper bound becomes
\[
\|u\|^2 \le C n^{-2\alpha/(2\alpha+1)} = o(\gamma_n^2)
\]
and the state $|\psi_u\rangle$ belongs to the local ball $\Sigma_n(e_0)$. Taking into account (46), the risk is therefore lower bounded as
\[
\tilde R^E_n \ge \inf_{\hat u} \sup_{u \in S^\alpha_N(L)} E_{\rho_u}\left[ \|u - \hat u\|^2 \right] + o(n^{-1}),
\]
where $\rho_u = |\psi_u\rangle\langle\psi_u|$, and the infimum is now taken over the local component $|\hat u\rangle$ of an estimator $|\hat\psi\rangle = \sqrt{1 - \|\hat u\|^2}\, |e_0\rangle + |\hat u\rangle$. The first term is further lower bounded by passing to the Bayes risk for the uniform distribution over $S^\alpha_N(L)$, similarly to the proof of Theorem 5.2:
\[
\tilde R^E_n \ge \sum_{j=1}^N \inf_{\hat u_j} \frac{1}{2^N} \sum_{u \in S^\alpha_N(L)} E_{\rho_u}\left[ |\hat u_j - u_j|^2 \right] + o(n^{-1}) .
\]
By following the same steps we get
\[
\begin{aligned}
\frac{1}{2^N} \sum_{u \in S^\alpha_N(L)} E_{\rho_u}\left[ |\hat u_j - u_j|^2 \right]
&= \frac12 \left\{ E_{\tau_j^+}\left[ |\hat u_j - \sigma_j/\sqrt{n}|^2 \right] + E_{\tau_j^-}\left[ |\hat u_j - (-\sigma_j/\sqrt{n})|^2 \right] \right\} \\
&\ge \frac{\sigma_j^2}{2n} \left( P_{\tau_j^+}(\Delta = 1) + P_{\tau_j^-}(\Delta = 0) \right) \ge \frac{\sigma_j^2}{2n} \cdot \left( 1 - \frac12 \|\tau_j^+ - \tau_j^-\|_1 \right),
\end{aligned} \tag{47}
\]
where we denote by $\tau_j^\pm$ the average state over states $|\psi_u\rangle\langle\psi_u|^{\otimes n}$ with $u \in S^\alpha_{(j\pm)}(L)$, and $\Delta$ is a test for the hypotheses $H_0: \tau = \tau_j^+$ and $H_1: \tau = \tau_j^-$. In the last inequality we used the Helstrom bound [37], which expresses the optimal average error probability for discriminating two states in terms of the norm-one distance between the states.

We now make use of the local asymptotic equivalence result in Theorem 4.1. From (25) we know that there exist quantum channels $S_n$ such that
\[
\delta_n := \max_{u \in S^\alpha_N(L)} \left\| |\psi_u\rangle\langle\psi_u|^{\otimes n} - S_n\left( |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)| \right) \right\|_1 \le \Delta(\mathcal{Q}_n, \mathcal{G}_n) = o(1) .
\]
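By Lemma 3.1 we get
\[
\|\tau_j^+ - \tau_j^-\|_1 \le \|\rho_j^+ - \rho_j^-\|_1 + 2\delta_n,
\]
where $\rho_j^\pm$ are the corresponding mixtures in the Gaussian model as defined in the proof of Theorem 5.2. There, $\|\rho_j^+ - \rho_j^-\|_1 = 2\sqrt{1 - \exp(-4\sigma_j^2)}$, and $1 - \sqrt{1 - x} \ge x/2$ for $x \in [0, 1]$. From (47) we then get
\[
\frac{1}{2^N} \sum_{u \in S^\alpha_N(L)} E_{\rho_u}\left[ |\hat u_j - u_j|^2 \right] \ge \frac{\sigma_j^2}{2n} \cdot \left( 1 - \frac12 \|\rho_j^+ - \rho_j^-\|_1 - \delta_n \right) \ge \frac{\sigma_j^2}{2n} \cdot \left( \frac12 \exp(-4\sigma_j^2) - \delta_n \right) .
\]
The rest of the proof follows as in the proof of Theorem 5.2, with the additional remark that $\min_j \exp(-4\sigma_j^2) = \exp\left(-4\lambda(1 - N^{-2\alpha})\right) \ge \exp(-4\lambda)$ is bounded away from zero and is therefore much larger than $\delta_n = o(1)$ for $n$ large enough. $\square$

Proof of Theorem 5.5.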
Denote by
\[
R^F_n = \inf_{\hat F_n} \sup_{\psi \in S^\alpha(L)} \eta_n^{-2} \cdot E_\psi\left( \hat F_n - F(\psi) \right)^2
\]
the minimax risk.

The case a) where $\alpha \ge 2\beta$ reduces to the Cramér-Rao bound, which shows that the parametric rate $1/n$ is always a lower bound for the mean square error for estimating $F(\psi)$.

We prove that in the case b) where $\beta < \alpha < 2\beta$, this bound from below increases to $n^{-2(1 - \beta/\alpha)}$ (up to constants). By the Markov inequality,
\[
\eta_n^{-2} \cdot E_\psi\left( \hat F_n - F(\psi) \right)^2 \ge \frac14 \cdot P_\psi\left( |\hat F_n - F(\psi)| \ge \frac{\eta_n}{2} \right). \tag{48}
\]
Let us restrict the set of pure states $S^\alpha(L)$ to its intersection with the local model $\mathcal{Q}_n(e_0, \gamma_n)$ (see equation (23)), where $|\psi_u\rangle = \sqrt{1 - \|u\|^2} \cdot |e_0\rangle + |u\rangle$ is such that $\|u\| \le \gamma_n$, with $\gamma_n = (n \log n)^{-1/4}$. In other words, $u$ belongs to the set
\[
s^\alpha(L, \gamma_n) = \left\{ u \in \ell^2(\mathbb{N}^*) : \sum_{j \ge 1} |u_j|^2 j^{2\alpha} \le L \ \text{and} \ \|u\| \le \gamma_n \right\} .
\]
Using the fact that $F(e_0) = 0$, we have
\[
\begin{aligned}
\sup_{\psi \in S^\alpha(L)} \frac14 \cdot P_\psi\left( |\hat F_n - F(\psi)| \ge \frac{\eta_n}{2} \right)
&\ge \frac14 \max\left\{ P_{e_0}\left( |\hat F_n| \ge \frac{\eta_n}{2} \right), \sup_{u \in s^\alpha(L, \gamma_n),\, F(\psi_u) \ge \eta_n} P_{\psi_u}\left( |\hat F_n - F(\psi_u)| \ge \frac{\eta_n}{2} \right) \right\} \\
&\ge \frac18 \left\{ P_{e_0}\left( |\hat F_n| \ge \frac{\eta_n}{2} \right) + \sup_{u \in s^\alpha(L, \gamma_n),\, F(\psi_u) \ge \eta_n} P_{\psi_u}\left( |\hat F_n - F(\psi_u)| \ge \frac{\eta_n}{2} \right) \right\} \\
&\ge \frac18 \left\{ P_{e_0}\left( |\hat F_n| \ge \frac{\eta_n}{2} \right) + \sup_{u \in s^\alpha(L, \gamma_n),\, F(\psi_u) \ge \eta_n} P_{\psi_u}\left( |\hat F_n| < \frac{\eta_n}{2} \right) \right\},
\end{aligned} \tag{49}
\]
where in the last inequality we used that $|\hat F_n| < \eta_n/2$ and $F(\psi_u) \ge \eta_n$ imply $|\hat F_n - F(\psi_u)| \ge \eta_n/2$. Note also that $F(\psi_u) = F(u)$ for $|u\rangle \in \mathcal{H}_0$; we now consider the testing problem with hypotheses
\[
\begin{cases}
H_0 : |u\rangle = |0\rangle \\
H_1(\alpha, L, \gamma_n, \eta_n) : |u\rangle, \ \text{with } u \in s^\alpha(L, \gamma_n) \text{ and } F(u) \ge \eta_n .
\end{cases} \tag{50}
\]
Let $\Delta = \Delta(\eta_n) = I(|\hat F_n| \ge \eta_n/2)$ be the test that accepts the null hypothesis when $\Delta = 0$ and rejects the null hypothesis when $\Delta = 1$. Then the right-hand side of (49) is lower bounded by the sum of the error probability of type I and of the maximal error probability of type II of $\Delta$. We can describe $\Delta$ as a binary POVM $M = (M_0, M_1)$, depending on $\eta_n$: $M(\eta_n) = (M_0(\eta_n), M_1(\eta_n))$. Thus,
\[
P_{e_0}\left( |\hat F_n| \ge \frac{\eta_n}{2} \right) = \mathrm{Tr}\left( |e_0\rangle\langle e_0|^{\otimes n} \cdot M_1 \right) \tag{51}
\]
and
\[
P_{\psi_u}\left( |\hat F_n| < \frac{\eta_n}{2} \right) = \mathrm{Tr}\left( |\psi_u\rangle\langle\psi_u|^{\otimes n} \cdot M_0 \right). \tag{52}
\]
By putting together (48)-(52), we get that the minimax risk has the lower bound
\[
R^F_n \ge \frac18 \inf_M \left( \langle e_0^{\otimes n} | M_1 | e_0^{\otimes n} \rangle + \sup_{u \in s^\alpha(L, \gamma_n),\, F(u) \ge \eta_n} \langle \psi_u^{\otimes n} | M_0 | \psi_u^{\otimes n} \rangle \right) .
\]
Now, using the local asymptotic equivalence Theorem 4.1 with respect to the state $|\psi_0\rangle := |e_0\rangle$, we map the i.i.d. ensemble $|\psi_u\rangle^{\otimes n}$ to the Gaussian state $|G(\sqrt{n}u)\rangle \in \mathcal{F}(\mathcal{H}_0)$. The lower bound becomes
\[
R^F_n \ge \frac18 \inf_Q \left( \langle 0 | Q_1 | 0 \rangle + \sup_{u \in s^\alpha(L, \gamma_n),\, F(u) \ge \eta_n} \langle G(\sqrt{n}u) | Q_0 | G(\sqrt{n}u) \rangle \right) + o(1) \tag{53}
\]
where the infimum is taken over tests $Q = (Q_0, Q_1)$ and the $o(1)$ term stems from the vanishing Le Cam distance $\Delta(\mathcal{Q}_n(e_0, \gamma_n), \mathcal{G}_n(e_0, \gamma_n))$. The lower bound has been transformed into a testing problem for the Gaussian model.

In order to bound from below the maximal error probability of type II, we define a prior distribution on the set of alternatives and average over the whole set with respect to this a priori distribution. Similarly to the classical proofs of lower bounds, our construction will lead to a test of simple hypotheses: the former null and the constructed averaged state. Assume that $\{u_j\}_{j \ge 1}$ are all independently distributed, such that $u_j$ has a complex (bivariate) Gaussian distribution $N(0, \frac{\sigma_j^2}{2} \cdot I_2)$ for all $j$ from 1 to $N$, and that $u_j = 0$ for all $j > N$, where $I_2$ is the $2 \times 2$ identity matrix. The $\sigma_j$ are defined as
\[
\sigma_j^2 = \lambda \left( 1 - \left( \frac{j}{N} \right)^{2\alpha} \right)_+, \tag{54}
\]
where $\lambda, N > 0$ are selected such that
\[
\sum_{j \ge 1} j^{2\alpha} \sigma_j^2 = L(1 - \varepsilon) \quad \text{and} \quad \sum_{j \ge 1} j^{2\beta} \sigma_j^2 = n^{-1 + \beta/\alpha} (1 + \varepsilon), \tag{55}
\]
for an arbitrary $\varepsilon > 0$. Let us denote by $\Pi$ the joint prior distribution of $\{u_j\}_{j \ge 1}$.

Such a choice of the prior distribution was first introduced by Ermakov [20] for establishing sharp minimax risk bounds for nonparametric testing in the Gaussian white noise model. This construction represents an analog of the prior distribution used in Pinsker's theory for sharp estimation of functions. In our case, using a Gaussian prior as an alternative hypothesis leads to the well-known Gaussian thermal state.

The essence of this construction is that the random vectors $u = \{u_j\}_{j \ge 1}$ concentrate asymptotically, with probability tending to 1, on the spherical segment
\[
\left\{ u \in \ell^2(\mathbb{N}) : C_0 n^{-1} \le \|u\|^2 \le C_0 n^{-1} (1 + 2\varepsilon') \right\},
\]
for $\varepsilon' > 0$ depending on $\varepsilon$ and some constant $C_0 > 0$ depending on $\alpha$ and $\beta$ described later on, and on the alternative set of hypotheses, $H_1(\alpha, L, \gamma_n, \eta_n)$.
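The selection of $(\lambda, N)$ in (54) and (55) can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the authors' procedure: it treats $N$ as a real cut-off, solves the two constraints in (55) by bisection on $N$, and compares the result with the leading-order solution obtained by replacing the sums with integrals. All function names and the parameter values are hypothetical, and the bisection assumes $n$ is large enough that the ratio of the two targets exceeds 1.

```python
import math


def weighted_sum(N, power, alpha):
    """sum_{j >= 1} j^power * (1 - (j/N)^(2 alpha))_+ for a real cut-off N > 1."""
    return sum(j ** power * (1.0 - (j / N) ** (2 * alpha))
               for j in range(1, math.floor(N) + 1))


def solve_prior(n, alpha, beta, L, eps):
    """Solve the two constraints (55) for (lam, N) by bisection on N.

    The ratio of the two sums does not depend on lam, so N is found first
    and lam is recovered from the first constraint.
    """
    target_a = L * (1.0 - eps)
    target_b = n ** (-1.0 + beta / alpha) * (1.0 + eps)
    ratio = target_a / target_b  # assumed > 1, i.e. n large enough

    def g(N):
        return (weighted_sum(N, 2 * alpha, alpha)
                / weighted_sum(N, 2 * beta, alpha)) - ratio

    lo, hi = 1.5, 3.0
    while g(hi) < 0:  # enlarge the bracket until the root is enclosed
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    N = 0.5 * (lo + hi)
    lam = target_a / weighted_sum(N, 2 * alpha, alpha)
    return lam, N


def asymptotic_prior(n, alpha, beta, L, eps):
    """Leading-order (lam, N), replacing the sums in (55) by integrals."""
    A = 2 * alpha / ((2 * alpha + 1) * (4 * alpha + 1))            # sum ~ A N^(2a+1)
    B = 2 * alpha / ((2 * beta + 1) * (2 * beta + 2 * alpha + 1))  # sum ~ B N^(2b+1)
    target_a = L * (1.0 - eps)
    target_b = n ** (-1.0 + beta / alpha) * (1.0 + eps)
    N = (target_a * B / (target_b * A)) ** (1.0 / (2 * (alpha - beta)))
    lam = target_a / (A * N ** (2 * alpha + 1))
    return lam, N


if __name__ == "__main__":
    # Illustrative values with beta < alpha < 2 beta (case b).
    print(solve_prior(10**8, 2.0, 1.5, 1.0, 0.1))
    print(asymptotic_prior(10**8, 2.0, 1.5, 1.0, 0.1))
```

In this sketch $N$ grows like $n^{1/(2\alpha)}$ and $\lambda$ decays like $n^{-1-1/(2\alpha)}$, so that $\sum_j \sigma_j^2$ is of order $n^{-1}$.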
Note that the spherical segment is included in the set $\|u\| \le \gamma_n$, as $\gamma_n = (n \log n)^{-1/4} \gg n^{-1/2}$. This is proven by the following lemma.

Lemma 6.1.
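A unique solution $(\lambda, N)$ of (54), (55) exists for $n$ large enough and admits an asymptotic expansion with respect to $n$:
\[
\begin{aligned}
\lambda &\sim n^{-1 - 1/(2\alpha)}\, C_\lambda\, \frac{(1 + \varepsilon)^{(\alpha + 1/2)/(\alpha - \beta)}}{(1 - \varepsilon)^{(\beta + 1/2)/(\alpha - \beta)}}, &
C_\lambda &= \frac{\left( (2\beta + 1)(2\beta + 2\alpha + 1) \right)^{(\alpha + 1/2)/(\alpha - \beta)}}{2\alpha \left( L(2\alpha + 1)(4\alpha + 1) \right)^{(\beta + 1/2)/(\alpha - \beta)}}, \\
N &\sim n^{1/(2\alpha)}\, C_N \left( \frac{1 - \varepsilon}{1 + \varepsilon} \right)^{1/(2(\alpha - \beta))}, &
C_N &= \left( \frac{L(2\alpha + 1)(4\alpha + 1)}{(2\beta + 1)(2\beta + 2\alpha + 1)} \right)^{1/(2(\alpha - \beta))} .
\end{aligned} \tag{56}
\]
The independent complex Gaussian random variables $u_j \sim N(0, \frac{\sigma_j^2}{2} I_2)$, with $\sigma_j$'s and $(\lambda, N)$ given in (54), (55), are such that, for an arbitrary $\varepsilon > 0$,
\[
P\left( C_0 n^{-1} \le \sum_{j=1}^N |u_j|^2 \le C_0 n^{-1} (1 + 2\varepsilon') \right) \to 1, \tag{57}
\]
\[
P\left( \sum_{j=1}^N j^{2\alpha} |u_j|^2 \le L \right) \to 1, \tag{58}
\]
\[
P\left( \sum_{j=1}^N j^{2\beta} |u_j|^2 \ge n^{-1 + \beta/\alpha} \right) \to 1, \tag{59}
\]
where $C_0 = C_\lambda \cdot C_N \cdot 2\alpha/(2\alpha + 1)$ is a positive constant depending on $\alpha$ and $\beta$, and $\varepsilon' > 0$ depends only on $\varepsilon$.

Proof of Lemma 6.1. The solution of the problem (54), (55) can be found in [20] (see also [42], Lemma 8) for $\beta = 0$; a similar reasoning applies here. Let us prove that the random variables $\{u_j\}_{j = 1, \dots, N}$ satisfy (57) to (59). We have
\[
\sum_{j=1}^N \sigma_j^2 = \lambda \sum_{j=1}^N \left( 1 - \left( \frac{j}{N} \right)^{2\alpha} \right) \sim \lambda N \frac{2\alpha}{2\alpha + 1} \sim C_\lambda C_N \frac{2\alpha}{2\alpha + 1}\, n^{-1} (1 + \varepsilon)^{\alpha/(\alpha - \beta)} (1 - \varepsilon)^{-\beta/(\alpha - \beta)} = C_0 n^{-1} (1 + \varepsilon'), \tag{60}
\]
where we denote $\varepsilon' = (1 + \varepsilon)^{\alpha/(\alpha - \beta)} (1 - \varepsilon)^{-\beta/(\alpha - \beta)} - 1$, which is positive for all $\varepsilon \in (0, 1)$.

Note that $E|u_j|^2 = \sigma_j^2$ and $\mathrm{Var}\left( |u_j|^2 \right) = \sigma_j^4$. We have
\[
P\left( C_0 n^{-1} \le \sum_{j=1}^N |u_j|^2 \le C_0 n^{-1} (1 + 2\varepsilon') \right) = 1 - P\left( \sum_{j=1}^N |u_j|^2 < C_0 n^{-1} \right) - P\left( \sum_{j=1}^N |u_j|^2 > C_0 n^{-1} (1 + 2\varepsilon') \right) .
\]
Now, by the Markov inequality applied to the squared deviation,
\[
\begin{aligned}
P\left( \sum_{j=1}^N |u_j|^2 < C_0 n^{-1} \right)
&= P\left( \sum_{j=1}^N (|u_j|^2 - \sigma_j^2) < C_0 n^{-1} - C_0 n^{-1} (1 + \varepsilon' + o(1)) \right) \\
&\le P\left( \sum_{j=1}^N (\sigma_j^2 - |u_j|^2) > C_0 n^{-1} (\varepsilon' + o(1)) \right)
\le \frac{\sum_{j=1}^N \mathrm{Var}(|u_j|^2)}{C_0^2 n^{-2} \varepsilon'^2 / 2} \\
&= \frac{2 \sum_{j=1}^N \sigma_j^4}{C_0^2 n^{-2} \varepsilon'^2} \asymp \frac{\lambda^2 N}{C_0^2 n^{-2} \varepsilon'^2} \asymp n^{-1/(2\alpha)} = o(1) .
\end{aligned}
\]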
Moreover,
\[
P\left( \sum_{j=1}^N |u_j|^2 > C_0 n^{-1} (1 + 2\varepsilon') \right) = P\left( \sum_{j=1}^N (|u_j|^2 - \sigma_j^2) > C_0 n^{-1} (\varepsilon' + o(1)) \right) = o(1),
\]
which finishes the proof of (57).

Also, in view of (55), we have
\[
P\left( \sum_{j=1}^N j^{2\alpha} |u_j|^2 > L \right) = P\left( \sum_{j=1}^N j^{2\alpha} (|u_j|^2 - \sigma_j^2) > L\varepsilon \right) \le \frac{\sum_{j=1}^N j^{4\alpha}\, \mathrm{Var}\left( |u_j|^2 \right)}{L^2 \varepsilon^2} = \frac{\sum_{j=1}^N j^{4\alpha} \sigma_j^4}{L^2 \varepsilon^2} \asymp \frac{\lambda^2 N^{4\alpha + 1}}{L^2 \varepsilon^2} \asymp n^{-1/(2\alpha)} = o(1),
\]
proving (58). Also,
\[
P\left( \sum_{j=1}^N j^{2\beta} |u_j|^2 < n^{-1 + \beta/\alpha} \right) = P\left( \sum_{j=1}^N j^{2\beta} (|u_j|^2 - \sigma_j^2) < -n^{-1 + \beta/\alpha} \varepsilon \right) \le \frac{\sum_{j=1}^N j^{4\beta}\, \mathrm{Var}(|u_j|^2)}{n^{-2 + 2\beta/\alpha} \varepsilon^2} = \frac{\sum_{j=1}^N j^{4\beta} \sigma_j^4}{n^{-2 + 2\beta/\alpha} \varepsilon^2} \asymp \frac{\lambda^2 N^{4\beta + 1}}{n^{-2 + 2\beta/\alpha} \varepsilon^2} \asymp n^{-1/(2\alpha)} = o(1),
\]
proving (59). $\square$

Let us go back to (53) and bound from below the maximal error probability of type II by the averaged risk, with respect to our prior measure $\Pi$:
\[
\begin{aligned}
\sup_{u \in s^\alpha(L, \gamma_n),\, F(u) \ge \eta_n} \langle G(\sqrt{n}u) | Q_0 | G(\sqrt{n}u) \rangle
&\ge \int_{H_1(\alpha, L, \gamma_n, \eta_n)} \mathrm{Tr}\left( |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)| \cdot Q_0 \right) \Pi(du) \\
&= \mathrm{Tr}\left( \int |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, \Pi(du) \cdot Q_0 \right) - \int_{H_1(\alpha, L, \gamma_n, \eta_n)^C} \mathrm{Tr}\left( |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)| \cdot Q_0 \right) \Pi(du) \\
&\ge \mathrm{Tr}\left( \int |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, \Pi(du) \cdot Q_0 \right) - \Pi\left( H_1(\alpha, L, \gamma_n, \eta_n)^C \right) .
\end{aligned}
\]
In the last inequality we used that $\mathrm{Tr}\left( |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)| \cdot Q_0 \right) \le 1$. By Lemma 6.1, $\Pi\left( H_1(\alpha, L, \gamma_n, \eta_n)^C \right) = o(1)$ and thus we deduce from (53) that
\[
R^F_n \ge \frac18 \inf_Q \left( \mathrm{Tr}\left( |G(0)\rangle\langle G(0)| \cdot Q_1 \right) + \mathrm{Tr}\left( \int |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, \Pi(du) \cdot Q_0 \right) \right) + o(1) .
\]
We recognize in the previous line the sum of error probabilities of type I and II for testing two simple quantum hypotheses, i.e. the underlying state is either $|G(0)\rangle$ or the mixed state
\[
\Phi := \int |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, \Pi(du) .
\]
As a last step of the proof, we characterize more precisely the previous mixed Gaussian state as a thermal state and use classical results from quantum testing of two simple hypotheses to give the bound from below of the testing risk. Recall from Section 2.2.2, equation (9), that coherent states $|G(\sqrt{n}u)\rangle$ factorize as a tensor product of one-mode coherent states with displacements $\sqrt{n}u_j$, i.e. $\otimes_{j \ge 1} |G(\sqrt{n}u_j)\rangle$. A coherent state with displacement $z = x + iy$, with $x, y \in \mathbb{R}$, is fully characterized by its Wigner function given by equation (3). Since the prior is Gaussian, our mixed state $\Phi$ is Gaussian and can be written
\[
\int |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, \Pi(du) = \bigotimes_{j=1}^N \int |G(\sqrt{n}u_j)\rangle\langle G(\sqrt{n}u_j)|\, \Pi_j(du_j) \otimes \bigotimes_{j \ge N+1} |0\rangle\langle 0| := \bigotimes_{j=1}^N \Phi_j \otimes \bigotimes_{j \ge N+1} |0\rangle\langle 0|
\]
where $\Pi_j$ represents the bivariate centred Gaussian distribution with covariance matrix $\sigma_j^2/2 \cdot I_2$ over the complex plane $u_j = x_j + i y_j$. Using equation (6), and setting $\sigma^2 = n\sigma_j^2/2$ there, we find that the individual modes with index $j \le N$ are centred Gaussian thermal states $\Phi_j = \Phi(r_j)$ (cf. definition (4)) with $r_j = n\sigma_j^2/(n\sigma_j^2 + 1)$.

In order to bound from below the right-hand side term in (53) we use the theory of quantum testing of two simple hypotheses
\[
H_0 : \bigotimes_{j \ge 1} \Phi(0) \quad \text{against} \quad H_1 : \bigotimes_{j=1}^N \Phi(r_j) \otimes \bigotimes_{j \ge N+1} \Phi(0) .
\]
Using (11), it is easy to see that this testing problem is equivalent to
\[
H_0 : (\Phi(0))^{\otimes N} \quad \text{against} \quad H_1 : \bigotimes_{j=1}^N \Phi(r_j) .
\]
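The identification of the Gaussian mixture of coherent states with a thermal state can be sanity-checked on the photon-number distribution. The sketch below rests on two standard facts that are assumptions relative to this section: a coherent state with displacement $z$ has Poisson photon statistics with mean $|z|^2$, and under a centred complex Gaussian prior with $E|z|^2 = m$ the variable $|z|^2$ is exponential with mean $m$. It only checks the diagonal of the mixture in the Fock basis, where the geometric law $(1 - r) r^k$ with $r = m/(m+1)$ should appear; here $m$ plays the role of $n\sigma_j^2$. The function names and the quadrature settings are illustrative.

```python
import math


def mixture_photon_prob(k, m, steps=20000):
    """P(k photons) for a Gaussian mixture of coherent states with E|z|^2 = m.

    Integrates Poisson(k; t) against the exponential density exp(-t/m)/m of
    t = |z|^2, using the midpoint rule on a truncated interval [0, t_max].
    """
    a = 1.0 + 1.0 / m            # total decay rate of the integrand
    t_max = (k + 40.0) / a       # the tail beyond t_max is negligible
    h = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** k * math.exp(-a * t)
    return total * h / (math.factorial(k) * m)


def geometric_prob(k, r):
    """Geometric law (1 - r) r^k of the thermal state Phi(r)."""
    return (1.0 - r) * r ** k


if __name__ == "__main__":
    m = 0.7                      # illustrative value of n * sigma_j^2
    r = m / (m + 1.0)
    for k in range(6):
        print(k, mixture_photon_prob(k, m), geometric_prob(k, r))
```

The two columns printed by the script should agree up to the quadrature error, in line with $r_j = n\sigma_j^2/(n\sigma_j^2 + 1)$.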
As the vacuum and the thermal state are both diagonalized by the Fock basis, they commute, which reduces the problem to a classical test between the $N$-fold products of discrete distributions
\[
H_0 : \{\mathcal{G}(0)\}^{\otimes N} \quad \text{and} \quad H_1 : \left\{ \otimes_{j=1}^N \mathcal{G}(r_j) \right\} .
\]
In view of the form (4) of the thermal state, $\mathcal{G}(r_j)$ is the geometric distribution $\left\{ (1 - r_j) r_j^k \right\}_{k=0}^\infty$ and $\mathcal{G}(0)$ is the degenerate distribution concentrated at 0. The optimal testing error is given by the maximum likelihood test, which decides $H_0$ if and only if all observations are 0. The type I error is 0 and the type II error is
\[
\prod_{j=1}^N (1 - r_j) = \prod_{j=1}^N \frac{1}{n\sigma_j^2 + 1} \ge \exp\left( -n \sum_{j=1}^N \sigma_j^2 \right) \ge \exp(-c_0),
\]
for some $c_0 > 0$, where in the last inequality we used (60). Using this in (53), we get as a lower bound
\[
R^F_n \ge \frac18 \exp(-c_0) + o(1) \ge c,
\]
where $c > 0$ is some constant depending on $c_0$. This finishes the proof. $\square$

Proof of Theorem 5.6.
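Let $\varphi_n = c_n n^{-1/2}$ for a positive sequence $c_n$. Let $M_n = (\rho_0^{\otimes n}, I - \rho_0^{\otimes n})$ be the well-known projection test for the problem (34). Then
\[
R^{M_n}_n = \mathrm{Tr}(\rho^{\otimes n} \cdot \rho_0^{\otimes n}) + \mathrm{Tr}(\rho_0^{\otimes n} \cdot (I - \rho_0^{\otimes n})) = (\mathrm{Tr}(\rho \cdot \rho_0))^n = |\langle \psi | \psi_0 \rangle|^{2n} .
\]
Let us recall that for any pure states $\rho = |\psi\rangle\langle\psi|$ and $\rho_0 = |\psi_0\rangle\langle\psi_0|$, we have
\[
\|\rho - \rho_0\|_1 = 2\sqrt{1 - |\langle \psi | \psi_0 \rangle|^2}, \tag{61}
\]
thus $|\langle \psi | \psi_0 \rangle|^2 = 1 - \frac14 \|\rho - \rho_0\|_1^2$ and hence
\[
R^{M_n}_n = \left( 1 - \frac14 \|\rho - \rho_0\|_1^2 \right)^n .
\]
For any $\rho$ satisfying the alternative hypothesis $H_1(\varphi_n)$, we have $\|\rho - \rho_0\|_1 \ge \varphi_n$ and consequently
\[
P^{M_n}_e(\varphi_n) \le \left( 1 - \frac{\varphi_n^2}{4} \right)^n = \left( 1 - \frac{c_n^2 n^{-1}}{4} \right)^n \le \left( \exp\left( -\frac{c_n^2 n^{-1}}{4} \right) \right)^n = \exp\left( -\frac{c_n^2}{4} \right). \tag{62}
\]
If now $\varphi_n/\varphi_n^* \to \infty$ then $c_n \to \infty$ and $P^{M_n}_e(\varphi_n) \to 0$, so that the second relation in (35) is fulfilled.

Consider now the case $\varphi_n/\varphi_n^* \to 0$ so that $c_n \to 0$. For any vector $v \in \mathcal{H}$ define
\[
\|v\|_\alpha^2 = \sum_{j=0}^\infty |\langle e_j | v \rangle|^2 j^{2\alpha}; \tag{63}
\]
then $\|v\|_\alpha$ is a seminorm on the space of $v$ fulfilling $\|v\|_\alpha < \infty$. The assumption that $\rho_0 = |\psi_0\rangle\langle\psi_0| \in S^\alpha(L')$ means that $\|\psi_0\|_\alpha^2 \le L' < L$. For some $N > 0$, consider the linear space
\[
\mathcal{H}_{0,N} = \{ u \in \mathcal{H} : \langle u | \psi_0 \rangle = 0, \ \langle u | e_j \rangle = 0, \ j > N \};
\]
it is nontrivial if $N \ge 1$. Let $u \in \mathcal{H}_{0,N}$, $\|u\| = 1$, be a unit vector, and for $\varepsilon > 0$ consider
\[
\psi_{u,\varepsilon} = \psi_0 \sqrt{1 - \varepsilon^2} + \varepsilon u . \tag{64}
\]
Then $\|\psi_{u,\varepsilon}\| = 1$, $\rho_{u,\varepsilon} = |\psi_{u,\varepsilon}\rangle\langle\psi_{u,\varepsilon}|$ is a pure state, and $|\langle \psi_{u,\varepsilon} | \psi_0 \rangle|^2 = 1 - \varepsilon^2$. According to (61) we then have
\[
\|\rho_{u,\varepsilon} - \rho_0\|_1 = 2\sqrt{1 - |\langle \psi_{u,\varepsilon} | \psi_0 \rangle|^2} = 2\varepsilon,
\]
so for a choice $\varepsilon = c_n n^{-1/2}/2$ it follows that $\|\rho_{u,\varepsilon} - \rho_0\|_1 = \varphi_n$ and $\rho_{u,\varepsilon} \in B(\varphi_n)$. On the other hand, by (64) and the triangle inequality,
\[
\|\psi_{u,\varepsilon}\|_\alpha \le \sqrt{1 - \varepsilon^2}\, \|\psi_0\|_\alpha + \varepsilon \|u\|_\alpha .
\]
Now $\|u\|_\alpha < \infty$ for $u \in \mathcal{H}_{0,N}$, and by assumption $\|\psi_0\|_\alpha < L^{1/2}$, so for sufficiently large $n$
\[
\|\psi_{u,\varepsilon}\|_\alpha \le L^{1/2}
\]
and thus $\rho_{u,\varepsilon} \in S^\alpha(L)$. Thus $\rho_{u,\varepsilon} \in S^\alpha(L) \cap B(\varphi_n)$ for sufficiently large $n$. By (11) the optimal error probability for testing between the states $\rho_{u,\varepsilon}$ and $\rho_0$ fulfills
\[
\begin{aligned}
\inf_{M \text{ binary POVM}} R^M_n(\rho_0^{\otimes n}, \rho_{u,\varepsilon}^{\otimes n}, M)
&= 1 - \frac12 \left\| \rho_0^{\otimes n} - \rho_{u,\varepsilon}^{\otimes n} \right\|_1
= 1 - \sqrt{1 - |\langle \psi_0^{\otimes n} | \psi_{u,\varepsilon}^{\otimes n} \rangle|^2} \\
&= 1 - \sqrt{1 - |\langle \psi_0 | \psi_{u,\varepsilon} \rangle|^{2n}}
= 1 - \sqrt{1 - (1 - \varepsilon^2)^n}
= 1 - \sqrt{1 - (1 - c_n^2 n^{-1}/4)^n} .
\end{aligned} \tag{65}
\]
Obviously if $c_n \to 0$ then $\left( 1 - c_n^2 n^{-1}/4 \right)^n \to 1$, so that
\[
\inf_{M \text{ binary POVM}} R^M_n(\rho_0^{\otimes n}, \rho_{u,\varepsilon}^{\otimes n}, M) \ge 1 + o(1) .
\]
But since $\rho_{u,\varepsilon} \in S^\alpha(L) \cap B(\varphi_n)$ we have
\[
P^*_e(\varphi_n) \ge \inf_{M \text{ binary POVM}} R^M_n(\rho_0^{\otimes n}, \rho_{u,\varepsilon}^{\otimes n}, M) \ge 1 + o(1),
\]
so that the first relation in (35) is shown. $\square$

Proof of Theorem 5.7.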
It suffices to prove that if $\varphi_n = c_n n^{-1/2}$ with $c_n \to c > 0$ then $P^*_e(\varphi_n) \to \exp\left( -c^2/4 \right)$. In view of the upper bound (36), it suffices to prove
\[
P^*_e(\varphi_n) \ge \exp\left( -c^2/4 \right) (1 + o(1)). \tag{66}
\]
Recall (cf. (61)) that for any pure states $\rho = |\psi\rangle\langle\psi|$ and $\rho_0 = |\psi_0\rangle\langle\psi_0|$, the condition $\|\rho - \rho_0\|_1 \ge \varphi_n$ in $H_1(\varphi_n)$ is equivalent to a condition for the fidelity $F(\rho, \rho_0) = |\langle \psi | \psi_0 \rangle|^2 \le 1 - \varphi_n^2/4$.

Let $\mathcal{H}_0 \subset \mathcal{H}$ be the orthogonal complement of $\mathbb{C}|\psi_0\rangle$ in $\mathcal{H}$. Consider the vector
\[
\psi_u = \sqrt{1 - \|u\|^2} \cdot \psi_0 + u, \qquad u \in \mathcal{H}_0,
\]
and the corresponding pure state $|\psi_u\rangle\langle\psi_u|$ defined in terms of the local vector $u$. We restrict the alternative hypothesis to a smaller set of states such that $\|u\| \le \gamma_n$, with $\gamma_n = n^{-1/4} (\log n)^{-1}$. Since the fidelity is given by $F(\rho_0, |\psi_u\rangle\langle\psi_u|) = |\langle \psi_u | \psi_0 \rangle|^2 = 1 - \|u\|^2$, the restricted hypothesis is characterised by
\[
1 - \gamma_n^2 \le F(\rho_0, |\psi_u\rangle\langle\psi_u|) \le 1 - \varphi_n^2/4, \quad \text{or} \quad \varphi_n/2 \le \|u\| \le \gamma_n,
\]
and additionally by $\|\psi_u\|_\alpha^2 \le L$, where $\|\cdot\|_\alpha$ is given by (63).

Consider again the linear space $\mathcal{H}_{0,N}$ defined in the proof of Theorem 5.6, for a choice $N = N_n \sim \log n$. Since $\mathcal{H}_{0,N} \subset \mathcal{H}_0$, we can further restrict the local vector $u$ to $u \in \mathcal{H}_{0,N}$. Note that for $u \in \mathcal{H}_{0,N}$ and $\|u\| \le \gamma_n$ we have
\[
\|u\|_\alpha^2 = \sum_{j=0}^N |\langle e_j | u \rangle|^2 j^{2\alpha} \le N^{2\alpha} \|u\|^2 \le N^{2\alpha} \gamma_n^2 \sim (\log n)^{2\alpha} n^{-1/2} (\log n)^{-2} = o(1) .
\]
It follows that
\[
\|\psi_u\|_\alpha \le \sqrt{1 - \|u\|^2}\, \|\psi_0\|_\alpha + \|u\|_\alpha \le L^{1/2}
\]
for sufficiently large $n$, thus $\psi_u \in S^\alpha(L)$. We can now write the test problem with restricted alternative as
\[
\begin{aligned}
H_0 &: \rho = \rho_0 \\
H_1'(\varphi_n) &: \rho = |\psi_u\rangle\langle\psi_u| : \ u \in \mathcal{H}_{0,N}, \ \varphi_n/2 \le \|u\| \le \gamma_n .
\end{aligned}
\]
By the strong approximation proven in Theorem 4.1 we get that the models
\[
\{ |\psi_u\rangle\langle\psi_u|^{\otimes n}, \ \|u\| \le \gamma_n \} \quad \text{and} \quad \{ |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|, \ \|u\| \le \gamma_n \}
\]
are asymptotically equivalent, where $G(\sqrt{n}u)$ is the coherent vector in the Fock space $\Gamma_s(\mathcal{H}_0)$ pertaining to $\sqrt{n}u$.
Note that this proof is very similar to the previous proofs of lower bounds, with a major difference: the reduced set of states under the alternative hypothesis is defined with respect to $\rho_0$ given by the null hypothesis $H_0$, instead of an arbitrary state as previously.

In the asymptotically equivalent Gaussian white noise model, the modified hypotheses concern Gaussian states which can be written in terms of their coherent vectors as
\[
\begin{aligned}
H_0 &: |G(0)\rangle \\
H_1(\varphi_n) &: |G(\sqrt{n}u)\rangle : \ u \in \mathcal{H}_{0,N}, \ \varphi_n/2 \le \|u\| \le \gamma_n .
\end{aligned} \tag{67}
\]
In order to prove the theorem it is sufficient to prove that
\[
\inf_{M_n} \sup_{\varphi_n/2 \le \|u\| \le \gamma_n,\ u \in \mathcal{H}_{0,N}} R^M_n\left( |G(0)\rangle\langle G(0)|, |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|, M_n \right) \tag{68}
\]
\[
\ge \exp\left( -c^2/4 \right) + o(1) \tag{69}
\]
as $n \to \infty$.

Note that $\dim \mathcal{H}_{0,N} = N$; let $\{g_j, j = 1, \dots, N\}$ be an orthonormal basis of $\mathcal{H}_{0,N}$ and let $|u\rangle = \sum_{j=1}^N u_j |g_j\rangle$. The quantum Gaussian white noise model $\{ |G(\sqrt{n}u)\rangle, u \in \mathcal{H}_{0,N}, \|u\| \le \gamma_n \}$ is then equivalent to the quantum Gaussian sequence model $\{ \otimes_{j=1}^N |G(\sqrt{n}u_j)\rangle, \|u\| \le \gamma_n \}$. From now on $|G(z)\rangle$ denotes the coherent vector in the Fock space $\mathcal{F}(\mathbb{C})$ pertaining to $z := x + iy \in \mathbb{C}$. Recall that such a state is fully characterized by its Wigner function $W_{G(z)}$, which in the case of coherent states is the density function of a bivariate Gaussian distribution.

Writing $M_n = (M_{n,0}, M_{n,1})$ for the two elements of the binary POVM, we shall bound from below the maximal type 2 error probability in the risk $R^M_n$ in (68),
\[
\sup_{\varphi_n/2 \le \|u\| \le \gamma_n,\ u \in \mathcal{H}_{0,N}} \mathrm{Tr}\left( |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)| \cdot M_{n,0} \right), \tag{70}
\]
by an average over $u$, where the average is taken with respect to a prior distribution defined as follows. Assume that $u_j$, $j = 1, \dots, N$, are independently distributed following a complex centered Gaussian law with covariance $\frac{\sigma^2}{2} I_2$, where $\sigma^2 = \frac{c^2 (1 + \varepsilon)}{4nN}$, for some fixed and arbitrarily small $\varepsilon > 0$, and $I_2$ is the 2 by 2 identity matrix.

Lemma 6.2.
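Let $\Pi$ be the distribution of independent complex random variables $u_j$, for $j = 1, \dots, N$, each one distributed as
\[
N\left( 0, \frac{\sigma^2}{2} I_2 \right), \qquad \sigma^2 = \frac{c^2 (1 + \varepsilon)}{4nN},
\]
for fixed $\varepsilon > 0$ and $N \sim \log n$. Then
\[
P\left( \frac{c_n^2}{4n} \le \|u\|^2 \le \frac{c_n^2}{4n} (1 + 2\varepsilon) \right) \to 1, \quad \text{as } n \to \infty,
\]
and in particular $P(\varphi_n/2 \le \|u\| \le \gamma_n) \to 1$, as $n \to \infty$.

Proof. We have
\[
\begin{aligned}
P\left( \|u\|^2 < \frac{c_n^2}{4n} \right)
&= P\left( \sum_{j=1}^N (|u_j|^2 - \sigma^2) < \frac{c_n^2}{4n} - N \frac{c^2 (1 + \varepsilon)}{4nN} \right)
\le \frac{\mathrm{Var}\left( \sum_{j=1}^N |u_j|^2 \right)}{(c_n^2 - c^2 (1 + \varepsilon))^2 / (16 n^2)} \\
&= \frac{N \sigma^4}{(c^2 \varepsilon + o(1))^2 / (16 n^2)}
= \frac{N c^4 (1 + \varepsilon)^2 / (16 n^2 N^2)}{(c^2 \varepsilon + o(1))^2 / (16 n^2)}
= \left( \frac{1 + \varepsilon}{\varepsilon + o(1)} \right)^2 \frac{1}{N} = o(1),
\end{aligned}
\]
since $N \sim \log n \to \infty$. Similarly, as $(1 + 2\varepsilon) > 1 + \varepsilon$, one shows that
\[
P\left( \|u\|^2 > \frac{c_n^2}{4n} (1 + 2\varepsilon) \right) \to 0, \quad \text{as } n \to \infty,
\]
and thus we get
\[
P\left( \frac{c_n^2}{4n} \le \|u\|^2 \le \frac{c_n^2}{4n} (1 + 2\varepsilon) \right) \to 1 .
\]
As $\gamma_n = n^{-1/4} (\log n)^{-1}$ decays slower than $c_n n^{-1/2}$, and $\varphi_n/2 = c_n n^{-1/2}/2$, we deduce that $P(\varphi_n/2 \le \|u\| \le \gamma_n) \to 1$ as $n \to \infty$, which ends the proof of the lemma. $\square$

Let us denote by $\Pi$ the prior distribution introduced in Lemma 6.2. Let us go back to (70) and bound the expression from below as follows:
\[
\begin{aligned}
\sup_{\varphi_n/2 \le \|u\| \le \gamma_n,\ u \in \mathcal{H}_{0,N}} \mathrm{Tr}\left( |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)| \cdot M_{n,0} \right)
&\ge \int_{\varphi_n/2 \le \|u\| \le \gamma_n} \mathrm{Tr}\left( |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, M_{n,0} \right) \Pi(du) \\
&= \int \mathrm{Tr}\left( |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, M_{n,0} \right) \Pi(du) - \int_{\{\varphi_n/2 \le \|u\| \le \gamma_n\}^c} \mathrm{Tr}\left( |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, M_{n,0} \right) \Pi(du) \\
&\ge \int \mathrm{Tr}\left( |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, M_{n,0} \right) \Pi(du) - \Pi\left( \{\varphi_n/2 \le \|u\| \le \gamma_n\}^c \right) .
\end{aligned}
\]
By Lemma 6.2, we get for (68)
\[
\sup_{\varphi_n/2 \le \|u\| \le \gamma_n,\ u \in \mathcal{H}_{0,N}} R^M_n\left( G(0), G(\sqrt{n}u), M_n \right) \ge \mathrm{Tr}\left( |G(0)\rangle\langle G(0)|\, M_{n,1} \right) + \mathrm{Tr}\left( \int |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, \Pi(du) \cdot M_{n,0} \right) + o(1). \tag{71}
\]
The integral on the right side is a mixed state which can be written as
\[
\Phi := \int |G(\sqrt{n}u)\rangle\langle G(\sqrt{n}u)|\, \Pi(du) = \bigotimes_{j=1}^N \int |G(\sqrt{n}u_j)\rangle\langle G(\sqrt{n}u_j)| \cdot \Pi_j(du_j) .
\]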
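The concentration statement of Lemma 6.2 can also be evaluated exactly in a simplified setting. The sketch below assumes $c_n = c$, so that $\|u\|^2/\sigma^2$ is a sum of $N$ independent standard exponential variables (a Gamma variable with integer shape $N$), and the event $\frac{c^2}{4n} \le \|u\|^2 \le \frac{c^2}{4n}(1 + 2\varepsilon)$ becomes $\frac{N}{1+\varepsilon} \le \|u\|^2/\sigma^2 \le \frac{N(1+2\varepsilon)}{1+\varepsilon}$. The function names are illustrative.

```python
import math


def erlang_cdf(y, N):
    """P(X <= y) for X ~ Gamma(shape N, scale 1), with N a positive integer."""
    if y <= 0:
        return 0.0
    # Upper tail: sum_{k < N} exp(-y) y^k / k!, evaluated in log-space.
    tail = sum(math.exp(-y + k * math.log(y) - math.lgamma(k + 1))
               for k in range(N))
    return 1.0 - tail


def segment_probability(N, eps):
    """P(N/(1+eps) <= Gamma(N, 1) <= N (1+2 eps)/(1+eps))."""
    lower = N / (1.0 + eps)
    upper = N * (1.0 + 2.0 * eps) / (1.0 + eps)
    return erlang_cdf(upper, N) - erlang_cdf(lower, N)


if __name__ == "__main__":
    for N in (10, 50, 200):
        print(N, segment_probability(N, 0.5))
```

Under this simplification the probability depends only on $N$ and $\varepsilon$, and it approaches 1 as $N$ grows, which is why $N = N_n \to \infty$ is needed even though the growth may be as slow as $\log n$.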
Similarly to the proof of Theorem 5.5, we use equation (6) to show that each of the Gaussian integrals above produces a thermal (Gaussian) state
\[
\Phi(r) = (1 - r) \sum_{k=0}^\infty r^k |k\rangle\langle k|, \qquad r = \frac{n\sigma^2}{n\sigma^2 + 1} .
\]
Since $|G(0)\rangle\langle G(0)| = \Phi(0)$, the main terms in (71) are the sum of error probabilities for testing two simple hypotheses $H_0 : \Phi(0)^{\otimes N}$ against $H_1 : \Phi(r)^{\otimes N}$. Moreover, we have two commuting product states under the two simple hypotheses, which reduces the problem to a classical test between the $N$-fold products of discrete distributions $H_0 : \{\mathcal{G}(0)\}^{\otimes N}$ and $H_1 : \{\mathcal{G}(r)\}^{\otimes N}$. Here $\mathcal{G}(r)$ is the geometric distribution $\left\{ (1 - r) r^k \right\}_{k=0}^\infty$; in particular $\mathcal{G}(0)$ is the degenerate distribution concentrated at 0. The optimal testing error is given by the maximum likelihood test, which decides $H_0$ if and only if all observations are 0. The type 1 error is 0 and the type 2 error is
\[
(1 - r)^N = (n\sigma^2 + 1)^{-N} \ge \exp(-N \cdot n\sigma^2) = \exp\left( -N n \frac{c^2 (1 + \varepsilon)}{4nN} \right) = \exp\left( -\frac{c^2 (1 + \varepsilon)}{4} \right) .
\]
Since $\varepsilon > 0$ was arbitrary, this establishes the lower bound (69) and thus (66).
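The final inequality can be illustrated numerically. The following minimal sketch uses illustrative names and values ($c = 1.5$, $\varepsilon = 0.05$), which are assumptions. It evaluates the type 2 error $(1 + n\sigma^2)^{-N}$ with $n\sigma^2 = c^2(1+\varepsilon)/(4N)$ and compares it with the bound $\exp(-c^2(1+\varepsilon)/4)$.

```python
import math


def type2_error(c, eps, N):
    """(1 - r)^N = (1 + n sigma^2)^(-N) with n sigma^2 = c^2 (1 + eps) / (4 N)."""
    x = c * c * (1.0 + eps) / (4.0 * N)
    return (1.0 + x) ** (-N)


def lower_bound(c, eps):
    """exp(-c^2 (1 + eps) / 4), the bound obtained from 1 + x <= exp(x)."""
    return math.exp(-c * c * (1.0 + eps) / 4.0)


if __name__ == "__main__":
    c, eps = 1.5, 0.05  # illustrative values
    for N in (1, 10, 100, 10**4):
        print(N, type2_error(c, eps, N), lower_bound(c, eps))
```

The type 2 error stays above the bound for every $N$ and approaches it as $N$ grows, so nothing is lost asymptotically in this step.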