Entropy Distance: New Quantum Phenomena

Abstract

We study a curve of Gibbsian families of complex 3x3-matrices and point out new features, absent in commutative finite-dimensional algebras: a discontinuous maximum-entropy inference, a discontinuous entropy distance and non-exposed faces of the mean value set. We analyze these problems from various aspects including convex geometry, topology and information geometry. This research is motivated by a theory of info-max principles, where we contribute by computing first order optimality conditions of the entropy distance.



Stephan Weis, Max Planck Institute for Mathematics in the Sciences, Inselstr. 22, D-04103 Leipzig, Germany
and
Andreas Knauf, Department of Mathematics, Friedrich-Alexander-University Erlangen-Nuremberg, Cauerstr. 11, D-91058 Erlangen, Germany

September 5, 2012

Index Terms: maximum-entropy inference, discontinuous, exponential family, infomax principles.

AMS Subject Classiﬁcation:

The aim of the introduction is a discussion of the maximum-entropy inference under linear constraints, in two aspects: the problem of its discontinuity and its connection to infomax principles, which ask for maximization of the entropy distance from an exponential family. Section 1.2 gives an overview of the article.

The maximum-entropy principle, while dating back to Boltzmann, became the information theoretic justification of the thermodynamic formalism, see [Ja]. We have discovered in three-level quantum systems a problem that can arise for non-commutative observables: the real analytic maximum-entropy inference under linear constraints has no continuous extension. An example is given in Remark 22; this phenomenon does not appear in commutative algebras of finite dimensions.

The roughness of a discontinuity in the maximum-entropy inference shows that we are currently at the very beginning of a quantitative understanding of its performance. A deeper analysis seems necessary to tackle applications based on asymptotic statistical variance or on asymptotic error rates. Other branches of quantum inference, e.g. state tomography [WF, PR] or hypothesis testing [AV, NS], are further developed and asymptotic error rates are used to identify optimal tests.

What do we mean by a discontinuous maximum-entropy inference? We use a fixed set of observables $a_1, \ldots, a_k$, i.e. self-adjoint matrices in the algebra $\mathcal{A} = \mathrm{Mat}(N, \mathbb{C})$, and denote by $\mathcal{A}_{\mathrm{sa}}$ the real vector space of self-adjoint matrices. We assume a quantum system is described by a density matrix $\rho$, also called state, i.e. $\rho \in \mathrm{Mat}(N, \mathbb{C})$ ($N$-level system), $\rho \succeq 0$ (positive semi-definite) and $\mathrm{tr}(\rho) = 1$ (normalized). We denote by $\mathcal{S}(\mathcal{A})$ the set of density matrices, called state space. We assume a generic quantum system where the density matrix $\rho$ is invertible.

The von Neumann measurements (see [Pe3]) of $a_r = \sum_{\lambda \in \mathrm{spec}(a_r)} \lambda P_{r,\lambda}$ yield the eigenvalue $\lambda$ with probability $\mathrm{tr}(\rho P_{r,\lambda})$.

• If $n$ copies of $\rho$ are available for measurement (in form of the $n$-fold tensor product $\rho \otimes \cdots \otimes \rho \in \mathcal{A}^{\otimes n}_{\mathrm{sa}}$), then $n$ measurements of $a_r$ give us eigenvalues $\lambda_1, \ldots, \lambda_n \in \mathrm{spec}(a_r)$ such that the mean
\[ a_r(n) := \tfrac{1}{n}(\lambda_1 + \cdots + \lambda_n) \tag{1} \]
converges to the true mean $\mathrm{tr}(\rho a_r)$ by the strong law of large numbers.

• If $nk$ copies of $\rho$ are available, the measured values $m_1, \ldots, m_k$ of the $k$ random variables $a_1(n), \ldots, a_k(n)$ define an affine subspace
\[ \{ \sigma \in \mathcal{A}_{\mathrm{sa}} \mid (\mathrm{tr}(\sigma a_1), \ldots, \mathrm{tr}(\sigma a_k)) = (m_1, \ldots, m_k) \}. \tag{2} \]
We assume that this subspace intersects the state space $\mathcal{S}(\mathcal{A})$, since by large deviation theory (e.g., Chap. I.3 of [El]) the probability of a distance larger than a given $\varepsilon > 0$ from $\rho$ decays exponentially in $n$.

The maximum-entropy inference associates to the measured values $(m_1, \ldots, m_k)$ the unique density matrix $\hat{\rho}(n)$ in the set of states $\sigma$ satisfying (2) which maximizes the von Neumann entropy
\[ S(\sigma) := -\mathrm{tr}(\sigma \log(\sigma)). \tag{3} \]

The maximum-entropy inference is well-defined since the von Neumann entropy is a strictly concave function [We]. The inference is a real analytic mapping on the domain of all mean value tuples $(\mathrm{tr}(\sigma a_1), \ldots, \mathrm{tr}(\sigma a_k))$ for invertible density matrices $\sigma$, see e.g. [Wi]. The image, called Gibbsian family (of density matrices), consists of all matrices of the form
\[ \exp(a_0 + \lambda_1 a_1 + \cdots + \lambda_k a_k) \,/\, \mathrm{tr}(\exp(a_0 + \lambda_1 a_1 + \cdots + \lambda_k a_k)) \]
for real $\lambda_1, \ldots, \lambda_k$ and $a_0 = 0$. In general, if $a_0 \in \mathcal{A}_{\mathrm{sa}}$, this manifold of density matrices is called exponential family.

In Remark 22 we discuss a Gibbsian family where the real analytic maximum-entropy inference defined on the interior of the mean value set has no continuous extension to the full mean value set. While the variance of the random variables $a_r(n)$, $r = 1, \ldots, k$ in (1) and of the tuple $(a_1(n), \ldots, a_k(n))$ is $O(1/n)$, the statement is not obvious for the maximum-entropy inference $\hat{\rho}(n)$. Indeed, the lack of continuous extension shows that the constant in the variance estimate $O(1/n)$ of $\hat{\rho}(n)$ can be arbitrarily large. For the non-generic choice of a singular density matrix $\rho$ the limit $\lim_{n \to \infty} \hat{\rho}(n)$ need not even be a state of maximum von Neumann entropy. Convergence rates of the maximum-entropy inference were considered in the context of model selection [Ra].

Maximum-entropy inference is closely connected to the entropy distance from an exponential family. The relative entropy between states $\rho, \sigma \in \mathcal{S}(\mathcal{A})$ is $S(\rho, \sigma) := +\infty$ unless the image of $\sigma$ contains that of $\rho$, and then (using the natural logarithm)
\[ S(\rho, \sigma) := \mathrm{tr}\, \rho \left( \ln(\rho) - \ln(\sigma) \right). \tag{4} \]
The distance-like properties of $S(\rho, \sigma) \geq 0$ and of $S(\rho, \sigma) = 0 \iff \rho = \sigma$ hold [We]. However, the relative entropy is not a metric. For $\mathcal{E} \subset \mathcal{S}(\mathcal{A})$,
\[ d_{\mathcal{E}} : \mathcal{S}(\mathcal{A}) \to \mathbb{R}, \quad \rho \mapsto \inf_{\sigma \in \mathcal{E}} S(\rho, \sigma) \tag{5} \]
is called entropy distance of $\rho$ from $\mathcal{E}$. If $\mathcal{E}$ contains invertible density matrices, then $d_{\mathcal{E}}$ is bounded on $\mathcal{S}(\mathcal{A})$.

Under arbitrary constraints, maximizing the von Neumann entropy is the same as minimizing the relative entropy distance $d_{\{\mathbb{1}/\mathrm{tr}(\mathbb{1})\}}$ from the tracial state. In Section 2 we recall that for linear constraints the latter is equivalent to the unconstrained minimization of the relative entropy in its second argument from the corresponding Gibbsian family.

Infomax principles support the hypothesis that natural systems tend to maximize structured correlations. This, in the work [Ay], is formalized as deviation from an exponential family $\mathcal{E}$, and is quantified by the entropy distance (5). An instructive example is the mutual information used in information theory:

Example 1 (Product States). The mutual information of a bipartite state $\rho_{AB}$ is given by $S(\rho_{AB}, \rho_A \otimes \rho_B) \geq 0$ for the relative entropy $S$ and for reduced states $\rho_A$ resp. $\rho_B$ on subsystem $A$ resp. $B$. It is zero only when $\rho_{AB} = \rho_A \otimes \rho_B$. The relative entropy measures the distance of an arbitrary bipartite state from the Gibbsian family of all product states.
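The commutative special case of Example 1 can be sketched in a few lines. This is only an illustration under assumptions not made in the text: states are taken to be diagonal, so a bipartite state becomes a joint probability table, and the function names `relative_entropy` and `mutual_information` are hypothetical.

```python
import math

def relative_entropy(p, q):
    """Classical relative entropy S(p, q) = sum_i p_i (ln p_i - ln q_i).

    Returns +inf unless the support of q contains the support of p,
    mirroring the convention stated before equation (4).
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * ln 0 is taken to be 0
        if qi == 0.0:
            return math.inf
        total += pi * (math.log(pi) - math.log(qi))
    return total

def mutual_information(joint):
    """Relative entropy of a joint table from the product of its marginals."""
    p_a = [sum(row) for row in joint]            # marginal on subsystem A
    p_b = [sum(col) for col in zip(*joint)]      # marginal on subsystem B
    flat_joint = [x for row in joint for x in row]
    flat_product = [a * b for a in p_a for b in p_b]
    return relative_entropy(flat_joint, flat_product)

# Assumed toy data: a perfectly correlated table and a product table.
correlated = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]

print(mutual_information(correlated))
print(mutual_information(independent))
```

For the perfectly correlated table the value equals ln 2, and for the product table it vanishes, in line with the statement that the mutual information is zero only for product states.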
The mutual information of a quantum system measures the total correlation of a bipartite quantum system. For the entanglement in the system there exist other measures, e.g. the entropy distance from the set of separable states, known as relative entropy of entanglement, see e.g. [VK]. Correlation measures having the form of the entropy distance from a Gibbsian family are used in statistical physics, image processing or in the theory of neural networks, to name just a few, see e.g. [MM, EA, Am, AJ].

Maximizers of the entropy distance from exponential families (of probability distributions) were studied e.g. in [Ay, AK, Ma, Rh, MR]. In Section 5 we contribute to a non-commutative analogue by computing first order optimality conditions.

Most of the rest of the paper will focus on observables in the algebra of Example 3. We study a curve of planes in a Grassmannian manifold of linear spaces that defines a curve of two-dimensional Gibbsian families of $3 \times 3$ density matrices. Unlike Gibbsian families in finite probability spaces, one of the families has a discontinuous entropy distance and its real analytic maximum-entropy inference does not extend continuously. We discuss several candidates of closures to extend Gibbsian families and we propose a convex geometric criterion to characterize discontinuities: where non-exposed faces are born in a Grassmannian manifold of linear subspaces, families have a discontinuous inference. This conjecture is supported by the example of the Staffelberg family in Section 4.2.

To compare classical and quantum physics, we consider *-subalgebras $\mathcal{A}$ of $\mathrm{Mat}(N, \mathbb{C})$. To allow low-dimensional examples we consider them real, i.e. $\mathcal{A}$ is a subring of $\mathrm{Mat}(N, \mathbb{C})$ and an $\mathbb{R}$-module closed under conjugation $a \mapsto a^*$. However, it is not necessarily closed under complex scalar multiplication. The state space of $\mathcal{A}$ is the set $\mathcal{S} = \mathcal{S}(\mathcal{A}) = \{ \rho \in \mathcal{A} \mid \rho \succeq 0, \mathrm{tr}(\rho) = 1 \}$ of density matrices. We denote by $\mathbb{1}$ / $0$ resp. $\mathbb{1}_N$ / $0_N$ the identity / zero in $\mathcal{A}$ resp. $\mathrm{Mat}(N, \mathbb{C})$.
We allow for $\mathbb{1} \neq \mathbb{1}_N$, which we need to study the swallow family in Section 4.4 and to prove an optimality condition in Section 5, see also Remark 6. The real vector space of self-adjoint matrices $\mathcal{A}_{\mathrm{sa}}$ is a Euclidean vector space for the Hilbert-Schmidt scalar product $\langle a, b \rangle = \mathrm{tr}(ab)$, $a, b \in \mathcal{A}_{\mathrm{sa}}$.

Remark 2.

There are other natural definitions of the state space of a real *-subalgebra $\mathcal{A}$ of $\mathrm{Mat}(N, \mathbb{C})$, e.g.
1. the density matrices in $\mathcal{A}$ (like above),
2. the states on $\mathrm{Mat}(N, \mathbb{C})$ restricted to $\mathcal{A}_{\mathrm{sa}}$,
3. the positive linear functionals on $\mathcal{A}_{\mathrm{sa}}$ that take the value $1$ at the identity.
These definitions are mutually equivalent, assuming $\mathbb{1}_N \in \mathcal{A}$. The inclusions of 1. into 2. into 3. are trivial. The inclusion of 3. into 2. follows from the Riesz extension theorem and the inclusion of 2. into 1. follows from the fact that orthogonal projection from $\mathrm{Mat}(N, \mathbb{C})_{\mathrm{sa}}$ onto $\mathcal{A}_{\mathrm{sa}}$ takes density matrices to density matrices.

Figure 1: Mean value sets for two probabilistic exponential families. Left: triangle; right: square.

The following real *-subalgebra of the C*-algebra $\mathrm{Mat}(2, \mathbb{C}) \oplus \mathbb{C}$ is sufficiently rich for our purposes and it includes the curve of Gibbsian families. The state space of $\mathrm{Mat}(2, \mathbb{C}) \oplus \mathbb{C}$ has already been analyzed in [Ku] as the simplest example of a 'hybrid' memory (and called hybrid trit) but the main subject of that article is not relevant to our discussions.

Example 3.

We consider the real *-subalgebra $\mathcal{B} \subset \mathrm{Mat}(2, \mathbb{C})$ spanned by $\mathbb{1}_2$, $\sigma_1$, $\sigma_2$ and $\mathrm{i}\sigma_3$ for the Pauli $\sigma$-matrices
\[ \sigma_1 := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 := \begin{pmatrix} 0 & -\mathrm{i} \\ \mathrm{i} & 0 \end{pmatrix}, \quad \sigma_3 := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]
This algebra is isomorphic to $\mathrm{Mat}(2, \mathbb{R})$ by exchanging $\sigma_2$ and $\sigma_3$. A real *-subalgebra $\mathcal{A} \subset \mathrm{Mat}(3, \mathbb{C})$ is defined by block diagonal matrices
\[ \begin{pmatrix} * & * & 0 \\ * & * & 0 \\ 0 & 0 & * \end{pmatrix} \]
with elements of $\mathcal{B}$ in the upper left corner and real numbers in the lower right corner. The state space of $\mathcal{B}$ is
\[ \mathcal{S}(\mathcal{B}) = \mathrm{conv}\{ \tfrac{1}{2}(\mathbb{1}_2 + \sin(\alpha)\sigma_1 + \cos(\alpha)\sigma_2) \mid \alpha \in \mathbb{R} \}, \]
where conv denotes convex hull. This disk is a section of the state space of $\mathrm{Mat}(2, \mathbb{C})$, known as Bloch ball. The state space of $\mathcal{A}$ is a three-dimensional cone based on $\mathcal{S}(\mathcal{B}) \oplus 0$ and with apex $0 \oplus 1$,
\[ \mathcal{S}(\mathcal{A}) = \mathrm{conv}(0 \oplus 1, \rho(\alpha);\ \alpha \in \mathbb{R}) \quad \text{for} \quad \rho(\alpha) := \tfrac{1}{2}(\mathbb{1}_2 + \sin(\alpha)\sigma_1 + \cos(\alpha)\sigma_2) \oplus 0. \]
It is the solid of revolution of an equilateral triangle.

It is well known that state spaces of commutative and non-commutative algebras have quite different geometries. Whereas in the commutative case we have a simplex (and thus every state is uniquely decomposed into pure states), in the non-commutative case such a decomposition is highly non-unique (think of the Bloch ball).

Still, from the point of view of convex geometry there is one common property of all these state spaces: all of their faces are exposed, that is, they can be described as the intersection of state space with a half space. Non-exposed faces are found, e.g., on the circumference of a stadium, at the four points where a half-circle meets a segment. See Section 3 for precise definitions. In the probabilistic setting of $\mathcal{A} \cong \mathbb{C}^N$, embedded as diagonal matrices, measurement of observables $f_1, \ldots, f_n$ leads to an orthogonal projection
\[ \mathcal{S}(\mathcal{A}) \longrightarrow \mathbb{R}^n, \quad p \longmapsto \left( E_p(f_1), \ldots, E_p(f_n) \right) \]
of state space, based on expectation $E_p$. The image, called mean value set or convex support [Ba], is no longer a simplex but still a polytope. So faces of a mean value set are exposed, too. The same applies to all exponential families and their natural projections, see Figure 1.

Figure 2: The 3D cone is the state space of a non-commutative algebra. Left: mean value sets (projections of the cone); right: sections of the cone. Projections respectively sections are w.r.t. planes including the tracial state $\mathbb{1}/\mathrm{tr}(\mathbb{1})$, which is the centroid of the cone.

We exhibit here two main differences between exponential families in commutative and non-commutative algebras, at least in the curve of our example.
• First, we show in Section 3 that it is typical for a non-commutative algebra that mean value sets have non-exposed faces.
• Second, we show in Section 4.2 that the entropy distance from an exponential family can be discontinuous in exceptional cases.

In Figure 2 (left) we have sketched two-dimensional mean value sets of the state space $\mathcal{S}(\mathcal{A})$ from Example 3. A mean value set has non-exposed faces if it is the convex hull of a non-degenerate ellipse and of an exterior point. Mean value sets with non-exposed faces are bounded in the Grassmannian manifold by elliptical shapes that correspond to exponential families with discontinuous entropy distance (angles $\tfrac{1}{3}\pi$, $\tfrac{2}{3}\pi$, $\tfrac{4}{3}\pi$ and $\tfrac{5}{3}\pi$). It seems this boundary in the Grassmannian manifold is pivotal for discontinuity.

Towards a classification of non-exposed faces of mean value sets one can study singularities of the dual convex set (see [Ws3], including references on the progress of that question). These dual convex sets are sections of the state space (cf. [Ws2] and Figure 2, right) and they are bounded by determinantal varieties which are a subject of study in convex algebraic geometry, see e.g. [Ne].

Already in 1963 ensembles of maximum chaos in $\mathcal{A} = \mathrm{Mat}(N, \mathbb{C})$ were studied in [Wi]. However, non-exposed faces at a mean value set have attracted little attention in the literature.
In particular Theorem I (e) in [Wi], concerning extreme points, is wrong: it fails in all cases where non-exposed extreme points appear. An example is given in Remark 29 a). We are convinced that non-exposed faces are important in the analysis of maximum-entropy inference and entropy distance. As we have seen in the beginning of this section the convex geometric notion of non-exposed face indicates discontinuity of the inference.

We will show later in this paper that the maximum-entropy inference does not extend continuously. So the question arises how a Gibbsian family $\mathcal{G}$ must be extended to a locus of maximum-entropy density matrices under linear constraints. It is clear that the topological norm closure is too large. In the examples presented in Section 4 we will prove that the reverse information closure or rI-closure
\[ \mathrm{cl}^{\mathrm{rI}}(\mathcal{G}) := \{ \rho \in \mathcal{S}(\mathcal{A}) \mid \inf_{\sigma \in \mathcal{G}} S(\rho, \sigma) = 0 \} \tag{6} \]
gives the right answer. Its name is motivated from probability theory [CM] and it consists of states that approximate $\mathcal{G}$ in relative entropy $S$. Since the algebra $\mathcal{A}$ is a nice substructure of $\mathrm{Mat}(N, \mathbb{C})$, we have also a theory of information geometry [AN] at our disposition, which gives us two canonical choices of geodesics on the manifold $\mathcal{G}$. These $(+1)$-geodesics and $(-1)$-geodesics will be defined in the next section. They give rise to the $(+1)$-closure
\[ \mathrm{cl}^{(+1)}(\mathcal{G}) := \mathcal{G} \cup \{ \text{limit points of } (+1)\text{-geodesics in } \mathcal{G} \} \tag{7} \]
and the $(-1)$-closure
\[ \mathrm{cl}^{(-1)}(\mathcal{G}) := \mathcal{G} \cup \{ \text{limit points of } (-1)\text{-geodesics in } \mathcal{G} \}. \tag{8} \]
The inclusions of $\mathrm{cl}^{(+1)}(\mathcal{G})$ and $\mathrm{cl}^{(-1)}(\mathcal{G})$ into the norm closure $\overline{\mathcal{G}}$ are obvious. We show
\[ \mathrm{cl}^{(+1)}(\mathcal{G}) \subset \mathrm{cl}^{\mathrm{rI}}(\mathcal{G}) \subset \overline{\mathcal{G}}, \tag{9} \]
where the second inclusion follows from the Pinsker-Csiszár inequality. We prove that the $(+1)$-closure is smaller than the locus of maximum-entropy density matrices, and that the rI-closure and the $(-1)$-closure are possible candidates for the correct extension of $\mathcal{G}$.

We introduce two sorts of canonical geodesics on a Gibbsian family and we provide a geometric discussion of how the maximum-entropy inference relates to the entropy distance. We remark on the information geometric context of the geodesics, on quantum channels and on advantages of real *-subalgebras as opposed to C*-subalgebras.

In this section $\mathcal{A}$ denotes an arbitrary real *-subalgebra of $\mathrm{Mat}(N, \mathbb{C})$. The set of invertible states equals the relative interior of the state space,
\[ \mathrm{ri}\, \mathcal{S}(\mathcal{A}) = \{ \rho \in \mathcal{S}(\mathcal{A}) \mid \rho^{-1} \text{ exists in } \mathcal{A} \}, \]
i.e. the interior of $\mathcal{S}(\mathcal{A})$ in its affine span $\mathcal{A}_1 := \{ a \in \mathcal{A}_{\mathrm{sa}} \mid \mathrm{tr}(a) = 1 \}$, see e.g. Proposition 2.9 in [Ws2]. The trace-normalized exponential is the real analytic mapping
\[ \exp_1 : \mathcal{A}_{\mathrm{sa}} \longrightarrow \mathrm{ri}\, \mathcal{S}(\mathcal{A}), \quad a \longmapsto \frac{e^a}{\mathrm{tr}(e^a)} \]
defined by functional calculus of self-adjoint matrices in $\mathcal{A}$. This is a diffeomorphism when restricted to traceless matrices. The real analytic inverse $\ln_0 : \mathrm{ri}(\mathcal{S}(\mathcal{A})) \to \mathcal{A}_0$ defined by
\[ \ln_0 : \rho \mapsto \ln(\rho) - \mathbb{1}\, \mathrm{tr}(\ln(\rho)) / \mathrm{tr}(\mathbb{1}) \]
is the canonical chart of $\mathrm{ri}\, \mathcal{S}(\mathcal{A})$.

The image of a non-empty affine subspace of $\mathcal{A}_{\mathrm{sa}}$ under $\exp_1$ is an exponential family in $\mathcal{A}$. For an exponential family $\mathcal{E}$ we call $\ln_0|_{\mathcal{E}}$ the canonical chart of $\mathcal{E}$. The affine space $\Theta := \ln_0(\mathcal{E})$ is the canonical parameter space, its translation vector space $V := \{ x - y \mid x, y \in \Theta \}$ is the canonical tangent space and the restriction of $\exp_1$ to $\Theta$ is the canonical parametrization of $\mathcal{E}$.

An exponential family is a Gibbsian family if $\Theta = V$, and for that case a different chart was introduced in Theorem 2 (b) in [Wi]: If $\pi_V : \mathcal{A}_{\mathrm{sa}} \to V$ denotes orthogonal projection onto $V$, we define the mean value set
\[ M(V) = M_{\mathcal{A}}(V) := \pi_V\left( \mathcal{S}(\mathcal{A}) \right). \tag{10} \]
The mean value set is affinely isomorphic to $\{ (\langle \rho, v_1 \rangle, \ldots, \langle \rho, v_k \rangle) \mid \rho \in \mathcal{S}(\mathcal{A}) \}$, if $v_1, \ldots, v_k$ is a basis of $V$, see e.g. Remark 1.1 in [Ws2]. The latter set was used in [Wi]. It is not reasonable to choose a basis of $V$ in our analysis, because vector spaces $pVp$ for projections $p = p^2 = p^* \in \mathcal{A}$ will be used, see Remark 6, and multiplication with $p$ can destroy linear independence.

The map $\pi_V \circ \exp_1|_V : V \to \mathrm{ri}\left( M(V) \right)$ is a real analytic diffeomorphism, its image is an open subset of $V$. The mean value chart for the Gibbsian family $\mathcal{E}$ is the bijection
\[ \pi_V|_{\mathcal{E}} : \mathcal{E} \longrightarrow \mathrm{ri}\left( M(V) \right). \tag{11} \]
The real analytic inverse $\pi_{\mathcal{E}} : \mathrm{ri}\left( M(V) \right) \to \mathcal{E}$ shall be called mean value parametrization. Below we also write $\pi_{\mathcal{E}}$ for the map $\pi_{\mathcal{E}} \circ \pi_V$ defined on the domain $\mathrm{dom}_{\mathcal{E}} := \mathcal{S}(\mathcal{A}) \cap (\mathcal{E} + V^\perp)$, which was introduced in [Ay] (for probability distributions). In fact, the chart (11) was established in [Wi] for $\mathcal{A} = \mathrm{Mat}(N, \mathbb{C})$. Since $V$ contains only traceless matrices, it is proved in Lemma 3.13 in [Ws2] that $M_{\mathcal{A}}(V) = M_{\mathrm{Mat}(N, \mathbb{C})}(V)$ holds for every C*-subalgebra $\mathcal{A} \subset \mathrm{Mat}(N, \mathbb{C})$ which contains $V$. Remark 6 extends this equality to real *-subalgebras $\mathcal{A}$ including $V$. So (11) holds for these algebras.

The two charts for a Gibbsian family $\mathcal{E}$ have open subsets of the canonical tangent space $V$ as their images. Given that $V$ is an affine space, two kinds of affine geodesics for $\mathcal{E}$ arise: Unparametrized $(+1)$-geodesics are the images of open segments in $V$ under the canonical parametrization $\exp_1 : V \to \mathcal{E}$, and unparametrized $(-1)$-geodesics are the images of open segments in $\mathrm{ri}\left( M(V) \right)$ under the mean value parametrization $\pi_{\mathcal{E}} : \mathrm{ri}\left( M(V) \right) \to \mathcal{E}$. We shall denote the open segment between $a, b \in \mathcal{A}_{\mathrm{sa}}$ by $]a, b[\, := \{ (1 - \lambda) a + \lambda b \mid 0 < \lambda < 1 \}$ and the closed segment by $[a, b] := \{ (1 - \lambda) a + \lambda b \mid 0 \leq \lambda \leq 1 \}$. A more comprehensive introduction of $(\pm 1)$-geodesics is given in Section 7.2 in [AN]. The geodesics are part of a beautiful theory, called information geometry, about affine connections and Riemannian metrics on state spaces. See Remark 4 for some details.

The relative entropy suits exponential families very well. If $\rho$, $\sigma$ and $\tau$ are states in $\mathcal{A}$ with $\sigma$ and $\tau$ invertible, and if $\rho - \sigma \perp \ln(\tau) - \ln(\sigma)$, then
\[ S(\rho, \sigma) + S(\sigma, \tau) = S(\rho, \tau) \tag{12} \]
holds, see e.g. [Pe1]. This is the Pythagorean theorem of the relative entropy. Clearly the Pythagorean theorem (12) holds if $\sigma$ and $\tau$ belong to an exponential family $\mathcal{E}$ in $\mathcal{A}$ and if $\rho \in \mathcal{S}(\mathcal{A})$ satisfies $\rho - \sigma \perp V$. The projection theorem follows for $\rho \in \mathrm{dom}_{\mathcal{E}}$:
\[ \min_{\sigma \in \mathcal{E}} S(\rho, \sigma) = S\left( \rho, \pi_{\mathcal{E}}(\rho) \right), \tag{13} \]
the minimum being unique. See Remark 4 about the information geometry of these theorems.

The linearly constrained maximization of von Neumann entropy can be replaced by an unconstrained minimization of the relative entropy.
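The Pythagorean theorem (12) and the projection theorem (13) can be checked numerically in the commutative case. The following is a minimal sketch under assumptions that are not part of the text: all matrices are diagonal $3 \times 3$ and are stored as lists of diagonal entries, the tangent space is spanned by a hypothetical vector `v`, and the helper names `exp1` and `relative_entropy` are chosen here for illustration.

```python
import math

def exp1(a):
    """Trace-normalized exponential of a diagonal matrix, given by its diagonal."""
    weights = [math.exp(x) for x in a]
    total = sum(weights)
    return [w / total for w in weights]

def relative_entropy(p, q):
    """S(p, q) = sum_i p_i (ln p_i - ln q_i) for strictly positive q."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical choices: v spans the tangent space V; w is traceless and orthogonal to v.
v = [1.0, 0.0, -1.0]
w = [1.0, -2.0, 1.0]

tau = exp1([0.0, 0.0, 0.0])                     # tracial state, a member of the family
sigma = exp1([0.7 * x for x in v])              # another member of the family
rho = [s + 0.05 * x for s, x in zip(sigma, w)]  # rho - sigma is orthogonal to V

# Pythagorean theorem (12): the gap should vanish up to rounding errors.
pythagoras_gap = (relative_entropy(rho, sigma) + relative_entropy(sigma, tau)
                  - relative_entropy(rho, tau))
print(abs(pythagoras_gap) < 1e-12)

# Projection theorem (13): sigma minimizes S(rho, .) over the family (grid check only).
best = relative_entropy(rho, sigma)
grid = [relative_entropy(rho, exp1([t / 10 * x for x in v])) for t in range(-30, 31)]
print(all(value >= best - 1e-12 for value in grid))
```

The grid search is only a plausibility check on a one-parameter family; it does not replace the uniqueness statement of the projection theorem.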
As mentioned previously, for $V = \Theta$ the mean value parametrization
\[ \pi_{\mathcal{E}} : \mathrm{ri}\left( M(V) \right) \to \mathcal{E} \tag{14} \]
assigns to vectors $v \in \mathrm{ri}\left( M(V) \right)$ the unique state $\pi_{\mathcal{E}}(v)$ of maximum von Neumann entropy in the fiber $F(v) := (v + V^\perp) \cap \mathcal{S}(\mathcal{A})$. This is often proved using Lagrange multipliers or Klein's inequality, see e.g. [IO].

A proof of (14) by information geometry opens a link to the entropy distance: Let $\tau = \mathbb{1}/\mathrm{tr}(\mathbb{1})$ be a reference state. Then $S(\rho, \tau) = -S(\rho) + \log(\mathrm{tr}(\mathbb{1}))$ for all $\rho \in \mathcal{S}(\mathcal{A})$, so maximizing the von Neumann entropy is equivalent to minimizing the relative entropy from $\tau$, under arbitrary constraints (a different choice of $\tau$ corresponds to a biased inference [Ru]). For all $v \in \mathrm{ri}\left( M(V) \right)$ the fiber $F(v)$ is included in the domain $\mathrm{dom}_{\mathcal{E}}$ of the Gibbsian family $\mathcal{E} = \exp_1(V)$ by the mean value chart (11). Since $\tau \in \mathcal{E}$, the Pythagorean theorem (12) shows for any state $\rho \in F(v)$
\[ S(\rho, \pi_{\mathcal{E}}(\rho)) + S(\pi_{\mathcal{E}}(\rho), \tau) = S(\rho, \tau). \]
Hence the minimization of $S(\cdot, \tau)$ over $\rho \in F(v)$ has the unique solution $\pi_{\mathcal{E}}(\rho)$. In addition, by the projection theorem (13), it is equivalent to the unconstrained minimization of $S(\rho, \cdot)$ on $\mathcal{E}$ (independent of the choice of $\rho \in F(v)$).

Pythagorean and projection theorems as well as the $(+1)$- and $(-1)$-geodesics are rooted in information geometry.

Remark 4.

The exponential family $\mathrm{ri}\left( \mathcal{S}(\mathcal{A}) \right)$ of invertible density matrices has the mean value chart $\rho \mapsto \rho - \mathbb{1}/\mathrm{tr}(\mathbb{1})$. Its tangent space at $\rho$ is called the (m)-representation and equals $\mathcal{A}_0 := \{ a \in \mathcal{A}_{\mathrm{sa}} \mid \mathrm{tr}(a) = 0 \}$, see p. 148 in [AN]. According to [Pe1, GS], the BKM (Bogoliubov-Kubo-Mori) metric, a Riemannian metric on $\mathrm{ri}\left( \mathcal{S}(\mathcal{A}) \right)$, can be defined for invertible density matrices $\rho$ and tangent vectors $A, B$ in the (m)-representation by
\[ g_\rho(A, B) := \int_0^\infty \mathrm{tr}\left( (t + \rho)^{-1} A (t + \rho)^{-1} B \right) \mathrm{d}t. \]
Although the BKM metric is a natural generalization of the Fisher metric to state spaces of non-commutative algebras, unlike the Fisher metric it is not the only monotone metric of this kind, see e.g. [Pe2].

The (m)-connection on the state space $\mathrm{ri}(\mathcal{S}(\mathcal{A}))$, denoted $\nabla^{(m)}$, is defined through the parallel transport of translation on the affine hull $\mathcal{A}_1 = \{ a \in \mathcal{A}_{\mathrm{sa}} \mid \mathrm{tr}(a) = 1 \}$ of the state space. If $g$ is a Riemannian metric on the manifold $\mathrm{ri}(\mathcal{S}(\mathcal{A}))$ then the (e)-connection, denoted $\nabla^{(e)}$, is defined by
\[ X g(Y, Z) = g\left( \nabla^{(m)}_X Y, Z \right) + g\left( Y, \nabla^{(e)}_X Z \right) \]
for vector fields $X, Y, Z$ on $\mathrm{ri}\left( \mathcal{S}(\mathcal{A}) \right)$. The connections $\nabla^{(m)}$ and $\nabla^{(e)}$ are said to be dual with respect to $g$. The (m)-connection is also called $(-1)$-connection, and when the BKM Riemannian metric $g$ is used, then the dual (e)-connection is called $(+1)$-connection. The connections $\nabla^{(+1)}$ and $\nabla^{(-1)}$ give rise to the geodesics introduced above, see e.g. Section 7.2 and Section 7.3 in [AN].

The state space of the (real) *-subalgebra $\mathcal{A}$ is trivially $(-1)$-autoparallel (i.e. totally geodesic) and it is $(+1)$-autoparallel as it is an exponential family. This shows that the $(\pm 1)$-connections restrict from the state space of $\mathrm{Mat}(N, \mathbb{C})$ to $\mathrm{ri}(\mathcal{S}(\mathcal{A}))$.

A Pythagorean theorem and a projection theorem are known in information geometry for dually flat spaces.
The relative entropy is the canonical divergence of the dually flat space of invertible density matrices with respect to the BKM metric and the $(\pm 1)$-connections. Hence the Pythagorean theorem (12) arises from a more general theory, see e.g. Section 3.4 in [AN]. The $(-1)$-geodesic through $\rho$ and $\sigma$ and the $(+1)$-geodesic through $\sigma$ and $\tau$ meet at $\sigma$ orthogonally with respect to the BKM metric.

The projection $\pi_V|_{\mathcal{S}(\mathcal{A})} : \mathcal{S}(\mathcal{A}) \to V$ can be seen as a quantum channel to a commutative algebra.

Remark 5. The mean value set $M(V) = \pi_V(\mathcal{S}(\mathcal{A}))$ relates to a POVM quantum measurement. A POVM is defined as a finite sequence $F_1, \ldots, F_n$ of positive semi-definite matrices in $\mathcal{A}$, such that $F_1 + \cdots + F_n = \mathbb{1}$. The probability of outcome $i \in \{ 1, \ldots, n \}$ when measuring the quantum system $\rho \in \mathcal{S}(\mathcal{A})$ is $P_\rho(i) := \mathrm{tr}(F_i \rho)$, see e.g. [Pe3]. Given a POVM $F_1, \ldots, F_n$ in $\mathcal{A}$, a quantum channel
\[ \mathcal{S}(\mathcal{A}) \to \mathcal{S}(\mathbb{C}^n), \quad \rho \mapsto (P_\rho(1), \ldots, P_\rho(n)) = (\mathrm{tr}(F_1 \rho), \ldots, \mathrm{tr}(F_n \rho)) \]
is defined. If $U$ is the real linear span of $F_1, \ldots, F_n$ and $\widetilde{U}$ is the orthogonal projection of $U$ onto the space of traceless matrices $\mathcal{A}_0$, then the mean value sets $M(U) = \pi_U(\mathcal{S}(\mathcal{A}))$ and $M(\widetilde{U}) = \pi_{\widetilde{U}}(\mathcal{S}(\mathcal{A}))$ are affinely isomorphic to the image of the above channel $\mathcal{S}(\mathcal{A}) \to \mathcal{S}(\mathbb{C}^n)$. (For a proof see e.g. Remark 1.1 in [Ws2].)

We would like to comment on (real) *-subalgebras.

Remark 6.

As already mentioned earlier, *-subalgebras allow for low-dimensional examples. What makes a *-subalgebra $\mathcal{A}$ of $\mathrm{Mat}(N, \mathbb{C})$ eligible for our treatment is that all results in [Ws2] are true for them (unfortunately the choice in that article was to argue with intersections of C*-subalgebras and real matrices $\mathrm{Mat}(N, \mathbb{R})$). Some caution is needed, e.g. spectral projections of normal matrices need not be included in $\mathcal{A}$, as the matrix $\mathrm{i}\sigma_3 \oplus 0$ in Example 3 shows. This error is present in Definition 2.5.3 of the above article. However, as only self-adjoint matrices are used, there is no problem arising.

An important feature of a *-subalgebra $\mathcal{A}$ of $\mathrm{Mat}(N, \mathbb{C})$ is that spectral projections $p$ of a self-adjoint matrix $a \in \mathcal{A}_{\mathrm{sa}}$ can be written as $p = f(a)$ for a real polynomial $f$ in one variable. This implies that
• if $a$ is a self-adjoint matrix and $g$ is a real valued function defined on the spectrum of $a$, then $g(a)$ belongs to $\mathcal{A}_{\mathrm{sa}}$,
• the state space has codimension one in $\mathcal{A}_{\mathrm{sa}}$, as the cone of positive semi-definite matrices has full dimension (decompose a self-adjoint matrix into a difference of two positive semi-definite matrices).

One superficial flaw of *-subalgebras (and of C*-subalgebras!) is that eigenvalues cannot be used directly, as the identity $\mathbb{1}$ of $\mathcal{A}$ may differ from the identity $\mathbb{1}_N$ in $\mathrm{Mat}(N, \mathbb{C})$. On a closer inspection one realizes that this is exactly the flexibility we need e.g. in Proposition 14 and Theorem 27 to analyze rI-closures. The $(+1)$-closure of an exponential family is formed by exponential families of strictly smaller support, lying in compressed algebras $p\mathcal{A}p = \{ pap \mid a \in \mathcal{A} \}$ with identity $p = p^2 = p^* \in \mathcal{A}$. The algebra $p\mathcal{A}p$ as a *-subalgebra of $\mathrm{Mat}(N, \mathbb{C})$ may be treated in the same way as $\mathcal{A}$. The unorthodox use of spectral values within a finite-dimensional algebra was overlooked in [Ws2], see the correction Lin. Alg. Appl. no. 1 p. xvi (2012).

3 A classical-quantum metamorphosis

In the algebra $\mathcal{A}$ from Example 3 we study a curve of 2D mean value sets and we address the question whether they have non-exposed faces. The algebra $\mathcal{A}$ has the commutative *-subalgebra
\[ \begin{pmatrix} * & 0 & 0 \\ 0 & * & 0 \\ 0 & 0 & * \end{pmatrix} \]
of diagonal matrices, isomorphic to $\mathbb{R}^3$, and its left upper corner
\[ \begin{pmatrix} * & * & 0 \\ * & * & 0 \\ 0 & 0 & 0 \end{pmatrix} \]
is a non-commutative *-subalgebra.

The curve of mean value sets is rather a Grassmannian manifold of subspaces. More precisely, we consider 2D subspaces of the 4D space $\mathcal{A}_{\mathrm{sa}}$ of self-adjoint matrices and here we restrict to 2D subspaces of the 3D space of traceless matrices (since the state space is parallel to it). So by symmetry of the cone $\mathcal{S}(\mathcal{A})$ one real angular variable suffices to describe mean value sets. Thus we can consider a curve in the Grassmannian manifold. In Figure 2, left, mean value sets $M(V)$ are drawn isometrically at equidistant angles around a full circle.

Our example is minimal in two respects:
• Planar projections have minimal dimension to allow for non-exposed faces.
• The algebra $\mathcal{A}$ is (up to isomorphism) the smallest *-subalgebra $\mathcal{A}$ of $\mathrm{Mat}(N, \mathbb{C})$ allowing for a mean value set with non-exposed faces. If $\mathcal{A}$ has no *-subalgebra isomorphic to $\mathrm{Mat}(2, \mathbb{R})$ then, assuming $\mathbb{1} = \mathbb{1}_N$, $\mathcal{A}$ is commutative (see Theorem 5.2 and 5.4 in Section 5 in [KH]). ($\mathbb{1} = \mathbb{1}_{\widetilde{N}}$ may be achieved by restricting a faithful representation of the C*-algebra $\mathbb{1}\, \mathrm{Mat}(N, \mathbb{C})\, \mathbb{1}$ onto a direct sum of full matrix algebras, see e.g. [Da].) Hence the state space $\mathcal{S}(\mathcal{A})$ is a simplex. Then all mean value sets are polytopes having no non-exposed faces. The algebra $\mathrm{Mat}(2, \mathbb{R}) \cong \mathrm{span}_{\mathbb{R}}\{ \mathbb{1}_2, \sigma_1, \sigma_2, \mathrm{i}\sigma_3 \}$ itself has a disk as state space, whose proper projections are a point or a segment, having no non-exposed faces.

We introduce precise definitions in convex geometry for subsequent discussions e.g. in Lemma 13.

Definition 7.

Let $M$ be a compact and convex subset of a finite-dimensional Euclidean vector space $(E, \langle \cdot, \cdot \rangle)$.
• A convex subset $F$ of $M$ is a face of $M$, if for all $x, y \in M$ and all $0 < \lambda < 1$ the inclusion of $(1 - \lambda) x + \lambda y \in F$ implies $x, y \in F$.
• A face of dimension zero is called extreme point and if it is not exposed, a non-exposed point. An extreme point of $\mathcal{S}(\mathcal{A})$ will be called pure state.
• If $M$ is non-empty, then for non-zero $u \in E$ the supporting hyperplane is defined by $H(M, u) := \{ x \in E \mid \langle x, u \rangle = \max_{y \in M} \langle y, u \rangle \}$. A face $F$ of $M$ is called exposed if $F$ is the intersection of $M$ with a supporting hyperplane,
\[ F(M, u) := M \cap H(M, u). \tag{15} \]
$F = \emptyset$ and $F = M$ are exposed faces by definition.

The Grassmannian manifold of real 2D subspaces of self-adjoint traceless matrices $\mathcal{A}_0 = \{ a \in \mathcal{A}_{\mathrm{sa}} \mid \mathrm{tr}(a) = 0 \}$ will be denoted
\[ G := \{ V \subset \mathcal{A}_0 \mid V \text{ is a real 2D subspace} \}. \]
We define the angle between a subspace $V \in G$ and $z := (-\tfrac{1}{2}\mathbb{1}_2) \oplus 1$ (pointing along the axis of the cone),
\[ \varphi = \varphi(V) := \angle(V, z). \tag{16} \]

The state space is $\mathcal{S}(\mathcal{A}) = \mathrm{conv}(\mathcal{S}(\mathcal{B}) \cup \{ 0 \oplus 1 \})$ for the disk $\mathcal{S}(\mathcal{B})$ introduced in Example 3 (identified with $\mathcal{S}(\mathcal{B}) \oplus 0$). The mean value set of $V \in G$ is the convex hull of the ellipse $e := \pi_V(\mathcal{S}(\mathcal{B}))$ and of $x := \pi_V(0 \oplus 1)$,
\[ M(V) = \mathrm{conv}(e, x). \tag{17} \]
The problem of finding non-exposed faces at $M(V)$ may be solved in $\mathbb{R}^3$ by studying projections of a symmetric 3D cone isometric to $\mathcal{S}(\mathcal{A})$. Explicit calculations with matrices are done in Example 1.2 in [Ws2] by studying tangents to the elliptical boundary curve $\partial e$. For all subspaces $V \in G$ the projection of $V$ onto $\mathrm{span}_{\mathbb{R}}(\sigma_1, \sigma_2, \sigma_3) \oplus 0$ is a subspace of $\mathrm{span}_{\mathbb{R}}(\sigma_1, \sigma_2) \oplus 0$. Hence the state space $\mathcal{S}(\mathcal{A})$ equals the cone $C$ in [Ws2] and we have the following:

Lemma 8.

Let $V \in G$ be a 2D plane. If $\varphi = 0$, then $\partial e$ is a segment (degenerate ellipse) and the mean value set $M(V)$ is a triangle. If $0 < \varphi < \tfrac{\pi}{3}$, then $\partial e$ is a non-degenerate ellipse, $x \notin e$ and the tangents from $x$ to $\partial e$ meet $\partial e$ at two non-exposed points of $M(V)$. If $\tfrac{\pi}{3} \leq \varphi \leq \tfrac{\pi}{2}$, then $M(V) = e$ is bounded by a non-degenerate ellipse $\partial e$.

We see that non-exposed faces are typical in the following sense. A continuous curve $\gamma : [0, 1] \to G$ induces a curve of mean value sets $\lambda \mapsto M(\gamma(\lambda))$. By Lemma 8 a mean value set without non-exposed faces must be a triangle or an ellipse. If $\gamma$ connects the classical mean value set of a triangle to an ellipse, then we have $\angle(\gamma(0), z) = 0$ and $\angle(\gamma(1), z) \in [\tfrac{\pi}{3}, \tfrac{\pi}{2}]$. Since the angle $\varphi$ is continuous on $G$, the curve $\gamma$ must cross the range of angles $(0, \tfrac{\pi}{3})$ with mean value sets having non-exposed faces. This range corresponds to an open subset of the Grassmannian $G$.

4 Closures of exponential families
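Before turning to closures, the case distinction of Lemma 8 can be reproduced numerically. The sketch below rests on assumptions that are not taken from the text: the cone $\mathcal{S}(\mathcal{A})$ is modelled in $\mathbb{R}^3$ with its axis along the third coordinate, the base radius $1/\sqrt{2}$ and height $\sqrt{3/2}$ are derived here from the Hilbert-Schmidt norm, the plane $V$ is spanned by $(0, 1, 0)$ and $(\sin\varphi, 0, \cos\varphi)$, and all function names are hypothetical.

```python
import math

# Assumed model of the cone in R^3 (equilateral cross-section with side sqrt(2)).
BASE_RADIUS = 1 / math.sqrt(2)   # Hilbert-Schmidt radius of the base disk
HEIGHT = math.sqrt(3 / 2)        # distance from the centre of the disk to the apex

def projected_shape(phi):
    """Semi-axes (a, b) of the projected base disk and offset d of the projected apex.

    In the plane spanned by (0, 1, 0) and (sin(phi), 0, cos(phi)) the disk
    becomes an ellipse centred at the origin and the apex lands at (0, d).
    """
    a = BASE_RADIUS
    b = BASE_RADIUS * math.sin(phi)
    d = HEIGHT * math.cos(phi)
    return a, b, d

def classify(phi):
    """Shape of the mean value set for an angle phi between 0 and pi/2."""
    a, b, d = projected_shape(phi)
    if b == 0.0:
        return "triangle"
    if d > b:
        return "ellipse plus exterior point"
    return "ellipse"

def tangent_points(phi):
    """Points where the tangents from the projected apex touch the ellipse."""
    a, b, d = projected_shape(phi)
    if not 0.0 < b < d:
        raise ValueError("the projected apex is not an exterior point")
    y = b * b / d                          # from the polar line of (0, d)
    x = a * math.sqrt(1 - (b / d) ** 2)
    return (-x, y), (x, y)

for phi in (0.0, math.pi / 4, math.pi / 2):
    print(phi, classify(phi))
print(tangent_points(math.pi / 4))
```

In this model the projected apex leaves the ellipse exactly when $\tan\varphi < \sqrt{3}$, which matches the dividing angle $\pi/3$; the two tangent points returned for smaller angles correspond to the non-exposed points of Lemma 8.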

The curve of 2D mean value sets $\mathbb{M}(V)$ in Section 3 shows that the angle $\varphi = \varphi(V) = \frac{\pi}{3}$ divides mean value sets with non-exposed faces from others without non-exposed faces. In Section 4.2 we show that the Gibbsian family at $\varphi = \frac{\pi}{3}$, called Staffelberg family, has a discontinuous entropy distance. The analysis is based on more general results about $(+1)$-closures in Section 4.1. In Section 4.3 we compute the $(-1)$-closure of the Staffelberg family. We will see in Section 4.4 that the $(+1)$-closure of a Gibbsian family, in general, is not a locus of maximum-entropy density matrices under linear constraints.

In the sequel we assume that $\mathcal{A}$ is a real *-subalgebra of $\mathrm{Mat}(N, \mathbb{C})$ and that $\mathcal{E}$ is an exponential family in $\mathcal{A}$ with canonical parameter space $\Theta$ and canonical tangent space $V = \mathrm{lin}(\Theta)$. In Sections 4.2 through 4.4 we shall specialize to the algebra $\mathcal{A}$ defined in Example 3.

$(+1)$-closures of exponential families

In this section we compute the $(+1)$-closure $\mathrm{cl}^{(+1)}(\mathcal{E})$ defined in (7). We show that it is a union of exponential families. We also discuss aspects of the rI-closure $\mathrm{cl}^{\mathrm{rI}}(\mathcal{E})$, defined in (6), and of the norm closure $\overline{\mathcal{E}}$. Among others, we show $\mathrm{cl}^{(+1)}(\mathcal{E}) \subset \mathrm{cl}^{\mathrm{rI}}(\mathcal{E}) \subset \overline{\mathcal{E}}$. Strict inclusions are presented by example in Section 4.4 and Section 4.2.

In this section $\mathcal{A}$ denotes an arbitrary real *-subalgebra of $\mathrm{Mat}(N, \mathbb{C})$. In the analysis of $(+1)$- and rI-closures, subalgebras with various identities will appear, so spectral values shall be used in some statements, see also Remark 6. On the space $\mathcal{A}_{\mathrm{sa}}$ of self-adjoint matrices we have the partial ordering defined by $a \preceq b$ if and only if $b - a \succeq 0$, i.e. $b - a$ is positive semi-definite. The set of projections $\{ p \in \mathcal{A} \mid p^* = p = p^2 \}$ will be considered with this partial ordering. If $p \in \mathcal{A}$ is a projection, then the compressed algebra by $p$ is
$$p\mathcal{A}p := \{ pap \mid a \in \mathcal{A} \}. \tag{18}$$
The algebra $p\mathcal{A}p$ is a *-subalgebra of $\mathcal{A}$ with identity $p$.
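As a small illustration of the compressed algebra in (18), the following sketch checks numerically that a projection $p$ acts as the identity on elements of the form $pap$. The particular real 3x3 matrices and the helper names `matmul`, `compress` and `close` are assumptions made for this illustration only; they do not come from the article.

```python
# Minimal sketch of the compression a -> p a p by a projection p, cf. (18).
# Nested lists stand in for matrices; all names and numbers are illustrative.

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def compress(p, a):
    """Return p a p, an element of the compressed algebra."""
    return matmul(matmul(p, a), p)

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

# A symmetric rank-two projection p and a symmetric matrix a (assumed values).
p = [[0.5, 0.5, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
a = [[1.0, 2.0, 3.0],
     [2.0, -1.0, 0.5],
     [3.0, 0.5, 4.0]]

pap = compress(p, a)

# p is a projection, and it is a two-sided identity for p a p.
is_projection = close(matmul(p, p), p)
p_is_identity_on_pap = close(matmul(p, pap), pap) and close(matmul(pap, p), pap)
print(is_projection, p_is_identity_on_pap)
```

The check only covers one element $pap$; it is meant to make the statement "with identity $p$" concrete, not to prove it.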
The spectral values of $a \in \mathcal{A}_{\mathrm{sa}}$ are the real numbers $\lambda$ such that $a - \lambda$ is not invertible in $\mathcal{A}$. The sum of spectral projections of non-zero spectral values of $a$ is the support projection $s(a)$; we notice
$$s(a) \in \mathcal{A}. \tag{19}$$
We denote by $\lambda_+(a)$ the maximal spectral value of $a$ and by $p_+(a) \in \mathcal{A}$ the spectral projection of $a$ corresponding to $\lambda_+(a)$, which we call the maximal projection of $a$. As noticed in Remark 6, eigenvalues cannot be used here.

The free energy, defined for $a \in \mathcal{A}_{\mathrm{sa}}$ by $F(a) := \ln(\mathrm{tr}(e^a))$, is useful to discuss limits of $(+1)$-geodesics. Functions defined for projections $p \in \mathcal{A}$ by functional calculus on $(p\mathcal{A}p)_{\mathrm{sa}}$ will be decorated by a superscript $p$, e.g. $\ln^p(p) = 0$, while $\ln(p)$ is not defined if $p \neq \mathbb{1}$. The superscript $p = \mathbb{1}$ will often be omitted. For $a \in (p\mathcal{A}p)_{\mathrm{sa}}$ we notice $\exp^p(a) = p \exp(a)$, $\exp_1^p(a) = \frac{p\, e^a}{\mathrm{tr}(p\, e^a)}$ and $F^p(a) = \ln \mathrm{tr}(p\, e^a)$. We use the projection $\mathcal{A} \to p\mathcal{A}p$, $a \mapsto pap$ to define the exponential family in $p\mathcal{A}p$
$$\mathcal{E}^p := \{ \exp_1^p(p\theta p) \mid \theta \in \Theta \}.$$

Lemma 9.

Suppose $\theta, u \in \mathcal{A}_{\mathrm{sa}}$ and $p := p_+(u)$ is the maximal projection of $u$. We have
$$\lim_{t \to \infty} \exp_1(\theta + t u) = \exp_1^p(p\theta p) \tag{20}$$
and
$$\lim_{t \to \infty} \big( F(\theta + t u) - t \lambda_+(u) \big) = F^p(p\theta p). \tag{21}$$

Proof: If $u$ has maximal spectral value $\lambda_+(u) = 0$ then by standard perturbation theory one proves
$$\lim_{t \to \infty} e^{\theta + t u} = p\, e^{p\theta p}. \tag{22}$$
Since $\exp_1(\theta + \alpha \mathbb{1}) = \exp_1(\theta)$ holds for $\alpha \in \mathbb{R}$ we have for arbitrary $u \in \mathcal{A}_{\mathrm{sa}}$
$$\lim_{t \to \infty} \exp_1(\theta + t u) = \lim_{t \to \infty} \exp_1\big(\theta + t (u - \lambda_+(u)\mathbb{1})\big) = \frac{p\, e^{p\theta p}}{\mathrm{tr}(p\, e^{p\theta p})}.$$
If $u$ has maximal spectral value $\lambda_+(u) = 0$ then (22) and the continuity of the logarithm show $\lim_{t \to \infty} F(\theta + t u) = \ln \mathrm{tr}(p\, e^{p\theta p})$. We have $F(\theta + \alpha \mathbb{1}) = F(\theta) + \alpha$ for $\alpha \in \mathbb{R}$, hence for arbitrary $u \in \mathcal{A}_{\mathrm{sa}}$ the equality $F(\theta + t u) - t\lambda_+(u) = F\big(\theta + t(u - \lambda_+(u)\mathbb{1})\big)$ shows the second claim. □

An immediate consequence of (20) is as follows.

Proposition 10. The $(+1)$-closure of $\mathcal{E}$ is $\mathrm{cl}^{(+1)}(\mathcal{E}) = \bigcup_p \mathcal{E}^p$ where the disjoint union extends over the maximal projections $p = p_+(v)$ of all vectors $v \in V$ (including
1l = p + (0) ). The ﬁrst hurdle to tackle the rI-closure will be Lemma 13 which controls limitsof the relative entropy of certain states ρ from states σ on (+1) -geodesics. Thisis remarkable since for A = Mat(2 , C ) S ρ ( σ ) := S ( ρ, σ ) is not continuous on the set { σ ∈ S | s ( σ ) (cid:23) s ( ρ ) } with larger support projec-tions (19). However, S ρ is continuous throughout the simplex S for A ∼ = C N .15 xample 11 (Discontinuity of Relative Entropy) . In the algebra A = Mat(2 , C ) of a qubit we consider the pure state ρ := (1l + σ ) .For real α > let s α ∈ [0 , such that s α α → → and deﬁne the state σ α := (1 − s α ) (1l + cos( α ) σ + sin( α ) σ ) + s α (1l − cos( α ) σ − sin( α ) σ ) . Then σ α α → → ρ as well as S ( ρ, σ α ) = − α log( s α )(1 + o (1)) + o (1) . E.g. if we choose c, γ > and put s α := exp( − c/α γ ) , then s α α → → and S ( ρ, σ α ) = c α − γ (1 + o (1)) + o (1) . So any non-negative limit of S ( ρ, σ α ) can be achieved for smooth paths con-verging to an arbitrary point ρ in the boundary of state space.Using maximal spectral values λ + and maximal projections p + we summarizeProposition 2.9 in [Ws2]. Lemma 12. If u ∈ A sa is a non-zero self-adjoint matrix, then the exposedface F ( S ( A ) , u ) consists of the states ρ ∈ S ( A ) such that (cid:104) ρ, u (cid:105) = λ + ( u ) or,equivalently, s ( ρ ) (cid:22) p + ( u ) . The lemma says that the exposed face F ( S ( A ) , u ) is the state space of thecompressed algebra p A p discussed in (18) for p := p + ( u ) . Moreover, it followsthat all faces of S ( A ) are exposed, see e.g. Section 2.3 in [Ws2].The derivative of the exponential function for a, b ∈ A sa is D | a exp( b ) = (cid:90) e ya be (1 − y ) a d y . It implies the derivative of the free energy F D | a F ( b ) = (cid:104) b, exp ( a ) (cid:105) . (23)The derivative of the exponential for A = Mat( N, C ) is explained by powerseries expansion e.g. 
in [Li] and may be generalized to any *-subalgebra A of Mat( N, C ) by left- and right-multiplication with the identity in A . Lemma 13.

Suppose $\theta, u \in \mathcal{A}_{\mathrm{sa}}$ such that $u$ is not a multiple of the identity $\mathbb{1}$ in $\mathcal{A}$ and let $p := p_+(u)$. If $\rho \in F(\mathcal{S}(\mathcal{A}), u)$, then $S_\rho(\exp_1(\theta + t u))$ is strictly monotone decreasing with $t \in \mathbb{R}$ and
$$S_\rho\big(\exp_1^p(p\theta p)\big) = \lim_{t \to \infty} S_\rho\big(\exp_1(\theta + t u)\big) = \inf_{t \in \mathbb{R}} S_\rho\big(\exp_1(\theta + t u)\big).$$

Proof: By definition (15) of an exposed face we have for $\tau \in \mathcal{S}(\mathcal{A})$ and for $\rho \in F(\mathcal{S}(\mathcal{A}), u)$ the inequality $\langle u, \tau - \rho \rangle \le 0$. Since $u$ is not proportional to $\mathbb{1}$, its maximal projection $p = p_+(u)$ is not $\mathbb{1}$. If $\tau$ is invertible, then $s(\tau) = \mathbb{1}$ and it follows from Lemma 12 that $\tau \notin F(\mathcal{S}(\mathcal{A}), u)$. This implies that the strict inequality $\langle u, \tau - \rho \rangle < 0$ holds for all invertible states $\tau = \exp_1(\theta + t u)$ with $t \in \mathbb{R}$. Using (23) we have for all $t \in \mathbb{R}$
$$\frac{\partial}{\partial t}\, S_\rho \circ \exp_1(\theta + t u) = \langle u, \exp_1(\theta + t u) - \rho \rangle < 0.$$
We conclude that $S_\rho \circ \exp_1(\theta + t u)$ is strictly monotone decreasing in $t$.

The limit of the $(+1)$-geodesic $g : t \mapsto \exp_1(\theta + t u)$ is calculated in (20), $\sigma := \lim_{t \to \infty} g(t) = \exp_1^p(p\theta p)$. The states $\rho$ and $\sigma$ belong to the compressed algebra $p\mathcal{A}p$ defined in (18) and $\sigma$ is invertible in $p\mathcal{A}p$. Then
$$\begin{aligned}
-S(\rho, \sigma) - S(\rho) &= \mathrm{tr}\big(\rho \ln^p \circ \exp_1^p(p\theta p)\big) = \mathrm{tr}(\rho\theta) - F^p(p\theta p) \\
&= \lim_{t \to \infty} \big[ \mathrm{tr}(\rho\theta) + t\lambda_+(u) - F(\theta + t u) \big] \\
&= \lim_{t \to \infty} \big[ \mathrm{tr}\big(\rho(\theta + t u)\big) - F(\theta + t u) \big] \\
&= \lim_{t \to \infty} \mathrm{tr}\big(\rho \ln \circ \exp_1(\theta + t u)\big) \\
&= \lim_{t \to \infty} \big[ -S(\rho, g(t)) - S(\rho) \big].
\end{aligned}$$
We have used (21) in the third step and $\mathrm{tr}(\rho u) = \lambda_+(u)$ from Lemma 12 in the fourth. The result is $\lim_{t \to \infty} S_\rho \circ g(t) = S_\rho(\sigma)$. Since $S_\rho \circ g$ is monotone decreasing in $t$ we have $\inf_{t \in \mathbb{R}} S_\rho \circ g(t) = S_\rho(\sigma)$. □

We show that $(+1)$-closures do not decrease the entropy distance, defined in (5), from exponential families.

Proposition 14. If $v \neq 0$ belongs to the canonical tangent space $V$ of the exponential family $\mathcal{E}$ and $\rho$ to the exposed face $F(\mathcal{S}(\mathcal{A}), v)$, then $d_{\mathcal{E}}(\rho) = d_{\mathcal{E}^{p_+(v)}}(\rho)$. For arbitrary $\rho \in \mathcal{S}(\mathcal{A})$ we have $d_{\mathcal{E}}(\rho) = \inf \{ S(\rho, \sigma) \mid \sigma \in \mathrm{cl}^{(+1)}(\mathcal{E}) \}$.

Proof: We prove the first statement; let $p := p_+(v)$. If $p_+(v) = \mathbb{1}$, then there is nothing to prove. Otherwise we have by Lemma 13 and Lemma 9
$$d_{\mathcal{E}}(\rho) = \inf_{\sigma \in \mathcal{E}} S(\rho, \sigma) = \inf_{\theta \in \Theta} \inf_{t \in \mathbb{R}} S\big(\rho, \exp_1(\theta + t v)\big) = \inf_{\theta \in \Theta} S\big(\rho, \lim_{t \to \infty} \exp_1(\theta + t v)\big) = \inf_{\theta \in \Theta} S\big(\rho, \exp_1^p(p\theta p)\big) = d_{\mathcal{E}^p}(\rho).$$
For the second statement, let $\rho \in \mathcal{S}(\mathcal{A})$ be arbitrary. By Proposition 10 it suffices to show $d_{\mathcal{E}^p}(\rho) \ge d_{\mathcal{E}}(\rho)$ for all projections $p$ of the form $p = p_+(v)$ where $v \in V$ is non-zero. If $\rho \notin F(\mathcal{S}(\mathcal{A}), v)$, then $s(\rho) \not\preceq p$ by Lemma 12. So for all $\sigma \in \mathcal{E}^p$ we have $S(\rho, \sigma) = \infty$. Otherwise, the equality $d_{\mathcal{E}^p}(\rho) = d_{\mathcal{E}}(\rho)$ follows from the first assertion above. □

Corollary 15. We have $\mathrm{cl}^{(+1)}(\mathcal{E}) \subset \mathrm{cl}^{\mathrm{rI}}(\mathcal{E}) \subset \overline{\mathcal{E}}$.

Proof: The first inclusion follows from Proposition 14: If $\rho \in \mathrm{cl}^{(+1)}(\mathcal{E})$, then $d_{\mathcal{E}}(\rho) = \inf \{ S(\rho, \sigma) \mid \sigma \in \mathrm{cl}^{(+1)}(\mathcal{E}) \} = 0$ shows $\rho \in \mathrm{cl}^{\mathrm{rI}}(\mathcal{E})$. The second inclusion follows from the Pinsker-Csiszár inequality (see e.g. p. 40 in [Pe3]), which says that $\| \rho - \sigma \|_1^2 \le 2\, S(\rho, \sigma)$ holds for all states $\rho, \sigma \in \mathcal{S}(\mathcal{A})$ with the trace norm $\| a \|_1 := \mathrm{tr}(\sqrt{a^* a})$ for $a \in \mathcal{A}$. □

Finally we prove an upper bound for the norm closure of a Gibbsian family.

Lemma 16.

Let E be a Gibbsian family, i.e. Θ = V . Then E ⊂ E∪ (cid:83) v F (cid:0) S ( A ) , v (cid:1) where the union of exposed faces extends over all non-zero vectors v ∈ V .Proof: We assume θ i ⊂ Θ and that ρ i := exp ( θ i ) ∈ E is a convergingsequence with limit ρ := lim i →∞ ρ i . If π V ( ρ ) ∈ ri( M ( V )) , then there is aneighborhood U ( π V ( ρ )) ⊂ ri( M ( V )) containing π V ( ρ i ) for large i . Choosingthis neighborhood suﬃciently small we can assume its closure X is a compactsubset of ri( M ( V )) . As discussed in (11) the map π V ◦ exp : V → ri( M ( V )) is areal analytic diﬀeomorphism. Using the inverse mapping, the set log ◦ π E ( X ) ⊂ V is compact and it contains θ i for large i . It follows ρ ∈ E .Otherwise, if π V ( ρ ) belongs to the boundary of the mean value set, thenby Theorem 13.1 in [Ro] there is a non-zero vector v ∈ V such that π V ( ρ ) ∈ F ( M ( V ) , v ) . Then the state ρ lies in the exposed face F ( S ( A ) , v ) for the samevector v . (cid:3) The exponential family E discussed in this section is an example of a discontinu-ous maximum-entropy inference announced in the introduction. That exponen-tial family has the form of the Staﬀelberg table mountain, in the natural preserveof

Fränkische Schweiz—Veldensteiner Forst . Its mean value set appears at theangle (16) of ϕ = π in the metamorphosis of Figure 2. Smaller angles ϕ havenon-exposed faces, larger angles do not. We explain the geometrical componentsof the closures cl (+1) ( E ) = cl rI ( E ) (cid:40) E . Then we address continuity issues. Deﬁnition 17.

The

Staﬀelberg family , depicted in Figure 3, is the Gibbsianfamily E := exp (span R { σ ⊕ , σ ⊕ } ) in the *-subalgebra A ⊂

Mat(3 , C ) deﬁned in Example 3.The self-adjoint matrices in A are A sa = span R { ⊕ , σ ⊕ , σ ⊕ , ⊕ } ,the state space S ( A ) is a 3D cone. We use the notation B := { ρ ( α ) | α ∈ (0 , π ) } E sketched by (+1) -geodesics. Left: Thecone about E is the state space S ( A ) . The ellipse below is the boundary of themean value set M ( V ) . The generating line [ ρ (0) , ⊕ of the cone S ( A ) , withmidpoint c , is perpendicular to V . Right: E has equal (+1) - and rI-closures, theycover the punctured base circle of S ( A ) (large circle) with ρ (0) missing (smallcircle). These closures include c . The norm closure of E contains in addition theentire segment [ ρ (0) , c ] .for the punctured base circle of S ( A ) with ρ (0) = (1l + σ ) ⊕ missing. Thesymmetry axis l of S ( A ) goes through the tracial state and through the apex ⊕ , where it meets the generating lines of the cone S ( A ) under an angleof π . The generating line [ ρ (0) , ⊕ is perpendicular to V . We denote itsmidpoint by c := ( ρ (0) + 0 ⊕ . The canonical tangent space V = Θ of E is spanned by v := σ ⊕ and v := σ ⊕ − . The vector z = − ⊕ is perpendicular to v , so ϕ = ∠ (cid:4) ( V, z ) = ∠ (cid:4) ( v , z ) = arccos( ) = π as claimed. The basis vectors of V connect special points in S ( A ) , v = ρ ( π ) − ρ ( π ) and v = ( c − ρ ( π )) . The *-algebra generated by σ ⊕ is isomorphic to R and it has the segment [ ρ ( π ) , c ] as its state space. The (+1) -geodesic { exp ( λv ) | λ ∈ R } is includedin E and it covers the invertible states in [ ρ ( π ) , c ] . The *-algebra generatedby ρ (0) , ρ ( π ) and ⊕ is isomorphic to R , its state space is the equilateraltriangle spanned by these generators, see Figure 3, left.19or discussions of (+1) -geodesics in E we use a redundant parametrizationand deﬁne for real α, s, tE ( α, s, t ) := exp (cid:8) t [cos( α )( σ ⊕

1) + sin( α ) σ ⊕ (24) + s [ − sin( α )( σ ⊕

1) + cos( α ) σ ⊕ (cid:9) . Let x := s cos( α ) + t sin( α ) , y := − s sin( α ) + t cos( α ) , b := (cid:112) x + y = √ s + t and η := 2 cosh( b ) + e − s sin( α )+ t cos( α ) . Then E ( α, s, t ) = η (cid:8) [cosh( b )1l + sinh( b )( xσ + yσ ) /b ] ⊕ e − s sin( α )+ t cos( α ) (cid:9) . The vectors v and v are completed by v := 0 ⊕ − ρ (0) to an orthogonalbasis of the traceless matrices A = V + R z . We have (cid:104) E ( α, s, t ) , σ ⊕ (cid:105) = η (cid:2) b ) x/b (cid:3) (25) (cid:104) E ( α, s, t ) , σ ⊕ (cid:105) = η (cid:2) b ) y/b + e − s sin( α )+ t cos( α ) (cid:3) (cid:104) E ( α, s, t ) , ⊕ − ρ (0) (cid:105) = η (cid:2) − cosh( b ) − sinh( b ) y/b + e − s sin( α )+ t cos( α ) (cid:3) . We discuss closures of the Staﬀelberg family and its entropy distance.

Theorem 18.

The Staﬀelberg family E has (+1) -closure and rI-closure equal to cl (+1) ( E ) = cl rI ( E ) = E ∪ B ∪ { c } . The norm closure is E = cl rI ( E ) ∪ [ ρ (0) , c ] .The entropy distance of ρ ∈ [ ρ (0) , ⊕ from E is d E ( ρ ) = S ( ρ, c ) . Therestricted projection π V | cl rI ( E ) is a bijection onto the mean value set M ( V ) .Proof: By Proposition 10 the (+1) -closure of E is a union of exponentialfamilies E q = { exp q ( qθq ) | θ ∈ V } for maximal projections q . In place of themaximal projections of v (cid:54) = 0 in V we consider equivalently the maximal projec-tions of the vectors u ( α ) := sin( α ) σ ⊕ α )( σ ⊕ , α ∈ R . (26)There are two cases depending on the spectral projections in the orthogonal sum u ( α ) = ρ ( α ) − ρ ( α + π ) + 0 ⊕ cos( α ) . The maximal eigenvalue of u ( α ) is constant one. If α (cid:54) = 0 mod 2 π , then themaximal projection of u ( α ) is ρ ( α ) and has rank one. We get E ρ ( α ) = (cid:110) exp ρ ( α )1 ( ρ ( α ) θρ ( α )) | θ ∈ V (cid:111) = { ρ ( α ) } proving B ⊂ cl (+1) ( E ) . If α = 0 mod 2 π , then the maximal projection of u (0) is p := ρ (0) + 0 ⊕ c . Since p ( σ ⊕ p = 0 and p ( σ ⊕ p = p thecanonical parameter space of E p consists of multiples of the identity p in p A p , p Θ p = pV p = R p . So is E p = { c } , we conclude cl (+1) ( E ) = E ∪ B ∪ { c } .20emma 16 provides an upper bound on the norm closure E in terms of facesof S ( A ) exposed by vectors in V , and Lemma 12 describes these faces in termsof maximal projections F ( S ( A ) , u ( α )) = { ρ ∈ S ( A ) | s ( ρ ) (cid:22) p + ( u ( α )) } . For α (cid:54) = 0 mod 2 π the maximal projection ρ ( α ) of u ( α ) has rank one and theexposed face is F ( S ( A ) , u ( α )) = { ρ ( α ) } . The projection p + ( u (0)) = p abovegives the segment [ ρ (0) , ⊕

1] = F ( S ( A ) , u (0)) . We obtain E ⊂ E ∪ B ∪ [ ρ (0) , ⊕ . The inclusions cl (+1) ( E ) ⊂ E and B ⊂ cl (+1) ( E ) show B ⊂ E . We prove thatexactly the part [ ρ (0) , c ] of the segment [ ρ (0) , ⊕ belongs to E .We prove that at most the half segment [ ρ (0) , c ] belongs to E by showingthat E is included in the closed half space (cid:104) a, v (cid:105) ≤ . This is suﬃcient because v is parallel to [ ρ (0) , ⊕ and (cid:104) ρ (0) , v (cid:105) = − , (cid:104) c, v (cid:105) = 0 and (cid:104) ⊕ , v (cid:105) = 1 . We look at the polar parametrization of E , deﬁned with (24) as R × R +0 → E , ( α, t ) (cid:55)→ E ( α, , t ) . The normalization factor η is strictly positive, so (cid:104) E ( α, , t ) , v (cid:105) ≤ is by (25)equivalent to z ( α, t ) := η (cid:104) E ( α, , t ) , v (cid:105) = − cos( α ) sinh( t ) − cosh( t ) + e cos( α ) t ≤ . For t = 0 we have z ( α,

0) = 0 while for t ≥ and arbitrary α ∈ R we have ± z ( α, t ) + ∂∂t z ( α, t ) = (cos( α ) ± (cid:2) e cos( α ) t − e ± t (cid:3) ≤ . This implies ∂∂t z ( α, t ) ≤ and by integration z ( α, t ) ≤ .We show [ ρ (0) , c ] ⊂ E . The state ρ (0) lies in the closure of B so we stillhave to approximate for λ ∈ (0 , the state τ ( λ ) := (1 − λ ) ρ (0) ⊕ λ fromwithin E . For t > we choose α ( t ) := (cid:113) t ln( − λλ ) . Then lim t →∞ α ( t ) = 0 and lim t →∞ e (cos( α ( t )) − t = λ − λ hold. Expanding by e − t we have lim t →∞ E ( α ( t ) , , t ) = (1l + σ ) ⊕ λ − λ λ − λ = τ ( λ ) . We calculate the rI-closure. This is bounded by Corollary 15 between (+1) -and norm closures cl (+1) ( E ) = E ∪ B ∪ { c } ⊂ cl rI ( E ) ⊂ E ∪ B ∪ [ ρ (0) , c ] = E .
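The approximation of $\tau(\lambda)$ from within the Staffelberg family can be checked numerically. The sketch below is written under stated assumptions: it uses the closed form of $E(\alpha, s, t)$ displayed after (24), takes $\alpha(t) = \sqrt{\tfrac{2}{t} \ln\tfrac{1-\lambda}{\lambda}}$ for $\lambda \in (0, \tfrac12)$ (the choice for which $e^{(\cos\alpha(t) - 1)t}$ tends to $\tfrac{\lambda}{1-\lambda}$), and stores a state of $\mathcal{A}$ by four real coefficients. The function names and this coefficient representation are hypothetical helpers, not notation from the article. The last function evaluates $S(\tau(\lambda), c)$, which by the statement of Theorem 18 is the entropy distance of $\tau(\lambda)$ from the family; it uses that $\tau(\lambda)$ and $c$ commute.

```python
import math

def staffelberg_state(alpha, s, t):
    """Coefficients (w, cx, cy, q) of E(alpha, s, t), closed form after (24).

    The 2x2 block is w*1 + cx*A + cy*B, where B is the Pauli generator that
    is paired with the scalar summand in (24) and A is the other generator;
    q is the scalar summand. Numerator and eta are both multiplied by
    exp(-b) so that large t does not overflow.
    """
    x = s * math.cos(alpha) + t * math.sin(alpha)
    y = -s * math.sin(alpha) + t * math.cos(alpha)
    b = math.hypot(x, y)
    cosh_scaled = 0.5 * (1.0 + math.exp(-2.0 * b))  # cosh(b) * exp(-b)
    sinh_scaled = 0.5 * (1.0 - math.exp(-2.0 * b))  # sinh(b) * exp(-b)
    tail_scaled = math.exp(y - b)                   # exp(y) * exp(-b)
    eta_scaled = 2.0 * cosh_scaled + tail_scaled
    ux, uy = (x / b, y / b) if b > 0 else (0.0, 0.0)
    return (cosh_scaled / eta_scaled,
            sinh_scaled * ux / eta_scaled,
            sinh_scaled * uy / eta_scaled,
            tail_scaled / eta_scaled)

def tau(lam):
    """Coefficients of tau(lam) = (1 - lam) * rho(0) + lam * (0 (+) 1)."""
    return ((1.0 - lam) / 2.0, 0.0, (1.0 - lam) / 2.0, lam)

def alpha_of_t(lam, t):
    """Angle alpha(t) used in the approximation; needs 0 < lam < 1/2."""
    return math.sqrt(2.0 / t * math.log((1.0 - lam) / lam))

def max_deviation(lam, t):
    """Largest coefficient difference between E(alpha(t), 0, t) and tau(lam)."""
    approx = staffelberg_state(alpha_of_t(lam, t), 0.0, t)
    return max(abs(u - v) for u, v in zip(approx, tau(lam)))

def _plogp_over_q(p, q):
    """Term p * ln(p / q) with the convention 0 * ln(0) = 0."""
    return 0.0 if p == 0.0 else p * math.log(p / q)

def entropy_distance_on_segment(lam):
    """S(tau(lam), c) for commuting states with spectra (1-lam, lam), (1/2, 1/2)."""
    return _plogp_over_q(1.0 - lam, 0.5) + _plogp_over_q(lam, 0.5)

for t in (1e2, 1e4, 1e6):
    print(t, max_deviation(0.3, t))
print(entropy_distance_on_segment(0.0), entropy_distance_on_segment(0.5))
```

The deviation decays only like $t^{-1/2}$, because the coefficient along the first generator is proportional to $\sin\alpha(t)$. The relative entropy $S(\tau(\lambda), c)$ vanishes at $\lambda = \tfrac12$, that is at $\tau(\tfrac12) = c$, and equals $\ln 2$ at $\lambda = 0$, that is at $\rho(0)$.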

It remains to discuss states $\rho \in [\rho(0), 0 \oplus 1] = F(\mathcal{S}(\mathcal{A}), u(0))$. Proposition 14 and $\mathcal{E}^p = \{ c \}$ show $d_{\mathcal{E}}(\rho) = d_{\mathcal{E}^p}(\rho) = S(\rho, c)$. So $\rho \in \mathrm{cl}^{\mathrm{rI}}(\mathcal{E})$ holds for $\rho \in [\rho(0), 0 \oplus 1]$ if and only if $\rho = c$. This shows $\mathrm{cl}^{\mathrm{rI}}(\mathcal{E}) = \mathrm{cl}^{(+1)}(\mathcal{E})$.

We show that $\pi_V|_{\mathrm{cl}^{\mathrm{rI}}(\mathcal{E})}$ is a bijection onto $\mathbb{M}(V)$. The boundary of the mean value set $\mathbb{M}(V)$ is by (17) and by Lemma 8 equal to the ellipse
$$\partial \mathbb{M}(V) = \pi_V(B \cup \{ \rho(0) \})$$
so $\pi_V$ restricted to the circle $B \cup \{ \rho(0) \}$ is a bijection. Since $c$ lies on the segment $[\rho(0), 0 \oplus 1]$ which is perpendicular to $V$, it substitutes $\rho(0)$ in that bijection. Another bijection is the mean value chart $\pi_V|_{\mathcal{E}} : \mathcal{E} \to \mathrm{ri}(\mathbb{M}(V))$, see (11). The two latter bijections assembled prove the claim. □

Corollary 19. The entropy distance $d_{\mathcal{E}} : \mathcal{S}(\mathcal{A}) \to [0, \log(3)]$ from the Staffelberg family is discontinuous at $\rho(0)$.

Proof: By the previous theorem we have $d_{\mathcal{E}}(\rho(0)) = S(\rho(0), c) = \ln(2)$ while $d_{\mathcal{E}} \equiv 0$ on the punctured base circle $B$ of the cone $\mathcal{S}(\mathcal{A})$. But $\rho(0) \in \overline{B}$. □

Corollary 20.

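The mean value parametrization $\pi_{\mathcal{E}} : \mathrm{ri}(\mathbb{M}(V)) \to \mathcal{E}$ of the Staffelberg family has no continuous extension to the mean value set $\mathbb{M}(V)$; it has no continuous extension to $\pi_V(\rho(0))$.

Proof: By Theorem 18 the part $[\rho(0), c]$ of the segment $[\rho(0), 0 \oplus 1]$ belongs to the norm closure of $\mathcal{E}$. Since this segment is perpendicular to $V$, all of its points project to $\pi_V(\rho(0))$, so states of $\mathcal{E}$ with mean values arbitrarily close to $\pi_V(\rho(0))$ accumulate at more than one point. Hence the mean value parametrization $\pi_{\mathcal{E}} : \mathrm{ri}(\mathbb{M}(V)) \to \mathcal{E}$ does not extend continuously to $\pi_V(\rho(0))$. □

We address the maximum-entropy principle.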

Theorem 21. The rI-closure of the Staffelberg family is a set of maximum-entropy density matrices, $\mathrm{cl}^{\mathrm{rI}}(\mathcal{E}) = \{ \mathrm{argmax}_{\rho \in F(v)} S(\rho) \mid v \in \mathbb{M}(V) \}$. This holds for fibers $F(v) := (v + V^\perp) \cap \mathcal{S}(\mathcal{A})$ as well as for $F(v) := (v + V^\perp) \cap \mathcal{S}(\mathrm{Mat}(3, \mathbb{C}))$.

Proof: Since the Staffelberg family $\mathcal{E}$ is included in the state space $\mathcal{S}(\mathcal{A})$, the Pinsker-Csiszár inequality, recalled in Corollary 15, shows that $\mathcal{E}$ has the same rI-closure in both algebras $\mathcal{A}$ and $\mathrm{Mat}(3, \mathbb{C})$. The mean value chart (11) shows that the mean value set $\mathbb{M}(V)$ is the same for both algebras. So the bijection $\pi_V|_{\mathrm{cl}^{\mathrm{rI}}(\mathcal{E})}$ from the rI-closure onto the mean value set, proved in Theorem 18, also applies to both algebras.

We discuss the inverse $\mathbb{M}(V) \to \mathrm{cl}^{\mathrm{rI}}(\mathcal{E})$. Its restriction to the interior of the mean value set $\mathrm{ri}(\mathbb{M}(V)) \to \mathcal{E}$ is the mean value parametrization of $\mathcal{E}$ and this is known to have the maximum-entropy property (14).

Let us now consider the boundary of the mean value set $\mathbb{M}(V)$, which is by (17) and by Lemma 8 equal to the ellipse $\partial \mathbb{M}(V) = \pi_V(B \cup \{ \rho(0) \})$. The fibers $F(\tilde{v})$ for points $\tilde{v} \in \partial \mathbb{M}(V)$ are faces of the state space $\mathcal{S}(\mathcal{A})$, see Section 5 in [Ws1]. Indeed they are exactly the faces $F(\mathcal{S}(\mathcal{A}), v)$ of the state space which are exposed by a non-zero $v \in V$. Using Lemma 12 and consulting the list of maximal projections of vectors $v \in V$ in Theorem 18 these faces are the points on the punctured circle $B$ and the segment $[\rho(0), 0 \oplus 1]$. Maximizers of the von Neumann entropy on these fibers are the points on $B$ and the centroid $c$ in the segment. This set completes $\mathcal{E}$ to its rI-closure by Theorem 18.

In the larger C*-algebra $\mathrm{Mat}(3, \mathbb{C})$ the projection $\rho(0) + 0 \oplus 1$ corresponds to the face $\{ \rho \in \mathcal{S}(\mathrm{Mat}(3, \mathbb{C})) \mid s(\rho) \preceq \rho(0) + 0 \oplus 1 \}$ which is isomorphic to the Bloch ball. So the maximizer of the von Neumann entropy in the fiber $(v + V^\perp) \cap \mathcal{S}(\mathrm{Mat}(3, \mathbb{C}))$ is $c$ as before. □

We finish with two short conclusions about a discontinuous inference.

Remark 22.

If a maximum-entropy inference (3) is carried out by observablesspanning the canonical tangent space V of the Staﬀelberg family, then the vari-ance of the inferred state (cid:98) ρ ( n ) may be large: Assuming that the quantum systemis given by an invertible density matrix ρ , measured values (cid:0) m , . . . , m k (cid:1) aremapped to the inferred state (cid:98) ρ ( n ) by the mean value parametrization π E deﬁnedin (14). The mean value parametrization π E does not extend continuously to π V ( ρ (0)) by Corollary 20 so the mean value theorem shows that π E has arbi-trary large partial derivatives near π V ( ρ (0)) . It follows that the constant in thevariance estimate O (1 /n ) of (cid:98) ρ ( n ) can be arbitrarily large.Second, the non-generic choice of ρ such that π V ( ρ ) = π V ( ρ (0)) makesit likely that the inferred states (cid:98) ρ ( n ) diverge or converge to a state which isnot a maximum-entropy state. This follows from Theorem 18 and Theorem 21because the whole segment [ ρ (0) , c ] belongs to the closure of E while only c isa state of maximum entropy under the given constraints. (+1) -asymptotics and ( − -closure of the Staﬀelberg family We show that the ( − -closure of the Staﬀelberg family E equals its rI-closure.This follows from an asymptotic analysis of its (+1) -geodesics. See (7) and (8)for deﬁnitions of these closures.We use the parametrization E ( α, s, t ) of E deﬁned in (24) and a coordinatesystem spanned by ( σ ⊕ and ( σ ⊕ . Coeﬃcients of points on E are the23rst two numbers in (25), they describe projection onto V : g := (cid:104) E ( α, s, t ) , σ ⊕ (cid:105) = η (cid:2) ( e b − e − b ) y/b + e − s sin( α )+ t cos( α ) (cid:3) h := (cid:104) E ( α, s, t ) , σ ⊕ (cid:105) = η (cid:2) ( e b − e − b ) x/b (cid:3) . We consider the asymptotic slope in the ( σ ⊕ - ( σ ⊕ -coordinate system κ ( α, s ) := lim t →∞ d h d g = lim t →∞ d h d t d g d t = lim t →∞ η d( hη )d t − ( hη ) d η d t η d( gη )d t − ( gη ) d η d t . 
(27)The coordinates { ( (cid:104) ρ, σ ⊕ (cid:105) , (cid:104) ρ, σ ⊕ (cid:105) ) | ρ ∈ S ( A ) } of the mean value set ﬁllthe unit disk. Projections of (+1) -geodesics hit the unit circle for s = 0 , theyare tangential to the unit circle for every s (cid:54) = 0 : Lemma 23.

For all α ∈ R and all s ∈ R we have ( g, h ) t →∞ −→ (cos( α ) , sin( α )) .The asymptotic slope of (+1) -geodesics through the tracial state ( s = 0 ) is κ ( α,

0) = (cid:26) if α = 0 , − cot( α ) if α (cid:54) = 0 . The asymptotic slope of (+1) -geodesics missing the tracial state ( s (cid:54) = 0 ) is κ ( α, s ) = − cot( α ) . Proof:

The (+1) -geodesic limit t → ∞ follows from (20) and from thediscussion of maximal projections in Theorem 18. Then lim t →∞ ( g, h ) follows.We ﬁrst compute the asymptotical slope for (+1) -geodesics through thetracial state s = 0 . We have ( η d( hη )d t − ( hη ) d η d t ) e − t (1+cos( α )) = sin( α )(1 + e − t + 4 e − t − t cos( α ) − cos( α ) + e − t cos( α )) and ( η d( gη )d t − ( gη ) d η d t ) e − t (1+cos( α )) = − (1 − cos( α )) + e − t + cos( α )(2 e − t + 4 e − t − t cos( α ) + e − t cos( α )) . From this and (27) we get the desired result, studying α = 0 and α = π apart.The asymptotical slope for (+1) -geodesics missing the tracial state ( s (cid:54) = 0) follows from a third order Taylor expansion at t = ∞ . If α (cid:54) = 0 modulo π then ( η d( hη )d t − ( hη ) d η d t ) = − st cos( α ) + O ( t )( η d( gη )d t − ( gη ) d η d t ) = st sin( α ) + O ( t ) . (+1) -geodesics in the Staﬀelberg family. Left: geodesicsthrough the tracial state; right: two families of parallel geodesics, those throughthe tracial state are dashed.For α = 0 we have ( η d( hη )d t − ( hη ) d η d t ) = − st + O ( t )( η d( gη )d t − ( gη ) d η d t ) = O ( t ) completing the claim. (cid:3) Some projected (+1) -geodesics of the Staﬀelberg family are drawn in Fig-ure 4. As a fact not used in the sequel, Lemma 23 shows that the two asymptotictangents t → ±∞ of a projected (+1) -geodesic through the tracial state ( s = 0 )intersect orthogonally at (1 , for α (cid:54) = 0 , π . While the right angle is not invariantunder aﬃne reparametrizations, these tangents intersect in V at the projectionof the cliﬀ c = ( ρ (0) + 0 ⊕ of the Staﬀelberg family. Lemma 24.

For all s ∈ [ − , and all t ≥ we have uniformly in s (cid:107) E (0 , s, t ) − c (cid:107) = O ( t − ) . Proof:

By Taylor expansion b = t + s t + O ( t − ) , we have uniformly for s ∈ [ − , E (0 , s, t ) = (cid:32) cosh( b ) ( s − it ) sinh( b ) b s + it ) sinh( b ) b cosh( b ) 00 0 e t (cid:33)(cid:46)(cid:0) b ) + e t (cid:1) = c + O ( t − ) . This proves the statement, since (cid:107) a (cid:107) = (cid:113)(cid:80) k,(cid:96) | a k,(cid:96) | . (cid:3) Theorem 25.

For the Staﬀelberg family E the ( − -closure equals the (+1) -and the rI-closure, cl ( − ( E ) = cl (+1) ( E ) = cl rI ( E ) . roof: The equality cl (+1) ( E ) = cl rI ( E ) was shown in Theorem 18. Since ( − -geodesics are included in E we clearly have cl ( − ( E ) ⊂ E . On the otherhand, in every ﬁber ( v + V ⊥ ) ∩ S ( A ) with v ∈ M ( V ) there is at least one pointof the ( − -closure (choose a segment ] u, v [ ⊂ ri( M ( V )) and lift it to E throughthe mean value parametrization). By Theorem 18 there is a bijection π V | E\ S : E \ S → M ( V ) \ { m } for the segment S := [ ρ (0) , ⊕ and its projection m := π V ( c ) . The threearguments combined show E \ S = cl ( − ( E ) \ S .It remains to discuss states ρ ∈ S , whether they belong to cl ( − ( E ) . Thepoint c clearly does since the unparametrized ( − -geodesic ] ρ ( π ) , c [ belongs to E . We ﬁnish by showing { c } = S ∩ cl ( − ( E ) .The ( − -geodesic from ρ ( π ) to c is also a (+1) -geodesic, parametrized for s = 0 by g s ( t ) := E (0 , s, t ) . Using (20) we see that for all real s the geodesic g s has the limit c when t → + ∞ ,its projection π V ( g s ) has the limit m = π V ( c ) . For s (cid:54) = 0 the asymptotic tangentof π V ( g s ) is tangential to the elliptical boundary ∂ M ( V ) of the mean value set byLemma 23. This implies that the projections π V ( g − ) and π V ( g +1 ) concatenateto a closed smooth curve in M ( V ) which is tangential to ∂ M ( V ) at m . Usingthe mean value chart (11) of E , it is clear that this curve bounds the set U := { π V ( g s ( t )) | − ≤ s ≤ , t ∈ R } ⊂ M ( V ) . Let h be any ( − -geodesic in E with limit ρ in the segment S . If we chooseany sequence ρ n ⊂ h such that ρ = lim n →∞ ρ n , then θ n := log ( ρ n ) divergesin the norm (otherwise the contradiction ρ ∈ E follows). As the boundary of U is tangential to the ellipse ∂ M ( V ) at m , there is (cid:15) > such that π V ( h ) ∩ { v ∈ V | (cid:107) v − m (cid:107) < (cid:15) } ⊂ U .

So the points π V ( ρ n ) lie in U for large n . Since the convergence of the (+1) -geodesics g s to c is uniform (for − ≤ s ≤ ) by Lemma 24, the states ρ n converge to c . (cid:3) We now consider 2D families E = exp ( V ) in the metamorphosis of Figure 2that have non-exposed faces in the mean value set M ( V ) . By Lemma 8 thishappens for angles ϕ ( V ) ∈ (0 , π/ . We prove that the (+1) -closure cl (+1) ( E ) is too small to serve as a set of entropy maximizers under linear constraints.The problem is that the two non-exposed points of the mean value set are notcovered by cl (+1) ( E ) in the projection onto V . Calculations become easy for ϕ = arccos( (cid:112) / ≈ . π and we then call E the Swallow family because itlooks like the beak of a bird: 26igure 5: The Swallow family E sketched by (+1) -geodesics. The cone about E is the state space S ( A ) . Its generating lines [ ρ (0) , ⊕ and [ ρ ( π ) , ⊕ belong to the rI-closure of E but the pure states ρ (0) and ρ ( π ) do not belongto the (+1) -closure of E . They project to the non-exposed points of the meanvalue set M ( V ) whose boundary is drawn below. Deﬁnition 26.

The

Swallow family , depicted in Figure 5, is the Gibbsian family E := exp (span R { σ ⊕ , σ ⊕ } ) in the *-subalgebra A ⊂

Mat(3 , C ) deﬁned in Example 3.The canonical tangent space V = Θ of E is spanned by the vectors of equallength v := σ ⊕ − and v := σ ⊕ − . The vector z = − ⊕ isperpendicular to v − v , so indeed ϕ = ∠ (cid:4) ( V, z ) = ∠ (cid:4) ( v + v , z ) = arccos( (cid:112) / . The pure states ρ (0) = (1l + σ ) ⊕ and ρ ( π ) = (1l + σ ) ⊕ on the basecircle of the conic state space S ( A ) are crucial for the Swallow family. Theorem 27.

The (+1) -closure of the Swallow family E is the union of E , ofthe segments ] ρ (0) , ⊕ and ] ρ ( π ) , ⊕ (rank-two states) and of the purestates ⊕ and { ρ ( α ) | π < α < π } . The ( − - rI- and norm closures are cl ( − ( E ) = cl rI ( E ) = E = cl (+1) ( E ) ∪ { ρ (0) , ρ ( π ) } . Proof:

First we calculate the (+1) -closure cl (+1) ( E ) using Proposition 10.For α ∈ R we have the orthogonal sum u ( α ) := sin( α )( σ ⊕ α )( σ ⊕

1) = ρ ( α ) − ρ ( α + π )+0 ⊕√ α − π ) . α = 0 and π are p := p + ( u (0)) = ρ (0) + 0 ⊕ and q := p + ( u ( π )) = ρ ( π ) + 0 ⊕ . For < α < π we have p + ( u ( α )) = 0 ⊕ and for π < α < π we have p + ( u ( α )) = ρ ( α ) .Calculating the corresponding exponential families we observe p A p ∼ = C and since p ( σ ⊕ p = 0 ⊕ , the exponential family E p = exp p ( p Θ p ) has thecanonical parameter space R (0 ⊕ − ρ (0)) ∼ = R (1 , − ⊂ C . The analogue arguments apply to q , so the exponential family E p = ] ρ (0) , ⊕ resp. E q = ] ρ ( π ) , ⊕ consists of the invertible states in the compressed algebra p A p resp. q A q . Allother maximal projections r of elements of v (cid:54) = 0 of V have rank one and producethe exponential family E r = { exp r ( rθr ) | θ ∈ V } = { r } . This completes thecalculation of the (+1) -closure of E .In the second step we prove that the points ρ (0) and ρ ( π ) missing in the (+1) -closure belong to the rI-closure of E . Lemma 12 describes the exposedface F ( S ( A ) , u (0)) = [ ρ (0) , ⊕

1] = S ( p A p ) , containing the pure state ρ (0) .Then Proposition 14 shows d E ( ρ (0)) = d E p ( ρ (0)) = d cl (+1) ( E p ) ( ρ (0)) . Since cl (+1) ( E p ) = cl (+1) ( ] ρ (0) , ⊕

1[ ) = [ ρ (0) , ⊕ we get d E ( ρ (0)) = d [ ρ (0) , ⊕ ( ρ (0)) = 0 and this implies ρ (0) ∈ cl rI ( E ) . Theanalogue arguments show ρ ( π ) ∈ cl rI ( E ) .By the same method as in Theorem 18 an upper bound on the norm clo-sure E can be stated in terms of maximal projections in V . These projectionsare listed above, the corresponding faces are the pure state ⊕ , the arc ofpure states ρ ( α ) for π < α < π and the two segments [ ρ (0) , ⊕ and [ ρ ( π ) , ⊕ (the state spaces of the algebras p A p ∼ = q A q ∼ = C ). Thus E ⊂ cl (+1) ( E ) ∪ { ρ (0) , ρ ( π ) } follows from the above description of the (+1) -closure.Since ρ (0) and ρ ( π ) belong to the rI-closure and since cl (+1) ( E ) ⊂ cl rI ( E ) ⊂ E holds by Corollary 15 we have shown cl rI ( E ) = E = cl (+1) ( E ) ∪ { ρ (0) , ρ ( π ) } . (cid:3) Theorem 28.

The projection π V | cl rI ( E ) is a bijection onto the mean value set M ( V ) , the non-exposed points of M ( V ) are π V ( ρ (0)) and π V ( ρ ( π )) . The rI-closure of the Swallow family is a set of maximum-entropy density matrices, cl rI ( E ) = { argmax ρ ∈ F ( v ) S ( ρ ) | v ∈ M ( V ) } for ﬁbers F ( v ) := ( v + V ⊥ ) ∩ S ( A ) . roof: The relative interiors of faces of the mean value set M ( V ) are apartition of M ( V ) [Ro]. Each face F of M ( V ) is the projection to V of theinverse projection ( F + V ⊥ ) ∩ S ( A ) , which is a face of S ( A ) . The relativeinterior of the inverse projection of F projects onto the relative interior of F ;we show that these projections are bijections for the algebra A , for the Swallowfamily E and for all faces F in the boundary of the mean value set M ( V ) .The two non-exposed points π V ( ρ (0)) and π V ( ρ ( π )) at the ellipse withcorner M ( V ) are computed in case 3 of Example 1.2 in [Ws2] studying tan-gents. The present setting ﬁts into Example 1.2 in [Ws2] by choosing there g := √ (1 , − , and h := √ (1 , , . The inverse projections ( ρ (0) + V ⊥ ) ∩ S ( A ) and ( ρ ( π ) + V ⊥ ) ∩ S ( A ) are faces of the state space S ( A ) and it is proved incase 3 of Section 3.3 in [Ws2] that these faces are the extremal points ρ (0) and ρ ( π ) and that they are not larger.Every exposed face F = F ( M ( V ) , v ) for non-zero v ∈ V is actually theprojection of the exposed face F ( S ( A ) , v ) , see Section 3.1 in [Ws2]. Thesefaces are computed in the last paragraph of Theorem 27. A missing bijectivityof their projections onto V is only possible for the two segments, but it does notoccur because the two segments cover the two boundary segments of M ( V ) .The maximum-entropy problem is solved for points in ri( M ( V )) in (14).Since the projection of ( ∂ M ( V )+ V ⊥ ) ∩S ( A ) onto V is a bijection onto ∂ M ( V ) ,the maximum-entropy problem is trivial for boundary points of M ( V ) . (cid:3) Remark 29. 
a) The Swallow family is suitable to demonstrate that the extreme points of a mean value set $M(V)$ are in general not covered by the projections $\pi_V\big(\tfrac{p}{\operatorname{tr}(p)}\big)$ for maximal projections $p = p_+(v)$, $v \in V$, as is claimed in Theorem 1 (e) in [Wi].

Let $\mathcal{B}$ denote one of the algebras $\mathcal{A}$ or $\mathrm{Mat}(3, \mathbb{C})$, where $\mathcal{A} \subset \mathrm{Mat}(3, \mathbb{C})$ is the *-subalgebra defined in Example 3. Since $\mathcal{A}$ and $\mathrm{Mat}(3, \mathbb{C})$ have the same identity, we can argue with eigenvalues to calculate the maximal projections of vectors in $V$. Moreover, the mean value set $M_{\mathcal{B}}(V)$ is well-defined, see Section 3.4 in [Ws2]. For faces $F$ of the mean value set the lifted faces $(F + V^\perp) \cap \mathcal{S}(\mathcal{B})$ are of the form $\{\rho \in \mathcal{S}(\mathcal{B}) \mid s(\rho) \preceq p\}$ for projections $p \in \mathcal{B}$, see Section 2.3 in [Ws2]. The necessary projections $p$ are computed recursively from $V$, see Theorem 3.7 or Remark 3.10 in [Ws2]. This gives the same set of projections for both algebras $\mathcal{A}$ and $\mathrm{Mat}(3, \mathbb{C})$.

Now, the pure state $\rho(0)$ (and $\rho(\pi)$) is not on the list of maximal projections of vectors in $V$ provided in the first paragraph of Theorem 27. On the other hand, as discussed in the second paragraph of Theorem 28, the state $\rho(0)$ is the unique state in $\mathcal{S}(\mathcal{A})$ that projects to the non-exposed point $\pi_V(\rho(0))$ of the mean value set.

b) There is no $(+1)$-geodesic in the Swallow family $\mathcal{E}$ that meets $\rho(0)$ asymptotically. The calculation of $\mathrm{cl}^{\mathrm{rI}}(\mathcal{E})$ in Theorem 27 is done by two limits of $(+1)$-geodesics. One of the limits is implicit in the equation $d_{\mathcal{E}}(\rho(0)) = d_{\mathcal{E}_p}(\rho(0))$. Only a second $(+1)$-geodesic in $\mathcal{E}_p$ meets $\rho(0)$ asymptotically.

We now study local maximizers of the entropy distance $d_{\mathcal{E}}$ from an exponential family $\mathcal{E}$, a question which was motivated in Section 1.1 in the context of infomax principles. We have to restrict to Gibbsian families since the mean value chart (11) is only available for these exponential families in the present article.

We show that a local maximizer $\rho$ of $d_{\mathcal{E}}$ carries a clear imprint from its projection $\pi_{\mathcal{E}}(\rho)$ to $\mathcal{E}$. This generalizes the commutative case, where $\rho$ is the conditional probability distribution of $\pi_{\mathcal{E}}(\rho)$ conditioned on its own support $\mathrm{supp}(\rho)$:

$$\rho = \pi_{\mathcal{E}}(\rho)(\,\cdot \mid \mathrm{supp}(\rho)). \qquad (28)$$

Remark 30.

In the commutative case the assertion (28) was proved for a local maximizer $\rho \in \operatorname{dom}\mathcal{E} = \mathcal{S}(\mathcal{A}) \cap (\mathcal{E} + V^\perp)$ in [Ay]. The articles [AK, Ma, Rh, MR] contain further characterizations of local and global maximizers that can be interesting also in the non-commutative case.

The derivative of the logarithm is derived for $\mathcal{A} = \mathrm{Mat}(N, \mathbb{C})$ in [Li]. It may be generalized to any *-subalgebra $\mathcal{A}$ of $\mathrm{Mat}(N, \mathbb{C})$ using an algebra embedding $\varphi : \mathcal{A} \to \mathrm{Mat}(n, \mathbb{C})$ such that $\varphi(\mathbb{1})$ is invertible. If $p \in \mathcal{A}$ is a projection, then for invertible $\rho \in \mathcal{S}(p\mathcal{A}p)$ and self-adjoint $u \in p\mathcal{A}p$ we have

$$D|_\rho \ln^p(u) = \int_0^\infty (\rho + sp)^{-1}\, u\, (\rho + sp)^{-1}\, \mathrm{d}s. \qquad (29)$$

Here we denote functions in $p\mathcal{A}p$ by a superscript like in the paragraph before Lemma 9.

Theorem 31.

Suppose $\mathcal{A}$ is a *-subalgebra of $\mathrm{Mat}(N, \mathbb{C})$ and $\mathcal{E}$ a Gibbsian family in $\mathcal{A}$ with canonical tangent space $V$. Let $\rho \in \operatorname{dom}\mathcal{E}$, let $p$ denote the support projection of $\rho$ and put $\theta := \ln_0 \circ\, \pi_{\mathcal{E}}(\rho) \in V$. If $u$ is a traceless self-adjoint matrix in $p\mathcal{A}p$, then $D|_\rho d_{\mathcal{E}}(u) = \langle u, \ln^p(\rho) - \theta \rangle$. If $\rho$ is a local maximizer of $d_{\mathcal{E}}$, then $\rho = \exp_1^p(p\theta p)$ and $d_{\mathcal{E}}(\rho) = F(\theta) - F^p(p\theta p)$.

Proof: As discussed in the paragraph following (11), the mean value parametrization $\pi_{\mathcal{E}}$, defined for $a \in \mathcal{E} + V^\perp$ by the intersection $a \mapsto (a + V^\perp) \cap \mathcal{E}$, is real analytic. This gives a real analytic mapping

$$L : \mathcal{E} + V^\perp \longrightarrow V, \qquad a \longmapsto \ln_0 \circ\, \pi_{\mathcal{E}}(a).$$

We can use $\pi_{\mathcal{E}}(a) = \exp_1 \circ L(a)$ and rewrite the entropy distance (13) of a state $\rho \in \mathcal{E} + V^\perp$ from $\mathcal{E}$ in the form

$$d_{\mathcal{E}}(\rho) = S(\rho, \pi_{\mathcal{E}}(\rho)) = S(\rho, \exp_1 \circ L(\rho)) \qquad (30)$$
$$= -S(\rho) - \operatorname{tr}(\rho\, \ln \circ \exp_1 \circ L(\rho)) = -S(\rho) - \operatorname{tr}(\rho L(\rho)) + F \circ L(\rho)$$

with the free energy $F$ and the von Neumann entropy $S$. As $\rho$ is invertible in the algebra $p\mathcal{A}p$, we can differentiate the logarithm $\ln^p$ at $\rho$ in the direction of any self-adjoint matrix $u \in p\mathcal{A}p$. By (29) and cyclic reordering under the trace we get $D|_\rho S(u) = -\langle u, \ln^p(\rho) \rangle - \operatorname{tr}(u)$. Using the derivative of the free energy (23), which is for $a, b \in \mathcal{A}$ given by $D|_a F(b) = \langle b, \exp_1(a) \rangle$, the chain rule leads to

$$D|_\rho (F \circ L)(u) = D|_{L(\rho)} F \circ D|_\rho L(u) = \langle D|_\rho L(u), \exp_1 \circ L(\rho) \rangle = \langle D|_\rho L(u), \pi_{\mathcal{E}}(\rho) \rangle.$$

Since the image of $L$ is $V$ we have $D|_\rho L(u) \in V$, and thus by definition of the projection $\pi_{\mathcal{E}}$ it follows that $\langle D|_\rho L(u), \pi_{\mathcal{E}}(\rho) - \rho \rangle = 0$. Differentiation of (30) in the direction of a traceless self-adjoint matrix $u \in p\mathcal{A}p$ gives

$$D|_\rho d_{\mathcal{E}}(u) = \langle u, \ln^p(\rho) \rangle + \operatorname{tr}(u) - \langle u, L(\rho) \rangle - \langle \rho, D|_\rho L(u) \rangle + \langle D|_\rho L(u), \pi_{\mathcal{E}}(\rho) \rangle = \langle u, \ln^p(\rho) - L(\rho) \rangle.$$
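The two analytic ingredients of this step, the integral formula (29) and the entropy derivative $D|_\rho S(u) = -\langle u, \ln^p(\rho) \rangle - \operatorname{tr}(u)$, can be checked numerically. The following is a minimal sketch rather than part of the argument, assuming NumPy and the special case $p = \mathbb{1}$ of an invertible state in $\mathrm{Mat}(3, \mathbb{C})$; the helper names `random_state`, `dlog_integral` and `check` are hypothetical. It evaluates the integral in (29) in closed form in the eigenbasis of $\rho$ and compares both formulas with central finite differences.

```python
import numpy as np

def random_selfadjoint(n, rng):
    """Random self-adjoint n x n matrix."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (g + g.conj().T) / 2

def random_state(n, rng):
    """Random invertible density matrix (illustrative choice, kept away from the boundary)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = g @ g.conj().T + n * np.eye(n)
    return rho / np.trace(rho).real

def log_h(a):
    """Logarithm of a positive definite Hermitian matrix via its spectral decomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def entropy(a):
    """S(a) = -tr(a ln a), evaluated on the eigenvalues of a."""
    w = np.linalg.eigvalsh(a)
    return float(-np.sum(w * np.log(w)))

def dlog_integral(rho, u):
    """Integral (29) for p = identity, in closed form in the eigenbasis of rho.

    For eigenvalues a, b > 0:
    int_0^inf ds / ((a + s)(b + s)) = (ln a - ln b) / (a - b), and 1/a if a = b.
    """
    w, v = np.linalg.eigh(rho)
    a, b = w[:, None], w[None, :]
    same = np.isclose(a, b)
    kernel = np.where(same, 1.0 / a,
                      (np.log(a) - np.log(b)) / np.where(same, 1.0, a - b))
    return v @ (kernel * (v.conj().T @ u @ v)) @ v.conj().T

def check(n=3, seed=0, eps=1e-6):
    """Return the discrepancies between the formulas and central finite differences."""
    rng = np.random.default_rng(seed)
    rho = random_state(n, rng)
    u = random_selfadjoint(n, rng)
    u /= np.linalg.norm(u)  # unit Hilbert-Schmidt norm keeps the step size meaningful
    dlog_fd = (log_h(rho + eps * u) - log_h(rho - eps * u)) / (2 * eps)
    ds_fd = (entropy(rho + eps * u) - entropy(rho - eps * u)) / (2 * eps)
    # <u, ln rho> = tr(u ln rho) for self-adjoint u
    ds_formula = -np.trace(u @ log_h(rho)).real - np.trace(u).real
    err_log = float(np.linalg.norm(dlog_fd - dlog_integral(rho, u)))
    err_s = abs(ds_fd - ds_formula)
    return err_log, err_s
```

Calling `check()` returns the two discrepancies. Small values are consistent with (29) and with the entropy derivative, but they only probe the invertible case and are no substitute for the proof.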
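The formula for the directional derivative asserted in the theorem is thereby proved.

If $\rho$ is a local maximizer of $d_{\mathcal{E}}$, then $\ln^p(\rho) = pL(\rho)p + \lambda p$ for some real $\lambda$, because $p$ spans the orthogonal complement of the space of traceless self-adjoint matrices in $p\mathcal{A}p$. It follows that $\rho$ must be proportional to $p \exp(pL(\rho)p)$ as claimed. If we write $\theta := L(\rho) = \ln_0 \circ\, \pi_{\mathcal{E}}(\rho)$, then we have $\rho = \exp_1^p(p\theta p)$ and $\pi_{\mathcal{E}}(\rho) = \exp_1(\theta)$. We get

$$d_{\mathcal{E}}(\rho) = S(\rho, \pi_{\mathcal{E}}(\rho)) = \operatorname{tr}[\rho\,(\ln^p(\rho) - \ln \circ\, \pi_{\mathcal{E}}(\rho))]$$
$$= \operatorname{tr}[\rho\,(p\theta p - p\, \ln \circ \operatorname{tr} \circ \exp^p(p\theta p) - \theta + \mathbb{1}\, \ln \circ \operatorname{tr} \circ \exp(\theta))]$$
$$= \ln(\operatorname{tr}(e^\theta)) - \ln(\operatorname{tr}(p\, e^{p\theta p})). \qquad \Box$$

Acknowledgment: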

SW thanks the organizers of the DFG research group “Geometry and Complexity in Information Theory” (2004–2008) for the scholarship and the great workshops. We thank Nihat Ay for discussions about information measures and the referee for several helpful comments.

References

[AS] Alfsen, E. M. and Shultz, F. W.: State Spaces of Operator Algebras. Birkhäuser, Boston (2001)

[Am] Amari, S.: Information geometry on hierarchy of probability distributions. IEEE Trans. Inf. Theory

Methods of Information Geometry. Translations of Mathematical Monographs, AMS, Providence (2000)

[AV] Audenaert, K. M. R., Nussbaum, M., Szkoła, A. and Verstraete, F.: Asymptotic Error Rates in Quantum Hypothesis Testing. Comm. Math. Phys.

An information-geometric approach to a theory of pragmatic structuring. Ann. Probab.

Maximizing multi-information. Kybernetika

A geometric approach to complexity. Chaos

Information and Exponential Families in Statistical Theory. John Wiley & Sons, New York (1978)

[CM] Csiszár, I. and Matúš, F.: Information projections revisited. IEEE Trans. Inf. Theory

C*-algebras by example. Providence, AMS (1996)

[El] Ellis, R.: Entropy, Large Deviations, and Statistical Mechanics. Classics in Mathematics, Springer (2006)

[EA] Erb, I. and Ay, N.: Multi-information in the thermodynamic limit. J. Stat. Phys.

On the Uniqueness of the Chentsov Metric in Quantum Information Geometry. Infinite Dim. Anal. Quantum Info. and Related Topics

Convex Polytopes. Springer-Verlag, New York, 2nd ed. (2003)

[IO] Ingarden, R. S., Kossakowski, A. and Ohya, M.: Information dynamics and open systems. Kluwer Academic Publishers Group, Dordrecht (1997)

[Ja] Jaynes, E. T.: Information Theory and Statistical Mechanics I/II. Phys. Rev.

Linear Algebra for Semidefinite Programming. Sūrikaisekikenkyūsho Kōkyūroku

The capacity of hybrid quantum memory. Information Theory, IEEE Transactions 49, 1465–1473 (2003)

[Li] Lieb, E. H.: Convex trace functions and the Wigner-Yanase-Dyson conjecture. Adv. in Math.

Mutual information of Ising systems. Int. J. Theor. Phys.

Optimality conditions for maximizers of the information divergence from an exponential family. Kybernetika

Maximization of the information divergence from an exponential family and criticality. IEEE ISIT Proceedings (2011)

[Ne] Netzer, T.: Spectrahedra and Their Shadows. Habilitationsschrift, Universität Leipzig (2011)

[NS] Nussbaum, M. and Szkoła, A.: An asymptotic error bound for testing multiple quantum hypothesis. (to appear in Ann. Statist.)

[Pe1] Petz, D.: Geometry of canonical correlation on the state space of a quantum system. J. Math. Phys.

Monotone Metrics on Matrix Spaces. Lin. Alg. Appl.

Quantum Information Theory and Quantum Statistics. Theoretical and Mathematical Physics, Springer-Verlag, Berlin (2008)

[PR] Petz, D. and Ruppert, L.: Efficient quantum tomography needs complementary and symmetric measurements. (to be published)

[Ra] Rau, J.: Inferring the Gibbs state of a small quantum system. Physical Review A

Finding the Maximizers of the Information Divergence from an Exponential Family. IEEE Trans. Inf. Theory

Convex Analysis. Princeton University Press, Princeton (1970)

[Ru] Ruskai, M. B.: Extremal Properties of Relative Entropy in Quantum Statistical Mechanics. Rep. Math. Phys.

Quantifying Entanglement. Phys. Rev. Lett.

General properties of entropy. Reviews of Modern Physics

A Note on Touching Cones and Faces. J. Convex Analysis (2012)

[Ws2] Weis, S.: Quantum Convex Support. Lin. Alg. Appl.

Duality of non-exposed faces. J. Convex Analysis (2012)

[Wi] Wichmann, E. H.: Density matrices arising from incomplete measurements. J. Math. Phys.

Optimal State-Discrimination by Mutually Unbiased Measurements. Ann. Phys. 191