A new quantum version of f-divergence

Keiji Matsumoto
Quantum Computation Group, National Institute of Informatics,
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430
e-mail: [email protected]

February 7, 2018
Abstract
This paper proposes and studies a new quantum version of f-divergences, a class of convex functionals of a pair of probability distributions including Kullback-Leibler divergence, Renyi-type relative entropy, and so on. There are several quantum versions so far, including the one by Petz [12]. We introduce another quantum version (D^max_f, below), defined as the solution to an optimization problem, or the minimum classical f-divergence necessary to generate a given pair of quantum states. It turns out to be the largest quantum f-divergence. The closed formula of D^max_f is given either if f is operator convex, or if one of the states is a pure state. Also, a concise representation of D^max_f as a pointwise supremum of linear functionals is given and used for the clarification of various properties of the quantity.

Using the closed formula of D^max_f, we show: suppose f is operator convex. Then the maximum f-divergence of the probability distributions of a measurement under the states ρ and σ is strictly less than D^max_f(ρ‖σ). This statement may seem intuitively trivial, but when f is not operator convex, it is not always true. A counterexample is f(r) = |1 − r|, which corresponds to the total variation distance.

We mostly work on a finite dimensional Hilbert space, but some results are extended to the infinite dimensional case.

1 Introduction

This paper proposes and studies a new quantum version of the f-divergence

D_f(p‖q) := Σ_x q(x) f( p(x)/q(x) ),

where p and q are probability distributions. Several important quantities in information theory and statistics are in this class. For example, D_{r ln r} and D_{r^α} correspond to Kullback-Leibler divergence and Renyi-type relative entropies, respectively, and f-divergences other than these have at least one operational meaning.
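As a quick numerical illustration of the definition (the function names below are ours, not the paper's): with f(r) = r ln r, D_f(p‖q) reduces to the Kullback-Leibler divergence. The sketch assumes q(x) > 0 everywhere; the boundary cases are handled by the closure g_f introduced in Section 2.

```python
import math

def f_kl(r):
    # f(r) = r ln r, extended by its limit value f(0) = 0
    return r * math.log(r) if r > 0 else 0.0

def f_divergence(p, q, f):
    # D_f(p||q) = sum_x q(x) f(p(x)/q(x)); assumes q(x) > 0 for all x
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

p = [0.5, 0.5]
q = [0.9, 0.1]
d = f_divergence(p, q, f_kl)

# for f(r) = r ln r the sum telescopes to the Kullback-Leibler divergence
kl = sum(px * math.log(px / qx) for px, qx in zip(p, q))
```

Indeed Σ_x q(x) (p(x)/q(x)) ln(p(x)/q(x)) = Σ_x p(x) ln(p(x)/q(x)), so `d` and `kl` agree.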
If f is a convex function and satisfies some moderate conditions, D_f(p‖q) is the optimal gain of a certain Bayes decision problem: for each f, there is a pair of functions w_1 and w_2 on a decision space, representing the gain of a decision d, with

D_f(p‖q) = sup_{d(·)} Σ_x ( w_1(d(x)) p(x) + w_2(d(x)) q(x) ).   (1.1)

Conversely, for each (w_1(·), w_2(·)), there is a convex function f with this identity. Also, by (1.1) and the celebrated randomization criterion [25], there is a Markov map which sends (p, q) to (p′, q′) iff D_f(p‖q) ≥ D_f(p′‖q′) holds for any convex function f with the above mentioned properties.

In quantum information theory, a series of works by Petz (see [12] and references therein) is most impressive, and his version of quantum f-divergence has been widely studied and applied. Also, the recent development of the theory of quantum Renyi entropy is significant.

In this paper, we introduce another quantum version, the maximal quantum f-divergence D^max_f(ρ‖σ). This quantity is defined as the solution to the following optimization problem: given a pair of quantum states {ρ, σ}, consider a (completely) positive trace preserving map Γ that sends probability distributions {p, q} to {ρ, σ}. The triple (Γ, {p, q}) (a reverse test, hereafter) is optimized to minimize D_f(p‖q), and this infimum is D^max_f(ρ‖σ). The name comes from the fact that D^max_f is the largest of all possible quantum f-divergences.

Some historical remarks are in order. When f is r ln r and σ is invertible,

D^max_{r ln r}(ρ‖σ) = tr ρ log( ρ^{1/2} σ^{-1} ρ^{1/2} ).

This RHS quantity had been studied by several authors from the operator theoretic point of view [4][9][13]. Also, some authors had pointed out that this quantity is the path dependent divergence [1] of the RLD quantum Fisher metric [22], which plays an important role in quantum statistical estimation theory [14], along the e- and m-geodesics connecting ρ and σ [10][19][16].
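The quantity tr ρ log(ρ^{1/2} σ^{-1} ρ^{1/2}) is easy to evaluate numerically for invertible states. The sketch below (helper names are ours) computes it via eigendecompositions and compares it with the Umegaki relative entropy tr ρ(log ρ − log σ); since D^max is the largest quantum version of Kullback-Leibler divergence, the latter should not exceed the former.

```python
import numpy as np

def mfunc(A, f):
    # apply a scalar function f to a Hermitian matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

sr = mfunc(rho, np.sqrt)                    # rho^{1/2}
M = sr @ np.linalg.inv(sigma) @ sr          # rho^{1/2} sigma^{-1} rho^{1/2}
d_max_kl = np.trace(rho @ mfunc(M, np.log)).real

# Umegaki relative entropy tr rho (log rho - log sigma), a smaller quantum version
umegaki = np.trace(rho @ (mfunc(rho, np.log) - mfunc(sigma, np.log))).real
```

For these (non-commuting) states `d_max_kl` strictly exceeds `umegaki`, consistent with D^max being the largest quantum f-divergence.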
However, its characterization as the solution to the optimization problem, and as the largest quantum version, was first pointed out by the present author [16]. In [17], the present author studied D^max_{r^{1/2}} rather intensively, and briefly treated the case when f is operator monotone decreasing. Recently, based on an earlier version of the present paper, [11] studied some aspects of D^max_f.

Below, we summarize our main results. When f is operator convex, a series of rich results is available. First, we can write down the value of D^max_f and the operation achieving the minimum explicitly: suppose ρ and σ are

ρ = [ ρ_{1,1} ρ_{1,2} ; ρ_{2,1} ρ_{2,2} ],   σ = [ σ_1 0 ; 0 0 ],

relative to the decomposition supp σ ⊕ (supp σ)^⊥. Then

D^max_f(ρ‖σ) = tr σ f( σ^{-1/2} ρ̃ σ^{-1/2} ) + tr(ρ − ρ̃) lim_{ε↓0} ε f(1/ε),   (1.2)

where ρ̃ := ρ_{1,1} − ρ_{1,2} (ρ_{2,2})^{−} ρ_{2,1}. The first term of the RHS is the trace of the non-commutative perspective [6][7] of ρ̃ and σ. The operation achieving the minimum is obtained using the spectral decomposition of σ^{-1/2} ρ̃ σ^{-1/2}, and the same reverse test is optimal for all operator convex functions f. Uniqueness of the optimal operation modulo trivial redundancy is also shown.

Based on these analyses, we show, for example: suppose f is operator convex. Then the maximum f-divergence of the probability distributions of a measurement under the states ρ and σ is strictly less than D^max_f(ρ‖σ). Thus, once encoded into non-commutative quantum states, some amount of classical f-divergence is irrecoverably lost. (This statement may seem intuitively trivial, but when f is not operator convex, it is not always true.)

After the detailed analysis of the case where f is operator convex, we study the case where such an assumption does not hold. One motivation is that much of the results in the former case generalize.

First, when one of the states is a pure state, (1.2) generalizes to all convex functions, and the optimal reverse test is also the same, and also unique. Also, D^max_f is strictly larger than the measured f-divergence, unless the two states commute.

Next, we analyze f(r) = |1 − r|, since this corresponds to the total variation distance, which is quite often used in statistics, information theory, and so on. Though we failed to obtain a closed formula, the optimization problem is reduced to a quite simple linear semidefinite program. Using this, we show that (1.2) is not true in this case, and the optimal reverse test is not the same either.

In addition, when {ρ, σ} satisfies some conditions, it turns out that

D^max_{|1−r|}(ρ‖σ) = ‖ρ − σ‖_1.
(1.3)

Since the RHS equals the measured total variation distance, this means that total variation sometimes does not decrease by embedding into non-commutative quantum states. The condition for (1.3) is not too restrictive: for example, if ρσ + σρ ≥ 0, this identity holds. In the qubit case, the necessary and sufficient condition for (1.3) is obtained, and a fairly large area of the Bloch sphere satisfies (1.3).

Besides these case studies, we show the dual expression of D^max_f:

D^max_f(ρ‖σ) = sup { tr(ρW_1 + σW_2) ; rW_1 + W_2 ≤ f(r) 1, ∀r ≥ 0 }.   (1.4)

This shows that D^max_f is a pointwise supremum of linear functionals, thus it is lower semicontinuous. Thus, D^max_f behaves extremely nicely at the edge of the domain. In fact, if (ρ_ε, σ_ε) is an arbitrary line segment connecting (ρ, σ) and an interior point of the domain, lim_{ε↓0} D^max_f(ρ_ε‖σ_ε) = D^max_f(ρ‖σ) holds. (1.4) is also valid, with certain restrictions, even when the underlying Hilbert space is a separable infinite dimensional space.

Except for the last subsection, we will work on a finite dimensional Hilbert space H. In most cases, the underlying Hilbert space is not mentioned unless it is confusing. The spaces of trace class operators and of bounded operators on H are denoted by B_1(H) and B(H), respectively, and the spaces of their self-adjoint elements are denoted by B_{1,sa}(H) and B_sa(H). In most cases, specification of the underlying Hilbert space is dropped, thus B_sa instead of B_sa(H), for example. When dim H < ∞ (thus in most of the paper), to denote the space of all linear operators, we use B(H).

For each operator A, A^{−} denotes its Moore-Penrose generalized inverse. Also, for each positive operator X, denote by supp X its support, and by π_X the projection onto supp X. The projection onto the space K is denoted by π_K. The orthogonal complement of the projector π is denoted by π^⊥. In this paper, for the most part, probability distributions or positive measures are defined on a finite set X. These are easily identified with commuting elements of B_sa(C^{|X|}). Note the support of a measure μ is also denoted by supp μ.
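The dual expression (1.4), and its classical counterpart below, rest on writing t f(s/t) as a pointwise supremum of linear functionals; for differentiable convex f, the tangent lines (w_1, w_2) = (f′(c), f(c) − c f′(c)) form such a family. A scalar sanity check of this, with an illustrative grid of tangent points (all names ours):

```python
import math

def f_kl(r):
    return r * math.log(r) if r > 0 else 0.0

def df_kl(r):
    return math.log(r) + 1.0

def g_from_tangents(s, t, f, df, cs):
    # pointwise supremum of w1*s + w2*t over tangent-line pairs
    # (w1, w2) = (f'(c), f(c) - c f'(c)), c ranging over the grid cs
    return max(df(c) * s + (f(c) - c * df(c)) * t for c in cs)

s, t = 0.3, 0.7
cs = [0.01 * k for k in range(1, 1000)]   # grid of tangent points c > 0
approx = g_from_tangents(s, t, f_kl, df_kl, cs)
exact = t * f_kl(s / t)                   # g_f(s, t) = t f(s/t) for t > 0
```

Each tangent value lies below the convex function, so `approx` approaches `exact` from below as the grid is refined; the supremum is attained at c = s/t.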
2 Classical f-divergence

This section explains the definition of, and known useful facts about, the classical f-divergence and convex analysis.

The definition of D_f in the introduction obviously cannot be used when q(x) = 0 for some x. Convex analysis supplies useful tools to cope with such continuity issues. As in [23], we suppose that h is a map from R^n to R ∪ {±∞}. Instead of saying that h is not defined on a certain set, we say that h(r) = ∞ on that set. The effective domain of h, denoted by dom h, is the set of all r's with h(r) < ∞. h is said to be convex iff its epigraph, or the set epi h := {(r, λ) ; λ ≥ h(r)}, is convex. A convex function h is proper iff h is nowhere −∞ and not ∞ everywhere, and is lower semicontinuous iff the set {r ; λ ≥ h(r)} is closed for any λ, or equivalently, iff its epigraph is closed (Theorem 7.1 of [23]), or equivalently, iff h(lim_{k→∞} r_k) ≤ lim inf_{k→∞} h(r_k).

Given a convex function h, its closure cl h is the greatest lower semicontinuous (not necessarily finite) function majorized by h. The name comes from the fact that epi(cl h) = cl(epi h). cl h coincides with h except perhaps at the relative boundary points of its effective domain. If h is proper and convex, so is cl h (Theorem 7.4, [23]).

The following proposition will be used intensively later.

Proposition 2.1 (Theorem 10.2, [23]) If h is lower semicontinuous, proper and convex, it is continuous on any simplex in dom h.

From here, unless otherwise mentioned, f, which is used to define the f-divergence, is supposed to satisfy the following condition.

(FC) f is a proper, lower semicontinuous, and convex function with (0, ∞) ⊂ dom f ⊂ [0, ∞).

Now we are in the position to define the classical f-divergence D_f between the positive measures p and q over the finite set X.
It is defined in the following manner, so that the function (p, q) → D_f(p‖q) is lower semicontinuous: namely,

D_f(p‖q) := Σ_{x∈X} g_f( p(x), q(x) ),

where g_f(s, t) is the closure of t f(s/t) (see p. 35 and p. 67 of [23]):

g_f(s, t) :=
  t f(s/t),              if s ∈ dom f, t > 0,
  lim_{t↓0} t f(s/t),    if s ∈ dom f, t = 0,
  0,                     if s = t = 0,
  ∞,                     otherwise.   (2.1)

It is easy to check that

D_f(p‖q) = Σ_{x∈supp q} q(x) f( p(x)/q(x) ) + Σ_{x∈X∖supp q} p(x) lim_{ε↓0} ε f(1/ε).

Remark 2.2
Though p and q have to be probability distributions for D_f to have operational meanings, we extend the domain of D_f to pairs of positive finite measures on a finite set for the sake of mathematical convenience.

Observe also that g_f is in addition positively homogeneous, or

∀a ≥ 0,  g_f(as, at) = a g_f(s, t).

Since it is positively homogeneous, proper, lower semicontinuous and convex, by Corollary 13.5.1 of [23], it is the pointwise supremum of linear functions,

g_f(s, t) = sup_{(w_1, w_2) ∈ W_f}  w_1 s + w_2 t,   (2.2)

where the set W_f is convex and unbounded from below. Therefore,

D_f(p‖q) = sup { Σ_{x∈X} w_1(x) p(x) + w_2(x) q(x) ; (w_1(x), w_2(x)) ∈ W_f }.   (2.3)

This in turn shows, by Corollary 13.5.1 of [23], that D_f is positively homogeneous, proper, lower semicontinuous and convex.

Remark 2.3 (2.3) indicates (1.1). To see this, use W_f as a decision space.

If f satisfies (FC), the function

f̂(r) := g_f(1, r)   (2.4)

also satisfies (FC) and g_{f̂}(t, s) = g_f(s, t). This identity implies

D_f(p‖q) = D_{f̂}(q‖p).   (2.5)

Also,

lim_{ε↓0} ε f(1/ε) = f̂(0),   f(0) = lim_{ε↓0} ε f̂(1/ε).   (2.6)

The introduction of f̂ often simplifies the argument, allowing us to switch the first and the second variables.

3 Maximal f-divergence

In this section we define the maximal f-divergence D^max_f as the solution to an operationally defined minimization problem.

A reverse test of a pair {ρ, σ} of positive operators is a triple (Γ, {p, q}). Here, Γ is a trace preserving positive linear map from positive measures over some finite set X (or a commutative algebra with dimension |X|) to Hermitian operators, and p and q are positive measures over X, with

Γ(p) = ρ,  Γ(q) = σ.

(Note Γ is necessarily completely positive.)

For a function f satisfying the above (FC), we define the maximal f-divergence

D^max_f(ρ‖σ) = inf_{(Γ, {p,q})} D_f(p‖q),   (3.1)

where the infimum is taken over all the reverse tests. The name comes from the fact that D^max_f(ρ‖σ) is the largest quantum version of D_f(p‖q); here, a quantum version of D_f(p‖q) is any D^Q_f(ρ‖σ) such that

(D1) D^Q_f(Λ(ρ)‖Λ(σ)) ≤ D^Q_f(ρ‖σ) holds for any completely positive trace preserving (CPTP) map Λ and any density operators ρ, σ on finite dimensional Hilbert spaces.

(D2) D^Q_f(p‖q) = D_f(p‖q) for any probability distributions p, q over any finite sets.

Here p is identified with Σ_{x∈X} p(x) |e_x⟩⟨e_x|, for example, where {|e_x⟩ ; x ∈ X} is a CONS. The choice of a particular CONS is not important, since

D^Q_f( UρU† ‖ UσU† ) = D^Q_f(ρ‖σ)

for any unitary operator U, due to (D1).

We also consider the following stronger condition.
(D1′) D^Q_f(Λ(ρ)‖Λ(σ)) ≤ D^Q_f(ρ‖σ) holds for any trace preserving positive map Λ and any density operators ρ, σ on finite dimensional Hilbert spaces.

Lemma 3.1 If (FC) is satisfied, D^max_f satisfies the above (D1), (D1′) and (D2). Also, if a two point functional D^Q_f satisfies both of (D1) and (D2), or both of (D1′) and (D2), then

D^Q_f(ρ‖σ) ≤ D^max_f(ρ‖σ).

Proof.
Let Λ be a trace preserving positive map. Then,

D^max_f(Λ(ρ)‖Λ(σ))
= inf { D_f(p‖q) ; (Γ, {p, q}) : a reverse test of {Λ(ρ), Λ(σ)} }
≤ inf { D_f(p‖q) ; Γ = Λ ∘ Γ′, (Γ′, {p, q}) : a reverse test of {ρ, σ} }
= D^max_f(ρ‖σ).

Hence, D^max_f satisfies (D1′), and thus (D1) also. Also,

D^max_f(p‖q) = inf { D_f(p′‖q′) ; p = Γ(p′), q = Γ(q′), Γ : stochastic map }
≥ inf { D_f( Γ(p′) ‖ Γ(q′) ) ; p = Γ(p′), q = Γ(q′), Γ : stochastic map }
= D_f(p‖q).

Since the opposite inequality is trivial, we have D^max_f(p‖q) = D_f(p‖q). Thus, D^max_f satisfies (D2).

Suppose D^Q_f satisfies (D1) (or (D1′)) and (D2), and let (Γ, {p, q}) be a reverse test of {ρ, σ}. Then,

D^Q_f(ρ‖σ) = D^Q_f( Γ(p) ‖ Γ(q) ) ≤ D^Q_f(p‖q) = D_f(p‖q).

Therefore, taking the infimum over all the reverse tests of {ρ, σ}, we have D^Q_f(ρ‖σ) ≤ D^max_f(ρ‖σ). □

In defining reverse tests, we had assumed that the cardinality of X, where {p, q} are defined, is finite, for mathematical simplicity. But this restriction is not essential as long as dim H < ∞, since Caratheodory's theorem puts a natural upper bound on the size of X.

Denote by δ_x the delta distribution at x, and define r_x := p(x)/q(x) for x ∈ supp q. Then

Σ_{x∈X} q(x) Γ(δ_x) = σ,
Σ_{x∈X} q(x) r_x Γ(δ_x) = ρ − Σ_{x: q(x)=0} p(x) Γ(δ_x),
Σ_{x∈X} q(x) f(r_x) = D_f(p‖q) − f̂(0) Σ_{x: q(x)=0} p(x).

Since Σ_{x∈X} q(x) < ∞, by Caratheodory's theorem, there is a positive finite measure q̄ such that Σ_{x∈X} q̄(x) = Σ_{x∈X} q(x),

Σ_{x∈X} q̄(x) Γ(δ_x) = Σ_{x∈X} q(x) Γ(δ_x),
Σ_{x∈X} q̄(x) r_x Γ(δ_x) = Σ_{x∈X} q(x) r_x Γ(δ_x),
Σ_{x∈X} q̄(x) f(r_x) = Σ_{x∈X} q(x) f(r_x),

and supp q̄ ⊂ supp q, |supp q̄| ≤ (dim H)² + (dim H)² + 1 + 1.

Thus, defining X̄, p̄, and Γ̄ by

X̄ := supp q̄ ∪ {x_0}  (with q̄(x_0) := 0),

p̄(x) := r_x q̄(x), if x ∈ supp q̄;  Σ_{x: q(x)=0} p(x), if x = x_0,

Γ̄(δ_x) := Γ(δ_x), if x ∈ supp q̄;  ( Σ_{x: q(x)=0} p(x) Γ(δ_x) ) / ( Σ_{x: q(x)=0} p(x) ), if x = x_0,

we have:

Lemma 4.1
To each given reverse test (Γ, {p, q}) of {ρ, σ}, there is a reverse test (Γ̄, {p̄, q̄}) such that (i) D_f(p‖q) = D_f(p̄‖q̄) and (ii) p̄ and q̄ are defined over the set X̄ with |X̄| ≤ 2(dim H)² + 3. Also, X̄ = supp q̄ ∪ {x_0}.

Without loss of generality, we suppose X = supp q ∪ {x_0}, and define

S_x := q(x) Γ(δ_x), if x ≠ x_0;  p(x_0) Γ(δ_{x_0}), if x = x_0.

Then it should satisfy

Σ_{x∈X∖{x_0}} r_x S_x + S_{x_0} = ρ,   Σ_{x∈X∖{x_0}} S_x = σ.   (4.1)

Conversely, to each such {S_x, r_x ; x ∈ X}, there corresponds a reverse test with

Γ(δ_x) = S_x / tr S_x

and

{p(x), q(x)} = {r_x tr S_x, tr S_x}, if x ≠ x_0;  {tr S_{x_0}, 0}, if x = x_0.

So {S_x, r_x ; x ∈ X} is a bijective representation of a reverse test. By this representation, D^max_f is the solution to the optimization problem

D^max_f(ρ‖σ) = inf { Σ_{x∈X∖{x_0}} f(r_x) tr S_x + f̂(0) tr S_{x_0} ; S_x with (4.1) }.   (4.2)

Observe this optimization can be done in two stages. Fixing S_{x_0}, or equivalently

ρ_* := ρ − S_{x_0} = Σ_{x∈X∖{x_0}} r_x S_x ≤ ρ,   (4.3)

optimize {S_x ; x ∈ X∖{x_0}} to minimize

Σ_{x∈X∖{x_0}} f(r_x) tr S_x = Σ_{x∈supp q} q(x) f( p̃(x)/q(x) ) = D_f(p̃‖q),

where p̃ is the restriction of p to supp q. Since (Γ, {p̃, q}) is a reverse test of {ρ_*, σ}, the minimum of D_f(p̃‖q) equals D^max_f(ρ_*‖σ). After this is done, we optimize ρ_* to minimize

D^max_f(ρ_*‖σ) + f̂(0) tr S_{x_0} = D^max_f(ρ_*‖σ) + f̂(0) tr(ρ − ρ_*).

Note that supp ρ_* ⊂ supp σ holds by (4.1) and (4.3), and 0 ≤ ρ_* ≤ ρ by its definition (4.3). Thus,

D^max_f(ρ‖σ) = inf { D^max_f(ρ_*‖σ) + f̂(0) tr(ρ − ρ_*) ; 0 ≤ ρ_* ≤ ρ, supp ρ_* ⊂ supp σ }.   (4.4)

Here, introduce the operator

ρ̃ := ρ_{1,1} − ρ_{1,2} (ρ_{2,2})^{−} ρ_{2,1},   (4.5)

where, relative to the decomposition supp σ ⊕ (supp σ)^⊥,

ρ = [ ρ_{1,1} ρ_{1,2} ; ρ_{2,1} ρ_{2,2} ],   σ = [ σ_1 0 ; 0 0 ].

Lemma 4.2
Suppose ρ_* ≥ 0 is supported on supp σ and ρ_* ≤ ρ. Then ρ̃ ≥ ρ_*. Also, 0 ≤ ρ̃ ≤ ρ and supp ρ̃ ⊂ supp σ.

Proof.
By Proposition 10.5, ρ̃ ≥ 0 (in fact, ρ̃^{−} = π_σ ρ^{−} π_σ). ρ̃ ≤ ρ and supp ρ̃ ⊂ supp σ are obvious by definition.

Since ρ_* ≤ ρ,

[ ρ_{1,1} − ρ_*  ρ_{1,2} ; ρ_{2,1}  ρ_{2,2} ] ≥ 0.

Therefore, by Proposition 10.5, we should have ρ_{1,1} − ρ_* ≥ ρ_{1,2} (ρ_{2,2})^{−} ρ_{2,1}, or equivalently,

ρ̃ = ρ_{1,1} − ρ_{1,2} (ρ_{2,2})^{−} ρ_{2,1} ≥ ρ_*. □

By Lemma 4.2, (4.4) can be rewritten as follows:

D^max_f(ρ‖σ) = inf { D^max_f(ρ_*‖σ) + f̂(0) tr(ρ − ρ_*) }
= inf { D^max_f(ρ_*‖σ) + f̂(0) tr(ρ̃ − ρ_*) } + f̂(0) tr(ρ − ρ̃)
= D^max_f(ρ̃‖σ) + f̂(0) tr(ρ − ρ̃),   (4.6)

where ρ_* moves over all the operators with ρ ≥ ρ_* ≥ 0 and supp ρ_* ⊂ supp σ, or equivalently, ρ̃ ≥ ρ_* ≥ 0 and supp ρ_* ⊂ supp σ.

To list all the reverse tests, the commutative Radon-Nikodym derivative is useful. Given ρ_* ≥ 0 with supp ρ_* ⊂ supp σ, the commutative Radon-Nikodym derivative with respect to σ is defined by

d(ρ_*, σ) := σ^{-1/2} ρ_* σ^{-1/2}.   (4.7)

Suppose ρ_* ≤ ρ and let {M_x} be a resolution of identity into positive operators with

d(ρ_*, σ) = Σ_{x∈X∖{x_0}} r_x M_x,   Σ_{x∈X∖{x_0}} M_x = 1.

Then

S_x = σ^{1/2} M_x σ^{1/2}, if x ≠ x_0;  ρ − ρ_*, if x = x_0.

Therefore,

{q(x), p(x)} = {tr σM_x, r_x q(x)}, if x ≠ x_0;  {0, tr(ρ − ρ_*)}, if x = x_0,   (4.8)

and

Γ(δ_x) := (1/q(x)) σ^{1/2} M_x σ^{1/2}, if x ≠ x_0;  (1/tr(ρ − ρ_*)) (ρ − ρ_*), if x = x_0.   (4.9)

Thus, {M_x, r_x} and ρ_* specify a reverse test.

When the M_x's are projectors and ρ_* = ρ̃, we say the corresponding reverse test is minimal. The minimal reverse test turns out to be optimal under certain natural conditions on f (namely, the condition (F) below).

5 Properties of D^max_f

Theorem 5.1
When f satisfies (FC), D^max_f has the following properties.

(i) D^max_f is jointly convex: if ρ = Σ_i c_i ρ_i, σ = Σ_i c_i σ_i, Σ_i c_i = 1 (c_i ≥ 0), then

D^max_f(ρ‖σ) ≤ Σ_i c_i D^max_f(ρ_i‖σ_i).   (5.1)

(ii) If f(0) = 0 in addition, it is monotone decreasing in the second argument:

D^max_f(ρ‖X) ≤ D^max_f(ρ‖σ),   X ≥ σ.   (5.2)

(iii) D^max_f is positively homogeneous:

D^max_f(cρ‖cσ) = c D^max_f(ρ‖σ),   c ≥ 0.   (5.3)

In particular,

D^max_f(0‖0) = 0.   (5.4)

(iv) Direct sum property:

D^max_f(ρ_0 ⊕ ρ_1 ‖ σ_0 ⊕ σ_1) = D^max_f(ρ_0‖σ_0) + D^max_f(ρ_1‖σ_1),   (5.5)

where ρ_i, σ_i are supported on H_i (i = 0, 1), and H_0 ⊥ H_1.

Proof. (i): Let (Γ_i, {p_i, q_i}) be reverse tests of {ρ_i, σ_i}, where p_i, q_i are positive measures over the finite sets X_i. Then the 'mixture' of these reverse tests with probabilities c_i composes a reverse test (Γ, {p, q}) of {ρ, σ}: let X = ⋃_i X_i (disjoint union), and define

p(x) := c_i p_i(x),  q(x) := c_i q_i(x),  Γ(δ_x) = Γ_i(δ_x),  (x ∈ X_i).

Then,

D_f(p‖q) = Σ_i c_i D_f(p_i‖q_i).

Therefore, minimizing over all the reverse tests of {ρ_i, σ_i}, we obtain (5.1).

(ii): Let X′ := X − σ ≥ 0, and take finite sets X_0 ⊃ supp q ∪ supp p and X_1 with X_0 ∩ X_1 = ∅. Then

D^max_f(ρ‖X) = D^max_f(ρ‖σ + X′)
≤ inf { D_f(p‖q + q′) ; Γ(p) = ρ, Γ(q) = σ, Γ(q′) = X′, supp q′ = X_1 }
= inf { D_f(p‖q) ; Γ(p) = ρ, Γ(q) = σ, Γ(q′) = X′, supp q′ = X_1 }
= D^max_f(ρ‖σ),

where the identity in the third line is due to: since X_0 ∩ X_1 = ∅ and f(0) = 0,

Σ_{x∈X_0∪X_1} g_f( p(x), q(x) + q′(x) ) = Σ_{x∈X_0} g_f( p(x), q(x) ) + Σ_{x∈X_1} g_f( 0, q′(x) )
= Σ_{x∈X_0} g_f( p(x), q(x) ).

(iii): Let c > 0. Then to each reverse test (Γ, {p, q}) of {ρ, σ} corresponds the reverse test (Γ, {cp, cq}) of {cρ, cσ}, and vice versa. Hence, due to the fact that D_f is positively homogeneous, we have the identity. When c = 0, we only have to show that the LHS is 0. In fact, if (Γ, {p, q}) is an arbitrary reverse test of {0, 0}, supp p and supp q are empty, and D_f(p‖q) = 0. Thus D^max_f(0‖0) = 0.

(iv): "≤" is trivial. Thus, we show "≥". Let (Γ, {p, q}) be a reverse test of {ρ, σ} = {ρ_0 ⊕ ρ_1, σ_0 ⊕ σ_1}, and define

Γ_i(δ_x) := (1/ tr π_{H_i} Γ(δ_x)) π_{H_i} Γ(δ_x) π_{H_i},
p_i(x) := p(x) tr π_{H_i} Γ(δ_x),  q_i(x) := q(x) tr π_{H_i} Γ(δ_x).

Then (Γ_i, {p_i, q_i}) is a reverse test of {ρ_i, σ_i} (i = 0, 1). Since g_f is positively homogeneous,

D_f(p_0‖q_0) + D_f(p_1‖q_1)
= Σ_{i=0,1} Σ_{x∈X} g_f( p(x) tr π_{H_i} Γ(δ_x), q(x) tr π_{H_i} Γ(δ_x) )
= Σ_{x∈X} Σ_{i=0,1} tr π_{H_i} Γ(δ_x) · g_f( p(x), q(x) )
= Σ_{x∈X} g_f( p(x), q(x) ) = D_f(p‖q).

Thus,

inf D_f(p‖q) = inf { D_f(p_0‖q_0) + D_f(p_1‖q_1) } ≥ inf D_f(p_0‖q_0) + inf D_f(p_1‖q_1),

which leads to the asserted inequality. After all, we have (5.5). □

Lemma 5.2
Suppose f satisfies (FC). Then the convex function (ρ, σ) → D^max_f(ρ‖σ) is proper. Thus, it is nowhere −∞.

Proof.
An improper convex function is necessarily infinite except perhaps at relative boundary points of its effective domain (Theorem 7.2 of [23]). But

D^max_f(p‖p) = D_f(p‖p) = Σ_{x∈X} p(x) f(1)

is finite. Thus D^max_f cannot be improper. □

Theorem 5.3
Suppose f satisfies (FC). Then D^max_f(ρ‖σ) < ∞ only in the following four cases:

(i) f̂(0) < ∞ and f(0) < ∞;
(ii) f̂(0) < ∞, f(0) = ∞, and supp ρ ⊃ supp σ;
(iii) f̂(0) = ∞, f(0) < ∞, and supp ρ ⊂ supp σ;
(iv) f̂(0) = ∞, f(0) = ∞, and supp ρ = supp σ.

Proof.
In all these cases, if (Γ, {p, q}) is the minimal reverse test of {ρ, σ}, D_f(p‖q) < ∞. Thus, below we show D^max_f(ρ‖σ) = ∞ in the case where these conditions are not true.

Suppose f̂(0) = ∞ and supp ρ ⊄ supp σ. Then by (4.6), D^max_f(ρ‖σ) = ∞, since

D^max_f(ρ̃‖σ) ≥ D^max_{ar+b}(ρ̃‖σ) = a tr ρ̃ + b tr σ > −∞,

where a, b are chosen so that f(r) ≥ ar + b, r ≥ 0. Suppose f(0) = ∞ and supp ρ ⊉ supp σ. Then D^max_f(ρ‖σ) = ∞ is concluded by replacing f by f̂ in the above argument. □

6 The case where f is operator convex

In this section, we suppose that f is operator convex and f(0) = 0, in addition to satisfying (FC):

(F) f is proper, lower semicontinuous, and operator convex. In addition, dom f = [0, ∞) and f(0) = 0.

If this is true and f̂(0) < ∞, by Proposition 10.3,

f(r) = f̂(0) r + f_1(r),   (6.1)

where f_1(r) satisfies (F) and is operator monotone decreasing.

When supp σ ⊃ supp ρ, by the correspondence (4.8) and (4.9),

D^max_f(ρ‖σ) = inf_{{M_x},{r_x}} { Σ_{x∈X} f(r_x) tr σM_x ; Σ_{x∈X} r_x M_x = d(ρ, σ), Σ_{x∈X} M_x = 1 }.   (6.2)

Here we use a Naimark extension. Denoting the extended space by H′, and letting V be an isometry from H (where ρ, σ, etc. are living) into H′, there is a tuple of mutually orthogonal projectors {E_x} in H′ with V† E_x V = M_x. Therefore,

Σ_{x∈X} f(r_x) tr σM_x = tr σ V† f( Σ_{x∈X} r_x E_x ) V
≥ tr σ f( V† ( Σ_{x∈X} r_x E_x ) V ) = tr σ f( Σ_{x∈X} r_x M_x ) = tr σ f( d(ρ, σ) ),

where the inequality in the second line is by Jensen's inequality, Proposition 10.2 (note X → V† X V is a positive unital map into B(H)). The identity holds if the M_x's are mutually orthogonal projectors and H′ = H, i.e., if the reverse test is minimal. Thus,

D^max_f(ρ‖σ) = tr σ f( d(ρ, σ) ),

and the identity is achieved by the minimal reverse test.

Next, suppose supp σ ⊉ supp ρ. If f̂(0) = ∞, by Theorem 5.3, D^max_f(ρ‖σ) = ∞. If f̂(0) < ∞, we can apply (4.6). After all:

Theorem 6.1

D^max_f(ρ‖σ) = tr σ f( d(ρ̃, σ) ) + f̂(0) tr(ρ − ρ̃)   (6.3)
= tr σ f( σ^{-1/2} ρ̃ σ^{-1/2} ) + f̂(0) tr(ρ − ρ̃)

holds if (F) is true. (If f̂(0) = ∞, both ends are ∞ and the identity holds.) The minimum is achieved by the minimal reverse test of {ρ, σ}.

Throughout this subsection, we suppose supp σ ⊃ supp ρ.
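For invertible σ, so that ρ̃ = ρ and the second term of (6.3) vanishes, Theorem 6.1 reads D^max_f(ρ‖σ) = tr σ f(σ^{-1/2} ρ σ^{-1/2}). A small numerical sketch (helper names are ours) checks that for commuting (diagonal) states the formula collapses to the classical f-divergence, as (D2) requires:

```python
import numpy as np

def mfunc(A, f):
    # apply a scalar function f to a Hermitian matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def d_max(rho, sigma, f):
    # closed formula for invertible sigma: tr sigma f(sigma^{-1/2} rho sigma^{-1/2})
    s_inv_half = mfunc(sigma, lambda w: w ** -0.5)
    d = s_inv_half @ rho @ s_inv_half       # commutative Radon-Nikodym derivative
    return np.trace(sigma @ mfunc(d, f)).real

# f(r) = r ln r, with f(0) = 0, applied entrywise to an eigenvalue array
f = lambda w: np.where(w > 0, w * np.log(np.where(w > 0, w, 1.0)), 0.0)

# commuting (diagonal) states: D^max reduces to the classical f-divergence
rho_c = np.diag([0.5, 0.5])
sigma_c = np.diag([0.9, 0.1])
quantum = d_max(rho_c, sigma_c, f)
classical = sum(qx * (px / qx) * np.log(px / qx)
                for px, qx in zip([0.5, 0.5], [0.9, 0.1]))
```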
With f_KL(r) := r log r,

D^max_{f_KL}(ρ‖σ) = tr σ σ^{-1}ρ log( σ^{-1}ρ ) = tr ρ log( σ^{-1}ρ ) = tr ρ log( ρ^{1/2} σ^{-1} ρ^{1/2} ).

This quantity, corresponding to the Kullback-Leibler divergence, had been studied by various authors [4][9][13][10][19]. The relation to the reverse test problem was first pointed out by [16].

Define f_α(r) := (±1) r^α, where the sign is chosen so that the function is convex on the positive half-line. This is operator convex if α ∈ [−1, 2] ∖ {0, 1}, and

D^max_{f_α}(ρ‖σ) = tr σ f_α( σ^{-1}ρ ).

One can check the following identity, which is a special case of D^max_f(ρ‖σ) = D^max_{f̂}(σ‖ρ):

D^max_{f_α}(ρ‖σ) = D^max_{f_{1−α}}(σ‖ρ).

When (F) is true, the following operator valued quantity g_f(ρ, σ), called the non-commutative perspective [6][7], satisfies tr g_f(ρ, σ) = D^max_f(ρ‖σ) and operator versions of some properties of D^max_f:

g_f(ρ, σ) :=
  √σ f( σ^{-1/2} ρ σ^{-1/2} ) √σ,        if supp σ ⊃ supp ρ,
  g_f(ρ̃, σ) + f̂(0)(ρ − ρ̃),            if supp σ ⊉ supp ρ and f̂(0) < ∞,
  undefined,                             otherwise.

In the second case, since f̂(0) < ∞, by (6.1),

g_f(ρ, σ) = g_f(ρ̃, σ) + f̂(0)(ρ − ρ̃) = g_{f_1}(ρ̃, σ) + f̂(0) ρ
= inf { g_{f_1}(ρ_*, σ) + f̂(0) ρ ; 0 ≤ ρ_* ≤ ρ, supp σ ⊃ supp ρ_* }.   (6.4)

Remark 6.2 In [6][7], they define g_f(ρ, σ) only for the case where supp σ ⊃ supp ρ, and prove various properties of the quantity, including the ones presented below. The most important one, which is used later, is the operator version of (D1):
Lemma 6.3 (i) For any positive trace preserving map Λ, we have

Λ( g_f(ρ, σ) ) ≥ g_f( Λ(ρ), Λ(σ) ).   (6.5)

(ii) If g_f(Λ(ρ), Λ(σ)) = Λ(g_f(ρ, σ)), then Λ(ρ̃) is the largest positive operator supported on supp Λ(σ) and majorized by Λ(ρ). Thus,

Λ( g_f(ρ̃, σ) ) = g_f( Λ(ρ̃), Λ(σ) ).   (6.6)

Proof.
For a given positive trace preserving map Λ, define

Λ_σ(X) := {Λ(σ)}^{-1/2} Λ( σ^{1/2} X σ^{1/2} ) {Λ(σ)}^{-1/2},   (6.7)

which is a positive unital map into B(supp Λ(σ)):

Λ_σ(1) = π_{Λ(σ)},   (6.8)

and

Λ_σ( d(ρ, σ) ) = d( Λ(ρ), Λ(σ) ).   (6.9)

If supp σ ⊃ supp ρ, since Λ is positive, Λ(ρ) is supported on supp Λ(σ) and d(Λ(ρ), Λ(σ)) exists. Also,

g_f( Λ(ρ), Λ(σ) ) =_{(a)} Λ(σ)^{1/2} f( Λ_σ(d(ρ, σ)) ) Λ(σ)^{1/2}
≤_{(b)} Λ(σ)^{1/2} Λ_σ( f(d(ρ, σ)) ) Λ(σ)^{1/2} = Λ( σ^{1/2} f(d(ρ, σ)) σ^{1/2} ),   (6.10)

where (a) and (b) are by (6.9) and Proposition 10.2, respectively. If supp σ ⊉ supp ρ and f̂(0) < ∞,

g_f( Λ(ρ), Λ(σ) )
= inf_{ρ_*} { g_{f_1}( ρ_*, Λ(σ) ) + f̂(0) Λ(ρ) ; Λ(ρ) ≥ ρ_* ≥ 0, supp Λ(σ) ⊃ supp ρ_* }
≤ g_{f_1}( Λ(ρ̃), Λ(σ) ) + f̂(0) Λ(ρ)
≤ Λ( g_{f_1}(ρ̃, σ) ) + f̂(0) Λ(ρ)
= Λ( g_{f_1}(ρ̃, σ) + f̂(0) ρ ) = Λ( g_f(ρ, σ) ),

where the inequality in the fourth line is by (6.10) (recall ρ̃ as of (4.5) is supported on supp σ).

Therefore, if g_f(Λ(ρ), Λ(σ)) = Λ(g_f(ρ, σ)), Λ(ρ̃) should achieve the infimum in the second line. Thus the first statement of (ii) is true. Then,

g_f( Λ(ρ), Λ(σ) ) = g_f( Λ(ρ̃), Λ(σ) ) + f̂(0) Λ(ρ − ρ̃).

Equating this to Λ(g_f(ρ, σ)) = Λ(g_f(ρ̃, σ)) + f̂(0) Λ(ρ − ρ̃), we have (6.6). □

Operator versions of (5.3) and (5.5) are trivial. Thus next we show the operator version of (5.1):

g_f(ρ, σ) ≤ Σ_i c_i g_f(ρ_i, σ_i),   (6.11)

where ρ := Σ_i c_i ρ_i, σ := Σ_i c_i σ_i, Σ_i c_i = 1, and c_i ≥ 0. (6.11) for the case supp σ ⊃ supp ρ is known [6]. If f̂(0) < ∞,

Σ_i c_i g_f(ρ_i, σ_i) = Σ_i c_i g_f(ρ̃_i, σ_i) + f̂(0) Σ_i c_i (ρ_i − ρ̃_i)
= Σ_i c_i g_{f_1}(ρ̃_i, σ_i) + f̂(0) Σ_i c_i ρ_i
≥ g_{f_1}( Σ_i c_i ρ̃_i, Σ_i c_i σ_i ) + f̂(0) Σ_i c_i ρ_i.

Above, since supp σ = span{ ∪_i supp σ_i }, Σ_i c_i ρ̃_i is supported on supp σ. Thus the last expression is well-defined. Also, Σ_i c_i ρ̃_i ≤ Σ_i c_i ρ_i = ρ. Thus, the last expression is bounded from below by

inf { g_{f_1}(ρ_*, σ) + f̂(0) ρ ; ρ ≥ ρ_* ≥ 0, supp σ ⊃ supp ρ_* } = g_{f_1}(ρ̃, σ) + f̂(0) ρ = g_f(ρ, σ),

concluding (6.11).

Lastly, the analogue of (5.2) is

g_f(ρ, X) ≤ g_f(ρ, σ),   X ≥ σ.   (6.12)

This is proved as follows. Since X ≥ σ, C := σ^{1/2} X^{-1/2} satisfies C X^{1/2} = σ^{1/2}, ‖C‖ ≤ 1. If supp X ⊃ supp σ ⊃ supp ρ,

g_f(ρ, σ) = X^{1/2} C† f( d(ρ, σ) ) C X^{1/2}
≥ X^{1/2} f( C† d(ρ, σ) C ) X^{1/2} = X^{1/2} f( X^{-1/2} ρ X^{-1/2} ) X^{1/2} = g_f(ρ, X),

where the inequality in the second line is due to Proposition 10.1. If f̂(0) < ∞ and supp σ ⊉ supp ρ,

g_f(ρ, σ) = g_{f_1}(ρ̃, σ) + f̂(0) ρ ≥ g_{f_1}(ρ̃, X) + f̂(0) ρ
≥ inf { g_{f_1}(ρ_*, X) + f̂(0) ρ ; 0 ≤ ρ_* ≤ ρ, supp X ⊃ supp ρ_* } = g_f(ρ, X). □

Remark 6.4 Here, ρ̃ is not necessarily the largest element of the set

{ ρ_* ; 0 ≤ ρ_* ≤ ρ, supp X ⊃ supp ρ_* }.

Thus, in general, the equality between the second and the third lines does not hold.

D^max_f is closely related to the RLD Fisher metric

J^R_ρ(X, Y) := tr X ρ^{-1} Y,

where X and Y are self-adjoint operators living in the support of ρ, with tr X = tr Y = 0. This quantity plays an important role in quantum statistical estimation theory [14], and is the largest monotone metric on the space of density operators [22].
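A minimal numerical sketch of the RLD metric (variable names are ours): J^R_ρ(X, X) = tr X ρ^{-1} X for a traceless self-adjoint X, together with a monotonicity check under a completely dephasing map, which is positive and trace preserving, so a monotone metric cannot increase under it.

```python
import numpy as np

def rld_metric(rho, X):
    # J^R_rho(X, X) = tr X rho^{-1} X for traceless self-adjoint X
    return np.trace(X @ np.linalg.inv(rho) @ X).real

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
X = np.array([[0.1, 0.05], [0.05, -0.1]])   # self-adjoint, tr X = 0

j_full = rld_metric(rho, X)

# completely dephasing map: keep only the diagonal entries
j_dephased = rld_metric(np.diag(np.diag(rho)), np.diag(np.diag(X)))
```

After dephasing, the metric equals the classical Fisher-type quantity Σ_i X_{ii}²/ρ_{ii}, and it does not exceed `j_full`.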
Also, this quantity is the solution to an infinitesimal version of the reverse test [16]. The triple (Γ, p, v) of a positive trace preserving map Γ from positive measures to self-adjoint operators, a probability distribution p over the finite set X, and a real valued function v over X with Σ_{x∈X} v(x) = 0, is said to be a reverse estimation of {ρ, X} iff

Γ(p) = ρ,  Γ(v) = X.

Then

J^R_ρ(X, X) = inf { Σ_{x∈X} {v(x)}² / p(x) ; (Γ, p, v) is a reverse estimation of {ρ, X} }.

Here, the function minimized is the classical Fisher information, which plays a significant role in point estimation.

This problem reduces to our reverse test problem of {ρ + εX, ρ}, where ε is chosen so that ρ + εX ≥ 0, and f(r) = (1 − r)². For this f, (Γ, {p + εv, p}) is the minimal reverse test of {ρ + εX, ρ}.

Below, we prove

f″(1) J^R_ρ(X, X) = (d²/dε²) D^max_f(ρ + εX ‖ ρ) |_{ε=0} = (d²/dε²) D^max_f(ρ ‖ ρ + εX) |_{ε=0}   (6.13)

when f‴ exists and is uniformly bounded in the sense that

|f‴(x)| < c,  ∃ε > 0, ∀x ∈ (1 − ε, 1 + ε).

(Differentiating (6.3) twice, one can also obtain (6.13).)

The key observation is: the minimal reverse tests of {ρ + εX, ρ} are the same for all ε > 0 with ρ + εX ≥ 0. Therefore, differentiating both sides of

D^max_f(ρ + εX ‖ ρ) = D_f(p + εv ‖ p)

twice, the well-known relation between Fisher information and f-divergence leads to the first identity. The second identity follows from Corollary 6.10 (which will be shown later), which states that the minimal reverse tests of {ρ + εX, ρ} and {ρ, ρ + εX} are identical.

In this section, we show that any optimal reverse test is essentially identical to the minimal reverse test, provided that (F) is satisfied. First, we show some technical lemmas. (They themselves are of interest. We show another application of them in the next section.)

A function f with (F), by Theorem 8.1 of [12], is written as

f(r) = cr + br² + ∫_{(0,∞)} ( r/(1+λ) + ψ_λ(r) ) dμ(λ),   (6.14)

where c is a real number, b ≥ 0, μ is a positive Borel measure with ∫_{(0,∞)} dμ(λ)/(1+λ)² < ∞, and

ψ_λ(r) := − r/(λ + r).

In what follows, we suppose

[0, ∞) ⊂ supp μ ∪ {0},   (6.15)

where supp μ is the set of all points λ having the property that μ(U) > 0 for any open set U containing λ (see Theorem 2.2.1 and Definition 2.2.1 of [21]). r ln r and ±r^α (−1 ≤ α ≤ 2, α ≠ 0, 1) satisfy (6.15) (see Example 8.3, [12]).

Lemma 6.5
Suppose $f$ satisfies (F) and (6.15), and suppose also $D^{\max}_f(\rho\|\sigma) < \infty$. Let $\Lambda$ be a positive trace preserving map. Then,
$$D^{\max}_f(\rho\|\sigma) = D^{\max}_f(\Lambda(\rho)\|\Lambda(\sigma)) \qquad (6.16)$$
implies
$$\Lambda_\sigma(h(d(\tilde\rho,\sigma))) = h(\Lambda_\sigma(d(\tilde\rho,\sigma))). \qquad (6.17)$$
Here, $\Lambda_\sigma$ is the subunital positive map defined by (6.7), and $h$ is an arbitrary function on $[0,\infty)$. Conversely, if (6.17) holds, (6.16) holds for any $f$ with (F). In fact, for any function $h$ on $[0,\infty)$,
$$\operatorname{tr}\sigma\, h(d(\tilde\rho,\sigma)) = \operatorname{tr}\Lambda(\sigma)\, h(d(\Lambda(\tilde\rho),\Lambda(\sigma))). \qquad (6.18)$$

Proof.
By (6.5), (6.16) and $D^{\max}_f(\rho\|\sigma) < \infty$ imply
$$\Lambda(g_f(\rho,\sigma)) = g_f(\Lambda(\rho),\Lambda(\sigma)). \qquad (6.19)$$
First, suppose $\operatorname{supp}\rho \subset \operatorname{supp}\sigma$. Observe, by (6.14),
$$g_f(\rho,\sigma) = c\rho + b\, g_{f_2}(\rho,\sigma) + \int_{(0,\infty)} \Big(\frac{\rho}{1+\lambda} + g_{\psi_\lambda}(\rho,\sigma)\Big)\, \mathrm{d}\mu(\lambda), \qquad (6.20)$$
where $f_2(r) := r^2$. Since $f_2$ and $\psi_\lambda(r) = -\frac{r}{\lambda+r}$ satisfy (F), (6.5), (6.19) and (6.15) lead to
$$\Lambda(g_{\psi_t}(\rho,\sigma)) = g_{\psi_t}(\Lambda(\rho),\Lambda(\sigma)), \quad \forall t > 0.$$
This, by Proposition 10.4, implies
$$\Lambda(g_h(\rho,\sigma)) = g_h(\Lambda(\rho),\Lambda(\sigma)), \qquad (6.21)$$
which, using (6.9), implies (6.17). Next, suppose $\operatorname{supp}\rho \not\subset \operatorname{supp}\sigma$. Then for $D^{\max}_f(\rho\|\sigma) < \infty$ to be true, $\hat f(0) < \infty$ should hold. Therefore, by (6.6),
$$\Lambda(g_f(\tilde\rho,\sigma)) = g_f(\Lambda(\tilde\rho),\Lambda(\sigma)).$$
Therefore, using the parallel argument as above, we have (6.17). The second assertion of the lemma is proved by straightforward computation.
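As an aside, the classical identity behind (6.13) — that the second derivative of an $f$-divergence along a path $p + \varepsilon v$ reproduces $f''(1)$ times the Fisher information $\sum_x v(x)^2/p(x)$ — is easy to check numerically. A minimal sketch, assuming $f(r) = r\ln r$ (so $f''(1)=1$); the concrete distribution and tangent vector are illustrative choices, not from the text:

```python
import numpy as np

def D_f(p, q):
    # classical f-divergence for f(r) = r*log(r): sum_x q(x) f(p(x)/q(x))
    r = p / q
    return float(np.sum(q * r * np.log(r)))

p = np.array([0.2, 0.3, 0.5])
v = np.array([0.10, -0.04, -0.06])   # tangent vector; components sum to zero
assert abs(v.sum()) < 1e-12

eps = 1e-4
# central second difference of eps -> D_f(p + eps*v || p) at eps = 0
second = (D_f(p + eps*v, p) - 2*D_f(p, p) + D_f(p - eps*v, p)) / eps**2
fisher = float(np.sum(v**2 / p))     # classical Fisher information of the path

# f''(1) = 1 for r ln r, so the two quantities should agree
assert abs(second - fisher) < 1e-6
```

The same computation with any other smooth $f$ normalized by $f(1)=0$ differs only by the factor $f''(1)$.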
Lemma 6.6
Suppose $f$ satisfies (F) and (6.15), and suppose also $D^{\max}_f(\rho\|\sigma) < \infty$. Let $\Lambda$ be a positive trace preserving map. Also, let $(\Gamma, \{p,q\})$ and $(\Gamma', \{p',q'\})$ be the minimal reverse tests of $\{\rho,\sigma\}$ and $\{\Lambda(\rho), \Lambda(\sigma)\}$, respectively. Then, (6.16) holds iff
$$\Lambda(\Gamma(\delta_x)) = \Gamma'(\delta_x), \quad \{p,q\} = \{p',q'\}. \qquad (6.22)$$

Proof.
Since ‘if’ is trivial, we prove ‘only if’. Recall that the minimal reverse test is given by
$$\Gamma(\delta_{x_0}) = \frac{1}{\operatorname{tr}(\rho-\tilde\rho)}(\rho-\tilde\rho), \quad \Gamma(\delta_x) = \frac{1}{\operatorname{tr}\sigma P_x}\sqrt{\sigma}\,P_x\sqrt{\sigma}, \quad (x \ne x_0),$$
where $d(\tilde\rho,\sigma) = \sum_x d_x P_x$, with $d_x$ and $P_x$ an eigenvalue and the projection onto its eigenspace, respectively. Let $P'_x := \Lambda_\sigma(P_x)$, and apply (6.17) with
$$h(r) := \begin{cases} 1, & (r = d_x), \\ 0, & \text{otherwise}. \end{cases}$$
Then we have
$$P'_x = \Lambda_\sigma(P_x) = \Lambda_\sigma(h(d(\tilde\rho,\sigma))) = h(\Lambda_\sigma(d(\tilde\rho,\sigma))).$$
Since the eigenvalues of $h(\Lambda_\sigma(d))$ are either 0 or 1, $P'_x$ is a projector. Since
$$d(\Lambda(\tilde\rho),\Lambda(\sigma)) = \Lambda_\sigma(d(\tilde\rho,\sigma)) = \sum_x d_x \Lambda_\sigma(P_x) = \sum_x d_x P'_x,$$
the $P'_x$'s are the projectors onto the eigenspaces of $d'$, and $\operatorname{spec} d = \operatorname{spec} d'$. Therefore, if $x \ne x_0$,
$$\Lambda(\Gamma(\delta_x)) = \frac{1}{\operatorname{tr}\sigma P_x}\Lambda\big(\sqrt{\sigma}\,P_x\sqrt{\sigma}\big) = \frac{1}{\operatorname{tr}\sigma P_x}\sqrt{\Lambda(\sigma)}\,\Lambda_\sigma(P_x)\sqrt{\Lambda(\sigma)} = \frac{1}{\operatorname{tr}\sigma P_x}\sqrt{\Lambda(\sigma)}\,P'_x\sqrt{\Lambda(\sigma)} = \Gamma'(\delta_x).$$
If $x = x_0$,
$$\Gamma'(\delta_{x_0}) = \frac{1}{\operatorname{tr}\Lambda(\rho-\tilde\rho)}\Lambda(\rho-\tilde\rho) = \Lambda\Big(\frac{1}{\operatorname{tr}(\rho-\tilde\rho)}(\rho-\tilde\rho)\Big) = \Lambda(\Gamma(\delta_{x_0})).$$
Therefore, if $(\Gamma', \{p',q'\})$ is the minimal reverse test of $\{\Lambda(\rho),\Lambda(\sigma)\}$, $\Gamma' = \Lambda\circ\Gamma$. Having specified the map $\Gamma'$, next we specify $\{p',q'\}$. For $(\Lambda\circ\Gamma, \{p',q'\})$ to be a reverse test of $\{\Lambda(\rho),\Lambda(\sigma)\}$,
$$\sum_{x\ne x_0} q'(x)\,\Lambda\circ\Gamma(\delta_x) = \Lambda(\sigma) = \sum_{x\ne x_0} q(x)\,\Lambda\circ\Gamma(\delta_x).$$
Thus we have to have, for all $x \ne x_0$,
$$\sum_x q'(x)\sqrt{\Lambda(\sigma)}\,P'_x\sqrt{\Lambda(\sigma)} = \sum_x q(x)\sqrt{\Lambda(\sigma)}\,P'_x\sqrt{\Lambda(\sigma)}.$$
Since $P'_x$ is supported on $\operatorname{supp} d' = \operatorname{supp}\Lambda(\sigma)$, this is equivalent to
$$\sum_x q'(x)P'_x = \sum_x q(x)P'_x.$$
Since the $P'_x$'s are orthogonal projectors, we have $q' = q$. In the same way, we can prove that $p'(x) = p(x)$ for all $x \ne x_0$. Then by the trace preserving nature of $\Lambda$, obviously $p'(x_0) = p(x_0)$, concluding $p' = p$. Thus we have the assertion.

In the following, we extend the notion of the minimal reverse test to a pair $\{p,q\}$ of positive measures, by identifying $p$ with the diagonal matrix $\sum_x p(x)|e_x\rangle\langle e_x|$, where $\{|e_x\rangle\}$ is a CONS.
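The explicit form of the minimal reverse test recalled in the proof above can be sketched numerically. A minimal illustration, assuming $\sigma$ invertible and $\operatorname{supp}\rho \subset \operatorname{supp}\sigma$ (so that $\tilde\rho = \rho$); the concrete matrices are arbitrary examples, not from the text:

```python
import numpy as np

rho = np.array([[0.6, 0.2], [0.2, 0.4]])
sigma = np.array([[0.5, 0.1], [0.1, 0.5]])

# sigma^{+-1/2} via the spectral decomposition
w, V = np.linalg.eigh(sigma)
sqrt_s = V @ np.diag(np.sqrt(w)) @ V.T
isqrt_s = V @ np.diag(w ** -0.5) @ V.T

# d(rho, sigma) = sigma^{-1/2} rho sigma^{-1/2} = sum_x d_x P_x
d = isqrt_s @ rho @ isqrt_s
dvals, U = np.linalg.eigh(d)
P = [np.outer(U[:, i], U[:, i]) for i in range(len(dvals))]

# Gamma(delta_x) = sqrt(sigma) P_x sqrt(sigma) / tr(sigma P_x),
# with q(x) = tr(sigma P_x) and p(x) = d_x q(x)
q = np.array([np.trace(sigma @ Px) for Px in P])
p = dvals * q
Gamma = [sqrt_s @ Px @ sqrt_s / qx for Px, qx in zip(P, q)]

# the reverse test reproduces the pair {rho, sigma}
assert np.allclose(sum(qx * G for qx, G in zip(q, Gamma)), sigma)
assert np.allclose(sum(px * G for px, G in zip(p, Gamma)), rho)
assert abs(p.sum() - 1) < 1e-12 and abs(q.sum() - 1) < 1e-12
```

The reconstruction identities follow from $\sum_x P_x = \mathbf{1}$ and $\sum_x d_x P_x = d$, so they hold for any invertible $\sigma$ in this sketch.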
Let $(\Upsilon, \{p_0, q_0\})$ be the minimal reverse test of $\{p,q\}$, where $\{p_0, q_0\}$ are positive measures over the finite set $\mathcal{Y}$. Then $\Upsilon$ is in fact a stochastic map, but at the same time it is viewed as a positive trace preserving map sending diagonal density matrices to diagonal density matrices. With this correspondence, the equivalent $\tilde p$ of $\tilde\rho$ (see (4.5)) is in fact the restriction of $p$ to $\operatorname{supp} q$, and
$$d(\tilde p, q) = \sum_{x\in\operatorname{supp} q} \frac{p(x)}{q(x)}\,|e_x\rangle\langle e_x| = \sum_{y\in\mathcal{Y}\setminus\{y_0\}} r_y P_y, \quad \operatorname{supp} P_y = \operatorname{span}\{|e_x\rangle\,;\,p(x)/q(x) = r_y\}.$$
Viewing $\Upsilon$ as a stochastic map, $y$ ($\ne y_0$) is mapped to $x$ iff $p(x)/q(x) = r_y$, and $y_0$ is mapped to $x \in (\operatorname{supp} q)^c$. (The detailed form of the transition probability is not relevant now.)

Lemma 6.7 Let $(\Upsilon, \{p_0, q_0\})$ be the minimal reverse test of $\{p,q\}$, where $\{p_0, q_0\}$ are positive measures over the finite set $\mathcal{Y}$. Then, there is a positive trace preserving map $\Upsilon^{-1}$ that inverts $\Upsilon$:
$$\Upsilon^{-1}(p) = p_0, \quad \Upsilon^{-1}(q) = q_0.$$
In addition, $\Upsilon^{-1}$ is deterministic. Therefore, $\Upsilon^{-1}\circ\Upsilon(\delta_y) = \delta_y$.

Proof. $\Upsilon^{-1}$ corresponds to the following deterministic map from $\mathcal{X}$ to $\mathcal{Y}$: $x \in \operatorname{supp} q$ is mapped to $y$ iff $p(x)/q(x) = r_y$, and $x \in (\operatorname{supp} q)^c$ is mapped to $y_0$.

Remark 6.8
In the statistician's terms, $r_y$ is a likelihood ratio, and thus $y$ is a minimal sufficient statistic of the family $\{p,q\}$ [25]. Roughly, this means that $y$ contains all the information about the family $\{p,q\}$, and is the smallest among the statistics having this property. Thus, $\{p_0, q_0\}$ is a kind of "compression" of $\{p,q\}$. In fact, the map from $\{p,q\}$ to $\{p_0, q_0\}$ is deterministic, while its inverse is noisy. Lemmas 6.6 and 6.7 indicate that the optimal reverse test is essentially unique.
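The compression described in this remark can be illustrated classically: merging outcomes with equal likelihood ratio $p(x)/q(x)$ is deterministic and loses no $f$-divergence, for any $f$. A small sketch (the distributions and the choice $f(r)=r\ln r$ are illustrative assumptions):

```python
import numpy as np
from collections import defaultdict

p = np.array([0.10, 0.20, 0.30, 0.40])
q = np.array([0.05, 0.10, 0.60, 0.25])

# deterministic map x -> y: merge outcomes sharing the same ratio p(x)/q(x)
groups = defaultdict(list)
for x in range(len(p)):
    groups[round(p[x] / q[x], 12)].append(x)

p0 = np.array([sum(p[x] for x in g) for g in groups.values()])
q0 = np.array([sum(q[x] for x in g) for g in groups.values()])

def D_f(a, b):
    # classical f-divergence with f(r) = r*log(r) (an example choice)
    r = a / b
    return float(np.sum(b * r * np.log(r)))

# sufficiency: the compression is strict here (x = 0, 1 share the ratio 2),
# yet no f-divergence is lost
assert len(p0) < len(p)
assert abs(D_f(p, q) - D_f(p0, q0)) < 1e-12
```

Invariance is exact because on each merged cell the ratio is constant, so $\sum_{x\in g} q(x) f(p(x)/q(x)) = q_0(y) f(p_0(y)/q_0(y))$.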
Theorem 6.9
Suppose $D^{\max}_f(\rho\|\sigma) < \infty$, where $f$ is a function with (F), and $\mu$ defined by (6.14) satisfies (6.15). Let $(\Gamma, \{p,q\})$ be an optimal reverse test:
$$D^{\max}_f(\rho\|\sigma) = D_f(p\|q). \qquad (6.23)$$
If $(\Gamma_0, \{p_0, q_0\})$ is the minimal reverse test of $\{\rho,\sigma\}$, there are a CPTP map $\Upsilon$ and its inverse $\Upsilon^{-1}$ with
$$\{p,q\} \;\underset{\Upsilon^{-1}}{\overset{\Upsilon}{\rightleftarrows}}\; \{p_0, q_0\}, \qquad (6.24)$$
and $\Upsilon^{-1}$ is deterministic: $\Upsilon^{-1}\circ\Upsilon(\delta_y) = \delta_y$. Therefore,
$$\Gamma = \Gamma_0\circ\Upsilon^{-1}. \qquad (6.25)$$
Before proving this, let us see its implication. (6.25) intuitively means that any optimal reverse test $\Gamma$ differs from the minimal one only in its classical preprocessing ("essential uniqueness").

Proof.
Let $(\Upsilon, \{p_1, q_1\})$ be the minimal reverse test of $\{p,q\}$. Then, taking recourse to Lemma 6.6, we have $\{p_1, q_1\} = \{p_0, q_0\}$. Therefore,
$$p = \Upsilon(p_0), \quad q = \Upsilon(q_0).$$
Also, by Lemma 6.7, there is a positive trace preserving map $\Upsilon^{-1}$ with
$$p_0 = \Upsilon^{-1}(p), \quad q_0 = \Upsilon^{-1}(q), \qquad (6.26)$$
and $\Upsilon^{-1}\circ\Upsilon(\delta_y) = \delta_y$.

The following simple statement is not easy to prove directly, but quite easy if the theorem is given.

Corollary 6.10 The minimal reverse tests of $\{\rho,\sigma\}$ and $\{\sigma,\rho\}$ are identical.

Proof.
Let us consider the operator convex function $f(r) = -\sqrt{r}$, which satisfies (F) and (6.15) (Example 8.3, [12]). Since $\hat f$ as of (2.4) satisfies $\hat f(r) = -\sqrt{r} = f(r)$, by (2.6) we have
$$D^{\max}_f(\sigma\|\rho) = D^{\max}_{\hat f}(\rho\|\sigma) = D^{\max}_f(\rho\|\sigma).$$
Thus the minimal reverse test $(\Gamma, \{p,q\})$ of $\{\sigma,\rho\}$ also achieves $D^{\max}_f(\rho\|\sigma)$. Therefore, by Theorem 6.9, $\{p_0, q_0\} = \{\Upsilon_1^{-1}(p), \Upsilon_1^{-1}(q)\}$, where $(\Gamma_0, \{p_0, q_0\})$ is the minimal reverse test of $\{\rho,\sigma\}$, and $\Upsilon_1^{-1}$ is a deterministic map. Exchanging the roles of $\{\rho,\sigma\}$ and $\{\sigma,\rho\}$, there is a deterministic map $\Upsilon_2^{-1}$ with
$$\{p,q\} = \big\{\Upsilon_2^{-1}(p_0), \Upsilon_2^{-1}(q_0)\big\}.$$
Therefore,
$$\Gamma = \Gamma_0\circ\Upsilon_1^{-1}, \quad \Gamma_0 = \Gamma\circ\Upsilon_2^{-1}.$$
Thus we have the assertion.
Corollary 6.11
Suppose $D^{\max}_f(\rho\|\sigma) < \infty$, where $f$ is a function with (F), and $\mu$ defined by (6.14) satisfies (6.15). Then, if there is a measurement $M$ taking values in the finite set $\mathcal{Z}$ such that
$$D^{\max}_f(\rho\|\sigma) = D_f\big(P^M_\rho\,\big\|\,P^M_\sigma\big),$$
$\rho$ and $\sigma$ commute.

Intuitively, this result seems trivial: it is not possible to retrieve classical information embedded in quantum states perfectly, unless the quantum states are in fact commutative (classical). However, as turns out later, this result is generally not true if $f$ is not operator convex. A counterexample is $f(r) = |1-r|$, which corresponds to the total variation distance, one of the most commonly used distance measures.

Proof.
$\{P^M_\rho, P^M_\sigma\}$ is a pair of positive measures over the finite set $\mathcal{Z}$. Let $(\Gamma, \{p,q\})$ and $(\Upsilon, \{p_1, q_1\})$ be the minimal reverse tests of $\{\rho,\sigma\}$ and $\{P^M_\rho, P^M_\sigma\}$, respectively. Apply Lemma 6.6, considering the measurement $M$ as a positive linear map. Then we have $\{p_1, q_1\} = \{p,q\}$ and
$$P^M_{\Gamma(\delta_x)} = \Upsilon(\delta_x). \qquad (6.27)$$
Since $\{P^M_\rho, P^M_\sigma\}$ are probability distributions, by Lemma 6.7 there is a positive trace preserving map $\Upsilon^{-1}$ with $\Upsilon^{-1}(\Upsilon(\delta_x)) = \delta_x$. Composing them, we obtain
$$\Upsilon^{-1}\big(P^M_{\Gamma(\delta_x)}\big) = \delta_x.$$
The composition of the measurement $M$ followed by the data processing $\Upsilon^{-1}$ can be viewed as a measurement, to which a POVM $\{\tilde M_x\}$ corresponds. Then, this can be rewritten as
$$\operatorname{tr}\tilde M_x \Gamma(\delta_x) = 1, \quad x\in\mathcal{X}.$$
Since $\operatorname{tr}\Gamma(\delta_x) = 1$, this means that $\Gamma(\delta_x)\Gamma(\delta_{x'}) = 0$ ($x'\ne x$). Therefore, $\rho = \sum_{x\in\mathcal{X}} p(x)\Gamma(\delta_x)$ and $\sigma = \sum_{x\in\mathcal{X}} q(x)\Gamma(\delta_x)$ commute.

Proposition 6.12 Suppose $f$ satisfies (F) and
$$\exists c,\ \exists\varepsilon_0 > 0:\ |f'''(x)| < c, \quad \forall x\in(1-\varepsilon_0, 1+\varepsilon_0).$$
Let $\rho_\varepsilon := \rho + \varepsilon X > 0$. Then, if there is a measurement $M_\varepsilon$ for each $\varepsilon$ taking values in the finite set $\mathcal{Z}$ such that
$$D^{\max}_f(\rho\,\|\,\rho_\varepsilon) = D_f\big(P^{M_\varepsilon}_\rho\,\big\|\,P^{M_\varepsilon}_{\rho_\varepsilon}\big),$$
then $\rho$ and $X$ commute.

Proof.
By (6.13) and by Section 9 of [18], this is equivalent to the existence of a measurement $M$ with
$$J^R_\rho(X,X) = J_{p^M}\big(v^M, v^M\big),$$
where the RHS is the Fisher information of $p^M := P^M_\rho$, and $v^M := \lim_{\varepsilon\downarrow 0}\frac{1}{\varepsilon}\big(P^M_{\rho+\varepsilon X} - P^M_\rho\big)$. But this is impossible unless $\rho$ and $X$ commute (see, for example, [15]).

Let $\{|\hat\varphi_x\rangle\,;\,x\in\mathcal{X}\}$ be a family of linearly independent state vectors. Also let $\{\tau_x\,;\,x\in\mathcal{X}\}$ be a family of density operators. The necessary and sufficient condition for the existence of a CPTP map $\Lambda$ with
$$\Lambda(\tau_x) = |\hat\varphi_x\rangle\langle\hat\varphi_x|, \quad\forall x\in\mathcal{X}, \qquad (6.28)$$
has been studied by several authors. Especially, if $\tau_x = |\varphi_x\rangle\langle\varphi_x|$, it is expressed in the following very simple form:
$$\exists A\ge 0:\ \langle\varphi_x|\varphi_{x'}\rangle = A_{x,x'}\langle\hat\varphi_x|\hat\varphi_{x'}\rangle$$
(see [5][26]). Here we show this is equivalent to
$$\Lambda(\rho) = \hat\rho, \quad \Lambda(\sigma) = \hat\sigma, \qquad (6.29)$$
where
$$\rho := \sum_x p(x)\tau_x,\quad \sigma := \sum_x q(x)\tau_x,\quad \hat\rho := \sum_x p(x)|\hat\varphi_x\rangle\langle\hat\varphi_x|,\quad \hat\sigma := \sum_x q(x)|\hat\varphi_x\rangle\langle\hat\varphi_x|,$$
and $\{p,q\}$ is a pair of probability distributions over $\mathcal{X}$ such that, with $r_x := p(x)/q(x)$,
$$0 < r_x < \infty, \quad r_x \ne r_{x'}\ (x\ne x'). \qquad (6.30)$$
That (6.28) implies (6.29) is trivial. To show the opposite implication, we take recourse to Lemma 6.6.

First, observe that $(\Gamma, \{p,q\})$ and $(\hat\Gamma, \{p,q\})$, where $\Gamma(\delta_x) := \tau_x$ and $\hat\Gamma(\delta_x) := |\hat\varphi_x\rangle\langle\hat\varphi_x|$, are reverse tests of $\{\rho,\sigma\}$ and $\{\hat\rho,\hat\sigma\}$, respectively. In addition, the latter one is minimal, since we had supposed that the $|\hat\varphi_x\rangle$'s are linearly independent and that $\{p,q\}$ satisfies (6.30). (Compute the minimal reverse test in the following manner. Define $N := \sum_x |\hat\varphi_x\rangle\langle e_x|$, $D_p := \sum_x p(x)|e_x\rangle\langle e_x|$, $D_q := \sum_x q(x)|e_x\rangle\langle e_x|$. Then, $\hat\sigma = N D_q N^\dagger$. Therefore, there is a unitary $U$ with
$$\hat\sigma^{1/2} = N D_q^{1/2} U.$$
Therefore,
$$\hat\sigma^{-1/2}\hat\rho\,\hat\sigma^{-1/2} = \sum_x r_x\, U^\dagger|e_x\rangle\langle e_x|U.$$
Therefore, the minimal reverse test maps $\delta_x$ to the constant multiple of
$$\hat\sigma^{1/2}U^\dagger|e_x\rangle\langle e_x|U\hat\sigma^{1/2} = q(x)|\hat\varphi_x\rangle\langle\hat\varphi_x|.$$
) Therefore,
$$D_f(p\|q) = D^{\max}_f(\hat\rho\|\hat\sigma) = D^{\max}_f(\Lambda(\rho)\|\Lambda(\sigma)) \le D^{\max}_f(\rho\|\sigma) \le D_f(p\|q),$$
indicating
$$D^{\max}_f(\rho\|\sigma) = D^{\max}_f(\hat\rho\|\hat\sigma) = D_f(p\|q).$$
By Theorem 6.9, the minimal reverse test of $\{\rho,\sigma\}$ should be essentially identical to $(\Gamma, \{p,q\})$. But by the assumption (6.30), $(\Gamma, \{p,q\})$ has to be the minimal reverse test. Therefore, by Lemma 6.6, (6.29) implies (6.28).

From this section on, we again remove the assumption of operator convexity and $f(0) = 0$, and come back to our initial assumption (FC). To start, we treat the case where one of the arguments is rank-1. Suppose $\sigma$ is rank-1 (the other case reduces to this one by replacing $f$ by $\hat f$), and apply (4.6). Since $\tilde\rho$ is a constant multiple of $\sigma$, we have:
$$D^{\max}_f(\rho\|\sigma) = \operatorname{tr}\sigma f\big(\sigma^{-1/2}\tilde\rho\,\sigma^{-1/2}\big) + \hat f(0)\operatorname{tr}(\rho-\tilde\rho). \qquad (7.1)$$
Though this coincides with (6.3), it holds irrespective of the assumption of operator convexity. Especially, if $\rho$ is also rank-1, $\tilde\rho = 0$, thus
$$D^{\max}_f(\rho\|\sigma) = f(0)\operatorname{tr}\sigma + \hat f(0)\operatorname{tr}\rho,$$
where $f(0)$ and/or $\hat f(0)$ may be $\infty$.

Total variation distance
The divergence corresponding to $f(r) = |1-r|$,
$$D_{|1-r|}(p\|q) = \|p-q\|_1,$$
is called the total variation distance. Its common quantum version is
$$\|\rho-\sigma\|_1 = \sup_M \big\|P^M_\rho - P^M_\sigma\big\|_1,$$
where $P^M_\rho$ is the distribution of the outcome of the measurement $M$ under $\rho$. This quantum version is in fact the smallest of all the quantum versions satisfying (D1') and (D2):
$$D^Q_{|1-r|}(\rho\|\sigma) \ge \|\rho-\sigma\|_1. \qquad (8.1)$$
Observe that (D1') and (D2) imply
$$D^Q_{|1-r|}(\rho\|\sigma) \ge \big\|P^M_\rho - P^M_\sigma\big\|_1.$$
Maximization of the RHS over $M$ leads to (8.1).

In this section we study $D^{\max}_{|1-r|}(\rho\|\sigma)$. Given a reverse test $(\Gamma, \{p,q\})$ of $\{\rho,\sigma\}$, we define $(\Gamma', \{p',q'\})$, where $\{p',q'\}$ are probability distributions on $\{0,1,2\}$:
$$\Gamma'(\delta_0) := \frac{1}{\operatorname{tr} A}A, \quad \Gamma'(\delta_1) := \frac{\rho-A}{\operatorname{tr}(\rho-A)}, \quad \Gamma'(\delta_2) := \frac{\sigma-A}{\operatorname{tr}(\sigma-A)},$$
$$p'(0) := \operatorname{tr} A, \quad p'(1) := \operatorname{tr}(\rho-A), \quad p'(2) := 0,$$
$$q'(0) := \operatorname{tr} A, \quad q'(1) := 0, \quad q'(2) := \operatorname{tr}(\sigma-A), \qquad (8.2)$$
where
$$A := \sum_{x\in\mathcal{X}} \min\{p(x), q(x)\}\,\Gamma(\delta_x).$$
Then $(\Gamma', \{p',q'\})$ is a reverse test of $\{\rho,\sigma\}$ with $\|p'-q'\|_1 = \|p-q\|_1$. Intuitively, $\Gamma'(\delta_0)$ takes care of the common part of the two states, and $\Gamma'(\delta_1)$ and $\Gamma'(\delta_2)$ compensate the remainder. Therefore, without loss of generality, we may restrict reverse tests to those in the form of (8.2). Therefore:
$$D^{\max}_{|1-r|}(\rho\|\sigma) = \inf\{\operatorname{tr}(\rho+\sigma-2A)\,;\,A\ge 0,\ \rho\ge A,\ \sigma\ge A\}. \qquad (8.3)$$

In this subsection and the next, under the assumption that $\operatorname{tr}\rho = \operatorname{tr}\sigma = 1$, we study conditions for
$$D^{\max}_{|1-r|}(\rho\|\sigma) = \|\rho-\sigma\|_1. \qquad (8.4)$$
This identity implies uniqueness of the quantum version of the statistical distance, and also indicates that the classical total variation distance embedded into quantum states can be completely recovered by measurements. It turns out that the size of the set of all $\{\rho,\sigma\}$'s satisfying (8.4) is substantial.
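For commuting states the program (8.3) is transparent: with $\rho$, $\sigma$ diagonal, $A$ can be taken as the entrywise minimum, which is feasible and attains the lower bound (8.1), so the infimum equals the classical total variation distance. A numeric sketch (the distributions are example data):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.1, 0.7])
rho, sigma = np.diag(p), np.diag(q)

# feasible point of (8.3): A = the "common part" of rho and sigma
A = np.diag(np.minimum(p, q))
for B in (A, rho - A, sigma - A):          # A >= 0, rho >= A, sigma >= A
    assert np.all(np.linalg.eigvalsh(B) > -1e-12)

objective = np.trace(rho + sigma - 2 * A)  # value of (8.3) at this A
tv = np.abs(p - q).sum()                   # ||p - q||_1 = ||rho - sigma||_1
assert abs(objective - tv) < 1e-12
```

Since (8.1) says the optimal value can never go below $\|\rho-\sigma\|_1$, this feasible point is optimal in the commuting case.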
This is in contrast with the case of operator convex functions, where (8.4) holds almost exclusively for commutative pairs of states (see Subsection 6.6). Dropping the constraint $A\ge 0$,
$$D^{\max}_{|1-r|}(\rho\|\sigma) \ge \inf\{\operatorname{tr}(\rho+\sigma-2A)\,;\,\rho\ge A,\ \sigma\ge A\}$$
$$= \inf\{2\operatorname{tr}(\rho-A)\,;\,\rho\ge A,\ \sigma\ge A\}$$
$$= \inf\{2\operatorname{tr}(\rho-A)\,;\,\rho-A\ge 0,\ \rho-A\ge\rho-\sigma\}$$
$$= 2\operatorname{tr}[\rho-\sigma]_+ = \|\rho-\sigma\|_1.$$
Here, the infimum in the third line is achieved iff $\rho - A = [\rho-\sigma]_+$. ($[X]_+$ is the positive part of the self-adjoint operator $X$.) Therefore, (8.4) holds iff
$$A = \rho - [\rho-\sigma]_+ = \tfrac{1}{2}\big(\rho+\sigma-|\rho-\sigma|\big) \ge 0. \qquad (8.5)$$
(Here, $|X| := \sqrt{X^\dagger X}$.) Another necessary and sufficient condition is the existence of $A$, $\Delta_1$, $\Delta_2 \ge 0$ with
$$\rho = A + \Delta_1, \quad \sigma = A + \Delta_2, \qquad (8.6)$$
$$\Delta_1\Delta_2 = 0. \qquad (8.7)$$
To see this, observe
$$\|\Delta_1-\Delta_2\|_1 = \|\rho-\sigma\|_1 \le D^{\max}_{|1-r|}(\rho\|\sigma) = \min\{\operatorname{tr}\Delta_1 + \operatorname{tr}\Delta_2\,;\,(8.6),\ A\ge 0,\ \Delta_1\ge 0,\ \Delta_2\ge 0\}.$$
For (8.4) to hold, the existence of $\Delta_1$, $\Delta_2$ with $\operatorname{tr}\Delta_1 + \operatorname{tr}\Delta_2 = \|\Delta_1-\Delta_2\|_1$ is necessary and sufficient. Thus $\Delta_1\Delta_2 = 0$.

Of course, in general, (8.5) is not true. For example, if $\rho$ is a pure state and $\rho \ne c\sigma$, it may fail. (Let $f = |1-r|$ in the formula (7.1); then what we obtain is very much different from $\|\rho-\sigma\|_1$.) However, if $\rho$ and $\sigma$ are so close that
$$\big\||\rho-\sigma|\big\|_\infty \le \text{the minimum eigenvalue of }\rho+\sigma, \qquad (8.8)$$
it is true. Another sufficient condition is
$$(\rho-\sigma)^2 = |\rho-\sigma|^2 \le (\rho+\sigma)^2.$$
To see this is sufficient, take the square root of both sides of the inequality: then we obtain (8.5). (Recall $\sqrt{\cdot}$ is operator monotone. This condition is not necessary, since $r^2$ is not operator monotone.) Rearranging the terms, we have
$$\rho\sigma + \sigma\rho \ge 0. \qquad (8.9)$$
By (8.8), $D^{\max}_{|1-r|}(\rho\,\|\,\rho+\varepsilon X) = \|\rho-(\rho+\varepsilon X)\|_1$ for all small $\varepsilon > 0$ and any $X$. On the other hand, if $f$ is operator convex, Proposition 6.12 indicates that $D^{\max}_f(\rho\,\|\,\rho+\varepsilon X) = D^{\min}_f(\rho\,\|\,\rho+\varepsilon X)$ for small $\varepsilon > 0$ only if $\rho$ and $X$ commute.
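The criterion (8.5), and the sufficient condition $\rho\sigma+\sigma\rho\ge 0$ of (8.9), are easy to probe numerically for qubits. A sketch; the density matrices below are arbitrary illustrations, not examples from the text:

```python
import numpy as np

def abs_op(X):
    # |X| = sqrt(X^2) for self-adjoint X, via the spectral decomposition
    w, V = np.linalg.eigh(X)
    return V @ np.diag(np.abs(w)) @ V.conj().T

def satisfies_85(rho, sigma, tol=1e-12):
    # (8.5): A = (rho + sigma - |rho - sigma|)/2 must be positive semidefinite
    A = 0.5 * (rho + sigma - abs_op(rho - sigma))
    return bool(np.all(np.linalg.eigvalsh(A) > -tol))

rho = np.array([[0.7, 0.1], [0.1, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

# (8.9): rho sigma + sigma rho >= 0 holds for this pair ...
assert np.all(np.linalg.eigvalsh(rho @ sigma + sigma @ rho) > -1e-12)
# ... and, consistently with the text, (8.5) then holds as well
assert satisfies_85(rho, sigma)

# a pure rho with a non-commuting sigma violates (8.5)
rho_pure = np.diag([1.0, 0.0])
sigma2 = np.array([[0.5, 0.3], [0.3, 0.5]])
assert not satisfies_85(rho_pure, sigma2)
```

The second pair shows that the failure of (8.5) is a genuinely non-commutative phenomenon: for commuting matrices the candidate $A$ is simply the entrywise minimum and is always positive.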
In this subsection, we assume $\dim\mathcal{H} = 2$ and $\operatorname{tr}\rho = \operatorname{tr}\sigma = 1$, and compute the set $\{\sigma\,;\,(8.4)\}$ for each fixed $\rho$, using the necessary and sufficient condition given by (8.6) and (8.7). As it turns out, this set is a spheroid, with focal points $v_\rho$ and $-v_\rho$, touching the surface of the Bloch sphere at each end of its longest axis. Since $\operatorname{tr}\rho = \operatorname{tr}\sigma = 1$, $c := \operatorname{tr}\Delta_1 = \operatorname{tr}\Delta_2 = 1 - \operatorname{tr} A$, and $0\le c\le 1$. Let $v_\rho$, $v_\sigma$, $u_1$, $u_2$, and $u_A$ be the Bloch vectors of $\rho$, $\sigma$, $\frac{1}{c}\Delta_1$, $\frac{1}{c}\Delta_2$, and $\frac{1}{1-c}A$, respectively. Also, (8.7) holds iff $\Delta_1$ and $\Delta_2$ are rank-1 and $u_2 = -u_1$. Therefore, by (8.6),
$$v_\rho = cu_1 + (1-c)u_A, \quad v_\sigma = -cu_1 + (1-c)u_A.$$
Therefore,
$$v_\sigma - v_\rho = -2cu_1, \quad v_\sigma - (-v_\rho) = 2(1-c)u_A.$$
Let $\|\cdot\|$ denote the Euclidean norm in $\mathbb{R}^3$; then
$$\|v_\sigma - v_\rho\| + \|v_\sigma - (-v_\rho)\| = 2\big(c\|u_1\| + (1-c)\|u_A\|\big) \le 2.$$
The set $\{\sigma\,;\,(8.4)\}$ is fairly large. For example, if the largest eigenvalue of $\rho$ is $\le 0.85$, this set occupies more than half of the volume of the Bloch sphere.

If
$$\rho = \begin{pmatrix} a & c \\ \bar c & b\end{pmatrix}, \quad \sigma = \begin{pmatrix} a & -c \\ -\bar c & b \end{pmatrix}, \quad (a\ge b),$$
the minimization problem (8.3) is solved explicitly. With $Z := \operatorname{diag}(1,-1)$, $\sigma = Z\rho Z^\dagger$ and $\rho = Z\sigma Z^\dagger$. Thus, if $A$ satisfies the constraints of (8.3), so does $\frac{1}{2}(ZAZ^\dagger + A)$, and $\operatorname{tr} A = \operatorname{tr}\frac{1}{2}(ZAZ^\dagger + A)$. Therefore, without loss of generality, we may suppose $A$ is diagonal. After some elementary analysis, the optimal $A$ turns out to be
$$A = \begin{cases} \operatorname{diag}\big(a-|c|,\ b-|c|\big), & (a\ge b\ge |c|), \\ \operatorname{diag}\big(a - |c|^2 b^{-1},\ 0\big), & (a\ge |c|\ge b), \end{cases}$$
and we have
$$D^{\max}_{|1-r|}(\rho\|\sigma) = \begin{cases} 4|c| = \|\rho-\sigma\|_1, & (a\ge b\ge |c|), \\ 2\big(b + |c|^2 b^{-1}\big), & (a\ge |c|\ge b). \end{cases}$$

Dual Representation and Continuity
In this section we give the dual of (3.1), i.e., a representation of $D^{\max}_f$ as the maximization of a linear functional:
$$D^{\max}_f(\rho\|\sigma) = \sup_{(W_1,W_2)\in\mathcal{W}^{\max}_f(\mathcal{H})}\{\operatorname{tr}\rho W_1 + \operatorname{tr}\sigma W_2\}, \qquad (9.1)$$
where
$$\mathcal{W}^{\max}_f(\mathcal{H}) := \{(W_1,W_2)\,;\,sW_1 + tW_2 - g_f(s,t)\mathbf{1} \le 0,\ \forall s,t\in[0,1]\}$$
$$= \{(W_1,W_2)\,;\,f(r)\mathbf{1} - rW_1 - W_2 \ge 0,\ \forall r\ge 0\}.$$
(To see the equality in the second line, recall $g_f$ is positively homogeneous and continuous.)

Let $D^Q_f$ be a lower semicontinuous, proper, positively homogeneous, and convex function over pairs of positive operators. Then by Corollary 13.5.1 of [23], it should be of the form
$$D^Q_f(\rho\|\sigma) = \sup_{(W_1,W_2)\in\mathcal{W}^Q_f}\operatorname{tr}(\rho W_1 + \sigma W_2),$$
where $\mathcal{W}^Q_f$ is, without loss of generality, convex and unbounded from below. In addition, suppose $D^Q_f$ satisfies (D1') and (D2). Since $D^Q_f$ satisfies (D2) with $p(1) = s$, $q(1) = t$, $p(x) = q(x) = 0$ ($x\ne 1$),
$$D^Q_f\big(s|e\rangle\langle e|\,\big\|\,t|e\rangle\langle e|\big) = \sup_{(W_1,W_2)\in\mathcal{W}^Q_f}\big(s\langle e|W_1|e\rangle + t\langle e|W_2|e\rangle\big) = g_f(s,t).$$
Since the second identity is true for all $s, t \ge 0$ and all $|e\rangle$ with $\|e\| = 1$, each $(W_1,W_2)\in\mathcal{W}^Q_f$ satisfies $(W_1,W_2)\in\mathcal{W}^{\max}_f$. Therefore, $\mathcal{W}^Q_f \subset \mathcal{W}^{\max}_f$. Also, the RHS of (9.1) satisfies (D2). Therefore, the RHS of (9.1) is the largest of all proper, positively homogeneous, convex, lower semicontinuous functionals with (D2). As is easily verified, it also satisfies (D1) and (D1'). On the other hand, $D^{\max}_f$ is the largest of all functionals with (D1') and (D2) (Lemma 3.1), and turns out to be proper, positively homogeneous, convex, and lower semicontinuous. Therefore:
$$\operatorname{cl} D^{\max}_f(\rho\|\sigma) = \sup_{(W_1,W_2)\in\mathcal{W}^{\max}_f(\mathcal{H})}\{\operatorname{tr}\rho W_1 + \operatorname{tr}\sigma W_2\}. \qquad (9.2)$$

Remark 9.1
From the above argument, if $f$ satisfies (FC), $D^{\max}_f$ is the largest of all $D^Q_f$'s which are lower semicontinuous, proper, positively homogeneous, convex, and satisfy (D2).

Next, we show that $D^{\max}_f$ is lower semicontinuous. It suffices to show this on
$$\mathcal{D} := \{(\rho,\sigma)\,;\,\rho\ge 0,\ \sigma\ge 0,\ \operatorname{tr}\rho\le 1,\ \operatorname{tr}\sigma\le 1\},$$
by $D^{\max}_f(\lambda\rho\|\lambda\sigma) = \lambda D^{\max}_f(\rho\|\sigma)$, $\forall\lambda>0$. Without loss of generality, a reverse test $(\Gamma,\{p,q\})$ satisfies
$$\sum_{x\in\mathcal{X}}q(x)\le 1, \quad \sum_{x\in\mathcal{X}}p(x)\le 1.$$
Also, by Lemma 4.1, we may suppose $\{p,q\}$ are over $\mathcal{X}$ with $|\mathcal{X}|\le(\dim\mathcal{H})^2+3$. The set of all such reverse tests $\mathcal{T} = \{\tau = (\Gamma,\{p,q\})\}$ can be identified with a compact subset of a finite dimensional real vector space. Define the maps $F_1(\tau) := \{\Gamma(p),\Gamma(q)\}$ and $F_2(\tau) := D_f(p\|q)$, and let
$$\mathcal{U} := \{\upsilon = (\upsilon_1,\upsilon_2)\,;\,\upsilon_1 = F_1(\tau),\ \upsilon_2\ge F_2(\tau),\ \tau\in\mathcal{T}\}.$$

Lemma 9.2
Suppose (FC) is satisfied. Then the set $\mathcal{U}$ is closed and identical to $\operatorname{epi} D^{\max}_f|_{\mathcal{D}}$.
Suppose $(\upsilon_1,\upsilon_2)\in\operatorname{cl}\mathcal{U}$. Then for any $\varepsilon>0$, $B_\varepsilon(\upsilon_1)\times C_\varepsilon(\upsilon_2)\cap\mathcal{U}$ is not empty, where $B_\varepsilon(\upsilon_1)$ is the closed $\varepsilon$-ball centered at $\upsilon_1$, and $C_\varepsilon(\upsilon_2) := \{t\,;\,t\le\upsilon_2+\varepsilon\}$. Therefore, all the sets in the family $\{F_1^{-1}(B_\varepsilon(\upsilon_1))\cap F_2^{-1}(C_\varepsilon(\upsilon_2))\cap\mathcal{T}\}_{\varepsilon>0}$ are not empty, and in fact, they are closed subsets of the compact set $\mathcal{T}$, since $F_1$ is continuous and $F_2$ is lower semicontinuous. Since the family has the finite intersection property, the intersection of these sets is not empty. Any element $\tau$ of this intersection satisfies $\upsilon_1 = F_1(\tau)$ and $\upsilon_2\ge F_2(\tau)$, indicating $\upsilon\in\mathcal{U}$. Therefore, $\mathcal{U}$ is closed. The second statement follows by
$$\mathcal{U}\subset\operatorname{epi} D^{\max}_f|_{\mathcal{D}}\subset\operatorname{cl}\mathcal{U}.$$
The first "$\subset$" is by $F_2(\tau)\ge D^{\max}_f(\Gamma(p)\|\Gamma(q))$, and the second one is by the definition (3.1) of $D^{\max}_f$.
Suppose (FC) is satisfied. Then, $D^{\max}_f$ is lower semicontinuous. Moreover, for each $\{\rho,\sigma\}$ such that $D^{\max}_f(\rho\|\sigma)<\infty$, the infimum in (3.1) is achieved by some $(\Gamma,\{p,q\})$.
It suffices to prove the assertion on $\mathcal{D}$. The lower semicontinuity of $D^{\max}_f|_{\mathcal{D}}$ follows from the closedness of $\operatorname{epi} D^{\max}_f|_{\mathcal{D}}$. Also, by the previous lemma,
$$\operatorname{epi} D^{\max}_f|_{\mathcal{D}} = \{(\rho,\sigma,t)\,;\,(\rho,\sigma)\in\mathcal{D},\ t\ge D^{\max}_f(\rho\|\sigma)\} = \{(\Gamma(p),\Gamma(q),t)\,;\,\tau\in\mathcal{T},\ t\ge D_f(p\|q)\}.$$
Therefore, to each $\{\rho,\sigma\}$ there is a reverse test $(\Gamma,\{p,q\})$ of $\{\rho,\sigma\}$ with $\{t\,;\,t\ge D^{\max}_f(\rho\|\sigma)\} = \{t\,;\,t\ge D_f(p\|q)\}$, or equivalently, $D^{\max}_f(\rho\|\sigma) = D_f(p\|q)$. This reverse test achieves the infimum in (3.1). By this lemma and (9.2):

Theorem 9.4 If $f$ satisfies (FC), then (9.1) holds. Moreover, for each $\{\rho,\sigma\}$ such that $D^{\max}_f(\rho\|\sigma)<\infty$, the infimum in (3.1) is achieved by some $(\Gamma,\{p,q\})$.

When f is operator convex

When $f$ satisfies the condition (F) and $\rho > 0$, $\sigma >$
0, we can write the pair $(W_{1*}, W_{2*})$ achieving the maximum in (9.1) explicitly. Since $f$ is operator convex, it is differentiable. Hence the Fréchet derivative $\mathrm{D}f(T)$ of $f$, i.e., the linear transform on $\mathcal{B}(\mathcal{H})$ with
$$\|f(T+X) - f(T) - \mathrm{D}f(T)(X)\| = o(\|X\|),$$
is given, in the basis which diagonalizes $T$, by
$$\mathrm{D}f(T)(X) = \big[f^{[1]}(t_i,t_j)\,X_{i,j}\big], \qquad (9.3)$$
where $t_i$ ($i=1,\cdots$) are the eigenvalues of $T$, and
$$f^{[1]}(t,t') := \begin{cases} \dfrac{f(t)-f(t')}{t-t'}, & (t\ne t'), \\[4pt] f'(t), & (t=t'). \end{cases}$$
Using $d(\rho,\sigma)$ as of (4.7),
$$\frac{\mathrm{d}}{\mathrm{d}t}D^{\max}_f(\rho+tX\|\sigma)\Big|_{t=0} = \frac{\mathrm{d}}{\mathrm{d}t}\operatorname{tr}\sigma f\big(\sigma^{-1/2}(\rho+tX)\sigma^{-1/2}\big)\Big|_{t=0}$$
$$= \operatorname{tr}\sigma\,\mathrm{D}f(d(\rho,\sigma))\big(\sigma^{-1/2}X\sigma^{-1/2}\big) = \operatorname{tr} X\,\sigma^{-1/2}\,\mathrm{D}f(d(\rho,\sigma))(\sigma)\,\sigma^{-1/2},$$
where the last identity is by the self-adjointness of $\mathrm{D}f(T)(\cdot)$ with respect to the inner product $\operatorname{tr} XY$:
$$\operatorname{tr} Y\,\mathrm{D}f(T)(X) = \sum_{i,j} Y_{j,i}\, f^{[1]}(t_i,t_j)\,X_{i,j} = \operatorname{tr} X\,\mathrm{D}f(T)(Y). \qquad (9.4)$$
Replacing $f$ by $\hat f$, the derivative in the second argument is computed similarly. Therefore,
$$W_{1*} = \sigma^{-1/2}\{\mathrm{D}f(d(\rho,\sigma))(\sigma)\}\sigma^{-1/2}, \quad W_{2*} = \rho^{-1/2}\big\{\mathrm{D}\hat f(d(\sigma,\rho))(\rho)\big\}\rho^{-1/2}$$
achieves the maximum in (9.1). In fact, by (9.4),
$$\operatorname{tr}(\rho W_{1*} + \sigma W_{2*}) = \operatorname{tr}\sigma\,\mathrm{D}f(d(\rho,\sigma))(d(\rho,\sigma)) + \operatorname{tr}\rho\,\mathrm{D}\hat f(d(\sigma,\rho))(d(\sigma,\rho))$$
$$= \operatorname{tr}\sigma f'(d(\rho,\sigma))\,d(\rho,\sigma) + \operatorname{tr}\rho\,\hat f'(d(\sigma,\rho))\,d(\sigma,\rho) = D^{\max}_f(\rho\|\sigma).$$
For example, if $f(r) = r^2$, then $\hat f(r) = 1/r$, $\mathrm{D}f(T)(X) = TX + XT$ and $\mathrm{D}\hat f(T)(X) = -T^{-1}XT^{-1}$, so
$$W_{1*} = \sigma^{-1}\rho + \rho\sigma^{-1}, \quad W_{2*} = -\sigma^{-1}\rho^2\sigma^{-1}.$$

On continuity

In this subsection, some remarks on the continuity of $D^{\max}_f$ are in order. By Lemma 9.3 and Proposition 2.1, if $f$ satisfies (FC),
$$\lim_{\varepsilon\downarrow 0} D^{\max}_f(\rho_\varepsilon\|\sigma_\varepsilon) = D^{\max}_f(\rho\|\sigma), \qquad (9.5)$$
where $\{(\rho_\varepsilon,\sigma_\varepsilon)\}_{\varepsilon>0}$ is a straight line in the effective domain of $D^{\max}_f$. But $\{(\rho_\varepsilon,\sigma_\varepsilon)\}_{\varepsilon>0}$ cannot be an arbitrary curve for (9.5) to hold. To see this, suppose that $\hat f(0)<\infty$ and $\sigma$ is a pure state, and use (7.1). Let
$$\sigma = \begin{pmatrix} 1 & 0\\ 0 & 0\end{pmatrix}, \quad \rho_\varepsilon = \begin{pmatrix} b & \sqrt{\varepsilon}\,C^\dagger \\ \sqrt{\varepsilon}\,C & \varepsilon D\end{pmatrix},$$
with $\rho_\varepsilon \ge 0$ and $\operatorname{tr}\rho_\varepsilon = 1$. Then (by the Schur complement)
$$\tilde\rho_\varepsilon = b - \sqrt{\varepsilon}\,C^\dagger(\varepsilon D)^{-1}\sqrt{\varepsilon}\,C = b - C^\dagger D^{-1}C = \tilde\rho$$
is constant in $\varepsilon$, and
$$\lim_{\varepsilon\downarrow 0} D^{\max}_f(\rho_\varepsilon\|\sigma) = D^{\max}_f(\tilde\rho\|\sigma) + \hat f(0)(1-\tilde\rho) \ne D^{\max}_f(\rho_0\|\sigma) \text{ in general.}$$
However, $\{(\rho_\varepsilon,\sigma_\varepsilon)\}_{\varepsilon>0}$ need not be a straight line, either. For example, consider a continuous curve $\{\sigma_\varepsilon\}_{\varepsilon\ge 0}$ of positive operators with $\sigma_0 = \sigma$, $\operatorname{supp}\sigma_\varepsilon \supset \operatorname{supp}\rho, \operatorname{supp}\sigma$, and $\sigma_\varepsilon > \sigma$. Then, if $f(0) = 0$, by (6.12) we have
$$\lim_{\varepsilon\downarrow 0} D^{\max}_f(\rho\|\sigma_\varepsilon) \le D^{\max}_f(\rho\|\sigma).$$
The opposite inequality results from the lower semicontinuity of $D^{\max}_f$. Thus (9.5) holds.

So far, we had supposed that the dimension of the underlying Hilbert space is finite. Some of the results, namely those of Sections 5 and 7, are obviously generalized to the separable infinite dimensional case. In this section, we consider the generalization of (9.1), thus proving lower semicontinuity. Also the existence of the minimum is discussed. Throughout the section, we suppose $\hat f(0)<\infty$ and $f(0)<\infty$. In this case, $g_f$ is not only lower semicontinuous, but also continuous.

Remark 9.5
To see that $g_f$ is continuous, we only have to check it at the origin. Observe $g_f(s,t) = \tilde g_f(s,t) + \hat f(0)s + f(0)t$, with $\tilde g_f(s,t)\le 0$. Therefore, if $(s_k,t_k)\to(0,0)$,
$$\limsup_{k\to\infty} g_f(s_k,t_k) = \limsup_{k\to\infty}\{\tilde g_f(s_k,t_k) + \hat f(0)s_k + f(0)t_k\} = \limsup_{k\to\infty}\tilde g_f(s_k,t_k) \le 0 = g_f(0,0) \le \liminf_{k\to\infty} g_f(s_k,t_k),$$
where the last step is by the lower semicontinuity of $g_f$, indicating $\lim_{k\to\infty} g_f(s_k,t_k) = g_f(0,0)$.
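Before passing to infinite dimensions, the finite classical content of (9.1) is worth making concrete: for commuting (diagonal) $\rho$, $\sigma$, the pairs $(W_1,W_2)$ can be taken diagonal, the constraint $f(r)-rW_1-W_2\ge 0$ makes each $(W_1(x), W_2(x))$ a supporting line of $f$, and the supremum is attained at the tangent at $r = p(x)/q(x)$. A sketch with the illustrative choice $f(r)=r\ln r$ (all data below are assumptions for the example):

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

f = lambda r: r * np.log(r)
df = lambda r: np.log(r) + 1.0

# tangent (supporting line) of f at r0: f(r) >= f(r0) + f'(r0)(r - r0)
r0 = p / q
W1 = df(r0)                 # slope
W2 = f(r0) - r0 * df(r0)    # intercept, so f(r) - r*W1 - W2 >= 0 by convexity

# the linear functional attains the f-divergence at the tangent point
value = float(np.sum(p * W1 + q * W2))
D_f = float(np.sum(q * f(p / q)))
assert abs(value - D_f) < 1e-12

# any other feasible pair gives at most D_f (test a perturbed slope, with the
# intercept pushed down just enough to stay feasible on a grid)
W1b = W1 - 0.1
W2b = np.array([min(f(r) - r * w for r in np.linspace(1e-6, 10, 20001))
                for w in W1b])
assert float(np.sum(p * W1b + q * W2b)) <= D_f + 1e-6
```

This is exactly the scalar shadow of the operator constraint defining $\mathcal{W}^{\max}_f$; the quantum statement (9.1) replaces the pointwise supporting lines by operator inequalities.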
On the RHS of (9.1), we now let $\rho$ and $\sigma$ be positive elements of $\mathcal{B}_{1,sa}$, the space of all self-adjoint trace class operators, and $W_1$ and $W_2$ self-adjoint bounded operators, i.e., elements of $\mathcal{B}_{sa}$. Also, we modify the definition of reverse tests, admitting all positive regular measures as input. To state the object, operator valued functions and their integrals are used; see Section 2.3 of [24] for these concepts. In this new definition, the reverse test is specified by a regular finite measure $\nu$ over the Borel sets of $[0,1]\times[0,1]$ and a $\nu$-measurable function $Z(s,t)$ from $(s,t)\in[0,1]\times[0,1]$ into $\mathcal{B}_{1,sa}$ with
$$\operatorname{tr} Z(s,t) = 1, \quad \nu\text{-a.e.}, \qquad (9.6)$$
$$\int s\,Z(s,t)\,\mathrm{d}\nu = \rho, \quad \int t\,Z(s,t)\,\mathrm{d}\nu = \sigma, \qquad (9.7)$$
where the integrals of operator valued functions are understood as short for
$$\int s\operatorname{tr} Z(s,t)W\,\mathrm{d}\nu = \operatorname{tr}\rho W, \quad \forall W\in\mathcal{B}_{sa}.$$
Thus our new definition is:
$$D^{\max}_f(\rho\|\sigma) := \inf\Big\{\int g_f(s,t)\operatorname{tr} Z(s,t)\,\mathrm{d}\nu\,;\,(9.6)\text{ and }(9.7)\Big\}. \qquad (9.8)$$
This defines a positive map from the measures which are absolutely continuous relative to $\nu$ into $\mathcal{B}_{1,sa}$, namely $\Gamma(\mu) = \int \frac{\mathrm{d}\mu}{\mathrm{d}\nu}Z(s,t)\,\mathrm{d}\nu$. This is 'trace preserving' in the sense
$$\mu([0,1]\times[0,1]) = \operatorname{tr}\Gamma(\mu).$$
Thus the pair $\nu$ and $Z$ represents a "reverse test" $(\Gamma,\{P,Q\})$, where $P$ and $Q$ are the positive measures with densities $s$ and $t$, respectively.

Remark 9.6
A function $Z(s,t)$ is $\nu$-measurable iff it is norm-approximable by simple functions; this definition is equivalent to requiring that the scalar valued function $(s,t)\mapsto\operatorname{tr} Z(s,t)W$ is $\nu$-measurable (Proposition 2.15, [24]). (9.7) may be understood in the "weak" sense as stated, but since $\|sZ(s,t)\|_1 \le \operatorname{tr} Z(s,t)$ is $\nu$-integrable, it can also be understood as a Bochner integral, i.e., the limit of integrals of simple functions in norm.

Proposition 9.7 (Theorem 8.6.1, [8]) Let $F$ be a real-valued convex function defined on a convex subset $\Omega$ of a vector space $X$, and let $G$ be a convex mapping of $X$ into a partially ordered normed space $Z$. Define
$$\mu := \sup\{F(\vec W)\,;\,\vec W\in\Omega,\ G(\vec W)\le 0\}.$$
Then for any $\zeta^*\ge 0$,
$$\mu \le F^*(\zeta^*) := \sup\{F(\vec W) + \langle\zeta^*, G(\vec W)\rangle\,;\,\vec W\in\Omega\}. \qquad (9.9)$$
Also, if there exists a $\vec W_1$ such that $G(\vec W_1) < 0$ and $\mu$ is finite, then
$$\mu = \min_{\zeta^*\ge 0} F^*(\zeta^*). \qquad (9.10)$$
Below, we apply this proposition considering the RHS of (9.1) as the primal problem, and obtain the reverse test as its dual problem. To proceed, we need to introduce a proper mathematical framework. Consider the space $\mathcal{C}$ of continuous real valued functions on the compact set $[0,1]\times[0,1]$ and the space $\mathcal{B}_{sa}$ of bounded self-adjoint linear operators on the Hilbert space $\mathcal{H}$. Endow $\mathcal{C}$ and $\mathcal{B}_{sa}$ with the norm $\|h\| := \sup_{(s,t)\in[0,1]\times[0,1]}|h(s,t)|$ and the operator norm $\|W\|$, respectively. From these two spaces, we compose the linear space
$$\Big\{\sum_{i=1}^n h^{(i)}W^{(i)}\,;\,h^{(i)}\in\mathcal{C},\ W^{(i)}\in\mathcal{B}_{sa}\Big\},$$
and its completion with respect to the projective norm
$$\|z\|_\pi := \inf\Big\{\sum_{i=1}^n\|h^{(i)}\|\,\|W^{(i)}\|\,;\,z = \sum_{i=1}^n h^{(i)}W^{(i)}\Big\}$$
is denoted by $\mathcal{Z}$. In fact, $\mathcal{Z}$ is the projective tensor product $\mathcal{C}\hat\otimes_\pi\mathcal{B}_{sa}$.
(That $\|\cdot\|_\pi$ is a norm and $\|hW\|_\pi = \|h\|\,\|W\|$ is known [24].) Then for each $z\in\mathcal{Z}$, there exist bounded sequences $\{h^{(i)}\}$ and $\{W^{(i)}\}$ with $z = \sum_{i=1}^\infty h^{(i)}W^{(i)}$ and
$$\|z\|_\pi = \inf\Big\{\sum_{i=1}^\infty\|h^{(i)}\|\,\|W^{(i)}\|\,;\,z = \sum_{i=1}^\infty h^{(i)}W^{(i)}\Big\}.$$
One can endow the partial order $\ge$ on $\mathcal{Z}$ by
$$z\ge 0 \;\Leftrightarrow\; \sum_{i=1}^\infty h^{(i)}(s,t)\,W^{(i)}\ge 0,\ \forall(s,t)\in[0,1]\times[0,1].$$
The strict inequality $z>0$ means that $z$ is an interior point of the cone $\{z'\,;\,z'\ge 0\}$. Any bounded linear functional $\zeta^*$ on $\mathcal{Z}$ is the linearization of a bilinear form on $\mathcal{C}$ and $\mathcal{B}_{sa}$ (see Section 2.2, [24]):
$$\zeta^*(hW) = \zeta^*(h)(W),$$
where $\zeta^*(h)(\cdot)$ and $\zeta^*(\cdot)(W)$ are elements of $\mathcal{B}^*_{sa}$ and $\mathcal{C}^*$, respectively. Below, $\mathcal{Z}$ is the one defined as above, and
$$X = \Omega := \{\vec W = (W_1,W_2)\,;\,W_1,W_2\in\mathcal{B}_{sa}\}, \quad F(\vec W) := \operatorname{tr}\rho W_1 + \operatorname{tr}\sigma W_2.$$

Lemma 9.8
Suppose $g_f$ is positive, bounded and continuous on $[0,1]\times[0,1]$. Suppose $(s,t)\mapsto\eta_{s,t}(W)$ is a $\nu$-measurable function on $[0,1]\times[0,1]$ and $W\mapsto\eta_{s,t}(W)$ is a linear functional with
$$|\eta_{s,t}(W)| \le \|W\|, \quad \nu\text{-a.e.}, \qquad (9.11)$$
and
$$\int s\,\eta_{s,t}(W)\,\mathrm{d}\nu = \operatorname{tr}\rho W, \quad \int t\,\eta_{s,t}(W)\,\mathrm{d}\nu = \operatorname{tr}\sigma W, \quad \forall W\in\mathcal{B}_{sa}. \qquad (9.12)$$
Then
$$\min_\eta\int g_f(s,t)\,\eta_{s,t}(\mathbf{1})\,\mathrm{d}\nu = \sup_{(W_1,W_2)\in\mathcal{W}_f}(\operatorname{tr} W_1\rho + \operatorname{tr} W_2\sigma). \qquad (9.13)$$

Proof.
We apply Proposition 9.7 with
$$G(\vec W) := g_1 W_1 + g_2 W_2 - g_f\mathbf{1},$$
where $g_1(s,t) := s$, $g_2(s,t) := t$. With $\zeta^*\in\mathcal{Z}^*$, $\zeta^*\ge 0$,
$$F^*(\zeta^*) = \sup_{\vec W}\{\operatorname{tr}\rho W_1 + \operatorname{tr}\sigma W_2 - \zeta^*(g_1W_1 + g_2W_2 - g_f\mathbf{1})\}$$
$$= \sup_{\vec W}\{(\operatorname{tr}\rho W_1 - \zeta^*(g_1)(W_1)) + (\operatorname{tr}\sigma W_2 - \zeta^*(g_2)(W_2)) + \zeta^*(g_f)(\mathbf{1})\}$$
$$= \begin{cases}\zeta^*(g_f)(\mathbf{1}), & \text{if }\zeta^*(g_1)(W) = \operatorname{tr}\rho W\text{ and }\zeta^*(g_2)(W) = \operatorname{tr}\sigma W, \\ \infty, & \text{otherwise}.\end{cases}$$
Observe that $h\mapsto\zeta^*(h)(W)$ is a bounded functional on $\mathcal{C}$. Therefore, by the Riesz-Markov representation theorem,
$$\zeta^*(h)(W) = \int h(s,t)\,\mathrm{d}\nu_W,$$
where $\nu_W$ is a regular measure over the Borel sets of $[0,1]\times[0,1]$. By $\zeta^*(\chi(B))(W) = \nu_W(B)$, where $\chi(\cdot)$ is the indicator function,
$$|\nu_W(B)| \le \|W\|\,\|\zeta^*(\chi(B))(\cdot)\| = \|W\|\,\zeta^*(\chi(B))(\mathbf{1}) = \|W\|\,\nu_{\mathbf 1}(B). \qquad (9.14)$$
Therefore, $\nu_W$ is absolutely continuous relative to $\nu_{\mathbf 1}$. Thus $\eta_{s,t}(W) := \frac{\mathrm{d}\nu_W}{\mathrm{d}\nu_{\mathbf 1}}$ exists, and
$$\zeta^*(h)(W) = \int h(s,t)\,\eta_{s,t}(W)\,\mathrm{d}\nu_{\mathbf 1}.$$
Since $W\mapsto\zeta^*(h)(W)$ is linear and positive, so is $W\mapsto\eta_{s,t}(W)$, $\nu$-a.e.; (9.11) follows from (9.14). Therefore, rewriting $F^*$ using $\eta_{s,t}$ and $\nu := \nu_{\mathbf 1}$, we have the LHS of (9.13). Also, $G(\cdot)$ is convex, and $\vec W_1 := (w_{1,0}, w_{2,0})$, where $(w_{1,0}, w_{2,0})$ is a relative interior point of $\mathcal{W}_f$, satisfies $G(\vec W_1) < 0$. Finally,
$$\eta_{s,t}(W) := \begin{cases}\operatorname{tr}\rho W, & \text{if }(s,t) = (1,0), \\ \operatorname{tr}\sigma W, & \text{if }(s,t) = (0,1), \\ 0, & \text{otherwise},\end{cases}$$
$$\nu(\{(1,0)\}) = \nu(\{(0,1)\}) := 1, \quad \nu\big([0,1]\times[0,1]\setminus\{(0,1),(1,0)\}\big) := 0,$$
satisfies (9.12), and $\int g_f(s,t)\,\eta_{s,t}(\mathbf{1})\,\mathrm{d}\nu = g_f(1,0) + g_f(0,1)$ is finite. Thus by (9.9), the RHS of (9.13) is finite. Therefore, we can apply Proposition 9.7, and the assertion is proved.

Theorem 9.9
Suppose $\mathcal{H}$ is a separable Hilbert space, (FC) is satisfied, and $\hat f(0)<\infty$ and $f(0)<\infty$. Then: (i) (9.1) holds if $W_1$ and $W_2$ range over $\mathcal{B}_{sa}$; (ii) the $\inf$ in (9.8) can be replaced by $\min$; (iii) $D^{\max}_f$ is lower semicontinuous.

Proof.
We use Lemma 9.8 and rewrite $\eta_{s,t}$ using $Z(s,t)$; then (i) and (ii) will be proved simultaneously. Since (i) means that $D^{\max}_f$ is the pointwise supremum of linear functionals, (iii) will follow.

Without loss of generality, one may suppose $f(r) \geq 0$ for all $r \geq 0$, or, equivalently, that $g_f$ is positive. To see this, choose $a$ and $b$ so that
$$f_0(r) := f(r) - ar - b \geq 0, \qquad g_{f_0}(s,t) = g_f(s,t) - as - bt \geq 0.$$
If $(W_1, W_2) \in \mathcal{W}^{\max}_{f_0}$, then $(W_1 + a, W_2 + b) \in \mathcal{W}^{\max}_f$, and $D^{\max}_f(\rho\|\sigma) = D^{\max}_{f_0}(\rho\|\sigma) + a \operatorname{tr} \rho + b \operatorname{tr} \sigma$. Thus $D^{\max}_f$ satisfies (i)--(iii) of the present theorem iff $D^{\max}_{f_0}$ does.

Since $\eta_{s,t}$ is a bounded linear functional on $\mathcal{B}_{\mathrm{sa}}$, there is $Z(s,t) \in \mathcal{B}_{1,\mathrm{sa}}$ with $\operatorname{tr} Z(s,t) W = \eta_{s,t}(W)$ for any $W$ with finite rank. Then, by $\zeta^* \geq 0$,
$$Z(s,t) \geq 0, \quad \operatorname{tr} Z(s,t) \leq 1, \quad \nu\text{-a.e.} \qquad (9.15)$$
Also,
$$\operatorname{tr} Z(s,t) W \leq \eta_{s,t}(W), \quad W \in \mathcal{B}_{\mathrm{sa}},\ W \geq 0, \quad \nu\text{-a.e.} \qquad (9.16)$$
Therefore, since $g_f \geq 0$,
$$\int g_f(s,t)\, \eta_{s,t}(\mathbf{1})\, \mathrm{d}\nu \geq \int g_f(s,t) \operatorname{tr} Z(s,t)\, \mathrm{d}\nu, \qquad (9.17)$$
so the replacement of $\eta_{s,t}$ by $W \to \operatorname{tr} Z(s,t) W$ can only improve the value of the optimized function.

Next, we show that (9.12) leads to (9.7). Suppose $W \geq 0$, and let $\{W^{(k)}\}$ be the sequence of positive finite-rank operators such that $W^{(k)} = \pi_k W \pi_k$, where $\pi_k$ is the projector onto a $k$-dimensional subspace. Then $0 \leq W^{(k)} \leq W$ and, as $k \to \infty$,
$$s \operatorname{tr} Z(s,t) W^{(k)} \nearrow s \operatorname{tr} Z(s,t) W, \quad \nu\text{-a.e.}$$
Since the function $(s,t) \to s \operatorname{tr} Z(s,t) W$ is $\nu$-integrable, by the monotone convergence theorem,
$$\operatorname{tr} \rho W \overset{(a)}{=} \lim_{k\to\infty} \operatorname{tr} \rho W^{(k)} = \lim_{k\to\infty} \int s\, \eta_{s,t}(W^{(k)})\, \mathrm{d}\nu \overset{(b)}{=} \lim_{k\to\infty} \int s \operatorname{tr} Z(s,t) W^{(k)}\, \mathrm{d}\nu = \int \lim_{k\to\infty} s \operatorname{tr} Z(s,t) W^{(k)}\, \mathrm{d}\nu = \int s \operatorname{tr} Z(s,t) W\, \mathrm{d}\nu.$$
Here, (a) holds since $\rho$ is trace class, and (b) holds since $W^{(k)}$ is of finite rank. When $W$ is not positive, decomposing it into its positive and negative parts, we obtain the identity. Thus, (9.7) is satisfied. Finally, due to (9.15), $Z(s,t)$ can be normalized to satisfy (9.6).

We have introduced the maximal $f$-divergence as the solution to an optimization problem, the reverse test, and have given its closed formula in some important cases. The next step is to consider the asymptotic version of the problem, in the hope that this closes the gap between the maximum and the minimum quantum divergence. The present author's long-standing project is to characterize all the possible quantum $f$-divergences, as [22] characterized all the quantum Fisher information.

Appendix: Matrix analysis
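The matrix-analytic facts collected below are standard and easy to sanity-check numerically. The following sketch (a non-rigorous illustration on randomly generated matrices; the helpers `min_eig` and `rand_herm_psd` and the choice $f(r) = r^2$ are ours, not from the paper) checks the operator Jensen inequality $f(C^\dagger X C) \leq C^\dagger f(X) C$ for operator convex $f$ with $f(0) \leq 0$ and $\|C\| \leq 1$, and the Schur-complement bound $X \geq C Y^{-1} C^\dagger$ for a positive block matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_eig(A):
    """Smallest eigenvalue of a Hermitian matrix."""
    return np.linalg.eigvalsh(A).min()

def rand_herm_psd(n):
    """Random positive semidefinite Hermitian matrix A† A."""
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return A.conj().T @ A

n = 4

# Operator Jensen inequality for f(r) = r^2 (operator convex, f(0) = 0):
# f(C† X C) <= C† f(X) C whenever ||C|| <= 1.
X = rand_herm_psd(n)
C = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
C /= np.linalg.norm(C, 2)          # enforce spectral norm ||C|| <= 1
CXC = C.conj().T @ X @ C
lhs = CXC @ CXC                    # f(C† X C)
rhs = C.conj().T @ (X @ X) @ C     # C† f(X) C
assert min_eig(rhs - lhs) >= -1e-8   # rhs - lhs is positive semidefinite

# Schur-complement bound: if [[X, C], [C†, Y]] >= 0 with Y > 0,
# then X >= C Y^{-1} C†.
A = rng.normal(size=(2 * n, 2 * n)) + 1j * rng.normal(size=(2 * n, 2 * n))
M = A.conj().T @ A                   # positive semidefinite block matrix
Xb, Cb, Yb = M[:n, :n], M[:n, n:], M[n:, n:]
assert min_eig(Xb - Cb @ np.linalg.inv(Yb) @ Cb.conj().T) >= -1e-8
print("both inequalities hold on random instances")
```

For $f(r) = r^2$ the first inequality also follows directly from $C^\dagger X^2 C - (C^\dagger X C)^2 = C^\dagger X (\mathbf{1} - CC^\dagger) X C \geq 0$, since $\|C\| \leq 1$ gives $\mathbf{1} - CC^\dagger \geq 0$.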
Proposition 10.1 (Theorem V.2.3 of [2]) Let $f$ be a continuous function on $[0,\infty)$. If $f$ is operator convex and $f(0) \leq 0$, then for any positive operator $X$ and any operator $C$ with $\|C\| \leq 1$,
$$f\left(C^\dagger X C\right) \leq C^\dagger f(X)\, C.$$

Proposition 10.2 ((2.43) of [3]) Let $f$ be an operator convex function defined on $[0,\infty)$, and let $\Lambda^\dagger$ be a unital positive map. Then
$$f\left(\Lambda^\dagger(A)\right) \leq \Lambda^\dagger\left(f(A)\right)$$
holds for any $A \geq 0$.

Proposition 10.3 (Proposition 8.4 of [12]) Let $f$ be a continuous operator convex function on $[0,\infty)$. If $\hat{f}(0) < \infty$, there are a real number $a$ and a positive Borel measure $\mu$ such that
$$f(r) = f(0) + a r + \int_{(0,\infty)} \psi_\lambda(r)\, \mathrm{d}\mu(\lambda), \qquad \psi_\lambda(r) := \frac{-r}{r+\lambda},$$
where $a = \hat{f}(0)$ and $\int_{(0,\infty)} \frac{\mathrm{d}\mu(\lambda)}{1+\lambda} < \infty$. Since $\psi_\lambda$ is operator monotone decreasing, this means that $f(r)$ is the sum of a linear function and an operator monotone decreasing function.

Proposition 10.4 (Lemma 5.2 of [12]) If $f$ is a complex-valued function on finitely many points $\{r_i; i \in I\} \subset [0,\infty)$, then for any pairwise different positive numbers $\{\lambda_i; i \in I\}$ there exist complex numbers $\{c_i; i \in I\}$ such that
$$f(r_i) = \sum_{j \in I} \frac{c_j}{r_i + \lambda_j}, \quad i \in I.$$

Proposition 10.5 (Exercise 1.3.5 of [3]) Let $X$, $Y$ be positive definite matrices. Then
$$\begin{pmatrix} X & C \\ C^\dagger & Y \end{pmatrix} \geq 0 \quad \text{implies} \quad X \geq C Y^{-1} C^\dagger, \quad Y \geq C^\dagger X^{-1} C. \qquad (10.2)$$

References

[1] Amari, S., Nagaoka, H.: Methods of Information Geometry. AMS (2001)
[2] Bhatia, R.: Matrix Analysis. Springer, Berlin (1996)
[3] Bhatia, R.: Positive Definite Matrices. Princeton University Press (2007)
[4] Belavkin, V. P.: On Entangled Quantum Capacity. In: Quantum Communication, Computing, and Measurement 3, pp. 325-333. Kluwer, Boston (2001)
[5] Chefles, A.: Deterministic quantum state transformations. Phys. Lett. A 270, 14 (2000)
[6] Ebadian, A., Nikoufar, I., Eshaghi Gordji, M.: Perspectives of matrix convex functions. Proc. Natl. Acad. Sci. USA 108(18), 7313-7314 (2011)
[7] Effros, E., Hansen, F.: Non-commutative perspectives. Ann. Funct. Anal.
Vol. 5, No. 2, 74-79 (2014)
[8] Luenberger, D. G.: Optimization by Vector Space Methods. Wiley, New York (1969)
[9] Hammersley, S. J., Belavkin, V. P.: Information Divergence for Quantum Channels, Infinite Dimensional Analysis. In: Quantum Information and Computing, Quantum Probability and White Noise Analysis, Vol. XIX, pp. 149-166. World Scientific, Singapore (2006)
[10] Hayashi, M.: Characterization of Several Kinds of Quantum Analogues of Relative Entropy. Quantum Information and Computation 6, 583-596 (2006)
[11] Hiai, F., Petz, D.: Different quantum f-divergences and the reversibility of quantum operations. arXiv:1604.03089 (2016)
[12] Hiai, F., Mosonyi, M., Petz, D., Beny, C.: Quantum f-divergences and error correction. Rev. Math. Phys. 23, 691-747 (2011)
[13] Hiai, F., Petz, D.: The proper formula for relative entropy and its asymptotics in quantum probability. Comm. Math. Phys. 143, 99-114 (1991)
[14] Holevo, A. S.: Probabilistic and Statistical Aspects of Quantum Theory. North-Holland, Amsterdam (1982) (in Russian, 1980)
[15] Matsumoto, K.: A Geometrical Approach to Quantum Estimation Theory. Doctoral dissertation, University of Tokyo (1998)
[16] Matsumoto, K.: Reverse estimation theory, Complementality between RLD and SLD, and monotone distances. arXiv:quant-ph/0511170 (2005)
[17] Matsumoto, K.: Reverse test and quantum analogue of classical Fidelity and generalized Fidelity. arXiv:1006.0302 (2010)
[18] Matsumoto, K.: On maximization of measured f-