A new quantum version of f-divergence

Keiji Matsumoto
Quantum Computation Group, National Institute of Informatics,
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430
e-mail: [email protected]

February 7, 2018
Abstract
This paper proposes and studies a new quantum version of f-divergences, a class of convex functionals of a pair of probability distributions including Kullback-Leibler divergence, Renyi-type relative entropy, and so on. There are several quantum versions so far, including the one by Petz [12]. We introduce another quantum version (D^max_f, below), defined as the solution to an optimization problem, or the minimum classical f-divergence necessary to generate a given pair of quantum states. It turns out to be the largest quantum f-divergence. The closed formula of D^max_f is given either if f is operator convex, or if one of the states is a pure state. Also, a concise representation of D^max_f as a pointwise supremum of linear functionals is given and used for the clarification of various properties of the quantity.

Using the closed formula of D^max_f, we show: suppose f is operator convex. Then the maximum f-divergence of the probability distributions of a measurement under the states ρ and σ is strictly less than D^max_f(ρ‖σ). This statement may seem intuitively trivial, but when f is not operator convex, it is not always true. A counterexample is f(r) = |1 − r|, which corresponds to the total variation distance.

We mostly work on a finite dimensional Hilbert space, but some results are extended to the infinite dimensional case.

1 Introduction

This paper proposes and studies a new quantum version of the f-divergence

D_f(p‖q) := Σ_x q(x) f( p(x)/q(x) ),

where p and q are probability distributions. Several important quantities in information theory and statistics are in this class. For example, D_{r ln r} and D_{r^α} correspond to Kullback-Leibler divergence and Renyi-type relative entropies, respectively, and f-divergences other than these have at least one operational meaning.
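As a quick numerical illustration of the definition (the function names below are ours, not the paper's): with f(r) = r ln r, D_f(p‖q) reduces to the Kullback-Leibler divergence. The sketch assumes q(x) > 0 everywhere; the boundary cases are handled by the closure g_f introduced in Section 2.

```python
import math

def f_kl(r):
    # f(r) = r ln r, extended by its limit value f(0) = 0
    return r * math.log(r) if r > 0 else 0.0

def f_divergence(p, q, f):
    # D_f(p||q) = sum_x q(x) f(p(x)/q(x)); assumes q(x) > 0 for all x
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

p = [0.5, 0.5]
q = [0.9, 0.1]
d = f_divergence(p, q, f_kl)

# for f(r) = r ln r the sum telescopes to the Kullback-Leibler divergence
kl = sum(px * math.log(px / qx) for px, qx in zip(p, q))
```

Indeed Σ_x q(x) (p(x)/q(x)) ln(p(x)/q(x)) = Σ_x p(x) ln(p(x)/q(x)), so `d` and `kl` agree.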
If f is a convex function and satisfies some moderate conditions, D_f(p‖q) is the optimal gain of a certain Bayes decision problem: for each f, there is a pair of functions w_1 and w_2 on a decision space, representing the gain of a decision d, with

D_f(p‖q) = sup_{d(·)} Σ_x ( w_1(d(x)) p(x) + w_2(d(x)) q(x) ).   (1.1)

Conversely, for each (w_1(·), w_2(·)), there is a convex function f with this identity. Also, by (1.1) and the celebrated randomization criterion [25], there is a Markov map which sends (p, q) to (p′, q′) iff D_f(p‖q) ≥ D_f(p′‖q′) holds for any convex function f with the above mentioned properties.

In quantum information theory, a series of works by Petz (see [12] and references therein) is most impressive, and his version of quantum f-divergence has been widely studied and applied. Also, the recent development of the theory of quantum Renyi entropy is significant.

In this paper, we introduce another quantum version, the maximal quantum f-divergence D^max_f(ρ‖σ). This quantity is defined as the solution to the following optimization problem: given a pair of quantum states {ρ, σ}, consider a (completely) positive trace preserving map Γ that sends probability distributions {p, q} to {ρ, σ}. The triple (Γ, {p, q}) (a reverse test, hereafter) is optimized to minimize D_f(p‖q), and this infimum is D^max_f(ρ‖σ). The name comes from the fact that D^max_f is the largest of all possible quantum f-divergences.

Some historical remarks are in order. When f is r ln r and σ is invertible,

D^max_{r ln r}(ρ‖σ) = tr ρ log( ρ^{1/2} σ^{-1} ρ^{1/2} ).

This RHS quantity had been studied by several authors from the operator theoretic point of view [4][9][13]. Also, some authors had pointed out that this quantity is the path dependent divergence [1] of the RLD quantum Fisher metric [22], which plays an important role in quantum statistical estimation theory [14], along the e- and m-geodesics connecting ρ and σ [10][19][16].
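The quantity tr ρ log(ρ^{1/2} σ^{-1} ρ^{1/2}) is easy to evaluate numerically for invertible states. The sketch below (helper names are ours) computes it via eigendecompositions and compares it with the Umegaki relative entropy tr ρ(log ρ − log σ); since D^max is the largest quantum version of Kullback-Leibler divergence, the latter should not exceed the former.

```python
import numpy as np

def mfunc(A, f):
    # apply a scalar function f to a Hermitian matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

sr = mfunc(rho, np.sqrt)                    # rho^{1/2}
M = sr @ np.linalg.inv(sigma) @ sr          # rho^{1/2} sigma^{-1} rho^{1/2}
d_max_kl = np.trace(rho @ mfunc(M, np.log)).real

# Umegaki relative entropy tr rho (log rho - log sigma), a smaller quantum version
umegaki = np.trace(rho @ (mfunc(rho, np.log) - mfunc(sigma, np.log))).real
```

For these (non-commuting) states `d_max_kl` strictly exceeds `umegaki`, consistent with D^max being the largest quantum f-divergence.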
However, its characterization as the solution to the optimization problem, and as the largest quantum version, was first pointed out by the present author [16]. In [17], the present author studied D^max_{r^{1/2}} rather intensively, and briefly treated the case when f is operator monotone decreasing. Recently, based on an earlier version of the present paper, [11] studied some aspects of D^max_f.

Below, we summarize our main results. When f is operator convex, a series of rich results is available. First, we can write down the value of D^max_f and the operation achieving the minimum explicitly: suppose ρ and σ are

ρ = [ ρ_{1,1} ρ_{1,2} ; ρ_{2,1} ρ_{2,2} ],   σ = [ σ_1 0 ; 0 0 ],

relative to the decomposition supp σ ⊕ (supp σ)^⊥. Then

D^max_f(ρ‖σ) = tr σ f( σ^{-1/2} ρ̃ σ^{-1/2} ) + tr(ρ − ρ̃) lim_{ε↓0} ε f(1/ε),   (1.2)

where ρ̃ := ρ_{1,1} − ρ_{1,2} (ρ_{2,2})^{−} ρ_{2,1}. The first term of the RHS is the trace of the non-commutative perspective [6][7] of ρ̃ and σ. The operation achieving the minimum is obtained using the spectral decomposition of σ^{-1/2} ρ̃ σ^{-1/2}, and the same reverse test is optimal for all operator convex functions f. Uniqueness of the optimal operation modulo trivial redundancy is also shown.

Based on these analyses, we show, for example: suppose f is operator convex. Then the maximum f-divergence of the probability distributions of a measurement under the states ρ and σ is strictly less than D^max_f(ρ‖σ). Thus, once encoded into non-commutative quantum states, some amount of classical f-divergence is irrecoverably lost. (This statement may seem intuitively trivial, but when f is not operator convex, it is not always true.)

After the detailed analysis of the case where f is operator convex, we study the case where such an assumption does not hold. One motivation is that much of the results in the former case generalize.

First, when one of the states is a pure state, (1.2) generalizes to all convex functions, and the optimal reverse test is also the same, and also unique. Also, D^max_f is strictly larger than the measured f-divergence, unless the two states commute.

Next, we analyze f(r) = |1 − r|, since this corresponds to the total variation distance, which is quite often used in statistics, information theory, and so on. Though we failed to obtain a closed formula, the optimization problem is reduced to a quite simple linear semidefinite program. Using this, we show that (1.2) is not true in this case, and the optimal reverse test is not the same either.

In addition, when {ρ, σ} satisfies some conditions, it turns out that

D^max_{|1−r|}(ρ‖σ) = ‖ρ − σ‖_1.
(1.3)

Since the RHS equals the measured total variation distance, this means that total variation sometimes does not decrease by embedding into non-commutative quantum states. The condition for (1.3) is not too restrictive: for example, if ρσ + σρ ≥ 0, this identity holds. In the qubit case, the necessary and sufficient condition for (1.3) is obtained, and a fairly large area of the Bloch sphere satisfies (1.3).

Besides these case studies, we show the dual expression of D^max_f:

D^max_f(ρ‖σ) = sup { tr(ρW_1 + σW_2) ; rW_1 + W_2 ≤ f(r) 1, ∀r ≥ 0 }.   (1.4)

This shows that D^max_f is a pointwise supremum of linear functionals, thus it is lower semicontinuous. Thus, D^max_f behaves extremely nicely at the edge of the domain. In fact, if (ρ_ε, σ_ε) is an arbitrary line segment connecting (ρ, σ) and an interior point of the domain, lim_{ε↓0} D^max_f(ρ_ε‖σ_ε) = D^max_f(ρ‖σ) holds. (1.4) is also valid, with certain restrictions, even when the underlying Hilbert space is a separable infinite dimensional space.

Except for the last subsection, we will work on a finite dimensional Hilbert space H. In most cases, the underlying Hilbert space is not mentioned unless it is confusing. The spaces of trace class operators and of bounded operators on H are denoted by B_1(H) and B(H), respectively, and the spaces of their self-adjoint elements are denoted by B_{1,sa}(H) and B_sa(H). In most cases, specification of the underlying Hilbert space is dropped, thus B_sa instead of B_sa(H), for example. When dim H < ∞ (thus in most of the paper), to denote the space of all linear operators, we use B(H).

For each operator A, A^{−} denotes its Moore-Penrose generalized inverse. Also, for each positive operator X, denote by supp X its support, and by π_X the projection onto supp X. The projection onto the space K is denoted by π_K. The orthogonal complement of the projector π is denoted by π^⊥. In this paper, for the most part, probability distributions or positive measures are defined on a finite set X. These are easily identified with commuting elements of B_sa(C^{|X|}). Note the support of a measure μ is also denoted by supp μ.
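The dual expression (1.4), and its classical counterpart below, rest on writing t f(s/t) as a pointwise supremum of linear functionals; for differentiable convex f, the tangent lines (w_1, w_2) = (f′(c), f(c) − c f′(c)) form such a family. A scalar sanity check of this, with an illustrative grid of tangent points (all names ours):

```python
import math

def f_kl(r):
    return r * math.log(r) if r > 0 else 0.0

def df_kl(r):
    return math.log(r) + 1.0

def g_from_tangents(s, t, f, df, cs):
    # pointwise supremum of w1*s + w2*t over tangent-line pairs
    # (w1, w2) = (f'(c), f(c) - c f'(c)), c ranging over the grid cs
    return max(df(c) * s + (f(c) - c * df(c)) * t for c in cs)

s, t = 0.3, 0.7
cs = [0.01 * k for k in range(1, 1000)]   # grid of tangent points c > 0
approx = g_from_tangents(s, t, f_kl, df_kl, cs)
exact = t * f_kl(s / t)                   # g_f(s, t) = t f(s/t) for t > 0
```

Each tangent value lies below the convex function, so `approx` approaches `exact` from below as the grid is refined; the supremum is attained at c = s/t.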
2 Classical f-divergence

This section explains the definition of, and known useful facts about, the classical f-divergence and convex analysis.

The definition of D_f in the introduction obviously cannot be used when q(x) = 0 for some x. Convex analysis supplies useful tools to cope with such continuity issues. As in [23], we suppose that h is a map from R^n to R ∪ {±∞}. Instead of saying that h is not defined on a certain set, we say that h(r) = ∞ on that set. The effective domain of h, denoted by dom h, is the set of all r's with h(r) < ∞. h is said to be convex iff its epigraph, or the set epi h := {(r, λ) ; λ ≥ h(r)}, is convex. A convex function h is proper iff h is nowhere −∞ and not ∞ everywhere, and is lower semicontinuous iff the set {r ; λ ≥ h(r)} is closed for any λ, or equivalently, iff its epigraph is closed (Theorem 7.1 of [23]), or equivalently, iff h(lim_{k→∞} r_k) ≤ lim inf_{k→∞} h(r_k).

Given a convex function h, its closure cl h is the greatest lower semicontinuous (not necessarily finite) function majorized by h. The name comes from the fact that epi(cl h) = cl(epi h). cl h coincides with h except perhaps at the relative boundary points of its effective domain. If h is proper and convex, so is cl h (Theorem 7.4, [23]).

The following proposition will be used intensively later.

Proposition 2.1 (Theorem 10.2, [23]) If h is lower semicontinuous, proper and convex, it is continuous on any simplex in dom h.

From here, unless otherwise mentioned, f, which is used to define the f-divergence, is supposed to satisfy the following condition.

(FC) f is a proper, lower semicontinuous, and convex function with (0, ∞) ⊂ dom f ⊂ [0, ∞).

Now we are in the position to define the classical f-divergence D_f between the positive measures p and q over the finite set X.
It is defined in the following manner, so that the function (p, q) → D_f(p‖q) is lower semicontinuous: namely,

D_f(p‖q) := Σ_{x∈X} g_f( p(x), q(x) ),

where g_f(s, t) is the closure of t f(s/t) (see p. 35 and p. 67 of [23]):

g_f(s, t) :=
  t f(s/t),              if s ∈ dom f, t > 0,
  lim_{t↓0} t f(s/t),    if s ∈ dom f, t = 0,
  0,                     if s = t = 0,
  ∞,                     otherwise.   (2.1)

It is easy to check that

D_f(p‖q) = Σ_{x∈supp q} q(x) f( p(x)/q(x) ) + Σ_{x∈X∖supp q} p(x) lim_{ε↓0} ε f(1/ε).

Remark 2.2
Though p and q have to be probability distributions for D_f to have operational meanings, we extend the domain of D_f to pairs of positive finite measures on a finite set for the sake of mathematical convenience.

Observe also that g_f is in addition positively homogeneous, or

∀a ≥ 0,  g_f(as, at) = a g_f(s, t).

Since it is positively homogeneous, proper, lower semicontinuous and convex, by Corollary 13.5.1 of [23], it is the pointwise supremum of linear functions,

g_f(s, t) = sup_{(w_1, w_2) ∈ W_f}  w_1 s + w_2 t,   (2.2)

where the set W_f is convex and unbounded from below. Therefore,

D_f(p‖q) = sup { Σ_{x∈X} w_1(x) p(x) + w_2(x) q(x) ; (w_1(x), w_2(x)) ∈ W_f }.   (2.3)

This in turn shows, by Corollary 13.5.1 of [23], that D_f is positively homogeneous, proper, lower semicontinuous and convex.

Remark 2.3 (2.3) indicates (1.1). To see this, use W_f as a decision space.

If f satisfies (FC), the function

f̂(r) := g_f(1, r)   (2.4)

also satisfies (FC) and g_{f̂}(t, s) = g_f(s, t). This identity implies

D_f(p‖q) = D_{f̂}(q‖p).   (2.5)

Also,

lim_{ε↓0} ε f(1/ε) = f̂(0),   f(0) = lim_{ε↓0} ε f̂(1/ε).   (2.6)

The introduction of f̂ often simplifies the argument, allowing us to switch the first and the second variables.

3 Maximal f-divergence

In this section we define the maximal f-divergence D^max_f as the solution to an operationally defined minimization problem.

A reverse test of a pair {ρ, σ} of positive operators is a triple (Γ, {p, q}). Here, Γ is a trace preserving positive linear map from positive measures over some finite set X (or a commutative algebra with dimension |X|) to Hermitian operators, and p and q are positive measures over X, with

Γ(p) = ρ,  Γ(q) = σ.

(Note Γ is necessarily completely positive.)

For a function f satisfying the above (FC), we define the maximal f-divergence

D^max_f(ρ‖σ) = inf_{(Γ, {p,q})} D_f(p‖q),   (3.1)

where the infimum is taken over all the reverse tests. The name comes from the fact that D^max_f(ρ‖σ) is the largest quantum version of D_f(p‖q); here, a quantum version of D_f(p‖q) is any D^Q_f(ρ‖σ) such that

(D1) D^Q_f(Λ(ρ)‖Λ(σ)) ≤ D^Q_f(ρ‖σ) holds for any completely positive trace preserving (CPTP) map Λ and any density operators ρ, σ on finite dimensional Hilbert spaces.

(D2) D^Q_f(p‖q) = D_f(p‖q) for any probability distributions p, q over any finite sets.

Here p is identified with Σ_{x∈X} p(x) |e_x⟩⟨e_x|, for example, where {|e_x⟩ ; x ∈ X} is a CONS. The choice of a particular CONS is not important, since

D^Q_f( UρU† ‖ UσU† ) = D^Q_f(ρ‖σ)

for any unitary operator U, due to (D1).

We also consider the following stronger condition.
(D1′) D^Q_f(Λ(ρ)‖Λ(σ)) ≤ D^Q_f(ρ‖σ) holds for any trace preserving positive map Λ and any density operators ρ, σ on finite dimensional Hilbert spaces.

Lemma 3.1 If (FC) is satisfied, D^max_f satisfies the above (D1), (D1′) and (D2). Also, if a two point functional D^Q_f satisfies both of (D1) and (D2), or both of (D1′) and (D2), then

D^Q_f(ρ‖σ) ≤ D^max_f(ρ‖σ).

Proof.
Let Λ be a trace preserving positive map. Then,

D^max_f(Λ(ρ)‖Λ(σ))
= inf { D_f(p‖q) ; (Γ, {p, q}) : a reverse test of {Λ(ρ), Λ(σ)} }
≤ inf { D_f(p‖q) ; Γ = Λ ∘ Γ′, (Γ′, {p, q}) : a reverse test of {ρ, σ} }
= D^max_f(ρ‖σ).

Hence, D^max_f satisfies (D1′), and thus (D1) also. Also,

D^max_f(p‖q) = inf { D_f(p′‖q′) ; p = Γ(p′), q = Γ(q′), Γ : stochastic map }
≥ inf { D_f( Γ(p′) ‖ Γ(q′) ) ; p = Γ(p′), q = Γ(q′), Γ : stochastic map }
= D_f(p‖q).

Since the opposite inequality is trivial, we have D^max_f(p‖q) = D_f(p‖q). Thus, D^max_f satisfies (D2).

Suppose D^Q_f satisfies (D1) (or (D1′)) and (D2), and let (Γ, {p, q}) be a reverse test of {ρ, σ}. Then,

D^Q_f(ρ‖σ) = D^Q_f( Γ(p) ‖ Γ(q) ) ≤ D^Q_f(p‖q) = D_f(p‖q).

Therefore, taking the infimum over all the reverse tests of {ρ, σ}, we have D^Q_f(ρ‖σ) ≤ D^max_f(ρ‖σ). □

In defining reverse tests, we had assumed that the cardinality of X, where {p, q} are defined, is finite, for mathematical simplicity. But this restriction is not essential as long as dim H < ∞, since Caratheodory's theorem puts a natural upper bound on the size of X.

Denote by δ_x the delta distribution at x, and define r_x := p(x)/q(x) for x ∈ supp q. Then

Σ_{x∈X} q(x) Γ(δ_x) = σ,
Σ_{x∈X} q(x) r_x Γ(δ_x) = ρ − Σ_{x: q(x)=0} p(x) Γ(δ_x),
Σ_{x∈X} q(x) f(r_x) = D_f(p‖q) − f̂(0) Σ_{x: q(x)=0} p(x).

Since Σ_{x∈X} q(x) < ∞, by Caratheodory's theorem, there is a positive finite measure q̄ such that Σ_{x∈X} q̄(x) = Σ_{x∈X} q(x),

Σ_{x∈X} q̄(x) Γ(δ_x) = Σ_{x∈X} q(x) Γ(δ_x),
Σ_{x∈X} q̄(x) r_x Γ(δ_x) = Σ_{x∈X} q(x) r_x Γ(δ_x),
Σ_{x∈X} q̄(x) f(r_x) = Σ_{x∈X} q(x) f(r_x),

and supp q̄ ⊂ supp q, |supp q̄| ≤ (dim H)² + (dim H)² + 1 + 1.

Thus, defining X̄, p̄, and Γ̄ by

X̄ := supp q̄ ∪ {x_0}  (with q̄(x_0) := 0),

p̄(x) := r_x q̄(x), if x ∈ supp q̄;  Σ_{x: q(x)=0} p(x), if x = x_0,

Γ̄(δ_x) := Γ(δ_x), if x ∈ supp q̄;  ( Σ_{x: q(x)=0} p(x) Γ(δ_x) ) / ( Σ_{x: q(x)=0} p(x) ), if x = x_0,

we have:

Lemma 4.1
To each given reverse test (Γ, {p, q}) of {ρ, σ}, there is a reverse test (Γ̄, {p̄, q̄}) such that (i) D_f(p‖q) = D_f(p̄‖q̄) and (ii) p̄ and q̄ are defined over the set X̄ with |X̄| ≤ 2(dim H)² + 3. Also, X̄ = supp q̄ ∪ {x_0}.

Without loss of generality, we suppose X = supp q ∪ {x_0}, and define

S_x := q(x) Γ(δ_x), if x ≠ x_0;  p(x_0) Γ(δ_{x_0}), if x = x_0.

Then it should satisfy

Σ_{x∈X∖{x_0}} r_x S_x + S_{x_0} = ρ,   Σ_{x∈X∖{x_0}} S_x = σ.   (4.1)

Conversely, to each such {S_x, r_x ; x ∈ X}, there corresponds a reverse test with

Γ(δ_x) = S_x / tr S_x

and

{p(x), q(x)} = {r_x tr S_x, tr S_x}, if x ≠ x_0;  {tr S_{x_0}, 0}, if x = x_0.

So {S_x, r_x ; x ∈ X} is a bijective representation of a reverse test. By this representation, D^max_f is the solution to the optimization problem

D^max_f(ρ‖σ) = inf { Σ_{x∈X∖{x_0}} f(r_x) tr S_x + f̂(0) tr S_{x_0} ; S_x with (4.1) }.   (4.2)

Observe this optimization can be done in two stages. Fixing S_{x_0}, or equivalently

ρ_* := ρ − S_{x_0} = Σ_{x∈X∖{x_0}} r_x S_x ≤ ρ,   (4.3)

optimize {S_x ; x ∈ X∖{x_0}} to minimize

Σ_{x∈X∖{x_0}} f(r_x) tr S_x = Σ_{x∈supp q} q(x) f( p̃(x)/q(x) ) = D_f(p̃‖q),

where p̃ is the restriction of p to supp q. Since (Γ, {p̃, q}) is a reverse test of {ρ_*, σ}, the minimum of D_f(p̃‖q) equals D^max_f(ρ_*‖σ). After this is done, we optimize ρ_* to minimize

D^max_f(ρ_*‖σ) + f̂(0) tr S_{x_0} = D^max_f(ρ_*‖σ) + f̂(0) tr(ρ − ρ_*).

Note that supp ρ_* ⊂ supp σ holds by (4.1) and (4.3), and 0 ≤ ρ_* ≤ ρ by its definition (4.3). Thus,

D^max_f(ρ‖σ) = inf { D^max_f(ρ_*‖σ) + f̂(0) tr(ρ − ρ_*) ; 0 ≤ ρ_* ≤ ρ, supp ρ_* ⊂ supp σ }.   (4.4)

Here, introduce the operator

ρ̃ := ρ_{1,1} − ρ_{1,2} (ρ_{2,2})^{−} ρ_{2,1},   (4.5)

where, relative to the decomposition supp σ ⊕ (supp σ)^⊥,

ρ = [ ρ_{1,1} ρ_{1,2} ; ρ_{2,1} ρ_{2,2} ],   σ = [ σ_1 0 ; 0 0 ].

Lemma 4.2
Suppose ρ_* ≥ 0 is supported on supp σ and ρ_* ≤ ρ. Then ρ̃ ≥ ρ_*. Also, 0 ≤ ρ̃ ≤ ρ and supp ρ̃ ⊂ supp σ.

Proof.
By Proposition 10.5, ρ̃ ≥ 0 (in fact, ρ̃^{−} = π_σ ρ^{−} π_σ). ρ̃ ≤ ρ and supp ρ̃ ⊂ supp σ are obvious by definition.

Since ρ_* ≤ ρ,

[ ρ_{1,1} − ρ_*  ρ_{1,2} ; ρ_{2,1}  ρ_{2,2} ] ≥ 0.

Therefore, by Proposition 10.5, we should have ρ_{1,1} − ρ_* ≥ ρ_{1,2} (ρ_{2,2})^{−} ρ_{2,1}, or equivalently,

ρ̃ = ρ_{1,1} − ρ_{1,2} (ρ_{2,2})^{−} ρ_{2,1} ≥ ρ_*. □

By Lemma 4.2, (4.4) can be rewritten as follows:

D^max_f(ρ‖σ) = inf { D^max_f(ρ_*‖σ) + f̂(0) tr(ρ − ρ_*) }
= inf { D^max_f(ρ_*‖σ) + f̂(0) tr(ρ̃ − ρ_*) } + f̂(0) tr(ρ − ρ̃)
= D^max_f(ρ̃‖σ) + f̂(0) tr(ρ − ρ̃),   (4.6)

where ρ_* moves over all the operators with ρ ≥ ρ_* ≥ 0 and supp ρ_* ⊂ supp σ, or equivalently, ρ̃ ≥ ρ_* ≥ 0 and supp ρ_* ⊂ supp σ.

To list all the reverse tests, the commutative Radon-Nikodym derivative is useful. Given ρ_* ≥ 0 with supp ρ_* ⊂ supp σ, the commutative Radon-Nikodym derivative with respect to σ is defined by

d(ρ_*, σ) := σ^{-1/2} ρ_* σ^{-1/2}.   (4.7)

Suppose ρ_* ≤ ρ and let {M_x} be a resolution of identity into positive operators with

d(ρ_*, σ) = Σ_{x∈X∖{x_0}} r_x M_x,   Σ_{x∈X∖{x_0}} M_x = 1.

Then

S_x = σ^{1/2} M_x σ^{1/2}, if x ≠ x_0;  ρ − ρ_*, if x = x_0.

Therefore,

{q(x), p(x)} = {tr σM_x, r_x q(x)}, if x ≠ x_0;  {0, tr(ρ − ρ_*)}, if x = x_0,   (4.8)

and

Γ(δ_x) := (1/q(x)) σ^{1/2} M_x σ^{1/2}, if x ≠ x_0;  (1/tr(ρ − ρ_*)) (ρ − ρ_*), if x = x_0.   (4.9)

Thus, {M_x, r_x} and ρ_* specify a reverse test.

When the M_x's are projectors and ρ_* = ρ̃, we say the corresponding reverse test is minimal. The minimal reverse test turns out to be optimal under certain natural conditions on f (namely, the condition (F) below).

5 Properties of D^max_f

Theorem 5.1
When f satisfies (FC), D^max_f has the following properties.

(i) D^max_f is jointly convex: if ρ = Σ_i c_i ρ_i, σ = Σ_i c_i σ_i, Σ_i c_i = 1 (c_i ≥ 0), then

D^max_f(ρ‖σ) ≤ Σ_i c_i D^max_f(ρ_i‖σ_i).   (5.1)

(ii) If f(0) = 0 in addition, it is monotone decreasing in the second argument:

D^max_f(ρ‖X) ≤ D^max_f(ρ‖σ),   X ≥ σ.   (5.2)

(iii) D^max_f is positively homogeneous:

D^max_f(cρ‖cσ) = c D^max_f(ρ‖σ),   c ≥ 0.   (5.3)

In particular,

D^max_f(0‖0) = 0.   (5.4)

(iv) Direct sum property:

D^max_f(ρ_0 ⊕ ρ_1 ‖ σ_0 ⊕ σ_1) = D^max_f(ρ_0‖σ_0) + D^max_f(ρ_1‖σ_1),   (5.5)

where ρ_i, σ_i are supported on H_i (i = 0, 1), and H_0 ⊥ H_1.

Proof. (i): Let (Γ_i, {p_i, q_i}) be reverse tests of {ρ_i, σ_i}, where p_i, q_i are positive measures over the finite sets X_i. Then the 'mixture' of these reverse tests with probabilities c_i composes a reverse test (Γ, {p, q}) of {ρ, σ}: let X = ⋃_i X_i (disjoint union), and define

p(x) := c_i p_i(x),  q(x) := c_i q_i(x),  Γ(δ_x) = Γ_i(δ_x),  (x ∈ X_i).

Then,

D_f(p‖q) = Σ_i c_i D_f(p_i‖q_i).

Therefore, minimizing over all the reverse tests of {ρ_i, σ_i}, we obtain (5.1).

(ii): Let X′ := X − σ ≥ 0, and take finite sets X_0 ⊃ supp q ∪ supp p and X_1 with X_0 ∩ X_1 = ∅. Then

D^max_f(ρ‖X) = D^max_f(ρ‖σ + X′)
≤ inf { D_f(p‖q + q′) ; Γ(p) = ρ, Γ(q) = σ, Γ(q′) = X′, supp q′ = X_1 }
= inf { D_f(p‖q) ; Γ(p) = ρ, Γ(q) = σ, Γ(q′) = X′, supp q′ = X_1 }
= D^max_f(ρ‖σ),

where the identity in the third line is due to: since X_0 ∩ X_1 = ∅ and f(0) = 0,

Σ_{x∈X_0∪X_1} g_f( p(x), q(x) + q′(x) ) = Σ_{x∈X_0} g_f( p(x), q(x) ) + Σ_{x∈X_1} g_f( 0, q′(x) )
= Σ_{x∈X_0} g_f( p(x), q(x) ).

(iii): Let c > 0. Then to each reverse test (Γ, {p, q}) of {ρ, σ} corresponds the reverse test (Γ, {cp, cq}) of {cρ, cσ}, and vice versa. Hence, due to the fact that D_f is positively homogeneous, we have the identity. When c = 0, we only have to show that the LHS is 0. In fact, if (Γ, {p, q}) is an arbitrary reverse test of {0, 0}, supp p and supp q are empty, and D_f(p‖q) = 0. Thus D^max_f(0‖0) = 0.

(iv): "≤" is trivial. Thus, we show "≥". Let (Γ, {p, q}) be a reverse test of {ρ, σ} = {ρ_0 ⊕ ρ_1, σ_0 ⊕ σ_1}, and define

Γ_i(δ_x) := (1/ tr π_{H_i} Γ(δ_x)) π_{H_i} Γ(δ_x) π_{H_i},
p_i(x) := p(x) tr π_{H_i} Γ(δ_x),  q_i(x) := q(x) tr π_{H_i} Γ(δ_x).

Then (Γ_i, {p_i, q_i}) is a reverse test of {ρ_i, σ_i} (i = 0, 1). Since g_f is positively homogeneous,

D_f(p_0‖q_0) + D_f(p_1‖q_1)
= Σ_{i=0,1} Σ_{x∈X} g_f( p(x) tr π_{H_i} Γ(δ_x), q(x) tr π_{H_i} Γ(δ_x) )
= Σ_{x∈X} Σ_{i=0,1} tr π_{H_i} Γ(δ_x) · g_f( p(x), q(x) )
= Σ_{x∈X} g_f( p(x), q(x) ) = D_f(p‖q).

Thus,

inf D_f(p‖q) = inf { D_f(p_0‖q_0) + D_f(p_1‖q_1) } ≥ inf D_f(p_0‖q_0) + inf D_f(p_1‖q_1),

which leads to the asserted inequality. After all, we have (5.5). □

Lemma 5.2
Suppose f satisfies (FC). Then the convex function (ρ, σ) → D^max_f(ρ‖σ) is proper. Thus, it is nowhere −∞.

Proof.
An improper convex function is necessarily infinite except perhaps at relative boundary points of its effective domain (Theorem 7.2 of [23]). But

D^max_f(p‖p) = D_f(p‖p) = Σ_{x∈X} p(x) f(1)

is finite. Thus D^max_f cannot be improper. □

Theorem 5.3
Suppose f satisfies (FC). Then D^max_f(ρ‖σ) < ∞ only in the following four cases:

(i) f̂(0) < ∞ and f(0) < ∞;
(ii) f̂(0) < ∞, f(0) = ∞, and supp ρ ⊃ supp σ;
(iii) f̂(0) = ∞, f(0) < ∞, and supp ρ ⊂ supp σ;
(iv) f̂(0) = ∞, f(0) = ∞, and supp ρ = supp σ.

Proof.
In all these cases, if (Γ, {p, q}) is the minimal reverse test of {ρ, σ}, D_f(p‖q) < ∞. Thus, below we show D^max_f(ρ‖σ) = ∞ in the case where these conditions are not true.

Suppose f̂(0) = ∞ and supp ρ ⊄ supp σ. Then by (4.6), D^max_f(ρ‖σ) = ∞, since

D^max_f(ρ̃‖σ) ≥ D^max_{ar+b}(ρ̃‖σ) = a tr ρ̃ + b tr σ > −∞,

where a, b are chosen so that f(r) ≥ ar + b, r ≥ 0. Suppose f(0) = ∞ and supp ρ ⊉ supp σ. Then D^max_f(ρ‖σ) = ∞ is concluded by replacing f by f̂ in the above argument. □

6 The case where f is operator convex

In this section, we suppose that f is operator convex and f(0) = 0, in addition to satisfying (FC):

(F) f is proper, lower semicontinuous, and operator convex. In addition, dom f = [0, ∞) and f(0) = 0.

If this is true and f̂(0) < ∞, by Proposition 10.3,

f(r) = f̂(0) r + f_1(r),   (6.1)

where f_1(r) satisfies (F) and is operator monotone decreasing.

When supp σ ⊃ supp ρ, by the correspondence (4.8) and (4.9),

D^max_f(ρ‖σ) = inf_{{M_x},{r_x}} { Σ_{x∈X} f(r_x) tr σM_x ; Σ_{x∈X} r_x M_x = d(ρ, σ), Σ_{x∈X} M_x = 1 }.   (6.2)

Here we use a Naimark extension. Denoting the extended space by H′, and letting V be an isometry from H (where ρ, σ, etc. are living) into H′, there is a tuple of mutually orthogonal projectors {E_x} in H′ with V† E_x V = M_x. Therefore,

Σ_{x∈X} f(r_x) tr σM_x = tr σ V† f( Σ_{x∈X} r_x E_x ) V
≥ tr σ f( V† ( Σ_{x∈X} r_x E_x ) V ) = tr σ f( Σ_{x∈X} r_x M_x ) = tr σ f( d(ρ, σ) ),

where the inequality in the second line is by Jensen's inequality, Proposition 10.2 (note X → V† X V is a positive unital map into B(H)). The identity holds if the M_x's are mutually orthogonal projectors and H′ = H, i.e., if the reverse test is minimal. Thus,

D^max_f(ρ‖σ) = tr σ f( d(ρ, σ) ),

and the identity is achieved by the minimal reverse test.

Next, suppose supp σ ⊉ supp ρ. If f̂(0) = ∞, by Theorem 5.3, D^max_f(ρ‖σ) = ∞. If f̂(0) < ∞, we can apply (4.6). After all:

Theorem 6.1

D^max_f(ρ‖σ) = tr σ f( d(ρ̃, σ) ) + f̂(0) tr(ρ − ρ̃)   (6.3)
= tr σ f( σ^{-1/2} ρ̃ σ^{-1/2} ) + f̂(0) tr(ρ − ρ̃)

holds if (F) is true. (If f̂(0) = ∞, both ends are ∞ and the identity holds.) The minimum is achieved by the minimal reverse test of {ρ, σ}.

Throughout this subsection, we suppose supp σ ⊃ supp ρ.
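For invertible σ, so that ρ̃ = ρ and the second term of (6.3) vanishes, Theorem 6.1 reads D^max_f(ρ‖σ) = tr σ f(σ^{-1/2} ρ σ^{-1/2}). A small numerical sketch (helper names are ours) checks that for commuting (diagonal) states the formula collapses to the classical f-divergence, as (D2) requires:

```python
import numpy as np

def mfunc(A, f):
    # apply a scalar function f to a Hermitian matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def d_max(rho, sigma, f):
    # closed formula for invertible sigma: tr sigma f(sigma^{-1/2} rho sigma^{-1/2})
    s_inv_half = mfunc(sigma, lambda w: w ** -0.5)
    d = s_inv_half @ rho @ s_inv_half       # commutative Radon-Nikodym derivative
    return np.trace(sigma @ mfunc(d, f)).real

# f(r) = r ln r, with f(0) = 0, applied entrywise to an eigenvalue array
f = lambda w: np.where(w > 0, w * np.log(np.where(w > 0, w, 1.0)), 0.0)

# commuting (diagonal) states: D^max reduces to the classical f-divergence
rho_c = np.diag([0.5, 0.5])
sigma_c = np.diag([0.9, 0.1])
quantum = d_max(rho_c, sigma_c, f)
classical = sum(qx * (px / qx) * np.log(px / qx)
                for px, qx in zip([0.5, 0.5], [0.9, 0.1]))
```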
With f_KL(r) := r log r,

D^max_{f_KL}(ρ‖σ) = tr σ σ^{-1}ρ log( σ^{-1}ρ ) = tr ρ log( σ^{-1}ρ ) = tr ρ log( ρ^{1/2} σ^{-1} ρ^{1/2} ).

This quantity, corresponding to the Kullback-Leibler divergence, had been studied by various authors [4][9][13][10][19]. The relation to the reverse test problem was first pointed out by [16].

Define f_α(r) := (±1) r^α, where the sign is chosen so that the function is convex on the positive half-line. This is operator convex if α ∈ [−1, 2] ∖ {0, 1}, and

D^max_{f_α}(ρ‖σ) = tr σ f_α( σ^{-1}ρ ).

One can check the following identity, which is a special case of D^max_f(ρ‖σ) = D^max_{f̂}(σ‖ρ):

D^max_{f_α}(ρ‖σ) = D^max_{f_{1−α}}(σ‖ρ).

When (F) is true, the following operator valued quantity g_f(ρ, σ), called the non-commutative perspective [6][7], satisfies tr g_f(ρ, σ) = D^max_f(ρ‖σ) and operator versions of some properties of D^max_f:

g_f(ρ, σ) :=
  √σ f( σ^{-1/2} ρ σ^{-1/2} ) √σ,        if supp σ ⊃ supp ρ,
  g_f(ρ̃, σ) + f̂(0)(ρ − ρ̃),            if supp σ ⊉ supp ρ and f̂(0) < ∞,
  undefined,                             otherwise.

In the second case, since f̂(0) < ∞, by (6.1),

g_f(ρ, σ) = g_f(ρ̃, σ) + f̂(0)(ρ − ρ̃) = g_{f_1}(ρ̃, σ) + f̂(0) ρ
= inf { g_{f_1}(ρ_*, σ) + f̂(0) ρ ; 0 ≤ ρ_* ≤ ρ, supp σ ⊃ supp ρ_* }.   (6.4)

Remark 6.2 In [6][7], they define g_f(ρ, σ) only for the case where supp σ ⊃ supp ρ, and prove various properties of the quantity, including the ones presented below. The most important one, which is used later, is the operator version of (D1):
Lemma 6.3 (i) For any positive trace preserving map Λ, we have

Λ( g_f(ρ, σ) ) ≥ g_f( Λ(ρ), Λ(σ) ).   (6.5)

(ii) If g_f(Λ(ρ), Λ(σ)) = Λ(g_f(ρ, σ)), then Λ(ρ̃) is the largest positive operator supported on supp Λ(σ) and majorized by Λ(ρ). Thus,

Λ( g_f(ρ̃, σ) ) = g_f( Λ(ρ̃), Λ(σ) ).   (6.6)

Proof.
For a given positive trace preserving map Λ, define

Λ_σ(X) := {Λ(σ)}^{-1/2} Λ( σ^{1/2} X σ^{1/2} ) {Λ(σ)}^{-1/2},   (6.7)

which is a positive unital map into B(supp Λ(σ)):

Λ_σ(1) = π_{Λ(σ)},   (6.8)

and

Λ_σ( d(ρ, σ) ) = d( Λ(ρ), Λ(σ) ).   (6.9)

If supp σ ⊃ supp ρ, since Λ is positive, Λ(ρ) is supported on supp Λ(σ) and d(Λ(ρ), Λ(σ)) exists. Also,

g_f( Λ(ρ), Λ(σ) ) =_{(a)} Λ(σ)^{1/2} f( Λ_σ(d(ρ, σ)) ) Λ(σ)^{1/2}
≤_{(b)} Λ(σ)^{1/2} Λ_σ( f(d(ρ, σ)) ) Λ(σ)^{1/2} = Λ( σ^{1/2} f(d(ρ, σ)) σ^{1/2} ),   (6.10)

where (a) and (b) are by (6.9) and Proposition 10.2, respectively. If supp σ ⊉ supp ρ and f̂(0) < ∞,

g_f( Λ(ρ), Λ(σ) )
= inf_{ρ_*} { g_{f_1}( ρ_*, Λ(σ) ) + f̂(0) Λ(ρ) ; Λ(ρ) ≥ ρ_* ≥ 0, supp Λ(σ) ⊃ supp ρ_* }
≤ g_{f_1}( Λ(ρ̃), Λ(σ) ) + f̂(0) Λ(ρ)
≤ Λ( g_{f_1}(ρ̃, σ) ) + f̂(0) Λ(ρ)
= Λ( g_{f_1}(ρ̃, σ) + f̂(0) ρ ) = Λ( g_f(ρ, σ) ),

where the inequality in the fourth line is by (6.10) (recall ρ̃ as of (4.5) is supported on supp σ).

Therefore, if g_f(Λ(ρ), Λ(σ)) = Λ(g_f(ρ, σ)), Λ(ρ̃) should achieve the infimum in the second line. Thus the first statement of (ii) is true. Then,

g_f( Λ(ρ), Λ(σ) ) = g_f( Λ(ρ̃), Λ(σ) ) + f̂(0) Λ(ρ − ρ̃).

Equating this to Λ(g_f(ρ, σ)) = Λ(g_f(ρ̃, σ)) + f̂(0) Λ(ρ − ρ̃), we have (6.6). □

Operator versions of (5.3) and (5.5) are trivial. Thus next we show the operator version of (5.1):

g_f(ρ, σ) ≤ Σ_i c_i g_f(ρ_i, σ_i),   (6.11)

where ρ := Σ_i c_i ρ_i, σ := Σ_i c_i σ_i, Σ_i c_i = 1, and c_i ≥ 0. (6.11) for the case supp σ ⊃ supp ρ is known [6]. If f̂(0) < ∞,

Σ_i c_i g_f(ρ_i, σ_i) = Σ_i c_i g_f(ρ̃_i, σ_i) + f̂(0) Σ_i c_i (ρ_i − ρ̃_i)
= Σ_i c_i g_{f_1}(ρ̃_i, σ_i) + f̂(0) Σ_i c_i ρ_i
≥ g_{f_1}( Σ_i c_i ρ̃_i, Σ_i c_i σ_i ) + f̂(0) Σ_i c_i ρ_i.

Above, since supp σ = span{ ∪_i supp σ_i }, Σ_i c_i ρ̃_i is supported on supp σ. Thus the last expression is well-defined. Also, Σ_i c_i ρ̃_i ≤ Σ_i c_i ρ_i = ρ. Thus, the last expression is bounded from below by

inf { g_{f_1}(ρ_*, σ) + f̂(0) ρ ; ρ ≥ ρ_* ≥ 0, supp σ ⊃ supp ρ_* } = g_{f_1}(ρ̃, σ) + f̂(0) ρ = g_f(ρ, σ),

concluding (6.11).

Lastly, the analogue of (5.2) is

g_f(ρ, X) ≤ g_f(ρ, σ),   X ≥ σ.   (6.12)

This is proved as follows. Since X ≥ σ, C := σ^{1/2} X^{-1/2} satisfies C X^{1/2} = σ^{1/2}, ‖C‖ ≤ 1. If supp X ⊃ supp σ ⊃ supp ρ,

g_f(ρ, σ) = X^{1/2} C† f( d(ρ, σ) ) C X^{1/2}
≥ X^{1/2} f( C† d(ρ, σ) C ) X^{1/2} = X^{1/2} f( X^{-1/2} ρ X^{-1/2} ) X^{1/2} = g_f(ρ, X),

where the inequality in the second line is due to Proposition 10.1. If f̂(0) < ∞ and supp σ ⊉ supp ρ,

g_f(ρ, σ) = g_{f_1}(ρ̃, σ) + f̂(0) ρ ≥ g_{f_1}(ρ̃, X) + f̂(0) ρ
≥ inf { g_{f_1}(ρ_*, X) + f̂(0) ρ ; 0 ≤ ρ_* ≤ ρ, supp X ⊃ supp ρ_* } = g_f(ρ, X). □

Remark 6.4 Here, ρ̃ is not necessarily the largest element of the set

{ ρ_* ; 0 ≤ ρ_* ≤ ρ, supp X ⊃ supp ρ_* }.

Thus, in general, the equality between the second and the third lines does not hold.

D^max_f is closely related to the RLD Fisher metric

J^R_ρ(X, Y) := tr X ρ^{-1} Y,

where X and Y are self-adjoint operators living in the support of ρ, with tr X = tr Y = 0. This quantity plays an important role in quantum statistical estimation theory [14], and is the largest monotone metric on the space of density operators [22].
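A minimal numerical sketch of the RLD metric (variable names are ours): J^R_ρ(X, X) = tr X ρ^{-1} X for a traceless self-adjoint X, together with a monotonicity check under a completely dephasing map, which is positive and trace preserving, so a monotone metric cannot increase under it.

```python
import numpy as np

def rld_metric(rho, X):
    # J^R_rho(X, X) = tr X rho^{-1} X for traceless self-adjoint X
    return np.trace(X @ np.linalg.inv(rho) @ X).real

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
X = np.array([[0.1, 0.05], [0.05, -0.1]])   # self-adjoint, tr X = 0

j_full = rld_metric(rho, X)

# completely dephasing map: keep only the diagonal entries
j_dephased = rld_metric(np.diag(np.diag(rho)), np.diag(np.diag(X)))
```

After dephasing, the metric equals the classical Fisher-type quantity Σ_i X_{ii}²/ρ_{ii}, and it does not exceed `j_full`.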
Also, this quantity is the solution to an infinitesimal version of the reverse test [16]. The triple (Γ, p, v) of a positive trace preserving map Γ from positive measures to self-adjoint operators, a probability distribution p over the finite set X, and a real valued function v over X with Σ_{x∈X} v(x) = 0, is said to be a reverse estimation of {ρ, X} iff

Γ(p) = ρ,  Γ(v) = X.

Then

J^R_ρ(X, X) = inf { Σ_{x∈X} {v(x)}² / p(x) ; (Γ, p, v) is a reverse estimation of {ρ, X} }.

Here, the function minimized is the classical Fisher information, which plays a significant role in point estimation.

This problem reduces to our reverse test problem of {ρ + εX, ρ}, where ε is chosen so that ρ + εX ≥ 0, and f(r) = (1 − r)². For this f, (Γ, {p + εv, p}) is the minimal reverse test of {ρ + εX, ρ}.

Below, we prove

f″(1) J^R_ρ(X, X) = (d²/dε²) D^max_f(ρ + εX ‖ ρ) |_{ε=0} = (d²/dε²) D^max_f(ρ ‖ ρ + εX) |_{ε=0}   (6.13)

when f‴ exists and is uniformly bounded in the sense that

|f‴(x)| < c,  ∃ε > 0, ∀x ∈ (1 − ε, 1 + ε).

(Differentiating (6.3) twice, one can also obtain (6.13).)

The key observation is: the minimal reverse tests of {ρ + εX, ρ} are the same for all ε > 0 with ρ + εX ≥ 0. Therefore, differentiating both sides of

D^max_f(ρ + εX ‖ ρ) = D_f(p + εv ‖ p)

twice, the well-known relation between Fisher information and f-divergence leads to the first identity. The second identity follows from Corollary 6.10 (which will be shown later), which states that the minimal reverse tests of {ρ + εX, ρ} and {ρ, ρ + εX} are identical.

In this section, we show that any optimal reverse test is essentially identical to the minimal reverse test, provided that (F) is satisfied. First, we show some technical lemmas. (They themselves are of interest. We show another application of them in the next section.)

A function f with (F), by Theorem 8.1 of [12], is written as

f(r) = cr + br² + ∫_{(0,∞)} ( r/(1+λ) + ψ_λ(r) ) dμ(λ),   (6.14)

where c is a real number, b ≥ 0, μ is a positive Borel measure with ∫_{(0,∞)} dμ(λ)/(1+λ)² < ∞, and

ψ_λ(r) := − r/(λ + r).

In what follows, we suppose

[0, ∞) ⊂ supp μ ∪ {0},   (6.15)

where supp μ is the set of all points λ having the property that μ(U) > 0 for any open set U containing λ (see Theorem 2.2.1 and Definition 2.2.1 of [21]). r ln r and ±r^α (−1 ≤ α ≤ 2, α ≠ 0, 1) satisfy (6.15) (see Example 8.3, [12]).

Lemma 6.5
Suppose $f$ satisfies (F) and (6.15), and suppose also $D^{\max}_f(\rho\|\sigma) < \infty$. Let $\Lambda$ be a positive trace preserving map. Then,
$$D^{\max}_f(\rho\|\sigma) = D^{\max}_f(\Lambda(\rho)\|\Lambda(\sigma)) \qquad (6.16)$$
implies
$$\Lambda_\sigma(h(d(\tilde\rho,\sigma))) = h(\Lambda_\sigma(d(\tilde\rho,\sigma))). \qquad (6.17)$$
Here, $\Lambda_\sigma$ is the subunital positive map defined by (6.7), and $h$ is an arbitrary function on $[0,\infty)$. Conversely, if (6.17) holds, (6.16) holds for any $f$ with (F). In fact, for any function $h$ on $[0,\infty)$,
$$\operatorname{tr}\sigma\, h(d(\tilde\rho,\sigma)) = \operatorname{tr}\Lambda(\sigma)\, h(d(\Lambda(\tilde\rho),\Lambda(\sigma))). \qquad (6.18)$$

Proof.
By (6.5), (6.16) and $D^{\max}_f(\rho\|\sigma) < \infty$ imply
$$\Lambda(g_f(\rho,\sigma)) = g_f(\Lambda(\rho),\Lambda(\sigma)). \qquad (6.19)$$
First, suppose $\operatorname{supp}\rho \subset \operatorname{supp}\sigma$. Observe, by (6.14),
$$g_f(\rho,\sigma) = c\rho + b\, g_{f_2}(\rho,\sigma) + \int_{(0,\infty)} \Big(\frac{\rho}{1+\lambda} + g_{\psi_\lambda}(\rho,\sigma)\Big)\, \mathrm{d}\mu(\lambda), \qquad (6.20)$$
where $f_2(r) := r^2$. Since $f_2$ and $\psi_\lambda(r) = -\frac{r}{\lambda+r}$ satisfy (F), (6.5), (6.19) and (6.15) lead to
$$\Lambda(g_{\psi_t}(\rho,\sigma)) = g_{\psi_t}(\Lambda(\rho),\Lambda(\sigma)), \quad \forall t > 0.$$
This, by Proposition 10.4, implies
$$\Lambda(g_h(\rho,\sigma)) = g_h(\Lambda(\rho),\Lambda(\sigma)), \qquad (6.21)$$
which, using (6.9), implies (6.17). Next, suppose $\operatorname{supp}\rho \not\subset \operatorname{supp}\sigma$. Then for $D^{\max}_f(\rho\|\sigma) < \infty$ to be true, $\hat f(0) < \infty$ should hold. Therefore, by (6.6),
$$\Lambda(g_f(\tilde\rho,\sigma)) = g_f(\Lambda(\tilde\rho),\Lambda(\sigma)).$$
Therefore, using the parallel argument as above, we have (6.17). The second assertion of the lemma is proved by straightforward computation.
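As an aside, the classical identity behind (6.13) — that the second derivative of an $f$-divergence along a path $p + \varepsilon v$ reproduces $f''(1)$ times the Fisher information $\sum_x v(x)^2/p(x)$ — is easy to check numerically. A minimal sketch, assuming $f(r) = r\ln r$ (so $f''(1)=1$); the concrete distribution and tangent vector are illustrative choices, not from the text:

```python
import numpy as np

def D_f(p, q):
    # classical f-divergence for f(r) = r*log(r): sum_x q(x) f(p(x)/q(x))
    r = p / q
    return float(np.sum(q * r * np.log(r)))

p = np.array([0.2, 0.3, 0.5])
v = np.array([0.10, -0.04, -0.06])   # tangent vector; components sum to zero
assert abs(v.sum()) < 1e-12

eps = 1e-4
# central second difference of eps -> D_f(p + eps*v || p) at eps = 0
second = (D_f(p + eps*v, p) - 2*D_f(p, p) + D_f(p - eps*v, p)) / eps**2
fisher = float(np.sum(v**2 / p))     # classical Fisher information of the path

# f''(1) = 1 for r ln r, so the two quantities should agree
assert abs(second - fisher) < 1e-6
```

The same computation with any other smooth $f$ normalized by $f(1)=0$ differs only by the factor $f''(1)$.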
Lemma 6.6
Suppose $f$ satisfies (F) and (6.15), and suppose also $D^{\max}_f(\rho\|\sigma) < \infty$. Let $\Lambda$ be a positive trace preserving map. Also, let $(\Gamma, \{p,q\})$ and $(\Gamma', \{p',q'\})$ be the minimal reverse tests of $\{\rho,\sigma\}$ and $\{\Lambda(\rho), \Lambda(\sigma)\}$, respectively. Then, (6.16) holds iff
$$\Lambda(\Gamma(\delta_x)) = \Gamma'(\delta_x), \quad \{p,q\} = \{p',q'\}. \qquad (6.22)$$

Proof.
Since ‘if’ is trivial, we prove ‘only if’. Recall that the minimal reverse test is given by
$$\Gamma(\delta_{x_0}) = \frac{1}{\operatorname{tr}(\rho-\tilde\rho)}(\rho-\tilde\rho), \quad \Gamma(\delta_x) = \frac{1}{\operatorname{tr}\sigma P_x}\sqrt{\sigma}\,P_x\sqrt{\sigma}, \quad (x \ne x_0),$$
where $d(\tilde\rho,\sigma) = \sum_x d_x P_x$, with $d_x$ and $P_x$ an eigenvalue and the projection onto its eigenspace, respectively. Let $P'_x := \Lambda_\sigma(P_x)$, and apply (6.17) with
$$h(r) := \begin{cases} 1, & (r = d_x), \\ 0, & \text{otherwise}. \end{cases}$$
Then we have
$$P'_x = \Lambda_\sigma(P_x) = \Lambda_\sigma(h(d(\tilde\rho,\sigma))) = h(\Lambda_\sigma(d(\tilde\rho,\sigma))).$$
Since the eigenvalues of $h(\Lambda_\sigma(d))$ are either 0 or 1, $P'_x$ is a projector. Since
$$d(\Lambda(\tilde\rho),\Lambda(\sigma)) = \Lambda_\sigma(d(\tilde\rho,\sigma)) = \sum_x d_x \Lambda_\sigma(P_x) = \sum_x d_x P'_x,$$
the $P'_x$'s are the projectors onto the eigenspaces of $d'$, and $\operatorname{spec} d = \operatorname{spec} d'$. Therefore, if $x \ne x_0$,
$$\Lambda(\Gamma(\delta_x)) = \frac{1}{\operatorname{tr}\sigma P_x}\Lambda\big(\sqrt{\sigma}\,P_x\sqrt{\sigma}\big) = \frac{1}{\operatorname{tr}\sigma P_x}\sqrt{\Lambda(\sigma)}\,\Lambda_\sigma(P_x)\sqrt{\Lambda(\sigma)} = \frac{1}{\operatorname{tr}\sigma P_x}\sqrt{\Lambda(\sigma)}\,P'_x\sqrt{\Lambda(\sigma)} = \Gamma'(\delta_x).$$
If $x = x_0$,
$$\Gamma'(\delta_{x_0}) = \frac{1}{\operatorname{tr}\Lambda(\rho-\tilde\rho)}\Lambda(\rho-\tilde\rho) = \Lambda\Big(\frac{1}{\operatorname{tr}(\rho-\tilde\rho)}(\rho-\tilde\rho)\Big) = \Lambda(\Gamma(\delta_{x_0})).$$
Therefore, if $(\Gamma', \{p',q'\})$ is the minimal reverse test of $\{\Lambda(\rho),\Lambda(\sigma)\}$, $\Gamma' = \Lambda\circ\Gamma$. Having specified the map $\Gamma'$, next we specify $\{p',q'\}$. For $(\Lambda\circ\Gamma, \{p',q'\})$ to be a reverse test of $\{\Lambda(\rho),\Lambda(\sigma)\}$,
$$\sum_{x\ne x_0} q'(x)\,\Lambda\circ\Gamma(\delta_x) = \Lambda(\sigma) = \sum_{x\ne x_0} q(x)\,\Lambda\circ\Gamma(\delta_x).$$
Thus we have to have, for all $x \ne x_0$,
$$\sum_x q'(x)\sqrt{\Lambda(\sigma)}\,P'_x\sqrt{\Lambda(\sigma)} = \sum_x q(x)\sqrt{\Lambda(\sigma)}\,P'_x\sqrt{\Lambda(\sigma)}.$$
Since $P'_x$ is supported on $\operatorname{supp} d' = \operatorname{supp}\Lambda(\sigma)$, this is equivalent to
$$\sum_x q'(x)P'_x = \sum_x q(x)P'_x.$$
Since the $P'_x$'s are orthogonal projectors, we have $q' = q$. In the same way, we can prove that $p'(x) = p(x)$ for all $x \ne x_0$. Then by the trace preserving nature of $\Lambda$, obviously $p'(x_0) = p(x_0)$, concluding $p' = p$. Thus we have the assertion.

In the following, we extend the notion of the minimal reverse test to a pair $\{p,q\}$ of positive measures, by identifying $p$ with the diagonal matrix $\sum_x p(x)|e_x\rangle\langle e_x|$, where $\{|e_x\rangle\}$ is a CONS.
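The explicit form of the minimal reverse test recalled in the proof above can be sketched numerically. A minimal illustration, assuming $\sigma$ invertible and $\operatorname{supp}\rho \subset \operatorname{supp}\sigma$ (so that $\tilde\rho = \rho$); the concrete matrices are arbitrary examples, not from the text:

```python
import numpy as np

rho = np.array([[0.6, 0.2], [0.2, 0.4]])
sigma = np.array([[0.5, 0.1], [0.1, 0.5]])

# sigma^{+-1/2} via the spectral decomposition
w, V = np.linalg.eigh(sigma)
sqrt_s = V @ np.diag(np.sqrt(w)) @ V.T
isqrt_s = V @ np.diag(w ** -0.5) @ V.T

# d(rho, sigma) = sigma^{-1/2} rho sigma^{-1/2} = sum_x d_x P_x
d = isqrt_s @ rho @ isqrt_s
dvals, U = np.linalg.eigh(d)
P = [np.outer(U[:, i], U[:, i]) for i in range(len(dvals))]

# Gamma(delta_x) = sqrt(sigma) P_x sqrt(sigma) / tr(sigma P_x),
# with q(x) = tr(sigma P_x) and p(x) = d_x q(x)
q = np.array([np.trace(sigma @ Px) for Px in P])
p = dvals * q
Gamma = [sqrt_s @ Px @ sqrt_s / qx for Px, qx in zip(P, q)]

# the reverse test reproduces the pair {rho, sigma}
assert np.allclose(sum(qx * G for qx, G in zip(q, Gamma)), sigma)
assert np.allclose(sum(px * G for px, G in zip(p, Gamma)), rho)
assert abs(p.sum() - 1) < 1e-12 and abs(q.sum() - 1) < 1e-12
```

The reconstruction identities follow from $\sum_x P_x = \mathbf{1}$ and $\sum_x d_x P_x = d$, so they hold for any invertible $\sigma$ in this sketch.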
Let $(\Upsilon, \{p_0, q_0\})$ be the minimal reverse test of $\{p,q\}$, where $\{p_0, q_0\}$ are positive measures over the finite set $\mathcal{Y}$. Then $\Upsilon$ is in fact a stochastic map, but at the same time it is viewed as a positive trace preserving map sending diagonal density matrices to diagonal density matrices. With this correspondence, the equivalent $\tilde p$ of $\tilde\rho$ (see (4.5)) is in fact the restriction of $p$ to $\operatorname{supp} q$, and
$$d(\tilde p, q) = \sum_{x\in\operatorname{supp} q} \frac{p(x)}{q(x)}\,|e_x\rangle\langle e_x| = \sum_{y\in\mathcal{Y}\setminus\{y_0\}} r_y P_y, \quad \operatorname{supp} P_y = \operatorname{span}\{|e_x\rangle\,;\,p(x)/q(x) = r_y\}.$$
Viewing $\Upsilon$ as a stochastic map, $y$ ($\ne y_0$) is mapped to $x$ iff $p(x)/q(x) = r_y$, and $y_0$ is mapped to $x \in (\operatorname{supp} q)^c$. (The detailed form of the transition probability is not relevant now.)

Lemma 6.7 Let $(\Upsilon, \{p_0, q_0\})$ be the minimal reverse test of $\{p,q\}$, where $\{p_0, q_0\}$ are positive measures over the finite set $\mathcal{Y}$. Then, there is a positive trace preserving map $\Upsilon^{-1}$ that inverts $\Upsilon$:
$$\Upsilon^{-1}(p) = p_0, \quad \Upsilon^{-1}(q) = q_0.$$
In addition, $\Upsilon^{-1}$ is deterministic. Therefore, $\Upsilon^{-1}\circ\Upsilon(\delta_y) = \delta_y$.

Proof. $\Upsilon^{-1}$ corresponds to the following deterministic map from $\mathcal{X}$ to $\mathcal{Y}$: $x \in \operatorname{supp} q$ is mapped to $y$ iff $p(x)/q(x) = r_y$, and $x \in (\operatorname{supp} q)^c$ is mapped to $y_0$.

Remark 6.8
In the statistician's terms, $r_y$ is a likelihood ratio, and thus $y$ is a minimal sufficient statistic of the family $\{p,q\}$ [25]. Roughly, this means that $y$ contains all the information about the family $\{p,q\}$, and is the smallest among the statistics having this property. Thus, $\{p_0, q_0\}$ is a kind of "compression" of $\{p,q\}$. In fact, the map from $\{p,q\}$ to $\{p_0, q_0\}$ is deterministic, while its inverse is noisy. Lemmas 6.6 and 6.7 indicate that the optimal reverse test is essentially unique.
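The compression described in this remark can be illustrated classically: merging outcomes with equal likelihood ratio $p(x)/q(x)$ is deterministic and loses no $f$-divergence, for any $f$. A small sketch (the distributions and the choice $f(r)=r\ln r$ are illustrative assumptions):

```python
import numpy as np
from collections import defaultdict

p = np.array([0.10, 0.20, 0.30, 0.40])
q = np.array([0.05, 0.10, 0.60, 0.25])

# deterministic map x -> y: merge outcomes sharing the same ratio p(x)/q(x)
groups = defaultdict(list)
for x in range(len(p)):
    groups[round(p[x] / q[x], 12)].append(x)

p0 = np.array([sum(p[x] for x in g) for g in groups.values()])
q0 = np.array([sum(q[x] for x in g) for g in groups.values()])

def D_f(a, b):
    # classical f-divergence with f(r) = r*log(r) (an example choice)
    r = a / b
    return float(np.sum(b * r * np.log(r)))

# sufficiency: the compression is strict here (x = 0, 1 share the ratio 2),
# yet no f-divergence is lost
assert len(p0) < len(p)
assert abs(D_f(p, q) - D_f(p0, q0)) < 1e-12
```

Invariance is exact because on each merged cell the ratio is constant, so $\sum_{x\in g} q(x) f(p(x)/q(x)) = q_0(y) f(p_0(y)/q_0(y))$.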
Theorem 6.9
Suppose $D^{\max}_f(\rho\|\sigma) < \infty$, where $f$ is a function with (F), and $\mu$ defined by (6.14) satisfies (6.15). Let $(\Gamma, \{p,q\})$ be an optimal reverse test:
$$D^{\max}_f(\rho\|\sigma) = D_f(p\|q). \qquad (6.23)$$
If $(\Gamma_0, \{p_0, q_0\})$ is the minimal reverse test of $\{\rho,\sigma\}$, there are a CPTP map $\Upsilon$ and its inverse $\Upsilon^{-1}$ with
$$\{p,q\} \;\underset{\Upsilon^{-1}}{\overset{\Upsilon}{\rightleftarrows}}\; \{p_0, q_0\}, \qquad (6.24)$$
and $\Upsilon^{-1}$ is deterministic: $\Upsilon^{-1}\circ\Upsilon(\delta_y) = \delta_y$. Therefore,
$$\Gamma = \Gamma_0\circ\Upsilon^{-1}. \qquad (6.25)$$
Before proving this, let us see its implication. (6.25) intuitively means that any optimal reverse test $\Gamma$ differs from the minimal one only in its classical preprocessing ("essential uniqueness").

Proof.
Let $(\Upsilon, \{p_1, q_1\})$ be the minimal reverse test of $\{p,q\}$. Then, taking recourse to Lemma 6.6, we have $\{p_1, q_1\} = \{p_0, q_0\}$. Therefore,
$$p = \Upsilon(p_0), \quad q = \Upsilon(q_0).$$
Also, by Lemma 6.7, there is a positive trace preserving map $\Upsilon^{-1}$ with
$$p_0 = \Upsilon^{-1}(p), \quad q_0 = \Upsilon^{-1}(q), \qquad (6.26)$$
and $\Upsilon^{-1}\circ\Upsilon(\delta_y) = \delta_y$.

The following simple statement is not easy to prove directly, but quite easy if the theorem is given.

Corollary 6.10 The minimal reverse tests of $\{\rho,\sigma\}$ and $\{\sigma,\rho\}$ are identical.

Proof.
Let us consider the operator convex function $f(r) = -\sqrt{r}$, which satisfies (F) and (6.15) (Example 8.3, [12]). Since $\hat f$ as of (2.4) satisfies $\hat f(r) = -\sqrt{r} = f(r)$, by (2.6) we have
$$D^{\max}_f(\sigma\|\rho) = D^{\max}_{\hat f}(\rho\|\sigma) = D^{\max}_f(\rho\|\sigma).$$
Thus the minimal reverse test $(\Gamma, \{p,q\})$ of $\{\sigma,\rho\}$ also achieves $D^{\max}_f(\rho\|\sigma)$. Therefore, by Theorem 6.9, $\{p_0, q_0\} = \{\Upsilon_1^{-1}(p), \Upsilon_1^{-1}(q)\}$, where $(\Gamma_0, \{p_0, q_0\})$ is the minimal reverse test of $\{\rho,\sigma\}$, and $\Upsilon_1^{-1}$ is a deterministic map. Exchanging the roles of $\{\rho,\sigma\}$ and $\{\sigma,\rho\}$, there is a deterministic map $\Upsilon_2^{-1}$ with
$$\{p,q\} = \big\{\Upsilon_2^{-1}(p_0), \Upsilon_2^{-1}(q_0)\big\}.$$
Therefore,
$$\Gamma = \Gamma_0\circ\Upsilon_1^{-1}, \quad \Gamma_0 = \Gamma\circ\Upsilon_2^{-1}.$$
Thus we have the assertion.
Corollary 6.11
Suppose $D^{\max}_f(\rho\|\sigma) < \infty$, where $f$ is a function with (F), and $\mu$ defined by (6.14) satisfies (6.15). Then, if there is a measurement $M$ taking values in the finite set $\mathcal{Z}$ such that
$$D^{\max}_f(\rho\|\sigma) = D_f\big(P^M_\rho\,\big\|\,P^M_\sigma\big),$$
$\rho$ and $\sigma$ commute.

Intuitively, this result seems trivial: it is not possible to retrieve classical information embedded in quantum states perfectly, unless the quantum states are in fact commutative (classical). However, as turns out later, this result is generally not true if $f$ is not operator convex. A counterexample is $f(r) = |1-r|$, which corresponds to the total variation distance, one of the most commonly used distance measures.

Proof.
$\{P^M_\rho, P^M_\sigma\}$ is a pair of positive measures over the finite set $\mathcal{Z}$. Let $(\Gamma, \{p,q\})$ and $(\Upsilon, \{p_1, q_1\})$ be the minimal reverse tests of $\{\rho,\sigma\}$ and $\{P^M_\rho, P^M_\sigma\}$, respectively. Apply Lemma 6.6, considering the measurement $M$ as a positive linear map. Then we have $\{p_1, q_1\} = \{p,q\}$ and
$$P^M_{\Gamma(\delta_x)} = \Upsilon(\delta_x). \qquad (6.27)$$
Since $\{P^M_\rho, P^M_\sigma\}$ are probability distributions, by Lemma 6.7 there is a positive trace preserving map $\Upsilon^{-1}$ with $\Upsilon^{-1}(\Upsilon(\delta_x)) = \delta_x$. Composing them, we obtain
$$\Upsilon^{-1}\big(P^M_{\Gamma(\delta_x)}\big) = \delta_x.$$
The composition of the measurement $M$ followed by the data processing $\Upsilon^{-1}$ can be viewed as a measurement, to which a POVM $\{\tilde M_x\}$ corresponds. Then, this can be rewritten as
$$\operatorname{tr}\tilde M_x \Gamma(\delta_x) = 1, \quad x\in\mathcal{X}.$$
Since $\operatorname{tr}\Gamma(\delta_x) = 1$, this means that $\Gamma(\delta_x)\Gamma(\delta_{x'}) = 0$ ($x'\ne x$). Therefore, $\rho = \sum_{x\in\mathcal{X}} p(x)\Gamma(\delta_x)$ and $\sigma = \sum_{x\in\mathcal{X}} q(x)\Gamma(\delta_x)$ commute.

Proposition 6.12 Suppose $f$ satisfies (F) and
$$\exists c,\ \exists\varepsilon_0 > 0:\ |f'''(x)| < c, \quad \forall x\in(1-\varepsilon_0, 1+\varepsilon_0).$$
Let $\rho_\varepsilon := \rho + \varepsilon X > 0$. Then, if there is a measurement $M_\varepsilon$ for each $\varepsilon$ taking values in the finite set $\mathcal{Z}$ such that
$$D^{\max}_f(\rho\,\|\,\rho_\varepsilon) = D_f\big(P^{M_\varepsilon}_\rho\,\big\|\,P^{M_\varepsilon}_{\rho_\varepsilon}\big),$$
then $\rho$ and $X$ commute.

Proof.
By (6.13) and by Section 9 of [18], this is equivalent to the existence of a measurement $M$ with
$$J^R_\rho(X,X) = J_{p^M}\big(v^M, v^M\big),$$
where the RHS is the Fisher information of $p^M := P^M_\rho$, and $v^M := \lim_{\varepsilon\downarrow 0}\frac{1}{\varepsilon}\big(P^M_{\rho+\varepsilon X} - P^M_\rho\big)$. But this is impossible unless $\rho$ and $X$ commute (see, for example, [15]).

Let $\{|\hat\varphi_x\rangle\,;\,x\in\mathcal{X}\}$ be a family of linearly independent state vectors. Also let $\{\tau_x\,;\,x\in\mathcal{X}\}$ be a family of density operators. The necessary and sufficient condition for the existence of a CPTP map $\Lambda$ with
$$\Lambda(\tau_x) = |\hat\varphi_x\rangle\langle\hat\varphi_x|, \quad\forall x\in\mathcal{X}, \qquad (6.28)$$
has been studied by several authors. Especially, if $\tau_x = |\varphi_x\rangle\langle\varphi_x|$, it is expressed in the following very simple form:
$$\exists A\ge 0:\ \langle\varphi_x|\varphi_{x'}\rangle = A_{x,x'}\langle\hat\varphi_x|\hat\varphi_{x'}\rangle$$
(see [5][26]). Here we show this is equivalent to
$$\Lambda(\rho) = \hat\rho, \quad \Lambda(\sigma) = \hat\sigma, \qquad (6.29)$$
where
$$\rho := \sum_x p(x)\tau_x,\quad \sigma := \sum_x q(x)\tau_x,\quad \hat\rho := \sum_x p(x)|\hat\varphi_x\rangle\langle\hat\varphi_x|,\quad \hat\sigma := \sum_x q(x)|\hat\varphi_x\rangle\langle\hat\varphi_x|,$$
and $\{p,q\}$ is a pair of probability distributions over $\mathcal{X}$ such that, with $r_x := p(x)/q(x)$,
$$0 < r_x < \infty, \quad r_x \ne r_{x'}\ (x\ne x'). \qquad (6.30)$$
That (6.28) implies (6.29) is trivial. To show the opposite implication, we take recourse to Lemma 6.6.

First, observe that $(\Gamma, \{p,q\})$ and $(\hat\Gamma, \{p,q\})$, where $\Gamma(\delta_x) := \tau_x$ and $\hat\Gamma(\delta_x) := |\hat\varphi_x\rangle\langle\hat\varphi_x|$, are reverse tests of $\{\rho,\sigma\}$ and $\{\hat\rho,\hat\sigma\}$, respectively. In addition, the latter one is minimal, since we had supposed that the $|\hat\varphi_x\rangle$'s are linearly independent and that $\{p,q\}$ satisfies (6.30). (Compute the minimal reverse test in the following manner. Define $N := \sum_x |\hat\varphi_x\rangle\langle e_x|$, $D_p := \sum_x p(x)|e_x\rangle\langle e_x|$, $D_q := \sum_x q(x)|e_x\rangle\langle e_x|$. Then, $\hat\sigma = N D_q N^\dagger$. Therefore, there is a unitary $U$ with
$$\hat\sigma^{1/2} = N D_q^{1/2} U.$$
Therefore,
$$\hat\sigma^{-1/2}\hat\rho\,\hat\sigma^{-1/2} = \sum_x r_x\, U^\dagger|e_x\rangle\langle e_x|U.$$
Therefore, the minimal reverse test maps $\delta_x$ to the constant multiple of
$$\hat\sigma^{1/2}U^\dagger|e_x\rangle\langle e_x|U\hat\sigma^{1/2} = q(x)|\hat\varphi_x\rangle\langle\hat\varphi_x|.$$
) Therefore,
$$D_f(p\|q) = D^{\max}_f(\hat\rho\|\hat\sigma) = D^{\max}_f(\Lambda(\rho)\|\Lambda(\sigma)) \le D^{\max}_f(\rho\|\sigma) \le D_f(p\|q),$$
indicating
$$D^{\max}_f(\rho\|\sigma) = D^{\max}_f(\hat\rho\|\hat\sigma) = D_f(p\|q).$$
By Theorem 6.9, the minimal reverse test of $\{\rho,\sigma\}$ should be essentially identical to $(\Gamma, \{p,q\})$. But by the assumption (6.30), $(\Gamma, \{p,q\})$ has to be the minimal reverse test. Therefore, by Lemma 6.6, (6.29) implies (6.28).

From this section on, we again remove the assumption of operator convexity and $f(0) = 0$, and come back to our initial assumption (FC). To start, we treat the case where one of the arguments is rank-1. Suppose $\sigma$ is rank-1 (the other case reduces to this one by replacing $f$ by $\hat f$), and apply (4.6). Since $\tilde\rho$ is a constant multiple of $\sigma$, we have:
$$D^{\max}_f(\rho\|\sigma) = \operatorname{tr}\sigma f\big(\sigma^{-1/2}\tilde\rho\,\sigma^{-1/2}\big) + \hat f(0)\operatorname{tr}(\rho-\tilde\rho). \qquad (7.1)$$
Though this coincides with (6.3), it holds irrespective of the assumption of operator convexity. Especially, if $\rho$ is also rank-1, $\tilde\rho = 0$, thus
$$D^{\max}_f(\rho\|\sigma) = f(0)\operatorname{tr}\sigma + \hat f(0)\operatorname{tr}\rho,$$
where $f(0)$ and/or $\hat f(0)$ may be $\infty$.

Total variation distance
The divergence corresponding to $f(r) = |1-r|$,
$$D_{|1-r|}(p\|q) = \|p-q\|_1,$$
is called the total variation distance. Its common quantum version is
$$\|\rho-\sigma\|_1 = \sup_M \big\|P^M_\rho - P^M_\sigma\big\|_1,$$
where $P^M_\rho$ is the distribution of the outcome of the measurement $M$ under $\rho$. This quantum version is in fact the smallest of all the quantum versions satisfying (D1') and (D2):
$$D^Q_{|1-r|}(\rho\|\sigma) \ge \|\rho-\sigma\|_1. \qquad (8.1)$$
Observe that (D1') and (D2) imply
$$D^Q_{|1-r|}(\rho\|\sigma) \ge \big\|P^M_\rho - P^M_\sigma\big\|_1.$$
Maximization of the RHS over $M$ leads to (8.1).

In this section we study $D^{\max}_{|1-r|}(\rho\|\sigma)$. Given a reverse test $(\Gamma, \{p,q\})$ of $\{\rho,\sigma\}$, we define $(\Gamma', \{p',q'\})$, where $\{p',q'\}$ are probability distributions on $\{0,1,2\}$:
$$\Gamma'(\delta_0) := \frac{1}{\operatorname{tr} A}A, \quad \Gamma'(\delta_1) := \frac{\rho-A}{\operatorname{tr}(\rho-A)}, \quad \Gamma'(\delta_2) := \frac{\sigma-A}{\operatorname{tr}(\sigma-A)},$$
$$p'(0) := \operatorname{tr} A, \quad p'(1) := \operatorname{tr}(\rho-A), \quad p'(2) := 0,$$
$$q'(0) := \operatorname{tr} A, \quad q'(1) := 0, \quad q'(2) := \operatorname{tr}(\sigma-A), \qquad (8.2)$$
where
$$A := \sum_{x\in\mathcal{X}} \min\{p(x), q(x)\}\,\Gamma(\delta_x).$$
Then $(\Gamma', \{p',q'\})$ is a reverse test of $\{\rho,\sigma\}$ with $\|p'-q'\|_1 = \|p-q\|_1$. Intuitively, $\Gamma'(\delta_0)$ takes care of the common part of the two states, and $\Gamma'(\delta_1)$ and $\Gamma'(\delta_2)$ compensate the remainder. Therefore, without loss of generality, we may restrict reverse tests to those in the form of (8.2). Therefore:
$$D^{\max}_{|1-r|}(\rho\|\sigma) = \inf\{\operatorname{tr}(\rho+\sigma-2A)\,;\,A\ge 0,\ \rho\ge A,\ \sigma\ge A\}. \qquad (8.3)$$

In this subsection and the next, under the assumption that $\operatorname{tr}\rho = \operatorname{tr}\sigma = 1$, we study conditions for
$$D^{\max}_{|1-r|}(\rho\|\sigma) = \|\rho-\sigma\|_1. \qquad (8.4)$$
This identity implies uniqueness of the quantum version of the statistical distance, and also indicates that the classical total variation distance embedded into quantum states can be completely recovered by measurements. It turns out that the size of the set of all $\{\rho,\sigma\}$'s satisfying (8.4) is substantial.
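For commuting states the program (8.3) is transparent: with $\rho$, $\sigma$ diagonal, $A$ can be taken as the entrywise minimum, which is feasible and attains the lower bound (8.1), so the infimum equals the classical total variation distance. A numeric sketch (the distributions are example data):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.1, 0.7])
rho, sigma = np.diag(p), np.diag(q)

# feasible point of (8.3): A = the "common part" of rho and sigma
A = np.diag(np.minimum(p, q))
for B in (A, rho - A, sigma - A):          # A >= 0, rho >= A, sigma >= A
    assert np.all(np.linalg.eigvalsh(B) > -1e-12)

objective = np.trace(rho + sigma - 2 * A)  # value of (8.3) at this A
tv = np.abs(p - q).sum()                   # ||p - q||_1 = ||rho - sigma||_1
assert abs(objective - tv) < 1e-12
```

Since (8.1) says the optimal value can never go below $\|\rho-\sigma\|_1$, this feasible point is optimal in the commuting case.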
This is in contrast with the case of operator convex functions, where (8.4) holds almost exclusively for commutative pairs of states (see Subsection 6.6). Dropping the constraint $A\ge 0$,
$$D^{\max}_{|1-r|}(\rho\|\sigma) \ge \inf\{\operatorname{tr}(\rho+\sigma-2A)\,;\,\rho\ge A,\ \sigma\ge A\}$$
$$= \inf\{2\operatorname{tr}(\rho-A)\,;\,\rho\ge A,\ \sigma\ge A\}$$
$$= \inf\{2\operatorname{tr}(\rho-A)\,;\,\rho-A\ge 0,\ \rho-A\ge\rho-\sigma\}$$
$$= 2\operatorname{tr}[\rho-\sigma]_+ = \|\rho-\sigma\|_1.$$
Here, the infimum in the third line is achieved iff $\rho - A = [\rho-\sigma]_+$. ($[X]_+$ is the positive part of the self-adjoint operator $X$.) Therefore, (8.4) holds iff
$$A = \rho - [\rho-\sigma]_+ = \tfrac{1}{2}\big(\rho+\sigma-|\rho-\sigma|\big) \ge 0. \qquad (8.5)$$
(Here, $|X| := \sqrt{X^\dagger X}$.) Another necessary and sufficient condition is the existence of $A$, $\Delta_1$, $\Delta_2 \ge 0$ with
$$\rho = A + \Delta_1, \quad \sigma = A + \Delta_2, \qquad (8.6)$$
$$\Delta_1\Delta_2 = 0. \qquad (8.7)$$
To see this, observe
$$\|\Delta_1-\Delta_2\|_1 = \|\rho-\sigma\|_1 \le D^{\max}_{|1-r|}(\rho\|\sigma) = \min\{\operatorname{tr}\Delta_1 + \operatorname{tr}\Delta_2\,;\,(8.6),\ A\ge 0,\ \Delta_1\ge 0,\ \Delta_2\ge 0\}.$$
For (8.4) to hold, the existence of $\Delta_1$, $\Delta_2$ with $\operatorname{tr}\Delta_1 + \operatorname{tr}\Delta_2 = \|\Delta_1-\Delta_2\|_1$ is necessary and sufficient. Thus $\Delta_1\Delta_2 = 0$.

Of course, in general, (8.5) is not true. For example, if $\rho$ is a pure state and $\rho \ne c\sigma$, it may fail. (Let $f = |1-r|$ in the formula (7.1); then what we obtain is very much different from $\|\rho-\sigma\|_1$.) However, if $\rho$ and $\sigma$ are so close that
$$\big\||\rho-\sigma|\big\|_\infty \le \text{the minimum eigenvalue of }\rho+\sigma, \qquad (8.8)$$
it is true. Another sufficient condition is
$$(\rho-\sigma)^2 = |\rho-\sigma|^2 \le (\rho+\sigma)^2.$$
To see this is sufficient, take the square root of both sides of the inequality: then we obtain (8.5). (Recall $\sqrt{\cdot}$ is operator monotone. This condition is not necessary, since $r^2$ is not operator monotone.) Rearranging the terms, we have
$$\rho\sigma + \sigma\rho \ge 0. \qquad (8.9)$$
By (8.8), $D^{\max}_{|1-r|}(\rho\,\|\,\rho+\varepsilon X) = \|\rho-(\rho+\varepsilon X)\|_1$ for all small $\varepsilon > 0$ and any $X$. On the other hand, if $f$ is operator convex, Proposition 6.12 indicates that $D^{\max}_f(\rho\,\|\,\rho+\varepsilon X) = D^{\min}_f(\rho\,\|\,\rho+\varepsilon X)$ for small $\varepsilon > 0$ only if $\rho$ and $X$ commute.
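The criterion (8.5), and the sufficient condition $\rho\sigma+\sigma\rho\ge 0$ of (8.9), are easy to probe numerically for qubits. A sketch; the density matrices below are arbitrary illustrations, not examples from the text:

```python
import numpy as np

def abs_op(X):
    # |X| = sqrt(X^2) for self-adjoint X, via the spectral decomposition
    w, V = np.linalg.eigh(X)
    return V @ np.diag(np.abs(w)) @ V.conj().T

def satisfies_85(rho, sigma, tol=1e-12):
    # (8.5): A = (rho + sigma - |rho - sigma|)/2 must be positive semidefinite
    A = 0.5 * (rho + sigma - abs_op(rho - sigma))
    return bool(np.all(np.linalg.eigvalsh(A) > -tol))

rho = np.array([[0.7, 0.1], [0.1, 0.3]])
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]])

# (8.9): rho sigma + sigma rho >= 0 holds for this pair ...
assert np.all(np.linalg.eigvalsh(rho @ sigma + sigma @ rho) > -1e-12)
# ... and, consistently with the text, (8.5) then holds as well
assert satisfies_85(rho, sigma)

# a pure rho with a non-commuting sigma violates (8.5)
rho_pure = np.diag([1.0, 0.0])
sigma2 = np.array([[0.5, 0.3], [0.3, 0.5]])
assert not satisfies_85(rho_pure, sigma2)
```

The second pair shows that the failure of (8.5) is a genuinely non-commutative phenomenon: for commuting matrices the candidate $A$ is simply the entrywise minimum and is always positive.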
In this subsection, we assume $\dim\mathcal{H} = 2$ and $\operatorname{tr}\rho = \operatorname{tr}\sigma = 1$, and compute the set $\{\sigma\,;\,(8.4)\}$ for each fixed $\rho$, using the necessary and sufficient condition given by (8.6) and (8.7). As it turns out, this set is a spheroid, with focal points $v_\rho$ and $-v_\rho$, touching the surface of the Bloch sphere at each end of its longest axis. Since $\operatorname{tr}\rho = \operatorname{tr}\sigma = 1$, $c := \operatorname{tr}\Delta_1 = \operatorname{tr}\Delta_2 = 1 - \operatorname{tr} A$, and $0\le c\le 1$. Let $v_\rho$, $v_\sigma$, $u_1$, $u_2$, and $u_A$ be the Bloch vectors of $\rho$, $\sigma$, $\frac{1}{c}\Delta_1$, $\frac{1}{c}\Delta_2$, and $\frac{1}{1-c}A$, respectively. Also, (8.7) holds iff $\Delta_1$ and $\Delta_2$ are rank-1 and $u_2 = -u_1$. Therefore, by (8.6),
$$v_\rho = cu_1 + (1-c)u_A, \quad v_\sigma = -cu_1 + (1-c)u_A.$$
Therefore,
$$v_\sigma - v_\rho = -2cu_1, \quad v_\sigma - (-v_\rho) = 2(1-c)u_A.$$
Let $\|\cdot\|$ denote the Euclidean norm in $\mathbb{R}^3$; then
$$\|v_\sigma - v_\rho\| + \|v_\sigma - (-v_\rho)\| = 2\big(c\|u_1\| + (1-c)\|u_A\|\big) \le 2.$$
The set $\{\sigma\,;\,(8.4)\}$ is fairly large. For example, if the largest eigenvalue of $\rho$ is $\le 0.85$, this set occupies more than half of the volume of the Bloch sphere.

If
$$\rho = \begin{pmatrix} a & c \\ \bar c & b\end{pmatrix}, \quad \sigma = \begin{pmatrix} a & -c \\ -\bar c & b \end{pmatrix}, \quad (a\ge b),$$
the minimization problem (8.3) is solved explicitly. With $Z := \operatorname{diag}(1,-1)$, $\sigma = Z\rho Z^\dagger$ and $\rho = Z\sigma Z^\dagger$. Thus, if $A$ satisfies the constraints of (8.3), so does $\frac{1}{2}(ZAZ^\dagger + A)$, and $\operatorname{tr} A = \operatorname{tr}\frac{1}{2}(ZAZ^\dagger + A)$. Therefore, without loss of generality, we may suppose $A$ is diagonal. After some elementary analysis, the optimal $A$ turns out to be
$$A = \begin{cases} \operatorname{diag}\big(a-|c|,\ b-|c|\big), & (a\ge b\ge |c|), \\ \operatorname{diag}\big(a - |c|^2 b^{-1},\ 0\big), & (a\ge |c|\ge b), \end{cases}$$
and we have
$$D^{\max}_{|1-r|}(\rho\|\sigma) = \begin{cases} 4|c| = \|\rho-\sigma\|_1, & (a\ge b\ge |c|), \\ 2\big(b + |c|^2 b^{-1}\big), & (a\ge |c|\ge b). \end{cases}$$

Dual Representation and Continuity
In this section we give the dual of (3.1), i.e., a representation of $D^{\max}_f$ as the maximization of a linear functional:
$$D^{\max}_f(\rho\|\sigma) = \sup_{(W_1,W_2)\in\mathcal{W}^{\max}_f(\mathcal{H})}\{\operatorname{tr}\rho W_1 + \operatorname{tr}\sigma W_2\}, \qquad (9.1)$$
where
$$\mathcal{W}^{\max}_f(\mathcal{H}) := \{(W_1,W_2)\,;\,sW_1 + tW_2 - g_f(s,t)\mathbf{1} \le 0,\ \forall s,t\in[0,1]\}$$
$$= \{(W_1,W_2)\,;\,f(r)\mathbf{1} - rW_1 - W_2 \ge 0,\ \forall r\ge 0\}.$$
(To see the equality in the second line, recall $g_f$ is positively homogeneous and continuous.)

Let $D^Q_f$ be a lower semicontinuous, proper, positively homogeneous, and convex function over pairs of positive operators. Then by Corollary 13.5.1 of [23], it should be of the form
$$D^Q_f(\rho\|\sigma) = \sup_{(W_1,W_2)\in\mathcal{W}^Q_f}\operatorname{tr}(\rho W_1 + \sigma W_2),$$
where $\mathcal{W}^Q_f$ is, without loss of generality, convex and unbounded from below. In addition, suppose $D^Q_f$ satisfies (D1') and (D2). Since $D^Q_f$ satisfies (D2) with $p(1) = s$, $q(1) = t$, $p(x) = q(x) = 0$ ($x\ne 1$),
$$D^Q_f\big(s|e\rangle\langle e|\,\big\|\,t|e\rangle\langle e|\big) = \sup_{(W_1,W_2)\in\mathcal{W}^Q_f}\big(s\langle e|W_1|e\rangle + t\langle e|W_2|e\rangle\big) = g_f(s,t).$$
Since the second identity is true for all $s, t \ge 0$ and all $|e\rangle$ with $\|e\| = 1$, each $(W_1,W_2)\in\mathcal{W}^Q_f$ satisfies $(W_1,W_2)\in\mathcal{W}^{\max}_f$. Therefore, $\mathcal{W}^Q_f \subset \mathcal{W}^{\max}_f$. Also, the RHS of (9.1) satisfies (D2). Therefore, the RHS of (9.1) is the largest of all proper, positively homogeneous, convex, lower semicontinuous functionals with (D2). As is easily verified, it also satisfies (D1) and (D1'). On the other hand, $D^{\max}_f$ is the largest of all functionals with (D1') and (D2) (Lemma 3.1), and turns out to be proper, positively homogeneous, convex, and lower semicontinuous. Therefore:
$$\operatorname{cl} D^{\max}_f(\rho\|\sigma) = \sup_{(W_1,W_2)\in\mathcal{W}^{\max}_f(\mathcal{H})}\{\operatorname{tr}\rho W_1 + \operatorname{tr}\sigma W_2\}. \qquad (9.2)$$

Remark 9.1
From the above argument, if $f$ satisfies (FC), $D^{\max}_f$ is the largest of all $D^Q_f$'s which are lower semicontinuous, proper, positively homogeneous, convex, and satisfy (D2).

Next, we show that $D^{\max}_f$ is lower semicontinuous. It suffices to show this on
$$\mathcal{D} := \{(\rho,\sigma)\,;\,\rho\ge 0,\ \sigma\ge 0,\ \operatorname{tr}\rho\le 1,\ \operatorname{tr}\sigma\le 1\},$$
by $D^{\max}_f(\lambda\rho\|\lambda\sigma) = \lambda D^{\max}_f(\rho\|\sigma)$, $\forall\lambda>0$. Without loss of generality, a reverse test $(\Gamma,\{p,q\})$ satisfies
$$\sum_{x\in\mathcal{X}}q(x)\le 1, \quad \sum_{x\in\mathcal{X}}p(x)\le 1.$$
Also, by Lemma 4.1, we may suppose $\{p,q\}$ are over $\mathcal{X}$ with $|\mathcal{X}|\le(\dim\mathcal{H})^2+3$. The set of all such reverse tests $\mathcal{T} = \{\tau = (\Gamma,\{p,q\})\}$ can be identified with a compact subset of a finite dimensional real vector space. Define the maps $F_1(\tau) := \{\Gamma(p),\Gamma(q)\}$ and $F_2(\tau) := D_f(p\|q)$, and let
$$\mathcal{U} := \{\upsilon = (\upsilon_1,\upsilon_2)\,;\,\upsilon_1 = F_1(\tau),\ \upsilon_2\ge F_2(\tau),\ \tau\in\mathcal{T}\}.$$

Lemma 9.2
Suppose (FC) is satisfied. Then the set $\mathcal{U}$ is closed and identical to $\operatorname{epi} D^{\max}_f|_{\mathcal{D}}$.
Suppose $(\upsilon_1,\upsilon_2)\in\operatorname{cl}\mathcal{U}$. Then for any $\varepsilon>0$, $B_\varepsilon(\upsilon_1)\times C_\varepsilon(\upsilon_2)\cap\mathcal{U}$ is not empty, where $B_\varepsilon(\upsilon_1)$ is the closed $\varepsilon$-ball centered at $\upsilon_1$, and $C_\varepsilon(\upsilon_2) := \{t\,;\,t\le\upsilon_2+\varepsilon\}$. Therefore, all the sets in the family $\{F_1^{-1}(B_\varepsilon(\upsilon_1))\cap F_2^{-1}(C_\varepsilon(\upsilon_2))\cap\mathcal{T}\}_{\varepsilon>0}$ are not empty, and in fact, they are closed subsets of the compact set $\mathcal{T}$, since $F_1$ is continuous and $F_2$ is lower semicontinuous. Since the family has the finite intersection property, the intersection of these sets is not empty. Any element $\tau$ of this intersection satisfies $\upsilon_1 = F_1(\tau)$ and $\upsilon_2\ge F_2(\tau)$, indicating $\upsilon\in\mathcal{U}$. Therefore, $\mathcal{U}$ is closed. The second statement follows by
$$\mathcal{U}\subset\operatorname{epi} D^{\max}_f|_{\mathcal{D}}\subset\operatorname{cl}\mathcal{U}.$$
The first "$\subset$" is by $F_2(\tau)\ge D^{\max}_f(\Gamma(p)\|\Gamma(q))$, and the second one is by the definition (3.1) of $D^{\max}_f$.
Suppose (FC) is satisfied. Then, $D^{\max}_f$ is lower semicontinuous. Moreover, for each $\{\rho,\sigma\}$ such that $D^{\max}_f(\rho\|\sigma)<\infty$, the infimum in (3.1) is achieved by some $(\Gamma,\{p,q\})$.
It suffices to prove the assertion on $\mathcal{D}$. The lower semicontinuity of $D^{\max}_f|_{\mathcal{D}}$ follows from the closedness of $\operatorname{epi} D^{\max}_f|_{\mathcal{D}}$. Also, by the previous lemma,
$$\operatorname{epi} D^{\max}_f|_{\mathcal{D}} = \{(\rho,\sigma,t)\,;\,(\rho,\sigma)\in\mathcal{D},\ t\ge D^{\max}_f(\rho\|\sigma)\} = \{(\Gamma(p),\Gamma(q),t)\,;\,\tau\in\mathcal{T},\ t\ge D_f(p\|q)\}.$$
Therefore, to each $\{\rho,\sigma\}$ there is a reverse test $(\Gamma,\{p,q\})$ of $\{\rho,\sigma\}$ with $\{t\,;\,t\ge D^{\max}_f(\rho\|\sigma)\} = \{t\,;\,t\ge D_f(p\|q)\}$, or equivalently, $D^{\max}_f(\rho\|\sigma) = D_f(p\|q)$. This reverse test achieves the infimum in (3.1). By this lemma and (9.2):

Theorem 9.4 If $f$ satisfies (FC), then (9.1) holds. Moreover, for each $\{\rho,\sigma\}$ such that $D^{\max}_f(\rho\|\sigma)<\infty$, the infimum in (3.1) is achieved by some $(\Gamma,\{p,q\})$.

When f is operator convex

When $f$ satisfies the condition (F) and $\rho > 0$, $\sigma >$
0, we can write the pair $(W_{1*}, W_{2*})$ achieving the maximum in (9.1) explicitly. Since $f$ is operator convex, it is differentiable. Hence the Fréchet derivative $\mathrm{D}f(T)$ of $f$, i.e., the linear transform on $\mathcal{B}(\mathcal{H})$ with
$$\|f(T+X) - f(T) - \mathrm{D}f(T)(X)\| = o(\|X\|),$$
is given, in the basis which diagonalizes $T$, by
$$\mathrm{D}f(T)(X) = \big[f^{[1]}(t_i,t_j)\,X_{i,j}\big], \qquad (9.3)$$
where $t_i$ ($i=1,\cdots$) are the eigenvalues of $T$, and
$$f^{[1]}(t,t') := \begin{cases} \dfrac{f(t)-f(t')}{t-t'}, & (t\ne t'), \\[4pt] f'(t), & (t=t'). \end{cases}$$
Using $d(\rho,\sigma)$ as of (4.7),
$$\frac{\mathrm{d}}{\mathrm{d}t}D^{\max}_f(\rho+tX\|\sigma)\Big|_{t=0} = \frac{\mathrm{d}}{\mathrm{d}t}\operatorname{tr}\sigma f\big(\sigma^{-1/2}(\rho+tX)\sigma^{-1/2}\big)\Big|_{t=0}$$
$$= \operatorname{tr}\sigma\,\mathrm{D}f(d(\rho,\sigma))\big(\sigma^{-1/2}X\sigma^{-1/2}\big) = \operatorname{tr} X\,\sigma^{-1/2}\,\mathrm{D}f(d(\rho,\sigma))(\sigma)\,\sigma^{-1/2},$$
where the last identity is by the self-adjointness of $\mathrm{D}f(T)(\cdot)$ with respect to the inner product $\operatorname{tr} XY$:
$$\operatorname{tr} Y\,\mathrm{D}f(T)(X) = \sum_{i,j} Y_{j,i}\, f^{[1]}(t_i,t_j)\,X_{i,j} = \operatorname{tr} X\,\mathrm{D}f(T)(Y). \qquad (9.4)$$
Replacing $f$ by $\hat f$, the derivative in the second argument is computed similarly. Therefore,
$$W_{1*} = \sigma^{-1/2}\{\mathrm{D}f(d(\rho,\sigma))(\sigma)\}\sigma^{-1/2}, \quad W_{2*} = \rho^{-1/2}\big\{\mathrm{D}\hat f(d(\sigma,\rho))(\rho)\big\}\rho^{-1/2}$$
achieves the maximum in (9.1). In fact, by (9.4),
$$\operatorname{tr}(\rho W_{1*} + \sigma W_{2*}) = \operatorname{tr}\sigma\,\mathrm{D}f(d(\rho,\sigma))(d(\rho,\sigma)) + \operatorname{tr}\rho\,\mathrm{D}\hat f(d(\sigma,\rho))(d(\sigma,\rho))$$
$$= \operatorname{tr}\sigma f'(d(\rho,\sigma))\,d(\rho,\sigma) + \operatorname{tr}\rho\,\hat f'(d(\sigma,\rho))\,d(\sigma,\rho) = D^{\max}_f(\rho\|\sigma).$$
For example, if $f(r) = r^2$, then $\hat f(r) = 1/r$, $\mathrm{D}f(T)(X) = TX + XT$ and $\mathrm{D}\hat f(T)(X) = -T^{-1}XT^{-1}$, so
$$W_{1*} = \sigma^{-1}\rho + \rho\sigma^{-1}, \quad W_{2*} = -\sigma^{-1}\rho^2\sigma^{-1}.$$

On continuity

In this subsection, some remarks on the continuity of $D^{\max}_f$ are in order. By Lemma 9.3 and Proposition 2.1, if $f$ satisfies (FC),
$$\lim_{\varepsilon\downarrow 0} D^{\max}_f(\rho_\varepsilon\|\sigma_\varepsilon) = D^{\max}_f(\rho\|\sigma), \qquad (9.5)$$
where $\{(\rho_\varepsilon,\sigma_\varepsilon)\}_{\varepsilon>0}$ is a straight line in the effective domain of $D^{\max}_f$. But $\{(\rho_\varepsilon,\sigma_\varepsilon)\}_{\varepsilon>0}$ cannot be an arbitrary curve for (9.5) to hold. To see this, suppose that $\hat f(0)<\infty$ and $\sigma$ is a pure state, and use (7.1). Let
$$\sigma = \begin{pmatrix} 1 & 0\\ 0 & 0\end{pmatrix}, \quad \rho_\varepsilon = \begin{pmatrix} b & \sqrt{\varepsilon}\,C^\dagger \\ \sqrt{\varepsilon}\,C & \varepsilon D\end{pmatrix},$$
with $\rho_\varepsilon \ge 0$ and $\operatorname{tr}\rho_\varepsilon = 1$. Then (by the Schur complement)
$$\tilde\rho_\varepsilon = b - \sqrt{\varepsilon}\,C^\dagger(\varepsilon D)^{-1}\sqrt{\varepsilon}\,C = b - C^\dagger D^{-1}C = \tilde\rho$$
is constant in $\varepsilon$, and
$$\lim_{\varepsilon\downarrow 0} D^{\max}_f(\rho_\varepsilon\|\sigma) = D^{\max}_f(\tilde\rho\|\sigma) + \hat f(0)(1-\tilde\rho) \ne D^{\max}_f(\rho_0\|\sigma) \text{ in general.}$$
However, $\{(\rho_\varepsilon,\sigma_\varepsilon)\}_{\varepsilon>0}$ need not be a straight line, either. For example, consider a continuous curve $\{\sigma_\varepsilon\}_{\varepsilon\ge 0}$ of positive operators with $\sigma_0 = \sigma$, $\operatorname{supp}\sigma_\varepsilon \supset \operatorname{supp}\rho, \operatorname{supp}\sigma$, and $\sigma_\varepsilon > \sigma$. Then, if $f(0) = 0$, by (6.12) we have
$$\lim_{\varepsilon\downarrow 0} D^{\max}_f(\rho\|\sigma_\varepsilon) \le D^{\max}_f(\rho\|\sigma).$$
The opposite inequality results from the lower semicontinuity of $D^{\max}_f$. Thus (9.5) holds.

So far, we had supposed that the dimension of the underlying Hilbert space is finite. Some of the results, namely those of Sections 5 and 7, are obviously generalized to the separable infinite dimensional case. In this section, we consider the generalization of (9.1), thus proving lower semicontinuity. Also the existence of the minimum is discussed. Throughout the section, we suppose $\hat f(0)<\infty$ and $f(0)<\infty$. In this case, $g_f$ is not only lower semicontinuous, but also continuous.

Remark 9.5
To see that $g_f$ is continuous, we only have to check it at the origin. Observe $g_f(s,t) = \tilde g_f(s,t) + \hat f(0)s + f(0)t$, with $\tilde g_f(s,t)\le 0$. Therefore, if $(s_k,t_k)\to(0,0)$,
$$\limsup_{k\to\infty} g_f(s_k,t_k) = \limsup_{k\to\infty}\{\tilde g_f(s_k,t_k) + \hat f(0)s_k + f(0)t_k\} = \limsup_{k\to\infty}\tilde g_f(s_k,t_k) \le 0 = g_f(0,0) \le \liminf_{k\to\infty} g_f(s_k,t_k),$$
where the last step is by the lower semicontinuity of $g_f$, indicating $\lim_{k\to\infty} g_f(s_k,t_k) = g_f(0,0)$.
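Before passing to infinite dimensions, the finite classical content of (9.1) is worth making concrete: for commuting (diagonal) $\rho$, $\sigma$, the pairs $(W_1,W_2)$ can be taken diagonal, the constraint $f(r)-rW_1-W_2\ge 0$ makes each $(W_1(x), W_2(x))$ a supporting line of $f$, and the supremum is attained at the tangent at $r = p(x)/q(x)$. A sketch with the illustrative choice $f(r)=r\ln r$ (all data below are assumptions for the example):

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

f = lambda r: r * np.log(r)
df = lambda r: np.log(r) + 1.0

# tangent (supporting line) of f at r0: f(r) >= f(r0) + f'(r0)(r - r0)
r0 = p / q
W1 = df(r0)                 # slope
W2 = f(r0) - r0 * df(r0)    # intercept, so f(r) - r*W1 - W2 >= 0 by convexity

# the linear functional attains the f-divergence at the tangent point
value = float(np.sum(p * W1 + q * W2))
D_f = float(np.sum(q * f(p / q)))
assert abs(value - D_f) < 1e-12

# any other feasible pair gives at most D_f (test a perturbed slope, with the
# intercept pushed down just enough to stay feasible on a grid)
W1b = W1 - 0.1
W2b = np.array([min(f(r) - r * w for r in np.linspace(1e-6, 10, 20001))
                for w in W1b])
assert float(np.sum(p * W1b + q * W2b)) <= D_f + 1e-6
```

This is exactly the scalar shadow of the operator constraint defining $\mathcal{W}^{\max}_f$; the quantum statement (9.1) replaces the pointwise supporting lines by operator inequalities.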
On the RHS of (9.1), we now let $\rho$ and $\sigma$ be positive elements of $\mathcal{B}_{1,sa}$, the space of all self-adjoint trace class operators, and $W_1$ and $W_2$ self-adjoint bounded operators, i.e., elements of $\mathcal{B}_{sa}$. Also, we modify the definition of reverse tests, admitting all positive regular measures as input. To state the object, operator valued functions and their integrals are used; see Section 2.3 of [24] for these concepts. In this new definition, the reverse test is specified by a regular finite measure $\nu$ over the Borel sets of $[0,1]\times[0,1]$ and a $\nu$-measurable function $Z(s,t)$ from $(s,t)\in[0,1]\times[0,1]$ into $\mathcal{B}_{1,sa}$ with
$$\operatorname{tr} Z(s,t) = 1, \quad \nu\text{-a.e.}, \qquad (9.6)$$
$$\int s\,Z(s,t)\,\mathrm{d}\nu = \rho, \quad \int t\,Z(s,t)\,\mathrm{d}\nu = \sigma, \qquad (9.7)$$
where the integrals of operator valued functions are understood as short for
$$\int s\operatorname{tr} Z(s,t)W\,\mathrm{d}\nu = \operatorname{tr}\rho W, \quad \forall W\in\mathcal{B}_{sa}.$$
Thus our new definition is:
$$D^{\max}_f(\rho\|\sigma) := \inf\Big\{\int g_f(s,t)\operatorname{tr} Z(s,t)\,\mathrm{d}\nu\,;\,(9.6)\text{ and }(9.7)\Big\}. \qquad (9.8)$$
This defines a positive map from the measures which are absolutely continuous relative to $\nu$ into $\mathcal{B}_{1,sa}$, namely $\Gamma(\mu) = \int \frac{\mathrm{d}\mu}{\mathrm{d}\nu}Z(s,t)\,\mathrm{d}\nu$. This is 'trace preserving' in the sense
$$\mu([0,1]\times[0,1]) = \operatorname{tr}\Gamma(\mu).$$
Thus the pair $\nu$ and $Z$ represents a "reverse test" $(\Gamma,\{P,Q\})$, where $P$ and $Q$ are the positive measures with densities $s$ and $t$, respectively.

Remark 9.6
A function $Z(s,t)$ is $\nu$-measurable iff it is norm-approximable by simple functions; this definition is equivalent to requiring that the scalar valued function $(s,t)\mapsto\operatorname{tr} Z(s,t)W$ is $\nu$-measurable (Proposition 2.15, [24]). (9.7) may be understood in the "weak" sense as stated, but since $\|sZ(s,t)\|_1 \le \operatorname{tr} Z(s,t)$ is $\nu$-integrable, it can also be understood as a Bochner integral, i.e., the limit of integrals of simple functions in norm.

Proposition 9.7 (Theorem 8.6.1, [8]) Let $F$ be a real-valued convex function defined on a convex subset $\Omega$ of a vector space $X$, and let $G$ be a convex mapping of $X$ into a partially ordered normed space $Z$. Define
$$\mu := \sup\{F(\vec W)\,;\,\vec W\in\Omega,\ G(\vec W)\le 0\}.$$
Then for any $\zeta^*\ge 0$,
$$\mu \le F^*(\zeta^*) := \sup\{F(\vec W) + \langle\zeta^*, G(\vec W)\rangle\,;\,\vec W\in\Omega\}. \qquad (9.9)$$
Also, if there exists a $\vec W_1$ such that $G(\vec W_1) < 0$ and $\mu$ is finite, then
$$\mu = \min_{\zeta^*\ge 0} F^*(\zeta^*). \qquad (9.10)$$
Below, we apply this proposition considering the RHS of (9.1) as the primal problem, and obtain the reverse test as its dual problem. To proceed, we need to introduce a proper mathematical framework. Consider the space $\mathcal{C}$ of continuous real valued functions on the compact set $[0,1]\times[0,1]$ and the space $\mathcal{B}_{sa}$ of bounded self-adjoint linear operators on the Hilbert space $\mathcal{H}$. Endow $\mathcal{C}$ and $\mathcal{B}_{sa}$ with the norm $\|h\| := \sup_{(s,t)\in[0,1]\times[0,1]}|h(s,t)|$ and the operator norm $\|W\|$, respectively. From these two spaces, we compose the linear space
$$\Big\{\sum_{i=1}^n h^{(i)}W^{(i)}\,;\,h^{(i)}\in\mathcal{C},\ W^{(i)}\in\mathcal{B}_{sa}\Big\},$$
and its completion with respect to the projective norm
$$\|z\|_\pi := \inf\Big\{\sum_{i=1}^n\|h^{(i)}\|\,\|W^{(i)}\|\,;\,z = \sum_{i=1}^n h^{(i)}W^{(i)}\Big\}$$
is denoted by $\mathcal{Z}$. In fact, $\mathcal{Z}$ is the projective tensor product $\mathcal{C}\hat\otimes_\pi\mathcal{B}_{sa}$.
(That $\|\cdot\|_\pi$ is a norm and $\|hW\|_\pi = \|h\|\,\|W\|$ is known [24].) Then for each $z\in\mathcal{Z}$, there exist bounded sequences $\{h^{(i)}\}$ and $\{W^{(i)}\}$ with $z = \sum_{i=1}^\infty h^{(i)}W^{(i)}$ and
$$\|z\|_\pi = \inf\Big\{\sum_{i=1}^\infty\|h^{(i)}\|\,\|W^{(i)}\|\,;\,z = \sum_{i=1}^\infty h^{(i)}W^{(i)}\Big\}.$$
One can endow the partial order $\ge$ on $\mathcal{Z}$ by
$$z\ge 0 \;\Leftrightarrow\; \sum_{i=1}^\infty h^{(i)}(s,t)\,W^{(i)}\ge 0,\ \forall(s,t)\in[0,1]\times[0,1].$$
The strict inequality $z>0$ means that $z$ is an interior point of the cone $\{z'\,;\,z'\ge 0\}$. Any bounded linear functional $\zeta^*$ on $\mathcal{Z}$ is the linearization of a bilinear form on $\mathcal{C}$ and $\mathcal{B}_{sa}$ (see Section 2.2, [24]):
$$\zeta^*(hW) = \zeta^*(h)(W),$$
where $\zeta^*(h)(\cdot)$ and $\zeta^*(\cdot)(W)$ are elements of $\mathcal{B}^*_{sa}$ and $\mathcal{C}^*$, respectively. Below, $\mathcal{Z}$ is the one defined as above, and
$$X = \Omega := \{\vec W = (W_1,W_2)\,;\,W_1,W_2\in\mathcal{B}_{sa}\}, \quad F(\vec W) := \operatorname{tr}\rho W_1 + \operatorname{tr}\sigma W_2.$$

Lemma 9.8
Suppose $g_f$ is positive, bounded and continuous on $[0,1]\times[0,1]$. Suppose $(s,t)\mapsto\eta_{s,t}(W)$ is a $\nu$-measurable function on $[0,1]\times[0,1]$ and $W\mapsto\eta_{s,t}(W)$ is a linear functional with
$$|\eta_{s,t}(W)| \le \|W\|, \quad \nu\text{-a.e.}, \qquad (9.11)$$
and
$$\int s\,\eta_{s,t}(W)\,\mathrm{d}\nu = \operatorname{tr}\rho W, \quad \int t\,\eta_{s,t}(W)\,\mathrm{d}\nu = \operatorname{tr}\sigma W, \quad \forall W\in\mathcal{B}_{sa}. \qquad (9.12)$$
Then
$$\min_\eta\int g_f(s,t)\,\eta_{s,t}(\mathbf{1})\,\mathrm{d}\nu = \sup_{(W_1,W_2)\in\mathcal{W}_f}(\operatorname{tr} W_1\rho + \operatorname{tr} W_2\sigma). \qquad (9.13)$$

Proof.
We apply Proposition 9.7 with
$$G(\vec W) := g_1 W_1 + g_2 W_2 - g_f\mathbf{1},$$
where $g_1(s,t) := s$, $g_2(s,t) := t$. With $\zeta^*\in\mathcal{Z}^*$, $\zeta^*\ge 0$,
$$F^*(\zeta^*) = \sup_{\vec W}\{\operatorname{tr}\rho W_1 + \operatorname{tr}\sigma W_2 - \zeta^*(g_1W_1 + g_2W_2 - g_f\mathbf{1})\}$$
$$= \sup_{\vec W}\{(\operatorname{tr}\rho W_1 - \zeta^*(g_1)(W_1)) + (\operatorname{tr}\sigma W_2 - \zeta^*(g_2)(W_2)) + \zeta^*(g_f)(\mathbf{1})\}$$
$$= \begin{cases}\zeta^*(g_f)(\mathbf{1}), & \text{if }\zeta^*(g_1)(W) = \operatorname{tr}\rho W\text{ and }\zeta^*(g_2)(W) = \operatorname{tr}\sigma W, \\ \infty, & \text{otherwise}.\end{cases}$$
Observe that $h\mapsto\zeta^*(h)(W)$ is a bounded functional on $\mathcal{C}$. Therefore, by the Riesz-Markov representation theorem,
$$\zeta^*(h)(W) = \int h(s,t)\,\mathrm{d}\nu_W,$$
where $\nu_W$ is a regular measure over the Borel sets of $[0,1]\times[0,1]$. By $\zeta^*(\chi(B))(W) = \nu_W(B)$, where $\chi(\cdot)$ is the indicator function,
$$|\nu_W(B)| \le \|W\|\,\|\zeta^*(\chi(B))(\cdot)\| = \|W\|\,\zeta^*(\chi(B))(\mathbf{1}) = \|W\|\,\nu_{\mathbf 1}(B). \qquad (9.14)$$
Therefore, $\nu_W$ is absolutely continuous relative to $\nu_{\mathbf 1}$. Thus $\eta_{s,t}(W) := \frac{\mathrm{d}\nu_W}{\mathrm{d}\nu_{\mathbf 1}}$ exists, and
$$\zeta^*(h)(W) = \int h(s,t)\,\eta_{s,t}(W)\,\mathrm{d}\nu_{\mathbf 1}.$$
Since $W\mapsto\zeta^*(h)(W)$ is linear and positive, so is $W\mapsto\eta_{s,t}(W)$, $\nu$-a.e.; (9.11) follows from (9.14). Therefore, rewriting $F^*$ using $\eta_{s,t}$ and $\nu := \nu_{\mathbf 1}$, we have the LHS of (9.13). Also, $G(\cdot)$ is convex, and $\vec W_1 := (w_{1,0}, w_{2,0})$, where $(w_{1,0}, w_{2,0})$ is a relative interior point of $\mathcal{W}_f$, satisfies $G(\vec W_1) < 0$. Finally,
$$\eta_{s,t}(W) := \begin{cases}\operatorname{tr}\rho W, & \text{if }(s,t) = (1,0), \\ \operatorname{tr}\sigma W, & \text{if }(s,t) = (0,1), \\ 0, & \text{otherwise},\end{cases}$$
$$\nu(\{(1,0)\}) = \nu(\{(0,1)\}) := 1, \quad \nu\big([0,1]\times[0,1]\setminus\{(0,1),(1,0)\}\big) := 0,$$
satisfies (9.12), and $\int g_f(s,t)\,\eta_{s,t}(\mathbf{1})\,\mathrm{d}\nu = g_f(1,0) + g_f(0,1)$ is finite. Thus by (9.9), the RHS of (9.13) is finite. Therefore, we can apply Proposition 9.7, and the assertion is proved.

Theorem 9.9
Suppose $\mathcal{H}$ is a separable Hilbert space, (FC) is satisfied, and $\hat f(0)<\infty$ and $f(0)<\infty$. Then: (i) (9.1) holds if $W_1$ and $W_2$ range over $\mathcal{B}_{sa}$; (ii) the $\inf$ in (9.8) can be replaced by $\min$; (iii) $D^{\max}_f$ is lower semicontinuous.

Proof.
We use Lemma 9.8 and rewrite $\eta_{s,t}$ using $Z(s,t)$; then (i) and (ii) will be proved simultaneously. Since (i) means that $D^{\max}_f$ is the pointwise supremum of linear functionals, (iii) will follow.

Without loss of generality, one may suppose $f(r) \geq 0$ for all $r \geq 0$, or, equivalently, that $g_f$ is positive. To see this, choose $a$ and $b$ so that
$$f_0(r) := f(r) - ar - b \geq 0, \qquad g_{f_0}(s,t) = g_f(s,t) - as - bt \geq 0.$$
If $(W_1, W_2) \in \mathcal{W}^{\max}_{f_0}$, then $(W_1 + a, W_2 + b) \in \mathcal{W}^{\max}_f$, and $D^{\max}_f(\rho\|\sigma) = D^{\max}_{f_0}(\rho\|\sigma) + a \operatorname{tr} \rho + b \operatorname{tr} \sigma$. Thus $D^{\max}_f$ satisfies (i)--(iii) of the present theorem iff $D^{\max}_{f_0}$ does.

Since $\eta_{s,t}$ is a bounded linear functional on $\mathcal{B}_{\mathrm{sa}}$, there is $Z(s,t) \in \mathcal{B}_{1,\mathrm{sa}}$ with $\operatorname{tr} Z(s,t) W = \eta_{s,t}(W)$ for any $W$ with finite rank. Then, by $\zeta^* \geq 0$,
$$Z(s,t) \geq 0, \quad \operatorname{tr} Z(s,t) \leq 1, \quad \nu\text{-a.e.} \qquad (9.15)$$
Also,
$$\operatorname{tr} Z(s,t) W \leq \eta_{s,t}(W), \quad W \in \mathcal{B}_{\mathrm{sa}},\ W \geq 0, \quad \nu\text{-a.e.} \qquad (9.16)$$
Therefore, since $g_f \geq 0$,
$$\int g_f(s,t)\, \eta_{s,t}(\mathbf{1})\, \mathrm{d}\nu \geq \int g_f(s,t) \operatorname{tr} Z(s,t)\, \mathrm{d}\nu, \qquad (9.17)$$
so the replacement of $\eta_{s,t}$ by $W \to \operatorname{tr} Z(s,t) W$ can only improve the value of the optimized function.

Next, we show that (9.12) leads to (9.7). Suppose $W \geq 0$, and let $\{W^{(k)}\}$ be the sequence of positive finite-rank operators such that $W^{(k)} = \pi_k W \pi_k$, where $\pi_k$ is the projector onto a $k$-dimensional subspace. Then $0 \leq W^{(k)} \leq W$ and, as $k \to \infty$,
$$s \operatorname{tr} Z(s,t) W^{(k)} \nearrow s \operatorname{tr} Z(s,t) W, \quad \nu\text{-a.e.}$$
Since the function $(s,t) \to s \operatorname{tr} Z(s,t) W$ is $\nu$-integrable, by the monotone convergence theorem,
$$\operatorname{tr} \rho W \overset{(a)}{=} \lim_{k\to\infty} \operatorname{tr} \rho W^{(k)} = \lim_{k\to\infty} \int s\, \eta_{s,t}(W^{(k)})\, \mathrm{d}\nu \overset{(b)}{=} \lim_{k\to\infty} \int s \operatorname{tr} Z(s,t) W^{(k)}\, \mathrm{d}\nu = \int \lim_{k\to\infty} s \operatorname{tr} Z(s,t) W^{(k)}\, \mathrm{d}\nu = \int s \operatorname{tr} Z(s,t) W\, \mathrm{d}\nu.$$
Here, (a) holds since $\rho$ is trace class, and (b) holds since $W^{(k)}$ is of finite rank. When $W$ is not positive, decomposing it into its positive and negative parts, we obtain the identity. Thus, (9.7) is satisfied. Finally, due to (9.15), $Z(s,t)$ can be normalized to satisfy (9.6).

We have introduced the maximal $f$-divergence as the solution to an optimization problem, the reverse test, and have given its closed formula in some important cases. The next step is to consider the asymptotic version of the problem, in the hope that this closes the gap between the maximum and the minimum quantum divergence. The present author's long-standing project is to characterize all the possible quantum $f$-divergences, as [22] characterized all the quantum Fisher information.

Appendix: Matrix analysis
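The matrix-analytic facts collected below are standard and easy to sanity-check numerically. The following sketch (a non-rigorous illustration on randomly generated matrices; the helpers `min_eig` and `rand_herm_psd` and the choice $f(r) = r^2$ are ours, not from the paper) checks the operator Jensen inequality $f(C^\dagger X C) \leq C^\dagger f(X) C$ for operator convex $f$ with $f(0) \leq 0$ and $\|C\| \leq 1$, and the Schur-complement bound $X \geq C Y^{-1} C^\dagger$ for a positive block matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_eig(A):
    """Smallest eigenvalue of a Hermitian matrix."""
    return np.linalg.eigvalsh(A).min()

def rand_herm_psd(n):
    """Random positive semidefinite Hermitian matrix A† A."""
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return A.conj().T @ A

n = 4

# Operator Jensen inequality for f(r) = r^2 (operator convex, f(0) = 0):
# f(C† X C) <= C† f(X) C whenever ||C|| <= 1.
X = rand_herm_psd(n)
C = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
C /= np.linalg.norm(C, 2)          # enforce spectral norm ||C|| <= 1
CXC = C.conj().T @ X @ C
lhs = CXC @ CXC                    # f(C† X C)
rhs = C.conj().T @ (X @ X) @ C     # C† f(X) C
assert min_eig(rhs - lhs) >= -1e-8   # rhs - lhs is positive semidefinite

# Schur-complement bound: if [[X, C], [C†, Y]] >= 0 with Y > 0,
# then X >= C Y^{-1} C†.
A = rng.normal(size=(2 * n, 2 * n)) + 1j * rng.normal(size=(2 * n, 2 * n))
M = A.conj().T @ A                   # positive semidefinite block matrix
Xb, Cb, Yb = M[:n, :n], M[:n, n:], M[n:, n:]
assert min_eig(Xb - Cb @ np.linalg.inv(Yb) @ Cb.conj().T) >= -1e-8
print("both inequalities hold on random instances")
```

For $f(r) = r^2$ the first inequality also follows directly from $C^\dagger X^2 C - (C^\dagger X C)^2 = C^\dagger X (\mathbf{1} - CC^\dagger) X C \geq 0$, since $\|C\| \leq 1$ gives $\mathbf{1} - CC^\dagger \geq 0$.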
Proposition 10.1 (Theorem V.2.3 of [2]) Let $f$ be a continuous function on $[0,\infty)$. If $f$ is operator convex and $f(0) \leq 0$, then for any positive operator $X$ and any operator $C$ with $\|C\| \leq 1$,
$$f\left(C^\dagger X C\right) \leq C^\dagger f(X)\, C.$$

Proposition 10.2 ((2.43) of [3]) Let $f$ be an operator convex function defined on $[0,\infty)$, and let $\Lambda^\dagger$ be a unital positive map. Then
$$f\left(\Lambda^\dagger(A)\right) \leq \Lambda^\dagger\left(f(A)\right)$$
holds for any $A \geq 0$.

Proposition 10.3 (Proposition 8.4 of [12]) Let $f$ be a continuous operator convex function on $[0,\infty)$. If $\hat{f}(0) < \infty$, there are a real number $a$ and a positive Borel measure $\mu$ such that
$$f(r) = f(0) + a r + \int_{(0,\infty)} \psi_\lambda(r)\, \mathrm{d}\mu(\lambda), \qquad \psi_\lambda(r) := \frac{-r}{r+\lambda},$$
where $a = \hat{f}(0)$ and $\int_{(0,\infty)} \frac{\mathrm{d}\mu(\lambda)}{1+\lambda} < \infty$. Since $\psi_\lambda$ is operator monotone decreasing, this means that $f(r)$ is the sum of a linear function and an operator monotone decreasing function.

Proposition 10.4 (Lemma 5.2 of [12]) If $f$ is a complex-valued function on finitely many points $\{r_i; i \in I\} \subset [0,\infty)$, then for any pairwise different positive numbers $\{\lambda_i; i \in I\}$ there exist complex numbers $\{c_i; i \in I\}$ such that
$$f(r_i) = \sum_{j \in I} \frac{c_j}{r_i + \lambda_j}, \quad i \in I.$$

Proposition 10.5 (Exercise 1.3.5 of [3]) Let $X$, $Y$ be positive definite matrices. Then
$$\begin{pmatrix} X & C \\ C^\dagger & Y \end{pmatrix} \geq 0 \quad \text{implies} \quad X \geq C Y^{-1} C^\dagger, \quad Y \geq C^\dagger X^{-1} C. \qquad (10.2)$$

References

[1] Amari, S., Nagaoka, H.: Methods of Information Geometry. AMS (2001)
[2] Bhatia, R.: Matrix Analysis. Springer, Berlin (1996)
[3] Bhatia, R.: Positive Definite Matrices. Princeton University Press (2007)
[4] Belavkin, V. P.: On Entangled Quantum Capacity. In: Quantum Communication, Computing, and Measurement 3, pp. 325-333. Kluwer, Boston (2001)
[5] Chefles, A.: Deterministic quantum state transformations. Phys. Lett. A 270, 14 (2000)
[6] Ebadian, A., Nikoufar, I., Eshaghi Gordji, M.: Perspectives of matrix convex functions. Proc. Natl. Acad. Sci. USA 108(18), 7313-7314 (2011)
[7] Effros, E., Hansen, F.: Non-commutative perspectives. Ann. Funct. Anal.
Vol. 5, No. 2, 74-79 (2014)
[8] Luenberger, D. G.: Optimization by Vector Space Methods. Wiley, New York (1969)
[9] Hammersley, S. J., Belavkin, V. P.: Information Divergence for Quantum Channels, Infinite Dimensional Analysis. In: Quantum Information and Computing, Quantum Probability and White Noise Analysis, Vol. XIX, pp. 149-166. World Scientific, Singapore (2006)
[10] Hayashi, M.: Characterization of Several Kinds of Quantum Analogues of Relative Entropy. Quantum Information and Computation 6, 583-596 (2006)
[11] Hiai, F., Petz, D.: Different quantum f-divergences and the reversibility of quantum operations. arXiv:1604.03089 (2016)
[12] Hiai, F., Mosonyi, M., Petz, D., Beny, C.: Quantum f-divergences and error correction. Rev. Math. Phys. 23, 691-747 (2011)
[13] Hiai, F., Petz, D.: The proper formula for relative entropy and its asymptotics in quantum probability. Comm. Math. Phys. 143, 99-114 (1991)
[14] Holevo, A. S.: Probabilistic and Statistical Aspects of Quantum Theory. North-Holland, Amsterdam (1982) (in Russian, 1980)
[15] Matsumoto, K.: A Geometrical Approach to Quantum Estimation Theory. Doctoral dissertation, University of Tokyo (1998)
[16] Matsumoto, K.: Reverse estimation theory, Complementality between RLD and SLD, and monotone distances. arXiv:quant-ph/0511170 (2005)
[17] Matsumoto, K.: Reverse test and quantum analogue of classical Fidelity and generalized Fidelity. arXiv:1006.0302 (2010)
[18] Matsumoto, K.: On maximization of measured f-