Note on bounds for symmetric divergence measures
S. Furuichi (Nihon University), K. Yanagi (Josai University) and K. Kuriyama (Yamaguchi University)
a) Corresponding author: [email protected]
Abstract.
In the paper [1], tight bounds for symmetric divergence measures are derived by applying the results established in the paper [2]. In this article, we report two kinds of extensions of the above results, namely a classical q-extension and a non-commutative (quantum) extension.

INTRODUCTION
In the paper [1], tight bounds for symmetric divergence measures are derived by applying the results established in the paper [2]. In particular, the minimization problems for the Bhattacharyya coefficient, the Chernoff information, the Jensen-Shannon divergence and Jeffrey's divergence under a constraint on the total variation distance are studied in [1]. In this article, we report two kinds of extensions of the above results, namely a classical q-extension and a non-commutative (quantum) extension. The parametric q-extension means that the Tsallis entropy $H_q(X) \equiv \sum_x \frac{p(x)^q - p(x)}{1-q}$ [3] converges to the Shannon entropy when $q \to 1$; namely, all results with the parameter $q$ recover the usual (standard) results by Shannon when $q \to 1$. We give the list of our extensions as follows.

(i) The lower bound for the Jensen-Shannon-Tsallis divergence is given by applying the results in [2].
(ii) The lower bound for the Jeffrey-Tsallis divergence is given by applying the results in [2] and by deriving a q-Pinsker inequality for $q \geq 1$. This implies new upper bounds on $\sum_{u \in U} |p(u) - Q_{d,l}(u)|$.
(iii) The lower bound for the quantum Chernoff information is given by the known relation between the trace distance and the fidelity.
(iv) The lower bound for the quantum Jeffrey divergence is given by applying the monotonicity (data processing inequality) of the quantum f-divergence.

q-EXTENDED CASES

Here we review some quantities. The total variation distance between two probability distributions $P(x)$ and $Q(x)$ is defined by
$$d_{TV}(P,Q) \equiv \frac{1}{2} \sum_x |P(x) - Q(x)| = \frac{1}{2}\|P - Q\|_1,$$
where $\|\cdot\|_1$ represents the $l_1$ norm. The f-divergence introduced by Csiszár in [4] is defined by
$$D_f(P\|Q) \equiv \sum_x Q(x)\, f\!\left(\frac{P(x)}{Q(x)}\right),$$
where $f$ is a convex function with $f(1) = 0$. If we take $f(t) = -t \ln_q \frac{1}{t}$, where $\ln_q(x) \equiv \frac{x^{1-q} - 1}{1-q}$ is the q-logarithmic function defined for $x \geq 0$ and $q \neq 1$, then the f-divergence is equal to the Tsallis relative entropy (Tsallis divergence) defined by (see e.g., [5])
$$D_q(P\|Q) \equiv -\sum_x P(x) \ln_q \frac{Q(x)}{P(x)} = \sum_x \frac{P(x) - P(x)^q Q(x)^{1-q}}{1-q}.$$
In this section, we use the result established by Gilardoni in [2] for symmetric divergences.
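The quantities above are easy to experiment with numerically. The following is a minimal sketch (ours, not part of the paper) of the q-logarithm, the Tsallis divergence and the total variation distance; the helper names `lnq`, `tsallis_div` and `total_variation` are our own, and the final loop checks that $D_q$ approaches the Kullback-Leibler divergence as $q \to 1$.

```python
import numpy as np

def lnq(x, q):
    """q-logarithm: ln_q(x) = (x^(1-q) - 1)/(1-q); the natural log at q = 1."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if q == 1.0 else (x**(1.0 - q) - 1.0) / (1.0 - q)

def tsallis_div(p, r, q):
    """Tsallis divergence D_q(P||Q) = -sum_x P(x) ln_q(Q(x)/P(x))."""
    p, r = np.asarray(p, dtype=float), np.asarray(r, dtype=float)
    return float(-np.sum(p * lnq(r / p, q)))

def total_variation(p, r):
    """d_TV(P,Q) = (1/2) sum_x |P(x) - Q(x)|."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(r)).sum())

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.25, 0.25, 0.5])
kl = float(np.sum(P * np.log(P / Q)))    # Kullback-Leibler divergence
for q in (1.01, 1.001, 1.0001):          # D_q -> KL as q -> 1
    assert abs(tsallis_div(P, Q, q) - kl) < 1e-2
```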
Theorem (Gilardoni, 2006 [2])
Suppose that $D_f$ is a symmetric divergence (this condition is known to be characterized by $f(u) = u f(1/u) + c(u - 1)$ for $u \in (0,\infty)$, where $c$ is a constant) and that $f : (0,\infty) \to \mathbb{R}$ is convex with $f(1) = 0$. Then we have
$$\inf_{P,Q:\, d_{TV}(P,Q) = \varepsilon} D_f(P\|Q) = (1-\varepsilon)\, f\!\left(\frac{1+\varepsilon}{1-\varepsilon}\right) - 2 f'(1)\,\varepsilon .$$

As corollaries of the above theorem, we obtain the following two propositions. We define the Jensen-Shannon-Tsallis divergence as
$$C_q(P,Q) \equiv D_q\!\left(P \,\Big\|\, \frac{P+Q}{2}\right) + D_q\!\left(Q \,\Big\|\, \frac{P+Q}{2}\right).$$
Then $D_{f_q}(P\|Q) = C_q(P,Q)$ with $f_q(t) = -t \ln_q \frac{t+1}{2t} - \ln_q \frac{t+1}{2}$; this $f_q$ is convex with $f_q(1) = 0$ and $f_q'(1) = 0$, and $C_q(P,Q) = C_q(Q,P)$. Thus we have the following proposition, which is a q-parametric extension of Proposition 3 in [1].

Proposition 1
$$\min_{P,Q:\, d_{TV}(P,Q) = \varepsilon} C_q(P,Q) = -(1-\varepsilon) \ln_q \frac{1}{1-\varepsilon} - (1+\varepsilon) \ln_q \frac{1}{1+\varepsilon}.$$
The equality is achieved when $P = \left(\frac{1-\varepsilon}{2}, \frac{1+\varepsilon}{2}\right)$ and $Q = \left(\frac{1+\varepsilon}{2}, \frac{1-\varepsilon}{2}\right)$.

We also define the Jeffrey-Tsallis divergence as
$$J_q(P,Q) \equiv \frac{1}{2}\left\{ D_q(P\|Q) + D_q(Q\|P) \right\}.$$
Then $D_{f_q}(P\|Q) = J_q(P,Q)$ with $f_q(t) = \frac{t + 1 - t^q - t^{1-q}}{2(1-q)}$; this $f_q$ is convex with $f_q(1) = 0$ and $f_q'(1) = 0$, and $J_q(P,Q) = J_q(Q,P)$. Thus we have the following proposition, which is a q-parametric extension of Proposition 4 in [1].

Proposition 2
$$\min_{P,Q:\, d_{TV}(P,Q) = \varepsilon} J_q(P,Q) = -\frac{1}{2}\left\{ (1+\varepsilon) \ln_q \frac{1-\varepsilon}{1+\varepsilon} + (1-\varepsilon) \ln_q \frac{1+\varepsilon}{1-\varepsilon} \right\}.$$
The equality is achieved when $P = \left(\frac{1-\varepsilon}{2}, \frac{1+\varepsilon}{2}\right)$ and $Q = \left(\frac{1+\varepsilon}{2}, \frac{1-\varepsilon}{2}\right)$.
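Both propositions can be checked numerically. The sketch below (ours, not from the paper; `q` and `eps` are arbitrary test values) evaluates $C_q$ and $J_q$ at the claimed optimizers and confirms that other binary pairs with the same total variation never fall below the closed-form minima.

```python
import numpy as np

def lnq(x, q):
    return np.log(x) if q == 1.0 else (x**(1.0 - q) - 1.0) / (1.0 - q)

def Dq(p, r, q):  # Tsallis divergence D_q(P||Q)
    p, r = np.asarray(p, dtype=float), np.asarray(r, dtype=float)
    return float(-np.sum(p * lnq(r / p, q)))

def Cq(p, r, q):  # Jensen-Shannon-Tsallis divergence
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(r, dtype=float))
    return Dq(p, m, q) + Dq(r, m, q)

def Jq(p, r, q):  # Jeffrey-Tsallis divergence
    return 0.5 * (Dq(p, r, q) + Dq(r, p, q))

q, eps = 1.5, 0.4
Pstar = np.array([(1 - eps) / 2, (1 + eps) / 2])
Qstar = Pstar[::-1]
C_min = -(1 - eps) * lnq(1 / (1 - eps), q) - (1 + eps) * lnq(1 / (1 + eps), q)
J_min = -0.5 * ((1 + eps) * lnq((1 - eps) / (1 + eps), q)
                + (1 - eps) * lnq((1 + eps) / (1 - eps), q))
assert abs(Cq(Pstar, Qstar, q) - C_min) < 1e-12   # the optimizers attain the minima
assert abs(Jq(Pstar, Qstar, q) - J_min) < 1e-12
for x in np.linspace(0.01, 1 - eps - 0.01, 40):   # other binary pairs with d_TV = eps
    P, Q = np.array([x, 1 - x]), np.array([x + eps, 1 - x - eps])
    assert Cq(P, Q, q) >= C_min - 1e-12
    assert Jq(P, Q, q) >= J_min - 1e-12
```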
Here we are able to prove the following lemma, which may be named a q-Pinsker inequality.

Lemma 1
$$D_q(P\|Q) \geq 2\, d_{TV}(P,Q)^2 \qquad \text{for } q \geq 1.$$
Proof:
The proof follows easily from the fact that $\log t \leq \frac{t^r - 1}{r}$ ($t > 0$, $r > 0$), which implies $-\log t \leq -\ln_q t$ ($t > 0$, $q > 1$) with $r = q - 1$. Thus we have
$$-x \ln_q \frac{y}{x} - (1-x) \ln_q \frac{1-y}{1-x} \;\geq\; -x \log \frac{y}{x} - (1-x) \log \frac{1-y}{1-x} \;\geq\; 2(x-y)^2$$
for $0 < x, y < 1$ and $q \geq 1$, where the last inequality is the classical binary Pinsker inequality. Thus we obtain the lemma by the data processing inequality. As a remark, the above q-Pinsker inequality does not hold for the case $0 < q < 1$, since we have counter-examples.

Applying this lemma, we can prove the following theorem, whose conditions are the same as those in the paper [1] except for the extended parameter $q$.
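Before stating it, here is a quick randomized test of Lemma 1 (our own sketch; the distributions are random and the tolerance absorbs floating-point error):

```python
import numpy as np

def lnq(x, q):
    return np.log(x) if q == 1.0 else (x**(1.0 - q) - 1.0) / (1.0 - q)

def Dq(p, r, q):
    return float(-np.sum(p * lnq(r / p, q)))

rng = np.random.default_rng(0)
for q in (1.0, 1.3, 2.0, 3.0):
    for _ in range(2000):
        p = rng.dirichlet(np.ones(4))
        r = rng.dirichlet(np.ones(4))
        dtv = 0.5 * np.abs(p - r).sum()
        assert Dq(p, r, q) >= 2.0 * dtv**2 - 1e-10   # q-Pinsker for q >= 1
```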
Theorem 1
Consider a memoryless stationary source with alphabet $U$ and probability distribution $P$, and assume that a uniquely decodable code with an alphabet of size $d$ is used. For $q \geq 1$, we have
$$\frac{1}{2} \sum_{u \in U} \left| p(u) - Q_{d,l}(u) \right| \leq \min\left\{ 1,\; \sqrt{\frac{\Delta_{d,q} \log_e d}{2}} \right\},$$
where $\Delta_{d,q} \equiv \bar{n}_q - H_{d,q}(U)$, $\bar{n}_q \equiv -\frac{(c_{d,l})^{q-1}}{\log_e d} \sum_{u \in U} p(u)^q \ln_q d^{-l(u)}$, $H_{d,q}(U) \equiv -\frac{1}{\log_e d} \sum_{u \in U} p(u)^q \ln_q p(u)$, $Q_{d,l}(u) \equiv \frac{d^{-l(u)}}{c_{d,l}}$ and $c_{d,l} \equiv \sum_{u \in U} d^{-l(u)}$.
Proof:
We give a sketch of the proof. Firstly,
$$D_q\left(P \,\|\, Q_{d,l}\right) \geq D_q\left(\hat{P} \,\|\, \hat{Q}_{d,l}\right) \geq 2\left(P(A) - Q_{d,l}(A)\right)^2 = 2\, d_{TV}\left(P, Q_{d,l}\right)^2 = \frac{1}{2}\left( \sum_{u \in U} \left|p(u) - Q_{d,l}(u)\right| \right)^2,$$
where $A \equiv \{x : P(x) > Q_{d,l}(x)\}$, $Y \equiv \phi(X)$ is the indicator of $A$, and $\hat{P}$ and $\hat{Q}_{d,l}$ are the distributions of the new random variable $Y$. By simple computations with the formula $\ln_q \frac{y}{x} = x^{q-1}\left(\ln_q y - \ln_q x\right)$, we have
$$D_q\left(P \,\|\, Q_{d,l}\right) = \sum_{u \in U} p(u)^q \left( \ln_q p(u) - \ln_q Q_{d,l}(u) \right) = \sum_{u \in U} p(u)^q \left( \ln_q p(u) - \ln_q \frac{d^{-l(u)}}{c_{d,l}} \right)$$
$$= -\log_e d \cdot H_{d,q}(U) - (c_{d,l})^{q-1} \sum_{u \in U} p(u)^q \left( \ln_q d^{-l(u)} - \ln_q c_{d,l} \right)$$
$$= -\log_e d \cdot H_{d,q}(U) + \log_e d \cdot \bar{n}_q + (c_{d,l})^{q-1} \ln_q c_{d,l} \sum_{u \in U} p(u)^q \;\leq\; \log_e d \cdot \Delta_{d,q},$$
since the Kraft-McMillan inequality $c_{d,l} \leq 1$ implies $\ln_q c_{d,l} \leq 0$. Combining the two displays, we obtain
$$\frac{1}{2}\left( \sum_{u \in U} \left|p(u) - Q_{d,l}(u)\right| \right)^2 \leq \log_e d \cdot \Delta_{d,q},$$
which gives the claimed bound.
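To make the bound concrete, here is a small numerical sketch (ours, not from the paper) that evaluates Theorem 1 on the three-symbol source used in the remark below; the probability values $(0.5, 0.3, 0.2)$ and the test value $q = 1.5$ are the reconstructed example data.

```python
import numpy as np

def lnq(x, q):
    x = np.asarray(x, dtype=float)
    return np.log(x) if q == 1.0 else (x**(1.0 - q) - 1.0) / (1.0 - q)

d = 2.0
ln_d = np.log(d)
p = np.array([0.5, 0.3, 0.2])         # source distribution (see the remark below)
l = np.array([1.0, 2.0, 3.0])         # Shannon-Fano code lengths
c = np.sum(d**(-l))                   # Kraft sum c_{d,l} = 7/8
Q = d**(-l) / c                       # Q_{d,l}(u) = d^{-l(u)} / c_{d,l}
tv_half = 0.5 * np.abs(p - Q).sum()   # left-hand side of Theorem 1

def Delta(q):
    nbar = -(c**(q - 1) / ln_d) * np.sum(p**q * lnq(d**(-l), q))
    H = -(1.0 / ln_d) * np.sum(p**q * lnq(p, q))
    return nbar - H

for q in (1.0, 1.5):
    bound = min(1.0, np.sqrt(Delta(q) * ln_d / 2.0))
    print(f"q = {q}: bound = {bound:.3f}")   # ~0.273 at q = 1, ~0.226 at q = 1.5
    assert tv_half <= bound
```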
Remark 1
This theorem is a parametric extension of the inequality (32) in the paper [1], in the sense that the right-hand side of our inequality contains the parameter $q \geq 1$. We also note that the condition $q \geq 1$ does not by itself determine which bound is tighter; it can happen that $\sqrt{\Delta_{d,q} \log_e d / 2} \leq \sqrt{\Delta_{d,1} \log_e d / 2}$, where $\Delta_{d,1}$ was used in the paper [1] as $\Delta_d$. Consider the following information source with $d = 2$:
$$U = \begin{pmatrix} u_1 & u_2 & u_3 \\ 0.5 & 0.3 & 0.2 \end{pmatrix}.$$
Then we have the code $u_1 \to$ "0", $u_2 \to$ "10", $u_3 \to$ "110" by Shannon-Fano coding, so that $c_{d,l} = 7/8 < 1$ with $l_1 = 1$, $l_2 = 2$ and $l_3 = 3$. By numerical computations we have, for example,
$$\sqrt{\frac{\Delta_{2,1.5} \log_e 2}{2}} \simeq 0.226 \;<\; 0.273 \simeq \sqrt{\frac{\Delta_{2,1} \log_e 2}{2}},$$
which shows that our upper bound with the parameter $q \geq 1$ can be tighter than that of [1]; that is, $\sqrt{\Delta_{d,q} \log_e d / 2} \leq \sqrt{\Delta_{d,1} \log_e d / 2}$ may happen in the case $c_{d,l} < 1$. For the case $c_{d,l} = 1$ (e.g., a Huffman code), the following proposition can be proven.
Proposition 3
Let $q \geq 1$ and $c_{d,l} = 1$. Then we have the relation $\Delta_{d,1} \leq \Delta_{d,q}$.
Proof:
We firstly prove the inequality $f_q(x,y) \geq 0$ for $q \geq 1$ and $0 < x, y \leq 1$, where
$$f_q(x,y) \equiv x\left(\log_e y - \log_e x\right) + x^q\left(\ln_q x - \ln_q y\right).$$
Since
$$\frac{\partial f_q(x,y)}{\partial y} = \frac{x^q}{y^q}\left( x^{1-q} y^{q-1} - 1 \right),$$
we have $\frac{\partial f_q(x,y)}{\partial y} \geq 0$ if $x \leq y$, and $\frac{\partial f_q(x,y)}{\partial y} \leq 0$ if $x \geq y$; thus we have $f_q(x,y) \geq f_q(x,x) = 0$. Putting $x = p(u)$ and $y = d^{-l(u)}$, taking the summation on both sides over $u \in U$ and dividing both sides by $\log_e d$, we have
$$-\frac{1}{\log_e d} \sum_{u \in U} p(u)^q \ln_q d^{-l(u)} + \frac{1}{\log_e d} \sum_{u \in U} p(u) \log_e d^{-l(u)} - \frac{1}{\log_e d} \sum_{u \in U} p(u) \log_e p(u) + \frac{1}{\log_e d} \sum_{u \in U} p(u)^q \ln_q p(u) \geq 0 .$$
When $c_{d,l} = 1$, we thus obtain the inequality
$$\Delta_{d,q} - \Delta_{d,1} = \bar{n}_q - \bar{n}_1 + H_{d,1}(U) - H_{d,q}(U) \geq 0,$$
taking into account that the usual average code length can be rewritten as $\bar{n}_1 = \sum_{u \in U} p(u) l(u) = -\frac{1}{\log_e d} \sum_{u \in U} p(u) \log_e d^{-l(u)}$.

This proposition shows that, for the special (but nontrivial) case $c_{d,l} = 1$, the upper bound $\sqrt{\Delta_{d,1} \log_e d / 2}$ given in (32) of the paper [1] is always tighter than ours, $\sqrt{\Delta_{d,q} \log_e d / 2}$ (for $q \geq 1$), obtained in Theorem 1.
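A numerical sketch of Proposition 3 (ours; the four-symbol source and its Huffman code lengths are a hypothetical example chosen so that $c_{d,l} = 1$):

```python
import numpy as np

def lnq(x, q):
    x = np.asarray(x, dtype=float)
    return np.log(x) if q == 1.0 else (x**(1.0 - q) - 1.0) / (1.0 - q)

d, ln_d = 2.0, np.log(2.0)
p = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical source
l = np.array([1.0, 2.0, 3.0, 3.0])   # Huffman code lengths for this source
c = np.sum(d**(-l))
assert abs(c - 1.0) < 1e-12          # complete code: c_{d,l} = 1

def Delta(q):
    nbar = -(c**(q - 1) / ln_d) * np.sum(p**q * lnq(d**(-l), q))
    H = -(1.0 / ln_d) * np.sum(p**q * lnq(p, q))
    return nbar - H

D1 = Delta(1.0)
for q in (1.1, 1.5, 2.0, 4.0):
    assert Delta(q) >= D1 - 1e-12    # Proposition 3: Delta_{d,1} <= Delta_{d,q}
```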
NON-COMMUTATIVE CASES
Let $\rho$ and $\sigma$ be density matrices (quantum states), i.e., positive semi-definite matrices with unit trace. Then the following quantities are well known in the field of quantum information and quantum statistical physics as the trace distance and the fidelity, respectively:
$$d(\rho,\sigma) \equiv \frac{1}{2} \mathrm{Tr}\,|\rho - \sigma|, \qquad F(\rho,\sigma) \equiv \mathrm{Tr}\left|\rho^{1/2} \sigma^{1/2}\right|,$$
where $|A| \equiv (A^* A)^{1/2}$. Then we have the following propositions.

Proposition 4
For the trace distance and fidelity, we have the following relation:
$$1 - d(\rho,\sigma) \leq F(\rho,\sigma) \leq \sqrt{1 - d(\rho,\sigma)^2}.$$
This relation is well known in the field of quantum information and quantum statistical physics, and this proposition is a non-commutative extension of Proposition 1 in the paper [1].
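As a sanity check (ours, not part of the paper), the following sketch draws random full-rank qubit states and verifies Proposition 4 together with the Chernoff-information bound derived just below; `rand_state` and `mpow` are our helper names, and the minimization over $s$ is done on a grid containing $s = 1/2$.

```python
import numpy as np

def rand_state(n, rng):
    """Random full-rank density matrix."""
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = A @ A.conj().T + 1e-3 * np.eye(n)
    return rho / np.trace(rho).real

def mpow(H, a):
    """Fractional power of a positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * w**a) @ V.conj().T

rng = np.random.default_rng(1)
for _ in range(200):
    rho, sigma = rand_state(2, rng), rand_state(2, rng)
    dist = 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()   # trace distance
    F = np.linalg.svd(mpow(rho, 0.5) @ mpow(sigma, 0.5),
                      compute_uv=False).sum()                    # Tr|rho^{1/2} sigma^{1/2}|
    assert 1.0 - dist <= F + 1e-9                                # Proposition 4, lower part
    assert F <= np.sqrt(1.0 - dist**2) + 1e-9                    # Proposition 4, upper part
    s_grid = np.linspace(0.0, 1.0, 101)
    CQ = -np.log(min(np.trace(mpow(rho, s) @ mpow(sigma, 1.0 - s)).real
                     for s in s_grid))                           # quantum Chernoff information
    assert CQ >= -np.log(F) - 1e-9
    assert -np.log(F) >= -0.5 * np.log(1.0 - dist**2) - 1e-9
```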
By easy calculations such as
$$C_Q(\rho,\sigma) \equiv -\log\left( \min_{0 \leq s \leq 1} \mathrm{Tr}\!\left[\rho^s \sigma^{1-s}\right] \right) = -\min_{0 \leq s \leq 1}\left( \log \mathrm{Tr}\!\left[\rho^s \sigma^{1-s}\right] \right) \geq -\log \mathrm{Tr}\!\left[\rho^{1/2} \sigma^{1/2}\right] \geq -\log \mathrm{Tr}\!\left|\rho^{1/2} \sigma^{1/2}\right| = -\log F(\rho,\sigma) \geq -\frac{1}{2}\log\left(1 - d(\rho,\sigma)^2\right),$$
we have the following proposition.

Proposition 5
For the quantum Chernoff information, we have
$$\min_{\rho,\sigma:\, d(\rho,\sigma) = \varepsilon} C_Q(\rho,\sigma) = \begin{cases} -\frac{1}{2}\log\left(1 - \varepsilon^2\right), & \varepsilon \in [0,1), \\ +\infty, & \varepsilon = 1. \end{cases}$$

We also note that the quantum relative entropy $D(\rho\|\sigma) \equiv \mathrm{Tr}\left[\rho\left(\log\rho - \log\sigma\right)\right]$ admits the lower bounds $D(\rho\|\sigma) \geq \frac{1}{2}\left(\mathrm{Tr}\,|\rho - \sigma|\right)^2$ and $D(\rho\|\sigma) \geq -2\log \mathrm{Tr}\!\left[\rho^{1/2}\sigma^{1/2}\right] \geq \mathrm{Tr}\!\left[\left(\rho^{1/2} - \sigma^{1/2}\right)^2\right]$ (see [8]). To show our final result, we use the following well-known fact. See [7] for example.
Lemma 2
Let $\mathcal{E} : B(\mathcal{H}) \to B(\mathcal{K})$ be a state transformation (a completely positive, trace-preserving map). For an operator monotone decreasing function $f : \mathbb{R}^+ \to \mathbb{R}$, the monotonicity (data processing inequality) holds:
$$D_f(\rho\|\sigma) \geq D_f\left(\mathcal{E}(\rho) \,\|\, \mathcal{E}(\sigma)\right),$$
where $D_f(\rho\|\sigma) \equiv \mathrm{Tr}\left[\rho\, f(\Delta)(I)\right]$ is the quantum f-divergence and $\Delta_{\sigma,\rho} \equiv \Delta = LR$ is the relative modular operator, defined by $L(A) = \sigma A$ and $R(A) = A\rho^{-1}$.
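The definition $D_f(\rho\|\sigma) = \mathrm{Tr}[\rho f(\Delta)(I)]$ can be realized concretely: with row-major vectorization, the superoperator $\Delta(A) = \sigma A \rho^{-1}$ has the Hermitian positive definite matrix $\sigma \otimes (\rho^{-1})^T$. The sketch below (ours, under that convention) checks that $f(t) = -\log t$ reproduces the Umegaki relative entropy and that $f(t) = (t-1)\log t$ gives $2J(\rho\|\sigma)$, the fact used in the proof of Theorem 2 below.

```python
import numpy as np

def rand_state(n, rng):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = A @ A.conj().T + 1e-3 * np.eye(n)
    return rho / np.trace(rho).real

def logm_pd(H):
    """Matrix logarithm of a positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.log(w)) @ V.conj().T

def umegaki(rho, sigma):
    """D(rho||sigma) = Tr[rho (log rho - log sigma)]."""
    return np.trace(rho @ (logm_pd(rho) - logm_pd(sigma))).real

def quantum_f_div(rho, sigma, f):
    """D_f(rho||sigma) = Tr[rho f(Delta)(I)] with Delta(A) = sigma A rho^{-1}.

    Row-major vec: vec(B A C) = (B kron C^T) vec(A)."""
    n = rho.shape[0]
    M = np.kron(sigma, np.linalg.inv(rho).T)   # Hermitian positive definite
    w, V = np.linalg.eigh(M)
    fM = (V * f(w)) @ V.conj().T
    fI = (fM @ np.eye(n).reshape(-1)).reshape(n, n)
    return np.trace(rho @ fI).real

rng = np.random.default_rng(2)
rho, sigma = rand_state(3, rng), rand_state(3, rng)
# f(t) = -log t recovers the Umegaki relative entropy
assert abs(quantum_f_div(rho, sigma, lambda t: -np.log(t)) - umegaki(rho, sigma)) < 1e-7
# f(t) = (t - 1) log t gives twice the quantum Jeffrey divergence
J = 0.5 * (umegaki(rho, sigma) + umegaki(sigma, rho))
assert abs(quantum_f_div(rho, sigma, lambda t: (t - 1.0) * np.log(t)) - 2.0 * J) < 1e-7
```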
Theorem 2
The quantum Jeffrey divergence defined by $J(\rho\|\sigma) \equiv \frac{1}{2}\left\{ D(\rho\|\sigma) + D(\sigma\|\rho) \right\}$ has the following lower bound:
$$J(\rho\|\sigma) \geq d(\rho,\sigma) \log\left( \frac{1 + d(\rho,\sigma)}{1 - d(\rho,\sigma)} \right).$$
Proof:
By Lemma 2, Proposition 4 in the paper [1] and $\|\rho - \sigma\|_1 = \|P - Q\|_1$ (which will be shown at the end of the proof), we have
$$J(\rho\|\sigma) \geq J(P\|Q) \geq d_{TV}(P,Q) \log\left( \frac{1 + d_{TV}(P,Q)}{1 - d_{TV}(P,Q)} \right) = d(\rho,\sigma) \log\left( \frac{1 + d(\rho,\sigma)}{1 - d(\rho,\sigma)} \right).$$
Here we note that $f(t) = (t-1)\log t$ is operator convex (equivalently, operator monotone decreasing) and that $D_{(t-1)\log t}(\rho\|\sigma) = 2 J(\rho\|\sigma)$, since $(\log \Delta)(Y) = (\log\sigma) Y - Y \log\rho$ and $(\Delta \log \Delta)(Y) = \sigma (\log\sigma) Y \rho^{-1} - \sigma Y (\log\rho) \rho^{-1}$.

Finally, we show $\|\rho - \sigma\|_1 = \|P - Q\|_1$. Let $\mathcal{A} = C^*(\rho - \sigma)$ be the commutative $C^*$-algebra generated by $\rho - \sigma$, let $M_n$ be the set of all $n \times n$ matrices, and take the map $\mathcal{E} : M_n \to \mathcal{A}$ to be a trace-preserving conditional expectation. If we take $P = \mathcal{E}(\rho)$ and $Q = \mathcal{E}(\sigma)$, then the two elements $(\rho - \sigma)_+$ and $(\rho - \sigma)_-$ of the Jordan decomposition of $\rho - \sigma$, being given by the functional calculus of $\rho - \sigma$, belong to $\mathcal{A}$, and we have
$$P - Q = \mathcal{E}(\rho - \sigma) = \mathcal{E}\left((\rho - \sigma)_+ - (\rho - \sigma)_-\right) = \mathcal{E}\left((\rho - \sigma)_+\right) - \mathcal{E}\left((\rho - \sigma)_-\right) = (\rho - \sigma)_+ - (\rho - \sigma)_- = \rho - \sigma,$$
which implies $\|\rho - \sigma\|_1 = \|P - Q\|_1$.
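A randomized check of Theorem 2 and of the monotonicity step in its proof (our sketch; the pinching map $X \mapsto \mathrm{diag}(X)$ is used as one concrete trace-preserving conditional expectation onto a commutative algebra, in place of the specific $\mathcal{E}$ above):

```python
import numpy as np

def rand_state(n, rng):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = A @ A.conj().T + 1e-3 * np.eye(n)
    return rho / np.trace(rho).real

def logm_pd(H):
    w, V = np.linalg.eigh(H)
    return (V * np.log(w)) @ V.conj().T

def umegaki(rho, sigma):
    return np.trace(rho @ (logm_pd(rho) - logm_pd(sigma))).real

rng = np.random.default_rng(3)
for _ in range(200):
    rho, sigma = rand_state(3, rng), rand_state(3, rng)
    J = 0.5 * (umegaki(rho, sigma) + umegaki(sigma, rho))
    dist = 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()
    # Theorem 2: J(rho||sigma) >= d log((1 + d)/(1 - d))
    assert J >= dist * np.log((1.0 + dist) / (1.0 - dist)) - 1e-9
    # monotonicity under the pinching channel E(X) = diag(X)
    P, Q = np.diag(rho).real, np.diag(sigma).real
    Jc = 0.5 * (np.sum(P * np.log(P / Q)) + np.sum(Q * np.log(Q / P)))
    assert J >= Jc - 1e-9
```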
ACKNOWLEDGMENTS

The author (S. F.) was partially supported by JSPS KAKENHI Grant Number 16K05257.
REFERENCES

[1] I. Sason, Tight bounds for symmetric divergence measures and a refined bound for lossless source coding, IEEE Trans. Inf. Theory, Vol. 61 (2015), pp. 701-707.
[2] G. L. Gilardoni, On the minimum f-divergence for given total variation, C. R. Acad. Sci. Paris, Ser. I, Vol. 343 (2006), pp. 763-766.
[3] C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J. Stat. Phys., Vol. 52 (1988), pp. 479-487.
[4] I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Stud. Sci. Math. Hungarica, Vol. 2 (1967), pp. 299-318.
[5] S. Furuichi, K. Yanagi and K. Kuriyama, Fundamental properties of Tsallis relative entropy, J. Math. Phys., Vol. 45 (2004), pp. 4868-4877.
[6] S. Furuichi, Information theoretical properties of Tsallis entropies, J. Math. Phys., Vol. 47 (2006), 023302.
[7] D. Petz, Quantum Information Theory and Quantum Statistics, Springer, 2004.
[8] E. A. Carlen and E. H. Lieb, Remainder terms for some quantum entropy inequalities, J. Math. Phys., Vol. 55 (2014), 042201.

Appendix: Added notes related to Theorem 1

Actually we have $\lim_{q \to 1} \bar{n}_q = \sum_{u \in U} p(u) l(u)$, which is the usual average code length, but the definition of $\bar{n}_q$ in Theorem 1 seems complicated and somewhat unnatural to interpret. In order to overcome this problem, we may adopt the following simple alternative definition of $\bar{n}_q$ instead of the one in Theorem 1. Then we have the following proposition.
Proposition A
Let $q \geq 1$ and $c_{d,l,q} \leq 1$. Then we have
$$\frac{1}{2} \sum_{u \in U} \left| p(u) - Q_{d,l,q}(u) \right| \leq \min\left\{ 1,\; \sqrt{\frac{\Delta_{d,q} \log_e d}{2}} \right\},$$
where $\Delta_{d,q} \equiv \bar{n}_q - H_{d,q}(U)$, $\bar{n}_q \equiv \sum_{u \in U} p(u)^q l(u)$, $H_{d,q}(U) \equiv -\frac{1}{\log_e d} \sum_{u \in U} p(u)^q \ln_q p(u)$, $Q_{d,l,q}(u) \equiv \frac{\exp_q\left(\log_e d^{-l(u)}\right)}{c_{d,l,q}}$ and $c_{d,l,q} \equiv \sum_{u \in U} \exp_q\left(\log_e d^{-l(u)}\right)$, where the q-exponential function $\exp_q(\cdot)$ is the inverse function of the q-logarithmic function $\ln_q(\cdot)$ and its explicit form is given in the proof of this proposition.
Proof:
By the same way as in the proof of Theorem 1, we have
$$D_q\left(P \,\|\, Q_{d,l,q}\right) \geq \frac{1}{2}\left( \sum_{u \in U} \left| p(u) - Q_{d,l,q}(u) \right| \right)^2 .$$
By simple computations with the formula $\ln_q(xy) = \ln_q x + x^{1-q} \ln_q y$, we have
$$D_q\left(P \,\|\, Q_{d,l,q}\right) = \sum_{u \in U} p(u)^q \left( \ln_q p(u) - \ln_q Q_{d,l,q}(u) \right)$$
$$= -\log_e d \cdot H_{d,q}(U) - \sum_{u \in U} p(u)^q \ln_q \frac{\exp_q\left(\log_e d^{-l(u)}\right)}{c_{d,l,q}}$$
$$= -\log_e d \cdot H_{d,q}(U) - \sum_{u \in U} p(u)^q \left( \exp_q\left(\log_e d^{-l(u)}\right) \right)^{1-q} \ln_q \frac{1}{c_{d,l,q}} - \sum_{u \in U} p(u)^q \log_e d^{-l(u)}$$
$$= \log_e d \cdot \sum_{u \in U} p(u)^q l(u) - \log_e d \cdot H_{d,q}(U) - \ln_q \frac{1}{c_{d,l,q}} \sum_{u \in U} p(u)^q \left( \exp_q\left(\log_e d^{-l(u)}\right) \right)^{1-q}$$
$$= \Delta_{d,q} \log_e d - \ln_q \frac{1}{c_{d,l,q}} \sum_{u \in U} p(u)^q \left( \exp_q\left(\log_e d^{-l(u)}\right) \right)^{1-q} \leq \Delta_{d,q} \log_e d,$$
since $c_{d,l,q} \leq 1$ implies $\ln_q \frac{1}{c_{d,l,q}} \geq 0$. Here, since $d \geq 2$, $l(u) \geq 1$ and $q \geq 1$, we have $1 + (1-q)\log_e d^{-l(u)} = 1 + (q-1) l(u) \log_e d > 0$, so that the definition of the q-exponential function
$$\exp_q(x) \equiv \begin{cases} \left(1 + (1-q)x\right)^{\frac{1}{1-q}}, & \text{if } 1 + (1-q)x > 0, \\ 0, & \text{otherwise}, \end{cases}$$
shows $\exp_q\left(\log_e d^{-l(u)}\right) > 0$. Combining the above two inequalities, we obtain
$$\frac{1}{2}\left( \sum_{u \in U} \left| p(u) - Q_{d,l,q}(u) \right| \right)^2 \leq \Delta_{d,q} \log_e d,$$
which gives the claimed bound. We could not remove the needless and meaningless condition $c_{d,l,q} \leq 1$ for general $q \geq 1$, while for $q = 1$ it always holds, since $c_{d,l,1} = c_{d,l} \leq 1$ by the Kraft-McMillan inequality, and the equality case $c_{d,l,1} = 1$ with $p(u) = d^{-l(u)}$ gives $\bar{n}_1 = H_{d,1}(U)$ as in [1].
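Finally, a numerical sketch of Proposition A (ours; the same reconstructed three-symbol source as before, with $q = 1.2$ chosen so that the condition $c_{d,l,q} \leq 1$ actually holds):

```python
import numpy as np

def lnq(x, q):
    x = np.asarray(x, dtype=float)
    return np.log(x) if q == 1.0 else (x**(1.0 - q) - 1.0) / (1.0 - q)

def expq(x, q):
    """q-exponential: (1 + (1-q)x)^(1/(1-q)) where the base is positive, else 0."""
    if q == 1.0:
        return np.exp(np.asarray(x, dtype=float))
    base = 1.0 + (1.0 - q) * np.asarray(x, dtype=float)
    out = np.zeros_like(base)
    out[base > 0] = base[base > 0]**(1.0 / (1.0 - q))
    return out

d, ln_d = 2.0, np.log(2.0)
p = np.array([0.5, 0.3, 0.2])
l = np.array([1.0, 2.0, 3.0])
q = 1.2
e = expq(np.log(d**(-l)), q)          # exp_q(log_e d^{-l(u)})
c = e.sum()                           # c_{d,l,q}, approximately 0.992 here
assert c <= 1.0                       # condition of Proposition A
Q = e / c                             # Q_{d,l,q}
H = -(1.0 / ln_d) * np.sum(p**q * lnq(p, q))
Delta = np.sum(p**q * l) - H          # simpler n_q = sum_u p(u)^q l(u)
tv_half = 0.5 * np.abs(p - Q).sum()
assert tv_half <= min(1.0, np.sqrt(Delta * ln_d / 2.0))
```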