How the Type of Convexity of the Core Function Affects the Csiszár $f$-Divergence Functional
MOHSEN KIAN
Abstract.
We investigate how the type of convexity of the core function affects the Csiszár $f$-divergence functional. A general treatment of the type of convexity is considered and the associated perspective functions are studied. In particular, it is shown that when the core function is MN-convex, the associated perspective function is jointly MN-convex provided that the two scalar means M and N coincide. In the case where M $\neq$ N, we study the type of convexity of the perspective function. As an application, we prove that the Hellinger distance is jointly GG-convex. As further applications, the matrix Jensen inequality is developed for perspective functions under different kinds of convexity.

1. Introduction

In probability theory, the notion of Csiszár $f$-divergence is well known in relation to measures between probability distributions. Such measures have applications in many directions, such as economics, genetics and signal processing. Csiszár [4, 3] introduced the $f$-divergence functional of a function $f:[0,\infty)\to\mathbb{R}$ by
\[
I_f(\mathbf{p},\mathbf{q}) := \sum_{j=1}^{n} q_j f\Big(\frac{p_j}{q_j}\Big)
\]
for $n$-tuples of positive real numbers $\mathbf{p}=(p_1,\ldots,p_n)$ and $\mathbf{q}=(q_1,\ldots,q_n)$. In the above definition, the undefined expressions are interpreted as
\[
f(0)=\lim_{t\to 0^+}f(t),\qquad 0f\Big(\frac{0}{0}\Big)=0,\qquad 0f\Big(\frac{p}{0}\Big)=\lim_{\epsilon\to 0^+}\epsilon f\Big(\frac{p}{\epsilon}\Big)=p\lim_{t\to\infty}\frac{f(t)}{t}.
\]
A useful fact concerning the $f$-divergence functional was proved by Csiszár and Körner [5]: they showed that the perspective function of a convex function is sub-additive.

Mathematics Subject Classification.
Primary 26B25, 94A17; Secondary 15A45, 26D15.
Key words and phrases.
Csiszár $f$-divergence, convexity and joint convexity, matrix Jensen inequality, spectral decomposition.

Theorem A. If $f:[0,\infty)\to\mathbb{R}$ is convex, then $I_f(\mathbf{p},\mathbf{q})$ is jointly convex in $\mathbf{p}$ and $\mathbf{q}$, and
\[
g\Big(\sum_{j=1}^{n}p_j,\sum_{j=1}^{n}q_j\Big)=\Big(\sum_{j=1}^{n}q_j\Big)f\bigg(\frac{\sum_{j=1}^{n}p_j}{\sum_{j=1}^{n}q_j}\bigg)\le I_f(\mathbf{p},\mathbf{q}) \tag{1.1}
\]
for all positive $n$-tuples $\mathbf{p}=(p_1,\ldots,p_n)$ and $\mathbf{q}=(q_1,\ldots,q_n)$, where the perspective function $g$ associated to $f$ is defined by $g(x,y):=yf(x/y)$.

When $f$ varies through convex functions, the Csiszár $f$-divergence produces different known measures. Among others, we mention the following notable measures:

- The Kullback–Leibler distance is defined by $KL(\mathbf{p},\mathbf{q}):=\sum_{j=1}^{n}p_j\log(p_j/q_j)$, and $KL=I_f$ when $f(t)=t\ln t$ $(t>0)$.
- The total variation distance is defined by $V(\mathbf{p},\mathbf{q}):=\sum_{j=1}^{n}|p_j-q_j|$, and $V=I_f$ when $f(t)=|t-1|$ $(t\ge 0)$.
- The Hellinger distance is defined by $H(\mathbf{p},\mathbf{q}):=2\sum_{j=1}^{n}(\sqrt{p_j}-\sqrt{q_j})^{2}$, and $H=I_f$ when $f(t)=2(\sqrt{t}-1)^{2}$ $(t\ge 0)$.
- The $\chi^{2}$-distance is defined by $D_{\chi^{2}}(\mathbf{p},\mathbf{q}):=\sum_{j=1}^{n}(p_j-q_j)^{2}/q_j$, and $D_{\chi^{2}}=I_f$ when $f(t)=(t-1)^{2}$ $(t\ge 0)$.
- Rényi's divergences are defined by $R_\alpha(\mathbf{p},\mathbf{q}):=\frac{1}{\alpha(\alpha-1)}\ln\rho_\alpha(\mathbf{p},\mathbf{q})$ for every $\alpha\in\mathbb{R}\setminus\{0,1\}$, where $\rho_\alpha(\mathbf{p},\mathbf{q})=\sum_{j=1}^{n}p_j^{\alpha}q_j^{1-\alpha}$ and $\rho_\alpha=I_f$ when $f(t)=t^{\alpha}$ $(t>0)$.

For more information about the $f$-divergence functional and its properties, the reader is referred to [11, 13, 14, 22, 23] and the references therein.

For every two positive real numbers $x,y$ and every $\alpha\in[0,1]$, set
\[
A_\alpha(x,y)=\alpha x+(1-\alpha)y \quad\text{(arithmetic mean)},
\]
\[
G_\alpha(x,y)=x^{\alpha}y^{1-\alpha} \quad\text{(geometric mean)},
\]
\[
H_\alpha(x,y)=\big(\alpha x^{-1}+(1-\alpha)y^{-1}\big)^{-1} \quad\text{(harmonic mean)}.
\]
The arithmetic–geometric–harmonic means inequality is well known:
\[
H_\alpha(x,y)\le G_\alpha(x,y)\le A_\alpha(x,y)\qquad (x,y\ge 0,\ \alpha\in[0,1]). \tag{1.2}
\]
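Although the development here is purely analytic, the definitions above are easy to check numerically. The following Python sketch (our illustration; the function names are ours, not the paper's) evaluates $I_f$ for some of the cores listed above and verifies the perspective inequality (1.1) of Theorem A on random positive tuples.

```python
import math
import random

def i_f(f, p, q):
    """Csiszar f-divergence I_f(p, q) = sum_j q_j * f(p_j / q_j)."""
    return sum(qj * f(pj / qj) for pj, qj in zip(p, q))

def perspective(f, x, y):
    """Perspective function g(x, y) = y * f(x / y) of the core f."""
    return y * f(x / y)

# Core functions generating the classical measures listed above.
kl_core = lambda t: t * math.log(t)                     # Kullback-Leibler
tv_core = lambda t: abs(t - 1)                          # total variation
hellinger_core = lambda t: 2 * (math.sqrt(t) - 1) ** 2  # Hellinger
chi2_core = lambda t: (t - 1) ** 2                      # chi-square

rng = random.Random(0)
p = [rng.uniform(0.5, 2.0) for _ in range(5)]
q = [rng.uniform(0.5, 2.0) for _ in range(5)]

# Theorem A: g(sum p_j, sum q_j) <= I_f(p, q) for every convex core f.
for core in (kl_core, tv_core, hellinger_core, chi2_core):
    assert perspective(core, sum(p), sum(q)) <= i_f(core, p, q) + 1e-12
```

Each core above is convex, so the assertion holds for all positive tuples, not just the sampled ones.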
2. The effect of the type of convexity of the core function on the $f$-divergence functional

Convex functions are defined via the arithmetic mean: a real function $f$ is convex when $f(A_\alpha(x,y))\le A_\alpha(f(x),f(y))$ for all $x,y$ in the domain of $f$ and every $\alpha\in[0,1]$. More generally, if $M_\alpha$ and $N_\alpha$ are two $\alpha$-weighted scalar means, a positive real function $f$ on $(0,\infty)$ is said to be MN-convex when
\[
f(M_\alpha(x,y))\le N_\alpha(f(x),f(y)) \tag{2.1}
\]
holds for all $x,y\ge 0$ and every $\alpha\in[0,1]$.

Lemma 2.1.
Let $f$ be a positive real function on $(0,\infty)$.
(i) $f$ is AG-convex if and only if $\log f$ is convex;
(ii) $f$ is AH-convex if and only if $1/f$ is concave;
(iii) $f$ is GA-convex (concave) if and only if $f\circ\exp$ is convex (concave);
(iv) if $h$ is convex (concave), then $f(t)=h(\ln t)$ is GA-convex (concave);
(v) $f$ is GG-convex if and only if the function $h=\ln\circ f\circ\exp$ is convex;
(vi) $f$ is GG-convex if and only if $h=\ln\circ f$ is GA-convex;
(vii) $f$ is GH-convex (concave) if and only if $f\circ\exp$ is AH-convex (concave);
(viii) $f$ is HG-convex if and only if $h(t)=t\ln f(t)$ is convex;
(ix) $f$ is HG-convex if and only if $\ln f$ is HA-convex;
(x) $f$ is HH-convex (concave) if and only if $h(t)=t/f(t)$ is concave (convex).

We remark that each class of MN-convex functions mentioned in Lemma 2.1 contains many examples, so we give several before we continue.

Example 2.2.
The functions $t\mapsto 1/\sqrt{t}$ and $t\mapsto -t^{-1}$ are AH-convex on $(0,\infty)$. The functions $t\mapsto\exp t$ and $t\mapsto t^{r}$ $(r<0)$ are AG-convex on $\mathbb{R}$ and $(0,\infty)$, respectively. The function $t\mapsto\log(1+t)$ is GA-convex on $(0,\infty)$. Moreover, recall that the well-known digamma function is defined on $(0,\infty)$ by $\psi(t)=\frac{d}{dt}\log\Gamma(t)=\Gamma'(t)/\Gamma(t)$, where $\Gamma$ denotes the gamma function, i.e., $\Gamma(t)=\int_0^\infty x^{t-1}e^{-x}\,dx$. It is known [24] that the functions $t\mapsto\psi(t)+1/t$ and $t\mapsto\psi(t)+1/t+1/t^{2}$ are GA-concave and GA-convex, respectively, on $(0,\infty)$.

It has been shown in [19] that if $f(t)=\sum_{n=0}^{\infty}c_n t^{n}$ is a real analytic function whose radius of convergence is $r>0$ and whose coefficients $c_n$ are non-negative, then $f$ is a GG-convex function on $(0,r)$. This implies that the functions $\exp$, $\sinh$ and $\cosh$ are GG-convex on $(0,\infty)$, and the functions $\sec$, $\csc$ and $\tan$ are GG-convex on $(0,\pi/2)$. The functions $t\mapsto(1-t)^{-1}$ and $t\mapsto t(1-t)^{-1}$ are GG-convex on $(0,1)$. The functions $t\mapsto 1/\sqrt{\ln t}$ and $t\mapsto-(\ln t)^{-1}$ are GH-convex on $(1,\infty)$. For all $r\ge 1$ or $r\le-1$, the functions $t\mapsto\exp(t^{r})$ are HG-convex on $(0,\infty)$. The functions $t\mapsto t\ln t$ and $t\mapsto t^{r}$ $(0\le r\le 1)$ are HH-convex on $(0,\infty)$.

Regarding the Jensen inequality, Lemma 2.1 can be used to demonstrate variants of the Jensen inequality for every MN-convex function. The proof of the next lemma easily follows from Lemma 2.1 and the classical Jensen inequality for convex functions, so we omit the details.

Lemma 2.3.
Let $f$ be a non-negative real function on $(0,\infty)$ and, for $i=1,\ldots,n$, let $x_i\ge 0$ and $\alpha_i\in[0,1]$ with $\sum_{i=1}^{n}\alpha_i=1$.
(i) If $f$ is AG-convex, then
\[ f\Big(\sum_{i=1}^{n}\alpha_i x_i\Big)\le\prod_{i=1}^{n}f(x_i)^{\alpha_i}. \tag{2.2} \]
(ii) If $f$ is AH-convex, then
\[ f\Big(\sum_{i=1}^{n}\alpha_i x_i\Big)\le\Big(\sum_{i=1}^{n}\alpha_i f(x_i)^{-1}\Big)^{-1}. \tag{2.3} \]
(iii) If $f$ is GA-convex, then
\[ f\Big(\prod_{i=1}^{n}x_i^{\alpha_i}\Big)\le\sum_{i=1}^{n}\alpha_i f(x_i). \tag{2.4} \]
(iv) If $f$ is GG-convex, then
\[ f\Big(\prod_{i=1}^{n}x_i^{\alpha_i}\Big)\le\prod_{i=1}^{n}f(x_i)^{\alpha_i}. \tag{2.5} \]
(v) If $f$ is GH-convex, then
\[ f\Big(\prod_{i=1}^{n}x_i^{\alpha_i}\Big)\le\Big(\sum_{i=1}^{n}\alpha_i f(x_i)^{-1}\Big)^{-1}. \tag{2.6} \]
(vi) If $f$ is HG-convex, then
\[ f\Big(\Big(\sum_{i=1}^{n}\alpha_i x_i^{-1}\Big)^{-1}\Big)\le\prod_{i=1}^{n}f(x_i)^{\alpha_i}. \tag{2.7} \]
(vii) If $f$ is HH-convex, then
\[ f\Big(\Big(\sum_{i=1}^{n}\alpha_i x_i^{-1}\Big)^{-1}\Big)\le\Big(\sum_{i=1}^{n}\alpha_i f(x_i)^{-1}\Big)^{-1}. \tag{2.8} \]

We begin with modifications of the celebrated result of Csiszár, Theorem A. A consequence of Theorem A is that if $f$ is convex, then the associated perspective function $g_f$ is jointly convex. In the next theorem, we investigate the effect of the type of convexity of the generating function $f$ on the convexity of the associated perspective function $g_f$. When there is no fear of ambiguity, we briefly write $g$ for the perspective function associated with $f$. Once more, we note that although we restrict the domain and the range of our functions to the positive half-line, depending on the situation it is possible to consider more general sets.

Theorem 2.4.
Let $f:(0,\infty)\to(0,\infty)$ be a real function.
(i) $f$ is AH-convex if and only if $g$ is AH-convex in the first coordinate and convex in the second coordinate. In particular, the inequality
\[ g\big(A_\alpha(a,b),A_\alpha(x,y)\big)\le H_\alpha\big\{A_\alpha(g(a,x),g(a,y)),A_\alpha(g(b,x),g(b,y))\big\} \tag{2.9} \]
holds for all $a,b,x,y\ge 0$ and $\alpha\in[0,1]$.
(ii) $f$ is AG-convex if and only if $g$ is AG-convex in the first coordinate and convex in the second coordinate. In particular, the inequality
\[ g\big(A_\alpha(a,b),A_\alpha(x,y)\big)\le G_\alpha\big\{A_\alpha(g(a,x),g(a,y)),A_\alpha(g(b,x),g(b,y))\big\} \tag{2.10} \]
holds for all $a,b,x,y\ge 0$ and $\alpha\in[0,1]$.
(iii) $f$ is GG-convex if and only if $g$ is jointly GG-convex. In particular,
\[ g\big(G_\alpha(a,b),G_\alpha(x,y)\big)\le G_\alpha\{g(a,x),g(b,y)\} \tag{2.11} \]
for all $a,b,x,y\ge 0$ and $\alpha\in[0,1]$.
(iv) $f$ is HH-convex if and only if $g$ is jointly HH-convex. In particular,
\[ g\big(H_\alpha(a,b),H_\alpha(x,y)\big)\le H_\alpha\{g(a,x),g(b,y)\} \tag{2.12} \]
for all $a,b,x,y\ge 0$ and $\alpha\in[0,1]$.
(v) $f$ is GH-convex if and only if $g$ is GH-convex in its first variable and GG-convex in its second variable. In particular, the inequality
\[ g\big(G_\alpha(a,b),G_\alpha(x,y)\big)\le H_\alpha\big\{G_\alpha(g(a,x),g(a,y)),G_\alpha(g(b,x),g(b,y))\big\} \tag{2.13} \]
holds for all $a,b,x,y\ge 0$ and $\alpha\in[0,1]$. Moreover, $g$ is jointly GG-convex, i.e., (2.11) holds.

Before proving Theorem 2.4, we would like to note that if $f$ is MN-convex, then $g$ is not necessarily jointly MN-convex, unless M = N. For example, if $f$ is AH-convex, then part (i) of Theorem 2.4 shows that $g$ is AH-convex in the first coordinate and convex in the second coordinate. However, $g$ need not be AH-convex in both variables jointly. To see this, consider the AH-convex function $f(t)=1/\sqrt{t}$ and put $\alpha=1/2$, $a=b=1$, $x=2$ and $y=4$. Then
\[ 3\sqrt{3}=g\big(a,A_{1/2}(x,y)\big)\nleq H_{1/2}\big(g(a,x),g(a,y)\big)=\frac{16\sqrt{2}}{4+\sqrt{2}}. \]
Note in addition that when $f$ is AH-convex, $g$ is AA-convex in both variables. However, the reverse direction does not hold: if $g$ is AA-convex in both variables, then $f$ is not necessarily AH-convex.
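The failure of joint AH-convexity above can be confirmed numerically; this Python sketch (our illustration) evaluates both sides for $f(t)=1/\sqrt{t}$ at $a=b=1$, $x=2$, $y=4$ and $\alpha=1/2$.

```python
import math

def perspective(f, x, y):
    # g(x, y) = y * f(x / y)
    return y * f(x / y)

f = lambda t: 1 / math.sqrt(t)   # AH-convex, since 1/f(t) = sqrt(t) is concave

a = b = 1.0
x, y = 2.0, 4.0

# Left side: g at the arithmetic means A_{1/2}(a, b) = 1 and A_{1/2}(x, y) = 3.
lhs = perspective(f, (a + b) / 2, (x + y) / 2)     # equals 3*sqrt(3)

# Right side: the harmonic mean H_{1/2}(g(a, x), g(b, y)) that joint
# AH-convexity of g would require as an upper bound.
u, v = perspective(f, a, x), perspective(f, b, y)  # 2*sqrt(2) and 8
rhs = 2 / (1 / u + 1 / v)                          # equals 16*sqrt(2)/(4+sqrt(2))

assert lhs > rhs   # joint AH-convexity fails at this point
```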
Proof of Theorem 2.4. First assume that $f$ is AH-convex. For all $a,b,x\ge 0$ and $\alpha\in[0,1]$ we have
\begin{align*}
g\big(\alpha a+(1-\alpha)b,x\big)&=xf\Big(\frac{\alpha a+(1-\alpha)b}{x}\Big)
\le x\Big[\alpha f\big(\tfrac{a}{x}\big)^{-1}+(1-\alpha)f\big(\tfrac{b}{x}\big)^{-1}\Big]^{-1}\\
&=\Big[\alpha x^{-1}f\big(\tfrac{a}{x}\big)^{-1}+(1-\alpha)x^{-1}f\big(\tfrac{b}{x}\big)^{-1}\Big]^{-1}
=\Big[\alpha g(a,x)^{-1}+(1-\alpha)g(b,x)^{-1}\Big]^{-1}.
\end{align*}
This ensures that $g$ is AH-convex in the first coordinate. Therefore
\[ g\big(\alpha(a,x)+(1-\alpha)(b,y)\big)\le\big(\alpha g(a,z)^{-1}+(1-\alpha)g(b,z)^{-1}\big)^{-1}, \]
where $z=\alpha x+(1-\alpha)y$. This means that
\[ g\big(A_\alpha(a,b),A_\alpha(x,y)\big)\le H_\alpha\big\{g(a,A_\alpha(x,y)),g(b,A_\alpha(x,y))\big\}. \tag{2.14} \]
On the other hand, we can write
\[ f\Big(\frac{a}{A_\alpha(x,y)}\Big)=f\Big(\beta\,\frac{a}{x}+(1-\beta)\,\frac{a}{y}\Big), \tag{2.15} \]
where $\beta=\frac{\alpha x}{A_\alpha(x,y)}$, so that $1-\beta=\frac{(1-\alpha)y}{A_\alpha(x,y)}$. Since $f$ is convex (AH-convexity implies convexity, by the harmonic–arithmetic means inequality), (2.15) implies that
\[ f\Big(\frac{a}{A_\alpha(x,y)}\Big)\le\frac{\alpha x}{A_\alpha(x,y)}f\Big(\frac{a}{x}\Big)+\frac{(1-\alpha)y}{A_\alpha(x,y)}f\Big(\frac{a}{y}\Big). \]
Multiplying both sides by $A_\alpha(x,y)$ we get
\[ g\big(a,A_\alpha(x,y)\big)\le A_\alpha\big(g(a,x),g(a,y)\big). \tag{2.16} \]
Similarly,
\[ g\big(b,A_\alpha(x,y)\big)\le A_\alpha\big(g(b,x),g(b,y)\big). \tag{2.17} \]
Since the harmonic mean is monotone, it follows from (2.16) and (2.17) that
\begin{align*}
g\big(A_\alpha(a,b),A_\alpha(x,y)\big)&\le H_\alpha\big\{g(a,A_\alpha(x,y)),g(b,A_\alpha(x,y))\big\} \quad\text{(by (2.14))}\\
&\le H_\alpha\big\{A_\alpha(g(a,x),g(a,y)),A_\alpha(g(b,x),g(b,y))\big\},
\end{align*}
which is the desired inequality (2.9). With $x=y$ this gives the AH-convexity of $g$ in the first coordinate, and with $a=b$ it implies the convexity of $g$ in the second coordinate. Conversely, if $g$ is AH-convex in the first coordinate, then $f(t)=g(t,1)$ is an AH-convex function, too. This completes the proof of (i).

Next suppose that $f$ is AG-convex. For all $a,b,x\ge 0$ and $\alpha\in[0,1]$ we have
\begin{align*}
g\big(\alpha a+(1-\alpha)b,x\big)&=xf\Big(\frac{\alpha a+(1-\alpha)b}{x}\Big)
\le x f\big(\tfrac{a}{x}\big)^{\alpha}f\big(\tfrac{b}{x}\big)^{1-\alpha}\\
&=\Big[xf\big(\tfrac{a}{x}\big)\Big]^{\alpha}\Big[xf\big(\tfrac{b}{x}\big)\Big]^{1-\alpha}=g(a,x)^{\alpha}g(b,x)^{1-\alpha},
\end{align*}
whence $g$ is AG-convex in its first variable. Hence
\[ g\big(A_\alpha(a,b),A_\alpha(x,y)\big)\le G_\alpha\big\{g(a,A_\alpha(x,y)),g(b,A_\alpha(x,y))\big\}. \tag{2.18} \]
Furthermore, taking into account the arithmetic–geometric means inequality, $f$ is a convex function, so (2.16) and (2.17) hold. Regarding the monotonicity of the geometric mean in both of its variables, we conclude from (2.16) and (2.17) that
\[ G_\alpha\big\{g(a,A_\alpha(x,y)),g(b,A_\alpha(x,y))\big\}\le G_\alpha\big\{A_\alpha(g(a,x),g(a,y)),A_\alpha(g(b,x),g(b,y))\big\}. \tag{2.19} \]
Combining (2.18) and (2.19) we infer (2.10). Putting $x=y$, the AG-convexity of $g$ in the first coordinate follows from (2.10), and with $a=b$, (2.10) gives the convexity of $g$ in the second coordinate. Conversely, if $g$ is AG-convex in its first coordinate, then $f(t)=g(t,1)$ is AG-convex as well.

To prove (iii), let $f$ be a GG-convex function. Then
\[ f\Big(\frac{G_\alpha(a,b)}{G_\alpha(x,y)}\Big)=f\big(a^{\alpha}b^{1-\alpha}x^{-\alpha}y^{\alpha-1}\big)=f\Big(G_\alpha\Big(\frac{a}{x},\frac{b}{y}\Big)\Big)\le G_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big). \]
Hence
\[ g\big(G_\alpha(a,b),G_\alpha(x,y)\big)=G_\alpha(x,y)f\Big(\frac{G_\alpha(a,b)}{G_\alpha(x,y)}\Big)\le G_\alpha(x,y)\,G_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big)=G_\alpha\{g(a,x),g(b,y)\}, \]
as required. This proves (iii).

Next suppose that $f$ is an HH-convex function. We write
\[ f\Big(\frac{H_\alpha(a,b)}{H_\alpha(x,y)}\Big)=f\Bigg(\bigg(\frac{\alpha a^{-1}+(1-\alpha)b^{-1}}{\alpha x^{-1}+(1-\alpha)y^{-1}}\bigg)^{-1}\Bigg)=f\Bigg(\bigg(\frac{\alpha\frac{xy}{a}+(1-\alpha)\frac{xy}{b}}{\alpha y+(1-\alpha)x}\bigg)^{-1}\Bigg)=f\Big(\Big(\beta\frac{x}{a}+(1-\beta)\frac{y}{b}\Big)^{-1}\Big), \]
in which we set $\beta=\frac{\alpha y}{\alpha y+(1-\alpha)x}$. Since $f$ is HH-convex, we obtain
\[ f\Big(\frac{H_\alpha(a,b)}{H_\alpha(x,y)}\Big)=f\Big(H_\beta\Big(\frac{a}{x},\frac{b}{y}\Big)\Big)\le H_\beta\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big), \]
so that
\[ g\big(H_\alpha(a,b),H_\alpha(x,y)\big)=H_\alpha(x,y)f\Big(\frac{H_\alpha(a,b)}{H_\alpha(x,y)}\Big)\le H_\alpha(x,y)\,H_\beta\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big). \tag{2.20} \]
A simple calculation shows that
\[ H_\alpha(x,y)\,H_\beta\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big)=H_\alpha\{g(a,x),g(b,y)\}. \]
Consequently, (2.12) follows from (2.20). Hence $g$ is jointly HH-convex. Conversely, if $g$ is jointly HH-convex, then $f(t)=g(t,1)$ is HH-convex. This proves (iv).
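Parts (iii) and (iv) can be sanity-checked numerically. The sketch below (our illustration) tests the joint GG-convexity (2.11) for the GG-convex core $f=\exp$ and the joint HH-convexity (2.12) for the HH-convex core $f(t)=\sqrt{t}$ on random data.

```python
import math
import random

def perspective(f, x, y):
    return y * f(x / y)

def gmean(x, y, alpha):
    return x ** alpha * y ** (1 - alpha)

def hmean(x, y, alpha):
    return 1 / (alpha / x + (1 - alpha) / y)

f_gg = math.exp    # GG-convex: t -> ln f_gg(e^t) = e^t is convex
f_hh = math.sqrt   # HH-convex: t^r with r = 1/2 in [0, 1]

rng = random.Random(1)
for _ in range(1000):
    a, b, x, y = (rng.uniform(0.1, 5.0) for _ in range(4))
    alpha = rng.random()
    # (2.11): g(G_a(a,b), G_a(x,y)) <= G_a(g(a,x), g(b,y)) for the core f_gg.
    assert perspective(f_gg, gmean(a, b, alpha), gmean(x, y, alpha)) <= \
        gmean(perspective(f_gg, a, x), perspective(f_gg, b, y), alpha) * (1 + 1e-12)
    # (2.12): g(H_a(a,b), H_a(x,y)) <= H_a(g(a,x), g(b,y)) for the core f_hh.
    assert perspective(f_hh, hmean(a, b, alpha), hmean(x, y, alpha)) <= \
        hmean(perspective(f_hh, a, x), perspective(f_hh, b, y), alpha) * (1 + 1e-12)
```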
Let $f$ be a GH-convex function. For all $a,b,x\ge 0$ and $\alpha\in[0,1]$ we have
\[ g\big(G_\alpha(a,b),x\big)=xf\Big(\frac{G_\alpha(a,b)}{x}\Big)=xf\Big(G_\alpha\Big(\frac{a}{x},\frac{b}{x}\Big)\Big)\le x\,H_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{x}\Big)\Big)=H_\alpha\big(g(a,x),g(b,x)\big), \tag{2.21} \]
whence $g$ is a GH-convex function in its first coordinate. Furthermore, we can write
\begin{align*}
g\big(a,G_\alpha(x,y)\big)&=G_\alpha(x,y)f\Big(\frac{a}{G_\alpha(x,y)}\Big)=G_\alpha(x,y)f\Big(G_\alpha\Big(\frac{a}{x},\frac{a}{y}\Big)\Big)\\
&\le G_\alpha(x,y)\,H_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{a}{y}\Big)\Big)\le G_\alpha(x,y)\,G_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{a}{y}\Big)\Big)=G_\alpha\big(g(a,x),g(a,y)\big), \tag{2.22}
\end{align*}
where the last inequality follows from the harmonic–geometric means inequality. This ensures that $g$ is GG-convex in the second coordinate. Furthermore, combining (2.21) and (2.22) and using the monotonicity of the harmonic mean, we reach (2.13). In addition, an argument similar to (2.21) shows that $g$ is jointly GG-convex. Indeed,
\begin{align*}
g\big(G_\alpha(a,b),G_\alpha(x,y)\big)&=G_\alpha(x,y)f\Big(\frac{G_\alpha(a,b)}{G_\alpha(x,y)}\Big)=G_\alpha(x,y)f\Big(G_\alpha\Big(\frac{a}{x},\frac{b}{y}\Big)\Big)\\
&\le G_\alpha(x,y)\,H_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big)\le G_\alpha(x,y)\,G_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big)=G_\alpha\big(g(a,x),g(b,y)\big),
\end{align*}
and so $g$ is jointly GG-convex as we claimed. The converse follows similarly to the previous parts. $\Box$

We give some particular corollaries of Theorem 2.4 for some $f$-divergence functionals. It is easy to see that there are positive real numbers $c_1<c_2$ for which the function $f(t)=t\log t$ is AH-convex on $(c_1,c_2)$. Theorem 2.4 then implies that the Kullback–Leibler distance $KL(\mathbf{p},\mathbf{q})=\sum_{j=1}^{n}p_j\log(p_j/q_j)$ is AH-convex in the first coordinate and convex in the second coordinate. As another example, the function $f(t)=t^{r}$ is AG-convex on $(0,\infty)$ for all $r<0$. By Theorem 2.4, the generated divergence functional $\rho_r(\mathbf{p},\mathbf{q})=\sum_{j=1}^{n}p_j^{r}q_j^{1-r}$ is AG-convex in its first coordinate and convex in its second coordinate. As a further example, the function $f(t)=2(\sqrt{t}-1)^{2}$ is GG-convex. Accordingly, the related divergence functional, which is the Hellinger distance $H(\mathbf{p},\mathbf{q}):=2\sum_{j=1}^{n}(\sqrt{p_j}-\sqrt{q_j})^{2}$, is jointly GG-convex.

Some applications of Theorem 2.4 will be given in the next section. As we saw in Lemma 2.3, the MN-convexity of $f$ produces variants of the Jensen inequality. Here, we study inequality (1.1) in Theorem A when the core function $f$ enjoys MN-convexity.

Theorem 2.5.
Let $\mathbf{a}=(a_1,\ldots,a_n)$ and $\mathbf{b}=(b_1,\ldots,b_n)$ be two $n$-tuples of positive real numbers, write $\bar{a}=\sum_{i=1}^{n}a_i$ and $\bar{b}=\sum_{i=1}^{n}b_i$, and let $f$ be a positive real function on $(0,\infty)$.
(i) If $f$ is an AH-convex function, then
\[ g\big(\bar{a},\bar{b}\big)\le\bar{b}^{\,2}\,I_{1/f}(\mathbf{a},\mathbf{b})^{-1}\le I_f(\mathbf{a},\mathbf{b}). \tag{2.23} \]
(ii) If $f$ is an AG-convex function, then
\[ g\big(\bar{a},\bar{b}\big)\le\bar{b}\exp\Big[\frac{1}{\bar{b}}\,I_{\log f}(\mathbf{a},\mathbf{b})\Big]\le I_f(\mathbf{a},\mathbf{b}). \tag{2.24} \]
(iii) If $f$ is an HA-convex function, then
\[ g\big(\bar{a},\bar{b}\big)=\frac{\bar{b}}{\bar{a}}\,g_\varphi\big(\bar{b},\bar{a}\big)=\frac{\bar{b}}{\bar{a}}\,g_\phi\big(\bar{a},\bar{b}\big)\le\frac{\bar{b}}{\bar{a}}\,I_\varphi(\mathbf{b},\mathbf{a})=\frac{\bar{b}}{\bar{a}}\,I_\phi(\mathbf{a},\mathbf{b}), \tag{2.25} \]
where $\varphi(t)=f(1/t)$ and $\phi(t)=tf(t)$.
(iv) If $f$ is an increasing GA-convex function, then
\[ g\big(\bar{a},\bar{b}\big)\le g_{f\circ\exp}\big(\bar{a},\bar{b}\big)\le I_{f\circ\exp}(\mathbf{a},\mathbf{b}). \tag{2.26} \]

Proof.
Suppose that $\mathbf{a},\mathbf{b}$ are $n$-tuples of positive real numbers. For every $i=1,\ldots,n$ we set $\beta_i=b_i/\sum_{k=1}^{n}b_k$, so that $(\beta_1,\ldots,\beta_n)$ is a probability vector. First assume that $f$ is an AH-convex function, for which we can write
\[ f\bigg(\frac{\sum_{i=1}^{n}a_i}{\sum_{i=1}^{n}b_i}\bigg)=f\bigg(\sum_{i=1}^{n}\frac{a_i}{b_i}\cdot\frac{b_i}{\sum_{k=1}^{n}b_k}\bigg)\le\bigg(\sum_{i=1}^{n}\frac{b_i}{\sum_{k=1}^{n}b_k}\,f\Big(\frac{a_i}{b_i}\Big)^{-1}\bigg)^{-1}, \tag{2.27} \]
where we used (2.3) with the weights $\beta_i$. Multiplying both sides of (2.27) by $\sum_{k=1}^{n}b_k$ we get
\[ \Big(\sum_{k=1}^{n}b_k\Big)f\bigg(\frac{\sum_{i=1}^{n}a_i}{\sum_{i=1}^{n}b_i}\bigg)\le\Big(\sum_{k=1}^{n}b_k\Big)^{2}\bigg(\sum_{i=1}^{n}b_i f\Big(\frac{a_i}{b_i}\Big)^{-1}\bigg)^{-1}=\bar{b}^{\,2}\,I_{1/f}(\mathbf{a},\mathbf{b})^{-1}, \]
which implies the first inequality in (2.23). To get the second inequality we use the convexity of the function $t\mapsto t^{-1}$:
\[ \bar{b}^{\,2}\,I_{1/f}(\mathbf{a},\mathbf{b})^{-1}=\bar{b}\bigg(\sum_{i=1}^{n}\beta_i f\Big(\frac{a_i}{b_i}\Big)^{-1}\bigg)^{-1}\le\bar{b}\sum_{i=1}^{n}\beta_i f\Big(\frac{a_i}{b_i}\Big)=I_f(\mathbf{a},\mathbf{b}). \]
This completes the proof of (i).

Next assume that $f$ is an AG-convex function. Then
\[ f\bigg(\frac{\sum_{i=1}^{n}a_i}{\sum_{i=1}^{n}b_i}\bigg)=f\bigg(\sum_{i=1}^{n}\beta_i\frac{a_i}{b_i}\bigg)\le\prod_{i=1}^{n}f\Big(\frac{a_i}{b_i}\Big)^{\beta_i}, \tag{2.28} \]
in which we use the same convex coefficients $\beta_i$ as in the proof of (i). Moreover,
\[ \prod_{i=1}^{n}f\Big(\frac{a_i}{b_i}\Big)^{\beta_i}=\exp\bigg[\sum_{i=1}^{n}\frac{b_i}{\sum_{k=1}^{n}b_k}\log f\Big(\frac{a_i}{b_i}\Big)\bigg]=\exp\Big[\frac{1}{\bar{b}}\,I_{\log f}(\mathbf{a},\mathbf{b})\Big]. \tag{2.29} \]
The left inequality in (2.24) follows from (2.28) and (2.29). In addition, utilising the arithmetic–geometric means inequality we reach
\[ \prod_{i=1}^{n}f\Big(\frac{a_i}{b_i}\Big)^{\beta_i}\le\sum_{i=1}^{n}\beta_i f\Big(\frac{a_i}{b_i}\Big)=\frac{1}{\bar{b}}\,I_f(\mathbf{a},\mathbf{b}), \tag{2.30} \]
and the right inequality in (2.24) is derived. This concludes (ii).

Now assume that $f$ is an HA-convex function. It is not hard to see [6] that the functions $\varphi(t)=f(1/t)$ and $\phi(t)=tf(t)$ are convex on proper domains, so Theorem A gives $g_\varphi(\bar{b},\bar{a})\le I_\varphi(\mathbf{b},\mathbf{a})$ and $g_\phi(\bar{a},\bar{b})\le I_\phi(\mathbf{a},\mathbf{b})$. We consider the convex coefficients $\alpha_i=a_i/\sum_{k=1}^{n}a_k$ for $i=1,\ldots,n$, in such a way that
\[ f\bigg(\frac{\sum_{i=1}^{n}a_i}{\sum_{i=1}^{n}b_i}\bigg)=f\Bigg(\bigg(\frac{\sum_{i=1}^{n}b_i}{\sum_{i=1}^{n}a_i}\bigg)^{-1}\Bigg)=f\Bigg(\bigg(\sum_{i=1}^{n}\frac{b_i}{a_i}\cdot\frac{a_i}{\sum_{k=1}^{n}a_k}\bigg)^{-1}\Bigg)=f\Bigg(\bigg(\sum_{i=1}^{n}\alpha_i\frac{b_i}{a_i}\bigg)^{-1}\Bigg). \]
By the HA-convexity of $f$, this implies that
\[ f\bigg(\frac{\sum_{i=1}^{n}a_i}{\sum_{i=1}^{n}b_i}\bigg)\le\sum_{i=1}^{n}\alpha_i f\Big(\frac{a_i}{b_i}\Big)=\frac{1}{\sum_{k=1}^{n}a_k}\sum_{i=1}^{n}a_i f\Big(\frac{a_i}{b_i}\Big). \]
Multiplying both sides by $\sum_{k=1}^{n}b_k$ we reach
\[ g\big(\bar{a},\bar{b}\big)\le\frac{\bar{b}}{\bar{a}}\,I_\varphi(\mathbf{b},\mathbf{a}). \]
On the other hand,
\[ g\big(\bar{a},\bar{b}\big)=\bar{b}f\Big(\frac{\bar{a}}{\bar{b}}\Big)=\frac{\bar{b}}{\bar{a}}\,\bar{a}\,\varphi\Big(\frac{\bar{b}}{\bar{a}}\Big)=\frac{\bar{b}}{\bar{a}}\,g_\varphi\big(\bar{b},\bar{a}\big)=\frac{\bar{b}}{\bar{a}}\,g_\phi\big(\bar{a},\bar{b}\big). \]
Furthermore, we compute
\[ I_\varphi(\mathbf{b},\mathbf{a})=\sum_{i=1}^{n}a_i\varphi\Big(\frac{b_i}{a_i}\Big)=\sum_{i=1}^{n}b_i\frac{a_i}{b_i}f\Big(\frac{a_i}{b_i}\Big)=I_\phi(\mathbf{a},\mathbf{b}), \]
so that we arrive at (iii).

For proving (iv), first note that a function $f$ is GA-convex if and only if the function $t\mapsto f(e^{t})$ is convex (on proper domains). So the Csiszár inequality in Theorem A, applied to the convex function $f\circ\exp$, implies the right inequality of (2.26):
\[ g_{f\circ\exp}\big(\bar{a},\bar{b}\big)\le I_{f\circ\exp}(\mathbf{a},\mathbf{b}). \tag{2.31} \]
When $f$ is increasing, we have $f\circ\exp\ge f$ on the positive half-line (since $e^{t}\ge t$). This ensures that the left inequality in (iv) is valid. $\Box$
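The inequality chains in Theorem 2.5 (i) and (ii) can be tested numerically. In the Python sketch below (our illustration), the AH case uses the bound $\bar{b}^{\,2}I_{1/f}(\mathbf{a},\mathbf{b})^{-1}$ as the middle term, where $I_{1/f}$ denotes the divergence generated by $1/f$.

```python
import math
import random

def i_f(f, a, b):
    """I_f(a, b) = sum_i b_i * f(a_i / b_i)."""
    return sum(bi * f(ai / bi) for ai, bi in zip(a, b))

def perspective(f, x, y):
    return y * f(x / y)

f_ah = lambda t: 1 / math.sqrt(t)   # AH-convex: 1/f_ah(t) = sqrt(t) is concave
f_ag = lambda t: math.exp(t * t)    # AG-convex: log f_ag(t) = t^2 is convex

rng = random.Random(2)
for _ in range(200):
    a = [rng.uniform(0.5, 2.0) for _ in range(4)]
    b = [rng.uniform(0.5, 2.0) for _ in range(4)]
    abar, bbar = sum(a), sum(b)
    # (i): g(abar, bbar) <= bbar^2 / I_{1/f}(a, b) <= I_f(a, b).
    mid = bbar ** 2 / i_f(lambda t: 1 / f_ah(t), a, b)
    assert perspective(f_ah, abar, bbar) <= mid * (1 + 1e-12)
    assert mid <= i_f(f_ah, a, b) * (1 + 1e-12)
    # (ii): g(abar, bbar) <= bbar * exp(I_{log f}(a, b) / bbar) <= I_f(a, b).
    mid2 = bbar * math.exp(i_f(lambda t: math.log(f_ag(t)), a, b) / bbar)
    assert perspective(f_ag, abar, bbar) <= mid2 * (1 + 1e-12)
    assert mid2 <= i_f(f_ag, a, b) * (1 + 1e-12)
```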
3. Matrix Jensen Inequality

Let $M_n$ denote the algebra of $n\times n$ complex matrices and let $I$ denote the identity matrix. It is known (see for example [10, Theorem 1.2]) that an extension of the classical Jensen inequality holds as follows:
\[ f(\langle A\eta,\eta\rangle)\le\langle f(A)\eta,\eta\rangle \tag{3.1} \]
for every continuous convex function $f:J\to\mathbb{R}$, every Hermitian matrix $A\in M_n$ with eigenvalues in $J$, and every unit vector $\eta\in\mathbb{C}^{n}$. By $f(A)$ we mean the Hermitian matrix defined via the spectral decomposition of $A$: if $A=\sum_{i=1}^{n}\lambda_i P_i$ is the spectral decomposition of the Hermitian matrix $A\in M_n$, where the $\lambda_i$ are the eigenvalues of $A$ and the $P_i$ are projections with $\sum_{i=1}^{n}P_i=I$, then $f(A)=\sum_{i=1}^{n}f(\lambda_i)P_i$; see [10].

Lemma 2.3 can be applied to derive variants of (3.1) for MN-convex functions. See [1, 12, 15, 16, 18, 21] and the references therein for a collection of such inequalities.

Proposition 3.1.
Let $f$ be a continuous real function.
(i) If $f$ is AH-convex, then
\[ f(\langle A\eta,\eta\rangle)\le\big\langle f(A)^{-1}\eta,\eta\big\rangle^{-1}; \tag{3.2} \]
(ii) If $f$ is AG-convex, then
\[ f(\langle A\eta,\eta\rangle)\le\exp\langle\log f(A)\eta,\eta\rangle; \tag{3.3} \]
(iii) If $f$ is GA-convex, then
\[ f(\exp\langle\log A\,\eta,\eta\rangle)\le\langle f(A)\eta,\eta\rangle; \tag{3.4} \]
(iv) If $f$ is GG-convex, then
\[ f(\exp\langle\log A\,\eta,\eta\rangle)\le\exp\langle\log f(A)\eta,\eta\rangle; \tag{3.5} \]
(v) If $f$ is GH-convex, then
\[ f(\exp\langle\log A\,\eta,\eta\rangle)\le\big\langle f(A)^{-1}\eta,\eta\big\rangle^{-1}; \tag{3.6} \]
(vi) If $f$ is HG-convex, then
\[ f\big(\big\langle A^{-1}\eta,\eta\big\rangle^{-1}\big)\le\exp\langle\log f(A)\eta,\eta\rangle; \tag{3.7} \]
(vii) If $f$ is HH-convex, then
\[ f\big(\big\langle A^{-1}\eta,\eta\big\rangle^{-1}\big)\le\big\langle f(A)^{-1}\eta,\eta\big\rangle^{-1}, \tag{3.8} \]
for every unit vector $\eta\in\mathbb{C}^{n}$ and every Hermitian matrix $A\in M_n$ whose eigenvalues are contained in the domain of $f$.

Proof.
We only note that, utilising the spectral decomposition of $A$, the inner-product terms in every part of the proposition can be described by a scalar mean. For example, if $A=\sum_{i=1}^{n}\lambda_i P_i$ is the spectral decomposition of $A$, then $\sum_{i=1}^{n}\langle P_i\eta,\eta\rangle=1$ and
\[ \exp\langle\log A\,\eta,\eta\rangle=\exp\Big\langle\Big(\sum_{i=1}^{n}\log\lambda_i\,P_i\Big)\eta,\eta\Big\rangle=\exp\Big(\sum_{i=1}^{n}\langle P_i\eta,\eta\rangle\log\lambda_i\Big)=\prod_{i=1}^{n}\lambda_i^{\langle P_i\eta,\eta\rangle}=\mathrm{G}(\alpha;\Lambda), \]
where $\alpha=(\langle P_1\eta,\eta\rangle,\ldots,\langle P_n\eta,\eta\rangle)$ is a weight vector and $\Lambda=(\lambda_1,\ldots,\lambda_n)$. $\Box$

Let $A\in M_n$ and $B\in M_m$ be Hermitian matrices with spectral decompositions $A=\sum_{i=1}^{n}\lambda_i P_i$ and $B=\sum_{j=1}^{m}\mu_j Q_j$. When $f$ is a two-variable real function defined on $J_1\times J_2\subseteq\mathbb{R}^{2}$, we can define a Hermitian matrix $f(A,B)$ by
\[ f(A,B)=\sum_{i=1}^{n}\sum_{j=1}^{m}f(\lambda_i,\mu_j)\,P_i\otimes Q_j, \]
and so $f$ becomes a matrix function of two variables from $M_n\times M_m$ to $M_{nm}$. It has been shown in [17] that if $f$ is a separately convex function on $J_1\times J_2\subseteq\mathbb{R}^{2}$, then
\[ f(\langle A\eta,\eta\rangle,\langle B\zeta,\zeta\rangle)\le\langle f(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle \tag{3.9} \]
for all unit vectors $\eta\in\mathbb{C}^{n}$ and $\zeta\in\mathbb{C}^{m}$ and all Hermitian matrices $A\in M_n$ and $B\in M_m$.

As was shown in Theorem 2.4, the type of convexity of the core function $f$ affects the convexity of the perspective function $g$. In the rest of this section, we establish the matrix Jensen inequality (3.9) for perspective functions in the case where $f$ is an MN-convex function.

Theorem 3.2.
Let $h$ be a real two-variable function on $J_1\times J_2\subseteq\mathbb{R}^{2}$.
(i) If $h$ is separately HH-convex, then
\[ h\big(\langle A^{-1}\eta,\eta\rangle^{-1},\langle B^{-1}\zeta,\zeta\rangle^{-1}\big)\le\big\langle h(A,B)^{-1}\,\eta\otimes\zeta,\eta\otimes\zeta\big\rangle^{-1}; \tag{3.10} \]
(ii) If $h$ is separately GG-convex, then
\[ h\big(\exp\langle\log A\,\eta,\eta\rangle,\exp\langle\log B\,\zeta,\zeta\rangle\big)\le\exp\langle\log h(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle; \tag{3.11} \]
for all unit vectors $\eta\in\mathbb{C}^{n}$ and $\zeta\in\mathbb{C}^{m}$ and all Hermitian matrices $A\in M_n$ and $B\in M_m$.

Proof.
Suppose that $A=\sum_{i=1}^{n}\lambda_i P_i$ and $B=\sum_{j=1}^{m}\mu_j Q_j$ are spectral decompositions of the Hermitian matrices $A$ and $B$. Assume that $\eta\in\mathbb{C}^{n}$ and $\zeta\in\mathbb{C}^{m}$ are unit vectors, so that $\sum_{i=1}^{n}\langle P_i\eta,\eta\rangle=1=\sum_{j=1}^{m}\langle Q_j\zeta,\zeta\rangle$. Then
\[ h\big(\langle A^{-1}\eta,\eta\rangle^{-1},\langle B^{-1}\zeta,\zeta\rangle^{-1}\big)=h\bigg(\Big(\sum_{i=1}^{n}\lambda_i^{-1}\langle P_i\eta,\eta\rangle\Big)^{-1},b\bigg)\le\bigg(\sum_{i=1}^{n}\langle P_i\eta,\eta\rangle\,h(\lambda_i,b)^{-1}\bigg)^{-1}, \tag{3.12} \]
where $b=\langle B^{-1}\zeta,\zeta\rangle^{-1}$ and the inequality follows from the HH-convexity of $h$ in the first variable and (2.8) of Lemma 2.3. Furthermore, for every $i=1,\ldots,n$, the HH-convexity of $h$ in the second variable gives
\[ h(\lambda_i,b)=h\bigg(\lambda_i,\Big(\sum_{j=1}^{m}\mu_j^{-1}\langle Q_j\zeta,\zeta\rangle\Big)^{-1}\bigg)\le\bigg(\sum_{j=1}^{m}h(\lambda_i,\mu_j)^{-1}\langle Q_j\zeta,\zeta\rangle\bigg)^{-1}. \tag{3.13} \]
It follows from (3.12) and (3.13) that
\begin{align*}
h\big(\langle A^{-1}\eta,\eta\rangle^{-1},\langle B^{-1}\zeta,\zeta\rangle^{-1}\big)&\le\bigg(\sum_{i=1}^{n}\sum_{j=1}^{m}\langle P_i\eta,\eta\rangle\langle Q_j\zeta,\zeta\rangle\,h(\lambda_i,\mu_j)^{-1}\bigg)^{-1}\\
&=\bigg(\sum_{i=1}^{n}\sum_{j=1}^{m}h(\lambda_i,\mu_j)^{-1}\big\langle(P_i\otimes Q_j)\,\eta\otimes\zeta,\eta\otimes\zeta\big\rangle\bigg)^{-1}\\
&=\big\langle h(A,B)^{-1}\,\eta\otimes\zeta,\eta\otimes\zeta\big\rangle^{-1},
\end{align*}
and we obtain (3.10).

Next suppose that $h$ is separately GG-convex and that $A$ and $B$ are Hermitian matrices with the same spectral decompositions as in the first part. Utilising Lemma 2.3 we have
\[ h\big(\exp\langle\log A\,\eta,\eta\rangle,\exp\langle\log B\,\zeta,\zeta\rangle\big)=h\bigg(\prod_{i=1}^{n}\lambda_i^{\langle P_i\eta,\eta\rangle},b\bigg)\le\prod_{i=1}^{n}h(\lambda_i,b)^{\langle P_i\eta,\eta\rangle}, \tag{3.14} \]
in which $b=\exp\langle\log B\,\zeta,\zeta\rangle=\prod_{j=1}^{m}\mu_j^{\langle Q_j\zeta,\zeta\rangle}$. Moreover, another use of Lemma 2.3, regarding the GG-convexity of $h$ in the second variable, gives
\[ h(\lambda_i,b)=h\bigg(\lambda_i,\prod_{j=1}^{m}\mu_j^{\langle Q_j\zeta,\zeta\rangle}\bigg)\le\prod_{j=1}^{m}h(\lambda_i,\mu_j)^{\langle Q_j\zeta,\zeta\rangle} \tag{3.15} \]
for every $i=1,\ldots,n$. From (3.14) and (3.15) we obtain
\[ h\big(\exp\langle\log A\,\eta,\eta\rangle,\exp\langle\log B\,\zeta,\zeta\rangle\big)\le\prod_{i=1}^{n}\prod_{j=1}^{m}h(\lambda_i,\mu_j)^{\langle P_i\eta,\eta\rangle\langle Q_j\zeta,\zeta\rangle}=\exp\langle\log h(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle, \]
and we are done. $\Box$
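For diagonal matrices the spectral decompositions are explicit, so inequality (3.11) can be checked directly in a few lines. The following Python sketch (our illustration) does so for the function $h(t,s)=se^{t/s}$, the perspective of $\exp$, which is GG-convex in each variable separately.

```python
import math
import random

# h(t, s) = s * exp(t / s) is the perspective of exp; one can check directly
# that it is GG-convex in each variable separately.
h = lambda t, s: s * math.exp(t / s)

def unit_vector(n, rng):
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

rng = random.Random(3)
for _ in range(200):
    lam = [rng.uniform(0.5, 2.0) for _ in range(3)]  # eigenvalues of A = diag(lam)
    mu = [rng.uniform(0.5, 2.0) for _ in range(2)]   # eigenvalues of B = diag(mu)
    w_eta = [c * c for c in unit_vector(3, rng)]     # weights <P_i eta, eta>
    w_zeta = [c * c for c in unit_vector(2, rng)]    # weights <Q_j zeta, zeta>

    # Left side of (3.11): h at exp<log A eta, eta> and exp<log B zeta, zeta>.
    ga = math.exp(sum(w * math.log(l) for w, l in zip(w_eta, lam)))
    gb = math.exp(sum(w * math.log(m) for w, m in zip(w_zeta, mu)))
    lhs = h(ga, gb)

    # Right side of (3.11): exp<log h(A, B) eta(x)zeta, eta(x)zeta>.
    rhs = math.exp(sum(wi * wj * math.log(h(l, m))
                       for wi, l in zip(w_eta, lam)
                       for wj, m in zip(w_zeta, mu)))
    assert lhs <= rhs * (1 + 1e-12)
```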
Remark 3.3. Let us give some applications of Theorem 2.4 and Theorem 3.2 for perspective functions. We showed in Theorem 2.4 that if $f$ is HH-convex, then the associated perspective function $g$ is HH-convex in both of its variables, and so (3.10) holds by Theorem 3.2. For example, the function $f(t)=t^{r}$ is HH-convex for every $r\in[0,1]$, and so $g(t,s)=sf(t/s)=s^{1-r}t^{r}$ is HH-convex in both of its variables. Note that in this particular example we have $g(A,B)=A^{r}\otimes B^{1-r}$. Now (3.10) implies that
\[ \langle A^{-1}\eta,\eta\rangle^{-r}\langle B^{-1}\zeta,\zeta\rangle^{r-1}\le\big\langle(A^{r}\otimes B^{1-r})\,\eta\otimes\zeta,\eta\otimes\zeta\big\rangle. \]
Note that because the function $g(t,s)$ in this example can be decomposed as $g(t,s)=g_1(t)g_2(s)$, the above inequality follows directly from (3.1).

Remark 3.4. If the type of convexity of a two-variable function $h$ in its first coordinate differs from that in its second coordinate, it is still possible to present a version of Theorem 3.2. For example, assume that the function $h$, defined on $J_1\times J_2\subseteq\mathbb{R}^{2}$, is AH-convex in its first coordinate and convex in the second coordinate. Then
\[ h(\langle A\eta,\eta\rangle,\langle B\zeta,\zeta\rangle)\le\big\langle h(A,\langle B\zeta,\zeta\rangle)^{-1}\eta,\eta\big\rangle^{-1}, \tag{3.16} \]
where we use (3.2) of Proposition 3.1 for the AH-convex function $h_1(t)=h(t,\langle B\zeta,\zeta\rangle)$. Note that $h(A,\langle B\zeta,\zeta\rangle)$ is the Hermitian matrix in $M_n$ defined by
\[ h(A,\langle B\zeta,\zeta\rangle)=\sum_{i=1}^{n}h(\lambda_i,\langle B\zeta,\zeta\rangle)P_i, \]
in which we use the spectral decomposition of $A$ as before. Since $h$ is convex in the second coordinate, for every $i=1,\ldots,n$ we have
\[ h(\lambda_i,\langle B\zeta,\zeta\rangle)\le\sum_{j=1}^{m}h(\lambda_i,\mu_j)\langle Q_j\zeta,\zeta\rangle. \]
Accordingly,
\[ h(A,\langle B\zeta,\zeta\rangle)\le\sum_{i=1}^{n}\sum_{j=1}^{m}h(\lambda_i,\mu_j)\langle Q_j\zeta,\zeta\rangle P_i, \]
whence
\[ \langle h(A,\langle B\zeta,\zeta\rangle)\eta,\eta\rangle\le\sum_{i=1}^{n}\sum_{j=1}^{m}h(\lambda_i,\mu_j)\langle Q_j\zeta,\zeta\rangle\langle P_i\eta,\eta\rangle=\langle h(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle. \tag{3.17} \]
Now we obtain from (3.16) and (3.17) that
\[ h(\langle A\eta,\eta\rangle,\langle B\zeta,\zeta\rangle)\le\big\langle h(A,\langle B\zeta,\zeta\rangle)^{-1}\eta,\eta\big\rangle^{-1}\le\langle h(A,\langle B\zeta,\zeta\rangle)\eta,\eta\rangle\le\langle h(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle. \]
Similarly, it can be shown that
\[ h(\langle A\eta,\eta\rangle,\langle B\zeta,\zeta\rangle)\le\langle h(\langle A\eta,\eta\rangle,B)\zeta,\zeta\rangle\le\langle h(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle. \]
We showed in Theorem 2.4 that if $f$ is AH-convex, then its perspective function $g$ is AH-convex in the first coordinate and convex in the second coordinate. Hence, the last two series of inequalities hold true for the perspective function of every AH-convex function.

References

[1] R.P. Agarwal and S.S. Dragomir, A survey of Jensen type inequalities for functions of self-adjoint operators in Hilbert spaces, Comput. Math. Appl. (2010), 3785–3812.
[2] G.D. Anderson, M.K. Vamanamurthy and M. Vuorinen, Generalized convexity and inequalities, J. Math. Anal. Appl. (2007), 1294–1308.
[3] I. Csiszár, Information Measures: A Critical Survey, Trans. 7th Prague Conf. on Info. Th., Statist. Decis. Funct., Random Processes and 8th European Meeting of Statist., Volume B, Academia Prague (1978), 73–86.
[4] I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Stud. Sci. Math. Hung. (1967), 299–318.
[5] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1981.
[6] S.S. Dragomir, Hermite–Hadamard type inequalities for MN-convex functions, to appear in The Australian Journal of Mathematical Analysis and Applications.
[7] S.S. Dragomir (Ed.), Inequalities for Csiszár f-divergence in information theory, RGMIA Monographs, Victoria University, 2000.
[8] S.S. Dragomir, Inequalities of Jensen type for AH-convex functions, J. Numer. Anal. Approx. Theory (2016), 128–146.
[9] S.S. Dragomir, Inequalities of Hermite–Hadamard type for HH-convex functions, Acta Comm. Univ. Tartuensis Math., Number 2, (2018), 179–190.
[10] T. Furuta, J. Mićić, J. Pečarić and Y. Seo, Mond–Pečarić Method in Operator Inequalities, Element, Zagreb, 2005.
[11] G.L. Gilardoni, On Pinsker's and Vajda's type inequalities for Csiszár's f-divergences, IEEE Trans. Inf. Theory (2010), 5377–5386.
[12] F. Hansen, H. Najafi and M.S. Moslehian, Operator maps of Jensen-type, Positivity (2018), no. 5, 1255–1263.
[13] J.-B. Hiriart-Urruty and J.-E. Martínez-Legaz, Convex solutions of a functional equation arising in information theory, J. Math. Anal. Appl. (2007), 1309–1320.
[14] M. Kian, A characterization of mean values for Csiszár's inequality and applications, Indag. Math. (2014), 505–515.
[15] M. Kian, Operator Jensen inequality for superquadratic functions, Linear Algebra Appl. (2014), 82–87.
[16] M. Kian and S.S. Dragomir, Inequalities involving superquadratic functions and operators, Mediterr. J. Math. (2014), 1205–1214.
[17] J.S. Matharu and J.S. Aujla, Some majorization inequalities for convex functions of several variables, Math. Inequal. Appl. (2011), 947–956.
[18] M.S. Moslehian, A. Dadkhah and K. Yanagi, Noncommutative versions of inequalities in quantum information theory, Anal. Math. Phys. (2019), no. 4, 2151–2169.
[19] C.P. Niculescu, Convexity according to the geometric mean, Math. Inequal. Appl. (2000), 155–167.
[20] M.A. Noor, K.I. Noor and M.U. Awan, Some inequalities for geometrically–arithmetically h-convex functions, Creat. Math. Inform. (2014), No. 1, 91–98.
[21] J. Rooin, S. Habibzadeh and M.S. Moslehian, Jensen inequalities for P-class functions, Period. Math. Hungar. (2018), no. 2, 261–273.
[22] I. Sason, On f-divergences: integral representations, local behavior, and inequalities, Entropy (2018), 383.
[23] I. Vajda, On metric divergences of probability measures, Kybernetika (2009), 885–900.
[24] X.-M. Zhang, Y.-M. Chu and X.-H. Zhang, The Hermite–Hadamard type inequality of GA-convex functions and its application, J. Inequal. Appl., Volume 2010, Article ID 507560, 11 pages.
Mohsen Kian: Department of Mathematics, University of Bojnord, P. O. Box 1339, Bojnord 94531, Iran
Email address: