How the Type of Convexity of the Core Function Affects the Csiszár $f$-Divergence Functional
MOHSEN KIAN
Abstract.
We investigate how the type of convexity of the core function affects the Csiszár $f$-divergence functional. A general treatment of the type of convexity is considered and the associated perspective functions are studied. In particular, it is shown that when the core function is MN-convex, the associated perspective function is jointly MN-convex provided that the two scalar means M and N coincide. In the case where M $\neq$ N, we study the type of convexity of the perspective function. As an application, we prove that the Hellinger distance is jointly GG-convex. As further applications, the matrix Jensen inequality is developed for perspective functions under different kinds of convexity.

1. Introduction

In probability theory, the notion of Csiszár $f$-divergence is well known in relation to measures between probability distributions. Such measures have applications in many directions, such as economics, genetics and signal processing. Csiszár [4, 3] introduced the $f$-divergence functional of a function $f:[0,\infty)\to\mathbb{R}$ by
\[
I_f(\mathbf{p},\mathbf{q}) := \sum_{j=1}^{n} q_j f\Big(\frac{p_j}{q_j}\Big)
\]
for $n$-tuples of positive real numbers $\mathbf{p}=(p_1,\ldots,p_n)$ and $\mathbf{q}=(q_1,\ldots,q_n)$. In the above definition, the undefined expressions are interpreted as
\[
f(0)=\lim_{t\to 0^+}f(t),\qquad 0f\Big(\frac{0}{0}\Big)=0,\qquad 0f\Big(\frac{p}{0}\Big)=\lim_{\epsilon\to 0^+}\epsilon f\Big(\frac{p}{\epsilon}\Big)=p\lim_{t\to\infty}\frac{f(t)}{t}.
\]
A useful fact concerning the $f$-divergence functional was proved by Csiszár and Körner [5]: they showed that the perspective function of a convex function is sub-additive.

Mathematics Subject Classification.
Primary 26B25, 94A17; Secondary 15A45, 26D15.
Key words and phrases.
Csiszár $f$-divergence, convexity and joint convexity, matrix Jensen inequality, spectral decomposition.

Theorem A. If $f:[0,\infty)\to\mathbb{R}$ is convex, then $I_f(\mathbf{p},\mathbf{q})$ is jointly convex in $\mathbf{p}$ and $\mathbf{q}$, and
\[
g\Big(\sum_{j=1}^{n}p_j,\sum_{j=1}^{n}q_j\Big)=\Big(\sum_{j=1}^{n}q_j\Big)f\bigg(\frac{\sum_{j=1}^{n}p_j}{\sum_{j=1}^{n}q_j}\bigg)\le I_f(\mathbf{p},\mathbf{q}) \tag{1.1}
\]
for all positive $n$-tuples $\mathbf{p}=(p_1,\ldots,p_n)$ and $\mathbf{q}=(q_1,\ldots,q_n)$, where the perspective function $g$ associated to $f$ is defined by $g(x,y):=yf(x/y)$.

When $f$ varies through convex functions, the Csiszár $f$-divergence produces different known measures. Among others, we mention the following notable measures:

- The Kullback–Leibler distance is defined by $KL(\mathbf{p},\mathbf{q}):=\sum_{j=1}^{n}p_j\log(p_j/q_j)$, and $KL=I_f$ when $f(t)=t\ln t$ $(t>0)$.
- The total variation distance is defined by $V(\mathbf{p},\mathbf{q}):=\sum_{j=1}^{n}|p_j-q_j|$, and $V=I_f$ when $f(t)=|t-1|$ $(t\ge 0)$.
- The Hellinger distance is defined by $H(\mathbf{p},\mathbf{q}):=2\sum_{j=1}^{n}(\sqrt{p_j}-\sqrt{q_j})^{2}$, and $H=I_f$ when $f(t)=2(\sqrt{t}-1)^{2}$ $(t\ge 0)$.
- The $\chi^{2}$-distance is defined by $D_{\chi^{2}}(\mathbf{p},\mathbf{q}):=\sum_{j=1}^{n}(p_j-q_j)^{2}/q_j$, and $D_{\chi^{2}}=I_f$ when $f(t)=(t-1)^{2}$ $(t\ge 0)$.
- Rényi's divergences are defined by $R_\alpha(\mathbf{p},\mathbf{q}):=\frac{1}{\alpha(\alpha-1)}\ln\rho_\alpha(\mathbf{p},\mathbf{q})$ for every $\alpha\in\mathbb{R}\setminus\{0,1\}$, where $\rho_\alpha(\mathbf{p},\mathbf{q})=\sum_{j=1}^{n}p_j^{\alpha}q_j^{1-\alpha}$ and $\rho_\alpha=I_f$ when $f(t)=t^{\alpha}$ $(t>0)$.

For more information about the $f$-divergence functional and its properties, the reader is referred to [11, 13, 14, 22, 23] and the references therein.

For every two positive real numbers $x,y$ and every $\alpha\in[0,1]$, set
\[
A_\alpha(x,y)=\alpha x+(1-\alpha)y \quad\text{(arithmetic mean)},
\]
\[
G_\alpha(x,y)=x^{\alpha}y^{1-\alpha} \quad\text{(geometric mean)},
\]
\[
H_\alpha(x,y)=\big(\alpha x^{-1}+(1-\alpha)y^{-1}\big)^{-1} \quad\text{(harmonic mean)}.
\]
The arithmetic–geometric–harmonic means inequality is well known:
\[
H_\alpha(x,y)\le G_\alpha(x,y)\le A_\alpha(x,y)\qquad (x,y\ge 0,\ \alpha\in[0,1]). \tag{1.2}
\]
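Although the development here is purely analytic, the definitions above are easy to check numerically. The following Python sketch (our illustration; the function names are ours, not the paper's) evaluates $I_f$ for some of the cores listed above and verifies the perspective inequality (1.1) of Theorem A on random positive tuples.

```python
import math
import random

def i_f(f, p, q):
    """Csiszar f-divergence I_f(p, q) = sum_j q_j * f(p_j / q_j)."""
    return sum(qj * f(pj / qj) for pj, qj in zip(p, q))

def perspective(f, x, y):
    """Perspective function g(x, y) = y * f(x / y) of the core f."""
    return y * f(x / y)

# Core functions generating the classical measures listed above.
kl_core = lambda t: t * math.log(t)                     # Kullback-Leibler
tv_core = lambda t: abs(t - 1)                          # total variation
hellinger_core = lambda t: 2 * (math.sqrt(t) - 1) ** 2  # Hellinger
chi2_core = lambda t: (t - 1) ** 2                      # chi-square

rng = random.Random(0)
p = [rng.uniform(0.5, 2.0) for _ in range(5)]
q = [rng.uniform(0.5, 2.0) for _ in range(5)]

# Theorem A: g(sum p_j, sum q_j) <= I_f(p, q) for every convex core f.
for core in (kl_core, tv_core, hellinger_core, chi2_core):
    assert perspective(core, sum(p), sum(q)) <= i_f(core, p, q) + 1e-12
```

Each core above is convex, so the assertion holds for all positive tuples, not just the sampled ones.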
2. The effect of the type of convexity of the core function on the $f$-divergence functional

Convex functions are defined via the arithmetic mean: a real function $f$ is convex when $f(A_\alpha(x,y))\le A_\alpha(f(x),f(y))$ for all $x,y$ in the domain of $f$ and every $\alpha\in[0,1]$. More generally, if $M_\alpha$ and $N_\alpha$ are two $\alpha$-weighted scalar means, a positive real function $f$ on $(0,\infty)$ is said to be MN-convex when
\[
f(M_\alpha(x,y))\le N_\alpha(f(x),f(y)) \tag{2.1}
\]
holds for all $x,y\ge 0$ and every $\alpha\in[0,1]$.

Lemma 2.1.
Let $f$ be a positive real function on $(0,\infty)$.
(i) $f$ is AG-convex if and only if $\log f$ is convex;
(ii) $f$ is AH-convex if and only if $1/f$ is concave;
(iii) $f$ is GA-convex (concave) if and only if $f\circ\exp$ is convex (concave);
(iv) if $h$ is convex (concave), then $f(t)=h(\ln t)$ is GA-convex (concave);
(v) $f$ is GG-convex if and only if the function $h=\ln\circ f\circ\exp$ is convex;
(vi) $f$ is GG-convex if and only if $h=\ln\circ f$ is GA-convex;
(vii) $f$ is GH-convex (concave) if and only if $f\circ\exp$ is AH-convex (concave);
(viii) $f$ is HG-convex if and only if $h(t)=t\ln f(t)$ is convex;
(ix) $f$ is HG-convex if and only if $\ln f$ is HA-convex;
(x) $f$ is HH-convex (concave) if and only if $h(t)=t/f(t)$ is concave (convex).

We remark that each class of MN-convex functions mentioned in Lemma 2.1 contains many examples, so we give several before we continue.

Example 2.2.
The functions $t\mapsto 1/\sqrt{t}$ and $t\mapsto -t^{-1}$ are AH-convex on $(0,\infty)$. The functions $t\mapsto\exp t$ and $t\mapsto t^{r}$ $(r<0)$ are AG-convex on $\mathbb{R}$ and $(0,\infty)$, respectively. The function $t\mapsto\log(1+t)$ is GA-convex on $(0,\infty)$. Moreover, recall that the well-known digamma function is defined on $(0,\infty)$ by $\psi(t)=\frac{d}{dt}\log\Gamma(t)=\Gamma'(t)/\Gamma(t)$, where $\Gamma$ denotes the gamma function, i.e., $\Gamma(t)=\int_0^\infty x^{t-1}e^{-x}\,dx$. It is known [24] that the functions $t\mapsto\psi(t)+1/t$ and $t\mapsto\psi(t)+1/t+1/t^{2}$ are GA-concave and GA-convex, respectively, on $(0,\infty)$.

It has been shown in [19] that if $f(t)=\sum_{n=0}^{\infty}c_n t^{n}$ is a real analytic function whose radius of convergence is $r>0$ and whose coefficients $c_n$ are non-negative, then $f$ is a GG-convex function on $(0,r)$. This implies that the functions $\exp$, $\sinh$ and $\cosh$ are GG-convex on $(0,\infty)$, and the functions $\sec$, $\csc$ and $\tan$ are GG-convex on $(0,\pi/2)$. The functions $t\mapsto(1-t)^{-1}$ and $t\mapsto t(1-t)^{-1}$ are GG-convex on $(0,1)$. The functions $t\mapsto 1/\sqrt{\ln t}$ and $t\mapsto-(\ln t)^{-1}$ are GH-convex on $(1,\infty)$. For all $r\ge 1$ or $r\le-1$, the functions $t\mapsto\exp(t^{r})$ are HG-convex on $(0,\infty)$. The functions $t\mapsto t\ln t$ and $t\mapsto t^{r}$ $(0\le r\le 1)$ are HH-convex on $(0,\infty)$.

Regarding the Jensen inequality, Lemma 2.1 can be used to demonstrate variants of the Jensen inequality for every MN-convex function. The proof of the next lemma easily follows from Lemma 2.1 and the classical Jensen inequality for convex functions, so we omit the details.

Lemma 2.3.
Let $f$ be a non-negative real function on $(0,\infty)$ and, for $i=1,\ldots,n$, let $x_i\ge 0$ and $\alpha_i\in[0,1]$ with $\sum_{i=1}^{n}\alpha_i=1$.
(i) If $f$ is AG-convex, then
\[ f\Big(\sum_{i=1}^{n}\alpha_i x_i\Big)\le\prod_{i=1}^{n}f(x_i)^{\alpha_i}. \tag{2.2} \]
(ii) If $f$ is AH-convex, then
\[ f\Big(\sum_{i=1}^{n}\alpha_i x_i\Big)\le\Big(\sum_{i=1}^{n}\alpha_i f(x_i)^{-1}\Big)^{-1}. \tag{2.3} \]
(iii) If $f$ is GA-convex, then
\[ f\Big(\prod_{i=1}^{n}x_i^{\alpha_i}\Big)\le\sum_{i=1}^{n}\alpha_i f(x_i). \tag{2.4} \]
(iv) If $f$ is GG-convex, then
\[ f\Big(\prod_{i=1}^{n}x_i^{\alpha_i}\Big)\le\prod_{i=1}^{n}f(x_i)^{\alpha_i}. \tag{2.5} \]
(v) If $f$ is GH-convex, then
\[ f\Big(\prod_{i=1}^{n}x_i^{\alpha_i}\Big)\le\Big(\sum_{i=1}^{n}\alpha_i f(x_i)^{-1}\Big)^{-1}. \tag{2.6} \]
(vi) If $f$ is HG-convex, then
\[ f\Big(\Big(\sum_{i=1}^{n}\alpha_i x_i^{-1}\Big)^{-1}\Big)\le\prod_{i=1}^{n}f(x_i)^{\alpha_i}. \tag{2.7} \]
(vii) If $f$ is HH-convex, then
\[ f\Big(\Big(\sum_{i=1}^{n}\alpha_i x_i^{-1}\Big)^{-1}\Big)\le\Big(\sum_{i=1}^{n}\alpha_i f(x_i)^{-1}\Big)^{-1}. \tag{2.8} \]

We begin with modifications of the celebrated result of Csiszár, Theorem A. A consequence of Theorem A is that if $f$ is convex, then the associated perspective function $g_f$ is jointly convex. In the next theorem, we investigate the effect of the type of convexity of the generating function $f$ on the convexity of the associated perspective function $g_f$. When there is no fear of ambiguity, we briefly write $g$ for the perspective function associated with $f$. Once more, we note that although we restrict the domain and the range of our functions to the positive half-line, depending on the situation it is possible to consider more general sets.

Theorem 2.4.
Let $f:(0,\infty)\to(0,\infty)$ be a real function.
(i) $f$ is AH-convex if and only if $g$ is AH-convex in the first coordinate and convex in the second coordinate. In particular, the inequality
\[ g\big(A_\alpha(a,b),A_\alpha(x,y)\big)\le H_\alpha\big\{A_\alpha(g(a,x),g(a,y)),A_\alpha(g(b,x),g(b,y))\big\} \tag{2.9} \]
holds for all $a,b,x,y\ge 0$ and $\alpha\in[0,1]$.
(ii) $f$ is AG-convex if and only if $g$ is AG-convex in the first coordinate and convex in the second coordinate. In particular, the inequality
\[ g\big(A_\alpha(a,b),A_\alpha(x,y)\big)\le G_\alpha\big\{A_\alpha(g(a,x),g(a,y)),A_\alpha(g(b,x),g(b,y))\big\} \tag{2.10} \]
holds for all $a,b,x,y\ge 0$ and $\alpha\in[0,1]$.
(iii) $f$ is GG-convex if and only if $g$ is jointly GG-convex. In particular,
\[ g\big(G_\alpha(a,b),G_\alpha(x,y)\big)\le G_\alpha\{g(a,x),g(b,y)\} \tag{2.11} \]
for all $a,b,x,y\ge 0$ and $\alpha\in[0,1]$.
(iv) $f$ is HH-convex if and only if $g$ is jointly HH-convex. In particular,
\[ g\big(H_\alpha(a,b),H_\alpha(x,y)\big)\le H_\alpha\{g(a,x),g(b,y)\} \tag{2.12} \]
for all $a,b,x,y\ge 0$ and $\alpha\in[0,1]$.
(v) $f$ is GH-convex if and only if $g$ is GH-convex in its first variable and GG-convex in its second variable. In particular, the inequality
\[ g\big(G_\alpha(a,b),G_\alpha(x,y)\big)\le H_\alpha\big\{G_\alpha(g(a,x),g(a,y)),G_\alpha(g(b,x),g(b,y))\big\} \tag{2.13} \]
holds for all $a,b,x,y\ge 0$ and $\alpha\in[0,1]$. Moreover, $g$ is jointly GG-convex, i.e., (2.11) holds.

Before proving Theorem 2.4, we would like to note that if $f$ is MN-convex, then $g$ is not necessarily jointly MN-convex, unless M = N. For example, if $f$ is AH-convex, then part (i) of Theorem 2.4 shows that $g$ is AH-convex in the first coordinate and convex in the second coordinate. However, $g$ need not be AH-convex in both variables jointly. To see this, consider the AH-convex function $f(t)=1/\sqrt{t}$ and put $\alpha=1/2$, $a=b=1$, $x=2$ and $y=4$. Then
\[ 3\sqrt{3}=g\big(a,A_{1/2}(x,y)\big)\nleq H_{1/2}\big(g(a,x),g(a,y)\big)=\frac{16\sqrt{2}}{4+\sqrt{2}}. \]
Note in addition that when $f$ is AH-convex, $g$ is AA-convex in both variables. However, the reverse direction does not hold: if $g$ is AA-convex in both variables, then $f$ is not necessarily AH-convex.
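The failure of joint AH-convexity above can be confirmed numerically; this Python sketch (our illustration) evaluates both sides for $f(t)=1/\sqrt{t}$ at $a=b=1$, $x=2$, $y=4$ and $\alpha=1/2$.

```python
import math

def perspective(f, x, y):
    # g(x, y) = y * f(x / y)
    return y * f(x / y)

f = lambda t: 1 / math.sqrt(t)   # AH-convex, since 1/f(t) = sqrt(t) is concave

a = b = 1.0
x, y = 2.0, 4.0

# Left side: g at the arithmetic means A_{1/2}(a, b) = 1 and A_{1/2}(x, y) = 3.
lhs = perspective(f, (a + b) / 2, (x + y) / 2)     # equals 3*sqrt(3)

# Right side: the harmonic mean H_{1/2}(g(a, x), g(b, y)) that joint
# AH-convexity of g would require as an upper bound.
u, v = perspective(f, a, x), perspective(f, b, y)  # 2*sqrt(2) and 8
rhs = 2 / (1 / u + 1 / v)                          # equals 16*sqrt(2)/(4+sqrt(2))

assert lhs > rhs   # joint AH-convexity fails at this point
```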
Proof of Theorem 2.4. First assume that $f$ is AH-convex. For all $a,b,x\ge 0$ and $\alpha\in[0,1]$ we have
\begin{align*}
g\big(\alpha a+(1-\alpha)b,x\big)&=xf\Big(\frac{\alpha a+(1-\alpha)b}{x}\Big)
\le x\Big[\alpha f\big(\tfrac{a}{x}\big)^{-1}+(1-\alpha)f\big(\tfrac{b}{x}\big)^{-1}\Big]^{-1}\\
&=\Big[\alpha x^{-1}f\big(\tfrac{a}{x}\big)^{-1}+(1-\alpha)x^{-1}f\big(\tfrac{b}{x}\big)^{-1}\Big]^{-1}
=\Big[\alpha g(a,x)^{-1}+(1-\alpha)g(b,x)^{-1}\Big]^{-1}.
\end{align*}
This ensures that $g$ is AH-convex in the first coordinate. Therefore
\[ g\big(\alpha(a,x)+(1-\alpha)(b,y)\big)\le\big(\alpha g(a,z)^{-1}+(1-\alpha)g(b,z)^{-1}\big)^{-1}, \]
where $z=\alpha x+(1-\alpha)y$. This means that
\[ g\big(A_\alpha(a,b),A_\alpha(x,y)\big)\le H_\alpha\big\{g(a,A_\alpha(x,y)),g(b,A_\alpha(x,y))\big\}. \tag{2.14} \]
On the other hand, we can write
\[ f\Big(\frac{a}{A_\alpha(x,y)}\Big)=f\Big(\beta\,\frac{a}{x}+(1-\beta)\,\frac{a}{y}\Big), \tag{2.15} \]
where $\beta=\frac{\alpha x}{A_\alpha(x,y)}$, so that $1-\beta=\frac{(1-\alpha)y}{A_\alpha(x,y)}$. Since $f$ is convex (AH-convexity implies convexity, by the harmonic–arithmetic means inequality), (2.15) implies that
\[ f\Big(\frac{a}{A_\alpha(x,y)}\Big)\le\frac{\alpha x}{A_\alpha(x,y)}f\Big(\frac{a}{x}\Big)+\frac{(1-\alpha)y}{A_\alpha(x,y)}f\Big(\frac{a}{y}\Big). \]
Multiplying both sides by $A_\alpha(x,y)$ we get
\[ g\big(a,A_\alpha(x,y)\big)\le A_\alpha\big(g(a,x),g(a,y)\big). \tag{2.16} \]
Similarly,
\[ g\big(b,A_\alpha(x,y)\big)\le A_\alpha\big(g(b,x),g(b,y)\big). \tag{2.17} \]
Since the harmonic mean is monotone, it follows from (2.16) and (2.17) that
\begin{align*}
g\big(A_\alpha(a,b),A_\alpha(x,y)\big)&\le H_\alpha\big\{g(a,A_\alpha(x,y)),g(b,A_\alpha(x,y))\big\} \quad\text{(by (2.14))}\\
&\le H_\alpha\big\{A_\alpha(g(a,x),g(a,y)),A_\alpha(g(b,x),g(b,y))\big\},
\end{align*}
which is the desired inequality (2.9). With $x=y$ this gives the AH-convexity of $g$ in the first coordinate, and with $a=b$ it implies the convexity of $g$ in the second coordinate. Conversely, if $g$ is AH-convex in the first coordinate, then $f(t)=g(t,1)$ is an AH-convex function, too. This completes the proof of (i).

Next suppose that $f$ is AG-convex. For all $a,b,x\ge 0$ and $\alpha\in[0,1]$ we have
\begin{align*}
g\big(\alpha a+(1-\alpha)b,x\big)&=xf\Big(\frac{\alpha a+(1-\alpha)b}{x}\Big)
\le x f\big(\tfrac{a}{x}\big)^{\alpha}f\big(\tfrac{b}{x}\big)^{1-\alpha}\\
&=\Big[xf\big(\tfrac{a}{x}\big)\Big]^{\alpha}\Big[xf\big(\tfrac{b}{x}\big)\Big]^{1-\alpha}=g(a,x)^{\alpha}g(b,x)^{1-\alpha},
\end{align*}
whence $g$ is AG-convex in its first variable. Hence
\[ g\big(A_\alpha(a,b),A_\alpha(x,y)\big)\le G_\alpha\big\{g(a,A_\alpha(x,y)),g(b,A_\alpha(x,y))\big\}. \tag{2.18} \]
Furthermore, taking into account the arithmetic–geometric means inequality, $f$ is a convex function, so (2.16) and (2.17) hold. Regarding the monotonicity of the geometric mean in both of its variables, we conclude from (2.16) and (2.17) that
\[ G_\alpha\big\{g(a,A_\alpha(x,y)),g(b,A_\alpha(x,y))\big\}\le G_\alpha\big\{A_\alpha(g(a,x),g(a,y)),A_\alpha(g(b,x),g(b,y))\big\}. \tag{2.19} \]
Combining (2.18) and (2.19) we infer (2.10). Putting $x=y$, the AG-convexity of $g$ in the first coordinate follows from (2.10), and with $a=b$, (2.10) gives the convexity of $g$ in the second coordinate. Conversely, if $g$ is AG-convex in its first coordinate, then $f(t)=g(t,1)$ is AG-convex as well.

To prove (iii), let $f$ be a GG-convex function. Then
\[ f\Big(\frac{G_\alpha(a,b)}{G_\alpha(x,y)}\Big)=f\big(a^{\alpha}b^{1-\alpha}x^{-\alpha}y^{\alpha-1}\big)=f\Big(G_\alpha\Big(\frac{a}{x},\frac{b}{y}\Big)\Big)\le G_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big). \]
Hence
\[ g\big(G_\alpha(a,b),G_\alpha(x,y)\big)=G_\alpha(x,y)f\Big(\frac{G_\alpha(a,b)}{G_\alpha(x,y)}\Big)\le G_\alpha(x,y)\,G_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big)=G_\alpha\{g(a,x),g(b,y)\}, \]
as required. This proves (iii).

Next suppose that $f$ is an HH-convex function. We write
\[ f\Big(\frac{H_\alpha(a,b)}{H_\alpha(x,y)}\Big)=f\Bigg(\bigg(\frac{\alpha a^{-1}+(1-\alpha)b^{-1}}{\alpha x^{-1}+(1-\alpha)y^{-1}}\bigg)^{-1}\Bigg)=f\Bigg(\bigg(\frac{\alpha\frac{xy}{a}+(1-\alpha)\frac{xy}{b}}{\alpha y+(1-\alpha)x}\bigg)^{-1}\Bigg)=f\Big(\Big(\beta\frac{x}{a}+(1-\beta)\frac{y}{b}\Big)^{-1}\Big), \]
in which we set $\beta=\frac{\alpha y}{\alpha y+(1-\alpha)x}$. Since $f$ is HH-convex, we obtain
\[ f\Big(\frac{H_\alpha(a,b)}{H_\alpha(x,y)}\Big)=f\Big(H_\beta\Big(\frac{a}{x},\frac{b}{y}\Big)\Big)\le H_\beta\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big), \]
so that
\[ g\big(H_\alpha(a,b),H_\alpha(x,y)\big)=H_\alpha(x,y)f\Big(\frac{H_\alpha(a,b)}{H_\alpha(x,y)}\Big)\le H_\alpha(x,y)\,H_\beta\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big). \tag{2.20} \]
A simple calculation shows that
\[ H_\alpha(x,y)\,H_\beta\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big)=H_\alpha\{g(a,x),g(b,y)\}. \]
Consequently, (2.12) follows from (2.20). Hence $g$ is jointly HH-convex. Conversely, if $g$ is jointly HH-convex, then $f(t)=g(t,1)$ is HH-convex. This proves (iv).
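Parts (iii) and (iv) can be sanity-checked numerically. The sketch below (our illustration) tests the joint GG-convexity (2.11) for the GG-convex core $f=\exp$ and the joint HH-convexity (2.12) for the HH-convex core $f(t)=\sqrt{t}$ on random data.

```python
import math
import random

def perspective(f, x, y):
    return y * f(x / y)

def gmean(x, y, alpha):
    return x ** alpha * y ** (1 - alpha)

def hmean(x, y, alpha):
    return 1 / (alpha / x + (1 - alpha) / y)

f_gg = math.exp    # GG-convex: t -> ln f_gg(e^t) = e^t is convex
f_hh = math.sqrt   # HH-convex: t^r with r = 1/2 in [0, 1]

rng = random.Random(1)
for _ in range(1000):
    a, b, x, y = (rng.uniform(0.1, 5.0) for _ in range(4))
    alpha = rng.random()
    # (2.11): g(G_a(a,b), G_a(x,y)) <= G_a(g(a,x), g(b,y)) for the core f_gg.
    assert perspective(f_gg, gmean(a, b, alpha), gmean(x, y, alpha)) <= \
        gmean(perspective(f_gg, a, x), perspective(f_gg, b, y), alpha) * (1 + 1e-12)
    # (2.12): g(H_a(a,b), H_a(x,y)) <= H_a(g(a,x), g(b,y)) for the core f_hh.
    assert perspective(f_hh, hmean(a, b, alpha), hmean(x, y, alpha)) <= \
        hmean(perspective(f_hh, a, x), perspective(f_hh, b, y), alpha) * (1 + 1e-12)
```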
Let $f$ be a GH-convex function. For all $a,b,x\ge 0$ and $\alpha\in[0,1]$ we have
\[ g\big(G_\alpha(a,b),x\big)=xf\Big(\frac{G_\alpha(a,b)}{x}\Big)=xf\Big(G_\alpha\Big(\frac{a}{x},\frac{b}{x}\Big)\Big)\le x\,H_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{x}\Big)\Big)=H_\alpha\big(g(a,x),g(b,x)\big), \tag{2.21} \]
whence $g$ is a GH-convex function in its first coordinate. Furthermore, we can write
\begin{align*}
g\big(a,G_\alpha(x,y)\big)&=G_\alpha(x,y)f\Big(\frac{a}{G_\alpha(x,y)}\Big)=G_\alpha(x,y)f\Big(G_\alpha\Big(\frac{a}{x},\frac{a}{y}\Big)\Big)\\
&\le G_\alpha(x,y)\,H_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{a}{y}\Big)\Big)\le G_\alpha(x,y)\,G_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{a}{y}\Big)\Big)=G_\alpha\big(g(a,x),g(a,y)\big), \tag{2.22}
\end{align*}
where the last inequality follows from the harmonic–geometric means inequality. This ensures that $g$ is GG-convex in the second coordinate. Furthermore, combining (2.21) and (2.22) and using the monotonicity of the harmonic mean, we reach (2.13). In addition, an argument similar to (2.21) shows that $g$ is jointly GG-convex. Indeed,
\begin{align*}
g\big(G_\alpha(a,b),G_\alpha(x,y)\big)&=G_\alpha(x,y)f\Big(\frac{G_\alpha(a,b)}{G_\alpha(x,y)}\Big)=G_\alpha(x,y)f\Big(G_\alpha\Big(\frac{a}{x},\frac{b}{y}\Big)\Big)\\
&\le G_\alpha(x,y)\,H_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big)\le G_\alpha(x,y)\,G_\alpha\Big(f\Big(\frac{a}{x}\Big),f\Big(\frac{b}{y}\Big)\Big)=G_\alpha\big(g(a,x),g(b,y)\big),
\end{align*}
and so $g$ is jointly GG-convex as we claimed. The converse follows similarly to the previous parts. $\Box$

We give some particular corollaries of Theorem 2.4 for some $f$-divergence functionals. It is easy to see that there are positive real numbers $c_1<c_2$ for which the function $f(t)=t\log t$ is AH-convex on $(c_1,c_2)$. Theorem 2.4 then implies that the Kullback–Leibler distance $KL(\mathbf{p},\mathbf{q})=\sum_{j=1}^{n}p_j\log(p_j/q_j)$ is AH-convex in the first coordinate and convex in the second coordinate. As another example, the function $f(t)=t^{r}$ is AG-convex on $(0,\infty)$ for all $r<0$. By Theorem 2.4, the generated divergence functional $\rho_r(\mathbf{p},\mathbf{q})=\sum_{j=1}^{n}p_j^{r}q_j^{1-r}$ is AG-convex in its first coordinate and convex in its second coordinate. As a further example, the function $f(t)=2(\sqrt{t}-1)^{2}$ is GG-convex. Accordingly, the related divergence functional, which is the Hellinger distance $H(\mathbf{p},\mathbf{q}):=2\sum_{j=1}^{n}(\sqrt{p_j}-\sqrt{q_j})^{2}$, is jointly GG-convex.

Some applications of Theorem 2.4 will be given in the next section. As we saw in Lemma 2.3, the MN-convexity of $f$ produces variants of the Jensen inequality. Here, we study inequality (1.1) in Theorem A when the core function $f$ enjoys MN-convexity.

Theorem 2.5.
Let $\mathbf{a}=(a_1,\ldots,a_n)$ and $\mathbf{b}=(b_1,\ldots,b_n)$ be two $n$-tuples of positive real numbers, write $\bar{a}=\sum_{i=1}^{n}a_i$ and $\bar{b}=\sum_{i=1}^{n}b_i$, and let $f$ be a positive real function on $(0,\infty)$.
(i) If $f$ is an AH-convex function, then
\[ g\big(\bar{a},\bar{b}\big)\le\bar{b}^{\,2}\,I_{1/f}(\mathbf{a},\mathbf{b})^{-1}\le I_f(\mathbf{a},\mathbf{b}). \tag{2.23} \]
(ii) If $f$ is an AG-convex function, then
\[ g\big(\bar{a},\bar{b}\big)\le\bar{b}\exp\Big[\frac{1}{\bar{b}}\,I_{\log f}(\mathbf{a},\mathbf{b})\Big]\le I_f(\mathbf{a},\mathbf{b}). \tag{2.24} \]
(iii) If $f$ is an HA-convex function, then
\[ g\big(\bar{a},\bar{b}\big)=\frac{\bar{b}}{\bar{a}}\,g_\varphi\big(\bar{b},\bar{a}\big)=\frac{\bar{b}}{\bar{a}}\,g_\phi\big(\bar{a},\bar{b}\big)\le\frac{\bar{b}}{\bar{a}}\,I_\varphi(\mathbf{b},\mathbf{a})=\frac{\bar{b}}{\bar{a}}\,I_\phi(\mathbf{a},\mathbf{b}), \tag{2.25} \]
where $\varphi(t)=f(1/t)$ and $\phi(t)=tf(t)$.
(iv) If $f$ is an increasing GA-convex function, then
\[ g\big(\bar{a},\bar{b}\big)\le g_{f\circ\exp}\big(\bar{a},\bar{b}\big)\le I_{f\circ\exp}(\mathbf{a},\mathbf{b}). \tag{2.26} \]

Proof.
Suppose that $\mathbf{a},\mathbf{b}$ are $n$-tuples of positive real numbers. For every $i=1,\ldots,n$ we set $\beta_i=b_i/\sum_{k=1}^{n}b_k$, so that $(\beta_1,\ldots,\beta_n)$ is a probability vector. First assume that $f$ is an AH-convex function, for which we can write
\[ f\bigg(\frac{\sum_{i=1}^{n}a_i}{\sum_{i=1}^{n}b_i}\bigg)=f\bigg(\sum_{i=1}^{n}\frac{a_i}{b_i}\cdot\frac{b_i}{\sum_{k=1}^{n}b_k}\bigg)\le\bigg(\sum_{i=1}^{n}\frac{b_i}{\sum_{k=1}^{n}b_k}\,f\Big(\frac{a_i}{b_i}\Big)^{-1}\bigg)^{-1}, \tag{2.27} \]
where we used (2.3) with the weights $\beta_i$. Multiplying both sides of (2.27) by $\sum_{k=1}^{n}b_k$ we get
\[ \Big(\sum_{k=1}^{n}b_k\Big)f\bigg(\frac{\sum_{i=1}^{n}a_i}{\sum_{i=1}^{n}b_i}\bigg)\le\Big(\sum_{k=1}^{n}b_k\Big)^{2}\bigg(\sum_{i=1}^{n}b_i f\Big(\frac{a_i}{b_i}\Big)^{-1}\bigg)^{-1}=\bar{b}^{\,2}\,I_{1/f}(\mathbf{a},\mathbf{b})^{-1}, \]
which implies the first inequality in (2.23). To get the second inequality we use the convexity of the function $t\mapsto t^{-1}$:
\[ \bar{b}^{\,2}\,I_{1/f}(\mathbf{a},\mathbf{b})^{-1}=\bar{b}\bigg(\sum_{i=1}^{n}\beta_i f\Big(\frac{a_i}{b_i}\Big)^{-1}\bigg)^{-1}\le\bar{b}\sum_{i=1}^{n}\beta_i f\Big(\frac{a_i}{b_i}\Big)=I_f(\mathbf{a},\mathbf{b}). \]
This completes the proof of (i).

Next assume that $f$ is an AG-convex function. Then
\[ f\bigg(\frac{\sum_{i=1}^{n}a_i}{\sum_{i=1}^{n}b_i}\bigg)=f\bigg(\sum_{i=1}^{n}\beta_i\frac{a_i}{b_i}\bigg)\le\prod_{i=1}^{n}f\Big(\frac{a_i}{b_i}\Big)^{\beta_i}, \tag{2.28} \]
in which we use the same convex coefficients $\beta_i$ as in the proof of (i). Moreover,
\[ \prod_{i=1}^{n}f\Big(\frac{a_i}{b_i}\Big)^{\beta_i}=\exp\bigg[\sum_{i=1}^{n}\frac{b_i}{\sum_{k=1}^{n}b_k}\log f\Big(\frac{a_i}{b_i}\Big)\bigg]=\exp\Big[\frac{1}{\bar{b}}\,I_{\log f}(\mathbf{a},\mathbf{b})\Big]. \tag{2.29} \]
The left inequality in (2.24) follows from (2.28) and (2.29). In addition, utilising the arithmetic–geometric means inequality we reach
\[ \prod_{i=1}^{n}f\Big(\frac{a_i}{b_i}\Big)^{\beta_i}\le\sum_{i=1}^{n}\beta_i f\Big(\frac{a_i}{b_i}\Big)=\frac{1}{\bar{b}}\,I_f(\mathbf{a},\mathbf{b}), \tag{2.30} \]
and the right inequality in (2.24) is derived. This concludes (ii).

Now assume that $f$ is an HA-convex function. It is not hard to see [6] that the functions $\varphi(t)=f(1/t)$ and $\phi(t)=tf(t)$ are convex on proper domains, so Theorem A gives $g_\varphi(\bar{b},\bar{a})\le I_\varphi(\mathbf{b},\mathbf{a})$ and $g_\phi(\bar{a},\bar{b})\le I_\phi(\mathbf{a},\mathbf{b})$. We consider the convex coefficients $\alpha_i=a_i/\sum_{k=1}^{n}a_k$ for $i=1,\ldots,n$, in such a way that
\[ f\bigg(\frac{\sum_{i=1}^{n}a_i}{\sum_{i=1}^{n}b_i}\bigg)=f\Bigg(\bigg(\frac{\sum_{i=1}^{n}b_i}{\sum_{i=1}^{n}a_i}\bigg)^{-1}\Bigg)=f\Bigg(\bigg(\sum_{i=1}^{n}\frac{b_i}{a_i}\cdot\frac{a_i}{\sum_{k=1}^{n}a_k}\bigg)^{-1}\Bigg)=f\Bigg(\bigg(\sum_{i=1}^{n}\alpha_i\frac{b_i}{a_i}\bigg)^{-1}\Bigg). \]
By the HA-convexity of $f$, this implies that
\[ f\bigg(\frac{\sum_{i=1}^{n}a_i}{\sum_{i=1}^{n}b_i}\bigg)\le\sum_{i=1}^{n}\alpha_i f\Big(\frac{a_i}{b_i}\Big)=\frac{1}{\sum_{k=1}^{n}a_k}\sum_{i=1}^{n}a_i f\Big(\frac{a_i}{b_i}\Big). \]
Multiplying both sides by $\sum_{k=1}^{n}b_k$ we reach
\[ g\big(\bar{a},\bar{b}\big)\le\frac{\bar{b}}{\bar{a}}\,I_\varphi(\mathbf{b},\mathbf{a}). \]
On the other hand,
\[ g\big(\bar{a},\bar{b}\big)=\bar{b}f\Big(\frac{\bar{a}}{\bar{b}}\Big)=\frac{\bar{b}}{\bar{a}}\,\bar{a}\,\varphi\Big(\frac{\bar{b}}{\bar{a}}\Big)=\frac{\bar{b}}{\bar{a}}\,g_\varphi\big(\bar{b},\bar{a}\big)=\frac{\bar{b}}{\bar{a}}\,g_\phi\big(\bar{a},\bar{b}\big). \]
Furthermore, we compute
\[ I_\varphi(\mathbf{b},\mathbf{a})=\sum_{i=1}^{n}a_i\varphi\Big(\frac{b_i}{a_i}\Big)=\sum_{i=1}^{n}b_i\frac{a_i}{b_i}f\Big(\frac{a_i}{b_i}\Big)=I_\phi(\mathbf{a},\mathbf{b}), \]
so that we arrive at (iii).

For proving (iv), first note that a function $f$ is GA-convex if and only if the function $t\mapsto f(e^{t})$ is convex (on proper domains). So the Csiszár inequality in Theorem A, applied to the convex function $f\circ\exp$, implies the right inequality of (2.26):
\[ g_{f\circ\exp}\big(\bar{a},\bar{b}\big)\le I_{f\circ\exp}(\mathbf{a},\mathbf{b}). \tag{2.31} \]
When $f$ is increasing, we have $f\circ\exp\ge f$ on the positive half-line (since $e^{t}\ge t$). This ensures that the left inequality in (iv) is valid. $\Box$
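The inequality chains in Theorem 2.5 (i) and (ii) can be tested numerically. In the Python sketch below (our illustration), the AH case uses the bound $\bar{b}^{\,2}I_{1/f}(\mathbf{a},\mathbf{b})^{-1}$ as the middle term, where $I_{1/f}$ denotes the divergence generated by $1/f$.

```python
import math
import random

def i_f(f, a, b):
    """I_f(a, b) = sum_i b_i * f(a_i / b_i)."""
    return sum(bi * f(ai / bi) for ai, bi in zip(a, b))

def perspective(f, x, y):
    return y * f(x / y)

f_ah = lambda t: 1 / math.sqrt(t)   # AH-convex: 1/f_ah(t) = sqrt(t) is concave
f_ag = lambda t: math.exp(t * t)    # AG-convex: log f_ag(t) = t^2 is convex

rng = random.Random(2)
for _ in range(200):
    a = [rng.uniform(0.5, 2.0) for _ in range(4)]
    b = [rng.uniform(0.5, 2.0) for _ in range(4)]
    abar, bbar = sum(a), sum(b)
    # (i): g(abar, bbar) <= bbar^2 / I_{1/f}(a, b) <= I_f(a, b).
    mid = bbar ** 2 / i_f(lambda t: 1 / f_ah(t), a, b)
    assert perspective(f_ah, abar, bbar) <= mid * (1 + 1e-12)
    assert mid <= i_f(f_ah, a, b) * (1 + 1e-12)
    # (ii): g(abar, bbar) <= bbar * exp(I_{log f}(a, b) / bbar) <= I_f(a, b).
    mid2 = bbar * math.exp(i_f(lambda t: math.log(f_ag(t)), a, b) / bbar)
    assert perspective(f_ag, abar, bbar) <= mid2 * (1 + 1e-12)
    assert mid2 <= i_f(f_ag, a, b) * (1 + 1e-12)
```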
3. Matrix Jensen Inequality

Let $M_n$ denote the algebra of $n\times n$ complex matrices and let $I$ denote the identity matrix. It is known (see for example [10, Theorem 1.2]) that an extension of the classical Jensen inequality holds as follows:
\[ f(\langle A\eta,\eta\rangle)\le\langle f(A)\eta,\eta\rangle \tag{3.1} \]
for every continuous convex function $f:J\to\mathbb{R}$, every Hermitian matrix $A\in M_n$ with eigenvalues in $J$, and every unit vector $\eta\in\mathbb{C}^{n}$. By $f(A)$ we mean the Hermitian matrix defined via the spectral decomposition of $A$: if $A=\sum_{i=1}^{n}\lambda_i P_i$ is the spectral decomposition of the Hermitian matrix $A\in M_n$, where the $\lambda_i$ are the eigenvalues of $A$ and the $P_i$ are projections with $\sum_{i=1}^{n}P_i=I$, then $f(A)=\sum_{i=1}^{n}f(\lambda_i)P_i$; see [10].

Lemma 2.3 can be applied to derive variants of (3.1) for MN-convex functions. See [1, 12, 15, 16, 18, 21] and the references therein for a collection of such inequalities.

Proposition 3.1.
Let $f$ be a continuous real function.
(i) If $f$ is AH-convex, then
\[ f(\langle A\eta,\eta\rangle)\le\big\langle f(A)^{-1}\eta,\eta\big\rangle^{-1}; \tag{3.2} \]
(ii) If $f$ is AG-convex, then
\[ f(\langle A\eta,\eta\rangle)\le\exp\langle\log f(A)\eta,\eta\rangle; \tag{3.3} \]
(iii) If $f$ is GA-convex, then
\[ f(\exp\langle\log A\,\eta,\eta\rangle)\le\langle f(A)\eta,\eta\rangle; \tag{3.4} \]
(iv) If $f$ is GG-convex, then
\[ f(\exp\langle\log A\,\eta,\eta\rangle)\le\exp\langle\log f(A)\eta,\eta\rangle; \tag{3.5} \]
(v) If $f$ is GH-convex, then
\[ f(\exp\langle\log A\,\eta,\eta\rangle)\le\big\langle f(A)^{-1}\eta,\eta\big\rangle^{-1}; \tag{3.6} \]
(vi) If $f$ is HG-convex, then
\[ f\big(\big\langle A^{-1}\eta,\eta\big\rangle^{-1}\big)\le\exp\langle\log f(A)\eta,\eta\rangle; \tag{3.7} \]
(vii) If $f$ is HH-convex, then
\[ f\big(\big\langle A^{-1}\eta,\eta\big\rangle^{-1}\big)\le\big\langle f(A)^{-1}\eta,\eta\big\rangle^{-1}, \tag{3.8} \]
for every unit vector $\eta\in\mathbb{C}^{n}$ and every Hermitian matrix $A\in M_n$ whose eigenvalues are contained in the domain of $f$.

Proof.
We only note that, utilising the spectral decomposition of $A$, the inner-product terms in every part of the proposition can be described by a scalar mean. For example, if $A=\sum_{i=1}^{n}\lambda_i P_i$ is the spectral decomposition of $A$, then $\sum_{i=1}^{n}\langle P_i\eta,\eta\rangle=1$ and
\[ \exp\langle\log A\,\eta,\eta\rangle=\exp\Big\langle\Big(\sum_{i=1}^{n}\log\lambda_i\,P_i\Big)\eta,\eta\Big\rangle=\exp\Big(\sum_{i=1}^{n}\langle P_i\eta,\eta\rangle\log\lambda_i\Big)=\prod_{i=1}^{n}\lambda_i^{\langle P_i\eta,\eta\rangle}=\mathrm{G}(\alpha;\Lambda), \]
where $\alpha=(\langle P_1\eta,\eta\rangle,\ldots,\langle P_n\eta,\eta\rangle)$ is a weight vector and $\Lambda=(\lambda_1,\ldots,\lambda_n)$. $\Box$

Let $A\in M_n$ and $B\in M_m$ be Hermitian matrices with spectral decompositions $A=\sum_{i=1}^{n}\lambda_i P_i$ and $B=\sum_{j=1}^{m}\mu_j Q_j$. When $f$ is a two-variable real function defined on $J_1\times J_2\subseteq\mathbb{R}^{2}$, we can define a Hermitian matrix $f(A,B)$ by
\[ f(A,B)=\sum_{i=1}^{n}\sum_{j=1}^{m}f(\lambda_i,\mu_j)\,P_i\otimes Q_j, \]
and so $f$ becomes a matrix function of two variables from $M_n\times M_m$ to $M_{nm}$. It has been shown in [17] that if $f$ is a separately convex function on $J_1\times J_2\subseteq\mathbb{R}^{2}$, then
\[ f(\langle A\eta,\eta\rangle,\langle B\zeta,\zeta\rangle)\le\langle f(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle \tag{3.9} \]
for all unit vectors $\eta\in\mathbb{C}^{n}$ and $\zeta\in\mathbb{C}^{m}$ and all Hermitian matrices $A\in M_n$ and $B\in M_m$.

As was shown in Theorem 2.4, the type of convexity of the core function $f$ affects the convexity of the perspective function $g$. In the rest of this section, we establish the matrix Jensen inequality (3.9) for perspective functions in the case where $f$ is an MN-convex function.

Theorem 3.2.
Let $h$ be a real two-variable function on $J_1\times J_2\subseteq\mathbb{R}^{2}$.
(i) If $h$ is separately HH-convex, then
\[ h\big(\langle A^{-1}\eta,\eta\rangle^{-1},\langle B^{-1}\zeta,\zeta\rangle^{-1}\big)\le\big\langle h(A,B)^{-1}\,\eta\otimes\zeta,\eta\otimes\zeta\big\rangle^{-1}; \tag{3.10} \]
(ii) If $h$ is separately GG-convex, then
\[ h\big(\exp\langle\log A\,\eta,\eta\rangle,\exp\langle\log B\,\zeta,\zeta\rangle\big)\le\exp\langle\log h(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle; \tag{3.11} \]
for all unit vectors $\eta\in\mathbb{C}^{n}$ and $\zeta\in\mathbb{C}^{m}$ and all Hermitian matrices $A\in M_n$ and $B\in M_m$.

Proof.
Suppose that $A=\sum_{i=1}^{n}\lambda_i P_i$ and $B=\sum_{j=1}^{m}\mu_j Q_j$ are spectral decompositions of the Hermitian matrices $A$ and $B$. Assume that $\eta\in\mathbb{C}^{n}$ and $\zeta\in\mathbb{C}^{m}$ are unit vectors, so that $\sum_{i=1}^{n}\langle P_i\eta,\eta\rangle=1=\sum_{j=1}^{m}\langle Q_j\zeta,\zeta\rangle$. Then
\[ h\big(\langle A^{-1}\eta,\eta\rangle^{-1},\langle B^{-1}\zeta,\zeta\rangle^{-1}\big)=h\bigg(\Big(\sum_{i=1}^{n}\lambda_i^{-1}\langle P_i\eta,\eta\rangle\Big)^{-1},b\bigg)\le\bigg(\sum_{i=1}^{n}\langle P_i\eta,\eta\rangle\,h(\lambda_i,b)^{-1}\bigg)^{-1}, \tag{3.12} \]
where $b=\langle B^{-1}\zeta,\zeta\rangle^{-1}$ and the inequality follows from the HH-convexity of $h$ in the first variable and (2.8) of Lemma 2.3. Furthermore, for every $i=1,\ldots,n$, the HH-convexity of $h$ in the second variable gives
\[ h(\lambda_i,b)=h\bigg(\lambda_i,\Big(\sum_{j=1}^{m}\mu_j^{-1}\langle Q_j\zeta,\zeta\rangle\Big)^{-1}\bigg)\le\bigg(\sum_{j=1}^{m}h(\lambda_i,\mu_j)^{-1}\langle Q_j\zeta,\zeta\rangle\bigg)^{-1}. \tag{3.13} \]
It follows from (3.12) and (3.13) that
\begin{align*}
h\big(\langle A^{-1}\eta,\eta\rangle^{-1},\langle B^{-1}\zeta,\zeta\rangle^{-1}\big)&\le\bigg(\sum_{i=1}^{n}\sum_{j=1}^{m}\langle P_i\eta,\eta\rangle\langle Q_j\zeta,\zeta\rangle\,h(\lambda_i,\mu_j)^{-1}\bigg)^{-1}\\
&=\bigg(\sum_{i=1}^{n}\sum_{j=1}^{m}h(\lambda_i,\mu_j)^{-1}\big\langle(P_i\otimes Q_j)\,\eta\otimes\zeta,\eta\otimes\zeta\big\rangle\bigg)^{-1}\\
&=\big\langle h(A,B)^{-1}\,\eta\otimes\zeta,\eta\otimes\zeta\big\rangle^{-1},
\end{align*}
and we obtain (3.10).

Next suppose that $h$ is separately GG-convex and that $A$ and $B$ are Hermitian matrices with the same spectral decompositions as in the first part. Utilising Lemma 2.3 we have
\[ h\big(\exp\langle\log A\,\eta,\eta\rangle,\exp\langle\log B\,\zeta,\zeta\rangle\big)=h\bigg(\prod_{i=1}^{n}\lambda_i^{\langle P_i\eta,\eta\rangle},b\bigg)\le\prod_{i=1}^{n}h(\lambda_i,b)^{\langle P_i\eta,\eta\rangle}, \tag{3.14} \]
in which $b=\exp\langle\log B\,\zeta,\zeta\rangle=\prod_{j=1}^{m}\mu_j^{\langle Q_j\zeta,\zeta\rangle}$. Moreover, another use of Lemma 2.3, regarding the GG-convexity of $h$ in the second variable, gives
\[ h(\lambda_i,b)=h\bigg(\lambda_i,\prod_{j=1}^{m}\mu_j^{\langle Q_j\zeta,\zeta\rangle}\bigg)\le\prod_{j=1}^{m}h(\lambda_i,\mu_j)^{\langle Q_j\zeta,\zeta\rangle} \tag{3.15} \]
for every $i=1,\ldots,n$. From (3.14) and (3.15) we obtain
\[ h\big(\exp\langle\log A\,\eta,\eta\rangle,\exp\langle\log B\,\zeta,\zeta\rangle\big)\le\prod_{i=1}^{n}\prod_{j=1}^{m}h(\lambda_i,\mu_j)^{\langle P_i\eta,\eta\rangle\langle Q_j\zeta,\zeta\rangle}=\exp\langle\log h(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle, \]
and we are done. $\Box$
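For diagonal matrices the spectral decompositions are explicit, so inequality (3.11) can be checked directly in a few lines. The following Python sketch (our illustration) does so for the function $h(t,s)=se^{t/s}$, the perspective of $\exp$, which is GG-convex in each variable separately.

```python
import math
import random

# h(t, s) = s * exp(t / s) is the perspective of exp; one can check directly
# that it is GG-convex in each variable separately.
h = lambda t, s: s * math.exp(t / s)

def unit_vector(n, rng):
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

rng = random.Random(3)
for _ in range(200):
    lam = [rng.uniform(0.5, 2.0) for _ in range(3)]  # eigenvalues of A = diag(lam)
    mu = [rng.uniform(0.5, 2.0) for _ in range(2)]   # eigenvalues of B = diag(mu)
    w_eta = [c * c for c in unit_vector(3, rng)]     # weights <P_i eta, eta>
    w_zeta = [c * c for c in unit_vector(2, rng)]    # weights <Q_j zeta, zeta>

    # Left side of (3.11): h at exp<log A eta, eta> and exp<log B zeta, zeta>.
    ga = math.exp(sum(w * math.log(l) for w, l in zip(w_eta, lam)))
    gb = math.exp(sum(w * math.log(m) for w, m in zip(w_zeta, mu)))
    lhs = h(ga, gb)

    # Right side of (3.11): exp<log h(A, B) eta(x)zeta, eta(x)zeta>.
    rhs = math.exp(sum(wi * wj * math.log(h(l, m))
                       for wi, l in zip(w_eta, lam)
                       for wj, m in zip(w_zeta, mu)))
    assert lhs <= rhs * (1 + 1e-12)
```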
Remark 3.3. Let us give some applications of Theorem 2.4 and Theorem 3.2 for perspective functions. We showed in Theorem 2.4 that if $f$ is HH-convex, then the associated perspective function $g$ is HH-convex in both of its variables, and so (3.10) holds by Theorem 3.2. For example, the function $f(t)=t^{r}$ is HH-convex for every $r\in[0,1]$, and so $g(t,s)=sf(t/s)=s^{1-r}t^{r}$ is HH-convex in both of its variables. Note that in this particular example we have $g(A,B)=A^{r}\otimes B^{1-r}$. Now (3.10) implies that
\[ \langle A^{-1}\eta,\eta\rangle^{-r}\langle B^{-1}\zeta,\zeta\rangle^{r-1}\le\big\langle(A^{r}\otimes B^{1-r})\,\eta\otimes\zeta,\eta\otimes\zeta\big\rangle. \]
Note that because the function $g(t,s)$ in this example can be decomposed as $g(t,s)=g_1(t)g_2(s)$, the above inequality follows directly from (3.1).

Remark 3.4. If the type of convexity of a two-variable function $h$ in its first coordinate differs from that in its second coordinate, it is still possible to present a version of Theorem 3.2. For example, assume that the function $h$, defined on $J_1\times J_2\subseteq\mathbb{R}^{2}$, is AH-convex in its first coordinate and convex in the second coordinate. Then
\[ h(\langle A\eta,\eta\rangle,\langle B\zeta,\zeta\rangle)\le\big\langle h(A,\langle B\zeta,\zeta\rangle)^{-1}\eta,\eta\big\rangle^{-1}, \tag{3.16} \]
where we use (3.2) of Proposition 3.1 for the AH-convex function $h_1(t)=h(t,\langle B\zeta,\zeta\rangle)$. Note that $h(A,\langle B\zeta,\zeta\rangle)$ is the Hermitian matrix in $M_n$ defined by
\[ h(A,\langle B\zeta,\zeta\rangle)=\sum_{i=1}^{n}h(\lambda_i,\langle B\zeta,\zeta\rangle)P_i, \]
in which we use the spectral decomposition of $A$ as before. Since $h$ is convex in the second coordinate, for every $i=1,\ldots,n$ we have
\[ h(\lambda_i,\langle B\zeta,\zeta\rangle)\le\sum_{j=1}^{m}h(\lambda_i,\mu_j)\langle Q_j\zeta,\zeta\rangle. \]
Accordingly,
\[ h(A,\langle B\zeta,\zeta\rangle)\le\sum_{i=1}^{n}\sum_{j=1}^{m}h(\lambda_i,\mu_j)\langle Q_j\zeta,\zeta\rangle P_i, \]
whence
\[ \langle h(A,\langle B\zeta,\zeta\rangle)\eta,\eta\rangle\le\sum_{i=1}^{n}\sum_{j=1}^{m}h(\lambda_i,\mu_j)\langle Q_j\zeta,\zeta\rangle\langle P_i\eta,\eta\rangle=\langle h(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle. \tag{3.17} \]
Now we obtain from (3.16) and (3.17) that
\[ h(\langle A\eta,\eta\rangle,\langle B\zeta,\zeta\rangle)\le\big\langle h(A,\langle B\zeta,\zeta\rangle)^{-1}\eta,\eta\big\rangle^{-1}\le\langle h(A,\langle B\zeta,\zeta\rangle)\eta,\eta\rangle\le\langle h(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle. \]
Similarly, it can be shown that
\[ h(\langle A\eta,\eta\rangle,\langle B\zeta,\zeta\rangle)\le\langle h(\langle A\eta,\eta\rangle,B)\zeta,\zeta\rangle\le\langle h(A,B)\,\eta\otimes\zeta,\eta\otimes\zeta\rangle. \]
We showed in Theorem 2.4 that if $f$ is AH-convex, then its perspective function $g$ is AH-convex in the first coordinate and convex in the second coordinate. Hence, the last two series of inequalities hold true for the perspective function of every AH-convex function.

References

[1] R.P. Agarwal and S.S. Dragomir, A survey of Jensen type inequalities for functions of self-adjoint operators in Hilbert spaces, Comput. Math. Appl. (2010), 3785–3812.
[2] G.D. Anderson, M.K. Vamanamurthy and M. Vuorinen, Generalized convexity and inequalities, J. Math. Anal. Appl. (2007), 1294–1308.
[3] I. Csiszár, Information Measures: A Critical Survey, Trans. 7th Prague Conf. on Info. Th., Statist. Decis. Funct., Random Processes and 8th European Meeting of Statist., Volume B, Academia Prague (1978), 73–86.
[4] I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Stud. Sci. Math. Hung. (1967), 299–318.
[5] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1981.
[6] S.S. Dragomir, Hermite–Hadamard type inequalities for MN-convex functions, to appear in The Australian Journal of Mathematical Analysis and Applications.
[7] S.S. Dragomir (Ed.), Inequalities for Csiszár f-divergence in information theory, RGMIA Monographs, Victoria University, 2000.
[8] S.S. Dragomir, Inequalities of Jensen type for AH-convex functions, J. Numer. Anal. Approx. Theory (2016), 128–146.
[9] S.S. Dragomir, Inequalities of Hermite–Hadamard type for HH-convex functions, Acta Comm. Univ. Tartuensis Math., Number 2, (2018), 179–190.
[10] T. Furuta, J. Mićić, J. Pečarić and Y. Seo, Mond–Pečarić Method in Operator Inequalities, Element, Zagreb, 2005.
[11] G.L. Gilardoni, On Pinsker's and Vajda's type inequalities for Csiszár's f-divergences, IEEE Trans. Inf. Theory (2010), 5377–5386.
[12] F. Hansen, H. Najafi and M.S. Moslehian, Operator maps of Jensen-type, Positivity (2018), no. 5, 1255–1263.
[13] J.-B. Hiriart-Urruty and J.-E. Martínez-Legaz, Convex solutions of a functional equation arising in information theory, J. Math. Anal. Appl. (2007), 1309–1320.
[14] M. Kian, A characterization of mean values for Csiszár's inequality and applications, Indag. Math. (2014), 505–515.
[15] M. Kian, Operator Jensen inequality for superquadratic functions, Linear Algebra Appl. (2014), 82–87.
[16] M. Kian and S.S. Dragomir, Inequalities involving superquadratic functions and operators, Mediterr. J. Math. (2014), 1205–1214.
[17] J.S. Matharu and J.S. Aujla, Some majorization inequalities for convex functions of several variables, Math. Inequal. Appl. (2011), 947–956.
[18] M.S. Moslehian, A. Dadkhah and K. Yanagi, Noncommutative versions of inequalities in quantum information theory, Anal. Math. Phys. (2019), no. 4, 2151–2169.
[19] C.P. Niculescu, Convexity according to the geometric mean, Math. Inequal. Appl. (2000), 155–167.
[20] M.A. Noor, K.I. Noor and M.U. Awan, Some inequalities for geometrically–arithmetically h-convex functions, Creat. Math. Inform. (2014), No. 1, 91–98.
[21] J. Rooin, S. Habibzadeh and M.S. Moslehian, Jensen inequalities for P-class functions, Period. Math. Hungar. (2018), no. 2, 261–273.
[22] I. Sason, On f-divergences: integral representations, local behavior, and inequalities, Entropy (2018), 383.
[23] I. Vajda, On metric divergences of probability measures, Kybernetika (2009), 885–900.
[24] X.-M. Zhang, Y.-M. Chu and X.-H. Zhang, The Hermite–Hadamard type inequality of GA-convex functions and its application, J. Inequal. Appl., Volume 2010, Article ID 507560, 11 pages.
Mohsen Kian: Department of Mathematics, University of Bojnord, P. O. Box 1339, Bojnord 94531, Iran
Email address: