arXiv [math.PR], May

A complement to the Chebyshev integral inequality
Adam Jakubowski
Nicolaus Copernicus University, Toruń, Poland
E-mail: [email protected]
Abstract
We give necessary and sufficient conditions for the Chebyshev inequality to be an equality.
Keywords:
Chebyshev inequality; association; variance reduction.
MSC Classification 2010:
The simplest form of the Chebyshev integral (or algebraic) inequality holds for arithmetic means: if $x_1 \le x_2 \le \ldots \le x_n$ and $y_1 \le y_2 \le \ldots \le y_n$ are real numbers, then
$$\frac{1}{n}\bigl(x_1 y_1 + x_2 y_2 + \ldots + x_n y_n\bigr) \ge \frac{1}{n}\bigl(x_1 + x_2 + \ldots + x_n\bigr) \cdot \frac{1}{n}\bigl(y_1 + y_2 + \ldots + y_n\bigr). \qquad (1)$$
The continuous counterpart reads as follows: if $f, g \colon [a,b] \to \mathbb{R}$ are non-decreasing, then
$$\frac{1}{b-a}\int_a^b f(x) g(x)\,dx \ge \frac{1}{b-a}\int_a^b f(x)\,dx \cdot \frac{1}{b-a}\int_a^b g(x)\,dx. \qquad (2)$$
Clearly, both (1) and (2) are particular cases of the following probabilistic statement.

Theorem 1.1. If $X$ is a random variable on $(\Omega, \mathcal{F}, P)$ and $f, g \colon \mathbb{R} \to \mathbb{R}$ are non-decreasing and such that $\mathrm{Cov}\bigl(f(X), g(X)\bigr)$ exists, then $f(X)$ and $g(X)$ are positively correlated, i.e.
$$\mathrm{Cov}\bigl(f(X), g(X)\bigr) = E f(X) g(X) - E f(X)\, E g(X) \ge 0. \qquad (3)$$

Notice that (3) being valid for all non-decreasing $f$ and $g$ means that a single random variable is associated, which in turn is a cornerstone in the proof of association of independent random variables (see [1], also [3]).

Somewhat surprisingly, a relatively recent work [10] states that the Chebyshev inequality is equivalent to the classic Jensen inequality. In fact, both inequalities are "dual" in a specific sense (see [9, Section 1.8]). It is also remarkable that relation (3) admits direct consequences in some economic considerations; see [11], [14].

We refer to [8, Chapter IX] for a detailed report on developments related to the inequality that is nowadays called Chebyshev's.

The purpose of this note is to prove the following complement to Theorem 1.1.

Theorem 1.2.
Under the assumptions of Theorem 1.1, $\mathrm{Cov}\bigl(f(X), g(X)\bigr) = 0$ if, and only if, either $f(X)$ or $g(X)$ is a.s. constant.

Taking into account the long history of the Chebyshev inequality, the lack of an explanation of when the inequality fails to be strict seems unlikely. Yet we were not able to find any published reference proving Theorem 1.2 in full generality.

Of course, the trivial case (1) was already clear in the early eighteen-eighties, due to an observation by A. Korkine (see [8, p. 242], also [5, pp. 43–44]). [7, p. 77] states that equality holds in (2) only if $f$ or $g$ is constant almost everywhere, but the proof is missing. Notice that (2) corresponds to $X$ uniformly distributed on $[a, b]$. In a slightly more general case, when the law of $X$ is given by a strictly positive density on $[a, b]$, [6, p. 40, Theorem 10] refers to the original proof of Chebyshev [4, pp. 128–131, 157–169]. But Chebyshev worked under stronger assumptions (differentiability of $f$ and $g$). Likewise, [8, Chapter IX] provides several results (by Winckler, Pickard, ...) in which the strict inequality occurs in (2), but all of them rely on stronger assumptions imposed on the functions $f$ and $g$. The reader may verify that in [8, Chapter IX] no attention is paid to results similar to our Theorem 1.2.

It is therefore surprising to find in [14] a statement (Theorem 1) referring to [8, p. 248] and asserting that, under Steffensen's assumptions on $f$ and $g$, equality holds in (3) if, and only if, $f$ or $g$ "are constant almost everywhere" (i.e. with respect to the Lebesgue measure). As simple examples (and our Theorem 1.2) show, this is incorrect. Notice that [14] also refers to the original paper [13], written in Danish; a slightly extended English version is [12]. (Following [8, p. 287], Wagener gives a wrong publication year for this article, 1920; the correct year is 1925.)

As an application of Theorem 1.2, let $F$ be a non-degenerate distribution function with finite variance, and let $F^{\leftarrow}(u) = \inf\{s \,;\, F(s) \ge u\}$ be its left-continuous inverse.
Then both $F^{\leftarrow}(u)$ and $-F^{\leftarrow}(1-u)$ are non-decreasing in $u$ and therefore, if $U$ is uniformly distributed on $[0, 1]$,
$$\mathrm{Cov}\bigl(F^{\leftarrow}(U), -F^{\leftarrow}(1-U)\bigr) > 0,$$
or, equivalently,
$$\mathrm{Cov}\bigl(F^{\leftarrow}(U), F^{\leftarrow}(1-U)\bigr) < 0.$$
We have justified a method of variance reduction, as described in [2, Section 2.1], without invoking the results of [15].
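Both Theorem 1.1 and the variance-reduction consequence above can be illustrated numerically. The following sketch is ours, not part of the paper: the choice of $F = \mathrm{Exp}(1)$ (whose left-continuous inverse is $F^{\leftarrow}(u) = -\log(1-u)$), of the monotone functions, and of the grid size are purely illustrative, with expectations approximated by averages over an equally spaced grid of $u$-values in $(0,1)$.

```python
import math

# Illustrative numerical check (a sketch; the choice of F = Exp(1) and of
# the monotone functions below is ours, not the paper's).  Expectations
# are approximated by averages over an equally spaced grid of u-values,
# i.e. U is "uniform on (0, 1)" up to discretization.

def quantile(u):
    # Left-continuous inverse F^{-} of the Exp(1) distribution function.
    return -math.log(1.0 - u)

n = 100_000
us = [(k + 0.5) / n for k in range(n)]

def cov(a, b):
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

# Theorem 1.1: non-decreasing transforms of a single variable are
# positively correlated.  Here X = F^{-}(U), f(x) = x**2 on [0, inf),
# g(x) = min(x, 1); both are non-decreasing on the support of X.
xs = [quantile(u) for u in us]
print(cov([x * x for x in xs], [min(x, 1.0) for x in xs]) > 0)

# Variance reduction: F^{-}(U) and F^{-}(1 - U) are negatively
# correlated (strictly, by Theorem 1.2, since Exp(1) is non-degenerate).
ys = [quantile(1.0 - u) for u in us]
c = cov(xs, ys)
print(c < 0)

# Hence the antithetic pair average (X + X') / 2 has variance
# (Var + Cov) / 2, strictly below Var / 2 for independent pairs.
v = cov(xs, xs)
print(0.5 * (v + c) < 0.5 * v)
```

This is exactly the antithetic-variates scheme of [2, Section 2.1]: each pair $(F^{\leftarrow}(U), F^{\leftarrow}(1-U))$ has the correct marginal distribution, while the strictly negative covariance guaranteed by Theorem 1.2 makes the paired average less variable than two independent draws.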
The proof of (3) is immediate, if we observe that
$$\mathrm{Cov}\bigl(f(X), g(X)\bigr) = \frac{1}{2}\, E \bigl(f(X) - f(Y)\bigr)\bigl(g(X) - g(Y)\bigr),$$
where $Y$ is independent of $X$ and $Y \sim X$, and that for each $\omega$
$$\bigl(f(X(\omega)) - f(Y(\omega))\bigr)\bigl(g(X(\omega)) - g(Y(\omega))\bigr) \ge 0.$$

To prove Theorem 1.2, suppose that
$$0 = \mathrm{Cov}\bigl(f(X), g(X)\bigr) = \frac{1}{2}\, E \bigl(f(X) - f(Y)\bigr)\bigl(g(X) - g(Y)\bigr) = \frac{1}{2}\, E_X \Bigl( E_Y \bigl(f(X) - f(Y)\bigr)\bigl(g(X) - g(Y)\bigr) \Bigr).$$
It follows that for $P_X$-almost all $x$
$$E_Y \bigl(f(x) - f(Y)\bigr)\bigl(g(x) - g(Y)\bigr) = 0.$$
Since still $\bigl(f(x) - f(Y(\omega))\bigr)\bigl(g(x) - g(Y(\omega))\bigr) \ge 0$, we get
$$P\bigl(\{f(Y) = f(x)\} \cup \{g(Y) = g(x)\}\bigr) = 1, \quad \text{for } P_X\text{-almost all } x. \qquad (4)$$

Let
$$A_f = \{x \,;\, P\bigl(f(Y) = f(x)\bigr) > 0\}, \qquad A_g = \{x \,;\, P\bigl(g(Y) = g(x)\bigr) > 0\}.$$
If $x \in A_f$, then $f(x)$ is an atom of the distribution of $f(X)$. Hence there are distinct numbers $u_1, u_2, \ldots$ such that $P\bigl(f(Y) = u_i\bigr) > 0$ and
$$A_f = \bigcup_i f^{-1}(\{u_i\}).$$
In particular, $A_f$ is measurable. If $P\bigl(Y \in A_f^c\bigr) > 0$, then by (4) there exists $x$ such that $P\bigl(f(Y) = f(x)\bigr) = 0$ and
$$1 = P\bigl(\{f(Y) = f(x)\} \cup \{g(Y) = g(x)\}\bigr) = P\bigl(g(Y) = g(x)\bigr).$$
This proves the theorem in this case. So we may and do assume that
$$P\bigl(f(Y) \in \{u_1, u_2, \ldots\}\bigr) = \sum_i P\bigl(f(Y) = u_i\bigr) = 1.$$
By symmetry we may also assume that for some distinct numbers $v_1, v_2, \ldots$
$$P\bigl(g(Y) \in \{v_1, v_2, \ldots\}\bigr) = \sum_k P\bigl(g(Y) = v_k\bigr) = 1.$$
We can write
$$f(X) = \sum_{i=1}^{\infty} u_i\, \mathbb{1}_{\{f(X) = u_i\}}, \qquad f(Y) = \sum_{i=1}^{\infty} u_i\, \mathbb{1}_{\{f(Y) = u_i\}},$$
$$f(X) - f(Y) = \sum_{i \ne j} (u_i - u_j)\, \mathbb{1}_{\{f(X) = u_i\} \cap \{f(Y) = u_j\}},$$
and similarly
$$g(X) - g(Y) = \sum_{k \ne l} (v_k - v_l)\, \mathbb{1}_{\{g(X) = v_k\} \cap \{g(Y) = v_l\}}.$$
Let $\omega \in \{f(X) = u_i, f(Y) = u_j, g(X) = v_k, g(Y) = v_l\}$, $i \ne j$, $k \ne l$. Notice that $u_i > u_j$ implies $X(\omega) > Y(\omega)$, hence $v_k > v_l$ (the monotonicity of $g$ gives us only $v_k \ge v_l$, but we know that $v_k \ne v_l$). Similarly $u_i < u_j$ implies $v_k < v_l$. Therefore we always have $(u_i - u_j)(v_k - v_l) > 0$. We also have
$$0 = E \bigl(f(X) - f(Y)\bigr)\bigl(g(X) - g(Y)\bigr) = \sum_{i \ne j} \sum_{k \ne l} (u_i - u_j)(v_k - v_l)\, P\bigl(f(X) = u_i, f(Y) = u_j, g(X) = v_k, g(Y) = v_l\bigr)$$
$$= \sum_{i \ne j} \sum_{k \ne l} (u_i - u_j)(v_k - v_l)\, P\bigl(f(X) = u_i, g(X) = v_k\bigr)\, P\bigl(f(Y) = u_j, g(Y) = v_l\bigr) = \sum_{i \ne j} \sum_{k \ne l} (u_i - u_j)(v_k - v_l)\, p_{i,k}\, p_{j,l}.$$
It follows that $p_{i,k}\, p_{j,l} = 0$ if $i \ne j$ and $k \ne l$.

We have both
$$1 = \sum_i \sum_k P\bigl(f(X) = u_i, g(X) = v_k\bigr) = \sum_i \sum_k p_{i,k}$$
and (keeping in mind that $\mathbb{1}_{\{i = j\} \cup \{k = l\}} = \mathbb{1}_{\{i = j\}} + \mathbb{1}_{\{k = l\}} - \mathbb{1}_{\{i = j,\, k = l\}}$)
$$1 = \Bigl(\sum_{i,k} p_{i,k}\Bigr)^2 = \sum_i \sum_k \sum_j \sum_l p_{i,k}\, p_{j,l} = \sum_i \sum_k \sum_j \sum_l p_{i,k}\, p_{j,l}\, \mathbb{1}_{\{i = j\} \cup \{k = l\}}$$
$$= \sum_i \sum_k \sum_l p_{i,k}\, p_{i,l} + \sum_i \sum_k \sum_j p_{i,k}\, p_{j,k} - \sum_i \sum_k p_{i,k}^2 = \sum_i \sum_k p_{i,k} \Bigl(\sum_l p_{i,l} + \sum_j p_{j,k} - p_{i,k}\Bigr) = \sum_i \sum_k p_{i,k}\, D_{i,k}.$$
Obviously $D_{i,k} \le 1$, hence $p_{i,k} > 0$ implies $D_{i,k} = 1$.

Fix some $i_0$, $k_0$ with $p_{i_0,k_0} > 0$. Then the whole mass of the joint distribution of $\bigl(f(Y), g(Y)\bigr)$ must be concentrated on the "cross" defined as the support of $D_{i_0,k_0}$. If some $p_{r,k_0} > 0$, $r \ne i_0$, then by the repeated reasoning the complete mass of the distribution must be concentrated on the intersection of the two "crosses", i.e. on the vertical axis containing $p_{i_0,k_0}$, so that
$$1 = \sum_s p_{s,k_0} = \sum_s P\bigl(f(Y) = u_s, g(Y) = v_{k_0}\bigr) = P\bigl(g(X) = v_{k_0}\bigr).$$
Similarly, if for some $q \ne k_0$ we have $p_{i_0,q} > 0$, then $1 = P\bigl(f(Y) = u_{i_0}\bigr)$. Finally, if $p_{r,k_0} = 0$ for all $r \ne i_0$ and $p_{i_0,q} = 0$ for all $q \ne k_0$, then
$$1 = p_{i_0,k_0} = P\bigl(f(X) = u_{i_0}, g(X) = v_{k_0}\bigr).$$
In each case $f(X)$ or $g(X)$ is a.s. constant, which proves the theorem.

Acknowledgements

The author would like to thank Thomas Mikosch and Boualem Djehiche for their help in accessing some of the old papers quoted in this work.
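As a purely numerical aside (ours, not the paper's), the inclusion-exclusion step used in the proof, namely $\sum_{i,k,j,l} p_{i,k}\, p_{j,l}\, \mathbb{1}_{\{i=j\} \cup \{k=l\}} = \sum_i r_i^2 + \sum_k c_k^2 - \sum_{i,k} p_{i,k}^2$ with row sums $r_i$ and column sums $c_k$, holds for an arbitrary array $(p_{i,k})$ and can be checked on random data:

```python
import random

# Check of the inclusion-exclusion identity used in the proof: summing
# p[i][k] * p[j][l] over quadruples with i == j or k == l equals
# (squared row sums) + (squared column sums) - (sum of squares).
# The array p below is random and purely illustrative.

random.seed(2)
m, n = 4, 5
p = [[random.random() for _ in range(n)] for _ in range(m)]
total = sum(map(sum, p))
p = [[v / total for v in row] for row in p]   # normalize: total mass 1

lhs = sum(
    p[i][k] * p[j][l]
    for i in range(m) for k in range(n)
    for j in range(m) for l in range(n)
    if i == j or k == l
)

rows = [sum(row) for row in p]
cols = [sum(p[i][k] for i in range(m)) for k in range(n)]
rhs = (sum(r * r for r in rows) + sum(c * c for c in cols)
       - sum(v * v for row in p for v in row))

print(abs(lhs - rhs) < 1e-12)   # True

# Under the extra constraint p[i][k] * p[j][l] == 0 for i != j, k != l,
# the indicator never removes any term, so lhs equals (total mass)**2
# = 1, which is the identity 1 = sum p[i][k] * D[i][k] from the proof.
```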
References

[1] J.D. Esary, F. Proschan and D.J. Walkup. Association of random variables, with applications. Ann. Math. Statist. 38(5), 1466–1474 (1967).

[2] P. Bratley, B.L. Fox and L.E. Schrage. A Guide to Simulation, Second Ed. Springer-Verlag, New York (1987).

[3] A. Bulinski and A. Shashkin. Limit Theorems for Associated Random Fields and Related Systems. World Scientific Publishing, Singapore (2007).

[4] P.L. Chebyshev. Polnoje Sobranije Sočinenij (Complete Collected Works), Vol. 3. Izdatelstvo Akademii Nauk SSSR, Moscow–Leningrad (1948).

[5] G.H. Hardy, J.E. Littlewood and G. Pólya. Inequalities, Second Ed. Cambridge University Press, Cambridge (1952).

[6] D.S. Mitrinović. Analytic Inequalities. Springer-Verlag, Berlin (1970).

[7] D.S. Mitrinović. Elementarne nierówności (Elementary Inequalities). Państwowe Wydawnictwo Naukowe, Warszawa (1972).

[8] D.S. Mitrinović, J.E. Pečarić and A.M. Fink. Classical and New Inequalities in Analysis. Kluwer Academic Publishers, Dordrecht (1993).

[9] C.P. Niculescu and L.-E. Persson. Convex Functions and their Applications. A Contemporary Approach. CMS Books in Mathematics, Vol. 23, Springer-Verlag, New York (2006).

[10] C.P. Niculescu and J. Pečarić. The equivalence of Chebyshev's inequality to the Hermite–Hadamard inequality. Math. Reports (2), 145–156 (2010).

[11] A. Simonovits. Three economic applications of Chebyshev's algebraic inequality. Math. Social Sci. (3), 207–220 (1995).

[12] J.F. Steffensen. On a generalization of certain inequalities by Tchebychef and Jensen. Scand. Actuar. J. (3–4), 137–147 (1925).

[13] J.F. Steffensen. En ulighed mellem middelværdier (An inequality between mean values). Matematisk Tidsskrift C (1), 49–53 (1925).

[14] A. Wagener. Chebyshev's algebraic inequality and comparative statics under uncertainty. Math. Social Sci. (2), 217–221 (2006).

[15] W. Whitt. Bivariate distributions with given marginals. Ann. Statist. 4 (1976).