Asymptotic behaviour of weighted differential entropies in a Bayesian problem
Mark Kelbert* and Pavel Mozgunov†

International Laboratory of Stochastic Analysis and Its Applications, National Research University Higher School of Economics, Moscow, Russia
Department of Mathematics, Swansea University, Swansea, UK
Abstract
We consider a Bayesian problem of estimating the probability of success in a series of conditionally independent trials with binary outcomes. We study the asymptotic behaviour of the differential entropy of the posterior probability density function conditional on x successes after n conditionally independent trials, as n → ∞. It is shown that, after an appropriate normalization, in the cases x ∼ αn (0 < α < 1) and x ∼ n^β (0 < β < 1) the limiting distribution is Gaussian, and the differential entropy of the standardized RV converges to the differential entropy of the standard Gaussian random variable. When x or n − x is a constant, the limiting distribution is not Gaussian, but the asymptotics of the differential entropy can still be found explicitly.

Suppose next that one wants to know whether the coin is fair, and for large n is interested in the true frequency. To this end the concept of weighted differential entropy introduced in [1] is used, with a weight emphasizing the frequency γ. It is found that a weight of the suggested form does not change the asymptotic form of the Shannon, Renyi, Tsallis and Fisher entropies, but changes the constants. The main term of the weighted Fisher information is shifted by a constant which depends on the distance between the true frequency and the value we want to emphasize.

In the third part we derive weighted versions of the Rao-Cramér, Bhattacharyya and Kullback inequalities. These results are applied to the Bayesian problem described above, and the asymptotic forms of the inequalities are obtained for a particular class of weight functions.

AMS subject classification:
Key words: weighted differential entropy, Renyi entropy, Tsallis entropy, Fisher information, Rao-Cramér inequality, Bhattacharyya inequality, Kullback inequality

*Electronic address: [email protected]
†Electronic address: [email protected]; corresponding author

1 Introduction

Let U be a random variable (RV) uniformly distributed on the interval [0, 1]. Given U = p, consider a sequence of conditionally independent identically distributed RVs ξ_i, where ξ_i = 1 with probability p and ξ_i = 0 with probability 1 − p. Let x_i, each 0 or 1, be the outcome of trial i. Denote S_n = ξ_1 + ... + ξ_n, x = (x_i, i = 1, ..., n) and x = x(n) = Σ_{i=1}^n x_i.

Note that the RVs (ξ_i) are positively correlated. Indeed, P(ξ_i = 1, ξ_j = 1) = ∫_0^1 p² dp = 1/3 for i ≠ j, but P(ξ_i = 1) P(ξ_j = 1) = (∫_0^1 p dp)² = 1/4. In n trials the exact sequence x appears with probability

P(ξ_1 = x_1, ..., ξ_n = x_n) = ∫_0^1 p^x (1 − p)^{n−x} dp = 1 / ( (n + 1) C(n, x) ),   (1.1)

where C(n, x) denotes the binomial coefficient. This implies that the posterior distribution of the number of successes x after n trials is uniform:

P(S_n = x) = 1 / (n + 1),  x = 0, ..., n.

The posterior probability density function (PDF) given the information that after n trials one observes x successes takes the form

f_{p|S_n}(p | ξ_1 = x_1, ..., ξ_n = x_n) = (n + 1) C(n, x) p^x (1 − p)^{n−x}.   (1.2)

Note that the conditional distribution given in (1.2) is a Beta distribution B(x + 1, n − x + 1). "It is known that the Beta distribution is asymptotically normal with its mean and variance as x and (n − x) tend to infinity, but this fact is lacking a handy reference" (see [7, p. 1]). That is why we give a proof of this fact in two cases.

The RV Z^(n) with PDF (1.2) has expectation

E[Z^(n) | S_n = x] = (x + 1) / (n + 2),   (1.3)

and variance

V[Z^(n) | S_n = x] = (x + 1)(n − x + 1) / ( (n + 3)(n + 2)² ).   (1.4)

Recall that h(f) denotes the differential entropy of an RV Z with PDF f:

h(f) = − ∫_R f(z) log(f(z)) dz   (1.5)

with the convention 0 log 0 = 0.
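As a numerical aside (our own sketch, not part of the paper), the identities (1.2)-(1.4) can be checked directly: the posterior is the Beta(x + 1, n − x + 1) density, and its first two moments match (1.3) and (1.4). All function names below are our own:

```python
import math

def posterior(p, n, x):
    # (1.2): f(p | S_n = x) = (n + 1) * C(n, x) * p^x * (1 - p)^(n - x),
    # i.e. the Beta(x + 1, n - x + 1) density
    return (n + 1) * math.comb(n, x) * p**x * (1 - p)**(n - x)

def moment(k, n, x, m=200_000):
    # midpoint-rule approximation of the k-th moment of the posterior
    h = 1.0 / m
    return sum(((i + 0.5) * h)**k * posterior((i + 0.5) * h, n, x) * h for i in range(m))

n, x = 50, 20
mean = moment(1, n, x)
var = moment(2, n, x) - mean**2
assert abs(mean - (x + 1) / (n + 2)) < 1e-6                              # (1.3)
assert abs(var - (x + 1) * (n - x + 1) / ((n + 3) * (n + 2)**2)) < 1e-6  # (1.4)
```

The quadrature tolerance 1e-6 is generous here; the midpoint rule on 200 000 points is far more accurate for this smooth density.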
Note that after the linear transformation X = d₁ Z + d₂ of an RV Z with PDF f into an RV X with PDF g, the differential entropy transforms as [5, 13]:

h(g) = h(f) + log d₁.   (1.6)

Let Z̄ be a standard Gaussian RV with PDF ϕ; then the differential entropy of Z̄ is [13]

h(ϕ) = (1/2) log(2πe).

The goal of the first part of the work is to study the asymptotic behaviour of the differential entropy of the following RVs:

1. Z_α^(n) with PDF f_α^(n) given in (1.2) when x = x(n) ∼ αn, where 0 < α < 1;
2. Z_β^(n) with PDF f_β^(n) given in (1.2) when x = x(n) ∼ n^β, where 0 < β < 1;
3. Z_x^(n) with PDF f_x^(n) given in (1.2) when x = c₁, and Z_{n−x}^(n) with PDF f_{n−x}^(n) given in (1.2) when n − x(n) = c₂, where c₁ and c₂ are some constants.

We will demonstrate that in cases 1 and 2 the limiting distributions of the standardized RVs as n → ∞ are Gaussian. However, asymptotic normality does not automatically imply the limiting form of the differential entropy. In general, the problem of taking limits under the sign of entropy is rather delicate and has been extensively studied in the literature, cf., e.g., [6, 12]. In the third case the limiting distribution is not Gaussian, but the asymptotics of the differential entropy can still be found explicitly.

In the second part of the paper (Section 3) we suppose that one wants to know whether the coin is fair and, for large n, is interested in the true frequency. The goal of the statistical experiment is thus twofold: at the initial stage the experimenter is mainly concerned with whether the coin is fair (i.e. p = 1/2) or not. As the size of the sample grows, he proceeds to estimating the true value of the parameter anyway. We want to quantify the differential entropy of this experiment taking into account its two-sided objective. A quantitative measure of the information gain of such an experiment is provided by the concept of weighted differential entropy [4, 1, 15, 16]. In our case φ^(n)(p) is a weight function that underlines the importance of the value 0.5:

h^φ(f) = − ∫_R φ^(n)(p) f(p) log f(p) dp,   (1.7)

H_ν^φ(f) = (1/(1 − ν)) log ∫_R φ^(n)(z) (f(z))^ν dz,   (1.8)

S_q^φ(f) = (1/(q − 1)) ( 1 − ∫_R φ^(n)(z) (f(z))^q dz ),   (1.9)

I^φ(θ) = E[ φ^(n)(Z) ( (∂/∂θ) log f(Z; θ) )² ],   (1.10)

where Z = Z^(n) is an RV with PDF f given in (1.2) and φ^(n)(p) is a weight function that underlines the importance of some particular value. The following special cases are considered:

1. φ^(n)(p) = 1;
2. φ^(n)(p) depends both on n and p.

We will denote by γ the frequency that we want to emphasize (the value 0.5 in the discussion above); the weight function satisfies φ(x) ≥ 0. Choosing the weight function, we adopt the following normalization rule:

∫_R φ^(n)(p) f^(n)(p) dp = 1.   (1.11)

It can be easily checked that if the weight function φ^(n)(p) satisfies (1.11), then the weighted Renyi entropy (1.8) and the weighted Tsallis entropy (1.9) tend to the weighted Shannon entropy (1.7) as ν → 1 and q → 1, respectively. We choose

φ^(n)(p) = Λ^(n)(γ) p^{γ√n} (1 − p)^{(1−γ)√n},   (1.12)

where Λ^(n)(γ) is found from the normalizing condition (1.11) and is given explicitly in (3.1). This weight function is selected as a model example with the twofold goal of emphasizing a particular value γ for moderate n, while preserving the true frequency p*.

In the third part of the paper (Sections 4, 5 and 6) we recall the statistical experiment with binary outcomes where the main objective is to find out whether the probabilities of success and failure are equal.
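Before moving on, the normalization of the model weight (1.12), with Λ^(n)(γ) taken from the closed form (3.1) below, can be checked numerically. This is our own illustration (log-Gamma arithmetic keeps the huge factors finite):

```python
import math

def log_lambda(n, x, g):
    # log of (3.1): Lambda^(n)(gamma) as a ratio of Gamma functions
    s = math.sqrt(n)
    return (math.lgamma(x + 1) + math.lgamma(n - x + 1) + math.lgamma(n + 2 + s)
            - math.lgamma(x + g * s + 1) - math.lgamma(n - x + 1 + (1 - g) * s)
            - math.lgamma(n + 2))

def weight_times_posterior_integral(n, x, g, m=200_000):
    # midpoint-rule approximation of int_0^1 phi^(n)(p) f^(n)(p) dp, cf. (1.11)
    s = math.sqrt(n)
    log_c = (log_lambda(n, x, g)
             + math.lgamma(n + 2) - math.lgamma(x + 1) - math.lgamma(n - x + 1))
    step = 1.0 / m
    total = 0.0
    for i in range(m):
        p = (i + 0.5) * step
        total += math.exp(log_c + (x + g * s) * math.log(p)
                          + (n - x + (1 - g) * s) * math.log(1 - p)) * step
    return total

assert abs(weight_times_posterior_integral(400, 120, 0.5) - 1.0) < 1e-3
```

By the Beta-integral identity the integral equals 1 exactly; the 1e-3 slack only covers the quadrature error.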
In other words, statistical decisions in a neighbourhood of the particular value γ = 1/2 are of primary interest. Consider an RV Z ∈ R^d with PDF f(z), or a family of RVs Z_θ ∈ R^d with PDFs f_θ, where θ ∈ Θ ⊂ R^m is the vector of parameters of the PDF f_θ. Denote z = [z_1, ..., z_d]^T. Let φ(·) be a positive weight function that emphasizes the particular value γ, and let E_θ^φ(Z) be the weighted expectation of the random vector Z with PDF f_θ:

g(θ) ≡ E_θ^φ(Z) = ∫_{R^d} z f_θ(z) φ(z) dz,   (1.13)

and let E_θ(Z) be the classical expectation of the random vector Z with PDF f_θ:

e(θ) ≡ E_θ(Z) = ∫_{R^d} z f_θ(z) dz.   (1.14)

Quantitative measures of the information gain of experiments of the type described above are provided by the weighted Shannon differential entropy [1, 15, 16]

h^φ(f_θ) = − ∫_{R^d} φ(z) f_θ(z) log f_θ(z) dz,   (1.15)

the weighted (m × m) Fisher information matrix

I^φ(θ) = E_θ^φ[ ( (∂/∂θ) log f_θ(Z) ) ( (∂/∂θ) log f_θ(Z) )^T ],   (1.16)

where ∂/∂θ denotes the gradient (the vector (∂/∂θ) log f_θ(Z) is the score), and the weighted Kullback-Leibler divergence of g from f [14]

D^φ(f || g) = ∫_{R^d} φ(z) f(z) log( f(z) / g(z) ) dz.   (1.17)

For simplicity we assume that the inverse of the Fisher matrix exists; in the general case the inverse is understood as the Moore-Penrose pseudoinverse. It is also shown that in this context it is more convenient to study the calibrated Kullback-Leibler divergence defined in [14]:

K^φ(f || g) = ∫_{R^d} ( φ(z) f(z) / C(f) ) log( f(z) C(g) / ( g(z) C(f) ) ) dz = D(f̃ || g̃),   (1.18)

where C(f) = ∫_{R^d} φ(z) f(z) dz, f̃ = φ(z) f(z) C(f)^{−1}, and D(f || g) is the standard Kullback-Leibler divergence of g from f:

D(f || g) = ∫_{R^d} f(z) log( f(z) / g(z) ) dz.   (1.19)

The goal of the third part is twofold. Firstly, the weighted analogues of the Rao-Cramér, Bhattacharyya and Kullback inequalities will be derived in the general case. Secondly, these inequalities will be illustrated in the example described above, which is of independent interest.

Theorem 1.
Let Z̃_α^(n) = √n (α(1 − α))^{−1/2} (Z_α^(n) − α) be an RV with PDF f̃_α^(n), and let Z̄ ∼ N(0, 1) be the standard Gaussian RV with PDF ϕ. Then

(a) Z̃_α^(n) weakly converges to Z̄: Z̃_α^(n) ⇒ Z̄ as n → ∞.

(b) The differential entropy of Z̃_α^(n) converges to the differential entropy of Z̄:

lim_{n→∞} h(f̃_α^(n)) = (1/2) log(2πe).

(c) The Kullback-Leibler divergence of ϕ from f̃_α^(n) tends to 0 as n → ∞:

lim_{n→∞} D(f̃_α^(n) || ϕ) = 0.

Proof. (a) Let x = x(n) = ⌊αn⌋, where 0 < α < 1, and standardize Z_α^(n):

Z̃_α^(n) = √n (α(1 − α))^{−1/2} (Z_α^(n) − α).

We proceed by the method of characteristic functions and establish that

φ_n(t) = E[ e^{it Z̃_α^(n)} ] → e^{−t²/2}   (2.1)

for all t ∈ R. Indeed,

φ_n(t) = ∫_0^1 e^{it(p−α)√n / √(α(1−α))} f_α^(n)(p) dp = (n + 1) C(n, x) e^{−itα√n / √(α(1−α))} ∫_0^1 e^{itp√n / √(α(1−α))} p^x (1 − p)^{n−x} dp,

so consider the integral

I(t, α, n) = ∫_0^1 e^{n ( itp / √(α(1−α)n) + α log p + (1−α) log(1−p) )} dp.   (2.2)

Denote g(p) = itp / √(α(1−α)n) + α log p + (1 − α) log(1 − p). The integrand in (2.2) has a narrow sharp peak, and the integral is completely dominated by the maximum of Re[g(p)] when n → ∞. For fixed values of t, α and n → ∞, it can be studied by the saddle point method [8, Theorem 1.3, p. 170]:

I(t, α, n) ≃ e^{n g(p*)} √( 2π / (−n g''(p*)) ) ( 1 + O(1/n) ).   (2.3)

We find the point of maximum of Re[g(p)] and deform the initial contour [0, 1] into the steepest descent contour through the saddle point

p* = α + it √((1 − α)α) / √n + O(1/n).

So φ_n(t) takes the form

φ_n(t) = e^{−itα√n / √(α(1−α))} (n + 1) C(n, x) (p*)^x (1 − p*)^{n−x} √( 2π / (−n g''(p*)) ) + O(1/n).

Here and below x = ⌊αn⌋. Next, by Stirling's formula,

(n + 1) C(n, x) ≃ (n + 1) ( n^n / ( x^x (n − x)^{n−x} ) ) √( n / (2πx(n − x)) ).

A straightforward computation yields

(p*)^x (1 − p*)^{n−x} ≃ α^x (1 − α)^{n−x} e^{it√((1−α)α n) + (1−α)t²/2} e^{−it√((1−α)α n) + αt²/2} = e^{t²/2} (x/n)^x ( (n − x)/n )^{n−x}.

It can be checked that the next term in the asymptotics of log p* (as well as of log(1 − p*)) decays to 0 after multiplication by αn and (1 − α)n, respectively. The phase e^{itp*√n/√(α(1−α))} contained in e^{ng(p*)} combines with the prefactor into e^{−t²}, and we have for t ∈ R

φ_n(t) ≃ e^{−t²} (n + 1) ( n^n / ( x^x (n − x)^{n−x} ) ) √( n / (2πx(n − x)) ) e^{t²/2} (x/n)^x ( (n − x)/n )^{n−x} √( 2πx(n − x) / n³ ) ≃ e^{−t²/2}.

(b) Write the differential entropy in the form

h(f_α^(n)) = − ( log[ (n + 1) C(n, x) ] + (n + 1) C(n, x) x I₁ + (n + 1) C(n, x) (n − x) I₂ )   (2.4)

where

I₁ = ∫_0^1 p^x (1 − p)^{n−x} log p dp,   (2.5)

I₂ = ∫_0^1 p^x (1 − p)^{n−x} log(1 − p) dp.   (2.6)

The integrals I₁ and I₂ can be computed explicitly by reduction to the standard integral [9]

∫_0^1 x^{μ−1} (1 − x^r)^{ν−1} log x dx = (1/r²) B(μ/r, ν) ( ψ(μ/r) − ψ(μ/r + ν) ),   (2.7)

where ψ(x) is the digamma function and B(x, y) is the Beta function; here r ≡ 1, μ − 1 ≡ x, ν − 1 ≡ n − x. For the integral I₁ we get

U₂ = (n + 1) C(n, x) x I₁ = − x ( ψ(n + 2) − ψ(x + 1) ).

Similarly, for the second integral I₂ we obtain

U₃ = (n + 1) C(n, x) (n − x) I₂ = − (n − x) ( ψ(n + 2) − ψ(n − x + 1) ).

Summing these two terms and using the asymptotics of the digamma function [9],

U₂ + U₃ = x log x − n log n + (n − x) log(n − x) − 1/2 + O(1/n).

Next, we apply Stirling's formula to the first term in (2.4):

U₁ = log[ (n + 1) C(n, x) ] = n log n − x log x − (n − x) log(n − x) + (1/2) log n − (1/2) log α − (1/2) log(1 − α) − log √(2π) + O(1/n).

Here, as before, x = ⌊αn⌋. So we obtain the following asymptotics of the differential entropy:

lim_{n→∞} [ h(f_α^(n)) − (1/2) log( 2πe α(1 − α) / n ) ] = 0.   (2.8)

Due to (1.6), the differential entropy of the RV Z̃_α^(n) satisfies

lim_{n→∞} h(f̃_α^(n)) = (1/2) log(2πe).   (2.9)

(c) By the definition of the Kullback-Leibler divergence,

D(f̃_α^(n) || ϕ) = − h(f̃_α^(n)) − ∫ f̃_α^(n)(p) log ϕ(p) dp = − (1/2) log(2πe) + (1/2) log(2π) + (1/2) ∫ p² f̃_α^(n)(p) dp + O(1/n) = O(1/n),

since ∫ p² f̃_α^(n)(p) dp = 1 + O(1/n) is the second moment of Z̃_α^(n). This completes the proof.

Theorem 2.
Let Z̃_β^(n) = n^{1−β/2} ( Z_β^(n) − n^{β−1} ) be an RV with PDF f̃_β^(n), and let Z̄ ∼ N(0, 1). Then

(a) Z̃_β^(n) weakly converges to Z̄: Z̃_β^(n) ⇒ Z̄ as n → ∞.

(b) The differential entropy of Z̃_β^(n) converges to the differential entropy of Z̄:

lim_{n→∞} h(f̃_β^(n)) = (1/2) log(2πe).

(c) The Kullback-Leibler divergence of ϕ from f̃_β^(n) tends to 0 as n → ∞:

lim_{n→∞} D(f̃_β^(n) || ϕ) = 0.

Proof. (a) Let x = x(n) = ⌊n^β⌋, where 0 < β < 1, and standardize Z_β^(n):

Z̃_β^(n) = n^{1−β/2} ( Z_β^(n) − n^{β−1} ).

In this case it is more convenient to proceed by the method of moments. We use the following classical result: let (f_n) be a sequence of distribution functions with finite moments μ_k(n) such that μ_k(n) → ν_k for each k as n → ∞, where ν_k are the moments of a distribution f uniquely determined by its moments; then f_n weakly converges to f as n → ∞ [11].

Consider the RV Z̃_β^(n) = n^{1−β/2}(Z_β^(n) − n^{β−1}), where Z_β^(n) has PDF (1.2) with x = ⌊n^β⌋, and compute all moments of Z̃_β^(n). First, E(Z̃_β^(n)) → 0 as n → ∞, because E(Z_β^(n)) = n^{β−1} + O(1/n). Next, one checks that E[(Z̃_β^(n))²] = n^{2−β} E(Z_β^(n) − n^{β−1})² → 1 as n → ∞. Compute the central moments for any k > 2; they can be expressed via a hypergeometric function:

E[ (Z_β^(n) − n^{β−1})^k ] = n^{k(β−1)} (1 − n^{−β})^{−k} (1 − n^{β−1})^k F[−k, n^β + 1; n + 2; n^{1−β}],   (2.10)

where F[−k, n^β + 1; n + 2; n^{1−β}] is a hypergeometric function which, in this case, is the polynomial

F[−k, n^β + 1; n + 2; n^{1−β}] = Σ_{i=0}^k (−1)^i C(k, i) ( (n^β + 1)_i / (n + 2)_i ) n^{i(1−β)},

in which (q)_i = q(q + 1)...(q + i − 1) is the rising Pochhammer symbol, with the convention (q)_0 = 1. Consider the asymptotics of the factors separately:

n^{k(1−β/2)} n^{k(β−1)} (1 − n^{−β})^{−k} (1 − n^{β−1})^k ≃ O(n^{kβ/2}),

and

F[−k, n^β + 1; n + 2; n^{1−β}] ≃ O( n^{−[0.5+0.5k]β} ),   (2.11)

where [·] denotes the integer part. For k odd,

n^{k(1−β/2)} E[ (Z_β^(n) − n^{β−1})^k ] = O(n^{kβ/2}) O( n^{−((k+1)/2)β} ) ≃ O(n^{−β/2}) → 0 as n → ∞.   (2.12)

For k even,

n^{k(1−β/2)} E[ (Z_β^(n) − n^{β−1})^k ] = O(n^{kβ/2}) O( n^{−(k/2)β} ) = O(1).   (2.13)

We see that every even central moment tends to a constant, namely the coefficient in front of the term n^{−(k/2)β} in the hypergeometric function. For k even we have

n^{k(1−β/2)} E[ (Z_β^(n) − n^{β−1})^k ] → (k − 1)!!.   (2.14)

These are the moments of the standard Gaussian distribution, which is uniquely determined by its moments, so the RV Z̃_β^(n) weakly converges to the standard Gaussian RV.

(b) Write the differential entropy in the form

h(f_β^(n)) = − ( log[ (n + 1) C(n, x) ] + (n + 1) C(n, x) x I₁ + (n + 1) C(n, x)(n − x) I₂ ) = − (U₁ + U₂ + U₃),   (2.15)

where I₁ and I₂ are defined in (2.5) and (2.6) and can be computed explicitly by (2.7). As before, we apply Stirling's formula to U₁:

U₁ = n log n − x log x − (n − x) log(n − x) + log n + (1/2)( − log n^β − log(1 − n^{β−1}) ) − (1/2) log(2π) + O(1/n^β).

Since 0 < β < 1, the remainder tends to 0 as n → ∞; note that its rate of decay depends on the parameter β, in contrast to the remainder in Theorem 1. Now U₂ + U₃ can be computed as follows:

U₂ + U₃ = x log x − n log n + (n − x) log(n − x) − 1/2 + O(1/n^β).

So we have proved that

lim_{n→∞} [ h(f_β^(n)) − (1/2) log( 2πe (1 − n^{β−1}) n^{β−2} ) ] = 0.

Due to (1.6), the differential entropy of the RV Z̃_β^(n) satisfies

lim_{n→∞} h(f̃_β^(n)) = (1/2) log(2πe).

(c) Similarly, by the definition of the Kullback-Leibler divergence,

D(f̃_β^(n) || ϕ) = − h(f̃_β^(n)) − ∫ f̃_β^(n)(p) log ϕ(p) dp = − (1/2) log(2πe) + (1/2) log(2π) + (1/2) ∫ p² f̃_β^(n)(p) dp + O(1/n^β) = O(1/n^β),

since ∫ p² f̃_β^(n)(p) dp = 1 + O(1/n^β) is the second moment of Z̃_β^(n).

Theorem 3.
Let Z̃_c^(n) = n Z_c^(n) be an RV with PDF f̃_c^(n), and let Z̃_{n−c}^(n) = n Z_{n−c}^(n) be an RV with PDF f̃_{n−c}^(n). Denote by H_k = 1 + 1/2 + ... + 1/k the partial sum of the harmonic series and by γ the Euler-Mascheroni constant. Then

(a) lim_{n→∞} h(f̃_c^(n)) = c + Σ_{i=0}^{c−1} log(c − i) − c (H_c − γ) + 1, where x = c is constant;

(b) lim_{n→∞} h(f̃_{n−c}^(n)) = c + Σ_{i=0}^{c−1} log(c − i) − c (H_c − γ) + 1, where n − x = c is constant.

Proof. (a) Let x = c, where c is some integer constant. Consider the differential entropy h(f_c^(n)) = − (U₁ + U₂ + U₃), where U₁, U₂ and U₃ are defined in (2.15). Applying Stirling's formula to U₁,

U₁ = log n − log(x!) + x log n + O(1/n).

Next, we compute U₂ + U₃ via formula (2.7) as before. The only difference is in the asymptotics of the digamma functions [9] when x = c is constant:

ψ(n − x + 1) ≃ log n + 1/(2n) − x/n, while ψ(x + 1) = H_x − γ,

where H_x is the partial sum of the harmonic series and γ stands for the Euler-Mascheroni constant. Using that x = c,

lim_{n→∞} [ h(f_c^(n)) + log n ] = c + Σ_{i=0}^{c−1} log(c − i) − c (H_c − γ) + 1.

Due to (1.6), this can be written in the form

lim_{n→∞} h(f̃_c^(n)) = c + Σ_{i=0}^{c−1} log(c − i) − c (H_c − γ) + 1.

(b) Let n − x(n) = c, where c is some integer constant. In a similar way we compute h(f_{n−c}^(n)), where n − x = c and c is a constant. The asymptotics of the digamma function is now given by ψ(n − x + 1) = H_c − γ for x = n − c, and the final result for the differential entropy is

h(f_{n−c}^(n)) = − log n + c − c (H_c − γ) + Σ_{i=0}^{c−1} log(c − i) + 1 + O(1/n).
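The limits in Theorem 3 coincide with the differential entropy of a Gamma(c + 1, 1) random variable, which is consistent with n Z_c^(n) ⇒ Gamma(c + 1, 1); for Gamma(k, 1) the entropy is k + log Γ(k) + (1 − k)ψ(k). A numerical sketch of this identity (our own illustration, not from the paper):

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gamma_entropy_numeric(k, m=400_000, upper=60.0):
    # -int f log f for the Gamma(k, 1) density f(t) = t^(k-1) e^(-t) / Gamma(k)
    step = upper / m
    h = 0.0
    for i in range(m):
        t = (i + 0.5) * step
        log_f = (k - 1) * math.log(t) - t - math.lgamma(k)
        h -= math.exp(log_f) * log_f * step
    return h

c = 2
harmonic = sum(1.0 / i for i in range(1, c + 1))  # H_c
limit = c + sum(math.log(c - i) for i in range(c)) - c * (harmonic - EULER_GAMMA) + 1
assert abs(gamma_entropy_numeric(c + 1) - limit) < 1e-4
```

For c = 0 the limit reduces to 1, the entropy of the standard exponential distribution, which matches n Z_0^(n) ⇒ Exp(1).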
In terms of the standardized RV Z̃_{n−c}^(n) we obtain, due to (1.6),

lim_{n→∞} h(f̃_{n−c}^(n)) = c + Σ_{i=0}^{c−1} log(c − i) − c (H_c − γ) + 1.

The normalizing constant in the weight function (1.12) is found from the condition (1.11). We obtain

Λ^(n)(γ) = Γ(x + 1) Γ(n − x + 1) Γ(n + 2 + √n) / ( Γ(x + γ√n + 1) Γ(n − x + 1 + √n − γ√n) Γ(n + 2) ).   (3.1)

We denote by ψ^(0)(x) = ψ(x) and ψ^(1)(x) the digamma function and its first derivative, respectively:

ψ^(m)(x) = d^{m+1}/dx^{m+1} log Γ(x).   (3.2)

In further calculations we will need the asymptotics of these functions:

ψ(x) = log x − 1/(2x) + O(1/x²) as x → ∞,

ψ^(1)(x) = 1/x + 1/(2x²) + O(1/x³) as x → ∞.

Proposition 1.
Let Z^(n) be an RV with the conditional PDF f^(n) after n trials given by (1.2), and let h^φ(f_α^(n)) be the weighted Shannon entropy of Z^(n) given in (1.15). When x = αn (0 < α < 1) and the weight function φ^(n)(p) is given by (1.12),

lim_{n→∞} ( h^φ(f_α^(n)) − (1/2) log( 2πe α(1 − α) / n ) ) = (α − γ)² / (2α(1 − α)).   (3.3)

If α = γ, the asymptotics of h^φ(f) is exactly the asymptotics of the differential Shannon entropy with φ^(n)(p) = 1.

Proof. The Shannon differential entropy of the PDF f^(n)(p) = f(p) given in (1.2), with the weight function φ^(n)(p) given in (1.12) and the normalization (1.11), takes the form

h^φ(f) = − ( log[ (n + 1) C(n, x) ] + x ∫_0^1 log(p) φ^(n)(p) f(p) dp + (n − x) ∫_0^1 log(1 − p) φ^(n)(p) f(p) dp ).

The integrals can be computed explicitly [9, p. 552]:

∫_0^1 x^{μ−1} (1 − x^r)^{ν−1} log x dx = (1/r²) B(μ/r, ν) ( ψ(μ/r) − ψ(μ/r + ν) ).

Applying this formula, with z = γ√n,

∫_0^1 log(p) φ^(n)(p) f(p) dp = ψ(x + z + 1) − ψ(n + √n + 2),

∫_0^1 log(1 − p) φ^(n)(p) f(p) dp = ψ(n − x + √n − z + 1) − ψ(n + √n + 2),

so we have

h^φ(f) = − ( log[ (n + 1) C(n, x) ] + x ψ(x + z + 1) + (n − x) ψ(n − x + √n − z + 1) − n ψ(n + √n + 2) ).

By Stirling's formula, for x = αn,

log[ (n + 1) C(n, x) ] = n log n − x log x − (n − x) log(n − x) + (1/2) log n − (1/2) log α − (1/2) log(1 − α) − log √(2π) + O(1/n).

Using the asymptotics of the digamma function,

ψ(x + z + 1) = log x + γ√n / x + (α − γ²) / (2αx) + O(1/n^{3/2}),

ψ(n − x + √n − z + 1) = log(n − x) + (1 − γ)√n / (n − x) + (2γ − γ² − α) / (2(1 − α)(n − x)) + O(1/n^{3/2}),

ψ(n + √n + 2) = log n + √n / n + 1/n + O(1/n^{3/2}),

we get

h^φ(f^(n)) = (1/2) log( 2πe α(1 − α) / n ) + (α − γ)² / (2α(1 − α)) + O(1/√n).   (3.4)

The first term in (3.4) is the differential entropy with weight φ ≡ 1; the correction vanishes as γ → α.

Theorem 4.
Let Z^(n) be an RV with the conditional PDF f^(n) after n trials given by (1.2), and let H_ν^φ(f^(n)) denote its weighted Renyi differential entropy given in (1.8).

(a) When both x and n − x tend to infinity as n → ∞, in the case φ^(n)(p) = 1,

lim_{n→∞} ( H_ν(f^(n)) − (1/2) log( 2πx(n − x) / n³ ) ) = − log ν / (2(1 − ν)).   (3.5)

For any fixed n, as ν → 1, the Renyi differential entropy of Z^(n) tends to the Shannon differential entropy of Z^(n).

(b) When x = αn (0 < α < 1) and the weight function is given by (1.12),

lim_{n→∞} ( H_ν^φ(f_α^(n)) − (1/2) log( 2πα(1 − α) / n ) ) = − log ν / (2(1 − ν)) + (α − γ)² / (2α(1 − α)ν).   (3.6)

For any fixed n, the weighted Renyi differential entropy tends to the weighted Shannon differential entropy of the RV with PDF given in (1.2) as ν → 1.

Proof. (a) In this case φ^(n)(p) ≡ 1, so the Renyi entropy has the form

(1 − ν) H_ν(f) = log ∫_0^1 (f(p))^ν dp = ν log[ (n + 1) C(n, x) ] + log ∫_0^1 p^{νx} (1 − p)^{ν(n−x)} dp = U₄ + U₅.

By Stirling's formula,

U₄ = ν log[ (n + 1) C(n, x) ] = ν ( n log n − x log x − (n − x) log(n − x) + log n + (1/2) log n − (1/2) log x − (1/2) log(n − x) − (1/2) log(2π) ) + O(1/n).

Consider the integral

∫_0^1 p^{νx} (1 − p)^{ν(n−x)} dp = B(νx + 1, ν(n − x) + 1) = Γ(νx + 1) Γ(ν(n − x) + 1) / Γ(νn + 2).

By Stirling's formula again,

U₅ = log[ Γ(νx + 1) Γ(ν(n − x) + 1) / Γ(νn + 2) ]
= ( νx log ν + νx log x − νx + (1/2) log ν + (1/2) log x + (1/2) log(2π) )
+ ( ν(n − x) log ν + ν(n − x) log(n − x) − ν(n − x) + (1/2) log ν + (1/2) log(n − x) + (1/2) log(2π) )
− ( νn log ν + νn log n − νn + (1/2) log ν + (1/2) log n + (1/2) log(2π) ) − log ν − log n + O(1/n).

We obtain

U₄ + U₅ = ((1 − ν)/2) ( log x + log(n − x) + log(2π) − 3 log n ) − (1/2) log ν + O(1/n) = ((1 − ν)/2) log( 2πx(n − x) / n³ ) − (1/2) log ν + O(1/n).

So we have

H_ν(f) = (1/2) log( 2πx(n − x) / n³ ) − log ν / (2(1 − ν)) + O(1/n);   (3.7)

note that it tends to the Renyi differential entropy of a Gaussian RV as n → ∞. Taking the limit as ν → 1,

H_{ν→1}(f) = lim_{ν→1} H_ν(f) = (1/2) log( 2πe x(n − x) / n³ ) + O(1/n).   (3.8)

For example, when x = αn, 0 < α < 1,

H_{ν→1}(f) = (1/2) log( 2πe α(1 − α) / n ) + O(1/n),

where the first term is the Shannon entropy of a Gaussian RV with the corresponding variance. Similarly, when x = n^β, 0 < β < 1,

H_{ν→1}(f) = (1/2) log( 2πe (1 − n^{β−1}) n^{β−2} ) + O(1/n^β),

where the first term is the Shannon differential entropy of a Gaussian RV with variance σ² = (1 − n^{β−1}) n^{β−2}.

(b) When φ^(n)(p) is given by (1.12) and x = αn, the weighted Renyi entropy has the form

H_ν^φ(f) = (1/(1 − ν)) log ∫_0^1 φ^(n)(p) (f(p))^ν dp,  with  ∫_0^1 φ^(n)(p) (f(p))^ν dp = U₆ U₇ U₈,

where

U₆ = Γ(νx + γ√n + 1) Γ(ν(n − x) + (1 − γ)√n + 1) / Γ(νn + √n + 2);

U₇ = ( Γ(n + 2) / ( Γ(x + 1) Γ(n − x + 1) ) )^{ν−1};

U₈ = Γ(n + √n + 2) / ( Γ(x + z + 1) Γ(n − x + √n − z + 1) ).

Here log U₇ = (ν − 1) log[ (n + 1) C(n, x) ], which was expanded by Stirling's formula above, while log U₆ and log U₈ are expanded in the same way as U₅, with the arguments shifted by z = γ√n and √n − z. Collecting all parts, the terms of orders n log n, √n log n and √n cancel, and we obtain

H_ν^φ(f) = (1/2) log( 2πα(1 − α) / n ) − log ν / (2(1 − ν)) + ( (α − γ)² / (2α(1 − α)(1 − ν)) ) (1/ν − 1) + O(1/√n).   (3.9)

Taking the limit as ν → 1,

H^φ(f) = lim_{ν→1} H_ν^φ(f) = (1/2) log( 2πe α(1 − α) / n ) + (α − γ)² / (2α(1 − α)) + O(1/√n),   (3.10)

so the weighted Renyi entropy tends to the weighted Shannon entropy as ν → 1.

Proposition 2.
For any continuous random variable X with PDF f(x) and any non-negative weight function φ(x) which satisfies condition (1.11) and is such that

∫_R φ(x) (f(x))^ν | log f(x) | dx < ∞,

the weighted Renyi differential entropy H_ν^φ(f) is a non-increasing function of ν, and

(∂/∂ν) H_ν^φ(f) = − (1/(1 − ν)²) ∫_R z(x) log( z(x) / (φ(x) f(x)) ) dx,   (3.11)

where z(x) = φ(x)(f(x))^ν / ∫_R φ(x)(f(x))^ν dx. Similarly, the weighted Tsallis entropy S_q^φ(f) given in (1.9) is a non-increasing function of q.

Proof. We need to show that (∂/∂ν) H_ν^φ(f) ≤ 0. Differentiating,

(∂/∂ν) H_ν^φ(f) = ( log ∫_R φ(x)(f(x))^ν dx ) / (1 − ν)² + ( ∫_R φ(x)(f(x))^ν log(f(x)) dx ) / ( (1 − ν) ∫_R φ(x)(f(x))^ν dx ) = I₁ + I₂.   (3.12)

Denote

z(x) = φ(x)(f(x))^ν / ∫_R φ(x)(f(x))^ν dx.   (3.13)

Note that z(x) ≥ 0 and ∫_R z(x) dx = 1. Let Q₁ = ∫_R φ(x)(f(x))^ν dx and Q₂ = log Q₁. Using the substitution (3.13),

Q₂ = log φ(x) + ν log f(x) − log z(x).   (3.14)

We have

I₂ = (1/(1 − ν)) ∫_R z(x) log(f(x)) dx,

and

I₁ + I₂ = (1/(1 − ν)²) ( log ∫_R φ(x)(f(x))^ν dx + (1 − ν) ∫_R z(x) log(f(x)) dx ) = (1/(1 − ν)²) I₃.

Integrating (3.14) against z(x) (recall that ∫_R z(x) dx = 1) gives

Q₂ = ∫_R z(x) log φ(x) dx + ν ∫_R z(x) log f(x) dx − ∫_R z(x) log z(x) dx,

whence

I₃ = Q₂ + (1 − ν) ∫_R z(x) log f(x) dx = ∫_R z(x) log f(x) dx − ∫_R z(x) log z(x) dx + ∫_R z(x) log φ(x) dx = − ∫_R z(x) log( z(x) / (φ(x) f(x)) ) dx.

We obtain

− (∂/∂ν) H_ν^φ(f) = (1/(1 − ν)²) ∫_R z(x) log( z(x) / (φ(x) f(x)) ) dx = (1/(1 − ν)²) D_KL(z || φf).   (3.15)

Here D_KL(z || φf) is the Kullback-Leibler divergence between z and φf, which is always non-negative: indeed, z(x) ≥ 0, and due to the normalization (1.11) the product φ(x) f(x) is itself a PDF, ∫_R φ(x) f(x) dx = 1. Similarly, one can show that the weighted Tsallis differential entropy given in (1.9) is a non-increasing function of q. So the result follows.

Theorem 5. Let Z^(n) be an RV with the conditional PDF f^(n) after n trials given by (1.2), and let S_q^φ(f^(n)) denote its weighted Tsallis differential entropy given in (1.9).

(a) When both x and n − x tend to infinity as n → ∞ and φ^(n)(p) = 1,

lim_{n→∞} [ S_q(f^(n)) − (1/(q − 1)) ( 1 − (1/√q) ( 2πx(n − x)/n³ )^{(1−q)/2} ) ] = 0.   (3.16)

For any fixed n, the Tsallis differential entropy tends to the Shannon differential entropy as q → 1.

(b) When x = αn and the weight function φ^(n)(p) is given by (1.12),

lim_{n→∞} [ S_q^φ(f_α^(n)) − (1/(q − 1)) ( 1 − (1/√q) ( 2πα(1 − α)/n )^{(1−q)/2} exp( (α − γ)²(1 − q) / (2α(1 − α)q) ) ) ] = 0.   (3.17)

The weighted Tsallis differential entropy tends to the weighted Shannon differential entropy of the RV with PDF given in (1.2) as q → 1.

Remark 1.
It can be seen from Theorem 4(a) and Theorem 5(a) that for large n the Renyi entropy and the Tsallis entropy (for φ ≡ 1) "behave" like the respective entropies of a Gaussian RV with variance σ² = x(n − x)/n³.

Proof. (a) In this case φ^(n)(p) ≡ 1, and the Tsallis entropy has the form

S_q(f) = (1/(q − 1)) ( 1 − ∫_0^1 (f(p))^q dp ) = (1/(q − 1)) ( 1 − ∫_0^1 ( (n + 1) C(n, x) p^x (1 − p)^{n−x} )^q dp ).

It was shown above that

log ∫_0^1 (f(p))^q dp ≃ ((1 − q)/2) log( 2πx(n − x)/n³ ) − (1/2) log q.

So we have

V₁ = ∫_0^1 (f(p))^q dp ≃ (1/√q) ( 2πx(n − x)/n³ )^{(1−q)/2},

and we straightforwardly obtain

S_q(f) ≃ (1/(q − 1)) ( 1 − (1/√q) ( 2πx(n − x)/n³ )^{(1−q)/2} ).   (3.18)

Note that V₁ → 1 as q → 1; applying L'Hospital's rule we get

lim_{q→1} S_q(f) = S₁(f) ≃ (1/2) log( 2πe x(n − x)/n³ ).   (3.19)

The expression above is nothing else but the Shannon differential entropy of a Gaussian RV.

(b) When φ^(n)(p) is given by (1.12), the weighted Tsallis entropy has the form

S_q^φ(f) = (1/(q − 1)) ( 1 − ∫_0^1 φ^(n)(p) (f(p))^q dp ).

Using that x = αn and Stirling's formula, it was shown above that

log ∫_0^1 φ^(n)(p) (f(p))^q dp ≃ ((1 − q)/2) log( 2πα(1 − α)/n ) − (1/2) log q + ( (α − γ)² / (2α(1 − α)) ) ( (1 − q)/q ).

So we have

V₂ = ∫_0^1 φ^(n)(p) (f(p))^q dp ≃ (1/√q) ( 2πα(1 − α)/n )^{(1−q)/2} exp( ( (α − γ)² / (2α(1 − α)) ) ( (1 − q)/q ) ),

and the weighted Tsallis entropy is

S_q^φ(f) ≃ (1/(q − 1)) ( 1 − (1/√q) ( 2πα(1 − α)/n )^{(1−q)/2} exp( ( (α − γ)² / (2α(1 − α)) ) ( (1 − q)/q ) ) ).   (3.20)

Note that V₂ → 1 as q → 1; applying L'Hospital's rule we get

S₁^φ(f) = lim_{q→1} S_q^φ(f) ≃ (1/2) log( 2πe α(1 − α)/n ) + (α − γ)² / (2α(1 − α)).   (3.21)

Thus the weighted Tsallis entropy tends to the weighted Shannon differential entropy as q → 1.

Theorem 6.
Let Z ( n ) be a RV with f ( n ) α - conditional PDF after n trials given by (1.2), when x = αn (0 < α < and I ( f ( n ) α ) is the weighted Fisher information of Z ( n ) given in (1.5): (a) When φ ( n ) ( p ) = 1 , lim n →∞ (cid:20) I ( f ( n ) α ) − (cid:18) α (1 − α ) (cid:19) n (cid:21) = − α − α + 12 α (1 − α ) . (3.22) (b) When φ ( n ) ( p ) is given in (1.12): lim n →∞ (cid:20) I φ ( f ( n ) α ) − (cid:18) α (1 − α ) + ( α − γ ) (1 − α ) α (cid:19) n − B ( α, γ ) √ n (cid:21) = C ( α, γ ) , (3.23) where B ( α, γ ) and C ( α, γ ) are constants which depend only on α and γ and are given in (3.29)and (3.30) respectively .Proof. (a) The Fisher information in the case φ ( n ) ( p ) = 1 and x = αn takes the form: I ( α ) = E (cid:18) ∂∂α log f ( p ; α ) (cid:19) (cid:12)(cid:12)(cid:12) α ! = Z (cid:18) ∂∂α log f ( p ; α ) (cid:19) f ( p, α ) dp, where f = f ( n ) α . Next,log( f ( p, α )) = αn log( p ) + (1 − α ) n log(1 − p ) + log( n + 1)! − log( x !) − log(( n − x )!)and ∂∂α log f ( p ; α ) = n log( p ) − n log(1 − p ) + nψ ( n − x + 1) − nψ ( x + 1) , (3.24) (cid:18) ∂∂ log f ( p ; α ) (cid:19) = n log ( p ) + n log (1 − p ) + n ψ ( n − x + 1) + n ψ ( x + 1) − n log( p )log(1 − p ) + 2 n log( p ) ψ ( n − x + 1) − n log( p ) ψ ( x + 1) − n log(1 − p ) ψ ( n − x + 1) + 2 n log(1 − p ) ψ ( x +1) − n ψ ( x + 1) ψ ( n − x + 1).For the following computation of expectation we will need so following integrals:17 (log( p )) p x (1 − p ) n − x dp = Γ( n − x + 1)Γ( x + 1)Γ( n + 2) ( ψ ( n + 2) − ψ ( x + 1)) − ψ (1) ( n + 2) + ψ (1) ( x + 1), where Γ( x ) is a Gamma function and ψ (1) ( x ) is the first derivative of digammafunction. 
$$
\int_0^1 (\log(1-p))^2\, p^x (1-p)^{n-x}\,\mathrm{d}p = \frac{\Gamma(n-x+1)\Gamma(x+1)}{\Gamma(n+2)}\Big[\big(\psi(n+2)-\psi(n-x+1)\big)^2 - \psi^{(1)}(n+2) + \psi^{(1)}(n-x+1)\Big],
$$
$$
\int_0^1 \log p\,\log(1-p)\, p^x (1-p)^{n-x}\,\mathrm{d}p = \frac{\Gamma(n-x+1)\Gamma(x+1)}{\Gamma(n+2)}\Big[\big(\psi(n+2)-\psi(n-x+1)\big)\big(\psi(n+2)-\psi(x+1)\big) - \psi^{(1)}(n+2)\Big],
$$
$$
\int_0^1 \log p\; p^x (1-p)^{n-x}\,\mathrm{d}p = \frac{\Gamma(n-x+1)\Gamma(x+1)}{\Gamma(n+2)}\big(\psi(x+1) - \psi(n+2)\big),
$$
$$
\int_0^1 \log(1-p)\; p^x (1-p)^{n-x}\,\mathrm{d}p = \frac{\Gamma(n-x+1)\Gamma(x+1)}{\Gamma(n+2)}\big(\psi(n-x+1) - \psi(n+2)\big).
$$
Substituting these integrals into
$$
\int_0^1 \left(\frac{\partial}{\partial\alpha}\log f(p;\alpha)\right)^{2} f(p;\alpha)\,\mathrm{d}p
$$
and collecting terms, all products of digamma values cancel and we are left with
$$
I(\alpha) = n^2\big(\psi^{(1)}(x+1) + \psi^{(1)}(n-x+1)\big). \qquad (3.25)
$$
Using the asymptotics of the digamma function we can rewrite this as
$$
I(\alpha) = \frac{n}{\alpha(1-\alpha)} - \frac{2\alpha^2-2\alpha+1}{2\alpha^2(1-\alpha)^2} + O\left(\frac{1}{n}\right). \qquad (3.26)
$$

Remark 2.
When $x = \alpha n$,
$$
\int_0^1 p\,f^{(n)}_\alpha\,\mathrm{d}p = \alpha + b_n(\alpha),
$$
where $b_n(\alpha)$ is a bias, $b_n(\alpha) \simeq (1-2\alpha)/n$. Note that $\frac{\partial}{\partial\alpha} b_n(\alpha) \simeq -2/n \to 0$ as $n \to \infty$, so our estimate is asymptotically unbiased. Also note that the first term in Theorem 6 has the same form as in the classical problem of estimating $p$ in a series of binary trials, $n/(p(1-p))$.

(b) The weighted Fisher information in the case $x = \alpha n$ ($0 < \alpha < 1$) takes the form
$$
I^{\varphi}(f) = \mathbb{E}\left[\varphi^{(n)}(p)\left(\frac{\partial}{\partial\alpha}\log f(p;\alpha)\right)^{2}\right] = \int_0^1 \varphi^{(n)}(p)\left(\frac{\partial}{\partial\alpha}\log f(p;\alpha)\right)^{2} f(p;\alpha)\,\mathrm{d}p,
$$
where $\varphi^{(n)}(p)$ is given in (1.12). The square $\left(\frac{\partial}{\partial\alpha}\log f(p;\alpha)\right)^2$ under the integral is found exactly as before. Let $z = \gamma\sqrt{n}$ and
$$
W = \frac{\Gamma(n-x+1+\sqrt{n}-z)\,\Gamma(x+1+z)}{\Gamma(n+2+\sqrt{n})}.
$$
To compute the weighted Fisher information we need the analogues of the integrals above, e.g.
$$
\int_0^1 (\log p)^2\, p^{x+z}(1-p)^{n-x+\sqrt{n}-z}\,\mathrm{d}p = W\Big[\big(\psi(n+2+\sqrt{n})-\psi(x+z+1)\big)^2 - \psi^{(1)}(n+2+\sqrt{n}) + \psi^{(1)}(x+z+1)\Big],
$$
and similarly for $(\log(1-p))^2$, $\log p\,\log(1-p)$, $\log p$ and $\log(1-p)$, with $\psi(x+1)$, $\psi(n-x+1)$ and $\psi(n+2)$ replaced by $\psi(x+z+1)$, $\psi(n-x+1+\sqrt{n}-z)$ and $\psi(n+2+\sqrt{n})$. Taking all parts together,
$$
I^{\varphi}(f^{(n)}_\alpha) = n^2\big(\psi^{(1)}(x+z+1) + \psi^{(1)}(n-x+1+\sqrt{n}-z)\big) + n^2\Big[\big(\psi(x+z+1)-\psi(x+1)\big)^2 + \big(\psi(n-x+1+\sqrt{n}-z)-\psi(n-x+1)\big)^2\Big] + 2n^2\Big[\big(\psi(n-x+1)-\psi(n-x+\sqrt{n}-z+1)\big)\big(\psi(x+z+1)-\psi(x+1)\big)\Big].
$$
Using the asymptotics of the digamma function we can rewrite this as
$$
I^{\varphi}(f^{(n)}_\alpha) = A(\alpha,\gamma)\,n + B(\alpha,\gamma)\sqrt{n} + C(\alpha,\gamma) + O\left(\frac{1}{\sqrt{n}}\right), \qquad (3.27)
$$
where
$$
A(\alpha,\gamma) = \frac{1}{\alpha(1-\alpha)} + \frac{(\alpha-\gamma)^2}{(1-\alpha)^2\alpha^2}, \qquad (3.28)
$$
$$
B(\alpha,\gamma) = \frac{\gamma(1-\gamma)^2}{\alpha(1-\alpha)^2} + \frac{\gamma^2(1-\gamma)}{\alpha^2(1-\alpha)} - \frac{\gamma}{\alpha^2} - \frac{1-\gamma}{(1-\alpha)^2} - \frac{\gamma^3}{\alpha^3} - \frac{(1-\gamma)^3}{(1-\alpha)^3}, \qquad (3.29)
$$
$$
C(\alpha,\gamma) = -\frac12\left(\frac{1}{\alpha^2}+\frac{1}{(1-\alpha)^2}\right) + \frac{11}{12}\left(\frac{\gamma^4}{\alpha^4}+\frac{(1-\gamma)^4}{(1-\alpha)^4}\right) - \frac{2}{3}\left(\frac{\gamma^3(1-\gamma)}{\alpha^3(1-\alpha)}+\frac{\gamma(1-\gamma)^3}{\alpha(1-\alpha)^3}\right) - \frac{\gamma^2(1-\gamma)^2}{2\alpha^2(1-\alpha)^2} + \frac{\gamma(1-\gamma)}{\alpha^2(1-\alpha)^2}. \qquad (3.30)
$$
The rôle of a weight function of the form (1.12) thus shows itself in the appearance of a term of order $\sqrt{n}$, while the main order, $n$, remains the same; however, the coefficient in front of $n$ is larger by $(\alpha-\gamma)^2/\big((1-\alpha)^2\alpha^2\big)$. Evidently, when the frequency of special interest equals the true frequency, the leading term is the same as for the Fisher information with constant weight. Note also that the correction depends on the distance between $\gamma$ and $\alpha$, and when $\gamma \to \alpha$ only the first summand of $A(\alpha,\gamma)$ remains.

Weighted inequalities
Recall that $\{Z_\theta\} \in \mathbb{R}^d$ is a family of RVs with PDF $f_\theta$, where $\theta \in \Theta \subset \mathbb{R}^m$ is the vector of parameters of the PDF $f_\theta$. Let $\varphi(\mathbf{z},\theta,\gamma)$ be a continuous positive weight function of the form (3.32), let $I^{\varphi}(\theta)$ be the weighted Fisher information $(m\times m)$ matrix given in (1.16), and let $g(\theta)$ be the weighted expectation given in (1.13). Let $V^{\varphi}_\theta(Z)$ be the weighted covariance matrix of the RV $Z_\theta$:
$$
V^{\varphi}_\theta(Z) = \mathbb{E}^{\varphi}_\theta\big[(Z - e(\theta))(Z - e(\theta))^{T}\big]. \qquad (3.31)
$$
We also assume that in (1.13) and (3.33) differentiation with respect to the parameters, up to the order considered, may be carried out under the integral sign, so that the equality (4.1) (and its analogues) holds. A sufficient condition for this is that the integrand obtained after differentiation, $\eta(\theta)$, is bounded by an integrable function $\chi$ which does not depend on $\theta$, $|\eta(\theta)| \le \chi$, i.e. the integral converges uniformly in $\theta$.

In the following sections we consider the special class of weight functions which can be represented in the form
$$
\varphi(\mathbf{z},\theta,\gamma) = \frac{1}{\kappa(\theta,\gamma)}\,\tilde{\varphi}(\mathbf{z},\gamma). \qquad (3.32)
$$
Here $\kappa(\theta,\gamma) \in C^k$, where $C^k$ is the family of functions with continuous derivatives up to order $k$ ($k$ will be specified below), and $\kappa(\theta,\gamma)$ is found from the normalizing condition
$$
\int_{\mathbb{R}^d} \varphi(\mathbf{z},\theta,\gamma)\,f(\mathbf{z})\,\mathrm{d}\mathbf{z} = 1 \qquad (3.33)
$$
as before. Note that the condition (3.33) can be rewritten in the form
$$
\int_{\mathbb{R}^d} \tilde{\varphi}(\mathbf{z},\gamma)\,f(\theta,\mathbf{z})\,\mathrm{d}\mathbf{z} = \kappa(\theta,\gamma), \qquad (3.34)
$$
where $\tilde{\varphi}(\mathbf{z},\gamma)$ is a function that has a sharp peak at the point $\gamma$ and does not depend on $\theta$. In the Bayesian framework we consider the RV $Z^{(n)}_\alpha$ with the PDF $f^{(n)} = f^{(n)}_\alpha$ given in (1.2), assuming that $x = x(n) = \lfloor \alpha n \rfloor$, as in the Bayesian problem stated above.
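The normalizing condition (3.34) lends itself to a direct numerical sanity check for the posterior (1.2). The sketch below is not part of the original text; the values $n = 100$, $\alpha = 0.4$, $\gamma = 0.5$ and the weight of the type $p^{\gamma\sqrt{n}}(1-p)^{(1-\gamma)\sqrt{n}}$ are illustrative assumptions. It compares $\kappa$ obtained by numerical integration of (3.34) with its closed form as a ratio of Beta functions.

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaln
from scipy.stats import beta

# Illustrative check of (3.34): for phi_tilde(p) = p^(gamma*sqrt(n)) (1-p)^((1-gamma)*sqrt(n))
# and the posterior f_alpha^(n) = Beta(x+1, n-x+1) with x = alpha*n,
# kappa is a ratio of Beta functions.
n, alpha, gamma_ = 100, 0.4, 0.5   # assumed toy values
x, s = int(alpha * n), np.sqrt(n)
f = beta(x + 1, n - x + 1).pdf     # posterior PDF (1.2)

# kappa via numerical integration of (3.34)
kappa_num, _ = integrate.quad(
    lambda p: p**(gamma_ * s) * (1 - p)**((1 - gamma_) * s) * f(p), 0, 1)

# kappa in closed form: B(x + gamma*s + 1, n - x + (1-gamma)*s + 1) / B(x+1, n-x+1),
# computed via log-Gamma for numerical stability
log_kappa = (gammaln(x + gamma_ * s + 1) + gammaln(n - x + 1 + (1 - gamma_) * s)
             - gammaln(n + s + 2)) \
          - (gammaln(x + 1) + gammaln(n - x + 1) - gammaln(n + 2))
kappa_exact = np.exp(log_kappa)
print(kappa_num, kappa_exact)
```

The two values agree to quadrature precision, which confirms that the Gamma-function expressions for $1/\kappa$ used below are consistent with (3.34).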
The explicit asymptotic expansions of the lower bound are obtained for the following weight functions:
$$
\varphi^{(n)}_1(p) = \frac{1}{\kappa_1(\alpha,\gamma)}\,p^{\gamma}(1-p)^{1-\gamma}, \qquad (3.35)
$$
$$
\varphi^{(n)}_2(p) = \frac{1}{\kappa_2(\alpha,\gamma)}\,p^{\gamma\sqrt{n}}(1-p)^{(1-\gamma)\sqrt{n}}, \qquad (3.36)
$$
$$
\varphi^{(n)}_3(p) = \frac{1}{\kappa_3(\alpha,\gamma)}\,p^{\gamma n}(1-p)^{(1-\gamma)n}, \qquad (3.37)
$$
where $\kappa_i(\alpha,\gamma)$, $i = 1,2,3$, are the corresponding normalizing constants. Denote $f^{(j)} = \partial^j f/\partial\theta^j$. Recall that $\psi^{(0)}(x) = \psi(x)$ and $\psi^{(1)}(x)$ denote the digamma function and its first derivative, respectively,
$$
\psi^{(n)}(x) = \frac{\mathrm{d}^{\,n+1}}{\mathrm{d}x^{n+1}} \log\Gamma(x), \qquad (3.38)
$$
where $\Gamma(x)$ is the Gamma function. In further calculations the asymptotics of these functions as $x \to \infty$ will be used [9]:
$$
\psi(x) = \log x - \frac{1}{2x} + O\left(\frac{1}{x^2}\right) \quad \text{as } x \to \infty, \qquad (3.39)
$$
$$
\psi^{(1)}(x) = \frac{1}{x} + \frac{1}{2x^2} + O\left(\frac{1}{x^3}\right) \quad \text{as } x \to \infty. \qquad (3.40)
$$

Theorem 7. (Weighted Rao-Cramér inequality).
Assume that
$$
\frac{\partial g(\theta)}{\partial\theta} = \int_{\mathbb{R}^d} \mathbf{z}\,\frac{\partial}{\partial\theta}\big[f_\theta(\mathbf{z})\,\varphi(\mathbf{z},\theta,\gamma)\big]\,\mathrm{d}\mathbf{z}. \qquad (4.1)
$$
Note that (4.1) holds if the integral on its RHS converges uniformly in $\theta$. Then the following inequality for the weighted covariance matrix $V^{\varphi}_\theta(Z)$ holds:
$$
V^{\varphi}_\theta(Z) \ge \left(\frac{\partial g(\theta)}{\partial\theta} - \frac{\kappa'(\theta,\gamma)}{\kappa(\theta,\gamma)}\big(e(\theta)-g(\theta)\big)\right) I^{\varphi}(\theta)^{-1} \left(\frac{\partial g(\theta)}{\partial\theta} - \frac{\kappa'(\theta,\gamma)}{\kappa(\theta,\gamma)}\big(e(\theta)-g(\theta)\big)\right)^{T}. \qquad (4.2)
$$

Proof.
Consider the integral
$$
g(\theta) \equiv \int_{\mathbb{R}^d} \mathbf{z}\,\varphi(\mathbf{z},\theta,\gamma)\,f_\theta(\mathbf{z})\,\mathrm{d}\mathbf{z}. \qquad (4.3)
$$
Differentiating both sides of (4.3) and of (3.34) with respect to $\theta$, and multiplying the latter by $e(\theta)$ defined in (1.14), we obtain
$$
\int \mathbf{z}\,\varphi(\mathbf{z},\theta,\gamma)\,\frac{\partial f_\theta}{\partial\theta}\,\mathrm{d}\mathbf{z} - \frac{\kappa'(\theta,\gamma)}{\kappa(\theta,\gamma)}\,g(\theta) = \frac{\partial g(\theta)}{\partial\theta}, \qquad (4.4)
$$
$$
e(\theta) \int \varphi(\mathbf{z},\theta,\gamma)\,\frac{\partial f_\theta}{\partial\theta}\,\mathrm{d}\mathbf{z} = \frac{\kappa'(\theta,\gamma)}{\kappa(\theta,\gamma)}\,e(\theta). \qquad (4.5)
$$
Subtracting (4.5) from (4.4) gives
$$
\int \big(\mathbf{z} - e(\theta)\big)\,\varphi(\mathbf{z},\theta,\gamma)\,\frac{\partial f_\theta}{\partial\theta}\,\mathrm{d}\mathbf{z} = \frac{\partial g(\theta)}{\partial\theta} - \frac{\kappa'(\theta,\gamma)}{\kappa(\theta,\gamma)}\big(e(\theta)-g(\theta)\big).
$$
Multiplying and dividing the integrand by $\sqrt{f_\theta}$, multiplying by the conjugate vector and applying the Cauchy-Schwarz inequality, we get
$$
V^{\varphi}_\theta(Z) \ge \left(\frac{\partial g(\theta)}{\partial\theta} - \frac{\kappa'(\theta,\gamma)}{\kappa(\theta,\gamma)}\big(e(\theta)-g(\theta)\big)\right) I^{\varphi}(\theta)^{-1} \left(\frac{\partial g(\theta)}{\partial\theta} - \frac{\kappa'(\theta,\gamma)}{\kappa(\theta,\gamma)}\big(e(\theta)-g(\theta)\big)\right)^{T}, \qquad (4.6)
$$
where $I^{\varphi}(\theta)$ is the $(m\times m)$ weighted Fisher information matrix defined in (1.16).

Theorem 8.
Let $Z^{(n)}_\alpha$ be a RV with the PDF $f^{(n)}_\alpha$ given in (1.2), assuming that $x = \lfloor \alpha n \rfloor$ with $0 < \alpha < 1$. Then:

(a) when the weight function $\varphi(p) = \varphi_1(p)$ is given in (3.35),
$$
V^{\varphi}(Z_\alpha) \ge \frac{\alpha(1-\alpha)}{n} + O\left(\frac{1}{n^2}\right); \qquad (4.7)
$$
(b) when the weight function $\varphi(p) = \varphi_2(p)$ is given in (3.36),
$$
V^{\varphi}(Z_\alpha) \ge \frac{\alpha(1-\alpha) + (\alpha-\gamma)^2}{n} + O\left(\frac{1}{n^{3/2}}\right); \qquad (4.8)
$$
(c) when the weight function $\varphi(p) = \varphi_3(p)$ is given in (3.37),
$$
V^{\varphi}(Z_\alpha) \ge \frac{(\alpha-\gamma)^2\,C(\alpha,\gamma)}{n} + O\left(\frac{1}{n^{3/2}}\right), \qquad (4.9)
$$
where $C(\alpha,\gamma)$ is a constant which depends only on $\alpha$ and $\gamma$ and is given explicitly in (4.30).

Proof. (a) Consider the weight function
$$
\varphi^{(n)}_1(p) = \frac{1}{\kappa_1(\alpha,\gamma)}\,p^{\gamma}(1-p)^{1-\gamma}, \qquad (4.10)
$$
where $\kappa_1(\alpha,\gamma)$ is found from the normalizing condition (3.33). Thus
$$
\frac{1}{\kappa_1(\alpha,\gamma)} = \frac{\Gamma(x+1)\,\Gamma(n-x+1)\,\Gamma(n+3)}{\Gamma(x+\gamma+1)\,\Gamma(n-x+2-\gamma)\,\Gamma(n+2)}.
$$
Note that the normalizing constant depends on $n$. For the weight function (4.10) the Fisher information equals
$$
I^{\varphi}(f^{(n)}_\alpha) = n^2\big(\psi^{(1)}(x+\gamma+1) + \psi^{(1)}(n-x+2-\gamma)\big) + n^2\Big[\big(\psi(x+\gamma+1)-\psi(x+1)\big)^2 + \big(\psi(n-x+2-\gamma)-\psi(n-x+1)\big)^2\Big] + 2n^2\Big[\big(\psi(n-x+1)-\psi(n-x+2-\gamma)\big)\big(\psi(x+\gamma+1)-\psi(x+1)\big)\Big]. \qquad (4.11)
$$
For the weight function (4.10) the integral in (4.3) can be found explicitly:
$$
\int_0^1 p\,\varphi^{(n)}_1 f^{(n)}_\alpha\,\mathrm{d}p = \frac{\Gamma(n+3)}{\Gamma(x+\gamma+1)\,\Gamma(n-x-\gamma+2)} \int_0^1 p^{x+\gamma+1}(1-p)^{n-x+1-\gamma}\,\mathrm{d}p = \frac{\Gamma(n+3)\,\Gamma(x+\gamma+2)}{\Gamma(n+4)\,\Gamma(x+\gamma+1)} = g(\alpha). \qquad (4.12)
$$
Then
$$
\frac{\partial g(\alpha)}{\partial\alpha} = n\,\frac{\Gamma(n+3)\,\Gamma(x+\gamma+2)}{\Gamma(n+4)\,\Gamma(x+\gamma+1)}\big(\psi(x+\gamma+2) - \psi(x+\gamma+1)\big). \qquad (4.13)
$$
Differentiating $\kappa_1(\alpha,\gamma)$ we obtain
$$
\frac{\kappa_1'(\alpha,\gamma)}{\kappa_1(\alpha,\gamma)} = n\big(\psi(n-x+1) - \psi(n-x+2-\gamma) + \psi(x+\gamma+1) - \psi(x+1)\big). \qquad (4.14)
$$
Also
$$
e(\alpha) = \frac{\Gamma(n+2)}{\Gamma(x+1)\,\Gamma(n-x+1)} \int_0^1 p^{x+1}(1-p)^{n-x}\,\mathrm{d}p = \frac{\Gamma(n+2)\,\Gamma(x+2)}{\Gamma(n+3)\,\Gamma(x+1)}.
$$
(4.15)
Plugging (4.11), (4.12), (4.13), (4.14) and (4.15) into (4.2), we get
$$
V^{\varphi}(Z^{(n)}_\alpha) \ge \frac{\alpha(1-\alpha)}{n} + O\left(\frac{1}{n^2}\right). \qquad (4.16)
$$

(b) Consider the weight function
$$
\varphi^{(n)}_2(p) = \frac{1}{\kappa_2(\alpha,\gamma)}\,p^{\gamma\sqrt{n}}(1-p)^{(1-\gamma)\sqrt{n}}, \qquad (4.17)
$$
where $\kappa_2(\alpha,\gamma)$ is found from the normalizing condition (3.33):
$$
\frac{1}{\kappa_2(\alpha,\gamma)} = \frac{\Gamma(x+1)\,\Gamma(n-x+1)\,\Gamma(n+2+\sqrt{n})}{\Gamma(x+\gamma\sqrt{n}+1)\,\Gamma(n-x+1+\sqrt{n}-\gamma\sqrt{n})\,\Gamma(n+2)}.
$$
Note that here the normalizing constant depends on $n$, as does the remainder. For this weight function the Fisher information equals
$$
I^{\varphi}(f^{(n)}_\alpha) = n^2\big(\psi^{(1)}(x+z+1) + \psi^{(1)}(n-x+1+\sqrt{n}-z)\big) + n^2\Big[\big(\psi(x+z+1)-\psi(x+1)\big)^2 + \big(\psi(n-x+1+\sqrt{n}-z)-\psi(n-x+1)\big)^2\Big] + 2n^2\Big[\big(\psi(n-x+1)-\psi(n-x+\sqrt{n}-z+1)\big)\big(\psi(x+z+1)-\psi(x+1)\big)\Big], \qquad (4.18)
$$
where $z = \gamma\sqrt{n}$. For the weight function (4.17) the integral in (4.3) equals
$$
\int_0^1 p\,\varphi^{(n)}_2 f^{(n)}_\alpha\,\mathrm{d}p = \frac{\Gamma(n+\sqrt{n}+2)\,\Gamma(x+\gamma\sqrt{n}+2)}{\Gamma(n+\sqrt{n}+3)\,\Gamma(x+\gamma\sqrt{n}+1)} = g(\alpha). \qquad (4.19)
$$
Then
$$
\frac{\partial g(\alpha)}{\partial\alpha} = n\,\frac{\Gamma(n+\sqrt{n}+2)\,\Gamma(x+\gamma\sqrt{n}+2)}{\Gamma(n+\sqrt{n}+3)\,\Gamma(x+\gamma\sqrt{n}+1)}\big(\psi(x+\gamma\sqrt{n}+2) - \psi(x+\gamma\sqrt{n}+1)\big). \qquad (4.20)
$$
Differentiating $\kappa_2(\alpha,\gamma)$ we obtain
$$
\frac{\kappa_2'(\alpha,\gamma)}{\kappa_2(\alpha,\gamma)} = n\big(\psi(n-x+1) - \psi(n-x+\sqrt{n}-\gamma\sqrt{n}+1) + \psi(x+\gamma\sqrt{n}+1) - \psi(x+1)\big). \qquad (4.21)
$$
Plugging (4.18), (4.19), (4.20), (4.21) and (4.15) into (4.2), we get
$$
V^{\varphi}(Z^{(n)}_\alpha) \ge \frac{\alpha(1-\alpha) + (\alpha-\gamma)^2}{n} + O\left(\frac{1}{n^{3/2}}\right). \qquad (4.22)
$$

(c) Consider the weight function
$$
\varphi^{(n)}_3(p) = \frac{1}{\kappa_3(\alpha,\gamma)}\,p^{\gamma n}(1-p)^{(1-\gamma)n}, \qquad (4.23)
$$
where $\kappa_3(\alpha,\gamma)$ is found from the normalizing condition (3.33):
$$
\frac{1}{\kappa_3(\alpha,\gamma)} = \frac{\Gamma(x+1)\,\Gamma(n-x+1)\,\Gamma(2n+2)}{\Gamma(x+\gamma n+1)\,\Gamma(2n-x+1-\gamma n)\,\Gamma(n+2)}.
$$
Again the normalizing constant depends on $n$, as does the remainder.
Let $y = \gamma n$; then the Fisher information in this case equals
$$
I^{\varphi}(f^{(n)}_\alpha) = n^2\big(\psi^{(1)}(x+y+1) + \psi^{(1)}(2n-x+1-y)\big) + n^2\Big[\big(\psi(x+y+1)-\psi(x+1)\big)^2 + \big(\psi(2n-x+1-y)-\psi(n-x+1)\big)^2\Big] + 2n^2\Big[\big(\psi(n-x+1)-\psi(2n-x-y+1)\big)\big(\psi(x+y+1)-\psi(x+1)\big)\Big]. \qquad (4.24)
$$
Note that, unlike the two cases above, the differences in brackets do not tend to zero; for instance,
$$
\psi(x+y+1) - \psi(x+1) = \log\frac{\alpha+\gamma}{\alpha} - \frac{\gamma}{2\alpha(\alpha+\gamma)n} + O\left(\frac{1}{n^2}\right).
$$
Using (3.39) and (3.40) we obtain
$$
I^{\varphi}(f^{(n)}_\alpha) = \left(\log\frac{(1-\alpha)(\alpha+\gamma)}{\alpha(2-\alpha-\gamma)}\right)^{2} n^2 + C_1(\alpha,\gamma)\,n + C_2(\alpha,\gamma) + O\left(\frac{1}{n}\right), \qquad (4.25)
$$
where $C_1(\alpha,\gamma)$ and $C_2(\alpha,\gamma)$ are constants that depend on $\alpha$ and $\gamma$ and can be found explicitly:
$$
C_1 = \frac{1}{\alpha+\gamma} + \frac{1}{2-\alpha-\gamma} - \log\frac{(1-\alpha)(\alpha+\gamma)}{\alpha(2-\alpha-\gamma)}\left(\frac{\gamma}{\alpha(\alpha+\gamma)} - \frac{1-\gamma}{(1-\alpha)(2-\alpha-\gamma)}\right), \qquad (4.26)
$$
while $C_2$ admits a similar but considerably lengthier explicit form (4.27), which we omit. Also note that
$$
g(\alpha) = \int_0^1 p\,\varphi^{(n)}_3 f^{(n)}_\alpha\,\mathrm{d}p = \frac{\Gamma(2n+2)\,\Gamma(x+y+2)}{\Gamma(2n+3)\,\Gamma(x+y+1)} = \frac{\alpha+\gamma}{2} + \frac{1-\alpha-\gamma}{2n} + O\left(\frac{1}{n^2}\right). \qquad (4.28)
$$
It is easy to see that in this case $g(\alpha)$ has different asymptotics compared with the two cases above, so $g(\alpha) - \mathbb{E}(Z_\alpha)$ does not tend to zero as before. Proceeding with the same computations as before, we obtain
$$
V^{\varphi}(Z^{(n)}_\alpha) \ge \frac{(\alpha-\gamma)^2\,C(\alpha,\gamma)}{n} + O\left(\frac{1}{n^{3/2}}\right), \qquad (4.29)
$$
where $C(\alpha,\gamma)$ is a constant depending only on $\alpha$ and $\gamma$ that can be found explicitly; its closed form (4.30) is a lengthy combination of rational functions of $\alpha$, $\gamma$ and the logarithm $\log\big[(1-\alpha)(\alpha+\gamma)/(\alpha(2-\alpha-\gamma))\big]$.

Theorem 9. (Weighted Bhattacharyya inequality, uniparametric case).
(a)
Let $\theta$ be a scalar parameter and let $\tau(\theta)$ be a preassigned scalar function of the parameter $\theta$. An unbiased estimator of $\tau(\theta)$ is a scalar function $T(Z)$ such that
$$
e(\theta) = \mathbb{E}_\theta[T(Z)] = \tau(\theta). \qquad (5.1)
$$
Consider a weight function that satisfies the condition (3.33), and recall that
$$
g(\theta) \equiv \int_{\mathbb{R}^d} T(\mathbf{z})\,\varphi(\mathbf{z},\theta,\gamma)\,f_\theta(\mathbf{z})\,\mathrm{d}\mathbf{z}. \qquad (5.2)
$$
Assume that the integrands in (5.2) and (3.33) converge uniformly in $\theta$ after the operation of differentiation up to order $\nu$. Then the following inequality for the weighted variance of $T$ holds:
$$
V^{\varphi}_\theta(T) \ge \sum_{i,j=1}^{\nu} \big(g^{(i)}(\theta) - Q^{2}_i + \tau Q^{1}_i\big)\big(g^{(j)}(\theta) - Q^{2}_j + \tau Q^{1}_j\big)\,J^{\varphi}_{ij}, \qquad (5.3)
$$
where $Q^{2}_j$ and $Q^{1}_j$ are given in (5.13) and (5.15) respectively, and $J^{\varphi}_{ij}$ are the elements of the matrix $J^{\varphi}$ defined in (5.11).

(b) Consider the RV $Z^{(n)}_\alpha$ with the PDF $f^{(n)}_\alpha$ given in (1.2), with $x = \lfloor\alpha n\rfloor$, $0 < \alpha < 1$. When $\nu = 2$, $\theta = \alpha$ and $T(Z) = Z^{(n)}_\alpha$, for the weight function
$$
\varphi^{(n)}(p) = \frac{1}{\kappa(\alpha,\gamma)}\,p^{\gamma\sqrt{n}}(1-p)^{(1-\gamma)\sqrt{n}}, \qquad (5.4)
$$
the inequality (5.3) takes the form
$$
V^{\varphi}_\theta(Z^{(n)}_\alpha) \ge \frac{C_1}{n} + \frac{C_2}{n^{3/2}} + O\left(\frac{1}{n^2}\right), \qquad (5.5)
$$
where $C_1$, $C_2$ are constants depending on $\alpha$ and $\gamma$ that are given explicitly in (5.21).

Proof. (a) Consider the function
$$
R_\nu(Z;\theta) = T(Z) - \tau(\theta) - \sum_{i=1}^{\nu} \lambda_i\, f^{(i)}_\theta f^{-1}_\theta, \qquad (5.6)
$$
where the $\lambda_i$ are as yet undetermined parameters. It is easy to see that
$$
\mathbb{E}[R_\nu(Z;\theta)] = 0. \qquad (5.7)
$$
Consider the weighted variance of $R_\nu$, with the weight of the form (3.32). Because of (5.7) it can be written as
$$
V^{\varphi}_\theta(R_\nu) = \int_{\mathbb{R}^d}\left(T(\mathbf{z}) - \tau(\theta) - \sum_{i=1}^{\nu}\lambda_i f^{(i)}_\theta f^{-1}_\theta\right)^{2}\varphi(\mathbf{z},\theta,\gamma)\,f_\theta\,\mathrm{d}\mathbf{z}. \qquad (5.8)
$$
By the conditions of the Theorem the differentiation is justified, and minimization over the $\lambda_i$ leads to the conditions
$$
\int_{\mathbb{R}^d}\left(T(\mathbf{z}) - \tau(\theta) - \sum_{i=1}^{\nu}\lambda^{\star}_i f^{(i)}_\theta f^{-1}_\theta\right)\varphi\,f^{(j)}_\theta\,\mathrm{d}\mathbf{z} = 0, \qquad j = 1,\dots,\nu. \qquad (5.9)
$$
This can be rewritten as
$$
\sum_{i=1}^{\nu}\lambda^{\star}_i \int_{\mathbb{R}^d} f^{(i)}_\theta f^{-1}_\theta f^{(j)}_\theta\,\varphi\,\mathrm{d}\mathbf{z} = \int_{\mathbb{R}^d} T(\mathbf{z})\,\varphi\,f^{(j)}_\theta\,\mathrm{d}\mathbf{z} - \tau(\theta)\int_{\mathbb{R}^d}\varphi\,f^{(j)}_\theta\,\mathrm{d}\mathbf{z}. \qquad (5.10)
$$
Let $I^{\varphi}_\theta$ be the $\nu\times\nu$ matrix with elements
$$
I^{\varphi}_{ij} = \int_{\mathbb{R}^d} f^{(i)}_\theta f^{-1}_\theta f^{(j)}_\theta\,\varphi\,\mathrm{d}\mathbf{z}, \qquad i,j \le \nu,
$$
and let
$$
J^{\varphi}_\theta = \big(I^{\varphi}_\theta\big)^{-1} \qquad (5.11)
$$
be the inverse $\nu\times\nu$ matrix, with elements $J^{\varphi}_{ij}$. Note that in the case $i = j = 1$, $I^{\varphi}_{11}$ equals the weighted Fisher information given in (1.16).

Consider the integrals on the RHS of (5.10) separately. Firstly,
$$
\int_{\mathbb{R}^d} T(\mathbf{z})\,\tilde{\varphi}\left(\frac{f_\theta}{\kappa(\theta,\gamma)}\right)^{(j)}\mathrm{d}\mathbf{z} = g^{(j)}(\theta),
$$
that is,
$$
\int_{\mathbb{R}^d} T(\mathbf{z})\,\tilde{\varphi}\left[\sum_{k=0}^{j-1}\binom{j}{k}\left(\frac{1}{\kappa(\theta,\gamma)}\right)^{(j-k)} f^{(k)}_\theta\right]\mathrm{d}\mathbf{z} + \int_{\mathbb{R}^d} T(\mathbf{z})\,\varphi\,f^{(j)}_\theta\,\mathrm{d}\mathbf{z} = g^{(j)}(\theta).
$$
Thus
$$
\int_{\mathbb{R}^d} T(\mathbf{z})\,\varphi\,f^{(j)}_\theta\,\mathrm{d}\mathbf{z} = g^{(j)}(\theta) - Q^{2}_j, \qquad (5.12)
$$
where
$$
Q^{2}_j = \int_{\mathbb{R}^d} T(\mathbf{z})\,\tilde{\varphi}\left[\sum_{k=0}^{j-1}\binom{j}{k}\left(\frac{1}{\kappa(\theta,\gamma)}\right)^{(j-k)} f^{(k)}_\theta\right]\mathrm{d}\mathbf{z}. \qquad (5.13)
$$
In an analogous way, from the condition (3.33) the following equality can be derived:
$$
\int_{\mathbb{R}^d}\varphi\,f^{(j)}_\theta\,\mathrm{d}\mathbf{z} = -Q^{1}_j, \qquad (5.14)
$$
where
$$
Q^{1}_j = \int_{\mathbb{R}^d}\tilde{\varphi}\left[\sum_{k=0}^{j-1}\binom{j}{k}\left(\frac{1}{\kappa(\theta,\gamma)}\right)^{(j-k)} f^{(k)}_\theta\right]\mathrm{d}\mathbf{z}. \qquad (5.15)
$$
So (5.10) takes the form
$$
g^{(j)}(\theta) = \sum_{i=1}^{\nu}\lambda^{\star}_i I^{\varphi}_{ij} + Q^{2}_j - \tau Q^{1}_j, \qquad (5.16)
$$
and
$$
\lambda^{\star}_i = \sum_{j=1}^{\nu}\big(g^{(j)}(\theta) - Q^{2}_j + \tau Q^{1}_j\big) J^{\varphi}_{ij}. \qquad (5.17)
$$
Thus we obtain the equality
$$
V^{\varphi}_\theta(R^{*}_\nu) = V^{\varphi}_\theta(T) - \sum_{i,j=1}^{\nu}\big(g^{(i)}(\theta) - Q^{2}_i + \tau Q^{1}_i\big)\big(g^{(j)}(\theta) - Q^{2}_j + \tau Q^{1}_j\big) J^{\varphi}_{ij}. \qquad (5.18)
$$
The non-negativity of the variance implies the lower bound for the weighted variance of $T$ given in (5.3).

Remark 3.
Note that this inequality includes the weighted version of the Rao-Cramér inequality: it appears when $\tau(\theta) = e(\theta)$, $\theta = \alpha$, $g(\theta) = g(\alpha)$, $T(Z) = Z$ and $i = j = \nu = 1$. In this particular case
$$
I^{\varphi} = I^{\varphi}(\theta) = \int_{\mathbb{R}^d}(f'_\theta)^2 f^{-1}_\theta\,\varphi\,\mathrm{d}\mathbf{z}, \qquad \int_{\mathbb{R}^d}\varphi\,f^{(1)}_\theta\,\mathrm{d}\mathbf{z} = \frac{\kappa'(\theta,\gamma)}{\kappa(\theta,\gamma)},
$$
and
$$
\int_{\mathbb{R}^d} T(\mathbf{z})\,\varphi\,f^{(1)}_\theta\,\mathrm{d}\mathbf{z} = g'(\theta) + \frac{\kappa'(\theta,\gamma)}{\kappa(\theta,\gamma)}\,g(\theta).
$$
Thus we recover the inequality given in (4.2).

(b)
The lower bound in (5.3) takes the form
$$
\big(g^{(1)}(\theta) - Q^{2}_1 + \tau Q^{1}_1\big)^2 J^{\varphi}_{11} + \big(g^{(1)}(\theta) - Q^{2}_1 + \tau Q^{1}_1\big)\big(g^{(2)}(\theta) - Q^{2}_2 + \tau Q^{1}_2\big)\big(J^{\varphi}_{12} + J^{\varphi}_{21}\big) + \big(g^{(2)}(\theta) - Q^{2}_2 + \tau Q^{1}_2\big)^2 J^{\varphi}_{22}, \qquad (5.19)
$$
where $J^{\varphi}_{ij}$ are the $ij$-th elements of the matrix $J^{\varphi}_\theta$ defined in (5.11). The asymptotics of $I^{\varphi}_{11}$ was given above; we now compute the asymptotics of the other terms. First,
$$
I^{\varphi}_{12} = \int_{\mathbb{R}} f^{(1)} f^{-1} f^{(2)}\,\varphi\,\mathrm{d}z = L_1 n^{3/2} + L_2 n + L_3\sqrt{n} + L_4 + O\left(\frac{1}{\sqrt{n}}\right),
$$
where the $L_i$, $i = 1,2,3,4$, are constants that depend only on $\alpha$ and $\gamma$, but their explicit forms are very cumbersome. The asymptotics of $I^{\varphi}_{22}$ takes the form
$$
I^{\varphi}_{22} = \int_{\mathbb{R}} (f'')^2 f^{-1}\,\varphi\,\mathrm{d}p = \tilde{L}_1 n^2 + \tilde{L}_2 n^{3/2} + \tilde{L}_3 n + \tilde{L}_4\sqrt{n} + \tilde{L}_5 + O\left(\frac{1}{\sqrt{n}}\right),
$$
where the $\tilde{L}_i$ are again constants that can be found explicitly and depend on $\alpha$ and $\gamma$. In order to compute $I^{\varphi}_{22}$ one needs integrals of the form
$$
\int_0^1 \big(\log(1-p)\big)^i \big(\log p\big)^j\, p^{A_1(\alpha,\gamma)n + A_2(\alpha,\gamma)\sqrt{n}}\,(1-p)^{A_3(\alpha,\gamma)n + A_4(\alpha,\gamma)\sqrt{n}}\,\mathrm{d}p
$$
for $i, j = 0, 1, 2$; they can be found for arbitrary $i$ and $j$ by integration by parts. The only obstacle to deriving the exact coefficients is the computational cost, so we proceed in terms of the constants $L_i$. To use the same notation we also write $I^{\varphi}_{11}$ in the form
$$
I^{\varphi}_{11} = \int_{\mathbb{R}} (f')^2 f^{-1}\,\varphi\,\mathrm{d}p = L'_1 n + L'_2\sqrt{n} + L'_3 + O\left(\frac{1}{\sqrt{n}}\right),
$$
where the coefficients $L'_1, L'_2, L'_3$ were found above. The other terms in (5.19) can be computed explicitly. In the notation of the previous section,
$$
Q^{2}_1 = -\frac{\kappa'(\alpha,\gamma)}{\kappa(\alpha,\gamma)}\,g(\alpha), \qquad Q^{2}_2 = \left(\frac{1}{\kappa(\alpha,\gamma)}\right)^{''}\int_0^1 p\,\tilde{\varphi} f\,\mathrm{d}p - 2\,\frac{\kappa'(\alpha,\gamma)}{\kappa(\alpha,\gamma)}\big(g^{(1)} - Q^{2}_1\big),
$$
$$
Q^{1}_1 = -\frac{\kappa'(\alpha,\gamma)}{\kappa(\alpha,\gamma)}, \qquad Q^{1}_2 = \left(\frac{1}{\kappa(\alpha,\gamma)}\right)^{''}\int_0^1 \tilde{\varphi} f\,\mathrm{d}p + 2\,\frac{\kappa'(\alpha,\gamma)}{\kappa(\alpha,\gamma)}\,Q^{1}_1,
$$
where $g^{(1)}$ is given in (4.20). Thus we obtain the following asymptotic lower bound for the weighted variance in the stated Bayesian problem:
$$
V^{\varphi}(T) \ge \frac{C_1}{n} + \frac{C_2}{n^{3/2}} + O\left(\frac{1}{n^2}\right), \qquad (5.20)
$$
where $C_1$ and $C_2$ are constants that depend on $\alpha$ and $\gamma$ and can be found explicitly in terms of the coefficients $L_i$ above, although their closed forms (5.21) are too cumbersome to reproduce in full.

Remark 4.
Note that in the case $\alpha = \gamma$ the first and second terms in the expression (5.21) for $C_1$ vanish; one can also easily check that $L_2 = 0$ in this case. So, because $L_1 = 1/(\alpha(1-\alpha))$, we have
$$
C_1 = \frac{1}{L_1} = \alpha(1-\alpha).
$$
Thus the main term of the asymptotics is exactly the same as obtained above in the standard Cramér-Rao case.
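Before turning to the multiparametric case, the leading term $\alpha(1-\alpha)/n$ appearing in both Theorem 8(a) and Remark 4 can be checked directly. This sketch is not part of the original text; it uses the fact, noted in the proof of Theorem 8(a), that for the weight (3.35) the reweighted density $\varphi_1 f$ is again a Beta law, so the weighted variance (3.31) around $e(\alpha) = (x+1)/(n+2)$ is computable in exact rational arithmetic. The values $\alpha = 1/4$, $\gamma = 1/2$ are illustrative assumptions.

```python
from fractions import Fraction as F

def weighted_var(n, x, gamma):
    # With phi_1 ~ p^gamma (1-p)^(1-gamma), the reweighted density phi_1 * f is
    # Beta(x+gamma+1, n-x+2-gamma); the weighted variance (3.31) in dimension one
    # is E_w[(Z - e)^2] with e = (x+1)/(n+2), computed here exactly.
    a = F(x) + gamma + 1
    b = F(n - x) + 2 - gamma
    m1 = a / (a + b)                            # weighted first moment
    m2 = a * (a + 1) / ((a + b) * (a + b + 1))  # weighted second moment
    e = F(x + 1, n + 2)                         # unweighted mean e(alpha)
    return m2 - 2 * e * m1 + e * e

alpha, gamma = F(1, 4), F(1, 2)   # assumed toy values
for n in (100, 10000):
    v = weighted_var(n, int(alpha * n), gamma)
    print(n, float(n * v))        # should approach alpha*(1-alpha) = 0.1875
```

As $n$ grows, $n\,V^{\varphi}$ approaches $\alpha(1-\alpha) = 0.1875$, in agreement with the leading term of (4.7).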
Theorem 10. (Weighted Bhattacharyya inequality, multiparametric case).
Let $\theta \in \Theta \subset \mathbb{R}^m$ be a vector of parameters, let $\tau(\theta) = (\tau_1(\theta),\dots,\tau_l(\theta))^T \in \mathbb{R}^l$ be a preassigned vector function of the parameter $\theta$, and let $T(Z)$ be an unbiased estimate of $\tau(\theta)$:
$$
e(\theta) = \mathbb{E}_\theta(T) = \int_{\mathbb{R}^d} T(\mathbf{z})\,f_\theta(\mathbf{z})\,\mathrm{d}\mathbf{z} = \tau(\theta).
$$
Consider a weight function $\varphi(\mathbf{z},\theta,\gamma)$ such that the condition (3.33) holds. Assume that the following positive definite matrix exists:
$$
I^{\varphi} = \mathbb{E}^{\varphi}_\theta[\beta\beta^{T}], \qquad (5.22)
$$
where $\beta = (\beta_1(\theta),\dots,\beta_r(\theta))^T$ is an $r$-dimensional RV whose components are all possible expressions of the form
$$
\frac{1}{f_\theta(Z)}\,\frac{\partial^{\,i_1+\dots+i_m}}{\partial\theta_1^{i_1}\cdots\partial\theta_m^{i_m}}\,f_\theta(Z), \qquad (5.23)
$$
where $1 \le i_1 + \dots + i_m \le s$ and $r$ is the total number of all such expressions.

Let $F^{\varphi}$ be the $(r\times l)$ matrix whose rows have the form
$$
\int_{\mathbb{R}^d}\big(T(\mathbf{z}) - \tau(\theta)\big)^{T}\,\varphi(\mathbf{z},\theta,\gamma)\,\frac{\partial^{\,i_1+\dots+i_m}}{\partial\theta_1^{i_1}\cdots\partial\theta_m^{i_m}}\,f_\theta(\mathbf{z})\,\mathrm{d}\mathbf{z}, \qquad (5.24)
$$
numbered in the same order as the expressions (5.23). Assume that the integrands in (5.24) and (3.33) converge uniformly in $\theta$ after the operation of differentiation. Then the following inequality for the weighted variance of $T$ holds:
$$
V^{\varphi}_\theta(T) \ge (F^{\varphi})^{T}\big(I^{\varphi}(\theta)\big)^{-1} F^{\varphi}. \qquad (5.25)
$$

Remark 5. Here and below, for matrices $A$ and $B$ of the same dimension $(d\times d)$, the inequality $A \ge B$ means that $C = A - B$ is a non-negative definite matrix.

Proof. Note that the elements of the matrix $F^{\varphi}$ can be found from the condition (3.33). Consider the one-dimensional RV
$$
\delta = \big[(T - \tau)^{T} - \beta^{T}\big(I^{\varphi}\big)^{-1} F^{\varphi}\big]\,y,
$$
where $y^{T} = (y_1,\dots,y_l) \in \mathbb{R}^l$ is a non-random vector. It is easy to see that $\mathbb{E}_\theta(\delta) = 0$. Taking the weighted expectation of both sides of the equality
$$
\delta^2 = y^{T}\big[(T-\tau)(T-\tau)^{T} - (T-\tau)\beta^{T}\big(I^{\varphi}\big)^{-1}F^{\varphi} - (F^{\varphi})^{T}\big(I^{\varphi}\big)^{-1}\beta(T-\tau)^{T} + (F^{\varphi})^{T}\big(I^{\varphi}\big)^{-1}\beta\beta^{T}\big(I^{\varphi}\big)^{-1}F^{\varphi}\big]\,y, \qquad (5.26)
$$
we obtain, for any $y$,
$$
\mathbb{E}^{\varphi}_\theta(\delta^2) = y^{T}\big[V^{\varphi}_\theta(T) - (F^{\varphi})^{T}\big(I^{\varphi}\big)^{-1}F^{\varphi}\big]\,y. \qquad (5.27)
$$
The non-negativity of the variance implies the multiparametric version of the Bhattacharyya inequality given in (5.25). One can easily see that in the uniparametric one-dimensional case this inequality is equivalent to the weighted Cramér-Rao inequality.

Theorem 11. (Weighted Kullback inequality)
(a)
For given PDFs $f$ and $g$,
$$
K^{\varphi}(f\,\|\,g) \ge \Psi^{*}_{\tilde{g}}\big(\mu^{\varphi}(\tilde{f})\big) = \sup_{t}\big[\langle t, \mu^{\varphi}(f)\rangle + \log C(g) - \log \bar{M}_g(t)\big], \qquad (6.1)
$$
where
$$
\bar{M}_g(t) = \int_{\mathbb{R}^d}\varphi(\mathbf{z})\,e^{\langle t,\mathbf{z}\rangle}\,g(\mathbf{z})\,\mathrm{d}\mathbf{z} \qquad (6.2)
$$
is the weighted moment generating function, $t \in \mathbb{R}^d$, and
$$
\mu^{\varphi}(f) = \frac{\mathbb{E}_f[Z\varphi(Z)]}{\mathbb{E}_f[\varphi(Z)]} \in \mathbb{R}^d
$$
is the classical expectation of $\tilde{f}$.

(b) Let $Z^{(n)}_\alpha$ and $Z^{(n)}_\rho$ be RVs with the PDF $f^{(n)}_\alpha$ given in (1.2) with $x = \lfloor\alpha n\rfloor$ and with the PDF $f^{(n)}_\rho$ given in (1.2) with $x = \lfloor\rho n\rfloor$, respectively, where $0 < \alpha, \rho < 1$, and let the weight function be
$$
\varphi^{(n)}(p) = \frac{1}{\kappa(\rho,\gamma)}\,p^{\gamma\sqrt{n}}(1-p)^{(1-\gamma)\sqrt{n}}, \qquad (6.3)
$$
where $\kappa(\rho,\gamma)$ is found from the normalization condition
$$
\int_0^1 \varphi^{(n)} f^{(n)}_\rho\,\mathrm{d}p = 1. \qquad (6.4)
$$
Denote $\epsilon = \alpha - \rho$; then
$$
K^{\varphi}\big(f^{(n)}_\alpha\,\|\,f^{(n)}_\rho\big) \ge \frac{\epsilon^2\,(n - \sqrt{n} - 1)^2}{2(1-\alpha)\alpha\, n} + O(1).
$$
As $\epsilon \to 0$,
$$
\exists\,\lim_{\epsilon\to 0}\frac{1}{\epsilon^2}\,K^{\varphi}\big(f^{(n)}_\alpha\,\|\,f^{(n)}_\rho\big) = \frac12\,I(\tilde{f}_\alpha) \ge \frac{n}{2\alpha(1-\alpha)} - \frac{\sqrt{n}}{\alpha(1-\alpha)} + O(1), \qquad (6.5)
$$
where $I(\tilde{f}_\alpha)$ is the standard Fisher information.

Proof. (a) The inequality (6.1) is proved in [14].

(b)
Firstly, note that by (6.4), $\log C(f^{(n)}_\rho) = 0$. The weighted generating function of the RV $Z^{(n)}_\rho$ with PDF $f^{(n)}_\rho$ equals
$$
\bar{M}_{f^{(n)}_\rho}(t) = \int_0^1 \varphi^{(n)} e^{tp} f^{(n)}_\rho\,\mathrm{d}p = {}_1F_1\big(\rho n + \gamma\sqrt{n} + 1,\; n + \sqrt{n} + 2;\; t\big) = 1 + \sum_{k=1}^{\infty}\frac{t^k}{k!}\prod_{j=0}^{k-1}\frac{\rho n + \gamma\sqrt{n} + 1 + j}{n + \sqrt{n} + 2 + j},
$$
where ${}_1F_1(x,y;z)$ is the confluent hypergeometric function. For large $n$ the weighted generating function can be expanded as follows [10, formula 12]:
$$
\bar{M}_{f^{(n)}_\rho}(t) = \sum_{k=0}^{\infty}\frac{t^k}{k!}\left(\rho^k - k\big(\rho^k - \rho^{k-1}\gamma\big)\frac{1}{\sqrt{n}} + O\left(\frac{1}{n}\right)\right) = e^{\rho t}\left(1 - \frac{(\rho-\gamma)t}{\sqrt{n}} + O\left(\frac{1}{n}\right)\right).
$$
Thus we have
$$
\log\bar{M}_{f^{(n)}_\rho}(t) = \rho t - \frac{(\rho-\gamma)t}{\sqrt{n}} + \frac{(1-\rho-\gamma)t}{n} + \frac{\rho(1-\rho)t^2}{2n} + O\left(\frac{1}{n^{3/2}}\right).
$$
The first term in (6.1) for the PDF $f^{(n)}_\alpha$ and the weight function $\varphi^{(n)}$ takes the form
$$
\mu^{\varphi}\big(f^{(n)}_\alpha\big) = \frac{\alpha n + \gamma\sqrt{n} + 1}{n + \sqrt{n} + 2} = \alpha + \frac{\gamma-\alpha}{\sqrt{n}} + \frac{1-\alpha-\gamma}{n} + O\left(\frac{1}{n^{3/2}}\right).
$$
Then
$$
\Psi^{*}_{f_\rho}\big(\mu^{\varphi}(\tilde{f}_\alpha)\big) = \sup_{t}\left[(\alpha-\rho)t - \frac{(\alpha-\rho)t}{\sqrt{n}} + \frac{(\rho-\alpha)t}{n} - \frac{(1-\rho)\rho}{2n}\,t^2 + O\left(\frac{1}{n^{3/2}}\right)\right]. \qquad (6.6)
$$
Finding the supremum of the expression above, we obtain
$$
t^{*} = \frac{(\alpha-\rho)(n - 1 - \sqrt{n})}{(1-\alpha)\alpha} + O\left(\frac{1}{\sqrt{n}}\right),
$$
so that
$$
\Psi^{*}_{f_\rho}\big(\mu^{\varphi}(\tilde{f}_\alpha)\big) = \frac{(\alpha-\rho)^2(n - \sqrt{n} - 1)^2}{2(1-\alpha)\alpha\, n} + O(1). \qquad (6.7)
$$
Denote $\epsilon = \alpha - \rho$. When $\epsilon \to 0$,
$$
\frac{1}{\epsilon^2}\,\Psi^{*}_{f_\rho}\big(\mu^{\varphi}(\tilde{f}_\alpha)\big) = \frac{n}{2\alpha(1-\alpha)} - \frac{\sqrt{n}}{\alpha(1-\alpha)} + O(1).
$$
Thus
$$
\exists\,\lim_{\epsilon\to 0}\frac{1}{\epsilon^2}\,K^{\varphi}\big(f^{(n)}_\alpha\,\|\,f^{(n)}_\rho\big) = \frac12\,I(\tilde{f}_\alpha) \ge \frac{n}{2\alpha(1-\alpha)} - \frac{\sqrt{n}}{\alpha(1-\alpha)} + O(1), \qquad (6.8)
$$
which completes the proof.

Acknowledgement
The article was prepared within the framework of a subsidy granted to the HSE by the Govern-ment of the Russian Federation for the implementation of the Global Competitiveness Program.
References

[1] M. Belis, S. Guiasu, A quantitative-qualitative measure of information in cybernetic systems (1968), IEEE Trans. Inf. Theory, 14, 593-594.
[2] A. Bhattacharyya, On some analogues of the amount of information and their use in statistical estimation (1946), Sankhya, 8, 1.
[3] L. Bolshev, A refinement of the Rao-Cramér inequality (1961), Theory Probab. Appl., 6, No. 3, 319-326.
[4] A. Clim, Weighted entropy with application, Analele Universitatii Bucuresti, Matematica (2008), Anul LVII, 223-231.
[5] T.M. Cover, J.A. Thomas, Elements of Information Theory, New York: Wiley (2006).
[6] R.L. Dobrushin, Passing to the limit under the sign of the information and entropy, Theory Probab. Appl. (1960), 29-37.
[7] R. Dudley, Lecture notes "The Delta-Method and Asymptotics of some Estimators", in: Lecture notes on "Topics in Statistics: Nonparametrics and Robustness", MIT, 2005.
[8] M.V. Fedoruk, Saddle Point Method, Moscow: Nauka, 1977, 162-173.
[9] I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series, and Products (2007), Elsevier, page 552.
[10] M. Hapaev, Asymptotic expansions of hypergeometric and confluent hypergeometric functions (1961), Izv. Vyssh. Uchebn. Zaved. Mat., No. 5, 98-101.
[11] Yu.V. Prokhorov, Moments, method of (in probability theory) (2001), Encyclopedia of Mathematics.
[12] M. Kelbert, Yu. Suhov, Continuity of mutual entropy in the large signal-to-noise ratio limit, Stochastic Analysis (2010), Berlin: Springer, 281-299.
[13] M. Kelbert, Yu. Suhov, Information Theory and Coding by Example, Cambridge: Cambridge University Press, 2013.
[14] M. Kelbert, Yu. Suhov, S.Y. Sekeh, Weighted Fisher Information inequality (2015), arXiv.
[15] Yu. Suhov, S.Y. Sekeh, M. Kelbert, Entropy-power inequality for weighted entropy, arXiv:1502.02188, 2015.
[16]