Entropy Maximization with Linear Constraints: The Uniqueness of the Shannon Entropy
Thomas Oikonomou* and G. Baris Bagci

Department of Physics, School of Science and Technology, Nazarbayev University, Astana 010000, Kazakhstan
and
Department of Materials Science and Nanotechnology Engineering, TOBB University of Economics and Technology, 06560 Ankara, Turkey

*Electronic address: [email protected]

(Dated: May 1, 2018)

Within a framework of utmost generality, we show that the entropy maximization procedure with linear constraints uniquely leads to the Shannon-Boltzmann-Gibbs entropy. Therefore, the use of this procedure with linear constraints should not be extended to the recently introduced generalized entropies. In passing, we remark how the forceful use of entropy maximization for the Tsallis and Rényi entropies implies either the Shannon limit of these entropies or self-referential contradictions. Finally, we note that the utilization of the entropy maximization procedure with different averaging schemes is beyond the scope of this work.
PACS numbers: 05.20.-y; 05.20.Dd; 05.20.Gg; 51.30.+i
Keywords: entropy maximization; linear constraints; Shannon-Boltzmann-Gibbs entropy; Tsallis entropy; Rényi entropy
I. INTRODUCTION
Since Jaynes [1], entropy maximization has been a widely used tool in many different fields benefiting from the Shannon entropy. Although the initial aim of Jaynes was to derive the equilibrium distribution associated with the Shannon entropy subject to linear constraints, recent progress in generalized entropies, such as the Tsallis [2] or Rényi [3] entropies to mention but a few, has also relied on the very same entropy maximization procedure in various applications [4-17].

However, we have recently shown that, when applied to the generalized entropies, entropy maximization with linear constraints does not yield a distribution that can be cast into the appropriate form so as to include the partition function [18]. In other words, the distributions are not of the form p_i = f^{-1}(\beta\varepsilon_i) / \sum_k f^{-1}(\beta\varepsilon_k) (\beta being the Lagrange multiplier of the internal energy constraint and \varepsilon_i the energy of the i-th microstate), so that the denominator (i.e., the normalization term) cannot be identified as the partition function. The sole possibility for such a distribution has been found to be the one associated with the Shannon entropy.

In this work, we do not concern ourselves with the explicit form of the distribution. Instead, in all generality, we consider entropy maximization with linear constraints as Jaynes previously did [1] and show that the only admissible entropy expression is the Shannon (or Boltzmann-Gibbs) entropy. Therefore, we point out that entropy maximization construed à la Jaynes is suitable only for the Shannon entropy and thereby excludes the use of any generalized entropy.
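To make the point recalled from Ref. [18] concrete, the following minimal Python sketch (the values of q, a, and b are arbitrary choices of ours, not taken from the source) illustrates why a partition function cannot be split off in the deformed case: the Tsallis maximizer is a q-exponential, and exp_q does not factorize over sums unless q = 1.

```python
# Minimal sketch (our own illustration): the q-exponential does not factorize,
# exp_q(a + b) != exp_q(a) * exp_q(b) for q != 1, so a deformed maximizer
# p_i ~ exp_q(-(alpha + beta*eps_i)) cannot be written as a numerator
# f^{-1}(beta*eps_i) divided by a beta-only normalization (partition function).
import numpy as np

def exp_q(x, q):
    """q-exponential: [1 + (1-q)x]^(1/(1-q)); reduces to exp(x) as q -> 1."""
    return (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))

a, b = 0.3, 0.7                      # arbitrary sample arguments
for q in (0.5, 0.9, 0.999):
    lhs, rhs = exp_q(a + b, q), exp_q(a, q) * exp_q(b, q)
    print(f"q={q}: exp_q(a+b)={lhs:.6f}  exp_q(a)*exp_q(b)={rhs:.6f}")
# The ordinary exponential, by contrast, factorizes exactly:
print(f"exp:   {np.exp(a + b):.6f}  vs  {np.exp(a) * np.exp(b):.6f}")
```

As q approaches 1 the mismatch disappears, which is exactly the Shannon limit singled out below.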
II. MAXIMIZATION PROCEDURE REVISITED

The entropy functional with linear constraints reads

L(\{p\}, \alpha, \beta, U) = S(\{p\}) - \alpha \left[ \sum_{i=1}^{n} p_i - 1 \right] - \beta \left[ \sum_{i=1}^{n} p_i \varepsilon_i - U \right],   (1)

where S denotes the entropy measure and U is the internal energy. As usual, p_i is the probability of occurrence of the i-th microstate and (\alpha, \beta) are the respective Lagrange multipliers. Considering the maximization functional in Eq. (1) and using the definition \partial S(\{p\})/\partial p_i =: f(p_i), the maximization procedure yields the following n + 3 equations [19]:

f(p_i) = \alpha + \beta \varepsilon_i = x_i,   (2a)
1 = \sum_{i=1}^{n} p_i,   (2b)
U = \sum_{i=1}^{n} p_i \varepsilon_i,   (2c)
\beta = \partial S / \partial U.   (2d)

Taking the derivative of Eq. (2b) with respect to \beta, we have

0 = \sum_{i=1}^{n} \frac{\partial p_i}{\partial \beta} = \sum_{i=1}^{n} \frac{\partial f^{-1}(\alpha + \beta\varepsilon_i)}{\partial \beta} = \sum_{i=1}^{n} \frac{\partial f^{-1}(\alpha + \beta\varepsilon_i)}{\partial (\alpha + \beta\varepsilon_i)} \frac{\partial (\alpha + \beta\varepsilon_i)}{\partial \beta}.   (3)

Introducing the normalized P_i as

P_i = \left( \sum_{k=1}^{n} \frac{\partial f^{-1}(x_k)}{\partial x_k} \right)^{-1} \frac{\partial f^{-1}(x_i)}{\partial x_i},   (4)

Eq. (3) yields

\frac{\partial \alpha}{\partial \beta} = - \sum_{i=1}^{n} P_i \varepsilon_i = - \widetilde{U}.   (5)

The quantity Ũ is related to U as (combine Eqs. (2)-(5))

\widetilde{U} = U - \sum_{i=1}^{n} p_i \frac{\partial f(p_i)}{\partial \beta}.   (6)

Similarly to Eq. (3), since P_i satisfies the normalization condition, we have

0 = \sum_{i=1}^{n} \frac{\partial P_i}{\partial \beta} = \sum_{i=1}^{n} \frac{\partial P_i}{\partial x_i} \left( \varepsilon_i - \widetilde{U} \right) \;\Rightarrow\; \widetilde{U} = \sum_{i=1}^{n} \frac{\partial P_i/\partial x_i}{\sum_{k=1}^{n} \partial P_k/\partial x_k} \, \varepsilon_i.   (7)

Comparing Eqs. (5) and (7), we read

\sum_{i=1}^{n} Y_i \varepsilon_i = 0, \qquad Y_i := P_i - \frac{\partial P_i/\partial x_i}{\sum_{k=1}^{n} \partial P_k/\partial x_k}.   (8)

The validity of this equation presents us with two cases, which we inspect below.

(i) The first possibility, assuming Y_i \neq 0, is that the total sum can be equal to zero. Then, applying the m-th derivative with respect to \beta yields

\sum_{i=1}^{n} \frac{\partial^m Y_i}{\partial \beta^m} \varepsilon_i = 0.   (9)

This is an m \times n homogeneous system of the form A_{ij} X_i = 0 (i = 1, \ldots, n, j = 1, \ldots, m), with A_{ij} \equiv \partial^j Y_i / \partial \beta^j and X_i \equiv \varepsilon_i. From linear algebra, we know that such a system has either the zero solution, i.e., X_i = 0, or an infinite set of solutions with A_{ij} = A_{i\ell}. The zero solution is apparently not an option. Thus, we have infinite solutions yielding \partial^j Y_i / \partial \beta^j = \partial^\ell Y_i / \partial \beta^\ell \Rightarrow Y_i = c\, e^{\beta}. Summing over all i's and using the normalization condition, we see that the former relation is possible only when c = 0 \Rightarrow Y_i = 0, which contradicts our initial assumption.

(ii) The second, and only remaining, possibility is

Y_i = 0.   (10)

Then, substituting the definition of Y_i in Eq. (8) into the former equality, we have

\frac{\partial}{\partial x_i} \ln(P_i) = \sum_{k=1}^{n} \frac{\partial P_k}{\partial x_k}.   (11)

Since the l.h.s. and the r.h.s. have an open and a closed i-dependence, respectively (or equivalently, the former depends on i while the latter does not), the only option satisfying this relation is \ln(P_i) \sim x_i, so that the derivative eliminates the i-dependence. Thus, the only option is that the measure P_i has to be the inverse logarithmic function, i.e.,

P_i = \exp\left( \pm \frac{x_i}{k} \right),   (12)

where k is merely a constant. By virtue of Eq. (12), we read in Eq. (4)

\sum_{k=1}^{n} \frac{\partial f^{-1}(x_k)}{\partial x_k} = \exp\left( \mp \frac{x_i}{k} \right) \frac{\partial f^{-1}(x_i)}{\partial x_i}.   (13)

Then, a discussion similar to that of Eq. (11) uniquely yields P_i = f^{-1}(x_i) = p_i, hence

f^{-1}(x_i) = \exp\left( \pm \frac{x_i}{k} \right) \;\Leftrightarrow\; f(p_i) = \pm k \ln(p_i).   (14)
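As a quick self-consistency check of Eqs. (4), (12) and (14), the Python sketch below evaluates P_i for the Shannon solution f(p) = -ln(p) (k = 1, negative sign) using arbitrarily chosen sample values of α, β and ε_i (our assumptions, for illustration only) and confirms P_i = p_i numerically.

```python
# Self-consistency check of Eq. (4): with f(p) = -ln(p), f^{-1}(x) = exp(-x),
# so P_i of Eq. (4) must coincide with the (normalized) weights p_i ~ exp(-x_i),
# in agreement with Eqs. (12) and (14).
import numpy as np

alpha, beta = 0.1, 0.8                      # arbitrary sample Lagrange multipliers
eps = np.array([0.0, 1.0, 2.0, 3.0])        # arbitrary sample energies eps_i
x = alpha + beta * eps                      # x_i of Eq. (2a)

df_inv_dx = -np.exp(-x)                     # d f^{-1}(x)/dx for f^{-1}(x) = exp(-x)
P = df_inv_dx / df_inv_dx.sum()             # Eq. (4)
p = np.exp(-x) / np.exp(-x).sum()           # normalized weights p_i ~ f^{-1}(x_i)

print("max |P_i - p_i|:", np.max(np.abs(P - p)))   # ~ machine precision
```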
To reiterate, the MaxEnt procedure with linear constraints leads to two, at first glance distinct, probability distribution sets, \{p_i\} and \{P_i\}: the former is used in the maximization procedure itself, while the latter was deduced from the normalization condition of p_i. However, the normalization of P_i in turn shows that these two distribution sets are actually one and the same, P_i = p_i \Rightarrow U = \widetilde{U}, exhibiting an exponential behavior with respect to the energy values \varepsilon_i.
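This conclusion is easy to probe numerically. The sketch below (a five-level toy spectrum and a target internal energy of our own choosing) maximizes the Shannon entropy under the two linear constraints of Eq. (1) and verifies that ln p_i is affine in ε_i, i.e., that the maximizer is exponential in the energies.

```python
# Numerical check of Section II: maximize -sum(p ln p) subject to the two
# linear constraints of Eq. (1) and test whether ln p_i is affine in eps_i,
# as the exponential form of Eq. (12) demands.
import numpy as np
from scipy.optimize import minimize

eps = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # assumed toy spectrum
U = 1.2                                      # assumed target internal energy

def neg_entropy(p):
    return np.sum(p * np.log(p))             # minimize -S

cons = [{"type": "eq", "fun": lambda p: np.sum(p) - 1.0},        # Eq. (2b)
        {"type": "eq", "fun": lambda p: np.sum(p * eps) - U}]    # Eq. (2c)

p0 = np.full(eps.size, 1.0 / eps.size)
res = minimize(neg_entropy, p0, constraints=cons,
               bounds=[(1e-12, 1.0)] * eps.size)
p = res.x

# If p_i = exp(-(alpha + beta*eps_i)), then ln p_i is affine in eps_i.
slope, intercept = np.polyfit(eps, np.log(p), 1)
print("fitted beta:", -slope)
print("max deviation from affine form:",
      np.max(np.abs(np.log(p) - (slope * eps + intercept))))
```

Up to solver tolerance, the deviation from the affine form vanishes, in agreement with Eq. (12).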
III. DETERMINING THE ENTROPY UNIQUELY

We now show how the considerations of the previous section uniquely lead to the Shannon-Boltzmann-Gibbs entropy. Integrating Eq. (2d) with respect to U, we have

S = \beta U - \int U \, \mathrm{d}\beta + C,   (15)

where C is the integration constant and does not depend on \beta. Using the mean value constraint in Eq. (2c), the former equation can be written as

S = \sum_{i=1}^{n} p_i (\beta \varepsilon_i) - \int U \, \mathrm{d}\beta + C.   (16)

Taking into account Eqs. (2a) and (12), and then Eqs. (5) and (10), Eq. (16) can be written as

S = \pm k \sum_{i=1}^{n} p_i \ln(p_i) + C.   (17)

This is the most general structure of the entropy S satisfying the MaxEnt procedure with linear constraints. The term C includes all additive constants. The sign in Eq. (17) depends on whether the entropy S is to be maximized or minimized (negative or positive sign, respectively). For k = 1, this is identified with the Shannon entropy of information theory, and for k = k_B with the Boltzmann-Gibbs entropy of statistical thermodynamics.
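Eq. (2d) offers an independent numerical consistency check of Eq. (17): scanning β along the Gibbs family (same assumed toy spectrum as above, our choice), the finite-difference derivative dS/dU should reproduce β.

```python
# Check of Eq. (2d), beta = dS/dU, for S of Eq. (17) with the negative sign
# and k = 1. We scan beta, build the Gibbs distribution at each value, and
# compare the finite-difference derivative dS/dU against beta itself.
import numpy as np

eps = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # same assumed toy spectrum
betas = np.linspace(0.2, 2.0, 400)

def gibbs(beta):
    w = np.exp(-beta * eps)
    return w / w.sum()

U = np.array([gibbs(b) @ eps for b in betas])                      # Eq. (2c)
S = np.array([-(gibbs(b) * np.log(gibbs(b))).sum() for b in betas])  # Eq. (17)

dS_dU = np.gradient(S, U)                   # finite-difference dS/dU
print("max |dS/dU - beta|:", np.max(np.abs(dS_dU - betas)))
```

The maximal discrepancy is at the level of the finite-difference error, consistent with Eq. (2d).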
IV. CONCLUSIONS

Since the seminal work of Jaynes [1], the entropy maximization procedure has been widely utilized in the literature. In recent decades, however, this procedure has also been used for various entropy definitions such as the Tsallis [2] or Rényi [3] entropies, although Jaynes originally used only the Shannon entropy (or the Boltzmann-Gibbs entropy, which differs from the Shannon entropy by a multiplicative constant) with linear constraints.

Instead of specifying a particular entropy measure from the beginning, we have considered a very general treatment of entropy maximization in this work and shown that, if linear constraints are to be used, the only entropy measure compatible with entropy maximization à la Jaynes is the Shannon entropy. In this sense, the procedure devised by Jaynes is tailored strictly to the Shannon entropy. As a matter of fact, this has been exactly the point of the well-known Shore-Johnson axioms [20], too. However, we note that we have not used a joint probability composition rule in the above derivation, thereby rendering our calculations essentially different from the approach of the Shore-Johnson axioms [21].

When we consider, for example, the Rényi entropy (or the Tsallis entropy, for that matter) in virtue of Eq. (6), one obtains 0 = (1 - q)\, \beta\, \partial\widetilde{U}/\partial\beta. This relation forces us either to use the Shannon entropy, i.e., to set q = 1, or to assume \partial\widetilde{U}/\partial\beta = \partial U/\partial\beta = 0, which leads to a contradiction since \partial p_i/\partial\beta \neq 0, as can be seen from Eq. (2a). Therefore, the use of entropy maximization with linear constraints should not be extended to the deformed entropies. Note, however, that our work is limited to the linear constraints, i.e., linear averaging schemes, so that other averaging schemes are beyond the scope of the present treatment.
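For completeness, the short symbolic sketch below (using sympy; writing S_q per state with the normalization \sum_i p_i = 1 absorbed is our choice of presentation) shows that the Tsallis derivative f(p) = \partial S_q/\partial p is not of the logarithmic form required by Eq. (14) for any q \neq 1, and reduces to the Shannon case -\ln(p) - 1 (the additive constant being absorbable into \alpha) only in the limit q \to 1.

```python
# Symbolic check: the Tsallis per-state derivative is not +/- k*ln(p) unless
# q = 1, so by Eq. (14) it is excluded by MaxEnt with linear constraints.
import sympy as sp

p, q = sp.symbols('p q', positive=True)

# S_q = sum_i (p_i - p_i^q)/(q - 1), with sum_i p_i = 1 already used.
f_q = sp.diff((p - p**q) / (q - 1), p)

print(sp.simplify(f_q))        # (1 - q*p**(q-1))/(q - 1): not a logarithm for q != 1
print(sp.limit(f_q, q, 1))     # -log(p) - 1: the Shannon case of Eq. (14)
```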
Acknowledgments

T.O. acknowledges the state-targeted program "Center of Excellence for Fundamental and Applied Physics" (BR05236454) of the Ministry of Education and Science of the Republic of Kazakhstan and the ORAU grant entitled "Casimir light as a probe of vacuum fluctuation simplification" (090118FD5350).

[1] E.T. Jaynes, Phys. Rev. 108 (1957) 171; 106 (1957) 620.
[2] C. Tsallis, J. Stat. Phys. 52 (1988) 479.
[3] A. Rényi, Probability Theory (North-Holland, Amsterdam, 1970).
[4] G.B. Bagci and T. Oikonomou, Phys. Rev. E (2013) 042126.
[5] G. Rotundo, Physica A (2014) 296.
[6] T.S. Biró, G.G. Barnaföldi, and P. Ván, Eur. Phys. J. A (2013) 110.
[7] T.S. Biró and V.G. Czinner, Phys. Lett. B (2013) 861.
[8] Cheuk-Yin Wong, Grzegorz Wilk, Leonardo J.L. Cirto, and Constantino Tsallis, Phys. Rev. D (2015) 114027.
[9] J.L. Reis Jr., J. Amorim, and A. Dal Pino Jr., Phys. Rev. E (2011) 017401.
[10] G.M. Bosyk, S. Zozor, F. Holik, M. Portesi, and P.W. Lamberti, Quantum Information Processing (2016) 3393.
[11] G.B. Bagci, R. Sever, and C. Tezcan, Mod. Phys. Lett. B (2004) 467; G.B. Bagci, Physica A (2007) 79.
[12] M. Campisi and G.B. Bagci, Phys. Lett. A (2007) 11.
[13] Th. Oikonomou, Physica A (2007) 155; Th. Oikonomou, A. Provata, and U. Tirnakli, Physica A (2008) 2653.
[14] L. Marques, J. Cleymans, and A. Deppman, Phys. Rev. D (2015) 054025.
[15] G.B. Bagci and T. Oikonomou, Phys. Rev. E (2016) 022112; G.B. Bagci, Physica A (2015) 405.
[16] G.C. Yalcin, C. Velarde, and A. Robledo, Heliyon (3) (2015) e00045.
[17] G. Livadiotis, Nonlin. Processes Geophys. (2018) 77.
[18] Th. Oikonomou and G.B. Bagci, Phys. Lett. A (2017) 207.
[19] H. Karabulut, Eur. J. Phys. (2006) 709.
[20] J.E. Shore and R.W. Johnson, IEEE Transactions on Information Theory IT-26, 26 (1980); IT-27, 472 (1981); IT-29, 942 (1983).