Exponential Families and MaxEnt Calculations for Entropy Measures of Statistical Physics
arXiv [cond-mat.stat-mech], Oct.
Flemming Topsøe
University of Copenhagen, Institute of Mathematical Sciences, Universitetsparken 5, 2100 Copenhagen, Denmark
Abstract.
For a wide range of entropy measures, easy calculation of equilibria is possible using a principle of Game Theoretical Equilibrium related to Jaynes' Maximum Entropy Principle. This follows previous work of the author and relates to Naudts [1], [2] and, partly, Abe and Bagci [3].
Keywords:
Complexity, Game Theoretical Equilibrium, Maximum Entropy, Robustness, Exponential Families, Bregman Generator.
PACS:
THE PRINCIPLE OF GAME THEORETICAL EQUILIBRIUM
Consider a discrete alphabet A and probability distributions P, Q, ··· over A. The set of all such distributions is denoted M₊(A). A distribution is identified by its point probabilities: P = (p_i)_{i∈A}. A measure of complexity is a map which to each pair (P, Q) of distributions assigns a value F(P, Q) ∈ [0, ∞] such that, for each P ∈ M₊(A), the minimal value of F(P, Q) with Q ∈ M₊(A) is assumed on the diagonal, i.e. for Q = P and nowhere else unless F(P, P) = ∞.

A preparation is any non-empty subset P ⊆ M₊(A). When P is fixed, a consistent distribution is a distribution in P. The game g = g(F, P) has F as objective function and is the two-person zero-sum game between Player I ("Nature"), who can choose a strategy P ∈ P, and Player II ("the Physicist"), who can choose any strategy Q ∈ M₊(A). Player I is a maximizer, Player II a minimizer. Thus val_I defined by val_I = sup_{P∈P} inf_Q F(P, Q) is the Player I-value of the game and, similarly, val_II defined by val_II = inf_Q sup_{P∈P} F(P, Q) is the Player II-value of the game. Here and below, a variable denoted by Q is understood to vary over all of M₊(A).

An optimal Player I-strategy is a P ∈ P such that val_I = inf_Q F(P, Q), and an optimal Player II-strategy is a Q ∈ M₊(A) such that val_II = sup_{P∈P} F(P, Q). By the general minimax inequality, val_I ≤ val_II. The game is in equilibrium if val_I = val_II < ∞.

For further information about the game introduced, see [4]. The attempt to locate optimal strategies for the players and to establish equilibrium for suitable preparations is taken as a basic principle of statistical physics, the principle of game theoretical equilibrium (GTE).

We introduce F-entropy of P as minimal complexity, i.e. as H(P) = inf_Q F(P, Q). By assumption, H(P) = F(P, P); thus val_I = sup_{P∈P} H(P), which is the maximum entropy value, also denoted MaxEnt = MaxEnt(F, P).
So val_I = MaxEnt, and we realize that the GTE-principle leads directly to Jaynes' maximum entropy principle, cf. [5].
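The two game values can be illustrated numerically. The sketch below is our own construction: it assumes the classical complexity measure F(P, Q) = −Σ p_i ln q_i on a two-letter alphabet and, for simplicity, takes the preparation to be the whole (discretized) simplex; both grid searches should then come out close to ln 2, with val_I ≤ val_II.

```python
import math

# Classical complexity (Kerridge inaccuracy): F(P,Q) = -sum_i p_i ln q_i.
def F(P, Q):
    return -sum(p * math.log(q) for p, q in zip(P, Q) if p > 0)

# Two-letter alphabet; the preparation is the whole simplex, discretized.
grid = [i / 200 for i in range(1, 200)]
prep = [(p, 1 - p) for p in grid]        # Player I strategies
strategies = [(q, 1 - q) for q in grid]  # Player II strategies

# val_I = sup_P inf_Q F(P,Q); for fixed P the infimum sits at Q = P.
val_I = max(min(F(P, Q) for Q in strategies) for P in prep)
# val_II = inf_Q sup_P F(P,Q)
val_II = min(max(F(P, Q) for P in prep) for Q in strategies)

print(val_I, val_II)  # both approach ln 2; minimax: val_I <= val_II
```

On this symmetric toy preparation the game is in equilibrium and both values equal the maximal entropy ln 2, attained at the uniform distribution.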
Classical Boltzmann-Gibbs-Shannon entropy (BGS-entropy) is obtained as minimal complexity with respect to the measure (P, Q) ↦ −Σ_i p_i ln q_i, which has a clear and convincing interpretation related to coding. Our results go some way to establish reasonable interpretations also for more general measures of complexity. Regarding the origin of the above measure of complexity, under the name of inaccuracy, see Kerridge [6].

As we have seen, entropy is generated by complexity. So is divergence (cross entropy, relative entropy or redundancy), defined as actual minus minimal complexity: D(P, Q) = F(P, Q) − H(P) when H(P) < ∞. In any case, the linking identity F(P, Q) = H(P) + D(P, Q) holds, and D(P, Q) ≥ 0 with equality if and only if P = Q (for the measures of complexity we shall consider, it will be clear how to define D(P, Q) when H(P) = ∞).

ROBUSTNESS, EXPONENTIAL FAMILIES
A Player II-strategy Q is robust if, for some constant h < ∞, F(P, Q) = h for all consistent distributions P; h is the level of robustness. The set E = E(F, P) of all robust Player II-strategies is the exponential family associated with g(F, P). If a family N of preparations is considered, the exponential family E(F, N) associated with N is the set of distributions which are robust for all preparations P ∈ N.

The following general and simple observation will play a key role in the sequel:

Theorem 1 (robustness lemma). Let the measure of complexity F and the preparation P be given. Assume that the distribution Q* is robust (Q* ∈ E(F, P)) and consistent (Q* ∈ P). Then g(F, P) is in equilibrium and has Q* as the unique MaxEnt-distribution as well as the unique optimal strategy for Player II.

Proof. Though known from e.g. [4], we present a direct proof. Let h be the level of robustness. Then F(Q*, Q*) = h and, for P ∈ P with P ≠ Q*, H(P) = F(P, P) < F(P, Q*) = h. Thus Q* is the unique MaxEnt-distribution. For any Q ≠ Q*, sup_{P∈P} F(P, Q) ≥ F(Q*, Q) > F(Q*, Q*) = h = sup_{P∈P} F(P, Q*), and equilibrium as well as unique optimality of Q* for Player II follows.

The result connects the exponential family E with the preparation P. Indeed, if E and P intersect, they intersect in only one distribution, which then is the optimal strategy for both players and, furthermore, the game considered is in equilibrium.

COMPLEXITY AND LINEAR CONSTRAINTS
We shall apply the principle of GTE – via the robustness lemma – to a wide class of complexity functions and associated notions of entropy, always having one and the same type of preparations in mind, viz. those given by linear constraints. They are the most important preparations for statistical physics and other applications, cf. e.g. Kapur [7].

From now on, we consider a fixed finite set f = (f_n)_{1≤n≤k} of real-valued functions defined on A. The associated family of natural preparations, denoted N, consists of all non-empty sets P_a which are defined as follows, denoting by ⟨·, P⟩ mean value w.r.t. P:

P_a = { P ∈ M₊(A) | ⟨f_n, P⟩ = a_n for 1 ≤ n ≤ k }.   (1)

Here a = (a_n)_{1≤n≤k} ∈ R^k. We assume that no non-trivial linear combination of the f_n's reduces to a constant function. Clearly, E(F, N), the natural exponential family, consists of those distributions which are robust for all natural preparations.

We shall select special measures of complexity adapted to a study of the natural preparations and constructed with the aim to simplify the search for distributions in E(F, N). To accomplish this, we consider measures of complexity of the form

F(P, Q) = ξ_Q(⟨κ(Q), P⟩)   (2)

where, for each Q ∈ M₊(A), ξ_Q is a real function and κ maps Q ∈ M₊(A) into a function defined on A. We insist that ⟨κ(Q), P⟩ can be obtained by summation based on a function κ: [0, 1] → [0, ∞], the coding function, via the formula

⟨κ(Q), P⟩ = Σ_{i∈A} p_i κ(q_i).   (3)

This corresponds to the requirement (κ(Q))(i) = κ(q_i); i ∈ A.

Regarding ξ_Q: [0, ∞] → [0, ∞] and κ: [0, 1] → [0, ∞], we assume that the ξ_Q's are increasing and concave, that κ is decreasing and convex, that κ(1) = 0, that κ is continuous at 0 (not just on ]0, 1]) and, finally, that F defined by (2) is a genuine measure of complexity. The last requirement will be trivially fulfilled in the concrete cases we shall consider. The inverse function κ⁻¹: [0, κ(0)] → [0, 1] will play a significant role. We note that this function is continuous, decreasing and convex, as is κ (simple geometric proof).

For the classical example, ξ_Q is the identity map and κ the function q ↦ ln(1/q). Then κ⁻¹ is the restriction of x ↦ exp(−x) to [0, ∞]. Entropy generated by this measure of complexity is standard BGS-entropy.

For the general situation, we note that any Q for which κ(Q) is a linear combination of the constant function 1 and the given functions f_1, ···, f_k, i.e. of the form

κ(Q) = λ₀ + λ₁·f_1 + ··· + λ_k·f_k = λ₀ + λ·f   (4)

for certain constants λ₀ and λ = (λ₁, ···, λ_k), is a member of E(F, N). Motivated by this observation, we fix real constants λ = (λ₁, ···, λ_k) and ask if there exists a real constant λ₀ and a distribution Q = (q_i)_{i∈A} such that (4) holds.

For abbreviation, put L_i = λ·f(i). Then (4) amounts to q_i = κ⁻¹(λ₀ + L_i) for i ∈ A. As κ⁻¹ is defined on [0, κ(0)], we must have 0 ≤ λ₀ + L_i ≤ κ(0) for each i. Therefore, the L_i must be bounded below. Furthermore, from Σ_i q_i = 1, we conclude that, for each K < κ(0), there can only be finitely many i ∈ A with L_i ≤ K. Thus we may order the L_i: L_{i₁} ≤ L_{i₂} ≤ ···, with this sequence breaking off and having a largest element if A is finite, and with L_{i_n} → κ(0) if A is infinite. Put L_* = L_{i₁} and L* = sup_{i∈A} L_i (= κ(0) if A is infinite). We realize that we must require that L* − L_* ≤ κ(0) and, assuming this holds, the set of possible constants λ₀ is the set [−L_*, ∞[ in case κ(0) = ∞ and the set [−L_*, κ(0) − L*] if κ(0) < ∞. Consider the function φ defined by φ(x) = Σ_{i∈A} κ⁻¹(x + L_i) with x ranging over the possible values of λ₀. What we search for is a value of λ₀, necessarily unique, such that φ(λ₀) = 1.

Clearly, φ(−L_*) ≥ 1. By standard techniques, we see that φ is continuous from the right and, if φ(x₀) < ∞ for some value x₀, then φ is continuous at all x > x₀. Furthermore, if x_n → κ(0) and if φ(x_n) < ∞ for all n, then φ(x_n) → 0 as n → ∞.

Our analysis shows that φ can have at most one point of discontinuity, viz. where it passes from the value ∞ to finite values. Such a discontinuity "normally" does not occur. Also other anomalies are "normally" excluded. For instance, one may easily construct examples such that φ is constantly equal to ∞, but such cases are also excluded as they are of no practical interest. Thus we maintain that "normally" the function φ assumes finite values larger than 1 as well as values less than 1, and hence the existence of a value λ₀ with φ(λ₀) = 1 follows.

Theorem 2 (MaxEnt calculus). Let λ = (λ₁, ···, λ_k) be given real constants. Then, under "normal" circumstances (cf. the discussion above), the equation

Σ_{i∈A} κ⁻¹(λ₀ + λ·f(i)) = 1   (5)

has a solution λ₀, necessarily unique, and Q = (q_i)_{i∈A} given by

q_i = κ⁻¹(λ₀ + λ·f(i)) for i ∈ A   (6)

satisfies (4) and hence belongs to the exponential family E(F, N). This distribution is the MaxEnt-distribution for P_a with a = (a₁, ···, a_k) given by

a_n = Σ_{i∈A} q_i f_n(i) for n = 1, ···, k   (7)

and, for this value of a, MaxEnt(F, P_a) = ξ_Q(λ₀ + λ·a).

The theorem replaces and expands the standard recipe for MaxEnt-calculations. The main difference is a focus on λ₀ via (5) rather than on the classical partition function. In the final section we present a more thorough discussion of the significance of the result.

Before continuing, we shall limit the type of complexity functions studied by reducing the number of parameters needed for their definition.
Instead of the many functional parameters appearing in (2), we now suggest a setting with only two functional parameters: one function ξ, called the corrector, to account for all the functions ξ_Q via the formula ξ_Q(x) = x + Σ_{i∈A} ξ(q_i), and then the already introduced coding function κ. In other words, we point to complexity functions of the form

F(P, Q) = Σ_{i∈A} p_i κ(q_i) + Σ_{i∈A} ξ(q_i).   (8)

The functions κ and ξ are uniquely determined from F. The two terms in (8) are called, respectively, the coding part and the correction. For the classical example, the coding part is −Σ_i p_i ln q_i and the correction vanishes.
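The machinery of this section can be sketched numerically in the classical case, where κ⁻¹(x) = exp(−x). The code below is our own illustrative construction (alphabet, constraint and the value of λ are arbitrary choices): it solves (5) for λ₀ by bisection, recovers the classical identity λ₀ = ln Z(λ), and then checks robustness of the resulting Q, i.e. that F(P, Q) is constant over distributions consistent with the induced constraint (7).

```python
import math

# Classical coding function: kappa(q) = ln(1/q), so kappa^{-1}(x) = exp(-x).
def kappa_inv(x):
    return math.exp(-x)

def solve_lambda0(L, lo=-50.0, hi=50.0, iters=200):
    """Solve (5): phi(l0) = sum_i kappa_inv(l0 + L_i) = 1 by bisection.

    phi is decreasing in l0, so bisection applies; L_i = lambda . f(i)."""
    phi = lambda x: sum(kappa_inv(x + Li) for Li in L)
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if phi(mid) > 1 else (lo, mid)
    return (lo + hi) / 2

# Example: A = {0,1,2}, one constraint function f(i) = i, lambda_1 = 0.7
f = [0.0, 1.0, 2.0]
lam = 0.7
L = [lam * fi for fi in f]
l0 = solve_lambda0(L)
Q = [kappa_inv(l0 + Li) for Li in L]             # MaxEnt-distribution, cf. (6)
logZ = math.log(sum(math.exp(-Li) for Li in L))  # classical: lambda_0 = ln Z

def F(P, R):  # classical complexity F(P,R) = -sum_i p_i ln r_i
    return -sum(p * math.log(r) for p, r in zip(P, R) if p > 0)

# Robustness: distributions Q + t*(1,-2,1) stay in P_a, since the direction
# has zero total mass and zero f-mean; F(., Q) is constant on all of them.
levels = [F([Q[0] + t, Q[1] - 2 * t, Q[2] + t], Q) for t in (-0.05, 0.0, 0.05)]
print(l0, logZ, sum(Q), levels)
```

The constant level is h = λ₀ + λ·a, in agreement with the robustness lemma.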
COMPLEXITY À LA BREGMAN
We shall now generate an (F, H, D)-triple from a simple starting point. The method follows the idea of Bregman divergences and is referred to as Bregman generation. Another method, Csiszár generation, was suggested in [4]. In our view, Bregman generation is by far the most important one for the needs of statistical physics.

Given is a Bregman generator, by which we shall understand a strictly concave and smooth real function h defined on [0, 1] with h(0) = h(1) = 0 and h′(1) = −1. We take "smoothness" to mean that h has an analytic extension to ]0, ∞[. Though less will do for most investigations, the stronger requirement allows one to consider also the dual function h̃ defined by

h̃(x) = x h(1/x).   (9)

This function is well-defined and real-valued in ]0, ∞[. As a final technical assumption, we assume that the function can be extended by continuity to [0, ∞], allowing for infinite values at the endpoints. A specific value h(p) is interpreted as the complexity of an event which is known to occur with probability p.

From h we generate two functions, f = f(p, q) and d = d(p, q):

f(p, q) = h(q) + (p − q) h′(q),   (10)
d(p, q) = h(q) − h(p) + (p − q) h′(q).   (11)

A specific value f(p, q) is interpreted as the complexity of an event which is believed to occur with probability q but actually occurs with probability p. This is consistent with the previous interpretation, as f(p, p) = h(p). The function d simply measures the difference (divergence) between estimated and true value. We also note that f(p, q) and d(p, q) may assume the value +∞. This happens if and only if both p > q = 0 and h′(0) = ∞ hold.

Consider the internal functions F = F_h, H = H_h and D = D_h generated by f, h and d. By this we mean that:

F(P, Q) = Σ_{i∈A} f(p_i, q_i), H(P) = Σ_{i∈A} h(p_i), D(P, Q) = Σ_{i∈A} d(p_i, q_i).   (12)

We refer to f, h and d as the partial functions, respectively partial complexity, entropy and divergence. They satisfy a partial version of the linking identity:

f(p, q) = h(p) + d(p, q).   (13)

Note that F = F_h is of the special form (8) with coding function κ = κ_h given by κ(x) = h′(x) + 1 and corrector ξ = ξ_h given by ξ(x) = h(x) − x(h′(x) + 1). Hence the Bregman generator is decomposed into two terms:

h(x) = x κ(x) + ξ(x).   (15)

As ξ(0) = ξ(1) = 0 and ξ′(x) = −x h″(x) − 1, we find ξ ≡ 0 in the classical case h(x) = x ln(1/x). We also see that ξ(x) ≥ −x in [0, 1], hence the correction related to any distribution Q is bounded below by −1. The dual function h̃ appears also to be of significance. In particular, ξ(x) = h̃′(1/x) − x, hence

F(P, Q) = Σ_{i∈A} p_i h′(q_i) + Σ_{i∈A} h̃′(1/q_i).   (16)

The first term in (16) is the coding part minus 1, the second term the correction plus 1. Partial complexity is given by f(p, q) = p h′(q) + h̃′(1/q).

GENERATORS VIA DEFORMED LOGARITHMS
We turn to a concrete two-parameter family (h_{a,b}) of Bregman generators defined via deformed logarithms (taken in this form from [10]) and given by

ln_{a,b}(x) = (x^b − x^a)/(b − a) for a ≠ b, and ln_{a,b}(x) = x^a ln x for a = b.   (17)

The associated Bregman generators are defined by

h_{a,b}(x) = x ln_{a,b}(1/x).   (18)

Warning: We have chosen to model the definition after the expression x ln(1/x) rather than −x ln x. The main reason is the more natural interpretation of the former expression; moreover, this form appears to be the one preferred in the "Tsallis literature". The choice is in contrast to that of [4]. Thus, compared to [4], one should make the transformation (a, b) ↦ (−b, −a). Note also the symmetry h_{a,b} = h_{b,a}.

From [4] we see (after transformation) that, in order to obtain a genuine Bregman generator, restrictions apply to a and b: either 0 ≤ a < b ≤ 1 or else a ≤ 0 ≤ b < 1. One finds

f_{a,b}(x, y) = (1/(b − a)) ( −(1 − a) x y^{−a} + (1 − b) x y^{−b} − a y^{1−a} + b y^{1−b} ),   (19)

κ_{a,b}(x) = 1 − (1/(b − a)) ( (1 − a) x^{−a} − (1 − b) x^{−b} ).   (20)

Note that κ(0) = ∞ except in the case a < b = 0, where κ(0) = (a + b − 1)/(a + b).

The important inverse functions κ⁻¹ are defined on [0, κ(0)]. They can only be calculated in closed form in special cases. We point to the Tsallis case, which corresponds to a < b = 0. The Tsallis parameter, traditionally denoted by q, is then given by q = 1 − a. For the origin of this family within the physics literature, see Tsallis [11].

Let us put κ_{a,0} = κ_q (as above with q = 1 − a). Then, for q ≠ 1,

κ_q⁻¹(x) = (1 + ((1 − q)/q) x)^{1/(q−1)} for 0 ≤ x ≤ κ_q(0)   (21)

and one can insert (21) into (5). The kind of sums obtained will, typically, have to be calculated numerically. An exception is the case q = 2. We leave it to the reader to work out the pleasant details of our calculus in this case (take A to be finite).

Another case where κ⁻¹_{a,b} can be calculated in closed form is the Kaniadakis family, which corresponds to a = −b, cf. Kaniadakis [12]. We shall not go into that here.

DISCUSSION
Some features of the main result.
Theorem 2 provides a theoretical framework for MaxEnt calculations for natural preparations given by linear constraints and pertaining to a wide range of different entropy measures. Among special features as compared with the standard approach we mention the following:

The basis for the result is the game theoretical approach, which necessitates a focus on possibly unfamiliar aspects and quantities, notably on a notion of complexity intended to reflect the interplay between the physicist and the system he is studying. This aspect could have been hidden, but the underlying principle – the principle of Game Theoretical Equilibrium – is in itself promoted as a major issue. Indeed, it is suggested that this principle is of a basic nature, applicable to several scientific investigations, and that, for the area of statistical physics, it is more fundamental than Jaynes' Maximum Entropy Principle. The principle originated with Pfaffelhuber [13] and, independently, the author (with [14] the first publication in English). Among further studies, we mention the joint work [15] with Harremoës.

Another feature is the puzzling fact that optimization has been achieved "miraculously" without recourse to Lagrange multipliers. Many will find it difficult to accept that, for the problem studied, there exists an approach which is better – simpler and more illuminating – than the well proven technique involving the popular multipliers. Within the mathematical literature, this special feature goes back at least to Csiszár, cf. [23].

Finally, we note that the MaxEnt calculus outlined here makes no mention of partition functions. The calculus goes a good deal beyond traditional settings based on classical BGS-entropy. This has resulted in a focus on λ₀, which corresponds to the logarithm of the partition function in the classical case (so, for the classical case, we can write λ₀ = ln Z(λ) where Z(λ) = Σ_i exp(−λ·f(i))). It is well known that ln Z is a key quantity to work with, thus this feature should be no great surprise. But it is interesting that our approach leads directly to this quantity. As the partition function has no place in the general case covered by Theorem 2, this is of course also forced in some sense.

Exponential families.
Whereas the concept of partition function does not survive the extension to general entropy- and complexity measures, the notion of exponential families does. It even appears to be the central concept behind the approach taken, cf. Theorem 1. However, extensions of this concept are needed (see below).
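To make the survival of the calculus beyond the classical case concrete, consider the Tsallis inverse (21) with q = 2, where κ₂⁻¹(x) = 1 − x/2 on [0, 2]. Equation (5) then becomes affine in λ₀ and solves in closed form on a finite alphabet. The sketch below (all concrete numbers are our own illustrative choices) works out this case:

```python
# Tsallis case q = 2 (a = -1, b = 0): by (21), kappa_2^{-1}(x) = 1 - x/2
# on [0, kappa_2(0)] = [0, 2], so equation (5) is affine in lambda_0.
def kappa2_inv(x):
    return 1.0 - x / 2.0

def solve_q2(L):
    """Closed-form lambda_0 for q = 2 on a finite alphabet of size n.

    sum_i (1 - (l0 + L_i)/2) = 1  =>  l0 = (2(n - 1) - sum_i L_i) / n,
    valid as long as every 0 <= l0 + L_i <= 2."""
    n = len(L)
    return (2.0 * (n - 1) - sum(L)) / n

# Example: A = {0,1,2}, f(i) = i, lambda_1 = 0.4, so L_i = 0.4 * i
L = [0.0, 0.4, 0.8]
l0 = solve_q2(L)
q = [kappa2_inv(l0 + Li) for Li in L]  # member of the exponential family, (6)
print(l0, q, sum(q))
```

The resulting q sums to 1 and all point probabilities lie in [0, 1], so the "normal" circumstances of Theorem 2 hold for this choice of constants.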
Comparing with the classical approach.
The simplifications in the classical case result from the factorization property of κ⁻¹, an exponential function in that case. Apart from this, the calculations for a general complexity function appear to be of much the same nature as for the classical case. Indeed, given λ = (λ₁, ···, λ_k), one determines λ₀ from (5), and then, via (6), (7) leads to the relevant averages a = (a₁, ···, a_k). If you aim for a specific set of averages, there seems to be no way, neither in the classical case nor in the general setting, other than application of numerical optimization procedures to choose just that set of parameters λ which leads to the appropriate set of constrained values. This discussion then tells us that, apart from the simplifications possible in handling (5), the general calculus suggested is no more complicated in practice than what you are used to from classical studies.

Thermodynamic calculus.
The difficulties, indeed impossibilities, involved in finding solutions to MaxEnt problems in closed form for other than the simplest problems constitute part of the motivation to create a thermodynamic calculus, studying variation as a function of various parameters of significance to the physicist or chemist. In this way one hopes to develop useful approximate solutions or to discover interesting trends in the thermodynamics in response to changes of relevant parameters. The differential calculus needed for such endeavours appears to be applicable also to the general setting of Theorem 2, with its precise equations to look closer into. Studies of this kind are not taken up here.
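A small indication of the kind of identities such a calculus rests on, sketched in the classical case only (the setup and numbers are our own): there λ₀ = ln Z(λ), and its derivative with respect to λ reproduces minus the constrained average a, the familiar thermodynamic relation.

```python
import math

# Classical one-constraint setting on A = {0,...,4}, f(i) = i.
f = list(range(5))

def log_Z(lam):
    # lambda_0 = ln Z(lambda) in the classical case
    return math.log(sum(math.exp(-lam * fi) for fi in f))

def mean_a(lam):
    # a_1 = <f, Q> for the MaxEnt-distribution q_i = exp(-(l0 + lam * f(i)))
    Z = sum(math.exp(-lam * fi) for fi in f)
    return sum(fi * math.exp(-lam * fi) for fi in f) / Z

lam, eps = 0.3, 1e-6
# Thermodynamic identity: d(lambda_0)/d(lambda) = -a
deriv = (log_Z(lam + eps) - log_Z(lam - eps)) / (2 * eps)
print(deriv, -mean_a(lam))
```

A numeric derivative is used so the same check can be repeated, in principle, for non-classical λ₀(λ) obtained from (5).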
Natural expansions, optimal updating based on a prior.
There are many further possibilities for theoretical investigations based on measures of complexity of the form here studied. Assumptions related to the form (2) allow one to derive several results other than Theorem 2: uniqueness of Q determined from λ, convexity of the set of λ's for which Q can be found, convexity of the function λ ↦ λ₀ = λ₀(λ) (this corresponds in the classical case to log-convexity of the partition function), existence of equilibria for the models in the natural family and, as a consequence, concavity of the map a ↦ MaxEnt(F, P_a).

We comment that whereas measures of complexity of the special form (8) form a rather simple and yet quite rich family, the more elaborate form given by (2) is also of importance – especially, it allows the consideration of Rényi entropies and related quantities.

A special expansion of the concept of robustness which allows identification of MaxEnt-distributions for which some of the point probabilities (the q_i of Theorem 2) are allowed to be 0 should also be mentioned. This concerns cases where λ₀ + λ·f(i) ≥ κ(0) and is therefore only relevant when κ(0) < ∞. However, there are important cases where this is so, e.g. Tsallis-type quantities with q > 1. In such cases inconsistent inference is possible, where a feasible i (one for which there exists P ∈ P_a with p_i > 0) is inferred under MaxEnt-based inference as an impossible event. This phenomenon is treated in part by Jaynes, cf. p. 345 of [22]. Taking this into consideration, it appears possible to prove that any candidate to MaxEnt-distributions (or the more general centers of attraction of [15]) of preparations in a natural family of preparations must be a member of the associated exponential family. For the classical case, where inconsistent inference is not possible, such a result was established in [15].

Consider now the problem of optimal updating based on a given prior. In fact, such problems can be handled in analogy with our analysis of MaxEnt problems. In particular, a result à la Theorem 2 holds which provides a calculus for optimal posterior distributions via a minimum cross entropy principle – the kind of results initiated by Kullback, cf. [24]. To indicate, if only briefly, that this requires no new techniques, consider a prior Q₀ and try to maximize the updating gain Ψ_{Q₀}(P, Q) = F(P, Q₀) − F(P, Q). This situation can be analyzed by applying our game theoretical reasoning to −Ψ_{Q₀}, which is a genuine complexity measure. For this to work, the theory has to be extended slightly, allowing complexity measures that can take negative values.

Precise statements and proofs of results just indicated will be published elsewhere.
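A quick numerical illustration of why the negated updating gain behaves like a complexity measure (classical case; prior, notation Q₀ for it, and all numbers are our own choices): subtracting the constant F(P, Q₀) does not move the minimizer, so for fixed P the gain is maximal exactly on the diagonal Q = P.

```python
import math

def F(P, Q):  # classical complexity F(P,Q) = -sum_i p_i ln q_i
    return -sum(p * math.log(q) for p, q in zip(P, Q) if p > 0)

Q0 = (0.6, 0.4)   # prior (illustrative)
P = (0.3, 0.7)    # fixed "true" distribution

def neg_gain(Q):  # -Psi_{Q0}(P,Q) = F(P,Q) - F(P,Q0)
    return F(P, Q) - F(P, Q0)

# Scan a fine grid of posteriors Q on the two-point simplex.
grid = [(t / 1000, 1 - t / 1000) for t in range(1, 1000)]
best = min(grid, key=neg_gain)
print(best)  # the minimizer of -Psi (maximizer of the gain) is Q = P
```

Note that neg_gain can be negative (when Q predicts P better than the prior does), which is exactly why the theory must admit complexity measures with negative values.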
Origin of the two-parameter family.
The two-parameter family of complexity-, entropy- and divergence measures, (F_{a,b}, H_{a,b}, D_{a,b}), has its origin in the mathematical literature, cf. Mittal [8] and Sharma and Taneja [9], and was studied later in the physical literature by Borges and Roditi [10], who used the convenient concept of deformed logarithms.

Entropy should not stand alone.
Let us illustrate this thesis by considering Tsallis entropy with Tsallis parameter q. There are infinitely many ways of obtaining this entropy measure as minimal complexity. Below we suggest three complexity measures which have this property:

F_B(P, Q) = 1/(q − 1) + Σ ( q_i^q − (q/(q − 1)) p_i q_i^{q−1} ),   (22)

F_C(P, Q) = (1/(1 − q)) Σ p_i^q (1 − q_i^{1−q}),   (23)

F_R(P, Q) = (1/(1 − q)) ( Σ p_i^q / Σ p_i^q q_i^{1−q} − 1 ).   (24)

As usual, sums are over i ∈ A. The "B", "C" and "R" stand for, respectively, "Bregman", "Csiszár" and "Rényi". The complexity measure F_B is the one considered in the main text, F_C the one considered in [4], and F_R is closely related to the relevant complexity measure connected with Rényi entropy and divergence.

The measure F_B allows us – as we have seen – to study the natural preparations given by linear constraints; F_C allows us to develop a calculus much as in Theorem 2, but aiming at maximizing entropy for preparations given by averaging with respect to the q-associated measures, which are measures with point masses p_i^q; and, finally, F_R allows us to deal with preparations given by averages with respect to the q-escort distributions, which are obtained by normalizing the q-associated measures. To realize that this is indeed so, you just have to note how P enters in the complexity measure considered.

It can safely be argued that "distorted" averages such as those indicated above, related to F_C and F_R, have no physical relevance, and therefore they are considered of less or no importance for the study of natural maximum entropy problems. Bregman generation is thus the method which stands out as the really significant one.

The importance of Bregman type quantities.
The relevance for statistical physics of Bregman divergence was emphasized by Naudts [1], [2]. The work by Abe and Bagci [3] should also be mentioned; however, the present author does not agree with their conclusion that the use of escort distributions is essential. Anyhow, the proper matching of entropy measure with the type of constraints one wants to study is important. This issue is also addressed in Feng [20].

Originally, Bregman introduced the concept to meet needs of learning theory, cf. [21]. For more recent articles in this direction, see Murata et al. [19] and Sears [18].

Concerning extensions in another direction, to quantum statistical physics, note the recent study by Petz [17], where Bregman divergences are carefully defined. Incorporation of game theoretical considerations may be a fruitful area of research to look into.
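Returning to the three measures (22)-(24): they can be spot-checked numerically. In the sketch below (function names and the value q = 1.5 are our own choices), each measure is evaluated on the diagonal and compared with Tsallis entropy (1 − Σ p_i^q)/(q − 1), with which all three should coincide there.

```python
# Numerical check that F_B, F_C and F_R all return Tsallis entropy on the
# diagonal, i.e. F(P, P) = (1 - sum_i p_i^q) / (q - 1).
q = 1.5  # Tsallis parameter (illustrative choice)

def tsallis(P):
    return (1.0 - sum(p ** q for p in P)) / (q - 1.0)

def F_B(P, Q):  # Bregman-generated measure, cf. (22)
    return 1.0 / (q - 1.0) + sum(
        qi ** q - (q / (q - 1.0)) * pi * qi ** (q - 1.0) for pi, qi in zip(P, Q))

def F_C(P, Q):  # Csiszar-type measure, cf. (23)
    return sum(pi ** q * (1.0 - qi ** (1.0 - q)) for pi, qi in zip(P, Q)) / (1.0 - q)

def F_R(P, Q):  # Renyi-related measure, cf. (24)
    num = sum(pi ** q for pi in P)
    den = sum(pi ** q * qi ** (1.0 - q) for pi, qi in zip(P, Q))
    return (num / den - 1.0) / (1.0 - q)

P = (0.2, 0.3, 0.5)
print(F_B(P, P), F_C(P, P), F_R(P, P), tsallis(P))  # all four coincide
```

Off the diagonal the three measures differ, which is precisely the point: they single out different types of preparations (linear, q-associated, q-escort).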
Interpretations.
Any measure of entropy of importance to statistical physics should be motivated by sound reasons, including appropriate interpretations. It appears that Bregman generation in itself goes a way in this direction. In addition, the choice of terminology, especially the frequent reference to "coding", though not yet founded in precise procedures for observation or measurement, is indicative of what future research may bring; at least, this is where the speculations of the author go.

One should recall that Kullback-Leibler divergence is related to free energy for classical preparations. This kind of interpretation appears also to be sound when more general Bregman-type divergences are involved, cf. the recent study by Bagci [16]. Possibly, Crooks [25] also points to issues to be integrated before a full picture is in place.
ACKNOWLEDGMENTS
Thanks are due to Jan Naudts for critical comments and suggestions and to Bjarne Andresen for pointing me to [25]. The work was supported by the Danish Natural Science Research Council.
REFERENCES
1. J. Naudts. Rev. Math. Phys., 16(6):809–822, 2004.
2. J. Naudts. J. Ineq. Pure and Appl. Math., 5(4):1–15, 2004.
3. S. Abe and G. B. Bagci. Physical Review E, 71:016139, 1–5, 2005.
4. F. Topsøe. Physica A, 340/1-3:11–31, 2004.
5. E. T. Jaynes. Physical Review, 106 and 108:620–630 and 171–190, 1957.
6. D. F. Kerridge. J. Roy. Stat. Soc. B, 23:184–194, 1961.
7. J. N. Kapur. Maximum Entropy Models in Science and Engineering. Wiley, New York, 1993. First edition 1989.
8. D. P. Mittal. Metrika, 22:35–45, 1975.
9. B. D. Sharma and I. J. Taneja. Metrika, 22:205–215, 1975.
10. E. P. Borges and I. Roditi. Physics Letters A, 246:399–402, 1998.
11. C. Tsallis. J. Stat. Physics, 52:479, 1988. See http://tsallis.cat.cbpf.br/biblio.htm for a comprehensive bibliography.
12. G. Kaniadakis. Physical Review E, 66:056125, 1–17, 2002.
13. E. Pfaffelhuber. Minimax information gain and minimum discrimination principle. In I. Csiszár and P. Elias, editors, Topics in Information Theory, volume 16 of Colloquia Mathematica Societatis János Bolyai, pages 493–519. János Bolyai Mathematical Society and North-Holland, 1977.
14. F. Topsøe. Kybernetika, 15(1):8–27, 1979.
15. P. Harremoës and F. Topsøe. Entropy, 3(3):191–226, Sept. 2001.
16. G. B. Bagci. arXiv:cond-mat/0703008v1, March 2007.
17. D. Petz. Acta Math. Hungar., 116:127–131, 2007.
18. T. D. Sears. From maxent to machine learning and back. In K. Knuth, editor, Proceedings of the 26th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, 2008. To appear.
19. N. Murata, T. Takenouchi, T. Kanamori and S. Eguchi. Neural Computation, 16(7):1437–1481, 2004.
20. X. Feng. arXiv:cond-mat.stat-mech/0705.1332v4, May 2007.
21. L. M. Bregman. USSR Comp. Math. and Math. Phys., 7:200–217, 1967.
22. E. T. Jaynes. Probability Theory - The Logic of Science. Cambridge University Press, Cambridge, 2003.
23. I. Csiszár. Ann. Probab., 3:146–158, 1975.
24. S. Kullback. Information Theory and Statistics. Wiley, New York, 1959.
25. G. E. Crooks.