When Shannon and Khinchin meet Shore and Johnson: equivalence of information theory and statistical inference axiomatics
Petr Jizba 1,∗ and Jan Korbel 2,3,1,†

1 Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Břehová 7, 115 19 Prague, Czech Republic
2 Section for Science of Complex Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
3 Complexity Science Hub Vienna, Josefstädter Strasse 39, 1080 Vienna, Austria

∗ Electronic address: p.jizba@fjfi.cvut.cz
† Electronic address: [email protected]
We propose a unified framework for both the Shannon–Khinchin and the Shore–Johnson axiomatic systems. We do so by rephrasing the Shannon–Khinchin axioms in terms of the generalized arithmetics of Kolmogorov and Nagumo. We prove that the two axiomatic schemes yield identical classes of entropic functionals: the Uffink class of entropies. This allows us to re-establish the entropic parallelism between information theory and statistical inference that seemed to be "broken" by the use of non-Shannonian entropies.
PACS numbers: 05.20.-y, 02.50.Tt, 89.70.Cf
I. INTRODUCTION
Entropy is undoubtedly one of the most important concepts in physics, information theory and statistics [1]. The notion of entropy was originally developed by Clausius, Boltzmann, Gibbs, Carathéodory and others in the context of statistical thermodynamics. There it supplied a new state function that was naturally extensive (due to its very formulation in terms of the heat 1-form) and that, in any adiabatically isolated system, represented a non-decreasing function of its state variables (on account of the Clausius theorem). Roughly half a century after these developments, the entropy paradigm was further conceptualized in the theory of information by Shannon [2]. In this later context the ensuing entropy (Shannon's entropy or measure of information) quantitatively represented the minimal number of binary (yes/no) questions that brings us from our present state of knowledge about the system in question to the one of certainty. The higher the measure of information (the more questions to be asked), the higher the ignorance about the system, and thus the more information will be uncovered after an actual measurement. A proper axiomatization of Shannon's entropy is encapsulated in the so-called Shannon–Khinchin (SK) axioms [3]. Only one decade after Shannon's seminal works, Jaynes [4, 5] promoted Shannon's information measure to the level of an inference functional that was able to extract least-biased probability distributions from measured data. This procedure is better known as the
Maximum entropy principle (MEP). Since the MEP is, in its essence, a statistical inference method, it needs a proper mathematical qualification to place Jaynes' heuristic arguments in a sound mathematical framework. The corresponding mathematical qualification was provided by Shore and Johnson (SJ) in the form of axioms that ensure that the MEP estimation procedure is consistent with the desired properties of inference methods [6, 7]. At this point, one should emphasize that in statistical inference theory (SIT) entropy functionals serve only as convenient technical vehicles for the unbiased assignment of distributions that are compatible with given constraints. In fact, one might say that it is the MEP distribution that is the primary object in SIT while the entropy itself is merely secondary (not having any operational role in the scheme). This is very different from information theory or thermodynamics, where entropies are primary objects with firm operational meanings (given, e.g., in terms of coding theorems or calorimetric measurements). In the original papers [6, 7] Shore and Johnson concluded that their axioms yield only one "measure of bias", namely Shannon entropy. It might, however, seem a bit puzzling why a "measure of bias" should have anything to do with additivity (i.e., one of the defining properties of Shannon's entropy). In the end, any monotonic function of such a measure should provide the same MEP distribution but might (and as a rule it does) yield a non-additive entropy. So, it is perhaps not so surprising that with the advent of generalized entropies [8–16], the past two decades have seen a renewed interest both in the SJ axiomatics and in the associated classes of admissible entropies [17–22]. In particular, it has been shown in Ref. [21] that the SJ axiomatization of the inference rule accounts for a substantially wider class of entropic functionals than just Shannon entropy, namely the so-called Uffink class [22], which includes Shannon's entropy as a special case.

The main aim of this paper is to answer the following question: what generalization of the SK axioms would provide the Uffink class of entropic functionals? This would not only allow us to re-establish the "broken" entropic parallelism between information theory and statistical inference but it should also cast a new light on the Uffink class of entropies and its practical utility.

We first recall the original set of SK axioms [3]: Let A and B be two discrete random variables with respective sets of possible values A = {A_i}_{i=1}^n and B = {B_j}_{j=1}^m. With A one can associate a complete set of events {a_i}_{i=1}^n so that a_i denotes the event that A = A_i (similarly for B). Elements a_i (and b_j) are known as elementary events. Let

P(A = A_i) = P(A_i) = p_i,  P(B_j) = q_j,
P(A = A_i, B = B_j) = P(A_i, B_j) = r_ij,
P(A = A_i | B = B_j) = P(A_i | B_j) = r_{i|j} = r_ij / q_j,   1 ≤ i ≤ n, 1 ≤ j ≤ m,

be the corresponding elementary-event, joint and conditional probabilities, respectively. For A and B we denote the ensuing probability distributions as P_A = {p_i}_{i=1}^n and P_B = {q_j}_{j=1}^m. Likewise, we write P_{A,B} = {r_ij}_{i,j=1}^{n,m}, P_{A|B} = {r_{i|j}}_{i,j=1}^{n,m} and P_{A|B_j} = {r_{i|j}}_{i=1}^n. The entropy of the probability distribution P_A (which may also be called the entropy of A) will be, with a slight abuse of notation, denoted interchangeably as H(P_A) or H(A). Similar notation will be introduced for the distributions P_B, P_{A,B}, P_{A|B} and P_{A|B_j}.
SK1 Continuity: Entropy is a continuous function with respect to all its arguments, i.e., H(P) ∈ C.

SK2 Maximality: Entropy is maximal for the uniform distribution, i.e., max_P H(P) = H(U_n), where U_n = {1/n, …, 1/n}.

SK3 Expandability: Adding an elementary event with probability zero does not change the entropy, i.e., H(p_1, …, p_n, 0) = H(p_1, …, p_n).

SK4_S Shannon additivity: H(A,B) = H(A|B) + H(B) = H(B|A) + H(A), where H(B|A) = Σ_i p_i H(B|A = A_i).

We note that the conditional entropy H(B|A) can be calculated in two ways: i) from the entropy of the joint distribution of the pair (A,B) and the marginal distribution of A, or ii) from the marginal distribution of A and the entropies of the conditional random variables B|A = A_i. This duality is crucial for the internal consistency of the SK axiomatic scheme. The aforestated set of SK axioms has a unique solution, Shannon's entropy [44]

H(P) = − Σ_i p_i log p_i.

With the advent of generalized entropies [8–16] there arose two natural questions. First, is it possible to conceptualize such entropies in terms of information-theoretic axioms (à la
SK axioms)? And second, can generalized entropies be used as consistent inference functionals with a sound mathematical underpinning (à la SJ axioms)?

As for the first question, it is well known that one can "judiciously" generalize the additivity axiom SK4_S to produce various generalized entropies. Typical examples are provided by the Rényi and Tsallis–Havrda–Charvát (THC) entropies. For instance, for the Rényi entropy one keeps axioms SK1–3 and substitutes SK4_S with [8]:

SK4_R Rényi additivity: R_q(A,B) = R_q(A|B) + R_q(B) = R_q(B|A) + R_q(A), where R_q(B|A) = f^{-1}( Σ_i ρ_i^A(q) f(R_q(B|A = A_i)) ).

Here, ρ_i^A(q) = (p_i)^q / Σ_j (p_j)^q is the escort (or zooming) distribution [36, 37] and f is an arbitrary invertible and positive function on [0, ∞). The corresponding axiomatics is stringent enough to fix f(x) uniquely to be either f(x) = e^{(1−q)x} (for q ≠ 1) or f(x) = x (for q = 1), and yields the Rényi entropy

R_q(P) = (1/(1−q)) log Σ_i p_i^q,

as the unique solution. Similarly, for the case of the non-additive THC entropy [9, 10] one can augment axioms SK1–3 with [23, 24]:

SK4_T Tsallis additivity: S_q(A,B) = S_q(B|A) + S_q(A) + (1−q) S_q(B|A) S_q(A), where S_q(B|A) = Σ_i ρ_i^A(q) S_q(B|A = A_i),

where ρ_i^A(q) is again the escort distribution. The unique solution of this axiomatic system is the THC entropy

S_q(P) = ( Σ_i p_i^q − 1 )/(1−q).

(A short numerical illustration of the three composition rules SK4_S, SK4_R and SK4_T is given at the end of this section.)

In parallel with this there have been several successful attempts to classify entropic functionals according to various desirable information-theoretic properties. Here we should mention, e.g., the class of strongly pseudo-additive (SPA) entropies based on a generalization of the Rényi entropy axioms to non-additive entropies [25], the Z-entropies based on group properties of the entropic functionals [27], or the classification according to asymptotic scaling leading to (c,d)-entropies [13] and ensuing generalizations [28].

As for the second question, there has been notable progress in recent years in the classification of entropic functionals satisfying the SJ axioms [21, 22, 29]. Our aim here is to employ generic arithmetical principles to generalize, in a logically sound way, the SK axiomatic scheme. To this end we will use the framework of Kolmogorov–Nagumo (KN) arithmetics [30, 31], KN quasi-arithmetic means [32–34] and escort distributions [36, 37]. The ensuing class of admissible entropies will be compared with the class of entropies solving the SJ axioms, the Uffink class. We will see that the two classes not only coincide, and hence bolster the entropic parallelism between information theory and statistical inference, but that there also is a close parallelism between the two axiomatic schemes.

The rest of the paper is organized as follows: In Section II, we briefly summarize the concept of generalized arithmetics and outline the key role that Kolmogorov–Nagumo functions play in this context. In Section III, we introduce the class of Shannon–Khinchin axioms based on the Kolmogorov–Nagumo generalized arithmetics and derive the generic class of entropic functionals satisfying these axioms. In Section IV, we show the equivalence of the aforementioned class and Uffink's entropic class. This will, in turn, cast new light on the relationship between the SK and SJ axiomatic schemes and re-establish the entropic parallelism between information theory and statistical inference. The last section is devoted to some further observations, remarks and conclusions.
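As a quick numerical illustration of the three composition rules quoted above (SK4_S, SK4_R and SK4_T), the following minimal Python/NumPy sketch (our own addition, not part of the axiomatic schemes themselves; the distributions and the value of q are arbitrary test choices) verifies them for a pair of independent random variables.

import numpy as np

def shannon(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def renyi(p, q):
    return np.log(np.sum(p**q)) / (1.0 - q)

def thc(p, q):                      # Tsallis-Havrda-Charvat entropy
    return (np.sum(p**q) - 1.0) / (1.0 - q)

pA = np.array([0.5, 0.3, 0.2])      # illustrative marginal distributions
pB = np.array([0.6, 0.4])
pAB = np.outer(pA, pB).ravel()      # joint distribution of independent A and B
q = 0.7

assert np.isclose(shannon(pAB), shannon(pA) + shannon(pB))
assert np.isclose(renyi(pAB, q), renyi(pA, q) + renyi(pB, q))
assert np.isclose(thc(pAB, q),
                  thc(pA, q) + thc(pB, q) + (1 - q) * thc(pA, q) * thc(pB, q))
print("Shannon, Renyi and THC composition rules hold for independent A, B")

For independent variables the conditional entropies reduce to the marginal ones, which is why the escort-weighted means drop out of the check above.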
II. GENERALIZED ARITHMETICS AND KOLMOGOROV AND NAGUMO FUNCTIONS

Let us now introduce the concept of generalized arithmetics. From abstract algebra it is known that arithmetic operations can be defined in various ways, even if one assumes commutativity and associativity of addition and multiplication, and distributivity of multiplication with respect to addition [30, 31]. In consequence, whenever one encounters "plus" or "times" one has a certain flexibility in interpreting these operations. A change of the realization of arithmetic, without altering the remaining structures of the equations involved, plays an analogous role as a symmetry transformation in physics.

Let us consider a bijection f^{-1}: M → N ⊂ ℝ, where M is some set. The map f allows us to define addition, subtraction, multiplication, and division in M as follows:

x ⊕ y = f( f^{-1}(x) + f^{-1}(y) ),
x ⊖ y = f( f^{-1}(x) − f^{-1}(y) ),
x ⊗ y = f( f^{-1}(x) f^{-1}(y) ),
x ⊘ y = f( f^{-1}(x) / f^{-1}(y) ).    (1)

One can readily verify the following standard properties: (1) associativity, (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z) and (x ⊗ y) ⊗ z = x ⊗ (y ⊗ z); (2) commutativity, x ⊕ y = y ⊕ x and x ⊗ y = y ⊗ x; (3) distributivity, (x ⊕ y) ⊗ z = (x ⊗ z) ⊕ (y ⊗ z). For future convenience we will explicitly affiliate the symbol of the function f with the arithmetic operations ⊕, ⊖, ⊗ and ⊘, so, for instance, we will write ⊕_f instead of ⊕, etc.

This generalized arithmetical structure motivated Kolmogorov and Nagumo [32, 33] to formulate the most general class of means, the so-called quasi-linear means, that are fully compatible with the usual Kolmogorov postulates of probability theory [35], with interesting applications in thermostatistics [34].

The aforementioned generalized arithmetics can be extended quite naturally to real multivariate functions. For instance, for a function of two variables G(x, y) it can be defined as G_f(x, y) ≡ f( G(f^{-1}(x), f^{-1}(y)) ).

Let us state in this connection a couple of important consequences that can be easily verified:

i) when z = x ⊗_f y, then g(z) = g(x) ⊗_{g·f} g(y). Here, by g·f we implicitly mean the composition of the two functions.

ii) x ⊕_f y = x ⊗_{f·log} y. Particularly important for our purposes will be the so-called q-deformed algebra, where f(x) ≡ f_q(x) = log_q(x) = (x^{1−q} − 1)/(1−q). The ensuing operation ⊗_{f_q} is traditionally denoted as q-addition, and the notation ⊕_q is often used instead.

iii) For the generalized product ⊗_f the function f is not determined uniquely. In fact, there exists a two-parametric class of functions f_{a,b}(x) = f(a x^b) which yields the same product. Indeed,

x ⊗_{f_{a,b}} y = f( a [ (f^{-1}(x)/a)^{1/b} (f^{-1}(y)/a)^{1/b} ]^b ) = x ⊗_f y.    (2)

This result will be particularly important in Section III.
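To make the deformed operations of Eq. (1) concrete, here is a small illustrative Python sketch (our own; the numerical values and the choice q = 0.6 are arbitrary) for the q-deformed case f_q = log_q, checking in particular that ⊗_{f_q} reproduces the familiar q-addition x + y + (1−q)xy.

import numpy as np

q = 0.6

def log_q(x):                        # f_q(x) = (x^(1-q) - 1)/(1 - q)
    return (x**(1 - q) - 1.0) / (1 - q)

def exp_q(x):                        # inverse of log_q
    return (1.0 + (1 - q) * x)**(1.0 / (1 - q))

def oplus_f(x, y):                   # x (+)_f y = f(f^-1(x) + f^-1(y)), Eq. (1)
    return log_q(exp_q(x) + exp_q(y))

def otimes_f(x, y):                  # x (x)_f y = f(f^-1(x) * f^-1(y)), Eq. (1)
    return log_q(exp_q(x) * exp_q(y))

x, y, z = 0.8, 1.7, 2.0
# the deformed product with f = log_q is the q-addition x + y + (1-q)xy
assert np.isclose(otimes_f(x, y), x + y + (1 - q) * x * y)
# associativity and commutativity hold by construction
assert np.isclose(oplus_f(oplus_f(x, y), z), oplus_f(x, oplus_f(y, z)))
assert np.isclose(otimes_f(x, y), otimes_f(y, x))
print("q-deformed Kolmogorov-Nagumo arithmetic checks passed")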
III. KOLMOGOROV–NAGUMO GENERALIZATION OF SHANNON–KHINCHIN AXIOMS

Let us now generalize the Shannon–Khinchin (SK) entropic axioms in terms of the Kolmogorov–Nagumo arithmetics in the following way:

SK1 Continuity: Entropy is a continuous function with respect to all its arguments, i.e., S(P) ∈ C.

SK2 Maximality: Entropy is maximal for the uniform distribution, i.e., max_P S(P) = S(U_n), where U_n = {1/n, …, 1/n}.

SK3 Expandability: Adding an elementary event with probability zero does not change the entropy, i.e., S(p_1, …, p_n, 0) = S(p_1, …, p_n).

SK4 Composability: The joint entropy for a pair (A, B) of random variables can be expressed as S(A,B) = S(A|B) ⊗_f S(B), where S(A|B) is a conditional entropy satisfying the consistency requirements I), II) (see below).

In passing, we can observe from the two illustrative axiomatic schemes SK4_R and SK4_T that viable entropic functionals should obey two natural conditions:

I) For two independent random variables A and B the joint entropy S(A,B) should be composable from the entropies S(A) and S(B), i.e., S(A,B) = F(S(A), S(B)).

II) Conditional entropy should be decomposable into entropies of conditional distributions, i.e., S(B|A) = G(P_A, {S(B|A = A_i)}_{i=1}^n).

Here F and G are functionals to be determined shortly. The motivation for these two conditions is taken from the original SK axioms for the Shannon, Rényi and Tsallis entropies. They all are composable from marginal entropies if the subsystems are independent, and they all are decomposable into conditional entropies and (escort) marginal distributions.

Let us also note that the conditional entropy S(A|B) automatically fulfills several important properties:

a) Entropic Bayes' rule: S(A|B) = S(B|A) ⊘_f S(B) ⊗_f S(A),

b) Generalized Gibbs inequality: S(A|B) ≤ S(A).

The Bayes rule is easy to show from the interchangeability S(A,B) = S(B,A) and by using the definition of conditional entropy. The generalized Gibbs inequality follows because S(A,B) ⊘_f S(B) ≤ S(A). Moreover, we can define the mutual information as

I(A,B) = S(A,B) ⊘_f ( S(B) ⊗_f S(A) ).

The composition requirement I) is equivalent to I(A,B) = f(1) for independent random variables. We might also note that the requirement I) is equivalent to the strict composability axiom introduced in Ref. [27].
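For a concrete instance of SK4, take the Rényi entropy: with the KN function f(x) = log x the product ⊗_f reduces to ordinary addition, and the conditional entropy is the escort-weighted Kolmogorov–Nagumo mean of SK4_R. The short Python sketch below (our own illustration; the joint distributions and q = 1.7 are arbitrary) checks the resulting composability and the requirement I(A,B) = f(1) = 0 for independent variables.

import numpy as np

q = 1.7
rng = np.random.default_rng(0)

def renyi(p):
    return np.log(np.sum(p**q)) / (1.0 - q)

def renyi_conditional(r):            # r[i, j] = P(A = A_i, B = B_j); returns R_q(B|A)
    pA = r.sum(axis=1)
    escort = pA**q / np.sum(pA**q)    # escort distribution rho^A(q)
    cond = r / pA[:, None]            # rows are the conditional distributions P(B|A = A_i)
    g = np.exp((1 - q) * np.array([renyi(row) for row in cond]))
    return np.log(np.sum(escort * g)) / (1 - q)

# correlated joint distribution: composability R_q(A,B) = R_q(B|A) + R_q(A)
r = rng.random((3, 4)); r /= r.sum()
assert np.isclose(renyi(r.ravel()), renyi_conditional(r) + renyi(r.sum(axis=1)))

# independent joint distribution: mutual information equals f(1) = 0
u = np.array([0.2, 0.5, 0.3]); v = np.array([0.4, 0.1, 0.25, 0.25])
assert np.isclose(renyi(np.outer(u, v).ravel()) - renyi(u) - renyi(v), 0.0)
print("generalized composability checks passed for the Renyi case")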
Let us now prove the following theorem.

Theorem 1. The most general class of entropic functionals S satisfying the aforestated axioms SK1–4 can be expressed as

S_q^f(P) = f( ( Σ_i p_i^q )^{1/(1−q)} ),    (3)

where f(x) is a generic strictly increasing function defined on x ∈ [0, ∞).

In passing it is useful to note that (3) can be equivalently expressed as

S_q^f(P) = f( exp_q( Σ_i p_i log_q(1/p_i) ) ),    (4)

where exp_q(x) = [1 + (1−q)x]^{1/(1−q)}. Eq. (4) is a simple consequence of the fact that

exp_q( Σ_i p_i log_q(1/p_i) ) = ( 1 + (1−q) Σ_i p_i (p_i^{q−1} − 1)/(1−q) )^{1/(1−q)} = ( Σ_i p_i^q )^{1/(1−q)}.
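A minimal numerical confirmation of the equivalence of the representations (3) and (4) (our own sketch; the probability vector, the value q = 0.4 and the choice f = tanh are arbitrary, any strictly increasing f would do):

import numpy as np

q = 0.4
f = np.tanh                          # an arbitrary strictly increasing function

def log_q(x):
    return (x**(1 - q) - 1.0) / (1 - q)

def exp_q(x):
    return (1.0 + (1 - q) * x)**(1.0 / (1 - q))

p = np.array([0.1, 0.2, 0.3, 0.4])
form3 = f(np.sum(p**q)**(1.0 / (1 - q)))            # Eq. (3)
form4 = f(exp_q(np.sum(p * log_q(1.0 / p))))        # Eq. (4)
assert np.isclose(form3, form4)
print("Eq. (3) and Eq. (4) agree:", float(form3))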
Proof of Theorem 1: First, we see that the functional has to be symmetric in all components of P = {p_i}. This is because relabeling the points in a set of elementary events should not change the information about the underlying stochastic process. Consequently, S must be symmetric.

Second, the entropy of the uniform distribution S(n) ≡ S(1/n, …, 1/n) can be obtained from the composability axiom. To this end we denote the random variable with uniform distribution over nm outcomes as U_{nm} = U_n × U_m and abbreviate S(U_n) as S(n). Then [see Eq. (1)]

S(nm) = S(n) ⊗_f S(m)  ⇒  S(n) = f(n).

Here we have explored the freedom in the definition of the function f [see Eq. (2)] and scaled back the generic solution S(n) = f(n^x), x ∈ ℝ, to the solution S(n) = f(n).

Third, let us take two random variables A and B with distributions P_A = {p_i}_{i=1}^n and P_B = {q_j = 1/m}_{j=1}^m. Let us also introduce the so-called Daróczy mapping [25, 26], i.e., S → f^{-1}∘S. After this mapping we get a multiplicative entropy. From the definition of S(A|B) we then obtain that

m f^{-1}S(p_1/m, …, p_n/m) = f^{-1}S(p_1, …, p_n),    (5)

since the conditional entropy is for each random variable just the usual unconditional one. Therefore, the entropy must be a first-order homogeneous, symmetric function. According to [38] the solution of the homogeneous equation (5) can be (under mildly restrictive assumptions) expressed as

f^{-1}S(x_1, …, x_n) = b ∏_{i=1}^n x_i^{a_i},  where  Σ_i a_i = 1.    (6)

Here a_i and b are constants to be specified later. However, this solution is not symmetric in its variables. This can be achieved by symmetrization of Eq. (6), which can then be rewritten in the following form

f^{-1}S(p_1, …, p_n) = b Σ_{(j_1, …, j_n) ∈ σ(n)} ∏_{i=1}^n p_{j_i}^{a_i},

where σ(n) denotes the set of permutations of the indices. This expression can be equivalently recast as

f^{-1}S(p_1, …, p_n) = b ∏_{i=1}^n ( Σ_{k_i} p_{k_i}^{a_i} ),

which can further be rewritten as

f^{-1}S(p_1, …, p_n) = b ∏_{i=1}^n ( Σ_{k_i} p_{k_i}^{a_i} )^{c/(1−a_i)},    (7)

which still keeps the entropy a homogeneous function of the first order. The parameter c is a free parameter that will be determined later. Note that this representation is also mentioned in [39].

Let us now show that in order to fulfill the decomposability axiom II), only one a_j must be non-zero. To this end, we explicitly express f^{-1}S(A|B) as

f^{-1}S(A|B) = [ ( Σ_{k_1, l_1} (r_{k_1|l_1} q_{l_1})^{a_1} )^{c/(1−a_1)} / ( Σ_{l_1} q_{l_1}^{a_1} )^{c/(1−a_1)} ] × ⋯ × [ ( Σ_{k_n, l_n} (r_{k_n|l_n} q_{l_n})^{a_n} )^{c/(1−a_n)} / ( Σ_{l_n} q_{l_n}^{a_n} )^{c/(1−a_n)} ].

This can be more explicitly rewritten as

f^{-1}S(A|B) = ( Σ_{l_1} ρ_{l_1}^B(a_1) Σ_{k_1} (r_{k_1|l_1})^{a_1} )^{c/(1−a_1)} × ⋯ × ( Σ_{l_n} ρ_{l_n}^B(a_n) Σ_{k_n} (r_{k_n|l_n})^{a_n} )^{c/(1−a_n)},

where ρ_l^B(a) = q_l^a / Σ_{l'} q_{l'}^a is the escort distribution [36, 37]. This expression is an unconditional entropy of the conditional distribution only if one of the a_j is non-zero and the remaining ones are zero. With this we get that

f^{-1}S(A|B) = ( Σ_l ρ_l^B(a) Σ_k (r_{k|l})^a )^{1/(1−a)} = ( Σ_l ρ_l^B(a) [ f^{-1}S(A|B = b_l) ]^{1−a} )^{1/(1−a)},

which directly implies the decomposability function G. With this result Eq. (7) boils down to

f^{-1}S(p_1, …, p_n) = b ( Σ_k p_k^a )^{c/(1−a)} = b [ exp_a( Σ_k p_k log_a(1/p_k) ) ]^c,

which by Eq. (2) is equivalent to (3) provided we identify a with q. The function f must be strictly monotonic because in the proof we needed the inverse of f, and it must be strictly increasing because, by SK2, S has its maximum for the uniform distribution (and not, for instance, for P = (1, 0, 0, …, 0)). Finally, we note that SK4_S is recovered from SK4 by taking f(x) = ln x and the decomposability function G(x_i, y_i) = Σ_i x_i y_i.
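As a small consistency check of the last remark (our own sketch), with f(x) = ln x the class (3) is nothing but the Rényi entropy, and in the limit q → 1 it indeed reproduces the Shannon entropy of axiom SK4_S:

import numpy as np

p = np.array([0.05, 0.15, 0.2, 0.6])  # illustrative distribution

def S_f_q(p, q):                      # Eq. (3) with the choice f(x) = ln x
    return np.log(np.sum(p**q)) / (1.0 - q)

shannon = -np.sum(p * np.log(p))
for q in (1.1, 1.01, 1.001, 1.0001):
    print(q, S_f_q(p, q))
print("Shannon limit:", shannon)
assert np.isclose(S_f_q(p, 1.0 + 1e-6), shannon, atol=1e-4)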
IV. EQUIVALENCE WITH SHORE–JOHNSON AXIOMS

Let us now turn our attention to the MEP and the corresponding consistency requirements. The MEP can be formulated in the following way [4, 5]:
Proposition (Maximum entropy principle).
Given the set of linear constraints Σ_i p_i E_i^{(k)} = ⟨E^{(k)}⟩, the least biased estimate of the underlying distribution P = {p_i} is obtained from maximization of the entropic functional S(P) under the normalization constraint and the set of constraints ⟨E^{(k)}⟩, i.e., by maximizing the Lagrange functional

S(P) − α Σ_{i=1}^N p_i − Σ_{k=1}^ν β^{(k)} Σ_{i=1}^N p_i E_i^{(k)}.    (8)

In (8) the index "i" runs over all possible states, i.e., over all elements from the set of possible outcomes associated with a given random system.

Shore and Johnson formulated the set of consistency requirements that the MEP should satisfy [6, 7]:

SJ1 Uniqueness: the result should be unique.

SJ2 Permutation invariance: the permutation of states should not matter.

SJ3 Subset independence: It should not matter whether one treats disjoint subsets of system states in terms of separate conditional distributions or in terms of the full distribution.

SJ4 System independence: It should not matter whether one accounts for independent constraints related to independent systems separately in terms of marginal distributions or in terms of the full-system distribution.

SJ5 Maximality: In the absence of any prior information, the uniform distribution should be the solution.

Let us now state without proof the theorem that provides the most general class of admissible entropic functionals consistent with the aforestated SJ axioms:

Theorem 2 (Uffink theorem). The class of entropic functionals S satisfying the axioms SJ1–5 can be expressed as

S_q^f(P) = f( ( Σ_i p_i^q )^{1/(1−q)} ),    (9)

for any q > 0 and for any strictly increasing function f.

In particular, the Uffink theorem shows that members of this entropic class admit a representation in the form given by Eq. (3), and hence the SK and SJ axiomatic systems are equivalent. A detailed proof of Theorem 2 can be found in Ref. [21]. Let us now discuss some salient results of the proof. The first two axioms assert that the entropic functional must be a symmetric functional in the probability components. The third axiom determines the function in the sum form, i.e., in the form S(P) = f(Σ_k g(p_k)), with g being an arbitrary increasing concave function. The fourth axiom gives us the final form of the entropic functional (without specifying the range of q), and finally the fifth axiom guarantees that q > 0.
Note that the class obtained from Theorem 1 and epitomized by Eq. (3) is the same as the class given by Eq. (9) of the Uffink theorem. Therefore, we immediately see that in axiom SK4 the requirement II) (decomposability) corresponds to axiom SJ3, while requirement I) (composability) corresponds to axiom SJ4.

Moreover, the interpretation of f and q is now clear. The function f determines the scaling of the entropy for the uniform distribution (as it is independent of q), see also [28], while the parameter q determines the correlations in the system through the MaxEnt distribution, which can be expressed as (see Ref. [21])

p_i = (1/Z_q) exp_q(−β̃ ΔE_i),   Z_q = Σ_i exp_q(−β̃ ΔE_i),   β̃ = β / ( q f′(Z_q) Z_q ),

where ΔE_i = E_i − ⟨E⟩. The connection of the q parameter with correlations can be understood from the MaxEnt distribution of a joint system composed of two disjoint subsystems. Let us denote the MaxEnt distribution of the joint system as p_ij and the MaxEnt distributions of the subsystems as u_i and v_j. In [21] it was shown that the MaxEnt distributions involved fulfill the composition rule that can be formulated as

(1/p_ij) U_q(P) = (1/u_i) U_q(U) ⊗_q (1/v_j) U_q(V),    (10)

where x ⊗_q y = [x^{1−q} + y^{1−q} − 1]_+^{1/(1−q)} (with x, y > 0) is the so-called q-product [40], and U_q(P) = ( Σ_{ij} p_ij^q )^{1/(1−q)}, and similarly for U and V. For q → 1, (10) reduces to p_ij = u_i v_j. The reverse is true as well. By re-expressing (10) in terms of the escort distributions P_ij(q), U_i(q) and V_j(q) (associated with p_ij, u_i and v_j, respectively) as

P_ij(q)/p_ij = U_i(q)/u_i + V_j(q)/v_j − 1,    (11)

and using p_ij = u_i v_j, we obtain U_i(q) = u_i, V_k(q) = v_k (for all i, k). The latter have a unique solution [36], q = 1. This implies that q parametrizes correlations between the system's subsystems, since only for q = 1 is the Pearson correlation coefficient zero.

As discussed, e.g., in [41], a monotonic function of an entropic functional gives the same MEP distribution and redefines only the Lagrange multipliers, but it does not change the actual form of the distribution. This can be interpreted as a sort of gauge invariance S(P) → f(S(P)). Finally, let us mention that the q = 1 case corresponds to uncorrelated MEP distributions for disjoint systems, for which we get a stronger version of the system independence axiom [21]:
SJ4_SSI Strong system independence: Whenever two subsystems of a system are disjoint, we can treat the subsystems in terms of independent distributions.

The solution is then

S^f(P) = f( exp( Σ_i p_i log(1/p_i) ) ),

which is equivalent (through the Daróczy mapping) to Shannon entropy, as expected. In this case, the composition rules in Eqs. (10) and (11) reduce to the composition rule of independent systems, i.e., p_ij = u_i v_j.

On the other hand, if we require that the entropy be of trace form [13, 27], i.e., S(P) = Σ_i g(p_i), then we get f(x) = log_q(x) and we end up with the class of THC entropies

S_q^{log_q}(P) = Σ_i p_i log_q(1/p_i).
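The gauge invariance S(P) → f(S(P)) mentioned above is easy to probe numerically. The following sketch (our own; the energy levels, the target mean and q = 0.8 are arbitrary test values, and SciPy's generic SLSQP solver stands in for an analytic treatment) maximizes U_q(P) = (Σ_i p_i^q)^{1/(1−q)} and a monotonically transformed version of it under the same linear constraint of Eq. (8), obtaining the same MEP distribution.

import numpy as np
from scipy.optimize import minimize

q = 0.8
E = np.array([0.0, 1.0, 2.0, 3.0])    # illustrative energy levels
E_mean = 1.2                          # prescribed mean energy <E>

def U_q(p):
    return np.sum(p**q)**(1.0 / (1 - q))

def maxent(objective):
    cons = [{'type': 'eq', 'fun': lambda p: np.sum(p) - 1.0},
            {'type': 'eq', 'fun': lambda p: np.dot(p, E) - E_mean}]
    res = minimize(lambda p: -objective(p), x0=np.full(4, 0.25),
                   bounds=[(1e-9, 1.0)] * 4, constraints=cons, method='SLSQP')
    return res.x

p1 = maxent(U_q)                              # f = identity
p2 = maxent(lambda p: np.arctan(U_q(p)))      # f = arctan, strictly increasing
print(p1)
print(p2)
print("same MEP distribution:", np.allclose(p1, p2, atol=1e-4))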
V. CONCLUSIONS

Here we have reformulated the Shannon–Khinchin axioms of information theory in terms of the generalized arithmetics of Kolmogorov and Nagumo. Apart from the axiomatic structure itself, the novelty of this work is in showing that the general class of entropic functionals satisfying such SK axioms is identical with the Uffink class of entropies. Since the Uffink class is known to represent the general solution of the Shore–Johnson axioms of statistical-inference theory, both axiomatic systems have to be equivalent. We have shown that the Uffink functionals S_q^f are characterized by the Kolmogorov–Nagumo function f and a positive parameter q, where f determines the scaling behavior of the entropy for uniform distributions and q quantifies correlations of MEP distributions for disjoint subsystems. In passing, we note that the form (4) of the class S_q^f can also be found in the literature under the name strongly pseudo-additive (SPA) entropies [42] or Z-entropies [27].

The outlined entropic parallelism between information theory and statistical inference can serve as a good starting point for further research. In this context it would be particularly interesting to investigate how robust the aforementioned equivalence between the two axiomatic systems is and to assess the extent and consequences of a prospective breakdown. One might instigate such a breakdown by working, e.g., with more general constraints (non-linear constraints or scalings as in non-inductive inference) or by relaxing some of the presented axioms. In fact, it is well known that many complex systems do not satisfy the SK axioms, not even in our generalized sense [13, 28, 43]. By relaxing some of these axioms, one might gain further maneuvering space, allowing one to accommodate entropies of such systems as path-dependent and super-exponential systems or complex systems with non-trivial constraints.

Acknowledgements
P.J. and J.K. were supported by the Czech Science Foundation (GAČR), Grant 19-16066S. J.K. was also supported by the Austrian Science Foundation (FWF) under project I3073.

[1] S. Thurner, B. Corominas-Murtra and R. Hanel, Phys. Rev. E 96 (2017) 032124.
[2] C. Shannon, Bell Syst. Tech. J. 27 (1948) 379; 623.
[3] A.I. Khinchin, Mathematical Foundations of Information Theory (Dover Publications, New York, 1957).
[4] E.T. Jaynes, Phys. Rev. 106 (1957) 620.
[5] E.T. Jaynes, Phys. Rev. 108 (1957) 171.
[6] J.E. Shore and R.W. Johnson, IEEE Trans. Inf. Theor. 26 (1980) 26.
[7] J.E. Shore and R.W. Johnson, IEEE Trans. Inf. Theor. 27 (1981) 472.
[8] P. Jizba and T. Arimitsu, Ann. Phys. 312 (2004) 17.
[9] C. Tsallis, J. Stat. Phys. 52 (1988) 479.
[10] J. Havrda and F. Charvát, Kybernetika 3 (1967) 30.
[11] G. Kaniadakis, Physica A 365 (2006) 17.
[12] B.D. Sharma, J. Mitter and M. Mohan, Inf. Control 39 (1978) 323.
[13] R. Hanel and S. Thurner, Europhys. Lett. 93 (2011) 20006.
[14] P. Jizba and J. Korbel, Physica A 444 (2016) 808.
[15] T. Wada and H. Suyari, Phys. Lett. A 368 (2007) 199.
[16] P. Jizba and T. Arimitsu, Physica A 340 (2004) 110.
[17] S. Pressé, K. Ghosh, J. Lee and K.A. Dill, Phys. Rev. Lett. 111 (2013) 180604.
[18] C. Tsallis, Entropy 17 (2015) 2853.
[19] S. Pressé, K. Ghosh, J. Lee and K.A. Dill, Entropy 17 (2015) 5043.
[20] T. Oikonomou and G.B. Bagci, Phys. Rev. E 99 (2019) 032134.
[21] P. Jizba and J. Korbel, Phys. Rev. Lett. 122 (2019) 120601.
[22] J. Uffink, Stud. Hist. Phil. Mod. Phys. 26 (1995) 223.
[23] S. Abe, Phys. Lett. A 271 (2000) 74.
[24] H. Suyari, IEEE Trans. Inf. Theor. 50(8) (2004) 1783.
[25] V.M. Ilić and M.S. Stanković, Physica A 411 (2014) 138.
[26] P. Jizba and J. Korbel, Entropy 19 (2017) 605.
[27] P. Tempesta, Proc. Royal Soc. A 472 (2016) 20160143.
[28] J. Korbel, R. Hanel and S. Thurner, New J. Phys. 20 (2018) 093007.
[29] P. Jizba and J. Korbel, Phys. Rev. E 100 (2019) 026101.
[30] M. Czachor, Quantum Stud.: Math. Found. 3 (2016) 123; arXiv:1412.8583 [math-ph].
[31] J. Naudts, Generalised Thermostatistics (Springer, London, 2011).
[32] A. Kolmogorov, Atti della R. Accademia Nazionale dei Lincei 12 (1930) 388.
[33] M. Nagumo, Japan. J. Math. 7 (1930) 71.
[34] M. Czachor and J. Naudts, Phys. Lett. A 298 (2002) 369.
[35] A.N. Kolmogorov, Foundations of Probability (Chelsea Publishing Company, New York, 1950).
[36] C. Beck and F. Schlögl, Thermodynamics of Chaotic Systems: An Introduction (Cambridge University Press, Cambridge, 2008).
[37] C. Tsallis, R.S. Mendes and A.R. Plastino, Physica A 261 (1998) 534.
[38] J. Aczél and J. Dhombres, Functional Equations in Several Variables (Cambridge University Press, Cambridge, 1989).
[39] M.A. Rodríguez, A. Romaniega and P. Tempesta, Proc. R. Soc. A 475 (2018) 20180633.
[40] E.P. Borges, Physica A 340 (2004) 95.
[41] P. Jizba and J. Korbel, Phys. Rev. E 100 (2019) 026101.
[42] V. Ilić, A.M. Scarfone and T. Wada, arXiv:1905.10533.
[43] B. Corominas-Murtra, R. Hanel and S. Thurner, Proc. Natl. Acad. Sci. USA 112 (2015) 5348.
[44] Here and throughout we use the base of natural logarithms. Entropy thus defined is then measured in natural units (nats) rather than bits.