A superstatistical formulation of complexity measures
Jesús Fuentes & Octavio Obregón
Abstract
It is discussed how the superstatistical formulation of effective Boltzmann factors can be related to the concept of Kolmogorov complexity, generating an infinite set of complexity measures (CMs) for quantifying information. At this level the information is treated according to its background, which means that the CM depends on the inherent attributes of the information scenario. While the basic Boltzmann factor directly produces the standard complexity measure (SCM), it succeeds only in describing large-scale scenarios where the data components are not interrelated, thus adopting the behaviour of a gas. Scenarios in which the presence of sources and sinks of information cannot be neglected require a CM other than the one produced by the ordinary Boltzmann factor. We introduce a set of flexible CMs, without free parameters, that converge asymptotically to the Kolmogorov complexity but also quantify the information in scenarios with a reasonably small density of states. We prove that these CMs are obtained from a generalised relative entropy and we suggest why such measures are the only compatible generalisations of the SCM.
1 Introduction

Suppose we are given two binary strings of twenty bits and we are then asked how random they are. Without further analysis, the first string is simply the same short block repeated ten times, hence quantifying its randomness is rather meaningless. On the other hand, although the second string looks more complicated than the first one, it merely corresponds to the first twenty bits of the decimal part of π, therefore its description can be put into unchallenging terms as well. The more economic the description we can glimpse to resemble these binary objects, the closer we are to their actual Kolmogorov complexity [1–3], which is, roughly speaking, a measure of randomness that looks for the shortest possible program that halts and delivers the bit string in question; the shorter the program, the less random its output.

From a statistical viewpoint, however, the two bit strings above have the same probability, 2⁻²⁰, of being picked from the whole set of binary sequences of twenty bits. Even so, it is also interesting to ask for the probability with which we can pull a specific program, from a collection of programs, that prints out a bit string like the ones above. To be more clear, suppose that a set of programs is recorded in the memory of a computer but we ignore which state it is in. Suppose the only thing we know is the expected value of the length of the program the computer executes. Of course we can improve our chances of guessing if we are given more information. Suppose we additionally know the outcome itself; hence we are led again to the Kolmogorov complexity of the program involved. Yet we now have an additional component. From the Gibbs formulation of ensemble theory, we know that the probability that the computer in question is in state (program) x is quantified by p(x), as long as this distribution maximises a suitable entropy measure subject to some specific constraints that we shall discuss soon.

In this sense we are associating two different concepts that account for information, although each one by separate paths and with distinct interpretations: (1) the entropy, a functional which depends on a probability distribution (and sometimes on parameters), and (2) the Kolmogorov complexity, or algorithmic entropy, a descriptive quantity that depends on the object itself. The relation between these two concepts has been matter of substantial discussion [4–9], but here we want to adopt a different approach.

In this work we introduce the notion of superstatistics [10] as an option to extend algorithmic information theory to an infinite number of CMs. The proposal is rather simple: it consists of relating the concept of generalised entropy to its corresponding CM. Nonetheless, the entropy as a measure of information must satisfy the criterion of stability [11] to become eligible as an adequate tool for such purposes, which reduces the whole universe of generalised entropies to a small set. To our knowledge, there are only two parameter-free entropies that generalise Shannon's entropy and simultaneously fulfil the condition of stability [12]; in consequence these are good candidates to deal with information.
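One remark worth making concrete before proceeding: although the Kolmogorov complexity is uncomputable, any lossless compressor yields an upper bound on a string's description length. The following sketch is our own illustration, not part of the original argument; it contrasts a patterned string with incompressible noise using Python's zlib.

```python
import os
import zlib

def complexity_upper_bound(s: bytes) -> int:
    """Bits needed by a zlib description of s.

    Kolmogorov complexity is uncomputable; a lossless compressor only
    provides an upper bound on it (up to an additive constant for the
    decompressor itself).
    """
    return 8 * len(zlib.compress(s, level=9))

regular = b"01" * 5000      # a highly patterned string
noise = os.urandom(10000)   # pseudo-random bytes of equal length

print(complexity_upper_bound(regular))  # small: the pattern is short to describe
print(complexity_upper_bound(noise))    # near 8 * 10000: nothing to exploit
```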
Owing to this stability property, we base the discussion that follows on those entropy measures, namely H+(X) and H−(X), and their corresponding CMs, from now on identified as K+(X) and K−(X). Interestingly, we also arrive at K+(X) and K−(X) from the relative versions of H+(X) and H−(X), in turn interpreted as general representations of the Kullback-Leibler divergence [13] for Shannon's entropy.

Our discussion is organised as follows. In Sec. 2 we introduce the superstatistics framework and establish the concept of entropy in this context. We also formulate the connection between a generalised entropy and its respective CM. In Sec. 3 we analyse statistically the consequences driven by a generalised Kolmogorov complexity in terms of the entropies H+(X) and H−(X). As a result we find the effective complexities K+(X) and K−(X). Finally, in Sec. 4 we summarise our results and conclude.

2 Algorithmic superstatistics
The superstatistics approach [10] handles non-equilibrium macroscopic systems segmented into cells that manifest asymptotically stationary states with a spatiotemporally fluctuating intensive quantity, typically the inverse temperature β, which hardly varies on a long time scale. To each cell there is assigned a particular β distributed according to a piecewise continuous, normalisable probability density f(β). Inside each cell β is approximately constant and therefore there is essentially local equilibrium. At a global level, the system lies in a state out of equilibrium but sufficiently isolated from external upheavals, thus it behaves only slightly deviated from equilibrium. When the entire set of fluctuations is averaged, inasmuch as the sum converges, an effective Boltzmann factor is obtained, giving rise to generalised statistics. Moreover, given that the method relies on those normalisable distributions f(β), we can speak of an infinite set of possible generalised statistics.

To knit the pieces together, we are to think of these cells as individual programs x that cast an outcome and halt, suggesting that the entire system can be thought of as a universal computer U of general purpose; indeed, in a real scenario, the tasks that a computer performs are upshots of a big collection of different recipes stored in it. Accordingly, if we denote with |x| the length of the program x, then

    B(|x|) = ∫₀^∞ dβ f(β) e^{−β|x|},   β > 0,   (1)

is the effective Boltzmann factor related to the universal computer U. In the case that all the cells have the same β, the system can be considered as a single cell; in that case f(β) = δ(β − β₀), and therefore we have B(|x|) = e^{−β₀|x|}, which is the Boltzmann factor in the orthodox picture of statistical mechanics.

Yet the generalised statistics have to be normalisable over the whole domain of program lengths; for that reason the integral (partition function Z)

    ∫₀^∞ d|x| B(|x|)

must converge. Nevertheless, the partition function is not always computable: for instance, in the particular case B(|x|) = e^{−γ|x|} (which is the ordinary Boltzmann factor) the integral exists for γ ≥ ln 2, although it is uncomputable and partially random there, as proved by Tadaki [14]. Exploring the (un)computability of Z in the general case constitutes a mathematical challenge beyond the scope of this work, and we shall omit such discussion.

There might be circumstances in which we are given an effective Boltzmann factor but ignore the generating distribution; in that case we can compute it by reversing (1) through the inverse Laplace (Bromwich) transform

    f(β) = (1/2πi) ∫_{|x|−i∞}^{|x|+i∞} d|x′| B(|x′|) e^{β|x′|},   (2)

we shall remark, however, that under this integral transformation f(β) is not univocally determined by the effective Boltzmann factor B(|x|).
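To make Eq. (1) concrete, the sketch below (ours) averages the ordinary Boltzmann factor over a Gamma-distributed f(β), the family that reappears in Sec. 3, and checks the result against the known closed form (1 + pβ₀|x|)^{−1/p}. The shape value p = 0.3 and mean β₀ = 1 are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

p, beta0 = 0.3, 1.0            # arbitrary shape and mean inverse temperature
a, scale = 1.0 / p, p * beta0  # Gamma parameters: mean = a * scale = beta0

def effective_boltzmann(length: float) -> float:
    """B(|x|) = ∫ f(β) e^{-β|x|} dβ: the ordinary factor averaged over f(β)."""
    integrand = lambda b: gamma.pdf(b, a, scale=scale) * np.exp(-b * length)
    value, _ = quad(integrand, 0.0, np.inf)
    return value

for length in [1.0, 5.0, 20.0]:
    closed_form = (1.0 + p * beta0 * length) ** (-1.0 / p)
    print(length, effective_boltzmann(length), closed_form)  # the two columns agree
```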
We are now to connect these concepts with the algorithmic entropy. As we have previously discussed, the core of the superstatistics approach is the effective Boltzmann factor B(|x|), whose construction rests on a probability distribution well-nigh customised for a system of particular characteristics. In practice, this serves as a pivot to sprout generalised statistics in which the related fundamental quantities (such as entropy, free energy, etc.) must be rewritten in the frame of the new scheme.

Of special interest is the general expression for the entropy in superstatistics. Before giving its formal definition, it is important to state that we are to consider only entropy measures of the form

    H(X) = Σ_{x∈X} h(p(x)),   (3)

where h(p(x)) is known as the entropic form, abbreviated as h(x), and p(x) is the probability distribution according to which the program x is distributed over a set X.

As it has been nicely shown in [15], one can compute the entropic form as

    h(x) = ∫₀^x dy (α + |y| − |y|/|y*|),   (4)

where the length |·| is the inverse function of (1), with minimum value |·*| if any, and α is a constant that has to be identified from the condition h(1) = 0. It is worth mentioning that the latter formula comes from a MaxEnt programme; in that regard one should expect that if |x| satisfies formulae (1) and (2) then it maximises (4).

To proceed, we need to establish the following assumptions: the integral (4) exists and can be computed, and the entropy (3) can be written according to the ansatz

    H(X) = −Σ_{x∈X} p(x) Λ(p(x)),

where Λ(x) is an effective logarithm whose corresponding inverse function is an effective exponential ǫ(x), such that Λ(ǫ(x)) = ǫ(Λ(x)) = x, subject to the conditions Λ(1) = 0 and Λ′(1) = 1; see Appendix A. Throughout this work all logarithms, effective or not, are base 2 unless otherwise specified.

2.1 A superstatistical measure of complexity

In turn, we shall examine some of the aspects conveyed by the superstatistical formulation of entropy and its consequences in the measurement of complexity. To this aim, henceforth we consider only recursive probability distributions, that is, distributions computable with a Turing machine U, which we assume runs over a prefix-free domain X. Recall that a set of strings X is prefix-free if no string x ∈ X is a prefix of another string x′ ∈ X.

There is a number of approaches [16, 17] to measure the randomness of an object x. But in particular there is an uncomputable, though conceptually riveting, measure known as the Kolmogorov complexity [1–3]. It is formally defined as follows:
Definition 2.1.
Kolmogorov complexity. Let U be a prefix-free Turing machine; the complexity of the string y with respect to U is determined as

    K_U(y) = min_x { |x| : U(x) = y },

that is, the minimum possible length over all programs x with the halt property whose outcome is y.

As it was argued in Sec. 1, the quantity K_U(y) has an intuitive but profound meaning. If a person describes the recipe for beef goulash to another person in such a way that the latter cannot make a different interpretation of the directions for the correct realisation of the meal, then the number of bits in that communication constitutes an upper bound on K_U(y).

Nonetheless, we now want to pursue a partially different approach regarding Def. 2.1. Instead of considering the complexity associated with a program x, we are interested in the probability p(x) associated with that program, i.e. we look for the way this (minimum-length) program is distributed over the domain X of programs that can achieve a specific outcome y. Thus, rather than K_U(y), from now on we refer to this quantity as K(X) ≡ K_U(p(x)).

Both K_U(y) and K(X) are measures of information. The first one strictly arises from combinatorial arguments, while the latter is a purely statistical measure of complexity that can be thought of as the average rate at which information is extracted from a combinatorial trial. Since the second measure of complexity depends on how the program x is distributed according to the law p(x), there can be a number of ways in which such a probability distribution can be maximised, depending on the entropy and maybe on some inherent constraints that are typically related to the special characteristics of a system.

Moreover, parallel to the discussion in [18], the complexity K(X) can be weighted according to a function ϕ(K(X)) that quantifies the cost of managing specific rates of complexity. The Nagumo-Kolmogorov function ϕ is not arbitrary but depends on the entropy functional that maximises the distribution p(x), such that

    0 ≤ ϕ⁻¹( Σ_{x∈X} p(x) ϕ(K(x)) ),   Σ_{x∈X} h(x) ≤ ϕ⁻¹( Σ_{x∈X} p(x) ϕ(K(x)) ),   (5)

where, as shown in [19, 20], the complexity K(x) is indeed related to the Chaitin formulation of complexity [16] according to the formula K(x) = −Λ(m(x)) + O(1), with

    m(y) = Σ_{x∈X} { 2^{−|x|} : U(x) = y },   (6)

that is, m is the probability that y is the output of a universal Turing machine U running over the programs X. The summation over the whole set of programs in (6) can be simplified by reasoning that there is only one program x in X whose outcome is y; to see this, imagine n programs x₁, …, x_n, all having the same outcome y, yet we are only interested in the shortest one, given that K regards the minimum possible description of y. Moreover, in the hypothetical case that X contains n minimal programs x₁ = x₂ = · · · = x_n, all printing the same output y, then |x| > ln n. Under these considerations, (5) is equivalently expressed as

    Σ_{x∈X} h(x) ≤ ϕ⁻¹( Σ_{x∈X} p(x) ϕ(|x| + O(1)) ),

consequently, we are entitled to write the following relation:

    0 ≤ ϕ⁻¹( Σ_{x∈X} p(x) ϕ(K(x)) ) − Σ_{x∈X} h(x) ≤ ϕ(K(X)),   (7)

this expression constitutes a theorem, and it tells us that the entropy and the complexity, as measures of information, are truly connected when the complexity is treated from a statistical viewpoint.
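To ground Def. 2.1 and the universal probability m(y) of Eq. (6), here is a toy enumeration over a hypothetical finite machine (our own construction; a real universal machine cannot be exhausted this way, and the lookup table below merely stands in for U over a prefix-free domain).

```python
import math

# Hypothetical prefix-free machine: a finite lookup table standing in for U.
# The domain {"0", "10", "110", "111"} is prefix-free.
U = {"0": "yyyy", "10": "yyyy", "110": "z", "111": "w"}

def m(y: str) -> float:
    """Universal-probability analogue: m(y) = sum of 2^{-|x|} over U(x) = y."""
    return sum(2.0 ** -len(x) for x, out in U.items() if out == y)

def K(y: str) -> int:
    """Length of the shortest program printing y (Def. 2.1, toy version)."""
    return min(len(x) for x, out in U.items() if out == y)

print(m("yyyy"), K("yyyy"))   # 2^-1 + 2^-2 = 0.75; shortest program has length 1
print(-math.log2(m("yyyy")))  # ≈ 0.415: -log2 m(y) tracks K(y) up to O(1)
```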
Not only that, the latter relation comprises the coding theorem formulated by Shannon [21], indicating that the entropy provides the minimum rate at which the complexity can be expressed. Some of the consequences conveyed by the relation (7), in terms of generalised entropies, shall be surveyed in the following section.

3 Generalised complexity measures

As it has been discussed earlier, the connection between superstatistics and the algorithmic formulation of information theory has the advantage of quantifying data in generalised scenarios with non-stationary information fluxes. That is the case of a theoretical computer partitioned into modules (or cells) that can be randomly accessed depending on the tasks to be executed. Each of these modules will have an individual expected algorithmic running time, such that the global execution time is distributed over the whole collection of modules that participate in a specific task. However, the way in which the global algorithmic running time is distributed over the modules may vary according to the computer's configuration. For instance, if the collection of modules is completely independent, a probability distribution maximised through Shannon's entropy may describe such a scenario. This is merely a special situation, though: many applications lead to non-standard distributions, and the necessity of a generalised entropy becomes evident.

The Shannon case occurs in computing scenarios in which module-module interactions become negligible in the presence of a high density of programs (states, in physics). While at this level Shannon's entropy suffices to maximise the underlying distribution, the story is not the same in case the density of programs is reasonably small, given that the interactions cannot be entirely disregarded. In this case, Shannon's entropy needs nonlinear correction terms to account for such interactions.

From the superstatistics viewpoint, it has been studied in [22, 23] that this kind of system can be well characterised via Gamma-like distributions conveying shape parameters that a posteriori can be identified with a generic probability distribution p(x):

    f±_{p(x)}(β) = [1 / (β_{p(x)} Γ(1/p(x)))] (β/β_{p(x)})^{±(1−p(x))/p(x)} exp(−β/β_{p(x)}),

now the integral in (1) can be performed with these distributions to obtain a pair of effective Boltzmann factors

    B±_{p(x)}(|x|) = (1 ± p(x) β |x|)^{∓1/p(x)}.   (8)

Following the steps described in Sec. 2, it can be shown that substituting the inverse of B±_{p(x)}(|x|) into formulae (4) and (3), in that order, one obtains the two entropy measures

    H+(X) = −Σ_{x∈X} p(x) ln+(p(x)),   H−(X) = −Σ_{x∈X} p(x) ln−(p(x)),   (9)

where the effective logarithms are defined as ln+(ξ) ≡ −(1 − ξ^ξ)/ξ and ln−(ξ) ≡ −(ξ^{−ξ} − 1)/ξ for ξ ∈ (0, 1], so that

    H+(X) = −Σ_{x∈X} Σ_{k∈N} [p(x) ln p(x)]^k / k!,   H−(X) = −Σ_{x∈X} Σ_{k∈N} (−1)^{k+1} [p(x) ln p(x)]^k / k!,   (10)

remarking that in general there is a region identified with physical phenomena slightly out of equilibrium [24–26]. We are to focus our analysis on the latter region, since the limiting case of Shannon is already known.

Unlike other non-extensive entropies, the functionals (9) do not depend on free parameters but only on the probability distribution.
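The closed forms behind Eqs. (9) and (10) are straightforward to evaluate. The sketch below (ours, using natural logarithms inside the series for simplicity) computes H± for a small distribution and checks them against the truncated series and the Shannon leading term.

```python
import math
import numpy as np

def H_plus(p):
    """H+(X) = sum(1 - p^p), i.e. -sum p ln+ p."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(1.0 - p ** p))

def H_minus(p):
    """H-(X) = sum(p^-p - 1), i.e. -sum p ln- p."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p ** -p - 1.0))

def H_series(p, variant="+", kmax=30):
    """Truncated series (10); both variants reduce to Shannon at leading order."""
    total = 0.0
    for x in p:
        t = x * math.log(x)
        for k in range(1, kmax + 1):
            sign = 1.0 if variant == "+" else (-1.0) ** (k + 1)
            total -= sign * t ** k / math.factorial(k)
    return total

dist = [0.6, 0.3, 0.1]
shannon = -sum(x * math.log(x) for x in dist)
print(H_plus(dist), H_series(dist, "+"))   # the two values agree
print(H_minus(dist), H_series(dist, "-"))  # the two values agree
print(shannon)                             # leading-order term of both
```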
Actually, the lack of parameters and their asymptotic behaviour grant both entropies full stability, see [12], which is an essential attribute for managing information either statistically or algorithmically, as we are to show now.

3.1 Generalised coding theorems

The superstatistical formulation of entropy can account for systems out of equilibrium. We have also discussed how the entropy and the Kolmogorov complexity, as measures of information, relate to each other through (7). In turn, we are to translate those arguments to the entropies (9); thus we shall formally state a noiseless coding theorem in view of H±(X).
Theorem 1. Generalised noiseless coding theorem. Let the Nagumo-Kolmogorov function be ϕ(x) = x; then the expected lengths L± = Σ_{x∈X} p(x) |x|± satisfy

    L± ≥ H±(X),

with equality iff |x|*± = −ln± p(x) for every x in X.

Proof. Note that the difference

    L± − H±(X) = Σ_{x∈X} p(x) |x|± + Σ_{x∈X} p(x) ln± p(x) = Σ_{x∈X} p(x) (|x|± + ln± p(x)) ≥ 0,   (11)

directly implies that |x|± ≥ −ln± p(x), for the reason that every |x|± is an integer. Hence, the equality is attained iff the individual lengths |x|± = |x|*± are optimal.

What Theorem 1 tells us is that the minimum rate of data compression that can be accomplished by a codification process is bounded from below by the entropy scale that characterises the statistics of the system involved. This result has been deeply studied in [27], and it conforms a cornerstone for our current purposes, namely for the statement of the following theorem.
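Before stating it, a quick numerical sanity check of Theorem 1 may help (our sketch): with real-valued optimal lengths |x|*± = −ln± p(x) the bound is saturated, while rounding the lengths up to integers keeps L± above H±.

```python
import math

def ln_plus(x):
    return -(1.0 - x ** x) / x

def ln_minus(x):
    return -(x ** -x - 1.0) / x

dist = [0.5, 0.25, 0.125, 0.125]

for ln_eff, name in [(ln_plus, "+"), (ln_minus, "-")]:
    H = -sum(p * ln_eff(p) for p in dist)                  # H±(X)
    L_opt = sum(p * (-ln_eff(p)) for p in dist)            # optimal |x|* = -ln± p
    L_int = sum(p * math.ceil(-ln_eff(p)) for p in dist)   # integer code lengths
    print(name, H, L_opt, L_int)  # L_opt equals H; L_int stays >= H
```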
Theorem 2. Let p(x) be a recursive probability distribution. For a linear cost function ϕ(x) = x, the entropy measures H±(X) induce the existence of effective complexities K+(X) and K−(X) such that

    0 ≤ Σ_{x∈X} p(x) K±(x) − H±(X) ≤ K±(X),

here K+(X) and K−(X) are interpreted, respectively, as average lower and upper bounds on the statistical complexity K(X).

Proof. The inequality at the left implies that H±(X) ≤ Σ_{x∈X} p(x) K±(x), which is assured by Theorem 1 given that K±(x) ∼ |x|, attaining the equality as long as K±(x) = |x|*±. On the other hand, to prove the second inequality suppose that K±(x) = c′|x|*± with c′ ≥ 1; since K±(x) + O(1) = |x|*±, then

    c′ Σ_{x∈X} p(x) |x|*± + Σ_{x∈X} p(x) ln± p(x) ≤ c′ |X|*±,

regrouping terms on both sides, we get

    c′ Σ_{x∈X} (1 − p(x)⁻¹) p(x) ln± p(x) ≤ Σ_{x∈X} p(x) ln± p(x),

since p(x) < 1, it follows that 1 − p(x)⁻¹ < 0, hence the inequality is true, while the equality holds for p(x) = 1. Therefore we have the theorem.

As a remark, the effective K±(X) here are not combinatorial measures of information but statistical CMs, and they shall not be directly interpreted as descriptive bounds on the Kolmogorov complexity as stated in Def. 2.1; rather, what the measures K±(X) quantify is an average rate of complexity in agreement with the information measured by the entropies H±(X).

As an example, consider the probability distribution

    p(x) = { 0.y if x = x₁;  1 − 0.y if x = x₂ },

where y is the binary representation of a number 0.y between 0 and 1. For the entropy measure H+(X) we have

    0 ≤ (c′ − 1)(−0.y ln+ 0.y − (1 − 0.y) ln+(1 − 0.y)) ≤ c′(−ln+ 0.y − ln+(1 − 0.y)),

but −x ln± x ≤ −ln± x, with equality if x = 1; then from the expression above we get

    (c′ − 1)(−ln+ 0.y − ln+(1 − 0.y)) ≤ c′(−ln+ 0.y − ln+(1 − 0.y)),

and analogously for H−(X).

3.2 Kullback-Leibler divergence as a complexity measure

Sometimes it might be of interest how a given probability distribution differs from another one, typically a prior with respect to a trial distribution. In the context of information this leads to defining the entropy of the distribution p relative to another distribution q, such that

    H(p‖q) = −Σ_{x∈X} p(x) Λ(p(x)) + Σ_{x∈X} p(x) Λ(q(x)),   (12)

which is a generalisation of the Kullback-Leibler divergence [13].

The expression in (12) can be interpreted as a measure of information gain [28, 29]. This is fairly intuitive, since q is known as the prior in Bayesian probability theory and conveys all the initial speculations about something before performing any observation. Yet the prior may haul redundancies; for example, if q(x) = 1/dim(X) for all x, then we are led again to the entropy up to a constant.

Nonetheless, as shown in [7], the prior distribution q can induce interesting results. Imagine that the prefix Turing machine U runs the program x, delivers the outcome y and halts, which is expressed in symbols as U(x) = y. In this respect, we arrive at a generalisation of Eq. (6):

    q(y) = Σ_{x∈X} { ǫ^{−β|x|} : U(x) = y },   (13)

acting as a mirror of p ∼ ǫ^{−β|x|} over the set N.

Indeed, using this prior and following the same arguments given in [7], we are to show that one could think of (12) as a generalisation of a superstatistical algorithmic entropy. Suppose now that we are particularly interested in those programs whose output is the string s; hence the auxiliary distribution that allows us to select that sort of programs is

    p_y(s) = { 1 if y = s;  0 otherwise },   (14)

then we compute the entropy of p_y(s) relative to (13) to obtain

    H(p_y‖q) = −Σ_{l∈N} p_y(l) Λ(p_y(l)) + Σ_{l∈N} p_y(l) Λ( Σ_{x∈X} { ǫ^{−β|x|} : U(x) = y } ) = K_U(y) + Λ(Z),   (15)

where Z = Σ_{x∈X} ǫ^{−β|x|} is the partition function, and the algorithmic entropy reads

    K_U(y) = −Λ( Σ_{x∈X} { ǫ^{−β|x|} : U(x) = y } ),   (16)

implying that the relative entropy (15) is a generalisation of the algorithmic entropy in Def. 2.1, cf. [4, 5, 30].

For instance, when Eq. (15) is put into terms of Λ(x) = ln(x) and ǫ^x = e^x, one obtains the algorithmic entropy (parallel to Shannon's entropy) reported by Baez [7], namely

    K_U(y) = −ln( Σ_{x∈X} { e^{−β|x|} : U(x) = y } ).
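The divergence (12) is easy to tabulate for the effective logarithms. The sketch below (ours) evaluates H(p‖q) for Λ ∈ {ln, ln+, ln−} and verifies that it vanishes when p = q; for the ordinary logarithm the expression, as written in (12), equals Σ p ln(q/p), i.e. the Kullback-Leibler divergence up to overall sign.

```python
import math

def ln_eff(x, variant):
    if variant == "+":
        return -(1.0 - x ** x) / x
    if variant == "-":
        return -(x ** -x - 1.0) / x
    return math.log(x)  # ordinary logarithm

def divergence(p, q, variant):
    """Eq. (12): H(p||q) = -sum p Λ(p) + sum p Λ(q)."""
    return sum(pi * (ln_eff(qi, variant) - ln_eff(pi, variant))
               for pi, qi in zip(p, q))

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
for v in ("ln", "+", "-"):
    print(v, divergence(p, q, v), divergence(p, p, v))  # second column is 0
```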
Returning to the Baez expression above: note that, according to Def. 2.1, the complexity of the shortest program x ∈ X that prints y and halts is obtained from it for β = ln 2, yielding K(y) = |x| + O(1).

The expressions (15) and (16) can now be inserted into the structure of the generalised entropies H+(X) and H−(X). To simplify the analysis we use their series representations (10). In that regard, the entropy H+(X) of a distribution p_y(s) relative to a prior q(y), as defined in (13), becomes

    H+(p_y‖q) = −Σ_{l∈N} Σ_{k∈N} (1/k!) [p_y(l) ln p_y(l)]^k + Σ_{l∈N} Σ_{k∈N} (1/k!) [ p_y(l) ln( Σ_{x∈X} { e^{−β|x|} : U(x) = y } ) ]^k
              = −Σ_{k∈N} (1/k!) ln^k( Σ_{x∈X} { e^{−β|x|} : U(x) = y } ) + Σ_{k∈N} (1/k!) ln^k( Σ_{x∈X} e^{−β|x|} )
              = K+(y) + Σ_{k∈N} (1/k!) ln^k(Z),   (17)

where K+(y) is the effective algorithmic entropy that generalises the one given in [7] and consequently in [4, 5, 30].

Likewise, computing the entropy H−(X) of a distribution p_y(s) relative to a prior q(y) yields

    H−(p_y‖q) = −Σ_{l∈N} Σ_{k∈N} ((−1)^{k+1}/k!) [p_y(l) ln p_y(l)]^k + Σ_{l∈N} Σ_{k∈N} ((−1)^{k+1}/k!) [ p_y(l) ln( Σ_{x∈X} { e^{−β|x|} : U(x) = y } ) ]^k
              = −Σ_{k∈N} ((−1)^{k+1}/k!) ln^k( Σ_{x∈X} { e^{−β|x|} : U(x) = y } ) + Σ_{k∈N} ((−1)^{k+1}/k!) ln^k( Σ_{x∈X} e^{−β|x|} )
              = K−(y) + Σ_{k∈N} ((−1)^{k+1}/k!) ln^k(Z),   (18)

where K−(y) is the effective algorithmic entropy related to the entropy H−(X).

There is a curious aspect that we would like to highlight regarding the effective algorithmic entropies K+ and K−, derived from (17) and (18) respectively. Indeed, they do not only generalise the algorithmic entropy in [7], but at the same time are special cases of the relative entropies H+(p_y‖q) and H−(p_y‖q) and satisfy Theorem 2, suggesting that the entropies (9) are fundamental measures of information.

Finally, we have discussed that the algorithmic entropies K+ and K− differ from the standard case K in a regime of low-density programs, whereas the three measures coincide asymptotically for bigger chunks of data. In principle these functionals are uncomputable; still, let us make use of a numerical trick to give some hint of their behaviour. To illustrate this, we have generated Fig. 1, where one can observe that there is a region in which the algorithmic entropy K− would account for a more economical description than the two other measures, yet as |x| (usually measured in bytes) grows the three measures tend to coincide.

[Figure 1: Numerical comparison of the effective algorithmic entropies K_n, K+_n and K−_n in different data-size regimes |x| (axes: |x| in units of digital information versus algorithmic entropy). Note how the three measures converge as |x| grows, while they differ from each other at extremely small chunks of information, below 8 bits. As of today, there are still applications depending on codes of such lengths; that is the case of the American Standard Code for Information Interchange (ASCII), which is an 8-bit code.]

We cannot assess the definite impact of their actual differences on the description of complex objects, but can complex structures whose number of components is reasonably small have a different description than the one estimated by the standard theory?
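A rough way to reproduce the trend of Fig. 1 is to substitute p = 2^{−|x|} into each effective logarithm and convert to bits. This is our reading of the numerical trick mentioned above, so the exact normalisation of the plotted curves is an assumption; the qualitative picture (separation at small |x|, convergence as |x| grows) is what the sketch shows.

```python
import numpy as np
import matplotlib.pyplot as plt

def ln_plus(x):
    return -(1.0 - x ** x) / x

def ln_minus(x):
    return -(x ** -x - 1.0) / x

lengths = np.arange(1, 33)   # |x| in bits
p = 2.0 ** -lengths          # substitute p = 2^{-|x|}

K = lengths.astype(float)                # standard: -log2 p = |x|
K_plus = -ln_plus(p) / np.log(2.0)       # effective measures, in bits
K_minus = -ln_minus(p) / np.log(2.0)

plt.plot(lengths, K, label="K")
plt.plot(lengths, K_plus, label="K+")
plt.plot(lengths, K_minus, label="K-")
plt.xlabel("|x| [bits]")
plt.ylabel("algorithmic entropy")
plt.legend()
plt.show()
# The curves visibly separate below ~8 bits and merge as |x| grows.
```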
That is the case ofthe American Standard Code for Information Interchange (ASCII), which is an 8-bit code.Formally speaking, the connection between the superstatistical framework and the Kol-mogorov complexity comes immediately by following the relation (7), which is nothing buta generalisation of the statement formulated in [4, 5, 30], which associates the Shannon’sentropy with the SCM. Yet, from our generalised statistical viewpoint, any measure ofcomplexity can be interpreted as an average rate of data compression that usually willdiffer from the one appraised by the standard theory.It does not mean, even though, that all CMs are truly realisable. Note that the ef-fective Boltzmann factor (1) is computed from a probability distribution that may conveyfree parameters. Although this parametric structure grants the entropies (4) of enoughflexibility to be potentially adapted to any circumstance. But there is a tradeoff with thefunctional’s stability, which is a condition that shall indistinctly be satisfied in order toqualify as an information measure and, in consequence, as a CM.It has been shown in [12] that the generalised entropies H ± ( X ) absolutely fulfil the13riteria of stability as the Shannon’s entropy does [11]. Besides, these are entropies thatdo not depend on free parameters and resemble the Shannon’s entropy in the regime ofhigh-density states. Both formulations state individually a coding theorem (see Theorem 1and [27]) and also correspond to the effective CMs K ± ( X ), as assured by Theorem 2. Evenmore, what Theorem 2 confirms is that the CMs K ± ( X ) inherit the attributes adjudicatedto the entropies H ± ( X ), therefore they qualify genuinely as CMs of information.To reinforce our arguments, we have also showed that pursuing a generalisation of thenotion of relative entropy (12), permits the reconstruction of the complexities K ± ( X ), asrepresented in Eqs. (17) and (18). These results are equivalent to our previous computa-tions via Theorem 2.As a final remark, the complexities K ± ( X ) converge asymptotically to the usual Kol-mogorov measure, while they reflect differences with the standard theory in case that theamount of information is handled in small chunks of data. Certainly K − could imply that,when dealing with short strings, the average rate of complexity is susceptible to furthercompression (or a briefer description) than the one specified by K . We point out thatthere is a possible route to generalise the algorithmic information theory on the basis ofsuperstatistics, where H ± ( X ) are the unique entropies resembling the Shannon’s theorywhile accounting for non-equilibrium phenomena without using parameters. A Generalised logarithms and stretched exponentials
Let ǫ : R → R be a stretched exponential. The functions Λ : R → R fulfilling the conditions Λ(1) = 0 and Λ′(1) = 1, such that Λ(ǫ(x)) = ǫ(Λ(x)) = x, are called generalised (or effective) logarithms. Their series representation can usually be put into terms of the fundamental logarithm functions ln or log. For a more detailed discussion than the one presented here, see Ref. [31]. We limit ourselves to presenting the basic structure of the effective logarithms ln+ and ln−, and their inverses.

We have the functions

    ln+(x) ≡ −(1 − x^x)/x,   ln−(x) ≡ −(x^{−x} − 1)/x,   (19)

for x ∈ [0, 1]. The functions ln± do not fulfil the three laws of logarithms. Yet they can be expanded in series as

    ln+(x) = ln x + (1/2!) x ln² x + (1/3!) x² ln³ x + (1/4!) x³ ln⁴ x + · · ·,   (20)

and

    ln−(x) = ln x − (1/2!) x ln² x + (1/3!) x² ln³ x − (1/4!) x³ ln⁴ x + · · ·,   (21)

note that in both cases the first term leads the series, while higher-order terms become subdominant as x → 0.
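The expansions (20) and (21) can be verified directly; a short sketch (ours, truncating at twenty terms):

```python
import math

def ln_plus(x):
    return -(1.0 - x ** x) / x

def ln_minus(x):
    return -(x ** -x - 1.0) / x

def series(x, variant="+", kmax=20):
    """Expansions (20)-(21): sum over k of (±1)^{k+1} x^{k-1} ln^k(x) / k!."""
    total = 0.0
    for k in range(1, kmax + 1):
        term = x ** (k - 1) * math.log(x) ** k / math.factorial(k)
        total += term if variant == "+" else (-1) ** (k + 1) * term
    return total

for x in (0.1, 0.5, 0.9):
    # closed form and truncated series agree in each pair of columns
    print(x, ln_plus(x), series(x, "+"), ln_minus(x), series(x, "-"))
```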
This peculiar flexibility grants the entropies (9) the simultaneous character of accounting for non-equilibrium phenomena in the low-probability regime while preserving a well-defined thermodynamical limit.

The corresponding stretched exponentials of (19) do not possess a closed form; in this case we make use of a numerical representation. These functions have been constructed as

    exp±(x) ≡ exp(−x) Σ_{j=0}^∞ a±(j) x^j,   a±(j) ∈ R,   (22)

the first nine coefficients a±(j) are given in Table 1.

    j     a+(j)           a−(j)
    0     1               1
    1     0.0228963       0.0147449
    2     −0.709322       0.3725
    3     0.905157        −0.317048
    4     −0.546751       0.16867
    5     0.186358        −0.0675544
    6     −0.0362676      0.0166679
    7     0.00373467      −0.00211934
    8     −0.000157095    0.000105402

Table 1: a±(j) coefficients.
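The Table 1 fit is easy to exercise. In the sketch below (ours) we feed exp± the nonnegative value x = −ln±(p) and recover p; that exp± inverts x ↦ −ln± (rather than ln± itself) is our reading of Eq. (22), suggested by the decaying prefactor e^{−x} and confirmed numerically to a few decimal places on (0, 1).

```python
import math

A_PLUS = [1.0, 0.0228963, -0.709322, 0.905157, -0.546751,
          0.186358, -0.0362676, 0.00373467, -0.000157095]   # a+(j), j = 0..8
A_MINUS = [1.0, 0.0147449, 0.3725, -0.317048, 0.16867,
           -0.0675544, 0.0166679, -0.00211934, 0.000105402]  # a-(j), j = 0..8

def exp_eff(x, coeffs):
    """Eq. (22): exp±(x) = e^{-x} * sum_j a±(j) x^j (nine-term fit)."""
    return math.exp(-x) * sum(c * x ** j for j, c in enumerate(coeffs))

def ln_plus(x):
    return -(1.0 - x ** x) / x

def ln_minus(x):
    return -(x ** -x - 1.0) / x

for p in (0.2, 0.5, 0.8):
    print(p, exp_eff(-ln_plus(p), A_PLUS), exp_eff(-ln_minus(p), A_MINUS))
    # each row prints ~p twice: the fitted exp± undoes -ln±
```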
References

[1] A. N. Kolmogorov. Three approaches to the definition of the concept "quantity of information". Probl. Peredachi Inf., 1(1):3–11, 1965.

[2] R. J. Solomonoff. A preliminary report on a general theory of inductive inference. Technical report, Zator Company, 1960.

[3] Gregory J. Chaitin. On the simplicity and speed of programs for computing infinite sets of natural numbers. J. ACM, 16(3):407–422, July 1969.

[4] Gregory J. Chaitin. A theory of program size formally identical to information theory. J. ACM, 22(3):329–340, July 1975.

[5] Alexander K. Zvonkin and Leonid A. Levin. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys, 25(6):83–124, 1970.

[6] Kohtaro Tadaki. A statistical mechanical interpretation of algorithmic information theory: Total statistical mechanical interpretation based on physical argument. Journal of Physics: Conference Series, 201:012006, 2010.

[7] John Baez and Mike Stay. Algorithmic thermodynamics. Mathematical Structures in Computer Science, 22(5):771–787, 2012.

[8] Ming Li and Paul Vitányi. An Introduction to Kolmogorov Complexity and Its Applications, volume 3. Springer, 2008.

[9] Stefano Galatolo, Mathieu Hoyrup, and Cristóbal Rojas. Effective symbolic dynamics, random points, statistical behavior, complexity and entropy. Information and Computation, 208(1):23–41, 2010.

[10] C. Beck and E. G. D. Cohen. Superstatistics. Physica A: Statistical Mechanics and its Applications, 322:267–275, 2003.

[11] B. Lesche. Instabilities of Rényi entropy. J. Stat. Phys., 27(2):419–422, 1982.

[12] Nana Cabo Bizet, Jesús Fuentes, and Octavio Obregón. Generalised asymptotic classes for additive and non-additive entropies. EPL (Europhysics Letters), 128(6):60004, 2020.

[13] S. Kullback and R. A. Leibler. On information and sufficiency. Ann. Math. Statist., 22(1):79–86, 1951.

[14] Kohtaro Tadaki. A generalization of Chaitin's halting probability Ω and halting self-similar sets. Hokkaido Math. J., 31(1):219–253, 2002.

[15] Constantino Tsallis and Andre M. C. Souza. Constructing a statistical mechanics for Beck-Cohen superstatistics. Physical Review E, 67(2):026106, 2003.

[16] Gregory J. Chaitin. Exploring Randomness. Springer-Verlag London, 1st edition, 2001.

[17] A. Lempel and J. Ziv. On the complexity of finite sequences. IEEE Transactions on Information Theory, 22(1):75–81, 1976.

[18] L. L. Campbell. Definition of entropy by means of a coding problem. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 6(2):113–118, 1966.

[19] L. A. Levin. Universal sequential search problems. Problems Inform. Transmission, 9(3):115–116, 1973.

[20] Walter Kirchherr, Ming Li, and Paul Vitányi. The miraculous universal distribution. The Mathematical Intelligencer, 19(4):7–15, 1997.

[21] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, July 1948.

[22] O. Obregón. Superstatistics and gravitation. Entropy, 12:2067, 2010.

[23] O. Obregón. Generalized information and entanglement entropy, gravitation and holography. Int. J. Mod. Phys. A, 30(16):1530039, 2015.

[24] J. Fuentes, J. L. López, and O. Obregón. Generalised Fokker-Planck equations derived from non-extensive entropies asymptotically equivalent to Boltzmann-Gibbs. Preprint, 2020.

[25] A. Gil-Villegas, O. Obregón, and J. Torres-Arenas. Computer simulation of effective potentials for generalized Boltzmann-Gibbs statistics. Journal of Molecular Liquids, 248:364–369, 2017.

[26] J. L. López, O. Obregón, and J. Torres-Arenas. Thermodynamic geometry for a non-extensive ideal gas. Phys. Lett. A, 382:364, 2018.

[27] J. Fuentes and O. Obregón. Generalised noiseless coding theorems. Preprint, 2020.

[28] R. J. Solomonoff. A formal theory of inductive inference. Part I. Information and Control, 7(1):1–22, 1964.

[29] R. J. Solomonoff. A formal theory of inductive inference. Part II. Information and Control, 7(2):224–254, 1964.

[30] Gregory J. Chaitin. Algorithmic entropy of sets. Computers and Mathematics with Applications, 2(3):233–245, 1976.

[31] Rudolf Hanel, Stefan Thurner, and Murray Gell-Mann. Generalized entropies and logarithms and their duality relations. Proc. Natl. Acad. Sci. USA, 109(47):19151–19154, 2012.