Maximum-Entropy Weighting of Multi-Component Earth Climate Models
arXiv [physics.geo-ph], Aug. Climate Dynamics manuscript No. (will be inserted by the editor)
Maximum-Entropy Weighting of Multiple Earth Climate Models
Robert K. Niven
Abstract
A maximum entropy-based framework is presented for the synthesis of projections from multiple Earth climate models. This identifies the most representative (most probable) model from a set of climate models – as defined by specified constraints – eliminating the need to calculate the entire set. Two approaches are developed, based on individual climate models or ensembles of models, subject to a single cost (energy) constraint or competing cost-benefit constraints. A finite-time limit on the minimum cost of modifying a model synthesis framework, at finite rates of change, is also reported.
Keywords climate model · maximum entropy method · Boltzmann principle · thermodynamics · cost-benefit analysis · finite-time information limit

Robert K. Niven
School of Engineering and Information Technology, The University of New South Wales at ADFA, Northcott Drive, Canberra, ACT, 2600, Australia.
Currently at Institut Pprime, CNRS / Université de Poitiers / ENSMA, CEAT, 43 rue de l’Aérodrome, F-86036 Poitiers Cedex, France.
E-mail: [email protected]

A major challenge facing humanity is the possibility of climate change due to human and/or natural forcings, and how best to respond in a rational and informed manner. To this end, detailed global circulation models (GCMs) have been developed to predict the behaviour of the Earth climate system (atmosphere and oceans), involving solution of the continuity, Navier-Stokes, angular momentum and energy equations and constitutive relations over two- or three-dimensional domains, subject to various initial and boundary conditions [1, 2]. These are run interrogatively to yield climate projections – predictions as a function of future time – to examine various forcing and response scenarios. However, a serious difficulty for policy-makers is the promulgation of multiple models by different research groups, due to different modelling priorities, assumptions and input parameters, and inherent difficulties in the construction of climate models, especially in the handling of coupled phenomena (e.g. humidity [3]) and the need to dramatically reduce their computational complexity, necessitating a turbulence closure scheme. Even with the same (or similar) inputs, different models can provide significantly different climate projections [4]. A rational framework for the synthesis of such projections – which operates in a transparent and fully defensible manner – is urgently required, to avoid the lack of objectivity of seemingly ad hoc amalgamations of projections from different groups.

Over the past century, maximum entropy (MaxEnt) methods have been developed for the construction of probabilistic models, initially in thermodynamics [5, 6] and subsequently for all probabilistic systems [7, 8, 9]. Although imbued with several information-theoretic interpretations [7, 8, 9, 10], the success of such models rests ultimately on the maximum probability (MaxProb) principle of Boltzmann [5, 6, 11, 12, 13, 14, 15, 16, 17, 18]: “a system can be represented by its most probable state.
” This provides a probabilistic definition of the (relative) entropy function:
$$H_{rel} = K \ln P \tag{1}$$
where $P$ is the governing probability of an observable realization (macrostate) of the system and $K$ is a constant. The maximum of $H_{rel}$ thus coincides with that of $P$. If the system can be represented by the allocation of distinguishable balls (objects) to distinguishable boxes (categories), then $P$ will satisfy the multinomial distribution [19, 20, 21]:
$$P = \mathrm{Prob}(n_1, ..., n_s \,|\, q_1, ..., q_s, N) = N! \prod_{i=1}^{s} \frac{q_i^{n_i}}{n_i!} \tag{2}$$
where $n_i$ is the observed occupancy (number of balls) and $q_i$ is the prior probability of the $i$th category, $N$ is the total number of balls and $s$ the number of categories. Insertion of (2) into (1) with $K = N^{-1}$, taking the asymptotic limit $N \to \infty$ and $n_i/N \to p_i$, gives the (negative) Kullback-Leibler entropy function:
$$H_{KL} = -\sum_{i=1}^{s} p_i \ln \frac{p_i}{q_i} \tag{3}$$
Maximisation of (3) for a system which satisfies (2), subject to its constraints, is therefore equivalent to seeking its most probable realization in the asymptotic limit, subject to the same constraints.

We therefore adopt a broader concept of ‘entropy’ than that normally used in the physical sciences. Climatologists will be familiar with the thermodynamic entropy $S$, which has a clearly defined meaning as the state function $S = \int \delta Q / T$ (Clausius) or $S = k \ln W$ (Boltzmann), where $\delta Q$ is the increment of heat entering a system, $T$ is temperature, $k$ is the Boltzmann constant and $W$ is the number of microstates within a given realization (macrostate) of a system. Its rate of change is $dS/dt$, of which the excess (exported) component is commonly termed the thermodynamic entropy production $\dot{\sigma}$ [30, 31, 32]. However, under the MaxProb or MaxEnt approach adopted here, entropy acquires a more fundamental meaning in terms of the probabilistic state space of a system, however defined. To emphasise their generic character, such entropies are here denoted $H$.
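The asymptotic link between the multinomial (2), the Boltzmann definition (1) with $K = N^{-1}$, and the Kullback-Leibler form (3) can be checked numerically. The following is a minimal sketch only; the probabilities `q` and `p` and the helper names are hypothetical illustration choices, not values or code from this study.

```python
import math

def log_multinomial(n, q):
    """ln P for the multinomial (2): N! * prod_i q_i^n_i / n_i!."""
    N = sum(n)
    out = math.lgamma(N + 1)  # ln N!
    for n_i, q_i in zip(n, q):
        out += n_i * math.log(q_i) - math.lgamma(n_i + 1)
    return out

def kl_entropy(p, q):
    """H_KL = -sum_i p_i ln(p_i / q_i), eq. (3)."""
    return -sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

# Hypothetical prior and observed frequencies (illustration only).
q = [0.5, 0.3, 0.2]
p = [0.2, 0.3, 0.5]

def scaled_entropy(N):
    """(1/N) ln P evaluated at occupancies n_i = p_i * N, i.e. eq. (1) with K = 1/N."""
    n = [round(p_i * N) for p_i in p]
    return log_multinomial(n, q) / sum(n)

for N in (10, 100, 10_000, 1_000_000):
    print(N, scaled_entropy(N), kl_entropy(p, q))
```

For increasing `N` the first printed column of entropies should approach the second, which is the sense in which maximising (3) corresponds to seeking the most probable realization.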
The thermodynamic entropy is in fact a special case of the generic, being derivable by the application of MaxEnt to an energetic system [5, 6, 7, 8, 9]. The ensuing analyses are based entirely on generic entropy functions, not necessarily related to $S$; that said, much of the underlying mathematical structure is identical.

The aim of this study is to construct a framework for the synthesis of climate projections from multiple climate models, based on the MaxProb (hence MaxEnt) principle. By analogy with thermodynamics, two approaches are presented, involving constraints on the properties of individual climate models or of ensembles of climate models. In each case, the analysis identifies the most representative (most probable) model from a set of climate models, circumventing the need to calculate the entire set. Other implications of these frameworks, which arise from the mathematical structure given by Jaynes [7, 8, 9], are examined. In addition, we report a curious finite-time limit on the minimum cost of varying the overall framework at specified rates of change, using a theorem from finite-time thermodynamics [22, 23, 24, 25, 26, 27].

Consider an individual Earth general climate model (GCM), composed of $J$ separable computational components. Each component $j = 1, ..., J$ is executed by a single choice $i(j)$ of algorithm, methodology or paradigm, from a total of $I(j)$ possible choices. As shown in Figure 1, this gives a combinatorial scheme in which an individual model is constructed from a set of unique choices $i(j) \in \{1, ..., I(j)\}, \forall j$. We assume that all models are calculated using the same set of input parameters and assumptions $\theta$; moreover, to accommodate variability or errors in $\theta$, each model will yield a set or domain of climate projections, which can be explored by Monte Carlo analysis or by some other means.
If we move beyond the deterministic mindset that an individual climate model must be the “correct” one, how should we weight the projection sets from different climate models, to obtain a (statistical) picture of their merged sets of projections? One could simply combine an available set of model outputs using equal or assigned weighting factors, as suggested in [4], but unless every possible combination has been computed, the resulting composite model will be rather arbitrary. In addition, if the model space is infinite (or merely very large), it will be impossible to compute the composite model in the lifetime of the universe (or in any reasonable time frame). Moreover, the use of equal weights does not allow the incorporation of additional constraints on the model space. We therefore propose a MaxProb-based (hence MaxEnt-based) framework for the weighting of multiple climate models, for which two distinct approaches are available.

2.1 “Microcanonical” Framework

We first construct a “microcanonical” climate model weighting framework, based on the properties of individual climate models. Extending the representation in Figure 1, consider a single climate model shown in Figure 2(a), in which we choose to rank each choice of algorithm or method $i(j)$ by its cost or energy $\epsilon_{ij}$, indicating (for example) the relative programming and computational cost of execution of this particular choice.
Fig. 1 Generic combinatorial representation of the climate model weighting framework, showing a single model composed of individual discrete choices of the $i(j)$th algorithm or methodology for each model component $j = 1, ..., J$.
Fig. 2
Combinatorial representations of (a) the microcanonical framework, showing a single model composed of $g_{ij}$ degenerate choices $i(j)$ for each model component $j = 1, ..., J$ (ranked by energy level $\epsilon_{ij}$); and (b) the canonical framework, composed of an ensemble of $N$ amalgamated microcanonical models. Ball numbers denote the model index.

Each energy level $i(j)$ is considered to have the degeneracy $g_{ij} \ge 1$, equal to the number of choices which share the same cost $\epsilon_{ij}$. The ranking scheme $i(j)$ therefore accounts for, but does not distinguish between, choices of equal cost. Each level $i(j)$ is taken to have the occupancy $m_{ij} \in \{0, 1\}$ (the choices are unique). From simple probabilistic considerations [19, 20, 21] for equiprobable degenerate choices, the probability of a given choice $i(j)$ is given by the reduced multinomial distribution:
$$P^{(\mu)}_{i|j} = \mathrm{Prob}(\mathbf{m}_j \,|\, \mathbf{g}_j, \theta) = \frac{1}{G_j} \prod_{i=1}^{I(j)} \frac{g_{ij}^{m_{ij}}}{m_{ij}!} \tag{4}$$
where $\mathbf{m}_j = \{m_{1j}, ..., m_{I(j)j}\}$, $\mathbf{g}_j = \{g_{1j}, ..., g_{I(j)j}\}$, $G_j = \sum_{i=1}^{I(j)} g_{ij}$ and superscript $\mu$ denotes the microcanonical framework. Eq. (4) reduces to $P^{(\mu)}_{\tau|j} = g_{\tau j}/G_j$, where $\tau(j)$ is the selected choice, but it is preferable to keep the $m_{ij}$ explicit using (4). The probability of selecting a single overall model, assuming that the $J$ components are independent, is therefore given by the “multi-multinomial”:
$$P^{(\mu)} = \mathrm{Prob}(\mathbf{m} \,|\, \mathbf{g}, \theta) = \prod_{j=1}^{J} \mathrm{Prob}(\mathbf{m}_j \,|\, \mathbf{g}_j, \theta) = \prod_{j=1}^{J} P^{(\mu)}_{i|j} = \prod_{j=1}^{J} \frac{1}{G_j} \prod_{i=1}^{I(j)} \frac{g_{ij}^{m_{ij}}}{m_{ij}!} \tag{5}$$
where $\mathbf{m}$ and $\mathbf{g}$ are the respective matrices of $m_{ij}$ and $g_{ij}$. Each model is subject to $J$ constraints on the total occupancy within each component $j$:
$$\sum_{i=1}^{I(j)} m_{ij} = 1, \quad \forall j \tag{6}$$
Assuming that the costs are additive over the $J$ components, we can also include a constraint on the total cost $E$ of running the overall model:
$$\sum_{j=1}^{J} \sum_{i=1}^{I(j)} \epsilon_{ij} m_{ij} = E \tag{7}$$
To determine the most probable or equilibrium model, given the above occupancy and total energy constraints, we should maximise (5) with respect to the unknowns $\{m_{ij}\}$, subject to (6)-(7). From the Boltzmann definition (1) with $K = 1$, this is equivalent to maximising the entropy:
$$H^{(\mu)} = \ln P^{(\mu)} = \sum_{j=1}^{J} \Big\{ -\ln G_j + \sum_{i=1}^{I(j)} \big( m_{ij} \ln g_{ij} - \ln m_{ij}! \big) \Big\} \tag{8}$$
subject to the same constraints.
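For unit occupancies $m_{ij} \in \{0, 1\}$, the multi-multinomial (5) and entropy (8) reduce to simple products and sums over the selected levels, which can be enumerated directly for a small model space. The sketch below assumes a hypothetical degeneracy matrix `g` (its values are not those of the worked example in this paper) and hypothetical function names.

```python
import itertools
import math

# Hypothetical degeneracies g[j][i]: number of equal-cost choices at
# level i of component j (illustration only, not the paper's example).
g = [[1, 2, 1], [2, 3, 2, 1], [1, 1, 2]]

def model_probability(levels, g):
    """P^(mu) of eq. (5) when component j selects level levels[j].

    With m_ij in {0, 1}, every factorial is 1 and eq. (4) is g_tau_j / G_j.
    """
    prob = 1.0
    for j, tau in enumerate(levels):
        prob *= g[j][tau] / sum(g[j])
    return prob

def model_entropy(levels, g):
    """H^(mu) = ln P^(mu), eq. (8) with K = 1."""
    return math.log(model_probability(levels, g))

# Every combination of one level per component (cf. Fig. 1).
level_choices = list(itertools.product(*(range(len(g_j)) for g_j in g)))
total_probability = sum(model_probability(c, g) for c in level_choices)
n_models = math.prod(sum(g_j) for g_j in g)  # distinct models, counting degeneracy

print(len(level_choices), n_models, total_probability)
```

The probabilities over all level combinations sum to one, while the number of distinct models (counting degenerate choices) is $\prod_j G_j$; exhaustive enumeration of this kind is only feasible for very small model spaces.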
We again emphasise that (8) is defined on the space of climate models, and has no connection to the thermodynamic entropy $S$. If one adopts the Stirling [33] approximation $\ln m_{ij}! \approx m_{ij} \ln m_{ij} - m_{ij}$ for large $m_{ij}$ (in fact this is not strictly valid), (8) reduces to:
$$H^{(\mu)}_{St} = \sum_{j=1}^{J} \Big\{ -\ln G_j - \sum_{i=1}^{I(j)} \Big( m_{ij} \ln \frac{m_{ij}}{g_{ij}} - m_{ij} \Big) \Big\} \tag{9}$$
(the term linear in $m_{ij}$ is constant under (6), so it only shifts the multiplier $\lambda_j$ below). Extremisation of (9) subject to (6)-(7) yields the (microcanonical) Boltzmann distribution at equilibrium:
$$m^*_{ij} = g_{ij} e^{-(\lambda_j + 1) - \lambda_E \epsilon_{ij}} = \frac{1}{Z^{(\mu)}_j} g_{ij} e^{-\lambda_E \epsilon_{ij}} \tag{10}$$
where $*$ denotes the asymptotic (Stirling-approximate) extremum, $\lambda_j$ and $\lambda_E$ are Lagrangian multipliers respectively for the allocation (6) and total energy (7) constraints, and $Z^{(\mu)}_j = e^{\lambda_j + 1} = \sum_{i=1}^{I(j)} g_{ij} e^{-\lambda_E \epsilon_{ij}}$ is the $j$th microcanonical partition function. Eq. (10) can be solved in conjunction with (7) to calculate the predicted occupancies $m^*_{ij}$. If the occupancies are restricted to discrete values $\{0, 1\}$, this will yield the choices $i(j)$ of the optimal climate model, subject to the total energy constraint $E$. In practice, numerical solution will typically give floating-point values of $m^*_{ij}$, which can be used as weighting factors with which to combine multiple models of the same total energy $E$.

As noted, since $m_{ij} \in \{0, 1\}$, Stirling’s approximation does not strictly apply to the above analysis, and so (10) is only an approximate solution. This can be addressed by directly maximising the non-asymptotic entropy (8) with respect to $m_{ij}$, subject to (6)-(7), giving the equilibrium distribution [13, 14, 15, 16]:
$$m_{ij} = \Lambda^{-1} \big( \ln g_{ij} - \lambda_j - \lambda_E \epsilon_{ij} \big) \tag{11}$$
where $\Lambda^{-1}(y) = \psi^{-1}(y) - 1$ is the upper inverse of the function $\Lambda(x) = \psi(x + 1)$, defined for convenience, in which $\psi(x)$ is the digamma function. (Note (11) can be written with additional terms in $G_j$ and $N$ [c.f. 16]; these are here incorporated in $\lambda_j$.) In this case, no explicit partition functions exist, and (11) must be solved in conjunction with (6)-(7). This method will give more precise values of the optimal weighting factors $m_{ij}$, although in practice, its numerical solution can be difficult. The non-asymptotic solution (11) is itself an approximation to the true discrete MaxProb solution (with $m_{ij} \in \{0, 1\}$), which must be identified by a (computationally expensive) combinatorial search scheme.

Example: The above framework can be demonstrated by a simple example, in which a climate model is constructed from $J = 3$ components, with the numbers of levels $I$, degeneracies $\mathbf{g}$ and costs $\boldsymbol{\epsilon}$ (in cost units) specified in (12) [numerical entries of (12) lost in extraction]. In this framework, more (degenerate) algorithms, and algorithms with a fourth energy level, are available for model component 2. For a total energy per model of $E = 17$ units, the inferred asymptotic (10) and non-asymptotic (11) solutions are, respectively, $\mathbf{m}^*$, $\boldsymbol{\lambda}^*$, $\lambda^*_E$ (13) and $\mathbf{m}$, $\boldsymbol{\lambda}$, $\lambda_E$ (14) [numerical entries of (13)-(14) lost in extraction]. [...] $\mathbf{m}^*$ (or, arguably, $\mathbf{m}$). In this example, algorithms of intermediate cost (the second energy level) have the highest weighting. Some difference is evident between the asymptotic and non-asymptotic solutions $\mathbf{m}$ and Massieu functions $\boldsymbol{\lambda}$, due to the small model space of this simplified example. The energy multipliers $\lambda_E$ of the two solutions are, however, quite similar.

2.2 “Canonical” Framework I

The foregoing methodology is mathematically sound and provides a formal framework for the combination of different climate models. It is, however, somewhat restrictive in that it only includes models of a single total energy $E$.
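As a rough numerical sketch of the asymptotic microcanonical solution, the Boltzmann weights (10) can be computed for a trial multiplier $\lambda_E$, and $\lambda_E$ then adjusted by bisection until the total-energy constraint (7) is met. The cost matrix `eps`, degeneracies `g`, target `E = 15` and all function names below are hypothetical illustration choices, not the values of the example above.

```python
import math

# Hypothetical costs eps[j][i] and degeneracies g[j][i] (illustration only).
eps = [[2.0, 5.0, 9.0], [3.0, 6.0, 10.0, 15.0], [1.0, 4.0, 8.0]]
g = [[1, 2, 1], [2, 3, 2, 1], [1, 1, 2]]

def weights(lam_E):
    """m*_ij = g_ij exp(-lam_E eps_ij) / Z_j, eq. (10); each row sums to 1 as in (6)."""
    out = []
    for eps_j, g_j in zip(eps, g):
        terms = [g_i * math.exp(-lam_E * e_i) for g_i, e_i in zip(g_j, eps_j)]
        Z_j = sum(terms)  # j-th microcanonical partition function
        out.append([t / Z_j for t in terms])
    return out

def total_cost(lam_E):
    """Left-hand side of the energy constraint (7) for the weights above."""
    m = weights(lam_E)
    return sum(e * w for eps_j, m_j in zip(eps, m) for e, w in zip(eps_j, m_j))

def solve_lambda(E, lo=-5.0, hi=5.0, tol=1e-12):
    """Bisection on lam_E; total_cost decreases monotonically as lam_E increases."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total_cost(mid) > E:
            lo = mid  # cost too high: a larger multiplier is needed
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

lam_E = solve_lambda(15.0)
m_star = weights(lam_E)
print(lam_E, total_cost(lam_E))
for row in m_star:
    print([round(w, 4) for w in row])
```

The rows of `m_star` are the floating-point weighting factors discussed above; the bracket `[-5, 5]` is an assumption that suits these toy costs and would need widening for other data.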
It is possible to conduct the analysis at a higher “canonical” level – in the same manner as in thermodynamics – by the analysis of “systems of systems”, in this case involving ensembles of individual climate models. This is shown in Figure 2(b), in which an ensemble is constructed by collecting a sample (without replacement) of $N$ individual models, and amalgamating the results. This can be represented by a combinatorial scheme in which distinguishable balls – labelled by the model index $z \in \{1, ..., N\}$ – are allocated to distinguishable levels $i(j)$, again indicating choices of energy level $\epsilon_{ij}$ with degeneracy $g_{ij}$. This gives the occupancies $n_{ij} \in \{0, ..., N\}$ for each energy level of the ensemble, which are connected to those for each model by:
$$n_{ij} = \sum_{z=1}^{N} m^{(z)}_{ij}, \quad \forall j \tag{15}$$
$$N = \sum_{i=1}^{I(j)} n_{ij} = \sum_{i=1}^{I(j)} \sum_{z=1}^{N} m^{(z)}_{ij}, \quad \forall j \tag{16}$$
The probability of a specified set of occupancies $n_{ij}$ for a particular $j$ is now given by [19, 20, 21]:
$$P^{(\chi)}_j = \mathrm{Prob}(\mathbf{n}_j \,|\, \mathbf{g}_j, N, \theta) = \frac{N!}{G_j^N} \prod_{i=1}^{I(j)} \frac{g_{ij}^{n_{ij}}}{n_{ij}!} \tag{17}$$
where $\mathbf{n}_j = \{n_{1j}, ..., n_{I(j)j}\}$, while $\chi$ denotes the canonical framework. The multinomial factor $N! / \prod_{i=1}^{I(j)} n_{ij}!$ accounts for the number of permutations of models which attain the same set of occupancies $\mathbf{n}_j$. The probability of a specified ensemble, again assuming $J$ independent components, is thus given by the “multi-multinomial”:
$$P^{(\chi)} = \mathrm{Prob}(\mathbf{n} \,|\, \mathbf{g}, N, \theta) = \prod_{j=1}^{J} \mathrm{Prob}(\mathbf{n}_j \,|\, \mathbf{g}_j, N, \theta) = \prod_{j=1}^{J} P^{(\chi)}_j = \prod_{j=1}^{J} \frac{N!}{G_j^N} \prod_{i=1}^{I(j)} \frac{g_{ij}^{n_{ij}}}{n_{ij}!} \tag{18}$$
where $\mathbf{n}$ is the matrix of $n_{ij}$, whence (1) with $K = N^{-1}$ gives the entropy:
$$H^{(\chi)} = \frac{1}{N} \ln P^{(\chi)} = \frac{1}{N} \sum_{j=1}^{J} \Big\{ \ln N! - N \ln G_j + \sum_{i=1}^{I(j)} \big( n_{ij} \ln g_{ij} - \ln n_{ij}! \big) \Big\} \tag{19}$$
This is subject to constraints on the occupancies, given by the first part of (16).

How should we analyse ensembles of models? We could, in the first instance, examine the set of all possible models, of cardinality $\prod_{j=1}^{J} G_j^N$ [17].
This would not, however, be very informative, since all models would a priori be of equal weight and so would not be discriminated by the MaxProb (or MaxEnt) method. The total ensemble also does not allow the inclusion of additional information about the desired set of models. If, on the other hand, we impose a constraint on the mean energy of the ensemble:
$$\frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{I(j)} \epsilon_{ij} n_{ij} = \langle E \rangle \tag{20}$$
we then impose a decision rule on its desired composition, namely, on the average cost of constructing its constituent models. In contrast to the microcanonical framework, this allows models of greater-than-average total energy $E > \langle E \rangle$, so long as these are balanced in the ensemble by models of lower energy $E < \langle E \rangle$. Combining (19) with (16) and (20) gives the Lagrangian:
$$L^{(\chi)} = \sum_{j=1}^{J} \Big\{ \frac{1}{N} \ln N! - \ln G_j + \sum_{i=1}^{I(j)} \Big( \frac{n_{ij}}{N} \ln g_{ij} - \frac{1}{N} \ln n_{ij}! \Big) \Big\} - \sum_{j=1}^{J} \kappa_j \Big\{ \sum_{i=1}^{I(j)} \frac{n_{ij}}{N} - 1 \Big\} - \kappa_E \Big\{ \sum_{j=1}^{J} \sum_{i=1}^{I(j)} \epsilon_{ij} \frac{n_{ij}}{N} - \langle E \rangle \Big\} \tag{21}$$
where $\kappa_j$ and $\kappa_E$ are Lagrangian multipliers for the allocation (16) and energy (20) constraints. Extremisation gives the non-asymptotic equilibrium solution:
$$n_{ij} = \Lambda^{-1} \big( \ln g_{ij} - \kappa_j - \kappa_E \epsilon_{ij} \big), \quad \forall j \tag{22}$$
(again all constant terms are brought into $\kappa_j$). For any given $N$, these can be solved numerically in conjunction with the constraints (16) and (20), to give the optimum number of times (weighting factor) $n_{ij}$ that each choice $i(j)$ should be included in the ensemble, subject to $\langle E \rangle$.

When the factorials in (19) satisfy Stirling’s approximation, (22) gives the (canonical) Boltzmann distribution at equilibrium:
$$\frac{n^*_{ij}}{N} = \frac{1}{Z^{(\chi)}_j} g_{ij} e^{-\kappa_E \epsilon_{ij}} \tag{23}$$
where $Z^{(\chi)}_j = N e^{\kappa_j + 1} = \sum_{i=1}^{I(j)} g_{ij} e^{-\kappa_E \epsilon_{ij}}$ is the $j$th canonical partition function.

Example: The canonical framework can be demonstrated using the example described previously (12), now constrained by a mean total energy per model of $\langle E \rangle = 17$ units (less than the mean of all possible models, $\langle E \rangle = 24.$ [...]
$$\frac{\mathbf{n}^*}{N} = \mathbf{m}^*, \qquad \kappa^*_E = \lambda^*_E \tag{24}$$
together with the multipliers $\boldsymbol{\kappa}^*$, given in (24) as logarithmic functions of $N^{-1}$ [numerical coefficients lost in extraction]. For $N = 27$ (say) this gives the vector $\boldsymbol{\kappa}^*$ [values lost in extraction]. In comparison, the non-asymptotic solution (22) at $N = 27$ is given by $\mathbf{n}/N$, $\boldsymbol{\kappa}$ and $\kappa_E$ in (25) [numerical entries lost in extraction]. [...] $j$, and is closer to the asymptotic form (24).

2.3 “Canonical” Framework II

One difficulty with the above canonical framework is that it – like its microcanonical precursor – still requires separability of the model into $J$ distinct components, for which the costs $\epsilon_{ij}$ are additive. In more general situations, this separability may not be possible due to coupling between components. In that case we must revert to a model space based on ensembles of entire models. Severing all connection to the components $j$, we consider a model space from which we collect a sample (ensemble) of $N$ models, containing $n_\imath$ models each of total energy $E_\imath$. Each energy level has degeneracy $g_\imath$. The probability of a specified ensemble is:
$$P^{(\chi)}_{II} = \mathrm{Prob}(\mathbf{n} \,|\, \mathbf{g}, N, \theta) = \frac{N!}{G^N} \prod_{\imath=1}^{I} \frac{g_\imath^{n_\imath}}{n_\imath!} \tag{26}$$
where $G = \sum_{\imath=1}^{I} g_\imath$. Boltzmann’s equation (1) with $K = N^{-1}$ gives the entropy:
$$H^{(\chi)}_{II} = \frac{1}{N} \ln P^{(\chi)}_{II} = \frac{1}{N} \Big\{ \ln N! - N \ln G + \sum_{\imath=1}^{I} \big( n_\imath \ln g_\imath - \ln n_\imath! \big) \Big\} \tag{27}$$
This is subject to the occupancy and mean ensemble energy constraints:
$$\sum_{\imath=1}^{I} n_\imath = N \tag{28}$$
$$\frac{1}{N} \sum_{\imath=1}^{I} E_\imath n_\imath = \langle E \rangle \tag{29}$$
Forming the Lagrangian and extremisation gives the non-asymptotic equilibrium occupancies:
$$n_\imath = \Lambda^{-1} \big( \ln g_\imath - \varphi - \varphi_E E_\imath \big) \tag{30}$$
where $\varphi$ and $\varphi_E$ are Lagrangian multipliers for the occupancy and energy constraints. If (27) satisfies the Stirling approximation, the distribution reduces to:
$$\frac{n^*_\imath}{N} = \frac{1}{Z^{(\chi)}_{II}} g_\imath e^{-\varphi_E E_\imath} \tag{31}$$
where $Z^{(\chi)}_{II} = N e^{\varphi + 1} = \sum_{\imath=1}^{I} g_\imath e^{-\varphi_E E_\imath}$ is the canonical II partition function.
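The non-asymptotic occupancies (30) have no closed form, but they can be sketched numerically by inverting the digamma function and then adjusting the two multipliers until (28) and (29) hold. The block below is one possible approach under stated assumptions: the digamma routine is a standard recurrence-plus-asymptotic-series approximation written for this illustration, the nested bisection and its brackets are ad hoc choices, and the energies `E`, degeneracies `g`, ensemble size `N = 27` and target `E_mean` are hypothetical.

```python
import math

def digamma(x):
    """psi(x) for x > 0: upward recurrence to x >= 6, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def lambda_inverse(y, iters=50):
    """Lambda^{-1}(y) = psi^{-1}(y) - 1, by bisection on the branch x > 0."""
    lo, hi = 1e-9, math.exp(y) + 1.0  # psi(lo) << y < psi(hi) for the y used here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if digamma(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - 1.0

# Hypothetical ensemble data (illustration only).
E = [1.0, 2.0, 3.0, 4.0]   # total model energies E_i
g = [1, 3, 2, 1]           # degeneracies g_i
N, E_mean = 27, 2.2        # ensemble size and target mean energy

def occupancies(phi, phi_E):
    """n_i from eq. (30) for given multipliers."""
    return [lambda_inverse(math.log(g_i) - phi - phi_E * E_i) for g_i, E_i in zip(g, E)]

def solve_phi(phi_E, iters=50):
    """Inner bisection: choose phi so that sum_i n_i = N, eq. (28)."""
    lo, hi = -30.0, 30.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(occupancies(mid, phi_E)) > N:
            lo = mid  # too many models: increase phi
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve(iters=50):
    """Outer bisection: choose phi_E so that the mean energy matches eq. (29)."""
    lo, hi = -3.0, 3.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        n_mid = occupancies(solve_phi(mid), mid)
        if sum(E_i * n_i for E_i, n_i in zip(E, n_mid)) / N > E_mean:
            lo = mid  # mean energy too high: increase phi_E
        else:
            hi = mid
    phi_E = 0.5 * (lo + hi)
    phi = solve_phi(phi_E)
    return phi, phi_E, occupancies(phi, phi_E)

phi, phi_E, n = solve()
print(phi, phi_E, [round(n_i, 4) for n_i in n])
```

Nested bisection is slow but avoids derivative information; a Newton-type solver would be a natural alternative when many levels are involved.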
Either (30) or (if valid) (31) can be solved in conjunction with the constraints (28)-(29), to give the weights $n_\imath$ of the most representative model.

2.4 Summary

At this point, it is worth summarising some important features of the microcanonical and two canonical frameworks proposed:
• As evident from the predicted solutions (10)-(11) and (22)-(23), if one seeks the optimal model to describe a set of climate models, it is not necessary to compute all possible combinations of models. Using the MaxProb method, one can directly calculate the single model or a reduced set of models which best represents the model set, subject to constraint(s) on the model or ensemble properties. The effect of two competing constraints is examined in the next section.
• The microcanonical framework imposes constraint(s) on individual models, whereas the two canonical frameworks impose constraint(s) over ensembles of models. The latter enable the synthesis of larger sets of models.
• Note that, due to the assumed independence of the $J$ model components, the microcanonical and canonical I frameworks are “multi-multinomial” (5) and (18). The choices $i(j)$ for a specified $j = \vartheta$ are thus independent of the other choices $j \ne \vartheta$. The MaxProb prediction can therefore be computed using individual models composed of whichever choices $i(j)$ are convenient, so long as the overall set conforms to the MaxProb prediction. In the canonical II model, we overcome the difficulty of coupled model components by considering ensembles of entire models, with constraints on the total energy of each model.
• How should we interpret the Lagrangian multipliers on the energy constraint? By analogy with thermodynamics, these can be interpreted as $\lambda_E = 1/kT^{(\mu)}$, $\kappa_E = 1/kT^{(\chi)}$ and $\varphi_E = 1/kT^{(\chi)}_{II}$, where the $T$ parameters are framework “temperatures” and $k$ is a constant with units of energy (or cost) per temperature unit.
The $T$’s are not thermodynamic temperatures, but express the distribution of energy over the available energy levels, in the relevant model or ensemble space. In effect, they serve as proxy variables for the total model cost $E$ or mean ensemble cost $\langle E \rangle$.
• Although the MaxProb framework is primarily designed to determine the most probable (maximum entropy) model, it is also possible to interrogate the Lagrangian to determine the minimum entropy model(s), i.e. those which lie farthest from the optimum. In this manner, one can explore the extremities of the model or ensemble space, to identify model outliers. Since minimum entropy solutions tend to lie on non-continuous boundaries of the solution domain, they are generally inaccessible to extremisation methods [34]; nonetheless, they should be identifiable by numerical optimisation algorithms such as simulated annealing.
• The mathematical structure of the output from the MaxEnt algorithm gives rise to many more features of the predicted solution. Some of these features are explored in later sections.

[...] $\langle E \rangle$, as before (29), and also a constraint on some measure of the average ensemble “worthiness” or “benefit” $\langle B \rangle$ (for example, a measure of its precision or accuracy). In this manner, we construct a MaxProb framework with which to conduct cost-benefit analyses of various ensembles of models, and to interrogate the trade-off between costs and benefits. In general, the energy and benefit levels will have different ranks, necessitating the use of different indices $i \in \{1, ..., I\}$ (as before) and $\ell \in \{1, ..., L\}$. We therefore consider model choices ranked by total model energies $E_{i\ell}$ and benefits $B_{i\ell}$, of joint degeneracy $g_{i\ell}$. The probability that an ensemble of $N$ models has the occupancies $\{n_{i\ell}\}$ is governed by the multinomial:
$$P = \frac{N!}{G^N} \prod_{i=1}^{I} \prod_{\ell=1}^{L} \frac{g_{i\ell}^{n_{i\ell}}}{n_{i\ell}!} \tag{32}$$
where now $G = \sum_{i=1}^{I} \sum_{\ell=1}^{L} g_{i\ell}$ (for convenience we drop the super- and subscript labels).
Footnote: This approach is applicable not only to climate models, but to models of any type, including economic models.

From (1) with $K = N^{-1}$, we maximise the entropy:
$$H = \frac{1}{N} \Big\{ \ln N! - N \ln G + \sum_{i=1}^{I} \sum_{\ell=1}^{L} \big( n_{i\ell} \ln g_{i\ell} - \ln n_{i\ell}! \big) \Big\} \tag{33}$$
subject to the constraints:
$$\frac{1}{N} \sum_{i=1}^{I} \sum_{\ell=1}^{L} n_{i\ell} = 1 \tag{34}$$
$$\frac{1}{N} \sum_{i=1}^{I} \sum_{\ell=1}^{L} E_{i\ell} n_{i\ell} = \langle E \rangle \tag{35}$$
$$\frac{1}{N} \sum_{i=1}^{I} \sum_{\ell=1}^{L} B_{i\ell} n_{i\ell} = \langle B \rangle \tag{36}$$
to give the non-asymptotic equilibrium solution:
$$n_{i\ell} = \Lambda^{-1} \big( \ln g_{i\ell} - \omega - \omega_E E_{i\ell} - \omega_B B_{i\ell} \big) \tag{37}$$
where $\omega$, $\omega_E$ and $\omega_B$ are the Lagrangian multipliers. If Stirling’s approximation applies, the entropy is:
$$H_{St} = \Big\{ \ln N - \ln G - \sum_{i=1}^{I} \sum_{\ell=1}^{L} \Big( \frac{n_{i\ell}}{N} \ln \frac{n_{i\ell}}{g_{i\ell}} \Big) \Big\} \tag{38}$$
whence extremisation gives:
$$\frac{n^*_{i\ell}}{N} = \frac{g_{i\ell}}{Z} e^{-\omega_E E_{i\ell} - \omega_B B_{i\ell}}, \qquad Z = N e^{\omega + 1} = \sum_{i=1}^{I} \sum_{\ell=1}^{L} g_{i\ell} e^{-\omega_E E_{i\ell} - \omega_B B_{i\ell}} \tag{39}$$
where $Z$ is the partition function.

The Lagrangian multiplier $\omega_E$ can again be interpreted as an inverse ensemble temperature $\omega_E = 1/kT$, where $k$ is a constant. The multiplier $\omega_B$ can be considered as a measure of the overall benefit provided by the ensemble, in reciprocal benefit units. In effect, it acts as a proxy variable for the mean benefit $\langle B \rangle$. Since $\langle B \rangle$ measures the average information or value provided by the framework, it can be interpreted very crudely as a reciprocal density or volume, whereupon we can interpret $\omega_B = P/kT$, in which $P$ is a mean ensemble pressure (this interpretation should not be taken too seriously).

3.2 Jaynesian Mathematical Structure

Now that we have the main results, we can examine several important mathematical features of the solution. Most of these were reported in a generic context by Jaynes [7, 8, 9] (see also Kapur & Kesavan [34] and Tribus [35]), although many were previously known in thermodynamics.
The foregoing microcanonical and canonical I and II frameworks also exhibit these features, but it is more interesting to examine the effect of two competing constraints.

Firstly, for the Stirling-approximate case, substitution of (39) into (38), by sorting into expectations along the lines of [8], gives the asymptotic maximum entropy:
$$H^* = -\ln G - \phi + \omega_E \langle E \rangle + \omega_B \langle B \rangle \tag{40}$$
where for convenience we define the potential function (negative Massieu function) $\phi = -\omega = -\ln Z$. The most probable state of the ensemble is thus given by a constant term, plus the Massieu function, plus the sum of products of the constraints and their conjugate Lagrangian multipliers.

Since the entropy function and constraints are state variables on the space of ensembles of models, (40) provides a linear homogeneous equation which describes the framework (see the footnote below). This can be used to examine the response of the framework to changes in the constraints and/or multipliers. For constant $G$ and $\phi$ we immediately see that [7, 8, 9]:
$$\frac{\partial H^*}{\partial \langle E \rangle} \bigg|_{\langle B \rangle} = \omega_E, \qquad \frac{\partial H^*}{\partial \langle B \rangle} \bigg|_{\langle E \rangle} = \omega_B \tag{41}$$
Second differentiation gives the Hessian matrix:
$$-\mathbf{a} = \begin{bmatrix} \dfrac{\partial^2 H^*}{\partial \langle E \rangle^2} & \dfrac{\partial^2 H^*}{\partial \langle E \rangle \partial \langle B \rangle} \\ \dfrac{\partial^2 H^*}{\partial \langle B \rangle \partial \langle E \rangle} & \dfrac{\partial^2 H^*}{\partial \langle B \rangle^2} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial \omega_E}{\partial \langle E \rangle} & \dfrac{\partial \omega_B}{\partial \langle E \rangle} \\ \dfrac{\partial \omega_E}{\partial \langle B \rangle} & \dfrac{\partial \omega_B}{\partial \langle B \rangle} \end{bmatrix} \tag{42}$$
If the mixed derivatives are equivalent (i.e. $H^*$ is continuous and continuously differentiable, at least up to second order), this gives the reciprocal or Maxwell-like relation [8, 9]:
$$\frac{\partial \omega_E}{\partial \langle B \rangle} = \frac{\partial \omega_B}{\partial \langle E \rangle} \tag{43}$$
Equivalently, (40) can be rewritten as a function of the potential $\phi$, whence it can be shown that [7, 8, 9]:
$$\frac{\partial \phi}{\partial \omega_E} \bigg|_{\omega_B} = \langle E \rangle, \qquad \frac{\partial \phi}{\partial \omega_B} \bigg|_{\omega_E} = \langle B \rangle \tag{44}$$

Footnote: Strictly, if the initial terms in $G$ and $\phi$ are constant, the differential of (40) is a linear homogeneous first-order differential equation. Absorbing the constant into $\phi$, (40) can then be interpreted as an Euler equation [c.f. 36].
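Relations (40) and (44) can be checked numerically for a small two-constraint system. The sketch below flattens the joint index $(i, \ell)$ into a single list index and uses hypothetical energies `E`, benefits `B`, degeneracies `g` and multiplier values; it compares the entropy (38) evaluated at the distribution (39) with the right-hand side of (40), and compares central finite differences of $\phi = -\ln Z$ with the mean energy and benefit.

```python
import math

# Hypothetical joint levels, flattened over (i, l) (illustration only).
E = [2.0, 2.0, 5.0, 5.0, 9.0, 9.0]   # total model energies E_il
B = [1.0, 3.0, 2.0, 4.0, 3.0, 6.0]   # benefits B_il
g = [1, 2, 2, 1, 1, 1]               # joint degeneracies g_il
G = sum(g)

def partition(w_E, w_B):
    """Z of eq. (39)."""
    return sum(g_k * math.exp(-w_E * e - w_B * b) for g_k, e, b in zip(g, E, B))

def phi(w_E, w_B):
    """Potential function phi = -ln Z."""
    return -math.log(partition(w_E, w_B))

def probabilities(w_E, w_B):
    """n*_il / N of eq. (39)."""
    Z = partition(w_E, w_B)
    return [g_k * math.exp(-w_E * e - w_B * b) / Z for g_k, e, b in zip(g, E, B)]

def means(w_E, w_B):
    """Mean energy and mean benefit, eqs. (35)-(36)."""
    p = probabilities(w_E, w_B)
    return (sum(p_k * e for p_k, e in zip(p, E)),
            sum(p_k * b for p_k, b in zip(p, B)))

def entropy_direct(w_E, w_B):
    """Eq. (38) with n*/N = p, which simplifies to -ln G - sum p ln(p/g)."""
    p = probabilities(w_E, w_B)
    return -math.log(G) - sum(p_k * math.log(p_k / g_k) for p_k, g_k in zip(p, g))

def entropy_from_40(w_E, w_B):
    """Right-hand side of eq. (40)."""
    mean_E, mean_B = means(w_E, w_B)
    return -math.log(G) - phi(w_E, w_B) + w_E * mean_E + w_B * mean_B

w_E, w_B, h = 0.3, -0.2, 1e-6
mean_E, mean_B = means(w_E, w_B)
dphi_dwE = (phi(w_E + h, w_B) - phi(w_E - h, w_B)) / (2 * h)
dphi_dwB = (phi(w_E, w_B + h) - phi(w_E, w_B - h)) / (2 * h)

print(entropy_direct(w_E, w_B), entropy_from_40(w_E, w_B))
print(dphi_dwE, mean_E)
print(dphi_dwB, mean_B)
```

Each printed pair should agree to within finite-difference and rounding error, illustrating that the multipliers and constraints are conjugate variables of the potential.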
Second differentiation gives:
$$-\boldsymbol{\alpha} = \begin{bmatrix} \dfrac{\partial^2 \phi}{\partial \omega_E^2} & \dfrac{\partial^2 \phi}{\partial \omega_E \partial \omega_B} \\ \dfrac{\partial^2 \phi}{\partial \omega_B \partial \omega_E} & \dfrac{\partial^2 \phi}{\partial \omega_B^2} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial \langle E \rangle}{\partial \omega_E} & \dfrac{\partial \langle B \rangle}{\partial \omega_E} \\ \dfrac{\partial \langle E \rangle}{\partial \omega_B} & \dfrac{\partial \langle B \rangle}{\partial \omega_B} \end{bmatrix} \tag{45}$$
giving, again for equivalent mixed derivatives [8, 9]:
$$\frac{\partial \langle B \rangle}{\partial \omega_E} = \frac{\partial \langle E \rangle}{\partial \omega_B} \tag{46}$$
From (42) and (45), it is evident that:
$$\mathbf{a} = \boldsymbol{\alpha}^{-1} \tag{47}$$
This defines a Legendre transformation between $H^*$ and $\phi$ representations of the system [8, 9, 34].

Finally, we note that it may be desirable to rank climate models by more than two properties, e.g. the model cost $E$ and several different benefits $B_1, B_2, ..., B_M$. The foregoing analysis can readily be extended into as many dimensions as desired, giving the above mathematical structure as a function of the constraints $\langle E \rangle$, and $\langle B_1 \rangle, ..., \langle B_M \rangle$.

3.3 Implications

What are the implications of the above Jaynesian mathematical structure? In essence, it governs the effect of changes to the constraints and/or multipliers on the manifold of equilibrium positions of the framework. This includes:
• Firstly, the first derivatives (41) and (44) can be interpreted as equations of state on the space of ensembles of models, describing the relationship between the rate of change of the entropy or potential as a function of the constraints or their conjugate multipliers [37].
• Secondly, the second derivatives (42) and (45) describe the susceptibilities of the framework, i.e. the functional connections between the constraints and multipliers. In thermodynamics, such susceptibilities include the heat capacity, isothermal compressibility, coefficient of thermal expansion and so on [e.g. 27, 31, 36, 38]; if desired, such parameters can also be defined for the model framework proposed here. The Maxwell-like relations (43) and (46) reflect the coupling between the constraints, such that changes in one constraint or its multiplier, at constant $H^*$ or $\phi$, will produce adjustments to the other pair.
• Thirdly, the second derivative matrix (45) of the potential function $\phi$ contains even further information, since in the asymptotic limit ($N \to \infty$), it is equivalent (with change of sign) to the variance-covariance matrix of the constraints [7, 8, 9, 34]:
$$\boldsymbol{\alpha} = \begin{bmatrix} \langle E^2 \rangle - \langle E \rangle^2 & \langle EB \rangle - \langle E \rangle \langle B \rangle \\ \langle EB \rangle - \langle E \rangle \langle B \rangle & \langle B^2 \rangle - \langle B \rangle^2 \end{bmatrix} \tag{48}$$
Accordingly, $\boldsymbol{\alpha}$ is positive definite (or semi-definite if singularities exist) [34]. From the Legendre transformation (47), $\mathbf{a}$ is also positive definite (or semi-definite) [34]. In consequence, from (42) and (45) (including the tensor sign reversals), $H^*(\langle E \rangle, \langle B \rangle)$ and $\phi(\omega_E, \omega_B)$ are both concave functions. Furthermore, the diagonal of (48) gives the magnitude of the standard deviation or “fluctuations” of the ensemble with respect to each constraint, usually expressed in normalised form by the coefficients of variation [36]:
$$C_V(E) = \frac{\sqrt{\langle E^2 \rangle - \langle E \rangle^2}}{\langle E \rangle} = \frac{1}{\langle E \rangle} \sqrt{-\frac{\partial \langle E \rangle}{\partial \omega_E}}, \qquad C_V(B) = \frac{\sqrt{\langle B^2 \rangle - \langle B \rangle^2}}{\langle B \rangle} = \frac{1}{\langle B \rangle} \sqrt{-\frac{\partial \langle B \rangle}{\partial \omega_B}} \tag{49}$$
The covariance, similarly normalised, provides a measure of the coupling between constraints [36]:
$$C_V(E, B) = \sqrt{\frac{\langle EB \rangle - \langle E \rangle \langle B \rangle}{\langle E \rangle \langle B \rangle}} = \sqrt{\frac{1}{\langle E \rangle \langle B \rangle} \left| -\frac{\partial \langle B \rangle}{\partial \omega_E} \right|} = \sqrt{\frac{1}{\langle E \rangle \langle B \rangle} \left| -\frac{\partial \langle E \rangle}{\partial \omega_B} \right|} \tag{50}$$
• Fourthly, the manifold of predicted equilibrium positions defined by $H^*(\langle E \rangle, \langle B \rangle)$ or $\phi(\omega_E, \omega_B)$ can be interpreted as a framework geometry, analogous to the thermodynamic geometry examined by Gibbs [39, 40, 41] (see also [36, 42, 43]). For example, if we consider $\langle B \rangle$ as a function of $\langle E \rangle$, as shown graphically in Figure 3, we can represent positions of constant entropy $H^*$ by a series of isentropic curves on this graph. From (40), these will be straight lines with negative gradient $-\omega_E/\omega_B$, indicating that an increase in the energy or cost $\langle E \rangle$, at constant $H^*$ (and $\phi$), causes a corresponding decrease in $\langle B \rangle$.
Of course, many other curves can also be plotted on the diagram, including isoenergetic, isobenefit, iso-$\omega_E$ and iso-$\omega_B$ curves, defined by rearrangements of (40). We can also plot $\omega_B$ as a function of $\omega_E$, on which we can construct isopotential curves with negative gradient $-\langle E \rangle / \langle B \rangle$. (Adopting the crude analogy of § [...] $P$ as a function of $T$ for the model framework.) Three-dimensional graphs such as $H^*(\langle E \rangle, \langle B \rangle)$ or $\phi(\omega_E, \omega_B)$ can also be constructed, containing isosurfaces of various kinds [40, 41]. As pointed out by Gibbs [39, 40, 41], it is advantageous to plot “fundamental equations” such as $H^*(\langle E \rangle, \langle B \rangle)$ or $\phi(\omega_E, \omega_B)$, rather than forms unobtainable from these by Legendre transformation (such as $H^*(\omega_E, \omega_B)$), so that all parameters not represented on the axes can be calculated for a given path simply by differentiation. Recalling that the frameworks herein consist of all possible models consistent with the constraints, the resulting manifold $H^*(\langle E \rangle, \langle B \rangle)$ or $\phi(\omega_E, \omega_B)$ should for the most part be continuous in its geometric space, reflecting infinitesimal changes in parameters and incremental changes in model algorithms. However, in some circumstances there may be discontinuities in the manifold, due to abrupt changes in model algorithm or adoption of different scientific paradigms. Such changes can be described as phase changes or tipping points within the model space, leading to assortments of stable and unstable solutions and path-dependent hysteresis effects. These may create particular difficulties, but can of course be handled in much the same manner as in thermodynamics.
• Finally, it can be shown that either framework $H^*(\langle E \rangle, \langle B \rangle)$ or $\phi(\omega_E, \omega_B)$ can be endowed with a Riemannian geometry (entirely distinct from the framework geometry just described), using the metric furnished by the respective (positive definite) Hessian matrix $\mathbf{a}$ or $\boldsymbol{\alpha}$ [22, 23, 24, 25, 26, 27].
As noted, the two metrics and hence the geometries are connected by Legendre transformation (47). The Riemannian interpretation leads to an important physical limit: a least action bound on the cost, in units of $H^*$ or $\phi$, to move the framework from one equilibrium position to another at finite rates of change of the constraints or multipliers. This bound, which constitutes an extension of finite time thermodynamics [22, 23, 24, 25, 26, 27] but is in some sense allied to the informational limits identified by Szilard [44], Landauer [45], Bennett [46] and similar workers [47], is examined in more detail in Appendix A.

Fig. 3 Schematic diagram of Gibbs' geometry, for the MaxEnt cost-benefit climate model weighting framework of § (axes: $\langle B \rangle$ against $\langle E \rangle$; curves: isentropic, isenergetic and isobenefit lines).

In this study, several maximum-entropy frameworks are presented for the synthesis of outputs from multiple Earth climate models, based on constraints on the properties of individual models (microcanonical framework) or ensembles of models (two canonical frameworks). The asymptotic and non-asymptotic entropy functions for each case are derived by combinatorial reasoning, and applied to simple systems constrained by the total model energy $E$ (microcanonical) or mean ensemble energy $\langle E \rangle$ (canonical). In each case it is shown that the MaxEnt method identifies the most representative (most probable) model from a set of climate models, subject to the specified constraints, eliminating the need to calculate the entire set. The parametric and geometric implications of the underlying Jaynesian mathematical structure are examined, with reference to a canonical framework with competing cost and benefit constraints, allowing interrogation of the trade-off between costs and benefits. Finally, a finite-time limit on the minimum cost of modification of the synthesis framework, at finite rates of change, is also reported.

The foregoing analysis therefore provides climate modellers (or those who must rank and combine climate models) with a rational tool to amalgamate a large set of models into a single representative model (or a small representative set). This enables the weighting of climate projections from different groups, and will also dramatically reduce the computational demand on the climate modelling community. Indeed, the benefits extend into other fields: as commented by a reviewer, for long-range weather forecasts it is common practice to combine projections from different meteorological models, to improve reliability. The MaxEnt frameworks proposed here could equally be applied to this task.

A caveat to the foregoing analysis is that the inferred equilibrium climate model will not necessarily be the "most correct" model, but merely the one which is most representative of the available set of models. If the model space is incomplete, or their underlying physical or modelling assumptions are incorrect, any resulting errors will also be incorporated in the equilibrium model.
A more comprehensive probabilistic framework, which incorporates the errors associated with our lack of knowledge (of data, phenomena and models), would consist of a Bayesian inferential framework extending back to all raw climate data, a substantial endeavour which, as its minimum condition, would require climate scientists to abandon their use of orthodox methods for statistical inference and parameter estimation [9].

Acknowledgements
This work was inspired by discussions at the Mathematical and Statistical Approaches to Climate Modelling and Prediction workshop, Isaac Newton Institute for Mathematical Sciences, Cambridge, UK, 11 Aug. to 22 Dec. 2010. The author sincerely thanks the workshop organisers for travel support.
Appendix A: The Least Action Bound
The Riemannian geometric interpretation in § $H^*(\langle E \rangle, \langle B \rangle)$ or $\phi(\omega_E, \omega_B)$, specified by some path parameter $\xi$ in the model space, which may, but need not, correspond to time. The arc length of the path from position 1 to 2, represented by $\xi = 0$ to $\xi = \xi_{max}$, is given by [27]:

$$L = \int_0^{\xi_{max}} \sqrt{\dot{f}^\top a \, \dot{f}} \; d\xi = \int_0^{\xi_{max}} \sqrt{\dot{\Omega}^\top \alpha \, \dot{\Omega}} \; d\xi \tag{51}$$

where $f = [\langle E \rangle, \langle B \rangle]^\top$, $\Omega = [\omega_E, \omega_B]^\top$ and the overdot indicates the rate of change with respect to $\xi$. Now, in the $H^*$ representation, the total change in the framework entropy along the same path can be shown to be [27]:

$$\Delta H^* = \int_{H^*_1}^{H^*_2} dH^* = \bar{\epsilon} \int_0^{\xi_{max}} \dot{f}^\top a \, \dot{f} \; d\xi = \bar{\epsilon} J \tag{52}$$

where $\bar{\epsilon}$ is a mean dissipation parameter (e.g. minimum dissipation time) and $J$ is an action integral defined within the model space. Similarly, in the $\phi$ representation, the total change is:

$$-\Delta\phi = -\int_{\phi_1}^{\phi_2} d\phi = \bar{\epsilon} \int_0^{\xi_{max}} \dot{\Omega}^\top \alpha \, \dot{\Omega} \; d\xi = \bar{\epsilon} J \tag{53}$$

From the Cauchy-Schwarz inequality, (51)-(53) give, in either case:

$$J \ge \frac{L^2}{\xi_{max}} \tag{54}$$

Eq. (54) can be considered as a generalised least action bound on processes on the manifold of optimal solutions. In essence, it specifies the minimum cost or penalty, in units of $H^*$ or $\phi$, to move the system from $\xi = 0$ to $\xi = \xi_{max}$ at the specified rates $\dot{f}$ or $\dot{\Omega}$. If the process occurs infinitely slowly, the lower bound of the action is zero (it is "reversible"); otherwise, it is necessary to pay the minimum penalty $\Delta H^*_{min} = -\Delta\phi_{min} = \bar{\epsilon} J_{min} = \bar{\epsilon} L^2 / \xi_{max}$ to be able to alter the framework within the finite parameter duration $\xi_{max}$ (it is "dissipative"). In the present scenario, we assume that the costs $\langle E \rangle$ and benefits $\langle B \rangle$ of the model framework are realisable as external physical quantities, outside the model space itself; likewise, so will be the entropy $H^*$ and potential $\phi$, either in the units of $k$ or the equivalent information units. Eq. (54) therefore provides an information limit on the minimum price for making alterations to a constrained modelling framework.
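The bound (54) can be illustrated numerically in the $\phi$ representation, where the metric $\alpha$ is the variance-covariance matrix (48). The following is only a minimal sketch under stated assumptions: a hypothetical set of five models with made-up costs `E` and benefits `B`, canonical weights of the assumed form $p_i \propto \exp(-\omega_E E_i - \omega_B B_i)$, and a straight-line path in $(\omega_E, \omega_B)$ traversed at uniform rate. The function names (`weights`, `mean_E`, `alpha`, `length_and_action`) are illustrative and do not come from the analysis above. The sketch also checks the fluctuation relation behind (49), $\langle E^2 \rangle - \langle E \rangle^2 = -\partial \langle E \rangle / \partial \omega_E$, by finite differences. The bound itself follows because $L^2 = (\int_0^{\xi_{max}} 1 \cdot \sqrt{g} \, d\xi)^2 \le \xi_{max} \int_0^{\xi_{max}} g \, d\xi$, with $g = \dot{\Omega}^\top \alpha \, \dot{\Omega}$.

```python
import math

# Hypothetical costs E_i and benefits B_i for five candidate models (made-up values).
E = [1.0, 2.0, 3.0, 4.0, 5.0]
B = [0.5, 1.5, 1.0, 3.0, 2.5]

def weights(w_E, w_B):
    """Assumed canonical weights p_i proportional to exp(-w_E*E_i - w_B*B_i)."""
    u = [math.exp(-w_E * e - w_B * b) for e, b in zip(E, B)]
    z = sum(u)
    return [x / z for x in u]

def mean_E(w_E, w_B):
    """Ensemble mean cost <E> at the given multipliers."""
    return sum(p * e for p, e in zip(weights(w_E, w_B), E))

def alpha(w_E, w_B):
    """Variance-covariance matrix of (E, B), cf. Eq. (48)."""
    p = weights(w_E, w_B)
    m_E = sum(pi * e for pi, e in zip(p, E))
    m_B = sum(pi * b for pi, b in zip(p, B))
    v_E = sum(pi * (e - m_E) ** 2 for pi, e in zip(p, E))
    v_B = sum(pi * (b - m_B) ** 2 for pi, b in zip(p, B))
    c = sum(pi * (e - m_E) * (b - m_B) for pi, e, b in zip(p, E, B))
    return [[v_E, c], [c, v_B]]

def length_and_action(start, end, xi_max, n=2000):
    """Midpoint-rule estimates of L, Eq. (51), and J, Eq. (53), on a straight path."""
    rate_E = (end[0] - start[0]) / xi_max  # d(w_E)/d(xi), constant on this path
    rate_B = (end[1] - start[1]) / xi_max  # d(w_B)/d(xi), constant on this path
    h = xi_max / n
    length = action = 0.0
    for k in range(n):
        xi = (k + 0.5) * h
        a = alpha(start[0] + rate_E * xi, start[1] + rate_B * xi)
        g = a[0][0] * rate_E**2 + 2 * a[0][1] * rate_E * rate_B + a[1][1] * rate_B**2
        g = max(g, 0.0)  # guard against round-off; alpha is positive semi-definite
        length += math.sqrt(g) * h
        action += g * h
    return length, action

# Fluctuation check: var(E) should equal -d<E>/d(w_E) (central difference).
w_E, w_B, d = 0.3, 0.2, 1e-5
dE_dw = (mean_E(w_E + d, w_B) - mean_E(w_E - d, w_B)) / (2 * d)
var_E = alpha(w_E, w_B)[0][0]

# Least action bound on one path, traversed at two different speeds.
start, end = (0.1, 0.1), (0.6, 0.4)
L, J = length_and_action(start, end, xi_max=2.0)
L_slow, J_slow = length_and_action(start, end, xi_max=4.0)
bound = L**2 / 2.0

print(f"var(E) = {var_E:.6f}, -d<E>/dw_E = {-dE_dw:.6f}")
print(f"xi_max = 2: L = {L:.6f}, J = {J:.6f}, L^2/xi_max = {bound:.6f}")
print(f"xi_max = 4: L = {L_slow:.6f}, J = {J_slow:.6f}")
```

In this sketch the length $L$ depends only on the path, whereas $J$ scales as $1/\xi_{max}$ when the same path is traversed at a uniform rate, so doubling $\xi_{max}$ halves the action: slower alterations incur a smaller penalty, consistent with the reversible limit.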
(Of course, it applies to any modelling framework, not just for climate modelling.) In some sense, this limit is allied to the informational principles demonstrated by Szilard [44], Landauer [45], Bennett [46] and many others [47], although it is of quite different character.

References
1. J.P. Peixoto & A.H. Oort (1992), Physics of Climate, AIP, Melville, USA.
2. K. McGuffie & A. Henderson-Sellers (2005) A Climate Modelling Primer, 3rd ed., Wiley, NY.
3. G. Paltridge, A. Arking & M. Pook, Trends in middle- and upper-level tropospheric humidity from NCEP reanalysis data, Theor. Appl. Climatol. (3-4), 351-359 (2009).
4. T.F. Stocker, D. Qin, G.-K. Plattner, M. Tignor & P.M. Midgley (eds.) (2010): Meeting Report of the Intergovernmental Panel on Climate Change Expert Meeting on Assessing and Combining Multi Model Climate Projections, IPCC Working Group I Technical Support Unit, University of Bern, Bern, Switzerland.
5. L. Boltzmann, Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung, respective den Sätzen über das Wärmegleichgewicht, Wien. Ber., 373-435 (1877); English transl.: J. Le Roux (2002) 1-63 ∼ leroux/ .
6. M. Planck, Über das Gesetz der Energieverteilung im Normalspektrum, Annalen der Physik, 553-563 (1901).
7. E.T. Jaynes, Information theory and statistical mechanics, Phys. Rev., 620-630 (1957).
8. E.T. Jaynes, Information theory and statistical mechanics, in Ford, K.W. (ed), Brandeis University Summer Institute, Lectures in Theoretical Physics, Vol. 3: Statistical Physics, Benjamin-Cummings Publ. Co., 1963, 181-218.
9. E.T. Jaynes (G.L. Bretthorst, ed.) Probability Theory: The Logic of Science, Cambridge U.P., Cambridge, 2003.
10. C.E. Shannon, A mathematical theory of communication, Bell Sys. Tech. J., 379-423; 623-659 (1948).
11. I. Vincze, Progress in Statistics 2 (1974) 869.
12. M. Grendár, Jr., M. Grendár, What is the question that MaxEnt answers? A probabilistic interpretation, in A. Mohammad-Djafari (ed.), Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2000), AIP Conf. Proc., 2001, 83-94.
13. R.K. Niven, Exact Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics, Physics Letters A, 342(4): 286-293 (2005).
14. R.K. Niven, Cost of s-fold decisions in exact Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics, Physica A, 365(1): 142-149 (2006).
15. R.K. Niven, Combinatorial entropies and statistics, European Physical Journal B 70, 49-63 (2009).
16. R.K. Niven, Non-asymptotic thermodynamic ensembles, EPL: 20010 (2009).
17. R.K. Niven & M. Grendar, Generalized classical, quantum and intermediate statistics and the Polya urn model, Physics Letters A: 621-626 (2009).
18. M. Grendar & R.K. Niven, The Polya information divergence, Information Sciences Les Statistiques Quantiques et Leurs Applications, Les Presses Universitaires de France, Paris, 1930.
20. R. Fortet, Elements of Probability Theory, Gordon and Breach Science Publ., London, 1977.
21. C.B. Read, in S. Kotz, N.L. Johnson, Encyclopedia of Statistical Sciences, vol. 3, John Wiley, NY, 1983, 63-66.
22. F. Weinhold, Metric geometry of equilibrium thermodynamics. III. Elementary formal structure of a vector-algebraic representation of equilibrium thermodynamics, J. Chem. Phys. (6) 2488-2495 (1975).
23. G. Ruppeiner, Thermodynamics: a Riemannian geometric model, Phys. Rev. A (4) 1608-1613 (1979).
24. P. Salamon, B. Andresen, P.D. Gait, R.S. Berry, The significance of Weinhold's length, J. Chem. Phys. (2) 1001-1002 (1980), erratum (10) 5407 (1980).
25. P. Salamon, R.S. Berry, Thermodynamic length and dissipated availability, Phys. Rev. Lett. (13) 1127-1130 (1983).
26. J. Nulton, P. Salamon, B. Andresen, Q. Anmin, Quasistatic processes as step equilibrations, J. Chem. Phys. (1) 334-338 (1985).
27. R.K. Niven & B. Andresen, in Dewar, R.L., Detering, F. (eds) Complex Physical, Biophysical and Econophysical Systems, World Scientific, Hackensack, NJ, 283 (2009).
28. G.W. Paltridge, Global dynamics and climate - a system of minimum entropy exchange, Quart. J. Royal Meteorol. Soc. (11) 5551-5553 (1982).
39. J.W. Gibbs, Graphical methods in the thermodynamics of fluids, Trans. Connecticut Acad., 309-342 (1873).
40. J.W. Gibbs, A method of graphical representation of the thermodynamic properties of substances by means of surfaces, Trans. Connecticut Acad., 382-404 (1873).
41. J.W. Gibbs (1875-78) On the equilibrium of heterogeneous substances, Trans. Connecticut Acad., 105-109 (2002).
43. R.A. Gaggioli, D.M. Paulus Jr, Available energy - Part II: Gibbs extended, J. Energy Resources Technol., ASCE, 110-115 (2002).
44. L. Szilard, Zeitschrift für Physik 53 (1929) 840; English transl.: A. Rapoport, M. Knoller (1964), in H.S. Leff, A.F. Rex, Maxwell's Demon: Entropy, Information, Computing, Princeton Univ. Press, NJ, (1990) 124.
45. R. Landauer, Irreversibility and heat generation in the computing process, IBM J. Res. Dev., 183-191 (1961).
46. C.H. Bennett, Logical reversibility of computation, IBM J. Res. Dev. 17