The Price equation program: simple invariances unify population dynamics, thermodynamics, probability, information and inference
Steven A. Frank*

*Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697–2525, USA. Web: https://stevefrank.org

The fundamental equations of various disciplines often seem to share the same basic structure. Natural selection increases information in the same way that Bayesian updating increases information. Thermodynamics and the forms of common probability distributions express maximum increase in entropy, which appears mathematically as loss of information. Physical mechanics follows paths of change that maximize Fisher information. The information expressions typically have analogous interpretations as the Newtonian balance between force and acceleration, representing a partition between the direct causes of change and the opposing changes in the frame of reference. This web of vague analogies hints at a deeper common mathematical structure. I suggest that the Price equation expresses that underlying universal structure. The abstract Price equation describes dynamics as the change between two sets. One component of dynamics expresses the change in the frequency of things, holding constant the values associated with things. The other component of dynamics expresses the change in the values of things, holding constant the frequency of things. The separation of frequency from value generalizes Shannon's separation of the frequency of symbols from the meaning of symbols in information theory. The Price equation's generalized separation of frequency and value reveals a few simple invariances that define universal geometric aspects of change. For example, the conservation of total frequency, although a trivial invariance by itself, creates a powerful constraint on the geometry of change. That constraint plus a few others seem to explain the common structural forms of the equations in different disciplines. From that abstract perspective, interpretations such as selection, information, entropy, force, acceleration, and physical work arise from the same underlying geometry expressed by the Price equation. These claims of universal structure are, at present, conjectures that deserve further study.
Keywords: Natural selection, symmetry, maximum entropy, d'Alembert's principle, Bayesian inference
Contents

Introduction
The abstract Price equation
Canonical form
Preliminary interpretation
Temporal dynamics
Key results
History of earlier forms
Mathematical properties
D'Alembert's principle
Information theory
Extreme action
Entropy and thermodynamics
Entropy and statistical mechanics
Invariance and sufficiency
Inference: data as a force
Invariance and probability
Meaning
References
Appendix A: Value of synthesis by invariance
Appendix B: Mathematical expressions from various disciplines

Introduction
The Price equation is an abstract mathematical description for the change in populations. The most general form describes a way to map entities between two sets. That abstract set mapping partitions the forces that cause change between populations into two components, the direct and inertial forces.

The direct forces change frequencies. The inertial forces change the values associated with population members. Changed values can be thought of as an altered frame of reference driven by the inertial forces.

From the abstract perspective of the Price equation, one can see the same partition of direct and inertial forces in the fundamental equations of many different subjects. That abstract unity clarifies understanding of natural selection and its relations to such disparate topics as thermodynamics, information, the common forms of probability distributions, Bayesian inference, and physical mechanics.

In a special form of the Price equation, the changes caused by the direct and inertial forces cancel so that the total remains conserved. That conservation law defines a universal invariance and canonical separation of the direct and inertial forces. The canonical separation of forces clarifies the common mathematical structure of seemingly different topics.

This article sketches the overall argument for the common mathematical structure of different subjects. The argument is, at present, a broad framing of conjectures. The conjectures raise many interesting problems that require further work. Consult Frank (2012a, 2017) for mathematical details, open problems, and citations to additional literature.
The abstract Price equation
The Price equation describes the change in the average value of some property between two populations (Price, 1972a; Frank, 2012a). Consider a population as a set of things. Each thing has a property indexed by $i$. Those things with a common property index comprise a fraction, $q_i$, of the population and have average value, $z_i$, for whatever we choose to measure by $z$. Write $\mathbf{q}$ and $\mathbf{z}$ as the vectors over all $i$. The population average value is $\bar{z} = \mathbf{q}\cdot\mathbf{z} = \sum q_i z_i$, summed over $i$.

A second population has matching vectors $\mathbf{q}'$ and $\mathbf{z}'$. Those vectors for the second population are defined by the special set mapping of the abstract Price equation. In particular, $q'_i$ is the fraction of the second population derived from entities with index $i$ in the first population. The second population does not have its own indexing by $i$. Instead, the second population's indices derive from the mapping of the second population's members to the members of the first population.

Similarly, $z'_i$ is the average value in the second population of members derived from entities with index $i$ in the first population. Let $\Delta$ be the difference between the derived population and the original population, $\Delta\mathbf{q} = \mathbf{q}' - \mathbf{q}$ and $\Delta\mathbf{z} = \mathbf{z}' - \mathbf{z}$.

To calculate the change in average value, it is useful to begin by considering $q$ and $z$ as abstract variables associated with the first set, and $q'$ and $z'$ as corresponding variables from the second set. The change in the product of $q$ and $z$ is $\Delta(qz) = q'z' - qz$. Note that $q' = q + \Delta q$ and $z' = z + \Delta z$. We can write the total change in the product as a discrete analog of the chain rule for differentiation of a product, yielding two partial change terms

\[
\Delta(qz) = (q + \Delta q)(z + \Delta z) - qz = (\Delta q)z + (q + \Delta q)\Delta z = (\Delta q)z + q'\Delta z.
\]

The first term, $(\Delta q)z$, is the partial difference of $q$ holding $z$ constant. The second term, $q'\Delta z$, is the partial difference of $z$ holding $q$ constant. In the second term, we use $q'$ as the constant value because, with discrete differences, one of the partial change terms must be evaluated in the context of the second set.

The same product rule can be applied to vectors, yielding the abstract form of the Price equation

\[
\Delta\bar{z} = \Delta(\mathbf{q}\cdot\mathbf{z}) = \Delta\mathbf{q}\cdot\mathbf{z} + \mathbf{q}'\cdot\Delta\mathbf{z}. \tag{1}
\]

The abstract Price equation simply partitions the total change in the average value into two partial change terms.

Note that $\mathbf{q}$ has a clearly defined meaning as frequency, whereas $\mathbf{z}$ may be chosen arbitrarily as any values assigned to members. The values, $\mathbf{z}$, define the frame of reference. Because frequency is clearly defined, whereas values are arbitrary, the frequency changes, $\Delta\mathbf{q}$, take on the primary role in analyzing the structural aspects of change that unify different subjects.

The primacy of frequency change naturally labels the first term, with $\Delta\mathbf{q}$, as the changes caused by the direct forces acting on populations. Because $\mathbf{q}$ and $\mathbf{q}'$ define a sequence of probability distributions, the primary aspect of change concerns the dynamics of probability distributions.

The arbitrary aspect of the values, $\mathbf{z}$, naturally labels the second term, with $\Delta\mathbf{z}$, as the changes caused by the forces that alter the frame of reference, the inertial forces.

Table 1 defines commonly used symbols. Tables 2 and 3 in Appendix B summarize mathematical forms and relations between disciplines.
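As a minimal numerical check of eqn 1, the following sketch verifies that the two partial change terms recover the total change exactly. The frequencies and values are arbitrary illustrations, not quantities from the text.

```python
import numpy as np

# Arbitrary example: frequencies in two sets and the values assigned to each index.
q  = np.array([0.5, 0.3, 0.2])     # first set, sums to 1
qp = np.array([0.4, 0.35, 0.25])   # second set, mapped back to the first set's indices
z  = np.array([1.0, 2.0, 4.0])     # values in the first set
zp = np.array([1.5, 2.0, 3.0])     # values in the second set, by first-set index

dq, dz = qp - q, zp - z
total = qp @ zp - q @ z            # Delta z-bar
partition = dq @ z + qp @ dz       # direct term plus inertial term (eqn 1)
assert np.isclose(total, partition)
```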
Canonical form

The prior section emphasized the primary role for the dynamics of probability distributions, $\Delta\mathbf{q}$, which follows as a consequence of the forces acting on populations.

The canonical form of the Price equation focuses on the dynamics of probability distributions and the associated forces that cause change. To obtain the canonical form, define

\[
a_i = \frac{\Delta q_i}{q_i} \tag{2}
\]

as the relative change in the frequency of the $i$th type.

We can use any value for $\mathbf{z}$ in the Price equation. Choose $\mathbf{z}\equiv\mathbf{a}$. Then

\[
\Delta\bar{a} = \Delta\mathbf{q}\cdot\mathbf{a} + \mathbf{q}'\cdot\Delta\mathbf{a} = 0, \tag{3}
\]

in which the equality to zero expresses the conservation of total probability

\[
\bar{a} = \mathbf{q}\cdot\mathbf{a} = \sum_i q_i\,\frac{\Delta q_i}{q_i} = \sum_i \Delta q_i = 0,
\]

because the total changes in probability must cancel to keep the sum of the probabilities constant at one.

Thus, eqn 3 appears as a seemingly trivial result, a notational spin on $\sum \Delta q_i = 0$. However, many generalities and connections between seemingly different disciplines follow from the partition of conserved probability into the two terms of eqn 3.
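A similar sketch checks the canonical form in eqn 3. Because $\Delta\mathbf{a}$ requires the relative changes of a following step, the example assumes three successive frequency vectors; the numbers are arbitrary.

```python
import numpy as np

q   = np.array([0.5, 0.3, 0.2])    # first set
qp  = np.array([0.4, 0.35, 0.25])  # second set
qpp = np.array([0.3, 0.4, 0.3])    # third set, needed to define a' for the second step

a  = (qp - q) / q        # relative frequency change, first step (eqn 2)
ap = (qpp - qp) / qp     # relative frequency change, second step
dq, da = qp - q, ap - a

print(q @ a)             # conserved total probability: a-bar = sum(dq) = 0
print(dq @ a + qp @ da)  # eqn 3: the two partial terms cancel, giving 0
```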
Preliminary interpretation

The Price equation by itself does not calculate the particular $\Delta\mathbf{q}$ values of dynamics. Instead, the equation emphasizes the fundamental constraint on dynamics that arises from invariant total probability. The changes, $\Delta\mathbf{q}$, must satisfy the constraint in eqn 3, specifying certain properties that any possible dynamical path must have.

Put another way, all possible dynamical paths will share certain invariant properties. It is those invariant properties that reveal the ultimate unity between different applications and disciplines.

Note that $\mathbf{q}$ is fundamental, whereas $\mathbf{z}$ is an arbitrary assignment of value or meaning. The focus on $\mathbf{q}$ corresponds to the reason why information theory considers only probabilities, without consideration of meaning or values. In general, the unifying fundamental aspect among disciplines concerns the dynamics of probability distributions. We can then add values or meaning to that underlying fundamental basis.

In particular, we can first study universal aspects of the canonical invariant form based on $\mathbf{a}$. We can then derive broader results by simply making the coordinate transformation $\mathbf{a}\mapsto\mathbf{z}$, yielding the most general expression of the abstract Price equation in eqn 1.

Constraints on $\bar{z}$ or $\Delta\bar{z}$ specify additional invariances, which determine further structure of the possible dynamical paths and equilibria. Each $z_i$ may be a vector of values, allowing multiple constraints associated with the $z$ values.

Alternatively, one can study the conditions required for $\Delta\bar{z}$ to change in particular ways. For example, what are the necessary and sufficient patterns of association between initial frequency, $\mathbf{q}$, relative frequency change, $\mathbf{a}$, and value, $\mathbf{z}$, to drive the change, $\Delta\bar{z}$, in a particular direction?

Table 1: Definitions of key symbols and concepts

$\mathbf{q}$: vector of frequencies with $\sum q_i = 1$ (eqn 1)
$\mathbf{z}$: values with average $\bar{z} = \mathbf{q}\cdot\mathbf{z}$; use $\mathbf{z}\equiv\mathbf{a}$, $\mathbf{F}$, etc. for specific interpretations (eqn 1)
$\Delta\mathbf{q}$: discrete changes, $\Delta q_i = q'_i - q_i$, which may be large (eqn 1)
$\dot{\mathbf{q}}$: small, differential changes, $\Delta\mathbf{q}\to\dot{\mathbf{q}}\equiv\mathrm{d}\mathbf{q}$
$\mathbf{a}$: relative change of the $i$th type, $a_i = \Delta q_i/q_i \to \dot{q}_i/q_i = \log(q'_i/q_i)$ (eqn 2)
$\mathbf{m}$: Malthusian parameter, $\mathbf{m} = \log(\mathbf{q}'/\mathbf{q})$, the log of relative fitness, $\mathbf{w}$
$\mathbf{w}$: relative fitness, $w_i = q'_i/q_i$, with $\mathbf{m} = \log\mathbf{w}$
$\mathbf{F}$: direct nondimensional forces, may be used for values, $\mathbf{z}\equiv\mathbf{F}$ (eqn 4)
$\mathbf{I}$: inertial nondimensional forces, may be interpreted as acceleration (eqn 24)
$\boldsymbol{\phi}$: force vector, $\mathbf{F}\equiv\boldsymbol{\phi}$ when specific for a particular case (eqn 6)
$\Delta\mathbf{q}\cdot\mathbf{F}$: abstract notion of physical work as displacement multiplied by force (eqn 5)
$D(\mathbf{q}'\,\|\,\mathbf{q})$: Kullback-Leibler divergence between $\mathbf{q}'$ and $\mathbf{q}$
$\mathcal{F}$: Fisher information, nondimensional expression (eqn 5)
$\mathcal{L}$: Lagrangian, used to find an extremum subject to constraints (eqn 6)
$\mathbf{L}$: likelihoods, $L_\theta$, for parameter values, $\theta$; interpreted as force, $\mathbf{F}\equiv\mathbf{L}$
$\Delta_F$: partial change caused by the direct forces, e.g., $\Delta\mathbf{q}\cdot\mathbf{F}$ or $\Delta\mathbf{q}\cdot\boldsymbol{\phi}$ or $\Delta\mathbf{q}\cdot\mathbf{L}$
$\|\cdot\|$: Euclidean vector length, e.g., $\|\mathbf{z}\|$ or $\|\mathbf{F}\|$ or $\|\Delta\mathbf{q}\|$
$\mathbf{r}$: unitary coordinates, $\mathbf{r} = \sqrt{\mathbf{q}}$, with $\|\mathbf{r}\| = 1$ as invariant total probability (eqn 22)

Temporal dynamics
The frequency change terms, $\Delta q_i$, arise from the abstract set mapping assignment of members in the second set to members in the first set. In some cases, the abstract set mapping may differ from the traditional notion of dynamics as a temporal sequence, in which $q'_i$ is the frequency of type $i$ in the second set.

We may add various assumptions to achieve a temporal interpretation in which $i$ retains its meaning as a type through time. For example, following Price (1995), we may partition $\mathbf{q}\mapsto\mathbf{q}'$ into two steps. In the initial step, $\mathbf{q}\mapsto\mathbf{q}^*$, the mapping preserves type, such that $q^*_i$ describes the frequency of type $i$ in the second set.

In the subsequent step, $\mathbf{q}^*\mapsto\mathbf{q}'$, the mapping accounts for the forces that change type. For a force that makes the change $i\mapsto j$, we map type $j$ members in the second set to type $j$ members in the first set. Thus, $\Delta q_j = q'_j - q^*_j$ describes the net frequency change from the gains and losses caused by the forces of type reassignment.

For this two-step process that preserves type, the net change $\mathbf{q}\mapsto\mathbf{q}'$ combines the type-changing forces with other forces that alter frequency. Thus, we may consider type-preserving maps as a special case of the general abstract set mapping. In this article, I focus on the properties of the general abstract set mapping.

Key results
Later sections use the abstract Price equation to show formal relations between natural selection and information theory, the dynamics of entropy and probability, basic aspects of physical dynamics, and other fundamental principles (Frank, 2017). Here, I list some key results without derivation or discussion. This listing gives a sense of where the argument will go, providing a target for further development in later sections.

Throughout this article, I use ratios of vectors to denote elementwise division, for example, $\mathbf{q}'/\mathbf{q} = (q'_1/q_1, q'_2/q_2, \ldots)$. A constant added to or multiplied by a vector applies the operation to each element of the vector; for example, $a + b\mathbf{z}$, for constants $a$ and $b$, yields $a + bz_i$ for each $i$.

D'Alembert's principle of physical mechanics.
We can write the canonical Price equation of eqn 3 as d'Alembert's partition (Frank, 2015, 2017) between the direct forces, $\mathbf{F} = \mathbf{a}$, and the inertial forces of acceleration, $\mathbf{I}$, as

\[
\Delta\bar{a} = (\mathbf{F} + \mathbf{I})\cdot\Delta\mathbf{q} = 0. \tag{4}
\]

This equation generalizes Newton's second law that force equals mass times acceleration, describing the balance between force and acceleration. Here, the direct forces, $\mathbf{F}$, balance the inertial forces of acceleration, $\mathbf{I}$, along the path of change, $\Delta\mathbf{q}$. The condition $\Delta\bar{a} = 0$ describes conservative systems. For nonconservative systems, we can use $\mathbf{a}\mapsto\mathbf{z}$, with $\Delta\bar{z}$ not necessarily conserved.

Information theory.
For small changes, $\Delta\mathbf{q}\to\dot{\mathbf{q}}$ and $\mathbf{F} = \mathbf{a}\to\log(\mathbf{q}'/\mathbf{q})$, the direct force term is

\[
\Delta\mathbf{q}\cdot\mathbf{F} = \Delta\mathbf{q}\cdot\mathbf{a} = D\!\left(\mathbf{q}'\,\|\,\mathbf{q}\right) + D\!\left(\mathbf{q}\,\|\,\mathbf{q}'\right) = \sum \frac{\dot{q}_i^2}{q_i} = \mathcal{F}, \tag{5}
\]

in which $D$ is the Kullback-Leibler divergence, a fundamental measure of information, and $\mathcal{F}$ is a nondimensional expression of Fisher information (Cover & Thomas, 1991).

Extreme action.
The term for direct force, or action, $\dot{\mathbf{q}}\cdot\mathbf{F}$, yields frequency change dynamics, $\dot{\mathbf{q}}$, determined by the extremum of the action, subject to constraint

\[
\mathcal{L} = \sum \dot{q}_i\phi_i - \kappa\left(\sum \frac{\dot{q}_i^2}{q_i} - C^2\right) - \xi\left(\sum \dot{q}_i - 0\right), \tag{6}
\]

in which $\boldsymbol{\phi} = \mathbf{F}$ is a given force vector. The first parenthetical term constrains the incremental distance between probability distributions to be $\mathcal{F} = \sum \dot{q}_i^2/q_i = C^2$, for a given constant, $C$. The second parenthetical term constrains the total probability to remain invariant.

Entropy and thermodynamics.
The force vector, $\boldsymbol{\phi}$, can be described as a growth process, $q'_i = q_ie^{\phi_i}$, with $\phi_i = \log(q'_i/q_i)$. A constraint on the system's partial change in some quantity, $\dot{\mathbf{q}}\cdot\mathbf{z} = B$, constrains the new frequency vector, $\mathbf{q}'$. We may write the constraint as $\dot{\mathbf{q}}\cdot\log\mathbf{q}' = -\lambda(\dot{\mathbf{q}}\cdot\mathbf{z}) = -\lambda B$, thus

\[
\mathcal{L} = -\dot{\mathbf{q}}\cdot\log\mathbf{q} - \kappa\left(\mathcal{F} - C^2\right) - \xi(\dot{\mathbf{q}}\cdot\mathbf{1} - 0) - \lambda(\dot{\mathbf{q}}\cdot\mathbf{z} - B).
\]

The action term, $-\dot{\mathbf{q}}\cdot\log\mathbf{q}$, is the increase in entropy, $-\mathbf{q}\cdot\log\mathbf{q}$. Maximizing the action maximizes the production of entropy.

Maximum entropy and statistical mechanics.
In the prior example, the work done by the force of constraint is $\dot{\mathbf{q}}\cdot\mathbf{F}_c = -\lambda B$, with $\mathbf{F}_c = \log\mathbf{q}' = \log k - \lambda\mathbf{z}$. At maximum entropy, we obtain an equilibrium, $\log\mathbf{q}' = \log\mathbf{q}$. Thus, the maximum entropy equilibrium probability distribution is

\[
q = ke^{-\lambda z}. \tag{7}
\]

This Gibbs-Boltzmann-exponential distribution is the principal result of statistical mechanics. Here, we obtained that result through a Price equation abstraction that led to maximum entropy production, subject to a constraining invariance on a component of change in $\bar{z}$.

Constraint, invariance and sufficiency.
The maximum entropy probability distribution expresses the forces of constraint, $\mathbf{F}_c$, acting on $\mathbf{z}$. Different constraints yield different distributions. For example, the constraint $\mathbf{q}\cdot(\mathbf{z}-\mu)^2 = \sigma^2$ yields a Gaussian distribution for given mean, $\mu$, and variance, $\sigma^2$. This constraint is sufficient to determine the form of the distribution. Similarly, for small changes, the total change of the direct forces

\[
\Delta\mathbf{q}\cdot\mathbf{a} = \Delta\mathbf{q}\cdot\mathbf{F} \to \sum\frac{\dot{q}_i^2}{q_i} = \mathcal{F}, \tag{8}
\]

does not require the exact form of the frequency changes, $\dot{\mathbf{q}}$. It is sufficient to know the Fisher information distance, $\sum\dot{q}_i^2/q_i = \mathcal{F}$, which determines the subsets of the possible change vectors, $\dot{\mathbf{q}}$, with the same invariant Fisher distance, $\mathcal{F}$. Many results from the abstract Price equation express invariance and sufficiency.

Inference: data as a force.
Use $\theta\equiv i$ as an index for different parameter values. Then $q_\theta$ matches the Bayesian notion of a prior probability distribution for the values of $\theta$. The posterior distribution is

\[
q'_\theta = q_\theta L_\theta, \tag{9}
\]

in which the normalized likelihood, $L_\theta$, describes the force of the data that drives the change in probability. In Price notation, the normalized likelihood is equivalent to the force vector, $\mathbf{L}\equiv\mathbf{F}$, and also $\mathbf{L} - 1 \equiv\mathbf{a}$. With that definition for $\mathbf{a}$ in terms of the force of the data, the structure and general properties of Bayesian inference follow as a special case of the abstract Price equation.

Invariance, scale and probability distributions.
The maximum entropy probability distribution in eqn 7 is invariant to affine transformation, $z\mapsto a + bz$, because $k$ and $\lambda$ adjust to $a$ and $b$. That affine invariance with respect to $z$, which arises directly from the abstract Price equation, is sufficient by itself to determine the structure of commonly observed probability distributions, without need of invoking entropy maximization. The structure of common probability distributions is

\[
q = ke^{-\lambda e^{\beta w}}.
\]

The function $w(z)$ is a scale for $z$, such that a shift in that scale, $w\mapsto\alpha + w$, only changes $z$ by a constant multiple, and therefore does not change the probability pattern. Simple forms of $w$ lead to the various commonly observed continuous probability distributions. For example, $w(z) = \log z$ yields the stretched exponential distribution.

History of earlier forms
Before analyzing the abstract Price equation and the unification of disciplines, it is useful to write down some of the earlier expressions and applications of the Price equation from biology (Frank, 1995, 1997, 2012a; Walsh & Lynch, 2018).
Fitness and average excess
This section extends the definition of relative changes in eqn 2. Let $w_i = q'_i/q_i$ be the relative growth, or relative fitness, of the $i$th type. Then we may define

\[
a_i = w_i - 1 = \frac{q'_i - q_i}{q_i}, \tag{10}
\]

which, in biology, is Fisher's average excess in fitness (Fisher, 1941). Note that $\Delta q_i = q_ia_i$ and that the average value of $w$ is $\bar{w} = 1$, thus $a_i = w_i - \bar{w}$.

Variance in fitness
Considering $\mathbf{a}$ as a measure of fitness, the first term of eqn 3 becomes the partial change in average fitness caused by the direct forces, $\mathbf{F}$. In symbols,

\[
\Delta_F\bar{a} = \Delta\mathbf{q}\cdot\mathbf{a} = \sum_i\Delta q_i\left(\frac{\Delta q_i}{q_i}\right) = \sum_i q_i\left(\frac{\Delta q_i}{q_i}\right)^2 = \sum_i q_ia_i^2 = V_w, \tag{11}
\]

in which $\Delta_F$ is the partial change caused by the direct forces, and $V_w$ is the variance in fitness.
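A brief numerical illustration of eqn 11, with arbitrary example frequencies: since $\bar{w} = 1$, the direct-force term equals the variance in relative fitness.

```python
import numpy as np

q  = np.array([0.5, 0.3, 0.2])
qp = np.array([0.4, 0.35, 0.25])

w = qp / q              # relative fitness, with mean q @ w = 1
a = w - 1               # average excess in fitness (eqn 10)
dq = qp - q

Vw = q @ (w - q @ w)**2  # variance in fitness
print(dq @ a, Vw)        # eqn 11: the two quantities agree
```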
Fundamental theorem

If we let $a_i = \alpha x_i + \epsilon_i$ be the regression of fitness, $a_i$, on some predictor, $x_i$, and define $g_i = \alpha x_i$, then

\[
\Delta_F\bar{a} = \sum_i q_ia_i^2 = V_g + V_\epsilon. \tag{12}
\]

If one interprets $x_i$ as an inherited gene, and $\epsilon_i$ as an environmental effect that is not transmitted to the next generation, then the partial change in fitness by natural selection that is transmitted to the next generation is $\Delta_{NS}\bar{a} = V_g$. This result is analogous to Fisher's fundamental theorem of natural selection (Fisher, 1958; Price, 1972b; Ewens, 1989; Frank, 1997).

The analysis tracks three sets: the initial set before selection, with $\bar{a}$; the second set after selection, with $\bar{a}^\dagger$; and the third set after transmission, with $\bar{a}'$. The set after transmission retains only those changes associated with $x_i$, interpreted as an inherited gene, such that $\Delta\bar{a} = \bar{a}' - \bar{a}$.

Covariance form and replicators
Using the definitions of relative fitness and average excess, the first term of the Price equation is

\[
\Delta\mathbf{q}\cdot\mathbf{z} = \sum(\Delta q_i)z_i = \sum q_ia_iz_i = \sum q_i(w_i - \bar{w})z_i = \mathrm{Cov}(w,z), \tag{13}
\]

in which $\mathrm{Cov}(w,z)$ is the covariance between fitness and value. This covariance implies that natural selection tends to increase the average value of $z$ in proportion to the association between fitness and value. If the values do not change, $\Delta z_i = 0$, then the total change is

\[
\Delta\bar{z} = \mathrm{Cov}(w,z).
\]

This covariance equation has been widely used to study natural selection (Robertson, 1966; Wade, 1985; Gardner, 2008; Queller, 2017; Walsh & Lynch, 2018).

In one common application, sometimes referred to as the replicator problem, we label each individual in a population by its own unique index, $i$, and let $z_i = p_i$ be 0 or 1 to specify if each individual is a type 0 or type 1 individual (Taylor & Jonker, 1978; Schuster & Sigmund, 1983). We can think of $p_i$ as the frequency of type 1 in individual $i$. Then $\bar{p}$ is the frequency of type 1 individuals in the population, and

\[
\Delta\bar{p} = \mathrm{Cov}(w,p) \tag{14}
\]

is the frequency change of types in the population (Price, 1970). Here, we assume that individuals do not change their type during transmission, $\Delta p_i = 0$, so that the second Price equation term is zero. This assumption is usually interpreted in biology as the absence of mutation.
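Eqn 13 can be checked directly; the frequencies and values below are arbitrary examples, with $\Delta z_i = 0$ so that the covariance gives the total change.

```python
import numpy as np

q  = np.array([0.5, 0.3, 0.2])
qp = np.array([0.4, 0.35, 0.25])
z  = np.array([2.0, 1.0, 3.0])   # values, unchanged between sets (Delta z_i = 0)

w = qp / q
dq = qp - q

cov_wz = q @ (w * z) - (q @ w) * (q @ z)   # frequency-weighted covariance
print(dq @ z, cov_wz, qp @ z - q @ z)      # eqn 13: all three agree
```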
Levels of selection

We can write the second Price equation term as

\[
\mathbf{q}'\cdot\Delta\mathbf{z} = \sum q'_i(\Delta z_i) = \sum q_iw_i(\Delta z_i) = \mathrm{E}(w\Delta z), \tag{15}
\]

in which E denotes the expectation operator for the average value. Combining this expression with eqn 13, we obtain an alternative form of the Price equation

\[
\Delta\bar{z} = \mathrm{Cov}(w,z) + \mathrm{E}(w\Delta z). \tag{16}
\]

This form is often used to analyze how selection acts at different levels, such as individual versus group selection (Price, 1972a; Hamilton, 1975). As an example, consider a variant of the replicator problem, which uses $\mathbf{z}\equiv\mathbf{p}$, yielding

\[
\Delta\bar{p} = \mathrm{Cov}(w,p) + \mathrm{E}(w\Delta p), \tag{17}
\]

in which $p_i$ now denotes the frequency of type 1 individuals within the $i$th group of individuals, $w_i$ is the fitness of the $i$th group relative to all other groups, and $\Delta p_i$ is the change in the frequency of type 1 individuals within the $i$th group. Thus, the two terms can be interpreted as the change caused by selection between groups and the change caused by selection between individuals within groups.
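A sketch of the group-selection partition in eqn 17, using hypothetical group frequencies, group fitnesses, and within-group frequencies; all numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical groups: frequencies, group fitnesses, and within-group type-1 frequencies.
q  = np.array([0.5, 0.3, 0.2])        # group frequencies
w  = np.array([0.9, 1.1, 1.1])        # relative group fitness, scaled so that q @ w = 1
p  = np.array([0.2, 0.6, 0.4])        # type-1 frequency within each group, before change
pp = np.array([0.25, 0.55, 0.45])     # type-1 frequency within each group, after change

qp = q * w
dp = pp - p

between = q @ (w * p) - (q @ w) * (q @ p)   # Cov(w, p): selection between groups
within  = qp @ dp                           # E(w Delta p): change within groups
print(qp @ pp - q @ p, between + within)    # eqn 17: total change equals the sum
```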
Mathematical properties

This section illustrates mathematical properties of the Price equation. These mathematical properties set the foundation for unifying apparently different kinds of problems from different disciplines.
Geometry and work
Write the standard Euclidean geometry vector length as the square root of the sum of squares

\[
\|\mathbf{z}\| = \sqrt{\sum z_i^2}. \tag{18}
\]

For any vector $\mathbf{z}$,

\[
\Delta\mathbf{q}\cdot\mathbf{z} = \|\Delta\mathbf{q}\|\,\|\mathbf{z}\|\cos\omega = \mathrm{Cov}(w,z),
\]

in which $\omega$ is the angle between the vectors $\Delta\mathbf{q}$ and $\mathbf{z}$. If we interpret $\mathbf{z}\equiv\mathbf{F}$ as an abstract, nondimensional force, then

\[
\Delta\mathbf{q}\cdot\mathbf{F} = \|\Delta\mathbf{q}\|\,\|\mathbf{F}\|\cos\omega \tag{19}
\]

expresses an abstract notion of work as the distance moved, $\|\Delta\mathbf{q}\|$, multiplied by the component of force acting along the path, $\|\mathbf{F}\|\cos\omega$.
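A small numerical check of eqn 19 and its covariance interpretation, with arbitrary example vectors.

```python
import numpy as np

q  = np.array([0.5, 0.3, 0.2])
qp = np.array([0.4, 0.35, 0.25])
z  = np.array([0.8, 1.3, 0.5])    # arbitrary values, interpreted as a nondimensional force

dq = qp - q
w = qp / q

cos_w = dq @ z / (np.linalg.norm(dq) * np.linalg.norm(z))   # angle between dq and z
work  = np.linalg.norm(dq) * np.linalg.norm(z) * cos_w      # distance times aligned force
cov   = q @ (w * z) - (q @ w) * (q @ z)                     # Cov(w, z)
print(dq @ z, work, cov)                                    # all three agree (eqn 19)
```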
Divergence between sets

If we let $\mathbf{z}\equiv\mathbf{a}$ describe the relative growth of the various frequencies, $a_i = \Delta q_i/q_i$, then the divergence between sets can be expressed as

\[
\Delta_F\bar{a} = \Delta\mathbf{q}\cdot\mathbf{a} = \sum\left(\frac{\Delta q_i}{\sqrt{q_i}}\right)^2 = \left\|\frac{\Delta\mathbf{q}}{\sqrt{\mathbf{q}}}\right\|^2 = V_w = R^2, \tag{20}
\]

in which $R$ is the radius of a sphere on which must lie all possible $\Delta\mathbf{q}/\sqrt{\mathbf{q}}$ changes with the same divergence between sets. If we choose to interpret $\mathbf{a}$ as an abstract notion of force, or fitness, acting on frequency changes, then $\Delta\mathbf{q}\cdot\mathbf{a}$ is the work, with magnitude $\left\|\Delta\mathbf{q}/\sqrt{\mathbf{q}}\right\|^2$, that separates the probability distribution $\mathbf{q}'$ from $\mathbf{q}$.

Small changes, paths and logarithms
If we think of the separation between sets as a sequence of small changes along a path, with each small change as $\Delta\mathbf{q}\to\dot{\mathbf{q}}$, then

\[
\mathbf{a}\to\frac{\dot{\mathbf{q}}}{\mathbf{q}} = \mathrm{d}\log\mathbf{q},
\]

in which the overdot and the symbol "d" equivalently describe the differential. Then the partial change by direct forces separates the probability distributions of the two sets by the path length

\[
\Delta_F\bar{a} = \Delta\mathbf{q}\cdot\mathbf{a} = \left\|\frac{\dot{\mathbf{q}}}{\sqrt{\mathbf{q}}}\right\|^2 = \mathcal{F}, \tag{21}
\]

in which $\mathcal{F}$ is an abstract, nondimensional expression of the Fisher information distance metric.

Unitary and canonical coordinates
Let $\mathbf{r} = \sqrt{\mathbf{q}}$. Then $\|\mathbf{r}\| = 1$, expressing the conservation of total probability as a vector of unit length, in which all possible probability combinations of $\mathbf{r}$ define the surface of a unit sphere. In Hamiltonian analyses of d'Alembert's principle for the canonical Price equation, $\mathbf{r}$ is a canonical coordinate system (Frank, 2015).

The unitary coordinates, $\mathbf{r}$, also provide a direct description of the Fisher information path length as a distance between two probability distributions

\[
4\|\dot{\mathbf{r}}\|^2 = 4\|\mathrm{d}\sqrt{\mathbf{q}}\|^2 = \left\|\frac{\dot{\mathbf{q}}}{\sqrt{\mathbf{q}}}\right\|^2 = \mathcal{F}. \tag{22}
\]

The constraint on total probability makes square root coordinates the natural system in which to analyze Euclidean distances, which are the sums of squares. See Figure 1.
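A numerical check of eqn 22, using an arbitrary small frequency change that sums to zero; the unitary-coordinate length matches the Fisher expression to leading order in the step size.

```python
import numpy as np

q  = np.array([0.5, 0.3, 0.2])
dq = np.array([1e-4, -3e-5, -7e-5])   # a small frequency change that sums to zero
qp = q + dq

fisher = np.sum(dq**2 / q)            # ||q_dot / sqrt(q)||^2
dr = np.sqrt(qp) - np.sqrt(q)         # change in unitary coordinates r = sqrt(q)
print(fisher, 4 * np.sum(dr**2))      # eqn 22: agree to leading order
```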
Affine invariance

Affine transformation shifts and stretches (multiplies) values, $\mathbf{z}\mapsto a + b\mathbf{z}$, for shift by $a$ and stretch by $b$. Here, addition or multiplication of a vector by a constant applies to each element of the vector. In the abstract Price equation

\[
\Delta\bar{z} = \Delta\mathbf{q}\cdot\mathbf{z} + \mathbf{q}'\cdot\Delta\mathbf{z},
\]

affine transformation, $\mathbf{z}\mapsto a + b\mathbf{z}$, alters the terms as: $\Delta\bar{z}\mapsto b\Delta\bar{z}$, because the shift constant cancels in the differences; $\Delta\mathbf{q}\cdot\mathbf{z}\mapsto b\,\Delta\mathbf{q}\cdot\mathbf{z}$, because in $\sum(\Delta q_i)(a + bz_i)$, we have $\sum a\Delta q_i = 0$; and $\mathbf{q}'\cdot\Delta\mathbf{z}\mapsto b\,\mathbf{q}'\cdot\Delta\mathbf{z}$, because the shift constant cancels in the differences. The stretch factor $b$ multiplies each term and therefore cancels, leaving the Price equation invariant to affine transformation of the $z$ values. Much of the universal structure expressed by the Price equation follows from this affine invariance.

Probability vs frequency
In this article, I use probability and frequency interchangeably. Many subtle issues distinguish the concepts and applications associated with those alternative words. However, in this attempt to identify common mathematical structure between various subjects, those distinctions are not essential. See Jaynes (2003) for discussion.
D’Alembert’s principle
The remaining sections repeat the list of topics in the Key results section. Prior publications discussed these topics (Frank, 2012a, 2017). Here, I present additional details, roughly sketching how the structure provided by the abstract Price equation unifies various subjects.

We can rewrite the canonical Price equation for the conservation of total probability in eqn 3 as

\[
\Delta\bar{a} = (\mathbf{F} + \mathbf{I})\cdot\Delta\mathbf{q} = 0. \tag{23}
\]

Here, $\Delta\mathbf{q}$ satisfies the constraint on total probability and any other specified constraints. The direct forces are $\mathbf{F} = \mathbf{a} = \Delta\mathbf{q}/\mathbf{q}$. The inertial forces are

\[
\mathbf{I} = \frac{\Delta^2\mathbf{q}}{\Delta\mathbf{q}} - \frac{\Delta\mathbf{q}}{\mathbf{q}}, \tag{24}
\]

in which $\Delta^2\mathbf{q} = \Delta(\mathbf{q}' - \mathbf{q})$ is the second difference of $\mathbf{q}$, which is roughly like an acceleration.

D'Alembert's principle is a generalization of Newton's second law, force equals mass times acceleration (Lanczos, 1986). In one dimension, Newton's law is $F = -I$, for force, $F$, and mass times acceleration, $-I$, so that $F + I = 0$. D'Alembert generalizes Newton's law to a statement about motion in multiple dimensions such that, in conservative systems, the total work for a displacement, $\Delta\mathbf{q}$, and total forces, $\mathbf{F} + \mathbf{I}$, is zero. Work is the distance moved multiplied by the force acting in the direction of the movement.

The canonical Price equation of eqn 3 is an abstract, nondimensional generalization of d'Alembert for probability distributions that conserve total probability. The movement of the probability distribution between two populations, or sets, can be partitioned into the balancing work components of the direct forces, $\Delta\mathbf{q}\cdot\mathbf{F}$, and the inertial forces, $\Delta\mathbf{q}\cdot\mathbf{I}$. We can often specify the direct forces in a simple and clear way. The balancing inertial forces may then be analyzed by d'Alembert's principle (Lanczos, 1986).

The movement of probability distributions in the canonical Price equation is always conservative, $\Delta\bar{a} = 0$, so that d'Alembert's principle holds. When we transform to the general Price equation by $\mathbf{a}\mapsto\mathbf{z}$, then it may be that $\Delta\bar{z}\neq 0$ and the system is not conservative. In that case, we may consider constraints on $\Delta\bar{z}$ and how those constraints influence the possible paths of change for $\Delta\mathbf{q}$.

We can obtain a simple form of d'Alembert's principle for probability distributions when displacements are small, $\Delta\mathbf{q}\to\dot{\mathbf{q}}\equiv\mathrm{d}\mathbf{q}$. Define the relative change operator as $\mathrm{d}\log$, the differential of the logarithm. Then $\mathbf{F} = \mathrm{d}\log\mathbf{q}$ and $\mathbf{I} = \mathrm{d}(\mathrm{d}\log\mathbf{q}) = \mathrm{d}^2\log\mathbf{q}$, yielding

\[
(\mathbf{F} + \mathbf{I})\cdot\mathrm{d}\mathbf{q} = \left(\mathrm{d}\log\mathbf{q} + \mathrm{d}^2\log\mathbf{q}\right)\cdot\mathrm{d}\mathbf{q} = 0, \tag{25}
\]

with the direct force proportional to the relative change in frequencies, and the inertial force proportional to the relative nondimensional acceleration in frequencies.

From eqn 5, the work of the direct forces, $\mathrm{d}\mathbf{q}\cdot\mathbf{F} = \dot{\mathbf{q}}\cdot\mathbf{F} = \mathcal{F}$, is the Fisher information path length that separates the probability distributions, $\mathbf{q}'$ and $\mathbf{q}$, associated with the two sets. The inertial forces cause a balancing loss, $\dot{\mathbf{q}}\cdot\mathbf{I} = -\mathcal{F}$, which describes the loss in Fisher information that arises from the recalculation of the relative forces in the new frame of reference, $\mathbf{q}'$. The balancing loss occurs because the average relative force, or fitness, is always zero in the current frame of reference, for example, $\mathbf{q}\cdot\mathbf{a} = \sum q_i(\dot{q}_i/q_i) = 0$. Any gain in relative fitness, $\dot{\mathbf{q}}\cdot\mathbf{F} = \mathcal{F}$, must be balanced by an equivalent loss in relative fitness, $\dot{\mathbf{q}}\cdot\mathbf{I} = -\mathcal{F}$.

Here, the notions of force, inertia, and work are nondimensional mathematical abstractions that arise from the common underlying structure between the Price equation and the equations of physical mechanics.
Similarly, the Fisher information measure here is an abstraction of the standard usage of the Fisher metric.

By equating force with relative frequency change, we intentionally blur the distinction between external causes and internal effects. By describing change as the difference between two abstract sets rather than change through time or space, we intentionally blur the scale of change. By separating frequencies, $\mathbf{q}$, from property values, $\mathbf{z}$, we intentionally distinguish universal aspects of structural change between sets from the particular interpretations of property values in each application. The blurring of cause, effect and scale, and the separation of frequency from value, lead to abstract mathematical expressions that reveal the common underlying structure between seemingly different subjects.

Figure 1: Geometry of change by direct forces. See Table 1 for definition of symbols. Tables 2 and 3 summarize distance expressions and point to locations in the text with further details. (a) The abstract physical work of the direct forces as the distance moved between the initial set with frequencies $\mathbf{q}$, and the altered set with frequencies $\mathbf{q}'$. For discrete changes, the frequencies are normalized by the square root of the frequencies in the initial set. The distance can equivalently be described by the various expressions shown, in which $V_w$ is the variance in fitness from population biology, $J$ is the Jeffreys divergence from information theory, and $\mathcal{F}$ is the Fisher information metric which arises in many disciplines. The symbol "$\to$" denotes the limit for small changes. (b) When changes are small, the same geometry and distances can be described more elegantly in unitary square root coordinates, $\mathbf{r} = \sqrt{\mathbf{q}}$.
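The discrete balance in eqns 23 and 24 can be verified directly. The sketch assumes three successive frequency vectors (arbitrary values) so that the second difference is defined.

```python
import numpy as np

# Three successive frequency vectors; the third is needed for the second difference.
q, qp, qpp = (np.array([0.5, 0.3, 0.2]),
              np.array([0.4, 0.35, 0.25]),
              np.array([0.35, 0.37, 0.28]))

dq  = qp - q
d2q = (qpp - qp) - dq          # second difference of q

F = dq / q                     # direct forces (eqn 23)
I = d2q / dq - dq / q          # inertial forces (eqn 24)
print((F + I) @ dq)            # balances to zero, as in eqn 23
```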
Information theory

When changes are small, the direct force term of the canonical Price equation expresses classic measures of information theory (eqn 5). In particular, $\dot{\mathbf{q}}\cdot\mathbf{a} = \dot{\mathbf{q}}\cdot\mathbf{F}$ is a symmetric expression of the Kullback-Leibler divergence, which measures the change in information associated with the separation between two probability distributions (Cover & Thomas, 1991).

For small changes, the Kullback-Leibler divergence is equivalent to a nondimensional expression of the Fisher information metric. The Fisher metric provides the foundation for much of classic statistical theory and for the subject of information geometry (Fisher, 1925; Amari & Nagaoka, 2000). The Fisher metric also arises as an equivalent description for dynamics in many classic problems in physics and other subjects (Frieden, 2004).

What does it mean that the Price equation matches classic measures of information, which also arise in other subjects? That remains an open question. I suggest that the Price equation reveals the common mathematical structure among those seemingly different subjects. That mathematical structure arises from the conserved quantities, invariances, or constraints that impose a common pattern on dynamics. By this interpretation, dynamics is just a description of the changes between a sequence of sets.

The key aspect of the Price equation seems to be the separation of frequencies from property values. That separation shadows Shannon's separation of the information in a message, expressed by frequencies of symbols in sets, from the meaning of a message, expressed by the properties associated with the message symbols. The Price equation takes that separation further by considering the abstract description of the separation between sets rather than the information in messages. Price (1995) was clearly influenced by the information theory separation between frequency and property in his discussion of a generalized notion of natural selection that might unify disparate subjects.

The equivalence of the Price equation and information measures arises directly from the assumption of small changes. For larger changes, the relation between the Price equation and information remains an open problem. We might, for example, describe larger changes as

\[
q'_i = q_ie^{m_i}, \tag{26}
\]

in which $m_i$ is a nondimensional expression for the total force that separates frequencies. From that expression,

\[
m_i = \log\frac{q'_i}{q_i} = \log w_i, \tag{27}
\]

in which $w_i$ is a form of relative fitness, and $m_i$ is called the Malthusian parameter in biology. Then, similarly to eqn 5, we have

\[
\Delta\mathbf{q}\cdot\mathbf{m} = D\!\left(\mathbf{q}'\,\|\,\mathbf{q}\right) + D\!\left(\mathbf{q}\,\|\,\mathbf{q}'\right), \tag{28}
\]

which is known as the Jeffreys divergence. In this case, with $\Delta\mathbf{q}$ not necessarily small, we no longer have a direct equivalence to Fisher information. Information geometry, which analyzes continuous paths along contours of conserved total probability, describes the relations between Fisher information and this discrete divergence (Dabak & Johnson, 2002). The idea is that big changes, $\Delta\mathbf{q}$, become a series of small changes, $\dot{\mathbf{q}}$, along a continuous path that connects the endpoints, $\mathbf{q}$ to $\mathbf{q}'$. Each small step along the path can be described as a Fisher information path length, and the sum of those small lengths equals the Jeffreys divergence.

Earlier work in population genetics theory derived the total change caused by natural selection as $\sum\dot{q}_i^2/q_i$ (reviewed by Ewens, 1992; Wei et al., 2009; Raju & Krishnaprasad, 2019).
That initial work did not emphasize the equivalence of the change by natural selection and Fisher information (Frank, 2009b). Here, the Fisher metric arises most simply as the continuous limiting form of the canonical Price equation description for the distance between two sets.
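A short check of the Jeffreys-divergence identity in eqn 28, with arbitrary example frequencies.

```python
import numpy as np

q  = np.array([0.5, 0.3, 0.2])
qp = np.array([0.4, 0.35, 0.25])

m = np.log(qp / q)                              # Malthusian parameter (eqn 27)
jeffreys = np.sum((qp - q) * m)                 # Delta q . m
kl = lambda p, r: np.sum(p * np.log(p / r))     # Kullback-Leibler divergence
print(jeffreys, kl(qp, q) + kl(q, qp))          # eqn 28: the two sides agree
```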
Extreme action

We can write eqn 6 as

\[
\mathcal{L} = \dot{\mathbf{q}}\cdot\boldsymbol{\phi} - \kappa\left(\mathcal{F} - C^2\right) - \xi(\dot{\mathbf{q}}\cdot\mathbf{1} - 0). \tag{29}
\]

By the principle of extreme action, the dynamics, $\dot{\mathbf{q}}$, maximize or minimize (extremize) the action, $\dot{\mathbf{q}}\cdot\boldsymbol{\phi}$, subject to the constraints. In this case, maximizing the action simply describes the fact that the movement, $\dot{\mathbf{q}}$, tends to be in the direction of the force vector, $\boldsymbol{\phi}$, subject to any constraints on motion.

The Lagrangian, $\mathcal{L}$, combines the action and the constraints into one expression. To illustrate the principle of extreme action with the Lagrangian above, we maximize the action subject to the constraints by solving $\partial\mathcal{L}/\partial\dot{q}_i = 0$, while also solving for $\kappa$ and $\xi$ by requiring that $\mathcal{F} = C^2$ and $\dot{\mathbf{q}}\cdot\mathbf{1} = 0$. The solution is

\[
\dot{q}_i = \kappa q_i\left(\phi_i - \bar{\phi}\right), \tag{30}
\]

in which $\phi_i - \bar{\phi}$ is the excess force relative to the average, and $\xi = \bar{\phi}$ follows from satisfying the constraint on total probability under the assumption of small changes. The constant, $\kappa = C/\sigma_\phi$, satisfies the constraint on total path length, $\mathcal{F} = C^2$, in which $\sigma_\phi$ is the standard deviation of the forces. We can rewrite the solution as

\[
m_i = \frac{\dot{q}_i}{q_i} = \kappa\left(\phi_i - \bar{\phi}\right).
\]

This expression shows that we can determine the frequency changes, $\dot{\mathbf{q}}$, from the given forces, $\boldsymbol{\phi}$, or we can determine the forces from the given frequency changes. The mathematics is neutral about what is given and what is derived.

In this case, $\boldsymbol{\phi}$ is an arbitrary force vector. Using $\mathbf{z} = \boldsymbol{\phi}$ in the general Price equation does not necessarily yield $\Delta\bar{z} = \Delta\bar{\phi} = 0$. A nonconservative system does not satisfy d'Alembert's principle. Often, we can specify certain invariances associated with $\Delta\bar{z}$, and use those invariances as additional forces of constraint on $\dot{\mathbf{q}}$ in the Lagrangian. The additional forces of constraint typically alter the dynamics and the potential equilibria, as shown in the following section.

Across many disciplines, problems can often be solved by this variational method of writing a Lagrangian and then extremizing the action subject to the constraints (Lanczos, 1986). The difficulty is determining the correct Lagrangian for a particular problem. No general method specifies the correct form.

In this example, the Price equation essentially gave us the form of the action and the constraints. Here, the action is the frequency displacement multiplied by the arbitrary force vector, $\dot{\mathbf{q}}\cdot\boldsymbol{\phi}$, which is analogous to the physical work done in the movement of the probability distribution. The constraints follow from the conservation of total probability and the description of total distance moved as Fisher information, $\mathcal{F}$, which arises from the canonical Price equation.
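The solution in eqn 30 can be checked against its two constraints. The force vector and the constant $C$ below are arbitrary illustrative choices.

```python
import numpy as np

q   = np.array([0.5, 0.3, 0.2])
phi = np.array([0.2, 0.8, 0.5])   # an arbitrary force vector
C   = 0.01                        # the fixed Fisher path length satisfies F = C**2

phi_bar   = q @ phi
sigma_phi = np.sqrt(q @ (phi - phi_bar)**2)
kappa     = C / sigma_phi

q_dot = kappa * q * (phi - phi_bar)   # eqn 30
print(q_dot.sum())                    # total probability conserved: 0
print(np.sum(q_dot**2 / q), C**2)     # Fisher path length constraint: F = C^2
```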
Entropy and thermodynamics

The tendency for systems to increase in entropy provides the foundation for much of thermodynamics (Van Ness, 1983). Entropy can be studied abstractly by the information entropy quantity, $\mathcal{E} = -\mathbf{q}\cdot\log\mathbf{q}$. For small changes in frequencies, the change in entropy is $\mathrm{d}\mathcal{E} = -\dot{\mathbf{q}}\cdot\log\mathbf{q}$.

System dynamics often maximize the production of entropy (Dewar et al., 2014). Maximum entropy production suggests that the dynamics may be analyzed by a Lagrangian in which the action to be maximized is the production of entropy, $-\dot{\mathbf{q}}\cdot\log\mathbf{q}$. In the basic Lagrangian for dynamics given by eqn 29, the action is the abstract notion of physical work, $\dot{\mathbf{q}}\cdot\boldsymbol{\phi}$, the displacement, $\dot{\mathbf{q}}$, multiplied by the force, $\boldsymbol{\phi}$.

The force vector, $\boldsymbol{\phi}$, can be related to frequency change in a growth process, $q'_i = q_ie^{\phi_i}$, with $\phi_i = m_i = \log(q'_i/q_i)$, as in eqn 27. The work becomes

\[
\dot{\mathbf{q}}\cdot\boldsymbol{\phi} = \dot{\mathbf{q}}\cdot\log\mathbf{q}' - \dot{\mathbf{q}}\cdot\log\mathbf{q}, \tag{31}
\]

in which the second term on the right is the production of entropy.

If the system conserves the change in some quantity, $\Delta\bar{z} = B$, then that invariant change imposes a constraint on the possible change in the probability distribution, $\dot{\mathbf{q}} = \mathbf{q}' - \mathbf{q}$. Suppose that the value $z_i$ is a property of a type, $i$, such that each type does not change its property value between sets, $\Delta z_i = z'_i - z_i = 0$. Then, from the general Price equation, $\Delta\bar{z} = B$ implies $\dot{\mathbf{q}}\cdot\mathbf{z} = B$. This constraint acts as a force that limits the possible probability distributions, $\mathbf{q}'$, given the initial distribution, $\mathbf{q}$.

We can express the constraint $\dot{\mathbf{q}}\cdot\mathbf{z} = B$ on $\mathbf{z}$ in terms of a constraint on $\mathbf{q}'$ as $\log\mathbf{q}' = \log k - \lambda\mathbf{z}$, for constant, $k$. Then the constraint $\dot{\mathbf{q}}\cdot\mathbf{z}$ has an equivalent expression in terms of $\mathbf{q}'$ as

\[
\dot{\mathbf{q}}\cdot\log\mathbf{q}' = -\lambda(\dot{\mathbf{q}}\cdot\mathbf{z}) = -\lambda B. \tag{32}
\]

We can now split the total force, $\boldsymbol{\phi}$, as in eqn 31 and, considering $\dot{\mathbf{q}}\cdot\log\mathbf{q}'$ as a force of constraint, we can rewrite the Lagrangian of eqn 29 as

\[
\mathcal{L} = -\dot{\mathbf{q}}\cdot\log\mathbf{q} - \kappa\left(\mathcal{F} - C^2\right) - \xi(\dot{\mathbf{q}}\cdot\mathbf{1} - 0) - \lambda(\dot{\mathbf{q}}\cdot\mathbf{z} - B). \tag{33}
\]

The action term, $\mathrm{d}\mathcal{E} = -\dot{\mathbf{q}}\cdot\log\mathbf{q}$, is the increase in entropy, $\mathcal{E} = -\mathbf{q}\cdot\log\mathbf{q}$. Maximizing the action maximizes the production of entropy.

The maximization by solving $\partial\mathcal{L}/\partial\dot{q}_i = 0$ subject to the constraints yields a solution with the same form as eqn 30. The force term is replaced by a partition of forces into components that match the direct entropy increase and the constraint on $\mathbf{z}$ as

\[
\phi_i - \bar{\phi} = \mathcal{E}^*_i - \lambda z^*_i, \tag{34}
\]

in which the star superscripts denote the deviations from average values, $\mathcal{E}^*_i = -\log q_i - \mathcal{E}$ and $z^*_i = z_i - \bar{z}$, thus

\[
\dot{q}_i = \kappa q_i\left(\mathcal{E}^*_i - \lambda z^*_i\right). \tag{35}
\]

The value of $\kappa$ is $C/\sigma_\phi$, as in the previous section. In this case, we use for $\boldsymbol{\phi}$ the partition of the forces on the right side of eqn 34 into the direct entropy and the constraining forces.

The constraint $\dot{\mathbf{q}}\cdot\mathbf{z} = B$ implies

\[
\lambda = \beta_{\mathcal{E}z} - \frac{B}{\kappa\sigma^2_z}.
\]

The term $\beta_{\mathcal{E}z}$ is the regression of $-\log\mathbf{q}$ on $\mathbf{z}$, which acts to transform the scale for the forces of constraint imposed by $\mathbf{z}$ to be on a common scale with the direct forces of entropy, $-\log\mathbf{q}$. The term $B/\kappa\sigma^2_z$ describes the required force of constraint on frequency changes so that the new frequencies move $\bar{z}$ by the amount $\dot{\mathbf{q}}\cdot\mathbf{z} = B$. The term $\sigma^2_z$ is the variance in $z$.

In these examples of dynamics derived from Lagrangians, the action is the partial change term of the direct forces derived from the universal properties of the Price equation.
Thus, the maximum entropy production in this case can be interpreted as a universal partial maximum entropy production principle, in the Price equation sense of the partial change associated with the direct forces, holding the inertial frame constant (Frank, 2017).

In many applications, causal analysis reduces to this pattern of partial change by direct focal causes, holding other causes constant. The particular partition into direct, constraining, and inertial forces is a choice that we make to isolate or highlight particular causes (Lanczos, 1986).

Entropy and statistical mechanics
When entropy reaches its maximum value subject to the forces of constraint, equilibrium occurs at $\mathbf{q}' = \mathbf{q}$. From the force of constraint given in the previous section, $\log\mathbf{q}' = \log k - \lambda\mathbf{z}$, the equilibrium can be written as

\[
q = ke^{-\lambda z}, \tag{36}
\]

in which I have dropped the $i$ subscript. This Gibbs-Boltzmann-exponential distribution is the principal result of statistical mechanics (Feynman, 1998). Here, we obtained the exponential distribution through a Price equation abstraction that led to maximum entropy production.

This result suggests that equilibrium probability distributions are simple expressions of maximum entropy subject to the forces of constraint. Jaynes (1957a,b) developed this maximum entropy perspective in his quest to overthrow Boltzmann's canonical ensemble for statistical mechanics. The canonical ensemble describes macroscopic probability patterns by aggregation over a large number of equivalent microscopic particles.

The theory of statistical mechanics, based on the microcanonical ensemble, yields several commonly observed probability distributions. However, Jaynes (2003) emphasized that the same probability distributions commonly arise in economics, biology, and many other disciplines. In those nonphysical disciplines, there is no meaningful canonical ensemble of identical microscopic particles. According to Jaynes, there must be another, more general cause of the common probability patterns. The maximization of entropy is one possibility (Frank, 2009a).

Jaynes emphasized that increase in entropy is equivalent to loss of information. The inherent randomizing tendency in all systems causes loss of information. Maximum entropy is simply a consequence of that loss of information. Because systems lose all information except the forces of constraint, common probability distributions simply reflect those underlying forces of constraint.

The Gibbs-Boltzmann-exponential distribution in eqn 36 expresses the simple force of constraint on the mean of some value, $z$, associated with the system. Different constraints lead to different distributions. For example, the constraint $\mathbf{q}\cdot(\mathbf{z}-\mu)^2 = \sigma^2$ yields a Gaussian distribution for mean $\mu$ and variance $\sigma^2$.

Jaynes invoked maximum entropy as a consequence of the thermodynamic principle that systems increase in entropy. Here, I developed the maximization of entropy from the abstract Price equation expression for frequency dynamics and the extreme action principle.

Extreme action simply expresses the notion that changing frequencies align with the direction of the force vector. That geometric alignment is equivalent to the maximization of frequency change multiplied by force, an abstract notion of physical work.

Jaynes argued that the fundamental notion of information sets the underlying structural unity of thermodynamics, probability, and many aspects of statistical inference. I argue for underlying unity based on abstract properties of invariance and geometry (Frank, 2017). Those properties of invariance and geometry give a common mathematical structure to any problem that can be considered abstractly by the Price equation's description of the change between two sets. The next section reviews and extends these notions of invariance and common mathematical structure.
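A minimal sketch of eqn 36: on a discrete grid of $z$ values (an illustrative assumption, as is the use of SciPy's root finder), $\lambda$ adjusts so that the exponential form satisfies a given constraint on $\bar{z}$.

```python
import numpy as np
from scipy.optimize import brentq

z = np.linspace(0.0, 10.0, 200)      # discrete support for the value z
target_mean = 2.0                    # the constraint on z-bar

def mean_given_lambda(lam):
    q = np.exp(-lam * z)
    q /= q.sum()                     # k adjusts to conserve total probability
    return q @ z

lam = brentq(lambda l: mean_given_lambda(l) - target_mean, 1e-6, 10.0)
q = np.exp(-lam * z); q /= q.sum()
print(lam, q @ z)                    # the exponential form of eqn 36 meets the constraint
```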
Invariance and sufficiency

The Price equation expresses constraints on the change in probability distributions between sets, $\Delta\mathbf{q}$. For example, if $\bar{z}$ is a constant, conserved value, then the changes, $\Delta\mathbf{q}$, must satisfy that constraint. We may say that the conserved value of $\bar{z}$ imposes a force of constraint on the frequency changes. This section relates the Price equation's abstract notions of change and constraint to Jaynes' arguments.

Jaynes emphasized that systems tend to increase in entropy or, equivalently, to lose information. Entropy increase is a force that drives a system to an equilibrium at which entropy is maximized subject to any forces of constraint.

Because entropy increase is essentially universal, it is sufficient to know the particular forces of constraint to determine the most likely form of a probability distribution. Sufficiency expresses the forces of constraint in terms of conserved quantities.

Put another way, sufficiency partitions all possible populations into subsets. Each subset contains all of those populations with the same invariant conserved quantity. For example, if the constraint is a conserved value of $\bar{z}$, then all populations with the same invariant value of $\bar{z}$ fall into the same subset.

To analyze the force arising from constraint on $\bar{z}$ and the most likely form of the associated probability distribution, it is sufficient to know that the dynamics of populations driven by entropy increase must remain within the subset with invariant values defined by the constraints of the conserved quantities.

Jaynesian thermodynamics follows from the general force of information loss, in which the constraints sufficiently describe the only information that remains after maximum information loss.

The Price equation goes beyond Jaynes in revealing the underlying abstract mathematical structure that unifies seemingly different subjects. In all of the disciplines we have discussed, the key results for each discipline arise from the basic description of change between sets constrained by invariant conditions that we place on frequency, $\mathbf{q}$, and value, $\mathbf{z}$. In addition, the Price equation expresses the intrinsic invariance to affine transformation $\mathbf{z}\mapsto a + b\mathbf{z}$.

From the perspective of the abstract Price equation, notions of information and entropy increase arise as secondary descriptions of the underlying primary geometric aspects of change between sets subject to intrinsic invariances and to invariant conditions imposed as constraints. Those aspects of geometry and invariance set the shared foundations for many seemingly different disciplines.

Inference: data as a force
Jaynes considered information as a force that changes probability distributions. Entropy increase is the force that causes loss of information, driving probability distributions to maximum entropy subject to constraint. For inference, data provide an informational force that drives the Bayesian dynamics of probability distributions to provide estimates of parameter values. The parameters are typically the conserved, constrained quantities that are sufficient to define maximum entropy probability distributions.

How does the Jaynesian interpretation of data as an informational force in statistical inference follow from the underlying Price equation abstraction? Consider the estimation of a parameter, $\theta$, such as the mean of an exponential probability distribution. In the Bayesian framework, we describe the current information that we have about $\theta$ by the probability distribution, $q_\theta$.

The value of $q_\theta$ represents the relative likelihood that the true value of the parameter is $\theta$. The probability distribution over alternative values of $\theta$ represents our current knowledge, or information, about $\theta$. To relate this to the Price framework, note that we are now using $\theta$ as the subscript for types instead of $i$. The vector $\mathbf{q}$ now implicitly describes the set of values for $q_\theta$.

Our problem concerns how new information about $\theta$ changes the probability values to $q'_\theta$. The new probability values summarize the combination of our prior information in $q_\theta$ and the force of the new information in the data. This problem is the Bayesian dynamics of combining a prior distribution, $q_\theta$, with new data to generate a posterior distribution, $q'_\theta$, with $\Delta q_\theta = q'_\theta - q_\theta$.

We have from our universal definitions for change given earlier the relation $q'_\theta = q_\theta w_\theta$, in which we called $\mathbf{w} = \mathbf{q}'/\mathbf{q}$ the relative fitness, describing the force of change on probabilities. Here, the force arises from the way in which new data alter the net likelihood associated with a value of $\theta$.

Following Bayesian tradition, denote that force of the data as $\tilde{L}(D|\theta)$, the likelihood of observing the data, $D$, given a value for the parameter, $\theta$. To interpret a force as equivalent to relative fitness, the average value of the force must be one to satisfy the conservation of total probability. Thus, define

\[
w_\theta = L_\theta = \frac{\tilde{L}(D|\theta)}{\sum_\theta q_\theta\tilde{L}(D|\theta)}.
\]

We can now write the classic expression for Bayesian updating of a prior, $q_\theta$, driven by the force of new data, $L_\theta = L(D|\theta)$, to yield the posterior, $q'_\theta$, as

\[
q'_\theta = q_\theta L_\theta. \tag{37}
\]

By recognizing $\mathbf{L}$ as a force vector acting on frequency change, we can use all of the general results derived from the Price equation. For example, the Malthusian parameter, $\mathbf{m}$, relates to the log-likelihood as

\[
\mathbf{m} = \log\frac{\mathbf{q}'}{\mathbf{q}} = \Delta\log\mathbf{q} = \log\mathbf{L}. \tag{38}
\]

This equivalence for log-likelihood relates frequency change to the Kullback-Leibler expressions for the change in information

\[
\Delta\mathbf{q}\cdot\log\mathbf{L} = D\!\left(\mathbf{q}'\,\|\,\mathbf{q}\right) + D\!\left(\mathbf{q}\,\|\,\mathbf{q}'\right), \tag{39}
\]

which we may think of as the gain of information from the force of the data.
Perhaps the most general expression of change describes the relative separation within the unitary square root coordinates as the Euclidean length

\[
\Delta\mathbf{q}\cdot\mathbf{L} = \left\|\frac{\Delta\mathbf{q}}{\sqrt{\mathbf{q}}}\right\|^2,
\]

which is an abstract, nondimensional expression for the work done by the displacement of the frequencies, $\Delta\mathbf{q}$, in relation to the force of the data, $\mathbf{L}$.

I defined $\mathbf{L}$ as a normalized form of the likelihood, $\tilde{\mathbf{L}}$, such that the average value is one, $\bar{L} = \mathbf{q}\cdot\mathbf{L} = 1$. Thus, we have a canonical form of the Price equation for normalized likelihood

\[
\Delta\bar{L} = \Delta\mathbf{q}\cdot\mathbf{L} + \mathbf{q}'\cdot\Delta\mathbf{L} = 0. \tag{40}
\]

The second term shows how the inertial forces alter the frame of reference that determines the normalization of the likelihoods, $\tilde{\mathbf{L}}\mapsto\mathbf{L}$. Typically, as information is gained from data, the normalizing force of the frame of reference reduces the force of the same data in subsequent updates.

All of this simply shows that Bayesian updating describes the change in probability distributions between two sets. That change between sets follows the universal principles given by the abstract Price equation.

Prior work noted the analogy between natural selection and Bayesian updating (Shalizi, 2009; Harper, 2010; Campbell, 2016). Here, I emphasized a more general perspective that includes natural selection and Bayesian updating as examples of the common invariances and geometry that unify many topics.
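A sketch of Bayesian updating in Price form (eqns 37 and 40). The exponential-rate inference problem, the grid of parameter values, and the single data point are all hypothetical illustrations.

```python
import numpy as np

# Hypothetical inference problem: estimate the rate of an exponential distribution
# from one observed value, over a discrete grid of candidate parameter values.
theta = np.linspace(0.1, 5.0, 50)        # candidate parameter values
q = np.full_like(theta, 1 / theta.size)  # prior, q_theta
x = 1.7                                  # a single observed data point

like = theta * np.exp(-theta * x)        # unnormalized likelihood L~(D | theta)
L = like / (q @ like)                    # normalized so that q @ L = 1
qp = q * L                               # posterior, q'_theta = q_theta L_theta (eqn 37)

print(qp.sum(), q @ L)                   # both 1: probability conserved, mean force is 1
print((qp - q) @ L)                      # direct-force work, equals ||dq/sqrt(q)||^2
print(np.sum((qp - q)**2 / q))
```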
Invariance and probability

In the earlier section Affine invariance, I showed that the Price equation is invariant to affine transformations $z \mapsto a + bz$. This section suggests that the Price equation's intrinsic affine invariance explains universal aspects of probability distributions in a more general and fundamental manner than Jaynes' focus on entropy and information.

The general form of probability distributions in eqn 36 followed from the constraint $\log q' = \log k - \lambda z$. Affine transformation does not change the force imposed by that constraint, because
$$\log k - \lambda z \mapsto \log k - a\lambda - b\lambda z = \log k_a - \lambda_b z,$$
in which $k_a = k e^{-a\lambda}$ and $\lambda_b = b\lambda$. Because the constants, $k_a$ and $\lambda_b$, adjust to satisfy the underlying constraints, the shift and stretch constants $a$ and $b$ do not alter the constraints or the final form of the probability distribution.

Thus, the probability distribution in eqn 36, arising from the analysis of extreme action applied to a Lagrangian, is affine invariant with respect to $z$. We can make a more fundamental argument by deriving the form of the probability distribution solely as a consequence of the intrinsic affine invariance of the Price equation.

In particular, shift invariance by itself explains why the probability distribution in eqn 36 has an exponential form (Frank, 2016a). If we assume that the functional form for the probability distribution, $q_i = f(z_i)$, is invariant to a constant shift, $a + z_i$, then, dropping the $i$ subscripts and using continuous notation, by the conservation of total probability
$$\int k f(z)\,\mathrm{d}z = \int k_a f(a + z)\,\mathrm{d}z = 1 \tag{41}$$
holds for any magnitude of the shift, $a$, in which the proportionality constant, $k_a$, changes with the magnitude of the shift, $a$, independently of the value of $z$, in order to satisfy the conservation of total probability.

Because $k_a$ is independent of $z$, the condition for the conservation of total probability is
$$k_a f(a + z) = k f(z). \tag{42}$$
The invariance holds for any shift, $a$, so it must hold for an infinitesimal shift, $a = \epsilon$. We can write the Taylor series expansion for an infinitesimal shift as
$$f(\epsilon + z) = f(z) + \epsilon f'(z) = \kappa_\epsilon f(z),$$
with $\kappa_\epsilon = 1 - \lambda\epsilon$, because $\epsilon$ is small and independent of $z$, and $\kappa_0 = 1$. Thus,
$$f'(z) = -\lambda f(z)$$
is a differential equation with solution
$$q = f(z) = k e^{-\lambda z}, \tag{43}$$
in which $k$ is determined by the conservation of total probability, and $\lambda$ is determined by $\bar{z}$. When $z$ ranges over positive values, $z > 0$, then $k = \lambda = 1/\bar{z}$. Invariance to stretch transformation by $b$ follows from the adjustment, $\lambda_b$, given above.

Affine invariance of the probability distribution with respect to $z$ implies additional structure. In particular, we can write $z = e^{\beta w}$, in which a shift $w(z) \mapsto \alpha + w(z)$ multiplies $z$ by a constant, which does not change the form of the probability distribution. Thus, in terms of the shift-invariant scale, $w(z)$, we obtain the canonical expression that describes nearly all commonly observed continuous probability distributions (Frank, 2016a,c)
$$q\,\mathrm{d}\psi = k e^{-\lambda e^{\beta w}}\,\mathrm{d}\psi, \tag{44}$$
when we add a few additional details about the measure, $\mathrm{d}\psi_z$, and the commonly observed base scales, $w(z)$. Understanding the abstract form of common probability patterns clarifies the study of many problems (Frank, 2016b,c, 2018) (see Appendix A).
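A small numerical sketch may help fix the idea that shift invariance alone forces the exponential form. The example below is my own illustration, with arbitrary values for the rate lambda and the shift a: shifting and then renormalizing leaves the exponential density unchanged, in line with eqn 42, while a contrasting gamma-like shape changes form.

```python
import numpy as np

# Sketch of the shift-invariance condition in eqn 42 (my own illustration with
# arbitrary lambda and shift a). The exponential form satisfies k_a f(a + z) = k f(z);
# a contrasting gamma-like shape does not.
z = np.linspace(0.0, 50.0, 200001)      # grid over positive values of z
dz = z[1] - z[0]
lam, a = 0.7, 1.5

def normalized(values):
    """Fix the constant k by making the discretized density sum to one."""
    return values / (values.sum() * dz)

for name, f in [("exponential", lambda z: np.exp(-lam * z)),
                ("gamma-like", lambda z: z * np.exp(-lam * z))]:
    original = normalized(f(z))
    shifted = normalized(f(a + z))      # shift, then let k_a adjust by renormalizing
    print(name, np.allclose(original, shifted))
```

Only the proportionality constant adjusts for the exponential form, which is exactly the role played by $k_a$ in eqn 41.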
Meaning

One cannot explain mathematical form by appeal to extrinsic physical notions. The structure of mathematical results does not follow from energy or heat or natural selection. Instead, those extrinsic phenomena arise as consistent interpretations for the structure of the mathematics.

The mathematical structure can only be analyzed, explained and understood by reference to mathematical properties. For example, we may invoke invariance, conserved values, and geometry to understand why certain mathematical forms arise in the abstract Price equation description for changes in frequency, and why those same forms recur in many different applications. We may not invoke entropy or information as a cause, only as a description.

My goal has been to reveal the common mathematical structure that unifies seemingly disparate results from different subjects. The common mathematical structure arises primarily through simple invariances and their expression in geometry.

Acknowledgments
The Donald Bren Foundation supports my research. I completed this work while on sabbatical in the Theoretical Biology group of the Institute for Integrative Biology at ETH Zürich.
References
Amari, S. & Nagaoka, H. (2000). Methods of Information Geometry. New York: Oxford University Press.
Campbell, J. O. (2016). Universal Darwinism as a process of Bayesian inference. Hypothesis and Theory, 49.
Chater, N. & Vitányi, P. M. (2003). The generalized universal law of generalization. Journal of Mathematical Psychology, (3), 346–369.
Cover, T. M. & Thomas, J. A. (1991). Elements of Information Theory. New York: Wiley.
Dabak, A. G. & Johnson, D. H. (2002). Relations between Kullback-Leibler distance and Fisher information. Unpublished manuscript.
Dewar, R. C., Lineweaver, C. H., Niven, R. K., & Regenauer-Lieb, K. (Eds.). (2014). Beyond the Second Law: Entropy Production and Non-equilibrium Systems. Berlin: Springer-Verlag.
Ewens, W. J. (1989). An interpretation and proof of the fundamental theorem of natural selection. Theoretical Population Biology, 167–180.
Ewens, W. J. (1992). An optimizing principle of natural selection in evolutionary population genetics. Theoretical Population Biology, 333–346.
Feynman, R. P. (1998). Statistical Mechanics: A Set of Lectures (2nd ed.). New York: Westview Press.
Fisher, R. A. (1925). Theory of statistical estimation. Math. Proc. Cambridge Phil. Soc., 700–725.
Fisher, R. A. (1941). Average excess and average effect of a gene substitution. Annals of Eugenics, 53–63.
Fisher, R. A. (1958). The Genetical Theory of Natural Selection (2nd ed.). New York: Dover.
Frank, S. A. (1986). Hierarchical selection theory and sex ratios I. General solutions for structured populations. Theoretical Population Biology, 312–342.
Frank, S. A. (1995). George Price's contributions to evolutionary genetics. Journal of Theoretical Biology, 373–388.
Frank, S. A. (1997). The Price equation, Fisher's fundamental theorem, kin selection, and causal analysis. Evolution, 1712–1729.
Frank, S. A. (2007). Dynamics of Cancer: Incidence, Inheritance, and Evolution. Princeton, NJ: Princeton University Press.
Frank, S. A. (2009a). The common patterns of nature. Journal of Evolutionary Biology, 1563–1585.
Frank, S. A. (2009b). Natural selection maximizes Fisher information. Journal of Evolutionary Biology, 231–244.
Frank, S. A. (2012a). Natural selection. IV. The Price equation. Journal of Evolutionary Biology, 1002–1019.
Frank, S. A. (2012b). Natural selection. V. How to read the fundamental equations of evolutionary change in terms of information theory. Journal of Evolutionary Biology, 2377–2396.
Frank, S. A. (2013). Natural selection. VI. Partitioning the information in fitness and characters by path analysis. Journal of Evolutionary Biology, 457–471.
Frank, S. A. (2014). How to read probability distributions as statements about process. Entropy, 6059–6098.
Frank, S. A. (2015). D'Alembert's direct and inertial forces acting on populations: the Price equation and the fundamental theorem of natural selection. Entropy, 7087–7100.
Frank, S. A. (2016a). Common probability patterns arise from simple invariances. Entropy, (5), 192.
Frank, S. A. (2016b). The invariances of power law size distributions. F1000Research, 2074.
Frank, S. A. (2016c). Invariant death. F1000Research, 2076.
Frank, S. A. (2017). Universal expressions of population change by the Price equation: Natural selection, information, and maximum entropy production. Ecology and Evolution, 3381–3396.
Frank, S. A. (2018). Measurement invariance explains the universal law of generalization for psychological perception. Proceedings of the National Academy of Sciences USA, 9803–9806.
Frieden, B. R. (2004). Science from Fisher Information: A Unification. Cambridge, UK: Cambridge University Press.
Gardner, A. (2008). The Price equation. Current Biology, R198–R202.
Hamilton, W. D. (1975). Innate social aptitudes of man: an approach from evolutionary genetics. In R. Fox (Ed.), Biosocial Anthropology (pp. 133–155). New York: Wiley.
Harper, M. (2010). The replicator equation as an inference dynamic. arXiv:0911.1763v3.
Jaynes, E. T. (1957a). Information theory and statistical mechanics. Phys. Rev., (4), 620–630.
Jaynes, E. T. (1957b). Information theory and statistical mechanics. II. Phys. Rev., (2), 171–190.
Jaynes, E. T. (2003). Probability Theory: The Logic of Science. New York: Cambridge University Press.
Lanczos, C. (1986). The Variational Principles of Mechanics (4th ed.). New York: Dover Publications.
Price, G. R. (1970). Selection and covariance. Nature, 520–521.
Price, G. R. (1972a). Extension of covariance selection mathematics. Annals of Human Genetics, 485–490.
Price, G. R. (1972b). Fisher's 'fundamental theorem' made clear. Annals of Human Genetics, 129–140.
Price, G. R. (1995). The nature of selection. Journal of Theoretical Biology, 389–396.
Queller, D. C. (2017). Fundamental theorems of evolution. American Naturalist, 345–353.
Raju, V. & Krishnaprasad, P. S. (2019). A variational problem on the probability simplex. In Proceedings of the 57th IEEE Conference on Decision and Control (preliminary draft).
Robertson, A. (1966). A mathematical model of the culling process in dairy cattle. Animal Production, 95–108.
Schuster, P. & Sigmund, K. (1983). Replicator dynamics. Journal of Theoretical Biology, 533–538.
Shalizi, C. R. (2009). Dynamics of Bayesian updating with dependent data and misspecified models. Electronic Journal of Statistics, 1039–1074.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 1317–1323.
Sims, C. R. (2018). Efficient coding explains the universal law of generalization in human perception. Science, (6389), 652–656.
Taylor, P. D. & Jonker, L. B. (1978). Evolutionary stable strategies and game dynamics. Mathematical Biosciences, 145–156.
Van Ness, H. C. (1983). Understanding Thermodynamics. New York: Dover Publications.
Wade, M. J. (1985). Soft selection, hard selection, kin selection, and group selection. American Naturalist, 61–73.
Walsh, B. & Lynch, M. (2018). Evolution and Selection of Quantitative Traits. Oxford, UK: Oxford University Press.
Wei, E., Justh, E. W., & Krishnaprasad, P. S. (2009). Pursuit and an evolutionary game. Proceedings of the Royal Society of London A, 465, 1539–1559.
Appendix A: Value of synthesis by invariance
I have been asked to comment on how this synthesis of concepts may enhance scientific progress. The primary modes of progress follow two lines.

First, one can more easily understand the vast literature that makes connections between disciplines. For example, information is often discussed as if it were a primary concept that clarifies the meaning of biological or physical principles. By contrast, in this synthesis based on the fundamental invariances expressed by the abstract Price equation, various information and entropy forms arise directly. This synthesis provides value if one feels curiosity about the similarity of mathematical forms or wishes to understand the literature that discusses such similarities.

Second, new mathematical results and new insights into empirical phenomena may follow. I believe this to be true. However, the argument for novel results and insights is nearly impossible to make. For any particular result or insight, it is always possible to claim that the same could have been achieved without the broader framing. Ascribing the origins of insight to a general framework is almost always subjective.

The strongest argument I can make arises from two personal anecdotes. It is only in these cases that I understand the origin of insight in relation to the broad use of invariance as a unifying perspective.

Probability, invariance, and maximum entropy
The first anecdote shows how observations in biology motivated my search for a broader synthesis of concepts between disciplines. That synthesis, in terms of invariance, helped me to understand the observed biological patterns. It also led to a unified understanding of the commonly observed probability distributions in terms of the invariances that define scale, and an understanding of the relations between the equations of thermodynamics, natural selection in biology, and probability patterns.

In my work on cancer and other aspects of age-related disease (Frank, 2007, 2016c), I noted that a wide variety of seemingly different dynamical models of disease progression tended to converge to a few similar forms of probability distributions for the age of disease onset. At first, I used Jaynes' maximum entropy approach (Jaynes, 1957a,b, 2003) to try and understand the relations between apparently complex processes and the resulting simple patterns (Frank, 2009a). That worked, in the sense that one could find constraints that led to maximum entropy distributions that matched the data.

The problem with maximum entropy is that the constraints simply describe the patterns in the data, without giving one a sense of how patterns arise and what relates different patterns to each other. Instead, one ends up with a catalog of the commonly observed probability distributions and the matching constraints for each distribution.

Those difficulties led me to study the forms of commonly observed probability distributions. I felt that if I could understand probability patterns more deeply, I would be in a better position to understand the biological problems that interested me. And, along the way, I would perhaps better understand more general aspects of probability patterns.

Over many years, I developed a unified understanding of probability patterns in terms of invariance and scale (Frank, 2014, 2016a). I used that improved understanding of probability to enhance my analyses of age-related diseases (Frank, 2016c) and the size distributions of trees in forests (Frank, 2016b).

That work on invariance and scale in probability left open the puzzle of how that perspective related to Jaynes' classic maximum entropy approach. Although my invariance approach to probability patterns could stand separately from maximum entropy, Jaynes' approach was widely used and formed a standard against which my new work would reasonably be compared. Also, I developed my ideas by initially starting with maximum entropy, and Jaynes himself strongly hinted that invariance might be the way forward from where he left the subject (Jaynes, 2003).

How could I connect my pure invariance approach to Jaynes' work on maximum entropy, which was developed explicitly as an extension to classical thermodynamics and statistical mechanics?

My work on probability seemingly has little relation to the Price equation. However, in my other studies, I had been using the Price equation as a tool to understand natural selection in biology (Frank, 1986, 1995, 2012a). Over time, I began to see the broader connections between the Price equation and information theory (Frank, 2009b, 2012b, 2013).

Through those studies of natural selection and the Price equation, I gained understanding of the dynamics of information.
I was then able to see the connections between some of the classic results of thermodynamic change in entropy and the equations of natural selection.

With that broader understanding of entropy and information dynamics, I could then synthesize Jaynes' maximum entropy approach to probability with my approach based on invariance and scale (Frank, 2017). Some fundamental aspects of physical mechanics also began to fit within the unified structure (Frank, 2015). All of that abstract work fed back into my analyses and understanding of age-related diseases, the sizes of trees, and the distribution of enzyme rates (Frank, 2016c,b).

For any of the particular insights into empirical problems or any of the particular mathematical results, it would have been possible to achieve the same without a broader perspective or an attempt to unify between disciplines. However, in fact, the broader perspective and unification of disciplines played a primary role.

The universal law of generalization in psychology
The second anecdote shows how the broad framework led to a new insight for a particular discipline. In this case, I happened to read an article in Science about an intriguing pattern in psychology (Sims, 2018).

The probability that an organism perceives two stimuli as similar typically decays exponentially with the separation between the stimuli. The exponential decay in perceptual similarity is often referred to as the universal law of generalization (Shepard, 1987; Chater & Vitányi, 2003).

Both theory and empirical analysis depend on the definition of the perceptual scale. For example, how does one translate the perceived differences between two circles with different properties into a quantitative measurement scale?

There are many different suggestions in the literature for how to define a perceptual scale. Each of those suggestions develops very specific notions of measurement based, for example, on information theory, Kolmogorov complexity theory, or multidimensional scaling descriptions derived from observations (Chater & Vitányi, 2003; Shepard, 1987; Sims, 2018).

I showed that the inevitable shift invariance of any reasonable perceptual scale determines the exponential form for the universal law of generalization in perception (Frank, 2018). All of the other details of information, complexity, and empirical scaling are superfluous with respect to understanding why the universal law of generalization has the exponential form.

Certainly, the insight that the inevitable shift invariance of scale is a sufficient explanation does not require a broad conceptual framework derived from the Price equation. However, I was able to see that solution immediately only because I had for years been working toward a unified understanding of information, scale, and invariance. Many others had worked on this central puzzle in psychology without seeing the underlying simplicity.
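To make the shift-invariance point concrete, here is a toy check of my own construction (not from Frank, 2018): for an exponential similarity gradient, the ratio of similarities at distances d + a and d depends only on the shift a, so the choice of origin for the perceptual scale does not matter; a Gaussian gradient fails that test. The decay rates are arbitrary made-up values.

```python
import numpy as np

# Shift-invariance signature behind the universal law of generalization (toy check,
# my construction): g(d + a) / g(d) is constant in d only for the exponential gradient.
d = np.linspace(0.0, 5.0, 6)             # baseline perceptual distances
a = 1.0                                   # constant shift of the perceptual scale

gradients = {
    "exponential": lambda d: np.exp(-0.8 * d),
    "gaussian": lambda d: np.exp(-0.8 * d**2),
}
for name, g in gradients.items():
    ratios = g(d + a) / g(d)              # constant across d only for the exponential
    print(name, np.allclose(ratios, ratios[0]))
```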
Appendix B: Mathematical expressions from various disciplines
See Tables 2 and 3 on the following pages.
Table 2: Mathematical forms that highlight similarities between different disciplines, part 1
Price equation:
- $\Delta\bar z = \Delta q \cdot z + q' \cdot \Delta z$ (Eq. 1). Most general form; separates frequency, $q$, from property value, $z$; partitions frequency and property value change.
- $\Delta\bar a = \Delta q \cdot a + q' \cdot \Delta a = 0$. Canonical form; emphasizes conservation of total frequency; recover the general form by the coordinate change $a \mapsto z$.

Mathematical relations:
- $\Delta q \cdot z = \|\Delta q\|\,\|z\|\cos\omega$ (Eq. 19). Geometric equivalence for the dot product; $a \equiv \mathbf{F}$ yields an abstract expression of physical work (see below).
- $\Delta q \cdot z = \mathrm{Cov}(w, z)$ (Eq. 13). Equivalent statistical form.
- $q' \cdot \Delta z = \mathrm{E}(w\,\Delta z)$ (Eq. 15). Equivalent statistical form.
- $\Delta q \cdot a = \left\|\Delta q/\sqrt{q}\right\|^2$. Geometric expression for the total distance between sets in terms of frequency; discrete generalization of Fisher information, $\mathcal{F}$.

Physical mechanics:
- $\Delta\bar a = (\mathbf{F} + \mathbf{I}) \cdot \Delta q = 0$ (Eq. 23). Abstraction of d'Alembert's principle for physical work in conservative systems; work from direct forces, $\Delta q \cdot \mathbf{F} = \Delta q \cdot a$, balances work from inertial forces, $\Delta q \cdot \mathbf{I} = q' \cdot \Delta a$; generalize by the coordinate transformation $a \mapsto z$; cases in which $\Delta\bar z \neq 0$ describe nonconservative systems.
- $\Delta q \cdot \mathbf{F} = \|\Delta q\|\,\|\mathbf{F}\|\cos\omega$ (Eq. 19). Abstract form of work as distance moved, $\|\Delta q\|$, multiplied by the component of force along the path, $\|\mathbf{F}\|\cos\omega$; for given lengths of the force and frequency change vectors, the frequency changes that minimize the angle between force and frequency change maximize the work.

Information theory:
- $\Delta q \cdot m = J(q', q)$. Jeffreys divergence, $J = D(q'\,\|\,q) + D(q\,\|\,q')$, for $z \equiv m = \log q'/q$.
- $\Delta q \cdot m \to \dot q \cdot a$. For small changes, $m \to a$ for $\Delta q \to \dot q$.
- $\dot q \cdot a = \left\|\dot q/\sqrt{q}\right\|^2 = \mathcal{F}$ (Eq. 21). Abstract nondimensional expression of Fisher information as the distance of relative frequency changes.
- $\left\|\dot q/\sqrt{q}\right\|^2 = 4\|\dot r\|^2 = \mathcal{F}$. Fisher information as simple Euclidean geometric distance of frequency change in unitary coordinates, $r = \sqrt{q}$.
- $\dot q \cdot \mathbf{F} = \dot q \cdot \mathrm{d}\log q = \mathcal{F}$ (Eq. 25). For $\mathbf{F} \equiv a$, work of the direct forces in terms of d'Alembert.
- $\dot q \cdot \mathbf{I} = q \cdot \mathrm{d}a = -\mathcal{F}$ (Eq. 25). Work of the inertial forces, the change in frame of reference.

Bayesian inference:
- $\log L \equiv m$; $L - 1 \equiv a$. For relative likelihood, $L$.
- $q'_\theta = q_\theta L_\theta$ (Eq. 37). Bayesian updating.
- $\Delta q \cdot \log L = J(q', q)$. Follows from $\log L \equiv m$.
- $\Delta q \cdot \log L \to \dot q \cdot a = \mathcal{F}$. Follows from $m \to a$ for $\Delta q \to \dot q$.
- $\Delta\bar L = \Delta q \cdot L + q' \cdot \Delta L = 0$. Likelihood form of the canonical Price equation, $L - 1 \equiv a$.
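Several of the Table 2 identities can be spot-checked numerically. The sketch below is my own illustration with a made-up three-type population; it verifies the statistical forms of the two Price equation terms, the Jeffreys divergence form, and the squared-distance form of the frequency change.

```python
import numpy as np

# Numerical spot-check (my own sketch) of several Table 2 identities, using an
# arbitrary made-up population of three types: frequencies q, fitnesses W, and
# trait values before (z) and after (zp) transmission.
q = np.array([0.2, 0.5, 0.3])
W = np.array([1.2, 0.9, 1.1])             # absolute fitnesses
w = W / (q @ W)                           # relative fitness, so the q-average of w is one
qp = q * w                                # new frequencies q'
z = np.array([1.0, 3.0, 2.0])
zp = np.array([1.5, 2.5, 2.2])            # changed trait values z'
dq, dz = qp - q, zp - z

def cov(x, y):
    """Covariance of x and y weighted by the frequencies q."""
    return q @ (x * y) - (q @ x) * (q @ y)

print(np.isclose(dq @ z, cov(w, z)))                   # Delta q . z = Cov(w, z)
print(np.isclose(qp @ dz, q @ (w * dz)))               # q' . Delta z = E(w Delta z)

m = np.log(qp / q)                                     # m = log q'/q
jeffreys = np.sum((qp - q) * np.log(qp / q))           # D(q'||q) + D(q||q')
print(np.isclose(dq @ m, jeffreys))                    # Delta q . m = J(q', q)

a = dq / q                                             # a = Delta q / q
print(np.isclose(dq @ a, np.sum(dq**2 / q)))           # Delta q . a = ||Delta q / sqrt(q)||^2
```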
Table 3: Mathematical forms that highlight similarities between different disciplines, part 2
Natural selection:
- $\Delta_{\mathrm{F}}\bar a = \Delta q \cdot a = V_w$. Natural selection moves the population a distance equal to the variance in fitness; equivalent to the abstract form of physical work with $a \equiv \mathbf{F}$.
- $\Delta_{\mathrm{F}}\bar a = V_w = V_g + V_\epsilon$. Partition of the variance (distance) into a part associated with genetic predictors, $V_g$, and a part associated with other environmental effects, $V_\epsilon$.
- $\Delta_{\mathrm{NS}}\bar a = V_g$ (Eq. 12). Analog of the fundamental theorem, the part of the total transmissible change caused by natural selection.
- $\Delta\bar p = \mathrm{Cov}(w, p)$ (Eq. 14). Replicator equation with $p \equiv z$ as gene frequency within individuals and $\bar p$ as the population gene frequency.
- $\Delta\bar p = \mathrm{Cov}(w, p) + \mathrm{E}(w\,\Delta p)$ (Eq. 17). Group selection with $p \equiv z$ as gene frequency within groups, the first term as selection between groups, and the second term as selection within groups.

Extreme action:
- $\mathcal{L} = \dot q \cdot \phi + \text{constraints}$ (Eq. 29). Lagrangian as the work of the direct forces, $\phi \equiv \mathbf{F}$; maximizing the work (action), $\dot q \cdot \phi$, chooses the frequency changes, $\dot q$, in the direction of the forces subject to the constraints.
- $\dot q_i = \kappa q_i\left(\phi_i - \bar\phi\right)$ (Eq. 30). Dynamics for constrained total frequency and constrained total distance, $\mathcal{F} = C$, with $\kappa = C/\sigma_\phi$ and $\sigma_\phi$ as the standard deviation of the forces.

Thermodynamics:
- $a = \Delta q/q \to \dot q/q$ (Eq. 2). Equivalence for small changes.
- $m = \log q'/q \to \dot q/q$. Define the force $\phi \equiv m$, with $q'_i = q_i e^{m_i} \to q_i(1 + m_i)$.
- $\dot q \cdot \phi = \dot q \cdot \log q' - \dot q \cdot \log q$ (Eq. 31). The term $-\dot q \cdot \log q$ is the production of entropy.
- $\mathcal{L} = -\dot q \cdot \log q + \text{constraints}$ (Eq. 33). Maximizing the Lagrangian maximizes the production of entropy.
- $\dot q \cdot \log q' = -\lambda(\dot q \cdot z) = -\lambda B$. If $\Delta z = 0$, then the constraint $\Delta\bar z = B$ implies $\dot q \cdot z = B$, which constrains the vector of new frequencies, $q'$.
- $\log q' = \log k - \lambda z$ (Eq. 32). Force of constraint in the previous line.
- $\dot q_i = \kappa q_i\left(E^*_i - \lambda z^*_i\right)$ (Eq. 35). Dynamics that maximize entropy production.

Statistical mechanics:
- $q_i = k e^{-\lambda z_i}$. Solution for the probability distribution from the force of constraint at equilibrium, $q' = q$, and the constraint $\bar z = q \cdot z = 1/\lambda$.
- $q_i = k e^{-(z_i - \mu)^2/\sigma^2}$. Gaussian distribution from the constraint $\sigma^2 = q \cdot (z - \mu)^2$.
- $q_i = k e^{-\lambda T(z_i)}$. Jaynesian maximum entropy distribution from the constraint $q \cdot T(z) = 1/\lambda$.

Probability distributions:
- $q = k e^{-\lambda e^{\beta w}}$. Canonical form of continuous probability distributions; $w(z)$ is a shift-invariant scaling of $z$ such that the probability pattern is invariant to a constant shift, $w \mapsto \alpha + w$.
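Two of the Table 3 forms can be checked in the same spirit. The sketch below is my own illustration with made-up numbers: the first part verifies the group selection partition, and the second recovers the exponential form $q_i = k e^{-\lambda z_i}$ by solving numerically for $\lambda$ from a mean constraint on a discrete grid (the continuous-case relation $\bar z = 1/\lambda$ holds only approximately in this discrete setting).

```python
import numpy as np

# Two small checks (my own sketches, with made-up numbers) of Table 3 forms.

# 1) Group selection: Delta p-bar = Cov(w, p) + E(w Delta p), with p the gene
#    frequency within each group and w the relative fitness of groups.
q = np.array([0.4, 0.35, 0.25])           # group frequencies
W = np.array([1.3, 0.8, 1.05])            # group fitnesses
w = W / (q @ W)                           # relative fitness, q-average of one
p = np.array([0.6, 0.2, 0.4])             # gene frequency within each group
pp = np.array([0.55, 0.25, 0.42])         # within-group frequencies after selection
dp = pp - p

pbar_change = (q * w) @ pp - q @ p                      # p-bar' minus p-bar
between = q @ (w * p) - (q @ w) * (q @ p)               # Cov(w, p)
within = q @ (w * dp)                                   # E(w Delta p)
print(np.isclose(pbar_change, between + within))

# 2) Statistical mechanics row: q_i = k exp(-lambda z_i), with k fixed by total
#    probability and lambda fixed by the mean constraint; solve for lambda by bisection.
z = np.linspace(0.1, 10.0, 50)
target_mean = 2.0

def mean_z(lam):
    qz = np.exp(-lam * z)
    qz /= qz.sum()                        # conservation of total probability fixes k
    return qz @ z

lo, hi = 1e-6, 10.0                       # the mean decreases as lambda increases
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mean_z(mid) > target_mean else (lo, mid)
print(round(mid, 4), round(mean_z(mid), 4))
```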