arXiv [cs.IT], May
Isotropy, entropy, and energy scaling
Robert Shour
Toronto, Canada
Abstract
Two principles explain emergence. First, in the Receipt's reference frame, Deg(S) = Deg(R), where Supply S is an isotropic radiative energy source, Receipt R receives S's energy, and Deg is a system's degrees of freedom based on its mean path length. S's greater degrees of freedom relative to R enable R's growth and increasing complexity. Second, ρ(R) = Deg(R) × ρ(r), where ρ(R) represents the collective rate of R and ρ(r) represents the rate of an individual in R: as Deg(R) increases due to the first principle, the multiplier effect of networking in R increases. A universe like ours with isotropic energy distribution, in which both principles are operative, is therefore predisposed to exhibit emergence, and, for reasons shown, a ubiquitous role for the natural logarithm.

Contents
The Network Rate Hypothesis
Deriving and modeling η
The NRT: Observations, implications, and speculations
The Degrees of Freedom Theorem
The DFT: Observations, implications, speculations

List of Tables

Table 1: Calculations of η

This article derives two theorems. One,
The Degrees of Freedom Theorem, describes how more degrees of freedom in an energy source compared to the system receiving the energy initiates and drives emergence. The other, The Network Rate Theorem, explains the thermodynamic benefit of networking. Together they provide a theory of emergence. Related ideas possibly explain the natural logarithm and connect quantum scaling and gravity.

Emergence is the name assigned to a process whereby an ensemble of simple components results in a system or process with features that the components do not have. An emergent phenomenon cannot be predicted on the basis of the attributes of its fundamental components. Life, markets, language, and ecosystems are emergent systems.
The Degrees of Freedom Theorem is more fundamental than The Network Rate Theorem for two reasons. First, The Degrees of Freedom Theorem suggests how a system begins. Second, The Degrees of Freedom Theorem can be proved with only mathematics. This article derives The Network Rate Theorem first because it leads to The Degrees of Freedom Theorem.

The Network Rate Theorem is revealed by an analytical approach applied to language as an emergent system. An analytical approach attempts to account for an observed phenomenon. A synthetic approach instead would start with components and deduce an outcome based on them. A synthetic approach highlights a minimal set of relevant conditions necessary to deduce an outcome, but is hopeless as a starting point for an emergent outcome because it is impossible to know in advance what conditions are minimally relevant. ". . . no collective organizational phenomenon . . . has ever been deduced" (Laughlin, 2005, p. 88).

That study of a collective phenomenon like language, which emerges due to relationships among networked members of a society, could reveal laws of physics is not unexpected. The physicist David Bohm wrote: "Einstein's basically new step [in special relativity] was in the adoption of a relational approach to physics" (Bohm, 1962, p. xvi). ". . . the physical facts concerning time and space coordinates consist only of relationships between observed phenomena and instruments . . . . Likewise . . . the facts concerning perception in common experience show that this also is always concerned with relationships . . . " (Bohm, 1962, p. 62). The physicist Lee Smolin writes that "networks do not exist in space—they simply are. It is their network of interconnections that define, in appropriate circumstances, the geometry of space . . . " (Smolin, 1997, p. 285). Robert Laughlin, who won a Nobel Prize in physics, writes "The laws of nature that we care about . . . emerge through collective self-organization . . . " (Laughlin, 2005, p. xi). The philosopher David Hume wrote: "Tis evident, that all the sciences have a relation, greater or less, to human nature" (Hume, 1739, p. xv).
The Network Rate Theorem emerges from conceptual foraging. Just as social insects follow promising paths over a physical landscape, humans follow promising paths over a conceptual landscape. Honeybees scout potential hive locations; their society collectively evaluates alternatives until a consensus emerges (Seeley, 2010). Foraging ants follow other ants' pheromone trails, and reinforce chemical signals. Due to appraisal, choice, review and refinement, a tested idea emerges from random foraging. In connection with computerized heuristics the mathematicians Zbigniew Michalewicz and David Fogel note: "The essential idea of evolutionary problem solving is quite simple. A population of candidate solutions to the task at hand is evolved over successive iterations of random variation and selection. Random variation provides the mechanism for discovering new solutions. Selection determines which solutions to maintain as a basis for further exploration" (Michalewicz, 2004, p. 161).

Four ideas are pre-eminent in this article. One: the mean path length σ is the intrinsic scaling factor of a networked system of size n. Two: for n = σ^η the exponent η gives the system's intrinsic degrees of freedom and its intrinsic entropy. Using a parameter other than σ, as in the usual definition of entropy, gives only an indirect measure of a system's intrinsic degrees of freedom. Three: intrinsic degrees of freedom multiplies capacity. Four: in the Receipt's reference frame, an isotropic energy Supply has more intrinsic degrees of freedom than the system (Receipt) that receives the energy.

Some observations about the mean path length, degrees of freedom, intelligence, language and mathematics follow in this Introduction. Leaving out the observations about intelligence, language and mathematics would shorten the article and avoid the distraction of possibly arguable points which do not affect the derivations. The observations are included because they provide context for reasoning leading to the derivations.

On Mean Path Lengths.
The physicist Rudolf Clausius found that oxygen molecules at the temperature of melting ice travel an average 461 meters per second (Clausius, 1857; Brush I, p. 131). The physicist Buijs-Ballot objected that if so, "volumes of gases in contact would necessarily speedily mix with one another". "How then does it happen that tobacco-smoke, in rooms, remains so long extended in immovable layers?" (Clausius, 1858; Brush I, p. 136). In reply, Clausius introduced the concept of the mean path length (Brush I, p. 140). A gas molecule does not travel unimpeded but collides with other gas molecules. The mean path length is the average distance (some fraction of a meter) a molecule moves before its center of gravity comes into contact with the 'sphere of action' of another molecule.

The psychologist Stanley Milgram (Milgram, 1967) asked, what is the length of an acquaintance chain connecting any two people selected arbitrarily from a large population. In his terminology, "A target point is said to be of the i th remove if it is of the i th generation and no lower generation." Milgram asked people to mail a document towards a target in Boston. He measured the lengths of the acquaintance chains, and found a mean of 5.2 links. His experiment is the origin of the expression 'six degrees of separation' presumed to separate, on average, any two people in the world.

Clausius's mean path length can be made equivalent to Milgram's degrees of separation by equating acquaintance and collision. Instead of finding the mean distance between gas molecules in meters, find the mean number of collisions separating gas molecules. Suppose that both gases and societies collectively maximize their energy efficiency. Then the same fundamental equation that characterizes the efficient use of energy should apply to both gases and societies. In this novel way the concept of mean path length connects physical systems and networks; a general principle related to one may apply to the other.

In a seminal 1998 article, Watts and Strogatz analyzed 'a small world network'. Earlier research had mostly studied entirely regular or entirely random networks. They examined randomly rewired networks of an intermediate character. They defined the 'characteristic path length' as the average of the least number of steps between pairs of vertices in a graph, a definition equivalent to degrees of separation. The clustering coefficient C, which measures the result of their rewiring of a graph, is the network average of all C_i, where C_i is the proportion of one-step-away nodes that are actually connected to each node a_i in a network: C = (1/n) Σ_{i=1}^{n} C_i. For example, suppose node a_k has 5 neighbors one step away. If only 3 of them actually link to a_k in one step, then the proportion connecting in one step is C_k = 3/5.

On Degrees of freedom.
The motion of a point in a plane has two degrees of freedom. The motions of N points on a plane have 2N degrees of freedom. One degree of freedom confers one choice on a given axis or line; a choice of left or right does not give two degrees of freedom. Suppose a node moves at any given time with 3 degrees of freedom. The rate at which collisions with other nodes in the same system occur does not affect its degrees of freedom of motion at any given time.

On the Nature of Intelligence.
IQ tests are designed and administered by psychologists. A Task Force of the American Psychological Association in 1996 characterized intelligence as the "ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought" (Neisser, 1996). In this article, assume intelligence is the rate of problem solving and that solved problems can be counted. That enables mathematically modeling intelligence. Use of the model permits its appraisal.

An IQ test indirectly measures an individual's skill, using their innate problem solving capacity, at solving the problem of learning from a society's store of solved problems and from experience, and at applying (perhaps by joining together different ideas) what that individual has learned. Since members of a society share the same store of knowledge, average IQ measures, partly and indirectly, the IQ of that society. Think of average individual IQ, like a collective economic indicator, as the society's average problem solving rate per capita.

On Collective Intelligences.
By analogy to economics, society collectively allocates its collective resources to solving problems in a conceptual area until the outcome is as beneficial as for collective resources spent on solving problems in some other conceptual area. ". . . a potatoe-field should pay as well as a clover-field, and a clover-field as a turnip-field" (the economist Jevons, 1879, p. liv). Consider problems to be the conceptual equivalents of turnips. "The product of the 'final unit' of labor is the same as that of every unit, separately considered" (the economist John B. Clark, 1899, p. viii): on average, solutions with the same energy cost should benefit society to the same extent. Just as sectors in an economy compete for financial resources, problems in a society compete for its collective problem solving resources. If an alternative use of problem solving energy gives society a better yield for its solutions, society diverts energy to that alternative use until the solution yields are about the same.

At all scales, a collective intelligence exceeds the intelligence of its component individuals. Bees, wasps, ants and termites locate, design and build nests or hives with a collective skill that exceeds the cognitive capacities of individual insects. "Individually, no ant knows what the colony is supposed to be doing, but together they act like they have a mind" (Strogatz, 2003, p. 250). The Greek mathematician Pappus (c. 290 - c. 350) attributed a collective mathematical insight to bees: "Bees then, know just this fact which is of service to themselves, that the hexagon is greater than the square and the triangle and will hold more honey for the same expenditure of material used in constructing the different figures" (Heath, Vol. 2, Ch. XIX, p. 390). Collective intelligence even occurs in bacteria (Ben-Jacob, 2010). In the field of Swarm Intelligence (SI) (Kennedy; Bonabeau), to solve difficult problems computer scientists and engineers use networked algorithms and robots to mimic the emergent collective intelligence of social insects.

Cultures, economies and mathematics are collective intelligences. "Although generated by the collective actions of lots of brains, cultures have storage and processing capabilities not possessed by a single human" (Montague, 2006, p. 199), the wisdom of crowds (Surowiecki). A society's economy has "dispersed bits of incomplete and frequently contradictory knowledge which all separate individuals possess" (Hayek, p. 77) leading to a market solution that "might have been arrived at by one single mind possessing all the information" (Hayek, p. 86). In mathematics "the inner logic of its development reminds one much more of the work of a single intellect, developing its thought systematically and consistently using the variety of human individualities only as a means" (I. R. Shafarevitch, in Davis, 1995, p. 56). This also applies to physics, literature, biology and so on.

Adoption by a society of a proposed solution to a problem depends on society's estimate of its likelihood of success. A proposed site for a new bee hive is appraised by the old hive. The success of a new electronic device in a human society is appraised, through the operation of a market, by all potential buyers. In SI, programmers imitate this appraisal and approval effect, creating 'ant pheromone trail' software for robots.

An ant has about one million neurons; a human brain has about 100 billion. Suppose that the same physical laws govern the networking and problem solving output capacity of neurons in ants as in a single human brain. Then the collective behavior of 10 billion neurons in 10 thousand ants and of 100 billion neurons in a human brain should have similarities.

Compare a society of 10,000 ants to a society of 10,000 human beings. If the same physical laws govern, then the human society is as much more intelligent than the average component human as is the ant society than the average component ant. A society of 100 million humans—10,000 networked societies of 10,000 individuals each—is as much more intelligent than an average society of 10,000 humans as is a society of 10,000 humans than its average component human. Consider that humanity through language, writing and culture can accumulate a store of solved problems for hundreds of human generations. The collective cumulative intelligence of all human societies is much greater than that of an individual human.

Bert Holldobler and Edward O. Wilson write (1990, p. 252):

For two reasons ants can be expected to practice economy in the evolution of their communication systems, that is, to use a small number of relatively simple signals derived from a limited number of ancestral structures and movements. First, the small brain and short life span of ant workers limit the amount of information these insects can process and store. Second, the tendency toward signal evolution through ritualization restricts the range of potential evolutionary pathways.

On a different scale, the same observation applies to a human society. Consider the history of mathematics and language.

On Mathematics and language.
For this article, it is not necessary to agree on what improves mathematics and language, or how. It is only necessary to assume that language and ideas improve. Grounds for that assumption follow.

From the time of the Babylonians 5,000 years ago until now mathematics has improved in the quality and efficiency of concepts, methods and notation (Boyer, 1991; Cajori, 1928; Menninger, 1992). Like bees evaluating reports of nest sites from bee scouts, incremental improvements in problem solving by individuals are evaluated by groups of people for efficacy and efficiency. Similarly, language "is a continuous process of development" (Aitchison, 1989, quoting Wilhelm von Humboldt, 1836). Historical linguistics (McMahon, 1994; Campbell, 1998) records a history of improvement in the formation of sounds and words; "a law of economy" (Herder, 1772, p. 164). The linguist Otto Jespersen (p. 324) observed that language "demands a maximum of efficiency and a minimum of effort . . . [this] formula is simply one of modern energetics". The linguist April McMahon writes ". . . sound systems tend toward economy" (p. 30). In general, ". . . saving mental effort may be the most important kind of economy" (Polya, 1962, II, p. 91). I conjecture that the average rate of progress in the efficiency of language is, for the economic reasons discussed above, the same as the rate of progress in mathematics. Both encode ideas—data.

Data compression software provides an analogy to mathematics and language. In the 1990s, as the amount of electronic data transmitted, stored and accessed increased, and the processing power of computers increased, efficient data compression became economically important. Data compression software steadily improves (Salomon, 2007). As society's knowledge—data—increases, language increases compression of the data it encodes through naming (categorization, or unification), contraction (can't), clipping (bike, bus, condo) (Campbell, p. 278), acronyms (IBM), allusion (his, computer, next door), pattern (Salomon, p. 7), and metaphor (sunny disposition). Grammar—word order and word endings—". . . provides relief to memory" (Diderot, The Encyclopedie). "Declensions and conjugations are merely shortcuts . . . " (Herder, 1772, p. 160).

For humans ". . . language first of all is classification and arrangement of the stream of sensory experience . . . . In other words, language does in a cruder but also in a broader and more versatile way the same thing that science does" (the linguist Benjamin Whorf, 1956, p. 55). Society tests the utility, efficacy and consistency of words that encode—compress—perceptions and ideas. "All knowledge is a structure of abstractions, the ultimate test of the validity of which is, however, in the process of coming into contact with the world that takes place in immediate perception" (Bohm, 1965, p. 262).

To encode and compress data into words requires a society to collectively solve problems (McMahon, p. 138) that include: (1) how to devise and choose sounds to be used for encoding; (2) how to assemble sounds into words; (3) what percepts and concepts should be encoded. A language embodies a set of encoding problems collectively solved by the generations of a society that use it. It is a product of collective intelligence. Society performs the same function for language that software engineers (and their users) perform for data compression software. Hierarchies efficiently organize categories, structures, and methods for the assembly of words into larger structures such as sentences and theories. Analogy eases learning, remembering and using a language; similar endings, sounds, sentence structures, rhythms and musicality provide a 'relief to memory'.
The problem solving elements of language—identifying concepts and encoding them—also apply to mathematics, physics, and ideas generally. People who adopt encodings with increased compression juggle more information per time unit. Speedier problem solving is of immense value to organisms with finite life spans. Language also improves by adding new words: encodings of new or modified concepts. Collections of concepts, such as theories, also improve at all scales.

Feedback from each use of a word is a scientific experiment. Society has tested more encoded concepts more comprehensively, in more ways, more often, for periods of time far longer, than they could be tested by any individual. Individual intelligence relies on an enormous store of highly tested and refined conceptual problem solving tools created by thousands of generations of human societies. An individual using someone else's verified solution saves energy. Through compression a language increases in depth, and through the encoding of new concepts it increases in breadth. Diderot remarked over two centuries ago that ". . . by merely comparing the vocabulary of a nation at different times, one would get a sense of its progress" (Diderot).

Mathematical concepts are more efficient, compressed, defined, and have been more widely tested, than concepts encoded into words. Societies test their own language over many generations, but all societies in all cultures have tested mathematics in millions of contexts repeatedly over hundreds of generations. Mathematics can be more precisely tested than words both through logical analysis and because the accuracy of mathematics compared to physical phenomena can be measured. "Mathematics is a part of physics. Physics is an experimental science, a part of natural science. Mathematics is the part of physics where experiments are cheap" (Arnold, 1997).

Just as 10,000 ants building a nest exhibit an intelligence far greater than that of an individual ant, our store of mathematical knowledge created over the past several thousand years by tens of thousands of mathematicians, tested and appraised by tens and hundreds of millions of people in daily, scientific and commercial contexts, exhibits an intelligence beyond human comprehension. Mathematics as a disembodied network of concepts 'knows' things that individuals do not. The 'unreasonable' effectiveness of mathematics in the natural sciences (Wigner, 1960; Hamming, 1980) is like magic. An "education in . . . mathematics is a little like an induction into a mystical order" (Smolin, p. 178). The difference between the intelligence of mathematics and of an individual human is far greater than the difference between the intelligence of 10,000 societies and the intelligence of an individual human.

Since mathematical ideas result from collective efforts of millions of people over hundreds of generations, common mathematical ideas such as counting numbers must reflect fundamental principles underlying the natural world. Mathematical reasoning relies on a higher, collective, intelligence. A mathematical theorem that predicts phenomena subsequently observed or that connects to other mathematical ideas is a form of experimental verification, the outcome of humanity's collective scientific evaluation of mathematical concepts. Mathematical deduction can render express what is implicit in collective mathematical knowledge. A mathematical concept that appears to apply to a phenomenon can be tested by applying it in different contexts, just as society collectively does.

On a question about intelligence.
Average IQs increase. Is this due to the improvement of language and ideas? This question impels foraging over the conceptual landscapes below.
The Network Rate Hypothesis
Consider words as part of society's accumulated array of problem solving tools. If a lexicon improves, so should problem solving. Researchers have observed that average IQs have increased—improved—in the U.S. in the past 60 years or so by about 3.315% per decade; no one knows why (Flynn, 2007, p. 113, Table 1 at p. 180). Are increasing average IQs caused by words improving? Both improving at the same rate would be positive evidence. A rate characteristic of language improvement is needed.

Measuring the depth of words is difficult. Measuring the breadth (number) of words is a massive undertaking, but has already been accomplished by academic dictionaries. If word counts of a lexicon at two different times use the same criteria, and if each count is large, then the calculated rate of increase in the lexicon should be a good estimate of the rate of collective problem solving, because lexicons require a large number of problems to be collectively solved.

The English lexicon increased from 200,000 words in 1657 (Lancashire, EMEDD) to 616,500 words in 1989 (Simpson, OED), 3.39% per decade. The University of Toronto's partly completed Dictionary of Old English (DOE) contains Old English words from the year 600 to the year 1150. Eight of 22 Old English letters, up to the letter g, had been completed at December 2008. Extrapolating from the 12,271 words for the 8 completed letters—the dictionary counts æ as a separate letter—and assuming the same average number of words per letter, gives 34,020 words in Old English for the whole Old English alphabet of 22 letters. An increase from 34,020 words in 1150 to 616,500 words in 1989 in the OED is an increase of 3.45% per decade. Both English lexical growth rates are close to the rate at which average IQs increase.

The error arising from using an estimate (ρ(Lex))_Est of the actual rate of English lexical increase (ρ(Lex))_Act is calculable by comparing the difference between the actual size of the English lexicon in 1989, [N(t)]_Act, and an estimate [N(t)]_Est based on an estimated English lexical growth rate (ρ(Lex))_Est applied to an actual initial English lexicon. Set ΔN = [N(t)]_Est − [N(t)]_Act. Then

(ρ(Lex))_Est − (ρ(Lex))_Act = ln( 1 + ΔN / [N(t)]_Act ) ÷ Δt.    (1)

The error in the estimate of (ρ(Lex))_Act becomes smaller as the time period Δt increases.

Do other studies measure the rate of increase in ideas accumulated by society?
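The per-decade rates above follow from continuous growth, ρ = ln(N₂/N₁)/Δt, the same logarithmic form as equation (1). A minimal sketch in Python (the function names are mine, not the article's):

```python
import math

def rate_per_decade(n1, n2, years):
    """Continuous growth rate ln(N2/N1) divided by the elapsed decades."""
    return math.log(n2 / n1) / (years / 10)

# English lexicon, 1657 (EMEDD) to 1989 (OED).
r1 = rate_per_decade(200_000, 616_500, 1989 - 1657)
# Extrapolated Old English lexicon, 1150 (DOE), to the 1989 OED.
r2 = rate_per_decade(34_020, 616_500, 1989 - 1150)

print(f"{r1:.2%} per decade")  # 3.39%
print(f"{r2:.2%} per decade")  # 3.45%

# Equation (1): the error of an estimated rate, for a count gap delta_n
# against the actual count n_act, shrinks as delta_t grows.
def rate_error(delta_n, n_act, delta_t):
    return math.log(1 + delta_n / n_act) / delta_t
```

Since the numerator of equation (1) is fixed by the count gap, the error falls as Δt grows, which is why the long 1657-1989 and 1150-1989 baselines make the rate estimates robust.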
The efficiency of lighting in terms of its labor cost increased from 1750 B.C.E. to 1992 at a rate comparable to the rates above; the difference from the lexical growth rate is perhaps due to the choice of data.

Morris Swadesh devised a method to estimate, based on the rate of their divergence, when two daughter languages had a common mother tongue. His method is called glottochronology. First he compiled a list of 100 or 200 words basic to languages (the Basic List). He then calculated the rate of change between cognates (such as moi in French and me in English) by comparing their use in historical records. He found (Swadesh, 1971) an average rate of divergence of two daughter lexicons of about 14% per thousand years. In 1966, he used this divergence rate to estimate that Indo-European (English's ancestral language) existed at least 7,000 years earlier (p. 84). Gray and Atkinson (2003) dated Indo-European to 8700 years earlier, using newer methods. Updating Swadesh's calculation using Gray and Atkinson's findings, two daughter languages diverged from each other at ≈ 14% × (7000/8700) = 11.2% per thousand years; or 5.6% per thousand years each from a notionally static mother tongue. Why half the updated Swadesh divergence rate, 5.6% per thousand years, is so much slower than the English lexical growth rate is a new problem. Swadesh's method of estimating the divergence rate has been severely critiqued on criteria for identifying cognates and other grounds (Blust, 2000, p. 204; Campbell, 1998).

If the updated Swadesh divergence rate estimates the common origin of two daughter languages, does a rate exist which estimates when language itself began? To approximate the size of such a rate, suppose the 616,500 words of the 1989 Oxford English Dictionary grew from 100 words in 200,000 years. That would be 4.3% per thousand years, not far off half the updated Swadesh divergence rate. Is half the updated Swadesh divergence rate a fossil rate ρ(r) embedded in the much faster English lexical growth rate? This leads to:

The Network Rate Hypothesis: There is a function η (small Greek eta) such that the collective rate of about 3.3 to 3.45% per decade = ρ(R) = η × ρ(r), where ρ(r) is some kind of fossil rate.

The significance of being able to measure the rate of improvement in collective problem solving, via increasing average IQs, lexical growth and improvement in lighting, is that measurability converts a qualitative question—do concepts improve—into a testable hypothesis.

Deriving and modeling η

To investigate how language facility increases for an individual, ask how a child acquires words. A child learns words from two parents, who each learn from two parents, and so on. Suppose there are η antecedent generations, and (as an idealization and simplification) each generation independently increases society's accumulation of words at the same rate. If parents were the only source for words, the number of word source generations would be log₂(2^η) = η. But other people can be word sources. The scaling factor (the base of the log function) is not 2, but some unknown average value σ. σ must be determined in order to convert log_σ(n) into a number. What number is σ?

Is σ an intrinsic average number of acquaintances? Primates usually live in bands of 50 members; grooming is part of their social life (Dunbar, 1997, p. 120-122). Dunbar suggests that a person virtually 'grooms' three times as many people using words as is possible grooming manually. Could σ be 3 or 50? Consider idealized speakers who seek to transmit information with least effort, and idealized listeners who seek to decode information with least effort, as distinct groups (Zipf, 1949, p. 21). Does Dunbar's optimum audience of three balance the competing goals of speakers and hearers? Three or fifty, these numbers cannot work. Why?
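That fixed values like three or fifty cannot work can be previewed numerically: for a fixed network size n, log_σ(n) shrinks as the base σ grows. A minimal sketch (the value of n is arbitrary, chosen only for illustration):

```python
import math

n = 1_000_000  # arbitrary network size, for illustration only

# log_sigma(n) = ln(n) / ln(sigma), for candidate values of sigma.
etas = {}
for sigma in (2, 3, 50):
    etas[sigma] = math.log(n) / math.log(sigma)
    print(f"sigma = {sigma:>2}: log_sigma(n) = {etas[sigma]:.2f}")
```

If σ counted information sources, a larger σ would mean a smaller multiplier η, the opposite of what a multiplier of individual rates requires.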
Appeal to mathematical reasoning. If more persons transmit information (if σ is greater), information received should be greater. But, on the contrary, as σ increases, log_σ(n) decreases. It is impossible, if η multiplies ρ(r), that the Network Rate decreases for the individual with more information sources. The Network Rate Hypothesis, or an assumption, explicit or implicit, on which it is based, is wrong, or the function η is not logarithmic. Suppose The Network Rate Hypothesis is valid and that η is a logarithmic function. Then reconsider the assumption that σ is a fixed number. What parameter would σ have to be for η to be logarithmic? σ must cause log_σ(n) = η to increase when σ decreases.

If information takes less time to reach an individual, then the rate of increase in the individual's store of information should increase. More information can be received during a lifetime. A faster average rate of information transfer implies a shorter average minimum transmission (or relationship) distance per time unit between transmitting and receiving individuals. The mean path length corresponds exactly to such a distance. Suppose then that σ is a network's mean path length. For simplicity's sake, suppose that the actual number of steps between pairs of nodes equals the average number of steps, σ.

Finding η is still not complete. In an actual network not all pairs of nodes σ steps apart are actually connected. If each node receives on average a proportion C < 1 of the effect of log_σ(n), the network needs to increase the number of nodes to σ^(η/C) to have the same value of η as a network with C = 1. Conclude that η = C × log_σ(n) and that in general ρ(R) = C log_σ(n) × ρ(r), where ρ(R) and ρ(r) are rates, and C is the network's clustering coefficient.

On the assumption that an average exists.
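The conclusion η = C × log_σ(n) can be checked against networks whose σ and C have been measured. A minimal sketch, using the actors and C. elegans values quoted in Table 1 (Watts and Strogatz, 1998):

```python
import math

def eta(n, sigma, c):
    """Degrees of freedom eta = C * log_sigma(n)."""
    return c * math.log(n) / math.log(sigma)

# Values as quoted in Table 1 (Watts & Strogatz, 1998).
print(f"actors:     {eta(225_226, 3.65, 0.79):.2f}")  # 7.52
print(f"C. elegans: {eta(282, 2.65, 0.28):.2f}")      # 1.62
```

The printed values match the η column of Table 1.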
To apply the formula for η to actual networks requires that average rates proportional to the mean path length exist. Average IQs exist. Economists calculate average gross domestic product per capita. In principle criteria for counting different kinds of problems solved by people can be designed and the average number of problems solved per time period can be calculated. In principle, therefore, the average rate of problem solving per capita is calculable. If the average rate of problem solving obeys laws of economic efficiency, the average rate of lexical problem solving and the average rate of solving lighting problems in terms of labor cost can be used as proxies for the average rate of collective problem solving. In this article, only the average features of problems are of interest.

Obtaining a count of problems is not easy, especially counts that are reflective of society's collective problem solving (such as words in a lexicon). All that is necessary though is to assume counting is possible in principle. If problems can be counted in principle, the average collective problem solving rate can be calculated in principle.

Testing η. Before spending time and energy scouting for an explanation for the proposed η, determine if it works. (The balance of this article sorts out the implications of the answer to this question.) If it does, then why it does will be the next problem.

Network        Nodes     Number of nodes   σ      C      η      Notes
Actors         people    225,226           3.65   0.79   7.52   1
C. elegans     neurons   282               2.65   0.28   1.62   1
Human Brain    neurons   10^11             …      …      …      2

Table 1: Calculations of η

Notes to Table 1:
1. C and σ are based on values in the article by Watts and Strogatz (1998).
2. The number of neurons: Nicholls, 2001, p. 480. σ and C: Achard, 2006.
3. The number of words: OED (Simpson).
4. σ and C: Ferrer, 2001, based on about 3/… ; σ = …, C = .53, based on an English thesaurus of about 30,000 words, a smaller and less representative sample.
5. The number of words: EMEDD (Lancashire).
6. σ and C: based on the actors study of Watts and Strogatz (1998).
7. The number of people is an estimate of the English speaking societies in 1989, by adding censuses: 1990 USA, 248.7 million people (Meyer, 2000); 1991 Canada, 27,296,859; 1991 England, 50,748,000; 1991 Australia, 16,850,540 people. These total 343,595,000 people.
8. The number of people in England: Table 7.8, following p. 207, for the year 1656, Wrigley, 1989.

For data, researchers have measured the mean path length σ and the clustering coefficient C for some networks. Data on a line in Table 1 is used to calculate η(n) = C log_σ(n) for the same line. The values of σ and C for a population of actors (Watts & Strogatz, 1998) are applied to human societies generally; this is justified below using the Natural Logarithm Theorems and
The Degrees of Freedom Theorem .If
The Network Rate Hypothesis is valid, the average problem solving capacity (ρ(R))av of English speaking society, not including the effect of using language, from 1657 to 1989 is (η(pop))av = 10.72 times the average individual rate, ρ(r). Treat English society itself, without the use of language, as a single collective brain with innate problem solving capacity (ρ(r))Coll = 10.72 × ρ(r). Multiply (ρ(r))Coll by the increase in capacity (η(Lex))av conferred on (ρ(r))Coll by the English lexicon. For 1657 to 1989, (η(Lex))av = 5.68. Now find the average individual innate problem solving capacity ρ(r) using society's worded problem solving capacity: ρ(R) ≈ 3.41% per decade = (η(Lex))av × (ρ(r))Coll = (η(Lex))av × (η(pop))av × ρ(r). Then ρ(r) = 0.6% per thousand years, exactly half the updated Swadesh divergence rate.

"Such an agreement between results which are obtained from entirely different principles cannot be accidental; it rather serves as a powerful confirmation of the two principles and the first subsidiary hypothesis annexed to them" (Clausius, 1850).

In this case, the subsidiary hypothesis is
The Network Rate Hypothesis. (From about June 2007 to June 2009, I averaged a starting individual rate of 0 and ρ(r), instead of averaging the η's of the lexicon and population and holding ρ(r) constant over the relevant time period. This gave ρ(r) = η because of the precision of the 4 : 1 ratio. Some of my older preprints on arXiv have this error.)

Using values from about 1989 in Table 1,

ρ(R) = η(pop) × η(Lex) × ρ(r) ≈ 71 × ρ(r). (2)

Equation (2) implies that what a 1989 individual experienced as a proprietary rate of problem solving, 71 × ρ(r), mostly derives from η(pop) × η(Lex).

What manner of concept is η? Why does it work? To simplify, assume that all binodal distances are σ steps, that all nodes have equal capacities to receive and transmit information, and that all transmissions have an equal amount of information and use the same amount of energy. Assume a network with σ^η = n information sources. With these simplifying assumptions, the focus is on network level characteristics. Like the temperature outdoors, component level (molecular) characteristics and variations are irrelevant. One number suffices. If ρ(r) = kσ, log_σ(σ^η) = log_{kσ}((kσ)^η) = log_{kσ}(k^η σ^η).

Instead of two parents, four grandparents and so on supplying words, each first generation receiver has σ second generation sources, σ² third generation sources, and so on up to σ^η = n ηth generation sources. Each node receives the η benefit of networking, which implies that all possible connections form. Then each node has σ + σ² + . . . + σ^η sources of information. But since the ηth generation alone has n = σ^η nodes as information sources, as well as information sources in generations 1 through η − 1, each node would have more information sources than there are nodes. A related issue is: suppose the network receives n units of energy per time unit for each generation of information exchange involving a particular node. There is not enough energy (and therefore not enough time in a round of information transmission) for all possible combinatorial states. Can this counting problem be resolved? (This problem relates to the ergodic hypothesis, discussed below.) If not, the hypothesis fails.

Next is the commensurability problem posed by dimensional analysis (Bridgman, 1922): how can the mean path length, a measure of distance, scale n, a population size? A scaling subgroup for a population should be a sub-population, not a distance. Third is the n − 1 problem: a node can have at most n − 1 information sources, unless the node transmits, impossibly, new information to itself. Fourth, how would such a network be wired? Fifth (a vexing problem of categorization): is a mean path length a distance or a scaling factor?

If the first four problems are irresolvable (the fifth problem will be dealt with separately later), then η(n) = C log_σ(n) must be false. Yet ρ(r) matches half the updated Swadesh divergence rate too closely to be coincidence. Since η applies to transmission of information, ideas from information theory may help.

Claude Shannon derived a formula (1948) for the information content η of a string of 0s and 1s, where p_i is the probability of the ith symbol:

η = Σ_i p_i log(1/p_i). (3)

Shannon used a graph to show that η is maximum for a given number of bits when the probability of each bit occurring is the same (p_i = p_j, ∀i, j), which is also explained (Khinchin, 1957, p. 41) by Jensen's inequality.
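Equation (3) and the maximality of the uniform distribution are easy to check numerically; a minimal sketch (the function name and example distributions are illustrative, not from the text):

```python
import math

def shannon_entropy(probs, base=2):
    """Equation (3): eta = sum over i of p_i * log(1 / p_i)."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

uniform = [0.25] * 4            # equal symbol probabilities
skewed = [0.7, 0.1, 0.1, 0.1]
print(shannon_entropy(uniform))                             # -> 2.0 bits
print(shannon_entropy(skewed) < shannon_entropy(uniform))   # -> True
```

Any departure from equal probabilities lowers η, which is Shannon's graphical observation and the content of Jensen's inequality here.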
Shannon's observation is called the maximum entropy principle (Jaynes). Equation (3) has the same form as that used for entropy in thermodynamics,

K × Σ_i p_i log_x(1/p_i). (4)

Assume equality of all of a network's nodes, p_i = 1/n, ∀i, in Equation (4). Substitute σ for x. Then Σ (1/n) log_σ(n) = log_σ(n). η in The Network Rate Hypothesis has the same form as Equation (4). K in Equation (4) corresponds to the clustering coefficient C. ". . . discoveries of connections between heterogeneous mathematical objects can be compared with the discovery of the connection between electricity and magnetism . . ." (V. I. Arnold); connections between different mathematical models imply they share a common principle. η's connection to entropy connects η to thermodynamics.

Commensurability: suppose that the number of nodes n and the mean path length σ are both proportional to a common measure of energy. If it takes σ energy units to travel σ steps, then an average of σ people are within σ steps of each of the network's nodes.

Counting problem: η mathematically requires multiple scalings yet a constant argument n. Energy must scale in a uniformly nested way. A cluster of nodes scales by σ, not like a pyramid, adding nodes at each next proceeding level, but internally, by uniformly subdividing into σ subclusters. For example, 27 nodes can be internally scaled by 3 as follows:

[ {aaa}{aaa}{aaa} ] [ {aaa}{aaa}{aaa} ] [ {aaa}{aaa}{aaa} ]. (5)

A node when networked as in (5) has 3 network capacities, depending on in which size cluster, 3 nodes, 9 nodes, or 27 nodes, its capacity is exercised: η(27) = log_3(27) = 3. Per generation, the number of clusters increases by a factor of σ and the number of nodes per cluster decreases by 1/σ. The number of nodes per generation is constant. The first generation has σ clusters each with σ^(η−1) nodes, the second generation has σ² clusters each with σ^(η−2) nodes, and so on, until the ηth generation of σ^η clusters with one node each.
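A minimal sketch of this nested subdivision, using the σ = 3, n = 27 example of (5) (variable names are mine, for illustration):

```python
import math

sigma, n = 3, 27
eta = round(math.log(n, sigma))          # 3 generations of nesting
# Each node sits in one cluster per generation: sizes 3, 9, 27, as in (5).
cluster_sizes = [sigma ** k for k in range(1, eta + 1)]
print(cluster_sizes)          # -> [3, 9, 27]
# Every generation partitions all 27 nodes into equal clusters, so the
# number of nodes per generation is constant:
nodes_per_generation = [(n // size) * size for size in cluster_sizes]
print(nodes_per_generation)   # -> [27, 27, 27]
```

Each node appears once at every generation, which is why it has η = 3 network capacities rather than occupying a single level of a pyramid.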
(k + 1)st generation clusters nest in kth generation clusters.

Suppose a network of n equal nodes receives n energy units per time unit. Per time unit, each node only has enough energy to binodally connect to one other node, not to all σ possible nodes. What η multiplies is capacity; η measures R's degrees of freedom relative to R's mean path length, σ: η(n) = Deg_σ(R). Since each node is in each uniformly scaled nested generation, each (average) individual has the same number of degrees of freedom as the network itself. This resolves the wiring problem. It also resolves the n − 1 problem: the individual rate ρ(r) is σ; ρ(r) ∝ σ. Define the intrinsic entropy or degrees of freedom of a system X with n nodes as Deg_σ(X) = log_σ(n). In network theory, σ equals the average degrees of separation. For a society with σ^η = n people in it, every person has the same 'relationship stride' (acquaintanceship distance), σ. Treat the start position as the first generation or stride. Then η strides, or degrees of freedom, span the society. In thermodynamics, for an ideal gas G, Deg_σ(G) = log_σ(n), where σ is the intrinsic mean path length for colliding molecules, measured in collision steps. In information theory, σ = 2.

The Network Rate Theorem (NRT): For an isotropic system R with n = σ^η nodes and mean path length σ,

ρ(R) = η(R) × σ = Deg_σ(R) × σ, (6)

and in general for a clustering coefficient 0 < C ≤ 1,

ρ(R) = C log_σ(n) × ρ(r). (7)

When C = 1 and ρ(r) = σ, Equation (7) is Equation (6) for R.

The NRT: Observations, implications, and speculations
On the special role of the mean path length.
Is there a way, without using Shannon's graph or Jensen's inequality, to show how the mean path length optimizes η? Consider a system of water containers. Level 1 has σ water containers, each supported underneath at level 2 by σ water containers. Water is supplied at the same rate to each of the first level water containers. When a first level water container is full, water spills into its supporting level 2 water containers. If one level 2 container is smaller than the rest, which are equal in size, it spills water while the other containers are still filling. If one level 2 container is bigger than the rest, which are equal in size, the rest spill water while the bigger container is still filling. Analogize water to energy. A networked system utilizing energy supplied at a fixed rate will increase its rate of output if it uses more (and wastes less) of the energy supplied per time unit. Nested, uniformly scaled distribution of energy from a Supply S induces a nested, uniformly scaled structure in a networked system R receiving the energy, as otherwise energy supplied per time unit by S is not fully utilized by R.

Suppose a central energy source radiates energy. Efficient flow must be uniform in every direction. A wave front circular from the source maximizes entropy.

What mechanism allows a network to find its average scaling factor? Unite the concept of an ideal network with the concept of an ideal heat engine. The idealized network discussed above consists of all possible pairs of nodes, all σ steps apart and equal in capacity. In Sadi Carnot's (1824) ideal heat engine the cylinder contains a working substance such as air between a fixed plate and a movable frictionless piston. A furnace transmits heat to the otherwise perfectly insulated cylinder, causing the working substance to expand. The furnace ceases contact with the cylinder and is put in touch with a heat sink which removes heat from the working substance, causing it to contract. Then the heat sink is removed, and the cycle repeats. The piston cycles up and down, moving an attached articulating arm. Carnot proved that no heat engine can be more efficient than an ideal heat engine. No energy is lost other than to moving the articulating arm.

Consider the piston's initial position and the unique turning point in the heat cycle to be two nodes: a heat engine's heat cycle is intrinsically binodal. Treat the furnace as one node and the heat sink as the other. Remove the articulating arm. Place a furnace and a heat sink at each node, so that energy can equally well move from one node to the other. A binodal symmetric ideal heat engine can perfectly transfer energy from one node to the other.

Suppose that the amount of energy required to transmit information is proportional to the amount of information transmitted. By analogy, construct a binodal symmetric ideal information engine which transmits information from one node to the other. All nodes have identical transmission and reception capacities with no energy or information loss. Form an ideal network consisting of symmetric ideal information engines. No information exchange network can be more efficient. Each generation of isotropic information exchange is equally and perfectly efficient. If the physical environment changes, a network whose nodes all have equal capacities in each generation of information exchange will be the quickest to cycle through the generations of information exchange required to reach an optimal fitness landscape. This (I conjecture) models how networks binodally communicate change to their constituent components. Optimal local binodal exchange leads to global optimality; social insects are an example of "a decentralized multiagent system whose control is achieved through locally sensed information" (Kube & Bonabeau, 2000, p. 91), as are languages (speakers and hearers), markets (buyers and sellers), and genes (two strands of DNA).
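The water-container argument above can be sketched numerically. In this toy model (container sizes and feed rate are illustrative assumptions), water is fed to each level-2 container at the same rate; only equal container sizes leave no capacity idle at the moment the first container overflows:

```python
def filled_fraction(sizes):
    """Water is fed to each level-2 container at the same rate (1 volume
    unit per time unit).  Return the fraction of total capacity that is
    full at the moment the first (smallest) container overflows."""
    t_overflow = min(sizes)               # smallest container fills first
    delivered = t_overflow * len(sizes)   # water usefully held so far
    return delivered / sum(sizes)

print(filled_fraction([2.0, 2.0, 2.0]))  # -> 1.0   equal sizes: no waste
print(filled_fraction([1.0, 2.0, 3.0]))  # -> 0.5   unequal: half the capacity idle
```

Equal capacities at every level are the no-waste configuration, which is the water analog of uniform nested scaling.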
Comparing entropy and intrinsic entropy.
In 1848, William Thomson (Lord Kelvin) used Sadi Carnot's analysis of an ideal heat engine and the contraction of gases when cooled to find absolute zero (Kelvin, 1848). The volume of an ideal gas contracts in proportion to absolute temperature. Clausius sought and found an invariant property of the ideal heat engine cycle. He called it entropy (Clausius, 1865, p. 400; in English translation, 1867, p. 365). In Clausius's derivation of entropy (1879, p. 79), he compares the volumes of the working substance at different stages of the heat cycle and finds (p. 83) that

đQ₁/T₁ − đQ₂/T₂ = 0, (8)

where đ is an inexact differential, đQ₁ is the heat added to the heat engine from the furnace at the absolute temperature T₁, and đQ₂ is the heat removed from the heat engine by the heat sink at the absolute temperature T₂. đQ/T is the change in entropy.

Boltzmann (1872) remarked that a system can achieve equilibrium on a macroscopic scale. For example, air has a measurable temperature. At a microscopic scale, on the other hand, there is constant molecular motion. He inferred that the average exchange of energy of gas molecular collisions must also be steady. "The determination of average values is the task of probability theory" (p. 90, English translation). Boltzmann's H theorem (−H = entropy) used probability and a log function. Building on Boltzmann's work, the physicist Max Planck derived the formula for entropy η = K Σ p_i ln(1/p_i) (Planck, 1914).

Clausius's definition of entropy is baffling. đQ/T = dS is the change in entropy, but what does it represent? The ratio definition, based on an ideal heat cycle, relies on experiment. A degree Kelvin equals a degree centigrade based on the freezing and boiling points of water. 0 degrees Kelvin was determined by experiment.
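As a numerical illustration of Equation (8)'s cycle balance (the reservoir temperatures and heat quantity below are illustrative values, not from the text):

```python
def carnot_entropy_balance(Q1, T1, T2):
    """For a reversible (Carnot) engine between reservoirs T1 > T2, the
    rejected heat satisfies Q2/T2 = Q1/T1, so the net entropy change
    over a full cycle, Q1/T1 - Q2/T2, is zero (Equation (8))."""
    Q2 = Q1 * T2 / T1            # heat rejected by a reversible engine
    return Q1 / T1 - Q2 / T2

print(carnot_entropy_balance(Q1=1000.0, T1=500.0, T2=300.0))  # -> 0.0
```

The zero on the right side is the invariance Clausius named entropy: whatever đQ/T the furnace adds, the sink removes.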
The important practical advantage of Clausius's ratio definition is its use of temperature, which can be easily measured.

In The Network Rate Theorem, ρ(R) = η × σ ⇔ η = ρ(R)/σ, where σ is the system's mean path length. Changing a system's entropy changes its degrees of freedom. For example, dη = log_σ(σ^m) − log_σ(σ^n) = m − n. Assume a system's output rate ρ(R) equals its energy input rate ρ(E). Then, η = ρ(E)/σ. In Clausius's definition of entropy, đQ/T = dE/ε, E being a total amount of energy and energy ε a scaling factor. The numerator on the left side of đQ/T is a change in heat, which is equivalent to a change in energy, and the denominator is the absolute temperature, which is proportional to an amount of energy ε. Clausius's definition of entropy is equivalent to η = ρ(E)/σ, except that it uses a scaling factor T proportional to σ in the denominator. Clausius's ratio definition so indirectly measures a system's intrinsic degrees of freedom that it altogether obscures its connection to degrees of freedom.

Replace η in The Network Rate Theorem by log_σ(n), and

ρ(E) = ρ(R) = log_σ(n) × σ. (9)

A mathematical union of a ratio definition of intrinsic entropy with the statistical definition of intrinsic entropy gives The Network Rate Theorem.

If two gases at different temperatures mix, they will reach an equilibrium state with a common temperature. A calculus proof of this uses differential equations. More simply, when two gases mix, repeated binodal collisions lead to a new mean path length σ for the combined system, and hence a common average temperature (∝ σ).

Mechanics studies how two particles interact. It is not possible to consider every collision, for example, of 6.02 × 10²³ oxygen gas molecules (about 32 grams worth). Boltzmann had the idea of dividing a space up into cells, and calculated the expected statistical distribution of energies among the different cells.
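The intrinsic-entropy reading of these quantities is directly computable; a small sketch (taking σ = e for the gas is an illustrative assumption, and the mole count is the familiar Avogadro value):

```python
import math

def deg(n, sigma):
    """Intrinsic entropy / degrees of freedom: Deg_sigma(X) = log_sigma(n)."""
    return math.log(n) / math.log(sigma)

sigma = math.e
# d(eta) = log_sigma(sigma**m) - log_sigma(sigma**n) = m - n:
d_eta = deg(sigma ** 8, sigma) - deg(sigma ** 5, sigma)
print(round(d_eta, 10))                   # -> 3.0
# One number suffices even for a mole of gas (~6.02e23 molecules):
print(round(deg(6.02e23, sigma), 1))      # -> 54.8 degrees of freedom
```

A system of 10²³ components is summarized by a single double-digit number, which is the conceptual compression the next paragraph describes.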
Trillions of molecules have a small set of different speeds or energies: a statistical mechanics. Using the mean path length to scale a system reduces the number of parameters from a small set to one: a conceptually compressed statistical mechanics, like characterizing a country's wealth by its GDP per capita.

On degrees of freedom and system capacity.
The exponent of a system's mean path length σ in n = σ^η measures its intrinsic degrees of freedom. The collective rate of a system ρ(R) = log_σ(n) × ρ(r), where ρ(r) is the rate of an average individual. Suppose that ρ(r) is a constant, as is the case for average innate human problem solving capacity over the past few thousand years. While average individual innate capacity is unchanging, average individual capacity increases if the individual's innate capacity has more degrees of freedom to which it can be applied. That occurs when an individual adds to their store of solved problems, that is, knowledge. In cellular phone networks, researchers observe that increasing the degrees of freedom in multiple input multiple output (MIMO) antenna systems leads to a 'gain' in capacity (Jafar, 2008; Borade, 2003; Molisch, p. 521). More antennas, more degrees of freedom.

On glottochronology: reconciling the divergence rate with the English lexical growth rate. 0.6% per thousand years, half the updated Swadesh divergence rate, equals the innate individual average problem solving rate. A population M with a common mother tongue divides into daughter populations D₁ and D₂. Assume that D₁, D₂ and M all have the same size populations and same size lexicons: at t₀, Lex_M = Lex_D₁ = Lex_D₂, and pop_D₁ = pop_D₂ = pop_M, so η(pop) × η(Lex) = η is the same for D₁, D₂ and M. To find one half the average divergence rate, assume Lex_M(t₁) = Lex_M(t₀), so (ρ(r))_M = 0. Then compare the rate of change for each daughter language to the rate for a static mother language.
Lex_D(t₁)/Lex_M(t₁) = [(1 + (ρ(r))_D) × η × Lex_D(t₀)] / [(1 + (ρ(r))_M) × η × Lex_M(t₀)] = (1 + (ρ(r))_D)/(1 + 0) = 1 + ρ(r). (10)

The η's and the lexical sizes in numerator and denominator of Equation (10) cancel. If the daughter tongues undergo changes independent of each other, then each Lex_D grows at the same average rate ρ(r) away from Lex_M. The average rate of lexical divergence of the two daughter languages equals 2ρ(r). Swadesh's updated divergence rate (remarkably) indirectly measures twice the average innate individual human problem solving rate.

On mitochondrial Eve.
Using maternal mitochondrial DNA, Rebecca Cann, Mark Stoneking and Allan Wilson dated a single woman ancestor to 200,000 years ago (Cann, 1987). Nested scaling implies that this dates a first generation, not an individual (Gould, 2002).
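The cancellation in Equation (10) above can be checked with a toy calculation (the rates and multiplier below are illustrative, not the paper's measured values):

```python
# Toy check of Equation (10): two daughter languages, each drifting from
# a static mother lexicon at the average individual rate rho_r per time
# unit, diverge from each other at 2 * rho_r.
rho_r = 0.003   # assumed individual rate per time unit (illustrative)
eta = 50.0      # common network multiplier; it cancels in the ratio

growth_daughter = (1 + rho_r) * eta   # numerator factor of Equation (10)
growth_mother = (1 + 0.0) * eta       # static mother: (rho(r))_M = 0
ratio = growth_daughter / growth_mother
print(round(ratio - 1, 10))           # -> 0.003  each daughter's drift rate
print(round(2 * (ratio - 1), 10))     # -> 0.006  divergence rate, 2 * rho_r
```

Whatever value η takes, it divides out, which is why the measured divergence rate exposes the individual rate ρ(r) alone.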
On nested scaling and the natural logarithm.
Clusters of size σ scale by σ, so dσ/dt = σ. This implies σ = e. In Table 1, the human brain, neurons in C. elegans, and English words all have a path length close to e ≈ 2.718.

The Natural Logarithm Theorem 1: The natural logarithm is a consequence of uniformly nested energy scaling.
The number of generations is proportional to time. The ubiquitous role of the natural logarithm in dating processes evidences the uniform nested scaling of isotropic systems.
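One numerical way to see e's special role, under the assumption that a network pays a cost of σ per generation and needs log_σ(n) generations to span n nodes (a classical optimization over the base of a logarithm; the cost model is an illustrative assumption):

```python
import math

def cost(sigma, n=10**6):
    """Total 'cost' of spanning n nodes with scaling factor sigma:
    sigma units per generation times log_sigma(n) generations."""
    return sigma * math.log(n) / math.log(sigma)

# Scan candidate scaling factors from 1.50 to 4.99 in steps of 0.01;
# the minimum lands at 2.72, the grid point closest to e ~ 2.71828.
best = min((cost(s / 100), s / 100) for s in range(150, 500))
print(best[1])  # -> 2.72
```

The minimizer of σ/ln σ is exactly σ = e for any n, consistent with the measured mean path lengths in Table 1 clustering near e.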
On economics.
An objection to applying statistical mechanics to economics is 'individuals are not gas molecules' (Sinha, 2011, p. 147). The mean path length is a bridge. Let (ρ(EcGr))av be a country's average rate of economic growth and (LP) its labor participation rate. Let the economic productive capacity of the average working individual equal their economic problem solving capacity (ρ(r_Ec))av. By The Network Rate Theorem,

(ρ(EcGr))av = (η(pop))av × (LP)av × (ρ(r_Ec))av. (11)

The average individual in a society has the same number of degrees of freedom in their rate of economic problem solving (ρ(r_Ec))av as the entire society, which at any given time is η(pop) × η(K), where K represents society's store of knowledge. It follows that

(ρ(EcGr))av = (η(pop))av × (η(K))av × (LP)av × ρ(r), (12)

where ρ(r) is the average innate problem solving rate. If education increases (ρ(r_Ec))av in Equation (11), economic growth should increase.

Data can test Equation (11). (LP) is about 66% for the U.S. (Mosisa). Now estimate the U.S. economic growth rate from 1880 to 1980. In 1880 the U.S. had 50,155,783 people (1880 census, Table Ia) and in 1980, 226,545,805 (1980 census, Table 72). Then η(pop) for 1880 is 10.82, η(pop) for 1980 is 11.74, and (η(pop))av = 11.28, so

ρ(EcGr) = 11.28 × 0.66 × 3.41% per decade ≈ 25.4% per decade ≈ 2.3% per year. (13)

U.S. productivity per hour from 1880 to 1980 increased about 2.3% per year (Romer, 1990), close to the calculated rate in Equation (13).

Morality and laws might arise as an emergent set of rules for protecting the factors on the right side of Equation (11). Utility theory applied to economic maximization is challenged by problems like choosing whether to hurt a person to save several people. The dichotomy of utility theory is built into Equation (11).

On cosmology.
Suppose the universe is 13.7 billion years old (about 4.3 × 10¹⁷ seconds) with constant scaling factor s proportional to time. Suppose its entropy is 10¹⁰² (Frampton, 2008). For The Network Rate Hypothesis, let ρ(S) be the age of the universe. Then

ρ(S) = 4.3 × 10¹⁷ seconds = η × s = 10¹⁰² × s, (14)

which implies that s has a finite quantum size proportional to about 10⁻⁸⁵ of a second, much smaller than one Planck time, about 10⁻⁴³ seconds. Perhaps intriguing.

On possible connections to quantum mechanics. η repetitions of σ is wave-like. Discrete clusters are particle-like. Nestedness of cluster generations resembles superposition in quantum mechanics. Clusters are countable like quanta. In quantum mechanics, E = hν, where E is energy, h is Planck's constant, and ν (small Greek nu) is frequency. Is this an analog of The Network Rate Theorem (h being the analog of the scaling factor)?

Hugh Everett (1957) discussed a 'many worlds' interpretation of quantum mechanics. Nested degrees of freedom can replace 'many worlds'.

Robots and algorithms.
Nested scaling or degrees of freedom of robots and algorithms should increase the efficiency of such systems.

On epidemiology.
Transmission of disease is analogous to transmission of information. If ρ(r) can be determined, then it may be possible to calculate ρ(R) for a population.

NRT testing.
Scaling occurs in allometry. Does
The Network Rate Theorem apply there?

Degrees of Freedom Theorem
Allometry is the study of scaling relationships in organisms.

In the allometry of metabolism, Y = aM^b; Y is the organism's metabolism, a is a constant, and M is the organism's body mass. In The Network Rate Theorem, the exponent of the scaling factor varies with size; for metabolism the exponent b is fixed. The Network Rate Theorem must be adapted to apply it to allometry.

First, some background. In 1879, Karl Meeh supposed that b = 2/3: an organism's surface area dispersing body heat grows by a power of 2 while its mass supplying heat grows by a power of 3 (Whitfield). But 2/3 is wrong. Not all energy goes to heat; energy is also used for movement, problem solving, growth and reproduction. Kleiber's data (1932) supports b = 3/4. West, Brown and Enquist (WBE 1997) compared scaling factors, an idea adapted in the derivation below. Treating the circulatory system as a transport system for materials, they found b = 3/4. Kozlowski & Konarzewski (2004 and 2005) identified errors in WBE's mathematics; b = 3/4 has not been mathematically proven.

The erroneous but usefully simpler 2/3 scaling hypothesis reveals a way to adapt The Network Rate Theorem to metabolic scaling. Let ρ(s) be the average heat supply rate of an organism cell, and ρ(r) be the average heat dispersing rate of a unit area on its surface. Assume heat generated is proportional to a small organism's volume V (a Supply S) and all generated heat is uniformly dispersed through its surface area θ (a Receipt R). Let V ∝ ℓ³, where ℓ is a length. Scale S's length by s. For a larger organism V ∝ (sℓ)³ = s³ℓ³. Surface area θ ∝ (σℓ)² = σ²ℓ², where ℓ scales by σ. In general, for S,

V_{k+1} ∝ s^{3k} ℓ³ ρ(s), (15)

and for R,

θ_{k+1} ∝ σ^{2k} ℓ² ρ(r). (16)

By The Network Rate Theorem, the ratio of the capacities of Supply S to Receipt R is

log_s(s^{3k}) / log_σ(σ^{2k}) = 3/2, (17)

and of R to S is 2/3, the ratio of 2 dimensions to 3. In metabolic scaling, Supply S is the circulatory system, Receipt R is the organism, and the ratio of their dimensions is 1. How is a 3/4 ratio possible?

A connection to 3/4 occurs in an intermediate step in the proof of Stefan's Law (Allen & Maxwell, p. 742–743; Longair, 2003, p. 256–258) concerning isotropic energy radiation. Boltzmann derived Stefan's empirically determined law (Boltzmann, 1884). Planck has the intermediate step as (1914, Ch. II, p. 62)

(∂S/∂V)_T = (4/3) u/T, (18)

where S is entropy, T absolute temperature, V volume, and u = U/V is energy density. U is the total energy of the system.

The left side of Equation (18) is the change of entropy per change in volume at absolute temperature T. Since an ideal gas volume changes in proportion to T, the left side measures how entropy changes relative to a scaling factor, ∂V, proportional to T. u/T on the right side measures the number of scalings in u (energy density received) based on T. Hence, implicitly Equation (18) says that for uniformly radiating energy the number of scalings on the left side is 4/3 those on the right side, a ratio that connects to 3/4 metabolic scaling.

If scaling applies to radiating energy then scaling should apply at all scales. Isotropic radiation explains the sphericity of the space fillers used in WBE 1997: Deg(sphere) =
3. Isotropy also is a large scale feature of the universe (Fixsen, 1996). Connect isotropy at all scales to energy distribution in organisms. Treat S as uniformly scaled and nested. For the circulatory system, the aorta is the first generation and the capillaries are the ηth. Identify the average radius for a part of a cone of radiation with the radius of a tube to obtain:

The Degrees of Freedom Theorem: In R's reference frame, Deg(S) = Deg(R), where S is an isotropic supply of energy and R receives S's energy.

Proof: The uniformly nested scaled model must be extended to account for an initial energy source. An initial energy source is external to a system. A system's degrees of freedom are within it. Hence, a 0th generation energy supply is required. Designate a point source 0 as the 0th generation of an energy supply S.

Uniformly scaled nesting or degrees of freedom corresponds to S isotropically radiating energy at all scales at ρ(s) = s energy units per time unit into a Receipt R. Consider 0 and all points in a cone of radiated energy originating from it as comprising a Supply. Let ℓ represent a radial distance traveled by radiation at the rate s energy units per time unit or scaling. Let V_k, ∀k ≥ 1, be the portion of the cone (V_k contains sub-Supplies) with average radial length ℓ_k. (The radial length is averaged since the ends of V_k are curved surfaces.)

Let s scale V_k, such that V_{k+1} = sV_k. Since energy density D_{k+1} = s⁻¹D_k, ρ(E_{k+1}) = ρ(V_{k+1})ρ(D_{k+1}) = sρ(V_k) × s⁻¹ρ(D_k) = ρ(V_k)ρ(D_k) = ρ(E_k); the rate of radiation is constant.

To be able to compare the number of degrees of freedom in S relative to R, let γ 'scale' the average radial length ℓ_k of V_k: γ ≡ ℓ_{k+1}/ℓ_k = 1. From 0 to the far end of V_{k+1}, the radiation front has η(ℓ_{k+1}) = log_γ(γ^k) = k scalings; the radiation front is (k + 1) × ℓ from 0. In S, Deg_s(s^k V) = Deg_γ(γ^k ℓ) = Deg_s(s^k ℓ), and if Deg(V) = 1, then Deg(ℓ) = 1. In S, γ = s.

Let r_k be the average radius of V_k = π(r_k)²ℓ_k. Cones have a uniform slope. Let β ≡ r_{k+1}/r_k represent the scaling of the average radii for V_k. Scaling factors s, β and γ are instrumental variables for comparing the degrees of freedom in Supply S relative to the degrees of freedom in its Receipt R.

In S, since ℓ_i = ℓ, ∀i > 0,

V_{k+1}/V_k = s = s^{k+1}V₀ / (s^k V₀) = π(β^{k+1}r₀)²ℓ / (π(β^k r₀)²ℓ) = β², (19)

so β² = s. If Deg_s(V_k) = 1, then Deg_β(r_k) = 1.

Since radiation is isotropic, for every V_k transmitting energy at the rate ρ(E_k), let θ_k be a corresponding spherical Receipt receiving energy at the same rate and scaling by σ, with radius ξ_k = ℓ_k scaling by α ≡ ξ_{k+1}/ξ_k. σ and α are instrumental variables for determining, in R, the degrees of freedom of the sphere θ_k in R relative to θ_k's radius ξ_k. If Deg_σ(θ_k) = 3, then Deg_α(ξ_k) = 3:

θ_{k+1}/θ_k = σ = σ^{k+1}θ₀ / (σ^k θ₀) = π(α^{k+1}ξ₀)³ / (π(α^k ξ₀)³) = α³, (20)

so α³ = σ. If Deg_σ(θ_k) = 1, then for ξ_k = ℓ_k in R, Deg_α(ξ_k) = Deg_γ(ℓ_k) = 3. But in S, Deg_γ(ℓ_k) would be 1.

Compare the relative number of degrees of freedom of V_k and θ_k. Use the relationship between the radius ξ_k of θ_k and the average radial length ℓ_k of V_k: ξ_k = ℓ_k. In S, Deg_s(ℓ) = 1. Since ξ_k in R has α³ = σ, in R, Deg_s(ℓ) = 3. In the third line of Equation (21), γ cannot have both 1 and 3 degrees of freedom in terms of s. Calculating in the third line the relative number of degrees of freedom of V_k scaling by s in S compared to θ_k scaling by σ in R requires specifying in which reference frame, S or R, the calculation is taking place.

In S's reference frame in the first two lines, and in R's reference frame with γ = s in the third and fourth lines:

V_{k+1} = s^k V₁ (in S)
= π(β^k r₁)²(γ^k)ℓ₁ (in S)
= π(s^k)(s^k)(r₁)²ℓ₁ (in R)
= π s^{2k}(r₁)²ℓ₁. (in R) (21)

When s scales V_k in S, σ scales θ in R, so in R, Deg_s(S) = Deg_σ(R). The extra degree of freedom of S in R's reference frame is due to the effect of radial motion in S. QED.

The DFT: Observations, implications, speculations
Comments below about energy, quantum mechanics and gravity are speculations.
On metabolic scaling.
Assume that for organisms k and k + 1, their masses M_k < M_{k+1}, and that every organism R isotropically receives energy from a circulatory system with energy supply capacity S.

Assume that R's circulatory system volume V is proportional to R's volume θ. By The DFT, in R, S_{k+1} ∝ V_{k+1} = sV_k ∝ sθ_k, since V ∝ θ. Assume that R's average number of cells N ∝ M, its mass, and that M ∝ θ ∝ ρ(θ) = Y, its metabolism. Then, for a Receipt, θ_{k+1} = sθ_k ∝ sM_k.

The exponent of the s factor of the Supply sV_k must be 1 to match the degrees of freedom of M_k. A 3/4 power of the Supply's volume V_{k+1},

(V_{k+1})^{3/4} = (sV_k)^{3/4} = s^{3/4}(V_k)^{3/4} ∝ s^{3/4}(θ_k)^{3/4} ∝ s^{3/4}(M_k)^{3/4} = (sM_k)^{3/4} ∝ (M_{k+1})^{3/4}, (22)

supplies energy at the (Receipt) rate Y_{k+1} to the Receipt M_{k+1}. Hence, when V ∝ θ, the energy supplied by V is proportional to M^{3/4}, so Y ∝ M^{3/4}.

Different mathematical reasoning gives the same result and implies that metabolic scaling occurs at the cellular level. Let ρ(r) be an organism's average cellular metabolic rate. Modify The NRT by adding factors x, y and Deg_m(ρ(r)), m a scaling factor:

ρ(R) = xDeg_σ(R) × yDeg_m(ρ(r)) × ρ(r). (23)

The DFT implies x = 4/3 in Equation (23). Assume that N does not vary (which can be shown to imply V ∝ M). That is the idealized case for a mature organism. Then in Equation (23), R ∝ θ, ρ(R), and Deg_σ(R) are constants; ρ(r) as an average is constant. Since ρ(R) = Deg_σ(R) × ρ(r) by The NRT, y in Equation (23) must be 3/4. Scaling up of an organism's energy supply capacity is offset by scaling down of its average cellular rate of energy use. Thus the metabolic capacity, which is the product Deg_R(R) × Deg_{ρ(r)}(ρ(r)), is invariant: 4/3 × 3/4 = 1. This observation uses the fact that for ρ(r) = kσ, log_σ(σ^η) = log_{kσ}((kσ)^η).

The metabolism of an organism's N cells is Y = ρ(r) × N, N times average cellular metabolism. Apply Equation (21) and the value of y in Equation (23). Then

ρ(R_{k+1}) = Y_{k+1} = σN_k × mρ(r_k). (24)

If in Equation (23) y instead equals 1, then θ_{k+1} = σθ_k. Space expands.

DFT and economics.
For the same reason as in metabolic scaling, in economic enterprise there are economies of scale. On the other hand, increasing the efficiency of individuals frees up energy that can increase η, boosting economic growth.

Turnstile Analogy.
A stadium has four seating levels, each with rows of n seats. It has three exit levels, each with n turnstiles. Each stadium level empties one row per time unit. Each turnstile level only allows a maximum of one row to exit per time unit. If the stadium is full, the emptying rate of the stadium levels is 4/3 of the exiting capacity of the exit turnstiles. Two possible remedies are: (1) increase the number of turnstiles (the Receipt) by 1/3; (2) decrease the rate of exiting persons by 1/4. The second solution applies to metabolic scaling. The first solution is consistent with the expansion of the universe. That is, if ρ(r) in Equation (23) does not scale down, then θ must scale up.

A theory of emergence.
Together, The Degrees of Freedom Theorem and The Network Rate Theorem explain emergence. In R's reference frame, S with more degrees of freedom than R initiates Deg(R) and causes Deg(R) to increase. That increases the multiplicative effect of η in The Network Rate Theorem. Structures (stars and organisms) and processes (ecosystems, languages, markets, and mathematics) emerge at all scales.

Michael Stumpf and Mason Porter recently (2012) suggested that allometric scaling for metabolism has, of all putative scale-free power laws, the most evidentiary support. (The ratio, π, of the circumference to the diameter of a circle is an example of a scale-free relationship in Euclidean geometry.) The metabolism power law is a manifestation of The Degrees of Freedom Theorem, which may be the universe's most fundamental scale-free power law. If the universe is finite, then there are smallest and largest scales for The Degrees of Freedom Theorem.

S scaling creates space in R. R having been created, S increases its degrees of freedom to fill R. Perhaps a push-pull mechanism makes time one directional. The fractal dimension of isotropic energy distribution (the Supply) is (4/3)rd that of the Receipt. Fractality of a Supply S induces fractality in its Receipt R at all scales.

Another natural logarithm theorem.
Isotropy suggests the following.
The Natural Logarithm Theorem 2
For a finite isotropic network R, the base of the logarithmic function describing R's intrinsic degrees of freedom is e, the base of the natural logarithm. Proof:
The contribution of networking to the multiplication of capacity per transmitting node of R's n = σ^η nodes, as a proportion of η, is

dη/dn = d[log_σ σ^η]/d(σ^η) = 1/(ln(σ)σ^η). (25)

The per node reception of the increase in capacity η due to networking, as a proportion of η, is 1/n = 1/σ^η. For an isotropic network, the contribution to the increase in capacity per transmitting node, as described in Equation (25), equals the increase in capacity per receiving node, so

1/σ^η = 1/(ln(σ)σ^η) ⇒ ln(σ) = 1 ⇒ σ = e. (26)

An information network where every node has an equal capacity to transmit and receive is isotropic. Dunbar's optimal audience of three is slightly more than e ≈ 2.718.

On Clausius's Mean Path Length Theorem.
Clausius, in his paper introducing the concept of mean path length (Clausius, 1858, p. 140 of translation in Brush), notes:

The mean lengths of path for the two cases (1) where the remaining molecules move with the same velocity as the one watched, and (2) where they are at rest, bear the proportion of 3/4 to 1. It would not be difficult to prove the correctness of this relation; it is, however, unnecessary for us to devote our time to it.

One can prove Clausius's theorem using The NRT and
The DFT.

Theorem: The mean path length of an isotropic Supply S is 3/4 of the Receipt's (R's). Proof:
Per time unit, every kth generation Supply S_k isotropically supplies E_k energy to a corresponding Receipt R_k. If no energy is lost in transmission, then ρ(E) = ρ(S) = ρ(R). Let S's mean path length s scale S and R's mean path length σ scale R. Then,

ρ(E) = ρ(S) = Deg_s(S) × s (Network Rate Theorem)
= ((4/3)Deg_σ(R)) × s (Degrees of Freedom Theorem)
= Deg_σ(R) × ((4/3)s)
= Deg_σ(R) × σ (ρ(S) = ρ(R))
= ρ(R) (Network Rate Theorem). (27)

It follows that (4/3)s = σ and so s = (3/4)σ. QED.

The mean path length of a social network—a Receipt—receiving isotropically transmitted information should be 4/3 of e (≈ 3.62).

A speculation about energy.
Use dimensional analysis. A cluster in a generation is analogous to a mass (from a particle point of view). Let one cluster width also be one unit of distance (from a wave point of view). The number of clusters in a generation is a distance per unit of η. Since η is proportional to time, the mass of clusters in a generation per time is (mass)(length)/(time). S radiates one generation per unit of η or time, (length)/(time). The dimension of a radiating mass of clusters per generation is

(mass)(length)/(time) × (length)/(time) = (mass)(length)²/(time)². (28)

Energy has the same dimensions as the right side of Equation (28). That implies that energy is due to the excess Deg_s(S) in R's reference frame. A 0th generation (the sun) supplies energy to planets circling it, a black hole supplies energy to stars circling it, and a singularity supplies energy for the universe that emerges from it.

On Huygens principle.
The physicist Christiaan Huygens in 1690 had the idea that“every point on a propagating wavefront serves as the source of spherical secondarywavefronts” and “the secondary wavelets have the same frequency and speed” (Hecht,2002, p. 104), consistent with uniformly scaled nested degrees of freedom.
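The dimensional bookkeeping in Equation (28) can be verified mechanically. A minimal sketch (Python; representing a dimension as its (mass, length, time) exponent tuple is an illustrative device, not the paper's notation):

```python
# Dimensions as (mass, length, time) exponents; multiplying quantities adds exponents.
def dmul(a, b):
    return tuple(x + y for x, y in zip(a, b))

mass_flux = (1, 1, -1)   # (mass)(length)/(time): mass of clusters per generation, per time
radiation = (0, 1, -1)   # (length)/(time): one generation radiated per unit of eta
energy    = (1, 2, -2)   # (mass)(length)^2/(time)^2

# Equation (28): mass flux times radiation rate has the dimensions of energy.
assert dmul(mass_flux, radiation) == energy
print(dmul(mass_flux, radiation))  # (1, 2, -2)
```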
On dimensions. In S, Deg(S) = Deg_s = 1. In R, Deg(R) = Deg_σ = 1. In R's reference frame, Deg(S) = (4/3)Deg(R). In R's reference frame, assign S a 4th dimension. Time as a 4th dimension appears in the special theory of relativity. Time in R may be due to radiative motion in S.

On the ergodic hypothesis.
To derive the H Theorem, Boltzmann assumed that every combinatorial state (phase) was equally likely and would occur eventually—the ergodic hypothesis. The physicist Shang-Keng Ma comments (p. 442):

This argument is wrong, because this infinite long time must be much longer than O(e^N), while the usual observation time is O(1).

(O(1) means the order of magnitude of a counting number. In principle there is not enough time for the ergodic hypothesis to be true.) Theory now postulates equal probability for all points in phase space. Neither the hypothesis nor a postulate is necessary. All that is necessary is the capacity to binodally connect or collide. Then the network is scaled by its mean path length σ in time proportional to log_e(O(e^N)) = O(N). The network's degrees of freedom are equally available in an isotropic system.

The equipartition theorem.
In Statistical Mechanics the equipartition theorem (which also relies on the equal probability of all points in phase space) provides:

In the mean each degree of freedom of the system at a temperature T has the thermal energy (1/2)kT (Greiner, 1995, p. 198),

where k is Boltzmann's constant and T is absolute temperature. The ratio of 3 degrees of freedom in S to 2 degrees of freedom in motion through a plane in R is 3/2, or (3/2)kT for S's degrees of freedom, a conjectural explanation.

On a similarity to Schrödinger's Equation. A succeeding generation in a uniformly scaled S has moved one s-scaling that is orthogonal to the preceding generation. Let ψ represent a function that counts intrinsic degrees of freedom. Bearing in mind that η appears to be proportional to time t (in R's reference frame),

ψ(k+1) = i × s × dψ(k)/dη. (29)

Apply Equation (29) to k scalings of s along all radii from a 0. Then the (k+1)st generation makes a right angle to the kth generation, because of the factor i in Equation (29), and forms a ring k+1.

On special relativity and quantum scaling.
In Hermann Bondi's 'k calculus' (1962), time for one of two inertial travelers is scaled by k relative to the other. In Bondi's one dimensional paradigm (Bondi, p. 102), Brian and Alfred are on unaccelerated paths in space and pass by each other at point 0. Alfred sends a radar pulse at time t which reaches Brian at time kt. Brian's response to Alfred's signal reaches Alfred at time k(kt) = k²t. Bondi finds a formula for k (p. 103) and derives the Lorentz transformations that characterize the principle of relativity for inertial reference frames.

Generalize for a finite universe: for all pairs of inertial paths find the 0 common to all. Each pair of paths has a k factor. The smallest is a quantum scaling factor.

Instead of connecting points on a pair of intersecting inertial paths with straight lines as Bondi does, connect them with part of the arc of a circle. Consider a set of radial lines intersecting at 0 with concentric circles superimposed. Then t_{i+1} = kt_i = k^{i+1}t_0. This resembles the model used in The Degrees of Freedom Theorem. Suppose that every local 0 acts as a gravity point relative to the set of inertial lines emanating from it. A quantum scaling factor helps describe the geometry of S, and is connected to gravity.

An event perceived by two inertial observers with a common 0 as different in R occurs in the same generation in S. The perceived relativity of time and space may occur due to the reference frame adopted. We perceive space as a single reference frame. The Degrees of Freedom Theorem suggests space has two. They are difficult to distinguish because they each have 3 dimensions within their own domain.

On the Arrow.
Aristotle describes Zeno’s
Arrow paradox in his
Physics: “. . . if everything when it occupies an equal space is at rest, and if that which is in locomotion is always occupying such a space at any moment, the flying arrow is therefore motionless”. An inertial object is not moving relative to its cluster in S. Inertial motion may be the perception in R of S's background moving relative to the inertial object.

The two slit experiment.
Light waves traveling through two parallel slits in a thin plateappear to interfere; if light consisted of particles, the impression left by the particlesshould add. Compare this to stereoscopic vision. Two eyes provide stereoscopic depthperception. Similarly, two slits may enable stereoscopic depth perception in R of S ’suniform nested scaling; the observer sees the moving background as waves with ampli-tudes of nested heights. Perhaps the two slit experiment is a reference frame problem. On Entanglement.
Consider a scaled generation of clusters:

0 −→ s −→ s² −→ s³ −→ . . . −→ s^η. (30)

In (30) a node in s connects to a node in s^η in η steps, counting the first generation as a step. In (30) the number of scalings out of η scalings is like a distance and is also a proportion of s^η. In S's reference frame, from the perspective of the average scaling factor s:

s ⊃ ‖a₁, b₁‖₁ ⊃ ‖a₂, b₂‖₂ ⊃ . . . ⊃ ‖a_η, b_η‖_η, (31)

where A ⊃ B means A contains B, and ‖a_i, b_i‖_i = ‖a_i, . . . , b_i‖_i is a representative cluster in the ith generation.

In R's reference frame, distance is proportional to the number of scalings by s:

‖a₁, b₁‖₁ + ‖a₂, b₂‖₂ + . . . + ‖a_η, b_η‖_η = η × s. (32)

In R's reference frame, the distance η × s spans the system, but in S's reference frame, s as the average least distance between pairs of nodes spans the system. Is s > 0 a distance or a scaling factor? Suppose the answer is: both.
The Natural Logarithm Theorem 3
If s spanning the network in one generation (d η )in S is equivalent to η segments each of s units spanning the network in R , then s = e. Proof:
The equivalence reduces to a differential equation per unit of η:

ds/dη = s. (33)

Consider s in Equation (33) as if it is a function. The solution for Equation (33) is

s = e^η. (34)

If η = 1, then s = e. Equivalence of (31) and (32) leads to the natural logarithm.

Scotty in the 2009 Star Trek movie: Imagine that! It never occurred to me to think of SPACE as the thing that was moving!

In other words, the natural logarithm is evidence of space's dual reference frames. Wave particle duality may also be.

Aspect, Dalibard and Roger (1982) performed an experiment that precluded communication between separated particles, and found entanglement: the result is consistent both with quantum mechanics and non-locality (Einstein, 1935; Bell, 1964). This may result from the equivalence of (31) and (32).
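Assuming, as the stated solution s = e^η implies, that the proof's differential equation is ds/dη = s with s(0) = 1, the conclusion s = e at η = 1 can be checked numerically. The Euler integration below is an illustrative sketch, not part of the proof:

```python
import math

# Euler-integrate ds/d(eta) = s from s(0) = 1 to eta = 1.
# The exact solution is s = e^eta, so s(1) should approach e as steps increase.
def integrate_growth(steps: int) -> float:
    s, h = 1.0, 1.0 / steps
    for _ in range(steps):
        s += h * s          # one unit of eta advanced in `steps` increments
    return s

print(integrate_growth(10))       # coarse: (1 + 1/10)^10 = 2.5937...
print(integrate_growth(100000))   # fine: approaches e
print(math.e)                     # 2.718281828...
```

The coarse run reproduces (1 + 1/10)^10; as the step shrinks, the result approaches e, the limit of (1 + 1/n)^n as n → ∞, which is the same fixed point reached in The Natural Logarithm Theorem 2.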
The section on
The Network Rate Theorem found an innate rate of lexical change that enables dating the beginning of language. If the same physical principles apply to diverse phenomena, then generalizing the same method should lead to a general theory of emergence, as conjectured above.

The universe consists of many kinds of structures and processes, complex at all scales. One mechanism for creating complexity is many rules applied to simple components. Another mechanism is a simple initiating process, such as isotropic radiation or scaling, with an enormous number of degrees of freedom. If the two theorems described above are valid, they are consistent with the second mechanism.

That statistical mechanics first dealt with gas molecules is perhaps an accident of history. Statistical mechanics might also have developed by asking how much intelligence a collective intelligence contributes to its component intelligences.
References

[1] Achard, S., Salvador, R., Whitcher, B., Suckling, J. & Bullmore, E. (2006). A Resilient, Low-Frequency, Small-World Human Brain Functional Network with Highly Connected Association Cortical Hubs. The Journal of Neuroscience 26(1), 63-72.
[2] Aitchison, J. (1989). Spaghetti junctions and recurrent routes—Some preferred pathways in language evolution. Lingua 77, 151-171.
[3] Allen, H.S. & Maxwell, R.S. (1948). A Text-book of Heat. London: Macmillan and Co.
[4] Aristotle (1921). The Works of Aristotle. Translated into English under the editorship of W.D. Ross. Oxford.
[5] Arnold, V.I. (1997). On Teaching Mathematics. pauli.uni-muenster.de/munsteg/arnold.html
[6] Aspect, A., Dalibard, J. & Roger, G. (1982). Experimental Test of Bell's Inequalities Using Time-Varying Analyzers. Phys. Rev. Lett. 49(25).
[7] Bell, J.S. (1964). On the Einstein Podolsky Rosen Paradox. Physics I, 195-200.
[8] Ben-Jacob et al. (2010). Genome sequence of the pattern forming Paenibacillus vortex bacterium reveals potential for thriving in complex environments. BMC Genomics 2010, 11:710.
[9] Borade, S., Zheng, L. & Gallager, R. (2003). Maximizing Degrees of Freedom in Wireless Networks. In Proc. of Allerton Conf. on Communication, Control and Computing.
[10] Boyer, C.B. & Merzbach, U.C. (1991). A History of Mathematics. New York: Wiley.
[11] Blust, R. (2000). Why lexicostatistics doesn't work, p. 311, Vol. 2 in Renfrew, C., McMahon, A. & Trask, L. (Ed.) Time Depth in Historical Linguistics. McDonald Institute for Archaeological Research.
[12] Bohm, D. (1996). The Special Theory of Relativity. New York: Routledge Classics. (Originally 1965, W.A. Benjamin).
[13] Boltzmann, L. (1872). Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. Sitzungsberichte Akademie der Wissenschaften 66 (1872): 275-370, included in Wissenschaftliche Abhandlungen, Vol. 1, 1909, 316-402. English translation in Stephen G. Brush (1948). Kinetic Theory.
Pergamon Press.
[14] Boltzmann, L. (1884). Ableitung des Stefan'schen Gesetzes, betreffend die Abhängigkeit der Wärmestrahlung von der Temperatur aus der electromagnetischen Lichttheorie. Annalen der Physik und Chemie, 22(291-294).
[15] Boltzmann, L. (1964 translation of 1898 work). Lectures on Gas Theory. University of California Press.
[16] Bonabeau, E., Dorigo, M. & Theraulaz, G. (1999). Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press.
[17] Bondi, H. (1980; original edition 1962). Relativity and Common Sense: A New Approach to Einstein. Dover.
[18] Bridgman, P.W. (1922). Dimensional Analysis. Yale University Press.
[19] Cajori, F. (1993, originally 1928 and 1929). A History of Mathematical Notation. New York: Dover.
[20] Campbell, L. (1998). Historical Linguistics—An Introduction. MIT Press.
[21] Cann, R.L., Stoneking, M. & Wilson, A.C. (1987). Mitochondrial DNA and human evolution. Nature 325, 31-36.
[22] Carnot, S. (1960; original 1824). Reflections on the Motive-Power of Fire (translation of Réflexions sur la puissance motrice du feu). New York: Dover.
[23] Clark, J.B. (1899). The Distribution of Wealth: A Theory of Wages, Interest and Profits. MacMillan.
[24] Clausius, R. (1988, originally 1850). On the Motive Power of Heat, and on the laws which can be deduced from it for the theory of heat, in Reflections on the Motive Power of Fire. Dover.
[25] Clausius, R. (1858). Ueber die mittlere Länge der Wege, welche bei Molecularbewegung gasförmiger Körper von den einzelnen Molecülen zurückgelegt werden, nebst einigen anderen Bemerkungen über die mechanische Wärmetheorie. Annalen der Physik 105, pp. 239-258; English translation in Stephen G. Brush (1948). Kinetic Theory. Pergamon Press, Vol. 1, p. 135.
[26] Clausius, R. (1857). Ueber die Art der Bewegung, welche wir Wärme nennen. Annalen der Physik 100, pp. 353-380; English translation, The Nature of the Motion which we call Heat, in Stephen G. Brush (1948). Kinetic Theory. Pergamon Press, Vol. 1, p. 111.
[27] Clausius, R. (1865). Ueber verschiedene für die Anwendung bequeme Formen der Hauptgleichungen der mechanischen Wärmetheorie. Ann. der Physik und Chemie 125, p. 353. English translation in Ninth Memoir, p. 327, Clausius, R. (1867). The Mechanical Theory of Heat. John van Voorst.
[28] Clausius, R. (1879). The Mechanical Theory of Heat. Macmillan.
[29] Dautenhahn, K. (1999). Embodiment and interaction in socially intelligent life-like agents. In Nehaniv, C.L. (Ed.). Computation for Metaphors, Analogy and Agent, 102-142.
[30] Davis, P.J., Hersh, R. & Marchisotto, E.A. (1995). The Mathematical Experience - Study Edition. Boston: Birkhauser.
[31] Diderot, D. (1750–1772). Article on Encyclopedia (translator Philip Stewart), in The Encyclopedia of Diderot & d'Alembert, Collaborative Translation Project.
[32] Dunbar, R. (1997). Grooming, Gossip and Language. Cambridge, Massachusetts: Harvard University Press.
[33] Einstein, A., Podolsky, B. & Rosen, N. (1935). Can Quantum-Mechanical Description of Physical Reality Be Considered Complete? Phys. Rev. 47, 777.
[34] Eisner, M. (2003). Long-Term Historical Trends in Violent Crime. Crime and Justice; A Review of Research, 30, 83-142.
[35] Everett, H. (1957). The Many-Worlds Interpretation of Quantum Mechanics. Princeton University, Thesis.
[36] Ferrer i Cancho, R. & Solé, R.V. (2001). The Small-World of Human Language. Proceedings of the Royal Society of London, B 268, 2261-2266.
[37] Fixsen, D.J., Cheng, E.S., Gales, J.M., Mather, J.C., Shafer, R.A. & Wright, E.L. (1996). The cosmic microwave background spectrum from the full COBE FIRAS data set. Astrophys. J. (473) 576-587.
[38] Flynn, J.R. (2007). What is Intelligence? Cambridge University Press.
[39] Frampton, P.H., Hsu, S.D.H., Reeb, D. & Kephart, T.W. (2008). What is the entropy of the universe? arXiv:0801.1847v3.
[40] Gibbs, J.W. (1902). Elementary Principles in Statistical Mechanics.
Yale.
[41] Gould, S.J. (2002). Eve and Her Tree. Discover, July 1.
[42] Gray, R.D. & Atkinson, Q.D. (2003). Language-Tree Divergence Times Support the Anatolian Theory of Indo-European Origin. Nature 426, 435-439.
[43] Greiner, W., Neise, L. & Stöcker, H. (1995). Thermodynamics and Statistical Mechanics. Springer.
[44] Hamming, R.W. (1980). The Unreasonable Effectiveness of Mathematics. The American Mathematical Monthly 87(2).
[45] Hayek, F.A. (1948). The Use of Knowledge in Society, in Individualism and Economic Order. University of Chicago Press.
[46] Heath, T. (1921). A History of Greek Mathematics. Oxford.
[47] Hecht, E. (2002). Optics, Fourth Edition. Addison Wesley.
[48] Helpman, E. (2004). The Mystery of Economic Growth. Belknap Press.
[49] Herder, J.G. (1966) [first published 1772]. In Two Essays On the Origin of Language. Chicago: University of Chicago Press.
[50] Holldobler, B. & Wilson, E.O. (1990). The Ants. Harvard University Press.
[51] Hume, D. (1992, originally published 1739). Treatise of Human Nature. Amherst, New York: Prometheus.
[52] Jafar, S.A. & Shamai, S. (2008). Degrees of Freedom Region for the MIMO X Channel. IEEE Transactions on Information Theory, 54(1), 151-170. A preprint is in arXiv.
[53] Jaynes, E.T. (1957). Information Theory and Statistical Mechanics. The Physical Review 106(4), 620.
[54] Jespersen, O. (1922). Language - Its Nature, Development and Origin. New York: MacMillan.
[55] Jevons, W.S. (1879). The Theory of Political Economy, Second edition. MacMillan.
[56] Thomson, Sir William (Lord Kelvin). Mathematical and Physical Papers. Cambridge University Press, 1884.
[57] Thomson, Sir William (Lord Kelvin). On an absolute thermometric scale founded on Carnot's theory of the motive power of heat, and calculated from Regnault's observations. Cambridge Philosophical Society Proceedings for June 5, 1848; and Phil. Magazine Oct. 1848, Vol. I, p. 100.
[58] Kennedy, J., Eberhart, R.C. & Shi, Y. (2001). Swarm Intelligence.
New York: Morgan Kaufmann.
[59] Khinchin, A. Ya. (1957). Mathematical Foundations of Information Theory. New York: Dover.
[60] Kleiber, M. (1932). Body Size and Metabolism. Hilgardia 6, 315.
[61] Kozlowski, J. & Konarzewski, M. (2004). Is West, Brown and Enquist's model of allometric scaling mathematically correct and biologically relevant? Functional Ecology 18, 283-289.
[62] Kozlowski, J. & Konarzewski, M. (2005). West, Brown and Enquist's model of allometric scaling again: the same questions remain. Functional Ecology 19, 739-743.
[63] Kube, C.R. & Bonabeau, E. (2000). Cooperative transport by ants and robots. Robotics and Autonomous Systems 30, 85-100.
[64] Lancashire, I. (Ed.) (1999). The Early Modern English Dictionaries Database (EMEDD). University of Toronto.
[65] Laughlin, R.B. (2005). A Different Universe (Reinventing Physics From the Bottom Down). Basic Books.
[66] Longair, M.S. (2003). Theoretical Concepts in Physics, Second ed. Cambridge University Press.
[67] Ma, S-K. (2000, originally 1985). Statistical Mechanics. World Scientific.
[68] McMahon, A.M.S. (1994). Understanding Language Change. Cambridge University Press.
[69] Menninger, K. (1992). Number Words and Number Symbols - A Cultural History of Numbers. Minola, New York: Dover.
[70] Meyer, J. (2000). Age 2000. nationalatlas.gov. Census 2000 Brief Series.
[71] Michalewicz, Z. & Fogel, D.B. (2004). How to Solve It: Modern Heuristics. New York: Springer.
[72] Milgram, S. & Travers, J. (1969). An Experimental Study of the Small World Problem. Sociometry, 32(4), 425-443.
[73] Molisch, A.F. (2011). Wireless Communications, Second Edition. Wiley.
[74] Montague, R. (2006). Why Choose This Book? How We Make Decisions. New York: Penguin.
[75] Mosisa, A. & Hipple, S. (2006). Trends in Labor Force Participation in the United States. Monthly Labor Review 58(5).
[76] Motter, A., de Moura, A., Lai, Y.-C. & Dasgupta, P. (2002). Topology of the conceptual network of language. Phys. Rev. E.
65, 065102(R).
[77] Neisser, U., Boodoo, G., Bouchard, T.J., Boykin, A.W., Brody, N., Ceci, S.J., Halpern, D.F., Loehlin, J.C., Perloff, R., Sternberg, R.J. & Urbina, S. (1996). Intelligence: knowns and unknowns. American Psychologist, 51(2), 77.
[78] Nicholls, J.G., Martin, A.R., Wallace, B.G. & Fuchs, P.A. (2001). From Neuron to Brain. Sinauer.
[79] Nordhaus, W.D. (1997). Do real-output and real-wage measures capture reality? The history of lighting suggests not. Cowles Foundation Paper No. 957, and in Gordon, R.J. & Bresnahan, T.F. (Eds.) (1997). The Economics of Goods. U. of Chicago Press.
[80] Oeppen, J. & Vaupel, J.W. (2002). Broken Limits to Life Expectancy. Science 296, 1029.
[81] Odlyzko, A. & Tilly, B. (2005). A refutation of Metcalfe's Law and a better estimate for the value of networks and network interconnections. http://.../~odlyzko/doc/metcalfe.pdf. Also, Briscoe, B., Odlyzko, A. & Tilly, B. (July 2006). Metcalfe's Law is wrong. IEEE Spectrum.
[82] Pellegrino, F., Coupé, C. & Marsico, E. (2011). Across-language perspective on speech information rate. Language 87(3), 539.
[83] Planck, M. (translator Morton Masius) (1914). The Theory of Heat Radiation. Philadelphia: P. Blakiston's Son & Co.
[84] Polya, G. (1962). Mathematical Discovery—On Understanding, Learning, and Teaching Problem Solving. New York: Wiley.
[85] Popper, K. (2002, original 1935). The Logic of Scientific Discovery. Routledge.
[86] Salomon, D. (2007). Data Compression—The Complete Reference (4th edition). London: Springer.
[87] Seeley, T.D. (2010). Honeybee Democracy. Princeton: Princeton University Press.
[88] Shannon, C.E. & Weaver, W. (1949). The Mathematical Theory of Communication. University of Illinois.
[89] Sinha, S., Chatterjee, A., Chakraborti, A. & Chakrabarti, B.K. (2011). Econophysics—An Introduction. Wiley.
[90] Simpson, J.A. & Weiner, E.S.C. (Eds.) (1989). Oxford English Dictionary. Oxford.
[91] Smolin, L. (1997). The Life of the Cosmos.
Oxford University Press.
[92] Strogatz, S. (2003). Sync. New York: Hyperion.
[93] Stumpf, M.P.H. & Porter, M.A. (2012). Critical Truths About Power Laws. Science 335, 665.
[94] Surowiecki, J. (2004). The Wisdom of Crowds: Why the Many Are Smarter than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations. New York: Doubleday.
[95] Swadesh, M. (1971). The Origin and Diversification of Language. Chicago: Aldine-Atherton.
[96] Watts, D.J. & Strogatz, S.H. (1998). Collective dynamics of 'small-world' networks. Nature 393, 440.
[97] West, G.B., Brown, J.H. & Enquist, B.J. (1997). A General Model for the Origin of Allometric Scaling Laws in Biology. Science 276, 122.
[98] Whitfield, J. (2006). In the Beat of a Heart. Joseph Henry Press.
[99] Whorf, B. (1956). Language Thought and Reality - Selected Writings. John B. Carroll ed. Cambridge, Massachusetts: MIT Press.
[100] Wigner, E. (1960). The Unreasonable Effectiveness of Mathematics in the Natural Sciences. In Communications in Pure and Applied Mathematics 13(1). New York: John Wiley & Sons, Inc.
[101] Wrigley, E.A., Schofield, R. & Lee, R.D. (1989). The population history of England, 1541-1871: a reconstruction. Cambridge University Press.
[102] Zipf, G.K. (1949) [1972 reprint]. Human Behavior and the Principle of Least Effort.
[105] Census Characteristics of Australia—1991 Census of Population and Housing. Australian Bureau of Statistics, 1993, Catalogue No. 2710.0.
[106] 1880 Population of the United States at the Tenth Census (June 1, 1880). Department of the Interior, Census Office.