The Distributions in Nature and Entropy Principle
Oded Kafri
The derivation of the maximum entropy distribution of particles in boxes yields two kinds of distributions: a "bell-like" distribution and a long tail distribution. The first is obtained when the ratio between particles and boxes is low, and the second when the ratio is high. The obtained long tail distribution correctly yields the empirical Zipf law, Pareto's 20:80 rule and Benford's law. Therefore, it is concluded that the long tail and the "bell-like" distributions are both outcomes of the tendency of statistical systems to maximize entropy.
Keywords:
Maximum entropy; Long tail distribution; Pareto's 20:80 rule; Zipf law; Benford's law; Bell-like distribution

Introduction:
There are two common distributions in life. The first one is the "bell-like" distribution, which is found in the distributions of IQ, human height, human age at death, etc. This "almost universal" distribution was introduced for the first time by de Moivre in the 18th century and explored by Laplace and Gauss around 1800. As opposed to the bell curve distribution, many quantities are distributed unevenly [1]. For example, the probability to live in a big city is higher than the probability to live in a small village. Similarly, the probability to be poor is higher than the probability to be rich. Although one might intuitively expect cities' populations and wealth to have a bell curve distribution, it is not so. Their distributions are uneven and are characterized by a long tail to the right, in which few have a lot and many have quite a little. These distributions were observed by Pareto, Zipf, Newcomb and Benford about a century later and were named accordingly: Zipf law [2,3], Pareto's rule [4,5], and Benford's law [6,7]. The first to discover such a distribution was Pareto. In 1896 he observed that the ownership of land in Italy was distributed among the population in a ratio of around 20:80, namely, about 20% of the population owned about 80% of the land. From his observations of other countries as well, he concluded that this ratio is general. Mussolini embraced the Italian Marquis Pareto because he believed that Pareto's rule proves nature's preference for the fittest. Zipf, a Harvard professor of linguistics, found that the ratio between the frequencies of the first and the second most frequent words, in texts in many languages, is two. Similarly, the ratio between the second and the fourth most frequent words is also two, etc. He claimed that the shortest and most "efficient" words appear more frequently [2]. Zipf believed in the evolutionary philosophy, i.e. the most "useful" and "efficient" words are the winners, in the spirit of "the survival of the fittest".
On the other hand, many people and political movements believe that Pareto's rule is unfair and that wealth should be shared more equally, namely, as in the bell curve distribution. Newcomb's discovery in 1881 of the uneven frequency of digits in logarithmic tables [6] (the higher the value of a digit, the lower its frequency) raises some doubt as to the real reason for the uneven distributions. Later, in 1938, Benford confirmed Newcomb's uneven distribution of digits in a wide range of numerical data [7]. He attempted, unsuccessfully, to present a formal proof of Newcomb's equation, see Eq. (12). Since then, this distribution has also been found in prime numbers [8], physical constants, Fibonacci numbers and many more [9].

In this paper it is argued that the "bell-like" distribution and the long tail distribution are the two boundaries of the same probability distribution. This probability function is obtained by a fair and unbiased random distribution of particles in boxes. We consider a set of N boxes scoring P particles; it is assumed that all the boxes have an equal probability to score a particle, namely, the probability of a box to score a particle is q = 1/N. Therefore, the probability to score n particles is q_n = (1/N)^n. It is clear that q_n < q for n > 1. This is the basic reason why the rich are fewer than the poor. In the case P << N, where a multiple score is negligible, the "bell-like" distribution is obtained; in the case P >> N, a long tail distribution is obtained.

I. How are P particles distributed in N boxes?

The answer is not new: the particles are distributed in the way that maximizes the entropy [10]. According to Boltzmann, the entropy is the logarithm of the maximum possible number of different configurations (microstates) of a set, namely,

S = ln Omega    (1)

(we take here the Boltzmann constant k_B = 1). A microstate is one possible distinguishable configuration of a set of boxes and particles.
Boltzmann entropy is obtained from the Gibbs-Shannon entropy by assuming that all the microstates have an equal probability. The Gibbs-Shannon entropy is given by

S = -Sum_{j=1}^{Omega} p_j ln p_j,    (2)

where p_j is the probability of the microstate j and Omega is the number of microstates to be maximized. If all the microstates have an equal probability, namely p_j = 1/Omega, the Boltzmann entropy ln Omega is obtained. Therefore, the distribution of particles that maximizes Boltzmann entropy implies an equal probability for any configuration, as well as an equal probability for any particle to be in any box. The number of microstates (different configurations) of P particles in N boxes is given by the Planck expression [10], namely,

Omega(P,N) = (P+N-1)! / [P!(N-1)!].    (3)

To visualize the problem we start with a numerical example: calculating the distribution of 3 particles in 3 boxes that maximizes the entropy. According to Eq. (3) the number of microstates Omega(3,3) is 10, as follows: 3|0|0, 0|3|0, 0|0|3, 2|1|0, 2|0|1, 1|2|0, 0|2|1, 1|0|2, 0|1|2, and 1|1|1. We see that although each box has an equal chance to score 1, 2, or 3 particles, boxes with 1 particle appear 9 times, boxes with 2 particles appear 6 times, and boxes with 3 particles appear 3 times. The relative frequency of the boxes with one particle is therefore f(1) = 1/2; with two particles f(2) = 1/3; and with three particles f(3) = 1/6. To calculate the relative frequencies f(n), we designate n = P/N, where n is the number of particles in a box, and apply Stirling's formula ln N! ~ N ln N - N. We obtain [10] from Eqs. (1) and (3) that

S ~ N{(1+n)ln(1+n) - n ln n} ~ Sum_{n=1}^{N} {(1+n)ln(1+n) - n ln n}.    (4)

Now we write the Lagrange function,

F ~ Sum_{n=1}^{N} {(1+n)ln(1+n) - n ln n} - beta{Sum_{n=1}^{N} n phi(n) - P}.    (5)

The first term on the RHS is the entropy and the second term is the constraint on the number of particles. Namely, P = Sum_{n=1}^{N} n phi(n) is the number of particles, phi(n) is the number of boxes that scored n particles, and beta is a Lagrange multiplier.
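The 3-particles-in-3-boxes count above can be verified by brute-force enumeration; a minimal Python sketch (variable names are illustrative):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

N, P = 3, 3  # boxes, particles

# A microstate is one assignment of occupation numbers to the N boxes;
# enumerate all compositions of P into N non-negative parts.
microstates = [s for s in product(range(P + 1), repeat=N) if sum(s) == P]
print(len(microstates))  # Omega(3,3) = 5!/(3! 2!) = 10, as in Eq. (3)

# Count how often a box holds exactly n > 0 particles, over all microstates.
counts = Counter(n for state in microstates for n in state if n > 0)
total = sum(counts.values())
freqs = {n: Fraction(c, total) for n, c in counts.items()}
print(sorted(freqs.items()))  # f(1) = 1/2, f(2) = 1/3, f(3) = 1/6
```

The enumeration reproduces the frequencies quoted in the text: single occupancy is three times more frequent than triple occupancy.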
phi(n) can be interpreted as proportional to the probability of a box to have n particles. The normalized phi(n), f(n), is the relative frequency of the boxes that scored n particles. From dF/dn = 0 one obtains

beta phi(n) = ln(1 + 1/n).    (6)

Eq. (6) is the analogue of the Planck equation [11,12,13], namely,

n(phi) = 1/(e^{beta phi} - 1).    (7)

Hereafter, we examine three cases. In the first case we assume that n >> 1. Here one can expect to find a large number of particles (limited by P) in any of the boxes. For example, if we conduct a popularity poll among N words with P authors, and there are many more authors than words, then the maximum entropy distribution of the votes among the words is shown to be the Zipf law. In the second case we consider the intermediate zone, where n is in the range of the number of the boxes. This case fits well the distribution of ranks, namely, Pareto's rule and Benford's law. In the third case we consider n << 1, where the number of particles is negligible as compared to the number of boxes. This case fits well the probability of guessing correctly the IQ of a person in a single guess, based only on the knowledge of the average. This case yields the "bell-like" distribution.

IIa Zipf law:
Consider the case P >> N, where n >> 1. In this case beta phi << 1; therefore from Eq. (7) phi(n) can be approximated by

phi(n) ~ 1/(beta n).    (8)

Eq. (8) is the Zipf law. Namely, the ratio between the frequencies for n = 1 (the most frequent word) and n = 2 (the second most frequent word) is 2, which is identical to the ratio between n = 2 and n = 4, etc. This ratio is not a function of beta, as phi(1)/phi(2) = phi(2)/phi(4) = phi(n)/phi(2n) = 2.

IIb Pareto's rule:

To calculate the relative frequency f(n) of Eq. (6), we have to divide phi(n) by the sum over all the M occupied boxes, M <= N, namely,

Sum_{n=1}^{M} phi(n) = beta^{-1}(ln 2/1 + ln 3/2 + ... + ln (M+1)/M) = beta^{-1} ln(M+1).    (9)

Therefore,

f(n) = ln(1 + 1/n) / ln(M+1).    (10)

Like in the Zipf law, for integer n's the relative frequency is not a function of beta. We define a rank r = nN/P, where r = 1, 2, 3, ..., R. By defining the ranks we combine the boxes into clusters, such that a cluster of rank r contains r groups of P/N particles; r = 10 thus means 10 times more particles than r = 1. We can repeat the calculation of the frequency, using r instead of n, and obtain

f(r) = ln(1 + 1/r) / ln(R+1).    (11)

In Fig. 1 the relative frequencies for a set of R = 10^6 clusters, r = 1, 2, 3, ..., R, according to Eq. (11) are plotted. A long tail distribution is demonstrated.
Fig.1:
A million clusters and their probabilities. The probability decreases as the rank increases.
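The rank frequencies of Eq. (11) are easy to evaluate directly; a short sketch (with R = 10^6, as in Fig. 1) also confirms that the frequencies are properly normalized, since the sum of ln((r+1)/r) telescopes to ln(R+1):

```python
import math

def f(r, R):
    # Relative frequency of rank r among R ranks, Eq. (11)
    return math.log(1 + 1 / r) / math.log(R + 1)

R = 10**6
# The sum telescopes: sum of ln((r+1)/r) over r = 1..R equals ln(R+1),
# so the relative frequencies add up to 1.
total = sum(f(r, R) for r in range(1, R + 1))
print(total)          # ~1.0
print(f(1, R), f(2, R))  # the highest-rank frequencies of the long tail
```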
Eq. (11) "behaves" as a power law: a plot of the logarithm of the probability of the cluster r versus the logarithm of r yields a straight line, as demonstrated for a million ranks.

Fig.2:
Log-log plot of frequency versus rank for R = 10^6 is a straight line.

The Pareto 20:80 rule of thumb has proved to be correct not only in wealth distribution but in many other phenomena as well. For example, it is believed that 20 percent of the customers yield 80 percent of the revenue; 20 percent of the drivers cause 80 percent of the accidents; etc. [5]. In order to find the ratio obtained from Eq. (11), we divide the boxes into R = 10 ranks, such that a rank r contains r equal groups of particles. We construct the table below from f(r) = ln(1 + 1/r)/ln 11:
r        1     2     3     4    5    6    7    8    9    10
f(r)%   28.9  16.9  12.0  9.3  7.6  6.4  5.6  4.9  4.4  4.0

Table 1: The relative frequencies of 10 ranks.
The total number of groups is Sum_{r=1}^{10} r = 55. However, the richest five ranks (r = 6, ..., 10) contain Sum_{r=6}^{10} r = 40 groups, while their total frequency is Sum_{r=6}^{10} f(r) ~ 25.3%. This means that about 73% of the packages are in the hands of about 25% of the boxes. This is a typical behavior of the Pareto rule, with a small deviation from the empirical rule of thumb of 20:80, namely, a 25:75 rule.

IIc Benford's law:

Another application of Eq. (11) is Benford's law. Newcomb suggested Benford's law in 1881 from observations of the physical wear and tear of books containing logarithmic tables [6]. Benford further explored the phenomenon in 1938 and empirically checked it for a wide range of numerical data. The main application of Benford's distribution is based on its presence in numerous random numerical files, like financial data, street addresses, etc. Since one intuitively expects to obtain an even distribution of digits, as would be the case in an unbiased lottery, some income tax authorities examine the digit distributions of balance sheets in order to detect fraud. If a balance sheet does not fit Benford's law, a further inspection is done [14].

In the derivation of Benford's law we assume that a digit is a box with n particles. This assumption is logical since a digit, unlike a word, has an absolute meaning relative to the other digits, exactly like the number of particles in a box. There is a constraint though: the number of particles in a digit cannot exceed 9, and the digit zero does not appear in the first-order Benford distribution. In Eq. (11) r may be any number; for digits, per definition, 1 <= r <= 9. Therefore, it is legitimate to calculate the equilibrium distribution of the occupied boxes and to add as many empty boxes as needed without affecting the distribution. In this case R = 9 and Eq. (11) yields the relative frequency

f(r) = ln(1 + 1/r)/ln 10 = log10(1 + 1/r).    (12)

This is Benford's law.

Fig 3. Benford's law predicts a decreasing frequency of first digits, from 1 through 9.
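Both the Benford frequencies of Eq. (12) and the 25:75 split behind Table 1 follow from the same rank formula; a small sketch:

```python
import math

def f(r, R):
    # Relative frequency of rank r among R ranks, Eq. (11);
    # with R = 9 this reduces to Benford's law, Eq. (12).
    return math.log(1 + 1 / r) / math.log(R + 1)

# Benford first-digit frequencies in percent, digits 1..9
benford = [round(100 * f(d, 9), 1) for d in range(1, 10)]
print(benford)  # -> [30.1, 17.6, 12.5, 9.7, 7.9, 6.7, 5.8, 5.1, 4.6]

# Pareto-style split for R = 10 ranks, as in Table 1: ranks 6..10
rich_boxes = sum(f(r, 10) for r in range(6, 11))        # their share of the boxes
rich_particles = sum(range(6, 11)) / sum(range(1, 11))  # their share of the groups: 40/55
print(rich_boxes, rich_particles)  # ~25% of boxes hold ~73% of the particles
```

Note that the Benford ratio between digits 1 and 2 is about 1.7, whereas the Zipf ratio of Eq. (8) is exactly 2.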
III "bell-like" distribution:
Zipf law, Pareto's rule and Benford's law occur where the number of particles is larger than the number of boxes. Hereafter, the case P << N is considered. In this case we neglect the boxes that scored several particles because, practically, there are no such boxes. We want to find the probability distribution of the N boxes to score one particle. In this limit n << 1 and beta phi >> 1, and Eq. (7) can be approximated by

n_i ~ e^{-beta phi_i}.    (13)

Here n_i is the fraction of a particle in a box, and the frequency phi_i = phi(n_i) is the probability to find this fraction. The total number of particles P is given by the same expression that we used in the Lagrange equation (5), namely,

P = N n_i phi_i = N phi_i e^{-beta phi_i}.    (14)

In the limit beta -> 0 one obtains that all the frequencies phi_i of the boxes are equal, namely phi = P/N. This is an even distribution. The even distribution is the intuitive distribution that one expects to find in a distribution of particles in boxes; it is this distribution that causes us to believe that uneven distributions are counterintuitive. In the case where beta is finite,

P(phi_i, beta) = phi_i e^{-beta phi_i} / Sum_{i=1}^{N} phi_i e^{-beta phi_i}.    (15)

P(phi_i, beta) is the relative probability to find a particle in a box. From Eq. (15) it is seen that P(phi_i, beta) has two components: the first is the frequency phi_i of the fraction of a particle, and the second is the fraction of particles itself. As opposed to the case P >> N, the frequency phi(n) itself is not the probability to find n particles but the probability to find a fraction of a particle. To find the probability of a single particle we have to multiply the frequency by the fraction of the particle, namely n_i phi_i. When the frequency increases, the associated fraction of particles decreases exponentially with the frequency. The larger beta is, the steeper the decay. Since P(phi, beta) is a linearly increasing function of phi_i multiplied by an exponentially decaying function of phi_i, the distribution of particles in a box has a definite maximum.

Fig 4: The number of boxes and their probability to contain a single particle, for N = 1000 and a fixed beta.

The maximum probability is obtained from dP/dphi = e^{-beta phi} - beta phi e^{-beta phi} = 0 and is given by phi_max = beta^{-1}. In Fig. 4 we see that the obtained curve is typical of the velocity of molecules, human age at death, etc.

IV Discussion:

The long tail distribution attracts considerable attention because it is so ubiquitous [15]. Sometimes it is called a power law distribution or a scale-free distribution, because a log-log presentation of the distribution yields a straight line, as seen in Fig. 2. When power-law fits are done, different slopes are obtained for different statistics. For example, in the Zipf law the ratio between the frequency of the 1st and the frequency of the 2nd is 2; in Pareto's rule and in Benford's law this ratio is about 1.7. Namely, in different regimes of n, different "slopes" are obtained, as is seen in Fig. 5. Another notable point is that the normalized frequencies f(n) for P >> N are not a function of beta. This is in contradistinction to the case P << N, in which the distribution is a function of beta.
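The location of the maximum of Eq. (15) can be confirmed numerically; a minimal sketch scanning phi * exp(-beta * phi) over a grid of phi values (the value of beta is an arbitrary illustrative choice):

```python
import math

beta = 0.05  # illustrative value of the Lagrange multiplier

# Scan the numerator of Eq. (15) over a grid of phi values.
grid = [k * 0.001 for k in range(1, 200001)]  # phi from 0.001 to 200
phi_max = max(grid, key=lambda p: p * math.exp(-beta * p))
print(phi_max, 1 / beta)  # the maximum sits at phi_max = 1/beta
```

The linear rise of phi competing with the exponential decay e^{-beta phi} is what produces the single hump of the "bell-like" curve.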
Fig.5: A plot of ln phi versus ln n: for high values of n a "power law" decay is obtained; however, for low values of n an exponential decay is obtained.

The Lagrange multiplier beta has a meaning. In thermodynamics the temperature is related to it via T ~ beta^{-1}. We see that in the case of the Zipf law the frequency multiplied by the number of particles is proportional to the temperature. In the case n << 1 the temperature is proportional to the frequency at which the probability to find a particle is the highest. This is the main difference between the long tail distribution and the "bell-like" distribution: in the long tail the temperature means the average wealth of a box, while in the bell curve the temperature means the frequency of maximum probability.

Summary:
The distribution of P non-interacting particles in N boxes is calculated for a fair system. Since there is no preference for any configuration of particles and boxes, the entropy principle can be applied. It is shown that when the number of particles is negligible as compared to the number of boxes, the "bell-like" distribution (which prefers the average) is obtained. However, when the number of particles is higher than the number of boxes, a long tail distribution is obtained. The obtained long tail distribution correctly yields Zipf law, Pareto's rule and Benford's law.

Pareto's rule is usually conceived as an evolutionary law. Namely, the 20% of the drivers that cause 80% of the accidents are the bad drivers, and maybe the personality of these drivers is the reason for their excessive involvement in car accidents. Similarly, there might be good reasons for the fact that few people get rich while the majority remains poor. These kinds of questions cannot be answered by this kind of analysis. However, one should bear in mind that particles without personality, interactions or statistical bias are distributed in the same way.

Acknowledgment:
I acknowledge Alex Ely Kossovsky, Dan Weiss and H. Kafri for most valuable comments.

References:
1. Bak P., How Nature Works: The Science of Self-Organized Criticality, Springer-Verlag, New York, 1996.
2. Zipf G. K., Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949.
3. Miller G. A.; Newman E. B., Tests of a statistical explanation of the rank-frequency relation for words in written English, American Journal of Psychology, 71, 209-218.
4. Pareto V., Cours d'economie politique (Droz, Geneva, Switzerland, 1896; Rouge, Lausanne et Paris, 1897).
5. Juran J. M. (ed.), Quality Control Handbook, McGraw-Hill Book Company.
6. Newcomb S., Note on the frequency of use of the different digits in natural numbers, American Journal of Mathematics, 4, 39-40.
7. Benford F., The law of anomalous numbers, Proceedings of the American Philosophical Society, 78, 551-572.
8. Cohen D.; Talbot K., Prime numbers and the first digit phenomenon, Journal of Number Theory, 18, 261-268.
9. Kossovsky A. E., Towards a better understanding of the leading digits phenomena, arXiv:math/0612627.
10. Planck M., On the law of distribution of energy in the normal spectrum, Annalen der Physik, 4, 553-562.
11. Kafri O., The second law as a cause for the evolution, arXiv:0711.4507.
12. Kafri O., Sociological inequality and the second law, arXiv:0805.3206.
13. Kafri O., Entropy principle in direct derivation of Benford's law, arXiv.
14. Nigrini M., A taxpayer compliance application of Benford's law, Journal of the American Taxation Association, 18, 72-91.
15. Newman M. E. J., Power laws, Pareto distributions and Zipf's law, arXiv:cond-mat/0412004.