Size limiting in Tsallis statistics
arXiv [cond-mat.stat-mech]
Hari M. Gupta, José R. Campanha, Sidney J. Schinaider
IGCE - Departamento de Física, IGCE, UNESP
Caixa Postal 178, CEP 13500-970
Rio Claro - São Paulo - Brazil
November 23, 2018
Abstract
Power law scaling is observed in many physical, biological and socio-economic complex systems and is now considered an important property of these systems. In general, the power law holds in the central part of the distribution, with deviations for very small and very large step sizes. Tsallis, through non-extensive thermodynamics, explained power law distributions in many cases, including the deviations from the power law for both small and very large steps. For very large steps, a heuristic crossover approach was used.

In real systems the size is limited, and thus the size limiting factor is important. In the present work, we present an alternative model in which the entropy factor q decreases with step size due to the softening of long range interactions or memory. This explains the deviation from the power law for very large step sizes. Finally, we apply this model to the distribution of the citation index of scientists and to examination scores, and are able to explain the entire distribution including the deviations from the power law.
I. Introduction
Only recently have physicists started to study natural systems as a whole rather than in parts [1-6], and they are interested in the holistic properties of these systems, normally called “Complex Systems”. These systems are difficult to understand from basic principles. The difficulties arise from the fact that, in most cases, a large number of elementary interactions take place at the same time among a large number of components. Further, these systems are in constant evolution and do not have an equilibrium state [1]. Power law scaling [7,8] is observed in many biological [9-11], physical [2,12-20] and socio-economic complex systems [21-29], and it is now considered an important property of them. As socio-economic systems have almost the same characteristics, physicists are also studying some of these systems.

In 1988, Tsallis [30] presented non-extensive thermodynamics, in which he incorporated long range interactions and long memory effects. He proposed a generalized definition of entropy ($S_q$):

$$S_q = C\,\frac{1 - \sum_{i=1}^{W} p_i^{q}}{q - 1} \qquad \left(\sum_{i=1}^{W} p_i = 1\right) \tag{1}$$

where $C$ is a positive constant and $W$ is the total number of microscopic possibilities of the system. $q$ is the entropic index, which plays a central role and is related to long range interactions and long memory effects in a network. This expression recovers the usual Boltzmann-Gibbs entropy ($-C \sum_{i=1}^{W} p_i \ln p_i$) in the limit $q \to 1$, i.e. for short range interactions [31].

The size frequency distribution function $N(x)$ is given through

$$\frac{dN(x)}{dx} = -\lambda N(x) \tag{2}$$

where $\lambda$ is a positive constant and $N(x)$ is the frequency (probability) of step size $x$. This gives

$$N(x) = N_0 \exp(-\lambda x) \tag{3}$$

where $N_0$ is a normalization constant. Thus we have an exponential decay, which is exactly the case of Boltzmann statistics with short range interactions. However, for $q > 1$ a more general equation [32] holds:

$$\frac{dN(x)}{dx} = -\lambda N^{q}(x) \tag{4}$$

hence

$$N(x) = N_0 \left[1 + (q - 1)\lambda x\right]^{-\frac{1}{q-1}} \tag{5}$$

or, in an alternative form,

$$N(x) = N_0 \left[1 + \beta x\right]^{-\alpha} \tag{6}$$

where $\beta = (q - 1)\lambda$ and $\alpha = 1/(q - 1)$. For relatively large values of $x$, the distribution becomes

$$N(x) \approx \mathrm{const} \cdot x^{-\alpha} \tag{7}$$

i.e. a power law. In this case, a $\log N(x)$ vs. $\log(x)$ plot exhibits a straight line for large values of $x$.

A power law distribution cannot continue forever in real systems. It has to be truncated in some way to avoid infinite variance. Recently, we have shown that by gradually truncating the power law distribution after a certain critical value, we are able to explain the entire distribution, including very large steps, in financial and physical complex systems [33-35]. Although this model explains empirical results for large step sizes, it has an undesirable discontinuity at the critical step size. Also, this model fails for small steps.

In discussing folding-unfolding phenomena that occur in proteins, Tsallis et al. [36] argued that an increase of temperature increases thermal motion, which in turn decreases long memory or long range interactions and finally decreases the entropic index $q$. Thus $q$ approaches 1. The distribution function, which shows a power law at low temperatures, becomes exponential at relatively high temperatures. For a fixed temperature, $q$ is considered to be a constant. In order to describe the departure at long range, they assume a crossover to another type of behavior and modify Equation (4) as

$$\frac{dN(x)}{dx} = -\mu_r N^{r}(x) - (\lambda - \mu_r) N^{q}(x) \tag{8}$$

where $\mu_r$ is very small compared to $\lambda$. This gives a crossover between two different power laws (characterized by $q$ and $r$, respectively), or from a power law to a normal distribution within a nonextensive scenario, which is definitely the case for many complex systems and gives multifractality. The cut-off is sharpest when $r = 1$.
In this case

$$N(x) = N_0 \left[1 - \frac{\lambda}{\mu_1} + \frac{\lambda}{\mu_1}\, e^{(q-1)\mu_1 x}\right]^{-\frac{1}{q-1}} \tag{9}$$

where $N_0$ is the normalization constant and $\mu_1$ denotes $\mu_r$ for $r = 1$.

Although the cross-over behavior suggested by Tsallis can avoid an infinite variance, in the present work we look for another possibility, i.e. truncation of the power law due to the finite size of real systems, which in fact is not a cross-over behavior. We therefore suggest an alternative approach to address the long time ($t$) or long distance ($x$) departures. We consider that the entropy factor $q$ decreases with step size ($x$) due to the softening of long range interactions or memory effects caused by the finite size of real systems, which arises from physical limitations of the components or of the system itself. Thus $q$ depends on the step size. This is similar to the way anharmonic terms become important in calculating the potential energy in lattice vibrations.

Finally, we apply this model to the distribution of the citation index of scientists and to the scores of an entrance examination, and compare it with the Tsallis approach [36].

II. The model
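As a point of reference for the model, the two distributions of Section I, Equations (5) and (9), can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function names and the parameter values `N0`, `LAM`, `Q` and `MU` are illustrative assumptions and are not the fitted values used in the figures.

```python
import math


def q_exponential(x, n0, lam, q):
    """Equation (5): n0 * [1 + (q - 1) * lam * x] ** (-1 / (q - 1)) for q > 1."""
    if abs(q - 1.0) < 1e-12:
        # q -> 1 limit, Equation (3): plain exponential decay.
        return n0 * math.exp(-lam * x)
    return n0 * (1.0 + (q - 1.0) * lam * x) ** (-1.0 / (q - 1.0))


def crossover(x, n0, lam, mu, q):
    """Equation (9): r = 1 crossover from a power law to an exponential cut-off."""
    ratio = lam / mu
    base = 1.0 - ratio + ratio * math.exp((q - 1.0) * mu * x)
    return n0 * base ** (-1.0 / (q - 1.0))


def loglog_slope(f, x_lo, x_hi):
    """Average slope of log f versus log x between two points."""
    return (math.log(f(x_hi)) - math.log(f(x_lo))) / (math.log(x_hi) - math.log(x_lo))


# Illustrative values only (assumptions, not the fitted parameters of the paper).
N0, LAM, Q, MU = 1.0, 0.1, 1.5, 1e-4

if __name__ == "__main__":
    for x in (1.0, 10.0, 1e2, 1e3, 1e4, 1e5):
        print(x, q_exponential(x, N0, LAM, Q), crossover(x, N0, LAM, MU, Q))
    tail = loglog_slope(lambda x: q_exponential(x, N0, LAM, Q), 1e4, 1e5)
    print("tail slope:", tail, "compared with -alpha =", -1.0 / (Q - 1.0))
```

With these assumed values the two functions nearly coincide while $\mu_1 x$ is small, and Equation (9) falls below Equation (5) once $(q-1)\mu_1 x$ is of order one, which is the crossover described above.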
The physical limiting factor is of very small importance for small steps, while it is necessary for larger steps. The entropy index $q$ is equal to 1 in the absence of long memory or long range interactions. Thus the information about these interactions is given through $(q - 1)$, which decreases with step size $x$ due to finite size in real systems. In general, for this, we propose

$$q(x) - 1 = (q_0 - 1) \sum_j \theta_j x^{j} \tag{10}$$

where $q_0$ and $q(x)$ are the values of the entropy index $q$ for step size zero and step size $x$, respectively. $\theta_j$ and $j$ are adjustable parameters, depending on the size limiting.

To simplify, we propose an exponential decay, i.e.

$$q(x) - 1 = (q_0 - 1) \exp\left(-(\theta x)^{i}\right) \tag{11}$$

where $\theta$ and $i$ give the rate at which the importance of these interactions decreases with increasing step size $x$. A higher value of $i$ indicates a sharper cut-off. For very large values of $x$, $q(x)$ approaches 1 and thus gives a normal distribution, as required by the central limit theorem. In the present model the distribution function is given by

$$N(x) = N_0 \left[1 + (q_0 - 1)\lambda x \exp\left(-(\theta x)^{i}\right)\right]^{-\exp\left((\theta x)^{i}\right)/(q_0 - 1)} \tag{12}$$

In Figure 1 we compare $N(x)$ vs. $x$ in the Tsallis approach, Equation (9), and in the present approach, Equation (12), for very large steps. Under the present model, the gradual truncation of the power law can be adjusted from very sharp to very slow through the value of $i$ without interfering with the power law behavior in the central part of the distribution. This is not possible in the Tsallis approach. For larger values of $\mu_r$ (line II of Figure 1) we can get a sharp cut-off, but then the distribution deviates significantly from the power law in the central part.

Figure 1 – Theoretical distribution of the Tsallis and present models in log-log scale. We consider: $N_0$ = 1.[…]; λ = 0.[…]; q = 1.5. Curves I and II are from the Tsallis model with r = 1 and µ_r = 1.[…] × 10^−[…] and 1.[…] × 10^−[…], respectively. Curves A, B, C, D, E, F, and G are from the present model with i = 1/[…] and θ = 3.[…] × 10^−[…] (Curve A), i = 1/[…] and θ = 3.[…] × 10^−[…] (Curve B), i = 1 and θ = 3.[…] × 10^−[…] (Curve C), i = 2 and θ = 1.[…] × 10^−[…] (Curve D), i = 3 and θ = 1.[…] × 10^−[…] (Curve E), i = 4 and θ = 1.[…] × 10^−[…] (Curve F), and i = 5 and θ = 2.[…] × 10^−[…] (Curve G).

III. Distribution of citation Index
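Before turning to the data, the model of Section II, Equations (11) and (12), can be sketched numerically. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the cut-off threshold of 50 on $(\theta x)^i$, and all parameter values in the demo are assumptions chosen only to show the qualitative behavior.

```python
import math


def q_of_x(x, q0, theta, i):
    """Equation (11): q(x) = 1 + (q0 - 1) * exp(-(theta * x) ** i)."""
    return 1.0 + (q0 - 1.0) * math.exp(-((theta * x) ** i))


def present_model(x, n0, lam, q0, theta, i):
    """Equation (12), evaluated through its logarithm for numerical stability."""
    s = (theta * x) ** i
    if s > 50.0:
        # q(x) - 1 is negligible here and Equation (12) reduces to
        # n0 * exp(-lam * x), the form of Equation (3). The threshold 50
        # is an implementation assumption that avoids overflow in exp(s).
        return n0 * math.exp(-lam * x)
    small = (q0 - 1.0) * lam * x * math.exp(-s)
    log_n = math.log(n0) - math.exp(s) / (q0 - 1.0) * math.log1p(small)
    return math.exp(log_n)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not the values of Figure 1).
    n0, lam, q0, theta = 1.0, 0.1, 1.5, 1e-3
    for i in (0.5, 1, 2, 5):
        row = [present_model(x, n0, lam, q0, theta, i) for x in (10.0, 100.0, 1e3, 3e3)]
        print("i =", i, row)
```

Setting `theta = 0` gives $q(x) = q_0$ for all $x$, so `present_model` then returns the untruncated form of Equation (5). Larger values of `i` leave the small-`x` values closer to that form and concentrate the cut-off near $x \approx 1/\theta$.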
Now we apply this model to describe the distribution of the citation index of scientists. The citation index of a scientist is the total number of times that his or her articles are cited in other articles. The citation patterns of scientific publications form a rather complex network [37]. Here the nodes are published papers. The citation of an article is an interaction of one scientific work with other scientific works. Most articles are cited only within their own group. However, many articles go beyond it and are of interest to others. Some pioneering articles are cited for many decades. Thus, the citation index arises from both short and long range interactions and can be treated through a statistical distribution based on the Tsallis entropy concept.

The fact that a scientist is cited more often helps him or her to obtain more financial support for research projects and better students. Some other small groups also come under his or her influence. This, in turn, contributes to forming a better and larger group. In physical terms these effects produce long range interactions. A pioneering work is also cited just to complete the introduction of a problem, and thus is cited for a longer time [42], although the problem is not directly connected to it. This gives a long range memory.

In the case of the scientist's citation index, unfortunately, reliable information is available only for some of the most cited physicists or chemists. There are many scientists with the same name, and there is no rigid control to separate them. This can cause a significant error for scientists with few citations. Thus, it is not possible to carry out a complete statistical analysis as we did in the case of scientific publications [38], where we found that the non-extensive thermodynamical distribution (Tsallis statistics) is valid over eight orders of magnitude (10^−[…] to 10^[…]).
It is therefore interesting to construct a Zipf plot [39] for the citation index of scientists, in which the number of citations of the $n$-th most cited scientist out of an ensemble of $M$ scientists is plotted versus rank $n$. By its very definition, the Zipf plot is closely related to the cumulative large-$x$ tail of the citation distribution. This plot is therefore well suited for determining the large-$x$ tail of the citation distribution. It also smooths out the fluctuations in the high-citation tail and thus facilitates quantitative analysis.

Given an ensemble of $M$ scientists and the corresponding number of citations for each of these scientists in rank order $Y_1 > Y_2 > Y_3 > \dots > Y_n > \dots > Y_M$, the number of citations of the $n$-th most-cited scientist, $Y_n$, may be estimated by the criterion [39]:

$$\int_{Y_n}^{\infty} N(x)\,dx = n \tag{13}$$

This specifies that there are $n$ scientists out of the ensemble of $M$ who are cited at least $Y_n$ times. From the dependence of $Y_n$ on $n$ in a Zipf plot, one can test whether it agrees with a hypothesized form for $N(x)$.

We analyze the citation index of (a) the most-cited Brazilian physicists and chemists and (b) the internationally most cited physicists and chemists. By Brazilian scientists we mean all scientists who are working in Brazil or have a permanent working address in Brazil. All physicists (chemists), including Brazilian ones, publish their work in the same journals and work on almost the same problems. Physics, like any other basic science, is the same all over the world. In the case of the internationally most cited physicists, we have the distribution function only for a few physicists (≈ […]).

In Figure 2, we plot citation number ($Y_n$) versus rank ($n$) for the first 205 Brazilian physicists in 1999 [40] and compare it with the Tsallis and present models. For the present model we use the following parameters: normalization constant $N_0$ = 1.[…], q = 1.[…], λ = 0.[…], θ = 2.[…] × 10^−[…] and i = 1. For Tsallis statistics (Equation 9), the parameters used are $N_0$ = 1.[…], q = 1.[…], λ = 0.[…], µ_r = 3.[…] × 10^−[…] and r = 1.
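The criterion of Equation (13) can be turned into a small numerical recipe for a theoretical Zipf curve: integrate the tail of a hypothesized $N(x)$ and solve for $Y_n$. The sketch below is one possible way to do this and is not taken from the paper. The trapezoidal rule, the bisection, the finite upper limit `x_max` that stands in for infinity, and the parameters of the example density are all assumptions.

```python
import math


def tail_count(density, y, x_max, steps=2000):
    """Trapezoidal estimate of the integral of density(x) from y to x_max.

    The substitution x = y + expm1(t) spreads the nodes logarithmically,
    which suits slowly decaying power-law tails.
    """
    if y >= x_max:
        return 0.0
    t_max = math.log1p(x_max - y)
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * density(y + math.expm1(t)) * math.exp(t)
    return total * h


def zipf_value(density, n, x_max, iterations=50):
    """Bisection for Y_n such that tail_count(density, Y_n, x_max) = n."""
    lo, hi = 0.0, x_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if tail_count(density, mid, x_max) > n:
            lo = mid  # more than n entries lie above mid, so Y_n is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


def q_exponential(x, n0=1000.0, lam=0.1, q=1.5):
    """Equation (5) with illustrative (not fitted) parameters."""
    return n0 * (1.0 + (q - 1.0) * lam * x) ** (-1.0 / (q - 1.0))


if __name__ == "__main__":
    for rank in (1, 10, 100, 1000):
        print(rank, zipf_value(q_exponential, rank, x_max=1e7))
```

For this example density the untruncated integral has the closed form $N_0 / \left(\beta (1 + \beta Y)\right)$ with $\beta = 0.05$, which gives $Y_{100} = 3980$ and provides a check on the numerical result. Any other density, such as Equation (9) or Equation (12), can be passed in place of `q_exponential`.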
Figure 2 – Zipf plot of the number of citations of the $n$-th ranked Brazilian physicist, $Y_n$, versus rank $n$ on a double logarithmic scale.

Figure 3 – Zipf plot of the number of citations of the $n$-th ranked international physicist, $Y_n$, versus rank $n$ on a double logarithmic scale.

In Figure 3, we plot citation number ($Y_n$) versus rank ($n$) for the 1120 most cited physicists over the period 1981-June 1997 [41] and compare it with the theoretical curve using the same values of q and λ as in Figure 2. We changed the value of the normalization constant $N_0$ from 1.[…] […]5% of the total citations, which is reasonable. The value of θ changes from 2.[…] × 10^−[…] to 3.[…] × 10^−[…]. This shows that size limiting factors are much more important for Brazilian physicists than for the internationally most cited physicists, who are mostly from the U.S.A. This is perhaps due to the absence of basic infrastructure and large research laboratories in Brazil to work on important problems, particularly in experimental physics. It is interesting to note that 8 out of the 10 most cited Brazilian physicists work in theoretical physics. In the case of Tsallis statistics we use the same basic parameters (q and λ) as in Figure 2. We change $N_0$ from 1.65 to 135 and µ_r from 3.[…] × 10^−[…] to 3.[…] × 10^−[…]. We again observe good agreement for both our model and the Tsallis model. Note that we are able to explain both distributions with the same values of the basic parameters.

Figure 4 – Zipf plot of the number of citations of the $n$-th ranked Brazilian chemist, $Y_n$, versus rank $n$ on a double logarithmic scale.

Figure 5 – Zipf plot of the number of citations of the $n$-th ranked international chemist, $Y_n$, versus rank $n$ on a double logarithmic scale.

In Figure 4, we plot citation number ($Y_n$) versus rank ($n$) for the first 119 Brazilian chemists in 1999 [40]. We compare it with the present model and the Tsallis model. We consider the following parameters for the present model: $N_0$ = 0.[…], q = 1.[…], λ = 0.006, θ = 2.[…] × 10^−[…] and i = 1. For Tsallis statistics we use $N_0$ = 0.[…], q = 1.[…], λ = 0.006, µ_r = 1.[…] × 10^−[…] and r = 1.

In Figure 5, we plot the citation number versus rank for the first 10838 chemists [39], and compare this plot with the present model with $N_0$ = 180, θ = 2.[…] × 10^−[…] and i = 1. In the case of Tsallis statistics we use $N_0$ = 190, µ_r = 3.[…] × 10^−[…] and r = 1. The other parameters are the same as in Figure 4. The values of $N_0$ show that the citations of Brazilian chemists are roughly 0.[…] […] θ changes from 2.[…] × 10^−[…] to 2.[…] × 10^−[…]. The agreement is good for both our model and the Tsallis model.

We found that our approach, which considers the softening of long range interactions, and the Tsallis approach, which considers cross-over behavior (Equation 9), give almost the same results and can explain the entire empirical curve, including deviations for small and very large steps. Thus, the model presented in this work is an alternative approach. The present approach is interesting because the parameter θ is related to the size limitations of the system and thus can be more informative.

IV. Distribution of Entrance Examination Scores
Recently we studied the statistical distribution of students' performance, measured through their marks, in the university entrance examination (Vestibular) of UNESP (Universidade Estadual Paulista) in Brazil for the years 1998, 1999 and 2000. To our surprise, we observed long, ubiquitous power law tails in place of the normal distribution in physical and biological sciences [29, 35]. In humanities we have an almost normal distribution. These power law tails in physical and biological sciences exist independently of economic, teaching, and study conditions [29]. This shows that the power law tails are due to the nature of the subject itself. These observations are interesting as they treat education as a complex system and bring out the relative importance of the different factors in science and mathematics education at high school level, which is today an issue of great concern in our society.

In our earlier works [29], we took the marks of the students in a block of subjects, i.e. physics, chemistry and mathematics together for physical sciences, and physics, chemistry and biology together for biological sciences. Thus it was not possible to make a detailed quantitative analysis. Further, these are optional subjects for the examination. Thus the student's interest in a particular area is also an important factor.

To confirm our observations, in the present work we analyze the statistical distribution of the marks obtained by students in individual subjects, i.e. in Physics, Mathematics, and Portuguese as native language, in the Air Force Academy entrance examination in Brazil. These students do not have any special interest in any of these subjects, as they wish to pursue a military career in the Air Force. Thus the statistical distribution can give better information about the peculiar nature of each subject at high school level.

Physics and Mathematics are areas of systematic study and depend much on regular study.
To understand a chapter, the students need to know the material given in the earlier chapters. A student who understands the first chapter well is better placed to understand the second and subsequent chapters. One who did not understand the first chapter will find many difficulties in understanding the subsequent chapters. This gives a kind of positive feedback or long term memory effect, the reason behind the power law [8, 31]. In the case of Portuguese, each chapter is more or less independent. Further, as it is their native language, the students learn Portuguese in a natural way. Although in Portuguese there is also some dependence on understanding earlier material, it is not as strong as in Physics and Mathematics.

Figure 6 – Distribution of marks obtained by students in Physics, in log-log scale, for the years 2003 to 2006 taken together.

We compare the marks obtained by the students in Physics, Mathematics, and Portuguese as native language. To show clearly the validity of the power law we plotted log (frequency) vs. log (marks). In Figure 6, we compare the distribution for Physics with the present and Tsallis approaches for all the years 2003 to 2006 together, to have a better idea of the distribution. The distribution for an individual year, i.e. 2003, 2004, 2005 or 2006, is almost the same as for all the years together. The parameters of the distribution are: normalization constant $N_0$ = 4350, λ = 0.09 and q = 1.[…]; θ = 1.[…] × 10^−[…] and i = 12 for our model, and µ_r = 0.[…] and r = 1 for the Tsallis model. In Figure 7, we did the same for Mathematics. The parameters of the distribution are: $N_0$ = 5250, λ = 0.[…], q = 1.[…]; θ = 1.[…] × 10^−[…] and i = 9 for our model, and µ_r = 0.[…] and r = 1 for the Tsallis model. We shifted the origin to $x_m$, i.e. we use $(x - x_m)$ in place of $x$ in both Equations (9) and (12), where $x_m$ is the mark of maximum frequency. We took $x_m$ = 38 for Physics and 39 for Mathematics. We observe that the distribution in the cut-off region is better described by the present approach. In Figure 8, we compare the distribution of marks obtained in Portuguese with the normal distribution and find that the normal distribution describes it satisfactorily. The parameters for the normal distribution are: N = 148700, µ = 52.64 and σ = 26.[…]

V. Discussion
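The origin shift to $x_m$ and the effect of a large exponent $i$ in the examination fits can be illustrated by computing the local log-log slope of Equation (12). This is a hedged sketch, not the fitting procedure of the paper: `n0`, `lam`, `i = 12` and `X_M = 38` are the values quoted for Physics, while `q0 = 1.5` and `theta = 0.02` are placeholder assumptions, because the fitted values are not legible in the source.

```python
import math


def log_present_model(x, n0, lam, q0, theta, i):
    """Natural logarithm of Equation (12) for x >= 0."""
    s = (theta * x) ** i
    if s > 50.0:
        # q(x) - 1 is negligible: exponential form (assumed threshold).
        return math.log(n0) - lam * x
    small = (q0 - 1.0) * lam * x * math.exp(-s)
    return math.log(n0) - math.exp(s) / (q0 - 1.0) * math.log1p(small)


def local_slope(mark, x_m, params, rel_step=1e-4):
    """Numerical d ln N / d ln(mark - x_m) after shifting the origin to x_m."""
    x = mark - x_m
    up = log_present_model(x * (1.0 + rel_step), **params)
    down = log_present_model(x * (1.0 - rel_step), **params)
    return (up - down) / (math.log1p(rel_step) - math.log1p(-rel_step))


X_M = 38.0  # mark of maximum frequency quoted for Physics
# q0 and theta are placeholders (assumptions); the other entries are quoted values.
SHARP = dict(n0=4350.0, lam=0.09, q0=1.5, theta=0.02, i=12)
SOFT = dict(SHARP, i=1)

if __name__ == "__main__":
    for mark in (48.0, 68.0, 88.0, 98.0):
        print(mark, local_slope(mark, X_M, SHARP), local_slope(mark, X_M, SOFT))
```

Under these assumed values, the `i = 12` curve keeps the slope of the untruncated Equation (5) at marks close to $x_m$ and then steepens abruptly near $x - x_m \approx 1/\theta$, whereas the `i = 1` curve bends gradually over the whole range.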
In the case of the scientist's citation index, there is no visible limit, and therefore the cut-off is slower and can be explained through either the Tsallis or the present approach. In the case of examination scores there is a visible limit: no one can get more than the maximum marks. This makes size limiting very strong and gives a sharp cut-off. As size limiting exists in all real systems, the present approach is appropriate.

In conclusion, in the present paper, we presented a statistical distribution considering that the entropic index (q − […]

References

[1] P. Bak, How Nature Works, Oxford University Press, Oxford (1997)
[2] B. B. Mandelbrot, The Fractal Geometry of Nature (Freeman, New York 1982)
[3] B. B. Mandelbrot, Science 156, 637 (1967)
[4] D. L. Ruderman, Network: Computation in Neural Systems 5, 517 (1994)
[5] L. Poon, C. Grebogi, Phys. Rev. Lett. 75, 4023 (1995)
[6] R. Badii, A. Politi, Complexity – Hierarchical Structures and Scaling in Physics, Cambridge, U. K. (1997)
[7] P. Lévy, Théorie de l'Addition des Variables Aléatoires (Gauthier-Villars, Paris 1937)
[8] V. Pareto, Cours d'Économie Politique. Reprinted as a volume of Oeuvres Complètes (Droz, Geneva 1896-1965)
[9] C. K. Peng et al., Phys. Rev. Lett. 70, 1343 (1993)
[10] G. F. Zebende, P. M. C. de Oliveira and T. J. Penna, Phys. Rev. E 57, 3311 (1998)
[11] J. B. Bassingthwaighte, L. S. Liebovitch and B. J. West, Fractal Physiology (Oxford Univ. Press, New York 1994)
[12] U. Frisch, M. F. Shlesinger and G. Zaslavsky, Lévy Flights and Related Phenomena in Physics (Springer-Verlag, Berlin 1994)
[13] T. H. Solomon, E. R. Weeks and H. L. Swinney, Phys. Rev. Lett. 71, 3975 (1993)
[14] M. Nelkin, Adv. Phys. 43, 143 (1994)
[15] A. Ott, J. P. Bouchaud, D. Langevin and W. Urbach, Phys. Rev. Lett. 65, 2201 (1990)
[16] Z. Olami, H. J. S. Feder and K. Christensen, Phys. Rev. Lett. 68, 1244 (1992)
[17] B. Chabaud et al., Phys. Rev. Lett. 73, 3227 (1994)
[18] E. Weeks, J. Urbach and H. L. Swinney, Physica D 97, 291 (1996)
[19] T. H. Solomon, E. R. Weeks and H. L. Swinney, Physica D 76, 70 (1994)
[20] H. E. Hurst, Trans. Am. Soc. Civil Eng. 116, 770 (1951)
[21] Proceedings of First International Conference on High Frequency Data in Finance (Olsen & Associates, Zurich 1995)
[22] R. N. Mantegna and H. E. Stanley, Nature 376, 46 (1995)
[23] G. Ghashghaie et al., Nature 381, 767 (1996); A. Arneodo et al., preprint cond-mat/9607120
[24] E. F. Fama, Management Sci. 11, 404 (1965)
[25] A. L. Tucker, J. of Business & Economic Statistics 10, 73 (1992)
[26] J. P. Bouchaud, M. Potters, Théorie des risques financiers, Aléa Saclay (1997)
[27] A. Arneodo, J. F. Muzy and D. Sornette, Eur. Phys. J. B 2, 227 (1998)
[28] T. Lux and M. Marchesi, Nature 397, 498 (1999)
[29] H. M. Gupta, J. R. Campanha and F. R. Chavarette, Int. J. Modern Phys. C 14, 449 (2003)
[30] C. Tsallis, J. Stat. Phys. 52, 479 (1988)
[31] C. Tsallis, Brazilian Journal of Physics 29, 1 (1999)
[32] C. Tsallis, M. P. Albuquerque, Eur. Phys. J. B […], 777 (2000)
[33] H. M. Gupta, J. R. Campanha, Physica A […], 531 (2000)
[34] H. M. Gupta, J. R. Campanha, Physica A […], 231 (1999)
[35] H. M. Gupta, J. R. Campanha and F. D. Prado, Int. Journal of Modern Physics C […], 1273 (2000)
[36] C. Tsallis, G. Bemski and R. S. Mendes, Physics Letters A […], 93 (1999)
[37] S. Redner, Phys. Today […], 49 (2005)
[38] H. M. Gupta, J. R. Campanha and R. A. G. Pesce, Brazilian J. Physics […], 981 (2005)
[39] J. Galambos, The Asymptotic Theory of Extreme Order Statistics (John Wiley & Sons, New York, 1978)
[40] Folha de São Paulo, S. P. - Brazil. Seção Mais - 12th Sept. 1999
[41] Data from Web site http://physics.bu.edu/~redner/
[42] M. V. Simkin and V. P. Roychowdhury, Complex Systems […], 269 (2003); Scientometrics […], 367 (2005)
[43] K. B. Hajra and P. Sen, Physica A […], 44 (2005) and Physica A […], 575 (2006)
[Figure plots not reproduced. Surviving axis labels: Citations versus Rank (legend: Tsallis, Present model); Frequency versus Marks (legend: Tsallis, Present model, Empirical points).]