Zipf’s law for Atlas models
Ricardo T. Fernholz Robert Fernholz May 8, 2019
Abstract
A set of data with positive values follows a Pareto distribution if the log-log plot of value versus rank is approximately a straight line. A Pareto distribution satisfies Zipf’s law if the log-log plot has a slope of −1. Since many types of ranked data follow Zipf’s law, it is considered a form of universality. We propose a mathematical explanation for this phenomenon based on Atlas models and first-order models, systems of positive continuous semimartingales with parameters that depend only on rank. We show that the stable distribution of an Atlas model will follow Zipf’s law if and only if two natural conditions, conservation and completeness, are satisfied. Since Atlas models and first-order models can be constructed to approximate systems of time-dependent rank-based data, our results can explain the universality of Zipf’s law for such systems. However, ranked data generated by other means may follow non-Zipfian Pareto distributions. Hence, our results explain why Zipf’s law holds for word frequency, firm size, household wealth, and city size, while it does not hold for earthquake magnitude, cumulative book sales, the intensity of solar flares, and the intensity of wars, all of which follow non-Zipfian Pareto distributions.
MSC2010 subject classifications:
The authors thank Xavier Gabaix, Ioannis Karatzas, members of the Intech SPT seminar, and participants of the 2017 Thera Stochastics Conference for their invaluable comments and suggestions regarding this research.
Claremont McKenna College, 500 E. Ninth St., Claremont, CA 91711, [email protected].
Intech Investments, One Palmer Square, Princeton, NJ 08542, [email protected].

Introduction
A set of empirical data with positive values follows a Pareto distribution if the log-log plot of the values versus rank is approximately a straight line. Pareto distributions are ubiquitous in the social and natural sciences, appearing in a wide range of fields from geology to economics (Simon, 1955; Bak, 1996; Newman, 2005). A Pareto distribution satisfies Zipf’s law if the log-log plot has a slope of −1, following Zipf (1935), who noticed that the frequency of written words in English follows such a distribution. We shall refer to these distributions as Zipfian. Zipf’s law is considered a form of universality, since Zipfian distributions occur almost as frequently as Pareto distributions. Nevertheless, according to Tao (2012), “mathematicians do not have a fully satisfactory and convincing explanation for how the law comes about and why it is universal.”

We propose a mathematical explanation of Zipf’s law based on Atlas models and first-order models, systems of continuous semimartingales with parameters that depend only on rank. Atlas and first-order models can be constructed to approximate empirical systems of time-dependent rank-based data that exhibit some form of stability (Fernholz, 2002; Banner et al., 2005). Atlas models have stable distributions that are Pareto, while first-order models are more general than Atlas models and can be constructed to have any stable distribution. We show that under two natural conditions, conservation and completeness, the stable distribution of an Atlas model will satisfy Zipf’s law. However, many empirical systems of time-dependent rank-based data generate distributions with log-log plots that are not actually straight lines but rather are concave curves with a tangent of slope −1 at some point. We shall refer to such distributions as quasi-Zipfian, and we shall use first-order models to approximate the systems that generate them.

The dichotomy between Zipfian and non-Zipfian Pareto distributions is of interest to us here. We find that Zipfian and quasi-Zipfian distributions are usually generated by systems of time-dependent rank-based data, and it is this class of systems that we can approximate by Atlas models or first-order models. In contrast, data that follow non-Zipfian Pareto distributions are usually generated by other means, often of a cumulative nature. Examples of time-dependent rank-based systems that generate Zipfian or quasi-Zipfian distributions include the market capitalization of companies (Simon and Bonini, 1958; Fernholz, 2002), the population of cities (Gabaix, 1999), the employees of firms (Axtell, 2001), the income and wealth of households (Atkinson et al., 2011; Blanchet et al., 2017), and the assets of banks (Fernholz and Koch, 2017).
From the comprehensive survey of Newman (2005) we find an assortment of non-Zipfian Pareto distributions: the magnitude of earthquakes, citations of scientific papers, copies of books sold, the diameter of moon craters, the intensity of solar flares, and the intensity of wars, all of which are cumulative systems. Consider, for example, the magnitude of earthquakes: each new earthquake adds a new observation to the data, but once recorded, these observations do not change over time. Such cumulative systems may generate Pareto distributions, but we have no reason to believe that these distributions will be Zipfian.

In the next sections we first review the properties of Atlas models and first-order models, and then characterize Zipfian and quasi-Zipfian systems using these models. We apply our results to the capitalization of U.S. companies, with an analysis of the corresponding quasi-Zipfian distribution curve. Finally, we consider a number of examples of other time-dependent systems as well as other approaches that have been used to characterize these systems. Proofs of all propositions are in the appendix, along with an example.

We shall use systems of positive continuous semimartingales $\{X_1, \dots, X_n\}$, with $n > 1$, to approximate systems of time-dependent data. For such a system we define the rank function to be the random permutation $r_t \in \Sigma_n$ such that $r_t(i) < r_t(j)$ if $X_i(t) > X_j(t)$ or if $X_i(t) = X_j(t)$ and $i < j$. Here $\Sigma_n$ is the symmetric group on $n$ elements. The rank processes $X_{(1)} \ge \cdots \ge X_{(n)}$ are defined by $X_{(r_t(i))}(t) = X_i(t)$.

We have assumed that $X_i(t) > 0$ for $t \in [0, \infty)$ and $i = 1, \dots, n$, a.s., so we can consider the logarithms of these processes. The processes $(\log X_{(k)} - \log X_{(k+1)})$, for $k = 1, \dots, n-1$, are called gap processes, and we define $\Lambda^X_{k,k+1}$ to be the local time at the origin for $(\log X_{(k)} - \log X_{(k+1)})$, with $\Lambda^X_{0,1} = \Lambda^X_{n,n+1} \equiv 0$. If the $\log X_i$ spend no local time at triple points, then the rank processes satisfy
\[ d\log X_{(k)}(t) = \sum_{i=1}^n \mathbf{1}_{\{r_t(i)=k\}}\, d\log X_i(t) + \frac{1}{2}\, d\Lambda^X_{k,k+1}(t) - \frac{1}{2}\, d\Lambda^X_{k-1,k}(t), \quad \text{a.s.}, \tag{2.1} \]
for $k = 1, \dots, n$ (Fernholz, 2002; Banner and Ghomrasni, 2008).

We are interested in systems that show some kind of stability by rank, at least asymptotically. Since we must apply our definition of stability to systems of empirical data as well as to continuous semimartingales, we use asymptotic time averages rather than expectations for our definitions. For the systems of continuous semimartingales we consider, the law of large numbers implies that the asymptotic time averages are equal to the expectations (Banner et al., 2005; Ichiba et al., 2011).

Definition 2.1. (Fernholz, 2002) A system of positive continuous semimartingales $\{X_1, \dots, X_n\}$ is asymptotically stable if
1. $\lim_{t\to\infty} t^{-1}\big(\log X_{(1)}(t) - \log X_{(n)}(t)\big) = 0$, a.s. (coherence);
2. $\lim_{t\to\infty} t^{-1} \Lambda^X_{k,k+1}(t) = \lambda_{k,k+1} > 0$, a.s.;
3. $\lim_{t\to\infty} t^{-1} \big\langle \log X_{(k)} - \log X_{(k+1)} \big\rangle_t = \sigma^2_{k,k+1} > 0$, a.s.;
for $k = 1, \dots, n-1$.

For $k = 1, \dots, n$, let us define the processes
\[ X_{[k]} \triangleq X_{(1)} + \cdots + X_{(k)}, \tag{2.2} \]
in which case we can express $X_{[k]}$ in terms of the $X_i$ and $\Lambda^X_{k,k+1}$.

Lemma 2.2. Let $X_1, \dots, X_n$ be positive continuous semimartingales that satisfy (2.1). Then
\[ dX_{[k]}(t) = \sum_{i=1}^n \mathbf{1}_{\{r_t(i)\le k\}}\, dX_i(t) + \frac{1}{2}\, X_{(k)}(t)\, d\Lambda^X_{k,k+1}(t), \quad \text{a.s.}, \tag{2.3} \]
for $k = 1, \dots, n$.

Lemma 2.2 describes the dynamic relationship between the combined value $X_{[k]}$ of the $k$ top ranks and the local time process $\Lambda^X_{k,k+1}$. This local time process compensates for turnover into and out of the top $k$ ranks. Over time, some of the higher-ranked processes will decrease and exit from the top ranks, while some of the lower-ranked processes will increase and enter those top ranks. The process of entry and exit into and out of the top $k$ ranks is quantified by the last term in (2.3), which measures the replacement of the top ranks of the system by lower ranks.

Lemma 2.2 allows us to express the local time $\Lambda^X_{k,k+1}$ in terms of $X_i$, $X_{(k)}$, and $X_{[k]}$, all of which are observable. Solving (2.3) for $d\Lambda^X_{k,k+1}$, the parameters $\lambda_{k,k+1}$ can be expressed as
\[ \lambda_{k,k+1} = \lim_{T\to\infty} \frac{2}{T} \int_0^T \bigg( \frac{dX_{[k]}(t)}{X_{(k)}(t)} - \sum_{i=1}^n \mathbf{1}_{\{r_t(i)\le k\}} \frac{dX_i(t)}{X_{(k)}(t)} \bigg), \quad \text{a.s.}, \tag{2.4} \]
for $k = 1, \dots, n-1$. In a similar fashion we can write
\[ \sigma^2_{k,k+1} = \lim_{T\to\infty} \frac{1}{T} \int_0^T d\big\langle \log X_{(k)} - \log X_{(k+1)} \big\rangle_t, \quad \text{a.s.}, \tag{2.5} \]
for $k = 1, \dots, n-1$. Equations (2.4) and (2.5) will allow us to define parameters equivalent to $\lambda_{k,k+1}$ and $\sigma^2_{k,k+1}$ for time-dependent systems of empirical data.

Atlas models and first-order models
The simplest system we shall consider is an Atlas model (Fernholz, 2002), a system of positive continuous semimartingales $\{X_1, \dots, X_n\}$ defined by
\[ d\log X_i(t) = -g\, dt + ng\, \mathbf{1}_{\{r_t(i)=n\}}\, dt + \sigma\, dW_i(t), \tag{3.1} \]
where $g$ and $\sigma$ are positive constants and $(W_1, \dots, W_n)$ is a Brownian motion. Atlas models are asymptotically stable with parameters
\[ \lambda_{k,k+1} = 2kg \quad \text{and} \quad \sigma^2_{k,k+1} = 2\sigma^2, \tag{3.2} \]
for $k = 1, \dots, n-1$.

The processes $X_i$ in an Atlas model are exchangeable, so each $X_i$ asymptotically spends equal time in each rank and hence has zero asymptotic log-drift. The gap processes $(\log X_{(k)} - \log X_{(k+1)})$ for Atlas models have stable distributions that are independent and exponentially distributed with
\[ \lim_{T\to\infty} \frac{1}{T} \int_0^T \big( \log X_{(k)}(t) - \log X_{(k+1)}(t) \big)\, dt = \frac{\sigma^2_{k,k+1}}{2\lambda_{k,k+1}}, \quad \text{a.s.}, \tag{3.3} \]
for $k = 1, \dots, n-1$. The slope of the log-log plot of the $X_{(k)}$ versus rank will be
\[ \lim_{T\to\infty} \frac{1}{T} \int_0^T \frac{\log X_{(k)}(t) - \log X_{(k+1)}(t)}{\log(k) - \log(k+1)}\, dt \tag{3.4} \]
at rank $k$, so if we define the slope parameters $s_k$ by
\[ s_k \triangleq k \lim_{T\to\infty} \frac{1}{T} \int_0^T \big( \log X_{(k)}(t) - \log X_{(k+1)}(t) \big)\, dt, \tag{3.5} \]
for $k = 1, \dots, n-1$, then
\[ -s_k \Big( 1 + \frac{1}{2k} \Big) < \lim_{T\to\infty} \frac{1}{T} \int_0^T \frac{\log X_{(k)}(t) - \log X_{(k+1)}(t)}{\log(k) - \log(k+1)}\, dt < -s_k, \tag{3.6} \]
for $k = 1, \dots, n-1$. Accordingly, for large enough $k$ the slope parameter $s_k$ will be approximately equal to minus the slope given in (3.4). For expositional simplicity, we shall treat the $s_k$ as if they measured the true log-log slopes between adjacent ranks, but it is important to remember that this equivalence is only as accurate as the range in inequality (3.6).

For an Atlas model, it follows from (3.2) and (3.3) that
\[ s_k = \frac{\sigma^2}{2g}, \quad \text{a.s.}, \tag{3.7} \]
for $k = 1, \dots, n-1$, so the stable distribution of an Atlas model follows a Pareto distribution, at least within the approximation (3.6), and when
\[ \sigma^2 = 2g, \tag{3.8} \]
it follows Zipf’s law.

A modest generalization of the Atlas model is a first-order model (Fernholz, 2002; Banner et al., 2005), a system of positive continuous semimartingales $\{X_1, \dots, X_n\}$ with
\[ d\log X_i(t) = g_{r_t(i)}\, dt + G_n \mathbf{1}_{\{r_t(i)=n\}}\, dt + \sigma_{r_t(i)}\, dW_i(t), \tag{3.9} \]
where $\sigma_1, \dots, \sigma_n$ are positive constants, $g_1, \dots, g_n$ are constants satisfying
\[ g_1 + \cdots + g_n \le 0 \quad \text{and} \quad g_1 + \cdots + g_k < 0 \ \text{ for } k < n, \tag{3.10} \]
$G_n = -(g_1 + \cdots + g_n)$, and $(W_1, \dots, W_n)$ is a Brownian motion. First-order models are asymptotically stable with parameters
\[ \lambda_{k,k+1} = -2\big( g_1 + \cdots + g_k \big), \quad \text{a.s.}, \tag{3.11} \]
and
\[ \sigma^2_{k,k+1} = \sigma^2_k + \sigma^2_{k+1}, \quad \text{a.s.}, \tag{3.12} \]
for $k = 1, \dots, n-1$. A first-order model is simple if there is a positive constant $g$ such that $g_k = -g$, for $k = 1, \dots, n$, and the $\sigma_k$ are nondecreasing, with $0 < \sigma^2_1 \le \cdots \le \sigma^2_n$.

The processes $X_i$ in a first-order model are exchangeable, as they are for Atlas models, so again each $X_i$ asymptotically spends equal time in each rank and hence has zero asymptotic log-drift. Moreover, first-order models have asymptotically exponential gaps, and (3.3) continues to hold in this more general case (Banner et al., 2005). The slope parameters for a first-order model are
\[ s_k = \frac{k\big( \sigma^2_k + \sigma^2_{k+1} \big)}{2\lambda_{k,k+1}} = -\frac{k\big( \sigma^2_k + \sigma^2_{k+1} \big)}{4\big( g_1 + \cdots + g_k \big)}, \quad \text{a.s.}, \tag{3.13} \]
for $k = 1, \dots, n-1$, so the stable distribution of a first-order model is not confined to the class of Pareto distributions.

A further generalization to hybrid Atlas models, systems of processes with growth rates and variance rates that depend both on rank and on name (denoted by the index $i$), was introduced by Ichiba et al. (2011), who showed that these more general systems are also asymptotically stable. In a hybrid Atlas model the processes are not necessarily exchangeable, so processes occupying a given rank need not have the same growth rates and variance rates, and the asymptotic distribution of the gap processes may be mixtures of exponential distributions rather than pure exponentials (Ichiba et al., 2011). Nevertheless, although we can expect (3.3) to hold precisely only for systems in which the growth rates and variance rates are determined by rank alone, in many cases this relation can still provide a reasonably accurate characterization of the invariant distribution of the system.

It is convenient to consider families of Atlas models and first-order models that share the same parameters, and for this purpose we define a first-order family to be a sequence of constants $\{g_k, \sigma_k\}_{k\in\mathbb{N}}$, with
\[ g_1 + \cdots + g_k < 0, \qquad \sigma^2_k > 0, \tag{3.14} \]
for $k \in \mathbb{N}$. A first-order family generates a class of first-order models $\{X_1, \dots, X_n\}$, for $n \in \mathbb{N}$, each defined as in (3.9) with the common parameters $g_k$ and $\sigma_k$, the positive square root of $\sigma^2_k$, and $G_n = -(g_1 + \cdots + g_n)$, for $n \in \mathbb{N}$. A first-order family is simple if all the first-order models generated by it are simple. An Atlas family is a first-order family with $g_k = -g < 0$ and $\sigma^2_k = \sigma^2 > 0$, for $k \in \mathbb{N}$.

For first-order families, the parameters $\sigma^2_{k,k+1}$, $\lambda_{k,k+1}$, and $s_k$ are defined uniquely for $k \in \mathbb{N}$ by (3.2), (3.7), (3.11), (3.12), and (3.13), as the case may be. Let us note that the slope parameters $s_k$ given by (3.13) do not depend on the number of processes in the model as long as $n > k$, so a first-order family defines a unique asymptotic distribution curve. These families will allow us to derive results about asymptotic distribution curves without repeatedly reciting the characteristics of individual Atlas or first-order models. Moreover, we shall only consider values derived from the models in a first-order family when these models are in their stable distribution. Essentially, we need only consider the values that result from the parameters $\{g_k, \sigma_k\}_{k\in\mathbb{N}}$, and we can ignore the models themselves.

A model $\{X_1, \dots, X_n\}$ in a simple first-order family will satisfy
\[ d\log X_i(t) = -g\, dt + ng\, \mathbf{1}_{\{r_t(i)=n\}}\, dt + \sigma_{r_t(i)}\, dW_i(t), \]
where $g > 0$, the $\sigma_k$ are nondecreasing, and $(W_1, \dots, W_n)$ is a Brownian motion. Hence, for a simple first-order family,
\[ \lambda_{k,k+1} = 2kg, \quad \text{a.s.}, \tag{3.15} \]
and
\[ s_k = \frac{\sigma^2_k + \sigma^2_{k+1}}{4g}, \quad \text{a.s.}, \tag{3.16} \]
for $k \in \mathbb{N}$, with the $s_k$ nondecreasing. Hence, in this case the log-log plot of the stable distribution will be concave.

It appears that actual empirical time-dependent systems often behave like simple first-order families, and we analyze one such example below, the capitalizations of U.S. companies (see Figures 1 and 2). The condition that the variance rates increase at the lower ranks seems natural: even in the original observation of Brown (1827) it would seem likely that the water molecules would have buffeted the smaller particles more vigorously than the larger ones.

Suppose that $\{Y_1, \dots, Y_n\}$, for $n > 1$, is an asymptotically stable system of positive continuous semimartingales with rank function $\rho_t \in \Sigma_n$ such that $\rho_t(i) < \rho_t(j)$ if $Y_i(t) > Y_j(t)$ or if $Y_i(t) = Y_j(t)$ and $i < j$. Let $\{Y_{(1)} \ge \cdots \ge Y_{(n)}\}$ be the corresponding rank processes with $Y_{(\rho_t(i))}(t) = Y_i(t)$. As in Definition 2.1, for the processes $Y_1, \dots, Y_n$ we can define the parameters
\[ \boldsymbol{\lambda}_{k,k+1} \triangleq \lim_{t\to\infty} t^{-1} \Lambda^Y_{k,k+1}(t) > 0, \quad \text{a.s.}, \qquad \boldsymbol{\sigma}^2_{k,k+1} \triangleq \lim_{t\to\infty} t^{-1} \big\langle \log Y_{(k)} - \log Y_{(k+1)} \big\rangle_t > 0, \quad \text{a.s.}, \tag{4.1} \]
for $k = 1, \dots, n-1$, and by convention $\boldsymbol{\lambda}_{0,1} = 0$, $\boldsymbol{\sigma}^2_{0,1} = \boldsymbol{\sigma}^2_{1,2}$, and $\boldsymbol{\sigma}^2_{n,n+1} = \boldsymbol{\sigma}^2_{n-1,n}$.

Definition 4.1. (Fernholz, 2002) Let $\{Y_1, \dots, Y_n\}$ be an asymptotically stable system of positive continuous semimartingales with parameters $\boldsymbol{\lambda}_{k,k+1}$ and $\boldsymbol{\sigma}^2_{k,k+1}$, for $k = 1, \dots, n$, defined by (4.1). Then the first-order approximation for $\{Y_1, \dots, Y_n\}$ is the first-order model $\{X_1, \dots, X_n\}$ with
\[ d\log X_i(t) = g_{r_t(i)}\, dt + G_n \mathbf{1}_{\{r_t(i)=n\}}\, dt + \sigma_{r_t(i)}\, dW_i(t), \tag{4.2} \]
for $i = 1, \dots, n$, where $r_t \in \Sigma_n$ is the rank function for the $X_i$, the parameters $g_k$ and $\sigma_k$ are defined by
\[ \begin{aligned} g_k &= \tfrac{1}{2} \boldsymbol{\lambda}_{k-1,k} - \tfrac{1}{2} \boldsymbol{\lambda}_{k,k+1}, \ \text{ for } k = 1, \dots, n-1, \quad \text{and} \quad g_n = \frac{g_1 + \cdots + g_{n-1}}{n-1}, \\ \sigma^2_k &= \tfrac{1}{4} \big( \boldsymbol{\sigma}^2_{k-1,k} + \boldsymbol{\sigma}^2_{k,k+1} \big), \ \text{ for } k = 1, \dots, n, \end{aligned} \tag{4.3} \]
where $\sigma_k$ is the positive square root of $\sigma^2_k$, $G_n = -(g_1 + \cdots + g_n)$, and $(W_1, \dots, W_n)$ is a Brownian motion.

For the first-order model (4.2) with parameters (4.3), equations (3.11) and (3.12) imply that
\[ \lambda_{k,k+1} = -2\big( g_1 + \cdots + g_k \big) = \boldsymbol{\lambda}_{k,k+1}, \quad \text{a.s.}, \tag{4.4} \]
for $k = 1, \dots, n-1$, and
\[ \sigma^2_{k,k+1} = \sigma^2_k + \sigma^2_{k+1} = \tfrac{1}{4} \big( \boldsymbol{\sigma}^2_{k-1,k} + 2\boldsymbol{\sigma}^2_{k,k+1} + \boldsymbol{\sigma}^2_{k+1,k+2} \big), \quad \text{a.s.}, \]
for $k = 1, \dots, n-1$. Hence, (3.3) becomes
\[ \lim_{T\to\infty} \frac{1}{T} \int_0^T \big( \log X_{(k)}(t) - \log X_{(k+1)}(t) \big)\, dt = \frac{\sigma^2_{k,k+1}}{2\lambda_{k,k+1}} = \frac{\boldsymbol{\sigma}^2_{k-1,k} + 2\boldsymbol{\sigma}^2_{k,k+1} + \boldsymbol{\sigma}^2_{k+1,k+2}}{8\boldsymbol{\lambda}_{k,k+1}}, \quad \text{a.s.}, \tag{4.5} \]
for $k = 1, \dots, n-1$. If the processes $Y_1, \dots, Y_n$ satisfy
\[ \lim_{T\to\infty} \frac{1}{T} \int_0^T \big( \log Y_{(k)}(t) - \log Y_{(k+1)}(t) \big)\, dt \cong \frac{\boldsymbol{\sigma}^2_{k,k+1}}{2\boldsymbol{\lambda}_{k,k+1}}, \tag{4.6} \]
for $k = 1, \dots, n-1$, then the stable distribution (4.5) for the first-order approximation will be a smoothed version of the stable distribution (4.6) for the $Y_i$. The approximation (4.6) will be accurate if the gap series $(\log Y_{(k)}(t) - \log Y_{(k+1)}(t))$ behave like reflected Brownian motion, which has an exponential stable distribution. We can expect this approximation to hold when the behavior of the processes $Y_1, \dots, Y_n$ is determined mostly by rank. The accuracy of this approximation is likely to deteriorate when more idiosyncratic characteristics are present, characteristics that depend on the indices $i$.

Now suppose that we have a time-dependent system $\{Z_1(\tau), Z_2(\tau), \dots\}$ of positive-valued data observed at times $\tau \in \{1, 2, \dots, T\}$. Let
\[ N_\tau = \#\{Z_1(\tau), Z_2(\tau), \dots\} \quad \text{and} \quad N = N_1 \wedge \cdots \wedge N_T, \tag{4.7} \]
where $\#$ denotes the number of elements. Let $\rho_\tau : \mathbb{N} \to \mathbb{N}$ be the rank function for the system $\{Z_1(\tau), Z_2(\tau), \dots\}$ such that $\rho_\tau$ restricted to the subset $\{1, \dots, N_\tau\}$ is the permutation with $\rho_\tau(i) < \rho_\tau(j)$ if $Z_i(\tau) > Z_j(\tau)$ or if $Z_i(\tau) = Z_j(\tau)$ and $i < j$, and for $i > N_\tau$, $\rho_\tau(i) = i$. We define the ranked values $\{Z_{(1)}(\tau) \ge Z_{(2)}(\tau) \ge \cdots\}$ such that $Z_{(\rho_\tau(i))}(\tau) = Z_i(\tau)$ for $i \le N_\tau$, and for definiteness we can let $Z_{(k)}(\tau) = 0$ for $k > N_\tau$. With these definitions, we have
\[ Z_{[k]}(\tau) = Z_{(1)}(\tau) + \cdots + Z_{(k)}(\tau), \]
for $k = 1, \dots, N$ and $\tau \in \{1, 2, \dots, T\}$.

We can mimic the time averages (2.4) and (2.5) to define the parameters
\[ \boldsymbol{\lambda}_{k,k+1} \triangleq \frac{2}{T-1} \sum_{\tau=1}^{T-1} \bigg( \frac{Z_{[k]}(\tau+1) - Z_{[k]}(\tau)}{Z_{(k)}(\tau)} - \sum_{i=1}^{N} \mathbf{1}_{\{\rho_\tau(i)\le k\}} \frac{Z_i(\tau+1) - Z_i(\tau)}{Z_{(k)}(\tau)} \bigg), \tag{4.8} \]
and
\[ \boldsymbol{\sigma}^2_{k,k+1} \triangleq \frac{1}{T-1} \sum_{\tau=1}^{T-1} \Big( \big( \log Z_{(k)}(\tau+1) - \log Z_{(k+1)}(\tau+1) \big) - \big( \log Z_{(k)}(\tau) - \log Z_{(k+1)}(\tau) \big) \Big)^2, \tag{4.9} \]
for $k = 1, \dots, N-1$, and by convention $\boldsymbol{\lambda}_{0,1} = 0$ and $\boldsymbol{\sigma}^2_{0,1} = \boldsymbol{\sigma}^2_{1,2}$.

Definition 4.2. Suppose that $\{Z_1(\tau), Z_2(\tau), \dots\}$ is a time-dependent system of positive-valued data with $N$, $\boldsymbol{\lambda}_{k,k+1}$, and $\boldsymbol{\sigma}^2_{k,k+1}$ defined as in (4.7), (4.8), and (4.9). The first-order approximation of $\{Z_1(\tau), Z_2(\tau), \dots\}$ is the first-order family $\{g_k, \sigma_k\}_{k\in\mathbb{N}}$ with
\[ \begin{aligned} g_k &= \tfrac{1}{2} \boldsymbol{\lambda}_{k-1,k} - \tfrac{1}{2} \boldsymbol{\lambda}_{k,k+1}, \ \text{ for } k = 1, \dots, N-1, \quad \text{and} \quad g_k = \frac{g_1 + \cdots + g_{N-1}}{N-1}, \ \text{ for } k \ge N, \\ \sigma^2_k &= \tfrac{1}{4} \big( \boldsymbol{\sigma}^2_{k-1,k} + \boldsymbol{\sigma}^2_{k,k+1} \big), \ \text{ for } k = 1, \dots, N-1, \quad \text{and} \quad \sigma^2_k = \sigma^2_{N-1}, \ \text{ for } k \ge N. \end{aligned} \tag{4.10} \]

With this definition the slope parameters $s_k$ given in (3.13) are constant for $k \ge N$. If the data satisfy
\[ \frac{1}{T} \sum_{\tau=1}^{T} \big( \log Z_{(k)}(\tau) - \log Z_{(k+1)}(\tau) \big) \cong \frac{\boldsymbol{\sigma}^2_{k,k+1}}{2\boldsymbol{\lambda}_{k,k+1}}, \tag{4.11} \]
for $k \in \mathbb{N}$, then the stable distribution (4.5) for the first-order approximation will be a smoothed version of the distribution (4.11) for the data $\{Z_{(1)}(\tau), Z_{(2)}(\tau), \dots\}$. As was the case with (4.6), the approximation (4.11) will be accurate if the gap series $(\log Z_{(k)}(\tau) - \log Z_{(k+1)}(\tau))$ are distributed like reflected Brownian motion. We shall say that a system of time-dependent data that satisfies (4.11) is rank-based, and we can expect this approximation to hold when the behavior of the data is determined mostly by rank. We should also note that (4.8), (4.9), and (4.11) are not true asymptotic values, but rather estimates based on limited data.

Zipfian systems of time-dependent data
Zipf’s law originally referred to the frequency of words in a written language (Zipf, 1935), with the system $\{Z_1(\tau), Z_2(\tau), \dots\}$, where $Z_i(\tau)$ represents the number of occurrences of the $i$th word in a language at time $\tau$. To measure the relative frequency of written words in a language it is not possible to observe all the written words in that language. Instead, the words must be sampled, where a random sample is selected (without replacement), and the frequency versus rank of this random sample is studied. For example, in Wikipedia (2019) 10 million words in each of 30 languages were sampled, and the resulting distribution curves created. If the sample is large enough, the distribution of the sampled data should not differ materially from the distribution of the entire data set, at least for the higher ranks.

An additional advantage that arises from using sampled data is that the total number of data in the sample remains constant over time. The total number of written words that appear in a language is likely to increase over time, and this increase could bias estimates of some parameters. Sampling the data will remove such a trend from the data, since a constant number of words can be sampled at each time. Accordingly, in all cases we shall assume that global trends have been removed from the data, either by sampling or by some other means of detrending.

Since we have assumed that we have a constant sample size or that the data have been detrended somehow, the total count of our sampled data will remain constant, so
\[ Z_1(\tau) + Z_2(\tau) + \cdots = \text{constant}, \tag{5.1} \]
for $\tau \in \{1, 2, \dots, T\}$, where in the case of the Wikipedia words the constant would be 10 million.

Suppose we have a time-dependent system of positive-valued data $\{Z_1(\tau), Z_2(\tau), \dots\}$ and we observe the top $n$ ranks, for $1 < n < N$, with $N$ from (4.7), along with
\[ Z_{[n]}(\tau) = Z_{(1)}(\tau) + \cdots + Z_{(n)}(\tau). \]
Since the total value of the sampled data in (5.1) is constant, for large enough $n$ it is reasonable to expect the relative change of the top $n$ ranks to satisfy
\[ \frac{Z_{[n]}(\tau+1) - Z_{[n]}(\tau)}{Z_{[n]}(\tau)} \cong 0, \tag{5.2} \]
as $n$ becomes large. This condition is essentially a “conservation of mass” criterion for $\{Z_1(\tau), Z_2(\tau), \dots\}$, and we would like to interpret this in terms of first-order families.

In all that follows, for a first-order family $\{g_k, \sigma_k\}_{k\in\mathbb{N}}$ we shall use the notation $\mathbb{E}_n$ to denote the expectation with respect to the stable distribution for the model $\{X_1, \dots, X_n\}$ defined by that family. The following definition is motivated by the condition (5.2).

Definition 5.1.
The first-order family $\{g_k, \sigma_k\}_{k\in\mathbb{N}}$ is conservative if
\[ \lim_{n\to\infty} \mathbb{E}_n \bigg[ \frac{dX_{[n]}(t)}{X_{[n]}(t)} \bigg] = 0. \tag{5.3} \]

For the system $\{Z_1(\tau), Z_2(\tau), \dots\}$ and for $n < N$, the effect of processes that leave the top $n$ ranks over the time interval $[\tau, \tau+1]$ and are replaced by processes from the lower ranks is measured by
\[ Z_{[n]}(\tau+1) - \sum_{i=1}^{N} \mathbf{1}_{\{\rho_\tau(i)\le n\}} Z_i(\tau+1), \]
or, equivalently, by
\[ \big( Z_{[n]}(\tau+1) - Z_{[n]}(\tau) \big) - \bigg( \sum_{i=1}^{N} \mathbf{1}_{\{\rho_\tau(i)\le n\}} \big( Z_i(\tau+1) - Z_i(\tau) \big) \bigg). \]
We would like this replacement effect to be small relative to $Z_{[n]}(\tau)$ for large $n$, i.e., that
\[ \frac{1}{T-1} \sum_{\tau=1}^{T-1} \bigg[ \frac{Z_{[n]}(\tau+1) - Z_{[n]}(\tau)}{Z_{[n]}(\tau)} - \sum_{i=1}^{N} \mathbf{1}_{\{\rho_\tau(i)\le n\}} \frac{Z_i(\tau+1) - Z_i(\tau)}{Z_{[n]}(\tau)} \bigg] \cong 0, \tag{5.4} \]
for large enough $n$. In terms of the first-order approximation $\{g_k, \sigma_k\}_{k\in\mathbb{N}}$ to $\{Z_1(\tau), Z_2(\tau), \dots\}$, the corresponding condition will be
\[ \lim_{T\to\infty} \frac{1}{T} \int_0^T \bigg( \frac{dX_{[n]}(t)}{X_{[n]}(t)} - \sum_{i=1}^{N} \mathbf{1}_{\{r_t(i)\le n\}} \frac{dX_i(t)}{X_{[n]}(t)} \bigg) \cong 0, \quad \text{a.s.}, \]
for large enough $n$, where $N > n$ and $\{X_1, \dots, X_N\}$ is a first-order model defined by $\{g_k, \sigma_k\}_{k\in\mathbb{N}}$. By (2.3), this is equivalent to
\[ \lim_{T\to\infty} \frac{1}{T} \int_0^T \frac{X_{(n)}(t)}{2X_{[n]}(t)}\, d\Lambda^X_{n,n+1}(t) \cong 0, \quad \text{a.s.}, \]
for large enough $n$. Since
\[ \lim_{T\to\infty} \frac{1}{T} \int_0^T d\Lambda^X_{n,n+1}(t) = \lambda_{n,n+1} = -2\big( g_1 + \cdots + g_n \big), \quad \text{a.s.}, \tag{5.5} \]
condition (5.4) can be interpreted as
\[ \lim_{T\to\infty} \frac{1}{T} \int_0^T -\big( g_1 + \cdots + g_n \big) \frac{X_{(n)}(t)}{X_{[n]}(t)}\, dt \cong 0, \quad \text{a.s.}, \]
for large enough $n$. For the model $\{X_1, \dots, X_n\}$ defined by the family $\{g_k, \sigma_k\}_{k\in\mathbb{N}}$, $G_n = -\big( g_1 + \cdots + g_n \big)$, so the following definition is derived from condition (5.4).

Definition 5.2.
The first-order family $\{g_k, \sigma_k\}_{k\in\mathbb{N}}$ is complete if
\[ \lim_{n\to\infty} \mathbb{E}_n \bigg[ \frac{G_n X_{(n)}(t)}{X_{[n]}(t)} \bigg] = 0. \tag{5.6} \]

For an Atlas family or simple first-order family, (5.6) is equivalent to
\[ \lim_{n\to\infty} \mathbb{E}_n \bigg[ \frac{ng X_{(n)}(t)}{X_{[n]}(t)} \bigg] = 0, \tag{5.7} \]
since $G_n = ng$.

Definition 5.3. An Atlas family or first-order family is Zipfian if its slope parameters $s_k = 1$, for $k \in \mathbb{N}$. A time-dependent rank-based system is Zipfian if its first-order approximation is Zipfian.
Proposition 5.4.
An Atlas family is Zipfian if and only if it is conservative and complete.
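As a numerical illustration of the Zipfian case in the simplest setting, the following Python sketch uses (3.2) and (3.3) to build the stable distribution of an Atlas model from independent exponential gaps and checks that the log-log slope is close to −1 when the variance rate equals $2g$, as in (3.8). The function names, the sample sizes, and the choice of ranks used for the fitted slope are illustrative assumptions and are not taken from the paper; the sketch does not verify the conservation or completeness conditions themselves.

```python
import numpy as np


def atlas_mean_gaps(g, sigma2, n):
    """Mean stable gaps E[log X_(k) - log X_(k+1)], k = 1..n-1, from (3.2)-(3.3)."""
    k = np.arange(1, n)
    lam = 2.0 * k * g            # lambda_{k,k+1} = 2kg
    sig2_gap = 2.0 * sigma2      # sigma^2_{k,k+1} = 2 sigma^2
    return sig2_gap / (2.0 * lam)


def slope_parameters(mean_gaps):
    """Slope parameters s_k = k * (mean gap at rank k), as in (3.5)."""
    k = np.arange(1, len(mean_gaps) + 1)
    return k * mean_gaps


def fitted_loglog_slope(mean_gaps, first_rank=10):
    """Least-squares slope of mean log-size versus log-rank from first_rank on."""
    # Normalise log X_(1) to 0; lower ranks are obtained by subtracting gaps.
    log_size = np.concatenate(([0.0], -np.cumsum(mean_gaps)))
    rank = np.arange(1, len(log_size) + 1)
    keep = rank >= first_rank
    slope, _ = np.polyfit(np.log(rank[keep]), log_size[keep], 1)
    return slope


def sample_gaps(mean_gaps, draws, seed=0):
    """Draw independent exponential gaps with the given means (assumed stable law)."""
    rng = np.random.default_rng(seed)
    return rng.exponential(scale=mean_gaps, size=(draws, len(mean_gaps)))


g = 0.5
sigma2 = 2.0 * g                      # Zipfian case (3.8)
gaps = atlas_mean_gaps(g, sigma2, n=1000)
s = slope_parameters(gaps)            # every entry equals sigma2 / (2g)
slope = fitted_loglog_slope(gaps)     # fitted log-log slope over ranks 10..1000
sampled_s = slope_parameters(sample_gaps(gaps[:50], draws=20000).mean(axis=0))
```

Changing `sigma2` away from `2 * g` in this sketch moves every $s_k$ to $\sigma^2/(2g)$ and tilts the fitted slope accordingly, which is the non-Zipfian Pareto case of (3.7).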
Since many empirical distributions are not Zipfian but rather quasi-Zipfian, we need to formalize this concept for first-order models.
Definition 5.5.
A first-order family is quasi-Zipfian if its slope parameters $s_k$ are nondecreasing with
\[ s_1 \le 1 \quad \text{and} \quad \lim_{k\to\infty} s_k \ge 1, \tag{5.8} \]
where this limit includes divergence to infinity. A time-dependent rank-based system is quasi-Zipfian if its first-order approximation is quasi-Zipfian.

Because the slope parameters $s_k$ are approximately equal to minus the slope of a log-log plot of size versus rank, Definition 5.5 implies that a time-dependent rank-based system will be quasi-Zipfian if this log-log plot is concave with slope not steeper than −1 at the highest ranks and not flatter than −1 at the lowest ranks.

Proposition 5.6. If a simple first-order family is conservative, complete, and satisfies
\[ \lim_{n\to\infty} \mathbb{E}_n \bigg[ \frac{X_{(1)}(t)}{X_{[n]}(t)} \bigg] \le \frac{1}{2}, \tag{5.9} \]
then it is quasi-Zipfian.

We show in Example A.1 below that a conservative and complete first-order family $\{g_k, \sigma_k\}_{k\in\mathbb{N}}$ with $g_k = -g < 0$, for $k \in \mathbb{N}$, can have a Pareto distribution with log-log slope steeper than −1 if the $\sigma_k$ are not nondecreasing.

Here we apply the methods we developed above to an actual time-dependent rank-based system. We also discuss a number of other such systems, as well as other approaches to time-dependent rank-based systems.
Example 6.1.
Market capitalization of companies.
The market capitalization of U.S. companies was studied as early as Simon and Bonini (1958), and here we follow the methodology of Fernholz (2002). The capitalization of a company is defined as the price of the company’s stock multiplied by the number of shares outstanding. Ample data are available for stock prices, and this allows us to estimate the first-order parameters we introduced in the previous sections.

Figure 1 shows the smoothed first-order parameters $\sigma^2_k$ and $-g_k$ for the U.S. capital distribution for the 10-year period from January 1990 to December 1999. The capitalization data we used were from the monthly stock database of the Center for Research in Securities Prices at the University of Chicago. The market we consider consists of the stocks traded on the New York Stock Exchange, the American Stock Exchange, and the NASDAQ Stock Market, after the removal of all Real Estate Investment Trusts, all closed-end funds, and those American Depositary Receipts not included in the S&P 500 Index. The parameters in Figure 1 correspond to the 5000 stocks with the highest capitalizations each month. The first-order parameters $g_k$ and $\sigma^2_k$ were calculated as in (4.3) from the parameters $\boldsymbol{\lambda}_{k,k+1}$ and $\boldsymbol{\sigma}^2_{k,k+1}$, and then smoothed by convolution with a Gaussian kernel with ± .16 standard deviations spanning 100 months on the horizontal axis, with reflection at the ends of the data.

We see in Figure 1 that the values of the parameters $-g_k$ are relatively constant compared to the parameters $\sigma^2_k$, which increase almost linearly with rank. The near-constant $-g_k$ suggest that the first-order approximation will generate a simple first-order family. In Figure 2, the distribution curve for the capitalizations is represented by the black curve, which represents the average of the year-end capital distributions for the ten years spanned by the data. The broken red curve is the first-order approximation of the distribution following (4.5). The two curves are quite close, and this indicates that the time-dependent system of company capitalizations is mostly rank-based. The black dot on the curve between ranks 100 and 500 is the point at which the log-log slope of the tangent to the curve is −1, so this is a quasi-Zipfian distribution, consistent with Proposition 5.6. Note that if we had considered only the top 100 companies, the completeness condition, Definition 5.2, would have failed, as we would expect for an incomplete distribution.
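The estimation step used in this example can be sketched in code. The following Python sketch mimics the estimators (4.8) and (4.9) and the first-order parameters (4.3) for a panel stored as an array with one row per observation time and one column per name. It assumes that every name is observed at every time, so that $N_\tau = N$ for all $\tau$, it takes the local-time parameter to be twice the normalized turnover term, as the factor ½ in (2.3) suggests, and it omits the Gaussian smoothing step. The function and variable names are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def estimate_rank_parameters(Z):
    """Estimate lambda_{k,k+1} and sigma^2_{k,k+1}, k = 1..N-1, from a (T, N) panel."""
    Z = np.asarray(Z, dtype=float)
    T, N = Z.shape
    lam = np.zeros(N - 1)
    sig2 = np.zeros(N - 1)
    for tau in range(T - 1):
        # Names ordered by rank at time tau; ties go to the lower index.
        order = np.argsort(-Z[tau], kind="stable")
        ranked_now = Z[tau, order]
        ranked_next = np.sort(Z[tau + 1])[::-1]
        top_next = np.cumsum(ranked_next)         # Z_[k](tau + 1)
        held_next = np.cumsum(Z[tau + 1, order])  # next-period value of names in the top k at tau
        # Turnover term: the bracket in (4.8) times Z_(k)(tau), using the fact
        # that the time-tau values of the top-k names sum to Z_[k](tau).
        turnover = top_next[:-1] - held_next[:-1]
        lam += 2.0 * turnover / ranked_now[:-1]
        gap_now = np.log(ranked_now[:-1] / ranked_now[1:])
        gap_next = np.log(ranked_next[:-1] / ranked_next[1:])
        sig2 += (gap_next - gap_now) ** 2
    return lam / (T - 1), sig2 / (T - 1)


def first_order_parameters(lam, sig2):
    """g_k and sigma_k^2 for k = 1..N-1 as in (4.3).

    Uses the conventions lambda_{0,1} = 0 and sigma^2_{0,1} = sigma^2_{1,2}.
    """
    lam_prev = np.concatenate(([0.0], lam[:-1]))
    sig2_prev = np.concatenate(([sig2[0]], sig2[:-1]))
    return 0.5 * lam_prev - 0.5 * lam, 0.25 * (sig2_prev + sig2)


# Toy panel: three names observed at two times, with the ranking reversed.
Z = np.array([[3.0, 2.0, 1.0],
              [1.0, 2.0, 3.0]])
lam, sig2 = estimate_rank_parameters(Z)
g, sigma2 = first_order_parameters(lam, sig2)
```

In the toy panel the ranked values are the same at both times, so the estimated gap variances vanish while the turnover estimates are positive, and the identity $-2(g_1 + \cdots + g_k) = \boldsymbol{\lambda}_{k,k+1}$ of (4.4) holds by construction.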
Example 6.2.
Frequency of written words.
Word frequency is the origin of Zipf’s law (Zipf, 1935), but testing our methodology with word frequency could be difficult. Ideally, we would like to construct a first-order approximation for the data and compare the first-order distribution to that of the original data. However, the parameters $\boldsymbol{\lambda}_{k,k+1}$ and $\boldsymbol{\sigma}^2_{k,k+1}$ for the top-ranked words in a language are likely to be difficult to estimate over any reasonable time frame, since the top-ranked words probably seldom change ranks. Nevertheless, while the top ranks may require centuries of data for accurate estimates, the lower ranks could be amenable to analysis similar to that which we carried out for company capitalizations. Moreover, it might be possible to combine, for example, all the Indo-European languages and generate accurate estimates of the $\boldsymbol{\lambda}_{k,k+1}$ and $\boldsymbol{\sigma}^2_{k,k+1}$ even for the top ranks of the combined data.

We can see from the remarkable chart in Wikipedia (2019) that the log-log plots for 30 different languages are (almost) straight. Actually, these plots are slightly concave, or quasi-Zipfian in nature. It is possible that this slight curvature is due to sampling error at the lower ranks, which would raise the variances and steepen the slope, but this would have to be determined by studying the actual data.

Example 6.3.
Random growth processes.
Economists have traditionally used random growth processes to model time-dependent systems with quasi-Zipfian distributions. For example, these processes were used by Gabaix (1999) to model the distribution of city populations and by Blanchet et al. (2017) to construct a piecewise approximation to the distribution curves for the income and wealth of U.S. households. A random growth process is an Itô process of the form

$$\frac{dX(t)}{X(t)} = \mu(X(t))\,dt + \sigma(X(t))\,dW(t), \tag{6.1}$$

where $W$ is Brownian motion and $\mu$ and $\sigma$ are well-behaved real-valued functions. We can convert this into logarithmic form by Itô’s rule, in which case

$$d\log X(t) = \Big(\mu(X(t)) - \frac{\sigma^2(X(t))}{2}\Big)\,dt + \sigma(X(t))\,dW(t), \quad\text{a.s.} \tag{6.2}$$

We shall assume that this equation has at least a weak solution with $X(t) > 0$, a.s., and that the solution has a stable distribution.

Let us construct $n$ i.i.d. copies $X_1, \ldots, X_n$ of $X$, all defined by (6.1) or, equivalently, by (6.2), and assume that the $X_i$ are all in their common stable distribution. Let us assume that the $X_i$ spend no local time at triple points, so we can define the rank processes and (2.1) and (2.3) will be valid. If the system is asymptotically stable we can calculate the corresponding rank-based growth rates $g_k$, but if we know the stable distribution of the original process (6.1), then there is a simpler way to proceed.

If we know the common stable distribution of the $X_i$, then we can calculate expectations under this stable distribution and let

$$g_k = \mathbb{E}\Big[\mu\big(X_{(k)}(t)\big) - \frac{\sigma^2\big(X_{(k)}(t)\big)}{2}\Big] \quad\text{and}\quad \sigma_k^2 = \mathbb{E}\Big[\sigma^2\big(X_{(k)}(t)\big)\Big], \tag{6.3}$$

for $k = 1, \ldots, n$. Under appropriate regularity conditions on $\mu$ and $\sigma$, the expectations here will be equal to the asymptotic time averages of the functions. Since the $X_i$ are stable, the geometric mean

$$\big(X_1 X_2 \cdots X_n\big)^{1/n} = \big(X_{(1)} X_{(2)} \cdots X_{(n)}\big)^{1/n}$$

will also be stable, so

$$\big(g_1 + \cdots + g_n\big)\,t = \mathbb{E}\Big[\log\big(X_{(1)}(t) \cdots X_{(n)}(t)\big) - \log\big(X_{(1)}(0) \cdots X_{(n)}(0)\big)\Big] = 0.$$

Hence,

$$g_1 + \cdots + g_n = 0, \quad\text{with}\quad g_1 + \cdots + g_k < 0, \ \text{for } k < n, \tag{6.4}$$

so the $g_k$ and $\sigma_k^2$ define the first-order model

$$d\log Y_i(t) = g_{r_t(i)}\,dt + \sigma_{r_t(i)}\,dW_i(t), \tag{6.5}$$

where $W_1, \ldots, W_n$ is $n$-dimensional Brownian motion. In this case, $G_n = 0$.

If the functions $\mu$ and $\sigma$ in (6.1) are smooth enough, then the system is likely to be rank-based and the stable distributions of the gap processes $(\log X_{(k)} - \log X_{(k+1)})$ will be (close to) exponential. In this case the stable distribution of the first-order model (6.5) will be close to that of the original system (6.1).
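As a rough illustration of (6.3), the following Python sketch estimates the rank-based parameters by Monte Carlo for one hypothetical random growth process, chosen only because its stable distribution is known in closed form: $\log X$ is taken to be an Ornstein-Uhlenbeck process with $\mu(x) - \sigma^2(x)/2 = -\theta \log x$ and $\sigma(x) = s$, so that $\log X$ is stationary Gaussian with variance $s^2/(2\theta)$. The function names, the values of $\theta$, $s$, and $n$, and the choice of process are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def rank_based_parameters(sample_stable, log_drift, variance, n, trials, rng):
    """Monte Carlo estimate of the rank-based g_k and variance parameters in (6.3).

    sample_stable(rng) draws one value of X from its stable distribution,
    log_drift(x) stands for mu(x) - sigma(x)^2 / 2, and variance(x) for sigma(x)^2.
    All three callables are supplied by the user (assumed inputs).
    """
    g = [0.0] * n
    s2 = [0.0] * n
    for _ in range(trials):
        # n i.i.d. copies, sorted so that index 0 holds the top-ranked value X_(1)
        xs = sorted((sample_stable(rng) for _ in range(n)), reverse=True)
        for k, x in enumerate(xs):
            g[k] += log_drift(x)
            s2[k] += variance(x)
    return [v / trials for v in g], [v / trials for v in s2]


def ou_example(n=5, trials=2000, theta=1.0, s=1.0, seed=0):
    """Hypothetical example: log X is Ornstein-Uhlenbeck with stationary law N(0, s^2 / (2 theta))."""
    rng = random.Random(seed)
    sd = math.sqrt(s * s / (2.0 * theta))
    return rank_based_parameters(
        sample_stable=lambda r: math.exp(r.gauss(0.0, sd)),
        log_drift=lambda x: -theta * math.log(x),
        variance=lambda x: s * s,
        n=n,
        trials=trials,
        rng=rng,
    )


if __name__ == "__main__":
    g, sigma2 = ou_example()
    print("g_k       :", [round(v, 3) for v in g])
    print("sigma_k^2 :", [round(v, 3) for v in sigma2])
    print("sum of g_k:", round(sum(g), 3))
```

Under these assumed inputs the estimated $g_k$ should sum to approximately zero, with negative partial sums for $k < n$, in line with (6.4) up to Monte Carlo error.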
More conditions are required to ensure that this stable distribution be quasi-Zipfian, and to achieve a true Zipfian distribution, a lower reflecting barrier or other equivalent device must be included in the model (Gabaix, 2009).

Example 6.4. Population of cities.
The distribution of city populations is a prominent example of Zipf’s law in social science. However, as the comprehensive cross-country investigation of Soo (2005) shows, city size distributions in most countries are not Zipfian but rather quasi-Zipfian. Gabaix (1999) hypothesized that the quasi-Zipfian distribution of U.S. city size was caused by higher population variances at the lower ranks, consistent with Proposition 5.6. Which of the deviations from Zipf’s law uncovered by Soo (2005) are due to population variances that decrease with increasing city size remains an open question.

There is another phenomenon that occurs with city size distributions. Suppose that rather than studying a large country like the U.S. we consider instead the populations of the cities in New York State. According to the 2010 U.S. census, the largest city, New York City, had a population of 8,175,133, while the second largest, Buffalo, had only 261,310, so this distribution is non-Zipfian. The corresponding population of New York State was 19,378,102, so hypothesis (5.9) of Proposition 5.6 is satisfied, but nevertheless the conclusion of the proposition fails. This calls for an explanation, and we conjecture that while the populations of the cities of New York State comprise a time-dependent system, this system is not rank-based. The population of New York City is not determined merely by its rank among New York State cities, but is highly city-specific in nature. Hence, we cannot expect the stable distribution for the gap process between New York City and second-ranked Buffalo to be exponential, and we cannot expect the distribution of the system to be quasi-Zipfian.
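The arithmetic behind the New York State example can be checked with a short Python sketch. It uses only the three census figures quoted above and assumes that hypothesis (5.9) is the requirement that the largest weight not exceed one half, as the conclusion of the paper describes it; the function names are illustrative.

```python
import math

# 2010 U.S. census figures quoted in the text
NEW_YORK_CITY = 8_175_133
BUFFALO = 261_310
NEW_YORK_STATE = 19_378_102


def top_weight(largest, total):
    """Share of the total held by the top-ranked city."""
    return largest / total


def loglog_slope(value_rank1, value_rank2):
    """Slope of the log-log plot of value versus rank between ranks 1 and 2."""
    return (math.log(value_rank2) - math.log(value_rank1)) / (math.log(2) - math.log(1))


if __name__ == "__main__":
    w = top_weight(NEW_YORK_CITY, NEW_YORK_STATE)
    slope = loglog_slope(NEW_YORK_CITY, BUFFALO)
    print(f"top weight: {w:.4f} (assumed threshold in (5.9): 0.5)")
    print(f"log-log slope between ranks 1 and 2: {slope:.2f} (Zipf's law would give -1)")
```

The top weight comes out below one half, while the slope between the first two ranks is several times steeper than the Zipfian value of $-1$, which is the mismatch discussed in the example.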
Example 6.5. Assets of banks.

Fernholz and Koch (2016) show that the distributions of assets held by U.S. bank holding companies, commercial banks, and savings and loan associations are all quasi-Zipfian. This is true despite the fact that these distributions have undergone significant changes over the past few decades. However, as Fernholz and Koch (2017) show, the first-order approximations of these time-dependent rank-based systems generally do not satisfy the hypotheses of Proposition 5.6, since the parameters $\sigma^2_{k,k+1}$ are, in most cases, lower for higher values of $k$. Nonetheless, the parameters $\lambda_{k,k+1}$ vary with $k$ in such a way as to generate quasi-Zipfian distributions.

Example 6.6. Employees of firms.
Axtell (2001) shows that the distribution of employees of U.S. firms is close to Zipfian, with only slight concavity. A number of empirical analyses have shown that for all but the tiniest firms, employment growth rates of U.S. firms do not vary with firm size (Neumark et al., 2011). This observation together with the slight concavity demonstrated by Axtell (2001) suggests that the first-order approximation of U.S. firm employees is simple, which would explain its quasi-Zipfian nature.
We have shown that the stable distribution of an Atlas family will follow Zipf’s law if and only if two natural conditions, conservation and completeness, are satisfied. We have also shown that a simple first-order family will have a stable distribution that is quasi-Zipfian if the family is conservative and complete, provided that the largest weight is not greater than one half. Since many systems of time-dependent rank-based empirical data can be approximated by Atlas families or simple first-order families, our results offer an explanation for the universality of Zipf’s law for these systems.
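The completeness condition can be illustrated numerically in the Zipfian case. The Python sketch below assumes, as stated in the proof of Proposition 5.4, that in the stable distribution of a Zipfian Atlas model the gaps $\log\big(X_{(k)}/X_{(k+1)}\big)$ are independent and exponentially distributed with mean $1/k$; it then estimates $\mathbb{E}_n\big[nX_{(n)}/X_{[n]}\big]$ for a few values of $n$. The function names, sample sizes, and seed are illustrative choices.

```python
import math
import random


def sample_ranked_values(n, rng):
    """Draw X_(1) >= ... >= X_(n), normalized so that X_(1) = 1.

    Assumes independent gaps log(X_(k) / X_(k+1)) ~ Exp with mean 1/k.
    """
    log_x = 0.0
    values = [1.0]
    for k in range(1, n):
        # expovariate takes the rate, so rate k gives mean 1/k
        log_x -= rng.expovariate(k)
        values.append(math.exp(log_x))
    return values


def mean_bottom_ratio(n, trials, rng):
    """Monte Carlo estimate of E[n * X_(n) / X_[n]], where X_[n] is the sum of all ranks."""
    total = 0.0
    for _ in range(trials):
        x = sample_ranked_values(n, rng)
        total += n * x[-1] / sum(x)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000):
        print(n, round(mean_bottom_ratio(n, 300, rng), 4))
```

Since $nX_{(n)} \le X_{[n]}$, each estimate lies between 0 and 1, and the estimates should shrink slowly as $n$ grows, consistent with the limit in the completeness condition.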
Proofs and examples
Proof of Lemma 2.2.
Suppose that the rank processes $X_{(k)}$ satisfy (2.1), so we have

$$d\log X_{(k)}(t) = \sum_{i=1}^n \mathbf{1}_{\{r_t(i)=k\}}\,d\log X_i(t) + \frac12\,d\Lambda^X_{k,k+1}(t) - \frac12\,d\Lambda^X_{k-1,k}(t), \quad\text{a.s.},$$

for $k = 1, \ldots, n$. By Itô’s rule this is equivalent to

$$\begin{aligned}
\frac{dX_{(k)}(t)}{X_{(k)}(t)} &= \sum_{i=1}^n \mathbf{1}_{\{r_t(i)=k\}}\,\frac{dX_i(t)}{X_i(t)} + \frac12\,d\Lambda^X_{k,k+1}(t) - \frac12\,d\Lambda^X_{k-1,k}(t) \\
&= \sum_{i=1}^n \mathbf{1}_{\{r_t(i)=k\}}\,\frac{dX_i(t)}{X_{(k)}(t)} + \frac12\,d\Lambda^X_{k,k+1}(t) - \frac12\,d\Lambda^X_{k-1,k}(t), \quad\text{a.s.},
\end{aligned}$$

for $k = 1, \ldots, n$. From this we have

$$\begin{aligned}
dX_{(k)}(t) &= \sum_{i=1}^n \mathbf{1}_{\{r_t(i)=k\}}\,dX_i(t) + \frac12\,X_{(k)}(t)\,d\Lambda^X_{k,k+1}(t) - \frac12\,X_{(k)}(t)\,d\Lambda^X_{k-1,k}(t) \\
&= \sum_{i=1}^n \mathbf{1}_{\{r_t(i)=k\}}\,dX_i(t) + \frac12\,X_{(k)}(t)\,d\Lambda^X_{k,k+1}(t) - \frac12\,X_{(k-1)}(t)\,d\Lambda^X_{k-1,k}(t), \quad\text{a.s.},
\end{aligned}$$

for $k = 1, \ldots, n$, since the support of $d\Lambda^X_{k-1,k}$ is contained in the set $\big\{t : \log X_{(k-1)}(t) = \log X_{(k)}(t)\big\}$. Now we can add up $dX_{(1)}(t) + \cdots + dX_{(k)}(t) = dX_{[k]}(t)$; the local-time terms telescope, and we have

$$dX_{[k]}(t) = \sum_{i=1}^n \mathbf{1}_{\{r_t(i)\le k\}}\,dX_i(t) + \frac12\,X_{(k)}(t)\,d\Lambda^X_{k,k+1}(t), \quad\text{a.s.},$$

for $k = 1, \ldots, n$.

Proof of Proposition 5.4.
For an Atlas model $\{X_1, \ldots, X_n\}$ with parameters $g > 0$ and $\sigma^2 > 0$, Itô’s rule implies that

$$dX_i(t) = \Big(\frac{\sigma^2}{2} - g + ng\,\mathbf{1}_{\{r_t(i)=n\}}\Big) X_i(t)\,dt + \sigma X_i(t)\,dW_i(t), \quad\text{a.s.},$$

for all $i = 1, \ldots, n$. Hence,

$$dX_{[n]}(t) = \Big(\frac{\sigma^2}{2} - g\Big) X_{[n]}(t)\,dt + X_{[n]}(t)\,dM(t) + ng\,X_{(n)}(t)\,dt, \quad\text{a.s.},$$

where $M$ is a local martingale incorporating all of the terms $\sigma\,dW_i(t)$. From this we have

$$\frac{dX_{[n]}(t)}{X_{[n]}(t)} = \Big(\frac{\sigma^2}{2} - g\Big)\,dt + dM(t) + \frac{ng\,X_{(n)}(t)}{X_{[n]}(t)}\,dt, \quad\text{a.s.},$$

so

$$\mathbb{E}_n\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = \Big(\frac{\sigma^2}{2} - g\Big)\,dt + \mathbb{E}_n\Big[\frac{ng\,X_{(n)}(t)}{X_{[n]}(t)}\Big]\,dt, \tag{A.1}$$

and it follows from (5.3) and (5.7) that $\sigma^2/2g = 1$. Hence, conservation and completeness imply that the Atlas family will be Zipfian.

If the Atlas family is Zipfian, then $\sigma^2/2g = 1$, and with the Atlas model $\{X_1, \ldots, X_n\}$ in its stable distribution the random variables $\log\big(X_{(k)}(t)/X_{(k+1)}(t)\big)$ will be independent and exponentially distributed with mean $k^{-1}$, for $k = 1, \ldots, n$. Hence,

$$\log\big(X_{(n)}(t)/X_{(1)}(t)\big) = \sum_{k=1}^n \log\big(X_{(k)}(t)/X_{(k-1)}(t)\big) = -O(\log n), \quad\text{a.s.},$$

as $n \to \infty$ (Etemadi, 1983), so

$$X_{(n)}(t)/X_{(1)}(t) = O(1/n), \quad\text{a.s.},$$

as $n \to \infty$. Hence,

$$\frac{X_{[n]}(t)}{X_{(1)}(t)} = \sum_{k=1}^n \frac{X_{(k)}(t)}{X_{(1)}(t)} = O(\log n), \quad\text{a.s.},$$

as $n \to \infty$, and

$$\lim_{n\to\infty} \frac{nX_{(n)}(t)}{X_{[n]}(t)} = 0, \quad\text{a.s.}$$

Since $nX_{(n)}(t)/X_{[n]}(t) \le 1$, we can invoke bounded convergence, so

$$\lim_{n\to\infty} \mathbb{E}_n\Big[\frac{ng\,X_{(n)}(t)}{X_{[n]}(t)}\Big] = 0, \tag{A.2}$$

and the family will be complete. Since $\sigma^2/2g = 1$ and (A.2) holds, the right-hand side of (A.1) converges to zero, so the left-hand side must also converge to zero, and the family will be conservative.

Remark. In (A.2) the infinite series

$$\sum_{k=1}^\infty \mathbb{E}_n\big[X_{(k)}(t)\big]$$

would not converge; however, this does not affect us since we consider only finite portions of the series.

Proof of Proposition 5.6.
For a first-order model $\{X_1, \ldots, X_n\}$ with parameters $g_1 = \cdots = g_n = g > 0$ and $0 < \sigma_1^2 \le \cdots \le \sigma_n^2$, Itô’s rule implies that

$$dX_i(t) = \Big(\frac{\sigma^2_{r_t(i)}}{2} - g + ng\,\mathbf{1}_{\{r_t(i)=n\}}\Big) X_i(t)\,dt + \sigma_{r_t(i)} X_i(t)\,dW_i(t), \quad\text{a.s.},$$

for $i = 1, \ldots, n$. Hence,

$$dX_{[n]}(t) = \sum_{k=1}^n X_{(k)}(t)\Big(\frac{\sigma_k^2}{2} - g\Big)\,dt + dM(t) + ng\,X_{(n)}(t)\,dt, \quad\text{a.s.},$$

where $M$ is a local martingale incorporating all of the terms $\sigma_{r_t(i)} X_i(t)\,dW_i(t)$, so

$$\mathbb{E}_n\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = \Big(\sum_{k=1}^n \mathbb{E}_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma_k^2}{2} - g\Big)\,dt + \mathbb{E}_n\Big[\frac{ng\,X_{(n)}(t)}{X_{[n]}(t)}\Big]\,dt. \tag{A.3}$$

If the system is conservative (5.3) and complete (5.7), then as $n$ tends to infinity the first and last terms of (A.3) vanish and we have

$$\lim_{n\to\infty} \sum_{k=1}^n \mathbb{E}_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma_k^2}{2g} = 1. \tag{A.4}$$

Let us now show that (5.9) implies that $s_1 \le 1$. Since the $\sigma_k$ are nondecreasing, (A.4) implies that

$$\begin{aligned}
1 &\ge \lim_{n\to\infty} \mathbb{E}_n\Big[\frac{X_{(1)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma_1^2}{2g} + \lim_{n\to\infty} \sum_{k=2}^n \mathbb{E}_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma_2^2}{2g} \\
&= \lim_{n\to\infty} \mathbb{E}_n\Big[\frac{X_{(1)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma_1^2}{2g} + \Big(1 - \lim_{n\to\infty} \mathbb{E}_n\Big[\frac{X_{(1)}(t)}{X_{[n]}(t)}\Big]\Big)\frac{\sigma_2^2}{2g} \\
&\ge \frac12\,\frac{\sigma_1^2}{2g} + \frac12\,\frac{\sigma_2^2}{2g} = s_1,
\end{aligned}$$

so $s_1 \le 1$. It remains to show that either $\lim_{k\to\infty} s_k \ge 1$ or the $s_k$ diverge to infinity. Since the $\sigma_k$ are nondecreasing, as $k$ tends to infinity they must either converge to a finite value or diverge to infinity. If the $\sigma_k$ diverge to infinity, the same will be true for the $s_k$. If $\lim_{k\to\infty} \sigma_k^2 = \sigma^2$ then $\lim_{k\to\infty} s_k = \sigma^2/2g$, and since the $\sigma_k$ are nondecreasing,

$$1 = \lim_{n\to\infty} \sum_{k=1}^n \mathbb{E}_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma_k^2}{2g} \le \frac{\sigma^2}{2g}.$$

It follows that $\lim_{k\to\infty} s_k \ge 1$.

Example A.1. A conservative and complete first-order family with a non-Zipfian Pareto distribution.
Consider the first-order family $\{g_k, \sigma_k^2\}_{k\in\mathbb{N}}$ such that

$$g_k = g, \qquad \sigma^2_{2k-1} = \rho^2, \qquad \sigma^2_{2k} = 2\sigma^2 - \rho^2, \tag{A.5}$$

where $g > 0$, $\sigma^2 > 2g$, and $0 < \rho^2 < 2\sigma^2$. In this case, $\sigma_k^2 + \sigma_{k+1}^2 = 2\sigma^2$, so, according to (3.16), the slope parameters $s_k$ for this model will be

$$s_k = \frac{\sigma^2}{2g},$$

for $k \in \mathbb{N}$. Hence, the log-log plot of the stable distribution for this family is a straight line with slope $-\sigma^2/2g < -1$, i.e., a Pareto distribution.

Because this first-order model has the same stable distribution as an Atlas model with $\sigma^2/2g > 1$, we can use an argument similar to (A.2) to conclude that $\mathbb{E}_n\big[ng\,X_{(n)}(t)/X_{[n]}(t)\big] \to 0$ as $n \to \infty$, so the family is complete. In addition, we have

$$\mathbb{E}_n\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = \Big(\sum_{k=1}^n \mathbb{E}_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma_k^2}{2} - g\Big)\,dt + \mathbb{E}_n\Big[\frac{ng\,X_{(n)}(t)}{X_{[n]}(t)}\Big]\,dt,$$

so if we can show that for some choice of $\rho^2$,

$$\lim_{n\to\infty} \sum_{k=1}^n \mathbb{E}_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma_k^2}{2} = g, \tag{A.6}$$

then the conservation condition,

$$\lim_{n\to\infty} \mathbb{E}_n\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = 0,$$

will also hold for the family with that value of $\rho^2$.

The expectations in (A.6) are invariant with respect to the choice of $\rho^2$, so by (A.5) the sum in (A.6) will be continuous in $\rho^2$. If we evaluate this sum at $\rho^2 = 0$, only the even ranks will appear, so for large enough $\sigma^2/2g \gg 1$,

$$\begin{aligned}
\lim_{n\to\infty} \sum_{k=1}^n \mathbb{E}_{2n}\Big[\frac{X_{(2k)}(t)}{X_{[2n]}(t)}\Big]\frac{\sigma_{2k}^2}{2}
&= O\Big(\sigma^2 \lim_{n\to\infty} \sum_{k=1}^n (2k)^{-\sigma^2/2g}\Big) \\
&= O\Big(\sigma^2\, 2^{-\sigma^2/2g} \lim_{n\to\infty} \sum_{k=1}^n k^{-\sigma^2/2g}\Big) \\
&= O\big(\sigma^2\, 2^{-\sigma^2/2g}\big),
\end{aligned}$$

as $\sigma^2$ tends to infinity. Hence, for large enough $\sigma^2/2g \gg 1$ and $\rho^2 \cong 0$ the sum in (A.6) will be close to zero by continuity. For large enough $\sigma^2/2g \gg 1$, the log-log slope $-\sigma^2/2g$ of the distribution can become arbitrarily steep, so $\mathbb{E}_n\big[X_{(1)}(t)/X_{[n]}(t)\big] \cong 1$. In this case, for $\rho^2 \cong 2\sigma^2$,

$$\lim_{n\to\infty} \sum_{k=1}^n \mathbb{E}_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma_k^2}{2} \cong \frac{\sigma_1^2}{2} = \frac{\rho^2}{2} \cong \sigma^2 \gg g > 0.$$

By continuity, for some $\rho^2 \in (0, 2\sigma^2)$, (A.6) will hold, so the family will be conservative. Hence, for that value of $\rho^2$ the family will be conservative and complete, but it is neither Zipfian nor quasi-Zipfian.

References
Atkinson, A. B., T. Piketty, and E. Saez (2011, March). Top incomes in the long run of history. Journal of Economic Literature 49(1), 3–71.

Axtell, R. (2001, September). Zipf distribution of U.S. firm sizes. Science 293(5536), 1818–1820.

Bak, P. (1996). How Nature Works. New York: Springer-Verlag.

Banner, A., R. Fernholz, and I. Karatzas (2005). Atlas models of equity markets. Annals of Applied Probability 15(4), 2296–2330.

Banner, A. and R. Ghomrasni (2008, July). Local times of ranked continuous semimartingales. Stochastic Processes and their Applications 118(7), 1244–1253.

Blanchet, T., J. Fournier, and T. Piketty (2017, March). Generalized Pareto curves: Theory and applications. Technical report, World Wealth & Income Database.

Brown, R. (1827). Brownian motion. Unpublished experiment.

Etemadi, N. (1983). Stability of sums of weighted nonnegative random variables. Journal of Multivariate Analysis 13, 361–365.

Fernholz, E. R. (2002). Stochastic Portfolio Theory. New York: Springer-Verlag.

Fernholz, R. and I. Karatzas (2009). Stochastic portfolio theory: an overview. In A. Bensoussan and Q. Zhang (Eds.), Mathematical Modelling and Numerical Methods in Finance: Special Volume, Handbook of Numerical Analysis, Volume XV, pp. 89–168. Amsterdam: North-Holland.

Fernholz, R. T. and C. Koch (2016, February). Why are big banks getting bigger? Federal Reserve Bank of Dallas Working Paper 1604.

Fernholz, R. T. and C. Koch (2017, May). Big banks, idiosyncratic volatility, and systemic risk. American Economic Review: Papers and Proceedings 107(5), 603–607.

Gabaix, X. (1999, August). Zipf’s law for cities: An explanation. Quarterly Journal of Economics 114(3), 739–767.

Gabaix, X. (2009, May). Power laws in economics and finance. Annual Review of Economics 1(1), 255–294.

Harrison, J. M. and R. J. Williams (1987). Multidimensional reflected Brownian motions having exponential stationary distributions. The Annals of Probability 15(1), 115–137.

Ichiba, T., V. Papathanakos, A. Banner, I. Karatzas, and R. Fernholz (2011). Hybrid Atlas models. Annals of Applied Probability 21, 609–644.

Neumark, D., B. Wall, and J. Zhang (2011, February). Do small businesses create more jobs? New evidence for the United States from the National Establishment time series. Review of Economics and Statistics 93(1), 16–29.

Newman, M. E. J. (2005, September-October). Power laws, Pareto distributions, and Zipf’s law. Contemporary Physics 46(5), 323–351.

Simon, H. and C. Bonini (1958). The size distribution of business firms. American Economic Review 48, 607–617.

Simon, H. A. (1955, December). On a class of skew distribution functions. Biometrika 42(3/4), 425–440.

Soo, K. T. (2005, May). Zipf’s law for cities: A cross-country investigation. Regional Science and Urban Economics 35(3), 239–263.

Tao, T. (2012). E pluribus unum: From complexity, universality. Daedalus 141(3), 23–34.

Wikipedia (2019). Zipf’s law. https://en.wikipedia.org/wiki/Zipf%27s_law.

Zipf, G. (1935). The Psychology of Language: An Introduction to Dynamic Philology. Cambridge, MA: M.I.T. Press.

[Plot; axes: Rank, Growth and variance rates.]
Figure 1: U.S. capital distribution first-order parameters (smoothed): $\sigma_k^2$ (black), $-g_k$ (red, broken).

[Plot; axes: Rank, Weight.]