Zipf's Law for Atlas Models

Abstract

A set of data with positive values follows a Pareto distribution if the log-log plot of value versus rank is approximately a straight line. A Pareto distribution satisfies Zipf's law if the log-log plot has a slope of -1. Since many types of ranked data follow Zipf's law, it is considered a form of universality. We propose a mathematical explanation for this phenomenon based on Atlas models and first-order models, systems of positive continuous semimartingales with parameters that depend only on rank. We show that the stable distribution of an Atlas model will follow Zipf's law if and only if two natural conditions, conservation and completeness, are satisfied. Since Atlas models and first-order models can be constructed to approximate systems of time-dependent rank-based data, our results can explain the universality of Zipf's law for such systems. However, ranked data generated by other means may follow non-Zipfian Pareto distributions. Hence, our results explain why Zipf's law holds for word frequency, firm size, household wealth, and city size, while it does not hold for earthquake magnitude, cumulative book sales, the intensity of solar flares, and the intensity of wars, all of which follow non-Zipfian Pareto distributions.


Ricardo T. Fernholz and Robert Fernholz

May 8, 2019


MSC2010 subject classifications:

The authors thank Xavier Gabaix, Ioannis Karatzas, members of the Intech SPT seminar, and participants of the 2017 Thera Stochastics Conference for their invaluable comments and suggestions regarding this research.

Claremont McKenna College, 500 E. Ninth St., Claremont, CA 91711, [email protected].

Intech Investments, One Palmer Square, Princeton, NJ 08542, [email protected].

arXiv [q-fin.EC]

1 Introduction

A set of empirical data with positive values follows a Pareto distribution if the log-log plot of the values versus rank is approximately a straight line. Pareto distributions are ubiquitous in the social and natural sciences, appearing in a wide range of fields from geology to economics (Simon, 1955; Bak, 1996; Newman, 2005). A Pareto distribution satisfies Zipf's law if the log-log plot has a slope of $-1$, following Zipf (1935), who noticed that the frequency of written words in English follows such a distribution. We shall refer to these distributions as Zipfian. Zipf's law is considered a form of universality, since Zipfian distributions occur almost as frequently as Pareto distributions. Nevertheless, according to Tao (2012), "mathematicians do not have a fully satisfactory and convincing explanation for how the law comes about and why it is universal."

We propose a mathematical explanation of Zipf's law based on Atlas models and first-order models, systems of continuous semimartingales with parameters that depend only on rank. Atlas and first-order models can be constructed to approximate empirical systems of time-dependent rank-based data that exhibit some form of stability (Fernholz, 2002; Banner et al., 2005). Atlas models have stable distributions that are Pareto, while first-order models are more general than Atlas models and can be constructed to have any stable distribution. We show that under two natural conditions, conservation and completeness, the stable distribution of an Atlas model will satisfy Zipf's law. However, many empirical systems of time-dependent rank-based data generate distributions with log-log plots that are not actually straight lines but rather are concave curves with a tangent of slope $-1$ at some point. We shall refer to these distributions as quasi-Zipfian, and we shall use first-order models to approximate the systems that generate them.

The dichotomy between Zipfian and non-Zipfian Pareto distributions is of interest to us here. We find that Zipfian and quasi-Zipfian distributions are usually generated by systems of time-dependent rank-based data, and it is this class of systems that we can approximate by Atlas models or first-order models. In contrast, data that follow non-Zipfian Pareto distributions are usually generated by other means, often of a cumulative nature. Examples of time-dependent rank-based systems that generate Zipfian or quasi-Zipfian distributions include the market capitalization of companies (Simon and Bonini, 1958; Fernholz, 2002), the population of cities (Gabaix, 1999), the employees of firms (Axtell, 2001), the income and wealth of households (Atkinson et al., 2011; Blanchet et al., 2017), and the assets of banks (Fernholz and Koch, 2017). From the comprehensive survey of Newman (2005) we find an assortment of non-Zipfian Pareto distributions: the magnitude of earthquakes, citations of scientific papers, copies of books sold, the diameter of moon craters, the intensity of solar flares, and the intensity of wars, all of which are cumulative systems. Consider, for example, the magnitude of earthquakes: each new earthquake adds a new observation to the data, but once recorded, these observations do not change over time. Such cumulative systems may generate Pareto distributions, but we have no reason to believe that these distributions will be Zipfian.

In the next sections we first review the properties of Atlas models and first-order models, and then characterize Zipfian and quasi-Zipfian systems using these models. We apply our results to the capitalization of U.S. companies, with an analysis of the corresponding quasi-Zipfian distribution curve. Finally, we consider a number of examples of other time-dependent systems as well as other approaches that have been used to characterize these systems. Proofs of all propositions are in the appendix, along with an example.

We shall use systems of positive continuous semimartingales $\{X_1, \dots, X_n\}$, with $n > 1$, to approximate systems of time-dependent data. For such a system we define the rank function to be the random permutation $r_t \in \Sigma_n$ such that $r_t(i) < r_t(j)$ if $X_i(t) > X_j(t)$ or if $X_i(t) = X_j(t)$ and $i < j$. Here $\Sigma_n$ is the symmetric group on $n$ elements. The rank processes $X_{(1)} \ge \cdots \ge X_{(n)}$ are defined by $X_{(r_t(i))}(t) = X_i(t)$.

We have assumed that $X_i(t) > 0$ for $t \in [0, \infty)$ and $i = 1, \dots, n$, a.s., so we can consider the logarithms of these processes. The processes $(\log X_{(k)} - \log X_{(k+1)})$, for $k = 1, \dots, n-1$, are called gap processes, and we define $\Lambda^X_{k,k+1}$ to be the local time at the origin for $(\log X_{(k)} - \log X_{(k+1)})$, with $\Lambda^X_{0,1} = \Lambda^X_{n,n+1} \equiv 0$. If the $\log X_i$ spend no local time at triple points, then the rank processes satisfy

$$ d\log X_{(k)}(t) = \sum_{i=1}^n \mathbb{1}_{\{r_t(i)=k\}}\, d\log X_i(t) + \tfrac{1}{2}\, d\Lambda^X_{k,k+1}(t) - \tfrac{1}{2}\, d\Lambda^X_{k-1,k}(t), \quad \text{a.s.}, \tag{2.1} $$

for $k = 1, \dots, n$ (Fernholz, 2002; Banner and Ghomrasni, 2008).

We are interested in systems that show some kind of stability by rank, at least asymptotically. Since we must apply our definition of stability to systems of empirical data as well as to continuous semimartingales, we use asymptotic time averages rather than expectations for our definitions. For the systems of continuous semimartingales we consider, the law of large numbers implies that the asymptotic time averages are equal to the expectations (Banner et al., 2005; Ichiba et al., 2011).

Definition 2.1. (Fernholz, 2002) A system of positive continuous semimartingales $\{X_1, \dots, X_n\}$ is asymptotically stable if

1. $\lim_{t\to\infty} \frac{1}{t}\big(\log X_{(1)}(t) - \log X_{(n)}(t)\big) = 0$, a.s. (coherence);
2. $\lim_{t\to\infty} \frac{1}{t}\Lambda^X_{k,k+1}(t) = \lambda_{k,k+1} > 0$, a.s.;
3. $\lim_{t\to\infty} \frac{1}{t}\big\langle \log X_{(k)} - \log X_{(k+1)} \big\rangle_t = \sigma^2_{k,k+1} > 0$, a.s.;

for $k = 1, \dots, n-1$.

For $k = 1, \dots, n$, let us define the processes

$$ X_{[k]} \triangleq X_{(1)} + \cdots + X_{(k)}, \tag{2.2} $$

in which case we can express $X_{[k]}$ in terms of the $X_i$ and $\Lambda^X_{k,k+1}$.

Lemma 2.2.

Let $X_1, \dots, X_n$ be positive continuous semimartingales that satisfy (2.1). Then

$$ dX_{[k]}(t) = \sum_{i=1}^n \mathbb{1}_{\{r_t(i)\le k\}}\, dX_i(t) + \tfrac{1}{2} X_{(k)}(t)\, d\Lambda^X_{k,k+1}(t), \quad \text{a.s.}, \tag{2.3} $$

for $k = 1, \dots, n$.

Lemma 2.2 describes the dynamic relationship between the combined value $X_{[k]}$ of the $k$ top ranks and the local time process $\Lambda^X_{k,k+1}$. This local time process compensates for turnover into and out of the top $k$ ranks. Over time, some of the higher-ranked processes will decrease and exit from the top ranks, while some of the lower-ranked processes will increase and enter those top ranks. The process of entry and exit into and out of the top $k$ ranks is quantified by the last term in (2.3), which measures the replacement of the top ranks of the system by lower ranks.

Lemma 2.2 allows us to express the local time $\Lambda^X_{k,k+1}$ in terms of $X_i$, $X_{(k)}$, and $X_{[k]}$, all of which are observable. Hence, the parameters $\lambda_{k,k+1}$ can be expressed as

$$ \lambda_{k,k+1} = \lim_{T\to\infty} \frac{2}{T} \int_0^T \left( \frac{dX_{[k]}(t)}{X_{(k)}(t)} - \sum_{i=1}^n \mathbb{1}_{\{r_t(i)\le k\}} \frac{dX_i(t)}{X_{(k)}(t)} \right), \quad \text{a.s.}, \tag{2.4} $$

for $k = 1, \dots, n-1$. In a similar fashion we can write

$$ \sigma^2_{k,k+1} = \lim_{T\to\infty} \frac{1}{T} \int_0^T d\big\langle \log X_{(k)} - \log X_{(k+1)} \big\rangle_t, \quad \text{a.s.}, \tag{2.5} $$

for $k = 1, \dots, n-1$. Equations (2.4) and (2.5) will allow us to define parameters equivalent to $\lambda_{k,k+1}$ and $\sigma^2_{k,k+1}$ for time-dependent systems of empirical data.

3 Atlas models and first-order models

The simplest system we shall consider is an Atlas model (Fernholz, 2002), a system of positive continuous semimartingales $\{X_1, \dots, X_n\}$ defined by

$$ d\log X_i(t) = -g\,dt + ng\,\mathbb{1}_{\{r_t(i)=n\}}\,dt + \sigma\,dW_i(t), \tag{3.1} $$

where $g$ and $\sigma$ are positive constants and $(W_1, \dots, W_n)$ is a Brownian motion. Atlas models are asymptotically stable with parameters

$$ \lambda_{k,k+1} = 2kg \quad \text{and} \quad \sigma^2_{k,k+1} = 2\sigma^2, \tag{3.2} $$

for $k = 1, \dots, n-1$.

The processes $X_i$ in an Atlas model are exchangeable, so each $X_i$ asymptotically spends equal time in each rank and hence has zero asymptotic log-drift. The gap processes $(\log X_{(k)} - \log X_{(k+1)})$ for Atlas models have stable distributions that are independent and exponentially distributed with

$$ \lim_{T\to\infty} \frac{1}{T} \int_0^T \big( \log X_{(k)}(t) - \log X_{(k+1)}(t) \big)\,dt = \frac{\sigma^2_{k,k+1}}{2\lambda_{k,k+1}}, \quad \text{a.s.}, \tag{3.3} $$

for $k = 1, \dots, n-1$. The asymptotic slope of the log-log plot of the $X_{(k)}$ versus rank will be

$$ \lim_{T\to\infty} \frac{1}{T} \int_0^T \frac{\log X_{(k)}(t) - \log X_{(k+1)}(t)}{\log(k) - \log(k+1)}\,dt \tag{3.4} $$

at rank $k$, so if we define the slope parameters $s_k$ by

$$ s_k \triangleq k \lim_{T\to\infty} \frac{1}{T} \int_0^T \big( \log X_{(k)}(t) - \log X_{(k+1)}(t) \big)\,dt, \tag{3.5} $$

for $k = 1, \dots, n-1$, then

$$ -s_k \left( 1 + \frac{1}{k} \right) < \lim_{T\to\infty} \frac{1}{T} \int_0^T \frac{\log X_{(k)}(t) - \log X_{(k+1)}(t)}{\log(k) - \log(k+1)}\,dt < -s_k, \tag{3.6} $$

for $k = 1, \dots, n-1$. Accordingly, for large enough $k$ the slope parameter $s_k$ will be approximately equal to minus the slope given in (3.4). For expositional simplicity, we shall treat the $s_k$ as if they measured the true log-log slopes between adjacent ranks, but it is important to remember that this equivalence is only as accurate as the range in inequality (3.6).

For an Atlas model, it follows from (3.2) and (3.3) that

$$ s_k = \frac{\sigma^2}{2g}, \quad \text{a.s.}, \tag{3.7} $$

for $k = 1, \dots, n-1$, so the stable distribution of an Atlas model follows a Pareto distribution, at least within the approximation (3.6), and when

$$ \sigma^2 = 2g, \tag{3.8} $$

it follows Zipf's law.

A modest generalization of the Atlas model is a first-order model (Fernholz, 2002; Banner et al., 2005), a system of positive continuous semimartingales $\{X_1, \dots, X_n\}$ with

$$ d\log X_i(t) = g_{r_t(i)}\,dt + G_n\,\mathbb{1}_{\{r_t(i)=n\}}\,dt + \sigma_{r_t(i)}\,dW_i(t), \tag{3.9} $$

where $\sigma^2_1, \dots, \sigma^2_n$ are positive constants, $g_1, \dots, g_n$ are constants satisfying

$$ g_1 + \cdots + g_n \le 0 \quad \text{and} \quad g_1 + \cdots + g_k < 0 \text{ for } k < n, \tag{3.10} $$

$G_n = -(g_1 + \cdots + g_n)$, and $(W_1, \dots, W_n)$ is a Brownian motion. First-order models are asymptotically stable with parameters

$$ \lambda_{k,k+1} = -2\big( g_1 + \cdots + g_k \big), \quad \text{a.s.}, \tag{3.11} $$

and

$$ \sigma^2_{k,k+1} = \sigma^2_k + \sigma^2_{k+1}, \quad \text{a.s.}, \tag{3.12} $$

for $k = 1, \dots, n-1$. A first-order model is simple if there is a positive constant $g$ such that $g_k = -g$, for $k = 1, \dots, n$, and the $\sigma_k$ are nondecreasing, with $0 < \sigma^2_1 \le \cdots \le \sigma^2_n$.

The processes $X_i$ in a first-order model are exchangeable, as they are for Atlas models, so again each $X_i$ asymptotically spends equal time in each rank and hence has zero asymptotic log-drift. Moreover, first-order models have asymptotically exponential gaps, and (3.3) continues to hold in this more general case (Banner et al., 2005). The slope parameters for a first-order model are

$$ s_k = \frac{k\big( \sigma^2_k + \sigma^2_{k+1} \big)}{2\lambda_{k,k+1}} = -\frac{k\big( \sigma^2_k + \sigma^2_{k+1} \big)}{4\big( g_1 + \cdots + g_k \big)}, \quad \text{a.s.}, \tag{3.13} $$

for $k = 1, \dots, n-1$, so the stable distribution of a first-order model is not confined to the class of Pareto distributions.

A further generalization to hybrid Atlas models, systems of processes with growth rates and variance rates that depend both on rank and on name (denoted by the index $i$), was introduced by Ichiba et al. (2011), who showed that these more general systems are also asymptotically stable. In a hybrid Atlas model the processes are not necessarily exchangeable, so processes occupying a given rank need not have the same growth rates and variance rates, and the asymptotic distribution of the gap processes may be mixtures of exponential distributions rather than pure exponentials (Ichiba et al., 2011). Nevertheless, although we can expect (3.3) to hold precisely only for systems in which the growth rates and variance rates are determined by rank alone, in many cases this relation can still provide a reasonably accurate characterization of the invariant distribution of the system.

It is convenient to consider families of Atlas models and first-order models that share the same parameters, and for this purpose we define a first-order family to be a sequence of constants $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$, with

$$ g_1 + \cdots + g_k < 0, \qquad \sigma^2_k > 0, \tag{3.14} $$

for $k \in \mathbb{N}$. A first-order family generates a class of first-order models $\{X_1, \dots, X_n\}$, for $n \in \mathbb{N}$, each defined as in (3.9) with the common parameters $g_k$ and $\sigma_k$, the positive square root of $\sigma^2_k$, and $G_n = -(g_1 + \cdots + g_n)$, for $n \in \mathbb{N}$. A first-order family is simple if all the first-order models generated by it are simple. An Atlas family is a first-order family with $g_k = -g < 0$ and $\sigma^2_k = \sigma^2 > 0$, for $k \in \mathbb{N}$.

For first-order families, the parameters $\sigma^2_{k,k+1}$, $\lambda_{k,k+1}$, and $s_k$ are defined uniquely for $k \in \mathbb{N}$ by (3.2), (3.7), (3.11), (3.12), and (3.13), as the case may be. Let us note that the slope parameters $s_k$ given by (3.13) do not depend on the number of processes in the model as long as $n > k$, so a first-order family defines a unique asymptotic distribution curve. These families will allow us to derive results about asymptotic distribution curves without repeatedly reciting the characteristics of individual Atlas or first-order models. Moreover, we shall only consider values derived from the models in a first-order family when these models are in their stable distribution. Essentially, we need only consider the values that result from the parameters $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$, and we can ignore the models themselves.

A model $\{X_1, \dots, X_n\}$ in a simple first-order family will satisfy

$$ d\log X_i(t) = -g\,dt + ng\,\mathbb{1}_{\{r_t(i)=n\}}\,dt + \sigma_{r_t(i)}\,dW_i(t), $$

where $g > 0$, the $\sigma_k$ are nondecreasing, and $(W_1, \dots, W_n)$ is a Brownian motion. Hence, for a simple first-order family,

$$ \lambda_{k,k+1} = 2kg, \quad \text{a.s.}, \tag{3.15} $$

and

$$ s_k = \frac{\sigma^2_k + \sigma^2_{k+1}}{4g}, \quad \text{a.s.}, \tag{3.16} $$

for $k \in \mathbb{N}$, with the $s_k$ nondecreasing. Hence, in this case the log-log plot of the stable distribution will be concave.

It appears that actual empirical time-dependent systems often behave like simple first-order families, and we analyze one such example below, the capitalizations of U.S. companies (see Figures 1 and 2). The condition that the variance rates increase at the lower ranks seems natural: even in the original observation of Brown (1827) it would seem likely that the water molecules would have buffeted the smaller particles more vigorously than the larger ones.

Suppose that $\{Y_1, \dots, Y_n\}$, for $n > 1$, is an asymptotically stable system of positive continuous semimartingales with rank function $\rho_t \in \Sigma_n$ such that $\rho_t(i) < \rho_t(j)$ if $Y_i(t) > Y_j(t)$ or if $Y_i(t) = Y_j(t)$ and $i < j$. Let $\{Y_{(1)} \ge \cdots \ge Y_{(n)}\}$ be the corresponding rank processes with $Y_{(\rho_t(i))}(t) = Y_i(t)$. As in Definition 2.1, for the processes $Y_1, \dots, Y_n$ we can define the parameters

$$ \boldsymbol{\lambda}_{k,k+1} \triangleq \lim_{t\to\infty} \frac{1}{t} \Lambda^Y_{k,k+1}(t) > 0, \quad \text{a.s.}, \qquad \boldsymbol{\sigma}^2_{k,k+1} \triangleq \lim_{t\to\infty} \frac{1}{t} \big\langle \log Y_{(k)} - \log Y_{(k+1)} \big\rangle_t > 0, \quad \text{a.s.}, \tag{4.1} $$

for $k = 1, \dots, n-1$, and by convention $\boldsymbol{\lambda}_{0,1} = 0$, $\boldsymbol{\sigma}^2_{0,1} = \boldsymbol{\sigma}^2_{1,2}$, and $\boldsymbol{\sigma}^2_{n,n+1} = \boldsymbol{\sigma}^2_{n-1,n}$. Here and below, boldface distinguishes the parameters of the system being approximated from the parameters $\lambda_{k,k+1}$ and $\sigma^2_{k,k+1}$ of the approximating first-order model.

Definition 4.1. (Fernholz, 2002) Let $\{Y_1, \dots, Y_n\}$ be an asymptotically stable system of positive continuous semimartingales with parameters $\boldsymbol{\lambda}_{k,k+1}$ and $\boldsymbol{\sigma}^2_{k,k+1}$, for $k = 1, \dots, n$, defined by (4.1). Then the first-order approximation for $\{Y_1, \dots, Y_n\}$ is the first-order model $\{X_1, \dots, X_n\}$ with

$$ d\log X_i(t) = g_{r_t(i)}\,dt + G_n\,\mathbb{1}_{\{r_t(i)=n\}}\,dt + \sigma_{r_t(i)}\,dW_i(t), \tag{4.2} $$

for $i = 1, \dots, n$, where $r_t \in \Sigma_n$ is the rank function for the $X_i$, the parameters $g_k$ and $\sigma_k$ are defined by

$$ \begin{aligned} g_k &= \tfrac{1}{2}\boldsymbol{\lambda}_{k-1,k} - \tfrac{1}{2}\boldsymbol{\lambda}_{k,k+1}, \text{ for } k = 1, \dots, n-1, \quad \text{and} \quad g_n = \frac{g_1 + \cdots + g_{n-1}}{n-1}, \\ \sigma^2_k &= \tfrac{1}{4}\big( \boldsymbol{\sigma}^2_{k-1,k} + \boldsymbol{\sigma}^2_{k,k+1} \big), \text{ for } k = 1, \dots, n, \end{aligned} \tag{4.3} $$

where $\sigma_k$ is the positive square root of $\sigma^2_k$, $G_n = -(g_1 + \cdots + g_n)$, and $(W_1, \dots, W_n)$ is a Brownian motion.

For the first-order model (4.2) with parameters (4.3), equations (3.11) and (3.12) imply that

$$ \lambda_{k,k+1} = -2\big( g_1 + \cdots + g_k \big) = \boldsymbol{\lambda}_{k,k+1}, \quad \text{a.s.}, \tag{4.4} $$

for $k = 1, \dots, n-1$, and

$$ \sigma^2_{k,k+1} = \sigma^2_k + \sigma^2_{k+1} = \tfrac{1}{4}\big( \boldsymbol{\sigma}^2_{k-1,k} + 2\boldsymbol{\sigma}^2_{k,k+1} + \boldsymbol{\sigma}^2_{k+1,k+2} \big), \quad \text{a.s.}, $$

for $k = 1, \dots, n-1$. Hence, (3.3) becomes

$$ \lim_{T\to\infty} \frac{1}{T} \int_0^T \big( \log X_{(k)}(t) - \log X_{(k+1)}(t) \big)\,dt = \frac{\sigma^2_{k,k+1}}{2\lambda_{k,k+1}} = \frac{\boldsymbol{\sigma}^2_{k-1,k} + 2\boldsymbol{\sigma}^2_{k,k+1} + \boldsymbol{\sigma}^2_{k+1,k+2}}{8\boldsymbol{\lambda}_{k,k+1}}, \quad \text{a.s.}, \tag{4.5} $$

for $k = 1, \dots, n-1$. If the processes $Y_1, \dots, Y_n$ satisfy

$$ \lim_{T\to\infty} \frac{1}{T} \int_0^T \big( \log Y_{(k)}(t) - \log Y_{(k+1)}(t) \big)\,dt \cong \frac{\boldsymbol{\sigma}^2_{k,k+1}}{2\boldsymbol{\lambda}_{k,k+1}}, \tag{4.6} $$

for $k = 1, \dots, n-1$, then the stable distribution (4.5) for the first-order approximation will be a smoothed version of the stable distribution (4.6) for the $Y_i$. The approximation (4.6) will be accurate if the gap series $(\log Y_{(k)}(t) - \log Y_{(k+1)}(t))$ behave like reflected Brownian motion, which has an exponential stable distribution. We can expect this approximation to hold when the behavior of the processes $Y_1, \dots, Y_n$ is determined mostly by rank. The accuracy of this approximation is likely to deteriorate when more idiosyncratic characteristics are present, characteristics that depend on the indices $i$.

Now suppose that we have a time-dependent system $\{Z_1(\tau), Z_2(\tau), \dots\}$ of positive-valued data observed at times $\tau \in \{1, 2, \dots, T\}$. Let

$$ N_\tau = \#\{Z_1(\tau), Z_2(\tau), \dots\} \quad \text{and} \quad N = N_1 \wedge \cdots \wedge N_T, \tag{4.7} $$

where $\#$ denotes cardinality. Let $\rho_\tau : \mathbb{N} \to \mathbb{N}$ be the rank function for the system $\{Z_1(\tau), Z_2(\tau), \dots\}$ such that $\rho_\tau$ restricted to the subset $\{1, \dots, N_\tau\}$ is the permutation with $\rho_\tau(i) < \rho_\tau(j)$ if $Z_i(\tau) > Z_j(\tau)$ or if $Z_i(\tau) = Z_j(\tau)$ and $i < j$, and for $i > N_\tau$, $\rho_\tau(i) = i$. We define the ranked values $\{Z_{(1)}(\tau) \ge Z_{(2)}(\tau) \ge \cdots\}$ such that $Z_{(\rho_\tau(i))}(\tau) = Z_i(\tau)$ for $i \le N_\tau$, and for definiteness we can let $Z_{(k)}(\tau) = 0$ for $k > N_\tau$. With these definitions, we have

$$ Z_{[k]}(\tau) = Z_{(1)}(\tau) + \cdots + Z_{(k)}(\tau), $$

for $k = 1, \dots, N$ and $\tau \in \{1, 2, \dots, T\}$.

We can mimic the time averages (2.4) and (2.5) to define the parameters

$$ \boldsymbol{\lambda}_{k,k+1} \triangleq \frac{2}{T-1} \sum_{\tau=1}^{T-1} \left( \frac{Z_{[k]}(\tau+1) - Z_{[k]}(\tau)}{Z_{(k)}(\tau)} - \sum_{i=1}^N \mathbb{1}_{\{\rho_\tau(i)\le k\}} \frac{Z_i(\tau+1) - Z_i(\tau)}{Z_{(k)}(\tau)} \right), \tag{4.8} $$

and

$$ \boldsymbol{\sigma}^2_{k,k+1} \triangleq \frac{1}{T-1} \sum_{\tau=1}^{T-1} \Big( \big( \log Z_{(k)}(\tau+1) - \log Z_{(k+1)}(\tau+1) \big) - \big( \log Z_{(k)}(\tau) - \log Z_{(k+1)}(\tau) \big) \Big)^2 \tag{4.9} $$

for $k = 1, \dots, N-1$, and by convention $\boldsymbol{\lambda}_{0,1} = 0$ and $\boldsymbol{\sigma}^2_{0,1} = \boldsymbol{\sigma}^2_{1,2}$.

Definition 4.2.

Suppose that $\{Z_1(\tau), Z_2(\tau), \dots\}$ is a time-dependent system of positive-valued data with $N$, $\boldsymbol{\lambda}_{k,k+1}$, and $\boldsymbol{\sigma}^2_{k,k+1}$ defined as in (4.7), (4.8), and (4.9). The first-order approximation of $\{Z_1(\tau), Z_2(\tau), \dots\}$ is the first-order family $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$ with

$$ \begin{aligned} g_k &= \tfrac{1}{2}\boldsymbol{\lambda}_{k-1,k} - \tfrac{1}{2}\boldsymbol{\lambda}_{k,k+1}, \text{ for } k = 1, \dots, N-1, \quad \text{and} \quad g_k = \frac{g_1 + \cdots + g_{N-1}}{N-1}, \text{ for } k \ge N, \\ \sigma^2_k &= \tfrac{1}{4}\big( \boldsymbol{\sigma}^2_{k-1,k} + \boldsymbol{\sigma}^2_{k,k+1} \big), \text{ for } k = 1, \dots, N-1, \quad \text{and} \quad \sigma^2_k = \sigma^2_{N-1}, \text{ for } k \ge N. \end{aligned} \tag{4.10} $$

With this definition the slope parameters $s_k$ given in (3.13) are constant for $k \ge N$. If the data satisfy

$$ \frac{1}{T} \sum_{\tau=1}^{T} \big( \log Z_{(k)}(\tau) - \log Z_{(k+1)}(\tau) \big) \cong \frac{\boldsymbol{\sigma}^2_{k,k+1}}{2\boldsymbol{\lambda}_{k,k+1}}, \tag{4.11} $$

for $k \in \mathbb{N}$, then the stable distribution (4.5) for the first-order approximation will be a smoothed version of the distribution (4.11) for the data $\{Z_{(1)}(\tau), Z_{(2)}(\tau), \dots\}$. As was the case with (4.6), the approximation (4.11) will be accurate if the gap series $(\log Z_{(k)}(\tau) - \log Z_{(k+1)}(\tau))$ are distributed like reflected Brownian motion. We shall say that a system of time-dependent data that satisfies (4.11) is rank-based, and we can expect this approximation to hold when the behavior of the data is determined mostly by rank. We should also note that (4.8), (4.9), and (4.11) are not true asymptotic values, but rather estimates based on limited data.

5 Zipfian systems of time-dependent data

Zipf's law originally referred to the frequency of words in a written language (Zipf, 1935), with the system $\{Z_1(\tau), Z_2(\tau), \dots\}$, where $Z_i(\tau)$ represents the number of occurrences of the $i$th word in a language at time $\tau$. To measure the relative frequency of written words in a language it is not possible to observe all the written words in that language. Instead, the words must be sampled, where a random sample is selected (without replacement), and the frequency versus rank of this random sample is studied. For example, in Wikipedia (2019) 10 million words in each of 30 languages were sampled, and the resulting distribution curves created. If the sample is large enough, the distribution of the sampled data should not differ materially from the distribution of the entire data set, at least for the higher ranks.

An additional advantage that arises from using sampled data is that the total number of data in the sample remains constant over time. The total number of written words that appear in a language is likely to increase over time, and this increase could bias estimates of some parameters. Sampling the data will remove such a trend from the data, since a constant number of words can be sampled at each time. Accordingly, in all cases we shall assume that global trends have been removed from the data, either by sampling or by some other means of detrending.

Since we have assumed that we have a constant sample size or that the data have been detrended somehow, the total count of our sampled data will remain constant, so

$$ Z_1(\tau) + Z_2(\tau) + \cdots = \text{constant}, \tag{5.1} $$

for $\tau \in \{1, 2, \dots, T\}$, where in the case of the Wikipedia words the constant would be 10 million.

Suppose we have a time-dependent system of positive-valued data $\{Z_1(\tau), Z_2(\tau), \dots\}$ and we observe the top $n$ ranks, for $1 < n < N$, with $N$ from (4.7), along with

$$ Z_{[n]}(\tau) = Z_{(1)}(\tau) + \cdots + Z_{(n)}(\tau). $$
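The constant-total assumption (5.1) and the top-rank total $Z_{[n]}(\tau)$ can be sketched in a few lines of Python. This is only an illustration: rescaling each period to a fixed total is one possible means of detrending, chosen here as an assumption, and the helper names `detrend_to_constant_total` and `top_n_total` are hypothetical.

```python
def detrend_to_constant_total(values, total=1.0):
    """Rescale one period's positive values so that they sum to `total`, as in (5.1).

    Proportional rescaling is an assumed detrending method, not one taken
    from the text.
    """
    scale = total / sum(values)
    return [v * scale for v in values]


def top_n_total(values, n):
    """Z_[n]: the combined value of the n largest entries in one period."""
    return sum(sorted(values, reverse=True)[:n])


# Two periods of made-up raw counts with a growing total.
raw_panel = [[50.0, 30.0, 20.0], [90.0, 70.0, 40.0]]
panel = [detrend_to_constant_total(period) for period in raw_panel]
top_two = [top_n_total(period, 2) for period in panel]
```

After rescaling, every period has the same total, so changes in `top_two` reflect shifts in the distribution rather than growth of the whole system.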
Since the total value of the sampled data in (5.1) is constant, it is reasonable to expect the relative change of the top $n$ ranks to satisfy

$$ \frac{Z_{[n]}(\tau+1) - Z_{[n]}(\tau)}{Z_{[n]}(\tau)} \cong 0, \tag{5.2} $$

as $n$ becomes large. This condition is essentially a "conservation of mass" criterion for $\{Z_1(\tau), Z_2(\tau), \dots\}$, and we would like to interpret this in terms of first-order families.

In all that follows, for a first-order family $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$ we shall use the notation $\mathbb{E}_n$ to denote the expectation with respect to the stable distribution for the model $\{X_1, \dots, X_n\}$ defined by that family. The following definition is motivated by the condition (5.2).

Definition 5.1.

The first-order family $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$ is conservative if

$$ \lim_{n\to\infty} \mathbb{E}_n \left[ \frac{dX_{[n]}(t)}{X_{[n]}(t)} \right] = 0. \tag{5.3} $$

For the system $\{Z_1(\tau), Z_2(\tau), \dots\}$ and for $n < N$, the effect of processes that leave the top $n$ ranks over the time interval $[\tau, \tau+1]$ and are replaced by processes from the lower ranks is measured by

$$ Z_{[n]}(\tau+1) - \sum_{i=1}^N \mathbb{1}_{\{\rho_\tau(i)\le n\}} Z_i(\tau+1), $$

or

$$ \big( Z_{[n]}(\tau+1) - Z_{[n]}(\tau) \big) - \left( \sum_{i=1}^N \mathbb{1}_{\{\rho_\tau(i)\le n\}} \big( Z_i(\tau+1) - Z_i(\tau) \big) \right). $$

We expect this replacement effect to be negligible relative to $Z_{[n]}(\tau)$ for large $n$, i.e., that

$$ \frac{1}{T-1} \sum_{\tau=1}^{T-1} \left[ \frac{Z_{[n]}(\tau+1) - Z_{[n]}(\tau)}{Z_{[n]}(\tau)} - \sum_{i=1}^N \mathbb{1}_{\{\rho_\tau(i)\le n\}} \frac{Z_i(\tau+1) - Z_i(\tau)}{Z_{[n]}(\tau)} \right] \cong 0, \tag{5.4} $$

for large enough $n$. In terms of the first-order approximation $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$ to $\{Z_1(\tau), Z_2(\tau), \dots\}$, the corresponding condition will be

$$ \lim_{T\to\infty} \frac{1}{T} \int_0^T \left( \frac{dX_{[n]}(t)}{X_{[n]}(t)} - \sum_{i=1}^N \mathbb{1}_{\{r_t(i)\le n\}} \frac{dX_i(t)}{X_{[n]}(t)} \right) \cong 0, \quad \text{a.s.}, $$

for large enough $n$, where $N > n$ and $\{X_1, \dots, X_N\}$ is a first-order model defined by $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$. By (2.3), this is equivalent to

$$ \lim_{T\to\infty} \frac{1}{T} \int_0^T \frac{X_{(n)}(t)}{2X_{[n]}(t)}\, d\Lambda^X_{n,n+1}(t) \cong 0, \quad \text{a.s.}, $$

for large enough $n$. Since

$$ \lim_{T\to\infty} \frac{1}{T} \int_0^T d\Lambda^X_{n,n+1}(t) = \lambda_{n,n+1} = -2\big( g_1 + \cdots + g_n \big), \quad \text{a.s.}, \tag{5.5} $$

condition (5.4) can be interpreted as

$$ \lim_{T\to\infty} \frac{1}{T} \int_0^T -\big( g_1 + \cdots + g_n \big) \frac{X_{(n)}(t)}{X_{[n]}(t)}\,dt \cong 0, \quad \text{a.s.}, $$

for large enough $n$. For the model $\{X_1, \dots, X_n\}$ defined by the family $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$, $G_n = -(g_1 + \cdots + g_n)$, so the following definition is derived from condition (5.4).

Definition 5.2.

The first-order family $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$ is complete if

$$ \lim_{n\to\infty} \mathbb{E}_n \left[ \frac{G_n X_{(n)}(t)}{X_{[n]}(t)} \right] = 0. \tag{5.6} $$

For an Atlas family or simple first-order family, (5.6) is equivalent to

$$ \lim_{n\to\infty} \mathbb{E}_n \left[ \frac{ng\,X_{(n)}(t)}{X_{[n]}(t)} \right] = 0, \tag{5.7} $$

since $G_n = ng$.

Definition 5.3.

An Atlas family or first-order family is Zipfian if its slope parameters $s_k = 1$, for $k \in \mathbb{N}$. A time-dependent rank-based system is Zipfian if its first-order approximation is Zipfian.
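As a minimal sketch, the slope parameters (3.13) and the Zipfian condition of Definition 5.3 can be checked numerically for a finite truncation of a first-order family. The function names and the tolerance are illustrative assumptions. A first-order family is an infinite sequence, so a finite check can only confirm the condition for the ranks supplied.

```python
def slope_parameters(g, sigma2):
    """Slope parameters s_k from (3.13) for k = 1, ..., len(g) - 1.

    g[k-1] holds g_k and sigma2[k-1] holds sigma_k^2. The partial sums
    g_1 + ... + g_k are assumed negative, as required by (3.14).
    """
    slopes = []
    partial_sum = 0.0
    for k in range(1, len(g)):
        partial_sum += g[k - 1]
        slopes.append(-k * (sigma2[k - 1] + sigma2[k]) / (4.0 * partial_sum))
    return slopes


def is_zipfian(g, sigma2, tol=1e-9):
    """True if every computed s_k equals 1 within `tol` (truncated check only)."""
    return all(abs(s - 1.0) <= tol for s in slope_parameters(g, sigma2))


# Atlas family with g = 0.5 and sigma^2 = 1.0, so that sigma^2 = 2g as in (3.8).
atlas_g = [-0.5] * 5
atlas_sigma2 = [1.0] * 5
atlas_slopes = slope_parameters(atlas_g, atlas_sigma2)
```

For this Atlas family every slope parameter equals $\sigma^2/(2g) = 1$, in agreement with (3.7). Doubling the variance to `[2.0] * 5` gives slope parameters equal to 2, a Pareto distribution that is not Zipfian.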

Proposition 5.4.

An Atlas family is Zipfian if and only if it is conservative and complete.
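The "if" direction of Proposition 5.4 can be motivated by a short heuristic calculation. The sketch below is not the authors' proof, which is in the appendix. It assumes that Itô's rule may be applied to (3.1) and that the stochastic integral term has zero expectation under the stable distribution.

```latex
% Heuristic sketch only; E_n is the expectation under the stable distribution.
\begin{align*}
\frac{dX_i(t)}{X_i(t)}
  &= \Big(\tfrac{\sigma^2}{2} - g\Big)\,dt + ng\,\mathbb{1}_{\{r_t(i)=n\}}\,dt + \sigma\,dW_i(t)
  && \text{(It\^o's rule applied to (3.1))} \\
\frac{dX_{[n]}(t)}{X_{[n]}(t)}
  &= \Big(\tfrac{\sigma^2}{2} - g\Big)\,dt + ng\,\frac{X_{(n)}(t)}{X_{[n]}(t)}\,dt
     + \sigma \sum_{i=1}^n \frac{X_i(t)}{X_{[n]}(t)}\,dW_i(t)
  && \text{((2.3) with } k = n,\ \Lambda^X_{n,n+1} \equiv 0\text{)} \\
\mathbb{E}_n\!\left[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\right]
  &= \Big(\tfrac{\sigma^2}{2} - g\Big)\,dt
     + \mathbb{E}_n\!\left[\frac{ng\,X_{(n)}(t)}{X_{[n]}(t)}\right]dt
  && \text{(the $dW_i$ term has mean zero)}
\end{align*}
% Let n -> infinity: if (5.3) and (5.7) both hold, then sigma^2/2 - g = 0,
% so sigma^2 = 2g and s_k = sigma^2/(2g) = 1 by (3.7).
```

Because $g$ and $\sigma$ do not depend on $n$ in an Atlas family, conservation and completeness together force $\sigma^2 = 2g$, which is the Zipfian condition (3.8).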

Since many empirical distributions are not Zipfian but rather quasi-Zipfian, we need to formalize this concept for first-order models.

Definition 5.5.

A first-order family is quasi-Zipfian if its slope parameters $s_k$ are nondecreasing with

$$ s_1 \le 1 \quad \text{and} \quad \lim_{k\to\infty} s_k \ge 1, \tag{5.8} $$

where this limit includes divergence to infinity. A time-dependent rank-based system is quasi-Zipfian if its first-order approximation is quasi-Zipfian.

Because the slope parameters $s_k$ are approximately equal to minus the slope of a log-log plot of size versus rank, Definition 5.5 implies that a time-dependent rank-based system will be quasi-Zipfian if this log-log plot is concave with slope not steeper than $-1$ at the highest ranks and not flatter than $-1$ at the lowest ranks.

Proposition 5.6.

If a simple first-order family is conservative, complete, and satisfies

$$ \lim_{n\to\infty} \mathbb{E}_n \left[ \frac{X_{(1)}(t)}{X_{[n]}(t)} \right] \le \frac{1}{2}, \tag{5.9} $$

then it is quasi-Zipfian.

We show in Example A.1 below that a conservative and complete first-order family $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$ with $g_k = -g < 0$, for $k \in \mathbb{N}$, can have a Pareto distribution with log-log slope steeper than $-1$ if the $\sigma_k$ are not nondecreasing.

Here we apply the methods we developed above to an actual time-dependent rank-based system. We also discuss a number of other such systems, as well as other approaches to time-dependent rank-based systems.

Example 6.1.

Market capitalization of companies.

The market capitalization of U.S. companies was studied as early as Simon and Bonini (1958), and here we follow the methodology of Fernholz (2002). The capitalization of a company is defined as the price of the company's stock multiplied by the number of shares outstanding. Ample data are available for stock prices, and this allows us to estimate the first-order parameters we introduced in the previous sections.

Figure 1 shows the smoothed first-order parameters $\sigma^2_k$ and $-g_k$ for the U.S. capital distribution for the 10-year period from January 1990 to December 1999. The capitalization data we used were from the monthly stock database of the Center for Research in Securities Prices at the University of Chicago. The market we consider consists of the stocks traded on the New York Stock Exchange, the American Stock Exchange, and the NASDAQ Stock Market, after the removal of all Real Estate Investment Trusts, all closed-end funds, and those American Depositary Receipts not included in the S&P 500 Index. The parameters in Figure 1 correspond to the 5000 stocks with the highest capitalizations each month. The first-order parameters $g_k$ and $\sigma^2_k$ were calculated as in (4.3) from the parameters $\boldsymbol{\lambda}_{k,k+1}$ and $\boldsymbol{\sigma}^2_{k,k+1}$, and then smoothed by convolution with a Gaussian kernel with ± .16 standard deviations spanning 100 months on the horizontal axis, with reflection at the ends of the data.

We see in Figure 1 that the values of the parameters $-g_k$ are relatively constant compared to the parameters $\sigma^2_k$, which increase almost linearly with rank. The near-constant $-g_k$ suggest that the first-order approximation will generate a simple first-order family. In Figure 2, the distribution curve for the capitalizations is represented by the black curve, which represents the average of the year-end capital distributions for the ten years spanned by the data. The broken red curve is the first-order approximation of the distribution following (4.5). The two curves are quite close, and this indicates that the time-dependent system of company capitalizations is mostly rank-based. The black dot on the curve between ranks 100 and 500 is the point at which the log-log slope of the tangent to the curve is $-1$, so this is a quasi-Zipfian distribution, consistent with Proposition 5.6. Note that if we had considered only the top 100 companies, the completeness condition, Definition 5.2, would have failed, as we would expect for an incomplete distribution.
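The estimation steps used in this example can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: every item is assumed to be present in every period (so entry and exit of names are ignored), the function names are hypothetical, and the smoothing bandwidth is a free parameter rather than the value used for Figure 1.

```python
import math


def rank_order(values):
    """Indices sorted by descending value, ties broken by the lower index."""
    return sorted(range(len(values)), key=lambda i: (-values[i], i))


def estimate_rank_parameters(Z):
    """Estimate lambda_{k,k+1} as in (4.8) and sigma^2_{k,k+1} as in (4.9).

    Z[tau][i] is the positive value of item i in period tau. Returns two
    lists whose entry k-1 corresponds to k = 1, ..., N-1.
    """
    T, N = len(Z), len(Z[0])
    lam = [0.0] * (N - 1)
    sig2 = [0.0] * (N - 1)
    for tau in range(T - 1):
        now, nxt = Z[tau], Z[tau + 1]
        order_now = rank_order(now)
        ranked_now = [now[i] for i in order_now]
        ranked_nxt = sorted(nxt, reverse=True)
        top_now = top_nxt = same_names_nxt = 0.0
        for k in range(1, N):
            top_now += ranked_now[k - 1]             # Z_[k](tau)
            top_nxt += ranked_nxt[k - 1]             # Z_[k](tau + 1)
            same_names_nxt += nxt[order_now[k - 1]]  # top-k names at tau, valued at tau + 1
            change_top = top_nxt - top_now
            change_names = same_names_nxt - top_now
            lam[k - 1] += (change_top - change_names) / ranked_now[k - 1]
            gap_now = math.log(ranked_now[k - 1] / ranked_now[k])
            gap_nxt = math.log(ranked_nxt[k - 1] / ranked_nxt[k])
            sig2[k - 1] += (gap_nxt - gap_now) ** 2
    return [2.0 * x / (T - 1) for x in lam], [x / (T - 1) for x in sig2]


def first_order_parameters(lam, sig2):
    """g_k and sigma_k^2 for k = 1, ..., N-1, following (4.10)."""
    lam_ext = [0.0] + lam        # convention lambda_{0,1} = 0
    sig2_ext = [sig2[0]] + sig2  # convention sigma^2_{0,1} = sigma^2_{1,2}
    g = [0.5 * lam_ext[k - 1] - 0.5 * lam_ext[k] for k in range(1, len(lam_ext))]
    s2 = [0.25 * (sig2_ext[k - 1] + sig2_ext[k]) for k in range(1, len(sig2_ext))]
    return g, s2


def gaussian_smooth(values, bandwidth):
    """Smooth across ranks with a normalized Gaussian kernel, reflecting at both ends."""
    n = len(values)
    half = min(n - 1, int(math.ceil(3 * bandwidth)))
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (j / bandwidth) ** 2) for j in offsets]
    total = sum(weights)
    smoothed = []
    for k in range(n):
        acc = 0.0
        for j, w in zip(offsets, weights):
            idx = k + j
            if idx < 0:
                idx = -idx - 1          # reflect about the left end
            elif idx >= n:
                idx = 2 * n - idx - 1   # reflect about the right end
            acc += w * values[idx]
        smoothed.append(acc / total)
    return smoothed


# Tiny made-up panel: the two largest items swap places in each period.
panel = [[5.0, 3.0, 1.0], [3.0, 5.0, 1.0], [5.0, 3.0, 1.0]]
lam, sig2 = estimate_rank_parameters(panel)
g, s2 = first_order_parameters(lam, sig2)
```

In this toy panel the ranked values never change, so the estimated gap variances are zero, while the repeated swap between the top two names produces a positive $\boldsymbol{\lambda}_{1,2}$. This is the turnover that the local time term in (2.3) is meant to capture.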

Example 6.2.

Frequency of written words.

Word frequency is the origin of Zipf's law (Zipf, 1935), but testing our methodology with word-frequency could be difficult. Ideally, we would like to construct a first-order approximation for the data and compare the first-order distribution to that of the original data. However, the parameters $\boldsymbol{\lambda}_{k,k+1}$ and $\boldsymbol{\sigma}^2_{k,k+1}$ for the top-ranked words in a language are likely to be difficult to estimate over any reasonable time frame, since the top-ranked words probably seldom change ranks. Nevertheless, while the top ranks may require centuries of data for accurate estimates, the lower ranks could be amenable to analysis similar to that which we carried out for company capitalizations. Moreover, it might be possible to combine, for example, all the Indo-European languages and generate accurate estimates of the $\boldsymbol{\lambda}_{k,k+1}$ and $\boldsymbol{\sigma}^2_{k,k+1}$ even for the top ranks of the combined data.

We can see from the remarkable chart in Wikipedia (2019) that the log-log plots for 30 different languages are (almost) straight. Actually, these plots are slightly concave, or quasi-Zipfian in nature. It is possible that this slight curvature is due to sampling error at the lower ranks, which would raise the variances and steepen the slope, but this would have to be determined by studying the actual data.

Example 6.3.

Random growth processes.

Economists have traditionally used random growth processes to model time-dependent systems with quasi-Zipfian distributions. For example, these processes were used by Gabaix (1999) to model the distribution of city populations and by Blanchet et al. (2017) to construct a piecewise approximation to the distribution curves for the income and wealth of U.S. households. A random growth process is an Itô process of the form

$$\frac{dX(t)}{X(t)} = \mu(X(t))\,dt + \sigma(X(t))\,dW(t), \tag{6.1}$$

where $W$ is Brownian motion and $\mu$ and $\sigma$ are well-behaved real-valued functions. We can convert this into logarithmic form by Itô's rule, in which case

$$d\log X(t) = \Big(\mu(X(t)) - \frac{\sigma^2(X(t))}{2}\Big)\,dt + \sigma(X(t))\,dW(t), \quad \text{a.s.} \tag{6.2}$$

We shall assume that this equation has at least a weak solution with $X(t) > 0$, a.s., and that the solution has a stable distribution.

Let us construct $n$ i.i.d. copies $X_1, \dots, X_n$ of $X$, all defined by (6.1) or, equivalently, by (6.2), and assume that the $X_i$ are all in their common stable distribution. Let us assume that the $X_i$ spend no local time at triple points, so we can define the rank processes and (2.1) and (2.3) will be valid. If the system is asymptotically stable we can calculate the corresponding rank-based growth rates $g_k$, but if we know the stable distribution of the original process (6.1), then there is a simpler way to proceed.

If we know the common stable distribution of the $X_i$, then we can calculate expectations under this stable distribution and let

$$g_k = E\Big[\mu\big(X_{(k)}(t)\big) - \frac{\sigma^2\big(X_{(k)}(t)\big)}{2}\Big] \quad\text{and}\quad \sigma^2_k = E\big[\sigma^2\big(X_{(k)}(t)\big)\big], \tag{6.3}$$

for $k = 1, \dots, n$. Under appropriate regularity conditions on $\mu$ and $\sigma$, the expectations here will be equal to the asymptotic time averages of the functions. Since the $X_i$ are stable, the geometric mean $\big(X_1 X_2 \cdots X_n\big)^{1/n} = \big(X_{(1)} X_{(2)} \cdots X_{(n)}\big)^{1/n}$ will also be stable, so

$$\big(g_1 + \cdots + g_n\big)\,t = E\big[\log\big(X_{(1)}(t) \cdots X_{(n)}(t)\big) - \log\big(X_{(1)}(0) \cdots X_{(n)}(0)\big)\big] = 0.$$

Hence,

$$g_1 + \cdots + g_n = 0, \quad\text{with}\quad g_1 + \cdots + g_k < 0, \ \text{for } k < n, \tag{6.4}$$

so the $g_k$ and $\sigma_k$ define the first-order model

$$d\log Y_i(t) = g_{r_t(i)}\,dt + \sigma_{r_t(i)}\,dW_i(t), \tag{6.5}$$

where $(W_1, \dots, W_n)$ is $n$-dimensional Brownian motion. In this case, $G_n = 0$.

If the functions $\mu$ and $\sigma$ in (6.1) are smooth enough, then the system is likely to be rank-based and the stable distributions of the gap processes $(\log X_{(k)} - \log X_{(k+1)})$ will be (close to) exponential. In this case the stable distribution of the first-order model (6.5) will be close to that of the original system (6.1).
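A minimal Euler-scheme sketch of a rank-based system of the form (6.5) is given below. It is an illustration only: the function name `simulate_first_order`, the time step, and the tie-breaking rule are assumptions, the parameters are hypothetical rather than computed from (6.3), and a plain Euler discretization ignores the local-time subtleties that arise when ranks change.

```python
import math
import random


def simulate_first_order(g, sigma, log_y0, dt, n_steps, seed=0):
    """Euler scheme for d log Y_i = g_{r_t(i)} dt + sigma_{r_t(i)} dW_i.

    g, sigma : lists indexed by rank (position 0 is rank 1, the largest)
    log_y0   : initial values of log Y_i, indexed by name i
    Returns the list of log Y_i after n_steps steps of size dt.
    """
    rng = random.Random(seed)
    log_y = list(log_y0)
    n = len(log_y)
    for _ in range(n_steps):
        # Names ordered from largest to smallest; ties broken by index.
        order = sorted(range(n), key=lambda i: (-log_y[i], i))
        for rank0, i in enumerate(order):
            dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
            log_y[i] += g[rank0] * dt + sigma[rank0] * dw
    return log_y


# Hypothetical parameters satisfying g_1 + ... + g_n = 0 as in (6.4).
g = [-0.1, 0.0, 0.1]
sigma = [0.2, 0.3, 0.4]
final = simulate_first_order(g, sigma, [0.0, 0.0, 0.0], dt=0.01, n_steps=1000)
print(sorted(final, reverse=True))
```

Averaging the ranked gaps of such a simulation over a long run is one way to compare the simulated stable distribution with the exponential gaps mentioned above, although the discretization error would have to be checked separately.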
More conditions are required to ensure that this stable distribution be quasi-Zipfian, and to achieve a true Zipfian distribution, a lower reflecting barrier or other equivalent device must be included in the model (Gabaix, 2009).

Example 6.4.

Population of cities.

The distribution of city populations is a prominent example of Zipf's law in social science. However, as the comprehensive cross-country investigation of Soo (2005) shows, city size distributions in most countries are not Zipfian but rather quasi-Zipfian. Gabaix (1999) hypothesized that the quasi-Zipfian distribution of U.S. city size was caused by higher population variances at the lower ranks, consistent with Proposition 5.6. Which of the deviations from Zipf's law uncovered by Soo (2005) are due to population variances that decrease with increasing city size remains an open question.

There is another phenomenon that occurs with city size distributions. Suppose that rather than studying a large country like the U.S. we consider instead the populations of the cities in New York State. According to the 2010 U.S. census, the largest city, New York City, had a population of 8,175,133, while the second largest, Buffalo, had only 261,310, so this distribution is non-Zipfian. The corresponding population of New York State was 19,378,102, so hypothesis (5.9) of Proposition 5.6 is satisfied, but nevertheless the conclusion of the proposition fails. This calls for an explanation, and we conjecture that while the populations of the cities of New York State constitute a time-dependent system, this system is not rank-based. The population of New York City is not determined merely by its rank among New York State cities, but is highly city-specific in nature. Hence, we cannot expect the stable distribution for the gap process between New York City and second-ranked Buffalo to be exponential, and we cannot expect the distribution of the system to be quasi-Zipfian.
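The arithmetic behind the New York State example can be checked directly. The sketch below uses only the three census figures quoted above. Treating the state population as the total in the weight of the largest city follows the text's reading of hypothesis (5.9), and the variable names are illustrative.

```python
import math

# 2010 U.S. census figures quoted in the text.
nyc = 8_175_133
buffalo = 261_310
state_total = 19_378_102

# Weight of the largest city relative to the state population.
largest_weight = nyc / state_total

# Ratio of the two largest cities; a Zipfian distribution would give about 2.
ratio = nyc / buffalo

# Log-log slope of the line through the points at ranks 1 and 2.
slope_12 = (math.log(buffalo) - math.log(nyc)) / (math.log(2) - math.log(1))

print(round(largest_weight, 4), round(ratio, 2), round(slope_12, 2))
```

The largest weight comes out at about 0.42, below one half, while the population ratio between ranks 1 and 2 is about 31, so the two-point log-log slope is close to $-5$ rather than $-1$.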

Example 6.5.

Assets of banks.

Fernholz and Koch (2016) show that the distributions of assets held by U.S. bank holding companies, commercial banks, and savings and loan associations are all quasi-Zipfian. This is true despite the fact that these distributions have undergone significant changes over the past few decades. However, as Fernholz and Koch (2017) show, the first-order approximations of these time-dependent rank-based systems generally do not satisfy the hypotheses of Proposition 5.6, since the parameters $\sigma^2_{k,k+1}$ are, in most cases, lower for higher values of $k$. Nonetheless, the parameters $\lambda_{k,k+1}$ vary with $k$ in such a way as to generate quasi-Zipfian distributions.

Example 6.6.

Employees of firms.

Axtell (2001) shows that the distribution of employees of U.S. firms is close to Zipfian, with only slight concavity. A number of empirical analyses have shown that for all but the tiniest firms, employment growth rates of U.S. firms do not vary with firm size (Neumark et al., 2011). This observation together with the slight concavity demonstrated by Axtell (2001) suggests that the first-order approximation of U.S. firm employees is simple, which would explain its quasi-Zipfian nature.

We have shown that the stable distribution of an Atlas family will follow Zipf's law if and only if two natural conditions, conservation and completeness, are satisfied. We have also shown that a simple first-order family will have a stable distribution that is quasi-Zipfian if the family is conservative and complete, provided that the largest weight is not greater than one half. Since many systems of time-dependent rank-based empirical data can be approximated by Atlas families or simple first-order families, our results offer an explanation for the universality of Zipf's law for these systems.
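The Zipf criterion for Atlas models can be illustrated with a short computation. The sketch assumes the usual description of the stable distribution of an Atlas model with parameters $g$ and $\sigma^2$, in which the gaps $\log X_{(k)} - \log X_{(k+1)}$ are exponential with mean $\sigma^2/(2gk)$. This general form is an assumption taken from the Atlas-model literature; the present text states it only in the Zipfian case $\sigma^2/2g = 1$, where the mean is $k^{-1}$. The function name and parameter values are hypothetical.

```python
import math


def expected_loglog_slope(k, g, sigma2):
    """Expected log-log slope between ranks k and k+1 of an Atlas model.

    Assumes the gap log X_(k) - log X_(k+1) has mean sigma2 / (2 g k).
    """
    mean_gap = sigma2 / (2.0 * g * k)
    return -mean_gap / (math.log(k + 1) - math.log(k))


# Hypothetical parameter pairs: sigma^2 / 2g = 1 (Zipfian) and 2 (steeper Pareto).
for g, sigma2 in [(1.0, 2.0), (1.0, 4.0)]:
    slopes = [expected_loglog_slope(k, g, sigma2) for k in (10, 100, 1000)]
    print(sigma2 / (2.0 * g), [round(s, 4) for s in slopes])
```

Under this assumption the slope approaches $-\sigma^2/2g$ as the rank grows, so it tends to $-1$ exactly when $\sigma^2 = 2g$.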

Proofs and examples

Proof of Lemma 2.2.

Suppose that the rank processes $X_{(k)}$ satisfy (2.1), so we have

$$d\log X_{(k)}(t) = \sum_{i=1}^{n} \mathbf{1}_{\{r_t(i)=k\}}\,d\log X_i(t) + \frac{1}{2}\,d\Lambda^X_{k,k+1}(t) - \frac{1}{2}\,d\Lambda^X_{k-1,k}(t), \quad \text{a.s.},$$

for $k = 1, \dots, n$. By Itô's rule this is equivalent to

$$\begin{aligned}
\frac{dX_{(k)}(t)}{X_{(k)}(t)} &= \sum_{i=1}^{n} \mathbf{1}_{\{r_t(i)=k\}}\,\frac{dX_i(t)}{X_i(t)} + \frac{1}{2}\,d\Lambda^X_{k,k+1}(t) - \frac{1}{2}\,d\Lambda^X_{k-1,k}(t) \\
&= \sum_{i=1}^{n} \mathbf{1}_{\{r_t(i)=k\}}\,\frac{dX_i(t)}{X_{(k)}(t)} + \frac{1}{2}\,d\Lambda^X_{k,k+1}(t) - \frac{1}{2}\,d\Lambda^X_{k-1,k}(t), \quad \text{a.s.},
\end{aligned}$$

for $k = 1, \dots, n$. From this we have

$$\begin{aligned}
dX_{(k)}(t) &= \sum_{i=1}^{n} \mathbf{1}_{\{r_t(i)=k\}}\,dX_i(t) + \frac{1}{2}\,X_{(k)}(t)\,d\Lambda^X_{k,k+1}(t) - \frac{1}{2}\,X_{(k)}(t)\,d\Lambda^X_{k-1,k}(t) \\
&= \sum_{i=1}^{n} \mathbf{1}_{\{r_t(i)=k\}}\,dX_i(t) + \frac{1}{2}\,X_{(k)}(t)\,d\Lambda^X_{k,k+1}(t) - \frac{1}{2}\,X_{(k-1)}(t)\,d\Lambda^X_{k-1,k}(t), \quad \text{a.s.},
\end{aligned}$$

for $k = 1, \dots, n$, since the support of $d\Lambda^X_{k-1,k}$ is contained in the set $\big\{t : \log X_{(k-1)}(t) = \log X_{(k)}(t)\big\}$. Now we can add up $dX_{(1)}(t) + \cdots + dX_{(k)}(t) = dX_{[k]}(t)$, in which the local-time terms telescope, and we have

$$dX_{[k]}(t) = \sum_{i=1}^{n} \mathbf{1}_{\{r_t(i)\le k\}}\,dX_i(t) + \frac{1}{2}\,X_{(k)}(t)\,d\Lambda^X_{k,k+1}(t), \quad \text{a.s.},$$

for $k = 1, \dots, n$.

Proof of Proposition 5.4.

For an Atlas model $\{X_1, \dots, X_n\}$ with parameters $g > 0$, $\sigma^2 > 0$, Itô's rule implies that

$$dX_i(t) = \Big(\frac{\sigma^2}{2} - g + ng\,\mathbf{1}_{\{r_t(i)=n\}}\Big) X_i(t)\,dt + \sigma X_i(t)\,dW_i(t), \quad \text{a.s.},$$

for all $i = 1, \dots, n$. Hence,

$$dX_{[n]}(t) = \Big(\frac{\sigma^2}{2} - g\Big) X_{[n]}(t)\,dt + X_{[n]}(t)\,dM(t) + ng X_{(n)}(t)\,dt, \quad \text{a.s.},$$

where $M$ is a local martingale incorporating all of the terms $\sigma\,dW_i(t)$. From this we have

$$\frac{dX_{[n]}(t)}{X_{[n]}(t)} = \Big(\frac{\sigma^2}{2} - g\Big)\,dt + dM(t) + \frac{ng X_{(n)}(t)}{X_{[n]}(t)}\,dt, \quad \text{a.s.},$$

so

$$E_n\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = \Big(\frac{\sigma^2}{2} - g\Big)\,dt + E_n\Big[\frac{ng X_{(n)}(t)}{X_{[n]}(t)}\Big]\,dt, \tag{A.1}$$

and, letting $n \to \infty$, it follows from conservation (5.3) and completeness (5.7) that $\sigma^2/2g = 1$. Hence, conservation and completeness imply that the Atlas family will be Zipfian.

If the Atlas family is Zipfian, then $\sigma^2/2g = 1$, and with the Atlas model $\{X_1, \dots, X_n\}$ in its stable distribution, the random variables $\log\big(X_{(k)}(t)/X_{(k+1)}(t)\big)$ will be independent and exponentially distributed with mean $k^{-1}$, for $k = 1, \dots, n-1$. Hence,

$$\log\big(X_{(n)}(t)/X_{(1)}(t)\big) = \sum_{k=2}^{n} \log\big(X_{(k)}(t)/X_{(k-1)}(t)\big) = -O(\log n), \quad \text{a.s.},$$

as $n \to \infty$ (Etemadi, 1983), so $X_{(n)}(t)/X_{(1)}(t) = O(1/n)$, a.s., as $n \to \infty$. Hence,

$$\frac{X_{[n]}(t)}{X_{(1)}(t)} = \sum_{k=1}^{n} \frac{X_{(k)}(t)}{X_{(1)}(t)} = O(\log n), \quad \text{a.s.},$$

as $n \to \infty$, and

$$\lim_{n\to\infty} \frac{n X_{(n)}(t)}{X_{[n]}(t)} = 0, \quad \text{a.s.}$$

Since $n X_{(n)}(t)/X_{[n]}(t) \le 1$, we can invoke bounded convergence, so

$$\lim_{n\to\infty} E_n\Big[\frac{ng X_{(n)}(t)}{X_{[n]}(t)}\Big] = 0, \tag{A.2}$$

and the family will be complete. Since $\sigma^2/2g = 1$ and (A.2) holds, the right-hand side of (A.1) converges to zero, so the left-hand side must also converge to zero, and the family will be conservative.

Remark.

In (A.2) the infinite series

$$\sum_{k=1}^{\infty} E_n\big[X_{(k)}(t)\big]$$

would not converge; however, this does not affect us since we consider only finite portions of the series.

Proof of Proposition 5.6.

For a first-order model $\{X_1, \dots, X_n\}$ with parameters $g_1 = \cdots = g_n = g > 0$ and $0 < \sigma^2_1 \le \cdots \le \sigma^2_n$, Itô's rule implies that

$$dX_i(t) = \Big(\frac{\sigma^2_{r_t(i)}}{2} - g + ng\,\mathbf{1}_{\{r_t(i)=n\}}\Big) X_i(t)\,dt + \sigma_{r_t(i)} X_i(t)\,dW_i(t), \quad \text{a.s.},$$

for $i = 1, \dots, n$. Hence,

$$dX_{[n]}(t) = \sum_{k=1}^{n} X_{(k)}(t)\Big(\frac{\sigma^2_k}{2} - g\Big)\,dt + dM(t) + ng X_{(n)}(t)\,dt, \quad \text{a.s.},$$

where $M$ is a local martingale incorporating all of the terms $\sigma_{r_t(i)} X_i(t)\,dW_i(t)$, so

$$E_n\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = \Big(\sum_{k=1}^{n} E_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma^2_k}{2} - g\Big)\,dt + E_n\Big[\frac{ng X_{(n)}(t)}{X_{[n]}(t)}\Big]\,dt. \tag{A.3}$$

If the system is conservative (5.3) and complete (5.7), then as $n$ tends to infinity the first and last terms of (A.3) vanish and we have

$$\lim_{n\to\infty} \sum_{k=1}^{n} E_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma^2_k}{2g} = 1. \tag{A.4}$$

Let us now show that (5.9) implies that $s_1 \le 1$. Since the $\sigma_k$ are nondecreasing, (A.4) implies that

$$\begin{aligned}
1 &\ge \lim_{n\to\infty} E_n\Big[\frac{X_{(1)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma^2_1}{2g} + \lim_{n\to\infty} \sum_{k=2}^{n} E_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma^2_2}{2g} \\
&= \lim_{n\to\infty} E_n\Big[\frac{X_{(1)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma^2_1}{2g} + \Big(1 - \lim_{n\to\infty} E_n\Big[\frac{X_{(1)}(t)}{X_{[n]}(t)}\Big]\Big)\frac{\sigma^2_2}{2g} \\
&\ge \frac{1}{2}\,\frac{\sigma^2_1}{2g} + \frac{1}{2}\,\frac{\sigma^2_2}{2g} = s_1,
\end{aligned}$$

so $s_1 \le 1$. Next we show that either $\lim_{k\to\infty} s_k \ge 1$ or the $s_k$ diverge to infinity. Since the $\sigma_k$ are nondecreasing, as $k$ tends to infinity they must either converge to a finite value or diverge to infinity. If the $\sigma_k$ diverge to infinity, the same will be true for the $s_k$. If $\lim_{k\to\infty} \sigma^2_k = \sigma^2$ then $\lim_{k\to\infty} s_k = \sigma^2/2g$, and since the $\sigma_k$ are nondecreasing,

$$1 = \lim_{n\to\infty} \sum_{k=1}^{n} E_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma^2_k}{2g} \le \frac{\sigma^2}{2g}.$$

It follows that $\lim_{k\to\infty} s_k \ge 1$.

Example A.1.

A conservative and complete first-order family with a non-Zipfian Pareto distribution.

Consider the first-order family $\{g_k, \sigma^2_k\}_{k\in\mathbb{N}}$ such that

$$g_k = g, \qquad \sigma^2_{2k-1} = \rho^2, \qquad \sigma^2_{2k} = 2\sigma^2 - \rho^2, \tag{A.5}$$

where $g > 0$, $\sigma^2 > 2g$, and $0 < \rho^2 < 2\sigma^2$. In this case, $\sigma^2_k + \sigma^2_{k+1} = 2\sigma^2$, so, according to (3.16), the slope parameters $s_k$ for this model will be

$$s_k = \frac{\sigma^2}{2g},$$

for $k \in \mathbb{N}$. Hence, the log-log plot of the stable distribution for this family is a straight line with slope $-\sigma^2/2g < -1$, i.e., a Pareto distribution.

Because this first-order model has the same stable distribution as an Atlas model with $\sigma^2/2g > 1$, we can use an argument similar to (A.2) to conclude that $E_n\big[ng X_{(n)}(t)/X_{[n]}(t)\big] \to 0$ as $n \to \infty$, so the family is complete. In addition, we have

$$E_n\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = \Big(\sum_{k=1}^{n} E_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma^2_k}{2} - g\Big)\,dt + E_n\Big[\frac{ng X_{(n)}(t)}{X_{[n]}(t)}\Big]\,dt,$$

so if we can show that for some choice of $\rho$,

$$\lim_{n\to\infty} \sum_{k=1}^{n} E_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma^2_k}{2} = g, \tag{A.6}$$

then the conservation condition,

$$\lim_{n\to\infty} E_n\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = 0,$$

will also hold for the family with that value of $\rho$.

The expectations in (A.6) are invariant with respect to the choice of $\rho$, so by (A.5) the sum in (A.6) will be continuous in $\rho$. If we evaluate this sum at $\rho = 0$, only the even ranks will appear, so for large enough $\sigma^2/2g \gg 1$,

$$\lim_{n\to\infty} \sum_{k=1}^{n} E_n\Big[\frac{X_{(2k)}(t)}{X_{[2n]}(t)}\Big]\frac{\sigma^2_{2k}}{2} = O\Big(\sigma^2 \lim_{n\to\infty} \sum_{k=1}^{n} (2k)^{-\sigma^2/2g}\Big) = O\Big(\sigma^2\, 2^{-\sigma^2/2g} \lim_{n\to\infty} \sum_{k=1}^{n} k^{-\sigma^2/2g}\Big) = O\big(\sigma^2\, 2^{-\sigma^2/2g}\big),$$

which tends to zero as $\sigma^2$ tends to infinity. Hence, for large enough $\sigma^2/2g \gg 1$ and $\rho \cong 0$ the sum in (A.6) will be close to zero by continuity. For large enough $\sigma^2/2g \gg 1$, the log-log slope $-\sigma^2/2g$ of the distribution can become arbitrarily steep, so $E_n\big[X_{(1)}(t)/X_{[n]}(t)\big] \cong 1$. In this case, for $\rho^2 \cong 2\sigma^2$,

$$\lim_{n\to\infty} \sum_{k=1}^{n} E_n\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma^2_k}{2} \cong \frac{\sigma^2_1}{2} = \frac{\rho^2}{2} \cong \sigma^2 \gg g > 0.$$

By continuity, for some $\rho^2 \in (0, 2\sigma^2)$, (A.6) will hold, so the family will be conservative. Hence, for that value of $\rho$ the family will be conservative and complete, but it is neither Zipfian nor quasi-Zipfian.

References

Atkinson, A. B., T. Piketty, and E. Saez (2011, March). Top incomes in the long run of history. Journal of Economic Literature 49 (1), 3–71.

Axtell, R. (2001, September). Zipf distribution of U.S. firm sizes. Science 293 (5536), 1818–1820.

Bak, P. (1996). How Nature Works. New York: Springer-Verlag.

Banner, A., R. Fernholz, and I. Karatzas (2005). Atlas models of equity markets. Annals of Applied Probability 15 (4), 2296–2330.

Banner, A. and R. Ghomrasni (2008, July). Local times of ranked continuous semimartingales. Stochastic Processes and their Applications 118 (7), 1244–1253.

Blanchet, T., J. Fournier, and T. Piketty (2017, March). Generalized Pareto curves: Theory and applications. Technical report, World Wealth & Income Database.

Brown, R. (1827). Brownian motion. Unpublished experiment.

Etemadi, N. (1983). Stability of sums of weighted nonnegative random variables. Journal of Multivariate Analysis 13, 361–365.

Fernholz, E. R. (2002). Stochastic Portfolio Theory. New York: Springer-Verlag.

Fernholz, R. and I. Karatzas (2009). Stochastic portfolio theory: an overview. In A. Bensoussan and Q. Zhang (Eds.), Mathematical Modelling and Numerical Methods in Finance: Special Volume, Handbook of Numerical Analysis, Volume XV, pp. 89–168. Amsterdam: North-Holland.

Fernholz, R. T. and C. Koch (2016, February). Why are big banks getting bigger? Federal Reserve Bank of Dallas Working Paper 1604.

Fernholz, R. T. and C. Koch (2017, May). Big banks, idiosyncratic volatility, and systemic risk. American Economic Review: Papers and Proceedings 107 (5), 603–607.

Gabaix, X. (1999, August). Zipf's law for cities: An explanation. Quarterly Journal of Economics 114 (3), 739–767.

Gabaix, X. (2009, May). Power laws in economics and finance. Annual Review of Economics 1 (1), 255–294.

Harrison, J. M. and R. J. Williams (1987). Multidimensional reflected Brownian motions having exponential stationary distributions. The Annals of Probability 15 (1), 115–137.

Ichiba, T., V. Papathanakos, A. Banner, I. Karatzas, and R. Fernholz (2011). Hybrid Atlas models. Annals of Applied Probability 21, 609–644.

Neumark, D., B. Wall, and J. Zhang (2011, February). Do small businesses create more jobs? New evidence for the United States from the National Establishment time series. Review of Economics and Statistics 93 (1), 16–29.

Newman, M. E. J. (2005, September-October). Power laws, Pareto distributions, and Zipf's law. Contemporary Physics 46 (5), 323–351.

Simon, H. and C. Bonini (1958). The size distribution of business firms. American Economic Review 48, 607–617.

Simon, H. A. (1955, December). On a class of skew distribution functions. Biometrika 42 (3/4), 425–440.

Soo, K. T. (2005, May). Zipf's law for cities: A cross-country investigation. Regional Science and Urban Economics 35 (3), 239–263.

Tao, T. (2012). E pluribus unum: From complexity, universality. Daedalus 141 (3), 23–34.

Wikipedia (2019). Zipf's law. https://en.wikipedia.org/wiki/Zipf%27s_law.

Zipf, G. (1935). The Psychology of Language: An Introduction to Dynamic Philology. Cambridge, MA: M.I.T. Press.

Figure 1: U.S. capital distribution first-order parameters (smoothed): $\sigma^2_k$ (black), $-g_k$ (red, broken). Axes: Rank; Growth and variance rates.

[Plot with axes: Rank; Weight.]