A Markov process associated with plot-size distribution in Czech Land Registry and its number-theoretic properties
Pavel Exner and Petr Šeba

November 11, 2018

Nuclear Physics Institute, Czech Academy of Sciences, 25068 Řež near Prague, Czech Republic
Doppler Institute for Mathematical Physics and Applied Mathematics, Czech Technical University, Břehová 7, 11519 Prague 1, Czech Republic
Institute of Physics, Czech Academy of Sciences, Cukrovarnická 10, Prague 8, Czech Republic
Department of Mathematical Physics, University of Hradec Králové, Víta Nejedlého 573, Hradec Králové, Czech Republic
Abstract
The size distribution of land plots is a result of land allocation processes in the past. In the absence of regulation this is a Markov process leading to an equilibrium described by a probabilistic equation commonly used in insurance and financial mathematics. We support this claim by analyzing the distribution of two plot types, garden and built-up areas, in the Czech Land Registry, pointing out the coincidence with the distribution of prime number factors described by the Dickman function in the first case.
The distribution of commodities is an important research topic in economics; see [CC07] for an extensive literature overview. In this letter we focus on a particular case, the allocation of land representing a non-consumable commodity, and on the way in which the distribution is reached. Generally speaking, it results from a process of random commodity exchanges between agents in a situation where the aggregate commodity volume is conserved; in other words, one deals with pure trading which leads to commodity redistribution.

Models of this type have recently been discussed intensively [SGG06] and are usually referred to as kinetic exchange models. Our approach here is different, being based on the concept known as a perpetuity. The latter is a random variable $D$ that satisfies a stochastic fixed-point equation of the form

$$D \stackrel{d}{=} a\,(D + 1), \tag{1}$$

where $a$ and $D$ are independent random variables and the symbol $\stackrel{d}{=}$ means that the two sides of the equation have the same probability distribution; by an appropriate scaling, of course, the value one in (1) can be replaced by any fixed positive number. It is supposed that the distribution $P(a)$ of the variable $a$ is given and one looks for the distribution $Q(D)$ of $D$.

Equation (1) has a solution provided the variable $a$ satisfies $\mathrm{mean}(\log a) < 0$, cf. [Ve79]. It appears in the literature under various names depending on the field of application; it is known as the Vervaat perpetuity [Ve79], stochastic affine mapping, random difference equation, stochastic fixed-point equation, and so on. Before proceeding further, let us remark that equation (1) looks innocent but it is not. The situation when $a$ is a Bernoulli variable is tricky; in particular, it was proved in [BR01] that the probability measure associated with $Q(D)$ is singularly continuous in this case.

Perpetuities themselves appear in different contexts.
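Equation (1) can be explored numerically. The following is a minimal sketch, not taken from the paper, which assumes that $a$ is uniform on $(0,1)$ (so that $\mathrm{mean}(\log a) = -1 < 0$) and approximates $D$ by iterating the map $D \mapsto a\,(D+1)$ a fixed number of times. The function names `iterate_perpetuity` and `fixed_point_summary`, the step count and the sample size are illustrative choices.

```python
import random
import statistics


def iterate_perpetuity(rng, n_steps=40, start=0.0):
    """Iterate D <- a * (D + 1), drawing a fresh uniform a on (0, 1) each step."""
    d = start
    for _ in range(n_steps):
        d = rng.random() * (d + 1.0)
    return d


def fixed_point_summary(n_samples=20000, seed=1):
    """Compare samples of D with samples of a * (D + 1), a independent of D.

    If D solves equation (1), both samples follow the same distribution,
    so their means and medians should agree up to sampling noise.
    """
    rng = random.Random(seed)
    d = [iterate_perpetuity(rng) for _ in range(n_samples)]
    mapped = [rng.random() * (x + 1.0) for x in d]  # one more application of (1)
    return {
        "mean_D": sum(d) / len(d),
        "mean_mapped": sum(mapped) / len(mapped),
        "median_D": statistics.median(d),
        "median_mapped": statistics.median(mapped),
    }


if __name__ == "__main__":
    print(fixed_point_summary())
```

Taking expectations in (1) gives $E(D) = E(a)\,(E(D)+1)$, hence $E(D) = 1$ for uniform $a$, so both sample means should come out close to one.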
In insurance and financial mathematics, for instance, a perpetuity represents the value of a commitment to make regular payments [GM00]. Another situation where we meet perpetuities arises in connection with recursive algorithms such as the selection procedure Quickselect, see, e.g., [HT02] or [MMS95], and they also describe random partitioning problems [Hu05].

Related quantities emerge, however, even in purely number-theoretic problems; in particular, in probabilistic number theory they describe the largest factor in the prime decomposition of a random integer, see, for instance, [DG93, Cor. 2] or [HT93, MV07] and references therein. Specifically, the following claim is valid:
Proposition:
Denote by $p(D, X)$ the probability that a random integer $n \in (1, X)$ has its greatest prime factor $\le X^{1/D}$. The limit $Q(D) = \lim_{X \to \infty} p(D, X)$ exists and coincides with the solution of the equation $D \stackrel{d}{=} a\,(D + 1)$ corresponding to the uniform distribution, $P(a) = 1$.

Recall that the respective $Q(D)$ is known in number theory as the Dickman function [BS07].

Let us pass now to our main subject, which is the land plot distribution. We observe that the present sizes of the plots result from repeated land redistribution, that is, land purchases and sales, in the past, which represents a complex allocation process.

In an attempt to understand it within a simple model, consider first a situation where the overall area is fixed and there are only three land owners; one can think of a small island having just three inhabitants. We consider a discrete time and denote by $D_j(n)$, $j = 1, 2, 3$, the area of the land owned by the respective holder at time $t = n$, assuming that the overall area is equal to one. Consequently, the triple $\{D_k(n) : k = 1, 2, 3\}$ belongs for all $n = 1, 2, \dots$ to a simplex, $D_1(n) + D_2(n) + D_3(n) = 1$.

The land trading on the island proceeds as follows: two holders $j, k \in \{1, 2, 3\}$ with $j \neq k$ are picked randomly and they trade their lots according to the rule

$$D_j(n+1) = a_n\,[D_j(n) + D_k(n)], \qquad D_k(n+1) = (1 - a_n)\,[D_j(n) + D_k(n)], \tag{2}$$

where $a_n \in (0, 1)$ are independent, identically distributed random numbers. (We suppose that no land exchange involves all three holders simultaneously.) Let us take for simplicity $j = 1$ and $k = 2$. The simplex condition gives $D_1(n) + D_2(n) = 1 - D_3(n)$, thus relation (2) implies $D_1(n+1) = a_n\,(1 - D_3(n))$. In a steady situation the areas $D_k$ possess identical distributions equal to the distribution of the same random variable $D$; the replacement of $D_1$ and $D_3$ by independent copies of $D$ then leads to the equation $D \stackrel{d}{=} a\,(1 - D)$.
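The trading rule (2) is easy to simulate directly. The sketch below is only an illustration under stated assumptions: the fractions $a_n$ are taken uniform on $(0,1)$, the three holders start with equal shares, and the function name `simulate_island` is hypothetical. It checks the two features the argument relies on, namely that the total area stays equal to one and that the three holders are statistically equivalent.

```python
import random


def simulate_island(n_steps, seed=0):
    """Run rule (2) for three holders and return the list of area triples."""
    rng = random.Random(seed)
    plots = [1.0 / 3.0] * 3              # assumed start: equal shares, total area 1
    history = []
    for _ in range(n_steps):
        j, k = rng.sample(range(3), 2)   # two distinct holders picked at random
        a = rng.random()                 # a_n, assumed uniform on (0, 1)
        pooled = plots[j] + plots[k]     # land involved in this trade
        plots[j] = a * pooled
        plots[k] = (1.0 - a) * pooled
        history.append(tuple(plots))
    return history


if __name__ == "__main__":
    hist = simulate_island(60000, seed=3)
    means = [sum(p[i] for p in hist) / len(hist) for i in range(3)]
    print("long-run mean holdings:", means)
    print("largest deviation of the total area from 1:",
          max(abs(sum(p) - 1.0) for p in hist))
```

By symmetry the long-run mean holding of each owner is one third of the island, and the third holder is left untouched in every single trade, which is exactly the situation used to derive $D_1(n+1) = a_n\,(1 - D_3(n))$.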
A simple substitution $D \to -D$ and $a \to -a$ in the equation $D \stackrel{d}{=} a\,(1 - D)$ leads finally to equation (1); the distribution of $D$ is, of course, invariant under such a transformation, up to a mirror image. Consequently, the land trading on our island with three inhabitants leads formally to the lot area distribution described by the perpetuity equation (1). Notice that the constant appearing in it comes from the simplex constraint $D_1 + D_2 + D_3 = 1$; it can be regarded as a manifestation of the fact that the overall area is preserved in the trading.

There is another argument that leads to the same equation and which can be applied to the case of numerous land traders. The plot size $D_n$ owned at the instant $n$ by a chosen one of them can be regarded as a result of two independent actions. The first step is the sale of a random part of the property owned at time $n - 1$, which leaves the fraction $a_n$ of it, i.e. $D_{n-1} \to a_n D_{n-1}$. The following step is the purchase of a new piece of land of size $d_n$, which is added to the remaining one. In combination, these actions give

$$D_n = a_n D_{n-1} + d_n, \qquad n = 1, 2, \dots. \tag{3}$$

The process is obviously of Markov type and the distribution of the plot sizes to which it converges is given by the equation $D \stackrel{d}{=} aD + d$, where $d$ is the random variable associated with the acquisitions. In fact, convergence of the process is closely related to the existence of a solution to this equation [Ve79]. The process is linked to the Dickman function in the case when the two random variables coincide, $d_n = a_n$ for all $n \in \mathbb{Z}_+$, and they are identically distributed; then (3) obviously leads to (1). From the point of view of land plot reallocations such an assumption is an idealization, and it is formally exact, as mentioned above, for three traders only.
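The passage from (3) to (1) can be written out in one line. The final expectation identity is not stated in the text at this point; it is added here as a consistency check, assuming $a$ is uniform on $(0,1)$ and $E(D)$ is finite.

```latex
\begin{align*}
D_n &= a_n D_{n-1} + d_n, \quad d_n = a_n
  \;\Longrightarrow\; D_n = a_n\,(D_{n-1} + 1), \\
D &\stackrel{d}{=} a\,(D + 1)
  \quad \text{(stationary limit, $a$ independent of $D$)}, \\
E(D) &= E(a)\,\bigl(E(D) + 1\bigr)
  \;\Longrightarrow\; E(D) = \frac{E(a)}{1 - E(a)} = 1
  \quad \text{for } E(a) = \tfrac{1}{2}.
\end{align*}
```

The unit mean agrees with the normalization in which plot sizes are measured in units of their mean.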
It is an open question to what extent this distribution changes in situations with a larger number of players involved.

To understand better how the Dickman function can arise in the process, we observe that the solution of (1) is obtained formally as the following infinite sum [Ve79],

$$D = \sum_{n=1}^{\infty} \prod_{k=1}^{n} a_k = a_1 + a_1 a_2 + a_1 a_2 a_3 + \cdots, \tag{4}$$

where $a_1, a_2, \dots, a_k, \dots$ are independent uniformly distributed random variables. On the other hand, the Markov process (3) leads in our particular case, $d_k = a_k$, to

$$D_{n+1} = a_n + a_n a_{n-1} + a_n a_{n-1} a_{n-2} + \dots + a_n a_{n-1} a_{n-2} \cdots a_1 D_1, \tag{5}$$

where $D_1$ is the initial holding of the trader, which we put equal to one. We may naturally relabel the variables and write the right-hand side of (5) also as $a_1 + a_1 a_2 + \cdots$; it is crucial that this does not change the distribution of $D_{n+1}$, since all the $a_k$'s are independent and identically distributed. In this form the relation between (4) and (5) is clearly seen; the question is whether the two quantities are close to each other in the situation we are interested in. The point is that the number of trading steps is of course finite and not very large. Even without big historical disturbances we can hardly expect free land trading to have a history longer than roughly three centuries. Assuming that a given plot is traded once in a generation, we can thus run the process (5) realistically only up to $n \lesssim 10$.

Luckily enough for us, the convergence of (5) to (4) is rather fast: using the mentioned relabelling we find

$$D - D_{n+1} = a_1 a_2 a_3 \cdots a_n\, \tilde{D}, \tag{6}$$

where $D$ and $\tilde{D}$ are statistically equivalent. Denoting as usual by $E(a)$ the mean of $a$, we therefore get

$$E(D - D_{n+1}) = E(a)^n\, E(\tilde{D}) = 2^{-n}, \tag{7}$$

since $a$ is uniformly distributed in $(0, 1)$ by assumption and $E(\tilde{D}) = 1$, which means that the convergence is exponentially fast.
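The rate in (7) can be checked by a small Monte Carlo experiment. This is a sketch under assumptions that are not part of the paper: the infinite sum (4) is truncated after `n_terms` products, $D_{n+1}$ is represented by the relabelled partial sum $a_1 + a_1 a_2 + \dots + a_1 \cdots a_n$ as in (6), and the function name `mean_truncation_gap` is hypothetical.

```python
import random


def mean_truncation_gap(n, n_samples=30000, n_terms=30, seed=2):
    """Monte Carlo estimate of E(D - D_{n+1}) for uniform a_k on (0, 1).

    `full` approximates the series (4) by its first n_terms products,
    `partial` is the relabelled partial sum with n terms used in (6).
    """
    if n >= n_terms:
        raise ValueError("n must be smaller than n_terms")
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(n_samples):
        prod, partial, full = 1.0, 0.0, 0.0
        for k in range(1, n_terms + 1):
            prod *= rng.random()        # a_1 * a_2 * ... * a_k
            full += prod
            if k <= n:
                partial += prod
        total_gap += full - partial
    return total_gap / n_samples


if __name__ == "__main__":
    for n in (1, 2, 4, 6, 8):
        print(n, mean_truncation_gap(n), 2.0 ** (-n))
```

The truncation of the series changes the expected gap only by $2^{-30}$ with the default `n_terms`, far below the sampling noise.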
But this is not all: one can also show that the convergence is extremely robust with respect to the initial distribution. To this end, consider the process in the form $D_{n+1} = a_n\,(D_n + 1)$ and denote by $G_n(t)$ the probability that $D_n < t$. For the uniformly distributed variable $a$ we then have

$$G_{n+1}(t) = \int_0^1 G_n\!\left(\frac{t}{a} - 1\right) \mathrm{d}a, \tag{8}$$

and consequently, the densities $g_n(t) := G_n'(t)$ satisfy

$$g_{n+1}(t) = \int_0^1 g_n\!\left(\frac{t}{a} - 1\right) \frac{\mathrm{d}a}{a}; \tag{9}$$

the support of all these functions lies by definition on the nonnegative real axis, $G_n(t) = g_n(t) = 0$ for $t < 0$. A simple substitution $u = t/a - 1$ then gives finally a relation between the densities $g_{n+1}$ and $g_n$, namely

$$g_{n+1}(t) = \int_{t-1}^{\infty} g_n(u)\, \frac{\mathrm{d}u}{u + 1}. \tag{10}$$

Footnote: Using a modified random variable we can rewrite (3) also in alternative forms, say, $D_n = a_n\,(D_{n-1} + c_{n-1})$. Some mathematical results about convergence of such processes with specific random variables $c$ can be found in [Du90].

Let us now start with a situation far from the expected equilibrium, assuming, for instance, that all the owners have at the initial instant land plots of the same area, i.e. $g_1(t) = \delta(t - 1)$; then (10) gives

$$g_2(t) = \begin{cases} \tfrac{1}{2} & \text{if } t \in [0, 2], \\ 0 & \text{if } t > 2, \end{cases} \qquad g_3(t) = \begin{cases} \tfrac{1}{2} \ln 3 & \text{if } t \in [0, 1], \\ \tfrac{1}{2} \ln\!\left(\tfrac{3}{t}\right) & \text{if } t \in [1, 3], \\ 0 & \text{if } t > 3, \end{cases}$$

and so on. It can be seen easily from (10) that the functions $g_n(t)$, $n = 3, 4, \dots$, have the following properties: $g_n(t) = c_n$ for $t \in (0, 1)$, where $c_n$ is a constant depending on $n$ only, and moreover, $g_n(t)$ is decreasing for $t \in [1, n]$ and $g_n(t) = 0$ holds for $t > n$. Furthermore, we have $c_n \to e^{-\gamma}$ as $n \to \infty$, with $\gamma$ being the Euler–Mascheroni constant; note that $e^{-\gamma}$ is the value of the Dickman function for $t \in (0, 1)$. Hence even if the original distribution had nothing in common with the Dickman function, the densities $g_n(t)$ are form robust and rapidly approach such a shape.
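The recursion (10) can also be iterated numerically on a grid. The sketch below is an illustration, not the authors' computation: it starts from $g_2(t) = 1/2$ on $[0, 2]$ (that is, from $g_1 = \delta(t-1)$), uses a cumulative trapezoid rule with an assumed step `h`, and reads off the plateau value $c_n$ at $t = 1/2$. The function name `iterate_density`, the grid parameters and the hard-coded value of $\gamma$ are choices made here.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def iterate_density(n_max, h=0.002, t_max=16.0):
    """Return (grid, g_n) for n = n_max >= 2, obtained by iterating (10).

    The grid must cover the support [0, n_max], so t_max >= n_max is assumed.
    """
    m = round(1.0 / h)                   # number of grid steps per unit length
    size = round(t_max / h) + 1
    t = [i * h for i in range(size)]
    # g_2(t) = 1/2 on [0, 2], which follows from g_1 = delta(t - 1)
    g = [0.5 if i <= 2 * m else 0.0 for i in range(size)]
    for _ in range(n_max - 2):
        f = [gi / (ti + 1.0) for gi, ti in zip(g, t)]
        # cum[i] approximates the integral of f from 0 to t_i (trapezoid rule)
        cum = [0.0] * size
        for i in range(1, size):
            cum[i] = cum[i - 1] + 0.5 * h * (f[i - 1] + f[i])
        total = cum[-1]
        # g_{n+1}(t) = integral of f over [max(t - 1, 0), infinity)
        g = [total - cum[max(i - m, 0)] for i in range(size)]
    return t, g


if __name__ == "__main__":
    for n in (3, 4, 6, 8, 12):
        t, g = iterate_density(n)
        i_half = round(0.5 / (t[1] - t[0]))
        print(n, g[i_half], math.exp(-EULER_GAMMA))
```

For $n = 3$ the plateau should reproduce $\tfrac{1}{2}\ln 3$ up to the discretization error, and for larger $n$ it should settle near $e^{-\gamma}$.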
In fact, integrating further we find that the densities $g_n$ are very close to the Dickman limit $g_\infty$ already after a few more steps.

The conclusion for our model is that there is a chance to see the equilibrium situation in the land plot distributions provided the trading goes on undisturbed for at least four generations. If this is the case, it makes sense to ask about a relation between the plot distribution and equation (1); it is clear that only an inspection of actual data can show whether such a model assumption is good or not.

Let us thus look whether these considerations have something in common with the land plot distribution in reality. One has to be cautious, of course, when choosing which types of plots are to be considered. Recall, for instance, that A. Abul-Magd recently tried to describe the wealth distribution in ancient Egyptian society using the areas of the houses found by the excavations in Tell el-Amarna [AM02]. Their distribution exhibited a Pareto-like behaviour [Pa1897], known to describe the wealth allocation among individuals. It has an algebraic tail, and therefore it behaves in a way different from the Vervaat perpetuity for constant $P(a)$; recall that the Dickman function $Q(D)$, for instance, satisfies asymptotically the inequality $Q(D) < D^{-D}$ for large $D$. It is not surprising, however, that this case does not fit into our scheme, the aggregate volume not being locally conserved: upgrading a house as a result of one's wealth need not affect the areas of the neighbouring houses.

Figure 1: The probability density of the normalized garden areas (full line) is compared with the Dickman function (dashed line). We have used areas of 4000 gardens located in the two towns.

The land plots we look for have to satisfy several criteria. As the above example suggests, they have to be arranged in connected areas, so that the gain of a purchaser is the same as the loss of the corresponding seller.
At the same time, they must be divisible so that one can sell and buy parts of them. Choosing such a plot type, one can look into the land registry where the present holdings are recorded. As we have said, they are the result of repeated land purchases and sales made by the ancestors in the past, but we are not going to look into the history, being interested in the resulting distribution. We have to make sure, however, that the process was not affected by outside influences like agrarian reforms or other forms of redistribution en gros.

One plot type suitable for our purpose is gardens in urban areas. We used the Czech real estate cadastre, concentrating on the sizes of four thousand gardens in the urban area of the towns Rychnov nad Kněžnou and Dobruška in East Bohemia. To compare their distribution with the perpetuity result mentioned above we need, of course, a proper normalization: we choose the scale in which the mean size of the plot is equal to one. The result confirms our conjecture: the probability distribution of the garden areas coincides with the Dickman function, as shown in Figure 1.

This finding can be easily understood. Gardens in urban areas are desirable properties, and as a result, any piece of a garden is equally good for the market. This means the process (3) goes on with the variables $a$ and $d$ approximately homogeneously distributed over the interval $(0, 1)$, so that the Dickman function gives a good fit, confirming our expectation. Another conclusion we can make without looking into a detailed history is that the overall area of gardens in these towns did not change significantly in the course of time.

Another suitable land plot type recorded in the cadastre which was not affected by agrarian reforms is yards and built-up areas. In the latter case we take into account the areas only, without paying attention to the type (number of floors, etc.) of the buildings which may be constructed on them. The situation is somewhat different from the garden distribution case, since now we need not restrict ourselves to the urban areas. This enables us to work with a much larger data set; altogether we employed data about 47000 yards and built-up areas.

Figure 2: The probability density of the normalized sizes of the yard and built-up areas (full line) compared with the solution of eq. (1) with uniformly distributed $a \in (0, \dots)$ (dashed line). We have used the sizes of 47000 yards and built-up areas.

The allocation process is different because it is not conservative in this case.
In the course of time a new size of a built-up area or yard can be a result of merging a number of smaller areas into a larger one, and at the same time, a larger area can come from the transformation of another land type into building land. To describe such a process we use again equation (1) with the uniformly distributed variable $a$, supposing that all changes occur with equal probability; however, we change the variable range, taking $a \in (0, A)$ with $A > 1$ to take into account the fact that the built-up area can expand. The result is plotted in Figure 2 and we see that choosing $A = 1.\dots$ we get an excellent fit.

It would be interesting to test the concept described here on other plot types such as fields or forests. For this purpose, unfortunately, the Czech land registry is not suitable because in this case the distribution was formed not only by standard market forces but also by processes like collectivization, etc. A brief look at the corresponding data shows that the distributions cannot be described with the help of (1) with a symmetric distribution $P(a)$.

Acknowledgement:
We thank the referees for useful remarks. The research was supported by the Czech Ministry of Education, Youth and Sports within the project LC06002. The cooperation of the Land Registry District Office in Rychnov nad Kněžnou is gratefully acknowledged.

References

[AM02] A.Y. Abul-Magd: Wealth distribution in an ancient Egyptian society, Phys. Rev. E66 (2002), 057104.
[BR01] M. Baron, A.L. Rukhin: Perpetuities and asymptotic change-point analysis, Stat. and Probab. Lett. (2001), 29–38.
[BS07] W.D. Banks, I.E. Shparlinski: Integers with a large smooth divisor, Integers (2007), A17; see also arXiv:math.NT/0601460
[CC07] A. Chatterjee, B.K. Chakrabarti: Kinetic exchange models for income and wealth distributions, arXiv:0709.1543 [physics.soc-ph]
[DG93] P. Donnelly, G. Grimmett: On the asymptotic distribution of large prime factors, J. Lond. Math. Soc. (1993), 395–404.
[Du90] D. Dufresne: The distribution of a perpetuity with application to risk theory and pension funding, Scand. Actuarial J. (1990), 750–783.
[GM00] Ch.M. Goldie, R.A. Maller: Stability of perpetuities, Ann. Probab. (2000), 1195–1218.
[HT93] A. Hildebrand, G. Tenenbaum: Integers without large prime factors, J. de Théorie des Nombres de Bordeaux (1993), 411–484.
[Hu05] T. Huillet: Random partitioning problems involving Poisson point processes on the interval, Int. J. Pure Appl. Math. (2005), 143–179.
[HT02] H.-K. Hwang, T.-H. Tsai: Quickselect and the Dickman function, Combin. Probab. Comput. (2002), 353–371.
[MMS95] H. Mahmoud, R. Modarres, R. Smythe: Analysis of QUICKSELECT: an algorithm for order statistics, RAIRO Inform. Theor. Appl. (1995), 255–276.
[Pa1897] V. Pareto: Cours d'Economie Politique, Macmillan, Paris 1897.
[SGG06] E. Scalas, M. Gallegati, E. Guerci, D. Mas, A. Tedeschi: Growth and allocation of resources in economics: The agent-based approach, Physica A370 (2006), 86–90.
[Ve79] W. Vervaat: On a stochastic difference equation and a representation of non-negative infinitely divisible random variables,