A Proof of First Digit Law from Laplace Transform

Abstract

The first digit law, also known as Benford's law or the significant digit law, is the empirical phenomenon that the leading digits of numbers from real-world sources favor small values, following the form log(1+1/d), where d = 1, 2, ..., 9. The law has remained elusive for over one hundred years because it was unclear whether it is a logical consequence of the number system or the result of some mysterious mechanism of nature. We provide a simple and elegant proof of this law by applying the Laplace transform, which is an important tool of mathematical methods in physics. We reveal that the first digit law originates from a basic property of the number system, and thus it should be regarded as basic mathematical knowledge with wide applications.


Mingshu Cong and Bo-Qiang Ma‡

School of Physics and State Key Laboratory of Nuclear Physics and Technology, Peking University, Beijing 100871
Center for High Energy Physics, Peking University, Beijing 100871

Published in Chinese Physics Letters 36 (2019) 070201. Supported by the National Natural Science Foundation of China under Grant No. 11475006.
‡ Corresponding author.

The first digit law, which is also called the significant digit law or Benford's law, was first noticed by Newcomb in 1881 [1], and then rediscovered independently by Benford in 1938 [2]. It is the empirical observation that the first digits of numbers from natural data sets favor small values rather than following a uniform distribution, as might be expected. More precisely, the probability that a number begins with digit $d$, where $d = 1, 2, \ldots, 9$, is

$$P_d = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, 9, \tag{1}$$

as shown in Fig. 1.

FIG. 1: Benford's law of the first digit distribution, from which we see that the probability of finding numbers with leading digit 1 is larger than that with 2, ..., 9.
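
As a quick numerical illustration of Eq. (1), the following Python sketch tabulates the nine first-digit probabilities and checks that they sum to 1. The function name `benford_first_digit` is an illustrative choice and not part of the original Letter.

```python
import math


def benford_first_digit(d: int) -> float:
    """Benford probability log10(1 + 1/d) for a decimal first digit d, as in Eq. (1)."""
    if not 1 <= d <= 9:
        raise ValueError("d must be a decimal digit between 1 and 9")
    return math.log10(1 + 1 / d)


# Tabulate P_d for d = 1, ..., 9.
probabilities = {d: benford_first_digit(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"P_{d} = {p:.4f}")

# The terms telescope: the sum equals log10(10) = 1.
print(f"sum = {sum(probabilities.values()):.4f}")
```

The sum equals 1 because the product of the factors (1 + 1/d) for d = 1, ..., 9 telescopes to 10.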

Empirically, the populations of countries, the areas of lakes, the lengths of rivers, the Arabic numerals on the front page of a newspaper [2], physical constants [3], stock market indices [4], file sizes on a personal computer [5], etc., all conform to this peculiar law well. Benford's law has been verified to hold for a vast number of examples in various domains, such as economics [4], social science [6], environmental science [7], biology [8], geology [9], astronomy [10], statistical physics [11, 12], nuclear physics [13–17], particle physics [18], and some dynamical systems [19–21]. There have also been many explorations of applications of the law in various fields, mainly to screen data and judge their reasonableness, such as distinguishing and ascertaining fraud in taxing and accounting [22–24] and detecting falsified data in scientific experiments [25].

Benford's law has several elegant properties. It is scale-invariant [26, 27], which means that the law does not depend on any particular choice of units. The law is also base-invariant [28–30], which means that it is independent of the base $b$, with the general form

$$P_d = \log_b\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, b-1, \quad \text{for } b \ge 2. \tag{2}$$

The law is also found to be power-invariant [18], i.e., raising the numbers in the data set to any power ($\neq 0$) does not change the first digit distribution. Although there have been many studies on Benford's law [31], the underlying reason for the success of this law has remained elusive for more than one hundred years. It was unclear whether Benford's law is due to some unknown mechanism of nature or is merely a logical consequence of the human number system.

However, the situation has changed with the appearance of a general derivation of Benford's law from the application of the Laplace transform [32], where a strict version of Benford's law is derived as composed of a Benford term and an error term.
The Benford term explains the prevalence of Benford's law, and the error term leads to deviations from the law for four categories of number sets. It is the purpose of this Letter to provide a simpler and more elegant version of the derivation of Benford's law than that of Ref. [32]. Through this derivation, it is easier to understand the rationality of Benford's law. We reveal that the first digit law can be derived as the main term from the Laplace transform. This explains why Benford's law is so successful for many number sets. We perform a similar analysis on the regularities of the second digit and $i$th-significant digit distributions, and extend the law to a more general rule for the distribution of the first several digits. We also estimate the error term and point out conditions for the validity of this law.

For simplicity, we restrict ourselves to the decimal system first. Let $F(x)$ be an arbitrary normalized probability density function defined on the positive real number set $\mathbb{R}^+$. (Here we use the capital letter $F$ instead of the lowercase one, contrary to convention.) Of course, in real cases, the variable $x$ may be negative or bounded, but this does not harm our derivation. When $x$ can be negative, we can use the probability density function of its absolute value, keeping the results unchanged.

In the decimal system, the probability $P_d$ of finding a number whose first digit is $d$ is the sum of the probabilities that it is contained in the interval $[d \cdot 10^n, (d+1) \cdot 10^n)$ for all integers $n$; therefore $P_d$ can be expressed as

$$P_d = \sum_{n=-\infty}^{\infty} \int_{d \cdot 10^n}^{(d+1) \cdot 10^n} F(x)\,\mathrm{d}x, \tag{3}$$

which can also be rewritten as

$$P_d = \int_0^{\infty} F(x)\, g_d(x)\,\mathrm{d}x, \tag{4}$$

with $g_d(x)$ being a new density function whose significance will become clear in the following. (Here the lowercase letter is used, following the conventions for the Laplace transform in the following sections.)
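
Eq. (3) can be evaluated directly once a cumulative distribution function is available. The Python sketch below does this for the exponential density $F(x) = e^{-x}$, which is chosen here only as an illustrative assumption. The helper names and the truncation of the sum over $n$ to a finite range are likewise assumptions of this sketch.

```python
import math


def exponential_cdf(x: float, rate: float = 1.0) -> float:
    """CDF of the exponential density F(x) = rate * exp(-rate * x) (illustrative choice)."""
    # -expm1(-y) equals 1 - exp(-y) and stays accurate for very small y.
    return -math.expm1(-rate * x)


def first_digit_probability(d: int, cdf, n_min: int = -30, n_max: int = 30) -> float:
    """Eq. (3): sum over n of the probability mass in [d * 10^n, (d + 1) * 10^n)."""
    total = 0.0
    for n in range(n_min, n_max + 1):
        scale = 10.0 ** n
        total += cdf((d + 1) * scale) - cdf(d * scale)
    return total


for d in range(1, 10):
    p = first_digit_probability(d, exponential_cdf)
    print(f"d={d}: P_d={p:.4f}  Benford={math.log10(1 + 1 / d):.4f}")
```

Any other distribution can be tried by passing a different `cdf` callable, provided the truncated range of $n$ covers essentially all of its probability mass.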
Adopting the Heaviside step function,

$$\eta(x) = \begin{cases} 1, & \text{if } x \ge 0, \\ 0, & \text{if } x < 0, \end{cases} \tag{5}$$

we can write $g_d(x)$ as

$$g_d(x) = \sum_{n=-\infty}^{\infty} \left[\eta(x - d \cdot 10^n) - \eta(x - (d+1) \cdot 10^n)\right]. \tag{6}$$

Based on the above discussion, we can understand to some extent why numbers prefer smaller first digits. Naively one might think that the 9 digits in the decimal system play the same role, but they define different densities $g_d(x)$ as shown above, and thus behave differently in the decimal system. For better illustration, we draw the images of $g_1(x)$ and $g_2(x)$ in Fig. 2. With Eq. (4), one can calculate $P_d$ for any given $F(x)$ numerically. Usually, the result does not strictly agree with Eq. (1). In this sense, Benford's law is not a rigorous "law" with strong predictive power. However, by using the technique of the Laplace transform, we show in the following that Benford's law is a rather good approximation for well-behaved probability density functions.

FIG. 2: Images of $g_1(x)$ (upper) and $g_2(x)$ (lower), from which we notice that the gap between the colored areas in $g_2(x)$ is wider than that in $g_1(x)$. This shows that the distribution of $g_1(x)$ is denser than that of $g_2(x)$ over the whole number range.

We now prove that if a probability density function has an inverse Laplace transform, it satisfies Benford's law well. Recall the complex inversion formula [33]: if $F(x)$, extended to the complex plane, satisfies

1. $F(x)$ is analytic on $\mathbb{C}$ except for a finite number of isolated singularities;
2. $F(x)$ is analytic on the half plane $\{x \mid \mathrm{Re}\, x > 0\}$;
3. there are positive constants $M$, $R$, and $\beta$ such that $|F(x)| \le M/|x|^{\beta}$ whenever $|x| \ge R$,

then $F(x)$ has an inverse Laplace transform.

We call a probability density function "well-behaved" if it satisfies these three conditions and its inverse Laplace transform is smooth enough, i.e., without violent oscillation. Exponential functions, some fractional functions, and a handful of other common functions are all well-behaved. Thus, the following derivation has wide application. In what follows, we assume that $F(x)$ is well-behaved.

Let $f(t)$ be the inverse Laplace transform of $F(x)$, and $G(t)$ be the Laplace transform of $g(x)$, i.e.,

$$F(x) = \int_0^{\infty} f(t)\, e^{-tx}\,\mathrm{d}t, \tag{7}$$

$$G(t) = \int_0^{\infty} g(x)\, e^{-tx}\,\mathrm{d}x. \tag{8}$$

Laplace transforms have the following property:

$$\int_0^{\infty} F(x) g(x)\,\mathrm{d}x = \int_0^{\infty} \mathrm{d}x\, g(x) \int_0^{\infty} f(t) e^{-tx}\,\mathrm{d}t = \int_0^{\infty} \mathrm{d}t\, f(t) \int_0^{\infty} g(x) e^{-tx}\,\mathrm{d}x = \int_0^{\infty} f(t) G(t)\,\mathrm{d}t, \tag{9}$$

which means that the Laplace transform can act on either $f$ or $g$ while leaving the value of the integral unchanged.

To obtain the left-hand side of the above equation, we calculate the right-hand side instead, because it is comparatively convenient to calculate the Laplace transform of $g_d(x)$,

$$G_d(t) = \int_0^{\infty} g_d(x)\, e^{-tx}\,\mathrm{d}x = \sum_{n=-\infty}^{\infty} \int_{d \cdot 10^n}^{(d+1) \cdot 10^n} e^{-tx}\,\mathrm{d}x = \frac{1}{t} \sum_{n=-\infty}^{\infty} \left(e^{-td \cdot 10^n} - e^{-t(d+1) \cdot 10^n}\right), \tag{10}$$

which can be treated as a function of two variables $d$ and $t$. Although $d$ is defined on the decimal digit set 1, 2, ..., 9, it can be extended to the positive real axis. Therefore, $G_d(t)$ is a continuous function of $d$ as well as $t$. A technique to evaluate $G_d(t)$ is to calculate its partial derivative with respect to $d$ approximately, and then integrate the partial derivative to obtain the result:

$$\frac{\partial G_d(t)}{\partial d} = \sum_{n=-\infty}^{\infty} \left(-10^n e^{-td \cdot 10^n} + 10^n e^{-t(d+1) \cdot 10^n}\right) \simeq \int_{-\infty}^{\infty} \left(-10^x e^{-td \cdot 10^x} + 10^x e^{-t(d+1) \cdot 10^x}\right)\mathrm{d}x = \frac{1}{\ln 10} \int_0^{\infty} \left(-e^{-tdy} + e^{-t(d+1)y}\right)\mathrm{d}y = \frac{1}{\ln 10} \left(-\frac{1}{td} + \frac{1}{t(d+1)}\right). \tag{11}$$

There is one and only one approximation here, i.e., we replace a summation by an integration (with the substitution $y = 10^x$ in the third step). Because $G_d(t) \to 0$ when $d \to \infty$, Eq. (11) can be integrated to yield

$$G_d(t) \simeq \frac{1}{t} \log_{10}\left(1 + \frac{1}{d}\right). \tag{12}$$

Then, using Eq. (9), we obtain

$$P_d = \int_0^{\infty} F(x) g_d(x)\,\mathrm{d}x = \int_0^{\infty} G_d(t) f(t)\,\mathrm{d}t \simeq \int_0^{\infty} \frac{f(t)}{t} \log_{10}\left(1 + \frac{1}{d}\right)\mathrm{d}t = \log_{10}\left(1 + \frac{1}{d}\right) \int_0^{\infty} \frac{f(t)}{t}\,\mathrm{d}t = \log_{10}\left(1 + \frac{1}{d}\right), \tag{13}$$

where we have used the following normalization condition of $f(t)$:

$$1 = \int_0^{\infty} F(x)\,\mathrm{d}x = \int_0^{\infty} \mathrm{d}x \int_0^{\infty} f(t) e^{-tx}\,\mathrm{d}t = \int_0^{\infty} \mathrm{d}t\, f(t) \int_0^{\infty} e^{-tx}\,\mathrm{d}x = \int_0^{\infty} \frac{f(t)}{t}\,\mathrm{d}t. \tag{14}$$

Eq. (13) is exactly the first digit law for the decimal system. Thus we have shown that well-behaved functions satisfy Benford's law approximately. A more rigorous derivation without the approximately-equal signs in Eqs. (11), (12), and (13) can be found in Ref. [32].

Compared to Ref. [32], the method provided above accords better with intuition. In fact, unnecessarily complicated treatments are introduced in Ref. [32] to guarantee the strictness of the proof. For example, a logarithmic scale is adopted after the Laplace transform, merely to derive Eq. (12) of Ref. [32], which corresponds to Eq. (13) in this paper. Eq. (13), though it holds only approximately, is set up on the original linear scale, and thus manifests itself as a property of the direct Laplace transform, instead of the logarithmic Laplace transform, which has a less intuitive physical meaning.
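
The only approximation in the derivation, Eq. (12), can be probed numerically by evaluating the exact sum in Eq. (10). The following Python sketch compares $t\,G_d(t)$ with the Benford term for a few values of $t$. The function name `t_times_G`, the sample values of $t$, and the truncation of the sum to a finite range of $n$ are assumptions made for illustration.

```python
import math


def t_times_G(d: float, t: float, n_min: int = -40, n_max: int = 40) -> float:
    """t * G_d(t) from the exact sum in Eq. (10), truncated to n_min..n_max."""
    total = 0.0
    for n in range(n_min, n_max + 1):
        a = t * d * 10.0 ** n
        b = t * (d + 1) * 10.0 ** n
        # exp(-a) - exp(-b), written with expm1 to stay accurate when a and b are tiny.
        total += math.expm1(-a) - math.expm1(-b)
    return total


for d in (1, 2, 5, 9):
    benford = math.log10(1 + 1 / d)
    values = [t_times_G(d, t) for t in (0.3, 1.0, 3.0, 7.0)]
    worst = max(abs(v - benford) for v in values)
    print(f"d={d}: Benford term={benford:.4f}, max |t*G_d(t) - Benford|={worst:.4f}")
```

Because the sum in Eq. (10) is unchanged when $t$ is multiplied by 10, sampling $t$ within a single decade is enough to see the full range of deviations.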
In this paper, no logarithmic transform is required to derive Benford's law.

According to the derivations so far, we can already explain the rationality of Benford's law through a clear chain of logic, as follows:

1. The integral of the product of $F(x)$ and $g(x)$ equals the integral of the product of the inverse Laplace transform of $F(x)$ and the Laplace transform of $g(x)$, i.e., Eq. (9).
2. The Laplace transform of $g(x)$ approximately equals the Benford term divided by $t$, i.e., Eq. (12).
3. The normalization condition of $F(x)$ guarantees that the integral of the inverse Laplace transform of $F(x)$ divided by $t$ equals 1, i.e., Eq. (14).
4. Therefore, the integral of the product of $F(x)$ and $g(x)$ approximately equals the Benford term, i.e., Eq. (13).

Such a chain of logic is not apparent in Ref. [32].

The second significant digit law was also given by Newcomb [1]. In the decimal system, it is

$$P(\text{2nd digit being } d) = \sum_{k=1}^{9} \log_{10}\left(1 + (10k + d)^{-1}\right), \quad d = 0, 1, \ldots, 9. \tag{15}$$

Hill derived a general $i$th-significant digit law [30]: letting $D_i$ ($D_1, D_2, \ldots$) denote the $i$th-significant digit (with base 10) of a number (e.g., $D_1(0.0314) = 3$, $D_2(0.0314) = 1$, $D_3(0.0314) = 4$), then for all positive integers $k$ and all $d_j \in \{0, 1, \ldots, 9\}$, $j = 1, 2, \ldots, k$ (with $d_1 \neq 0$), one has

$$P(D_1 = d_1, \ldots, D_k = d_k) = \log_{10}\left[1 + \left(\sum_{i=1}^{k} d_i \cdot 10^{k-i}\right)^{-1}\right]. \tag{16}$$

We propose here a general form of the digit law, and show that both the second significant digit law and the general $i$th-significant digit law are only corollaries of this general form.

We calculate $P_{b,d,l,k}$, which is the probability that the integer composed of the first $k$ digits (base $b$) of an arbitrary number [e.g., for the number 0.0314 and $k = 2$, this integer is 31] is at least $d$ and less than $d + l$ ($b^{k-1} \le d < d + l \le b^k$). Correspondingly we introduce the density function $g_{b,d,l,k}(x)$ as

$$g_{b,d,l,k}(x) = \sum_{n=-\infty}^{\infty} \left[\eta(x - d \cdot b^n) - \eta(x - (d + l) \cdot b^n)\right], \tag{17}$$

where the right-hand side is independent of $k$ (while $k$ puts restrictions on $d$ and $l$).
Thus we can omit the subscript $k$ in the following.

A similar technique gives the Laplace transform of $g_{b,d,l}(x)$:

$$G_{b,d,l}(t) \simeq \frac{1}{t} \log_b\left(1 + \frac{l}{d}\right). \tag{18}$$

Thus we arrive at the general significant digit law

$$P_{b,d,l,k} = \log_b\left(1 + \frac{l}{d}\right). \tag{19}$$

We find that Benford's law, Eq. (2), corresponds to the special case of this general form with $k = 1$ and $l = 1$, whereas Hill's general $i$th-significant digit law, Eq. (16), corresponds to the case with $b = 10$, $d = \sum_{i=1}^{k} d_i \cdot 10^{k-i}$, and $l = 1$. Newcomb's second significant digit law can be considered a corollary of Hill's law according to the addition principle in probability theory.

We now calculate the error introduced by replacing the summation with the integration in Eq. (11). Since Eq. (4) is always an exact expression, the total error is

$$\Delta_{\text{total}} = \int_0^{\infty} F(x)\, g_{b,d,l}(x)\,\mathrm{d}x - \log_b\left(1 + \frac{l}{d}\right). \tag{20}$$

If we define

$$\Delta_{b,d,l}(t) = t\,G_{b,d,l}(t) - \log_b\left(1 + \frac{l}{d}\right), \tag{21}$$

the total error can be written as

$$\Delta_{\text{total}} = \int_0^{\infty} \frac{f(t)}{t} \left[t\,G_{b,d,l}(t) - \log_b\left(1 + \frac{l}{d}\right)\right]\mathrm{d}t = \int_0^{\infty} \frac{f(t)}{t}\, \Delta_{b,d,l}(t)\,\mathrm{d}t. \tag{22}$$

Checking the definitions of the two terms of $\Delta_{b,d,l}$, we find that the variable $t$ can be multiplied by $b$ with the results unchanged, i.e., $\Delta_{b,d,l}$ is scale invariant. Hence

$$\Delta_{b,d,l}(bt) = \Delta_{b,d,l}(t). \tag{23}$$

For clarity, we define

$$t = e^s, \tag{24}$$

$$\tilde{\Delta}_{b,d,l}(s) = \Delta_{b,d,l}(e^s), \tag{25}$$

$$\tilde{f}(s) = f(e^s). \tag{26}$$

The corresponding normalization condition is

$$\int_{-\infty}^{+\infty} \tilde{f}(s)\,\mathrm{d}s = 1, \tag{27}$$

and the property Eq. (23) becomes

$$\tilde{\Delta}_{b,d,l}(s + \ln b) = \tilde{\Delta}_{b,d,l}(s). \tag{28}$$

Clearly, $\tilde{\Delta}_{b,d,l}$ is a function of period $\ln b$. Furthermore, according to the result for the exponential distribution in Ref. [34] (Corollary 2; $\tilde{f}(s)$ here is exactly $h(x)$ in Ref. [34], and $|\tilde{\Delta}_{10,d,1}(s)|$ is $|h(x) - \log_{10}(1 + \frac{1}{d})|$ in the equation of Corollary 2), a rather good estimate can be made when $b = 10$ and $d = l = 1$:

$$0.029 < \max |\tilde{\Delta}_{10,1,1}(s)| < 0.03. \tag{29}$$

We notice that the total error can be expressed as

$$\Delta_{\text{total}} = \int_{-\infty}^{+\infty} \tilde{\Delta}_{b,d,l}(s)\, \tilde{f}(s)\,\mathrm{d}s, \tag{30}$$

where $\tilde{f}(s)$ ultimately depends on $F(x)$. In most cases, the correlation between $\tilde{f}(s)$ and $\tilde{\Delta}_{b,d,l}(s)$ is small, and so is the total error. Therefore, Benford's law can be a rather good approximation. However, if $\tilde{f}(s)$ is close to a periodic function with the exact period $\ln b$, or $\tilde{f}(s)$ changes sign very rapidly (this may happen when $F(x)$ is artificially chosen, as in the case of telephone numbers in a given region), the small $\tilde{\Delta}_{b,d,l}(s)$ is counted and accumulated many times, and therefore the correlation becomes large. Similar problems also exist for some special types of probability density functions whose inverse Laplace transforms oscillate violently between positive and negative values, e.g., uniform distributions or normal distributions with small variances. Number sets drawn from such distributions, e.g., heights or ages of people, though natural, still violate Benford's law. With this argument, we point out that although the above derivation seems quite general, it cannot be universally true. More rigorous discussions of the error term, with general applications to four types of number sets, can be found in Ref. [32].

A special case is when the integral of $\tilde{f}(s)$ not only converges to 1 but also converges absolutely, with $\int_{-\infty}^{+\infty} |\tilde{f}(s)|\,\mathrm{d}s = M$ for a positive real number $M$; then

$$\Delta_{\text{total}} \le \int_{-\infty}^{+\infty} |\tilde{\Delta}_{10,1,1}(s)|\, |\tilde{f}(s)|\,\mathrm{d}s \le \int_{-\infty}^{+\infty} 0.03\, |\tilde{f}(s)|\,\mathrm{d}s = 0.03 M. \tag{31}$$

If $\tilde{f}(s)$ is a positive or negative definite function, it is absolutely integrable. Such an $F(x)$ is called a completely monotonic function in mathematics. This means that $M$ is 1, and thus $\Delta_{\text{total}}$ is not greater than 0.03. Consequently, Benford's law is a good estimate.
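
The periodicity in Eq. (23) and the size of the error term for $b = 10$, $d = l = 1$ can be checked numerically. The Python sketch below assumes that the exact sum for $G_{b,d,l}(t)$ has the same form as Eq. (10) with $10 \to b$ and $d + 1 \to d + l$; the function name `delta`, the grid size, and the truncation of the sum are illustrative assumptions.

```python
import math


def delta(t: float, b: int = 10, d: int = 1, l: int = 1, n_range: int = 40) -> float:
    """Error term of Eq. (21): t * G_{b,d,l}(t) - log_b(1 + l/d), via the truncated exact sum."""
    total = 0.0
    for n in range(-n_range, n_range + 1):
        scale = float(b) ** n
        # exp(-t*d*scale) - exp(-t*(d+l)*scale), using expm1 for accuracy at small arguments.
        total += math.expm1(-t * d * scale) - math.expm1(-t * (d + l) * scale)
    return total - math.log(1 + l / d, b)


# Scan one full period of s = ln t, i.e. s in [0, ln 10).
num_points = 2000
grid = [math.log(10) * i / num_points for i in range(num_points)]
max_abs_delta = max(abs(delta(math.exp(s))) for s in grid)

print(f"max |Delta_(10,1,1)| over one period = {max_abs_delta:.5f}")
print(f"Delta(2) - Delta(20) = {delta(2.0) - delta(20.0):.2e}")
```

Scanning a single period suffices because the error term repeats whenever $t$ is multiplied by $b$.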
For example, when $F(x) = \frac{1}{2\sqrt{x}} e^{-\sqrt{x}}$, $f(t) = \frac{1}{2\sqrt{\pi t}} e^{-\frac{1}{4t}} > 0$, and when $F(x) = \frac{1}{(x+1)^2}$, $f(t) = t e^{-t} > 0$. We can assert that in these cases, the total errors are less than 0.03. In fact, the numerical results are 0.0005 and 0.009. This verifies our estimate.

As a rule of thumb, distributions with monotonically decreasing and relatively smooth probability density functions often conform to Benford's law well [32]. Inverse Laplace transforms of such probability density functions generally change sign only a finite number of times, and are thus absolutely integrable. To understand this, one can view the inverse Laplace transform as decomposing the original probability density function into a series of exponential functions, among which some are positive and others negative. If a monotonically decreasing probability density function is relatively smooth, i.e., without a sharp change of probability density, it can be approximated mainly by positive exponential distributions, and therefore its inverse Laplace transform does not oscillate much between positive and negative values. As an application of this rule of thumb, distributions that are not monotonically decreasing can be transformed into monotonically decreasing ones, resulting in better agreement with Benford's law; e.g., for normal distributions, we can subtract the mean value from the original data set and obtain a monotonically decreasing distribution of absolute values.

The above calculations and derivations tell us that although nature has no preference for any specific number, it does discriminate among digits in numbers as a logical consequence of the human counting system. Therefore our results justify the conventional wisdom that a violation of Benford's law is a sign that a table of numbers is artificial or anomalous. The underlying reason for the uneven distribution of first digits is the basic property of the digital system, not some dynamic source behind nature as people suspected.
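
The 0.03 bound for completely monotonic densities can be checked on the two examples given earlier, taking them to be $F(x) = e^{-\sqrt{x}}/(2\sqrt{x})$ and $F(x) = 1/(x+1)^2$. The Python sketch below evaluates $P_1$ from Eq. (3) using the closed-form cumulative distributions of these densities and reports the deviation from $\log_{10} 2$. The function names, the dictionary keys, and the truncation of the sum over $n$ are assumptions of this sketch.

```python
import math


def cdf_sqrt_exponential(x: float) -> float:
    """CDF of F(x) = exp(-sqrt(x)) / (2 * sqrt(x)), which is 1 - exp(-sqrt(x))."""
    return -math.expm1(-math.sqrt(x))


def cdf_inverse_square(x: float) -> float:
    """CDF of F(x) = 1 / (x + 1)^2, which is x / (x + 1)."""
    return x / (x + 1.0)


def first_digit_probability(d: int, cdf, n_range: int = 60) -> float:
    """Eq. (3), truncated: probability mass in [d * 10^n, (d + 1) * 10^n) summed over n."""
    total = 0.0
    for n in range(-n_range, n_range + 1):
        scale = 10.0 ** n
        total += cdf((d + 1) * scale) - cdf(d * scale)
    return total


benford_1 = math.log10(2)
errors = {
    "sqrt-exponential": abs(first_digit_probability(1, cdf_sqrt_exponential) - benford_1),
    "inverse-square": abs(first_digit_probability(1, cdf_inverse_square) - benford_1),
}

for name, err in errors.items():
    print(f"{name}: |P_1 - log10(2)| = {err:.5f}")
```

Only the digit $d = 1$ is examined here, since the bound in Eq. (31) is stated for $b = 10$ and $d = l = 1$.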
This also explains why we can use Benford's law to distinguish anomalies or unnaturalness in artificial numbers.

The mathematical expressions and derivations provided in this paper are simple, elegant, and accompanied by clear intuitive pictures. They are easily comprehensible. Therefore, this version of the proof of Benford's law can also serve as an example of the application of the Laplace transform.

The first digit law reveals an astonishing regularity in realistic numbers. We provide in this Letter a proof of this law from the Laplace transform, and point out the condition for the validity of the law. Compared to Ref. [32], the derivation in this Letter is simple and elegant, and it directly reveals the rationality of the first digit law. From our work, the first digit law is due to the basic structure of the number system. Thus the first digit law is a general rule that applies to vast data sets in the natural world as well as in human social activities. It is no longer strange that Benford's law is so successful in various domains. Such a law should be regarded as basic mathematical knowledge with great potential for vast applications.

[1] Newcomb S 1881 Am. J. Math.
[2] Proc. Am. Phil. Soc.
[3] Am. J. Phys.
[4] Am. Stat.
[5] Eur. J. Phys. L17
[6] Leemis L M, Schmeiser B W and Evans D L 2000 Am. Stat.
[7] Analyst
[8] Aquat. Bot.
[9] Mathematical Geology
[10] Astropart. Phys.
[11] Physica A
[12] Phys. Rev. E
[13] Eur. J. Phys.
[14] Eur. Phys. J. A
[15] Commun. Theor. Phys.
[16] Eur. Phys. J. A
[17] Chin. Phys. Lett.
[18] Mod. Phys. Lett. A
[19] Chaos
[20] Trans. Am. Math. Soc.
[21] Discrete Cont. Dyn. Sys. A
[22] Nigrini M J 1996 J. Am. Tax. Assoc.
[23] Intern. Auditor
[24] J. Accountancy
[25] J. Appl. Stat.
[26] Ann. Math. Stat.
[27] J. Theor. Probab.
[28] Am. Math. Mon.
[29] Proc. Am. Math. Soc.
[30] Stat. Sci.
[31] Benford Online Bibliography
[32] Phys. Lett. A
[33] Basic Complex Analysis, 3rd ed. (New York: W. H. Freeman) p 471
[34] Engel H A and Leuenberger C 2003 Stat. Probabil. Lett. 361