Benford or not Benford: new results on digits beyond the first

Abstract

In this paper, we show that the proportion of $d$ as $p$-th digit, where $p > 1$ and $d \in \llbracket 0, 9 \rrbracket$, in data (obtained from the model developed below) is more likely to follow a law whose probability distribution is determined by a specific upper bound than the generalization of Benford's Law to digits beyond the first one. These probability distributions fluctuate around theoretical values determined by Hill in 1995. Knowing the value of the upper bound beforehand can be a way to find a better-adjusted law than Hill's.


September 22, 2018

Introduction

Benford's Law is really amazing: according to it, the first digit $d$, $d \in \llbracket 1, 9 \rrbracket$, of numbers in many naturally occurring collections of data does not follow a discrete uniform distribution; it rather follows a logarithmic distribution. Discovered by Newcomb in 1881 ([11]), this law was definitively brought to light by Benford in 1938 ([2]). He proposed the following probability distribution: the probability for $d$ to be the first digit of a number is equal to $\log(1 + \frac{1}{d})$. Most empirical data, such as physical data (Knuth in [10] or Burke and Kincanon in [4]), demographic and economic data (Nigrini and Wood in [12]) or genome data (Friar et al. in [6]), approximately follow Benford's Law, to such an extent that this law is used to detect possible frauds in lists of socio-economic data ([15]) or in scientific publications ([5]).

In [3], Blondeau Da Silva, building a rather relevant representative model, showed that, in this case, the proportion of each $d$ as leading digit, $d \in \llbracket 1, 9 \rrbracket$, structurally fluctuates. It strengthens the fact that, concerning empirical data sets, this law often appears to be a good approximation of the reality, but no more than an approximation ([7]). We can note that there also exist distributions known to disobey Benford's Law ([13] and [1]).

Generalizing Benford's Law, Hill ([8]) extends the law to digits beyond the first one: the probability for $d$, $d \in \llbracket 0, 9 \rrbracket$, to be the $p$-th digit of a number is equal to

$$\sum_{j=10^{p-2}}^{10^{p-1}-1} \log\Big(1 + \frac{1}{10j + d}\Big).$$

Building a model very similar to that described in [3], the naturally occurring data will be considered as the realizations of independent random variables following the constraints below: (a) the data is strictly positive and is upper-bounded by an integer $n$, a constraint which is often valid in data sets, the physical, biological and economical quantities being limited; (b) each random variable is considered to follow a discrete uniform distribution whereby the first strictly positive $p$-digit integers ($p > 1$) are equally likely to occur, up to a bound $i$ that is itself uniformly randomly selected in $\llbracket 10^{p-1}, n \rrbracket$. This model relies on the fact that the random variables are not always the same.

Through this article we will demonstrate that the predominance of 0 over 1 (and of 1 over 2, and so on) as $p$-th digit ($p > 1$) is anything but surprising. Hill's probabilities became standard values that should exactly be followed by most naturally occurring collections of data. However, the reality is that the proportion of each $d$ as leading digit structurally fluctuates. There is not a single law but numerous distinct laws that we will hereafter examine.

Let $p$ and $d$ be two integers such that $p > 1$ and $d \in \llbracket 0, 9 \rrbracket$. Let $m$ be a strictly positive integer such that $m \ge 10^{p-1}$. Let $\mathcal{U}\{10^{p-1}, m\}$ denote the discrete uniform distribution whereby integers between $10^{p-1}$ and $m$ are equally likely to be observed.

Let $n$ be a strictly positive integer such that $n \ge 10^{p-1}$. Let us consider the random experiment $E_n$ of tossing two dice. The first one is a fair $(n + 1 - 10^{p-1})$-sided die showing $n + 1 - 10^{p-1}$ different numbers from 1 to $n + 1 - 10^{p-1}$. The number $i$ rolled on it defines the number of faces on the second die. It thus shows $i$ different numbers from $10^{p-1}$ to $i + 10^{p-1} - 1$. The sample space $\Omega_n$ of this experiment is defined as follows:

$$\Omega_n = \{(i, j) : i \in \llbracket 1, n + 1 - 10^{p-1} \rrbracket \text{ and } j \in \llbracket 10^{p-1}, i + 10^{p-1} - 1 \rrbracket\}.$$

Our probability measure is denoted by $\mathrm{P}$.

Let us denote by $D_{(n,p)}$ the random variable from $\Omega_n$ to $\llbracket 0, 9 \rrbracket$ that maps each element $\omega$ of $\Omega_n$ to the $p$-th digit of the second component of $\omega$.

As our aim is to determine the probability that the $p$-th digit of the integer obtained on the second throw is $d$, it can be considered, with no consequence on our results, that we first select an integer $i$ equal to or less than $n$ among the integers with at least $p$ digits (following the $\mathcal{U}\{10^{p-1}, n\}$ discrete uniform distribution); afterwards we select another integer with at least $p$ digits, equal to or less than $i$ (following the $\mathcal{U}\{10^{p-1}, i\}$ discrete uniform distribution).

Through the proposition below, we will express the value of $\mathrm{P}(D_{(n,p)} = d)$, i.e. the probability that the $p$-th digit of our second throw in the random experiment $E_n$ is $d$.

Proposition 2.1.

Let $k$ denote the integer such that $k = \max\{i \in \mathbb{N} : 10^{i+p} \le n\}$ (we set $k = -1$ if this set is empty). Let $l$ denote the positive integer such that:

$$l = \Big\lfloor \frac{n - (10^{p-1} + d)10^{k+1}}{10^{k+2}} \Big\rfloor + 10^{p-2}.$$

The value of $\mathrm{P}(D_{(n,p)} = d)$ is:

$$\frac{1}{n + 1 - 10^{p-1}} \Bigg( \sum_{i=0}^{k} \Big( \sum_{j=10^{p-2}}^{10^{p-1}-1} \ \sum_{b=(10j+d)10^i}^{(10j+(d+1))10^i - 1} \frac{b - ((9j+d)10^i + 10^{p-2} - 1)}{b + 1 - 10^{p-1}} + \sum_{j=10^{p-2}-1}^{10^{p-1}-1} \ \sum_{a=\max(10^{p+i-1},\,(10j+(d+1))10^i)}^{\min(10^{p+i}-1,\,(10(j+1)+d)10^i - 1)} \frac{10^i(j+1) - 10^{p-2}}{a + 1 - 10^{p-1}} \Big) + r_{(n,d,p)} \Bigg),$$

where $r_{(n,d,p)}$ is, if the $p$-th digit of $n$ is $d$:

$$\sum_{j=10^{p-2}}^{l} \ \sum_{b=(10j+d)10^{k+1}}^{\min(n,\,(10j+(d+1))10^{k+1} - 1)} \frac{b - ((9j+d)10^{k+1} + 10^{p-2} - 1)}{b + 1 - 10^{p-1}} + \sum_{j=10^{p-2}-1}^{l-1} \ \sum_{a=\max(10^{p+k},\,(10j+(d+1))10^{k+1})}^{(10(j+1)+d)10^{k+1} - 1} \frac{10^{k+1}(j+1) - 10^{p-2}}{a + 1 - 10^{p-1}},$$

or where $r_{(n,d,p)}$ is, if the $p$-th digit of $n$ is different from $d$:

$$\sum_{j=10^{p-2}}^{l} \ \sum_{b=(10j+d)10^{k+1}}^{(10j+(d+1))10^{k+1} - 1} \frac{b - ((9j+d)10^{k+1} + 10^{p-2} - 1)}{b + 1 - 10^{p-1}} + \sum_{j=10^{p-2}-1}^{l} \ \sum_{a=\max(10^{p+k},\,(10j+(d+1))10^{k+1})}^{\min(n,\,(10(j+1)+d)10^{k+1} - 1)} \frac{10^{k+1}(j+1) - 10^{p-2}}{a + 1 - 10^{p-1}}.$$

Proof.

Let us denote by $F_{(n,p)}$ the random variable from $\Omega_n$ to $\llbracket 1, n + 1 - 10^{p-1} \rrbracket$ that maps each element $\omega$ of $\Omega_n$ to the first component of $\omega$. It returns the number obtained on the first throw of the unbiased $(n + 1 - 10^{p-1})$-sided die. For each $q \in \llbracket 1, n + 1 - 10^{p-1} \rrbracket$, we have:

$$\mathrm{P}(F_{(n,p)} = q) = \frac{1}{n + 1 - 10^{p-1}}. \quad (1)$$

According to the law of total probability, we state:

$$\mathrm{P}(D_{(n,p)} = d) = \sum_{q=1}^{n+1-10^{p-1}} \mathrm{P}(D_{(n,p)} = d \mid F_{(n,p)} = q)\, \mathrm{P}(F_{(n,p)} = q). \quad (2)$$

Thereupon two cases appear in determining the value, for $q \in \llbracket 1, n + 1 - 10^{p-1} \rrbracket$, of $\mathrm{P}(D_{(n,p)} = d \mid F_{(n,p)} = q)$. Let $k_q$ be the integer such that $k_q = \max\{k \in \mathbb{N} : 10^{p+k} \le q + 10^{p-1} - 1\}$ in both cases.

Let us study the first case where the $p$-th digit of $q + 10^{p-1} - 1$ is $d$. For all $i$ in $\llbracket 0, k_q \rrbracket$, there exist $9 \times 10^{p-2}$ sequences of $10^i$ consecutive integers from $(10j+d)10^i$ to $(10j+(d+1))10^i - 1$, where $j \in \llbracket 10^{p-2}, 10^{p-1} - 1 \rrbracket$, whose $p$-th digit is $d$. The highest of these integers is $(10(10^{p-1} - 1) + (d+1))10^{k_q} - 1$, the last $(p + k_q)$-digit number in this case. Thus, from $10^{p-1}$ to $10^{p+k_q} - 1$, the number of integers whose $p$-th digit is $d$ is:

$$\sum_{i=0}^{k_q} \ \sum_{j=10^{p-2}}^{10^{p-1}-1} \ \sum_{b=(10j+d)10^i}^{(10j+(d+1))10^i - 1} 1 = 9 \times 10^{p-2} \sum_{i=0}^{k_q} 10^i = 10^{p-2}(10^{k_q+1} - 1).$$

This equality still holds true for $k_q = -1$. Sums of this type are considered null in the rest of the article. From $10^{p+k_q}$ to $q + 10^{p-1} - 1$, there exist $t$ sequences of $10^{k_q+1}$ consecutive integers from $(10j+d)10^{k_q+1}$ to $(10j+(d+1))10^{k_q+1} - 1$, where $j \in \llbracket 10^{p-2}, 10^{p-2} + t - 1 \rrbracket$, whose $p$-th digit is $d$. There also exist $q + 10^{p-1} - 1 - (10(10^{p-2} + t) + d)10^{k_q+1} + 1$ additional integers in this case between $(10(10^{p-2} + t) + d)10^{k_q+1}$ and $q + 10^{p-1} - 1$. Finally the total number of integers whose $p$-th digit is $d$ is:

$$10^{p-2}(10^{k_q+1} - 1) + t \times 10^{k_q+1} + q + 10^{p-1} - 1 - (10(10^{p-2} + t) + d)10^{k_q+1} + 1,$$

i.e. $q + 10^{p-1} - 1 - \big((9(10^{p-2} + t) + d)10^{k_q+1} + 10^{p-2} - 1\big)$.

It may be inferred that:

$$\mathrm{P}(D_{(n,p)} = d \mid F_{(n,p)} = q) = \frac{q + 10^{p-1} - 1 - \big((9(10^{p-2} + t) + d)10^{k_q+1} + 10^{p-2} - 1\big)}{q}, \quad (3)$$

the $p$-th digit of $q + 10^{p-1} - 1$ being $d$.

In the second case, we consider the integers $q + 10^{p-1} - 1$ whose $p$-th digits are different from $d$. On the basis of the previous case, the total number of integers whose $p$-th digit is $d$ is, where $t$ is the number of complete sequences of $10^{k_q+1}$ consecutive integers with $p$-th digit $d$ lying between $10^{p+k_q}$ and $q + 10^{p-1} - 1$:

$$10^{p-2}(10^{k_q+1} - 1) + t \times 10^{k_q+1},$$

i.e. $10^{k_q+1}(10^{p-2} + t) - 10^{p-2}$.

It can be concluded that:

$$\mathrm{P}(D_{(n,p)} = d \mid F_{(n,p)} = q) = \frac{10^{k_q+1}(10^{p-2} + t) - 10^{p-2}}{q}, \quad (4)$$

the $p$-th digit of $q + 10^{p-1} - 1$ being different from $d$.

Using equalities (1), (2), (3) and (4), we get our result.

For example, we get:

Examples. Let us first determine the value of $\mathrm{P}(D_{(10003,5)} = 2)$. The probability that the fifth digit of a randomly selected number in $\llbracket 10000, 10000 \rrbracket$ is 2 is $\frac{0}{1}$, in $\llbracket 10000, 10001 \rrbracket$ it is $\frac{0}{2}$, in $\llbracket 10000, 10002 \rrbracket$ it is $\frac{1}{3}$ and in $\llbracket 10000, 10003 \rrbracket$ it is $\frac{1}{4}$. Hence we have:

$$\mathrm{P}(D_{(10003,5)} = 2) = \frac{1}{4}\Big(\frac{0}{1} + \frac{0}{2} + \frac{1}{3} + \frac{1}{4}\Big) = \frac{7}{48} \approx 0.1458.$$

It is the second case of Proposition 2.1, where $n = 10003$, $d = 2$, $p = 5$, $k = -1$ and $l = 1000$.

Let us now determine the value of $\mathrm{P}(D_{(1113,3)} = 1)$ (first case of Proposition 2.1); in this case, we have $k = 0$ and $l = 11$.

$$\mathrm{P}(D_{(1113,3)} = 1) = \frac{1}{1014}\Big( \sum_{j=10}^{99} \frac{j - 9}{10j - 98} + \sum_{j=10}^{98} \sum_{a=10j+2}^{10(j+1)} \frac{j - 9}{a - 99} + \sum_{a=992}^{999} \frac{90}{a - 99} + \sum_{a=1000}^{1009} \frac{90}{a - 99} + \sum_{b=1010}^{1019} \frac{b - 919}{b - 99} + \sum_{a=1020}^{1109} \frac{100}{a - 99} + \sum_{b=1110}^{1113} \frac{b - 1009}{b - 99} \Big)$$

$$= \frac{1}{1014}\Big( \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{11} + \frac{2}{12} + \frac{2}{13} + \dots + \frac{89}{891} + \frac{90}{892} + \frac{90}{893} + \dots + \frac{90}{910} + \frac{91}{911} + \dots + \frac{100}{920} + \frac{100}{921} + \dots + \frac{100}{1010} + \frac{101}{1011} + \dots + \frac{104}{1014} \Big).$$

Let us determine the value of $\mathrm{P}(D_{(212,2)} = 9)$ (second case of Proposition 2.1); in this case, we have $k = 0$ and $l = 1$.

$$\mathrm{P}(D_{(212,2)} = 9) = \frac{1}{203}\Big( \frac{9}{10} + \sum_{j=1}^{8} \sum_{a=10(j+1)}^{10(j+1)+8} \frac{j}{a - 9} + \sum_{a=100}^{189} \frac{9}{a - 9} + \sum_{b=190}^{199} \frac{b - 180}{b - 9} + \sum_{a=200}^{212} \frac{19}{a - 9} \Big)$$

$$= \frac{1}{203}\Big( \frac{1}{10} + \frac{1}{11} + \dots + \frac{1}{19} + \frac{2}{20} + \frac{2}{21} + \dots + \frac{8}{89} + \frac{9}{90} + \frac{9}{91} + \dots + \frac{9}{180} + \frac{10}{181} + \frac{11}{182} + \dots + \frac{19}{190} + \frac{19}{191} + \dots + \frac{19}{203} \Big).$$

It is natural that we take a specific look at the values of $n$ positioned one rank before the integers for which the number of digits has just increased.

To this end we will consider the sequence $\big(\mathrm{P}(D_{(n,p)} = d)\big)_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$. In the interests of simplifying notation, we will denote this sequence by $(P_{(d,n,p)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$. Let us study the subsequence $(P_{(d,\varphi_{(d,p)}(n),p)})_{n \in \mathbb{N} \setminus \llbracket 0, p-1 \rrbracket}$ where $\varphi_{(d,p)}$ is the function from $\mathbb{N} \setminus \llbracket 0, p-1 \rrbracket$ to $\mathbb{N}$ that maps $n$ to $10^n - 1$. We get the result below:

Proposition 3.1.

The subsequence $(P_{(d,\varphi_{(d,p)}(n),p)})_{n \in \mathbb{N} \setminus \llbracket 0, p-1 \rrbracket}$ converges to:

$$10^{-1} + \frac{n_{(d,p)} + m_{(d,p)} - 9\, l_{(d,p)} - d \times k_{(d,p)}}{9 \times 10^{p-1}} + \frac{1}{90} \ln\Big(\frac{10^{p-1} + d}{10^{p-1}}\Big) + \frac{1}{9} \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big),$$

where:

$$k_{(d,p)} = \sum_{j=10^{p-2}}^{10^{p-1}-1} \ln\Big(\frac{10j + (d+1)}{10j + d}\Big), \qquad l_{(d,p)} = \sum_{j=10^{p-2}}^{10^{p-1}-1} j \ln\Big(\frac{10j + (d+1)}{10j + d}\Big),$$

$$m_{(d,p)} = \sum_{j=10^{p-2}}^{10^{p-1}-2} \ln\Big(\frac{10(j+1) + d}{10j + (d+1)}\Big), \qquad n_{(d,p)} = \sum_{j=10^{p-2}}^{10^{p-1}-2} j \ln\Big(\frac{10(j+1) + d}{10j + (d+1)}\Big).$$

Proof.

Let $n$ be a positive integer such that $n \ge p$. According to Proposition 2.1, we have $P_{(d,\varphi_{(d,p)}(n),p)} = P_{(d,10^n-1,p)}$, i.e., knowing that in this case $k = \max\{i \in \mathbb{N} : 10^{i+p} \le 10^n - 1\} = n - p - 1$:

$$P_{(d,10^n-1,p)} = \frac{1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} \Big( \sum_{j=10^{p-2}}^{10^{p-1}-1} \ \sum_{b=(10j+d)10^i}^{(10j+(d+1))10^i - 1} \frac{b - ((9j+d)10^i + 10^{p-2} - 1)}{b + 1 - 10^{p-1}} + \sum_{j=10^{p-2}-1}^{10^{p-1}-1} \ \sum_{a=\max(10^{p+i-1},\,(10j+(d+1))10^i)}^{\min(10^{p+i}-1,\,(10(j+1)+d)10^i - 1)} \frac{10^i(j+1) - 10^{p-2}}{a + 1 - 10^{p-1}} \Big).$$

Let us denote by $b_{(i,d,p)}$ the positive number:

$$\sum_{j=10^{p-2}}^{10^{p-1}-1} \ \sum_{b=(10j+d)10^i}^{(10j+(d+1))10^i - 1} \frac{b - ((9j+d)10^i + 10^{p-2} - 1)}{b + 1 - 10^{p-1}},$$

and by $a_{(i,d,p)}$ the positive number:

$$\sum_{j=10^{p-2}-1}^{10^{p-1}-1} \ \sum_{a=\max(10^{p+i-1},\,(10j+(d+1))10^i)}^{\min(10^{p+i}-1,\,(10(j+1)+d)10^i - 1)} \frac{10^i(j+1) - 10^{p-2}}{a + 1 - 10^{p-1}}.$$

Thus we have:

$$P_{(d,\varphi_{(d,p)}(n),p)} = \frac{1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} \big( b_{(i,d,p)} + a_{(i,d,p)} \big).$$

Let us first find an appropriate lower bound of $P_{(d,\varphi_{(d,p)}(n),p)}$. We have:

$$b_{(i,d,p)} = \sum_{j=10^{p-2}}^{10^{p-1}-1} \Big( 10^i - \sum_{b=(10j+d)10^i}^{(10j+(d+1))10^i - 1} \frac{(9j+d)10^i + 10^{p-2} - 10^{p-1}}{b + 1 - 10^{p-1}} \Big) = 9 \times 10^{p+i-2} - \sum_{j=10^{p-2}}^{10^{p-1}-1} \big((9j+d)10^i + 10^{p-2} - 10^{p-1}\big) \sum_{b=(10j+d)10^i}^{(10j+(d+1))10^i - 1} \frac{1}{b + 1 - 10^{p-1}}.$$

Recall that for all integers $(p, q)$ such that $1 < p < q$:

$$\ln\Big(\frac{q+1}{p}\Big) \le \sum_{k=p}^{q} \frac{1}{k} \le \ln\Big(\frac{q}{p-1}\Big). \quad (5)$$

Consequently, we obtain, for $i \ge 1$:

$$b_{(i,d,p)} \ge 9 \times 10^{p+i-2} - \sum_{j=10^{p-2}}^{10^{p-1}-1} (9j+d)10^i \ln\Big(\frac{(10j+(d+1))10^i - 10^{p-1}}{(10j+d)10^i - 10^{p-1}}\Big)$$

$$\ge 9 \times 10^{p+i-2} - \sum_{j=10^{p-2}}^{10^{p-1}-1} (9j+d)10^i \Big( \ln\Big(\frac{10j+(d+1)}{10j+d}\Big) + \ln\Big(1 + \frac{\frac{10^{p-1}}{10j+(d+1)}}{10^i(10j+d) - 10^{p-1}}\Big) \Big)$$

$$\ge 9 \times 10^{p+i-2} - d \times 10^i \sum_{j=10^{p-2}}^{10^{p-1}-1} \ln\Big(\frac{10j+(d+1)}{10j+d}\Big) - 9 \times 10^i \sum_{j=10^{p-2}}^{10^{p-1}-1} j \ln\Big(\frac{10j+(d+1)}{10j+d}\Big) - \sum_{j=10^{p-2}}^{10^{p-1}-1} (9j+d)10^i \ln\Big(1 + \frac{\frac{10^{p-1}}{10j+(d+1)}}{10^i(10j+d) - 10^{p-1}}\Big).$$

Let us denote by $k_{(d,p)}$ the positive number $\sum_{j=10^{p-2}}^{10^{p-1}-1} \ln\big(\frac{10j+(d+1)}{10j+d}\big)$ and by $l_{(d,p)}$ the positive number $\sum_{j=10^{p-2}}^{10^{p-1}-1} j \ln\big(\frac{10j+(d+1)}{10j+d}\big)$. Knowing that for all $x \in\ ]-1; +\infty[$, we have $\ln(1+x) \le x$, we obtain:

$$b_{(i,d,p)} \ge 9 \times 10^{p+i-2} - d \times 10^i k_{(d,p)} - 9 \times 10^i l_{(d,p)} - \sum_{j=10^{p-2}}^{10^{p-1}-1} (9j+d)10^i\, \frac{\frac{10^{p-1}}{10j+(d+1)}}{10^i(10j+d) - 10^{p-1}}$$

$$\ge 9 \times 10^{p+i-2} - d \times 10^i k_{(d,p)} - 9 \times 10^i l_{(d,p)} - \sum_{j=10^{p-2}}^{10^{p-1}-1} \frac{10^i\, 10^{p-1}}{10^i \times 10^{p-1} - 10^{p-1}}$$

$$\ge 9 \times 10^{p+i-2} - d \times 10^i k_{(d,p)} - 9 \times 10^i l_{(d,p)} - 9 \times 10^{p-2}\, \frac{10^i}{10^i - 1}.$$

Similarly, we have thanks to inequalities (5):

$$a_{(i,d,p)} \ge \sum_{j=10^{p-2}}^{10^{p-1}-2} \big(10^i(j+1) - 10^{p-2}\big) \ln\Big(\frac{(10(j+1)+d)10^i + 1 - 10^{p-1}}{(10j+(d+1))10^i + 1 - 10^{p-1}}\Big) + \big(10^{p-2}10^i - 10^{p-2}\big) \ln\Big(\frac{(10^{p-1}+d)10^i + 1 - 10^{p-1}}{10^{p+i-1} + 1 - 10^{p-1}}\Big) + \big(10^{p-1}10^i - 10^{p-2}\big) \ln\Big(\frac{10^{p+i} + 1 - 10^{p-1}}{(10^p - 10 + d + 1)10^i + 1 - 10^{p-1}}\Big)$$

$$\ge 10^i \sum_{j=10^{p-2}}^{10^{p-1}-2} j \ln\Big(\frac{10(j+1)+d}{10j+(d+1)}\Big) + 10^i \sum_{j=10^{p-2}}^{10^{p-1}-2} j \ln\Big(1 + \frac{\frac{9 \times (10^{p-1}-1)}{10(j+1)+d}}{(10j+d+1)10^i + 1 - 10^{p-1}}\Big) + \big(10^i - 10^{p-2}\big) \sum_{j=10^{p-2}}^{10^{p-1}-2} \Big( \ln\Big(\frac{10(j+1)+d}{10j+(d+1)}\Big) + \ln\Big(1 + \frac{\frac{9 \times (10^{p-1}-1)}{10(j+1)+d}}{(10j+d+1)10^i + 1 - 10^{p-1}}\Big) \Big)$$

$$\quad + \big(10^{p-2}10^i - 10^{p-2}\big) \Big( \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) + \ln\Big(1 + \frac{\frac{d(10^{p-1}-1)}{10^{p-1}+d}}{10^{p-1}10^i + 1 - 10^{p-1}}\Big) \Big) + \big(10^{p-1}10^i - 10^{p-2}\big) \Big( \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big) + \ln\Big(1 + \frac{(10^{p-1}-1)(9-d)10^{-p}}{(10^p - 10 + d + 1)10^i + 1 - 10^{p-1}}\Big) \Big).$$

Let us denote by $m_{(d,p)}$ the positive number $\sum_{j=10^{p-2}}^{10^{p-1}-2} \ln\big(\frac{10(j+1)+d}{10j+(d+1)}\big)$ and by $n_{(d,p)}$ the positive number $\sum_{j=10^{p-2}}^{10^{p-1}-2} j \ln\big(\frac{10(j+1)+d}{10j+(d+1)}\big)$:

$$a_{(i,d,p)} \ge 10^i n_{(d,p)} + \big(10^i - 10^{p-2}\big) \Big( m_{(d,p)} + \sum_{j=10^{p-2}}^{10^{p-1}-2} \ln\Big(1 + \frac{\frac{9 \times (10^{p-1}-1)}{10(j+1)+d}}{(10j+d+1)10^i + 1 - 10^{p-1}}\Big) \Big) + \big(10^{p-2}10^i - 10^{p-2}\big) \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) + \big(10^{p-1}10^i - 10^{p-2}\big) \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big).$$

Hence, using $9 \times 10^{p-2}\, \frac{10^i}{10^i - 1} \le 10^{p-1}$ for $i \ge 1$, we have:

$$P_{(d,\varphi_{(d,p)}(n),p)} \ge \frac{1}{10^n} \Big( a_{(0,d,p)} + b_{(0,d,p)} + \sum_{i=1}^{n-p} \Big( 9 \times 10^{p+i-2} - d \times 10^i k_{(d,p)} - 9 \times 10^i l_{(d,p)} + 10^i n_{(d,p)} + 10^i m_{(d,p)} + 10^{p-2}10^i \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) + 10^{p-1}10^i \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big)$$

$$\quad - 10^{p-1} - 10^{p-2} \Big( m_{(d,p)} + \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) + \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big) \Big) + \big(10^i - 10^{p-2}\big) \sum_{j=10^{p-2}}^{10^{p-1}-2} \ln\Big(1 + \frac{\frac{9 \times (10^{p-1}-1)}{10(j+1)+d}}{(10j+d+1)10^i + 1 - 10^{p-1}}\Big) \Big) \Big).$$

In light of the equality $\sum_{i=1}^{n-p} 10^i = \frac{10^{n-p+1} - 10}{9}$, and dropping the nonnegative terms of the last sum (those with $i \ge p-2$), we have:

$$P_{(d,\varphi_{(d,p)}(n),p)} \ge 10^{-1} + \frac{10^{-p+1}\big(n_{(d,p)} + m_{(d,p)} - 9\, l_{(d,p)} - d\, k_{(d,p)}\big)}{9} + \frac{10^{-1}}{9} \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) + \frac{1}{9} \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big) + \epsilon_{(d,n,p)},$$

where $\epsilon_{(d,n,p)}$ is:

$$\frac{a_{(0,d,p)} + b_{(0,d,p)}}{10^n} - \frac{10^{p-1}}{10^n} + \frac{d\, k_{(d,p)} + 9\, l_{(d,p)} - n_{(d,p)} - m_{(d,p)}}{9 \times 10^{n-1}} - \frac{10^{p-1}}{9 \times 10^n} \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) - \frac{10^p}{9 \times 10^n} \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big) - \frac{10^{p-1}(n-p)}{10^n}$$

$$\quad - \frac{10^{p-2}(n-p)}{10^n} \Big( m_{(d,p)} + \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) + \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big) \Big) + \frac{1}{10^n} \sum_{i=1}^{p-2} \big(10^i - 10^{p-2}\big) \sum_{j=10^{p-2}}^{10^{p-1}-2} \ln\Big(1 + \frac{\frac{9 \times (10^{p-1}-1)}{10(j+1)+d}}{(10j+d+1)10^i + 1 - 10^{p-1}}\Big).$$

Knowing that for all $x \in\ ]-1; +\infty[$, we have $\ln(1+x) \le x$, we obtain, for all $i \in \{1, \dots, p-2\}$:

$$\sum_{j=10^{p-2}}^{10^{p-1}-2} \ln\Big(1 + \frac{\frac{9 \times (10^{p-1}-1)}{10(j+1)+d}}{(10j+d+1)10^i + 1 - 10^{p-1}}\Big) \le \sum_{j=10^{p-2}}^{10^{p-1}-2} \frac{\frac{9 \times (10^{p-1}-1)}{10(j+1)+d}}{(10j+d+1)10^i + 1 - 10^{p-1}} \le 9 \times 10^{p-2} \times \frac{9 \times 10^{p-1}}{10^{p-1} \times 9 \times 10^{p-1}} \le 1.$$

From the above upper bound and the definition of $\epsilon_{(d,n,p)}$, it may be deduced that $\lim_{n \to +\infty} \epsilon_{(d,n,p)} = 0$.

Let us now find an appropriate upper bound of $P_{(d,\varphi_{(d,p)}(n),p)}$. Thanks to inequalities (5):

$$b_{(i,d,p)} \le 9 \times 10^{p+i-2} - \sum_{j=10^{p-2}}^{10^{p-1}-1} \big((9j+d)10^i + 10^{p-2} - 10^{p-1}\big) \ln\Big(\frac{(10j+(d+1))10^i + 1 - 10^{p-1}}{(10j+d)10^i + 1 - 10^{p-1}}\Big)$$

$$\le 9 \times 10^{p+i-2} - \sum_{j=10^{p-2}}^{10^{p-1}-1} \big((9j+d)10^i + 10^{p-2} - 10^{p-1}\big) \Big( \ln\Big(\frac{10j+(d+1)}{10j+d}\Big) + \ln\Big(1 + \frac{\frac{10^{p-1}-1}{10j+(d+1)}}{10^i(10j+d) + 1 - 10^{p-1}}\Big) \Big)$$

$$\le 9 \times 10^{p+i-2} - d \times 10^i k_{(d,p)} - 9 \times 10^i l_{(d,p)} + 10^{p-1} k_{(d,p)}.$$

Similarly, for $i \ge 1$:

$$a_{(i,d,p)} \le \sum_{j=10^{p-2}}^{10^{p-1}-2} 10^i(j+1) \ln\Big(\frac{(10(j+1)+d)10^i - 10^{p-1}}{(10j+(d+1))10^i - 10^{p-1}}\Big) + 10^{p-2}10^i \ln\Big(\frac{(10^{p-1}+d)10^i - 10^{p-1}}{10^{p+i-1} - 10^{p-1}}\Big) + 10^{p-1}10^i \ln\Big(\frac{10^{p+i} - 10^{p-1}}{(10^p - 10 + d + 1)10^i - 10^{p-1}}\Big)$$

$$\le 10^i n_{(d,p)} + 10^i \sum_{j=10^{p-2}}^{10^{p-1}-2} j \ln\Big(1 + \frac{\frac{9 \times 10^{p-1}}{10(j+1)+d}}{(10j+d+1)10^i - 10^{p-1}}\Big) + 10^i \Big( m_{(d,p)} + \sum_{j=10^{p-2}}^{10^{p-1}-2} \ln\Big(1 + \frac{\frac{9 \times 10^{p-1}}{10(j+1)+d}}{(10j+d+1)10^i - 10^{p-1}}\Big) \Big)$$

$$\quad + 10^{p-2}10^i \Big( \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) + \ln\Big(1 + \frac{\frac{d \times 10^{p-1}}{10^{p-1}+d}}{10^{p-1}10^i - 10^{p-1}}\Big) \Big) + 10^{p-1}10^i \Big( \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big) + \ln\Big(1 + \frac{10^{p-1}(9-d)10^{-p}}{(10^p - 10 + d + 1)10^i - 10^{p-1}}\Big) \Big).$$

Hence we have:

$$P_{(d,\varphi_{(d,p)}(n),p)} \le \frac{1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} \Big( 9 \times 10^{p+i-2} - d \times 10^i k_{(d,p)} - 9 \times 10^i l_{(d,p)} + 10^i m_{(d,p)} + 10^i n_{(d,p)} + 10^{p-2}10^i \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) + 10^{p-1}10^i \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big) + 10^{p-1} k_{(d,p)}$$

$$\quad + 10^i \sum_{j=10^{p-2}}^{10^{p-1}-2} j \ln\Big(1 + \frac{\frac{9 \times 10^{p-1}}{10(j+1)+d}}{(10j+d+1)10^i - 10^{p-1}}\Big) + 10^i \sum_{j=10^{p-2}}^{10^{p-1}-2} \ln\Big(1 + \frac{\frac{9 \times 10^{p-1}}{10(j+1)+d}}{(10j+d+1)10^i - 10^{p-1}}\Big) + 10^{p-2}10^i \ln\Big(1 + \frac{\frac{d \times 10^{p-1}}{10^{p-1}+d}}{10^{p-1}10^i - 10^{p-1}}\Big) + 10^{p-1}10^i \ln\Big(1 + \frac{10^{p-1}(9-d)10^{-p}}{(10^p - 10 + d + 1)10^i - 10^{p-1}}\Big) \Big).$$

In light of the equality $\sum_{i=0}^{n-p} 10^i = \frac{10^{n-p+1} - 1}{9}$, we have:

$$\lim_{n \to +\infty} \Big( \frac{1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} 9 \times 10^{p+i-2} \Big) = 10^{-1}$$

$$\lim_{n \to +\infty} \Big( \frac{-1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} d \times 10^i k_{(d,p)} \Big) = -\frac{d\, k_{(d,p)}}{9 \times 10^{p-1}}$$

$$\lim_{n \to +\infty} \Big( \frac{-1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} 9 \times 10^i l_{(d,p)} \Big) = -\frac{l_{(d,p)}}{10^{p-1}}$$

$$\lim_{n \to +\infty} \Big( \frac{1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} 10^i m_{(d,p)} \Big) = \frac{m_{(d,p)}}{9 \times 10^{p-1}}$$

$$\lim_{n \to +\infty} \Big( \frac{1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} 10^i n_{(d,p)} \Big) = \frac{n_{(d,p)}}{9 \times 10^{p-1}}$$

$$\lim_{n \to +\infty} \Big( \frac{1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} 10^{p-2}10^i \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) \Big) = \frac{1}{90} \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big)$$

$$\lim_{n \to +\infty} \Big( \frac{1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} 10^{p-1}10^i \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big) \Big) = \frac{1}{9} \ln\Big(\frac{10^p}{10^p - 10 + d + 1}\Big)$$

$$\lim_{n \to +\infty} \Big( \frac{1}{10^n - 10^{p-1}} \sum_{i=0}^{n-p} 10^{p-1} k_{(d,p)} \Big) = 0.$$

Knowing that for all $x \in\ ]-1; +\infty[$, we have $\ln(1+x) \le x$, we obtain, for $i \ge 1$, bounds that do not depend on $i$:

$$10^i \sum_{j=10^{p-2}}^{10^{p-1}-2} j \ln\Big(1 + \frac{\frac{9 \times 10^{p-1}}{10(j+1)+d}}{(10j+d+1)10^i - 10^{p-1}}\Big) \le 9 \times 10^{p-2}\, \frac{10^i}{10^i - 1} \le 10^{p-1}$$

$$10^i \sum_{j=10^{p-2}}^{10^{p-1}-2} \ln\Big(1 + \frac{\frac{9 \times 10^{p-1}}{10(j+1)+d}}{(10j+d+1)10^i - 10^{p-1}}\Big) \le \frac{81 \times 10^{p-2}}{10^{p-1}}\, \frac{10^i}{10^i - 1} \le 9$$

$$10^{p-2}10^i \ln\Big(1 + \frac{\frac{d \times 10^{p-1}}{10^{p-1}+d}}{10^{p-1}10^i - 10^{p-1}}\Big) \le 10^{p-2}10^i\, \frac{d}{(10^{p-1}+d)(10^i - 1)} \le 1$$

$$10^{p-1}10^i \ln\Big(1 + \frac{10^{p-1}(9-d)10^{-p}}{(10^p - 10 + d + 1)10^i - 10^{p-1}}\Big) \le 10^{p-1}10^i\, \frac{1}{8 \times 10^{p-1}10^i} \le 1.$$

Thanks to the upper bound of $P_{(d,\varphi_{(d,p)}(n),p)}$ and the above inequalities, the result follows.

Let us denote by $\alpha_{(d,p)}$ the limit of $(P_{(d,\varphi_{(d,p)}(n),p)})_{n \in \mathbb{N} \setminus \llbracket 0, p-1 \rrbracket}$. Here are a few values of $P_{(d,\varphi_{(d,p)}(n),p)}$:

[Table 1: Values of $P_{(d,\varphi_{(d,2)}(n),2)}$ and $\alpha_{(d,2)}$, for $n \in \llbracket 2, 5 \rrbracket$ and $d \in \llbracket 0, 9 \rrbracket$. These values are rounded to the nearest ten-thousandth.]

[Table 2: Values of $P_{(d,\varphi_{(d,3)}(n),3)}$ and $\alpha_{(d,3)}$, for $n \in \llbracket 3, 5 \rrbracket$ and $d \in \llbracket 0, 9 \rrbracket$. These values are rounded to the nearest ten-thousandth.]

Graphs of $(P_{(d,n,p)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$

Let us plot graphs of sequences $(P_{(d,n,2)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$ for values of $n$ from 10 to 1000 (Figure 1). Then we plot graphs of $(P_{(d,n,3)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$ (Figure 2).

[Figure 1: For $d \in \llbracket 0, 9 \rrbracket$, graphs of $(P_{(d,n,2)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$.]

[Figure 2: For $d \in \llbracket 0, 9 \rrbracket$, graphs of $(P_{(d,n,3)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$. Note that not all points have been represented.]

Let us plot two additional graphs of $P_{(d,n,2)}$ versus $\log(n)$ and $P_{(d,n,3)}$ versus $\log(n)$ for values of $n$ from 10 to 2000000:

[Figure 3: For $d \in \llbracket 0, 9 \rrbracket$, graphs of $P_{(d,n,2)}$ versus $\log(n)$. Note that not all points have been plotted. The first five values of the above defined subsequence, for each $d$, are represented by bigger points.]

[Figure 4: For $d \in \llbracket 0, 9 \rrbracket$, graphs of $P_{(d,n,3)}$ versus $\log(n)$. Note that not all points have been plotted. The first four values of the above defined subsequence, for each $d$, are represented by bigger points.]

Through Figures 3 and 4, the proportion of each $d$ as $p$-th digit, $d \in \llbracket 0, 9 \rrbracket$, seems to fluctuate and consequently not to follow Benford's Law. Each "pseudo cycle" seems to be composed of $9 \times 10^{p-2}$ short waves. Note that these observations were not obvious in view of Figures 1 and 2.

We can also prove the following result:

Proposition 4.1.

For all $n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket$ such that $n \ge 10^{p-1} + 9$ and for all $(a, b) \in \llbracket 0, 9 \rrbracket^2$ such that $a < b$, we have:

$$P_{(a,n,p)} > P_{(b,n,p)}.$$

The relative position of the graphs of $P_{(d,n,p)}$, for $d \in \llbracket 0, 9 \rrbracket$, can be observed on Figures 1, 2 and 3.

Proof. Let $(a, b) \in \llbracket 0, 9 \rrbracket^2$ be such that $a < b$. For all $m \in \llbracket 10^{p-1}, n \rrbracket$, let us denote by $E_{(a,m)}$ the subset of $\mathbb{N}$ such that $E_{(a,m)} = \{j \le m : \text{the } p\text{-th digit of } j \text{ is } a\}$. For all $e \in E_{(b,m)}$, we consider $e' = e - (b - a) \times 10^{dg - p}$ where $dg$ is the number of digits of the integer $e$. It is clear that $e' \in E_{(a,m)}$. Thus we get: $|E_{(a,m)}| \ge |E_{(b,m)}|$. We also have $P_{(a,10^{p-1}+a,p)} = \frac{1}{(a+1)^2} > P_{(b,10^{p-1}+a,p)} = 0$. The result follows.

Remark 4.2. For $n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket$, we have, if $n < 10^{p-1} + d$, $P_{(d,n,p)} = 0$. Hence for all $n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket$ and for all $(a, b) \in \llbracket 0, 9 \rrbracket^2$ such that $a < b$, we have:

$$P_{(a,n,p)} \ge P_{(b,n,p)}.$$

Let us henceforth provide the following equality:

Proposition 4.3.

$$P_{(d,n,p)} = \frac{1}{n + 1 - 10^{p-1}} \Big( P_{(d,10^{k+p}-1,p)} \times (10^{k+p} - 10^{p-1}) + r_{(n,d,p)} \Big),$$

where $k = \max\{i \in \mathbb{N} : 10^{i+p} \le n\}$.

Proof. Results are directly derived from Proposition 2.1.

$9 \times 10^{p-2}$ additional subsequences

To definitively bring to light the fact that the sequence $(P_{(d,n,p)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$ does not converge, we will show that there exist additional subsequences that converge to limits different from those of $(P_{(d,\varphi_{(d,p)}(n),p)})_{n \in \mathbb{N} \setminus \llbracket 0, p-1 \rrbracket}$.

For $i \in \llbracket 10^{p-2}, 10^{p-1} - 1 \rrbracket$, let us in this way study the $9 \times 10^{p-2}$ subsequences $(P_{(d,\psi_{(d,p,i)}(n),p)})_{n \in \mathbb{N} \setminus \llbracket 0, p-1 \rrbracket}$ where $\psi_{(d,p,i)}$ is the function from $\mathbb{N} \setminus \llbracket 0, p-1 \rrbracket$ to $\mathbb{N}$ that maps $n$ to $(10i + (d+1))10^{n-p+1} - 1$. We get the result below:

Proposition 5.1. Let $i \in \llbracket 10^{p-2}, 10^{p-1} - 1 \rrbracket$. The subsequence $(P_{(d,\psi_{(d,p,i)}(n),p)})_{n \in \mathbb{N} \setminus \llbracket 0, p-1 \rrbracket}$ converges to:

$$\frac{\alpha_{(d,p)} 10^{p-1} + i + 1 - 10^{p-2} - k_{(d,p,i)}\, d - 9\, l_{(d,p,i)} + m_{(d,p,i)} + n_{(d,p,i)} + 10^{p-2} \ln\big(\frac{10^{p-1}+d}{10^{p-1}}\big)}{10i + d + 1},$$

where:

$$k_{(d,p,i)} = \sum_{j=10^{p-2}}^{i} \ln\Big(\frac{10j+(d+1)}{10j+d}\Big), \qquad l_{(d,p,i)} = \sum_{j=10^{p-2}}^{i} j \ln\Big(\frac{10j+(d+1)}{10j+d}\Big),$$

$$m_{(d,p,i)} = \sum_{j=10^{p-2}}^{i-1} \ln\Big(\frac{10(j+1)+d}{10j+(d+1)}\Big), \qquad n_{(d,p,i)} = \sum_{j=10^{p-2}}^{i-1} j \ln\Big(\frac{10(j+1)+d}{10j+(d+1)}\Big).$$

Proof. Let $i \in \llbracket 10^{p-2}, 10^{p-1} - 1 \rrbracket$. Thanks to Proposition 4.3, we have, for $n \in \mathbb{N} \setminus \llbracket 0, p-1 \rrbracket$:

$$P_{(d,\psi_{(d,p,i)}(n),p)} = \frac{1}{(10i + (d+1))10^{n-p+1} - 10^{p-1}} \Big( P_{(d,10^n-1,p)} \times (10^n - 10^{p-1}) + r_{(\psi_{(d,p,i)}(n),d,p)} \Big).$$

The first term of $r_{(\psi_{(d,p,i)}(n),d,p)}$ can be simplified as follows:

$$\sum_{j=10^{p-2}}^{i} \ \sum_{b=(10j+d)10^{n-p+1}}^{(10j+(d+1))10^{n-p+1} - 1} \Big( 1 - \frac{(9j+d)10^{n-p+1} + 10^{p-2} - 10^{p-1}}{b + 1 - 10^{p-1}} \Big) = 10^{n-p+1}(i - 10^{p-2} + 1) - \sum_{j=10^{p-2}}^{i} \big((9j+d)10^{n-p+1} + 10^{p-2} - 10^{p-1}\big) \sum_{b=(10j+d)10^{n-p+1}}^{(10j+(d+1))10^{n-p+1} - 1} \frac{1}{b + 1 - 10^{p-1}}$$

$$\underset{n \to +\infty}{\sim} 10^{n-p+1}(i - 10^{p-2} + 1) - \sum_{j=10^{p-2}}^{i} (9j+d)10^{n-p+1} \ln\Big(\frac{10j+(d+1)}{10j+d}\Big),$$

thanks to inequalities (5).

The second term of $r_{(\psi_{(d,p,i)}(n),d,p)}$ can be simplified as follows:

$$\sum_{j=10^{p-2}-1}^{i-1} \ \sum_{a=\max(10^n,\,(10j+(d+1))10^{n-p+1})}^{(10(j+1)+d)10^{n-p+1} - 1} \frac{10^{n-p+1}(j+1) - 10^{p-2}}{a + 1 - 10^{p-1}} = \big(10^{n-p+1}10^{p-2} - 10^{p-2}\big) \sum_{a=10^n}^{(10^{p-1}+d)10^{n-p+1} - 1} \frac{1}{a + 1 - 10^{p-1}} + \sum_{j=10^{p-2}}^{i-1} \big(10^{n-p+1}(j+1) - 10^{p-2}\big) \sum_{a=(10j+(d+1))10^{n-p+1}}^{(10(j+1)+d)10^{n-p+1} - 1} \frac{1}{a + 1 - 10^{p-1}}$$

$$\underset{n \to +\infty}{\sim} 10^{n-1} \ln\Big(\frac{10^{p-1}+d}{10^{p-1}}\Big) + \sum_{j=10^{p-2}}^{i-1} 10^{n-p+1}(j+1) \ln\Big(\frac{10(j+1)+d}{10j+(d+1)}\Big),$$

thanks to inequalities (5).

Knowing that $P_{(d,10^n-1,p)} \underset{n \to +\infty}{\sim} \alpha_{(d,p)}$ (see Proposition 3.1), the result follows.

Let us denote by $\alpha_{(d,p,i)}$ the limit of $(P_{(d,\psi_{(d,p,i)}(n),p)})_{n \in \mathbb{N} \setminus \llbracket 0, p-1 \rrbracket}$. Here are a few values of $P_{(d,\psi_{(d,p,i)}(n),p)}$:

[Table 3: Values of $P_{(d,\psi_{(d,2,7)}(n),2)}$ and $\alpha_{(d,2,7)}$, for four successive values of $n$ and $i = 7$. These values are rounded to the nearest ten-thousandth.]

[Table 4: Values of $P_{(d,\psi_{(d,3,23)}(n),3)}$ and $\alpha_{(d,3,23)}$, for three successive values of $n$ and $i = 23$. These values are rounded to the nearest ten-thousandth.]

As a result, the sequence $(P_{(d,n,p)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$ does not converge. The $9 \times 10^{p-2}$ convergent subsequences confirm the remarks raised by Figures 3 and 4 about the existence of "pseudo cycles" in the graph of $(P_{(d,n,p)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$.

From Figures 3 and 4, we notice that there exist fluctuations in the graph of $(P_{(d,n,p)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$. We define $C_{(d,p)}$ as follows:

Definition 5.2.

$$C_{(d,p)} = \frac{1}{9 \times 10^{p-2}} \sum_{i=10^{p-2}}^{10^{p-1}-1} \alpha_{(d,p,i)}.$$

Figure 5 below shows the different values of $\alpha_{(0,2,i)}$, for $i \in \llbracket 1, 9 \rrbracket$, and also the values of $P_{(0,n,2)}$ versus $\log(n)$:

[Figure 5: Graph of $P_{(0,n,2)}$ versus $\log(n)$. Note that not all points have been represented. Lines whose equation is $y = \alpha_{(0,2,i)}$, for $i \in \llbracket 1, 9 \rrbracket$, have also been plotted. Two of these lines are almost coincident.]

The values of $C_{(d,p)}$ are compared with the probabilities of the generalized Benford's Law in Tables 5 and 6 (for $p = 2$ and $p = 3$, respectively). According to Hill ([9]), it is absolutely normal.

We furthermore note, thanks to Table 5, that $C_{(0,2)}$ slightly underestimates $\sum_{j=1}^{9} \log\big(1 + \frac{1}{10j}\big)$, as can be inferred from Figure 5.
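The limits of Propositions 3.1 and 5.1 and the central value of Definition 5.2 can be evaluated numerically. The following is only a minimal sketch: it assumes the closed forms as they are read here from the two propositions, and the function names (`log_sums`, `alpha`, `alpha_i`, `central_value`) are hypothetical helpers that do not appear in the article.

```python
import math


def log_sums(d, p, upper):
    """Return (k, l, m, n): the logarithmic sums of Propositions 3.1 and 5.1.

    k and l run over j = 10^(p-2), ..., upper; m and n stop at upper - 1.
    """
    j0 = 10 ** (p - 2)
    k = sum(math.log((10 * j + d + 1) / (10 * j + d)) for j in range(j0, upper + 1))
    l = sum(j * math.log((10 * j + d + 1) / (10 * j + d)) for j in range(j0, upper + 1))
    m = sum(math.log((10 * (j + 1) + d) / (10 * j + d + 1)) for j in range(j0, upper))
    n = sum(j * math.log((10 * (j + 1) + d) / (10 * j + d + 1)) for j in range(j0, upper))
    return k, l, m, n


def alpha(d, p):
    """Limit of P_(d, 10^n - 1, p) as read from Proposition 3.1 (assumed form)."""
    k, l, m, n = log_sums(d, p, 10 ** (p - 1) - 1)
    return (0.1
            + (n + m - 9 * l - d * k) / (9 * 10 ** (p - 1))
            + math.log((10 ** (p - 1) + d) / 10 ** (p - 1)) / 90
            + math.log(10 ** p / (10 ** p - 10 + d + 1)) / 9)


def alpha_i(d, p, i):
    """Limit of the i-th additional subsequence, as read from Proposition 5.1."""
    k, l, m, n = log_sums(d, p, i)
    numerator = (alpha(d, p) * 10 ** (p - 1) + i + 1 - 10 ** (p - 2)
                 - d * k - 9 * l + m + n
                 + 10 ** (p - 2) * math.log((10 ** (p - 1) + d) / 10 ** (p - 1)))
    return numerator / (10 * i + d + 1)


def central_value(d, p):
    """C_(d,p) of Definition 5.2: mean of alpha_i over i in [10^(p-2), 10^(p-1) - 1]."""
    j0, j1 = 10 ** (p - 2), 10 ** (p - 1)
    return sum(alpha_i(d, p, i) for i in range(j0, j1)) / (9 * j0)


if __name__ == "__main__":
    for d in range(10):
        print(d, round(alpha(d, 2), 4), round(central_value(d, 2), 4))
```

Two sanity checks follow from the definitions rather than from any tabulated value: the limits `alpha(d, p)` sum to 1 over the ten digits, and for $d = 9$ and $i = 10^{p-1} - 1$ the subsequence $\psi$ coincides with $\varphi$ shifted by one rank, so `alpha_i(9, p, 10**(p-1) - 1)` equals `alpha(9, p)`.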
Conclusion

To conclude, through our model, we have seen that the proportion of $d$ as $p$-th digit, $d \in \llbracket 0, 9 \rrbracket$, in certain naturally occurring collections of data is more likely to follow a law whose probability distribution is $(d, P_{(d,n,p)})_{d \in \llbracket 0, 9 \rrbracket}$, where $n$ is the smallest integer upper bound of the physical, biological or economical quantities considered, rather than the generalized Benford's Law. Knowing beforehand the value of the upper bound $n$ can be a way to find a better-adjusted law than Benford's one.

[Table 5: $C_{(d,2)}$ and $\sum_{j=1}^{9} \log\big(1 + \frac{1}{10j + d}\big)$, i.e. $C_{(d,p)}$ and the probabilities associated to the second digit ([8]), for $p = 2$. These values are rounded to the nearest thousandth.]

[Table 6: $C_{(d,3)}$ and $\sum_{j=10}^{99} \log\big(1 + \frac{1}{10j + d}\big)$, i.e. $C_{(d,p)}$ and the probabilities associated to the third digit ([8]). These values are rounded to the nearest thousandth.]

The results of the article would have been the same in terms of fluctuations of the proportion of $d \in \llbracket 0, 9 \rrbracket$ as $p$-th digit, of limits of subsequences, or of results on central values, if our uniformly randomly selected discrete uniform distributions were lower-bounded by a positive integer different from $10^{p-1}$: the first terms in the proportion formulas rapidly become negligible. Through our model we understand that the predominance of 0 as $p$-th digit (followed by that of 1, and so on) is anything but surprising in experimental data: it is only due to the fact that, in the lexicographical order, 0 appears before 1, 1 appears before 2, etc.

However, the limits of our model rest on the assumption that the random variables used to obtain our data are not the same and follow discrete uniform distributions that are uniformly randomly selected. In certain naturally occurring collections of data this cannot conceivably be justified. Studying the cases where the random variables follow other distributions (and not necessarily randomly selected ones) sketches some avenues for future research on the subject.
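The comparison baseline used throughout, Hill's generalization of Benford's Law ([8]), is a finite sum and can be recomputed directly. The short sketch below is illustrative only; the function name `hill_probability` is an assumption of this example and not part of the article's script.

```python
import math


def hill_probability(d, p):
    """Probability that the p-th significant digit (p >= 2) equals d under
    Hill's generalization: sum of log10(1 + 1/(10j + d)) for
    j = 10^(p-2), ..., 10^(p-1) - 1."""
    return sum(math.log10(1 + 1 / (10 * j + d))
               for j in range(10 ** (p - 2), 10 ** (p - 1)))


if __name__ == "__main__":
    for p in (2, 3):
        print(p, [round(hill_probability(d, p), 4) for d in range(10)])
```

For each $p$ the ten probabilities sum to 1, because the sum over $d$ and $j$ telescopes to $\log(10^p / 10^{p-1})$, and they decrease with $d$, which is the ordering that Proposition 4.1 establishes for the model of this article.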

References

[1] T. W. Beer. Terminal digit preference: beware of Benford's law. Journal of Clinical Pathology, 62(2):192, 2009.
[2] F. Benford. The law of anomalous numbers. Proceedings of the American Philosophical Society, 78:127–131, 1938.
[3] S. Blondeau Da Silva. Benford or not Benford: a systematic but not always well-founded use of an elegant law in experimental fields. arXiv:1804.06186 [math.PR], 2018.
[4] J. Burke and E. Kincanon. Benford's law and physical constants: the distribution of initial digits. American Journal of Physics, 59:952, 1991.
[5] A. Diekmann. Not the first digit! Using Benford's law to detect fraudulent scientific data. Journal of Applied Statistics, 34(3):321–329, 2007.
[6] J. L. Friar, T. Goldman, and J. Pérez-Mercader. Genome sizes and the Benford distribution. PLOS ONE, 7(5), 2012.
[7] N. Gauvrit and J.-P. Delahaye. Pourquoi la loi de Benford n'est pas mystérieuse. Mathématiques et sciences humaines, 182(2):7–15, 2008.
[8] T. Hill. The significant-digit phenomenon. The American Mathematical Monthly, 102(4):322–327, 1995.
[9] T. Hill. A statistical derivation of the significant-digit law. Statistical Science, 10(4):354–363, 1995.
[10] D. Knuth. The Art of Computer Programming 2. Addison-Wesley, New York, 1969.
[11] S. Newcomb. Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics, 4:39–40, 1881.
[12] M. Nigrini and W. Wood. Assessing the integrity of tabulated demographic data. 1995. Preprint.
[13] R. A. Raimi. The first digit problem. American Mathematical Monthly, 83(7):521–538, 1976.
[14] G. Van Rossum. Python tutorial, Technical Report CS-R9526. Centrum voor Wiskunde en Informatica (CWI), 1995.
[15] H. Varian. Benford's law (letters to the editor). The American Statistician, 26(3):62–65, 1972.

Appendix: Python script

Using Proposition 2.1, we can determine the terms of $(P_{(d,n,p)})_{n \in \mathbb{N} \setminus \llbracket 0, 10^{p-1} - 1 \rrbracket}$, for $d \in \llbracket 0, 9 \rrbracket$. To this end, we have created a script with the Python programming language (Python Software Foundation, Python Language Reference, version 3, see [14]). The implemented function expvalProp has three parameters, passed in the order (n, d, p): the rank n of the wanted term of the sequence, the value d of the considered digit and the position p of this digit. Integer floor division (//) is used in place of math.floor so that no import is needed and no floating-point rounding occurs. Here is the algorithm used:

    def expvalProp(n, d, p):
        # k = max{i : 10^(i+p) <= n}, with k = -1 when n has exactly p digits
        k = -1
        while 10**(k + p + 1) <= n:
            k = k + 1
        l = (n - (10**(p - 1) + d) * 10**(k + 1)) // 10**(k + 2) + 10**(p - 2)
        S = 0  # accumulates the sums over a
        T = 0  # accumulates the sums over b
        # Main double sum of Proposition 2.1 (empty when k == -1)
        for i in range(0, k + 1):
            for j in range(10**(p - 2), 10**(p - 1)):
                for b in range((10*j + d) * 10**i, (10*j + (d + 1)) * 10**i):
                    T = T + (b - ((9*j + d) * 10**i + 10**(p - 2) - 1)) / (b + 1 - 10**(p - 1))
            for j in range(10**(p - 2) - 1, 10**(p - 1)):
                for a in range(max(10**(p + i - 1), (10*j + (d + 1)) * 10**i),
                               min(10**(p + i), (10*(j + 1) + d) * 10**i)):
                    S = S + ((j + 1) * 10**i - 10**(p - 2)) / (a + 1 - 10**(p - 1))
        # Remainder r(n, d, p): two cases according to the p-th digit of n
        if n // 10**(k + 1) - 10 * (n // 10**(k + 2)) == d:
            for j in range(10**(p - 2), l + 1):
                for b in range((10*j + d) * 10**(k + 1),
                               min(n, (10*j + (d + 1)) * 10**(k + 1) - 1) + 1):
                    T = T + (b - ((9*j + d) * 10**(k + 1) + 10**(p - 2) - 1)) / (b + 1 - 10**(p - 1))
            for j in range(10**(p - 2) - 1, l):
                for a in range(max(10**(p + k), (10*j + (d + 1)) * 10**(k + 1)),
                               (10*(j + 1) + d) * 10**(k + 1)):
                    S = S + ((j + 1) * 10**(k + 1) - 10**(p - 2)) / (a + 1 - 10**(p - 1))
        else:
            for j in range(10**(p - 2), l + 1):
                for b in range((10*j + d) * 10**(k + 1), (10*j + (d + 1)) * 10**(k + 1)):
                    T = T + (b - ((9*j + d) * 10**(k + 1) + 10**(p - 2) - 1)) / (b + 1 - 10**(p - 1))
            for j in range(10**(p - 2) - 1, l + 1):
                for a in range(max(10**(p + k), (10*j + (d + 1)) * 10**(k + 1)),
                               min(n, (10*(j + 1) + d) * 10**(k + 1) - 1) + 1):
                    S = S + ((j + 1) * 10**(k + 1) - 10**(p - 2)) / (a + 1 - 10**(p - 1))
        return (S + T) / (n + 1 - 10**(p - 1))
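A value returned by the script can be cross-checked against a direct enumeration of the two-dice experiment $E_n$: for each upper bound $i$ drawn by the first die, count the integers of $\llbracket 10^{p-1}, i \rrbracket$ whose $p$-th digit is $d$. The sketch below is an independent check under that reading of the model, not the author's code; `pth_digit` and `brute_force_prob` are hypothetical names, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def pth_digit(x, p):
    """Return the p-th digit of x, counted from the left (1-based)."""
    return int(str(x)[p - 1])


def brute_force_prob(n, d, p):
    """Exact value of P(D_(n,p) = d) by enumerating the experiment E_n.

    The first die picks an upper bound i uniformly in [10^(p-1), n]; the
    second picks an integer uniformly in [10^(p-1), i]. We average, over i,
    the proportion of integers in [10^(p-1), i] whose p-th digit is d.
    """
    low = 10 ** (p - 1)
    total = Fraction(0)
    count = 0  # integers in [low, i] whose p-th digit is d
    for i in range(low, n + 1):
        if pth_digit(i, p) == d:
            count += 1
        total += Fraction(count, i - low + 1)
    return total / (n + 1 - low)


if __name__ == "__main__":
    # First example of the article: n = 10003, d = 2, p = 5
    print(brute_force_prob(10003, 2, 5))  # 7/48
```

The enumeration costs one pass over $\llbracket 10^{p-1}, n \rrbracket$, so it is only practical for moderate $n$, but it gives an exact reference value for testing any implementation of Proposition 2.1.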