Best finite approximations of Benford's Law
Arno Berger and Chuang Xu
Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada
March 29, 2018
Abstract
For arbitrary Borel probability measures with compact support on the real line, characterizations are established of the best finitely supported approximations, relative to three familiar probability metrics (Lévy, Kantorovich, and Kolmogorov), given any number of atoms, and allowing for additional constraints regarding weights or positions of atoms. As an application, best (constrained or unconstrained) approximations are identified for Benford's Law (logarithmic distribution of significands) and other familiar distributions. The results complement and extend known facts in the literature; they also provide new rigorous benchmarks against which to evaluate empirical observations regarding Benford's Law.
Keywords.
Benford's Law, best uniform approximation, asymptotically best approximation, Lévy distance, Kantorovich distance, Kolmogorov distance.
MSC2010.
Given real numbers b > 1 and x ≠ 0, denote by S_b(x) the unique number in [1, b[ such that |x| = S_b(x) b^k for some (necessarily unique) integer k; for convenience, let S_b(0) = 0. The number S_b(x) often is referred to as the base-b significand of x, a terminology particularly well-established in the case of b being an integer. (Unlike in much of the literature [2, 4, 19, 34], the case of integer b does not carry special significance in this article.) A Borel probability measure µ on R is Benford base b, or b-Benford for short, if

    µ({x ∈ R : S_b(x) ≤ s}) = log s / log b   ∀ s ∈ [1, b[;   (1.1)

here and throughout, log denotes the natural logarithm. Benford probabilities (or random variables) exhibit many interesting properties and have been studied extensively [1, 14, 20, 25, 29]. They provide one major pathway into the study of Benford's Law, an intriguing, multi-faceted phenomenon that attracts interest from a wide range of disciplines; see, e.g., [4] for an introduction, and [25] for a panorama of recent developments. Specifically, denoting by β_b the Borel probability measure with

    β_b([1, s]) = log s / log b   ∀ s ∈ [1, b[,

note that µ is b-Benford if and only if µ ∘ S_b^{−1} = β_b. Historically, the case of decimal (i.e., base-10) significands has been the most prominent, with early empirical studies on the distribution of decimal significands (or significant digits) going back to Newcomb [27] and Benford [2]. If µ is 10-Benford, note that in particular

    µ({x ∈ R : leading decimal digit of x = D}) = log(1 + D^{−1}) / log 10   ∀ D = 1, …, 9.   (1.2)

For theoretical as well as practical reasons, mathematical objects such as random variables or sequences, but also concrete, finite numerical data sets that conform, at least approximately, to (1.1) or (1.2) have attracted much interest [11, 23, 34, 35]. Time and again, Benford's Law has emerged as a perplexingly prevalent phenomenon.
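The definitions above are easy to experiment with; a minimal Python sketch (the helper names are ours, not the paper's):

```python
import math

def significand(x, b=10):
    """Base-b significand S_b(x): the unique s in [1, b) with |x| = s * b**k."""
    if x == 0:
        return 0.0
    k = math.floor(math.log(abs(x)) / math.log(b))
    s = abs(x) / b**k
    # guard against floating-point rounding at exact powers of b
    if s < 1:
        s, k = s * b, k - 1
    elif s >= b:
        s, k = s / b, k + 1
    return s

# first-digit probabilities of the decimal law (1.2): P(D) = log(1 + 1/D)/log 10
first_digit_prob = {D: math.log1p(1 / D) / math.log(10) for D in range(1, 10)}
```

For instance, significand(0.00345) returns 3.45, and the nine probabilities in (1.2) sum to 1, with first_digit_prob[1] = log 2/log 10 ≈ 0.301.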
One popular approach to understanding this prevalence seeks to establish (mild) conditions on a probability measure that make (1.1) or (1.2) hold with good accuracy, perhaps even exactly [7, 13, 14, 15, 29]. It is the goal of the present article to provide precise quantitative information for this approach. Concretely, notice that while a finitely supported probability measure, such as, e.g., the empirical measure associated with a finite data set [5], may conform to the first-digit law (1.2), it cannot possibly satisfy (1.1) exactly. For such measures, therefore, it is natural to quantify, as accurately as possible, the failure of equality in (1.1), that is, the discrepancy between µ ∘ S_b^{−1} and β_b. Utilizing three different familiar metrics d∗ on probabilities (the Lévy, Kantorovich, and Kolmogorov metrics; see Section 2 for details), the article does this in a systematic way: For every n ∈ N, the value of min_ν d∗(β_b, ν) is identified, where ν is assumed to be supported on no more than n atoms (and may be subject to further restrictions such as, e.g., having only atoms of equal weight, as in the case of empirical measures); the minimizers of d∗(β_b, ν) are also characterized explicitly. The scope of the results presented herein, however, extends far beyond Benford probabilities. In fact, a general theory of best (constrained or unconstrained) d∗-approximations is developed. As far as the authors can tell, no such theories exist for the Lévy and Kolmogorov metrics, whereas in the case of the Kantorovich metric it (mostly) suffices to rephrase pertinent known facts [17, 36]. Once the general results are established, the desired quantitative insights for Benford probabilities are but straightforward corollaries.
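As a concrete illustration of this discrepancy, the uniform (Kolmogorov) distance sup_x |F_µ(x) − F_ν(x)|, introduced formally in Section 2, between the empirical significand distribution of a finite data set and β_10 can be evaluated exactly at the jump points of the empirical distribution function. A minimal sketch (the function names are ours):

```python
import math

def significand(x, b=10):
    """Base-b significand of a nonzero real number x."""
    k = math.floor(math.log(abs(x)) / math.log(b))
    s = abs(x) / b**k
    if s < 1:
        s *= b
    elif s >= b:
        s /= b
    return s

def d_K_to_benford(data, b=10):
    """Kolmogorov distance sup_x |F_emp(x) - F_beta(x)| between the empirical
    distribution of base-b significands of a nonzero data set and beta_b,
    whose distribution function is F_beta(s) = log(s)/log(b) on [1, b).

    Against a continuous F_beta, the supremum over a step function is
    attained (in the limit) at the jump points, so it is computed exactly."""
    s = sorted(significand(x, b) for x in data)
    n = len(s)
    F = [math.log(v) / math.log(b) for v in s]
    return max(max(abs(i / n - F[i - 1]), abs((i - 1) / n - F[i - 1]))
               for i in range(1, n + 1))

# powers of 2 form a classical (nearly) 10-Benford sequence
print(d_K_to_benford([2.0**m for m in range(1, 201)]))  # small, but positive
```

A single atom, by contrast, is as far from β_10 as possible: d_K_to_benford([1.0]) equals 1.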
(Even in the context of the Kantorovich distance, the study of β_b yields a rare new, explicit example of an optimal quantizer [17].) In particular, it turns out that, under all the various constraints considered here, the limit Q∗ = lim_{n→∞} n min_ν d∗(β_b, ν) always exists, is finite and positive, and can be computed more or less explicitly. This greatly extends earlier results, notably of [5], and also suggests that n^{−1} Q∗ may be an appropriate quantity against which to evaluate the many heuristic claims of closeness to Benford's Law for empirical data sets found in the literature [3, 25, 26]. The main results in this article, then, are existence proofs and characterizations for the minimizers of d∗(µ, ν) for arbitrary (compactly supported) probability measures µ, as provided by Theorems 3.5, 3.6, 4.1, 5.1, and 5.4 (where additional constraints are imposed on the sizes or locations of the atoms of ν), as well as by Theorems 3.12 and 5.6 (where such constraints are absent). As suggested by the title, this work aims primarily at a precise analysis of conformance to Benford's Law (or the lack thereof). Correspondingly, much attention is paid to the special case of µ = β_b, leading to explicit descriptions of best (constrained or unconstrained) approximations of the latter (Corollaries 3.14, 4.4, and 5.9) and the exact asymptotics of d∗(β_b, ν). As indicated earlier, however, the main results are much more general. To emphasize this fact, two other simple but illustrative examples of µ are repeatedly considered as well (though in less detail than β_b), namely the familiar Beta(2, 1) distribution and the (perhaps less familiar) inverse Cantor distribution. It turns out that while the former is absolutely continuous (w.r.t. Lebesgue measure) and its best approximations behave like those of β_b in most respects (Examples 3.9, 3.15, 4.9, and 5.10), the latter is discrete and the behaviour of its best approximations is more delicate (Examples 3.10, 3.16, 4.10, and 5.11). Even with only a few details mentioned, these examples will help the reader appreciate the versatility of the main results.

The organization of this article is as follows: Section 2 reviews relevant basic properties of one-dimensional probabilities and the three main probability metrics used throughout. Each of the Sections 3 to 5 then is devoted specifically to one single metric. In each section, the problem of best (constrained or unconstrained) approximation by finitely supported probability measures is first addressed in complete generality, and then the results are specialized to β_b as well as other concrete examples. Section 6 summarizes and discusses the quantitative results obtained, and also mentions a few natural questions for subsequent studies.

Throughout, let I ⊂ R be a compact interval with Lebesgue measure λ(I) >
0, and P the set of all Borel probability measures on I. Associate with every µ ∈ P its distribution function F_µ : R → R, given by F_µ(x) = µ({y ∈ I : y ≤ x}) ∀ x ∈ R, as well as its (upper) quantile function F_µ^{−1} : [0, 1[ → R, given by

    F_µ^{−1}(x) = min I                       if 0 ≤ x < µ({min I}),
                  sup{y ∈ I : F_µ(y) ≤ x}    if µ({min I}) ≤ x < 1.   (2.1)

Note that F_µ and F_µ^{−1} both are non-decreasing, right-continuous, and bounded. The support of µ, denoted supp µ, is the smallest closed subset of I with µ-measure 1. Endowed with the weak topology, the space P is compact and metrizable.

Three important different metrics on P are discussed in detail in this article; for a panorama of other metrics the reader is referred, e.g., to [16, 32] and the references therein. Given probabilities µ, ν ∈ P, their Lévy distance is

    d_L(µ, ν) = ω inf{y ≥ 0 : F_µ(· − y) − y ≤ F_ν ≤ F_µ(· + y) + y},   (2.2)

with ω = max{1, λ(I)}/λ(I); their L^r-Kantorovich (or transport) distance, with r ≥ 1, is

    d_r(µ, ν) = λ(I)^{−1} (∫_0^1 |F_µ^{−1}(y) − F_ν^{−1}(y)|^r dy)^{1/r} = λ(I)^{−1} ‖F_µ^{−1} − F_ν^{−1}‖_r;   (2.3)

and their Kolmogorov (or uniform) distance is

    d_K(µ, ν) = sup_{x∈R} |F_µ(x) − F_ν(x)| = ‖F_µ − F_ν‖_∞.

Henceforth, the symbol d∗ summarily refers to any of d_L, d_r, and d_K. The (unusual) normalizing factors in (2.2) and (2.3) guarantee that all three metrics are comparable numerically in that sup_{µ,ν∈P} d∗(µ, ν) = 1 in either case. Note that

    d_1(µ, ν) = λ(I)^{−1} ∫_I |F_µ(x) − F_ν(x)| dx   ∀ µ, ν ∈ P,

by virtue of Fubini's Theorem. The metrics d_L and d_r are equivalent: They both metrize the weak topology on P, and hence are separable and complete. By contrast, the complete metric d_K induces a finer topology and is non-separable. However, when restricted to P_cts := {µ ∈ P : µ({x}) = 0 ∀ x ∈ I}, a dense G_δ-set in P, the metric d_K does metrize the weak topology on P_cts and is separable. The values of d_L, d_r, and d_K are not completely unrelated since, as is easily checked,

    d_1 ≤ ((2 + λ(I))/(ωλ(I))) d_L,   d_r ≤ d_s (if r ≤ s),   d_1 ≤ d_K,   d_L ≤ ω d_K,   (2.4)

and all bounds in (2.4) are best possible. Beyond (2.4), however, no relative bounds exist between d_L, d_r, and d_K in general: If ∗ ≠ 1, ∗ ≠ ◦, and (∗, ◦) ∉ {(L, K)} ∪ {(r, s) : r ≤ s}, then

    sup_{µ,ν∈P, µ≠ν} d∗(µ, ν)/d◦(µ, ν) = +∞.

Each metric d∗, therefore, captures a different aspect of P and deserves to be studied independently. To illustrate this further, let I = [0, 1], µ = δ_0 ∈ P, and µ_k = (1 − k^{−1})δ_0 + k^{−1}δ_{k^{−1}} for k ∈ N; here and throughout, δ_a denotes the Dirac (probability) measure concentrated at a ∈ R. Then lim_{k→∞} d∗(µ, µ_k) = 0, but the rate of convergence differs between metrics:

    d_L(µ, µ_k) = k^{−1} = d_K(µ, µ_k),   d_r(µ, µ_k) = k^{−1−1/r}   ∀ k ∈ N.

The goal of this article is first to identify, for each metric d∗ introduced earlier, the best finitely supported d∗-approximation(s) of any given µ ∈ P. The general results are then applied to Benford's Law, as well as other concrete examples. Specifically, if µ = β_b for some b > 1, then I = [1, b]. The following unified notation and terminology is used throughout: For every n ∈ N, let Ξ_n = {x ∈ I^n : x_{n,1} ≤ … ≤ x_{n,n}} and Π_n = {p ∈ R^n : p_{n,j} ≥ 0, Σ_{j=1}^n p_{n,j} = 1}, and for each x ∈ Ξ_n and p ∈ Π_n define δ^p_x = Σ_{j=1}^n p_{n,j} δ_{x_{n,j}}. For convenience, x_{n,0} := −∞ and x_{n,n+1} := +∞ for every x ∈ Ξ_n, as well as P_{n,i} = Σ_{j=1}^i p_{n,j} for i = 0, …, n and p ∈ Π_n; note that P_{n,0} = 0 and P_{n,n} = 1. Henceforth, usage of the symbol δ^p_x tacitly assumes that x ∈ Ξ_n and p ∈ Π_n, for some n ∈ N either specified explicitly or else clear from the context. Call

    δ^p_x a best d∗-approximation of µ ∈ P, given x ∈ Ξ_n, if d∗(µ, δ^p_x) ≤ d∗(µ, δ^q_x) ∀ q ∈ Π_n;

    δ^p_x a best d∗-approximation of µ, given p ∈ Π_n, if d∗(µ, δ^p_x) ≤ d∗(µ, δ^p_y) ∀ y ∈ Ξ_n.

Denote by δ^•_x and δ^p_• any best d∗-approximation of µ, given x and p, respectively. Best d∗-approximations, given p = u_n = (n^{−1}, …, n^{−1}), are referred to as best uniform d∗-approximations, and denoted δ^{u_n}_•.
Finally, δ^p_x is a best d∗-approximation of µ ∈ P, denoted δ^{•,n}_•, if d∗(µ, δ^p_x) ≤ d∗(µ, δ^q_y) ∀ y ∈ Ξ_n, q ∈ Π_n. Notice that usage of the symbols δ^•_x, δ^p_•, and δ^{•,n}_• always refers to a specific metric d∗ and probability measure µ ∈ P, both usually clear from the context.

Information theory sometimes refers to d∗(µ, δ^{•,n}_•) as the n-th quantization error, and to lim_{n→∞} n d∗(µ, δ^{•,n}_•), if it exists, as the quantization coefficient of µ; see, e.g., [17]. By analogy, d∗(µ, δ^{u_n}_•) and lim_{n→∞} n d∗(µ, δ^{u_n}_•), if it exists, may be called the n-th uniform quantization error and the uniform quantization coefficient, respectively.

This section identifies best finitely supported d_L-approximations (constrained or unconstrained) of a given µ ∈ P. To do this in a transparent way, it is helpful to first consider more generally a few elementary properties of non-decreasing functions. These properties are subsequently specialized to either F_µ or F_µ^{−1}. Throughout, let f : R → R be non-decreasing, and define f(±∞) = lim_{x→±∞} f(x) ∈ R̄, where R̄ = R ∪ {−∞, +∞} denotes the extended real line with the usual order and topology. Associate with f two non-decreasing functions f_± : R → R̄, defined as f_±(x) = lim_{ε↓0} f(x ± ε). Clearly, f_− is left-continuous whereas f_+ is right-continuous, with f_±(−∞) = f(−∞), f_±(+∞) = f(+∞), as well as f_− ≤ f ≤ f_+, and f_+(x) ≤ f_−(y) whenever x < y; in particular, f_−(x) = f_+(x) if and only if f is continuous at x. The (upper) inverse function f^{−1} : R̄ → R̄ is given by

    f^{−1}(t) = sup{x ∈ R : f(x) ≤ t}   ∀ t ∈ R̄;

by convention, sup ∅ := −∞ (and inf ∅ := +∞). Note that (2.1) is consistent with this notation. For what follows, it is useful to recall a few basic properties of inverse functions; see, e.g., [36, Sec. 3] for details.

Proposition 3.1.
Let f : R → R be non-decreasing. Then f^{−1} is non-decreasing and right-continuous. Also, (f_±)^{−1} = f^{−1}, and (f^{−1})^{−1} = f_+.

Given two non-decreasing functions f, g : R → R, by a slight abuse of notation, and inspired by (2.2), let

    d_L(f, g) = inf{y ≥ 0 : f(· − y) − y ≤ g ≤ f(· + y) + y} ∈ [0, +∞].

For instance, d_L(µ, ν) = ω d_L(F_µ, F_ν) for all µ, ν ∈ P. It is readily checked that d_L is symmetric, satisfies the triangle inequality, and d_L(f, g) > 0 unless f_− = g_−, or equivalently, f_+ = g_+. Crucially, the quantity d_L is invariant under inversion.

Proposition 3.2. Let f, g : R → R be non-decreasing. Then d_L(f^{−1}, g^{−1}) = d_L(f, g).

Thus, for instance, d_L(µ, ν) = ω d_L(F_µ^{−1}, F_ν^{−1}) for all µ, ν ∈ P. In general, the value of d_L(f, g) may equal +∞. However, if the set {f ≠ g} := {x ∈ R : f(x) ≠ g(x)} is bounded then d_L(f, g) < +∞. Specifically, notice that {F_µ ≠ F_ν} ⊂ I and {F_µ^{−1} ≠ F_ν^{−1}} ⊂ [0, 1[ both are bounded for all µ, ν ∈ P.

Given a non-decreasing function f : R → R, let I ⊂ R be any interval with the property that

    f_−(sup I), −f_+(inf I) < +∞,   (3.1)

and define an auxiliary function ℓ_{f,I} : R → R as

    ℓ_{f,I}(x) = inf{y ≥ 0 : f_−(sup I − y) − y ≤ x ≤ f_+(inf I + y) + y}.

Note that for each x ∈ R, the set on the right equals [a, +∞[ with the appropriate a ≥ 0, and ℓ_{f,I}(x) = a. Clearly, ℓ_{f,J} ≤ ℓ_{f,I} whenever J ⊂ I. Also, for every a ∈ R, the function ℓ_{f,{a}} is non-increasing on ]−∞, f_−(a)], vanishes on [f_−(a), f_+(a)], and is non-decreasing on [f_+(a), +∞[. A few elementary properties of ℓ_{f,I} are straightforward to check; they are used below to establish the main results of this section.

Proposition 3.3.
Let f : R → R be non-decreasing, and I ⊂ R an interval satisfying (3.1). Then ℓ_{f,I} is Lipschitz continuous, and

    0 ≤ ℓ_{f,I}(x) ≤ |x| + max{0, f_−(sup I), −f_+(inf I)}   ∀ x ∈ R.

Moreover, ℓ_{f,I} attains a minimal value

    ℓ*_{f,I} := min_{x∈R} ℓ_{f,I}(x) = min{y ≥ 0 : f_−(sup I − y) − y ≤ f_+(inf I + y) + y} ≥ 0,

which is positive unless f_−(sup I) ≤ f_+(inf I).

For µ ∈ P, note that (3.1) automatically holds if f = F_µ, or if f = F_µ^{−1} and I ⊂ [0, 1]. In either case, ℓ_{f,I} has the properties stated in Proposition 3.3, and ℓ*_{f,I} ≤ 1/2. When formulating the main results, the following quantities are useful: Given µ ∈ P, n ∈ N, and x ∈ Ξ_n, let

    L•(x) = max{ℓ_{F_µ,[−∞,x_{n,1}]}(0), ℓ*_{F_µ,[x_{n,1},x_{n,2}]}, …, ℓ*_{F_µ,[x_{n,n−1},x_{n,n}]}, ℓ_{F_µ,[x_{n,n},+∞]}(1)};

similarly, given p ∈ Π_n, let

    L•(p) = max_{j=1}^n ℓ*_{F_µ^{−1},[P_{n,j−1},P_{n,j}]}.

To illustrate these quantities for a concrete example, consider µ = β_b, where ℓ*_{F_µ,[x_{n,j},x_{n,j+1}]} is the unique solution of

    b^{2ℓ} = (x_{n,j+1} − ℓ)/(x_{n,j} + ℓ),   j = 1, …, n − 1,

whereas ℓ_{F_µ,[−∞,x_{n,1}]}(0) and ℓ_{F_µ,[x_{n,n},+∞]}(1) solve b^ℓ = x_{n,1} − ℓ and b^ℓ = b/(x_{n,n} + ℓ), respectively. (Recall that 1 ≤ x_{n,1} ≤ … ≤ x_{n,n} ≤ b.) Similarly, ℓ*_{F_µ^{−1},[P_{n,j−1},P_{n,j}]} is the unique solution of

    2ℓ = b^{P_{n,j} − ℓ} − b^{P_{n,j−1} + ℓ},   j = 1, …, n;

in particular, j ↦ ℓ*_{F_µ^{−1},[(j−1)/n, j/n]} is increasing, and hence L•(u_n) is the unique solution of

    2L = b^{1−L} − b^{1−1/n+L}.   (3.2)

By using functions of the form ℓ_{f,I}, the value of d_L(µ, ν) can easily be computed whenever ν has finite support.

Lemma 3.4.
Let µ ∈ P and n ∈ N. For every x ∈ Ξ_n and p ∈ Π_n,

    d_L(µ, δ^p_x) = ω max_{j=0}^n ℓ_{F_µ,[x_{n,j},x_{n,j+1}]}(P_{n,j}) = ω max_{j=1}^n ℓ_{F_µ^{−1},[P_{n,j−1},P_{n,j}]}(x_{n,j}).   (3.3)

Proof. Label x ∈ Ξ_n uniquely as x_{n,j_0+1} = … = x_{n,j_1} < x_{n,j_1+1} = … = x_{n,j_2} < … < x_{n,j_{m−1}+1} = … = x_{n,j_m}, with integers j_{i−1} < j_i ≤ n for 1 ≤ i ≤ m, and j_0 = 0, j_m = n, and define y ∈ Ξ_m and q ∈ Π_m as y_{m,i} = x_{n,j_i} and q_{m,i} = P_{n,j_i} − P_{n,j_{i−1}}, respectively, for i = 1, …, m. For convenience, let I_j = [x_{n,j}, x_{n,j+1}] for j = 0, …, n, and J_i = [y_{m,i}, y_{m,i+1}] = I_{j_i} for i = 0, …, m. With this, δ^q_y = δ^p_x, and

    ω^{−1} d_L(µ, δ^p_x) = d_L(F_µ, F_{δ^q_y})
    = inf{t ≥ 0 : (F_µ)_−(y_{m,i+1} − t) − t ≤ Q_{m,i} ≤ F_µ(y_{m,i} + t) + t ∀ i = 0, …, m}
    = max_{i=0}^m ℓ_{F_µ,J_i}(Q_{m,i}) ≤ max_{j=0}^n ℓ_{F_µ,I_j}(P_{n,j}).

To prove the reverse inequality, pick any j = 0, …, n. If x_{n,j} < x_{n,j+1} then I_j = J_i and P_{n,j} = Q_{m,i}, with the appropriate i, and hence ℓ_{F_µ,I_j}(P_{n,j}) = ℓ_{F_µ,J_i}(Q_{m,i}). If x_{n,j} = x_{n,j+1} then I_j = {y_{m,i}} for some i. In this case, either P_{n,j} < (F_µ)_−(y_{m,i}) and Q_{m,i−1} ≤ P_{n,j}, and hence

    ℓ_{F_µ,I_j}(P_{n,j}) = ℓ_{F_µ,{y_{m,i}}}(P_{n,j}) ≤ ℓ_{F_µ,{y_{m,i}}}(Q_{m,i−1}) ≤ ℓ_{F_µ,J_{i−1}}(Q_{m,i−1});

or (F_µ)_−(y_{m,i}) ≤ P_{n,j} ≤ F_µ(y_{m,i}), and hence ℓ_{F_µ,I_j}(P_{n,j}) = ℓ_{F_µ,{y_{m,i}}}(P_{n,j}) = 0; or P_{n,j} > F_µ(y_{m,i}) and Q_{m,i} ≥ P_{n,j}, and hence

    ℓ_{F_µ,I_j}(P_{n,j}) = ℓ_{F_µ,{y_{m,i}}}(P_{n,j}) ≤ ℓ_{F_µ,{y_{m,i}}}(Q_{m,i}) ≤ ℓ_{F_µ,J_i}(Q_{m,i}).

In all three cases, therefore, ω^{−1} d_L(µ, δ^p_x) ≥ max_{j=0}^n ℓ_{F_µ,I_j}(P_{n,j}), which establishes the first equality in (3.3). The second equality, a consequence of Proposition 3.2, is proved analogously.

Utilizing Lemma 3.4, it is straightforward to characterize the best finitely supported d_L-approximations of µ ∈ P with prescribed locations.

Theorem 3.5.
Let µ ∈ P and n ∈ N. For every x ∈ Ξ_n, there exists a best d_L-approximation of µ, given x. Moreover, d_L(µ, δ^p_x) = d_L(µ, δ^•_x) if and only if, for every j = 0, …, n,

    x_{n,j} < x_{n,j+1} ⟹ ℓ_{F_µ,[x_{n,j},x_{n,j+1}]}(P_{n,j}) ≤ L•(x),   (3.4)

and in this case d_L(µ, δ^p_x) = ω L•(x).

Proof. Fix µ ∈ P, n ∈ N, and x ∈ Ξ_n. As in the proof of Lemma 3.4, write I_j = [x_{n,j}, x_{n,j+1}] for convenience. By (3.3), for every p ∈ Π_n,

    d_L(µ, δ^p_x) = ω max_{j=0}^n ℓ_{F_µ,I_j}(P_{n,j}) ≥ ω max{ℓ_{F_µ,I_0}(0), ℓ*_{F_µ,I_1}, …, ℓ*_{F_µ,I_{n−1}}, ℓ_{F_µ,I_n}(1)} = ω L•(x).

As seen in the proof of Lemma 3.4, validity of (3.4) implies ℓ_{F_µ,[x_{n,j},x_{n,j+1}]}(P_{n,j}) ≤ L•(x) for all j = 0, …, n. Thus δ^p_x is a best d_L-approximation of µ, given x, whenever (3.4) holds, i.e., the latter is sufficient for optimality. On the other hand, consider q ∈ Π_n with

    Q_{n,j} = (1/2)((F_µ)_−(x_{n,j+1} − L•(x)) + F_µ(x_{n,j} + L•(x)))   ∀ j = 1, …, n − 1.

Note that q is well-defined, since j ↦ Q_{n,j} is non-decreasing, and 0 ≤ Q_{n,j} ≤ 1 for j = 1, …, n − 1. Moreover, by the definition of L•(x), ℓ_{F_µ,I_j}(Q_{n,j}) ≤ L•(x) ∀ j = 0, …, n, and hence d_L(δ^q_x, µ) = ω L•(x). This shows that best d_L-approximations of µ, given x, do exist, and (3.4) also is necessary for optimality.

Best finitely supported d_L-approximations of any µ ∈ P with prescribed weights can be characterized in a similar manner. By virtue of (3.3), the proof of the following is completely analogous to the proof of Theorem 3.5 above.

Proposition 3.6.
Let µ ∈ P and n ∈ N. For every p ∈ Π_n, there exists a best d_L-approximation of µ, given p. Moreover, d_L(µ, δ^p_x) = d_L(µ, δ^p_•) if and only if, for every j = 1, …, n,

    P_{n,j−1} < P_{n,j} ⟹ ℓ_{F_µ^{−1},[P_{n,j−1},P_{n,j}]}(x_{n,j}) ≤ L•(p),   (3.5)

and in this case d_L(µ, δ^p_x) = ω L•(p).

Remark 3.7. (i) With f, I as in Proposition 3.3, for every a ∈ R the set {ℓ_{f,I} ≤ a} is a (possibly empty or one-point) interval. Thus, conditions (3.4) and (3.5) are very similar in spirit to the requirements of [36, Thm. 5.1 and 5.5], restated in Proposition 4.1 below, though the latter may be quite a bit easier to work with in concrete calculations.

(ii) Note that if n = 1 then (3.4) holds automatically, whereas (3.5) shows that d_L(µ, δ_a) is minimal precisely if the function ℓ_{F_µ^{−1},[0,1]} attains its minimal value at a.

As a corollary, Proposition 3.6 identifies all best uniform d_L-approximations of β_b with b > 1. Recall that I = [1, b], and hence ω = (max{b, 2} − 1)/(b − 1) =: ω_b in this case.

Corollary 3.8.
Let b > 1 and n ∈ N. Then δ^{u_n}_x is a best uniform d_L-approximation of β_b if and only if

    b^{j/n − L} − L ≤ x_{n,j} ≤ b^{(j−1)/n + L} + L   ∀ j = 1, …, n,

where L is the unique solution of (3.2); in particular, #supp δ^{u_n}_• = n. Moreover, d_L(β_b, δ^{u_n}_•) = ω_b L, and

    lim_{n→∞} n d_L(β_b, δ^{u_n}_•) = ((max{b, 2} − 1)/(b − 1)) · (b log b)/(2 + 2b log b).

Example 3.9. Consider the Beta(2, 1) distribution on I = [0, 1], i.e., µ ∈ P with F_µ(x) = x² for all x ∈ I. Given n ∈ N, it is straightforward to check that, analogously to (3.2), L•(u_n) is the unique solution of

    √(L(n^{−1} − L)) = 1/(2n) − 2L²,   (3.6)

or equivalently of 2L = √(n^{−1} − L) − √L, and δ^{u_n}_x with x ∈ Ξ_n is a best uniform d_L-approximation of µ if and only if

    √(j/n − L) − L ≤ x_{n,j} ≤ √((j−1)/n + L) + L   ∀ j = 1, …, n.

Moreover, d_L(µ, δ^{u_n}_•) = L, and (3.6) yields that lim_{n→∞} n d_L(µ, δ^{u_n}_•) = 1/2. Unlike in the case of β_b, it is possible to have #supp δ^{u_n}_• < n whenever n ≥ 2.

Example 3.10.
Let again I = [0, 1] and consider µ ∈ P with µ({i 2^{−m}}) = 3^{−m} for every m ∈ N and every odd integer i with 1 ≤ i < 2^m. Thus µ is a discrete measure with supp µ = I. In fact, µ simply is the inverse Cantor distribution, in the sense that F_µ^{−1}(x) = F_ν(x) for all x ∈ I, where ν is the log 2/log 3-dimensional Hausdorff measure on the classical Cantor middle-thirds set. Given n ∈ N, Proposition 3.6 guarantees the existence of a best uniform d_L-approximation of µ, though the explicit value of L•(u_n) is somewhat cumbersome to determine. Still, utilizing the self-similarity of F_µ^{−1}, one finds that

    0 < lim inf_{n→∞} n d_L(µ, δ^{u_n}_•) < lim sup_{n→∞} n d_L(µ, δ^{u_n}_•) = 1/2.   (3.7)

Thus (n^{−1}) is the precise rate of decay of (d_L(µ, δ^{u_n}_•)), just as in the case of β_b and Beta(2, 1), even though lim_{n→∞} n d_L(µ, δ^{u_n}_•) does not exist.

By combining Theorem 3.5 and Proposition 3.6, it is possible to characterize the best d_L-approximations of µ ∈ P as well, that is, to identify the minimizers of ν ↦ d_L(µ, ν) subject only to the requirement that #supp ν ≤ n. To this end, associate with every non-decreasing function f : R → R and every number a ≥ 0 a function T_{f,a} : R → R̄, according to

    T_{f,a}(x) = f_+(f^{−1}(x + a) + 2a) + a   ∀ x ∈ R.

For every n ∈ N, denote by T^{[n]}_{f,a} the n-fold composition of T_{f,a} with itself. The following properties of T_{f,a} are readily verified.

Proposition 3.11.
Let f : R → R be non-decreasing, a ≥ 0, and n ∈ N. Then T^{[n]}_{f,a} is non-decreasing and right-continuous. Also, a ↦ T^{[n]}_{f,a}(x) is increasing and right-continuous for every x ∈ R, and if x ≤ a + f(+∞) then the sequence (T^{[k]}_{f,a}(x))_{k∈N} is non-decreasing.

To utilize Proposition 3.11 for the d_L-approximation problem, let f = F_µ with µ ∈ P. Then (T^{[k]}_{F_µ,a}(0))_{k∈N} is non-decreasing; in fact, lim_{k→∞} T^{[k]}_{F_µ,a}(0) = a + 1 whenever a > 0. On the other hand, given n ∈ N, clearly T^{[n]}_{F_µ,a}(0) ≥ a, and hence

    L^{•,n}_• := min{a ≥ 0 : T^{[n]}_{F_µ,a}(0) ≥ 1} < +∞.

Note that L^{•,n}_• only depends on µ and n. The sequence (L^{•,n}_•)_{n∈N} is non-increasing, and n L^{•,n}_• ≤ 1/2 for every n. Also, L^{•,n}_• = 0 if and only if #supp µ ≤ n. For a concrete example, consider µ = β_b with 0 ≤ a < (b − 1)/2. Then

    T_{F_µ,a}(x) = a                          if x < −a,
                   a + log_b(b^{x+a} + 2a)    if −a ≤ x < −a + log_b(b − 2a),
                   a + 1                      if x ≥ −a + log_b(b − 2a),

from which it is easily deduced that L^{•,n}_• is the unique solution of

    b^{2nL} = (2L + b(b^L − b^{−L}))/(2L + b^L − b^{−L}).   (3.8)

As the following result shows, the quantity L^{•,n}_• always plays a central role in identifying best (unconstrained) d_L-approximations of a given µ ∈ P.

Theorem 3.12.
Let µ ∈ P and n ∈ N. There exists a best d_L-approximation of µ, and d_L(µ, δ^{•,n}_•) = ω L^{•,n}_•. Moreover, for every x ∈ Ξ_n and p ∈ Π_n, the following are equivalent: (i) d_L(µ, δ^p_x) = d_L(µ, δ^{•,n}_•); (ii) all implications in (3.4) are valid with L•(x) replaced by L^{•,n}_•; (iii) all implications in (3.5) are valid with L•(p) replaced by L^{•,n}_•.

Proof. To see that best d_L-approximations of µ do exist, simply note that the set {ν ∈ P : #supp ν ≤ n} is compact, and the function ν ↦ d_L(µ, ν) is continuous, hence attains a minimal value for some ν = δ^p_x with x ∈ Ξ_n and p ∈ Π_n. Clearly, any such δ^p_x also is a best approximation of µ, given p. By Proposition 3.6, therefore, d_L(µ, δ^p_x) = ω L•(p), as well as

    (F_µ^{−1})_−(P_{n,j} − L•(p)) − L•(p) ≤ x_{n,j} ≤ F_µ^{−1}(P_{n,j−1} + L•(p)) + L•(p)

whenever P_{n,j−1} < P_{n,j}, and indeed for every j = 1, …, n. It follows that P_{n,j} ≤ T_{F_µ,L•(p)}(P_{n,j−1}) for all j, and hence 1 = P_{n,n} ≤ T^{[n]}_{F_µ,L•(p)}(0), that is, L^{•,n}_• ≤ L•(p). This shows that d_L(µ, δ^p_x) ≥ ω L^{•,n}_•. To establish the reverse inequality, let m = min{i ≥ 1 : T^{[i]}_{F_µ,L^{•,n}_•}(0) ≥ 1}. Clearly, 1 ≤ m ≤ n, and L^{•,m}_• = L^{•,n}_•. Define q ∈ Π_m via

    Q_{m,i} = T^{[i]}_{F_µ,L^{•,n}_•}(0)   ∀ i = 1, …, m − 1.

Note that i ↦ Q_{m,i} is non-decreasing, and 0 ≤ Q_{m,i} ≤ 1, so q is well-defined. Also, consider y ∈ Ξ_m with

    y_{m,i} = (1/2)((F_µ^{−1})_−(Q_{m,i} − L^{•,m}_•) + F_µ^{−1}(Q_{m,i−1} + L^{•,m}_•))   ∀ i = 1, …, m.

By the definitions of L^{•,m}_•, q, and y, ℓ_{F_µ^{−1},[Q_{m,i−1},Q_{m,i}]}(y_{m,i}) ≤ L^{•,m}_• ∀ i = 1, …, m, and hence

    d_L(µ, δ^p_x) ≤ d_L(µ, δ^q_y) = ω max_{i=1}^m ℓ_{F_µ^{−1},[Q_{m,i−1},Q_{m,i}]}(y_{m,i}) ≤ ω L^{•,m}_• = ω L^{•,n}_•.

This shows that indeed d_L(µ, δ^p_x) = ω L^{•,n}_• and also proves (i) ⇒ (iii). The implication (i) ⇒ (ii) follows by a similar argument. That, conversely, either of (ii) and (iii) implies (i) is evident from (3.3), together with the fact that, as seen in the proof of Lemma 3.4 above, validity of (3.4) and (3.5) implies max_{j=0}^n ℓ_{F_µ,[x_{n,j},x_{n,j+1}]}(P_{n,j}) ≤ L•(x) and max_{j=1}^n ℓ_{F_µ^{−1},[P_{n,j−1},P_{n,j}]}(x_{n,j}) ≤ L•(p), respectively.

Remark 3.13. (i) The above proof of Theorem 3.12 shows that in fact L^{•,n}_• = min_{x∈Ξ_n} L•(x) = min_{p∈Π_n} L•(p).

(ii) Theorem 3.12 is similar to classical one-dimensional quantization results as presented, e.g., in [17, Sec. 5.2]. What makes the theorem (and its analogue, Theorem 5.6 in Section 5) particularly appealing is that its conditions (ii) and (iii) not only are necessary for optimality, but also sufficient. By contrast, it is well known that sufficient conditions for best d∗-approximations may be hard to come by in general; see, e.g., [17, Sec. 4.1], and also Proposition 4.1(iii) below, regarding the case of ∗ = 1.

When specialized to µ = β_b, Theorem 3.12 yields the best finitely supported d_L-approximations of Benford's Law.

Corollary 3.14.
Let b > 1 and n ∈ N. Then the best d_L-approximation of β_b is δ^p_x, with

    x_{n,j} = b^{(2j−1)L} + 2L(b^{2jL} − 1)/(b^{2L} − 1) − L = b^{P_{n,j} − L} − L,
    P_{n,j} = (1/log b) log(b^{(2j−1)L} + 2L(b^{2jL} − 1)/(b^{2L} − 1)) + L = log(x_{n,j} + L)/log b + L,

for all j = 1, …, n, where L is the unique solution of (3.8); in particular, #supp δ^{•,n}_• = n. Moreover, d_L(β_b, δ^{•,n}_•) = ω_b L, and

    lim_{n→∞} n d_L(β_b, δ^{•,n}_•) = ((max{b, 2} − 1)/(b − 1)) · (log(1 + b log b) − log(1 + log b))/(2 log b).

To compare this to Corollary 3.8, note that P_{n,j} ≠ j/n whenever n ≥ 2, and then the n-th quantization error d_L(β_b, δ^{•,n}_•) is smaller than the n-th uniform quantization error d_L(β_b, δ^{u_n}_•). The d_L-quantization coefficient of β_b also is smaller than its uniform counterpart, since

    (log(1 + b log b) − log(1 + log b))/(2 log b) < (b log b)/(2 + 2b log b)   ∀ b > 1.

Example 3.15.
For µ = Beta(2, 1), Theorem 3.12 again yields a (unique) best d_L-approximation. Although the equation determining L^{•,n}_• is less transparent than (3.8), it can be shown that

    lim_{n→∞} n d_L(µ, δ^{•,n}_•) = (2 − log 3)/4 < 1/2.

Figure 1: For b = 10 and n = 3, the best d_L-approximation (solid red line) of β_10 is unique, whereas best uniform d_L-approximations (broken red lines) are not; see Corollaries 3.14 and 3.8, respectively.

Example 3.16.
For the inverse Cantor distribution, a best d_L-approximation exists by Theorem 3.12, and utilizing the self-similarity of F_µ^{−1}, it is possible to derive estimates showing that

    0 < inf_{n∈N} n^{log 3/log 2} d_L(µ, δ^{•,n}_•) ≤ sup_{n∈N} n^{log 3/log 2} d_L(µ, δ^{•,n}_•) < +∞,   (3.9)

which shows that (d_L(µ, δ^{•,n}_•)) decays like (n^{−log 3/log 2}), and hence faster than in the case of β_b and Beta(2, 1).

This section studies best finitely supported d_r-approximations of Benford's Law. Mostly, the results are special cases of more general facts taken from the authors' comprehensive study of d_r-approximations [36].

d_1-approximations

With d_L replaced by d_1, the main results of the previous section have the following analogues, stated here for the reader's convenience; see [36, Sec. 5] for details.

Proposition 4.1.
Let µ ∈ P and n ∈ N.

(i) For every x ∈ Ξ_n, there exists a best d_1-approximation of µ, given x. Moreover, d_1(µ, δ^p_x) = d_1(µ, δ^•_x) if and only if, for every j = 0, …, n,

    x_{n,j} < x_{n,j+1} ⟹ (F_µ)_−((x_{n,j} + x_{n,j+1})/2) ≤ P_{n,j} ≤ F_µ((x_{n,j} + x_{n,j+1})/2).   (4.1)

(ii) For every p ∈ Π_n, there exists a best d_1-approximation of µ, given p. Moreover, d_1(µ, δ^p_x) = d_1(µ, δ^p_•) if and only if, for every j = 1, …, n,

    P_{n,j−1} < P_{n,j} ⟹ (F_µ^{−1})_−((P_{n,j−1} + P_{n,j})/2) ≤ x_{n,j} ≤ F_µ^{−1}((P_{n,j−1} + P_{n,j})/2).   (4.2)

(iii) There exists a best d_1-approximation of µ, and if d_1(µ, δ^p_x) = d_1(µ, δ^{•,n}_•) then (4.1) and (4.2) are valid for every j = 1, …, n.

Remark 4.2. Though the phrasing of Proposition 4.1 emphasizes its analogy to Theorem 3.5 (and also to Theorem 5.1 below), there nevertheless is a subtle difference: While in (3.4) and (5.1) it can equivalently be stipulated that, respectively, ℓ_{F_µ,[x_{n,j},x_{n,j+1}]}(P_{n,j}) ≤ L•(x) and (F_µ)_−(x_{n,j+1}) − K•(x) ≤ P_{n,j} ≤ F_µ(x_{n,j}) + K•(x) for all j = 0, …, n, simple examples show that the "only if" part of Proposition 4.1(i) may fail, should (4.1) be replaced by

    (F_µ)_−((x_{n,j} + x_{n,j+1})/2) ≤ P_{n,j} ≤ F_µ((x_{n,j} + x_{n,j+1})/2)   ∀ j = 0, …, n.

Similar observations pertain to Proposition 4.1(ii) vis-à-vis Proposition 3.6 and Theorem 5.4.

Proposition 4.1 immediately yields the existence of unique best uniform d_1-approximations of β_b; see also [5, Cor. 2.10].

Corollary 4.3.
Let $b > 1$ and $n \in \mathbb{N}$. Then the best uniform $d_1$-approximation of $\beta_b$ is $\delta^{u_n}_x$, with $x_j = b^{(2j-1)/(2n)}$ for all $j = 1, \ldots, n$, and $\#\mathrm{supp}\, \delta^{u_n}_\bullet = n$. Moreover,
$$d_1(\beta_b, \delta^{u_n}_\bullet) = \frac{1}{\log b} \tanh\Big(\frac{\log b}{4n}\Big),$$
and $\lim_{n\to\infty} n\, d_1(\beta_b, \delta^{u_n}_\bullet) = \tfrac14$.

Proof. By Proposition 4.1(ii), $x_j = b^{(2j-1)/(2n)}$ for all $j = 1, \ldots, n$, and
$$n\, d_1(\beta_b, \delta^{u_n}_\bullet) = \frac{n}{b-1} \sum_{j=1}^n \int_{(j-1)/n}^{j/n} \Big| b^y - b^{(2j-1)/(2n)} \Big|\, dy = \frac{n \big( b^{1/(4n)} - b^{-1/(4n)} \big)^2}{(b-1)\log b} \sum_{j=1}^n b^{(2j-1)/(2n)} = \frac{n}{\log b} \tanh\Big(\frac{\log b}{4n}\Big) \stackrel{n\to\infty}{\longrightarrow} \frac14. \qquad \Box$$

Best (unconstrained) $d_1$-approximations of $\beta_b$ exist and are unique, too, by virtue of Proposition 4.1 and a direct calculation.

Corollary 4.4.
Let $b > 1$ and $n \in \mathbb{N}$. Then the best $d_1$-approximation of $\beta_b$ is $\delta^p_x$, with
$$x_j = \Big(1 + \frac{j-1}{n}\big(\sqrt{b} - 1\big)\Big)\Big(1 + \frac{j}{n}\big(\sqrt{b} - 1\big)\Big), \qquad P_j = \frac{2}{\log b}\, \log\Big(1 + \frac{j}{n}\big(\sqrt{b} - 1\big)\Big),$$
for all $j = 1, \ldots, n$; in particular, $\#\mathrm{supp}\, \delta^{\bullet,n}_\bullet = n$. Moreover,
$$d_1(\beta_b, \delta^{\bullet,n}_\bullet) = \frac{1}{n \log b} \tanh\Big(\frac{\log b}{4}\Big).$$

Proof. Let $\delta^p_x$ be a best $d_1$-approximation. Then, by Proposition 4.1(iii), $2 b^{P_j} = x_j + x_{j+1}$ for all $j = 1, \ldots, n-1$, but also $x_j = b^{(P_{j-1} + P_j)/2}$ for all $j = 1, \ldots, n$, and hence $2 b^{P_j/2} = b^{P_{j-1}/2} + b^{P_{j+1}/2}$. Since $P_0 = 0$, $P_n = 1$, it follows that $b^{P_j/2} = 1 + j(\sqrt{b} - 1)/n$ for all $j = 0, \ldots, n$. This yields the asserted unique $\delta^p_x$, and
$$d_1(\beta_b, \delta^{\bullet,n}_\bullet) = \frac{1}{b-1} \sum_{j=1}^n \int_{P_{j-1}}^{P_j} |b^y - x_j|\, dy = \frac{b - x_n - (x_1 - 1)}{(b-1)\log b} = \frac{1}{n \log b} \tanh\Big(\frac{\log b}{4}\Big),$$
via a straightforward calculation. $\Box$

Figure 2: The best (solid blue line) and best uniform (broken blue line) $d_1$-approximations of $\beta_{10}$ ($b = 10$, $n = 3$; $d_1(\beta_{10}, \delta^{u_3}_\bullet) = 8.2 \cdot 10^{-2}$, $d_1(\beta_{10}, \delta^{\bullet,3}_\bullet) = 7.5 \cdot 10^{-2}$) both are unique; see Corollaries 4.4 and 4.3, respectively. Coincidentally, best uniform $d_1$-approximations of $\beta_{10}$ are best $d_K$-approximations as well; see Corollary 5.9.

Remark 4.5. (i) Due to the highly non-linear nature of the optimality conditions (4.1) and (4.2), best $d_1$-approximations are rarely given by explicit formulae such as those in Corollary 4.4. Aside from Benford's Law, the authors know of only two other families of continuous distributions that allow for similarly explicit formulae, namely uniform and (one- or two-sided) exponential distributions.

(ii) A popular family of metrics on $\mathcal{P}$ closely related to $d_1$ are the so-called Fortet–Mourier $r$-distances ($1 \le r < +\infty$), given by
$$d_{\mathrm{FM}_r}(\mu, \nu) = \int_I \max\{1, |y|\}^{r-1}\, |F_\mu(y) - F_\nu(y)|\, dy.$$
Like the Lévy and Kantorovich metrics, the Fortet–Mourier $r$-distance also metrizes the weak topology on $\mathcal{P}$. The reader is referred to [28, 31] for details on the mathematical background of $d_{\mathrm{FM}_r}$ and its use in stochastic optimization. Note that if $I \subset [1, +\infty[$ then
$$d_{\mathrm{FM}_r}(\mu, \nu) = \frac{\lambda\big(T(I)\big)}{r}\, d_1\big(\mu \circ T^{-1}, \nu \circ T^{-1}\big),$$
with the homeomorphism $T: x \mapsto x^r$ of $[1, +\infty[$. For instance, $\beta_b \circ T^{-1} = \beta_{b^r}$, and hence best (or best uniform) $d_{\mathrm{FM}_r}$-approximations of $\beta_b$ can easily be identified using Corollary 4.4 (or 4.3).

$d_r$-approximations ($1 < r < +\infty$). Similarly to the case of $r = 1$, [36, Thm. 5.5] guarantees that, given any $n \in \mathbb{N}$, there exists a (unique) best uniform $d_r$-approximation $\delta^{u_n}_\bullet$ of $\beta_b$. Except for $r = 2$, however, no explicit formula seems to be available for $\delta^{u_n}_\bullet$.
It is desirable, therefore, to at least identify asymptotically best uniform $d_r$-approximations, that is, a sequence $(x_n)$ with $x_n \in \Xi_n$ for all $n \in \mathbb{N}$ such that
$$\lim_{n\to\infty} \frac{d_r\big(\beta_b, \delta^{u_n}_{x_n}\big)}{d_r\big(\beta_b, \delta^{u_n}_\bullet\big)} = 1.$$
Usage of [36, Thm. 5.15] accomplishes this and also yields the uniform $d_r$-quantization coefficient of $\beta_b$. (Notice that, as $r \downarrow 1$, the latter is consistent with Corollary 4.3.)
Proposition 4.6.
Let $b > 1$ and $r \ge 1$. Then $\big(\delta^{u_n}_{x_n}\big)$, with $x_{n,j} = b^{(2j-1)/(2n)}$ for all $n \in \mathbb{N}$ and $j = 1, \ldots, n$, is a sequence of asymptotically best uniform $d_r$-approximations of $\beta_b$. Moreover,
$$\lim_{n\to\infty} n\, d_r(\beta_b, \delta^{u_n}_\bullet) = \frac{(\log b)^{1-1/r}}{2(b-1)} \left(\frac{b^r - 1}{r(r+1)}\right)^{1/r}.$$

The remainder of this section studies best $d_r$-approximations of $\beta_b$. In general, the question of uniqueness of best $d_r$-approximations is a difficult one, for which only partial answers exist; see, e.g., [17, Sec. 5]. Specifically, $\beta_b$ does not seem to satisfy any known condition (such as, e.g., log-concavity) that would guarantee uniqueness. However, uniqueness can be established via a direct calculation.

Theorem 4.7.
Let $b > 1$, $r \ge 1$, and $n \in \mathbb{N}$. There exists a unique best $d_r$-approximation $\delta^{\bullet,n}_\bullet$ of $\beta_b$, and $\#\mathrm{supp}\, \delta^{\bullet,n}_\bullet = n$.

Proof. Existence follows as in Theorem 3.12; alternatively, see [17, Sec. 4.1] or [36, Prop. 5.22]. To avoid trivialities, henceforth assume $n \ge 2$. If $d_r(\beta_b, \delta^p_x) = d_r\big(\beta_b, \delta^{\bullet,n}_\bullet\big)$, then by [36, Thm. 5.23], $2 b^{P_j} = x_j + x_{j+1}$ for all $j = 1, \ldots, n-1$, but also
$$\int_{P_{j-1}}^{\log_b x_j} (x_j - b^y)^{r-1}\, dy = \int_{\log_b x_j}^{P_j} (b^y - x_j)^{r-1}\, dy \qquad \forall j = 1, \ldots, n. \tag{4.3}$$
Eliminating $p$ and substituting $z = b^y / x_j$ in (4.3) yields $n$ equations for $x_1, \ldots, x_n$, namely
$$\int_1^{x_1} (z-1)^{r-1}\, \frac{dz}{z^r} = 2^{1-r}\, g_0\Big(\frac{x_2}{x_1}\Big), \qquad g_r\Big(\frac{x_j}{x_{j-1}}\Big) = g_0\Big(\frac{x_{j+1}}{x_j}\Big) \quad \forall j = 2, \ldots, n-1, \qquad g_r\Big(\frac{x_n}{x_{n-1}}\Big) = g_0\Big(\frac{2b - x_n}{x_n}\Big), \tag{4.4}$$
where $g_a$, with $a \in \mathbb{R}$, is given by
$$g_a(x) = \int_1^x \frac{(z-1)^{r-1}}{z^a (z+1)}\, dz, \qquad x \ge 1.$$
Assume that $\widetilde{x} \in \Xi_n$ also solves (4.4). If $\widetilde{x}_1 > x_1$ then $\widetilde{x}_{j+1}/\widetilde{x}_j > x_{j+1}/x_j$ and hence $\widetilde{x}_{j+1} > x_{j+1}$ for all $j = 1, \ldots, n-1$, but by the last equation in (4.4) also $2b/\widetilde{x}_n > 2b/x_n$, an obvious contradiction. Similarly, $\widetilde{x}_1 < x_1$ leads to a contradiction. Thus, $\widetilde{x}_1 = x_1$, and consequently $\widetilde{x} = x$. (If $n = 1$ then (4.4) reduces to
$$\int_1^{x_1} (z-1)^{r-1}\, \frac{dz}{z^r} = 2^{1-r}\, g_0\Big(\frac{2b - x_1}{x_1}\Big),$$
which also has a unique solution since, as $x_1$ increases from $1$ to $b$, the left side increases from $0$ whereas the right side decreases to $0$.) In summary, therefore, $x \in \Xi_n$ and $p \in \Pi_n$ are uniquely determined by $d_r(\beta_b, \delta^p_x) = d_r\big(\beta_b, \delta^{\bullet,n}_\bullet\big)$. $\Box$

As in the case of best uniform $d_r$-approximations of $\beta_b$, no explicit formula is available for $\delta^{\bullet,n}_\bullet$, not even when $r = 2$. Still, it is possible to identify asymptotically best $d_r$-approximations, that is, a sequence $\big(\delta^{p_n}_{x_n}\big)$ with $x_n \in \Xi_n$ and $p_n \in \Pi_n$ for all $n \in \mathbb{N}$ such that
$$\lim_{n\to\infty} \frac{d_r\big(\beta_b, \delta^{p_n}_{x_n}\big)}{d_r\big(\beta_b, \delta^{\bullet,n}_\bullet\big)} = 1.$$
In addition, the $d_r$-quantization coefficient of $\beta_b$ can be computed explicitly; for details see [36, Prop. 5.26] and the references given there. Notice that, as $r \downarrow 1$, the result is consistent with Corollary 4.4.
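For $n = 1$, the system (4.4) collapses to a single equation in $x_1$, which can be solved by bisection; this gives an independent check of the optimality conditions. In the sketch below (illustrative helper names; quadrature and bisection parameters are ad hoc choices), the computed root for $r = 1$ agrees with $x_1 = \sqrt{b}$ from Corollary 4.4, while for $r = 2$ it agrees with the mean $(b-1)/\log b$ of $\beta_b$, as one expects for a one-point best $d_2$-approximation.

```python
import math

def quad(f, lo, hi, m=4000):
    # simple midpoint rule on [lo, hi]
    h = (hi - lo) / m
    return h * sum(f(lo + (i + 0.5) * h) for i in range(m))

def residual(x, b, r):
    # n = 1 case of (4.4): lhs - rhs, with g_0 as defined in the text
    lhs = quad(lambda z: (z - 1) ** (r - 1) / z ** r, 1.0, x)
    v = (2 * b - x) / x
    rhs = 2 ** (1 - r) * quad(lambda z: (z - 1) ** (r - 1) / (z + 1), 1.0, v)
    return lhs - rhs

def solve_n1(b, r):
    # residual is increasing in x, negative near 1 and positive near b
    lo, hi = 1.0 + 1e-9, b - 1e-9
    for _ in range(50):
        mid = (lo + hi) / 2
        if residual(mid, b, r) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

b = 10.0
x_r1 = solve_n1(b, 1)
x_r2 = solve_n1(b, 2)
assert abs(x_r1 - math.sqrt(b)) < 1e-3            # sqrt(b), cf. Corollary 4.4
assert abs(x_r2 - (b - 1) / math.log(b)) < 1e-3   # mean of beta_b
```

For $r = 1$ the equation reduces analytically to $\log x_1 = \log(b/x_1)$, and for $r = 2$ to $(b-1)/x_1 = \log b$, which is what the bisection recovers.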
Proposition 4.8.
Let $b > 1$ and $r \ge 1$. Then $\big(\delta^{p_n}_{x_n}\big)$, with
$$x_{n,j} = \Big(1 + \frac{j}{n+1}\big(b^{r/(r+1)} - 1\big)\Big)^{(r+1)/r}, \qquad P_{n,j} = \frac{1}{\log b}\, \log \frac{x_{n,j} + x_{n,j+1}}{2},$$
for all $n \in \mathbb{N}$ and $j = 1, \ldots, n-1$, and $x_{n,n} = \Big(\dfrac{1 + n\, b^{r/(r+1)}}{n+1}\Big)^{(r+1)/r}$, is a sequence of asymptotically best $d_r$-approximations of $\beta_b$. Moreover,
$$\lim_{n\to\infty} n\, d_r(\beta_b, \delta^{\bullet,n}_\bullet) = \frac{r+1}{2(b-1)(\log b)^{1/r}} \left(\frac{b^{r/(r+1)} - 1}{r}\right)^{(r+1)/r}.$$

Example 4.9.
For $\mu = \mathrm{Beta}(2,1)$ and $n \in \mathbb{N}$, a unique best uniform $d_r$-approximation exists for each $r \ge 1$. The best uniform $d_1$-approximations $\delta^{u_n}_x$, where $x_j = \sqrt{(2j-1)/(2n)}$ for $j = 1, \ldots, n$, also constitute a sequence of asymptotically best uniform $d_r$-approximations for $1 \le r < 2$, with
$$\lim_{n\to\infty} n\, d_r(\mu, \delta^{u_n}_\bullet) = \left(\frac{2^{1-2r}}{(r+1)(2-r)}\right)^{1/r}, \tag{4.5}$$
in analogy to Proposition 4.6. For $r \ge 2$, however, this analogy breaks down, as
$$\lim_{n\to\infty} \frac{n}{\sqrt{\log n}}\, d_2(\mu, \delta^{u_n}_\bullet) = \frac{1}{4\sqrt{3}},$$
whereas $\lim_{n\to\infty} n^{1/2 + 1/r}\, d_r(\mu, \delta^{u_n}_\bullet)$ is finite and positive whenever $r > 2$. Either since $\mu$ is log-concave, or by an argument similar to the one proving Theorem 4.7, there exists a unique best $d_r$-approximation of $\mu$. While the authors do not know of an explicit formula for $\delta^{\bullet,n}_\bullet$, simple asymptotically best $d_r$-approximations in the spirit of Proposition 4.8 exist, and
$$\lim_{n\to\infty} n\, d_r(\mu, \delta^{\bullet,n}_\bullet) = 2^{1/r - 1}\, \frac{r+1}{(r+2)^{(r+1)/r}} \qquad \forall r \ge 1,$$
not merely for $1 \le r < 2$.

Example 4.10. For the inverse Cantor distribution, for every $r \ge 1$ let $\alpha_r = r^{-1} + (1 - r^{-1}) \log 2/\log 3$, and note that $\log 2/\log 3 < \alpha_r \le 1$. With this, $3^{\alpha_r} d_r(\mu, \delta^{u_{3n}}_\bullet) = d_r(\mu, \delta^{u_n}_\bullet)$ for all $n \in \mathbb{N}$, and it is readily deduced that $\big(n^{\alpha_r} d_r(\mu, \delta^{u_n}_\bullet)\big)$ is bounded below and above by positive constants. (The authors suspect that this sequence is divergent for every $r \ge 1$.) Best $d_r$-approximations also exist, and in a similar spirit it can be shown that $\big(n^{\widetilde\alpha_r} d_r(\mu, \delta^{\bullet,n}_\bullet)\big)$ is bounded below and above by positive constants (and again, presumably, divergent), where $\widetilde\alpha_r = \alpha_r \log 3/\log 2$. Note that $1 < \widetilde\alpha_r \le \log 3/\log 2$, and hence $\big(d_r(\mu, \delta^{\bullet,n}_\bullet)\big)$ decays faster than $(n^{-1})$ for every $r \ge 1$.

This section discusses best finitely supported $d_K$-approximations. Though ultimately the results are true analogues of their counterparts in Sections 3 and 4, the underlying arguments are subtly different, which may be seen as a reflection of the fact that $d_K$ metrizes a topology finer than the weak topology of $\mathcal{P}$.
(Recall, however, that $d_K$ does metrize the weak topology on $\mathcal{P}_{\mathrm{cts}}$.) Given $\mu \in \mathcal{P}$ and $n \in \mathbb{N}$, for every $x \in \Xi_n$, let
$$K^\bullet(x) = \max\Big\{ F_{\mu-}(x_1),\ \tfrac12 \max_{j=1}^{n-1} \big( F_{\mu-}(x_{j+1}) - F_\mu(x_j) \big),\ 1 - F_\mu(x_n) \Big\}.$$
Note that $K^\bullet(x) = d_K\big(\mu, \delta^{\pi(x)}_x\big)$ with $\Pi(x)_j = \tfrac12\big(F_\mu(x_j) + F_{\mu-}(x_{j+1})\big)$ for all $j = 1, \ldots, n-1$. Best $d_K$-approximations with prescribed locations are analogous to Theorem 3.5.

Theorem 5.1. Assume that $\mu \in \mathcal{P}$, and $n \in \mathbb{N}$. For every $x \in \Xi_n$, there exists a best $d_K$-approximation of $\mu$, given $x$. Moreover, $d_K(\mu, \delta^p_x) = d_K(\mu, \delta^\bullet_x)$ if and only if, for every $j = 0, \ldots, n$,
$$x_j < x_{j+1} \implies F_{\mu-}(x_{j+1}) - K^\bullet(x) \le P_j \le F_\mu(x_j) + K^\bullet(x), \tag{5.1}$$
and in this case $d_K(\mu, \delta^\bullet_x) = K^\bullet(x)$.

Proof. Given $x \in \Xi_n$ and $p \in \Pi_n$, let $y \in \Xi_m$ and $q \in \Pi_m$ be as in the proof of Lemma 3.4. Then
$$d_K(\mu, \delta^p_x) = \max_{i=0}^m \sup_{t \in [y_i, y_{i+1}[} |F_\mu(t) - Q_i| \ge \max\Big\{ F_{\mu-}(y_1),\ \tfrac12 \max_{i=1}^{m-1}\big(F_{\mu-}(y_{i+1}) - F_\mu(y_i)\big),\ 1 - F_\mu(y_m) \Big\} = \max\Big\{ F_{\mu-}(x_1),\ \tfrac12 \max_{j=1}^{n-1}\big(F_{\mu-}(x_{j+1}) - F_\mu(x_j)\big),\ 1 - F_\mu(x_n) \Big\} = K^\bullet(x).$$
This shows that $\delta^{\pi(x)}_x$ is a best $d_K$-approximation, given $x$, and $d_K(\mu, \delta^\bullet_x) = K^\bullet(x)$. Moreover, $d_K(\mu, \delta^p_x) = K^\bullet(x)$ if and only if
$$\max\big\{ |F_{\mu-}(y_{i+1}) - Q_i|,\ |F_\mu(y_i) - Q_i| \big\} \le K^\bullet(x) \qquad \forall i = 1, \ldots, m-1,$$
that is,
$$F_{\mu-}(y_{i+1}) - K^\bullet(x) \le Q_i \le F_\mu(y_i) + K^\bullet(x) \qquad \forall i = 0, \ldots, m,$$
which in turn is equivalent to the validity of (5.1) for every $j$. $\Box$

To address the approximation problem with prescribed weights, an auxiliary function analogous to $\ell_{f,I}$ in Section 3 is useful.
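As a concrete illustration of the quantities just introduced, for the continuous distribution $\beta_b$ and the atoms $x_j = b^{(2j-1)/(2n)}$, every term in the definition of $K^\bullet(x)$ equals $(2n)^{-1}$. The sketch below (illustrative helper names; the supremum is approximated on a finite grid, an ad hoc choice) verifies this, and also that the weights $\Pi(x)_j$ realize $d_K\big(\mu, \delta^{\pi(x)}_x\big) = K^\bullet(x)$.

```python
import math

def K_bullet(F, x, eps=1e-9):
    # K^bullet(x) = max{ F(x_1-), (1/2) max_j (F(x_{j+1}-) - F(x_j)), 1 - F(x_n) };
    # for a continuous F, the left limits F(.-) are approximated via a small eps
    n = len(x)
    vals = ([F(x[0] - eps)]
            + [0.5 * (F(x[j + 1] - eps) - F(x[j])) for j in range(n - 1)]
            + [1 - F(x[n - 1])])
    return max(vals)

b, n = 10.0, 4
F = lambda t: min(1.0, max(0.0, math.log(t) / math.log(b)))
x = [b ** ((2 * j - 1) / (2 * n)) for j in range(1, n + 1)]
kb = K_bullet(F, x)
assert abs(kb - 1 / (2 * n)) < 1e-6

# d_K on a grid, using the cumulative weights Pi(x)_j = (F(x_j) + F(x_{j+1}))/2
P = [(F(x[j]) + F(x[j + 1])) / 2 for j in range(n - 1)]
def G(t):  # CDF of the approximating finitely supported measure
    s = 0.0
    for j in range(n):
        if t >= x[j]:
            s = P[j] if j < n - 1 else 1.0
    return s

grid = [1 + (b - 1) * i / 200000 for i in range(200001)]
dK = max(abs(F(t) - G(t)) for t in grid)
assert abs(dK - 1 / (2 * n)) < 1e-3
```

Here $n = 4$, so both computed quantities are $\tfrac18 = 0.125$, in line with Corollary 5.9 below.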
Specifically, given a non-decreasing function $f: \mathbb{R} \to \mathbb{R}$, let $I \subset \mathbb{R}$ be any bounded, non-empty interval, and define $\kappa_{f,I}: \mathbb{R} \to \mathbb{R}$ as
$$\kappa_{f,I}(x) = \max\big\{ |f_-(x) - \inf I|,\ |f_+(x) - \sup I| \big\}.$$
A few basic properties of $\kappa_{f,I}$ are easily established.

Proposition 5.2. Let $f: \mathbb{R} \to \mathbb{R}$ be non-decreasing, and $\emptyset \ne I \subset \mathbb{R}$ a bounded interval. Then, with $s := f^{-1}\big(\tfrac12(\inf I + \sup I)\big)$, the function $\kappa_{f,I}$ is non-increasing on $]-\infty, s[$, and non-decreasing on $]s, +\infty[$. Moreover, $\kappa_{f,I}$ attains a minimal value whenever $\inf I \le \tfrac12\big(f_-(s) + f_+(s)\big) \le \sup I$.

It is worth noting that $\kappa_{f,I}$ may in general not attain its infimum, as the example of $f = 15 F_\mu$, with $\mu = \tfrac13\big(\lambda|_{[0,1]} + 2\delta_5\big)$, and $I = [6,8]$ shows, for which $s = 5$, and $\kappa_{f,I}(5-) = 3$, $\kappa_{f,I}(5) = 7$, $\kappa_{f,I}(5+) = 9$; correspondingly, $\tfrac12\big(f_-(5) + f_+(5)\big) \notin I$.

By using functions of the form $\kappa_{f,I}$, the value of $d_K(\mu, \nu)$ can easily be bounded above whenever $\nu$ has finite support. For convenience, for every $n \in \mathbb{N}$ let $\Xi^+_n = \{x \in \Xi_n : x_1 < \ldots < x_n\}$. The proof of the following analogue of Lemma 3.4 is straightforward.

Proposition 5.3. Let $\mu \in \mathcal{P}$ and $n \in \mathbb{N}$. For every $x \in \Xi_n$ and $p \in \Pi_n$,
$$d_K\big(\mu, \delta^p_x\big) \le \max_{j=1}^n \kappa_{F_\mu, [P_{j-1}, P_j]}(x_j), \tag{5.2}$$
and equality holds in (5.2) whenever $x \in \Xi^+_n$.

Simple examples show that the inequality (5.2) may be strict if $x \notin \Xi^+_n$: for instance, if $\mu = \tfrac12\big(\lambda|_{[0,1]} + \delta_1\big)$ and $x_1 = x_2 = 1$, then $d_K(\mu, \delta^p_x)$ does not depend on $p \in \Pi_2$ at all, whereas the right-hand side of (5.2) does. This, together with the fact that a function $\kappa_{f,I}$ may not attain its infimum, suggests that $d_K$-approximations with prescribed weights are potentially somewhat fickle. Still, best approximations do exist and can be characterized in a spirit similar to Sections 3 and 4.
To this end, given $\mu \in \mathcal{P}$ and $n \in \mathbb{N}$, for every $p \in \Pi_n$, let $K_\bullet(p) = d_K\big(\mu, \delta^p_{\xi(p)}\big)$ with
$$\xi(p)_j = F^{-1}_\mu\Big(\tfrac12(P_{j-1} + P_j)\Big) \qquad \forall j = 1, \ldots, n.$$
Note that $K_\bullet(p) \le \tfrac12 \max_{j=1}^n p_j$, and in fact $K_\bullet(p) = \tfrac12 \max_{j=1}^n p_j$ whenever $\mu \in \mathcal{P}_{\mathrm{cts}}$.

Theorem 5.4. Assume that $\mu \in \mathcal{P}$, and $n \in \mathbb{N}$. For every $p \in \Pi_n$, there exists a best $d_K$-approximation of $\mu$, given $p$. Moreover, $d_K(\mu, \delta^p_x) = d_K(\mu, \delta^p_\bullet)$ if and only if, for every $j = 1, \ldots, n$,
$$P_{j-1} < P_j \implies F^{-1}_{\mu-}\big(P_j - K_\bullet(p)\big) \le x_j \le F^{-1}_\mu\big(P_{j-1} + K_\bullet(p)\big), \tag{5.3}$$
and in this case $d_K(\mu, \delta^p_\bullet) = K_\bullet(p)$.

Proof. Note first that deleting all zero entries of $p$ does not change the value of $K_\bullet(p)$, and hence does not affect (5.3), nor of course the asserted existence of a best $d_K$-approximation, given $p$. Thus assume $\min_{j=1}^n p_j > 0$ throughout, write $\xi(p)$ simply as $\xi$, and for every $x \in \Xi_n$, write $F_{\delta^p_x}$ as $G$. To prove the existence of a best $d_K$-approximation of $\mu$, given $p$, as well as $d_K(\mu, \delta^p_\bullet) = K_\bullet(p)$, clearly it suffices to show that
$$d_K(\mu, \delta^p_x) \ge d_K\big(\mu, \delta^p_\xi\big) \qquad \forall x \in \Xi_n. \tag{5.4}$$
Similarly to the proof of Lemma 3.4, label $\xi$ uniquely as
$$\xi_1 = \ldots = \xi_{j_1} < \xi_{j_1+1} = \ldots = \xi_{j_2} < \xi_{j_2+1} = \ldots < \ldots = \xi_{j_{m-1}} < \xi_{j_{m-1}+1} = \ldots = \xi_{j_m},$$
with integers $j_1 < \ldots < j_m$, where $j_0 = 0$ and $j_m = n$, and define $\eta \in \Xi_m$ and $q \in \Pi_m$ as $\eta_i = \xi_{j_i}$ and $q_i = P_{j_i} - P_{j_{i-1}}$, respectively. With this, $\delta^p_\xi = \delta^q_\eta$, and by Proposition 5.3,
$$K_\bullet(p) = d_K\big(\mu, \delta^q_\eta\big) = \max_{i=1}^m \kappa_{F_\mu, [Q_{i-1}, Q_i]}(\eta_i).$$
Pick $i$ such that $\kappa_{F_\mu, [Q_{i-1}, Q_i]}(\eta_i) = K_\bullet(p)$, that is,
$$\max\big\{ |F_{\mu-}(\eta_i) - Q_{i-1}|,\ |F_\mu(\eta_i) - Q_i| \big\} = K_\bullet(p).$$
Clearly, to establish (5.4) it is enough to show that
$$\max\big\{ |F_{\mu-}(\eta_i) - G_-(\eta_i)|,\ |F_\mu(\eta_i) - G(\eta_i)| \big\} \ge K_\bullet(p), \tag{5.5}$$
and this will now be done.
To this end, notice that by the definition of $\eta$,
$$\tfrac12\big(P_{j_{i-1}-1} + P_{j_{i-1}}\big) \le F_{\mu-}(\eta_i) \le \tfrac12\big(P_{j_{i-1}} + P_{j_{i-1}+1}\big), \tag{5.6}$$
but also
$$\tfrac12\big(P_{j_i - 1} + P_{j_i}\big) \le F_\mu(\eta_i) \le \tfrac12\big(P_{j_i} + P_{j_i + 1}\big), \tag{5.7}$$
with the convention that $P_{-1} = 0$ and $P_{n+1} = 1$.

Assume first that $K_\bullet(p) = |F_{\mu-}(\eta_i) - Q_{i-1}|$. If $\eta_i \le x_{j_{i-1}}$ then $G_-(\eta_i) \le P_{j_{i-1}-1}$, and hence $F_{\mu-}(\eta_i) - G_-(\eta_i) \ge F_{\mu-}(\eta_i) - P_{j_{i-1}-1}$, but also, by (5.6),
$$F_{\mu-}(\eta_i) - G_-(\eta_i) \ge F_{\mu-}(\eta_i) - P_{j_{i-1}-1} - \big(2 F_{\mu-}(\eta_i) - P_{j_{i-1}-1} - P_{j_{i-1}}\big) = P_{j_{i-1}} - F_{\mu-}(\eta_i),$$
and consequently
$$F_{\mu-}(\eta_i) - G_-(\eta_i) \ge \big| F_{\mu-}(\eta_i) - P_{j_{i-1}} \big| = |F_{\mu-}(\eta_i) - Q_{i-1}| = K_\bullet(p).$$
If $x_{j_{i-1}} < \eta_i \le x_{j_{i-1}+1}$ then $G_-(\eta_i) = P_{j_{i-1}}$, and hence $|F_{\mu-}(\eta_i) - G_-(\eta_i)| = K_\bullet(p)$. Finally, if $\eta_i > x_{j_{i-1}+1}$ then $G_-(\eta_i) \ge P_{j_{i-1}+1}$, and hence $G_-(\eta_i) - F_{\mu-}(\eta_i) \ge P_{j_{i-1}+1} - F_{\mu-}(\eta_i)$, but also, again by (5.6),
$$G_-(\eta_i) - F_{\mu-}(\eta_i) \ge P_{j_{i-1}+1} - F_{\mu-}(\eta_i) - \big(P_{j_{i-1}} + P_{j_{i-1}+1} - 2 F_{\mu-}(\eta_i)\big) = F_{\mu-}(\eta_i) - P_{j_{i-1}},$$
and therefore
$$G_-(\eta_i) - F_{\mu-}(\eta_i) \ge \big| F_{\mu-}(\eta_i) - P_{j_{i-1}} \big| = K_\bullet(p).$$
Thus (5.5) holds whenever $K_\bullet(p) = |F_{\mu-}(\eta_i) - Q_{i-1}|$.

Next assume that $K_\bullet(p) = |F_\mu(\eta_i) - Q_i|$. Utilizing (5.7) instead of (5.6), completely analogous arguments show that $|F_\mu(\eta_i) - G(\eta_i)| \ge K_\bullet(p)$ in this case as well, which again implies (5.5). The latter therefore holds in either case. As seen earlier, this proves the existence of a best $d_K$-approximation of $\mu$, given $p$, and also that $d_K(\mu, \delta^p_\bullet) = K_\bullet(p)$.

Finally, with $y \in \Xi^+_m$ and $q \in \Pi_m$ as in the proof of Lemma 3.4, observe that $d_K(\mu, \delta^p_x) = K_\bullet(p)$ if and only if $\max_{i=1}^m \kappa_{F_\mu, [Q_{i-1}, Q_i]}(y_i) = K_\bullet(p)$, by Proposition 5.3.
As seen in the proof of Theorem 5.1, this means that
$$F_{\mu-}(y_{i+1}) - K_\bullet(p) \le Q_i \le F_\mu(y_i) + K_\bullet(p) \qquad \forall i = 0, \ldots, m,$$
or equivalently,
$$F^{-1}_{\mu-}\big(Q_i - K_\bullet(p)\big) \le y_i \le F^{-1}_\mu\big(Q_{i-1} + K_\bullet(p)\big) \qquad \forall i = 1, \ldots, m,$$
which in turn is equivalent to the validity of (5.3) for every $j$. $\Box$

Corollary 5.5. Assume $\mu \in \mathcal{P}_{\mathrm{cts}}$, and $n \in \mathbb{N}$. Then $d_K(\mu, \delta^{u_n}_x) \ge (2n)^{-1}$ for all $x \in \Xi_n$, with equality holding if and only if
$$F^{-1}_{\mu-}\Big(\frac{2j-1}{2n}\Big) \le x_j \le F^{-1}_\mu\Big(\frac{2j-1}{2n}\Big) \qquad \forall j = 1, \ldots, n.$$

By combining Theorems 5.1 and 5.4, it is possible to characterize best $d_K$-approximations of $\mu \in \mathcal{P}$ as well. For this, associate with every non-decreasing function $f: \mathbb{R} \to \mathbb{R}$ and every number $a \ge 0$ the map $S_{f,a}: \mathbb{R} \to \mathbb{R}$, given by
$$S_{f,a}(x) = f_+\big(f^{-1}(x + a)\big) + a \qquad \forall x \in \mathbb{R}.$$
This map is a true analogue of $T_{f,a}$ in Section 3, and in fact, Proposition 3.11, with $T_{f,a}$ replaced by $S_{f,a}$, remains fully valid. Identical reasoning then shows that
$$K^{\bullet,n}_\bullet := \min\Big\{ a \ge 0 : S^{[n]}_{F_\mu, a}(0) \ge 1 \Big\} < +\infty;$$
again, $\big(K^{\bullet,n}_\bullet\big)$ is non-increasing, $2n K^{\bullet,n}_\bullet \le 1$ for every $n$, and $K^{\bullet,n}_\bullet = 0$ if and only if $\#\mathrm{supp}\, \mu \le n$. Notice that if $\mu \in \mathcal{P}_{\mathrm{cts}}$ then
$$S_{F_\mu, a}(x) = \begin{cases} a & \text{if } x < -a, \\ 2a + x & \text{if } -a \le x < 1 - a, \\ a + 1 & \text{if } x \ge 1 - a, \end{cases}$$
from which it is clear that $K^{\bullet,n}_\bullet = (2n)^{-1}$.

Theorem 5.6. Let $\mu \in \mathcal{P}$ and $n \in \mathbb{N}$. There exists a best $d_K$-approximation of $\mu$, and $d_K\big(\mu, \delta^{\bullet,n}_\bullet\big) = K^{\bullet,n}_\bullet$. Moreover, for every $x \in \Xi_n$ and $p \in \Pi_n$, the following are equivalent: (i) $d_K(\mu, \delta^p_x) = d_K\big(\mu, \delta^{\bullet,n}_\bullet\big)$; (ii) all implications in (5.1) are valid with $K^\bullet(x)$ replaced by $K^{\bullet,n}_\bullet$; (iii) all implications in (5.3) are valid with $K_\bullet(p)$ replaced by $K^{\bullet,n}_\bullet$.

Proof. Note that once the existence of a best $d_K$-approximation of $\mu$ is established, the proof is virtually identical to that of Theorem 3.12. Thus, only the existence is to be proved here.
To this end, let $a = \inf_{x \in \Xi_n, p \in \Pi_n} d_K(\mu, \delta^p_x)$, and pick sequences $(x^k)$ and $(p^k)$ in $\Xi_n$ and $\Pi_n$, respectively, with the property that $\lim_{k\to\infty} d_K\big(\mu, \delta^{p^k}_{x^k}\big) = a$. By the compactness of $\Xi_n$, assume w.l.o.g. that $\lim_{k\to\infty} x^k = \eta \in \Xi_n$. Since $a \le K^\bullet(x^k) \le d_K\big(\mu, \delta^{p^k}_{x^k}\big)$, it suffices to show that $K^\bullet(\eta) \le a$. To see the latter, assume that $\eta_j < \eta_{j+1}$ for any $j = 1, \ldots, n-1$. Then $x^k_j < x^k_{j+1}$ for all sufficiently large $k$, and hence by Theorem 5.1, $\tfrac12\big(F_{\mu-}(x^k_{j+1}) - F_\mu(x^k_j)\big) \le K^\bullet(x^k)$, which in turn implies
$$\tfrac12\big(F_{\mu-}(\eta_{j+1}) - F_\mu(\eta_j)\big) \le \liminf_{k\to\infty} \tfrac12\big(F_{\mu-}(x^k_{j+1}) - F_\mu(x^k_j)\big) \le a.$$
Since, similarly, $F_{\mu-}(\eta_1) \le a$ and $1 - F_\mu(\eta_n) \le a$, it follows that $K^\bullet(\eta) \le a$, as claimed. $\Box$

Corollary 5.7. Assume $\mu \in \mathcal{P}_{\mathrm{cts}}$, and $n \in \mathbb{N}$. Then $K^{\bullet,n}_\bullet = K_\bullet(u_n) = (2n)^{-1}$, and $\delta^p_x$ with $x \in \Xi_n$, $p \in \Pi_n$ is a best $d_K$-approximation of $\mu$ if and only if it is a best uniform $d_K$-approximation of $\mu$.

Remark 5.8. (i) By Theorem 5.6, $K^{\bullet,n}_\bullet = \min_{x \in \Xi_n} K^\bullet(x) = \min_{p \in \Pi_n} K_\bullet(p)$.

(ii) If $\mu$ has even a single atom, then $K^{\bullet,n}_\bullet$ may be smaller than $K_\bullet(u_n)$, and thus a best uniform $d_K$-approximation may not be a best $d_K$-approximation. A simple example illustrating this is $\mu = \tfrac12\big(\delta_0 + \lambda|_{[0,1]}\big)$, where $K^{\bullet,n}_\bullet = \big(2(2n-1)\big)^{-1}$ whereas $K_\bullet(u_n) = (2n)^{-1}$, and hence $K^{\bullet,n}_\bullet < K_\bullet(u_n)$ for every $n \ge 2$.

For Benford's Law, the best $d_K$-approximations are the same as the best uniform $d_1$-approximations; see also Figure 1.

Corollary 5.9. Assume $b > 1$, and $n \in \mathbb{N}$. Then $\delta^{u_n}_{x_n}$ with $x_{n,j} = b^{(2j-1)/(2n)}$ for all $j = 1, \ldots, n$ is the unique best (uniform) $d_K$-approximation of $\beta_b$. Moreover, $d_K\big(\beta_b, \delta^{\bullet,n}_\bullet\big) = (2n)^{-1}$.

Example 5.10. For $\mu = \mathrm{Beta}(2,1)$, both $F_\mu$ and $F^{-1}_\mu$ are continuous. By Corollaries 5.5 and 5.7, the best (or best uniform) $d_K$-approximation of $\mu$ is $\delta^{u_n}_x$, with $x_j = \sqrt{(2j-1)/(2n)}$ for $j = 1, \ldots$
$\ldots, n$, and $d_K(\mu, \delta^{u_n}_\bullet) = d_K\big(\mu, \delta^{\bullet,n}_\bullet\big) = (2n)^{-1}$. With Examples 3.9, 3.15, and 4.9, therefore, the sequences $\big(n\, d_*(\mu, \delta^{\bullet,n}_\bullet)\big)$ all converge to a finite, positive limit, and so do $\big(n\, d_*(\mu, \delta^{u_n}_\bullet)\big)$, provided that $r < 2$ in the case $* = r$.

Example 5.11. Even though the inverse Cantor distribution is discrete with infinitely many atoms, a best uniform $d_K$-approximation exists, by Theorem 5.4. Utilizing (2.4), a tedious but elementary analysis of $F_\mu$ reveals that (3.7) is valid with $d_K$ instead of $d_L$. With Examples 3.10 and 4.10, therefore, $\big(n\, d_*(\mu, \delta^{u_n}_\bullet)\big)$ is bounded below and above by positive constants for $* = L, 1, K$, but tends to $+\infty$ for $* = r > 1$. Similarly, a best $d_K$-approximation exists, by Theorem 5.6, and the estimates (3.9) hold with $d_K$ instead of $d_L$. Thus, $\big(n^{\log 3/\log 2}\, d_*(\mu, \delta^{\bullet,n}_\bullet)\big)$ is bounded below and above by positive constants for $* = L, 1, K$, but tends to $+\infty$ for $* = r > 1$.

As the title of this article suggests, and the introduction explains, the general results have been motivated by a quantitative analysis of Benford's Law, and the precise statements regarding the latter are but simple corollaries of the former. In particular, Sections 3 to 5 show that the quantization coefficients $Q_* = \lim_{n\to\infty} n\, d_*(\beta_b, \delta^{\bullet,n}_\bullet)$ and their uniform counterparts $Q_{*,u} = \lim_{n\to\infty} n\, d_*(\beta_b, \delta^{u_n}_\bullet)$ all are finite and positive for each metric $d_*$ considered. Clearly, $Q_* \le Q_{*,u}$ for all $b > 1$. Also, note that $\big(n\, d_*(\beta_b, \delta^{\bullet,n}_\bullet)\big)$ is non-increasing, possibly constant, whereas $\big(n\, d_*(\beta_b, \delta^{u_n}_\bullet)\big)$ is non-decreasing. Figure 3 summarizes the results obtained earlier. The dependence of $Q_*$ and $Q_{*,u}$ on $b$ is illustrated in Figure 4. On the one hand, $Q_L$ and $Q_{L,u}$ tend to a common limiting value as $b \downarrow 1$, but also as $b \to +\infty$, both attaining their respective minimal value for $b = 2$.
On the other hand, $Q_r$ and $Q_{r,u}$ both behave like $\tfrac12 (r+1)^{-1/r} (\log b)^{1-1/r}$ as $b \downarrow 1$, whereas
$$\lim_{b\to+\infty} (\log b)^{1/r}\, Q_r = \frac{(r+1)\, r^{-(r+1)/r}}{2} \qquad \text{and} \qquad \lim_{b\to+\infty} (\log b)^{1/r - 1}\, Q_{r,u} = \frac{r^{-1/r}\, (r+1)^{-1/r}}{2}.$$
Finally, $Q_K = Q_{K,u} = \tfrac12$ for all $b$.

Figure 3: The quantization ($Q_*$) and uniform quantization ($Q_{*,u}$) coefficients of $\beta_b$ for $d_*$; see also Figure 4. In particular, for every $r \ge 1$,
$$Q_r = \frac{r+1}{2(b-1)(\log b)^{1/r}} \left(\frac{b^{r/(r+1)} - 1}{r}\right)^{(r+1)/r}, \qquad Q_{r,u} = \frac{(\log b)^{1-1/r}}{2(b-1)} \left(\frac{b^r - 1}{r(r+1)}\right)^{1/r}, \qquad Q_K = Q_{K,u} = \tfrac12,$$
together with the corresponding (more involved) closed-form expressions for $Q_L$ and $Q_{L,u}$ obtained in Section 3.

Figure 4: Comparing the quantization coefficients $Q_*$ (solid curves) and uniform quantization coefficients $Q_{*,u}$ (broken curves) of $\beta_b$, for $* = L$ (red), $* = 1$, and $* = K$ (black), respectively; see also Figure 3.

Remark 6.1. In the context of Benford's Law, $I = [1, b]$, and since $S_b < b$ always, it may seem more natural to study the approximation problem not on all of $\mathcal{P}$, but rather on the (dense) subset $\widetilde{\mathcal{P}} := \{\mu \in \mathcal{P} : \mu(\{b\}) = 0\}$. Clearly, $d_L$ and $d_r$ both metrize the weak topology on $\widetilde{\mathcal{P}}$ but are not complete. (By contrast, $d_K$ is complete but not separable, and induces a finer topology.) Since $\widetilde{\mathcal{P}}$ is a $G_\delta$-set in $\mathcal{P}$, a classical theorem [12, Thm. 2.5.4] yields, for instance, an equivalent complete, separable metric $\widetilde{d}$ on $\widetilde{\mathcal{P}}$, built from $G_\mu = b - F^{-1}_\mu$ and $G_\nu = b - F^{-1}_\nu$ via the integral $\int_0^1 |G_\mu - G_\nu|\, d\lambda$ together with a series of suitably weighted correction terms.
However, $\widetilde{d}$ appears to be quite unwieldy, and the authors do not know of an equivalent complete metric on $\widetilde{\mathcal{P}}$ for which explicit results similar to those in Sections 3 and 4 could be established. Also, it is readily confirmed that, given any $\mu \in \widetilde{\mathcal{P}}$, there exists a best (or best uniform) $d_*$-approximation $\delta^{\bullet,n}_\bullet \in \widetilde{\mathcal{P}}$ (or $\delta^{u_n}_\bullet \in \widetilde{\mathcal{P}}$), i.e., these approximation problems always have a solution in $\big(\widetilde{\mathcal{P}}, d_*\big)$, notwithstanding the fact that the latter space is not complete (if $* = L, r$) or not separable (if $* = K$).

For Benford's Law, as seen above, all best (or best uniform) approximations considered converge at the same rate, namely $\Theta(n^{-1})$; the same is true for the $\mathrm{Beta}(2,1)$ distribution whenever $1 \le r < 2$. These are not coincidences. Rather, for many other probability metrics $n^{-1}$ turns out to yield the correct order of magnitude of the $n$-th quantization error as well. Specifically, consider a metric $d$ on $\mathcal{P}$ for which
$$a_1\, \| F_\mu - F_\nu \|_\infty^{s_1} \le d(\mu, \nu) \le a_2 \big( \epsilon\, \| F_\mu - F_\nu \|_\infty^{s_2} + (1 - \epsilon)\, \| F^{-1}_\mu - F^{-1}_\nu \|_\infty^{s_2} \big) \qquad \forall \mu, \nu \in \mathcal{P}, \tag{6.1}$$
with positive constants $a_1, a_2, s_1, s_2$, and $\epsilon \in \{0, 1\}$; see, e.g., [8, 31, 32] for examples and properties of such metrics. Note that validity of (6.1) causes $d$ to metrize a topology at least as fine as the weak topology, and clearly (6.1) holds for any $d = d_*$. The latter fact, together with [17, Thm. 6.2], yields a simple observation regarding the prevalence of the rate $\Theta(n^{-1})$.

Proposition 6.2. Let $d$ be a metric on $\mathcal{P}$ satisfying (6.1). Then, for every $\mu \in \mathcal{P}$,
$$\limsup_{n\to\infty}\ n \inf_{x \in \Xi_n, p \in \Pi_n} d\big(\mu, \delta^p_x\big) < +\infty,$$
and if $\mu$ is non-singular (w.r.t. $\lambda$) then also
$$\liminf_{n\to\infty}\ n \inf_{x \in \Xi_n, p \in \Pi_n} d\big(\mu, \delta^p_x\big) > 0.$$

Remark 6.3. (i) Apart from $d_*$, examples of familiar probability metrics that satisfy (6.1) include the discrepancy distance $\sup_{I \subset \mathbb{R}} |\mu(I) - \nu(I)|$ and the $L^r$-distance $\|F_\mu - F_\nu\|_r$ between distribution functions [31].
For the important Prokhorov distance, validity of the right-hand inequality in (6.1) appears to be unknown [16], but best approximations are suspected to converge at the rate $\Theta(n^{-1})$ regardless [18, Sec. 4]. Also, $\Theta(n^{-1})$ is established in [10] as the universal rate of convergence for best approximations under Orlicz norms, a setting which contains $d_r$ as a special case.

(ii) In [32, Sec. 4.2], for any $a \ge 0$, the $a$-Lévy distance
$$d_{L_a}(\mu, \nu) = \inf\big\{ y \ge 0 : F_\mu(\cdot - ay) - y \le F_\nu \le F_\mu(\cdot + ay) + y \big\}$$
is considered. Every $d_{L_a}$ satisfies (6.1), and $d_{L_0} = d_K$, $d_{L_{\omega^{-1}}} = d_L$. Usage of $a$-Lévy distances may enable a unified treatment of the results in Sections 3 and 5.

(iii) Under additional assumptions on $\mu$, the value of $n \inf_{x \in \Xi_n} d(\mu, \delta^{u_n}_x)$ can similarly be bounded above and below by positive constants [36, Thm. 5.15].

Finally, it is worth pointing out that, though motivated here by Benford's Law, compactness of the interval $I$ was assumed largely for convenience, and can easily be dispensed with for many of the general results in this article. For instance, if $I$ is (closed but) unbounded then (2.2), with $\omega = 1$, still yields $d_L$ as a complete, separable metric inducing the weak topology on $\mathcal{P}$, though the latter no longer is compact. Clearly, Theorem 3.5 is valid in this situation, as (3.1) holds for $f = F_\mu$ and any interval $I \subset \mathbb{R}$. Even though (3.1) may fail for $f = F^{-1}_\mu$ when $\mathrm{supp}\, \mu$ is unbounded, it is readily checked that nevertheless the conclusions of Proposition 3.3 remain intact for $\ell_{F^{-1}_\mu, I}$, provided that $I \subset [0,1]$ but $I \ne \{0\}$ and $I \ne \{1\}$. With $\ell^*_{F^{-1}_\mu, \{0\}} := \ell^*_{F^{-1}_\mu, \{1\}} := 0$, then, Proposition 3.6 holds verbatim, and so does Theorem 3.12. Analogously, Theorems 5.1, 5.4, and 5.6 all can be seen to be correct, with the definition of $K_\bullet(p)$ understood to assume that $p_1, p_n > 0$.
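As a small illustration that the $d_K$-results indeed survive for unbounded $I$: for the standard exponential distribution, placing atoms at the quantiles $F^{-1}_\mu\big(\frac{2j-1}{2n}\big)$ with uniform weights again yields a Kolmogorov error of $(2n)^{-1}$, exactly as in the compactly supported case. The sketch below checks this on a finite grid (the grid range and resolution are ad hoc choices; beyond the grid the deviation $e^{-t}$ is already smaller than the supremum).

```python
import math

# Standard exponential: F(x) = 1 - e^{-x}, so F^{-1}(y) = -log(1 - y).
n = 5
atoms = [-math.log(1 - (2 * j - 1) / (2 * n)) for j in range(1, n + 1)]

def G(t):
    # CDF of the uniform-weight approximation delta^{u_n} with the atoms above
    return sum(1 for a in atoms if a <= t) / n

F = lambda t: max(0.0, 1 - math.exp(-t))
hi = atoms[-1] * 1.2
grid = [i * hi / 200000 for i in range(200001)]
dK = max(abs(F(t) - G(t)) for t in grid)
assert abs(dK - 1 / (2 * n)) < 1e-3  # = (2n)^{-1} = 0.1 here
```

The same quantile construction underlies the exponential-distribution limits quoted below for $d_L$ and $d_1$.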
By contrast, the classical $L^1$-Kantorovich distance $d_1(\mu, \nu) = \|F^{-1}_\mu - F^{-1}_\nu\|_1$ is defined only on the (dense) subset $\mathcal{P}_1 = \{\mu \in \mathcal{P} : \int_I |x|\, d\mu(x) < +\infty\}$, where it metrizes a topology finer than the weak topology. Still, with $\mathcal{P}$ replaced by $\mathcal{P}_1$, Proposition 4.1 also remains intact; see, e.g., [36, Sec. 5]. Note that the sequence $\big(n\, d_*(\mu, \delta^{u_n}_\bullet)\big)$ is bounded when $* = L, K$ because $d_L \le d_K$, whereas $\big(n\, d_1(\mu, \delta^{\bullet,n}_\bullet)\big)$ may decay arbitrarily slowly; see [36, Thm. 5.32]. For a simple application of these results to a probability measure with unbounded support, let $\mu$ be the standard exponential distribution, i.e., $F_\mu(x) = \max\{0, 1 - e^{-x}\}$. Calculations quite similar to the ones shown earlier for Benford's Law yield
$$\lim_{n\to\infty} n\, d_L(\mu, \delta^{\bullet,n}_\bullet) = \frac{\log 2}{2}, \qquad \lim_{n\to\infty} n\, d_L(\mu, \delta^{u_n}_\bullet) = \frac12,$$
whereas
$$\lim_{n\to\infty} n\, d_1(\mu, \delta^{\bullet,n}_\bullet) = 1 \qquad \text{but} \qquad \lim_{n\to\infty} \frac{n}{\log n}\, d_1(\mu, \delta^{u_n}_\bullet) = \frac14,$$
and clearly $n\, d_K(\mu, \delta^{\bullet,n}_\bullet) = n\, d_K(\mu, \delta^{u_n}_\bullet) = \frac12$ for all $n$. Even though $\mu$ has finite moments of all orders, there exist probability metrics $d$ for which $\big(n\, d(\mu, \delta^{\bullet,n}_\bullet)\big)$ is unbounded; see [18, Ex. 5.1(d)].

Acknowledgements

The first author was partially supported by an NSERC Discovery Grant. Both authors gratefully acknowledge helpful suggestions made by F. Dai, B. Han, T.P. Hill, and an anonymous referee.

References

[1] P.C. Allaart, An invariant-sum characterization of Benford's law, J. Appl. Probab., 34 (1997), 288–291.

[2] F. Benford, The law of anomalous numbers, Proc. Amer. Philos. Soc., 78 (1938), 551–572.

[3] A. Berger and T.P. Hill, Benford's law strikes back: no simple explanation in sight for mathematical gem, Math. Intelligencer, 33 (2011), 85–91.

[4] A. Berger and T.P. Hill, An Introduction to Benford's Law, Princeton University Press, Princeton, 2015.

[5] A. Berger, T.P. Hill, and K.E. Morrison, Scale-distortion inequalities for mantissas of finite data sets, J. Theoret. Probab., 21 (2008), 97–117.

[6] A.
Berger, T.P. Hill, and E. Rogers , Benford Online Bibliography , , 2009. (Last accessed March 16th, 2018.)[7] A. Berger and I. Twelves , On the significands of uniform random variables , to appearin: J. Appl. Probab. (2018).[8] I. Bloch and J. Atif , Defining and computing Hausdorff distances between distributionson the real line and on the circle: link between optimal transport and morphological dilations ,Math. Morphol. Theory Appl., 1 (2016), 79–99.259] S.G. Bobkov and M. Ledoux , One-dimensional empirical measures,order statistics and Kantorovich transport distances , preprint (2016). .To appear in: Memoirs of the AMS.[10] S. Dereich and C. Vormoor , The high resolution vector quantization problem with Orlicznorm distortion , J. Theoret. Probab., 24 (2011), 517–544.[11] P. Diaconis , The distribution of leading digits and uniform distribution mod 1, Ann.Probab., 5 (1977), 72–81.[12] R. Dudley , Real Analysis and Probability . Wadsworth & Brooks/Cole Advanced Books &Software, Pacific Grove, 2004.[13] L. D¨umbgen and C. Leuenberger , Explicit bounds for the approximation error in Ben-ford’s law , Elect. Comm. in Probab., 13 (2008), 99–112.[14] W. Feller , An Introduction to Probability Theory and Its Applications. Vol. II , John Wileyand Sons, New York, 1966.[15] N. Gauvrit and J.-P. Delahaye , Scatter and Regularity Imply Benford’s Law... andMore , pp. 53–69 in: H. Zenil (ed.), Randomness Through Complexity , World Scientific,Singapore, 2011.[16] A.L. Gibbs and F.E. Su , On choosing and bounding probability metrics , Int. Stat. Rev.,70 (2002), 419–435.[17] S. Graf and H. Luschgy , Foundations of Quantization for Probability Distributions ,Lecture Notes in Mathematics 1730, Springer, 2000.[18] S. Graf and H. Luschgy , Quantization for probability measures in the Prokhorov metric ,Theory Probab. Appl., 53 (2009), 216–241.[19] T.P. Hill . A statistical derivation of the significant-digit law . Statistical Science, 10 (1995),354–363.[20] T.P. 
Hill , Base-invariance implies Benford’s law , Proc. Amer. Math. Soc., 123 (1995),887–895.[21] P.J. Huber , Robust Statistics , John Wiley and Sons, New York, 1981.[22] L.V. Kantorovich and G. Rubinstein , On a space of completely additive functions ,Vestnik Leningradskogo Universiteta, 13 (1958), 52–59.[23] D.E. Knuth , The Art of Computer Programming , Addison-Wesley, Reading, 1975.[24] T. Linder , On asymptotically companding quantization , Probl. Control. Inform., 20 (1991),475–484. 2625] S.J. Miller , Benford’s Law: Theory and Applications , Princeton University Press, Prince-ton, 2015.[26] Y. Mori and K. Takashima , On the distribution of the leading digit of a n : a study via χ statistics , Period. Math. Hung., 73 (2016), 224–239.[27] S. Newcomb , Note on the frequency of use of the different digits in natural numbers , Amer.J. Math., 4 (1881), 39–40.[28] G.C. Pflug and A. Pichler , Approximations for probability distributions and stochasticoptimization problems , Internat. Ser. Oper. Res. Manage. Sci. 1633, Springer, New York,2011, 343–387.[29] R.S. Pinkham , On the distribution of first significant digits , Ann. Math. Statist., 32 (1961),1223–1230.[30] K. P¨otzelberger and K. Felsenstein , An asymptotic result on principal points forunivariate distributions , Optimization, 28 (1994), 397–406.[31] S.T. Rachev , Probability Metrics and the Stability of Stochastic Models , John Wiley andSons, New York, 1991.[32] S.T. Rachev, L.B. Klebanov, S.V. Stoyanov, and F.J. Fabozzi , A structural classi-fication of probability distances , In: The Methods of Distances in the Theory of Probabilityand Statistics, Springer, New York, 2013.[33] S.T. Rachev and L. R¨uschendorf , Mass Transportation Problems. Vol. II: Applications ,Springer-Verlag, 1998.[34] R. A. Raimi , The first digit problem , Amer. Math. Monthly, 83 (1976), 521–538.[35] P. Schatte , On mantissa distributions in computing and Benford’s law , J. Inform. Process.Cybernet., 24 (1988), 443–455.[36]