Fixed point characterizations of continuous univariate probability distributions and their applications
Steffen Betsch and Bruno Ebner
Karlsruhe Institute of Technology (KIT), Institute of Stochastics, Karlsruhe, Germany
August 13, 2019
Abstract
By extrapolating the explicit formula of the zero-bias distribution occurring in the context of Stein's method, we construct characterization identities for a large class of absolutely continuous univariate distributions. Instead of trying to derive characterizing distributional transformations that inherit certain structures for the use in further theoretic endeavours, we focus on explicit representations given through a formula for the density or distribution function. The results we establish with this ambition feature immediate applications in the area of goodness-of-fit testing. We draw up a blueprint for the construction of tests of fit that includes procedures for many distributions for which few (if any) practicable tests are known. To illustrate this last point, we construct a test for the Burr Type XII distribution for which, to our knowledge, not a single test is known aside from the classical universal procedures.
MSC 2010 subject classifications.
Primary 62E10; Secondary 60E10, 62G10
Key words and phrases
Burr Type XII distribution, Density Approach, Distributional Characterizations, Goodness-of-fit Tests, Non-normalized statistical models, Probability Distributions, Stein's Method

1 Introduction
Over the last decades, Stein's method for distributional approximation has become a viable tool for proving limit theorems and establishing convergence rates. At its heart lies the well-known Stein characterization which states that a real-valued random variable $Z$ has a standard normal distribution if, and only if,
\[ \mathbb{E}\bigl[ f'(Z) - Z f(Z) \bigr] = 0 \qquad (1.1) \]
holds for all functions $f$ of a sufficiently large class of test functions. To exploit this characterization for testing the hypothesis
\[ H_0 : \mathbb{P}^X \in \bigl\{ \mathcal{N}(\mu, \sigma^2) \,\big|\, (\mu, \sigma^2) \in \mathbb{R} \times (0, \infty) \bigr\} \qquad (1.2) \]
of normality, where $\mathbb{P}^X$ is the distribution of a real-valued random variable $X$, against general alternatives, Betsch and Ebner (2019b) used that (1.1) can be untied from the class of test functions with the help of the so-called zero-bias transformation introduced by Goldstein and Reinert (1997). To be specific, a real-valued random variable $X^*$ is said to have the $X$-zero-bias distribution if
\[ \mathbb{E}\bigl[ f'(X^*) \bigr] = \mathbb{E}\bigl[ X f(X) \bigr] \]
holds for any of the respective test functions $f$. If $\mathbb{E}X = 0$ and $\mathrm{Var}(X) = 1$, the $X$-zero-bias distribution exists and is unique, and it has distribution function
\[ T^X(t) = \mathbb{E}\bigl[ X (X - t) \mathbf{1}\{X \le t\} \bigr], \quad t \in \mathbb{R}. \qquad (1.3) \]
By (1.1), the standard Gaussian distribution is the unique fixed point of the transformation $\mathbb{P}^X \mapsto \mathbb{P}^{X^*}$. Thus, the distribution of $X$ is standard normal if, and only if,
\[ T^X = F^X, \qquad (1.4) \]
where $F^X$ denotes the distribution function of $X$. In the spirit of characterization-based goodness-of-fit tests, an idea introduced by Linnik (1962), this fixed point property directly admits a new class of testing procedures as follows. Letting $\widehat{T}^X_n$ be an empirical version of $T^X$ and $\widehat{F}_n$ the empirical distribution function, both based on the standardized sample, Betsch and Ebner (2019b) proposed a test for (1.2) based on the statistic
\[ G_n = n \int_{\mathbb{R}} \bigl| \widehat{T}^X_n(t) - \widehat{F}_n(t) \bigr|^2 \, w(t) \, \mathrm{d}t, \]
where $w$ is an appropriate weight function, which, in view of (1.4), rejects the normality hypothesis for large values of $G_n$. As these tests have several desirable properties such as consistency against general alternatives, and since they show a very promising performance in simulations, we devote this work to the question to what extent the fixed point property and the class of goodness-of-fit procedures may be generalized to other distributions.

Naturally, interest in applying Stein's method to other distributions has already grown and delivered some corresponding results. Characterizations like (1.1) have been established en masse [for an overview on characterizing Stein operators and further references, we recommend the work by Ley et al. (2017)]. Charles Stein himself presented some ideas fundamental to the so-called density approach [see Stein (1986), Chapter VI, and Stein et al. (2004), Section 5] which we shall use as the basis of our considerations. Related results for the special case of exponential families were already given by Hudson (1978) and Prakasa Rao (1979). Another approach pioneered by Barbour (1990) [see also Götze (1991)] includes working with the generator of the semi-group of operators corresponding to a Markov process whose stationary distribution is the one in consideration. A third advance is based on fixed point properties of probability transformations like the zero-bias transformation. Very general distributional transformations were introduced by Goldstein and Reinert (2005) and refined by Döbler (2017).
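To make the blueprint above concrete, the following minimal sketch (ours, in Python; not part of the original paper) evaluates an empirical version of $G_n$ for the normality hypothesis: $\widehat{T}^X_n$ and $\widehat{F}_n$ are computed from the standardized sample and the integral is approximated on a grid. The Gaussian weight $w$ and all numerical choices are purely illustrative assumptions, not the specification used by Betsch and Ebner (2019b).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)                       # data to be tested for normality
y = (x - x.mean()) / x.std()                   # standardized sample
n = y.size

def T_hat(t):
    # empirical version of T^X(t) = E[ X (X - t) 1{X <= t} ]
    return np.mean(y * (y - t) * (y <= t))

def F_hat(t):
    # empirical distribution function
    return np.mean(y <= t)

ts = np.linspace(-8.0, 8.0, 2001)
w = np.exp(-ts**2 / 2) / np.sqrt(2 * np.pi)    # illustrative weight function
diff2 = np.array([(T_hat(t) - F_hat(t))**2 for t in ts])
G_n = n * np.trapz(diff2 * w, ts)
print(round(G_n, 4))   # small for normal data, systematically larger under alternatives
```

Under the hypothesis the two empirical curves nearly coincide, so $G_n$ stays close to zero; critical values would in practice come from the (bootstrap or limit) null distribution.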
In the latter contribution, the transformations, and with them the explicit formulae, rely heavily on sign changes of the so-called biasing functions. These sign changes, in turn, depend on the parameters of the distribution in consideration, which renders the explicit representations impractical for the use in goodness-of-fit testing.

The starting point of the present paper is the density approach identity. Here, a result more general than (1.1) is provided by showing that, for suitable density functions $p$, a given real-valued random variable $X$ has density $p$ if, and only if,
\[ \mathbb{E}\left[ f'(X) + \frac{p'(X)}{p(X)} \, f(X) \right] = 0 \qquad (1.5) \]
holds for a sufficiently large class of test functions. We provide fixed point characterizations like (1.4) by using the analogy between (1.5) and (1.1) to extrapolate the explicit formula (1.3) of the zero-bias transformation to other distributions. Using this approach, these transformations will no longer be probability transformations, but we maintain the characterizing identity, which suffices for the use in goodness-of-fit testing. Our confidence in the approach is manifested by the fact that it has already been implemented by Betsch and Ebner (2019a) for the special case of the Gamma distribution.

With our results we contribute to the growing amount of applications of Stein's (or the Stein-Chen) method and his characterization in the realm of statistics. Much has been done in the area of stochastic modeling, which often includes statistical methods. For instance, Fang (2014) and Reinert and Röllin (2010) [see also Barbour (1982) and Barbour et al. (1989)] tackle counting problems in the context of random graphs with Stein's method. The technique led to further insights in time series and mean field analysis, cf. Kim (2000) and Ying (2017). Braverman and Dai (2017) and Braverman et al. (2016) developed Stein's method for diffusion approximation which is used as a tool for performance analysis in the theory of queues. As for statistical research that is more relatable to our pursuits, quite a bit is known when it comes to normal approximation for maximum likelihood estimators, investigated, for instance, by Anastasiou (2018), Anastasiou and Gaunt (2019), Anastasiou and Reinert (2017), and Pinelis (2017), to name but a few contributions. Moreover, Gaunt et al. (2017) consider chi-square approximation to study Pearson's statistic which is used for goodness-of-fit testing in classification problems. Also note that Anastasiou and Reinert (2018) apply the results of Gaunt et al. (2017) to obtain bounds to the chi-square distribution for twice the log-likelihood ratio, the statistic used for the classical likelihood ratio test. Finally, the contributions by Chwialkowski et al. (2016) and Liu et al. (2016) aim at the goal we also pursue in Section 7, namely to apply Steinian characterizations to construct goodness-of-fit tests for probability distributions.

The paper at hand is organized as follows. We first introduce an appropriate setting for our considerations by stating the conditions for a density function to fit into our framework and prove identity (1.5) in this specific setting. We then give our characterization results, distinguishing between distributions supported by the whole real line, those with semi-bounded support, and distributions with bounded support.
Throughout, we give examples of density functions of different nature to show that our conditions are not restrictive, as well as to provide connections to characterizations that are already known and included in our statements. Next, we consider applications in goodness-of-fit testing and show that the proposed tests include the classical Kolmogorov-Smirnov and Cramér-von Mises procedures, as well as three modern tests considered in the literature. To illustrate the methods in the last part, we construct the first ever goodness-of-fit test specifically for the two-parameter Burr Type XII distribution, and show in a simulation study that the test is sound and powerful compared to classical procedures.

2 The setting

Throughout, let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and $p$ a non-negative density function supported by an interval $\mathrm{spt}(p) = [L, R]$, where $-\infty \le L < R \le \infty$, and with $\int_L^R p(x) \, \mathrm{d}x = 1$. Denoting by $P$ the distribution function associated with $p$, we state the following regularity conditions:

(C1) The function $p$ is continuous and positive on $(L, R)$, and there are $L < y_1 < \ldots < y_m < R$ such that $p$ is continuously differentiable on $(L, y_1)$, $(y_\ell, y_{\ell+1})$, $\ell \in \{1, \ldots, m - 1\}$, and $(y_m, R)$. Whenever (C1) holds, we write $S(p) = (L, R) \setminus \{y_1, \ldots, y_m\}$.

(C2) For the map $S(p) \ni x \mapsto \kappa_p(x) = \bigl| \frac{p'(x) \min\{P(x), \, 1 - P(x)\}}{p^2(x)} \bigr|$ we have $\sup_{x \in S(p)} \kappa_p(x) < \infty$.

(C3) $\int_{S(p)} (1 + |x|) \, |p'(x)| \, \mathrm{d}x < \infty$.

(C4) $\lim_{x \searrow L} \frac{P(x)}{p(x)} = 0$.

(C5) $\lim_{x \nearrow R} \frac{1 - P(x)}{p(x)} = 0$.

The integral $\int_{S(p)}$ is understood as the sum of the integrals over the interval components in (C1).
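As a quick numerical illustration (ours, not part of the paper), the ratios appearing in (C2) and (C4) can be evaluated on a grid for a concrete density; a finite grid can of course only give a sanity check, not a proof. The standard exponential density below is an arbitrary illustrative choice, and (C5) is not inspected since it only enters for supports bounded from above.

```python
import numpy as np

lam = 1.0
x = np.linspace(1e-6, 30.0, 200_000)        # grid on the support (0, infinity)

p  = lam * np.exp(-lam * x)                  # density of Exp(lam)
dp = -lam * p                                # derivative p'(x)
P  = 1.0 - np.exp(-lam * x)                  # distribution function P(x)

kappa = np.abs(dp * np.minimum(P, 1.0 - P) / p**2)    # the map from (C2)
print("max of kappa_p on the grid:", kappa.max())      # stays below 1, consistent with (C2)
print("P(x)/p(x) near x = 0:", (P / p)[:3])            # tends to 0, consistent with (C4)
```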
For a probability density function $p$ that satisfies (C1), and a function $f : (L, R) \to \mathbb{R}$ which is differentiable on $S(p)$ except in one point, we denote the point of non-differentiability in $S(p)$ by $t_f$, and set $S(p, f) = S(p) \setminus \{t_f\} = (L, R) \setminus \{y_1, \ldots, y_m, t_f\}$. We index the elements of $(L, R) \setminus S(p, f)$ by $y_1^f < \ldots < y_{m+1}^f$.

Definition 2.1 (Test functions). For a probability density function $p$ with $\mathrm{spt}(p) = [L, R]$ that satisfies (C1), we denote by $\mathcal{F}_p$ the set of all functions $f : (L, R) \to \mathbb{R}$ that are continuous on $(L, R)$ and differentiable on $S(p)$ except in (precisely) one point, that satisfy
\[ \lim_{x \searrow L} f(x) \, p(x) = \lim_{x \nearrow R} f(x) \, p(x) = 0, \]
and for which $x \mapsto \frac{p'(x)}{p(x)} f(x)$ and $x \mapsto f'(x)$ are bounded on $S(p, f)$.

We write $\mathfrak{L}$ for the Borel-Lebesgue measure on the real line, and $X \sim p\,\mathfrak{L}$ whenever a random variable $X$ has Lebesgue density $p$, and we define
\[ \mathrm{disc}(X) = \bigl\{ t \in (L, R) \,\big|\, \mathbb{P}(X = t) > 0 \bigr\}, \]
the set of all atoms of a random variable $X$, containing at most countably many points.

3 The density approach identity

In this section, we restate the density approach identity. Since we use a very particular class of test functions which bears some technicalities, we give an outline of the proof in Appendix A.1, roughly following Ley and Swan (2013b). We refer to Section II of Ley and Swan (2013a) for a discrete version of the density approach identity, and mention Ley and Swan (2016) for related statements in the context of parametric distributions.
Lemma 3.1. If $p$ is a probability density function with $\mathrm{spt}(p) = [L, R]$ that satisfies (C1) and (C2), and if $X : \Omega \to (L, R)$ is an arbitrary random variable with $\mathbb{P}\bigl(X \in S(p)\bigr) = 1$, then $X \sim p\,\mathfrak{L}$ if, and only if,
\[ \mathbb{E}\left[ f'(X) + \frac{p'(X)}{p(X)} \, f(X) \right] = 0 \]
for each $f \in \mathcal{F}_p$ with $t_f \notin \mathrm{disc}(X)$.

Remark 3.2. Note that some contributions to the scientific literature [like Ley and Swan (2011), Ley and Swan (2013b), Betsch and Ebner (2019a)] claim that the function
\[ (L, R) \ni x \mapsto \int_L^x \Bigl( \mathbf{1}_{(L, t]}(s) - P(t) \Bigr) p(s) \, \mathrm{d}s \]
is differentiable when, in fact, it fails to be so in exactly one point, namely in $t$. This leads to the unfortunate consequence that we cannot assume functions in $\mathcal{F}_p$ to be differentiable, and if the random variable $X$ is discrete with an atom at the point of non-differentiability of a test function, the expectation in Lemma 3.1 makes no sense. As such, the error has no consequence for Ley and Swan (2013b), since they only consider absolutely continuous random variables for which $\mathrm{disc}(X) = \emptyset$. For the general case, the restriction to test functions with $t_f \notin \mathrm{disc}(X)$ becomes necessary.

Remark 3.3. Since $(f^p_t)'(x) + \frac{p'(x)}{p(x)} f^p_t(x)$ is uniformly bounded over $x \in S(p, f^p_t)$ by equation (A.1), we can conclude that, for each $f \in \mathcal{F}_p$, the function $f' + \frac{p'}{p} f$ is integrable with respect to any probability measure $\mathbb{P}^X$ such that $X \in S(p, f)$ $\mathbb{P}$-almost surely. Note that conditions comparable to our assumptions (C1)–(C5) are commonly stated in the context of Stein's method [see, e.g., Section 13 by Chen et al. (2011), Section 4 by Chatterjee and Shao (2011), or Döbler (2015)]. See also Remark 5.7 for further comments on the regularity conditions. It is easy to adapt the proof of Lemma 3.1 so that we can also allow for finitely many points in which the density function is zero [by changing condition (C1) accordingly]. However, for our characterization results later on we need the continuity of the functions in $\mathcal{F}_p$ on the whole interval $(L, R)$, and this we cannot get from $f^p_t$ when we allow for zeros in the function $p$.

Remark 3.4. For later use we note that if (C4) holds, any function $f \in \mathcal{F}_p$ is subject to $\lim_{x \searrow L} f(x) = 0$, since $f^p_t$ from the proof of Lemma 3.1 satisfies
\[ \lim_{x \searrow L} f^p_t(x) = \lim_{x \searrow L} \frac{P(x)}{p(x)} \bigl( 1 - P(t) \bigr) = 0. \]
By analogy, if (C5) holds, each function $f \in \mathcal{F}_p$ can be taken to satisfy $\lim_{x \nearrow R} f(x) = 0$.

In a different form, the characterization given in Lemma 3.1 has successfully been applied for distributional approximations in the Curie-Weiss model [see Chatterjee and Shao (2011)] or the hitting times of Markov chains [see Peköz and Röllin (2011)]. For an overview, we refer to Section 13 in Chen et al. (2011). In this paper, however, we use the characterization to derive another, more explicit identity that typifies distributions with density functions as above. We thereby generalize the fixed point properties of the well-known zero-bias and equilibrium transformations, but also classical identities, such as the characterization of the exponential distribution through the mean residual life function.

4 Univariate distributions supported by the real line
Assume for now that $p : \mathbb{R} \to [0, \infty)$ is a probability density function supported by the whole real line.

Theorem 4.1. Suppose that $p$ is a probability density function with $\mathrm{spt}(p) = \mathbb{R}$ that satisfies the conditions (C1)–(C3). Let $X : \Omega \to \mathbb{R}$ be a random variable with $\mathbb{P}\bigl(X \in S(p)\bigr) = 1$, and
\[ \mathbb{E}\left| \frac{p'(X)}{p(X)} \right| < \infty, \qquad \mathbb{E}\left| \frac{p'(X)}{p(X)} \, X \right| < \infty. \qquad (4.1) \]
Then $X \sim p\,\mathfrak{L}$ if, and only if, the distribution function of $X$ has the form
\[ F_X(t) = \mathbb{E}\left[ \frac{p'(X)}{p(X)} \, (t - X) \, \mathbf{1}\{X \le t\} \right], \quad t \in \mathbb{R}. \]
The proof is given in Appendix A.2.

Remark 4.2. For a density function $p$ supported by the whole real line which satisfies conditions (C1)–(C3), take the set of all distributions considered in Theorem 4.1, that is,
\[ \mathcal{P} = \left\{ \mathbb{P}^X \,\middle|\, \mathbb{P}\bigl(X \in S(p)\bigr) = 1, \ \mathbb{E}\left| \frac{p'(X)}{p(X)} \right| < \infty, \ \text{and} \ \mathbb{E}\left| \frac{p'(X)}{p(X)} \, X \right| < \infty \right\}. \]
The previous theorem concerns properties of the mapping
\[ T : \mathcal{P} \to D(\mathbb{R}), \quad F_X \mapsto T(F_X) = \left( t \mapsto \mathbb{E}\left[ \frac{p'(X)}{p(X)} \, (t - X) \, \mathbf{1}\{X \le t\} \right] \right), \]
where $D(\mathbb{R})$ is the càdlàg space over $\mathbb{R}$, and where we identified elements from $\mathcal{P}$ with their distribution function. In particular, Theorem 4.1 states that this mapping has a unique fixed point, namely $\mathbb{P}^X = p\,\mathfrak{L}$. Putting further restrictions on the distribution of $X$ such that $d^X_p$ from the proof of Theorem 4.1 (see Appendix A.2) is a probability density function, without assuming that $F_X$ is given through our explicit formula, we have actually shown in the last calculation of that proof the existence of a distribution for some random variable $X^p$ with
\[ \mathbb{E}\bigl[ f'(X^p) \bigr] = \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, f(X) \right] \]
for each $f \in \mathcal{F}_p$, and we could think of $T$ as a distributional transformation. These additional restrictions [for the normal distribution they are $\mathbb{E}X = 0$ and $\mathrm{Var}(X) = \sigma^2$, see Example 4.4 below] scale down the class of distributions in which the characterization holds. Therefore, our point is not to cling on to distributional transformations, which makes explicit formulae more complicated [as witnessed by Döbler (2017), Remark 1 (d) and Remark 2], but to extract whichever information we can get from the explicit formula itself.

In the proof of Theorem 4.1 we have actually also shown another characterization result, but via the density function.

Corollary 4.3. Let $p$ be a probability density function with $\mathrm{spt}(p) = \mathbb{R}$ that satisfies conditions (C1)–(C3). When $X : \Omega \to \mathbb{R}$ is a random variable with density function $f_X$, $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} \bigr| < \infty$, and $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} X \bigr| < \infty$, then $X \sim p\,\mathfrak{L}$ if, and only if, the density function of $X$ has the form
\[ f_X(t) = \mathbb{E}\left[ \frac{p'(X)}{p(X)} \, \mathbf{1}\{X \le t\} \right], \quad t \in \mathbb{R}. \]
It is clear from the proof that it suffices to have the above representation for the density function of $X$ only for $\mathfrak{L}$-almost every (a.e.) $t \in \mathbb{R}$ to conclude that $X \sim p\,\mathfrak{L}$. This is much in line with the intuition about density functions, since they uniquely determine a probability law, but are themselves only unique $\mathfrak{L}$-almost everywhere.

To get a feeling for the results, we consider two examples. For brevity we only give the characterization via Theorem 4.1, the result via Corollary 4.3 being clear from that.

Example 4.4 (Mean-zero Gaussian distribution). For $x \in \mathbb{R}$ let
\[ p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( - \frac{x^2}{2 \sigma^2} \right), \]
where $0 < \sigma^2 < \infty$. The function $p$ is positive and continuously differentiable on the whole real line, so (C1) is satisfied [with $m = 0$ and $S(p) = \mathbb{R}$]. We have $\frac{p'(x)}{p(x)} = - \frac{x}{\sigma^2}$, $x \in \mathbb{R}$. Condition (C3) follows from the existence of mean and variance of the normal distribution, and (C2) is proven using the (easily verified) identities
\[ \frac{1 - P(x)}{p(x)} \le \frac{\sigma^2}{x}, \ x > 0, \qquad \text{and} \qquad \frac{P(x)}{p(x)} = \frac{1 - P(-x)}{p(-x)} \le - \frac{\sigma^2}{x}, \ x < 0. \]
By Theorem 4.1, a real-valued random variable $X$ with $\mathbb{E}X^2 < \infty$ follows the mean-zero Gaussian law with variance $\sigma^2$ if, and only if, the distribution function of $X$ has the form
\[ F_X(t) = \mathbb{E}\left[ \frac{X}{\sigma^2} \, (X - t) \, \mathbf{1}\{X \le t\} \right], \quad t \in \mathbb{R}. \]
In this particular example, the map $T$ introduced in Remark 4.2 is, up to a change of the domain, the zero-bias transformation discussed in the introduction. The transformation $\mathbb{P}^X \mapsto \mathbb{P}^{X^*}$ (using notation from our introduction), which coincides with our mapping $T$ in terms of the law of the maps, has the normal distribution $\mathcal{N}(0, \sigma^2)$ as its unique fixed point and thus typifies this distribution within all distributions with mean zero and variance $\sigma^2$. The message of the example at hand is that our characterization result (Theorem 4.1) has the characterization via the zero-bias distribution as a special case. It is notable that we generalize this well-known characterization in the sense that the explicit formula given above identifies the normal distribution $\mathcal{N}(0, \sigma^2)$ not only within the class of all distributions with mean zero and variance $\sigma^2$, but within the class of all distributions with $\mathbb{E}X^2 < \infty$. However, if $\mathbb{E}X \neq 0$ or $\mathrm{Var}(X) \neq \sigma^2$, the formula for $F_{X^*}$ may no longer be a distribution function, and $T$ is to be understood as an extension of the operator that maps $\mathbb{P}^X \mapsto \mathbb{P}^{X^*}$ onto the larger domain
\[ \mathcal{P} = \bigl\{ \mathbb{P}^X \,\big|\, \mathbb{E}X^2 < \infty \bigr\} \supsetneq \bigl\{ \mathbb{P}^X \,\big|\, \mathbb{E}X = 0 \ \text{and} \ \mathrm{Var}(X) = \sigma^2 \bigr\}. \]
The conditions (C1)–(C3) also hold for the normal distribution with location parameter included. We simply chose the setting above to illustrate the connection to the zero-bias distribution.
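A simple Monte Carlo experiment (ours, for illustration only) shows the fixed point property of Example 4.4 at work: the characterizing expectation reproduces the empirical distribution function for Gaussian data and deviates for a non-Gaussian sample with the same variance. The parameter values and the Laplace alternative are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2 = 2.0

def rhs(sample, t):
    # empirical version of E[ (X / sigma^2) (X - t) 1{X <= t} ]
    return np.mean(sample / sigma2 * (sample - t) * (sample <= t))

x_norm = rng.normal(0.0, np.sqrt(sigma2), size=100_000)        # H0: N(0, sigma^2)
x_alt  = rng.laplace(0.0, np.sqrt(sigma2 / 2), size=100_000)   # same variance, not normal

for t in (-2.0, 0.0, 1.0, 2.5):
    print(f"t={t:+.1f}  normal: ecdf={np.mean(x_norm <= t):.3f} rhs={rhs(x_norm, t):.3f}"
          f"   laplace: ecdf={np.mean(x_alt <= t):.3f} rhs={rhs(x_alt, t):.3f}")
```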
Example 4.5 (Laplace distribution). For a location parameter $\mu \in \mathbb{R}$ and a scale parameter $\sigma > 0$ let
\[ p(x) = \frac{1}{2\sigma} \exp\left( - \frac{|x - \mu|}{\sigma} \right), \quad x \in \mathbb{R}. \]
Condition (C1) is satisfied with $m = 1$, $y_1 = \mu$, and $S(p) = (-\infty, \mu) \cup (\mu, \infty)$. We have
\[ \frac{p'(x)}{p(x)} = \frac{\mathrm{sign}(\mu - x)}{\sigma}, \quad x \neq \mu. \]
To verify (C2), use that the distribution function of the Laplace distribution can be given explicitly to obtain $\sup_{x \in S(p)} \kappa_p(x) \le 1 < \infty$. Condition (C3) follows from a simple calculation. Consequently, Theorem 4.1 holds, and the characterization for the Laplace distribution reads as follows. A real-valued random variable $X$ with distribution function $F_X$, which satisfies $\mathbb{P}(X = \mu) = 0$ and $\mathbb{E}|X| < \infty$, has the Laplace distribution with parameters $\mu$ and $\sigma$ if, and only if,
\[ F_X(t) = \mathbb{E}\left[ \frac{\mathrm{sign}(\mu - X)}{\sigma} \, (t - X) \, \mathbf{1}\{X \le t\} \right], \quad t \in \mathbb{R}. \]
In the context of probability distributions on the real line, we have also checked the conditions (C1)–(C3) for the Cauchy and Gumbel distributions, showing that we do not need any moment assumptions to prove (C3), and that the characterizations include more complicated distributions which are important in applications. We will give more examples later on.
5 Distributions with semi-bounded support

In this section, we seek to provide characterization results similar to those in the previous section, but for probability distributions with semi-bounded support. We have chosen in Section 4 to first prove the characterization via the distribution function since this is the 'conventional' way, or at least, say, the way the special case of the zero-bias transformation is known. From a logical perspective it is more convenient to first establish the result via the density function as in Corollary 4.3, and then to derive the corresponding distribution function. We first discuss the case when $p$ is a density function whose support is bounded from below. Namely, we let $p : \mathbb{R} \to [0, \infty)$ be a probability density function with $\mathrm{spt}(p) = [L, \infty)$, $L > -\infty$. The most important case is $L = 0$, that is, density functions supported by the positive half line.

Theorem 5.1. Let $p$ be a probability density function with $\mathrm{spt}(p) = [L, \infty)$ that satisfies the conditions (C1)–(C4). If $X : \Omega \to (L, \infty)$ is a random variable with density function $f_X$, $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} \bigr| < \infty$, and $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} X \bigr| < \infty$, then $X \sim p\,\mathfrak{L}$ if, and only if,
\[ f_X(t) = \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, \mathbf{1}\{X > t\} \right], \quad t > L. \]
The proof of this theorem consists of arguments and calculations that are very similar to those in the proof of Theorem 4.1, and we refrain from giving the details. Instead we give some insight on the special case of density functions on the positive axis.

Remark 5.2. The integrability condition on $X$ can be weakened in cases where the density function $p$ is positive and continuously differentiable, as well as supported by the positive axis, that means especially, $m = 0$ and $S(p) = (0, \infty)$. In this case, the calculation in the sufficiency part of the proof of Theorem 5.1 reduces to
\begin{align*}
\mathbb{E}\bigl[ f'(X) \bigr]
&= \int_0^{t_f} f'(s) \, \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, \mathbf{1}\{X > s\} \right] \mathrm{d}s + \int_{t_f}^{\infty} f'(s) \, \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, \mathbf{1}\{X > s\} \right] \mathrm{d}s \\
&= \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \int_0^X f'(s) \, \mathrm{d}s \, \mathbf{1}\{X \le t_f\} \right] + \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \int_0^{t_f} f'(s) \, \mathrm{d}s \, \mathbf{1}\{X > t_f\} \right] \\
&\quad + \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \int_{t_f}^X f'(s) \, \mathrm{d}s \, \mathbf{1}\{X > t_f\} \right] \\
&= \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, f(X) \right],
\end{align*}
and it suffices for the use of Fubini's theorem to know that $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} X \bigr| < \infty$ (note that this condition on $X$ is also enough to guarantee that the expectation which defines $d^X_p$ exists $\mathfrak{L}$-a.e., see Appendix A.3). Consequently, it suffices to claim $\int_0^\infty x \, |p'(x)| \, \mathrm{d}x < \infty$ instead of (C3), and still have the sufficiency part of the theorem be consistent in itself. What is more, this last condition yields
\[ \int_0^\infty \int_t^\infty |p'(x)| \, \mathrm{d}x \, \mathrm{d}t = \int_0^\infty |p'(x)| \int_0^x \mathrm{d}t \, \mathrm{d}x = \int_0^\infty x \, |p'(x)| \, \mathrm{d}x < \infty, \]
and thus $\int_t^\infty |p'(x)| \, \mathrm{d}x < \infty$ for $\mathfrak{L}$-a.e. $t > 0$. This suffices to derive the necessity part of Theorem 5.1 with equality for $\mathfrak{L}$-a.e. $t > 0$. Putting together these thoughts, we obtain the following special case.
Corollary 5.3 (Densities supported by the positive axis). Assume that $p$ is a probability density function with $\mathrm{spt}(p) = [0, \infty)$ that is positive and continuously differentiable on $(0, \infty)$, and satisfies (C2) and (C4). Moreover, assume that $\int_0^\infty x \, |p'(x)| \, \mathrm{d}x < \infty$. Let $X$ be a positive random variable with density function $f_X$, and $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} X \bigr| < \infty$. Then $X \sim p\,\mathfrak{L}$ if, and only if, we have for $\mathfrak{L}$-a.e. $t > 0$ that
\[ f_X(t) = \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, \mathbf{1}\{X > t\} \right]. \]
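The right-hand side of Corollary 5.3 is easy to evaluate from a sample, which gives a quick numerical sanity check (ours, not from the paper): if the sample really comes from $p$, the empirical expectation reproduces $p$ itself; for another law it does not reproduce that law's density. The exponential target and the log-normal alternative are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
lam = 1.0                                   # target density p(x) = lam * exp(-lam * x)

def d_hat(sample, t):
    # empirical version of E[ -p'(X)/p(X) * 1{X > t} ]; for Exp(lam), -p'/p = lam
    return np.mean(lam * (sample > t))

x_h0  = rng.exponential(1.0 / lam, size=200_000)              # sample from p
x_alt = rng.lognormal(mean=-0.5, sigma=1.0, size=200_000)      # also mean 1, but not Exp(1)

for t in (0.5, 1.0, 2.0):
    f_alt = np.exp(-(np.log(t) + 0.5) ** 2 / 2) / (t * np.sqrt(2 * np.pi))  # lognormal pdf
    print(f"t={t}: Exp pdf {lam*np.exp(-lam*t):.3f} vs d_hat(H0) {d_hat(x_h0, t):.3f} | "
          f"lognormal pdf {f_alt:.3f} vs d_hat(alt) {d_hat(x_alt, t):.3f}")
```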
Up next, we use Theorem 5.1 to derive a characterization result for the distribution function.

Theorem 5.4. Assume that $p$ is a probability density function with $\mathrm{spt}(p) = [L, \infty)$ satisfying the conditions (C1)–(C4). Let $X : \Omega \to (L, \infty)$ be a random variable with $\mathbb{P}\bigl(X \in S(p)\bigr) = 1$, $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} \bigr| < \infty$, and $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} X \bigr| < \infty$. Then $X \sim p\,\mathfrak{L}$ if, and only if,
\[ F_X(t) = \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, \bigl( \min\{X, t\} - L \bigr) \right], \quad t > L. \]
The proof is given in Appendix A.3. Note that the results on the distribution function are somewhat richer than the characterizations via the density function, for the latter only identify the underlying distribution within a subset of absolutely continuous probability distributions for which a density function exists. The characterization via the distribution function does not need this restriction to absolutely continuous distributions, but only that $X$ has no atoms in $(L, \infty) \setminus S(p) = \{y_1, \ldots, y_m\}$.

Remark 5.5. In the case where $L = 0$, and $p$ is continuously differentiable and positive on $(0, \infty)$, the proof of Theorem 5.4 remains true if we replace (C3) with $\int_0^\infty x \, |p'(x)| \, \mathrm{d}x < \infty$, and if we further drop the first integrability condition on $X$ and only require $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} X \bigr| < \infty$.

We obtain the following special case of the characterization.

Corollary 5.6 (Densities supported by the positive axis). Assume that $p$ is a probability density function with $\mathrm{spt}(p) = [0, \infty)$ that is positive and continuously differentiable on $(0, \infty)$, and satisfies (C2) and (C4). Moreover, assume that $\int_0^\infty x \, |p'(x)| \, \mathrm{d}x < \infty$. Let $X$ be a positive random variable with $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} X \bigr| < \infty$. Then $X \sim p\,\mathfrak{L}$ if, and only if,
\[ F_X(t) = \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, \min\{X, t\} \right], \quad t > 0. \]
Now follows the major source of examples we give in this work. We omit the explicit proofs of the regularity conditions for they consist of (sometimes) tedious calculations which provide no insight on the characterizations. Instead, we give the following remark on how the conditions are to be verified, and on their necessity in general.
Remark 5.7. The regularity condition (C1) is easily understood and checked for a given density function. Note that the weaker assumption of absolute continuity of $p$, which is mostly used in the context of Stein's method, entails similar problems as described in Remark 3.2: If $p$ is merely assumed to be absolutely continuous, then in order to handle random variables $X$ with discrete parts (e.g. in Lemma 3.1) we would still have to identify the points of non-differentiability of $p$ in order to make sense of the term $p'(X)$. This would return us to considering a set like $S(p)$ which, technically, brings us to the setting we consider already.

Condition (C3) involves a direct calculation which can often be simplified if one has knowledge of the existence of moments of the distribution at hand. From the proofs of our characterizations it is apparent that (C1) and (C3) [as well as the integrability conditions on $X$ which are in line with (C3)] are necessary to use Fubini's theorem and the fundamental theorem of calculus. As such, we do not see any truly instrumental weaker alternative conditions (apart from the special case discussed in Remarks 5.2 and 5.5) which still rigorously allow for all calculations.

Both conditions (C4) and (C5) are trivially satisfied when the respective limit of the density function is positive, and if that is not the case, L'Hospital's rule gives a reliable handle for it. With regard to these two conditions, we refer to Proposition 3.7 of Döbler (2015) who discusses them in much detail and provides easy-to-check criteria. Moreover, this specific result from Döbler (2015) indicates strongly that the two conditions are not restrictive in practice.

To prove condition (C2), it is helpful to realize, in the case when $p$ is continuously differentiable, that $\kappa_p$ is continuous. Thus, it suffices to check that $\limsup_{x \searrow L} \kappa_p(x) < \infty$ and $\limsup_{x \nearrow R} \kappa_p(x) < \infty$ for (C2) to hold. Regularity conditions (C2) and (C4)/(C5) guarantee certain beneficial properties of the test functions from $\mathcal{F}_p$. For one, they guarantee that, for $f \in \mathcal{F}_p$, $\lim_{x \searrow L} f(x) = 0$ (or $\lim_{x \nearrow R} f(x) = 0$), see Remark 3.4, which we need to truly get rid of the test functions in our calculations (as in Appendix A.4). Condition (C2) is stated so that functions $f \in \mathcal{F}_p$ have uniformly bounded derivative. We use this fact in our proofs (e.g., the last calculation in Appendix A.2) to apply the fundamental theorem of calculus to $f'$ and to justify the use of Fubini's theorem. For both arguments the boundedness of $f'$ is not a necessary condition, but we have not found any alternative assumption for (C2) which allows for a sound and rigorous derivation of all results.

Later on, we give an example for a distribution which fails the respective version of condition (C3) that ought to hold in order for that distribution to be included in our characterization results. For a (rather artificial) density function which violates (C4), see Example 3.6 of Döbler (2015).

With these tools at hand, the regularity conditions for all examples below can be proven. We use Corollary 5.6 in each case, except for the Lévy distribution. The characterizations via the density functions are not stated explicitly to save space.

Example 5.8 (Gamma distribution). Assume that
\[ p(x) = \frac{\lambda^{-k}}{\Gamma(k)} \, x^{k-1} \exp\bigl( - \lambda^{-1} x \bigr), \quad x > 0, \]
is the density function of the Gamma distribution with shape parameter $k > 0$ and scale parameter $\lambda > 0$. If $X$ is a positive random variable with $\mathbb{E}X < \infty$, then $X$ follows the Gamma law with parameters $k$ and $\lambda$ if, and only if, the distribution function of $X$ has the form
\[ F_X(t) = \mathbb{E}\left[ \left( - \frac{k - 1}{X} + \frac{1}{\lambda} \right) \min\{X, t\} \right], \quad t > 0. \]
Note that this result has been proven explicitly, and with a similar line of proof as our general results above, by Betsch and Ebner (2019a).
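A direct Monte Carlo check of the Gamma characterization (ours, illustrative parameter values and sample size) shows how closely the characterizing expectation tracks the empirical distribution function under the hypothesis; this identity is the one reused for the goodness-of-fit test in Section 7.

```python
import numpy as np

rng = np.random.default_rng(3)
k, lam = 2.5, 1.5                      # shape and scale of the hypothesized Gamma law
x = rng.gamma(shape=k, scale=lam, size=200_000)

for t in (1.0, 3.0, 6.0):
    lhs = np.mean(x <= t)                                          # F_X(t)
    rhs = np.mean((-(k - 1) / x + 1 / lam) * np.minimum(x, t))     # characterizing expectation
    print(f"t={t}: F_X(t)={lhs:.4f}  E[(1/lam-(k-1)/X) min(X,t)]={rhs:.4f}")
```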
Example 5.9 (Exponential distribution). Denote the density of the exponential distribution with rate parameter $\lambda > 0$ by $p(x) = \lambda e^{-\lambda x}$, $x > 0$. This is an easy special case of the previous example, namely the Gamma distribution with shape parameter $k = 1$ and scale parameter $1/\lambda$. Let $X$ be a positive random variable with $\mathbb{E}X < \infty$. Then $X$ has the exponential distribution with parameter $\lambda$ if, and only if,
\[ F_X(t) = \lambda \, \mathbb{E}\bigl[ \min\{X, t\} \bigr], \quad t > 0. \]
This identity is [see Baringhaus and Henze (2000)] equivalent to the well-known characterization of exponentiality via the mean residual life function, which states that a positive random variable $X$ with $\mathbb{E}X < \infty$ follows an exponential law if, and only if, $\mathbb{E}[X - t \mid X > t] = \mathbb{E}[X]$, $t > 0$.

For yet another observation, assume that $X$ is a positive random variable with $\mathbb{E}X = \lambda^{-1}$. With
\[ d^X_p(t) = \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, \mathbf{1}\{X > t\} \right] = \lambda \, \mathbb{P}\bigl( X > t \bigr), \quad t > 0, \]
as in the proofs of our results, we have $d^X_p \ge 0$ and
\[ \int_0^\infty d^X_p(t) \, \mathrm{d}t = \lambda \int_0^\infty \mathbb{P}\bigl( X > t \bigr) \, \mathrm{d}t = \lambda \, \mathbb{E}X = 1. \]
If $X^e$ is a random variable with density function $d^X_p$, the proof of Theorem 5.1 (see Remark 5.5) shows that $\mathbb{E}[f'(X^e)] = \lambda \, \mathbb{E}[f(X)]$ for each $f \in \mathcal{F}_p$. Up to a change in the class of test functions, this is the defining equation of the equilibrium distribution with respect to $X$. Lemma 3.1 implies that when restricting to $\mathbb{E}X = \lambda^{-1}$, the exponential distribution with parameter $\lambda$ is the unique fixed point of the equilibrium transformation $\mathbb{P}^X \mapsto \mathbb{P}^{X^e}$. This fact is used for approximation arguments with Stein's method [see Peköz and Röllin (2011), who introduced the equilibrium distribution, as well as Chapter 13.4 by Chen et al. (2011) and Section 5 by Ross (2011)]. As in the case of the zero-bias transformation, we have generalized this characterization in the sense that the explicit formula of the equilibrium distribution uniquely identifies the exponential distribution with parameter $\lambda$ within the class of all distributions $\mathbb{P}^X$ with $\mathbb{E}X < \infty$.

Example 5.10 (Inverse Gaussian distribution). Denote the inverse Gaussian density by
\[ p(x) = \sqrt{\frac{\lambda}{2\pi}} \, x^{-3/2} \exp\left( - \frac{\lambda (x - \mu)^2}{2 \mu^2 x} \right), \quad x > 0, \]
where $\mu, \lambda > 0$. If $X$ is a positive random variable with $\mathbb{E}\bigl[ X + X^{-1} \bigr] < \infty$, then $X$ follows the inverse Gaussian law with parameters $\mu$ and $\lambda$ if, and only if,
\[ F_X(t) = \mathbb{E}\left[ \left( - \frac{\lambda}{2 X^2} + \frac{3}{2 X} + \frac{\lambda}{2 \mu^2} \right) \min\{X, t\} \right], \quad t > 0. \]
Now we handle distributions that are of interest for applications. The Weibull distribution is applied in hydrology and wind speed analysis, see Singh (1987) and Carrillo et al. (2014), the Burr distribution is commonly taken as a model for household income, see Singh and Maddala (1976), and the Rice distribution appears in signal processing to describe how cancellation phenomena affect radio signals [cf. Chapter 13 of Proakis and Salehi (2008)]. The last example we give is the Lévy distribution which is used to model the length of paths that are followed by photons after reflection from a turbid media, see Section 3 of Rogers (2008). Here we provide insight on the handling of an additional location parameter which is often added to probability distributions.
Example 5.11 (Weibull distribution). For $k, \lambda > 0$ let
\[ p(x) = \frac{k}{\lambda^k} \, x^{k-1} \exp\left( - \left( \frac{x}{\lambda} \right)^{k} \right), \quad x > 0, \]
be the density function of the Weibull distribution in its usual parametrization. Let $X$ be any positive random variable with $\mathbb{E}X^k < \infty$. Then $X$ has the Weibull distribution with parameters $k$ and $\lambda$ if, and only if,
\[ F_X(t) = \mathbb{E}\left[ \left( \frac{k X^{k-1}}{\lambda^k} - \frac{k - 1}{X} \right) \min\{X, t\} \right], \quad t > 0. \]
Example 5.12 (Burr distribution). The Burr Type XII distribution with parameters $c, k > 0$ and $\sigma > 0$ has density
\[ p(x) = \frac{c k}{\sigma} \left( \frac{x}{\sigma} \right)^{c-1} \left( 1 + \left( \frac{x}{\sigma} \right)^{c} \right)^{-k-1}, \quad x > 0. \]
A positive random variable $X$ has the Burr distribution with parameters $c, k, \sigma > 0$ if, and only if, the distribution function of $X$ has the form
\[ F_X(t) = \mathbb{E}\left[ \left( \frac{c (k + 1) X^{c-1}}{\sigma^c + X^c} - \frac{c - 1}{X} \right) \min\{X, t\} \right], \quad t > 0. \]
Particularly interesting about this example is that, even though the Burr distribution is substantially more complicated than many of our other examples, no moment condition is needed for the characterization to hold, since
\[ \mathbb{E}\left| \frac{p'(X)}{p(X)} \, X \right| \le |c - 1| + c (k + 1) \, \mathbb{E}\left[ \frac{X^c}{\sigma^c + X^c} \right] \le |c - 1| + c (k + 1) < \infty. \]
This implies that the characterization is universal in the sense that it identifies the Burr distribution within the set of all probability distributions on the positive axis.
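Since the Burr Type XII law is the distribution for which the paper later builds its new test, a minimal Monte Carlo sanity check of the identity above may be helpful (ours; the parameter values, sample size and evaluation points are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(11)
c, k, sig = 3.0, 2.0, 1.5

# sample from Burr XII by inverse transform: F(x) = 1 - (1 + (x/sig)^c)^(-k)
u = rng.uniform(size=300_000)
x = sig * ((1 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)

def neg_score(x):
    # -p'(x)/p(x) for the Burr XII density
    return c * (k + 1) * x**(c - 1) / (sig**c + x**c) - (c - 1) / x

for t in (0.5, 1.5, 3.0):
    lhs = np.mean(x <= t)
    rhs = np.mean(neg_score(x) * np.minimum(x, t))
    print(f"t={t}: F_X(t)={lhs:.4f}  characterizing expectation={rhs:.4f}")
```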
Example 5.13 (Rice distribution). For parameters $k, \varrho > 0$ let
\[ p(x) = \frac{2 (k + 1) x}{\varrho} \exp\left( - k - \frac{(k + 1) x^2}{\varrho} \right) I_0\!\left( 2 \sqrt{\tfrac{k (k + 1)}{\varrho}} \, x \right), \quad x > 0, \]
where $I_\alpha$ denotes the modified Bessel function of the first kind of order $\alpha$. We chose the parametrization for $p$ that is mostly used in signal processing and is easily found under the keyword of Rician fading. Let $X$ be a positive random variable with $\mathbb{E}X < \infty$. Then $X$ has the Rice distribution with parameters $k$ and $\varrho$ if, and only if,
\[ F_X(t) = \mathbb{E}\left[ \left( - \frac{1}{X} + \frac{2 (k + 1) X}{\varrho} - 2 \sqrt{\tfrac{k (k + 1)}{\varrho}} \cdot \frac{I_1\!\bigl( 2 \sqrt{k (k + 1)/\varrho} \, X \bigr)}{I_0\!\bigl( 2 \sqrt{k (k + 1)/\varrho} \, X \bigr)} \right) \min\{X, t\} \right], \quad t > 0. \]
Note that despite the complexity of the term $\frac{p'(x)}{p(x)}$, the integrability condition is $\mathbb{E}X < \infty$, since the quotient of the Bessel functions cancels via $\frac{I_1(y)}{I_0(y)} \le 1$, $y > 0$.

Example 5.14 (Lévy distribution). Take $\mu \in \mathbb{R}$ and $\sigma > 0$. Let
\[ p(x) = \sqrt{\frac{\sigma}{2\pi}} \, (x - \mu)^{-3/2} \exp\left( - \frac{\sigma}{2 (x - \mu)} \right), \quad x > \mu, \]
denote the density function of the Lévy distribution with location parameter $\mu$ and scale parameter $\sigma$. Let $X$ be a random variable which takes values in $(\mu, \infty)$ almost surely such that $\mathbb{E}[(X - \mu)^{-1}] < \infty$ and $\mathbb{E}[(X - \mu)^{-2}] < \infty$. Then $X$ has the Lévy distribution with parameters $\mu$ and $\sigma$ if, and only if, the distribution function of $X$ has the form
\[ F_X(t) = \frac{1}{2} \, \mathbb{E}\left[ \left( \frac{3}{X - \mu} - \frac{\sigma}{(X - \mu)^2} \right) \bigl( \min\{X, t\} - \mu \bigr) \right], \quad t > \mu. \]
The following example is one which fails the regularity condition (C3). Recall that for distributions which are not supported by the positive axis, we need (C3) fully, that is, we cannot apply Remarks 5.2 or 5.5.
Example 5.15 (Shifted Gamma distribution). Assume that
\[ p(x) = \frac{\lambda^{-k}}{\Gamma(k)} \, (x - \mu)^{k-1} \exp\bigl( - \lambda^{-1} (x - \mu) \bigr), \quad x > \mu, \]
is the density function of the shifted Gamma distribution with shape parameter $k > 0$, scale parameter $\lambda > 0$, and location parameter $\mu \in \mathbb{R} \setminus \{0\}$. We have
\[ \frac{p'(x)}{p(x)} = \frac{k - 1}{x - \mu} - \frac{1}{\lambda}, \quad x > \mu. \]
Since $\mu \neq 0$, in order to establish our characterization result, we have to verify the conditions from Theorem 5.4 which includes (C3). However, for $k < 1$,
\[ \int_\mu^\infty |p'(x)| \, \mathrm{d}x \ \ge \ \int_\mu^\infty \frac{|k - 1|}{x - \mu} \, p(x) \, \mathrm{d}x - \frac{1}{\lambda} \int_\mu^\infty p(x) \, \mathrm{d}x \ = \ \frac{1 - k}{\lambda \, \Gamma(k)} \int_0^\infty z^{k-2} e^{-z} \, \mathrm{d}z - \frac{1}{\lambda} \ = \ \infty. \]

Next, we discuss the characterizations for probability distributions supported by the positive axis in the case of exponential families. More specifically, we focus on continuously differentiable density functions. Quite a few of the examples we already gave can be written as an exponential family, but we do not reconsider them and instead give a new example at the end of this part. Of course the arguments below could also be used to treat exponential families over the real line, using Theorem 4.1. In detail, we let $\Theta \subset \mathbb{R}^d$ be non-empty, and consider an exponential family (over the positive axis) in the natural parametrization given through
\[ p_\vartheta(x) = c(\vartheta) \, h(x) \exp\bigl( \vartheta^\top T(x) \bigr), \quad x > 0, \ \vartheta \in \Theta, \]
where $T = (T_1, \ldots, T_d)^\top : (0, \infty) \to \mathbb{R}^d$ and $h : (0, \infty) \to [0, \infty)$ are (Borel-) measurable functions, $\vartheta^\top$ is the transpose of a column vector $\vartheta$, and
\[ c(\vartheta) = \left( \int_0^\infty h(x) \exp\bigl( \vartheta^\top T(x) \bigr) \, \mathrm{d}x \right)^{-1}. \]
We choose $\Theta$ such that $0 < c(\vartheta) < \infty$ for each $\vartheta \in \Theta$. The exponential family is assumed to be strictly $d$-parametric, that is, we take the functions $1, T_1, \ldots, T_d$ to be linearly independent on the complement of every null set. The definition of exponential families, and insights on their properties, are provided by virtually any classical textbook on mathematical statistics.

We try to get an idea on how the conditions (C1)–(C4) can be handled for exponential families. Condition (C2) remains a little cryptic, meaning that it depends on the given example how it can be proven, and, at this point, we cannot give any improvement to what we discussed in Remark 5.7 concerning that condition.

(C1) Assume that $T$ and $h$ are continuously differentiable, and that $h$ is positive. Trivially, these assumptions cover (C1) for they assure that for each $\vartheta \in \Theta$, $p_\vartheta$ is continuously differentiable and positive on $(0, \infty)$. For $x > 0$,
\[ \frac{p_\vartheta'(x)}{p_\vartheta(x)} = \vartheta^\top T'(x) + \frac{h'(x)}{h(x)}, \]
where $T'(x) = \bigl( T_1'(x), \ldots, T_d'(x) \bigr)^\top$.

(C3) Using the weaker substitute for (C3) given in the Remarks 5.2 and 5.5, a sufficient condition for (C3) is derived as follows. Let $\vartheta \in \Theta$, and take $Z \sim p_\vartheta \, \mathfrak{L}$. Then
\[ \int_0^\infty x \, |p_\vartheta'(x)| \, \mathrm{d}x \ \le \ \bigl\| \vartheta \bigr\| \, \mathbb{E}\Bigl[ \bigl\| T'(Z) \bigr\| \, Z \Bigr] + \mathbb{E}\left[ \left| \frac{h'(Z)}{h(Z)} \right| Z \right]. \]
Therefore, it suffices to know that
\[ \mathbb{E}\left| \frac{h'(Z)}{h(Z)} \, Z \right| < \infty \quad \text{and} \quad \mathbb{E}\Bigl[ \bigl| T_j'(Z) \bigr| \, Z \Bigr] < \infty, \quad j = 1, \ldots, d. \qquad (5.1) \]
Since $T$ often consists of monomials $x^k$, $k \in \mathbb{Z}$, or of some logarithmic term $\log(x)$, (5.1) frequently reduces to a moment constraint which is satisfied if the expectation of $T(Z)$ exists.

(C4) Note that $P_\vartheta$, the distribution function corresponding to $p_\vartheta$, satisfies $\lim_{x \searrow 0} P_\vartheta(x) = 0$, so (C4) holds trivially whenever $\lim_{x \searrow 0} p_\vartheta(x) > 0$. If $\lim_{x \searrow 0} p_\vartheta(x) = 0$, a sufficient condition for (C4) is that
\[ \lim_{x \searrow 0} \left( \vartheta^\top T'(x) + \frac{h'(x)}{h(x)} \right) = \infty. \]
We now give the characterization result that follows from Corollary 5.6. Corollary 5.3 yields a similar result via the density function, but we will not restate it explicitly.
Corollary 5.16.
Let $\{ p_\vartheta \mid \vartheta \in \Theta \}$ be an exponential family as above. Assume that each $p_\vartheta$ is continuously differentiable and positive, and satisfies (C2)–(C4). Let $X$ be a positive random variable with
\[ \mathbb{E}\left[ \left( \bigl\| T'(X) \bigr\| + \left| \frac{h'(X)}{h(X)} \right| \right) X \right] < \infty. \]
Then $X \sim p_\vartheta \, \mathfrak{L}$ if, and only if, the distribution function of $X$ has the form
\[ F_X(t) = \mathbb{E}\left[ - \left( \vartheta^\top T'(X) + \frac{h'(X)}{h(X)} \right) \min\{X, t\} \right], \quad t > 0. \]
Example 5.17 (Log-normal distribution). For parameters $\mu \in \mathbb{R}$ and $\sigma > 0$ let
\[ p(x) = \frac{1}{x \sqrt{2\pi} \, \sigma} \exp\left( - \frac{\bigl( \log(x) - \mu \bigr)^2}{2 \sigma^2} \right) = \frac{\sqrt{-2 \vartheta_2} \, \exp\bigl( \tfrac{\vartheta_1^2}{4 \vartheta_2} \bigr)}{\sqrt{2\pi} \, x} \exp\Bigl( \vartheta_1 \log(x) + \vartheta_2 \log^2(x) \Bigr), \quad x > 0, \]
where $\vartheta = ( \vartheta_1, \vartheta_2 )^\top = \bigl( \tfrac{\mu}{\sigma^2}, \, - \tfrac{1}{2 \sigma^2} \bigr)^\top$. In the last representation we see that the class of log-normal distributions forms an exponential family with parameter space $\Theta = \mathbb{R} \times (-\infty, 0)$, $h(x) = \tfrac{1}{\sqrt{2\pi}\, x}$, $T(x) = \bigl( \log(x), \log^2(x) \bigr)^\top$, and $c(\vartheta) = \sqrt{-2 \vartheta_2} \, \exp\bigl( \tfrac{\vartheta_1^2}{4 \vartheta_2} \bigr)$, where $c(\vartheta) \in (0, \infty)$ for every $\vartheta \in \Theta$. In this whole example we suppress the index $\vartheta$ for $p$. All of the following arguments are valid for any fixed (but arbitrary) $\vartheta \in \Theta$.

The density function $p$ is continuously differentiable since $h$ and $T$ are such, and it is positive as $h$ is so. For $x > 0$,
\[ \frac{p'(x)}{p(x)} = \vartheta^\top T'(x) + \frac{h'(x)}{h(x)} = \frac{\bigl( \mu - \sigma^2 \bigr) - \log(x)}{\sigma^2 \, x}. \]
For the log-normal density function we have $\lim_{x \searrow 0} p(x) = 0$, as well as
\[ \lim_{x \searrow 0} \left( \vartheta^\top T'(x) + \frac{h'(x)}{h(x)} \right) = \lim_{x \searrow 0} \frac{\bigl( \mu - \sigma^2 \bigr) - \log(x)}{\sigma^2 \, x} = \infty, \]
and the discussion of (C4) yields that this condition holds. In order to establish (C3), let $Z \sim p \, \mathfrak{L}$. Then $\log(Z)$ is Gaussian with mean $\mu$ and variance $\sigma^2$, and the expectation of $\log(Z)$ exists, that is, $\mathbb{E}|\log(Z)| < \infty$. Therefore, we have
\[ \mathbb{E}\left| \frac{h'(Z)}{h(Z)} \, Z \right| = 1 < \infty, \quad \mathbb{E}\bigl| T_1'(Z) \, Z \bigr| = 1 < \infty, \quad \text{and} \quad \mathbb{E}\bigl| T_2'(Z) \, Z \bigr| = 2 \, \mathbb{E}\bigl| \log(Z) \bigr| < \infty, \]
which suffices for (C3) by the discussions above. The proof of (C2) is a bit tedious and follows Remark 5.7. As it provides no insight on that regularity condition, we omit it here. The characterization result for the log-normal distribution as given in Corollary 5.16 is as follows. If $X$ is a positive random variable with $\mathbb{E}|\log(X)| < \infty$, then $X$ follows the log-normal law with parameters $\mu \in \mathbb{R}$ and $\sigma > 0$ if, and only if, the distribution function of $X$ has the form
\[ F_X(t) = \mathbb{E}\left[ - \frac{(\mu - \sigma^2) - \log(X)}{\sigma^2 \, X} \, \min\{X, t\} \right], \quad t > 0. \]
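The algebra behind the log-normal example lends itself to a mechanical cross-check (ours, purely illustrative): differentiating the exponential-family representation symbolically and comparing it with the closed form derived above. The use of `sympy` is an assumption made only for this sketch.

```python
import sympy as sp

x, mu, sigma = sp.symbols('x mu sigma', positive=True)
theta1, theta2 = mu / sigma**2, -1 / (2 * sigma**2)

h = 1 / (sp.sqrt(2 * sp.pi) * x)                 # h(x)
T = (sp.log(x), sp.log(x)**2)                    # T(x) = (log x, log^2 x)

# score of the exponential family: theta^T T'(x) + h'(x)/h(x)
score = theta1 * sp.diff(T[0], x) + theta2 * sp.diff(T[1], x) + sp.diff(h, x) / h
closed_form = ((mu - sigma**2) - sp.log(x)) / (sigma**2 * x)

print(sp.simplify(score - closed_form))          # prints 0
```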
Finally, we state the characterization result for a probability density function $p$ with support bounded from above, $\mathrm{spt}(p) = (-\infty, R]$, $R < \infty$. We omit the proof since it is a collage of earlier proofs, and only state the result. A characterization via the density function also holds, and its form is immediately conceivable from the result we state below. Similar observations concerning integrability conditions carry over from the case of density functions with support bounded from below.

Corollary 5.18. Let $p$ be a probability density function with $\mathrm{spt}(p) = (-\infty, R]$ that satisfies the conditions (C1)–(C3) and (C5). Take $X : \Omega \to (-\infty, R)$ to be a random variable with $\mathbb{P}\bigl(X \in S(p)\bigr) = 1$, $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} \bigr| < \infty$, and $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} X \bigr| < \infty$. Then $X \sim p\,\mathfrak{L}$ if, and only if,
\[ 1 - F_X(t) = \mathbb{E}\left[ \frac{p'(X)}{p(X)} \, \bigl( R - \max\{X, t\} \bigr) \right], \quad t < R. \]

6 Distributions with bounded support

For the sake of completeness, we study probability density functions $p : \mathbb{R} \to [0, \infty)$ with bounded support, $\mathrm{spt}(p) = [L, R]$, where $L > -\infty$ and $R < \infty$. The proofs of our previous characterizations rely on the fact that $\lim_{x \to \pm\infty} p(x) = 0$. However, we can do more: The results can be extended to cases where the limit at one endpoint of the support merely exists. The techniques needed for the proofs of the statements in this section resemble the ones we have used so far, so we shorten the arguments. As in Section 5, we start with the characterizations via the density function before deriving further results from them. We divide the study into density functions for which the limit at the right endpoint of the support exists and such density functions for which the limit at the left endpoint exists.
Lemma 6.1. Let $p$ be a probability density function with $\mathrm{spt}(p) = [L, R]$ that satisfies conditions (C1)–(C5), and for which the limit $\lim_{x \nearrow R} p(x)$ exists. Take $X : \Omega \to (L, R)$ to be a random variable with density function $f_X$, and $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} \bigr| < \infty$. Then $X \sim p\,\mathfrak{L}$ if, and only if,
\[ f_X(t) = \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, \mathbf{1}\{X > t\} \right] + \lim_{x \nearrow R} p(x), \quad L < t < R. \]
The main ideas of the proof are summarized in Appendix A.4.

Remark 6.2. Note that condition (C3) is simply $\int_{S(p)} |p'(x)| \, \mathrm{d}x < \infty$ by the boundedness of the support. Also notice that
\[ \mathbb{E}\left| \frac{p'(X)}{p(X)} \, X \right| \le \max\bigl\{ |L|, |R| \bigr\} \, \mathbb{E}\left| \frac{p'(X)}{p(X)} \right|, \]
so we never have to state both integrability conditions on $X$.

Remark 6.3. By the argument of Remark 5.2, in the case of a continuously differentiable density with $L = 0$, we can replace the integrability condition on $X$ completely with $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} X \bigr| < \infty$, and substitute (C3) with $\int_0^R x \, |p'(x)| \, \mathrm{d}x < \infty$, which is weaker than $\int_0^R |p'(x)| \, \mathrm{d}x < \infty$. However, the equality
\[ f_X(t) = \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, \mathbf{1}\{X > t\} \right] + \lim_{x \nearrow R} p(x) \]
in Lemma 6.1 will then only hold for $\mathfrak{L}$-a.e. $0 < t < R$.

Complementary to Lemma 6.1 (and with a similar proof), we have the following result.

Lemma 6.4. Let $p$ be a probability density function with $\mathrm{spt}(p) = [L, R]$ that satisfies the conditions (C1)–(C5), and for which the limit $\lim_{x \searrow L} p(x)$ exists. Let $X : \Omega \to (L, R)$ be a random variable with density function $f_X$, and $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} \bigr| < \infty$. Then $X \sim p\,\mathfrak{L}$ if, and only if,
\[ f_X(t) = \mathbb{E}\left[ \frac{p'(X)}{p(X)} \, \mathbf{1}\{X \le t\} \right] + \lim_{x \searrow L} p(x), \quad L < t < R. \]
With obvious adaptations, Remark 6.3 also applies here (in the case $R = 0$). We now use the Lemmata 6.1 and 6.4 to derive the corresponding characterization results via the distribution function. We start again with the case of an existing limit at the right endpoint of the support.

Corollary 6.5. Let $p$ be a probability density function with $\mathrm{spt}(p) = [L, R]$ such that (C1)–(C5) are satisfied. Assume that the limit $\lim_{x \nearrow R} p(x)$ exists. Suppose that $X : \Omega \to (L, R)$ is a random variable with $\mathbb{P}\bigl(X \in S(p)\bigr) = 1$, and $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} \bigr| < \infty$. Then $X \sim p\,\mathfrak{L}$ if, and only if,
\[ F_X(t) = \mathbb{E}\left[ - \frac{p'(X)}{p(X)} \, \bigl( \min\{X, t\} - L \bigr) \right] + (t - L) \lim_{x \nearrow R} p(x), \quad L < t < R. \]
The proof runs along the lines of Theorem 5.4.
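The additional boundary term in Corollary 6.5 is easy to see numerically. The following sketch (ours; the exponential distribution truncated to $[0, 1]$ is an illustrative density not treated in the paper) checks the identity for a case in which $\lim_{x \nearrow R} p(x) > 0$.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, R = 2.0, 1.0                        # Exp(lam) truncated to [0, R]
Z = 1.0 - np.exp(-lam * R)               # normalizing constant

# inverse-transform sampling; on (0, R): -p'(x)/p(x) = lam, and lim_{x->R} p(x) = lam*exp(-lam*R)/Z
x = -np.log(1.0 - Z * rng.uniform(size=300_000)) / lam
p_at_R = lam * np.exp(-lam * R) / Z

for t in (0.25, 0.5, 0.75):
    lhs = np.mean(x <= t)
    rhs = np.mean(lam * np.minimum(x, t)) + t * p_at_R
    print(f"t={t}: F_X(t)={lhs:.4f}  rhs of Corollary 6.5={rhs:.4f}")
```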
Remark 6.6. Whenever $p$ is continuously differentiable with support $[0, R]$, it suffices to have $\int_0^R x \, |p'(x)| \, \mathrm{d}x < \infty$, instead of (C3), and the weaker condition $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} X \bigr| < \infty$ to cover the requirements of Corollary 6.5.

The following result is complementary to Corollary 6.5.

Corollary 6.7. Assume that $p$ is a probability density function with $\mathrm{spt}(p) = [L, R]$ that satisfies (C1)–(C5). Further suppose that the limit $\lim_{x \searrow L} p(x)$ exists. Let $X : \Omega \to (L, R)$ be a random variable with $\mathbb{P}\bigl(X \in S(p)\bigr) = 1$, and $\mathbb{E}\bigl| \frac{p'(X)}{p(X)} \bigr| < \infty$. Then $X \sim p\,\mathfrak{L}$ if, and only if, the distribution function of $X$ satisfies
\[ 1 - F_X(t) = \mathbb{E}\left[ \frac{p'(X)}{p(X)} \, \bigl( R - \max\{X, t\} \bigr) \right] + (R - t) \lim_{x \searrow L} p(x), \quad L < t < R. \]
Remark 6.6 applies, with minor (but obvious) adaptations, in the case $R = 0$. In general, the characterization results for probability density functions with bounded support give a good handle on a variety of wrapped and truncated distributions, like the truncated normal or the wrapped exponential distribution. However, we state only the uniform and the Beta distribution as examples explicitly. Again, we refrain from giving the details of the calculations to check the regularity conditions. For the Beta distribution, we invoke Remark 6.6.

Example 6.8 (Uniform distribution). For $x \in (L, R)$ let $p(x) = \frac{1}{R - L}$ be the density function of the uniform distribution on the interval $(L, R)$. The conditions (C1)–(C5) are trivial to check. Since the derivative of $p$ vanishes on $(L, R)$, the identities from the Corollaries 6.5 and 6.7 are the same. They read as follows. A random variable $X : \Omega \to (L, R)$ is distributed uniformly over $(L, R)$ if, and only if, its distribution function has the form
\[ F_X(t) = \frac{t - L}{R - L}, \quad L < t < R. \]
Apparently, we recovered the observation that the explicitly calculable form of the uniform distribution function uniquely identifies this distribution, so our characterization is redundant in this case.
Example 6.9 (Beta distribution). Let $\alpha > 0$, $\beta > 1$, and
\[ p(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 < x < 1, \]
where $B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)}$ denotes the Beta function. Since $\beta > 1$, the limit at the right endpoint of the support exists. More precisely, we have $\lim_{x \nearrow 1} p(x) = 0$. Therefore, Corollary 6.5 yields the following characterization. Suppose $X$ is a random variable which takes values in $(0, 1)$ almost surely and satisfies $\mathbb{E}\bigl[ \frac{X}{1 - X} \bigr] < \infty$. Then $X$ has the Beta distribution with parameters $\alpha > 0$ and $\beta > 1$ if, and only if, the distribution function of $X$ has the form
\[ F_X(t) = \mathbb{E}\left[ \left( \frac{\beta - 1}{1 - X} - \frac{\alpha - 1}{X} \right) \min\{X, t\} \right], \quad 0 < t < 1. \]
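A short Monte Carlo check of the Beta identity (ours; parameter values and sample size are arbitrary illustrative choices) mirrors the earlier checks for the positive half line.

```python
import numpy as np

rng = np.random.default_rng(9)
a, b = 2.0, 3.0                      # alpha > 0, beta > 1
x = rng.beta(a, b, size=300_000)

for t in (0.2, 0.5, 0.8):
    lhs = np.mean(x <= t)
    rhs = np.mean(((b - 1) / (1 - x) - (a - 1) / x) * np.minimum(x, t))
    print(f"t={t}: F_X(t)={lhs:.4f}  characterizing expectation={rhs:.4f}")
```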
The Beta distribution also marks a limitation of our characterizations. If $0 < \alpha, \beta < 1$, our results fail to hold since none of the required limits exist. A special case for this phenomenon is the Arcsine distribution, which is the Beta distribution with parameters $\alpha = \beta = \tfrac{1}{2}$.

7 Applications to goodness-of-fit testing

The idea to use distributional characterizations as a basis for statistics in goodness-of-fit testing problems is classic, see Nikitin (2017) and O'Reilly and Stephens (1982). In this spirit and regarding the results of the previous sections, we propose goodness-of-fit tests for any distribution with a density function that satisfies the regularity conditions of either of our characterizations (Theorems 4.1, 5.4, and Corollaries 5.18, 6.5, 6.7). For the sake of readability, we give the following discussion in the case of continuously differentiable and positive density functions on the positive axis, dealt with in Corollary 5.6. This case includes the largest class of examples we gave previously, and it also includes the new test we provide at the end of this section. The arguments for using the characterizations for density functions on the whole real line, or such densities that have bounded support, to construct corresponding goodness-of-fit tests are very similar, of course.

We consider a parametric family of distributions $\mathcal{P}_\Theta = \{ p_\vartheta \, \mathfrak{L} \mid \vartheta \in \Theta \}$, $\Theta \subset \mathbb{R}^d$, where we assume that $\mathrm{spt}(p_\vartheta) = [0, \infty)$ and that $p_\vartheta$ is continuously differentiable and positive on $(0, \infty)$. Moreover, $p_\vartheta$ is taken to satisfy the prerequisites of Corollary 5.6. Testing the fit of a positive random variable $X$ to $\mathcal{P}_\Theta$ means to test the hypothesis
\[ H_0 : \mathbb{P}^X \in \mathcal{P}_\Theta \qquad (7.1) \]
against general alternatives. Let $s : (0, \infty) \times \Theta \to (0, \infty)$ be a measurable function, used for scaling, such that $X \sim p_\vartheta \, \mathfrak{L}$ if, and only if, $s(X; \vartheta) \sim p_{\vartheta^*} \, \mathfrak{L}$ for some $\vartheta^* \in \Theta^* \subset \Theta$. We assume that
\[ \mathbb{E}\left| \frac{p_{\vartheta^*}'\bigl( s(X; \vartheta) \bigr)}{p_{\vartheta^*}\bigl( s(X; \vartheta) \bigr)} \, s(X; \vartheta) \right| < \infty. \]
By Corollary 5.6, we have $s(X; \vartheta) \sim p_{\vartheta^*} \, \mathfrak{L}$ if, and only if, the distribution function of $s(X; \vartheta)$ has the form
\[ F_{s(X; \vartheta)}(t) = \mathbb{E}\left[ - \frac{p_{\vartheta^*}'\bigl( s(X; \vartheta) \bigr)}{p_{\vartheta^*}\bigl( s(X; \vartheta) \bigr)} \, \min\bigl\{ s(X; \vartheta), t \bigr\} \right], \quad t > 0. \qquad (7.2) \]
In order to test $H_0$ based on a sample $X_1, \ldots, X_n$ of independent and identically distributed (iid.) positive random variables, put $Y_{n,j} = s(X_j; \widehat{\vartheta}_n)$, $j = 1, \ldots, n$. We consider the empirical distribution function $\widehat{F}_n$ of $Y_{n,1}, \ldots, Y_{n,n}$ as an estimator of $F_{s(X; \vartheta)}$. Hereby denoting a consistent estimator of $\vartheta$ by $\widehat{\vartheta}_n = \widehat{\vartheta}_n(X_1, \ldots, X_n)$, we use $\widehat{\vartheta}_n^* = \widehat{\vartheta}_n(Y_{n,1}, \ldots, Y_{n,n})$ as an estimator of $\vartheta^* \in \Theta^*$, and take
\[ \widehat{T}_n(t) = - \frac{1}{n} \sum_{j=1}^n \frac{p_{\widehat{\vartheta}_n^*}'\bigl( Y_{n,j} \bigr)}{p_{\widehat{\vartheta}_n^*}\bigl( Y_{n,j} \bigr)} \, \min\bigl\{ Y_{n,j}, t \bigr\}, \quad t > 0, \]
as an estimator of the second quantity in (7.2). Taking some metric $\delta$ on a set containing both functions, we propose as a goodness-of-fit statistic the quantity
\[ \delta\bigl( \widehat{T}_n, \widehat{F}_n \bigr). \]
By (7.2), this term ought to be close to zero under $H_0$, so large values of the statistic will lead us to reject the hypothesis.

As witnessed by Baringhaus and Henze (2000), Betsch and Ebner (2019a), and Betsch and Ebner (2019b), tests of this type are noteworthy competitors to established tests. An advantage lies in the range of their applicability.
A substantial proportion of known procedures relies on acomparison between theoretical moment generating functions, see Caba˜na and Quiroz (2005),Henze and Jim´enez-Gamero (2019), and Zghoul (2010), or characteristic functions, as employedby Baringhaus and Henze (1988), Epps and Pulley (1983), and Jim´enez-Gamero et al. (2009),and their empirical pendants, or on a differential equation that characterizes the Laplace trans-formation, see Henze and Klar (2002) and Henze et al. (2012). All of these share the unpleasantfeature that in order to establish the theoretic basis for the test statistics, one has to have ex-plicit knowledge about these transformations for the distribution in consideration. Since theirhandling is not possible for every distribution, our suggestions provide a genuine alternative, forthey require no more than the knowledge of the density function and its derivative. Moreover,our tests do not rely on a characterization that is tailored to one specific distribution. Instead,we provide a framework for testing fit to many different distributions, as indicated by our list ofexamples. In Betsch and Ebner (2019a), the authors establish the result of Corollary 5.6 for the special caseof the Gamma distribution and examine the corresponding goodness-of-fit statistic. Denote by22 ϑ ( x ) = λ − k Γ( k ) x k − e − x/λ , x >
where $\vartheta = (k, \lambda) \in (0, \infty)^2 = \Theta$, the density function of the Gamma distribution with shape parameter $k$ and scale parameter $\lambda$. Let $X$ be a positive random variable with $\mathbb{E}X < \infty$. To reflect the scale invariance of the class of Gamma distributions, choose the scaling function $s(x; \vartheta) = x/\lambda$. Apparently, $X$ has density $p_\vartheta$ if, and only if, $s(X;\vartheta)$ has density $p_{\vartheta^*}$, where $\vartheta^* = (k, 1) \in (0, \infty) \times \{1\} = \Theta^*$, and
\[
\mathbb{E}\left| \frac{p_{\vartheta^*}'\big(s(X;\vartheta)\big)}{p_{\vartheta^*}\big(s(X;\vartheta)\big)}\, s(X;\vartheta) \right| \le |k - 1| + \lambda^{-1}\, \mathbb{E}X < \infty.
\]
By Example 5.8, $X$ follows a Gamma law with parameter vector $\vartheta = (k, \lambda)$ if, and only if,
\[
F_{X/\lambda}(t) = F_{s(X;\vartheta)}(t) = \mathbb{E}\left[ \left( -\frac{k-1}{s(X;\vartheta)} + 1 \right) \min\big\{ s(X;\vartheta),\, t \big\} \right], \quad t > 0.
\]
To construct the goodness-of-fit test, let $X_1, \dots, X_n$ be iid. copies of $X$ and consider a consistent, scale equivariant estimator $\widehat{\lambda}_n = \widehat{\lambda}_n(X_1, \dots, X_n)$ of $\lambda$ as well as a consistent, scale invariant estimator $\widehat{k}_n = \widehat{k}_n(X_1, \dots, X_n)$ of $k$. We set $Y_{n,j} = s(X_j; \widehat{k}_n, \widehat{\lambda}_n) = X_j / \widehat{\lambda}_n$, $j = 1, \dots, n$. Naturally, $\widehat{\lambda}_n^* = \widehat{\lambda}_n(Y_{n,1}, \dots, Y_{n,n}) = 1$ and $\widehat{k}_n^* = \widehat{k}_n(Y_{n,1}, \dots, Y_{n,n}) = \widehat{k}_n(X_1, \dots, X_n) = \widehat{k}_n$ are consistent estimators of $\lambda^* = 1$ and $k^* = k$. In accordance with our general considerations above, we take
\[
\widehat{T}_n(t) = \frac{1}{n} \sum_{j=1}^{n} \left( -\frac{\widehat{k}_n - 1}{Y_{n,j}} + 1 \right) \min\{ Y_{n,j},\, t \}, \quad t > 0.
\]
Betsch and Ebner (2019a) considered the functions $\widehat{T}_n$ and $\widehat{F}_n = n^{-1} \sum_{j=1}^{n} \mathbf{1}\{Y_{n,j} \le \cdot\}$ as random elements of the Hilbert space $L^2\big( (0,\infty), \mathcal{B}_{>0}, w(t)\, dt \big)$, where $w$ is an appropriate weight function. They obtained the statistic
\[
G_n = n \int_0^\infty \big| \widehat{T}_n(t) - \widehat{F}_n(t) \big|^2\, w(t)\, dt,
\]
derived the limit distribution under the hypothesis using the Hilbert space central limit theorem, and gave a proof of the consistency of this test procedure against fixed alternatives with existing expectation. Moreover, they explained how to implement the test using a parametric bootstrap and showed in a Monte Carlo simulation study that the test outperforms classical procedures and keeps up with the best Gamma tests proposed so far. Contributions like Henze et al. (2012), Plubin and Siripanich (2017), and Villaseñor and González-Estrada (2015) indicate that testing fit to the Gamma distribution is a topic of ongoing research.

The characterization of the exponential distribution via the mean residual life function is a special case of Corollary 5.6 (cf. Example 5.9), and thus the corresponding test for exponentiality is to be seen as a special case of the test for the Gamma distribution at hand. Baringhaus and Henze (2000) used the characterization, which was already known in a different disguise, to construct the associated test for exponentiality in the sense described above. They showed that the limit distribution under the hypothesis coincides with the limiting null distribution of the classical Cramér-von Mises statistic when testing for uniformity over the unit interval. Furthermore, they proved the consistency of the test procedure against any fixed alternative distribution. The test has already been included in the extensive comparative simulation study conducted by Allison et al. (2017). Adding a tuning parameter to the weight function leads to the test statistic proposed by Baringhaus and Henze (2008).
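For illustration, the following R sketch evaluates the Gamma statistic $G_n$ from above. The moment estimators $\widehat{k}_n = \bar{X}_n^2 / S_n^2$ and $\widehat{\lambda}_n = S_n^2 / \bar{X}_n$, the weight $w(t) = e^{-t}$, and the direct numerical integration are our own illustrative choices and are not taken from Betsch and Ebner (2019a); with $\widehat{k}_n \equiv 1$ and $\widehat{\lambda}_n = \bar{X}_n$ the same code yields a statistic of the Baringhaus and Henze (2000) type for exponentiality.
\begin{verbatim}
## Illustrative sketch: the statistic G_n for testing fit to the Gamma family.
## Moment estimators and the weight w(t) = exp(-t) are example choices.
gamma_gof <- function(x) {
  n       <- length(x)
  k_hat   <- mean(x)^2 / var(x)       # scale-invariant estimator of the shape k
  lam_hat <- var(x) / mean(x)         # scale-equivariant estimator of the scale lambda
  y       <- x / lam_hat              # Y_{n,j} = s(X_j; theta_hat)
  Fn      <- ecdf(y)
  Tn <- function(t) sapply(t, function(s) mean((1 - (k_hat - 1) / y) * pmin(y, s)))
  integrand <- function(t) (Tn(t) - Fn(t))^2 * exp(-t)
  n * integrate(Vectorize(integrand), 0, Inf, subdivisions = 500L)$value
}

set.seed(2)
gamma_gof(rgamma(100, shape = 2, scale = 3))   # should be small under H0
\end{verbatim}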
The recent papers by Cuparić et al. (2018), Jovanović et al. (2015), Nikitin (2017), Noughabi (2015), Torabi et al. (2018), Volkova and Nikitin (2015), and Zardasht et al. (2015) show that tests for exponentiality are still of importance to the research community.

The goodness-of-fit tests for normality proposed by Betsch and Ebner (2019b) are also included in our framework (cf. Example 4.4). To fix notation, we write $p_\vartheta$ for the normal density with mean-variance parameter vector $\vartheta = (\mu, \sigma^2) \in \mathbb{R} \times (0, \infty) = \Theta$. Consider a real-valued random variable $X$ with $\mathbb{E}X^2 < \infty$. Taking into account the invariance under linear transformations of the class of normal distributions, Betsch and Ebner (2019b) used the scaling function $s(x; \vartheta) = (x - \mu)/\sigma$. Naturally, $X$ has density $p_\vartheta$ if, and only if, $s(X;\vartheta)$ has density $p_{\vartheta^*}$, where $\vartheta^* = (0, 1)$, that is, if $s(X;\vartheta)$ follows the standard Gaussian law. Furthermore, we have
\[
\mathbb{E}\left| \frac{p_{\vartheta^*}'\big(s(X;\vartheta)\big)}{p_{\vartheta^*}\big(s(X;\vartheta)\big)} \right| = \mathbb{E}\big| s(X;\vartheta) \big| \le \frac{1}{\sigma}\Big( \mathbb{E}|X| + |\mu| \Big) < \infty
\]
and
\[
\mathbb{E}\left| \frac{p_{\vartheta^*}'\big(s(X;\vartheta)\big)}{p_{\vartheta^*}\big(s(X;\vartheta)\big)}\, s(X;\vartheta) \right| = \mathbb{E}\big( s(X;\vartheta) \big)^2 \le \frac{1}{\sigma^2}\Big( \mathbb{E}X^2 + 2|\mu|\, \mathbb{E}|X| + \mu^2 \Big) < \infty.
\]
As a consequence, Example 4.4 states that $X$ follows a normal distribution with parameter vector $\vartheta = (\mu, \sigma^2)$ if, and only if,
\[
F_{s(X;\vartheta)}(t) = \mathbb{E}\Big[ s(X;\vartheta)\, \big( s(X;\vartheta) - t \big)\, \mathbf{1}\{ s(X;\vartheta) \le t \} \Big], \quad t \in \mathbb{R}.
\]
For iid. copies $X_1, \dots, X_n$ of $X$, we consider the sample mean $\bar{X}_n = n^{-1} \sum_{j=1}^{n} X_j$ and the sample variance $S_n^2 = n^{-1} \sum_{j=1}^{n} (X_j - \bar{X}_n)^2$ as consistent estimators of $\mu$ and $\sigma^2$. We put
\[
Y_{n,j} = s\big( X_j; \bar{X}_n, S_n^2 \big) = \frac{X_j - \bar{X}_n}{S_n}, \quad j = 1, \dots, n,
\]
and notice that $\widehat{\vartheta}_n^* = \big( \bar{X}_n^*, S_n^{*\,2} \big) = (0, 1)$, so that
\[
\widehat{T}_n(t) = \frac{1}{n} \sum_{j=1}^{n} Y_{n,j}\, \big( Y_{n,j} - t \big)\, \mathbf{1}\{ Y_{n,j} \le t \}, \quad t \in \mathbb{R}.
\]
It remains to compare $\widehat{T}_n$ with the empirical distribution function $\widehat{F}_n$ of $Y_{n,1}, \dots, Y_{n,n}$ by an appropriate measure of deviation. In particular, Betsch and Ebner (2019b) considered $\widehat{T}_n$ and $\widehat{F}_n$ as random elements in the Hilbert space $L^2\big( \mathbb{R}, \mathcal{B}, w(t)\, dt \big)$, where $w$ is a suitable weight function, and chose as a metric the one induced by the Hilbert space norm. In accordance with our general considerations at the beginning of this section, their statistic has the form
\[
G_n = n \int_{\mathbb{R}} \big| \widehat{T}_n(t) - \widehat{F}_n(t) \big|^2\, w(t)\, dt.
\]
Besides specifying weight functions for which the statistic has an explicit formula, Betsch and Ebner (2019b) used the central limit theorem for random elements in separable Hilbert spaces to derive the limit distributions under the hypothesis $H_0$ in (7.1). Furthermore, they established the consistency of the test procedures against fixed alternatives with existing second moment, and showed in a Monte Carlo simulation study that these tests are serious competitors to established procedures. The problem of testing for normality is still of interest in research, as evidenced by Henze and Jiménez-Gamero (2019), Henze et al. (2019), Henze and Koch (2017), and numerous preprints.
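A corresponding R sketch for the normality statistic is equally short. The weight $w(t) = \varphi(t)$ (the standard normal density) and the numerical integration are again our own illustrative choices rather than the weight functions used by Betsch and Ebner (2019b).
\begin{verbatim}
## Illustrative sketch: the normality statistic of zero-bias type.
## The weight w(t) = dnorm(t) is an example choice, not the paper's.
normal_gof <- function(x) {
  n  <- length(x)
  y  <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))   # standardized sample
  Fn <- ecdf(y)
  Tn <- function(t) sapply(t, function(s) mean(y * (y - s) * (y <= s)))
  integrand <- function(t) (Tn(t) - Fn(t))^2 * dnorm(t)
  n * integrate(Vectorize(integrand), -Inf, Inf)$value
}

set.seed(3)
normal_gof(rnorm(100, mean = 5, sd = 2))   # small under H0
\end{verbatim}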
We consider the uniform distribution on the unit interval, $p(t) = \mathbf{1}_{(0,1)}(t)$, $t \in \mathbb{R}$. According to Example 6.8, our characterization results for the uniform distribution reduce to the fact that the law is uniquely determined by its distribution function $F(t) = t$, $0 < t < 1$. Thus, in line with the general construction above, we obtain the statistics
\[
K_n = \sqrt{n}\, \sup_{0 < t < 1} \big| \widehat{F}_n(t) - F(t) \big| \quad \text{and} \quad \omega_n^2 = n \int_0^1 \big| \widehat{F}_n(t) - F(t) \big|^2\, dF(t) \tag{7.3}
\]
for testing the uniformity hypothesis. Here, $\widehat{F}_n$ is the empirical distribution function of $X_1, \dots, X_n$, which are iid. copies of a random variable $X$ with values in $(0,1)$; the statistics in (7.3) are the classical Kolmogorov-Smirnov and Cramér-von Mises statistics.
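Both statistics in (7.3) have familiar order-statistics representations, which the following brief R sketch (our own) computes directly.
\begin{verbatim}
## Illustrative sketch: the statistics K_n and omega_n^2 from (7.3).
uniform_gof <- function(x) {
  n <- length(x)
  u <- sort(x)                                        # order statistics
  d_plus  <- max((1:n) / n - u)
  d_minus <- max(u - (0:(n - 1)) / n)
  K_n     <- sqrt(n) * max(d_plus, d_minus)           # Kolmogorov-Smirnov
  omega2  <- 1 / (12 * n) + sum((u - (2 * (1:n) - 1) / (2 * n))^2)  # Cramer-von Mises
  c(K_n = K_n, omega2 = omega2)
}

uniform_gof(runif(100))
\end{verbatim}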
In this subsection, we propose a new goodness-of-fit test for the two-parameter Burr Type XII distribution $\mathrm{Burr}_{\mathrm{XII}}(k, c)$, $k, c > 0$,
based on the characterization given in Example 5.12, fixing the scale parameter $\sigma = 1$. The distribution is known under a variety of names, e.g. as the Singh-Maddala distribution or as the Pareto (IV) distribution; for details see Kleiber and Kotz (2003), Section 6.2. We denote the density function corresponding to the $\mathrm{Burr}_{\mathrm{XII}}(k, c)$ distribution by
\[
p_\vartheta(x) = c\, k\, x^{c-1}\, \big( 1 + x^c \big)^{-k-1}, \quad x > 0,
\]
where $\vartheta = (k, c) \in (0, \infty)^2 = \Theta$. For iid. copies $X_1, \dots, X_n$ of $X$ define
\[
\widehat{T}_n(t) = \frac{1}{n} \sum_{j=1}^{n} \left( \frac{\widehat{c}_n \big( \widehat{k}_n + 1 \big)\, X_j^{\widehat{c}_n - 1}}{1 + X_j^{\widehat{c}_n}} - \frac{\widehat{c}_n - 1}{X_j} \right) \min\{ X_j,\, t \}, \quad t > 0,
\]
in accordance with the general framework above, which leads to the family of $L^2$-type statistics
\[
B_{n,a} = n \int_0^\infty \big| \widehat{T}_n(t) - \widehat{F}_n(t) \big|^2\, w_a(t)\, dt.
\]
Here, $\widehat{k}_n$ and $\widehat{c}_n$ are consistent estimators of the parameters $k$ and $c$, $\widehat{F}_n$ is the empirical distribution function of $X_1, \dots, X_n$, and $w_a(t) = \exp(-at)$ is a weight function depending on a tuning parameter $a > 0$.
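Before turning to the closed-form representation below, note that $B_{n,a}$ can also be evaluated directly from this definition by numerical integration. The following R sketch (our own, with the parameter estimates passed as arguments) does exactly that and is convenient as a cross-check.
\begin{verbatim}
## Illustrative sketch: B_{n,a} evaluated directly by numerical integration,
## given estimates k_hat and c_hat of the Burr Type XII parameters.
burr_stat <- function(x, k_hat, c_hat, a = 3) {
  n  <- length(x)
  Fn <- ecdf(x)
  A  <- c_hat * (k_hat + 1) * x^(c_hat - 1) / (1 + x^c_hat) - (c_hat - 1) / x
  Tn <- function(t) sapply(t, function(s) mean(A * pmin(x, s)))
  integrand <- function(t) (Tn(t) - Fn(t))^2 * exp(-a * t)
  n * integrate(Vectorize(integrand), 0, Inf, subdivisions = 1000L)$value
}
\end{verbatim}
The default $a = 3$ reflects the recommendation for practitioners given at the end of this section.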
The hypothesis $H_0$ in (7.1), i.e. that the data come from the Burr Type XII family, is rejected for large values of $B_{n,a}$. Writing $X_{(1)} \le \ldots \le X_{(n)}$ for the order statistics of $X_1, \dots, X_n$, some tedious but elementary calculations show that $B_{n,a}$ admits a closed-form representation as a double sum over the order statistics which involves only the quantities
\[
A^{[1]}_{(j),n} = \frac{\widehat{c}_n \big( \widehat{k}_n + 1 \big)\, X_{(j)}^{\widehat{c}_n - 1}}{1 + X_{(j)}^{\widehat{c}_n}} - \frac{\widehat{c}_n - 1}{X_{(j)}}, \qquad j = 1, \dots, n,
\]
closely related terms $A^{[2]}_{(j),n}$, the exponentials $e^{-a X_{(j)}}$, and negative powers of the tuning parameter $a$. This representation is easily computable and avoids any numerical integration routines. In the following simulation study we show the effectiveness of this new test statistic in comparison to the classical procedures adapted to the composite hypothesis $H_0$, namely the Kolmogorov-Smirnov test $K_n$, the Cramér-von Mises test $CM$, the Anderson-Darling test $AD$, and the Watson test $WA$. Let
\[
F(x; k, c) = 1 - \big( 1 + x^c \big)^{-k}, \quad x > 0,
\]
denote the distribution function of $\mathrm{Burr}_{\mathrm{XII}}(k, c)$.
The $K_n$-statistic is $K_n = \max\{ D^+, D^- \}$, where
\[
D^+ = \max_{j = 1, \dots, n} \Big( j/n - F\big( X_{(j)}; \widehat{k}_n, \widehat{c}_n \big) \Big), \qquad
D^- = \max_{j = 1, \dots, n} \Big( F\big( X_{(j)}; \widehat{k}_n, \widehat{c}_n \big) - (j - 1)/n \Big).
\]
The statistics of Cramér-von Mises and Anderson-Darling are given by
\[
CM = \frac{1}{12n} + \sum_{j=1}^{n} \left( F\big( X_{(j)}; \widehat{k}_n, \widehat{c}_n \big) - \frac{2j - 1}{2n} \right)^2
\]
and
\[
AD = -n - \frac{1}{n} \sum_{j=1}^{n} \Big[ (2j - 1) \log F\big( X_{(j)}; \widehat{k}_n, \widehat{c}_n \big) + \big( 2(n - j) + 1 \big) \log\Big( 1 - F\big( X_{(j)}; \widehat{k}_n, \widehat{c}_n \big) \Big) \Big],
\]
respectively, whereas the $WA$-statistic takes the form
\[
WA = CM - n \left( \frac{1}{n} \sum_{j=1}^{n} F\big( X_{(j)}; \widehat{k}_n, \widehat{c}_n \big) - \frac{1}{2} \right)^2.
\]
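Given the estimated distribution function, these four classical statistics are straightforward to compute; a compact R sketch (function names are ours) reads as follows.
\begin{verbatim}
## Illustrative sketch: the classical statistics with estimated parameters.
pburr12 <- function(x, k, c) 1 - (1 + x^c)^(-k)      # distribution function F(x; k, c)

classical_stats <- function(x, k_hat, c_hat) {
  n <- length(x)
  u <- pburr12(sort(x), k_hat, c_hat)
  D_plus  <- max((1:n) / n - u)
  D_minus <- max(u - (0:(n - 1)) / n)
  KS <- max(D_plus, D_minus)                         # K_n
  CM <- 1 / (12 * n) + sum((u - (2 * (1:n) - 1) / (2 * n))^2)
  AD <- -n - mean((2 * (1:n) - 1) * log(u) + (2 * (n - (1:n)) + 1) * log(1 - u))
  WA <- CM - n * (mean(u) - 0.5)^2
  c(KS = KS, CM = CM, AD = AD, WA = WA)
}
\end{verbatim}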
For all procedures the parameters are estimated via the maximum likelihood method, by numerically maximizing the log-likelihood function, see Jalali and Watkins (2009) and Wingo (1983). There are other estimation procedures available, like the maximum product of spacings method, see Shah and Gokhale (1993). Critical points are obtained for the classical tests, as well as for the new test, by the same parametric bootstrap procedure, as follows: For a given sample $X_1, \dots, X_n$ of size $n$, compute the estimators $\widehat{k}_n, \widehat{c}_n$ of $k$ and $c$. Conditionally on $\widehat{k}_n, \widehat{c}_n$, generate 100 bootstrap samples of size $n$ from $\mathrm{Burr}_{\mathrm{XII}}(\widehat{k}_n, \widehat{c}_n)$ and calculate the corresponding values of the test statistic, say $B_j^*$, $j = 1, 2, \dots, 100$. Obtain the critical point $p_n$ as $B_{(90)}^*$, where $B_{(j)}^*$ denote the ordered $B_j^*$-values, and reject $H_0$ if $B_{n,a} = B_{n,a}(X_1, \dots, X_n) > p_n$; a schematic implementation of this procedure is sketched after the following list.

The following (alternative) distributions are considered (all densities being defined for $x > 0$, with parameter $\theta > 0$):
1. the Burr Type XII distribution $\mathrm{Burr}_{\mathrm{XII}}(k, c)$,
2. the exponential distribution $\mathrm{Exp}(\theta)$,
3. the linear increasing failure rate law $LF(\theta)$ with density $(1 + \theta x)\exp(-x - \theta x^2/2)$,
4. the half-normal distribution with density $(2/\pi)^{1/2} \exp(-x^2/2)$, denoted by $HN$,
5. the half-Cauchy distribution with density $2 / \big( \pi (1 + x^2) \big)$, denoted by $HC$,
6. the Gompertz law $GO(\theta)$ having distribution function $1 - \exp\!\big[ \theta^{-1} (1 - e^{x}) \big]$,
7. the inverse Gaussian law $IG(\theta)$ with density $\big( \theta/(2\pi) \big)^{1/2} x^{-3/2} \exp\!\big[ -\theta (x - 1)^2 / (2x) \big]$,
8. the Weibull distribution with density $\theta x^{\theta - 1} \exp(-x^\theta)$, denoted by $W(\theta)$,
9. the inverse Weibull distribution with density $\theta (1/x)^{\theta + 1} \exp\!\big[ -(1/x)^\theta \big]$, denoted by $IW(\theta)$.

All computations are performed using the statistical computing environment R, see R Core Team (2019).
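The schematic implementation announced above might look as follows. The log-parameterized optim call, the starting values, and the inverse-transform sampler rburr12 are our own illustrative choices, and burr_stat refers to the sketch given earlier in this subsection.
\begin{verbatim}
## Illustrative sketch: ML fit of Burr XII(k, c) and bootstrap test at level 0.1.
fit_burr <- function(x) {
  negloglik <- function(par) {                      # par = (log k, log c)
    k  <- exp(par[1]); cc <- exp(par[2])
    -(length(x) * (log(k) + log(cc)) + (cc - 1) * sum(log(x)) -
        (k + 1) * sum(log(1 + x^cc)))
  }
  exp(optim(c(0, 0), negloglik)$par)                # returns (k_hat, c_hat)
}

rburr12 <- function(n, k, c) ((1 - runif(n))^(-1 / k) - 1)^(1 / c)  # inverse transform

burr_test <- function(x, a = 3, B = 100) {
  est  <- fit_burr(x)
  stat <- burr_stat(x, est[1], est[2], a)
  boot <- replicate(B, {
    xs <- rburr12(length(x), est[1], est[2])
    es <- fit_burr(xs)
    burr_stat(xs, es[1], es[2], a)
  })
  crit <- sort(boot)[ceiling(0.9 * B)]              # p_n = B*_(90) when B = 100
  list(statistic = stat, critical_value = crit, reject = stat > crit)
}
\end{verbatim}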
In each scenario, we consider the sample sizes $n = 100$ and $n = 200$, and the nominal level of significance $\alpha$ is set to $0.1$. Each entry in Tables 1 and 2 presents empirical rejection rates computed with 10 000 Monte Carlo runs. The number of bootstrap samples in each run is fixed to 100, and for the tuning parameter $a$ we consider six different values. The best performing test for each distribution and sample size is highlighted for easy reference.

[Table 1 about here. Percentage of rejection for 10 000 Monte Carlo repetitions ($n = 100$, $\alpha = 0.1$): the columns report the rejection rates of $B_{n,a}$ for the six values of $a$ and of $K_n$, $CM$, $AD$ and $WA$ against each of the alternatives listed above.]

A data-dependent choice of the tuning parameter $a$ is not straightforward, since the null distribution of $B_{n,a}$ under $H_0$ depends on the true values of the parameters, but results for tests of location-scale families by Allison and Santana (2015) and Tenreiro (2019) give hope for new developments. A good compromise for practitioners concerning the choice of the tuning parameter is $a = 3$ in view of Tables 1 and 2.
[Table 2 about here. Percentage of rejection for 10 000 Monte Carlo repetitions ($n = 200$, $\alpha = 0.1$); same layout as Table 1.]

In analogy to the procedures discussed before, we expect $B_{n,a}$ to converge under $H_0$ to the square of the $L^2$-norm of a centered Gaussian process, and the tests to be consistent against fixed alternatives.

We devoted this work to the derivation of explicit characterizations for a large class of continuous univariate probability distributions. Our motivation was the fact that the characterization of the standard normal distribution as the unique fixed point of the zero-bias transformation reduces to an explicit formula for the distribution function of the transformed distribution. We extrapolated this formula to other distributions by applying the Stein-type identity commonly used within the density approach. Research related to our characterizations concerns the study of distributional transformations, see Goldstein and Reinert (2005) and Döbler (2017). While these are constructed from scratch and are used to prove Stein-type characterizations, we took such a Stein identity for granted and dropped the ambition to obtain distributional transformations. Thus, starting with more information and demanding less structure from the transformations, we established more accessible explicit characterization formulae. In the last section, we discussed an immediate application and illustrated how to use the characterizations for the construction of goodness-of-fit tests. The corresponding procedures for the normal, the exponential, and the Gamma distribution have already been investigated in the literature and show very promising performance. The great advantage of our approach lies in the wide range of its applicability. To confirm this last claim, we constructed the (to the best of our knowledge) first goodness-of-fit test focused on the Burr Type XII distribution.
Acknowledgements
The authors would like to thank an associate editor as well as three anonymous reviewers for their comments and suggestions that led to a major improvement of the paper.
A Proofs
A.1 Proof of Lemma 3.1

If $X$ has density $p$, any $f \in \mathcal{F}_p$ satisfies
\[
\mathbb{E}\left[ f'(X) + \frac{p'(X)}{p(X)}\, f(X) \right]
= \int_{S(p,f)} \big( f \cdot p \big)'(x)\, dx
= \int_{L}^{y_1^f} \big( f \cdot p \big)'(x)\, dx + \sum_{\ell = 1}^{m} \int_{y_\ell^f}^{y_{\ell+1}^f} \big( f \cdot p \big)'(x)\, dx + \int_{y_{m+1}^f}^{R} \big( f \cdot p \big)'(x)\, dx
\]
\[
= \lim_{x \nearrow R} f(x)\, p(x) - \lim_{x \searrow L} f(x)\, p(x) + \sum_{\ell = 1}^{m+1} \left( \lim_{x \nearrow y_\ell^f} f(x)\, p(x) - \lim_{x \searrow y_\ell^f} f(x)\, p(x) \right) = 0.
\]
For the converse, fix $t \in S(p) \setminus \mathrm{disc}(X)$ and define $f_t^p : (L, R) \to \mathbb{R}$ through
\[
f_t^p(x) = \frac{1}{p(x)} \int_{L}^{x} \Big( \mathbf{1}_{(L,t]}(s) - P(t) \Big)\, p(s)\, ds.
\]
The function $f_t^p$ is continuous, and
\[
\lim_{x \nearrow R} f_t^p(x)\, p(x) = \int_{L}^{R} \Big( \mathbf{1}_{(L,t]}(s) - P(t) \Big)\, p(s)\, ds = P(t) - P(t) = 0.
\]
Noting that $f_t^p(x) = \frac{P(x)\,(1 - P(t))}{p(x)}$ for $x < t$, we also have $\lim_{x \searrow L} f_t^p(x)\, p(x) = 0$. With this representation of $f_t^p(x)$ for $x < t$, as well as with $f_t^p(x) = \frac{(1 - P(x))\, P(t)}{p(x)}$ for $x > t$, we see that $f_t^p$ is differentiable on $S(p) \setminus \{t\} = S(p, f_t^p)$ with
\[
\big( f_t^p \big)'(x) = -\frac{p'(x)}{p(x)}\, f_t^p(x) + \mathbf{1}_{(L,t]}(x) - P(t), \quad x \in S(p) \setminus \{t\}. \tag{A.1}
\]
We get with condition (C2)
\[
\sup_{x \in S(p) \setminus \{t\}} \left| \frac{p'(x)}{p(x)}\, f_t^p(x) \right| \le 2 \sup_{x \in S(p)} \left| \frac{p'(x)\, \min\{ P(x),\, 1 - P(x) \}}{p(x)^2} \right| = 2 \sup_{x \in S(p)} \kappa_p(x) < \infty,
\]
and, by (A.1),
\[
\sup_{x \in S(p) \setminus \{t\}} \Big| \big( f_t^p \big)'(x) \Big| \le \sup_{x \in S(p) \setminus \{t\}} \left| \frac{p'(x)}{p(x)}\, f_t^p(x) \right| + 2 \le 2 \sup_{x \in S(p)} \kappa_p(x) + 2 < \infty.
\]
Thus $f_t^p \in \mathcal{F}_p$, with its single exceptional point $y_1^{f_t^p} = t \notin \mathrm{disc}(X)$. The assumption in the converse implication and (A.1) yield
\[
0 = \mathbb{E}\left[ \big( f_t^p \big)'(X) + \frac{p'(X)}{p(X)}\, f_t^p(X) \right] = \mathbb{P}(X \le t) - P(t).
\]
Hence $\mathbb{P}(X \le t) = P(t)$ for all $t \in S(p) \setminus \mathrm{disc}(X)$. As $S(p) \setminus \mathrm{disc}(X)$ is dense in $(L, R)$ and $t \mapsto \mathbb{P}(X \le t)$, $t \mapsto P(t)$ are right-continuous, the claim follows.

A.2 Proof of Theorem 4.1
Let $X$ have density $p$. By (C1) and (C3) we may use the fundamental theorem of calculus to obtain
\[
F_X(t) = P(t) = \int_{-\infty}^{t} \left( p(y_1) + \sum_{\ell = 1}^{k-1} \big( p(y_{\ell+1}) - p(y_\ell) \big) + p(s) - p(y_k) \right) ds
= \int_{-\infty}^{t} \left( \int_{-\infty}^{y_1} p'(x)\, dx + \sum_{\ell = 1}^{k-1} \int_{y_\ell}^{y_{\ell+1}} p'(x)\, dx + \int_{y_k}^{s} p'(x)\, dx \right) ds,
\]
where $k = k(s)$ is the largest index in $\{1, \dots, m\}$ for which still $y_k < s$ [for all $s \le y_1$ the $y_\ell$ need not be taken into account, as $p$ is continuously differentiable on $(-\infty, s)$ in these cases]. Now, since $X$ has density function $p$, we easily see [still using (C1)] that
\[
\int_{y_\ell}^{y_{\ell+1}} p'(x)\, dx = \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, \mathbf{1}\{ y_\ell < X \le y_{\ell+1} \} \right], \quad \ell \in \{1, \dots, k-1\},
\]
and similar representations for the other integrals give
\[
F_X(t) = \int_{-\infty}^{t} \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, \mathbf{1}\{X \le s\} \right] ds = \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, (t - X)\, \mathbf{1}\{X \le t\} \right], \quad t \in \mathbb{R},
\]
where, in the second equality, we used Fubini's theorem. That is admissible since Tonelli's theorem and (C3) imply, for each $t \in \mathbb{R}$,
\[
\int_{-\infty}^{t} \mathbb{E}\left[ \left| \frac{p'(X)}{p(X)} \right| \mathbf{1}\{X \le s\} \right] ds = \mathbb{E}\left[ \frac{|p'(X)|}{p(X)}\, (t - X)\, \mathbf{1}\{X \le t\} \right] \le \int_{S(p)} \big| p'(x) \big| \big( |t| + |x| \big)\, dx < \infty.
\]
For the converse, assume that the distribution function of $X$ is given through the explicit formula in terms of $X$ as in the theorem. Putting
\[
d_p^X(t) = \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, \mathbf{1}\{X \le t\} \right], \quad t \in \mathbb{R},
\]
we obtain from (4.1) that
\[
\mathbb{E}\left[ \int_{-\infty}^{t} \frac{|p'(X)|}{p(X)}\, \mathbf{1}\{X \le s\}\, ds \right] = \mathbb{E}\left[ \frac{|p'(X)|}{p(X)}\, (t - X)\, \mathbf{1}\{X \le t\} \right] < \infty
\]
for every $t \in \mathbb{R}$. Thus, Fubini's theorem implies
\[
\int_{-\infty}^{t} d_p^X(s)\, ds = \int_{-\infty}^{t} \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, \mathbf{1}\{X \le s\} \right] ds = \mathbb{E}\left[ \frac{p'(X)}{p(X)} \int_{-\infty}^{t} \mathbf{1}\{X \le s\}\, ds \right] = F_X(t)
\]
for $t \in \mathbb{R}$. Since $F_X$ is increasing and $d_p^X$ is right-continuous, we conclude $d_p^X \ge 0$. Moreover, we infer
\[
\int_{\mathbb{R}} d_p^X(s)\, ds = \lim_{t \to \infty} \int_{-\infty}^{t} d_p^X(s)\, ds = \lim_{t \to \infty} F_X(t) = 1,
\]
for $F_X$ is a distribution function. Hence, $d_p^X$ is the density function of $X$. Using the first part of (4.1), dominated convergence gives
\[
\mathbb{E}\left[ \frac{p'(X)}{p(X)} \right] = \lim_{t \to \infty} \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, \mathbf{1}\{X \le t\} \right] = \lim_{t \to \infty} d_p^X(t) = 0.
\]
Therefore, we conclude that for each $f \in \mathcal{F}_p$,
\[
\mathbb{E}\big[ f'(X) \big] = \int_{S(p,f)} f'(s)\, d_p^X(s)\, ds
= \int_{-\infty}^{y_1^f} f'(s)\, \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, \mathbf{1}\{X \le s\} \right] ds
+ \sum_{\ell = 1}^{m} \int_{y_\ell^f}^{y_{\ell+1}^f} f'(s)\, \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, \mathbf{1}\{X \le s\} \right] ds
+ \int_{y_{m+1}^f}^{\infty} f'(s)\, \mathbb{E}\left[ -\frac{p'(X)}{p(X)}\, \mathbf{1}\{X > s\} \right] ds
\]
\[
= \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, \big( f(y_1^f) - f(X) \big)\, \mathbf{1}\{X \le y_1^f\} \right]
+ \sum_{\ell = 1}^{m} \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, \big( f(y_{\ell+1}^f) - f(X) \big)\, \mathbf{1}\{ y_\ell^f < X \le y_{\ell+1}^f \} \right]
+ \sum_{\ell = 1}^{m} \mathbb{E}\left[ \frac{p'(X)}{p(X)}\, \big( f(y_{\ell+1}^f) - f(y_\ell^f) \big)\, \mathbf{1}\{X \le y_\ell^f\} \right]
+ \mathbb{E}\left[ -\frac{p'(X)}{p(X)}\, \big( f(X) - f(y_{m+1}^f) \big)\, \mathbf{1}\{X > y_{m+1}^f\} \right]
= \mathbb{E}\left[ -\frac{p'(X)}{p(X)}\, f(X) \right].
\]
In the third equality, Fubini's theorem is applicable since $f'$ is bounded on $S(p, f)$ and we have (4.1). Lemma 3.1 yields the claim.

A.3 Proof of Theorem 5.4
Let $X$ have density $p$. By Theorem 5.1, we have
\[
F_X(t) = P(t) = \int_{L}^{t} \mathbb{E}\left[ -\frac{p'(X)}{p(X)}\, \mathbf{1}\{X > s\} \right] ds = \mathbb{E}\left[ -\frac{p'(X)}{p(X)}\, \big( \min\{X, t\} - L \big) \right], \quad t > L,
\]
where the use of Fubini's theorem is justified since
\[
\int_{L}^{\infty} \mathbb{E}\left[ \left| \frac{p'(X)}{p(X)} \right| \mathbf{1}\{X > s\} \right] ds = \mathbb{E}\left[ \frac{|p'(X)|}{p(X)}\, (X - L) \right] \le \int_{S(p)} |x|\, \big| p'(x) \big|\, dx + |L| \int_{S(p)} \big| p'(x) \big|\, dx < \infty.
\]
For the converse implication, we put
\[
d_p^X(s) = \mathbb{E}\left[ -\frac{p'(X)}{p(X)}\, \mathbf{1}\{X > s\} \right], \quad s > L,
\]
and notice that the integrability conditions on $X$ imply
\[
\mathbb{E}\left[ \int_{L}^{\infty} \frac{|p'(X)|}{p(X)}\, \mathbf{1}\{X > s\}\, ds \right] \le \mathbb{E}\left| \frac{p'(X)}{p(X)}\, X \right| + |L| \cdot \mathbb{E}\left| \frac{p'(X)}{p(X)} \right| < \infty. \tag{A.2}
\]
Thus, Fubini's theorem gives
\[
\int_{L}^{t} d_p^X(s)\, ds = \mathbb{E}\left[ -\frac{p'(X)}{p(X)} \int_{L}^{t} \mathbf{1}\{X > s\}\, ds \right] = F_X(t), \quad t > L.
\]
Since $d_p^X$ is integrable by (A.2), dominated convergence implies that $F_X$ is continuous. Moreover, Lebesgue's differentiation theorem [see Theorem 3.21 of Folland (1999), with nicely shrinking sets $E_h = (t, t + h)$, $h > 0$] implies
\[
d_p^X(t) = \lim_{h \searrow 0} \frac{1}{h} \int_{t}^{t + h} d_p^X(s)\, ds = \lim_{h \searrow 0} \frac{F_X(t + h) - F_X(t)}{h} \ge 0
\]
for almost every $t > L$, where we used that $F_X$ is increasing. Finally,
\[
\int_{L}^{\infty} d_p^X(s)\, ds = \lim_{t \to \infty} \int_{L}^{t} d_p^X(s)\, ds = \lim_{t \to \infty} F_X(t) = 1.
\]
We conclude that $d_p^X$ is the density function of $X$. The claim follows from Theorem 5.1.

Remark A.1.
Note that we could have proved the theorem with the same argument we used in Theorem 4.1, since the first integrability condition on $X$ ensures that $d_p^X$ is left-continuous. However, in Remark 5.5 we extended the argument of Remark 5.2, dropping that first integrability condition in the case $L = 0$. Then we can no longer conclude the left-continuity, so we had to use the different argument via Lebesgue's differentiation theorem.

A.4 Proof of Lemma 6.1
The necessity part follows with a simple rewriting of the density function, as before. For the converse implication, assume that $X$ is as in the statement of the lemma, and that
\[
d_p^X(t) = \mathbb{E}\left[ -\frac{p'(X)}{p(X)}\, \mathbf{1}\{X > t\} \right] + \lim_{x \nearrow R} p(x), \quad L < t < R,
\]
is the density function of $X$. Since we assume both (C4) and (C5), we have by Remark 3.4, for any $f \in \mathcal{F}_p$ [note that $f$ is continuous on $(L, R)$],
\[
\int_{S(p,f)} f'(x)\, dx = \int_{L}^{y_1^f} f'(x)\, dx + \sum_{\ell = 1}^{m} \int_{y_\ell^f}^{y_{\ell+1}^f} f'(x)\, dx + \int_{y_{m+1}^f}^{R} f'(x)\, dx
\]
\[
= \lim_{x \nearrow y_1^f} f(x) - \lim_{x \searrow L} f(x) + \sum_{\ell = 1}^{m} \left( \lim_{x \nearrow y_{\ell+1}^f} f(x) - \lim_{x \searrow y_\ell^f} f(x) \right) + \lim_{x \nearrow R} f(x) - \lim_{x \searrow y_{m+1}^f} f(x) = 0,
\]
where the integral exists by the boundedness of $f'$ and the fact that $S(p, f) \subset S(p) \subset (L, R)$, which is a bounded interval. Using this fact, the proof is concluded via Lemma 3.1 with a similar calculation as in the previous proofs.
References
Allison, J. S. and Santana, L. (2015). On a data-dependent choice of the tuning parameter appearing in certain goodness-of-fit tests. Journal of Statistical Computation and Simulation, 85(16):3276–3288.
Allison, J. S., Santana, L., Smit, N., and Visagie, I. J. H. (2017). An 'apples to apples' comparison of various tests for exponentiality. Computational Statistics, 32(4):1241–1283.
Anastasiou, A. (2018). Assessing the multivariate normal approximation of the maximum likelihood estimator from high-dimensional, heterogeneous data. Electronic Journal of Statistics, 12(2):3794–3828.
Anastasiou, A. and Gaunt, R. (2019+). Multivariate normal approximation of the maximum likelihood estimator via the delta method. To appear in Brazilian Journal of Probability and Statistics.
Anastasiou, A. and Reinert, G. (2017). Bounds for the normal approximation of the maximum likelihood estimator. Bernoulli, 23(1):191–218.
Anastasiou, A. and Reinert, G. (2018). Bounds for the asymptotic distribution of the likelihood ratio. ArXiv e-prints, 1806.03666.
Barbour, A. D. (1982). Poisson convergence and random graphs. Mathematical Proceedings of the Cambridge Philosophical Society, 92(2):349–359.
Barbour, A. D. (1990). Stein's method for diffusion approximations. Probability Theory and Related Fields, 84(3):297–322.
Barbour, A. D., Karoński, M., and Ruciński, A. (1989). A central limit theorem for decomposable random variables with applications to random graphs. Journal of Combinatorial Theory, Series B, 47(2):125–145.
Baringhaus, L. and Henze, N. (1988). A consistent test for multivariate normality based on the empirical characteristic function. Metrika, 35(1):339–348.
Baringhaus, L. and Henze, N. (2000). Tests of fit for exponentiality based on a characterization via the mean residual life function. Statistical Papers, 41(2):225–236.
Baringhaus, L. and Henze, N. (2008). A new weighted integral goodness-of-fit statistic for exponentiality. Statistics & Probability Letters, 78(8):1006–1016.
Betsch, S. and Ebner, B. (2019a). A new characterization of the Gamma distribution and associated goodness-of-fit tests. Metrika, https://doi.org/10.1007/s00184-019-00708-7.
Betsch, S. and Ebner, B. (2019b). Testing normality via a distributional fixed point property in the Stein characterization. TEST, https://doi.org/10.1007/s11749-019-00630-0.
Braverman, A. and Dai, J. G. (2017). Stein's method for steady-state diffusion approximations of M/Ph/n + M systems. The Annals of Applied Probability, 27(1):550–581.
Braverman, A., Dai, J. G., and Feng, J. (2016). Stein's method for steady-state diffusion approximations: An introduction through the Erlang-A and Erlang-C models. Stochastic Systems, 6(2):301–366.
Cabaña, A. and Quiroz, A. (2005). Using the empirical moment generating function in testing for the Weibull and the type I extreme value distributions. TEST, 14(2):417–432.
Carrillo, C., Cidrás, J., Díaz-Dorado, E., and Obando-Montaño, A. F. (2014). An approach to determine the Weibull parameters for wind energy analysis: The case of Galicia (Spain). Energies, 7(4):2676–2700.
Chatterjee, S. and Shao, Q.-M. (2011). Nonnormal approximation by Stein's method of exchangeable pairs with application to the Curie–Weiss model. The Annals of Applied Probability, 21(2):464–483.
Chen, L. H. Y., Goldstein, L., and Shao, Q.-M. (2011). Normal approximation by Stein's method. Springer, Berlin.
Chwialkowski, K., Strathmann, H., and Gretton, A. (2016). A kernel test of goodness of fit. In
Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML'16, pages 2606–2615.
Cuparić, M., Milošević, B., and Obradović, M. (2018). New L²-type exponentiality tests. ArXiv e-prints, 1809.07585.
del Barrio, E., Cuesta-Albertos, J. A., Matrán, C., Csörgő, S., Cuadras, C. M., de Wet, T., Giné, E., Lockhart, R., Munk, A., and Stute, W. (2000). Contributions of empirical and quantile processes to the asymptotic theory of goodness-of-fit tests. TEST, 9(1):1–96.
Döbler, C. (2015). Stein's method of exchangeable pairs for the Beta distribution and generalizations. Electronic Journal of Probability, 20(109):1–34.
Döbler, C. (2017). Distributional transformations without orthogonality relations. Journal of Theoretical Probability, 30(1):85–116.
Epps, T. W. and Pulley, L. B. (1983). A test for normality based on the empirical characteristic function. Biometrika, 70(3):723–726.
Fang, X. (2014). Discretized normal approximation by Stein's method. Bernoulli, 20(3):1404–1431.
Folland, G. B. (1999). Real Analysis: Modern Techniques and Their Applications (Second Edition). Pure and Applied Mathematics. John Wiley & Sons, Inc., New York.
Gaunt, R., Pickett, A., and Reinert, G. (2017). Chi-square approximation by Stein's method with application to Pearson's statistic. Annals of Applied Probability, 27(2):720–756.
Goldstein, L. and Reinert, G. (1997). Stein's method and the zero bias transformation with application to simple random sampling. The Annals of Applied Probability, 7(4):935–952.
Goldstein, L. and Reinert, G. (2005). Distributional transformations, orthogonal polynomials, and Stein characterizations. Journal of Theoretical Probability, 18(1):237–260.
Götze, F. (1991). On the rate of convergence in the multivariate CLT. The Annals of Probability, 19(2):724–739.
Henze, N. and Jiménez-Gamero, M. D. (2019). A new class of tests for multinormality with i.i.d. and GARCH data based on the empirical moment generating function. TEST, 28(2):499–521.
Henze, N., Jiménez-Gamero, M. D., and Meintanis, S. G. (2019). Characterizations of multinormality and corresponding tests of fit, including for GARCH models. Econometric Theory, 35(3):510–546.
Henze, N. and Klar, B. (2002). Goodness-of-fit tests for the inverse Gaussian distribution based on the empirical Laplace transform. Annals of the Institute of Statistical Mathematics, 54(2):425–444.
Henze, N. and Koch, S. (2017). On a test of normality based on the empirical moment generating function. Statistical Papers, https://doi.org/10.1007/s00362-017-0923-7.
Henze, N., Meintanis, S. G., and Ebner, B. (2012). Goodness-of-fit tests for the Gamma distribution based on the empirical Laplace transform. Communications in Statistics - Theory and Methods, 41(9):1543–1556.
Hudson, H. M. (1978). A natural identity for exponential families with applications in multiparameter estimation. The Annals of Statistics, 6(3):473–484.
Jalali, A. and Watkins, A. J. (2009). On maximum likelihood estimation for the two parameter Burr XII distribution. Communications in Statistics - Theory and Methods, 38(11):1916–1926.
Jiménez-Gamero, M. D., Alba-Fernández, V., Muñoz-García, J., and Chalco-Cano, Y. (2009). Goodness-of-fit tests based on empirical characteristic functions. Computational Statistics & Data Analysis, 53(12):3957–3971.
Jovanović, M., Milošević, B., Nikitin, Y. Y., Obradović, M., and Volkova, K. Y. (2015). Tests of exponentiality based on Arnold–Villasenor characterization and their efficiencies. Computational Statistics & Data Analysis, 90:100–113.
Kim, S.-T. (2000). A use of the Stein-Chen method in time series analysis.
Journal of Applied Probability, 37(4):1129–1136.
Kleiber, C. and Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Sciences. Wiley Series in Probability and Statistics. John Wiley and Sons, Inc., Hoboken, New Jersey.
Ley, C., Reinert, G., and Swan, Y. (2017). Stein's method for comparison of univariate distributions. Probability Surveys, 14:1–52.
Ley, C. and Swan, Y. (2011). A unified approach to Stein characterizations. ArXiv e-prints, 1105.4925v3.
Ley, C. and Swan, Y. (2013a). Local Pinsker inequalities via Stein's discrete density approach. IEEE Transactions on Information Theory, 59(9):5584–5591.
Ley, C. and Swan, Y. (2013b). Stein's density approach and information inequalities. Electronic Communications in Probability, 18.
Ley, C. and Swan, Y. (2016). Parametric Stein operators and variance bounds. Brazilian Journal of Probability and Statistics, 30(2):171–195.
Linnik, Y. V. (1962). Linear forms and statistical criteria I, II. Selected Translations in Mathematical Statistics and Probability, 3:1–40, 41–90. Originally published 1953 in the Ukrainian Mathematical Journal, Vol. 5, pp. 207–243, 247–290 (in Russian).
Liu, Q., Lee, J. D., and Jordan, M. (2016). A kernelized Stein discrepancy for goodness-of-fit tests. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML'16, pages 276–284.
Nikitin, Y. Y. (2017). Tests based on characterizations, and their efficiencies: A survey. Acta et Commentationes Universitatis Tartuensis de Mathematica, 21(1):3–24.
Noughabi, H. A. (2015). Testing exponentiality based on the likelihood ratio and power comparison. Annals of Data Science, 2(2):195–204.
O'Reilly, F. J. and Stephens, M. A. (1982). Characterizations and goodness of fit tests. Journal of the Royal Statistical Society. Series B (Methodological), 44(3):353–360.
Peköz, E. A. and Röllin, A. (2011). New rates for exponential approximation and the theorems of Rényi and Yaglom. The Annals of Probability, 39(2):587–608.
Pinelis, I. (2017). Optimal-order uniform and nonuniform bounds on the rate of convergence to normality for maximum likelihood estimators. Electronic Journal of Statistics, 11(1):1160–1179.
Plubin, B. and Siripanich, P. (2017). An alternative goodness-of-fit test for a Gamma distribution based on the independence property. Chiang Mai Journal of Science, 44(3):1180–1190.
Prakasa Rao, B. L. S. (1979). Characterizations of distributions through some identities. Journal of Applied Probability, 16(4):903–909.
Proakis, J. G. and Salehi, M. (2008). Digital Communications, 5th Edition. McGraw-Hill, New York.
R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Reinert, G. and Röllin, A. (2010). Random subgraph counts and U-statistics: Multivariate normal approximation via exchangeable pairs and embedding. Journal of Applied Probability, 47(2):378–393.
Rogers, G. L. (2008). Multiple path analysis of reflectance from turbid media. Journal of the Optical Society of America A, 25(11):2879–2883.
Ross, N. (2011). Fundamentals of Stein's method.
Probability Surveys, 8:210–293.
Shah, A. and Gokhale, D. V. (1993). On maximum product of spacings (mps) estimation for Burr XII distributions. Communications in Statistics - Simulation and Computation, 22(3):615–641.
Singh, S. K. and Maddala, G. S. (1976). A function for size distribution of incomes. Econometrica, 44(5):963–970.
Singh, V. P. (1987). On application of the Weibull distribution in hydrology. Water Resources Management, 1(1):33–43.
Stein, C. (1986). Approximate computation of expectations. Lecture Notes - Monograph Series, 7, Institute of Mathematical Statistics.
Stein, C., Diaconis, P., Holmes, S., and Reinert, G. (2004). Use of exchangeable pairs in the analysis of simulations. In Stein's Method, edited by P. Diaconis and S. Holmes, volume 46 of Lecture Notes - Monograph Series, pages 1–25, Beachwood, Ohio, USA. Institute of Mathematical Statistics.
Tenreiro, C. (2019). On the automatic selection of the tuning parameter appearing in certain families of goodness-of-fit tests. Journal of Statistical Computation and Simulation, 89(10):1780–1797.
Torabi, H., Montazeri, N. H., and Grané, A. (2018). A wide review on exponentiality tests and two competitive proposals with application on reliability. Journal of Statistical Computation and Simulation, 88(1):108–139.
Villaseñor, J. A. and González-Estrada, E. (2015). A variance ratio test of fit for Gamma distributions. Statistics & Probability Letters, 96(C):281–286.
Volkova, K. Y. and Nikitin, Y. Y. (2015). Exponentiality tests based on Ahsanullah's characterization and their efficiency. Journal of Mathematical Sciences, 204(1):42–54.
Wingo, D. R. (1983). Maximum likelihood methods for fitting the Burr type XII distribution to life test data. Biometrical Journal, 25(1):77–84.
Ying, L. (2017). Stein's method for mean-field approximations in light and heavy traffic regimes. In SIGMETRICS 2017 Abstracts - Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems. Association for Computing Machinery, Inc.
Zardasht, V., Parsi, S., and Mousazadeh, M. (2015). On empirical cumulative residual entropy and a goodness-of-fit test for exponentiality. Statistical Papers, 56(3):677–688.
Zghoul, A. A. (2010). A goodness of fit test for normality based on the empirical moment generating function.