A Simple Algorithm for Exact Multinomial Tests∗

Johannes Resin
Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
Karlsruhe Institute of Technology, Karlsruhe, Germany
e-mail: [email protected]
Abstract:
This work proposes a new method for computing acceptance regions of exact multinomial tests. From this an algorithm is derived, which finds exact p-values for tests of simple multinomial hypotheses. Using concepts from discrete convex analysis, the method is proven to be exact for various popular test statistics, including Pearson's chi-square and the log-likelihood ratio. The proposed algorithm improves greatly on the naive approach using full enumeration of the sample space. However, its use is limited to multinomial distributions with a small number of categories, as the runtime grows exponentially in the number of possible outcomes. The method is applied in a simulation study and uses of multinomial tests in forecast evaluation are outlined. Additionally, properties of a test statistic using probability ordering, referred to as the "exact multinomial test" by some authors, are investigated and discussed. The algorithm is implemented in the accompanying R package ExactMultinom.

Keywords and phrases:
Acceptance regions, asymptotic approximation, forecast evaluation, goodness-of-fit test, hypothesis testing, log-likelihood ratio statistic, multinomial distribution, Pearson's chi-square statistic, probability mass statistic, power, R software.
1. Introduction
Multinomial goodness-of-fit tests feature prominently in the statistical literature and a wide range of applications. Tests relying on asymptotics have been available for a long time and have been rigorously studied all through the 20th century. The use of various test statistics has been investigated, with Pearson's chi-square and the log-likelihood ratio statistic being vital examples. These statistics are members of the general family of power divergence statistics (Cressie and Read, 1984). With the widespread availability of computing power, Monte Carlo simulations and exact methods have also gained popularity.

As regards exact tests of a simple null hypothesis against an unspecified alternative, Tate and Hyer (1973) and Kotze and Gokhale (1980) used the "exact multinomial test", which orders samples by probability, to assess the accuracy of asymptotic tests. In the words of Cressie and Read (1989), this "has provided much confusion and contention in the literature". In accordance with Gibbons and Pratt (1975) and Radlow and Alf (1975), they conclude that the asymptotic fit of a test should be assessed using the appropriate exact test based on the test statistic in question.

∗This work has been supported by the Klaus Tschira Foundation. I want to thank Tilmann Gneiting, Alexander Jordan and Sebastian Lerch for helpful comments, discussions and continued encouragement.

Fig 1. An acceptance region (red) at level α = 0. for the null π = ( , , ) and samples of size n = 50 with m = 3 categories. Only the points within the ball (blue) around the expectation (black point) have to be considered to find this acceptance region.
Nevertheless, the exact multinomial test is intuitively appealing and, as Kotze and Gokhale (1980) put it, "[i]n the absence of [...] a specific alternative, it is reasonable to assume that outcomes with smaller probabilities under the null hypothesis offer a stronger evidence for its rejection and should belong to the critical region". In Section 2, an asymptotic chi-square approximation to the exact multinomial test is derived and an exemplary comparison of popular test statistics in terms of power is provided.

Regardless of the test statistic used, calculating an exact p-value by fully enumerating the sample space is computationally challenging, as the test statistic and the probability mass function have to be evaluated at every possible sample, of which there are $\binom{n+m-1}{m-1} = O(n^{m-1})$ for samples of size n with m categories. An improvement on this method has been proposed by Bejerano, Friedman and Tishby (2004), and other, more elaborate approaches exist (see for example Baglivo, Olivier and Pagano, 1992; Hirji, 1997; Keich and Nagarajan, 2006). In this work, a new approach to exact multinomial tests is investigated.

The key observation underlying the proposed algorithm is that acceptance regions at arbitrary levels contain relatively few points, which are located in a neighborhood of the expected value under the null hypothesis, as illustrated in Figure 1. An acceptance region can be found by iteratively evaluating points within a ball of increasing radius around the expected value (w.r.t. the Manhattan distance). From this procedure, an algorithm for computing exact p-values is derived by finding the probability mass of the smallest acceptance region that does not contain an observation. If p-values below an arbitrary threshold are not calculated exactly, the runtime of the algorithm is guaranteed to be asymptotically faster than the approach using full enumeration, as the diameter of any acceptance region essentially grows at a rate proportional to the square root of the sample size. This is detailed and proven to work for various popular test statistics in Section 3.

Furthermore, the algorithm is illustrated to work well in applications detailed in Section 4. In particular, the algorithm's runtime is compared to the full enumeration method in a simulation study, and the resulting p-values are used to assess the fit of asymptotic chi-square approximations and investigate differences between several test statistics. As a further application, the use of multinomial tests to quantify the gravity of discrepancies in forecast probabilities and outcome frequencies within the so-called calibration simplex (Wilks, 2013) is outlined and justified.

The R programming language (R Core Team, 2020) has been used for all calculations throughout this work. An implementation of the proposed method is provided within the R package ExactMultinom available at the CRAN package repository (https://cran.r-project.org/).
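To make the cost of full enumeration concrete, the size of the sample space can be computed directly. The following minimal Python sketch (an illustration of the count stated above, not part of the ExactMultinom package; the function name is mine) evaluates the binomial coefficient $\binom{n+m-1}{m-1}$:

```python
import math

def sample_space_size(n: int, m: int) -> int:
    """Number of possible samples of size n with m categories,
    i.e. the cardinality of the discrete simplex: C(n+m-1, m-1)."""
    return math.comb(n + m - 1, m - 1)

# Growth is polynomial in n of degree m - 1, so full enumeration
# quickly becomes infeasible as the number of categories grows.
print(sample_space_size(50, 3))   # 1326 samples for n = 50, m = 3
print(sample_space_size(100, 5))  # several million samples already
```

For m = 3 the sample space remains tiny, but five categories at n = 100 already require millions of evaluations of both the test statistic and the pmf, which motivates restricting attention to a neighborhood of the expectation.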
2. A Brief Review of Testing a Simple Multinomial Hypothesis
Consider a multinomial experiment $X = (X_1, \ldots, X_m)$ summarizing $n \in \mathbb{N}$ i.i.d. trials with $m \in \mathbb{N}$ possible outcomes. Let
$$\Delta_{m-1} := \{ p \in [0,1]^m \mid p_1 + \ldots + p_m = 1 \}$$
denote the unit $(m-1)$-simplex or probability simplex and
$$\Delta^n_{m-1} := \{ x \in \mathbb{N}_0^m \mid x_1 + \ldots + x_m = n \}$$
the regular discrete $(m-1)$-simplex. The distribution of $X$ is characterized by a parameter $p = (p_1, \ldots, p_m) \in \Delta_{m-1}$ encoding the occurrence probabilities of the outcomes on any trial, or $X \sim \mathcal{M}_m(n, p)$ for short. The multinomial distribution $\mathcal{M}_m(n, p)$ is fully described by the probability mass function (pmf)
$$f_{n,p} : \Delta^n_{m-1} \to [0,1], \quad x \mapsto n! \prod_{j=1}^m \frac{p_j^{x_j}}{x_j!}.$$

Suppose that the true parameter $p$ is unknown. Consider the simple null hypothesis $p = \pi$ for some $\pi \in \Delta_{m-1}$. The agreement of a realization $x \in \Delta^n_{m-1}$ of $X$ with the null hypothesis is typically quantified by means of a test statistic $T : \Delta^n_{m-1} \times \Delta_{m-1} \to \mathbb{R}$. Given such a test statistic $T$ and presuming from now on that, w.l.o.g., high values of $T(x, \pi)$ indicate 'extreme' observations under the null distribution $P_\pi$, the p-value of $x$ is defined as the probability
$$p_T(x, \pi) := P_\pi( T(X, \pi) \geq T(x, \pi) ) \tag{1}$$
of observing an observation that is at least as extreme under the null hypothesis.

The family of power divergence statistics introduced by Cressie and Read (1984) offers a variety of test statistics for multinomial goodness-of-fit tests. It is defined as
$$T_\lambda(x, \pi) := \frac{2}{\lambda(\lambda+1)} \sum_{j=1}^m x_j \left( \left( \frac{x_j}{n\pi_j} \right)^{\lambda} - 1 \right) \quad \text{for } \lambda \in \mathbb{R} \setminus \{-1, 0\} \tag{2}$$
and as the pointwise limit in (2) for $\lambda \in \{-1, 0\}$. Notably, this includes Pearson's chi-square statistic
$$T_{\chi^2}(x, \pi) := \sum_{j=1}^m \frac{(x_j - n\pi_j)^2}{n\pi_j} = \sum_{j=1}^m \frac{x_j^2}{n\pi_j} - n = T_1(x, \pi)$$
as well as the log-likelihood ratio (or G-test) statistic
$$T_G(x, \pi) := 2 \log \frac{f_{n, x/n}(x)}{f_{n,\pi}(x)} = 2 \sum_{j=1}^m x_j \log \frac{x_j}{n\pi_j} = T_0(x, \pi).$$
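The identities relating $T_1$ to Pearson's statistic can be checked numerically. The following Python sketch (my own illustration; function names are not from the paper, and the $\lambda = -1$ limit is omitted for brevity) evaluates $T_\lambda$ and handles $\lambda = 0$ as the pointwise limit:

```python
import math

def power_divergence(x, pi, lam):
    """Cressie-Read power divergence statistic T_lambda for a sample x
    and null probabilities pi. lam = 1 gives Pearson's chi-square;
    lam = 0 is treated as the pointwise limit (the G statistic)."""
    n = sum(x)
    if lam == 0:
        # log-likelihood ratio statistic, with the convention 0 * log 0 = 0
        return 2.0 * sum(xj * math.log(xj / (n * pj))
                         for xj, pj in zip(x, pi) if xj > 0)
    return (2.0 / (lam * (lam + 1))) * sum(
        xj * ((xj / (n * pj)) ** lam - 1.0) for xj, pj in zip(x, pi))

def pearson(x, pi):
    """Pearson's chi-square statistic in its usual form."""
    n = sum(x)
    return sum((xj - n * pj) ** 2 / (n * pj) for xj, pj in zip(x, pi))

x, pi = (4, 3, 3), (0.5, 0.3, 0.2)
# T_1 coincides with Pearson's chi-square statistic
assert abs(power_divergence(x, pi, 1) - pearson(x, pi)) < 1e-12
# small lambda approaches the G statistic (continuity of the family)
assert abs(power_divergence(x, pi, 1e-6) - power_divergence(x, pi, 0)) < 1e-4
```

For the sample x = (4, 3, 3) under π = (0.5, 0.3, 0.2), $T_1$ evaluates to 0.7, matching $\sum_j (x_j - n\pi_j)^2 / (n\pi_j)$ term by term.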
Under a null hypothesis with $\pi_i > 0$ for $i = 1, \ldots, m$, every power divergence statistic is asymptotically chi-square distributed with $m - 1$ degrees of freedom. Let
$$\bar f_{n,p} : \{ x \in \mathbb{R}^m_{\geq 0} \mid x_1 + \ldots + x_m = n \} \to \mathbb{R}, \quad x \mapsto \Gamma(n+1) \prod_{j=1}^m \frac{p_j^{x_j}}{\Gamma(x_j + 1)}$$
denote the continuous extension of the pmf $f_{n,p}$ to the convex hull of the discrete simplex $\Delta^n_{m-1}$ and define the probability mass test statistic as
$$T_P(x, \pi) := -2 \log \frac{f_{n,\pi}(x)}{\bar f_{n,\pi}(n\pi)}.$$
Obviously, the choice of strictly decreasing transformation does not affect the (exact) p-value given by (1) for $T = T_P$. The following theorem gives rise to an asymptotic approximation of p-values derived from the probability mass test statistic, which has not been studied previously. In the simulation study of Section 4, the fit of this approximation is assessed empirically using exact p-values calculated with the new method for samples of size n = 100 with m = 5 categories.

Theorem 1. If $X \sim \mathcal{M}_m(n, \pi)$ follows a multinomial distribution with $n \in \mathbb{N}$ and $\pi \in \Delta_{m-1}$ such that $\pi_j > 0$ for $j = 1, \ldots, m$, then $T_P(X, \pi)$ converges in distribution to a chi-square distribution $\chi^2_{m-1}$ with $m - 1$ degrees of freedom as $n \to \infty$.

Proof.
By Lemma 7 (in Appendix A), the difference between the log-likelihood ratio and the probability mass statistic is
$$T_P(X, \pi) - T_G(X, \pi) = \sum_{j=1}^m \left( \log \frac{X_j}{n\pi_j} + O(1/X_j) - O(1/n) \right).$$
Clearly, the bounded terms converge to zero in probability, and the $\log \frac{X_j}{n\pi_j}$ terms converge to zero in probability by the continuous mapping theorem. Hence, the probability mass statistic has the same asymptotic distribution as the log-likelihood ratio statistic. □

As outlined in the introduction, acceptance regions are of major importance to the idea pursued in this work. Given a test statistic $T$, the acceptance region at level $\alpha > 0$ is
$$A^T_{n,\pi}(\alpha) := \{ x \in \Delta^n_{m-1} \mid p_T(x, \pi) \geq \alpha \}.$$
Equivalently, the acceptance region can be written as the sublevel set of $T(\cdot, \pi)$ at any $(1-\alpha)$-quantile $t_{1-\alpha}$ of $T(X, \pi)$ under the null hypothesis $X \sim \mathcal{M}_m(n, \pi)$, i.e.,
$$A^T_{n,\pi}(\alpha) = \{ x \in \Delta^n_{m-1} : T(x, \pi) \leq t_{1-\alpha} \}.$$
By construction, the probability mass test statistic often yields a smallest acceptance region, because it assigns the samples with largest probabilities to the acceptance region. This is the case precisely if $P_\pi(X \in A^{T_P}_{n,\pi}(\alpha)) - (1 - \alpha) < \min_{x \in A^{T_P}_{n,\pi}(\alpha)} P_\pi(X = x)$. If tests are randomized to ensure equal level and size of the test, this property can be refined to yield an optimality property of the probability mass test's critical function.
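On a small example, the sublevel-set characterization of the acceptance region can be verified by brute force. The sketch below is my own illustration (it uses full enumeration, i.e. the naive baseline rather than the paper's algorithm, with an arbitrarily chosen null): it computes $T_P$ via log-gamma functions, derives exact p-values from Eq. (1), and checks that $\{x : p_{T_P}(x, \pi) \geq \alpha\}$ is a sublevel set of $T_P$ with null coverage at least $1 - \alpha$.

```python
import math

def simplex(n, m):
    """All nonnegative integer m-tuples summing to n."""
    if m == 1:
        yield (n,)
        return
    for k in range(n + 1):
        for rest in simplex(n - k, m - 1):
            yield (k,) + rest

def log_pmf(x, p):
    """Log multinomial pmf via lgamma (x! = Gamma(x + 1))."""
    n = sum(x)
    return (math.lgamma(n + 1)
            + sum(xj * math.log(pj) - math.lgamma(xj + 1)
                  for xj, pj in zip(x, p)))

def t_p(x, pi):
    """Probability mass statistic T_P(x, pi) = -2 log(f(x) / fbar(n*pi)),
    where fbar extends the pmf continuously (n*pi need not be integer)."""
    n = sum(x)
    log_fbar_mode = (math.lgamma(n + 1)
                     + sum(n * pj * math.log(pj) - math.lgamma(n * pj + 1)
                           for pj in pi))
    return -2.0 * (log_pmf(x, pi) - log_fbar_mode)

pi, n, alpha = (0.5, 0.3, 0.2), 20, 0.05
pts = list(simplex(n, len(pi)))
prob = {x: math.exp(log_pmf(x, pi)) for x in pts}
stat = {x: t_p(x, pi) for x in pts}

def p_value(x):
    """Exact p-value by full enumeration, Eq. (1)."""
    return sum(prob[y] for y in pts if stat[y] >= stat[x] - 1e-12)

region = {x for x in pts if p_value(x) >= alpha}
coverage = sum(prob[x] for x in region)
assert coverage >= 1 - alpha - 1e-9   # acceptance region holds the level
# the region is a sublevel set of T_P: every accepted statistic value
# lies strictly below every rejected one
assert max(stat[x] for x in region) < min(stat[y] for y in pts if y not in region)
```

Note that the expectation (10, 6, 4) attains $T_P = 0$ and p-value 1, while an extreme point such as (20, 0, 0) falls far outside the region, matching the picture in Figure 1.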
Figure 2 illustrates acceptance regions for different test statistics.

In Section 3, it will be shown that acceptance regions of the chi-square, log-likelihood ratio and probability mass test statistic all grow at a rate $O(n^{(m-1)/2})$, as their diameter grows at a rate $O(\sqrt{n})$ if $\alpha > 0$.

The power function of a test $T$ of the null hypothesis $p = \pi$ at level $\alpha$ is
$$\Delta_{m-1} \to [0,1], \quad p \mapsto 1 - P_p( X \in A^T_{n,\pi}(\alpha) ),$$
which is the probability of rejecting the null hypothesis at level $\alpha$ if the true parameter is $p$. The size of a test is its power at $p = \pi$. A test $T$ is said to be unbiased (for the null $p = \pi$ at level $\alpha$) if its power is minimized at $p = \pi$.
Fig 2. Acceptance regions of probability mass (left), chi-square (center) and log-likelihood ratio (right) statistics at level α = 0. for n = 50 and π = ( , , ). The regions contain 108, 111 and 111 points, respectively (left to right). The tests are of size 0., 0. and 0., respectively. The color gradient represents (null) probabilities within the regions.

In case of the uniform null hypothesis, i.e., $\pi = (\frac{1}{m}, \ldots, \frac{1}{m})$, Cohen and Sackrowitz (1975, Theorem 2.1) proved that the power function increases away from $p = \pi$ for test statistics of the form
$$T(x) = \sum_{j=1}^m h(x_j)$$
if $h$ is a convex function. They concluded that tests based on the chi-square and the log-likelihood ratio test statistic are unbiased for the uniform null hypothesis. As a corollary to their theorem, it shall be noted that this also applies to the probability mass test statistic.

Corollary 2 (to Cohen and Sackrowitz, 1975, Theorem 2.1). The probability mass test is unbiased for the uniform null hypothesis $p = \pi = (\frac{1}{m}, \ldots, \frac{1}{m})$.

Proof. Since the probability mass statistic can be written as
$$T_P(x, \pi) = 2 \sum_{j=1}^m \left( \log \Gamma(x_j + 1) - x_j \log \pi_j - \log \frac{\Gamma(n\pi_j + 1)}{\pi_j^{n\pi_j}} \right),$$
this is an immediate consequence of the fact that the Gamma function is logarithmically convex on the positive real numbers, which is part of a characterization given by the Bohr-Mollerup theorem (Beals and Wong, 2010, Theorem 2.4.2). □

Many authors (e.g., West and Kempthorne, 1972; Cressie and Read, 1984; Wakimoto, Odaka and Kang, 1987; Pérez and Pardo, 2003) have conducted small sample studies to investigate the power of chi-square, log-likelihood ratio and other tests. In conducting these studies, π, n and α need to be chosen, all of which influence the resulting power function. Furthermore, it is frequently
Resin/A Simple Algorithm for Exact Multinomial Tests infeasible to assess the power function across all alternatives and so alternativesof interest need to be picked.Therefore, most of these studies focused on the case of the uniform null hy-pothesis. In this case, the chi-square test has greater power for alternativesthat assign a large proportion of the probability mass to relatively few classes,whereas the log-likelihood ratio test has greater power for alternatives that as-sign considerable probability mass to many classes (see also Koehler and Larntz,1980).In the ternary case, that is, if m = 3, comparisons on the full probability sim-plex are visually accessible. Figure 3 illustrates, which of the three test statisticsyields the highest and lowest power across the full ternary probability simplex.As the actual size of the test varies with the choice of the test statistic due tothe actual size of a test frequently being smaller than the level α , the resultingpower functions are difficult to compare directly.To account for this, the tests are randomized to ensure that their respectivesize matches the level. For a test T and level α , let s n,π ( T, α ) = 1 − P π ( T ( X ) ∈ A Tn,π ( α )) denote the actual size of the test. The critical function φ : ∆ nm − → [0 , , x (cid:55)→ , if T ( x, π ) < t − α , α − s n,π ( T,α ) P π ( T ( X )= t − α ) , if T ( x, π ) = t − α , , if T ( x, π ) > t − α , defines a randomized test , which rejects the null hypothesis with probability φ ( x ) if x is observed. The power function of the randomized version of a test T at level α is p (cid:55)→ (cid:88) x ∈ ∆ nm − φ ( x ) P p ( X = x ) = 1 − (cid:88) x ∈ A Tn,π ( α ) (1 − φ ( x )) P p ( X = x ) . 
With this, the probability mass test minimizes the acceptance region in the sense that it minimizes the sum
$$\sum_{x \in \Delta^n_{m-1}} (1 - \varphi(x))$$
across all randomized tests $\varphi$ with $\sum_x \varphi(x) f_{n,\pi}(x) = \alpha$.

Figure 3 suggests that the probability mass test and the log-likelihood ratio test for the uniform null hypothesis are the same. However, this is not generally true, as for other choices of α (e.g., α = 0.13, for which coincidentally the probability mass statistic yields the same acceptance region as the chi-square statistic) the acceptance regions differ and so do the power functions.

Figure 4 quantitatively compares powers along alternatives of the form
$$p(q, i) = (\tilde q \pi_1, \ldots, \tilde q \pi_{i-1}, q, \tilde q \pi_{i+1}, \ldots, \tilde q \pi_m) \in \Delta_{m-1} \quad \text{with } \tilde q = \frac{1 - q}{1 - \pi_i}.$$

Randomized tests like this traditionally arise in the theory of uniformly most powerful tests, see for example Lehmann and Romano (2005, Chapter 3).
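The size-matching property of the critical function can be verified numerically. The sketch below is my own illustration with an arbitrarily chosen null and Pearson's statistic: it locates a $(1-\alpha)$-quantile $t_{1-\alpha}$ by full enumeration, builds $\varphi$, and checks that the randomized size equals $\alpha$ while the non-randomized size falls below it.

```python
import math

def simplex(n, m):
    """All nonnegative integer m-tuples summing to n."""
    if m == 1:
        yield (n,)
        return
    for k in range(n + 1):
        for rest in simplex(n - k, m - 1):
            yield (k,) + rest

def pmf(x, p):
    """Multinomial pmf with an exact integer multinomial coefficient."""
    n = sum(x)
    c = math.factorial(n)
    for xj in x:
        c //= math.factorial(xj)
    out = float(c)
    for xj, pj in zip(x, p):
        out *= pj ** xj
    return out

def chisq(x, pi):
    n = sum(x)
    return sum((xj - n * pj) ** 2 / (n * pj) for xj, pj in zip(x, pi))

pi, n, alpha = (0.5, 0.3, 0.2), 15, 0.1
pts = list(simplex(n, len(pi)))
prob = {x: pmf(x, pi) for x in pts}
stat = {x: chisq(x, pi) for x in pts}

# smallest statistic value t with P(T <= t) >= 1 - alpha
cum = 0.0
for x in sorted(pts, key=lambda y: stat[y]):
    cum += prob[x]
    if cum >= 1 - alpha:
        t = stat[x]
        break

p_le = sum(prob[x] for x in pts if stat[x] <= t)
p_eq = sum(prob[x] for x in pts if stat[x] == t)
s = 1.0 - p_le                      # size of the non-randomized test
phi_boundary = (alpha - s) / p_eq   # rejection probability on {T = t}

def phi(x):
    """Critical function: reject below/at/above the quantile."""
    if stat[x] < t:
        return 0.0
    if stat[x] == t:
        return phi_boundary
    return 1.0

size_randomized = sum(phi(x) * prob[x] for x in pts)
assert s <= alpha + 1e-12
assert abs(size_randomized - alpha) < 1e-9
```

The randomization only redistributes mass on the boundary set $\{T = t_{1-\alpha}\}$, so the size rises from $s_{n,\pi}(T, \alpha)$ to exactly $\alpha$ without altering decisions at any other sample.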
[Figure 3: ternary simplex maps with panels labeled "Highest power", "Lowest Power" and "Highest power"; the plotted point data is not recoverable from the extraction.]
llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll l Lowest Power lll lll llll lllll llllll lllllll llllllll lllllllll llllllllll lllllllllll llllllllllll lllllllllllll llllllllllllll lllllllllllllll llllllllllllllll lllllllllllllllll llllllllllllllllll lllllllllllllllllll llllllllllllllllllll lllllllllllllllllllll llllllllllllllllllllll lllllllllllllllllllllll llllllllllllllllllllllll lllllllllllllllllllllllll llllllllllllllllllllllllll lllllllllllllllllllllllllll llllllllllllllllllllllllllll lllllllllllllllllllllllllllll llllllllllllllllllllllllllllll lllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllll 
llllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll l lll ChisqLLRProb
Fig 3. Ternary plots indicating the test statistic with the highest power (left) and lowest power (right) for the uniform null hypothesis $\pi = (1/3, 1/3, 1/3)$ (top) and a second, non-uniform null hypothesis (bottom) for $n = 50$ and randomized tests of size $\alpha = 0.05$. Mixtures of colors indicate nearly equal powers (difference $< 0.0001$). For example, violet indicates nearly equal powers of the log-likelihood ratio (LLR) and probability mass (Prob) statistics. Black indicates areas where all powers are nearly equal.

Fig 4. Power functions along the alternatives $p(q, i)$, $i = 1, 2, 3$, for randomized tests of size $\alpha = 0.05$ of the null hypothesis $\pi$ and sample size $n = 50$.

Here, the alternative $p(q, i) = (1 - q)\pi + q e_i$ for $i = 1, \ldots, m$ and $q \in [0, 1]$ interpolates between $\pi$ and a corner of the probability simplex. The figures illustrate that in the case $n = 50$ and $\alpha = 0.05$, the log-likelihood ratio test, arguably, does not show any visible bias, whereas the chi-square test shows the most bias. The power function of the probability mass test lies in between the other power functions across most of the probability simplex, and so the probability mass test might serve as a good compromise in terms of power.
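To make the power comparison concrete, the following is a minimal Python sketch (my own illustration, not the paper's R/C++ implementation; all function names are hypothetical). It computes, by full enumeration of the discrete sample space, the power of the non-randomized exact test based on the probability mass statistic for $m = 3$ categories.

```python
from math import factorial

def pmf(x, p):
    """Multinomial probability mass function of counts x under probabilities p."""
    n = sum(x)
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)
    res = float(coef)
    for xi, pi in zip(x, p):
        res *= pi ** xi
    return res

def simplex(n):
    """All points of the discrete simplex (m = 3 categories, sample size n)."""
    return [(i, j, n - i - j) for i in range(n + 1) for j in range(n + 1 - i)]

def p_value_prob(x, null, points):
    """Exact p-value of the probability mass statistic: total null probability
    of all samples that are at most as likely as the observation x."""
    fx = pmf(x, null)
    return sum(pmf(z, null) for z in points if pmf(z, null) <= fx + 1e-15)

def power(null, alt, n, alpha):
    """Power at the alternative `alt` of the non-randomized exact test that
    rejects whenever the exact p-value is at most alpha."""
    points = simplex(n)
    return sum(pmf(z, alt) for z in points
               if p_value_prob(z, null, points) <= alpha)

if __name__ == "__main__":
    null = (1 / 3, 1 / 3, 1 / 3)
    print(power(null, (0.5, 0.3, 0.2), n=20, alpha=0.05))
```

Because this test is non-randomized, its size is at most $\alpha$; the randomized tests compared in the figures additionally reject with suitable probability on the boundary of the acceptance region to attain size exactly $\alpha$.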
3. Exact $p$-Values via Acceptance Regions

Throughout this section, $T$ is some test statistic and $m, n \in \mathbb{N}$ and $\pi \in \Delta_{m-1}$ are considered fixed. To ease notation, the subscripts in the pmf of the null distribution are omitted, i.e., $f = f_{n,\pi}$, and the test statistic $T$ is considered as a function on the sample space only, i.e., $T(\cdot) = T(\cdot, \pi)$. Let
$$d \colon \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}_{\geq 0}, \quad (x, y) \mapsto \tfrac{1}{2} \|x - y\|_1 = \tfrac{1}{2} \sum_j |x_j - y_j|$$
be a rescaled version of the Manhattan distance and
$$B_r(y) = \{x \in \Delta_{m-1}^n \mid d(x, y) \leq r\}$$
the ball with radius $r \in \mathbb{N}$ and center $y \in \Delta_{m-1}^n$. Furthermore, $e_i = (\delta_{ij})_{j=1}^m$ denotes the $i$-th vector of the standard basis of $\mathbb{R}^m$.

As alluded to in the introduction, an acceptance region $A = A_{n,\pi}^T(\alpha)$ for $\alpha \in (0, 1)$ can be found without enumerating all points of the sample space $\Delta_{m-1}^n$, but only considering points in some ball around the expected value for many test statistics. Specifically, if $T$ is weakly quasi M-convex, that is, for all distinct $x, y \in \Delta_{m-1}^n$ there exist indices $i, j \in \{1, \ldots, m\}$ such that $x_i > y_i$, $x_j < y_j$ and
$$T(x - e_i + e_j) \leq T(x) \quad \text{or} \quad T(y + e_i - e_j) \leq T(y),$$
the following theorem holds.

Theorem 3. Let $T$ be weakly quasi M-convex. Let $y \in \Delta_{m-1}^n$ and $r \in \mathbb{N}$ such that $\sum_{x \in B_r(y)} f(x) \geq 1 - \alpha$ for some $\alpha \in (0, 1)$. If there exists a subset $A \subseteq B_{r-1}(y)$ of the form $A = \{x \in B_r(y) \mid T(x) \leq t\}$ such that $\sum_{x \in A} f(x) \geq 1 - \alpha$, then the smallest such subset is the acceptance region $A_{n,\pi}^T(\alpha)$.

Hence, an acceptance region can be found by iteratively enumerating a ball of increasing radius with arbitrary center until a sublevel set with enough probability mass is found and this sublevel set remains unchanged upon further increasing the ball. This was illustrated in the introduction; see Figure 1 for an acceptance region of the probability mass statistic $T = T_P$.

The following proposition ensures that this approach can be applied to the chi-square, log-likelihood ratio and probability mass test statistics.

Proposition 4.
a) The probability mass test statistic $T_P$ is weakly quasi M-convex.
b) The power divergence test statistic $T_\lambda$ is weakly quasi M-convex if $\lambda \geq 0$.

Proof. Throughout the proof, let $x, y \in \Delta_{m-1}^n$ such that $x \neq y$ and define the index sets $S_+ := \{i \mid x_i > y_i\}$ and $S_- := \{j \mid x_j < y_j\}$.

a) Let $T = T_P$ and w.l.o.g. $T(x) \geq T(y)$. Then
$$T(y) - T(x) = -2 \log \frac{f(y)}{f(x)} = -2 \log \left( \prod_{i \in S_+} \frac{x_i!}{y_i!} \pi_i^{y_i - x_i} \cdot \prod_{j \in S_-} \frac{x_j!}{y_j!} \pi_j^{y_j - x_j} \right) = -2 \log \left( \prod_{i \in S_+} \prod_{k=1}^{x_i - y_i} \frac{y_i + k}{\pi_i} \cdot \prod_{j \in S_-} \prod_{k=1}^{y_j - x_j} \frac{\pi_j}{x_j + k} \right) \leq 0.$$
Both double products contain an equal number of multiplicands (since $\sum_j x_j = \sum_j y_j = n$) and are nonempty (since $x \neq y$). As the entire product is at least 1, there exist indices $i \in S_+$ and $j \in S_-$ and natural numbers $k_+ \leq x_i - y_i$ and $k_- \leq y_j - x_j$ such that the second inequality holds in
$$\frac{\pi_j}{x_j + 1} \geq \frac{\pi_j}{x_j + k_-} \geq \frac{\pi_i}{y_i + k_+} \geq \frac{\pi_i}{x_i}.$$
Therefore, the inequality
$$T(x - e_i + e_j) = T(x) - 2 \log \left( \frac{x_i}{\pi_i} \cdot \frac{\pi_j}{x_j + 1} \right) \leq T(x)$$
holds.

b) Let $T = T_\lambda$ and w.l.o.g. $T(x) \geq T(y)$. First, consider the case $\lambda > 0$. Note that
$$T(x) - T(y) = \frac{2}{\lambda(\lambda+1)} \left( \sum_{i \in S_+} \frac{x_i^{\lambda+1} - y_i^{\lambda+1}}{(n\pi_i)^\lambda} - \sum_{j \in S_-} \frac{y_j^{\lambda+1} - x_j^{\lambda+1}}{(n\pi_j)^\lambda} \right) \geq 0 \tag{3}$$
and
$$T(x - e_{i^*} + e_{j^*}) = T(x) - \frac{2}{\lambda(\lambda+1)} \, \frac{x_{i^*}^{\lambda+1} - (x_{i^*} - 1)^{\lambda+1}}{(n\pi_{i^*})^\lambda} + \frac{2}{\lambda(\lambda+1)} \, \frac{(x_{j^*} + 1)^{\lambda+1} - x_{j^*}^{\lambda+1}}{(n\pi_{j^*})^\lambda} \tag{4}$$
for $i^* \in S_+$, $j^* \in S_-$. If
$$i^* = \arg\max_{i \in S_+} \frac{x_i^{\lambda+1} - (x_i - 1)^{\lambda+1}}{(n\pi_i)^\lambda}, \qquad j^* = \arg\min_{j \in S_-} \frac{(x_j + 1)^{\lambda+1} - x_j^{\lambda+1}}{(n\pi_j)^\lambda}$$
and $d = d(x, y)$, then
$$\begin{aligned}
\frac{x_{i^*}^{\lambda+1} - (x_{i^*} - 1)^{\lambda+1}}{(n\pi_{i^*})^\lambda}
&= \frac{1}{d} \sum_{i \in S_+} \sum_{k=1}^{x_i - y_i} \frac{x_{i^*}^{\lambda+1} - (x_{i^*} - 1)^{\lambda+1}}{(n\pi_{i^*})^\lambda}
\geq \frac{1}{d} \sum_{i \in S_+} \sum_{k=1}^{x_i - y_i} \frac{x_i^{\lambda+1} - (x_i - 1)^{\lambda+1}}{(n\pi_i)^\lambda} \\
&\geq \frac{1}{d} \sum_{i \in S_+} \sum_{k=1}^{x_i - y_i} \frac{(x_i + 1 - k)^{\lambda+1} - (x_i - k)^{\lambda+1}}{(n\pi_i)^\lambda}
= \frac{1}{d} \sum_{i \in S_+} \frac{x_i^{\lambda+1} - y_i^{\lambda+1}}{(n\pi_i)^\lambda} \\
&\overset{(3)}{\geq} \frac{1}{d} \sum_{j \in S_-} \frac{y_j^{\lambda+1} - x_j^{\lambda+1}}{(n\pi_j)^\lambda}
= \frac{1}{d} \sum_{j \in S_-} \sum_{k=1}^{y_j - x_j} \frac{(x_j + k)^{\lambda+1} - (x_j + k - 1)^{\lambda+1}}{(n\pi_j)^\lambda} \\
&\geq \frac{1}{d} \sum_{j \in S_-} \sum_{k=1}^{y_j - x_j} \frac{(x_j + 1)^{\lambda+1} - x_j^{\lambda+1}}{(n\pi_j)^\lambda}
\geq \frac{1}{d} \sum_{j \in S_-} \sum_{k=1}^{y_j - x_j} \frac{(x_{j^*} + 1)^{\lambda+1} - x_{j^*}^{\lambda+1}}{(n\pi_{j^*})^\lambda}
= \frac{(x_{j^*} + 1)^{\lambda+1} - x_{j^*}^{\lambda+1}}{(n\pi_{j^*})^\lambda}.
\end{aligned} \tag{5}$$
Hence, $T(x) \geq T(x - e_{i^*} + e_{j^*})$ by equation (4).

For $\lambda = 0$, simply taking the limit (as $\lambda \to 0$) in the above equations with
$$i^* = \arg\max_{i \in S_+} \left( x_i \log \frac{x_i}{n\pi_i} - (x_i - 1) \log \frac{x_i - 1}{n\pi_i} \right), \qquad j^* = \arg\min_{j \in S_-} \left( (x_j + 1) \log \frac{x_j + 1}{n\pi_j} - x_j \log \frac{x_j}{n\pi_j} \right)$$
yields the desired inequality, since
$$\begin{aligned}
2 x_{i^*} \log \frac{x_{i^*}}{n\pi_{i^*}} - 2(x_{i^*} - 1) \log \frac{x_{i^*} - 1}{n\pi_{i^*}}
&= \lim_{\lambda \to 0} \frac{2}{\lambda(\lambda+1)} x_{i^*} \left( \Big( \frac{x_{i^*}}{n\pi_{i^*}} \Big)^\lambda - 1 \right) - \lim_{\lambda \to 0} \frac{2}{\lambda(\lambda+1)} (x_{i^*} - 1) \left( \Big( \frac{x_{i^*} - 1}{n\pi_{i^*}} \Big)^\lambda - 1 \right) \\
&= \lim_{\lambda \to 0} \frac{2}{\lambda(\lambda+1)} \left( \frac{x_{i^*}^{\lambda+1} - (x_{i^*} - 1)^{\lambda+1}}{(n\pi_{i^*})^\lambda} - 1 \right) \\
&\overset{(5)}{\geq} \lim_{\lambda \to 0} \frac{2}{\lambda(\lambda+1)} \left( \frac{(x_{j^*} + 1)^{\lambda+1} - x_{j^*}^{\lambda+1}}{(n\pi_{j^*})^\lambda} - 1 \right) \\
&= 2(x_{j^*} + 1) \log \frac{x_{j^*} + 1}{n\pi_{j^*}} - 2 x_{j^*} \log \frac{x_{j^*}}{n\pi_{j^*}}.
\end{aligned}$$

The rest of this section is devoted to the proof of Theorem 3. For further details on weak quasi M-convexity and discrete convex analysis in general, see Murota (2003).

Weakly quasi M-convex functions have the important property that their sublevel sets are weakly quasi M-convex sets (Murota and Shioura, 2003, Theorem 3.10). A subset $M \subseteq \Delta_{m-1}^n$ is weakly quasi M-convex if for all distinct $x, y \in M$ there exist indices $i, j \in \{1, \ldots, m\}$ such that $x_i > y_i$, $x_j < y_j$ and
$$x - e_i + e_j \in M \quad \text{or} \quad y + e_i - e_j \in M.$$
Equivalently, this can be characterized as follows.
Lemma 5.
A subset $M \subseteq \Delta_{m-1}^n$ is weakly quasi M-convex if and only if for all $x, y \in M$ and $d = d(x, y)$ there exists a sequence $x_0, x_1, \ldots, x_d \in M$ with $x_0 = x$, $x_d = y$ and $d(x_i, x_{i+1}) = 1$ for all $i = 0, 1, \ldots, d - 1$.

Proof. "$\Rightarrow$": By induction on $d$: Let $x, y \in M$ and $d = d(x, y)$. If $d = 0$, then $x = x_0 = y$ satisfies the condition. If $d > 0$, define $x_{d-1} = y + e_i - e_j$ for some $i, j$ such that $x_i > y_i$, $x_j < y_j$ and w.l.o.g. $x_{d-1} \in M$. Then $d(x_{d-1}, y) = 1$ and
$$d(x, x_{d-1}) = \frac{1}{2} \Bigg( \sum_{k \neq i,j} |x_k - y_k| + \underbrace{|x_i - (y_i + 1)|}_{= |x_i - y_i| - 1} + \underbrace{|x_j - (y_j - 1)|}_{= |x_j - y_j| - 1} \Bigg) = \frac{1}{2} \big( \|x - y\|_1 - 2 \big) = d - 1.$$
By the induction hypothesis, there exists a sequence $x_0, x_1, \ldots, x_{d-1} \in M$ connecting $x$ and $x_{d-1}$, such that $x_0 = x, x_1, \ldots, x_{d-1}, x_d = y \in M$ is the sought-after sequence.

"$\Leftarrow$": Let $x, y \in M$ and $d = d(x, y)$. Let $x_0, x_1, \ldots, x_d$ be a sequence as in the lemma. As $d(x, x_1) = 1$, there exist $i, j$ such that $x_1 = x - e_i + e_j$. Furthermore, $x_i > y_i$ and $x_j < y_j$, since
$$d - 1 = \sum_{l=1}^{d-1} d(x_l, x_{l+1}) \geq d(x_1, y) = \frac{1}{2} \Bigg( \sum_{k \neq i,j} |x_k - y_k| + |x_i - 1 - y_i| + |x_j + 1 - y_j| \Bigg),$$
which forces $|x_i - 1 - y_i| = |x_i - y_i| - 1$ and $|x_j + 1 - y_j| = |x_j - y_j| - 1$.

With this, the theorem can be proven as follows.
Proof of Theorem 3.
Let $t$ be minimal such that $A = \{x \in B_r(y) \mid T(x) \leq t\}$ has probability mass $\sum_{x \in A} f(x) \geq 1 - \alpha$ and $A \cap (B_r(y) \setminus B_{r-1}(y)) = \emptyset$ (i.e., $A \subseteq B_{r-1}(y)$). Furthermore, fix $a \in A$ such that $T(a) = t$.

Assume there exists some $b \in A_{n,\pi}^T(\alpha) \setminus A$, i.e., $T(b) \leq t$ and $b \notin B_r(y)$. Recall that the test statistic $T$ is weakly quasi M-convex and therefore the sublevel set $L = \{x \in \Delta_{m-1}^n \mid T(x) \leq t\}$ is weakly quasi M-convex. By Lemma 5, there exists a sequence $a = a_0, a_1, \ldots, a_d = b \in L$ with $d = d(a, b)$ and $d(a_i, a_{i+1}) = 1$ for all $i = 0, 1, \ldots, d - 1$. By the triangle inequality, $d(a_i, y) - 1 \leq d(a_{i+1}, y) \leq d(a_i, y) + 1$. Thus, there exists some $j \in \{1, \ldots, d - 1\}$ such that $d(a_j, y) = r$, a contradiction (as $T(a_j) \leq t$ and $a_j \in B_r(y) \setminus B_{r-1}(y)$). Therefore, $A_{n,\pi}^T(\alpha) \subseteq A$ and, hence, $A = A_{n,\pi}^T(\alpha)$.

$p$-Values

As described in the previous subsection, an acceptance region can be determined by starting at some arbitrary point and increasing the radius of a ball around this point until the acceptance region is found using the criterion provided by Theorem 3. Obviously, a point that is not within the acceptance region is not a practical starting point and, ideally, one would like to start at the center of the acceptance region to minimize the necessary iterations and the number of points for which to evaluate the pmf and the test statistic. The expected value $E X = n\pi$ of the multinomial distribution, which is the center of mass of all probability-weighted points in the discrete simplex, is known and must be close to the center of mass of the acceptance region, as the acceptance region contains most of the mass. Therefore, a sample point close to the expected value should serve as a good starting point.

The $p$-value of an observation $x$ can be found by calculating the total probability of the largest acceptance region not containing the observation. However, this region can be large if the $p$-value of the observation is very small. To avoid this, Algorithm 1 does not calculate very small $p$-values precisely, but only determines precise $p$-values above a certain threshold $\theta$ and otherwise states that the $p$-value is smaller than the threshold $\theta$. Figure 5 illustrates the points evaluated by Algorithm 1 for samples with $p$-value greater, respectively smaller, than some threshold $\theta$.

Algorithm 1
Calculate exact $p$-value above some threshold.

Require: Observation $x \in \Delta_{m-1}^n$, hypothesis $\pi \in \Delta_{m-1}$, threshold $0 < \theta \ll 1$
Ensure: Exact $p$-value $p \in [\theta, 1]$, or 0 if the $p$-value is less than $\theta$.

  Calculate $y \in \Delta_{m-1}^n$ minimizing $d(y, E_\pi X)$
  if $T(x) \leq T(y)$ then
    Set $y = x$
  end if
  Initialize $r = 1$, SumProb $= 0$
  repeat
    Add $f(z)$ to SumProb for points $z \in B_r(y) \setminus B_{r-1}(y)$ with $T(z) < T(x)$
    Increment $r = r + 1$
  until $T(x) \leq \min\{T(z) \mid d(y, z) = r\}$ or SumProb $> 1 - \theta$
  if SumProb $\leq 1 - \theta$ then
    return $1 -$ SumProb
  else
    return 0
  end if

Enumeration of the full sample space can be implemented using a simple recursion. A similar, more complicated recursive scheme can be employed to enumerate the samples at a given radius $r$ in the repeat-loop of Algorithm 1. This is implemented in the R package ExactMultinom using a C++ subroutine to allow for fast recursions.

As mentioned in the introduction, algorithms for calculating exact multinomial tests superior to the full enumeration method have been proposed in the literature. However, readily available open source implementations of these methods apparently do not exist. There are two packages implementing exact multinomial tests using full enumeration of the sample space in R, namely,
EMT (Menzel, 2013) and
XNomial (Engels, 2015). Whereas
EMT is written purely in R, the function xmulti of the
XNomial package implements the full enumeration method using an efficient C++ subroutine for the recursion, which makes it considerably faster. Therefore, xmulti was selected as the reference method.

Fig 5. Samples in $\Delta_2^{50}$ for which the probability mass and test statistic are evaluated given the green observations $x = (4, \cdot, \cdot)$ (left) and $x = (10, \cdot, \cdot)$ (right) under the null hypothesis $\pi$ and $T = T_P$. The $p$-values are 0.3049 (left) and less than the threshold $\theta$ (right). The colored region on the left indicates the smallest acceptance region not containing the observed sample. The color gradient represents (null) probabilities within the regions.

In the implementation of Algorithm 1, the $p$-values for the chi-square, log-likelihood ratio and probability mass test statistics are computed simultaneously, as in xmulti, and so comparability is ensured. The current implementation of Algorithm 1 accurately finds $p$-values only down to a certain small order of magnitude; smaller $p$-values will often lead to negative output because of limited computational precision in the addition of many floating point numbers.
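For illustration, the ball-growing strategy of Algorithm 1 can be sketched in a few lines of Python for the probability mass statistic and $m = 3$ (my own simplified stand-in with hypothetical names: shells are found by filtering a precomputed point list rather than by the package's recursive C++ enumeration, which forfeits the runtime advantage but keeps the logic visible, and none of the package's heuristics are reproduced).

```python
from math import factorial

def pmf(x, p):
    """Multinomial pmf of counts x under category probabilities p."""
    n = sum(x)
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)
    res = float(coef)
    for xi, pi in zip(x, p):
        res *= pi ** xi
    return res

def simplex(n):
    """All points of the discrete simplex with m = 3 categories."""
    return [(i, j, n - i - j) for i in range(n + 1) for j in range(n + 1 - i)]

def dist(x, y):
    """Rescaled Manhattan distance d(x, y) = ||x - y||_1 / 2."""
    return sum(abs(a - b) for a, b in zip(x, y)) / 2

def p_value_ball(x, null, n, theta=1e-8):
    """Exact p-value of the probability mass statistic via growing balls.

    The mass of samples strictly more likely than x (smaller statistic) is
    accumulated shell by shell around a starting point near the expectation;
    the loop stops once a shell contains no such sample."""
    points = simplex(n)
    fx = pmf(x, null)
    center = tuple(n * pi for pi in null)
    y = min(points, key=lambda z: dist(z, center))
    if fx >= pmf(y, null):      # T(x) <= T(y): start at x itself
        y = x
    sum_prob = pmf(y, null) if pmf(y, null) > fx + 1e-15 else 0.0
    r = 1
    while True:
        shell = [z for z in points if dist(z, y) == r]
        better = [pmf(z, null) for z in shell if pmf(z, null) > fx + 1e-15]
        if not better:          # no sample on this shell beats x: done
            break
        sum_prob += sum(better)
        if sum_prob > 1 - theta:
            return 0.0          # p-value is below the threshold theta
        r += 1
    return 1.0 - sum_prob
```

The stopping rule mirrors the until-condition of Algorithm 1, and its validity rests on the weak quasi M-convexity of $T_P$ (Proposition 4); ties in the pmf, which the actual implementation must handle with care, are assumed away here.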
To ensure accurate results, I recommend not choosing $\theta$ too small with the current implementation.

During early runs of the simulation study described in Section 4, it was noticed that the runtime of Algorithm 1 tends to increase drastically if the null distribution contains very small probabilities, that is, if there exists some $i$ with $\pi_i \ll n^{-1}$. This is due to the acceptance regions becoming very flat and containing mostly points within a lower-dimensional face of the discrete simplex for such null hypotheses. In this case, $n$ is too small for Proposition 6 below to take effect. As a heuristic, which turned out to be an effective remedy, the implementation does not enumerate entire balls if $n \pi_i$ is small, but only considers points $z \in \Delta_{m-1}^n$ with small $z_i$, by skipping all points $z$ for which $P_\pi(X_i \geq z_i)$ is negligibly small relative to $\theta$.

The discrete simplex $\Delta_{m-1}^n$ contains $|\Delta_{m-1}^n| = \binom{n+m-1}{m-1}$ points, and so the full enumeration takes $O(n^{m-1})$ operations to compute a $p$-value. In comparison,
Fig 6. Log-log plot of runtime (in ms) for $m = 5$: runtime of the full enumeration method and of Algorithm 1 when enumerating a ball with probability mass $1 - \theta$, for two different null hypotheses $\pi$; reference lines indicate the rates $O(n^4)$ and $O(n^2)$. If the $p$-values of an observation are significantly larger than $\theta$, the runtime of Algorithm 1 considerably decreases. Times are mean values from 10 runs.

the acceptance regions at a fixed level $\alpha > 0$ contain only $O(n^{(m-1)/2})$ points, and this continues to hold for the smallest ball centered at the expected value containing the acceptance region, as proven by Proposition 6 below. Therefore, Algorithm 1 only takes $O(n^{(m-1)/2})$ operations to determine a $p$-value above the threshold $\theta$. Figure 6 shows runtime as a function of $n$ for $m = 5$. Whereas the runtime of the full enumeration method does not depend on the choice of $\pi$ and the observation $x$, the runtime of Algorithm 1 increases if the $p$-value of $x$ is small. Furthermore, the choice of $\pi$ also influences the runtime of Algorithm 1, with the uniform null hypothesis resulting in a longer runtime than sparse null hypotheses. This is further investigated in the simulation study in Section 4. As the runtime increases exponentially in $m$, Algorithm 1 is only feasible if the number of categories $m$ is small.

Proposition 6.
For $T \in \{T_{\chi^2}, T_{G^2}, T_P\}$, there exists $c > 0$ such that $A^T(\alpha) \subset B_{\sqrt{n} c}(n\pi)$ for sufficiently large $n$.

Proof. Consider the canonical extension $\bar{T}$ of $T$ to $\bar{\Delta}_{m-1}^n = \{x \in \mathbb{R}_{\geq 0}^m : x_1 + \ldots + x_m = n\}$ and let
$$\bar{B}_r(n\pi) = \{x \in \bar{\Delta}_{m-1}^n : \|x - n\pi\| \leq r\}$$
be a ball in $\bar{\Delta}_{m-1}^n$ with boundary $\partial \bar{B}_r(n\pi) = \{x \in \bar{\Delta}_{m-1}^n : \|x - n\pi\| = r\}$. Let $r_0 = \min_j \pi_j > 0$ and $n_0 \in \mathbb{N}$. If $n \geq n_0$, then every $x \in \partial \bar{B}_{\sqrt{n n_0}\, r_0}(n\pi)$ can be written as $x = x(n, x_0) := n\pi + \sqrt{n n_0}\,(x_0 - \pi)$ for some $x_0 \in \partial B_{r_0}(\pi)$.

Let $(t_{n, 1-\alpha})$ be the sequence of $(1-\alpha)$-quantiles of $T_n = T(X_n)$, $X_n \sim \mathcal{M}_m(n, \pi)$, for $n \in \mathbb{N}$. As $T_n$ converges to $\chi_{m-1}^2$ in distribution, the sequence of quantiles converges to the $(1-\alpha)$-quantile $\chi_{m-1, 1-\alpha}^2$ (cf. Van der Vaart, 1998, Lemma 21.2). Consequently, the maximum $t = \max_n t_{n, 1-\alpha}$ exists and the set $\bar{A}_n = \{x \in \bar{\Delta}_{m-1}^n : \bar{T}(x) \leq t\}$ contains the acceptance region $A_n^T(\alpha)$ for every $n$. As $\bar{T}$ is convex (Lemma 8 in Appendix B) and thus has convex sublevel sets, it suffices to show that $n_0$ can be chosen such that $\min_{x \in \partial \bar{B}_{\sqrt{n n_0}\, r_0}(n\pi)} \bar{T}(x)$ converges to a value $> t$, to ensure that $A_n^T(\alpha) \subset \bar{A}_n \subset \bar{B}_{\sqrt{n}(\sqrt{n_0}\, r_0)}(n\pi)$ for sufficiently large $n$.

In case $T = T_{\chi^2}$, observe that
$$\bar{T}(x(n, x_0)) = \sum_j \frac{(x_j(n, x_0) - n\pi_j)^2}{n\pi_j} = \sum_j \frac{n_0 (x_{0,j} - \pi_j)^2}{\pi_j}$$
does not depend on $n$, and so the canonical extension $\bar{T}$ of the chi-square statistic at radius $\sqrt{n n_0}\, r_0$ is bounded from below by $b(n_0) = \min_{x_0 \in \partial B_{r_0}(\pi)} \bar{T}(x(n, x_0))$. This bound becomes arbitrarily large as $n_0$ is increased.

In case $T = T_{G^2}$ or $T = T_P$, if $n_0$ is fixed, $\bar{T}(x(n, x_0))$ converges uniformly to $\bar{T}_{\chi^2}(x(n, x_0))$ for $x_0 \in \partial B_{r_0}(\pi)$ (Lemma 9 in Appendix B). Therefore, the minimum $\min_{x \in \partial \bar{B}_{\sqrt{n n_0}\, r_0}(n\pi)} \bar{T}(x)$ converges to $b(n_0)$, which exceeds $t$ if $n_0$ is chosen large enough.
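The $O(\sqrt{n})$ growth of the acceptance region's radius asserted by Proposition 6 is easy to check empirically for small $m$. The following Python sketch (my own check, with hypothetical names) measures the radius of the chi-square sublevel set $\{x : T_{\chi^2}(x) \leq t\}$ around $n\pi$ on the discrete simplex for increasing $n$.

```python
def chi2_stat(x, p):
    """Pearson chi-square statistic of counts x under null probabilities p."""
    n = sum(x)
    return sum((xi - n * pi) ** 2 / (n * pi) for xi, pi in zip(x, p))

def simplex(n):
    """All points of the discrete simplex with m = 3 categories."""
    return [(i, j, n - i - j) for i in range(n + 1) for j in range(n + 1 - i)]

def dist(x, y):
    """Rescaled Manhattan distance ||x - y||_1 / 2."""
    return sum(abs(a - b) for a, b in zip(x, y)) / 2

def sublevel_radius(n, p, t):
    """Largest distance from n*p to a sample point with statistic <= t."""
    center = tuple(n * pi for pi in p)
    return max(dist(z, center) for z in simplex(n) if chi2_stat(z, p) <= t)

if __name__ == "__main__":
    p = (0.4, 0.35, 0.25)
    t = 5.99  # approximately the 0.95-quantile of the chi-square law with 2 df
    for n in (50, 100, 200, 400):
        r = sublevel_radius(n, p, t)
        print(n, r, round(r / n ** 0.5, 3))  # r / sqrt(n) stays roughly constant
```

Quadrupling $n$ should roughly double the radius, so the number of points Algorithm 1 has to visit grows like $n^{(m-1)/2}$ rather than $n^{m-1}$.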
4. Application
In this section, the use of the new method is illustrated in a simulation study. On the one hand, this serves to show the improvements in runtime in comparison to the full enumeration method. On the other hand, it sheds some light on the fit of the asymptotic approximation to the probability mass test provided by Theorem 1 for a medium sample size ($n = 100$).

As a practical application, the use of exact multinomial tests to increase the information conveyed by the calibration simplex (Wilks, 2013), a graphical tool used to assess ternary probability forecasts, is outlined.

For the simulation study, pairs $(\pi^{(1)}, x^{(1)}), \ldots, (\pi^{(N)}, x^{(N)})$ of null hypothesis parameters and samples were generated as i.i.d. realizations of the random quantity $(P, X)$ with $P \sim \mathcal{U}(\Delta_{m-1})$ being uniformly distributed on the unit simplex and $X \mid P \sim \mathcal{M}_m(n, P)$. Then, for each pair, $p$-values were computed using various test statistics and algorithms. In this way, no specific null hypothesis has to be chosen and instead a wide variety is considered. As the samples are drawn from the null hypotheses, the $p$-values follow a uniform distribution on $[0, 1]$. A large number $N$ of such pairs with samples of size $n = 100$ drawn from multinomial distributions with $m = 5$ outcomes was generated. Exact $p$-values were computed using the implementation of Algorithm 1 provided by the accompanying R package for all pairs. To estimate the speedup achieved by the new method in this study, the full enumeration method provided by the xmulti function of the XNomial package (Engels, 2015) was applied to an initial subset of the pairs. Essentially, the computational cost of the full enumeration is constant, independent of the null hypothesis at hand and the resulting $p$-value, whereas the cost of Algorithm 1 increases as the $p$-value decreases and also varies with the null hypothesis.

Fig 7. Runtime against mean $p$-value in groups of 1000 samples with similar mean $p$-value. The solid line shows the mean runtime per group, whereas the dashed lines are the 5%- and 95%-quantiles. The gray line shows the mean runtime using full enumeration.

The implementation of Algorithm 1 took an average of 0.59 ms to calculate a $p$-value, whereas the full enumeration took 29.76 ms on average, and so execution of the new method was about 50 times as fast. Perhaps surprisingly, Monte Carlo estimation (using xmonte from XNomial, which simulates 10000 samples by default) took almost twice as long (53.49 ms) as the full enumeration. Figure 7 illustrates the connection between runtime and the size of the resulting $p$-values for the new method. As there are other factors influencing the runtime and, as described in the previous section, the implementation computes $p$-values for multiple statistics simultaneously, samples were ordered by their mean $p$-value $\bar{p}_T = (p_{T_P} + p_{T_{\chi^2}} + p_{T_{G^2}})/3$ and then put in groups of 1000 samples each with similar mean $p$-value (in particular, a group contains all samples between consecutive empirical quantiles of the mean $p$-value). Figure 8 shows the relative errors of the asymptotic approximations to the $p$-values for the three test statistics.

Given a test statistic $T$ and an asymptotic approximation $\tilde{p}_T = \tilde{p}_T(x, \pi)$ to the exact $p$-value $p_T = p_T(x, \pi)$, the relative error was calculated as $(\tilde{p}_T - p_T)/p_T$, that is, the deviation of the approximation from the exact value in parts of the exact value. It can be seen that the asymptotic approximation to the chi-square statistic is quite accurate in most cases, but tends to underestimate small $p$-values on average.

Fig 8. Relative errors of the asymptotic approximation for the probability mass (Prob), chi-square (Chisq) and log-likelihood ratio (LLR) test statistics.
The plots were obtained using the same grouping scheme as in Figure 7.

Fig 9. Histograms of asymptotic approximations to p-values for probability mass (Prob), chi-square (Chisq) and log-likelihood ratio (LLR) test statistic in black. The green lines indicate histograms of the respective exact p-values. The rightmost bar within the left histogram is not fully shown and extends further up to over 30000 counts.

Fig 10. Relative differences between exact p-values of probability mass (Prob), chi-square (Chisq) and log-likelihood ratio (LLR) test statistic against the mean of the compared p-values. The plots were obtained using the same grouping scheme as in Figure 7.

Pearson's chi-square and the log-likelihood ratio have been well studied, and the classical chi-square approximations can be improved by using moment corrections (see Cressie and Read, 1989, and references therein). Furthermore, the errors typically increase if some category has small expectation under the null hypothesis. The approximation to the probability mass p-values provided by Theorem 1 produces somewhat larger errors, especially for large p-values, and clearly overestimates the p-values. This is emphasized by the fact that within the simulation data only a vanishingly small number of p-values was slightly underestimated, all of which were well over 0.9. Figure 9 illustrates how these estimation errors influence the distribution of the resulting p-values. Whereas the exact p-values clearly follow a uniform distribution (indicated in green), the asymptotic p-values deviate from uniformity.
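The quantities compared above are straightforward to prototype. The following pure-Python sketch computes exact p-values by full enumeration for the three test statistics, the classical chi-square approximation for m = 3 (two degrees of freedom, survival function exp(−t/2)), and the relative error and relative difference measures. It is a minimal illustration of the definitions, not the paper's implementation; the helper names and the example pair (x, π) are mine.

```python
from math import exp, lgamma, log

def compositions(n, m):
    """Enumerate all m-tuples of nonnegative integers summing to n,
    i.e. the full support of a multinomial M_m(n, .)."""
    if m == 1:
        yield (n,)
        return
    for k in range(n + 1):
        for rest in compositions(n - k, m - 1):
            yield (k,) + rest

def log_pmf(x, pi):
    """Log of the multinomial probability mass function."""
    n = sum(x)
    return (lgamma(n + 1) - sum(lgamma(xj + 1) for xj in x)
            + sum(xj * log(pj) for xj, pj in zip(x, pi) if xj > 0))

def t_chisq(x, pi):
    """Pearson's chi-square statistic."""
    n = sum(x)
    return sum((xj - n * pj) ** 2 / (n * pj) for xj, pj in zip(x, pi))

def t_llr(x, pi):
    """Log-likelihood ratio statistic."""
    n = sum(x)
    return 2.0 * sum(xj * log(xj / (n * pj)) for xj, pj in zip(x, pi) if xj > 0)

def t_prob(x, pi):
    """Probability mass statistic: larger for less probable outcomes."""
    return -log_pmf(x, pi)

def exact_pvalue(x, pi, stat):
    """Exact p-value P(stat(X) >= stat(x)) under the null, by full enumeration."""
    t_obs = stat(x, pi) - 1e-12  # small tolerance against floating-point ties
    return sum(exp(log_pmf(y, pi)) for y in compositions(sum(x), len(x))
               if stat(y, pi) >= t_obs)

pi = (0.3, 0.3, 0.4)
x = (40, 25, 35)

p_chisq = exact_pvalue(x, pi, t_chisq)
p_llr = exact_pvalue(x, pi, t_llr)
p_prob = exact_pvalue(x, pi, t_prob)

# Asymptotic approximation for Pearson's statistic: chi-square with
# m - 1 = 2 degrees of freedom, whose survival function is exp(-t / 2).
p_chisq_asym = exp(-t_chisq(x, pi) / 2.0)

rel_error = (p_chisq_asym - p_chisq) / p_chisq          # approximation error
rel_diff = (p_chisq - p_llr) / ((p_chisq + p_llr) / 2)  # statistic disagreement
```

Full enumeration visits all C(n + m − 1, m − 1) points of the sample space regardless of the resulting p-value, which is why its cost is essentially constant and why it is only practical for small m.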
For the probability mass statistic, the asymptotic test is clearly conservative, whereas the asymptotic log-likelihood ratio test (and also the asymptotic chi-square test at small significance levels) is slightly anti-conservative.

Lastly, Figure 10 shows relative differences between exact p-values obtained with the three test statistics. Given test statistics T and T', the relative difference between the p-values p_T = p_T(x, π) and p_{T'} = p_{T'}(x, π) is calculated as (p_T − p_{T'})/p̄ with p̄ = (p_T + p_{T'})/2. It can be seen that the choice of test statistic can make quite a difference. A closer look at the simulation data revealed that these differences tend to be smaller if expectations for all categories are large under the null. To
Table 1. Exact p-values p_T and asymptotic p-values p̃_T of five randomly selected pairs (x, π). (Columns: π, p_{T_P}, p̃_{T_P}, p_{T_χ}, p̃_{T_χ}, p_{T_G}, p̃_{T_G}.)

provide some numerical insights, Table 1 lists exact and asymptotic p-values.

Turning to an application in forecast verification, consider a random variable X and a probabilistic forecast F for X. For an introduction to probabilistic forecasting in general, see Gneiting and Katzfuss (2014). A probabilistic forecast is said to be calibrated if the conditional distribution of the quantity of interest given a forecast coincides with the forecast distribution, that is,

X | F ∼ F    (6)

holds almost surely. Suppose now that X maps to one of three distinct outcomes only. Then, a probabilistic forecast is fully described by the probabilities it assigns to each outcome.

In this case, the calibration simplex (Wilks, 2013) can be used to graphically identify discrepancies between predicted probabilities and conditional outcome frequencies. Given i.i.d. realizations (f_1, x_1), ..., (f_N, x_N) consisting of forecast probabilities (vectors within the unit 2-simplex) and observed outcomes encoded 1, 2 and 3, forecast-outcome pairs with similar forecast probabilities are grouped according to a tessellation of the probability simplex. Thereafter, calibration is assessed by comparing average forecast probabilities and actual outcome frequencies within each group.

As illustrated in Figure 11, the calibration simplex is a graphical tool to conduct this comparison visually. The groups are determined by overlaying the probability simplex with a hexagonal grid. The circular dots correspond to nonempty groups of forecasts given by a hexagon. The dots' areas are proportional to the number of forecasts per group.
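The grouping-and-comparison step can be sketched in a few lines. For brevity, the hexagonal tessellation is replaced here by a simple rectangular rounding grid, so this is a simplified stand-in for Wilks's scheme rather than a reimplementation of it; all function names are hypothetical.

```python
from collections import defaultdict

def group_key(f, step=0.1):
    """Simplified grouping: round the first two forecast coordinates to a grid.
    (Wilks's calibration simplex uses a hexagonal tessellation instead.)"""
    return (round(f[0] / step), round(f[1] / step))

def summarize_groups(pairs):
    """pairs: iterable of (forecast probabilities, outcome in {0, 1, 2}).
    Returns {key: (count, mean forecast, observed outcome frequencies)}."""
    groups = defaultdict(list)
    for f, y in pairs:
        groups[group_key(f)].append((f, y))
    summary = {}
    for key, members in groups.items():
        n = len(members)
        mean_f = tuple(sum(f[j] for f, _ in members) / n for j in range(3))
        freq = tuple(sum(1 for _, y in members if y == j) / n for j in range(3))
        summary[key] = (n, mean_f, freq)
    return summary

pairs = [((0.50, 0.30, 0.20), 0), ((0.52, 0.28, 0.20), 1), ((0.10, 0.20, 0.70), 2)]
summary = summarize_groups(pairs)
```

Per group, the difference between the mean forecast and the observed outcome frequencies is exactly the displacement that the calibration simplex visualizes.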
A dot is shifted away from the center of the respective hexagon by a scaled version of the difference between average forecast probabilities and outcome frequencies. This provides valuable insight into the forecast's distribution and the conditional distribution of the quantity of interest. However, it is not apparent how big the differences may be merely by chance.

If the forecast is calibrated, then, by (6), the outcome frequencies x̄ within a group of size n with mean forecast f̄ follow a generalized multinomial distribution (the multinomial analog of the Poisson binomial distribution), that is, a convolution of multinomial distributions M(1, f_i) with parameters f_1, ..., f_n ∈
Fig 11. Calibration simplex with color-coded p-values using the probability mass statistic. This example evaluates a total of 21240 club soccer predictions by FiveThirtyEight (https://projects.fivethirtyeight.com/soccer-predictions/) for matches from September 2016 until April 2019. Outcomes are encoded as "home win", "draw" and "away win". Only groups containing at least ten forecasts are shown. Blue indicates a p-value p_{T_G} > 0.1, orange and red indicate successively smaller p-values, and black indicates p_{T_G} = 0. The error scale for the dot displacements ranges from −0.3 to 0.3.

∆_{m−1}. If these parameters deviate only little from their mean f̄ = (1/n) Σ_i f_i, then, presumably, the generalized multinomial distribution should not deviate much from a multinomial distribution with parameter f̄. Under this presumption, multinomial tests can be applied to quantify the discrepancy within each group through a p-value. As the number of outcomes m = 3 is small, exact p-values are efficiently computed by Algorithm 1 even for large sample sizes n. In Figure 11, p-values obtained from the log-likelihood ratio statistic are conveyed through a coloring scheme. Note that a p-value will only ever be exactly zero if an outcome is forecast to have zero probability and said outcome still realizes. Figure 11 was generated using the R package CalSim (Resin, 2020).

The calibration simplex can be seen as a generalization of the popular reliability diagram. In light of this analogy, the use of multinomial tests to assess the statistical significance of differences between predicted probabilities and observed outcome frequencies serves the same purpose as consistency bars in reliability diagrams, introduced by Bröcker and Smith (2007). Consistency bars are constructed using Monte Carlo simulation. To justify the above presumption, the multinomial p-values used to construct Figure 11 were compared to p-values calculated from 10000 Monte Carlo samples obtained from the generalized multinomial distributions.
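Sampling from a generalized multinomial distribution is immediate from its definition as a convolution of single-trial multinomials: draw one categorical outcome per forecast f_i and accumulate the counts. A minimal sketch under that definition (function names are mine):

```python
import random

def rgenmultinom(fs, rng):
    """One draw from the generalized multinomial distribution: the sum of
    independent single-trial multinomials M(1, f_i), one per forecast f_i."""
    counts = [0] * len(fs[0])
    for f in fs:
        u = rng.random()
        acc = 0.0
        for j, q in enumerate(f):
            acc += q
            if u <= acc:
                counts[j] += 1
                break
        else:  # guard against accumulated floating-point shortfall
            counts[-1] += 1
    return counts

rng = random.Random(7)
fs = [(0.50, 0.30, 0.20), (0.45, 0.35, 0.20), (0.55, 0.25, 0.20)]
draws = [rgenmultinom(fs, rng) for _ in range(2000)]
mean_first = sum(d[0] for d in draws) / len(draws)  # expectation is 0.50 + 0.45 + 0.55 = 1.50
```

Repeating such draws and recomputing the test statistic yields the Monte Carlo reference distribution against which the multinomial p-values can be checked.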
To this end, the standard deviation of the Monte Carlo p -values was estimated using the estimated p -value in place of the true general-ized multinomial p -value. Most of the multinomial p -values were quite close tothe Monte Carlo estimates with an absolute difference less than two standarddeviations, whereas two of them deviated on the order of 6 to 8 standard devia-tions from the Monte Carlo estimates, which nonetheless resulted in a relativelysmall absolute error. In particular, using the Monte Carlo estimated p -values didnot change Figure 11. As computation of the Monte Carlo estimates from thegeneralized multinomial distributions is computationally expensive, the multi-nomial p -values serve as a fast and adequate alternative. Further improvinguncertainty quantification within the calibration simplex is a subject for futurework.
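The standard deviation of a Monte Carlo p-value estimated from a fixed number of simulated samples is presumably the usual binomial proportion standard error, with the estimate plugged in for the unknown true p-value; under that assumption, the comparison described above can be sketched as:

```python
import math

def mc_pvalue_sd(p_hat, n_sim):
    """Estimated standard deviation of a Monte Carlo p-value estimate based on
    n_sim simulated samples, using p_hat in place of the unknown true p-value."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_sim)

def deviation_in_sds(p_multinomial, p_hat, n_sim):
    """How many estimated standard deviations the multinomial p-value lies
    from the Monte Carlo estimate of the generalized multinomial p-value."""
    return abs(p_multinomial - p_hat) / mc_pvalue_sd(p_hat, n_sim)

sd = mc_pvalue_sd(0.05, 10000)            # roughly 0.0022 for 10000 samples
z = deviation_in_sds(0.06, 0.05, 10000)   # deviation expressed in standard deviations
```

With 10000 samples, as used for the comparison above, even p-values around 0.05 are pinned down to a standard deviation of roughly 0.002, so deviations of 6 to 8 standard deviations can still correspond to small absolute errors.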
5. Concluding Remarks
A new method for calculating exact p-values was investigated. It has been illustrated that the new method works well when the number m of categories is small. This results in a concrete speedup in practical applications, as illustrated through a simulation study.

Regarding the choice of test statistic, the "exact multinomial test" was treated as a test statistic and the asymptotic distribution of the resulting probability mass statistic was derived. Like most prominent test statistics, the probability mass statistic yields unbiased tests for the uniform null hypothesis. It was shown that a randomized test based on the probability mass statistic can be characterized in that it minimizes the respective (weighted) acceptance region.

Although asymptotic approximations work well in many use cases, there are cases where these approximations are not adequate, for example, when dealing with small sample sizes or small expectations. On the other hand, there is nothing to be said against the use of exact tests whenever feasible, and their use is recommended in the applied literature (McDonald, 2009, p. 83) for samples of moderate size up to 1000. As the available implementations of exact multinomial tests in R use full enumeration, the new implementation increases the scope of exact multinomial tests for practitioners.

Appendix A: Difference Between Log-Likelihood Ratio and Probability Mass Statistic

Lemma 7.
Let π ∈ ∆_{m−1} with π_j > 0 for all j = 1, ..., m and x ∈ ∆^n_{m−1}. Then

T_P(x, π) − T_G(x, π) = Σ_{j=1}^m ( log(x_j) + 2r(x_j) − log(nπ_j) − 2r(nπ_j) )

for a function r on the positive real numbers for which 0 < r(x) < 1/(12x) for x > 0. In case x_j = 0 for some j = 1, ..., m, the above equality holds if log(0) + 2r(0) is understood to be 0.

Proof.
The logarithm of the Gamma function can be written as

log Γ(x + 1) = log(x Γ(x)) = x log(x) − x + (1/2) log(2π̃x) + r(x)

for a function r on the positive real numbers for which 0 < r(x) < 1/(12x) holds for all x > 0 (here π̃ denotes Archimedes' constant). This yields

log f̄_{n, y/n}(y) = log Γ(n + 1) + Σ_j ( y_j log(y_j/n) − log Γ(y_j + 1) )
  = log Γ(n + 1) + Σ_j ( y_j log(y_j/n) − y_j log(y_j) + y_j − (1/2) log(2π̃ y_j) − r(y_j) )
  = log Γ(n + 1) + n(1 − log n) − Σ_j ( (1/2) log(2π̃ y_j) + r(y_j) )

for y ∈ R^m_{>0} such that Σ_j y_j = n, and hence

T_P(x, π) − T_G(x, π) = 2 ( log f̄_{n,π}(nπ) − log f̄_{n, x/n}(x) )
  = 2 Σ_j ( (1/2) log(x_j/(nπ_j)) + r(x_j) − r(nπ_j) ).

Appendix B: Details for the Proof of Proposition 6
The following two lemmas provide further details not contained in the proof ofProposition 6 itself.
Lemma 8.
Using notation as in the proof of Proposition 6, x ↦ T̄(x) is convex.

Proof. The function x ↦ T̄_χ(x) = Σ_j x_j²/(nπ_j) − n is clearly convex as it is a sum of convex functions. The function x ↦ T̄_G(x) = 2 Σ_j ( x_j log(x_j) − x_j log(nπ_j) ) is convex since x ↦ x log(x) is convex (an elementary proof of this can be given using either the inequality of the arithmetic and geometric means or the second derivative). The function x ↦ T̄_P(x) = 2 ( log f̄_{n,π}(nπ) − log Γ(n + 1) + Σ_j log Γ(x_j + 1) − Σ_j x_j log(π_j) ) is convex as the Gamma function is logarithmically convex by the Bohr-Mollerup theorem (Beals and Wong, 2010, Theorem 2.4.2).

Lemma 9.
Using notation as in the proof of Proposition 6, the function ∂B_r(π) → R, x ↦ T̄(x(n, x)) converges uniformly to T̄_χ(x(n, x)) as n → ∞ if T = T_G or T = T_P.

Proof.
Let x ∈ ∂B_r(π) and define c = c(x) := √n (x − π). Hence |c_j| ≤ √n r < √n for all j = 1, ..., m. Consider first the case T = T_G. Then, using the Taylor expansion log(1 + x) = Σ_{k=1}^∞ (−1)^{k+1} x^k / k,

T̄(x(n, x)) = 2 Σ_{j=1}^m x(n, x)_j log( x(n, x)_j / (nπ_j) )
  = 2 Σ_j (nπ_j + √n c_j) log( (nπ_j + √n c_j) / (nπ_j) )
  = 2 Σ_j (nπ_j + √n c_j) Σ_{k=1}^∞ ((−1)^{k+1}/k) ( c_j / (√n π_j) )^k
  = 2 Σ_j ( √n c_j + c_j²/(2π_j) − c_j³/(2√n π_j²) + (nπ_j + √n c_j) Σ_{k=3}^∞ (−1)^{k+1} c_j^k / (k √n^k π_j^k) ).

As Σ_j c_j = 0 and 2 Σ_j c_j²/(2π_j) = T̄_χ(x(n, x)), the triangle inequality bounds |T̄_χ(x(n, x)) − T̄(x(n, x))| by the sum of the absolute values of the remaining terms. Substituting |c_j| ≤ √n r, the series converges for sufficiently large n to some C(n) by the ratio test, and C(n) decreases as n increases. As the resulting upper bound is independent of the choice of x, uniform convergence is ensured.

Using Lemma 7 in case T = T_P, the equality

|T̄_G(x(n, x)) − T̄(x(n, x))| = | Σ_{j=1}^m ( log( x(n, x)_j / (nπ_j) ) + 2r(x(n, x)_j) − 2r(nπ_j) ) |
  = | Σ_j ( log( (nπ_j + √n c_j) / (nπ_j) ) + 2r(nπ_j + √n c_j) − 2r(nπ_j) ) |

holds. Bounding the logarithm via |c_j| ≤ √n r and the remainder terms via 0 < r(x) < 1/(12x) yields an upper bound that is independent of the choice of x and converges to zero. Hence

T̄_χ − T̄ = (T̄_χ − T̄_G) + (T̄_G − T̄)

converges uniformly to zero as a function on ∂B_r(π) in the sense of the lemma.

References
Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, tenth printing ed. National Bureau of Standards Applied Mathematics Series. Dover Publishing.

Baglivo, J., Olivier, D. and Pagano, M. (1992). Methods for exact goodness-of-fit tests. Journal of the American Statistical Association.

Beals, R. and Wong, R. (2010). Special Functions: A Graduate Text. Cambridge University Press.

Bejerano, G., Friedman, N. and Tishby, N. (2004). Efficient exact p-value computation for small sample, sparse, and surprising categorical data. Journal of Computational Biology.

Bröcker, J. and Smith, L. A. (2007). Increasing the reliability of reliability diagrams. Weather and Forecasting.

Cohen, A. and Sackrowitz, H. B. (1975). Unbiasedness of the chi-square, likelihood ratio, and other goodness of fit tests for the equal cell case. The Annals of Statistics.

Cressie, N. and Read, T. R. C. (1984). Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society: Series B (Methodological).

Cressie, N. and Read, T. R. C. (1989). Pearson's X^2 and the loglikelihood ratio statistic G^2: A comparative review. International Statistical Review.

Engels, B. (2015). XNomial: Exact goodness-of-fit test for multinomial data with fixed probabilities. R package version 1.0.4. https://CRAN.R-project.org/package=XNomial.

Gibbons, J. D. and Pratt, J. W. (1975). P-values: Interpretation and methodology. The American Statistician.

Gneiting, T. and Katzfuss, M. (2014). Probabilistic forecasting. Annual Review of Statistics and Its Application.

Hirji, K. F. (1997). A comparison of algorithms for exact goodness-of-fit tests for multinomial data. Communications in Statistics - Simulation and Computation.

Keich, U. and Nagarajan, N. (2006). A fast and numerically robust method for exact multinomial goodness-of-fit test. Journal of Computational and Graphical Statistics.

Koehler, K. J. and Larntz, K. (1980). An empirical investigation of goodness-of-fit statistics for sparse multinomials. Journal of the American Statistical Association.

Kotze, T. J. V. W. and Gokhale, D. V. (1980). A comparison of the Pearson-X^2 and log-likelihood-ratio statistics for small samples by means of probability ordering. Journal of Statistical Computation and Simulation.

Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, third ed. Springer Texts in Statistics. Springer, New York.

McDonald, J. H. (2009). Handbook of Biological Statistics, second ed. Sparky House Publishing, Baltimore.

Menzel, U. (2013). EMT: Exact multinomial test: Goodness-of-fit test for discrete multivariate data. R package version 1.1. https://CRAN.R-project.org/package=EMT.

Murota, K. (2003). Discrete Convex Analysis. SIAM Monographs on Discrete Mathematics and Applications. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.

Murota, K. and Shioura, A. (2003). Quasi M-convex and L-convex functions - quasiconvexity in discrete optimization. Discrete Applied Mathematics.

Pérez, T. and Pardo, J. A. (2003). On choosing a goodness-of-fit test for discrete multivariate data. Kybernetes.

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Radlow, R. and Alf, E. F. J. (1975). An alternate multinomial assessment of the accuracy of the χ^2 test of goodness of fit. Journal of the American Statistical Association.

Resin, J. (2020). CalSim: The calibration simplex. R package version 0.5.0. https://CRAN.R-project.org/package=CalSim.

Tate, M. W. and Hyer, L. A. (1973). Inaccuracy of the X^2 test of goodness of fit when expected frequencies are small. Journal of the American Statistical Association.

Van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.

Wakimoto, K., Odaka, Y. and Kang, L. (1987). Testing the goodness of fit of the multinomial distribution based on graphical representation. Computational Statistics & Data Analysis.

West, E. N. and Kempthorne, O. (1972). A comparison of the chi-square and likelihood ratio tests for composite alternatives. Journal of Statistical Computation and Simulation.

Wilks, D. S. (2013). The calibration simplex: A generalization of the reliability diagram for three-category probability forecasts. Weather and Forecasting, 28.