Fast calculation of p-values for one-sided Kolmogorov-Smirnov type statistics
FFast calculation of p-values for one-sided
Kolmogorov-Smirnov type statistics
Amit MoscovichProgram in Applied and Computational Mathematics, Princeton [email protected] 11, 2020
Abstract
We present a method for computing exact p-values for a large family of one-sidedcontinuous goodness-of-fit statistics. This includes the higher criticism statistic, one-sided weighted Kolmogorov-Smirnov statistics, and the one-sided Berk-Jones statis-tics. For a sample size of 10,000, our method takes merely 0.15 seconds to run and itscales to sample sizes in the hundreds of thousands. This allows practitioners workingon genome-wide association studies and other high-dimensional analyses to use exactfinite-sample computations instead of statistic-specific approximation schemes.Our work has other applications in statistics, including power analysis, finding α -level thresholds for goodness-of-fit tests, and the construction of confidence bandsfor the empirical distribution function. The algorithm is based on a reduction to theboundary-crossing probability of a pure jump process and is also applicable to fieldsoutside of statistics, for example in financial risk modeling. Keywords:
Continuous goodness-of-fit, Higher criticism, Empirical process, Boundary cross-ing, Hypothesis testing 1 a r X i v : . [ s t a t . C O ] S e p Introduction
Let X , . . . , X n be n i.i.d. random variables drawn uniformly from the unit interval. Thispaper presents a fast O ( n ) algorithm for computing the one-sided non-crossing probability ,NCPROB( B , . . . , B n ) := Pr (cid:104) ∀ i : X ( i ) ≤ B i (cid:12)(cid:12) X , . . . , X n i.i.d. ∼ U [0 , (cid:105) . (1)where X (1) ≤ . . . ≤ X ( n ) are the order statistics of the unsorted sample X , . . . , X n . Aclosely related problem is the computation of one-sided non-crossing probabilities for theempirical cumulative distribution function (CDF). Given a function b : [0 , → R , this isthe probability thatPr (cid:104) ∀ t ∈ [0 ,
1] : b ( t ) ≤ n ˆ F n ( t ) (cid:12)(cid:12) X , . . . , X n i.i.d. ∼ U [0 , (cid:105) . (2)where ˆ F n ( t ) := n (cid:80) i ( X i ≤ t ). In fact, the problems of calculating (1) and (2) areequivalent. The calculation of the probability (2) can be reduced to (1) by setting thebounds B i to be the points where b ( t ) first passes the integer i . Conversely, the problemof calculating (1) can be reduced to the calculation of (2) by setting b ( t ) = (cid:80) i ( B i < t ).See Figure 1 for an illustration and Appendix A for a precise treatment of this reduction. b ( t )012345 0 B B B B Figure 1: A one-sided boundary function b ( t ) and a scaled empirical CDF n ˆ F n ( t ) with n = 5 samples. The empty circles mark the integer crossing points of b ( t ). It is easy tosee that n ˆ F n ( t ) does not cross b ( t ) if and only if for every i , ˆ F n ( B i ) ≥ i (equivalently that X ( i ) ≤ B i ). The filled circles at the integer crossings B , . . . , B n define a layer graph ofnon-crossing transitions of the empirical CDF.2 .1 Motivation The primary motivation for this work is the computation of p -values and power for a largefamily of one-sided continuous goodness-of-fit statistics. Examples include the Higher-Criticism statistic (Donoho and Jin, 2004), one-sided variants of the Kolmogorov-Smirnovstatistic (Kolmogorov, 1933; R´enyi, 1953; Eicker, 1979; Jaeschke, 1979; Mason and Schuen-emeyer, 1983; Jager and Wellner, 2004), variants of the one-sided Berk-Jones statistics(Berk and Jones, 1979; Jager and Wellner, 2005), φ -divergence statistics (Jager and Well-ner, 2007), tests based on local-level shape functions (Finner and Gontscharuk, 2018), andgGOF statistics (Zhang et al., 2020).All of these one-sided statistics have the general form S := max i =1 ,...,n s i ( F ( x ( i ) )) (3)where x (1) ≤ . . . ≤ x ( n ) are the order statistics of a sample that, under the null hypothesis,is drawn from some continuous distribution F , and s , . . . , s n : R → R are monotoneincreasing functions.Following the seminal work of Donoho and Jin (2004), this classical field has attractedsignificant renewed interest related to one-sided statistics for optimal detection of sparsesignals. Examples include the works of Hall and Jin (2010); Arias-Castro et al. (2011); Liand Siegmund (2015); Arias-Castro et al. (2020); Porter and Stewart (2020). In the nextsection, we describe how the computation of p -values and power of statistics of the form(3) is nothing but the calculation of the probability (1) for a suitably-chosen set of bounds.Thus, the fast and exact computation of the probability (1) is of fundamental importanceto statistics.An alternative to exact computation is the use of asymptotics. For the higher criticism,Berk-Jones, and some related statistics, the asymptotic distributions are known (Eicker,1979; Jaeschke, 1979; Wellner and Koltchinskii, 2003; Moscovich et al., 2016). Unfortu-nately, the convergence of the null distribution to its limiting form can be exceedingly slow(Gontscharuk et al., 2015), rendering the asymptotics inapplicable. More sophisticated ap-proximations were developed (for example, Li and Siegmund (2015)), but these are specificto a particular statistic and the quality of their approximation is difficult to analyze. Exactfinite-sample computations are generally preferable when they are fast enough.In the next subsections, we describe in detail how the computation of p -values andpower of one-sided statistics of the general form (3) naturally reduces to a calculation ofthe probability (1), we discuss a simple strategy for inverting the distribution of such astatistic to obtain α -level thresholds and mention various applications that involve thenon-crossing probability (1). Let S be a statistic of the maximum form (3). Clearly, S < s if and only if s i ( F ( x ( i ) )) < s forall i . Since s i and F are monotone non-decreasing, this occurs if and only if F ( x ( i ) ) < s − i ( s )3or all i . The distribution of the statistic S under the null hypothesis that X i i.i.d. ∼ F is thusPr (cid:104) S ≤ s (cid:12)(cid:12) X i i.i.d. ∼ F (cid:105) = Pr (cid:104) ∀ i : F ( X ( i ) ) ≤ s − i ( s ) (cid:12)(cid:12) X i i.i.d. ∼ F (cid:105) . (4)Define U i = F ( X i ) and note that under the assumption that X i ∼ F we have U i ∼ U [0 , U ( i ) = F ( X ( i ) ). This means we can rewrite Eq. (4) asPr (cid:104) ∀ i : U ( i ) ≤ s − i ( s ) (cid:12)(cid:12) U , . . . , U n i.i.d. ∼ U [0 , (cid:105) . (5)Thus the computation of the distribution of the statistic S reduces to the calculation ofthe probability in Eq. (1) with B i = s − i ( s ). The p -value of a the statistic S is given byp-value( s ) = Pr (cid:104) S > s (cid:12)(cid:12) X i i.i.d. ∼ F (cid:105) = 1 − NCPROB( s − , . . . , s − n ) . Computing the power of such a statistic against a known alternative that X , . . . , X n i.i.d. ∼ G similarly reduces to Eq. (1), since in that casePr (cid:104) S > s (cid:12)(cid:12) X i i.i.d. ∼ G (cid:105) = 1 − Pr (cid:104) ∀ i : s i ( F ( X ( i ) )) ≤ s (cid:12)(cid:12) X i i.i.d. ∼ G (cid:105) (6)= 1 − Pr (cid:104) ∀ i : G ( X ( i ) ) ≤ G ( F − ( s − i ( s ))) (cid:12)(cid:12) X i i.i.d. ∼ G (cid:105) (7)= 1 − Pr (cid:104) U ( i ) ≤ G ( F − ( s − i ( s ))) (cid:12)(cid:12) U i i.i.d. ∼ U [0 , (cid:105) (8)= 1 − NCPROB (cid:16) G (cid:0) F − ( s − ( s )) (cid:1) , . . . , G (cid:0) F − ( s − n ( s )) (cid:1)(cid:17) . (9) α -level thresholds Given α >
0, how can we pick a threshold to obtain an α -level test based on a statistic S ofthe form (3)? This is the threshold s n,α that satisfies Pr (cid:104) S > s n,α (cid:12)(cid:12) X i i.i.d. ∼ F (cid:105) = α . Since theprobability Pr [ S > s ] is monotone-decreasing in s , we can find s n,α by repeated bisection,thus inverting the cumulative distribution of the statistic S numerically. If we know that s n,α ∈ [ a, b ] then an approximation of s n,α with additive error < (cid:15) may be obtained usingbinary search. This search involves O (log(( b − a ) /(cid:15) )) calculations of probabilities of the form(1). When the range of s n,α is not known in advance, one can use a doubling search (Bentleyand Yao, 1976) to obtain an (cid:15) -approximation with O (log( s n,α /(cid:15) )) probability calculationsof the form (1). Applications which involve probabilities of the form (1) and (2) include the construction ofconfidence bands for empirical distribution functions (Owen, 1995; Frey, 2008; Matthews,2013), multiple hypothesis testing (Meinshausen and Rice, 2006; Roquain and Villers, 2011;4on Schroeder and Dickhaus, 2020), change-point detection (Worsley, 1986), sequentialtesting (Dongchu, 1998), financial risk modeling (Dimitrova et al., 2017; Goffard, 2019),genome-wide association studies (Sabatti et al., 2009; Barnett et al., 2017; Sun and Lin,2019), exoplanet detection (Sulis et al., 2017), cryptography (Ding et al., 2018), economet-rics (Goldman and Kaplan, 2018), and inventory management (Dimitrova et al., 2020).
Many methods for computing or estimating the one-sided non-crossing probability (1)have been proposed over the years. One simple and commonly practiced approach isto repeatedly generate X , . . . , X n i.i.d. ∼ F and measure the percentage of times that theinequalities X ( i ) ≤ B i hold. This Monte-Carlo approach does not yield accurate resultsand can be slow when the probability of interest is small and the sample size n is large.For the exact computation of the noncrossing probability (1), first note that for each setof order statistics with no repetitions X (1) < X (2) . . . < X ( n ) there are exactly n ! instancesof ( X , . . . , X n ) that map to it. Since X i i.i.d. ∼ U [0 , X , . . . , X n ) is one on the unit cube, the density of the sorted vector of order statistics( X (1) , . . . , X ( n ) ) is equal to n ! on the simplex that satisfies X (1) < . . . < X ( n ) and zeroelsewhere (we may ignore events of measure zero that X ( i ) = X ( i +1) ). It follows thatPr (cid:2) ∀ i : X ( i ) ≤ B i (cid:3) = Pr (cid:2) ∀ i : X ( i ) < B i (cid:3) (10)= n ! Pr [ ∀ i : X i < B i and X < X < . . . < X n ] . (11)This may be decomposed recursively as n ! Pr [ ∀ i : X i < B i and X < X < . . . < X n ] (12)= n ! (cid:90) B dX Pr [ ∀ i = 2 , . . . , n : X i < B i and X < X < . . . < X n | X ] (13)= n ! (cid:90) B dX (cid:90) B X dX Pr [ ∀ i = 3 , . . . , n : X i < B i and X < X < . . . < X n | X ] (14)= . . . = n ! (cid:90) B dX (cid:90) B X dX (cid:90) B X dX . . . (cid:90) B n X n − dX n . (15)This recursion was first noted by Wald and Wolfowitz (1939) who demonstrated the sym-bolic computation of this integral with n = 6 samples. The integral (15) was analyzed byDurbin (1973) for the case where the bounds B i increase linearly, leading to a closed-formexpression for the distribution of the one-sided Kolmogorov-Smirnov statistics.For more general boundaries, many authors have proposed recursive formulas. Of par-ticular note is the formula in Proposition 3.2 of Denuit et al. (2003). This O ( n ) recursive5ormula was first derived by No´e and Vandewiele (1968) and used to tabulate percent-age points of standardized one-sided Kolmogorov-Smirnov statistics for sample sizes up to n = 100. Their formula contains sums of large binomial coefficients multiplied by very smallnumbers, leading to numerical instabilities for n >
120 (See Section 1 of Khmaladze andShinjikashvili (2001)). A different approach with a computational cost of O ( n ) is based onthe numerical integration of Eq. (15) and was shown to be stable up to n ≈ ,
000 usingstandard double-precision floating-point numbers (Moscovich et al., 2016). While it is pos-sible to use variable precision floating-point numbers or rational arithmetic to alleviate theloss of numerical accuracy (see for example (Brown and Harvey, 2008a,b; von Schroederand Dickhaus, 2020)), this approach incurs heavy runtime penalties compared to the useof numerically stable methods that can run using standard floating-point numbers.
A related set of methods is concerned with the two-sided boundary crossing probability,Pr (cid:104) ∀ i : b i ≤ X ( i ) ≤ B i (cid:12)(cid:12) X , . . . , X n i.i.d. ∼ U [0 , (cid:105) . (16)Several methods have been proposed for the calculation of these probabilities (Epanech-nikov, 1968; Steck, 1971; Durbin, 1971; No´e, 1972; Friedrich and Schellhaas, 1998; Khmal-adze and Shinjikashvili, 2001; Moscovich and Nadler, 2017). All of these methods have acomputational cost of O ( n ) except for the O ( n log n ) method of (Moscovich and Nadler,2017), on which the current paper is based. We now describe the methods of Friedrich and Schellhaas (1998); Khmaladze and Shin-jikashvili (2001); Moscovich and Nadler (2017) that form the basis of our algorithm. Thesemethods compute the two-sided non-crossing probabilities of the form (16). Here, we focusour exposition on the one-sided case where b i = 0 for all i . In this subsection we describe a minor variant of “scheme 1” of Friedrich and Schellhaas(1998), specialized to the one-sided boundary case. For ease of exposition, we denote B := 0 , B n +1 := 1 . (17) The procedure of Steck (1971) is based on the computation of an n × n matrix determinant and thatof Durbin (1971) is based on solving a system of linear equations. Theoretically, with the Coppersmith-Winograd algorithm and related methods, both methods have an asymptotic runtime of just O ( n . ).However, these methods involve huge runtime constants and are not practical. S ( i, j ) be the following probability, S ( i, j ) := Pr (cid:104) n ˆ F n ( B i ) = j and X (1) ≤ B , . . . , X ( i − ≤ B i − (cid:105) . (18)Our quantity of interest is S ( n + 1 , n ) = Pr (cid:2) ∀ i : X ( i ) ≤ B i (cid:3) . We now show how to computethis quantity using recursion relations. The initial conditions are S (0 , j ) = δ ,j . Thetransition probabilities are given by the following Chapman-Kolmogorov equations, S ( i + 1 , j ) = (cid:88) k ≥ i S ( i, k ) · Pr (cid:104) n ˆ F n ( B i +1 ) = j (cid:12)(cid:12) n ˆ F n ( B i ) = k (cid:105) . (19)The condition k ≥ i , or equivalently that n ˆ F n ( B i ) ≥ i , is what guarantees that X ( i ) ≤ B i (see dashed arrows in Figure 1). The probability of the transition k → j is the Binomialprobability mass function (PMF),Pr (cid:104) n ˆ F n ( B i +1 ) = j (cid:12)(cid:12) n ˆ F n ( B i ) = k (cid:105) = Pr [Binomial( n − k, p i ) = j − k ] (20)= (cid:18) n − kj − k (cid:19) p j − ki (1 − p i ) n − j . (21)where p i := Pr [ B i < X < B i +1 | X > B i ]. Recall that we assumed that X ∼ U [0 ,
1] andtherefore p i = ( B i +1 − B i ) / (1 − B i ).To compute the non-crossing probability S ( n + 1 , n ), one can start by setting S (1 , j )for all j via Eq. (19), then proceed to compute S (2 , j ) for all j , etc. at a total runtime costof O ( n ). This computation, which we dub Stepwise Binomial propagation , is illustratedin Figure 1. The filled circles represent elements of S ( i, j ) with j ≥ i whereas the hollowcircles S ( i, i −
1) correspond to the probability of having X (1) ≤ B , . . . , X ( i − ≤ B i − but X ( i ) (cid:2) B i . In the empirical CDF formulation of Eq. (2), the elements S ( i, i −
1) correspondto paths of n ˆ F n ( t ) whose first crossing of the lower boundary b ( t ) occurs at B i . There is a simple connection between the empirical CDF of an i.i.d. sample and a condi-tioned Poisson process:
Lemma 1.
Let X , . . . , X n i.i.d. ∼ U [0 , be a sample and let ˆ F n ( t ) = n (cid:80) i ( X i ≤ t ) beits empirical CDF. The distribution of the process n ˆ F n ( t ) is identical to that of a Poissonprocess ξ n ( t ) with intensity n , conditioned on ξ n (1) = n . For the proof, see Shorack and Wellner (2009, Prop 2.2, Ch. 8). The calculation ofthe non-crossing probability in Eq. (2) may thus be reduced to the calculation of the non-crossing probability of a Poisson process ξ n with intensity n . Let Q ( i, , Q ( i, , . . . be thenon-crossing-up-to- B i probabilities of ξ n , defined in analogy to S ( i, j ) in Eq. (18), Q ( i, j ) := Pr [ ξ n ( B i ) = j and ξ n ( B ) ≥ ξ n ( B ) ≥ . . . , ξ n ( B i − ) ≥ i − . (22)7he recursion relations for all i = 0 , . . . , n + 1 and j = 0 , . . . , n mimic those of S ( i, j ), Q (0 , j ) = δ ,j , (23) Q ( i + 1 , j ) = (cid:88) k ≥ i Q ( i, k ) · Pr (cid:2) ξ n ( B i +1 ) = j (cid:12)(cid:12) ξ n ( B i ) = k (cid:3) , (24)where the transition probabilities are now given by Poisson counts,Pr (cid:2) ξ n ( B i +1 ) = j (cid:12)(cid:12) ξ n ( B i ) = k (cid:3) = Pr [Pois( n ( B i +1 − B i )) = j − k ] (25)= ( n ( B i +1 − B i )) j − k e − n ( B i +1 − B i ) / ( j − k )! (26)The algorithm based on this recursion, which we dub stepwise Poisson propagation proceedsby computing Q (1 , j ) for all j , then Q (2 , j ) for all j , etc. Finally, by Lemma 1, S ( n + 1 , n ) = Pr (cid:104) ∀ t : b ( t ) ≤ n ˆ F n ( t ) (cid:105) = Pr [ ∀ t : b ( t ) ≤ ξ n ( t ) | ξ n (1) = n ] (27)= Pr [ ∀ t : b ( t ) ≤ ξ n ( t ) and ξ n (1) = n ]Pr [Pois( n ) = n ] = Q ( n + 1 , n ) n n e − n /n ! . (28)This method was proposed by Khmaladze and Shinjikashvili (2001) for two-sided boundarycrossing probabilities. It has the same O ( n ) asymptotic running time as the stepwiseBinomial propagation of Friedrich and Schellhaas (1998) which we described in Section 2.1. In contrast to the Binomial propagation described in Section 2.1, in the stepwise Poissonpropagation the transition probabilities Pr (cid:2) ξ n ( B i +1 ) = j (cid:12)(cid:12) ξ n ( B i ) = k (cid:3) in Eq. (25) do notdepend on j or k but only on their difference. This is a direct result of the memorylessproperty of the Poisson process. As a result we can rewrite the recurrence relations (24)as a discrete convolution followed by a truncation of the first element, C ( i +1) = Q ( i ) (cid:63) π ( n ( B i +1 − B i )) (29) Q ( i +1) = ( C ( i +1)2 , . . . , C ( i +1) n − i +1 ) (30)where Q ( i ) := ( Q ( i, i ) , Q ( i, i + 1) , . . . , Q ( i, n )) (31) π ( λ ) := (Pr [Pois( λ ) = 0] , Pr [Pois( λ ) = 1] , . . . , Pr [Pois( λ ) = n ]) . (32)Each of these linear convolutions can be computed efficiently in O ( n log n ) time steps usingthe fast Fourier transform (FFT) and the circular convolution theorem for discrete signals(Press et al., 1992, Ch. 12,13). The resulting procedure has a total running time of O ( n log n ) and is numerically stable for large sample sizes using standard double-precision(64-bit) floating-point numbers. For more details see (Moscovich and Nadler, 2017).This stepwise FFT-based procedure can also be used to compute the non-crossing prob-abilities for non-homogeneous Poisson processes, negative binomial processes, and othertypes of stochastic jump processes (Dimitrova et al., 2020).8 Proposed algorithm
In the previous section we described how, for any i , one can obtain the non-crossing prob-abilities vector Q ( i ) , defined in Eq. (31), by computing a truncated linear convolution of Q ( i − and the PMF of a Poisson random variable. Starting from Q ( i ) for some i and re-peating this process k times we obtain Q ( i ) → Q ( i +1) → . . . → Q ( i + k ) in O ( kn log n ) time.In this section we show how, for any k ∈ (log n, n/ log n ), it is possible to go directly from Q ( i ) to Q ( i + k ) using just O ( kn ) steps. This makes the total runtime for computing Q ( n +1) be (cid:6) n +1 k (cid:7) O ( kn ) = O ( n ) . The main idea behind our method is simple. We first computeall the transition probabilities of the Poisson process ξ n , Q ( i, j ) · Pr [ ξ n ( B i + k ) = (cid:96) | ξ n ( B i ) = j ] (33)from all non-zero elements Q ( i, j ). This includes the contributions of good paths thatdo not cross the boundary but also the contributions of bad paths that intersect the lowerboundary function b ( t ) somewhere in the interval [ B i , B i + k ). All that remains is to subtractthe contributions of these bad paths. Each intersecting path must have the first intersectionat exactly one of the points B i , B i +1 , . . . B i + k − . With some careful accounting that wedescribe in the next section, we can efficiently compute the probability of having the firstintersection at each of these points and then subtract the individual contributions to thearrival probabilities in Eq. (33). Let f ( t ) : [0 , → { , , . . . } be some non-decreasing function which will represent aparticular instance of a Poisson process and let b ( t ) : [0 , → R be a lower boundaryfunction. Recall that a non-intersecting path satisfies that f ( B j ) ≥ j for all j . Definition 1.
For every i ∈ { , . . . , n + 1 } , we define two logical predicates,NC( f, i ) := ∀ (cid:96) < i : f ( B (cid:96) ) ≥ (cid:96), (34)FC( f, i ) := NC( f, i ) and f ( B i ) < i (35)= ∀ (cid:96) < i : f ( B (cid:96) ) ≥ (cid:96) and f ( B i ) = i − i ” and FC for “first crossing at i ”. By the definitionsof Q and NC, Q ( i + k, j ) = Pr [ ξ n ( B i + k ) = j and ∀ (cid:96) < i + k : ξ n ( B (cid:96) ) ≥ (cid:96) ] (37)= Pr [ ξ n ( B i + k ) = j and NC( ξ n , i + k )] . (38) Proposition 1.
Let k > be some integer. Given Q ( i ) the probabilities Pr [ ξ n ( B i + k ) = j and NC( ξ n , i )] can be computed for all values of j in O ( n log n ) time. roof. Pr [ ξ n ( B i + k ) = j and NC ( ξ n , i )] = (cid:88) i ≤ k ≤ j Q ( B i , k ) · Pr [Pois( n ( B i + k − B i )) = j − k ] . These values, as a function of j , are a truncated linear convolution of Q ( i ) and the PMF ofa Poisson random variable with intensity n ( B i + k − B i ). As explained in Section 2.3, thisconvolution can be computed in O ( n log n ) steps. Proposition 2.
Given Q ( i ) the probabilities Pr [ FC ( ξ n , j ) and ξ n ( B i + k ) = (cid:96) ] for all valuesof (cid:96) ∈ { i + k, . . . , n } and j ∈ { i, . . . , i + k − } can be computed in O ( k log k + nk ) time.Proof. We first note that for every j ≥
1, by Eq. (36),FC( ξ n , j ) = ∀ (cid:96) < j − f ( B (cid:96) ) ≥ (cid:96) and f ( B j − ) = f ( B j ) = j − . (39)By the chain rule we have,Pr [FC( ξ n , j )] (40)= Pr [ ∀ (cid:96) < j − f ( B (cid:96) ) ≥ (cid:96) and f ( B j − ) = j − · Pr [ f ( B j ) = j − | f ( B j − ) = j − Q ( j − , j − · Pr [Pois ( n ( B j − B j − )) = 0] . (41)From the definition of FC, if FC( ξ n , j ) then ξ n ( B j ) = j −
1, hencePr [ ξ n ( B i + k ) = (cid:96) | FC( ξ n , j )] = Pr [ ξ n ( B i + k ) = (cid:96) | ξ n ( B j ) = j −
1] (42)= Pr [Pois( n ( B i + k − B j )) = (cid:96) − ( j − . (43)Putting it all together, we havePr [FC( ξ n , j ) and ξ n ( B i + k ) = (cid:96) ] = Pr [FC( ξ n , j )] · Pr [ ξ n ( B i + k ) = (cid:96) | FC( ξ n , j )] (44)= Q ( j − , j − · Pr [Pois ( n ( B j − B j − )) = 0] · Pr [Pois( n ( B i + k − B j )) = (cid:96) − j + 1] . Given Q , evaluating this probability for all j ∈ { i, . . . , i + k − } and all (cid:96) ∈ { i + k, . . . , n } takes a total of O ( nk ) time. As for the computation of Q ( j − , j −
1) for all j ∈ { i, . . . , i + k − } , note that Q ( j, (cid:96) ) for all j, (cid:96) ∈ { i − , . . . , i + k − } is a k × k subarray that can becomputed in time O ( k log k ) using the FFT-based algorithm described in Section 2.2. Proposition 3.
Given Q ( i ) , we can compute Q ( i + k ) in O ( n log n + nk + k log k ) time.Proof. If NC( f, i ) holds then either NC( f, i + k ) or FC( f, j ) for exactly one j ∈ { i, . . . , i + k − } . Hence for a Poisson process ξ n ( t )Pr [NC( ξ n , i )] = Pr [NC( ξ n , i + k )] + i + k − (cid:88) j = i Pr [FC( ξ n , j )] . ξ n ( B i + k ) = (cid:96) . Adding this constraint andsubtracting the sum from both sides, we have,Pr [NC( ξ n , i + k ) and ξ n ( B i + k ) = (cid:96) ]= Pr [NC( ξ n , i ) and ξ n ( B i + k ) = (cid:96) ] (cid:124) (cid:123)(cid:122) (cid:125) ( ∗ ) − i + k − (cid:88) j = i Pr [FC( ξ n , j ) and ξ n ( B i + k ) = (cid:96) ] (cid:124) (cid:123)(cid:122) (cid:125) ( ∗∗ ) (45)By Proposition 1 all of the probabilities (*) can be computed in time O ( n log n ) and byProposition 2 the probabilities (**) can be computed in time O ( nk + k log k ). EvaluatingEq. (45) costs merely O ( nk ). The total running time of computing Q ( i + k ) given Q ( i ) istherefore O ( n log n + nk + k log k ).We can now put it all together. Starting from Q (0) = (1 , , , . . . , Q ( k ) and then compute Q (2 k ) , Q (3 k ) , etc. Until we reach Q ( n +1) . By Proposition 3, each of thesesteps takes O ( n log n + nk + k log k ) time. Thus the total running time is (cid:6) n +1 k (cid:7) O (cid:0) n log n + nk + k log k (cid:1) = O (cid:16) n log nk + n + nk log k (cid:17) . (46)For any choice of k ∈ (log n, n/ log n ), the running time is O ( n ). In this section we test the running time and accuracy of our method. The applicationchosen here is the computation of p -values for the M + n one-sided statistic of Berk andJones (1979). This statistic has the minimum form (analogous to (3)), M + n := min i =1 ,...,n s i ( F ( x ( i ) )) (47)where s i ( u ) := Pr [Beta( i, n − i + 1) < u ] = n !( i − n − i )! (cid:90) u t i − (1 − t ) n − i dt. (48)For each sample size n , we first compute an α -level threshold s n,α with the bisection methoddescribed in Section 1.1.2 for α = 5%. We then compute the probability that Pr [ M + n < s n,α ]using each of the following methods: • KS (2001): the O ( n ) two-sided algorithm of Khmaladze and Shinjikashvili (2001),described in Section 2.2. • MNS (2016): the O ( n ) one-sided algorithm of Moscovich et al. (2016) mentionedin Section 1.2. 11 MN (2017): the O ( n log n ) two-sided algorithm of Moscovich and Nadler (2017),described in Section 2.3. • New: the O ( n ) one-sided algorithm described in this paper.Figure 2 shows the running times for sample sizes n = 5000 , , . . . , , KS (2001)/MN (2017)/New all produce the sameresults in the tested range, with relative errors less than 10 − using standard double-precision (64-bit) floating-point numbers. In contrast, MNS (2016) is only accurate upto about n = 30 , n = 35 ,
000 it produces a relative error of 7%and for n > ,
000 it breaks down completely. Therefore, we did not test the running timeof
MNS (2016) for sample sizes larger than 30 , α -level threshold for n = 100 ,
000 with α = 5%. This figure does notshow benchmarks for KS (2001) due to its excessive running time for large sample sizes.The relative difference between the boundary-crossing probabilities computed using
MN(2017) and
New is small throughout the tested range, with a maximum value of 10 − forthe sample size n = 700 , SUPPLEMENTARY MATERIAL crossprob:
C++ code for the fast computation of one-sided and two-sided boundary cross-ing probabilities of the form (1) and (16). This code compiles into a command-line program and a Python extension module. The implemented algorithms are
KS(2001) (Khmaladze and Shinjikashvili, 2001),
MNS (2016) (Moscovich et al., 2016),
MN (2017) (Moscovich and Nadler, 2017), and the method
New proposed in thispaper. (GNU zipped tar file) benchmarks:
Python code for running the benchmarks and producing Figures 2 and 3(GNU zipped tar file) 12 , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , n10 − − R un t i m e [ s ec ] KS (2001)MNS (2016)MN (2017)NewFigure 2: Running times for computing the p -value of a one-sided goodness-of-fit statistic.Times shown are best out of 3 runs. Note the logarithmic y axis.13 , , , , , , , , , , , , , , , , , , , , , , n10 R un t i m e [ s ec ] MN (2017)NewFigure 3: Large scale benchmarks for computing the p -value of a one-sided goodness-of-fitstatistic. Note the logarithmic y axis. Large scale benchmarks of KS (2001) were notperformed due to excessive running times. 14
Appendix: reduction of the continuous boundarycrossing problem to a discrete set of inequalities
We first note that in the continuous boundary-crossing problem (2), the boundary function b ( t ) may be assumed to be monotone non-decreasing without loss of generality. This is adirect consequence of the following result, Proposition 4.
Let b max ( t ) := max u ∈ [0 ,t ] b ( u ) . Then:(i) b max is a non-decreasing function.(ii) ∀ t ∈ [0 ,
1] : b ( t ) ≤ n ˆ F n ( t ) ⇐⇒ ∀ t ∈ [0 ,
1] : b max ( t ) ≤ n ˆ F n ( t ) . Proof. (i) Trivial. (ii) For the ⇐ = direction, we simply note that b ≤ b max and therefore b ≤ b max ≤ n ˆ F n . The = ⇒ direction follows from the monotonicity of ˆ F n , b max ( t ) = max u ∈ [0 ,t ] b ( u ) ≤ max u ∈ [0 ,t ] n ˆ F n ( u ) = n ˆ F n ( t ) . (49)Now, let X , . . . , X n i.i.d. ∼ U [0 ,
1] be a sample and let ˆ F n ( t ) = n (cid:80) i X i ≤ t ) be itsempirical CDF. Consider the boundary crossing probabilityPr (cid:104) ∀ t ∈ [0 ,
1] : b ( t ) ≤ n ˆ F n ( t ) (cid:105) (50)where the boundary function b ( t ) is now assumed to be monotone non-decreasing. Wedefine a series of first integer passage times, B i := inf { t ∈ [0 ,
1] : b ( t ) > i − } i = 1 , . . . , (cid:100) b (1) (cid:101) , (51)where (cid:100) x (cid:101) is the smallest integer greater than or equal to x . The following lemma holdsthe key observation that allows one to replace the infinite set of inequality constraints ∀ t ∈ [0 ,
1] : b ( t ) ≤ n ˆ F n ( t ) with a finite set of inequalities. Lemma 2.
Let f : [0 , → { , , , . . . } be a nondecreasing step function.(i) If f ( B i ) ≥ i for all i = 1 , . . . , k then b ( t ) ≤ f ( t ) for all t ∈ [0 , B k ) .(ii) If f ( B i ) ≥ i for all i = 1 , . . . , (cid:100) b (1) (cid:101) then b ( t ) ≤ f ( t ) for all t ∈ [0 , .Proof. art (i) Divide the interval [0 , B k ) into a disjoint union,[0 , B k ) = [0 , B ) ∪ [ B , B ) ∪ . . . ∪ [ B k − , B k ) . (52)We now prove that b ≤ f in each interval separately:1. [0 , B ): By the definition of B , for all t < B we have b ( t ) ≤ f is anon-negative function it follows that b ( t ) ≤ ≤ f ( t ).2. ( B i , B i +1 ): By the definition of B i and the assumption that b ( t ) is non-decreasing wehave that if t > B i then b ( t ) > i − t < B i +1 then b ( t ) ≤ i . It follows that forall t ∈ ( B i , B i +1 ), b ( t ) ≤ i ≤ f ( B i ) ≤ f ( t ) (53)where the last inequality is due to the monotonicity of f .3. The points { B , . . . , B k − } : In the union (52) it is enough to only consider the non-empty intervals. Taking an intermediate point t ∈ ( B i , B i +1 ), using the monotonicityof b and applying Eq. (53), we obtain b ( B i ) ≤ b ( t ) ≤ f ( B i ) . (54) Part (ii):
Define B = 0. Since[0 ,
1] = (cid:91) i =0 ,..., (cid:100) b (1) (cid:101)− [ B i , B i +1 ) ∪ (cid:2) B (cid:100) b (1) (cid:101) , (cid:3) , (55)all that remains is to show that b ( t ) ≤ f ( t ) in the interval (cid:2) B (cid:98) b (1) (cid:99) +1 , (cid:3) . Using the mono-tonicity of b and f and the identity x ≤ (cid:100) x (cid:101) , b ( t ) ≤ b (1) < (cid:100) b (1) (cid:101) ≤ f ( B (cid:100) b (1) (cid:101) ) ≤ f ( t ) . (56)The converse to Lemma 2.(ii) is not true. Consider for example the functions f ( t ) = b ( t ) = (cid:40) t ≤ . t > . B = 0 .
5, it is true that b ( t ) ≤ f ( t ) everywhere but f ( B ) (cid:3)
1. Hence,Lemma 2.(ii) cannot be made into an if-and-only-if statement. However, the followinglemma demonstrates that for the random processes we are interested in, the probability ofhaving b ( t ) ≤ f ( t ) without f ( B i ) ≥ i holding is zero.16 emma 3. For any counting process f : [0 , → { , , . . . } with a finite rate, Pr [ ∀ t : b ( t ) ≤ f ( t ) and ¬ ( ∀ i : f ( B i ) ≥ i )] = 0 (58) Proof.
By Lemma 2.(ii) if ∀ i : f ( B i ) ≥ i then ∀ t : b ( t ) ≤ f ( t ). Hence,Pr [ ∀ i : f ( B i ) ≥ i ] ≤ Pr [ ∀ t : b ( t ) ≤ f ( t )] . (59)Let i ∈ { , . . . (cid:98) b (1) (cid:99) + 1 } . We need to show that the probability of having both b ≤ f and f ( B i ) < i is zero. Since f is a counting process f ( B i ) < i if and only if f ( B i ) ≤ i − f ( B i ) ≤ i − (cid:15) > f ( B i + (cid:15) ) = f ( B i ) ≤ i − B i in Eq. (51) we must have b ( B i + (cid:15) ) > i − f ( B i ) = f ( B i + (cid:15) ) ≤ i − < b ( B i + (cid:15) ) . (60)In contradiction to the assumption that b ≤ f everywhere. We have thus demonstratedthat for a given index i Pr [ ∀ t : b ( t ) ≤ f ( t ) and f ( B i ) < i ] = 0 . (61)Equation (58) follows from (59) and (61).The following is an immediate consequence of Lemma 2.(ii) and Lemma 3. Corollary 1 (reduction from the continuous to the discrete problem) . For any countingprocess f : [0 , → { , , . . . } with a finite rate, Pr [ b ≤ f ] = Pr [ ∀ i : f ( B i ) ≥ i ] = Pr (cid:2) ∀ i : X ( i ) ≤ B i (cid:3) . (62)Hence the problem of computing the non-crossing probability in Eq. (2), is reduced tothe probability that the inequalities f ( B i ) ≥ i hold at a finite set of first-integer-crossingtimes { B , B , . . . , B (cid:98) b (1) (cid:99) +1 } where B i were defined in Eq. (51).The reduction can also be made in the other direction, from the calculation of thediscrete boundary crossing probability (1) to the continuous boundary crossing (2). Corollary 2 (reduction from the discrete to the continuous problem) . Let X , . . . , X n i.i.d. ∼ U [0 , and let B , . . . , B n be a set of upper bounds in the discrete boundary crossing prob-ability (1) . Define their cumulative function as b ( t ) = (cid:80) i B i < t ) then Pr (cid:2) ∀ i : X ( i ) ≤ B i (cid:3) = Pr (cid:104) ∀ t : b ≤ n ˆ F n (cid:105) (63) where ˆ F n is the empirical CDF of X , . . . , X n .Proof. Note that by the construction of b ( t ), for all i , B i = inf { t : b ( t ) > i − } . (64)This coincides with the definition of B i in Eq. (51). Equation (62) follows.17 Appendix: benchmark and implementation details
All four methods compared in 2 were implemented in C++, compiled in clang
FFTW
KS (2001) and
MN (2017) that specifically handles consecutive lower bounds that satisfy b i +1 = b i asa special case (see Eq. (16)). This makes the two methods more efficient for one-sidedboundary computations, since in this case there are only non-trivial upper bounds B i whilethe lower bounds b i are all implicitly set to zero. This, in addition to several other technicalcode optimizations and the improvement in processor speed resulted in an 8-fold decreaseto the running time of MN (2017) in the one-sided boundary case, compared to a similarbenchmark in (Moscovich and Nadler, 2017).Our proposed algorithm has a configurable jump size parameter k . The entire range k ∈ [log n, n/ log n ] gives asymptotically optimal results at O ( n ). To get a ballpark estimatefor the optimal value of k we set k ( x ) = x √ n and minimized the asymptotic runtime in Eq.(46). The resulting minimizer is k = √ n . However, the setting used in the benchmarkswas k = √ n as this was empirically found to be faster. References
Arias-Castro, E., Cand`es, E. J., and Plan, Y. (2011). Global testing under sparse alter-natives: ANOVA, multiple comparisons and the higher criticism.
Annals of Statistics ,39(5):2533–2556. doi: .Arias-Castro, E., Huang, R., and Verzelen, N. (2020). Detection of sparse positive depen-dence.
Electronic Journal of Statistics , 14(1):702–730. doi: .Barnett, I., Mukherjee, R., and Lin, X. (2017). The Generalized Higher Criticism for Test-ing SNP-Set Effects in Genetic Association Studies.
Journal of the American StatisticalAssociation , 112(517):64–76. doi: .Bentley, J. L. and Yao, A. C.-C. (1976). An almost optimal algorithm for unbounded search-ing.
Information Processing Letters , 5(3):82–87. doi: .Berk, R. H. and Jones, D. H. (1979). Goodness-of-fit test statistics that dominate theKolmogorov statistics.
Zeitschrift f¨ur Wahrscheinlichkeitstheorie und Verwandte Gebiete ,47(1):47–59. doi: .Brown, J. R. and Harvey, M. E. (2008a). Arbitrary Precision Mathematica Functions toEvaluate the One-Sided One Sample K-S Cumulative Sampling Distribution.
Journal ofStatistical Software , 26(3):128–129. doi: .18rown, J. R. and Harvey, M. E. (2008b). Rational Arithmetic Mathematica Functions toEvaluate the Two-Sided One Sample K-S Cumulative Sampling Distribution.
Journal ofStatistical Software , 26(2):1–40. doi: .Denuit, M., Lef`evre, C., and Picard, P. (2003). Polynomial structures in order statis-tics distributions.
Journal of Statistical Planning and Inference , 113(1):151–178.doi: .Dimitrova, D., Ignatov, Z., and Kaishev, V. (2017). On the First Crossing of Two Bound-aries by an Order Statistics Risk Process.
Risks , 5(3):43. doi: .Dimitrova, D. S., Ignatov, Z. G., Kaishev, V. K., and Tan, S. (2020). On double-boundarynon-crossing probability for a class of compound processes with applications.
EuropeanJournal of Operational Research , 282(2):602–613. doi: .Ding, A. A., Zhang, L., Durvaux, F., Standaert, F.-X., and Fei, Y. (2018). TowardsSound and Optimal Leakage Detection Procedure. In
Lecture Notes in Computer Science ,volume 10728 LNCS, pages 105–122. doi: .Dongchu, S. (1998). Exact computation for some sequential tests.
Sequential Analysis ,17(2):127–150. doi: .Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures.
The Annals of Statistics , 32(3):962–994. doi: .Durbin, J. (1971). Boundary-crossing probabilities for the Brownian motion and Pois-son processes and techniques for computing the power of the Kolmogorov-Smirnov test.
Journal of Applied Probability , 8(3):431–453. doi: .Durbin, J. (1973).
Distribution Theory for Tests Based on the Sample DistributionFunction . Society for Industrial and Applied Mathematics. ISBN 978-0-89871-007-6,doi: .Eicker, F. (1979). The Asymptotic Distribution of the Suprema of the Standardized Em-pirical Processes.
The Annals of Statistics , 7(1):116–138. doi: .Epanechnikov, V. A. (1968). The Significance Level and Power of the Two-Sided Kol-mogorov Test in the Case of Small Sample Sizes.
Theory of Probability & Its Applications ,13(4):686–690. doi: .Finner, H. and Gontscharuk, V. (2018). Two-sample KolmogorovSmirnov-type tests revis-ited: Old and new tests in terms of local levels.
The Annals of Statistics , 46(6A):3014–3037. doi: . 19rey, J. (2008). Optimal distribution-free confidence bands for a distribu-tion function.
Journal of Statistical Planning and Inference , 138(10):3086–3098.doi: .Friedrich, T. and Schellhaas, H. (1998). Computation of the percentage points andthe power for the two-sided Kolmogorov-Smirnov one sample test.
Statistical Papers ,39(4):361–375. doi: .Frigo, M. and Johnson, S. (2005). The Design and Implementation of FFTW3.
Proceedingsof the IEEE , 93(2):216–231. doi: .Goffard, P.-O. (2019). Two-Sided Exit Problems in the Ordered Risk Model.
Methodologyand Computing in Applied Probability , 21(2):539–549. doi: .Goldman, M. and Kaplan, D. M. (2018). Comparing distributions by multiple test-ing across quantiles or CDF values.
Journal of Econometrics , 206(1):143–166.doi: .Gontscharuk, V., Landwehr, S., and Finner, H. (2015). The intermediates take it all:Asymptotics of higher criticism statistics and a powerful alternative based on equal locallevels.
Biometrical Journal , 57(1):159–180. doi: .Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals incorrelated noise.
Annals of Statistics , 38(3):1686–1732. doi: .Jaeschke, D. (1979). The Asymptotic Distribution of the Supremum of the StandardizedEmpirical Distribution Function on Subintervals.
The Annals of Statistics , 7(1):108–115.doi: .Jager, L. and Wellner, J. A. (2004). On the “Poisson boundaries” of the family of weightedKolmogorov statistics. volume 45 of
Lecture Notes–Monograph Series , pages 319–331.Institute of Mathematical Statistics, doi: .Jager, L. and Wellner, J. A. (2005). A new goodness of fit test: the reversed Berk-Jonesstatistic. Technical report.Jager, L. and Wellner, J. A. (2007). Goodness-of-fit tests via phi-divergences.
The Annalsof Statistics , 35(5):2018–2053. doi: .Khmaladze, E. and Shinjikashvili, E. (2001). Calculation of noncrossing probabilities forPoisson processes and its corollaries.
Advances in Applied Probability , 33(3):702–716.doi: .Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione.
Giornale dell’Istituto Italiano degli Attuari , 4(1):83–91.20i, J. and Siegmund, D. (2015). Higher criticism: p -values and criticism. The Annals ofStatistics , 43(3):1323–1350. doi: .Mason, D. M. and Schuenemeyer, J. H. (1983). A Modified Kolmogorov-SmirnovTest Sensitive to Tail Alternatives.
The Annals of Statistics , 11(3):933–946.doi: .Matthews, D. (2013). Exact Nonparametric Confidence Bands for the Survivor Function.
The International Journal of Biostatistics , 9(2):185–204. doi: .Meinshausen, N. and Rice, J. (2006). Estimating the proportion of false null hypothesesamong a large number of independently tested hypotheses.
The Annals of Statistics ,34(1):373–393. doi: .Moscovich, A. and Nadler, B. (2017). Fast calculation of boundary crossingprobabilities for Poisson processes.
Statistics & Probability Letters , 123:177–182.doi: .Moscovich, A., Nadler, B., and Spiegelman, C. (2016). On the exact Berk-Jones statis-tics and their p -value calculation. Electronic Journal of Statistics , 10(2):2329–2354.doi: .No´e, M. (1972). The Calculation of Distributions of Two-Sided Kolmogorov-Smirnov Type Statistics.
The Annals of Mathematical Statistics , 43(1):58–64.doi: .No´e, M. and Vandewiele, G. (1968). The Calculation of Distributions of Kolmogorov-Smirnov Type Statistics Including a Table of Significance Points for a Particular Case.
The Annals of Mathematical Statistics , 39(1):233–241. doi: .Owen, A. B. (1995). Nonparametric Likelihood Confidence Bands for a Distribution Func-tion.
Journal of the American Statistical Association , 90(430):516. doi: .Porter, T. and Stewart, M. (2020). Beyond HC: More sensitive tests for rare/weak alter-natives.
Annals of Statistics , 48(4):2230–2252. doi: .Press, W. H., Flannery, B. P., Teukolsky, S. A., and T., V. W. (1992).
Numerical recipesin C: the art of scientific computing . Cambridge University Press, 2nd edition.R´enyi, A. (1953). On the theory of order statistics.
Acta Mathematica Academiae Scien-tiarum Hungaricae , 4(3-4):191–231. doi: .Roquain, E. and Villers, F. (2011). Exact calculations for false discovery proportion withapplication to least favorable configurations.
The Annals of Statistics , 39(1):584–612.doi: . 21abatti et al., C. (2009). Genome-wide association analysis of metabolic traits in a birthcohort from a founder population.
Nature Genetics , 41(1):35–46. doi: .Shorack, G. R. and Wellner, J. A. (2009).
Empirical Processes with Applications toStatistics . Society for Industrial and Applied Mathematics. ISBN 978-0-89871-684-9,doi: .Steck, G. P. (1971). Rectangle Probabilities for Uniform Order statistics and the ProbabilityThat the Empirical Distribution Function Lies Between Two Distribution Functions.
TheAnnals of Mathematical Statistics , 42(1):1–11. doi: .Sulis, S., Mary, D., and Bigot, L. (2017). A Study of Periodograms Standardized UsingTraining Datasets and Application to Exoplanet Detection.
IEEE Transactions on SignalProcessing , 65(8):2136–2150. doi: .Sun, R. and Lin, X. (2019). Genetic Variant Set-Based Tests Using the General-ized Berk-Jones Statistic With Application to a Genome-Wide Association Studyof Breast Cancer.
Journal of the American Statistical Association , 0(0):1–13.doi: .von Schroeder, J. and Dickhaus, T. (2020). Efficient calculation of the joint distri-bution of order statistics.
Computational Statistics & Data Analysis , 144:106899.doi: .Wald, A. and Wolfowitz, J. (1939). Confidence Limits for Continuous Dis-tribution Functions.
The Annals of Mathematical Statistics , 10(2):105–118.doi: .Wellner, J. A. and Koltchinskii, V. (2003). A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis. In
High Dimensional Probability III ,pages 321–332. Birkh¨auser Basel, Basel, doi: .Worsley, K. J. (1986). Confidence Regions and Tests for a Change-Point in a Sequence ofExponential Family Random Variables.
Biometrika , 73(1):91. doi: .Zhang, H., Jin, J., and Wu, Z. (2020). Distributions and Power of Optimal Signal-Detection Statistics in Finite Case.
IEEE Transactions on Signal Processing , 68:1021–1033. doi:10.1109/TSP.2020.2967179