A Sequential Density-Based Empirical Likelihood Ratio Test for Treatment Effects
Li Zou, Albert Vexler, Jihnhee Yu, and Hongzhi Wan
Department of Statistics and Biostatistics, California State University, East Bay, Hayward, CA 94542
Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY 14214
Correspondence to: Albert Vexler, Department of Biostatistics, The State University of New York at Buffalo. E-mail: [email protected]

Abstract: In health-related experiments, treatment effects can be identified using paired data that consist of pre- and post-treatment measurements. In this framework, sequential testing strategies are widely accepted statistical tools in practice. Since the performance of parametric sequential testing procedures depends vitally on the validity of the parametric assumptions regarding the underlying data distributions, we focus on distribution-free mechanisms for sequentially evaluating treatment effects. In fixed sample size designs, density-based empirical likelihood (DBEL) methods provide powerful nonparametric approximations to optimal Neyman-Pearson type statistics. In this article, we extend the DBEL methodology to develop a novel sequential DBEL testing procedure for detecting treatment effects based on paired data. The asymptotic consistency of the proposed test is shown. An extensive Monte Carlo study confirms that the proposed test outperforms the conventional sequential Wilcoxon signed-rank test across a variety of alternatives. The excellent applicability of the proposed method is exemplified using the ventilator-associated pneumonia study that evaluates the effect of Chlorhexidine Gluconate treatment in reducing oral colonization by pathogens in ventilated patients.
Keywords: Empirical likelihood; Density-based empirical likelihood; Entropy; Likelihood ratio; Paired data; Sequential signed-rank test; Treatment effect; Ventilator-associated pneumonia.

1. INTRODUCTION
In health-related studies, investigators oftentimes sequentially collect paired data that consist of pre- and post-treatment measurements in order to test for treatment effects. The method proposed in this article is motivated by the following example. The oral cavity, especially dental plaque biofilms, may be colonized by potential respiratory pathogens (PRPs) in mechanically-ventilated (MV), intensive care unit (ICU) patients. Thus, by improving oral hygiene for MV-ICU patients, we may prevent ventilator-associated pneumonia (VAP). One of the primary goals of the VAP study was to evaluate the effect of Chlorhexidine Gluconate (CHX) treatment, a cationic chlorophenyl bis-biguanide antiseptic, in reducing oral colonization by pathogens in MV-ICU patients. The trial sequentially enrolled ventilated patients who were admitted to a trauma ICU of the Erie County Medical Center (ECMC). During this study, pre- and post-CHX treatment measurements of the amounts of aggregated bacteria (S. aureus, P. aeruginosa, Acinetobacter sps.) and enteric organisms (Klebsiella pneumoniae, Serratia marcescens, Enterobacter sps., Proteus mirabilis, E. coli) were recorded from each patient. We aim to develop a novel sequential methodology that requires substantially fewer patients to reach a conclusion regarding the CHX treatment effect based on the data from the VAP study. It is desirable for investigators to detect treatment effects as early as possible, for the following two reasons: 1) The ethical reason: an ongoing clinical trial can be terminated early when the significant superiority of the new therapy is statistically proven. In this case, stopping early enables other patients to begin receiving the superior treatment sooner.
2) The efficiency reason: early termination of a trial yields savings in sample size. Thus, sequential testing strategies can save resources and time. In these contexts, sequential statistical methods are important and frequently employed tools in practice. The statistical literature has dealt extensively with both the theoretical and applied aspects of various sequential statistical designs.
Commonly, to implement sequential statistical procedures, parametric assumptions regarding the underlying data distribution are stated. The performance of parametric sequential testing procedures strongly depends on the assessment of these parametric assumptions. Retrospective studies are generally based on already collected datasets or on combining existing pieces of data. In contrast with the analysis of data obtained retrospectively, sequential analysis presents the following problems. First, it is difficult to specify the parametric distributional form of the underlying data before data points are observed. Second, even if we have strong reasons to assume a parametric form for the data distribution, it is extremely difficult, for example, to test the corresponding parametric assumptions after the execution of sequential procedures. Since sequential tests are based on a random number of observations, data obtained after sequential analyses cannot be evaluated for goodness-of-fit using the conventional retrospective tests. Toward this end, in this article we focus on an efficient nonparametric sequential approach. The modern theory of sequential analysis originates from the research of Barnard and Wald. In particular, the method of the sequential probability ratio test (SPRT) has been the predominant influence on subsequent developments in the area. Note that, since analytical forms of the underlying data distributions are not completely specified, one cannot employ the most powerful SPRT testing strategy via the Neyman-Pearson concept. As an alternative, Miller proposed a nonparametric sequential signed-rank test (SSRT), employing a Wilcoxon type test statistic. Wilcoxon type tests are commonly used to detect treatment effects based on paired data in fixed sample size designs.
In accordance with the repeated significance testing framework, Miller's approach fixes the maximum number of data points at N and is based on a series of independent observations. The SSRT is performed after each data point is collected, and the procedure stops either when the maximum sample size N is reached or when the test statistic rejects the null hypothesis. In the settings of the SSRT, we introduce a simple sequential test with high and stable power for detecting treatment effects over a broad spectrum of alternatives based on paired data. Oppegaard et al. recently applied a two-sample sequential Wilcoxon test in order to evaluate the difference in preoperative cervical dilation before hysteroscopy between postmenopausal women who received vaginal misoprostol and postmenopausal women who received vaginal placebo. In this article, we present a novel nonparametric sequential testing procedure based on paired data, employing the data-driven likelihood ratio principle. The proposed method uses a nonparametric testing strategy that approximates the corresponding optimal parametric Neyman-Pearson type statistics. The density-based empirical likelihood (EL) approach is modified and extended to apply in the sequential setting. The proposed technique can be employed in the context of a single-arm trial, in which a sample of individuals with a specified medical condition is given the study therapy and then followed over time to measure their outcomes. The statistical literature has shown that the EL methodology is a very powerful inference tool in various nonparametric settings. The conventional EL concept is outlined in Section 2.3. To the best of our knowledge, the conventional EL concept and the density-based empirical likelihood (DBEL) methodology have not been extensively studied in the sequential statistical literature. The proposed method is distribution-free, robust to underlying model settings and highly efficient.
We evaluate the performance of the new sequential DBEL method and the SSRT in terms of statistical power and the average sample number (ASN). The ASN is defined as the expected value of the sample size required to reach a decision. The proposed test demonstrates significantly higher power and smaller ASN than the SSRT across the variety of alternatives treated in an extensive MC study. In this article, we establish the asymptotic consistency of the proposed test. It is clear that in general nonparametric settings there are no most powerful statistical mechanisms. Thus, it is very important to consider various reasonable and efficient distribution-free schemes in the framework of sequential analysis. This article is organized as follows. Section 2 outlines the classical SSRT and introduces the proposed method; the development of the new test statistic is presented and its asymptotic properties are derived. In Section 3, an extensive MC study is conducted to evaluate the proposed method. In Section 4, the applicability of the proposed method is illustrated via a clinical trial related to VAP; in this study, we detect the treatment effect of CHX in ventilated patients by using the proposed method, while the SSRT fails to show the corresponding statistical significance. In Section 5, we provide concluding remarks. The online supplementary material consists of the proofs of the theoretical results presented in this article, the R code to implement the proposed method, and the scatterplot that depicts the data considered in Section 4.
2. SEQUENTIAL TESTING METHODS
In this section, we formalize the statement of the problem, describe the conventional methodology and introduce the novel approach. Let $(X_i, Y_i)$, $i = 1, 2, \ldots$, denote sequentially surveyed independent and identically distributed (i.i.d.) pairs of observations within a subject $i$, where $X_i$ represents the pre-treatment measurement and $Y_i$ represents the post-treatment measurement. Assume the maximum number of subjects allowed in the experiment is $N$. Define $Z_i = Y_i - X_i$, $1 \le i \le N$. The nonparametric statistical literature tends to associate the problem of detecting treatment effects with the problem of testing the null hypothesis of symmetry about zero,
$$H_0: F(u) = 1 - F(-u) \text{ for all } -\infty < u < \infty \quad \text{versus} \quad H_1: F(u) \ne 1 - F(-u) \text{ for some } -\infty < u < \infty, \qquad (2.1)$$
where $F(u) = \Pr(Z_1 \le u)$ is the unknown distribution function of the i.i.d. $Z_i$, $i \ge 1$. In this section, we outline the SSRT, the test commonly used in practice for hypothesis (2.1). Let $R_{ni}$, $i = 1, \ldots, n$, be the rank of $|Z_i|$ among $|Z_1|, \ldots, |Z_n|$.
The SSRT test statistic is $SR_n = \sum_{i=1}^n I(Z_i \ge 0) R_{ni}$, where $I(\cdot)$ is the indicator function. In order to sequentially test the null hypothesis (2.1), Miller applied the stopping rule
$$\tau = \min\{n: 1 \le n \le N,\ TS_n \ge z_{\alpha,N}\}, \qquad (2.2)$$
where $TS_n = \{SR_n - n(n+1)/4\}\big/\{n(n+1)(2n+1)/24\}^{1/2}$, $\alpha$ is the pre-specified significance level and $z_{\alpha,N}$ is the critical value associated with $N$ and $\alpha$. The decision-making policy consists of the following algorithm: 1) if, for the first time, for some $n$ ($\le N$), $TS_n$ exceeds $z_{\alpha,N}$, we set $\tau = n$ and the experiment is terminated at that stage along with the decision to reject $H_0$; 2) if no such $n$ exists, the null hypothesis is not rejected and the experiment terminates at the target maximum sample size $N$. The critical value $z_{\alpha,N}$ can be determined using the following scheme. Define $W_N = \max_{1 \le n \le N} TS_n$, and let $\Pr_{H_k}$ denote the probability measure corresponding to the hypothesis $H_k$ ($k = 0, 1$). By definition (2.2), $z_{\alpha,N}$ is the upper $\alpha$-percentile of the distribution of $W_N$, satisfying $\Pr_{H_0}\{W_N \ge z_{\alpha,N}\} = \alpha$. Note that, under $H_0$, the distribution function of $W_N$ is data-distribution-free, since $\Pr_{H_0}\{W_N \le z_{\alpha,N}\} = \Pr_{H_0}\{TS_1 \le z_{\alpha,N}, \ldots, TS_N \le z_{\alpha,N}\}$ and the joint distribution of $\{TS_n: 1 \le n \le N\}$ does not depend on the underlying distribution function of $Z_1, \ldots, Z_N$. The structure of the statistic $W_N$ is complicated; in practice, one can use the MC methodology to evaluate $\Pr_{H_0}\{W_N \ge z_{\alpha,N}\}$ and compute critical values $z_{\alpha,N}$ for various choices of $N$ and $\alpha$.

In this section, we develop the DBEL based method for sequentially detecting treatment effects. We begin with considerations related to a retrospective statement of the testing problem, i.e., the sample size is fixed at $n$.

The EL approach:
In order to outline the conventional EL concept, we assume that $U_1, \ldots, U_n$ are i.i.d. data points with mean $E(U_1)$. Consider the commonly used EL ratio test for the null hypothesis $H_0: E(U_1) = \theta$ vs. $H_1: E(U_1) \ne \theta$, where $\theta$ is known. In this case, the EL function has the form $EL = \prod_{i=1}^n p_i$, where the $p_i$'s are probability weights. Under $H_0$, the values of the $p_i$'s can be derived by maximizing the EL function under the empirical constraints $\sum_{i=1}^n p_i = 1$ and $\sum_{i=1}^n p_i U_i = \theta$. Under the alternative hypothesis $H_1$, the EL function is given by $EL = n^{-n}$, since $\prod_{i=1}^n p_i$ is maximized by $p_i = n^{-1}$ ($i = 1, \ldots, n$) when only the constraint $\sum_{i=1}^n p_i = 1$ is in effect. In this framework, we reject $H_0$ for large values of $-\sum_{i=1}^n \log(n p_i)$, which represents $\log\{(\text{EL under } H_1)/(\text{EL under } H_0)\}$. This methodology is well developed when data are collected retrospectively, i.e., when $n$ is fixed.

The DBEL approach:
Motivated by the well-known Neyman-Pearson lemma, Vexler and Gurevich used the EL concept to develop the distribution-free density-based EL (DBEL) methodology for approximating parametric likelihood ratio type statistics. The DBEL method considers the likelihood in the form $DBEL_f = \prod_{i=1}^n f(U_i) = \prod_{i=1}^n f(U_{(i)}) = \prod_{i=1}^n f_i$, $f_i = f(U_{(i)})$, where $f(\cdot)$ is a density function of $U_1, \ldots, U_n$ and $U_{(1)} \le \cdots \le U_{(n)}$ are the order statistics based on $U_1, \ldots, U_n$. The DBEL approach is a technique to approximate the values of $f_i$, $i = 1, \ldots, n$, via maximization of $DBEL_f$ given a constraint related to an empirical version of the density property $\int f(u)\,du = 1$. The DBEL testing approach revolves around exact test statistics whose null distributions are independent of the underlying data distributions. Recent developments of the DBEL techniques can be found in various statistical publications.

Taking into account the setting of Section 2.1, in the case of completely specified forms of the density functions, the likelihood ratio test statistic based on $Z_1, \ldots, Z_n$ is
$$LR_n = \frac{\prod_{j=1}^n f_{H_1}(Z_j)}{\prod_{j=1}^n f_{H_0}(Z_j)} = \frac{\prod_{j=1}^n f_{H_1}(Z_{(j)})}{\prod_{j=1}^n f_{H_0}(Z_{(j)})},$$
where $f_{H_k}(u)$ denotes the density function of $Z_1$ under $H_k$ ($k = 0, 1$) and $Z_{(1)} \le \cdots \le Z_{(n)}$ are the order statistics based on the observations $Z_1, \ldots, Z_n$. Since in practice the data distributions are unknown, the DBEL approach focuses on approximating the values of $f_{H_k}(Z_{(j)})$ ($k = 0, 1$ and $j = 1, \ldots, n$) via maximizing the likelihood $\prod_{j=1}^n f_{H_k}(Z_{(j)})$, provided that $f_{H_k}(Z_{(1)}), \ldots, f_{H_k}(Z_{(n)})$ satisfy an empirical constraint that corresponds to $\int f_{H_k}(u)\,du = 1$ under $H_k$. To formalize this constraint, we specify $Z_{(r)} = Z_{(1)}$ if $r \le 1$ and $Z_{(r)} = Z_{(n)}$ if $r \ge n$, and employ the result
$$\frac{1}{2m}\sum_{j=1}^{n}\int_{Z_{(j-m)}}^{Z_{(j+m)}} f_{H_k}(u)\,du = \int_{Z_{(1)}}^{Z_{(n)}} f_{H_k}(u)\,du - \frac{1}{2m}\sum_{i=1}^{m}\int_{Z_{(1)}}^{Z_{(i)}} f_{H_k}(u)\,du - \frac{1}{2m}\sum_{i=n-m+1}^{n}\int_{Z_{(i)}}^{Z_{(n)}} f_{H_k}(u)\,du \qquad (2.3)$$
for all integers $m \le n/2$ (see Proposition 2.1 of Vexler and Gurevich). Since $\int_{Z_{(1)}}^{Z_{(n)}} f_{H_1}(u)\,du \le \int_{-\infty}^{+\infty} f_{H_1}(u)\,du = 1$, equation (2.3) implies the inequality $(2m)^{-1}\sum_{j=1}^{n}\int_{Z_{(j-m)}}^{Z_{(j+m)}} f_{H_1}(u)\,du \le 1$. In this case, one can expect that $(2m)^{-1}\sum_{j=1}^{n}\int_{Z_{(j-m)}}^{Z_{(j+m)}} f_{H_1}(u)\,du \approx 1$ when $m/n \to 0$ as $m, n \to \infty$. Let $\hat F_{H_0}$ and $\hat F_{H_1}$ denote estimators of $F_{H_0}$ and $F_{H_1}$, respectively. Writing $f_{H_k,j} = f_{H_k}(Z_{(j)})$ and using an approximate analog of the mean-value integration theorem, one can derive the empirical approximations
$$\sum_{j=1}^{n}\int_{Z_{(j-m)}}^{Z_{(j+m)}} f_{H_1}(u)\,du = \sum_{j=1}^{n}\int_{Z_{(j-m)}}^{Z_{(j+m)}} \frac{f_{H_1}(u)}{f_{H_0}(u)}\,f_{H_0}(u)\,du \approx \sum_{j=1}^{n} \frac{f_{H_1,j}}{f_{H_0,j}}\left\{\hat F_{H_0}(Z_{(j+m)}) - \hat F_{H_0}(Z_{(j-m)})\right\},$$
$$\int_{Z_{(1)}}^{Z_{(i)}} f_{H_1}(u)\,du \approx \hat F_{H_1}(Z_{(i)}) - \hat F_{H_1}(Z_{(1)}), \qquad \int_{Z_{(i)}}^{Z_{(n)}} f_{H_1}(u)\,du \approx \hat F_{H_1}(Z_{(n)}) - \hat F_{H_1}(Z_{(i)}).$$
Defining $\hat F_{H_1}$ as the empirical distribution function, so that $\hat F_{H_1}(Z_{(i)}) = i/n$, the right-hand side of (2.3) becomes $1 - 1/n - (m-1)/(2n) = 1 - (m+1)/(2n)$, and we obtain the empirical version of (2.3),
$$\frac{1}{2m}\sum_{j=1}^{n} \frac{f_{H_1,j}}{f_{H_0,j}}\left\{\hat F_{H_0}(Z_{(j+m)}) - \hat F_{H_0}(Z_{(j-m)})\right\} = 1 - \frac{m+1}{2n}.$$
In order to define $\hat F_{H_0}$, we apply the distribution-free estimator of a symmetric distribution function proposed by Schuster, $\hat F_{H_0}(u) = (2n)^{-1}\sum_{i=1}^{n}\{I(Z_i \le u) + I(-Z_i \le u)\}$. Thus, we obtain the empirical constraint
$$\frac{1}{2m}\sum_{j=1}^{n} \frac{f_{H_1,j}}{f_{H_0,j}}\,\Delta_{jm} = 1 - \frac{m+1}{2n}, \qquad (2.4)$$
where
$$\Delta_{jm} = \frac{1}{2n}\sum_{i=1}^{n}\left\{I\big(Z_i \le Z_{(j+m)}\big) + I\big(-Z_i \le Z_{(j+m)}\big) - I\big(Z_i \le Z_{(j-m)}\big) - I\big(-Z_i \le Z_{(j-m)}\big)\right\}.$$
In order to find the values of $f_{H_1,j}$ that maximize the log-likelihood $\sum_{j=1}^{n}\log f_{H_1,j}$ subject to constraint (2.4), we derive $\partial\Lambda/\partial f_{H_1,i}$, $i = 1, \ldots, n$, from the Lagrange function
$$\Lambda = \sum_{j=1}^{n}\log f_{H_1,j} + \lambda\left\{1 - \frac{m+1}{2n} - \frac{1}{2m}\sum_{j=1}^{n}\frac{f_{H_1,j}}{f_{H_0,j}}\,\Delta_{jm}\right\},$$
where $\lambda$ is the Lagrange multiplier. The equations $\partial\Lambda/\partial f_{H_1,j} = 0$ then provide
$$f_{H_1,j} = \frac{2m\{1 - (m+1)/(2n)\}}{n\,\Delta_{jm}}\,f_{H_0,j}, \qquad j = 1, \ldots, n.$$
This implies that the empirical maximum likelihood approximation to the likelihood ratio $\prod_{j=1}^{n} f_{H_1,j}\big/\prod_{j=1}^{n} f_{H_0,j}$, with the values of $f_{H_1,j}$ obtained under the alternative hypothesis $H_1$, has the empirical form
$$V_{nm} = \prod_{j=1}^{n}\frac{2m\{1 - (m+1)/(2n)\}}{n\,\Delta_{jm}}. \qquad (2.5)$$
Note that the test statistic (2.5) has a structure similar to those of statistics based on sample entropy, which are known to have asymptotic optimality properties. The performance of the statistic $V_{nm}$ strongly depends on the unknown value of the integer parameter $m$. In order to eliminate the dependence on the parameter $m$, we use the maximum likelihood based methods shown in Vexler and Gurevich and Vexler et al. and employ the maximum likelihood principle to propose the test statistic
$$V_n = \min_{a(n) \le m \le b(n)} \prod_{j=1}^{n}\frac{2m\{1 - (m+1)/(2n)\}}{n\,\Delta_{jm}}, \qquad (2.6)$$
where $a(n) = n^{1/2+\delta}$, $b(n) = \min(n^{1-\delta}, n/2)$ and $\delta \in (0, 1/4)$. Following the DBEL literature, we choose $\delta = 0.1$ in practice and define $\Delta_{jm} = 1/n$ if $\Delta_{jm} = 0$.

The sequential DBEL approach:
Finally, to sequentially test hypothesis (2.1), we define the stopping rule
$$\tau = \min\{n: 1 \le n \le N,\ \log(V_n) \ge c_{\alpha,N}\}, \qquad (2.7)$$
where $V_n$ is defined in (2.6), $\alpha$ is the significance level and $c_{\alpha,N}$ is the critical value associated with $N$ and $\alpha$, satisfying $\Pr_{H_0}\{\tau \le N\} = \alpha$. Note that we set $\log(V_n) = 0$ for $n = 1, 2, 3$, since $n \ge 4$ is required to compute the statistic $\log(V_n)$. The proposed decision-making policy regarding stopping rule (2.7) consists of the following algorithm: 1) if, for the first time, for some $n$ ($\le N$), $\log(V_n)$ exceeds $c_{\alpha,N}$, we set $\tau = n$ and the experiment is terminated at that stage along with the decision to reject $H_0$; 2) if no such $n$ exists, the null hypothesis is not rejected and the experiment terminates at the target maximum sample size $N$. We consider the derivation of the $c_{\alpha,N}$ values in Section 2.4. Since the stopping rule $\tau$ is based on the statistic $V_n$, which approximates the optimal parametric LR, the proposed test procedure can be anticipated to be very efficient; this is empirically confirmed in Sections 3 and 4. The following proposition demonstrates the consistency of the proposed test.

Proposition 1. Under $H_0$, we have $\Pr_{H_0}\{\max_{1 \le n \le N}\log(V_n) > N^{\gamma}\} \to 0$ as $N \to \infty$, whereas, under $H_1$, we have $\Pr_{H_1}\{\max_{1 \le n \le N}\log(V_n) > N^{\gamma}\} \to 1$ as $N \to \infty$, where $V_n$ is defined by (2.6) and $\gamma \in (0, 1)$.

Proof. The proof of this proposition is outlined in Appendix A.

The Type I error probability related to the proposed test procedure (2.7) is $\Pr_{H_0}\{\tau \le N\} = \Pr_{H_0}\{\max_{1 \le n \le N}\log(V_n) > c_{\alpha,N}\}$. Assume $c_{\alpha,N} = O(N^{\gamma})$; then Proposition 1 provides that $\Pr_{H_0}\{\max_{1 \le n \le N}\log(V_n) > c_{\alpha,N}\} \to 0$ as $N \to \infty$. It is then clear that, in order to satisfy $\Pr_{H_0}\{\tau \le N\} = \alpha$ for a fixed value of $\alpha$ and large values of $N$, $c_{\alpha,N}$ should be of an order $o(N^{\gamma})$. Thus, the proposed test procedure is consistent, since $\Pr_{H_1}\{\max_{1 \le n \le N}\log(V_n) > c_{\alpha,N}\} \to 1$ as $N \to \infty$ when $c_{\alpha,N}$ has an order smaller than that of $N^{\gamma}$. (In this case, $\Pr_{H_1}\{\max_{1 \le n \le N}\log(V_n) > N^{\gamma}\}$ is asymptotically a lower bound for $\Pr_{H_1}\{\max_{1 \le n \le N}\log(V_n) > c_{\alpha,N}\}$, when $c_{\alpha,N} < N^{\gamma}$ as $N \to \infty$.)

In this section, we show that the proposed test statistic is distribution-free under $H_0$, and we then present the critical values for the new test procedure. The stopping rule $\tau$ contains the DBEL test statistics $V_4, V_5, \ldots$, which by definitions (2.4) and (2.6) depend only on certain indicator functions. It turns out that the null distribution of the stopping rule $\tau$ is independent of the distribution of the observations $Z_1, Z_2, \ldots$. In order to explain this claim, we note that, under $H_0$,
$$I\{Z_i \le Z_j\} = I\{\Phi^{-1}(F_{H_0}(Z_i)) \le \Phi^{-1}(F_{H_0}(Z_j))\}$$
and
$$I\{-Z_i \le Z_j\} = I\{\Phi^{-1}(F_{H_0}(-Z_i)) \le \Phi^{-1}(F_{H_0}(Z_j))\} = I\{\Phi^{-1}(1 - F_{H_0}(Z_i)) \le \Phi^{-1}(F_{H_0}(Z_j))\} = I\{-\Phi^{-1}(F_{H_0}(Z_i)) \le \Phi^{-1}(F_{H_0}(Z_j))\},$$
for $i \ne j \in [1, N]$, where $\Phi^{-1}(x)$ denotes the inverse function of the standard normal cumulative distribution function $\Phi(x)$. Since $\Phi^{-1}(F_{H_0}(Z_1)), \ldots, \Phi^{-1}(F_{H_0}(Z_N))$ are i.i.d. standard normal random variables, this fact implies that the Type I error rate is
$$\Pr_{H_0}\left\{\max_{1 \le n \le N}\log(V_n) > c_{\alpha,N}\right\} = \Pr_{Z_1, \ldots, Z_N \sim N(0,1)}\left\{\max_{1 \le n \le N}\log(V_n) > c_{\alpha,N}\right\}.$$
Then it is clear that the proposed procedure (2.7) is exact. Let $DBTS_N = \max_{1 \le n \le N}\log(V_n)$. Thus, by definition (2.6), $c_{\alpha,N}$ is the upper $\alpha$-percentile of the distribution of $DBTS_N$, satisfying $\Pr_{H_0}\{DBTS_N \ge c_{\alpha,N}\} = \alpha$. In a similar manner to the computing scheme shown for the SSRT procedure, we tabulate the critical values $c_{\alpha,N}$ for various choices of $N$ and $\alpha$ using the MC approach based on 25,000 generations of $Z_1, \ldots, Z_N \sim N(0,1)$. The results are shown in Table 1.
Table 1.
Remark 1 . The MC method is a well-known approach for obtaining accurate approximations of the critical values for exact tests. Vexler et al. proposed an approach to compute critical values of exact test procedures using tabulated critical values and MC simulations in a Bayesian manner. In this framework, tabulated critical values are considered as prior information and simulated MC observations are used as data.
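To make the computing scheme above concrete, the following Python sketch computes the DBEL statistic $\log(V_n)$ of (2.6) and approximates the critical value $c_{\alpha,N}$ by MC simulation from the standard normal distribution, which suffices under $H_0$ because the statistic is distribution-free. This is an illustrative re-implementation under our own naming (`log_Vn`, `dbel_critical_value`) and small-sample conventions; the authors' supplementary code is in R, and the exact rounding of $a(n)$ and $b(n)$ below is our assumption.

```python
import numpy as np

def log_Vn(z, delta=0.1):
    """log V_n of (2.6) for the observed differences z (illustrative sketch)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    if n < 4:
        return 0.0                      # the paper sets log V_n = 0 for n = 1, 2, 3
    zs = np.sort(z)
    pooled = np.sort(np.concatenate([zs, -zs]))  # for Schuster's symmetric CDF estimator

    def F_hat(u):                       # F_hat(u) = (2n)^{-1} sum_i {I(Z_i <= u) + I(-Z_i <= u)}
        return np.searchsorted(pooled, u, side="right") / (2.0 * n)

    a = max(2, int(np.floor(n ** (0.5 + delta))))
    b = max(a, int(min(np.floor(n ** (1.0 - delta)), n // 2)))  # guard for very small n
    j = np.arange(n)
    best = np.inf
    for m in range(a, b + 1):
        hi = zs[np.minimum(j + m, n - 1)]   # Z_{(j+m)}, with Z_{(r)} = Z_{(n)} for r >= n
        lo = zs[np.maximum(j - m, 0)]       # Z_{(j-m)}, with Z_{(r)} = Z_{(1)} for r <= 1
        d = F_hat(hi) - F_hat(lo)           # Delta_{jm}
        d[d == 0] = 1.0 / n                 # convention Delta_{jm} = 1/n when it vanishes
        best = min(best, float(np.sum(np.log(2 * m * (1 - (m + 1) / (2.0 * n)) / (n * d)))))
    return best

def dbel_critical_value(N, alpha=0.05, n_mc=2000, seed=0):
    """MC approximation of c_{alpha,N}: upper alpha-percentile of max_n log V_n under H0."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_mc)
    for r in range(n_mc):
        z = rng.standard_normal(N)          # distribution-free under H0, so N(0,1) suffices
        stats[r] = max(log_Vn(z[:n]) for n in range(4, N + 1))
    return float(np.quantile(stats, 1.0 - alpha))
```

With a large `n_mc`, `dbel_critical_value(N)` should land in the general vicinity of the tabulated values in Table 1, although the small-sample conventions noted above can shift the percentile somewhat.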
3. MONTE CARLO STUDY

To evaluate the performance of the new testing strategy compared to the classical SSRT procedure, we carried out an extensive MC study. We examined the ASN and the corresponding statistical powers of the considered procedures. Critical values of the tests were set at the 5% level of significance, and the MC experiments were repeated 10,000 times in each scenario based on N = 25, 50 and 75, respectively. Following the statistical literature, we used alternatives in the MC study according to the scenarios: (1) constant shifts in location, e.g., $X_i \sim N(0,1)$ and $Y_i \sim N(0.5,1)$, $i = 1, \ldots, N$; (2) constant and nonconstant shifts, e.g., $X_i \sim N(0,1)$ and $Y_i \sim N(0.5,2)$, $i = 1, \ldots, N$; (3) skewed alternatives, e.g., $X_i \sim LogN(1,1)$ and $Y_i \sim LogN(1,0.5)$, $i = 1, \ldots, N$; and (4) nonconstant shifts, e.g., $X_i \sim Beta(0.7,1)$ and $Y_i \sim Exp(2)$, $i = 1, \ldots, N$. Note that the statistical literature expects the classical SSRT, which is based on the Wilcoxon signed-rank statistic, to be very efficient in Scenario (1). Regarding Scenario (3), one can remark that many measurements of markers related to health and social science have been shown to follow lognormal distributions. The corresponding MC results are presented in Table 2. (In the Supplementary Material (SM), we provide Table S1 with additional outputs of the MC study.) In Scenarios (1)-(2), the SSRT demonstrates a slightly higher power than the new test. (In the SM, we also show several cases in which the SSRT slightly outperforms the proposed procedure.) This result is consistent with the MC evaluations of the DBEL and the Wilcoxon signed-rank tests in retrospective settings when observations are normally distributed. The ASNs of the proposed test are comparable with those of the SSRT.
In Scenario (3), when the pre- and/or post-treatment measurements are lognormally distributed, the proposed testing procedure substantially outperforms the SSRT in terms of statistical power and ASN for the considered alternatives with N = 50, 75. Consider the case $Z_i = Y_i - X_i$, $i = 1, \ldots, N$, where $X_i$ follows a lognormal distribution and $Y_i$ follows a uniform distribution, with N = 50, 75. In this scenario, the SSRT shows powers of 0.22 and 0.28 with corresponding ASNs of 45 and 64, respectively, whereas the proposed test provides powers of 0.70 and 0.98 with corresponding ASNs of 40 and 44, respectively. In Scenario (4), the proposed testing procedure performs better than the SSRT in terms of statistical power and ASN in most of the considered cases. Thus, compared to the SSRT, the sequential DBEL test procedure shows higher power and relatively smaller ASNs across many of the considered alternatives. Table 2.
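For comparison, the SSRT statistic $TS_n$ and stopping rule (2.2) used as the benchmark in Table 2 can be sketched as follows; this is an illustrative Python version (the function names are ours, and the paper's own code is in R), assuming continuous data so that ties among the $|Z_i|$ can be ignored.

```python
import numpy as np

def ssrt_ts(z):
    """Standardized sequential signed-rank statistic TS_n for the differences z."""
    z = np.asarray(z, dtype=float)
    n = z.size
    ranks = np.abs(z).argsort().argsort() + 1   # ranks R_ni of |Z_i| among |Z_1|,...,|Z_n|
    sr = ranks[z >= 0].sum()                    # SR_n = sum_i I(Z_i >= 0) R_ni
    return (sr - n * (n + 1) / 4.0) / np.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)

def ssrt_stopping_time(z, z_crit):
    """First n (1 <= n <= N) with TS_n >= z_{alpha,N}, per rule (2.2); None if never."""
    for n in range(1, len(z) + 1):
        if ssrt_ts(z[:n]) >= z_crit:
            return n
    return None
```

For instance, `ssrt_stopping_time([1, 2, 3, 4, 5], 2.0)` stops at the fifth observation, since $TS_5 = 7.5/\sqrt{13.75} \approx 2.02$ first exceeds 2.0 there.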
4. APPLICATION TO THE VENTILATOR-ASSOCIATED PNEUMONIA (VAP) STUDY
The VAP data were produced in the course of an institutional study at the State University of New York at Buffalo, in which oral treatments were compared to investigate their effects on infection of patients’ respiratory system in an ICU. The pathogenesis of pneumonia, including VAP, involves aspiration of bacteria from the oropharynx into the lung, and subsequent failure of host defenses to clear the bacteria resulting in development of lung infection. The major potential respiratory bacterial pathogens (PRPs) include
Staphylococcus aureus, Pseudomonas aeruginosa, Acinetobacter sps. and enteric species. Prior biomedical studies have found an association between strains from bronchoscopic cultures isolated at the time pneumonia was suspected and the dental plaque/mucosa, which is often colonized by PRPs.
Thus, improving oral hygiene in MV-ICU patients and reducing the dental plaque load on teeth has the potential to reduce the risk of VAP. The trial sequentially enrolled and examined 83 patients who were admitted to a trauma ICU of the Erie County Medical Center (ECMC). During this study, pre- and post-CHX treatment measurements of the amounts of aggregated bacteria (S. aureus, P. aeruginosa, Acinetobacter sps., and enteric organisms) were recorded from each patient. Results of quantitative cultures were expressed as colony forming units (cfu) per ml. Figure 1 shows the histogram of the post- minus pre-treatment differences of the 83 paired data points. (In the SM, Figure S1 presents the scatterplot based on the pre- and post-CHX treatment measurements.) Based on the Shapiro-Wilk test of normality, we reject the normality assumption for the observed paired differences (p-value = 0.03). Retrospectively, we apply the Wilcoxon signed-rank test based on the total of 83 paired data points, obtaining the corresponding p-value of 0.04 (< 0.05). Thus, the Wilcoxon signed-rank test rejects the hypothesis that there is no difference between pre- and post-treatment measurements of aggregated bacteria. This conclusion is coherent with available clinical trial results that demonstrated an effect of CHX on the prevalence of oropharyngeal colonization by respiratory bacterial pathogens. We examine whether the proposed sequential DBEL testing scheme and the conventional SSRT procedure can detect the treatment effect in a more efficient manner, in the sense that the sequential methods may provide significant testing results based on fewer than the total sample size of 83 patients. In accordance with Section 2, we use the MC method to compute the 95% critical values 2.676 and 5.166 for the SSRT and the proposed method, respectively, using N = 83. The proposed sequential DBEL method rejected the null hypothesis that there is no difference between pre- and post-treatment measurements based on 50 observations.
However, the SSRT failed to reject the null hypothesis using all 83 observations. In order to evaluate the robustness of the proposed approach, we conducted the following bootstrap-type analyses. A sample of size N (< 83) was randomly selected from the 83 paired data points, and the stopping numbers were calculated for this sample using the proposed method and the SSRT. We repeated this strategy 5,000 times, calculating the ASNs and the frequencies with which the proposed method and the SSRT reject the null hypothesis. Table 3 presents these results for the maximum sample sizes N = 15, 25, 35, 50, 65, and 75. In this bootstrap-type study, we notice that 1) the proposed method has substantially higher rejection rates of a false null hypothesis than the SSRT; 2) the proposed method consistently yields smaller ASNs than the SSRT, resulting in significant savings in sample sizes. Figure 1. Table 3.
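The bootstrap-type analysis above can be sketched as follows. This is an illustrative Python skeleton under our own naming and interface assumptions (`stat_fn` stands for either sequential statistic, e.g. $TS_n$ or $\log V_n$, and `crit` for the corresponding critical value); the demonstration statistic below is a simple standardized mean, used only to keep the example self-contained and not part of the paper's procedures.

```python
import numpy as np

def bootstrap_stopping_summary(diffs, N, stat_fn, crit, B=5000, seed=0):
    """Resample N of the observed differences, apply the sequential rule
    'reject at the first n with stat_fn(prefix) >= crit', and report the
    rejection frequency and the average sample number (ASN)."""
    rng = np.random.default_rng(seed)
    rejections, stop_ns = 0, []
    for _ in range(B):
        z = rng.choice(diffs, size=N, replace=False)  # subsample, as in Section 4
        stop = N                                      # no rejection: run to the maximum N
        for n in range(1, N + 1):
            if stat_fn(z[:n]) >= crit:
                stop = n
                rejections += 1
                break
        stop_ns.append(stop)
    return rejections / B, float(np.mean(stop_ns))

# Demonstration with a hypothetical standardized-mean statistic on synthetic differences:
if __name__ == "__main__":
    observed = np.random.default_rng(1).normal(0.8, 1.0, 83)  # stand-in for the 83 differences
    zstat = lambda z: np.mean(z) * np.sqrt(len(z))
    rate, asn = bootstrap_stopping_summary(observed, N=15, stat_fn=zstat, crit=2.0, B=500)
    print(rate, asn)
```

Passing the DBEL or SSRT statistic in place of `zstat`, together with the appropriate critical value, reproduces the two columns of the Table 3 comparison.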
5. CONCLUSIONS
We have developed a new nonparametric sequential technique for detecting treatment effects based on paired data. The proposed method employs the density-based EL methodology, which approximates the optimal likelihood ratio statistic in a distribution-free fashion. In the real-world study, the new test clearly outperforms the conventional SSRT. To the best of our knowledge, this article belongs to a first cohort of studies related to applications of EL techniques in sequential settings. The method developed in this article can be extended to construct various nonparametric group sequential approaches. For example, for a fixed integer $k$, the stopping rule (2.7) can be modified to be based on the statistics $\log(V_{kn})$. Further empirical and theoretical studies are needed to evaluate the proposed approach in this framework. It is clear that in the distribution-free setting considered in this article, there are no most powerful statistical procedures. Therefore, it is very important to consider and evaluate reasonably developed decision-making policies in the context of detecting treatment effects in practice.

APPENDIX A. PROOF OF PROPOSITION 1
The proof of Proposition 1 is relatively complicated since, in general, we need to evaluate properties of the statistic $\max_{1 \le n \le N}\log\{\min_{a(n) \le m \le b(n)}\prod_{j=1}^{n} m/(n\Delta_{jm})\}$, whereas the published asymptotic DBEL results consider constructions of the type $\log\{\min_{a(n) \le m \le b(n)}\prod_{j=1}^{n} m/(n\Delta_{jm})\}$ as $n \to \infty$. The online supplementary material provides the technical details of examining the property
$$\Pr_{H_k}\left\{\max_{1 \le n \le N}\log\left(\min_{a(n) \le m \le b(n)}\prod_{j=1}^{n}\frac{m}{n\,\Delta_{jm}}\right) \le N^{\gamma}\right\} \to 1 - k, \qquad k = 0, 1, \quad \gamma \in (0, 1),$$
as $N \to \infty$, via the corresponding joint probabilities.

SUPPLEMENTARY MATERIALS
The detailed proof of the proposition presented in Section 2, the R code implementing the proposed method, the additional MC results, and the scatterplot that illustrates the data considered in Section 4 are available in the supplementary materials.
ACKNOWLEDGEMENTS
This research was supported by the National Institutes of Health (NIH) grant 1G13LM012241-01. We are grateful to the Editor, Associate Editor and reviewers for their helpful comments that led to a substantial improvement in this paper.
REFERENCES
1. Oppegaard K, Lieng M, Berg A, et al. A combination of misoprostol and estradiol for preoperative cervical ripening in postmenopausal women: a randomised controlled trial. BJOG.
Crit. Care.
3. Statistical methods in medical research. John Wiley & Sons; 2008.
4. Dmitrienko A, Koch GG. Analysis of clinical trials using SAS: A practical guide. SAS Institute; 2017.
5. Jennison C, Turnbull BW. Group sequential methods with applications to clinical trials. CRC Press; 1999.
6. O'Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics.
Biometrika.
Biometrics.
9. Vexler A, Hutson AD. Statistics in the Health Sciences: Theory, Applications, and Computing. CRC Press; 2018.
10. Vexler A, Hutson AD, Chen X. Statistical testing strategies in the health sciences. CRC Press; 2016.
11. Barnard GA. Sequential tests in industrial statistics.
Supplement to the J R Stat Soc.
Ann Math Stat.
J Am Stat Assoc.
J Am Stat Assoc.
Biometrika.
Biometrics bulletin.
J R Stat Soc Series A (General).
Biometrika.
Am Stat.
Biometrika.
Empirical likelihood.
Wiley Online Library; 2001. 23. Qin J, Lawless J. Empirical likelihood and general estimating equations.
Ann Stat.
24. Zhao Y. Regression analysis for long-term survival rate via empirical likelihood.
J Nonparametr Stat.
Comput Stat Data Anal.
Statistics and Computing.
J Stat Softw.
Am Stat.
Stat Methods Appl.
J Stat Plan inference.
Biometrika.
Ann Stat.
J Am Stat Assoc. ‐ sample density ‐ based empirical likelihood ratio tests based on paired data, with application to a treatment study of attention ‐ deficit/hyperactivity disorder and severe mood dysregulation. Stat Med.
Am J Hum Genet.
Scand Stat Theory Appl.
J Am Stat Assoc.
AIBS Bulletin.
Crit Care Med.
Crit Care Med.
Table 1. The critical values $c_{\alpha,N}$ of the new test procedure at the significance levels $\alpha$.
N \ α
40  6.478 6.026 5.716 5.525 4.949 4.830 4.534 4.030 3.514 3.253 2.854
45  6.478 6.123 5.716 5.638 4.949 4.841 4.589 4.081 3.514 3.338 2.854
50  6.478 6.168 5.751 5.667 5.027 4.890 4.723 4.121 3.614 3.412 2.882
55  6.575 6.194 5.783 5.667 5.061 4.926 4.735 4.211 3.685 3.441 2.940
60  6.724 6.345 5.959 5.716 5.176 4.949 4.841 4.288 3.768 3.514 3.027
65  6.594 6.278 5.906 5.716 5.158 4.949 4.843 4.288 3.839 3.514 3.091
70  6.739 6.345 6.046 5.716 5.280 4.980 4.890 4.288 3.930 3.537 3.161
75  6.734 6.345 6.048 5.716 5.321 5.017 4.929 4.309 3.976 3.606 3.231
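As an illustration of how values such as those in Table 1 arise, the following is a minimal Python re-implementation of the statistic $V_n$ from (2.6), mirroring the supplementary R code; the function names and the default choice `dpar = 0.1` for $\delta$ are illustrative assumptions, not values stated in the paper.

```python
import numpy as np

def log_v_nm(z, m):
    """log V_{nm}: the DBEL statistic for window size m, mirroring the
    supplementary R code (Delta_jm counts +/- Z values between the order
    statistics Z_(j-m) and Z_(j+m), with indices clipped to [1, n])."""
    n = len(z)
    sz = np.sort(z)
    idx = np.arange(1, n + 1)
    zL = sz[np.clip(idx - m, 1, n) - 1]   # R: replace(L, L <= 0, 1)
    zU = sz[np.clip(idx + m, 1, n) - 1]   # R: replace(U, U >= n, n)
    delta = np.array([(np.sum(sz <= u) + np.sum(-sz <= u)
                       - np.sum(sz <= l) - np.sum(-sz <= l)) / (2.0 * n)
                      for l, u in zip(zL, zU)])
    delta[delta == 0] = 1.0 / n           # R: replace(delta_nm, delta_nm==0, 1/n)
    return float(np.sum(np.log(m * (2 * n - m - 1) / (n ** 2 * delta))))

def v_n(z, dpar=0.1):
    """V_n = min over a(n) <= m <= b(n) of log V_{nm}, with
    a(n) = n^(1/2 + delta) and b(n) = min(n^(1 - delta), n/2)."""
    n = len(z)
    a = round(n ** (0.5 + dpar))
    b = min(round(n ** (1 - dpar)), round(n / 2))
    lo, hi = min(a, b), max(a, b)         # guard: endpoints can cross for small n
    return min(log_v_nm(z, m) for m in range(lo, hi + 1))
```

For the paired-data setting, `z` would hold the differences $Z_i=Y_i-X_i$; critical values such as those in Table 1 are then Monte Carlo quantiles of $\max_{4\le n\le N}V_n$ computed under the null.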
Table 2. The MC powers and ASNs of the proposed sequential DBEL (Seq_DBEL) test and the SSRT at the significance level α.

Scenario  F_X           F_Y            N    Seq_DBEL        SSRT
                                            Power   ASN     Power   ASN
(S1)      N(0,1)        N(0.5,1)       25   0.256   22      0.297   22
                                       50   0.446   40      0.534   39
                                       75   0.660   52      0.712   50
(S2)      N(0,1)        N(0.5, )       50   0.197   45      0.240   45
                                       75   0.298   65      0.334   64
(S3)      LogN(1,1)     LogN(1, )      25   0.073   24      0.066   24
                                       50   0.167   47      0.088   48
                                       75   0.386   66      0.113   71
          LogN(0,1)     U(1,2)         25   0.168   23      0.162   23
                                       50   0.698   40      0.222   45
                                       75   0.981   44      0.286   64
          LogN(0, )     Chisq(6)       25   0.314   23      0.186   23
                                       50   0.962   32      0.245   44
                                       75   1       33      0.315   62
          LogN(1,1)     Gamma(5,1)     25   0.233   22      0.283   22
                                       50   0.561   39      0.459   40
                                       75   0.867   48      0.607   53
(S4)      Exp(1)        Gamma(2,3)     25   0.14    24      0.138   24
                                       50   0.284   44      0.25    45
                                       75   0.482   61      0.365   63
          Gamma(5,1)    Gamma(1,1/5)   25   0.089   24      0.101   24
                                       50   0.183   46      0.126   47
                                       75   0.408   65      0.158   69
          Exp(1)        U(-1,2)        25   0.249   23      0.268   23
                                       50   0.565   39      0.497   40
                                       75   0.853   47      0.688   51
          Gamma(4,2)    Gamma(5,2)     25   0.235   22      0.283   22
                                       50   0.412   41      0.501   39
                                       75   0.621   54      0.680   51
          Chisq(1)      Beta(0.7,1)    25   0.265   23      0.225   23
                                       50   0.788   37      0.430   41
                                       75   0.993   40      0.626   54
          Gamma(2,1)    U(1,2)         25   0.178   23      0.161   24
                                       50   0.511   42      0.305   44
                                       75   0.890   50      0.443   61
Figure 1.
The histogram of the total 83 paired differences of pre- and post-CHX treatment measurements. The estimated mean, median and standard deviation of the 83 paired data points are 1.212, 0.002 and 5.036, respectively.
Table 3. The bootstrap-based rejection rates (RRs) and ASNs of the new test (Seq_DBEL) and the SSRT at the significance level of 0.05.

N    Seq_DBEL        SSRT
     RRs     ASNs    RRs     ASNs
15   21.7%   13      5.82%   15
25   27.1%   21      5.48%   24
35   34.4%   22      6.40%   34
50   42.2%   39      6.63%   48
65   54.9%   47      5.88%   63
75   63.2%   52      5.75%   72
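The resampling design behind Table 3 can be sketched as follows, in Python rather than the paper's R: resample the paired differences with replacement, run the sequential rule until the statistic crosses the critical value or the maximum sample size N is reached, and record the rejection rate and average stopped sample size. The data below are synthetic stand-ins for the 83 observed paired differences, and `cv=5.0` is an illustrative placeholder rather than a critical value taken from Table 1.

```python
import numpy as np

def v_n(z, dpar=0.1):
    """Sequential DBEL statistic V_n from (2.6); a compact sketch that
    mirrors the supplementary R code (dpar plays the role of delta)."""
    n = len(z)
    sz = np.sort(z)
    idx = np.arange(1, n + 1)

    def log_v(m):
        zL = sz[np.clip(idx - m, 1, n) - 1]
        zU = sz[np.clip(idx + m, 1, n) - 1]
        d = np.array([(np.sum(sz <= u) + np.sum(-sz <= u)
                       - np.sum(sz <= l) - np.sum(-sz <= l)) / (2.0 * n)
                      for l, u in zip(zL, zU)])
        d[d == 0] = 1.0 / n
        return float(np.sum(np.log(m * (2 * n - m - 1) / (n ** 2 * d))))

    a = round(n ** (0.5 + dpar))
    b = min(round(n ** (1 - dpar)), round(n / 2))
    return min(log_v(m) for m in range(min(a, b), max(a, b) + 1))

def bootstrap_sequential(data, N, cv, reps=50, seed=1):
    """Resample paired differences with replacement and apply the
    sequential rule: reject H0 at the first n (4 <= n <= N) with V_n >= cv."""
    rng = np.random.default_rng(seed)
    rejections, stop_ns = 0, []
    for _ in range(reps):
        boot = rng.choice(data, size=N, replace=True)
        for n in range(4, N + 1):
            if v_n(boot[:n]) >= cv:
                rejections += 1
                stop_ns.append(n)
                break
        else:
            stop_ns.append(N)   # no rejection: all N observations used
    return rejections / reps, float(np.mean(stop_ns))  # (RR, ASN)

# illustrative run: synthetic stand-in for the 83 observed differences
rng = np.random.default_rng(0)
fake_diffs = rng.normal(1.2, 5.0, size=83)
rr, asn = bootstrap_sequential(fake_diffs, N=25, cv=5.0)
```

The reported ASN is the average of the stopped sample sizes, so early rejections pull it below N, matching the pattern in Table 3.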
A Sequential Density-Based Empirical Likelihood Ratio Test for Treatment Effects
Li Zou, Albert Vexler, Jihnhee Yu, and Hongzhi Wan
In this supplementary material, we provide the proof of Proposition 1, the R code used in Sections 2-3 of this article, additional MC results, and the scatterplot that depicts the data considered in Section 4.
1. THE PROOF OF PROPOSITION 1.
Proposition. Under $H_0$, we have $\Pr_{H_0}\{\max_{1\le n\le N}V_n>\gamma\log N\}\to 0$ as $N\to\infty$, whereas, under $H_1$, we have $\Pr_{H_1}\{\max_{1\le n\le N}V_n>\gamma\log N\}\to 1$ as $N\to\infty$, where
$$V_n=\min_{a(n)\le m\le b(n)}\sum_{j=1}^{n}\log\frac{m(2n-m-1)}{n^{2}\,\Delta_{jm}},\qquad a(n)=n^{1/2+\delta},\quad b(n)=n^{1-\delta},$$
$$\Delta_{jm}=\frac{1}{2n}\sum_{i=1}^{n}\Big\{I\big(Z_i\le Z_{(j+m)}\big)+I\big(-Z_i\le Z_{(j+m)}\big)-I\big(Z_i\le Z_{(j-m)}\big)-I\big(-Z_i\le Z_{(j-m)}\big)\Big\},$$
$\delta\in(0,0.25)$ and $\gamma\in(0,1)$. Here $Z_{(1)}\le\dots\le Z_{(n)}$ denote the order statistics of $Z_1,\dots,Z_n$, and we set $Z_{(j)}=Z_{(1)}$ for $j<1$ and $Z_{(j)}=Z_{(n)}$ for $j>n$.

Proof. Since $m\le b(n)=n^{1-\delta}$, the deterministic term $(m+1)/(2n)$ in the representation $m(2n-m-1)/n^{2}=(2m/n)\{1-(m+1)/(2n)\}$ quickly vanishes to zero. To simplify the exposition, we represent $V_n$ in the form
$$V_n=\min_{a(n)\le m\le b(n)}\sum_{j=1}^{n}\log\frac{2m}{n\,\Delta_{jm}}.$$
Toward this end, we will show that
$$\Pr_{H_0}\Big\{\max_{1\le n\le N}V_n>\gamma\log N\Big\}\to 0\ \text{ and }\ \Pr_{H_1}\Big\{\max_{1\le n\le N}V_n>\gamma\log N\Big\}\to 1,\ \text{ as }N\to\infty.$$

The null hypothesis. Fixing $m=n^{1-\delta}$ provides the trivial inequality
$$\min_{a(n)\le m\le b(n)}\sum_{j=1}^{n}\log\frac{2m}{n\,\Delta_{jm}}\le\sum_{j=1}^{n}\log\frac{2n^{1-\delta}}{n\,\Delta_{jn^{1-\delta}}}$$
for relatively large values of $n$, which implies
$$\Pr_{H_0}\Big\{\max_{1\le n\le N}V_n>\gamma\log N\Big\}\le\Pr_{H_0}\Big\{\max_{1\le n\le N}\sum_{j=1}^{n}\log\frac{2n^{1-\delta}}{n\,\Delta_{jn^{1-\delta}}}>\gamma\log N\Big\}.\qquad(1)$$
A simple logarithmic bound on the summands, together with the fact that the corresponding probability vanishes for small values of $n$ as $N\to\infty$, also justifies the use of inequality (1).

Splitting the sum in (1) over the three ranges $1\le j\le n^{1-\delta}$, $n^{1-\delta}<j\le n-n^{1-\delta}$ and $n-n^{1-\delta}<j\le n$, and choosing constants $\gamma_1$, $\gamma_2$ with $0<\gamma_1<\gamma$ and $0<\gamma_2<\gamma-\gamma_1$, we obtain
$$\Pr_{H_0}\Big\{\max_{1\le n\le N}\sum_{j=1}^{n}\log\frac{2n^{1-\delta}}{n\,\Delta_{jn^{1-\delta}}}>\gamma\log N\Big\}\le A+B+C,\qquad(2)$$
where
$$A=\Pr_{H_0}\Big\{\max_{1\le n\le N}\sum_{j=1}^{n^{1-\delta}}\log\frac{2n^{1-\delta}}{n\,\Delta_{jn^{1-\delta}}}>\gamma_1\log N\Big\},$$
$$B=\Pr_{H_0}\Big\{\max_{1\le n\le N}\sum_{j=n^{1-\delta}+1}^{n-n^{1-\delta}}\log\frac{2n^{1-\delta}}{n\,\Delta_{jn^{1-\delta}}}>\gamma_2\log N\Big\},$$
$$C=\Pr_{H_0}\Big\{\max_{1\le n\le N}\sum_{j=n-n^{1-\delta}+1}^{n}\log\frac{2n^{1-\delta}}{n\,\Delta_{jn^{1-\delta}}}>(\gamma-\gamma_1-\gamma_2)\log N\Big\}.$$

Consider the item A. For $1\le j\le n^{1-\delta}$, the definition of $\Delta_{jm}$ directly yields the deterministic bound $\Delta_{jn^{1-\delta}}\ge(n^{1-\delta}+j-1)/(2n)$, so the summands in A are uniformly bounded. Define $D_n(x)=F_n(x)-F(x)$, where $F_n(x)=n^{-1}\sum_{i=1}^{n}I(Z_i\le x)$ is the empirical distribution function. Under $H_0$, the symmetry $F(z)=1-F(-z)$ implies, for $n^{1-\delta}<j\le n-n^{1-\delta}$, the representation
$$\Delta_{jn^{1-\delta}}=\frac{2n^{1-\delta}}{n}-\frac{1}{2}\Big\{D_n\big(Z_{(j+n^{1-\delta})}\big)+D_n\big(-Z_{(j+n^{1-\delta})}\big)-D_n\big(Z_{(j-n^{1-\delta})}\big)-D_n\big(-Z_{(j-n^{1-\delta})}\big)\Big\},\qquad(3)$$
and analogous representations hold on the two boundary ranges of $j$ treated in the items A and C. The inequality of Dvoretzky, Kiefer, and Wolfowitz [1, p. 60] provides
$$\Pr_{H_0}\Big\{\sup_x|D_n(x)|>\varepsilon\Big\}\le C_0e^{-2n\varepsilon^{2}}$$
for a finite positive constant $C_0$, and hence, for appropriately chosen $0<\beta_1<1/2$ and $\beta_0\in(0,1)$,
$$\Pr_{H_0}\Big\{\max_{N^{\beta_0}\le n\le N}\sup_x|D_n(x)|>N^{-\beta_1}\Big\}\to 0\ \text{ as }N\to\infty.\qquad(4)$$
On the complementary event $\{\max_{N^{\beta_0}\le n\le N}\sup_x|D_n(x)|\le N^{-\beta_1}\}$, representations of the form (3), combined with the deterministic bounds above, show that, for an appropriate choice of the constants $\delta$, $\gamma_1$, $\gamma_2$ and $\beta_0,\beta_1,\dots,\beta_6$, each of the three sums stays below its threshold $\gamma_1\log N$, $\gamma_2\log N$ and $(\gamma-\gamma_1-\gamma_2)\log N$, respectively, for all sufficiently large $N$. This yields
$$A\to 0,\qquad B\to 0,\qquad C\to 0,\ \text{ as }N\to\infty.\qquad(5)$$
For example, one can select specific numerical values of $\delta$, $\gamma$, $\gamma_1$, $\gamma_2$ and $\beta_0,\dots,\beta_6$ that satisfy all of the requirements stated above. Applying (5) to the inequalities (1) and (2), we obtain
$$\Pr_{H_0}\Big\{\max_{1\le n\le N}V_n>\gamma\log N\Big\}\to 0\ \text{ as }N\to\infty.$$
This completes the proof of Proposition 1's statement under $H_0$.

The alternative hypothesis. Since $\max_{1\le n\le N}V_n\ge V_N$, we have
$$\Pr_{H_1}\Big\{\max_{1\le n\le N}V_n>\gamma\log N\Big\}\ge\Pr_{H_1}\big\{V_N>\gamma\log N\big\}.\qquad(6)$$
By virtue of Proposition 1 presented in Vexler et al. [2], under $H_1$, we have
$$\frac{1}{N}\,V_N\ \xrightarrow{\,p\,}\ -E_{H_1}\log\frac{f_{H_1}(Z)+f_{H_1}(-Z)}{2f_{H_1}(Z)}\ \text{ as }N\to\infty,\qquad(7)$$
for all $\delta\in(0,0.25)$, where $f_{H_1}(z)$ is the density function of $Z$ under $H_1$. Since $E_{H_1}\{f_{H_1}(Z)+f_{H_1}(-Z)\}/\{2f_{H_1}(Z)\}=1$, Jensen's inequality implies that $E_{H_1}\log[\{f_{H_1}(Z)+f_{H_1}(-Z)\}/\{2f_{H_1}(Z)\}]<0$ whenever $f_{H_1}$ is not symmetric about zero, so the limit in (7) is strictly positive. As $V_N$ grows linearly in $N$ while the threshold $\gamma\log N$ grows only logarithmically, this leads to
$$\Pr_{H_1}\big\{V_N>\gamma\log N\big\}\to 1\ \text{ as }N\to\infty.\qquad(8)$$
Using the results of (6) and (8), we complete the proof.
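The Dvoretzky-Kiefer-Wolfowitz inequality invoked repeatedly above (with Massart's constant $C_0=2$) can be illustrated numerically. The following Python sketch estimates $\Pr\{\sup_x|F_n(x)-F(x)|>\varepsilon\}$ for a U(0,1) sample and compares it with the bound $2e^{-2n\varepsilon^{2}}$; the sample size, threshold, and replication count are illustrative choices.

```python
import numpy as np

def sup_dist(u):
    """Kolmogorov statistic sup_x |F_n(x) - x| for a U(0,1) sample u."""
    n = len(u)
    su = np.sort(u)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - su), np.max(su - (i - 1) / n))

rng = np.random.default_rng(1)
n, eps, reps = 100, 0.15, 2000
exceed = np.mean([sup_dist(rng.uniform(size=n)) > eps for _ in range(reps)])
bound = 2.0 * np.exp(-2.0 * n * eps ** 2)   # DKW bound with Massart's C0 = 2
# 'exceed' estimates Pr{sup|D_n| > eps}; up to Monte Carlo error it should
# fall at or below 'bound' (about 0.022 for these n and eps)
```

The exponential decay of the bound in $n$ is what allows the proof to control $\sup_x|D_n(x)|$ uniformly over $N^{\beta_0}\le n\le N$.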
2. R CODES USED IN SECTIONS 2-3

# Computation of V_n, n = 4,...,N, within one MC replication
# (the enclosing loop over mc and the definitions of x, N, delta,
# alpha, Vmc and Vmc_n precede this fragment in the source)
for(n in 4:N){
  Vnm <- array()
  z  <- x[1:n]
  sz <- sort(z)
  # the range of m in (2.6)
  m <- c(round(n^(delta+0.5)):min(c(round((n)^(1-delta)), round(n/2))))
  delta_nm <- array()
  # Delta_jm at (2.6)
  Vnm_function <- function(m){
    L  <- c(1:n) - m
    LL <- replace(L, L <= 0, 1)
    U  <- c(1:n) + m
    UU <- replace(U, U >= n, n)
    zL <- sz[LL]
    zU <- sz[UU]
    for (i in 1:n){
      delta_nm[i] <- (sum(sz<=zU[i]) + sum(-sz<=zU[i])
                      - sum(sz<=zL[i]) - sum(-sz<=zL[i]))/(2*n)}
    delta_nm <- replace(delta_nm, delta_nm==0, 1/n)
    Vnm <- log(prod(m*(2*n-m-1)/(n^2*delta_nm)))
    return(Vnm)}
  Vnm <- sapply(m, Vnm_function)
  # V_n in (2.6)
  Vmc_n[n-3] <- min(Vnm)}
Vmc[mc] <- max(Vmc_n)}
# the critical value c_{alpha,N} of the proposed test, given N,
# the number of the MC simulations, and the significance level alpha
quantile(Vmc, 1-alpha)

# Fragment of the power/ASN simulation: the sequential stopping rule (2.7)
# applied with the critical value CV (Vnm_function is as defined above;
# the remainder of the else-branch is truncated in the source)
  Vnm <- sapply(m, Vnm_function)
  if(n <= N & min(Vnm) >= CV){
    N_stop[mc] <- n
    no_reject <- no_reject + 1
    break} else if(n == N & min(Vnm)

3. THE ADDITIONAL MC RESULTS

Table S1. The MC powers and ASNs of the proposed sequential DBEL (Seq_DBEL) test and the SSRT at the significance level α.
F_X            F_Y            N    Seq_DBEL        SSRT
                                   Power   ASN     Power   ASN
N(0,1)         Cauchy(1,1)    25   0.314   21      0.404   21
                              50   0.603   37      0.656   35
                              75   0.846   45      0.833   42
Exp(1)         LogN(0, )      50   0.704   37      0.502   40
                              75   0.951   42      0.701   51
Beta(0.7,1)    Exp(2)         25   0.059   24      0.054   25
                              50   0.087   48      0.062   49
                              75   0.215   70      0.073   72
N(0,1)         U(-1,2)        25   0.272   22      0.321   22
                              50   0.543   39      0.57    38
                              75   0.718   50      0.756   48
U(1,2)         LogN(0,1)      25   0.168   23      0.162   23
                              50   0.698   40      0.222   45
                              75   0.981   44      0.286   64
Gamma(2,4)     Gamma(2,5)     25   0.118   24      0.131   24
                              50   0.187   46      0.220   45
                              75   0.286   66      0.312   65
Gamma(2,3)     Gamma(3,2)     25   0.944   13      0.969   13
                              50   1       14      1       15
                              75   1       15      1       15
Exp(1)         Beta(0.7,1)    25   0.671   19      0.645   19
                              50   0.977   24      0.93    26
                              75   1       25      0.989   28
Exp(1)         U(-1,2)        25   0.249   23      0.268   23
                              50   0.565   39      0.497   40
                              75   0.853   47      0.688   51
LogN(-0.5,1)   Beta(0.7,1)    25   0.594   19      0.581   20
                              50   0.953   26      0.885   28
                              75   0.999   27      0.977   31
Chisq(1)       U(1,2)         25   0.605   19      0.617   18
                              50   0.986   24      0.877   26
                              75   1       25      0.968   29
Gamma(0.9,1)   Beta(0.9,1)    25   0.316   22      0.29    23
                              50   0.774   35      0.559   38
                              75   0.985   39      0.746   49
Beta(0.7,1)    U(-1,1)        25   0.698   18      0.691   18
                              50   0.975   23      0.952   24
                              75   0.999   24      0.994   26

Figure S1. The scatterplot of the pre-CHX treatment measurements vs. the post-CHX treatment measurements.

REFERENCES
1. Serfling RJ. Approximation theorems of mathematical statistics. Vol 162. John Wiley & Sons; 2009.
2. Vexler A, Gurevich G, Hutson AD. An exact density-based empirical likelihood ratio test for paired data.