A statistical method for estimating the no-observed-adverse-event-level
AA statistical method for estimating theno-observed-adverse-event-level
Ludwig A. Hothorn,Im Grund 12, D-31867 Lauenau, Germany (retired from Leibniz University Hannover)
January 5, 2021
Abstract
In toxicological risk assessment the benchmark dose (BMD) is recommended instead of the no-observed-adverse effect-level (NOAEL). Still a simple test procedure to estimate NOAEL is proposedhere, explaining its advantages and disadvantages. Versatile applicability is illustrated using four dif-ferent data examples of selected in vivo toxicity bioassays.
At first glance, it is surprising why BMDL [2] is not used as a standard approach today, mainly becauseof the disadvantages of the NOEAL approach [1]. NOAEL depends on the design, i.e. sample size n i ,number of doses, chosen dose intervals, kind of endpoints, available historical controls, GLP conditions.In biomedical research, there are few better standardized studies than the in vivo and in vitro assays inregulatory toxicology, especially with respect to the criteria of minimum sample sizes, recommendednumber of doses, and a-priori defined endpoints. Thus, in the scenario considered here, the NOAEL doesnot combine all the described disadvantages of significance testing in mostly unplanned observationalstudies. Given the many alternative uncertainties, it should be accepted that the NOAEL is ’only’ anexperimental, i.e. designed dose and not, like BMDL, an interpolated estimate in the hundredths rangewith a spurious accuracy. Especially since an uncertainty factor with a wide definition range is addedto NOAEL for the risk assessment. Practically, the BMD concept has at least four main difficulties: i)its model dependence (which can be reduced by model averaging [17], ii) the necessary choice of BMR,which remains an unresolved challenge for multiple, continuous end points with different scales, vari-ances, and relevance ranges, iii) the considerable width of the confidence interval for designs with onlya few doses ( e.g., k = 3 + 1 design), and iv) the sometimes non-textbook shape of dose-response de-pendencies (e.g., pronounced plateaus or even downward effects). e.g., the common k = 3 + 1 design),and iv) the sometimes non-textbook shape of dose-response dependencies (e.g., pronounced plateaus oreven downward effects at high dose(s)). Furthermore, dose selection in the design phase is more difficultto obtain a smooth log-logistic curve, required for the BMD concept- compared to a design with at leastone non-significant and one significant dose (required for the NOAEL concept).The main difference between NOEAL and BMD lies in the sparseness of assumptions: i) NOEAL as-sumes only significance of a point null hypothesis and monotonicity of the dose-response relationship for D i > N OEAL (more precisely the monotonicity of p-values < . ), ii) BMD assumes a perfect fitof a specific nonlinear model as well as interpolation to the dose scale based on a selected BMR value(inverse regression). 1 a r X i v : . [ s t a t . A P ] J a n OAEL should be in fact estimated as the maximum safe dose, i.e., by a proof-of-safety approach[4] as step-up non-inferiority tests. This approach is not currently used at all, mainly because it dependson the a priori definition of a threshold value that can still be interpreted, which is difficult practically.Therefore the simple alternative of a proof-of-hazard approach is used:
N OAEL = M ED − , whereminimum effective dose (MED) is the lowest significant dose, although all higher doses must also besignificant. The Dunnett method [6] is often used for this purpose.As basic law in toxicological risk assessment, the Paracelsus theorem can be regarded ’All things arepoison, and nothing is without poison. The dose alone makes a thing not poison’ . This requires non-significant effects both below and up to the NOAEL, and also significant effects beyond the NOAEL.Based on this model one can derive an order constraint assumption. The Williams procedure [7] is basedexactly on this assumption, but makes no statement about the individual doses (except D max ).Two approaches to NOAEL estimation are described below, which were derived as special cases of theclose testing procedure (CTP) [8] for comparisons to the control and monotonicity assumption. CTP is a general and flexible test principle for multiple testing, which is based on a decision tree of closedpartition hypotheses tested by intersection-union-tests (IUT). The definition of the elementary hypothesesof interest is essential. For NOAEL estimation exactly the individual comparisons against control are ofinterest only, for example H = µ − µ , H = µ − µ , H = µ − µ . Based on these H ξ , all in-tersection hypotheses are created up to the final global hypothesis in order to obtain a closed intersectionhypotheses system: partition hypotheses: H = µ = µ − µ , H = µ = µ − µ , H = µ = µ − µ , global hyothesis: H = µ = µ = µ − µ An elementary hypothesis of interest is rejected if it is rejected exactly, as are all partition hypothe-ses and the global hypothesis containing them, each with a test at the α level. Because any level- α -test can be used, the closure test is flexible. If one assumes one-sided tests with an order restriction µ ≤ µ ≤ ... ≤ µ k (for any pattern of equalities and inequalities, but at least µ < µ k ) due to amonotonic dose-response relationship, the following applies: if H = µ = µ = µ − µ is rejected, H = µ = µ − µ , H = µ = µ − µ , H = µ − µ must also be rejected, without testing it.This allows the use of simple pairwise tests for all hypotheses. Here, pairwise contrast tests are used [10]to exploit the full degree of freedom of the k-sampling problem.The adjusted p-value for the elementary hypothesis of interest results from the maximum of all p-valuesin its hypothesis tree (IUT): H : H ⊂ H ⊂ H ⇒ p = max ( p , p , p ) H : H ⊂ H ⊂ H ⇒ p = max ( p , p ) H : H ⊂ H ⇒ p = p From a toxicological point of view, it is interesting that the comparison D max − is performed withan unadjusted level α -test (and thus of maximum possible power). The D max − − , D max − − , ... comparisons with slightly higher, increasing conservatism (because on the IUT condition). This is anintuitive, simple and powerful method for NOAEL estimation. In particular, it can be generalized tomany situations that occur in toxicology: parametric and non-parametric tests, assuming homogeneousvariance or not, considering proportions, counts, poly-k-estimates, using a mixed effect model (e.g. totake within/between-litter dependencies into account), etc.. Related R-software is available, see examplesin the Appendix. 2 Demonstration the procedures for four selected in-vivo bioassay
The first example contains body weight changes for Fischer 344 rats treated with 0, 100, 200, 500, and750 mg/kg doses of aconiazide over a 14-day period [11]. The second example considers relative kidneyweight data in a feeding study with a crop-protecting compound [12]. The third example use severityscores of lesions in epithelium after exposure to formaldehyde in doses of 0, 2, 6, and 15 ppm in Fisher344 rats [13]. The fourth example considers effect of vinylcyclohexene diepoxide exposure to 0 mg/ml,(group 0), 25 mg/ml (group 1), 50 mg/ml (group 2), and 100 (group 3) mg/ml, on the incidence ofalveolar/bronchiolar tumors in a long term rodent carcinogenicity study on female B6C3F1 mice [14].Figure 1 shows three boxplots (for examples 1,2,3 from the left to the right, and the mosaicplot for thecrude tumor proportions in example 4, right panel). This random selection shows designs with only 3dose groups (and 4 dose groups in example 1).Figure 1: Raw data plots for the four data examples (from left to right examples 1, 2, 3, 4)Since normal distribution and homogeneous variances can be assumed for the endpoint body weightgain in example 1, the closure test with parametric multiple contrast is estimated. (R-Code see Appendix1). In the original paper of example 2 ratio-to-control tests under variance heterogeneity were used [12].Therefore a closure test is used by simultaneous ratio-to-control tests [15] (R-Code see Appendix 2). Theseverity score endpoint in example 3 is analysed by a nonparametric multiple contrast test for the relativeeffect size [16] (p-value in brackets) and alternatively by contrast test using the most likely transformationmodel for count data [18] (R-Code see Appendix 3). In long-term carcinogenicity bioassay, the complexrelationship between tumor development and mortality can be modeled with the poly-k-test [19] whereasrelated multiple contrast tests are available in the R package MCPAN [20]. This uses a generalized con-trast test for the difference of proportions, more precisely, weighted proportions. (R-Code see Appendix4). Table 1 shows for these 4 examples the adjusted p-values of the specific closure tests for the respec-tive control comparisons and the estimated NOAEL (marked in bold). In example 2, no NOAEL canbe estimated as there is no non-significant dose, contrary to the original work [12], so the control 0 ismarked in bold (an inappropriate dose selection in this assay). The nonparametric approach in example 3reveals the same decision as the most likely transformation approach, but with slightly different p-values (0 . . < . .Exa. 1 Exa. 2 Exa. 3 Exa. 4Comp. p adj Comp. p adj Comp. p adj Comp. p adj -0 0.058 -0 0.059 -0 0.11 2/0 < . < . < . < . - ... - ... - ...Table 1: Adjusted p-values and estimated NOAEL (bold) for the four examplesThese examples have been selected to demonstrate that this approach can be applied to numerous3cenarios, i.e. different endpoints, different effect sizes, different test principles, different assumptions.To assess the difference between without/with assumption of order restriction, the p-values for theDunnett procedure and the CTP are compared directly in Table 2. The same NOAEL is estimated in bothapproaches. The p-values differ slightly, but within the accuracy range of a multivariate t-distribution.CTP Dunnett procedureComp. p adj Comp. p adj -0 0.1108 . e − . e − . e − Table 2: Adjusted p-values for Dunnett procedure and CTP in example 1
An intuitive, simple and flexible closure test with pairwise contrast tests is proposed to estimate NOAEL.It is based only on the assumption of a monotonic dose-response relationship. Notice, all discussed dis-advantages of the NOAEL approach compared to the BMD approach remain. Nevertheless, it can be rec-ommended as a possibility for the usual routine evaluation of in vivo and in vitro bioassays in regulatorytoxicology, particularly for the common-used designs with a few doses or concentrations only. Exten-sions for multiple primary endpoints, e.g. multiple tumors, and the mixed model for litter-dependenciesin reprotoxicological bioassays will be described next.
Acknowledgement
Dr. Christian Ritz, University of Copenhagen I thank for valuable suggestions for a better visualization of the approach.
References [1] R. L. Kodell, Replace the noael and loael with the bmdl01 and bmdl10, Environmental and Ecological Statistics 16 (1) (2009)3–12. doi:10.1007/s10651-007-0075-3 .[2] S. M. Jensen, F. M. Kluxen, C. Ritz, A review of recent advances in benchmark dose methodology, Risk Analysis 39 (10)(2019) 2295–2315. doi:10.1111/risa.13324 .[3] L. A. Hothorn, D. Hauschke, Identifying the maximum safe dose: a multiple testing approach, Biopharmaceutical Statistics10 (2000) 15-30.[4] L. A. Hothorn, M. Hasler, Proof of hazard and proof of safety in toxicological studies using simultaneous confidence intervalsfor differences and ratios to control, J of Biopharmaceutical Statistics 18 (2008) 915-933.[5] A. C. Tamhane, C. W. Dunnett, J. W. Green, J. D. Wetherington, Multiple test procedures for identifying the maximum safedose, Journal of the American Statistical Association 96 (455) (2001) 835–843. doi:10.1198/016214501753208546 .[6] C. W. Dunnett, A multiple comparison procedure for comparing several treatments with a control, Journal of the AmericanStatistical Association 50 (272) (1955) 1096–1121.[7] D. Williams, A test for differences between treatment means when several dose levels are compared with a zero dose control,Biometrics 27 (1971) 103-117.[8] R. Marcus, E. Peritz, K. R. Gabriel, Closed testing procedures with special reference to ordered analysis of variance,Biometrika 63 (3) (1976) 655–660. doi:10.1093/biomet/63.3.655 .[9] L. A. Hothorn, The two-step approach-a significant ANOVA F-test before Dunnett’s comparisons against a control-is not rec-ommended, Communications in Statistics- Theory and Methods 45 (11) (2016) 3332–3343. doi:10.1080/03610926.2014.902225 .
10] L. A. Hothorn, Closed test procedures for the comparison of dose groups against a negative control group or placebo. arXiv2011.13758v1[11] R. W. West, R. L. Kodell, Changepoint alternatives to the noael, Journal of Agricultural Biological and Environmental Statis-tics 10 (2) (2005) 197–211. doi:10.1198/108571105X46525 .[12] A. Tamhane, B. Logan, Finding the maximum safe dose level for heteroscedastic data, Journal of Biopharmaceutical Statistics14 (2004) 843-856.[13] T. Yanagawa, Y. Kikuchi, K. G. Brown, No-observed-adverse-effect levels in severity data, Journal of the American StatisticalAssociation 92 (438) (1997) 449–454.[14] W.W. Piegorsch, A.J. Bailer (1997): Statistics for environmental biology and toxicology. Chapman and Hall, London. Table6.5, page 238.[15] L. A. Hothorn, G. D. Djira, A ratio-to-control Williams-type test for trend, Pharmaceutical Statistics 10 (4) (2011) 289–292. doi:10.1002/pst.464 .[16] F. Konietschke, L. A. Hothorn, Rank-based multiple test procedures and simultaneous confidence intervals, Electronic Journalof Statistics 6 (2012) 738–759. doi:10.1214/12-EJS691 .[17] C. Ritz, D. Gerhard, L. A. Hothorn, A unified framework for benchmark dose estimation applied to mixed models and modelaveraging, Statistics in Biopharm. Res. 5 (2013) 79–90.[18] S. Siegfried, T. Hothorn, Count transformation models, Methods in Ecology and Evolution 11 (7) (2020) 818–827. doi:10.1111/2041-210X.13383 .[19] G. S. Bieler, R. L. Williams, Ratio estimates, the delta method, and quantal response tests for increased carcinogenicity,Biometrics 49 (3) (1993) 793–801. doi:10.2307/2532200 .[20] F. Schaarschmidt, M. Sill, L. A. Hothorn, Approximate simultaneous confidence intervals for multiple contrasts of binomialproportions, Biometrical Journal 50 (5) (2008) 782–792. doi:10.1002/bimj.200710465 . Appendix
Example 1
Wes <-structure(list(Gain = c(5.7, 10.2, 13.9, 10.3, 1.3, 12, 14, 15.1,8.8, 12.7, 8.3, 12.3, 6.1, 10.1, 6.3, 12, 13, 13.4, 11.9, 9.9,9.5, 8.1, 7, 7.8, 9.3, 12.2, 6.7, 10.6, 6.6, 7, 2.9, 5.6, -3.5,9.5, 5.7, 4.9, 3.8, 5.6, 5.6, 4.2, -8.6, 0.1, -3.9, -4, -7.3,-2.2, -5.2, -1, -8.1, -4.8), dose = c(0L, 0L, 0L, 0L, 0L, 0L,0L, 0L, 0L, 0L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L,100L, 100L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L, 200L,200L, 500L, 500L, 500L, 500L, 500L, 500L, 500L, 500L, 500L, 500L,750L, 750L, 750L, 750L, 750L, 750L, 750L, 750L, 750L, 750L),Dose = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("0","100", "200", "500", "750"), class = "factor")), row.names = c(NA,-50L), class = "data.frame")ni<-aggregate(Gain ~ Dose, data = Wes, length)$Gain Example 2
Tamh <- structure(list(dose = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,3L),kidneywt = c(6.593, 7.48, 6.93, 5.662, 6.789, 7.268, 6.647,6.443, 6.713, 6.057, 6.253, 7.045, 6.552, 5.668, 6.354, 6.511,7.111, 6.015, 7.062, 7.347, 7.733, 7.396, 8.173, 6.938, 6.988,6.621, 7.508, 6.657, 7.787, 6.537, 7.369, 6.623, 6.456, 6.507,6.154, 5.934, 6.909, 7.252, 7.006, 8.706, 7.257, 7.743, 7.026,8.561, 7.674, 7.45, 8.188, 8.15, 7.619, 8.722, 7.387, 6.798,7.617, 8.071, 7.02, 7.821, 7.063, 9.569, 9.362, 10.911, 9.961,9.497, 9.911, 8.544, 10.404, 10.421, 10.065, 9.67, 8.194, 8.989,7.347, 7.26, 9.017, 8.847, 8.723),Dose = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L),.Label = c("0", "1", "2", "3"), class = "factor")),row.names = c(NA, -75L), class = "data.frame")library(mratios)RcmatDu<-contrMatRatio(ni, type="Dunnett")cm01numC<-rbind(RcmatDu$numC[1,],RcmatDu$numC[1,])cm01denC<-rbind(RcmatDu$denC[1,],RcmatDu$denC[1,])cm02numC<-rbind(RcmatDu$numC[2,],RcmatDu$numC[2,])cm02denC<-rbind(RcmatDu$denC[2,],RcmatDu$denC[2,])cm03numC<-rbind(RcmatDu$numC[3,],RcmatDu$numC[3,])cm03denC<-rbind(RcmatDu$denC[3,],RcmatDu$denC[3,])
Example 3
Epi <-structure(list(dose = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,0L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 6L, 6L, 6L, 6L,6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,6L, 6L, 6L, 6L, 6L, 6L, 6L, 15L, 15L, 15L, 15L, 15L),score = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, L, 0L, 0L, 0L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 0L, 0L, 0L, 0L,0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L,2L, 2L, 2L, 2L, 3L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 2L, 2L, 2L, 2L,2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,3L, 3L, 3L, 4L, 4L),Dose = structure(c(1L, 1L, 1L, 1L, 1L, 1L,1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), .Label = c("0", "2", "6", "15"), class = "factor")),row.names = c(NA, -86L), class = "data.frame")
Example 4 library(MCPAN)data(bronch)ni<-aggregate(Y ~ group, data = bronch, length)$Ycm03 <- rbind(CM03,CM03)cm02 <- rbind(CM02,CM02)cm01 <- rbind(CM01,CM01) library(MCPAN)data(bronch)ni<-aggregate(Y ~ group, data = bronch, length)$Ycm03 <- rbind(CM03,CM03)cm02 <- rbind(CM02,CM02)cm01 <- rbind(CM01,CM01)