Acta Paediatrica (Oslo, Norway : 1992) | 2019

Why results from Bayesian statistical analyses of clinical trials with a strong prior and small sample sizes may be misleading: The case of the NICHD Neonatal Research Network Late Hypothermia Trial

Abstract

We would like to thank Laptook et al. (1) for their response to our ‘Major concerns about late hypothermia study’ (2). However, their response suggests that the difference between their opinion and ours arises because we use frequentist statistics and they use Bayesian statistics. This is not the case. There is indeed general concern at present about the potential misuse of p-values in frequentist statistical practice. We agree that, in situations where only a limited number of observations are available, the usual frequentist requirement that the significance probability be lower than 0.05 is generally too strict. Results with a significance probability of 0.10 or even 0.15 may also give valuable information, and correspondingly a 95 per cent confidence interval (frequentist) or credibility interval (Bayesian) is sometimes too strict.
The heart of the matter is whether the observation that 19 of 78 neonates in group 1 (with cooling initiated in the time window from 6 to 24 hours after birth) showed adverse outcomes can be said to indicate that the associated probability p1 is smaller than the corresponding probability p0 in the control group, where 22 of 79 showed adverse outcomes. The original JAMA paper (3) discussed this in terms of the relative risk rr = p1/p0, and the question is whether there are any grounds to claim, with any meaningful confidence or credibility (to use the relevant frequentist and Bayesian terms), that rr is smaller than 1.
Our primary analysis was indeed frequentist, demonstrating that with sample sizes of 78 and 79 there can be no meaningful statistical difference between the probability estimates 19/78 = 0.244 and 22/79 = 0.278. The close proximity of these two estimates can be assessed in several ways, including a p-value far above the customary significance levels (p = 0.75) and a confidence curve with the value rr = 1 near the middle and a 95 per cent confidence interval of (0.51, 1.48) (see Fig. 1 in our previous communication (2)).
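The frequentist comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' computation: it assumes a Wald interval on the log relative-risk scale and a Yates-corrected chi-square test, and the function name `relative_risk_summary` is hypothetical. The confidence curve used in (2) may rest on a different method, so the figures should only be expected to land close to those quoted.

```python
import math
from statistics import NormalDist


def relative_risk_summary(events_1, n_1, events_0, n_0, level=0.95):
    """Relative risk, Wald interval on the log scale, and a
    Yates-corrected chi-square p-value for a 2x2 table.

    Assumption: these are standard textbook choices, not necessarily
    the methods used in the original letter.
    """
    p1 = events_1 / n_1
    p0 = events_0 / n_0
    rr = p1 / p0

    # Standard error of log(rr) for two independent binomial samples.
    se_log_rr = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_0 - 1 / n_0)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)

    # 2x2 table: rows are groups, columns are adverse / no adverse outcome.
    a, b = events_1, n_1 - events_1
    c, d = events_0, n_0 - events_0
    n = n_1 + n_0
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return {"p1": p1, "p0": p0, "rr": rr, "ci": (lower, upper), "p_value": p_value}


if __name__ == "__main__":
    # Observed trial data from the text: 19 of 78 cooled, 22 of 79 control.
    print(relative_risk_summary(19, 78, 22, 79))
```

With only 78 and 79 infants per group, the standard error of log(rr) is large relative to the observed difference, which is why the interval spans values well below and well above rr = 1.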
We have nothing against Bayesian analyses in general, and we agree that Laptook et al.’s ‘neutral’ unimodal prior centred at rr = 1 is sensible if no prior knowledge on late cooling is available. The detailed shape and especially the tails of this prior probability distribution are of course rather uncertain. As our figure clearly shows, Laptook et al.’s results give little support for the claim that p1 is smaller than p0 (i.e. that the relative risk parameter rr = p1/p0 above is smaller than 1).
We have also performed a sensitivity analysis on the results of the trial. We moved two adverse outcomes from the control group to the cooled group, so that the outcome was death or disability for 21 of 78 infants in the cooled group and 20 of 79 in the control group. These results would indicate a slightly better outcome for the control group, which is certainly possible if there is no real difference between the two groups. The figure displays Laptook et al.’s Bayesian prior probability distribution (in red) and two posterior probability distributions (in black) for the rr parameter. The solid black curve is the posterior using the observed data, while the dashed black curve is the posterior using the hypothetical outcomes from the sensitivity analysis. The 95 per cent credibility intervals for rr are (0.61, 1.40) and (0.68, 1.58), respectively, for the two posterior distributions. rr = 1.00 is
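The sensitivity analysis can be illustrated with a rough conjugate approximation. The sketch below treats log(rr) as normally distributed and combines a normal prior with a normal likelihood. Two points are assumptions, not taken from this text: the prior is taken to be centred at rr = 1 with 95 per cent of its mass between rr = 0.5 and rr = 2.0, and the normal approximation itself may differ from the computation behind the figure. The function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def log_rr_estimate(events_1, n_1, events_0, n_0):
    """Point estimate and standard error of log(rr) from a 2x2 table."""
    rr = (events_1 / n_1) / (events_0 / n_0)
    se = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_0 - 1 / n_0)
    return math.log(rr), se


def posterior_rr(events_1, n_1, events_0, n_0, prior_low=0.5, prior_high=2.0):
    """Normal-normal update on the log(rr) scale.

    Assumption: the prior's central 95% interval for rr is
    (prior_low, prior_high); these defaults are illustrative.
    """
    prior_mean = (math.log(prior_low) + math.log(prior_high)) / 2
    prior_sd = (math.log(prior_high) - math.log(prior_low)) / (2 * Z95)

    estimate, se = log_rr_estimate(events_1, n_1, events_0, n_0)

    # Precision-weighted average of prior mean and data estimate.
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / se ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    post_sd = math.sqrt(post_var)

    interval = (math.exp(post_mean - Z95 * post_sd),
                math.exp(post_mean + Z95 * post_sd))
    # Posterior probability that rr < 1, i.e. that log(rr) < 0.
    prob_rr_below_1 = NormalDist(post_mean, post_sd).cdf(0.0)
    return {"interval": interval, "prob_rr_below_1": prob_rr_below_1}


if __name__ == "__main__":
    print("observed:    ", posterior_rr(19, 78, 22, 79))
    print("hypothetical:", posterior_rr(21, 78, 20, 79))
```

Under these assumptions, reassigning just two outcomes moves the posterior from one side of rr = 1 to the other while the interval width barely changes, which is the point of the sensitivity analysis: the data are too sparse to pull the posterior far from the prior.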

Volume 108

Pages 1190 - 1191

DOI 10.1111/apa.14800

Language English

Journal Acta Paediatrica (Oslo, Norway : 1992)
