Annals of Internal Medicine | 2019
Correcting Misinterpretations of the E-Value
We thank Ioannidis and colleagues (1) for raising several important issues concerning the potential misuse of the E-value (2). These issues are not new; we have addressed them previously and provided guidance on how to avoid misuse (2–4).

Let us begin with where we think Ioannidis and colleagues are correct (1). First, biases other than confounding, such as measurement error, selection bias, and publication bias, can also undermine estimates of causal effects. We agree, we have noted this previously (2–4), and we have now developed similar techniques to address other biases (5, 6).

Second, the E-value is monotonically related to the effect estimate. This is true, although the relationship is not "almost linear" (1) until very large effect sizes and is especially not linear at modest effect sizes. Is it then worth reporting the E-value? The E-value transforms the effect estimate into a confounding scale concerning the strength of the confounding associations that could explain away the observed association. Most investigators cannot do the transformation in their head. Likewise, we often report both the CI and the P value; 1 of these can be derived from the other in many cases, but we report both, again because we cannot typically do these transformations in our head. Both pieces of information are useful (4).

Third, Ioannidis and colleagues point out that the E-value "assesses the minimum strength of association that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away a specific treatment–outcome association, conditional on the measured covariates" (2) and evaluates this magnitude if the 2 confounding associations are equal. This is true, and if 1 of these associations (exposure–confounder or confounder–outcome) is smaller than the E-value, the other would have to be considerably larger to compensate. We noted this in our articles, where we also provided means to analyze sensitivity when the exposure–confounder and confounder–outcome associations differ from each other (2–4).

Fourth, 2 identical E-values from different studies may have different interpretations. We have pointed this out in our reports (2–4). What constitutes a small or large E-value depends, among other things, on the nature of the outcome, treatment, and measured covariates.

Fifth, the E-value depends on the magnitude of the treatment comparison. This is true. For example, for continuous exposures, the choice of comparison (such as 1 year of age vs. 3 decades) will alter the E-value. However, this is reasonable, both because it is often more plausible that a larger exposure change has a causal effect and because, for a larger exposure change, the unmeasured confounder is likely to differ more between exposure groups. As a result, a larger E-value is needed to indicate genuine evidence of robustness.

Sixth, with multiple unmeasured confounders, a seemingly large E-value may not provide much evidence for causality. We have noted this elsewhere (4, 6).

Seventh, Ioannidis and colleagues state, "Some investigators may be biased to show that an effect is not causal or important and may use the E-value to conclude that confounding caused the effect." We have explicitly noted (2) an asymmetry with the E-value: It can sometimes provide evidence for robustness but cannot be used to demonstrate that no effect exists. Absence of evidence is not evidence of absence.

Let us now turn to where we think Ioannidis and colleagues are incorrect.
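To make the transformation mentioned above concrete, here is a minimal sketch (ours, not the authors' published software) of the standard E-value formula for a risk ratio, E = RR + sqrt(RR × (RR − 1)) for RR > 1, with protective estimates inverted first; the function name and the numbers in the example are hypothetical illustrations, not values from the cited articles.

```python
import math

def e_value(rr, lo=None, hi=None):
    """E-value for an observed risk ratio: E = RR + sqrt(RR * (RR - 1))
    for RR > 1; risk ratios below 1 are inverted before applying the formula."""
    def transform(r):
        r = 1.0 / r if r < 1.0 else r  # protective estimate: invert to RR > 1
        return r + math.sqrt(r * (r - 1.0))
    est = transform(rr)
    # E-value for the CI limit closer to the null; 1.0 if the CI crosses the null.
    limit = None
    if lo is not None and hi is not None:
        closer = hi if rr < 1.0 else lo
        limit = 1.0 if (lo <= 1.0 <= hi) else transform(closer)
    return est, limit

# Hypothetical observed RR = 3.9 (95% CI, 1.8 to 8.7):
# E-value ~= 7.26 for the estimate and 3.0 for the confidence limit.
print(e_value(3.9, 1.8, 8.7))
```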
First, they state, "E-values are operating on a scale of values with which users [of the biomedical literature] are unfamiliar." This is only partially correct. The scale being used is the risk ratio scale; what is being considered is the risk ratio for the outcome comparing 2 levels of the unmeasured confounder and the risk ratio for the unmeasured confounder comparing 2 levels of treatment. For a binary unmeasured confounder, these are nothing more than the risk ratios we are accustomed to and have prior insight on from the existing biomedical literature. We have acknowledged (3, 4) that the interpretation is trickier when multiple or continuous unmeasured confounders are being considered.

Second, they claim that the E-value has validity problems. Our derivation of the E-value included mathematical proofs (3). We are aware of no error in the derivation.

Third, Ioannidis and colleagues contend that the E-value has the potential to do more harm than good. That has not been our experience, nor that of the many who have used it already and have found it to be helpful and insightful. Just about any tool can be used for harm or good, and its ultimate effects depend on good practice. We strongly believe that the introduction of the E-value will do considerably more good than harm (6).

We thus believe that none of the considerations above preclude the usefulness of the E-value metric. They only point to the need for careful interpretation, which we have repeatedly emphasized (2–4). One must be careful, but isn't care precisely what Ioannidis and colleagues are calling for?

On a personal note, since teaching his first course on causal inference 12 years ago, the first author has always strongly encouraged sensitivity analysis, including many of the techniques discussed by Ioannidis and colleagues (1). In this teaching, sensitivity analysis was emphasized and homework assignments were required. Much to his dismay, students rarely used sensitivity analyses in their subsequent research. When asked why not, they would often respond that the analyses were too complicated to describe in reports, were too difficult to present, and took up too much space, and that reviewers and editors were often unsympathetic and believed that they could not be understood. The E-value was introduced to respond to these objections: to make sensitivity analysis more common in actual practice. To that end, we have also provided easy-to-use software and an online E-value calculator (7). Given the relatively rapid uptake, our efforts to address these concerns seem to have had some success.

Moreover, reporting both the E-value and the P value is surely preferable to reporting the P value alone, and arguably also more helpful than simply altering or redefining the already arbitrary threshold for P value significance of 0.05 (8). We are entirely in favor of carrying out a more extensive sensitivity analysis whenever possible. However, despite considerable efforts to encourage this, we have found it to be infrequently done in practice. The E-value offers an easy-to-use alternative, albeit with its own limitations. Strive for perfection by all means, but let us not make the perfect the enemy of the good.
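As a further illustration of the unequal-associations point above, and of what such calculators compute, the following sketch uses the bounding factor RR_EU × RR_UD / (RR_EU + RR_UD − 1) derived in the E-value articles; the helper names and numbers are hypothetical, and this is a sketch of the idea rather than the published software.

```python
import math

def bounding_factor(rr_eu, rr_ud):
    """Maximum factor by which a single unmeasured confounder with
    exposure-confounder risk ratio rr_eu and confounder-outcome risk
    ratio rr_ud could inflate an observed risk ratio."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1.0)

def partner_needed(rr_obs, rr_eu):
    """Smallest rr_ud that, paired with rr_eu, could fully explain away
    rr_obs; None if no finite value suffices (requires rr_eu > rr_obs)."""
    if rr_eu <= rr_obs:
        return None
    return rr_obs * (rr_eu - 1.0) / (rr_eu - rr_obs)

rr_obs = 2.0  # hypothetical observed risk ratio
e = rr_obs + math.sqrt(rr_obs * (rr_obs - 1.0))
print(round(e, 2))                        # 3.41: E-value when both associations are equal
print(partner_needed(rr_obs, rr_eu=2.5))  # 6.0: an association below the E-value
                                          # forces a considerably larger partner
print(bounding_factor(2.5, 6.0))          # 2.0: this pair just explains away rr_obs
```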