Psychological Science | 2021

Functional MRI Can Be Highly Reliable, but It Depends on What You Measure: A Commentary on Elliott et al. (2020)

Abstract


In a recent article, Elliott and colleagues (2020) evaluated the reliability of individual differences in task-based functional MRI (fMRI) activity and found reliability to be poor. They concluded that “commonly used task-fMRI measures generally do not have the test-retest reliability necessary for biomarker discovery or brain–behavior mapping” (p. 801). This is an important and timely effort, and we applaud it for spotlighting the need to evaluate the measurement properties of fMRI. Large samples combined with pattern-recognition techniques have made translational applications finally seem within reach. As the field gets serious about using the brain to predict behavior and health outcomes, reliability will become increasingly important.

However, along with their findings and constructive criticism comes the potential for overgeneralization. Though Elliott et al. focused on arguably the most limited fMRI measure for biomarker development (the average response within individual brain regions), the article has garnered media attention that mischaracterizes its conclusions. One headline reads, “every brain activity study you’ve ever read is wrong” (Cohen, 2020). The causes of anti-fMRI sentiments are not our concern here, but it is important to specify the boundary conditions of Elliott et al.’s critique. As they suggest and we show below, fMRI can exhibit high test-retest reliability when multivariate measures are used. These measures, however, were not evaluated by Elliott et al., despite being commonly used for biomarker discovery (Woo, Chang, et al., 2017). Thus, their conclusions do not apply to all “common task-fMRI measures” but to a particular subset that does not represent the state of the art. Moreover, there are multiple use cases for fMRI biomarkers (FDA-NIH Biomarker Working Group, 2016), many of which do not require high test-retest reliability (cf. Elliott et al.; Fig. 1a).

Test-retest reliability estimates summarized by Elliott et al. reflect several limitations of the studies in their sample. These studies had (a) small sample sizes; (b) little data per participant (as little as 5 min); (c) single-task rather than composite-task measures, which can be more reliable (Gianaros et al., 2017; Kragel, Kano, et al., 2018); and (d) variable test-retest intervals, up to 140 days in the Human Connectome Project (HCP) data; in addition, they were limited to activity in individual brain regions.

Multivariate measures optimized using machine learning can have high test-retest reliability (Woo & Wager, 2016). Elliott et al. acknowledged this possibility but did not provide quantitative examples. Examining some benchmarks from recent studies reveals that the situation is not nearly so dire for task-fMRI as Elliott et al. concluded. For example, Gianaros et al. (2020) identified patterns predictive of risk for cardiovascular disease using an emotional picture-viewing task and the HCP Emotion task. The same-day test-retest reliability of these measures was good to excellent (Spearman-Brown rs = .82 and .73, Ns = 338 and 427, respectively; Fig. 1b). In contrast, test-retest reliabilities of individual regions (e.g., amygdala) were much lower (rs = .11–.27). In a second example, we assessed the same-day test-retest reliability of the neurologic pain signature, a neuromarker for evoked pain, in eight fMRI studies (N = 228; data from Geuter et al., 2020; Jepma et al., 2018). Reliability was good to excellent in all studies.
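The Spearman-Brown coefficients reported above adjust a split-half (two-run) correlation to estimate reliability at full test length. As a minimal sketch of that correction for two halves, r_SB = 2r / (1 + r) (the input correlation below is hypothetical, not a value from the cited studies):

```python
def spearman_brown(r_half: float) -> float:
    """Spearman-Brown prophecy formula for a test split into two halves.

    Given the correlation between the two half-test scores, returns the
    estimated reliability of the full-length measure.
    """
    return 2 * r_half / (1 + r_half)


# Illustrative only: a split-half correlation of .70 implies a
# full-length reliability of about .82.
print(round(spearman_brown(0.70), 2))  # → 0.82
```

Because the correction is monotonic and bounded by 1, even moderate split-half correlations can yield full-length reliabilities in the "good to excellent" range.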

Volume 32
Pages 622–626
DOI: 10.1177/0956797621989730
Language: English
Journal: Psychological Science
