A shared neural encoding model for the prediction of subject-specific fMRI response
Meenakshi Khosla, Gia H. Ngo, Keith Jamison, Amy Kuceyeski, Mert R. Sabuncu
1. School of Electrical & Computer Engineering, Cornell University
2. Radiology, Weill Cornell Medical College
3. Brain and Mind Research Institute, Weill Cornell Medical College
4. Nancy E. & Peter C. Meinig School of Biomedical Engineering, Cornell University
Abstract.
The increasing popularity of naturalistic paradigms in fMRI (such as movie watching) demands novel strategies for multi-subject data analysis, such as the use of neural encoding models. In the present study, we propose a shared convolutional neural encoding method that accounts for individual-level differences. Our method leverages multi-subject data to improve the prediction of subject-specific responses evoked by visual or auditory stimuli. We showcase our approach on high-resolution 7T fMRI data from the Human Connectome Project movie-watching protocol and demonstrate significant improvement over single-subject encoding models. We further demonstrate the ability of the shared encoding model to successfully capture meaningful individual differences in response to traditional task-based facial and scene stimuli. Taken together, our findings suggest that inter-subject knowledge transfer can be beneficial to subject-specific predictive models.

Introduction

Naturalistic imaging paradigms, such as movies and stories, emulate the diversity and complexity of real-life sensory experiences, thereby opening a novel window into the brain. The last decade has seen an increased foothold of naturalistic paradigms in cognitive neuroimaging, fueled by the remarkable discovery of inter-subject synchrony during naturalistic viewing [1]. Naturalistic stimuli also demonstrate increased test-retest reliability and more active subject engagement in comparison to alternate paradigms such as resting-state fMRI [2]. Furthermore, experiments have shown that naturalistic stimuli can induce stronger neural responses than task-based stimuli [3], suggesting that the brain is intrinsically more attuned to the former.
Taken together, these benefits suggest an exciting future for naturalistic stimulation protocols in fMRI. With the large-scale compilation of multi-subject neural data through open-source initiatives such as the Human Connectome Project (HCP) [4], the development of approaches that can handle this enormous data is becoming imperative. (Our code is available at https://github.com/mk2299/SharedEncoding_MICCAI.) Two approaches, namely inter-subject correlation (ISC) analysis [1, 5] and the shared response model (SRM) [6], have dominated the analysis of multi-subject fMRI data under naturalistic conditions. The former exploits similarity in activation patterns across subjects to isolate stimulus-induced processing. The latter, SRM, decomposes neural activity into a shared response component and subject-specific spatial bases, and has been used for inter-subject knowledge transfer through functional alignment. While simple and efficient, both approaches rely on a common time-locked stimulus across subjects and cannot, by design, model responses to completely unseen stimuli. On the other hand, predictive modelling of neural activity through encoding models is based upon generalization to arbitrary stimuli and can thus offer more holistic descriptions of sensory processing in an individual [7]. Neural encoding models map stimuli to fine-grained voxel-level response patterns via complex feature transformations. Previously, neural encoding models have yielded several novel insights into the functional organization of auditory and visual cortices [8–11]. Encoding models encapsulating different hypotheses about neural information processing can be pitted against each other to shed new light on how information is represented in the brain. In this manner, neural encoding models have been largely used for making group-level inferences.
The potential to extract meaningful individual differences from naturalistic paradigms remains largely untapped. Understanding inter-subject variability in behavior-to-brain representations is of key interest to neuroscience and can potentially even help identify atypical response patterns [12]. Modelling individual brain function in response to naturalistic stimuli is one step in this direction; however, building accurate individual-level models of brain function often requires large amounts of data per subject for good generalization. The problem is further exacerbated by the variability in anatomy and functional topographies across individuals, making inter-subject knowledge transfer difficult. There is limited work on leveraging multi-subject data for more robust and accurate individualized neural encoding. To our knowledge, this problem has been studied only in the context of natural vision, with a handful of subjects, using a Bayesian framework [13]. Further, the method proposed in [13] transfers knowledge from one subject's encoding model into another through a two-stage procedure and does not allow simultaneous optimization of encoding models across multiple subjects. In this paper, we attempt to fill this gap; to this effect, we propose a deep-learning based framework to build more powerful individual-level encoding models by leveraging multi-subject data. Recent studies have revealed that coarse-grained response topographies are highly similar across subjects, suggesting that individual idiosyncrasies manifest in more fine-grained response patterns [6, 14]. This hints at the idea that encoding models could share representational spaces across subjects to overcome the challenges imposed by a limited quantity of per-subject data. We exploit this intuition to develop a neural encoding model with a common backbone architecture for capturing shared responses and subject-specific projections that account for individual response biases, as demonstrated in Figure 1.
Our proposed approach has several merits: (i) It allows us to combine data from multiple subjects watching the same or different movies to build a global model of the brain. At the same time, it can capture meaningful individual-level deviations from the global model, which can potentially be related to individual-specific traits. (ii) It is amenable to incremental learning with diverse, varying stimuli across seen or novel subjects, with fewer constraints on data collection from single subjects. (iii) It poses minimal memory overhead with additional subjects and can thus handle fMRI datasets with a large number of subjects.

Fig. 1. Proposed approach: Feature pyramid networks are used to extract hierarchical features from pre-trained image/sound recognition networks. Dense features are reshaped into coarse 3D feature maps, which are mapped into increasingly fine-grained maps using convolutions. Coarse feature transformation layers are shared across subjects while deeper convolutional layers close to the predicted response are subject-specific.
Methods

Our proposed methodology is illustrated in Figure 1. Neural encoding models comprise two components: (a) a feature extractor, which pulls out relevant features from raw images or audio waveforms, and (b) a response model, which maps these stimulus features into brain responses. In contrast to existing works that employ a linear response model [9, 11], we propose a CNN-based response model where the coarse 3D feature maps are shared across subjects and fine-grained feature maps are individual-specific. Previous studies have reported a cortical processing hierarchy where low-level features from early layers of a CNN-based feature extractor best predict responses in early sensory areas, while semantically-rich deeper layers best predict higher sensory regions [8, 9]. To account for this effect, we employ a hierarchical feature extractor based on feature pyramid networks [15] that combines features from early, intermediate and later layers simultaneously. The output of the feature extractor is fed into the convolutional response model to predict the evoked fMRI activation. This enables us to train both components of the network simultaneously in an end-to-end fashion. Formally, let D = {(X_i, Y_i)}_{i=1}^N denote the training data pairs for N subjects, where X_i denotes the stimuli presented to subject i and Y_i denotes the corresponding fMRI measurements. We represent X_i as RGB images or grayscale spectrograms for the visual and auditory models, respectively. The feature model maps the 2D input into a vector representation s and is parameterized using a deep neural network F(X_i; φ) that is common across subjects.
In our experiments, this model is a feature pyramid network built upon pre-trained recognition networks, as DNNs optimized for image or sound recognition tasks have proven to provide powerful feature representations for encoding brain responses. We define a differentiable function G(s; θ) that maps the features into a shared latent volumetric space z, whose first 3 axes represent the 3D voxel space and whose last axis captures the latent dimensionality. The predicted response for each subject is then defined using subject-specific differentiable functions H_i(z; ψ_i) that project the coarse feature maps z into an individualized brain response. We represent G and the H_i's using convolutional neural networks to have a sufficiently expressive model. Thus, θ and {ψ_i} represent a mix of convolutional kernels and dense weight matrices. The number of shared parameters, |θ| + |φ|, is kept much greater than the cardinality of the subject-specific parameters |ψ_i| to accurately estimate the shared latent space. All parameters {φ, θ, ψ_i} are trained jointly to minimize the mean squared error between the predicted and true response. The proposed method allows us to propagate errors through the shared network even if the subjects are not exposed to common stimuli, since we can always backpropagate errors for subjects independently within each batch. Furthermore, using individualized layers to account for subject-specific biases enables the model to weigh gradients coming from the losses of each subject differently according to their signal-to-noise ratio. This makes the model less susceptible to noisy measurements when responses for the same stimuli are available from multiple subjects. We employ pre-trained Resnet-50 [16] and VGG-ish [17] architectures in the bottom-up path of Figure 1 to extract multi-scale features from images and audio spectrograms, respectively.
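To make the decomposition concrete, the shared trunk G and subject-specific heads H_i can be sketched as follows in PyTorch. All layer sizes, channel counts and the feature dimension below are illustrative placeholders, not the paper's exact configuration; only the overall structure (a shared dense layer plus transposed convolutions, followed by small per-subject heads) follows the description above.

```python
import torch
import torch.nn as nn

class SharedEncodingModel(nn.Module):
    """Sketch of the shared-response architecture: a common trunk G maps
    stimulus features s to a coarse latent volume z, and a small
    subject-specific head H_i projects z to each subject's response.
    All sizes are illustrative, not the paper's exact dimensions."""

    def __init__(self, n_subjects, feat_dim=1024, latent_ch=32):
        super().__init__()
        self.latent_ch = latent_ch
        # Shared G: dense layer -> reshape -> two stride-2 transposed convs,
        # each halving the channel count.
        self.fc = nn.Linear(feat_dim, latent_ch * 4 * 4 * 4)
        self.shared = nn.Sequential(
            nn.ConvTranspose3d(latent_ch, latent_ch // 2, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose3d(latent_ch // 2, latent_ch // 4, 4, stride=2, padding=1),
            nn.ReLU(),
        )
        # Subject-specific H_i: a short cascade of two transposed convs with
        # far fewer parameters than the shared trunk.
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.ConvTranspose3d(latent_ch // 4, 8, 4, stride=2, padding=1),
                nn.ReLU(),
                nn.ConvTranspose3d(8, 1, 4, stride=2, padding=1),
            )
            for _ in range(n_subjects)
        ])

    def forward(self, s, subject):
        z = self.fc(s).view(s.shape[0], self.latent_ch, 4, 4, 4)
        z = self.shared(z)             # shared latent volume
        return self.heads[subject](z)  # individualized predicted response

model = SharedEncodingModel(n_subjects=3)
s = torch.randn(2, 1024)               # stand-in stimulus features
y_hat = model(s, subject=0)
loss = nn.functional.mse_loss(y_hat, torch.zeros_like(y_hat))  # MSE objective
print(tuple(y_hat.shape))              # (2, 1, 64, 64, 64)
```

Training would loop over batches drawn from all subjects, back-propagating each subject's MSE loss through their own head and the shared trunk jointly.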
The base architectures were selected because pre-trained weights of these networks, optimized for classification on large datasets, namely Imagenet [18] and Youtube-8M [19], were publicly available. For Resnet-50, we use the activations of the last residual block of each stage, namely res2, res3, res4 and res5 (notation from [20]), to construct our stimulus descriptions s. From the VGG network, we use the activations of each convolutional block, namely conv2, conv3, conv4, and the penultimate dense layer fc2 [21]. The first three sets of activations are refined through a top-down path to enhance their semantic content, while the last activation is concatenated into s directly (res4 activations are vectorized using a global average pool). The top-down path comprises three feature maps at different resolutions, with an up-sampling factor of 2 applied successively from the deepest layer of the bottom-up path. Each such feature map, comprising 256 channels, is merged with the corresponding feature map in the bottom-up path (reduced to 256 channels by 1x1 convolutions) by element-wise addition. Subsequently, the feature map at each resolution is collapsed into a 256-dimensional feature vector through a global average pool operation and concatenated into s. The aggregated features are then passed onto a shared CNN (denoted G above) comprising the following feedforward computation: a fully connected layer to map the features into a vector space, which is reshaped into a 1024-channel cuboid, followed by two transposed convolutions (conv.T) with a stride of 2 to up-sample the latter and obtain z. Each convolution reduces the channel count by half, thereby resulting in a shared latent z that is a 256-channel cuboid. Subject-specific functions H_i's are parameterized as a cascade of two conv.T operations (stride 2) with output dimensions 128 and 1, respectively.
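The top-down merging step can be sketched as follows. The backbone activations are random stand-ins and the spatial sizes are illustrative, but the operations (1x1 lateral convolutions, 2x up-sampling, element-wise addition, global average pooling, and concatenation into s) follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fpn_aggregate(c3, c4, c5, laterals):
    """Top-down pathway sketch: reduce each backbone map to 256 channels
    with a 1x1 conv, merge with the 2x-upsampled deeper map by element-wise
    addition, global-average-pool each level, and concatenate into s."""
    p5 = laterals[2](c5)
    p4 = laterals[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
    p3 = laterals[0](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
    pooled = [p.mean(dim=(2, 3)) for p in (p3, p4, p5)]  # global average pool
    return torch.cat(pooled, dim=1)

# Random stand-ins with ResNet-50-like channel counts and spatial sizes.
laterals = nn.ModuleList([nn.Conv2d(c, 256, kernel_size=1) for c in (512, 1024, 2048)])
c3 = torch.randn(1, 512, 28, 28)
c4 = torch.randn(1, 1024, 14, 14)
c5 = torch.randn(1, 2048, 7, 7)
s = fpn_aggregate(c3, c4, c5, laterals)
print(tuple(s.shape))   # (1, 768) -- three 256-dim pooled vectors
```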
It is important to emphasize that these operations comprise far fewer parameters, thereby favoring the estimation of a shared truth. As we demonstrate empirically, a shared space allows much better generalization. At the same time, we find that even the limited subject-specific parameters can adequately capture meaningful individual differences. All parameters were optimized using Adam [22] with a learning rate of 1e-4. Auditory and visual models were trained for 25 and 50 epochs, respectively, with unit batch size. Validation curves were monitored to ensure convergence.

Experiments

We study 7T fMRI data (TR = 1s) from a randomly selected sample of N=10 subjects from the HCP movie-watching protocol [4, 23]. The dataset comprises 4 audiovisual movies, each ∼
15 mins long. Preprocessing protocols are described in detail in [23, 24]. For our experiments, we utilize the 1.6mm MNI-registered volumetric images of size
113 x 136 x 113 per TR. We compute log-mel spectrograms, using the same parameters as [17], over every 1 second of the audio waveform to obtain a 2D image-like input for the VGG audio feature extractor. We extract the last frame of every second of the video to present to the image recognition network for visual features. We estimate a hemodynamic delay of 4 sec using regression-based encoding models, as the response latency that yields the highest encoding performance. Thus, all proposed and baseline models are trained to use the above stimuli to predict the fMRI response 4 seconds after the corresponding stimulus presentation. We train and validate our models on three movies using a 9:1 train-val split and leave the fourth movie for independent testing. This yields 2000 training, 265 validation and 699 test stimulus-response pairs per subject.

We compare against the following baselines:

– Linear response model (individual subject): Here, we train independent models for each subject using linear response models. We note that, thus far, this is the dominant approach to neural encoding. To enable a fair comparison, we extract hierarchical features of the same dimensionality as the proposed model to present to the linear regressor. The only difference here is the lack of a top-down pathway (since it is not pre-trained), which prevents the refinement of coarse feature maps before aggregation. We apply l2 regularization on the regression coefficients and adjust the optimal strength of this penalty through cross-validation over log-spaced values. We report the performance of the best model as 'Individual (Linear)'.

– CNN response model (individual subject): Here, we employ the same architecture as the proposed model but with only one branch of subject-specific layers. We train this network independently for each subject without weight sharing and denote its performance as 'Individual model (CNN)'.
– Shared model (mean): Here, we employ the proposed model after training, but instead of computing predictions using the same subject's learned weights, we compute N predictions from all subject-specific branches. We compute the mean performance obtained by correlating each of these predictions with the ground truth response of a subject and denote this as 'Shared (mean)'.

We measure performance on the test movie by computing the
Pearson's correlation coefficient between the predicted and measured fMRI response at each voxel. Since different subjects have different signal-to-noise ratios, we normalize each voxel's correlation by the subject's noise ceiling for that voxel. We compute the subject-specific noise ceiling by correlating their repeated measurements on a validation clip. Further, since we are only interested in the stimulus-driven response, we measure performance in voxels that exhibit high inter-subject correlations. We randomly split the 10 subjects into groups of 5 and correlate the mean activity of the two groups. We repeat this process 5 times, and voxels that exhibit a mean correlation greater than 0.1 are identified as synchronous voxels. We compute the mean normalized correlation across all synchronous voxels to obtain a single metric per subject, denoted 'Prediction accuracy'. We also correlate the predicted response of each subject against the predicted and true response of every other subject to obtain an N × N correlation matrix for shared models. To account for higher variability in measured versus predicted responses, we normalize the rows and columns of this correlation matrix following [25].

To investigate if the proposed model is indeed capturing meaningful individual differences, we use the trained encoding model to predict fMRI activations for distinct visual object categories from the HCP task battery. Specifically, we predict brain responses to visual stimuli (comprising faces, places, tools and body parts) from the HCP Working Memory (WM) task and use the predicted responses to synthesize face and scene contrasts (FACES-AVG and PLACES-AVG, respectively) for each individual. The predicted and true contrasts are thresholded to keep the top fraction of voxels. We compute the Dice overlap between the predicted contrast for each subject and the true contrast of every subject (including self) to produce an N × N matrix for each contrast.
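A minimal NumPy sketch of these evaluation metrics: voxel-wise Pearson correlation, noise-ceiling-normalized accuracy over synchronous voxels, and the N × N Dice matrix between thresholded contrasts. Array shapes and the threshold fraction are illustrative.

```python
import numpy as np

def voxelwise_corr(pred, meas):
    """Pearson r between predicted and measured response at each voxel.
    pred, meas: (n_timepoints, n_voxels) arrays."""
    p = pred - pred.mean(axis=0)
    m = meas - meas.mean(axis=0)
    denom = np.linalg.norm(p, axis=0) * np.linalg.norm(m, axis=0) + 1e-12
    return (p * m).sum(axis=0) / denom

def prediction_accuracy(pred, meas, noise_ceiling, sync_mask):
    """Mean noise-ceiling-normalized correlation over synchronous voxels."""
    r = voxelwise_corr(pred, meas) / (noise_ceiling + 1e-12)
    return float(r[sync_mask].mean())

def dice_matrix(pred_maps, true_maps, top_frac=0.1):
    """N x N Dice overlap between thresholded contrast maps: rows index the
    subject whose predicted map is used, columns the subject whose true map
    is used. The threshold fraction here is illustrative."""
    n, v = pred_maps.shape
    k = int(top_frac * v)
    pb = np.zeros((n, v), dtype=bool)
    tb = np.zeros((n, v), dtype=bool)
    for i in range(n):
        pb[i, np.argsort(pred_maps[i])[-k:]] = True
        tb[i, np.argsort(true_maps[i])[-k:]] = True
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            inter = (pb[i] & tb[j]).sum()
            D[i, j] = 2.0 * inter / (pb[i].sum() + tb[j].sum())
    return D

rng = np.random.default_rng(0)
ts = rng.standard_normal((50, 200))    # toy (time, voxel) responses
maps = rng.standard_normal((3, 200))   # toy per-subject contrast maps
print(round(float(voxelwise_corr(ts, ts).mean()), 3))     # 1.0 (self-correlation)
print(np.round(np.diag(dice_matrix(maps, maps)), 2))      # [1. 1. 1.]
```

A self-predicted response gives a correlation of 1 and a Dice of 1 on the diagonal, which is the diagonal-dominance signature the paper reads off the N × N matrices.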
Results

Figure 2 shows the prediction accuracy of the proposed ('Shared') and baseline methods for each subject. The performance improvement between the proposed and individual subject models is striking, suggesting that a shared backbone architecture can significantly boost generalization. Comparative boxplots further show that the proposed method predicts a much higher percentage of the synchronous cortex than individual subject models. Further, the difference between 'Shared' and 'Shared (mean)', as well as the dominant diagonal structure in the correlation matrices, suggests that the proposed method is indeed capturing subject idiosyncrasies rather than predicting a group-averaged response. Moreover, while the CNN response model performs slightly better in visual encoding, it incurs a performance drop compared to linear regression in auditory encoding. This perhaps suggests that the boost in accuracy seen for shared models is largely due to inter-subject knowledge transfer rather than the convolutional response model itself.

Fig. 2. Quantitative evaluation: Bar charts illustrate subject-wise prediction accuracy of all models; box plots depict the distribution over subjects for the % of synchronous voxels significantly predicted (p<0.05, FDR corrected). N × N correlation matrices depict the (normalized) correlation coefficient between predicted and measured responses.

In Figures 3(A) & 3(B), we visualize the un-normalized correlations between the predicted and measured fMRI response for the proposed models, averaged across subjects. For the auditory model, we see significant correlations in the parabelt auditory cortex, extending into the superior temporal sulcus and some other language areas (55b) as well. For the visual model, while we see significant correlations across the entire visual cortex (V1-V8), the performance is much better in higher-order visual regions, presumably because of the semantically rich features.
The lower performance in early visual regions could also result from the dynamic nature of visual stimulation in movies. Figures 3(C) & 3(D) illustrate the ability of our proposed model to characterize individual differences even beyond the experimental paradigm it was trained on. The diagonal dominance in the Dice matrix for both contrasts suggests that predicted contrasts are most similar to the same subject's true contrast. No prominent diagonal structure was observed for individual subject models, presumably because of their poor generalization to out-of-domain stimuli from the HCP task battery. Further, predicted contrasts consistently highlight known areas for face and scene processing, namely the fusiform face area [26] and parahippocampal areas [27], respectively.
Fig. 3. (A), (B) Correlations between the predicted response of the proposed model and the true time series of each voxel, averaged across subjects. Only significantly predicted voxels are shown (p<0.05, FDR corrected). Dice matrices of predicted versus true contrasts for (C) faces and (D) scenes stimuli. (E) & (F) depict contrasts of two randomly selected subjects. ROIs are labelled from the HCP MMP parcellation [28].
Conclusion

In this paper, we presented a framework for utilizing multi-subject fMRI data to improve individual-level neural encoding. We showcased our approach on both auditory and visual stimuli and demonstrated consistent improvement over competing approaches. Our experiments further suggest that a single experiment (free viewing of movies) can characterize a multitude of brain processes at once. This has important implications for brain mapping, which traditionally relies on a battery of carefully-constructed stimuli administered within block designs. Inter-subject variability in response patterns induced by the complexity of naturalistic viewing can facilitate the development of novel imaging-based biomarkers. Neural encoding models are not constrained to modeling the response to a limited set of experimental stimuli; their good generalization performance suggests that they can capture broad theories of cognitive processing. Accurate, individualized neural encoding models can thus bring us one step closer to achieving the goal of biomarker discovery.
Acknowledgements
This work was supported by NIH grants R01LM012719 (MS), R01AG053949 (MS), R21NS10463401 (AK), R01NS10264601A1 (AK), the NSF NeuroNex grant 1707312 (MS), the NSF CAREER 1748377 grant (MS), and an Anna-Maria and Stephen Kellen Foundation Junior Faculty Fellowship (AK).
References
1. U. Hasson, Y. Nir, I. Levy, G. Fuhrmann, and R. Malach. Intersubject synchronization of cortical activity during natural vision. Science, 303(5664):1634–1640, Mar 2004.
2. S. Sonkusare, M. Breakspear, and C. Guo. Naturalistic Stimuli in Neuroscience: Critically Acclaimed. Trends Cogn. Sci. (Regul. Ed.), 23(8):699–714, Aug 2019.
3. J. Schultz and K. S. Pilz. Natural facial motion enhances cortical responses to faces. Exp Brain Res, 194(3):465–475, Apr 2009.
4. M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, D. C. Van Essen, and M. Jenkinson. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage, 80:105–124, Oct 2013.
5. U. Hasson, R. Malach, and D. J. Heeger. Reliability of cortical activity during natural stimulation. Trends Cogn. Sci. (Regul. Ed.), 14(1):40–48, Jan 2010.
6. Po-Hsuan Cameron Chen, Janice Chen, Yaara Yeshurun, Uri Hasson, James V. Haxby, and Peter J. Ramadge. A reduced-dimension fMRI shared response model. In NIPS, 2015.
7. G. Varoquaux and R. A. Poldrack. Predictive models avoid excessive reductionism in cognitive neuroimaging. Curr. Opin. Neurobiol., 55:1–6, Apr 2019.
8. Alexander J. E. Kell, Daniel L. K. Yamins, Erica N. Shook, Sam V. Norman-Haignere, and Josh H. McDermott. A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy. Neuron, 98(3):630–644.e16, May 2018.
9. U. Guclu and M. A. van Gerven. Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream. J. Neurosci., 35(27):10005–10014, Jul 2015.
10. D. L. Yamins, H. Hong, C. F. Cadieu, E. A. Solomon, D. Seibert, and J. J. DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl. Acad. Sci. U.S.A., 111(23):8619–8624, Jun 2014.
11. Haiguang Wen, Junxing Shi, Yizhen Zhang, Kun Han Lu, Jiayue Cao, and Zhongming Liu. Neural encoding and decoding with deep learning for dynamic natural vision. Cerebral Cortex, 28(12):4136–4160, Dec 2018.
12. J. Dubois and R. Adolphs. Building a Science of Individual Differences from fMRI. Trends Cogn. Sci. (Regul. Ed.), 20(6):425–443, Jun 2016.
13. Haiguang Wen, Junxing Shi, Wei Chen, and Zhongming Liu. Transferring and generalizing deep-learning-based neural encoding models across subjects. NeuroImage, 176:152–163, Aug 2018.
14. Umut Güçlü and Marcel A. J. van Gerven. Increasingly complex representations of natural movies across the dorsal stream are shared between subjects. NeuroImage, 145:329–336, Jan 2017.
15. Tsung-Yi Lin, Piotr Dollár, Ross B. Girshick, Kaiming He, Bharath Hariharan, and Serge J. Belongie. Feature pyramid networks for object detection. Pages 936–944, 2016.
16. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. Pages 770–778, 2015.
17. Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, and Kevin W. Wilson. CNN architectures for large-scale audio classification. Pages 131–135, 2016.
18. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. Imagenet: A large-scale hierarchical image database. Pages 248–255, 2009.
19. Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Apostol Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. Youtube-8m: A large-scale video classification benchmark. ArXiv, abs/1609.08675, 2016.
20. Ross Girshick, Ilija Radosavovic, Georgia Gkioxari, Piotr Dollár, and Kaiming He. Detectron. https://github.com/facebookresearch/detectron, 2018.
21. S. Hershey et al. Models for AudioSet: A large scale dataset of audio events. https://github.com/tensorflow/models/tree/master/research/audioset/vggish, 2016.
22. Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
23. D. C. Van Essen, K. Ugurbil, E. Auerbach, D. Barch, T. E. Behrens, R. Bucholz, A. Chang, L. Chen, M. Corbetta, S. W. Curtiss, S. Della Penna, D. Feinberg, M. F. Glasser, N. Harel, A. C. Heath, L. Larson-Prior, D. Marcus, G. Michalareas, S. Moeller, R. Oostenveld, S. E. Petersen, F. Prior, B. L. Schlaggar, S. M. Smith, A. Z. Snyder, J. Xu, and E. Yacoub. The Human Connectome Project: a data acquisition perspective. Neuroimage, 62(4):2222–2231, Oct 2012.
24. A. T. Vu, K. Jamison, M. F. Glasser, S. M. Smith, T. Coalson, S. Moeller, E. J. Auerbach, K. Ugurbil, and E. Yacoub. Tradeoffs in pushing the spatial resolution of fMRI for the 7T Human Connectome Project. Neuroimage, 154:23–32, Jul 2017.
25. I. Tavor, O. Parker Jones, R. B. Mars, S. M. Smith, T. E. Behrens, and S. Jbabdi. Task-free MRI predicts individual differences in brain activity during task performance. Science, 352(6282):216–220, Apr 2016.
26. N. Kanwisher, J. McDermott, and M. M. Chun. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci., 17(11):4302–4311, Jun 1997.
27. S. Nasr, N. Liu, K. J. Devaney, X. Yue, R. Rajimehr, L. G. Ungerleider, and R. B. Tootell. Scene-selective cortical regions in human and nonhuman primates. J. Neurosci., 31(39):13771–13785, Sep 2011.
28. M. F. Glasser, T. S. Coalson, E. C. Robinson, C. D. Hacker, J. Harwell, E. Yacoub, K. Ugurbil, J. Andersson, C. F. Beckmann, M. Jenkinson, S. M. Smith, and D. C. Van Essen. A multi-modal parcellation of human cerebral cortex.