Decoding visual stimuli in human brain by using Anatomical Pattern Analysis on fMRI images
Muhammad Yousefnezhad and Daoqiang Zhang

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
[email protected], [email protected]

Abstract.
A universal unanswered question in neuroscience and machine learning is whether computers can decode the patterns of the human brain. Multi-Voxel Pattern Analysis (MVPA) is a critical tool for addressing this question. However, there are two challenges in previous MVPA methods: decreasing sparsity and noise in the extracted features, and increasing the performance of prediction. To overcome these challenges, this paper proposes Anatomical Pattern Analysis (APA) for decoding visual stimuli in the human brain. This framework develops a novel anatomical feature extraction method and a new imbalance AdaBoost algorithm for binary classification. Further, it utilizes an Error-Correcting Output Codes (ECOC) method for multi-class prediction. APA can automatically detect active regions for each category of the visual stimuli. Moreover, it enables us to combine homogeneous datasets for applying advanced classification. Experimental studies on 4 visual categories (words, consonants, objects and scrambled photos) demonstrate that the proposed approach achieves superior performance to state-of-the-art methods.
Keywords: brain decoding, multi-voxel pattern analysis, anatomical feature extraction, visual object recognition, imbalance classification
One of the key challenges in neuroscience is how human brain activities can be mapped to different brain tasks. As a conjunction between neuroscience and computer science, Multi-Voxel Pattern Analysis (MVPA) [1] addresses this question by applying machine learning methods to task-based functional Magnetic Resonance Imaging (fMRI) datasets. Analyzing the patterns of visual objects is one of the most interesting topics in MVPA, which can enable us to understand how the brain stores and processes visual stimuli [2,3]. It can be used for finding novel treatments for mental diseases or even creating a new generation of user interfaces in the future.

Technically, there are two challenges in previous studies. The first challenge is decreasing sparsity and noise in preprocessed voxels. Since most of the previous studies directly utilized voxels for predicting the stimuli, the trained features are mostly sparse, high-dimensional and noisy, and they contain little useful information [2,3,4]. The second challenge is increasing the performance of prediction. Most brain decoding problems employ binary classifiers, especially with a one-versus-all strategy [1,2,5,6,7]. In addition, multi-class predictors are mostly built on binary classifiers, such as the Error-Correcting Output Codes (ECOC) methods [8]. Since task-based fMRI experiments are mostly imbalanced, it is hard to train an effective binary classifier for brain decoding problems. For instance, consider a dataset with 10 same-size categories. Since this dataset is imbalanced under one-versus-all binary classification, most of the classical algorithms cannot provide acceptable performance [2,5,9].

To address these problems, this paper proposes Anatomical Pattern Analysis (APA) as a general framework for decoding visual stimuli in the human brain.
This framework employs a novel feature extraction method, which uses the brain's anatomical regions for generating a normalized view. In practice, this view can enable us to combine homogeneous datasets. The feature extraction method can also automatically detect the active regions for each category of the visual stimuli. Indeed, it can decrease noise and sparsity and increase the performance of the final result. Further, this paper develops a modified version of the imbalance AdaBoost algorithm for binary classification. This algorithm uses supervised random sampling and penalty values, which are calculated from the correlation between different classes, for improving the performance of prediction. This binary classifier is then used in a one-versus-all ECOC method as a multi-class approach for classifying the categories of the brain response.

The rest of this paper is organized as follows: Section 2 briefly reviews related work. Section 3 introduces the proposed method. Experimental results are reported in Section 4; finally, Section 5 presents conclusions and points out future work.

There are three different types of studies for decoding visual stimuli in the human brain. Pioneering studies focused only on special regions of the human brain, such as the Fusiform Face Area (FFA) or the Parahippocampal Place Area (PPA). They only proved that different stimuli can evoke different responses in those regions, or found the most effective locations for different stimuli [2].

The next group of studies introduced different correlation techniques for understanding the similarity or difference between responses to different visual stimuli. Haxby et al. recently showed that different visual stimuli, i.e. human faces, animals, etc., produce different responses in the brain [2].
Further, Rice et al. proved that not only are these responses different based on the categories of the stimuli, but they are also correlated with different properties of the stimuli. They used the GIST technique for extracting the properties of the stimuli and calculated the correlations between these properties and the brain responses. They separately reported the correlation matrices for different human faces and different objects (houses, chairs, bottles, shoes) [12].
Fig. 1: Anatomical Pattern Analysis (APA) framework

The last group of studies proposed MVPA techniques for predicting the category of visual stimuli. Cox et al. utilized linear and non-linear versions of the Support Vector Machine (SVM) algorithm [5]. Norman et al. argued for using SVM and Gaussian Naive Bayes classifiers [1]. Carroll et al. employed the Elastic Net for prediction and interpretation of distributed neural activity with sparse models [13]. Varoquaux et al. proposed small-sample brain mapping by using sparse recovery on spatially correlated designs with randomization and clustering. Their method is applied to small sets of brain patterns for distinguishing different categories based on a one-versus-one strategy [14]. McMenamin et al. studied the subsystems that underlie abstract-category (AC) recognition and priming of objects (e.g., cat, piano) and specific-exemplar (SE) recognition and priming of objects (e.g., a calico cat, a different calico cat, a grand piano, etc.). Technically, they applied SVM to manually selected ROIs in the human brain for generating the visual stimuli predictors [6]. Mohr et al. compared four different classification methods, i.e. L1- and L2-regularized SVM, the Elastic Net, and the Graph Net, for predicting different responses in the human brain. They showed that L1-regularization can improve classification performance while simultaneously providing highly specific and interpretable discriminative activation patterns [7]. Osher et al. proposed a network (graph) based approach that uses anatomical regions of the human brain for representing and classifying the different visual stimuli responses (faces, objects, bodies, scenes) [3].
Blood Oxygen Level Dependent (BOLD) signals are used in fMRI techniques for representing neural activities. Because of the hyperalignment problem in brain decoding [2], the quantitative values of the BOLD signals in the same experiment are usually different for two subjects. Therefore, MVPA techniques use the correlation between different voxels as the pattern of the brain response [3,4]. As depicted in Figure 1, each fMRI experiment includes a set of sessions (time series of 3D images), which can be captured from different subjects or by just repeating the imaging procedure with a single subject. Technically, each session can be partitioned into a set of visual stimuli categories. Indeed, an independent category denotes a set of homogeneous conditions, which are generated by using the same type of photos as the visual stimuli. For instance, if a subject watches 6 photos of cats and 5 photos of houses during a single session, this 4D image includes 2 different categories and 11 conditions.
Consider $F \in \mathbb{R}^{N \times X \times Y \times Z}$, i.e. the number of scans ($N$) $\times$ the 3D images, for each session of the experiment. $F$ can be written as a general linear model: $F = D\beta + \varepsilon$, where $D \in \mathbb{R}^{N \times P}$ (number of scans $\times$ categories/regressors) denotes the design matrix, $\varepsilon$ is the noise (error of estimation), and $\beta \in \mathbb{R}^{P \times X \times Y \times Z}$ (number of categories $\times$ 3D images) denotes the set of correlations between voxels for the categories of the session. The design matrix can be calculated by the convolution $D(t) = (S * H)(t)$ of the onsets (time series $S(t)$) with the Hemodynamic Response Function (HRF) [4]. This paper uses the Generalized Least Squares (GLS) approach for estimating the optimized solution $\hat{\beta} = (D^{\top} V^{-1} D)^{-1} D^{\top} V^{-1} F$, where $V$ is the covariance matrix of the noise ($Var(\varepsilon) = V\sigma^2 \neq \mathbb{I}\sigma^2$) [4,2]. Now, this paper defines the positive correlation $\beta = \hat{\beta} > 0 = \{\hat{\beta}_1 > 0, \hat{\beta}_2 > 0, \dots, \hat{\beta}_P > 0\} = \{\beta_1, \beta_2, \dots, \beta_P\}$ for all categories as the active regions, where $\hat{\beta}$ denotes the estimated correlation, and $\hat{\beta}_p$ and $\beta_p$ are the correlation and positive correlation for the $p$-th category, respectively. Moreover, the data $F$ is partitioned based on the conditions of the design matrix as follows:

$$\hat{C} = \{\hat{c}^{1}_{1}, \hat{c}^{1}_{2}, \dots, \hat{c}^{1}_{Q_1}, \hat{c}^{2}_{1}, \hat{c}^{2}_{2}, \dots, \hat{c}^{2}_{Q_2}, \dots, \hat{c}^{P}_{1}, \hat{c}^{P}_{2}, \dots, \hat{c}^{P}_{Q_P}\} \quad (1)$$

where $\hat{C}$ denotes the set of all conditions in each session, and $P$ and $Q_r$ are respectively the number of categories in each session and the number of conditions in each category. Further, $\hat{c}^{p}_{q_r} \in \mathbb{R}^{K^{p}_{q_r} \times X \times Y \times Z}$ (number of scans $K^{p}_{q_r}$ $\times$ 3D images) denotes the 4D images for the $p$-th category and $q_r$-th condition in the design matrix. Now, this paper defines the sum of all images in a condition as follows:

$$C^{p}_{q_r} = \sum_{k=1}^{K^{p}_{q_r}} \hat{c}^{p}_{q_r}[k,:,:,:] \quad (2)$$

where $\hat{c}^{p}_{q_r}[k,:,:,:]$ denotes all voxels in the $k$-th scan of the $q_r$-th condition of the $p$-th category, and $K^{p}_{q_r}$ is the number of scans in the given condition. The matrix $\zeta^{p}_{q_r}$ is defined for applying the correlation of voxels to the response of each condition as follows:

$$\zeta^{p}_{q_r} = \beta_p \circ C^{p}_{q_r} = \{\forall [x,y,z] \in C^{p}_{q_r} \implies (\zeta^{p}_{q_r})_{[x,y,z]} = (\beta_p)_{[x,y,z]} \times (C^{p}_{q_r})_{[x,y,z]}\} \quad (3)$$

where $\circ$ denotes the Hadamard product; $(C^{p}_{q_r})_{[x,y,z]}$ is the $[x,y,z]$-th voxel of the $q_r$-th condition of the $p$-th category; and $(\beta_p)_{[x,y,z]}$ is the $[x,y,z]$-th voxel of the correlation matrix ($\beta$ values) of the $p$-th category.

Since mapping 4D fMRI images to a standard space decreases the performance of the final results, most of the previous studies use the original images instead of the standard version. By considering the 3D image $\zeta^{p}_{q_r}$ for each condition, this paper is able to map brain activities to a standard space. This mapping can provide a normalized view for combining homogeneous datasets. For registering $\zeta^{p}_{q_r}$ to the standard space, this paper utilizes the FLIRT algorithm [10], which minimizes the following cost function:

$$T^{*} = \underset{T \in S_T}{\arg\min}\,\big(NMI(Ref, \Xi^{p}_{q_r})\big) \quad (4)$$

where $Ref$ denotes the reference image, $S_T$ is the space of allowable transformations, the function $NMI$ denotes the Normalized Mutual Information between two images, and $\Xi^{p}_{q_r} = T(\zeta^{p}_{q_r})$ is the condition after registration ($T$ denotes the transformation function) [10]. The performance of (4) will be analyzed in Section 4. Now, consider $Atlas = \{A_1, A_2, \dots, A_L\}$, where $\cap_{l=1}^{L} \{A_l\} = \emptyset$, $\cup_{l=1}^{L} \{A_l\} = A$, and $A_l$ denotes the set of indexes of voxels for the $l$-th region.
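As a concreteness check, the GLS estimation and the condition-level masking of Eqs. (2)-(3) can be sketched in a few lines of NumPy. The authors' implementation is in MATLAB; all shapes, values, and variable names below are illustrative, with the 3D volume flattened to a vector of voxels.

```python
import numpy as np

# Illustrative sketch (not the authors' code) of the GLS fit
# beta_hat = (D^T V^-1 D)^-1 D^T V^-1 F, followed by the masking of
# Eqs. (2)-(3), on synthetic data with the 3D image flattened to V voxels.
rng = np.random.default_rng(0)
N, P, V = 40, 2, 6                      # scans, categories (regressors), voxels

D = rng.random((N, P))                  # design matrix (onsets * HRF in practice)
beta_true = rng.random((P, V))
F = D @ beta_true + 0.01 * rng.standard_normal((N, V))

Vcov = np.eye(N)                        # noise covariance V (identity in this toy)
Vinv = np.linalg.inv(Vcov)
beta_hat = np.linalg.solve(D.T @ Vinv @ D, D.T @ Vinv @ F)

beta_pos = np.where(beta_hat > 0, beta_hat, 0.0)   # active regions: beta_hat > 0

C = F[:5].sum(axis=0)                   # Eq. (2): sum the K scans of one condition
zeta = beta_pos[0] * C                  # Eq. (3): Hadamard product with beta_p
```

With low noise and more scans than regressors, `beta_hat` closely recovers `beta_true`; `zeta` is the masked condition response that is later registered and fed to the feature extractor.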
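The registration cost in Eq. (4) rests on Normalized Mutual Information, $NMI(A,B) = (H(A) + H(B)) / H(A,B)$. A minimal histogram-based version can be sketched as follows; the bin count and images are arbitrary, and FLIRT's search over the transformation space $S_T$ is omitted, so this evaluates only the similarity metric itself.

```python
import numpy as np

# Minimal histogram-based NMI between two images, as used by the cost in
# Eq. (4): NMI(A, B) = (H(A) + H(B)) / H(A, B). The optimization over
# transformations T performed by FLIRT is not shown here.
def entropy(p):
    p = p[p > 0]                        # ignore empty bins
    return -np.sum(p * np.log2(p))

def nmi(a, b, bins=16):
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_joint = joint / joint.sum()
    p_a, p_b = p_joint.sum(axis=1), p_joint.sum(axis=0)
    return (entropy(p_a) + entropy(p_b)) / entropy(p_joint.ravel())

rng = np.random.default_rng(1)
ref = rng.random((8, 8))                # stand-in for the reference image (Ref)
img = rng.random((8, 8))                # stand-in for a moving image
# A perfectly aligned (identical) image attains the maximum NMI of 2.
```

Because an image compared with itself has a diagonal joint histogram, `nmi(ref, ref)` equals 2, the metric's maximum, while an unrelated image scores lower; a registration loop would search for the transform maximizing this value.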
The extracted feature for the $l$-th region of the $q_r$-th condition of the $p$-th category is calculated as follows, where $a_v = [x_v, y_v, z_v]$ denotes the index of the $v$-th voxel of the $l$-th atlas region, and $A_l$ is the set of indexes of voxels in the $l$-th region:

$$\forall a_v = [x_v, y_v, z_v] \in A_l \implies \Gamma^{p}_{q_r}(l) = \frac{1}{|A_l|} \sum_{v=1}^{|A_l|} (\Xi^{p}_{q_r})[a_v] = \frac{1}{|A_l|} \sum_{v=1}^{|A_l|} (\Xi^{p}_{q_r})[x_v, y_v, z_v] \quad (5)$$

This paper randomly partitions the extracted features $G = \{[\Gamma^{1}_{1}(1) \dots \Gamma^{1}_{1}(L)], \dots, [\Gamma^{1}_{Q_1}(1) \dots \Gamma^{1}_{Q_1}(L)], \dots, [\Gamma^{P}_{Q_P}(1) \dots \Gamma^{P}_{Q_P}(L)]\}$ into a train set ($G_{tr}$) and a test set ($G_{te}$). As a new branch of the AdaBoost algorithm, Algorithm 1 employs $G_{tr}$ for training the binary classifier. Then, $G_{te}$ is utilized for estimating the performance of the classifier. As mentioned before, binary classification for fMRI analysis is mostly imbalanced, especially under a one-versus-all strategy. As a result, the number of samples in one of the binary classes is smaller than in the other class. This paper exploits this fact. Indeed, Algorithm 1 first partitions the train data ($G_{tr}$) into small ($G^{S}_{tr}$) and large ($G^{L}_{tr}$) classes (groups) based on the class labels ($I_{tr} \in \{+1, -1\}$). Then, it calculates the scale ($J$) of elements between the two classes and employs this scale as the number of ensemble iterations ($J + 1$). Here, $Int()$ denotes the floor function. In the next step, the large class is randomly partitioned into $J$ parts. Now, the train data ($G_j$) for each iteration is generated from all instances of the small class ($G^{S}_{tr}$), one of the partitioned parts of the large class ($G^{L}_{tr}(j)$), and the instances of the previous iteration ($\bar{G}_j$) that could not be truly trained. In this algorithm, $corr()$ denotes the Pearson correlation ($corr(A,B) = cov(A,B)/\sigma_A \sigma_B$), and $W_j \in [0, 1]$ is the train weight (penalty value), which is applied to the large class. Further, $Classifier()$ denotes any kind of weighted classification algorithm; this paper uses a simple classical decision tree as the individual classifier ($\theta_j$) [9].

Algorithm 1: The proposed binary classification algorithm
Input: train set $G_{tr}$, real class labels $I_{tr}$ of the train set
Output: classifier $\Theta_p$
Method:
1. Partition $G_{tr} = \{G^{S}_{tr}, G^{L}_{tr}\}$, where $G^{S}_{tr}$, $G^{L}_{tr}$ are the small and large classes.
2. Calculate $J = Int(|G^{L}_{tr}| / |G^{S}_{tr}|)$ based on the number of elements in the classes.
3. Randomly partition $G^{L}_{tr} = \{G^{L}_{tr}(1), \dots, G^{L}_{tr}(J)\}$.
4. Considering $\bar{G}_1 = \bar{I}_1 = \emptyset$, generate $j = 1, \dots, J+1$ classifiers:
5. Construct $G_j = \{G^{S}_{tr}, G^{L}_{tr}(j), \bar{G}_j\}$ and $I_j = \{I^{S}_{tr}, I^{L}_{tr}(j), \bar{I}_j\}$.
6. Calculate $W_j = \{w_j\}^{|G_j|}$: $w_j = 1$ for instances of $G^{S}_{tr}$ or $\bar{G}_j$; $w_j = 1 - |corr(G^{S}_{tr}, G^{L}_{tr})|$ for instances of $G^{L}_{tr}(j)$.
7. Train $\theta_j = Classifier(G_j, I_j, W_j)$.
8. Construct $\bar{G}_{j+1}$, $\bar{I}_{j+1}$ as the set of instances that could not be truly trained by $\theta_j$.
9. If $j \leq J+1$: go to line 5; else return $\Theta_p = \{\theta_1, \dots, \theta_{J+1}\}$ as the final classifier.

Generally, there are two techniques for applying multi-class classification. The first approach directly creates the classification model, such as a multi-class support vector machine [5] or a neural network [1]. In contrast, the (indirect) decomposition design uses an array of binary classifiers for solving the multi-class problem. As one of the classical indirect methods, Error-Correcting Output Codes (ECOC) includes three components, i.e. the base algorithm and the encoding and decoding procedures [8]. As the base algorithm in ECOC, this paper employs Algorithm 1 for generating the binary classifiers ($\Theta_p$). Further, it uses a one-versus-all encoding strategy for training the ECOC method, where an independent category of the visual stimuli is compared with the rest of the categories (see Figure 1.e). Indeed, the number of classifiers in this strategy is exactly equal to the number of categories. In the decoding stage, this method assigns the brain response to the category with the closest Hamming distance.

This paper employs two datasets, shared by openfmri.org, for running empirical studies. As the first dataset, 'Visual Object Recognition' (DS105) includes 71 sessions (6 subjects). It contains 8 categories of visual stimuli, i.e. gray-scale images of faces, houses, cats, bottles, scissors, shoes, chairs, and scrambled (nonsense) photos.
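The anatomical feature extraction of Eq. (5), one mean per atlas region, reduces each registered condition image to an $L$-dimensional vector. A sketch follows; the 4x4x4 image and two-region atlas are toy stand-ins for a real volume and the Talairach labels, and the function name is ours.

```python
import numpy as np

# Sketch of Eq. (5): Gamma(l) is the mean of the registered condition image Xi
# over the voxels of atlas region l. Toy data stands in for a real volume and
# the Talairach atlas (L = 1105 regions).
def extract_features(img, atlas):
    """img: 3D condition image; atlas: 3D integer label map, 0 = outside brain."""
    labels = np.unique(atlas)
    labels = labels[labels != 0]        # skip background
    return np.array([img[atlas == l].mean() for l in labels])

atlas = np.zeros((4, 4, 4), dtype=int)
atlas[:2], atlas[2:] = 1, 2             # region 1: front half, region 2: back half
img = np.ones((4, 4, 4))
img[2:] = 3.0                           # region 2 voxels are more active
features = extract_features(img, atlas)  # -> array([1., 3.])
```

Averaging within anatomically defined regions is what collapses the sparse, noisy voxel space to a compact, comparable representation across subjects.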
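Algorithm 1's sampling scheme can be made concrete with a small sketch. A 1-D threshold "stump" stands in for the paper's weighted decision tree, and the correlation penalty is computed on equal-length sorted slices of the two classes, which is a simplification of $corr(G^{S}_{tr}, G^{L}_{tr})$; all names and data are ours, not the authors'.

```python
import numpy as np

# Simplified sketch of Algorithm 1: split the large class into J chunks, pair
# each chunk with the whole small class plus previously misclassified samples,
# and train a weighted base learner per round.
def fit_stump(x, y, w):
    best = (np.inf, 0.0, 1)
    for t in np.unique(x):
        for sign in (1, -1):
            pred = np.where(sign * (x - t) >= 0, 1, -1)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, t, sign)
    return best[1], best[2]

def predict_stump(model, x):
    t, sign = model
    return np.where(sign * (x - t) >= 0, 1, -1)

def imbalance_ensemble(x, y):
    small, large = x[y == 1], x[y == -1]        # G^S_tr, G^L_tr
    J = max(1, len(large) // len(small))        # scale between class sizes
    chunks = np.array_split(large, J)           # randomly partitioned in the paper
    m = min(len(small), len(large))             # penalty from class correlation
    penalty = 1 - abs(np.corrcoef(np.sort(small)[:m], np.sort(large)[:m])[0, 1])
    models, carry_x, carry_y = [], np.empty(0), np.empty(0)
    for chunk in chunks:
        xs = np.concatenate([small, chunk, carry_x])
        ys = np.concatenate([np.ones(len(small)), -np.ones(len(chunk)), carry_y])
        ws = np.concatenate([np.ones(len(small)),
                             np.full(len(chunk), penalty),
                             np.ones(len(carry_x))])
        model = fit_stump(xs, ys, ws)
        wrong = predict_stump(model, xs) != ys  # carry hard samples forward
        carry_x, carry_y = xs[wrong], ys[wrong]
        models.append(model)
    return models

def predict_ensemble(models, x):
    votes = sum(predict_stump(m, x) for m in models)
    return np.where(votes >= 0, 1, -1)

x = np.array([4.8, 5.1, 5.3,                    # 3 samples: small class (+1)
              0.1, 0.4, 0.9, 0.2, 0.6, 0.3,    # 12 samples: large class (-1)
              0.8, 0.5, 0.0, 0.7, 0.35, 0.55])
y = np.array([1, 1, 1] + [-1] * 12)
models = imbalance_ensemble(x, y)
```

With a 3-versus-12 split, each of the $J=4$ rounds sees a balanced 3-versus-3 problem, which is the point of the sampling scheme; the paper additionally runs a final $(J+1)$-th round over carried instances.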
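For the one-versus-all ECOC stage described above, the codeword of category $p$ has $+1$ in position $p$ and $-1$ elsewhere, and decoding assigns a sample to the codeword nearest in Hamming distance to the binary classifiers' outputs. A minimal sketch, with names of our choosing:

```python
import numpy as np

# One-versus-all ECOC decoding sketch: with P categories there are P binary
# classifiers; a sample is assigned to the category whose codeword is closest
# in Hamming distance to the vector of binary predictions.
P = 4
codebook = 2 * np.eye(P, dtype=int) - 1   # row p: +1 at position p, -1 elsewhere

def decode(binary_outputs):
    """binary_outputs: length-P vector of {-1, +1} predictions."""
    hamming = (codebook != np.asarray(binary_outputs)).sum(axis=1)
    return int(np.argmin(hamming))

category = decode([-1, +1, -1, -1])       # only classifier 1 fired -> category 1
```

Hamming decoding degrades gracefully: even if no classifier (or more than one) fires, the nearest codeword still yields a single category.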
This dataset is analyzed as high-level visual stimuli in the binary predictors, by considering all categories except scrambled photos as objects, and as low-level visual stimuli in the multi-class prediction. Please see [2,5] for more information. As the second dataset, 'Word and Object Processing' (DS107) includes 98 sessions (49 subjects). It contains 4 categories of visual stimuli, i.e. words, objects, scrambles, and consonants. Please see [11] for more information. These datasets were preprocessed by SPM 12, i.e. slice timing, realignment, normalization, and smoothing. Then, the beta values were calculated for each session. This paper employs the
MNI 152 T1 1mm template (see Figure 1.d) as the reference image ($Ref$) in Eq. (4) for registering the extracted conditions ($\zeta$) to the standard space ($\Xi$). In addition, this paper uses the Talairach atlas (containing $L = 1105$ regions) in Eq. (5) for extracting features (see Figure 1.d).

Figures 2.a-c demonstrate examples of brain responses to different stimuli, i.e. (a) word, (b) object, and (c) scramble. Here, gray parts show the anatomical atlas, the colored parts (red, yellow and green) define the functional activities, and the red rectangles illustrate the error areas after registration. Indeed, these errors can be formulated as the nonzero areas of the brain image that are located in the zero area of the anatomical atlas (the area without a region number). The performance of the objective function (4) on the DS105 and DS107 datasets is analyzed in Figure 2.d by using different distance metrics, i.e. the Woods function (W), Correlation Ratio (CR), Joint Entropy (JE), Mutual Information (MI), and Normalized Mutual Information (NMI) [10]. As depicted in this figure, NMI generated better results in comparison with the other metrics.

Figures 3.a and c illustrate the correlation matrices of DS105 and DS107 at the voxel level, respectively. Similarly, Figures 3.b and d show the correlation matrices of DS105 and DS107 at the feature level, respectively. Since brain responses are sparse, high-dimensional and noisy at the voxel level, it is hard to discriminate between different categories in Figures 3.a and c. By contrast, Figures 3.b and d provide a distinctive representation when the proposed method uses the correlated patterns in each anatomical region as the extracted features.

The performance of our framework is compared with state-of-the-art methods, i.e. Cox & Savoy [5], McMenamin et al. [6], Mohr et al. [7], and Osher et al. [3], by using leave-one-out cross-validation at the subject level. Further, all of the algorithms were implemented in MATLAB R2016a (9.0) by the authors in order to generate the experimental results. Tables 1 and 2 respectively illustrate the classification accuracy and Area Under the ROC Curve (AUC) of the binary predictors based on the category of the visual stimuli.
All visual stimuli in the DS105 dataset except scrambled photos are considered as the object category for generating these experimental results. As depicted in Tables 1 and 2, the proposed algorithm achieves better performance in comparison with the other methods because it provides a better representation of neural activities by exploiting the anatomical structure of the human brain. Table 3 illustrates the classification accuracy of the multi-class predictors. In this table, 'DS105' includes all 8 categories of visual stimuli (P = 8).
Fig. 2: Extracted features based on different stimuli, i.e. (a) word, (b) object, and (c) scramble; (d) the effect of different objective functions in (4) on the registration error.
Fig. 3: The correlation matrices: (a) raw voxels and (b) extracted features of the DS105 dataset; (c) raw voxels and (d) extracted features of the DS107 dataset.
Table 1: Accuracy of binary predictors (mean ± std)

Data Sets          Cox & Savoy   McMenamin et al.   Mohr et al.   Osher et al.   Binary-APA
DS105-Objects      71.65         —                  —             —              —
DS107-Words        69.89         —                  —             —              —
DS107-Consonants   67.84         —                  —             —              —
DS107-Objects      65.32         —                  —             —              —
DS107-Scramble     —             —                  —             —              —

Table 2: Area Under the ROC Curve (AUC) of binary predictors (mean ± std)

Data Sets          Cox & Savoy   McMenamin et al.   Mohr et al.   Osher et al.   Binary-APA
DS105-Objects      68.37         —                  —             —              —
DS107-Words        67.76         —                  —             —              —
DS107-Consonants   63.84         —                  —             —              —
DS107-Objects      63.17         —                  —             —              —
DS107-Scramble     66.73         —                  —             —              —

Table 3: Accuracy of multi-class predictors (mean ± std)

Data Sets     Cox & Savoy   McMenamin et al.   Mohr et al.   Osher et al.   Multi-APA
DS105 (P=8)   18.03         —                  —             —              —
DS107 (P=4)   38.01         —                  —             —              —
ALL (P=4)     32.93         —                  —             —              —

This paper proposes the Anatomical Pattern Analysis (APA) framework for decoding visual stimuli in the human brain. This framework uses an anatomical feature extraction method, which provides a normalized representation for combining homogeneous datasets. Further, a new binary imbalance AdaBoost algorithm is introduced. It can increase the performance of prediction by exploiting supervised random sampling and the correlation between classes. In addition, this algorithm is utilized in an Error-Correcting Output Codes (ECOC) method for multi-class prediction of brain responses. Empirical studies on 4 visual categories clearly show the superiority of our proposed method in comparison with the voxel-based approaches. In the future, we plan to apply the proposed method to different brain tasks such as low-level visual stimuli, emotions, etc.

Acknowledgment
We thank the anonymous reviewers for their comments. This work was supported in part by the National Natural Science Foundation of China (61422204 and 61473149), the Jiangsu Natural Science Foundation for Distinguished Young Scholar (BK20130034), and the NUAA Fundamental Research Funds (NE2013105).