Decoding visual stimuli in human brain by using Anatomical Pattern Analysis on fMRI images
Muhammad Yousefnezhad and Daoqiang Zhang

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
[email protected], [email protected]

Abstract.
A universal unanswered question in neuroscience and machine learning is whether computers can decode the patterns of the human brain. Multi-Voxel Pattern Analysis (MVPA) is a critical tool for addressing this question. However, there are two challenges in previous MVPA methods: decreasing sparsity and noise in the extracted features, and increasing the performance of prediction. To overcome these challenges, this paper proposes Anatomical Pattern Analysis (APA) for decoding visual stimuli in the human brain. This framework develops a novel anatomical feature extraction method and a new imbalance AdaBoost algorithm for binary classification. Further, it utilizes an Error-Correcting Output Codes (ECOC) method for multi-class prediction. APA can automatically detect active regions for each category of the visual stimuli. Moreover, it enables us to combine homogeneous datasets for applying advanced classification. Experimental studies on 4 visual categories (words, consonants, objects and scrambled photos) demonstrate that the proposed approach achieves superior performance to state-of-the-art methods.
Keywords: brain decoding, multi-voxel pattern analysis, anatomical feature extraction, visual object recognition, imbalance classification
One of the key challenges in neuroscience is how human brain activities can be mapped to different brain tasks. As a conjunction between neuroscience and computer science, Multi-Voxel Pattern Analysis (MVPA) [1] addresses this question by applying machine learning methods to task-based functional Magnetic Resonance Imaging (fMRI) datasets. Analyzing the patterns of visual objects is one of the most interesting topics in MVPA, which can enable us to understand how the brain stores and processes visual stimuli [2,3]. It can be used for finding novel treatments for mental diseases or even creating a new generation of user interfaces in the future.

Technically, there are two challenges in previous studies. The first challenge is decreasing sparsity and noise in preprocessed voxels. Since most of the previous studies directly utilized voxels for predicting the stimuli, the trained features are mostly sparse, high-dimensional and noisy, and they contain little useful information [2,3,4]. The second challenge is increasing the performance of prediction. Most brain decoding problems employ binary classifiers, especially with a one-versus-all strategy [1,2,5,6,7]. In addition, multi-class predictors are mostly built on binary classifiers, such as the Error-Correcting Output Codes (ECOC) methods [8]. Since task-based fMRI experiments are mostly imbalanced, it is hard to train an effective binary classifier for brain decoding problems. For instance, consider a dataset with 10 same-size categories. Since this dataset is imbalanced under one-versus-all binary classification, most of the classical algorithms cannot provide acceptable performance [2,5,9].

To address these problems, this paper proposes Anatomical Pattern Analysis (APA) as a general framework for decoding visual stimuli in the human brain.
This framework employs a novel feature extraction method, which uses the brain's anatomical regions for generating a normalized view. In practice, this view can enable us to combine homogeneous datasets. The feature extraction method can also automatically detect the active regions for each category of the visual stimuli. Indeed, it can decrease noise and sparsity and increase the performance of the final result. Further, this paper develops a modified version of the imbalance AdaBoost algorithm for binary classification. This algorithm uses supervised random sampling and penalty values, which are calculated from the correlation between different classes, for improving the performance of prediction. This binary classifier is then used in a one-versus-all ECOC method as a multi-class approach for classifying the categories of the brain response.

The rest of this paper is organized as follows: Section 2 briefly reviews related work. Section 3 introduces the proposed method. Experimental results are reported in Section 4; finally, Section 5 presents conclusions and points out future work.

There are three different types of studies for decoding visual stimuli in the human brain. Pioneering studies focused only on special regions of the human brain, such as the Fusiform Face Area (FFA) or the Parahippocampal Place Area (PPA). They only proved that different stimuli can evoke different responses in those regions, or found the most effective locations for different stimuli [2].

The next group of studies introduced different correlation techniques for understanding the similarity or difference between responses to different visual stimuli. Haxby et al. recently showed that different visual stimuli, i.e. human faces, animals, etc., produce different responses in the brain [2].
Further, Rice et al. proved that not only are these responses different based on the categories of the stimuli, but they are also correlated with different properties of the stimuli. They used the GIST technique for extracting the properties of the stimuli and calculated the correlations between these properties and the brain responses. They separately reported the correlation matrices for different human faces and different objects (houses, chairs, bottles, shoes) [12].
Fig. 1: Anatomical Pattern Analysis (APA) framework

The last group of studies proposed MVPA techniques for predicting the category of visual stimuli. Cox et al. utilized linear and non-linear versions of the Support Vector Machine (SVM) algorithm [5]. Norman et al. argued for using SVM and Gaussian Naive Bayes classifiers [1]. Carroll et al. employed the Elastic Net for prediction and interpretation of distributed neural activity with sparse models [13]. Varoquaux et al. proposed small-sample brain mapping by using sparse recovery on spatially correlated designs with randomization and clustering. Their method is applied to small sets of brain patterns for distinguishing different categories based on a one-versus-one strategy [14]. McMenamin et al. studied the subsystems that underlie abstract-category (AC) recognition and priming of objects (e.g., cat, piano) and specific-exemplar (SE) recognition and priming of objects (e.g., a calico cat, a different calico cat, a grand piano, etc.). Technically, they applied SVM to manually selected ROIs in the human brain for generating the visual stimuli predictors [6]. Mohr et al. compared four different classification methods, i.e. L1- and L2-regularized SVM, the Elastic Net, and the Graph Net, for predicting different responses in the human brain. They showed that L1-regularization can improve classification performance while simultaneously providing highly specific and interpretable discriminative activation patterns [7]. Osher et al. proposed a network (graph) based approach that uses anatomical regions of the human brain for representing and classifying the different visual stimuli responses (faces, objects, bodies, scenes) [3].
Blood Oxygen Level Dependent (BOLD) signals are used in fMRI techniques for representing neural activities. Because of the hyperalignment problem in brain decoding [2], the quantitative values of the BOLD signals in the same experiment are usually different for two subjects. Therefore, MVPA techniques use the correlation between different voxels as the pattern of the brain response [3,4]. As depicted in Figure 1, each fMRI experiment includes a set of sessions (time series of 3D images), which can be captured from different subjects or by just repeating the imaging procedure with a single subject. Technically, each session can be partitioned into a set of visual stimuli categories. Indeed, an independent category denotes a set of homogeneous conditions, which are generated by using the same type of photos as the visual stimuli. For instance, if a subject watches 6 photos of cats and 5 photos of houses during a single session, this 4D image includes 2 different categories and 11 conditions.
Consider $F \in \mathbb{R}^{N \times X \times Y \times Z}$, i.e. the number of scans ($N$) $\times$ the 3D images, for each session of the experiment. $F$ can be written as a general linear model: $F = D\beta + \varepsilon$, where $D \in \mathbb{R}^{N \times P}$ (number of scans $\times$ categories/regressors) denotes the design matrix, $\varepsilon$ is the noise (error of estimation), and $\beta \in \mathbb{R}^{P \times X \times Y \times Z}$ (number of categories $\times$ 3D images) denotes the set of correlations between voxels for the categories of the session. The design matrix can be calculated by the convolution $D(t) = (S * H)(t)$ of the onsets (time series $S(t)$) with the Hemodynamic Response Function (HRF) [4]. This paper uses the Generalized Least Squares (GLS) approach for estimating the optimized solution $\hat{\beta} = (D^{\top} V^{-1} D)^{-1} D^{\top} V^{-1} F$, where $V$ is the covariance matrix of the noise ($Var(\varepsilon) = V\sigma^2 \neq \mathbb{I}\sigma^2$) [4,2]. Now, this paper defines the positive correlation $\beta = \hat{\beta} > 0 = \{\hat{\beta}_1 > 0, \hat{\beta}_2 > 0, \dots, \hat{\beta}_P > 0\} = \{\beta_1, \beta_2, \dots, \beta_P\}$ for all categories as the active regions, where $\hat{\beta}$ denotes the estimated correlation, and $\hat{\beta}_p$ and $\beta_p$ are the correlation and positive correlation for the $p$-th category, respectively. Moreover, the data $F$ is partitioned based on the conditions of the design matrix as follows:

$$\hat{C} = \{\hat{c}^{1}_{1}, \hat{c}^{1}_{2}, \dots, \hat{c}^{1}_{Q_1}, \hat{c}^{2}_{1}, \hat{c}^{2}_{2}, \dots, \hat{c}^{2}_{Q_2}, \dots, \hat{c}^{P}_{1}, \hat{c}^{P}_{2}, \dots, \hat{c}^{P}_{Q_P}\} \quad (1)$$

where $\hat{C}$ denotes the set of all conditions in each session, and $P$ and $Q_r$ are respectively the number of categories in each session and the number of conditions in each category. Further, $\hat{c}^{p}_{q_r} \in \mathbb{R}^{K^{p}_{q_r} \times X \times Y \times Z}$ (number of scans $K^{p}_{q_r}$ $\times$ 3D images) denotes the 4D images for the $p$-th category and $q_r$-th condition in the design matrix. Now, this paper defines the sum of all images in a condition as follows:

$$C^{p}_{q_r} = \sum_{k=1}^{K^{p}_{q_r}} \hat{c}^{p}_{q_r}[k,:,:,:] \quad (2)$$

where $\hat{c}^{p}_{q_r}[k,:,:,:]$ denotes all voxels in the $k$-th scan of the $q_r$-th condition of the $p$-th category, and $K^{p}_{q_r}$ is the number of scans in the given condition. The matrix $\zeta^{p}_{q_r}$ is defined for applying the correlation of voxels to the response of each condition as follows:

$$\zeta^{p}_{q_r} = \beta_p \circ C^{p}_{q_r} = \{\forall [x,y,z] \in C^{p}_{q_r} \implies (\zeta^{p}_{q_r})_{[x,y,z]} = (\beta_p)_{[x,y,z]} \times (C^{p}_{q_r})_{[x,y,z]}\} \quad (3)$$

where $\circ$ denotes the Hadamard product; $(C^{p}_{q_r})_{[x,y,z]}$ is the $[x,y,z]$-th voxel of the $q_r$-th condition of the $p$-th category; and $(\beta_p)_{[x,y,z]}$ is the $[x,y,z]$-th voxel of the correlation matrix ($\beta$ values) of the $p$-th category.

Since mapping 4D fMRI images to a standard space decreases the performance of the final results, most of the previous studies use the original images instead of the standard version. By considering the 3D image $\zeta^{p}_{q_r}$ for each condition, this paper is able to map brain activities to a standard space. This mapping can provide a normalized view for combining homogeneous datasets. For registering $\zeta^{p}_{q_r}$ to the standard space, this paper utilizes the FLIRT algorithm [10], which minimizes the following cost function:

$$T^{*} = \underset{T \in S_T}{\arg\min}\,\big(NMI(Ref, \Xi^{p}_{q_r})\big) \quad (4)$$

where $Ref$ denotes the reference image, $S_T$ is the space of allowable transformations, the function $NMI$ denotes the Normalized Mutual Information between two images, and $\Xi^{p}_{q_r} = T(\zeta^{p}_{q_r})$ is the condition after registration ($T$ denotes the transformation function) [10]. The performance of (4) will be analyzed in Section 4. Now, consider $Atlas = \{A_1, A_2, \dots, A_L\}$, where $\cap_{l=1}^{L} \{A_l\} = \emptyset$, $\cup_{l=1}^{L} \{A_l\} = A$, and $A_l$ denotes the set of indexes of voxels for the $l$-th region.
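As a concreteness check, the GLS estimation and the condition-level masking of Eqs. (2)-(3) can be sketched in a few lines of NumPy. The authors' implementation is in MATLAB; all shapes, values, and variable names below are illustrative, with the 3D volume flattened to a vector of voxels.

```python
import numpy as np

# Illustrative sketch (not the authors' code) of the GLS fit
# beta_hat = (D^T V^-1 D)^-1 D^T V^-1 F, followed by the masking of
# Eqs. (2)-(3), on synthetic data with the 3D image flattened to V voxels.
rng = np.random.default_rng(0)
N, P, V = 40, 2, 6                      # scans, categories (regressors), voxels

D = rng.random((N, P))                  # design matrix (onsets * HRF in practice)
beta_true = rng.random((P, V))
F = D @ beta_true + 0.01 * rng.standard_normal((N, V))

Vcov = np.eye(N)                        # noise covariance V (identity in this toy)
Vinv = np.linalg.inv(Vcov)
beta_hat = np.linalg.solve(D.T @ Vinv @ D, D.T @ Vinv @ F)

beta_pos = np.where(beta_hat > 0, beta_hat, 0.0)   # active regions: beta_hat > 0

C = F[:5].sum(axis=0)                   # Eq. (2): sum the K scans of one condition
zeta = beta_pos[0] * C                  # Eq. (3): Hadamard product with beta_p
```

With low noise and more scans than regressors, `beta_hat` closely recovers `beta_true`; `zeta` is the masked condition response that is later registered and fed to the feature extractor.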
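The registration cost in Eq. (4) rests on Normalized Mutual Information, $NMI(A,B) = (H(A) + H(B)) / H(A,B)$. A minimal histogram-based version can be sketched as follows; the bin count and images are arbitrary, and FLIRT's search over the transformation space $S_T$ is omitted, so this evaluates only the similarity metric itself.

```python
import numpy as np

# Minimal histogram-based NMI between two images, as used by the cost in
# Eq. (4): NMI(A, B) = (H(A) + H(B)) / H(A, B). The optimization over
# transformations T performed by FLIRT is not shown here.
def entropy(p):
    p = p[p > 0]                        # ignore empty bins
    return -np.sum(p * np.log2(p))

def nmi(a, b, bins=16):
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_joint = joint / joint.sum()
    p_a, p_b = p_joint.sum(axis=1), p_joint.sum(axis=0)
    return (entropy(p_a) + entropy(p_b)) / entropy(p_joint.ravel())

rng = np.random.default_rng(1)
ref = rng.random((8, 8))                # stand-in for the reference image (Ref)
img = rng.random((8, 8))                # stand-in for a moving image
# A perfectly aligned (identical) image attains the maximum NMI of 2.
```

Because an image compared with itself has a diagonal joint histogram, `nmi(ref, ref)` equals 2, the metric's maximum, while an unrelated image scores lower; a registration loop would search for the transform maximizing this value.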
The extracted feature for the $l$-th region of the $q_r$-th condition of the $p$-th category is calculated as follows, where $a_v = [x_v, y_v, z_v]$ denotes the index of the $v$-th voxel of the $l$-th atlas region, and $A_l$ is the set of indexes of voxels in the $l$-th region:

$$\forall a_v = [x_v, y_v, z_v] \in A_l \implies \Gamma^{p}_{q_r}(l) = \frac{1}{|A_l|} \sum_{v=1}^{|A_l|} (\Xi^{p}_{q_r})[a_v] = \frac{1}{|A_l|} \sum_{v=1}^{|A_l|} (\Xi^{p}_{q_r})[x_v, y_v, z_v] \quad (5)$$

This paper randomly partitions the extracted features $G = \{[\Gamma^{1}_{1}(1) \dots \Gamma^{1}_{1}(L)], \dots, [\Gamma^{1}_{Q_1}(1) \dots \Gamma^{1}_{Q_1}(L)], \dots, [\Gamma^{P}_{Q_P}(1) \dots \Gamma^{P}_{Q_P}(L)]\}$ into a train set ($G_{tr}$) and a test set ($G_{te}$). As a new branch of the AdaBoost algorithm, Algorithm 1 employs $G_{tr}$ for training the binary classifier. Then, $G_{te}$ is utilized for estimating the performance of the classifier. As mentioned before, binary classification for fMRI analysis is mostly imbalanced, especially under a one-versus-all strategy. As a result, the number of samples in one of the binary classes is smaller than in the other class. This paper exploits this fact. Indeed, Algorithm 1 first partitions the train data ($G_{tr}$) into small ($G^{S}_{tr}$) and large ($G^{L}_{tr}$) classes (groups) based on the class labels ($I_{tr} \in \{+1, -1\}$). Then, it calculates the scale ($J$) of elements between the two classes and employs this scale as the number of ensemble iterations ($J + 1$). Here, $Int()$ denotes the floor function. In the next step, the large class is randomly partitioned into $J$ parts. Now, the train data ($G_j$) for each iteration is generated from all instances of the small class ($G^{S}_{tr}$), one of the partitioned parts of the large class ($G^{L}_{tr}(j)$), and the instances of the previous iteration ($\bar{G}_j$) that could not be truly trained. In this algorithm, $corr()$ denotes the Pearson correlation ($corr(A,B) = cov(A,B)/\sigma_A \sigma_B$), and $W_j \in [0, 1]$ is the train weight (penalty value), which is applied to the large class. Further, $Classifier()$ denotes any kind of weighted classification algorithm; this paper uses a simple classical decision tree as the individual classifier ($\theta_j$) [9].

Algorithm 1: The proposed binary classification algorithm
Input: train set $G_{tr}$, real class labels $I_{tr}$ of the train set
Output: classifier $\Theta_p$
Method:
1. Partition $G_{tr} = \{G^{S}_{tr}, G^{L}_{tr}\}$, where $G^{S}_{tr}$, $G^{L}_{tr}$ are the small and large classes.
2. Calculate $J = Int(|G^{L}_{tr}| / |G^{S}_{tr}|)$ based on the number of elements in the classes.
3. Randomly partition $G^{L}_{tr} = \{G^{L}_{tr}(1), \dots, G^{L}_{tr}(J)\}$.
4. Considering $\bar{G}_1 = \bar{I}_1 = \emptyset$, generate $j = 1, \dots, J+1$ classifiers:
5. Construct $G_j = \{G^{S}_{tr}, G^{L}_{tr}(j), \bar{G}_j\}$ and $I_j = \{I^{S}_{tr}, I^{L}_{tr}(j), \bar{I}_j\}$.
6. Calculate $W_j = \{w_j\}^{|G_j|}$: $w_j = 1$ for instances of $G^{S}_{tr}$ or $\bar{G}_j$; $w_j = 1 - |corr(G^{S}_{tr}, G^{L}_{tr})|$ for instances of $G^{L}_{tr}(j)$.
7. Train $\theta_j = Classifier(G_j, I_j, W_j)$.
8. Construct $\bar{G}_{j+1}$, $\bar{I}_{j+1}$ as the set of instances that could not be truly trained by $\theta_j$.
9. If $j \leq J+1$: go to line 5; else return $\Theta_p = \{\theta_1, \dots, \theta_{J+1}\}$ as the final classifier.

Generally, there are two techniques for applying multi-class classification. The first approach directly creates the classification model, such as a multi-class support vector machine [5] or a neural network [1]. In contrast, the (indirect) decomposition design uses an array of binary classifiers for solving the multi-class problem. As one of the classical indirect methods, Error-Correcting Output Codes (ECOC) includes three components, i.e. the base algorithm and the encoding and decoding procedures [8]. As the base algorithm in ECOC, this paper employs Algorithm 1 for generating the binary classifiers ($\Theta_p$). Further, it uses a one-versus-all encoding strategy for training the ECOC method, where an independent category of the visual stimuli is compared with the rest of the categories (see Figure 1.e). Indeed, the number of classifiers in this strategy is exactly equal to the number of categories. In the decoding stage, this method assigns the brain response to the category with the closest Hamming distance.

This paper employs two datasets, shared by openfmri.org, for running empirical studies. As the first dataset, 'Visual Object Recognition' (DS105) includes 71 sessions (6 subjects). It contains 8 categories of visual stimuli, i.e. gray-scale images of faces, houses, cats, bottles, scissors, shoes, chairs, and scrambled (nonsense) photos.
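The anatomical feature extraction of Eq. (5), one mean per atlas region, reduces each registered condition image to an $L$-dimensional vector. A sketch follows; the 4x4x4 image and two-region atlas are toy stand-ins for a real volume and the Talairach labels, and the function name is ours.

```python
import numpy as np

# Sketch of Eq. (5): Gamma(l) is the mean of the registered condition image Xi
# over the voxels of atlas region l. Toy data stands in for a real volume and
# the Talairach atlas (L = 1105 regions).
def extract_features(img, atlas):
    """img: 3D condition image; atlas: 3D integer label map, 0 = outside brain."""
    labels = np.unique(atlas)
    labels = labels[labels != 0]        # skip background
    return np.array([img[atlas == l].mean() for l in labels])

atlas = np.zeros((4, 4, 4), dtype=int)
atlas[:2], atlas[2:] = 1, 2             # region 1: front half, region 2: back half
img = np.ones((4, 4, 4))
img[2:] = 3.0                           # region 2 voxels are more active
features = extract_features(img, atlas)  # -> array([1., 3.])
```

Averaging within anatomically defined regions is what collapses the sparse, noisy voxel space to a compact, comparable representation across subjects.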
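Algorithm 1's sampling scheme can be made concrete with a small sketch. A 1-D threshold "stump" stands in for the paper's weighted decision tree, and the correlation penalty is computed on equal-length sorted slices of the two classes, which is a simplification of $corr(G^{S}_{tr}, G^{L}_{tr})$; all names and data are ours, not the authors'.

```python
import numpy as np

# Simplified sketch of Algorithm 1: split the large class into J chunks, pair
# each chunk with the whole small class plus previously misclassified samples,
# and train a weighted base learner per round.
def fit_stump(x, y, w):
    best = (np.inf, 0.0, 1)
    for t in np.unique(x):
        for sign in (1, -1):
            pred = np.where(sign * (x - t) >= 0, 1, -1)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, t, sign)
    return best[1], best[2]

def predict_stump(model, x):
    t, sign = model
    return np.where(sign * (x - t) >= 0, 1, -1)

def imbalance_ensemble(x, y):
    small, large = x[y == 1], x[y == -1]        # G^S_tr, G^L_tr
    J = max(1, len(large) // len(small))        # scale between class sizes
    chunks = np.array_split(large, J)           # randomly partitioned in the paper
    m = min(len(small), len(large))             # penalty from class correlation
    penalty = 1 - abs(np.corrcoef(np.sort(small)[:m], np.sort(large)[:m])[0, 1])
    models, carry_x, carry_y = [], np.empty(0), np.empty(0)
    for chunk in chunks:
        xs = np.concatenate([small, chunk, carry_x])
        ys = np.concatenate([np.ones(len(small)), -np.ones(len(chunk)), carry_y])
        ws = np.concatenate([np.ones(len(small)),
                             np.full(len(chunk), penalty),
                             np.ones(len(carry_x))])
        model = fit_stump(xs, ys, ws)
        wrong = predict_stump(model, xs) != ys  # carry hard samples forward
        carry_x, carry_y = xs[wrong], ys[wrong]
        models.append(model)
    return models

def predict_ensemble(models, x):
    votes = sum(predict_stump(m, x) for m in models)
    return np.where(votes >= 0, 1, -1)

x = np.array([4.8, 5.1, 5.3,                    # 3 samples: small class (+1)
              0.1, 0.4, 0.9, 0.2, 0.6, 0.3,    # 12 samples: large class (-1)
              0.8, 0.5, 0.0, 0.7, 0.35, 0.55])
y = np.array([1, 1, 1] + [-1] * 12)
models = imbalance_ensemble(x, y)
```

With a 3-versus-12 split, each of the $J=4$ rounds sees a balanced 3-versus-3 problem, which is the point of the sampling scheme; the paper additionally runs a final $(J+1)$-th round over carried instances.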
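For the one-versus-all ECOC stage described above, the codeword of category $p$ has $+1$ in position $p$ and $-1$ elsewhere, and decoding assigns a sample to the codeword nearest in Hamming distance to the binary classifiers' outputs. A minimal sketch, with names of our choosing:

```python
import numpy as np

# One-versus-all ECOC decoding sketch: with P categories there are P binary
# classifiers; a sample is assigned to the category whose codeword is closest
# in Hamming distance to the vector of binary predictions.
P = 4
codebook = 2 * np.eye(P, dtype=int) - 1   # row p: +1 at position p, -1 elsewhere

def decode(binary_outputs):
    """binary_outputs: length-P vector of {-1, +1} predictions."""
    hamming = (codebook != np.asarray(binary_outputs)).sum(axis=1)
    return int(np.argmin(hamming))

category = decode([-1, +1, -1, -1])       # only classifier 1 fired -> category 1
```

Hamming decoding degrades gracefully: even if no classifier (or more than one) fires, the nearest codeword still yields a single category.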
This dataset is analyzed as high-level visual stimuli in the binary predictors, by considering all categories except scrambled photos as objects, and as low-level visual stimuli in the multi-class prediction. Please see [2,5] for more information. As the second dataset, 'Word and Object Processing' (DS107) includes 98 sessions (49 subjects). It contains 4 categories of visual stimuli, i.e. words, objects, scrambles, and consonants. Please see [11] for more information. These datasets were preprocessed by SPM 12, i.e. slice timing, realignment, normalization, and smoothing. Then, the beta values were calculated for each session. This paper employs the
MNI 152 T1 1mm template (see Figure 1.d) as the reference image ($Ref$) in Eq. (4) for registering the extracted conditions ($\zeta$) to the standard space ($\Xi$). In addition, this paper uses the Talairach atlas (containing $L = 1105$ regions) in Eq. (5) for extracting features (see Figure 1.d).

Figures 2.a-c demonstrate examples of brain responses to different stimuli, i.e. (a) word, (b) object, and (c) scramble. Here, gray parts show the anatomical atlas, the colored parts (red, yellow and green) define the functional activities, and the red rectangles illustrate the error areas after registration. Indeed, these errors can be formulated as the nonzero areas of the brain image that are located in the zero area of the anatomical atlas (the area without a region number). The performance of the objective function (4) on the DS105 and DS107 datasets is analyzed in Figure 2.d by using different distance metrics, i.e. the Woods function (W), Correlation Ratio (CR), Joint Entropy (JE), Mutual Information (MI), and Normalized Mutual Information (NMI) [10]. As depicted in this figure, NMI generated better results in comparison with the other metrics.

Figures 3.a and c illustrate the correlation matrices of DS105 and DS107 at the voxel level, respectively. Similarly, Figures 3.b and d show the correlation matrices of DS105 and DS107 at the feature level, respectively. Since brain responses are sparse, high-dimensional and noisy at the voxel level, it is hard to discriminate between different categories in Figures 3.a and c. By contrast, Figures 3.b and d provide a distinctive representation when the proposed method uses the correlated patterns in each anatomical region as the extracted features.

The performance of our framework is compared with state-of-the-art methods, i.e. Cox & Savoy [5], McMenamin et al. [6], Mohr et al. [7], and Osher et al. [3], by using leave-one-out cross-validation at the subject level. Further, all of the algorithms were implemented in MATLAB R2016a (9.0) by the authors in order to generate the experimental results. Tables 1 and 2 respectively illustrate the classification accuracy and Area Under the ROC Curve (AUC) of the binary predictors based on the category of the visual stimuli.
All visual stimuli in the DS105 dataset except scrambled photos are considered as the object category for generating these experimental results. As depicted in Tables 1 and 2, the proposed algorithm achieves better performance in comparison with the other methods because it provides a better representation of neural activities by exploiting the anatomical structure of the human brain. Table 3 illustrates the classification accuracy of the multi-class predictors. In this table, 'DS105' includes all 8 categories of visual stimuli (P = 8).
Fig. 2: Extracted features based on different stimuli, i.e. (a) word, (b) object, and (c) scramble; (d) the effect of different objective functions in (4) on the registration error.
Fig. 3: The correlation matrices: (a) raw voxels and (b) extracted features of the DS105 dataset; (c) raw voxels and (d) extracted features of the DS107 dataset.
Table 1: Accuracy of binary predictors (mean ± std)

Data Sets          Cox & Savoy   McMenamin et al.   Mohr et al.   Osher et al.   Binary-APA
DS105-Objects      71.65         —                  —             —              —
DS107-Words        69.89         —                  —             —              —
DS107-Consonants   67.84         —                  —             —              —
DS107-Objects      65.32         —                  —             —              —
DS107-Scramble     —             —                  —             —              —

Table 2: Area Under the ROC Curve (AUC) of binary predictors (mean ± std)

Data Sets          Cox & Savoy   McMenamin et al.   Mohr et al.   Osher et al.   Binary-APA
DS105-Objects      68.37         —                  —             —              —
DS107-Words        67.76         —                  —             —              —
DS107-Consonants   63.84         —                  —             —              —
DS107-Objects      63.17         —                  —             —              —
DS107-Scramble     66.73         —                  —             —              —

Table 3: Accuracy of multi-class predictors (mean ± std)

Data Sets     Cox & Savoy   McMenamin et al.   Mohr et al.   Osher et al.   Multi-APA
DS105 (P=8)   18.03         —                  —             —              —
DS107 (P=4)   38.01         —                  —             —              —
ALL (P=4)     32.93         —                  —             —              —

This paper proposes the Anatomical Pattern Analysis (APA) framework for decoding visual stimuli in the human brain. This framework uses an anatomical feature extraction method, which provides a normalized representation for combining homogeneous datasets. Further, a new binary imbalance AdaBoost algorithm is introduced. It can increase the performance of prediction by exploiting supervised random sampling and the correlation between classes. In addition, this algorithm is utilized in an Error-Correcting Output Codes (ECOC) method for multi-class prediction of brain responses. Empirical studies on 4 visual categories clearly show the superiority of our proposed method in comparison with the voxel-based approaches. In the future, we plan to apply the proposed method to different brain tasks such as low-level visual stimuli, emotions, etc.

Acknowledgment
We thank the anonymous reviewers for their comments. This work was supported in part by the National Natural Science Foundation of China (61422204 and 61473149), the Jiangsu Natural Science Foundation for Distinguished Young Scholar (BK20130034), and the NUAA Fundamental Research Funds (NE2013105).