The Compositional Nature of Verb and Argument Representations in the Human Brain


Andrei Barbu ∗, N. Siddharth ∗, Caiming Xiong †, Jason J. Corso †, Christiane D. Fellbaum ‡, Catherine Hanson §, Stephen José Hanson §, Sébastien Hélie ¶, Evguenia Malaia ‖, Barak A. Pearlmutter ∗∗, Jeffrey Mark Siskind ∗, Thomas Michael Talavage ∗, Ronnie B. Wilbur ††

Abstract

How does the human brain represent simple compositions of objects, actors, and actions? We had subjects view action sequence videos during neuroimaging (fMRI) sessions and identified lexical descriptions of those videos by decoding (SVM) the brain representations based only on their fMRI activation patterns. As a precursor to this result, we had demonstrated that we could reliably and with high probability decode action labels corresponding to one of six action videos (dig, walk, etc.), again while subjects viewed the action sequence during scanning (fMRI). This result was replicated at two different brain imaging sites with common protocols but different subjects, showing common brain areas, including areas known for episodic memory (PHG, MTL, high level visual pathways, etc., i.e., the ‘what’ and ‘where’ systems, and TPJ, i.e., ‘theory of mind’). Given these results, we were also able to successfully show a key aspect of language compositionality based on simultaneous decoding of object class and actor identity. Finally, combining these novel steps in ‘brain reading’ allowed us to accurately estimate brain representations supporting compositional decoding of a complex event composed of an actor, a verb, a direction, and an object.

∗ School of Electrical and Computer Engineering, Purdue University, West Lafayette IN 47907-2035
† Department of Computer Science and Engineering, SUNY Buffalo, Buffalo NY 14260-2500
‡ Department of Computer Science, Princeton University, Princeton NJ 08540-5233
§ Department of Psychology and Rutgers Brain Imaging Center, Rutgers University, Newark NJ 07102
¶ Psychological Sciences, Purdue University, West Lafayette IN 47907
‖ Southwest Center for Mind, Brain, and Education, University of Texas at Arlington, Arlington TX 76019
∗∗ Hamilton Institute & Dept Computer Sci, National University of Ireland Maynooth, Co. Kildare, Ireland
†† Department of Speech, Language, and Hearing Sciences and Linguistics Program, Purdue University, West Lafayette IN 47907

arXiv [q-bio.NC], Jun

The compositional nature of thought is taken for granted by many in the cognitive-science and artificial-intelligence communities. For example, in computer vision, representations for nouns, such as those used for object detection, are independent of representations for verbs, such as those used for event recognition. Humans need not employ compositional representations; indeed, many argue that such representations may be doomed to failure in AI systems (Brooks, 1991). This is because concepts like verb or even object are human constructs; there is debate as to how they arise from percepts (Smith, 1996). Recent advances in brain-imaging techniques enable exploration of the compositional nature of thought. To that end, subjects underwent functional magnetic resonance imaging (fMRI) during which they were exposed to stimuli that evoke complex brain activity, which was then decoded piece by piece. The video stimuli depicted events described by entire sentences composed of a verb, an object, an actor and a location or direction of motion. By decoding complex brain activity into its constituent parts, we show evidence for the neural basis of the compositionality of verb and argument representations.

Recent work on decoding brain activity corresponding to nouns has recovered object identity from nouns presented as image and orthographic stimuli. Hanson and Halchenko (2009) perform classification on still images of two object classes: faces and houses, and achieve an accuracy above 93% on a one-out-of-two classification task. Connolly et al.
(2012) perform classification on still images of objects, two instances of each of three classes: bugs, birds, and primates, and achieve an accuracy between 60% and 98% on a one-out-of-two within-class classification task and an accuracy between 90% and 98% on a one-out-of-three between-class classification task. Just et al. (2010) perform classification on orthographically presented nouns, 5 exemplars from each of 12 classes, achieving a mean rank accuracy of 72.4% on a one-out-of-60 classification task, both within and between subjects. Pereira et al. (2012) incorporate semantic priors and achieve a mean accuracy of 13.2% on a one-out-of-12 classification task and 1.94% on a one-out-of-60 classification task when attempting to recover the object being observed. Miyawaki et al. (2008) recover the position of an object in the field of view by recovering low resolution images from the visual cortex. Object classification from video stimuli has not been previously demonstrated.

Recent work on decoding brain activity corresponding to verbs has primarily been concerned with identifying active brain regions. Kable and Chatterjee (2006) identify brain regions that distinguish between the different agents of actions and between the different kinds of actions they perform. Kemmerer et al. (2008) analyze the regions of interest (ROI) of brain activity associated with orthographic presentation of twenty different verbs in each of five different verb classes. Kemmerer and Gonzalez Castillo (2010) analyze the brain activity associated with verbs in terms of the motor components of event structure and attempt to localize the ROIs of such motor components. While prior work analyzes regions which are activated when subjects are presented verbs as stimuli, we recover the content of the resulting brain activity by classifying the verb from brain scans.

Recent work demonstrates the ability to decode the actor of an event using personality traits. Hassabis et al.
(2013) demonstrate the ability to recover the identity of an imagined actor from that actor’s personality. Subjects are informed of the two distinguishing binary personality traits of four actors. During fMRI, they are presented sentences orthographically which describe an actor performing an action. The subjects are asked to imagine this scenario with this actor and to rate whether the actions of the actor accurately reflect the personality of that actor. The resulting brain activation corresponding to these two binary personality traits is used to recover the identity of the actor. No prior work has recovered the identity of an actor without relying on that actor’s personality. In the work presented here, the personality of the actor has no bearing on the actions being performed.

In this paper, two new experiments are presented. In Experiment 1, subjects are shown videos and asked to think of verbs that characterize those videos. Their brains are imaged via fMRI and measured neural activation is decoded to recover the verb that the subjects are thinking about. Decoding is done by means of a support vector machine (SVM) trained on brain scans of those same verbs. We know of no other work that decodes brain activity corresponding to verbs. We show early evidence that the regions identified by this decoding process are not intimately tied to a particular subject via an additional analysis that trains on one subject and tests on another. In Experiment 2, subjects are shown videos and asked to think of complex sentences composed of multiple components that characterize those videos. We show a novel ability to decode brain activity corresponding to multiple objects: the identity of an actor and the identity of an object. We decode the identity of an actor without relying on the personality traits of that actor. We know of no other work which recovers an entire sentence composed of multiple constituents. We find evidence that suggests underlying neural representations of mental states are independent and compose into sentences largely without modifying one another.

2 Compositionality

We discuss a particular kind of compositionality as it applies to sentence structure: objects fill argument positions in predicates that combine to form the meaning of a sentence. Pylkkänen et al. (2011) review work which attempts to show this kind of compositionality using a task called complement coercion. Subjects in this task are presented with sentences whose meaning is richer than their syntax. For example, the sentence The boy finished the pizza is understood as meaning that the pizza was eaten, even though the verb eat does not appear anywhere in the sentence (Pustejovsky, 1995). The presence of pizza, belonging to the category food, coerces the interpretation of finish as finish eating. By contrast, He finished the newspaper induces the interpretation finish reading. Because the syntactic complexity in this prior experiment was held constant, the assumption is that coercion is a purely semantic meaning-adding function application, with little consequence for the syntax. The participants completed this task, and brain activity was measured using magnetoencephalography (MEG). The results show activity related to coercion in the anterior midline field. This result suggests an initial localization for at least some function application, but it is difficult to use MEG to distinguish whether this activity is read from the ventromedial prefrontal cortex or the anterior cingulate cortex. Earlier work on the representation of objects and actions in the brain also indicates that these representations may be independent.

Representing objects in the brain

Objects are static entities that can be represented by a (mostly) static neural representation. For example, the 3D representation of a soda can will look the same in many different contexts, and the appearance of the soda can does not unfold in time. It is generally believed that the lexicon of object concepts is represented in the medial temporal lobe while different areas of the temporal lobe may be combinatoric in constructing object types (Hanson et al., 2004), although there may be modal areas associated with different representational functions. For example, lesion data suggests that the temporal pole is associated with naming people, the inferior temporal cortex is associated with naming animals, and the anterior lateral occipital regions are associated with naming tools. In addition, some regions involved in object representation are modality specific. For example, spoken-word processing involves the superior temporal lobe (part of the auditory associative cortex; Binder et al., 2000) while reading words representing objects activates occipito-temporal regions because of the visual processing (Puce et al., 1996). Specifically, auditory word processing involves a stream of information starting in Heschl’s gyri that is transferred to the superior temporal gyrus. Once the superior temporal gyrus has been reached, the modality of stimulus presentation is no longer relevant. In contrast, the initial processing for written words starts in the occipital lobe (V1 and V2), and moves on to occipito-temporal regions specialized in identifying orthographic units. The information then moves rostrally to the temporal lobe proper, where modality of presentation is no longer relevant (Binder et al., 2000).

Representing actions in the brain

Unlike objects, verbs are dynamic entities that unfold in time. For instance, observing someone pick up a ball takes time as the person’s movement unfolds. Evidence reviewed in Coello and Bidet-Ildei (2012) suggests that action verbs activate both semantic units in the temporal cortex and a motor network. The motor network includes the premotor areas (including the supplementary motor area), the primary motor cortex, and the posterior parietal cortex. Some researchers have gone as far as suggesting that the well-known ventral/dorsal distinction in the visual pathways corresponds to a semantic (ventral) and action (dorsal) distinction. Representation of action may involve ‘mirror neurons’, which have been shown in macaques to respond jointly in perception/action tasks, that is, both to the animal’s own action and to the similar perceived action of an observed individual.

All experiments reported follow the same procedure and are analyzed using the same methods and classifiers. Videos are shown to subjects who are asked to think about some aspect(s) of the video while whole-brain fMRI scans are acquired every two seconds. Because fMRI acquisition times are slow, roughly equal to the length of the video stimuli, a single brain volume that corresponds to the brain activation induced by that video stimulus is classified to recover the features that the subjects were asked to think about. Multiple runs separated by several minutes of rest, where no data is acquired, are performed per subject.

3.1 fMRI procedures

Imaging performed at Purdue University used a 3T GE Signa HDx scanner (Waukesha, Wisconsin) with a Nova Medical (Wilmington, Massachusetts) 16 channel brain array to collect whole-brain volumes via a gradient-echo EPI sequence with 2000ms TR, 22ms TE, 200mm × […] and […]° flip angle. We acquired 35 axial slices with a 3.000mm slice thickness using a 64 × 64 acquisition matrix resulting in 3.125mm × […] × […] × […] × 80 acquisition matrix resulting in 3.000mm × […] × […]

Data was acquired in runs, with between three and eight runs per subject per experiment, and each axis of variation of each experiment was counterbalanced within each run. fMRI scans were processed using AFNI (Cox et al., 1996) to skull-strip each volume, motion correct and detrend each run, and align each subject’s runs to each other. Voxels within a run were z-scored, subtracting the mean value of that voxel for the run and dividing by its variance. Because each brain volume has very high dimension, between 143,360 and 236,800 voxels, we eliminate voxels by computing a per-voxel Fisher score on our training set and keeping the 5,000 highest-scoring voxels. The Fisher score of a voxel $v$ for a classification task with $C$ classes where each class $c$ has $n_c$ examples is computed as

$$\frac{\sum_{c=1}^{C} n_c \,(\mu_{c,v} - \mu)^2}{\sum_{c=1}^{C} n_c \,\sigma_{c,v}^2} \qquad (1)$$

where $\mu_{c,v}$ and $\sigma_{c,v}^2$ are the per-class per-voxel means and variances and $\mu$ is the mean for the entire brain volume. A linear SVM classifies the selected voxels.

One run was taken as the test set and the remaining runs were taken as the training set. The third brain volume after the onset of each stimulus was taken along with the class of the stimulus to train an SVM. This lag of three brain volumes is required because fMRI does not measure neural activation but instead measures the flow of oxygenated blood, the blood-oxygen-level-dependent (BOLD) signal, which correlates with increased neural activation. It takes roughly five to six seconds for this signal to peak, which puts the peak in the third volume after the stimulus presentation. Cross validation was performed by choosing each of the different runs as the test set.

To understand our results and to demonstrate that the classifiers are not exploiting noise or irrelevant features, we perform an analysis to understand the brain regions that are relevant to each experiment. We determine these regions by two methods.
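The voxel-selection and cross-validation steps described above can be illustrated with a minimal sketch, assuming NumPy is available. The function names `fisher_scores`, `select_top_voxels`, and `leave_one_run_out` are hypothetical and are not taken from the authors' code. The sketch reads µ in Eq. (1) as the per-voxel mean over all training samples, which is the usual Fisher-score convention and an assumption here. Scores are computed on the training runs only, and any linear SVM implementation could then be trained on the selected columns.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-voxel Fisher score for X of shape (n_samples, n_voxels) and labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Assumption: mu is taken per voxel over all training samples.
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - grand_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Avoid division by zero for voxels that are constant within every class.
    return between / np.maximum(within, 1e-12)

def select_top_voxels(X_train, y_train, k):
    """Indices of the k highest-scoring voxels, best first."""
    scores = fisher_scores(X_train, y_train)
    return np.argsort(scores)[::-1][:k]

def leave_one_run_out(run_ids):
    """Yield (train_idx, test_idx) pairs, holding out one run per fold."""
    run_ids = np.asarray(run_ids)
    for test_run in np.unique(run_ids):
        yield (np.flatnonzero(run_ids != test_run),
               np.flatnonzero(run_ids == test_run))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(48, 20))      # toy data: 48 volumes, 20 voxels
    y = np.tile(np.arange(6), 8)       # six classes, one example per run
    runs = np.repeat(np.arange(8), 6)  # eight runs of six volumes each
    X[:, 3] += y                       # make voxel 3 informative
    for train_idx, test_idx in leave_one_run_out(runs):
        keep = select_top_voxels(X[train_idx], y[train_idx], k=5)
        # A linear SVM would be fit on X[train_idx][:, keep] and
        # evaluated on X[test_idx][:, keep].
```

Selecting voxels inside each fold, rather than once on all runs, keeps the held-out run from influencing which voxels are retained.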
First we employ a spatial searchlight (Kriegeskorte et al., 2006) which slides a small sphere across the entire brain volume and repeats the above analysis keeping only the voxels inside that sphere. We use a sphere of radius three voxels, densely place its center at every voxel, and do not perform any dimensionality reduction on the remaining voxels. We then perform an eight-fold cross validation as described above for each position of the sphere. For Experiment 1 we also back-project the SVM coefficients onto the anatomical scans (the higher the absolute value of the coefficient, the more that voxel contributes to the classification performance of the SVM) and use a classifier with a different metric, w(i), as described by Hanson and Halchenko (2009).

We conducted an experiment to evaluate the ability to identify brain activity corresponding to verbs denoting actions. Subjects are shown video clips of humans interacting with objects and are told to think of the verb being enacted, but otherwise have no task. The subjects were shown clips depicting each of these verbs prior to the experiment and were instructed about the intended meaning of each verb. One difficulty with such an experiment is that there is disagreement between human subjects as to whether a verb occurred in a video or not. To overcome this difficulty, we asked five humans to annotate the DARPA Mind’s Eye year 2 video corpus with the extent of every verb. From this corpus, we chose video clips where at least two out of the five annotators agreed on the depiction. We selected between twenty-seven and thirty 2.5s video clips depicting each of six different verbs (carry, dig, hold, pick up, put down, and walk). Key frames from one clip for each of the six verbs are shown in Fig. 1. Despite multiple annotators agreeing on whether a video depicts a verb, the task of classifying each clip remains very difficult for human subjects as it is easy to confuse similar verbs such as carry and hold. We address this problem by presenting, in rapid succession, pairs of video clips which depict the same verb and asking the subjects to think about the verb that would best describe both videos.

Figure 1: Key frames from sample stimuli for each of the six verbs in Experiment 1 (carry, dig, hold, pick up, put down, walk). Example stimulus videos are included in the supplementary material.

We employed a rapid event-related design similar to that of Just et al. (2010). We presented pairs of 2.5s video clips at 12fps, depicting the same verb, separated by 0.5s blanking and followed by an average of 4.5s (minimum 2.5s) fixation. While the video clips within each pair depicted the same verb, the clips across pairs within a run depicted different verbs, randomly counterbalanced. Each run comprised 48 stimulus presentations spanning 254 captured brain volumes and ended with 24s of fixation. Eight runs for each of subjects 1 through 3 were collected at Purdue University. Three runs for subject 4 and four runs for subject 5 were collected at St. James Hospital.

We performed an eight-fold cross validation (fewer for subjects 4 and 5) for a six-way classification task, where runs constituted folds. The results are presented in Fig. 2.
The per-subject accuracies, averaged across class and fold, were: 80.73%, 87.24%, 78.91%, 35.94%, and 43.75% (chance 16.66%). Note that the last two were trained on fewer runs than the first three. This demonstrates the ability to recover the verb that the subjects were thinking about. The robustness of this result is enhanced by the fact that it was replicated on two different fMRI scanners at different locations run by different experimenters.

To evaluate whether the brain regions used for classification generalize across subjects, we performed an additional analysis on the data for subjects 1 and 2. One run out of the eight was selected as the test set and the data for one of the two subjects was classified. The training set consisted of all seven other runs for the subject whose data does not appear in the test set. The test was performed on the run omitted from the training set, even though it was gathered from a different subject, to preclude the possibility that the same stimulus sequence appeared in both the training and test sets. We performed cross validation by varying which subject contributes the test data and which subject contributes the training data, and within each of these folds we varied which of the eight runs is the test set. These two cross validations yielded accuracies of 33.59% (subject 1 ↦ subject 2) and 41.41% (subject 2 ↦ subject 1), averaged across class and fold, where chance again is 16.66%.

To locate regions of the brain used in the previous analysis, we used a spatial-searchlight linear-SVM method on subject 1. We use the accuracy to determine the sensitivity of each voxel and threshold upward to less than 5% of the cross-validation measures. These measures are overlaid and (2-stage) registered to MNI152 2mm anatomicals shown in Fig. 3(top). Notable are visual-pathway areas (lateral occipital-LO, lingual gyrus-LG, and fusiform gyrus) as well as prefrontal areas (inferior frontal gyrus, middle frontal gyrus, and cingulate) and areas consistent with the ‘mirror system’ (Arbib, 2006) and the so-called ‘theory of mind’ (pre-central gyrus, angular gyrus-AG, and superior parietal lobule-SPL) areas (Dronkers et al., 2004; Turken and Dronkers, 2011). Fig. 3(bottom) shows the decoded ROIs from a similar SVM classifier with a different metric, w(i) (Hanson and Halchenko, 2009), showing similar brain areas but, due to higher sensitivity, also indicating sub-cortical regions (hippocampal) associated with encoding processes not seen with the cross-validation accuracy metric. As argued in Section 2, lateral-occipital areas are involved in visual processing specifically related to language, and the fusiform gyrus is a hetero-modal area that could hold abstract representations of the elements contained in the videos (e.g., semantics). These data provide initial support for the hypothesis that concepts have both modality-specific and abstract representations. Hence, the elements used by the SVM to classify the videos are also neuroscientifically meaningful.

Figure 2: Results for Experiment 1. (left) Per-subject classification accuracy on 1-out-of-6 verb classes averaged across class and fold. Horizontal line indicates chance performance, 16.66%. (right) Corresponding confusion matrix averaged across subject and fold is mostly diagonal, with the highest numbers of errors being made distinguishing hold and carry, two ambiguous stimuli.

Figure 3: (top) Searchlight analysis for Experiment 1 indicating the classification accuracy of different brain regions on the anatomical scans from subject 1, averaged across stimulus, class, and run. (bottom) A similar analysis using a w(i) metric.

Figure 4: Key frames from sample stimuli in Experiment 2 (carry, fold, and leave, each with chair, shirt, and tortilla). Example stimulus videos are included in the supplementary material.
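The searchlight neighborhoods behind the analysis in Fig. 3 can be sketched in plain Python. This is a minimal illustration, assuming that the sphere includes every voxel whose center lies at a Euclidean distance of at most three voxels from the sphere center and that spheres are clipped at the volume boundary. The function names and the 64 × 64 × 35 example shape (matching the Purdue acquisition matrix and slice count) are illustrative and are not taken from the authors' code.

```python
from itertools import product

def sphere_offsets(radius):
    """Integer (dx, dy, dz) offsets lying within `radius` voxels of the origin."""
    r = int(radius)
    return [
        (dx, dy, dz)
        for dx, dy, dz in product(range(-r, r + 1), repeat=3)
        if dx * dx + dy * dy + dz * dz <= radius * radius
    ]

def searchlight_voxels(center, shape, radius=3):
    """Voxel coordinates inside the sphere at `center`, clipped to the volume."""
    cx, cy, cz = center
    voxels = []
    for dx, dy, dz in sphere_offsets(radius):
        x, y, z = cx + dx, cy + dy, cz + dz
        if 0 <= x < shape[0] and 0 <= y < shape[1] and 0 <= z < shape[2]:
            voxels.append((x, y, z))
    return voxels

if __name__ == "__main__":
    shape = (64, 64, 35)  # assumed example volume shape
    interior = searchlight_voxels((32, 32, 17), shape)
    corner = searchlight_voxels((0, 0, 0), shape)
    print(len(interior), len(corner))
```

Under these assumptions an interior sphere of radius three holds 123 voxels, so each searchlight classifier sees far fewer features than the 5,000 retained by Fisher-score selection, which is consistent with the text's choice to skip dimensionality reduction inside the sphere.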

We conducted a further experiment to evaluate the ability to recover compositional semantics for entire sentences. Subjects were shown videos that depict sentences of the form: the actor verb the object direction/location. They were asked to think about the sentence depicted in each video and otherwise had no task. Videos depicting three verbs (carry, fold, and leave), each performed with three objects (chair, shirt, and tortilla), each performed by four human actors, and each performed on either side of the field of view were filmed for this task. The verbs were chosen to be discriminable based on features described by Kemmerer et al. (2008):

leave: −state-change, −contact
fold: +state-change, +contact
carry: −state-change, +contact

Nouns were chosen based on categories found to be easily discriminable by Just et al. (2010): chair (furniture), shirt (clothing), and tortilla (food) and also selected to allow each verb to be performed with each noun. Because these stimuli are not as ambiguous as the ones from Experiment 1, they were not shown in pairs. All stimuli enactments were filmed against the same nonvarying background, which contained no other objects except for a table (Fig. 4).

This experiment, like Experiment 1, also used a rapid event-related design. We collected multiple videos, between 4 and 7, for each cross product of the verb, object and human actor. Variation along the side of field of view and direction of motion was accomplished by mirroring the videos about the vertical axis. Such mirroring induces variation in direction of motion (leftward vs. rightward) for the verbs carry and leave and induces variation in the location in the field of view where the verb fold occurs (left half vs. right half). We presented 2s video clips at 10fps followed by an average of 4s (minimum 2s) fixation. Each run comprised 72 stimulus presentations spanning 244 captured brain volumes, with eight runs per subject, and ended with 24s of fixation.
Each run was individually counterbalanced for each of the four conditions (verb, object, actor, and mirroring). We collected data for three subjects at Purdue University but discarded the data for one of the three due to subject motion. One subject did eight runs without exiting the scanner. One subject exited the scanner between runs six and seven, which required cross-session registration. All subjects were aware of the experiment design, were informed of the intended depiction of each stimulus prior to the scan, and were instructed to think of the intended depiction after each presentation.

This experimental design supports the following classification analyses:

event: one-out-of-9 verb&noun (carry, fold, and leave, each performed on chair, shirt, and tortilla)
verb: one-out-of-3 verb (carry, fold, and leave)
object: one-out-of-3 noun (chair, shirt, and tortilla)
actor: one-out-of-4 actor identity
direction: one-out-of-2 motion direction for carry and leave (leftward vs. rightward)
location: one-out-of-2 location in the field of view for fold (right vs. left)

The analysis performed was exactly the same as that for Experiment 1, including eight-fold cross validation for each of our analyses, where runs constituted folds. Fig. 5 presents an overview of the results along with per-subject classification accuracies and aggregate confusion matrices for each of the above analyses. Note that we achieve significantly above-chance performance on all six analyses; across all six, only a single fold for a single subject performs below chance.

Verb performance is well above chance (76.22%, chance 11.11%). This replicates Experiment 1 with different videos and a new verb and adds to the evidence that brain activity corresponding to verbs can reliably be decoded from fMRI scans.
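The factorial design and the class counts of the six analyses listed above can be sketched as follows. This is a minimal illustration: the actor and mirroring labels are hypothetical placeholders, and the assumption that each run presents every combination exactly once is inferred from the 72 presentations per run rather than stated in the text.

```python
from itertools import product

VERBS = ("carry", "fold", "leave")
OBJECTS = ("chair", "shirt", "tortilla")
ACTORS = ("actor_1", "actor_2", "actor_3", "actor_4")  # hypothetical labels
MIRRORING = ("original", "mirrored")                    # hypothetical labels

def enumerate_conditions():
    """All actor x verb x object x mirroring combinations in the design."""
    return list(product(ACTORS, VERBS, OBJECTS, MIRRORING))

def spatial_task(verb):
    """Mirroring varies location for fold and direction for carry and leave."""
    return "location" if verb == "fold" else "direction"

def chance_levels():
    """Chance accuracy of each analysis for balanced classes."""
    return {
        "event": 1 / (len(VERBS) * len(OBJECTS)),
        "verb": 1 / len(VERBS),
        "object": 1 / len(OBJECTS),
        "actor": 1 / len(ACTORS),
        "direction": 1 / len(MIRRORING),
        "location": 1 / len(MIRRORING),
    }

if __name__ == "__main__":
    print(len(enumerate_conditions()))
    for name, level in chance_levels().items():
        print(f"{name}: {level:.4f}")
```

Because the design is fully crossed, each single-factor analysis (for example verb) is balanced with respect to every other factor, which is the property the independence argument in this section relies on.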

Object performance was significant as well (60.42%, chance 33.33%). Given neural activation, we can decode which object the subjects are thinking about. We know of no other work that decodes brain activity corresponding to objects from videos. The fact that the verb and object can be decoded independently already provides evidence of argument compositionality. Were the neural representations not compositional at this level, decoding would not be possible. For example, if the representation of carry was neurally encoded as a combination of walk and a particular object, verb performance would not exceed chance, because our experiment is counterbalanced with respect to the object with which the action is being performed. While this indicates that the representations for verbs and objects are independent of each other to some degree, we also seek to quantify the level of independence. If the representation of carry is somewhat different depending on which object is being carried, we expect that performance would increase when we jointly classify the object and the verb. This seems to not be the case. The accuracy of event is almost identical to the joint independent accuracy of verb and object:

0.5538 ≈ […] = […] × […] ≈ […] = […] × […]

[…] event in Fig. 5(c) which remains diagonal.

To decode complex brain activity corresponding to an entire sentence, we can combine actor, verb, object, and direction or location. We perform significantly above chance on this one-out-of-72 (4 × 3 × 3 × 2) classification:

0.3281 × […] × […] × ([…] × […] + […] × […]) = […] ≫ […] = […] (subject 1)
0.3281 × […] × […] × ([…] × […] + […] × […]) = […] ≫ […] = […] (subject 2)

(Since direction applied to carry and leave while location disjointly applied to fold, this yields a binary classification task across all verbs.) Thus we are able to classify entire sentences compositionally from their individual words.

To locate regions of the brain used in the previous analyses, we applied the same searchlight linear-SVM method that was performed in Experiment 1 to subject 1’s data from this experiment and identified similar areas in visual-pathway, parietal, and prefrontal areas. The resulting ROIs, shown in Fig. 6, are overlaid and color coded according to the specific visual feature being decoded. In general, it is clear that the decoding is sensitive to action/category information and various visual object-and-motion features. Many of the same regions active for verb in Experiment 1 also show activity in this experiment. Direction and location activity is present in the visual cortex with significant location activity occurring in the early visual cortex. Object activity is present in the temporal cortex, and agrees with previous work on object-category encoding (Gazzaniga et al., 2008).
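The independence check and the sentence-level estimate used earlier in this experiment can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 2/3 and 1/3 weights assume that direction applies to two of the three verbs and location to the third in a balanced design, and the accuracies in the usage section are made-up placeholders, not the paper's results.

```python
from math import prod

N_ACTORS, N_VERBS, N_OBJECTS, N_SPATIAL = 4, 3, 3, 2
SENTENCE_CHANCE = 1 / (N_ACTORS * N_VERBS * N_OBJECTS * N_SPATIAL)  # 1/72

def independent_joint_accuracy(accuracies):
    """Accuracy expected if every constituent is decoded independently."""
    return prod(accuracies)

def sentence_accuracy(actor, verb, obj, direction, location,
                      direction_weight=2 / 3):
    """Joint accuracy for actor, verb, object, and direction-or-location.

    Assumption: direction is scored for carry and leave (two of three verbs)
    and location for fold, so the spatial term is a weighted average.
    """
    spatial = direction_weight * direction + (1 - direction_weight) * location
    return actor * verb * obj * spatial

if __name__ == "__main__":
    # Placeholder accuracies for illustration only.
    verb_acc, object_acc, event_acc = 0.90, 0.60, 0.55
    predicted = independent_joint_accuracy([verb_acc, object_acc])
    print(f"event observed {event_acc:.4f} vs independent {predicted:.4f}")
    estimate = sentence_accuracy(0.33, verb_acc, object_acc, 0.80, 0.70)
    print(f"sentence {estimate:.4f} vs chance {SENTENCE_CHANCE:.4f}")
```

As a sanity check on the formula, setting every constituent to its own chance level (1/4, 1/3, 1/3, 1/2, 1/2) reproduces the one-out-of-72 sentence chance.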

We have demonstrated that it is possible to read a subject’s brain activity and decode a complex action tableau corresponding to a sentence from its constituents. To do so, we showed novel work which decodes brain activity associated with verbs and simultaneously recovers lexical aspects of different parts of speech. Our results indicate that the neural representations for verbs and objects compose together to form the meaning of a sentence apparently without modifying one another. These results indicate that representations which attempt to decompose meaning into constituents may have a neural basis.

Acknowledgments

AB, NS, and JMS were supported, in part, by Army Research Laboratory (ARL) Cooperative Agreement W911NF-10-2-0060. CX and JJC were supported, in part, by ARL Cooperative Agreement W911NF-10-2-0062 and NSF CAREER grant IIS-0845282. CDF was supported, in part, by NSF grant CNS-0855157. CH and SJH were supported, in part, by the McDonnell Foundation. BAP was supported, in part, by Science Foundation Ireland grant 09/IN.1/I2637. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either express or implied, of the supporting institutions. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes, notwithstanding any copyright notation herein. Dr. Gregory G. Tamer, Jr. provided assistance with imaging and analysis.

Figure 5: Results for Experiment 2. (a) Per-subject mean classification accuracies averaged across fold, for the event, verb, object, actor, direction, and location analyses. Note that all six analyses perform above chance. (b) Per-subject classification accuracies showing the means and variances of performance across the different folds for each class. The horizontal line indicates chance performance. (c) Corresponding confusion matrices, averaged across subject and fold. Note that they are mostly diagonal.

Figure 6: Searchlight analysis for Experiment 2 indicating the classification accuracy of different brain regions on the anatomical scans from subject 1 averaged across stimulus, class, and run.

References

M. A. Arbib. Action to language via the mirror neuron system. Cambridge University Press, 2006.

J. R. Binder, J. A. Frost, T. A. Hammeke, P. S. F. Bellgowan, J. A. Springer, J. N. Kaufman, and E. T. Possing. Human temporal lobe activation by speech and nonspeech sounds. Cerebral Cortex, 10(5):512–28, 2000.

R. A. Brooks. Intelligence without representation. Artificial Intelligence, 47(1):139–59, 1991.

Y. Coello and C. Bidet-Ildei. Motor representation and language in space, object and movement perception. In Y. Coello and A. Bartolo, editors, Language and Action in Cognitive Neuroscience, chapter 4, pages 83–110. Psychology Press, 2012.

A. C. Connolly, J. S. Guntupalli, J. Gors, M. Hanke, Y. O. Halchenko, Y.-C. Wu, H. Abdi, and J. V. Haxby. The representation of biological classes in the human brain. The Journal of Neuroscience, 32(8):2608–18, 2012.

R. W. Cox et al. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research, 29(3):162–73, 1996.

N. F. Dronkers, D. P. Wilkins, R. D. Van Valin, Jr., B. B. Redfern, J. J. Jaeger, et al. Lesion analysis of the brain areas involved in language comprehension. Cognition, 92(1-2):145–77, 2004.

M. S. Gazzaniga, R. B. Ivry, and G. R. Mangun. Cognitive Neuroscience: The Biology of the Mind. W. W. Norton & Company, New York, third edition, 2008.

S. J. Hanson and Y. O. Halchenko. Brain reading using full brain support vector machines for object recognition: There is no “face” identification area. Neural Computation, 20(2):486–503, 2009.

S. J. Hanson, T. Matsuka, and J. V. Haxby. Combinatorial codes in ventral temporal lobe for object recognition: Haxby (2001) revisited: Is there a “face” area? Neuroimage, 23(1):156–66, 2004.

D. Hassabis, R. N. Spreng, A. A. Rusu, C. A. Robbins, R. A. Mar, and D. L. Schacter. Imagine all the people: How the brain creates and uses personality models to predict behavior. Cerebral Cortex, 23, 2013.

M. A. Just, V. L. Cherkassky, S. Aryal, and T. M. Mitchell. A neurosemantic theory of concrete noun representation based on the underlying brain codes. PloS One, 5(1):e8622, 2010.

J. W. Kable and A. Chatterjee. Specificity of action representations in the lateral occipitotemporal cortex. Journal of Cognitive Neuroscience, 18(9):1498–517, 2006.

D. Kemmerer and J. Gonzalez Castillo. The two-level theory of verb meaning: An approach to integrating the semantics of action with the mirror neuron system. Brain and Language, 112(1):54–76, 2010.

D. Kemmerer, J. Gonzalez Castillo, T. Talavage, S. Patterson, and C. Wiley. Neuroanatomical distribution of five semantic components of verbs: Evidence from fMRI. Brain and Language, 107(1):16–43, 2008.

N. Kriegeskorte, R. Goebel, and P. Bandettini. Information-based functional brain mapping. Proceedings of the National Academy of Sciences of the United States of America, 103(10):3863–8, 2006.

Y. Miyawaki, H. Uchida, O. Yamashita, M. Sato, Y. Morito, H. C. Tanabe, N. Sadato, and Y. Kamitani. Visual image reconstruction from human brain activity using a combination of multiscale local image decoders. Neuron, 60(5):915–29, 2008.

F. Pereira, M. Botvinick, and G. Detre. Using Wikipedia to learn semantic feature representations of concrete concepts in neuroimaging experiments. Artificial Intelligence, 194:240–52, 2012.

A. Puce, T. Allison, M. Asgari, J. C. Gore, and G. McCarthy. Differential sensitivity of human visual cortex to faces, letterstrings, and textures: a functional magnetic resonance imaging study. The Journal of Neuroscience, 16(16):5205–15, 1996.

J. Pustejovsky. Generative Semantics. MIT Press, 1995.

L. Pylkkänen, J. Brennan, and D. K. Bemis. Grounding the cognitive neuroscience of semantics in linguistic theory. Language and Cognitive Processes, 26(9):1317–37, 2011.

B. C. Smith. On the origin of objects. MIT Press Cambridge, MA, 1996.

U. Turken and N. F. Dronkers. The neural architecture of the language comprehension network: Converging evidence from lesion and connectivity analyses.