Relating Reading, Visualization, and Coding for New Programmers: A Neuroimaging Study

Abstract

Understanding how novices reason about coding at a neurological level has implications for training the next generation of software engineers. In recent years, medical imaging has been increasingly employed to investigate patterns of neural activity associated with coding activity. However, such studies have focused on advanced undergraduates and professionals. In a human study of 31 participants, we use functional near-infrared spectroscopy to measure the neural activity associated with introductory programming. In a controlled, contrast-based experiment, we relate brain activity when coding to that of reading natural language or mentally rotating objects (a spatial visualization task). Our primary result is that all three tasks -- coding, prose reading, and mental rotation -- are mentally distinct for novices. However, while those tasks are neurally distinct, we find more significant differences between prose and coding than between mental rotation and coding. Intriguingly, we generally find more activation in areas of the brain associated with spatial ability and task difficulty for novice coding compared to that reported in studies with more expert developers. Finally, in an exploratory analysis, we also find a neural activation pattern predictive of programming performance 11 weeks later. While preliminary, these findings both expand on previous results (e.g., relating expertise to a similarity between coding and prose reading) and also provide a new understanding of the cognitive processes underlying novice programming.


Madeline Endres

Computer Science and Engineering, University of Michigan

Zachary Karas

Department of Psychology, University of Michigan

Xiaosu Hu

Department of Psychology, University of Michigan

Ioulia Kovelman

Department of Psychology, University of Michigan

Westley Weimer

Computer Science and Engineering, University of Michigan


I. INTRODUCTION

In recent years, the software engineering community has increasingly used medical neuroimaging to understand the cognitive processes behind programming [1]–[7]. Unlike eye-tracking or other biometric methods, neuroimaging pinpoints brain regions activated while completing specified tasks. Many of the neuroimaging studies in software engineering have compared programming to reading or spatial manipulation, two skills with well-understood cognitive structures. As broadly summarized in Table I, such studies have generally found striking similarities between code comprehension and prose reading [1]–[3], and one study has found similarities between spatial reasoning and data structure manipulation [6].

Neuroimaging studies of programmers have the potential to improve our understanding of expertise, to inform software engineering pedagogy, and to guide tool development and retraining (see Floyd et al. [3, Sec. II-D] for a summary). Critically, however, to the best of our knowledge all of the software engineering neuroimaging studies thus far have studied only programming experts who are either professionals or students with multiple years of experience (e.g., [2, Sec. 3.3]). Tantalizingly, one study [3] found that coding became more neurologically similar to reading for programmers with even greater expertise. However, to realize the potential of neuroimaging for understanding software engineering expertise, we must also directly observe true novices. Studying novice programmers is critical for exploring how cognitive processes for coding evolve. In this paper, we seek to close this gap by presenting the first neuroimaging study of novice programmers during code comprehension (cf. [2], [3]).

Studying novice programmers presents several challenges relative to studying experts. First, we must create experimental stimuli that are suitable for novices with little to no previous coding experience; the coding stimuli in previous neuroimaging studies all involve constructs (e.g., trees [6]) or software engineering tasks (e.g., code review [3]) unfamiliar to novices. Second, we must take particular care to recruit participants with equivalent programming expertise; even in an introductory course, some students have substantially more programming experience than others. Finally, we propose and use an experimental protocol that follows up with participants months later to assess their programming progression using a written assessment; to our knowledge, no previous neuroimaging studies in software engineering have involved time-delayed outcome measurements.

To better understand the cognitive processes of novice programmers, we use functional near-infrared spectroscopy (fNIRS) to conduct a controlled neuroimaging study with 31 participants with no previous programming experience, all enrolled in the same introductory computing class. We conduct these scans during the first third of the class. We compare participants’ brain activation patterns while coding to those while reading prose or using spatial reasoning (i.e., mentally rotating 3D objects). We also use a written assessment at the end of the semester to conduct a preliminary exploration of an aspect of learning, assessing if novices’ brain activation patterns while coding can predict their future programming ability.

TABLE I: “What is coding like in programmers’ brains?” We use this informal summary of selected previous work to motivate and contextualize the experiments in this paper.

Columns: Experiment; Like reading?; Like spatial reasoning?; Novices?
Siegmund et al. (2014) [1]: ✓ ?
Siegmund et al. (2017) [2]: ✓ ?
Floyd et al. (2017) [3]: ✓ ?
Huang et al. (2019) [6]: ✓ ?

We find that, for novices, coding is a working memory intensive task that is neurally distinct from both a spatial task and prose reading ( p < . , q < . ). Unlike previous work with experts, which generally reports strong similarities between coding and reading [1]–[3], we observe more significant and substantial differences between coding and reading than we do for coding and spatial tasks. This indicates that novices rely heavily on visuospatial cognitive processes while coding. Finally, we observe that particular activation patterns at the beginning of a course can predict how well students perform on a final programming assessment 11 weeks later; in general, the less similar the activation patterns are between coding and mental rotation, the better an individual performs ( r = 0 . , p = 0 . ). This may indicate that novices who use more problem-solving intensive strategies at the beginning of the semester (i.e., find programming more challenging) make less progress. We also compare our results to those of previous neuroimaging studies with more expert software engineers, and we close with a discussion of the implications of our results for introductory programming pedagogy and future software engineering research.

II. BACKGROUND

We give an overview of material necessary for understanding the methods and experiments in this paper. In Section II-A, we discuss neuroimaging and fNIRS, and in Section II-B, we present relevant notation. In Section II-C, we discuss the cognition behind spatial ability, and in Section II-D, we discuss the cognition behind reading. See Section IX for a discussion of work more directly related to software engineering.

A. Neuroimaging and fNIRS

Brain activity and cognitive processes can be studied using functional neuroimaging techniques. We focus on functional near-infrared spectroscopy (fNIRS). It is non-invasive, avoids the ionizing radiation present in other methods (e.g., PET, CT), and can measure activity in brain regions not accessible to some invasive techniques (e.g., electrocorticography). Importantly, fNIRS offers higher spatial resolution than EEG, and higher temporal resolution than fMRI, which is important for studies relating a brain region’s contribution to a specific task. Finally, fNIRS can be used in more natural and ecologically-valid environments (e.g., standard desktop computer use) compared to alternatives like fMRI (which requires participants to lie still in a small tube and also complicates the use of keyboards [7]). These properties motivate our decision to use fNIRS for a study of novice software engineers.

fNIRS makes use of the hemodynamic response, or change in neuronal blood flow to active brain regions, to measure brain activity [8]. fNIRS measures this via the use of near-infrared light: transmitters and receivers are placed on a “cap” worn by participants. Oxygen-rich and oxygen-poor blood have different light absorption properties, so the hemodynamic responses in a given brain region between a transmitter and receiver pair (referred to as a channel) can be measured over time. fNIRS measures concentration changes in such oxygenated and deoxygenated blood. The number of fNIRS publications has doubled every 3.5 years since 1992 [9], and it has been used to study human development, injury, and psychiatric conditions [9]–[12].

For our purposes, the use of fNIRS imposes two key experimental constraints: contrast-based design and task duration. First, fNIRS experiments typically involve participants completing tasks (e.g., mentally rotating objects or solving programming problems) while time-series data is recorded. Carefully controlled experiments are necessary, in which the activity observed during one task is contrasted against the activity observed during another. This allows confounding brain activations (e.g., motor cortex activity from moving the lungs to breathe) to be eliminated from consideration.

Second, because fNIRS is based on the hemodynamic response, care must be taken to model [13], [14] the onset of neuronal blood flow (which peaks slightly after stimuli are presented [15], [16]) and the design must avoid saturation and weaker signals for tasks involving long activity [17]. As a result, the hemodynamic response can typically be studied only in experiments with brief stimuli (e.g., under 30 seconds per question). Furthermore, fNIRS is only able to penetrate a few centimeters down into the brain. More specifically, the near-infrared light can reach a depth that is roughly half the distance between a transmitter-receiver pair, depending on the wavelengths and light intensities used [18].

Despite these limitations, fNIRS specifically, and medical imaging in general, are growing in popularity for use in software engineering studies (e.g., [1]–[7], [19]–[22]). They provide a physically-grounded insight into cognition without relying on potentially unreliable self-reporting [6].

B. Neuroscience Vocabulary and Notation

Vocabulary:

The cerebrum of the human brain is composed of two (largely symmetric) hemispheres, left and right, and four primary lobes: frontal, temporal, parietal, and occipital. Loosely, the frontal lobe is at the front of each hemisphere, the temporal lobe is on the side of each hemisphere, the parietal lobe is at the top of each hemisphere, and the occipital lobe is at the back of each hemisphere. Activation is called bilateral if both hemispheres are activated and lateralized if one hemisphere activates disproportionately. Throughout this paper, we will use various schemas to refer to locations on the cerebrum’s cortex, the brain’s outer layer of neural tissue. One such schema is Brodmann areas, an anatomical classification system for the cortex [23]. Brodmann areas (BAs) divide the cortex into 52 bilateral regions based on architectural neurological features. Many BAs also have associated neurological functions. For instance, BAs 41 and 42 are associated with auditory processing. We will also sometimes refer to regions by their common names or location on a lobe. Three important regions mentioned in this paper are Wernicke’s area (left hemisphere, back of the parietal lobe), Broca’s area (left hemisphere, lower frontal lobe), and the dorsolateral prefrontal cortex or DLPFC (bilateral in the frontal lobe). Wernicke’s and Broca’s areas are strongly associated with language functions while the DLPFC is associated with working memory.

Notation:

In this paper, we will use the neuroimaging notation Task A > Task B to indicate the contrast between brain activation patterns for two experimental tasks. The results of these contrasts are reported as statistical t-values corresponding to each fNIRS channel. These t-values range from −8 to +8; a positive t-value indicates that the specified brain area was more active during Task A than Task B, while a negative t-value indicates that the specified brain area was less active during Task A than Task B. Values closer to −8 and +8 represent stronger activation contrasts between the two tasks, and only areas with significant contrast ( p < . ) after correction for multiple comparisons ( q < . ) are reported. Finally, we note that contrast tests are directional: a significant contrast in Task A > Task B does not imply that the inverse contrast Task B > Task A will also produce significant results. This is because the differences in the inverse contrast may be too small to be statistically significant.
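As a rough illustration of the contrast notation, the sketch below computes a paired t-value for Task A > Task B on a single channel and then applies a Benjamini-Hochberg false-discovery-rate step to a list of p-values. It is a simplified stand-in and not the mixed-effects analysis used in this paper: the function names, the toy activation values, and the q = 0.05 threshold are all assumptions made for the example.

```python
import math
import statistics


def paired_t(task_a, task_b):
    """t-value for the contrast Task A > Task B on one channel.

    A positive value means the channel was more active during Task A.
    """
    diffs = [a - b for a, b in zip(task_a, task_b)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err


def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if it survives FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value is at most (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    keep = [False] * m
    for idx in order[:cutoff]:
        keep[idx] = True
    return keep


# Hypothetical per-participant activation estimates for one channel.
task_a = [1.0, 2.0, 3.0, 4.0]
task_b = [0.5, 1.0, 2.5, 3.0]
print(paired_t(task_a, task_b))  # positive: more active during Task A
print(paired_t(task_b, task_a))  # inverse contrast flips the sign
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9]))
```

The step-up rule keeps every p-value ranked at or below the cutoff, which is why a p-value can survive even when it exceeds its own rank-specific threshold.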

C. Spatial Reasoning and Cognition

We now turn to a discussion of spatial reasoning and its neurological representations. Spatial reasoning refers to an individual’s general ability to mentally manipulate objects and encompasses skills such as mental rotation, mental folding, pattern recognition, and spatial perception [24]. Spatial reasoning has been shown to correlate with performance in a variety of activities including mathematics [25], [26], general engineering [27], and programming [28], [29]. Spatial ability is also malleable and can be improved through training [30].

Mental rotation, one component of spatial reasoning, involves the imagined rotation of a two- or three-dimensional object around an axis in three-dimensional space [31]. Figure 2a depicts one of the mental rotation stimuli we used (see Section III-C). The difficulty of a mental rotation problem is determined by the size of the angle of rotation; Shepard and Metzler found that the time a participant took to solve a given problem increased linearly with respect to the angle of rotation between the two objects [32]. In this paper, we use mental rotation ability as a validated proxy for more general spatial reasoning ability.

Generally, neuroimaging has found that mental rotation activates the posterior parietal and occipital cortices (BAs 7, 17, 19, 39, and 40; see Zacks [31] for a survey). While bilateral, the parietal and occipital activation tends to be slightly stronger in the right hemisphere; the right parietal lobe in particular is believed to be important for spatial ability and spatial attention tasks [33]. Many mental rotation neuroimaging studies have also revealed bilateral activation in the supplementary motor cortex, an area associated with motor control and planning (BA 6) [31]. This frontal activation is most common for mental rotation tasks that allow motor simulation strategies.
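The linear relationship between rotation angle and response time reported by Shepard and Metzler can be illustrated with an ordinary least-squares line fit. The sketch below is only an illustration: the angles and response times are invented, noise-free values, and `fit_line` is a hypothetical helper rather than anything used in this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: rotation angle in degrees, response time in seconds.
angles_deg = [0, 40, 80, 120, 160]
response_s = [1.0, 1.8, 2.6, 3.4, 4.2]

slope, intercept = fit_line(angles_deg, response_s)
print(f"extra time per degree: {slope:.3f} s, time at 0 degrees: {intercept:.2f} s")
```

Under a linear model, the slope is the additional time needed per degree of rotation, so larger angles make for harder problems.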

D. Reading and Cognition

Next, we include a brief summary of the neurological processes associated with reading. Neuroimaging has revealed that language is supported by a complex network of cognitive areas generally lateralized to the left hemisphere (see Price [34] and Vigneau et al. [35] for surveys). Some language processes are localized to specific structures in the brain, while other language processes arise from a distributed network of areas with multiple functions [34].

Two key left-hemisphere brain areas associated with reading are Broca’s area and Wernicke’s area. Located in the frontal lobe, Broca’s area (BAs 44 and 45) plays an important role in language production and, to a lesser extent, language comprehension [35]. Wernicke’s area is in the posterior temporal lobe and is associated primarily with comprehension of both spoken and written language [35].

III. EXPERIMENTAL SETUP AND DESIGN

We now present our experimental design for understanding the neurological basis of novice programming. Our experiment was conducted in two parts: an initial fNIRS scan and a written followup assessment. During the fNIRS scan, participants were shown three types of stimuli: code comprehension, mental rotation, and prose reading. During the written post-test, participants completed a validated language-agnostic programming assessment. All participants were enrolled in the same 15-week introductory programming course. The initial fNIRS scans were held during the first third of the semester while the written post-test was held during the last week of the semester. This design allows the controlled exploration of the relationships and contrasts between reading, coding, and spatial reasoning for novice programmers. It also opens a preliminary investigation of neurological factors that might be predictive of programming performance. In the rest of this section we describe our recruitment protocol (subsection III-A); outline our fNIRS data collection, experimental setup, and stimuli (subsections III-B and III-C); and describe our followup programming assessment (subsection III-D).

A. Participant Recruitment

Participants were all recruited from the same CS1 course at the University of Michigan, a large public US university, via a combination of email, forum posts, and an in-class presentation. To be eligible, participants had to be over 18, have no prior programming experience, only be enrolled in [...] $20 in compensation. In total, we collected fNIRS data from 37 participants. Data from 31 participants passed our analysis quality threshold (see Section IV). The final 31 participants (24 female, 7 male) ranged in age from 18 to 21.

(A replication package containing all of our recruitment and stimuli materials can be found at https://github.com/CelloCorgi/ICSE fNIRS2021.)

Beyond the initial fNIRS scan, we also followed up with participants via email at the end of the semester, inviting them to attend an additional written programming assessment. Of our 31 participants, 23 participated in this written post-test. Post-test participants were compensated an additional $20.

During participant selection, we were keenly aware that confirming the absence of previous programming ability for a diverse population is a challenging task. To mitigate self-selection bias, we implemented checks for participant programming ability in three places: recruitment, pre-screening, and statistical validation of scores on a written programming test. During recruitment, both in-person and written, we emphasized that participants could have no prior programming experience of any kind. Prospective participants were explicitly told, in person, that even minimal practice or exposure to textual or visual programming languages (such as Scratch [36] or MIT App Inventor) counted as prior experience.

During prescreening, participants indicated if they “had any prior programming experience” with one of “Yes”, “No”, or “Other/unsure”. We only retained participants who selected “No” outright. The presence of the “Other/unsure” option mitigates some self-selection bias in experience reporting. We also asked potential participants to indicate concurrent and previous course enrollment from a list of courses at our university. Courses that contained “programming” or “code” in their syllabus or description precluded study participation. These questions eliminated 18% of pre-screening respondents.

Finally, at the same time as the demographics questionnaire, participants were given a brief programming test. The test consisted of twelve pseudocode multiple-choice questions and is a validated measure of CS1 concepts [37]. We expected low scores: novices should have no prior programming exposure (beyond the first few weeks of their current course). Indeed, we found an average score of 23% (random guessing yields 20%). There was no statistically significant difference between participant scores and a random distribution. We believe that these three mitigations help account for self-selection bias issues and give confidence that our pool contains novices (but acknowledge that the issue is reduced, rather than eliminated).

We also observe that, unusually for software engineering studies, a majority of our population (77%) were female. We believe that the high proportion of women in our population is caused by a combination of three factors: a more gender-balanced population pool than most software engineering studies, fNIRS signal quality, and self-selection bias. First, the course we recruited from is fairly gender balanced: around 45–50% of students identify as women, compared to 22% for CS overall at our university. Second, by chance, two-thirds of the participants whom we were unable to analyze due to fNIRS signal quality (i.e., measurement noise) were male. Third, women often volunteer for college studies at higher rates than men and for different reasons [38], while men are more likely to have some prior programming experience [39] and thus be excluded from our study.
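To make the chance-level comparison for the pre-screening programming test concrete, the sketch below runs an exact one-sided binomial test of pooled answers against the 20% guessing rate. It is a simplified illustration and not the statistical test used in this study: the pooled counts are hypothetical, pooling treats every answer as independent, and `binomial_upper_tail` is a helper written for this example.

```python
import math


def binomial_upper_tail(successes, trials, p_chance):
    """P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        math.comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical pooled counts (assumed for illustration only).
total_answers = 372  # e.g., 31 participants x 12 questions
total_correct = 86   # roughly 23% correct
p_value = binomial_upper_tail(total_correct, total_answers, 0.20)
print(f"one-sided p-value against guessing: {p_value:.3f}")
```

A p-value above the chosen significance level means the pooled scores cannot be distinguished from random guessing, which is the pattern expected from true novices.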

B. fNIRS Data Collection and Setup

Each participant’s fNIRS data was collected during a single session which lasted 1.5 hours. First, participants gave informed consent and filled out a short demographic survey. Next, participants watched a training video and were prepared for the scan, a 30–45 minute process that involved fitting each participant for the fNIRS cap by moving hair to optimize sensor contact with the scalp and, thus, signal quality.

During the fNIRS scan, the participant sat in a chair facing a monitor wearing the fNIRS cap. The room was kept dim to reduce the amount of ambient light that could interfere with the fNIRS data collection. Participants were also instructed to stay as still as possible. Each participant was shown 90 stimuli: 30 mental rotation stimuli, 30 reading-based stimuli, and 30 coding-based stimuli. All stimuli asked the participant to choose one of two answers. Participants indicated their answer by pressing a corresponding key on a standard keyboard. For each stimulus, participants had up to 30 seconds to respond. The 90 stimuli were in randomized order and were broken into three blocks of 30 questions, each containing 10 stimuli of each type. Between stimuli, participants were shown a fixation cross for 2–10 seconds. Between blocks, participants had an optional longer break to rest and/or drink water.

We now present technical information about our fNIRS device and cap. We used a CW6 fNIRS system (TECHEN, Milford, MA, USA) with 690 nm and 830 nm wavelengths. It has fiber-optic cables which transmit light from the device to sensors connected to the participant’s cap. These sensors are either transmitters that emit light or receivers that detect light. As a result, fNIRS collects a participant’s hemodynamic response only along a set of pre-defined channels between transmitters and receivers. The location, number, and coverage of these channels is determined by the cap design. We used a probe configuration similar to that used in Huang et al. [40] with dense coverage of transmitter-receiver pairs in the occipital, frontotemporal, and frontal regions. In total, 15 receiver and 30 transmitter fibers were used, yielding 55 channels from which data were collected, covering Brodmann areas 6–9, 17–19, 21, 22, 39–41, and 44–47. The data from these channels were then analyzed using both NIRS Brain AnalyzIR Toolbox [41] and custom scripts written in MATLAB. A picture of our cap coverage is in Figure 1; it includes areas identified in previous mental rotation studies as well as key language areas.

Fig. 1: fNIRS cap used in our experiments. Red circles are light transmitters while blue circles are light receivers. fNIRS is only able to observe brain activation on channels between nearby transmitters and receivers.

We use more channels than many previously published fNIRS papers in software engineering, giving broader coverage [4], [19]. We used two cap sizes to accommodate different head circumferences (58 cm and 60 cm). Signals were sampled at 50 Hz.
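The presentation protocol described in this subsection (three blocks of 30 stimuli, 10 of each type per block, randomized order, a 2–10 second fixation cross between stimuli, and a 30 second response limit) can be sketched as a small schedule generator. This is a hypothetical reconstruction for illustration only: the function name, the dictionary fields, the fixed seed, and the uniform draw of fixation durations are assumptions, not details taken from the study software.

```python
import random

STIMULUS_TYPES = ("rotation", "reading", "coding")


def build_schedule(seed=0, blocks=3, per_type_per_block=10):
    """Return a list of blocks; each block is a shuffled list of trials."""
    rng = random.Random(seed)
    # Shuffle the item ids of each type so blocks receive random subsets.
    pools = {}
    for kind in STIMULUS_TYPES:
        ids = list(range(blocks * per_type_per_block))
        rng.shuffle(ids)
        pools[kind] = ids

    schedule = []
    for b in range(blocks):
        start = b * per_type_per_block
        block = []
        for kind in STIMULUS_TYPES:
            for item in pools[kind][start:start + per_type_per_block]:
                block.append({
                    "type": kind,
                    "item": item,
                    # Assumption: fixation length drawn uniformly from 2-10 s.
                    "fixation_s": rng.uniform(2.0, 10.0),
                    "timeout_s": 30,
                })
        rng.shuffle(block)  # randomize presentation order within the block
        schedule.append(block)
    return schedule


schedule = build_schedule()
print([len(block) for block in schedule])
```

Balancing stimulus types within each block keeps the three task conditions evenly represented even if an entire block is later excluded for signal quality.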

C. fNIRS Stimuli

We now turn to a description of the content of our fNIRS stimuli. As mentioned in Section III-B, participants were shown three categories of stimuli: mental rotation, reading-based stimuli, and coding-based stimuli. All three types asked the participant to choose between two answers, A and B.

We use mental rotation stimuli adapted from Peters and Battista’s Mental Rotation Stimulus Library [42]. These stimuli are designed to induce brain activation associated with mentally rotating 3D shapes, one facet of visuospatial cognition (see Section II-C). In each mental rotation stimulus, the participant was asked to choose which of two objects was a possible rotation of a third object (see Figure 2a for an example). To admit direct comparison with previous work, our mental rotation stimuli are the same stimuli used by Huang et al. when analyzing data structure manipulation brain activation with experienced software developers [6].

For the reading stimuli, we use sentence completion tasks adapted from official Graduate Record Examination practice exam questions [43], an assessment required for admission to many graduate programs. For each stimulus, participants were asked to read a sentence and choose the appropriate word or phrase to fill a blank (see Figure 2b for an example).

For the coding stimuli, we created a corpus of short code snippets that use constructs familiar to introductory computing students. Specifically, they contained boolean logic, while loops, for loops, and arrays. These are all early core concepts in most introductory curricula [44] and are also covered in our institution’s CS1. Unfortunately, we were not able to directly reuse coding stimuli from previous neuroimaging studies as they are generally geared toward expert programmers and thus contain constructs unfamiliar to novices. For the array-based questions, however, we were able to adapt stimuli created by Huang et al. [6]. For each coding stimulus, participants were asked to choose either the correct output or return value of a short code snippet (see Figure 2c for an example).

D. Followup Programming Assessment

At the end of the semester (10–12 weeks after the fNIRS scans), we invited participants to complete a written programming test. Along with the programming test, participants also completed a battery of cognitive and behavioral assessments. For the programming assessment, we used the Second CS1 Assessment (SCS1), a validated language-agnostic measure of CS1 programming ability [37]. The SCS1 contains 27 multiple choice questions and takes one hour. It covers Boolean logic, while loops, for loops, arrays, if statements, functions, and recursion. There are three types of questions: definition questions, code tracing questions, and code replacement questions. Due to COVID-19, we were unable to hold the test in person. Rather, participants completed an online version of the SCS1 over a proctored video call. Responses were then checked for timing-based anomalies to ensure participants did not rush through the test.

IV. ANALYSIS METHODOLOGY

We now go over our methodology for analyzing the fNIRS data. Broadly, there are three stages in our analysis pipeline: preprocessing, individual modeling, and group level modeling.

1) Preprocessing:

The raw data, in the form of light intensity values, were converted into optical density data by calculating the fluctuations in light absorption caused by the presence of either oxygenated (HbO) or deoxygenated (HbR) blood. The optical density data were then converted into an HbO/HbR signal using the Modified Beer-Lambert law. We ran a general linear model (GLM) with pre-whitening and robust least squares to fit the data [45].
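A minimal sketch of the two conversions named above, assuming one channel measured at 690 nm and 830 nm: light intensity is turned into an optical density change, and the Modified Beer-Lambert law is then solved as a 2x2 linear system for the HbO and HbR concentration changes. The extinction coefficients, source-detector distance, and differential pathlength factor below are illustrative placeholders (chosen only so that HbR absorbs more strongly at 690 nm and HbO at 830 nm), not tabulated constants or values from this study, and the function names are hypothetical.

```python
import math

# Placeholder extinction coefficients (eps_HbO, eps_HbR) per wavelength in nm.
# Illustrative values only; real analyses use tabulated constants.
EXTINCTION = {690: (0.3, 2.1), 830: (1.0, 0.7)}


def optical_density_change(intensity, baseline_intensity):
    """Change in optical density relative to a baseline intensity."""
    return -math.log10(intensity / baseline_intensity)


def modified_beer_lambert(d_od_690, d_od_830, distance_cm=3.0, dpf=6.0):
    """Solve for (delta_HbO, delta_HbR) from two optical density changes.

    Model: d_od(wl) = (eps_HbO * dHbO + eps_HbR * dHbR) * distance * dpf
    """
    path = distance_cm * dpf  # assumed effective optical path length
    a, b = EXTINCTION[690]
    c, d = EXTINCTION[830]
    det = a * d - b * c
    # Cramer's rule on the 2x2 system.
    d_hbo = (d * d_od_690 - b * d_od_830) / (det * path)
    d_hbr = (a * d_od_830 - c * d_od_690) / (det * path)
    return d_hbo, d_hbr


# Hypothetical intensities: less light detected at 830 nm, more at 690 nm.
od_690 = optical_density_change(102.0, 100.0)
od_830 = optical_density_change(97.0, 100.0)
print(modified_beer_lambert(od_690, od_830))
```

Two wavelengths are needed because there are two unknowns, which matches the dual-wavelength device described in Section III-B.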

2) Individual Subject Modeling:

After the hemodynamic response was modeled for each subject, quality control checks were implemented to limit the amount of noise in the group-level model. The signal-to-noise ratio, anticorrelation of HbO and HbR, and brain-activation plots were considered when deciding whether to exclude individual blocks or whole participants. The signal-to-noise ratio was calculated as a ratio between the absolute signal mean and standard deviation, with a threshold set at 0.9. As a result, 16 blocks were excluded from further analysis (see Section III-B for a discussion of our experimental block setup). In an ideal hemodynamic response function, the levels of HbO will increase as the levels of HbR decrease and vice-versa [46], so the correlation between these two levels should be negative. Thirteen additional blocks that did not show this pattern were excluded from further analysis. Next, the brain activations estimated by the GLM for All Conditions > Rest were plotted onto brain models using a photogrammetry-based localization method [47], at which point the loci of activity could be examined and scrutinized. We expected activity in the visual cortex (as this is a visual experiment), as well as in the inferior frontal gyrus during the reading task (as this is a well-documented region for language processing). Thirty-seven blocks that did not pass this activation pattern check were excluded. These criteria were not mutually exclusive. In total, 59 blocks from 31 participants were included in the group-level modeling.

Fig. 2: Example fNIRS stimuli for mental rotation (a), reading (b), and coding (c) tasks. (a) Mental Rotation: correct answer is “A”. (b) Reading: correct answer is “B”. (c) Code: correct answer is “A”.
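The first two quality checks from the individual-subject modeling step (signal-to-noise ratio and HbO/HbR anticorrelation) can be sketched as follows. This is an illustrative approximation only: the function names and toy signals are hypothetical, the use of the sample standard deviation is an assumption, and treating blocks below the 0.9 threshold as failing is our reading of the text rather than a confirmed detail.

```python
import math
import statistics


def signal_to_noise(signal):
    """Ratio of the absolute signal mean to its standard deviation."""
    return abs(statistics.mean(signal)) / statistics.stdev(signal)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def block_passes(raw_signal, hbo, hbr, snr_threshold=0.9):
    """True if the block meets the SNR threshold and HbO/HbR anticorrelate."""
    good_snr = signal_to_noise(raw_signal) >= snr_threshold
    anticorrelated = pearson_r(hbo, hbr) < 0
    return good_snr and anticorrelated


# Toy block: steady raw signal, HbO rising while HbR falls.
print(block_passes([10, 11, 9, 10], [1, 2, 3, 4], [4, 3, 2, 1]))
```

The third check, visual inspection of activation plots, relies on researcher judgment and is not captured by this sketch.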

3) Group Modeling:

We used a linear mixed effects model for group level analysis, contrasting Task > Baseline activations to estimate task-related brain activations and brain-behavior correlations. Lastly, we applied a false-discovery rate (FDR) threshold correction ( q < . ) to account for the multiple-comparison issue.

V. VALIDATION

In this section we validate that the brain activation patterns we observe during mental rotation and reading align with those established by previous work. Specifically, we present our results for the contrasts Mental Rotation > Rest and Reading > Rest. We provide brain activation visualizations for Mental Rotation > Rest and Reading > Rest in Figures 3 and 4 respectively. We also provide a tabular view of our results with specific t-values in Table II (see Section II-B for an overview of t-values and A > B notation).

In Mental Rotation > Rest, we observe significant bilateral activation in the occipital and parietal lobes (BAs 7, 17, 18, 19, 39). The occipital lobe (BAs 17, 18, 19) is associated with the visual cortex and is responsible for tasks such as image and pattern recognition, both visuospatial processes. The parietal activation (BAs 7, 39) is also localized in regions associated with spatial tasks including spatial reasoning and mental rotation [33], [48].

Our results are therefore consistent with previous mental rotation neuroimaging; in his meta-review of mental rotation imaging studies, Zacks also identified BAs 7, 17, 19, and 39 as active during mental rotation tasks [31]. We do not observe significant supplementary motor cortex activation (BA 6), another area often active during mental rotation tasks. This is not too surprising, however, as the supplementary motor cortex is most strongly activated in tasks which encourage participants to use motor simulation strategies [31]; we did not prompt participants to use such strategies in our experiment.

Fig. 3: Baseline Mental Rotation Activation: Red indicates regions more activated during the mental rotation task while blue indicates regions more activated during rest. Note that all significant mental rotation activation is located in the back of the head in the parietal and occipital lobes.

In Reading > Rest, we observe significant occipital, parietal, and prefrontal activation lateralized to the left hemisphere, aligning with previous work. We observe significant activation in both Broca’s and Wernicke’s areas, widely considered two of the most important language areas [34]. We also observe significant activation in the left dorsolateral prefrontal cortex (BA 46; see Section II-B), a region associated with attention and working memory. Regarding the occipital activation, while we observe some bilateral activation, significant activation is substantially more widespread in the left hemisphere. The left occipital cortex has been found to correspond with word and letter specific pattern recognition [49].

Fig. 4: Baseline Reading Activation: Red indicates regions more activated during the reading task while blue indicates regions more activated during rest. Note the reading activation in Broca’s and Wernicke’s areas (the double arrow) as well as the left-lateralized occipital and parietal activation (the star).

Our activations for rotation and reading align with previous work: significant occipital and parietal activation for rotation, and occipital and prefrontal activation for reading, including Broca’s and Wernicke’s areas. Rotation activation is bilateral while reading is primarily in the left hemisphere. This validation gives confidence for both construct and internal validity (i.e., that our protocol measures what we think it measures and does so correctly).

VI. EXPERIMENTAL RESULTS

We now present the results of our experiment probing the neurological connections between reading, mental rotation, and programming for novice programmers (see Section II-B for an overview of relevant notation and vocabulary, e.g., A > B). We focus our results around three research questions:

• RQ1—Programming Activation: What areas of the brain activate when novice software engineers program?
• RQ2—Comparative Activation: How does the coding brain activation of novices compare to their brain activation during mental rotation and during reading?
• RQ3—Prediction: Are there connections between novices’ coding brain activation patterns at the beginning of CS1 and their programming performance at the end of the course?

A. RQ1—Programming Activation

To determine which areas of the brain activate when novice software engineers program, we present the results of the contrast Code > Rest. That is, we test which brain areas significantly distinguish programming from a resting state ( p < . , q < . ). We also discuss the functionality of the distinguishing brain regions. Figure 5 contains a brain activation visualization for Code > Rest, and we provide a tabular view of our results with specific t-values in Table II.

While coding, novices exhibit significant occipital activation (BAs 17, 18, 19). This activation is bilateral, but somewhat stronger in the right hemisphere. Functionally, the occipital cortex is associated with visual processing, and it includes areas such as the primary visual cortex (BA 17) and visual association area (BA 18). This occipital activation is the strongest activation we observe in Code > Rest, accounting for three of the five channels with t-values greater than five.

Beyond occipital activation, we also observe significant posterior parietal activation, primarily in the angular gyrus (BA 39). This activation is bilateral: the other two channels with t-values greater than five both cover BA 39, one in each hemisphere. In the left hemisphere, the angular gyrus is important for language-related tasks [50]. Some researchers also include BA 39 in Wernicke’s area, one of the two main cortical regions associated with natural language processing. However, we do not observe activation while coding in the regions most commonly associated with Wernicke’s area: BAs 22 and 40. The angular gyrus is also strongly associated with spatial cognition tasks including spatial orientation (e.g., distinguishing left from right), spatial attention, numerical computation, and mental rotation [31], [51]. While the spatial functionality of the angular gyrus is bilateral, many spatial tasks, including mental rotation, are concentrated in the right hemisphere [51]. Therefore, the bilateral activation of the angular gyrus indicates that novices use both language and spatial cognitive processes while programming.

Fig. 5: Significant Coding Activation: Red indicates regions more activated during the coding task while blue indicates regions more activated during rest. Note the widespread bilateral activation in both the DLPFC (the arrows) and the occipital and posterior parietal cortices (the star).

We also observe significant activation in the frontal cortex. Specifically, we observe bilateral activation in the DLPFC (BA 46) and activation in the left superior premotor cortex (BA 6). The premotor cortex is associated with motor processing, and it has also been found to activate during visuospatial tasks including mental rotation [31]. The DLPFC is associated with working memory; lower activation in this region corresponds with worse performance on working-memory intensive tasks such as complex problem solving [52]. The significant bilateral DLPFC activation, therefore, indicates that novices find programming a challenging and working-memory intensive task.

Novices engage brain regions associated with language and spatial cognition, as well as regions associated with increased demand for attention and executive function (via neural activity in Code > Rest, p < . , q < . ).

B. RQ2—Significant Comparative Activation

We now compare novices’ coding brain activation to their activation during mental rotation and reading by presenting our findings for the contrasts Code > Mental Rotation and Code > Reading. We provide brain activation visualizations for Code > Mental Rotation and Code > Reading in Figures 6 and 7, and we provide a tabular view of our results with specific t-values in Table II. Our high-level finding is that while coding, mental rotation, and reading are all neurally distinct tasks, we observe more substantial differences between coding and reading than we do between coding and mental rotation.

TABLE II: t-value statistics for Mental Rotation > Rest, Reading > Rest, Code > Rest, Code > Mental Rotation, and Code > Reading. All reported results are significant ( p < . ) and pass our false discovery threshold ( q < . ). Blank cells indicate no significant effect was found in that region. Results closer to +8 or -8 indicate more significant activation or deactivation. We highlight results with t-values less than -5 or greater than +5.

[Table II layout (cell values omitted). Columns: Brain Region; Rotation > Rest; Reading > Rest; Code > Rest; Code > Rotation; Code > Reading.
Frontal Cortex rows: Left DLPFC (BA 46); Right DLPFC (BA 46); Broca’s Area (Left BAs 44 and 45); IFG (Right BAs 44 and 45); Left Supplementary Motor Cortex (BA 6); Right Premotor Cortex (BA 6); Left BA 8; Right BA 8; Left BA 9; Right BA 9.
Temporal Cortex rows: Wernicke’s Area (Left BAs 22, 40); Right BA 21; Right BA 22; Right Auditory Cortex (BA 41).
Parietal Cortex rows: Left BA 7; Left Angular Gyrus (BA 39); Right Angular Gyrus (BA 39).
Occipital Cortex rows: Left BA 17; Right BA 17; Left BA 18; Right BA 18; Left BA 19; Right BA 19.]

Fig. 6: Activation Contrast between Programming and Mental Rotation: Red indicates regions more activated during coding while blue indicates regions more activated during mental rotation. Note that compared to mental rotation, coding has stronger bilateral frontal activation (the arrow) and right posterior parietal activation in the angular gyrus (the star).

Fig. 7: Activation Contrast between Programming and Reading: Red indicates regions more activated during coding while blue indicates regions more activated during reading. Note that reading has comparatively more activation in Broca’s and Wernicke’s areas (the double arrow), while coding has substantially more right frontal activation (the star) and more right-lateralized occipital / parietal activation (the single arrow).

For Coding > Mental Rotation, we observe several significant differences in activation. First, when coding, participants exhibit more bilateral frontal activation than while mentally rotating objects. This comparative activation is in the DLPFC and the premotor cortex (BAs 6 and 46), areas commonly associated with working memory and spatial manipulation [31], [52]. Similarly, we observe comparatively high coding activation in the right angular gyrus, a region also connected with spatial reasoning [51]. We find it intriguing that many of the areas with more activity for coding than for mental rotation are associated with spatial reasoning, ostensibly the process measured by the mental rotation stimuli.
This may imply that coding is a challenging task that is actually more spatially intensive for novices than simple mental rotation.

When considering a contrast A > B, it is important to distinguish between a comparison that is positive because A is greater than B (called a “true activation”) vs. one that is positive only because B is negative (a “strong deactivation”). All of the significant Coding > Mental Rotation differences discussed so far are channels with either positive activation or no significant activation in Rotation > Rest. Thus, they are all true differences between coding and mental rotation. On the other hand, while we also observe a very strong comparative activation ( t = 6 . ) in Coding > Mental Rotation in Wernicke’s area, this reflects the same channel’s strong deactivation in Rotation > Rest rather than a true activation.

For Coding > Reading, we also observe significant differences in activation: coding has comparatively stronger activation throughout the right hemisphere and in the premotor cortex, while reading has stronger activation in Broca’s and Wernicke’s areas. All of the significant differences in the left hemisphere, including Broca’s and Wernicke’s areas, are true contrasts; that is, none of them are caused by a significant deactivation in one of the Task > Rest comparisons. In the right hemisphere, several of the apparent activations for coding are caused by strong deactivation in Reading > Rest. Even so, we observe true comparative right-hemisphere activations: the right occipital, angular gyrus, and DLPFC are all more activated while coding than while reading.

While we observe that coding is neurologically distinct from both mental rotation and reading, we also observe more substantial differences in Coding > Reading than in Coding > Mental Rotation. In Coding > Reading, there are four channels with t-values greater than +5 or less than − , while in Coding > Mental Rotation, there is only one (see Table II). Furthermore, the strong channel difference in Coding > Mental Rotation is not particularly compelling as it is caused by deactivation. Taken together with the previous functional analysis, this trend hints that for novices, coding is a more spatially based and a less language-based cognitive process.

We find that, for novices, coding is neurally distinct from both reading and spatial reasoning. Coding engages regions associated with working memory more than does either reading or rotation, indicating that programming is a more cognitively challenging task. However, we observe more significant and substantial differences in Coding > Reading than we do in Coding > Rotation: novices may rely heavily on spatial reasoning while coding.
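The distinction between true activations and deactivation-driven contrasts used in this subsection can be written as a simple rule. The Python below only illustrates that reasoning and is not the study's analysis code: the function name `classify_contrast` and the example t-values are hypothetical, and `None` stands for "no significant effect" (a blank cell in Table II).

```python
from typing import Optional


def classify_contrast(t_a_vs_b: float, t_b_vs_rest: Optional[float]) -> str:
    """Label a significant, positive A > B contrast for one channel.

    t_a_vs_b:    t-value of the A > B contrast (must be positive).
    t_b_vs_rest: t-value of B > Rest for the same channel, or None if
                 B > Rest showed no significant effect there.
    """
    if t_a_vs_b <= 0:
        raise ValueError("this rule only applies to positive A > B contrasts")

    # If B was significantly *below* rest, the positive A > B difference may
    # be produced by B's deactivation rather than by A's activation.
    if t_b_vs_rest is not None and t_b_vs_rest < 0:
        return "deactivation-driven"

    # B is either significantly above rest or not significant at all, so the
    # positive difference reflects A genuinely exceeding B.
    return "true activation"


# Hypothetical channels (values invented for illustration only).
examples = {
    "channel with no Rotation > Rest effect": classify_contrast(3.1, None),
    "channel with positive Rotation > Rest": classify_contrast(2.8, 4.0),
    "channel with negative Rotation > Rest": classify_contrast(6.2, -5.5),
}
```

Under this rule the first two hypothetical channels count as true activations, while the third mirrors the Wernicke's-area case described above, where a large contrast value comes from deactivation in the comparison task.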

C. RQ3—Prediction

We now turn to an exploratory analysis of connections between observed brain activation patterns and participants’ final programming assessment scores. To do so, we use Representational Similarity Analysis (RSA), a common Psychology approach, to correlate brain activity interactions with scores on the programming post-test (see Kriegeskorte et al. [53] for an introduction to RSA). We test if the brain activation similarity between mental rotation and coding (Mental Rotation × Coding) or the brain activation similarity between reading and coding (Reading × Coding) is correlated with SCS1 scores. We calculate these correlations for the right hemisphere, left hemisphere, right hemisphere frontal, left hemisphere frontal, and occipital regions for a total of five statistical tests per hypothesis and ten total test–hypothesis pairs.

We find a correlation that remains significant after applying the Bonferroni correction for multiple comparisons [54]. Specifically, we find a significant medium negative correlation between (Mental Rotation × Coding) in the right frontal region and programming post-test score ( r = − . , p = 0 . , adjusted Bonferroni p = 0 . ): the less similar the neural activation patterns for coding and rotation, the better the final programming assessment outcome.

Notice that coding also elicited stronger right frontal activation than the mental rotation task. It is therefore possible that the more individuals engage the right frontal region, in a way that is dissimilar from its engagement during spatial processing, the less progress they make over the course of the semester. The hypothesis is similar to the observation that the more dissimilar is the activity between novice readers and their language, the less progress they make in learning to read [55]. In the Psychology literature, one common approach for testing such a hypothesis would be to investigate correlations between significant channel activations and response time. However, while in some settings response time can be used as a proxy for difficulty, our coding stimuli were not designed with that consideration; we do not find a statistically significant correlation ( r = − . , p = 0 . ) and lack enough information to either substantiate or refute that hypothesis.

In an exploratory analysis relating initial brain activation patterns and post-test programming scores, we find that less similar patterns of activation for coding and mental rotation in the right frontal hemisphere at the start of the semester predict better outcomes on the end-of-semester final programming assessment ( r = − . , p = 0 . ).

VII. DISCUSSION
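Before interpreting the RQ3 result, it may help to make its two statistical ingredients concrete: a correlation between a per-participant similarity score and the post-test score, and a Bonferroni adjustment over the ten test–hypothesis pairs. The Python below is a minimal sketch under stated assumptions: the helper names `pearson_r` and `bonferroni_adjust` and all numeric inputs are hypothetical, the text does not state which correlation coefficient was used (Pearson is assumed here), and the RSA similarity computation itself is not shown.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)


def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical per-participant data (invented for illustration only):
# similarity of coding and rotation activation, and post-test score.
similarity = [0.9, 0.7, 0.6, 0.4, 0.2]
post_test = [10, 12, 15, 17, 21]
r = pearson_r(similarity, post_test)  # negative: less similar, higher score

# Five regions x two hypotheses = ten tests, as described in the text.
N_TESTS = 10
adjusted_p = bonferroni_adjust(0.004, N_TESTS)  # 0.004 is a made-up raw p
```

The adjustment is deliberately conservative: a raw p-value has to be ten times smaller than the chosen significance level to survive, which is why only one of the ten correlations is reported as significant.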

In this section, we discuss the implications of our experimental results. In particular, we consider how the coding brain activation patterns of novice programmers compare to those of more experienced software developers (Section VII-A) and also discuss future research directions (Section VII-B).

A. Novices vs. Experts

In this section, we discuss how our results compare to those observed in neuroimaging studies of expert developers. Generally, we observe more right-hemisphere activation, more engagement of visuospatial processes, and less engagement of language processes than is seen in experts. For example, all significant activation areas observed by Siegmund et al. were in the left hemisphere, a majority coinciding with established language regions [1]. Similarly, Floyd et al. found that for experts, programming becomes increasingly less distinguishable from reading, a left-lateralized cognitive activity [3]. In this context, our work helps establish an experience-based lateralization shift: novice programmers generally exhibit bilateral activation, especially in regions associated with visuospatial processing, while expert developers see increasingly left-lateralized activation centered in language-associated regions.

However, not all extant studies of experts observe a strong connection between reading and programming: in their study examining code writing (as opposed to code reading, the focus of other neuroimaging studies including our own), Krueger et al. observed significantly more right-brain activation in spatial areas during code writing than prose writing [7]. More investigation is needed to see if code writing exhibits a similar lateralization trend with expertise. However, it is possible that code writing consistently remains a more spatial activity.

B. Future Directions and Implications

To the best of our knowledge, this is the first paper to use medical imaging to explicitly investigate novice programmer coding brain activation patterns and their correlates. As a result, beyond replications and meta-analyses, which are more common in Psychology (e.g., [56]) but not yet as prevalent in Computer Science, many of the research implications relate to building on the baselines established by the results presented here. We focus on two dimensions: how programmers learn other computing activities at a cognitive level, and how learning programming in general compares to established neurological theories for learning other disciplines.

For the former, we note that recent medical imaging research on programming has focused on program comprehension and code review, with a lesser emphasis on data structures and code writing [1]–[3], [6], [7]. Other activities remain unexplored. To take one example, it is unknown whether boolean logic has a significant spatial cognitive component. While general logic has been studied, particular paradigms, such as circuit design, may be processed differently by humans, potentially suggesting alternate training or tool-support approaches. Other experimental protocol paradigms are also relatively unexplored: almost all software engineering neuroimaging research consists of showing fixed, static stimuli. Even relatively foundational experimental structures in Psychology, such as priming, masking, and recall, are unexplored. For example, building on the influential work of Chase and Simon [57], Psychologists have studied the relationship between chess expertise and the ability to recall or reason about briefly presented random chess boards [58]. While Siegmund et al. randomized aspects of programs to study comprehension [2], using such paradigms to tease apart programming expertise neurologically remains unexplored.

For the latter, we observe that a number of theories and hypotheses about how humans learn various subjects, from second languages [59] to musical instruments [60], have been posited in the literature. We believe that it will be fruitful to investigate whether a sequential model or a more spatial encoding strategy (see Margulieux [24]) best describes learning to program. Based on our results, our preliminary speculation is that spatial encoding is indeed a key general strategy employed by novices that may decrease in importance over time. If true, this would have implications for the use of domain-specific strategies in skills-based training. A more concrete investigation for programming is merited.

VIII. LIMITATIONS AND THREATS TO VALIDITY

Although our experiments and analysis provide significant evidence about novice programmers and spatial ability, our results may not generalize. We consider a number of threats to validity and discuss how our approach mitigates them.

We note that fNIRS experiments are dependent on transmitters and receivers for infrared light (see Section II-A): if no pairs are present for a relevant portion of the brain, activity there cannot be measured. We mitigate this potential source of false negatives in two ways. First, we adapt a validated cap design proposed by Huang et al. for use in software engineering and spatial ability studies [6]. Second, information from other medical imaging approaches, such as fMRI, which do not depend on transmitter placement, is used to determine which brain regions to measure [1], [3], [7].

We also consider issues of construct validity: are we measuring what we claim to be measuring (e.g., spatial ability, introductory programming, etc.)? While there are multiple aspects to spatial ability, we use mental rotation, an established paradigm for investigating spatial ability, both in psychology in general [33], [48], [61] and in computer science in particular [6], [62]. For introductory programming, we make use of the SCS1, a validated assessment [37].

Finally, all of our subjects are students at the same large US university. This aspect of participant selection may limit the generality of our results to other populations.

IX. RELATED WORK

We place our results in context with respect to three broad categories of previous work.

a) Spatial Skills and Programming: There is a positive relationship between programming and spatial ability [28], [29], [63], [64]. Parkinson and Cutts found that “spatial skills typically increase as the level of academic achievement in computer science increases” [29]. Furthermore, Parker et al. found that spatial reasoning is a better mediating variable for affluence discrepancies in computer science than computing access [64]. There have also recently been studies establishing a causal transfer between spatial reasoning training and computer science performance. Cooper et al. and Bockmon et al. ran studies with high school and university programmers, finding that those who participated in additional spatial training performed better on a final programming test [62], [65]. Margulieux’s spatial encoding strategy (SpES) framework relates the cognitive processes behind spatial ability and learning to program [24]. SpES hypothesizes that strong spatial reasoning ability helps novice programmers use general strategies for mentally encoding non-verbal information.

Our results provide context and nuance to such claims: we find that novice programmers do use spatial cognitive processes while programming (RQ1) and that the degree of dissimilarity between patterns of neural activity for coding and spatial tasks can predict final outcomes (RQ3).

b) Reading and Programming: From documentation to code review to requirements elicitation to code summarization, many software engineering activities involve a significant reading component [66]–[68]. Experimentally, several studies report a correlation between overall programming ability and the ability to read a program and describe its function in natural language [69], [70] or posit natural language reading as a basis for code comprehension [71], [72]. Some models extend to training: Fedorenko et al. hypothesize that “pedagogies for developing linguistic fluency” can inform how to train programmers, based on a perceived similarity between learning programming and second language learning [73]. Their hypothesis was supported by a recent study by Prat et al. which found that natural language aptitude was a significant factor in predicting programming success [72].

Our results elaborate on how such claims apply to novice programmers: while novices do use language cognitive processes when coding (RQ1), we find that coding and reading are more different for novices than are coding and spatial ability, suggesting they rely more on spatial reasoning when coding.

c) Medical Imaging and Software Engineering: Following the pioneering work of Siegmund et al. [1], a number of papers have used medical imaging techniques to investigate software engineering activities (e.g., [1]–[5], [7], [19]–[21]). Related to our work in particular, Huang et al. used fNIRS to compare mental rotation tasks to data structure manipulation [6]. A key distinction of our work is that those studies focus on programmers with years of experience. Explicit investigations of programming expertise using neuroimaging are relatively rare, and tend to involve either proxies such as undergraduate grades [3] or comparisons between graduates or professionals and undergraduates (e.g., Siegmund et al. measure 8 students and 3 professionals [2, Sec. 3.3]). While Floyd et al. found that coding and prose tasks are more similar in terms of neural activity for senior undergraduates than for mid-level undergraduates [3] (i.e., as programmers become more experienced), our results provide evidence that the pattern continues: as programmers become less experienced, programming and reading show less cognitive similarity (RQ1, RQ2).

X. CONCLUSION

A neurological understanding of how novices engineer software has implications for training, pedagogy, and tool development. In a study of 31 participants, we use fNIRS to compare the neural activity for introductory programming, reading, and spatial reasoning tasks in a controlled, contrast-based experiment. We find that all three tasks — coding, prose reading, and mental rotation — are mentally distinct for novices. This clarifies previous findings that they may be more similar in experts [3] or for complex data structures [40]. However, while those tasks are neurally distinct, we find more significant and substantial differences between prose and coding than between mental rotation and coding. Intriguingly, we find generally more activation in areas of the brain associated with spatial ability and task difficulty while coding compared to that reported in studies with more expert developers. Finally, in an exploratory analysis, we find that certain patterns of neural activity at the start of the semester are predictive of end-of-semester outcomes, opening the door for future experiments to model such phenomena more directly. To the best of our knowledge, this is the first study to focus specifically on novice programmers and to make use of a significant time-delayed outcomes assessment. While preliminary, these findings both elaborate on previous results (e.g., relating expertise to a similarity between coding and prose reading) and also provide a new understanding of the cognitive processes underlying novice programming.

ACKNOWLEDGEMENTS

We acknowledge the partial support of the NSF (CCF 1908633, CCF 1763674) as well as both the Center for Research on Learning and Teaching and also the Center for Academic Innovation at the University of Michigan. Additionally, we thank Jessica Kim for her help understanding the fNIRS setup, and we thank Yu Huang for sharing the fNIRS cap from her previous work. Finally, we thank our undergraduate research assistants Anne Fitzpatrick, Annie Li, and Serena Chan for their logistical help and their help piloting fNIRS stimuli.

REFERENCES

[1] J. Siegmund, C. Kästner, S. Apel, C. Parnin, A. Bethmann, T. Leich, G. Saake, and A. Brechmann, “Understanding understanding source code with functional magnetic resonance imaging,” in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 378–389.
[2] J. Siegmund, N. Peitek, C. Parnin, S. Apel, J. Hofmeister, C. Kästner, A. Begel, A. Bethmann, and A. Brechmann, “Measuring Neural Efficiency of Program Comprehension,” in Foundations of Software Engineering, 2017, pp. 140–150. [Online]. Available: http://doi.acm.org/10.1145/3106237.3106268
[3] B. Floyd, T. Santander, and W. Weimer, “Decoding the representation of code in the brain: An fmri study of code review and expertise,” in . IEEE, 2017, pp. 175–186.
[4] T. Nakagawa, Y. Kamei, H. Uwano, A. Monden, K. Matsumoto, and D. M. German, “Quantifying programmers’ mental workload during program comprehension based on cerebral blood flow measurement: a controlled experiment,” in Companion proceedings of the 36th international conference on software engineering, 2014, pp. 448–451.
[5] S. Fakhoury, Y. Ma, V. Arnaoudova, and O. Adesope, “The effect of poor source code lexicon and readability on developers’ cognitive load,” in International Conference on Program Comprehension, 2018.
[6] Y. Huang, X. Liu, R. Krueger, T. Santander, X. Hu, K. Leach, and W. Weimer, “Distilling neural representations of data structure manipulation using fMRI and fNIRS,” in International Conference on Software Engineering, 2019, pp. 396–407. [Online]. Available: https://doi.org/10.1109/ICSE.2019.00053

[7] R. Krueger, Y. Huang, X. Liu, T. Santander, W. Weimer, and K. Leach, “Neurological divide: An fmri study of prose and code writing,” in International Conference on Software Engineering, 2020.
[8] R. B. Buxton, K. Uludağ, D. J. Dubowitz, and T. T. Liu, “Modeling the hemodynamic response to brain activation,” Neuroimage, vol. 23, pp. S220–S233, 2004.
[9] D. A. Boas, C. E. Elwell, M. Ferrari, and G. Taga, “Twenty years of functional near-infrared spectroscopy: introduction for the special issue,” 2014.
[10] S. Lloyd-Fox, A. Blasi, and C. Elwell, “Illuminating the developing brain: the past, present and future of functional near infrared spectroscopy,” Neuroscience & Biobehavioral Reviews, vol. 34, no. 3, pp. 269–284, 2010.
[11] A.-C. Ehlis, S. Schneider, T. Dresler, and A. J. Fallgatter, “Application of functional near-infrared spectroscopy in psychiatry,” Neuroimage, vol. 85, pp. 478–488, 2014.
[12] H. Obrig, “NIRS in clinical neurology — a ‘promising’ tool?” Neuroimage, vol. 85, pp. 535–546, 2014.
[13] Scicurious, “IgNobel prize in neuroscience: The dead salmon study,” Scientific American Blog Network, Sep 2012. [Online]. Available: https://blogs.scientificamerican.com/scicurious-brain/ignobel-prize-in-neuroscience-the-dead-salmon-study/
[14] C. M. Bennett, M. Miller, and G. Wolford, “Neural correlates of interspecies perspective taking in the post-mortem atlantic salmon: an argument for multiple comparisons correction,” Neuroimage, vol. 47, no. Suppl 1, p. S125, 2009.
[15] R. N. Henson, C. J. Price, M. D. Rugg, R. Turner, and K. J. Friston, “Detecting latency differences in event-related bold responses: application to words versus nonwords and initial versus repeated face presentations,” Neuroimage, vol. 15, no. 1, pp. 83–97, 2002.
[16] R. Aamand, T. Dalsgaard, Y.-C. Lynn Ho, A. Moller, A. Roepstorff, and T. Lund, “A NO way to BOLD?: Dietary nitrate alters the hemodynamic response to visual stimulation,” NeuroImage, vol. 83, 07 2013.
[17] M. A. Lindquist, J. M. Loh, L. Y. Atlas, and T. D. Wager, “Modeling the hemodynamic response function in fMRI: efficiency, bias and mis-modeling,” Neuroimage, vol. 45, no. 1, pp. S187–S198, 2009.
[18] M. Ferrari and V. Quaresima, “A brief review on the history of human functional near-infrared spectroscopy (fnirs) development and fields of application,” Neuroimage, vol. 63, no. 2, pp. 921–935, 2012.
[19] Y. Ikutani and H. Uwano, “Brain activity measurement during program comprehension with nirs,” in Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. IEEE, 2014, pp. 1–6.
[20] J. Duraes, H. Madeira, J. Castelhano, C. Duarte, and M. C. Branco, “WAP: Understanding the Brain at Software Debugging,” in International Symposium on Software Reliability Engineering, 2016, pp. 87–92.
[21] J. Castelhano, I. C. Duarte, C. Ferreira, J. Duraes, H. Madeira, and M. Castelo-Branco, “The Role of the Insula in Intuitive Expert Bug Detection in Computer Code: An fMRI Study,” Brain Imaging and Behavior, May 2018.
[22] N. Peitek, J. Siegmund, C. Parnin, S. Apel, J. Hofmeister, and A. Brechmann, “Simultaneous Measurement of Program Comprehension with fMRI and Eye Tracking: A Case Study,” in Symposium on Empirical Software Engineering and Measurement, 2018, to appear.
[23] K. Brodmann, Brodmann’s: Localisation in the cerebral cortex. Springer Science & Business Media, 2007.
[24] L. E. Margulieux, “Spatial encoding strategy theory: The relationship between spatial skill and stem achievement,” in Proceedings of the 2019 ACM Conference on International Computing Education Research, ser. ICER ’19, 2019, pp. 81–90.
[25] M. Hegarty and M. Kozhevnikov, “Types of visual–spatial representations and mathematical problem solving.” Journal of educational psychology, vol. 91, no. 4, p. 684, 1999.
[26] J. Wai, D. Lubinski, and C. P. Benbow, “Spatial ability for stem domains: Aligning over 50 years of cumulative psychological knowledge solidifies its importance.” Journal of Educational Psychology, vol. 101, no. 4, p. 817, 2009.
[27] S. Sorby, N. Veurink, and S. Streiner, “Does spatial skills instruction improve stem outcomes? the answer is ‘yes’,” Learning and Individual Differences, vol. 67, pp. 209–222, 2018.
[28] S. Jones and G. Burnett, “Spatial ability and learning to program,” Human Technology: An Interdisciplinary Journal on Humans in ICT Environments, 2008.
[29] J. Parkinson and Q. Cutts, “Investigating the relationship between spatial skills and computer science,” in Proceedings of the 2018 ACM Conference on International Computing Education Research, 2018, pp. 106–114.
[30] D. H. Uttal, N. G. Meadow, E. Tipton, L. L. Hand, A. R. Alden, C. Warren, and N. S. Newcombe, “The malleability of spatial skills: A meta-analysis of training studies.” Psychological bulletin, vol. 139, no. 2, p. 352, 2013.
[31] J. M. Zacks, “Neuroimaging studies of mental rotation: a meta-analysis and review,” Journal of cognitive neuroscience, vol. 20, no. 1, pp. 1–19, 2008.
[32] R. N. Shepard and J. Metzler, “Mental rotation of three-dimensional objects,” Science, vol. 171, no. 3972, pp. 701–703, 1971.
[33] J. C. Culham and N. G. Kanwisher, “Neuroimaging of cognitive functions in human parietal cortex,” Current opinion in neurobiology, vol. 11, no. 2, pp. 157–163, 2001.
[34] C. J. Price, “A review and synthesis of the first 20 years of pet and fmri studies of heard speech, spoken language and reading,” Neuroimage, vol. 62, no. 2, pp. 816–847, 2012.
[35] M. Vigneau, V. Beaucousin, P.-Y. Herve, H. Duffau, F. Crivello, O. Houde, B. Mazoyer, and N. Tzourio-Mazoyer, “Meta-analyzing left hemisphere language areas: phonology, semantics, and sentence processing,” Neuroimage, vol. 30, no. 4, pp. 1414–1432, 2006.
[36] J. Maloney, M. Resnick, N. Rusk, B. Silverman, and E. Eastmond, “The scratch programming language and environment,” ACM Trans. Comput. Educ., vol. 10, no. 4, Nov. 2010. [Online]. Available: https://doi.org/10.1145/1868358.1868363
[37] M. C. Parker, M. Guzdial, and S. Engleman, “Replication, validation, and use of a language independent cs1 knowledge assessment,” in

Proceedings of the 2016 ACM Conference on International ComputingEducation Research , ser. ICER ’16, 2016, p. 93–101.[38] L. Lobato, J. M. Bethony, F. B. Pereira, S. L. Grahek, D. Diemert, andM. F. Gazzinelli, “Impact of gender on the decision to participate ina clinical trial: a cross-sectional study,”

BMC Public Health , vol. 14,no. 1, pp. 1–9, 2014.[39] M. G. Sackrowitz and A. P. Parelius, “An unlevel playing ﬁeld: Womenin the introductory computer science courses,”

ACM SIGCSE Bulletin ,vol. 28, no. 1, pp. 37–41, 1996.[40] Y. Huang, X. Liu, R. Krueger, T. Santander, X. Hu, K. Leach, andW. Weimer, “Distilling neural representations of data structure manipu-lation using fMRI and fNIRS,” in

International Conference on SoftwareEngineering (ICSE) , 2019.[41] H. Santosa, X. Zhai, F. Fishburn, and T. Huppert, “The NIRS brainAnalyzIR toolbox,”

Algorithms , vol. 11, no. 5, May 2018.[42] M. Peters and C. Battista, “Applications of mental rotation ﬁgures of theshepard and metzler type and description of a mental rotation stimuluslibrary,”

Brain and cognition

Proceedings of the 41st ACM technicalsymposium on Computer science education , 2010, pp. 97–101.[45] J. W. Barker, A. Aarabi, and T. J. Huppert, “Autoregressive model basedalgorithm for correcting motion and serially correlated errors in fNIRS,”

Biomedical optics express , vol. 4, no. 8, pp. 1366–1379, 2013.[46] X. Cui, S. Bray, and A. Reiss, “Functional near infrared spectroscopy(NIRS) signal improvement based on negative correlation between oxy-genated and deoxygenated hemoglobin dynamics,”

Neuroimage , vol. 49,no. 4, pp. 3039–3046, 2010.[47] X.-S. Hu, N. Wagley, A. T. Rioboo, A. F. DaSilva, and I. Kovelman,“Photogrammetry-based stereoscopic optode registration method forfunctional near-infrared spectroscopy,”

Journal of Biomedical Optics ,vol. 25.[48] M. S. Cohen, S. M. Kosslyn, H. C. Breiter, G. J. DiGirolamo, W. L.Thompson, A. Anderson, S. Bookheimer, B. R. Rosen, and J. Belliveau,“Changes in cortical activity during mental rotation a mapping studyusing functional mri,”

Brain , vol. 119, no. 1, pp. 89–100, 1996.[49] L. Cohen, O. Martinaud, C. Lemer, S. Leh´ericy, Y. Samson, M. Obadia,A. Slachevsky, and S. Dehaene, “Visual word recognition in the left andright hemispheres: anatomical and functional correlates of peripheralalexias,”

Cerebral cortex , vol. 13, no. 12, pp. 1313–1333, 2003.[50] S. L. Brownsett and R. J. Wise, “The contribution of the parietal lobesto speaking and writing,”

Cerebral Cortex , vol. 20, no. 3, pp. 517–523,2010.

51] M. L. Seghier, “The angular gyrus: multiple functions and multiplesubdivisions,”

The Neuroscientist , vol. 19, no. 1, pp. 43–61, 2013.[52] A. K. Barbey, M. Koenigs, and J. Grafman, “Dorsolateral prefrontalcontributions to human working memory,” cortex , vol. 49, no. 5, pp.1195–1205, 2013.[53] N. Kriegeskorte, M. Mur, and P. A. Bandettini, “Representationalsimilarity analysis-connecting the branches of systems neuroscience,”

Frontiers in systems neuroscience , vol. 2, p. 4, 2008.[54] S.-Y. Chen, Z. Feng, and X. Yi, “A general introduction to adjustmentfor multiple comparisons,”

Journal of thoracic disease , vol. 9, no. 6, p.1725, 2017.[55] R. A. Marks, I. Kovelman, O. Kepinska, M. Oliver, Z. Xia, S. L. Haft,L. Zekelman, P. Duong, Y. Uchikoshi, R. Hancock, and F. Hoeft, “Spo-ken language proﬁciency predicts print-speech convergence in beginningreaders,”

NeuroImage , vol. 201, p. 116021, 2019.[56] P. R. Ventura Jr, “Identifying predictors of success for an objects-ﬁrstcs1,”

Computer Science Education , 2005.[57] H. A. Simon and W. G. Chase, “Perception in chess,”

CognitivePsychology , vol. 4, no. 1, pp. 55–81, 1973.[58] Y. Gong, K. Ericsson, and J. Moxley, “Recall of brieﬂy presentedchess positions and its relation to chess skill,”

PLoS ONE ,vol. 10, no. 3, p. e0118756, 2015. [Online]. Available: https://doi.org/10.1371/journal.pone.0118756[59] I. Kovelman, S. A. Baker, and L. A. Petitto, “Bilingual and monolingualbrains compared: a functional magnetic resonance imaging investigationof syntactic processing and a possible “neural signature” of bilingual-ism,”

Journal of Cognitive Neuroscience , vol. 20, no. 1, pp. 153–169,2008.[60] R. L. Lathrop, “How students learn music: The psychology of music andmusic education,”

Music Educators Journal , vol. 56, no. 6, pp. 47–145,1970.[61] I. M. Harris, G. F. Egan, C. Sonkkila, H. J. Tochon-Danguy, G. Paxinos,and J. D. Watson, “Selective right parietal lobe activation during mentalrotation: a parametric pet study,”

Brain , vol. 123, no. 1, pp. 65–73, 2000.[62] R. Bockmon, S. Cooper, W. Koperski, J. Gratch, S. Sorby, andM. Dorodchi, “A cs1 spatial skills intervention and the impact onintroductory programming abilities,” in

Proceedings of the 51st ACMTechnical Symposium on Computer Science Education , 2020, pp. 766–772.[63] S. Fincher, A. Robins, B. Baker, I. Box, Q. Cutts, M. de Raadt, P. Haden,J. Hamer, M. Hamilton, R. Lister et al. , “Predictors of success in a ﬁrstprogramming course,” in

Proceedings of the 8th Australasian ComputingEducation Conference (ACE 2006) , vol. 52. Australian ComputerSociety Inc., 2006, pp. 189–196.[64] M. C. Parker, A. Solomon, B. Pritchett, D. A. Illingworth, L. E.Marguilieux, and M. Guzdial, “Socioeconomic status and computerscience achievement: Spatial ability as a mediating variable in a novelmodel of understanding,” in

Proceedings of the 2018 ACM Conferenceon International Computing Education Research , ser. ICER ’18, 2018,p. 97–105.[65] S. Cooper, K. Wang, M. Israni, and S. Sorby, “Spatial skillstraining in introductory computing,” in

International ComputingEducation Research , 2015, pp. 13–20. [Online]. Available: https://doi.org/10.1145/2787622.2787728[66] X. Xia, L. Bao, D. Lo, Z. Xing, A. E. Hassan, and S. Li, “Measuringprogram comprehension: A large-scale ﬁeld study with professionals,”

IEEE Transactions on Software Engineering , vol. 44, no. 10, pp. 951–976, 2017.[67] S. Haiduc, J. Aponte, L. Moreno, and A. Marcus, “On the use ofautomated text summarization techniques for summarizing source code,”in . IEEE, 2010,pp. 35–44.[68] P. W. McBurney and C. McMillan, “Automatic source code summa-rization of context for java methods,”

IEEE Transactions on SoftwareEngineering , vol. 42, no. 2, pp. 103–119, 2015.[69] L. Murphy, S. Fitzgerald, R. Lister, and R. McCauley, “Ability to’explainin plain english’linked to proﬁciency in computer-based programming,”in

Proceedings of the ninth annual international conference on Interna-tional computing education research , 2012, pp. 111–118.[70] M. Lopez, J. Whalley, P. Robbins, and R. Lister, “Relationships betweenreading, tracing and writing skills in introductory programming,” in

Pro-ceedings of the fourth international workshop on computing educationresearch , 2008, pp. 101–112. [71] T. Busjahn, C. Schulte, and A. Busjahn, “Analysis of code reading togain more insight in program comprehension,” in

Proceedings of the11th Koli Calling International Conference on Computing EducationResearch , 2011, pp. 1–9.[72] C. S. Prat, T. M. Madhyastha, M. J. Mottarella, and C.-H. Kuo,“Relating natural language aptitude to individual differences in learningprogramming languages,”

Scientiﬁc reports , vol. 10, no. 1, pp. 1–10,2020.[73] E. Fedorenko, A. Ivanova, R. Dhamala, and M. U. Bers, “The languageof programming: a cognitive perspective,”

Trends in cognitive sciences ,vol. 23, no. 7, pp. 525–528, 2019.,vol. 23, no. 7, pp. 525–528, 2019.