[PDF] Demixed shared component analysis of neural population data from multiple brain areas

Abstract

Recent advances in neuroscience data acquisition allow for the simultaneous recording of large populations of neurons across multiple brain areas while subjects perform complex cognitive tasks. Interpreting these data requires us to index how task-relevant information is shared across brain regions, but this is often confounded by the mixing of different task parameters at the single neuron level. Here, inspired by a method developed for a single brain area, we introduce a new technique for demixing variables across multiple brain areas, called demixed shared component analysis (dSCA). dSCA decomposes population activity into a few components, such that the shared components capture the maximum amount of shared information across brain regions while also depending on relevant task parameters. This yields interpretable components that express which variables are shared between different brain regions and when this information is shared across time. To illustrate our method, we reanalyze two datasets recorded during decision-making tasks in rodents and macaques. We find that dSCA provides new insights into the shared computation between different brain areas in these datasets, relating to several different aspects of decision formation.

Full PDF

DDemixed shared component analysis of neuralpopulation data from multiple brain areas

Yu Takagi ∗ University of Oxford, UK [email protected]

Steven W. Kennerley

University College London, UK [email protected]

Jun-ichiro Hirayama † AIST, Japan [email protected]

Laurence T. Hunt † University of Oxford, UK [email protected]

Abstract

Recent advances in neuroscience data acquisition allow for the simultaneous record-ing of large populations of neurons across multiple brain areas while subjectsperform complex cognitive tasks. Interpreting these data requires us to indexhow task-relevant information is shared across brain regions, but this is oftenconfounded by the mixing of different task parameters at the single neuron level.Here, inspired by a method developed for a single brain area, we introduce anew technique for demixing variables across multiple brain areas, called demixedshared component analysis (dSCA). dSCA decomposes population activity into afew components, such that the shared components capture the maximum amountof shared information across brain regions while also depending on relevant taskparameters. This yields interpretable components that express which variablesare shared between different brain regions and when this information is sharedacross time. To illustrate our method, we reanalyze two datasets recorded duringdecision-making tasks in rodents and macaques. We ﬁnd that dSCA provides newinsights into the shared computation between different brain areas in these datasets,relating to several different aspects of decision formation.

Recent methodological advances make it possible to record thousands of neurons simultaneously[1]. Although such high-dimensional recording yields insights that are not apparent from studyingsingle neuron activity, analysing population data remains a non-trivial problem because of theheterogeneity of responses and ‘mixing’ of encoded variables observed in neural data [2]. Whiletraditionally such heterogeneity was discarded by simply averaging across neurons, more recentlyseveral dimensionality reduction methods have been developed for neurophysiological data thatisolate key features of the population response structure [3]. One popular approach is demixedprincipal component analysis (dPCA; [4, 5]). dPCA identiﬁes components that explain maximumpopulation variance while also depending upon key task parameters. This allows experimenters tocombine rich population recordings with equally rich task design, as dPCA isolates low-dimensionalcomponents that vary along axes deﬁned by features of the experimental task.Besides the number of neurons recorded, neuroscientists are also experiencing a revolution in thenumber of brain areas recorded. This is fascinating because the brain is a connected system comprised ∗ Corresponding author. † These authors contributed equally.Preprint. Under review. a r X i v : . [ q - b i o . N C ] J un f functionally specialised areas that interact with each other to produce complex behaviors. Recently,researchers have started to investigate interactions between populations of neurons in distinct areasfor motor control [6], visual processing [7] or perceptual decision making [8]. Similar to studiesexamining population responses in a single brain region, these studies have applied dimensionalityreduction methods to examine information shared across regions - including principal componentanalysis (PCA; [6]), reduced rank regression (RRR; [7]), or canonical correlation analysis (CCA;[8]).However, while these approaches all yield important insights into cross-regional communication,they cannot identify when and what task parameters are shared across regions. This limits ourunderstanding of how information is shared across regions during cognitive tasks. It is known that thetask parameters are mixed at the level of single neuron [9] or low-dimensional components obtainedby standard dimensionality reduction methods [10]. Thus, it is important to properly demix eachof closely related but distinct task parameters, for example, those encoding decision input, choiceformation and motor output during decision making tasks [11, 12]. It is also important to identifyprecise timing of communications because relevant cross-areal communications may only occur atany speciﬁc timing during the entire task-related processing.Here, inspired by the approach taken by dPCA, we propose a method for identifying shared com-ponents across two areas that is speciﬁc to a task parameter of interest in a time-resolved manner.The key idea is to ‘marginalize’ neural population activity in a single area to demix a speciﬁc taskparameter of interest, while maximizing the information shared by the two areas with a time lag. Wecall this procedure demixed shared component analysis (dSCA). Our contributions are summarized as follows:1. We brieﬂy review previous methods, emphasizing that they cannot identify what and whentask parameters are shared across brain regions.2. Motivated by this fact, we propose dSCA to isolate the contribution of a task parameter ofinterest to the shared information between different brain areas. Using a simulation analysisin Sect. 3 we show that dSCA ﬁnds shared components that are demixed into speciﬁc taskparameters of interest in a time-resolved manner, while previous CCA or RRR fail.3. We reanalyze two previously published decision making datasets [8, 13] and show that dSCAcaptures shared components among different brain areas that is speciﬁc to task parameterssuch as decision input, stimulus valuation, attentional reorienting and choice formation.

Interaction of populations of neurons across different brain areas.

Early studies investigatedinteractions of different brain areas in different scales: pairs of single neurons (e.g. [14]), populationsof neurons in one area and a single neuron in another (e.g. [15]), neurons in one area and the localﬁeld potential (LFP) in another (e.g. [16]), and LFPs in different areas (e.g. [17]). In recent years,some researchers have started to investigate interaction of neuronal populations between brain areas[6, 7, 8]. They commonly applied linear dimensionality reduction methods to study the relationshipbetween neural populations in different areas (Figure 1c, top). For example, Kaufman et al. (2014)[6] applied PCA to each area separately, then regressed from one area to the other area (an approachsometimes referred to as principal component regression). However, the components obtained byPCA that explain maximal variance separately in each area will not necessarily align with those thatwould explain maximal shared information between the two areas.More recent studies have used alternative techniques such as RRR [7] and CCA [8] (Figure 1d, top)to directly ﬁnd low-dimensional latent components from two populations of neurons (Figure 1e,top), optimized for identifying interaction between them. In particular, Steinmetz et al. (2019) [8]also proposed a time-resolved approach, by applying CCA exhaustively on all possible pairs of timepoints/windows between one area and the other area activities (Figure 1f, top). The method, calledjoint peri-event canonical correlation (jPECC) analysis, identiﬁes when cross-regional interactionoccurs from one region to another. However, none of the above approaches provide results that aredemixed by experimental conditions. This means that none of these analyses could identify ( ‘what’ )information was being shared across regions. All codes for the analyses will be published after acceptance. emixing task parameters. It is well known that neurons have mixed selectivity, where a neuronreﬂects more than one task parameter [9]. This is still true for the component after applying standarddimensionality reduction method [10]. It is a critical problem for researchers who are interestedin cognitive processing during a complex experiment, because different types of computations runsimultaneously in the brain, and it is important to be able to dissociate computations pertaining toone task parameter from the others. Several approaches have been proposed (e.g., [4, 5, 10]). Amongsuch approaches, dPCA [4, 5] is one of the most popular methods because of its simplicity. dPCAcombines regression with dimensionality reduction to demix task parameters of interest. However,this demixing is performed on neurons from a single region at a time, rather than capturing sharedinformation across regions.

We begin with a typical neuroscience experiment that motivated us to develop dSCA. In the experi-ment, animals are trained on a set of stimuli to make decisions in order to maximize total rewards(Figure 1a, top). For example, animals earn rewards at the end of each task trial if they choose anappropriate action (

Decision ) depending on the visual stimulus (

Stimulus ) presented after a ﬁxa-tion period. Each trial is thus labelled with the two task parameters, Decision and Stimulus, bothdiscrete-valued. Single neuron activities are measured during a series of task trials using an implantedelectrode array or probe in the brain. In recent years, such multielectrode technologies have been usedfor recording populations of neurons simultaneously across multiple brain regions (Figure 1b). Insome instances, however, data from different electrodes may be recorded in different sessions, and a‘pseudopopulation’ is reconstructed by ﬁrst averaging across task parameters and then concatenatingneurons across recording sessions. We will describe this procedure in detail later.Our goal is to investigate both the content ( ‘what’ ) and timing ( ‘when’ ) of task-related communicationsamong multiple brain areas based upon the measured activities of neuronal populations. We ﬁrstassume that the entire population was measured simultaneously from area ‘ X ’ and area ‘ Y ’ (but notethat our approach generalises to non-simultaneous pseudopopulation recordings below). For eacharea, we thus observe M × T × N arrays of ﬁring rates, where M denote numbers of neurons inthe area, T denotes the number of trials, and N denotes the length of timeseries of a particular timewindow of interest (e.g. 0-500ms after the stimulus onset).Although existing approaches for cross-areal interaction analysis (see Related Work) can detectlow-dimensional representations of shared information between different brain areas, they cannot giveinsights into the type of information in relation to the task parameters of interest. This is because theseapproaches use only the data (ﬁring rate) matrices without taking into account how task parameterscause changes in these matrices. As a result, obtained components from these approaches are ingeneral mixed in terms of the task parameters (Figure 1e, top).This is problematic if we want to study how task-relevant information is passed between brain regionsas a cognitive process unfolds. Consider, for example, a decision task in which sensory information ispassed in a bottom up sweep from lower to higher cortical areas, but the categorical choice emergesin a distributed fashion across multiple layers in the cortical hierarchy (e.g. [18, 19]; see Figure 1cfor simple schematic). Applying standard methods to this data will not differentiate between sensoryinput causing covariation between brain regions’ activity on the one hand, and the emergence of thedecision process on the other.To overcome the problem of what information is shared between regions, we propose to combine theidea of demixing [5] with cross-areal interaction analysis using CCA/RRR. We use demixed SharedComponent Analysis (dSCA) to refer to a family of methods that combine these two principles.We focus on RRR for ease of exposition, but note that RRR reduces to CCA if the target matrix iswhitened [20] and thus the framework can unify both techniques.First, recall that standard least-squares RRR minimizes the following: L RRR = k Y − WX k where X is the area X ’s data matrix of size M X × T , Y is the area Y ’s data matrix of size M Y × T ,and W is coefﬁcient matrix of size M X × M Y , and the rank of W is constrained not to be greaterthan K ( < min ( M X , M Y )) ; M X and M Y denote their respective numbers of neurons , and the3 ource ( X T ) Target ( Y T ) Neurons Neurons T r i a l s T r i a l s S ou r c e N eu r on s T a r ge t N eu r on s Mixed dbc e f

Demixed a Time of Area XTime of Area Y

Time of area X

Significant relationship T i m e o f a r ea Y Stimulus-relatedDecision-related

Stimulus onset(e.g. stimlus A, B or C) Decision(e.g. choose right or left) jPECC: Repeat CCA/RRR at multiple timepointsjPESC: Repeat dSCAat multiple timepoints

Fixation Area X Area Y CCARRR

Neurons T i m e T r i a l s Neurons T i m e T r i a l s e.g. early sensory cortex e.g. prefrontal cortex Target( Y T Stimulus : Marginalized by stimulus)Target( Y T Decision : Marginalized by decision)

Right choiceaveragingStimulus AaveragingStimulus BaveragingStimulus CaveragingLeft choiceaveraging dSCAdSCAStimulus Decision

Figure 1: dSCA can identify content and timing of task-related communications among mul-tiple brain areas. a

Schematic of typical timeline of a trial in a decision making experiment inneuroscience. b The data are represented as sequences of populations of neurons matrices of area X (e.g. early sensory cortex) and area Y (e.g. prefrontal cortex). c area X (top) and area Y (bottom)communicate task-related information each other. d Schematic of the analysis for two populationsof neurons from different brain areas, X and Y . Conventional approaches (CCA or RRR) estimaterelationship between source and target populations of neurons without taking task parameters intoaccount (top). In contrast, dSCA applies marginalization to the target matrix by averaging acrossspeciﬁc task-parameter of interests (middle: marginalization for stimulus; bottom: marginalizationfor decision). In the marginalization procedure, for each neuron and each task parameter, trials havingthe same level of the task parameter have the same values (average across the same-level trials).Cells with the same colours indicate the same values. e Schematic of the obtained components fromCCA/RRR (top) and dSCA (middle and bottom). Each circle indicates a neuron or a component,and the width of an arrow indicates estimated weights. The component obtained from CCA/RRR ismixed, whereas the components obtained via dSCA are demixed in terms of the task parameter. f Top: Schematic of joint peri-event canonical correlation (jPECC) analysis. Results obtained fromCCA/RRR may not dissociate two task parameters from one another. Using dSCA yields jointperi-event shared component (jPESC) for stimulus (middle) and decision (bottom) are dissociated.Coloured areas indicate signiﬁcant relationship between two areas at the pair of time-points.number of columns in X and Y equals the number of trials T in our time-resolved setting while itmay vary depending on the context. The constrained minimization can be solved using the singularvalue decomposition: W RRR = W OLS VV T where W OLS is the ordinary least-squares solution and the columns of the M X × K matrix V contain the top K principal components of the optimal linear predictor ˆY OLS = XW OLS .To properly demix the effect of task parameters within CCA/RRR, we now make a key changeto the above framework based on the so-called ’marginalization’ procedure [5]. The idea is toreplace every column of raw target matrix Y with its conditional expectation, estimated empirically,given the corresponding realization of a speciﬁc task parameter of interest (Figure 1d, middle andbottom). For example, in the running example above, there can be 3 possible stimuli and 2 possibledecisions. Marginalization to demix ‘stimuli’ yields Y Stimulus having identical columns (trials) ifthey correspond to the same type of stimulus irrespective to the type of decision (Figure 1d, middle).Similarly, marginalization for ‘decision’ yields Y Decision with identical columns depending only onthe type of decision for each trial (Figure 1d, bottom). We will generically write the marginalizedmatrix as Y m , where m can be any task parameter of interest, such as stimulus or decision, containing N m possible levels. 4e refer to the resultant analysis framework as dSCA. After solving the CCA/RRR, if Y m cansigniﬁcantly be associated with the source activity via the low-rank representations, it indicates thatareas X and Y share information that is relevant to the task parameter m of interest. For example,if we are interested in the stimulus-related information sharing between areas X and Y , we willmarginalize the target matrix by stimulus, thus we use Y Stimulus as a target matrix. Then, applyingRRR as above, we can ﬁnd an optimal low-rank representation of the source populations of neuronsfor predicting the target area’s variance that is related to the stimulus information (Figure 1e, middleand bottom). Thus, dSCA explicitly takes task parameters into account, which is the crucial differencefrom related previous methods.Note that in our framework, we only marginalize the target, with the ‘source’ matrix X being leftintact. As discussed in [5], the underlying idea is that while the marginalized target can eliminatetask-irrelevant variability by marginalization, one can still employ full information in the sourcepopulations of neurons. In fact, one may apply marginalization to source matrix as well in addition tothe target, which gives a simple variation of dSCA. However, this reduces the effective sample sizerather drastically (as the column pairs in X m and Y m then contain many duplicates). We empiricallyfound that marginalization of both matrices often leads to less accurate results (see Section 4.1 forsimulation analysis).So far, we have assumed that all the neurons are measured simultaneously, but this is not alwaysthe case. Fortunately, when we do not record all neurons simultaneously, we can still apply thesame technique by making use of the concept of ‘pseudopopulations’, as discussed in [5]. For eachneuron, we ﬁrst compute summary statistics for all possible combinations of task parameters, calledperi-stimulus time histogram or PSTH. To calculate PSTH, we will average each neuron’s ﬁring rateover trials for each possible task parameter, in order to estimate the neuron’s time-dependent ﬁringrate. For example, suppose that we have two task parameters, stimulus and decision. In this case, wewill average over trials for each stimulus s (out of S ) and decision d (out of D ). Speciﬁcally, we willuse two matrices of X ∈ R M X × C and Y ∈ R M Y × C , where C = S × D .To address the question of when information is shared between the two regions, we follow the jPECCapproach introduced in Steinmetz et al. (2019) [8]. We repeat the dSCA analysis at all possiblecombinations of time-bins between areas X and Y . This yields a N -by- N peri-event matrix for eachdemixed shared component (Figure 1f), which we refer to as the joint peri-event shared components(jPESC). When performing this lagged analysis, we assume that whichever neuronal population iscurrently earlier in time is the ‘source’ matrix, and whichever population is later in time is the ‘target’matrix. Synthetic data

To illustrate that dSCA can detect shared components between two areas that isspeciﬁc to a task parameter of interest, we generated two simulated neuronal population datasets.Suppose that we simultaneously recorded from two brain areas, X and Y , during the experimentin which two task parameters exist: stimulus ( S ) and decision ( D ). The neural population in X is affected by variation in the stimulus, as shown in Figure 1c. Then, the population of neuronsin X passes stimulus information to another population of neurons in Y with some time delay. Adecision computation then arises in the population of neurons Y , such that it begins to encode thecategorical choice of the animal, which is passed back to populations of neurons in X . We also addedindependent random noise to populations of neurons in X and Y (see Supplementary Material).We ﬁrst conducted jPECC on the simulated data, applying standard CCA and RRR to X and Y with different time point pairs (Figure 2a), as was done in Steinmetz et al. (2019) [8]. Each pixel inthe resulting matrix represents the strength of cross-validated correlation between the ﬁrst pair ofcanonical variables at a given latency for CCA, or explained variance, − ( Y m − ˆ Y m ) / Var ( Y m ) ,for RRR. Although jPECC can reveal that these two areas share information (signiﬁcant clustersare shown; P < . , cluster-based permutation test, corrected for multiple comparisons; seeSupplementary Material for details), it does not tell us what types of information is being shared(Figure 2b). Given that both CCA and RRR provide similar results in the simulation analyses, wewill use CCA for the following analyses of real datasets. Note that, if we used explained variance asa metric for CCA, we obtained similar results to using correlation (see Supplementary Figure 1).5 Stimulus onset a From X to YTime of area Y T i m e o f a r ea X From Y to X c CCA

RRR E x p l a i ned v a r i an c e E x p l a i ned v a r i an c e C o rr e l a t i on c oe ff i c i en t dSCA-Stimulus dSCA-Decision

01 -1-0.5 -1-0.5

Figure 2:

Simulation demonstrates that dSCA decomposes information sharing between dif-ferent brain regions into different task-relevant parameters. a

Schematic of joint peri-eventanalysis. b Results obtained from CCA (left) and RRR (right) show similar results. c Results obtainedfrom dSCA for stimulus (left) and decision (right).We next applied jPESC to X and Y , to obtain a similar time-resolved matrix but using dSCA ratherthan CCA/RRR. We focus on two main task parameters: stimulus ( S ) and decision ( D ). In contrast toCCA, we used the marginalized matrix as the target, rather than the raw population matrix. Figure 2cshows that dSCA can clearly dissociate the shared information related to the distinct task parametersin populations of neurons in area X and Y ( P < . , cluster-based permutation test, corrected formultiple comparisons). Note that, although we used the raw matrix as the source matrix for dSCA,we could also use the marginalized source matrix (as described in section 2.2). We found that such aversion of dSCA in which both X and Y are marginalized provides similar, but less accurate resultsthan the results from using the raw source matrix (see Supplementary Figure 2). Therefore, we usedthe raw matrix as the source matrix for the following analyses of real datasets. Perceptual decision making

Next, we reanalyzed a perceptual decision making dataset for vali-dating the use of dSCA to demix task parameters of interest. We used the dataset that was publishedin Steinmetz et al. (2019) [8] . On each trial of the experiment (Figure 3a), visual stimuli of varyingcontrast could appear on the left side, right side, both sides or neither side. Mice earned a waterreward by turning a wheel with their forepaws to indicate which side had the higher contrast. Here,we exclude NoGo trials, where neither stimulus was present and mice should not move.In the original study, the authors analyzed interactions of neural populations using jPECC amongthree different subregions: visual cortex, frontal cortex, and midbrain (Figure 3b); see Steinmetz etal. (2019) for precise anatomical locations included in each of these subregions [8]. They analyzedneural activities that were recorded at relative to movement between visual and frontal cortex (Figure3c, top), visual cortex and midbrain (Figure 3c, middle), and frontal cortex and midbrain (Figure 3c,bottom). Their results revealed the latency at how information is shared between these subregions,but not which task parameter was being shared. We preprocessed the dataset exactly as in the originalstudy (see Supplementary Material for the details of preprocessing) and our results with standardjPECC replicated their previous ﬁndings (Figure 3c; signiﬁcant clusters are shown;

P < . ,cluster-based permutation test, corrected for multiple comparisons). Note that, for each combination,we analysed data from the three sessions with the largest number of completed trials; this is becausethere was a substantial variation in terms of the number of completed trials between sessions, and acertain amount of trials are necessary for reliable estimation by dSCA. Qualitatively similar (albeitweaker) results could be obtained from all sessions, including those with fewer trials (SupplementaryFigure 3).To obtain demixed results, we applied dSCA to the same dataset. Here, we focus on two main taskparameters: stimulus and decision. Stimulus is deﬁned as the strength of left coherence (1, 0.5,0.25 or 0) minus strength of right coherence (1, 0.5, 0.25 or 0) for each trial. Decision is deﬁned asthe mouse’s choice for each trial (Left or Right). In addition to applying dSCA to raw populationmatrices, we also applied dSCA to matrices after regressing out either stimulus or decision fromthese matrices, because these two task parameters are correlated to some extent. Figure 3d showsthat for all combinations of brain regions, information sharing is clearly decomposed into stimulus-and decision-related components (signiﬁcant jPESC clusters are shown; P < . , cluster-basedpermutation test, corrected for multiple comparisons). We can also see different types of time lag. Forexample, between midbrain and frontal cortex (bottom row), the shared stimulus-related componentis lagged (occurring earlier in midbrain than in frontal cortex); by contrast, the decision-relatedcomponent emerged in parallel between them (directional arrows indicate signiﬁcant time delay,6 b c d CCA dSCA-Stimulus dSCA-Decision dSCA-Stimulus(Decision controlled) dSCA-Decision(Stimulus controlled)Stimulus onset Decision(Movement)Time-window of analysis(locked to movement onset)Turn left . n.s.n.s. -1-0.95 -200 0 -200 0 -200 0 -200 0 Time of Visual cortexfrom movement onset (ms) T i m e o f F r on t a l c o r t e x f r o m m o v e m en t on s e t ( m s ) Time of Visual cortexfrom movement onset (ms) T i m e o f M i db r a i n f r o m m o v e m en t on s e t ( m s ) Time of Midbrainfrom movement onset (ms) T i m e o f F r on t a l c o r t e x f r o m m o v e m en t on s e t ( m s ) Visual → FrontalMidbrain → VisualMidbrain → Frontal Midbrain ⇆ Frontal Midbrain → FrontalVisual → Frontal-2000 01-2000 -200 0-2000

Midbrain → FrontalMidbrain → VisualVisual → Frontal

Midbrain → FrontalVisual → MidbrainVisual → FrontalVisual → Midbrain

Movement onset C o rr e l a t i on c oe ff i c i en t E x p l a i ned v a r i an c e Figure 3:

DSCA reveals strong double-dissociation of information sharing between two taskparameters that are not apparent by CCA. a

Schematic of mice turning a wheel to indicatewhich of two visual gratings had higher contrast. b Schematic of task sequence. We focus on thetime-window of -300 – 100 ms after decision (movement onset). c Results obtained from CCA.Directional arrows indicate signiﬁcant time lag, whereas bidirectional arrows indicate no signiﬁcanttime lag. d Results obtained from dSCA for sensory inputs (1st column), movement direction (2ndcolumn), sensory inputs that controlled movement direction (3rd column), and movement directionthat controlled sensory inputs (4th column). n.s. corresponds to no signiﬁcant cluster.whereas bidirectional arrows indicate non time delay;

P < . ; see Supplementary Material fordetails of the statistical inference). This underscores the unique contribution of dSCA in allowing usto observe when and what types of information is shared across different brain areas. Economic decision making

Finally, we applied dSCA to recordings from a macaque monkeyperforming an attention-guided information search and economic choice task. The data used herewere previously published in Hunt et al. (2018) [13]. On each trial, a monkey made an instructedsaccade toward a highlighted location to reveal a picture cue. The cue indicated to the monkey eitherthe probability or magnitude of reward that would be available from a subsequent choice towards thatspatial location. After 300ms of uninterrupted ﬁxation, cue 1 was covered, and another cue locationwas highlighted. Full details of the information search and choice structure of the task can be foundin Hunt et al. (2018) [13]; here, we only focus on the time when cue 1 was ﬁrst attended to.The authors recorded from three prefrontal cortex (PFC) subregions (anterior cingulate cortex [ACC],orbitofrontal cortex [OFC], and dorsolateral prefrontal cortex [DLPFC]), and analyzed each areaseparately in their original paper. The authors previously analysed these data only within-region, andcapitalized on neuronal heterogeneity to assessing population-level encoding of cue value and spatiallocation, amongst other variables. Although they found a strong dissociation between the three PFCsubregions in the degree of population encoding, all subregions had some encoding of both value andspace. We therefore sought to use dSCA to identify how value and space information was sharedbetween the three subregions, timelocked to the presentation of the stimulus.As with the previous analyses, we ﬁrst applied CCA to the combinations of ACC, OFC and DLPFC.We preprocessed the dataset exactly the same way as the original study. Note that, unlike the datasetpublished in Steinmetz et al. (2019), we performed the analysis on ‘pseudopopulations’ because notall neurons were simultaneously recorded (see Supplementary Material). Figure 4c applies standardjPECC to the data, and shows that all pairs of PFC subregions shared information after cue 1 onset(signiﬁcant clusters are shown;

P < . , cluster-based permutation test, corrected for multiplecomparisons). However, again, these results cannot tell us what type of information is being shared.To obtain demixed results, we next applied dSCA to these pairs of PFC subregions. We focused ontwo main task parameters in the task: space and attended-value. Space is deﬁned as the direction ofcue 1 (Left Option or Right Option). Attended-value is deﬁned as the magnitude (value or probability)of cue 1 (1, 2, 3, 4 or 5). As spatial position and value are orthogonal by task design, there is no needto regress these out of the data prior to marginalisation as in Figure 3.7 b Time-window of analysis

SaccadeFixation

Cue 1 openCue 1 availableSaccade

ClosedAvailable c d

CCA dSCA-Value dSCA-Space

Cue 1 openTime of OFC (ms) n.s.

Time of DLPFC (ms) T i m e o f A CC ( m s ) C o rr e l a t i on c oe ff i c i en t E x p l a i ned v a r i an c e T i m e o f A CC ( m s ) Time of OFC (ms) T i m e o f D L P F C ( m s ) ACC ⇆DLPFC

ACC →DLPFC

OFC ⇆DLPFC

ACC ⇆OFC

OFC →DLPFC

ACC →DLPFC

Figure 4: dSCA reveals that value-related information is strongly shared between OFC-ACC,emerging simultaneously across both subregions, whereas space-related information is sharedbetween DLPFC and the other regions with a time lag. a

Schematic of task. Monkey chosebetween a left and right option (dotted rectangles), after sequentially sampling cues that revealedreward probability and magnitude. Gray squares indicate locations available for sampling. b Schematic of task sequence. We focus on the time-window of -200 – 600 ms after cue 1 onset. c jPECC results obtained from CCA. Directional arrows indicate signiﬁcant time lag, whereasbidirectional arrows indicate no signiﬁcant time lag. d jPESC obtained from dSCA for value (left)and space (right). n.s. corresponds to a no signiﬁcant cluster.Figure 4d shows that, again, the jPESC with dSCA captures several characteristics that are notapparent from the jPECC with CCA. Between ACC and DLPFC, value- and space-related sharedcomponents was strongly dissociated after cue 1 onset ( P < . , corrected for multiple comparisons,cluster-based permutation test, only signiﬁcant clusters are shown; see Supplementary Material fordetails); between ACC and OFC, we can see the strongest value-related shared components amongall combinations, whereas no space-related shared components emerged; space-related componentsin ACC/OFC just after stimulus onset were sustained in DLPFC, whereas value-related computationemerged relatively in parallel later ( P < . ; see Supplementary Material for details).In summary, compared to previous methods, dSCA can provide us with insight into how informationis shared in terms across brain regions, in terms of a speciﬁc task parameter of interest. We proposed dSCA, a new technique for analysing populations of neurons obtained from differentbrain areas. Unlike previous methods, dSCA decomposes population activities into a few componentsto ﬁnd a low-rank approximation that maximizes the information shared by multiple brain areaswith a time lag, while also depending on a relevant task parameter. We demonstrated that dSCAcan reveal task speciﬁc shared components that are overlooked by conventional approaches usingsimulation and two previously published neuroscience datasets. We believe dSCA will be usefulfor neuroscientists who will have a large amount of data from different brain areas during complexcognitive experiments.Future research topics include: (i) extending dSCA to deal with more than two brain areas; (ii)investigating the characteristics of components obtained from dSCA at the level of single-trial, forexample, what is the behavioural difference between trials in which value-related information isshared between ACC-OFC and trials in which value-related information is not shared between them;8nd (iii) applying dSCA to human neuroimaging data measured by magnetoencephalography, orelectrical potentials measured by electrocorticography.

Broader Impact (a)

Who may beneﬁt from this research.

Neuroscientists who will investigate interactionamong multiple brain areas in terms of speciﬁc task parameter of interest.(b)

Who may be put at disadvantage from this research.

Not applicable to our research.(c)

What are the consequences of failure of the system.

Not applicable to our research.(d)

Whether the task/method leverages biases in the data.

Not applicable to our research.

Acknowledgments and Disclosure of Funding

The authors declare no competing interests.

References [1] James J. Jun, Nicholas A. Steinmetz, Joshua H. Siegle, Daniel J. Denman, Marius Bauza, BrianBarbarits, Albert K. Lee, Costas A. Anastassiou, Alexandru Andrei, Çaˇgatay Aydin, MladenBarbic, Timothy J. Blanche, Vincent Bonin, João Couto, Barundeb Dutta, Sergey L. Gratiy,Diego A. Gutnisky, Michael Häusser, Bill Karsh, Peter Ledochowitsch, Carolina Mora Lopez,Catalin Mitelut, Silke Musa, Michael Okun, Marius Pachitariu, Jan Putzeys, P. Dylan Rich,Cyrille Rossant, Wei Lung Sun, Karel Svoboda, Matteo Carandini, Kenneth D. Harris, ChristofKoch, John O’Keefe, and Timothy D. Harris. Fully integrated silicon probes for high-densityrecording of neural activity.

Nature , 551(7679):232–236, 2017.[2] Stefano Fusi, Earl K. Miller, and Mattia Rigotti. Why neurons mix: High dimensionality forhigher cognition.

Current Opinion in Neurobiology , 37:66–74, 2016.[3] John P. Cunningham and Byron M. Yu. Dimensionality reduction for large-scale neuralrecordings.

Nature Neuroscience , 17(11):1500–1509, 2014.[4] Wieland Brendel, Ranulfo Romo, and Christian K. MacHens. Demixed principal componentanalysis.

Advances in Neural Information Processing Systems 24: 25th Annual Conference onNeural Information Processing Systems 2011, NIPS 2011 , pages 1–9, 2011.[5] Dmitry Kobak, Wieland Brendel, Christos Constantinidis, Claudia E. Feierstein, Adam Kepecs,Zachary F. Mainen, Xue Lian Qi, Ranulfo Romo, Naoshige Uchida, and Christian K. Machens.Demixed principal component analysis of neural population data. eLife , 5(APRIL2016):1–36,2016.[6] Matthew T. Kaufman, Mark M. Churchland, Stephen I. Ryu, and Krishna V. Shenoy. Corticalactivity in the null space: Permitting preparation without movement.

Nature Neuroscience ,17(3):440–448, 2014.[7] João D. Semedo, Amin Zandvakili, Christian K. Machens, Byron M. Yu, and Adam Kohn.Cortical Areas Interact through a Communication Subspace.

Neuron , 102(1):249–259.e4, 2019.[8] Nicholas A. Steinmetz, Peter Zatka-Haas, Matteo Carandini, and Kenneth D. Harris. Distributedcoding of choice, action and engagement across the mouse brain.

Nature , 576(7786):266–273,2019.[9] Mattia Rigotti, Omri Barak, Melissa R. Warden, Xiao Jing Wang, Nathaniel D. Daw, Earl K.Miller, and Stefano Fusi. The importance of mixed selectivity in complex cognitive tasks.

Nature , 497(7451):585–590, 2013.[10] Valerio Mante, David Sussillo, Krishna V. Shenoy, and William T. Newsome. Context-dependentcomputation by recurrent dynamics in prefrontal cortex.

Nature , 503(7474):78–84, 2013.911] Joshua I. Gold and Michael N. Shadlen. The Neural Basis of Decision Making.

Annual Reviewof Neuroscience , 30(1):535–574, 2007.[12] Laurence T. Hunt, Timothy EJ Behrens, Takayuki Hosokawa, Jonathan D Wallis, andSteven Wayne Kennerley. Capturing the temporal evolution of choice across prefrontal cortex. eLife , 4(November):1689–1699, 2015.[13] Laurence T. Hunt, W. M.Nishantha Malalasekera, Archy O. de Berker, Bruno Miranda, Simon F.Farmer, Timothy E.J. Behrens, and Steven W. Kennerley. Triple dissociation of attention anddecision computations across prefrontal cortex.

Nature Neuroscience , 21(10):1471–1481, 2018.[14] L. G. Nowak, M. H.J. Munk, A. C. James, P. Girard, and J. Bullier. Cross-correlation studyof the temporal interactions between areas V1 and V2 of the macaque monkey.

Journal ofNeurophysiology , 81(3):1057–1074, 1999.[15] Wilson Truccolo, Leigh R. Hochberg, and John P. Donoghue. Collective dynamics in humanand monkey sensorimotor cortex: Predicting single neuron spikes.

Nature Neuroscience ,13(1):105–111, 2010.[16] Georgia G. Gregoriou, Stephen J. Gotts, Huihui Zhou, and Robert Desimone. High-Frequency, long-range coupling between prefrontal and visual cortex during attention.

Science ,324(5931):1207–1210, 2009.[17] R F Salazar, N M Dotson, S L Bressler, and C M Gray. Content-speciﬁc fronto-parietalsynchronization during visual working memory.

Science , 338(6110):1097–1100, 2012.[18] Markus Siegel, Timothy J. Buschman, and Earl K. Miller. Cortical information ﬂow duringﬂexible sensorimotor decisions.

Science , 348(6241):1352–1355, 2015.[19] Peter R Murphy, Niklas Wilming, Diana C Hernandez-Bocanegra, Genis Prat Ortega, andTobias H Donner. Normative Circuit Dynamics Across Human Cortex During EvidenceAccumulation in Changing Environments. bioRxiv , page 2020.01.29.924795, 2020.[20] Fernando De Torre. A Least-Squares Framework for Component Analysis.

Ieee Transactionson Pattern Analysis and Machine Intelligence , 34(6):1041–1055, 2012.10 upplementary Marterial:Demixed shared component analysis of neuralpopulation data from multiple brain areas

Yu Takagi Steven W. Kennerley Jun-ichiro Hirayama Laurence T. Hunt

We generated sequences of neuronal populations in areas X (e.g. early sensory cortex) and Y (e.g.prefrontal cortex). We set the number of time steps to 24, and the number of neurons in areas X and Y to 10 and 9, respectively. At each time step t , baseline activities for areas X and Y were drawnrandomly from a Gaussian distribution N (0 , . Each trial was labelled with stimulus and decision.There were 5 possible stimuli and 3 possible decisions, thus × possible combinationsof labels. For each combination, we generated 20 trials, resulting in 300 trials in total. Stimulusonset occurred at time step 6. Noises were added to each neuron for each trial independently from aGaussian distribution N (0 , . ) .Neurons in areas X and Y were affected by the stimulus and decision, and communicated witheach other as follows. First, neurons in area X were affected by the stimulus for three time stepsfrom the time of stimulus onset. The magnitudes of the effects were randomly determined for eachneuron, were linearly correlated with the level of the stimulus, and were linearly increased acrosstime. Neurons in area X passed the stimulus-related information to the neurons in area Y via arandom projection matrix after two time steps from the time when neurons in area X started toprocess stimulus-related computation. After area Y received the stimulus-related input from area X , neurons in area Y started to compute the decision. As with the stimulus-related computation inarea X , the magnitude of the effects were also randomly determined for each neuron, were linearlycorrelated with the level of the decision, and were linearly increased across time. Area Y then passeddecision-related information back to area X via another random projection matrix after two time stepsfrom the time when decision-related computation in Y emerged. With the setting described above, we obtained a neurons × trials × timesteps array forarea X , and neurons × trials × timesteps array for area Y . To investigate the relationshipbetween areas X and Y in a time-resolved manner, we then constructed two matrices X and Y , where X is the area X ’s data matrix of size × at time t X , and Y is the area Y ’s data matrix of size × at time t Y . We assume that whichever neuronal population is currently earlier in time is the‘source’ matrix, and whichever population is later in time is the ‘target’ matrix . We ignored the caseswhere t X == t Y .To separate the data into training and testing sets, for each label combination, we held out one randomtrial as test trial. Thus the number of test set trials is decisions × stimuli = 15 . We then appliedjPECC (with CCA/RRR) or jPESC (with dSCA) to training set as follows. We then applied obtainedtransformation matrices to test set. We repeated this procedure ﬁfteen times for different train-testsplits, and averaged the results. Preprint. Under review. a r X i v : . [ q - b i o . N C ] J un PECC with CCA/RRR was performed on the training set, L2 regularized using λ = 0 . . For CCA,the test set trials were projected onto a canonical dimension, and the Pearson correlation coefﬁcientwas computed between projected test set trials in the source and target areas. For RRR, we predictedvalues of the test set target matrix from the test set source matrix, using transformation matricesestimated by training set. We then computed explained variance between predicted and actual test settarget matrix. We used the top low-dimensional component both for CCA and RRR.As with the jPECC with CCA/RRR, jPESC with dSCA was performed on the training set, L2regularized using λ = 0 . . However, here we applied marginalization to the target matrix. We usedstimulus- and decision-marginalized target matrices for dSCA separately. We predicted values ofthe marginalized target matrix from the source matrix in the test set, using transformation matricesestimated by the training set. We then computed explained variance between predicted and actualtarget matrices. We used the top low-dimensional component. To test whether areas of high correlation/explained-variance between two brain regions were signiﬁ-cantly larger than would be expected by chance, we used a cluster-based permutation test [1]. ForjPECC with CCA, we identiﬁed clusters in the correlation map that were larger than a cluster-formingthreshold (set at r > . ). For jPECC with RRR and jPESC with dSCA, we identiﬁed clusters in theexplained variance map that were larger than a cluster-forming threshold (set at R > − . ). Wethen permuted trials in one brain area to recompute jPECC/jPESC, and we identiﬁed clusters thatexceeded the cluster-forming threshold in the permuted data. Note that, we permuted trials but notdistorted temporal structure across time steps. For each of the 100 permutations, we stored the size ofthe largest cluster. This procedure provided a null distribution of maximum cluster sizes that wouldbe expected by chance. We used 95th percentile of this null distribution as a threshold for deemingwhether the cluster sizes observed in the data were signiﬁcant, at P < . (corrected for multiplecomparisons across all pairs of timepoints). The authors trained mice to perform visual discrimination task. During each recording session, theauthors simultaneously recorded from hundreds of neurons in multiple regions using Neuropixels[2, 3] probe ( n = 92 probe insertions over 39 sessions in 10 mice). The details of data acquisition andpreprocessing are described in [4]. Datasets is obtained from https://figshare.com/articles/Dataset_from_Steinmetz_et_al_2019/9598406 and codes for preprocessing is obtained from https://github.com/nsteinme/steinmetz-et-al-2019 , both were published by the authors.In their original paper, they applied jPECC with CCA to neural activities that were recorded at relativeto movement (-300 to 100 ms) between visual and frontal cortex (15 sessions), visual cortex andmidbrain (10 sessions), and frontal cortex and midbrain (9 sessions) (see [4] for precise anatomicallocations included in each of these subregions). For each combination, we analysed data from thethree sessions with the largest number of completed trials; this is because there was a substantialvariation in terms of the number of completed trials between sessions, and we found empirically thata certain amount of trials are necessary for reliable estimation by dSCA. We applied jPECC with CCA and jPESC with dSCA. All settings are the same as the simulationanalyses except for the following: • Before applying jPECC with CCA and jPESC with dSCA, we applied PCA to both sourceand target matrices as was done in [4], across time points and trials to reduce populationactivity to ten dimensions. • For jPESC with dSCA, marginalization was performed two main task parameters: stimulusand decision (see main manuscript for the deﬁnition). We also applied dSCA to matricesafter regressing out either stimulus or decision from these matrices. We conﬁrmed that2pplying PCA before applying jPESC with dSCA did not substantially change the results(data not shown). • We repeated the cross-validation procedure ten times, averaging the results.

We used the same cluster-based permutation test procedure as described above for the simulationanalyses.To quantify lead–lag relationships, we computed the asymmetry index by calculating the number ofsigniﬁcant timepoints included in the identiﬁed cluster from the above permutation procedure. Wecalculated these numbers in right and left half of the obtained jPECC and jPESC matrices separately,and then subtracted right from left. This procedure provided a null distribution of the difference in thenumber of signiﬁcant timepoints between left half and right half that would be expected by chance.We used the 95th percentile of this null distribution as a threshold for deeming whether the differenceof number of signiﬁcant timepoints between the left half and right half observed in the data weresigniﬁcant, at

P < . . The authors recorded neuronal activity in the macaque OFC, ACC and DLPFC during sequentialattention-guided information search and choice. During a typical recording session, 8–24 electrodeswere lowered bilaterally into multiple target regions. We used the data from monkey coded ‘F’. Thedetails of data acquisition and preprocessing are described in [1]. Dataset and codes for preprocessingare obtained from http://crcns.org/data-sets/pfc/pfc-7 that was published by the authors.In this experiment, neural recordings were obtained in multiple sessions, so most of the neurons werenot recorded simultaneously. We therefore used ‘pseudopopulation’ matrices by averaging averagedacross task parameters to identify each neuron’s response to experimental variables. This allowed usto collapse data across sessions.

We applied jPECC with CCA and jPESC with dSCA, as per the previous analyses. All proceduresare the same as the simulation analyses except for the following settings: • For jPECC with CCA and jPESC with dSCA, we applied PCA to both source and targetmatrices as was done in [4], across time points and trials to reduce population activity toeight dimensions. • For jPESC with dSCA, marginalization was performed on two main task parameters in thetask: space and attended-value (see main manuscript for the deﬁnition). We conﬁrmed thatapplying PCA before applying jPESC with dSCA did not substantially change the results(data not shown). • For cross-validation in the pseudopopulation setting, to separate the data into training andtesting sets, we followed the procedure proposed in [5]. We ﬁrst held out one random trialfor each neuron in each combination of task parameters, i.e. space and attended-value,as a set of test pseudopopulations X test and Y test . Because there are 5 possible stimuliand 2 possible spaces, the dimensions of X test and Y test is × . Remaining trialswere averaged to form a training sets of X train and Y train . Note that test and training sets( X test and X train , or Y test and Y train ) have the same dimensions of 10. We repeated thiscross-validation procedure 20 times and averaged the results. We used the same cluster-based permutation test procedure as described above for the simulationanalyses. We also used the same permutation test procedure for determining lead–lag relationships asdescribed above for the perceptual decision making task.3 eferences [1] Laurence T. Hunt, W. M.Nishantha Malalasekera, Archy O. de Berker, Bruno Miranda, Simon F.Farmer, Timothy E.J. Behrens, and Steven W. Kennerley. Triple dissociation of attention anddecision computations across prefrontal cortex.

Nature Neuroscience , 21(10):1471–1481, 2018.[2] Nicholas A. Steinmetz, Christof Koch, Kenneth D. Harris, and Matteo Carandini. Challengesand opportunities for large-scale electrophysiology with Neuropixels probes.

Current Opinion inNeurobiology , 50:92–100, 2018.[3] James J. Jun, Nicholas A. Steinmetz, Joshua H. Siegle, Daniel J. Denman, Marius Bauza, BrianBarbarits, Albert K. Lee, Costas A. Anastassiou, Alexandru Andrei, Çaˇgatay Aydin, MladenBarbic, Timothy J. Blanche, Vincent Bonin, João Couto, Barundeb Dutta, Sergey L. Gratiy,Diego A. Gutnisky, Michael Häusser, Bill Karsh, Peter Ledochowitsch, Carolina Mora Lopez,Catalin Mitelut, Silke Musa, Michael Okun, Marius Pachitariu, Jan Putzeys, P. Dylan Rich,Cyrille Rossant, Wei Lung Sun, Karel Svoboda, Matteo Carandini, Kenneth D. Harris, ChristofKoch, John O’Keefe, and Timothy D. Harris. Fully integrated silicon probes for high-densityrecording of neural activity.

Nature , 551(7679):232–236, 2017.[4] Nicholas A. Steinmetz, Peter Zatka-Haas, Matteo Carandini, and Kenneth D. Harris. Distributedcoding of choice, action and engagement across the mouse brain.

Nature , 576(7786):266–273,2019.[5] Dmitry Kobak, Wieland Brendel, Christos Constantinidis, Claudia E. Feierstein, Adam Kepecs,Zachary F. Mainen, Xue Lian Qi, Ranulfo Romo, Naoshige Uchida, and Christian K. Machens.Demixed principal component analysis of neural population data. eLife , 5(APRIL2016):1–36,2016. 4

CA-CanonCorr CCA-ExplVar RRR-ExplVar

01 -1-0.5-1-0.5 E x p l a i ned v a r i an c e E x p l a i ned v a r i an c e C o rr e l a t i on c oe ff i c i en t Figure 1:

Different metrics/methods provide similar results.

CCA can identify two clusters whenaccuracy is deﬁned as correlation coefﬁcient (left) or explained variance (middle). RRR (right) alsoprovides similar results 5 aw source matrix(Stimulus) Raw source matrix(Decision)Marginalized source matrix(Stimulus) Marginalized source matrix(Decision) -1-0.5 E x p l a i ned v a r i an c e Figure 2:

Simulation results obtained from non-marginalized source matrix (top row) muchclearly captured the interaction between two areas compared to the results obtained frommarginalized source matrix (bottom row) that captured the smaller size of clusters. b CCA dSCA-Stimulus dSCA-Decision dSCA-Stimulus(Decision controlled) dSCA-Decision(Stimulus controlled) -200 0 -200 0 -200 0 -200 0 -200 0

Time of Visual cortexfrom movement onset (ms) T i m e o f F r on t a l c o r t e x f r o m m o v e m en t on s e t ( m s ) Time of Visual cortexfrom movement onset (ms) T i m e o f M i db r a i n f r o m m o v e m en t on s e t ( m s ) Time of Midbrainfrom movement onset (ms) T i m e o f F r on t a l c o r t e x f r o m m o v e m en t on s e t ( m s ) -2000 01-2000-2000 Movement onset C o rr e l a t i on c oe ff i c i en t E x p l a i ned v a r i an c e -1-0.95 -1.01-0.95 Figure 3:

For data from mice (Steinmetz et al., 2019), qualitatively similar (albeit weaker)results could be obtained from all sessions, including those with fewer trials. a jPECC resultsobtained from CCA are shown. For visualization purposes, we used arbitrary cluster-formingthreshold: r > 0.3 and size-of-cluster > 5. bb