Percept formation from neural populations in sensory decision-making tasks
Adrien Wohrer∗, Christian Machens∗. E-mail: [email protected]
Abstract
We study a standard linear readout model of perceptual integration from a population of sensory neurons. We show that the readout can be associated with a set of characteristic equations which summarize the joint trial-to-trial covariance structure of neural activities and animal percept. These characteristic equations implicitly determine the readout parameters that were used by the animal to create its percept. In particular, they implicitly constrain the temporal integration window w and the typical number of neurons K which give rise to the percept. Comparing neural and behavioral sensitivity alone cannot disentangle these two sources of perceptual integration, so the characteristic equations also involve a measure of choice signals, like those assessed by the classic experimental measure of choice probabilities. We then propose a statistical method of analysis which allows us to recover the typical scales of integration w and K from finite numbers of recorded neurons and recording trials, and show the efficiency of this method on an artificial encoding network. We also study the statistical method theoretically, and relate its laws of convergence to the underlying structure of neural activity in the population, as described through its singular value decomposition. Altogether, our method provides the first thorough interpretation of feedforward percept formation from a population of sensory neurons. It can readily be applied to experimental recordings in classic sensory decision-making tasks, and will hopefully provide new insights into the nature of perceptual integration.

Most cortical neurons are noisy, or at least appear so to experimenters. When a sensory neuron's spikes are recorded in response to a well-controlled stimulus, they show a large variability from trial to trial. This noisiness has been acknowledged from early on, as a nuisance preventing experimenters from easy access to the encoding properties of sensory neurons.
But what is the impact of trial-to-trial sensory noise on the organism itself? This question gained renewed interest a few decades ago, with the generalization of experimental setups recording neural activity from awake, behaving animals (Mountcastle et al., 1990; Britten et al., 1992). In these setups, animals are presented with a set of stimuli f and trained to respond differentially to different values of f, thus providing an (indirect) report of their percept of f. As neural activity and animal behavior are simultaneously monitored, it becomes possible to seek a causal link between the two.

In such setups, one particular hypothesis—which we refer to as the "sensory noise" hypothesis—has proven instrumental in linking neural activity and percepts. It postulates that trial-to-trial noise at the level of sensory neurons is the main factor limiting the accuracy of the animal's perceptual judgements (Werner and Mountcastle, 1965; Talbot et al., 1968). Indeed, signal detection theory provides the adequate tools to estimate such accuracies. Any type of biological response to a stimulus f—say r(f)—can be associated with a signal-to-noise ratio (SNR), which measures how typical variations in r due to a change of stimulus f (the signal) compare to intrinsic variations of r from trial to trial (the noise). When r(f) measures the response of a neuron to stimulus f, the resulting SNR is often called the neurometric sensitivity for that particular neuron. Alternatively, r(f) may also be the response of the animal itself to stimulus f. The resulting SNR is called the animal's psychometric sensitivity, which quantifies the animal's ability to discriminate nearby stimulus values f.
Reformulated in terms of SNRs, the "sensory noise" hypothesis states that the neurometric sensitivity, computed from the population of sensory neurons under survey, is equal to the psychometric sensitivity of the animal in the task.

Applying this idea, neurometric and psychometric sensitivities have often been computed and compared, in various sensory systems and behavioral tasks (see, e.g., Romo and Salinas, 2003; Gold and Shadlen, 2007, for reviews). However, it was progressively realized that most of these comparisons bear no simple interpretation, because the neurometric sensitivity is not a fixed quantity: it depends on how information is read out from the neurons. For example, if the various sensory neurons in the population behave independently from one another, then the overall squared SNR from the population will essentially be the sum of the individual squared SNRs; thus, the experimenter's estimate of neurometric sensitivity will depend on how many neurons—say K—they included in their analysis. This intuition still holds in realistic populations where neurons are not independent, with the additional complexity that the evolution of neural SNR with K is strongly influenced by the correlation structure of noise in the population (Shadlen and Newsome, 1998; Abbott and Dayan, 1999; Averbeck et al., 2006).

More subtly, another parameter has a direct influence on estimated neurometric SNRs: the time scale w used to integrate each neuron's spike train, to describe the neuron's activity over the trial (Cohen and Newsome, 2009). Indeed, through the central limit theorem, the more neural spikes are integrated into the readout, the more accurate that readout will be. Adding extra neurons through K, or extra spikes for each neuron through w, will thus have the same type of impact on the readout's overall SNR.
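These two knobs, and the role of noise correlations, can be made concrete in a toy calculation (our illustration, not from the paper): K identically tuned neurons whose counts are integrated over a window w, with a uniform pairwise count correlation c. All rates and parameter values below are illustrative assumptions.

```python
import numpy as np

def snr2(K, w, rate=10.0, beta=1.0, c=0.0):
    """Squared SNR of the summed spike count of K identically tuned neurons
    (firing rate = rate + beta*f), integrated over a window of length w,
    with uniform pairwise count correlation c between neurons.
    Signal slope of the sum: K*w*beta.
    Noise variance of the sum: K*w*rate*(1 + (K-1)*c)."""
    return (K * w * beta) ** 2 / (K * w * rate * (1 + (K - 1) * c))

# Independent neurons: K and w are interchangeable, SNR^2 grows like w*K
assert np.isclose(snr2(100, 0.2), snr2(50, 0.4))
assert np.isclose(snr2(100, 0.4), 2 * snr2(100, 0.2))
# With noise correlations, adding neurons eventually saturates,
# while lengthening the integration window w still helps:
assert snr2(10_000, 0.2, c=0.1) < snr2(100, 0.4, c=0.1)
```

With c = 0 only the product wK matters, which is precisely the K–w duality discussed in the text; with c > 0 the benefit of adding neurons saturates, illustrating why the noise correlation structure shapes how neurometric SNR evolves with K.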
In fact, if all neurons in the population are identical, independent Poisson encoders, one can easily show that the readout's overall SNR scales with √(wK), emphasizing the duality between K and w.

As there is no unique way of reading out information from a population of sensory neurons, a question naturally arises: what type of readout does the organism use? For example, how many sensory neurons K, and what typical integration time scale w, provide a relevant description of the animal's percept formation? The "sensory noise" hypothesis can precisely be used to answer this question: the 'true' neuronal readout for the organism must be the one providing the best account of animal behavior. However, the previous K–w discussion clearly shows that comparing neurometric SNR to psychometric SNR is not sufficient to target the true readout: there will be several combinations of K and w leading to the same overall neurometric SNR, while corresponding to very different extraction strategies by the animal. Thus, an additional experimental measure is required to recover the typical scales of integration of the true readout.

Choice signals are a good candidate for this additional measure. In two-alternative tasks, where the animal must report a binary discrimination of stimulus value (say, f > 0 vs. f < 0), the classic experimental measure of choice signals is the choice probability (CP) (Green and Swets, 1966; Britten et al., 1996). CP is computed for each recorded neuron individually, and quantifies the trial-to-trial correlation between the activity of that neuron and the animal's ultimate (binary) choice on the trial, all other features being held constant. In particular, since CP is computed across trials with the same stimulus value (generally uninformative, i.e., f = 0), the observed correlations cannot reflect the influence of the stimulus on neural activity and animal choice.
Instead, a significant CP can only result from the process by which the neuron's activity influences—or is influenced by—the animal's forming perceptual decision.

It is intuitively clear that CPs reveal something about the way information is extracted from sensory neurons. For example, if the animal's percept is built from a single neuron, then that neuron will have a very large CP, because its activity on every trial directly predicts the animal's percept. Instead, if several independent neurons contribute to form the animal's percept, then they are all expected to have low CP values, as the activity of each neuron has only a marginal impact on the animal's decision. However, converting this intuition about choice signals into a quantitative interpretation was long hampered by the fact that, just like neurometric SNR, CP values are largely influenced by the population's noise covariance structure. For example, a neuron may not be utilized by the animal to form its percept, and yet display a significant CP because its activity is correlated with that of another neuron which is utilized. As a result, early studies relating CP values to the animal's perceptual readout relied only on numerical simulations (Shadlen et al., 1996; Cohen and Newsome, 2009), assuming very specific noise correlation structures that weakened the generality of their results. Only very recently have Haefner et al. (2013) provided an analytical expression for CP values in the presence of noise correlations (see section 4.2), opening the door to general, quantitative interpretations of choice probabilities.

In this article, we show how the combined information of animal sensitivity (SNR) and choice signals allows us to estimate the typical scales of percept formation by the animal, both across neurons (the number of neurons involved, K) and in time (the integration window w).
Our results apply in the standard feedforward model of percept formation, and can be derived for any noise covariance structure in the neural population. We first show how the joint covariance structure of neural activities and animal percept leads to a set of characteristic equations for the readout, which implicitly determine the animal's perceptual readout policy across neurons and time. Then, we show how these characteristic equations can be used in a statistical form, across the ensemble of trials and neurons available to the experimenter, to determine the typical scales K and w of percept formation from the activity of sensory neurons. This approach is mandatory since experimental measurements can only provide statistical samples of the full neural population. Using an artificial neural network to provide sensory encoding, we show that our method can reliably recover the true scales of perceptual integration, without requiring full measurement of the neural population. Thus, our method can readily be applied to real experimental data, and provide new insights into the nature of sensory percept formation.

We place ourselves in a general framework, describing a typical perceptual decision-making experiment (Fig. 1). On each trial, a different stimulus f is presented to the animal (Fig. 1a, top), which then takes a decision according to its internal judgement f⋆ of stimulus value. Our framework assumes that this percept f⋆ is directly available to the experimenter on each trial. In real experimental setups, the animal's report is generally more indirect—typically a binary choice based on the unknown percept f⋆. We choose the former approach because it applies generically to most perceptual decision-making experiments, whereas the "choice" part is more dependent on each particular setup.
We detail later how both approaches can be reconciled through simple models of the animal's behavior (section 4.2).

Simultaneously, experimenters record neural activities from a large population of sensory neurons, which is assumed to convey the basic information about f used by the animal to take its decision (Fig. 1a, bottom). Typical examples could be area MT in the context of a moving-dot discrimination task (e.g., Britten et al., 1992), area MT or V2 in the context of a depth discrimination task (e.g., Uka and DeAngelis, 2003; Nienborg and Cumming, 2009), or area S1 in the context of a tactile discrimination task (e.g., Hernández et al., 2000). We describe the activity of this neural population on every trial as a point process s(t) = {s_i(t)}_{i=1...N_tot}, where each s_i(t) is the spike train of neuron i, viewed as a series of Dirac pulses. As an important remark, N_tot denotes the full population size, a very large and unknown number. It is not the number of neurons actually recorded by the experimenter, which is generally much smaller.

For simplicity, we assume a fine discrimination task, where the different stimulus values f presented to the animal display only moderate variations around a central value, say f = 0. This substantially simplifies SNR computations, because the 'signal' part of any response r(f) is then summarized by its slope at f = 0: ∂_f E(r | f)|_{f=0}, where E(·) denotes the average response over trials. We assume that this linearization in f can be performed both for the psychometric report f⋆, and for individual neuron activities. This is mostly a convenience though, and the framework could be generalized to more complex dependencies on stimulus f.

From the raw data of f⋆ and s(t) on each trial, a number of measures are routinely used to describe neural activity and animal behavior. First, the psychometric sensitivity Z⋆ describes the animal's accuracy in distinguishing nearby stimulus values.
It can be computed from the distribution of (f, f⋆) across trials (Fig. 1b), according to the formula:

Z⋆ = 1 / ⟨Var(f⋆ | f)⟩_f ,  (1)

where the notation ⟨·⟩_f denotes an average across stimulus conditions. This is exactly the (squared) SNR for the random variable f⋆(f), assuming that the 'signal' term ∂_f E(f⋆ | f) is equal to 1 because the animal's average judgement of f is unbiased (the framework easily generalizes to a biased percept).

On the other hand, for each recorded neuron, it is common practice to compute its peri-stimulus time histogram (PSTH) in response to each different tested stimulus (Fig. 1d):

λ_i(t; f) := E(s_i(t) | f) ,  (2)

where E denotes averaging over trials. Since all stimuli f are assumed to be close to one another, the dependency of λ_i(t; f) on f is essentially linear, and can be summarized by the (temporal) tuning curve of the neuron (Fig. 1e):

β_i(t) := ∂_f λ_i(t; f) .  (3)

Furthermore, as recent techniques allow the simultaneous recording of many neurons, experimenters also have access to samples from the trial-to-trial covariance structure in the population (Fig. 1c). For every pair of neurons (i, j) and instants in time (t, s), this covariance structure is assessed through the neurons' joint peri-stimulus time histogram (JPSTH, Aertsen et al., 1989):

γ_ij(t, s) := ⟨Cov(s_i(t), s_j(s) | f)⟩_f .  (4)

We only consider the average covariance structure over the different stimuli f. First, as above, nearby values of f ensure that the covariance structure remains mostly unchanged.
Second, trial-to-trial covariances correspond to second-order effects on neural activity, which require many trials to be reliably estimated—another reason to lump data from different stimuli f into a single estimate.

Finally, we can measure a choice signal for each neuron, estimating the trial-to-trial covariance of neuron activity s_i(t) with the animal's choice (Fig. 1f). Since in our framework the animal directly reports its percept f⋆, we readily describe the choice signal of each neuron by its percept covariance (PCV) curve:

π⋆_i(t) := ⟨Cov(s_i(t), f⋆ | f)⟩_f .  (5)

Again, this covariance information is lumped across the different (nearby) stimulus values f, in order to improve experimental measurement. The PCV curve captures the core intuition behind the more traditional measure of choice probability (CP), while retaining a linear form convenient for analytical treatment. Percept covariance curves are not directly measurable in classic experimental setups where the animal only reports a binary choice; however, their analytical link to available measures such as CPs can easily be derived given simple models of the animal's decision policy (see section 4.2).

Unlike many characterizations of neural activity that rely only on spike counts, our framework requires an explicit temporal description of neural activity through PSTHs (eq. 2), JPSTHs (eq. 4) and percept covariance curves (eq. 5). Indeed, our method ultimately predicts when, and for how long, perceptual integration takes place in the organism. Readers may feel uncomfortable that the resulting definitions are directly expressed over trains of Dirac pulses. While these notations are fully justified in the framework of point processes (Daley and Vere-Jones, 2007), they describe idealized quantities that cannot be estimated from a finite number of trials, leading to jagged estimates formed from the collection of Dirac peaks. So in practice, spike trains s_i(t) are computed in temporal bins of finite precision.

Figure 1. Framework and main experimental measures. (a) Experimental setup. Top: A set of stimulus values f (color-coded as blue, yellow, red) are repeatedly presented to an animal, which reports its percept f⋆ on each trial (color-coded as green). Bottom: In each session, several task-relevant sensory neurons are recorded simultaneously with behavior. (b) Perceptual sensitivity Z⋆ is defined as the squared SNR of the animal's reports f⋆(f). (c) The noise covariance structure can be assessed in each pair of simultaneously recorded neurons, as their joint peri-stimulus time histogram (JPSTH). (d) Trial-wise response of a particular neuron. Each thin line is a schematic representation of the spike train on one trial. Segregating trials according to stimulus (top), we access the neuron's peri-stimulus time histogram (PSTH) and its tuning curve—shown in panel (e). Segregating trials according to the animal's perceptual error f⋆(f) − f (bottom), we access the neuron's percept covariance (PCV) curve—shown in panel (f).

All the experimental measures described above, taken together, provide a full characterization of the joint covariance structure of the variables (s(t), f⋆) across stimuli and trials (Fig. 2c).
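As a concrete illustration of eqs. 1, 3 and 5, the following sketch estimates Z⋆, a tuning curve and a PCV curve from binned toy data, in which the percept is, by construction, read out from one neuron's spike count. All rates and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = [-0.1, 0.0, 0.1]                     # nearby stimulus values f
n_trials, n_neurons, n_bins, dt = 3000, 4, 10, 0.1
base, beta = 2.0, 5.0                     # rate_i(t; f) = base (+ beta*f for neuron 0)

psth, Z_vals = {}, []
pcv = np.zeros((n_neurons, n_bins))
for f in fs:
    rates = np.full((n_neurons, n_bins), base)
    rates[0] += beta * f                  # only neuron 0 is tuned to f
    s = rng.poisson(rates * dt, size=(n_trials, n_neurons, n_bins))
    # toy percept: the animal reads out neuron 0's total spike count
    f_star = (s[:, 0, :].sum(axis=1) - base * n_bins * dt) / (beta * n_bins * dt)
    Z_vals.append(np.var(f_star, ddof=1))            # Var(f* | f)
    psth[f] = s.mean(axis=0) / dt                    # eq. 2: PSTH
    sc = s - s.mean(axis=0)
    pcv += (sc * (f_star - f_star.mean())[:, None, None]).mean(axis=0) / len(fs)

Z_star = 1.0 / np.mean(Z_vals)                       # eq. 1: psychometric sensitivity
b_hat = (psth[0.1] - psth[-0.1]) / 0.2               # eq. 3: tuning curves (slope in f)
assert abs(b_hat[0].mean() - beta) < 1.0             # neuron 0 is tuned ...
assert abs(b_hat[1].mean()) < 1.0                    # ... neuron 1 is not
assert pcv[0].sum() > 10 * abs(pcv[1].sum())         # eq. 5: choice signal in neuron 0
```

As expected, the choice signal (PCV) concentrates on the neuron that actually feeds the percept, while the tuning curve recovers the imposed rate slope β.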
The key argument to exploit these data, which is actually a reformulation of the 'sensory noise' hypothesis, is that the animal's percept f⋆ is built on every trial from the activity of the sensory neurons, meaning that f⋆ = F⋆(s) for some unknown readout F⋆. As a result, each proposed readout F directly yields an estimate of the joint covariance structure of (s(t), f⋆)—through a set of relationships which constitute the readout's characteristic equations. Conversely, since this joint covariance structure is experimentally measurable, it implicitly constrains the nature of the true readout F⋆ which was applied by the animal. In this section, we introduce a generic form of linear readout, stemming from the standard feedforward model of perceptual integration, and derive its characteristic equations. We show that, in theory, these equations totally characterize the readout applied by the animal.

We define a generic linear readout from the activity of the sensory neurons s(t) (Fig. 2a), based on a given readout vector a = {a_i}_{i=1...N_tot}, a given integration kernel with normalized shape h and time constant w, h_w(τ) := w⁻¹ h(τ/w), and a given readout time t_R:

f̂(t_R) := Σ_i a_i ∫_{u>0} s_i(t_R − u) h_w(u) du .  (6)

The readout is noted f̂, as it must ultimately be an estimator of the stimulus value f.
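In binned form, eq. 6 becomes a weighted, windowed spike count. A minimal sketch, assuming a square kernel h (the shape adopted later in the text); bin width, rates and weights are illustrative assumptions:

```python
import numpy as np

def linear_readout(spikes, a, w, t_R, dt):
    """Standard feedforward readout (eq. 6) with a square kernel h:
    f_hat = sum_i a_i * (1/w) * (spike count of neuron i in [t_R - w, t_R]).
    `spikes` is an (n_neurons, n_bins) array of binned spike counts."""
    lo = int(round((t_R - w) / dt))
    hi = int(round(t_R / dt))
    counts = spikes[:, lo:hi].sum(axis=1)   # integral of s_i against h_w
    return a @ counts / w

rng = np.random.default_rng(2)
spikes = rng.poisson(1.0, size=(8, 40))     # 8 neurons, a 2 s trial in 50 ms bins
a = np.full(8, 1.0 / 8)                     # toy readout vector
f_hat = linear_readout(spikes, a, w=0.5, t_R=1.0, dt=0.05)

# With w = t_R = T (the full trial), the readout reduces to a weighted
# total spike count -- the usual ad hoc choice discussed in the text.
f_full = linear_readout(spikes, a, w=2.0, t_R=2.0, dt=0.05)
assert np.isclose(f_full, a @ spikes.sum(axis=1) / 2.0)
```

Sliding t_R across the trial yields the family of potential readouts f̂(t_R) discussed next.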
We explicitly note the dependence on t_R to emphasize that f̂(t_R) is built from a sliding temporal average of the spike trains, so that each instant in time yields a potential readout.

This is a classical form of readout from a neural population, which has often been used previously and described as the 'standard' model of perceptual integration (Shadlen et al., 1996; Haefner et al., 2013). The temporal parameters h_w and t_R describe how each neuron's temporal spike train s_i(t) is integrated into a single number describing the neuron's activity over the trial: s_i = ∫_{u>0} s_i(t_R − u) h_w(u) du. In turn, the percept is built linearly from the population activity as f̂ = Σ_i a_i s_i through a specific readout vector, or 'perceptual policy', a.

However, traditional studies generally make ad hoc choices for the various constituents of this readout. Most often, s_i simply describes the total spike count of neuron i, which in our model corresponds to choosing a square kernel h, and parameters w = t_R = T describing an integration over the full period T of sensory stimulation. As mentioned in the introduction, there is no reason that this should be a relevant description of sensory integration by the organism: the integration window w has a direct influence on predicted SNRs for the readout, and experiments suggest that animals do not always use the full stimulation period to build their judgement (Luna et al., 2005; Stanford et al., 2010).

Instead, we make no assumption on the nature of w and t_R, and view them as free parameters of the model. Then, the model parameters implicitly characterize the typical scales of perceptual integration by the animal. The number of significantly nonzero entries in a, say K, defines the number of neurons contributing to the percept.
The readout window w characterizes the behavioral scale of temporal integration from the sensory neurons, and the time t_R characterizes when during stimulation this integration takes place. The exact shape h given to the integration kernel is of less importance; for conceptual and implementational simplicity we assume it to be a square window. However, we note that (1) other shapes may have a higher biological relevance, such as a decreasing exponential mimicking synaptic integration by downstream neurons, and (2) nothing prevents our method from making h itself a free parameter, provided the data contain enough power to estimate it. Finally, our model can also be extended to versions where the extraction time t_R is not fixed, but varies from trial to trial; this issue is discussed in section 4.3.2.

Figure 2. Linear readout and its interpretation. (a) We study a "standard" model of perceptual readout, with two parameters w and t_R defining integration in time, and a readout vector a defining integration across neurons. (b) Geometric interpretation of the model. The temporal parameters w and t_R define the tuning vector b and noise covariance matrix C in the population. Colored ellipses schematize the distribution of neural activities from trial to trial, for the three possible stimulus values. The readout f̂ can be viewed as an orthogonal projection of neural activities in the direction given by a. (c) Sensitivity Z, PCV curves π_i(t) and noise covariance JPSTHs γ_ij(t, s) totally define the joint covariance structure between the spike trains s(t) and the percept f̂. (d) Any feedforward readout of neural activities can be viewed as a mapping f̂ = F(s(t)), so the true F is implicitly constrained by the covariance data from panel c. In the case of the linear readout model, these constraints are summarized by three characteristic equations, which relate neural and perceptual data through the readout's parameters w, t_R and a.

Thanks to its linear structure, the readout defined in eq.
6 allows for a simple characterization of the covariance structure that it induces between the neural activity s(t) and the resulting percept f̂ (Fig. 2b). We show in appendix A that this covariance structure can be summarized by three characteristic equations:

1 = b⊤ a ,  (7)
Z⁻¹ = a⊤ C a ,  (8)
π(t) = Γ(t) a ,  (9)

where the vector b and the matrices Γ(t) and C respectively describe the population's tuning and noise covariance structures, derived from the underlying neural statistics β(t) and γ(t) introduced in eqs. 3-4:

b_i(w, t_R) := ∫_{u>0} β_i(t_R − u) h_w(u) du ,  (10)
Γ_ij(t | w, t_R) := ∫_{u>0} γ_ij(t, t_R − u) h_w(u) du ,  (11)
C_ij(w, t_R) := ∫_{u>0} Γ_ij(t_R − u) h_w(u) du .  (12)

We here note the explicit dependency of b, Γ(t) and C on the temporal parameters w and t_R of the readout; we will generally omit it in the sequel. Thus, the right-hand sides of eqs. 7-9 depend only on the readout parameters w, t_R, a and on the statistics of neural activity, independently of the animal's percept.

On the other hand, the left-hand sides of eqs. 7-9 describe experimental quantities related to the readout's resulting percept f̂. The first line describes the average tuning of f̂ to stimulus f, that is ∂_f E(f̂ | f), which is equal to 1 because we assume that f̂ is unbiased. The second line expresses the resulting sensitivity Z of the readout, defined as in eq. 1. It reveals the dual influence of the number of neurons (through a) and of the integration window w on the readout's overall sensitivity: indeed, under mild assumptions, the covariance matrix C scales as w⁻¹ (see appendix A.2.1). Finally, the third line expresses the resulting covariance between f̂ and the activity of each neuron s_i(t), defined as in eq. 5. This is essentially the relationship already revealed by Haefner et al.
(2013), that choice probabilities are related to readout weights through the noise covariance matrix; however, our formalism focuses on the simpler linear measure of PCV curves, and explicitly takes time into account.

Both the neural measures β(t) and γ(t, s) on the right-hand side, and the percept-related measures Z and π(t) on the left-hand side, can be estimated from data. As a result, the characteristic equations define an implicit constraint on the readout parameters w, t_R and a (Fig. 2d). Actually, if the readout model in eq. 6 is true, and precise measures are available for all neurons in the population, one easily sees that these constraints would uniquely determine the readout parameters. Indeed, for fixed parameters w and t_R, eqs. 7 and 9 impose linear constraints on the vector a. These constraints are generally overcomplete, since a is N_tot-dimensional, while each time t in eq. 9 provides N_tot additional linear constraints. Thus, in general, a solution a will only exist if one has targeted the true parameters w and t_R, and it will then be unique.

In the previous section we have shown that, in the standard linear model of percept formation, the trial-to-trial covariance structure between the spike trains s(t) and the resulting percept f̂ leads to a set of characteristic equations which implicitly define the parameters of the perceptual readout, provided the covariance structure has been fully estimated.

Unfortunately, this direct approach makes a fundamental assumption which cannot be reconciled with real experimental recordings: it assumes we have recorded all neurons from the population under survey, whereas real experiments only ever record from a small subset of that population. Thus we cannot hope to reconstruct the real vector a, simply because some—probably most—of the neurons contributing to a were not recorded.
Moreover, even across those neurons which were recorded through a series of sessions in a given area, the noise covariance structure can never be fully assessed; it remains elusive between neurons which were not recorded simultaneously.

For this fundamental reason, the characteristic equations 7-9 should be used with a different perspective than the full recovery of the readout parameters. Instead, we propose to exploit the structure of the equations in a statistical approach, with the restricted goal of estimating the typical scales of readout most compatible with the recorded data.

A first necessary step in our approach is to statistically describe the nature of the readout vector a. We are mostly interested in the support of a, meaning the number and nature of the neurons contributing to percept formation. Thus, we assume that the percept is built only from the activities of an unknown ensemble K of neurons in the population and that, for given K and temporal parameters (w, t_R), the readout vector a is chosen optimally, to maximize the SNR of the resulting percept. Indeed, through this hypothesis, we totally reformulate the problem of characterizing a into that of characterizing K, which allows for much simpler statistical descriptions.

The readout vector a achieving the maximum sensitivity Z in eq. 8, under the constraint of eq. 7 and having support on K, is well known from the statistical literature. It is uniquely given by Fisher's linear discriminant formula (Hastie et al., 2009):

a_K = C_K⁻¹ b_K / (b_K⊤ C_K⁻¹ b_K) ,  (13)

where a_K, b_K and C_K are the versions of the vectors a, b (eq. 10) and matrix C (eq. 12) restricted to the neuron ensemble K. By injecting the form (eq. 13) into eqs. 8-9, we obtain a new version of the characteristic equations, under the assumption that the percept is built optimally from some given ensemble K and temporal parameters (w, t_R):

Z(K | w, t_R) = b_K⊤ C_K⁻¹ b_K ,  (14)
π_i(t | K, w, t_R) = (1 / Z(K)) Γ_iK(t) C_K⁻¹ b_K .  (15)

Z in eq.
14 is the (optimal) sensitivity associated with this particular choice of K, w and t_R. In eq. 15, π_i(t) is the resulting, predicted PCV curve for every neuron i in the population (not only in ensemble K), and Γ_iK(t) is a row vector whose entries are equal to Γ_ij(t) (eq. 11) for neurons j ∈ K.

These equations open the door to a statistical description of percept formation in the neural population: we can now parse through a large set of candidate ensembles K and temporal parameters (w, t_R), and ask when the predictions for sensitivity (eq. 14) and PCV curves (eq. 15) match their true psychophysical counterparts Z⋆ (eq. 1) and π⋆_i(t) (eq. 5). For sensitivity, the straightforward comparison is to require that Z(K | w, t_R) ≈ Z⋆.

On the other hand, for the PCV equation (eq. 15), it is pointless to search for an elementwise match, for every neuron i, between the predicted curve π_i(t) and its true measure π⋆_i(t). Indeed, since only a small subset of the neurons have been recorded, no candidate readout ensemble K will be equal to the true ensemble (say K⋆) that was used by the animal; and there is no guarantee that the covariance structure between i and K, which gives rise to the prediction (eq. 15), should be similar to that between i and K⋆. Instead, a given set of readout parameters (K, w, t_R) should be deemed plausible if they predict the correct distribution of PCV signals across the population, irrespective of exact neuron identities i. Full distributions are difficult to estimate from finite amounts of data, and we will find the following population averages to convey sufficient information:

W(t | K, w, t_R) := E_i( b_i(w, t_R) π_i(t | K, w, t_R) ) ,  (16)
W⋆(t | w, t_R) := E_i( b_i(w, t_R) π⋆_i(t) ) ,  (17)

where E_i denotes averaging over the full population of neurons i = 1...N_tot.
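On a toy Gaussian population (our illustration; the tuning vector b and covariance C below are assumptions standing in for the time-integrated quantities of eqs. 10 and 12), the optimal readout of eq. 13 and the predictions of eqs. 14-16 can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
N, Ksize = 12, 4                       # population size, readout ensemble size
b = rng.uniform(0.5, 1.5, N)           # time-integrated tuning vector (eq. 10)
A = rng.standard_normal((N, N))
C = A @ A.T / N + 0.1 * np.eye(N)      # time-integrated noise covariance (eq. 12)

K = np.arange(Ksize)                   # candidate readout ensemble
bK, CK = b[K], C[np.ix_(K, K)]

# Eq. 13: optimal (Fisher) readout restricted to ensemble K
CKinv_bK = np.linalg.solve(CK, bK)
Z_K = bK @ CKinv_bK                    # eq. 14: predicted sensitivity
a_K = CKinv_bK / Z_K
assert np.isclose(bK @ a_K, 1.0)       # eq. 7: unbiased readout

# Eq. 15 (time-collapsed): predicted percept covariance for EVERY neuron i,
# including neurons outside K, through their covariance with the ensemble
pi_pred = C[:, K] @ CKinv_bK / Z_K

# Monte Carlo check on Gaussian "activities" at a fixed stimulus
s = rng.multivariate_normal(np.zeros(N), C, size=300_000)
f_hat = s[:, K] @ a_K
assert np.isclose(np.var(f_hat), 1.0 / Z_K, rtol=0.05)          # eq. 8
pcv = (s * (f_hat - f_hat.mean())[:, None]).mean(axis=0)
assert np.allclose(pcv, pi_pred, atol=0.05)                     # eq. 15

# Eq. 16: population-averaged, tuning-weighted choice signal
W_pred = np.mean(b * pi_pred)
assert np.isclose(W_pred, np.mean(b * pcv), atol=0.05)
```

Note that neurons outside K acquire nonzero predicted choice signals purely through their noise covariance with the ensemble, which is the effect discussed in the introduction.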
We will deem a set of readout parameters plausible if they yield W(t | K, w, t_R) ≈ W⋆(t | w, t_R). Multiplying each PCV curve by the neuron's tuning b_i (eq. 10) yields more stable estimates of W, as discussed in section 3.2 and appendix B.

There are many ways to compare the real values of sensitivity and PCV signals to their predictions given by eqs. 14-15. We propose here an ad hoc method, whose main characteristics are the following: (1) focus mostly on first-order statistics (i.e., means) across the neural population, (2) use arbitrary tolerance values to compare real and predicted data, (3) fit the two indicators sequentially: first SNR, then percept covariance. Due to its simplicity, this method will prove robust to measurement errors arising from finite amounts of data (section 3.3).

Our method is also designed to cope with a fundamental limitation of real recordings: all neurons (ensemble K, neurons i) contributing to the predictions of eqs. 14-15 must have been recorded simultaneously, to assess their noise covariance structure. This constraint sets a limit on the ensemble sizes K which can easily be investigated (but see section 4.4). Moreover, it prevents estimating the full average of choice signals (eq. 16) predicted by a given ensemble K—it is only available for the simultaneously recorded neurons i. As a result, the predictions (eq. 15) from different tested ensembles K must somehow be aggregated to produce a reliable prediction of choice signals.

We propose that each tested ensemble K contribute to our estimates in proportion to its ability to account for the animal's sensitivity:

P_Z(K | w, t_R) ∼ exp( −(Z(K | w, t_R) − Z⋆)² / α_Z² ) ,  (18)

normalized to ensure Σ_K P_Z(K) = 1 across all tested ensembles (w and t_R being fixed). The parameter α_Z is the required tolerance of the fit, set by the experimenter.
It is a regularization parameter creating a tradeoff between precision of fit (small $\alpha_Z$) and reliability of measurements, since a larger $\alpha_Z$ leads to more samples $K$ with a substantial contribution $P_Z(K)$. When testing our method (section 3) we choose $\alpha_Z$ as 5% of $Z^\star$.

For each tested couple $(w, t_R)$, we then use $P_Z(K \mid w, t_R)$ as a weighting factor over all tested ensembles $K$, which yields two quantities:

$$\breve{K}(w, t_R) := \sum_K P_Z(K \mid w, t_R)\, \mathrm{Card}(K), \qquad (19)$$
$$\breve{W}(t \mid w, t_R) := \sum_K P_Z(K \mid w, t_R)\, \mathbb{E}_{i(K)}\left( b_i\, \pi_i(t \mid K, w, t_R) \right), \qquad (20)$$

where $\mathbb{E}_{i(K)}$ denotes an average across all neurons $i$ available to compute a prediction with eq. 15. These neurons must have been recorded simultaneously with ensemble $K$ and, in order to produce an unbiased estimate of choice signals in the full population, they should not belong to $K$ itself. (Note that $W^\star(t \mid w, t_R)$ depends on parameters $(w, t_R)$ only through the neurons' tunings $b_i(w, t_R)$. In practice, as neural activities are rather stationary in time, $W^\star(t)$ changes very little for different values of parameters $(w, t_R)$.)

In eq. 19, $\breve{K}(w, t_R)$ is the ensemble size $K$ which most likely explains the animal's sensitivity, given readout parameters $(w, t_R)$. In eq. 20, $\breve{W}(t \mid w, t_R)$ is the mean prediction for PCV signals $b_i \pi_i(t)$ across neurons $i$ in the population, but stemming only from ensembles $K$ which are compatible with the animal's sensitivity. Considering quantity $W(t)$ introduced in eq. 16, we see that

$$\breve{W}(t \mid w, t_R) \simeq \sum_K P_Z(K \mid w, t_R)\, W(t \mid K, w, t_R). \qquad (21)$$

The equality is only approximate, because only neurons $i$ recorded simultaneously with $K$ are available to estimate $\breve{W}(t)$. However, as neurons $i$ are random and we average over many ensembles $K$, $\breve{W}(t)$ rapidly converges to the quantity described in eq.
21.

Both $\breve{W}(t \mid w, t_R)$ and $W^\star(t \mid w, t_R)$ are temporal signals defined over some interval $[T_{min}, T_{max}]$ corresponding to one trial repetition. Defining the L2 norm for such temporal signals as

$$\|x\| = (T_{max} - T_{min})^{-1} \int_{t=T_{min}}^{T_{max}} x(t)^2\, dt, \qquad (22)$$

we will deem parameters $(w, t_R)$ plausible if they lead to a small value of $\|\breve{W}(w, t_R) - W^\star(w, t_R)\|$.

To yield a quantitative estimate of fit, we introduce a tolerance $\alpha_W$ and define the following weighting function:

$$P_W(w, t_R) \sim \exp\left( -\frac{\|\breve{W}(w, t_R) - W^\star(w, t_R)\|}{\alpha_W} \right), \qquad (23)$$

normalized to ensure $\sum_{w, t_R} P_W(w, t_R) = 1$ across all tested temporal parameters $(w, t_R)$. Again, tolerance $\alpha_W$ is set arbitrarily by the experimenter. When testing our method (section 3) we choose $\alpha_W$ as 5% of $\|W^\star(w, t_R)\|$.

Overall, the statistical method introduced above reduces the readout parameters to three numbers: the temporal extraction parameters $w$ and $t_R$, and the typical number of neurons $K$ used by the readout. Thus, we can now apply a 'brute-force' approach: test all possible combinations $(K, w, t_R)$, compute the population statistics from eq. 18-23, and target the parameters that provide the best fit. In the next section, we show the validity of this statistical approach, which allows us to recover the typical scales $(K, w, t_R)$ of perceptual integration in an artificial network simulation. We further detail how this statistical approach can be adapted to counteract measurement errors which typically arise in real experiments from the finite number of available trials.

In this section, we show how the statistical analysis of sensitivity and choice signals described above allows us to recover the scales of integration of the neural readout. Naturally, to assess the validity of our method, it is necessary to know the true nature of this readout.
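The norm of eq. 22 and the weighting of eq. 23 can be sketched numerically as follows (a discrete-time toy version with the time-averaged power as norm; function and variable names are our own, not the authors'):

```python
import numpy as np

def signal_power_norm(x, dt, T_min, T_max):
    """Discrete version of the temporal norm of eq. 22:
    time-averaged power of signal x(t) sampled at step dt."""
    return np.sum(np.asarray(x, float) ** 2) * dt / (T_max - T_min)

def temporal_weights(W_breve, W_star, dt, T_min, T_max, alpha_W):
    """Weighting over candidate (w, t_R) pairs (sketch of eq. 23).
    W_breve: dict mapping (w, t_R) -> predicted mean PCV signal;
    W_star: measured average PCV signal. Illustrative assumptions."""
    keys = list(W_breve)
    d = np.array([signal_power_norm(W_breve[k] - W_star, dt, T_min, T_max)
                  for k in keys])
    w = np.exp(-d / alpha_W)
    w /= w.sum()
    return dict(zip(keys, w))
```

Candidates whose compound PCV prediction diverges from the measured average by more than the tolerance receive exponentially small weight.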
This can only be achieved through an artificial simulation of sensory integration, where we have full control over neural activities and readout procedure.

We thus implemented an artificial neural network that encodes some input stimulus $f$ in the spiking activity of its neurons (Fig. 3a). Precise parameters of this network are provided as Supplementary Material (section S1). Briefly, on each trial, 100 input Poisson neurons fire with rate $f$, taking one of three possible values 25, 30 and 35 Hz. The encoding population per se consists of 500 leaky integrate-and-fire (LIF) neurons. 100 of these neurons receive sparse excitatory projections from the input Poisson neurons, which naturally endows them with a positive tuning to stimulus $f$. 100 other neurons receive sparse inhibitory projections from the Poisson neurons, which naturally endows them with negative tuning. The remaining 300 neurons receive no direct projections from the input. Instead, all neurons in the encoding population are coupled through a sparse connectivity with random delays up to 5 ms. Synaptic weights are random and balanced, tuned to ensure overall firing rates around 30 Hz. We implemented and simulated the network using Brian, a spiking neural network simulator in Python (Goodman and Brette, 2008). The statistics of activity for the resulting population are depicted in Fig. 3b,c,e.

We then define the true perceptual readout from this network. We pick a random set of $K^\star = 40$ neurons in the population, whose activity is integrated over $w^\star = 50$ ms and read out at time $t^\star_R = 80$ ms, on each stimulus presentation (each presentation lasting 500 ms). The resulting estimator $f^\star$ of stimulus value is built optimally given these constraints, through Fisher linear discriminant analysis (eq. 13). This leads to a 'psychometric' sensitivity $Z^\star$, meaning that the network can typically discriminate variations of $(Z^\star)^{-1/2}$ Hz around $f$.
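A Fisher-discriminant readout of this kind can be sketched as follows, assuming the standard form $a \propto C^{-1} b$ with $C$ the noise covariance of the integrated spike counts and $b$ the vector of tuning slopes (all function names, the ridge term, and the normalization convention are our own illustrative choices):

```python
import numpy as np

def fisher_readout(counts, stimuli):
    """Fit a Fisher-discriminant readout a ~ C^{-1} b from integrated
    spike counts (trials x neurons) and per-trial stimulus values.
    A sketch under stated assumptions, not the authors' implementation."""
    counts = np.asarray(counts, float)
    f = np.asarray(stimuli, float)
    # tuning slope b_i of each neuron's count with respect to stimulus
    b = np.array([np.polyfit(f, counts[:, i], 1)[0]
                  for i in range(counts.shape[1])])
    # residual (noise) activity once the stimulus-driven part is removed
    resid = counts - np.outer(f - f.mean(), b) - counts.mean(axis=0)
    C = np.cov(resid.T)                                # noise covariance
    a = np.linalg.solve(C + 1e-6 * np.eye(len(b)), b)  # a = C^{-1} b (ridge-stabilized)
    return a / (a @ b)  # normalize so the percept has unit mean slope in f

# percept on each trial (up to an offset): f_hat = counts @ a
```

The normalization makes the readout an unbiased linear estimator of $f$ up to an additive constant, which is the convention implied by eq. 1.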
(For comparison, over the same integration period $w^\star$, the 100 input Poisson neurons can discriminate variations around 2.5 Hz.) We then compute a PCV for every recorded neuron, measuring its trial-to-trial covariance with estimator $f^\star$ (Fig. 3d). Applying the statistical method described above, our goal is now to recover the scales $(K^\star, w^\star, t^\star_R)$ of perceptual integration, on the basis of the experimental measures depicted in Fig. 3c-e.

To show the theoretical validity of our analysis, we first apply it to a situation where the trial-to-trial covariance of neurons and percept $f^\star$ (eq. 1-5) has been fully measured, with high precision. In particular, we assume full knowledge of the noise covariance structure $\gamma(t, s)$ in our population (eq. 4). Actually, in these unrealistic conditions, the statistical analysis described above is not useful: instead, one could directly solve the characteristic equations (eq. 7-9) to recover the readout parameters $a$, $w$ and $t_R$. However, this is a necessary first step to verify that our method is not flawed theoretically. Do the statistical quantities introduced in eq. 18-23 allow us to recover the true scales of integration $(K^\star, w^\star, t^\star_R)$?

Assuming a square integration kernel $h$, we test a set of candidate temporal integration windows $w$ from 10 to 100 msec, and a set of candidate readout times $t_R$ from 10 to 200 msec, all in steps of 10 msec. We then pick random candidate neural ensembles $K$, of sizes ranging from 2 to 90 neurons, with 50 different random ensembles for each tested size $K$. For each tested parameters $(w, t_R)$, we compute the distribution of predicted SNRs $Z(K)$ given by eq. 14, across all candidate neural ensembles (Fig. 4a). Each neural sample $K$ is then associated with a weight $P_Z(K)$ describing the goodness of fit to the true SNR $Z^\star$ (eq. 18). Following eq.
19-20, this yields an estimate for the best-fitting population size $\breve{K}(w, t_R)$ (Fig. 4a, dashed vertical line) and mean PCV curve $\breve{W}(t \mid w, t_R)$ (Fig. 4b). Since we assume full knowledge of experimental data, all 500 neurons $i$ are involved in estimating $\breve{W}(t)$, independently of ensemble $K$. (To avoid overfitting issues, the trials used to learn the optimal $f^\star$ are not used in the subsequent analysis. To compute these estimates, all 500 neurons were simultaneously monitored over 16,500 repetitions of each stimulus condition, not counting the trials used to train estimator $f^\star$. Using bootstrap resamplings over stimulus repetitions, we checked that the resulting measures were virtually error-free.)

In Fig. 4c, we show the estimated population size $\breve{K}(w, t_R)$ as a function of $w$ and $t_R$. It shows the mark of the $K$–$w$ tradeoff on sensitivity, mentioned in the introduction: smaller integration windows $w$

Figure 3.
Artificial neural network used for testing the method. (a) Network architecture. The encoding layer consists of LIF neurons coupled through a sparse, balanced recurrent connectivity with random delays. Stimulus $f$ is the firing intensity of a group of input Poisson neurons, which project sparsely onto two subpopulations of the encoding layer (neurons excited by the input vs. neurons inhibited by the input). The majority of neurons in the encoding layer receive no direct projection, but can still acquire stimulus tuning through the recurrent connections. A "true" readout $f^\star$ is produced on every trial on the basis of true parameters $w^\star$, $t^\star_R$ and $K^\star$, which should be retrieved by our method. (b) Classic population statistics in the encoding layer, for neural spike counts over a trial. (c) Sample PSTHs from the encoding layer. Model neurons display varied firing rates, and tunings of different polarities. (d) Sample PCV curves for the same neurons as panel c, computed by correlating each neuron's spikes with the true readout $f^\star$. (e) Sample JPSTHs (noise correlations) for pairs of neurons in the encoding layer. Inset: corresponding cross-correlograms, obtained by projection along the diagonal.

Figure 4.
Statistical recovery of readout parameters: noiseless measures. (a) For each tested temporal parameters $(w, t_R)$, predicted sensitivities $Z(K)$ are computed for several candidate readout ensembles $K$ of varying sizes. The goodness of fit to the true sensitivity $Z^\star$ defines a weighting function $P_Z(K)$ across ensembles. (b) The weighting function is used to compute a compound prediction $\breve{W}(t)$ for the average PCV signal in the population, which is compared to the true average $W^\star(t)$. The three columns in panels a-b correspond to different candidates $(w, t_R)$ for temporal integration. (c) Best-fitting ensemble size $\breve{K}$ depending on candidate parameters $(w, t_R)$. The $K$–$w$ tradeoff on sensitivity is clearly visible. (d) Goodness of fit of PCV signals depending on candidate parameters $(w, t_R)$ shows a clear optimum around the true parameters of the readout. (e) Same as panel d, but transformed into a weighting function $P_W(w, t_R)$ over candidate temporal parameters.

require larger ensemble sizes $K$ to account for the animal's sensitivity. In Fig. 4d, we show two measures of the resulting fit between $\breve{W}(t \mid w, t_R)$ and its true value $W^\star(t \mid w, t_R)$. In the first panel, we plot the plain L2 norm between the two temporal signals (using $T_{min} = -100$ msec and $T_{max} = 200$ msec as integration bounds). In the second panel, we reexpress this L2 norm as a weighting $P_W(w, t_R)$ over the set of tested temporal parameters (eq. 23). Applying this final weighting over candidate values $w$, $t_R$, and $\breve{K}(w, t_R)$ yields numerical estimates for the scales of the readout: $\hat{w} = 54$ msec, $\hat{t}_R = 81$ msec, $\hat{K} = 34$, all very close to $w^\star$, $t^\star_R$ and $K^\star$, showing the theoretical validity of this approach. The estimated $\hat{K}$ is somewhat smaller than its true value $K^\star = 40$; however, this is no bias in our method: it simply means that the 40 neurons chosen randomly as the source of percept were slightly less sensitive than the 'average' 40 neurons in the population.

We also note that these estimates depend on the tolerance levels fixed by the experimenter to compute $P_Z(K)$ (eq. 18) and $P_W(w, t_R)$ (eq. 23). Numerically, we find the resulting mean estimates to be rather stable across a range of sensible tolerances. On the other hand, the resulting error bars, which are obtained as second-order moments of the quantities weighted by $P_W(w, t_R)$, only describe the typical variations of the parameters that lead to estimates within the fixed tolerances. In particular, driving the tolerances to zero always drives the error bars to zero, even though the predicted averages may become false as too little data enter their computation.

Why does the method work? Essentially, it proceeds in two successive steps. First, $(w, t_R)$ being held fixed, it uses SNR information to target plausible neural ensembles $K$ (Fig. 4a,c). Since the readout is assumed to be optimal, the mean SNR can only increase with the size $K$ of the ensembles considered (Fig. 4a, plain blue curve). Plausible ensembles $K$ are those lying near the crossing of this curve with the true 'psychometric' SNR (Fig. 4a, dashed red curve).
For a straightforward application of our method, this crossing should occur within the typical ensemble sizes $K$ tested, which are, in practice, limited by the number of simultaneously recorded neurons. In section 4.4, we discuss possible extensions of the method to the case where the crossing does not occur.

Second, $(w, t_R)$ being still fixed, an average PCV prediction $\breve{W}(t \mid w, t_R)$ is built, using the neural ensembles $K$ targeted above, and compared to the true mean PCV curve $W^\star(t \mid w, t_R)$. It is not trivial that this comparison should work. To simplify the argument, let us assume that parameters $w$ and $t_R$ are fixed at their true values $w^\star$ and $t^\star_R$. On the one hand, since the true percept is built from some (unknown) neural ensemble $K^\star$, we have $W^\star(t) = W(t \mid K^\star)$, using the notations of eq. 16-17. On the other hand, the prediction $\breve{W}(t)$ is built as a compound mean of $W(t \mid K)$ over several candidate ensembles $K$ (see eq. 21). As our method requires a match between $W^\star(t)$ and $\breve{W}(t)$, it implicitly supposes that all ensembles $K$ contributing to $\breve{W}(t)$ lead to very similar population averages $W(t \mid K)$.

Predicted curves $W(t \mid K)$ are generally not available experimentally. However, we can compute them in our full-data simulation (Fig. 5). We find that, amongst ensembles $K$ of similar size $K$, the $i$-population means of $b_i \pi_i(t \mid K)$ rapidly converge to a single curve, independently of ensemble $K$ (Fig. 5b). Furthermore, this result is not trivial: when the same analysis is performed on the plain PCV curves, not multiplied by tuning $b_i$, the convergence does not occur anymore (Fig. 5a), or at least not as fast. In Fig. 5c, we plot the ratio between the variance of curves across different ensembles $K$, and the power of the mean curve, across all ensembles $K$ of the same size $K$.
This ratio quickly drops to zero for the tuning-multiplied PCV curves (blue), but not for the plain PCV curves (green).

To summarize, it is crucial for our method that each PCV curve $\pi_i(t)$ be multiplied by the neuron's tuning $b_i$ before computing population averages. Aside from the experimental observations of Fig. 5,

Figure 5.
Mean percept covariance curves depend on readout ensemble $K$. (a) The mean value of PCV curves $\pi_i(t)$ across neurons $i$ in the population depends strongly on the readout ensemble $K$ giving rise to the percept. (b) The mean value of tuning-multiplied PCV curves $b_i \pi_i(t)$ depends much less on the exact ensemble $K$, only on its size. This justifies our definition for the mean PCV curve $W(t \mid K)$ (eq. 16). (c) Relative variance of mean PCV curves, across readout ensembles $K$ of the same size. It is defined as the average of $\|W(K_1) - W(K_2)\|$ across all ensembles $(K_1, K_2)$ of similar size, divided by $\|\mathbb{E}_K W(K)\|$, the power of the average curve across ensembles of size $K$. For the tuning-multiplied version of $W(t \mid K)$ (blue), this ratio quickly drops to zero. This is not the case for the plain mean $\mathbb{E}_i(\pi_i(t) \mid K)$ (green).

several arguments justify this operation. First, it is well known experimentally that choice signals and tuning for individual neurons are often positively correlated at the population level (Britten et al., 1996; Uka and DeAngelis, 2003). Intuitively, this is because positively-tuned neurons contribute positively to stimulus estimation, and conversely for negatively-tuned neurons. The strong population-wide correlation is indeed present in our simulated network (Fig. 3b). As a result, the population average for $b_i \pi_i(t)$ is expected to be mostly positive (Fig. 5b), which diminishes possible variations from one ensemble $K$ to the other. Second, theoretical arguments (appendix B and Supplementary Material S2) show that $b_i \pi_i(t)$ is a form better suited to compute an $i$-population average. It can be shown to be positive under mild assumptions, and its laws of convergence can be related to the overall spectrum of covariance in the population.
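The relative-variance diagnostic of Fig. 5c can be sketched as follows, using the discrete time-averaged power as norm (function and variable names are our own illustrations, not the authors' code):

```python
import numpy as np

def relative_variance(curves):
    """Given mean PCV curves W(t|K) for several ensembles K of the same
    size (one curve per row), return the average pairwise squared distance
    divided by the power of the mean curve (sketch of Fig. 5c's ratio)."""
    curves = np.asarray(curves, float)
    n = len(curves)
    pair_dists = [np.mean((curves[a] - curves[b]) ** 2)
                  for a in range(n) for b in range(a + 1, n)]
    power_of_mean = np.mean(curves.mean(axis=0) ** 2)
    return np.mean(pair_dists) / power_of_mean
```

A ratio near zero indicates that the population average no longer depends on which particular ensemble $K$ was drawn, which is the property the method relies on.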
Having shown the theoretical efficiency of the statistical quantities introduced above in retrieving the correct scales of perceptual integration, we now test our method on its real purpose: recovering the scales from incomplete experimental data (Fig. 6). We thus limit our measures to 150 repetitions for each tested stimulus. Furthermore, we split our population into 5 ensembles of 100 'simultaneously recorded' neurons, so that noise covariance information (eq. 4) is only available between neurons belonging to the same ensemble. We use the same candidate values for parameters $w$, $t_R$ and $K$ as before, picking 50 candidate ensembles $K$ for each tested size $K$. Neurons in $K$ always belong to the same 'simultaneous ensemble', which is picked randomly. Finally, for each ensemble $K$, we consider 10 additional neurons $i$, from the same 'simultaneous ensemble' but segregated from the neurons in $K$, to compute the PCV prediction $\breve{W}(t)$ (eq. 20).

The method then proceeds as above, save for a couple of modifications due to the incompleteness of the data. First, concerning SNR computations (eq. 14), the estimated covariance matrix $C_K$ may turn out to be rank-deficient up to numerical precision (although it should be full-rank theoretically, since the number of trials (450) is larger than the largest tested size $K$). We thus replace its inverse $C_K^{-1}$ by its Moore-Penrose pseudo-inverse, with the default numerical tolerance of our mathematical software (Matlab). Even so, we observe a global overestimation of predicted sensitivities $Z$, compared to their values in the full-data case (dashed blue lines in Fig. 6a, reproduced from Fig. 4a). This overfitting is a well-known feature when estimating Fisher sensitivity from insufficient data (Raudys and Duin, 1998; Hoyle, 2011).

Second, concerning mean PCV predictions (eq. 20), our final estimates $\breve{W}(t)$ become noisy, reflecting the jagginess of the underlying neural measures due to insufficient trials (Fig. 6b).
This jagginess is problematic, as it artificially increases the measured values for the divergence $\|\breve{W}(w, t_R) - W^\star(w, t_R)\|$, which is our final criterion to retrieve plausible values of $(w, t_R)$. However, this effect can be largely compensated by resorting to resampling over trials (bootstrap). More precisely, for each tested parameters $(w, t_R)$, we may describe our noisy measures in the form:

$$\breve{W}^{(meas)}(t) = \breve{W}^{(real)}(t) + \eta(t), \qquad W^{\star,(meas)}(t) = W^{\star,(real)}(t) + \eta^\star(t),$$

where $\eta(t)$ and $\eta^\star(t)$ describe our (unknown) measurement errors on $\breve{W}$ and $W^\star$. From this follows the estimate:

$$\mathbb{E}\,\|\breve{W}^{(meas)} - W^{\star,(meas)}\| = \|\breve{W}^{(real)} - W^{\star,(real)}\| + \mathbb{E}\,\|\eta\| + \mathbb{E}\,\|\eta^\star\|, \qquad (24)$$

where $\mathbb{E}$ denotes the (theoretical) expectation over the set of trials giving rise to measures $\breve{W}$ and $W^\star$. This estimate is based on the assumption that the measurement errors $\eta(t)$ and $\eta^\star(t)$ are independent, which is likely to be the case given that $\breve{W}(t)$ stems from predictions (on the basis of neural tunings and noise covariances) whereas $W^\star(t)$ stems from measurements of the true PCV curves. All terms involving an expectation $\mathbb{E}$ in eq. 24 can be estimated by resampling with replacement over the set of recorded trials. By computing their difference, we thus get a corrected estimate for $\|\breve{W}^{(real)} - W^{\star,(real)}\|$. This is the estimate plotted in Fig. 6d. A drawback of this method is that the resulting estimate may become (slightly) negative when the underlying match between $\breve{W}$ and $W^\star$ is "too good". However, this does not prevent us from estimating the resulting weighting function $P_W(w, t_R)$ (eq. 23), accepting that some terms in the exponential may become (slightly) positive.

The final results, in Fig.
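The bias correction of eq. 24 can be sketched numerically: estimate the error terms $\mathbb{E}\|\eta\|$ from the spread of bootstrap samples, and subtract them from the raw divergence. This is a toy illustration with our own names; the norm is the time-averaged power, as in eq. 22:

```python
import numpy as np

def corrected_divergence(W_breve_boot, W_star_boot):
    """Bias-corrected divergence between two noisy temporal signals
    (sketch of eq. 24). Inputs: bootstrap samples of each signal, as
    arrays of shape (n_boot, n_timepoints)."""
    Wb = np.asarray(W_breve_boot, float)
    Ws = np.asarray(W_star_boot, float)
    raw = np.mean((Wb - Ws) ** 2)               # E || meas. difference ||
    err_b = np.mean((Wb - Wb.mean(axis=0)) ** 2)  # estimates E || eta ||
    err_s = np.mean((Ws - Ws.mean(axis=0)) ** 2)  # estimates E || eta* ||
    return raw - err_b - err_s  # may be slightly negative if the match is "too good"
```

When the two underlying signals are identical and only measurement noise separates them, the corrected estimate hovers around zero while the raw divergence stays inflated by the noise power.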
6, show that our method is still able to recover the most plausible scales of perceptual integration. Using the same tolerances as previously (5% of the power of the true measures), we find the following final estimates: $\hat{w} = 50$ msec, $\hat{t}_R = 81$ msec, $\hat{K} = 28$, again very close to the true scales of the readout. Notably, $\hat{K}$ is smaller than its prediction in the full-data analysis: this is a consequence of the slight overfitting on estimated SNRs, which leads to an underestimation of the population size $K$ required to match the psychometric SNR. The issue remains minor in this setup; however, we note that standard cross-validation and regularization techniques exist that respectively assess and counteract the effects of overfitting (Hastie et al., 2009).

In conclusion, the statistical method introduced above allows us to overcome the missing data inherent to realistic recordings, by integrating information from all recorded neurons into a few reliable statistical estimators.

We have proposed a framework to interpret sensitivity and choice signals in a standard model of perceptual decision-making. The purpose of our study is to help understand how perceptual integration takes place from a full sensory neural population. This question requires not only computing neurometric sensitivities or choice signals for individual neurons, but also integrating these measures into a single big picture of how information is read out from the population as a whole.

The sensitivity to stimulus achievable by a neural population has received much attention, both experimental and theoretical.
It was progressively realized that (1) the structure of noise correlations influences the amount of information that can be extracted from a neural population, and (2) the linear readout maximizing sensitivity is generally not a simple average of neural activities, but rather an adequate weighting optimizing the ratio between signal and noise extracted from the population, which corresponds to Fisher's linear discriminant (see Abbott and Dayan, 1999; Averbeck et al., 2006, and references therein). Similarly, the time window $w$ used to integrate the spike counts of each neuron has long been acknowledged to have a direct effect on the overall estimated sensitivity (see, e.g., Britten et al., 1992; Uka and DeAngelis, 2003; Cohen and Newsome, 2009; Price and Born, 2010).

Choice signals have also received much attention since their first measurements, in the form of choice probabilities (Britten et al., 1996). The temporal evolution of choice signals is routinely computed to qualitatively establish the instants in time when a given population covaries with the animal's percept (de Lafuente and Romo, 2006; Price and Born, 2010). Recently, the specific temporal evolution of CP signals during a depth discrimination task has cast doubt on the traditional, feedforward interpretation of CP signals (Nienborg and Cumming, 2009, see section 4.3). However, very few studies have quantitatively interpreted CP signals so far, because no analytical relationship was available to interpret their values.

Figure 6.
Statistical recovery of readout parameters: noisy measures. Same legends as Fig. 4, but with modifications specific to small-sample data. In panel b, the thin curves are different versions obtained through bootstrap resampling over trials, and the thick curve is the average across bootstrap samples. In panel d, the L2 norm is corrected for measurement errors, using the bootstrap samples and eq. 24. With this modification, our method can recover the true readout parameters on the basis of finite amounts of data.

Only recently have Haefner et al. (2013) derived the analytical expression of CPs in the standard model of perceptual integration (see section 4.2).

To the best of our knowledge, only one study has explicitly proposed to jointly use sensitivity and choice signals as two independent constraints characterizing the underlying neural code. In this seminal study, Shadlen et al. (1996) proposed a feed-forward model of perceptual integration in visual area MT responding to a moving-dots stimulus, and studied how the population's sensitivity and individual neuron CPs vary as a function of model parameters such as the number of neurons, strength of noise correlations, etc. In section 2.2, we have formalized this intuition of Shadlen et al. (1996), by showing that sensitivity and choice signals are two distinct, constitutive elements of the joint covariance structure between percept $\hat{f}$ and neural activity $s(t)$ (Fig. 2c). The third constitutive element is the noise covariance structure of $s(t)$ itself, a result also intuited by Shadlen et al. (1996), even though they assumed an oversimplified, homogeneous noise correlation matrix.

Unlike most previous theoretical studies on the subject, we explicitly modeled all neural activities in time. Indeed, this is the only way of targeting the instants of sensory stimulation which contribute to percept formation, and thus to decipher the $K$–$w$ tradeoff on sensitivity.
Finally, the statistical approach developed in section 2.3 is, to our knowledge, the first attempt to build inhomogeneous, partial measures of neural activity into a quantitative interpretation of percept formation from the full neural population.

Our model, as presented above, assumes a direct perceptual report of stimulus value $f^\star$ on every trial. Real experiments generally involve a more indirect report: to allow easier task learning by the animal, the report is always binary. In the classic random dot motion discrimination task (Britten et al., 1992), a monkey is visually presented with a set of randomly moving dots whose overall motion is slightly biased towards the left ($f < 0$) or towards the right ($f > 0$). In the two-frequency vibrotactile task, monkeys compare the frequencies $f_1$ and $f_2$ of two successive vibrating stimuli on their fingertip. They must press one button if they consider that $f_2 > f_1$, and the other button otherwise.

Thus, classic choice signals such as CP only measure the covariation between the spike train of each neuron and the animal's binary choice $c$ on each trial. To infer anything about the animal's underlying percept $f^\star$, it is also necessary to assume a behavioral model describing how the monkey takes a binary decision, on every trial, on the basis of its sensory percept. Most often, this behavioral model is implicitly assumed to be optimal. For example, in the random dot motion task, it is generally assumed that $c = H(f^\star)$ (Heaviside function), which is clearly the optimal policy if the animal has no further information about $f$. In the two-frequency task, the optimal behavioral model would be $c = H(f^\star_2 - f^\star_1)$. However, in the real experiment, the monkeys have to memorize $f_1$ for a few seconds before $f_2$ is presented, so potential effects of memory loss may also come into play. More generally, behaving animals can display biases, lapses of attention, and various exploratory and reward-maximization policies that lead to deviations from the optimal behavioral model.
To summarize, choosing a relevant behavioral model is a related problem that cannot be addressed here, and that will vary depending on the task and individual considered.

However, for most tractable behavioral models, the predicted sensitivities and choice signals will ultimately rely on the quantities introduced in this article. To take the simplest example, we focus on the random dot motion task with optimal policy $c = H(f^\star)$, as assumed in most models of the task, and make the classic assumption that the statistics of $f^\star$ (given $f$) are Gaussian (Fig. 7a). This model predicts the following psychometric curve (probability of button presses as a function of stimulus value):

$$\mathbb{E}(c \mid f) = \mathrm{P}(f^\star > 0 \mid f) = \Phi\left( \sqrt{Z^\star}\, f \right),$$

where $\Phi$ is the standard cumulative normal distribution, and $Z^\star$ is the square SNR for $f^\star$, as defined in eq. 1. Thus, the $Z^\star$ used in our model can easily be retrieved from experimental measures of the psychometric curve.

The same results hold for choice signals. Generally, choice signals are directly computed over some temporal average $\bar{s}_i$ of the underlying spike trains. Choice probability for every neuron $i$ measures the area under the ROC curve between the two distributions of $\bar{s}_i$, respectively conditioned on $c = 0$ and $c = 1$ (Green and Swets, 1966). Recently, Haefner et al. (2013) have shown that, assuming (1) multivariate Gaussian statistics between $s$ and $f^\star$, and (2) the optimal behavioral model $c = H(f^\star)$, choice probability can be analytically expressed as:

$$CP_i \simeq \frac{1}{2} + \frac{\sqrt{2}}{\pi}\, \frac{\sqrt{Z^\star}\, \bar{\pi}^\star_i}{\sigma(\bar{s}_i)},$$

a formula virtually exact over the full range of plausible CP values. The rightmost fraction is nothing but the Pearson correlation between variables $\bar{s}_i$ and $f^\star$. The numerator involves the linear covariance between $\bar{s}_i$ and $f^\star$ which is, in our notations, the temporally averaged PCV curve $\bar{\pi}^\star_i$. The authors further derived that, in the standard model of percept formation with readout vector $a$, this term is given by $\bar{\pi}^\star = C a$, which is exactly the PCV characteristic equation (eq. 9) in its temporally averaged form. The CP formula involves a normalization by $\sigma(\bar{s}_i)$, the standard deviation of spike count $\bar{s}_i$. This prevents a straightforward extension of the formula in time, because $\sigma(s_i)$ tends to infinity as the integration window used to compute $s_i$ tends to zero.

A simpler measure of choice signals is the choice-conditioned difference in firing rate (Britten et al., 1996), which can be computed for every individual neuron $i$ as $\Delta_i(t) := \mathbb{E}(s_i(t) \mid c = 1) - \mathbb{E}(s_i(t) \mid c = 0)$. Under the same assumptions as above (Gaussian statistics for $s(t)$ and $f^\star$, optimal behavioral model), this difference can be analytically expressed as:

$$\Delta_i(t) = \sqrt{\frac{8}{\pi}}\, \sqrt{Z^\star}\, \pi^\star_i(t). \qquad (25)$$

This is very close to the CP formula, but without the additional normalization by $\sigma(\bar{s}_i)$. Thus it directly allows for a simple generalization to temporal signals. Since $\Delta_i(t)$ is easily computable from experimental data, it provides the easiest way of accessing the underlying PCV curves $\pi^\star_i(t)$ used in our article.

The readout model (eq. 6) used to analyze sensitivity and choice signals is an instance of the 'standard', feedforward model of percept formation. As such it makes a number of hypotheses which should be understood when applying our methods to real experimental data. First, it assumes that the percept $f^\star$ is built linearly from the activities of the neurons.
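This linear-correlation reading of choice probability can be checked numerically: simulate a bivariate Gaussian pair $(\bar{s}_i, f^\star)$, threshold $f^\star$ to obtain the choice, and compare the empirical ROC area to the small-correlation approximation $CP \approx 1/2 + (\sqrt{2}/\pi)\,\rho$. A sketch under these stated assumptions (names are ours):

```python
import numpy as np

def empirical_cp(s, c):
    """Area under the ROC curve between the distributions of spike
    count s conditioned on choice c=1 vs c=0 (Mann-Whitney estimate)."""
    s1, s0 = s[c == 1], s[c == 0]
    ranks = np.argsort(np.argsort(np.concatenate([s1, s0]))) + 1
    r1 = ranks[:len(s1)].sum()
    return (r1 - len(s1) * (len(s1) + 1) / 2) / (len(s1) * len(s0))

rng = np.random.default_rng(1)
rho, n = 0.3, 200_000
f_star = rng.normal(size=n)                              # percept (standardized)
s = rho * f_star + np.sqrt(1 - rho**2) * rng.normal(size=n)  # neural count, corr rho
c = (f_star > 0).astype(int)                             # optimal binary choice
cp_emp = empirical_cp(s, c)
cp_lin = 0.5 + np.sqrt(2) / np.pi * rho                  # linear approximation
```

For moderate correlations the two values agree to within a fraction of a percent, consistent with the claim that the formula is virtually exact over the plausible CP range.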
There is no guarantee that this is the case during real percept formation, but linearity is an unavoidable ingredient to make quantitative predictions at the population level. Even if the real percept formation departs from linearity, fitting a linear model will most likely retain meaningful estimates for the coarse information (temporal scales, number of neurons involved) that we seek to estimate in this work.

More precisely, the model in eq. 6 assumes that spikes are integrated using a kernel separable across neurons and time, that is $A_i(t) = a_i h_w(t)$. Theory does not prevent us from studying a more general integration, where each neuron $i$ contributes with a different time course $A_i(t)$. The readout's characteristic equations are derived equally well in that case. Rather, assuming a separable form reflects (1) the intuition that the temporal components of integration are rather uniform across the population, and (2)

(This relies on the general formula $\mathbb{E}(X_1 \mid X_2 > 0) = \sqrt{2/\pi}\, \rho$, applicable to any bivariate normal variables $(X_1, X_2)$ with means 0, unit variances, and correlation coefficient $\rho$. We note that the assumption of normality is violated at small time scales, because $s_i(t)$ is clearly not Gaussian in that case. However, in practice, $\Delta_i(t)$ is always computed with a minimal amount of temporal smoothing, which resolves this potential issue.)

Figure 7.
Discussion. (a) Classic behavioral model. If the task is to judge whether f >
0, the optimalbehavioral policy consists of the simple threshold rule c = H ( f (cid:63) ) (Heavyside function). Furthermore, thetrial-to-trial distribution of percept f (cid:63) given f (distributions with different colors) is generally assumedto be Gaussian. Under these hypotheses, sensitivity and PCV signals used in this article are directlycomputed from real experimental data (neurometric curve and choice signals). (b) If readout time t R varies strongly from trial to trial (with density g ( t )), it leads to a flattening of PCV signals (thick greencurve) compared to the case with deterministic t R (dashed green curve). (c) If a decision-related signalfeedbacks into sensory areas, it leads to a divergence of PCV signals (thick green curve) after the readouttime t R , compared to the case without feedback (dashed green curve).the impossibility to fit a model with general kernel A i ( t ). Instead, we summarize temporal integrationfrom the population by two parameters w and t R , opening the door to a reliable estimation from data.Although the integration shape h could also be fit from data in theory, it seems more fruitful to assumea simple shape from the start (a classic square window kernel in our applications). Given that our goalis to estimate the coarse scales of percept formation, our method will likely be robust to various simplechoices for h . As a simple example, we tested our method, assuming a square window kernel, on dataproduced by a readout using an exponential kernel, and still recovered the correct parameters w , t R and K . Our model, as presented above, makes another important assumption: that perceptual readout occurs atthe same time t R on every stimulus presentation. This assumption is likely to be valid in perceptual tasksthat allow a fast reaction from the animal (‘reaction time’ tasks), in which case t R will generally be assmall as it can get (see, e.g., Stanford et al., 2010). 
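For concreteness, the separable readout of eq. 6 with a square window kernel amounts to a weighted sum of windowed spike counts. A toy sketch (binning, names and values are illustrative assumptions, not the actual analysis code):

```python
import numpy as np

def linear_readout(spikes, a, w, t_R, dt=1e-3):
    """Percept f_hat = sum_i a_i * integral_{u>0} s_i(t_R - u) h_w(u) du,
    with h_w a square window of width w (h_w = 1/w on [0, w]).

    spikes : (n_neurons, n_bins) binned spike counts, bin size dt (seconds)
    a      : (n_neurons,) readout weights
    w, t_R : integration window width and readout time, in seconds
    """
    end = int(round(t_R / dt))
    start = max(0, end - int(round(w / dt)))
    # windowed spike count of each neuron, normalized by w since h_w = 1/w
    s_window = spikes[:, start:end].sum(axis=1) / w
    return float(np.dot(a, s_window))

# toy usage: 2 neurons, 1 s of activity in 1 ms bins, 1 spike per bin (1000 Hz)
spikes = np.ones((2, 1000))
a = np.array([0.5, 0.5])
f_hat = linear_readout(spikes, a, w=0.2, t_R=0.8)
print(f_hat)  # each neuron's windowed rate is 1000 Hz, so f_hat = 1000.0
```

The exponential-kernel variant mentioned above would simply replace the flat normalization by exponentially decaying weights inside the window.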
However, when sensory stimulation lasts longer (say, over 500 msec), it opens the door to variations in t_R from trial to trial, or even to several re-evaluations of percept f⋆ during the same trial. For example, imagine that the stimulus is a particular RGB color on a monitor, and you are asked to judge whether it contains more green (G) or blue (B). From intuition, we can tell that our performance in such a task will hardly improve whether we watch the color for one second or one minute. In our model's formalism (eq. 6), this reveals built-in limitations on the effective integration window w that we can use in the task (remember that the readout's performance is proportional to w). But then, if our percept arises from a limited integration window w and we indeed watch the color for a full minute, when is our percept built?

In appendix A, we derive a more general version of the characteristic equations (eq. 7-9), assuming that t_R in eq. 6 is itself a random variable, drawn on each trial following some probability distribution g(t). Because sensory neurons have rather stationary activities in time, this additional assumption does not strongly affect the readout's sensitivity. On the other hand, it strongly affects the PCV curves. Essentially, the resulting PCV curve resembles a convolution of the deterministic PCV curve with g(t) (Fig. 7b, section A.2.2). If g(t) is substantially spread out in time, the PCV curves will become broader and flatter. In practice, this means that if a behavioral task is built such that t_R can display strong variations from trial to trial, the statistical method introduced above will produce biased estimates. In theory, this issue could be resolved by adding an additional parameter in the analysis to describe g(t) (see section A.2.2), but the validation remains to be done.

Finally, our 'standard' model assumes that percept formation is exclusively feed-forward.
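The flattening of PCV curves sketched in Fig. 7b is essentially a smoothing of the deterministic curve by the density g(t). A toy numerical illustration, with arbitrary Gaussian shapes chosen purely for the example:

```python
import numpy as np

dt = 0.01
t = np.arange(0.0, 2.0, dt)                      # trial time axis (s)
pcv_det = np.exp(-0.5 * ((t - 1.0) / 0.1) ** 2)  # deterministic PCV curve, peaked at t_R = 1 s

# hypothetical jitter of the readout time: zero-mean Gaussian, 0.2 s standard deviation
lag = np.arange(-0.5, 0.5 + dt, dt)
g = np.exp(-0.5 * (lag / 0.2) ** 2)
g /= g.sum()

# observed PCV curve ~ deterministic curve smoothed by the jitter density g
pcv_obs = np.convolve(pcv_det, g, mode="same")

# the jitter broadens the curve and lowers its peak, as in Fig. 7b
print(pcv_obs.max() < pcv_det.max())                          # True
print(np.isclose(pcv_obs.sum(), pcv_det.sum(), rtol=1e-2))    # True: overall area is preserved
```

The preserved area but reduced peak is exactly why strong trial-to-trial variations in t_R bias an analysis that assumes a deterministic readout time.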
The activities s_i(t) of the sensory neurons are integrated to give rise to percept f̂ and the animal's resulting choice c, and this forming decision does not affect the sensory neurons in return. Recent evidence suggests that reality is more complex. By looking at the temporal evolution of CP signals in V2 neurons during a depth discrimination task, Nienborg and Cumming (2009) evidenced dynamics which are best explained by a top-down signal, biasing the activity of the neurons on each trial after the choice is formed. In our notations, the population spikes s_i(t) would thus display a choice-dependent signal which kicks in on every trial after time t_R, resulting in PCV signals that deviate from their prediction in the absence of feedback (Fig. 7c).

What descriptive power does our model retain if such top-down effects are strong? The answer depends on the nature of the putative feedback. If the feedback depends linearly on percept f̂ (and thus, on the spike trains), its effects are fully encompassed in our model. Indeed, this feedback signal will then be totally captured by the neurons' linear covariance structure γ_ij(t, s), so that our predictions will naturally take it into account. This is also the case if the oddity noted by Nienborg and Cumming (2009) is due to global shifts of neural excitability from trial to trial. On the other hand, if the feedback depends directly on the choice c—which displays a nonlinear, 'all-or-none' dependency on f̂—then it will not be captured by our model, and may lead to biases. Even so, the effects of the feedback could be largely alleviated through a small trick: compare true and predicted PCV signals only up to (candidate) time t_R (see eq. 22).

Can we understand in more depth the statistical principles at work underneath our method of estimation? What factors govern the evolution of sensitivity Z(K) (Fig. 4a, eq. 14) and mean PCV signal W(t | K) (Fig. 4b, eq. 16), as a function of the number of neurons K used for readout? This question is not only of theoretical, but also of practical interest. Indeed, it may happen in real applications that the number of simultaneously recorded neurons K_max is too small to observe the crossing of predicted and true SNR curves (Fig. 4a). In such a case, predictions will be biased, because no recorded ensemble K can readily account for animal sensitivity (in fact, this is always bound to happen for small tested parameter w, following the K–w tradeoff).

What predictive power do ensembles up to size K_max contain about larger ensembles? For example, can we extrapolate the shape of the mean SNR curve (Fig. 4a) to K > K_max? In appendix B we address this question theoretically, by studying the value of SNR and PCV signals as a function of ensemble size K, and of the general structure of activity in the population. Our study relies on the singular value decomposition (SVD) of neural activity in the population. The SVD reveals a set of m = 1...M independent modes of population activity, each mode being associated with a power λ_m and a sensitivity η_m. Essentially, the sensitivity embedded in a neural ensemble K of size K increases as the sum of sensitivities of the first K modes in the population—which are the modes with the largest powers λ_m. Conversely, the overall power of PCV signal W(t) decreases as the average value of λ_m over these first K modes, weighted by their respective sensitivities.

Because there is no general relationship between the power λ_m of a mode and its sensitivity to the stimulus η_m, there is no trivial way of extrapolating SNR and PCV predictions to ensemble sizes K that were not recorded, unless one makes additional assumptions on the joint behavior of λ_m and η_m, which essentially amounts to characterizing the relative embedding of signal and noise in the full population (Wohrer et al., 2012).
For example, it is classically assumed that the noise covariance matrix is 'smooth' with respect to the signal covariance matrix, so that the former can be predicted on the basis of the latter (Wohrer et al., 2010; Haefner et al., 2013). Thus, while extrapolation of the statistical method above to larger populations is not trivial, it can be performed under specific assumptions about the embedding of signal and noise in the population considered.

We have shown how classic data recorded during perceptual decision-making experiments can be interpreted as samples from the joint covariance structure of neural activities and animal decision. Assuming a standard linear model of percept formation from neural activities, we derived a set of characteristic equations which relate neural and perceptual data, and thus implicitly define the parameters of perceptual integration by the animal on the basis of its sensory neurons. The neural data consist of neural PSTHs (first moment of neural activities) and JPSTHs (second moment of neural activities). The perceptual data consist of the animal's sensitivity, and of each neuron's covariance with the animal's choice—a quantity often assessed through choice probabilities, and for which we proposed a simpler linear equivalent coined percept covariance (PCV).

We then proposed a method to exploit these characteristic equations in a case of practical interest, when experimenters only have access to finite statistical samples of neural data across the full population. Our goal was to recover the instants in time and the typical number of neurons being used for percept formation—a difficult problem which cannot be solved on the sole basis of sensitivity information, due to the 'K–w tradeoff'. Our method relies on statistical averages of predicted sensitivity and PCV signals arising from random, candidate neural ensembles used as the source of percept formation, and seeks to match these predictions with the true, recorded perceptual data.
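In pseudocode, this matching step amounts to a search over candidate integration scales. A minimal sketch (the function names are hypothetical; `predict_Z` and `predict_W` stand for the ensemble-averaged predictions of eqs. 14 and 16, here replaced by toy stand-ins):

```python
import numpy as np

def fit_integration_scales(Z_true, W_true, predict_Z, predict_W, w_grid, K_grid):
    """Grid search for the (w, K) whose ensemble-averaged predictions best
    match the recorded sensitivity Z_true and mean PCV power W_true.

    predict_Z(w, K), predict_W(w, K): callables implementing eqs. 14 and 16,
    left abstract here; in practice they average over random recorded ensembles.
    """
    best, best_err = None, np.inf
    for w in w_grid:
        for K in K_grid:
            err = (predict_Z(w, K) - Z_true) ** 2 + (predict_W(w, K) - W_true) ** 2
            if err < best_err:
                best, best_err = (w, K), err
    return best

# toy stand-ins: sensitivity grows with both w and K (the "K-w tradeoff"),
# while mean PCV power shrinks with K -- this is what disambiguates the two
predict_Z = lambda w, K: w * K
predict_W = lambda w, K: 1.0 / K
best = fit_integration_scales(Z_true=6.0, W_true=1.0 / 3.0,
                              predict_Z=predict_Z, predict_W=predict_W,
                              w_grid=[1.0, 2.0, 3.0], K_grid=[1, 2, 3])
print(best)  # (2.0, 3): sensitivity alone would also accept (3.0, 2), PCV rules it out
```

The toy predictors make the point of the K–w tradeoff explicit: several (w, K) pairs yield the same sensitivity, and only the PCV term selects among them.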
We tested this method on an artificial neural network producing a form of stimulus encoding, and showed that it successfully recovers the scales of perceptual integration on the basis of sample recordings of realistic size.

Our method opens the way to novel experimental assessments of percept formation in sensory decision-making tasks. Indeed, the two main quantities used in our statistical analysis—the sensitivity Z (eq. 14) and the mean PCV curve W(t) (eq. 16)—rely on classic experimental measures. The main limitation of our approach is the size of the candidate readout ensembles which can be considered, as they must have been recorded simultaneously. However, the number of simultaneously recorded neurons is constantly being pushed upwards by modern experimental techniques, so we may expect that this limitation, if it exists, will soon be overcome. Furthermore, through a theoretical analysis based on the singular value decomposition (SVD) of neural activities, we showed the possibility of extrapolation to larger ensemble sizes than those simultaneously recorded, although such extrapolations can only be done under specific assumptions, and on a case-by-case basis. For all these reasons, our method can readily be tested on real data, and hopefully provide new insights into the nature of percept formation from populations of sensory neurons.

Appendices
A Characteristic equations for the readout
A.1 Derivation
We here derive the characteristic equations for the linear readout introduced in the main text, and further comment on some of its properties. We consider a more general version of eq. 6, where the extraction time t_R is allowed to vary from trial to trial. We thus assume that t_R is itself a random variable, drawn on each trial according to some density function g(t), independently of the neural activities s(t). The full readout model then writes:

f̂(t_R) = Σ_i ∫_{u>0} a_i s_i(t_R − u) h_w(u) du,   (A.1)
t_R ∼ g(t).   (A.2)

This model naturally encompasses the simpler version presented in the main text, with a deterministic time t_R: it corresponds to taking g(t) as a Dirac function located at that deterministic value.

The characteristic equations for this model rely on a straightforward computation of the second-order statistics of f̂, starting from eq. A.1. To deal with the random time t_R, we note that for any random process X(t) independent of t_R, E(X(t_R)) = ∫_{t=−∞}^{+∞} g(t) E(X(t)) dt. This expression is valid only if t_R is independent from the random variables contributing to X (in our case, the spike trains).

Then, the expected value of f̂ given a stimulus f writes:

E(f̂(t_R) | f) = ∫_t g(t) Σ_i ∫_{u>0} a_i E(s_i(t − u) | f) h_w(u) du dt = Σ_i a_i ∫_t (g ⋆ h_w)(t) λ_i(t; f) dt,   (A.3)

where (g ⋆ h_w)(t) = ∫_u g(u) h_w(u − t) du is the temporal correlation between g and h_w, and λ_i(t; f) is the PSTH for neuron i in stimulus condition f, defined as in the main text (eq. 2).

Similarly, the expected squared value of f̂ given a stimulus f writes:

E(f̂(t_R)² | f) = ∫_t g(t) E( (Σ_i ∫_{u>0} a_i s_i(t − u) h_w(u) du)² | f ) dt
    = ∫_t g(t) Σ_{ij} ∫∫_{(u,v)>0} a_i a_j E(s_i(t − u) s_j(t − v) | f) h_w(u) h_w(v) du dv dt
    = Σ_{ij} a_i a_j ∫∫_{(t,s)} G_w(t, s) η_ij(t, s; f) dt ds,   (A.4)

where we have defined G_w(t, s) := ∫_u g(u) h_w(u − t) h_w(u − s) du, and η_ij(t, s; f) := E(s_i(t) s_j(s) | f). η_ij(t, s; f) is closely related to the covariance structure in the population: it corresponds to the 'plain' JPSTH for the neurons in stimulus condition f, before correcting by the so-called 'product predictor' (Aertsen et al., 1989).

Finally, the expected value of the product of f̂ and the activity of any neuron s_i(t) writes:

E(f̂(t_R) s_i(t) | f) = ∫_s g(s) Σ_j ∫_{u>0} a_j E(s_i(t) s_j(s − u) | f) h_w(u) du ds = Σ_j a_j ∫_s (g ⋆ h_w)(s) η_ij(t, s; f) ds,   (A.5)

using the same notations as above.

The three expressions eq. A.3-A.5 roughly correspond to the three characteristic equations for the readout. To obtain them, we consider the variational versions of the previous expressions. First, we obtain the characteristic equation for tuning by differentiating eq. A.3 with respect to the stimulus. Second, equations A.4 and A.5 are expressed in 'product' form E(XY), whereas the corresponding characteristic equations are expressed in 'covariance' form Cov(X, Y) = E(XY) − E(X)E(Y). Once this is done, and after some rearrangement of the terms, we obtain the characteristic equations for tuning (eq. A.6), sensitivity (eq. A.7) and percept covariance (eq. A.8):

∂_f E(f̂ | f) = Σ_i a_i ∫_t (g ⋆ h_w)(t) β_i(t) dt,   (A.6)
⟨Var(f̂ | f)⟩_f = Σ_{ij} a_i a_j ( ∫∫_{(t,s)} G_w(t, s) γ_ij(t, s) dt ds + V^temp_ij ),   (A.7)
⟨Cov(f̂, s_i(t) | f)⟩_f = Σ_j a_j ∫_s (g ⋆ h_w)(s) γ_ij(t, s) ds.   (A.8)

In eq. A.6, β_i(t) is the temporal tuning curve for neuron i, defined as in eq. 3. If the readout is unbiased, the left-hand side is equal to 1, as in the main text. In eq. A.7 and A.8, γ_ij(t, s) is the covariance structure (JPSTH) between neurons i and j, defined as in eq. 4.

Finally, in eq. A.7, matrix V^temp_ij is an additional source of variance that appears only when g(t) has an extended temporal support, i.e., when t_R is non-deterministic. It then writes:

V^temp_ij = ∫∫_{(t,s)} G_w(t, s) ⟨ (λ_i(t; f) − λ̄_i(f)) (λ_j(s; f) − λ̄_j(f)) ⟩_f dt ds,

where λ̄_i(f) := ∫_t (g ⋆ h_w)(t) λ_i(t; f) dt is the temporal average already used above (eq. A.3). Thus, V^temp_ij measures a form of temporal covariance in the PSTHs of the neurons.

When t_R is deterministic, as in the main text, we have g(t) = δ(t − t_R), a Dirac function. Then, the temporal integration kernels used in eq. A.6-A.8 boil down to (g ⋆ h_w)(t) = h_w(t_R − t) and G_w(t, s) = h_w(t_R − t) h_w(t_R − s). One easily checks that in these conditions, the additional temporal variance term V^temp_ij vanishes, and we recover the characteristic equations from the main text.

A.2 Additional interpretations
A.2.1 Sensitivity as a function of w

In the form of eq. A.7, it is not clear how the value of w influences the variance of f̂, and thus the readout's sensitivity. To get a better intuition, let us first neglect the temporal variance term V^temp_ij. One easily checks that the kernel G_w(t, s), introduced above, verifies the following property: ∫_t G_w(t, t + τ) dt = (h_w ⋆ h_w)(τ), the autocorrelation of kernel h_w. As a result, we can rewrite eq. A.7 in the form:

⟨Var(f̂ | f)⟩_f = Σ_{ij} a_i a_j ∫_τ (h_w ⋆ h_w)(τ) ( ∫_t [G_w(t, t + τ) / (h_w ⋆ h_w)(τ)] γ_ij(t, t + τ) dt ) dτ
    = Σ_{ij} a_i a_j ∫_τ (h_w ⋆ h_w)(τ) γ̄_ij(τ) dτ.   (A.9)

In the first line, the function of t defined by the fraction is positive and has an integral of 1, so it operates as a temporal averaging on γ_ij(t, t + τ). The resulting average over t, noted γ̄_ij(τ) in the second line, is thus a form of cross-correlogram between neurons i and j, measuring the average covariance between the spikes from i and j separated by a time lag τ.

Because h_w is a low-pass kernel with scale w, its autocorrelation function typically has support on [−w, w], and verifies (h_w ⋆ h_w)(0) = w^{−1}. On the other hand, γ̄_ij(τ) typically has support on some interval [−τ_γ, τ_γ], where τ_γ is the typical time scale of noise correlations in the population. As a result, as soon as w gets bigger than τ_γ, the integral in (A.9) starts behaving like w^{−1}, and the SNR of f̂ scales as w. A similar analysis can be performed on the additional term V^temp_ij (eq. A.7).

A.2.2 Non-deterministic t_R

What are the main departures from the main text when the function g(t) has an extended temporal support? From eq. A.6-A.8, it is clear that the general form of the characteristic equations still holds:

1 = b⊤ a,   Z^{−1} = a⊤ C a,   π(t) = Γ(t) a,

but with more general definitions of b(w, g), C(w, g) and Γ(t | w, g). First, an additional covariance matrix V^temp may contribute to C, if neural activities are not stationary in time. Indeed, if t_R varies from trial to trial, any variation of firing rates in time creates an additional source of variability in f̂. Second, through eq. A.8, g(s) acts as a weighting factor over the PCV curves that would be obtained for each t_R: Γ(t | g) = ∫_s g(s) Γ(t | t_R = s) ds. This leads to the spreading of PCV curves sketched in Fig. 7b.

These two features lead to the loss of one specific property of the deterministic case. When the 'natural' temporal averaging of PCV signals was considered, that is π̄ = ∫_t h_w(t_R − t) π(t) dt, the integrated PCV equation yielded π̄ = Ca, because Γ̄ = C. In the general case, the 'natural' temporal averaging is π̄ = ∫_t (g ⋆ h_w)(t) π(t) dt, and one easily checks that Γ̄ ≠ C. Thus, with a general g(t), the sensitivity (eq. A.7) and PCV (eq. A.8) equations become more dissociated.

In these conditions, it is unclear whether the statistical approach introduced in the main text could be extended to also recover a non-deterministic extraction function g(t). The main concern is that the temporal evolution of PCV signals is only determined by the aggregate function (g ⋆ h_w)(t) (eq. A.8), which cannot be used to disentangle g(t) and w separately. However, general considerations suggest that the method could still work in that case. Indeed, the respective effects of g(t) and w on the covariance structures used in eq. A.7-A.8 can roughly be thought of as a scaling:

Γ̄ ≃ ε(w, g) C,   (A.10)

because the overall 'shape' of covariance between neurons (as opposed to its 'strength') does not depend much on the precise temporal integration used to compute their activity. Actually, under the specific assumption γ_ij(t, s) = γ_ij Q(|t − s|) (stationary activities with uniform temporal correlations), relationship (eq. A.10) can be shown to be exact, with

ε(w, g) = [ ∫_ξ Q̃(ξ) ‖h̃_w(ξ)‖² ‖g̃(ξ)‖² dξ ] / [ ∫_ξ Q̃(ξ) ‖h̃_w(ξ)‖² dξ ],

expressed in terms of Fourier transforms (assuming proper scaling for the shape function h: ∫_u h(u) du = ∫_u h(u)² du = 1). As a result, the mean PCV curve W(t) (eq. 15, 16) is predicted to scale as ε(w, g). So, while matching the temporal supports of W(t) and W⋆(t) constrains the value of (g ⋆ h_w)(t), matching their overall power constrains ε(w, g), and we can hope to disentangle the values of g(t) and w separately. In practice though, this would require the fitting of at least one additional temporal parameter; typically, the standard deviation of t_R from trial to trial.

B Singular value analysis—summary
We summarize here the main results of a theoretical analysis aimed at understanding the evolution of SNR and PCV signals achieved by readout ensembles of growing size K. Detailed mathematical derivations are available in Supplementary Section S2. For simplicity we focus only on time-integrated neural activities s̄_i := ∫_u h_w(u) s_i(t_R − u) du, assuming a fixed choice of (w, t_R). We consider random readout ensembles K in the population, and two resulting indicators. First, we consider the sensitivity Y(K), linked to the SNR Z by the relationship Y = Z(1 + Z)^{−1}. This is the natural description of sensitivity in the framework below. It is obtained like Z in the main text (eq. 14), but replacing the noise covariance matrix C by the total covariance matrix A = C + ⟨f²⟩ bb⊤. Second, we consider the mean PCV in the population W̄(K), obtained as the 'natural' temporal integration of the signal W(t | K) from the main text (eq. 16): W̄ := ∫_u h_w(u) W(t_R − u) du. Since W(t) is mostly positive, W̄ roughly corresponds to the overall power in W(t).

SVD reformulation of neural activity.
The analysis relies on the singular value decomposition (SVD) of population activity into m = 1...M orthogonal modes:

s̄_i^{fω} = Σ_{m=1}^{M} √λ_m u_i^m v_m^{fω},

where the lower index i = 1...N_tot indicates neurons in the population, and the upper indices indicate all possible stimuli f and random realizations ω of network activity. Each mode m is defined by its power λ_m > 0, its distribution vector (over neurons) u^m, and its appearance variable v_m, which takes a different random value on every trial. By construction, the various modes are orthogonal across neurons ((u^m)⊤ u^n = δ_mn) and linearly independent across trials (Cov_fω(v_m, v_n) = δ_mn), so they typically correspond to distinct 'patterns of activity' in the population. The power λ_m describes the overall impact of mode m on population activity. We assume λ_1 ≥ ··· ≥ λ_M, so we progressively include modes with lower power—either because they involve only a small fraction of neurons, or because they appear only on rare trials. The number of modes M is the intrinsic dimensionality of the neural population's activity. In real populations we expect M ≪ N_tot, because neural activities are largely correlated.

The SVD is best viewed as a change of variables, reexpressing neural activities {s̄_i}_{i=1...N_tot} in terms of mode appearance variables {v_m}_{m=1...M}. Just like individual neurons, each mode m can be associated with a sensitivity to the stimulus η_m, which describes the proportion of the mode's power λ_m due to variations of the signal (f), as opposed to variations of the noise (ω). Since modes are linearly independent, the full population's sensitivity corresponds to the sum of individual mode sensitivities: Y(∞) = Σ_m η_m.

Sensitivity and PCV from finite neural ensembles.
We now want to estimate the amount of stimulus sensitivity Y(K) that can be extracted, not from the full population, but from neural subensembles of size K. The SVD provides a natural reinterpretation of this problem in terms of activity modes: each ensemble K 'reveals' only a fraction of the underlying modes. The pivotal object in this reinterpretation is our so-called data matrix:

D_K := { d_i^m = √λ_m u_i^m }_{m=1...M, i∈K},

an M × K matrix describing the activity of neural ensemble K in the space of modes. In the original problem formulation, the K × K matrix D⊤D describes the covariance of neural activity in ensemble K, and we want to estimate the resulting sensitivity. In the dual formulation, the M × M matrix DD⊤ describes a covariance structure between modes, but estimated only from the sample neurons in K. The problem now lives in a space of fixed dimensionality M, and can be related to classical problems of estimating covariance structures from a finite number of samples—in our case, the neurons.

Applying this dual approach, we find that Y(K) and W̄(K) depend on the readout ensemble K only through an M × M matrix ∆_K, the (rank-K) orthogonal projector on the span of the vectors {d_i}_{i∈K} in mode space:

Y(K) = η⊤ ∆_K η,
W̄(K) = −B + (N_tot Y(K))^{−1} η⊤ Λ ∆_K η,

where Λ := diag(λ_1, ..., λ_M), and B := E_i(b_i²) is the average square tuning in the population. Furthermore, the average projector across ensembles of size K, that is E_K ∆_K, is approximately diagonal in mode space. Noting {ε_m^K} for its diagonal, we thus obtain the approximations:

E_K Y ≃ Σ_{m=1}^{M} ε_m^K η_m,   (B.1)
E_K W̄ ≃ −B + N_tot^{−1} [ Σ_{m=1}^{M} ε_m^K η_m λ_m ] / [ Σ_{m=1}^{M} ε_m^K η_m ],   (B.2)

where 0 ≤ ε_m^K ≤ 1 measures the average fraction of mode m revealed by K random neurons.
As modes with larger power λ_m tend to be revealed first, a rough but useful image is to consider that ε_m^K ≃ 1_{m≤K}: only the first K modes are revealed by ensembles of K neurons.

Thus, the sensitivity E_K Y grows with K as the mode sensitivities η_m are progressively revealed. Saturation occurs when all nonzero η_m are revealed, in which case E_K Y = Y(∞). Conversely, the mean PCV E_K W̄ decreases with K. Indeed, the fraction in eq. B.2 can be viewed as an average power ⟨λ⟩_{m,K}, where each mode m contributes with a weight ε_m^K η_m. As ε_m^K progressively reveals modes with lower power λ_m, this average power is expected to decrease with K. Again, saturation occurs when all nonzero η_m are revealed, and then E_K W̄ = Z(∞)^{−1} − B, the predicted value for choice signals in the case of an optimal readout from the full population (Haefner et al., 2013).

Extrapolation to large K. What do these results tell us about possible extrapolations to ensemble sizes K larger than the maximum number of neurons simultaneously recorded by the experimenter? Essentially, that such extrapolations always require further assumptions about the structure of activity in the population.

Indeed, one can imagine scenarios in which the most sensitive modes (those with highest η_m) are associated with relatively low powers λ_m and thus appear only at large K. This could be the case, for example, if a very local circuit of neurons carries a lot of information about the stimulus, independently from the rest of the population. Because it involves few neurons, the corresponding mode of activity will have a low power λ_m, and will require very large ensembles K to be detected—simply because the corresponding neurons are not recorded in smaller ensembles. A similar discussion can be found in Haefner et al. (2013).
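The notion that a random ensemble 'reveals' only some of the modes can be illustrated on synthetic data. A toy sketch (all names and values are hypothetical; in practice D_K is built from recorded activity, not sampled like this): the trace of the projector ∆_K, i.e. the rank of the restricted data matrix, counts the modes revealed by an ensemble.

```python
import numpy as np

rng = np.random.default_rng(1)
N_tot, M = 60, 5                              # neurons, intrinsic dimensionality
lam = np.array([5.0, 3.0, 2.0, 1.0, 0.5])     # mode powers, lambda_1 >= ... >= lambda_M

# random orthonormal mode vectors u^m over neurons (columns of U)
U = np.linalg.qr(rng.standard_normal((N_tot, M)))[0]

# data matrix d_i^m = sqrt(lambda_m) u_i^m, of shape M x N_tot
D = np.sqrt(lam)[:, None] * U.T

def modes_revealed(idx):
    """Rank of the data matrix restricted to ensemble K, i.e. the trace of
    the projector Delta_K: how many modes the ensemble 'reveals'."""
    return np.linalg.matrix_rank(D[:, idx])

K_small = rng.choice(N_tot, size=2, replace=False)    # ensemble of 2 neurons
K_large = rng.choice(N_tot, size=20, replace=False)   # ensemble of 20 neurons
print(modes_revealed(K_small), modes_revealed(K_large))  # 2 5
```

With generic mode vectors, K neurons reveal min(K, M) modes, which is the content of the rough approximation ε_m^K ≃ 1_{m≤K}; the non-generic counterexamples discussed above (local circuits, Boerlin-Denève-type codes) are exactly the cases where this fails.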
Another example is the encoding network theoretically proposed by Boerlin and Denève (2011), where each neuron spikes only if its information is not already encoded in the activity of the remaining neurons. This results in the appearance of a few, global modes of activity (typically, the sum of all neural activities, in the simplest instantiation of the model) which are specifically designed to have a very large SNR, meaning high η_m and low λ_m. In this case, any estimation of sensitivity from a subpopulation K will consistently be smaller than the full population's sensitivity.

To summarize, extrapolation can only be performed under additional assumptions about the overall link between η_m and λ_m—or equivalently, about the relationship between 'signal' and 'noise' contributions to population activity (see also discussions in Wohrer et al., 2012; Haefner et al., 2013). The extent to which such assumptions are justified will depend on each specific context.

References
Abbott, L. F., Dayan, P., 1999. The effect of correlated variability on the accuracy of a population code. Neuralcomputation 11 (1), 91–101.Aertsen, A. M., Gerstein, G. L., Habib, M. K., Palm, G., 1989. Dynamics of neuronal firing correlation: modulationof ”effective connectivity”. Journal of neurophysiology 61 (5), 900–17.Averbeck, B. B., Latham, P. E., Pouget, A., 2006. Neural correlations, population coding and computation.Nature Reviews Neuroscience 7 (5), 358–66.Boerlin, M., Den`eve, S., 2011. Spike-based population coding and working memory. PLoS computational biology7 (2).Britten, K. H., Newsome, W. T., Shadlen, M. N., Celebrini, S., Movshon, A. J., 1996. A relationship betweenbehavioral choice and the visual response of neurons in macaque MT. Visual Neuroscience 13, 87–100.Britten, K. H., Shadlen, M. N., Newsome, W. T., Movshon, J. A., 1992. The analysis of visual motion: acomparison of neuronal and psychophysical performance. Journal of Neuroscience 12 (12), 4745–4765.Cohen, M. R., Newsome, W. T., 2009. Estimates of the contribution of single neurons to perception depend ontimescale and noise correlation. Journal of Neuroscience 29 (20), 6635–48.Daley, D., Vere-Jones, D., 2007. An introduction to the theory of point processes. Vol. 1. Springer Verlag, NewYork, USA.de Lafuente, V., Romo, R., 2006. Neural correlate of subjective sensory experience gradually builds up acrosscortical areas. Proceedings of the National Academy of Sciences of the United States of America 103 (39),14266–71.Gold, J. I., Shadlen, M. N., 2007. The neural basis of decision making. Annual Review of Neuroscience 30, 535–74.Goodman, D., Brette, R., 2008. Brian: a simulator for spiking neural networks in python. Frontiers in neuroin-formatics 2.Green, D., Swets, J., 1966. Signal detection theory and psychophysics. Vol. 1974. Wiley, New York, USA.Haefner, R. M., Gerwinn, S., Macke, J. H., Bethge, M., 2013. 
Inferring decoding strategies from choice probabilitiesin the presence of correlated variability. Nature Neuroscience 16 (2), 235–242.Hastie, T., Tibshirani, R., Friedman, J., 2009. The elements of statistical learning. Springer Verlag, New York,USA. Typically the sum of all neural activities, in the simplest instantiation of the model. Hern´andez, A., Zainos, A., Romo, R., 2000. Neuronal correlates of sensory discrimination in the somatosensorycortex. Proceedings of the National Academy of Sciences of the United States of America 97 (11), 6191–6.Hoyle, D. C., 2011. Accuracy of pseudo-inverse covariance learning–a random matrix theory analysis. PatternAnalysis and Machine Intelligence, IEEE Transactions on 33 (7), 1470–1481.Luna, R., Hern´andez, A., Brody, C. D., Romo, R., 2005. Neural codes for perceptual discrimination in primarysomatosensory cortex. Nature Neuroscience 8 (9), 1210–9.Mountcastle, V., Steinmetz, M. A., Romo, R., 1990. Frequency discrimination in the sense of flutter: psychophys-ical measurements correlated with postcentral events in behaving monkeys. Journal of Neuroscience 10 (9),3032–3044.Nienborg, H., Cumming, B. G., 2009. Decision-related activity in sensory neurons reflects more than a neuron’scausal effect. Nature 459 (7243), 89–92.Price, N. S. C., Born, R. T., Oct. 2010. Timescales of sensory- and decision-related activity in the middle temporaland medial superior temporal areas. Journal of Neuroscience 30 (42), 14036–45.Raudys, S., Duin, R., 1998. Expected classification error of the fisher linear classifier with pseudo-inverse covariancematrix. Pattern Recognition Letters 19 (5), 385–392.Romo, R., Salinas, E., 2003. Flutter discrimination: neural codes, perception, memory and decision making.Nature Reviews Neuroscience 4 (3), 203–18.Shadlen, M. N., Britten, K. H., Newsome, W. T., Movshon, A. J., 1996. A computational analysis of the relation-ship between neuronal and behavioral responses to visual motion. 
Journal of Neuroscience 16 (4), 1486–1510.
Shadlen, M. N., Newsome, W. T., 1998. The variable discharge of cortical neurons: implications for connectivity, computation, and information coding. Journal of Neuroscience 18 (10), 3870–3896.
Stanford, T. R., Shankar, S., Massoglia, D. P., Costello, M. G., Salinas, E., 2010. Perceptual decision making in less than 30 milliseconds. Nature Neuroscience 13 (3), 379–385.
Talbot, W., Darian-Smith, I., Kornhuber, H., Mountcastle, V., 1968. The sense of flutter-vibration: comparison of the human capacity with response patterns of mechanoreceptive afferents from the monkey hand. Journal of Neurophysiology 31 (2).
Uka, T., DeAngelis, G. C., 2003. Contribution of middle temporal area to coarse depth discrimination: comparison of neuronal and psychophysical sensitivity. Journal of Neuroscience 23 (8), 3515–3530.
Werner, G., Mountcastle, V., 1965. Neural activity in mechanoreceptive cutaneous afferents: stimulus-response relations, Weber functions, and information transmission. Journal of Neurophysiology 28 (2).
Wohrer, A., Humphries, M. D., Machens, C., 2012. Population-wide distributions of neural activity during perceptual decision-making. Progress in Neurobiology.
Wohrer, A., Romo, R., Machens, C., 2010. Linear readout from a neural population with partial correlation data. Advances in Neural Information Processing Systems 23.

SUPPLEMENTARY INFORMATION
S1 Encoding network
We detail here the architecture of the artificial encoding network used to test our method (summarized in Section 3.1 of the main text). This ad hoc network was designed to display some classic features of sensory cortical neurons involved in perceptual decision-making tasks (e.g., V2, MT, S1, S2). To reproduce the diversity of responses naturally observed at the population level (Wohrer et al., 2012), neurons in our network have broadly distributed firing rates and some diversity in their temporal response profiles. We also wished to reproduce the continuum of stimulus tuning observed in real populations, where some neurons have positive tuning to the stimulus (their rate increases when f increases) and others have negative tuning. Finally, we wished to reproduce realistic strengths of noise correlations between neurons in the population (Figure 3b of the main text), and to ensure that the tunings of each pair of neurons (their "signal" correlation) be only slightly predictive of their noise correlation, another feature often observed in real sensory populations (Wohrer et al., 2012).

The network consists of two distinct layers of spiking neurons, of which only the second layer (the encoding layer) is "visible" to the experimenter. The first layer (L1) consists of 100 = 2 × 50 independent Poisson neurons, whose firing intensity f constitutes the stimulus encoded by the second layer. On each trial, f takes one of three possible values: 25, 30 or 35 Hz. All neurons are equivalent, but segregated into two distinct populations according to their projections onto the second layer. The Poisson firing constitutes the only source of randomness in the network from trial to trial.

The second layer (L2) consists of 500 leaky integrate-and-fire (LIF) neurons, some of which receive input from L1, and which are all coupled through a sparse, balanced connectivity. The generic equation for these neurons writes

$$\tau \frac{dV_i^{(s)}}{dt} = \sum_{j \in \mathrm{L1}} W^{(1,s)}_{ji}\,\delta(t - t_j) + \sum_{k \in \mathrm{L2}} W^{(2)}_{ki}\,\delta(t - t_k - \Delta_{ki}) + I^{(s)} - \left(V_i^{(s)}(t) - V_{\mathrm{rest}}\right).$$

The neuron emits a spike at each time $t_i$ when $V_i^{(s)}$ reaches the threshold $V_{\mathrm{thr}}$, after which the neuron's potential is reset to the resting value $V_{\mathrm{rest}}$. All neurons share the same membrane time constant τ = 20 msec, threshold $V_{\mathrm{thr}}$ = −50 mV, and resting potential $V_{\mathrm{rest}}$ = −60 mV. The upper index s denotes one of three possible subtypes of neurons in L2: positively-biased neurons (s = p, 100 neurons), negatively-biased neurons (s = n, 100 neurons) and unbiased neurons (s = u, 300 neurons).

Positively-biased neurons receive sparse excitatory connections from one population of 50 neurons in L1 ($W^{(1,p)}_{ji} \geq 0$), negatively-biased neurons receive sparse inhibitory connections from the other population ($W^{(1,n)}_{ji} \leq 0$), and unbiased neurons receive no input from L1 ($W^{(1,u)}_{ji} = 0$). As these asymmetries create biases in the total synaptic inputs to each type of cell, the intrinsic currents $I^{(p)}$, $I^{(n)}$ and $I^{(u)}$ also vary depending on neuron subtype, to ensure homogeneous firing properties inside the three populations (see Table 1). Finally, all L2 neurons are connected through a single matrix $W^{(2)}$ of recurrent connections, independently of their subtype. All connection matrices $W^{(1,s)}$ and $W^{(2)}$ are sparse with (Erdős–Rényi) connection probability p = 0.2. Non-zero connection strengths are picked uniformly in an interval $[w_{\min}, w_{\max}]$, which depends on the connection considered: see Table 1. Note that L2 recurrent connections can be both excitatory and inhibitory, a departure from biology which allows for an easier implementation.

Finally, the recurrent connections in L2 are associated with synaptic delays: for each pair (i, k) of connected L2 neurons, the random delay $\Delta_{ki}$ is drawn uniformly between 0 and 5 msec. This substantially increases the diversity of neural responses in the population, particularly at the level of JPSTHs (Figure 3e of the main text). This is interesting because our method is specifically designed to analyse generic, heterogeneous population activities.

Subtype           I^(s)   w^(1,s)_min   w^(1,s)_max   w^(2)_min   w^(2)_max
Pos. biased (p)     0          0             2            -2           2
Neg. biased (n)    14         -3             0            -2           2
Unbiased (u)        5          0             0            -2           2

Table 1. Connectivity parameters in the three subtypes of L2 neurons. All values are expressed in millivolts.

We implemented and simulated the network using Brian, a spiking neural network simulator in Python (Goodman and Brette, 2008). Our simulation consisted of many successive epochs of 500 msec covering all possible successions of the three stimulus values f (as in Figure 1a of the main text). Since the input Poisson neurons were always firing close to 30 Hz, there was no strong transient at stimulus onset, as is often observed in real sensory neurons. In our case, the change of activity between two successive stimuli was always only differential, and rather weak (see Figure 3c of the main text).

S2 Singular value analysis
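Before detailing the analysis, it may help to fix notation with a minimal NumPy sketch of the singular value decomposition of population activity. This is an illustration only, not the method from the main text: the variable names (R, K, T) and the low-rank-plus-noise construction are our assumptions, with K echoing the number of neurons as in the main text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population activity: K neurons x T trials of mean-subtracted
# responses, built as a rank-3 "signal" plus weak independent noise.
K, T, rank = 50, 400, 3
signal = rng.standard_normal((K, rank)) @ rng.standard_normal((rank, T))
R = signal + 0.1 * rng.standard_normal((K, T))

# Singular value decomposition: R = U @ diag(s) @ Vt, singular values s
# sorted in decreasing order. Columns of U are population-wide activity modes.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# The first `rank` singular values dominate, reflecting the low-rank signal;
# the remaining spectrum is set by the noise floor.
print(s[:5])

# Reconstruction from the SVD recovers R up to numerical precision.
R_hat = (U * s) @ Vt
print(np.allclose(R, R_hat))  # prints True
```

The gap between the third and fourth singular values is what makes the signal subspace identifiable from finite data, which is the situation analysed in this section.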