A new inference approach for training shallow and deep generalized linear models of noisy interacting neurons
Gabriel Mahuas, Giulio Isacchini, Olivier Marre, Ulisse Ferrari, Thierry Mora
Laboratoire de physique de l'École normale supérieure, PSL University, CNRS, Sorbonne University, and University of Paris, 75005 Paris, France
Institut de la Vision, Sorbonne University, INSERM, CNRS, 75012 Paris, France
Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria
Max Planck Institute for Dynamics and Self-organization, 37077 Göttingen, Germany
* [email protected]
‡ These authors contributed equally to this work
Abstract
Generalized linear models are one of the most efficient paradigms for predicting the correlated stochastic activity of neuronal networks in response to external stimuli, with applications in many brain areas. However, when dealing with complex stimuli, their parameters often do not generalize across different stimulus statistics, leading to degraded performance and blowup instabilities. Here, we develop a two-step inference strategy that allows us to train robust generalized linear models of interacting neurons, by explicitly separating the effects of stimulus correlations and noise correlations in each training step. Applying this approach to the responses of retinal ganglion cells to complex visual stimuli, we show that, compared to classical methods, the models trained in this way exhibit improved performance, are more stable, yield robust interaction networks, and generalize well across complex visual statistics. The method can be extended to deep convolutional neural networks, leading to models with high predictive accuracy for both the neuron firing rates and their correlations.
The pioneering work of J. W. Pillow and colleagues [1] showed how the Generalized Linear Model (GLM) can be used for predicting the stochastic response of neurons to external stimuli. Thanks to its versatility [2], high performance, and easy inference, the GLM has become one of the reference models in computational neuroscience. Nowadays, its applications range from retinal ganglion cells [1] to neurons in the LGN [3], in the visual [4], motor [5], and parietal [6] cortices, as well as in other brain regions [7, 8]. However, the GLM has also shown some significant limitations that have prevented its application to an even wider spectrum of contexts. In particular, the GLM shows unsatisfying performance when applied to the response to complex stimuli with spatio-temporal correlations much stronger than white noise, such as naturalistic images [9] or videos [10].
A first limitation is that the inferred parameters depend on the stimulus used for training. This happens not only for the part of the model that deals with the external stimulus, which is directly affected by a change in the stimulus statistics, but also for the coupling parameters quantifying interactions between the neurons of the network. However, if these couplings are to reflect an underlying network of biological interactions, they should be stimulus independent. In addition, and as we show in this paper, this lack
Preprint. Under review.
of generalizability comes with errors in the prediction of correlated noise between neurons. This issue can strongly limit the application of the GLM for unveiling direct synaptic connections between the recorded neurons [11] and for estimating the impact of noise correlations on information transmission [1].
A second issue is that the GLM can be subject to uncontrollable and unnatural self-excitation transients [12, 13, 14]. During these strong and positive feedback loops, the network's past activity may drive its current state to excitations above naturalistic levels, in turn activating neurons in subsequent time steps and resulting in a transient of very high, unrealistic activity. This problem limits the use of the GLM as a generative model, as it is often necessary to remove those self-excitation runs by hand. Ref. [12] proposed an extension of the GLM that also includes quadratic terms limiting self-excitation of the network, but this comes at the price of more fitting parameters and a harder inference. Ref. [13] showed that a GLM that predicts the responses several time steps ahead in time [15] limits self-excitation, but this implies higher computational complexity and the risk of missing fine temporal structures. Alternatively, Ref. [14] proposed an approximation to estimate the stability of the inferred GLM model, and then used a stability criterion to constrain the parameter space to stable models. However, the resulting models are sub-optimal, with degraded performance.
Thirdly, because neuronal responses are highly non-linear and hard to model for complex stimuli, the GLM fails to predict those responses correctly, even for early visual areas such as the retina [10]. Recently, deep convolutional neural networks (CNNs) have been shown to outperform the GLM at predicting individual neuron mean responses [9, 16, 17, 18].
Compared to the GLM, these deep CNNs benefit from a more flexible and richer network architecture, allowing for strong performance improvements [9]. However, the GLM retains an advantage over CNNs: thanks to the couplings between neurons in the same layer, it can account for both shared noise across the population and self-inhibition due to refractoriness. This feature, which is missing from deep CNNs [9], can be used to study how noise correlated in space and time impacts the population response [1]. A joint model combining the benefits of the deep architecture of CNNs and the neuronal couplings of the GLM is still lacking. It would allow for a more detailed description of the neuronal response to stimuli.
In this paper we develop a two-step inference strategy for the GLM that solves these three issues. We apply it to recordings in the rat retina subject to different visual stimulations. The main idea is to use the responses to a repeated stimulus to infer the GLM couplings without including the stimulus-processing component. Then, in a second, independent step, we infer the parameters of the model pertaining to stimulus processing. Our approach allows for a wide variety of architectures, including deep CNNs. Finally, we introduce an approximation scheme to put together the two inference results into a single model that can predict the joint network response from the stimulus.
Retinal ganglion cells of a Long-Evans rat were recorded through a multi-electrode array experiment [19, 20] and spike-sorted with
SpyKING CIRCUS [21]. Cell activity was stimulated with one unrepeated and two repeated videos of checkerboard (white noise) and moving bars. For the checkerboard, we used the unrepeated video (… s) and one of the two repeated videos (… s in total for … repetitions) for training, and the second repeated video for testing (… s in total for 120 repetitions). Similarly, for the moving bar video we used the unrepeated video (… s) and one of the two repeated videos (… s in total for 50 repetitions) for training, and the second repeated video for testing (… s in total for 50 repetitions). In addition, we also recorded responses to a full-field movie with naturalistic statistics [19].
After sorting, we applied a spike-triggered average analysis to locate the receptive field of each cell. Then, we used the responses to full-field stimulation to cluster cells into different cell types. In this work we focus on a population of N = 25 OFF Alpha retinal ganglion cells, which tile the visual field through a regular mosaic. The responses to both checkerboard and moving bar stimulations showed strong correlations, which we decompose into the sum of stimulus and noise correlations. Stimulus correlations are correlations between the cell mean responses (peristimulus time histogram, or PSTH). They are large only for the bar video, mostly because the video itself has strong and long-ranged correlations. Noise correlations, on the other hand, are due to shared noise from upstream neurons and gap junctions between cells in the same layer [22], and mostly reflect the architecture of the underlying biological network. Consistently, noise correlations were similar in the responses to the two stimulations. In Suppl. sect. S1 we present additional statistics of the data.
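The decomposition of total correlations into stimulus correlations (correlations between PSTHs) and noise correlations (correlations of trial-to-trial fluctuations around the PSTH) can be sketched numerically as follows. The array shapes and the toy data generator are illustrative assumptions, not the recorded data:

```python
import numpy as np

def decompose_correlations(spikes):
    """spikes: array (repeats, time_bins, cells) of binned spike counts.

    Splits the across-trial covariance between cells into a stimulus part
    (covariance of the PSTHs) and a noise part (covariance of the
    trial-to-trial residuals around the PSTH)."""
    psth = spikes.mean(axis=0)                  # (time, cells): mean over repeats
    resid = spikes - psth[None, :, :]           # trial-to-trial fluctuations
    stim_cov = np.cov(psth.T)                   # stimulus covariance between cells
    flat = resid.reshape(-1, spikes.shape[-1])  # pool residuals over repeats and time
    noise_cov = np.cov(flat.T)                  # noise covariance between cells
    return stim_cov, noise_cov

# toy example: two cells driven by the same stimulus, plus a shared noise source
rng = np.random.default_rng(0)
t, reps = 2000, 50
drive = rng.random(t)                           # common stimulus drive
rates = np.stack([5 * drive, 5 * drive], axis=1)
shared = rng.normal(0, 1, (reps, t, 1))         # noise shared by both cells
spikes = rng.poisson(np.clip(rates[None] + 0.5 * shared, 0, None))
stim_cov, noise_cov = decompose_correlations(spikes)
```

With this construction, both the stimulus and the noise covariances between the two toy cells come out positive, as expected from the shared drive and the shared noise.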
In our Poisson GLM framework, n_i(t), the number of spikes emitted by cell i in the time bin t of duration dt = 1.… ms, follows a Poisson distribution with mean λ_i(t): n_i(t) ∼ Pois(λ_i(t)). The vector of the cells' firing rates {λ_i(t)}_{i=1}^N, with N = 25, is then estimated as

\lambda_i(t) = \exp\{ h_i^{\text{stim}}(t) + h_i^{\text{int}}(t) \},   (1)

where

h_i^{\text{int}}(t) = \sum_j \sum_\tau J_{ij}(\tau)\, n_j(t-\tau)   (2)

accounts for both the past firing history of cell i itself and the contribution coming from all other cells in the network: the J_ii are the spike-history filters, whereas the J_{i≠j} are coupling filters. Both integrate the past up to … ms. h_i^{\text{stim}}(t) is a contribution accounting for the stimulus drive, which takes the form of a linear spatio-temporal convolution in the classical GLM:

h_i^{\text{stim}}(t) = \sum_\tau \sum_{x,y} K_{x,y}(\tau)\, S_{x,y}(t-\tau),   (3)

where S_{x,y}(t) is the stimulus movie at time t, {x, y} being the pixel coordinates, and K_{x,y}(τ) is a linear filter that integrates the past up to … ms. Later in the paper, we will go beyond this classical architecture and allow for deep, non-linear architectures.
In order to regularize couplings and spike-history filters during the inferences, we projected their temporal part onto a raised cosine basis [1] of 4 and 7 elements respectively, and added an L2 regularization of strength 0.…, which we kept the same for all the inferences. In addition, we imposed an absolute refractory period of τ_i^refr time bins during simulations, and consequently the J_ii(τ) were set to zero for τ ≤ τ_i^refr. In order to lower its dimension, the temporal behavior of the stimulus filter K_{x,y}(τ) was projected onto a raised cosine basis with 10 elements.
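The generative model of Eqs. (1)–(2) can be sketched numerically as follows. All parameter values here (number of cells, filter length, filter amplitudes, stimulus drive) are toy stand-ins for the fitted ones:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, tau_max = 3, 400, 20                 # cells, time bins, filter length (toy values)

# toy parameters standing in for the fitted ones
J = rng.normal(0, 0.05, (N, N, tau_max))   # coupling / spike-history filters J_ij(tau)
h_stim = rng.normal(-1.5, 0.3, (N, T))     # stimulus drive h_i^stim(t), i.e. the output of Eq. (3)

n = np.zeros((N, T), dtype=int)            # spike counts n_i(t)
lam = np.zeros((N, T))                     # firing rates lambda_i(t)
for t in range(T):
    # n_j(t - tau) for tau = 1..tau_max (most recent bin first)
    past = n[:, max(0, t - tau_max):t][:, ::-1]
    # Eq. (2): h_i^int(t) = sum_j sum_tau J_ij(tau) n_j(t - tau)
    h_int = np.einsum('ijt,jt->i', J[:, :, :past.shape[1]], past)
    lam[:, t] = np.exp(h_stim[:, t] + h_int)   # Eq. (1)
    n[:, t] = rng.poisson(lam[:, t])           # n_i(t) ~ Pois(lambda_i(t))
```

With small random couplings the simulated network stays stable; the self-excitation transients discussed in the text arise when the inferred couplings become large and predominantly excitatory.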
In addition, we included an L1 regularization over the basis weights and an L2 regularization over the spatial Laplacian to induce smoothness.
All the inferences were done by log-likelihood (log-ℓ) maximization with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, using the empirical past spike activity during training [1]. For easy comparison, all the performances discussed below are summarized in Table 1.
We inferred the GLM by whole log-ℓ maximization from the responses to both the checkerboard and moving bar unrepeated stimulations, and then simulated its response to the repeated videos (Fig. 1). Consistent with [1], in the case of the checkerboard stimulus, the model can predict with high accuracy the PSTH of all cells (Fig. 1A, mean Pearson's ρ = 0.… ± 0.…). It also reproduces the values of the zero-lag (17 ms window) noise correlations for all cell pairs (Fig. 1B, coefficient of determination CoD = 0.…), and the temporal structure of noise cross-correlations (Fig. 1C).
The model performance is strongly degraded for the moving bars video, a stimulus characterised by long-range correlations. The model reproduces the empirical PSTH with rather good accuracy (Fig. 1D, ρ = 0.… ± 0.…) and shows fair overall accuracy on the noise correlations (Fig. 1E, CoD = 0.…). However, it overestimates the value of the noise correlations for certain distant cell pairs (Fig. 1E&F). A closer look reveals that the model overestimates noise correlations for pairs of cells that are strongly stimulus-correlated (Fig. 1G). Here the error in the estimates is normalized by the empirical value of the noise correlations, with a cut-off at three standard deviations. Interestingly, the effect is strong only for the moving bar video, as stimulus correlations are small for checkerboard stimulation. These results show that the inferred couplings of the GLM do not depend only on the correlated noise among the neurons, but can also be influenced by stimulus correlations.
This prevents the inferred couplings from generalizing across stimuli. In addition, we observed several self-excitation transients when simulating the GLM inferred from the moving-bars stimulus (…% of the time, in …% of the repetitions, Fig. 1H, versus …% for the model inferred from the checkerboard stimulus). This effect is
Figure 1: GLM fails to predict noise correlations in the presence of strong stimulus correlations.
A) PSTH prediction for the response of an example cell to checkerboard stimulation. Inset: histogram of the model performance (Pearson correlation between empirical and model PSTH) for all cells in the population. B) Empirical and model-predicted noise correlations versus distance between the cells. Inset: scatterplot. C) Empirical and model-predicted noise cross-correlation between a nearby and a distant example cell pair. D,E,F) Same as A,B,C, but for the response to moving bar stimulation. Note that the model overestimates noise correlations between certain pairs of distant cells. G) Error in the prediction of noise correlations, normalized by their empirical value, versus the empirical value of stimulus correlations. H) Population firing rate in time during model simulations of the responses to the moving bar stimulus. Note the transient of unnaturally high activity due to self-excitation within the model.
probably the consequence of the over-estimation of those cell-to-cell couplings in the moving-bars stimulus, which drives the over-excitation of the network.
All these issues can be ascribed to the fact that, by maximising the whole log-ℓ over all the parameters simultaneously, the GLM mixes the impact of stimulus correlations with that of neuronal past activity. In the next section we develop an inference strategy that disentangles stimulus from noise correlations and infers their parameters independently.
In order to disentangle the inference of the couplings between neurons from that of the stimulus filters, we split the model training into two independent steps. We name this approach "two-step" inference (Fig. 2).
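The second of the two steps (the coupling inference described below) exploits a simple property of the Poisson log-likelihood: when per-time-bin auxiliary stimulus terms replace the stimulus filter, the gradient with respect to them vanishes exactly when the model PSTH matches the empirical one. A toy numerical check of this property, with stand-in data and a stand-in coupling drive:

```python
import numpy as np

rng = np.random.default_rng(2)
reps, T = 120, 300
n = rng.poisson(0.4, (reps, T))            # toy repeated responses of one cell
h_int = rng.normal(0, 0.3, (reps, T))      # toy coupling drive, varying across repeats

# Poisson log-likelihood: sum_r [ n_r(t) (h(t) + h_int_r(t)) - exp(h(t) + h_int_r(t)) ].
# Its derivative in h(t) is sum_r [ n_r(t) - exp(h(t)) exp(h_int_r(t)) ]; setting it
# to zero gives the auxiliary parameters in closed form:
h_aux = np.log(n.sum(axis=0) / np.exp(h_int).sum(axis=0))

model_psth = np.exp(h_aux + h_int).mean(axis=0)   # model-predicted PSTH
empirical_psth = n.mean(axis=0)
```

At this optimum the model PSTH equals the empirical PSTH bin by bin, so the stimulus correlations are absorbed entirely by the auxiliary terms and the couplings are left to explain only the noise correlations.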
Filter inference.
We run the inference of a GLM model without the couplings between neurons or with themselves (i.e., without coupling and spike-history filters), using the responses to the unrepeated stimulus (single-neuron linear-nonlinear Poisson (LNP) models [23], Fig. 2B). This inference allows us to predict the mean firing rate λ of each neuron.

Figure 2: Two-step inference of couplings and spike-history filters. A) Whole log-ℓ maximization [1] trains couplings and spike-history filters together with the stimulus filter. B) Two-step inference trains coupling filters and stimulus filters by running two independent log-ℓ maximizations. Top: we remove coupling filters and infer the equivalent of an LNP model for each cell. Bottom: we run an inference over repeated data where we add auxiliary variables (instead of the stimulus filter) to exactly enforce the PSTH prediction. C) We build the full model by using the previously inferred parameters. A correction needs to be added (not shown, see text).

Coupling inference.
We run a log-ℓ maximization inference over the responses to a repeated video stimulation. Instead of inferring the parameters of a stimulus filter (K_{x,y}(τ) in Eq. 3), we treat the terms h_i^stim(t) of Eq. 1 as auxiliary parameters that we infer directly from the data (Fig. 2B). The log-ℓ derivative with respect to these parameters is proportional to the difference between the empirical and the model-predicted PSTH. As a consequence, and thanks to the repeated data, the addition of these parameters allows for enforcing the value of the PSTH exactly when the corresponding log-ℓ gradient vanishes. In this way, stimulus correlations are perfectly accounted for, and the couplings only reflect correlated noise between neurons. As for the GLM inferred by whole log-ℓ maximization, we imposed an absolute refractory period of τ_i^refr time bins during simulations and thus set J_ii(τ) to zero for τ ≤ τ_i^refr. Full model.
Once couplings and stimulus filters are inferred, we can combine them to build up the full model (Fig. 2C). This cannot be done straightforwardly, because the addition of the couplings would change the firing-rate prediction of the LNP model. To correct for this effect, we subtract the mean contribution of the coupling term: h_i^int(t) → h_i^int(t) − ⟨h_i^int(t)⟩_noise. This correction is equivalent to modifying Eq. 2 into

\sum_j \sum_\tau J_{ij}(\tau)\, n_j(t-\tau) \;\to\; \sum_j \sum_\tau J_{ij}(\tau)\, \big( n_j(t-\tau) - \lambda_j(t-\tau) \big).   (4)

Lastly, in order to account for the addition of absolute refractory periods, we added a term \sum_{\tau=1}^{\tau_i^{\text{refr}}} \lambda_i(t-\tau) for each neuron (Suppl. Sect. S2). To compute all the corrections, we therefore only need the past firing rates λ_i(t) of all neurons in the absence of the couplings, which are given by the LNP model predictions. This allows the full model to predict the neuronal response to unseen (testing) data.
We first applied our two-step inference to the responses to checkerboard stimulation and obtained results very similar to whole log-ℓ maximization (Table 1). By contrast, performance was improved in the case of the moving bar stimulus (Fig. 3). The two inference approaches yielded similar performances for the PSTH (Fig. 3A, ρ = 0.… ± 0.…, versus ρ = 0.… ± 0.…), but for noise correlations we obtained much better results (Fig. 3B, CoD = 0.…, versus CoD = 0.…). In particular, the model avoids the overestimation of the noise correlations for distant pairs (Fig. 3B&C) that we obtained with whole log-ℓ maximization (Fig. 1E&F). With the two-step inference, the strong stimulus correlations of the moving bar video do not affect the model inference, as was the case for whole log-ℓ maximization (Fig. 3D). In addition, the model is much more stable, and we never observed self-excitation for either stimulus when simulating the model (Fig. 3E, versus …% of the time, Fig.
1H). In Table 1 we report all the performances for the different cases.
Figure 3: Two-step inference retrieves noise correlations independently of the strong stimulus correlations in the moving bar video.
A) PSTH prediction for an example cell. Inset: histogram of the model performance for all cells. B) Empirical and model-predicted noise correlations versus distance between the cells. Inset: scatterplot. C) Empirical and model-predicted noise cross-correlations between example pairs of nearby and distant cells. D) Normalized error in the prediction of noise correlations plotted versus the empirical value of the stimulus correlations. E) Population activity during model simulation shows no self-excitation transients.
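The full-model correction of Eq. (4), which replaces the spike counts by their fluctuations around the LNP rate predictions, can be sketched as follows. The filter values, population size, and past rates are toy stand-ins:

```python
import numpy as np

def corrected_interaction(J, n_past, lam_past):
    """Coupling drive with the mean contribution subtracted, as in Eq. (4).

    J:        (N, N, tau_max) coupling filters J_ij(tau)
    n_past:   (N, tau_max)    spike counts n_j(t - tau), tau = 1..tau_max
    lam_past: (N, tau_max)    LNP-predicted rates lambda_j(t - tau)
    """
    # sum over presynaptic cells j and delays tau of J_ij(tau) (n_j - lambda_j)
    return np.einsum('ijt,jt->i', J, n_past - lam_past)

rng = np.random.default_rng(3)
N, tau_max = 4, 25
J = rng.normal(0, 0.1, (N, N, tau_max))   # toy coupling filters
lam_past = np.full((N, tau_max), 0.3)     # toy LNP rate predictions
n_past = rng.poisson(lam_past)            # toy observed spike counts
h_int = corrected_interaction(J, n_past, lam_past)
```

By construction, when the observed counts equal their predicted means the corrected coupling drive vanishes, so the couplings leave the LNP firing-rate prediction unchanged on average.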
So far we have shown how our two-step approach can disentangle the inference of neuronal couplings from stimulus correlations. If these couplings are only due to network effects, one should expect them to generalize across stimulus conditions. To test this, we ran model simulations of one stimulus using its stimulus filter and the coupling filters inferred from the other. For the checkerboard movie (Fig. 4), and compared to the case where couplings are inferred on the same stimulus, with our two-step inference we obtained performances that are almost equal for the PSTH (ρ = 0.… ± 0.…, versus ρ = 0.… ± 0.…) and rather good for noise correlations (CoD = 0.…, versus CoD = 0.…). In addition, we never observed self-excitation (Fig. 4D). By contrast, when we used the couplings inferred by whole log-ℓ maximization, self-excitation happened so often (…% of the time in …% of the repetitions) that we were not able to estimate the model performance (Fig. 4E).
For the moving bar video (Fig. S2), our two-step inference yielded performances similar to the case where couplings were inferred on the same stimulus (Table 1). Using the couplings inferred by whole log-ℓ maximization instead, the model performance decreased for the PSTH (ρ = 0.… ± 0.…, versus ρ = 0.… ± 0.…), and improved for noise correlations (CoD = 0.…, versus CoD = 0.…). In conclusion, two-step inference outperforms whole log-ℓ maximization on both stimuli (Table 1).
Our two-step inference decomposes the model training into two independent components, one for the stimulus processing and one for network effects. In the previous experiments we still used a linear convolution to process the stimulus, but thanks to this decomposition, we can also consider any machine capable of predicting the neurons' firing rates {λ_i(t)}_{i=1}^N.
In order to predict the response to checkerboard stimulation with higher accuracy, we inferred a deep, time-distributed CNN, a special case of CNNs [9] with the additional constraint that the weights of the convolutional layers are shared in time [24]. In our architecture, two time-distributed convolutional layers are followed by a max-pooling layer and eventually by two dense layers that output the firing rates λ_i(t) (Fig. 5A). After training, we included the model in our two-step inference to build a model with both a deep architecture for the stimulus component and a network of coupling filters.

Figure 4: Two-step inference allows for generalizing across stimulus ensembles. A,B,C,D) Simulation of the checkerboard responses for a model where stimulus filters were inferred from the responses to checkerboard, and coupling filters were inferred from the moving bar data with our two-step inference. A) PSTH predictions. B) Noise correlations. C) Noise cross-correlations. D) Population activity showed no self-excitation transients. E) Simulation of checkerboard responses when coupling filters are those inferred from moving bar data with whole log-ℓ maximization. The model shows self-excitation during all runs.

Figure 5: Deep CNN can be included in our two-step approach to improve model performance. A) Architecture of our deep, time-distributed CNN. B) PSTH prediction for the response of an example cell to checkerboard stimulation. Inset: histogram of model performance for all cells. C) Empirical and model-predicted noise correlations versus distance between cells. Inset: scatterplot.

Table 1: Model performance for different inference approaches. We computed Pearson's correlation coefficients between empirical and model-predicted firing rates (PSTH). For noise correlations, we estimated the CoD between model predictions and data. The third and fourth rows refer to simulations that use the coupling filters inferred from the other stimulus.

                               Checkerboard stimulus                Moving bars stimulus
                               PSTH         noise-corr.  self-exc.  PSTH         noise-corr.  self-exc.
whole log-ℓ maximization       0.… ± 0.05   0.94         0%         0.… ± 0.…    0.…          …%
two-step approach              0.… ± 0.05   0.95         0%         0.… ± 0.…    0.…          …%
coupl. exchange, whole log-ℓ   unstable     unstable     93%        0.… ± 0.…    0.…          …%
coupl. exchange, two-step      0.… ± 0.05   0.84         0%         0.… ± 0.…    0.…          …%
CNN                            0.… ± 0.04   0.93         0%         —            —            —

The model shows higher performance in predicting the PSTH: ρ = 0.… ± 0.…, versus ρ = 0.… ± 0.… and ρ = 0.… ± 0.…, when compared to our previous models (Fig. 5B). In addition, the model was capable of predicting noise correlations with high accuracy (Fig. 5C, CoD = 0.…, versus CoD = 0.… and CoD = 0.…). We also did not observe any self-excitation transients. In summary, the model combines the benefits of deep networks with those of the GLM with its neuronal couplings. We summarise all the different model performances in Table 1.
In this work we have studied the application of the GLM to the case of retinal ganglion cells subject to complex visual stimulation with strong correlations. We have shown how whole log-ℓ maximization over all model parameters leads to inferring erroneous coupling filters that reflect stimulus correlations (Fig. 1G). This effect introduces spurious noise correlations when the model is simulated (Fig. 1E&F), prevents its generalization from one stimulus ensemble to another (Fig. 4E), and increases the chance of self-excitation in the network dynamics (Fig. 1H). This last issue poses a major problem when the GLM is used as a generative model for simulating spiking activity.
To solve these issues we have proposed a two-step algorithm for inferring the GLM that takes advantage of repeated data to disentangle the stimulus-processing component from the coupling network.
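The defining constraint of the time-distributed CNN used above, namely that the same convolutional weights are applied at every time step of the stimulus movie, can be illustrated in plain numpy. This is a minimal single-filter sketch, not the actual two-layer architecture, and the movie and kernel values are toy stand-ins:

```python
import numpy as np

def time_distributed_conv(movie, kernel):
    """Apply one 2D 'valid' convolution (in the CNN sense) with the SAME
    kernel at every time step of a (time, height, width) stimulus movie."""
    T, H, W = movie.shape
    kh, kw = kernel.shape
    out = np.empty((T, H - kh + 1, W - kw + 1))
    for t in range(T):                      # weights are shared across time steps
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[t, i, j] = np.sum(movie[t, i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(4)
movie = rng.random((10, 8, 8))              # toy stimulus movie
kernel = rng.normal(0, 1, (3, 3))           # one shared convolutional kernel
feat = time_distributed_conv(movie, kernel)
```

Because the kernel is shared in time, processing any single frame in isolation gives exactly the same feature map as the corresponding slice of the full output, which is what makes the learned spatial features stimulus-time invariant.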
A similar approach has been proposed in the context of maximum entropy models [25, 26], and here we have fully developed it for the GLM. Our method prevents the rise of large couplings reflecting strong stimulus correlations (Fig. 3D). The absence of these couplings lowers the probability of observing self-excitation (Fig. 3E), and the inferred GLM does not predict spurious noise correlations (Fig. 3B&C). In addition, with our two-step inference the couplings are robust to a change of stimulus, which allows for generalization (Fig. 4). In particular, we showed that a model with the stimulus filter inferred from checkerboard data but couplings inferred from moving bar stimulation predicts the response to checkerboard with high accuracy.
The strongest drawback of our method is the requirement of repeated data, which are not necessary for whole log-ℓ maximization of the GLM. This may limit the applicability of our inference approach. However, we emphasize that only … s of repeated data were needed for inferring the couplings. In addition, another possibility that deserves to be tested is the use of spontaneous activity instead of repeated stimuli. For the retina, this activity can be recorded while the tissue is exposed to a static full-field image (blank stimulus). However, as spontaneous activity is usually very low, these recordings need to be long enough to measure correlations with high precision.
Another important contribution of our work is the possibility to easily include deep CNNs in the GLM to increase its predictive power. Deep CNNs represent today one of the best options for modelling and predicting the mean response of sensory neurons to complex stimuli such as naturalistic ones [9, 16, 17, 18].
However, a generative model for predicting neuronal correlated noise with a deep CNN is still lacking: building a deep network that takes as input both the stimulus and the past activity of the neural population would be very challenging, since it would have to deal with very heterogeneous inputs. Our framework solves this problem by separating the CNN inference from that of the coupling and spike-history filters, and can thus easily be added on top of an already inferred CNN.
The GLM has been used to estimate the impact of correlated noise on information transmission, but mostly for stimuli with low complexity [1, 27]. Future work can apply our method to model the responses to complex stimulations and study the impact of correlated noise on stimulus encoding.

Broader Impact

In this work we present a computational advance to improve the inference and performance of the GLM. As the GLM is one of the most used models in computational neuroscience, we believe that many researchers can benefit from this work to advance their investigations. The fight against blindness, which affects about 45 million people worldwide, is one such possible application. Retinal prostheses, where an array of stimulating electrodes is used to evoke activity in neurons, are a promising solution currently under clinical investigation. A central challenge for such implants is to mimic the computations carried out by a healthy retina, so as to optimize the information sent to the brain. Modeling retinal processing could thus help optimize vision restoration strategies in the long term [28].
We believe that no one will be put at a disadvantage by this research, and that there are no consequences of failure of the system. Biases in the data do not apply to the present context.
Acknowledgments
We thank M. Chalk and G. Tkacik for useful discussions. This work was supported by ANR TRAJECTORY and DECORE, by the European Union's Horizon 2020 research and innovation programme under grant agreements No. 785219, No. 785907 (Human Brain Project SGA2) and No. 639888, by a grant from AVIESAN-UNADEV to OM, by the French State program Investissements d'Avenir managed by the Agence Nationale de la Recherche [LIFESENSES: ANR-10-LABX-65], with the support of the Programme Investissements d'Avenir IHU FOReSIGHT (ANR-18-IAHU-01), and by Agence Nationale de la Recherche grant ANR-17-ERC2-0025-01 "IRREVERSIBLE".
References [1] J.W. Pillow, J. Shlens, L. Paninski, A. Sher, A.M. Litke, E. J. Chichilnisky, and E.P. Simoncelli.
Spatio-temporal correlations and visual signalling in a complete neuronal population . Nature , :995–999, 2008.[2] Alison I Weber and Jonathan W Pillow. Capturing the dynamical repertoire of single neuronswith generalized linear models.
Neural computation , 29(12):3260–3289, 2017.[3] Baktash Babadi, Alexander Casti, Youping Xiao, Ehud Kaplan, and Liam Paninski. A general-ized linear model of the impact of direct and indirect inputs to the lateral geniculate nucleus.
Journal of Vision , 10(10):22–22, 2010.[4] Subhodh Kotekal and Jason N MacLean. Recurrent interactions can explain the variance insingle trial responses.
PLOS Computational Biology , 16(1):e1007591, 2020.[5] Wilson Truccolo, Uri T Eden, Matthew R Fellows, John P Donoghue, and Emery N Brown. Apoint process framework for relating neural spiking activity to spiking history, neural ensemble,and extrinsic covariate effects.
Journal of neurophysiology , 93(2):1074–1089, 2005.[6] Il Memming Park, Miriam LR Meister, Alexander C Huk, and Jonathan W Pillow. Encodingand decoding in parietal cortex during sensorimotor decision-making.
Nature Neuroscience, 17(10):1395, 2014.

[7] Caroline A Runyan, Eugenio Piasini, Stefano Panzeri, and Christopher D Harvey. Distinct timescales of population coding across cortex. Nature, 548(7665):92–96, 2017.

[8] Rajeev V Rikhye, Aditya Gilra, and Michael M Halassa. Thalamic regulation of switching between cortical representations enables cognitive flexibility. Nature Neuroscience, 21(12):1753–1763, 2018.

[9] L. McIntosh, N. Maheswaranathan, A. Nayebi, S. Ganguli, and S. Baccus. Deep learning models of the retinal response to natural scenes. In Advances in Neural Information Processing Systems, pages 1361–1369, 2016.

[10] A. Heitman, N. Brackbill, M. Greschner, A. Sher, A. M. Litke, and E.J. Chichilnisky. Testing pseudo-linear models of responses to natural scenes in primate retina. bioRxiv, 2016.

[11] Ryota Kobayashi, Shuhei Kurita, Anno Kurth, Katsunori Kitano, Kenji Mizuseki, Markus Diesmann, Barry J Richmond, and Shigeru Shinomoto. Reconstructing neuronal circuitry from parallel spike trains. Nature Communications, 10(1):1–13, 2019.

[12] Il Memming Park, Evan W Archer, Nicholas Priebe, and Jonathan W Pillow. Spectral methods for neural characterization using generalized quadratic models. In Advances in Neural Information Processing Systems, pages 2454–2462, 2013.

[13] David Hocker and Il Memming Park. Multistep inference for generalized linear spiking models curbs runaway excitation. In , pages 613–616. IEEE, 2017.

[14] Felipe Gerhard, Moritz Deger, and Wilson Truccolo. On the stability and dynamics of stochastic spiking neuron models: nonlinear Hawkes process and point process GLMs. PLoS Computational Biology, 13(2), 2017.

[15] Jose C Principe and Jyh-Ming Kuo. Dynamic modelling of chaotic time series with neural networks. In Advances in Neural Information Processing Systems, pages 311–318, 1995.

[16] Santiago A Cadena, George H Denfield, Edgar Y Walker, Leon A Gatys, Andreas S Tolias, Matthias Bethge, and Alexander S Ecker. Deep convolutional models improve predictions of macaque V1 responses to natural images. bioRxiv, page 201764, 2018.

[17] Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen Baccus, and Surya Ganguli. From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction. In Advances in Neural Information Processing Systems, pages 8535–8545, 2019.

[18] Santiago A Cadena, Fabian H Sinz, Taliah Muhammad, Emmanouil Froudarakis, Erick Cobos, Edgar Y Walker, Jake Reimer, Matthias Bethge, Andreas Tolias, and Alexander S Ecker. How well do deep neural networks trained on object recognition characterize the mouse visual system? 2019.

[19] S. Deny, U. Ferrari, E. Mace, P. Yger, R. Caplette, S. Picaud, G. Tkačik, and O. Marre. Multiplexed computations in retinal ganglion cells of a single type. Nature Communications, 8(1):1964, 2017.

[20] O. Marre, D. Amodei, N. Deshmukh, K. Sadeghi, F. Soo, T. Holy, and M.J. Berry.
Mapping a complete neural population in the retina. Journal of Neuroscience, 32(43):14859–14873, 2012.

[21] Pierre Yger, Giulia LB Spampinato, Elric Esposito, Baptiste Lefebvre, Stéphane Deny, Christophe Gardella, Marcel Stimberg, Florian Jetter, Guenther Zeck, Serge Picaud, et al. A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo. eLife, 7:e34518, 2018.

[22] I.H. Brivanlou, D.K. Warland, and M. Meister. Mechanisms of concerted firing among retinal ganglion cells. Neuron, 20(3):527–539, 1998.

[23] E.J. Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12(2):199–213, 2001.

[24] Francois Chollet et al. Keras. https://keras.io, 2015.

[25] E. Granot-Atedgi, G. Tkačik, R. Segev, and E. Schneidman. Stimulus-dependent maximum entropy models of neural population codes. PLOS Computational Biology, 9(3):1–14, 2013.

[26] U. Ferrari, S. Deny, M. Chalk, G. Tkačik, O. Marre, and T. Mora. Separating intrinsic interactions from extrinsic correlations in a network of sensory neurons. Physical Review E, 98(4):042410, 2018.

[27] F. Franke, M. Fiscella, M. Sevelev, B. Roska, A. Hierlemann, and R.A. da Silveira. Structures of neural correlation and how they favor coding. Neuron, 89(2):409–422, 2016.

[28] U. Ferrari, S. Deny, A. Sengupta, R. Caplette, J.-A. Sahel, D. Dalkara, S. Picaud, J. Duebel, and O. Marre. Optogenetic vision restoration with high resolution. arXiv preprint arXiv:1811.06866, 2018. To appear in PLOS Computational Biology.
Figure S1: Stimulus and noise correlations in the retinal response. A) Mosaic of the N = 25 OFF alpha cells. B) Scatterplot of the total pairwise correlations between the spiking activity in response to the checkerboard and the moving-bars videos. C) Total pairwise correlation versus cell distance. D) Stimulus correlation versus cell distance. E) Noise correlation versus cell distance.

Responses to the checkerboard and moving-bars stimuli show different correlation patterns (Fig. S1). The moving-bar video induces much stronger and longer-ranged stimulus correlations, especially for certain pairs of distant cells. By contrast, noise correlations decrease smoothly with distance and are of similar magnitude in the two datasets.
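The split between stimulus and noise correlations can be estimated from repeated presentations of the same stimulus. The sketch below is a minimal illustration of the standard law-of-total-covariance decomposition: the total correlation is the Pearson correlation of the raw binned counts, the stimulus correlation is the part carried by the trial-averaged rates (PSTHs), and the noise correlation is the remainder. The function name and the synthetic example are our own; the paper's exact estimator may differ in details.

```python
import numpy as np

def correlation_decomposition(counts_i, counts_j):
    """Decompose the total pairwise correlation between two cells into
    stimulus and noise components. counts_i, counts_j have shape
    (n_repeats, n_bins): binned spike counts over repeated presentations
    of the same stimulus."""
    # Total correlation: Pearson correlation of the raw counts,
    # pooled over repeats and time bins.
    total = np.corrcoef(counts_i.ravel(), counts_j.ravel())[0, 1]

    # Stimulus correlation: covariance of the trial-averaged rates (PSTHs),
    # normalized by the *total* standard deviations so that
    # total = stimulus + noise holds exactly.
    psth_i, psth_j = counts_i.mean(axis=0), counts_j.mean(axis=0)
    cov_stim = np.mean((psth_i - psth_i.mean()) * (psth_j - psth_j.mean()))
    stimulus = cov_stim / (counts_i.std() * counts_j.std())

    # Noise correlation: the trial-to-trial covariation that remains once
    # the stimulus-locked part is removed.
    noise = total - stimulus
    return total, stimulus, noise

# Synthetic check: two cells driven by a shared rate, with independent
# Poisson noise, so the noise correlation should be near zero.
rng = np.random.default_rng(0)
rate = 2.0 + np.sin(np.linspace(0.0, 20.0, 500))
counts_a = rng.poisson(rate, size=(200, 500)).astype(float)
counts_b = rng.poisson(rate, size=(200, 500)).astype(float)
total, stimulus, noise = correlation_decomposition(counts_a, counts_b)
```

In this synthetic example, all of the measured correlation is attributed to the stimulus component, mirroring how Fig. S1 separates the two contributions as a function of cell distance.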
S2 Correction for the absolute refractory period
As explained in the main text, when we add the two-step coupling filters to the LNP model, we need to correct $h_i^{\mathrm{int}}$ by its mean, Eq. 4. However, this correction does not take into account the addition of an absolute refractory period. Indeed, if we start from an LNP model with rate $\lambda(t)$ and, during simulations, prevent the cell from spiking if it has spiked in one of the previous $\tau_i^{\mathrm{refr}}$ time bins, then the model rate itself becomes a random variable with an average lower than $\lambda(t)$. In order to correct for this effect, we first need to quantify the mean of $n(t)$, the spike count at time $t$:
\begin{align}
\mathbb{E}\big(n(t)\big) &= \mathbb{E}\Big( n(t) \sim \mathrm{Pois}(\lambda(t)) \,\Big|\, \sum_\tau n(t-\tau) = 0 \Big) \nonumber\\
&= \mathbb{E}\Big( n(t) \sim \mathrm{Pois}(\lambda(t)) \Big)\, \mathrm{Prob}\Big( \sum_\tau n(t-\tau) = 0 \Big) \nonumber\\
&\approx \mathbb{E}\Big( n(t) \sim \mathrm{Pois}(\lambda(t)) \Big)\, \prod_\tau \mathrm{Prob}\big( n(t-\tau) = 0 \big) \nonumber\\
&= \lambda(t) \prod_\tau \exp\{-\lambda(t-\tau)\}, \tag{5}
\end{align}
where the approximation is valid under the hypothesis of small $\lambda$. By taking the logarithm of Eq. 5, we obtain the correction term $\sum_\tau \lambda(t-\tau)$ that needs to be added to $h^{\mathrm{int}}(t)$ in order to correct for the addition of the absolute refractory period.

Figure S2:
Generalization results for the moving-bar stimulus. Simulation of the moving-bar responses for a model whose stimulus filters were inferred from the response to the moving bars and whose coupling filters were inferred from the checkerboard data (the opposite of Fig. 4), with whole log-ℓℓ
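The rate suppression derived in Eq. 5 can be checked with a short simulation. The sketch below (the variable names, bin discretization, and synthetic rate are our own assumptions, not the paper's code) draws Poisson counts, zeroes out bins inside the refractory window, and compares the empirical rate to the corrected prediction $\lambda(t)\prod_\tau \exp\{-\lambda(t-\tau)\}$.

```python
import numpy as np

def simulate_lnp_refractory(lam, tau_refr, n_trials, rng):
    """Poisson (LNP) spike counts with an absolute refractory period:
    a bin is forced to zero if any of the previous tau_refr bins
    contained a spike."""
    n_bins = len(lam)
    counts = np.zeros((n_trials, n_bins), dtype=int)
    for r in range(n_trials):
        for t in range(n_bins):
            # only draw a spike count if the refractory window is silent
            if counts[r, max(0, t - tau_refr):t].sum() == 0:
                counts[r, t] = rng.poisson(lam[t])
    return counts

rng = np.random.default_rng(0)
tau_refr = 3
# small rates per bin, where the factorized approximation of Eq. 5 holds
lam = 0.05 * (1.0 + 0.5 * np.sin(np.linspace(0.0, 10.0, 200)))

counts = simulate_lnp_refractory(lam, tau_refr, n_trials=2000, rng=rng)
empirical_rate = counts.mean(axis=0)

# Eq. 5: E(n(t)) ~= lambda(t) * prod_tau exp(-lambda(t - tau))
predicted_rate = np.array(
    [lam[t] * np.exp(-lam[max(0, t - tau_refr):t].sum())
     for t in range(len(lam))]
)
```

Averaged over bins, `predicted_rate` tracks `empirical_rate` much more closely than the uncorrected $\lambda(t)$ does, consistent with the logarithmic correction term $\sum_\tau \lambda(t-\tau)$.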