[PDF] Bayesian nonparametric analysis for the detection of spikes in noisy calcium imaging data

Abstract

Recent advancements in miniaturized fluorescence microscopy have made it possible to investigate neuronal responses to external stimuli in awake behaving animals through the analysis of intra-cellular calcium signals. An on-going challenge is deconvoluting the temporal signals to extract the spike trains from the noisy calcium signals' time-series. In this manuscript, we propose a nested Bayesian finite mixture specification that allows for the estimation of spiking activity and, simultaneously, reconstructing the distributions of the calcium transient spikes' amplitudes under different experimental conditions. The proposed model leverages two nested layers of random discrete mixture priors to borrow information between experiments and discover similarities in the distributional patterns of neuronal responses to different stimuli. Furthermore, the spikes' intensity values are also clustered within and between experimental conditions to determine the existence of common (recurring) response amplitudes. Simulation studies and the analysis of a data set from the Allen Brain Observatory show the effectiveness of the method in clustering and detecting neuronal activities.

Full PDF

BBayesian nonparametric analysis for the detection of spikesin noisy calcium imaging data

Laura D’Angelo *1 , Antonio Canale †1 , Zhaoxia Yu ‡2 , and Michele Guindani § 21 Dipartimento di Scienze Statistiche; Universit`a degli Studi di Padova Department of Statistics; University of California, Irvine

Abstract

Recent advancements in miniaturized ﬂuorescence microscopy have made it possible to investigateneuronal responses to external stimuli in awake behaving animals through the analysis of intra-cellularcalcium signals. An on-going challenge is deconvoluting the temporal signals to extract the spiketrains from the noisy calcium signals’ time-series. In this manuscript, we propose a nested Bayesianﬁnite mixture speciﬁcation that allows for the estimation of spiking activity and, simultaneously, re-constructing the distributions of the calcium transient spikes’ amplitudes under different experimentalconditions. The proposed model leverages two nested layers of random discrete mixture priors toborrow information between experiments and discover similarities in the distributional patterns of neu-ronal responses to different stimuli. Furthermore, the spikes’ intensity values are also clustered withinand between experimental conditions to determine the existence of common (recurring) response am-plitudes. Simulation studies and the analysis of a data set from the Allen Brain Observatory show theeffectiveness of the method in clustering and detecting neuronal activities.

Keywords:

Dirichlet process; Mixtures of Finite Mixtures; Model-based clustering; Nested Dirichletprocess; Spike and slab. * [email protected] † [email protected] ‡ [email protected] § [email protected] a r X i v : . [ s t a t . A P ] F e b I NTRODUCTION

In recent years, calcium imaging has become a popular technique to measure the neuronal activityin awake, freely moving and behaving animals over time. Due to the development of miniaturizedand ﬂexible microendoscopes for ﬂuorescence microscopy, this technique has allowed to study howindividual neurons and neuronal networks encode external stimuli and cognitive processes (Nakajima& Schmitt 2020, Li et al. 2015). Calcium ions generate intracellular signals that determine a largevariety of functions in all neurons (Grienberger & Konnerth 2012). The mechanism at the basis ofcalcium imaging is a physiological process of the cells: when a neuron ﬁres, calcium ﬂoods the celland produces a transient spike in its concentration. By using genetically encoded calcium indicators,which are ﬂuorescent molecules that react when binding to the calcium ions, it is possible to opti-cally measure the level of calcium by analyzing the observed ﬂuorescence trace. The outcome of thistechnique is a movie of time-varying ﬂuorescence intensities, from which the spike trains of the ob-servable neurons have to be extracted through a complex pre-processing phase. In general, this phaseis meant to deal with two issues: identifying the spatial location of each neuron in the optical ﬁeld,and deconvoluting the temporal signals to extract their spike trains. Several methods are employed toextract the ﬂuorescence traces, e.g. by using independent component analysis (Dombeck et al. 2010,Mukamel et al. 2009), non-negative deconvolution (Vogelstein et al. 2010) and non-negative matrixfactorization (Maruyama et al. 2014).The resulting processed data consist of a ﬂuorescent calcium trace for each observable neuron inthe targeted area (see Figure 3 for an example). The observed ﬂuorescence trace can be used as a proxyof cellular activity but, unfortunately, it does not represent exactly the target of the analysis. What isactually relevant to the study are the precise spike times and the intracellular calcium concentrationof the observable neurons when the animal is subjected to external stimuli (Vogelstein et al. 2009).Extracting the neuronal activity from these series is not trivial: the calcium imaging technology hasseveral limitations, including the presence of measurement noise, the nonlinearity between ﬂuores-cence transient and calcium concentration and the slow decay of the ﬂuorescence trace compared tothe underlying neuronal activity (Friedrich et al. 2017, Dana et al. 2019, Rose et al. 2014). Moreover,the large-scale of the time series introduces additional complexity to the analysis. Therefore, a preciseestimation of the spike times and amplitudes is a fundamental step toward the understanding of theneurons’ behavior.As a motivating application, here we consider a publicly available data set from the Allen BrainObservatory (Allen Brain Observatory 2017) of calcium imaging data obtained through two-photonsmicroscopy in behaving mice. This study is an extended in vivo survey of physiological activity in themouse visual cortex in response to a range of visual stimuli. Each mouse is placed in front of a screenwhere different types of visual stimuli are shown, while the mouse’s neuronal activity is recorded. Thestimuli vary from simple synthetic images such as locally sparse noise or static gratings, to complexnatural scenes and movies. The goal of the study is to investigate how neurons at different depths in thevisual areas respond to stimuli of different complexity. Speciﬁcally, each neuron in the visual cortex an be characterized by their receptive ﬁeld , i.e. the features of the visual stimulus that trigger thesignalling of that neuron. Hence, it is of critical interest to devise methods that allow inferring howthe neuronal response varies under the different types of visual stimuli. We expect that the neuronalactivity will vary across all the experimental settings, and that some variations in its intensity will beobserved based on the speciﬁc visual stimulus.Several approaches have been proposed to accurately and efﬁciently estimate the neuronal activityin calcium imaging data. For example, Friedrich & Paninski (2016) and Friedrich et al. (2017) haveproposed an on-line algorithm based on a lasso penalty to enforce sparsity of signal detection. Jewell &Witten (2018) and Jewell et al. (2019) have proposed using an L penalty in lieu of the L penalizationand an efﬁcient algorithm to identify the presence or absence of spikes. In a Bayesian framework,Pnevmatikakis et al. (2013) have proposed to conduct inference on spike trains by estimating posteriorprobabilities of a latent binary indicator of spike presence at each time point. However, the model inPnevmatikakis et al. (2013) does not explicitly assume sparsity of the spikes. Moreover, it is expectedthe rate and the distribution of spikes to be stimulus-dependent (Brenner et al. 2002), but none of theprevious approaches allows to take into account the heterogeneity of spikes’ behaviors as a function ofthe stimulus. As Figure 3 clearly shows for the Allen Brain Observatory data, the spikes’ intensitiesvary greatly according to the type of stimulus. See also the discussion in Shibue & Komaki (2020)where they employ a marked point processes for the deconvolution of calcium imaging data.In this manuscript, we introduce a coherent nested Bayesian ﬁnite mixture model that allows forthe estimation of spiking activity and, simultaneously , for reconstructing the distributions of spikesunder various experimental conditions; for example, in response to different types of visual stimuliin the Allen Brain Observatory data set. More speciﬁcally, our modeling framework estimates andclusters the distributions of the calcium transient spikes’ amplitudes via a nested formulation of thegeneralized mixture of ﬁnite mixtures (gMFM) prior recently proposed by Fr¨uhwirth-Schnatter et al.(2020). The proposed model further adopts the use of a common atom speciﬁcation as in Denti et al.(2020) for estimating the distribution of the spikes’ amplitudes under each experimental condition. Theproposed common atom gMFM has several advantages with respect to typical Bayesian nonparametricmodels for nested data. With respect to models based on Dirichlet process priors, the gMFM providesincreased ﬂexibility to estimate partitions characterized either by many, well-balanced, clusters or bya small set of large clusters. The common atom model allows to obtain nested inference on densitieswithout incurring in the degeneracy issues pointed out by Camerlenghi et al. (2019) for the widely usednested Dirichlet process of Rodr´ıguez et al. (2008). At the same time, the common atom formulationstill leverages two nested layers of random discrete mixture priors to borrow information between ex-periments and to identify similarities in the distributional patterns of the neuronal responses to differentstimuli. In addition, differently than in the nested Dirichlet process, the common atom model also al-lows to cluster the inferred spikes’ intensity values both within and between experimental conditions,so to infer common (recurring) response amplitudes. Finally, we allow our model to enforce sparsityof neuron ﬁring over time by assuming a spike-and-slab prior speciﬁcation on the marginal distribution f the amplitudes. AYESIAN MIXTURE MODEL FOR CALCIUM IMAGING DATA

ODEL AND PRIOR SPECIFICATION

The observed ﬂuorescence is often considered as a noisy realization of the underlying true calciumconcentration. To model a neuron’s activity, we adopt a popular model in the neuroscience literature,where the decay in ﬂuorescence is modeled through an autoregressive process and the spikes are mod-eled as jumps in correspondence to the neuron’s ﬁring events (Vogelstein et al. 2010). Denoting with y t the observed ﬂuorescence trace of a neuron and with c t the underlying calcium concentration, for t = 1 , . . . , T , one can assume y t = b + c t + (cid:15) t , (cid:15) t ∼ N(0 , σ ) ,c t = γc t − + A t + w t , w t ∼ N(0 , τ ) , (1)where b models the baseline level of the observed trace and (cid:15) t is a measurement error. In the absenceof neuronal activity, the true calcium concentration c t is considered to be centered around zero. Theparameter A t captures the neuronal activity: in the absence of a spike ( A t = 0 ), the calcium level fol-lows a AR(1) process controlled by the parameter γ ; when a spike occurs, the concentration increasesinstantaneously with the spike amplitude A t > .We are interested in characterizing the neuronal activity under different experimental conditions.For each time point t = 1 , . . . , T , let g t be a discrete categorical variable, taking values in { , . . . , J } ,where J is the number of distinct experimental settings, so that g t = j indicates that the neuronalactivity at time t is observed under condition j . The experimental conditions are often designed to cap-ture variations in neuronal activity with respect to a baseline process, which may represent a “typical”brain process. For example, in the Allen Observatory data, the interest is to investigate visually-evokedfunctional responses of neurons in the mouse’s visual cortex. Therefore, neurons associated with visualdecoding should be expected to activate in all conditions. It is then of interest to study not only if butalso how the neurons deferentially respond to the presentation of a variety of visual stimuli.In this paper, we propose a hierarchical Bayesian approach to investigate similarities and differ-ences in the distribution of spikes over time and conditions. In order to borrow information acrossdifferent experimental conditions, one option is to ﬁt a parametric hierarchical random effect model,and obtain a post-MCMC clustering of the estimated spikes A t by grouping together those spikes withsimilar magnitudes. This approach has several limitations: on the one hand, the distribution of therandom effects is constrained into a speciﬁc parametric form; on the other hand, the clustering of, say,the posterior mean estimates of the parameters A t ’s does not allow to fully describe stimulus-speciﬁcdistributional differences and to take into account the posterior uncertainty in the spikes. n order to allow ﬂexible modeling of distributions and to describe the heterogeneity of distribu-tional features, we assume a nested Bayesian ﬁnite mixture speciﬁcation. More speciﬁcally, we rewrite(1) as y t | b, γ, c t − , A t , σ , τ ∼ N( b + γ c t − + A t , σ + τ ) and we assume that the spikes A t are from stimulus-speciﬁc distributions, i.e. ( A t | g t = j, G j ) ∼ G j , j = 1 , . . . , J , to account for the observed variety of neuronal activity under different experiment set-tings. We further allow for clustering the distributions across conditions, in order to capture similarpatterns of neuronal activity. Indeed, one may typically expect K < J distributional clusters. Forexample, a neuron may respond to general visual stimulation and not speciﬁcally to the type of stim-ulus considered. More speciﬁcally, we assume the following generalized mixture of ﬁnite mixturesstructure: G , . . . , G J | Q ∼ Q, Q = K (cid:88) k =1 π k δ G ∗ k (2)where π , . . . , π K | K ∼ Dirichlet K ( α/K, . . . α/K ) , α > , and G ∗ , . . . , G ∗ K are a set of cluster-deﬁning distributions, obtained as realizations of an underlying random probability measure, speciﬁedfurther below. Equation (2) implies that the G j ’s, j = 1 , . . . , J have a positive probability of clusteringtogether, thereby giving rise to distributional clusters . In practice, the number of mixture components, K , is typically larger than the number of clusters, K + , and some of the atoms G ∗ k are not assigned to anyof the G j ’s (empty components). The prior on the number of mixture components K is a translatedbeta-negative-binomial distribution as in Fr¨uhwirth-Schnatter et al. (2020). Including a prior p ( K ) leads to both K + and K being random a priori. Finally, the distributional atoms G ∗ k , k = 1 , . . . , K arealso obtained as a realization from an underlying generalized mixture of ﬁnite mixtures, G ∗ k = L (cid:88) l =1 ω l,k δ A ∗ l (3)with ω ,k , . . . , ω L,k | L ∼ Dirichlet L ( β/L ) , for some positive real number β > . The set ofatoms A ∗ l is common across all distributions G ∗ , . . . , G ∗ K and they are obtained as i.i.d. draws froma centering measure, A ∗ l ∼ G ( A ∗ l ) . Therefore, equation (3) deﬁnes a clustering of the inferred spikeintensities both within a given condition (i.e. for ﬁxed G ∗ k ) and across conditions (i.e. across the G ∗ k ’s;hence, across the G j ’s). In the following, we adopt common terminology in the literature on nestedBayesian non-parametric priors and indicate the clustering induced on the A t through the proposedtwo-layers prior as observational clustering . The nested gMFM formulation requires the speciﬁcationof a prior on the number of components that specify the lower-level distributional atoms G ∗ k , L ∼ p ( L ) .Once again, some of the components may be empty.We enforce sparsity in the detection of the spikes by modeling the base measure G for the param-eters A ∗ l with a spike-and-slab speciﬁcation (Mitchell & Beauchamp 1988), which is a convex mixture etween a Dirac mass at zero – representing the absence of neuronal response – and a diffuse densityon the positive real numbers – representing the intensity of the neuronal response. More speciﬁcally,we assume G = (1 − p ) δ + p Ga ( h A , h A ) , (4)where the slab is a gamma distribution, Ga( a, b ) with mean a/b and variance a/b . The choice of agamma distribution in (4) is particularly relevant for sparsity-inducing purposes, as the gamma densitybelongs to the set of moment non-local prior densities, as deﬁned by Johnson & Rossell (2010). There-fore, a negligible probability density is assigned to values in a neighborhood of zero, thus inducing aclear separation between the baseline neuronal activity and the neuronal responses. In particular, thehigher the shape parameter h A , the larger is the separation. We assume a Beta( h p , h p ) prior for theproportion of spikes p with h p much smaller than h p in order to favor sparsity of detections.The proposed formulation can be seen as a special case of inner spike-and-slab nonparametricpriors, following a terminology introduced by Canale et al. (2017). In the following, we will refer tothe proposed speciﬁcation as a ﬁnite common atom model (fCAM).The Bayesian model elicitation is completed by assuming conjugate priors for the underlying cal-cium level concentration parameters, i.e. the baseline calcium level b , and the variances σ and τ .Speciﬁcally, the following conjugate prior distributions are assumed: c ∼ N(0 , C ) , b ∼ N( b , B )1 /σ ∼ Ga( h σ , h σ ) , /τ ∼ Ga( h τ , h τ ) , Finally, under the assumption that the process is stationary with positive correlation between the cal-cium level at consecutive times, we constrain γ ∈ (0 , and let γ ∼ Beta( h γ , h γ ) , a priori.2.2 P OSTERIOR INFERENCE

For computational purposes, it is often convenient to rewrite the likelihood for an observation y t undercondition g t = j by introducing two latent cluster allocation variables, S j = S g t and M t , indicatingthe distributional cluster for the group j and the observational cluster for y t , respectively.Given K and { π k } Kk =1 , the distributional allocation variable S j ∼ Multin K ( π ) . Similarly, given L and { ω l } Ll =1 , the observational allocation variable M t ∼ Multin L ( ω ) . For notation simplicity, we haveindicated the parameter vectors using bold letters in the equations above. Therefore, conditionally onthe other model parameters, the joint distribution of the observed data and the latent cluster allocationscan be written as f ( y , M , S | π , ω , A ∗ ) = J (cid:89) j =1 π S j (cid:89) t : g t = j ω M t ,S j p ( y t | A ∗ M t ) , which facilitates posterior inference. ore speciﬁcally, posterior inference for the proposed fCAM can be carried out quite straightfor-wardly by means of Markov chain Monte Carlo (MCMC) techniques. The sampling of the latent cal-cium level c t uses an iterative approach based on the Kalman ﬁlter and on a forward ﬁltering backwardsampling algorithm (Prado & West 2010). Full conditional posteriors for b , p , σ and τ are availablein closed form thus leading to straightforward Gibbs sampling steps. For the autoregressive parameter γ , we use a Metropolis-Hastings within the Gibbs step. The sampling of A t exploits a combination ofthe nested slice sampler of Denti et al. (2020) and of the telescoping sampler of Fr¨uhwirth-Schnatteret al. (2020). A detailed description of the latter step is reported in the the Appendix. Here, we justpresent a schematic description of the MCMC steps:1) Sample the calcium level c t , for t = 0 , . . . , T , using a forward ﬁltering backward sampling:a) Run Kalman ﬁlter: set a = m = 0 , R = C = var( c ) . For t = 1 , . . . , T let a t = γ m t − + A t R t = γ C t − + τ . Compute the ﬁltering distribution’s parameters, m t and C t , for t = 1 , . . . , T , where m t = a t + R t ( R t + σ ) − ( y t − b − a t ) C t = R t − R t ( R t + σ ) − . b) Draw c T ∼ N( m T , C T ) ;c) For t = T − , . . . , , draw c t ∼ N( h t , H t ) , with h t = m t + γ C t R − t +1 ( c t +1 − a t +1 ) H t = C t − γ C t R − t +1 .

2) Sample a new value for the baseline level b : b ∼ N (cid:32) b B + 1 σ T (cid:88) t =1 ( y t − c t ) , (cid:114) B + Tσ (cid:33) .

3) Sample the variance on the output equation σ and the variance on the state equation τ : /σ ∼ Ga (cid:32) h σ + T , h σ + 12 T (cid:88) t =1 ( y t − c t − b ) (cid:33) /τ ∼ Ga (cid:32) h τ + T , h τ + 12 T (cid:88) t =1 ( c t − γ c t − − A t ) (cid:33) . ) Update the autoregressive parameter γ using a Metropolis-Hastings step.5) Update the parameter p of the spike-and-slab base measure from p ∼ Beta( h p + T − n , h p + n ) , where n is the number of y t assigned to the the spike component.6) Update the cluster allocations variables S and M , the number of mixture components K and L ,and the cluster parameters A ∗ using the nested telescoping sampling for the ﬁnite common atommodel reported in the Appendix. IMULATION STUDY

The performances of the proposed method are assessed through a simulation study. The purpose ofthis section is twofold, namely to assess both the ability to correctly identify the spike times, and theaccuracy of the inferred clustering structure.We simulated synthetic data exhibiting a baseline level and a number of spikes representing theeffect of the response of a neuron to a stimulus, thus mimicking the characteristics of real series ofcalcium imaging following the structure of model (1). Speciﬁcally, we ﬁrst divided the time frameinto J hypothetical experimental conditions of equal length, with J varying in the different scenariosdescribed below. Consistent with our motivating assumption that the neuronal response depends onthe type of stimulus, each experimental condition is assumed to belong to one of the K distributionalclusters. Then, for each experimental condition, we generated the neuronal activity: ﬁrst, we generatedthe presence or absence of a neuron response uniformly in time, where the spike probability can varyacross groups. Then, conditionally on the obtained activations, we generated some additional spikesin a short subsequent interval, so that it is very likely to observe close or even successive spikes. Inthis way, the data mimic a real calcium imaging time series. Moreover, we are able to conduct acareful assessment of the ability of the model to distinguish the presence of a single high spike versusthe convolution of several spikes in consecutive times. Finally, the values A t , conditionally on theirdistributional cluster, are generated from one of the ﬁnite sets of spike amplitudes described below.We simulated 50 independent data sets for each of the three scenarios described henceforth. InScenario 1 we assumed J = 6 experimental conditions, generated from K = 4 distributional clusters.The spike amplitudes in the distributional clusters are (0.35, 0.89, 1.15, 1.80, 2.20), (0.65, 0.89, 1.40,1.80), (0.35, 0.65, 1.15), and (0.35, 0.89, 1.60). Scenario 2 assumes J = 4 experimental conditionsand K = 3 distributional clusters with spike amplitudes equal to (0.3, 0.5, 0.7, 0.9, 1.1, 1.5), (0.3, 0.9,1.5, 1.8), and (0.5, 0.9, 1.5). Finally, Scenario 3 sets J = 5 and K = 3 with the spike amplitudes inthe distributional clusters being (0.3, 0.5, 0.7, 0.9, 1.1), (0.3, 0.9, 1.1, 1.3), and (0.7, 0.9, 1.3). Whilein Scenario 1 the amplitudes of the spikes are quite large, spaced apart, and with the correspondingdistributional clusters well distinct, in Scenario 3 the spike amplitudes are more homogeneous and e+002e−044e−046e−048e−04 CAM fCAM L0 Scenario 1

Scenario 2

Scenario 3

Figure 1: Distribution of the misclassiﬁcation error rate in the simulation study.more clustered in time. Scenario 2 represents an in-between situation. Hence, from the ﬁrst to thelast scenario, we are assuming an increasing degree of complexity. The R script generating thesesynthetic datasets can be found at the Repository https://github.com/laura-dangelo/fCAM_calcium_imaging .The results attained by the proposed fCAM are compared to those obtained exploiting the commonatom model (CAM) of Denti et al. (2020) – which provides a benchmark for the clustering of thespikes and the stimulus-speciﬁc distributions – and the L penalization method of Jewell et al. (2019),which provides a benchmark for the task of spikes’ detection. The latter method is tuned choosing thepenalization parameter that minimizes the in-sample misclassiﬁcation error rate, thus achieving a sortof oracle performance.To assess the sensitivity of the proposed fCAM to the prior speciﬁcation, we repeated the numericalexperiment for different values of the hyperparameters h A and h A in (4). In particular, the shapeparameter h A was supposed to play a key role in the detection of spikes. Keeping ﬁxed the ratio h A /h A , the parameters were set equal to 3, 4, 6, and 8: a small value implies, a priori , less separationbetween zero and the distribution of the positive spikes while a large value corresponds to the oppositeeffect.Focusing on the classiﬁcation of each time point as a spike or not, Figure 1 summarizes the misclas-siﬁcation rate for all competing methods under the three scenarios. The results of the replications aresummarized using boxplots. For our fCAM, we report only the results obtained with h A = h A = 8 as those obtained for the other choices are essentially equivalent. The rates are small in absolute valueand broadly comparable across the different methods, thus conﬁrming that all the competing modelsare effective in detecting the spikes.However, the proposed fCAM not only enables the detection of spikes but also allows to conductinference on the clustering structure. Therefore, we report on its ability to identify the clustering bservational clustersDistributional clustersCAM fCAMCAM fCAM0.000.250.500.751.000.9900.9930.996 Scenario 1

Observational clustersDistributional clustersCAM fCAMCAM fCAM0.000.250.500.751.000.960.970.980.99

Scenario 2

Observational clustersDistributional clustersCAM fCAMCAM fCAM0.60.81.00.9750.9800.9850.990

Scenario 3

Figure 2: Distribution of the adjusted Rand index on the distributional and observational clusters, computed on the50 simulations for the three scenarios of the simulated data.structure. Figure 2 reports the adjusted Rand index (Rand 1971, Hubert & Arabie 1985) computed onboth the observational and the distributional clusters for h A = h A = 8 (results for other settingsare similar). Values of the adjusted Rand index close to 1 denote that the identiﬁed structure resemblethe true clustering. While for the observational clusters the results are broadly comparable, for thedistributional clusters, the performance of the proposed fCAM is uniformly superior. In addition,the variability of the results generally appears to be drastically smaller for the fCAM, thus providingevidence of greater efﬁciency. LLEN B RAIN O BSERVATORY DATA ANALYSIS

We now revert to the analysis of the data from the Allen Brain Observatory (Allen Institute for BrainScience 2016). The data comprise the dF/F -transformed ﬂuorescence trace for a cell during session-Bof the experiment. This session comprises three types of visual stimuli (static gratings, natural sceneand natural movie) in addition to some period of spontaneous activity (absence of visual stimuli). Sincethe data are recorded at a frequency of Hz, the resulting series consists of 113,865 time points fora total of 63.2 minutes. We focus our analysis on a neuron located in the primary visual area, at animaging depth equal to 350 microns.The observed ﬂuorescence trace is shown with a continuous black line in Figure 3. Different

123 0 1000 2000 3000

Time (seconds) C a l c i u m l e v e l Stimulus

Natural movie Natural scene Static grating

Figure 3: Observed ﬂuorescence trace y t from the Allen Brain Observatory data (black line), and visual stimulus towhich the mouse is exposed (shaded areas). The yellow line represents the estimated neuronal activity.shaded backgrounds indicate the types of visual stimuli. Using the notation introduced in the previousSections, J = 4 with j = 1 , , corresponding to static grating, natural scene, and natural movie,respectively and j = 4 indicating no stimulus presence.We ran the MCMC algorithm of Section 2.2 using the same prior speciﬁcation of Section 2.1 for10,000 iterations discarding the ﬁrst 7000 iterations as burn-in and keeping one iteration every twoto improve mixing. Visual inspection of the traceplots and Geweke diagnostics showed no issueswith convergence. The superimposed light line in Figure 3 represents the estimated neuronal activityin terms of the inferred amplitude A t , i.e. removing the measurement errors and the result of theaccumulation of calcium from the previous spikes. Here and henceforth, we identiﬁed the presence ofa spike if the proportion of non-zero A t ’s over all MCMC iterations was greater than 60%.As already mentioned in the Introduction, in calcium imaging it is of interest studying the distribu-tion of the spikes in response to each experimental stimulus, and identifying similarities and differencesin these distributions across stimuli.We start by investigating the presence of similarities in the neuronal response to different typesof visual stimuli. This corresponds to analyzing the clustering of the spike distributions induced bythe proposed fCAM. The model clusters together the groups corresponding to the natural scene andnatural movie stimuli with high posterior probability, while the static grating stimulus and the absenceof stimuli are assigned to two separate distributional clusters. In other terms, the neuron appears toshow similar neuronal responses in the natural scene and natural movie stimuli whereas the responsesappear distinctly different under the other two conditions. tatic grating D en s i t y Natural scene D en s i t y Natural movie D en s i t y Figure 4: Empirical distribution of the posterior means of the observational cluster parameters A t for the threeexperimental conditions of the Allen Brain Observatory data.To understand whether and how the neuronal response depends on the type of stimulus, we es-timated the spike amplitude distribution for each of the four types of stimuli. Figure 4 shows thehistograms of posterior means of the non-zero spike amplitudes for the three types of stimuli. The dis-tribution for the time interval between 1018-1319 sec in Figure 3 (absence of stimuli) is not presentedbecause no activity was detected. Despite the apparent similarities of the distributions in Figure 4,the second cluster of spike amplitude distributions (natural scene and natural movie) shows a heaviertail. Speciﬁcally, the highest observed cluster during the static grating stimulus (top plot) is centered at1.04, while for the other two stimuli we obtained several higher values, with the largest cluster centeredaround 1.49.A qualitative representation of how these spike clusters are distributed within the three groups isgiven in Figure 5. The three plots show a short interval of the observed calcium series, chosen in corre-spondence of one of the highest observed spikes. Each plot also shows a series of colored vertical lines:the lines are placed at the estimated spike times, and the colors correspond to the estimated spike am-plitudes. The represented partition is the posterior point estimate obtained by minimizing the variation f information loss, as proposed in Wade & Ghahramani (2018). Conditionally on the obtained parti-tion, for each cluster a representative value for the cluster parameter is obtained as follows: ﬁrst, foreach MCMC iteration, the group-speciﬁc average of A t is computed keeping the partition ﬁxed; then,these values are averaged over all the MCMC iterations. We notice that for all experiments, high val-ues of the observed calcium level are often produced as the result of several consecutive spikes, since,individually, the spikes are characterized by a relatively low amplitude, and the observed calcium levelis cumulated due to its autoregressive behavior. The autoregressive parameter γ has a posterior meanequal to 0.498 with a 95% credible interval of (0.484, 0.508). This result corresponds to the under-standing that the observed calcium response may be generated by high-frequency ﬁring neurons: due tothe low-sampling rate, the non-linear calcium signal essentially captures a super-imposition of multiplespikes (Hoang et al. 2020).As a matter of fact, another useful quantity we can compute to compare the neuronal activitybetween stimuli is the ﬁring rate, which provides a measure of how often the neuron has activatedduring a speciﬁc visual stimulus. The rate computes the number of detected spikes per second, to takeinto account the different duration of the experiments. For the static grating stimulus the posteriormean rate (and related 95% credible interval) is .

287 (0 . , . , while for the natural scene andnatural movie stimuli they are .

523 (0 . , . and .

612 (0 . , . , respectively. Thoseresults highlight the role of spike-frequency adaptation, whereby some neurons show an increasedactivity when exposed to more complex stimuli, thus exhibiting higher ﬁring rates and larger calciumconcentration measurements (Peron & Gabbiani 2009). ISCUSSION

Calcium imaging has become widely applied to record the neuronal activity in awake, freely movingand behaving animals. However, reliable spike detection and spike time estimation remain challenging,due to the non-linearity and low signal-to-noise ratio of the calcium response. We have proposed asingle-stage nested Bayesian ﬁnite mixture model that allows to estimate the spike activity and alsoto characterize how its distribution varies across stimuli. The method shows good performances in asimulation study and captures characteristic features of neuronal activity in an application to publiclyavailable data from the Allen Brain Observatory.In line with the current literature, our approach is limited to the analysis of the calcium responsesobserved from a single neuron. Methods to infer neuronal connectivity from calcium imaging data overmultiple regions of the brain remain sparse, often limited to point estimates (Mishchenko et al. 2011)or the analysis of in-vitro data (Rigat et al. 2006). Possible extensions of the framework presentedhere may focus on studying the dynamic clustering of temporally correlated groups of neurons, as afunction of external stimuli or the movement of the animal through an environment. In addition to thelow signal-to-noise ratio, issues related to the dimension of the data and the accurate identiﬁcation ofthe locations of neurons further compound the statistical and computational challenges (Petersen et al. .00.51.01.5 C a l c i u m l e v e l Static grating P r( s p i k e ) C a l c i u m l e v e l Natural scene P r( s p i k e ) C a l c i u m l e v e l Natural movie

Time (seconds) P r( s p i k e ) A Figure 5: Short interval of length 500 of the Allen Brain Observatory data in correspondence of a spike, for the threestimuli. The vertical lines indicate the time of a spike and the colors correspond to the observational cluster of itsamplitude. The bottom panels show the estimated posterior probability of spike presence, for each time point. R EFERENCES

Allen Brain Observatory (2017), ‘Technical whitepaper: stimulus set and response analyses’.Allen Institute for Brain Science (2016), ‘Allen brain observatory’, http://observatory.brain-map.org/visualcoding.Brenner, N., Agam, O., Bialek, W. & de Ruyter van Steveninck, R. (2002), ‘Statistical properties ofspike trains: universal and stimulus-dependent aspects’,

Physical review. E, Statistical, nonlinear,and soft matter physics , 031907.Camerlenghi, F., Dunson, D. B., Lijoi, A., Pr¨unster, I. & Rodr´ıguez, A. (2019), ‘Latent nested non-parametric priors (with discussion)’, Bayesian Analysis (4), 1303–1356.Canale, A., Lijoi, A., Nipoti, B. & Pr¨unster, I. (2017), ‘On the Pitman–Yor process with spike and slabbase measure’, Biometrika (3), 681–697.Dana, H., Sun, Y., Mohar, B., Hulse, B. K., Kerlin, A. M., Hasseman, J. P., Tsegaye, G., Tsang, A.,Wong, A., Patel, R., Macklin, J. J., Chen, Y., Konnerth, A., Jayaraman, V., Looger, L. L., Schreiter,E. R., Svoboda, K. & Kim, D. S. (2019), ‘High-performance calcium sensors for imaging activity inneuronal populations and microcompartments’,

Nature Methods , 649 – 657.Denti, F., Camerlenghi, F., Guindani, M. & Mira, A. (2020), ‘A common atom model for the Bayesiannonparametric analysis of nested data’, arXiv:2008.07077.Dombeck, D. A., Harvey, C. D., Tian, L., Looger, L. L. & Tank, D. W. (2010), ‘Functional imagingof hippocampal place cells at cellular resolution during virtual navigation’, Nature Neuroscience , 1433 – 1440.Friedrich, J. & Paninski, L. (2016), Fast active set methods for online spike inference from calciumimaging, in D. Lee, M. Sugiyama, U. Luxburg, I. Guyon & R. Garnett, eds, ‘Advances In NeuralInformation Processing Systems’, pp. 1984 – 1992.Friedrich, J., Zhou, P. & Paninski, L. (2017), ‘Fast online deconvolution of calcium imaging data’,

PLOS Computational Biology (3), 1 – 26.Fr¨uhwirth-Schnatter, S., Malsiner-Walli, G. & Gr¨un, B. (2020), ‘Generalized mixtures of ﬁnite mix-tures and telescoping sampling’, arXiv:2005.09918. rienberger, C. & Konnerth, A. (2012), ‘Imaging calcium in neurons’, Neuron (5), 862 – 885.Hoang, H., Sato, M.-a., Shinomoto, S., Tsutsumi, S., Hashizume, M., Ishikawa, T., Kano, M., Ikegaya,Y., Kitamura, K., Kawato, M. & Toyama, K. (2020), ‘Improved hyperacuity estimation of spiketiming from calcium imaging’, Scientiﬁc Reports (1), 17844.Hubert, L. & Arabie, P. (1985), ‘Comparing partitions’, Journal of Classiﬁcation (336), 193 – 218.Jewell, S. W., Hocking, T. D., Fearnhead, P. & Witten, D. M. (2019), ‘Fast nonconvex deconvolutionof calcium imaging data’, Biostatistics (4), 709–726.Jewell, S. & Witten, D. (2018), ‘Exact spike train inference via L0 optimization’, The Annals of AppliedStatistics (4), 2457 – 2482.Johnson, V. E. & Rossell, D. (2010), ‘On the use of non-local prior densities in Bayesian hypothesistests’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2), 143–170.Li, N., Chen, T.-W., Guo, Z. V., Gerfen, C. R. & Svoboda, K. (2015), ‘A motor cortex circuit for motorplanning and movement’, Nature (7541), 51–56.Maruyama, R., Maeda, K., Moroda, H., Kato, I., Inoue, M., Miyakawa, H. & Aonishi, T. (2014),‘Detecting cells using non-negative matrix factorization on calcium imaging data’,

Neural Networks , 11–19.Mishchenko, Y., Vogelstein, J. T. & Paninski, L. (2011), ‘A Bayesian approach for inferring neuronalconnectivity from calcium ﬂuorescent imaging data’, Annals of Applied Statistics (2B), 1229–1261.Mitchell, T. J. & Beauchamp, J. J. (1988), ‘Bayesian variable selection in linear regression’, Journalof the American Statistical Association (404), 1023–1036.Mukamel, E. A., Nimmerjahn, A. & Schnitzer, M. J. (2009), ‘Automated analysis of cellular signalsfrom large-scale calcium imaging data’, Neuron (6), 747–760.Nakajima, M. & Schmitt, L. I. (2020), ‘Understanding the circuit basis of cognitive functions usingmouse models’, Neuroscience Research , 44 – 58.Peron, S. P. & Gabbiani, F. (2009), ‘Role of spike-frequency adaptation in shaping neuronal responseto dynamic stimuli’,

Biological cybernetics (6), 505–520.Petersen, A., Simon, N. & Witten, D. (2018), ‘Scalpel: Extracting neurons from calcium imaging data’,

Annals of Applied Statistics (4), 2430–2456. nevmatikakis, E., Merel, J., Pakman, A. & Paninski, L. (2013), Bayesian spike inference from calciumimaging data, in ‘In Signals, Systems and Computers’, pp. 349 – 353.Prado, R. & West, M. (2010), Time Series: Modeling, Computation, and Inference , 1st edn, Chapmanand Hall.Rand, W. M. (1971), ‘Objective criteria for the evaluation of clustering methods’,

Journal of the Amer-ican Statistical Association (336), 846–850.Rigat, F., de Gunst, M. & van Pelt, J. (2006), ‘Bayesian modelling and analysis of spatio-temporalneuronal networks’, Bayesian Analysis (4), 733–764.Rodr´ıguez, A., Dunson, D. B. & Gelfand, A. E. (2008), ‘The nested Dirichlet process’, Journal of theAmerican Statistical Association (483), 1131–1154.Rose, T., Goltstein, P. M., Portugues, R. & Griesbeck, O. (2014), ‘Putting a ﬁnishing touch on gecis’,

Frontiers in Molecular Neuroscience , 88.Shibue, R. & Komaki, F. (2020), ‘Deconvolution of calcium imaging data using marked point pro-cesses’, PLOS Computational Biology (3), 1–25.Vogelstein, J. T., Packer, A. M., Machado, T. A., Sippy, T., Babadi, B., Yuste, R. & Paninski, L. (2010),‘Fast nonnegative deconvolution for spike train inference from population calcium imaging’, Journalof Neurophysiology (6), 3691–3704.Vogelstein, J. T., Watson, B. O., Packer, A. M., Yuste, R., Jedynak, B. & Paninski, L. (2009),‘Spike inference from calcium imaging using sequential Monte Carlo methods’,

Biophysical Journal (2), 636 – 655.Wade, S. & Ghahramani, Z. (2018), ‘Bayesian cluster analysis: point estimation and credible balls(with discussion)’, Bayesian Analysis (2), 559–626. A N

ESTED TELESCOPING SAMPLING

Denote with C D the current partition on the distributions and with C O the partition on the observations.1. Sample the weights on the distributions: ( π , . . . , π K ) | K, α, C D ∼ Dir( e , . . . , e K ) ; where e k = α/K + J k , and J k is the number of groups assigned to the distribution k .2. Sample the weights on the observations: for all k ∈ { , . . . , K } sample a vector ω k from ( ω ,k , . . . , ω L,k ) | L, β, C O , C D ∼ Dir( f ,k , . . . , f L,k ) ; where f l,k = β/L + N l,k , and N l,k is the number of observations in the observational cluster l and distributional cluster k . . Update the partition on the distributions C D by sampling from the posterior distribution of thelatent cluster allocation variables S . For j = 1 , . . . , J Pr( S j = k | π , K, A ∗ , y , g ) ∝ π k (cid:89) t : g t = j ω M t ,S j p ( y t | A ∗ M t ) , with k ∈ { , . . . , K } . Determine J k = { j : S j = k } , for k = 1 . . . , K , and the number ofnon-empty components K + = (cid:80) Kk =1 I { J k > } . Relabel the components so that the ﬁrst K + are non-empty.4. Update the partition on the observations C O by sampling from the posterior distribution of thelatent cluster allocation variables M . For t = 1 , . . . , T Pr( M t = l | S g t = k, S , ω , L, K, A ∗ , y , g ) ∝ ω l,k p ( y t | A ∗ M t ) , with l ∈ { , . . . , L } , k ∈ { , . . . , K } . Determine N l = { t : M t = l } , for l = 1 . . . , L , and thenumber of non-empty components L + = (cid:80) Ll =1 I { N l > } . Relabel the components so that theﬁrst L + are non-empty. Because all the mixtures share the same atoms, the cluster parametersare sorted regardless of the distributional cluster allocation.5. Sample the cluster parameters for the non-empty components: p ( A ∗ l | − ) ∝ p ( A ∗ l ) (cid:81) t : M t = l p ( y t | A ∗ l ) .6. Conditional on C D , sample the number of components K of the mixture on distributions.7. Conditional on C O , sample the number of components L of the mixtures on observations. If L > L + , sample a new parameter A ∗ for the empty components from the prior distribution.8. Update the hyperparameter α on the Dirichlet distribution on the mixture weights on distribu-tions.9. Update the hyperparameter β on the Dirichlet distribution on the mixture weights on observa-tions.The posterior distributions for steps 6-9 are given in Fr¨uhwirth-Schnatter et al. (2020).on the Dirichlet distribution on the mixture weights on observa-tions.The posterior distributions for steps 6-9 are given in Fr¨uhwirth-Schnatter et al. (2020).