Bayesian nonparametric analysis for the detection of spikes in noisy calcium imaging data
Laura D'Angelo, Antonio Canale, Zhaoxia Yu, Michele Guindani
BBayesian nonparametric analysis for the detection of spikesin noisy calcium imaging data
Laura D’Angelo *1 , Antonio Canale †1 , Zhaoxia Yu ‡2 , and Michele Guindani § 21 Dipartimento di Scienze Statistiche; Universit`a degli Studi di Padova Department of Statistics; University of California, Irvine
Abstract
Recent advancements in miniaturized fluorescence microscopy have made it possible to investigateneuronal responses to external stimuli in awake behaving animals through the analysis of intra-cellularcalcium signals. An on-going challenge is deconvoluting the temporal signals to extract the spiketrains from the noisy calcium signals’ time-series. In this manuscript, we propose a nested Bayesianfinite mixture specification that allows for the estimation of spiking activity and, simultaneously, re-constructing the distributions of the calcium transient spikes’ amplitudes under different experimentalconditions. The proposed model leverages two nested layers of random discrete mixture priors toborrow information between experiments and discover similarities in the distributional patterns of neu-ronal responses to different stimuli. Furthermore, the spikes’ intensity values are also clustered withinand between experimental conditions to determine the existence of common (recurring) response am-plitudes. Simulation studies and the analysis of a data set from the Allen Brain Observatory show theeffectiveness of the method in clustering and detecting neuronal activities.
Keywords:
Dirichlet process; Mixtures of Finite Mixtures; Model-based clustering; Nested Dirichletprocess; Spike and slab. * [email protected] † [email protected] ‡ [email protected] § [email protected] a r X i v : . [ s t a t . A P ] F e b I NTRODUCTION
In recent years, calcium imaging has become a popular technique to measure the neuronal activityin awake, freely moving and behaving animals over time. Due to the development of miniaturizedand flexible microendoscopes for fluorescence microscopy, this technique has allowed to study howindividual neurons and neuronal networks encode external stimuli and cognitive processes (Nakajima& Schmitt 2020, Li et al. 2015). Calcium ions generate intracellular signals that determine a largevariety of functions in all neurons (Grienberger & Konnerth 2012). The mechanism at the basis ofcalcium imaging is a physiological process of the cells: when a neuron fires, calcium floods the celland produces a transient spike in its concentration. By using genetically encoded calcium indicators,which are fluorescent molecules that react when binding to the calcium ions, it is possible to opti-cally measure the level of calcium by analyzing the observed fluorescence trace. The outcome of thistechnique is a movie of time-varying fluorescence intensities, from which the spike trains of the ob-servable neurons have to be extracted through a complex pre-processing phase. In general, this phaseis meant to deal with two issues: identifying the spatial location of each neuron in the optical field,and deconvoluting the temporal signals to extract their spike trains. Several methods are employed toextract the fluorescence traces, e.g. by using independent component analysis (Dombeck et al. 2010,Mukamel et al. 2009), non-negative deconvolution (Vogelstein et al. 2010) and non-negative matrixfactorization (Maruyama et al. 2014).The resulting processed data consist of a fluorescent calcium trace for each observable neuron inthe targeted area (see Figure 3 for an example). The observed fluorescence trace can be used as a proxyof cellular activity but, unfortunately, it does not represent exactly the target of the analysis. What isactually relevant to the study are the precise spike times and the intracellular calcium concentrationof the observable neurons when the animal is subjected to external stimuli (Vogelstein et al. 2009).Extracting the neuronal activity from these series is not trivial: the calcium imaging technology hasseveral limitations, including the presence of measurement noise, the nonlinearity between fluores-cence transient and calcium concentration and the slow decay of the fluorescence trace compared tothe underlying neuronal activity (Friedrich et al. 2017, Dana et al. 2019, Rose et al. 2014). Moreover,the large-scale of the time series introduces additional complexity to the analysis. Therefore, a preciseestimation of the spike times and amplitudes is a fundamental step toward the understanding of theneurons’ behavior.As a motivating application, here we consider a publicly available data set from the Allen BrainObservatory (Allen Brain Observatory 2017) of calcium imaging data obtained through two-photonsmicroscopy in behaving mice. This study is an extended in vivo survey of physiological activity in themouse visual cortex in response to a range of visual stimuli. Each mouse is placed in front of a screenwhere different types of visual stimuli are shown, while the mouse’s neuronal activity is recorded. Thestimuli vary from simple synthetic images such as locally sparse noise or static gratings, to complexnatural scenes and movies. The goal of the study is to investigate how neurons at different depths in thevisual areas respond to stimuli of different complexity. Specifically, each neuron in the visual cortex an be characterized by their receptive field , i.e. the features of the visual stimulus that trigger thesignalling of that neuron. Hence, it is of critical interest to devise methods that allow inferring howthe neuronal response varies under the different types of visual stimuli. We expect that the neuronalactivity will vary across all the experimental settings, and that some variations in its intensity will beobserved based on the specific visual stimulus.Several approaches have been proposed to accurately and efficiently estimate the neuronal activityin calcium imaging data. For example, Friedrich & Paninski (2016) and Friedrich et al. (2017) haveproposed an on-line algorithm based on a lasso penalty to enforce sparsity of signal detection. Jewell &Witten (2018) and Jewell et al. (2019) have proposed using an L penalty in lieu of the L penalizationand an efficient algorithm to identify the presence or absence of spikes. In a Bayesian framework,Pnevmatikakis et al. (2013) have proposed to conduct inference on spike trains by estimating posteriorprobabilities of a latent binary indicator of spike presence at each time point. However, the model inPnevmatikakis et al. (2013) does not explicitly assume sparsity of the spikes. Moreover, it is expectedthe rate and the distribution of spikes to be stimulus-dependent (Brenner et al. 2002), but none of theprevious approaches allows to take into account the heterogeneity of spikes’ behaviors as a function ofthe stimulus. As Figure 3 clearly shows for the Allen Brain Observatory data, the spikes’ intensitiesvary greatly according to the type of stimulus. See also the discussion in Shibue & Komaki (2020)where they employ a marked point processes for the deconvolution of calcium imaging data.In this manuscript, we introduce a coherent nested Bayesian finite mixture model that allows forthe estimation of spiking activity and, simultaneously , for reconstructing the distributions of spikesunder various experimental conditions; for example, in response to different types of visual stimuliin the Allen Brain Observatory data set. More specifically, our modeling framework estimates andclusters the distributions of the calcium transient spikes’ amplitudes via a nested formulation of thegeneralized mixture of finite mixtures (gMFM) prior recently proposed by Fr¨uhwirth-Schnatter et al.(2020). The proposed model further adopts the use of a common atom specification as in Denti et al.(2020) for estimating the distribution of the spikes’ amplitudes under each experimental condition. Theproposed common atom gMFM has several advantages with respect to typical Bayesian nonparametricmodels for nested data. With respect to models based on Dirichlet process priors, the gMFM providesincreased flexibility to estimate partitions characterized either by many, well-balanced, clusters or bya small set of large clusters. The common atom model allows to obtain nested inference on densitieswithout incurring in the degeneracy issues pointed out by Camerlenghi et al. (2019) for the widely usednested Dirichlet process of Rodr´ıguez et al. (2008). At the same time, the common atom formulationstill leverages two nested layers of random discrete mixture priors to borrow information between ex-periments and to identify similarities in the distributional patterns of the neuronal responses to differentstimuli. In addition, differently than in the nested Dirichlet process, the common atom model also al-lows to cluster the inferred spikes’ intensity values both within and between experimental conditions,so to infer common (recurring) response amplitudes. Finally, we allow our model to enforce sparsityof neuron firing over time by assuming a spike-and-slab prior specification on the marginal distribution f the amplitudes. AYESIAN MIXTURE MODEL FOR CALCIUM IMAGING DATA
ODEL AND PRIOR SPECIFICATION
The observed fluorescence is often considered as a noisy realization of the underlying true calciumconcentration. To model a neuron’s activity, we adopt a popular model in the neuroscience literature,where the decay in fluorescence is modeled through an autoregressive process and the spikes are mod-eled as jumps in correspondence to the neuron’s firing events (Vogelstein et al. 2010). Denoting with y t the observed fluorescence trace of a neuron and with c t the underlying calcium concentration, for t = 1 , . . . , T , one can assume y t = b + c t + (cid:15) t , (cid:15) t ∼ N(0 , σ ) ,c t = γc t − + A t + w t , w t ∼ N(0 , τ ) , (1)where b models the baseline level of the observed trace and (cid:15) t is a measurement error. In the absenceof neuronal activity, the true calcium concentration c t is considered to be centered around zero. Theparameter A t captures the neuronal activity: in the absence of a spike ( A t = 0 ), the calcium level fol-lows a AR(1) process controlled by the parameter γ ; when a spike occurs, the concentration increasesinstantaneously with the spike amplitude A t > .We are interested in characterizing the neuronal activity under different experimental conditions.For each time point t = 1 , . . . , T , let g t be a discrete categorical variable, taking values in { , . . . , J } ,where J is the number of distinct experimental settings, so that g t = j indicates that the neuronalactivity at time t is observed under condition j . The experimental conditions are often designed to cap-ture variations in neuronal activity with respect to a baseline process, which may represent a “typical”brain process. For example, in the Allen Observatory data, the interest is to investigate visually-evokedfunctional responses of neurons in the mouse’s visual cortex. Therefore, neurons associated with visualdecoding should be expected to activate in all conditions. It is then of interest to study not only if butalso how the neurons deferentially respond to the presentation of a variety of visual stimuli.In this paper, we propose a hierarchical Bayesian approach to investigate similarities and differ-ences in the distribution of spikes over time and conditions. In order to borrow information acrossdifferent experimental conditions, one option is to fit a parametric hierarchical random effect model,and obtain a post-MCMC clustering of the estimated spikes A t by grouping together those spikes withsimilar magnitudes. This approach has several limitations: on the one hand, the distribution of therandom effects is constrained into a specific parametric form; on the other hand, the clustering of, say,the posterior mean estimates of the parameters A t ’s does not allow to fully describe stimulus-specificdistributional differences and to take into account the posterior uncertainty in the spikes. n order to allow flexible modeling of distributions and to describe the heterogeneity of distribu-tional features, we assume a nested Bayesian finite mixture specification. More specifically, we rewrite(1) as y t | b, γ, c t − , A t , σ , τ ∼ N( b + γ c t − + A t , σ + τ ) and we assume that the spikes A t are from stimulus-specific distributions, i.e. ( A t | g t = j, G j ) ∼ G j , j = 1 , . . . , J , to account for the observed variety of neuronal activity under different experiment set-tings. We further allow for clustering the distributions across conditions, in order to capture similarpatterns of neuronal activity. Indeed, one may typically expect K < J distributional clusters. Forexample, a neuron may respond to general visual stimulation and not specifically to the type of stim-ulus considered. More specifically, we assume the following generalized mixture of finite mixturesstructure: G , . . . , G J | Q ∼ Q, Q = K (cid:88) k =1 π k δ G ∗ k (2)where π , . . . , π K | K ∼ Dirichlet K ( α/K, . . . α/K ) , α > , and G ∗ , . . . , G ∗ K are a set of cluster-defining distributions, obtained as realizations of an underlying random probability measure, specifiedfurther below. Equation (2) implies that the G j ’s, j = 1 , . . . , J have a positive probability of clusteringtogether, thereby giving rise to distributional clusters . In practice, the number of mixture components, K , is typically larger than the number of clusters, K + , and some of the atoms G ∗ k are not assigned to anyof the G j ’s (empty components). The prior on the number of mixture components K is a translatedbeta-negative-binomial distribution as in Fr¨uhwirth-Schnatter et al. (2020). Including a prior p ( K ) leads to both K + and K being random a priori. Finally, the distributional atoms G ∗ k , k = 1 , . . . , K arealso obtained as a realization from an underlying generalized mixture of finite mixtures, G ∗ k = L (cid:88) l =1 ω l,k δ A ∗ l (3)with ω ,k , . . . , ω L,k | L ∼ Dirichlet L ( β/L ) , for some positive real number β > . The set ofatoms A ∗ l is common across all distributions G ∗ , . . . , G ∗ K and they are obtained as i.i.d. draws froma centering measure, A ∗ l ∼ G ( A ∗ l ) . Therefore, equation (3) defines a clustering of the inferred spikeintensities both within a given condition (i.e. for fixed G ∗ k ) and across conditions (i.e. across the G ∗ k ’s;hence, across the G j ’s). In the following, we adopt common terminology in the literature on nestedBayesian non-parametric priors and indicate the clustering induced on the A t through the proposedtwo-layers prior as observational clustering . The nested gMFM formulation requires the specificationof a prior on the number of components that specify the lower-level distributional atoms G ∗ k , L ∼ p ( L ) .Once again, some of the components may be empty.We enforce sparsity in the detection of the spikes by modeling the base measure G for the param-eters A ∗ l with a spike-and-slab specification (Mitchell & Beauchamp 1988), which is a convex mixture etween a Dirac mass at zero – representing the absence of neuronal response – and a diffuse densityon the positive real numbers – representing the intensity of the neuronal response. More specifically,we assume G = (1 − p ) δ + p Ga ( h A , h A ) , (4)where the slab is a gamma distribution, Ga( a, b ) with mean a/b and variance a/b . The choice of agamma distribution in (4) is particularly relevant for sparsity-inducing purposes, as the gamma densitybelongs to the set of moment non-local prior densities, as defined by Johnson & Rossell (2010). There-fore, a negligible probability density is assigned to values in a neighborhood of zero, thus inducing aclear separation between the baseline neuronal activity and the neuronal responses. In particular, thehigher the shape parameter h A , the larger is the separation. We assume a Beta( h p , h p ) prior for theproportion of spikes p with h p much smaller than h p in order to favor sparsity of detections.The proposed formulation can be seen as a special case of inner spike-and-slab nonparametricpriors, following a terminology introduced by Canale et al. (2017). In the following, we will refer tothe proposed specification as a finite common atom model (fCAM).The Bayesian model elicitation is completed by assuming conjugate priors for the underlying cal-cium level concentration parameters, i.e. the baseline calcium level b , and the variances σ and τ .Specifically, the following conjugate prior distributions are assumed: c ∼ N(0 , C ) , b ∼ N( b , B )1 /σ ∼ Ga( h σ , h σ ) , /τ ∼ Ga( h τ , h τ ) , Finally, under the assumption that the process is stationary with positive correlation between the cal-cium level at consecutive times, we constrain γ ∈ (0 , and let γ ∼ Beta( h γ , h γ ) , a priori.2.2 P OSTERIOR INFERENCE
For computational purposes, it is often convenient to rewrite the likelihood for an observation y t undercondition g t = j by introducing two latent cluster allocation variables, S j = S g t and M t , indicatingthe distributional cluster for the group j and the observational cluster for y t , respectively.Given K and { π k } Kk =1 , the distributional allocation variable S j ∼ Multin K ( π ) . Similarly, given L and { ω l } Ll =1 , the observational allocation variable M t ∼ Multin L ( ω ) . For notation simplicity, we haveindicated the parameter vectors using bold letters in the equations above. Therefore, conditionally onthe other model parameters, the joint distribution of the observed data and the latent cluster allocationscan be written as f ( y , M , S | π , ω , A ∗ ) = J (cid:89) j =1 π S j (cid:89) t : g t = j ω M t ,S j p ( y t | A ∗ M t ) , which facilitates posterior inference. ore specifically, posterior inference for the proposed fCAM can be carried out quite straightfor-wardly by means of Markov chain Monte Carlo (MCMC) techniques. The sampling of the latent cal-cium level c t uses an iterative approach based on the Kalman filter and on a forward filtering backwardsampling algorithm (Prado & West 2010). Full conditional posteriors for b , p , σ and τ are availablein closed form thus leading to straightforward Gibbs sampling steps. For the autoregressive parameter γ , we use a Metropolis-Hastings within the Gibbs step. The sampling of A t exploits a combination ofthe nested slice sampler of Denti et al. (2020) and of the telescoping sampler of Fr¨uhwirth-Schnatteret al. (2020). A detailed description of the latter step is reported in the the Appendix. Here, we justpresent a schematic description of the MCMC steps:1) Sample the calcium level c t , for t = 0 , . . . , T , using a forward filtering backward sampling:a) Run Kalman filter: set a = m = 0 , R = C = var( c ) . For t = 1 , . . . , T let a t = γ m t − + A t R t = γ C t − + τ . Compute the filtering distribution’s parameters, m t and C t , for t = 1 , . . . , T , where m t = a t + R t ( R t + σ ) − ( y t − b − a t ) C t = R t − R t ( R t + σ ) − . b) Draw c T ∼ N( m T , C T ) ;c) For t = T − , . . . , , draw c t ∼ N( h t , H t ) , with h t = m t + γ C t R − t +1 ( c t +1 − a t +1 ) H t = C t − γ C t R − t +1 .
2) Sample a new value for the baseline level b : b ∼ N (cid:32) b B + 1 σ T (cid:88) t =1 ( y t − c t ) , (cid:114) B + Tσ (cid:33) .
3) Sample the variance on the output equation σ and the variance on the state equation τ : /σ ∼ Ga (cid:32) h σ + T , h σ + 12 T (cid:88) t =1 ( y t − c t − b ) (cid:33) /τ ∼ Ga (cid:32) h τ + T , h τ + 12 T (cid:88) t =1 ( c t − γ c t − − A t ) (cid:33) . ) Update the autoregressive parameter γ using a Metropolis-Hastings step.5) Update the parameter p of the spike-and-slab base measure from p ∼ Beta( h p + T − n , h p + n ) , where n is the number of y t assigned to the the spike component.6) Update the cluster allocations variables S and M , the number of mixture components K and L ,and the cluster parameters A ∗ using the nested telescoping sampling for the finite common atommodel reported in the Appendix. IMULATION STUDY
The performances of the proposed method are assessed through a simulation study. The purpose ofthis section is twofold, namely to assess both the ability to correctly identify the spike times, and theaccuracy of the inferred clustering structure.We simulated synthetic data exhibiting a baseline level and a number of spikes representing theeffect of the response of a neuron to a stimulus, thus mimicking the characteristics of real series ofcalcium imaging following the structure of model (1). Specifically, we first divided the time frameinto J hypothetical experimental conditions of equal length, with J varying in the different scenariosdescribed below. Consistent with our motivating assumption that the neuronal response depends onthe type of stimulus, each experimental condition is assumed to belong to one of the K distributionalclusters. Then, for each experimental condition, we generated the neuronal activity: first, we generatedthe presence or absence of a neuron response uniformly in time, where the spike probability can varyacross groups. Then, conditionally on the obtained activations, we generated some additional spikesin a short subsequent interval, so that it is very likely to observe close or even successive spikes. Inthis way, the data mimic a real calcium imaging time series. Moreover, we are able to conduct acareful assessment of the ability of the model to distinguish the presence of a single high spike versusthe convolution of several spikes in consecutive times. Finally, the values A t , conditionally on theirdistributional cluster, are generated from one of the finite sets of spike amplitudes described below.We simulated 50 independent data sets for each of the three scenarios described henceforth. InScenario 1 we assumed J = 6 experimental conditions, generated from K = 4 distributional clusters.The spike amplitudes in the distributional clusters are (0.35, 0.89, 1.15, 1.80, 2.20), (0.65, 0.89, 1.40,1.80), (0.35, 0.65, 1.15), and (0.35, 0.89, 1.60). Scenario 2 assumes J = 4 experimental conditionsand K = 3 distributional clusters with spike amplitudes equal to (0.3, 0.5, 0.7, 0.9, 1.1, 1.5), (0.3, 0.9,1.5, 1.8), and (0.5, 0.9, 1.5). Finally, Scenario 3 sets J = 5 and K = 3 with the spike amplitudes inthe distributional clusters being (0.3, 0.5, 0.7, 0.9, 1.1), (0.3, 0.9, 1.1, 1.3), and (0.7, 0.9, 1.3). Whilein Scenario 1 the amplitudes of the spikes are quite large, spaced apart, and with the correspondingdistributional clusters well distinct, in Scenario 3 the spike amplitudes are more homogeneous and e+002e−044e−046e−048e−04 CAM fCAM L0 Scenario 1
Scenario 2
Scenario 3
Figure 1: Distribution of the misclassification error rate in the simulation study.more clustered in time. Scenario 2 represents an in-between situation. Hence, from the first to thelast scenario, we are assuming an increasing degree of complexity. The R script generating thesesynthetic datasets can be found at the Repository https://github.com/laura-dangelo/fCAM_calcium_imaging .The results attained by the proposed fCAM are compared to those obtained exploiting the commonatom model (CAM) of Denti et al. (2020) – which provides a benchmark for the clustering of thespikes and the stimulus-specific distributions – and the L penalization method of Jewell et al. (2019),which provides a benchmark for the task of spikes’ detection. The latter method is tuned choosing thepenalization parameter that minimizes the in-sample misclassification error rate, thus achieving a sortof oracle performance.To assess the sensitivity of the proposed fCAM to the prior specification, we repeated the numericalexperiment for different values of the hyperparameters h A and h A in (4). In particular, the shapeparameter h A was supposed to play a key role in the detection of spikes. Keeping fixed the ratio h A /h A , the parameters were set equal to 3, 4, 6, and 8: a small value implies, a priori , less separationbetween zero and the distribution of the positive spikes while a large value corresponds to the oppositeeffect.Focusing on the classification of each time point as a spike or not, Figure 1 summarizes the misclas-sification rate for all competing methods under the three scenarios. The results of the replications aresummarized using boxplots. For our fCAM, we report only the results obtained with h A = h A = 8 as those obtained for the other choices are essentially equivalent. The rates are small in absolute valueand broadly comparable across the different methods, thus confirming that all the competing modelsare effective in detecting the spikes.However, the proposed fCAM not only enables the detection of spikes but also allows to conductinference on the clustering structure. Therefore, we report on its ability to identify the clustering bservational clustersDistributional clustersCAM fCAMCAM fCAM0.000.250.500.751.000.9900.9930.996 Scenario 1
Observational clustersDistributional clustersCAM fCAMCAM fCAM0.000.250.500.751.000.960.970.980.99
Scenario 2
Observational clustersDistributional clustersCAM fCAMCAM fCAM0.60.81.00.9750.9800.9850.990
Scenario 3
Figure 2: Distribution of the adjusted Rand index on the distributional and observational clusters, computed on the50 simulations for the three scenarios of the simulated data.structure. Figure 2 reports the adjusted Rand index (Rand 1971, Hubert & Arabie 1985) computed onboth the observational and the distributional clusters for h A = h A = 8 (results for other settingsare similar). Values of the adjusted Rand index close to 1 denote that the identified structure resemblethe true clustering. While for the observational clusters the results are broadly comparable, for thedistributional clusters, the performance of the proposed fCAM is uniformly superior. In addition,the variability of the results generally appears to be drastically smaller for the fCAM, thus providingevidence of greater efficiency. LLEN B RAIN O BSERVATORY DATA ANALYSIS
We now revert to the analysis of the data from the Allen Brain Observatory (Allen Institute for BrainScience 2016). The data comprise the dF/F -transformed fluorescence trace for a cell during session-Bof the experiment. This session comprises three types of visual stimuli (static gratings, natural sceneand natural movie) in addition to some period of spontaneous activity (absence of visual stimuli). Sincethe data are recorded at a frequency of Hz, the resulting series consists of 113,865 time points fora total of 63.2 minutes. We focus our analysis on a neuron located in the primary visual area, at animaging depth equal to 350 microns.The observed fluorescence trace is shown with a continuous black line in Figure 3. Different
123 0 1000 2000 3000
Time (seconds) C a l c i u m l e v e l Stimulus
Natural movie Natural scene Static grating
Figure 3: Observed fluorescence trace y t from the Allen Brain Observatory data (black line), and visual stimulus towhich the mouse is exposed (shaded areas). The yellow line represents the estimated neuronal activity.shaded backgrounds indicate the types of visual stimuli. Using the notation introduced in the previousSections, J = 4 with j = 1 , , corresponding to static grating, natural scene, and natural movie,respectively and j = 4 indicating no stimulus presence.We ran the MCMC algorithm of Section 2.2 using the same prior specification of Section 2.1 for10,000 iterations discarding the first 7000 iterations as burn-in and keeping one iteration every twoto improve mixing. Visual inspection of the traceplots and Geweke diagnostics showed no issueswith convergence. The superimposed light line in Figure 3 represents the estimated neuronal activityin terms of the inferred amplitude A t , i.e. removing the measurement errors and the result of theaccumulation of calcium from the previous spikes. Here and henceforth, we identified the presence ofa spike if the proportion of non-zero A t ’s over all MCMC iterations was greater than 60%.As already mentioned in the Introduction, in calcium imaging it is of interest studying the distribu-tion of the spikes in response to each experimental stimulus, and identifying similarities and differencesin these distributions across stimuli.We start by investigating the presence of similarities in the neuronal response to different typesof visual stimuli. This corresponds to analyzing the clustering of the spike distributions induced bythe proposed fCAM. The model clusters together the groups corresponding to the natural scene andnatural movie stimuli with high posterior probability, while the static grating stimulus and the absenceof stimuli are assigned to two separate distributional clusters. In other terms, the neuron appears toshow similar neuronal responses in the natural scene and natural movie stimuli whereas the responsesappear distinctly different under the other two conditions. tatic grating D en s i t y Natural scene D en s i t y Natural movie D en s i t y Figure 4: Empirical distribution of the posterior means of the observational cluster parameters A t for the threeexperimental conditions of the Allen Brain Observatory data.To understand whether and how the neuronal response depends on the type of stimulus, we es-timated the spike amplitude distribution for each of the four types of stimuli. Figure 4 shows thehistograms of posterior means of the non-zero spike amplitudes for the three types of stimuli. The dis-tribution for the time interval between 1018-1319 sec in Figure 3 (absence of stimuli) is not presentedbecause no activity was detected. Despite the apparent similarities of the distributions in Figure 4,the second cluster of spike amplitude distributions (natural scene and natural movie) shows a heaviertail. Specifically, the highest observed cluster during the static grating stimulus (top plot) is centered at1.04, while for the other two stimuli we obtained several higher values, with the largest cluster centeredaround 1.49.A qualitative representation of how these spike clusters are distributed within the three groups isgiven in Figure 5. The three plots show a short interval of the observed calcium series, chosen in corre-spondence of one of the highest observed spikes. Each plot also shows a series of colored vertical lines:the lines are placed at the estimated spike times, and the colors correspond to the estimated spike am-plitudes. The represented partition is the posterior point estimate obtained by minimizing the variation f information loss, as proposed in Wade & Ghahramani (2018). Conditionally on the obtained parti-tion, for each cluster a representative value for the cluster parameter is obtained as follows: first, foreach MCMC iteration, the group-specific average of A t is computed keeping the partition fixed; then,these values are averaged over all the MCMC iterations. We notice that for all experiments, high val-ues of the observed calcium level are often produced as the result of several consecutive spikes, since,individually, the spikes are characterized by a relatively low amplitude, and the observed calcium levelis cumulated due to its autoregressive behavior. The autoregressive parameter γ has a posterior meanequal to 0.498 with a 95% credible interval of (0.484, 0.508). This result corresponds to the under-standing that the observed calcium response may be generated by high-frequency firing neurons: due tothe low-sampling rate, the non-linear calcium signal essentially captures a super-imposition of multiplespikes (Hoang et al. 2020).As a matter of fact, another useful quantity we can compute to compare the neuronal activitybetween stimuli is the firing rate, which provides a measure of how often the neuron has activatedduring a specific visual stimulus. The rate computes the number of detected spikes per second, to takeinto account the different duration of the experiments. For the static grating stimulus the posteriormean rate (and related 95% credible interval) is .
287 (0 . , . , while for the natural scene andnatural movie stimuli they are .
523 (0 . , . and .
612 (0 . , . , respectively. Thoseresults highlight the role of spike-frequency adaptation, whereby some neurons show an increasedactivity when exposed to more complex stimuli, thus exhibiting higher firing rates and larger calciumconcentration measurements (Peron & Gabbiani 2009). ISCUSSION
Calcium imaging has become widely applied to record the neuronal activity in awake, freely movingand behaving animals. However, reliable spike detection and spike time estimation remain challenging,due to the non-linearity and low signal-to-noise ratio of the calcium response. We have proposed asingle-stage nested Bayesian finite mixture model that allows to estimate the spike activity and alsoto characterize how its distribution varies across stimuli. The method shows good performances in asimulation study and captures characteristic features of neuronal activity in an application to publiclyavailable data from the Allen Brain Observatory.In line with the current literature, our approach is limited to the analysis of the calcium responsesobserved from a single neuron. Methods to infer neuronal connectivity from calcium imaging data overmultiple regions of the brain remain sparse, often limited to point estimates (Mishchenko et al. 2011)or the analysis of in-vitro data (Rigat et al. 2006). Possible extensions of the framework presentedhere may focus on studying the dynamic clustering of temporally correlated groups of neurons, as afunction of external stimuli or the movement of the animal through an environment. In addition to thelow signal-to-noise ratio, issues related to the dimension of the data and the accurate identification ofthe locations of neurons further compound the statistical and computational challenges (Petersen et al. .00.51.01.5 C a l c i u m l e v e l Static grating P r( s p i k e ) C a l c i u m l e v e l Natural scene P r( s p i k e ) C a l c i u m l e v e l Natural movie
Time (seconds) P r( s p i k e ) A Figure 5: Short interval of length 500 of the Allen Brain Observatory data in correspondence of a spike, for the threestimuli. The vertical lines indicate the time of a spike and the colors correspond to the observational cluster of itsamplitude. The bottom panels show the estimated posterior probability of spike presence, for each time point. R EFERENCES
Allen Brain Observatory (2017), ‘Technical whitepaper: stimulus set and response analyses’.Allen Institute for Brain Science (2016), ‘Allen brain observatory’, http://observatory.brain-map.org/visualcoding.Brenner, N., Agam, O., Bialek, W. & de Ruyter van Steveninck, R. (2002), ‘Statistical properties ofspike trains: universal and stimulus-dependent aspects’,
Physical review. E, Statistical, nonlinear,and soft matter physics , 031907.Camerlenghi, F., Dunson, D. B., Lijoi, A., Pr¨unster, I. & Rodr´ıguez, A. (2019), ‘Latent nested non-parametric priors (with discussion)’, Bayesian Analysis (4), 1303–1356.Canale, A., Lijoi, A., Nipoti, B. & Pr¨unster, I. (2017), ‘On the Pitman–Yor process with spike and slabbase measure’, Biometrika (3), 681–697.Dana, H., Sun, Y., Mohar, B., Hulse, B. K., Kerlin, A. M., Hasseman, J. P., Tsegaye, G., Tsang, A.,Wong, A., Patel, R., Macklin, J. J., Chen, Y., Konnerth, A., Jayaraman, V., Looger, L. L., Schreiter,E. R., Svoboda, K. & Kim, D. S. (2019), ‘High-performance calcium sensors for imaging activity inneuronal populations and microcompartments’,
Nature Methods , 649 – 657.Denti, F., Camerlenghi, F., Guindani, M. & Mira, A. (2020), ‘A common atom model for the Bayesiannonparametric analysis of nested data’, arXiv:2008.07077.Dombeck, D. A., Harvey, C. D., Tian, L., Looger, L. L. & Tank, D. W. (2010), ‘Functional imagingof hippocampal place cells at cellular resolution during virtual navigation’, Nature Neuroscience , 1433 – 1440.Friedrich, J. & Paninski, L. (2016), Fast active set methods for online spike inference from calciumimaging, in D. Lee, M. Sugiyama, U. Luxburg, I. Guyon & R. Garnett, eds, ‘Advances In NeuralInformation Processing Systems’, pp. 1984 – 1992.Friedrich, J., Zhou, P. & Paninski, L. (2017), ‘Fast online deconvolution of calcium imaging data’,
PLOS Computational Biology (3), 1 – 26.Fr¨uhwirth-Schnatter, S., Malsiner-Walli, G. & Gr¨un, B. (2020), ‘Generalized mixtures of finite mix-tures and telescoping sampling’, arXiv:2005.09918. rienberger, C. & Konnerth, A. (2012), ‘Imaging calcium in neurons’, Neuron (5), 862 – 885.Hoang, H., Sato, M.-a., Shinomoto, S., Tsutsumi, S., Hashizume, M., Ishikawa, T., Kano, M., Ikegaya,Y., Kitamura, K., Kawato, M. & Toyama, K. (2020), ‘Improved hyperacuity estimation of spiketiming from calcium imaging’, Scientific Reports (1), 17844.Hubert, L. & Arabie, P. (1985), ‘Comparing partitions’, Journal of Classification (336), 193 – 218.Jewell, S. W., Hocking, T. D., Fearnhead, P. & Witten, D. M. (2019), ‘Fast nonconvex deconvolutionof calcium imaging data’, Biostatistics (4), 709–726.Jewell, S. & Witten, D. (2018), ‘Exact spike train inference via L0 optimization’, The Annals of AppliedStatistics (4), 2457 – 2482.Johnson, V. E. & Rossell, D. (2010), ‘On the use of non-local prior densities in Bayesian hypothesistests’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2), 143–170.Li, N., Chen, T.-W., Guo, Z. V., Gerfen, C. R. & Svoboda, K. (2015), ‘A motor cortex circuit for motorplanning and movement’, Nature (7541), 51–56.Maruyama, R., Maeda, K., Moroda, H., Kato, I., Inoue, M., Miyakawa, H. & Aonishi, T. (2014),‘Detecting cells using non-negative matrix factorization on calcium imaging data’,
Neural Networks , 11–19.Mishchenko, Y., Vogelstein, J. T. & Paninski, L. (2011), ‘A Bayesian approach for inferring neuronalconnectivity from calcium fluorescent imaging data’, Annals of Applied Statistics (2B), 1229–1261.Mitchell, T. J. & Beauchamp, J. J. (1988), ‘Bayesian variable selection in linear regression’, Journalof the American Statistical Association (404), 1023–1036.Mukamel, E. A., Nimmerjahn, A. & Schnitzer, M. J. (2009), ‘Automated analysis of cellular signalsfrom large-scale calcium imaging data’, Neuron (6), 747–760.Nakajima, M. & Schmitt, L. I. (2020), ‘Understanding the circuit basis of cognitive functions usingmouse models’, Neuroscience Research , 44 – 58.Peron, S. P. & Gabbiani, F. (2009), ‘Role of spike-frequency adaptation in shaping neuronal responseto dynamic stimuli’,
Biological cybernetics (6), 505–520.Petersen, A., Simon, N. & Witten, D. (2018), ‘Scalpel: Extracting neurons from calcium imaging data’,
Annals of Applied Statistics (4), 2430–2456. nevmatikakis, E., Merel, J., Pakman, A. & Paninski, L. (2013), Bayesian spike inference from calciumimaging data, in ‘In Signals, Systems and Computers’, pp. 349 – 353.Prado, R. & West, M. (2010), Time Series: Modeling, Computation, and Inference , 1st edn, Chapmanand Hall.Rand, W. M. (1971), ‘Objective criteria for the evaluation of clustering methods’,
Journal of the Amer-ican Statistical Association (336), 846–850.Rigat, F., de Gunst, M. & van Pelt, J. (2006), ‘Bayesian modelling and analysis of spatio-temporalneuronal networks’, Bayesian Analysis (4), 733–764.Rodr´ıguez, A., Dunson, D. B. & Gelfand, A. E. (2008), ‘The nested Dirichlet process’, Journal of theAmerican Statistical Association (483), 1131–1154.Rose, T., Goltstein, P. M., Portugues, R. & Griesbeck, O. (2014), ‘Putting a finishing touch on gecis’,
Frontiers in Molecular Neuroscience , 88.Shibue, R. & Komaki, F. (2020), ‘Deconvolution of calcium imaging data using marked point pro-cesses’, PLOS Computational Biology (3), 1–25.Vogelstein, J. T., Packer, A. M., Machado, T. A., Sippy, T., Babadi, B., Yuste, R. & Paninski, L. (2010),‘Fast nonnegative deconvolution for spike train inference from population calcium imaging’, Journalof Neurophysiology (6), 3691–3704.Vogelstein, J. T., Watson, B. O., Packer, A. M., Yuste, R., Jedynak, B. & Paninski, L. (2009),‘Spike inference from calcium imaging using sequential Monte Carlo methods’,
Biophysical Journal (2), 636 – 655.Wade, S. & Ghahramani, Z. (2018), ‘Bayesian cluster analysis: point estimation and credible balls(with discussion)’, Bayesian Analysis (2), 559–626. A N
ESTED TELESCOPING SAMPLING
Denote with C D the current partition on the distributions and with C O the partition on the observations.1. Sample the weights on the distributions: ( π , . . . , π K ) | K, α, C D ∼ Dir( e , . . . , e K ) ; where e k = α/K + J k , and J k is the number of groups assigned to the distribution k .2. Sample the weights on the observations: for all k ∈ { , . . . , K } sample a vector ω k from ( ω ,k , . . . , ω L,k ) | L, β, C O , C D ∼ Dir( f ,k , . . . , f L,k ) ; where f l,k = β/L + N l,k , and N l,k is the number of observations in the observational cluster l and distributional cluster k . . Update the partition on the distributions C D by sampling from the posterior distribution of thelatent cluster allocation variables S . For j = 1 , . . . , J Pr( S j = k | π , K, A ∗ , y , g ) ∝ π k (cid:89) t : g t = j ω M t ,S j p ( y t | A ∗ M t ) , with k ∈ { , . . . , K } . Determine J k = { j : S j = k } , for k = 1 . . . , K , and the number ofnon-empty components K + = (cid:80) Kk =1 I { J k > } . Relabel the components so that the first K + are non-empty.4. Update the partition on the observations C O by sampling from the posterior distribution of thelatent cluster allocation variables M . For t = 1 , . . . , T Pr( M t = l | S g t = k, S , ω , L, K, A ∗ , y , g ) ∝ ω l,k p ( y t | A ∗ M t ) , with l ∈ { , . . . , L } , k ∈ { , . . . , K } . Determine N l = { t : M t = l } , for l = 1 . . . , L , and thenumber of non-empty components L + = (cid:80) Ll =1 I { N l > } . Relabel the components so that thefirst L + are non-empty. Because all the mixtures share the same atoms, the cluster parametersare sorted regardless of the distributional cluster allocation.5. Sample the cluster parameters for the non-empty components: p ( A ∗ l | − ) ∝ p ( A ∗ l ) (cid:81) t : M t = l p ( y t | A ∗ l ) .6. Conditional on C D , sample the number of components K of the mixture on distributions.7. Conditional on C O , sample the number of components L of the mixtures on observations. If L > L + , sample a new parameter A ∗ for the empty components from the prior distribution.8. Update the hyperparameter α on the Dirichlet distribution on the mixture weights on distribu-tions.9. Update the hyperparameter β on the Dirichlet distribution on the mixture weights on observa-tions.The posterior distributions for steps 6-9 are given in Fr¨uhwirth-Schnatter et al. (2020).on the Dirichlet distribution on the mixture weights on observa-tions.The posterior distributions for steps 6-9 are given in Fr¨uhwirth-Schnatter et al. (2020).