A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data
The Annals of Applied Statistics © Institute of Mathematical Statistics, 2011
By Yuriy Mishchenko, Joshua T. Vogelstein and Liam Paninski
Columbia University, Johns Hopkins University and Columbia University
Deducing the structure of neural circuits is one of the central problems of modern neuroscience. Recently-introduced calcium fluorescent imaging methods permit experimentalists to observe network activity in large populations of neurons, but these techniques provide only indirect observations of neural spike trains, with limited time resolution and signal quality. In this work we present a Bayesian approach for inferring neural circuitry given this type of imaging data. We model the network activity in terms of a collection of coupled hidden Markov chains, with each chain corresponding to a single neuron in the network and the coupling between the chains reflecting the network's connectivity matrix. We derive a Monte Carlo Expectation–Maximization algorithm for fitting the model parameters; to obtain the sufficient statistics in a computationally-efficient manner, we introduce a specialized blockwise-Gibbs algorithm for sampling from the joint activity of all observed neurons given the observed fluorescence data. We perform large-scale simulations of randomly connected neuronal networks with biophysically realistic parameters and find that the proposed methods can accurately infer the connectivity in these networks given reasonable experimental and computational constraints. In addition, the estimation accuracy may be improved significantly by incorporating prior knowledge about the sparseness of connectivity in the network, via standard L1 penalization methods.
1. Introduction.
Since Ramon y Cajal discovered that the brain is a rich and dense network of neurons [Ramon y Cajal (1904, 1923)], neuroscientists have been intensely curious about the details of these networks, which are
Received September 2009; revised October 2009. Supported by NIDCD Grant DC00109. Supported by an NSF CAREER award and a McKnight Scholar award.
Key words and phrases.
Sequential Monte Carlo, Metropolis–Hastings, spike train data, point process, generalized linear model.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2011, Vol. 5, No. 2B, 1229–1261. This reprint differs from the original in pagination and typographic detail.
believed to be the biological substrate for memory, cognition and perception. While we have learned a great deal in the last century about "macro-circuits" (the connectivity between coarsely-defined brain areas), a number of key questions remain open about "micro-circuit" structure, that is, the connectivity within populations of neurons at a fine-grained cellular level. Two complementary strategies for investigating micro-circuits have been pursued extensively.
Anatomical approaches to inferring circuitry do not rely on observing neural activity; some recent exciting examples include array tomography [Micheva and Smith (2007)], genetic "brainbow" approaches [Livet et al. (2007)], and serial electron microscopy [Briggman and Denk (2006)]. Our work, on the other hand, takes a functional approach: our aim is to infer micro-circuits by observing the simultaneous activity of a population of neurons, without making direct use of fine-grained anatomical measurements.

Experimental tools that enable simultaneous observations of the activity of many neurons are now widely available. While arrays of extracellular electrodes have been exploited for this purpose [Hatsopoulos et al. (1998); Harris et al. (2003); Stein et al. (2004); Santhanam et al. (2006); Luczak et al. (2007)], the arrays most often used in vivo are inadequate for inferring monosynaptic connectivity in large populations of neurons, as the inter-electrode spacing is typically too large to record from closely neighboring neurons; importantly, neighboring neurons are more likely connected to one another than distant neurons [Abeles (1991); Braitenberg and Schuz (1998)]. Alternately, calcium-sensitive fluorescent indicators allow us to observe the spiking activity of on the order of 10 neighboring neurons [Tsien (1989); Yuste et al. (2006); Cossart, Aronov and Yuste (2003); Ohki et al. (2005)] within a micro-circuit. Some organic dyes achieve sufficiently high signal-to-noise ratios (SNR) that individual action potentials (spikes) may be resolved [Yuste et al. (2006)], and bulk-loading techniques enable experimentalists to simultaneously fill populations of neurons with such dyes [Stosiek et al. (2003)]. In addition, genetically encoded calcium indicators are under rapid development in a number of groups, and are approaching SNR levels of nearly single spike accuracy as well [Wallace et al.
(2008)]. Microscopy technologies for collecting fluorescence signals are also rapidly developing. Cooled CCDs for wide-field imaging (either epifluorescence or confocal) now achieve a quantum efficiency of ≈ 90% with frame rates up to 60 Hz or greater, depending on the field of view [Djurisic et al. (2004)]. For in vivo work, 2-photon laser scanning microscopy can achieve similar frame rates, using either acoustic-optical deflectors to focus light at arbitrary locations in three-dimensional space [Iyer, Hoogland and Saggau (2006); Salome et al. (2006); Reddy et al. (2008)] or resonant scanners [Nguyen et al. (2001)]. (It is worth noting, however, that multielectrode arrays which have been recently developed for use in the retina [Segev et al. (2004); Litke et al. (2004); Petrusca et al. (2007); Pillow et al. (2008)] or in cell culture [Lei et al. (2008)] are capable of much denser sampling.) Together, these experimental tools can provide movies of calcium fluorescence transients from large networks of neurons with adequate SNR, at imaging frequencies of 30 Hz or greater, in both in vitro and in vivo preparations.

Given these experimental advances in functional neural imaging, our goal is to develop efficient computational and statistical methods to exploit this data for the analysis of neural connectivity; see Figure 1 for a schematic overview.

Fig. 1. Schematic overview. The raw observed data is a large-scale calcium fluorescence movie, which is pre-processed to correct for movement artifacts and find regions-of-interest, that is, putative neurons. [Note that we have omitted details of these important preprocessing steps in this paper; see, for example, Cossart, Aronov and Yuste (2003); Dombeck et al. (2007) for further details.] Given the fluorescence traces F_i(t) from each neuron, we estimate the underlying spike trains (i.e., the time series of neural activity) using statistical deconvolution methods. Then we estimate the parameters of a network model given the observed data. Our major goal is to obtain an accurate estimate of the network connectivity matrix, which summarizes the information we are able to infer about the local neuronal microcircuit. (We emphasize that this illustration is strictly schematic, and does not correspond directly to any of the results described below.) This figure was adapted from personal communications with R. Yuste, B. Watson and A. Packer.

One major challenge here is that calcium transients due to action potentials provide indirect observations, and decay about an order of magnitude slower than the time course of the underlying neural activity [Yuste et al. (2006); Roxin, Hakim and Brunel (2008)]. Thus, to properly analyze the network connectivity, we must incorporate methods for effectively deconvolving the observed noisy fluorescence signal to obtain estimates of the underlying spiking rates [Yaksi and Friedrich (2006); Greenberg, Houweling and Kerr (2008); Vogelstein et al. (2009)].
To this end, we introduce a coupled Markovian state-space model that relates the observed variables (fluorescence traces from the neurons in the microscope's field of view) to the hidden variables of interest (the spike trains and intracellular calcium concentrations of these neurons), as governed by a set of biophysical parameters including the network connectivity matrix. As discussed in [Vogelstein et al. (2009)], this parametric approach effectively introduces a number of constraints on the hidden variables, leading to significantly better performance than standard blind deconvolution approaches. Given this state-space model, we derive a Monte Carlo Expectation–Maximization algorithm for obtaining the maximum a posteriori estimates of the parameters of interest. Standard sampling procedures (e.g., Gibbs sampling or sequential Monte Carlo) are inadequate in this setting, due to the high dimensionality and nonlinear, non-Gaussian dynamics of the hidden variables; we therefore develop a specialized blockwise-Gibbs approach for efficiently computing the sufficient statistics. This strategy enables us to accurately infer the connectivity matrix from large simulated neural populations, under realistic assumptions about the dynamics and observation parameters.
2. Methods.
2.1. Model.
We begin by detailing a parametric generative model for the (unobserved) joint spike trains of all N observable neurons, along with the observed calcium fluorescence data. Each neuron is modeled as a generalized linear model (GLM). This class of models is known to capture the statistical firing properties of individual neurons fairly accurately [Brillinger (1988); Chornoboy, Schramm and Karr (1988); Brillinger (1992); Plesser and Gerstner (2000); Paninski et al. (2004); Paninski (2004); Rigat, de Gunst and van Pelt (2006); Truccolo et al. (2005); Nykamp (2007); Kulkarni and Paninski (2007); Pillow et al. (2008); Vidne et al. (2009); Stevenson et al. (2009)]. We denote the i-th neuron's activity at time t as n_i(t): in continuous time, n_i(t) could be modeled as an unmarked point process, but we will take a discrete-time approach here, with each n_i(t) taken to be a binary random variable. We model the spiking probability of neuron i via an instantaneous nonlinear function, f(·), of the filtered and summed input to that neuron at that time, J_i(t). This input is composed of the following: (i) some baseline value, b_i; (ii) some external vector stimulus, S^ext(t), that is linearly filtered by k_i; and (iii) spike history terms, h_{ij}(t), encoding the influence on neuron i from neuron j, weighted by w_{ij}:

(1)    n_i(t) ∼ Bernoulli[f(J_i(t))],    J_i(t) = b_i + k_i · S^ext(t) + Σ_{j=1}^{N} w_{ij} h_{ij}(t).

To ensure computational tractability of the parameter inference problem, we must impose some reasonable constraints on the instantaneous nonlinearity f(·) (which plays the role of the inverse of the link function in the standard GLM setting) and on the dynamics of the spike-history effects h_{ij}(t). First, we restrict our attention to functions f(·) which ensure the concavity of the spiking loglikelihood in this model [Paninski (2004); Escola and Paninski (2011)], as we will discuss at more length below. In this paper we use

(2)    f(J) = P[n > 0 | n ∼ Poiss(e^J ∆)] = 1 − exp[−e^J ∆]

(Figure 2), where the inclusion of ∆, the time step size, ensures that the firing rate scales properly with respect to the time discretization; see [Escola and Paninski (2011)] for a proof that this f(·) satisfies the required concavity constraints. However, we should note that in our experience the results depend only weakly on the details of f(·) within the class of log-concave models [Li and Duan (1989); Paninski (2004)] (see also Section 3.4 below).

Fig. 2. A plot of the firing rate nonlinearity f(J) used in our simulations. Note that the firing rate saturates at 1/∆, because of our Bernoulli assumption (i.e., the spike count per bin is at most one). Here the binwidth ∆ = (60 Hz)^{−1}. The horizontal gray line indicates 5 Hz, the baseline firing rate for most of the simulations discussed in the Results section.

Second, because the algorithms we develop below assume Markovian dynamics, we model the spike history terms as autoregressive processes driven by the spike train n_j(t):

(3)    h_{ij}(t) = (1 − ∆/τ^h_{ij}) h_{ij}(t − ∆) + n_j(t − ∆) + σ^h_{ij} √∆ ε^h_{ij}(t),

where τ^h_{ij} is a decay time constant, σ^h_{ij} is a standard deviation parameter, √∆ ensures that the statistics of this Markov process have a proper Ornstein–Uhlenbeck limit as ∆ →
0, and throughout this paper, ε denotes an independent standard normal random variable. Note that this model generalizes [via a simple augmentation of the state variable h_{ij}(t)] to allow each neuron pair to have several spike history terms, each with a unique time constant, which when weighted and summed allow us to model a wide variety of possible post-synaptic effects, including bursting, facilitating, and depressing synapses; see [Vogelstein et al. (2009)] for further details. We restrict our attention to the case of a single time constant τ^h_{ij} per synapse here, so the deterministic part of h_{ij}(t) is a simple exponentially-filtered version of the spike train n_j(t). Furthermore, we assume that τ^h_{ij} is the same for all neurons and all synapses, although, in principle, each synapse could be modeled with its unique τ^h_{ij}. We do this both for simplicity and also because we find that the detailed shape of the coupling terms h_{ij}(t) has a limited effect on the inference of the connectivity matrix, as illustrated in Figure 12 below. Thus, we treat τ^h_{ij} and σ^h_{ij} as known synaptic parameters which are the same for each neuron pair (i, j), and denote them as τ_h and σ_h hereafter. We chose values for τ_h and σ_h in our inference based on experimental data [Lefort et al. (2009)]; see Table 1 below. Therefore, our unknown spiking parameters are {w_i, k_i, b_i}_{i≤N}, with w_i = (w_{i1}, . . . , w_{iN}).

The problem of estimating the connectivity parameters w = {w_i}_{i≤N} in this type of GLM, given a fully-observed ensemble of neural spike trains {n_i(t)}_{i≤N}, has recently received a great deal of attention; see the references above for a partial list. In the calcium fluorescent imaging setting, however, we do not directly observe spike trains; {n_i(t)}_{i≤N} must be considered a hidden variable here.
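The generative spiking model of equations (1)–(3) can be simulated directly. The following sketch (with illustrative parameter values of our own choosing: a random connectivity matrix, a roughly 5 Hz baseline, no external stimulus, and assumed spike-history constants, none of which are the paper's fitted values) draws Bernoulli spikes through the nonlinearity of equation (2) while propagating the autoregressive history terms of equation (3):

```python
import numpy as np

# Toy simulation of equations (1)-(3). All parameter values here are
# illustrative assumptions, not values from the paper.
rng = np.random.default_rng(0)

N = 5                           # number of neurons
dt = 1.0 / 60.0                 # time step Delta = (60 Hz)^-1
T = 1200                        # number of time bins
tau_h, sigma_h = 0.05, 1e-4     # spike-history time constant / noise (assumed)

b = np.full(N, np.log(5.0))     # baselines giving roughly 5 Hz at rest
w = 0.3 * rng.standard_normal((N, N))   # hypothetical connectivity matrix

def f(J):
    """Nonlinearity of equation (2): P[n > 0], n ~ Poiss(e^J * dt)."""
    return 1.0 - np.exp(-np.exp(J) * dt)

h = np.zeros((N, N))            # h[i, j]: influence of neuron j on neuron i
n = np.zeros((T, N))
for t in range(1, T):
    # Equation (3): AR(1) spike-history terms driven by n_j(t - dt).
    eps = rng.standard_normal((N, N))
    h = (1.0 - dt / tau_h) * h + n[t - 1][None, :] + sigma_h * np.sqrt(dt) * eps
    # Equation (1): J_i(t) = b_i + sum_j w_ij h_ij(t)  (no stimulus term here).
    J = b + (w * h).sum(axis=1)
    n[t] = rng.binomial(1, f(J))

print("mean firing rate (Hz):", n.mean() / dt)
```

Each spike of neuron j pushes h_{ij} up by one, after which it decays exponentially; the coupling weights w_{ij} then translate that history into the next spiking probabilities.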
Instead, each spike in a given neuron leads to a rapid increase in the intracellular calcium concentration, which then decays slowly due to various cellular buffering and extrusion mechanisms. We in turn make only noisy, indirect, and subsampled observations of this intracellular calcium concentration, via fluorescent imaging techniques [Yuste et al. (2006)]. To perform statistical inference in this setting, [Vogelstein et al. (2009)] proposed a simple conditional first-order hidden Markov model (HMM) for the intracellular calcium concentration C_i(t) in cell i at time t, along with the observed fluorescence, F_i(t):

(4)    C_i(t) = C_i(t − ∆) + (C^b_i − C_i(t − ∆))∆/τ^c_i + A_i n_i(t) + σ^c_i √∆ ε^c_i(t),

(5)    F_i(t) = α_i S(C_i(t)) + β_i + √[(σ^F_i)² + γ_i S(C_i(t))] ε^F_i(t).

This model can be interpreted as a simple driven autoregressive process: under nonspiking conditions, C_i(t) fluctuates around the baseline level of C^b_i, driven by normally-distributed noise ε^c_i(t) with standard deviation σ^c_i √∆. Whenever the neuron fires a spike, n_i(t) = 1, the calcium variable C_i(t) jumps by a fixed amount A_i, and subsequently decays with time constant τ^c_i. The fluorescence signal F_i(t) corresponds to the count of photons collected at the detector per neuron per imaging frame. This photon count may be modeled with normal statistics, with the mean given by a saturating Hill-type function S(C) = C/(C + K_d) [Yasuda et al. (2004)] and the variance scaling with the mean; see [Vogelstein et al. (2009)] for further discussion. Because the parameter K_d effectively acts as a simple scale factor, and is a property of the fluorescent indicator, we assume throughout this work that it is known. Figure 3 shows a couple of examples depicting the relationship between spike trains and observations.
It will be useful to define an effective SNR as

(6)    eSNR = E[F_i(t) − F_i(t − ∆) | n_i(t) = 1] / E[(F_i(t) − F_i(t − ∆))² | n_i(t) = 0]^{1/2},

that is, the size of a spike-driven fluorescence jump divided by a rough measure of the standard deviation of the baseline fluorescence. For concreteness, the effective SNR values in Figure 3 were 9 and 3 in the left and right panels, respectively.

Fig. 3. Two example traces of simulated fluorescence data, at different SNR levels, demonstrating the relationship between spike trains and observed fluorescence in our model. Note that both panels have the same underlying spike train. Simulation parameters: k_i = 0., C^b_i = 1 µM, τ^c_i = 500 msec, A_i = 50 µM, σ^c_i = 0. µM, γ_i = 0. [effective SNR ≈ 9, as defined in equation (6); see also Figure below] in the left panel and γ_i = 0. (eSNR ≈ 3) in the right panel, and σ^F_i = 0, ∆ = (60 Hz)^{−1}.

To summarize, equations (1)–(5) define a coupled HMM: the underlying spike trains {n_i(t)}_{i≤N} and spike history terms {h_{ij}(t)}_{i,j≤N} evolve in a Markovian manner given the stimulus S^ext(t). These spike trains in turn drive the intracellular calcium concentrations {C_i(t)}_{i≤N}, which are themselves Markovian, but evolving at a slower timescale τ^c_i. Finally, we observe only the fluorescence signals {F_i(t)}_{i≤N}, which are related in a simple Markovian fashion to the calcium variables {C_i(t)}_{i≤N}.

2.2. Goal and general strategy.
Our primary goal is to estimate the connectivity matrix, w, given the observed set of calcium fluorescence signals F = {F_i}_{i≤N}, where F_i = {F_i(t)}_{t≤T}. We must also deal with a number of intrinsic parameters, ˜θ_i: the intrinsic spiking parameters {b_i, w_ii}_{i≤N}, the calcium parameters {C^b_i, τ^c_i, A_i, σ^c_i}_{i≤N}, and the observation parameters {α_i, β_i, γ_i, σ^F_i}_{i≤N}. We addressed the problem of estimating these intrinsic parameters in earlier work [Vogelstein et al. (2009)]; thus, our focus here will be on the connectivity matrix w. A Bayesian approach is natural here, since we have a good deal of prior information about neural connectivity; see [Rigat, de Gunst and van Pelt (2006)] for a related discussion. However, a fully-Bayesian approach, in which we numerically integrate over the very high-dimensional parameter space θ = {θ_i}_{i≤N}, where θ_i = {w_i, b_i, C^b_i, τ^c_i, A_i, σ^c_i, α_i, β_i, γ_i, σ^F_i}, is less attractive from a computational point of view. Thus, our compromise is to compute maximum a posteriori (MAP) estimates for the parameters via an expectation–maximization (EM) algorithm, in which the sufficient statistics are computed by a hybrid blockwise Gibbs sampler and sequential Monte Carlo (SMC) method. More specifically, we iterate the steps:

E-step: Evaluate Q(θ, θ^(l)) = E_{P[X | F; θ^(l)]} ln P[F, X | θ] = ∫ P[X | F; θ^(l)] ln P[F, X | θ] dX;
M-step: Solve θ^(l+1) = argmax_θ {Q(θ, θ^(l)) + ln P(θ)},

where X denotes the set of all hidden variables {C_i(t), n_i(t), h_{ij}(t)}_{i,j≤N, t≤T} and P(θ) denotes a (possibly improper) prior on the parameter space θ.
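The Monte Carlo EM alternation can be illustrated end-to-end on a deliberately simple toy problem (unrelated to the calcium model except in structure): hidden variables X_k ∼ N(θ, 1) observed through Gaussian noise, with a sampling-based E-step standing in for the blockwise-Gibbs/SMC sampler and a closed-form M-step maximizing the Monte Carlo Q plus a Gaussian log-prior. All values here are invented for the sketch:

```python
import numpy as np

# Toy Monte Carlo EM: hidden X_k ~ N(theta, 1), observed F_k = X_k + eps,
# eps ~ N(0, s2). Purely an illustration of the E/M alternation.
rng = np.random.default_rng(2)

theta_true, s2 = 3.0, 0.5
K = 2000
X_hidden = theta_true + rng.standard_normal(K)
F = X_hidden + np.sqrt(s2) * rng.standard_normal(K)

prior_mean, prior_var = 0.0, 10.0   # Gaussian prior on theta (assumed)

theta = 0.0
for _ in range(50):
    # E-step: draw M samples from the exact (conjugate Gaussian) posterior
    # P[X | F; theta]; in the paper this sampling is done by SMC/Gibbs.
    post_var = 1.0 / (1.0 + 1.0 / s2)
    post_mean = post_var * (theta + F / s2)
    M = 20
    Xs = post_mean + np.sqrt(post_var) * rng.standard_normal((M, K))
    # M-step: argmax_theta E[ln P(F, X | theta)] + ln P(theta); here the
    # maximizer is a precision-weighted average of the sampled X's.
    xbar = Xs.mean()
    theta = (K * xbar + prior_mean / prior_var) / (K + 1.0 / prior_var)

print("estimated theta:", theta)
```

Each iteration increases the (Monte Carlo estimate of the) log-posterior, and the estimate settles near the true value; the paper's algorithm has the same skeleton with a far more elaborate E-step.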
According to standard EM theory [Dempster, Laird and Rubin (1977); McLachlan and Krishnan (1996)], each iteration of these two steps is guaranteed to increase the log-posterior ln P(θ^(l) | F), and will therefore lead to at least a locally maximum a posteriori estimator. (The intrinsic parameters for neuron i are all its parameters minus the cross-coupling terms, that is, ˜θ_i = θ_i \ {w_{ij}}_{j≠i}. To reduce the notational load, we will ignore the estimation of the stimulus filter k_i below; this term may be estimated with b_i and w_ii using very similar convex optimization methods, as discussed in [Vogelstein et al. (2009)].)

Now, our major challenge is to evaluate the auxiliary function Q(θ, θ^(l)) in the E-step. Our model is a coupled HMM, as discussed in the previous section; therefore, as usual in the HMM setting [Rabiner (1989)], Q may be broken up into a sum of simpler terms:

(7)    Q(θ, θ^(l)) = Σ_{i,t} ∫ ln P[F_i(t) | C_i(t); α_i, β_i, γ_i, σ^F_i] dP[C_i(t) | F; θ^(l)]
               + Σ_{i,t} ∫ ln P[C_i(t) | C_i(t − ∆), n_i(t); C^b_i, τ^c_i, A_i, σ^c_i] dP[C_i(t), C_i(t − ∆) | F; θ^(l)]
               + Σ_{i,t} ∫ ln P[n_i(t) | h_i(t); b_i, w_i] dP[n_i(t), h_i(t) | F; θ^(l)],

where h_i(t) = {h_{ij}(t)}_{j≤N}. Note that each of the three sums here corresponds to a different component of the model described in equations (1)–(5): the first sum involves the fluorescent observation parameters, the second the calcium dynamics, and the third the spiking dynamics.

Thus, we need only compute low-dimensional marginals of the full posterior distribution P[X | F; θ]; specifically, we need the pairwise marginals P[C_i(t) | F; θ], P[C_i(t), C_i(t − ∆) | F; θ], and P[n_i(t), h_i(t) | F; θ]. Details for calculating P[C_i(t), C_i(t − ∆) | F_i; ˜θ_i] and P[C_i(t) | F_i; ˜θ_i] are found in [Vogelstein et al.
(2009)], while calculating the joint marginal for the high-dimensional hidden variable h_i necessitates the development of specialized blockwise Gibbs-SMC sampling methods, as we describe in the subsequent Sections 2.3 and 2.4. Once we have obtained these marginals, the M-step breaks up into a number of independent optimizations that may be computed in parallel and which are therefore relatively straightforward (Section 2.5); see Section 2.6 for a pseudocode summary along with some specific implementation details.

2.3. Initialization of intrinsic parameters via sequential Monte Carlo methods.
We begin by constructing relatively cheap, approximate preliminary estimators for the intrinsic parameters, ˜θ_i. The idea is to initialize our estimator by assuming that each neuron is observed independently. Thus, we want to compute P[C_i(t), C_i(t − ∆) | F_i; ˜θ_i] and P[C_i(t) | F_i; ˜θ_i], and solve the M-step for each ˜θ_i, with the connectivity matrix parameters held fixed. This single-neuron case is much simpler, and has been discussed at length in [Vogelstein et al. (2009)]; therefore, we only provide a brief overview here. The standard forward and backward recursions provide the necessary posterior distributions, in principle [Shumway and Stoffer (2006)]:

(8)    P[X_i(t) | F_i(0 : t)] ∝ P[F_i(t) | X_i(t)] ∫ P[X_i(t) | X_i(t − ∆)] P[X_i(t − ∆) | F_i(0 : t − ∆)] dX_i(t − ∆),

(9)    P[X_i(t), X_i(t − ∆) | F_i] = P[X_i(t) | F_i] × P[X_i(t) | X_i(t − ∆)] P[X_i(t − ∆) | F_i(0 : t − ∆)] / ∫ P[X_i(t) | X_i(t − ∆)] P[X_i(t − ∆) | F_i(0 : t − ∆)] dX_i(t − ∆),

where F_i(s : t) denotes the time series F_i from time points s to t, and we have dropped the conditioning on the parameters for brevity's sake. Equation (8) describes the forward (filter) pass of the recursion, and equation (9) describes the backward (smoother) pass, providing both P[X_i(t), X_i(t − ∆) | F_i] and P[X_i(t) | F_i] [obtained by marginalizing over X_i(t − ∆)].

Because these integrals cannot be analytically evaluated for our model, we approximate them using a SMC ("marginal particle filtering") method [Doucet, Godsill and Andrieu (2000); Doucet, de Freitas and Gordon (2001); Godsill, Doucet and West (2004)]. More specifically, we replace the forward distribution with a particle approximation:

(10)    P[X_i(t) | F_i(0 : t)] ≈ Σ_{m=1}^{M} p_f^{(m)}(t) δ[X_i(t) − X_i^{(m)}(t)],

where m = 1, . . . , M indexes the M particles in the set (M was typically set to about 50 in our experiments), p_f^{(m)}(t) corresponds to the relative "forward" probability of X_i(t) = X_i^{(m)}(t), and δ[·] indicates a Dirac mass. Instead of using the analytic forward recursion, equation (8), at each time step, we update the particle weights using the particle forward recursion

(11)    p_f^{(m)}(t) = P[F_i(t) | X_i^{(m)}(t)] P[X_i^{(m)}(t) | X_i^{(m)}(t − ∆)] p_f^{(m)}(t − ∆) / q[X_i^{(m)}(t)],

where q[X_i^{(m)}(t)] is the proposal density from which we sample the particle positions X_i^{(m)}(t). In this work we use the "one-step-ahead" sampler [Doucet, Godsill and Andrieu (2000); Vogelstein et al. (2009)], that is, q[X_i^{(m)}(t)] = P[X_i^{(m)}(t) | X_i^{(m)}(t − ∆), F_i(t)]. After sampling and computing the weights, we use stratified resampling [Douc, Cappe and Moulines (2005)] to ensure the particles accurately approximate the desired distribution. Once we complete the forward recursion from t = 0, . . . , T, we begin the backward pass from t = T, . . . ,
0, using

(12)    r^{(m,m′)}(t, t − ∆) = p_b^{(m)}(t) P[X_i^{(m)}(t) | X_i^{(m′)}(t − ∆)] p_f^{(m′)}(t − ∆) / Σ_{m″} P[X_i^{(m)}(t) | X_i^{(m″)}(t − ∆)] p_f^{(m″)}(t − ∆),

(13)    p_b^{(m′)}(t − ∆) = Σ_{m=1}^{M} r^{(m,m′)}(t, t − ∆),

to obtain the approximation

(14)    P[X_i(t), X_i(t − ∆) | F_i] ≈ Σ_{m,m′} r^{(m,m′)}(t, t − ∆) δ[X_i(t) − X_i^{(m)}(t)] δ[X_i(t − ∆) − X_i^{(m′)}(t − ∆)];

for more details, see [Vogelstein et al. (2009)]. Thus, equations (10)–(14) may be used to compute the sufficient statistics for estimating the intrinsic parameters ˜θ_i for each neuron.

As discussed following equation (7), the M-step decouples into three independent subproblems. The first term depends only on {α_i, β_i, γ_i, σ^F_i}; since P[F_i(t) | S(C_i(t)); ˜θ_i] is Gaussian, we can estimate these parameters by solving a weighted regression problem (specifically, we use a coordinate-optimization approach: we solve a quadratic problem for {α_i, β_i} while holding {γ_i, σ^F_i} fixed, then estimate {γ_i, σ^F_i} by the usual residual error formulas while holding {α_i, β_i} fixed). Similarly, the second term requires us to optimize over {τ^c_i, A_i, C^b_i}, and then we use the residuals to estimate σ^c_i. Note that all the parameters mentioned so far are constrained to be non-negative, but may be solved for efficiently using standard quadratic program solvers if we use the simple reparameterization τ^c_i → 1 − ∆/τ^c_i.
Finally, the last term may be expanded as

(15)    E[ln P[n_i(t) | h_i(t); b_i, w_i]] = E[n_i(t) ln f(J_i(t)) + (1 − n_i(t)) ln(1 − f(J_i(t)))],

where the expectation is taken over P[n_i(t), h_i(t) | F; θ]; since J_i(t) is a linear function of {b_i, w_i}, and the right-hand side of equation (15) is concave in J_i(t), we see that the third term in equation (7) is a sum of terms which are concave in {b_i, w_i} (and therefore also concave in the linear subspace {b_i, w_ii} with {w_ij}_{j≠i} held fixed) and may thus be maximized efficiently using any convex optimization method, for example, Newton–Raphson or conjugate gradient ascent.

Our procedure therefore is to initialize the parameters for each neuron using some default values that we have found to be effective in practice in analyzing real data, and then iteratively (i) estimate the marginal posteriors via the SMC recursions (10)–(14) (E-step), and (ii) maximize over the intrinsic parameters ˜θ_i (M-step), using the separable convex optimization approach described above. We iterate these two steps until the change in ˜θ_i does not exceed some minimum threshold. We then use the marginal posteriors from the last iteration to seed the blockwise Gibbs sampling procedure described below for approximating P[n_i, h_i | F; θ_i].
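The forward particle filter and backward reweighting of equations (10)–(14) can be sketched on a one-dimensional linear-Gaussian toy state-space model (chosen so the result is easy to sanity-check). For brevity this sketch uses the bootstrap proposal q = transition prior rather than the one-step-ahead proposal used in the text, and all model constants are assumptions:

```python
import numpy as np

# Minimal particle forward-filter / backward-smoother in the spirit of
# equations (10)-(14), on a 1-D linear-Gaussian toy model.
rng = np.random.default_rng(3)

a, q_sd, r_sd = 0.9, 0.3, 0.5     # state / observation model (assumed)
T, M = 100, 200                   # time steps and particles

x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + q_sd * rng.standard_normal()
y = x + r_sd * rng.standard_normal(T)

def norm_pdf(z, mu, sd):
    return np.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def stratified_resample(wts):
    u = (np.arange(M) + rng.random(M)) / M
    return np.minimum(np.searchsorted(np.cumsum(wts), u), M - 1)

# Forward pass: particle approximation of P[x_t | y_{0:t}], equation (10).
X = np.zeros((T, M))
pf = np.full((T, M), 1.0 / M)
X[0] = rng.standard_normal(M)
for t in range(1, T):
    idx = stratified_resample(pf[t - 1])          # resample, then propagate
    X[t] = a * X[t - 1][idx] + q_sd * rng.standard_normal(M)
    w = norm_pdf(y[t], X[t], r_sd)                # eq. (11) with q = prior
    pf[t] = w / w.sum()

# Backward pass, equations (12)-(13): reweight particles given all of y.
pb = pf[T - 1].copy()
xs_mean = np.zeros(T)
xs_mean[T - 1] = (pb * X[T - 1]).sum()
for t in range(T - 1, 0, -1):
    trans = norm_pdf(X[t][:, None], a * X[t - 1][None, :], q_sd)  # (m, m')
    r_mat = pb[:, None] * trans * pf[t - 1][None, :]
    r_mat /= (trans * pf[t - 1][None, :]).sum(axis=1, keepdims=True)
    pb = r_mat.sum(axis=0)                        # eq. (13)
    xs_mean[t - 1] = (pb * X[t - 1]).sum()

print("smoothed RMSE:", np.sqrt(np.mean((xs_mean - x) ** 2)))
```

The matrix r_mat is exactly the pairwise weight r^{(m,m′)} of equation (12), so its row-marginals give the smoothed weights of equation (13) and its entries give the pairwise posterior of equation (14).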
2.4. Estimating joint posteriors over weakly coupled neurons.
Now we turn to the key problem: constructing an estimate of the joint marginals {P[n_i(t), h_i(t) | F; θ]}_{i≤N, t≤T}, which are the sufficient statistics for estimating the connectivity matrix w [recall equation (7)]. The SMC method described in the preceding section only provides the marginal distribution over a single neuron's hidden variables; this method may in principle be extended to obtain the desired full posterior P[X(t), X(t − ∆) | F; θ], but SMC is fundamentally a sequential importance sampling method, and therefore scales poorly as the dimensionality of the hidden state X(t) increases [Bickel, Li and Bengtsson (2008)]. Thus, we need a different approach.

One very simple idea is to use a Gibbs sampler: sample sequentially from

(16)    X_i(t) ∼ P[X_i(t) | X_{\i}, X_i(0), . . . , X_i(t − ∆), X_i(t + ∆), . . . , X_i(T), F; θ],

looping over all cells i and all time bins t. Unfortunately, this approach is likely to mix poorly, due to the strong temporal dependence between X_i(t) and X_i(t + ∆). Instead, we propose a blockwise Gibbs strategy, sampling one spike train as a block:

(17)    X_i ∼ P[X_i | X_{\i}, F; θ].

If we can draw these blockwise samples X_i = X_i(s : t) efficiently for a large subset of t − s adjacent time-bins simultaneously, then we would expect the resulting Markov chain to mix much more quickly than the single-element Gibbs chain. This follows due to the weak dependence between X_i and X_j when i ≠ j, and the fact that Gibbs is most efficient for weakly-dependent variables [Robert and Casella (2005)]. So, how can we efficiently sample from P[X_i | X_{\i}, F; θ]? One attractive approach is to try to re-purpose the SMC method described above, which is quite effective for drawing approximate samples from P[X_i | X_{\i}, F_i; θ] for one neuron i at a time.
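The conditional-resampling pattern behind equation (17) can be seen in a minimal stand-alone Gibbs sampler (a toy, not the sampler developed here): each coordinate of a correlated bivariate Gaussian is redrawn exactly from its conditional given the rest, and mixing is fast precisely when the coupling between the resampled blocks is weak, degrading as |ρ| → 1:

```python
import numpy as np

# Toy Gibbs sampler on a correlated bivariate Gaussian: alternately draw
# each coordinate from its exact conditional given the other, the same
# conditional-resampling pattern as the blockwise update of equation (17).
rng = np.random.default_rng(4)
rho = 0.3                      # weak coupling -> fast mixing
n_iter, burn = 20000, 1000

samples = np.zeros((n_iter, 2))
u, v = 0.0, 0.0
for k in range(n_iter):
    # u | v ~ N(rho * v, 1 - rho^2), and symmetrically for v | u.
    u = rho * v + np.sqrt(1 - rho**2) * rng.standard_normal()
    v = rho * u + np.sqrt(1 - rho**2) * rng.standard_normal()
    samples[k] = (u, v)

emp_rho = np.corrcoef(samples[burn:].T)[0, 1]
print("empirical correlation:", emp_rho)
```

In the paper's setting each "coordinate" is an entire spike train X_i, redrawn given all the others; the weak cross-neuron coupling plays the role of the small ρ here.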
Recall that sampling from an HMM is in principle easy by the "propagate forward, sample backward" method: we first compute the forward probabilities P[X_i(t) | X_{\i}(0 : t), F_i(0 : t); θ] recursively for timesteps t = 0 up to T, then sample backward from P[X_i(t) | X_{\i}(0 : T), F_i(0 : T), X_i(t + ∆); θ]. This approach is powerful because each sample requires just linear time to compute [i.e., O(T/∆) time, where
T/∆ is the number of desired time steps]. Unfortunately, in this case we can only compute the forward probabilities approximately [via equations (10)–(11)], and therefore this attractive forward-backward approach only provides approximate samples from P[X_i | X_{\i}, F; θ], not the exact samples required for the validity of the Gibbs method.

Of course, in principle, we should be able to use the Metropolis–Hastings (M–H) algorithm to correct these approximate samples. The problem is that the M–H acceptance ratio in this setting involves a high-dimensional integral over the set of paths that the particle filter might possibly trace out, and is therefore difficult to compute directly. [Andrieu, Doucet and Holenstein (2007)] discuss this problem at more length, along with some proposed solutions. A slightly simpler approach was introduced by [Neal, Beal and Roweis (2003)]. Their idea is to exploit the O(T/∆) forward-backward sampling method by embedding a discrete Markov chain within the continuous state space X_t on which X_i(t) is defined; the state space of this discrete embedded chain is sampled randomly according to some distribution ρ_t with support on X_t. It turns out that an appropriate Markov chain (incorporating the original state space model transition and observation probabilities, along with the auxiliary sampling distributions ρ_t) may be constructed quite tractably, guaranteeing that the samples produced by this algorithm have the desired equilibrium density. See [Neal, Beal and Roweis (2003)] for details.

We can apply this embedded-chain method directly here to sample from P[X_i | X_{\i}, F; θ]. The one remaining question is how to choose the auxiliary densities ρ_t.
We would like to choose these densities to be close to the desired marginal densities P[X_i(t) | X_{\i}, F; θ], and conveniently, we have already computed a good (discrete) approximation to these densities using the SMC methods described in the last section. The algorithm described in [Neal, Beal and Roweis (2003)] requires the densities ρ_t to be continuous, so we simply convolve our discrete SMC-based approximation [specifically, the X_i(t)-marginal of equation (14)] with an appropriate normal density to arrive at a very tractable mixture-of-Gaussians representation for ρ_t.

Thus, to summarize, our procedure for approximating the desired joint state distributions P[n_i(t), h_i(t) | F; θ] has a Metropolis-within-blockwise-Gibbs flavor, where the internal Metropolis step is replaced by the O(T/∆) embedded-chain method introduced by [Neal, Beal and Roweis (2003)], and the auxiliary densities ρ_t necessary for implementing the embedded-chain sampler are obtained using the SMC methods from [Vogelstein et al. (2009)].

2.4.1. A factorized approximation of the joint posteriors.
If the SNR in the calcium imaging is sufficiently high, then, by definition, the observed fluorescence data F_i will provide enough information to determine the underlying hidden variables X_i. Thus, in this case the joint posterior approximately factorizes into a product of marginals for each neuron i:

P[X | F; θ] ≈ ∏_{i≤N} P[X_i | F; θ̃_i]. (18)

We can take advantage of this because we have already estimated all the marginals on the right-hand side using the approximate SMC methods in Section 2.3. This factorized approximation entails a significant gain in efficiency for two reasons: first, it obviates the need to generate joint samples via the expensive blockwise-Gibbs approach described above; and second, it allows us to easily parallelize the SMC step, inferring the marginals P[X_i(t) | F_i; θ̃_i] and estimating the parameters θ_i for each neuron on a separate processor. We will discuss the empirical accuracy of this approximation in Section 3.

2.5. Estimating the connectivity matrix.
Computing the M-step for the connectivity matrix, w, is an optimization problem with on the order of N^2 variables. The auxiliary function in equation (7) is concave in w, and decomposes into N separable terms that may be optimized independently using standard ascent methods. To improve our estimates, we will incorporate two sources of strong a priori information via our prior P(w). First, previous anatomical studies have established that connectivity in many neuroanatomical substrates is “sparse,” that is, most neurons form synapses with only a fraction of their neighbors [Buhl, Halasy and Somogyi (1994); Thompson, Girdlestone and West (1988); Reyes et al. (1998); Feldmeyer et al. (1999); Gupta, Wang and Markram (2000); Feldmeyer and Sakmann (2000); Petersen and Sakmann (2000); Binzegger, Douglas and Martin (2004); Song et al. (2005); Mishchenko et al. (2009)], implying that many elements of the connectivity matrix w are zero; see also [Paninski (2004); Rigat, de Gunst and van Pelt (2006); Pillow et al. (2008); Stevenson et al. (2008)] for further discussion. Second, “Dale’s law” states that each of a neuron’s postsynaptic connections in the adult cortex (and many other brain areas) must all be of the same sign (either excitatory or inhibitory). Both of these priors are easy to incorporate in the M-step optimization, as we discuss below.

2.5.1. Imposing a sparse prior on the connectivity.
It is well known that imposing sparseness via an L1-regularizer can dramatically reduce the amount of data necessary to accurately reconstruct sparse high-dimensional parameters [Tibshirani (1996); Tipping (2001); Donoho and Elad (2003); Ng (2004); Candes and Wakin (2008); Mishchenko (2009)]. We incorporate a prior of the form ln p(w) = const − λ Σ_{i,j} |w_ij|, and additionally enforce the constraints |w_ij| < L, for a suitable constant L (since both excitatory and inhibitory cortical connections are known to be bounded in size). Since the penalty ln p(w) is concave, and the constraints |w_ij| < L are convex, we may solve the resulting optimization problem in the M-step using standard convex optimization methods [Boyd and Vandenberghe (2004)]. In addition, the problem retains its separable structure: the full optimization may be broken up into N smaller problems that may be solved independently.

2.5.2. Imposing Dale’s law on the connectivity.
Enforcing Dale’s law requires us to solve a nonconvex, nonseparable problem: we need to optimize the concave function Q(θ, θ^(l)) + ln P(θ) under the nonconvex, nonseparable constraint that all of the elements in any column of the matrix w are of the same sign (either nonpositive or nonnegative). It is difficult to solve this nonconvex problem exactly, but we have found that simple greedy methods are quite efficient in finding good approximate solutions.

We begin with our original sparse solution, obtained as discussed in the previous subsection without enforcing Dale’s law. Then we assign each neuron as either excitatory or inhibitory, based on the weights we have inferred in the previous step: that is, neurons i whose inferred postsynaptic connections w_ij are largely positive are tentatively labeled excitatory, and neurons with largely inhibitory inferred postsynaptic connections are labeled inhibitory. Neurons which are highly ambiguous may be left unassigned in the early iterations, to avoid making mistakes from which it might be difficult to recover. Given the assignments a_i (a_i = 1 for putative excitatory cells, a_i = −1 for putative inhibitory cells), the sign constraints become linear, a_i w_ij ≥ 0, and together with |w_ij| < L the M-step remains a concave problem over a convex set, which may be solved as before.

Pseudocode summarizing our approach is given in Algorithm 1. As discussed in Section 2.3, the intrinsic parameters θ̃_i may be initialized effectively using the methods described in [Vogelstein et al. (2009)]; then the full parameter θ is estimated via EM, where we use the embedded-chain-within-blockwise-Gibbs approach discussed in Section 2.4 (or the cheaper factorized approximation described in Section 2.4.1) to obtain the sufficient statistics in the E-step, and the separable convex optimization methods discussed in Section 2.5 for the M-step.

As emphasized above, the parallel nature of these EM steps is essential for making these computations tractable. We performed the bulk of our analysis on a 256-processor cluster of Intel Xeon L5430 based computers (2.66 GHz).
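The two M-step refinements described above (the L1-penalized separable subproblem of Section 2.5.1 and the greedy sign assignment of Section 2.5.2) can be sketched as follows. This is illustrative Python, not the authors' implementation: `neg_loglik` is a stand-in for the smooth concave expected log-likelihood term (negated), `margin` is a hypothetical ambiguity threshold, and the final projection is a crude stand-in for re-solving the sign-constrained M-step.

```python
import numpy as np
from scipy.optimize import minimize

def m_step_row(neg_loglik, n_pre, lam, L):
    """One separable L1-penalized M-step subproblem: minimize a smooth
    convex negative expected log-likelihood for one neuron's incoming
    weights plus lam * ||w||_1, subject to |w_ij| <= L.  The standard
    split w = w_pos - w_neg makes the L1 term linear in the variables."""
    def objective(z):
        f, g = neg_loglik(z[:n_pre] - z[n_pre:])
        f += lam * z.sum()  # L1 penalty, linear in the split variables
        return f, np.concatenate([g + lam, -g + lam])
    res = minimize(objective, np.zeros(2 * n_pre), jac=True,
                   method="L-BFGS-B", bounds=[(0.0, L)] * (2 * n_pre))
    return res.x[:n_pre] - res.x[n_pre:]

def assign_dale_signs(w_hat, margin=0.0):
    """Greedy sign assignment for Dale's law: label each presynaptic
    neuron j excitatory (+1) or inhibitory (-1) from the sign of column
    j of the inferred weights, deferring ambiguous neurons (0)."""
    col_sums = w_hat.sum(axis=0)
    scale = np.abs(w_hat).sum(axis=0) + 1e-12
    a = np.sign(col_sums)
    a[np.abs(col_sums) / scale < margin] = 0.0  # too ambiguous: defer
    return a

def project_to_dale(w_hat, a):
    """Crude stand-in for the constrained re-optimization: zero out
    entries whose sign disagrees with the assigned column sign."""
    w = w_hat.copy()
    for j, sign in enumerate(a):
        if sign != 0:
            w[:, j][np.sign(w[:, j]) == -sign] = 0.0
    return w
```

With a quadratic surrogate for `neg_loglik`, `m_step_row` reproduces the familiar soft-thresholding behavior of L1 penalization, which is the mechanism behind the "soft-threshold bias" noted in Section 3.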
For 10 minutes of simulated fluorescence data, imaged at 30 Hz, calculations using the factorized approximation typically took 10–20 minutes per neuron (divided by the number of available processing nodes on the cluster), with time split approximately equally between (i) estimating the intrinsic parameters θ̃_i, (ii) approximating the posteriors using the independent SMC method, and (iii) estimating the connectivity matrix, w. The hybrid embedded-chain-within-blockwise-Gibbs sampler was substantially slower, up to an hour per neuron, with the Gibbs sampler dominating the computation time, because we thinned the chain by a factor of five, following preliminary quantification of the autocorrelation timescale of the Gibbs chain (data not shown).

Algorithm 1 Pseudocode for estimating connectivity from calcium imaging data using EM; η_1 and η_2 are user-defined convergence tolerance parameters.

while |w^(l) − w^(l−1)| > η_1 do
  for all i = 1 . . . N do
    while |θ̃_i^(l) − θ̃_i^(l−1)| > η_2 do
      Approximate P[X_i(t) | F_i; θ̃_i] using SMC (Section 2.3)
      Perform the M-step for the intrinsic parameters θ̃_i (Section 2.3)
    end while
  end for
  for all i = 1 . . . N do
    Approximate P[n_i(t), h_i(t) | F; θ_i] using either the blockwise-Gibbs method or the factorized approximation (Section 2.4)
  end for
  for all i = 1 . . . N do
    Perform the M-step for {b_i, w_i}_{i≤N} using separable convex optimization methods (Section 2.5)
  end for
end while

2.7. Simulating a neural population. To test the described method for inferring connectivity from calcium imaging data, we simulated networks of spontaneously firing, randomly connected neurons according to our model, equations (1)–(5), and also using other network models (see Section 3.4). Although simulations ran at 1 msec time discretization, the imaging rate was assumed to be much slower: 5–200 Hz (cf.
Figure 8 below).

Model parameters were chosen based on experimental data available in the literature for cortical neural networks [Sayer, Friedlande and Redman (1990); Braitenberg and Schuz (1998); Gomez-Urquijo et al. (2000); Lefort et al. (2009)]. More specifically, the network consisted of 80% excitatory and 20% inhibitory neurons [Braitenberg and Schuz (1998); Gomez-Urquijo et al. (2000)], each respecting Dale’s law (as discussed in Section 2.5 above). Neurons were randomly connected to each other in a spatially homogeneous manner with a fixed connection probability, with inhibitory PSP peak amplitudes drawn larger, on average, than those of the excitatory connections (Table 1). PSP shapes were modeled as an alpha function [Koch (1999)]: roughly, the difference of two exponentials, corresponding to a sharp rise and relatively slow decay [Sayer, Friedlande and Redman (1990)]. We neglected conduction delays. Note that the connection weights, w_ij in equation (1), are dimensionless quantities representing the change in the spiking probability of neuron i given a spike in neuron j, whereas PSP peak amplitude describes the physiologically measured change in the membrane voltage of a neuron due to synaptic currents triggered by a spike in neuron j. To relate the two, note that in order to trigger an immediate spike in a neuron that typically has its membrane voltage V_b mV below the spiking threshold, roughly n_E = V_b/V_E simultaneous excitatory PSPs with peak amplitude V_E would be necessary. Therefore, the change in the spiking probability of a neuron due to an excitatory synaptic current V_E can be approximately defined as

δP_E = V_E/V_b (20)

(so that δP_E n_E ≈ 1). We took V_b ≈ 15 mV here, while values for the PSP amplitude V_E were chosen as described above.
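This conversion can be made concrete in a few lines. The sketch below solves for the GLM weight by equating equation (20) with the corresponding GLM change in spiking probability, [f(b_i + w_ij) − f(b_i)]τ_h; the exponential nonlinearity f is an assumption made only for this illustration (the paper leaves f generic), and the numerical values in the example are hypothetical.

```python
import numpy as np

def glm_weight_from_psp(V_E, V_b, b, tau_h, f=np.exp, f_inv=np.log):
    """Convert a physiological PSP peak amplitude V_E into a GLM
    coupling weight w, assuming an invertible nonlinearity f (here
    exponential, an assumption for this sketch), by solving
        V_E / V_b = [f(b + w) - f(b)] * tau_h   for w."""
    delta_P = V_E / V_b                      # equation (20)
    return f_inv(f(b) + delta_P / tau_h) - b

# hypothetical example: V_b = 15 mV below threshold, excitatory PSP of
# 0.5 mV, baseline rate f(b) = 5 Hz, PSP time-scale tau_h = 10 msec
w = glm_weight_from_psp(V_E=0.5, V_b=15.0, b=np.log(5.0), tau_h=0.01)
```

The inverse map f_inv guarantees the round trip: plugging the returned w back into the GLM expression recovers exactly δP_E = V_E/V_b.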
Similarly, according to equation (1), the same change in the spiking probability of a neuron i following the spike of a neuron j in the GLM is roughly

δP_E = [f(b_i + w_ij) − f(b_i)] τ_h, (21)

where, recall, τ_h is the typical PSP time-scale, that is, the time over which a spike in neuron j significantly affects the firing probability of neuron i. Equating these two expressions gives us a simple method for converting the physiological parameters V_E and V_b into suitable GLM parameters w_ij.

Finally, parameters for the internal calcium dynamics and fluorescence observations were chosen according to our experience with several cells analyzed using the algorithm of [Vogelstein et al. (2009)], and conformed to previously published results [Yuste et al. (2006); Helmchen, Imoto and Sakmann (1996); Brenowitz and Regehr (2007)]. Table 1 summarizes the details for each of the parameters in our model.

Table 1
Table of simulation parameters

Variable | Value/distribution | Unit
Total neurons | 10–500 |
Excitatory PSP peak height | ∼ E(0.5) | mV
Inhibitory PSP peak height | ∼ −E(2.3) | mV
Excitatory PSP rise time | 1 | msec
Inhibitory PSP rise time | 1 | msec
Excitatory PSP decay time | ∼ N.(10, .5) | msec
Inhibitory PSP decay time | ∼ N.(20, 5) | msec
Refractory time, w_ii | ∼ N.(10, .5) | msec
Calcium std., σ_c | ∼ N.(28, ) | µM
Calcium jump after spike, A_c | ∼ N.(80, ) | µM
Calcium baseline, C_b | ∼ N.(24, ) | µM
Calcium decay time, τ_c | ∼ N.(200, 60) | msec
Dissociation constant, K_d | | µM
Fluorescence scale, α, β, γ | | n/a
Signal-independent noise, σ_F | | n/a

E(λ) indicates an exponential distribution with mean λ, and N_p(µ, σ) indicates a normal distribution with mean µ and variance σ, truncated at lower bound pµ. Units (when applicable) are given with respect to mean values (i.e., units are squared for the variance).

3. Results. In this section we study the performance of our proposed network estimation methods, using the simulated data described in Section 2.7 above. Specifically, we estimated the connectivity matrix using both the embedded-chain-within-blockwise-Gibbs approach and the simpler factorized approximation. Figure 4 summarizes one typical experiment: the EM algorithm using the factorized approximation estimated the connectivity matrix about as accurately as the full embedded-chain-within-blockwise-Gibbs approach (r = 0.47 vs. r = 0. ).

3.1. Impact of coarse time discretization of calcium imaging data and scale factor of inferred connection weights. A notable feature of the results illustrated in the left panel of Figure 4 is that our estimator is biased downward by a roughly constant scale factor: our estimates ŵ_ij are approximately linearly related to the true values of w_ij in the simulated network, but the slope of this linear relationship is less than one. At first blush, this bias does not seem like a major problem: as we discussed in Section 2.7, even in the noiseless case we should at best expect our estimated coupling weights ŵ_ij to correspond to some monotonically increasing function of the true neural connectivities, as measured by biophysical quantities such as the peak PSP

Fig. 4. Quality of the connectivity matrix estimated from simulated calcium imaging data. Inferred connection weights ŵ_ij are shown in a scatter plot versus the real connection weights w_ij, with inference performed using the factorized approximation, the exact embedded-chain-within-blockwise-Gibbs approach, and the true spike trains down-sampled to the frame rate of the calcium imaging. A network of N = 25 neurons was used, firing at ≈ Hz, imaged for T = 10 min at Hz, with intermediate eSNR ≈ [see equation (6) and Figure below].
The squared correlation coefficient between the connection weights calculated using the factorized approximation and the true connection weights was r = 0. , compared with the embedded-chain-within-blockwise-Gibbs method’s r = 0. . For connection weights calculated directly from the true spike train down-sampled to the calcium imaging frame rate, we obtained r = 0. . (For comparison, r = 0. for the connectivity matrix calculated using the full spike trains with 1 ms precision; data not shown.) Here and in the following figures the gray dashed line indicates unity, y = x. The inferred connectivity in the left panel shows a clear scale bias, which can be corrected by dividing by the scale correction factor calculated in Section 3.1 below (right panel). The vertical lines apparent at zero in both subplots are due to the fact that the connection probability in the true network was significantly less than one: that is, many of the true weights w_ij are exactly zero.

amplitude. Nonetheless, we would like to understand the source of this bias more quantitatively; in this section we discuss this issue in more depth and derive a simple method for correcting the bias.

The bias is largely due to the fact that we suffer a loss of temporal resolution when we attempt to infer spike times from slowly-sampled fluorescence data. As discussed in [Vogelstein et al. (2009)], we can recover some of this temporal information by using a finer time resolution for our recovered spike trains than ∆, the time resolution of the observed fluorescence signal. However, when we attempted to infer w directly from spike trains sampled from the posterior P[X | F] at higher-than-∆ resolution, we found that the inferred connectivity matrix was strongly biased toward the symmetrized matrix (w + w^T)/2: given strongly correlated fluorescence signals F_i(t) and F_j(t) (at the reduced time resolution ∆), the EM algorithm would typically infer an excitatory bidirectional connection: that is, both ŵ_ij and ŵ_ji
would be large, even if only a unidirectional connection existed between neurons i and j in the true network. While we expect, by standard arguments, that the Monte Carlo EM estimator constructed here should be consistent (i.e., we should recover the correct w in the limit of large data length T and many Monte Carlo samples), we found that this bias persisted given experimentally-reasonable lengths of data and computation time.

Therefore, to circumvent this problem, we simply used the original imaging time resolution ∆ for the inferred spike trains: note that, due to the definition of the spike history terms h_ij in equation (3), a spike in neuron j at time t will only affect neuron i’s firing rate at time t + ∆ and greater. This successfully counteracted the symmetrization problem (and also sped up the calculations substantially), but resulted in the scale bias exhibited in Figure 4, since any spikes that fall into the same time bin are treated as coincidental: only spikes that precede spikes in a neighboring neuron by at least one time step will directly affect the estimates of w_ij, and therefore grouping asynchronous spikes within a single time bin ∆ results in a loss of information.

To estimate the magnitude of this time-discretization bias more quantitatively, we consider a significantly simplified case of two neurons coupled with a small weight w, firing with baseline firing rate r = f(b). In this case an approximate sufficient statistic for estimating w may be defined as the expected elevation in the spike rate of neuron one on an interval of length T, following a spike in neuron two:

SS = E[ ∫_{t′}^{t′+T} n_1(t) dt | n_2(t′) = 1, n_2(t) = 0 ∀ t ∈ (t′, t′ + T] ] (22)
   ≈ rT + f′(b) w τ_h,

where f′(b) represents the slope of the nonlinear function f(·) at the baseline level b. This approximation leads to a conceptually simple method-of-moments estimator, ŵ = (SS − rT)/f′(b)τ_h.
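The method-of-moments inversion is a one-liner; a minimal sketch, with hypothetical values in the usage example:

```python
def mom_weight_estimate(SS, r, T, fprime_b, tau_h):
    """Method-of-moments estimator of equation (23): invert the linear
    approximation SS ≈ r*T + f'(b) * w * tau_h for the coupling w."""
    return (SS - r * T) / (fprime_b * tau_h)
```

Subtracting the baseline contribution rT isolates the coupling-induced elevation in spike count, and dividing by f′(b)τ_h converts that elevation back into weight units.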
(23)

Now, if the spike trains are down-sampled into time-bins of size ∆, we must estimate the statistic SS with a discrete sum instead:

SS_ds = E[ Σ_{t=t′+∆}^{t′+∆+T} n_ds(t) | n_ds(t′) = 1, n_ds(t) = 0 ∀ t ∈ (t′, t′ + T] ]
      ≈ rT + f′(b) ∫_0^∆ (dt′/∆) ∫_∆^{∆+T} w exp(−(t − t′)/τ_h) dt (24)
      ≈ rT + f′(b) w τ_h (1 − exp(−∆/τ_h))/(∆/τ_h).

Here n_ds(t) are the down-sampled spikes, that is, the spikes defined on the grid t = 0, ∆, 2∆, . . . . In the second equality we made the approximation that the true position of the spike of the second neuron, n_ds(t′), may be uniformly distributed in the first time-bin [0, ∆], and the discrete sum over t runs from the second time-bin up to [T, T + ∆], that is, over all spikes of the first neuron that occurred in any of the strictly subsequent time-bins up to T + ∆. Forming a method-of-moments estimator as in equation (23) leads to a biased estimate,

ŵ_ds ≈ [(1 − exp(−∆/τ_h))/(∆/τ_h)] ŵ, (25)

and somewhat surprisingly (given the rather crude nature of these approximations), this corresponds quite well with the scale bias we observe in practice.

Fig. 5. The low frame rate of calcium imaging explains the scale error observed in the inferred connectivity weights shown in Figure 4. A correction scale factor may be calculated analytically (thick line) as discussed in the main text [equation (25)]. The scale error observed empirically (thin line) matches well with this theoretical estimate. In the latter case, the scale error was calculated from the fits obtained directly from the true spike trains, down-sampled to different ∆, for a network of N = 25 neurons firing at ≈ Hz, with T = 10 min. The error-bars indicate 95% confidence intervals for the scale error at each ∆.
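Equation (25) gives a closed-form correction factor, so undoing the bias is straightforward; a short sketch:

```python
import numpy as np

def scale_correction(dt, tau_h):
    """Scale bias factor of equation (25): binning spikes at resolution
    dt shrinks the inferred weights by (1 - exp(-dt/tau_h))/(dt/tau_h),
    which tends to 1 as dt -> 0 (no bias at fine resolution)."""
    r = dt / tau_h
    return (1.0 - np.exp(-r)) / r

def correct_weights(w_hat_ds, dt, tau_h):
    # undo the down-sampling bias by dividing by the analytic factor
    return w_hat_ds / scale_correction(dt, tau_h)
```

Note that the correction is a single global rescaling: it fixes the slope of the scatter in Figure 4 but cannot, of course, recover information lost by binning.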
In Figure 5 we plot the scale bias from equation (25) versus that empirically deduced from our simulations for different values of ∆; we see that equation (25) describes the observed scale bias fairly well. Thus, we can divide by this analytically-derived factor to effectively correct the bias of our estimates, as shown in the right panel of Figure 4.

3.2. Impact of prior information on the inference. Next we investigated the importance of incorporating prior information in our estimates. We found that imposing a sparse prior (as described in Section 2.5) significantly

Fig. 6. Imposing a sparse prior on connectivity improves our estimates. Scatter plots indicate the connection weights w_ij reconstructed using no prior (r = 0. ; left panel) and a sparse prior (r = 0. ; right panel) vs. the true connection weights in each case. These plots were based on a simulation of N = 50 neurons firing at ≈ Hz, imaged for T = 10 min at Hz, with eSNR ≈ . Clearly, the sparse prior reduces the relative error, as indicated by comparing the relative distance between the data points (black dots) to the best linear fit (gray dash-dotted line), at the expense of some additional soft-threshold bias, as is usual in the L1 setting.

improved our results. For example, Figure 6 illustrates a case in which our obtained r increased from 0.64 (with no L1 penalization in the M-step) to 0.85 (with penalization; the penalty λ was chosen approximately as the inverse mean absolute value of w_ij, which is known here because we prepared the network simulations, but is available in practice given the previous physiological measurements discussed in Section 2.7).
See also Figure 10 below. Furthermore, the weights estimated using the sparse prior more reliably provide the sign (i.e., excitatory or inhibitory) of each presynaptic neuron in the network (Figure 7).

Incorporation of Dale’s law, on the other hand, only leads to an improvement of ≈ in r in the absence of an L1 penalty, and no significant improvement at all in the presence of an L1 penalty (data not shown). Thus, Dale’s prior was not pursued further here.

3.3. Impact of experimental factors on estimator accuracy. Next we sought to quantify the minimal experimental conditions necessary for accurate estimation of the connectivity matrix. Figure 8 shows the quality of the inferred connectivity matrix as a function of the imaging frame rate, and indicates that imaging frame rates ≥ 30 Hz are needed to achieve meaningful reconstruction results. This matches nicely with currently-available technology; as discussed in the introduction, 30 or 60 Hz imaging is already in progress in a number of laboratories [Nguyen et al. (2001); Iyer, Hoogland and Saggau (2006); Salome et al. (2006); Reddy et al. (2008)], though in

Fig. 7. The distributions of inferred connection weights using no prior (left panel) and a sparse prior (right panel) vs. the true distributions. When the sparse prior is enforced, zero weights are recovered with substantially higher frequency (black lines), thus allowing better identification of connected neural pairs. Likewise, excitatory and inhibitory weights are better recognized (red and blue lines, respectively), thus allowing accurate classification of neurons as excitatory or inhibitory. The normalized Hamming distance between the inferred and true connectivity matrices here [defined as H(w, ŵ) = [N(N − 1)]^{−1} Σ_{ij} |sign(w_ij) − sign(ŵ_ij)|, with the convention sign(0) = 0] was 0. . Distributions are shown for a simulated population of N = 200 neurons firing at ≈ Hz, imaged for T = 10 min at Hz, with eSNR ≈ 10.
Note that the peak at zero in the true distributions (black dashed trace) corresponds to the vertical line visible at zero in Figures 4 and 6; inferred distributions were rescaled to remove bias, as described in Section 3.1.

some cases higher imaging rates come at a cost in the signal-to-noise ratio of the images or in the number of neurons that may be imaged simultaneously. Similarly, Figure 9 illustrates the quality of the inferred connectivity matrix as a function of the effective SNR measure defined in equation (6). Finally, Figure 10 shows the quality of the inferred connectivity matrix as a function of the experimental duration. The minimal amount of data required for a particular r depended substantially on whether the sparse prior was enforced. In particular, when not imposing a sparse prior, the calcium imaging duration necessary to achieve r = 0. was T ≈ 10 min, and r = 0.75 was achieved at T ≈ 30 min. With a sparse prior, r > 0. was achieved already at T ≈ min, with N varying between 50–200 neurons. These results were consistent with a rough Fisher information computation which we performed but have omitted here to conserve space.

Fig. 8. Accuracy of the inferred connectivity as a function of the frame rate of calcium imaging. A population of N = 25 neurons firing at ≈ Hz, imaged for T = 10 min, was simulated here, with eSNR ≈ . At Hz, r saturated at the level r ≈ achieved with ∆ → 0.

Fig. 9. Accuracy of inferred connectivity as a function of effective imaging SNR [eSNR, defined in equation (6)], for frame rates of 15, 33 and Hz. The neural population simulation was the same as in Figure 8. Vertical black lines correspond to the eSNR values of the two example traces in Figure , for comparison.

3.4. Impact of strong correlations and deviations from generative model on the inference. Estimation of network connectivity is fundamentally rooted in observing changes in the spike rate conditioned on the state of the other neurons.
Considered from the point of view of estimating a standard GLM, it is clear that the inputs to our model (1) must satisfy certain basic identifiability conditions if we are to have any hope of accurately estimating the parameter w. In particular, we must rule out highly multicollinear inputs {h_ij(t)}: speaking roughly, the set of observed spike trains should be rich

Fig. 10. Accuracy of inferred connectivity as a function of the imaging time and neural population size. Incorporating a sparse prior dramatically increases the reconstruction quality (dashed lines). When the sparse prior is imposed, T = 5 min is sufficient to recover a substantial fraction of the variance in the connection weights. Incorporating Dale’s prior leads to only marginal improvement (dotted line). Furthermore, reconstruction accuracy does not strongly depend on the neural population size, N. Here, neural populations of size N = 100 and are shown (black and gray, respectively), with eSNR ≈ and Hz imaging rate in each case.

enough to span all N dimensions of w_i, for each cell i. In the simulations pursued here, the coupling matrix {w_ij}_{i≠j} was fairly weak and neurons fired largely independently of each other: see Figure 11, upper left, for an illustration. In this case of weakly-correlated firing, the inputs {h_ij(t)} will also be weakly correlated, and the model should be identifiable, as indeed we found. Should this weak-coupling condition be violated, however (e.g., due to high correlations in the spiking of a few neurons), we may require much more data to obtain accurate estimates due to multicollinearity problems. To explore this issue, we carried out a simulation of a hypothetical strongly coupled neural network, where, in addition to the physiologically-relevant weak sparse connectivity discussed in Section 2.7, we introduced a sparse random strong connectivity component.
More specifically, we allowed a fraction of neurons to couple strongly to the other neurons, making these “command” neurons which in turn could strongly drive the activity of the rest of the population [MacLean et al. (2005)]. The strength of this strong connectivity component was chosen to dynamically build up the actual firing rate from the baseline rate of f(b) ≈ .

Fig. 11. Diversity of observed neural activity patterns is required for accurate circuit inference. Here, 15 sec of simulated spike trains for a weakly coupled network (top left panel) and a network with a strongly coupled component (top right panel) are shown. In weakly coupled networks, spikes are sufficiently uncorrelated to give access to enough different neural activity patterns to estimate the weights w. In the strongly coupled case, many highly synchronous events are evident (top right panel), thus preventing observation of a sufficiently rich ensemble of activity patterns. Accordingly, the connectivity estimates for the strongly coupled neural network (bottom right panel) do not represent the true connectivity of the circuit, even for the weakly coupled component. This is contrary to the weakly-coupled network (bottom left panel), where the true connectivity is successfully obtained. Networks of N = 50 neurons firing at ≈ Hz, imaged for T = 10 min at Hz, were used to produce this figure; eSNR ≈ .

For comparison, the left panels show a “typical” network (i.e., one lacking many strongly coupled neurons), and its associated connectivity inference.

On the other hand, our inference algorithm showed significant robustness to model misspecification, that is, deviations from our generative model. One important such deviation is variation in the time scales of PSPs at different synapses. Up to now, all PSP time-scales were assumed to be the same, that is, {τ_hij}_{i,j≤N} = τ_h. In Figure 12 we introduce additional variability in τ_h from one neuron to another.
Variability in τ_h results in added variance in the estimates of the connectivity weights, w_ij, through the τ_h-dependence of the scaling factor in equation (25). However, we found that this additional variance was relatively insignificant in cases where τ_h varied up to 25% from neuron to neuron. We also found that inference was robust to changes in the sparseness of the underlying connectivity matrix: we simulated neural populations of size N = 25 and N = 50 neurons, as above, with connection sparseness varying from 5% (very sparse) to 100% (all-to-all), and in all cases the performance of our algorithm remained stable, with r ≈ , regardless of the fraction of zero weights w_ij = 0 (data not shown). Finally, simulations with more biophysically-based conductance-driven noisy integrate-and-fire network models [Vogels and Abbott (2005)] led to qualitatively similar results, further establishing the robustness of these methods; again, details are omitted to conserve space.

Fig. 12. Inference is robust to deviations of the data from our generative model. With up to 25% variability allowed in the PSP time scales τ_h (right panel), our algorithm provided reconstructions of almost the same quality as when all τ_h’s were the same (left panel). Simulation conditions were the same as in Figure 8, at Hz imaging rate.

4. Discussion. In this paper we develop a Bayesian approach for inferring connectivity in a network of spiking neurons observed using calcium fluorescent imaging. A number of previous authors have addressed the problem of inferring neuronal connectivity given a fully-observed set of spike trains in a network [Brillinger (1988); Chornoboy, Schramm and Karr (1988); Brillinger (1992); Paninski et al. (2004); Paninski (2004); Truccolo et al. (2005); Rigat, de Gunst and van Pelt (2006); Nykamp (2007); Kulkarni and Paninski (2007); Vidne et al. (2009); Stevenson et al.
(2009); Garofalo et al. (2009); Cocco, Leibler and Monasson (2009)], but the main challenge in the present work is the indirect nature of the calcium imaging data, which provides only noisy, low-pass filtered, temporally sub-sampled observations of the spikes of individual neurons. To solve this problem, we develop a specialized blockwise-Gibbs sampler that makes use of an embedded Markov chain method due to [Neal, Beal and Roweis (2003)]. The connectivity matrix is then inferred in an EM framework; the M-step parallelizes quite efficiently and allows for the easy incorporation of prior sparseness information, which significantly reduces data requirements in this context. We have found that these methods can effectively infer the connectivity in simulated neuronal networks, given reasonable lengths of data, computation time, and assumptions on the biophysical network parameters.

To our knowledge, we are the first to address this problem using the statistical deconvolution methods and EM formulation described here [though see also Roxin, Hakim and Brunel (2008), who fit simplified, low temporal resolution transition-based models to the 10 Hz calcium data obtained by Ikegaya et al. (2004)]. However, we should note that [Rigat, de Gunst and van Pelt (2006)] developed a closely related approach to infer connectivity from low-SNR electrical recordings involving possibly-misclassified spikes (in contrast to the slow, lowpass-filtered calcium signals we discuss here). In particular, these authors employed a very similar Bernoulli GLM and developed a Metropolis-within-Gibbs sampler to approximate the necessary sufficient statistics for their model.
In addition, Rigat, de Gunst and van Pelt (2006) develop a more intricate hierarchical prior for the connectivity parameter w; while we found that a simple L1 penalization was quite effective here, it will be worthwhile to explore more informative priors in future work.

A number of possible improvements of our method are available. One of the biggest challenges for inferring neural connectivity from functional data is the presence of indirect inputs from unobserved neurons [Nykamp (2005); Nykamp (2007); Kulkarni and Paninski (2007); Vidne et al. (2009); Vakorin, Krakovska and McIntosh (2009)]: it is typically impossible to observe the activity of all neurons in a given circuit, and correlations in the unobserved inputs can mimic connections among the observed neurons. Developing methods to cope with such unobserved common inputs is currently an area of active research, and such methods should certainly be incorporated into the approach we have developed here.

Several other important directions for future work are worth noting. First, recently-developed photo-stimulation methods for activating or deactivating individual neurons or sub-populations [Boyden et al. (2005); Szobota et al. (2007); Nikolenko et al. (2011)] may be useful for increasing statistical power in cases where the circuit's unperturbed activity does not allow reliable determination of the connectivity matrix; in particular, by utilizing external stimulation, we can in principle choose a sufficiently rich experimental design (i.e., a sample of input activity patterns) to overcome the multicollinearity problems discussed in the context of Figure 11.

Second, improvements of the algorithms for faster implementation are under development. Specifically, fast nonnegative optimization-based deconvolution methods may be a promising alternative [Vogelstein et al. (2008); Paninski et al. (2009)] to the SMC approach used here.
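The flavor of such nonnegative deconvolution can be illustrated with a deliberately simplified sketch: rather than the tridiagonal interior-point solver of Vogelstein et al. (2008), the code below solves the same kind of nonnegative, sparsity-penalized deconvolution problem by projected gradient descent under an AR(1) calcium-decay model. Everything here (the function name `nonneg_deconv`, the values γ = 0.9 and λ = 0.1, the two-spike toy trace) is a hypothetical illustration, not the published algorithm.

```python
import numpy as np

def nonneg_deconv(F, gamma=0.9, lam=0.1, lr=None, n_iter=2000):
    """Projected-gradient sketch of nonnegative deconvolution:
    minimize 0.5*||F - K @ n||^2 + lam * sum(n)  subject to n >= 0,
    where (K @ n)_t = sum_{s<=t} gamma**(t-s) * n_s models exponential
    calcium decay driven by the nonnegative spike signal n."""
    T = len(F)
    t = np.arange(T)
    # lower-triangular exponential-decay convolution matrix
    K = np.tril(gamma ** (t[:, None] - t[None, :]))
    if lr is None:
        lr = 1.0 / np.linalg.norm(K, 2) ** 2  # step <= 1/L for stability
    n = np.zeros(T)
    for _ in range(n_iter):
        grad = K.T @ (K @ n - F) + lam        # smooth part + L1 subgradient
        n = np.maximum(n - lr * grad, 0.0)    # gradient step, then project
    return n

# toy fluorescence trace: two spikes, exponential decay, additive noise
rng = np.random.default_rng(1)
T, gamma = 200, 0.9
n_true = np.zeros(T)
n_true[[50, 120]] = 1.0
c = np.zeros(T)
for t in range(T):
    c[t] = (gamma * c[t - 1] if t else 0.0) + n_true[t]
F = c + 0.05 * rng.standard_normal(T)

n_hat = nonneg_deconv(F, gamma=gamma, lam=0.1)
```

Because the objective is convex and the feasible set is a simple box, the projection step is exact; the interior-point method referenced in the text exploits the tridiagonal structure of the same problem to reach the optimum far faster than this gradient sketch.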
In addition, modifications of our generative model to incorporate nonstationarities in the fluorescent signal (e.g., due to dye bleaching and drift) are fairly straightforward.

Third, a fully-Bayesian algorithm for estimating the posterior distributions of all the parameters (instead of just the MAP estimate) would be of significant interest. Such a fully-Bayesian extension is conceptually simple: we just need to extend our Gibbs sampler to additionally sample from the parameter θ given the sampled spike trains X. Since we already have a method for drawing X given θ and F, with such an additional sampler we may obtain samples from P(X, θ | F) simply by alternately sampling X ∼ P(X | θ, F) and θ ∼ P(θ | X), via blockwise-Gibbs. Sampling from the posteriors P(θ | X) in the GLM setting is quite tractable using hybrid Monte Carlo methods, since all of the necessary posteriors are log-concave [Ishwaran (1999); Gamerman (1997); Gamerman (1998); Ahmadian, Pillow and Paninski (2011)].

Finally, and most importantly, we are currently applying these algorithms in preliminary experiments on real data. Checking the accuracy of our estimates is of course more challenging in the context of nonsimulated data, but a number of methods for partial validation are available, including multiple-patch recordings [Song et al. (2005)], photo-stimulation techniques [Nikolenko, Poskanzer and Yuste (2007)], and fluorescent anatomical markers which can distinguish between different cell types [Meyer et al. (2002)] (i.e., inhibitory vs. excitatory cells; cf. Figure 7). We hope to present our results in the near future.

Acknowledgments. We thank R. Yuste, B. Watson, A. Packer, T. Sippy, T. Mrsic-Flogel and V. Bonin for data and helpful discussions, and A. Ramirez for helpful comments on an earlier draft.

REFERENCES

Abeles, M. (1991). Corticonics. Cambridge Univ. Press, Cambridge.
Ahmadian, Y., Pillow, J. and Paninski, L. (2011).
Efficient Markov chain Monte Carlo methods for decoding population spike trains. Neural Comput.
Andrieu, C., Doucet, A. and Holenstein, A. (2007). Particle Markov chain Monte Carlo. Working paper.
Bickel, P., Li, B. and Bengtsson, T. (2008). Sharp failure rates for the bootstrap particle filter in high dimensions. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh (Clarke, B. and Ghosal, S., eds.) 318–329. IMS, Beachwood, OH. MR2459233
Binzegger, T., Douglas, R. J. and Martin, K. A. C. (2004). A quantitative map of the circuit of cat primary visual cortex. J. Neurosci.
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press. MR2061575
Boyden, E. S., Zhang, F., Bamberg, E., Nagel, G. and Deisseroth, K. (2005). Millisecond-timescale, genetically targeted optical control of neural activity. Nat. Neurosci.
Braitenberg, V. and Schuz, A. (1998). Cortex: Statistics and Geometry of Neuronal Connectivity. Springer, Berlin.
Brenowitz, S. D. and Regehr, W. G. (2007). Reliability and heterogeneity of calcium signaling at single presynaptic boutons of cerebellar granule cells. J. Neurosci.
Briggman, K. L. and Denk, W. (2006). Towards neural circuit reconstruction with volume electron microscopy techniques. Curr. Opin. Neurobiol.
Brillinger, D. (1988). Maximum likelihood analysis of spike trains of interacting nerve cells. Biol. Cybern.
Brillinger, D. (1992). Nerve cell spike train data analysis: A progression of technique. J. Amer. Statist. Assoc.
Buhl, E., Halasy, K. and Somogyi, P. (1994). Diverse sources of hippocampal unitary inhibitory postsynaptic potentials and the number of synaptic release sites. Nature
Candes, E. J. and Wakin, M. (2008). An introduction to compressive sampling. IEEE Signal Proc. Mag.
Chornoboy, E., Schramm, L. and Karr, A. (1988). Maximum likelihood identification of neural point process systems. Biol. Cybern.
Cocco, S., Leibler, S. and Monasson, R. (2009). Neuronal couplings between retinal ganglion cells inferred by efficient inverse statistical physics methods. Proc. Natl. Acad. Sci.
Cossart, R., Aronov, D. and Yuste, R. (2003). Attractor dynamics of network up states in the neocortex. Nature
Dempster, A., Laird, N. and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B
Djurisic, M., Antic, S., Chen, W. R. and Zecevic, D. (2004). Voltage imaging from dendrites of mitral cells: EPSP attenuation and spike trigger zones. J. Neurosci.
Dombeck, D. A., Khabbaz, A. N., Collman, F., Adelman, T. L. and Tank, D. W. (2007). Imaging large-scale neural activity with cellular resolution in awake, mobile mice. Neuron
Donoho, D. and Elad, M. (2003). Optimally sparse representation in general (nonorthogonal) dictionaries via L1 minimization. PNAS
Douc, R., Cappe, O. and Moulines, E. (2005). Comparison of resampling schemes for particle filtering. In Proc. 4th Int. Symp. Image and Signal Processing and Analysis.
Doucet, A., de Freitas, N. and Gordon, N., eds. (2001). Sequential Monte Carlo Methods in Practice. Springer, New York.
Doucet, A., Godsill, S. and Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput.
Escola, S. and Paninski, L. (2011). Hidden Markov models applied toward the inference of neural states and the improved estimation of linear receptive fields. Neural Comput. To appear.
Feldmeyer, D., Egger, V., Lubke, J. and Sakmann, B. (1999). Reliable synaptic connections between pairs of excitatory layer 4 neurones within a single "barrel" of developing rat somatosensory cortex. J. Physiol.
Feldmeyer, D. and Sakmann, B. (2000). Synaptic efficacy and reliability of excitatory connections between the principal neurones of the input (layer 4) and output (layer 5) of the neocortex. J. Physiol.
Gamerman, D. (1997). Sampling from the posterior distribution in generalized linear mixed models. Statist. Comput.
Gamerman, D. (1998). Markov chain Monte Carlo for dynamic generalised linear models. Biometrika
Garofalo, M., Nieus, T., Massobrio, P. and Martinoia, S. (2009). Evaluation of the performance of information theory-based methods and cross-correlation to estimate the functional connectivity in cortical networks. PLoS ONE e6482.
Godsill, S., Doucet, A. and West, M. (2004). Monte Carlo smoothing for non-linear time series. J. Amer. Statist. Assoc.
Gomez-Urquijo, S. M., Reblet, C., Bueno-Lopez, J. L. and Gutierrez-Ibarluzea, I. (2000). Gabaergic neurons in the rabbit visual cortex: Percentage, distribution and cortical projections. Brain Res.
Greenberg, D. S., Houweling, A. R. and Kerr, J. N. D. (2008). Population imaging of ongoing neuronal activity in the visual cortex of awake rats. Nat. Neurosci.
Gupta, A., Wang, Y. and Markram, H. (2000). Organizing principles for a diversity of gabaergic interneurons and synapses in the neocortex. Science
Harris, K., Csicsvari, J., Hirase, H., Dragoi, G. and Buzsaki, G. (2003). Organization of cell assemblies in the hippocampus. Nature
Hatsopoulos, N., Ojakangas, C., Paninski, L. and Donoghue, J. (1998). Information about movement direction obtained by synchronous activity of motor cortical neurons. PNAS
Helmchen, F., Imoto, K. and Sakmann, B. (1996). Ca2+ buffering and action potential-evoked Ca2+ signaling in dendrites of pyramidal neurons. Biophys. J.
Ikegaya, Y., Aaron, G., Cossart, R., Aronov, D., Lampl, I., Ferster, D. and Yuste, R. (2004). Synfire chains and cortical songs: Temporal modules of cortical activity. Science
Ishwaran, H. (1999). Applications of hybrid Monte Carlo to Bayesian generalized linear models: Quasicomplete separation and neural networks. J. Comput. Graph. Statist.
Iyer, V., Hoogland, T. M. and Saggau, P. (2006). Fast functional imaging of single neurons using random-access multiphoton (RAMP) microscopy. J. Neurophysiol.
Koch, C. (1999). Biophysics of Computation. Oxford Univ. Press, Oxford.
Kulkarni, J. and Paninski, L. (2007). Common-input models for multiple neural spike-train data. Network Comput. Neural Syst.
Lefort, S., Tomm, C., Floyd Sarria, J.-C. and Petersen, C. C. H. (2009). The excitatory neuronal network of the C2 barrel column in mouse primary somatosensory cortex. Neuron
Lei, N., Watson, B., MacLean, J., Yuste, R. and Shepard, K. (2008). A 256-by-256 CMOS microelectrode array for extracellular stimulation of acute brain slices. In Proceedings of the International Solid-State Circuits Conference (ISSCC).
Li, K. and Duan, N. (1989). Regression analysis under link violation. Ann. Statist.
Litke, A., Bezayiff, N., Chichilnisky, E., Cunningham, W., Dabrowski, W., Grillo, A., Grivich, M., Grybos, P., Hottowy, P., Kachiguine, S., Kalmar, R., Mathieson, K., Petrusca, D., Rahman, M. and Sher, A. (2004). What does the eye tell the brain? Development of a system for the large scale recording of retinal output activity. IEEE Trans. Nucl. Sci.
Livet, J., Weissman, T., Kang, H., Draft, R., Lu, J., Bennis, R., Sanes, J. and Lichtman, J. (2007). Transgenic strategies for combinatorial expression of fluorescent proteins in the nervous system. Nature
Luczak, A., Bartho, P., Marguet, S., Buzsaki, G. and Harris, K. (2007). Sequential structure of neocortical spontaneous activity in vivo. PNAS
MacLean, J., Watson, B., Aaron, G. and Yuste, R. (2005). Internal dynamics determine the cortical response to thalamic stimulation. Neuron
McLachlan, G. and Krishnan, T. (1996). The EM Algorithm and Extensions. Wiley, New York. MR2392878
Meyer, A. H., Katona, I., Blatow, M., Rozov, A. and Monyer, H. (2002). In vivo labeling of parvalbumin-positive interneurons and analysis of electrical coupling in identified neurons. J. Neurosci.
Micheva, K. and Smith, S. (2007). Array tomography: A new tool for imaging the molecular architecture and ultrastructure of neural circuits. Neuron
Mishchenko, Y. (2009). Strategies for identifying exact structure of neural circuits with broad light microscopy connectivity probes. Preprint. Available at http://precedings.nature.com/documents/2669/version/2.
Mishchenko, Y., Spacek, J., Mendenhall, J., Chklovskii, D. and Harris, K. M. (2009). Reconstruction of hippocampal CA1 neuropil at nanometer resolution reveals disordered packing of processes and dependence of synaptic connectivity on local environment and dendritic caliber. To appear.
Neal, R., Beal, M. and Roweis, S. (2003). Inferring state sequences for non-linear systems with embedded hidden Markov models. In NIPS.
Ng, A. (2004). Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML).
Nguyen, Q. T., Callamaras, N., Hsieh, C. and Parker, I. (2001). Construction of a two-photon microscope for video-rate Ca2+ imaging. Cell Calcium
Nikolenko, V., Poskanzer, K. and Yuste, R. (2007). Two-photon photostimulation and imaging of neural circuits. Nature Methods
Nikolenko, V., Watson, B., Araya, R., Woodruff, A., Peterka, D. and Yuste, R. (2011). SLM microscopy: Scanless two-photon imaging and photostimulation using spatial light modulators. Frontiers in Neural Circuits. To appear. DOI:10.3389/neuro.04.005.2008.
Nykamp, D. Q. (2005). Revealing pairwise coupling in linear–nonlinear networks. SIAM J. Appl. Math.
Nykamp, D. Q. (2007). A mathematical framework for inferring connectivity in probabilistic neuronal networks. Math. Biosci.
Ohki, K., Chung, S., Ch'ng, Y., Kara, P. and Reid, C. (2005). Functional imaging with cellular resolution reveals precise micro-architecture in visual cortex. Nature
Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network Comput. Neural Syst.
Paninski, L., Ahmadian, Y., Ferreira, D., Koyama, S., Rahnama, K., Vidne, M., Vogelstein, J. and Wu, W. (2009). A new look at state-space models for neural data. J. Comput. Neurosci. To appear.
Paninski, L., Fellows, M., Shoham, S., Hatsopoulos, N. and Donoghue, J. (2004). Superlinear population encoding of dynamic hand trajectory in primary motor cortex. J. Neurosci.
Petersen, C. C. and Sakmann, B. (2000). The excitatory neuronal network of rat layer 4 barrel cortex. J. Neurosci.
Petrusca, D., Grivich, M. I., Sher, A., Field, G. D., Gauthier, J. L., Greschner, M., Shlens, J., Chichilnisky, E. J. and Litke, A. M. (2007). Identification and characterization of a Y-like primate retinal ganglion cell type. J. Neurosci.
Pillow, J., Shlens, J., Paninski, L., Sher, A., Litke, A., Chichilnisky, E. and Simoncelli, E. (2008). Spatiotemporal correlations and visual signaling in a complete neuronal population. Nature
Plesser, H. and Gerstner, W. (2000). Noise in integrate-and-fire neurons: From stochastic input to escape rates. Neural Comput.
Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE
Ramon y Cajal, S. (1904). La Textura del Sistema Nerviosa del Hombre y los Vertebrados. Moya, Madrid.
Ramon y Cajal, S. (1923). Recuerdos de mi vida: Historia de mi labor cientifica. Alianza Editorial, Madrid.
Reddy, G., Kelleher, K., Fink, R. and Saggau, P. (2008). Three-dimensional random access multiphoton microscopy for functional imaging of neuronal activity. Nat. Neurosci.
Reyes, A., Lujan, R., Rozov, A., Burnashev, N., Somogyi, P. and Sakmann, B. (1998). Target-cell-specific facilitation and depression in neocortical circuits. Nat. Neurosci.
Rigat, F., de Gunst, M. and van Pelt, J. (2006). Bayesian modelling and analysis of spatio-temporal neuronal networks. Bayesian Anal.
Robert, C. and Casella, G. (2005). Monte Carlo Statistical Methods. Springer, New York.
Roxin, A., Hakim, V. and Brunel, N. (2008). The statistics of repeating patterns of cortical activity can be reproduced by a model network of stochastic binary neurons. J. Neurosci.
Salome, R., Kremer, Y., Dieudonne, S., Leger, J.-F., Krichevsky, O., Wyart, C., Chatenay, D. and Bourdieu, L. (2006). Ultrafast random-access scanning in two-photon microscopy using acousto-optic deflectors. J. Neurosci. Methods
Santhanam, G., Ryu, S. I., Yu, B. M., Afshar, A. and Shenoy, K. V. (2006). A high-performance brain-computer interface. Nature
Sayer, R. J., Friedlander, M. J. and Redman, S. J. (1990). The time course and amplitude of EPSPs evoked at synapses between pairs of CA3/CA1 neurons in the hippocampal slice. J. Neurosci.
Segev, R., Goodhouse, J., Puchalla, J. and Berry, M. (2004). Recording spikes from a large fraction of the ganglion cells in a retinal patch. Nat. Neurosci.
Shumway, R. and Stoffer, D. (2006). Time Series Analysis and Its Applications. Springer, New York. MR2228626
Song, S., Sjostrom, P. J., Reigl, M., Nelson, S. and Chklovskii, D. B. (2005). Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biol. e68.
Stein, R. B., Weber, D. J., Aoyagi, Y., Prochazka, A., Wagenaar, J. B. M., Shoham, S. and Normann, R. A. (2004). Coding of position by simultaneously recorded sensory neurones in the cat dorsal root ganglion. J. Physiol.
Stevenson, I., Rebesco, J., Hatsopoulos, N., Haga, Z., Miller, L. and Koerding, K. (2008). Inferring network structure from spikes. In Statistical Analysis of Neural Data Meeting.
Stevenson, I. H., Rebesco, J. M., Hatsopoulos, N. G., Haga, Z., Miller, L. E. and Kording, K. P. (2009). Bayesian inference of functional connectivity and network structure from spikes. IEEE Trans. Neural Syst. Rehab.
Stosiek, C., Garaschuk, O., Holthoff, K. and Konnerth, A. (2003). In vivo two-photon calcium imaging of neuronal networks. Proc. Natl. Acad. Sci. USA
Szobota, S., Gorostiza, P., Del Bene, F., Wyart, C., Fortin, D. L., Kolstad, K. D., Tulyathan, O., Volgraf, M., Numano, R., Aaron, H. L., Scott, E. K., Kramer, R. H., Flannery, J., Baier, H., Trauner, D. and Isacoff, E. Y. (2007). Remote control of neuronal activity with a light-gated glutamate receptor. Neuron
Thompson, A., Girdlestone, D. and West, D. (1988). Voltage-dependent currents prolong single-axon postsynaptic potentials in layer III pyramidal neurons in rat neocortical slices. J. Neurophysiol.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B
Tipping, M. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res.
Truccolo, W., Eden, U., Fellows, M., Donoghue, J. and Brown, E. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. J. Neurophysiol.
Tsien, R. Y. (1989). Fluorescent probes of cell signaling. Ann. Rev. Neurosci.
Vakorin, V. A., Krakovska, O. A. and McIntosh, A. R. (2009). Confounding effects of indirect connections on causality estimation. J. Neurosci. Methods
Vidne, M., Kulkarni, J., Ahmadian, Y., Pillow, J., Shlens, J., Chichilnisky, E., Simoncelli, E. and Paninski, L. (2009). Inferring functional connectivity in an ensemble of retinal ganglion cells sharing a common input. In Computational and Systems Neuroscience (COSYNE09).
Vogels, T. and Abbott, L. F. (2005). Signal propagation and logic gating in networks of integrate-and-fire neurons. J. Neurosci.
Vogelstein, J., Babadi, B., Watson, B., Yuste, R. and Paninski, L. (2008). Fast nonnegative deconvolution via tridiagonal interior-point methods, applied to calcium fluorescence data. In Statistical Analysis of Neural Data (SAND) Conference.
Vogelstein, J., Watson, B., Packer, A., Jedynak, B., Yuste, R. and Paninski, L. (2009). Spike inference from calcium imaging using sequential Monte Carlo methods. Biophys. J.
Wallace, D., zum Alten Borgloh, S., Astori, S., Yang, Y., Bausen, M., Kugler, S., Palmer, A., Tsien, R., Sprengel, R., Kerr, J., Denk, W. and Hasan, M. (2008). Single-spike detection in vitro and in vivo with a genetic Ca2+ sensor. Nat. Methods
Yaksi, E. and Friedrich, R. W. (2006). Reconstruction of firing rate changes across neuronal populations by temporally deconvolved Ca2+ imaging. Nat. Methods
Yasuda, R., Nimchinsky, E. A., Scheuss, V., Pologruto, T. A., Oertner, T. G., Sabatini, B. L. and Svoboda, K. (2004). Imaging calcium concentration dynamics in small neuronal compartments. Sci. STKE
Yuste, R., Konnerth, A., Masters, B. et al. (2006). Imaging in Neuroscience and Development, A Laboratory Manual. Oxford, New York.

Y. Mishchenko
L. Paninski
Department of Statistics and Center for Theoretical Neuroscience
Columbia University
1255 Amsterdam Ave
New York, New York 10027
USA
E-mail: [email protected]@stat.columbia.edu