Cognitive computation using neural representations of time and space in the Laplace domain
Marc W. Howard and Michael E. Hasselmo
Center for Memory and Brain, Center for Systems Neuroscience, Department of Psychological and Brain Sciences, Department of Physics
Boston University

Submitted to
Computational Brain & Behavior
March 27, 2020
Abstract
Memory for the past makes use of a record of what happened when—a function over past time. Time cells in the hippocampus and temporal context cells in the entorhinal cortex both code for events as a function of past time, but with very different receptive fields. Time cells in the hippocampus can be understood as a compressed estimate of events as a function of the past. Temporal context cells in the entorhinal cortex can be understood as the Laplace transform of that function. Other functional cell types in the hippocampus and related regions, including border cells, place cells, trajectory coding cells, and splitter cells, can be understood as coding for functions over space or past movements, or for their Laplace transforms. More abstract quantities, like distance in an abstract conceptual space or numerosity, could also be mapped onto populations of neurons coding for the Laplace transform of functions over those variables. Quantitative cognitive models of memory and evidence accumulation can also be specified in this framework, allowing constraints from both behavior and neurophysiology. More generally, the computational power of the Laplace domain could be important for efficiently implementing data-independent operators, which could serve as a basis for neural models of a very broad range of cognitive computations.

Connectionist models have had astounding success in recent years in describing increasingly sophisticated behaviors using a large number of simple processing elements (LeCun, Bengio, & Hinton, 2015; Graves, Wayne, & Danihelka, 2014). However, the lack of a native ability to perform symbolic computations has long been noted as a key problem in developing a theory of cognition (Fodor & Pylyshyn, 1988; Gallistel & King, 2011; Marcus, 2018). Among other things, symbolic processing requires operators that are independent of the data on which they operate. For instance, a computer program can add any pair of integers whether they are familiar or not.
Human cognition also has a powerful symbolic capability that allows us to perform many data-independent operations. To take a concrete situation, after focusing on Figure 1 one could close one's eyes and implement a huge number of operations on the contents of memory. For instance, one could judge whether some part of the image is above or below Moe's tie. Operations like translation (e.g., imagining Moe's face moved 5 cm to the left) or subtraction (e.g., the relative position of the thought bubble and Moe's tie) would have obvious benefits in computational problems like spatial navigation, where we have learned a great deal about functional correlates of neurons in the hippocampus and entorhinal cortex (O'Keefe & Dostrovsky, 1971; Wilson & McNaughton, 1993; Hafting, Fyhn, Molden, Moser, & Moser, 2005). If cognitive data of many different types used the same form of neural representation, then if we knew how to build data-independent operators in one domain, the same computational mechanisms could be reused across many domains of cognition. A complete set of operations would constitute a "cognitive map" that could be used for many different types of information (O'Keefe & Nadel, 1978; Behrens et al., 2018).

This paper reviews recent evidence that suggests a common form of neural representation for many types of information important in cognition. The basic idea is that the firing rates of populations of neurons represent functions out in the world. Some populations do not represent these functions directly, but rather represent the Laplace transform of functions. Because we know a great deal about the properties of the Laplace transform, this lets us understand the computational capabilities of these populations at a relatively deep level. This paper proceeds in three sections. In the section entitled "Computing with functions in the Laplace domain" we sketch out in non-technical language the ideas behind this hypothesis.
This section will explain what it means to say the brain "represents a function," and what it means for a population of neurons to estimate "the Laplace transform of a function." In the second section, we describe recent neurophysiological evidence from the hippocampus and entorhinal cortex. The data show that hippocampal time cells behave as if they are estimating a function over past time. Moreover, neurons in the entorhinal cortex behave as if they were estimating the Laplace transform of this function over past time. To the extent one accepts this empirical account, it means that the brain has a transform/inverse pair for functions of time one synapse away in the medial temporal lobe. In the third section, we review modeling work describing how to construct transform/inverse pairs to represent functions over not only time, but spatial position, other kinematic variables, and accumulated evidence for use in decision-making circuits. We suggest that the reader should seriously consider the idea that the brain might use transform/inverse pairs to perform cognitive computations in many different domains.

Computing with functions in the Laplace domain
We argue that the brain, at least in some cases, computes using functions describing information over some continuum (Figure 1a). Consider some function f defining a scalar value in the external world over some domain x, f(x); for instance, in vision, the pattern of light in a greyscale image as a function of retinal position. We will write f(x_o) to refer to the value at a single position and understand f(x) to mean the brightness over all possible positions. The activity of receptors along the retinal surface estimates this function. To distinguish the brain's internal estimate from the actual function in the world, we will write f̃(∗x) to describe the activity over a population of neurons. The value at a particular location f̃(∗x_o) corresponds to the activation of the receptor that is indexed to the physical location x_o. We understand f̃(∗x) to mean the activation of all the receptors; ∗x maps onto the continuum of x, enabling the population to distinguish many different functions f(x). We can understand the particular shape of the receptive fields as basis functions over the domain x. We will assume that the number of receptors is very large and the distance between their centers is small, so that we can think of f̃(∗x) as if it were a function over a continuous variable. Note that the density of receptors need not be constant in different regions of x.

Figure 1. Encoding functions in the Laplace domain. A. The brain tries to estimate functions f(x) out in the world. The brain's estimate of this function is denoted f̃(∗x). In many cases, it is not practical to compute f̃(∗x) directly. Instead the brain first estimates the Laplace transform of f(x), F(s), and then constructs f̃(∗x) by inverting the transform via an inverse transform operator L^{-1}_k. Both f̃(∗x) and F(s) correspond to firing rates across many neurons indexed by ∗x or s as appropriate. We assume the population is very large, so we can think of ∗x and s as effectively continuous. B. The Laplace transform of a function is analogous to a reflection in a fun-house mirror. Like the reflection, the transform of a function need not superficially resemble the original image. However, each unique image causes a unique reflection. This means that, given a particular reflection and knowledge of the distortion introduced by the mirror, one could reconstruct the particular image associated with that reflection. Similarly, because each function specifies a unique transform, one can in principle reconstruct the original function from its transform. C. Data-independent operators to compute with functions in the Laplace domain. The world, left of the dashed line, contains some function f(x). The Laplace operator L is used to generate the Laplace transform of the function, F(s), in the brain. Approximately inverting the transform, via an operator L^{-1}_k, generates an internal estimate of the external function, f̃(∗x). Note that there is some "blur" in this estimate of the true function. Data-independent operators are necessary for symbolic computation. Many such operators can be efficiently implemented in the Laplace domain. Here we illustrate a translation operator. Although the world has provided f(x), we want to compute a translated version of the function, f(x + δ). We can compute the Laplace transform of f(x + δ) by operating on F(s) with an operator R_δ such that R_δ F(s) is the transform of f(x + δ). Now, applying the inverse operator, we can obtain an approximation of f(x + δ): f̃(∗x + δ) = L^{-1}_k R_δ F(s). Note that the translation operator R_δ is independent of the data—it works equally well on any function. Moreover, in the case of translation, R_δ is particularly simple—just a diagonal matrix—enabling efficient computation of translation. Other data-independent operators also have a simple form in the Laplace domain.

If we cannot directly place a receptor at a particular physical location x_o, how can we estimate functions over variables such as time, allocentric position, or location within an abstract conceptual space? We hypothesize (Shankar & Howard, 2010, 2012; Howard et al., 2014) that as an intermediate step in estimating f̃(∗x) the brain could construct the Laplace transform of f(x) over another population of neurons. We describe this situation notationally as F(s) = Lf(x). Analogous to the way in which f̃(∗x) corresponds to the activity of many neurons indexed by their value of ∗x, F(s) is understandable as a particular pattern of activity over a population of neurons, each indexed by a continuous parameter s.
Rather than receptive fields that tile x, neurons in F(s) have receptive fields that fall off exponentially, like e^{-sx}.

The insight that F(s) is the Laplace transform of f(x) is very powerful—it means that knowing F(s) is enough to specify f(x). Because neurons in F(s) do not have receptive fields centered on a particular value of x, it is not necessarily intuitive to visualize the connection between f and F. In this sense, the Laplace transform of a function is something like the reflection in a funhouse mirror (Fig. 1B). We can construct f̃(∗x) by inverting the transform: f̃(∗x) = L^{-1}_k F(s). Here L^{-1}_k is a feedforward operator that approximates the inverse Laplace transform (Shankar & Howard, 2012; Liu, Tiganj, Hasselmo, & Howard, 2019). Of course the inverse cannot be precise—with a finite number of neurons we cannot reconstruct the potentially infinite amount of information in a continuous function (Appendix 2). However, it can be shown (Shankar & Howard, 2012) that the properties of L^{-1}_k blur f̃ such that the width of each receptive field in f̃ is a constant fraction of ∗x. This is closely analogous to the finding that the size of receptive fields in the visual system grows proportional to the distance from the fovea.

One of the reasons the Laplace transform is so widely used in engineering and data processing applications is that one can efficiently implement data-independent operations on functions in the Laplace domain. That is, suppose one wants to perform an operation on some function f. In many cases it is more computationally efficient to construct F = Lf, apply the appropriate operator in the Laplace domain to F, and then take the inverse to get the desired answer. Figure 1C provides a schematic for how this could work for function translation—constructing f(x + δ) from f(x).
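This diagonal structure is easy to verify numerically. The sketch below (the grids, the Gaussian bump f, and the value of δ are all illustrative assumptions) approximates F(s) = Lf(x) as a Riemann sum and checks that multiplying each s channel by e^{sδ} reproduces the transform of the shifted function f(x + δ), an identity that holds whenever f is negligible on [0, δ).

```python
import numpy as np

# Discretized domain x and a bank of rate constants s standing in for the
# (effectively continuous) neural populations. All sizes are illustrative.
x = np.linspace(0.0, 20.0, 2001)
dx = x[1] - x[0]
s = np.linspace(0.1, 2.0, 50)[:, None]        # column of rate constants

f = np.exp(-0.5 * ((x - 5.0) / 0.5) ** 2)     # a bump well inside the domain
delta = 2.0                                   # translation toward the origin

# F(s) = integral of f(x) e^{-s x} dx, approximated as a Riemann sum.
F = (np.exp(-s * x) * f).sum(axis=1) * dx

# Transform of f(x + delta), computed directly from the shifted samples...
f_shifted = np.interp(x + delta, x, f, right=0.0)
F_shift_direct = (np.exp(-s * x) * f_shifted).sum(axis=1) * dx

# ...and via the diagonal operator R_delta: a pointwise multiplication by
# e^{s delta} in each channel. No convolution over x is needed.
F_shift_operator = np.exp(s[:, 0] * delta) * F
```

Because R_δ touches each s channel independently, translating any function costs one multiplication per neuron, which is the sense in which the operator is both cheap and data-independent.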
Efficient Laplace domain methods exist for many unary operators that take in one function f, such as translation, computing the mean or moments of a distribution, or taking derivatives. Moreover, there are also methods for binary operations that compare two functions f and g to one another, such as convolution and cross-correlation. Thus, if the brain had access to both Laplace transforms it could in principle take advantage of some of this computational power to implement data-independent operations.

Coding for past events as a function of time in the brain
Memory, by definition, requires some record of the past. Psychologists have long appreciated that memory relies on an explicit record of what events happened when in the past (James, 1890; Brown, Neath, & Chater, 2007; Balsam & Gallistel, 2009). Computational neuroscientists have long proposed models in which sequentially activated neurons represent past events (Tank & Hopfield, 1987; Grossberg & Merrill, 1992; Goldman, 2009). The observation that memory is less precise for less recent events has led to the proposal that this record of the past is compressed, such that the time at which recent events occurred has better resolution than events further in the past (Fig. 2A). This compression is analogous to the compression of the visual system, where regions of visual space near the fovea have much greater resolution than regions further from the fovea (Howard, 2018).

More formally, at time t the brain tries to estimate the objective past leading up to the present, f(τ). In this formulation, τ runs from zero to infinity, with zero corresponding to the moment in the immediate past at time t. At each moment, we can understand the past as a function over the τ axis (Fig. 2A). This function f(τ) is estimated by a population of neurons that we write as f̃(∗τ). Cognitive modeling and theoretical work (Shankar & Howard, 2012; Howard, Shankar, Aue, & Criss, 2015) has shown that this kind of representation can be used to construct detailed behavioral models of many memory tasks if the representation of the past is compressed. We first argue that hippocampal time cells have the properties predicted for f̃(∗τ) and then review evidence suggesting that neurons in the entorhinal cortex of rodents and monkeys show properties consistent with the Laplace transform F(s) = Lf(τ).

Time cells in the hippocampus code for a compressed timeline of the recent past
Time cells in the hippocampus behave as if they have receptive fields organized in time (Figure 2B; Pastalkova, Itskov, Amarasingham, & Buzsaki, 2008; MacDonald, Lepage, Eden, & Eichenbaum, 2011; Eichenbaum, 2014; Terada, Sakurai, Nakahara, & Fujisawa, 2017; Taxidis et al., 2018; Cruzado, Tiganj, Brincat, Miller, & Howard, 2019). As a triggering event recedes into the past, the event first enters and then exits the "time field" of different time cells indexed by ∗τ. Because the time fields for different cells are centered on different ∗τ s, the population fires in sequence as the triggering event moves through the past. Hippocampal time cells have the computational properties one would expect of a compressed representation of what happened when as a function of past time. First, different external stimuli can trigger distinct sequences of hippocampal time cells (MacDonald et al., 2011; Terada et al., 2017; Taxidis et al., 2018; Cruzado et al., 2019), meaning that these populations carry information about what stimulus happened in the past. Second, hippocampal time cells show decreasing temporal accuracy further in the past. The number of cells with receptive fields around a particular value ∗τ_o goes down as ∗τ_o goes up. Moreover, the width of receptive fields goes up with ∗τ_o (Cruzado et al., 2019; Kraus, Robinson, White, Eichenbaum, & Hasselmo, 2013; Howard et al., 2014; Salz et al., 2016).
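Both signatures of compression noted above, fewer and wider fields at larger ∗τ, fall out of inverting the transform with a finite k. In the sketch below, Post's inversion formula, f̃ = ((-1)^k / k!) s^{k+1} F^{(k)}(s) evaluated at s = k/∗τ, is applied to F(s) = e^{-s τ0}, the transform of a single event that occurred τ0 in the past; for this F the k-th derivative has a closed form. The value k = 10 and the two event times are illustrative assumptions.

```python
import numpy as np
from math import factorial

def time_field(tau, tau0, k):
    """Post inversion L^{-1}_k of F(s) = exp(-s * tau0), taken at s = k / tau.

    F's k-th derivative is (-tau0)^k exp(-s tau0); the (-1)^k factors cancel,
    leaving a bump centered near tau0 whose width grows with tau0.
    """
    return (1.0 / factorial(k)) * (k / tau) ** (k + 1) * tau0 ** k \
        * np.exp(-k * tau0 / tau)

tau = np.linspace(0.05, 40.0, 8000)   # internal past-time axis
k = 10
recent = time_field(tau, 2.0, k)      # an event 2 s in the past
remote = time_field(tau, 8.0, k)      # an event 8 s in the past
```

Both fields peak near their τ0 (at k τ0 / (k+1)), and because f̃ depends on τ only through the ratio τ/τ0, the remote field is four times as wide as the recent one: the same constant-fraction widening seen in hippocampal time cells.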
Figure 2. A compressed timeline and its Laplace transform in hippocampus and entorhinal cortex. A. Schematic for a compressed internal timeline. The horizontal line describes a sequence of distinct events, here a sequence of tones, in the external world. At a particular moment t, f(τ) describes the objective past leading up to the present, with τ = 0 corresponding to the immediate past. At any moment f(τ) describes what event (i.e., which note) happened at each time τ in the past. One can imagine an internal estimate of the timeline leading up to the present, f̃(∗τ) (diagonal line). f̃(∗τ) estimates what happened when in the past, but the internal time axis is compressed. This means that the time of occurrence for past events is resolved with decreasing accuracy for events further in the past (note that the spacing between the memory for notes further in the past is decreased). B. Hippocampal time cells have receptive fields in time. Each panel is a different neuron, with a series of rasters at the top and a smoothed peri-stimulus time histogram shown below. Time zero in this study is the beginning of the delay period of a memory experiment. Time cells fire when the triggering event is a certain time in the past. They can thus be understood as coding for a function over past time. Note that the cells that fire later have wider temporal receptive fields. This is a general characteristic of hippocampal time cells and indicates less temporal resolution for events further in the past, consistent with a compressed representation in the brain. After MacDonald, et al. (2011). C. Left: Time courses for model units encoding the instantaneous input f, the Laplace transform of the past F(s), and the inverse transform f̃(∗τ). When the input was a time τ in the past, the neurons coding the Laplace transform are activated as e^{-sτ}. Neurons in F(s) activate at the same time after the input is presented and then decay exponentially at different rates as τ increases. Neurons coding for f̃(∗τ) activate sequentially. Middle: Theoretical predictions for the two populations expressed as a heatmap. Right: Empirical heatmaps for units in macaque entorhinal cortex (top) and hippocampus (bottom). Data from Bright, et al. (2019) and Cruzado, et al. (2019).

Temporal context cells in entorhinal cortex code for the Laplace transform of a compressed timeline of the past
Let us consider how we would identify neurons coding for the Laplace transform F(s) = Lf(τ). Cells coding the Laplace transform of a variable x should show receptive fields that fall off like e^{-sx}. A set of neurons coding the Laplace transform of past time τ should show receptive fields that go like e^{-sτ}, with many different values of s across different neurons. If we think of the triggering stimulus as a delta function at time t = 0, it enters f(τ) at τ = 0. At time t after the triggering stimulus, the firing rates should change like e^{-st}. Observing the firing of a neuron with rate constant s, we should see it change shortly after the triggering stimulus, and then relax back to baseline exponentially in the time after the triggering stimulus. Cells with high values of s (corresponding to fast time constants) should relax quickly; cells with small values of s (corresponding to slow time constants) should relax more slowly. We would expect a continuum of s values to describe the continuum of τ values. If the representation is compressed, we would see more neurons with fast decay rates than with slow decay rates. The grey lines in Figure 2C depict how F(s) and f̃(∗τ) should behave in the time after a triggering stimulus for different values of s and ∗τ.

Recent evidence shows that cells in the entorhinal cortex contain temporal information, like hippocampal time cells, but with temporal receptive fields as we would expect from the Laplace transform (Figure 2C; Bright et al., 2019). These "temporal context cells" are analogous to findings from a rodent experiment recording from lateral entorhinal cortex (Tsao et al., 2018). In that study, neurons in the EC were perturbed by entry into an enclosure for a period of random foraging. Different neurons relaxed with a variety of rates, showing gradual decay over time scales of up to tens of minutes.
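The predicted e^{-st} relaxation requires no special tuning: each neuron need only leak at its own rate. A minimal simulation (the rate constants, step size, and impulse input are illustrative assumptions) integrates dF/dt = -sF + f(t) for a small bank of units and confirms that, after an impulse at t = 0, each unit relaxes back toward baseline as e^{-st}.

```python
import numpy as np

s = np.array([0.25, 0.5, 1.0, 2.0, 4.0])   # one rate constant per neuron
dt = 1e-4
T = 3.0
n_steps = int(round(T / dt))

F = np.zeros_like(s)
for i in range(n_steps):
    f_t = 1.0 / dt if i == 0 else 0.0      # unit impulse: the triggering event
    # Each unit is a leaky integrator with its own decay rate s.
    F += dt * (-s * F + f_t)

# F is now close to exp(-s * T): fast-s units have relaxed to baseline,
# while slow-s units still carry a trace of the event.
```

Because each unit needs only local information (its own rate constant and the shared input), the transform population is cheap to implement; inverting it with L^{-1}_k across the spectrum of s values is what would yield sequentially activated time fields.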
Although there are thus far only two studies showing this phenomenon, the similarity of the qualitative properties of the neurons, despite drastic changes in methods between the two studies, is striking. Appendix 1 discusses possible neurophysiological mechanisms to implement the Laplace transform and L^{-1}_k in neural circuits.

Time and memory outside the MTL
The entorhinal cortex and hippocampus are believed to be important in episodic memory. Computational modeling suggests that a representation like f̃(∗τ) is also useful for other "kinds" of memory, including short-term working memory tasks, conditioning tasks, and interval timing tasks (Howard et al., 2015; Tiganj, Cruzado, & Howard, 2019). This suggests that other brain regions have access to representations like f̃(∗τ). Indeed, time cells with more or less the same properties as hippocampal time cells have been observed in the striatum (Mello, Soares, & Paton, 2015; Akhlaghpour et al., 2016; Jin, Fujii, & Graybiel, 2009), medial prefrontal cortex (Tiganj, Shankar, & Howard, 2017), lateral prefrontal cortex (Tiganj, Cromer, Roy, Miller, & Howard, 2018; Cruzado et al., 2019), and dorsolateral prefrontal cortex (Jin et al., 2009). The fact that this kind of representation is so widespread suggests that many different types of memory utilize a compressed timeline of the past. Ramping neurons observed outside of the EC during memory and timing tasks (e.g., Mita, Mushiake, Shima, Matsuzaka, & Tanji, 2009; Rossi-Pool et al., 2019; Zhang et al., 2019; Wang, Narain, Hosseini, & Jazayeri, 2018) could also be manifestations of the Laplace transform of functions over past time.
Compressed functions of other variables
A general framework for cognitive computation in the brain requires that representations of many different variables use the same "neural currency." The same formalism utilizing the Laplace transform and its inverse can give rise not only to functions over time but to functions over many other variables as well. The basic idea (Appendix 2) is that the equations implementing the Laplace transform of a function of time are modulated by the rate of change of some variable x. We refer to the modulation factor at time t as α(t). At the cellular level, α(t) is understandable as a gain factor that changes the slope of the f-I curve relating firing rate to input current. If all of the neurons participating in the transform are modulated at each moment by α(t) = dx/dt, then F(s) holds the transform with respect to x rather than time, F(s) = Lf(x). When one inverts the transform with L^{-1}_k, this results in an estimate of the function of x, f̃(∗x) = L^{-1}_k F(s) (Figure 3A).

This strategy can be used to describe different kinds of functions by coding for different input stimuli—different "what" information—and choosing α(t) to be the rate of change of different variables. In this section, we discuss computational work representing compressed functions of variables other than time. For instance, we will see that this approach can be used to compute functions coding for the relative spatial position of the wall of an enclosure, for past movements as a function of their position in the sequence, or for the amount of evidence accumulated for one of two alternatives in a simple decision-making task. The first subsection, entitled "Spatiotemporal trajectories in the medial temporal lobe," reviews evidence that transform/inverse pairs of representations can account for a "particle zoo" of functional cell types in the MTL during spatiotemporal navigation.
In the next subsection, entitled "Accumulated evidence and decision-making," we describe a neural implementation of widely used cognitive models of evidence accumulation using transform/inverse pairs. Finally, in the last subsection, entitled "Cognitive models built entirely of transform/inverse pairs," we consider the possibility of cognitive models made entirely of transform/inverse pairs and how they could exploit computational properties of the Laplace domain for symbolic computation.

Spatiotemporal trajectories in the medial temporal lobe
It has long been suggested that the hippocampal place code is a special case of a more general form of representation coding for spatial, temporal, and other more abstract relationships between events (O'Keefe & Nadel, 1978; Cohen & Eichenbaum, 1993; Eichenbaum, Dudchenko, Wood, Shapiro, & Tanila, 1999; Hasselmo, 2012). A wide diversity of functional cell types that communicate information about kinematic variables have been reported in the hippocampus and related structures, including place cells, border cells, splitter cells, trajectory coding cells, speed cells, head direction cells, and many more. Many of these functional cell types (the most notable exception being grid cells) can be understood as the Laplace transform of a function coding a spatiotemporal trajectory; others can be understood as the approximate inverse of such a transform. Moreover, these populations seem to come in pairs, with populations with properties like the Laplace transform in the entorhinal cortex and populations with properties like the inverse in the hippocampus.
Figure 3. Laplace domain code for space and other variables. A. This computational framework can be used to construct compressed functions over any continuous variable, here denoted x, for which the brain has access to the time derivative. The gain of the neurons coding for the transform is dynamically set to α(t) (see Eq. 5 in Appendix 2). If α(t) = dx/dt, the time derivative of x, then the transform is with respect to x instead of t. The inverse thus estimates f(x) rather than f(t). B. Schematic showing the activity of a population of cells coding for one-dimensional position from an environmental boundary. In this simulation, the landmark is at the left of a linear track (position zero). The animal moves at a constant speed to the other end of the track and is reflected back towards its initial starting point. The activity of populations of cells coding the Laplace transform (F(s), top row) and the inverse transform (f̃(∗x), bottom) are shown as a function of time (left) and position (right) over three laps. Different values of s and ∗x are shown as different lines. As the animal moves away from the landmark, firing rates in the transform decay exponentially with different rates. When the animal reverses direction, these cells rise until the starting position is reached. The cells coding the inverse fire in sequence as the distance to the landmark grows, and then fire in the reverse sequence as the agent approaches the landmark. When plotted as a function of position rather than time, the cells in F(s) show characteristic exponential receptive fields as a function of position and the cells in f̃(∗x) show circumscribed place fields.

Consider border cells in the medial entorhinal cortex (MEC) (Solstad, Boccara, Kropff, Moser, & Moser, 2008). Border cells fire maximally at a location close to the boundary of an environment with a particular orientation. Their firing rate decays monotonically with distance to the boundary.
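A simplified version of the one-dimensional setting in Figure 3B (a single outbound pass, with no reflection) can be sketched in a few lines. Integrating the gain-modulated equation dF/dt = α(t)(-sF + f(t)) with α(t) = dx/dt makes the transform run over position rather than time; as a check, two traversals of the same path at different speeds should leave the same final pattern F(s). The landmark function g, the speed profiles, and the s values are all illustrative assumptions.

```python
import numpy as np

def run_transform(speeds, durations, g, dt, s):
    """Integrate dF/dt = alpha * (-s * F + f) with gain alpha = dx/dt.

    The 'what' input f(t) = g(x(t)) is whatever the animal encounters at
    its current position x; speeds/durations define a piecewise-constant
    velocity profile.
    """
    F = np.zeros_like(s)
    x = 0.0
    for v, T in zip(speeds, durations):
        for _ in range(int(round(T / dt))):
            F += dt * v * (-s * F + g(x))
            x += v * dt
    return F

s = np.array([0.2, 0.5, 1.0])
g = lambda x: np.exp(-0.5 * ((x - 3.0) / 0.5) ** 2)   # a landmark near x = 3

# The same path (x from 0 to 10) traversed with two different speed profiles.
F_const = run_transform([1.0], [10.0], g, 1e-3, s)
F_varied = run_transform([2.0, 0.5], [2.5, 10.0], g, 1e-3, s)
# Because the gain tracks dx/dt, F_const and F_varied agree (up to Euler
# integration error): the code is over distance to the landmark, not time.
```

With signed velocity as the gain, the same units decay exponentially as the animal moves away from the landmark and climb back along the same curve as it returns, the behavior described for border cells below.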
We saw earlier that temporal context cells in the entorhinal cortex are perturbed by a specific stimulus and then relax monotonically towards their baseline firing rate over time (Fig. 2C; Tsao et al., 2018; Bright et al., 2019). Temporal context cells code the Laplace transform of a function over time, F(s) = Lf(τ). The Laplace transform of distance to the boundary, F(s) = Lf(x), would behave similarly, with exponentially-decaying receptive fields in space. As the animal moves away from a cell's preferred boundary, firing rate would decrease exponentially with distance; as the animal moved towards the boundary, the firing rate would increase along the same curve describing the receptive field (Fig. 3B). This is possible because α(t) is the signed velocity in the direction of the boundary. If a population of border cells encodes the Laplace transform of distance to the boundaries, then across neurons there should be a wide variety of spatial rate constants s in the population of border cells.

By analogy to time cells, which have receptive fields in a circumscribed region of time since a triggering event, the inverse of border cells would generate a population of neurons with circumscribed receptive fields in space. Boundary vector cells (BVCs), observed within the subiculum, have just this property, with elongated firing fields that align with boundaries of an enclosure (Lever, Burton, Jeewajee, O'Keefe, & Burgess, 2009). In fact, classical hippocampal place cells behave as if they are formed from conjunctions of BVCs (O'Keefe & Burgess, 1996; Barry et al., 2006). If BVCs and place cells are the result of an approximate inverse transform, they should have properties analogous to those observed for populations of time cells: BVCs should have more fields close to boundaries, and the width of fields should increase with distance to the boundary.

This framework organizes other "cell types" in the MTL as well. Consider a population of cells coding for the sequence of movements leading up to the present position as a function of distance traveled. In words, this population codes for a function f that carries information like "I got here by travelling North for 2 cm; before that I moved West for 10 cm . . . " In this case, the "what" information in the population would be head direction ("2 cm in the past I was facing North" or "8 cm in the past I was facing West"). In order to convey this information as a function of traveled distance, we would set α(t) to be speed (unsigned velocity in the direction of motion). Cells coding the Laplace transform of this kind of function would behave as "trajectory-dependent" or "retrospective coding" cells (Frank, Brown, & Wilson, 2000). Cells coding for the inverse transform would manifest as "splitter" cells: cells that fire differentially on the central arm of a figure-8 maze during an alternation task depending on past locations, which have been observed in the entorhinal cortex and hippocampus (Frank et al., 2000; Wood, Dudchenko, Robitsek, & Eichenbaum, 2000; Dudchenko & Wood, 2014).

Other functional cell types in the MEC can be understood as coding for spatiotemporal trajectories in the Laplace domain or as approximate inverses. When an animal pauses during a virtual navigation task, a population of MEC cells fires sequentially, recording the amount of time since the animal ceased moving (Heys & Dombeck, 2018). Even speed cells, which are believed to map the animal's instantaneous speed onto their firing rate, actually filter speed as a function of time with a spectrum of time constants (Dannenberg, Kelley, Hoyland, Monaghan, & Hasselmo, 2019). This is consistent with the idea that speed cells in MEC are actually coding the Laplace transform of the history of speed in the time leading up to the present.
The characteristic predictions from this theoretical approach are best evaluated at the level of populations, and manifest largely as distributions of parameters.

Accumulated evidence and decision-making
In many simple decision-making experiments, noisy instantaneous evidence must be integrated over time in order to reach a confident decision. Decades of work in mathematical psychology has resulted in sophisticated computational models for simple evidence accumulation tasks (Luce, 1986; Smith & Ratcliff, 2004). The best known is the diffusion model (Ratcliff, 1978) (Figure 4). At each moment during the decision, the observer samples some evidence. The "particle's position" at any moment, X_t, describes the accumulated evidence for each alternative up to that point. This abstract model aligns to a strategy in which the starting position (usually referred to as z) is controlled by the decision-maker's prior expectations and the boundary separation (usually referred to as a) describes the degree of confidence the decision-maker requires before making a choice (Gold & Shadlen, 2007). We can understand the evidence at any moment t as a function f(x) with a peak at a single value X_t. The time derivative of the position of the particle is just the instantaneous evidence sampled at time t.

Figure 4. A Laplace-domain implementation of the diffusion model for evidence accumulation. The diffusion model describes the internal state while a decision is being made as a particle moving towards two absorbing boundaries, each corresponding to one of two possible decisions. At each moment of the decision, the position of the particle is a delta function located at a position X_t that starts at a position z and moves between the two boundaries. In the Laplace domain implementation of the diffusion model, two populations code for the distance to each of the two decision bounds. Cells coding for the Laplace transform of distance-to-bound, F(s), ramp their firing up (or down) as evidence accumulates. These cells have exponential receptive fields over the decision axis. Different populations code for each of the two boundaries. We distinguish the two populations as F_L(s) and F_R(s). Within each population, different neurons have different values of s. Cells coding for the inverse Laplace transform, f̃(∗x), have receptive fields that tile each decision axis. Within each population different cells have different receptive field centers. After Howard et al. (2018).

With this understanding, it is straightforward to build a Laplace-domain model of the diffusion model by constructing two populations, one of which estimates the distance of X_t to the lower bound and one that estimates the distance to the upper bound.
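A minimal numerical sketch of this two-population scheme follows; the parameter values, decision threshold, and discretization are our own illustrative choices, not taken from the source.

```python
# Two integrator banks code the Laplace transforms of the distance from the
# diffusing "particle" to each decision bound, updated with opposite gains
# alpha_L(t) = -alpha_R(t) driven by the momentary evidence. All parameter
# values here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
s = np.array([0.5, 1.0, 2.0, 4.0, 8.0])   # spectrum of rates across cells
dt, a, z, drift, noise = 1e-3, 2.0, 1.0, 1.5, 1.0
thresh = np.exp(-s[-1] * 0.05)            # "distance to bound below 0.05"

x = z                                     # particle position in [0, a]
F_L = np.exp(-s * x)                      # transform of distance to lower bound
F_R = np.exp(-s * (a - x))                # transform of distance to upper bound
choice = None
for _ in range(200_000):
    dx = drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
    x += dx
    F_L *= np.exp(-s * dx)                # alpha_L(t) = +dx/dt
    F_R *= np.exp(+s * dx)                # alpha_R(t) = -dx/dt
    if F_R[-1] > thresh:
        choice = "upper"; break
    if F_L[-1] > thresh:
        choice = "lower"; break
# The two populations remain consistent throughout: the two distances to
# the bounds always sum to the boundary separation a.
```

Note that each bank is updated purely by a gain applied to its own cells; no information passes between cells with different values of s, which is the computational advantage of working in the transform.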
We will subscript the two populations such that F_R(s) and F_L(s) correspond to the Laplace transforms of these two functions and f̃_R(∗x) and f̃_L(∗x) correspond to the inverse transforms (Figure 4; Howard, Luzardo, & Tiganj, 2018). In the diffusion model, evidence for one alternative reduces the evidence for the other alternative, so α_L(t) = −α_R(t). A decision is made when the "particle" reaches the smallest value of ∗x in one of the populations. Setting α_L and α_R to be non-zero and have the same sign effectively changes the decision bounds. Positive paired values of α have the effect of widening the decision bounds, whereas negative paired values of α have the effect of collapsing the decision bounds, enabling speeded responses.

Whereas in many neural models of evidence accumulation x is carried by the average firing rate of many neurons (Zandbelt, Purcell, Palmeri, Logan, & Schall, 2014), the Laplace domain representation represents x as a distributed pattern of firing across many neurons indexed by their value of s. This population has receptive fields that are exponential curves in the decision variable, much like leaky integrator models for decision-making (Busemeyer & Townsend, 1993). This approach further predicts that there should be a heterogeneous distribution of s values across neurons, analogous to recent findings from rodent cortex (Koay, Thiberge, Brody, & Tank, 2019). It is precisely this heterogeneity across neurons—the fact that s forms a continuum—that allows the population to code the Laplace transform of accumulated evidence. The inverse transform leads to neurons with compact receptive fields along the decision axis, analogous to empirical findings from rodent posterior parietal cortex (Morcos & Harvey, 2016).

Cognitive models built entirely of transform/inverse pairs
We have seen evidence that memory—data represented as functions over time and space—and evidence accumulation—a function over position within a decision space—can both be represented with the same form of neural circuit for encoding the Laplace transform and inverse. Many detailed cognitive models of memory tasks include a memory component and an evidence accumulation component in describing behavioral data (Ratcliff, 1978; Nosofsky & Palmeri, 1998; Sederberg, Howard, & Kahana, 2008; Donkin & Nosofsky, 2012). Moreover, detailed models of evidence accumulation make use of memory for past outcomes to make sense of sequential dependencies in RTs (Kornblum, 1973), manifest as changes in bias, drift rate, or boundary separation (Gold, Law, Connolly, & Bennur, 2008; Urai, De Gee, Tsetsos, & Donner, 2019). Neurally, reward history can be decoded from neural activity in several brain regions (Morcos & Harvey, 2016; Marcos et al., 2013). Bernacchia and colleagues (Bernacchia, Seo, Lee, & Wang, 2011) estimated the time scale over which cortical neurons were modulated by reward history and found a wide range of decay rates, very much consistent with the idea that the population contained information about the Laplace transform of the history of rewards.

This convergence between memory and evidence accumulation suggests the possibility that two interrelated systems using the same mathematical form interact with one another. That is, perhaps the same equations that govern memory for reward history over tens of minutes (Bernacchia et al., 2011) also govern the evolution of evidence between two decision bounds over the scale of less than a second (Koay et al., 2019). The reward history could be used to set the bias parameter of the evidence accumulator, so that segment of the computational cognitive model could be built from the same form of equations. Cognitive architectures (Laird, 2012; J. R. Anderson, 2013) have long provided self-contained models for cognitive performance, with many interacting modules contributing to any particular task. Perhaps, if the same neural circuit can be used for the evidence accumulation and working memory modules, one could use the same kind of canonical Laplace circuit to construct all (or most of) the modules needed to perform a complete task. A general cognitive computer built along these lines would require sequential operation of "cognitive programs."
A gating signal such as α(t), which can be externally set, could be used to implement conditional flow control of the sequence of operations. We discuss two additional considerations that suggest the Laplace domain could be well suited for a more general cognitive computer. First, it is mathematically straightforward to write out efficient data-independent operators using the Laplace representation. These can be understood as population-level modulations of circuits and, at least in the case of the translation operator, can lead to interesting connections to neurophysiology. Second, neural evidence suggests neural representations of sequences of motor actions can be understood as functions over planned future time. This suggests that other sequences—for instance, sequences of cognitive operations—could also be constructed as functions over a planned future. These properties are necessary (but certainly not sufficient) to develop a general computing device to mimic human cognition (Gallistel & King, 2011). Early computational work has demonstrated the feasibility of this approach at least for a few simple laboratory memory tasks (Tiganj et al., 2019).

Efficient data-independent operators in the Laplace domain

The properties of the Laplace domain make it particularly well-suited for data-independent operations. Given data—in the form of functions represented as F(s)/f̃(∗x) pairs—these operators generate an appropriate answer for every possible function they could encounter. For instance, an addition operator should not need to know in advance what pair of numbers will be added together and should work effectively on numbers it has never experienced before. Properties of the Laplace domain provide efficient recipes for data-independent operators. We discuss several of these here.

The translation operator takes a function f(x) and shifts it by some amount to f(x + δ). Consider how to implement translation of a function represented by a neural population f̃(∗x).
We would need to transfer information from each cell ∗x_o to a translated cell ∗x_o + δ. This could be implemented via a functional connection between pairs of cells—i.e., a matrix of synaptic connections. However, because we do not know a priori what displacement δ will be required, to be useful for all possible translations this hypothesized circuit must connect every neuron in f̃(∗x) with every other neuron. Translation in the Laplace domain is computationally much simpler. If F(s) is the transform of f(x), the transform of the translated function f(x + δ) is simply e^{−sδ} F(s). That is, the activity of each cell coding for the transform is multiplied by a number that depends on s and δ. There is no need for information to be exchanged between cells in F. To examine the translated function, we simply need to invert the transform with L⁻¹ₖ and obtain an estimate of f(x + δ).

Translation is potentially useful for many problems that arise in cognitive science. For instance, translating functions over time can be used to predict the future. A model implementing function translation to predict the future (Shankar, Singh, & Howard, 2016) can be mapped onto theta phase precession in the hippocampus and associated regions (van der Meer & Redish, 2011). The key neurobiological property necessary to implement translation in this model is the ability to dynamically modulate synaptic weights over the course of theta oscillations, a property that has been observed in field potential recordings (Wyble, Linster, & Hasselmo, 2000). Translation could also be useful in manipulating visual representations or generating planned movements in allocentric space (e.g., Johnson & Redish, 2007).

Addition provides another example. Consider two functions f(x) and g(x) representing two specific numbers x_f and x_g. We can imagine f(x) as a flat function except for a peak at the value x_f and g(x) as flat except for a peak at x_g. What would we desire for a function representing the sum of these two numbers?
Simply adding f(x) + g(x) is clearly not what we want—this would give two peaks, one at x_f and the other at x_g, which is not understandable as a single number. A moment's reflection shows that we would want the representation [f + g](x) to have a single peak at x_f + x_g. The convolution of two functions produces just this answer. The convolution of two functions is written f ∗ g. Much like translation, convolution performed directly on functions is computationally demanding. To directly convolve a population of cells f̃ and another population g̃ would require one to take the product of the activation of all possible pairs of cells and then sum the results, keeping separate the information about the difference in ∗x between them. While this is possible to compute, it would require many connections and a relatively elaborate circuit. In contrast, convolution is much simpler in the Laplace domain. In particular, L[f ∗ g] = F(s) G(s). That is, to construct the transform of the convolution of two functions, we need only take the product of the transform of each of the functions at each s. To invert the transform and obtain a direct estimate of the answer, we would apply L⁻¹ₖ as L⁻¹ₖ[F(s) G(s)]. A neural circuit implementing this mathematical operation would provide a sensible answer for any pair of numbers.

To subtract a pair of numbers, we need an operator that is the inverse of addition. The inverse to convolution is referred to as cross-correlation, f ⋆ g. Like convolution, the Laplace transform of the cross-correlation of two functions is relatively simple: L[f ⋆ g] = F(s) G(−s).
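These recipes can be checked directly on "numbers" encoded as delta functions, where the transform of a delta at x is e^{−sx} and translation, addition, and subtraction all reduce to elementwise operations on the population vector; the grid of s values and the particular numbers below are our toy choices.

```python
# Laplace-domain translation, addition (convolution), and subtraction
# (cross-correlation) on "numbers" encoded as delta functions. With
# F(s) = exp(-s * x_f), all three operators become elementwise products
# across the population. Grid and values are illustrative.
import numpy as np

s = np.linspace(0.1, 4.0, 40)    # spectrum of s values across cells

def encode(x):
    """Transform of a delta function at x: F(s) = exp(-s x)."""
    return np.exp(-s * x)

F, G = encode(3.0), encode(4.0)

shifted = np.exp(-s * 2.0) * F   # translation by delta = 2 moves 3 -> 5
summed = F * G                   # L[f conv g] = F(s) G(s): 3 + 4 = 7
diff = G * encode(-3.0)          # L[g cross f] = G(s) F(-s): 4 - 3 = 1
```

Each result equals the encoding of the expected number exactly, because for delta functions the transform-domain products implement the shift and convolution theorems with no approximation; compression enters only when the inverse L⁻¹ₖ is applied to read the answer out.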
Although there are certainly important problems to solve in constructing a detailed neural model of subtraction, the existence of an inverse operator to addition eliminates a conceptual obstacle to constructing a number system: any pair of "numbers" (represented as functions over a population of neurons) could be combined to obtain a new "number." The compression of neural representations f̃(∗x) means that our estimate of number is not precise, but of course the brain's estimate of number is also imprecise (Gallistel & Gelman, 2000; Feigenson, Dehaene, & Spelke, 2004; Nieder & Dehaene, 2009). Notably, the quantitative form of compression of the brain's number system is believed to be similar to the compression of retinal coordinates in the cortex (Schwartz, 1977; Van Essen, Newsome, & Maunsell, 1984) and is at least roughly consistent with the form of compression of time shown by time cells. Moreover, a general subtraction operator could be used across many different cognitive domains.

Are planned actions represented using the Laplace domain?

In order to build "cognitive programs" it would be necessary to sequentially gate information in and out of memory and to and from the evidence accumulation circuit. Gating can, in principle at least, be implemented via oscillatory dynamics in the brain (Sherfey et al.). Neural evidence suggests that sequences of planned motor actions are represented as functions over future time: what will happen when in the planned future. In a task where monkeys had to make a series of movements, recordings from the lateral PFC showed neurons that code for what motion an animal makes in sequence (Mushiake, Saito, Sakamoto, Itoyama, & Tanji, 2006). That is, the animal had to perform a series of movements, say moving a cursor left-right-down. As the sequence unfolded, cells fired conjunctively for specific movements (e.g., left or down) but only in specific positions in the sequence (e.g., first, second, or third).
In much the same way a stimulus-specific time cell fires only when its preferred stimulus is in its temporal receptive field (Tiganj et al., 2018; Taxidis et al., 2018), these cells fire when their preferred movement occurs in their sequential receptive field. Notably, these populations in lPFC also fired in the moment before the sequence was initiated (Mushiake et al., 2006), retaining the coding properties that will occur in the future movement. This pre-movement firing was as if the entire sequential plan was quickly loaded into memory prior to movement initiation. Smooth reaching movements also result in sequentially activated cells in motor cortex (Lebedev et al., 2019). The similarity to stimulus-specific time cells suggests that these neural populations could code for an estimate of a function of sequential actions.

By analogy to the Laplace transform of the past, cells coding the Laplace transform of planned future actions would manifest as cells that ramp to the time when an event will take place. Ramping neurons during movement preparation have been observed in prefrontal cortices (Narayanan, 2016), including anterior lateral motor (ALM) cortex (Li, Daie, Svoboda, & Druckmann, 2016; Inagaki, Inagaki, Romani, & Svoboda, 2018; Svoboda & Li, 2018; Inagaki, Fontolan, Romani, & Svoboda, 2019). Neurons in ALM in particular can be used to decode what movement will occur and how far in the future (Li et al., 2016). Note that when s is small, an exponential function is approximately linear. If this ALM population codes for the Laplace transform of time until a planned movement, this predicts that different cells should ramp at a variety of rates.

Discussion
This paper pursues the hypothesis that the brain represents functions in the world as activity over populations of neurons. The parameters of the receptive fields of these neurons trace out a continuum, and the brain uses two distinct forms of receptive fields. Exponential receptive fields enable a population to code for the Laplace transform of a function; circumscribed receptive fields enable a compressed estimate of the function itself. We reviewed evidence that the brain maintains both of these kinds of representation for functions over past time in the EC and hippocampus. Computationally, this approach can be used to estimate functions over many other variables. Considering spatial variables, we can make sense of border cells, boundary vector cells, and other functional cell types in the hippocampus and related regions. We reviewed computational work showing that widely-used cognitive models for evidence accumulation can be cast in this framework, making it possible to constrain such models with both behavioral and neurophysiological data.
Computational neuroscience and cognition
As our ability to measure activity from large numbers of neurons grows, it will be increasingly necessary to have ways of understanding the collective behavior of large numbers of neurons (Yuste, 2015; Hasselmo, 2015). The basic unit of analysis we have argued for is not the neuron, but rather populations of neurons representing and manipulating continua. This is analogous to the approach taken in many fields of physics, where it has long been appreciated that theories should describe phenomena at an appropriate level of detail (P. W. Anderson, 1972). For instance, fluid dynamics describes the flow of liquids not in terms of molecules but in terms of incompressible volume elements. To determine the flow of water in a pipe one does not need to worry at all about chemistry. If we could measure the position of each individual water molecule during an experiment, we could evaluate the theory, but the theory would be equally correct no matter whether we understand the chemistry of water molecules or if the incompressible volume element was made of green cheese. A different theory would be required to understand why some liquids have different viscosities than others. Returning to neuroscience, if the approach in this paper has merit, it suggests a number of specific problems that are tractable in the context of computational neuroscience. How does a population of temporal context cells manage to have a specific distribution of time constants? How do neural circuits implement L⁻¹ₖ? We discuss some possibilities in Appendix 1, but the larger point is that this approach segments the computational neuroscience of circuits of neurons from cognitive neuroscience.
Understanding cognition starting from individual neurons is kind of like trying to understand the flow of water through a channel starting with a model of the Bohr atom.

If it is really the case that populations of neurons organize themselves to estimate continua, then this places constraints on the data analysis tools we use to study populations of neurons. Thus far, the strategy taken with time cells and temporal context cells has been to construct a hypothesis about the specific variable being represented and then estimate individual receptive fields to hopefully trace out a continuum of parameters across neurons corresponding to ∗τ or s. This approach could in principle be applied piecemeal to problems in different brain regions, but there are significant challenges. First, even in the hippocampus, cells have receptive fields along more than one kind of variable. For instance, consider the situation when an animal is placed on a treadmill with varying speeds. Because the speed changes from trial to trial, the time since the run started is deconfounded from distance traveled. We would expect "time cells" to care only about time and "distance cells" to care about distance. However, all possible combinations of time and distance sensitivity are observed, with time cells and distance cells as special cases of a continuous mixture (Kraus et al., 2013; Howard & Eichenbaum, 2015). Second, our a priori hypotheses about what a specific population of cells codes for depend on prior work. A data-driven approach to neural data analysis would avoid these kinds of problems. However, widely used data-driven approaches can be ill-suited to discover continua. For instance, individual reaching movements generate sequences of activity in motor cortex strikingly similar to sequences of time cells (Lebedev et al., 2019). These sequences can be readily understood as cells tiling a continuum, f̃(∗x).
But data-driven dimensionality reduction methods can identify rotational dynamics from the same kinds of data (Churchland et al., 2012; Aoi, Mante, & Pillow, 2019). Overcoming this problem will require data-driven tools that look for multidimensional continua in neural coding.
Computational models of natural and artificial cognition
Laboratory cognitive tasks allow us to study behavior in a quantitative way under tightly controlled circumstances. Although this approach is quite artificial relative to real-world cognition, it places strong constraints on computational models of behavior. However, recent work in mathematical psychology has shown that even very successful cognitive models cannot be uniquely identified using behavioral data alone (Jones & Dzhafarov, 2014). Joint modeling of neural and behavioral data is a promising avenue to constrain cognitive models (Turner, Sederberg, Brown, & Steyvers, 2013; Turner, Forstmann, Love, Palmeri, & Van Maanen, 2017; Palestro, Bahg, et al., 2018), but it does not solve the problem of determining whether a particular cognitive model is neurally plausible a priori. If we knew with certainty that populations of neurons really do represent continua via the Laplace transform, and that those continua have a specific form of compression, this would place a strong constraint on detailed cognitive models of behavior.

If thoughts map onto functions, then thinking maps onto manipulating those functions. The Laplace domain provides recipes for data-independent operators that could be used to manipulate and compare functions—to think. As such, this way of viewing cognition and neurophysiology sidesteps many of the conceptual concerns that have traditionally dogged connectionist models and much of contemporary deep learning approaches.
Appendix 1: Possible neurophysiological mechanisms for the Laplace transform
There are three main requirements to implement the Laplace transform/inverse coding scheme for functions over arbitrary variables. First, the Laplace transform requires that neurons have a wide range of functional time constants that are very large compared to membrane time constants. Second, to invert the Laplace transform it is necessary for a circuit to implement the L⁻¹ₖ operator. Third, to enable coding of the Laplace transform of functions other than time, it is necessary to manipulate the gain of neurons. This appendix sketches possible neurophysiological mechanisms for these three computational functions. There are almost certainly other possible mechanisms that could give rise to these properties, and there is no guarantee, even assuming that different brain regions obey the same equations, that they are implemented using the same mechanisms in different regions.

Neurophysiological data indicate that neural circuits could implement the mechanisms of the Laplace transform. The real part of the Laplace transform corresponds to exponential decay with a spectrum of time constants. Recurrent network connections could generate slow time constants, but it is also possible that intracellular mechanisms contribute to exponential decay with a variety of time constants across cells. Intracellular recordings in cortical slice preparations show persistent firing over a range of time scales in the absence of synaptic input. For instance, spike frequency accommodation of neurons in piriform cortex shows a pattern of exponential decay over hundreds of milliseconds (Barkai & Hasselmo, 1994). Entorhinal cortex slice preparations show exponential decay in persistent firing rate over seconds (Tahvildari, Fransén, Alonso, & Hasselmo, 2007; Knauer, Jochems, Valero-Aracama, & Yoshida, 2013).
At the upper limit, isolated neurons in slices from entorhinal (Egorov, Hamam, Fransén, Hasselmo, & Alonso, 2002) and perirhinal cortex (Navaroli, Zhao, Boguszewski, & Brown, 2011) integrate their inputs and maintain persistent firing for arbitrarily long periods of time. These cells show effectively infinite time constants in the absence of synaptic inputs. The decay in persistent firing can be modeled based on the properties of nonspecific calcium-dependent cation current and calcium diffusion (Fransén, Tahvildari, Egorov, Hasselmo, & Alonso, 2006; Tiganj, Hasselmo, & Howard, 2015).

The entorhinal cortex provides input to the hippocampus, so that the population coding for F(s) is one synapse away from the population coding for f̃(∗τ). Because the equation f̃(∗τ) = L⁻¹ₖ F(s) is mathematically true, there should be some way to understand the functional mapping between the regions as an approximate inverse Laplace transform L⁻¹ₖ. The inverse Laplace transform requires combining the different exponential decay rates with different positive and negative values (Eq. 4, Appendix 2). The simplest way to think of this is subtraction of an exponential function with a faster decay from an exponential of the same starting value with slower decay. This will result in a function that peaks at a time point dependent upon the difference of the two time constants. If we multiplied both of the time constants by the same number, the difference would peak at a proportionally later time. A biologically detailed spiking model of the inverse Laplace transform (Liu et al., 2019) can be built from a series of additions and subtractions in which a particular time constant has subtractions from time constants close in value. These derivatives with respect to s (Eq. 4) are analogous to center-surround receptive fields (Marr & Hildreth, 1980), only in s rather than in retinal position. Methods for blind source separation, including independent component analysis (Bell & Sejnowski, 1997), could in principle discover receptive fields of this form.
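The difference-of-exponentials intuition is easy to verify numerically; the time constants below are arbitrary illustrative values.

```python
# Subtracting a faster-decaying exponential from a slower one with the same
# initial value produces a curve that peaks at a delay set by the two time
# constants; doubling both constants doubles the peak time. Values are
# illustrative, not drawn from the source.
import numpy as np

t = np.linspace(0.0, 10.0, 10001)

def diff_of_exps(tau_slow, tau_fast):
    return np.exp(-t / tau_slow) - np.exp(-t / tau_fast)

# Analytic peak: ln(tau_slow/tau_fast) / (1/tau_fast - 1/tau_slow)
peak1 = t[np.argmax(diff_of_exps(2.0, 1.0))]   # about ln(2)/0.5 = 1.386
peak2 = t[np.argmax(diff_of_exps(4.0, 2.0))]   # both constants doubled
```

Scaling both time constants by the same factor rescales the peak time by that factor, which is exactly the behavior needed for a family of such difference units to tile a continuum of ∗τ values.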
Appendix 2: Mathematics of the Laplace transform
Consider a population of leaky integrators indexed by their rate parameter s. Each of the neurons in this population receives the same input f(t) at each moment and updates its firing rate as

dF(s)/dt = −s F(s) + f(t).   (1)

We understand F(s) as describing the activity of a large number of neurons with many values of s. Note that Eq. 1 only requires information about f and F at the present moment. However, the solution to Eq. 1 gives the real Laplace transform of the entire function f(τ) running backwards from the present infinitely far into the past:

F(s) = ∫₀^∞ f(τ) e^{−sτ} dτ,   (2)

where we understand f(τ) on the right-hand side to be the series of inputs ordered from the present towards the past. That is, the f(τ) on the right-hand side of Eq. 2 is related to f(t) in Eq. 1 as f(τ) ≡ f(t − τ). Another way to say this is that F(s), the pattern of activity at time t, is the Laplace transform of the entire history of f leading up to the present.

The Post approximation (Post, 1930) provides a neurally-realistic method for approximately inverting the Laplace transform. This method allows us to take the set of cells coding for F(s), each with a different value of s, and map them onto a new population of cells that estimate the original function. We index those cells by a parameter ∗τ and write f̃(∗τ) to refer to the firing rate of the entire population. The approximation of f(τ) is computed as follows:

f̃(∗τ) = L⁻¹ₖ F(s)   (3)
      = C_k s^{k+1} (d^k/ds^k) F(s).   (4)

The parameter k controls the precision of the approximation. Post proved that in the limit as k → ∞, f̃(∗τ) = f(τ). The internal estimate of past time ∗τ is related to s as ∗τ = k/s. This means that ∗τ is proportional to the time constant 1/s.
The value of ∗τ has a physical meaning in that it gives the time lag at which each cell in f̃ would peak following a delta-function input.

Equation 4 describes a mapping from a population of cells indexed by s to another population indexed by ∗τ. To understand the mechanism of the inverse operator, let's consider Eq. 4 from the perspective of a particular cell with a particular value ∗τ_o. The time dependence on the right-hand side comes entirely from the derivative term—C_k is a constant that is the same for all cells and s^{k+1} is a scaling factor specific to the value of ∗τ_o. The derivative term says that the firing rate f̃(∗τ_o) is controlled by the kth derivative with respect to s in the neighborhood of a specific value of s, s_o = k/∗τ_o. Computing the kth derivative requires comparing the firing rates of other cells in the neighborhood of s_o (Shankar & Howard, 2013).

To generalize to functions over variables other than time, we allow the gain of all of the neurons coding for F(s) to be modulated together by a time-dependent function α(t):

dF(s)/dt = α(t) [−s F(s) + f(t)].   (5)

When α(t) = 1, this expression reduces to Eq. 1. Consider the situation where f(t) is a delta-function input at t = 0. This initializes the activation at 1 for all units. This is the Laplace transform of a delta function at x = 0. If, in the time after t >
0, we find α(t) = 1, then F(s) at time t will code for the Laplace transform of the time since the delta function, F(s) = e^{−st}. However, if α(t) were some positive constant α_o greater or less than one, we would find F(s) = e^{−s(α_o t)}. That is, changing α_o from 1 is equivalent to making time go faster or slower. If we found F(s) in a state where F(s) = e^{−sx} for some value of x, and then we set α to some specific value α_o, we would find after some time displacement Δt that F(s) is now

e^{−s α_o Δt} e^{−sx} = e^{−s(x + α_o Δt)}.

If we could arrange for α_o to be the rate of change of x during this interval, then our new value of F(s) = e^{−s(x + Δx)}. Note that this is true whether Δx is positive or negative. This means that during an interval where f(t) = 0, if α(t) = dx/dt then F(s) records the Laplace transform of f(x) rather than f(t).

Although there are many variables that could be productively represented in this way, there are two potentially important limitations to this approach. First, if f(t) is to be non-zero, f(t) must be an implicit function of x, f[x(t)]. This makes sense if f(t) corresponds to, say, contact with a landmark in a spatial environment, but can lead to complications in general. Second, significant problems arise when one attempts to use this approach to represent values of x <
0. To make this concrete, note that Eq. 5 works for both positive and negative rates of change—as we would expect in a spatial navigation task where the animal can move either to the left or to the right. Suppose one starts with F(s) = e^{−s·0} = 1, each cell at a high firing rate. If we set α to α_o and evolve Eq. 5 for some time, we find F(s) = e^{−s α_o Δt}. If α_o is positive, each of the cells decays from 1. However, if α_o is negative, the cells increase their firing rate exponentially from 1, growing without bound. Moreover, the inverse operator does not behave well as x passes through zero.

Acknowledgments
Alexander Howard helped with Figure 2. Supported by ONR MURI N00014-16-1-2832 and NIBIB R01EB022864. The authors thank Randy Gallistel, Per Sederberg, Josh Gold, Aude Oliva, and Nathaniel Daw for helpful conversations.
References
Akhlaghpour, H., Wiskerke, J., Choi, J. Y., Taliaferro, J. P., Au, J., & Witten, I. (2016). Dissociated sequential activity and stimulus encoding in the dorsomedial striatum during spatial working memory. eLife, e19507.

Anderson, J. R. (2013). The adaptive character of thought. Psychology Press.

Anderson, P. W. (1972). More is different. Science, (4047), 393-396.

Aoi, M. C., Mante, V., & Pillow, J. W. (2019). Prefrontal cortex exhibits multi-dimensional dynamic encoding during decision-making. bioRxiv, 808584.

Balsam, P. D., & Gallistel, C. R. (2009). Temporal maps and informativeness in associative learning. Trends in Neuroscience, (2), 73-78.

Barkai, E., & Hasselmo, M. E. (1994). Modulation of the input/output function of rat piriform cortex pyramidal cells. Journal of Neurophysiology, (2), 644-658.

Barry, C., Lever, C., Hayman, R., Hartley, T., Burton, S., O'Keefe, J., . . . Burgess, N. (2006). The boundary vector cell model of place cell firing and spatial memory. Reviews in Neuroscience, (1-2), 71-97.

Behrens, T. E. J., Muller, T. H., Whittington, J. C. R., Mark, S., Baram, A. B., Stachenfeld, K. L., & Kurth-Nelson, Z. (2018). What is a cognitive map? Organizing knowledge for flexible behavior. Neuron, (2), 490-509. doi: 10.1016/j.neuron.2018.10.002

Bell, A. J., & Sejnowski, T. J. (1997). The independent components of natural scenes are edge filters. Vision Research, (23), 3327-3338.

Bernacchia, A., Seo, H., Lee, D., & Wang, X. J. (2011). A reservoir of time constants for memory traces in cortical neurons. Nature Neuroscience, (3), 366-372.

Bhandari, A., & Badre, D. (2018). Learning and transfer of working memory gating policies. Cognition, 89-100.

Bright, I. M., Meister, M. L. R., Cruzado, N. A., Tiganj, Z., Howard, M. W., & Buffalo, E. A. (2019). A temporal record of the past with a spectrum of time constants in the monkey entorhinal cortex. bioRxiv, 688341.

Brown, G. D. A., Neath, I., & Chater, N. (2007). A temporal ratio model of memory. Psychological Review, (3), 539-576.

Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, (3), 432-459.

Chance, F. S., Abbott, L. F., & Reyes, A. D. (2002). Gain modulation from background synaptic input. Neuron, (4), 773-782.

Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I., & Shenoy, K. V. (2012). Neural population dynamics during reaching. Nature, (7405), 51.

Cohen, N. J., & Eichenbaum, H. (1993). Memory, amnesia, and the hippocampal system. Cambridge, MA: The MIT Press.

Cruzado, N. A., Tiganj, Z., Brincat, S. L., Miller, E. K., & Howard, M. W. (2019). Conjunctive representation of what and when in monkey hippocampus and lateral prefrontal cortex during an associative memory task. bioRxiv, 709659.

Dannenberg, H., Kelley, C., Hoyland, A., Monaghan, C. K., & Hasselmo, M. E. (2019). The firing rate speed code of entorhinal speed cells differs across behaviorally relevant time scales and does not depend on medial septum inputs.
Journal of Neuroscience , 1450–18.Donkin, C., & Nosofsky, R. M. (2012). A power-law model of psychological memory strength inshort- and long-term recognition.
Psychological Science . doi: 10.1177/0956797611430961Dudchenko, P. A., & Wood, E. R. (2014). Splitter cells: hippocampal place cells whose firing ismodulated by where the animal is going or where it has been. In
Space, time and memory inthe hippocampal formation (pp. 253–272). Springer.Egorov, A. V., Hamam, B. N., Frans´en, E., Hasselmo, M. E., & Alonso, A. A. (2002). Gradedpersistent activity in entorhinal cortex neurons.
Nature , (6912), 173-8.Eichenbaum, H. (2014). Time cells in the hippocampus: a new dimension for mapping memories. Nature Reviews Neuroscience , (11), 732-44. doi: 10.1038/nrn3827Eichenbaum, H., Dudchenko, P., Wood, E., Shapiro, M., & Tanila, H. (1999). The hippocampus,memory, and place cells: is it spatial memory or a memory space? Neuron , (2), 209-226.Feigenson, L., Dehaene, S., & Spelke, E. (2004). Core systems of number. Trends in CognitiveSciences , (7), 307–314.Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A criticalanalysis. Cognition , (1), 3–71.Frank, L. M., Brown, E. N., & Wilson, M. (2000). Trajectory encoding in the hippocampus andentorhinal cortex. Neuron , (1), 169-178.Frans´en, E., Tahvildari, B., Egorov, A. V., Hasselmo, M. E., & Alonso, A. A. (2006). Mechanismof graded persistent cellular activity of entorhinal cortex layer V neurons. Neuron , (5), OWARD AND HASSELMO Trends in Cognitive Sciences , (2), 59–65.Gallistel, C. R., & King, A. P. (2011). Memory and the computational brain: Why cognitive sciencewill transform neuroscience (Vol. 6). John Wiley & Sons.Gold, J. I., Law, C.-T., Connolly, P., & Bennur, S. (2008). The relative influences of priors andsensory evidence on an oculomotor decision variable during perceptual learning.
Journal ofneurophysiology , (5), 2653–2668.Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annual Review Neuro-science , , 535–574.Goldman, M. S. (2009). Memory without feedback in a neural network. Neuron , (4), 621–634.Graves, A., Wayne, G., & Danihelka, I. (2014). Neural turing machines. arXiv preprintarXiv:1410.5401 .Grossberg, S., & Merrill, J. (1992). A neural network model of adaptively timed reinforcementlearning and hippocampal dynamics. Cognitive Brain Research , , 3-38.Hafting, T., Fyhn, M., Molden, S., Moser, M. B., & Moser, E. I. (2005). Microstructure of a spatialmap in the entorhinal cortex. Nature , (7052), 801-6.Hasselmo, M. E. (2012). How we remember: Brain mechanisms of episodic memory . Cambridge,MA: MIT Press.Hasselmo, M. E. (2015). If i had a million neurons: Potential tests of cortico-hippocampal theories.In
Progress in brain research (Vol. 219, pp. 1–19). Elsevier.Heys, J. G., & Dombeck, D. A. (2018). Evidence for a subcircuit in medial entorhinal cortexrepresenting elapsed time during immobility.
Nature neuroscience , (11), 1574.Howard, M. W. (2018). Memory as perception of the past: Compressed time in mind and brain. Trends in Cognitive Sciences , , 124-136.Howard, M. W., & Eichenbaum, H. (2015). Time and space in the hippocampus. Brain Research , , 345-354.Howard, M. W., Luzardo, A., & Tiganj, Z. (2018). Evidence accumulation in a Laplace decisionspace. Computational Brain and Behavior , , 237-251.Howard, M. W., MacDonald, C. J., Tiganj, Z., Shankar, K. H., Du, Q., Hasselmo, M. E., &Eichenbaum, H. (2014). A unified mathematical framework for coding time, space, andsequences in the hippocampal region. Journal of Neuroscience , (13), 4692-707. doi:10.1523/JNEUROSCI.5808-12.2014Howard, M. W., Shankar, K. H., Aue, W., & Criss, A. H. (2015). A distributed representation ofinternal time. Psychological Review , (1), 24-53.Inagaki, H. K., Fontolan, L., Romani, S., & Svoboda, K. (2019). Discrete attractor dynamicsunderlies persistent activity in the frontal cortex. Nature , (7743), 212.Inagaki, H. K., Inagaki, M., Romani, S., & Svoboda, K. (2018). Low-dimensional and monotonicpreparatory activity in mouse anterior lateral motor cortex. Journal of Neuroscience , (17),4163–4185.James, W. (1890). The principles of psychology . New York: Holt.Jin, D. Z., Fujii, N., & Graybiel, A. M. (2009). Neural representation of time in cortico-basal gangliacircuits.
Proceedings of the National Academy of Sciences , (45), 19156–19161.Johnson, A., & Redish, A. D. (2007). Neural ensembles in CA3 transiently encode paths forward ofthe animal at a decision point. Journal of Neuroscience , (45), 12176-89.Jones, M., & Dzhafarov, E. N. (2014). Unfalsifiability and mutual translatability of major modelingschemes for choice reaction time. Psychological review , (1), 1.Knauer, B., Jochems, A., Valero-Aracama, M. J., & Yoshida, M. (2013). Long-lasting intrinsicpersistent firing in rat CA1 pyramidal cells: A possible mechanism for active maintenance ofmemory. Hippocampus . OWARD AND HASSELMO Koay, S. A., Thiberge, S. Y., Brody, C., & Tank, D. W. (2019). Neural correlates of cognitionin primary visual versus neighboring posterior cortices during visual evidence-accumulation-based navigation. bioRxiv , 568766.Kornblum, S. (1973). Sequential effects in choice reaction time: A tutorial review.
Attention andperformance IV , 259–288.Kraus, B. J., Robinson, R. J., 2nd, White, J. A., Eichenbaum, H., & Hasselmo, M. E. (2013).Hippocampal “time cells”: time versus path integration.
Neuron , (6), 1090-101. doi:10.1016/j.neuron.2013.04.015Laird, J. E. (2012). The Soar cognitive architecture . MIT press.Lebedev, M. A., Ossadtchi, A., Mill, N. A., Urp´ı, N. A., Cervera, M. R., & Nicolelis, M. A. (2019).What, if anything, is the true neurophysiological significance of rotational dynamics?
BioRxiv ,597419.LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning.
Nature , (7553), 436–444.Lever, C., Burton, S., Jeewajee, A., O’Keefe, J., & Burgess, N. (2009). Boundary vector cells in thesubiculum of the hippocampal formation. Journal of Neuroscience , (31), 9771-7.Li, N., Daie, K., Svoboda, K., & Druckmann, S. (2016). Robust neuronal dynamics in premotorcortex during motor planning. Nature , (7600), 459-464.Liu, Y., Tiganj, Z., Hasselmo, M. E., & Howard, M. W. (2019). A neural microcircuit model for ascalable scale-invariant representation of time. Hippocampus , (3), 260–274.Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization (No. 8).Oxford University Press on Demand.MacDonald, C. J., Lepage, K. Q., Eden, U. T., & Eichenbaum, H. (2011). Hippocampal “time cells”bridge the gap in memory for discontiguous events.
Neuron , (4), 737-749.Marcos, E., Pani, P., Brunamonti, E., Deco, G., Ferraina, S., & Verschure, P. (2013). Neural vari-ability in premotor cortex is modulated by trial history and predicts behavioral performance. Neuron , (2), 249–255.Marcus, G. F. (2018). The algebraic mind: Integrating connectionism and cognitive science . MITpress.Marr, D., & Hildreth, E. (1980). Theory of edge detection.
Proceedings of the Royal Society ofLondon B , (1167), 187-217.Maunsell, J. H. R., & Treue, S. (2006, Jun). Feature-based attention in visual cortex. Trends inNeurosciences , (6), 317-22. doi: 10.1016/j.tins.2006.04.001McAdams, C. J., & Maunsell, J. H. (1999). Effects of attention on orientation-tuning functions ofsingle neurons in macaque cortical area v4. The Journal of Neuroscience , (1), 431–441.Mehaffey, W., Doiron, B., Maler, L., & Turner, R. (2005). Deterministic multiplicative gain controlwith active dendrites. Journal of Neuroscience , (43), 9968-9977.Mello, G. B., Soares, S., & Paton, J. J. (2015). A scalable population code for time in the striatum. Current Biology , (9), 1113–1122.Mita, A., Mushiake, H., Shima, K., Matsuzaka, Y., & Tanji, J. (2009). Interval time codingby neurons in the presupplementary and supplementary motor areas. Nature Neuroscience , (4), 502.Morcos, A. S., & Harvey, C. D. (2016). History-dependent variability in population dynamics duringevidence accumulation in cortex. Nature Neuroscience , (12), 1672–1681.Mushiake, H., Saito, N., Sakamoto, K., Itoyama, Y., & Tanji, J. (2006). Activity in the lateralprefrontal cortex reflects multiple steps of future events in action plans. Neuron , (4), 631–641.Narayanan, N. S. (2016). Ramping activity is a cortical mechanism of temporal control of action. Current opinion in behavioral sciences , , 226–230.Navaroli, V. L., Zhao, Y., Boguszewski, P., & Brown, T. H. (2011). Muscarinic receptor activationenables persistent firing in pyramidal neurons from superficial layers of dorsal perirhinal cortex. Hippocampus , 1392-1404. doi: 10.1002/hipo.20975
OWARD AND HASSELMO Nieder, A., & Dehaene, S. (2009). Representation of number in the brain.
Annual Review ofNeuroscience , , 185-208. doi: 10.1146/annurev.neuro.051508.135550Nosofsky, R. M., & Palmeri, T. J. (1998). An exemplar-based random walk model of speededclassification. Psychological Review , , 266-300.O’Keefe, J., & Burgess, N. (1996). Geometric determinants of the place fields of hippocampalneurons. Nature , (6581), 425-428.O’Keefe, J., & Dostrovsky, J. (1971). The hippocampus as a spatial map. preliminary evidence fromunit activity in the freely-moving rat. Brain Research , (1), 171-175.O’Keefe, J., & Nadel, L. (1978). The hippocampus as a cognitive map . New York: Oxford UniversityPress.Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties bylearning a sparse code for natural images.
Nature , (6583), 607–609.Palestro, J. J., Bahg, G., Sederberg, P. B., Lu, Z.-L., Steyvers, M., & Turner, B. M. (2018). A tutorialon joint models of neural and behavioral measures of cognition. Journal of MathematicalPsychology , , 20–48.Palestro, J. J., Weichart, E., Sederberg, P. B., & Turner, B. M. (2018). Some task demands inducecollapsing bounds: Evidence from a behavioral analysis. Psychonomic Bulletin & Review , (4), 1225–1248.Pastalkova, E., Itskov, V., Amarasingham, A., & Buzsaki, G. (2008). Internally generated cellassembly sequences in the rat hippocampus. Science , (5894), 1322-7.Poirazi, P., Brannon, T., & Mel, B. W. (2003). Arithmetic of subthreshold synaptic summation ina model CA1 pyramidal cell. Neuron , (6), 977-87.Post, E. (1930). Generalized differentiation. Transactions of the American Mathematical Society , , 723-781.Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review , , 59-108.Rossi-Pool, R., Zizumbo, J., Alvarez, M., Vergara, J., Zainos, A., & Romo, R. (2019). Temporal sig-nals underlying a cognitive process in the dorsal premotor cortex. Proceedings of the NationalAcademy of Sciences , , 7523–7532.Salz, D. M., Tiganj, Z., Khasnabish, S., Kohley, A., Sheehan, D., Howard, M. W., & Eichenbaum,H. (2016). Time cells in hippocampal area CA3. Journal of Neuroscience , , 7476-7484.Schwartz, E. L. (1977). Spatial mapping in the primate sensory projection: analytic structure andrelevance to perception. Biological Cybernetics , (4), 181-94.Sederberg, P. B., Howard, M. W., & Kahana, M. J. (2008). A context-based theory of recency andcontiguity in free recall. Psychological Review , , 893-912.Shankar, K. H., & Howard, M. W. (2010). Timing using temporal context. Brain Research , ,3-17.Shankar, K. H., & Howard, M. W. (2012). A scale-invariant internal representation of time. NeuralComputation , (1), 134-193.Shankar, K. H., & Howard, M. W. (2013). Optimally fuzzy temporal memory. 
Journal of MachineLearning Research , , 3753-3780.Shankar, K. H., Singh, I., & Howard, M. W. (2016). Neural mechanism to simulate a scale-invariantfuture. Neural Computation , , 2594–2627.Sherfey, J. S., Ardid, S., Miller, E. K., Hasselmo, M. E., & Kopell, N. J. (2019). Prefrontal oscillationsmodulate the propagation of neuronal activity required for working memory. bioRxiv , 531574.Silver, R. A. (2010). Neuronal arithmetic. Nature Reviews Neuroscience , (7), 474–489.Smith, P. L., & Ratcliff, R. (2004). Psychology and neurobiology of simple decisions. Trends inneurosciences. , (3), 161-8.Solstad, T., Boccara, C. N., Kropff, E., Moser, M. B., & Moser, E. I. (2008). Representation ofgeometric borders in the entorhinal cortex. Science , (5909), 1865-8.Svoboda, K., & Li, N. (2018). Neural mechanisms of movement planning: motor cortex and beyond. Current opinion in neurobiology , , 33–41. OWARD AND HASSELMO Tahvildari, B., Frans´en, E., Alonso, A. A., & Hasselmo, M. E. (2007). Switching between ”On” and”Off” states of persistent activity in lateral entorhinal layer III neurons.
Hippocampus , (4),257-63.Tank, D., & Hopfield, J. (1987). Neural computation by concentrating information in time. Pro-ceedings of the National Academy of Sciences , (7), 1896–1900.Taxidis, J., Pnevmatikakis, E., Mylavarapu, A. L., Arora, J. S., Samadian, K. D., Hoffberg, E. A.,& Golshani, P. (2018). Emergence of stable sensory and dynamic temporal representations inthe hippocampus during working memory. bioRxiv , 474510.Terada, S., Sakurai, Y., Nakahara, H., & Fujisawa, S. (2017). Temporal and rate coding for discreteevent sequences in the hippocampus. Neuron , , 1-15.Tiganj, Z., Cromer, J. A., Roy, J. E., Miller, E. K., & Howard, M. W. (2018). Compressed timelineof recent experience in monkey lPFC. Journal of Cognitive Neuroscience , , 935-950.Tiganj, Z., Cruzado, N. A., & Howard, M. W. (2019). Towards a neural-level cognitive architecture:modeling behavior in working memory tasks with neurons. In A. Goel, C. Seifert, & C. Freksa(Eds.), Proceedings of the 41st annual conference of the cognitive science society (p. 1118-1123). Montreal: Cognitive Science Society.Tiganj, Z., Hasselmo, M. E., & Howard, M. W. (2015). A simple biophysically plausible model forlong time constants in single neurons.
Hippocampus , (1), 27-37.Tiganj, Z., Shankar, K. H., & Howard, M. W. (2017). Scale invariant value computation forreinforcement learning in continuous time. In AAAI 2017 Spring Symposium Series - Scienceof Intelligence: Computational Principles of Natural and Artificial Intelligence.
Tsao, A., Sugar, J., Lu, L., Wang, C., Knierim, J. J., Moser, M.-B., & Moser, E. I. (2018). Integratingtime from experience in the lateral entorhinal cortex.
Nature , , 57-62.Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approachesto analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology , ,65–79.Turner, B. M., Sederberg, P. B., Brown, S. D., & Steyvers, M. (2013). A method for efficientlysampling from distributions with correlated dimensions. Psychological Methods , (3), 368-84.doi: 10.1037/a0032222Urai, A. E., De Gee, J. W., Tsetsos, K., & Donner, T. H. (2019). Choice history biases subsequentevidence accumulation. eLife , .van der Meer, M. A. A., & Redish, A. D. (2011). Theta phase precession in rat ventral stria-tum links place and reward information. Journal of Neuroscience , (8), 2843-54. doi:10.1523/JNEUROSCI.4869-10.2011Van Essen, D. C., Newsome, W. T., & Maunsell, J. H. (1984). The visual field representation instriate cortex of the macaque monkey: asymmetries, anisotropies, and individual variability. Vision Research , (5), 429-48.Wang, J., Narain, D., Hosseini, E. A., & Jazayeri, M. (2018). Flexible timing by temporal scalingof cortical responses. Nature Neuroscience , (1), 102.Wilson, M. A., & McNaughton, B. L. (1993). Dynamics of the hippocampal ensemble code forspace. Science , , 1055-8.Wood, E. R., Dudchenko, P. A., Robitsek, R. J., & Eichenbaum, H. (2000). Hippocampal neuronsencode information about different types of memory episodes occurring in the same location. Neuron , (3), 623-33.Wyble, B. P., Linster, C., & Hasselmo, M. E. (2000). Size of CA1-evoked synaptic potentialsis related to theta rhythm phase in rat hippocampus. Journal of Neurophysiology , (4),2138-44.Yuste, R. (2015). From the neuron doctrine to neural networks. Nature reviews neuroscience , (8),487–497.Zandbelt, B., Purcell, B. A., Palmeri, T. J., Logan, G. D., & Schall, J. D. (2014). Response timesfrom ensembles of accumulators. Proceedings of the National Academy of Sciences , (7), OWARD AND HASSELMO eLife ,8