Predicting the future with a scale-invariant temporal memory for the past
Wei Zhong Goh, Varun Ursekar, Marc W. Howard
Graduate Program in Neuroscience, Department of Physics, Department of Psychological and Brain Sciences, Center for Systems Neuroscience, 610 Commonwealth Avenue, Boston University
Keywords: Reinforcement learning, prediction, scale invariance, long memory
Abstract
In recent years it has become clear that the brain maintains a temporal memory of recent events stretching far into the past. This paper presents a neurally inspired algorithm that uses a scale-invariant temporal representation of the past to predict a scale-invariant future. The result is a scale-invariant estimate of future events as a function of the time at which they are expected to occur. The algorithm is time-local, with credit assigned to the present event by observing how it affects the prediction of the future. To illustrate the potential utility of this approach, we test the model on simultaneous renewal processes with different time scales. The algorithm scales well on these problems despite the fact that the number of states needed to describe them as a Markov process grows exponentially.
Reinforcement learning (RL) models that are designed for Markov processes (e.g., Watkins and Dayan, 1992; Sutton, 1988) have been extraordinarily successful in accounting for reward systems in the brain (e.g., Schultz et al., 1997; Waelti et al., 2001) and have led to remarkable achievements in artificial intelligence (e.g., Mnih et al., 2015; Silver et al., 2018). For instance, in the successor representation, each relevant configuration of the world is defined as a state and the goal is to estimate the Markov transition probabilities between states (Dayan, 1993). Despite the success of RL, its affinity for Markov statistics may be a serious limitation. The real world contains many distinct causes that predict their effects at a range of time scales, presenting a challenge for learners optimized for Markov statistics. Of course, random processes with memory can be turned into Markov processes at the cost of defining additional states. However, the cost in terms of memory, and the time required to learn transition probabilities among an exponentially growing number of states, may be prohibitive in some settings.

It has been proposed that a primary function of the mammalian brain is to predict future events to enable adaptive behavior (Clark, 2013; Friston, 2010). Evidence from neuroscience has made clear that the brain contains robust memory for the identity and time of recent events. For instance, sequentially activated time cells in the hippocampus, prefrontal cortex, and striatum (e.g., MacDonald et al., 2011; Tiganj et al., 2018; Mello et al., 2015) maintain information about the time at which recent events were experienced over at least tens of seconds, and perhaps much longer. Experimental presentation of distinct stimuli triggers different sequences of time cells (e.g., Tiganj et al., 2018; Taxidis et al., 2020; Cruzado et al., in press), so that these populations maintain information about what happened when. In addition to sequentially activated time cells, neurons in the entorhinal cortex (Tsao et al., 2018; Bright et al., 2020) and other cortical regions (Bernacchia et al., 2011; Murray et al., 2017) carry temporal information via populations of neurons that respond with a spectrum of characteristic time scales up to at least tens of minutes. This paper, inspired by work arguing that conditioning in the brain results from an attempt to learn temporal contingencies between stimuli (Balsam and Gallistel, 2009; Gallistel et al., 2019), presents a formal model that learns to predict the future given a temporal record of the past. The proposed mechanism is computable given a temporal history that can be translated in time, and it offers a solution for how to estimate the future from a past that includes information about many events.

This paper proceeds as follows. In the rest of this section, we review a model for retaining a record of past events, and for forming associations between event pairs. In Section 2, we present the model for predicting the future given a temporal record of the past. In Section 3, we discuss its computational complexity, time-scale invariance, and several other properties. In Section 4, we present a numerical demonstration of the efficacy of this algorithm. Finally, in Section 5, we compare this algorithm to traditional RL algorithms and point out its connections to neuroscience.
We start with an agent which is capable of observing and remembering several types of events, such as the onset of a 440 Hz tone or the appearance of an image of an apple. In this section, we will describe a model for its capabilities. We will see that the agent maintains a fuzzy timeline of past events, which it uses to make pairwise associations between events. Neurobiological justification for this model is outlined in Section A.1 of the Appendix.
We assume that the world provides a series of discrete events that occur in continuous time. At each moment, at most one event can occur. For simplicity, without loss of generality, suppose there are three types of events, which we call X, Y and Z respectively. Whenever we need to avoid confusion, we will use event type to refer to a type of event, and event episode to refer to an individual occurrence of an event. We encode the occurrence of the event type X as a signal $f_X(t)$, which is the sum of Dirac delta functions centered at the occurrence times of episodes of X (Fig. 1a). (We will discuss quantities in relation to X; such statements hold analogously for Y and Z.) We call $t$, the argument of the signal $f_X(t)$, real time or external time, to emphasize that this time axis is a feature of the world rather than a construction of the observer. We denote the collection of all three signals as $f(t)$, and analogously for the quantities to follow. At every instant in (external) time $t$, the agent has direct access to $f(t)$ (which is zero unless an event of interest occurs precisely at $t$), but not to $f$ at any other time value. Signals are shown in Fig. 1a for the case where X, Y and Z occur at times 0, 1 and 2 respectively.

Figure 1: Memory is a fuzzy representation of the signal up to the present. (a) Signal as a function of external time, for three event types, X, Y and Z. This is the scenario considered in Figs. 2 through 6. (b) Memory for a recent event as a function of internal past time, at varying (external) times since the event occurred. As a function of internal past time, peaks in the memory are present at approximately the time interval since the event.

At every instant in time $t$, the agent's memory for X, denoted $\tilde{f}_X(\overset{*}{\tau}; t)$, is a fuzzy representation of the signal up to the present, $f_X(t - \overset{*}{\tau})$. From the agent's perspective, the internal past time, $\overset{*}{\tau} > 0$, indexes how long ago events in memory might have occurred. The degree of fuzziness of the memory varies inversely with a sharpness parameter $k$, which is typically a small even integer; throughout this paper, it is fixed at 8.

At time $\tau + t$, the memory element for an event that occurred at time $\tau$ is given by $\tilde{f}(\overset{*}{\tau}; \tau + t) = \Phi_k(t/\overset{*}{\tau})/\overset{*}{\tau}$, where the fuzziness, $\Phi_k(\cdot)$, is given by the dimensionless equation
$$\Phi_k(x) = u(x)\, \kappa\, x^k e^{-kx}, \qquad (1)$$
where $\kappa = k^{k+1}/k!$ is a normalizing constant and $u$ is the unit step function. Memories for a recent event are shown in Fig. 1b for various values of $t$. For an arbitrary signal $f$, the associated memory up to time $t$ is
$$\tilde{f}(\overset{*}{\tau}; t) = \frac{1}{\overset{*}{\tau}} \int_{-\infty}^{t} f(\tau)\, \Phi_k\!\left(\frac{t - \tau}{\overset{*}{\tau}}\right) d\tau. \qquad (2)$$
In other words, the memory for an event type is the sum of the memory elements associated with each episode of that event type. On its face, Eq. 2 appears to assume that the agent has access to the infinite past of $f(t)$. However, previous work has shown that $\tilde{f}(\overset{*}{\tau}; t)$ can be efficiently and time-locally constructed from a set of leaky integrators with a spectrum of time constants (see Section A.1 in the Appendix; Shankar and Howard, 2013). Using this approach, the number of leaky integrators necessary to remember the past up to some bound $T$ grows like $\log T$.

The signal $f$ up to any given external time $t$ fixes the event occurrence history. However, because the agent's memory is fuzzy, the agent can only form a fuzzy subjective belief distribution about the event occurrence history leading up to the present. We may interpret the memory for X as the agent's subjective estimate of the instantaneous rate of occurrence of X at time $t - \overset{*}{\tau}$.
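To make the construction concrete, the following is a minimal numerical sketch in Python (assuming NumPy; the helper names phi_k, memory, and tau_grid are ours for illustration, not from the paper). It evaluates the kernel of Eq. 1 directly and sums one memory element per event episode; for delta-function signals the integral in Eq. 2 collapses to exactly this sum. Note that this direct evaluation sidesteps the time-local leaky-integrator construction of Appendix A.1.

```python
import math
import numpy as np

K = 8  # sharpness parameter k; fixed at 8 throughout the paper

def phi_k(x, k=K):
    """Fuzziness kernel Phi_k(x) = u(x) * kappa * x^k * exp(-k x)  (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    kappa = k ** (k + 1) / math.factorial(k)  # normalizing constant
    xp = np.clip(x, 0.0, None)                # u(x): kernel vanishes for x <= 0
    return kappa * xp ** k * np.exp(-k * xp) * (x > 0)

def memory(event_times, tau_star, t, k=K):
    """Fuzzy memory f~(tau*; t) for a sum-of-deltas signal  (Eq. 2).

    Each episode at time t_i contributes one memory element,
    Phi_k((t - t_i)/tau*) / tau*, summed over episodes.
    """
    tau_star = np.asarray(tau_star, dtype=float)
    out = np.zeros_like(tau_star)
    for t_i in event_times:
        out += phi_k((t - t_i) / tau_star, k) / tau_star
    return out

# Reproduce the qualitative behavior of Fig. 1b: an episode of X at t = 0,
# probed at several later times. Each curve peaks at approximately the
# elapsed time since the event (exactly at k t / (k + 1)).
tau_grid = np.linspace(0.01, 15.0, 500)
for t_now in (2.0, 5.0, 10.0):
    m = memory([0.0], tau_grid, t_now)
    print(f"t = {t_now:4.1f}: memory peaks at tau* ~ {tau_grid[np.argmax(m)]:.2f}")
```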
In other words, we have, for an infinitesimal time element $d\overset{*}{\tau}$,
$$\tilde{f}_X(\overset{*}{\tau}; t)\, d\overset{*}{\tau} \approx P\!\left(X \,@\, t - \overset{*}{\tau}\, (d\overset{*}{\tau})\right), \qquad (3)$$
where $P(\cdot)$, the probability of an event, is used in the subjective Bayesian sense to describe the agent's belief, and "$X \,@\, t - \overset{*}{\tau}\,(d\overset{*}{\tau})$" stands for "an episode of event X occurred within the infinitesimal time interval between $t - \overset{*}{\tau}$ and $t - \overset{*}{\tau} + d\overset{*}{\tau}$." Since $\tilde{f}$ gives the agent access to the identity of, and approximate time at which, past events might have happened, we describe $\tilde{f}(\overset{*}{\tau})$ as a timeline of the past.

At each instant in time $t$, the agent is also able to compute the state of the memory a time interval $\delta$ into the future, assuming that no events of interest occur during that interval. We call this the projected memory, which is given by, for an arbitrary signal $f$,
$$\tilde{f}_\delta(\overset{*}{\tau}; t) = \frac{1}{\overset{*}{\tau}} \int_{-\infty}^{t} f(\tau)\, \Phi_k\!\left(\frac{t + \delta - \tau}{\overset{*}{\tau}}\right) d\tau. \qquad (4)$$
Translation can be efficiently implemented based on the set of leaky integrators. Prior work has shown that this can be done in a neurobiologically reasonable way (see Section A.2 in the Appendix; Shankar et al., 2016).

Many models of memory make use of associations between the temporal context describing the recent past and the currently available stimulus. The agent described here builds pairwise associations from X (the cue) to Y (the outcome) as the average state of the memory for X whenever Y occurs, and analogously for other event pairs:
$$\Delta M_{YX}(\overset{*}{\tau}) \propto \tilde{f}_X(\overset{*}{\tau}; t)\, f_Y(t). \qquad (5)$$

[Figure 2: pairwise associations $M_{YX}$; association strength as a function of internal future time.]
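For delta-function signals, the projected memory of Eq. 4 follows immediately from the sketch above: because the integral in Eq. 4 runs only over events up to $t$, it equals the memory the agent would hold at $t + \delta$ given no further events. A one-line continuation of the earlier sketch (reusing its memory and K; the function name is again ours):

```python
def projected_memory(event_times, tau_star, t, delta, k=K):
    """Projected memory f~_delta(tau*; t)  (Eq. 4).

    For episodes occurring at or before t, this is the memory the agent
    would have at t + delta if no further events occur in the interim.
    """
    return memory(event_times, tau_star, t + delta, k)
```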
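The associations of Eq. 5 can likewise be accumulated event by event: since $f_Y(t)$ is a sum of delta functions, updates happen only at occurrences of the outcome Y, at which point the current memory state for the cue X is folded into $M_{YX}$. Below is one simple way to realize the "average state of memory" reading of Eq. 5, continuing the sketches above (the delta-rule running average and the learning rate are our illustrative assumptions, not the paper's implementation):

```python
class PairwiseAssociations:
    """Accumulates M_YX(tau*): the average memory for the cue X at the
    moments the outcome Y occurs  (Eq. 5, as a discrete running average)."""

    def __init__(self, tau_star_grid, learning_rate=0.1):
        self.tau_star = np.asarray(tau_star_grid, dtype=float)
        self.lr = learning_rate
        self.M = {}  # (outcome, cue) -> association profile over tau*

    def observe(self, outcome, cue_memories):
        """Update on an episode of `outcome`; `cue_memories` maps each cue
        to its memory state f~_cue(tau*; t) at this moment."""
        for cue, mem in cue_memories.items():
            key = (outcome, cue)
            old = self.M.get(key, np.zeros_like(self.tau_star))
            self.M[key] = old + self.lr * (mem - old)  # delta-rule average

# Example: X occurs at t = 0 and Y at t = 1, as in Fig. 1a. The resulting
# M_YX profile peaks near tau* ~ 0.9, i.e., approximately the one-time-unit
# lag from X to Y.
assoc = PairwiseAssociations(tau_grid)
assoc.observe("Y", {"X": memory([0.0], tau_grid, 1.0)})
m_yx = assoc.M[("Y", "X")]
print(f"M_YX peaks at tau* ~ {tau_grid[np.argmax(m_yx)]:.2f}")
```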