The backpropagation-based recollection hypothesis: Backpropagated action potentials mediate recall, imagination, language understanding and naming
Zied Ben Houidi Huawei Technologies Co. Ltd.
Abstract—Ever since the advent of the neuron doctrine more than a century ago, information processing in the brain has been widely believed to follow mainly the forward direction, from pre- to post-synaptic neurons. In this paper, we put forward the backpropagation-based recollection hypothesis: weak and fast-fading action potentials, following the (highest-weight) post- to pre-synaptic backward pathways, mediate explicit cue-based memory recall. This also covers the tasks of imagination, future episodic thinking, language understanding and the association of names with various stimuli. These signals originate in highly invariant neurons, which respond uniquely to specific stimuli (e.g. the image of a cat). They then travel backwards to reactivate the same populations of neurons that uniquely respond to these stimuli during perception, thus recreating "offline" a similar experience. After stating our hypothesis in detail, we review abundant evidence on the existence of such backpropagating signals, as well as other relevant literature that supports our claims. We then leverage simulations based on existing spiking neural network models with STDP learning to show the computational feasibility of using such a mechanism to map the image of an object to its name with the same high accuracy as a state-of-the-art machine learning classifier. Although not yet a theory, we believe this hypothesis presents a paradigm shift that is worth further investigation: it opens the way, among others, to new interpretations of language acquisition and understanding, of the interplay between memory encoding and retrieval, and to reconciling the apparently opposed views of sparse coding and distributed representations.
I. INTRODUCTION
Biological brains process sensory visual input and learn to extract invariant representations from it in an unsupervised manner. However, mapping signifiers [1], i.e. mental representations of the image-sounds or "words", to the signified mental representations that such words refer to, requires interaction with an external agent that supervises the learning. Such a "teacher" generates the sound-image related to the signifier in the presence of the actual stimulus that relates to the signified: for example, by uttering the sound "cat" or writing the word "cat" in the presence of an actual image of a cat. The teacher repeats this procedure until both are associated. We then say that the agent has learned to map the signifier to the signified and vice versa.

In this context, it is tempting to think at first sight that the repeated co-occurrence of both stimuli reinforces their connection, in a Hebbian manner, and thus allows the mapping of signifiers to their signified representations. However, it is commonly accepted that neurons process sensory information mainly in a forward manner, i.e. from pre-synaptic neurons to post-synaptic ones. Yet, in the case of the signifier and signified "problem", the two co-occurring signals would eventually reach a connection point after each has followed only forward paths. Now, assuming that memories are stored and encoded in the same areas that uniquely responded to the memorized stimuli during the first encounter, a challenging question arises: what neural mechanism allows the association of one with the other, such that the activation of the signifier can trigger back the activation of the signified and vice versa?

We argue that this problem is a particular case of a more general one that occurs whenever the recollection of previously encountered and stored stimuli is needed.
This is the case in explicit memory, where a sensory stimulus, such as a visual scene or a smell, acts as a cue to trigger the retrieval of past related events. We further argue that this is also the case in imagination, where different parts of previously encoded stimuli are recalled and "merged" together to generate an "imagined" new experience that was not exactly met before.

In this paper, we hypothesize that weak and fast-fading backpropagating action potentials (APs) (from post- to pre-synaptic neurons), whose strength is proportional to pre-synaptic "weights", are the medium by which previously encoded information is recollected. Such backpropagating signals start in what we call source pointer neurons and travel all the way back, selectively reactivating on their path the neurons which uniquely responded to the stimuli during the first encounter (see the discussions in Sec. III-C1 for more elaboration on this assumption), thus creating an experience similar to that of the first time. A source pointer neuron, as we will develop later, is a neuron that specializes, thanks to past memorized co-occurrences, in selectively and invariantly responding only to the retrieval cue (e.g. the signifier for language) or its associated memories to be retrieved (e.g. the signified); as we will show later in Sec. III-D, the existence of such selective neurons has been widely observed [2], [3]. We refer to this as the backpropagation-based recollection hypothesis throughout the paper.

Prior work has extensively studied the interplay between visual perception and retrieval as it happens in visual imagery (see for example Pearson's recent review [4]). For example, in addition to the high overlap between areas involved in retrieval and perception, it has been observed, thanks to Dynamical Causal Modeling (DCM) of activation patterns, that there exists indeed a reverse top-down signaling pathway, from higher-level to lower-level cortical areas, that is responsible for the recollection of visual images [5], [6]. However, following the traditional conception of forward propagation, these observed top-down activation patterns were attributed (wrongly, we argue) to backward recurrent feedback connections. Providing a biologically plausible computational model at the neuronal level that explains how backward recurrent connections (that use the pre- to post-synaptic path) can reactivate a previously encoded stimulus was beyond the scope of their work and remains, to the best of our knowledge, unsolved. For the sake of completeness, it is worth mentioning that recent years have actually seen the rise of "forward-based" computational generative models coming from machine learning that can generate realistic images, the most notorious being Variational Autoencoders (VAEs) [7] and Generative Adversarial Nets (GANs) [8]. However, having been designed for a different purpose, it is not clear how they could be put together to implement retrieval tasks, even in machine learning. Second, and most importantly, their complexity and the supervised mechanisms they employ make them less likely to be biologically plausible [9], [10], [11]. We argue instead in this paper in favour of a simpler unsupervised mechanism where no local error information and no output target for supervision are needed: the same forward paths used for perception are simply used backward for retrieval.

After stating our hypothesis, in the general case (Sec. II-A) and in the particular case of language understanding and naming (Sec. II-B2), we discuss its verifiability and review abundant experimental evidence that backs up the plausibility of most of its assumptions (Sec.
III): we find, for example, that such fading backpropagating action potentials have been widely measured, that they are stronger when the postsynaptic neuron is firing and, most of all, that they can, interestingly and usefully, be controlled by neuromodulation so as to increase their strength or disinhibit them (see Sec. III-A). We further review the neural correlates of cue-based explicit memory retrieval and the existence of sparse pointer neurons, and find further abundant evidence supporting the hypothesis.

We then focus in the remainder of the paper on a particular case of our problem, namely language understanding and particularly naming, which we computationally model (Sec. IV) and then simulate (Sec. V). In this context, we define naming as the act of retrieving the representation of the sound-image (signifier) that best refers to a presented visual stimulus (signified). We define understanding, on the other hand, as the task of retrieving the signified representation that corresponds to a presented auditory or visual stimulus of a signifier. We leverage recent success [12], [13] in training artificial Spiking Neural Networks (SNNs) with Spike Timing Dependent Plasticity (STDP) learning to simulate a neural network that implements our hypothesis. We verify the computational efficiency of backpropagation-based recollection by comparing its accuracy in correctly naming a visual object to that of a state-of-the-art machine learning algorithm. To further challenge the computational ability of our hypothesis, we test it on an extreme learning task: naming objects after seeing a single instance of each class. We find that backpropagation-based recollection leads, on average and at maximum, to a higher accuracy than a Support Vector Machine (SVM) classifier. We are of course aware that the SNN models we use in this paper are not the brain, let alone the particular implementation we leverage.
We believe, however, that the simulations hint at the computational efficiency of the mechanism. This, especially tied with our literature review, calls for serious further exploration of this path, given in particular the breadth of the potential implications (as we discuss in Sec. VI-A).

II. THE BACKPROPAGATION-BASED RECOLLECTION HYPOTHESIS
A. General case
We posit our hypothesis and its assumptions, in context, as follows.
Memories are stored in a distributed fashion in the same areas where they are detected and recognized when encountered for the first time. Recollection of memories is therefore a process by which the appropriate population of neurons is re-activated so as to "re-live", offline, an experience similar to the first encounter. We hypothesize that weak, fading backpropagating signals from post-synaptic to pre-synaptic neurons, whose strength is proportional to the post-synaptic neuron's firing rate and to the pre-synaptic weights, are the mechanism by which the brain performs generative tasks. By generative tasks, we mean the regeneration of a previously lived and memorized stimulus (e.g. recollection in explicit declarative memory), the generation of a plausible future stimulus (e.g. future episodic memory), or the regeneration and combination of previously and separately lived and memorized stimuli, a process we refer to as imagination (e.g. imagining a cat that laughs by combining a memorized mental image of a cat with that of the act of laughing).

The recollection process starts with the presentation of a retrieval cue, which activates a few sparse neurons that uniquely identify both the retrieval cue and the to-be-retrieved memory. We assume that these neurons learned to respond only to the presence of either stimulus thanks to a low-level learning rule such as STDP [16], due to the (repeated or modulated) co-occurrence of both stimuli in the past: the cue and the to-be-retrieved memory. We further hypothesize that the retrograde signal is initiated in such source "pointer" neurons (e.g. "Jennifer Aniston" cells [2], as we will discuss later, if the goal is to recall prior stimuli related to, say, Jennifer Aniston). The signal then travels backwards following the paths with the highest weights, activating on its way all the various neurons that compose the mental images to be recalled.
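The backward, weight-proportional and fading traversal just described can be sketched in a few lines of code. This is a minimal illustrative sketch and not the paper's SNN implementation: the layer sizes, the DECAY constant and the thresholding rule below are our own assumptions, introduced only to make the mechanism concrete.

```python
import numpy as np

# Hypothetical two-layer feedforward network: stimulus -> features -> pointer.
# The weights are assumed to have been learned beforehand (e.g. via STDP).
rng = np.random.default_rng(0)
W1 = rng.random((8, 4))   # stimulus layer (8 neurons) -> feature layer (4)
W2 = rng.random((4, 2))   # feature layer -> pointer layer (2 pointer neurons)

DECAY = 0.5  # models the weak, fast-fading nature of the retrograde signal

def recollect(pointer_idx, threshold=0.5):
    """Backpropagate a fading signal from one source pointer neuron.

    At each layer, the signal reaching a presynaptic neuron is proportional
    to the (pre-synaptic) weight of its connection to the active neurons
    above, attenuated by DECAY; only the most strongly driven neurons
    (highest-weight paths) are reactivated.
    """
    pointer = np.zeros(2)
    pointer[pointer_idx] = 1.0
    features = DECAY * (W2 @ pointer)           # post -> pre, weight-proportional
    features = features * (features >= threshold * features.max())
    stimulus = DECAY * (W1 @ features)          # one layer further back, fainter
    stimulus = stimulus * (stimulus >= threshold * stimulus.max())
    return features, stimulus

# Recollecting from pointer neuron 0 reactivates a sparse subset of the
# earlier layers, with amplitudes that fade the further back the signal goes.
features, stimulus = recollect(0)
```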
We hypothesize, finally, that such backpropagation can be controlled remotely via neuromodulation, so as to invoke it, increase its strength, or inhibit it. This neuromodulation thus acts as a "switch" that controls whether or not retrieval takes place. (For the simulations, we build on Perez's Python implementation, available on GitHub [14], which we obtained the authorization to use and modify for research purposes; we also release our modifications so as to ease the reproduction of our results [15].)

Fig. 1: Illustration in the case of naming and understanding.

B. Case of language acquisition, understanding and naming
We further argue that a particular case of the above-mentioned generative tasks, which we later computationally simulate, is a form of explicit semantic memory related to language understanding and naming. We first start with some terminology.
1) Terminology:
We adhere to the conceptualization and terminology introduced by the Swiss linguist Ferdinand de Saussure [1] and build on the distinction he introduced between the signifier and the signified. Since several interpretations of Saussure's work could perhaps be made, we clarify in the following the one we adhere to. In particular, we refer to the signifier as the mental representation of the sound-image of the word. By sound-image, we mean either the phonetic sound resulting from the word, or the image of the letters that form it. We refer to the signified as the mental representation of the actual object that the sound-image and its mental concept refer to. Both the signifier and the signified are concepts: one represents the word, the other represents the mental image(s) that this word commonly refers to. In this context, we refer to understanding as the act of mapping the signifier to its signified.
Naming is the act of retrieving the signifier that corresponds to a given mental representation or to a presented visual stimulus.
2) Illustration in the case of language:
When it comes to language, our hypothesis implies that backpropagating APs mediate understanding (recollection of the signified once presented with a stimulus that presents the signifier) and naming (recollection of the signifier or name that corresponds to a given visual object, or any other stimulus in general).

Fig. 1 illustrates our hypothesis and modeling through the example of three concurrent sensory inputs that are presented to a learner and that need to be "permanently associated". To make a long story short, we assume, as illustrated in the figure, that there exists an area where the visual and sound activation pathways intersect (here, at the last "backpropagation root" layer in Fig. 1). A similar area will be, according to our hypothesis, the root of the backpropagating APs that mediate the tasks of understanding and naming. During the "understanding" task, the retrieval cue is a word-related stimulus (e.g. the sound "cat" or the image of the word "cat") and the retrieved memory is the signified representation (illustrated by the Understanding backward pathway in the figure). In the task of naming, the retrieval cue is the signified object (here a cat) and the retrieved memory is the name or signifier of the object (illustrated by the Naming backward pathway).

In more detail, the figure exemplifies the toy case of a "teacher" who shows the "learner" an image of a cat, simultaneously with how the word cat is written, as well as the sound of the word. In this case, the sensory input of a sound "cat", as well as an image of the word, are processed through consecutive feed-forward neural layers. Similarly to what happens in the primate visual cortex [17], neurons in the earlier layers have learned (thanks to a simple unsupervised rule such as STDP) to respond to simple features and, the deeper we go, the more selective the neurons become and the more complex the features they respond to.
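The low-level learning rule invoked here can be illustrated with the classical pair-based STDP window. The code below is a generic textbook sketch: the time constants and learning rates are illustrative assumptions, not values taken from the paper's simulations.

```python
import math

TAU_PLUS, TAU_MINUS = 20.0, 20.0   # ms, assumed time constants
A_PLUS, A_MINUS = 0.01, 0.012      # assumed rates (depression slightly larger)

def stdp_dw(t_pre, t_post):
    """Weight change for one pre/post spike pair (spike times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: potentiation (Hebbian causality)
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    else:        # post fires before (or with) pre: depression
        return -A_MINUS * math.exp(dt / TAU_MINUS)

# A pre spike 5 ms before the post spike strengthens the synapse;
# the reverse ordering weakens it.
assert stdp_dw(10.0, 15.0) > 0
assert stdp_dw(15.0, 10.0) < 0
```

Under repeated co-occurrence of two stimuli, such a rule strengthens exactly the causally ordered pre-to-post connections, which is how the increasingly selective neurons described above can emerge without supervision.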
We assume that, at a later processing stage, there exist fewer, "sparse" neurons that selectively respond only to the presence of the sound "cat"; we refer to such neurons as the "sound signifier" pointer neurons of the word cat. Similarly, we assume that there exist neurons that selectively respond to the "image" of the word cat, and we refer to this family of neurons as the "image signifier" pointer neurons. Finally, neurons that selectively respond to both are referred to as the "sound-image signifier" pointer neurons, or simply signifier neurons. Following a similar pattern, the visual image of the cat itself is processed through various neural stages until certain neurons (referred to as signified in the figure) respond selectively only to the presence of the image of a cat.

This is how, as illustrated in the figure, at some stage, the above-described visual and sound pathways reach a common connection point, in a subsequent feedforward neural layer. We hypothesize that the repeated, or neuromodulated, co-occurrence of signifier and signified stimuli (i.e. saying "cat" in the presence of an actual cat) reinforces their connection, at this junction point layer, in "a Hebbian manner", thanks to a simple rule such as STDP learning. (The connection can be reinforced either by mere repetition, e.g. repeating the word cat many times in the presence of a cat image, or by neuromodulation, in which case a single co-occurrence can be enough to cause a long-term reinforcement of the connection.) Next, when presented with either signifier- or signified-related stimuli, the source pointer neuron(s) in the backpropagation root layer will fire (one should in theory be enough, but there could be many in reality, e.g. for redundancy), resulting in backpropagating action potential(s) that are proportional to pre-synaptic weights. The latter will cause the backward activation of the appropriate neurons, thus recalling the signified if the presented stimulus relates to the signifier, and vice versa.

To be more explicit, we consider the naming task successful if the backpropagating AP activates only the signifier neuron(s) that uniquely identify the word "cat", be it the image or the sound. The understanding task is successful if the backpropagation activates only the signified neuron(s) that uniquely characterize the image of the cat. It goes thus without saying that, as described above and illustrated, we adhere to the view that there exist few neurons that respond selectively to complex stimuli such as (i) sound signifier stimuli, (ii) image signifier stimuli, (iii) the signified or (iv) uniquely to the three previous ones (i.e. they fire only when presented with any of the three). During the action potential backpropagation, these neurons act, according to our hypothesis, as pointers that selectively reactivate an appropriate population of pre-synaptic neurons which uniquely characterizes the memory trace to be retrieved, thus creating an experience similar to that of the first encounter(s) when the memory was encoded. For example, what is retrieved could be an experience of the sound of the word "cat" with the particular voice or conditions in which it was encoded (in line with what is called the encoding specificity principle [18], which we will recall later).

As we will discuss later, this hypothesis reconciles (i) sparse/localist and (ii) distributed representation theories of the brain, promising to end a long debate between cognitive psychologists and neuroscientists [19], [20]. In our framework, there is no need to choose between them, as both are needed, but for different purposes: sparse coding, illustrated here by the presence of highly selective neurons, is needed for backpropagation-based retrieval, while the encoding of the entire memory trace is still done via a distributed set of neurons.
The latter can be selectively reactivated "on demand", from the source pointer neurons backwards. It is thus the simultaneous activation of an entire population of neurons that forms the entire memory trace; single highly selective neurons are only pointers, helpful for retrieval.

Finally, and interestingly, the fact that backpropagating signals are "fading" in nature could explain the ephemeral character of the experience of recalled memories, or visual mental imagery's lack of vividness: the latter are not as persistent as the experience of live sensory stimulation.

C. What this hypothesis is not about
Finally, it is necessary to clarify that this hypothesis is meant to explain mainly the recollection processes. By this, we mean, in the particular case of explicit recall, the reactivation, as close as possible, of the same neurons that were activated during previous encounters of the stimulus to be recalled. (For simplicity, we focus in this toy example on a single encounter that, we assume, was "encoded right away". In reality, stored memory traces might evolve with repeated exposure, such that recalling what is meant by the word cat leads to the recall of a memory of the signified that is statistical in nature, e.g. one of the many cats met before, or an average abstract image of a cat.) Indeed, under our hypothesis, reactivating (more or less) the same neural ensembles that uniquely respond to the stimulus to be recalled is what creates again, "offline", a similar subjective experience, despite the absence of the stimulus.

As a consequence, in the case of language and particularly name association, what is covered by our hypothesis is how to (i) learn the association and (ii) recall the name, not yet how to actually produce it. By producing, we mean emitting the sound or writing the letters of the word. Further investigations are needed to reassess the production tasks in light of our new hypothesis. Nonetheless, it occurs to us that learning to produce the right sounds, or speaking, happens through a trial-and-error, reward-based process in which the goal is to "mimic". The study of such a mechanism is beyond the scope of this paper and is left for future work. Our hypothesis covers instead the reactivation of source pointer neurons that uniquely characterize the name. The latter can be further used to passively recall the name (e.g. recollecting how it sounds, or how it is visually written). How such pointer neurons participate in invoking motor areas to produce sounds or write letters is out of scope for now.

Next, when we talk about understanding, we mean the modality-specific features of semantic memory [21]: i.e. recalling the details of how a face or an emotion looks or feels exactly, as opposed to other aspects of semantic memory such as finding abstract relationships between words. We leave the latter aspect of semantic memory for future work. Worth mentioning, though, is that Patterson et al. reviewed semantic knowledge organization in the human brain [21] and reported that all theories agree on the fact that modality-specific recall is implemented by a distributed brain network, a fact that is coherent with a backpropagation-based recollection hypothesis.

Then, our hypothesis assumes that there are centers that remotely control, via neuromodulation, whether or not to invoke the recollection: by either increasing the backpropagation or by inhibiting it. It goes thus without saying that the hypothesis does not cover what mechanisms control these control centers, nor under which conditions recollection is favoured or shut down. What our hypothesis predicts is that the task of these "control centers" can be extremely easy to implement: the untargeted remote generation of an excitatory neuromodulator favours further recall (depending on whatever cues are activated at the moment). The same applies to inhibition.

Finally, our hypothesis, being focused only on the recollection process from sparse source pointer neurons backwards, does not directly explain the mechanisms involved in the formation of sparse invariant neurons, novelty or familiarity detection, or the interplay between short-term (e.g. a few days back) and long-term (e.g. a few years back) memories. Nonetheless, it still offers a ground to reason about these issues. For example, the fact that the same sparse neurons keep being used to signal the presence of the same familiar stimuli throughout the years (e.g. the face of one's own child) might explain why humans are unable to remember much younger versions of these faces (in the absence of photos): memories are updated in situ, and familiar faces will always lead to the activation of the very same "familiar" invariant sparse neurons, not to the activation of "novel" ones.

III. REVIEW OF EVIDENCE IN THE LITERATURE
We first position our hypothesis in the literature and show evidence that backs it up, together with its assumptions.
A. Existence of retrograde signals and backpropagating action potentials
Despite the prevalent view of forward processing, it turns out that a plethora of studies have measured, both in vitro and in vivo, in anesthetized [22] and awake [23], [24] mammals, action potentials that backpropagate to apical and distal dendrites, and this for various classes of neurons [25], [26], [27], [28], [29]. We cite in the following only some of these studies. For a more complete list, the reader can refer to the review of Stuart et al. [25] or that of Waters et al. [27], which summarized the findings about the measurements and hypothesized a few roles for backpropagating action potentials. For a more general review of retrograde signals, i.e. not only activity-dependent ones but also those occurring during synaptogenesis etc., the reader can refer to Tao et al. [30].

Williams and Stuart [28] performed simultaneous somatic and dendritic recordings from thalamocortical (TC) neurons and measured that action potentials, whether due to sensory information or to cortical excitatory postsynaptic neurons, backpropagate into the dendrites. In another work, the same authors [29] measured the same phenomenon in neocortical pyramidal neurons. Interestingly for our hypothesis, the authors found that action potentials due to physiological patterns of firing backpropagate three to four times more effectively than action potentials pertaining to mean firing rates. This observation is confirmed by several studies (reviewed by Waters et al. [27]) which found that backpropagation is modulated by synaptic input. For example, properly timed excitatory input leads to the amplification of backpropagation, whereas inhibitory input might block it. More interestingly for our hypothesis, many neuromodulators have an influence on backpropagation, often leading to its enhancement, though in more complex ways.
For example and interestingly, given the supposed role of the hippocampus in retrieval (see later), it has been observed in hippocampal CA1 pyramidal neurons that muscarinic agonists enhance backpropagation in a progressive manner, having a stronger and stronger effect on subsequent action potentials [31], [32]. This suggests that neuromodulation therein can act as a "switch" to enable action potential backpropagation in a selective manner, a feature that is necessary for our hypothesis. As described in Sec. II, not every presentation of a visual stimulus would systematically lead to the activation of "naming". And probably not every presentation of the signifier stimulus should lead to the evocation of its signified representation. Similarly, not every exposure to a familiar (known) stimulus automatically leads to explicitly remembering its context.

Nonetheless, to the best of our knowledge, the role of such activity-dependent backpropagation of action potentials, as reviewed for example by Waters et al. [27], has so far been hypothesized to be local, acting as a feedback loop from postsynaptic to pre-synaptic neurons to regulate spiking activity or support synaptic plasticity. In this work, we advance the hypothesis that such retrograde signals play a more explicit and direct role in higher-level cognitive tasks, such as naming, understanding, and other generative processes like imagination and explicit memory retrieval. We next confront our hypothesis with the state of knowledge in experimental cognitive science about explicit memory (Sec. III-B). We later dive deeper into the details and confront our assumptions with what is known about the neural correlates of explicit memory (Sec. III-C).
B. Cue-based retrieval: a cognitive sciences perspective
Our hypothesis applies to any task in which the reconstruction of previously encoded stimuli is needed. The literature on this topic finds its origins in early experimental cognitive science research (e.g. [33], [34]), before recent advances, driven by neuroimaging and optogenetic stimulation, shed more and more light on some of the actual neural correlates [35]. We start by surveying the former and linking it to our hypothesis.

Since the seminal work of Endel Tulving, long-term human memory has been widely classified into explicit (declarative) memory and implicit (procedural) memory. While procedural memory relates to long-term acquired skills such as driving or playing an instrument, declarative memory relates to the explicit recollection of memories about facts, words, images, events etc. In this paper, we focus on the latter and provide a hypothesis about how the retrieval of these memories happens at the neuronal level. It is worth mentioning that Tulving also played a role in further dividing explicit memories into episodic and semantic ones [34]. Our hypothesis is orthogonal to the difference between them; its mechanism can be useful for both and, beyond them, we believe, for any generative task such as imagination and mind-wandering.

One crucial principle in this area was formulated by Tulving and Thomson under the name of encoding specificity [18]. The principle stresses the importance of retrieval cues and of the entire context that is perceived during encoding for later retrieval: the surrounding context that was present during the first perception and encoding moment can act as an efficient retrieval cue in the future.
Although it might sound straightforward today, such early work [33], [18] played a role in differentiating between the availability of memories and their accessibility thanks to cues, be they internal or external stimuli: the inability to recall does not necessarily mean that the memory is not available; it could also be due to the lack (inactivation) of appropriate cues.

Our hypothesis offers a neurobiological ground to interpret and simulate encoding specificity: for us, any surrounding context during encoding can act as a retrieval cue, as long as it activates the source pointer neurons that uniquely identify the retrieval cue and all the remaining surrounding context to be retrieved. Backpropagating action potentials can then "travel backwards" to reactivate the networks of neurons which uniquely responded to the stimuli during the encoding moment, thus creating, again, a similar experience. In fact, the process by which an internal or external (sensory) cue activates the stored memory trace is well known and has been called ecphory [36], [37]. Ecphory, this interaction between trace and cue, is described as the first stage of memory retrieval, before conversion actually happens and the recollection experience is "lived". In our hypothesis, the cue activates mainly the source pointer neurons, which in turn activate back the appropriate presynaptic populations, resulting in the recall of all the related traces. As such, following our hypothesis, it can easily be seen how two components may affect memory retrieval, as already predicted by Tulving: the lack of the appropriate cue, or a decay of the synaptic weights of the concerned neural networks. Our hypothesis announces other predictions that could be verified in the future. For example, a decline in the intensity or extent of the backpropagation (e.g. an impairment in the neuromodulation that is supposed to facilitate it) could hamper retrieval. Conversely, an excess of such backpropagation might lead to higher levels of intrusive thoughts.
Today, the encoding specificity view has passed the test of time (see [35] for a recent review), and even its "opponents" [38], [39], [40] do not question the necessity of some degree of match between encoding and retrieval conditions, but rather stress the importance of additional factors that influence the performance of later retrieval, the most important being the "discriminative" power of the retrieval cue, or its distinctiveness. Accordingly, recall performance is not only related to the degree of match between encoding and retrieval conditions, as first thought, but also to cue overload and hence to the extent to which the cue is "discriminative".
This latter view can also be easily observed under our framework: keep in mind the illustration in Fig. 1 and the image of a cat as a cue. If this image appears, during learning, simultaneously with all sorts of names, and not only the signifier "cat", then the backpropagating action potentials would potentially simultaneously activate many "Signifier" neurons (and not only that of the cat), making it hard to distinguish, and hence to correctly name. The necessity of being discriminative can also be seen in our simulations of naming later in Sec. V: after backpropagating action potentials back to the "categories" layer, the "signifier neuron" that gets the highest "votes" is elected to signal the name of the object. If all neurons receive "equal votes" because of cue overload, it would be impossible to retrieve the right name. (Footnote: The selectivity of source pointer neurons is a consequence of previously encountered stimuli, and this defines which particular contextual cues are efficient at recalling which particular encoded trace.) (Footnote: Ecphory is a term that Tulving revived together with the forgotten work of the German scientist Richard Semon, who first coined it and stressed the importance of retrieval cues.)
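The vote-based readout just described, and the effect of cue overload on it, can be illustrated with a small sketch (the `retrieve_name` helper, the signifier list and the vote values are all hypothetical, chosen only to make the point):

```python
import numpy as np

# Hypothetical sketch: backpropagated "votes" arriving at signifier neurons,
# with vote strength proportional to how often cue and signifier co-occurred
# during encoding. Names and values here are ours, not from the simulations.

signifiers = ["cat", "dog", "car"]

def retrieve_name(votes):
    """Return the signifier with the highest backpropagated vote,
    or None when the votes are ambiguous (cue overload)."""
    votes = np.asarray(votes, dtype=float)
    best = np.flatnonzero(votes == votes.max())
    return signifiers[best[0]] if len(best) == 1 else None

# Discriminative cue: the cat image co-occurred almost only with "cat".
assert retrieve_name([9.0, 1.0, 0.5]) == "cat"

# Overloaded cue: the image appeared with all sorts of names equally often,
# so all signifier neurons receive equal votes and retrieval fails.
assert retrieve_name([3.0, 3.0, 3.0]) is None
```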
C. Explicit memory: neural correlates
We now dive more into the neurobiological foundations of encoding specificity, ecphory and the processes involved in explicit memory retrieval. This field has seen tremendous advances driven by two techniques: neuroimaging and optogenetic stimulation.
We start from a recent thorough review [35] in which a large body of research strongly supported Semon's and Tulving's cognitive theories: namely, that (i) the success of accessibility depends on the interaction between cues and memory traces and that (ii) there are strong ties between encoding and retrieval, including at the level of activated neural ensembles. In particular, it was shown, using artificial optogenetic stimulation techniques, that it is possible to either disrupt or mimic ecphoric processes by activating or inhibiting the same specific neural ensembles that were active during encoding.
In one experimental study [41] that we further analyze below, blocking the neural ensembles that were used to recognize the cue during encoding resulted in an impairment in retrieval. In the experiments, mice were conditioned to produce a fear response whenever placed in a particular context, a context which stands here for the cue. At the same time, CA1 neural ensembles that were particularly active during learning were optogenetically tagged. Whenever placed in the same context again, mice successfully freeze as a sign that they recognize the environment. However, placing them there while inhibiting the previously tagged neural ensembles considerably reduces freezing levels. This means that if the neural ensembles that recognize the cue do not activate, the memory is not retrieved.
Other studies (e.g. [42]) similarly used optogenetics to demonstrate the "opposite" possibility: artificially reactivating the neural ensembles that recognize the cue, thus inducing the retrieval of the memory trace (i.e. causing freezing), even outside the context in which the conditioning happened (i.e. in the absence of a natural cue). Now, in the previous two families of experiments, mapping the cue and the trace was learned naturally, neural ensembles were tagged depending on their activity during conditioning, and artificial inhibition or excitation was used to disrupt or elicit retrieval. A last, recent family of experiments [43] showed that it is even possible to associate a cue and a trace artificially and to later elicit retrieval in natural conditions. In particular, they repeatedly used photostimulation to artificially activate a neural ensemble that usually recognizes a specific smell, simultaneously with photostimulating another memory trace that elicits avoidance. After this co-occurrence-based conditioning, exposure of the mice to the real smell caused an avoidance reaction: the mice "remembered" to avoid, although they had never experienced the smell in reality. (Footnote: Optogenetic stimulation refers to techniques that allow to later activate or inhibit precisely only selected populations of neurons that were initially selectively tagged, depending on their activity.) (Footnote: Here, we use encoding, learning and conditioning interchangeably, to accommodate the different terminologies used in different papers.)
Finally, note how, in the optogenetic stimulation above, light can simultaneously and selectively activate a set of neurons that were prepared in advance to be light sensitive. This allowed to dissect the interaction between memory traces and retrieval cues and to show how both are intimately related during encoding. However, it is not clear today how a similar selective reactivation can happen in the brain.
That is exactly the role of our hypothesis: our answer to the question "how does a neural ensemble activate in a selective way?" lies in backpropagating action potentials following the paths with the highest presynaptic weights. Our hypothesis also offers a framework to simulate (as we do later) and understand these issues at the level of single neurons.
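A minimal sketch of this selective-reactivation idea, under our own assumptions (the `backpropagate` helper, the toy weight matrix and the 0.8 threshold are illustrative choices, not taken from any cited model):

```python
import numpy as np

# Illustrative sketch: starting from an active source pointer neuron,
# backpropagated APs are assumed to follow only the strongest presynaptic
# weights, reactivating the presynaptic ensemble that drove it at encoding.

rng = np.random.default_rng(0)

def backpropagate(weights, active_post, threshold=0.8):
    """weights[i, j]: synapse from presynaptic neuron i to postsynaptic j.
    Return the presynaptic neurons reached backwards from `active_post`
    through high-weight synapses only."""
    reached = set()
    for j in active_post:
        strong_pre = np.flatnonzero(weights[:, j] >= threshold)
        reached.update(strong_pre.tolist())
    return sorted(reached)

# Toy layer: 6 presynaptic neurons, 2 postsynaptic pointer neurons.
W = rng.uniform(0.0, 0.3, size=(6, 2))   # weak background connectivity
W[[0, 2], 0] = 0.9   # neurons 0 and 2 strongly drove pointer neuron 0 at encoding
W[[4, 5], 1] = 0.9   # neurons 4 and 5 strongly drove pointer neuron 1

assert backpropagate(W, active_post=[0]) == [0, 2]  # only the encoding ensemble reactivates
```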
As a first summary, the studies above [41], [42], [43] and many others reported in the review [35] (which we encourage the reader to check) confirm the importance of cues and encoding specificity. But beyond that, they suggest that retrieval reactivates what was active during encoding, in a process sometimes referred to as neural reinstatement.
This principle is at the heart of our hypothesis, as backpropagated APs should follow exactly the reverse path that uniquely led to the activation of source neurons during encoding. It turns out that many arguments support this reinstatement principle. We review them in what follows.
1) Retrieval as top-down re-activation:
First, historically, the oldest (yet weak) supporting fact is that simply reinstating the encoding context at recollection time enhances retrieval performance and quality, as reported by some reviews [44], [45]. Second, more recently and more strongly, a significant body of research intentionally studied the overlap between encoding and retrieval and provided large evidence in favour of the principle using various ensemble tagging [42], [41], [46], [47], [48], [49], [50], [51], [52], [53], EEG [54] and neuroimaging [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [5] techniques. Furthermore, it was shown that the reactivation overlap between encoding and retrieval also influences the perceived quality of the retrieval. For example, in the case of visual imagery (i.e. attempting to mentally visualize an image), it has been shown that activation overlap in the visual cortex increased visual imagery vividness, or the subjective intensity of the remembered image [65], [62], [61]. We refer the reader to the many references above for more information and cite in what follows only a few examples of each technique.
In terms of neuroimaging, Dijkstra et al. [5] for example used Dynamical Causal Modeling (DCM) [66] to infer coupling between cortical regions involved in the tasks of visual perception as opposed to visual imagery. They measured that visual imagery vividness correlated more with top-down connectivity patterns (from high-level cortical areas to lower-level areas) than perception itself did. Many other studies suggest such a top-down mechanism during visual imagery [67], [68], [69], [6], [4]. The reader can refer to Pearson's recent review [4] of the cognitive neuroscience of visual mental imagery for more details about the top-down reverse hierarchy of information and the fact that the process seems to be a weak form of the bottom-up perception.
However, in general, due to the widespread view that neural computation is mainly forward, from pre- to postsynaptic neurons, this top-down activation cascade has always been interpreted, in the literature, as the result of feedback connections from higher-level cortical layers to lower ones.
Then, and perhaps more convincingly than neuroimaging, neural ensemble tagging techniques also confirm the principle. In addition to the work we described above [41], [42], [43], another recent example is the work of Guskjolen et al. [53], who performed contextual fear conditioning experiments on young mice while tagging the neural ensembles which were active during encoding. As happens with infantile amnesia in humans, the infant mice later exhibited forgetfulness. However, photostimulation of the tagged neurons, only in the hippocampal formation (the Dentate Gyrus in particular), induced memory recovery and the reactivation of broader areas which were tagged during conditioning, including hippocampal CA1 and CA3, and cortical neurons. Note how this finding is in line again with the idea that traces are distributed in neural ensembles that span many cortical brain regions [70], each responsible for one aspect of information (sensory, motor, visual, emotional, etc.).
2) Existence of “backpropagation root layers”:
The last paper leads us to the last point we review in this section: the role of the Medial Temporal Lobe (MTL) and its relationship to our backpropagation root layers, where source pointer neurons are located.
Indeed, our hypothesis assumes the existence of an area where source pointer neurons lie and where the backpropagation starts. If our hypothesis is correct, this area should form the glue between cues and retrieved traces, and should exhibit a reversal of the flow of information. Interestingly, the medial temporal lobe, and the hippocampus in particular, has been shown to (i) play this role and (ii) exhibit a similar reversal behaviour.
For (i), many theories [71], [72], [73], [74], [75] support that the hippocampus performs exactly the task of reinstating the patterns of activity in the cortex that were alive during encoding. This can already be seen from the study of Tanaka et al. [41] which we reported above. What they did, by actually monitoring cortical activity while inactivating hippocampal CA1 cells in rodents, shows how the hippocampus is likely responsible for reinstating the patterns that were active at encoding. By permanently tagging neurons which were active during encoding (a fear conditioning experiment), they were able to silence them with laser stimulation, up to several days later. When silencing only the tagged CA1 cells (and not the entire engram), memory retrieval was impaired; and the rest of the neural ensemble in the cortex and amygdala, which used to reactivate during retrieval, was not reactivated again. Many other studies [76], [77], [78] also showed that retrieval success depended on whether or not the hippocampus was concurrently solicited, during both encoding and retrieval. Horner et al. [77] showed further evidence that the hippocampus binds together all the elements composing a trace that is stored in distributed regions of the cortex, acting as a hub to perform what is also called the pattern completion task. Finally, and interestingly for our hypothesis, Staresina et al. [78] observed a reversible signal flow from the cue region to the target region to be recalled, through the hippocampus.
This puts the HC and MTL in the position of good candidates to be root backpropagation areas as per our hypothesis: they seem to implement the link between cues' networks and traces' networks, and they seem to be the place where flow reversal happens.
Citing verbatim a thorough review and perspective from Moscovitch [79]: "Retrieval occurs when an external or internally generated cue triggers the hippocampal index, which in turn activates the entire neocortical ensemble associated with it. In this way, we recover not only the content of an event but the consciousness that accompanied our experience of it". Moscovitch later refers to the hippocampal memory indexing theory of Teyler and DiScenna [73], [72] as follows: "Memory traces in the HC/MTL are encoded in sparse, distributed representations that act as an index or pointers to the neocortical ensembles that mediate the attended information".
This claim, which is in line with our hypothesis, leads to the next assumption, which we further verify next: the existence of sparse source pointer neurons.
D. Existence of highly selective source pointer neurons
Our hypothesis assumes that, at a certain deep level of processing, certain neurons become highly selective and invariant: they serve as pointers to reconstruct the encoded stimuli they represent. For example, in the particular case of language, some neurons will respond only to the signified, or only to the signifier (one of the cues), and some would respond to both the signifier and the signified (i.e. what is in common between the cue and the "to be recalled"). In this section, we scan the literature about concept representation in the brain to assess the plausibility of the existence of such neurons.
It turns out that similar neurons have been documented and that their role is not yet well understood, given the still ongoing debate between two opposed views on the matter: the "distributed representations" view [80], [81] and the "sparse coding" view [82], [83]. Indeed, the distributed representations view defends that concepts in the brain are represented by the unique activation patterns of entire, large populations of neurons. It is thus the pattern uniqueness across a large population that defines complex concepts, not particular single neurons. The sparse coding view defends instead that there exist a few neurons that selectively represent particular items or concepts. The extreme version of sparseness would be that there is a unique cell that responds to each single unique concept, a version that is pejoratively and anecdotally known as the grandmother cell hypothesis [19], [20].
Our hypothesis promises to reconcile both views as follows: information is stored in distributed networks, and sparse neurons also exist, but they play the role of hubs that connect these networks and ease retrieval by being the source of backpropagated APs.
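As a rough data-structure analogy (entirely our own illustration; the region names and neuron ids below are made up), this reconciliation can be pictured as a sparse hub holding pointers to distributed ensembles:

```python
from dataclasses import dataclass, field

# Illustrative sketch: the trace content lives in distributed ensembles across
# regions, while a sparse pointer neuron merely stores references (a hub)
# used to reactivate them. Names here are assumptions for the sketch.

@dataclass
class PointerNeuron:
    """Sparse hub: holds no content, only references to distributed ensembles."""
    concept: str
    ensembles: dict = field(default_factory=dict)  # region -> set of neuron ids

def reactivate(pointer):
    """Backpropagation-style retrieval: fan out from the hub to every
    distributed ensemble that was active at encoding."""
    return {region: sorted(ids) for region, ids in pointer.ensembles.items()}

cat = PointerNeuron("cat", {
    "visual_cortex": {3, 17, 42},   # shape/texture ensemble
    "auditory_cortex": {8, 9},      # the sound of the word
    "motor_cortex": {21},           # articulating the word
})

assert reactivate(cat)["auditory_cortex"] == [8, 9]
```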
The first accounts of sparseness date back a while already. In practice, since the seminal work of Hubel and Wiesel [84], it became mainstream that neurons tend, overall, to respond to more and more complex features the deeper we go in the processing layers of sensory input [17], [85]. Indeed, evidence suggests the existence of a hierarchy along what is called the ventral visual pathway [85], starting from the primary visual cortex V1, where basic features are encoded, until the inferior temporal (IT) cortex, where neurons selectively respond to complex shapes like hands and faces [86], [87], [88]. Other known examples of sparse coding, for spatial representation, are place and grid cells [89]. Place cells for instance are single neurons which signal specific places in the environment: as the individual navigates its environment, only the neurons that signal the current place field fire. Interestingly, such highly selective neurons have been found within the hippocampal formation, the area which seems to be a good candidate for being a backpropagation root layer, as discussed in Sec. III-C2 above.
In general, the literature is rife with studies that have measured such selective neurons, in ways that fit our hypothesis, and interestingly in these same MTL areas. For example, Fried et al. [90] measured neurons that selectively discriminated humans (faces) from inanimate objects, and this, interestingly, during both encoding and retrieval. Others distinguished specific facial expressions. A little later, Kreiman et al. [91] measured neurons that highly responded only to specific categories such as animals, houses and celebrities. In a continuous line of work, Quiroga and colleagues [2], [92], [93], [94] have set out to understand how the visual features we mentioned above are passed to upper layers of the hierarchy, so as to understand how they are later used by higher cognitive processes: a question to which we hypothesize an answer in this paper.
It is in one of these works, which became popular, that Quiroga et al. [2] reported the existence of highly selective neurons that responded to the presence of specific stimuli related to places or individuals such as Bill Clinton and Jennifer Aniston. One of the found selective neurons even exhibited highly selective responses to any stimulus related to Halle Berry, be it her face or even the written words.
The latter neuron exhibits strikingly similar properties to our source pointer neurons as described in the language understanding and naming tasks.
Then, given that this work is reminiscent of the widely rejected grandmother cell hypothesis, further clarifications followed. Waydo et al. [93], with Quiroga as a co-author, later used a probabilistic approach to explore Quiroga et al.'s original findings [2] a bit more rigorously. Indeed, the latter obviously did not test all MTL neurons and all possible categories of objects. Hence, (i) a found selective invariant neuron could respond to other untested categories, and (ii) there might exist many neurons, and not only one as found by the authors, that would selectively respond to the same stimulus. The authors thus developed a probabilistic model to estimate the odds, and confirmed the sparseness hypothesis (yet arguing also against single grandmother cells [82], as done first by Quiroga and co-authors). Nonetheless, the model leads, as the authors conclude, only to a bound on the true sparseness: the neural coding could in reality be even much sparser than they estimated. In another follow-up work, Quiroga et al. [92] insisted, already in the title, on the fact that these are sparse but not grandmother cells, and argued against the unlikely possibility that a single unique neuron responds to each stimulus.
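As a toy illustration of this kind of probabilistic argument (our own simplification, not Waydo et al.'s actual model; the sparseness value is arbitrary):

```python
from math import comb

# Toy sketch: if each neuron responds to a fraction `a` (sparseness) of all
# stimuli, the chance that a recorded neuron responds to exactly k of n
# presented stimuli is binomial, so observed selectivity only bounds the
# true sparseness.

def p_responds_to_k(n, k, a):
    """Binomial probability that a neuron fires for exactly k of n stimuli."""
    return comb(n, k) * a**k * (1 - a) ** (n - k)

# With very sparse coding (a = 0.001) and 100 presented stimuli, most recorded
# neurons respond to nothing, consistent with the rarity of measured
# highly selective cells.
assert p_responds_to_k(100, 0, 0.001) > 0.9
assert p_responds_to_k(100, 1, 0.001) < 0.1
```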
In our case, although we simulate the hypothesis using single neurons in Sec. IV, our hypothesis is in line with the existence of many such sparse invariant neurons. We actually think that multiple neurons would be needed, at least to guarantee resiliency if some neurons fail.
The authors however conclude with a set of difficult open questions. Our hypothesis already suggests a few answers to the following ones: "How are MTL cells involved in learning associations? How are MTL cells involved in free recall or the spontaneous emergence of recollection in the human mind?" As discussed, we believe it could be: backpropagated APs, triggered through neuromodulation by some "control centers", which decide whether or not to facilitate the recall, the naming, etc. The same principle should apply to free recall, where said centers sequentially activate related concepts (see our discussion on mind wandering in Sec. VI-A).
Last but not least, in a subsequent work [94], Quiroga et al. measured that the MTL selective neurons reflect the subjects' decisions about the stimuli rather than the visual features themselves. They put this in evidence by performing experiments in which they presented subjects with vague stimuli that are a mixture of different celebrities (e.g. a picture that is a mixture of presidents Bush and Clinton). As expected from previous studies, exposing subjects to one of the celebrities leads them to later see the morphed image as pertaining to the opposite celebrity. This is probably due to "tiring" the neurons of this character. Later, by recording Clinton's and Bush's neurons, they concluded that such MTL neurons fire in accordance with the decision, not the features. In an interesting follow-up comment, Reddy et al. [95] remarked that damage to the MTL area causes subjects to have memory impairments while keeping perfect perceptual awareness and consciousness. This is in line with our hypothesized role of source pointer neurons that allow mapping related stimuli via backpropagated APs.
As a summary, it seems that sparse neurons that respond selectively to complex concepts do exist, in the areas where we suspect them to, with properties that are in line with our backpropagation-based recollection hypothesis.
E. Summary of arguments in favour
We now summarize, as illustrated in simple points in Tab. I, the arguments in favour of our hypothesis. First, as seen in Sec. III-A, activity-dependent backpropagating action potentials happen and are biologically plausible. Moreover, it has been observed that these APs are stronger when neurons are firing, a necessary condition for our hypothesis. Indeed, according to our hypothesis, the retrieval cue should activate the source pointer neurons, which should result in the backpropagating retrieval process. Second, we have seen that certain types of neuromodulation can enhance such backpropagation in a selective and progressive manner, thus acting as a switch to inhibit or strengthen it. This is necessary to control whether or not to enable the retrieval process. This feature is necessary, as humans can also control whether or not to favour the retrieval of some explicit memory after exposure to a cue. A naive example: not every time we see a laptop screen do we recall its name. Interestingly enough, this modulatory phenomenon on backpropagated APs has been observed in hippocampal cells, an area known to be crucial [75], [79] in memory retrieval, and especially known in some theories [73], [72], [79] as the place that stores the indexes that allow retrieving memories stored in other cortical areas. We also found evidence that such sparse indexes, which we called pointer neurons, have been observed and well documented [2], [92]. Even more interestingly, we have seen that a reversal of information flow has been observed in the hippocampus, which is believed to act as a glue between the cues and the engrams to be retrieved. This brings us to another assumption of our hypothesis, which is that retrieval is the reactivation of the same areas and neurons that were used during encoding, a task sometimes called neural reinstatement, or pattern completion. We find a large body of optogenetic-based and neuroimaging-based evidence that confirms this assumption.
There is indeed a high overlap between the areas involved in these two tasks, and the optogenetic-based experiments cited above are indeed based on tagging the specific neural ensembles that were active during encoding. Additionally, and perhaps as an added bonus, we will computationally show later that this hypothesis is an effective computational method to associate names to visual input, with the same high accuracy as a supervised machine learning algorithm.
Additionally, and maybe anecdotally, the fact that these signals are weak and fade away might explain why imagination and memory recollection elicit subjective experiences that are themselves transient and fading in nature. The recollection of an image of a cat is much less vivid and persistent than the subjective experience that is due to the sensory input.
Finally, not reviewed in detail above, if our hypothesis proves true for cue-based recollection, it becomes more than reasonable to embrace the view that it also mediates other generative tasks such as mind wandering, intentional creative thinking, dreaming, as well as future episodic thinking or imagining the future. Existing neural correlates studies of such generative tasks [96], [97], [98] can be leveraged to further verify our hypothesis.
To summarize, we interpret all the arguments in favour as an encouraging call for future work and further investigation. In particular, it should be experimentally verifiable whether the extent of the action potential backpropagation is proportional to pre-synaptic weights, and whether and to what extent backpropagation can be far reaching (e.g. eventually more than one pre-synaptic hop away).
Assumption                                            Evidence
Backpropagating Action Potentials                     Sec. III-A
Backpropagation stronger when neurons fire            Sec. III-A
Backpropagation can be selectively modulated          Sec. III-A
High overlap between retrieval and encoding           Sec. III-C1
Information flow reversal (at pointer neurons)        Sec. III-C2
Backpropagation effects can be far reaching           No, but verifiable
Backpropagation proportional to presynaptic weights   No, but verifiable
Existence of source pointer neurons                   Sec. III-D

TABLE I: Summary of hypothesis assumptions and supporting evidence

IV. NAME ASSOCIATION: MODELING WITH SPIKING NEURAL NETWORKS
We now focus on the task of retrieving object names using their image as a cue. As an added bonus, we set out to simulate our hypothesis and assess whether it is a computationally efficient strategy for this task. To this end, we leverage existing artificial Spiking Neural Networks (SNNs) trained with STDP learning and simulate a "teacher" that, during learning, simultaneously shows to the SNN the images and their corresponding names. Then, during test, backpropagated action potentials are used to retrieve the right name. We compare the accuracy of a naming mechanism employing our hypothesis to that of a machine learning classifier. In what follows, we first describe in Sec. IV-A the recent existing SNN models we build on. We critically review their limits and plausibility in Sec. IV-B. Finally, we detail how we use them in our simulations.
A. Image classification with existing Spiking Neural Networks (SNNs)
SNNs are a class of biologically inspired computational models in which spiking neurons communicate information through individual spikes that propagate from one neuron to the next. Such spikes simulate APs, happening when the membrane potential of the neuron crosses a certain threshold. In reality, both the rates at which spikes are generated and the temporal patterns of spikes are believed to carry information about the input stimuli [99], [100]. The artificial SNNs we leverage in this paper simulate a simpler version of this process, while still offering higher biological plausibility [101] compared to other artificial models. Indeed, for training, the SNNs we leverage use the more biologically plausible STDP learning rule [102], [16], [103], [104], [105]. Under this rule, synaptic weights are updated according to the relative spike times of pre- and post-synaptic neurons: if the pre-synaptic spike occurs slightly before the post-synaptic spike, then a persistent strengthening of synapses called long-term potentiation (LTP) occurs [99]. In the other case, the result is long-term depression (LTD), which leads to a persistent depotentiation of synapses.
Two recent SNN models in particular provided background for our simulations [12], [13]. We build in particular on the model of Kheradpisheh et al. [12], which achieved impressive accuracy on simple datasets. We reuse its feature extraction layers almost as-is. The latter are illustrated in the "feature learning (STDP)" upper part of Fig. 2. As can be seen, it consists of consecutive layers of neural processing. The first is a temporal coding layer, meant to somewhat simulate retinal ganglion cells' firing moments. It is followed by a cascade of convolutional and pooling layers to extract visual features. In more detail, the first layer is responsible for encoding the input signal into discrete spike trains in the temporal domain. For this, it uses Difference of Gaussian (DoG) filters.
This layer detects positive and negative contrasts in the input image and encodes them in spike latencies, according to their strengths. Next, each neuron in a convolutional layer receives input spikes from the neurons located in a certain window and emits a spike when its potential reaches a specific threshold. Pooling layers perform a nonlinear max pooling operation in which they only propagate the first spike emitted. In this model, STDP learning only occurs in the convolutional layers and is done layer by layer. For each image presented to the neural network, there is a "competition" between the neurons of a convolutional layer, and those which fire earlier trigger STDP and learn the input pattern. Finally, the last layer is a global pooling layer which performs a global max pooling. The role of these feature extraction layers is only to learn visual features: they are trained without name labels, by propagating many images through the layers and adjusting the weights with STDP.
Next, unlike what happens within our hypothesis, in Kheradpisheh's model [12] the trained output of this final layer is used to train a linear Support Vector Machine (SVM) classifier. The SVM classifier is of course not biologically plausible, but the goal of Kheradpisheh et al. was only to assess the ability of SNNs and STDP to extract salient visual features that are good enough to discriminate images. And they actually found that they were good enough in terms of classification accuracy: their implementation reached 99% and 98.4% accuracy on the face/motorbike and MNIST datasets, respectively. We reproduced their results using Perez's available implementation [14]. After some search for the best parameters [15], we reached around the same accuracy on the face/motorbike dataset with an SVM classifier.
Finally, worth mentioning, Mozafari et al. [13] proposed a 4-layer SNN with STDP, whose last, classification layer is trained this time using reinforcement learning instead of the SVM classifier.
Their final layer is a decision-making layer that performs a global pooling operation. Each neuron in it is assigned to a category, and the neuron which fires first indicates the network's decision. This work, unlike the previous one, thus does not rely on an external, biologically implausible classifier. Indeed, weight change in the last layer is modulated by a reward/punishment signal which depends on the correctness/incorrectness of the network's decision. However, the paper lacks plausibility in that it does not answer the challenging question of how and who generates the reward and punishment signals and, more crucially, how it "knows" which neurons to punish and which neurons to "reward". (Footnote: DoG filters are often used to grossly approximate the spatial visual processing in the retina.) Later, we present instead an end-to-end model, at the neuronal level, from learning associations to naming. Indeed, we postulate that the simple repeated co-occurrence of signifier and signified is enough to tie them together, as in passive learning, and this in a bidirectional way. Thus, in our framework, first, no unknown reward signal or mechanism is needed. Second, the naming of the object does not implicate a feed-forward mechanism but rather the backpropagation of action potentials. Such backpropagation allows mapping the signifier to the signified in a bidirectional way: retrieving the signifier from the signified (naming) and vice versa, the signified from the signifier (understanding).
B. Plausibility of the above SNN models
Next, before using Kheradpisheh's model [12] as a basis, we briefly discuss its (lack of) plausibility, as this allows us to later better gauge the plausibility of our hypothesis. In a nutshell, we are aware that the SNN model above lacks plausibility in many aspects, despite being unsupervised, and despite using a simple rule like STDP learning. For example, it uses only spike-time neural coding. It also uses convolutional neural networks (CNNs) with weight sharing, which is biologically implausible. However, all these "problems" do not impact our hypothesis, since we are mainly interested in the last layers of the neural network (where the retrograde signaling, or backpropagation of the action potentials, will actually initiate). Besides, and very interestingly, recent work [106] has shown that training using "properly translated data", such as "correlated" images in video, relieves the need for CNN weight sharing, and results in an approximate form of it.
Our position here is as follows. The above simple neural network model trained only with unsupervised STDP learning achieves good performance on what was, 20 years ago, a difficult problem. This means that it extracts visual features of fairly good quality. The latter are of course far from perfect: the accuracy does not reach 100% even on the simplest motorbike/face dataset. Nonetheless, we set out to verify whether our hypothesis with STDP learning can successfully use these same features to find the right name association. For fairness, we compare our results to those of the SVM classifier.
C. Our model
Under our hypothesis, successful name association comprises three steps, two for learning and one for recollection, as modeled in Fig. 2. The first, feature learning step is completely unsupervised and learns, through repeated exposure to visual stimuli, to extract salient features (e.g. lines, shapes, etc.) to discriminate visual content. In the brain, such learning is supposed to happen early in life. And if, for some reason or another, one is not exposed to visual stimuli, such learning does not happen, which leads to cortical blindness. As already mentioned, we reuse the SNN model described above [12] to model it.
The second step is a semi-supervised learning one, whereby a teacher shows a learner an image and the right name that refers to it. We call this co-occurrence learning. Humans, for instance, can learn from a single example to map a new object to its new name. Sometimes, when no external reinforcement happens, the exposure needs to be repeated multiple times until it is remembered. We model this step by simply adding a new "categories layer" and emulating the right spike each time an image is propagated through the SNN, i.e. generating a spike for the "cat" category neuron while a cat image is propagated through the SNN.
The last step is simply the recollection of the name. It is in this step that retrograde signaling from all neurons in the backpropagation root layer is sent backwards to the categories layer. The neuron(s) which receive the highest "vote" signal the network's decision. We now describe the three steps of Figure 2 in more detail.
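The second and third steps can be sketched in a toy end-to-end form (a simplification of ours: the `co_occurrence_learn` and `recall_name` helpers and the feature indices are hypothetical stand-ins for the spikes of the SNN's backpropagation root layer):

```python
import numpy as np

# Toy sketch of co-occurrence learning and vote-based recollection
# (our simplification; not the full SNN simulation described in the paper).

names = ["cat", "face"]
n_features = 4
W = np.zeros((n_features, len(names)))   # root-layer neuron <-> category neuron

def co_occurrence_learn(active_features, name):
    """Teacher shows image and name together: strengthen the synapses between
    the root-layer neurons firing for the image and the name's neuron."""
    W[active_features, names.index(name)] += 1.0

def recall_name(active_features):
    """Backpropagated APs from the active root-layer neurons vote for categories;
    the category neuron with the highest vote signals the decision."""
    votes = W[active_features].sum(axis=0)
    return names[int(np.argmax(votes))]

co_occurrence_learn([0, 1], "cat")    # cat images activate root neurons 0, 1
co_occurrence_learn([2, 3], "face")   # face images activate root neurons 2, 3

assert recall_name([0, 1]) == "cat"
assert recall_name([2, 3]) == "face"
```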
D. Feature Learning
For feature learning, we used the SNN model above [12] almost as-is; the reader can refer to that reference for more details. This phase starts with the input image being encoded into discrete spike events in the temporal domain. The encoding is performed using Difference of Gaussians (DoG) filters, and spike times are then computed from the output of the DoG filter. More precisely, let r be the value at a certain index after applying the DoG filter. Then the firing time t is defined as t = 1/r. This amounts to encoding higher-contrast areas of the image with lower spike times (i.e., latency is inversely proportional to contrast). As a result of discretizing this process, each single image is transformed into several waves of spikes that propagate, one by one, through the layers, spikes that signal higher-contrast areas being the first to enter the network. Next, convolutional layers are arranged in a feedforward manner. Between two consecutive convolutional layers, a pooling layer performs a max operation to compress visual data and provide translation invariance. The task of a neuron in a pooling layer simply consists in propagating the first spike received from a receptive window of the previous convolutional layer. Neurons in all the convolutional layers are non-leaky integrate-and-fire neurons: they integrate input spikes and emit a spike as soon as they reach their threshold. The latter is a hyperparameter to set. Immediately after a spike occurs, weights are updated accordingly, using a simplified version of the STDP learning rule. Let i, j be the indices of the post- and pre-synaptic neurons, respectively, and let t_i, t_j be their corresponding spike times. The synaptic weight w_ij is updated by adding a modification factor ∆_ij computed as follows, according to a simplified version of STDP [12], [107].
∆_ij = α+ · w_ij · (1 − w_ij),   if t_j ≤ t_i
∆_ij = −α− · w_ij · (1 − w_ij),  if t_j > t_i        (1)

Here α+, α− ∈ ℝ≥0 are two parameters that specify the learning rate, i.e., by how much the weights are changed. This factor strongly impacts learning. Indeed, small values lead to a slow learning process: they simulate a neural network that is confident in its prior "beliefs and decisions" (weights). High values allow the network to learn very quickly from the current stimulus, but can have as a consequence "forgetting" what it learned from previous stimuli. Note that this simplified version of the STDP rule does not take into account the absolute time difference between post- and pre-synaptic spikes; what matters is only the order, or the sign, of the difference. In practice, this is not a problem for our model.
This feature learning process goes on by propagating training images one by one. Each time a new image has been fully processed, and weights updated and stored, the potential of each neuron is reset to 0, in preparation for the next image. Initially, the synaptic weights are chosen at random from a normal distribution with some mean and standard deviation. The STDP rule ensures that they always remain in the range [0, 1]. Within each image, learning is done layer by layer: learning at layer ℓ begins when learning at layer ℓ − 1 has terminated.
The intent of this feature learning phase is to learn the synaptic weights of each neuron in all the convolutional layers. As observed previously with this SNN model [12], neurons in the first layer converge to the four simple oriented edges, and neurons in the successive layers learn more complex features by integrating spikes from previous layers. We stress that this phase is totally unsupervised: the network only learns frequent features associated with images and requires no knowledge of the input image categories. The next learning step includes these categories; we qualify it as semi-supervised.
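The latency coding and the simplified STDP update of Eq. (1) can be sketched as follows; the α values are illustrative placeholders, not the settings used in our experiments:

```python
import numpy as np

def latency_encode(dog_response, eps=1e-9):
    """DoG-based temporal coding: firing time t = 1/r, so higher-contrast
    pixels (larger DoG response r) spike earlier; r = 0 never fires."""
    r = np.maximum(np.asarray(dog_response, dtype=float), 0.0)
    return np.where(r > 0, 1.0 / (r + eps), np.inf)

def stdp_delta(w, t_pre, t_post, a_plus=0.05, a_minus=0.04):
    """Simplified STDP of Eq. (1): only the order of pre/post spikes matters,
    and the w * (1 - w) factor keeps every weight inside [0, 1]."""
    if t_pre <= t_post:              # pre fired before (or with) post: LTP
        return a_plus * w * (1.0 - w)
    return -a_minus * w * (1.0 - w)  # pre fired after post: LTD
```

Note how the multiplicative w(1 − w) term vanishes at both 0 and 1, which is what keeps the weights bounded without explicit clipping.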
E. Co-occurrence learning
Once the SNN has learned the right weights, and hence visual features, the second, co-occurrence learning, step can begin. For this step we assume, as per our hypothesis, that there is a layer which encodes the object categories or names. The latter is connected, as shown in the figure, to the last layer of the image-processing network. Then, an image (e.g., a cat, as illustrated in the figure) is propagated through the image-processing neural network while, at the same time, the neuron which represents its signifier or name is activated simultaneously.
In more detail, as in the previous phase, train images are considered one by one. Using the weights learned in the first phase, each image passes through the network until it reaches the last layer, where a max pooling operation is performed. (Remember that each image, because of the DoG-based temporal encoding, results in multiple "waves" of spike trains that are propagated sequentially through the layers.) During co-occurrence learning, a neuron of the last pooling layer thus receives two spikes: one propagated by the neuron associated to the class of the image (we simulate a single neuron for simplicity, but the same reasoning applies to multiple ones), and one input from the last convolutional layer. This simulates the teacher that simultaneously shows the image and its name.
Note that, implementation-wise [15], this is equivalent to having a matrix with the same shape as the last pooling layer associated to each image category (one weight per category neuron per neuron in the backpropagation layer). As in the first phase, weights are initially random (in practice, we try different initializations [15] and pick the best) and are updated only using the STDP learning rule, as defined in (1).
Fig. 2: Learning under our hypothesis, simulated with the proposed SDNN with its three main parts.
Then, according to the order of post- and pre-synaptic spike times and to the index of the spiking neuron, what happens during learning is the following: the weight matrix of the right image category is strengthened (LTP), while the weight matrices of the other categories are weakened (LTD).
Contrary to the previous phase, this phase is supervised, as it requires knowledge of the image category in order to link it with the corresponding image features. Indeed, the aim here is to learn associations between the features learned in the previous phase and the image categories by using only the simple STDP rule. At the end of this phase, training has completed and the SNN can proceed to the naming task using the backpropagation principle.
One extreme version of this second phase is what is called one-shot learning: the network is given only a single example of each category. We will vary the number of such training examples in Sec. V, effectively trying one-shot and few-shot learning scenarios, which is why we consider this task semi-supervised.
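A toy sketch of the LTP/LTD dynamics described above, applying the rule of Eq. (1) to the per-category weight matrices (sizes and rates are made up for illustration, and names are ours, not the reference code's [15]):

```python
import numpy as np

def co_occurrence_step(class_w, active, label, a_plus=0.05, a_minus=0.04):
    """class_w: (n_categories, n_root_neurons) weight matrices.
    active: boolean mask of root-layer neurons that spiked for this image.
    The teacher's spike puts the 'label' row in the LTP branch of Eq. (1)
    and all other rows in the LTD branch; silent neurons are untouched."""
    for c in range(class_w.shape[0]):
        sign = a_plus if c == label else -a_minus
        w = class_w[c, active]
        class_w[c, active] = np.clip(w + sign * w * (1.0 - w), 0.0, 1.0)
    return class_w
```

Iterating this step over labeled images implements the few-shot regime studied in Sec. V; a single iteration with a large enough rate implements the one-shot extreme.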
F. Naming
Once learning is done with the previous two phases, we are ready for the naming task, following the principle of backpropagated action potentials. In this task, the image to name is first propagated through the fully trained neural network until spikes start to happen in the last pooling layer, which is our backpropagation root layer. We consider all neurons in this layer to be source pointer neurons. This means, as per our hypothesis, that we allow them, if they fire, to send backpropagated action potentials, modulated by the presynaptic weights learned in the second step, to the previous layer that encodes the labels or names. Neurons in the "categories/signifiers" layer integrate such received signals, and the category which has the highest vote is the retained name for the image. Namely, let C_i, for i = 1, .., k, be the classes: the class C_i whose neuron has the highest class score is chosen as the class the image belongs to. It is this score that can be used as the accumulated potential that brings the right neuron closer to its firing threshold, leading, when it fires, to the class decision.
We show in the next section that such a simple mechanism allows labeling the images as accurately as the SVM classifier. More interestingly, by using high learning rates during co-occurrence learning (e.g., neuromodulation that increases synaptic strength), it is possible to learn to name objects, with maximum accuracy, by showing the neural network only a single instance of the image class; an extreme learning task in which the SVM classifier seems to have more difficulty.
V. SIMULATION RESULTS
We now evaluate the accuracy of the spiking neural network when using our hypothesis to learn and name, and compare it to that of the SNN model followed by the SVM classifier.
A. Experimental Setup
Experiments have been performed on a server with 5 Intel 2.10 GHz CPUs, 32 GB of memory, and an Nvidia Tesla P100 SXM2 GPU with 16 GB of dedicated memory. (In practice, implementation-wise, the class score is simply the sum of weights: the SNN model considers a spike to be a binary decision that happens when the accumulated potential reaches a threshold, i.e., we do not consider the rate. We tried a version that uses the exact value of the internal potential instead of the unit value 1, but results were similar.) We evaluate the
accuracy reached by our model on the Caltech motor/face dataset [108], considering two classes: Faces and Motorbikes.
Fig. 3: Difference between the class scores for each image in the Train (Left) and Test (Right) datasets.
B. Overall accuracy with backpropagated APs
We first focus on the case where we show the SNN many examples of each class. In particular, for each class we select 398 images, among which 200 are reserved for training (for feature learning and co-occurrence learning alike) and 198 are left for testing. After a parameter search [15], we set α+ and α− to . and . , respectively. The thresholds for the first, second, and third convolutional layers are set to , , and . Max pooling is not performed for this dataset; this means that in our last layer we use a pooling window of size x . This is because images in the Caltech dataset have low resolutions. The synaptic weights of the class matrices are chosen at random from a normal distribution with mean . and standard deviation . .
In this setting, using the backpropagation-based recollection, we reach an accuracy of . and . on the train and test datasets, respectively. This performance is on par with that of an SVM classifier, as per the original SNN we build on, and confirms the computational efficiency of backpropagated action potentials.
In more detail, Fig. 3 shows the class scores for both the train and test datasets. Each point corresponds to an image, and its ordinate is the difference between the scores associated to the Motorbike and Face classes, respectively. Therefore, images with positive values are associated to the Motorbike class, while images with negative values are associated to the Face class. As can be observed, backpropagated action potentials separate the two classes cleanly for most of the images. However, for some images this distinction is not clear. We believe, as discussed earlier, that the problem comes from the SNN feature extractor which, although performing well, does not yet learn good representations. Nonetheless, only a few images are classified incorrectly with a large relative error. This shows, overall, the computational plausibility of the backpropagation-based recollection.
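For concreteness, the class scores and the difference δ plotted in Fig. 3 can be computed as below. This is our own minimal reading of the scoring step (a spike counts as a unit vote weighted by the learned synapse); the weight values in the usage example are invented for illustration:

```python
import numpy as np

def class_scores(class_w, active):
    """Backward vote: each firing root-layer neuron sends its learned weight
    back to every category neuron; a score is the sum of received weights."""
    return class_w[:, active].sum(axis=1)

def delta(class_w, active, motor=0, face=1):
    """Fig. 3 plots delta = score(Motorbike) - score(Face):
    delta > 0 -> Motorbike, delta < 0 -> Face."""
    s = class_scores(class_w, active)
    return float(s[motor] - s[face])
```

The classification rule is then simply the sign of δ (or, with more than two classes, the argmax of the score vector).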
Fig. 4: Accuracy as a function of the number of train images per category used in the Co-occurrence Learning phase.
C. Accuracy in few-shot learning
We now focus on the case where the "teacher" shows the SNN only a few examples.
Varying the number of images in co-occurrence learning.
We first vary the number of training images in the co-occurrence-based learning phase. In the remainder, whenever we talk about training, we refer to the supervised "teacher-based" co-occurrence learning where a label is given. Fig. 4 shows the accuracy on the train and test datasets as a function of the number of labeled train images. We vary the number of labeled train images per experiment from to . As the number of train images increases, the accuracies reached on both the train and test images increase as well. The train and test scores pass from and after 25 images per category up to the . and . from above, after having used the full train dataset. This shows that the SNN learns, but slowly, as we feed it images and labels. One way to speed up this process is to increase the learning rates.
Varying the learning rate.
Increasing the learning rate can simulate a neuromodulatory action that strengthens a connection suddenly, without the need for repeated exposure. We hence vary the α+ and α− used in the co-occurrence phase and observe a considerable impact on the accuracy. Fig. 5 illustrates the impact of the modulated learning rate on the accuracy level reached as a function of the number of train images used. As a baseline, we use α+ = 0. , α− = 0. , and we multiply both values by some factor λ ∈ { , , , , , }. When λ < 1, the learning is obviously slower. Indeed, the accuracy varies from with training images (i.e., random) to using the entire train dataset, and grows in a linear way; it would probably reach higher values with more training time. When λ = 1, we reach the maximum possible test score using all the train images.
An interesting behaviour can be observed when λ > 1. The learning is at first faster, as it can reach high accuracy after having seen only a small sample of train images. However, the accuracy then starts decreasing as the number of train images increases. We recall that the STDP rule used keeps the weights within the range [0, 1]. Thus, starting with high values for α+ and α− allows the network to associate discriminant features with the image categories faster. However, as the number of train images increases, weights might become less helpful if they tend to reach the maximum value of 1 and thus to have fewer intermediate values: the scores become closer and less distinguishable. Another factor is that, as we will see later, the SNN is better at recognizing and learning from certain particular images than from others (see the description of Fig. 6).
Fig. 5: Accuracy on the test images as a function of the number of train images per category used in the Co-occurrence Learning phase (increased number of shots in a few-shot learning task).
Hence, being exposed to a good image with a high learning rate leads to a good accuracy, but being exposed afterwards to a "bad" image unlearns the good weights, thus decreasing the performance.
When λ = 10 and with only train images per category, it is possible to reach an accuracy of . . With λ = 2, , and , the numbers of necessary train images per category to reach the same accuracy are , , and , respectively. These results hint at the following direction: the best approach in terms of few-shot learning would be to first start with a high learning rate (λ), but then to stop changing the weights, by either suddenly decreasing λ or by simply freezing the learning, making the network always stick to its old beliefs.
D. One-shot learning: Machine learning vs. Backpropagation
In this final section, we set the bar high and propose to train and test the SNN in a one-shot learning task, meaning that we show the neural network only one single image from each class, together with its correct name. We then test the accuracy of the network on the entire 198 images of the test set. We compare the SVM and our model on this task.
As explored earlier, reaching good performance in this task requires even higher learning rates than tried previously. We experiment with various λ's and various (motorbike, face) image couples. We find, for instance, that λ = 65 yielded good results among many other values, so we pick it. But this is where we interestingly discovered that the performance depended strongly on which couple of (motorbike, face) images was used for the one-shot learning task. We found that, using the
backpropagation-based recollection, certain single couples of motorbike and face yielded an accuracy of 96.2% on all the remaining unseen 198 test images. Note that this is higher than what we achieved earlier when training with all images rather than only one or a few shots. At the same time, the maximum we could achieve with the SVM classifier on a single example was 84.5% accuracy.
To assess this more systematically, we test both the SVM and backpropagation in the one-shot exercise on around 1500 different image couples of motorbike and face photos. We plot in Fig. 6 the resulting empirical cumulative distribution functions. The figure shows that the two approaches yield different distributions, with most of the SVM results being less dispersed, slightly above 80% accuracy.
Fig. 6: Comparison of ML and our hypothesis in the one-shot learning task. CDF across 2000 different pairs of (motor, face) pictures picked for training.
We conclude that, with the right images used for training, backpropagation outperforms the SVM by far in the one-shot task (96.2% accuracy on 198 images against only 84.5%).
Note that this curious result also suggests that the SNN models are still not good enough at feature extraction. The learned representations are probably not as invariant as they should be, hence the differences between images. Further investigating the differences between these successful and less successful image couples might help enhance current SNN models. For these reasons, we believe that our simulations should be seen as yet another argument for the computational effectiveness of the backpropagation-based recollection mechanism.
VI. DISCUSSION
For more than a century [109], information processing in the brain has been widely believed to follow mainly the forward, pre- to post-synaptic neurons direction. In this work, we emitted the hypothesis that the backpropagation of action potentials mediates all "offline" generative tasks where the simultaneous activation of specific targeted populations of neurons is needed. This is, we claimed, the case for the retrieval of past memories or mental images, the retrieval of the signification of words, the retrieval of names, and even the mixture of distinct past memories into imagination. We reviewed in Sec. III-A abundant evidence that calls for giving the hypothesis a chance. As an added bonus, we showed in Sec. V that our hypothesis can be as efficient as, or even more efficient than, a machine learning algorithm in retrieving the category name of an object (manual inspection of "good" and "bad" images did not uncover any peculiar character distinguishing them). If this hypothesis is confirmed true, it would have tremendous implications, considerably improving our understanding of neural encoding and of high cognitive functions from a low-level neural perspective.
A. Possible implications
The first big implication of this hypothesis is the promise to bring answers to the neural encoding problem and close the old debate in cognitive sciences between localist and distributed representation theories. If our hypothesis is true, the answer to the representation problem becomes simple: (i) representations of concepts are distributed but, at the same time, (ii) there exist highly selective neurons that respond to unique concepts. The latter serve as hubs between various related concepts, playing the role of source pointer neurons to retrieve the entire concept's features encoded by an entire population of neurons. For example, there should exist relatively few neurons that uniquely respond to the image of a cat. But such neurons serve as hubs to easily connect the cat concept to related memories. Such sparse neurons act as pointers to retrieve the visual features of a real cat through backpropagated action potentials: the latter travel backwards to reactivate selectively the neurons that represent the right lines, shapes and colours that define a cat in a statistical way. Hence, to be able to recall the image of a cat through mental visual imagery, the brain does not need to activate only the sparse neurons that respond to all cats; as the optogenetic studies above also hint, the activation of an entire larger population is needed. And if neurons that represent, say, vertical lines are not activated during the process, the recalled image will phenomenologically lack them.
Beyond vision, the cat toy example could apply to any mental state and any couple of (lived stimuli, later memory of the stimuli), be they smells, affects or even impressions of movements.
In accordance with the principle of grounded cognition [110], which we believe our hypothesis complies with, discrete concepts are grounded in the sensorimotor experiences that were encoded with them, such that the activation of a signifier of a concept leads to the activation of the experiences that are grounded with it. Hence, the cue that is the word "moving" or "tickling" is correlated with areas that encode actual moving or tickling.
Another related hard problem in cognitive sciences that can benefit from our hypothesis is the binding problem: how does the brain bind higher-level concepts to more elementary ones and, in particular, how does it associate the right features (e.g., colors) to the right discrete objects or concepts, for example in an image composed of many objects? The fact that the brain needs some time to correctly perform the binding [111] suggests that this operation is not forward-based, but happens generatively and iteratively, later, in a second stage. If our hypothesis is correct, this should happen through slow and repetitive runs of top-down action potential backpropagation, starting from the right source pointer neurons that uniquely define the discrete object, all the way backwards, activating all the neurons that describe its attributes. Actually, two famous competing (and high-level) theories on this problem are the feature-integration theory [112] and the temporal synchronization theory [113], [114]. Our hypothesis could reconcile them as well. Indeed, both admit the involvement of different runs of bottom-up and top-down hierarchies (e.g., attention in feature-integration theory) to implement binding. However, the exact physical mechanisms that implement this were still unknown. Our hypothesis suggests that these top-down hierarchies that bind objects to their features are implemented through backpropagated APs.
Under this new realm, one does not have to choose exclusively between binding-by-synchrony and feature-integration theory: attention with backpropagated APs could synchronously and selectively activate all the features related to a given discrete object.
This leads us to another closely related implication: attention itself is likely implemented through top-down backpropagated APs. In general, backpropagation can constitute an easy-to-implement, unique and simple mechanism that underlies a diverse set of tasks: offline thinking or mind wandering, imagination, episodic memory retrieval and future episodic thinking. In our framework, imagination becomes "easy" to apprehend: it would simply be the resulting activation pattern of a mixture of usually unrelated concepts. For example, an imagined "laughing cat" results from simultaneously top-down back-activating the "laughing" and "cat" concepts. The same applies to mind wandering, where backpropagated APs should induce activation patterns on "the most likely neural pathways", generating what seem to be coherent thoughts.
All this predicts the existence of generators, or specialized centers, that release neuromodulators remotely to control the generation by either inhibiting or facilitating backpropagation. This is where more advanced modeling and computer simulation work can help tremendously in future work. For example, it would be helpful to understand the interplay between backpropagation and forward propagation, since the first can result as well in feedforward propagation, which in turn might cause backpropagation, and so on. Such "lateral" activation patterns could be useful to find associated concepts, as opposed to "digging into the details" of a single concept.
Finally, if proven true, backpropagated APs could open new ways to better understand some pathological unintentional recollections of memories.
If so, understanding what factors impact the inhibition or excitation of backpropagating APs could open the way towards understanding possible related disorders which might involve obsessive thinking or intrusive thoughts. Other related, rarer dysfunctions happen in mental visual imagery as well; examples are the absence (aphantasia) or excess (hyperphantasia) of visual imagery experiences [4]. For these, the neuromodulation mechanisms that control backpropagation should be the first suspects.
Last but not least, one implication is that the retrieval process is stochastic in nature. The retrieved memory trace looks like the original first perception but, depending on past experiences (and hence on the weights of the neural connections due to past experiences), the reactivation might not be exactly the same. This can be seen most in the case of language, where the same word (e.g., "a screen") was seen multiple times during encoding, in the presence of multiple similar stimuli (many different types of screens), and where the same concept is further grounded in different neural network connections from one subject to the other. Finally, if the framework defended by our hypothesis is correct, language can be seen as a common cue-based system useful to make others live experiences similar to ours.
B. Verifiability and further investigations
Before drawing a bright future for our hypothesis, two directions must be seriously pursued to further investigate it and confirm (or infirm) its plausibility.
1) Empirical methods:
We verified in the literature the existence of retrograde signals that satisfy some of the assumptions of our hypothesis. Further targeted empirical studies can verify the remaining ones. First, it is crucial to understand whether the backpropagation signal is stronger on paths with "higher synaptic weights". Second, it is necessary to measure how far-reaching the backpropagation signal can be, beyond solely the previous connection.
2) Computational effectiveness:
In addition to the above empirical methods, one line of work could be to verify in parallel the computational effectiveness of this hypothesis in implementing its target goals. In this work, we verify as a first step the ability of retrograde action potentials to perform the object recognition or naming task, that is, the retrieval of the class of the object once the stimulus is presented. This allows us to assess the computational power of this mechanism, compared to other, less biologically plausible ones. We opted for this comparison because of the presence of a baseline to compare to (an existing image classifier) and of a metric (accuracy) to quantify the computational power of the mechanism. Although not simulated in this work, retrograde APs can be used symmetrically for the task of "understanding": i.e., an activation of the signifier (word) neuron that leads automatically to the activation of the signified concept and of, say, its visual features. This needs, however, to rely on artificial neural networks that have good and plausible feature extraction capabilities. The SNNs we use are promising and close, but they do not yet fully satisfy the last property (see Sec. IV-B).
Finally, if artificial backward reconstruction works well, our hypothesis could also be computationally verifiable, at least in theory, for the imagination aspect. One interesting experiment could be to train spiking neural networks to recognize two separate concepts from images, exactly as we do for "Motor" and "Face" in Sec. V. Then, instead of backward constructing only one concept at a time, it would be interesting to simultaneously activate two concepts and see the effect on the backward-reconstructed images.
To go back to the example above, one could activate a concept like "laughing" and a concept like "cat" and visualize the results of the "competition" between backpropagating signals on the backward-constructed images. Similar work can be done with today's state-of-the-art deep neural networks, such as OpenAI's DALL·E [115], which uses a transformer decoder architecture. But the latter employs supervised mechanisms that lack biological plausibility [9], [10], [11].
ACKNOWLEDGEMENT
The strongest acknowledgment should go to Dr. Andrea Tomassilli, who insisted, despite being invited as a co-author, that his contribution deserves to appear only in this acknowledgment. Andrea Tomassilli executed the first modification to Perez's implementation, enabling the simulation of the hypothesis and the automation of the experiments and their analysis. We are grateful to Dr. Alessandro Finamore for interesting feedback on an earlier draft of the paper.
REFERENCES
[1] F. De Saussure, Course in General Linguistics. New York: McGraw-Hill, 1959.
[2] R. Q. Quiroga, L. Reddy, G. Kreiman, C. Koch, and I. Fried, "Invariant visual representation by single neurons in the human brain," Nature, vol. 435, no. 7045, pp. 1102–1107, 2005.
[3] C. E. Connor, "Friends and grandmothers," Nature, vol. 435, no. 7045, pp. 1036–1037, 2005.
[4] J. Pearson, "The human imagination: the cognitive neuroscience of visual mental imagery," Nature Reviews Neuroscience, vol. 20, no. 10, pp. 624–634, 2019.
[5] N. Dijkstra, P. Zeidman, S. Ondobaka, M. A. van Gerven, and K. Friston, "Distinct top-down and bottom-up brain connectivity during visual perception and imagery," Scientific Reports, vol. 7, no. 1, pp. 1–9, 2017.
[6] D. Dentico, B. L. Cheung, J.-Y. Chang, J. Guokas, M. Boly, G. Tononi, and B. Van Veen, "Reversal of cortical information flow during visual imagery as compared to visual perception," NeuroImage, vol. 100, pp. 237–243, 2014.
[7] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," 2014.
[8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
[9] S. Grossberg, "Competitive learning: From interactive activation to adaptive resonance," Cognitive Science, vol. 11, no. 1, pp. 23–63, 1987.
[10] F. Crick, "The recent excitement about neural networks," Nature, vol. 337, no. 6203, pp. 129–132, 1989.
[11] J. C. Whittington and R. Bogacz, "Theories of error back-propagation in the brain," Trends in Cognitive Sciences, 2019.
[12] S. R. Kheradpisheh, M. Ganjtabesh, S. J. Thorpe, and T. Masquelier, "STDP-based spiking deep convolutional neural networks for object recognition," Neural Networks, vol. 99, pp. 56–67, 2018.
[13] M. Mozafari, S. R. Kheradpisheh, T. Masquelier, A. Nowzari-Dalini, and M. Ganjtabesh, "First-spike-based visual categorization using reward-modulated STDP," IEEE Transactions on Neural Networks and Learning Systems, 2018.
[14] N. Perez-Nieves, "SDNN python." https://github.com/npvoid/SDNNpython. Accessed: 2020-11-08.
[15] "Backpropagation-based recollection hypothesis code." https://github.com/bendiogene/recollection hypothesis. Accessed: 2021-01-10.
[16] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, "Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs," Science, vol. 275, no. 5297, pp. 213–215, 1997.
[17] N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A. J. Rodriguez-Sanchez, and L. Wiskott, "Deep hierarchies in the primate visual cortex: What can we learn for computer vision?," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1847–1871, 2012.
[18] E. Tulving and D. M. Thomson, "Encoding specificity and retrieval processes in episodic memory," Psychological Review, vol. 80, no. 5, p. 352, 1973.
[19] J. S. Bowers, "On the biological plausibility of grandmother cells: implications for neural network theories in psychology and neuroscience," Psychological Review, vol. 116, no. 1, p. 220, 2009.
[20] J. S. Bowers, "Grandmother cells and localist representations: a review of current thinking," Language, Cognition and Neuroscience, vol. 32, no. 3, pp. 257–273, 2017.
[21] K. Patterson, P. J. Nestor, and T. T. Rogers, "Where do you know what you know? The representation of semantic knowledge in the human brain," Nature Reviews Neuroscience, vol. 8, no. 12, pp. 976–987, 2007.
[22] K. Svoboda, W. Denk, D. Kleinfeld, and D. W. Tank, "In vivo dendritic calcium dynamics in neocortical pyramidal neurons," Nature, vol. 385, no. 6612, pp. 161–165, 1997.
[23] Y. Bereshpolova, Y. Amitai, A. G. Gusev, C. R. Stoelzel, and H. A. Swadlow, "Dendritic backpropagation and the state of the awake neocortex," Journal of Neuroscience, vol. 27, no. 35, pp. 9392–9399, 2007.
[24] G. Buzsaki and A. Kandel, "Somadendritic backpropagation of action potentials in cortical pyramidal cells of the awake rat," Journal of Neurophysiology, vol. 79, no. 3, pp. 1587–1591, 1998.
[25] G. Stuart, N. Spruston, B. Sakmann, and M. Häusser, "Action potential initiation and backpropagation in neurons of the mammalian CNS," Trends in Neurosciences, vol. 20, no. 3, pp. 125–131, 1997.
[26] P. Vetter, A. Roth, and M. Hausser, "Propagation of action potentials in dendrites depends on dendritic morphology," Journal of Neurophysiology, vol. 85, no. 2, pp. 926–937, 2001.
[27] J. Waters, A. Schaefer, and B. Sakmann, "Backpropagating action potentials in neurones: measurement, mechanisms and potential functions," Progress in Biophysics and Molecular Biology, vol. 87, no. 1, pp. 145–170, 2005.
[28] S. R. Williams and G. J. Stuart, "Action potential backpropagation and somato-dendritic distribution of ion channels in thalamocortical neurons," Journal of Neuroscience, vol. 20, no. 4, pp. 1307–1317, 2000.
[29] S. R. Williams and G. J. Stuart, "Backpropagation of physiological spike trains in neocortical pyramidal neurons: implications for temporal coding in dendrites,"
Journal of Neuroscience , vol. 20, no. 22,pp. 8238–8246, 2000.[30] H. W. Tao and M.-m. Poo, “Retrograde signaling at central synapses,”
Proceedings of the National Academy of Sciences , vol. 98, no. 20,pp. 11009–11015, 2001.[31] H. Tsubokawa and W. N. Ross, “Muscarinic modulation of spikebackpropagation in the apical dendrites of hippocampal ca1 pyramidalneurons,”
Journal of Neuroscience , vol. 17, no. 15, pp. 5782–5791,1997.[32] D. A. Hoffman and D. Johnston, “Neuromodulation of dendritic actionpotentials,”
Journal of neurophysiology , vol. 81, no. 1, pp. 408–411,1999.[33] E. Tulving and Z. Pearlstone, “Availability versus accessibility ofinformation in memory for words,”
Journal of Verbal Learning andVerbal Behavior , vol. 5, no. 4, pp. 381–391, 1966.[34] E. Tulving et al. , “Episodic and semantic memory,”
Organization ofmemory , vol. 1, pp. 381–403, 1972.[35] P. W. Frankland, S. A. Josselyn, and S. K¨ohler, “The neurobiologicalfoundation of memory retrieval,”
Nature neuroscience , vol. 22, no. 10,pp. 1576–1585, 2019.[36] E. Tulving, “Ecphoric processes in episodic memory,”
PhilosophicalTransactions of the Royal Society of London. B, Biological Sciences ,vol. 302, no. 1110, pp. 361–371, 1983.[37] D. L. Schacter, J. E. Eich, and E. Tulving, “Richard semon’s theoryof memory,”
Journal of Verbal Learning and Verbal Behavior , vol. 17,no. 6, pp. 721–743, 1978.[38] J. S. Nairne, “The myth of the encoding-retrieval match,”
Memory ,vol. 10, no. 5-6, pp. 389–395, 2002.[39] M. Poirier, J. S. Nairne, C. Morin, F. G. Zimmermann, K. Kout-meridou, and J. Fowler, “Memory as discrimination: A challengeto the encoding–retrieval match principle.,”
Journal of Experimentalsychology: Learning, Memory, and Cognition , vol. 38, no. 1, p. 16,2012.[40] W. D. Goh and S. H. Lu, “Testing the myth of the encoding–retrievalmatch,”
Memory & cognition , vol. 40, no. 1, pp. 28–39, 2012.[41] K. Z. Tanaka, A. Pevzner, A. B. Hamidi, Y. Nakazawa, J. Graham,and B. J. Wiltgen, “Cortical representations are reinstated by thehippocampus during memory retrieval,”
Neuron , vol. 84, no. 2, pp. 347–354, 2014.[42] X. Liu, S. Ramirez, P. T. Pang, C. B. Puryear, A. Govindarajan, K. Deis-seroth, and S. Tonegawa, “Optogenetic stimulation of a hippocampalengram activates fear memory recall,”
Nature , vol. 484, no. 7394,pp. 381–385, 2012.[43] G. Vetere, L. M. Tran, S. Moberg, P. E. Steadman, L. Restivo, F. G.Morrison, K. J. Ressler, S. A. Josselyn, and P. W. Frankland, “Memoryformation in the absence of experience,”
Nature neuroscience , vol. 22,no. 6, pp. 933–940, 2019.[44] S. M. Smith and E. Vela, “Environmental context-dependent memory:A review and meta-analysis,”
Psychonomic bulletin & review , vol. 8,no. 2, pp. 203–220, 2001.[45] E. Eich, “Mood as a mediator of place dependent memory.,”
Journalof Experimental Psychology: General , vol. 124, no. 3, p. 293, 1995.[46] C. A. Denny, M. A. Kheirbek, E. L. Alba, K. F. Tanaka, R. A.Brachman, K. B. Laughman, N. K. Tomm, G. F. Turi, A. Losonczy,and R. Hen, “Hippocampal memory traces are differentially modulatedby experience, time, and adult neurogenesis,”
Neuron , vol. 83, no. 1,pp. 189–201, 2014.[47] L. G. Reijmers, B. L. Perkins, N. Matsuo, and M. Mayford, “Local-ization of a stable neural correlate of associative memory,”
Science ,vol. 317, no. 5842, pp. 1230–1233, 2007.[48] A. T. Sørensen, Y. A. Cooper, M. V. Baratta, F.-J. Weng, Y. Zhang,K. Ramamoorthi, R. Fropf, E. LaVerriere, J. Xue, A. Young, et al. , “Arobust activity marking system for exploring active neuronal ensem-bles,”
Elife , vol. 5, p. e13918, 2016.[49] A. F. Lacagnina, E. T. Brockway, C. R. Crovetti, F. Shue, M. J.McCarty, K. P. Sattler, S. C. Lim, S. L. Santos, C. A. Denny, and M. R.Drew, “Distinct hippocampal engrams control extinction and relapse offear memory,”
Nature neuroscience , vol. 22, no. 5, pp. 753–761, 2019.[50] O. Khalaf, S. Resch, L. Dixsaut, V. Gorden, L. Glauser, and J. Gr¨aff,“Reactivation of recall-induced neurons contributes to remote fearmemory attenuation,”
Science , vol. 360, no. 6394, pp. 1239–1242,2018.[51] S. Ramirez, X. Liu, P.-A. Lin, J. Suh, M. Pignatelli, R. L. Redondo, T. J.Ryan, and S. Tonegawa, “Creating a false memory in the hippocampus,”
Science , vol. 341, no. 6144, pp. 387–391, 2013.[52] K. K. Tayler, K. Z. Tanaka, L. G. Reijmers, and B. J. Wiltgen,“Reactivation of neural ensembles during the retrieval of recent andremote memory,”
Current Biology , vol. 23, no. 2, pp. 99–106, 2013.[53] A. Guskjolen, J. W. Kenney, J. de la Parra, B.-r. A. Yeung, S. A.Josselyn, and P. W. Frankland, “Recovery of “lost” infant memories inmice,”
Current Biology , vol. 28, no. 14, pp. 2283–2290, 2018.[54] G. T. Waldhauser, V. Braun, and S. Hanslmayr, “Episodic memoryretrieval functionally relies on very rapid reactivation of sensoryinformation,”
Journal of Neuroscience , vol. 36, no. 1, pp. 251–260,2016.[55] A. Jafarpour, L. Fuentemilla, A. J. Horner, W. Penny, and E. Duzel,“Replay of very early encoding representations during recollection,”
Journal of Neuroscience , vol. 34, no. 1, pp. 242–248, 2014.[56] J. D. Johnson, S. G. McDuff, M. D. Rugg, and K. A. Norman,“Recollection, familiarity, and cortical reinstatement: a multivoxelpattern analysis,”
Neuron , vol. 63, no. 5, pp. 697–708, 2009.[57] J. R. Manning, S. M. Polyn, G. H. Baltuch, B. Litt, and M. J. Kahana,“Oscillatory patterns in temporal lobe reveal context reinstatementduring memory search,”
Proceedings of the National Academy ofSciences , vol. 108, no. 31, pp. 12893–12897, 2011.[58] M. Ritchey, E. A. Wing, K. S. LaBar, and R. Cabeza, “Neural similaritybetween encoding and retrieval is related to memory via hippocampalinteractions,”
Cerebral cortex , vol. 23, no. 12, pp. 2818–2828, 2013.[59] B. P. Staresina, R. N. Henson, N. Kriegeskorte, and A. Alink, “Episodicreinstatement in the medial temporal lobe,”
Journal of Neuroscience ,vol. 32, no. 50, pp. 18150–18156, 2012.[60] R. B. Yaffe, M. S. Kerr, S. Damera, S. V. Sarma, S. K. Inati, and K. A.Zaghloul, “Reinstatement of distributed cortical oscillations occurs withprecise spatiotemporal dynamics during successful memory retrieval,”
Proceedings of the National Academy of Sciences , vol. 111, no. 52,pp. 18727–18732, 2014.[61] J. Fulford, F. Milton, D. Salas, A. Smith, A. Simler, C. Winlove, andA. Zeman, “The neural correlates of visual imagery vividness–an fmristudy and literature review,”
Cortex , vol. 105, pp. 26–40, 2018.[62] N. Dijkstra, S. E. Bosch, and M. A. van Gerven, “Vividness of visualimagery depends on the neural overlap with perception in visual areas,”
Journal of Neuroscience , vol. 37, no. 5, pp. 1367–1373, 2017.[63] N. Dijkstra, S. E. Bosch, and M. A. van Gerven, “Shared neuralmechanisms of visual perception and imagery,”
Trends in cognitivesciences , 2019.[64] N. Dijkstra, P. Mostert, F. P. de Lange, S. Bosch, and M. A. van Gerven,“Differential temporal dynamics during visual imagery and perception,”
Elife , vol. 7, p. e33904, 2018.[65] M. St-Laurent, H. Abdi, and B. R. Buchsbaum, “Distributed patternsof reactivation predict vividness of recollection,”
Journal of CognitiveNeuroscience , vol. 27, no. 10, pp. 2000–2018, 2015.[66] K. J. Friston, L. Harrison, and W. Penny, “Dynamic causal modelling,”
Neuroimage , vol. 19, no. 4, pp. 1273–1302, 2003.[67] S. Hochstein and M. Ahissar, “View from the top: Hierarchies andreverse hierarchies in the visual system,”
Neuron , vol. 36, no. 5,pp. 791–804, 2002.[68] T. Serre, A. Oliva, and T. Poggio, “A feedforward architecture accountsfor rapid categorization,”
Proceedings of the national academy ofsciences , vol. 104, no. 15, pp. 6424–6429, 2007.[69] J. Linde-Domingo, M. S. Treder, C. Kerr´en, and M. Wimber, “Evidencethat neural information flow is reversed between object perception andobject reconstruction from memory,”
Nature communications , vol. 10,no. 1, pp. 1–13, 2019.[70] A. L. Wheeler, C. M. Teixeira, A. H. Wang, X. Xiong, N. Kovacevic,J. P. Lerch, A. R. McIntosh, J. Parkinson, and P. W. Frankland,“Identification of a functional connectome for long-term fear memoryin mice,”
PLoS Comput Biol , vol. 9, no. 1, p. e1002853, 2013.[71] L. R. Squire and P. Alvarez, “Retrograde amnesia and memory con-solidation: a neurobiological perspective,”
Current opinion in neurobi-ology , vol. 5, no. 2, pp. 169–177, 1995.[72] T. J. Teyler and J. W. Rudy, “The hippocampal indexing theory andepisodic memory: updating the index,”
Hippocampus , vol. 17, no. 12,pp. 1158–1169, 2007.[73] T. J. Teyler and P. DiScenna, “The hippocampal memory indexingtheory.,”
Behavioral neuroscience , vol. 100, no. 2, p. 147, 1986.[74] J. L. McClelland, B. L. McNaughton, and R. C. O’Reilly, “Why thereare complementary learning systems in the hippocampus and neocortex:insights from the successes and failures of connectionist models oflearning and memory.,”
Psychological review , vol. 102, no. 3, p. 419,1995.[75] M. B. Merkow, J. F. Burke, and M. J. Kahana, “The human hippocam-pus contributes to both the recollection and familiarity componentsof recognition memory,”
Proceedings of the National Academy ofSciences , vol. 112, no. 46, pp. 14378–14383, 2015.[76] J. F. Danker, A. Tompary, and L. Davachi, “Trial-by-trial hippocampalencoding activation predicts the fidelity of cortical reinstatement duringsubsequent retrieval,”
Cerebral Cortex , vol. 27, no. 7, pp. 3515–3524,2017.[77] A. J. Horner, J. A. Bisby, D. Bush, W.-J. Lin, and N. Burgess,“Evidence for holistic episodic recollection via hippocampal patterncompletion,”
Nature communications , vol. 6, no. 1, pp. 1–11, 2015.[78] B. P. Staresina, E. Cooper, and R. N. Henson, “Reversible informationflow across the medial temporal lobe: the hippocampus links corticalmodules during memory retrieval,”
Journal of Neuroscience , vol. 33,no. 35, pp. 14184–14192, 2013.[79] M. Moscovitch, “The hippocampus as a” stupid,” domain-specificmodule: Implications for theories of recent and remote memory, andof imagination.,”
Canadian Journal of Experimental Psychology/Revuecanadienne de psychologie exp´erimentale , vol. 62, no. 1, p. 62, 2008.[80] A. P. Georgopoulos, A. B. Schwartz, and R. E. Kettner, “Neuronalpopulation coding of movement direction,”
Science , vol. 233, no. 4771,pp. 1416–1419, 1986.[81] R. C. Decharms and A. Zador, “Neural representation and the corticalcode,”
Annual review of neuroscience , vol. 23, no. 1, pp. 613–647,2000.[82] H. B. Barlow, “Single units and sensation: a neuron doctrine forperceptual psychology?,”
Perception , vol. 1, no. 4, pp. 371–394, 1972.83] B. A. Olshausen and D. J. Field, “Sparse coding of sensory inputs,”
Current opinion in neurobiology , vol. 14, no. 4, pp. 481–487, 2004.[84] D. H. Hubel and T. N. Wiesel, “Receptive fields, binocular interactionand functional architecture in the cat’s visual cortex,”
The Journal ofphysiology , vol. 160, no. 1, p. 106, 1962.[85] M. Mishkin, L. G. Ungerleider, and K. A. Macko, “Object vision andspatial vision: two cortical pathways,”
Trends in neurosciences , vol. 6,pp. 414–417, 1983.[86] C. G. Gross, D. B. Bender, and C. d. Rocha-Miranda, “Visual receptivefields of neurons in inferotemporal cortex of the monkey,”
Science ,vol. 166, no. 3910, pp. 1303–1306, 1969.[87] K. Tanaka, “Inferotemporal cortex and object vision,”
Annual reviewof neuroscience , vol. 19, no. 1, pp. 109–139, 1996.[88] N. K. Logothetis and D. L. Sheinberg, “Visual object recognition,”
Annual review of neuroscience , vol. 19, no. 1, pp. 577–621, 1996.[89] E. I. Moser, E. Kropff, and M.-B. Moser, “Place cells, grid cells, andthe brain’s spatial representation system,”
Annu. Rev. Neurosci. , vol. 31,pp. 69–89, 2008.[90] I. Fried, K. A. MacDonald, and C. L. Wilson, “Single neuron activityin human hippocampus and amygdala during recognition of faces andobjects,”
Neuron , vol. 18, no. 5, pp. 753–765, 1997.[91] G. Kreiman, C. Koch, and I. Fried, “Category-specific visual responsesof single neurons in the human medial temporal lobe,”
Nature neuro-science , vol. 3, no. 9, pp. 946–953, 2000.[92] R. Q. Quiroga, G. Kreiman, C. Koch, and I. Fried, “Sparse butnot ‘grandmother-cell’coding in the medial temporal lobe,”
Trends incognitive sciences , vol. 12, no. 3, pp. 87–91, 2008.[93] S. Waydo, A. Kraskov, R. Q. Quiroga, I. Fried, and C. Koch, “Sparserepresentation in the human medial temporal lobe,”
Journal of Neuro-science , vol. 26, no. 40, pp. 10232–10234, 2006.[94] R. Q. Quiroga, A. Kraskov, F. Mormann, I. Fried, and C. Koch, “Single-cell responses to face adaptation in the human medial temporal lobe,”
Neuron , vol. 84, no. 2, pp. 363–369, 2014.[95] L. Reddy and S. J. Thorpe, “Concept cells through associative learningof high-level representations,”
Neuron , vol. 84, no. 2, pp. 248–251,2014.[96] K. Christoff, Z. C. Irving, K. C. Fox, R. N. Spreng, and J. R.Andrews-Hanna, “Mind-wandering as spontaneous thought: a dynamicframework,”
Nature Reviews Neuroscience , vol. 17, no. 11, pp. 718–731, 2016.[97] A. Kucyi, “Just a thought: How mind-wandering is represented indynamic brain connectivity,”
Neuroimage , vol. 180, pp. 505–514, 2018.[98] D. R. Addis, A. T. Wong, and D. L. Schacter, “Remembering the pastand imagining the future: common and distinct neural substrates duringevent construction and elaboration,”
Neuropsychologia , vol. 45, no. 7,pp. 1363–1377, 2007.[99] A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. S.Maida, “Deep learning in spiking neural networks,” arXiv preprintarXiv:1804.08150 , 2018.[100] W. Gerstner and W. M. Kistler,
Spiking neuron models: Single neurons,populations, plasticity . Cambridge university press, 2002.[101] S. Ghosh-Dastidar and H. Adeli, “Spiking neural networks,”
Interna-tional journal of neural systems , vol. 19, no. 04, pp. 295–308, 2009.[102] M. Taylor, “The problem of stimulus structure in the behavioural theoryof perception,”
South African Journal of Psychology , vol. 3, pp. 23–45,1973.[103] N. Caporale and Y. Dan, “Spike timing–dependent plasticity: a hebbianlearning rule,”
Annu. Rev. Neurosci. , vol. 31, pp. 25–46, 2008.[104] S. Huang, C. Rozas, M. Trevino, J. Contreras, S. Yang, L. Song,T. Yoshioka, H.-K. Lee, and A. Kirkwood, “Associative hebbiansynaptic plasticity in primate visual cortex,”
Journal of Neuroscience ,vol. 34, no. 22, pp. 7575–7579, 2014.[105] D. B. McMahon and D. A. Leopold, “Stimulus timing-dependentplasticity in high-level vision,”
Current biology , vol. 22, no. 4, pp. 332–337, 2012.[106] J. Ott, E. Linstead, N. LaHaye, and P. Baldi, “Learning in the machine:To share or not to share?,”
Neural Networks , 2020.[107] T. Masquelier and S. J. Thorpe, “Unsupervised learning of visualfeatures through spike timing dependent plasticity,”
PLoS Comput Biol ,vol. 3, no. 2, p. e31, 2007.[108] L. Fei-Fei, R. Fergus, and P. Perona, “Learning generative visualmodels from few training examples: An incremental bayesian approachtested on 101 object categories,”
Computer vision and Image under-standing , vol. 106, no. 1, pp. 59–70, 2007. [109] G. Berlucchi, “Some aspects of the history of the law of dynamicpolarization of the neuron. from william james to sherrington, fromcajal and van gehuchten to golgi,”
Journal of the History of theNeurosciences , vol. 8, no. 2, pp. 191–201, 1999.[110] L. W. Barsalou, “Grounded cognition,”
Annu. Rev. Psychol. , vol. 59,pp. 617–645, 2008.[111] C. Von der Malsburg, “The what and why of binding: the modeler’sperspective,”
Neuron , vol. 24, no. 1, pp. 95–104, 1999.[112] A. M. Treisman and G. Gelade, “A feature-integration theory ofattention,”
Cognitive psychology , vol. 12, no. 1, pp. 97–136, 1980.[113] P. M. Milner, “A model for visual shape recognition.,”
Psychologicalreview , vol. 81, no. 6, p. 521, 1974.[114] M. N. Shadlen and J. A. Movshon, “Synchrony unbound: a criticalevaluation of the temporal binding hypothesis,”