The backpropagation-based recollection hypothesis: Backpropagated action potentials mediate recall, imagination, language understanding and naming
Zied Ben Houidi Huawei Technologies Co. Ltd.
Abstract—Ever since the advent of the neuron doctrine more than a century ago, information processing in the brain has been widely believed to follow mainly the forward direction, from pre- to post-synaptic neurons. In this paper, we put forward the backpropagation-based recollection hypothesis: weak and fast-fading action potentials, following the (highest-weight) post- to pre-synaptic backward pathways, mediate explicit cue-based memory recall. This also covers the tasks of imagination, future episodic thinking, language understanding and the association of names with various stimuli. These signals originate in highly invariant neurons, which respond uniquely to specific stimuli (e.g. the image of a cat). They then travel backwards to reactivate the same populations of neurons that uniquely respond to these stimuli during perception, thus recreating "offline" a similar experience. After stating our hypothesis in detail, we review abundant evidence on the existence of such backpropagating signals, as well as other relevant literature that supports our claims. We then leverage simulations based on existing spiking neural network models with STDP learning to show the computational feasibility of using such a mechanism to map the image of an object to its name with the same high accuracy as a state-of-the-art machine learning classifier. Although not yet a theory, we believe this hypothesis presents a paradigm shift that is worth further investigation: it opens the way, among others, to new interpretations of language acquisition and understanding, of the interplay between memory encoding and retrieval, and to reconciling the apparently opposed views of sparse coding and distributed representations.
I. INTRODUCTION
Biological brains process sensory visual input and learn to extract invariant representations from it in an unsupervised manner. However, mapping signifiers [1], i.e. mental representations of the image-sounds or "words", to the signified mental representations that such words refer to, requires interaction with an external agent that supervises the learning. Such a "teacher" generates the sound-image related to the signifier in the presence of the actual stimulus that relates to the signified: for example, by uttering the sound "cat" or writing the word "cat" in the presence of an actual image of a cat. The teacher repeats this procedure until both are associated. We then say that the agent has learned to map the signifier to the signified and vice versa.

In this context, it is tempting to think at first sight that the repeated co-occurrence of both stimuli reinforces their connection, in a Hebbian manner, and thus allows the mapping of signifiers to their signified representations. However, it is commonly accepted that neurons process sensory information mainly in a forward manner, i.e. from pre-synaptic neurons to post-synaptic ones. Yet, in the case of the signifier and signified "problem", the two co-occurring signals would eventually reach a connection point after each has followed only forward paths. Now, assuming that memories are stored and encoded in the same areas that uniquely responded to the memorized stimuli during the first encounter, a challenging question arises: what neural mechanism allows the association of one with the other, such that the activation of the signifier can trigger back the activation of the signified and vice versa?

We argue that this problem is a particular case of a more general one that occurs whenever the recollection of previously encountered and stored stimuli is needed.
This is the case in explicit memory, where a sensory stimulus, such as a visual scene or a smell, acts as a cue to trigger the retrieval of past related events. We further argue that this is also the case in imagination, where different parts of previously encoded stimuli are recalled and "merged" together to generate an "imagined" new experience that was not exactly met before.

In this paper, we hypothesize that weak and fast-fading backpropagating action potentials (APs) (from post- to pre-synaptic neurons), whose strength is proportional to pre-synaptic "weights", are the medium by which previously encoded information is recollected. Such backpropagating signals start in what we call source pointer neurons and travel all the way back, selectively reactivating on their path the neurons which uniquely responded to the stimuli during the first encounter (see the discussions in Sec. III-C1 for more elaboration on this assumption), thus creating an experience similar to that of the first time. A source pointer neuron, as we will develop later, is a neuron that specializes, thanks to past memorized co-occurrences, in selectively and invariantly responding only to the retrieval cue (e.g. the signifier for language) or its associated memories to be retrieved (e.g. the signified); as we will show later in Sec. III-D, the existence of such selective neurons has been widely observed [2], [3]. We refer to this as the backpropagation-based recollection hypothesis throughout the paper.

Prior work has extensively studied the interplay between visual perception and retrieval as it happens in visual imagery (see for example Pearson's recent review [4]). For example, in addition to the high overlap between areas involved in retrieval and perception, it has been observed, thanks to Dynamical Causal Modeling (DCM) of activation patterns, that there exists indeed a reverse top-down signaling pathway, from higher-level to lower-level cortical areas, that is responsible for the recollection of visual images [5], [6]. However, following the traditional conception of forward propagation, these observed top-down activation patterns were attributed (wrongly, we argue) to backward recurrent feedback connections. Providing a biologically plausible computational model at the neuronal level that explains how backward recurrent connections (that use the pre- to post-synaptic path) can reactivate a previously encoded stimulus was beyond the scope of their work and remains, to the best of our knowledge, unsolved. For the sake of completeness, it is worth mentioning that recent years have actually seen the rise of "forward-based" computational generative models coming from machine learning that can generate realistic images, the most notorious being Variational Autoencoders (VAEs) [7] and Generative Adversarial Nets (GANs) [8]. However, having been designed for a different purpose, it is not clear how they could be put together to implement retrieval tasks, even in machine learning. Second, and most importantly, their complexity and the supervised mechanisms they employ make them less likely to be biologically plausible [9], [10], [11]. We argue instead in this paper in favour of a simpler unsupervised mechanism where no local error information and no output target for supervision are needed: the same forward paths used for perception are simply used backward for retrieval.

After stating our hypothesis, in the general case (Sec. II-A) and in the particular case of language understanding and naming (Sec. II-B2), we discuss its verifiability and review abundant experimental evidence that backs up the plausibility of most of its assumptions (Sec.
III): we find, for example, that such fading backpropagating action potentials have been widely measured, that they are stronger when the postsynaptic neuron is firing and, most of all, that they can, interestingly and usefully, be controlled by neuromodulation so as to increase their strength or disinhibit them (see Sec. III-A). We further review the neural correlates of cue-based explicit memory retrieval and the existence of sparse pointer neurons, and find further abundant evidence supporting the hypothesis.

We then focus in the remainder of the paper on a particular case of our problem, namely language understanding and particularly naming, which we computationally model (Sec. IV) and then simulate (Sec. V). In this context, we define naming as the act of retrieving the representation of the sound-image (signifier) that best refers to a presented visual stimulus (signified). We define understanding, on the other hand, as the task of retrieving the signified representation that corresponds to a presented auditory or visual stimulus of a signifier. We leverage recent success [12], [13] in training artificial Spiking Neural Networks (SNNs) with Spike Timing Dependent Plasticity (STDP) learning to simulate a neural network that implements our hypothesis. We verify the computational efficiency of backpropagation-based recollection by comparing its accuracy in correctly naming a visual object to that of a state-of-the-art machine learning algorithm. To further challenge the computational ability of our hypothesis, we test it on an extreme learning task: naming objects after seeing a single instance of each class. We find that backpropagation-based recollection leads, on average and at maximum, to a higher accuracy than a Support Vector Machine (SVM) classifier. We are of course aware that the SNN models we use in this paper are not the brain, let alone the particular implementation we leverage.
We believe, however, that the simulations hint at the computational efficiency of the mechanism. This, especially tied with our literature review, calls for serious further exploration of this path, given in particular the breadth of the potential implications (as we discuss in Sec. VI-A).

II. THE BACKPROPAGATION-BASED RECOLLECTION HYPOTHESIS
A. General case
We posit our hypothesis and its assumptions, in context, as follows.
Memories are stored in a distributed fashion in the same areas where they are detected and recognized when encountered for the first time. Recollection of memories is therefore a process by which the appropriate population of neurons is re-activated so as to "re-live", offline, an experience similar to the first encounter. We hypothesize that weak, fading backpropagating signals from post-synaptic to pre-synaptic neurons, whose strength is proportional to the post-synaptic neuron's firing rate and to the pre-synaptic weights, are the mechanism by which the brain performs generative tasks. By generative tasks, we mean the regeneration of a previously lived and memorized stimulus (e.g. recollection in explicit declarative memory), the generation of a plausible future stimulus (e.g. future episodic memory), or the regeneration and combination of previously and separately lived and memorized stimuli, a process we refer to as imagination (e.g. imagining a cat that laughs by combining a memorized mental image of a cat with that of the act of laughing).

The recollection process starts with the presentation of a retrieval cue, which activates a few sparse neurons that uniquely identify both the retrieval cue and the to-be-retrieved memory. We assume that these neurons learned to respond only to the presence of either stimulus thanks to a low-level learning rule such as STDP [16], due to the (repeated or modulated) co-occurrence of both stimuli in the past: the cue and the to-be-retrieved memory. We further hypothesize that the retrograde signal is initiated in such source "pointer" neurons (e.g. "Jennifer Aniston" cells [2], as we will discuss later, if the goal is to recall prior stimuli related to, say, Jennifer Aniston). The signal then travels backwards following the paths with the highest weights, activating on its way all the various neurons that compose the mental images to be recalled.
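The backward, weight-proportional and fading traversal just described can be sketched in a few lines of code. This is a minimal illustrative sketch and not the paper's SNN implementation: the layer sizes, the DECAY constant and the thresholding rule below are our own assumptions, introduced only to make the mechanism concrete.

```python
import numpy as np

# Hypothetical two-layer feedforward network: stimulus -> features -> pointer.
# The weights are assumed to have been learned beforehand (e.g. via STDP).
rng = np.random.default_rng(0)
W1 = rng.random((8, 4))   # stimulus layer (8 neurons) -> feature layer (4)
W2 = rng.random((4, 2))   # feature layer -> pointer layer (2 pointer neurons)

DECAY = 0.5  # models the weak, fast-fading nature of the retrograde signal

def recollect(pointer_idx, threshold=0.5):
    """Backpropagate a fading signal from one source pointer neuron.

    At each layer, the signal reaching a presynaptic neuron is proportional
    to the (pre-synaptic) weight of its connection to the active neurons
    above, attenuated by DECAY; only the most strongly driven neurons
    (highest-weight paths) are reactivated.
    """
    pointer = np.zeros(2)
    pointer[pointer_idx] = 1.0
    features = DECAY * (W2 @ pointer)           # post -> pre, weight-proportional
    features = features * (features >= threshold * features.max())
    stimulus = DECAY * (W1 @ features)          # one layer further back, fainter
    stimulus = stimulus * (stimulus >= threshold * stimulus.max())
    return features, stimulus

# Recollecting from pointer neuron 0 reactivates a sparse subset of the
# earlier layers, with amplitudes that fade the further back the signal goes.
features, stimulus = recollect(0)
```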
We hypothesize, finally, that such backpropagation can be controlled remotely via neuromodulation, so as to invoke it, increase its strength, or inhibit it. This neuromodulation thus acts as a "switch" that controls whether or not retrieval takes place. (For the simulations, we build on Perez's Python implementation, available on GitHub [14], which we obtained the authorization to use and modify for research purposes; we also release our modifications so as to ease the reproduction of our results [15].)

Fig. 1: Illustration in the case of naming and understanding.

B. Case of language acquisition, understanding and naming
We further argue that a particular case of the above-mentioned generative tasks, which we later computationally simulate, is a form of explicit semantic memory related to language understanding and naming. We first start with some terminology.
1) Terminology:
We adhere to the conceptualization and terminology introduced by the Swiss linguist Ferdinand de Saussure [1] and build on the distinction he introduced between the signifier and the signified. Since several interpretations of Saussure's work could perhaps be made, we clarify in the following the one we adhere to. In particular, we refer to the signifier as the mental representation of the sound-image of the word. By sound-image, we mean either the phonetic sound resulting from the word, or the image of the letters that form it. We refer to the signified as the mental representation of the actual object that the sound-image and its mental concept refer to. Both the signifier and the signified are concepts: one represents the word, the other represents the mental image(s) that this word commonly refers to. In this context, we refer to understanding as the act of mapping the signifier to its signified.
Naming is the act of retrieving the signifier that corresponds to a given mental representation or to a presented visual stimulus.
2) Illustration in the case of language:
When it comes to language, our hypothesis implies that backpropagating APs mediate understanding (recollection of the signified once presented with a stimulus that presents the signifier) and naming (recollection of the signifier or name that corresponds to a given visual object, or any other stimulus in general).

Fig. 1 illustrates our hypothesis and modeling through the example of three concurrent sensory inputs that are presented to a learner and that need to be "permanently associated". To make a long story short, we assume, as illustrated in the figure, that there exists an area where the visual and sound activation pathways intersect (here, at the last "backpropagation root" layer in Fig. 1). A similar area will be, according to our hypothesis, the root of the backpropagating APs that mediate the tasks of understanding and naming. During the "understanding" task, the retrieval cue is a word-related stimulus (e.g. the sound "cat" or the image of the word "cat") and the retrieved memory is the signified representation (illustrated by the Understanding backward pathway in the figure). In the task of naming, the retrieval cue is the signified object (here a cat) and the retrieved memory is the name or signifier of the object (illustrated by the Naming backward pathway).

In more detail, the figure exemplifies the toy case of a "teacher" who shows the "learner" an image of a cat, simultaneously with how the word cat is written, as well as the sound of the word. In this case, the sensory input of a sound "cat", as well as an image of the word, are processed through consecutive feed-forward neural layers. Similarly to what happens in the primate visual cortex [17], neurons in the earlier layers have learned (thanks to a simple unsupervised rule such as STDP) to respond to simple features and, the deeper we go, the more selective the neurons become and the more complex the features they respond to.
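The low-level learning rule invoked here can be illustrated with the classical pair-based STDP window. The code below is a generic textbook sketch: the time constants and learning rates are illustrative assumptions, not values taken from the paper's simulations.

```python
import math

TAU_PLUS, TAU_MINUS = 20.0, 20.0   # ms, assumed time constants
A_PLUS, A_MINUS = 0.01, 0.012      # assumed rates (depression slightly larger)

def stdp_dw(t_pre, t_post):
    """Weight change for one pre/post spike pair (spike times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: potentiation (Hebbian causality)
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    else:        # post fires before (or with) pre: depression
        return -A_MINUS * math.exp(dt / TAU_MINUS)

# A pre spike 5 ms before the post spike strengthens the synapse;
# the reverse ordering weakens it.
assert stdp_dw(10.0, 15.0) > 0
assert stdp_dw(15.0, 10.0) < 0
```

Under repeated co-occurrence of two stimuli, such a rule strengthens exactly the causally ordered pre-to-post connections, which is how the increasingly selective neurons described above can emerge without supervision.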
We assume that, at a later processing stage, there exist fewer, "sparse" neurons that selectively respond only to the presence of the sound "cat"; we refer to such neurons as the "sound signifier" pointer neurons of the word cat. Similarly, we assume that there exist neurons that selectively respond to the "image" of the word cat, and we refer to this family of neurons as the "image signifier" pointer neurons. Finally, neurons that selectively respond to both are referred to as the "sound-image signifier" pointer neurons, or simply signifier neurons. Following a similar pattern, the visual image of the cat itself is processed through various neural stages until certain neurons (referred to as signified in the figure) respond selectively only to the presence of the image of a cat.

This is how, as illustrated in the figure, at some stage, the above-described visual and sound pathways reach a common connection point, in a subsequent feedforward neural layer. We hypothesize that the repeated, or neuromodulated, co-occurrence of signifier and signified stimuli (i.e. saying "cat" in the presence of an actual cat) reinforces their connection, at this junction point layer, in "a Hebbian manner", thanks to a simple rule such as STDP learning. (The connection can be reinforced either by mere repetition, e.g. repeating the word cat many times in the presence of a cat image, or by neuromodulation, in which case a single co-occurrence can be enough to cause a long-term reinforcement of the connection.) Next, when presented with either signifier- or signified-related stimuli, the source pointer neuron(s) in the backpropagation root layer will fire (one should in theory be enough, but there could be many in reality, e.g. for redundancy), resulting in backpropagating action potential(s) that are proportional to pre-synaptic weights. The latter will cause the backward activation of the appropriate neurons, thus recalling the signified if the presented stimulus relates to the signifier, and vice versa.

To be more explicit, we consider the naming task successful if the backpropagating AP activates only the signifier neuron(s) that uniquely identify the word "cat", be it the image or the sound. The understanding task is successful if the backpropagation activates only the signified neuron(s) that uniquely characterize the image of the cat. It goes thus without saying that, as described above and illustrated, we adhere to the view that there exist few neurons that respond selectively to complex stimuli such as (i) sound signifier stimuli, (ii) image signifier stimuli, (iii) the signified or (iv) uniquely to the three previous ones (i.e. they fire only when presented with any of the three). During the action potential backpropagation, these neurons act, according to our hypothesis, as pointers that selectively reactivate an appropriate population of pre-synaptic neurons which uniquely characterizes the memory trace to be retrieved, thus creating an experience similar to that of the first encounter(s) when the memory was encoded. For example, what is retrieved could be an experience of the sound of the word "cat" with the particular voice or conditions in which it was encoded (in line with what is called the encoding specificity principle [18], which we will recall later).

As we will discuss later, this hypothesis reconciles (i) sparse/localist and (ii) distributed representation theories of the brain, promising to end a long debate between cognitive psychologists and neuroscientists [19], [20]. In our framework, there is no need to choose between them, as both are needed, but for different purposes: sparse coding, illustrated here by the presence of highly selective neurons, is needed for backpropagation-based retrieval, while the encoding of the entire memory trace is still done via a distributed set of neurons.
The latter can be selectively reactivated "on demand", from the source pointer neurons backwards. It is thus the simultaneous activation of an entire population of neurons that forms the entire memory trace; single highly selective neurons are only pointers, helpful for retrieval.

Finally, and interestingly, the fact that backpropagating signals are "fading" in nature could explain the ephemeral character of the experience of recalled memories, or visual mental imagery's lack of vividness: the latter are not as persistent as the experience of live sensory stimulation.

C. What this hypothesis is not about
Finally, it is necessary to clarify that this hypothesis is meant to explain mainly the recollection processes. By this, we mean, in the particular case of explicit recall, the reactivation, as close as possible, of the same neurons that were activated during previous encounters of the stimulus to be recalled. (For simplicity, we focus in this toy example on a single encounter that, we assume, was "encoded right away". In reality, stored memory traces might evolve with repeated exposure, such that recalling what is meant by the word cat leads to the recall of a memory of the signified that is statistical in nature, e.g. one of the many cats met before, or an average abstract image of a cat.) Indeed, under our hypothesis, reactivating (more or less) the same neural ensembles that uniquely respond to the stimulus to be recalled is what creates again, "offline", a similar subjective experience, despite the absence of the stimulus.

As a consequence, in the case of language and particularly name association, what is covered by our hypothesis is how to (i) learn the association and (ii) recall the name, not yet how to actually produce it. By producing, we mean emitting the sound or writing the letters of the word. Further investigations are needed to reassess the production tasks in light of our new hypothesis. Nonetheless, it occurs to us that learning to produce the right sounds, or speaking, happens through a trial-and-error, reward-based process in which the goal is to "mimic". The study of such a mechanism is beyond the scope of this paper and is left for future work. Our hypothesis covers instead the reactivation of source pointer neurons that uniquely characterize the name. The latter can be further used to passively recall the name (e.g. recollecting how it sounds, or how it is visually written). How such pointer neurons participate in invoking motor areas to produce sounds or write letters is out of scope for now.

Next, when we talk about understanding, we mean the modality-specific features of semantic memory [21]: i.e. recalling the details of how a face or an emotion looks or feels exactly, as opposed to other aspects of semantic memory such as finding abstract relationships between words. We leave the latter aspect of semantic memory for future work. Worth mentioning, though, is that Patterson et al. reviewed semantic knowledge organization in the human brain [21] and reported that all theories agree on the fact that modality-specific recall is implemented by a distributed brain network, a fact that is coherent with a backpropagation-based recollection hypothesis.

Then, our hypothesis assumes that there are centers that remotely control, via neuromodulation, whether or not to invoke the recollection: by either increasing the backpropagation or by inhibiting it. It goes thus without saying that the hypothesis does not cover what mechanisms control these control centers, nor under which conditions recollection is favoured or shut down. What our hypothesis predicts is that the task of these "control centers" can be extremely easy to implement: the untargeted remote generation of an excitatory neuromodulator favours further recall (depending on whatever cues are activated at the moment). The same applies to inhibition.

Finally, our hypothesis, being focused only on the recollection process from sparse source pointer neurons backwards, does not directly explain the mechanisms involved in the formation of sparse invariant neurons, novelty or familiarity detection, or the interplay between short-term (e.g. a few days back) and long-term (e.g. a few years back) memories. Nonetheless, it still offers a ground to reason about these issues. For example, the fact that the same sparse neurons keep being used to signal the presence of the same familiar stimuli throughout the years (e.g. the face of one's own child) might explain why humans are unable to remember much younger versions of these faces (in the absence of photos): memories are updated in situ, and familiar faces will always lead to the activation of the very same "familiar" invariant sparse neurons, not to the activation of "novel" ones.

III. REVIEW OF EVIDENCE IN THE LITERATURE
We first position our hypothesis in the literature and show evidence that backs it up, together with its assumptions.
A. Existence of retrograde signals and backpropagating action potentials
Despite the prevalent view of forward processing, it turns out that a plethora of studies have measured, both in vitro and in vivo, in anesthetized [22] and awake [23], [24] mammals, action potentials that backpropagate to apical and distal dendrites, and this for various classes of neurons [25], [26], [27], [28], [29]. We cite in the following only some of these studies. For a more complete list, the reader can refer to the review of Stuart et al. [25] or that of Waters et al. [27], which summarized the findings about the measurements and hypothesized a few roles for backpropagating action potentials. For a more general review of retrograde signals, i.e. not only activity-dependent ones but also those occurring during synaptogenesis etc., the reader can refer to Tao et al. [30].

Williams and Stuart [28] performed simultaneous somatic and dendritic recordings from thalamocortical (TC) neurons and measured that action potentials, whether due to sensory information or to cortical excitatory postsynaptic neurons, backpropagate into the dendrites. In another work, the same authors [29] measured the same phenomenon in neocortical pyramidal neurons. Interestingly for our hypothesis, the authors found that action potentials due to physiological patterns of firing backpropagate three to four times more effectively than action potentials pertaining to mean firing rates. This observation is confirmed by several studies (reviewed by Waters et al. [27]) which found that backpropagation is modulated by synaptic input. For example, properly timed excitatory input leads to the amplification of backpropagation, whereas inhibitory input might block it. More interestingly for our hypothesis, many neuromodulators have an influence on backpropagation, often leading to its enhancement, though in more complex ways.
For example and interestingly, given the supposed role of the hippocampus in retrieval (see later), it has been observed in hippocampal CA1 pyramidal neurons that muscarinic agonists enhance backpropagation in a progressive manner, having a stronger and stronger effect on subsequent action potentials [31], [32]. This suggests that neuromodulation therein can act as a "switch" to enable action potential backpropagation in a selective manner, a feature that is necessary for our hypothesis. As described in Sec. II, not every presentation of a visual stimulus would systematically lead to the activation of "naming". And probably not every presentation of the signifier stimulus should lead to the evocation of its signified representation. Similarly, not every exposure to a familiar (known) stimulus automatically leads to explicitly remembering its context.

Nonetheless, to the best of our knowledge, the role of such activity-dependent backpropagation of action potentials, as reviewed for example by Waters et al. [27], has so far been hypothesized to be local, acting as a feedback loop from postsynaptic to pre-synaptic neurons to regulate spiking activity or support synaptic plasticity. In this work, we advance the hypothesis that such retrograde signals play a more explicit and direct role in higher-level cognitive tasks, such as naming, understanding, and other generative processes like imagination and explicit memory retrieval. We next confront our hypothesis with the state of knowledge in experimental cognitive science about explicit memory (Sec. III-B). We later dive deeper into the details and confront our assumptions with what is known about the neural correlates of explicit memory (Sec. III-C).
B. Cue-based retrieval: a cognitive sciences perspective
Our hypothesis applies to any task in which the reconstruction of previously encoded stimuli is needed. The literature on this topic finds its origins in early experimental cognitive science research (e.g. [33], [34]), before recent advances, driven by neuroimaging and optogenetic stimulation, shed more and more light on some of the actual neural correlates [35]. We start by surveying the former and linking it to our hypothesis.

Since the seminal work of Endel Tulving, long-term human memory has been widely classified into explicit (declarative) memory and implicit (procedural) memory. While procedural memory relates to long-term acquired skills such as driving or playing an instrument, declarative memory relates to the explicit recollection of memories about facts, words, images, events etc. In this paper, we focus on the latter and provide a hypothesis about how the retrieval of these memories happens at the neuronal level. It is worth mentioning that Tulving also played a role in further dividing explicit memories into episodic and semantic ones [34]. Our hypothesis is orthogonal to the difference between them; its mechanism can be useful for both and, beyond them, we believe, for any generative task such as imagination and mind-wandering.

One crucial principle in this area was formulated by Tulving and Thomson under the name of encoding specificity [18]. The principle stresses the importance of retrieval cues and of the entire context that is perceived during encoding for later retrieval: the surrounding context that was present during the first perception and encoding moment can act as an efficient retrieval cue in the future.
Although it might sound straightforward today, such early work [33], [18] played a role in differentiating between the availability of memories and their accessibility thanks to cues, be they internal or external stimuli: the inability to recall does not necessarily mean that the memory is not available; it could also be due to the lack (inactivation) of appropriate cues.

Our hypothesis offers a neurobiological ground to interpret and simulate encoding specificity: for us, any surrounding context during encoding can act as a retrieval cue, as long as it activates the source pointer neurons that uniquely identify the retrieval cue and all the remaining surrounding context to be retrieved. Backpropagating action potentials can then "travel backwards" to reactivate the networks of neurons which uniquely responded to the stimuli during the encoding moment, thus creating, again, a similar experience. In fact, the process by which an internal or external (sensory) cue activates the stored memory trace is well known and has been called ecphory [36], [37]. Ecphory, this interaction between trace and cue, is described as the first stage of memory retrieval, before conversion actually happens and the recollection experience is "lived". In our hypothesis, the cue activates mainly the source pointer neurons, which in turn activate back the appropriate presynaptic populations, resulting in the recall of all the related traces. As such, following our hypothesis, it can easily be seen how two components may affect memory retrieval, as already predicted by Tulving: the lack of the appropriate cue, or a decay of the synaptic weights of the concerned neural networks. Our hypothesis announces other predictions that could be verified in the future. For example, a decline in the intensity or extent of the backpropagation (e.g. an impairment in the neuromodulation that is supposed to facilitate it) could hamper retrieval. Conversely, an excess of such backpropagation might lead to higher levels of intrusive thoughts.
Today, the encoding specificity view has passed the test of time (see [35] for a recent review), and even its "opponents" [38], [39], [40] do not question the necessity of some degree of match between encoding and retrieval conditions, but rather stress the importance of additional factors that influence the performance of later retrieval, the most important being the "discriminative" power of the retrieval cue, or its distinctiveness. Accordingly, recall performance is not only related to the degree of match between encoding and retrieval conditions, as first thought, but also to cue overload and hence to the extent to which the cue is "discriminative".
This latter view can also be easily observed under our framework: keep in mind the illustration in Fig. 1 and the image of a cat as a cue. If this image appears, during learning, simultaneously with all sorts of names, and not only the signifier "cat", then the backpropagating action potentials would potentially simultaneously activate many "Signifier" neurons (and not only that of the cat), making it hard to distinguish, and hence to correctly name. The necessity of being discriminative can also be seen in our simulations of naming later in Sec. V: after backpropagating action potentials back to the "categories" layer, the "signifier neuron" that gets the highest "votes" is elected to signal the name of the object. If all neurons receive "equal votes" because of cue overload, it would be impossible to retrieve the right name. (Footnote: The selectivity of source pointer neurons is a consequence of previously encountered stimuli, and this defines which particular contextual cues are efficient at recalling which particular encoded trace.) (Footnote: Ecphory is a term that Tulving revived together with the forgotten work of the German scientist Richard Semon, who first coined it and stressed the importance of retrieval cues.)
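The vote-based readout just described, and the effect of cue overload on it, can be illustrated with a small sketch (the `retrieve_name` helper, the signifier list and the vote values are all hypothetical, chosen only to make the point):

```python
import numpy as np

# Hypothetical sketch: backpropagated "votes" arriving at signifier neurons,
# with vote strength proportional to how often cue and signifier co-occurred
# during encoding. Names and values here are ours, not from the simulations.

signifiers = ["cat", "dog", "car"]

def retrieve_name(votes):
    """Return the signifier with the highest backpropagated vote,
    or None when the votes are ambiguous (cue overload)."""
    votes = np.asarray(votes, dtype=float)
    best = np.flatnonzero(votes == votes.max())
    return signifiers[best[0]] if len(best) == 1 else None

# Discriminative cue: the cat image co-occurred almost only with "cat".
assert retrieve_name([9.0, 1.0, 0.5]) == "cat"

# Overloaded cue: the image appeared with all sorts of names equally often,
# so all signifier neurons receive equal votes and retrieval fails.
assert retrieve_name([3.0, 3.0, 3.0]) is None
```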
C. Explicit memory: neural correlates
We now dive more into the neurobiological foundations of encoding specificity, ecphory and the processes involved in explicit memory retrieval. This field has seen tremendous advances driven by two techniques: neuroimaging and optogenetic stimulation.
We start from a recent thorough review [35] in which a large body of research strongly supported Semon's and Tulving's cognitive theories: namely, that (i) the success of accessibility depends on the interaction between cues and memory traces and that (ii) there are strong ties between encoding and retrieval, including at the level of activated neural ensembles. In particular, it was shown, using artificial optogenetic stimulation techniques, that it is possible to either disrupt or mimic ecphoric processes by activating or inhibiting the same specific neural ensembles that were active during encoding.
In one experimental study [41] that we further analyze below, blocking the neural ensembles that were used to recognize the cue during encoding resulted in an impairment in retrieval. In the experiments, mice were conditioned to produce a fear response whenever placed in a particular context, a context which stands here for the cue. At the same time, CA1 neural ensembles that were particularly active during learning were optogenetically tagged. Whenever placed in the same context again, mice successfully freeze as a sign that they recognize the environment. However, placing them there while inhibiting the previously tagged neural ensembles considerably reduces freezing levels. This means that if the neural ensembles that recognize the cue do not activate, the memory is not retrieved.
Other studies (e.g. [42]) similarly used optogenetics to demonstrate the "opposite" possibility: artificially reactivating the neural ensembles that recognize the cue, thus inducing the retrieval of the memory trace (i.e. causing freezing), even outside the context in which the conditioning happened (i.e. in the absence of a natural cue). Now, in the previous two families of experiments, mapping the cue and the trace was learned naturally, neural ensembles were tagged depending on their activity during conditioning, and artificial inhibition or excitation was used to disrupt or elicit retrieval. A last, recent family of experiments [43] showed that it is even possible to associate a cue and a trace artificially and to later elicit retrieval in natural conditions. In particular, they repeatedly used photostimulation to artificially activate a neural ensemble that usually recognizes a specific smell, simultaneously with photostimulating another memory trace that elicits avoidance. After this co-occurrence-based conditioning, exposure of the mice to the real smell caused an avoidance reaction: the mice "remembered" to avoid, although they had never experienced the smell in reality. (Footnote: Optogenetic stimulation refers to techniques that allow to later activate or inhibit precisely only selected populations of neurons that were initially selectively tagged, depending on their activity.) (Footnote: Here, we use encoding, learning and conditioning interchangeably, to accommodate the different terminologies used in different papers.)
Finally, note how, in the optogenetic stimulation above, light can simultaneously and selectively activate a set of neurons that were prepared in advance to be light sensitive. This allowed to dissect the interaction between memory traces and retrieval cues and to show how both are intimately related during encoding. However, it is not clear today how a similar selective reactivation can happen in the brain.
That is exactly the role of our hypothesis: our answer to the question "how does a neural ensemble activate in a selective way?" lies in backpropagating action potentials following the paths with the highest presynaptic weights. Our hypothesis also offers a framework to simulate (as we do later) and understand these issues at the level of single neurons.
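A minimal sketch of this selective-reactivation idea, under our own assumptions (the `backpropagate` helper, the toy weight matrix and the 0.8 threshold are illustrative choices, not taken from any cited model):

```python
import numpy as np

# Illustrative sketch: starting from an active source pointer neuron,
# backpropagated APs are assumed to follow only the strongest presynaptic
# weights, reactivating the presynaptic ensemble that drove it at encoding.

rng = np.random.default_rng(0)

def backpropagate(weights, active_post, threshold=0.8):
    """weights[i, j]: synapse from presynaptic neuron i to postsynaptic j.
    Return the presynaptic neurons reached backwards from `active_post`
    through high-weight synapses only."""
    reached = set()
    for j in active_post:
        strong_pre = np.flatnonzero(weights[:, j] >= threshold)
        reached.update(strong_pre.tolist())
    return sorted(reached)

# Toy layer: 6 presynaptic neurons, 2 postsynaptic pointer neurons.
W = rng.uniform(0.0, 0.3, size=(6, 2))   # weak background connectivity
W[[0, 2], 0] = 0.9   # neurons 0 and 2 strongly drove pointer neuron 0 at encoding
W[[4, 5], 1] = 0.9   # neurons 4 and 5 strongly drove pointer neuron 1

assert backpropagate(W, active_post=[0]) == [0, 2]  # only the encoding ensemble reactivates
```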
As a first summary, the studies above [41], [42], [43] and many others reported in the review [35] (which we encourage the reader to check) confirm the importance of cues and encoding specificity. But beyond that, they suggest that retrieval reactivates what was active during encoding, in a process sometimes referred to as neural reinstatement.
This principle is at the heart of our hypothesis, as backpropagated APs should follow exactly the reverse path that uniquely led to the activation of source neurons during encoding. It turns out that many arguments support this reinstatement principle. We review them in what follows.
1) Retrieval as top-down re-activation:
First, historically, the oldest (yet weak) supporting fact is that simply reinstating the encoding context at recollection time enhances retrieval performance and quality, as reported by some reviews [44], [45]. Second, more recently and more strongly, a significant body of research intentionally studied the overlap between encoding and retrieval and provided large evidence in favour of the principle using various ensemble tagging [42], [41], [46], [47], [48], [49], [50], [51], [52], [53], EEG [54] and neuroimaging [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [5] techniques. Furthermore, it was shown that the reactivation overlap between encoding and retrieval also influences the perceived quality of the retrieval. For example, in the case of visual imagery (i.e. attempting to mentally visualize an image), it has been shown that activation overlap in the visual cortex increased visual imagery vividness, or the subjective intensity of the remembered image [65], [62], [61]. We refer the reader to the many references above for more information and cite in what follows only a few examples of each technique.
In terms of neuroimaging, Dijkstra et al. [5] for example used Dynamical Causal Modeling (DCM) [66] to infer coupling between cortical regions involved in the tasks of visual perception as opposed to visual imagery. They measured that visual imagery vividness correlated more with top-down connectivity patterns (from high-level cortical areas to lower-level areas) than perception itself did. Many other studies suggest such a top-down mechanism during visual imagery [67], [68], [69], [6], [4]. The reader can refer to Pearson's recent review [4] of the cognitive neuroscience of visual mental imagery for more details about the top-down reverse hierarchy of information and the fact that the process seems to be a weak form of the bottom-up perception.
However, in general, due to the widespread view that neural computation is mainly forward, from pre- to postsynaptic neurons, this top-down activation cascade has always been interpreted, in the literature, as the result of feedback connections from higher-level cortical layers to lower ones.
Then, and perhaps more convincingly than neuroimaging, neural ensemble tagging techniques also confirm the principle. In addition to the work we described above [41], [42], [43], another recent example is the work of Guskjolen et al. [53], who performed contextual fear conditioning experiments on young mice while tagging the neural ensembles which were active during encoding. As happens with infantile amnesia in humans, the infant mice later exhibited forgetfulness. However, photostimulation of the tagged neurons, only in the hippocampal formation (the Dentate Gyrus in particular), induced memory recovery and the reactivation of broader areas which were tagged during conditioning, including hippocampal CA1 and CA3, and cortical neurons. Note how this finding is in line again with the idea that traces are distributed in neural ensembles that span many cortical brain regions [70], each responsible for one aspect of information (sensory, motor, visual, emotional, etc.).
2) Existence of “backpropagation root layers”:
The last paper leads us to the last point we review in this section: the role of the Medial Temporal Lobe (MTL) and its relationship to our backpropagation root layers, where source pointer neurons are located.
Indeed, our hypothesis assumes the existence of an area where source pointer neurons lie and where the backpropagation starts. If our hypothesis is correct, this area should form the glue between cues and retrieved traces, and should exhibit a reversal of the flow of information. Interestingly, the medial temporal lobe, and the hippocampus in particular, has been shown to (i) play this role and (ii) exhibit a similar reversal behaviour.
For (i), many theories [71], [72], [73], [74], [75] support that the hippocampus performs exactly the task of reinstating the patterns of activity in the cortex that were alive during encoding. This can already be seen from the study of Tanaka et al. [41] which we reported above. What they did, by actually monitoring cortical activity while inactivating hippocampal CA1 cells in rodents, shows how the hippocampus is likely responsible for reinstating the patterns that were active at encoding. By permanently tagging neurons which were active during encoding (a fear conditioning experiment), they were able to silence them with laser stimulation, up to several days later. When silencing only the tagged CA1 cells (and not the entire engram), memory retrieval was impaired; and the rest of the neural ensemble in the cortex and amygdala, which used to reactivate during retrieval, was not reactivated again. Many other studies [76], [77], [78] also showed that retrieval success depended on whether or not the hippocampus was concurrently solicited, during both encoding and retrieval. Horner et al. [77] showed further evidence that the hippocampus binds together all the elements composing a trace that is stored in distributed regions of the cortex, acting as a hub to perform what is also called the pattern completion task. Finally, and interestingly for our hypothesis, Staresina et al. [78] observed a reversible signal flow from the cue region to the target region to be recalled, through the hippocampus.
This puts the HC and MTL in the position of good candidates to be root backpropagation areas as per our hypothesis: they seem to implement the link between cues' networks and traces' networks, and they seem to be the place where flow reversal happens.
Citing verbatim a thorough review and perspective from Moscovitch [79]: "Retrieval occurs when an external or internally generated cue triggers the hippocampal index, which in turn activates the entire neocortical ensemble associated with it. In this way, we recover not only the content of an event but the consciousness that accompanied our experience of it". Moscovitch later refers to the hippocampal memory indexing theory of Teyler and DiScenna [73], [72] as follows: "Memory traces in the HC/MTL are encoded in sparse, distributed representations that act as an index or pointers to the neocortical ensembles that mediate the attended information".
This claim, which is in line with our hypothesis, leads to the next assumption, which we further verify next: the existence of sparse source pointer neurons.
D. Existence of highly selective source pointer neurons
Our hypothesis assumes that, at a certain deep level of processing, certain neurons become highly selective and invariant: they serve as pointers to reconstruct the encoded stimuli they represent. For example, in the particular case of language, some neurons will respond only to the signified, or only to the signifier (one of the cues), and some would respond to both the signifier and the signified (i.e. what is in common between the cue and the "to be recalled"). In this section, we scan the literature about concept representation in the brain to assess the plausibility of the existence of such neurons.
It turns out that similar neurons have been documented and that their role is not yet well understood, given the still ongoing debate between two opposed views on the matter: the "distributed representations" view [80], [81] and the "sparse coding" view [82], [83]. Indeed, the distributed representations view defends that concepts in the brain are represented by the unique activation patterns of entire, large populations of neurons. It is thus the pattern uniqueness across a large population that defines complex concepts, not particular single neurons. The sparse coding view defends instead that there exist a few neurons that selectively represent particular items or concepts. The extreme version of sparseness would be that there is a unique cell that responds to each single unique concept, a version that is pejoratively and anecdotally known as the grandmother cell hypothesis [19], [20].
Our hypothesis promises to reconcile both views as follows: information is stored in distributed networks, and sparse neurons also exist, but they play the role of hubs that connect these networks and ease retrieval by being the source of backpropagated APs.
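As a rough data-structure analogy (entirely our own illustration; the region names and neuron ids below are made up), this reconciliation can be pictured as a sparse hub holding pointers to distributed ensembles:

```python
from dataclasses import dataclass, field

# Illustrative sketch: the trace content lives in distributed ensembles across
# regions, while a sparse pointer neuron merely stores references (a hub)
# used to reactivate them. Names here are assumptions for the sketch.

@dataclass
class PointerNeuron:
    """Sparse hub: holds no content, only references to distributed ensembles."""
    concept: str
    ensembles: dict = field(default_factory=dict)  # region -> set of neuron ids

def reactivate(pointer):
    """Backpropagation-style retrieval: fan out from the hub to every
    distributed ensemble that was active at encoding."""
    return {region: sorted(ids) for region, ids in pointer.ensembles.items()}

cat = PointerNeuron("cat", {
    "visual_cortex": {3, 17, 42},   # shape/texture ensemble
    "auditory_cortex": {8, 9},      # the sound of the word
    "motor_cortex": {21},           # articulating the word
})

assert reactivate(cat)["auditory_cortex"] == [8, 9]
```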
The first accounts of sparseness date back a while already. In practice, since the seminal work of Hubel and Wiesel [84], it became mainstream that neurons tend, overall, to respond to more and more complex features the deeper we go in the processing layers of sensory input [17], [85]. Indeed, evidence suggests the existence of a hierarchy along what is called the ventral visual pathway [85], starting from the primary visual cortex V1, where basic features are encoded, until the inferior temporal (IT) cortex, where neurons selectively respond to complex shapes like hands and faces [86], [87], [88]. Other known examples of sparse coding, for spatial representation, are place and grid cells [89]. Place cells for instance are single neurons which signal specific places in the environment: as the individual navigates its environment, only the neurons that signal the current place field fire. Interestingly, such highly selective neurons have been found within the hippocampal formation, the area which seems to be a good candidate for being a backpropagation root layer, as discussed in Sec. III-C2 above.
In general, the literature is rife with studies that have measured such selective neurons, in ways that fit our hypothesis, and interestingly in these same MTL areas. For example, Fried et al. [90] measured neurons that selectively discriminated humans (faces) from inanimate objects, and this, interestingly, during both encoding and retrieval. Others distinguished specific facial expressions. A little later, Kreiman et al. [91] measured neurons that highly responded only to specific categories such as animals, houses and celebrities. In a continuous line of work, Quiroga and colleagues [2], [92], [93], [94] have set out to understand how the visual features we mentioned above are passed to upper layers of the hierarchy, so as to understand how they are later used by higher cognitive processes: a question to which we hypothesize an answer in this paper.
It is in one of these works, which became popular, that Quiroga et al. [2] reported the existence of highly selective neurons that responded to the presence of specific stimuli related to places or individuals such as Bill Clinton and Jennifer Aniston. One of the found selective neurons even exhibited highly selective responses to any stimulus related to Halle Berry, be it her face or even the written words.
The latter neuron exhibits strikingly similar properties to our source pointer neurons as described in the language understanding and naming tasks.
Then, given that this work is reminiscent of the widely rejected grandmother cell hypothesis, further clarifications followed. Waydo et al. [93], with Quiroga as a co-author, later used a probabilistic approach to explore Quiroga et al.'s original findings [2] a bit more rigorously. Indeed, the latter obviously did not test all MTL neurons and all possible categories of objects. Hence, (i) a found selective invariant neuron could respond to other untested categories, and (ii) there might exist many neurons, and not only one as found by the authors, that would selectively respond to the same stimulus. The authors thus developed a probabilistic model to estimate the odds, and confirmed the sparseness hypothesis (yet arguing also against single grandmother cells [82], as done first by Quiroga and co-authors). Nonetheless, the model leads, as the authors conclude, only to a bound on the true sparseness: the neural coding could in reality be even much sparser than they estimated. In another follow-up work, Quiroga et al. [92] insisted, already in the title, on the fact that these are sparse but not grandmother cells, and argued against the unlikely possibility that a single unique neuron responds to each stimulus.
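As a toy illustration of this kind of probabilistic argument (our own simplification, not Waydo et al.'s actual model; the sparseness value is arbitrary):

```python
from math import comb

# Toy sketch: if each neuron responds to a fraction `a` (sparseness) of all
# stimuli, the chance that a recorded neuron responds to exactly k of n
# presented stimuli is binomial, so observed selectivity only bounds the
# true sparseness.

def p_responds_to_k(n, k, a):
    """Binomial probability that a neuron fires for exactly k of n stimuli."""
    return comb(n, k) * a**k * (1 - a) ** (n - k)

# With very sparse coding (a = 0.001) and 100 presented stimuli, most recorded
# neurons respond to nothing, consistent with the rarity of measured
# highly selective cells.
assert p_responds_to_k(100, 0, 0.001) > 0.9
assert p_responds_to_k(100, 1, 0.001) < 0.1
```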
In our case, although we simulate the hypothesis using single neurons in Sec. IV, our hypothesis is in line with the existence of many such sparse invariant neurons. We actually think that multiple neurons would be needed, at least to guarantee resiliency if some neurons fail.
The authors however conclude with a set of difficult open questions. Our hypothesis already suggests a few answers to the following ones: "How are MTL cells involved in learning associations? How are MTL cells involved in free recall or the spontaneous emergence of recollection in the human mind?" As discussed, we believe it could be: backpropagated APs, triggered through neuromodulation by some "control centers", which decide whether or not to facilitate the recall, the naming, etc. The same principle should apply to free recall, where said centers sequentially activate related concepts (see our discussion on mind wandering in Sec. VI-A).
Last but not least, in a subsequent work [94], Quiroga et al. measured that the MTL selective neurons reflect the subjects' decisions about the stimuli rather than the visual features themselves. They put this in evidence by performing experiments in which they presented subjects with vague stimuli that are a mixture of different celebrities (e.g. a picture that is a mixture of presidents Bush and Clinton). As expected from previous studies, exposing subjects to one of the celebrities leads them to later see the morphed image as pertaining to the opposite celebrity. This is probably due to "tiring" the neurons of this character. Later, by recording Clinton's and Bush's neurons, they concluded that such MTL neurons fire in accordance with the decision, not the features. In an interesting follow-up comment, Reddy et al. [95] remarked that damage to the MTL area causes subjects to have memory impairments while keeping perfect perceptual awareness and consciousness. This is in line with our hypothesized role of source pointer neurons that allow mapping related stimuli via backpropagated APs.
As a summary, it seems that sparse neurons that respond selectively to complex concepts do exist, in the areas where we suspect them to, with properties that are in line with our backpropagation-based recollection hypothesis.
E. Summary of arguments in favour
We now summarize, as illustrated in simple points in Tab. I, the arguments in favour of our hypothesis. First, as seen in Sec. III-A, activity-dependent backpropagating action potentials happen and are biologically plausible. Moreover, it has been observed that these APs are stronger when neurons are firing, a necessary condition for our hypothesis. Indeed, according to our hypothesis, the retrieval cue should activate the source pointer neurons, which should result in the backpropagating retrieval process. Second, we have seen that certain types of neuromodulation can enhance such backpropagation in a selective and progressive manner, thus acting as a switch to inhibit or strengthen it. This is necessary to control whether or not to enable the retrieval process. This feature is necessary, as humans can also control whether or not to favour the retrieval of some explicit memory after exposure to a cue. A naive example: not every time we see a laptop screen do we recall its name. Interestingly enough, this modulatory phenomenon on backpropagated APs has been observed in hippocampal cells, an area known to be crucial [75], [79] in memory retrieval, and especially known in some theories [73], [72], [79] as the place that stores the indexes that allow retrieving memories stored in other cortical areas. We also found evidence that such sparse indexes, which we called pointer neurons, have been observed and well documented [2], [92]. Even more interestingly, we have seen that a reversal of information flow has been observed in the hippocampus, which is believed to act as a glue between the cues and the engrams to be retrieved. This brings us to another assumption of our hypothesis, which is that retrieval is the reactivation of the same areas and neurons that were used during encoding, a task sometimes called neural reinstatement, or pattern completion. We find a large body of optogenetic-based and neuroimaging-based evidence that confirms this assumption.
There is indeed a high overlap between the areas involved in these two tasks, and the optogenetic-based experiments cited above are indeed based on tagging the specific neural ensembles that were active during encoding. Additionally, and perhaps as an added bonus, we will computationally show later that this hypothesis is an effective computational method to associate names to visual input, with the same high accuracy as a supervised machine learning algorithm.
Additionally, and maybe anecdotally, the fact that these signals are weak and fade away might explain why imagination and memory recollection elicit subjective experiences that are themselves transient and fading in nature. The recollection of an image of a cat is much less vivid and persistent than the subjective experience that is due to the sensory input.
Finally, not reviewed in detail above, if our hypothesis proves true for cue-based recollection, it becomes more than reasonable to embrace the view that it also mediates other generative tasks such as mind wandering, intentional creative thinking, dreaming, as well as future episodic thinking or imagining the future. Existing neural correlates studies of such generative tasks [96], [97], [98] can be leveraged to further verify our hypothesis.
To summarize, we interpret all the arguments in favour as an encouraging call for future work and further investigation. In particular, it should be experimentally verifiable whether the extent of the action potential backpropagation is proportional to pre-synaptic weights, and whether and to what extent backpropagation can be far reaching (e.g. eventually more than one pre-synaptic hop away).
Assumption                                            Evidence
Backpropagating Action Potentials                     Sec. III-A
Backpropagation stronger when neurons fire            Sec. III-A
Backpropagation can be selectively modulated          Sec. III-A
High overlap between retrieval and encoding           Sec. III-C1
Information flow reversal (at pointer neurons)        Sec. III-C2
Backpropagation effects can be far reaching           No, but verifiable
Backpropagation proportional to presynaptic weights   No, but verifiable
Existence of source pointer neurons                   Sec. III-D

TABLE I: Summary of hypothesis assumptions and supporting evidence

IV. NAME ASSOCIATION: MODELING WITH SPIKING NEURAL NETWORKS
We now focus on the task of retrieving object names using their image as a cue. As an added bonus, we set out to simulate our hypothesis and assess whether it is a computationally efficient strategy for this task. To this end, we leverage existing artificial Spiking Neural Networks (SNNs) trained with STDP learning and simulate a "teacher" that, during learning, simultaneously shows to the SNN the images and their corresponding names. Then, during test, backpropagated action potentials are used to retrieve the right name. We compare the accuracy of a naming mechanism employing our hypothesis to that of a machine learning classifier. In what follows, we first describe in Sec. IV-A the recent existing SNN models we build on. We critically review their limits and plausibility in Sec. IV-B. Finally, we detail how we use them in our simulations.
A. Image classification with existing Spiking Neural Networks (SNNs)
SNNs are a class of biologically inspired computational models in which spiking neurons communicate information through individual spikes that propagate from one neuron to the next. Such spikes simulate APs, happening when the membrane potential of the neuron crosses a certain threshold. In reality, both the rates at which spikes are generated and the temporal patterns of spikes are believed to carry information about the input stimuli [99], [100]. The artificial SNNs we leverage in this paper simulate a simpler version of this process, while still offering higher biological plausibility [101] compared to other artificial models. Indeed, for training, the SNNs we leverage use the more biologically plausible STDP learning rule [102], [16], [103], [104], [105]. Under this rule, synaptic weights are updated according to the relative spike times of pre- and post-synaptic neurons: if the pre-synaptic spike occurs slightly before the post-synaptic spike, then a persistent strengthening of synapses called long-term potentiation (LTP) occurs [99]. In the other case, the result is long-term depression (LTD), which leads to a persistent depotentiation of synapses.
Two recent SNN models in particular provided background for our simulations [12], [13]. We build in particular on the model of Kheradpisheh et al. [12], which achieved impressive accuracy on simple datasets. We reuse its feature extraction layers almost as-is. The latter are illustrated in the "feature learning (STDP)" upper part of Fig. 2. As can be seen, it consists of consecutive layers of neural processing. The first is a temporal coding layer, meant to somewhat simulate retinal ganglion cells' firing moments. It is followed by a cascade of convolutional and pooling layers to extract visual features. In more detail, the first layer is responsible for encoding the input signal into discrete spike trains in the temporal domain. For this, it uses Difference of Gaussian (DoG) filters.
This layer detects positive and negative contrasts in the input image and encodes them in spike latencies, according to their strengths. Next, each neuron in a convolutional layer receives input spikes from the neurons located in a certain window and emits a spike when its potential reaches a specific threshold. Pooling layers perform a nonlinear max pooling operation in which they only propagate the first spike emitted. In this model, STDP learning only occurs in the convolutional layers and is done layer by layer. For each image presented to the neural network, there is a "competition" between the neurons of a convolutional layer, and those which fire earlier trigger STDP and learn the input pattern. Finally, the last layer is a global pooling layer which performs a global max pooling. The role of these feature extraction layers is only to learn visual features: they are trained without name labels, by propagating many images through the layers and adjusting the weights with STDP.
Next, unlike what happens within our hypothesis, in Kheradpisheh's model [12] the trained output of this final layer is used to train a linear Support Vector Machine (SVM) classifier. The SVM classifier is of course not biologically plausible, but the goal of Kheradpisheh et al. was only to assess the ability of SNNs and STDP to extract salient visual features that are good enough to discriminate images. And they actually found that they were good enough in terms of classification accuracy: their implementation reached 99% and 98.4% accuracy on the face/motorbike and MNIST datasets, respectively. We reproduced their results using Perez's available implementation [14]. After some search for the best parameters [15], we reached around the same accuracy on the face/motorbike dataset with an SVM classifier.
Finally, worth mentioning, Mozafari et al. [13] proposed a 4-layer SNN with STDP, whose last, classification layer is trained this time using reinforcement learning instead of the SVM classifier.
Their final layer is a decision-making layer that performs a global pooling operation. Each neuron in it is assigned to a category, and the neuron which fires first indicates the network's decision. This work, unlike the previous one, thus does not rely on an external, biologically implausible classifier. Indeed, weight change in the last layer is modulated by a reward/punishment signal which depends on the correctness/incorrectness of the network's decision. However, the paper lacks plausibility in that it does not answer the challenging question of how and who generates the reward and punishment signals and, more crucially, how it "knows" which neurons to punish and which neurons to "reward". (Footnote: DoG filters are often used to grossly approximate the spatial visual processing in the retina.) Later, we present instead an end-to-end model, at the neuronal level, from learning associations to naming. Indeed, we postulate that the simple repeated co-occurrence of signifier and signified is enough to tie them together, as in passive learning, and this in a bidirectional way. Thus, in our framework, first, no unknown reward signal or mechanism is needed. Second, the naming of the object does not implicate a feed-forward mechanism but rather the backpropagation of action potentials. Such backpropagation allows mapping the signifier to the signified in a bidirectional way: retrieving the signifier from the signified (naming) and vice versa, the signified from the signifier (understanding).
B. Plausibility of the above SNN models
Next, before using Kheradpisheh's model [12] as a basis, we briefly discuss its (lack of) plausibility, as this allows us to later better gauge the plausibility of our hypothesis. In a nutshell, we are aware that the SNN model above lacks plausibility in many aspects, despite being unsupervised, and despite using a simple rule like STDP learning. For example, it uses only spike-time neural coding. It also uses convolutional neural networks (CNNs) with weight sharing, which is biologically implausible. However, all these "problems" do not impact our hypothesis, since we are mainly interested in the last layers of the neural network (where the retrograde signaling, or backpropagation of the action potentials, will actually initiate). Besides, and very interestingly, recent work [106] has shown that training using "properly translated data", such as "correlated" images in video, relieves the need for CNN weight sharing, and results in an approximate form of it.
Our position here is as follows. The above simple neural network model trained only with unsupervised STDP learning achieves good performance on what was, 20 years ago, a difficult problem. This means that it extracts visual features of fairly good quality. The latter are of course far from perfect: the accuracy does not reach 100% even on the simplest motorbike/face dataset. Nonetheless, we set out to verify whether our hypothesis with STDP learning can successfully use these same features to find the right name association. For fairness, we compare our results to those of the SVM classifier.
C. Our model
Under our hypothesis, successful name association comprises three steps, two for learning and one for recollection, as modeled in Fig. 2. The first, feature learning step is completely unsupervised and learns, through repeated exposure to visual stimuli, to extract salient features (e.g. lines, shapes, etc.) to discriminate visual content. In the brain, such learning is supposed to happen early in life. And if, for some reason or another, one is not exposed to visual stimuli, such learning does not happen, which leads to cortical blindness. As already mentioned, we reuse the SNN model described above [12] to model it.
The second step is a semi-supervised learning one, whereby a teacher shows a learner an image and the right name that refers to it. We call this co-occurrence learning. Humans, for instance, can learn from a single example to map a new object to its new name. Sometimes, when no external reinforcement happens, the exposure needs to be repeated multiple times until it is remembered. We model this step by simply adding a new "categories layer" and emulating the right spike each time an image is propagated through the SNN, i.e. generating a spike for the "cat" category neuron while a cat image is propagated through the SNN.
The last step is simply the recollection of the name. It is in this step that retrograde signaling from all neurons in the backpropagation root layer is sent backwards to the categories layer. The neuron(s) which receive the highest "vote" signal the network's decision. We now describe the three steps of Figure 2 in more detail.
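The second and third steps can be sketched in a toy end-to-end form (a simplification of ours: the `co_occurrence_learn` and `recall_name` helpers and the feature indices are hypothetical stand-ins for the spikes of the SNN's backpropagation root layer):

```python
import numpy as np

# Toy sketch of co-occurrence learning and vote-based recollection
# (our simplification; not the full SNN simulation described in the paper).

names = ["cat", "face"]
n_features = 4
W = np.zeros((n_features, len(names)))   # root-layer neuron <-> category neuron

def co_occurrence_learn(active_features, name):
    """Teacher shows image and name together: strengthen the synapses between
    the root-layer neurons firing for the image and the name's neuron."""
    W[active_features, names.index(name)] += 1.0

def recall_name(active_features):
    """Backpropagated APs from the active root-layer neurons vote for categories;
    the category neuron with the highest vote signals the decision."""
    votes = W[active_features].sum(axis=0)
    return names[int(np.argmax(votes))]

co_occurrence_learn([0, 1], "cat")    # cat images activate root neurons 0, 1
co_occurrence_learn([2, 3], "face")   # face images activate root neurons 2, 3

assert recall_name([0, 1]) == "cat"
assert recall_name([2, 3]) == "face"
```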
D. Feature Learning
For feature learning, we used the SNN model above [12] almost as-is; the reader can refer to that reference for more details. This phase starts with the input image being encoded into discrete spike events in the temporal domain. The encoding is performed using Difference of Gaussians (DoG) filters, and spike times are then computed from the output of the DoG filter. More precisely, let r be the value at a certain index after applying the DoG filter. Then the firing time t is defined as t = 1/r. This amounts to encoding higher-contrast areas of the image with lower spike times (i.e., latency is inversely proportional to contrast). As a result of discretizing this process, each single image is transformed into several waves of spikes that propagate, one by one, through the layers, spikes that signal higher-contrast areas being the first to enter the network. Next, convolutional layers are arranged in a feedforward manner. Between two consecutive convolutional layers, a pooling layer performs a max operation to compress visual data and provide translation invariance. The task of a neuron in a pooling layer simply consists in propagating the first spike received from a receptive window of the previous convolutional layer. Neurons in all the convolutional layers are non-leaky integrate-and-fire neurons: they integrate input spikes and emit a spike as soon as they reach their threshold. The latter is a hyperparameter to set. Immediately after a spike occurs, weights are updated accordingly, using a simplified version of the STDP learning rule. Let i, j be the indices of the post- and pre-synaptic neurons, respectively, and let t_i, t_j be their corresponding spike times. The synaptic weight w_ij is updated by adding a modification factor ∆_ij computed as follows, according to a simplified version of STDP [12], [107].
∆_ij = α+ · w_ij · (1 − w_ij),   if t_j ≤ t_i
∆_ij = −α− · w_ij · (1 − w_ij),  if t_j > t_i        (1)

Here α+, α− ∈ ℝ≥0 are two parameters that specify the learning rate, i.e., by how much the weights are changed. This factor strongly impacts learning. Indeed, small values lead to a slow learning process: they simulate a neural network that is confident in its prior "beliefs and decisions" (weights). High values allow the network to learn very quickly from the current stimulus, but can have as a consequence "forgetting" what it learned from previous stimuli. Note that this simplified version of the STDP rule does not take into account the absolute time difference between post- and pre-synaptic spikes; what matters is only the order, or the sign, of the difference. In practice, this is not a problem for our model.
This feature learning process goes on by propagating training images one by one. Each time a new image has been fully processed, and weights updated and stored, the potential of each neuron is reset to 0, in preparation for the next image. Initially, the synaptic weights are chosen at random from a normal distribution with some mean and standard deviation. The STDP rule ensures that they always remain in the range [0, 1]. Within each image, learning is done layer by layer: learning at layer ℓ begins when learning at layer ℓ − 1 has terminated.
The intent of this feature learning phase is to learn the synaptic weights of each neuron in all the convolutional layers. As observed previously with this SNN model [12], neurons in the first layer converge to the four simple oriented edges, and neurons in the successive layers learn more complex features by integrating spikes from previous layers. We stress that this phase is totally unsupervised: the network only learns frequent features associated with images and requires no knowledge of the input image categories. The next learning step includes these categories; we qualify it as semi-supervised.
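The latency coding and the simplified STDP update of Eq. (1) can be sketched as follows; the α values are illustrative placeholders, not the settings used in our experiments:

```python
import numpy as np

def latency_encode(dog_response, eps=1e-9):
    """DoG-based temporal coding: firing time t = 1/r, so higher-contrast
    pixels (larger DoG response r) spike earlier; r = 0 never fires."""
    r = np.maximum(np.asarray(dog_response, dtype=float), 0.0)
    return np.where(r > 0, 1.0 / (r + eps), np.inf)

def stdp_delta(w, t_pre, t_post, a_plus=0.05, a_minus=0.04):
    """Simplified STDP of Eq. (1): only the order of pre/post spikes matters,
    and the w * (1 - w) factor keeps every weight inside [0, 1]."""
    if t_pre <= t_post:              # pre fired before (or with) post: LTP
        return a_plus * w * (1.0 - w)
    return -a_minus * w * (1.0 - w)  # pre fired after post: LTD
```

Note how the multiplicative w(1 − w) term vanishes at both 0 and 1, which is what keeps the weights bounded without explicit clipping.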
E. Co-occurrence learning
Once the SNN has learned the right weights, and hence visual features, the second, co-occurrence learning, step can begin. For this step we assume, as per our hypothesis, that there is a layer which encodes the object categories or names. The latter is connected, as shown in the figure, to the last layer of the image-processing network. Then, an image (e.g., a cat, as illustrated in the figure) is propagated through the image-processing neural network while, at the same time, the neuron which represents its signifier or name is activated simultaneously.
In more detail, as in the previous phase, train images are considered one by one. Using the weights learned in the first phase, each image passes through the network until it reaches the last layer, where a max pooling operation is performed. (Remember that each image, because of the DoG-based temporal encoding, results in multiple "waves" of spike trains that are propagated sequentially through the layers.) During co-occurrence learning, a neuron of the last pooling layer thus receives two spikes: one propagated by the neuron associated to the class of the image (we simulate a single neuron for simplicity, but the same reasoning applies to multiple ones), and one input from the last convolutional layer. This simulates the teacher that simultaneously shows the image and its name.
Note that, implementation-wise [15], this is equivalent to having a matrix with the same shape as the last pooling layer associated to each image category (one weight per category neuron per neuron in the backpropagation layer). As in the first phase, weights are initially random (in practice, we try different initializations [15] and pick the best) and are updated only using the STDP learning rule, as defined in (1).
Fig. 2: Learning under our hypothesis, simulated with the proposed SDNN with its three main parts.
Then, according to the order of post- and pre-synaptic spike times and to the index of the spiking neuron, what happens during learning is the following: the weight matrix of the right image category is strengthened (LTP), while the weight matrices of the other categories are weakened (LTD).
Contrary to the previous phase, this phase is supervised, as it requires knowledge of the image category in order to link it with the corresponding image features. Indeed, the aim here is to learn associations between the features learned in the previous phase and the image categories by using only the simple STDP rule. At the end of this phase, training has completed and the SNN can proceed to the naming task using the backpropagation principle.
One extreme version of this second phase is what is called one-shot learning: the network is given only a single example of each category. We will vary the number of such training examples in Sec. V, effectively trying one-shot and few-shot learning scenarios, which is why we consider this task semi-supervised.
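A toy sketch of the LTP/LTD dynamics described above, applying the rule of Eq. (1) to the per-category weight matrices (sizes and rates are made up for illustration, and names are ours, not the reference code's [15]):

```python
import numpy as np

def co_occurrence_step(class_w, active, label, a_plus=0.05, a_minus=0.04):
    """class_w: (n_categories, n_root_neurons) weight matrices.
    active: boolean mask of root-layer neurons that spiked for this image.
    The teacher's spike puts the 'label' row in the LTP branch of Eq. (1)
    and all other rows in the LTD branch; silent neurons are untouched."""
    for c in range(class_w.shape[0]):
        sign = a_plus if c == label else -a_minus
        w = class_w[c, active]
        class_w[c, active] = np.clip(w + sign * w * (1.0 - w), 0.0, 1.0)
    return class_w
```

Iterating this step over labeled images implements the few-shot regime studied in Sec. V; a single iteration with a large enough rate implements the one-shot extreme.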
F. Naming
Once learning is done with the previous two phases, we are ready for the naming task, following the principle of backpropagated action potentials. In this task, the image to name is first propagated through the fully trained neural network until spikes start to happen in the last pooling layer, which is our backpropagation root layer. We consider all neurons in this layer to be source pointer neurons. This means, as per our hypothesis, that we allow them, if they fire, to send backpropagated action potentials, modulated by the presynaptic weights learned in the second step, to the previous layer that encodes the labels or names. Neurons in the "categories/signifiers" layer integrate such received signals, and the category which has the highest vote is the retained name for the image. Namely, let C_i, for i = 1, .., k, be the classes: the class C_i whose neuron has the highest class score is chosen as the class the image belongs to. It is this score that can be used as the accumulated potential that brings the right neuron closer to its firing threshold, leading, when it fires, to the class decision.
We show in the next section that such a simple mechanism allows labeling the images as accurately as the SVM classifier. More interestingly, by using high learning rates during co-occurrence learning (e.g., neuromodulation that increases synaptic strength), it is possible to learn to name objects, with maximum accuracy, by showing the neural network only a single instance of the image class; an extreme learning task in which the SVM classifier seems to have more difficulty.
V. SIMULATION RESULTS
We now evaluate the accuracy of the spiking neural network when using our hypothesis to learn and name, and compare it to that of the SNN model followed by the SVM classifier.
A. Experimental Setup
Experiments have been performed on a server with 5 Intel 2.10 GHz CPUs, 32 GB of memory, and an Nvidia Tesla P100 SXM2 GPU with 16 GB of dedicated memory. (In practice, implementation-wise, the class score is simply the sum of weights: the SNN model considers a spike to be a binary decision that happens when the accumulated potential reaches a threshold, i.e., we do not consider the rate. We tried a version that uses the exact value of the internal potential instead of the unit value 1, but results were similar.) We evaluate the
accuracy reached by our model on the Caltech motor/face dataset [108], considering two classes: Faces and Motorbikes.
Fig. 3: Difference between the class scores for each image in the Train (Left) and Test (Right) datasets.
B. Overall accuracy with backpropagated APs
We first focus on the case where we show the SNN many examples of each class. In particular, for each class we select 398 images, among which 200 are reserved for training (for feature learning and co-occurrence learning alike) and 198 are left for testing. After a parameter search [15], we set α+ and α− to . and . , respectively. The thresholds for the first, second, and third convolutional layers are set to , , and . Max pooling is not performed for this dataset; this means that in our last layer we use a pooling window of size x . This is because images in the Caltech dataset have low resolutions. The synaptic weights of the class matrices are chosen at random from a normal distribution with mean . and standard deviation . .
In this setting, using the backpropagation-based recollection, we reach an accuracy of . and . on the train and test datasets, respectively. This performance is on par with that of an SVM classifier, as per the original SNN we build on, and confirms the computational efficiency of backpropagated action potentials.
In more detail, Fig. 3 shows the class scores for both the train and test datasets. Each point corresponds to an image, and its ordinate is the difference between the scores associated to the Motorbike and Face classes, respectively. Therefore, images with positive values are associated to the Motorbike class, while images with negative values are associated to the Face class. As can be observed, backpropagated action potentials separate the two classes cleanly for most of the images. However, for some images this distinction is not clear. We believe, as discussed earlier, that the problem comes from the SNN feature extractor which, although performing well, does not yet learn good representations. Nonetheless, only a few images are classified incorrectly with a large relative error. This shows, overall, the computational plausibility of the backpropagation-based recollection.
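For concreteness, the class scores and the difference δ plotted in Fig. 3 can be computed as below. This is our own minimal reading of the scoring step (a spike counts as a unit vote weighted by the learned synapse); the weight values in the usage example are invented for illustration:

```python
import numpy as np

def class_scores(class_w, active):
    """Backward vote: each firing root-layer neuron sends its learned weight
    back to every category neuron; a score is the sum of received weights."""
    return class_w[:, active].sum(axis=1)

def delta(class_w, active, motor=0, face=1):
    """Fig. 3 plots delta = score(Motorbike) - score(Face):
    delta > 0 -> Motorbike, delta < 0 -> Face."""
    s = class_scores(class_w, active)
    return float(s[motor] - s[face])
```

The classification rule is then simply the sign of δ (or, with more than two classes, the argmax of the score vector).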
Fig. 4: Accuracy as a function of the number of train images per category used in the Co-occurrence Learning phase.
C. Accuracy in few-shot learning
We now focus on the case where the "teacher" shows the SNN only a few examples.
Varying the number of images in co-occurrence learning.
We first vary the number of training images in the co-occurrence-based learning phase. In the remainder, whenever we talk about training, we refer to the supervised "teacher-based" co-occurrence learning where a label is given. Fig. 4 shows the accuracy on the train and test datasets as a function of the number of labeled train images. We vary the number of labeled train images per experiment from to . As the number of train images increases, the accuracies reached on both the train and test images increase as well. The train and test scores pass from and after 25 images per category up to the . and . from above, after having used the full train dataset. This shows that the SNN learns, but slowly, as we feed it images and labels. One way to speed up this process is to increase the learning rates.
Varying the learning rate.
Increasing the learning rate can simulate a neuromodulatory action that strengthens a connection suddenly, without the need for repeated exposure. We hence vary the α+ and α− used in the co-occurrence phase and observe a considerable impact on the accuracy. Fig. 5 illustrates the impact of the modulated learning rate on the accuracy level reached as a function of the number of train images used. As a baseline, we use α+ = 0. , α− = 0. , and we multiply both values by some factor λ ∈ { , , , , , }. When λ < 1, the learning is obviously slower. Indeed, the accuracy varies from with training images (i.e., random) to using the entire train dataset, and grows in a linear way; it would probably reach higher values with more training time. When λ = 1, we reach the maximum possible test score using all the train images.
An interesting behaviour can be observed when λ > 1. The learning is at first faster, as it can reach high accuracy after having seen only a small sample of train images. However, the accuracy then starts decreasing as the number of train images increases. We recall that the STDP rule used keeps the weights within the range [0, 1]. Thus, starting with high values for α+ and α− allows the network to associate discriminant features with the image categories faster. However, as the number of train images increases, weights might become less helpful if they tend to reach the maximum value of 1 and thus to have fewer intermediate values: the scores become closer and less distinguishable. Another factor is that, as we will see later, the SNN is better at recognizing and learning from certain particular images than from others (see the description of Fig. 6).
Fig. 5: Accuracy on the test images as a function of the number of train images per category used in the Co-occurrence Learning phase (increased number of shots in a few-shot learning task).
Hence, being exposed to a good image with a high learning rate leads to a good accuracy, but being exposed afterwards to a "bad" image unlearns the good weights, thus decreasing the performance.
When λ = 10 and with only train images per category, it is possible to reach an accuracy of . . With λ = 2, , and , the numbers of necessary train images per category to reach the same accuracy are , , and , respectively. These results hint at the following direction: the best approach in terms of few-shot learning would be to first start with a high learning rate (λ), but then to stop changing the weights, by either suddenly decreasing λ or by simply freezing the learning, making the network always stick to its old beliefs.
D. One-shot learning: Machine learning vs. Backpropagation
In this final section, we set the bar high and propose to train and test the SNN in a one-shot learning task, meaning that we show the neural network only one single image from each class, together with its correct name. We then test the accuracy of the network on the entire 198 images of the test set. We compare the SVM and our model on this task.
As explored earlier, reaching good performance in this task requires even higher learning rates than tried previously. We experiment with various λ's and various (motorbike, face) image couples. We find, for instance, that λ = 65 yielded good results among many other values, so we pick it. But this is where we interestingly discovered that the performance depended strongly on which couple of (motorbike, face) images was used for the one-shot learning task. We found that, using the
backpropagation-based recollection, certain single couples of motorbike and face yielded an accuracy of 96.2% on all the remaining unseen 198 test images. Note that this is higher than what we achieved earlier when training with all images rather than only one or a few shots. At the same time, the maximum we could achieve with the SVM classifier on a single example was 84.5% accuracy.
To assess this more systematically, we test both the SVM and backpropagation in the one-shot exercise on around 1500 different image couples of motorbike and face photos. We plot in Fig. 6 the resulting empirical cumulative distribution functions. The figure shows that the two approaches yield different distributions, with most of the SVM results being less dispersed, slightly above 80% accuracy.
Fig. 6: Comparison of ML and our hypothesis in the one-shot learning task. CDF across 2000 different pairs of (motor, face) pictures picked for training.
We conclude that, with the right images used for training, backpropagation outperforms the SVM by far in the one-shot task (96.2% accuracy on 198 images against only 84.5%).
Note that this curious result also suggests that the SNN models are still not good enough at feature extraction. The learned representations are probably not as invariant as they should be, hence the differences between images. Further investigating the differences between these successful and less successful image couples might help enhance current SNN models. For these reasons, we believe that our simulations should be seen as yet another argument for the computational effectiveness of the backpropagation-based recollection mechanism.
VI. DISCUSSION
For more than a century [109], information processing in the brain has been widely believed to follow mainly the forward, pre- to post-synaptic neurons direction. In this work, we emitted the hypothesis that the backpropagation of action potentials mediates all "offline" generative tasks where the simultaneous activation of specific targeted populations of neurons is needed. This is, we claimed, the case for the retrieval of past memories or mental images, the retrieval of the signification of words, the retrieval of names, and even the mixture of distinct past memories into imagination. We reviewed in Sec. III-A abundant evidence that calls for giving the hypothesis a chance. As an added bonus, we showed in Sec. V that our hypothesis can be as efficient as, or even more efficient than, a machine learning algorithm in retrieving the category name of an object (manual inspection of "good" and "bad" images did not uncover any peculiar character distinguishing them). If this hypothesis is confirmed true, it would have tremendous implications, considerably improving our understanding of neural encoding and of high cognitive functions from a low-level neural perspective.
A. Possible implications
The first big implication of this hypothesis is the promise to bring answers to the neural encoding problem and close the old debate in cognitive sciences between localist and distributed representation theories. If our hypothesis is true, the answer to the representation problem becomes simple: (i) representations of concepts are distributed but, at the same time, (ii) there exist highly selective neurons that respond to unique concepts. The latter serve as hubs between various related concepts, playing the role of source pointer neurons to retrieve the entire concept's features encoded by an entire population of neurons. For example, there should exist relatively few neurons that uniquely respond to the image of a cat. But such neurons serve as hubs to easily connect the cat concept to related memories. Such sparse neurons act as pointers to retrieve the visual features of a real cat through backpropagated action potentials: the latter travel backwards to reactivate selectively the neurons that represent the right lines, shapes and colours that define a cat in a statistical way. Hence, to be able to recall the image of a cat through mental visual imagery, the brain does not need to activate only the sparse neurons that respond to all cats; as the optogenetic studies above also hint, the activation of an entire larger population is needed. And if neurons that represent, say, vertical lines are not activated during the process, the recalled image will phenomenologically lack them.
Beyond vision, the cat toy example could apply to any mental state and any couple of (lived stimuli, later memory of the stimuli), be they smells, affects or even impressions of movements.
In accordance with the principle of grounded cognition [110], which we believe our hypothesis complies with, discrete concepts are grounded in the sensorimotor experiences that were encoded with them, such that the activation of a signifier of a concept leads to the activation of the experiences that are grounded with it. Hence, the cue that is the word "moving" or "tickling" is correlated with areas that encode actual moving or tickling.
Another related hard problem in cognitive sciences that can benefit from our hypothesis is the binding problem: how does the brain bind higher-level concepts to more elementary ones and, in particular, how does it associate the right features (e.g., colors) to the right discrete objects or concepts, for example in an image composed of many objects? The fact that the brain needs some time to correctly perform the binding [111] suggests that this operation is not forward-based, but happens generatively and iteratively, later, in a second stage. If our hypothesis is correct, this should happen through slow and repetitive runs of top-down action potential backpropagation, starting from the right source pointer neurons that uniquely define the discrete object, all the way backwards, activating all the neurons that describe its attributes. Actually, two famous competing (and high-level) theories on this problem are the feature-integration theory [112] and the temporal synchronization theory [113], [114]. Our hypothesis could reconcile them as well. Indeed, both admit the involvement of different runs of bottom-up and top-down hierarchies (e.g., attention in feature-integration theory) to implement binding. However, the exact physical mechanisms that implement this were still unknown. Our hypothesis suggests that these top-down hierarchies that bind objects to their features are implemented through backpropagated APs.
Under this new realm, one does not have to choose exclusively between binding-by-synchrony and feature-integration theory: attention with backpropagated APs could synchronously and selectively activate all the features related to a given discrete object.
This leads us to another closely related implication: attention itself is likely implemented through top-down backpropagated APs. In general, backpropagation can constitute an easy-to-implement, unique and simple mechanism that underlies a diverse set of tasks: offline thinking or mind wandering, imagination, episodic memory retrieval and future episodic thinking. In our framework, imagination becomes "easy" to apprehend: it would simply be the resulting activation pattern of a mixture of usually unrelated concepts. For example, an imagined "laughing cat" results from simultaneously top-down back-activating the "laughing" and "cat" concepts. The same applies to mind wandering, where backpropagated APs should induce activation patterns on "the most likely neural pathways", generating what seem to be coherent thoughts.
All this predicts the existence of generators, or specialized centers, that release neuromodulators remotely to control the generation by either inhibiting or facilitating backpropagation. This is where more advanced modeling and computer simulation work can help tremendously in future work. For example, it would be helpful to understand the interplay between backpropagation and forward propagation, since the first can result as well in feedforward propagation, which in turn might cause backpropagation, and so on. Such "lateral" activation patterns could be useful to find associated concepts, as opposed to "digging into the details" of a single concept.
Finally, if proven true, backpropagated APs could open new ways to better understand some pathological unintentional recollections of memories.
If so, understanding what factors impact the inhibition or excitation of backpropagating APs could open the way towards understanding possible related disorders which might involve obsessive thinking or intrusive thoughts. Other related, rarer dysfunctions happen in mental visual imagery as well; examples are the absence (aphantasia) or excess (hyperphantasia) of visual imagery experiences [4]. For these, the neuromodulation mechanisms that control backpropagation should be the first suspects.
Last but not least, one implication is that the retrieval process is stochastic in nature. The retrieved memory trace looks like the original first perception but, depending on past experiences (and hence on the weights of the neural connections due to past experiences), the reactivation might not be exactly the same. This can be seen most in the case of language, where the same word (e.g., "a screen") was seen multiple times during encoding, in the presence of multiple similar stimuli (many different types of screens), and where the same concept is further grounded in different neural network connections from one subject to the other. Finally, if the framework defended by our hypothesis is correct, language can be seen as a common cue-based system useful to make others live experiences similar to ours.
B. Verifiability and further investigations
Before drawing a bright future for our hypothesis, two directions must be seriously pursued to further investigate it and confirm (or infirm) its plausibility.
1) Empirical methods:
We verified in the literature the existence of retrograde signals that satisfy some of the assumptions of our hypothesis. Further targeted empirical studies can verify the remaining ones. First, it is crucial to understand whether the backpropagation signal is stronger on paths with "higher synaptic weights". Second, it is necessary to measure how far-reaching the backpropagation signal can be, beyond solely the previous connection.
2) Computational effectiveness:
In addition to the above empirical methods, one line of work could be to verify in parallel the computational effectiveness of this hypothesis in implementing its target goals. In this work, we verify as a first step the ability of retrograde action potentials to perform the object recognition or naming task, that is, the retrieval of the class of the object once the stimulus is presented. This allows us to assess the computational power of this mechanism, compared to other, less biologically plausible ones. We opted for this comparison because of the presence of a baseline to compare to (an existing image classifier) and of a metric (accuracy) to quantify the computational power of the mechanism. Although not simulated in this work, retrograde APs can be used symmetrically for the task of "understanding": i.e., an activation of the signifier (word) neuron that leads automatically to the activation of the signified concept and of, say, its visual features. This needs, however, to rely on artificial neural networks that have good and plausible feature extraction capabilities. The SNNs we use are promising and close, but they do not yet fully satisfy the last property (see Sec. IV-B).
Finally, if artificial backward reconstruction works well, our hypothesis could also be computationally verifiable, at least in theory, for the imagination aspect. One interesting experiment could be to train spiking neural networks to recognize two separate concepts from images, exactly as we do for "Motor" and "Face" in Sec. V. Then, instead of backward constructing only one concept at a time, it would be interesting to simultaneously activate two concepts and see the effect on the backward-reconstructed images.
To go back to the example above, one could activate a concept like "laughing" and a concept like "cat" and visualize the results of the "competition" between backpropagating signals on the backward-constructed images. Similar work can be done with today's state-of-the-art deep neural networks, such as OpenAI's DALL·E [115], which uses a transformer decoder architecture. But the latter employs supervised mechanisms that lack biological plausibility [9], [10], [11].
ACKNOWLEDGEMENT
The strongest acknowledgment should go to Dr. Andrea Tomassilli, who insisted, despite being invited as a co-author, that his contribution deserves to appear only in this acknowledgment. Andrea Tomassilli executed the first modification to Perez's implementation, enabling the simulation of the hypothesis and the automation of the experiments and their analysis. We are grateful to Dr. Alessandro Finamore for interesting feedback on an earlier draft of the paper.
REFERENCES
[1] F. De Saussure, Course in General Linguistics. New York: McGraw-Hill, 1959.
[2] R. Q. Quiroga, L. Reddy, G. Kreiman, C. Koch, and I. Fried, "Invariant visual representation by single neurons in the human brain," Nature, vol. 435, no. 7045, pp. 1102–1107, 2005.
[3] C. E. Connor, "Friends and grandmothers," Nature, vol. 435, no. 7045, pp. 1036–1037, 2005.
[4] J. Pearson, "The human imagination: the cognitive neuroscience of visual mental imagery," Nature Reviews Neuroscience, vol. 20, no. 10, pp. 624–634, 2019.
[5] N. Dijkstra, P. Zeidman, S. Ondobaka, M. A. van Gerven, and K. Friston, "Distinct top-down and bottom-up brain connectivity during visual perception and imagery," Scientific Reports, vol. 7, no. 1, pp. 1–9, 2017.
[6] D. Dentico, B. L. Cheung, J.-Y. Chang, J. Guokas, M. Boly, G. Tononi, and B. Van Veen, "Reversal of cortical information flow during visual imagery as compared to visual perception," NeuroImage, vol. 100, pp. 237–243, 2014.
[7] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," 2014.
[8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
[9] S. Grossberg, "Competitive learning: From interactive activation to adaptive resonance," Cognitive Science, vol. 11, no. 1, pp. 23–63, 1987.
[10] F. Crick, "The recent excitement about neural networks," Nature, vol. 337, no. 6203, pp. 129–132, 1989.
[11] J. C. Whittington and R. Bogacz, "Theories of error back-propagation in the brain," Trends in Cognitive Sciences, 2019.
[12] S. R. Kheradpisheh, M. Ganjtabesh, S. J. Thorpe, and T. Masquelier, "STDP-based spiking deep convolutional neural networks for object recognition," Neural Networks, vol. 99, pp. 56–67, 2018.
[13] M. Mozafari, S. R. Kheradpisheh, T. Masquelier, A. Nowzari-Dalini, and M. Ganjtabesh, "First-spike-based visual categorization using reward-modulated STDP," IEEE Transactions on Neural Networks and Learning Systems, 2018.
[14] N. Perez-Nieves, "SDNN python." https://github.com/npvoid/SDNNpython. Accessed: 2020-11-08.
[15] "Backpropagation-based recollection hypothesis code." https://github.com/bendiogene/recollection hypothesis. Accessed: 2021-01-10.
[16] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, "Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs," Science, vol. 275, no. 5297, pp. 213–215, 1997.
[17] N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A. J. Rodriguez-Sanchez, and L. Wiskott, "Deep hierarchies in the primate visual cortex: What can we learn for computer vision?," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1847–1871, 2012.
[18] E. Tulving and D. M. Thomson, "Encoding specificity and retrieval processes in episodic memory," Psychological Review, vol. 80, no. 5, p. 352, 1973.
[19] J. S. Bowers, "On the biological plausibility of grandmother cells: implications for neural network theories in psychology and neuroscience," Psychological Review, vol. 116, no. 1, p. 220, 2009.
[20] J. S. Bowers, "Grandmother cells and localist representations: a review of current thinking," Language, Cognition and Neuroscience, vol. 32, no. 3, pp. 257–273, 2017.
[21] K. Patterson, P. J. Nestor, and T. T. Rogers, "Where do you know what you know? The representation of semantic knowledge in the human brain," Nature Reviews Neuroscience, vol. 8, no. 12, pp. 976–987, 2007.
[22] K. Svoboda, W. Denk, D. Kleinfeld, and D. W. Tank, "In vivo dendritic calcium dynamics in neocortical pyramidal neurons," Nature, vol. 385, no. 6612, pp. 161–165, 1997.
[23] Y. Bereshpolova, Y. Amitai, A. G. Gusev, C. R. Stoelzel, and H. A. Swadlow, "Dendritic backpropagation and the state of the awake neocortex," Journal of Neuroscience, vol. 27, no. 35, pp. 9392–9399, 2007.
[24] G. Buzsaki and A. Kandel, "Somadendritic backpropagation of action potentials in cortical pyramidal cells of the awake rat," Journal of Neurophysiology, vol. 79, no. 3, pp. 1587–1591, 1998.
[25] G. Stuart, N. Spruston, B. Sakmann, and M. Häusser, "Action potential initiation and backpropagation in neurons of the mammalian CNS," Trends in Neurosciences, vol. 20, no. 3, pp. 125–131, 1997.
[26] P. Vetter, A. Roth, and M. Hausser, "Propagation of action potentials in dendrites depends on dendritic morphology," Journal of Neurophysiology, vol. 85, no. 2, pp. 926–937, 2001.
[27] J. Waters, A. Schaefer, and B. Sakmann, "Backpropagating action potentials in neurones: measurement, mechanisms and potential functions," Progress in Biophysics and Molecular Biology, vol. 87, no. 1, pp. 145–170, 2005.
[28] S. R. Williams and G. J. Stuart, "Action potential backpropagation and somato-dendritic distribution of ion channels in thalamocortical neurons," Journal of Neuroscience, vol. 20, no. 4, pp. 1307–1317, 2000.
[29] S. R. Williams and G. J. Stuart, "Backpropagation of physiological spike trains in neocortical pyramidal neurons: implications for temporal coding in dendrites,"
Journal of Neuroscience , vol. 20, no. 22,pp. 8238–8246, 2000.[30] H. W. Tao and M.-m. Poo, “Retrograde signaling at central synapses,”
Proceedings of the National Academy of Sciences , vol. 98, no. 20,pp. 11009–11015, 2001.[31] H. Tsubokawa and W. N. Ross, “Muscarinic modulation of spikebackpropagation in the apical dendrites of hippocampal ca1 pyramidalneurons,”
Journal of Neuroscience , vol. 17, no. 15, pp. 5782–5791,1997.[32] D. A. Hoffman and D. Johnston, “Neuromodulation of dendritic actionpotentials,”
Journal of neurophysiology , vol. 81, no. 1, pp. 408–411,1999.[33] E. Tulving and Z. Pearlstone, “Availability versus accessibility ofinformation in memory for words,”
Journal of Verbal Learning andVerbal Behavior , vol. 5, no. 4, pp. 381–391, 1966.[34] E. Tulving et al. , “Episodic and semantic memory,”
Organization ofmemory , vol. 1, pp. 381–403, 1972.[35] P. W. Frankland, S. A. Josselyn, and S. K¨ohler, “The neurobiologicalfoundation of memory retrieval,”
Nature neuroscience , vol. 22, no. 10,pp. 1576–1585, 2019.[36] E. Tulving, “Ecphoric processes in episodic memory,”
PhilosophicalTransactions of the Royal Society of London. B, Biological Sciences ,vol. 302, no. 1110, pp. 361–371, 1983.[37] D. L. Schacter, J. E. Eich, and E. Tulving, “Richard semon’s theoryof memory,”
Journal of Verbal Learning and Verbal Behavior , vol. 17,no. 6, pp. 721–743, 1978.[38] J. S. Nairne, “The myth of the encoding-retrieval match,”
Memory ,vol. 10, no. 5-6, pp. 389–395, 2002.[39] M. Poirier, J. S. Nairne, C. Morin, F. G. Zimmermann, K. Kout-meridou, and J. Fowler, “Memory as discrimination: A challengeto the encoding–retrieval match principle.,”
Journal of Experimentalsychology: Learning, Memory, and Cognition , vol. 38, no. 1, p. 16,2012.[40] W. D. Goh and S. H. Lu, “Testing the myth of the encoding–retrievalmatch,”
Memory & cognition , vol. 40, no. 1, pp. 28–39, 2012.[41] K. Z. Tanaka, A. Pevzner, A. B. Hamidi, Y. Nakazawa, J. Graham,and B. J. Wiltgen, “Cortical representations are reinstated by thehippocampus during memory retrieval,”
Neuron , vol. 84, no. 2, pp. 347–354, 2014.[42] X. Liu, S. Ramirez, P. T. Pang, C. B. Puryear, A. Govindarajan, K. Deis-seroth, and S. Tonegawa, “Optogenetic stimulation of a hippocampalengram activates fear memory recall,”
Nature , vol. 484, no. 7394,pp. 381–385, 2012.[43] G. Vetere, L. M. Tran, S. Moberg, P. E. Steadman, L. Restivo, F. G.Morrison, K. J. Ressler, S. A. Josselyn, and P. W. Frankland, “Memoryformation in the absence of experience,”
Nature neuroscience , vol. 22,no. 6, pp. 933–940, 2019.[44] S. M. Smith and E. Vela, “Environmental context-dependent memory:A review and meta-analysis,”
Psychonomic bulletin & review , vol. 8,no. 2, pp. 203–220, 2001.[45] E. Eich, “Mood as a mediator of place dependent memory.,”
Journalof Experimental Psychology: General , vol. 124, no. 3, p. 293, 1995.[46] C. A. Denny, M. A. Kheirbek, E. L. Alba, K. F. Tanaka, R. A.Brachman, K. B. Laughman, N. K. Tomm, G. F. Turi, A. Losonczy,and R. Hen, “Hippocampal memory traces are differentially modulatedby experience, time, and adult neurogenesis,”
Neuron , vol. 83, no. 1,pp. 189–201, 2014.[47] L. G. Reijmers, B. L. Perkins, N. Matsuo, and M. Mayford, “Local-ization of a stable neural correlate of associative memory,”
Science ,vol. 317, no. 5842, pp. 1230–1233, 2007.[48] A. T. Sørensen, Y. A. Cooper, M. V. Baratta, F.-J. Weng, Y. Zhang,K. Ramamoorthi, R. Fropf, E. LaVerriere, J. Xue, A. Young, et al. , “Arobust activity marking system for exploring active neuronal ensem-bles,”
Elife , vol. 5, p. e13918, 2016.[49] A. F. Lacagnina, E. T. Brockway, C. R. Crovetti, F. Shue, M. J.McCarty, K. P. Sattler, S. C. Lim, S. L. Santos, C. A. Denny, and M. R.Drew, “Distinct hippocampal engrams control extinction and relapse offear memory,”
Nature neuroscience , vol. 22, no. 5, pp. 753–761, 2019.[50] O. Khalaf, S. Resch, L. Dixsaut, V. Gorden, L. Glauser, and J. Gr¨aff,“Reactivation of recall-induced neurons contributes to remote fearmemory attenuation,”
Science , vol. 360, no. 6394, pp. 1239–1242,2018.[51] S. Ramirez, X. Liu, P.-A. Lin, J. Suh, M. Pignatelli, R. L. Redondo, T. J.Ryan, and S. Tonegawa, “Creating a false memory in the hippocampus,”
Science , vol. 341, no. 6144, pp. 387–391, 2013.[52] K. K. Tayler, K. Z. Tanaka, L. G. Reijmers, and B. J. Wiltgen,“Reactivation of neural ensembles during the retrieval of recent andremote memory,”
Current Biology , vol. 23, no. 2, pp. 99–106, 2013.[53] A. Guskjolen, J. W. Kenney, J. de la Parra, B.-r. A. Yeung, S. A.Josselyn, and P. W. Frankland, “Recovery of “lost” infant memories inmice,”
Current Biology , vol. 28, no. 14, pp. 2283–2290, 2018.[54] G. T. Waldhauser, V. Braun, and S. Hanslmayr, “Episodic memoryretrieval functionally relies on very rapid reactivation of sensoryinformation,”
Journal of Neuroscience , vol. 36, no. 1, pp. 251–260,2016.[55] A. Jafarpour, L. Fuentemilla, A. J. Horner, W. Penny, and E. Duzel,“Replay of very early encoding representations during recollection,”
Journal of Neuroscience , vol. 34, no. 1, pp. 242–248, 2014.[56] J. D. Johnson, S. G. McDuff, M. D. Rugg, and K. A. Norman,“Recollection, familiarity, and cortical reinstatement: a multivoxelpattern analysis,”
Neuron , vol. 63, no. 5, pp. 697–708, 2009.[57] J. R. Manning, S. M. Polyn, G. H. Baltuch, B. Litt, and M. J. Kahana,“Oscillatory patterns in temporal lobe reveal context reinstatementduring memory search,”
Proceedings of the National Academy ofSciences , vol. 108, no. 31, pp. 12893–12897, 2011.[58] M. Ritchey, E. A. Wing, K. S. LaBar, and R. Cabeza, “Neural similaritybetween encoding and retrieval is related to memory via hippocampalinteractions,”
Cerebral cortex , vol. 23, no. 12, pp. 2818–2828, 2013.[59] B. P. Staresina, R. N. Henson, N. Kriegeskorte, and A. Alink, “Episodicreinstatement in the medial temporal lobe,”
Journal of Neuroscience ,vol. 32, no. 50, pp. 18150–18156, 2012.[60] R. B. Yaffe, M. S. Kerr, S. Damera, S. V. Sarma, S. K. Inati, and K. A.Zaghloul, “Reinstatement of distributed cortical oscillations occurs withprecise spatiotemporal dynamics during successful memory retrieval,”
Proceedings of the National Academy of Sciences , vol. 111, no. 52,pp. 18727–18732, 2014.[61] J. Fulford, F. Milton, D. Salas, A. Smith, A. Simler, C. Winlove, andA. Zeman, “The neural correlates of visual imagery vividness–an fmristudy and literature review,”
Cortex , vol. 105, pp. 26–40, 2018.[62] N. Dijkstra, S. E. Bosch, and M. A. van Gerven, “Vividness of visualimagery depends on the neural overlap with perception in visual areas,”
Journal of Neuroscience , vol. 37, no. 5, pp. 1367–1373, 2017.[63] N. Dijkstra, S. E. Bosch, and M. A. van Gerven, “Shared neuralmechanisms of visual perception and imagery,”
Trends in cognitivesciences , 2019.[64] N. Dijkstra, P. Mostert, F. P. de Lange, S. Bosch, and M. A. van Gerven,“Differential temporal dynamics during visual imagery and perception,”
Elife , vol. 7, p. e33904, 2018.[65] M. St-Laurent, H. Abdi, and B. R. Buchsbaum, “Distributed patternsof reactivation predict vividness of recollection,”
Journal of CognitiveNeuroscience , vol. 27, no. 10, pp. 2000–2018, 2015.[66] K. J. Friston, L. Harrison, and W. Penny, “Dynamic causal modelling,”
Neuroimage , vol. 19, no. 4, pp. 1273–1302, 2003.[67] S. Hochstein and M. Ahissar, “View from the top: Hierarchies andreverse hierarchies in the visual system,”
Neuron , vol. 36, no. 5,pp. 791–804, 2002.[68] T. Serre, A. Oliva, and T. Poggio, “A feedforward architecture accountsfor rapid categorization,”
Proceedings of the national academy ofsciences , vol. 104, no. 15, pp. 6424–6429, 2007.[69] J. Linde-Domingo, M. S. Treder, C. Kerr´en, and M. Wimber, “Evidencethat neural information flow is reversed between object perception andobject reconstruction from memory,”
Nature communications , vol. 10,no. 1, pp. 1–13, 2019.[70] A. L. Wheeler, C. M. Teixeira, A. H. Wang, X. Xiong, N. Kovacevic,J. P. Lerch, A. R. McIntosh, J. Parkinson, and P. W. Frankland,“Identification of a functional connectome for long-term fear memoryin mice,”
PLoS Comput Biol , vol. 9, no. 1, p. e1002853, 2013.[71] L. R. Squire and P. Alvarez, “Retrograde amnesia and memory con-solidation: a neurobiological perspective,”
Current opinion in neurobi-ology , vol. 5, no. 2, pp. 169–177, 1995.[72] T. J. Teyler and J. W. Rudy, “The hippocampal indexing theory andepisodic memory: updating the index,”
Hippocampus , vol. 17, no. 12,pp. 1158–1169, 2007.[73] T. J. Teyler and P. DiScenna, “The hippocampal memory indexingtheory.,”
Behavioral neuroscience , vol. 100, no. 2, p. 147, 1986.[74] J. L. McClelland, B. L. McNaughton, and R. C. O’Reilly, “Why thereare complementary learning systems in the hippocampus and neocortex:insights from the successes and failures of connectionist models oflearning and memory.,”
Psychological review , vol. 102, no. 3, p. 419,1995.[75] M. B. Merkow, J. F. Burke, and M. J. Kahana, “The human hippocam-pus contributes to both the recollection and familiarity componentsof recognition memory,”
Proceedings of the National Academy ofSciences , vol. 112, no. 46, pp. 14378–14383, 2015.[76] J. F. Danker, A. Tompary, and L. Davachi, “Trial-by-trial hippocampalencoding activation predicts the fidelity of cortical reinstatement duringsubsequent retrieval,”
Cerebral Cortex , vol. 27, no. 7, pp. 3515–3524,2017.[77] A. J. Horner, J. A. Bisby, D. Bush, W.-J. Lin, and N. Burgess,“Evidence for holistic episodic recollection via hippocampal patterncompletion,”
Nature communications , vol. 6, no. 1, pp. 1–11, 2015.[78] B. P. Staresina, E. Cooper, and R. N. Henson, “Reversible informationflow across the medial temporal lobe: the hippocampus links corticalmodules during memory retrieval,”
Journal of Neuroscience , vol. 33,no. 35, pp. 14184–14192, 2013.[79] M. Moscovitch, “The hippocampus as a” stupid,” domain-specificmodule: Implications for theories of recent and remote memory, andof imagination.,”
Canadian Journal of Experimental Psychology/Revuecanadienne de psychologie exp´erimentale , vol. 62, no. 1, p. 62, 2008.[80] A. P. Georgopoulos, A. B. Schwartz, and R. E. Kettner, “Neuronalpopulation coding of movement direction,”
Science , vol. 233, no. 4771,pp. 1416–1419, 1986.[81] R. C. Decharms and A. Zador, “Neural representation and the corticalcode,”
Annual review of neuroscience , vol. 23, no. 1, pp. 613–647,2000.[82] H. B. Barlow, “Single units and sensation: a neuron doctrine forperceptual psychology?,”
Perception , vol. 1, no. 4, pp. 371–394, 1972.83] B. A. Olshausen and D. J. Field, “Sparse coding of sensory inputs,”
Current opinion in neurobiology , vol. 14, no. 4, pp. 481–487, 2004.[84] D. H. Hubel and T. N. Wiesel, “Receptive fields, binocular interactionand functional architecture in the cat’s visual cortex,”
The Journal ofphysiology , vol. 160, no. 1, p. 106, 1962.[85] M. Mishkin, L. G. Ungerleider, and K. A. Macko, “Object vision andspatial vision: two cortical pathways,”
Trends in neurosciences , vol. 6,pp. 414–417, 1983.[86] C. G. Gross, D. B. Bender, and C. d. Rocha-Miranda, “Visual receptivefields of neurons in inferotemporal cortex of the monkey,”
Science ,vol. 166, no. 3910, pp. 1303–1306, 1969.[87] K. Tanaka, “Inferotemporal cortex and object vision,”
Annual reviewof neuroscience , vol. 19, no. 1, pp. 109–139, 1996.[88] N. K. Logothetis and D. L. Sheinberg, “Visual object recognition,”
Annual review of neuroscience , vol. 19, no. 1, pp. 577–621, 1996.[89] E. I. Moser, E. Kropff, and M.-B. Moser, “Place cells, grid cells, andthe brain’s spatial representation system,”
Annu. Rev. Neurosci. , vol. 31,pp. 69–89, 2008.[90] I. Fried, K. A. MacDonald, and C. L. Wilson, “Single neuron activityin human hippocampus and amygdala during recognition of faces andobjects,”
Neuron , vol. 18, no. 5, pp. 753–765, 1997.[91] G. Kreiman, C. Koch, and I. Fried, “Category-specific visual responsesof single neurons in the human medial temporal lobe,”
Nature neuro-science , vol. 3, no. 9, pp. 946–953, 2000.[92] R. Q. Quiroga, G. Kreiman, C. Koch, and I. Fried, “Sparse butnot ‘grandmother-cell’coding in the medial temporal lobe,”
Trends incognitive sciences , vol. 12, no. 3, pp. 87–91, 2008.[93] S. Waydo, A. Kraskov, R. Q. Quiroga, I. Fried, and C. Koch, “Sparserepresentation in the human medial temporal lobe,”
Journal of Neuro-science , vol. 26, no. 40, pp. 10232–10234, 2006.[94] R. Q. Quiroga, A. Kraskov, F. Mormann, I. Fried, and C. Koch, “Single-cell responses to face adaptation in the human medial temporal lobe,”
Neuron , vol. 84, no. 2, pp. 363–369, 2014.[95] L. Reddy and S. J. Thorpe, “Concept cells through associative learningof high-level representations,”
Neuron , vol. 84, no. 2, pp. 248–251,2014.[96] K. Christoff, Z. C. Irving, K. C. Fox, R. N. Spreng, and J. R.Andrews-Hanna, “Mind-wandering as spontaneous thought: a dynamicframework,”
Nature Reviews Neuroscience , vol. 17, no. 11, pp. 718–731, 2016.[97] A. Kucyi, “Just a thought: How mind-wandering is represented indynamic brain connectivity,”
Neuroimage , vol. 180, pp. 505–514, 2018.[98] D. R. Addis, A. T. Wong, and D. L. Schacter, “Remembering the pastand imagining the future: common and distinct neural substrates duringevent construction and elaboration,”
Neuropsychologia , vol. 45, no. 7,pp. 1363–1377, 2007.[99] A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. S.Maida, “Deep learning in spiking neural networks,” arXiv preprintarXiv:1804.08150 , 2018.[100] W. Gerstner and W. M. Kistler,
Spiking neuron models: Single neurons,populations, plasticity . Cambridge university press, 2002.[101] S. Ghosh-Dastidar and H. Adeli, “Spiking neural networks,”
Interna-tional journal of neural systems , vol. 19, no. 04, pp. 295–308, 2009.[102] M. Taylor, “The problem of stimulus structure in the behavioural theoryof perception,”
South African Journal of Psychology , vol. 3, pp. 23–45,1973.[103] N. Caporale and Y. Dan, “Spike timing–dependent plasticity: a hebbianlearning rule,”
Annu. Rev. Neurosci. , vol. 31, pp. 25–46, 2008.[104] S. Huang, C. Rozas, M. Trevino, J. Contreras, S. Yang, L. Song,T. Yoshioka, H.-K. Lee, and A. Kirkwood, “Associative hebbiansynaptic plasticity in primate visual cortex,”
Journal of Neuroscience ,vol. 34, no. 22, pp. 7575–7579, 2014.[105] D. B. McMahon and D. A. Leopold, “Stimulus timing-dependentplasticity in high-level vision,”
Current biology , vol. 22, no. 4, pp. 332–337, 2012.[106] J. Ott, E. Linstead, N. LaHaye, and P. Baldi, “Learning in the machine:To share or not to share?,”
Neural Networks , 2020.[107] T. Masquelier and S. J. Thorpe, “Unsupervised learning of visualfeatures through spike timing dependent plasticity,”
PLoS Comput Biol ,vol. 3, no. 2, p. e31, 2007.[108] L. Fei-Fei, R. Fergus, and P. Perona, “Learning generative visualmodels from few training examples: An incremental bayesian approachtested on 101 object categories,”
Computer vision and Image under-standing , vol. 106, no. 1, pp. 59–70, 2007. [109] G. Berlucchi, “Some aspects of the history of the law of dynamicpolarization of the neuron. from william james to sherrington, fromcajal and van gehuchten to golgi,”
Journal of the History of theNeurosciences , vol. 8, no. 2, pp. 191–201, 1999.[110] L. W. Barsalou, “Grounded cognition,”
Annu. Rev. Psychol. , vol. 59,pp. 617–645, 2008.[111] C. Von der Malsburg, “The what and why of binding: the modeler’sperspective,”
Neuron , vol. 24, no. 1, pp. 95–104, 1999.[112] A. M. Treisman and G. Gelade, “A feature-integration theory ofattention,”
Cognitive psychology , vol. 12, no. 1, pp. 97–136, 1980.[113] P. M. Milner, “A model for visual shape recognition.,”
Psychologicalreview , vol. 81, no. 6, p. 521, 1974.[114] M. N. Shadlen and J. A. Movshon, “Synchrony unbound: a criticalevaluation of the temporal binding hypothesis,”