A Quantum Information Retrieval Approach to Memory
Kirsty Kitto and Peter Bruza
School of Information Systems, Queensland University of Technology, Brisbane, Australia
Email: [kirsty.kitto, p.bruza]@qut.edu.au
Liane Gabora
Psychology and Computer Science, University of British Columbia, Kelowna, Canada
Email: [email protected]
Abstract—As computers approach the physical limits of information storable in memory, new methods will be needed to further improve information storage and retrieval. We propose a quantum inspired vector based approach, which offers a contextually dependent mapping from the subsymbolic to the symbolic representations of information. If implemented computationally, this approach would provide exceptionally high density of information storage, without the traditionally required physical increase in storage capacity. The approach is inspired by the structure of human memory and incorporates elements of Gärdenfors' Conceptual Space approach and Humphreys et al.'s matrix model of memory.
I. MEMORY STRUCTURE AND INFORMATION DENSITY
The age of density driven computer memory increase is fast approaching its conclusion. With Moore's law suggesting that we are nearing the physical limits of information density storable in standard computational memory, it is time to investigate new paradigms of information storage and retrieval. This paper proposes that a recently developed class of cognitive models provides a highly promising avenue, one that can be used to shift the current information storage paradigm from a density dependent model to a more structural methodology. Our approach is inspired by insights from neuroscientific studies of memory, and focuses upon the context in which information is encoded and subsequently recalled. Mathematically, it is grounded in a vector-based formalism that utilizes the probability structure of quantum theory, and draws upon two related lines of research. One derives from modern approaches to information retrieval which attempt to incorporate a sophisticated notion of context into the classification of information as relevant to a query [1], [2], [3], [4]. The second is more squarely based in cognitive science, and uses a quantum approach to model concepts and their combinations [5], [6], [7], [8].

In summary, the key purpose of this paper is to suggest a new paradigm for information storage and retrieval in context that allows for a marked increase in the amount of information storable by a given resource. This will require the identification of a mechanism by which stored information can be retrieved, one which somehow links that information to relevant storage and retrieval contexts. We provide tentative solutions for both of these problems.

We begin with a brief summary of how a subsymbolic encoding in human memory can still give rise to a symbolic capacity. This is followed by a review of the Conceptual Space approach advocated by Gärdenfors [9], which proposes a framework of three tiers for understanding human memory. We then discuss the Matrix Model of Memory [10], which shows how a memory can be encoded along with information about the context in which it occurred. This will lead us to consider the treatment of context in that model, and finally to extend it through reference to a quantum information retrieval framework which combines the desirable features of each approach. We propose that our approach not only allows for exceptionally high density memory storage but also provides a memory architecture that can process information in a way that is flexible, adaptive, and possibly even creative.
II. SYMBOLIC AND SUBSYMBOLIC LEVELS OF HUMAN MEMORY
Let us begin by examining the architecture of human memory (summarized in [11]). This will serve as a starting point to build a computer memory that uses similar basic mechanisms to human memory.
A. The Subsymbolic Level
We take as a starting point some fairly well established characteristics of memory. Human memories are encoded in neurons that are sensitive to ranges (or values) of what have been called subsymbolic microfeatures [12], [13]. For example, a neuron might respond to lines of a particular orientation, or the quality of honesty, or quite possibly something that does not exactly match an established term [14]. Note that the word concept is sometimes used by non-neuroscientists (e.g. [15]) to refer to subsymbolic microfeatures. In this paper, the word microfeatures refers to stimuli responded to by single cells, which may or may not be meaningful in daily life, and the word concepts refers to things like DOG or BEAUTY that are generally comprised of many microfeatures, and refer collectively to a class of instances or exemplars that are meaningful in daily life.

Another characteristic of memory is that although each neuron responds maximally to a particular microfeature, it responds to a lesser extent to related microfeatures, an organizational structure referred to as coarse coding [16]. For example, neuron A may respond preferentially to sounds of a certain frequency, while its neighbor B responds preferentially to sounds of a slightly different frequency, and so forth. However, although A responds maximally to sounds of one frequency, it responds to a lesser degree to sounds of a similar frequency. The upshot is that an item in memory is stored in a distributed manner across a cell assembly that contains many neurons, and likewise, each neuron participates in the storage of many items [17]. A given experience activates not just one neuron, nor every neuron to an equal degree; rather, activation is spread across the members of an assembly. This means that the same neurons get used and re-used in different capacities, a phenomenon referred to as neural re-entrance [18].

The final key attribute of memory is its content addressability, meaning that there is a systematic relationship between the content of a representation and the neurons where it gets encoded. This emerges naturally as a consequence of the fact that representations activate neurons that are tuned to respond to particular features, so representations that get encoded in overlapping regions of memory share features. As a result, they can thereafter be evoked by stimuli that are similar or resonant in some (perhaps context-specific) way [17], [19]. Note that even if a brain does not possess a neuron that is maximally tuned to a particular microfeature, the brain is still able to encode stimuli in which that microfeature predominates, because representations are distributed across many neurons.

Note that on the basis of the discovery of single cells in the human brain that have highly selective, abstract and invariant responses to complex, natural stimuli, which have unfortunately been called concept cells, some neuroscientists have questioned the idea that representations are distributed [15]. This is not inconsistent with the variety of distributed representation discussed here. If you artificially activate one neuron, it gives an invariant response. It is because real-world stimuli and experiences activate not just one neuron but many that actual representations in memory are distributed.
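Returning to coarse coding, the idea can be made concrete with a minimal NumPy sketch (our illustration, not drawn from the literature cited above): each model neuron has a Gaussian tuning curve over a single stimulus dimension, so any one stimulus activates a whole assembly with graded strengths. The preferred frequencies, tuning width, and test stimulus are arbitrary toy values.

```python
import numpy as np

# Coarse coding sketch (illustrative only): each model neuron responds
# maximally to one preferred frequency but also, more weakly, to nearby ones.
preferred = np.linspace(100.0, 2000.0, 20)   # preferred frequencies (Hz)
sigma = 150.0                                # tuning width (Hz)

def population_response(stimulus_hz):
    """Graded activation of every neuron in the bank for one stimulus."""
    return np.exp(-((stimulus_hz - preferred) ** 2) / (2 * sigma ** 2))

r = population_response(440.0)
print("most active neuron prefers:", preferred[np.argmax(r)])
print("neurons above half-maximum:", np.sum(r > 0.5))  # a distributed assembly
```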
B. The Symbolic Level
Consciously experienced symbolic meanings emerge in response to the set of subsymbolic microfeatures responded to by the entire constellation of activated neurons. Sometimes these neurons have been activated as a unit many times before; at other times the constellation consists of neurons that have never been activated simultaneously as a whole. In the latter case an emergent meaning may simultaneously incorporate elements of different symbolic representations.

The distributed, content addressable architecture of memory is critical to the adaptive, flexible, and creative manner in which the information it stores is not just retrieved when required, but frequently reconstructed in contextually appropriate and sometimes even creative ways. If this memory were not distributed then there would be no overlap between items that share microfeatures, and thus no means of forging associations between them. If it were not content-addressable then these associations would not be meaningful. The upshot is that representations which share features are encoded in overlapping distributions of neurons, and therefore activation can spread from one to another. Thus representations are encoded in memory in a way that takes into account how they are related, even if this relationship has never been consciously noticed [20], [11], [21]. This is not earth shattering; indeed it seems fairly obvious with respect to the hierarchical structure of knowledge. We may never have explicitly learned that a white hamster is a mammal, but we know it is one nonetheless. In this sense it is reasonable to claim that people implicitly know more than they have ever explicitly learnt. This architecture has implications that extend far beyond issues related to the hierarchical structure of knowledge.

It should be pointed out how different this is from the typical structure of computer memory. In a computer memory, each possible input is stored at a unique address. Retrieval is thus a matter of looking up the address in a register and then fetching the corresponding item at the specified location. Since there is no overlap of representations, there is no means of creatively forging new associations based on newly perceived similarities. The exceptions are computer architectures that are designed to mimic, or are inspired by, the distributed, content-addressable nature of human memory, but these are difficult to discuss formally. In this paper we shall propose a theoretical structure that can be used to map subsymbolic architectures to symbolic representations, and so potentially provide a more flexible, adaptable and creative approach to computer memory.
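The contrast just drawn can be made concrete with a small sketch. The feature vectors and cue below are invented purely for illustration; the point is only that a content-addressable store returns whatever item best overlaps the cue's features, with no explicit lookup of an address.

```python
import numpy as np

# Illustrative contrast (our sketch, not the paper's formalism):
# conventional memory retrieves by exact address; a content-addressable
# memory retrieves whatever stored item best resonates with the cue.
store = {
    "dog":     np.array([1.0, 0.9, 0.1, 0.0]),  # furry, animate, round, red
    "hamster": np.array([1.0, 0.8, 0.6, 0.0]),
    "apple":   np.array([0.0, 0.0, 0.9, 0.9]),
}

def address_lookup(key):
    return store[key]                 # fails on anything but an exact key

def content_addressable(cue):
    # cosine similarity: items sharing microfeatures with the cue resonate
    sims = {k: v @ cue / (np.linalg.norm(v) * np.linalg.norm(cue))
            for k, v in store.items()}
    return max(sims, key=sims.get)

print(content_addressable(np.array([0.9, 0.9, 0.5, 0.0])))  # -> 'hamster'
```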
C. Forging Unusual Associations through Reconstructive Interference of Memories
A fascinating finding to come out of the early connectionist literature is that in a distributed, content addressable memory, not only do representations that share features activate each other, they sometimes interact in a way that is creative. Even a simple neural network is able to abstract a prototype, fill in missing features of a noisy or incomplete pattern, or create a new pattern on the fly that is more appropriate to the situation than anything it has ever been fed as input [22]. In fact, similar representations can interfere with one another [23], [24], [25], and these same papers provide numerous names for this phenomenon: crosstalk, false memories, spurious memories, ghosts, and superposition catastrophe. These phenomena are suggestive of a form of thought that, if not outright creative, involves a departure from known reality. Findings from neuroscience are also highly consistent with this phenomenon; as Edelman puts it, one does not retrieve a stored item from memory so much as reconstruct it [26]. That is, an item in memory is never re-experienced in exactly the form it was first experienced, but colored, however subtly, by what has been experienced in the meantime, re-assembled spontaneously in a way that relates to the task at hand (one reason eye-witness accounts cannot always be trusted [27], [28], [29]).

Because information is encoded in a distributed manner across ensembles of neurons interacting by way of synapses, the meaning of a representation is in part derived from the meanings of other representations that excite similar constellations of neurons; that is why memory is sometimes referred to as associative. Content addressability ensures that the brain naturally brings to mind items that are similar in some perhaps unexpected or indefinable but useful or appealing way to what is currently being experienced. Recall that if the regions in memory where two distributed representations are encoded overlap, then they share one or more microfeatures. They may have been encoded at different times, under different circumstances, and the correlation between them never explicitly noticed. But the fact that their distributions overlap means that some context could come along for which this overlap would be relevant, causing one to evoke the other. There are as many routes by which an association between two representations can be forged as there are microfeatures by which they overlap; i.e., there is room for typical as well as atypical connections. Therefore what gets evoked in a given situation is relevant, and that happens for free; no search is necessary at all because memory is content-addressable. The like attracts like principle is embedded in our neural architecture.

Moreover, because memory is distributed and subject to crosstalk, if a situation does come along that is relevant to multiple representations, they merge together, a phenomenon that has been termed reconstructive interference [30]. The multiple items may be so similar to each other that you never detect that the recollection is actually a blend of many items; in this case the distributions of neurons activated overlap substantially. Alternatively, they may differ in mundane ways, as in everyday mind-wandering. They may be superficially different but related in a way you never noticed before, in which case the distributions of neurons activated overlap with respect to only a few features that happen to be relevant or important in the present context. Finally, the present experience may infuse recall of a previous experience that is relevant or important with respect to only a few key features.
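A minimal autoassociative sketch in the Hopfield style [24] illustrates both the pattern completion and the potential for crosstalk described above. The network size, number of stored patterns, and noise level are arbitrary toy values; this is a sketch of the general technique, not of any model proposed in this paper.

```python
import numpy as np

# Minimal Hopfield-style autoassociator: storing patterns as a sum of outer
# products lets a noisy cue settle back onto a stored pattern -- and
# overlapping stored patterns can interfere with one another (crosstalk).
rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(3, 64))          # three stored items
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)                              # no self-connections

def recall(cue, steps=10):
    s = cue.copy().astype(float)
    for _ in range(steps):                            # synchronous updates
        s = np.sign(W @ s)
        s[s == 0] = 1
    return s

noisy = patterns[0].copy()
noisy[:12] *= -1                                      # corrupt 12 of 64 bits
print(np.array_equal(recall(noisy), patterns[0]))     # usually True at low load
```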
We now turn to some of the models that have been proposed to describe this merging of the subsymbolic and the symbolic levels of human memory. We shall find that a number of them share a key set of features.
III. MEMORY MODELS INSPIRED BY THE MULTI-LEVELED ARCHITECTURE OF HUMAN MEMORY
Gärdenfors [9] has proposed a three level model of cognition in which the representation of information varies greatly at each level. Within the lowest level, information is pre-conceptual or subsymbolic, and is carried by a connectionist representation.

At the uppermost level information is represented in terms of higher order symbolic structures such as sentences. Grammars specify the parts of a sentence, and the manner in which they fit together. It is at this upper symbolic level of cognition where a significant portion of the computational literature resides. Indeed, the very storage of information in a standard computer architecture could be understood as belonging to this level.

While the need for at least these two levels seems intuitively plausible, the gap between the upper, logical level and the lowest connectionist level is difficult to bridge. How are we to connect the symbolic approaches with the structural? There is a possibility for some approach that shares logical and structural characteristics between the symbolic and the structural levels of cognition, and this is precisely where Gärdenfors' intermediate, conceptual level, or conceptual space, is introduced. Rather than relying upon a connectionist structure, an intermediate geometric representation is used, which provides an expressive theoretical framework capable of linking the 'hardware' of a 'neuronal' level with the more commonly described, and theoretically understood, logical level.
IV. ENCODING INFORMATION IN A CONCEPTUAL STRUCTURE
Within a conceptual space, knowledge has a dimensional structure. For example, the property COLOR can be represented in terms of three dimensions: hue, chromaticity, and brightness, which can be mapped into a convex region in a geometric space. Thus, the property RED is a convex region within the tri-dimensional space made up of hue, chromaticity and brightness, and the property BLUE would occupy a different region of this same space. A domain is a set of integral dimensions in the sense that values in particular dimensions can determine (or affect) the values possible in others. For example, the three dimensions defining the color space are integral since the brightness of a color will affect both its saturation (chromaticity) and hue. Gärdenfors extends the notion of properties into concepts, which are likewise based on domains. For example, the concept APPLE may have domains taste, shape, color, etc. Context is modeled as a weighting function on the domains; for example, when eating an apple, the taste domain will be prominent, but when playing with it, the shape domain (i.e. its roundness) will be heavily weighted.

Observe the distinction between representations at the symbolic and conceptual levels. At the symbolic level, the concept APPLE can be represented as the atomic proposition apple(x). However, within a conceptual space (conceptual level), it has a representation involving multiple inter-related dimensions and domains. Colloquially speaking, the token "apple" (which might be spoken, written etc.) is the tip of an iceberg with a rich underlying representation at the conceptual level. Gärdenfors points out that the symbolic and conceptual representations of information are not in conflict with each other, but are to be seen as "different perspectives on how information is described".

However, an implementation problem arises in that both the representation and the generation of a conceptual space from its underlying content have generally been discussed only for simple examples such as those above. It is not clear how more complex examples could be implemented. A more comprehensive and systematic approach to the representation of conceptual spaces is required.

Vector space based models (VSBM) provide a viable first avenue here. These can be traced back to the seminal paper of Salton et al. [31], who were searching for an appropriate mathematical space in which to represent documents for the purposes of Information Retrieval. Starting from a few basic desiderata, they settled upon a vector in a high dimensional vector space as an appropriate representation of a document. Within this framework, a query is treated like a small (pseudo) document that is also converted to vector form. The documents in the corpus are then ranked according to their distance to the query; closer documents are considered more relevant than ones that are further away. One of the main drawbacks of this system was that it had trouble returning documents that would have been highly relevant if one of the words in the query was replaced by a synonym. The next advance came from representing concepts latently in a so-called semantic space, where they are not formally represented or labeled. Semantic spaces are instances of vector spaces, and represent words in a basis created from other words, concepts, documents, or topics. They are generally built from the observation of co-occurrences in large text corpora. In word spaces such as the Hyperspace Analogue to Language (HAL) [32], the basis consists of every word in the vocabulary.
Thus, the vector for a given word w is calculated by summing the number of occurrences of word w(i) in a given context window around each occurrence of w, and writing that number at position i in the vector that represents w. This number can be adjusted using the distance (defined in terms of the number of words), or mutual information measures such as Point-Wise Mutual Information, which allows for a weighting of the importance of the word at that position. It is also possible to take word order into account [33], [34]. Later models derived a more fundamental semantic value through a reduction of the initial word space using mathematical tools such as Singular Value Decomposition [35], Non-negative Matrix Factorization [36], or random projection [37], all of which generate a new smaller basis which can, under certain conditions, be naturally related to certain topics, objects and concepts [36].

Semantic space models, however, do not make provision for integral dimensions (a notion related to that of 'core properties' of a concept). This leaves them too situation dependent, and relevant primarily for the text collection from which they were constructed. For the purposes of next generation information storage, we will require a more objective information storage mechanism that can function satisfactorily at the conceptual level. Thus, while learning from the semantic space approaches, this paper will propose that we extend them from a text based, corpus dependent information representation to a concept and property inspired approach. However, the vector based analytical contributions of the semantic space approaches will play a key inspirational role as we begin this extension.
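As a concrete illustration, the following sketch builds a toy HAL-style word space under simplifying assumptions of our own: a symmetric context window and a linear distance weighting (HAL itself distinguishes left and right contexts), over an invented eight-word corpus.

```python
from collections import defaultdict

# Toy HAL-style word space (a sketch under simplifying assumptions:
# symmetric window, weight = window - distance + 1, invented corpus).
corpus = "red apple sweet taste red wine dry taste".split()
window = 2
vecs = defaultdict(lambda: defaultdict(int))

for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            vecs[w][corpus[j]] += window - abs(i - j) + 1  # nearer words weigh more

# Each word is now a vector over the whole vocabulary (its basis):
print(dict(vecs["taste"]))   # e.g. {'apple': 1, 'sweet': 2, 'red': 2, ...}
```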
V. INFORMATION RETRIEVAL IN A MATRIX MODEL OF MEMORY

We now turn to a promising treatment of retrieval that is capable of providing a map between the neural and conceptual levels of information storage. The matrix model of memory [10] is a well known cognitive model. It stores and encodes memories as patterns of interconnections between the elements that define items in memory (i.e. the neurons for a subsymbolic structure). All memories are superimposed (summated) in this representation so that, without appropriate cuing, their individual identities are lost. Thus, the model provides a natural link between the lower and mid levels of information that Gärdenfors proposes. For example, when a set of interconnected neurons fires, this can be represented in the matrix model as a set of entries in a matrix, with the entries in the matrix corresponding to the probability that a particular pairing of nodes will concurrently fire (although this is not a necessary interpretation of the model [10]).

Humphreys et al. [10] take the position that there are two fundamental but pervasive memory access operations: matching and retrieval. Matching involves the comparison of test cue(s) with the information stored in memory, and gives the strength of the match as output. In contrast, retrieval involves the recovery of an associate of a cue (i.e. the return of actual information), and so is the operation in which we are currently interested.

The matrix model takes an item A_i, occurring in a context X, to retrieve a list associate B_i. This assumes that a three-way association (between the context, the cue and the desired item) must have been stored. This is represented mathematically as the three-dimensional array x a'_i b''_i, where x is a column vector, a'_i is a row vector, x a'_i is an n × n matrix, and b''_i is another vector in a dimension orthogonal to x and a'_i. (Primes are used to indicate this set of orthogonality relationships.) The matrix x a'_i represents the association between the context x and the cue a'_i. If a list of items A_1 B_1, A_2 B_2, ..., A_k B_k is learned in a context X, then Humphreys et al. define the memory for the list as a simple sum over all the three-dimensional arrays that were formed:

    E = \sum_{i=1}^{k} x a'_i b''_i .                                   (1)

This list memory E is added onto any pre-existing memories in a process that we leave to the original article [10].

Retrieval from this list memory is defined by Humphreys et al. [10] to work as follows. First, a test cue x a'_j is applied to the list memory:

    (x a'_j) \cdot E = (x a'_j) \cdot \left( \sum_{i=1}^{k} x a'_i b''_i \right)                  (2)
                     = \sum_{i=1}^{k} [(x a'_j) \cdot (x a'_i)] b''_i                             (3)
                     = \sum_{i=1}^{k} [(x \cdot x)(a'_j \cdot a'_i)] b''_i                        (4)
                     = [(x \cdot x)(a'_j \cdot a'_j)] b''_j + \sum_{i \neq j} [(x \cdot x)(a'_j \cdot a'_i)] b''_i .   (5)

We can learn a little about this model through a consideration of the two terms in (5). The first term represents the desired vector b''_j, weighted by a scalar term that results from taking the dot products of two vectors with themselves. The second term is effectively an error term; if the similarity between the cue a'_j and the other cues that were used to store the memory (the a'_i) is too great, then this error term will become large and the chances of the correct item being recalled will decrease. In short, the other stored memories (b''_i, i ≠ j) will interfere with the desired term.
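Equations (1)-(5) can be made concrete with a small NumPy sketch in which the three-way associations are stored as a rank-3 array and retrieval is a tensor contraction. The dimensions and random vectors are arbitrary toy values of our own choosing.

```python
import numpy as np

# Sketch of the matrix-model storage and retrieval of equations (1)-(5),
# using np.einsum for the tensor products and contractions (toy dimensions).
n = 8
rng = np.random.default_rng(1)
x = rng.standard_normal(n); x /= np.linalg.norm(x)   # context vector
A = rng.standard_normal((3, n))                      # cues a_i
B = rng.standard_normal((3, n))                      # associates b_i

# E = sum_i x (x) a_i (x) b_i  -- a rank-3 array, as in eq. (1)
E = np.einsum('h,ki,kj->hij', x, A, B)

def retrieve(context, cue):
    # (x a'_j) . E contracts the first two modes, leaving a b-vector (eqs. 2-5)
    return np.einsum('h,i,hij->j', context, cue, E)

out = retrieve(x, A[0])
# Correlation with the target is high when the cues a_i are roughly orthogonal;
# overlapping cues inflate the error term of eq. (5) and degrade recall.
print(np.corrcoef(out, B[0])[0, 1])
```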
It is also worth noting that the explicit representation of the context vectors using a dot product (i.e. x · x) in (5) suggests that the authors were open to the idea of a different context being present during the recall process, even if this was not explicitly discussed [10]. We shall return to this point in section VII; however, a brief foreshadowing of that argument runs as follows. We consider the use of context in this model to be unsatisfactory. Firstly, while the role of context is fundamental to this model, it must be explicitly and fully recorded at the time of storage. A slightly different context, or even a more detailed specification of the same context, could result in the retrieval of a very different piece of information, even though a very similar cue was utilized. The static treatment of context that is provided by this model therefore leaves us with what is an interesting retrieval paradigm, which is however perhaps unnecessarily limited. We are left wondering if there is scope for a more adaptive treatment of context, one provided by the geometric models of conceptual space that were introduced in section IV. In the remainder of this article we shall endeavor to connect these two interesting approaches into an integrated cognitive memory model which could be used to form the basis of a future physical implementation of computational memory. This approach takes its inspiration from a set of models that consider information retrieval in context, utilizing the powerful formalism of quantum theory [1]–[4], [38], [39].
VI. INCORPORATING CONTEXT INTO INFORMATION ENCODING AND RETRIEVAL

The seminal book by van Rijsbergen [1] provides a novel approach to the modeling of semantic spaces, inspired by Quantum Theory (QT). This approach models a word w as a vector

    |w\rangle = (w_1, \ldots, w_n)^T                                    (6)

where |w⟩ is termed a ket, in contrast to the row vector obtained by taking the transpose: ⟨w| = |w⟩^T = (w_1, ..., w_n). In this case, we shall take the subcomponents w_1, ..., w_n to be the weights allocated to each of the available senses that the word might take in an n-dimensional vector space.

We can quickly see the connection to both vector space based approaches and the Matrix Model of Memory. In both cases a vector was obtained (although in each case this was via a different process) and formed the basis of further analysis. However, the formalism of quantum theory provides an extra level of structure that implicitly incorporates a more adaptive notion of context into information recall.

This is done by seriously considering what it means to define a basis for a conceptual space. Thus, the vector of (6) must be considered in its context; it is a representation of information within a high dimensional vector space, with the vector entries determining the extent of the vector in each of the relevant dimensions.

This geometric model provides predictions about the likely recall of an item from memory within a given context. This is achieved via an application of Pythagoras' theorem. Thus, simplifying equation (6) down to a vector occurring in a two dimensional space, we might find that it could be drawn as shown in figure 1, where

    |w\rangle = a_1 (0, 1)^T + a_2 (1, 0)^T                             (7)
              = a_1 |p_1\rangle + a_2 |p_2\rangle, \quad |a_1|^2 + |a_2|^2 = 1 .   (8)

Here, {|p_1⟩ ≡ (0, 1)^T, |p_2⟩ ≡ (1, 0)^T} define an orthonormal basis, and so the inner products of these basis vectors return 0 or 1: ⟨p_1|p_1⟩ = ⟨p_2|p_2⟩ = 1 and ⟨p_1|p_2⟩ = ⟨p_2|p_1⟩ = 0.

[Fig. 1. A concept w, for example red, is represented via some contextual probe c, which takes the form of a choice of basis. This low dimensional representation shows a case of some object being classified by a probe as "red" (|0⟩) or "not red" (|1⟩), with the probability of a "red" judgment being given by |a_1|^2. In a higher number of dimensions the property of redness will be enclosed by a convex region, and the probabilities lie in a range of values specified by the extent of that region.]

The state (6) can be re-written using an extension of this formalism, giving

    |w\rangle = w_1 (1, 0, \ldots, 0)^T + w_2 (0, 1, \ldots, 0)^T + \cdots + w_n (0, 0, \ldots, 1)^T   (9)

which allows us to capture high dimensional vector representations of information. Here, the w_j's, or weights, represent the extent to which a piece of information falls into each of the dimensions of the vector space, and thus how much it overlaps with the individual concepts represented by each axis in the basis. This means that the convex region representing a property in a conceptual space can be mapped out by a collection of vectors covering that region, with each of the weights mapping how much a given property is represented by that dimension. A piece of stored information (e.g. a concept w) is thus represented in this framework as a state, or a vector in a high dimensional space. For a low dimensional example, consider the concept of "redness" that might be stored about two different objects, such as an apple and some wine. In a two dimensional, or q-bit, representation, each object will be classified as either "red" or "not red", but this classification will depend upon the context.
Figure 1 depicts a possible state which one of these objects might have, within a particular concept space where |0⟩ represents "red" and |1⟩ "not red". Within this specific context, we might find that an apple is more likely to be returned as a response to a query that asks for a "red object" than is red wine, although this might change were the information to be sought in a different context. We shall return to this point shortly, showing how this formalism can very naturally capture this behavior.

We propose that once information is stored in this complex multidimensional space, it can be recovered through use of a probe which enacts a quantum measurement of the state (7). This is defined with respect to a projection operator V, where, for the two dimensional case outlined above,

    V = |p_1\rangle\langle p_1| + |p_2\rangle\langle p_2| = V_1 + V_2 .   (10)

According to the quantum formalism, the probability of a probe represented by the p basis returning the desired value is given by

    P(|p_1\rangle) = \langle w | V_1 | w \rangle                          (11)
                   = \langle w | p_1 \rangle \langle p_1 | w \rangle      (12)
                   = (a_1^* \langle p_1|p_1\rangle + a_2^* \langle p_2|p_1\rangle) \times (a_1 \langle p_1|p_1\rangle + a_2 \langle p_1|p_2\rangle)   (13)
                   = |a_1|^2 .                                            (14)

However, in the context represented by the shifted basis in figure 2, the probe returns the desired information with a probability given by P(|q_1⟩) = |b_1|^2. Thus, a search for a "red object" in the context represented by p may yield a very different result to one that searches in the context q.

[Fig. 2. Changing the context of a probe can significantly affect the chances of recall. Once this effect is incorporated into a Matrix-like memory model, a structural information storage system becomes more viable.]

The assumption in (7) that the squared coefficients of the basis vectors sum to 1 allows for the treatment of these values as probabilities, since 0 ≤ P ≤ 1. This approach makes use of a geometrical notion of probability [40], which contrasts with standard probability theory, where probabilistic outcomes arise from our lack of knowledge as to what has actually occurred. Quantum probabilities are profoundly different, arising from a genuine state of uncertainty; the context in which the information is to be represented must be defined before the recall can start to make sense.
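The effect of a shifted basis on recall probability is easy to compute directly. The following sketch evaluates P as a squared projection, first in the p basis of figure 1 and then in a q basis rotated by an arbitrary angle; the amplitudes and rotation angle are invented for illustration.

```python
import numpy as np

# Probability of recall as a squared projection (eqs. 11-14), and how a
# rotated basis -- a different context -- changes it. A sketch only.
a1, a2 = 0.8, 0.6                        # amplitudes with |a1|^2 + |a2|^2 = 1
w = np.array([a1, a2])                   # |w> = a1|0> + a2|1>

p0 = np.array([1.0, 0.0])                # context p: the "red" axis
print("P(red | p) =", (p0 @ w) ** 2)     # |a1|^2 = 0.64

theta = np.pi / 6                        # context q: basis rotated 30 degrees
q0 = np.array([np.cos(theta), np.sin(theta)])
print("P(red | q) =", (q0 @ w) ** 2)     # approx. 0.99: a different probability
```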
We shall now show how this more sophisticated treatment of context can be utilized in an extension of the Matrix Memory Model which, while retaining the representation of items and cues as vectors, embeds them within a context that is spatial rather than vectorial.

VII. REMEMBERING AS A PROCESS OF INFORMATION RETRIEVAL

In line with the proposal by Wiles et al. [41], we take the position that the recall of information from a memory structure can be well represented by a contextual probe to an underlying network structure. The construction of such a probe has been a difficult problem for neural modelers, as it is difficult to correlate the activation of neural connections with a logical, or even conceptual, structure. However, with the three tier model advocated by Gärdenfors we can begin to see how a probe that has both a logical (symbolic) structure and a connection to the lower (subsymbolic) level neural model can be created. This section will sketch out the key details of this construction.

We start with a reference to the result, shown by van Rijsbergen, that projection operators such as the one in (10) can be used to define a conditional logic [1], meaning that the link between the quantum conceptual space that we discussed in section VI and higher order logic has already been found. This leaves the connection between the subsymbolic neural level and the conceptual level to be made. Returning to the consideration of the Matrix Model that was started in section V, we recall its use of a somewhat unsatisfactory explicit context. The representation of context in this model as a vector (i.e. x) means that it acquires an ontological status equivalent to that of the items that are used as cues, or stored to be retrieved by those cues; however, we believe that this identification is incorrect. Rather than behaving as a thing, or absolute entity, context appears to be more of a relationship between the thing currently under consideration (i.e. the memory for this scenario) and a perspective from which it will be viewed. This is a very new approach to the treatment of context in computational representations, which most commonly take context to be a thing, or a parameter [42], [43], with an ontological status similar to that of the very system which is being considered within that context. This is unlikely to be a satisfactory approach, but the lack of alternative formal models has hindered the adoption of a more sophisticated understanding. However, the quantum inspired model presented above makes use of a very different conceptualization, which we shall refer to here as an implicit context: one that frames the system under consideration rather than being of a similar form to it. In what follows, we shall make use of this implicit notion of context in an extension to the Matrix Model of Memory which treats context as a space rather than a vector.

This will be achieved by utilizing projection operators rather than vectors to represent the context in which storage and recall take place. Thus, in place of the context vector x in (5), we propose to utilize a projection operator that arises in the same space as the memory itself:

    V_x = \sum_{h=1}^{n} |x_h\rangle\langle x_h| .                      (15)

This equation takes the vector notion of context utilized in equation (5) and translates it into a set of projection operators defined using basis vectors, each of which could have been a context vector in the Matrix Model. Returning to equation (5), we rewrite it with this extended understanding of the context of a memory:

    V_y a_j E = V_y a_j \sum_{i=1}^{k} V_x a_i b_i                      (16)

where V_y is a second cueing context, defined with respect to the vector y, which could be specified with a different set of basis vectors to x.
Expanding the projection operators in this equation starts to give us some indication of how this model can be expected to behave:

    V_y a_j E = \sum_{h=1}^{n} \sum_{i=1}^{k} |y_h\rangle\langle y_h|a_j\rangle |x_i\rangle\langle x_i|a_i\rangle |b_i\rangle   (17)
              = \sum_{h=1}^{n} \sum_{i=1}^{k} u_{hj} v_i |y_h\rangle |x_i\rangle |b_i\rangle                                    (18)
              = \sum_{h,i} u_{hj} v_i y_h x_i b_i ,                                                                             (19)

where u_{hj} = \langle y_h|a_j\rangle and v_i = \langle x_i|a_i\rangle are scalars, obtained by taking the associated dot products of the corresponding vectors. These scalars weight the contribution of the individual cue, context and stored item vectors. If the context of recall is the same as the context of recording (i.e. y = x) then we can say a little more about the recall process using a standard property of projection operators, V_x V_x = V_x [40]:

    V_x a_j E = V_x a_j \sum_{i=1}^{k} V_x a_i b_i                      (20)
              = V_x a_j \sum_{i=1}^{k} a_i b_i                          (21)
              = \sum_{i=1}^{k} v_i x_i a_i b_i .                        (22)

Finally, breaking (22) into the two components utilized in (5), we find that

    V_x a_j E = v_j x_j a_j b_j + \sum_{i \neq j} v_i x_i a_i b_i       (23)

which has the same "item to be retrieved + error" terms as (5), but in a more complex space that contains all of the cues, contexts and items stored in the memory. We finish by noting that this equation suggests that a context which maximizes v_j will increase the probability of a correct retrieval result, but many different contexts could have been used. Indeed, we need merely shift the basis in equation (15) in order to obtain a very different set of representations for the items in memory, and these would have a very different set of probabilities of recall. Thus, with a shift to a geometric space, we see a way in which information might be stored in, and retrieved from, a system based upon traditional storage mechanisms in a far richer manner than is currently the case, all through the use of a sophisticated notion of context. A toy illustration of this contextual storage and recall is sketched below.
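The sketch below realizes one possible reading of equations (15)-(23): contexts are projectors onto subspaces, storage passes cues through the storage context, and recall passes them through a (possibly different) cueing context. This is our interpretation under stated assumptions, not a literal transcription of the equations; the dimensions and the choice of subspaces are arbitrary.

```python
import numpy as np

# One way to realize context-as-projector (a sketch inspired by eqs. 15-23,
# not the authors' exact formalism): a context is a projector onto a subspace,
# and both storage and recall pass their cues through a context.
n = 8
rng = np.random.default_rng(2)

def projector(basis):
    # V = sum_h |x_h><x_h| for orthonormal rows x_h, as in eq. (15)
    return basis.T @ basis

X = np.linalg.qr(rng.standard_normal((n, n)))[0][:, :4].T   # storage context basis
Y = X[:3]                                                   # narrower recall context
Vx, Vy = projector(X), projector(Y)
assert np.allclose(Vx @ Vx, Vx)                             # V_x V_x = V_x [40]

A = rng.standard_normal((3, n))                             # cues a_i
B = rng.standard_normal((3, n))                             # items b_i
E = sum(np.outer(Vx @ a, b) for a, b in zip(A, B))          # contextualized store

def recall(V, cue):
    # signal term plus interference terms, weighted by the context overlap
    return E.T @ (V @ cue)

print(np.corrcoef(recall(Vx, A[0]), B[0])[0, 1])  # same context: strong recall
print(np.corrcoef(recall(Vy, A[0]), B[0])[0, 1])  # shifted context: usually weaker
```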
VIII. CONCLUSION

A strength of this approach lies in the density of information that it is likely to be able to store. The choice of a structural approach to information storage, with a potentially infinite set of contexts, allows memory to move from a density driven approach, where the quantity of information stored is inversely proportional to the size of the components used to store it, towards one where storage capacity is limited only by how many sensible contexts can be used to retrieve the required information. Even with a very small storage space, a wide range of representations can still be obtained from a conceptual space that takes the underlying subsymbolic structure and complexifies it according to the context in which it is accessed. Such an "actualization of potential" [30] provides both creative ability and extra storage capability. Indeed, with this approach, we can start to see how the lowest, or neural, level of cognition can be made redundant despite its strong dependence upon a specific structure.

While we have emphasized the background of this quantum inspired model in the field of Information Retrieval, a related line of work [5]–[8], [45], [46] makes use of a quantum approach to model concepts and their combinations. Thus, a growing body of literature points to the utility of the quantum formalism in modeling information in context, from both the cognitive and the computational sides of information storage and retrieval. This approach has also been utilized in a preliminary study illustrating how context might be incorporated into vector spaces described with reference to a point of view [47], a solution which, as that paper shows, might circumvent the apparent incompatibility between metricity and the similarity judgments that humans actually make [48].

While the proposed approach is in its very early days, we feel that its incorporation of a wide range of both cognitive and computational insights makes it a highly interesting avenue to pursue as we search for new paradigms of computational memory and information storage. Future work will investigate the manner in which different contexts might interfere with specified cues to produce different probabilities of recall, and hence different items from memory. It will also seek to further clarify the role of the projection operators in specifying a context space, and to extend the formalism proposed at the end of the previous section. Finally, we intend to take seriously the notion of creativity as it arises in human memory, and to investigate the manner in which a similar notion might arise in a system such as this. Such a result would bring us one step closer towards a system capable of exhibiting a true form of computational intelligence.
ACKNOWLEDGMENTS
This project was supported in part by the Australian Research Council Discovery grant DP1094974, the Social Sciences and Humanities Research Council of Canada, and the Fund for Scientific Research of Flanders, Belgium. Welcome support was also provided by the Marie Curie International Research Staff Exchange Scheme: Project 247590, "QONTEXT: Quantum Contextual Information Access and Retrieval".
REFERENCES

[1] C. J. van Rijsbergen, The Geometry of Information Retrieval. Cambridge University Press, 2004.
[2] D. Widdows, Geometry and Meaning. CSLI Publications, 2004.
[3] D. Song and P. Bruza, "Towards context sensitive information inference," Journal of the American Society for Information Science and Technology, vol. 54, no. 4, pp. 321–334, 2003.
[4] P. Bruza, D. Widdows, and J. A. Woods, "Quantum logic of down below," in Handbook of Quantum Logic and Quantum Structures, K. Engesser, D. Gabbay, and D. Lehmann, Eds. Elsevier, 2007, vol. 2.
[5] D. Aerts and L. Gabora, "A theory of concepts and their combinations I: The structure of the sets of contexts and properties," Kybernetes, vol. 34, pp. 151–175, 2005.
[6] D. Aerts and L. Gabora, "A theory of concepts and their combinations II: A Hilbert space representation," Kybernetes, vol. 34, pp. 192–221, 2005.
[7] K. Kitto, B. Ramm, P. D. Bruza, and L. Sitbon, "Testing for the non-separability of bi-ambiguous words," in Proceedings of the AAAI Fall Symposium on Quantum Informatics for Cognitive, Social, and Semantic Processes (QI 2010). AAAI Press, 2010.
[8] K. Kitto, B. Ramm, L. Sitbon, and P. D. Bruza, "Quantum theory beyond the physical: Information in context," Axiomathes, vol. 21, no. 2, pp. 331–345, 2011.
[9] P. Gärdenfors, Conceptual Spaces: The Geometry of Thought. MIT Press, 2000.
[10] M. Humphreys, J. Bain, and R. Pike, "Different ways to cue a coherent memory system: A theory for episodic, semantic, and procedural tasks," Psychological Review, vol. 96, no. 2, pp. 208–233, 1989.
[11] L. Gabora, "Revenge of the 'neurds': Characterizing creative thought in terms of the structure and dynamics of human memory," Creativity Research Journal, vol. 22, no. 1, pp. 1–13, 2010.
[12] P. Smolensky, "On the proper treatment of connectionism," Behavioral and Brain Sciences, vol. 11, pp. 1–43, 1988.
[13] P. S. Churchland and T. Sejnowski, The Computational Brain. MIT Press, 1992.
[14] R. Miikkulainen, "Natural language processing with subsymbolic neural networks," in Neural Network Perspectives on Cognition and Adaptive Robotics, A. Browne, Ed. Institute of Physics Press, 1997.
[15] A. Roy, "Discovery of concept cells in the human brain — could it change our science?" Natural Intelligence, vol. 1, no. 1, pp. 23–29, 2011.
[16] D. Hubel and T. N. Wiesel, "Receptive fields and functional architecture in two non-striate visual areas (18 and 19) of the cat," Journal of Neurophysiology, vol. 28, pp. 229–289, 1965.
[17] D. Hebb, The Organization of Behavior. Wiley, 1949.
[18] G. M. Edelman, Neural Darwinism: The Theory of Neuronal Group Selection. Oxford University Press, 1989.
[19] D. A. Marr, "A theory of the cerebellar cortex," Journal of Physiology, vol. 202, pp. 437–470, 1969.
[20] L. Gabora, "Cognitive mechanisms underlying the origin and evolution of culture," Ph.D. dissertation, Center Leo Apostel for Interdisciplinary Studies, Vrije Universiteit Brussel, 2001.
[21] L. Gabora and A. Ranjan, "How insight emerges in distributed, content-addressable memory," in The Neuroscience of Creativity, A. Bristol, O. Vartanian, and J. Kaufman, Eds. Oxford University Press, in press.
[22] J. L. McClelland and D. E. Rumelhart, "A distributed model of memory," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Eds. MIT Press, 1986, vol. II.
[23] J. Feldman and D. Ballard, "Connectionist models and their properties," Cognitive Science, vol. 6, pp. 204–254, 1982.
[24] J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, pp. 2554–2558, 1982.
[25] J. Hopfield, D. I. Feinstein, and R. D. Palmer, "'Unlearning' has a stabilizing effect in collective memories," Nature, vol. 304, pp. 159–160, 1983.
[26] G. M. Edelman, Bright Air, Brilliant Fire: On the Matter of the Mind. Basic Books, 1993.
[27] H. M. Paterson, R. I. Kemp, and J. P. Forgas, "Co-witnesses, confederates, and conformity: The effects of discussion and delay on eyewitness memory," Psychiatry, Psychology and Law, vol. 16, no. 1, pp. S112–S124, 2009.
[28] E. F. Loftus, Memory: Surprising New Insights into How We Remember and Why We Forget. Addison-Wesley, 1980.
[29] D. L. Schacter, The Seven Sins of Memory: How the Mind Forgets and Remembers. Houghton Mifflin, 2001.
[30] L. Gabora and A. Saab, "Creative interference and states of potentiality in analogy problem solving," in Proceedings of the Annual Meeting of the Cognitive Science Society. Boston, MA: Cognitive Science Society, 2011, pp. 3506–3511.
[31] G. Salton, A. Wong, and C. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.
[32] H. Schütze, "Automatic word sense discrimination," Computational Linguistics, vol. 24, no. 1, pp. 97–123, 1998.
[33] M. N. Jones and D. J. K. Mewhort, "Representing word meaning and order information in a composite holographic lexicon," Psychological Review, vol. 114, no. 1, pp. 1–37, 2007.
[34] M. Sahlgren, A. Holst, and P. Kanerva, "Permutations as a means to encode order in word space," in Proceedings of the 30th Annual Meeting of the Cognitive Science Society, 2008, pp. 11–18.
[35] T. Landauer and S. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[36] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[37] M. Sahlgren, "An introduction to random indexing," in Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, 2005.
[38] P. Bruza and R. Cole, "Quantum logic of semantic space: An exploratory investigation of context effects in practical reasoning," in We Will Show Them: Essays in Honour of Dov Gabbay, S. Artemov, H. Barringer, A. d'Avila Garcez, L. Lamb, and J. Woods, Eds. College Publications, 2005, vol. 1, pp. 339–361.
[39] M. Melucci, "A basis for information retrieval in context," ACM Transactions on Information Systems, vol. 26, pp. 14:1–14:41, June 2008.
[40] C. J. Isham, Lectures on Quantum Theory. London: Imperial College Press, 1995.
[41] J. Wiles, G. Halford, J. Stewart, M. Humphreys, J. Bain, and W. Wilson, "Tensor models: A creative basis for memory and analogical mapping," in Artificial Intelligence and Creativity, T. Dartnall, Ed. Kluwer Academic Publishers, 1994, pp. 145–159.
[42] P. Brézillon, "Context in problem solving: A survey," Knowledge Engineering Review, vol. 14, pp. 47–80, May 1999.
[43] R. Guha and J. McCarthy, "Varieties of contexts," in Modeling and Using Context, ser. Lecture Notes in Computer Science, P. Blackburn, C. Ghidini, R. Turner, and F. Giunchiglia, Eds. Springer, 2003, vol. 2680, pp. 164–177.
[44] K. Kitto and P. Bruza, "Tests and models of non-compositional concepts," in Proceedings of Cognitive Science 2012, Japan, 2012, in press.
[45] P. Bruza, K. Kitto, B. Ramm, and L. Sitbon, "The non-decomposability of concept combinations," under review.
[46] D. Aerts, J. Broekaert, and L. Gabora, "A case for applying an abstracted quantum formalism to cognition," New Ideas in Psychology, vol. 29, no. 1, pp. 136–146, 2011.
[47] S. Aerts, K. Kitto, and L. Sitbon, "Similarity metrics within a point of view," in Quantum Interaction: 5th International Symposium, QI 2011, Aberdeen, UK, June 26-29, 2011, Revised Selected Papers, ser. LNCS, D. Song, M. Melucci et al., Eds., vol. 7052. Springer, 2011, pp. 13–24.
[48] A. Tversky and I. Gati, "Similarity, separability, and the triangle inequality," Psychological Review, vol. 89, pp. 123–154, 1982.