Computing as compression: the SP theory of intelligence
J Gerard Wolff

Abstract. This paper provides an overview of the SP theory of intelligence and its central idea that artificial intelligence, mainstream computing, and much of human perception and cognition, may be understood as information compression. The background and origins of the SP theory are described, together with the main elements of the theory, including the key concept of multiple alignment, borrowed from bioinformatics but with important differences. Associated with the SP theory is the idea that redundancy in information may be understood as repetition of patterns, that compression of information may be achieved via the matching and unification (merging) of patterns, and that computing and information compression are both fundamentally probabilistic. It appears that the SP system is Turing-equivalent in the sense that anything that may be computed with a Turing machine may, in principle, also be computed with an SP machine.

One of the main strengths of the SP theory and the multiple alignment concept is in modelling concepts and phenomena in artificial intelligence. Within that area, the SP theory provides a simple but versatile means of representing different kinds of knowledge; it can model both the parsing and production of natural language, with potential for the understanding and translation of natural languages; it has strengths in pattern recognition, with potential in computer vision; it can model several kinds of reasoning; and it has capabilities in planning, problem solving, and unsupervised learning.

The paper includes two examples showing how alternative parsings of an ambiguous sentence may be modelled as multiple alignments, and another example showing how the concept of multiple alignment may be applied in medical diagnosis.
Since about 1987, I have been developing the idea that artificial intelligence, mainstream computing, and much of human perception and cognition, may be understood as information compression. An early version of the idea, as applied to computing, is described in [17]. Since then, progressively more refined versions of the SP theory of intelligence have been described in several peer-reviewed articles and, in some detail, in a book [19].

The main aim in this paper is to provide an overview of the theory, with the main focus on computing, and to describe some of the associated thinking.
The SP theory has grown out of four main strands of work:

• A body of research, pioneered by Fred Attneave (eg, [1]), Horace Barlow (eg, [2, 3]), and others, showing that many aspects of the workings of brains and nervous systems may be understood as compression of information.

• My own research, developing models of language learning, where the importance of information compression became increasingly clear (see, for example, [16]).

• Research on principles of ‘minimum length encoding’, pioneered by Ray Solomonoff (eg, [11]), Chris Wallace (eg, [14]), Jorma Rissanen (eg, [9]), and others.

• Several observations that suggest that information compression has a key role in computing, mathematics, and logic ([19, Chapters 2 and 10]), some of which are outlined in Section 4.2, below.

(Author's affiliation: CognitionResearch.org, UK, email: [email protected]. Bibliographic details of relevant publications may be found via links from .)
The main elements of the SP theory are:

• The theory is conceived as an abstract system that, like a brain, may receive ‘New’ information via its senses and store some or all of it as ‘Old’ information.

• All New and Old information is expressed as arrays of atomic symbols (patterns) in one or two dimensions.

• The system is designed for the unsupervised learning of Old patterns by compression of New patterns.

• An important part of this process is, where possible, the economical encoding of New patterns in terms of Old patterns. This may be seen to achieve such things as pattern recognition, parsing or understanding of natural language, or other kinds of interpretation of incoming information in terms of stored knowledge, including several kinds of reasoning.

• Compression of information is achieved via the matching and unification (merging) of patterns, with an improved version of dynamic programming [19, Appendix A] providing flexibility in matching, and with key roles for the frequency of occurrence of patterns, and their sizes.

• The concept of multiple alignment, outlined in Section 3.1, is a powerful central idea, similar to the concept of multiple alignment in bioinformatics but with important differences.

• Owing to the intimate connection between information compression and concepts of probability (Section 3.4), it is relatively straightforward for the SP system to calculate probabilities for inferences made by the system, and probabilities for parsings, recognition of patterns, and so on.

• In developing the theory, I have tried to take advantage of what is known about the psychological and neurophysiological aspects of human perception and cognition, and to ensure that the theory is compatible with such knowledge. The way the SP concepts may be realised with neurons (SP-neural) is discussed in [19, Chapter 11].

Apart from direct evidence for the importance of information compression in the workings of brains and nervous systems, we would expect information compression to have been favoured by natural selection because it can facilitate economies in the storage of information, economies in the processing and transmission of information, corresponding economies in energy demands (the brain is 2% of total body weight but it demands 20% of our resting metabolic rate), and, perhaps most importantly, it provides the key to the inductive prediction of the future from the past (Section 3.4).

So far, the main emphasis in the development of the theory has been on one-dimensional patterns, as described in this paper. But there is clear potential to generalise the theory for patterns in two dimensions, with potential applications in, for example, computer vision.

These ideas are currently realised in the form of two computer models, SP62 and SP70, described in outline below. These models may be seen as first versions of the proposed SP machine, an expression of the SP theory and a means for it to be applied.
The multiple alignment concept in the SP theory has been adapted from a similar concept in bioinformatics, where it means a process of arranging, in rows or columns, two or more DNA sequences or amino-acid sequences so that matching symbols, as many as possible, are aligned orthogonally in columns or rows.

The main difference between the two concepts is that, in bioinformatics, all sequences have the same status, whereas in the SP theory, the system attempts to create a multiple alignment which enables one New pattern (sometimes more) to be encoded economically in terms of one or more Old patterns.

As an illustration of the concept, Figure 1 shows two multiple alignments which are, in effect, two alternative parsings of the ambiguous sentence ‘Fruit flies like a banana’. (This sentence is the second part of ‘Time flies like an arrow. Fruit flies like a banana.’, attributed to Groucho Marx.) These two multiple alignments are the best ones found by the SP61 computer model of the SP system (SP61 is a slightly earlier precursor of SP62, and very similar to it), with a set of Old patterns representing grammatical rules (including words and their grammatical categories) and a New pattern representing the sentence to be parsed. Here, ‘best’ means that these two multiple alignments achieve the greatest degree of compression of the New pattern via its encoding in terms of the Old patterns. More detail may be found in [19, Section 3.5].

Although this example does not illustrate the point, it is pertinent to mention that the process of forming multiple alignments is robust in the face of errors. Plausible multiple alignments may be formed even when the New pattern or the Old patterns, or both, contain errors of omission, commission, or substitution.

The SP theory is currently expressed in the SP70 computer model, and a subset of it, SP62, which lacks any ability in learning. At the heart of the SP models is a process for finding good full or partial matches between patterns [19, Appendix A], somewhat like the WinMerge utility for finding similarities and differences between files, or standard ‘dynamic programming’ methods for the alignment of sequences. The main difference between the SP process and others is that the former can deliver several alternative matches between patterns, while WinMerge and standard methods deliver one ‘best’ result.

Multiple alignments are built in stages, with pairwise matching and merging of patterns, with merged patterns from any stage being carried forward to later stages, and with a weeding out, at all stages, of low-scoring multiple alignments. This is broadly similar to some programs for the creation of multiple alignments in bioinformatics. In the SP70 model, there are additional processes of deriving Old patterns from multiple alignments, evaluating sets of newly-created Old patterns in terms of their effectiveness for the economical encoding of the New information, and weeding out low-scoring sets. Because of the way each model searches for a global optimum, it does not depend on the presence or absence of any particular feature or combination of features. Up to a point, plausible results may be obtained in the face of errors of omission, commission and substitution in the data. More detail may be found in [19, Sections 3.9, 3.10, and 9.2].
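The flexible matching at the heart of the SP models is related to the ‘dynamic programming’ methods mentioned above. As a much-simplified illustration (not the SP process itself, which can deliver several alternative alignments and handles bracketed patterns), the following sketch scores the similarity of two symbol sequences as the length of their longest common subsequence:

```c
#include <string.h>

/* Length of the longest common subsequence of a and b: a crude
   stand-in for the scoring step in pairwise pattern matching.
   The table-filling loop shows where the O(n * m) serial time
   complexity of this kind of matching comes from. */
static int lcs_length(const char *a, const char *b)
{
    int n = (int)strlen(a), m = (int)strlen(b);
    static int t[64][64];                     /* assumes n, m < 64 */
    for (int i = 0; i <= n; i++) t[i][0] = 0;
    for (int j = 0; j <= m; j++) t[0][j] = 0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            t[i][j] = (a[i - 1] == b[j - 1])
                    ? t[i - 1][j - 1] + 1
                    : (t[i - 1][j] > t[i][j - 1] ? t[i - 1][j]
                                                 : t[i][j - 1]);
    return t[n][m];
}
```

Unlike this sketch, which reduces the comparison to a single number, the SP process retains several alternative alignments at each stage and weeds out those with low compression scores.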
A central idea in the SP theory and the multiple alignment concept is that redundancy in information may be understood as repetition of patterns, and that information compression may be achieved by finding patterns that match each other and merging or ‘unifying’ patterns that are the same. For the sake of brevity, this kind of matching and unification of patterns will be referred to as ‘MUP’.

MUP is fairly easy to see in widely-used ‘zip’ programs for information compression, based on the LZW algorithm or related algorithms. But such processes are invisible or hard to see in compression techniques such as arithmetic coding or wavelet compression. Although the point has not been argued in detail, I believe it is likely that MUP is fundamental, not only in zip programs, but also in compression techniques with a mathematical orientation:

• In [19, Chapter 10], I have argued that much of mathematics (and logic) may be understood in terms of information compression via multiple alignment, including MUP.

• If that argument is accepted then, since mathematics is prominent in compression techniques such as arithmetic coding and wavelet compression, we may infer that MUP is fundamental in compression techniques of that kind.
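The essence of MUP can be stated in a few lines of code. The sketch below is a deliberately minimal illustration of the idea, not of the SP models themselves: two stored patterns that match in full are merged into a single copy, and the compression achieved is the number of symbols that no longer need to be stored twice:

```c
#include <string.h>

/* Matching and unification in miniature: if two stored patterns match
   symbol-for-symbol, keep a single unified copy.  Returns the number
   of symbols saved by the merge (0 if the patterns do not match). */
static size_t unify(const char *p, const char *q, const char **unified)
{
    if (strcmp(p, q) == 0) {   /* a full match has been found ... */
        *unified = p;          /* ... so merge: one copy serves both */
        return strlen(q);      /* symbols no longer stored twice    */
    }
    *unified = NULL;           /* no match, so no unification       */
    return 0;
}
```

The SP models generalise this in two ways: they also find good partial matches between patterns, and they weigh the resulting saving against the frequency of occurrence of the unified pattern.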
In considering MUP, it is not hard to see that, for any body of information I, except very small examples, there is a huge number of alternative ways in which patterns may be matched against each other, and there will normally be many alternative ways in which patterns may be unified [19, Section 2.2.8.4]. As indicated earlier, the focus of interest is normally on matches between patterns that yield relatively high levels of compression. Since it is not normally possible to make an exhaustive search of the space of alternative matches, the SP computer models rely on the kinds of heuristic techniques that are familiar in other AI applications, reducing the size of the search space by pruning the search tree at appropriate points. Current models allow the user to apply various constraints on searching (such as the exclusion of partial matches between patterns), and to choose any size of search space from large (slow but thorough) through to small (quick and dirty).

[Figure 1 about here. Caption: The two best multiple alignments found by a near-identical precursor of the SP62 computer model with Old patterns representing grammatical rules (rows 1 to 8 in (a) and (b)) and the ambiguous sentence ‘fruit flies like a banana’ as a ‘New’ pattern (row 0 in each multiple alignment). Reproduced from Figure 5.1 in [19], with permission.]

In its ideal form, with exhaustive search, and with a realistically large size for I, MUP is not tractable. But with the use of heuristic techniques, MUP becomes quite practical, with time complexity in a serial processing environment estimated to be O(n · m), where n is the number of atomic symbols in a given pattern and m is the number of atomic symbols in I [19, Appendix A.4]. In a parallel-processing environment, the time complexity has been estimated to be O(n) (ibid.).

It is widely recognised that there is a close connection between information compression and concepts of prediction and probability (see, for example, [6]). In terms of the SP theory, that close connection makes good sense:

• The amount of compression that can be achieved via MUP depends directly on the sizes of patterns that are unified and their frequencies. The patterns that yield relatively high levels of compression are also those that provide a good basis for inductive prediction. In this connection, it appears that the sizes of patterns are as important as their frequencies.

• Partial matches between patterns provide the basis for specific predictions: if we are going out and we see black clouds then, knowing the association between black clouds and rain, we may decide to take an umbrella.

All of this is fundamentally probabilistic:

• As mentioned above, the frequency of occurrence of patterns is a key variable in the search for unifications that yield high levels of compression.
• Because it is not normally possible to achieve ideal solutions, or even to know whether or not such a solution has been found (Section 3.3.1), there will be corresponding uncertainties.

• For every multiple alignment that is created by the system, and for any inferences that may be drawn, there is an associated probability. These probabilities can be calculated by the SP computer models.

There is more on this topic in Section 4.3.
In [19, Chapter 4], I have argued that the SP system is equivalent to a universal Turing machine [12], in the sense that anything that may be computed with a Turing machine may, in principle, also be computed with an SP machine. The ‘in principle’ qualification is necessary because the SP theory is still not fully mature and there are still some weaknesses in the SP computer models.

The gist of the argument is that the operation of a Post canonical system [8] may be understood in terms of the SP theory and, since it is accepted that the Post canonical system is equivalent to the Turing machine (as a computational system), the Turing machine may also be understood in terms of the SP theory.

The thread running through all three models of computing is the matching and unification of patterns. This is, of course, a prominent feature of the SP theory. Although it is not formally recognised in the Post canonical system or the Turing machine, it is relatively clear to see in the Post canonical system and it can also be seen to operate in the state transition tables of the Turing machine.

Is there anything to choose between these three models? Isn't the SP theory just another model of computing to go alongside earlier models such as the Turing and Post models, the ‘lambda calculus’ [10], ‘recursive functions’ [5], and the ‘normal algorithm’ [7]?

In answer to those questions, the main differences between the SP theory and earlier theories of computing are:

• It has a lot more to say about the nature of ‘intelligence’ than other theories of computing (see Section 5).

• Unlike earlier theories, it is founded on principles of information compression via the matching and unification of patterns, and it includes mechanisms for building multiple alignments and for heuristic search that are not present in any of the other models.
• Although the SP system is more complex than the Turing model of computing, it can mean substantial reductions in the overall complexity of computing systems, bearing in mind that software is as much part of a computing system as is the hardware ([19, Section 4.4.2], [22]). The reasoning is that, by providing MUP mechanisms in the core of the SP system, there is less need to provide those mechanisms in software and, in particular, there is less need to repeat those mechanisms again and again in different software applications.
Although it may seem counter-intuitive to suppose that information compression has any significant role in conventional computing as we know it today, there is evidence in support of that idea, as outlined in the following subsections.
The matching of patterns is widespread in conventional computing systems and in most cases there is at least an implicit unification of the patterns that match each other. Here are some examples:

• Accessing information in computer memory. The process of accessing an item of information in computer memory means finding a match between the address of the item as it is known within the CPU and the address of the item in computer memory. Although the process is normally described in terms of the operations of logic circuits, that should not obscure the fact that it is a process of finding a match between two copies of the relevant address.

• De-referencing of names. Names are widely used in conventional computing. Examples include the names of functions, procedures or sub-routines, names for objects and classes in object-oriented programs, names for tables, records, and fields in databases, names of files and directories, names of variables, arrays and other data structures, and labels for program statements (for use with the now-shunned ‘go to’ statements). The de-referencing of any such name, finding what it represents, means finding a match, with an implicit unification, between the name as a reference and the name on the structure that is to be retrieved.

• Information retrieval. The ‘query-by-example’ technique of retrieving information from a database means searching for a good match (full or partial) between the query and zero or more records in the database, with implicit unification where matches are found. In a similar way, searching for information on the internet means searching for good full or partial matches between a query pattern and zero or more web pages.
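The de-referencing of names described above can be made concrete with a small sketch (the table and the names in it are invented for illustration): finding the value bound to a name is literally a search for a matching copy of that name.

```c
#include <string.h>

/* De-referencing a name as pattern matching: the name used as a
   reference is matched against the name attached to each stored item. */
struct binding { const char *name; int value; };

static int deref(const struct binding *table, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)  /* match found: implicit */
            return table[i].value;             /* unification of names  */
    return -1;                                 /* no match found        */
}
```

A hash table or a hardware address decoder makes the same match faster, but the underlying operation is still a comparison between two copies of the name or address.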
The ‘chunking-with-codes’ technique for the compression of information means identifying a relatively large ‘chunk’ of information that occurs two or more times in a body of information, giving it a relatively short name or ‘code’, and then using the name instead of the chunk in all but one of the places where the chunk occurs.

Perhaps the most obvious example of this idea in conventional computing is the use of named functions, procedures or sub-routines in computer programs. The function or procedure may be seen as a chunk of information which is defined in one part of a given program and accessed via its name from other parts of the program. Unless the given function is used only once, this technique will normally mean useful savings in the sizes of programs. It will also facilitate the editing of computer programs and eliminate the risk of introducing inconsistencies between different instances of a given function. Similar things can be said about most of the other kinds of names mentioned above in connection with ‘de-referencing of names’.
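A back-of-envelope calculation shows when chunking-with-codes pays. The sketch below is a simplification (a real scheme must also encode chunk boundaries): it compares the cost of writing a chunk out k times with the cost of keeping one full copy labelled with a short code and using the code at the other k - 1 places, as described above.

```c
#include <stddef.h>

/* Saving, in symbols, from chunking-with-codes: a chunk of chunk_len
   symbols occurring k times is kept once, labelled with a code of
   code_len symbols, and the code replaces the chunk at the other
   k - 1 occurrences.  A negative or zero result means no saving. */
static long cwc_saving(size_t chunk_len, size_t code_len, size_t k)
{
    long before = (long)(chunk_len * k);           /* chunk written k times */
    long after  = (long)(chunk_len + code_len * k);/* one copy + k codes    */
    return before - after;
}
```

The formula makes the trade-off visible: large chunks with many occurrences and short codes give big savings, whereas a chunk barely longer than its code saves nothing, which is why only relatively large, frequently-occurring chunks are worth naming.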
In sequential information, the ‘run-length coding’ technique for the compression of information may be applied wherever something is repeated two or more times in a contiguous sequence. In that case, multiple instances may be reduced to one, with some kind of indication that it repeats, something like ‘a b c (10)’ (showing that the pattern ‘a b c’ repeats 10 times) or ‘a b c (*)’ (showing that the pattern repeats but without specifying the number of repetitions).

In computer programs, this kind of technique can be seen in iterations (eg, repeat ... until, while ... do, or for ... do) and also in recursive functions such as:

    long factorial(int x) {
        if (x == 1) return 1;
        return x * factorial(x - 1);
    }

The use of iteration or recursion can avoid a lot of space-wasting redundancy in computer programs.
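The run-length coding idea sketched above (‘a b c (10)’) can be illustrated, for runs of single symbols, in a few lines of C. This is a simplified sketch, not a production encoder; it assumes the output buffer is large enough for the encoded result:

```c
#include <stdio.h>

/* Run-length coding: collapse each run of a repeated symbol to one
   instance plus a repetition count, e.g. "aaabcc" -> "a3b1c2". */
static void rle(const char *in, char *out, size_t outsz)
{
    size_t o = 0;
    out[0] = '\0';                       /* handle the empty input */
    for (size_t i = 0; in[i] != '\0'; ) {
        size_t run = 1;
        while (in[i + run] == in[i])     /* measure the run length */
            run++;
        o += (size_t)snprintf(out + o, outsz - o, "%c%zu", in[i], run);
        i += run;
    }
}
```

As with chunking-with-codes, the technique only pays when runs are long: a run of one symbol ("b" -> "b1") is actually expanded, which is why practical encoders fall back to literal copying for short runs.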
The ‘schema-plus-correction’ technique for information compression may be applied where a pattern is repeated but with variations from one occurrence to another. For example, a six-course menu in a restaurant may have the general form ‘Appetiser, S, sorbet, M, P, coffee and mints’, with choices at the points marked ‘S’ (starter), ‘M’ (main course), and ‘P’ (pudding). Then a particular meal may be encoded economically as something like ‘Menu1 (3)(5)(1)’, where the digits determine the choices of starter, main course, and pudding.

In a computer program, any function or procedure that has parameters may be seen as a schema, where the parameters serve to determine choices within the schema, and where those choices are normally expressed in the form of conditional statements.

In object-oriented design (next section), a class may also be seen as a schema for particular objects, with the details of each object determined via parameters.
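The menu example can be expressed directly as code. In this sketch the schema is the fixed format string and the parameters s, m and p are the ‘corrections’; the particular dishes listed are invented for illustration:

```c
#include <stdio.h>

/* Schema-plus-correction in miniature: a fixed schema ("Menu1") with
   three choice points; a particular meal is encoded by three small
   integers selecting a starter, main course, and pudding.
   (The dish names are invented for illustration.) */
static void expand_menu(int s, int m, int p, char *out, size_t outsz)
{
    static const char *starters[] = { "soup", "salad", "pate", "melon" };
    static const char *mains[]    = { "beef", "trout", "risotto",
                                      "lamb", "curry", "pie" };
    static const char *puddings[] = { "trifle", "tart", "ices" };
    snprintf(out, outsz,
             "Appetiser, %s, sorbet, %s, %s, coffee and mints",
             starters[s - 1], mains[m - 1], puddings[p - 1]);
}
```

With this encoding, ‘Menu1 (3)(5)(1)’ corresponds to the call expand_menu(3, 5, 1, ...): three digits stand in for a whole menu, because everything that does not vary is stored once in the schema.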
In terms of information compression, object-oriented design appears to be significant for two main reasons:

• Economies in human perception and cognition. Arguably, we see the world in terms of discrete objects because each such object is a recurrent constellation of features (a ‘chunk’) that enables us to perceive and understand things in an economical way [19, Section 2.3.2]. Likewise, there are huge economies to be made in recognising things in terms of classes (ibid.).

• Economies in software design. By modelling software on the objects and classes that people know, we not only create programs that are easy for people to understand but we take advantage of what are normally very substantial economies in the way that people understand things. As already indicated, software objects may be seen as examples of the chunking-with-codes technique for compressing information and object-oriented classes may be seen as examples of schema-plus-correction.
As indicated earlier, similar things may be said about mathematics and logic [19, Chapter 10]. It appears that much of mathematics and logic may be understood in terms of the kinds of compression techniques that have been mentioned: chunking-with-codes, run-length coding, and schema-plus-correction. As an example of the power of mathematics to compress information, Newton's equation that relates the distance travelled by a falling object to the time since it began to fall (s = gt²/2) is very much more compact than any realistically-large table of those distances and times.

As indicated in Section 3.4, there is an intimate connection between information compression and concepts of probability, and the SP system is fundamentally probabilistic. This implies that computing is fundamentally probabilistic. That may seem like a strange conclusion in view of the clockwork certainties that we associate with the operation of ordinary computers and the workings of mathematics and logic. There are at least three answers to that apparent contradiction:

• It appears that computing, mathematics and logic are more probabilistic than our ordinary experience of them might suggest. Gregory Chaitin has written: "I have recently been able to take a further step along the path laid out by Gödel and Turing. By translating a particular computer program into an algebraic equation of a type that was familiar even to the ancient Greeks, I have shown that there is randomness in the branch of pure mathematics known as number theory. My work indicates that—to borrow Einstein's metaphor—God sometimes plays dice with whole numbers." [4, p. 80].

• The SP system may imitate the clockwork nature of ordinary computers by delivering probabilities of 0 and 1. This can happen with certain kinds of data, or tight constraints on the process of searching the abstract space of alternative matches, or both those things.
• It seems likely that the all-or-nothing character of conventional computers has its origins in the low computational power of early computers. In those days, it was necessary to apply tight constraints on the process of searching for matches between patterns. Otherwise, the computational demands would have been overwhelming. Similar things may be said about the origins of mathematics and logic, which have been developed for centuries without the benefit of any computational machine, except very simple and low-powered devices. Now that it is technically feasible to apply large amounts of computational power, constraints on searching may be relaxed.
Although Alan Turing envisaged that computers might become intelligent [13], the Turing theory, in itself, does not tell us how! Plugging that gap has been an important motivation in the development of the SP theory. As it stands, it is certainly not a comprehensive answer but, as was mentioned in Section 4.1, and amplified here, it does have a lot more to say about the nature of intelligence than earlier theories of computing. The most comprehensive account of these aspects of the SP theory is in [19]. In brief:

• Representation of knowledge and information retrieval. Despite the simplicity of representing knowledge with patterns, the way they are processed within the multiple alignment framework gives them the versatility to represent several kinds of knowledge, including grammars for natural languages (next bullet point), class hierarchies, part-whole hierarchies, decision networks and trees, relational tuples, if-then rules, associations of medical signs and symptoms (Section 5.1), causal relations, and more. One universal format for knowledge and one universal framework for processing means that different kinds of knowledge may be combined flexibly and seamlessly according to need. The SP system provides for the retrieval of information from a knowledge base in the manner of query-by-example, and has potential to support the development of query languages, if required. The system may serve as an intelligent database that also supports the use of traditional data models, but with advantages compared with existing systems [20].

• Natural language processing. Grammatical rules, including words and their grammatical categories, may be represented with SP patterns. As we have seen (Figure 1), the parsing of natural language may be modelled via the building of multiple alignments. The same is true of the production of natural language.
The framework provides an elegant means of representing discontinuous dependencies in syntax, including overlapping dependencies such as number dependencies and gender dependencies in languages like French. As indicated in the previous item, the system may also model non-syntactic ‘semantic’ structures and, because there is one simple format for different kinds of knowledge, the system facilitates the seamless integration of syntax with semantics, with a consequent potential for the understanding of natural languages and interlingua-based translations amongst languages. The system is robust in the face of errors of omission, commission or substitution in sentences to be analysed, or stored linguistic knowledge, or both. The importance of context in the processing of language is accommodated in the way the system searches for a global best match for patterns: any pattern or partial pattern may be a context for any other.

• Pattern recognition and computer vision. Thanks largely to the versatility of the multiple alignment concept, the SP system provides a powerful framework for pattern recognition. It can model pattern recognition at multiple levels of abstraction, it provides for cross-classification and the integration of class-inclusion relations with part-whole hierarchies, and it facilitates the seamless integration of pattern recognition with various kinds of reasoning (next bullet point), and other aspects of intelligence. A probability may be calculated for any given classification or any associated inference. As in the processing of natural language, the system is robust in the face of errors of omission, commission or substitution in incoming data, or stored knowledge, or both, and the importance of context in recognition is accommodated in the way the system searches for a global best match for patterns. These ideas appear to have potential in the field of computer vision, as discussed in [21].

• Reasoning.
The SP system can model several kinds of reasoning including one-step ‘deductive’ reasoning, abductive reasoning, reasoning with probabilistic decision networks and decision trees, reasoning with ‘rules’, nonmonotonic reasoning and reasoning with default values, reasoning in Bayesian networks (including ‘explaining away’), causal diagnosis, and reasoning which is not supported by evidence. Since these several kinds of reasoning all flow from one computational framework (multiple alignment), they may be seen as aspects of one process, working individually or together without inconsistencies or incompatibilities. Plausible lines of reasoning may be achieved, even when relevant information is incomplete. Probabilities of inferences may be calculated, which may, as previously indicated, include extreme values (0 or 1) in the case of logic-like ‘deductions’.

• Planning and problem solving. The SP framework provides a means of planning a route between two places, and, with the translation of geometric patterns into textual form, it can solve the kind of geometric analogy problem that may be seen in some puzzle books and IQ tests [19, Chapter 8].

• Unsupervised learning. The SP70 model can derive a plausible grammar from a set of sentences without supervision or error correction by a ‘teacher’, without the provision of ‘negative’ samples, and without the grading of samples from simple to complex. It thus overcomes restrictions on what can be achieved with some other models of learning and reflects more accurately what is known about how children learn their first language or languages. The model draws on earlier research showing that inductive learning via principles of ‘minimum length encoding’ can lead to the discovery of entities that are psychologically natural, such as words in natural languages [15]. As it stands now, the model is not able to derive intermediate levels of abstraction or discontinuous dependencies in data, but those problems appear to be soluble.
To illustrate some of the versatility of the multiple alignment concept, Figure 2 shows how it may be applied to medical diagnosis [18]. This is the best multiple alignment found by SP62 with a set of Old patterns that provide information about diseases and a set of New patterns that describe the symptoms of an imaginary patient, ‘John Smith’. (As a matter of detail, the patterns in this multiple alignment are arranged in columns instead of rows, so that the multiple alignment can be fitted more neatly on to a page.)

In the example, all the New patterns appear in column 0. They show the name of the patient (‘< patient > John Smith < /patient >’) and his symptoms (‘< appetite > poor < /appetite >’, ‘< breathing > rapid < /breathing >’, and so on). The Old patterns in the multiple alignment, one in each of columns 1 to 5, have various roles:

• Column 1. This is simply a framework for different aspects of any disease, to facilitate the building of multiple alignments.

• Column 2. This shows that the most likely explanation of John Smith's symptoms is that he has influenza.

• Column 3. This pattern represents a set of ‘flu symptoms’. The reason that they are not shown within the main pattern for influenza (column 2) is that the same symptoms can appear in other diseases, most notably smallpox.

• Column 4. This pattern shows the symptoms of ‘fever’. As before, they are shown in a separate pattern because this cluster of symptoms can appear in several different diseases.

• Column 5. This pattern, ‘< t1 > < /t1 >’, is, in effect, a ‘value’ for a ‘variable’ in column 4 (‘< t1 > < /t1 >’), which serves to record the patient's temperature.

More detail may be found in [18] and [19, Section 6.5].

One of the main strengths of the SP theory is that it has a lot more to say about the nature of intelligence than earlier theories of computing. A useful step forward in the development of these ideas would be the creation of a version of the SP machine as a high-parallel, open-source, software virtual machine, accessible via the web to researchers everywhere, with a good user interface.
This would provide a means for researchers to explore what can be done with the system and to improve it.
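The diagnosis example above can be reduced to a toy sketch of its central idea: candidate diseases are Old patterns, observed symptoms are New patterns, and the preferred diagnosis is the one whose pattern unifies with the most New symbols. This is not the SP62 model (which builds full multiple alignments and computes probabilities); the symptom encoding, the two disease patterns, and the overlap score are all illustrative assumptions.

```python
# Toy sketch of diagnosis as pattern matching and unification.
# NOT the SP62 model: disease patterns and the score are assumptions.

# Hypothetical Old patterns: each disease as a set of symptom symbols.
OLD = {
    "influenza": {"appetite:poor", "breathing:rapid", "temperature:high",
                  "muscles:aching", "malaise:yes"},
    "common_cold": {"nose:runny", "sneezing:yes", "temperature:normal"},
}

def diagnose(new_symptoms):
    """Prefer the disease whose pattern unifies with the most observed
    symbols, relative to pattern size: more matching, more compression."""
    def score(disease):
        old = OLD[disease]
        return len(old & new_symptoms) / len(old | new_symptoms)
    return max(OLD, key=score)

patient = {"appetite:poor", "breathing:rapid", "temperature:high"}
print(diagnose(patient))  # → influenza
```

The score rewards large overlaps between Old and New patterns, which is a crude proxy for the compression achieved by unifying matching symbols in a multiple alignment.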
REFERENCES

[1] F. Attneave, ‘Some informational aspects of visual perception’, Psychological Review, 183–193, (1954).
[2] H. B. Barlow, ‘Sensory mechanisms, the reduction of redundancy, and intelligence’, in The Mechanisation of Thought Processes, 535–559, Her Majesty’s Stationery Office, London, (1959).
[3] H. B. Barlow, ‘Trigger features, adaptation and economy of impulses’, in Information Processes in the Nervous System, ed. K. N. Leibovic, 209–230, Springer, New York, (1969).
[4] G. J. Chaitin, ‘Randomness in arithmetic’, Scientific American, (1), 80–85, (1988).
[5] S. C. Kleene, ‘λ-definability and recursiveness’, Duke Mathematical Journal, 340–353, (1936).
[6] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, New York, 2009.
[7] A. A. Markov and N. M. Nagorny, The Theory of Algorithms, Kluwer, Dordrecht, 1988.
[8] E. L. Post, ‘Formal reductions of the general combinatorial decision problem’, American Journal of Mathematics, 197–268, (1943).
[9] J. Rissanen, ‘Modelling by the shortest data description’, Automatica-J. IFAC, 465–471, (1978).
[10] J. B. Rosser, ‘Highlights of the history of the lambda-calculus’, Annals of the History of Computing (USA), (4), 337–349, (1984).
[11] R. J. Solomonoff, ‘A formal theory of inductive inference. Parts I and II’, Information and Control, 1–22 and 224–254, (1964).
[12] A. M. Turing, ‘On computable numbers with an application to the Entscheidungsproblem’, Proceedings of the London Mathematical Society, 230–265 and 544–546, (1936).
[13] A. M. Turing, ‘Computing machinery and intelligence’, Mind, 433–460, (1950).
[14] C. S. Wallace and D. M. Boulton, ‘An information measure for classification’, Computer Journal, (2), 185–195, (1968).
[15] J. G. Wolff, ‘The discovery of segments in natural language’, British Journal of Psychology, 97–106, (1977). See: bit.ly/Yg3qQb.
[16] J. G. Wolff, ‘Learning syntax and meanings through optimization and distributional analysis’, in Categories and Processes in Language Acquisition, eds. Y. Levy, I. M. Schlesinger, and M. D. S. Braine, 179–215, Lawrence Erlbaum, Hillsdale, NJ, (1988). See: bit.ly/ZIGjyc.
[17] J. G. Wolff, ‘Simplicity and power—some unifying ideas in computing’, Computer Journal, (6), 518–534, (1990). See: bit.ly/R3ga9n.
[18] J. G. Wolff, ‘Medical diagnosis as pattern recognition in a framework of information compression by multiple alignment, unification and search’, Decision Support Systems, 608–625, (2006). See: bit.ly/XE7pRG.
[19] J. G. Wolff, Unifying Computing and Cognition: the SP Theory and Its Applications, CognitionResearch.org, Menai Bridge, 2006. ISBNs: 0-9550726-0-3 (ebook edition), 0-9550726-1-1 (print edition). Distributors, including Amazon.com, are detailed on bit.ly/WmB1rs. The publisher and its website were previously CognitionResearch.org.uk.
[20] J. G. Wolff, ‘Towards an intelligent database system founded on the SP theory of computing and cognition’, Data & Knowledge Engineering, 596–624, (2007). See: bit.ly/Yg2onp.
[21] J. G. Wolff, ‘Application of the SP theory of intelligence to the understanding of natural vision and the development of computer vision’, (in preparation).
[22] J. G. Wolff, ‘The SP theory of intelligence: benefits and applications’, (submitted for publication).

Figure 2. The best multiple alignment found by SP62, showing how the concept of multiple alignment may be applied in medical diagnosis.